# 🧠 SuffolkGPT — Retrieval Agent Challenge (INTERVIEWER VERSION)

This Colab notebook is designed for **live AI Engineer interviews** at Suffolk.
It evaluates the candidate’s ability to design and implement a **hybrid retrieval system** that adapts to different document types and query intents.

## What is SuffolkGPT?
**SuffolkGPT** is an internal assistant for Suffolk Construction. It must answer employee questions across three kinds of content:
- **HR policies** (narrative text): PTO carryover rules, parental leave, benefits, Workday processes.
- **Construction docs / glossary** (ID & acronym heavy): project IDs like `BOS-774`, references such as `RFI-1034`, acronyms `PCO`, `ASI`.
- **Tickets** (short, noisy snippets): troubleshooting like SSO loops, login issues, browser quirks.

This mix is tricky: HR questions need **semantic** (meaning-based) matching, Construction questions need **exact** ID/acronym matching, and Tickets benefit from a **balanced** approach.


⏱️ **Timebox:** ~45 minutes




## ✍️ Q1 — Architecture Flow (answer before coding)

**Question:**
Sketch or describe the **end-to-end architecture flow** for how SuffolkGPT should process a user query — from input to final answer.  
Outline the **key components** (e.g., intent detection, weight selection, retrieval, LLM answering) and how they interact.  


➡️ **Reference (for interviewer):**  
LLM detects intent → map to α (semantic vs keyword weight) → hybrid retriever fuses BM25 + embeddings → top-k docs passed to LLM for grounded answer → clarify path for vague queries.

```
User Query
   │
   ├──► LLM: Intent Understanding
   │        Returns one of: {hr | construction | tickets | clarify}
   │        If "clarify": ask a follow‑up (no retrieval yet)
   │
   ├──► Weight Selector (from intent)
   │        Example: hr → semantic‑heavy, construction → keyword‑heavy, tickets → balanced
   │
   ├──► Hybrid Retriever
   │        Uses both keyword (BM25) and embeddings (MiniLM)
   │        Combines normalized scores with chosen weights
   │
   └──► LLM: Answer Synthesis
            Generates a concise answer grounded in retrieved context
```


---
## ⚙️ Step 0 — Setup (run once)
Installs the few packages we need. **Run this in Colab.**

In [4]:
# ⚙️ Step 0 — Setup (run once)
# -------------------------------------------------------------------
# This cell installs only the minimal dependencies we need.
# It avoids torchvision / torchaudio conflicts that can break Colab.
# Run once at the start of the session.

# 1️⃣  Remove any conflicting packages
!pip -q uninstall -y torchvision torchaudio || true

# 2️⃣  Install only what we need (small, stable versions)
!pip -q install -U \
    torch==2.3.1 \
    transformers==4.44.2 \
    sentence-transformers==2.7.0 \
    rank-bm25==0.2.2

# 3️⃣  Force Transformers to skip torchvision entirely
%env TRANSFORMERS_NO_TORCHVISION=1
import os
os.environ["TRANSFORMERS_NO_TORCHVISION"] = "1"

# 4️⃣  Import the libraries used below (the models themselves are loaded in Step 2)
import torch, numpy as np
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from sentence_transformers import SentenceTransformer
from rank_bm25 import BM25Okapi

print("numpy:", np.__version__)
print("✅ Setup complete: dependencies installed and imported")


env: TRANSFORMERS_NO_TORCHVISION=1
numpy: 2.0.2
✅ Setup complete: dependencies installed and imported


## 📚 Step 1 — Sample Data
Four tiny documents representing Suffolk content:
- 2 HR policy texts
- 1 Construction glossary (IDs/acronyms)
- 1 Tickets bundle (3 short issues)

In [5]:
hr_policy_1 = '''HR Policy: Paid Time Off (PTO)
Full-time employees accrue PTO each pay period. PTO requests should be submitted at least 7 days in advance via Workday.
ALL unused PTO carries over up to the annual cap.'''

hr_policy_2 = '''HR Policy: Parental Leave
Eligible employees may take up to 12 weeks of paid leave. Coordinate with HR for approval and expected return date.
Benefits continue during leave per company policy.'''

construction_glossary = '''Construction Glossary
Project IDs: SJ-1029, BOS-774, PCO-22, RFI-1034
Acronyms: RFI (Request for Information), PCO (Potential Change Order), ASI (Architect's Supplemental Instruction)
Usage: "Please submit RFI-1034 for the curtain wall question."'''

tickets = [
    {'id': 't-001', 'text': 'User cannot log in to Workday; SSO loop in mobile Safari. Suggested fix: disable content blockers and retry.'},
    {'id': 't-002', 'text': 'Employee asks how many PTO days carry over at year end.'},
    {'id': 't-003', 'text': 'Project BOS-774 requires submitting an RFI for façade details. Mention RFI-1034 and PCO-22 in notes.'}
]

DOCUMENTS = [
    {'id': 'hr_policy_1', 'type': 'hr', 'text': hr_policy_1},
    {'id': 'hr_policy_2', 'type': 'hr', 'text': hr_policy_2},
    {'id': 'construction_glossary', 'type': 'construction', 'text': construction_glossary},
    {'id': 'tickets', 'type': 'tickets', 'text': '\n'.join([f"[{t['id']}] {t['text']}" for t in tickets])}
]
print('Docs loaded:', [d['id'] for d in DOCUMENTS])

Docs loaded: ['hr_policy_1', 'hr_policy_2', 'construction_glossary', 'tickets']


## 🤖 Step 2 — Load Models
- **LLM (intent + answer):** `google/flan-t5-base`
- **Embeddings:** `sentence-transformers/all-MiniLM-L6-v2`

In [20]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from sentence_transformers import SentenceTransformer
import torch

# --- Load a stronger LLM for both intent & answer ---
_tok = AutoTokenizer.from_pretrained("google/flan-t5-base")  # stronger than small
_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def _generate(prompt: str, max_new_tokens: int = 64, do_sample: bool = False):
    """Wrapper that mimics pipeline output format."""
    inputs = _tok(prompt, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        out = _model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=do_sample
        )
    text = _tok.decode(out[0], skip_special_tokens=True).strip()
    return [{"generated_text": text}]

# --- Assign to match existing code names ---
intent_llm = _generate
answer_llm = _generate

# --- Embedding model unchanged ---
embed_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

print("✅ Models loaded: FLAN-T5-Base as intent/answer + MiniLM-L6-v2 for embeddings")


✅ Models loaded: FLAN-T5-Base as intent/answer + MiniLM-L6-v2 for embeddings


## 🧰 Step 3 — Tokenizer & Indexes
Build BM25 for keyword search and an embedding matrix for semantic search.

In [21]:
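import re

def tokenize(text: str):
    # Lowercase, then keep runs of letters, digits and hyphens so IDs like "rfi-1034" stay whole.
    return re.findall(r'[A-Za-z0-9\-]+', text.lower())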

BM25 = BM25Okapi([tokenize(d['text']) for d in DOCUMENTS])
EMB = embed_model.encode([d['text'] for d in DOCUMENTS], normalize_embeddings=True, convert_to_numpy=True)
print('✅ BM25 + Embeddings ready')

✅ BM25 + Embeddings ready
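
To see why the tokenizer pattern matters for ID-heavy content, here is a small standalone check (the sample strings are made up for illustration). Because the hyphen is part of the character class, `RFI-1034` survives as a single token, which lets BM25 match it exactly. The flip side is that a query typed as `RFI 1034` produces two different tokens and will not match the hyphenated ID.

```python
import re

def tokenize(text: str):
    # Same pattern as the notebook cell: runs of letters, digits and hyphens.
    return re.findall(r'[A-Za-z0-9\-]+', text.lower())

with_hyphen = tokenize("Please submit RFI-1034 for BOS-774.")
without_hyphen = tokenize("RFI 1034")

print(with_hyphen)     # ['please', 'submit', 'rfi-1034', 'for', 'bos-774']
print(without_hyphen)  # ['rfi', '1034']

# The spaced form shares no token with the hyphenated ID.
print(set(without_hyphen) & set(with_hyphen))  # set()
```

A candidate might reasonably suggest query-side normalization for this case; the notebook itself does not do that.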


## 🔎 Step 4 — Retrieval Functions
Plain keyword and semantic retrieval helpers.

In [22]:
def keyword_retrieve(query: str, k=3):
    scores = BM25.get_scores(tokenize(query))
    top = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), DOCUMENTS[i]) for i in top]

def semantic_retrieve(query: str, k=3):
    qv = embed_model.encode([query], normalize_embeddings=True, convert_to_numpy=True)[0]
    sims = EMB @ qv
    top = np.argsort(sims)[::-1][:k]
    return [(float(sims[i]), DOCUMENTS[i]) for i in top]

print('✅ Keyword & Semantic retrieval ready')

✅ Keyword & Semantic retrieval ready
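
The semantic helper can rank with a plain matrix product because both the document and query embeddings are encoded with `normalize_embeddings=True`: for unit-length vectors, the dot product equals cosine similarity. A minimal pure-Python sketch with made-up 3-dimensional vectors (the helper names are illustrative, and real MiniLM-L6-v2 embeddings have 384 dimensions):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    # Scale the vector to unit length.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    # Textbook cosine similarity on the raw (unnormalized) vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors, invented for this example.
doc = [1.0, 2.0, 2.0]
query = [2.0, 0.0, 1.0]

via_dot = dot(l2_normalize(doc), l2_normalize(query))
via_cosine = cosine(doc, query)
print(abs(via_dot - via_cosine) < 1e-12)  # True
```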


## 🏷️ Step 5 — LLM Intent Classification
Small, strict prompt; deterministic decoding; map to {hr, construction, tickets, clarify}.

In [23]:
def llm_intent(query: str):
    prompt = f"""Classify the user query into ONE label: hr | construction | tickets | clarify
Definitions:
- hr: HR policies, PTO, leave
- construction: project IDs, RFI, BOS, PCO
- tickets: troubleshooting (SSO/login/browsers)
- clarify: greeting or vague

Q: {query}
A:""".strip()
    out = intent_llm(prompt, max_new_tokens=3, do_sample=False)[0]['generated_text'].strip().lower()
    # Accept the first known label found anywhere in the output; fall back to 'clarify'.
    for lab in ['hr','construction','tickets','clarify']:
        if lab in out:
            return lab
    return 'clarify'

print('✅ Intent classifier ready')

✅ Intent classifier ready
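
One thing worth probing with candidates: the label check in the cell above is a substring match, so a stray output such as `three` would be read as `hr`. A possible stricter variant, shown here as a sketch rather than the notebook's method, only accepts whole words. `parse_label` and `LABELS` are hypothetical names introduced for this example.

```python
import re

LABELS = ("hr", "construction", "tickets", "clarify")

def parse_label(raw: str, default: str = "clarify") -> str:
    """Return the first known label that appears as a whole word, else the default."""
    for word in re.findall(r"[a-z]+", raw.lower()):
        if word in LABELS:
            return word
    return default

print(parse_label("HR"))              # hr
print(parse_label(" Construction."))  # construction
print(parse_label("three"))           # clarify (a substring check would return 'hr')
print(parse_label(""))                # clarify
```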


## ⚖️ Step 6 — Hybrid Retrieval (Weighted Fusion)
Map intent → α (semantic weight). Combine normalized scores: `fused = α·semantic + (1−α)·keyword`.

In [24]:
def alpha_for_label(label: str):
    return {'hr': 0.8, 'construction': 0.2, 'tickets': 0.5}.get(label, 0.5)

def hybrid_retrieve(query: str, k=3):
    label = llm_intent(query)
    alpha = alpha_for_label(label)
    kw = np.array(BM25.get_scores(tokenize(query)))
    qv = embed_model.encode([query], normalize_embeddings=True, convert_to_numpy=True)[0]
    sem = EMB @ qv
    kw_norm = (kw - kw.min()) / (kw.max() - kw.min() + 1e-9)
    sem_norm = (sem + 1) / 2
    fused = alpha * sem_norm + (1 - alpha) * kw_norm
    top = np.argsort(fused)[::-1][:k]
    return {'intent': label, 'alpha': alpha, 'results': [(float(fused[i]), DOCUMENTS[i]) for i in top]}

print('✅ Hybrid retriever ready')

✅ Hybrid retriever ready
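
A worked example of the fusion formula may help when discussing α with a candidate. The sketch below reimplements the same arithmetic in pure Python on invented scores for three documents: keyword scores are min-max normalized, cosine scores are mapped from [-1, 1] to [0, 1], and the two are blended with α. The function names and the toy numbers are assumptions for illustration.

```python
def min_max(scores):
    # Rescale to [0, 1]; the small epsilon avoids division by zero when all scores are equal.
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo + 1e-9) for s in scores]

def fuse(sem, kw, alpha):
    sem_norm = [(s + 1) / 2 for s in sem]  # cosine in [-1, 1] mapped to [0, 1]
    kw_norm = min_max(kw)
    return [alpha * s + (1 - alpha) * k for s, k in zip(sem_norm, kw_norm)]

def best_doc(sem, kw, alpha):
    fused = fuse(sem, kw, alpha)
    return max(range(len(fused)), key=fused.__getitem__)

# Toy scores: doc 0 is the semantic favourite, doc 1 the keyword favourite.
sem = [0.9, 0.1, 0.4]
kw = [0.0, 3.0, 1.5]

print(best_doc(sem, kw, alpha=0.8))  # 0  (semantic-heavy, as for 'hr')
print(best_doc(sem, kw, alpha=0.2))  # 1  (keyword-heavy, as for 'construction')
```

Note that the two normalizations differ: min-max always stretches keyword scores across the full [0, 1] range, while the `(sem + 1) / 2` mapping tends to leave semantic scores in a narrower band. Whether that asymmetry is desirable is a fair discussion point.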


## 📝 Step 7 — LLM Answer back to user
Instruct the LLM to answer concisely using **only** the retrieved context.

In [25]:
from typing import Any, Dict, List

def llm_answer(query: str, contexts: List[Dict[str, Any]]):
    ctx = '\n\n'.join([f"[{d['id']}] {d['text']}" for d in contexts])
    prompt = f"You are SuffolkGPT. Answer briefly using ONLY the context below.\nQuestion: {query}\n\nContext:\n{ctx}\n\nAnswer:"
    out = answer_llm(prompt, max_new_tokens=120, do_sample=False)[0]['generated_text'].strip()
    return out

def agent_answer(query: str):
    res = hybrid_retrieve(query)
    docs = [d for _, d in res['results']]
    ans = llm_answer(query, docs)
    print(f"\nQuery: {query}\nIntent: {res['intent']} | α={res['alpha']}\nAnswer:\n{ans}")

print('✅ Agent ready')

✅ Agent ready
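
As written, `agent_answer` still retrieves and answers when the intent is `clarify` (α falls back to 0.5), whereas the reference flow asks a follow-up before any retrieval. One way a candidate might add that branch is sketched below. It is an assumption-laden sketch: `answer_or_clarify`, the stand-in components, and the follow-up wording are all hypothetical, and the components are passed in as plain callables so the example runs without any models.

```python
from typing import Callable, Dict, List

def answer_or_clarify(
    query: str,
    classify: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
) -> Dict[str, str]:
    """Skip retrieval and ask a follow-up when the intent is 'clarify'."""
    label = classify(query)
    if label == "clarify":
        return {"intent": label, "reply": "Could you tell me a bit more about what you need?"}
    docs = retrieve(query)
    return {"intent": label, "reply": answer(query, docs)}

# Stand-in components (not the notebook's models).
retrieval_calls: List[str] = []

def fake_classify(q: str) -> str:
    return "clarify" if len(q.split()) < 5 else "hr"

def fake_retrieve(q: str) -> List[str]:
    retrieval_calls.append(q)
    return ["PTO policy text"]

def fake_answer(q: str, docs: List[str]) -> str:
    return f"Answer grounded in {len(docs)} document(s)"

vague = answer_or_clarify("What do I need", fake_classify, fake_retrieve, fake_answer)
calls_after_vague = len(retrieval_calls)
specific = answer_or_clarify("How many PTO days carry over?", fake_classify, fake_retrieve, fake_answer)

print(vague["intent"], calls_after_vague)        # clarify 0
print(specific["intent"], len(retrieval_calls))  # hr 1
```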


## ▶️ Step 8 — Demo Queries
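Run the demo queries, which cover each content type plus one vague query (`What do I need`). The recorded output comes from a run with a slightly different query list, so it does not match the cell line for line.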

In [28]:
agent_answer('How many PTO days carry over at year end for employees?')
agent_answer('What is ASI?')
agent_answer('What do I need')
agent_answer('How many weeks of paid leave I have as a new mom?')
agent_answer('Workday SSO loop on mobile Safari')


Query: How many PTO days carry over at year end for employees?
Intent: hr | α=0.8
Answer:
unused

Query: What is RFI-1034?
Intent: construction | α=0.2
Answer:
Request for Information

Query: How many weeks of paid leave I have as a new mom?
Intent: hr | α=0.8
Answer:
12

Query: Workday SSO loop on mobile Safari
Intent: tickets | α=0.5
Answer:
[t-001] User cannot log in to Workday; SSO loop in mobile Safari. Suggested fix: disable content blockers and retry
