# System Design Questions

1. Easy: You need to deploy an LLM-powered chatbot for a government website. What are the key components you would include to make it scalable and reliable?

2. Medium: Design an end-to-end multilingual voice assistant pipeline (ASR → LLM → TTS). Where do you handle language detection, latency constraints, and fallback when speech recognition fails?

3. Medium–Hard: A ministry wants an AI system to summarize large legal documents and answer questions. Would you use RAG, fine-tuning, or both? Explain your architecture and trade-offs.

4. Hard: Design an agentic AI system that can handle citizen service requests by calling internal tools (databases, ticketing APIs). How do you prevent hallucinated tool calls and ensure auditability?

5. Very Hard: After deployment, your AI assistant performs well normally but fails catastrophically on rare edge cases (accent bias, incorrect legal advice, high latency spikes). How do you detect, monitor, and systematically reduce these failures over time?


Good. These are the **5 most efficient system design questions** (easy → hard) that cover **RAG, fine-tuning, agents, orchestration, and production AI** without wasting time.

Answer them in this format only:

**Problem → Approach → Tradeoffs → Metrics → Risks → Next steps**

---

# 1) Easy — RAG Baseline Design

**Question:**
Design a **Document Q&A system** for internal government officers to query policy PDFs.

Constraints:

* 500k documents
* Must return answers in < 2 seconds
* Sources must be cited

What is your architecture end-to-end?

**Follow-ups:**

* Where do embeddings live?
* How do you handle document updates?


### ✅ Correct Senior Answer (what you should say)

#### Problem

We need a **Document Q&A system** for 500k government policy PDFs with:

* <2s latency
* Answer must include citations
* High reliability for internal officers

---

#### Approach (Architecture)

##### 1. Offline Ingestion Pipeline

* Parse PDFs → extract clean text
* Chunk documents (300–500 tokens)
* Generate embeddings using a multilingual embedding model
* Store chunks in a **vector database** with metadata:

```json
{
  "chunk_text": "string",
  "doc_id": "string",
  "page_number": "integer",
  "section_title": "string",
  "timestamp": "string"
}
```

This metadata enables citation.
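A minimal sketch of the chunking step, assuming text has already been extracted per page. Whitespace-separated words stand in for model tokens, and the 50-token overlap is an assumption, not part of the design above. `Chunk` and `chunk_page` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Same fields as the metadata record above.
    chunk_text: str
    doc_id: str
    page_number: int
    section_title: str
    timestamp: str


def chunk_page(text, doc_id, page_number, section_title, timestamp,
               max_tokens=400, overlap=50):
    """Split one page into overlapping windows of whitespace tokens.

    Whitespace tokens only approximate model tokens (assumption).
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append(Chunk(" ".join(window), doc_id, page_number,
                            section_title, timestamp))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the page
    return chunks
```

Every chunk carries `doc_id`, `page_number`, and `section_title`, so a retrieved chunk can be cited without a second lookup.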

---

##### 2. Retrieval Layer (Fast + Accurate)

At query time:

* User query → embedding
* Retrieve top-k chunks using:
  * Dense similarity search
  * Optional hybrid BM25 + vector search for legal text

Dense search catches paraphrases; BM25 catches exact terms such as section numbers and statutory phrases, which matter in legal text.
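One common way to merge the dense and BM25 result lists is reciprocal rank fusion (RRF). The design above does not name a fusion method, so treat this as one option. The sketch assumes each retriever returns chunk IDs in ranked order; `k=60` is the constant conventionally used with RRF, and the function name is hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several ranked lists of chunk IDs into one list.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by chunk ID for determinism.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [chunk_id for chunk_id, _ in ordered[:top_n]]


dense_hits = ["a", "b", "c"]   # hypothetical vector-search results
bm25_hits = ["b", "d", "a"]    # hypothetical keyword-search results
print(reciprocal_rank_fusion([dense_hits, bm25_hits]))
```

RRF uses ranks only, so the dense and BM25 scores never need to be put on a common scale.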

---

##### 3. Generation Layer (Cited Answer)

* Pass retrieved chunks + query into the LLM
* Prompt enforces:
  * Only answer from context
  * Cite page/section for every claim
  * If not found → say “Not in documents”
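A rough sketch of how those prompt rules could be assembled, assuming each retrieved chunk is a dict with the metadata fields from the ingestion step. The wording of the rules and the `build_prompt` name are illustrative, not a tested prompt.

```python
def build_prompt(query: str, chunks: list[dict]) -> str:
    """Build a context-only prompt with numbered, citable sources."""
    sources = []
    for i, chunk in enumerate(chunks, start=1):
        header = (f"[{i}] doc={chunk['doc_id']} "
                  f"page={chunk['page_number']} "
                  f"section={chunk['section_title']}")
        sources.append(f"{header}\n{chunk['chunk_text']}")
    rules = (
        "Answer using only the sources below.\n"
        "Cite the source number, page, and section after every claim.\n"
        'If the sources do not contain the answer, reply exactly: '
        '"Not in documents".'
    )
    return (f"{rules}\n\nSources:\n" + "\n\n".join(sources)
            + f"\n\nQuestion: {query}\nAnswer:")
```

Numbering the sources gives the model a short handle to cite, and a post-processing step can map it back to `doc_id` and page.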

---

##### 4. Latency Optimizations (<2s SLA)

* Use ANN indexes (HNSW/IVF) in vector DB
* Cache embeddings + frequent queries
* Stream responses
* Use smaller tuned models for generation if needed
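The "cache frequent queries" point can be sketched as a small in-process LRU cache keyed on a normalized query string. `QueryCache` and `answer_fn` are hypothetical names; a real deployment would more likely use a shared cache and invalidate entries when documents are re-indexed.

```python
from collections import OrderedDict


class QueryCache:
    """LRU cache in front of an expensive answer function (sketch)."""

    def __init__(self, answer_fn, max_entries=10_000):
        self._answer_fn = answer_fn
        self._entries = OrderedDict()
        self._max = max_entries
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query):
        # Lowercase and collapse whitespace so trivial variants share a key.
        return " ".join(query.lower().split())

    def get(self, query):
        key = self._key(query)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        answer = self._answer_fn(query)
        self._entries[key] = answer
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
        return answer
```

If answers depend on the officer's role, the role should be part of the cache key; otherwise a cached answer could leak content across access levels.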

---

#### Tradeoffs

* RAG avoids retraining when policies update
* Fine-tuning improves style but risks stale knowledge
* Hybrid search improves recall but increases complexity

---

#### Metrics

* Retrieval Recall@k
* Answer groundedness (% supported by sources)
* Latency p95 <2s
* Citation accuracy
* User satisfaction
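Two of these metrics are simple to compute offline. The sketch below assumes a labeled set of relevant chunk IDs per query and a list of measured latencies; it uses the nearest-rank definition of the 95th percentile, which is one of several in use.

```python
import math


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant chunk IDs found in the top-k results."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant set is empty")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def p95(latencies):
    """95th-percentile latency, nearest-rank method."""
    if not latencies:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical evaluation data.
print(recall_at_k(["a", "b", "c", "d"], {"a", "d", "z"}, k=3))
print(p95([0.4, 0.6, 0.9, 1.1, 2.5]) < 2.0)  # SLA check; False for this sample
```

Groundedness and citation accuracy need labeled judgments or a grader model, so they are left out of this sketch.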

---

#### Risks + Mitigation

* Hallucination → strict context-only prompting + refusal
* Stale docs → incremental re-indexing pipeline
* Security → access control per officer role
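For the stale-docs risk, one way to keep re-indexing incremental is to store a content hash per document and re-embed only what changed. This is a sketch under that assumption; `plan_reindex` and its inputs are hypothetical.

```python
import hashlib


def plan_reindex(current_docs, indexed_hashes):
    """Decide which documents to re-embed or drop from the index.

    current_docs:   doc_id -> raw file bytes currently in the source store
    indexed_hashes: doc_id -> SHA-256 hex digest of the version last indexed
    Returns (to_upsert, to_delete, new_hashes).
    """
    to_upsert = []
    new_hashes = {}
    for doc_id, content in current_docs.items():
        digest = hashlib.sha256(content).hexdigest()
        new_hashes[doc_id] = digest
        if indexed_hashes.get(doc_id) != digest:
            to_upsert.append(doc_id)  # new or modified document
    to_delete = [d for d in indexed_hashes if d not in current_docs]
    return sorted(to_upsert), sorted(to_delete), new_hashes
```

When a document is upserted, its old chunks should be deleted by `doc_id` before the new ones are written, so citations never point at superseded text.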

---

#### Next Steps

* Add evaluation harness for retrieval + citations
* Monitor production drift
* Expand to multilingual queries

---

# 2) Medium — RAG Failure & Evaluation

**Question:**
Your RAG assistant works well on common queries but fails badly on rare edge cases.

How do you:

* detect these failures early
* evaluate retrieval quality
* reduce hallucinations over time?

**Follow-ups:**

* What offline + online metrics do you track?

---

# 3) Medium–Hard — Fine-tuning vs RAG Decision

**Question:**
A ministry wants a chatbot that answers citizen questions about complex legal rules.

You have:

* 30k labeled Q&A pairs
* Privacy constraints (data cannot leave premises)
* Need multilingual support

Do you choose:

* Fine-tuning (LoRA)
* RAG
* Hybrid

Explain why.

**Follow-ups:**

* What changes when laws update every month?

---

# 4) Hard — Agentic Workflow + Tool Orchestration

**Question:**
Design an AI agent that can:

* understand user intent
* fetch citizen complaint status from databases
* escalate tickets via APIs
* generate summaries for human officers

It must be:

* auditable
* safe
* resistant to hallucinated tool calls

Where do you put:

* planner
* tool executor
* memory
* verification layer?

---

# 5) Very Hard — Government-Grade Voice Agent at Scale

**Question:**
Design a **multilingual voice agent** for a national helpline:

* Hindi, English, Tamil
* Handles 10k concurrent calls
* Must degrade safely under load
* Must be monitored + explainable

Describe:

* ASR → LLM → TTS pipeline
* Latency budget
* Fallback mechanisms
* Monitoring and bias mitigation

---

## Brutal Interview Rule

If you can answer #4 and #5 crisply, you’re senior.
If not, you’re still mid-level.

---

Pick **Question 1** and answer in the required structure.
I will interrupt and correct you like a real interviewer.
