
## 1.1 Core Understanding

### 1. What is Generative AI?

<details><summary>Click here for answer</summary>

**Generative AI** is a class of artificial intelligence systems that **create new content** such as text, images, audio, video, or code by learning patterns from large datasets and then producing novel outputs that resemble the learned data.

**Example:**
ChatGPT generating text, DALL·E generating images, Copilot generating code.

</details>

---

### 2. What is an LLM and how does it differ from traditional ML models?

<details><summary>Click here for answer</summary>

An **LLM (Large Language Model)** is a deep neural network (usually Transformer-based) trained on massive text corpora to understand and generate human language.

| Traditional ML         | LLM                                  |
| ---------------------- | ------------------------------------ |
| Task-specific          | General-purpose language system      |
| Feature-engineered     | Learns representations automatically |
| Small datasets         | Trained on internet-scale data       |
| Fixed output structure | Flexible, generative output          |

</details>

---

### 3. What are the key components of a GenAI application?

<details><summary>Click here for answer</summary>

A typical GenAI system includes:

1. **User Interface** – Web/app frontend
2. **Orchestration Layer** – LangChain, custom pipelines
3. **Prompt Management** – Templates, versions
4. **Model Layer** – LLMs (GPT, Claude, LLaMA, etc.)
5. **Retrieval System** – Vector DB + embeddings
6. **Data Sources** – Documents, databases, APIs
7. **Memory & State** – Conversation & session data
8. **Monitoring & Evaluation** – Logs, metrics, feedback

</details>

---

### 4. What is prompt engineering?

<details><summary>Click here for answer</summary>

**Prompt engineering** is the practice of designing and structuring input instructions so the LLM produces accurate, consistent, and useful outputs.

Includes:

* Role definition
* Output formatting
* Constraints
* Examples (few-shot learning)
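
The four elements above can be combined into one prompt string. The following is a minimal sketch, not a standard template: the function name `build_prompt`, the bookstore scenario, and the labels are all hypothetical.

```python
def build_prompt(role, task, output_format, constraints, examples):
    """Assemble a prompt from role, format, constraints, and few-shot examples."""
    lines = [
        f"You are {role}.",
        "",
        f"Task: {task}",
        f"Respond in this format: {output_format}",
        "",
        "Constraints:",
    ]
    lines += [f"- {c}" for c in constraints]
    lines += ["", "Examples:"]
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    return "\n".join(lines)


# Hypothetical scenario used only for illustration.
prompt = build_prompt(
    role="a support assistant for an online bookstore",
    task="Classify the customer message by intent.",
    output_format="a single lowercase label",
    constraints=["Use only the labels: refund, shipping, other."],
    examples=[("Where is my order?", "shipping"), ("I want my money back.", "refund")],
)
print(prompt)
```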

</details>

---

### 5. What are temperature and top-p sampling?

<details><summary>Click here for answer</summary>

Both control randomness in output generation.

| Parameter   | Purpose                                                                         |
| ----------- | ------------------------------------------------------------------------------- |
| Temperature | Scales randomness (near 0 = almost deterministic, higher = more varied output)  |
| Top-p       | Samples only from the smallest token set whose cumulative probability reaches p |

Used to balance **accuracy vs creativity**.
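
To make the two parameters concrete, here is a small sketch on a toy four-token distribution. The logits are made up, and real inference stacks apply these steps inside the sampling loop, so treat the function names as illustrative only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy argmax instead")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four tokens
cold = softmax_with_temperature(logits, 0.5)  # top token dominates more
hot = softmax_with_temperature(logits, 2.0)   # probabilities are flatter
nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), 0.8)
print(cold[0] > hot[0], sorted(nucleus))
```

With these logits, the top two tokens already cover more than 80% of the probability mass, so top-p = 0.8 discards the other two before sampling.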

</details>

---

### 6. What is a context window?

<details><summary>Click here for answer</summary>

The **context window** is the maximum number of tokens a model can read and generate in a single request.

Includes:

* Prompt
* Retrieved documents
* Conversation history
* Model response
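
Because all four items share one budget, a quick calculation shows how much room is left for the response. This is a sketch with hypothetical token counts, not the limits of any particular model.

```python
def remaining_output_tokens(context_window, prompt_tokens, retrieved_tokens, history_tokens):
    """Tokens left for the model response once all inputs are counted."""
    used = prompt_tokens + retrieved_tokens + history_tokens
    return max(context_window - used, 0)


# Hypothetical numbers for illustration only.
budget = remaining_output_tokens(
    context_window=8192,
    prompt_tokens=500,
    retrieved_tokens=3000,
    history_tokens=1200,
)
print(budget)
```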

</details>

---

### 7. What is tokenization?

<details><summary>Click here for answer</summary>

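**Tokenization** is the process of breaking text into smaller units (tokens) that the model can process. Most LLM tokenizers split text into subword pieces rather than whole words.

Example (illustrative; the exact split depends on the tokenizer):
"Artificial Intelligence" → ["Art", "ificial", " Intelligence"]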

Models operate on tokens, not raw text.
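
Most LLM tokenizers split text into subword pieces rather than whole words. The sketch below uses greedy longest-match over a tiny, hypothetical vocabulary to show the idea. Production tokenizers (for example BPE-based ones) learn their vocabulary from data and use different algorithms.

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a fixed vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matches: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary, chosen only to make the example readable.
vocab = {"Art", "ificial", " Intelligence", " Int", "ell", "igence"}
text = "Artificial Intelligence"
tokens = tokenize(text, vocab)

# The model sees integer IDs, not strings.
token_to_id = {tok: idx for idx, tok in enumerate(sorted(vocab))}
ids = [token_to_id[tok] for tok in tokens]
print(tokens, ids)
```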

</details>

---

### 8. What is inference vs training?

<details><summary>Click here for answer</summary>

| Training           | Inference           |
| ------------------ | ------------------- |
| Learning from data | Using trained model |
| Compute-heavy      | Latency-sensitive   |
| Offline process    | Real-time process   |

</details>

---

### 9. What is hallucination in LLMs?

<details><summary>Click here for answer</summary>

A **hallucination** occurs when an LLM generates information that is **factually incorrect or unsupported**, while sounding confident.

Mitigated using:

* Retrieval (RAG)
* Verification steps
* Strict prompting

</details>

---

### 10. Why are embeddings important?

<details><summary>Click here for answer</summary>

**Embeddings** convert text into numeric vectors representing meaning.

They enable:

* Semantic search
* Document retrieval
* Clustering
* Recommendation systems

Core to modern RAG systems.
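
A small sketch of why vectors enable semantic search: texts with similar meaning get vectors that point in similar directions, which cosine similarity measures. The three-dimensional vectors below are invented for illustration. Real embedding models produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; a real system would get these from an embedding model.
embeddings = {
    "reset my password": [0.9, 0.1, 0.0],
    "forgot login credentials": [0.8, 0.2, 0.1],
    "shipping times to Canada": [0.0, 0.2, 0.9],
}

query = embeddings["reset my password"]
ranked = sorted(
    embeddings,
    key=lambda text: cosine_similarity(query, embeddings[text]),
    reverse=True,
)
print(ranked)
```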

</details>

---

## 1.2 Simple Design Scenarios

---

### 1. Design a Chatbot for FAQs

<details><summary>Click here for answer</summary>

**Architecture:**

1. User query
2. Embed query
3. Retrieve relevant FAQs from vector DB
4. Inject into prompt
5. LLM generates answer
6. Return response

**Key concerns:** latency, accuracy, fallback responses
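
The steps and the fallback concern can be sketched as follows. Everything here is a stand-in: `embed` uses word counts instead of an embedding model, the FAQ dictionary replaces a vector DB, `llm` is any callable you pass in, and the `min_score` threshold is an arbitrary assumption.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ store; a real system would keep these in a vector DB.
FAQS = {
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "What is your refund policy?": "Refunds are available within 30 days of purchase.",
}
FALLBACK = "Sorry, I could not find an answer. Connecting you to a human agent."


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def answer(query, llm, min_score=0.3):
    """Embed the query, retrieve the best FAQ, and fall back if the match is weak."""
    q = embed(query)
    score, question = max((cosine(q, embed(faq)), faq) for faq in FAQS)
    if score < min_score:
        return FALLBACK
    prompt = (
        "Answer the user using only this FAQ entry.\n"
        f"Q: {question}\nA: {FAQS[question]}\n"
        f"User: {query}"
    )
    return llm(prompt)


echo = lambda prompt: prompt  # stand-in for a real LLM call
print(answer("how can I reset my password", llm=echo))
print(answer("do you sell gift cards", llm=echo))
```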

</details>

---

### 2. Design a Document Summarization System

<details><summary>Click here for answer</summary>

**Flow:**

1. Upload document
2. Chunk text
3. Summarize each chunk
4. Combine summaries
5. Generate final summary

Handles long documents using chunking + hierarchical summarization.
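
One way to sketch hierarchical summarization is a loop that summarizes chunks, joins the results, and repeats until everything fits in one chunk. The `summarize` argument is a placeholder for an LLM call. Here a truncating stub is used so the example runs, and `max_rounds` is an assumed safety guard in case summaries do not shrink.

```python
def chunk_text(text, max_words):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_document(text, summarize, max_words=200, max_rounds=10):
    """Summarize chunks, then summarize the combined summaries until one chunk remains."""
    chunks = chunk_text(text, max_words)
    if not chunks:
        return ""
    for _ in range(max_rounds):
        if len(chunks) == 1:
            break
        combined = " ".join(summarize(chunk) for chunk in chunks)
        chunks = chunk_text(combined, max_words)
    return summarize(" ".join(chunks))


# Stub summarizer: keeps the first 20 words. A real system would call an LLM here.
def first_20_words(chunk):
    return " ".join(chunk.split()[:20])


document = " ".join(f"w{i}" for i in range(1000))
summary = summarize_document(document, first_20_words)
print(summary)
```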

</details>

---

### 3. Design a Resume Screening System

<details><summary>Click here for answer</summary>

**Pipeline:**

1. Upload resumes
2. Extract text
3. Generate embeddings
4. Compare with job description embedding
5. Rank candidates
6. Generate evaluation summary

Supports consistent, scalable screening as an input to human hiring decisions.

</details>

---

### 4. Design a Content Generation Pipeline

<details><summary>Click here for answer</summary>

**Flow:**

1. Input: topic + constraints
2. Prompt template selection
3. LLM generation
4. Quality checks
5. Human approval (optional)
6. Publish

Includes moderation, versioning, and feedback loop.

</details>



### 1. Explain Retrieval-Augmented Generation (RAG)

RAG combines **search** and **generation**.
Instead of relying only on the LLM’s training data, the system retrieves relevant external knowledge and provides it to the model before generating a response.

**Pipeline:**
User Query → Embedding → Vector Search → Relevant Documents → Prompt Construction → LLM → Answer

**Why it matters:**
It improves factual accuracy, keeps answers current without retraining the model, and reduces hallucinations.
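
The retrieval and prompt-construction stages can be sketched as below. This is an assumption-laden toy: `retrieve` ranks by shared words as a stand-in for embedding plus vector search, and the documents and prompt wording are hypothetical.

```python
import re

# Hypothetical knowledge base.
DOCS = [
    "The premium plan includes 24/7 phone support.",
    "Invoices are emailed on the first day of each month.",
    "The free plan is limited to three projects.",
]


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, docs, k=2):
    """Stand-in for embedding + vector search: rank documents by shared words."""
    return sorted(docs, key=lambda d: len(words(query) & words(d)), reverse=True)[:k]


def build_rag_prompt(query, docs, k=2):
    """Number the retrieved sources so the model can cite them."""
    sources = retrieve(query, docs, k)
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(sources, start=1))
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}"
    )


prompt = build_rag_prompt("What support does the premium plan include?", DOCS)
print(prompt)  # this string would be sent to the LLM
```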

---

### 2. Design a GenAI-Powered Enterprise Search Engine

**Components:**

1. **Ingestion:** Documents, databases, APIs
2. **Preprocessing:** Cleaning, chunking
3. **Embedding:** Convert chunks to vectors
4. **Storage:** Vector database + metadata store
5. **Query Handling:**
   Query → embed → retrieve top-k chunks
6. **Generation:** LLM answers using retrieved content
7. **Output:** Ranked results with citations

**Key Features:**
Access control, logging, feedback loop, caching.

---

### 3. How Do Vector Databases Work Internally?

1. Store high-dimensional vectors
2. Build indexes (HNSW, IVF, PQ) for fast search
3. Perform **Approximate Nearest Neighbor** search
4. Return closest vectors by similarity (cosine, dot-product)
5. Attach metadata for filtering
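
To illustrate how an IVF-style index trades accuracy for speed, here is a toy sketch. It buckets vectors by nearest centroid and scans only the `nprobe` closest buckets at query time. The class name, the fixed centroids, and the `where` filter are assumptions made for this example. Real systems learn centroids (typically with k-means) and use far more optimized structures.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class TinyIVFIndex:
    """Toy inverted-file index: vectors are bucketed by their nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = {i: [] for i in range(len(centroids))}

    def _nearest_centroids(self, vector, n):
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: cosine(vector, self.centroids[i]),
            reverse=True,
        )
        return order[:n]

    def add(self, vector, metadata):
        bucket = self._nearest_centroids(vector, 1)[0]
        self.buckets[bucket].append((vector, metadata))

    def search(self, query, k=2, nprobe=1, where=None):
        """Scan only the nprobe closest buckets, with an optional metadata filter."""
        candidates = [
            item
            for bucket in self._nearest_centroids(query, nprobe)
            for item in self.buckets[bucket]
            if where is None or where(item[1])
        ]
        candidates.sort(key=lambda item: cosine(query, item[0]), reverse=True)
        return [metadata for _, metadata in candidates[:k]]


index = TinyIVFIndex(centroids=[[1.0, 0.0], [0.0, 1.0]])  # hypothetical centroids
index.add([0.9, 0.1], {"id": "a", "lang": "en"})
index.add([0.8, 0.3], {"id": "b", "lang": "de"})
index.add([0.1, 0.9], {"id": "c", "lang": "en"})

hits = index.search([1.0, 0.05], k=2, where=lambda m: m["lang"] == "en")
wider = index.search([1.0, 0.05], k=2, nprobe=2, where=lambda m: m["lang"] == "en")
print(hits, wider)
```

With `nprobe=1` the search never visits the second bucket, which is why the result is approximate: raising `nprobe` finds more candidates at the cost of scanning more vectors.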

---

### 4. Compare Fine-Tuning vs RAG

| Fine-Tuning                      | RAG                               |
| -------------------------------- | --------------------------------- |
| Model weights updated            | Model unchanged                   |
| Slow and expensive to update     | Fast and cheap to update          |
| Knowledge fixed at training time | Knowledge refreshed at query time |
| Harder to keep current           | Easy to update (re-index data)    |

---

### 5. How Do You Build a Conversational Memory System?

1. Store conversation history
2. Summarize older interactions
3. Convert memory into embeddings
4. Retrieve relevant memory per query
5. Inject into prompt dynamically

Supports both short-term and long-term memory.
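
A minimal sketch of steps 1, 2, and 5: recent turns are kept verbatim (short-term memory) and older turns are folded into a running summary (long-term memory). The class and its `summarize` callable are hypothetical. Embedding-based retrieval of memories (steps 3 and 4) is left out to keep the example short.

```python
class ConversationMemory:
    """Keeps the last few turns verbatim and summarizes everything older."""

    def __init__(self, summarize, max_recent=4):
        self.summarize = summarize  # placeholder for an LLM summarization call
        self.max_recent = max_recent
        self.summary = ""
        self.recent = []

    def add(self, role, text):
        self.recent.append(f"{role}: {text}")
        if len(self.recent) > self.max_recent:
            overflow = self.recent[:-self.max_recent]
            self.recent = self.recent[-self.max_recent:]
            # Fold the overflowing turns into the running summary.
            self.summary = self.summarize(" ".join([self.summary] + overflow).strip())

    def as_prompt_context(self):
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation: {self.summary}")
        parts.extend(self.recent)
        return "\n".join(parts)


# Stub summarizer that truncates; a real system would call an LLM.
memory = ConversationMemory(summarize=lambda text: text[:80], max_recent=2)
memory.add("user", "hi")
memory.add("assistant", "hello, how can I help?")
memory.add("user", "I need to change my address")
print(memory.as_prompt_context())
```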

---

### 6. Handling Long Documents Beyond Context Limits

* Chunk documents
* Create embeddings
* Retrieve only relevant chunks
* Apply hierarchical summarization for large content
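
Chunking is often done with a small overlap so that a sentence cut at a boundary still appears whole in one chunk. A minimal sketch, assuming word-level (here, item-level) chunks and illustrative sizes:

```python
def chunk_with_overlap(words, size, overlap):
    """Split a list of words into chunks of `size`, repeating `overlap` words between chunks."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last chunk already reaches the end
    return chunks


# Integers stand in for words so the overlap is easy to see.
chunks = chunk_with_overlap(list(range(10)), size=4, overlap=1)
print(chunks)
```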

---

### 7. How Do You Evaluate LLM Responses?

* Automated metrics (ROUGE, BLEU, similarity scores)
* Human review
* Regression tests
* Task success metrics
* User feedback
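
As an example of an automated metric, the sketch below computes a unigram-overlap F1 score, which is similar in spirit to ROUGE-1 but is not a replacement for an established ROUGE implementation. A score like this can back a regression test by asserting that it stays above a threshold you choose.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """F1 over shared words between a model answer and a reference answer."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = unigram_f1("the cat sat on the mat", "the cat is on the mat")
print(round(score, 3))
```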

---

### 8. How Do You Reduce Hallucinations?

* Use RAG
* Require citations
* Constrain prompts
* Add verification step
* Limit model freedom
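
A simple verification step is to check that an answer cites at least one source and that every citation points to a source that was actually retrieved. The sketch assumes citations are written as `[1]`, `[2]`, and so on, which is a convention of this example. It does not check whether the cited source really supports the claim.

```python
import re


def unsupported_citations(answer, num_sources):
    """Return cited source numbers that do not match any retrieved source."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= num_sources)


def passes_citation_check(answer, num_sources):
    """True only if the answer cites something and every citation is valid."""
    has_citation = bool(re.search(r"\[\d+\]", answer))
    return has_citation and not unsupported_citations(answer, num_sources)


print(passes_citation_check("Refunds take 5 days [1].", num_sources=2))
print(passes_citation_check("Refunds take 5 days.", num_sources=2))
print(unsupported_citations("See [3] and [1].", num_sources=2))
```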

---

### 9. Embedding Strategies

* Optimal chunk size & overlap
* Metadata enrichment
* Hybrid search (keyword + vector)
* Domain-specific embeddings
* Periodic re-embedding
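
For hybrid search, one common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF), sketched below. The document IDs are hypothetical, and `k=60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each document scores 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from a keyword index and a vector index.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused list, without needing to put keyword scores and similarity scores on the same scale.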

---

### 10. Choosing Between Multiple LLM Providers

Consider:
Accuracy, cost, latency, privacy, reliability, tool support.
Use **dynamic routing** to balance quality and cost.
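
A rule-based router is the simplest form of dynamic routing. In this sketch the model names, prices, and word limit are all invented. A production router might instead use a classifier, past quality scores, or provider health checks.

```python
# Hypothetical model catalog; names, prices, and limits are made up.
MODELS = {
    "small": {"cost_per_1k_tokens": 0.001, "max_prompt_words": 200},
    "large": {"cost_per_1k_tokens": 0.01, "max_prompt_words": 4000},
}


def route(prompt, needs_reasoning=False):
    """Send short, simple prompts to the cheap model and everything else to the large one."""
    word_count = len(prompt.split())
    if needs_reasoning or word_count > MODELS["small"]["max_prompt_words"]:
        return "large"
    return "small"


print(route("What are your opening hours?"))
print(route("Compare these two contracts clause by clause.", needs_reasoning=True))
```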

---

## 2.2 System Design Scenarios

---

### 1. Design a ChatGPT-Like System

**Architecture:**

UI → API Gateway → Orchestrator → Prompt Manager →
LLM → Memory → Moderation → Logging & Monitoring

---

### 2. Design a Legal Document Assistant

RAG over legal corpus → citation engine → compliance controls → audit logs → role-based access.

---

### 3. Design an AI Tutor Platform

Student profile → knowledge retrieval → adaptive prompt generation → progress tracking → feedback engine.

---

### 4. Design a Code Review Assistant

Code ingestion → static analysis → LLM review → suggestion ranking → CI/CD integration.

---

### 5. Design a Real-Time Customer Support AI

Customer query → CRM lookup → RAG → LLM response → sentiment detection → human handoff → learning loop.


---

## 3. Advanced Level — Production & Scalability

### 3.1 Engineering & Reliability

1. Design a scalable RAG architecture for millions of documents.
2. How do you implement caching in GenAI?
3. How do you handle LLM rate limits?
4. How do you build prompt versioning & rollback?
5. How do you design observability for GenAI?
6. How do you manage model drift?
7. How do you implement A/B testing for prompts?
8. How do you detect and prevent prompt injection?
9. How do you ensure data privacy & compliance?
10. How do you handle model outages?

### 3.2 Real-World Design

1. Design an AI-powered CRM system.
2. Design an enterprise knowledge assistant.
3. Design a medical diagnosis assistant.
4. Design a fraud detection AI system.
5. Design an AI DevOps assistant.

---

## 4. Expert Level — Strategic & Research

### 4.1 Deep System Thinking

1. Design a multi-agent autonomous system for enterprise automation.
2. How do you design long-term memory for AI?
3. How do you handle multimodal GenAI pipelines?
4. How do you evaluate business ROI of GenAI?
5. How do you balance cost vs quality in LLM systems?
6. How do you architect GenAI microservices?
7. How do you manage continuous learning pipelines?
8. How do you handle model governance at scale?

---

## 5. Master Level — Leadership & Vision

### 5.1 Executive-Level Questions

1. How would you build a GenAI platform for a large enterprise?
2. How do you create company-wide AI governance?
3. How do you scale GenAI from prototype to global product?
4. How do you align GenAI systems with business strategy?
5. How do you measure the success of an AI transformation?

---

## High-Frequency Interview Focus Areas

| Area                    | Importance  |
| ----------------------- | ----------- |
| RAG & Vector Search     | Very High   |
| Scalability             | Very High   |
| Security & Compliance   | Very High   |
| Evaluation & Monitoring | High        |
| Cost Optimization       | High        |
| Multi-Agent Systems     | Medium–High |
| Multimodal AI           | Medium–High |
