# 🎯 RAG Interview Questions - Accenture

**Retrieval-Augmented Generation - Complete Guide**  
**Total Questions:** 60

---

## 📓 Contents

| Section | Questions |
|---------|----------|
| 🔷 **RAG Fundamentals** | Q1-Q10 |
| 🔷 **Chunking Strategies** | Q11-Q18 |
| 🔷 **Embeddings & Vector Search** | Q19-Q28 |
| 🔷 **Retrieval Optimization** | Q29-Q38 |
| 🔷 **Advanced RAG Techniques** | Q39-Q48 |
| 🔷 **Evaluation & Production** | Q49-Q55 |
| 🔷 **Enterprise & System Design** | Q56-Q60 |

---

## 🔷 RAG Fundamentals (Q1-Q10)

### 1. What is RAG (Retrieval-Augmented Generation)?

RAG combines LLMs with external knowledge retrieval. Instead of relying solely on knowledge learned during training, the system retrieves relevant documents and includes them in the prompt as context for generation. This reduces hallucinations and enables up-to-date, domain-specific responses.

### 2. What are the main components of a RAG pipeline?

1. **Document Ingestion**: Load and preprocess documents
2. **Chunking**: Split documents into manageable pieces
3. **Embedding**: Convert chunks to vector representations
4. **Vector Store**: Store and index embeddings
5. **Retrieval**: Find relevant chunks for a query
6. **Generation**: LLM generates response using retrieved context

### 3. When should you use RAG vs Fine-tuning?

| Use Case | RAG | Fine-tuning |
|----------|-----|-------------|
| Domain knowledge | ✅ Best | Possible |
| Frequently updated data | ✅ | ❌ |
| Need citations | ✅ | ❌ |
| Style/behavior change | ❌ | ✅ |
| Limited compute | ✅ | ❌ |

### 4. How does RAG reduce hallucinations?

RAG grounds LLM responses in retrieved documents rather than relying on parametric memory. The model generates answers based on provided context, which can be verified. This factual grounding significantly reduces fabricated information.

### 5. What is Naive RAG vs Advanced RAG?

**Naive RAG**: Simple retrieve-then-generate pipeline. Typical issues: irrelevant or missed chunks, and no optimization of the query or the retrieved results.

**Advanced RAG**: Adds pre-retrieval (query transformation), post-retrieval (reranking, compression), and iterative refinement. Better accuracy and relevance.

### 6. What are the limitations of RAG?

- Retrieval quality bottleneck
- Context window limitations
- Latency from retrieval step
- Chunking may lose context
- Cannot learn new behaviors (only knowledge)

### 7. What is the typical data flow in RAG?

```
User Query → Query Embedding → Vector Search →
Retrieved Chunks → Prompt Construction → LLM → Response
```
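
The flow above can be sketched end to end with toy stand-ins. Everything here is an assumption for illustration: `embed` is a bag-of-words counter rather than a real embedding model, `retrieve` is a brute-force scan rather than a vector store, and the final LLM call is left out, so the sketch stops at prompt construction.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Brute-force vector search: score every chunk, keep the top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Prompt construction: numbered context followed by the question."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )


chunks = [
    "HNSW is a graph-based index for approximate nearest neighbor search.",
    "BM25 is a sparse keyword ranking function.",
    "Chunk overlap preserves context at chunk boundaries.",
]
query = "What is BM25?"
top = retrieve(query, chunks, top_k=1)
prompt = build_prompt(query, top)  # this string would be sent to the LLM
```

In a real system each function maps to one arrow in the diagram, which is why they are kept separate: any stage can be swapped without touching the others.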

### 8. What frameworks are used for building RAG systems?

| Framework | Strengths |
|-----------|----------|
| **LangChain** | Comprehensive, agents, chains |
| **LlamaIndex** | Data indexing, retrieval focus |
| **Haystack** | Production-ready, modular |
| **Semantic Kernel** | Microsoft ecosystem |

### 9. What is the role of context in RAG?

Context is the retrieved information injected into the prompt. Quality context leads to accurate answers. Irrelevant context confuses the LLM. Context management (selection, ordering, compression) is critical for RAG success.

### 10. How do you handle the "Lost in the Middle" problem?

LLMs pay less attention to information in the middle of long contexts. Solutions:
- Place important content at beginning/end
- Use reranking to prioritize
- Limit context length
- Use recursive summarization
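
The first mitigation can be sketched as a reordering step applied just before prompt construction. This is a minimal sketch, assuming the input list is already sorted from most to least relevant; `reorder_for_long_context` is a hypothetical helper name.

```python
def reorder_for_long_context(ranked_chunks: list[str]) -> list[str]:
    """Place the strongest chunks at the edges of the context.

    Input is sorted best-first. Chunks are dealt alternately to the
    front and the back, so the weakest ones end up in the middle.
    """
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(ranked_chunks):
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the second-best chunk is the very last one.
    return front + back[::-1]


ordered = reorder_for_long_context(["r1", "r2", "r3", "r4", "r5"])
```

Here `r1` and `r2` (the two best chunks) land at the start and end of the context, while `r5` sits in the middle.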

---
## 🔷 Chunking Strategies (Q11-Q18)

### 11. What is chunking and why is it important?

Chunking splits documents into smaller pieces for embedding and retrieval. Important because: embeddings work best on focused content, retrieval returns relevant portions only, and it fits within context limits.

### 12. What are the main chunking strategies?

| Strategy | Description |
|----------|-------------|
| **Fixed-size** | Split by token/character count |
| **Sentence-based** | Split at sentence boundaries |
| **Paragraph-based** | Split at paragraphs |
| **Semantic** | Split by meaning/topics |
| **Recursive** | Hierarchical splitting |

### 13. What is the optimal chunk size?

Depends on use case. General guidelines:
- **Small (128-256 tokens)**: Precise retrieval, may lose context
- **Medium (512-1024 tokens)**: Good balance (most common)
- **Large (over 1024 tokens)**: More context, less precise

Always test on your specific data and queries.

### 14. What is overlapping/sliding window chunking?

Chunks share some content with neighbors (e.g., 10-20% overlap). Benefits:
- Preserves context at boundaries
- Reduces information loss
- Improves retrieval for boundary-spanning queries
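
A sliding window can be sketched in a few lines. This is a minimal sketch, assuming whitespace-separated words stand in for real tokenizer output; `chunk_with_overlap` is a hypothetical helper name.

```python
def chunk_with_overlap(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


tokens = "a b c d e f g h i j".split()
chunks = chunk_with_overlap(tokens, size=4, overlap=1)
```

With `size=4` and `overlap=1`, each chunk starts with the last token of the previous one, so a sentence that straddles a boundary is still partly visible in both chunks.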

### 15. What is semantic chunking?

Splits text based on meaning rather than fixed size. Uses embeddings to detect topic shifts. Keeps semantically related content together. More complex but preserves context better than fixed-size chunking.
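
One simple way to detect a topic shift is to compare each sentence with the one before it and start a new chunk when similarity drops. The sketch below assumes a toy bag-of-words `embed` in place of a real embedding model and an arbitrary threshold of 0.2; both would need tuning on real data.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Group consecutive sentences; split where adjacent similarity < threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(curr)) < threshold:
            chunks.append([curr])  # topic shift: start a new chunk
        else:
            chunks[-1].append(curr)  # same topic: extend the current chunk
    return chunks


sentences = [
    "Vector indexes store embeddings.",
    "Vector indexes support fast search.",
    "Invoices are due in thirty days.",
    "Late invoices are charged a fee.",
]
chunks = semantic_chunks(sentences)
```

The two indexing sentences end up in one chunk and the two invoicing sentences in another, even though no fixed size was specified.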

### 16. What is contextual chunking?

Enhances chunks with additional context:
- Adds document summary to each chunk
- Includes section headers
- Appends adjacent chunk summaries

Helps the retriever match each chunk with its broader document context in view.

### 17. What is agentic/structural chunking?

Uses document structure (headings, sections) to guide chunking. May use LLM to identify logical boundaries. Best for structured documents like reports, manuals, legal documents.

### 18. How do you handle tables and images in chunking?

- **Tables**: Keep together, convert to text/markdown, or use specialized table embeddings
- **Images**: Extract text (OCR), generate captions, use multimodal embeddings
- Store metadata linking to original assets

---
## 🔷 Embeddings & Vector Search (Q19-Q28)

### 19. What are embeddings in RAG?

Embeddings are dense vector representations of text that capture semantic meaning. Similar meanings result in similar vectors. They enable semantic search - finding relevant content based on meaning, not just keywords.

### 20. What embedding models are commonly used?

| Model | Provider | Dimensions |
|-------|----------|------------|
| text-embedding-3-large | OpenAI | 3072 |
| text-embedding-3-small | OpenAI | 1536 |
| BGE-large | BAAI | 1024 |
| E5-large | Microsoft | 1024 |
| Cohere embed-v3 | Cohere | 1024 |

### 21. What is a Vector Database?

Specialized database for storing and querying high-dimensional vectors. Optimized for similarity search. Examples: Pinecone, Weaviate, Chroma, Qdrant, Milvus, pgvector.

### 22. What is Cosine Similarity?

Measures the angle between two vectors, ignoring their magnitude. Range: -1 to 1 (1 = same direction, 0 = orthogonal, -1 = opposite direction). Most common similarity metric for embeddings. Formula: cos(θ) = (A·B) / (||A|| × ||B||)
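
The formula translates directly into code. This is a minimal pure-Python sketch for illustration; production systems would typically rely on a vectorized library or the vector database itself.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (A . B) / (||A|| * ||B||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Same direction, different magnitude: similarity is approximately 1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: similarity is 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite direction: similarity is approximately -1.
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
```

The first example shows why magnitude does not matter: doubling a vector leaves the angle, and therefore the score, unchanged.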

### 23. What is ANN (Approximate Nearest Neighbor) search?

Trades perfect accuracy for speed. Finds "close enough" matches much faster than exact search. Essential for large-scale production systems. Algorithms: HNSW, IVF, LSH.

### 24. What is HNSW (Hierarchical Navigable Small World)?

Popular ANN algorithm using a multi-layer graph structure. Fast queries with high recall. Used by Weaviate, Qdrant, and pgvector, among others. Good balance of speed and accuracy.

### 25. How do you choose a vector database?

Consider: scale (millions of vectors?), latency requirements, managed vs self-hosted, hybrid search support, filtering capabilities, cost, existing infrastructure.

### 26. What is metadata filtering in vector search?

Combines vector similarity with metadata constraints. Example: Find similar documents WHERE date > 2023 AND category = 'finance'. Reduces search space, improves relevance.
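
The WHERE-style example above can be sketched as a pre-filter followed by similarity ranking. The `Record` type, the field names, and the sample rows are assumptions for illustration, and vectors are assumed to be unit length so that a dot product equals cosine similarity.

```python
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    vector: list[float]  # assumed unit length, so dot product == cosine
    year: int
    category: str


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def filtered_search(
    query_vec: list[float],
    records: list[Record],
    *,
    min_year: int,
    category: str,
    top_k: int = 3,
) -> list[Record]:
    # Pre-filter: apply metadata constraints before any similarity scoring.
    candidates = [
        r for r in records if r.year > min_year and r.category == category
    ]
    # Rank only the surviving records by similarity to the query.
    candidates.sort(key=lambda r: dot(query_vec, r.vector), reverse=True)
    return candidates[:top_k]


records = [
    Record("Q3 earnings summary", [1.0, 0.0], 2024, "finance"),
    Record("Old earnings summary", [1.0, 0.0], 2021, "finance"),
    Record("Hiring policy", [0.0, 1.0], 2024, "hr"),
    Record("Budget forecast", [0.6, 0.8], 2024, "finance"),
]
hits = filtered_search([1.0, 0.0], records, min_year=2023, category="finance")
```

The 2021 record is a perfect vector match but is excluded by the date constraint, which is the point of filtering on metadata rather than similarity alone.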

### 27. How do you optimize embedding quality?

- Choose domain-appropriate model
- Fine-tune on your data if needed
- Benchmark on your queries
- Use instruction-tuned models
- Consider query vs document embeddings

### 28. What is the embedding asymmetry problem?

Queries are short and documents are long, so their embeddings can have different characteristics. Solutions:
- Use asymmetric embedding models
- HyDE (generate hypothetical answer)
- Query expansion

---
## 🔷 Retrieval Optimization (Q29-Q38)

### 29. What is Hybrid Search?

Combines dense (semantic) and sparse (keyword) retrieval:
- **Dense**: Vector embeddings, semantic meaning
- **Sparse**: BM25/TF-IDF, exact keywords

Results are commonly merged using RRF (Reciprocal Rank Fusion). Often better than either method alone.
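
Reciprocal Rank Fusion needs only the rank positions from each retriever, not their raw scores. The sketch below assumes each retriever returns a best-first list of document ids; `k = 60` is a commonly used default for the smoothing constant, and the ids are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["d1", "d2", "d3"]   # hypothetical vector-search ranking
sparse = ["d3", "d1", "d4"]  # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents that appear in both lists (`d1`, `d3`) rise above documents found by only one retriever, without any need to normalize BM25 scores against cosine scores.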

### 30. What is Reranking?

Two-stage retrieval:
1. Fast initial retrieval (top-K candidates)
2. Precise reranking with cross-encoder

Cross-encoders process query+document together for better relevance scoring. Models: Cohere Rerank, BGE-reranker.

### 31. What is Query Expansion?

Generate multiple query variations to improve recall:
- Synonyms and related terms
- LLM-generated alternatives
- Multi-query retrieval

Catches documents matching different phrasings.

### 32. What is HyDE (Hypothetical Document Embeddings)?

1. LLM generates hypothetical answer to query
2. Embed the hypothetical answer
3. Search for similar real documents

Bridges query-document embedding gap. Often improves retrieval quality.

### 33. What is Contextual Compression?

Extract only relevant portions from retrieved documents:
- LLM extracts relevant sentences
- Summarizes long chunks
- Removes irrelevant content

Reduces noise, fits more documents in context.

### 34. What is Multi-hop Retrieval?

For complex queries requiring information from multiple sources:
1. Initial retrieval
2. Reason about what's missing
3. Retrieve additional information
4. Repeat until sufficient

Example: "Who is the CEO of the company that made the iPhone?"

### 35. What is Parent-Child Retrieval?

- Embed small chunks (children) for precise retrieval
- Return larger chunks (parents) for context
- Best of both: precision + context

Also called "Small-to-Big" retrieval.
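
A minimal sketch of the idea, assuming word overlap as a stand-in for embedding similarity and a plain dictionary as the parent store; `small_to_big` and the sample policy texts are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def small_to_big(
    query: str,
    children: list[tuple[str, str]],  # (child_text, parent_id)
    parents: dict[str, str],          # parent_id -> full parent text
    top_k: int = 2,
) -> list[str]:
    """Match on small child chunks, return their de-duplicated parents."""
    q = tokens(query)
    ranked = sorted(children, key=lambda c: len(q & tokens(c[0])), reverse=True)
    parent_ids: list[str] = []
    for _, parent_id in ranked[:top_k]:
        if parent_id not in parent_ids:  # several children may share a parent
            parent_ids.append(parent_id)
    return [parents[p] for p in parent_ids]


parents = {
    "p1": "Refund policy. Refunds are issued within 14 days. "
          "Contact support to start a refund.",
    "p2": "Shipping policy. Orders ship within 2 days. "
          "Express shipping is available.",
}
children = [
    ("Refunds are issued within 14 days.", "p1"),
    ("Contact support to start a refund.", "p1"),
    ("Orders ship within 2 days.", "p2"),
    ("Express shipping is available.", "p2"),
]
context = small_to_big("How many days until refunds are issued?", children, parents, top_k=1)
```

The query matches one short sentence, but the LLM receives the whole refund policy, including the instruction to contact support.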

### 36. What is Semantic Caching?

Cache query embeddings and responses. For semantically similar queries, return cached answers. Benefits:
- Reduces LLM API calls
- Lower latency (ms vs seconds)
- Cost savings
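
A semantic cache can be sketched as a list of (query embedding, answer) pairs with a similarity threshold. The vectors below are hypothetical precomputed embeddings, the 0.95 threshold is an arbitrary assumption, and a linear scan stands in for a proper vector index.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self._entries: list[tuple[list[float], str]] = []

    def put(self, query_vec: list[float], answer: str) -> None:
        self._entries.append((query_vec, answer))

    def get(self, query_vec: list[float]) -> Optional[str]:
        """Return the closest cached answer if it clears the threshold."""
        best_score, best_answer = -1.0, None
        for vec, answer in self._entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache()
cache.put([1.0, 0.0], "Refunds are issued within 14 days.")
hit = cache.get([0.99, 0.05])   # near-duplicate query: served from cache
miss = cache.get([0.0, 1.0])    # unrelated query: falls through to the LLM
```

The threshold is the main design choice: too low and users get answers to questions they did not ask, too high and the cache rarely hits.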

### 37. How do you optimize retrieval latency?

- Use efficient ANN indexes (HNSW)
- Limit retrieved documents (top-K)
- Metadata pre-filtering
- Caching
- Async retrieval
- Index sharding for scale

### 38. What is Corrective RAG (CRAG)?

Adds verification step:
1. Retrieve documents
2. LLM evaluates relevance of each
3. Discard irrelevant ones
4. Optionally trigger web search if context insufficient

Self-correcting retrieval improves accuracy.

---
## 🔷 Advanced RAG Techniques (Q39-Q48)

### 39. What is Adaptive RAG?

Dynamically adjusts retrieval based on query:
- Simple queries: No retrieval needed
- Moderate: Single retrieval
- Complex: Multi-hop/iterative

Classifier routes queries to appropriate strategy.

### 40. What is Self-RAG?

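The LLM itself decides when to retrieve and critiques its own output (the original Self-RAG work uses special reflection tokens for this). In simplified form: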
1. Generate initial response
2. Self-critique for gaps
3. Retrieve if needed
4. Refine response

More autonomous, reduces unnecessary retrieval.

### 41. What is GraphRAG?

Combines RAG with knowledge graphs:
- Extract entities and relationships
- Build graph structure
- Retrieve via graph traversal

Better for multi-hop reasoning, entity relationships.

### 42. What is Multimodal RAG?

Extends RAG to multiple modalities:
- Text + Images
- Text + Tables
- Text + Audio/Video

Uses multimodal embeddings, image understanding models.

### 43. What is Agentic RAG?

RAG enhanced with agent capabilities:
- Query planning and decomposition
- Tool use (calculators, APIs)
- Iterative retrieval and reasoning
- Self-reflection and correction

### 44. What is Modular RAG?

Flexible architecture with interchangeable components:
- Swap embedding models
- Change retrieval strategies
- Add/remove processing steps

Enables experimentation and optimization.

### 45. How do you handle long documents?

- Recursive summarization (tree structure)
- Map-reduce: summarize chunks, then combine
- Hierarchical indexing
- Use long-context models (e.g., Gemini with a 1M-token context window)

### 46. What is RAG Fusion?

1. Generate multiple query variations
2. Retrieve for each query
3. Fuse results using RRF
4. Generate from combined context

Improves recall by covering different angles.

### 47. How do you handle real-time/streaming data?

- Incremental indexing
- Time-based partitioning
- TTL (time-to-live) for freshness
- Hybrid with real-time API calls

### 48. What is the role of prompts in RAG?

Critical for RAG success:
- System prompt: Define behavior, citation format
- Context injection: How to present retrieved docs
- Instructions: Use only provided context, cite sources
- Output format: JSON, structured responses

---
## 🔷 Evaluation & Production (Q49-Q55)

### 49. What is RAGAS?

RAG Assessment framework with metrics:
- **Faithfulness**: Is answer grounded in context?
- **Answer Relevancy**: Does answer address query?
- **Context Precision**: Is retrieved context relevant?
- **Context Recall**: Is all needed info retrieved?

### 50. What retrieval metrics should you track?

| Metric | Description |
|--------|-------------|
| Precision@K | % of the top-K retrieved docs that are relevant |
| Recall@K | % of all relevant docs that appear in the top K |
| MRR | Mean Reciprocal Rank |
| nDCG | Normalized Discounted Cumulative Gain |
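
The four metrics can be computed from a ranked list of retrieved ids and a set of relevant ids. This sketch assumes binary relevance (a document is either relevant or not); the ids are made up for the example.

```python
import math


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for d in retrieved[:k] if d in relevant) / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is found."""
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """MRR: average reciprocal rank over (retrieved, relevant) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of this ranking / DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, d in enumerate(retrieved[:k], start=1)
        if d in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
p3 = precision_at_k(retrieved, relevant, k=3)   # 1 relevant doc in the top 3
r3 = recall_at_k(retrieved, relevant, k=3)      # 1 of 3 relevant docs found
mrr = mean_reciprocal_rank([(retrieved, relevant), (["d5"], {"d5"})])
ndcg = ndcg_at_k(retrieved, relevant, k=3)
```

Precision and recall ignore ordering within the top K, while MRR and nDCG reward placing relevant documents earlier, so tracking at least one of each kind gives a fuller picture.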

### 51. What generation metrics should you track?

- **Faithfulness/Groundedness**: Factual accuracy
- **Answer Relevance**: Addresses the question
- **Completeness**: Covers all aspects
- **Coherence**: Well-structured response
- **Hallucination rate**: Fabricated content

### 52. How do you evaluate RAG in production?

- User feedback (thumbs up/down)
- LLM-as-judge for automated eval
- A/B testing for changes
- Human evaluation samples
- Logging and monitoring dashboards

### 53. What are common RAG failure modes?

1. **Retrieval failure**: Wrong documents retrieved
2. **Context overflow**: Too much irrelevant info
3. **Lost in the middle**: Important info ignored
4. **Hallucination**: Ignores context, makes up info
5. **Incomplete**: Missing key information

### 54. How do you debug RAG issues?

1. Log retrieved chunks for each query
2. Check retrieval relevance scores
3. Inspect prompt construction
4. Test components in isolation
5. Create evaluation test sets

### 55. What production monitoring is needed?

- Latency (retrieval, generation, E2E)
- Token usage and costs
- Error rates
- User satisfaction metrics
- Index freshness

---
## 🔷 Enterprise & System Design (Q56-Q60)

### 56. Design a RAG system for enterprise Q&A.

```
1. Data Ingestion: Connectors for various sources
2. Processing: Chunking, cleaning, metadata extraction
3. Embedding: Batch processing, incremental updates
4. Vector Store: Scalable, filtered search
5. Retrieval: Hybrid search + reranking
6. Generation: Prompt engineering, guardrails
7. Frontend: Chat interface, citation display
8. Monitoring: Metrics, logging, feedback
```

### 57. How do you handle security in enterprise RAG?

- **Access control**: User-based document permissions
- **Data privacy**: PII detection and masking
- **Audit logging**: Track all queries and responses
- **Prompt injection**: Input validation
- **Data residency**: On-premise or private cloud

### 58. How do you scale RAG for millions of documents?

- Distributed vector database (sharding)
- Efficient indexing (HNSW, IVF)
- Metadata pre-filtering
- Caching layer
- Async processing
- Load balancing

### 59. How do you keep the knowledge base fresh?

- Scheduled re-indexing
- Incremental updates via change data capture (CDC)
- Document versioning
- TTL-based expiration
- Monitoring for staleness
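
One simple way to implement incremental updates is to compare content hashes and re-embed only what changed. This is a minimal sketch, assuming the index keeps a hash per document id; `plan_index_update` is a hypothetical helper, and the actual embedding and vector-store calls are left out.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_index_update(
    current_docs: dict[str, str],    # doc_id -> latest text from the source
    indexed_hashes: dict[str, str],  # doc_id -> hash stored at index time
) -> tuple[list[str], list[str]]:
    """Return (ids to re-embed and upsert, ids to delete from the index)."""
    to_upsert = [
        doc_id
        for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)  # new or changed
    ]
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in current_docs]
    return sorted(to_upsert), sorted(to_delete)


indexed = {
    "a": content_hash("v1"),
    "b": content_hash("old text"),
    "c": content_hash("removed later"),
}
current = {"a": "v1", "b": "new text", "d": "brand new doc"}
to_upsert, to_delete = plan_index_update(current, indexed)
```

Unchanged documents (`a`) are skipped entirely, which keeps embedding costs proportional to the amount of change rather than the size of the corpus.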

### 60. What is Accenture's approach to enterprise RAG?

Accenture emphasizes:
- **Digital Core**: Strong data foundation
- **Responsible AI**: Governance, compliance
- **Human+AI**: Augmentation, not replacement
- **Value-led**: ROI-focused use cases
- **Scalability**: Enterprise-grade architecture

---

### 🍀 Good luck with your Accenture interview! 💪