## 1. What is RAG?

**Retrieval-Augmented Generation (RAG)** combines:
1. **Retrieval**: Finding relevant information from a knowledge base
2. **Augmentation**: Adding that information to the prompt
3. **Generation**: Using an LLM to generate responses with retrieved context

### Why RAG?

**Without RAG:**
```
User: "What is the STDF PTR record format?"
LLM: "I don't have specific information about STDF formats..."
```

**With RAG:**
```
User: "What is the STDF PTR record format?"
System retrieves: [STDF spec documentation about PTR records]
LLM: "The STDF PTR (Parametric Test Record) contains..."
[Accurate, sourced response]
```

### RAG vs Fine-Tuning vs Prompt Engineering

| Approach | Knowledge Update | Cost | Use Case |
|----------|-----------------|------|----------|
| **RAG** | Easy (add documents) | Low | Dynamic knowledge, documents |
| **Fine-Tuning** | Retrain model | High | Behavioral changes, style |
| **Prompt Engineering** | Change prompt | Very Low | Simple tasks, known context |

### When to Use RAG?

✅ **Use RAG when:**
- Need to query large document collections
- Knowledge changes frequently
- Want to cite sources
- Working with proprietary/internal data
- Cost-effective solution needed

❌ **Don't use RAG when:**
- Simple Q&A with fixed knowledge
- Real-time performance critical (<100ms)
- No document/knowledge base available
- Task doesn't need external knowledge

---

## 2. RAG Architecture

```
┌─────────────┐
│   User      │
│   Query     │
└──────┬──────┘
       │
       ▼
┌─────────────────────┐
│  Query Embedding    │  ← Encode query to vector
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│  Vector Database    │  ← Find similar documents
│  (Similarity Search)│
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│ Retrieved Documents │  ← Top-k most relevant chunks
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│ Prompt Construction │  ← Context + Query
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│   LLM Generation    │  ← Generate answer
└──────┬──────────────┘
       │
       ▼
┌─────────────────────┐
│  Final Response     │
└─────────────────────┘
```
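
The stages in the diagram can be traced with a dependency-free toy, sketched below. It assumes a bag-of-words count vector as a stand-in for a neural embedding and stops at prompt construction, since no LLM is wired in. The names `embed`, `cosine`, `retrieve_top_k`, and `build_prompt` are illustrative and do not come from any library.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Similarity search: score every chunk against the query, keep the best k."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query: str, hits: List[Tuple[float, str]]) -> str:
    """Prompt construction: retrieved context first, then the question."""
    context = "\n".join(chunk for _, chunk in hits)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


toy_chunks = [
    "The PTR record stores parametric test results.",
    "The FTR record stores functional test results.",
    "STDF files are binary.",
]
question = "What does the PTR record store?"
top_hits = retrieve_top_k(question, toy_chunks, k=1)
rag_prompt = build_prompt(question, top_hits)
print(rag_prompt)  # This string is what would be sent to the LLM
```

Swapping `embed` for a real encoder and the linear scan for a vector database changes the quality and speed of retrieval, but not the shape of the pipeline.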

---

## 3. Setup and Installation

In [None]:
# Install required packages (run once)
# !pip install langchain chromadb sentence-transformers scikit-learn openai tiktoken pypdf python-docx

import os
import numpy as np
import pandas as pd
from typing import List, Dict, Any
import warnings
warnings.filterwarnings('ignore')

# For embeddings
from sentence_transformers import SentenceTransformer

# For vector storage
import chromadb
from chromadb.config import Settings

# For text processing
import re
from collections import Counter

print("✅ Libraries imported successfully")
print("\n📦 Key Components:")
print("  - sentence-transformers: Generate embeddings")
print("  - chromadb: Vector database")
print("  - LangChain: RAG orchestration (optional)")

---

## 4. Document Preparation & Chunking

Chunking is critical for RAG performance. Good chunks:
- Contain complete thoughts
- Are neither too large (they crowd the LLM's context window and dilute relevance) nor too small (they lose surrounding context)
- Have overlap to maintain continuity
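
To make the role of overlap concrete, here is a minimal sketch that slides a window over words rather than characters, so no word is cut in half. The function name `word_window_chunks` and its defaults are assumptions for illustration only.

```python
from typing import List


def word_window_chunks(text: str, window: int = 50, overlap: int = 10) -> List[str]:
    """Split text into windows of `window` words; neighbours share `overlap` words."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    words = text.split()
    step = window - overlap  # How far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # This window already reached the end of the text
    return chunks


demo_chunks = word_window_chunks("a b c d e f g h i j", window=4, overlap=1)
print(demo_chunks)
```

Each chunk repeats the last word of its predecessor, which is the continuity that overlap buys at the cost of some duplicated storage.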

### Chunking Strategies

In [None]:
class DocumentChunker:
    """Various strategies for chunking documents"""
    
    @staticmethod
    def fixed_size_chunking(text: str, chunk_size: int = 512, overlap: int = 50) -> List[str]:
        """
        Split text into fixed-size chunks with overlap
        
        Args:
            text: Input text
            chunk_size: Characters per chunk
            overlap: Overlapping characters between chunks
        """
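        # Guard against an infinite loop: the window must always move forward
        if chunk_size <= 0 or not 0 <= overlap < chunk_size:
            raise ValueError("Require chunk_size > 0 and 0 <= overlap < chunk_size")
        
        chunks = []
        start = 0
        text_length = len(text)
        step = chunk_size - overlap  # Advance by the non-overlapping part
        
        while start < text_length:
            chunks.append(text[start:start + chunk_size])
            start += step
        
        return chunks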
    
    @staticmethod
    def sentence_chunking(text: str, max_sentences: int = 5) -> List[str]:
        """
        Split text by sentences, grouping max_sentences together
        """
        # Simple sentence splitting (can use spacy/nltk for better results)
        sentences = re.split(r'[.!?]+', text)
        sentences = [s.strip() for s in sentences if s.strip()]
        
        chunks = []
        for i in range(0, len(sentences), max_sentences):
            chunk = '. '.join(sentences[i:i+max_sentences]) + '.'
            chunks.append(chunk)
        
        return chunks
    
    @staticmethod
    def paragraph_chunking(text: str) -> List[str]:
        """
        Split text by paragraphs (double newlines)
        """
        paragraphs = text.split('\n\n')
        chunks = [p.strip() for p in paragraphs if p.strip()]
        return chunks
    
    @staticmethod
    def semantic_chunking(text: str, embedding_model, similarity_threshold: float = 0.7) -> List[str]:
        """
        Split based on semantic similarity (advanced)
        Group sentences with similar meanings
        """
        sentences = re.split(r'[.!?]+', text)
        sentences = [s.strip() for s in sentences if s.strip()]
        
        if len(sentences) <= 1:
            return sentences
        
        # Generate embeddings for each sentence
        embeddings = embedding_model.encode(sentences)
        
        chunks = []
        current_chunk = [sentences[0]]
        
        for i in range(1, len(sentences)):
            # Calculate similarity with previous sentence
            similarity = np.dot(embeddings[i-1], embeddings[i]) / (
                np.linalg.norm(embeddings[i-1]) * np.linalg.norm(embeddings[i])
            )
            
            if similarity >= similarity_threshold:
                current_chunk.append(sentences[i])
            else:
                chunks.append('. '.join(current_chunk) + '.')
                current_chunk = [sentences[i]]
        
        # Add last chunk
        if current_chunk:
            chunks.append('. '.join(current_chunk) + '.')
        
        return chunks

# Example semiconductor documentation
sample_stdf_doc = """
STDF (Standard Test Data Format) is the industry standard for semiconductor test data.
It was developed by Teradyne in the 1980s and has become widely adopted.

The PTR (Parametric Test Record) is one of the most important record types in STDF.
It contains the results of parametric tests performed on devices.
Each PTR includes test number, test result, and pass/fail status.

The FTR (Functional Test Record) stores functional test results.
Unlike parametric tests, functional tests verify digital logic operations.
FTR records include test number and binary pass/fail results.

STDF files are binary files that improve storage efficiency.
They use a specific byte ordering and data type encoding.
Reading STDF files requires specialized parsers.

Common uses of STDF data include yield analysis and failure analysis.
Engineers use STDF data to identify manufacturing defects.
Statistical analysis of STDF data helps improve production processes.
"""

# Test different chunking strategies
chunker = DocumentChunker()

print("="*70)
print("CHUNKING STRATEGY COMPARISON")
print("="*70)

print("\n1. Fixed Size Chunking (chunk_size=200, overlap=50):")
fixed_chunks = chunker.fixed_size_chunking(sample_stdf_doc, chunk_size=200, overlap=50)
for i, chunk in enumerate(fixed_chunks, 1):
    print(f"\n  Chunk {i} ({len(chunk)} chars):")
    print(f"  {chunk[:100]}...")

print("\n2. Sentence Chunking (max_sentences=3):")
sentence_chunks = chunker.sentence_chunking(sample_stdf_doc, max_sentences=3)
for i, chunk in enumerate(sentence_chunks, 1):
    print(f"\n  Chunk {i}:")
    print(f"  {chunk}")

print("\n3. Paragraph Chunking:")
paragraph_chunks = chunker.paragraph_chunking(sample_stdf_doc)
for i, chunk in enumerate(paragraph_chunks, 1):
    print(f"\n  Chunk {i}:")
    print(f"  {chunk}")

print("\n" + "="*70)
print(f"Fixed Size: {len(fixed_chunks)} chunks")
print(f"Sentence-based: {len(sentence_chunks)} chunks")
print(f"Paragraph-based: {len(paragraph_chunks)} chunks")
print("="*70)

### Choosing the Right Chunking Strategy

| Strategy | Pros | Cons | Best For |
|----------|------|------|----------|
| **Fixed Size** | Simple, consistent size | May split mid-sentence | General purpose |
| **Sentence** | Complete thoughts | Variable size | Q&A systems |
| **Paragraph** | Natural boundaries | Can be too large | Long-form content |
| **Semantic** | Meaningful groups | Computationally expensive | High-quality retrieval |

**Recommendation for STDF docs**: Paragraph or Sentence chunking (technical documentation has clear structure)
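
One way to combine the two recommended strategies is sketched below: keep paragraphs whole, and fall back to sentence packing only when a paragraph exceeds a size budget. The function name and the `max_chars` budget are assumptions, and the regex sentence splitter is deliberately naive.

```python
import re
from typing import List


def paragraph_then_sentence_chunks(text: str, max_chars: int = 300) -> List[str]:
    """Paragraph chunking with a sentence-level fallback for oversized paragraphs."""
    chunks: List[str] = []
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            chunks.append(paragraph)
            continue
        # Paragraph too long: greedily pack whole sentences up to max_chars
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


sample = "Short one.\n\nAlpha beta. Gamma delta. Epsilon zeta."
mixed_chunks = paragraph_then_sentence_chunks(sample, max_chars=25)
print(mixed_chunks)
```

A single sentence longer than `max_chars` is still emitted whole in this sketch; a stricter version would need a further fallback such as fixed-size splitting.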

---

## 5. Embeddings - Converting Text to Vectors

Embeddings convert text into numerical vectors that capture semantic meaning.
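
Similarity between two embeddings is usually measured with cosine similarity, the cosine of the angle between the vectors. The sketch below works through it on tiny hand-made vectors in plain Python so the arithmetic is visible; real embeddings have hundreds of dimensions and would normally be handled with NumPy.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # Parallel vectors: close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # Unrelated directions: 0
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])        # Opposite directions: -1

print(same_direction, orthogonal, opposite)
```

Because only direction matters, a long chunk and a short query can still score highly if they point the same way in embedding space.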

In [None]:
# Load embedding model
print("Loading embedding model...")
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print(f"✅ Model loaded: {embedding_model}")
print(f"   Embedding dimension: {embedding_model.get_sentence_embedding_dimension()}")

# Generate embeddings for sample chunks
chunks_to_embed = paragraph_chunks[:3]  # Use first 3 paragraphs

embeddings = embedding_model.encode(chunks_to_embed)

print("\n" + "="*70)
print("EMBEDDING GENERATION")
print("="*70)

for i, (chunk, embedding) in enumerate(zip(chunks_to_embed, embeddings), 1):
    print(f"\nChunk {i}:")
    print(f"  Text (first 80 chars): {chunk[:80]}...")
    print(f"  Embedding shape: {embedding.shape}")
    print(f"  First 10 dimensions: {embedding[:10]}")
    print(f"  Embedding norm: {np.linalg.norm(embedding):.4f}")

# Demonstrate semantic similarity
print("\n" + "="*70)
print("SEMANTIC SIMILARITY DEMO")
print("="*70)

test_queries = [
    "What is STDF?",
    "Tell me about PTR records",
    "How to read STDF files",
    "What is the weather like?"  # Irrelevant query
]

query_embeddings = embedding_model.encode(test_queries)

# Calculate similarity with each chunk
for query, query_emb in zip(test_queries, query_embeddings):
    print(f"\nQuery: '{query}'")
    print("  Similarities with chunks:")
    
    for i, chunk_emb in enumerate(embeddings, 1):
        # Cosine similarity
        similarity = np.dot(query_emb, chunk_emb) / (
            np.linalg.norm(query_emb) * np.linalg.norm(chunk_emb)
        )
        print(f"    Chunk {i}: {similarity:.4f}")

### Popular Embedding Models

| Model | Dim | Speed | Quality | Use Case |
|-------|-----|-------|---------|----------|
| **all-MiniLM-L6-v2** | 384 | Fast | Good | General purpose |
| **all-mpnet-base-v2** | 768 | Medium | Better | Higher quality |
| **text-embedding-ada-002** (OpenAI) | 1536 | API | Best | Production |
| **instructor-large** | 768 | Medium | Domain-specific | Specialized |

---

## 6. Vector Database - ChromaDB

Store and retrieve embeddings efficiently.

In [None]:
# Initialize ChromaDB
chroma_client = chromadb.Client(Settings(
    anonymized_telemetry=False,
    is_persistent=False  # In-memory for demo
))

# Create a collection
collection = chroma_client.create_collection(
    name="stdf_documentation",
    metadata={"description": "STDF specification and test data documentation"}
)

print("✅ ChromaDB collection created")

# Prepare documents for insertion
documents = paragraph_chunks
doc_embeddings = embedding_model.encode(documents)

# Add documents to collection
collection.add(
    embeddings=doc_embeddings.tolist(),
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
    metadatas=[{"source": "STDF_spec", "chunk_id": i} for i in range(len(documents))]
)

print(f"✅ Added {len(documents)} documents to vector database")
print(f"   Collection size: {collection.count()} documents")

# Query the collection
def query_rag_system(query: str, n_results: int = 3):
    """Query the RAG system and return relevant chunks"""
    
    # Generate query embedding
    query_embedding = embedding_model.encode([query])[0]
    
    # Search vector database
    results = collection.query(
        query_embeddings=[query_embedding.tolist()],
        n_results=n_results
    )
    
    return results

# Test queries
print("\n" + "="*70)
print("RAG RETRIEVAL DEMO")
print("="*70)

test_queries = [
    "What is PTR record in STDF?",
    "How is STDF data analyzed?",
    "What are functional test records?",
]

for query in test_queries:
    print(f"\n{'='*70}")
    print(f"Query: {query}")
    print('='*70)
    
    results = query_rag_system(query, n_results=2)
    
    for i, (doc, distance) in enumerate(zip(results['documents'][0], results['distances'][0]), 1):
        print(f"\n  Result {i} (distance: {distance:.4f}):")
        print(f"  {doc}")
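
The values printed above are distances, so lower means more similar. As a rough guide to reading them, the sketch below checks the identity `squared_l2 = 2 - 2 * cosine`, which holds for unit-length vectors. It assumes the collection uses squared L2 distance (Chroma's documented default at the time of writing, worth verifying for your version) and that the embedding model returns normalized vectors. All helper names are illustrative.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def distance_to_cosine(squared_distance: float) -> float:
    """Valid only for unit vectors: |a - b|^2 = 2 - 2 * cos(a, b)."""
    return 1.0 - squared_distance / 2.0


vec_a = normalize([3.0, 4.0])
vec_b = normalize([4.0, 3.0])
d2 = squared_l2(vec_a, vec_b)
cos_ab = dot(vec_a, vec_b)
print(f"squared L2 = {d2:.4f}, cosine = {cos_ab:.4f}, recovered = {distance_to_cosine(d2):.4f}")
```

Under those assumptions a distance near 0 corresponds to cosine similarity near 1, and a distance near 2 corresponds to unrelated text.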

---

## 7. Complete RAG Pipeline

Now let's build a complete RAG system that generates answers.

In [None]:
class SimpleRAGSystem:
    """
    A simple RAG system without external LLM API
    (For demonstration - in production, use GPT-4, Claude, etc.)
    """
    
    def __init__(self, embedding_model, vector_db_collection):
        self.embedding_model = embedding_model
        self.collection = vector_db_collection
    
    def retrieve(self, query: str, top_k: int = 3) -> List[Dict[str, Any]]:
        """Retrieve relevant documents"""
        
        # Generate query embedding
        query_embedding = self.embedding_model.encode([query])[0]
        
        # Search vector database
        results = self.collection.query(
            query_embeddings=[query_embedding.tolist()],
            n_results=top_k
        )
        
        # Format results
        retrieved_docs = []
        for i in range(len(results['documents'][0])):
            retrieved_docs.append({
                'content': results['documents'][0][i],
                'distance': results['distances'][0][i],
                'metadata': results['metadatas'][0][i]
            })
        
        return retrieved_docs
    
    def generate_response(self, query: str, retrieved_docs: List[Dict]) -> str:
        """
        Generate response based on retrieved documents
        (Simplified - in production, use actual LLM)
        """
        
        # Build context from retrieved documents
        context = "\n\n".join([f"Context {i+1}:\n{doc['content']}" 
                               for i, doc in enumerate(retrieved_docs)])
        
        # In production, you would do:
        # prompt = f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"
        # response = llm.generate(prompt)
        
        # For demo, we'll return a formatted response
        response = f"""
Based on the retrieved documentation:

{context}

[In production, an LLM would generate a natural language answer here based on the context above]

Query: {query}
Number of sources: {len(retrieved_docs)}
Average distance (lower is more relevant): {np.mean([doc['distance'] for doc in retrieved_docs]):.4f}
        """
        
        return response.strip()
    
    def query(self, question: str, top_k: int = 3, verbose: bool = True) -> Dict[str, Any]:
        """
        Complete RAG query pipeline
        
        Returns:
            Dictionary with query, retrieved docs, and generated response
        """
        
        if verbose:
            print(f"🔍 Processing query: {question}")
            print(f"📚 Retrieving top {top_k} documents...")
        
        # Step 1: Retrieve relevant documents
        retrieved_docs = self.retrieve(question, top_k=top_k)
        
        if verbose:
            print(f"✅ Retrieved {len(retrieved_docs)} documents")
            for i, doc in enumerate(retrieved_docs, 1):
                print(f"   {i}. Distance: {doc['distance']:.4f}")
        
        # Step 2: Generate response
        if verbose:
            print("💭 Generating response...")
        
        response = self.generate_response(question, retrieved_docs)
        
        if verbose:
            print("✅ Response generated")
        
        return {
            'query': question,
            'retrieved_docs': retrieved_docs,
            'response': response,
            'num_sources': len(retrieved_docs)
        }

# Initialize RAG system
rag_system = SimpleRAGSystem(embedding_model, collection)

print("="*70)
print("COMPLETE RAG SYSTEM DEMO")
print("="*70)

# Test the RAG system
questions = [
    "What is STDF and why is it used?",
    "Explain PTR and FTR records",
    "How do engineers use STDF data?",
]

for question in questions:
    print(f"\n{'='*70}")
    result = rag_system.query(question, top_k=2)
    print(f"\n{result['response']}")
    print('='*70)
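
The generation step can be made pluggable by passing the model in as a plain callable, as in the sketch below. It assumes retrieved documents are dictionaries with a `content` key, as in the class above. `generate_with_llm` and `stub_llm` are hypothetical names, and the stub only reports the prompt size rather than calling a real model.

```python
from typing import Callable, Dict, List


def generate_with_llm(query: str, retrieved_docs: List[Dict], llm: Callable[[str], str]) -> str:
    """Build a grounded prompt from retrieved docs and pass it to `llm`."""
    context = "\n\n".join(
        f"[Source {i}] {doc['content']}" for i, doc in enumerate(retrieved_docs, 1)
    )
    prompt = (
        "Answer the question using only the sources below. "
        "If the sources do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm(prompt)


def stub_llm(prompt: str) -> str:
    """Stand-in model (hypothetical): reports the prompt size instead of answering."""
    return f"(stub) received a prompt of {len(prompt)} characters"


stub_answer = generate_with_llm(
    "What is PTR?",
    [{"content": "PTR is the Parametric Test Record."}],
    stub_llm,
)
print(stub_answer)
```

Any function that maps a prompt string to a completion string, such as a thin wrapper around a hosted API, can replace `stub_llm` without touching the retrieval code.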

---

## 8. Advanced RAG Techniques

### 8.1 Hybrid Search (Keyword + Semantic)

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

class HybridRAGSystem(SimpleRAGSystem):
    """RAG with hybrid search (semantic + keyword)"""
    
    def __init__(self, embedding_model, vector_db_collection, documents):
        super().__init__(embedding_model, vector_db_collection)
        
        # Build TF-IDF index for keyword search
        self.tfidf_vectorizer = TfidfVectorizer(max_features=100)
        self.tfidf_matrix = self.tfidf_vectorizer.fit_transform(documents)
        self.documents = documents
    
    def keyword_search(self, query: str, top_k: int = 3) -> List[int]:
        """Perform keyword-based search using TF-IDF"""
        query_vec = self.tfidf_vectorizer.transform([query])
        
        # Calculate cosine similarity
        similarities = (self.tfidf_matrix @ query_vec.T).toarray().flatten()
        
        # Get top-k indices
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        
        return top_indices.tolist()
    
    def hybrid_retrieve(self, query: str, top_k: int = 3, alpha: float = 0.5) -> List[Dict]:
        """
        Hybrid retrieval combining semantic and keyword search
        
        Args:
            alpha: Weight for semantic search (1-alpha for keyword)
        """
        
        # Semantic search
        semantic_results = self.retrieve(query, top_k=top_k*2)
        
        # Keyword search
        keyword_indices = self.keyword_search(query, top_k=top_k*2)
        
        # Combine scores (simplified - in production, use reciprocal rank fusion)
        combined_scores = {}
        
        for doc in semantic_results:
            doc_id = doc['metadata']['chunk_id']
            # Lower distance = higher relevance. Distances can exceed 1, so map
            # them into (0, 1] instead of using 1 - distance (which can go negative)
            combined_scores[doc_id] = alpha * (1.0 / (1.0 + doc['distance']))
        
        for rank, idx in enumerate(keyword_indices):
            score = (1 - alpha) * (1 - rank / len(keyword_indices))
            if idx in combined_scores:
                combined_scores[idx] += score
            else:
                combined_scores[idx] = score
        
        # Sort by combined score
        sorted_indices = sorted(combined_scores.items(), 
                               key=lambda x: x[1], reverse=True)[:top_k]
        
        # Retrieve full documents
        results = []
        for doc_id, score in sorted_indices:
            results.append({
                'content': self.documents[doc_id],
                'score': score,
                'metadata': {'chunk_id': doc_id, 'source': 'hybrid'}
            })
        
        return results

# Test hybrid search
hybrid_rag = HybridRAGSystem(embedding_model, collection, documents)

print("="*70)
print("HYBRID SEARCH COMPARISON")
print("="*70)

query = "parametric test PTR"

print(f"\nQuery: '{query}'")

print("\n1. Semantic Search Only:")
semantic_results = rag_system.retrieve(query, top_k=3)
for i, doc in enumerate(semantic_results, 1):
    print(f"   {i}. {doc['content'][:100]}... (distance: {doc['distance']:.4f})")

print("\n2. Hybrid Search (Semantic + Keyword):")
hybrid_results = hybrid_rag.hybrid_retrieve(query, top_k=3, alpha=0.7)
for i, doc in enumerate(hybrid_results, 1):
    print(f"   {i}. {doc['content'][:100]}... (score: {doc['score']:.4f})")
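
The code comment in `hybrid_retrieve` mentions reciprocal rank fusion (RRF) as the production alternative to weighted score mixing. A minimal sketch follows: each ranked list contributes `1 / (k + rank)` per document, so raw scores on different scales never need to be reconciled. The constant `k = 60` is the conventional default, and the example rankings are made up.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse several best-first rankings into one list using RRF scores."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


semantic_ranking = [2, 0, 3]   # Hypothetical chunk ids from vector search, best first
keyword_ranking = [2, 3, 1]    # Hypothetical chunk ids from TF-IDF search, best first
fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
print(fused)
```

Documents that appear in both lists rise to the top, while a document found by only one retriever is still kept rather than discarded.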

### 8.2 Query Expansion

In [None]:
def expand_query(original_query: str) -> List[str]:
    """
    Expand query with synonyms and related terms
    (In production, use LLM for sophisticated expansion)
    """
    
    # Simple synonym/related terms dictionary
    expansions = {
        'PTR': ['Parametric Test Record', 'parametric test'],
        'FTR': ['Functional Test Record', 'functional test'],
        'STDF': ['Standard Test Data Format', 'test data format'],
        'test': ['testing', 'measurement', 'evaluation'],
        'record': ['data', 'entry', 'information'],
    }
    
    # Expand query
    expanded_queries = [original_query]
    
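    for term, synonyms in expansions.items():
        # Match whole words only, ignoring case (so 'test' does not match 'latest')
        pattern = re.compile(rf'\b{re.escape(term)}\b', flags=re.IGNORECASE)
        if pattern.search(original_query):
            for synonym in synonyms:
                expanded = pattern.sub(synonym, original_query)
                if expanded != original_query:
                    expanded_queries.append(expanded)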
    
    return expanded_queries

# Test query expansion
original = "What is PTR record format?"
expanded = expand_query(original)

print("="*70)
print("QUERY EXPANSION")
print("="*70)
print(f"Original: {original}")
print(f"\nExpanded queries:")
for i, eq in enumerate(expanded, 1):
    print(f"  {i}. {eq}")

# Retrieve with expanded queries
all_results = []
for eq in expanded:
    results = rag_system.retrieve(eq, top_k=2)
    all_results.extend(results)

# Deduplicate by content
seen_content = set()
unique_results = []
for doc in all_results:
    if doc['content'] not in seen_content:
        seen_content.add(doc['content'])
        unique_results.append(doc)

print(f"\n📚 Retrieved {len(unique_results)} unique documents using query expansion")
for i, doc in enumerate(unique_results[:3], 1):
    print(f"\n  {i}. {doc['content'][:150]}...")
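
Deduplicating by first occurrence keeps whichever distance happened to be seen first. A small variation, sketched below, keeps the best (lowest) distance per document across all expanded queries and returns the merged list sorted. It assumes each result is a dictionary with `content` and `distance` keys, as returned by `retrieve`. The function name is illustrative.

```python
from typing import Dict, List


def merge_by_best_distance(result_lists: List[List[Dict]]) -> List[Dict]:
    """Merge results from several queries, keeping the lowest distance per document."""
    best: Dict[str, Dict] = {}
    for results in result_lists:
        for doc in results:
            seen = best.get(doc["content"])
            if seen is None or doc["distance"] < seen["distance"]:
                best[doc["content"]] = doc
    # Most relevant (smallest distance) first
    return sorted(best.values(), key=lambda d: d["distance"])


merged = merge_by_best_distance([
    [{"content": "PTR doc", "distance": 0.40}, {"content": "FTR doc", "distance": 0.70}],
    [{"content": "PTR doc", "distance": 0.25}],
])
for doc in merged:
    print(f"{doc['distance']:.2f}  {doc['content']}")
```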

---

## 9. Real-World Project: STDF Documentation RAG

Let's build a comprehensive RAG system for semiconductor test documentation.

In [None]:
# Extended STDF documentation
comprehensive_stdf_docs = [
    """
    STDF Overview: The Standard Test Data Format (STDF) is a binary format 
    for semiconductor test data. It was developed by Teradyne in 1988 and 
    is now the industry standard. STDF files store test results from 
    Automatic Test Equipment (ATE) used in semiconductor manufacturing.
    """,
    """
    File Structure: STDF files consist of a series of records. Each record 
    has a header containing record type and length, followed by the data fields. 
    Records include FAR (File Attributes Record), MIR (Master Information Record), 
    and various test result records.
    """,
    """
    PTR - Parametric Test Record: Contains results from parametric tests that 
    measure electrical characteristics. Fields include TEST_NUM (test number), 
    RESULT (measured value), LO_LIMIT and HI_LIMIT (specification limits), 
    UNITS (measurement units), and TEST_FLG (pass/fail status). Used for voltage, 
    current, frequency measurements.
    """,
    """
    FTR - Functional Test Record: Stores results from functional tests that verify 
    digital logic operations. Contains TEST_NUM, CYCL_CNT (number of cycles executed), 
    REL_VADR (failing vector address if applicable), and pass/fail status. Used for 
    testing logical functionality of devices.
    """,
    """
    MPR - Multiple Result Parametric Record: Similar to PTR but can store multiple 
    test results in a single record. Useful for tests that produce arrays of values, 
    such as pin-to-pin measurements or sampled waveforms. Contains arrays of results 
    rather than single values.
    """,
    """
    Data Analysis: STDF data is analyzed for yield improvement and failure analysis. 
    Common analyses include: (1) Yield calculation and trending, (2) Pareto analysis 
    of failure modes, (3) Correlation between test parameters, (4) Wafer mapping, 
    (5) Outlier detection, and (6) Statistical process control.
    """,
    """
    Parsing STDF: Reading STDF requires handling binary data whose byte ordering is 
    declared by the CPU_TYPE field of the FAR record. Each field has a defined data type 
    (U1, U2, U4, I1, I2, I4, R4, R8, Cn, Bn, etc.). Python libraries like pystdf can parse 
    STDF files. Must handle variable-length records and arrays.
    """,
    """
    Test Flow: A typical test flow in STDF includes: (1) File setup records (FAR, MIR), 
    (2) Wafer and device information (WIR, WRR, PIR, PRR), (3) Test definitions (TSR), 
    (4) Test results (PTR, FTR, MPR), (5) Bin summary (SBR, HBR), and (6) File close (MRR).
    """,
    """
    Best Practices: When working with STDF data: (1) Always validate record structure, 
    (2) Handle missing or corrupt data gracefully, (3) Use appropriate tools for analysis, 
    (4) Consider data volume (files can be gigabytes), (5) Implement caching for large datasets, 
    (6) Document test specifications clearly.
    """,
    """
    STDF V4: The current standard is STDF V4, which supports modern test equipment features. 
    It includes records for wafer test, final test, and package test. Supports multiple sites 
    (parallel testing), complex test structures, and extended device information. Backward 
    compatible with STDF V3.
    """
]

# Create production-ready RAG system
class ProductionRAGSystem:
    """
    Production-ready RAG system for STDF documentation
    """
    
    def __init__(self, embedding_model_name='all-MiniLM-L6-v2'):
        self.embedding_model = SentenceTransformer(embedding_model_name)
        self.chroma_client = chromadb.Client(Settings(
            anonymized_telemetry=False,
            is_persistent=False
        ))
        self.collection = None
        self.documents = []
        
    def index_documents(self, documents: List[str], collection_name: str = "stdf_kb"):
        """Index documents into vector database"""
        
        # Clean and chunk documents
        processed_docs = []
        for doc in documents:
            # Clean whitespace
            doc = ' '.join(doc.split())
            processed_docs.append(doc)
        
        self.documents = processed_docs
        
        # Generate embeddings
        print(f"🔄 Generating embeddings for {len(processed_docs)} documents...")
        embeddings = self.embedding_model.encode(processed_docs, show_progress_bar=True)
        
        # Create collection, replacing any collection this instance created earlier
        if self.collection is not None:
            self.chroma_client.delete_collection(self.collection.name)
        
        self.collection = self.chroma_client.create_collection(
            name=collection_name,
            metadata={"description": "STDF knowledge base"}
        )
        
        # Add to vector database
        self.collection.add(
            embeddings=embeddings.tolist(),
            documents=processed_docs,
            ids=[f"stdf_doc_{i}" for i in range(len(processed_docs))],
            metadatas=[{"doc_id": i, "category": "stdf"} for i in range(len(processed_docs))]
        )
        
        print(f"✅ Indexed {len(processed_docs)} documents")
    
    def search(self, query: str, top_k: int = 3) -> Dict[str, Any]:
        """Search knowledge base"""
        
        # Generate query embedding
        query_embedding = self.embedding_model.encode([query])[0]
        
        # Search
        results = self.collection.query(
            query_embeddings=[query_embedding.tolist()],
            n_results=top_k
        )
        
        # Format results
        search_results = {
            'query': query,
            'results': []
        }
        
        for i in range(len(results['documents'][0])):
            search_results['results'].append({
                'content': results['documents'][0][i],
                'distance': results['distances'][0][i],
                'metadata': results['metadatas'][0][i]
            })
        
        return search_results
    
    def answer_question(self, question: str, top_k: int = 3) -> str:
        """
        Answer question using RAG
        (In production, integrate with OpenAI/Anthropic API)
        """
        
        # Retrieve relevant docs
        search_results = self.search(question, top_k=top_k)
        
        # Build context
        context_parts = []
        for i, result in enumerate(search_results['results'], 1):
            context_parts.append(f"[Source {i}]\n{result['content']}")
        
        context = "\n\n".join(context_parts)
        
        # In production:
        # prompt = f"""Based on the following context, answer the question.
        #
        # Context:
        # {context}
        #
        # Question: {question}
        #
        # Answer:"""
        #
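        # from openai import OpenAI  # openai>=1.0 client interface
        # client = OpenAI()  # Reads OPENAI_API_KEY from the environment
        # completion = client.chat.completions.create(
        #     model="gpt-4",
        #     messages=[{"role": "user", "content": prompt}]
        # )
        # answer = completion.choices[0].message.content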
        
        # For demo, return formatted context
        answer = f"""
Question: {question}

Relevant Information Found:
{context}

[An LLM would generate a natural language answer here synthesizing the above sources]
        """.strip()
        
        return answer

# Initialize and index
print("="*70)
print("PRODUCTION STDF RAG SYSTEM")
print("="*70)

stdf_rag = ProductionRAGSystem()
stdf_rag.index_documents(comprehensive_stdf_docs)

# Test with various questions
test_questions = [
    "What is the difference between PTR and FTR records?",
    "How do I parse STDF files in Python?",
    "What analysis can be done with STDF data?",
    "Explain the structure of STDF V4",
]

for question in test_questions:
    print(f"\n{'='*70}")
    print(f"Question: {question}")
    print('='*70)
    answer = stdf_rag.answer_question(question, top_k=2)
    print(answer)
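
`answer_question` concatenates every retrieved chunk, which can overflow the model's context window once `top_k` or chunk size grows. The sketch below packs sources in order of relevance until a size budget is reached. It measures the budget in characters for simplicity (a token count would be more accurate), and `pack_context` is a hypothetical helper, not part of the class above.

```python
from typing import Dict, List


def pack_context(results: List[Dict], max_chars: int = 1000) -> str:
    """Add sources in order of increasing distance until the budget is used up."""
    parts: List[str] = []
    used = 0
    ranked = sorted(results, key=lambda r: r["distance"])
    for i, result in enumerate(ranked, 1):
        block = f"[Source {i}]\n{result['content']}"
        if used + len(block) > max_chars:
            break  # The next source would exceed the budget
        parts.append(block)
        used += len(block) + 2  # +2 for the blank-line separator
    return "\n\n".join(parts)


example_results = [
    {"content": "B" * 50, "distance": 0.9},
    {"content": "A" * 50, "distance": 0.1},
]
packed = pack_context(example_results, max_chars=80)
print(packed)
```

With an 80-character budget only the closest source fits, so the less relevant one is dropped instead of truncating both.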

---

## 10. RAG Evaluation Metrics

How do we know if our RAG system is good?

In [None]:
class RAGEvaluator:
    """Evaluate RAG system performance"""
    
    @staticmethod
    def retrieval_metrics(relevant_docs: List[str], retrieved_docs: List[str]) -> Dict[str, float]:
        """
        Calculate retrieval metrics
        
        Args:
            relevant_docs: List of actually relevant document IDs
            retrieved_docs: List of retrieved document IDs
        """
        relevant_set = set(relevant_docs)
        retrieved_set = set(retrieved_docs)
        
        # True positives
        tp = len(relevant_set & retrieved_set)
        
        # Precision: What fraction of retrieved docs are relevant?
        precision = tp / len(retrieved_set) if retrieved_set else 0
        
        # Recall: What fraction of relevant docs were retrieved?
        recall = tp / len(relevant_set) if relevant_set else 0
        
        # F1 Score
        f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        
        return {
            'precision': precision,
            'recall': recall,
            'f1_score': f1,
            'retrieved_count': len(retrieved_set),
            'relevant_count': len(relevant_set)
        }
    
    @staticmethod
    def mrr(relevant_docs: List[str], retrieved_docs_ranked: List[str]) -> float:
        """
        Reciprocal rank for a single query: 1 / position of the first relevant document.
        Average this value over many queries to get the Mean Reciprocal Rank (MRR).
        """
        for i, doc in enumerate(retrieved_docs_ranked, 1):
            if doc in relevant_docs:
                return 1.0 / i
        return 0.0
    
    @staticmethod
    def ndcg(relevant_docs: List[str], retrieved_docs_ranked: List[str], k: int = None) -> float:
        """
        Normalized Discounted Cumulative Gain
        Accounts for ranking quality
        """
        if k:
            retrieved_docs_ranked = retrieved_docs_ranked[:k]
        
        # Calculate DCG
        dcg = 0.0
        for i, doc in enumerate(retrieved_docs_ranked, 1):
            if doc in relevant_docs:
                dcg += 1.0 / np.log2(i + 1)
        
        # Calculate ideal DCG
        idcg = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant_docs), len(retrieved_docs_ranked))))
        
        return dcg / idcg if idcg > 0 else 0.0

# Example evaluation
print("="*70)
print("RAG EVALUATION EXAMPLE")
print("="*70)

# Simulate ground truth
relevant_docs = ['stdf_doc_2', 'stdf_doc_3', 'stdf_doc_6']  # PTR, FTR, parsing
retrieved_docs = ['stdf_doc_2', 'stdf_doc_3', 'stdf_doc_5', 'stdf_doc_1']

evaluator = RAGEvaluator()

metrics = evaluator.retrieval_metrics(relevant_docs, retrieved_docs)
print("\nRetrieval Metrics:")
print(f"  Precision: {metrics['precision']:.4f}")
print(f"  Recall:    {metrics['recall']:.4f}")
print(f"  F1 Score:  {metrics['f1_score']:.4f}")

mrr_score = evaluator.mrr(relevant_docs, retrieved_docs)
print(f"\nMean Reciprocal Rank: {mrr_score:.4f}")

ndcg_score = evaluator.ndcg(relevant_docs, retrieved_docs, k=3)
print(f"NDCG@3: {ndcg_score:.4f}")

print("\n💡 Interpretation:")
print(f"  - Retrieved 2 out of 3 relevant documents")
print(f"  - First relevant doc at position 1 (excellent)")
print(f"  - Overall ranking quality: {ndcg_score:.2%}")
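The `mrr` method above scores one query at a time, while the metric is normally reported as an average over an evaluation set. The sketch below shows that averaging step. The helper names and the three sample queries are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict, List


def reciprocal_rank(relevant: List[str], ranked: List[str]) -> float:
    """Return 1 / position of the first relevant document, or 0.0 if none."""
    relevant_set = set(relevant)
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant_set:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(queries: List[Dict[str, List[str]]]) -> float:
    """Average the reciprocal rank over all queries in the evaluation set."""
    if not queries:
        return 0.0
    scores = [reciprocal_rank(q["relevant"], q["retrieved"]) for q in queries]
    return sum(scores) / len(scores)


# Hypothetical evaluation set: one entry per test question
eval_queries = [
    {"relevant": ["stdf_doc_2"], "retrieved": ["stdf_doc_2", "stdf_doc_5"]},  # first hit at rank 1
    {"relevant": ["stdf_doc_6"], "retrieved": ["stdf_doc_1", "stdf_doc_6"]},  # first hit at rank 2
    {"relevant": ["stdf_doc_3"], "retrieved": ["stdf_doc_4", "stdf_doc_5"]},  # no hit
]

mrr_value = mean_reciprocal_rank(eval_queries)
print(f"MRR over {len(eval_queries)} queries: {mrr_value:.4f}")
```

A single query with a perfect reciprocal rank says little on its own; averaging over a representative question set gives a more stable picture of ranking quality.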

---

## 11. Advanced RAG Patterns

### Common RAG Challenges & Solutions

| Challenge | Solution | Implementation |
|-----------|----------|----------------|
| **Chunk size too large** | Reduce chunk size, increase overlap | Notebook 080 |
| **Retrieval misses context** | Use parent-child chunking | Advanced RAG |
| **Hallucination** | Add source citations, confidence scores | Prompt engineering |
| **Slow retrieval** | Use approximate nearest neighbors, cache | Vector DB optimization |
| **Multi-hop questions** | Iterative retrieval, chain of thought | Agentic RAG |
| **Domain mismatch** | Fine-tune embeddings, use domain LLMs | Fine-tuning |
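
As an illustration of the parent-child chunking row in the table, the sketch below matches the query against small child chunks but returns the full parent section as context. It is a minimal sketch under stated assumptions: word overlap stands in for embedding similarity, and the function names and sample section text are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_child_index(sections: Dict[str, str], child_size: int = 8) -> List[Tuple[str, str]]:
    """Split each parent section into child chunks of `child_size` words.

    Returns (parent_id, child_text) pairs so every child points to its parent.
    """
    index: List[Tuple[str, str]] = []
    for parent_id, text in sections.items():
        words = text.split()
        for start in range(0, len(words), child_size):
            child_text = " ".join(words[start:start + child_size])
            index.append((parent_id, child_text))
    return index


def retrieve_parent(query: str, index: List[Tuple[str, str]], sections: Dict[str, str]) -> str:
    """Find the best-matching child chunk, then return its whole parent section."""
    query_tokens = set(tokenize(query))

    def overlap(entry: Tuple[str, str]) -> int:
        # Stand-in for embedding similarity: count shared tokens
        return len(query_tokens & set(tokenize(entry[1])))

    best_parent_id, _ = max(index, key=overlap)
    return sections[best_parent_id]


# Hypothetical parent sections (illustrative text only)
sections = {
    "ptr_section": "The PTR record stores one parametric test result. "
                   "It includes the test number, the measured value, and the low and high limits.",
    "ftr_section": "The FTR record stores one functional test result. "
                   "It includes the test number and pass or fail flags.",
}

child_index = build_child_index(sections)
context = retrieve_parent("measured value limits", child_index, sections)
print(context)
```

Small children keep matching precise, while returning the parent gives the LLM the surrounding sentences that a single small chunk would cut off.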

### Next-Level RAG Techniques (Covered in 080):
- 🔹 **Re-ranking**: Two-stage retrieval for better precision
- 🔹 **Query transformation**: Rewrite queries for better matching
- 🔹 **Contextual compression**: Remove irrelevant parts of chunks
- 🔹 **Self-RAG**: Model checks if retrieved content is relevant
- 🔹 **Corrective RAG**: Falls back to web search if local docs insufficient
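
To make the re-ranking idea concrete, here is a minimal two-stage sketch. A cheap first stage selects candidates, then a more precise scorer reorders them. Both scorers are stand-ins chosen so the example runs without a model: token overlap for the first stage and query bigram matches in place of a cross-encoder. All function names and sample documents are hypothetical.

```python
import re
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def first_stage(query: str, docs: List[str], k: int) -> List[str]:
    """Cheap, recall-oriented stage: rank by number of shared tokens."""
    query_tokens = set(tokenize(query))
    ranked = sorted(docs, key=lambda d: len(query_tokens & set(tokenize(d))), reverse=True)
    return ranked[:k]


def phrase_scorer(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: count query bigrams that appear in the doc."""
    q, d = tokenize(query), tokenize(doc)
    doc_bigrams = set(zip(d, d[1:]))
    return float(sum(1 for bigram in zip(q, q[1:]) if bigram in doc_bigrams))


def rerank(query: str, candidates: List[str],
           scorer: Callable[[str, str], float], top_n: int) -> List[str]:
    """Precision-oriented stage: reorder the candidates with the given scorer."""
    return sorted(candidates, key=lambda d: scorer(query, d), reverse=True)[:top_n]


docs = [
    "Test results are summarized; parametric data is stored elsewhere in records",
    "PTR records store parametric test results with limits",
    "Wafer maps show die locations",
]
query = "parametric test results"

candidates = first_stage(query, docs, k=2)               # both STDF docs share 3 tokens
top = rerank(query, candidates, phrase_scorer, top_n=1)  # the phrase match wins
print(top[0])
```

The first stage cannot separate the two candidates because they share the same tokens; the second stage prefers the document where the query terms appear together, which is the kind of distinction a real re-ranker is meant to capture.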

---

## 12. Key Takeaways

### ✅ RAG is Essential When:
- You are building Q&A systems over documents
- You need up-to-date information beyond the model's training data
- You want to cite sources and provide transparency
- You work with proprietary or private data
- You need a cost-effective alternative to fine-tuning

### 🎯 Best Practices:
1. **Chunking**: Experiment with strategies, use overlap
2. **Embeddings**: Choose model based on domain and speed requirements
3. **Retrieval**: Start with a top-k between 3 and 5, then tune it against your retrieval metrics
4. **Evaluation**: Always measure precision, recall, and user satisfaction
5. **Hybrid Search**: Combine semantic and keyword for robustness
6. **Metadata**: Store source, date, section for filtering
7. **Monitoring**: Track retrieval quality and user feedback
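
One common way to combine semantic and keyword results for hybrid search is reciprocal rank fusion (RRF), which merges ranked lists using only positions, so the two score scales never need to be normalized. The sketch below assumes the two rankings already exist; the document IDs are hypothetical and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of a semantic retriever and a keyword retriever
semantic_ranking = ["stdf_doc_2", "stdf_doc_3", "stdf_doc_5"]
keyword_ranking = ["stdf_doc_3", "stdf_doc_6", "stdf_doc_2"]

fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
print(fused)
```

Documents that appear near the top of both lists rise to the front, while documents found by only one retriever are kept but ranked lower.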

### ⚠️ Common Pitfalls:
- Chunks too large or too small
- Not handling edge cases (empty results)
- Ignoring retrieval quality metrics
- Over-relying on retrieval without LLM verification
- Not updating knowledge base regularly
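
The empty-results pitfall can be handled with a small guard before generation. The sketch below is one possible shape, not the notebook's implementation: `answer_with_guard`, the `generate` callable, and the `0.3` similarity threshold are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

FALLBACK = "I don't find that information in the provided documentation."


def answer_with_guard(
    results: List[Tuple[str, float]],
    generate: Callable[[List[str]], str],
    min_score: float = 0.3,  # assumed threshold; tune on your own data
) -> str:
    """Call the generator only when at least one result passes the threshold."""
    docs = [doc for doc, score in results if score >= min_score]
    if not docs:
        # Nothing usable was retrieved: answer honestly instead of guessing
        return FALLBACK
    return generate(docs)


def fake_generate(docs: List[str]) -> str:
    """Placeholder for the LLM call so the example runs offline."""
    return f"Answer based on {len(docs)} source(s)"


print(answer_with_guard([], fake_generate))
print(answer_with_guard([("PTR chunk", 0.9), ("unrelated chunk", 0.2)], fake_generate))
```

Skipping the LLM call when nothing relevant was retrieved avoids both a wasted request and an answer invented from weak context.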

---

## 13. Practice Exercises

### Exercise 1: Custom Chunking
Implement a chunking strategy optimized for technical specifications with:
- Section headers preserved
- Tables kept together
- Code blocks not split

### Exercise 2: Multi-Document RAG
Extend the RAG system to handle multiple document types:
- PDF files (specs)
- Code files (examples)
- Test logs (results)

### Exercise 3: RAG with Citations
Modify the system to return citations with each answer:
- Document name
- Page/section number
- Relevance score
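
A possible starting point for this exercise is sketched below. It only covers the formatting side: a small record holding the three requested fields and a helper that appends a source list to an answer. The `Citation` class, its field names, and the sample values are hypothetical; wiring it into the retriever is left as the exercise.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Citation:
    document: str   # document name
    section: str    # page or section identifier
    score: float    # relevance score from the retriever


def format_answer_with_citations(answer: str, citations: List[Citation]) -> str:
    """Append a numbered source list, most relevant first, to the answer text."""
    lines = [answer, "", "Sources:"]
    ordered = sorted(citations, key=lambda c: c.score, reverse=True)
    for number, citation in enumerate(ordered, start=1):
        lines.append(
            f"[{number}] {citation.document}, section {citation.section} "
            f"(score {citation.score:.2f})"
        )
    return "\n".join(lines)


# Hypothetical retrieval results
citations = [
    Citation(document="stdf_doc_3", section="FTR", score=0.71),
    Citation(document="stdf_doc_2", section="PTR", score=0.92),
]
formatted = format_answer_with_citations(
    "The PTR record holds one parametric test result.", citations
)
print(formatted)
```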

### Exercise 4: Real STDF RAG
Build a RAG system for your actual STDF documentation:
- Index your company's test specs
- Add test procedure documents
- Include troubleshooting guides

---

## 14. Integration with LLMs

### Using OpenAI API (Production Example)

```python
from typing import List

from openai import OpenAI

# Requires openai>=1.0; the client reads OPENAI_API_KEY from the environment.
client = OpenAI()

def rag_with_gpt4(question: str, retrieved_docs: List[str]) -> str:
    """Answer a question with GPT-4, grounded in the retrieved documents."""

    # Number each retrieved chunk so the model can refer to its sources
    context = "\n\n".join(f"Source {i+1}:\n{doc}"
                          for i, doc in enumerate(retrieved_docs))

    prompt = f"""You are a semiconductor test engineer assistant.
Answer the question based on the provided STDF documentation context.
If the answer is not in the context, say "I don't find that information in the provided documentation."

Context:
{context}

Question: {question}

Answer:"""

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful semiconductor test engineer assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3  # Lower temperature for factual responses
    )

    return response.choices[0].message.content
```

### Using LangChain (Simplified)

```python
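# These import paths match older LangChain releases; newer releases expose
# the same classes from the langchain_openai and langchain_community packages.
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma

# `vectorstore` is assumed to be an existing Chroma instance that already
# holds the embedded STDF document chunks.

# Create retrieval chain
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",  # "stuff" puts all retrieved chunks into one prompt
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)

# Query
result = qa_chain({"query": "What is a PTR record?"})
print(result['result'])
print(result['source_documents'])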
```

---

## 15. Next Steps

**Congratulations! 🎉** You've mastered RAG fundamentals!

### Continue Your Learning:
- → **080_Advanced_RAG_Techniques.ipynb** - Re-ranking, query transformation, self-RAG
- → **081_Knowledge_Graphs_Basics.ipynb** - Structured knowledge representation
- → **083_KG_Enhanced_RAG.ipynb** - Combine KG with RAG for better results
- → **084_Semantic_Search_Advanced.ipynb** - Advanced retrieval techniques
- → **086_LangChain_Framework.ipynb** - Production RAG with LangChain
- → **090_Production_Agent_Systems.ipynb** - Deploy RAG in production

### Project Ideas:
1. **STDF Documentation Assistant**: Complete Q&A system for test documentation
2. **Failure Analysis RAG**: Retrieve similar historical failures
3. **Test Procedure Guide**: Interactive guide with RAG
4. **Multi-Modal RAG**: Combine text, images, and wafer maps

---

**You're now ready to build production RAG systems! 🚀**

**Next: Advanced RAG Techniques (080) →**