# Lecture 83 – RAG with LangChain and Gradio

## Learning Objectives
- Understand Retrieval-Augmented Generation (RAG)
- Build a semantic search system with embeddings
- Create a FAISS vector database
- Implement a Gradio UI for RAG
- Integrate with language models

## Expected Runtime
~7 minutes (includes model downloads)

---

In [None]:
# !pip install sentence-transformers faiss-cpu gradio langchain transformers

In [None]:
import numpy as np
from sentence_transformers import SentenceTransformer
import faiss
from typing import List, Tuple

print("✓ Imports successful")

## 1. Create Sample Knowledge Base

In [None]:
# Sample corpus about ML deployment
DOCUMENTS = [
    "Deep learning uses neural networks with multiple layers for complex pattern recognition.",
    "Model deployment involves serializing trained models and serving them via APIs.",
    "FastAPI is a high-performance Python framework ideal for ML model serving.",
    "Docker containers ensure consistent deployment across different environments.",
    "RAG systems combine retrieval and generation for accurate, grounded responses.",
    "TensorFlow and PyTorch are the leading deep learning frameworks.",
    "Monitoring model performance in production is crucial for maintaining quality.",
    "API versioning allows smooth transitions between model updates."
]

print(f"Knowledge base: {len(DOCUMENTS)} documents")
for i, doc in enumerate(DOCUMENTS, 1):
    print(f"{i}. {doc[:60]}...")

## 2. Create Embeddings with Sentence Transformers

In [None]:
# Load embedding model
print("Loading embedding model...")
embedder = SentenceTransformer('all-MiniLM-L6-v2')

print(f"Model dimension: {embedder.get_sentence_embedding_dimension()}")
print("✓ Model loaded")

In [None]:
# Generate embeddings
print("Encoding documents...")
embeddings = embedder.encode(DOCUMENTS, show_progress_bar=True)
embeddings = np.array(embeddings).astype('float32')

print(f"Embeddings shape: {embeddings.shape}")
print(f"Sample embedding (first 10 values): {embeddings[0][:10]}")

## 3. Build FAISS Index

In [None]:
# Create FAISS index
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)  # exact (brute-force) search; reports squared L2 distances

# Add vectors to index
index.add(embeddings)

print("✓ FAISS index created")
print(f"  Vectors in index: {index.ntotal}")
print(f"  Index dimension: {dimension}")
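
To make the index less of a black box, here is a minimal pure-Python sketch of what an exact flat L2 search computes: the squared Euclidean distance from the query to every stored vector, sorted ascending. The helper names (`squared_l2`, `brute_force_search`) and the toy vectors are illustrative assumptions, not part of FAISS, and the real index does the same work in optimized native code.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Return (row index, squared L2 distance) for the top_k nearest vectors."""
    scored = [(i, squared_l2(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = closer match
    return scored[:top_k]


# Hypothetical 2-D "embeddings" standing in for the 384-D ones above
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
nearest = brute_force_search([0.9, 0.1], toy_vectors, top_k=2)
print(nearest)  # row 1 is closest, then row 0
```

Because every vector is compared, the cost grows linearly with the corpus size. That is fine for eight documents, and it is one reason larger deployments move to approximate indexes or a dedicated vector database.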

## 4. Semantic Search Function

In [None]:
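def semantic_search(query: str, top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top-k documents with their squared L2 distances (lower is closer)."""
    # Never ask for more neighbours than the index holds
    top_k = max(1, min(int(top_k), index.ntotal))
    
    # Encode the query as a (1, dimension) float32 array, as FAISS expects
    query_embedding = embedder.encode([query]).astype('float32')
    
    # Search the index: both outputs have shape (1, top_k)
    distances, indices = index.search(query_embedding, top_k)
    
    # Pair each hit with its document text
    results = []
    for idx, dist in zip(indices[0], distances[0]):
        if idx < 0:  # FAISS pads missing results with -1
            continue
        results.append((DOCUMENTS[idx], float(dist)))
    
    return results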

# Test (scores are squared L2 distances, so lower means closer)
test_query = "How do I deploy a model?"
results = semantic_search(test_query, top_k=3)

print(f"Query: '{test_query}'\n")
for i, (doc, score) in enumerate(results, 1):
    print(f"{i}. [{score:.4f}] {doc}")
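
The raw distances printed above are hard to read as relevance scores. If the embeddings are unit length, the squared L2 distance and cosine similarity are related by d² = 2 − 2·cos, so one can be converted to the other. The sketch below checks that identity in pure Python. The function names are illustrative, and the unit-length assumption should be verified for your model (for example with `np.linalg.norm(embeddings, axis=1)`); passing `normalize_embeddings=True` to `embedder.encode` enforces it.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def l2sq_to_cosine(dist_sq: float) -> float:
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - dist_sq / 2.0


# Two hypothetical unit vectors
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
cosine_direct = sum(x * y for x, y in zip(a, b))

print(f"squared L2: {dist_sq:.4f}")
print(f"cosine (from distance): {l2sq_to_cosine(dist_sq):.4f}")
print(f"cosine (direct):        {cosine_direct:.4f}")
```

A cosine score in [-1, 1] is often easier to threshold (for instance, dropping sources below a chosen cutoff) than an unbounded distance.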

## 5. RAG Pipeline with Mock LLM

In [None]:
def generate_answer(query: str, context_docs: List[str]) -> str:
    """
    Generate answer using retrieved context.
    In production, replace with actual LLM API call.
    """
    context = "\n".join([f"- {doc}" for doc in context_docs])
    # Fall back gracefully if retrieval returned nothing
    top_doc = context_docs[0] if context_docs else "No relevant documents were retrieved."
    
    # Mock response (replace with OpenAI/Anthropic/HuggingFace API)
    answer = f"""**Question:** {query}

**Retrieved Context:**
{context}

**Generated Answer:**
Based on the retrieved information, here's what you need to know:

{top_doc}

This demonstrates a RAG system that retrieves relevant information and uses it to generate contextual answers. In production, this would use a large language model like GPT-4, Claude, or an open-source model to generate more sophisticated responses.

*Note: To use a real LLM, set your API key and integrate the model here.*
"""
    return answer

def rag_pipeline(query: str, top_k: int = 3) -> Tuple[str, List[str]]:
    """Complete RAG: retrieve + generate."""
    # Retrieve
    results = semantic_search(query, top_k)
    docs = [doc for doc, _ in results]
    
    # Generate
    answer = generate_answer(query, docs)
    
    return answer, docs

# Test RAG pipeline
answer, sources = rag_pipeline("What is RAG?")
print(answer)
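
When the mock generator is swapped for a real model, the retrieved documents have to be packed into a prompt. The following is a minimal sketch of one way to do that, with numbered sources and a simple character budget. The function name `build_rag_prompt`, the `max_context_chars` parameter, and the instruction wording are assumptions chosen for illustration, not a prescribed format.

```python
from typing import List


def build_rag_prompt(
    query: str,
    context_docs: List[str],
    max_context_chars: int = 2000,
) -> str:
    """Assemble a grounded prompt with numbered sources, within a character budget."""
    lines: List[str] = []
    used = 0
    for i, doc in enumerate(context_docs, 1):
        entry = f"[{i}] {doc}"
        if used + len(entry) > max_context_chars:
            break  # stop before exceeding the context budget
        lines.append(entry)
        used += len(entry)

    context = "\n".join(lines) if lines else "(no context retrieved)"
    return (
        "Answer the question using only the numbered context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


prompt = build_rag_prompt(
    "What is RAG?",
    ["RAG systems combine retrieval and generation for accurate, grounded responses."],
)
print(prompt)
```

Numbering the sources lets the model cite them (for example "[1]"), and the budget keeps the prompt from growing without bound as `top_k` increases. A token-based budget would be more precise than counting characters.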

## 6. Create Gradio Interface

In [None]:
import gradio as gr

def gradio_rag(question: str, num_sources: int = 3):
    """Gradio interface function."""
    if not question.strip():
        return "Please enter a question.", ""
    
    # Sliders may deliver a float; FAISS needs an integer k
    answer, sources = rag_pipeline(question, int(num_sources))
    sources_text = "\n\n".join([f"📄 {i+1}. {s}" for i, s in enumerate(sources)])
    
    return answer, sources_text

# Create interface
with gr.Blocks(title="RAG Demo") as demo:
    gr.Markdown("# 🤖 RAG System Demo")
    gr.Markdown("Ask questions about ML deployment!")
    
    with gr.Row():
        with gr.Column():
            question = gr.Textbox(label="Question", lines=3)
            num_sources = gr.Slider(1, 5, 3, step=1, label="Sources")
            submit = gr.Button("Ask")
        
        with gr.Column():
            answer = gr.Textbox(label="Answer", lines=10)
            sources = gr.Textbox(label="Sources", lines=5)
    
    gr.Examples(
        [["What is deep learning?"],
         ["How do I deploy a model?"],
         ["What is RAG?"]],
        inputs=question
    )
    
    submit.click(gradio_rag, [question, num_sources], [answer, sources])

# Launch (uncomment to run). share=True also creates a temporary public URL;
# drop it to serve on localhost only.
# demo.launch(share=True)
print("✓ Gradio interface created")
print("  Uncomment demo.launch() to start the UI")

## 7. Integration with Real LLMs

### OpenAI Example

In [None]:
# Example OpenAI integration (requires an API key and the openai>=1.0 package)
example_code = '''
import os
from typing import List

from openai import OpenAI

# Read the key from the environment; never hard-code it
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_with_openai(query: str, context_docs: List[str]) -> str:
    context = "\\n".join(context_docs)
    
    messages = [
        {"role": "system", "content": "You are a helpful AI assistant. Answer based on the provided context."},
        {"role": "user", "content": f"Context:\\n{context}\\n\\nQuestion: {query}"}
    ]
    
    # Chat Completions call in the openai>=1.0 client style
    # (openai.ChatCompletion.create was removed in that release)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages,
        temperature=0.7
    )
    
    return response.choices[0].message.content
'''

print("OpenAI Integration Example:")
print(example_code)

## Summary

✓ Built semantic search with sentence-transformers  
✓ Created FAISS vector database  
✓ Implemented RAG pipeline  
✓ Created Gradio UI  

### Production Checklist:
- [ ] Use production vector DB (Pinecone, Weaviate, Qdrant)
- [ ] Integrate real LLM (OpenAI, Anthropic, or open-source)
- [ ] Add caching for embeddings
- [ ] Implement rate limiting
- [ ] Monitor retrieval quality
- [ ] Add feedback collection
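
As a starting point for the "caching for embeddings" item, here is a minimal in-memory sketch that avoids re-encoding text it has already seen. The `EmbeddingCache` class and the `fake_embed` stand-in are hypothetical names introduced for illustration. In this notebook the embedding function could be something like `lambda t: embedder.encode([t])[0]`, and a production setup would more likely use a bounded or persistent store.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """Memoize embeddings by a hash of the input text (in-memory, unbounded)."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, List[float]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str) -> str:
        # Hashing keeps keys a fixed size even for long documents
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(text)  # compute once, reuse after
        return self._store[key]


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model, used only for this demo."""
    return [float(len(text)), float(text.count(" "))]


cache = EmbeddingCache(fake_embed)
first = cache.get("What is RAG?")
second = cache.get("What is RAG?")  # served from the cache
print(f"hits={cache.hits}, misses={cache.misses}")
```

Repeated queries are common in a demo UI with example questions, so even this simple cache can skip a noticeable share of encoder calls.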

**Next**: `04_docker_and_containerization.ipynb`