# RAG System Using a Llama Model

## Prerequisites
- Python 3.10 or lower (the commands below assume `python3.10` is on your PATH)
- macOS environment

## Setup Instructions

### 1. Create and Activate Virtual Environment
```bash
# Create virtual environment
python3.10 -m venv venv

# Activate virtual environment
source venv/bin/activate
```

### 2. Install PyTorch
Install PyTorch, torchvision, and torchaudio first:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```

### 3. Install Jupyter and Setup Kernel
```bash
# Install Jupyter and IPython kernel
pip install jupyter ipykernel

# Register the virtual environment as a Jupyter kernel
python -m ipykernel install --user --name=venv --display-name "Python (venv)"

# Start Jupyter Lab
jupyter lab
```

### 4. Additional Setup Notes
- Make sure to select the "Python (venv)" kernel in your Jupyter notebook
- The kernel name will appear in the top right corner of your notebook
- You can switch kernels at any time using the kernel menu

### Troubleshooting
- If you encounter any issues with the kernel, try:
  1. Restarting the kernel
  2. Rerunning the kernel installation command
  3. Verifying that your virtual environment is activated
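
The third check can also be run from a notebook cell, which confirms that the selected kernel really uses the virtual environment. This is a minimal sketch using only the standard library; the helper name `in_virtualenv` is illustrative and not part of this project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the interpreter path does not point into the `venv` directory, the notebook is most likely running on a different kernel.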

### Install the required libraries

In [1]:
pip install PyPDF2 sentence-transformers rank-bm25 llama-cpp-python fpdf ollama

Note: you may need to restart the kernel to use updated packages.


In [2]:
!pip install --upgrade typing_extensions pydantic
!pip uninstall ollama -y
!pip install ollama

Found existing installation: ollama 0.4.7
Uninstalling ollama-0.4.7:
  Successfully uninstalled ollama-0.4.7
Collecting ollama
  Using cached ollama-0.4.7-py3-none-any.whl (13 kB)
Installing collected packages: ollama
Successfully installed ollama-0.4.7

### Import the required libraries

In [3]:
import os
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
import PyPDF2
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
from pathlib import Path
import ollama

# PDFChunk Class
`PDFChunk` is a data class that represents a segment of a PDF document. It contains:
- `content`: The actual text content from the PDF
- `page_num`: Zero-based page number where the content was found (displayed to users as `page_num + 1`)
- `chunk_num`: Sequential ID for the chunk
- `metadata`: Additional information like filename, title, author
- `score`: Relevance score used during search (defaults to 0.0)

Example:
```python
chunk = PDFChunk(content="Sample text", page_num=0, chunk_num=3, 
                 metadata={"filename": "doc.pdf"}, score=0.85)
```

In [4]:
@dataclass
class PDFChunk:
    content: str
    page_num: int
    chunk_num: int
    metadata: Dict[str, Any]
    score: float = 0.0

# PDFProcessor Class
A utility class that splits PDF documents into overlapping, word-based chunks with configurable size (default 500 words) and overlap (default 50 words). It reads each PDF with PyPDF2, extracts document metadata (filename, page count, title, author), and chunks every page separately, so no chunk spans a page boundary. Each chunk is stored as a `PDFChunk` object.

```python
processor = PDFProcessor(chunk_size=500, chunk_overlap=50)
chunks = processor.process_pdf("document.pdf")  # Returns list of PDFChunk objects
```
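
To make the size and overlap arithmetic concrete, here is a small standalone sketch of the same sliding-window idea on a toy word list. The function name `chunk_words` and the tiny sizes are assumptions chosen for illustration; the notebook's own logic lives in `PDFProcessor._chunk_text`.

```python
from typing import List


def chunk_words(words: List[str], chunk_size: int = 500,
                chunk_overlap: int = 50) -> List[List[str]]:
    """Split a word list into windows that overlap by chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each window starts (chunk_size - chunk_overlap) words after the previous one.
    step = chunk_size - chunk_overlap
    return [words[i:i + chunk_size] for i in range(0, len(words), step)]


# Toy input: 12 words, windows of 5 words that share 2 words with the previous window.
words = [f"w{i}" for i in range(12)]
windows = chunk_words(words, chunk_size=5, chunk_overlap=2)
for n, window in enumerate(windows):
    print(n, window)
```

With the defaults of 500 and 50, each new window starts 450 words after the previous one.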

In [5]:
class PDFProcessor:
    """Handles PDF document processing and chunking"""
    
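    def __init__(self, chunk_size: int = 500, chunk_overlap: int = 50):
        # The chunking stride is chunk_size - chunk_overlap, so it must stay positive
        if chunk_overlap >= chunk_size:
            raise ValueError("chunk_overlap must be smaller than chunk_size")
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap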
    
    def process_pdf(self, pdf_path: str) -> List[PDFChunk]:
        chunks = []
        
        with open(pdf_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)
            
            # Extract metadata (PyPDF2 returns None when the PDF has no info dictionary)
            info = pdf_reader.metadata or {}
            metadata = {
                'filename': os.path.basename(pdf_path),
                'num_pages': len(pdf_reader.pages),
                'title': info.get('/Title', ''),
                'author': info.get('/Author', '')
            }
            
            # Process each page
            for page_num, page in enumerate(pdf_reader.pages):
                # Fall back to an empty string for pages with no extractable text
                text = page.extract_text() or ''
                page_chunks = self._chunk_text(text, page_num, metadata)
                chunks.extend(page_chunks)
        
        return chunks
    
    def _chunk_text(self, text: str, page_num: int, metadata: Dict) -> List[PDFChunk]:
        words = text.split()
        chunks = []
        chunk_num = 0
        
        for i in range(0, len(words), self.chunk_size - self.chunk_overlap):
            chunk_words = words[i:i + self.chunk_size]
            if chunk_words:
                chunk_text = ' '.join(chunk_words)
                chunk = PDFChunk(
                    content=chunk_text,
                    page_num=page_num,
                    chunk_num=chunk_num,
                    metadata={**metadata, 'chunk_location': f'page_{page_num}_chunk_{chunk_num}'}
                )
                chunks.append(chunk)
                chunk_num += 1
        
        return chunks


# EnhancedHybridRetriever Class
A hybrid retriever that combines semantic search (SentenceTransformer embeddings, weighted at 70% by default) with lexical search (BM25, weighted at the remaining 30%) to find relevant document chunks. Each set of scores is divided by its own maximum before the weighted sum, which puts the two on a comparable scale. Call `add_chunks()` to index content, then `retrieve()` to get the top-k matches.

```python
retriever = EnhancedHybridRetriever(semantic_weight=0.7)
retriever.add_chunks(chunks)  # Index first: retrieve() needs the BM25 index and embeddings
results = retriever.retrieve("query", top_k=5)  # Returns top 5 relevant chunks
```
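
The score combination can be illustrated without loading any model. The sketch below uses plain Python lists and made-up raw scores for three chunks; the helper names `normalize` and `combine` are hypothetical and only mirror the max-normalization and weighted sum described above.

```python
from typing import List


def normalize(scores: List[float]) -> List[float]:
    """Divide by the maximum so the best score becomes 1.0 (unchanged if max <= 0)."""
    top = max(scores)
    return [s / top for s in scores] if top > 0 else list(scores)


def combine(semantic: List[float], bm25: List[float],
            semantic_weight: float = 0.7) -> List[float]:
    """Weighted sum of max-normalized semantic and BM25 scores."""
    sem, lex = normalize(semantic), normalize(bm25)
    return [semantic_weight * s + (1 - semantic_weight) * b
            for s, b in zip(sem, lex)]


# Made-up raw scores for three chunks (illustration only).
semantic_raw = [0.2, 0.8, 0.4]
bm25_raw = [3.0, 1.5, 0.0]

combined = combine(semantic_raw, bm25_raw)
# Chunk indices ordered from most to least relevant.
ranking = sorted(range(len(combined)), key=combined.__getitem__, reverse=True)
print(combined, ranking)
```

Because both inputs are scaled to a maximum of 1.0, a chunk reaches a combined score of 1.000 only when it ranks first in both the semantic and the BM25 scoring.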

In [6]:
class EnhancedHybridRetriever:
    """Enhanced retriever with PDF support and hybrid search"""
    
    def __init__(self, 
                 semantic_model_name: str = "all-MiniLM-L6-v2",
                 semantic_weight: float = 0.7):
        self.semantic_model = SentenceTransformer(semantic_model_name)
        self.semantic_weight = semantic_weight
        self.bm25 = None
        self.chunks: List[PDFChunk] = []
        self.embeddings = None
        
    def add_chunks(self, chunks: List[PDFChunk]):
        """Index PDF chunks for hybrid search"""
        self.chunks = chunks
        texts = [chunk.content for chunk in chunks]
        
        # Create BM25 index
        tokenized_texts = [text.lower().split() for text in texts]
        self.bm25 = BM25Okapi(tokenized_texts)
        
        # Create dense embeddings
        self.embeddings = self.semantic_model.encode(texts)
    
    def retrieve(self, query: str, top_k: int = 5) -> List[PDFChunk]:
        """Perform hybrid retrieval combining BM25 and semantic search"""
        # BM25 scoring
        tokenized_query = query.lower().split()
        bm25_scores = np.array(self.bm25.get_scores(tokenized_query))
        
        # Normalize BM25 scores
        bm25_scores = bm25_scores / bm25_scores.max() if bm25_scores.max() > 0 else bm25_scores
        
        # Semantic scoring
        query_embedding = self.semantic_model.encode(query)
        semantic_scores = np.dot(self.embeddings, query_embedding)
        semantic_scores = semantic_scores / semantic_scores.max() if semantic_scores.max() > 0 else semantic_scores
        
        # Combine scores
        combined_scores = (self.semantic_weight * semantic_scores + 
                         (1 - self.semantic_weight) * bm25_scores)
        
        # Get top-k candidates
        top_k_indices = np.argsort(combined_scores)[-top_k:][::-1]
        
        # Update scores and return top chunks
        results = []
        for idx in top_k_indices:
            chunk = self.chunks[idx]
            chunk.score = float(combined_scores[idx])
            results.append(chunk)
        
        return results


# OllamaRAG Class
A Retrieval-Augmented Generation (RAG) system that combines document retrieval with Ollama's LLM capabilities. It retrieves relevant document chunks using a hybrid retriever, constructs a prompt with the retrieved context, and generates an answer using Ollama, returning both the response and metadata about the used chunks.

```python
rag = OllamaRAG(model_name="llama3.2", retriever=hybrid_retriever)
result = rag.generate_response("How does X work?", num_chunks=3)
```
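
The prompt-construction step can be tried on its own, without a running Ollama server. This is a minimal sketch: `build_prompt` is a hypothetical helper, and the passages are placeholder tuples of a zero-based page number and text rather than real `PDFChunk` objects.

```python
from typing import List, Tuple


def build_prompt(query: str, passages: List[Tuple[int, str]]) -> str:
    """Assemble a RAG prompt from (zero-based page number, text) pairs."""
    # Page numbers are stored zero-based, so add 1 for display.
    context = "\n\n".join(
        f"[Page {page_num + 1}]: {text}" for page_num, text in passages
    )
    return (
        "Use the following passages to answer the question. "
        "Include relevant page numbers in your response.\n"
        "If you cannot find the answer in the passages, say so.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )


prompt = build_prompt(
    "How does X work?",
    [(0, "X works by ..."), (2, "See also Y.")],
)
print(prompt)
```

Building the prompt from adjacent string literals, as here, avoids the leading spaces that an indented triple-quoted string would carry into the prompt text.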

In [7]:
class OllamaRAG:
    """RAG system using Ollama"""
    
    def __init__(self, 
                 model_name: str,
                 retriever: EnhancedHybridRetriever,
                 temperature: float = 0.7):
        self.model_name = model_name
        self.retriever = retriever
        self.temperature = temperature
    
    def generate_response(self, query: str, num_chunks: int = 3) -> Dict[str, Any]:
        """Generate response using retrieved context and Ollama"""
        # Retrieve relevant chunks
        relevant_chunks = self.retriever.retrieve(query, top_k=num_chunks)
        
        # Prepare context
        context = "\n\n".join([
            f"[Page {chunk.page_num + 1}]: {chunk.content}"
            for chunk in relevant_chunks
        ])
        
        # Construct prompt
        prompt = f"""Use the following passages to answer the question. Include relevant page numbers in your response.
        If you cannot find the answer in the passages, say so.

        Passages:
        {context}

        Question: {query}

        Answer:"""
        
        # Generate response using Ollama; sampling settings are passed via the options dict
        response = ollama.generate(
            model=self.model_name,
            prompt=prompt,
            options={'temperature': self.temperature}
        )
        
        return {
            'query': query,
            'response': response['response'],
            'used_chunks': [
                {
                    'content': chunk.content,
                    'page': chunk.page_num + 1,
                    'score': chunk.score,
                    'filename': chunk.metadata['filename']
                }
                for chunk in relevant_chunks
            ]
        }



# initialize_rag_system Function
A setup function that builds a complete RAG system: it creates a `PDFProcessor` and an `EnhancedHybridRetriever`, processes a fixed list of expected PDF documents into chunks, indexes those chunks, and returns a configured `OllamaRAG` instance ready for question answering. It raises `FileNotFoundError` if any expected PDF is missing from `docs_path`.

```python
rag_system = initialize_rag_system(docs_path="path/to/docs", model_name="llama3.2")
```
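
Before calling the setup function, it can help to see every missing document at once instead of stopping at the first one. The sketch below is one possible pre-flight check; `missing_pdfs` is a hypothetical helper, and the demo runs against a temporary directory rather than a real `docs` folder.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_pdfs(docs_path: str, expected: List[str]) -> List[str]:
    """Return the expected file names that are not present under docs_path."""
    root = Path(docs_path)
    return [name for name in expected if not (root / name).is_file()]


# Demo against a throwaway directory that contains only one of two files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "incident_runbook.pdf").touch()
    missing = missing_pdfs(
        tmp, ["incident_runbook.pdf", "product_architecture.pdf"]
    )
    print("Missing:", missing)
```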

In [8]:
def initialize_rag_system(docs_path: str = "/Users/Ritesh/Documents/dev/ai-engg/content/ai-engg/week2/docs", model_name: str = "llama3.2") -> OllamaRAG:
    """Initialize the RAG system with documents from the specified path"""
    # Initialize components
    pdf_processor = PDFProcessor()
    retriever = EnhancedHybridRetriever()
    
    # Expected PDF files
    expected_pdfs = [
        "customer_documentation.pdf",
        "incident_runbook.pdf",
        "product_architecture.pdf"
    ]
    
    # Process PDFs
    all_chunks = []
    for pdf_name in expected_pdfs:
        pdf_path = os.path.join(docs_path, pdf_name)
        if not os.path.exists(pdf_path):
            raise FileNotFoundError(
                f"Could not find {pdf_name}. Please ensure it exists in the {docs_path} directory."
            )
        chunks = pdf_processor.process_pdf(pdf_path)
        all_chunks.extend(chunks)
    
    # Index chunks
    retriever.add_chunks(all_chunks)
    
    # Initialize RAG system
    return OllamaRAG(model_name, retriever)


# run_rag_demo Function
A demonstration function that runs three predefined example queries and then enters an interactive mode for custom questions (type `exit` to quit). For every answer it prints the source file, page number, and relevance score of each chunk used. Any exception is caught and reported together with a short checklist: Ollama running, model available, PDFs present.

```python
run_rag_demo()  # Runs demo with example queries then enters interactive mode
```
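
The source listing is printed the same way for example queries and interactive queries, so it could be factored into one helper. This is a sketch under that assumption: `format_sources` is a hypothetical name, and the sample entry is invented for illustration and follows the shape of the `used_chunks` entries returned by `generate_response`.

```python
from typing import Any, Dict, List


def format_sources(used_chunks: List[Dict[str, Any]]) -> str:
    """Render one '- file, Page N (Score: x.xxx)' line per retrieved chunk."""
    return "\n".join(
        f"- {chunk['filename']}, Page {chunk['page']} (Score: {chunk['score']:.3f})"
        for chunk in used_chunks
    )


# Invented sample entry, shaped like the items in result['used_chunks'].
sample = [
    {"content": "Sample text", "page": 1, "score": 0.85, "filename": "doc.pdf"},
]
print(format_sources(sample))
```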

In [9]:
def run_rag_demo():
    """Demo the RAG system with example queries"""
    try:
        # Initialize the system
        print("Initializing RAG system...")
        rag = initialize_rag_system()
        
        # Example queries
        queries = [
            "What are the hardware requirements for TechnoVision's NeuroStack platform?",
            "What happened during the March 10, 2024 outage at TechnoVision AI?",
            "How do I set up a TechnoVision account and generate a TV-API-KEY?"
        ]
        
        # Run queries
        for query in queries:
            print(f"\nQuestion: {query}")
            result = rag.generate_response(query)
            
            print("\nAnswer:", result['response'])
            print("\nSources:")
            for chunk in result['used_chunks']:
                print(f"- {chunk['filename']}, Page {chunk['page']} (Score: {chunk['score']:.3f})")
            
            input("\nPress Enter for next question...")
        
        # Interactive mode
        print("\nEntering interactive mode. Type 'exit' to quit.")
        while True:
            query = input("\nEnter your question: ")
            if query.lower() == 'exit':
                break
                
            result = rag.generate_response(query)
            print("\nAnswer:", result['response'])
            print("\nSources:")
            for chunk in result['used_chunks']:
                print(f"- {chunk['filename']}, Page {chunk['page']} (Score: {chunk['score']:.3f})")
    
    except Exception as e:
        print(f"Error: {str(e)}")
        print("\nPlease ensure:")
        print("1. Ollama is running locally (default port: 11434)")
        print("2. The required model is available (default: llama3.2)")
        print("3. PDF documents are present in the 'docs' directory")



In [None]:
if __name__ == "__main__":
    run_rag_demo()

Initializing RAG system...

Question: What are the hardware requirements for TechnoVision's NeuroStack platform?

Answer: According to Page 1 of the passages, the hardware requirements for NeuroStack include:

* TechnoVision Custom Silicon (TV-GPU-2024 series)
* NeuroStack Accelerator Cards
* Minimum 128GB of TechnoVision Certified Memory.

Note that there are no other specific hardware requirements mentioned in the passage.

Sources:
- product_architecture.pdf, Page 1 (Score: 1.000)
- incident_runbook.pdf, Page 1 (Score: 0.412)
- customer_documentation.pdf, Page 1 (Score: 0.345)



Press Enter for next question... 



Question: What happened during the March 10, 2024 outage at TechnoVision AI?

Answer: According to Passage [Page 1], on March 10, 2024, there was an outage due to a malfunction of the TechnoVision API rate limiter. The impact was significant, affecting 4 enterprise customers in the APAC region. To resolve this issue, an emergency patch (TV-Hotfix-2024-03) was applied.

Sources:
- incident_runbook.pdf, Page 1 (Score: 1.000)
- customer_documentation.pdf, Page 1 (Score: 0.713)
- product_architecture.pdf, Page 1 (Score: 0.712)



Press Enter for next question... 



Question: How do I set up a TechnoVision account and generate a TV-API-KEY?

Answer: To set up a TechnoVision account and generate a TV-API-KEY, follow these steps:

1. Register at console.technovision.ai (Page 1).
2. Generate the TV-API-KEY from the Singapore portal.

No specific page number is required for this answer as it can be inferred from the provided passages.

Sources:
- customer_documentation.pdf, Page 1 (Score: 1.000)
- product_architecture.pdf, Page 1 (Score: 0.541)
- incident_runbook.pdf, Page 1 (Score: 0.433)
