### Data Ingestion

In [1]:
from langchain_core.documents import Document

In [2]:
doc=Document(
    page_content="This is the main text content I am using to create RAG",
    metadata={
        "source":"example.txt",
        "pages":1,
        "author":"Lokesh",
        "date_created":"2025-10-02"
    }
)

doc

Document(metadata={'source': 'example.txt', 'pages': 1, 'author': 'Lokesh', 'date_created': '2025-10-02'}, page_content='This is the main text content I am using to create RAG')

In [3]:
import os
os.makedirs("../data/text_files",exist_ok=True)

In [4]:
sample_texts={
    "../data/text_files/python_intro.txt":"""
    Python is a high-level, interpreted programming language known for its readability and versatility. It emphasizes clear and concise code, making it an excellent choice for beginners. Python's design prioritizes developer productivity and code maintainability.

Key Characteristics:
Readability: Python's syntax is designed to be intuitive and easy to understand, often resembling natural language. It uses indentation to define code blocks, promoting a clean and consistent style.
Interpreted Language: Python code is executed line by line by an interpreter, without the need for a separate compilation step. This allows for rapid development and testing.
Dynamically Typed: Variable types in Python are determined at runtime, meaning you don't explicitly declare the type of a variable when you create it.
Versatility: Python is a general-purpose language used in various domains, including web development (Django, Flask), data analysis and machine learning (NumPy, Pandas, scikit-learn), automation, scientific computing, game development, and more.
Large Standard Library and Ecosystem: Python comes with a comprehensive standard library and a vast ecosystem of third-party libraries and frameworks, providing tools for almost any programming task.
Cross-Platform Compatibility: Python code can run on different operating systems like Windows, macOS, and Linux without significant modifications.

Basic Concepts:
Variables: Used to store data in memory. They are assigned values using the = operator (e.g., name = "Alice").
Data Types: Python supports various data types, including numbers (integers, floats), strings (text), booleans (True/False), lists, tuples, dictionaries, and sets.
Control Flow: Statements like if/elif/else for conditional execution and for and while loops for repetitive tasks control the flow of a program.
Functions: Reusable blocks of code that perform specific tasks. They are defined using the def keyword.
Modules: Python files containing functions, classes, and variables that can be imported and used in other Python programs, promoting code organization and reusability.
""",

"../data/text_files/machine_learning.txt":""" 
Machine learning (ML) is a field of Artificial Intelligence (AI) where systems learn from data to identify patterns and make predictions or decisions without explicit programming. It involves gathering and preparing data, choosing appropriate algorithms, training a model to learn patterns, and evaluating its performance on new, unseen data to ensure it generalizes well. The primary goal is to build models that can apply learned patterns to real-world tasks, a process called inference. 
Key Concepts
Artificial Intelligence (AI): A broad field of computer science that focuses on creating machines capable of performing tasks that typically require human intelligence. 
Algorithms: The set of rules or procedures that a machine learning model follows to learn from data. 
Data: The raw information (examples or experiences) that machine learning models learn from to identify patterns and make predictions. 
Model Training: The process of feeding data into an algorithm to learn patterns and adjust parameters to optimize performance on the training data. 
Generalization: The ability of a trained model to perform well on new, unseen data by applying the patterns learned from the training set to real-world scenarios. 
Core Components of the Machine Learning Process
Data Collection: Gathering relevant and sufficient data to be used for training the model. 
Data Preparation: Cleaning, structuring, and transforming the raw data into a format that is suitable for the machine learning algorithm. 
Algorithm Selection: Choosing the right type of algorithm (e.g., supervised, unsupervised, reinforcement) that is best suited for the problem at hand. 
Model Training: Training the selected algorithm on the prepared data to help it learn patterns and relationships. 
Model Evaluation: Testing the trained model's performance on new, unseen data to assess its accuracy and ability to generalize. 
Inference (Deployment): Using the finalized and well-performing model to make predictions or decisions on real-world data. 
Types of Machine Learning
Supervised Learning: Models are trained on labeled data (input-output pairs) to predict or classify new data. 
Unsupervised Learning: Models discover hidden patterns, structures, and clusters in unlabeled data without predefined output. 
Reinforcement Learning: Models learn by trial and error, receiving rewards or penalties for their actions to maximize performance in decision-making tasks. 
"""
}

for filepath,content in sample_texts.items():
    with open(filepath,'w',encoding="utf-8") as f:
        f.write(content)

print("Sample text files created!")

Sample text files created!
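
The triple-quoted strings in `sample_texts` begin with a newline (and, for the first file, four spaces of indentation), and that whitespace is written to disk verbatim. If it is unwanted, one option is to normalize each text before writing it. The sketch below is only an illustration: `normalize_text` is a hypothetical helper, not part of the notebook or of LangChain.

```python
def normalize_text(text: str) -> str:
    """Strip surrounding blank space and any per-line indentation."""
    lines = text.strip().splitlines()
    return "\n".join(line.strip() for line in lines)


# A shortened stand-in for one of the sample texts
raw = "\n    Python is a high-level language.\n\nKey Characteristics:\nReadability: ...\n"
print(repr(normalize_text(raw)))
```

Calling `f.write(normalize_text(content))` in the loop above would apply it to every file.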


In [5]:
##Loading the text

# Document loaders live in langchain_community in current LangChain releases
from langchain_community.document_loaders import TextLoader

loader=TextLoader("../data/text_files/python_intro.txt",encoding="utf-8")
document=loader.load()

print(document)

[Document(metadata={'source': '../data/text_files/python_intro.txt'}, page_content='\n    Python is a high-level, interpreted programming language known for its readability and versatility. It emphasizes clear and concise code, making it an excellent choice for beginners. Python\'s design prioritizes developer productivity and code maintainability.\n\nKey Characteristics:\nReadability: Python\'s syntax is designed to be intuitive and easy to understand, often resembling natural language. It uses indentation to define code blocks, promoting a clean and consistent style.\nInterpreted Language: Python code is executed line by line by an interpreter, without the need for a separate compilation step. This allows for rapid development and testing.\nDynamically Typed: Variable types in Python are determined at runtime, meaning you don\'t explicitly declare the type of a variable when you create it.\nVersatility: Python is a general-purpose language used in various domains, including web devel

In [6]:
## Directory Loading

from langchain_community.document_loaders import DirectoryLoader

dir_loader=DirectoryLoader("../data/text_files",
                       glob="**/*.txt",
                       loader_cls=TextLoader,
                       loader_kwargs={
                       "encoding":"utf-8"},
                       show_progress=False)
document=dir_loader.load()

print(document)

[Document(metadata={'source': '../data/text_files/python_intro.txt'}, page_content='\n    Python is a high-level, interpreted programming language known for its readability and versatility. It emphasizes clear and concise code, making it an excellent choice for beginners. Python\'s design prioritizes developer productivity and code maintainability.\n\nKey Characteristics:\nReadability: Python\'s syntax is designed to be intuitive and easy to understand, often resembling natural language. It uses indentation to define code blocks, promoting a clean and consistent style.\nInterpreted Language: Python code is executed line by line by an interpreter, without the need for a separate compilation step. This allows for rapid development and testing.\nDynamically Typed: Variable types in Python are determined at runtime, meaning you don\'t explicitly declare the type of a variable when you create it.\nVersatility: Python is a general-purpose language used in various domains, including web devel
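
The `glob="**/*.txt"` pattern passed to `DirectoryLoader` follows the usual recursive-glob convention: `**` matches the directory itself and any depth of subdirectories. The standard-library sketch below illustrates that matching behavior with `pathlib` on a throwaway directory; the file names and the `find_files` helper are made up for the example, and `DirectoryLoader`'s own file discovery may differ in details such as hidden-file handling.

```python
import tempfile
from pathlib import Path


def find_files(root, pattern="**/*.txt"):
    """Return matching paths relative to root, sorted for stable output."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "a.txt").write_text("top level", encoding="utf-8")
    (root / "nested" / "b.txt").write_text("one level down", encoding="utf-8")
    (root / "nested" / "c.pdf").write_text("not a text file", encoding="utf-8")
    found = find_files(root)

print(found)  # ['a.txt', 'nested/b.txt']
```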

In [7]:
## Directory Loading (PDF files)

from langchain_community.document_loaders import DirectoryLoader, PyMuPDFLoader

dir_loader=DirectoryLoader("../data/pdf",
                       glob="**/*.pdf",
                       loader_cls=PyMuPDFLoader,
                       show_progress=False)
document=dir_loader.load()

document


[Document(metadata={'producer': 'PyPDF2', 'creator': '', 'creationdate': '', 'source': '../data/pdf/attention.pdf', 'file_path': '../data/pdf/attention.pdf', 'total_pages': 11, 'format': 'PDF 1.3', 'title': 'Attention is All you Need', 'author': 'Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'keywords': '', 'moddate': '2018-02-12T21:22:10-08:00', 'trapped': '', 'modDate': "D:20180212212210-08'00'", 'creationDate': '', 'page': 0}, page_content='Attention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhi

In [8]:
import os
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader, PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [9]:
### Read all the PDFs inside the directory
def process_all_pdfs(pdf_directory):
    """Process all PDF files in a directory"""
    all_documents = []
    pdf_dir = Path(pdf_directory)
    
    # Find all PDF files recursively
    pdf_files = list(pdf_dir.glob("**/*.pdf"))
    
    print(f"Found {len(pdf_files)} PDF files to process")
    
    for pdf_file in pdf_files:
        print(f"\nProcessing: {pdf_file.name}")
        try:
            loader = PyPDFLoader(str(pdf_file))
            documents = loader.load()
            
            # Add source information to metadata
            for doc in documents:
                doc.metadata['source_file'] = pdf_file.name
                doc.metadata['file_type'] = 'pdf'
            
            all_documents.extend(documents)
            print(f"  ✓ Loaded {len(documents)} pages")
            
        except Exception as e:
            print(f"  ✗ Error: {e}")
    
    print(f"\nTotal documents loaded: {len(all_documents)}")
    return all_documents

# Process all PDFs in the data directory
all_pdf_documents = process_all_pdfs("../data")

Found 4 PDF files to process

Processing: attention.pdf
  ✓ Loaded 11 pages

Processing: Apple2023.pdf
  ✓ Loaded 80 pages

Processing: Nvidia2024.pdf
  ✓ Loaded 130 pages

Processing: Apple2024.pdf
  ✓ Loaded 121 pages

Total documents loaded: 342
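
Because `process_all_pdfs` stamps every page with a `source_file` entry, the per-file page counts can be recovered later from metadata alone. The sketch below shows one way to tally them; it uses plain dicts as stand-ins for `doc.metadata` so that it runs without the PDFs, and `pages_per_file` is a hypothetical helper name.

```python
from collections import Counter


def pages_per_file(metadatas):
    """Count how many page-level documents came from each source file."""
    return Counter(meta.get("source_file", "unknown") for meta in metadatas)


# Stand-in metadata: 11 pages from one file, 80 from another
sample_metadatas = (
    [{"source_file": "attention.pdf", "file_type": "pdf"}] * 11
    + [{"source_file": "Apple2023.pdf", "file_type": "pdf"}] * 80
)
counts = pages_per_file(sample_metadatas)
print(dict(counts))
print("Total:", sum(counts.values()))
```

With the real data this would be called as `pages_per_file(doc.metadata for doc in all_pdf_documents)`.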


In [10]:
### Split the documents into chunks

def split_documents(documents,chunk_size=1000,chunk_overlap=200):
    """Split documents into smaller chunks for better RAG performance"""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
        separators=["\n\n", "\n", " ", ""]
    )
    split_docs = text_splitter.split_documents(documents)
    print(f"Split {len(documents)} documents into {len(split_docs)} chunks")
    
    # Show example of a chunk
    if split_docs:
        print(f"\nExample chunk:")
        print(f"Content: {split_docs[0].page_content[:200]}...")
        print(f"Metadata: {split_docs[0].metadata}")
    
    return split_docs

In [11]:
chunks=split_documents(all_pdf_documents)

Split 342 documents into 1632 chunks

Example chunk:
Content: Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.com
Noam Shazeer∗
Google Brain
noam@google.com
Niki Parmar∗
Google Research
nikip@google.com
Jakob Uszkoreit∗
Google Research
usz...
Metadata: {'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superio
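
To build intuition for what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified sliding-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on the listed separators); it only shows how consecutive chunks share `chunk_overlap` characters. The function name is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


demo = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo)  # ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is the overlap at work.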

#### Embedding and Vector store 

In [12]:
import numpy as np
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.config import Settings
import uuid
from typing import List,Dict,Any,Tuple
from sklearn.metrics.pairwise import cosine_similarity

In [13]:
class EmbeddingManager:
    """Handles document embedding generation using Sentence Transformer"""

    def __init__(self,model_name:str="sentence-transformers/all-MiniLM-L6-v2"):
        """ 
        Initialize the embedding manager
        Args:
            model_name: HuggingFace model name for sentence embeddings
            """
        self.model_name=model_name
        self.model=None
        self._load_model()

    def _load_model(self):
        """Load the SentenceTransformer model"""

        try:
            print(f"Loading embedding model: {self.model_name}")
            self.model=SentenceTransformer(self.model_name)
            print(f"Model loaded successfully. Embedding dimension: {self.model.get_sentence_embedding_dimension()}")
        except Exception as e:
            print(f"Error loading model {self.model_name}: {e}")
            raise


    def generate_embeddings(self,texts:List[str]) ->np.ndarray:
        """ 
        Generate embeddings for a list of texts

        Args:
            texts: List of text strings to embed

        Returns:
            numpy array of embeddings with shape (len(texts),embedding_dim)

        """

        if not self.model:
            raise ValueError("Model not loaded.")
        
        print(f"Generate embeddings for {len(texts)} texts...")
        embeddings=self.model.encode(texts,show_progress_bar=True)
        print(f"Generated embeddings with shape: {embeddings.shape}")
        return embeddings

embedding_manager=EmbeddingManager()
embedding_manager


Loading embedding model: sentence-transformers/all-MiniLM-L6-v2
Model loaded successfully. Embedding dimension: 384


<__main__.EmbeddingManager at 0x7443896b6d20>

#### Vector Store

In [14]:
class VectorStore:
    """Manages document embeddings in a ChromaDB vector store"""

    def __init__(self, collection_name:str="pdf_documents",persist_directory:str="../data/vector_store"):
        """ 
        Initialize the vector store
        
        Args:
            collection_name: Name of the ChromaDB collection
            persist_directory: Directory to persist the vector store"""
        
        self.collection_name = collection_name
        self.persist_directory = persist_directory
        self.client = None
        self.collection = None
        self._initialize_store()
    
    def _initialize_store(self):
        """ Initialize ChromaDB client and collection"""
        try:
            os.makedirs(self.persist_directory,exist_ok=True)
            self.client=chromadb.PersistentClient(path=self.persist_directory)

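            # Request cosine distance so that `1 - distance` is a cosine similarity.
            # Chroma's default space is squared L2. The space is fixed when a
            # collection is first created, so an existing collection keeps its metric.
            self.collection = self.client.get_or_create_collection(
                name=self.collection_name,
                metadata={
                    "description":"PDF document embeddings for RAG",
                    "hnsw:space":"cosine"
                }
            )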

            print(f"Vector store initialized. Collection: {self.collection_name}")
            print(f"Existing documents in collection:{self.collection.count()}")
        except Exception as e:
            print(f"Error initializing vector store: {e}")
            raise
    
    def add_documents(self,documents: List[Any],embeddings:np.ndarray):
        """ 
        Add documents and their embeddings to the vector store

        Args:
            documents: List of LangChain documents
            embeddings: Corresponding embeddings for the documents
        """
    
        if len(documents) != len(embeddings):
            raise ValueError("Number of documents must match number of embeddings")
        
        print(f"Adding {len(documents)} documents to vectore store..")

        #Prepare data for ChromaDB

        ids=[]
        metadatas=[]
        documents_text=[]
        embeddings_list=[]

        for i,(doc,embedding) in enumerate(zip(documents,embeddings)):

            doc_id=f"doc_{uuid.uuid4().hex[:8]}_{i}"
            ids.append(doc_id)

            metadata = dict(doc.metadata)
            metadata['doc_index'] = i
            metadata['content_length'] = len(doc.page_content)
            metadatas.append(metadata)

            documents_text.append(doc.page_content)

            embeddings_list.append(embedding.tolist())


        try:
            self.collection.add(
                ids=ids,
                embeddings=embeddings_list,
                metadatas=metadatas,
                documents=documents_text
            )
            print(f"Sucessfully added {len(documents)} documents to vector store.")
            print(f"Total documents in collection: {self.collection.count()}")

        except Exception as e:
            print(f"Error adding documents to vector store: {e}")
            raise

vectorstore=VectorStore()
vectorstore

Vector store initialized. Collection: pdf_documents
Existing documents in collection:3264


<__main__.VectorStore at 0x74433784f590>

In [15]:
chunks

[Document(metadata={'producer': 'PyPDF2', 'creator': 'PyPDF', 'creationdate': '', 'subject': 'Neural Information Processing Systems http://nips.cc/', 'publisher': 'Curran Associates, Inc.', 'language': 'en-US', 'created': '2017', 'eventtype': 'Poster', 'description-abstract': 'The dominant sequence transduction models are based on complex recurrent orconvolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attentionm echanisms.  We propose a novel, simple network architecture based solely onan attention mechanism, dispensing with recurrence and convolutions entirely.Experiments on two machine translation tasks show these models to be superiorin quality while being more parallelizable and requiring significantly less timeto train. Our single model with 165 million parameters, achieves 27.5 BLEU onEnglish-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On 

In [16]:
### Convert the text to embeddings
texts=[doc.page_content for doc in chunks]

## Generate the Embeddings

embeddings=embedding_manager.generate_embeddings(texts)

## Store in the vector database
vectorstore.add_documents(chunks,embeddings)

Generate embeddings for 1632 texts...


Batches: 100%|██████████| 51/51 [00:02<00:00, 21.79it/s]


Generated embeddings with shape: (1632, 384)
Adding 1632 documents to vectore store..
Sucessfully added 1632 documents to vector store.
Total documents in collection: 4896
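
The collection grew from 3264 to 4896 entries although only 1632 chunks were produced, which suggests that each run of the notebook adds the same chunks again under fresh random IDs. One way to make re-runs idempotent is to derive each ID from the chunk itself. The sketch below is an assumption-laden illustration, not the notebook's method: `stable_chunk_id` is a hypothetical helper and the choice of fields to hash is arbitrary.

```python
import hashlib


def stable_chunk_id(source: str, page, content: str) -> str:
    """Derive a deterministic ID from where a chunk came from and what it says."""
    key = f"{source}|{page}|{content}".encode("utf-8")
    return "doc_" + hashlib.sha256(key).hexdigest()[:16]


first = stable_chunk_id("attention.pdf", 0, "Attention Is All You Need")
again = stable_chunk_id("attention.pdf", 0, "Attention Is All You Need")
other = stable_chunk_id("attention.pdf", 1, "Attention Is All You Need")
print(first == again, first == other)
```

With IDs like these, a call such as `collection.upsert(...)` in place of `collection.add(...)` should overwrite existing entries instead of duplicating them; check this against the chromadb version you have installed.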


#### Retriever Pipeline From VectorStore


In [17]:
class RAGRetriever:
    """Handles query-based retrieval from the vector store"""
    
    def __init__(self, vector_store: VectorStore, embedding_manager: EmbeddingManager):
        """
        Initialize the retriever
        
        Args:
            vector_store: Vector store containing document embeddings
            embedding_manager: Manager for generating query embeddings
        """
        self.vector_store = vector_store
        self.embedding_manager = embedding_manager

    def retrieve(self, query: str, top_k: int = 5, score_threshold: float = 0.0) -> List[Dict[str, Any]]:
        """
        Retrieve relevant documents for a query
        
        Args:
            query: The search query
            top_k: Number of top results to return
            score_threshold: Minimum similarity score threshold
            
        Returns:
            List of dictionaries containing retrieved documents and metadata
        """
        print(f"Retrieving documents for query: '{query}'")
        print(f"Top K: {top_k}, Score threshold: {score_threshold}")
        
        # Generate query embedding
        query_embedding = self.embedding_manager.generate_embeddings([query])[0]
        
        # Search in vector store
        try:
            results = self.vector_store.collection.query(
                query_embeddings=[query_embedding.tolist()],
                n_results=top_k
            )
            
            # Process results
            retrieved_docs = []
            
            if results['documents'] and results['documents'][0]:
                documents = results['documents'][0]
                metadatas = results['metadatas'][0]
                distances = results['distances'][0]
                ids = results['ids'][0]
                
                for i, (doc_id, document, metadata, distance) in enumerate(zip(ids, documents, metadatas, distances)):
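                    # Convert distance to a similarity score. This is a cosine similarity
                    # only if the collection uses cosine distance; with Chroma's default
                    # squared-L2 space, 1 - distance can be negative.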
                    similarity_score = 1 - distance
                    
                    if similarity_score >= score_threshold:
                        retrieved_docs.append({
                            'id': doc_id,
                            'content': document,
                            'metadata': metadata,
                            'similarity_score': similarity_score,
                            'distance': distance,
                            'rank': i + 1
                        })
                
                print(f"Retrieved {len(retrieved_docs)} documents (after filtering)")
            else:
                print("No documents found")
            
            return retrieved_docs
            
        except Exception as e:
            print(f"Error during retrieval: {e}")
            return []

rag_retriever=RAGRetriever(vectorstore,embedding_manager)

In [18]:
rag_retriever

<__main__.RAGRetriever at 0x744334952930>

In [31]:
rag_retriever.retrieve("What is attention is all you need?")

Retrieving documents for query: 'What is attention is all you need?'
Top K: 5, Score threshold: 0.0
Generate embeddings for 1 texts...


Batches: 100%|██████████| 1/1 [00:00<00:00, 76.94it/s]

Generated embeddings with shape: (1, 384)
Retrieved 0 documents (after filtering)





[]
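
An empty result with a threshold of 0.0 means every `1 - distance` value came out negative. That can happen when the collection uses Chroma's default squared-L2 space instead of cosine distance. Assuming the embeddings are unit-length (an assumption worth verifying for the model in use), squared L2 distance `d` and cosine similarity are related by `cos = 1 - d / 2`, so `1 - d` underestimates the similarity. The standard-library sketch below checks that relationship on two toy vectors; all helper names are illustrative.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_from_squared_l2(distance):
    """Valid only for unit-length vectors: cos = 1 - d / 2."""
    return 1.0 - distance / 2.0


a = normalize([1.0, 2.0, 3.0])
b = normalize([3.0, 1.0, 0.0])
d = squared_l2(a, b)
print("cosine similarity:      ", round(cosine_similarity(a, b), 4))
print("recovered from L2:      ", round(similarity_from_squared_l2(d), 4))
print("naive 1 - distance:     ", round(1 - d, 4))
```

The naive value is negative here even though the two vectors are moderately similar, so a `score_threshold` of 0.0 would filter the pair out.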

In [21]:
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from langchain_core.messages import HumanMessage, SystemMessage

In [22]:
class GroqLLM:
    def __init__(self, model_name: str = "gemma2-9b-it", api_key: str =None):
        """
        Initialize Groq LLM
        
        Args:
            model_name: Groq model name (qwen2-72b-instruct, llama3-70b-8192, etc.)
            api_key: Groq API key (or set GROQ_API_KEY environment variable)
        """
        self.model_name = model_name
        self.api_key = api_key or os.getenv("GROQ_API_KEY")
        
        if not self.api_key:
            raise ValueError("Groq API key is required. Set GROQ_API_KEY environment variable or pass api_key parameter.")
        
        self.llm = ChatGroq(
            groq_api_key=self.api_key,
            model_name=self.model_name,
            temperature=0.1,
            max_tokens=1024
        )
        
        print(f"Initialized Groq LLM with model: {self.model_name}")

    def generate_response(self, query: str, context: str, max_length: int = 500) -> str:
        """
        Generate response using retrieved context
        
        Args:
            query: User question
            context: Retrieved document context
            max_length: Maximum response length
            
        Returns:
            Generated response string
        """
        
        # Create prompt template
        prompt_template = PromptTemplate(
            input_variables=["context", "question"],
            template="""You are a helpful AI assistant. Use the following context to answer the question accurately and concisely.

Context:
{context}

Question: {question}

Answer: Provide a clear and informative answer based on the context above. If the context doesn't contain enough information to answer the question, say so."""
        )
        
        # Format the prompt
        formatted_prompt = prompt_template.format(context=context, question=query)
        
        try:
            # Generate response
            messages = [HumanMessage(content=formatted_prompt)]
            response = self.llm.invoke(messages)
            return response.content
            
        except Exception as e:
            return f"Error generating response: {str(e)}"
        
    def generate_response_simple(self, query: str, context: str) -> str:
        """
        Simple response generation without complex prompting
        
        Args:
            query: User question
            context: Retrieved context
            
        Returns:
            Generated response
        """
        simple_prompt = f"""Based on this context: {context}

Question: {query}

Answer:"""
        
        try:
            messages = [HumanMessage(content=simple_prompt)]
            response = self.llm.invoke(messages)
            return response.content
        except Exception as e:
            return f"Error: {str(e)}"

In [23]:
# Initialize Groq LLM (you'll need to set GROQ_API_KEY environment variable)
try:
    groq_llm = GroqLLM(api_key=os.getenv("GROQ_API_KEY"))
    print("Groq LLM initialized successfully!")
except ValueError as e:
    print(f"Warning: {e}")
    print("Please set your GROQ_API_KEY environment variable to use the LLM.")
    groq_llm = None

Initialized Groq LLM with model: gemma2-9b-it
Groq LLM initialized successfully!


In [24]:
rag_retriever.retrieve("Unified Multi-task Learning Framework")

Retrieving documents for query: 'Unified Multi-task Learning Framework'
Top K: 5, Score threshold: 0.0
Generate embeddings for 1 texts...


Batches: 100%|██████████| 1/1 [00:00<00:00, 503.82it/s]

Generated embeddings with shape: (1, 384)
Retrieved 0 documents (after filtering)





[]

In [25]:
### Simple RAG pipeline with Groq LLM
from langchain_groq import ChatGroq
import os
from dotenv import load_dotenv
load_dotenv()

### Initialize the Groq LLM (set your GROQ_API_KEY in environment)
groq_api_key = os.getenv("GROQ_API_KEY")

llm=ChatGroq(groq_api_key=groq_api_key,model_name="gemma2-9b-it",temperature=0.1,max_tokens=1024)

## 2. Simple RAG function: retrieve context + generate response
def rag_simple(query,retriever,llm,top_k=3):
    ## retrieve the context
    results=retriever.retrieve(query,top_k=top_k)
    context="\n\n".join([doc['content'] for doc in results]) if results else ""
    if not context:
        return "No relevant context found to answer the question."
    
    ## generate the answer using the Groq LLM
    prompt=f"""Use the following context to answer the question concisely.
        Context:
        {context}

        Question: {query}

        Answer:"""
    
    # The f-string is already fully interpolated. Calling .format() on it again
    # would fail whenever the retrieved text contains literal braces.
    response=llm.invoke(prompt)
    return response.content
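
Retrieved PDF text can contain literal braces, so a prompt that has been built with an f-string should not be passed through `str.format` a second time. The sketch below demonstrates the failure mode on a toy context; `build_prompt` is a hypothetical helper that mirrors the prompt layout used here.

```python
def build_prompt(context: str, query: str) -> str:
    """Interpolate context and query exactly once."""
    return (
        "Use the following context to answer the question concisely.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )


context = "config = {a: 1}"  # retrieved text that happens to contain braces
prompt = build_prompt(context, "What is a?")

try:
    # A second formatting pass treats "{a: 1}" as a replacement field
    prompt.format(context=context, query="What is a?")
    second_format_failed = False
except (KeyError, ValueError, IndexError):
    second_format_failed = True

print("Second .format() call failed:", second_format_failed)
```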

In [30]:
answer=rag_simple("What is attention is all you need?",rag_retriever,llm)
print(answer)

Retrieving documents for query: 'What is attention is all you need?'
Top K: 3, Score threshold: 0.0
Generate embeddings for 1 texts...


Batches: 100%|██████████| 1/1 [00:00<00:00, 76.15it/s]

Generated embeddings with shape: (1, 384)
Retrieved 0 documents (after filtering)
No relevant context found to answer the question.





In [34]:
# --- Enhanced RAG Pipeline Features ---
def rag_advanced(query, retriever, llm, top_k=5, min_score=0.2, return_context=False):
    """
    RAG pipeline with extra features:
    - Returns answer, sources, confidence score, and optionally full context.
    """
    results = retriever.retrieve(query, top_k=top_k, score_threshold=min_score)
    if not results:
        return {'answer': 'No relevant context found.', 'sources': [], 'confidence': 0.0, 'context': ''}
    
    # Prepare context and sources
    context = "\n\n".join([doc['content'] for doc in results])
    sources = [{
        'source': doc['metadata'].get('source_file', doc['metadata'].get('source', 'unknown')),
        'page': doc['metadata'].get('page', 'unknown'),
        'score': doc['similarity_score'],
        'preview': doc['content'][:300] + '...'
    } for doc in results]
    confidence = max([doc['similarity_score'] for doc in results])
    
    # Generate answer
    prompt = f"""Use the following context to answer the question concisely.\nContext:\n{context}\n\nQuestion: {query}\n\nAnswer:"""
    # The f-string is already interpolated; a second .format() would break on literal braces
    response = llm.invoke(prompt)
    
    output = {
        'answer': response.content,
        'sources': sources,
        'confidence': confidence
    }
    if return_context:
        output['context'] = context
    return output

# Example usage:
result = rag_advanced("What is Nvidia's technological advancement in 2024?", rag_retriever, llm, top_k=3, min_score=0.1, return_context=True)
print("Answer:", result['answer'])
print("Sources:", result['sources'])
print("Confidence:", result['confidence'])
print("Context Preview:", result['context'][:300])

Retrieving documents for query: 'What is Nvidia's technological advancement in 2024?'
Top K: 3, Score threshold: 0.1
Generate embeddings for 1 texts...


Batches: 100%|██████████| 1/1 [00:00<00:00, 79.32it/s]

Generated embeddings with shape: (1, 384)
Retrieved 3 documents (after filtering)





Answer: The provided text does not contain information about Nvidia's technological advancements in 2024. 



Sources: [{'source': 'Nvidia2024.pdf', 'page': 35, 'score': 0.3111526370048523, 'preview': 'Overview\nOur Company and Our Businesses\nNVIDIA pioneered accelerated computing to help solve the most challenging computational problems. Since our original focus on PC graphics, we have\nexpanded to several other large and important computationally intensive fields. Fueled by the sustained demand fo...'}, {'source': 'Nvidia2024.pdf', 'page': 35, 'score': 0.3111526370048523, 'preview': 'Overview\nOur Company and Our Businesses\nNVIDIA pioneered accelerated computing to help solve the most challenging computational problems. Since our original focus on PC graphics, we have\nexpanded to several other large and important computationally intensive fields. Fueled by the sustained demand fo...'}, {'source': 'Nvidia2024.pdf', 'page': 35, 'score': 0.3111526370048523, 'preview': 'Overview\nOur 
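
All three sources above are the same chunk with the same score, which is what one would expect if the collection holds several copies of every chunk. Until the store is rebuilt, duplicates can be filtered after retrieval. The sketch below keeps the first occurrence of each distinct `content` value; `dedupe_results` is a hypothetical helper that operates on the list of dicts returned by `RAGRetriever.retrieve`.

```python
def dedupe_results(results):
    """Drop results whose content was already seen, preserving rank order."""
    seen = set()
    unique = []
    for doc in results:
        if doc["content"] not in seen:
            seen.add(doc["content"])
            unique.append(doc)
    return unique


# Stand-in retrieval results with one duplicated chunk
sample_results = [
    {"id": "doc_a", "content": "Overview", "similarity_score": 0.31},
    {"id": "doc_b", "content": "Overview", "similarity_score": 0.31},
    {"id": "doc_c", "content": "Risk Factors", "similarity_score": 0.25},
]
unique_results = dedupe_results(sample_results)
print([doc["id"] for doc in unique_results])  # ['doc_a', 'doc_c']
```

Because deduplication can leave fewer than `top_k` results, it may help to request a larger `top_k` and trim afterwards.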

In [37]:
# --- Advanced RAG Pipeline: Streaming, Citations, History, Summarization ---
from typing import List, Dict, Any
import time

class AdvancedRAGPipeline:
    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm
        self.history = []  # Store query history

    def query(self, question: str, top_k: int = 5, min_score: float = 0.2, stream: bool = False, summarize: bool = False) -> Dict[str, Any]:
        # Retrieve relevant documents
        results = self.retriever.retrieve(question, top_k=top_k, score_threshold=min_score)
        if not results:
            answer = "No relevant context found."
            sources = []
            context = ""
        else:
            context = "\n\n".join([doc['content'] for doc in results])
            sources = [{
                'source': doc['metadata'].get('source_file', doc['metadata'].get('source', 'unknown')),
                'page': doc['metadata'].get('page', 'unknown'),
                'score': doc['similarity_score'],
                'preview': doc['content'][:120] + '...'
            } for doc in results]
            prompt = f"""Use the following context to answer the question concisely.\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"""
            # The f-string is already interpolated, so pass it to the model as-is
            response = self.llm.invoke(prompt)
            answer = response.content
            # Streaming answer simulation: print the finished answer in 80-character slices
            if stream:
                print("Streaming answer:")
                for i in range(0, len(answer), 80):
                    print(answer[i:i+80], end='', flush=True)
                    time.sleep(0.05)
                print()

        # Add citations to answer
        citations = [f"[{i+1}] {src['source']} (page {src['page']})" for i, src in enumerate(sources)]
        answer_with_citations = answer + "\n\nCitations:\n" + "\n".join(citations) if citations else answer

        # Optionally summarize answer
        summary = None
        if summarize and answer:
            summary_prompt = f"Summarize the following answer in 2 sentences:\n{answer}"
            summary_resp = self.llm.invoke([summary_prompt])
            summary = summary_resp.content

        # Store query history
        self.history.append({
            'question': question,
            'answer': answer,
            'sources': sources,
            'summary': summary
        })

        return {
            'question': question,
            'answer': answer_with_citations,
            'sources': sources,
            'summary': summary,
            'history': self.history
        }

# Example usage:
adv_rag = AdvancedRAGPipeline(rag_retriever, llm)
result = adv_rag.query("what is attention is all you need", top_k=3, min_score=0.2, stream=True, summarize=True)
print("\nFinal Answer:", result['answer'])
print("Summary:", result['summary'])
print("History:", result['history'][-1])

Retrieving documents for query: 'what is attention is all you need'
Top K: 3, Score threshold: 0.2
Generate embeddings for 1 texts...


Batches: 100%|██████████| 1/1 [00:00<00:00, 502.13it/s]

Generated embeddings with shape: (1, 384)
Retrieved 0 documents (after filtering)






Final Answer: No relevant context found.
Summary: The provided text does not contain any information that can be summarized.  There is no relevant context to draw upon. 



History: {'question': 'what is attention is all you need', 'answer': 'No relevant context found.', 'sources': [], 'summary': 'The provided text does not contain any information that can be summarized.  There is no relevant context to draw upon. \n\n\n'}
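
The `stream=True` branch in the pipeline is only a simulation. With a client that yields output incrementally, the usual pattern is to print each chunk as it arrives while accumulating the full answer for citations and history. The sketch below shows that pattern with a hypothetical stand-in generator, `fake_token_stream`, in place of a real streaming LLM call; nothing here reflects the Groq API.

```python
def fake_token_stream(text):
    """Hypothetical stand-in for a streaming LLM: yields the text word by word."""
    words = text.split(" ")
    for i, word in enumerate(words):
        # Re-attach the separating space to every word except the last
        yield word if i == len(words) - 1 else word + " "


def stream_and_collect(chunks, emit=print):
    """Emit each chunk immediately and return the concatenated answer."""
    parts = []
    for chunk in chunks:
        emit(chunk, end="", flush=True)
        parts.append(chunk)
    emit()  # finish the line once the stream ends
    return "".join(parts)


full_answer = stream_and_collect(fake_token_stream("The Transformer relies on attention."))
```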
