# NLP Lecture 4 Example: Retrieval Augmented Generation
---


## RAGs

To run LLMs locally, download Ollama from here:

https://ollama.com/

Then pull a model from your terminal:

`ollama pull mistral`

And to run it:

`ollama run mistral`


RAG pipeline

![RAG pipeline](rag_pipeline.gif)

## Documents and Database Preparation

We will use Wikipedia articles as the documents for our RAG system.

In [None]:
%pip install chromadb
%pip install ollama
%pip install wikipedia
%pip install wikipedia-api
%pip install langchain langchain-community langchain-text-splitters

In [None]:
import os
import shutil
import chromadb
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma

In [None]:
# Define chroma and data path
chroma_path = 'chroma'
data_path = 'data'

In [None]:
from langchain_community.document_loaders import WikipediaLoader

def load_wikipedia_articles(topic: str):
    """Load Wikipedia articles based on a given topic."""
    loader = WikipediaLoader(query=topic, lang="en")
    documents = loader.load()
    
    print(f"Loaded {len(documents)} Wikipedia articles on '{topic}'")
    return documents

wikipedia_docs = load_wikipedia_articles("Retrieval-Augmented Generation")


In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_documents(documents):
    """Chunk Wikipedia documents into smaller pieces for embedding."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,  # Maximum size of each chunk, in characters
        chunk_overlap=80,  # Characters shared between consecutive chunks
    )
    chunks = splitter.split_documents(documents)
    
    print(f"Split Wikipedia content into {len(chunks)} chunks")
    return chunks

# Split the Wikipedia documents into chunks
chunks = split_documents(wikipedia_docs)
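
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a plain sliding-window splitter. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this sketch does not do. The function name `sliding_window_chunks` and the sample string are made up for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces

sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for piece in demo_chunks:
    print(piece)
```

Each printed piece starts with the last three characters of the previous one, which is the kind of overlap `chunk_overlap=80` gives between neighbouring chunks above.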


In [None]:
def calculate_chunk_ids(chunks):
    """Build one positional ID ("id0", "id1", ...) per chunk."""
    chunk_ids = [f"id{x}" for x in range(len(chunks))]
    return chunk_ids

print(calculate_chunk_ids(chunks))
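
Positional IDs such as `id0` and `id1` depend only on a chunk's position, so chunks from a different topic would get the same IDs and be treated as duplicates. One possible alternative, sketched here under that assumption, is to derive the ID from the chunk's content. The helper `content_hash_id` and the sample strings are hypothetical and not part of the notebook.

```python
import hashlib

def content_hash_id(text: str, source: str = "") -> str:
    """Derive a stable ID from the chunk text and its source (hypothetical helper)."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return f"id-{digest[:16]}"  # a short prefix of the hash keeps IDs readable

id_a = content_hash_id("RAG combines retrieval with generation.", source="wiki/RAG")
id_b = content_hash_id("RAG combines retrieval with generation.", source="wiki/RAG")
id_c = content_hash_id("A different chunk of text.", source="wiki/RAG")

print(id_a == id_b)  # same content gives the same ID
print(id_a == id_c)  # different content gives a different ID
```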

## Vector Embedding

In this section, we will use the `OllamaEmbeddings` class from the `langchain_community` library to embed our document chunks. It converts text into numerical vectors, which can be used for tasks such as similarity search and clustering; here we use them for similarity search.

`OllamaEmbeddings` is initialized with `nomic-embed-text`, a model designed for embedding text. Pull it first with `ollama pull nomic-embed-text`.

In [None]:
from langchain_community.embeddings.ollama import OllamaEmbeddings

def embedding_function():
    embeddings = OllamaEmbeddings(model='nomic-embed-text')
    return embeddings
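
As a rough illustration of why vectors are useful for search, the sketch below compares small hand-written vectors with cosine similarity. The vectors are invented toy values, not real `nomic-embed-text` output, and the vector store may use a different metric (for example a distance rather than a similarity).

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (made up for illustration)
query_vec = [1.0, 0.0, 1.0]
close_vec = [0.9, 0.1, 0.8]
far_vec = [0.0, 1.0, 0.0]

sim_close = cosine_similarity(query_vec, close_vec)
sim_far = cosine_similarity(query_vec, far_vec)
print(f"close: {sim_close:.3f}, far: {sim_far:.3f}")
```

The vector pointing in nearly the same direction as the query scores close to 1, while the orthogonal one scores 0.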

### Add the document chunks to the Chroma DB using `OllamaEmbeddings`

In [None]:
def add_to_chroma(chunks: list):
    """Adds Wikipedia chunks to ChromaDB while checking for duplicates."""
    # Initialize ChromaDB
    db = Chroma(
        persist_directory=chroma_path, 
        embedding_function=embedding_function()
    )

    # Generate unique IDs for chunks
    chunk_ids = calculate_chunk_ids(chunks)

    # Fetch existing document IDs
    existing_items = db.get(include=[])
    existing_ids = set(existing_items["ids"])
    print(f"Number of existing documents in DB: {len(existing_ids)}")

    # Keep only (id, chunk) pairs whose ID is not already in the database,
    # so each new chunk stays paired with its own ID
    new_pairs = [(cid, chunk) for cid, chunk in zip(chunk_ids, chunks) if cid not in existing_ids]
    new_chunks = [chunk for _cid, chunk in new_pairs]

    if new_chunks:
        print(f"Adding {len(new_chunks)} new documents...")
        new_chunk_ids = [cid for cid, _chunk in new_pairs]
        db.add_documents(new_chunks, ids=new_chunk_ids)
        db.persist()
        print("Successfully added new documents to ChromaDB!")
    else:
        print("No new documents to add.")

In [None]:
add_to_chroma(chunks)

In [None]:
# Reconnect to the existing ChromaDB instance
def load_chroma():
    """Load the existing ChromaDB with stored documents."""
    return Chroma(
        persist_directory=chroma_path,  # Load from saved DB
        embedding_function=embedding_function()
    )

# Load ChromaDB and check how many documents exist
db = load_chroma()
existing_items = db.get(include=["documents"])
print(f"Total documents in ChromaDB: {len(existing_items['documents'])}")

## Query Data

In this section, we will query the data using the RAG system. We will use the `OllamaEmbeddings` model to embed the query text and search the Chroma database for relevant documents. The results will be formatted and displayed along with their sources.

The following steps will be performed:
1. Prepare the Chroma database with the embedding function.
2. Search the database for the most similar documents to the query text.
3. Format the retrieved chunks into a prompt, send it to the LLM, and display the response along with the sources.

The `query_rag` function will handle these steps and return the response.
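
The steps above can be sketched without any external services. In this toy version, a dictionary stands in for the Chroma database and a word-overlap count stands in for embedding similarity, so it only illustrates the flow of retrieve, then build a prompt; all names (`toy_store`, `toy_score`, `toy_retrieve`, `toy_build_prompt`) and texts are hypothetical.

```python
def toy_score(query: str, text: str) -> int:
    """Count shared lowercase words; a crude stand-in for embedding similarity."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def toy_retrieve(query: str, store: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return the k (id, text) pairs that share the most words with the query."""
    ranked = sorted(store.items(), key=lambda item: toy_score(query, item[1]), reverse=True)
    return ranked[:k]

def toy_build_prompt(query: str, hits: list[tuple[str, str]]) -> str:
    """Join the retrieved texts into a context block and append the question."""
    context = "\n\n---\n\n".join(text for _doc_id, text in hits)
    return (
        "Answer the question based only on the following context:\n\n"
        f"{context}\n\n---\n\nQuestion: {query}"
    )

# A dictionary playing the role of the vector database (made-up content)
toy_store = {
    "id0": "retrieval finds relevant documents for a query",
    "id1": "bananas are yellow",
    "id2": "generation uses retrieved documents as context",
}

toy_query = "how does retrieval find documents"
toy_hits = toy_retrieve(toy_query, toy_store, k=2)
toy_prompt = toy_build_prompt(toy_query, toy_hits)
toy_sources = [doc_id for doc_id, _text in toy_hits]

print(toy_prompt)
print("Sources:", toy_sources)
```

In the real pipeline the prompt would then be passed to the LLM; the sketch stops before that step because it needs a running Ollama server.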

In [None]:
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.llms.ollama import Ollama

chroma_path = 'chroma'

PROMPT_TEMPLATE = """
Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""

In [None]:
def query_rag(query_text: str):
    """Query the RAG system with a user question."""
    
    # Load the existing ChromaDB
    embedding = embedding_function()
    db = Chroma(persist_directory=chroma_path, embedding_function=embedding)

    # Perform similarity search to retrieve relevant chunks
    results = db.similarity_search_with_score(query_text, k=3)  # Top 3 relevant docs

    if not results:
        print("No relevant documents found!")
        return
    
    # Combine the results into a single context string
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])

    # Format the prompt with the retrieved context
    prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    prompt = prompt_template.format(context=context_text, question=query_text)
    
    print(f"Querying Ollama with prompt:\n{prompt}\n")

    # Query the Ollama model
    model = Ollama(model="mistral")  # Ensure Ollama is running
    response_text = model.invoke(prompt)

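    # Extract a source for each chunk; the chunks are stored without an "id"
    # in their metadata, so fall back to the "source" field if the loader set one
    sources = [doc.metadata.get("id", doc.metadata.get("source", "Unknown")) for doc, _score in results]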

    # Display results
    print("Response:\n", response_text)
    print("\nSources:", sources)

    return response_text, sources
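
The score returned by `similarity_search_with_score` on the Chroma wrapper is, as far as the LangChain documentation describes it, a distance: lower means more similar. A possible refinement, sketched here with invented scores and a hypothetical `max_distance` threshold, is to drop weak matches before building the context. A sensible threshold depends on the embedding model and metric, so the value below is only a placeholder.

```python
def filter_by_distance(results: list[tuple[str, float]], max_distance: float) -> list[tuple[str, float]]:
    """Keep (item, distance) pairs whose distance is at most max_distance."""
    return [(item, dist) for item, dist in results if dist <= max_distance]

# Invented (chunk, distance) pairs shaped like the search results
fake_results = [
    ("chunk about RAG", 0.21),
    ("chunk about retrieval", 0.48),
    ("unrelated chunk", 1.35),
]

kept = filter_by_distance(fake_results, max_distance=0.8)
for item, dist in kept:
    print(f"{dist:.2f}  {item}")
```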

In [None]:
query_rag("How does Retrieval-Augmented Generation work?")