<a href="https://colab.research.google.com/github/korondipeter-dev/myrepo/blob/main/Rag_short_version_with_ollama.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook builds a complete Retrieval-Augmented Generation (RAG) system over PDF documents, using Ollama for embeddings and response generation and ChromaDB for vector storage.

The notebook is structured as follows:

1.  **Setup and Installation**: Installs the necessary libraries and tools, including PyMuPDF (imported as `fitz`) for PDF processing, `ollama` for interacting with the Ollama service, and `chromadb` for the vector database. It also includes commands to set up Ollama itself.
2.  **PDF Processing and Embedding**: Defines and uses a function `process_pdf_to_chroma` to:
    *   Extract text from PDF files in a specified input directory (`/content/input`).
    *   Chunk the extracted text into smaller pieces.
    *   Generate embeddings for each chunk using a specified Ollama model (`mxbai-embed-large`).
    *   Add the generated embeddings, original text chunks, and unique IDs to a ChromaDB collection (`pdf_embeddings`).
3.  **Interactive Question Answering**: Sets up an interactive loop where the user can ask questions. For each query:
    *   An embedding is generated for the user's question.
    *   Relevant document chunks are retrieved from the ChromaDB collection based on the query embedding.
    *   A prompt is constructed combining the user's query and the retrieved context.
    *   A response is generated using another Ollama model (`gemma3:4b`) based on the prompt.
    *   The generated response is displayed to the user.
4.  **Database Management**: Includes a utility function `empty_chroma_db` to clear all data from the ChromaDB collection, allowing for easy resetting of the database.
5.  **References**: Provides links to external resources related to RAG concepts and implementations.

To use this notebook:

1.  Ensure Ollama is running and the required models (`mxbai-embed-large` and `gemma3:4b`) are pulled.
2.  Place your PDF documents in the `/content/input` directory.
3.  Run the cells sequentially.
4.  Use the interactive loop to ask questions about the content of your PDF files.
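
Step 1 can also be checked programmatically. The sketch below is one possible way to do it, assuming the Ollama server is reachable at `http://localhost:11434` and that its `/api/tags` endpoint lists the locally pulled models. The helper names `normalize`, `missing_models`, and `installed_models` are hypothetical and not part of this notebook.

```python
import json
import urllib.request

# Models this notebook expects to find on the Ollama server
REQUIRED_MODELS = ["mxbai-embed-large", "gemma3:4b"]

def normalize(name: str) -> str:
    """Append the implicit ':latest' tag when a model name carries no tag."""
    return name if ":" in name else f"{name}:latest"

def missing_models(required, installed):
    """Return the required models that do not appear in the installed list."""
    installed_set = {normalize(n) for n in installed}
    return [m for m in required if normalize(m) not in installed_set]

def installed_models(host: str = "http://localhost:11434"):
    """Ask a running Ollama server for the names of its local models."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]
```

Calling `missing_models(REQUIRED_MODELS, installed_models())` should return an empty list once both models are pulled. If the server is not running, `installed_models` raises an exception (typically `urllib.error.URLError`).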

### Install necessary libraries and create folders, install services

This cell creates the input folder and installs the required Python libraries for PDF processing, embedding generation with Ollama, and vector storage with ChromaDB. It also installs `lshw`, a hardware-listing tool that the Ollama install script can use to detect an NVIDIA GPU, and then installs Ollama itself.

In [None]:
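!mkdir -p /content/input
# PyMuPDF provides the `fitz` module. The separate PyPI package named `fitz` is unrelated and should not be installed.
!pip install PyMuPDF
!apt install -y lshw
!pip install ollama
!pip install chromadb
!curl -fsSL https://ollama.com/install.sh | sh
# After installation, start the server and pull both models (for example from a terminal):
#ollama serve
#ollama pull gemma3:4b
#ollama pull mxbai-embed-large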

### Process PDF files and store embeddings in ChromaDB

This cell defines and uses a function `process_pdf_to_chroma` to handle the entire PDF processing pipeline for each file in the `/content/input` directory. It extracts text, chunks it, generates embeddings using Ollama, and adds them to a ChromaDB collection.
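
The cell below splits the text into fixed-size character windows with no overlap, so a sentence can be cut in half at a chunk boundary. A common variation is to let consecutive chunks share some characters. The following is a minimal sketch of that idea, not the method used in this notebook; the function name `chunk_text` and the `overlap` default of 128 are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1024, overlap: int = 128) -> list[str]:
    """Split text into character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk is fully contained in the previous one
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

With `overlap=0` this produces the same chunks as the plain slicing loop in `process_pdf_to_chroma`.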

In [None]:
import fitz
import os
import uuid
from ollama import Client
import chromadb
import time

# Initialize a persistent Chroma client; data is stored under ./chroma_db
client_chroma = chromadb.PersistentClient(path="./chroma_db")
collection_name = "pdf_embeddings"
try:
    collection = client_chroma.create_collection(name=collection_name)
    print(f"Collection '{collection_name}' created.")
except Exception:
    # create_collection raises if the name is already taken, so reuse the existing collection
    collection = client_chroma.get_collection(name=collection_name)
    print(f"Collection '{collection_name}' already exists.")


# Initialize Ollama client
# Assuming Ollama is running on localhost:11434
ollama_client = Client(host='http://localhost:11434')

def process_pdf_to_chroma(file_path: str, chunk_size: int = 1024):
    """
    Processes a single PDF file, extracts text, chunks it, generates embeddings,
    and adds them to the Chroma database.

    Args:
        file_path: The path to the PDF file.
        chunk_size: The size of text chunks.
    """
    extracted_text = ""
    try:
        # The context manager closes the document even if a page fails to load
        with fitz.open(file_path) as doc:
            extracted_text = "".join(page.get_text() for page in doc)
        print(f"Text extracted from {file_path}.")
    except Exception as e:
        print(f"Error processing file {file_path}: {e}")
        return

    if not extracted_text:
        print(f"No text extracted from {file_path}.")
        return

    chunked_texts = []
    for i in range(0, len(extracted_text), chunk_size):
        chunk = extracted_text[i:i + chunk_size]
        chunked_texts.append(chunk)
    print(f"Text chunked for {file_path}.")

    if not chunked_texts:
        print(f"No chunks created for {file_path}.")
        return

    embeddings_to_add = []
    texts_to_add = []
    ids_to_add = []

    start_time_loop = time.time() # Start timing the loop
    for chunk in chunked_texts:
        try:
            embedding = ollama_client.embeddings(model='mxbai-embed-large', prompt=chunk)['embedding']
            embeddings_to_add.append(embedding)
            texts_to_add.append(chunk)
            ids_to_add.append(str(uuid.uuid4()))
        except Exception as e:
            print(f"Error generating embedding for a chunk in {file_path}: {e}")
    end_time_loop = time.time() # End timing the loop
    elapsed_time_loop = end_time_loop - start_time_loop


    if not embeddings_to_add:
        print(f"No embeddings generated for {file_path}.")
        return
    print(f"Embeddings generated for {file_path}.")
    print(f"Total embedding generation time for all chunks: {elapsed_time_loop:.4f} seconds\n")
    try:
        collection.add(
            embeddings=embeddings_to_add,
            documents=texts_to_add,
            ids=ids_to_add
        )
        print(f"Added {len(ids_to_add)} items from {file_path} to the collection.")
        print(collection.peek(),'\n')
    except Exception as e:
        print(f"Error adding data from {file_path} to Chroma collection: {e}")

# --- Main execution part ---
folder_path = '/content/input'

# Check if the folder exists
if not os.path.exists(folder_path):
    print(f"Folder not found at: {folder_path}")
else:
    # Iterate through all files in the folder and process PDFs
    start_time_loop1 = time.time() # Start timing the loop
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        if filename.lower().endswith('.pdf'):
            process_pdf_to_chroma(file_path)
    end_time_loop1 = time.time() # End timing the loop
    elapsed_time_loop1 = end_time_loop1 - start_time_loop1
    print(f"Total processing time for all files: {elapsed_time_loop1:.4f} seconds\n")
print("\nPDF processing complete.")

Collection 'pdf_embeddings' created.
Text extracted from /content/input/The Origin of Species by Means of Natural Selection by Charles Darwin.pdf.
Text chunked for /content/input/The Origin of Species by Means of Natural Selection by Charles Darwin.pdf.
Embeddings generated for /content/input/The Origin of Species by Means of Natural Selection by Charles Darwin.pdf.
Total embedding generation time for all chunks: 49.5705 seconds

Added 1245 items from /content/input/The Origin of Species by Means of Natural Selection by Charles Darwin.pdf to the collection.
{'ids': ['667ed3d2-0858-4974-99ee-b7d5033b116c', '8d78eaea-c322-46e3-a1bb-012bf97310b3', '0f1434ff-6327-41fe-9320-9cdaa3a58d8b', '6744d90c-c8c5-4405-9361-10f6bd7f1eaf', '22fa5b6e-8e49-4619-afd0-e4e8872abb90', '638bf61a-4cc2-48c6-b912-6dbf65bd122d', '9152af4e-835e-4b5e-a4af-17801905bf64', 'bfd43af4-d847-40ad-a9d8-93aad004dc9e', 'bcd2c4ab-fe1f-4d95-baa4-eab21f83a6b4', '1b952d68-c1df-4c39-92bd-fa131806b0dd'], 'embeddings': array([[-0.5
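
Because the IDs come from `uuid.uuid4()`, running the processing cell again on the same PDF stores every chunk a second time under fresh IDs. One possible alternative is to derive each ID from the chunk itself, so the same input always maps to the same ID. The sketch below assumes a hypothetical helper `chunk_id`; it is not part of this notebook, and how such IDs interact with `collection.add` or `collection.upsert` should be checked against the installed ChromaDB version.

```python
import hashlib

def chunk_id(file_path: str, index: int, chunk: str) -> str:
    """Build a stable ID from the source file, chunk position, and chunk text."""
    digest = hashlib.sha256()
    # A zero byte separates the fields so that different field
    # combinations cannot produce the same byte stream
    digest.update(file_path.encode("utf-8"))
    digest.update(b"\x00")
    digest.update(str(index).encode("utf-8"))
    digest.update(b"\x00")
    digest.update(chunk.encode("utf-8"))
    return digest.hexdigest()
```

In `process_pdf_to_chroma`, this would replace `str(uuid.uuid4())` with something like `chunk_id(file_path, i, chunk)`, where `i` is the chunk's position in `chunked_texts`.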

### Interactive Question Answering Loop

This cell contains the interactive loop for querying the Chroma database. It defines the `retrieve_documents` and `generate_response` functions, which are used to fetch relevant document chunks based on a user query and then generate a response using Ollama based on the retrieved context. The loop allows the user to enter questions and receive answers until they type 'quit'.
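
Conceptually, retrieval ranks the stored chunk embeddings by how close they are to the query embedding and returns the best few. The sketch below illustrates this with brute-force cosine similarity in plain Python. It is only an illustration: Chroma uses its own index, and its distance metric may differ from cosine similarity. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, docs, k=5):
    """Return the k documents whose embeddings are most similar to the query."""
    scored = sorted(
        zip(docs, doc_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in scored[:k]]

# Toy 2-dimensional "embeddings" for three chunks
docs = ["cats", "dogs", "fish"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
best = top_k([1.0, 0.0], doc_vecs, docs, k=2)
```

Here `best` holds the two chunks whose vectors point most nearly in the same direction as the query vector.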

In [None]:
from ollama import Client

# Assuming Ollama is running on localhost:11434
ollama_client = Client(host='http://localhost:11434')

def retrieve_documents(query: str, n_results: int = 5):
    """
    Generates an embedding for a query and retrieves relevant documents from Chroma.

    Args:
        query: The user query string.
        n_results: The number of relevant documents to retrieve.

    Returns:
        A list of relevant document chunks.
    """
    # Generate embedding for the query using the ollama_client
    query_embedding = ollama_client.embeddings(model='mxbai-embed-large', prompt=query)['embedding']

    # Query the Chroma collection using the chromadb client named client_chroma
    results = client_chroma.get_collection(name="pdf_embeddings").query(
        query_embeddings=[query_embedding],
        n_results=n_results
    )

    # Return the retrieved documents
    return results['documents'][0]

def generate_response(query: str, context: list):
    """
    Constructs a prompt for Ollama with query and context, and generates a response.

    Args:
        query: The user query string.
        context: A list of relevant document chunks (strings).

    Returns:
        The generated text response from Ollama.
    """
    # Construct the prompt; blank lines between chunks keep their boundaries visible to the model
    prompt = "Using the following context, answer the question:\n\nContext:\n" + "\n\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"

    # Generate response using Ollama
    response = ollama_client.generate(model='gemma3:4b', prompt=prompt)

    # Extract and return the text response
    return response['response']

# Now, implement the loop for user interaction
while True:
    user_query = input("Enter your question (or type 'quit' to exit): ").strip()
    if user_query.lower() == 'quit':
        break
    if not user_query:
        continue  # Ignore empty input instead of embedding an empty string

    # Retrieve relevant documents based on the query
    relevant_documents = retrieve_documents(user_query)
    print("Relevant documents retrieved:\n")
    print(relevant_documents,"\n")
    # Generate a response using the query and retrieved documents
    response = generate_response(user_query, relevant_documents)

    # Print the generated response
    print("\nOllama's response:")
    print(response)
    print("-" * 50) # Separator for readability

print("Exiting chat.")
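
The prompt sent to the model grows with the number and size of the retrieved chunks. One way to keep it predictable is to number the chunks and cap the total context length. The sketch below assumes a hypothetical helper `build_prompt` and an arbitrary budget of 6000 characters; neither is part of this notebook, and a suitable budget depends on the model's context window.

```python
def build_prompt(query: str, context: list[str], max_chars: int = 6000) -> str:
    """Assemble a prompt from numbered context chunks within a character budget."""
    parts = []
    used = 0
    for i, chunk in enumerate(context, start=1):
        block = f"[{i}] {chunk.strip()}"
        # Stop adding chunks once the budget would be exceeded
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)

    context_text = "\n\n".join(parts)
    return (
        "Using the following context, answer the question. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context_text}\n\nQuestion: {query}\nAnswer:"
    )
```

The returned string could be passed as the `prompt` argument of `ollama_client.generate` in place of the inline f-string in `generate_response`.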

### Empty ChromaDB Collection

This cell defines a function `empty_chroma_db` that allows you to clear all data from the specified ChromaDB collection. This is useful for resetting the database or starting with a fresh collection.

In [None]:
import chromadb

def empty_chroma_db(db_path: str = "./chroma_db", collection_name: str = "pdf_embeddings"):
    """
    Empties the specified Chroma database collection.

    Args:
        db_path: The path to the Chroma database directory.
        collection_name: The name of the collection to empty.
    """
    try:
        client = chromadb.PersistentClient(path=db_path)
        # Drop the collection and recreate it empty so later lookups by name still succeed.
        # Re-run the processing cell afterwards to refresh its `collection` handle.
        client.delete_collection(name=collection_name)
        client.create_collection(name=collection_name)
        print(f"Collection '{collection_name}' in '{db_path}' has been emptied.")
    except Exception as e:
        print(f"Error emptying Chroma database: {e}")

# Example usage:
empty_chroma_db()

Collection 'pdf_embeddings' in './chroma_db' has been emptied.


<details>
  <summary>References</summary>

1. General knowledge and best practices for RAG:
https://medium.com/@adnanmasood/optimizing-chunking-embedding-and-vectorization-for-retrieval-augmented-generation-ea3b083b68f7
2. Dell reference design using 2x R760xa + 1x R660 + PowerScale F710 + switches and the NVIDIA software stack with Kubernetes. It focuses mostly on the hardware and the software stack, less on the application side (just basic sentence-based chunking, no multimodal embedding):
https://www.delltechnologies.com/asset/en-us/solutions/industry-solutions/industry-market/dell-scalable-architecture-for-retrieval-augmented-generation-with-nvidia-microservices-whitepaper.pdf

</details>