# Retrieval-Augmented Generation (RAG) Example with Ollama in Google Colab

This notebook demonstrates how to set up a simple RAG example using the LLaVA model served by Ollama, together with LangChain. We will:
- Install necessary libraries
- Set up and run Ollama in the background
- Download a sample PDF document
- Embed document chunks with FastEmbed and store them in a vector database (ChromaDB)
- Use the LLaVA model to answer queries based on document context

In [None]:
# Step 1: Install Ollama and start the service
!curl -fsSL https://ollama.com/install.sh | sh
!ollama serve &>/dev/null&  # Start Ollama in the background

In [None]:
# Step 2: Download LLaVA model (13b parameters, 1.6 version)
!ollama pull llava:13b-v1.6  # Download the model weights

In [None]:
# Step 3: Install additional Python packages for LangChain and PDF processing
!pip install langchain_community pypdf requests langchain fastembed chromadb tiktoken sentence_transformers

### Step 4: Import Libraries and Define Configurations
Import the necessary libraries and define configurations for ChromaDB and the document download path.

In [None]:
import sys
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain_community.llms import Ollama
import requests

# Set up configuration variables
CHROMA_DATA_PATH = "vdb_data/"  # Path to store vector database data
DOC_PATH = "https://raw.githubusercontent.com/skillrepos/genai-dd/main/samples/data.pdf"  # URL for sample PDF
DOC_FILENAME = "data.pdf"  # Filename to save downloaded PDF

### Step 5: Download the Sample PDF Document
Download the sample PDF from `DOC_PATH` and save it locally as `DOC_FILENAME`.

In [None]:
# Download the PDF
response = requests.get(DOC_PATH, allow_redirects=True)
if response.status_code == 200:
    with open(DOC_FILENAME, 'wb') as f:
        f.write(response.content)
    print("PDF downloaded successfully.")
else:
    # Raise instead of calling sys.exit() so the notebook shows a clear error for this cell
    raise RuntimeError(f"Failed to download PDF. Status code: {response.status_code}")
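
A status code of 200 does not guarantee that the saved file is really a PDF (an HTML error page can also arrive with 200). As a rough sanity check, the sketch below tests whether a file starts with the `%PDF-` header that well-formed PDF files begin with. `looks_like_pdf` is a hypothetical helper name, not part of any library used in this notebook.

```python
PDF_MAGIC = b"%PDF-"  # header bytes at the start of a well-formed PDF file


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF header bytes."""
    with open(path, "rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC
```

In this notebook it could be called as `looks_like_pdf(DOC_FILENAME)` right after the download.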

### Step 6: Initialize the Ollama Model
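Make sure the Ollama server is running in the background, then create a LangChain client for the LLaVA model. The model itself is loaded by the server when the first request arrives; `temperature=0.7` allows some variation in the answers.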

In [None]:
# Start the Ollama server only if it is not already running
!pgrep -x ollama >/dev/null || ollama serve &>/dev/null &

# Initialize the LLaVA model from Ollama
llm = Ollama(model="llava:13b-v1.6", temperature=0.7)
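
Because `ollama serve` is started in the background, the first request can arrive before the server is listening. The following is a minimal sketch of a readiness check. `wait_until_ready` and `ollama_probe` are hypothetical helpers, not part of Ollama or LangChain, and `http://localhost:11434` is assumed to be the address the server listens on (Ollama's default).

```python
import time
import urllib.request


def wait_until_ready(probe, timeout=30.0, interval=0.5):
    """Call `probe()` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused: the server is not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def ollama_probe(url="http://localhost:11434"):
    """Return True if an HTTP GET to `url` succeeds (assumed Ollama address)."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.status == 200
```

Calling `wait_until_ready(ollama_probe)` before the first `llm.invoke(...)` would then block until the server answers or the timeout expires.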

### Step 7: Load and Split PDF Document into Chunks
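Load the PDF and split it into chunks of at most 500 characters, with up to 50 characters of overlap so that text near a boundary appears in both neighboring chunks. Smaller chunks make retrieval more precise and keep the prompt short.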

In [None]:
# Load the locally saved PDF (downloaded in Step 5) and split it into smaller chunks
loader = PyPDFLoader(DOC_FILENAME)
pages = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(pages)

print(f"Document split into {len(chunks)} chunks.")
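
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to split on separators such as paragraph breaks, line breaks, and spaces; `sliding_chunks` is a hypothetical helper that only illustrates how consecutive chunks share characters.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split `text` into fixed-size windows that overlap by `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is the overlap at work.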

### Step 8: Embed Chunks and Create Vector Database
Embed the chunks into vectors using FastEmbed and store them in ChromaDB.

In [None]:
# Embed document chunks and load them into ChromaDB
embeddings = FastEmbedEmbeddings()
db_chroma = Chroma.from_documents(chunks, embeddings, persist_directory=CHROMA_DATA_PATH)

print("Vector database created.")
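
Conceptually, the vector database stores one vector per chunk and, at query time, returns the chunks whose vectors lie closest to the query vector. The sketch below illustrates that idea with cosine similarity over made-up two-dimensional vectors. It is an illustration only: ChromaDB's actual index and distance metric may differ, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors `a` and `b` (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return the indices of the `k` document vectors most similar to `query_vec`."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [i for _score, i in scored[:k]]


# Toy data: three "chunk" vectors and one "query" vector (values are made up)
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.1], doc_vecs, k=2))
# → [0, 2]
```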

### Step 9: Define the Query Template and Run Queries
Define a prompt template and set up a loop to allow interactive queries. Type `exit` to stop the loop.

In [None]:
PROMPT_TEMPLATE = """
Answer the question: {question} using whatever resources you have.
Include any related information from {context} as part of your answer.
Provide a detailed answer.
Don’t justify your answers.
"""

# Interactive query loop
while True:
    query = input("\nQuery: ").strip()  # Strip whitespace so " exit " also ends the loop
    if query.lower() == "exit":
        print("Exiting...")
        break
    if not query.strip():
        continue
    
    # Retrieve the top 5 most relevant chunks
    docs_chroma = db_chroma.similarity_search_with_score(query, k=5)
    context_text = "\n\n".join([doc.page_content for doc, _score in docs_chroma])
    
    # Prepare the prompt
    prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    prompt = prompt_template.format(context=context_text, question=query)

    # Generate and display the response
    response_text = llm.invoke(prompt)
    print(response_text)
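
The prompt assembly in the loop above can also be factored into a small pure function, which makes it easy to inspect exactly what text is sent to the model without running Ollama. This is a sketch using plain `str.format` rather than `ChatPromptTemplate`; `build_prompt` is a hypothetical helper, and the short template below is made up for the example.

```python
def build_prompt(template, question, chunks, separator="\n\n"):
    """Join the retrieved chunk texts and substitute them into `template`."""
    context = separator.join(chunks)
    # `template` is expected to contain {question} and {context} placeholders
    return template.format(question=question, context=context)


example_template = "Question: {question}\nContext:\n{context}"
print(build_prompt(example_template, "What is RAG?", ["First chunk.", "Second chunk."]))
```

Braces inside the chunk texts are safe here because only the template string is parsed for placeholders.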