<a href="https://colab.research.google.com/github/mrodgers/ollama_rag_colab/blob/main/Testing_Ollama_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to Chatting 'Privately' in Google Colab with Ollama/llama3.1/RAG!

This notebook helps you set up and run a Retrieval-Augmented Generation (RAG) system using Meta's Llama3.1 model, served through Ollama. Whether you're new to machine learning or an experienced developer, it guides you through installing the necessary packages, setting up an interactive terminal, and running a server to process and query documents.

## What You Will Learn

1. **Installing Required Packages**: Learn how to install the necessary Python packages to get started with Ollama and RAG.
2. **Setting Up xterm**: Understand how to set up an xterm terminal within Google Colab to run shell commands.
3. **Running Ollama Server**: Get the Ollama server up and running to serve the Llama3.1 model.
4. **Loading and Processing Documents**: Learn how to load text and PDF files, process them, and create a vector store for efficient querying.
5. **Building a Gradio Interface**: Create a user-friendly interface to upload files and ask questions about their content.
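
To make item 4 concrete, here is a minimal sketch of what "splitting a document into chunks" means. The notebook itself uses LangChain's `RecursiveCharacterTextSplitter` with `chunk_size=1000` and `chunk_overlap=100`; the function below is a simpler fixed-window splitter, and `split_with_overlap` is a hypothetical name used only for illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # How far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last window already reached the end of the text
    return chunks


# A 2500-character sample text made of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])  # → [1000, 1000, 700]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. The real splitter additionally prefers to break on paragraph and sentence boundaries, so its chunk sizes will differ from this sketch.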

## How to Use This Notebook

1. **Open the Notebook**: Click the "Open in Colab" badge at the top of this page to open this notebook in Google Colab.
2. **Follow the Steps**: Execute the cells in the notebook one by one. Each cell contains code and instructions to guide you through the setup process.
3. **Upload and Query Documents**: Use the Gradio interface to upload your documents and ask questions. The system will retrieve relevant information from the documents and provide answers.

## Is Colab Safe for Private Data?

NOPE. While Google Colab is generally safe for personal use, it is not recommended for handling sensitive or confidential data. Your private Colab notebooks are as secure as your private Google Docs, but always exercise caution and avoid sharing any national secrets or highly sensitive information.


In [None]:
# After running the cell below, an xterm terminal will open within the Colab interface.
# In the xterm terminal, you need to run a few commands to set up and start the Ollama server.
# These commands will:
# 1. Download and install the Ollama software.
# 2. Pull the necessary models (llama3.1 and nomic-embed-text) from the Ollama repository.
# 3. Start the Ollama server to serve the models.

# Be patient with the steps below: they download a lot of data and take a few minutes.

# After the xterm terminal opens, run the two cells below the xterm cell (the keep-alive script and
# the JavaScript snippet), then run the following command in the xterm terminal:
# while true; do echo "Keeping xterm and thus Ollama alive..."; sleep 60; done &

# Next, copy and paste the following commands into the xterm terminal, one at a time.
# `ollama serve` is started in the background (&) so the terminal stays free for the pull commands:
# curl -fsSL https://ollama.com/install.sh | sh
# ollama serve &
# ollama pull llama3.1 &
# ollama pull nomic-embed-text &
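
Once the commands above have finished, one way to confirm that both models were pulled is to ask the server which models it has. This is a sketch under assumptions: it assumes Ollama is listening on its default address `http://localhost:11434` and that its `/api/tags` endpoint returns a JSON object with a `models` list. `missing_models` and `check_ollama` are hypothetical helper names, not part of the notebook.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Assumption: Ollama's default address
REQUIRED_MODELS = ("llama3.1", "nomic-embed-text")


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list[str]:
    """Return the required models that are absent from an /api/tags response."""
    installed = {
        entry.get("name", "").split(":")[0]  # "llama3.1:latest" -> "llama3.1"
        for entry in tags_payload.get("models", [])
    }
    return [name for name in required if name not in installed]


def check_ollama(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Query the running server and report which required models are missing."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return missing_models(json.load(resp))


# Offline example with a hand-written payload shaped like the server's reply.
example = {"models": [{"name": "llama3.1:latest"}]}
print(missing_models(example))  # → ['nomic-embed-text']
```

Calling `check_ollama()` in a notebook cell should return an empty list when both models are available. If the server is not running yet, `urlopen` raises an error instead, which is itself a useful signal that `ollama serve` has not started.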

In [None]:
# set up a place for Ollama to run inside of an xterm
!pip install colab-xterm -q
%load_ext colabxterm
%xterm

In [None]:
# Run this cell to set up xterm for success: Keep-Alive Script
import threading
import time

import requests

def keep_alive():
    while True:
        try:
            requests.get("https://www.google.com", timeout=10)  # Time out instead of hanging forever
            time.sleep(60)  # Send a request every 60 seconds
        except Exception as e:
            print(f"Error: {e}")
            break

# Daemon thread: it will not block the runtime from shutting down
thread = threading.Thread(target=keep_alive, daemon=True)
thread.start()
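
The keep-alive thread above has no off switch: it runs until a request fails. If you would rather be able to stop it on demand, a `threading.Event` is one common pattern. The sketch below is an illustration, not the notebook's method: `start_keep_alive` is a hypothetical helper, it takes the ping action as a parameter so it can be tried without network access, and unlike the cell above it keeps looping after an error.

```python
import threading
import time


def start_keep_alive(ping, interval_seconds: float = 60.0):
    """Call ping() every interval_seconds until the returned event is set."""
    stop_event = threading.Event()

    def loop():
        while not stop_event.is_set():
            try:
                ping()
            except Exception as exc:
                print(f"Keep-alive error: {exc}")
            # wait() returns early as soon as stop_event is set
            stop_event.wait(interval_seconds)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop_event, thread


# Demo with a stand-in ping that only records that it was called.
calls = []
stop_event, thread = start_keep_alive(lambda: calls.append(1), interval_seconds=0.01)
time.sleep(0.1)          # Let the loop run a few times
stop_event.set()         # Ask the loop to stop
thread.join(timeout=1)   # Wait for it to finish
print(thread.is_alive())  # → False
```

In the notebook you would pass something like `lambda: requests.get("https://www.google.com", timeout=10)` as `ping` and call `stop_event.set()` when you are done.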

In [None]:
# Then run this cell after the cell above has completed: JavaScript Keep-Alive Snippet
from IPython.display import Javascript

keep_alive_js = """
function ClickConnect(){
    console.log("Clicking to keep Colab awake...");
    const button = document.querySelector("colab-toolbar-button#connect");
    if (button) { button.click(); }  // Do nothing if the button is not on the page
}
setInterval(ClickConnect, 60000);  // Click every 60 seconds
"""

display(Javascript(keep_alive_js))


You'll jump back here after running the xterm and starting Ollama. This is where the chatbot app gets created.

In [None]:
!pip -q install langchain langchain-core langchain-community ollama beautifulsoup4 chromadb gradio pypdf
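
Because the install above runs quietly (`-q`), a failed install is easy to miss. One way to check the result is with the standard library's `importlib.metadata`. This is a small sketch; `installed_versions` is a hypothetical helper name, and the package list simply mirrors the `pip` command above.

```python
from importlib import metadata

PACKAGES = [
    "langchain", "langchain-core", "langchain-community", "ollama",
    "beautifulsoup4", "chromadb", "gradio", "pypdf",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:22s} {version or 'NOT INSTALLED'}")
```

Any line that prints `NOT INSTALLED` means the corresponding package should be installed again before continuing.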

## Is Colab Safe for Private Data?

Colab is not safe for Cisco data.

For personal use, depending on your own risk threshold, it is generally safe: at least as safe as your private Google Docs.

No one else can access your private Colab notebooks, and Google has a reputational incentive to keep them as secure as possible. Even so, trust should not be handed out freely, so don't share any national secrets.


In [None]:
# Import necessary libraries and modules
import os  # Provides functions for interacting with the operating system
import gradio as gr  # Gradio is used to create a web-based interface
import ollama  # Ollama is used for interacting with the Llama3.1 model
from langchain_community.llms.ollama import Ollama  # Import the Ollama class for the Llama3.1 model
from langchain_community.embeddings import OllamaEmbeddings  # Import OllamaEmbeddings for creating embeddings
from langchain_community.document_loaders import PyPDFLoader  # Import PyPDFLoader for loading PDF files
from langchain.text_splitter import RecursiveCharacterTextSplitter  # Import text splitter for splitting documents into chunks
from langchain_community.vectorstores import Chroma  # Import Chroma for creating a vector store

# Define the model to be used
MODEL = "llama3.1"
llm = Ollama(model=MODEL)  # Initialize the Ollama model with Llama3.1

# Define a simple Document class to wrap the content
class Document:
    def __init__(self, page_content, metadata=None):
        self.page_content = page_content  # The content of the document
        self.metadata = metadata if metadata is not None else {}  # Metadata associated with the document

# Function to load data from an uploaded file
def load_file(file_path):
    file_extension = os.path.splitext(file_path)[1].lower()  # Get the file extension

    if file_extension == '.txt' or file_extension == '.md':  # If the file is a text or markdown file
        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                content = f.read()  # Read the content of the file
        except UnicodeDecodeError:
            with open(file_path, 'r', encoding='latin-1') as f:
                content = f.read()  # Read the content with a different encoding if there's an error
    elif file_extension == '.pdf':  # If the file is a PDF
        loader = PyPDFLoader(file_path)  # Initialize the PDF loader
        docs = loader.load()  # Load the PDF content
        content = "\n".join([doc.page_content for doc in docs])  # Join the content of all pages
    else:
        raise ValueError("Unsupported file type: {}".format(file_extension))  # Raise an error for unsupported file types

    return content  # Return the content of the file

# Function to process the uploaded file and create a vector store
def process_file(file_path):
    content = load_file(file_path)  # Load the file content
    docs = [Document(page_content=content)]  # Wrap the content in a Document object

    # Split the loaded documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    splits = text_splitter.split_documents(docs)  # Split the document into smaller chunks

    # Create Ollama embeddings and vector store
    embeddings = OllamaEmbeddings(model="nomic-embed-text")  # Initialize the embeddings model
    vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)  # Create a vector store from the document chunks

    return vectorstore  # Return the vector store

# Define the function to call the Ollama Llama3.1 model
def ollama_llm(question, context):
    formatted_prompt = f"Question: {question}\n\nContext: {context}"  # Format the prompt with the question and context
    response = ollama.chat(model=MODEL, messages=[{'role': 'user', 'content': formatted_prompt}])  # Use the MODEL constant defined above
    return response['message']['content']  # Return the content of the response

# Define the RAG chain itself
def rag_chain(question, vectorstore):
    retriever = vectorstore.as_retriever(
        search_type="mmr",
        search_kwargs={
            'k': 5,
            'fetch_k': 50
            })  # Get a retriever from the vector store
    retrieved_docs = retriever.invoke(question)  # Retrieve relevant documents based on the question
    formatted_context = "\n\n".join(doc.page_content for doc in retrieved_docs)  # Format the context from the retrieved documents
    return ollama_llm(question, formatted_context)  # Get the answer from the model using the formatted context

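# Entry point for the Gradio app: build a vector store from the uploaded file, then answer the question
def get_important_facts(file, question):
    file_path = getattr(file, "name", file)  # Accept a plain file path string or an object with a .name attribute
    vectorstore = process_file(file_path)  # Process the uploaded file to create a vector store
    return rag_chain(question, vectorstore)  # Get the answer to the question using the RAG setup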

# Improved Gradio app interface
iface = gr.Interface(
    fn=get_important_facts,  # The function to be called when the interface is used
    inputs=[
        gr.File(type="filepath", file_count="single", label="📁 Upload a file"),  # Add an icon to the file upload label
        gr.Textbox(lines=2, placeholder="Enter your question here...", label="❓ Your Question")  # Add an icon to the textbox label
    ],  # Define the inputs
    outputs=gr.Textbox(label="💬 Answer"),  # Define the output with an icon
    title="'Private' Chat using Ollama/Llama3.1",  # Simplified title
    description="Upload a file (PDF, TXT, or Markdown) and ask questions about the provided context. The system will retrieve relevant information and provide an answer.",  # Enhanced description
    theme="default",  # Use the default theme for a clean look
    allow_flagging="never",  # Disable flagging
    css="""
        .gradio-container {
            font-family: 'Roboto', sans-serif;  /* Use a clean and modern font */
            padding: 20px;
        }
        .gradio-title {
            font-size: 24px;
            font-weight: 500;
            color: #3f51b5;  /* Use a primary color for the title */
        }
        .gradio-description {
            font-size: 16px;
            color: #757575;  /* Use a secondary color for the description */
        }
        .gradio-inputs, .gradio-outputs {
            margin-top: 20px;
        }
        .gradio-inputs .gradio-file, .gradio-inputs .gradio-textbox {
            margin-bottom: 20px;
        }
        .gradio-button {
            background-color: #3f51b5;  /* Use a primary color for the button */
            color: white;
            border: none;
            padding: 10px 20px;
            font-size: 16px;
            cursor: pointer;
        }
        .gradio-button:hover {
            background-color: #303f9f;  /* Darken the button color on hover */
        }
    """  # Custom CSS for styling (CSS comments use /* ... */, not #)
)

# Launch the Gradio app
iface.launch(debug=True)  # Launch the user interface with debug mode enabled (just in case)
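
The retriever above uses `search_type="mmr"` with `k=5` and `fetch_k=50`. MMR (maximal marginal relevance) first shortlists the `fetch_k` chunks most similar to the question, then picks `k` of them while penalizing chunks that are too similar to ones already picked. The following is a minimal pure-Python sketch of that idea, not LangChain's actual implementation: `mmr_select`, `cosine`, and `unit` are hypothetical names, the trade-off weight `lambda_mult=0.5` is an assumption, and the 2-D vectors stand in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=5, fetch_k=50, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    # Step 1: shortlist the fetch_k most relevant candidates.
    candidates = sorted(range(len(doc_vecs)), key=lambda i: relevance[i], reverse=True)[:fetch_k]
    selected = []
    # Step 2: greedily add the candidate with the best relevance/novelty trade-off.
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def unit(degrees):
    """Toy 2-D 'embedding': a unit vector at the given angle."""
    r = math.radians(degrees)
    return [math.cos(r), math.sin(r)]


query = unit(0)
# Documents 0 and 1 are near-duplicates; document 2 is relevant but different.
docs = [unit(10), unit(12), unit(-40), unit(90)]
print(mmr_select(query, docs, k=2))  # → [0, 2]
```

A plain top-2 similarity search would return the two near-duplicates (indices 0 and 1). MMR skips the duplicate in favour of document 2, which is why it can give the model a more varied context for the same number of chunks.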


# The End!
Thanks to https://medium.com/@tharindumadhusanka99/llama3-rag-on-google-colab-73c43aa53281 for some of this code!