[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/michaelford85/atlas-rag-pipeline/blob/main/atlas_rag_pipeline.ipynb)

# Atlas RAG Pipeline — Retrieval-Augmented Generation with MongoDB & Ollama 

This notebook demonstrates a **Retrieval-Augmented Generation (RAG)** workflow built on:

- **MongoDB Atlas Vector Search** for semantic retrieval  
- **VoyageAI embeddings** for context-rich text representations  
- **Ollama (Mistral model)** for local, private LLM inference  

The goal: retrieve semantically relevant text from MongoDB Atlas and generate natural language answers using a locally hosted large language model.  

## Clone the Project Repository

Before running the RAG pipeline, we make sure `git` is available and pull the latest version of the code from GitHub.  
This cell:

1. Updates the system package list  
2. Installs `git` (if not already available)  
3. Removes any previous copy of the project  
4. Clones the **atlas-rag-pipeline** repository from GitHub  

This ensures that the notebook always uses the most recent version of the pipeline code.

In [None]:
!apt-get update -qq > /dev/null
!apt-get install --yes git > /dev/null
!rm -rf atlas-rag-pipeline
!git clone https://github.com/michaelford85/atlas-rag-pipeline.git

## Install Required Libraries

We begin by installing the Python libraries used throughout the pipeline from the repository’s `requirements.txt`:  

- **pymongo** – to connect to MongoDB Atlas  
- **voyageai** – for embedding generation  
- **requests** – to communicate with the local Ollama API  
- **dotenv_vault** – for securely loading environment variables  
- **certifi** – for verified SSL certificates in Atlas connections  
- **IPython.display** – for interactive display elements  

These packages set up the environment for retrieval, generation, and visualization. The cell also pins `docutils` to the `>=0.20,<0.22` range.

In [None]:
!pip install --no-cache-dir -r atlas-rag-pipeline/requirements.txt
!pip install --upgrade "docutils>=0.20,<0.22"
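
As a quick sanity check after installation, a sketch like the one below can confirm that the packages are importable. The module names are assumptions taken from the list above (for example, that the `dotenv_vault` package is importable under that name); adjust them if `requirements.txt` differs.

```python
import importlib.util

# Module names assumed from the list above; adjust to match requirements.txt.
EXPECTED_MODULES = ["pymongo", "voyageai", "requests", "dotenv_vault", "certifi", "IPython"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(EXPECTED_MODULES)
print("Missing modules:", missing or "none")
```

An empty result suggests the install cell completed; any listed name points at a package to reinstall.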


## Access Secure Environment Variables from Google Drive

If you’re running this notebook in **Google Colab**, you have two options for loading your environment variables. The recommended one securely loads your private `.env.vault` key from Google Drive.  

### Option 1: Secure (Recommended): Load from Google Drive

This approach keeps your credentials out of the notebook and automatically decrypts values from your `.env.vault` file.

#### Steps:
1. Mount your Google Drive so the notebook can access stored files.
2. Read your Dotenv Vault key and ID (e.g., `atlas_rag_pipeline_voyageai_dotenv_key.txt` and `atlas_rag_pipeline_voyageai_dotenv_vault_id.txt`) from your Drive folder.
3. Set the environment variables so the notebook can decrypt your `.env.vault`.

Once these variables are set, you can authenticate during Colab sessions without pasting secrets into any cell.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import os

# Load Dotenv Vault credentials from Google Drive
os.environ["DOTENV_KEY"] = open(
    '/content/drive/MyDrive/Colab Notebooks/secrets/atlas_rag_pipeline_voyageai_dotenv_key.txt'
).read().strip()

os.environ["DOTENV_VAULT_ID"] = open(
    '/content/drive/MyDrive/Colab Notebooks/secrets/atlas_rag_pipeline_voyageai_dotenv_vault_id.txt'
).read().strip()

# --- Use bash commands in a separate cell block ---
!npx dotenv-vault@latest new $DOTENV_VAULT_ID $DOTENV_KEY
!npx dotenv-vault@latest pull development atlas-rag-pipeline/.env.vault
!ls -la atlas-rag-pipeline
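
The cell above reads each secret with a bare `open(...).read()`, which fails with a generic error if a file is missing and silently accepts an empty one. A minimal sketch of a stricter reader follows; `read_secret` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def read_secret(path):
    """Return the stripped contents of a secret file, failing loudly if it is unusable."""
    secret_file = Path(path)
    if not secret_file.is_file():
        raise FileNotFoundError(f"Secret file not found: {secret_file}")
    value = secret_file.read_text(encoding="utf-8").strip()
    if not value:
        raise ValueError(f"Secret file is empty: {secret_file}")
    return value
```

It could replace each `open(...).read().strip()` call, for example `os.environ["DOTENV_KEY"] = read_secret(key_path)`, where `key_path` is the Drive path used above.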

### Option 2: Simple — Use Dummy or Local Values

If you’re running locally (or don’t want to mount Google Drive), you can set placeholder values instead.
These won’t decrypt a `.env.vault`, but they let you execute the notebook without secrets.

#### Note
- **Never commit real keys or vault IDs to GitHub**.
- Dummy values are suitable for demonstration or dry-run mode.
- For production use, always load credentials from an encrypted vault or a local `.env.vault` file.

In [None]:
import os  # required when the Option 1 cell was skipped

## Replace all values below with your actual credentials - DO NOT COMMIT REAL KEYS TO GITHUB!

## API key for VoyageAI embedding service.
os.environ["VOYAGE_API_KEY"] = "v2_dummykey_abc123xyz789"

## MongoDB Atlas connection details.
os.environ["MONGODB_URI"] = "mongodb+srv://demo_user:demo_password@cluster0.mongodb.net/?retryWrites=true&w=majority"

## MongoDB Atlas API public key and private key
os.environ["ATLAS_PUBLIC_KEY"] = "abcd1234efgh5678ijkl"
os.environ["ATLAS_PRIVATE_KEY"] = "mnop5678qrst9012uvwx"
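
Whichever option you choose, it can help to confirm that every expected variable is populated before moving on. The sketch below assumes the four variable names set in the cell above are the ones the pipeline scripts read; `unset_variables` is a hypothetical helper.

```python
import os

# Variable names taken from the cell above; the scripts may read others as well.
REQUIRED_VARS = ["VOYAGE_API_KEY", "MONGODB_URI", "ATLAS_PUBLIC_KEY", "ATLAS_PRIVATE_KEY"]


def unset_variables(names, env=None):
    """Return the names that are missing or blank in the given mapping (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


print("Unset variables:", unset_variables(REQUIRED_VARS) or "none")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.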

## 🧠 Generate and Manage Vector Embeddings in MongoDB

Before running retrieval or semantic search queries, we need to make sure our **MongoDB Atlas collection** contains up-to-date vector embeddings and a properly configured **Atlas Vector Search index**.

This section runs a coordinated set of helper scripts that prepare the collection for vector-based querying:

---

### 1️⃣ `update_voyage_ai_embeddings.py`
Uses the [Voyage AI](https://www.voyageai.com) API to generate (or refresh) embeddings for each document’s text fields defined in `EMBEDDING_PATHS`.

- Each field listed in `EMBEDDING_PATHS` is embedded and stored under a corresponding field in `EMBEDDING_NAMES`.  
  For example:
  - `"fullplot"` → `"fullplot_embedding"`
  - `"plot"` → `"plot_embedding"`
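
To illustrate the pairing, here is a small sketch that splits two comma-separated strings and matches them position by position. How `update_voyage_ai_embeddings.py` actually parses these variables is not shown here, so treat `pair_embedding_fields` as a hypothetical stand-in.

```python
def pair_embedding_fields(paths_csv, names_csv):
    """Map each source text field to the field that stores its embedding."""
    paths = [p.strip() for p in paths_csv.split(",") if p.strip()]
    names = [n.strip() for n in names_csv.split(",") if n.strip()]
    if len(paths) != len(names):
        raise ValueError(f"{len(paths)} paths but {len(names)} embedding names")
    return dict(zip(paths, names))


print(pair_embedding_fields("fullplot,plot", "fullplot_embedding,plot_embedding"))
# {'fullplot': 'fullplot_embedding', 'plot': 'plot_embedding'}
```

The length check matters because a mismatch would otherwise silently drop the trailing fields.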

---

### 2️⃣ `manage_vector_index.py`
Creates or ensures the existence of an **Atlas Vector Search index** referencing all embedding fields.

- Enables `$vectorSearch` queries over one or more embedding vectors (e.g., `"fullplot_embedding"` and `"plot_embedding"`).  
- Waits until the index is fully built before exiting, guaranteeing it’s ready for queries.
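
For orientation, an Atlas Vector Search index definition lists one `vector` field per embedding path. The sketch below builds such a definition; the `cosine` similarity and the 1024-dimension default are assumptions, and `manage_vector_index.py` may choose differently.

```python
ALLOWED_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}


def build_vector_index_definition(embedding_names, num_dimensions=1024, similarity="cosine"):
    """Build an Atlas Vector Search index definition covering every embedding field."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"Unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": name,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
            for name in embedding_names
        ]
    }


definition = build_vector_index_definition(["fullplot_embedding", "plot_embedding"])
print(definition["fields"][0]["path"])  # fullplot_embedding
```

The `numDimensions` value has to match the output size of the embedding model, otherwise queries against the index will not match the stored vectors.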

---

### 3️⃣ *(Optional)* `remove_embeddings.py`
Removes existing embedding fields if you need to regenerate them or rebuild the index from scratch.

---

Together, these scripts automate the preparation process for **semantic retrieval** and **Retrieval-Augmented Generation (RAG)** workloads.

In [None]:
## Replace all values below with your actual database values

import os
os.environ["DB_NAME"] = "sample_mflix"
os.environ["COLL_NAME"] = "movies"
os.environ["BATCH_SIZE"] = "100"
os.environ["MODEL_NAME"] = "voyage-3-large"
os.environ["NUM_DIMENSIONS"] = "1024"
os.environ["ATLAS_GROUP_ID"] = "000000000000000000000000"
os.environ["ATLAS_CLUSTER"] = "demo-cluster"

# Comma-separated source fields and the embedding field each one is stored under (same order in both lists)
os.environ["EMBEDDING_PATHS"] = "fullplot,plot"
os.environ["EMBEDDING_NAMES"] = "fullplot_embedding,plot_embedding"

# Name of the Atlas Vector Search index built over the embedding fields specified above
os.environ["INDEX_NAME"] = "fullplot_vector_index"

# Create or update embeddings and vector search index
%run atlas-rag-pipeline/update_voyage_ai_embeddings.py
%run atlas-rag-pipeline/manage_vector_index.py

# Optional: remove all embedding fields from the specified collection
# %run atlas-rag-pipeline/remove_embeddings.py

## Set Up Ollama on Ubuntu with Ansible

This step uses **Ansible** to automatically install and configure **Ollama** on an Ubuntu environment.  
Running this playbook ensures that the local system (or Google Colab VM) has the Ollama service and dependencies properly set up.

Specifically, this command:
- Installs **Ollama** and its required packages  
- Configures the **systemd service** so Ollama runs continuously  
- Ensures the **Mistral model** can be pulled and served via API at `http://localhost:11434`

This automation makes your environment consistent and repeatable across different machines or Google Colab sessions.

In [None]:
!ansible-playbook atlas-rag-pipeline/setup_ollama_ubuntu.yml
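
After the playbook finishes, one way to confirm that the model was pulled is to query Ollama's `/api/tags` endpoint, which lists locally available models. This is a sketch using only the standard library; the helper names are hypothetical, and the check treats any tag of `mistral` (such as `mistral:latest`) as a match.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def model_is_listed(tags_payload, model="mistral"):
    """Return True if `model` (with any tag) appears in an /api/tags payload."""
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        if name == model or name.split(":", 1)[0] == model:
            return True
    return False


def fetch_tags(base_url=OLLAMA_URL, timeout=5):
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `model_is_listed(fetch_tags())` should return `True` once the Mistral model is available.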

## Enable Line Wrapping for Readable Output

By default, Jupyter and Google Colab often display long text outputs on a single line, requiring horizontal scrolling.  
This helper function injects a small CSS rule into the notebook that enables **automatic text wrapping** inside output cells.

Every time a new cell runs, the `pre_run_cell` event re-applies the style so wrapping stays consistent throughout the notebook — perfect for displaying long model responses or logs neatly.

In [None]:
from IPython.display import HTML, display

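def set_css(*_args):
  # Recent IPython versions pass an execution-info object to pre_run_cell
  # callbacks, so accept and ignore any positional arguments.
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))

# Re-apply the style before every cell runs
get_ipython().events.register('pre_run_cell', set_css)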

## Test the Local Mistral Model via Ollama API

Before integrating Ollama into the full RAG pipeline, it’s important to verify that the local **Mistral** model is running correctly and capable of generating responses.

This command sends a test prompt directly to the **Ollama REST API** at `http://localhost:11434/api/generate`.  
It asks a simple, knowledge-based question about preventing frozen pipes and expects a concise, paragraph-style answer.

The command:
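- Uses `curl` to send a JSON payload with the model name (`mistral`) and prompt text  
- Doubles the JSON braces (`{{` and `}}`) because IPython treats single braces in `!` commands as Python expression placeholders  
- Disables streaming so the full answer arrives as a single JSON object  
- Pipes the output through `jq` to extract and display only the `.response` field  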

If the setup is correct, this cell should output a coherent answer — confirming that the local generative AI model is online and functioning properly.

In [None]:
!curl -s http://localhost:11434/api/generate \
  -d '{{"model": "mistral", "prompt": "What should homeowners do in order to keep their pipes from freezing in the winter? Keep your explanation as a paragraph of 10 sentences or less.", "stream": false}}' \
  | jq -r '.response'
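
The same request can be issued from Python. The sketch below uses the standard library's `urllib` rather than `requests` so that it has no dependencies; the function names are hypothetical, and the payload mirrors the `curl` command above.

```python
import json
import urllib.request


def build_generate_request(prompt, model="mistral", base_url="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt, timeout=120, **kwargs):
    """Send the prompt and return only the `response` field, like the jq filter above."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

With the server running, `print(ask_ollama("What should homeowners do to keep their pipes from freezing?"))` would print the model's answer.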

## Run the Full RAG Pipeline with a Custom Prompt

Now that both the **retrieval** and **generation** components are configured, we can run the complete **Retrieval-Augmented Generation (RAG)** pipeline end-to-end.  

This command executes `rag_with_input.py`, which:
1. Takes a natural-language query as input (in this case: *“Which movies feature artificial intelligence or sentient robots?”*)  
2. Generates an embedding for the query using **VoyageAI**  
3. Retrieves semantically similar movie plots from **MongoDB Atlas**  
4. Passes the retrieved context to the **Mistral** model via **Ollama**  
5. Returns a generated answer that blends factual retrieval with fluent reasoning  

During discussions or demos, you can modify the quoted text to explore different topics — testing how the system responds to various prompts in real time.

In [None]:
%run atlas-rag-pipeline/rag_with_input.py "Which movies feature artificial intelligence or sentient robots?"
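
The internals of `rag_with_input.py` are not reproduced in this notebook, so the following is only a sketch of what the retrieval and prompt-assembly steps might look like. The function names, the projected fields (`title`, `fullplot`), the `numCandidates` default, and the prompt wording are all assumptions; the `$vectorSearch` stage fields themselves follow the Atlas aggregation syntax.

```python
def build_vector_search_pipeline(query_vector, index_name="fullplot_vector_index",
                                 path="fullplot_embedding", limit=3, num_candidates=100):
    """Build an aggregation pipeline that returns the `limit` nearest documents."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least as large as limit")
    return [
        {"$vectorSearch": {
            "index": index_name,
            "path": path,
            "queryVector": query_vector,
            "numCandidates": num_candidates,
            "limit": limit,
        }},
        # Keep only the fields needed for the prompt, plus the similarity score.
        {"$project": {"_id": 0, "title": 1, "fullplot": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


def build_prompt(question, docs):
    """Combine retrieved plots and the user question into one prompt string."""
    context = "\n\n".join(
        f"{doc.get('title', 'Untitled')}: {doc.get('fullplot', '')}" for doc in docs
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The pipeline would be passed to `collection.aggregate(...)` with a query vector produced by the same embedding model used for the stored documents, and the resulting prompt sent to the Mistral model.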

### Install Gradio

Before launching the RAG Query interface, install **Gradio**, a lightweight web UI framework for running interactive machine learning demos directly in Google Colab or your local environment.

In [None]:
!pip install gradio --quiet

### Launch the MongoDB RAG Query Assistant

This section launches a simple Gradio interface that connects to your MongoDB Vector Search index and allows you to ask natural-language questions.

The interface:
- Calls your `retrieve_relevant_docs()` and `generate_answer()` functions.
- Returns the generated answer in a scrollable textbox.
- Provides a “Copy Output” button for convenience.

How it works:
- You enter a question in the input box.
- The notebook retrieves semantically relevant documents from MongoDB Atlas.
- It uses your local model (e.g., Ollama with Mistral) to generate an answer.

In [None]:
import gradio as gr
import sys
sys.path.append('/content/atlas-rag-pipeline')
from rag_with_input import retrieve_relevant_docs, generate_answer

def rag_query(question):
    docs = retrieve_relevant_docs(question, limit=3)
    if not docs:
        return "⚠️ No relevant documents found."
    answer = generate_answer(question, docs)
    return answer

demo = gr.Interface(
    fn=rag_query,
    inputs=gr.Textbox(
        label="Ask a Question",
        placeholder="Type your query here...",
        lines=2,       # input box height
    ),
    outputs=gr.Textbox(
        label="Output",
        lines=15,      # 👈 make output area much taller
        show_copy_button=True
    ),
    title="MongoDB RAG Query Assistant",
    description="Ask questions against your MongoDB vector index.",
)

app = demo.launch(share=True)
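
If Atlas or Ollama is unreachable, the `rag_query` function above raises and Gradio shows a generic error. A possible variation, sketched below, wraps the two calls so the failure reason appears in the output box instead. `make_rag_query` is a hypothetical helper; it takes the retrieval and generation functions as arguments so it can be tried with stand-ins.

```python
def make_rag_query(retrieve, generate, limit=3):
    """Return a Gradio-friendly function that reports failures as text."""
    def rag_query(question):
        question = (question or "").strip()
        if not question:
            return "Please enter a question."
        try:
            docs = retrieve(question, limit=limit)
            if not docs:
                return "No relevant documents found."
            return generate(question, docs)
        except Exception as exc:  # broad on purpose: this is the UI boundary
            return f"Request failed: {exc}"
    return rag_query
```

In the interface definition, this would be used as `fn=make_rag_query(retrieve_relevant_docs, generate_answer)`.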

### Gracefully Shut Down the Gradio App

When you’re done using the interface (or want to restart it with updated code),  
you can safely close all active Gradio instances by running the following cell:

In [None]:
gr.close_all()

## Restart the Ollama Service

Occasionally, especially during long Colab sessions or multiple test runs, the **Ollama service** may hold onto old connections or become unresponsive.  
This cell cleanly restarts the local Ollama server to ensure a fresh, stable session before generating new responses.

Here’s what each command does:
1. `pkill ollama || true`: Stops any existing Ollama processes; the `|| true` suffix ignores the error `pkill` returns when none are running.  
2. `nohup ollama serve > /tmp/ollama.log 2>&1 &`: Restarts Ollama in the background and redirects its output to `/tmp/ollama.log`.  
3. `sleep 10`: Waits ten seconds so the server can finish initializing before it receives API requests.

Running this cell helps maintain reliable performance and prevents connection errors when calling the Mistral model.

In [None]:
!pkill ollama || true
!nohup ollama serve > /tmp/ollama.log 2>&1 &
!sleep 10
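
A fixed `sleep 10` can be too short on a slow VM and wasteful on a fast one. As an alternative sketch, the helpers below poll the server until it answers or a timeout expires; both function names are hypothetical, and the readiness probe assumes that a successful `GET /api/tags` means the server is ready.

```python
import time
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=2):
    """Return True if the Ollama server answers a GET on /api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers refused connections, timeouts, and URLError
        return False


def wait_until(check, timeout=30.0, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call `check` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

After restarting the service, `wait_until(ollama_is_up)` returns `True` as soon as the server responds and `False` if it never does within 30 seconds.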