# Building a RAG System Locally with Ollama, LlamaIndex, and Chroma DB

## Exercise 0 - Install Workshop Dependencies

Before starting the workshop, ensure all necessary dependencies are installed in your Python environment. Use the following steps to set up your environment.

### Step 1: Create a Virtual Environment

Create and activate a virtual environment to isolate the workshop dependencies. For this workshop, we use **Python 3.11**. Choose either **venv** or **conda** (Mamba works as a faster drop-in replacement for the `conda` commands).

##### Using `venv`

On Linux/Mac:
  ```bash
  python3.11 -m venv local-rag
  source local-rag/bin/activate
  ```
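On Windows (if `python3.11` is not on your PATH, `py -3.11 -m venv local-rag` uses the Python launcher instead):
  ```bash
  python3.11 -m venv local-rag
  local-rag\Scripts\activate
  ```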

##### Using `conda`

   ```bash
   conda create -n local-rag python=3.11
   conda activate local-rag
   ```

### Step 2: Install Required Packages

Install all the required dependencies:

```bash
pip install -r requirements.txt
```

### Step 3: Verify Installation

Check that the key packages are installed correctly by importing them in Python:

In [1]:
import sys
sys.path.insert(0,'/Users/rhita.mamou/Downloads/lauzhack-workshop-2024-main/')
sys.path

['/Users/rhita.mamou/Downloads/lauzhack-workshop-2024-main/',
 '/Users/mamourhita/opt/anaconda3/envs/py311/lib/python311.zip',
 '/Users/mamourhita/opt/anaconda3/envs/py311/lib/python3.11',
 '/Users/mamourhita/opt/anaconda3/envs/py311/lib/python3.11/lib-dynload',
 '',
 '/Users/mamourhita/opt/anaconda3/envs/py311/lib/python3.11/site-packages',
 '/Users/mamourhita/Desktop/first-project/src']

In [2]:
import chromadb
import llama_index
import ollama

print("Dependencies installed successfully!")

Dependencies installed successfully!
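
If one of the imports fails, it helps to know which package is missing and which Python interpreter is active. The following is a minimal sketch using only the standard library; the helper names `check_packages` and `installed_version` are made up for this example, and the distribution name `llama-index` is assumed to match what `requirements.txt` installs.

```python
import importlib.util
import sys
from importlib import metadata


def check_packages(modules):
    """Map each top-level module name to True/False without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The workshop targets Python 3.11.
    print("Python", ".".join(map(str, sys.version_info[:3])))
    for module, found in check_packages(["chromadb", "llama_index", "ollama"]).items():
        print(f"{module}: {'ok' if found else 'MISSING'}")
    # Distribution names can differ from module names (assumed: llama-index).
    print("llama-index version:", installed_version("llama-index"))
```

A `MISSING` entry usually means the virtual environment from Step 1 is not the one running the notebook.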


## Exercise 1 - Setting up Ollama

### Install Ollama

First, download and install Ollama from the official website: [https://ollama.com/download/](https://ollama.com/download/).

### Pull Required Models

Open a terminal and run the following commands to download the necessary models:

1. Pull the `llama3.2` model, which the Python examples below use:
   ```bash
   ollama pull llama3.2
   ```

2. Optionally, pull the Nomic embedding model (the exercises below use a Hugging Face embedding model instead):
   ```bash
   ollama pull nomic-embed-text
   ```

### Run the Model

Once the models are installed, you can run the `llama3.2` model and test it with a few prompts. Use the following command:

```bash
ollama run llama3.2
```

Type a prompt and observe the output to ensure everything is working correctly.

### Interact with Ollama in Python

The `ollama` Python package gives access to the same models from code. With `stream=True`, `generate` returns the answer piece by piece, so it can be printed as it arrives.



In [2]:
import ollama

response = ollama.generate(model="llama3.2", prompt="What is EPFL?", stream=True)

for r in response:
    print(r["response"], end="")

EPFL stands for École polytechnique fédérale de Lausanne, which translates to Polytechnic School of Lausanne in English. It is a Swiss federal university that was founded in 1858 and is now one of the top universities in Switzerland.

EPFL is known for its strong programs in science, technology, engineering, and mathematics (STEM) fields, as well as humanities and social sciences. The university has a global reputation for excellence in research and innovation, particularly in areas such as physics, chemistry, computer science, engineering, and biology.

EPFL has multiple campuses, with the main campus located in Lausanne, Switzerland. It also has a strong presence of international students from around the world, making it an attractive option for students looking to study abroad or pursue advanced research opportunities.

The university is organized into several faculties, including:

* Faculty of Arts and Social Sciences
* Faculty of Biology and Medicine
* Faculty of Engineering and 
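
The `ollama` package talks to the local Ollama server over HTTP. As a rough sketch of what such a call looks like, the same request can be built with the standard library. This assumes the server listens on its default address `http://localhost:11434`; the helper names `build_generate_request` and `generate` are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default address of a local server


def build_generate_request(model, prompt):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=120):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Usage, once `ollama serve` is running and the model has been pulled:
# print(generate("llama3.2", "What is EPFL?"))
```

With `"stream": False` the server sends one JSON object containing the whole answer, which keeps the sketch short; the notebook cell above streams instead.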

## Exercise 2 - Getting Started with LlamaIndex and ChromaDB

**LlamaIndex** ([official site](https://llamaindex.ai)) is a framework for connecting LLMs with data sources, enabling efficient retrieval and interaction with structured or unstructured data.

**Chroma** ([official site](https://www.trychroma.com)) is a vector database designed for managing embeddings and serving as a retrieval layer for LLM applications.

In this exercise, we’ll explore how to set up and use LlamaIndex to index and retrieve data in a **Chroma** database.

### Step 0: Let's download a PDF

You can start by adding documents to the `./docs` folder. If you don't know what to use, we suggest downloading the PDF at the following link:

https://observationofalostsoul.wordpress.com/wp-content/uploads/2011/05/the-gospel-of-the-flying-spaghetti-monster.pdf

### Step 1: Set Up Chroma as the Storage Backend

Initialize the Chroma database and configure it for use with LlamaIndex. Here, we create an **Ephemeral Client** and collection, which stores data temporarily in memory without persisting it. This is ideal for testing and experimentation.

In [5]:
import chromadb

chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.get_or_create_collection("mydocs")

You can also create a **Persistent Client** that will preserve your database across sessions with:

```python
client = chromadb.PersistentClient(path="/path/to/save/to")
```

### Step 2: Set Up LlamaIndex connectors

Configure LlamaIndex to connect with Chroma as the vector store and set up a storage context. A **storage context** is an abstraction that manages how data is stored and retrieved, enabling seamless integration with different storage backends like Chroma.

In [6]:
from llama_index.core import StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

### Step 3: Load and explore documents

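We can use LlamaIndex's `SimpleDirectoryReader` to **ingest documents from a directory**. This utility reads the files in the given directory and wraps their content in `Document` objects (for a PDF, one per page), each carrying the extracted text and some file metadata. Splitting the content into smaller chunks happens later, when the index is built.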

In [7]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("docs", recursive=True).load_data()

In [8]:
documents[0].dict()

{'id_': '9c0813a6-bc50-4c77-85e6-85797d4b3c94',
 'embedding': None,
 'metadata': {'page_label': '1',
  'file_name': '2379143-20_releve_de_postes_2024-04-17_00-55-17470.PDF',
  'file_path': '/Users/mamourhita/Desktop/RAG/docs/2379143-20_releve_de_postes_2024-04-17_00-55-17470.PDF',
  'file_type': 'application/pdf',
  'file_size': 101960,
  'creation_date': '2024-11-17',
  'last_modified_date': '2024-11-17'},
 'excluded_embed_metadata_keys': ['file_name',
  'file_type',
  'file_size',
  'creation_date',
  'last_modified_date',
  'last_accessed_date'],
 'excluded_llm_metadata_keys': ['file_name',
  'file_type',
  'file_size',
  'creation_date',
  'last_modified_date',
  'last_accessed_date'],
 'relationships': {},
 'text': "Compte privé CSX 2379143-20\nMonnaie Francs suisses\nIBAN CH86 0483 5237 9143 2000 0\nVue d'ensemble du compte\nSolde reporté\nTotal des débits\nTotal des crédits\nSolde final\n6.58\n-1'779.61\n2'240.00\n466.97\nCREDIT SUISSE (Suisse) SA\nCH-8070 Zürich (0589)\nCustome

Let's explore the content of the documents further with a dataframe.

In [9]:
from typing import List

import pandas as pd
from llama_index.core.schema import TextNode


def data_to_df(nodes: List[TextNode]):
    """Convert a list of TextNode objects to a pandas DataFrame."""
    return pd.DataFrame([node.dict() for node in nodes])

In [10]:
document_df = data_to_df(documents)

document_df.head()

Unnamed: 0,id_,embedding,metadata,excluded_embed_metadata_keys,excluded_llm_metadata_keys,relationships,text,mimetype,start_char_idx,end_char_idx,text_template,metadata_template,metadata_seperator,class_name
0,9c0813a6-bc50-4c77-85e6-85797d4b3c94,,"{'page_label': '1', 'file_name': '2379143-20_r...","[file_name, file_type, file_size, creation_dat...","[file_name, file_type, file_size, creation_dat...",{},Compte privé CSX 2379143-20\nMonnaie Francs su...,text/plain,,,{metadata_str}\n\n{content},{key}: {value},\n,Document
1,0eac77bd-868f-4fe0-96c6-8382485c8aff,,"{'page_label': '2', 'file_name': '2379143-20_r...","[file_name, file_type, file_size, creation_dat...","[file_name, file_type, file_size, creation_dat...",{},Relevé de postes détaillé 17.03.2024 au 16.04....,text/plain,,,{metadata_str}\n\n{content},{key}: {value},\n,Document
2,37cd6a60-9927-4fde-a752-4ec94d531dc5,,"{'page_label': '3', 'file_name': '2379143-20_r...","[file_name, file_type, file_size, creation_dat...","[file_name, file_type, file_size, creation_dat...",{},Relevé de postes détaillé 17.03.2024 au 16.04....,text/plain,,,{metadata_str}\n\n{content},{key}: {value},\n,Document
3,7540ebe7-624a-49b8-8a03-6cd8bf0bbf70,,"{'page_label': '4', 'file_name': '2379143-20_r...","[file_name, file_type, file_size, creation_dat...","[file_name, file_type, file_size, creation_dat...",{},Relevé de postes détaillé 17.03.2024 au 16.04....,text/plain,,,{metadata_str}\n\n{content},{key}: {value},\n,Document
4,1a56c828-4bcf-4a60-922f-120e4c387113,,"{'page_label': '1', 'file_name': '2379143-20_r...","[file_name, file_type, file_size, creation_dat...","[file_name, file_type, file_size, creation_dat...",{},Compte privé CSX 2379143-20\nMonnaie Francs su...,text/plain,,,{metadata_str}\n\n{content},{key}: {value},\n,Document


We observe several attributes, including `metadata`, `text`, `text_template`, and others. Let's focus on these three:

- **`metadata`**: This attribute contains additional information about the document, such as its source, creation date, or tags that can be used for filtering or retrieval purposes.
- **`text`**: The main content of the document, representing the raw textual data that will be indexed and queried.
- **`text_template`**: A format string, here `{metadata_str}\n\n{content}`, that defines how the metadata and the text are combined into the string passed to the embedding model or the LLM.

These attributes play distinct roles in organizing and interacting with your data. Feel free to explore the different attributes at this point.
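
To make the role of the templates concrete, here is a small sketch that combines metadata and text using the template strings shown in the dataframe above (`{metadata_str}\n\n{content}`, `{key}: {value}`, and the `\n` separator). It is a simplified illustration, not LlamaIndex's own code; `render_node` is a made-up helper and the example values are invented.

```python
TEXT_TEMPLATE = "{metadata_str}\n\n{content}"
METADATA_TEMPLATE = "{key}: {value}"
METADATA_SEPARATOR = "\n"


def render_node(text, metadata, excluded_keys=()):
    """Combine metadata and text following the three templates above."""
    metadata_str = METADATA_SEPARATOR.join(
        METADATA_TEMPLATE.format(key=key, value=value)
        for key, value in metadata.items()
        if key not in excluded_keys  # excluded keys are hidden from the model
    )
    return TEXT_TEMPLATE.format(metadata_str=metadata_str, content=text)


example = render_node(
    text="Hello world",
    metadata={"page_label": "1", "file_name": "example.pdf"},
    excluded_keys=["file_name"],
)
print(example)
```

Keys listed in `excluded_embed_metadata_keys` or `excluded_llm_metadata_keys` play the part of `excluded_keys` here, which is why the embedding model and the LLM can each see a different rendering of the same node.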

### Step 4: Index the documents

To ingest documents into an index, we need an embedding model that converts the document content into vector representations. These embeddings enable efficient similarity search and retrieval.

In [11]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")



In LlamaIndex, we can create an index using the `VectorStoreIndex` class, which enables efficient storage and retrieval of document embeddings and integrates with various storage backends and embedding models. Here, the storage context points it at the Chroma collection we defined earlier.

In [12]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
    show_progress=True,
)

Parsing nodes: 100%|██████████| 11/11 [00:00<00:00, 796.36it/s]
Generating embeddings: 100%|██████████| 14/14 [00:04<00:00,  3.35it/s]


### Step 5: Query the Index for Retrieval

Once the documents are indexed, we can perform retrieval on them. This allows us to ask questions or search for relevant content based on the embeddings stored in the index.

In [13]:
retriever = index.as_retriever(
    similarity_top_k=3,
)

nodes_with_score = retriever.retrieve("What is the Flying Spaghetti Monster?")
nodes = [n.node for n in nodes_with_score]
data_to_df(nodes)

Unnamed: 0,id_,embedding,metadata,excluded_embed_metadata_keys,excluded_llm_metadata_keys,relationships,text,mimetype,start_char_idx,end_char_idx,text_template,metadata_template,metadata_seperator,class_name
0,c6384548-d6f7-4ee1-8032-71273bb332a8,,"{'page_label': '1', 'file_name': '2379143-20_r...","[file_name, file_type, file_size, creation_dat...","[file_name, file_type, file_size, creation_dat...",{'NodeRelationship.SOURCE': {'node_id': '9c081...,Compte privé CSX 2379143-20\nMonnaie Francs su...,text/plain,0,1801,{metadata_str}\n\n{content},{key}: {value},\n,TextNode
1,1dca2c88-ab6d-4dc9-90ff-1f06f4a8f2e5,,"{'page_label': '5', 'file_name': '2379143-20_r...","[file_name, file_type, file_size, creation_dat...","[file_name, file_type, file_size, creation_dat...",{'NodeRelationship.SOURCE': {'node_id': 'f6374...,10.24 TWINT Paiement du 12.10.24 à 09:22\nSB...,text/plain,1540,2046,{metadata_str}\n\n{content},{key}: {value},\n,TextNode
2,ffec4715-7a55-4e6b-a9ec-40a7eb882daf,,"{'page_label': '3', 'file_name': '2379143-20_r...","[file_name, file_type, file_size, creation_dat...","[file_name, file_type, file_size, creation_dat...",{'NodeRelationship.SOURCE': {'node_id': '37cd6...,Relevé de postes détaillé 17.03.2024 au 16.04....,text/plain,0,2027,{metadata_str}\n\n{content},{key}: {value},\n,TextNode


Congrats! You've retrieved your first data!
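
At its core, retrieval compares the embedding of the query with the stored chunk embeddings and keeps the closest ones. The toy sketch below uses cosine similarity on hand-written 3-dimensional vectors to show what `similarity_top_k` controls. It illustrates the idea only: real embeddings have hundreds of dimensions, and Chroma relies on its own index and distance setting rather than this brute-force loop. The names `cosine_similarity`, `top_k`, and `toy_vectors` are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_id, score) pairs for the k most similar vectors."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-written stand-ins for chunk embeddings.
toy_vectors = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.7, 0.7, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], toy_vectors, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The query vector points almost in the same direction as `chunk-a`, so that chunk ranks first, followed by `chunk-b`; `chunk-c` is orthogonal to the query and is dropped by `k=2`.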

## Exercise 3 - Your First RAG!

A Retrieval-Augmented Generation (RAG) system needs a Large Language Model (LLM) to generate answers to your queries by combining the retrieved passages with the model's own reasoning. Here, Ollama serves as that LLM; we wrap it in LlamaIndex's `Ollama` class.

In [14]:
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3.2", request_timeout=120.0)

Everything is ready for querying your data. You can define a query engine and start asking it questions. Congrats, you have a working RAG!

In [17]:
query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,
    streaming=True,
)



In [19]:
response = query_engine.query("What is the biggest category of expense in October 2024?")
response.print_response_stream()

Based on the provided transaction records for October 2024, the biggest category of expense appears to be 'Carte de débit paiement point vente CHF' (point-of-sale card purchases), with several transactions exceeding 7.00 CHF.
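
Conceptually, a query engine chains the retrieval step from Exercise 2 with a call to the LLM: retrieve the top chunks, paste them into a prompt together with the question, and send that prompt to the model. The sketch below shows this flow with stand-in functions so that it runs without a model or a vector store. It is a simplification under assumptions, not LlamaIndex's implementation: `build_prompt`, `answer`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the real engine may spread the context over several LLM calls.

```python
def build_prompt(question, retrieved_chunks):
    """Paste the retrieved chunks and the question into a single prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        f"Question: {question}\n"
        "Answer: "
    )


def answer(question, retrieve, generate, k=3):
    """Retrieve k chunks, build one prompt from them, and ask the LLM."""
    chunks = retrieve(question, k)
    return generate(build_prompt(question, chunks))


# Stand-ins so the sketch runs on its own.
def fake_retrieve(question, k):
    return ["chunk one", "chunk two", "chunk three"][:k]


def fake_generate(prompt):
    return f"(LLM would answer here; prompt had {len(prompt)} characters)"


reply = answer("What is in my documents?", fake_retrieve, fake_generate, k=2)
print(reply)
```

In the notebook, `retrieve` corresponds to the retriever built on the Chroma-backed index and `generate` to the `llama3.2` model served by Ollama.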

## Going further...

### Prompt template

LlamaIndex lets you steer the generated answer with a custom prompt template. When the query runs, the retrieved context and the question are substituted into the template's `{context_str}` and `{query_str}` placeholders.

In [20]:
from llama_index.core import PromptTemplate

# The template must expose {context_str} (the retrieved chunks) and
# {query_str} (the user question); both are filled in before the LLM call.
template = (
    "Carefully read each of the bank reports below, analyse each expense, "
    "and try to categorize them.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Question: {query_str}\n"
    "Answer: "
)
qa_template = PromptTemplate(template)


query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,
    streaming=True,
    text_qa_template=qa_template,
)

response = query_engine.query("What is the biggest category of expense in October 2024?")

In [21]:
response.print_response_stream()

Both placeholders matter. When the template contains only the instruction sentence, with no `{context_str}` or `{query_str}`, the retrieved statements never reach the model, and it replies along these lines:

I'd be happy to help you with analyzing and categorizing expenses from a bank report. However, I don't see any specific bank report provided in your message.

Please provide the actual bank report or the details of the expenses you'd like me to analyze, such as:

* A list of transactions
* A statement of accounts
* Specific categories (e.g., housing, transportation, food)

Once I have the necessary information, I'll do my best to categorize each expense and provide an analysis.

Please paste the bank report or the expenses you'd like me to analyze, and I'll get started!
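
A reply like the one above, where the model reports that no bank report was provided, is a sign that the prompt did not carry the retrieved context. A quick check that a template contains the `{context_str}` and `{query_str}` placeholders before it is passed as `text_qa_template` can catch this early. The helper `missing_placeholders` is made up for this sketch and relies only on the standard library.

```python
from string import Formatter


def missing_placeholders(template, required=("context_str", "query_str")):
    """Return the required placeholder names that the template does not contain."""
    present = {field for _, field, _, _ in Formatter().parse(template) if field}
    return [name for name in required if name not in present]


instruction_only = "Analyse each expense and try to categorize them."
with_placeholders = instruction_only + "\n{context_str}\nQuestion: {query_str}\nAnswer: "

print(missing_placeholders(instruction_only))   # ['context_str', 'query_str']
print(missing_placeholders(with_placeholders))  # []
```

An empty list means the template is ready to receive both the retrieved chunks and the question.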

### Better embeddings

Under the hood, the query engine uses a basic vector retriever. Let's look at the nodes it returned and at how their data is stored.

In [None]:
nodes_with_score = response.source_nodes
nodes = [n.node for n in nodes_with_score]
data_to_df(nodes)

In [None]:
from llama_index.core.schema import MetadataMode

node = nodes[0]

In [None]:
print(node.text)

But what do the models see exactly? Let's have a look.

In [None]:
print(
    "The Embedding model sees this: \n",
    node.get_content(metadata_mode=MetadataMode.EMBED),
)

In [None]:
print(
    "The LLM sees this: \n",
    node.get_content(metadata_mode=MetadataMode.LLM),
)

We might want to change what gets embedded. For example, we can split the documents into smaller chunks, so that each embedding covers a narrower piece of text.

In [None]:
# Reset the index data
index.vector_store.clear()

In [None]:
from llama_index.core.node_parser import SentenceSplitter

# chunk_size is measured in tokens; smaller chunks give more, finer-grained nodes.
sentence_splitter = SentenceSplitter(chunk_size=200)

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
    show_progress=True,
    transformations=[sentence_splitter],
)
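
To get a feel for what sentence-based chunking does, here is a deliberately simplified sketch that packs whole sentences into chunks under a size budget. It is not how `SentenceSplitter` is implemented: this version counts words instead of tokens, uses a naive regular expression to find sentence boundaries, and has no overlap between chunks. The names `split_sentences` and `chunk_sentences` are made up for this example.

```python
import re


def split_sentences(text):
    """Naive sentence split on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(text, max_words=20):
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own oversized chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six seven. Eight nine. Ten eleven twelve thirteen."
chunks = chunk_sentences(sample, max_words=7)
for chunk in chunks:
    print(repr(chunk))
```

Lowering the budget produces more, shorter chunks, which is the same trade-off that `chunk_size` controls in the cell above: finer retrieval granularity at the cost of less context per chunk.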

There are many more ways to improve a RAG system; explore them in the official LlamaIndex documentation!