# The basics of RAG from scratch

With this notebook you can ask questions about your own document. 
It uses ollama to run LLM's locally. 

Make sure you have downloaded and installed ollama from www.ollama.com.

#### Contents
0. Install and import packages
1. Check available models in ollama + set system prompt
2. Get the text document and split into paragraphs (chunks of text)
3. Embeddings¶
4. Set the prompt, create prompt embeddings and do similarity search
5. Get response from the LLM

#### Source
source & acknowledgements: https://decoder.sh/videos/rag-from-the-ground-up-with-python-and-ollama 

This notebook follows the flow in this flowdiagram. Please study it carefully.
![slide1](slide_flow.png)

## 0. Install and import packages

In [1]:
%pip install ollama
%pip install json

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


ERROR: Could not find a version that satisfies the requirement json (from versions: none)
ERROR: No matching distribution found for json


In [2]:
#import packages
import ollama
import time
import os
import json
import numpy as np
from numpy.linalg import norm

## 1. Check availabe models in ollama + set system prompt

For this notebook you'll need two models:
- an embedding model: nomic-embed-text
- any LLM: tinyllama, mistral or other (choose yourself, set later on)

In [3]:
!ollama list

NAME                   	ID          	SIZE  	MODIFIED       
nomic-embed-text:latest	0a109f422b47	274 MB	30 seconds ago	
mistral:latest         	2ae6f6dd7a3d	4.1 GB	8 minutes ago 	


In [5]:
#uncomment if necessary
!ollama pull nomic-embed-text

'ollama' is not recognized as an internal or external command,
operable program or batch file.


In [4]:
#set the system prompt
SYSTEM_PROMPT = """You are a helpful reading assistant who answers questions 
        based on snippets of text provided in context. Answer only using the context provided, 
        being as concise as possible. If you're unsure, just say that you don't know."""

## 2. Get the text document and split into paragraphs (chunks of text)

In this case, we simply use .txt files from the Project Gutenberg website.
www.gutenberg.org

In [5]:
# function to open a file and return paragraphs
def parse_file(filename):
    with open(filename, encoding="utf-8-sig") as f:
        paragraphs = []
        buffer = []
        for line in f.readlines():
            line = line.strip()
            if line:
                buffer.append(line)
            elif len(buffer):
                paragraphs.append((" ").join(buffer))
                buffer = []
        if len(buffer):
            paragraphs.append((" ").join(buffer))
        return paragraphs

<font color="lightblue">

1. Bestand openen: De functie opent een bestand met een opgegeven bestandsnaam, gebruikmakend van UTF-8-SIG encoding.
2. Initialisatie: Twee variabelen worden aangemaakt: paragraphs (een lijst van paragrafen) en buffer (een tijdelijke opslag voor regels van een paragraaf).
3. Regels lezen: Voor elke regel in het bestand:
    - De regel wordt gestript van witruimte.
    - Als de regel niet leeg is, wordt deze toegevoegd aan buffer.
    - Als de regel leeg is en buffer bevat regels, worden de regels in buffer samengevoegd tot een paragraaf en toegevoegd aan paragraphs, waarna buffer wordt geleegd.

4. Buffer controleren: Na het lezen van alle regels, als buffer nog regels bevat, worden deze samengevoegd en toegevoegd aan paragraphs.
5. Resultaat: De functie retourneert de lijst van paragrafen.

</font>

In [6]:
# open file as provided in the GitHub repo
filename = "peter-pan.txt"
paragraphs = parse_file(filename)
print(paragraphs[0:3]) #print first 3 paragraphs
print(f'Total number of paragraphs: {len(paragraphs)}')

['The Project Gutenberg eBook of Peter Pan', 'This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook.', 'Title: Peter Pan']
Total number of paragraphs: 1736


## 3. Embeddings

In [7]:
# functions to save, load and get the embeddings
def save_embeddings(filename, embeddings):
    # create dir if it doesn't exist
    if not os.path.exists("embeddings"):
        os.makedirs("embeddings")
    # dump embeddings to json
    with open(f"embeddings/{filename}.json", "w") as f:
        json.dump(embeddings, f)

def load_embeddings(filename):
    # check if file exists
    if not os.path.exists(f"embeddings/{filename}.json"):
        return False
    # load embeddings from json
    with open(f"embeddings/{filename}.json", "r") as f:
        return json.load(f)

def get_embeddings(filename, modelname, chunks):
    # check if embeddings are already saved
    if (embeddings := load_embeddings(filename)) is not False:
        return embeddings
    # get embeddings from ollama
    embeddings = [
        ollama.embeddings(model=modelname, prompt=chunk)["embedding"]
        for chunk in chunks
    ]
    # save embeddings
    save_embeddings(filename, embeddings)
    return embeddings

<font color="lightblue">

#### save_embeddings

Slaat embeddinggegevens op in een JSON-bestand.

- Werking:
    - Controleert of de directory "embeddings" bestaat, en maakt deze indien nodig aan.
    - Schrijft de embeddings naar een JSON-bestand met de opgegeven bestandsnaam binnen de "embeddings" directory.


#### load_embeddings

Laadt embeddinggegevens van een JSON-bestand als het bestaat.

- Werking:

    - Controleert of het JSON-bestand met de embeddings bestaat.
    - Als het bestand bestaat, leest het de embeddings vanuit het JSON-bestand en retourneert deze.
    - Als het bestand niet bestaat, retourneert het False.

#### get_embeddings

Verkrijgt embeddings voor gegeven tekststukken (chunks), hetzij door ze te laden van een opgeslagen bestand, hetzij door ze te genereren en vervolgens op te slaan.

- Werking:
    - Controleert of de embeddings al zijn opgeslagen door load_embeddings aan te roepen. Als ze bestaan, worden deze teruggegeven.
    - Als de embeddings nog niet zijn opgeslagen, worden deze gegenereerd door de ollama.embeddings functie aan te roepen voor elk tekststuk (chunk).
    - De nieuw gegenereerde embeddings worden opgeslagen met behulp van save_embeddings.
    - De embeddings worden geretourneerd.


</font>

In [8]:
#get the embeddings
embeddings = get_embeddings(filename, "nomic-embed-text", paragraphs)
len(embeddings) #should be same number as paragraphs

1736

## 4. Set the prompt, create prompt embeddings, do similarity search

In [9]:
#set the prompt
prompt = "Tell me about tinke bell?"

In [10]:
# Create the prompt embeddings. Use the same embedding model!
prompt_embedding = ollama.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]

In [11]:
# find cosine similarity of every chunk to a given embedding
def find_most_similar(needle, haystack):
    needle_norm = norm(needle)
    similarity_scores = [
        np.dot(needle, item) / (needle_norm * norm(item)) for item in haystack
    ]
    return sorted(zip(similarity_scores, range(len(haystack))), reverse=True)

<font color="lightblue">

#### find_most_similar

Vindt de meest vergelijkbare embedding(s) in een lijst van embeddings (haystack) ten opzichte van een gegeven embedding (needle) door gebruik te maken van de cosinus-similariteit.

- Werking:

1. Normalisatie van de needle embedding: Bereken de norm (lengte) van de needle embedding.
2. Berekening van similariteitsscores: Voor elke embedding in de haystack wordt de cosinus-similariteit met de needle berekend:
    - De cosinus-similariteit tussen twee vectoren wordt berekend als de dot-product gedeeld door het product van hun normen.
3. Sorteren van scores: De similariteitsscores worden samen met hun indexen gesorteerd in aflopende volgorde van similariteit.
4. Resultaat: De gesorteerde lijst van tuples (similariteitsscore, index) wordt geretourneerd.

</font>

In [12]:
# find paragraphs most similar to the prompt
most_similar_chunks = find_most_similar(prompt_embedding, embeddings)[:5]
most_similar_chunks

[(0.6675811809011156, 1576),
 (0.6637123102530766, 248),
 (0.6530704549422695, 252),
 (0.6492243466576073, 1024),
 (0.6416622182936448, 278)]

## 5. Get a response from the LLM

In [13]:
#set your model here!
model='mistral'

In [14]:
response = ollama.chat(
        model= model,
        messages=[
            {
                "role": "system",
                "content": SYSTEM_PROMPT
                + "\n".join(paragraphs[item[1]] for item in most_similar_chunks),
            },
            {"role": "user", "content": prompt},
        ],
    )
print("\n\n")
print(response["message"]["content"])




 Tinker Bell is a fairy who mends pots and kettles. She can become insolent and uses harsh language, as shown by her response "You silly ass." This interaction suggests she has some level of intelligence and strong feelings.


# Own data set (recepten)

In [23]:
# open file as provided in the GitHub repo
filename = "recepten.txt"
paragraphs = parse_file(filename)
print(paragraphs[0:3]) #print first 3 paragraphs
print(f'Total number of paragraphs: {len(paragraphs)}')

['Almond Meal Chicken Fingers', 'Ingredients: 2 Large chicken breasts or tenders 1 cup almond meal or almond flour 2 eggs Various spices to tase: Oregano, salt, pepper, cajun spice, cayenne pepper, Paprika, garlic salt, onion salt, thyme, etc Veggie as a side', 'In one mixing bowl add: 2 eggs whipped (for dip and roll) In another bowl: spices and 1 cup almond flour']
Total number of paragraphs: 42


In [24]:
#get the embeddings for new dataset
filename = "recepten.txt"
embeddings = get_embeddings(filename, "nomic-embed-text", paragraphs)
len(embeddings) #should be same number as paragraphs

42

In [27]:
#set the prompt
prompt = "How do I make a creamy grape salad?"

# Create the prompt embeddings. Use the same embedding model!
prompt_embedding = ollama.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]
# find paragraphs most similar to the prompt
most_similar_chunks = find_most_similar(prompt_embedding, embeddings)[:5]

response = ollama.chat(
        model= model,
        messages=[
            {
                "role": "system",
                "content": SYSTEM_PROMPT
                + "\n".join(paragraphs[item[1]] for item in most_similar_chunks),
            },
            {"role": "user", "content": prompt},
        ],
    )
print("\n\n")
print(response["message"]["content"])




 To make a Creamy Grape Salad, follow these steps:
1. In a large bowl, beat the softened cream cheese, sour cream, sugar, and vanilla until blended.
2. Add seedless red and green grapes to the mixture and toss to coat.
3. Transfer the mixture to a serving bowl, cover, and refrigerate until serving.
4. Just before serving, sprinkle with brown sugar and chopped pecans.
No need for the ingredients listed under "Optional" in this recipe.


<font color="lightblue">

Hier kan ze zien dat dit RAG model het recept uit de recepten text file haalt. Dit recept is dus gebasseerd op het daadwerkelijke recept in de text file.

</font>

In [28]:
#set the prompt
prompt = "How do I make a banana muffins?"

# Create the prompt embeddings. Use the same embedding model!
prompt_embedding = ollama.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]
# find paragraphs most similar to the prompt
most_similar_chunks = find_most_similar(prompt_embedding, embeddings)[:5]

response = ollama.chat(
        model= model,
        messages=[
            {
                "role": "system",
                "content": SYSTEM_PROMPT
                + "\n".join(paragraphs[item[1]] for item in most_similar_chunks),
            },
            {"role": "user", "content": prompt},
        ],
    )
print("\n\n")
print(response["message"]["content"])




 To make Banana Nut Muffins, follow these steps:

1. Preheat your oven to 350°F (175°C). Grease or line a muffin tin with paper liners.
2. In a medium-sized bowl, whisk together the flour, baking soda, baking powder, and salt.
3. In a separate large bowl, combine the mashed bananas, sugar, beaten egg, melted butter, and vanilla extract. Mix until well combined.
4. Gradually add the dry ingredient mixture to the banana mixture. Stir until just combined, avoiding overmixing. Fold in the chopped nuts.
5. Spoon the batter into the prepared muffin tin, filling each cup about two-thirds full.
6. Bake for 20-25 minutes, or until a toothpick inserted into the center of a muffin comes out clean.
7. Allow the muffins to cool in the tin for 5 minutes, then transfer to a wire rack to cool completely.
8. Enjoy your delicious homemade Banana Nut Muffins!


<font color="lightblue">

Ook hier basseerd hij het recept op recepten uit de text file.

</font>

In [29]:
#set the prompt
prompt = "How do I make spaghetti?"

# Create the prompt embeddings. Use the same embedding model!
prompt_embedding = ollama.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]
# find paragraphs most similar to the prompt
most_similar_chunks = find_most_similar(prompt_embedding, embeddings)[:5]

response = ollama.chat(
        model= model,
        messages=[
            {
                "role": "system",
                "content": SYSTEM_PROMPT
                + "\n".join(paragraphs[item[1]] for item in most_similar_chunks),
            },
            {"role": "user", "content": prompt},
        ],
    )
print("\n\n")
print(response["message"]["content"])




 To make spaghetti, follow steps 4-5 in the provided recipe for Spaghetti Bolognese. Here's a summary:

1. Cook the spaghetti noodles according to the package instructions.
2. Drain and set aside.
3. Serve the cooked spaghetti topped with the prepared Bolognese sauce (as described in steps 1-6 of the recipe) and sprinkled with Parmesan cheese.

Enjoy your homemade spaghetti!


<font color="lightblue">

Nu stel ik een iets bredere/vagere vraag, alleen spaghetti. Hier krijg ik dus een niet compleet antwoord. Ik krijg alleen het laatste deel van het recept mee in mijn antwoord. 

Dit kan misschien ook liggen aan iets anders, en niet aan de formulering van mijn vraag.

</font>