curl -fsSL https://ollama.com/install.sh | sh

# Introduction to Retrieval-Augmented Generation (RAG) Workflows

In this notebook, we will explore common workflows for Retrieval-Augmented Generation (RAG). RAG combines information retrieval with natural language generation: relevant documents are retrieved first and then passed to the model as context, which tends to produce more accurate and contextually relevant responses than generation from the LLM alone. This approach is particularly useful when the model needs to answer from a large corpus of documents or a knowledge base.

## Objectives
- Understand the basic concepts of Retrieval-Augmented Generation.
- Learn how to set up a retrieval system to fetch relevant documents.
- Integrate the retrieval system with a generation model to produce augmented responses.
- Explore different use cases and applications of RAG.

## Prerequisites
- Very basic understanding of natural language processing (NLP) and machine learning.
- Familiarity with Python programming.

## Notebook Overview
1. **Data Preparation**: Load and preprocess the corpus of documents.
2. **Data Transformation**: Explore tokenization and vector embeddings.
3. **Simple Retrieval System Setup**: Implement a simple vector-based retrieval system to fetch relevant documents based on a query.
4. **Generative Model Integration**: Combine the simple retrieval system with a generation model to produce responses.
5. **Advanced Retrieval System Setup**: Implement a more advanced retrieval system, using HNSW and BM25.
6. **Try Prompt Tuning**: Experiment with different prompts and RAG variations.
7. **Evaluation**: Assess the performance of the RAG system using appropriate metrics.

Let's get started by setting up our environment and loading the necessary libraries.

In [3]:
%pip install requests
%pip install gensim
%pip install datasets
%pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu125/

Note: you may need to restart the kernel to use updated packages.
Looking in indexes: https://pypi.org/simple, https://abetlen.github.io/llama-cpp-python/whl/cu125/
Collecting llama-cpp-python
  Using cached llama_cpp_python-0.3.1.tar.gz (63.9 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Using cached diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting jinja2>=2.11.3 (from llama-cpp-python)
  Using cached jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting MarkupSafe>=2.0 (from jinja2>=2.11.3->llama-cpp-python)
  Using cached MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Using cached diskcache-5.6.3-py3-none-any.whl (45

In [3]:
import requests

# Define the Ollama API endpoint and your model
MODEL_API_URL = "http://localhost:11434/api/chat"
MODEL_NAME = "llama3.1:8b"
EMBEDDING_API_URL = "http://localhost:11434/api/embeddings"
EMBEDDING_MODEL_NAME = "nomic-embed-text"


def generate_text(prompt):
    """Send a single-turn chat request to Ollama and return the reply text."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        # Ollama reads sampling parameters from the "options" object
        "options": {
            "num_predict": 256,  # Maximum number of tokens to generate
            "temperature": 0.7,  # Adjust temperature for randomness
        },
        "stream": False,
    }

    response = requests.post(MODEL_API_URL, json=payload)

    if response.status_code == 200:
        return response.json().get("message", {}).get("content")
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None


def generate_embedding(prompt):
    payload = {
        "model": EMBEDDING_MODEL_NAME,
        "prompt": prompt,
    }

    response = requests.post(EMBEDDING_API_URL, json=payload)

    if response.status_code == 200:
        return response.json().get("embedding")  # None if the key is missing
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None

# LLM Embeddings != Vector DB Embeddings

Explanation of Gensim:

- **Tokenization**: The sentences are converted to lowercase and split on whitespace into tokens (words). Punctuation stays attached to the neighboring word.
- **Word2Vec Model**: The Word2Vec model is trained with the specified parameters:
    - `vector_size`: Size of the embedding vectors.
    - `window`: Maximum distance between the current and predicted word within a sentence, that is, how many words to the left and right of the target word are included in the training examples.
    - `min_count`: Ignores all words with a total frequency lower than this.
    - `sg`: Skip-gram model if set to 1; CBOW if set to 0.
- **Getting Embeddings**: The `get_embedding` function retrieves the embedding for a given word, with error handling for words not found in the vocabulary.
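
To make the `window` parameter concrete, the sketch below lists the (target, context) pairs that a fixed window would produce for one tokenized sentence. The helper `context_pairs` is hypothetical and written only for illustration. Gensim builds its training examples internally and may vary the effective window during training, so treat this as an approximation of the idea rather than Gensim's own code.

```python
def context_pairs(tokens, window):
    """Return (target, context) pairs for every token within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # A word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


tokens = "the cat sat on the mat".split()
for target, context in context_pairs(tokens, window=1):
    print(target, "->", context)
```

A larger `window` pairs each word with more distant neighbors, which tends to capture broader topical similarity at the cost of more training examples.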

In [4]:
# Sample text corpus: list of sentences
sentences = [
    "The cat sat on the mat, enjoying the warm sun.",
    "The dog sat on the log, watching the world go by.",
    "Cats and dogs are both popular pets, each with unique characteristics.",
    "I love my pets because they provide companionship and joy.",
    "Pets bring joy to our lives and teach us responsibility.",
    "Training pets can be a rewarding experience for both the owner and the animal.",
    "Cats are often independent, while dogs typically seek companionship.",
    "Many families consider pets as part of their family unit.",
    "Adopting a pet can change your life and bring immense happiness.",
    "Understanding pet behavior is key to building a strong bond with them.",
]

In [6]:
from gensim.models import Word2Vec

# Tokenization
tokenized_sentences = [sentence.lower().split() for sentence in sentences]

# Training the Word2Vec model
model = Word2Vec(
    sentences=tokenized_sentences, vector_size=1024, window=5, min_count=1, sg=0
)


# Getting token embeddings
def get_embedding(word):
    try:
        return model.wv[word]
    except KeyError:
        print(f"{word} not in vocabulary")
        return None


# Example of retrieving embeddings for specific words
words_to_embed = ["cat", "dog", "pets", "love"]

for word in words_to_embed:
    embedding = get_embedding(word)
    if embedding is not None:
        print(f"Embedding for '{word}': {embedding}")

Embedding for 'cat': [-3.3003380e-04  3.4521687e-05 -5.8911741e-04 ...  6.8755902e-04
 -2.3352924e-05 -7.8444142e-04]
Embedding for 'dog': [-0.00025762  0.00087888 -0.00071964 ...  0.00081767  0.00016185
  0.0008499 ]
Embedding for 'pets': [ 3.8436285e-04  4.9113255e-04  5.9397717e-04 ... -8.1487949e-04
 -1.5125802e-05 -2.5849711e-04]
Embedding for 'love': [-0.00066168 -0.00095225 -0.00059386 ... -0.00075696 -0.00030017
  0.00017272]
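
Word2Vec yields one vector per word, whereas the embedding model used in the next cell yields one vector per sentence. One common way to get a sentence vector out of word vectors is to average them. The sketch below shows that idea with a toy dictionary; the function `mean_pool` and the 2-D vectors are assumptions made for illustration, not part of Gensim or of this notebook's pipeline.

```python
def mean_pool(tokens, word_vectors):
    """Average the vectors of the tokens found in `word_vectors`; None if none are found."""
    found = [word_vectors[t] for t in tokens if t in word_vectors]
    if not found:
        return None
    dim = len(found[0])
    return [sum(vec[i] for vec in found) / len(found) for i in range(dim)]


# Toy vocabulary (made-up values, for illustration only)
toy_vectors = {
    "cat": [1.0, 0.0],
    "sat": [0.0, 1.0],
    "mat": [1.0, 1.0],
}

# "the" is skipped because it is not in the toy vocabulary
print(mean_pool(["the", "cat", "sat"], toy_vectors))
```

Averaging discards word order, which is one reason a dedicated sentence-embedding model usually retrieves better than pooled word vectors.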


In [7]:
sentence_embeddings = []
for sentence in sentences:
    sentence_embedding = generate_embedding(sentence)
    sentence_embeddings.append(sentence_embedding)
    print(
        f"Original Sentence: {sentence}\nIts embedding vector dimension: {len(sentence_embedding)}"
    )

Original Sentence: The cat sat on the mat, enjoying the warm sun.
Its embedding vector dimension: 768
Original Sentence: The dog sat on the log, watching the world go by.
Its embedding vector dimension: 768
Original Sentence: Cats and dogs are both popular pets, each with unique characteristics.
Its embedding vector dimension: 768
Original Sentence: I love my pets because they provide companionship and joy.
Its embedding vector dimension: 768
Original Sentence: Pets bring joy to our lives and teach us responsibility.
Its embedding vector dimension: 768
Original Sentence: Training pets can be a rewarding experience for both the owner and the animal.
Its embedding vector dimension: 768
Original Sentence: Cats are often independent, while dogs typically seek companionship.
Its embedding vector dimension: 768
Original Sentence: Many families consider pets as part of their family unit.
Its embedding vector dimension: 768
Original Sentence: Adopting a pet can change your life and bring immen

# Simple RAG process

In [8]:
import numpy as np


def cosine_similarity(vec_a, vec_b):
    # Compute the dot product
    dot_product = np.dot(vec_a, vec_b)

    # Compute the magnitudes (norms) of the vectors
    norm_a = np.linalg.norm(vec_a)
    norm_b = np.linalg.norm(vec_b)

    # Calculate cosine similarity
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Handle division by zero if any vector is zero
    return dot_product / (norm_a * norm_b)


def vector_db_search(query_embedding):
    # Initialize variables to track the most similar text and its score
    most_similar_text = ""
    most_similar_score = -1

    # Loop through the list of sentence embeddings to find the most similar one
    for index, sentence_embedding in enumerate(sentence_embeddings):
        # Compute cosine similarity between the question embedding and the current sentence embedding
        computed_cosine_similarity = cosine_similarity(
            query_embedding, sentence_embedding
        )

        # Check if the computed similarity is higher than the previous best score
        if computed_cosine_similarity > most_similar_score:
            most_similar_score = computed_cosine_similarity
            most_similar_text = sentences[index]  # Update the most similar text

    return (most_similar_text, most_similar_score)
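
`vector_db_search` returns only the single best match. A RAG prompt often benefits from several passages, so here is a minimal sketch of a top-k variant. The function name `top_k_search`, its parameters, and the toy 2-D vectors are assumptions made for illustration; the sketch uses only the standard library so that it runs without NumPy or the Ollama server.

```python
import heapq
import math


def cosine(vec_a, vec_b):
    """Cosine similarity for two equal-length sequences of floats."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Avoid division by zero for zero vectors
    return dot / (norm_a * norm_b)


def top_k_search(query_embedding, embeddings, texts, k=3):
    """Return up to k (text, score) pairs, best match first."""
    scored = (
        (text, cosine(query_embedding, emb)) for text, emb in zip(texts, embeddings)
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy data (made-up values, for illustration only)
toy_texts = ["about cats", "about dogs", "about taxes"]
toy_embeddings = [[1.0, 0.1], [0.8, 0.6], [0.0, 1.0]]

results = top_k_search([1.0, 0.0], toy_embeddings, toy_texts, k=2)
for text, score in results:
    print(f"{score:.4f}  {text}")
```

This is still a linear scan over every stored vector, which is fine for ten sentences; approximate indexes such as HNSW exist to avoid that scan on large corpora.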

In [9]:
print(f"Our current sentences: {sentences}\n")

# User's question
our_question = "How do families see pets?"

# Generate an embedding for the user's question
the_embedding_of_our_question = generate_embedding(our_question)

# Query the vector DB
most_similar_text, score = vector_db_search(the_embedding_of_our_question)

# Print the user's question and the most similar sentence in a formatted way
print(
    f"Our query: '{our_question}'\n"
    f"Most similar sentence in our vector DB: '{most_similar_text}'\n"
    f"Cosine Similarity Score: {score:.4f}\n"
)

# Prepare the prompt for the RAG model using the most similar text
rag_prompt_template = f"""
You are an assistant for question answering tasks. Use the information between the <context> </context> blocks to help answer the question. If you don't know, say 'I dunno'.

Here is the user's question: {our_question}

<context> {most_similar_text} </context>
"""

# Generate and print the RAG answer, which uses the retrieved context
print("### RAG GENERATION ###")
print(generate_text(rag_prompt_template))

# Print the result of regular text generation based on the original question
print("\n### REGULAR GENERATION ###")
print(generate_text(our_question))

Our current sentences: ['The cat sat on the mat, enjoying the warm sun.', 'The dog sat on the log, watching the world go by.', 'Cats and dogs are both popular pets, each with unique characteristics.', 'I love my pets because they provide companionship and joy.', 'Pets bring joy to our lives and teach us responsibility.', 'Training pets can be a rewarding experience for both the owner and the animal.', 'Cats are often independent, while dogs typically seek companionship.', 'Many families consider pets as part of their family unit.', 'Adopting a pet can change your life and bring immense happiness.', 'Understanding pet behavior is key to building a strong bond with them.']

Our query: 'How do families see pets?'
Most similar sentence in our vector DB: 'Many families consider pets as part of their family unit.'
Cosine Similarity Score: 0.8076

### RAG GENERATION ###
Families typically view pets as beloved members of their family, often including them in daily activities and showing affect

# Rewrite-Retrieve-Read

In [10]:
# Original query
original_query = "Tell me about pets."

# Step 1: Rewrite the query
refined_prompt = generate_text(
    "Rewrite the following query to make it more specific: "
    + original_query
    + " --- Only return the rewritten query"  # Leading space separates the instruction from the query
)

print(refined_prompt)

What are the characteristics, habits, and living situations of typical household pets?
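
The cell above covers only the rewrite step. A minimal sketch of how the three stages could be chained is shown below. The function `rewrite_retrieve_read` and its callable parameters are hypothetical; in this notebook they would presumably be backed by `generate_text`, `generate_embedding`, and `vector_db_search`, but stub functions stand in for them here so the example runs without a model server. Passing the original question (rather than the rewritten one) to the final prompt is also a design assumption.

```python
def rewrite_retrieve_read(query, rewrite, retrieve, read):
    """Chain the three stages: rewrite the query, retrieve context, read (answer)."""
    rewritten = rewrite(query)
    context = retrieve(rewritten)
    prompt = (
        "Use the information between the <context> </context> blocks "
        "to help answer the question.\n\n"
        f"Here is the user's question: {query}\n\n"
        f"<context> {context} </context>"
    )
    return read(prompt)


# Stand-in stages (assumptions for illustration, not real model calls)
def fake_rewrite(query):
    return "How do families see pets?"


def fake_retrieve(query):
    return "Many families consider pets as part of their family unit."


def fake_read(prompt):
    return prompt  # Echo the prompt so we can inspect what the model would see


print(rewrite_retrieve_read("Tell me about pets.", fake_rewrite, fake_retrieve, fake_read))
```

With a running Ollama server, the retrieve stub could be swapped for something like `lambda q: vector_db_search(generate_embedding(q))[0]` and the read stub for `generate_text`.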


# Generate-Read

In [11]:
# Original query
original_query = "Tell me about pets."

# Step 1: Generate additional information about the query
generated_information = generate_text(
    "Generate some facts about the following query: "
    + original_query
    + " --- Only return the facts"  # Leading space separates the instruction from the query
)

print(generated_information)

Here are some facts about pets:

**General Facts**

1. Over 60% of households in the United States have a pet.
2. The most common pets kept by Americans are dogs, cats, and fish.

**Pet Care and Health**

3. A typical cat spends around 16-18 hours per day sleeping.
4. Dogs can hear sounds at frequencies as high as 45,000 Hz, while humans can only hear up to 20,000 Hz.
5. The average lifespan of a domestic cat is 12-17 years.
6. Some pets, such as birds and small mammals, are susceptible to stress-related behaviors if not properly socialized.

**Pet Ownership Benefits**

7. Studies have shown that owning a pet can lower blood pressure, cholesterol levels, and heart rate.
8. Children who grow up in households with pets tend to develop a stronger immune system.
9. Pet owners often experience reduced anxiety and depression symptoms compared to non-pet owners.

**Interesting Pet Behaviors**

10. Dogs have a unique nose print, just like human fingerprints.
11. Cats can't taste sweetness due 
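
The cell above covers only the generation step. A minimal sketch of the full Generate-Read flow follows: the model first produces background facts, and those facts are then supplied as context for the final answer. The function `generate_read` and the stub `fake_generate` are hypothetical names introduced for illustration; in this notebook `generate_text` would presumably take the stub's place.

```python
def generate_read(query, generate):
    """Ask the model for background facts, then answer using those facts as context."""
    facts = generate(
        "Generate some facts about the following query: "
        + query
        + " --- Only return the facts"
    )
    prompt = (
        "Use the information between the <context> </context> blocks "
        "to help answer the question.\n\n"
        f"Here is the user's question: {query}\n\n"
        f"<context> {facts} </context>"
    )
    return generate(prompt)


calls = []


def fake_generate(prompt):
    """Stand-in for a model call; records each prompt it receives."""
    calls.append(prompt)
    return f"response {len(calls)}"


answer = generate_read("Tell me about pets.", fake_generate)
print(answer)
```

Because the context comes from the model itself rather than from a document store, any inaccurate generated fact flows straight into the final answer, so this variant trades grounding for convenience.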