# Retrieval Augmented Generation Experiment Example

Retrieval-augmented generation (RAG) is a technique that improves the quality of large language model (LLM) outputs by grounding the model on external sources of knowledge. RAG works by first retrieving a set of relevant documents from a knowledge base, such as documents stored in a vector database, in response to a given prompt. The retrieved documents are then concatenated with the original prompt and fed to the LLM, which uses them to generate a more informed and accurate response.
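
To make the retrieve-then-concatenate flow concrete, here is a minimal sketch in plain Python. The word-overlap retriever is a toy stand-in for a vector database, and the helper names (`tokenize`, `retrieve`, `build_augmented_prompt`) and the prompt template are assumptions made for illustration, not part of PromptTools.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, knowledge_base, n_results=1):
    """Toy retriever: rank documents by how many tokens they share with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:n_results]


def build_augmented_prompt(query, documents):
    """Concatenate the retrieved documents with the original prompt (hypothetical template)."""
    return "Given these documents:\n" + "\n".join(documents) + "\n\n" + query


knowledge_base = [
    "Mickey Mouse is the 50th president.",
    "The 51st president is Snoopy.",
]
query = "Who was the 51st president?"

retrieved = retrieve(query, knowledge_base)
augmented_prompt = build_augmented_prompt(query, retrieved)
print(augmented_prompt)  # this string is what would be sent to the LLM
```

A real pipeline replaces the overlap score with embedding similarity, which is the part the vector database experiment below evaluates.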

As seen in other notebook examples, PromptTools enables you to test various LLMs and vector databases independently. For [example](https://github.com/hegelai/prompttools/blob/main/examples/notebooks/vectordb_experiments/ChromaDBExperiment.ipynb), you can provide a prompt to `ChromaDB` to see whether the list of returned documents is sufficiently relevant. In this example, we combine the evaluation of vector databases and LLMs and evaluate the final outputs of the whole process.

## Installations

In [None]:
# !pip install --quiet --force-reinstall prompttools

## Set up imports and API keys

We'll import the relevant `prompttools` modules to set up our experiment.

In [1]:
from prompttools.experiment import ChromaDBExperiment, OpenAICompletionExperiment, OpenAIChatExperiment
from prompttools.harness import RetrievalAugmentedGenerationExperimentationHarness

We will be using OpenAI's LLM in this example. You can set up your API key here.

In [2]:
import os

os.environ["OPENAI_API_KEY"] = ""  # Put your key here

### Set up data for retrieval with a VectorDB Experiment

There are two main steps in Retrieval Augmented Generation. We will start with the first step: retrieval.

We will set up a vector database experiment. We will insert documents into the DB with different embedding functions (vectorizers), and query the results.

For this example, we will use ChromaDB, but you can use other vector databases (e.g. Weaviate, LanceDB, Qdrant) as well. You can also experiment with different distance functions and query methods if desired.
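
As a rough illustration of why the distance function matters, the sketch below compares two common choices on toy vectors. It is plain Python with made-up vectors and hypothetical function names; it does not show how any particular vector database computes or configures its distances.

```python
import math


def squared_l2_distance(a, b):
    """Sum of squared coordinate differences; sensitive to vector magnitude."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity; depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query_vec = [1.0, 0.0]
doc_vec = [3.0, 0.0]  # same direction as the query, larger magnitude

l2 = squared_l2_distance(query_vec, doc_vec)
cos = cosine_distance(query_vec, doc_vec)
print(l2, cos)
```

The two vectors point in the same direction, so the cosine distance is zero while the squared L2 distance is not. Distances produced by different embedding functions or metrics are therefore not directly comparable in absolute terms.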

For a detailed explanation of each step, have a look at the [ChromaDB example notebook](https://github.com/hegelai/prompttools/blob/main/examples/notebooks/vectordb_experiments/ChromaDBExperiment.ipynb). It also discusses how you can try different chunking and pre-processing strategies as you insert documents into the database.

In [3]:
import chromadb
from chromadb.utils import embedding_functions


emb_fns = [
    embedding_functions.SentenceTransformerEmbeddingFunction(model_name="paraphrase-MiniLM-L3-v2"),
    embedding_functions.DefaultEmbeddingFunction(),
]  # default is "all-MiniLM-L6-v2"
emb_fns_names = ["paraphrase-MiniLM-L3-v2", "default"]

chroma_client = chromadb.Client()
# You can also create and use `chromadb.PersistentClient` or `chromadb.HttpClient`
TEST_COLLECTION_NAME = "TEMPORARY_COLLECTION"
try:
    chroma_client.delete_collection(TEST_COLLECTION_NAME)
except Exception:
    pass
collection_name = TEST_COLLECTION_NAME

use_existing_collection = False  # Specify that we want to create a collection during the experiment

# Documents that will be added into the database
add_to_collection_params = {
    "documents": ["Mickey Mouse is the 50th president.",
                  "The 51st president is Snoopy.",
                  "Batman became the 52th president briefly after."],
    "metadatas": [{"source": "my_source"}, {"source": "my_source"}, {"source": "my_source"}],
    "ids": ["id1", "id2", "id3"],
}

# Our test queries
query_collection_params = {"query_texts": ["Who was the 50th president?", "Who was the 51st president?"],
                           "n_results": [1],  # You can have each query return more results if you'd like
                          }


# Set up the experiment
vdb_experiment = ChromaDBExperiment(
    chroma_client,
    collection_name,
    use_existing_collection,
    query_collection_params,
    emb_fns,
    emb_fns_names,
    add_to_collection_params,
)


You can visualize the results and see what documents have been fetched.

In [4]:
vdb_experiment.visualize()


Unnamed: 0,query_texts,embed_fn,top doc ids,distances,documents,latency
0,Who was the 50th president?,paraphrase-MiniLM-L3-v2,[id2],[21.199106216430664],[The 51st president is Snoopy.],0.007107
1,Who was the 51st president?,paraphrase-MiniLM-L3-v2,[id2],[13.693190574645996],[The 51st president is Snoopy.],0.005465
2,Who was the 50th president?,default,[id1],[0.617713212966919],[Mickey Mouse is the 50th president.],0.019721
3,Who was the 51st president?,default,[id2],[0.6116843819618225],[The 51st president is Snoopy.],0.019985


Notice how the first embedding function returns "The 51st president is Snoopy" for both queries. This inaccuracy is going to cause problems when we pass the documents to the LLM, because it will not have the right context to answer the question.

At this point, you have results from the retrieval step. The [ChromaDB notebook example](https://github.com/hegelai/prompttools/blob/main/examples/notebooks/vectordb_experiments/ChromaDBExperiment.ipynb) shows how you can evaluate the relevance of the retrieved documents. We will skip that here for brevity.

It is often worthwhile to independently evaluate the retrieval step.
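
As a minimal sketch of such an independent check, the snippet below computes a per-embedding-function hit rate from the rows in the results table above. The `expected_id` mapping is hand-written ground truth based on the inserted documents (`id1` mentions the 50th president, `id2` the 51st); it and the variable names are assumptions for illustration, not part of PromptTools.

```python
from collections import defaultdict

# (query, embed_fn, top doc ids) copied from the retrieval results table above
retrieval_rows = [
    ("Who was the 50th president?", "paraphrase-MiniLM-L3-v2", ["id2"]),
    ("Who was the 51st president?", "paraphrase-MiniLM-L3-v2", ["id2"]),
    ("Who was the 50th president?", "default", ["id1"]),
    ("Who was the 51st president?", "default", ["id2"]),
]

# Hand-written ground truth (an assumption for this sketch)
expected_id = {
    "Who was the 50th president?": "id1",
    "Who was the 51st president?": "id2",
}

# Record, per embedding function, whether the expected id was among the top results
hits = defaultdict(list)
for query, embed_fn, top_ids in retrieval_rows:
    hits[embed_fn].append(expected_id[query] in top_ids)

hit_rate = {embed_fn: sum(found) / len(found) for embed_fn, found in hits.items()}
print(hit_rate)
```

With only two queries this is a sanity check, not a benchmark, but the same loop scales to a larger labeled query set.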

### Set up the Retrieval Augmented Generation Experiment

With the vector database experiment in place, we can set up the LLM experiment that will consume the documents retrieved from the vector DB. We need:

1. An LLM experiment (we will use `OpenAICompletionExperiment` here, but you can use something else as well)
2. LLM arguments (these will be passed into the LLM experiment)
3. A function to extract documents from the results of the vector DB experiment

These are the arguments we will use for our LLM experiment `OpenAICompletionExperiment`. For an example with `OpenAIChatExperiment` (which uses `gpt-3.5-turbo`), scroll further below.


In [5]:
models = ["babbage-002"]  # If you want to use "gpt-3.5-turbo", look further below for an example
prompts = ["Who is the 50th president?", "Who is the 51st president?"]
temperatures = [1.0]  # You can test multiple temperatures or other parameters as well
# You can add more parameters that you'd like to test here.

llm_arguments = {"model": models, "prompt": prompts, "temperature": temperatures}

We define two functions:
1. The first extracts the list of documents from each row of the vector DB experiment result. These lists will be passed to the LLM during the generation process.
2. The second generates a string of relevant metadata based on the row. This will be used later for visualization.

In [6]:
def _extract_doc_from_row(row: 'pandas.core.series.Series'):
    return row['documents']


def _extract_query_metadata_from_row(row: 'pandas.core.series.Series'):
    return f"emb_fn: {row['embed_fn']}, prompt: {row['query_texts']}"

We pass everything into the RAG experiment.

In [7]:
rag_experiment = RetrievalAugmentedGenerationExperimentationHarness(
    vector_db_experiment=vdb_experiment,
    llm_experiment_cls=OpenAICompletionExperiment,
    llm_arguments=llm_arguments,
    extract_document_fn=_extract_doc_from_row,
    extract_query_metadata_fn=_extract_query_metadata_from_row,
)
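
Judging from the prompts in the harness output, each retrieval row appears to be paired with each LLM prompt, with the retrieved documents prepended to the prompt. The sketch below reproduces that pairing in plain Python. It is an interpretation of the observed output under that assumption, not the harness's actual implementation, and the variable names are hypothetical.

```python
from itertools import product

# Documents returned by the four retrieval rows, in table order
retrieved_docs = [
    ["The 51st president is Snoopy."],
    ["The 51st president is Snoopy."],
    ["Mickey Mouse is the 50th president."],
    ["The 51st president is Snoopy."],
]
prompts = ["Who is the 50th president?", "Who is the 51st president?"]

# Pair every retrieval row with every prompt and prepend the documents
augmented_prompts = [
    "Given these documents:\n" + "\n".join(docs) + "\n\n" + prompt
    for docs, prompt in product(retrieved_docs, prompts)
]

print(len(augmented_prompts))  # 4 retrieval rows x 2 prompts
print(augmented_prompts[4])
```

Because the pairing is a full cross product, some prompts receive a document that was retrieved for a different question. Keep this in mind when reading the responses.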

In [8]:
rag_experiment.run()
rag_experiment.visualize()


Unnamed: 0,prompt,response,latency,retrieval_metadata
0,Given these documents:\nThe 51st president is Snoopy.\n\nWho is the 50th president?,1 Like her bear.\nview more\n\nLucas Cook is asking for help,0.316887,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 50th president?"
1,Given these documents:\nThe 51st president is Snoopy.\n\nWho is the 51st president?,What is the photo tag?\n\nSnoopy's aversion to being called a,0.200568,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 51st president?"
2,Given these documents:\nThe 51st president is Snoopy.\n\nWho is the 50th president?,[The 49th and 50th presidents] are Bobo.\n\nWho,0.187423,"emb_fn: default, prompt: Who was the 50th president?"
3,Given these documents:\nThe 51st president is Snoopy.\n\nWho is the 51st president?,"Tricky — and if you stick to the facts, you might figure it out",0.186136,"emb_fn: default, prompt: Who was the 51st president?"
4,Given these documents:\nMickey Mouse is the 50th president.\n\nWho is the 50th president?,"[/color]\n\nOkay. So that pretty much answers that.'\n\nNow, we just",0.217461,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 50th president?"
5,Given these documents:\nMickey Mouse is the 50th president.\n\nWho is the 51st president?,"To get answers to these questions and others like it, Disney has produced a historical",0.192069,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 51st president?"
6,Given these documents:\nThe 51st president is Snoopy.\n\nWho is the 50th president?,See the answer with this poem.\nCorrect answer is Ronald Reagan \n\nWhich 200,0.182476,"emb_fn: default, prompt: Who was the 50th president?"
7,Given these documents:\nThe 51st president is Snoopy.\n\nWho is the 51st president?,\nThe 51st president thereby ceased to be president when he wrote the document,0.193127,"emb_fn: default, prompt: Who was the 51st president?"


As you can see, `babbage-002` does not answer well even in rows where the prompt contains the right document (for example, row 4). We will use GPT-3.5 next.

In [9]:
chat_models = ["gpt-3.5-turbo"]
messages = [
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who was the 50th president?"},
    ],
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who was the 51st president?"},
    ]
]
temperatures = [1.0]  # You can test multiple temperatures or other parameters as well
# You can add more parameters that you'd like to test here.

llm_chat_arguments = {"model": chat_models, "messages": messages, "temperature": temperatures}

In [10]:
rag_experiment_2 = RetrievalAugmentedGenerationExperimentationHarness(
    vector_db_experiment=vdb_experiment,
    llm_experiment_cls=OpenAIChatExperiment,
    llm_arguments=llm_chat_arguments,
    extract_document_fn=_extract_doc_from_row,
    extract_query_metadata_fn=_extract_query_metadata_from_row,
)

rag_experiment_2.run()
rag_experiment_2.visualize()


Unnamed: 0,messages,response,latency,retrieval_metadata
0,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 50th president?'}]","Based on the given document that states ""The 51st president is Snoopy,"" we can infer that the 50th president remains unknown or unspecified.",0.869082,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 50th president?"
1,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 51st president?'}]","According to the given document, the 51st president was Snoopy.",0.572129,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 51st president?"
2,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 50th president?'}]","Based on the information provided, it is not explicitly stated who the 50th president was.",0.464698,"emb_fn: default, prompt: Who was the 50th president?"
3,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 51st president?'}]","According to the information provided, the 51st president is Snoopy.",0.439752,"emb_fn: default, prompt: Who was the 51st president?"
4,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: Mickey Mouse is the 50th president. Who was the 50th president?'}]","Based on the given document, Mickey Mouse is stated as the 50th president. However, I should note that Mickey Mouse is a fictional character and not an actual president in real life.",0.995162,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 50th president?"
5,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: Mickey Mouse is the 50th president. Who was the 51st president?'}]","Based on the information given, I cannot determine who the 51st president was.",0.383972,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 51st president?"
6,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 50th president?'}]","Based on the given information, the 50th president is not mentioned in the documents provided.",0.540173,"emb_fn: default, prompt: Who was the 50th president?"
7,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 51st president?'}]","According to the given document, the 51st president is Snoopy.",0.716542,"emb_fn: default, prompt: Who was the 51st president?"


The results from GPT-3.5 look much better, but some answers are wrong or missing because the prompt did not contain the right document. Let's automatically evaluate all the final responses.

## Evaluate the model response

To evaluate the results, you can either define your own evaluation function or use a built-in one provided by our library.

In this case, we will use the built-in `autoeval_with_documents`. Given a list of documents, it will score whether the model response is accurate with "gpt-4" as the judge, returning an integer score from 0 to 10.
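
If you prefer to define your own evaluation function, the core logic can be as simple as a string check. The sketch below is a standalone illustration: `contains_expected_answer` is a hypothetical helper, and the exact signature that `evaluate` expects from a custom function is not shown here, so check the PromptTools documentation before plugging it in.

```python
def contains_expected_answer(response, expected_answer):
    """Return 1.0 if the expected answer appears in the response (case-insensitive), else 0.0."""
    return 1.0 if expected_answer.lower() in response.lower() else 0.0


# Responses copied from the GPT-3.5 results above
good_response = "According to the given document, the 51st president was Snoopy."
weak_response = "Based on the information given, I cannot determine who the 51st president was."

good_score = contains_expected_answer(good_response, "Snoopy")
weak_score = contains_expected_answer(weak_response, "Snoopy")
print(good_score, weak_score)
```

A string check is cheap and deterministic, but it cannot judge paraphrases or partially correct answers, which is where an LLM-based judge is more useful.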


In [11]:
from prompttools.utils import autoeval_with_documents

documents = ["Mickey Mouse is the 50th president.",
             "The 51st president is Snoopy.",
             "Batman became the 52th president briefly after."]

rag_experiment_2.evaluate("Score", autoeval_with_documents, documents=[documents] * 8)
rag_experiment_2.visualize()

Unnamed: 0,messages,response,latency,retrieval_metadata,Score
0,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 50th president?'}]","Based on the given document that states ""The 51st president is Snoopy,"" we can infer that the 50th president remains unknown or unspecified.",0.869082,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 50th president?",0
1,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 51st president?'}]","According to the given document, the 51st president was Snoopy.",0.572129,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 51st president?",10
2,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 50th president?'}]","Based on the information provided, it is not explicitly stated who the 50th president was.",0.464698,"emb_fn: default, prompt: Who was the 50th president?",0
3,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 51st president?'}]","According to the information provided, the 51st president is Snoopy.",0.439752,"emb_fn: default, prompt: Who was the 51st president?",10
4,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: Mickey Mouse is the 50th president. Who was the 50th president?'}]","Based on the given document, Mickey Mouse is stated as the 50th president. However, I should note that Mickey Mouse is a fictional character and not an actual president in real life.",0.995162,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 50th president?",10
5,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: Mickey Mouse is the 50th president. Who was the 51st president?'}]","Based on the information given, I cannot determine who the 51st president was.",0.383972,"emb_fn: paraphrase-MiniLM-L3-v2, prompt: Who was the 51st president?",0
6,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 50th president?'}]","Based on the given information, the 50th president is not mentioned in the documents provided.",0.540173,"emb_fn: default, prompt: Who was the 50th president?",0
7,"[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'Given these documents: The 51st president is Snoopy. Who was the 51st president?'}]","According to the given document, the 51st president is Snoopy.",0.716542,"emb_fn: default, prompt: Who was the 51st president?",10
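
One way to read these scores is to split them by whether the document placed in the prompt actually answers the question. The sketch below does this in plain Python with the rows copied from the table above. The ordinal-matching heuristic and the helper name are assumptions that only make sense for this toy data set.

```python
import re
from statistics import mean

# (user question, document placed in the prompt, Score) copied from the table above
scored_rows = [
    ("Who was the 50th president?", "The 51st president is Snoopy.", 0),
    ("Who was the 51st president?", "The 51st president is Snoopy.", 10),
    ("Who was the 50th president?", "The 51st president is Snoopy.", 0),
    ("Who was the 51st president?", "The 51st president is Snoopy.", 10),
    ("Who was the 50th president?", "Mickey Mouse is the 50th president.", 10),
    ("Who was the 51st president?", "Mickey Mouse is the 50th president.", 0),
    ("Who was the 50th president?", "The 51st president is Snoopy.", 0),
    ("Who was the 51st president?", "The 51st president is Snoopy.", 10),
]


def document_matches_question(question, document):
    """Toy heuristic: the ordinal in the question (e.g. '50th') appears in the document."""
    ordinal = re.search(r"\d+(?:st|nd|rd|th)", question).group()
    return ordinal in document


# Group the scores by whether the prompt contained a matching document
scores_by_match = {True: [], False: []}
for question, document, score in scored_rows:
    scores_by_match[document_matches_question(question, document)].append(score)

mean_score_by_match = {matched: mean(scores) for matched, scores in scores_by_match.items()}
print(mean_score_by_match)
```

In this run, every response with a matching document scored 10 and every response without one scored 0, which suggests that the final score here is driven by the retrieval step and not by the LLM.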
