# Basic RAG with Model-graded Eval

In this example we'll build a simple RAG application over Volume 7 of *History of the United States of America* and evaluate it on the following criteria:
* **relevance** -- does the answer make sense in the context of the original question?
* **faithfulness** -- is the final answer faithful to the data that we fed into the LLM?
* **coherence** -- is the answer consistent and easy to understand?
* **succinctness** -- does the answer contain unnecessary information?

We'll use AIConfig to manage and iterate on all our prompts, both for the generation step of the RAG pipeline and for its evaluation.

## Install dependencies

Create a `.env` file containing the following line:
`OPENAI_API_KEY=<your key here>`
> You can get your key from https://platform.openai.com/api-keys 


In [1]:
%pip install python-aiconfig==1.1.20
%pip install chromadb

import dotenv
dotenv.load_dotenv()


True
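The `True` above only tells us that `load_dotenv()` loaded something from the `.env` file, not that `OPENAI_API_KEY` specifically is defined. As a minimal sketch, a fail-fast check could look like the following; `require_env` is a hypothetical helper and not part of AIConfig or python-dotenv.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`; raise if unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file.")
    return value


# Usage, after dotenv.load_dotenv() has run:
# api_key = require_env("OPENAI_API_KEY")
```

Failing here gives a clearer error than a failed API call later in the notebook.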

In [2]:
import argparse
import asyncio
import os
import sys
from aiconfig import AIConfigRuntime
import chromadb
from glob import glob




## Download the raw data
Fetch Volume 7 of *History of the United States of America* (our raw, unstructured dataset) from Project Gutenberg.

In [3]:
!mkdir -p data/books/
!curl -o data/books/pg72846.txt https://www.gutenberg.org/cache/epub/72846/pg72846.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  636k  100  636k    0     0  2721k      0 --:--:-- --:--:-- --:--:-- 2766k


In [4]:
!head data/books/pg72846.txt

The Project Gutenberg eBook of History of the United States of America, Volume 7 (of 9)
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook.



In [5]:
collection_name="us_history_volume_7"
chromadb_path="chroma_2.db"

## RAG Data Ingestion & Indexing
Chunk the data and ingest it into a Chroma DB collection.

> We use a very naive text-splitting strategy with fixed-size chunks. In a production environment, the chunking strategy is a critical step to optimize.

**Note:** You can also run this as a CLI script with the following command:
```
!python rag.py ingest data/books/ --chroma-collection-name us_history_volume_7
```

In [6]:
def chunk_markdown(text, chunk_size=1000):
    # Generator: yields consecutive fixed-size character windows of `text`.
    for i in range(0, len(text), chunk_size):
        yield text[i : i + chunk_size]
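The fixed-size splitter above cuts the text at arbitrary character offsets, so a sentence that straddles a boundary is split across two chunks. One common mitigation is to let consecutive chunks overlap. The following is a minimal sketch of that idea, not the splitter this notebook uses; `chunk_with_overlap` and its `overlap` parameter are hypothetical names.

```python
def chunk_with_overlap(text, chunk_size=1000, overlap=200):
    """Yield fixed-size chunks where each chunk repeats the last `overlap`
    characters of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        yield text[start : start + chunk_size]
        # Stop once a chunk has reached the end of the text, so we do not
        # emit a trailing chunk that is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break


# Example: 4-character chunks that overlap by 2 characters.
example_chunks = list(chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2))
```

Overlap increases the number of stored chunks, so it trades index size for a better chance that a relevant passage appears intact in at least one chunk.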

In [7]:
async def run_ingest(directory, collection_name):
    chroma_client = chromadb.PersistentClient(path=chromadb_path)
    # Raises if the collection already exists (handled by the caller below).
    collection = chroma_client.create_collection(name=collection_name)

    # The recursive glob also matches subdirectories, so keep regular files only.
    paths = [
        p for p in glob(f"{directory}/**/*", recursive=True) if os.path.isfile(p)
    ]
    for i, filename in enumerate(paths):
        print("Ingesting:", i, filename)
        documents = []
        metadatas = []
        ids = []

        with open(filename, "r", encoding="utf-8") as f:
            data = f.read()

        for j, chunk in enumerate(chunk_markdown(data)):
            documents.append(chunk)
            metadatas.append({"source": filename})
            ids.append(f"doc_{i}_chunk{j}")

        # Skip empty files: there is nothing to add.
        if documents:
            collection.add(documents=documents, metadatas=metadatas, ids=ids)

In [8]:
try:
    await run_ingest(directory="data/books", collection_name=collection_name)
except Exception as e:
    print(f"Ingest failed: {e}.\nIf the collection exists already, this is fine.")

Ingest failed: Collection us_history_volume_7 already exists.
If the collection exists already, this is fine.


## RAG Query & Response Generation
Query the index for context given a user-supplied question, and use that context to generate a response.

**Note:** You can also run this as a CLI script, for example:
```
!python rag.py query "In July, flour sold at Boston for _?" -k=10 --chroma-collection-name us_history_volume_7
```

In [9]:
def retrieve_data(collection, query, k):
    print("Querying for:", query)
    context = collection.query(query_texts=[query], n_results=k)
    return context


def serialize_retrieved_data(data):
    # print("Serializing data:", type(data), data)
    out = "\n".join(data["documents"][0])
    print("Serialized retrieved data:\n", out)
    return out


async def generate(query, context):
    config = AIConfigRuntime.load("rag.aiconfig.yaml")

    params = {
        "query": query, 
        "context": context
    }
    print("Running generate with params:", params)
    prompt = "generate"
    return await config.run_and_get_output_text(
        prompt, params=params
    )

async def run_query(query, collection_name, k):
    chroma_client = chromadb.PersistentClient(path=chromadb_path)
    collection = chroma_client.get_collection(name=collection_name)
    data = retrieve_data(collection, query, k)
    context = serialize_retrieved_data(data)
    result = await generate(query, context)
    print("\n\nResponse:\n", result)

    return (query, context, result)
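`collection.query` accepts a list of query texts and returns one list of results per query, which is why `serialize_retrieved_data` indexes `data["documents"][0]`. The sketch below uses a hand-built dictionary in place of a real Chroma result to show that shape, and a hypothetical `serialize_with_sources` variant that labels each chunk with the `source` metadata stored at ingestion time. It assumes the result includes `metadatas`, which we believe is the default.

```python
# Hand-built stand-in for a Chroma query result with one query and two hits.
# Real results carry additional keys, such as "ids" and "distances".
fake_result = {
    "documents": [["chunk A", "chunk B"]],
    "metadatas": [
        [
            {"source": "data/books/pg72846.txt"},
            {"source": "data/books/pg72846.txt"},
        ]
    ],
}


def serialize_with_sources(data):
    """Join the hits for the first query, prefixing each with its source."""
    docs = data["documents"][0]
    metas = data["metadatas"][0]
    return "\n\n".join(
        f"[source: {meta['source']}]\n{doc}" for doc, meta in zip(docs, metas)
    )


serialized = serialize_with_sources(fake_result)
```

Labeling chunks this way can make faithfulness failures easier to trace back to a specific file, at the cost of a few extra tokens in the prompt.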

In [10]:
queries = [
    "What are some of the most important events in US history?"
]
query = queries[0]

In [11]:
query, context, result = await run_query(
    query, collection_name, k=10, 
)

Querying for: What are some of the most important events in US history?


Serialized retrieved data:
 ﻿The Project Gutenberg eBook of History of the United States of America, Volume 7 (of 9)
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook.

Title: History of the United States of America, Volume 7 (of 9)
        During the second administration of James Madison


Author: Henry Adams

Release date: January 31, 2024 [eBook #72846]

Language: English

Original publication: New York: Charles Scribner's Sons, 1889

Credits: Richard Hulse and the Online Distributed Proofreading Team at https://www.pgdp.net (This file was produced from images generously made available by The





Response:
 Based on the provided context, some of the important events in US history include:

- The American declaration of war against England in 1812
- The burning of the Assembly houses in Canada in 1812
- The overthrow of Napoleon’s authority in Europe
- The close alliance between Great Britain and Russia
- The loss of the Bank of the United States
- The loss of the Massachusetts and Connecticut banks
- The Battle of the Thames in 1813
- The campaigns of General Dearborn and General Wilkinson
- The blockades and conflicts with British ships, including the battles of Chesapeake and Argus
- Privateering by the US during the war
- The last embargo implemented by the US in an attempt to obtain concessions from England
- The involvement of Russia and England in the war
- The financial challenges faced by the US Treasury
- The changing attitudes and perceptions of the British press towards the US during the war
- The opposition to the war by Federalists, particularly in Massachusetts.

## Evaluate the response
Run evals on the response against four criteria:
* **relevance** -- does the answer make sense in the context of the original question?
* **faithfulness** -- is the final answer faithful to the data that we fed into the LLM?
* **coherence** -- is the answer consistent and easy to understand?
* **succinctness** -- does the answer contain unnecessary information?
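Each criterion is graded by a separate prompt named `evaluate_<criterion>` in `rag.aiconfig.yaml`. That file is not shown here, so the following is only a hypothetical sketch of what a yes/no grading prompt could look like; the wording and the `build_eval_prompt` helper are assumptions, not the prompts this notebook actually runs.

```python
# Hypothetical rubric: one question per criterion, taken from the list above.
CRITERIA = {
    "relevance": "Does the answer make sense in the context of the original question?",
    "faithfulness": "Is the answer faithful to the provided context?",
    "coherence": "Is the answer consistent and easy to understand?",
    "succinctness": "Is the answer free of unnecessary information?",
}


def build_eval_prompt(criterion, query, context, answer):
    """Assemble a grading prompt that asks the model for a leading Yes/No."""
    rubric = CRITERIA[criterion]  # raises KeyError for an unknown criterion
    return (
        "You are grading the output of a question-answering system.\n\n"
        f"Question: {query}\n\n"
        f"Context:\n{context}\n\n"
        f"Answer:\n{answer}\n\n"
        f"Criterion: {rubric}\n"
        "Reply with 'Yes' or 'No' first, then a one-sentence justification."
    )


example_prompt = build_eval_prompt(
    "relevance", "Who wrote the book?", "Author: Henry Adams", "Henry Adams."
)
```

Asking for the verdict first keeps the reply easy to parse programmatically.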

In [12]:
async def run_evals(query, context, answer):
    config = AIConfigRuntime.load("rag.aiconfig.yaml")
    def _get_prompt(criterion):
        return f"evaluate_{criterion}"
    return [
        await config.run_and_get_output_text(
            _get_prompt(criterion),
            params={
                "query": query,
                "context": context,
                "generate": {
                    "output": answer
                }
            }
        )
        for criterion in [
            "relevance", "faithfulness", "coherence", "succinctness"
        ]
    ]


In [13]:
print(f"Evaluating...Query: {query} \n Answer: {result}")
evals = await run_evals(query, context, result)
print("Evaluations:")
for criterion, score in zip(
    ["relevance", "faithfulness", "coherence", "succinctness"], 
    evals
):
    print(f"\n\n{criterion}: {score}")


Evaluating...Query: What are some of the most important events in US history? 
 Answer: Based on the provided context, some of the important events in US history include:

- The American declaration of war against England in 1812
- The burning of the Assembly houses in Canada in 1812
- The overthrow of Napoleon’s authority in Europe
- The close alliance between Great Britain and Russia
- The loss of the Bank of the United States
- The loss of the Massachusetts and Connecticut banks
- The Battle of the Thames in 1813
- The campaigns of General Dearborn and General Wilkinson
- The blockades and conflicts with British ships, including the battles of Chesapeake and Argus
- Privateering by the US during the war
- The last embargo implemented by the US in an attempt to obtain concessions from England
- The involvement of Russia and England in the war
- The financial challenges faced by the US Treasury
- The changing attitudes and perceptions of the British press towards the US during the war

## Eval with trials

In [14]:
import pandas as pd 

async def generate_trials_for_eval(query, context, trials):
    outputs = []
    for _ in range(trials):
        result = await generate(query, context)
        outputs.append(result)

    return outputs

def get_context_for_trials(query, collection_name, k):
    chroma_client = chromadb.PersistentClient(path=chromadb_path)
    collection = chroma_client.get_collection(name=collection_name)
    data = retrieve_data(collection, query, k)
    context = serialize_retrieved_data(data)
    return context


async def run_batch_evals(query, context, trials):
    raw_results = await generate_trials_for_eval(
        query, context, trials
    )

    out = []
    answers = []
    for rr in raw_results:
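        evals_for_trial_numbers = await run_evals(query, context, rr)
        evals_for_trial = dict(
            zip(
                ["relevance", "faithfulness", "coherence", "succinctness"],
                evals_for_trial_numbers,
            )
        )

        out.append(evals_for_trial)
        answers.append(rr)

    # A criterion passes when the grader's reply starts with "yes".
    # Column-wise string methods avoid `DataFrame.applymap`, which newer
    # pandas releases deprecate.
    df_evals = pd.DataFrame.from_records(out).apply(
        lambda col: col.str.lower().str.startswith("yes")
    )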
    df_evals["query"] = query
    df_evals["answer"] = answers

    return df_evals



async def run_query_and_batch_evals(query, collection_name, k, trials):
    print("Running query and evals for:", query)
    context = get_context_for_trials(query, collection_name, k)
    return await run_batch_evals(query, context, trials)


trials = 2

df_pass = await run_query_and_batch_evals(
    query, collection_name, k=10, 
    trials=trials
)
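The batch eval above treats a criterion as passed when the grader's reply starts with "yes". A plain prefix check also accepts replies such as "Yesterday..." and rejects replies that open with quotes or markdown emphasis. As a minimal sketch of a stricter parser, assuming graders reply with a leading yes/no word (`parse_verdict` and `pass_rate` are hypothetical helpers):

```python
import re


def parse_verdict(text):
    """Return True for a leading 'yes', False for a leading 'no', and None
    when the reply starts with neither. Leading punctuation is ignored."""
    match = re.match(r"\W*(yes|no)\b", text.strip(), flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


def pass_rate(replies):
    """Percentage of parseable replies that are 'yes'; None if none parse."""
    verdicts = [v for v in map(parse_verdict, replies) if v is not None]
    if not verdicts:
        return None
    return 100.0 * sum(verdicts) / len(verdicts)


example_rate = pass_rate(["Yes, it is relevant.", "**No**. It is off-topic."])
```

Returning `None` for unparseable replies keeps them visible instead of silently counting them as failures.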


Running query and evals for: What are some of the most important events in US history?
Querying for: What are some of the most important events in US history?
Serialized retrieved data:
 ﻿The Project Gutenberg eBook of History of the United States of America, Volume 7 (of 9)
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook.

Title: History of the United States of America, Volume 7 (of 9)
        During the second administration of James Madison


Author: Henry Adams

Release date: January 31, 2024 [eBook #72846]

Language: English

Original publication: New York: Charles Scribner's Sons, 1889

C



In [15]:
pd.set_option("display.max_colwidth", 500)
display(df_pass)

# print("Trial results (% pass):")

display(df_pass.drop(columns=["answer"]).groupby("query").mean() * 100)

| | relevance | faithfulness | coherence | succinctness | query | answer |
|---|---|---|---|---|---|---|
| 0 | False | False | False | False | What are some of the most important events in US history? | The provided context does not provide any information about important events in US history. |
| 1 | True | True | True | False | What are some of the most important events in US history? | Based on the provided context, some important events in US history mentioned in the ebook "History of the United States of America, Volume 7 (of 9)" are:\n\n1. The American declaration of war against England in 1812.\n2. The Battle of the Thames.\n3. Dearborn's campaign.\n4. Wilkinson's campaign.\n5. Mobile and Fort Mims.\n6. Campaigns among the Creeks.\n7. The blockade.\n8. Chesapeake and Argus naval battles.\n9. Privateering.\n10. Russia and England's relations.\n11. The last embargo.\n12.... |


| query | relevance | faithfulness | coherence | succinctness |
|---|---|---|---|---|
| What are some of the most important events in US history? | 50.0 | 50.0 | 50.0 | 0.0 |


In [16]:
df_pass_all_queries = pd.concat(
    [
        await run_query_and_batch_evals(
            query, collection_name, k=10, 
            trials=trials
        )
        for query in queries
    ]
)
df_pass_all_queries.head()

Running query and evals for: What are some of the most important events in US history?
Querying for: What are some of the most important events in US history?
Serialized retrieved data:
 ﻿The Project Gutenberg eBook of History of the United States of America, Volume 7 (of 9)
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook.

Title: History of the United States of America, Volume 7 (of 9)
        During the second administration of James Madison


Author: Henry Adams

Release date: January 31, 2024 [eBook #72846]

Language: English

Original publication: New York: Charles Scribner's Sons, 1889

C



| | relevance | faithfulness | coherence | succinctness | query | answer |
|---|---|---|---|---|---|---|
| 0 | False | False | False | True | What are some of the most important events in US history? | The provided context does not provide information about the most important events in US history. |
| 1 | False | True | False | False | What are some of the most important events in US history? | Based on the provided context, it is not possible to determine the important events in US history. |


In [17]:
print("PCT pass:")
df_pass_all_queries.drop(columns=["answer"]).groupby("query").mean() * 100

PCT pass:


| query | relevance | faithfulness | coherence | succinctness |
|---|---|---|---|---|
| What are some of the most important events in US history? | 0.0 | 50.0 | 0.0 | 50.0 |
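With only two trials per query, these pass rates are very coarse: the two runs above report different percentages for the same query. One way to make that uncertainty explicit is to attach a confidence interval to each pass rate. The sketch below uses the Wilson score interval, a standard choice for binomial proportions at small sample sizes; it is an illustration and not part of the notebook's pipeline, and `wilson_interval` is a hypothetical helper.

```python
import math


def wilson_interval(passes, trials, z=1.96):
    """Wilson score interval for a pass rate; z=1.96 gives roughly 95%."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = passes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# One pass out of two trials, as in several cells of the tables above.
low, high = wilson_interval(passes=1, trials=2)
```

For one pass in two trials the interval spans roughly 9% to 91%, which suggests running many more trials before comparing prompts or retrieval settings.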


In [18]:
!python3 rag.py info






Starting info
Available Chroma Collections: [Collection(name=us_history_volume_7)]
