# Basic RAG with Model-graded Eval

In this example we'll build a simple RAG application over Volume 7 of History of the United States of America
and evaluate its answers on three criteria:
* **relevance** -- does the answer make sense in the context of the original question?
* **faithfulness** -- is the final answer faithful to the data that we fed into the LLM?
* **coherence** -- is the answer consistent and easy to understand?

We'll use AIConfig to manage and iterate on all our prompts, both for the generation step of the RAG pipeline and for its evaluation.

## Install dependencies

Create a `.env` file containing the following line:
`OPENAI_API_KEY=<your key here>`
> You can get your key from https://platform.openai.com/api-keys


In [None]:
!pip install python-aiconfig==1.1.20
!pip install chromadb

import dotenv
dotenv.load_dotenv()
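
If the key is missing, the failure tends to surface only later, when the first prompt runs. A minimal sketch of an early check, assuming the key is read from the process environment after `dotenv.load_dotenv()`; the helper name `require_env` is hypothetical and not part of AIConfig or python-dotenv:

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

In the notebook you could call `require_env("OPENAI_API_KEY")` right after loading the `.env` file.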

In [3]:
import argparse
import asyncio
import os
import sys
from aiconfig import AIConfigRuntime
import chromadb
from glob import glob




## Download the raw data
Fetch Volume 7 of the History of the United States of America (our raw unstructured dataset)

In [3]:
!mkdir -p data/books/
!wget https://www.gutenberg.org/cache/epub/72846/pg72846.txt -O data/books/pg72846.txt

--2024-02-07 23:22:57--  https://www.gutenberg.org/cache/epub/72846/pg72846.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 651483 (636K) [text/plain]
Saving to: ‘data/books/pg72846.txt’


2024-02-07 23:22:58 (1.35 MB/s) - ‘data/books/pg72846.txt’ saved [651483/651483]



In [5]:
!head data/books/pg72846.txt

The Project Gutenberg eBook of History of the United States of America, Volume 7 (of 9)
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook.



In [6]:
collection_name="us_history_volume_7"
chromadb_path="chroma_2.db"

## RAG Data Ingestion & Indexing
Chunk the data and ingest it into a Chroma DB collection.

> We use a very naive text-splitting strategy with fixed-size chunks. In a production system, the chunking strategy strongly affects retrieval quality and is worth tuning.

**Note:** You can also run this as a CLI script using the command 
```
!python rag.py ingest data/books/ --chroma-collection-name us_history_volume_7
```

In [6]:
def chunk_markdown(text, chunk_size=1000):
    # Generator: yield consecutive fixed-size slices of the text.
    # The last chunk may be shorter than chunk_size.
    for i in range(0, len(text), chunk_size):
        yield text[i : i + chunk_size]
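
Fixed-size chunks can cut a sentence in half, so a fact may end up split across two chunks. One common mitigation is to let consecutive chunks overlap. The following is a minimal sketch of that idea, not the splitter this notebook uses; the function name `chunk_with_overlap` and the default `overlap=200` are illustrative assumptions:

```python
def chunk_with_overlap(text, chunk_size=1000, overlap=200):
    """Yield fixed-size chunks where each chunk repeats the last `overlap`
    characters of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    start = 0
    while start < len(text):
        yield text[start : start + chunk_size]
        # Stop once the current chunk has reached the end of the text,
        # so we do not emit a trailing chunk that is pure repetition.
        if start + chunk_size >= len(text):
            break
        start += step
```

With `overlap=0` this yields the same chunks as the fixed-size splitter above.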

In [8]:
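async def run_ingest(directory, collection_name):
    chroma_client = chromadb.PersistentClient(path=chromadb_path)
    # create_collection fails if the collection already exists;
    # use get_or_create_collection instead if you re-run this cell.
    collection = chroma_client.create_collection(name=collection_name)

    for i, filename in enumerate(glob(f"{directory}/**/*", recursive=True)):
        # The recursive glob also matches subdirectories; skip anything that is not a file.
        if not os.path.isfile(filename):
            continue
        print("Ingesting:", i, filename)
        documents = []
        metadatas = []
        ids = []

        with open(filename, "r", encoding="utf-8") as f:
            data = f.read()
            for j, chunk in enumerate(chunk_markdown(data)):
                documents.append(chunk)
                metadatas.append({"source": filename})
                ids.append(f"doc_{i}_chunk{j}")

        # Chroma embeds the chunks with its default embedding model when they are added.
        collection.add(documents=documents, metadatas=metadatas, ids=ids)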

In [9]:
await run_ingest(directory="data/books", collection_name=collection_name)

Ingesting: 0 data/books/pg72846.txt


/Users/saqadri/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████| 79.3M/79.3M [00:03<00:00, 22.6MiB/s]
2024-02-07 23:32:38.369635 [W:onnxruntime:, helper.cc:67 IsInputSupported] CoreML does not support input dim > 16384. Input:embeddings.word_embeddings.weight, shape: {30522,384}
2024-02-07 23:32:38.370126 [W:onnxruntime:, coreml_execution_provider.cc:81 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 49 number of nodes in the graph: 323 number of nodes supported by CoreML: 231


## RAG Query & Response Generation
Query the index for the chunks most relevant to a user-supplied question, then pass those chunks as context to the LLM to generate a response.

**Note:** You can also run this as a CLI script using the command 
```
!python rag.py query "In July, flour sold at Boston for _?" -k=10 --chroma-collection-name us_history_volume_7
```

In [14]:
def retrieve_data(collection, query, k):
    print("Querying for:", query)
    context = collection.query(query_texts=[query], n_results=k)
    return context


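def serialize_retrieved_data(data):
    # Flatten the whole Chroma query result (ids, documents, metadatas, distances, ...)
    # into one "key=value" line per field, to be passed to the prompt as context.
    return "\n".join(f"{k}={v}" for k, v in data.items())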


async def generate(query, context):
    config = AIConfigRuntime.load("rag.aiconfig.yaml")
    params = {"query": query, "context": context}
    # print("Running generate with params:", params)
    return await config.run_and_get_output_text(
        "generate_baseline", params=params
    )

async def run_query(query, collection_name, k):
    chroma_client = chromadb.PersistentClient(path=chromadb_path)
    collection = chroma_client.get_collection(name=collection_name)
    data = retrieve_data(collection, query, k)
    print("Retrieved data:\n", "\n".join(data["documents"][0]))
    context = serialize_retrieved_data(data)
    result = await generate(query, context)
    print("\n\nResponse:\n", result)

    return (query, context, result)
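
`serialize_retrieved_data` passes the entire query result to the prompt, including ids and distances. A leaner alternative is to send only the retrieved text, labeled with its source. The sketch below assumes the result is a dict whose `"documents"` and `"metadatas"` entries hold one list per query text, which matches the `data["documents"][0]` access above; the helper name `format_context` is hypothetical:

```python
def format_context(results, query_index=0):
    """Build a prompt context string from a Chroma-style query result dict."""
    documents = results["documents"][query_index]
    all_metadatas = results.get("metadatas") or []
    # Fall back to empty metadata if the result carries none.
    metadatas = all_metadatas[query_index] if all_metadatas else [None] * len(documents)

    blocks = []
    for rank, (doc, meta) in enumerate(zip(documents, metadatas), start=1):
        source = (meta or {}).get("source", "unknown")
        blocks.append(f"[{rank}] source={source}\n{doc.strip()}")
    return "\n\n".join(blocks)
```

Inside `run_query`, this could replace the call to `serialize_retrieved_data(data)`. Whether the shorter context improves answers is something to check with the evals rather than assume.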

In [10]:
query="What was the price of flour sold in Boston in August?"

In [15]:
query, context, result = await run_query(query, collection_name, k=10)

Querying for: What was the price of flour sold in Boston in August?
Retrieved data:
 wheat to be brought by sea from Charleston or Norfolk to
Boston. Soon speculation began. The price of imported articles rose to
extravagant points. At the end of the year coffee sold for thirty-eight
cents a pound, after selling for twenty-one cents in August. Tea which
could be bought for $1.70 per pound in August, sold for three and four
dollars in December. Sugar which was quoted at nine dollars a hundred
weight in New Orleans, and in August sold for twenty-one or twenty-two
dollars in New York and Philadelphia, stood at forty dollars in
December.

More sweeping in its effects on exports than on imports, the blockade
rapidly reduced the means of the people. After the summer of 1813,
Georgia alone, owing to its contiguity with Florida, succeeded in
continuing to send out cotton. The exports of New York, which exceeded
$12,250,000 in 1811, fell to $209,000 for the year ending in 1814. The
domestic exp

## Evaluate the response
Run model-graded evals on the response across three criteria:
* **relevance** -- does the answer make sense in the context of the original question?
* **faithfulness** -- is the final answer faithful to the data that we fed into the LLM?
* **coherence** -- is the answer consistent and easy to understand?

In [16]:
async def run_evals(query, context, answer):
    config = AIConfigRuntime.load("rag.aiconfig.yaml")
    
    return [
        await config.run_and_get_output_text(
            f"evaluate_{criterion}",
            params={
                "query": query,
                "context": context,
                "answer": answer,
            },
        )
        for criterion in ["relevance", "faithfulness_baseline", "coherence"]
    ]
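
The list of criteria is written out both in `run_evals` and in the cell that prints the scores, so the two can drift apart. A minimal sketch of a variant that returns a dict keyed by criterion; the name `run_evals_by_criterion` and the injected `run_prompt` callable are assumptions made so the function can be exercised without calling the model:

```python
import asyncio

CRITERIA = ["relevance", "faithfulness_baseline", "coherence"]


async def run_evals_by_criterion(run_prompt, query, context, answer, criteria=CRITERIA):
    """Run one evaluation prompt per criterion and return {criterion: output}.

    `run_prompt` is any async callable taking (prompt_name, params).
    """
    params = {"query": query, "context": context, "answer": answer}
    scores = {}
    for criterion in criteria:
        # Prompts are run one at a time, in the same order as run_evals.
        scores[criterion] = await run_prompt(f"evaluate_{criterion}", params)
    return scores
```

In the notebook, `run_prompt` could be `lambda name, params: config.run_and_get_output_text(name, params=params)` with `config` loaded as in `run_evals`, and the result awaited directly in a cell.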


In [18]:
print(f"Evaluating...Query: {query} \n Answer: {result}")
evals = await run_evals(query, context, result)
print("Evaluations:")
for criterion, score in zip(
    ["relevance", "faithfulness_baseline", "coherence"], evals
):
    print(f"{criterion}: {score}")


Evaluating...Query: What was the price of flour sold in Boston in August? 
 Answer: The price of flour sold in Boston in August was $11.87 per barrel.
Evaluations:
relevance: No, the answer does not satisfactorily answer the question. The question asks for the price of flour in August, but the answer gives the price for July.
faithfulness_baseline: NO

The context given in the answer discusses the price of flour in July, not in August as asked in the question. Therefore, the answer isn't faithful to the context.
coherence: Yes, the answer is self-consistent and easy to understand.
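
The evaluations above are free text that happens to start with "Yes" or "No". To aggregate results over many questions, it helps to reduce each one to a pass/fail flag. This is a minimal sketch that assumes the evaluator prompts keep producing answers that begin with yes or no, as in this sample output; `parse_verdict` is a hypothetical helper:

```python
import re


def parse_verdict(evaluation):
    """Map a free-text evaluation to True (yes), False (no), or None (unclear)."""
    match = re.match(r"\s*(yes|no)\b", evaluation, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"
```

Returning `None` for anything that does not start with a clear yes or no keeps ambiguous evaluations visible instead of silently counting them as failures.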


In [6]:
!python rag.py info


Starting info
Available Chroma Collections: [Collection(name=my_collection_name)]
