# RAG-Debugger - Getting Started Tutorial

In this notebook, we showcase how to use the [RAG-Debugger](https://rag.lastmileai.dev/) to optimize your RAG pipelines. We will evaluate a demo RAG pipeline, which enables question-answering over [Paul Graham's essays](https://www.paulgraham.com/worked.html) using `gpt-3.5-turbo`. Check out our [Cookbook](https://github.com/lastmile-ai/eval-cookbook/tree/main) for more examples and tutorials.

<img width="500" alt="Screenshot 2024-05-16 at 12 27 45 PM" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/497135cb-3fc0-452b-a7fd-c04819be2fab">

## Notebook Outline
* [Step 1: Install and Setup](#install)
* [Step 2: Build and Trace RAG System](#trace)
  * [Download Data](#download_data)
  * [Trace Ingestion Pipeline](#trace_ingestion)
  * [Trace Query Pipeline](#trace_query)
  * [Access Raw Traces](#access_data)
  * [View Traces in RAG Debugger UI](#view_ui)
* [Step 3: Debug and Optimize your RAG System](#debug)
  * [Measure and Evaluate Performance](#measure)
  * [Identify Issues](#identify)
  * [Iterate and Optimize System](#iterate)


<a name="install"></a>
## Step 1: Install and Setup
1. Install required packages and modules
2. Set up API Keys/Tokens

To begin, we need to install the required packages and modules.

In [1]:
!pip install chromadb
!pip install llama-index
!pip install lastmile-eval --upgrade
!pip install python-dotenv pandas



In [2]:
import os
from functools import partial

import chromadb
import openai
import pandas as pd
from openai import OpenAI
from llama_index.core.evaluation import generate_question_context_pairs

from lastmile_eval.rag.debugger.api import (
    LastMileTracer,
    QueryReceived,
    ContextRetrieved,
    PromptResolved,
    LLMOutputReceived,
)
from lastmile_eval.rag.debugger.api.evaluation import run_and_evaluate
from lastmile_eval.rag.debugger.tracing import (
    get_lastmile_tracer,
    list_ingestion_trace_events,
    get_latest_ingestion_trace_id,
    get_trace_data,
)



We also need the following tokens/keys:

* **LastMile AI API Token:** Go to the [LastMile Settings page](https://lastmileai.dev/settings?page=tokens). You will first need to create a LastMile AI account.
* **OpenAI API Key:** Go to the [OpenAI API Keys page](https://platform.openai.com/account/api-keys) to create and access your OpenAI API Key.

In [3]:
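LASTMILE_API_TOKEN: str = ""  # Get your token from the "API Token" section of https://lastmileai.dev/settings?page=tokens
OPENAI_API_KEY: str = ""  # Get your key from https://platform.openai.com/account/api-keys

# Only set the environment variables when a value was provided, so empty strings
# don't overwrite keys that are already set (e.g., loaded from a .env file).
if OPENAI_API_KEY:
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
if LASTMILE_API_TOKEN:
    os.environ["LASTMILE_API_TOKEN"] = LASTMILE_API_TOKEN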

In [4]:
# Alternative: load the keys from a .env file in the working directory.

from dotenv import load_dotenv

if load_dotenv(override=True):
    print("Keys loaded successfully.")
else:
    print("Please make sure you have a .env in the working directory.")


Keys loaded successfully.
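
Before building the pipeline, it can help to fail fast if either key is still missing. The helper below is a hypothetical sketch (it is not part of `lastmile-eval`); it only checks that the two environment variables named in the cells above are set and non-empty.

```python
import os
from collections.abc import Mapping

# Environment variables this notebook reads (names taken from the cells above).
REQUIRED_KEYS = ("OPENAI_API_KEY", "LASTMILE_API_TOKEN")


def missing_keys(env: Mapping[str, str], required: tuple[str, ...] = REQUIRED_KEYS) -> list[str]:
    """Return the names of required keys that are absent or empty in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_keys(os.environ)
print("Missing keys:", ", ".join(missing) if missing else "none")
```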


<a name="trace"></a>

## Step 2: Build and Trace RAG System

1. Download Data (Paul Graham Essay)
2. Trace Document Ingestion Pipeline
3. Trace Query Pipeline
4. (Optional) Access Raw Trace Data
5. View Traces in RAG Debugger UI

**Note:** If you are using OpenAI, LangChain, or LlamaIndex, we offer auto-instrumentation for tracing (no manual setup required). See our documentation to learn more.

<a name="download_data"></a>

#### Download Data

In [5]:
!mkdir -p 'data/paul_graham/'
!curl 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -o 'data/paul_graham/paul_graham_essay.txt'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 75042  100 75042    0     0   806k      0 --:--:-- --:--:-- --:--:--  832k


<a name="trace_ingestion"></a>

#### Trace Document Ingestion Pipeline
In the cells below, we split our document (the Paul Graham essay) into chunks and store them in a vector database (ChromaDB). ChromaDB converts each chunk into a vector embedding with its default embedding function and indexes it, so the chunks most similar to a query can be retrieved quickly.
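
Conceptually, retrieval is a nearest-neighbour search over the stored embeddings. The sketch below illustrates the idea with hand-written 3-dimensional vectors and cosine similarity. It is not how ChromaDB works internally (ChromaDB uses an approximate index and, by default, L2 distance), and the vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Return the IDs of the k stored chunks most similar to the query vector."""
    ranked = sorted(index, key=lambda chunk_id: cosine_similarity(query_vec, index[chunk_id]), reverse=True)
    return ranked[:k]


# Toy "embeddings" for three chunks (illustrative values only).
toy_index = {
    "chunk_0": [0.9, 0.1, 0.0],
    "chunk_1": [0.1, 0.9, 0.1],
    "chunk_2": [0.8, 0.2, 0.1],
}
print(top_k_chunks([1.0, 0.0, 0.0], toy_index, k=2))  # ['chunk_0', 'chunk_2']
```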

We also instantiate a **LastMile AI Tracer object** and use it to trace the chunking step and the ingestion step (storing the embeddings in ChromaDB). Each step is traced by decorating its function with `@tracer.trace_function()`.
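
To make the decorator's role concrete, here is a minimal, hypothetical tracer written with only the standard library. It is not LastMile's implementation; it simply records, for each decorated call, the span name, inputs, output, duration, and parent span, which is roughly the kind of information that appears in the Jaeger output later in this notebook (operation names, input/output tags, durations, and parent references).

```python
import functools
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Span:
    name: str
    parent: Optional[str]
    inputs: dict[str, Any]
    output: Any = None
    duration_s: float = 0.0


@dataclass
class ToyTracer:
    """Hypothetical stand-in for a tracer: records spans in memory."""
    spans: list[Span] = field(default_factory=list)
    _stack: list[str] = field(default_factory=list)  # names of currently open spans

    def trace_function(self, name: Optional[str] = None) -> Callable:
        def decorator(fn: Callable) -> Callable:
            span_name = name or fn.__name__

            @functools.wraps(fn)
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                # The innermost open span (if any) becomes this span's parent.
                parent = self._stack[-1] if self._stack else None
                span = Span(span_name, parent, {"args": args, "kwargs": kwargs})
                self.spans.append(span)
                self._stack.append(span_name)
                start = time.perf_counter()
                try:
                    span.output = fn(*args, **kwargs)
                    return span.output
                finally:
                    span.duration_s = time.perf_counter() - start
                    self._stack.pop()

            return wrapper

        return decorator


toy_tracer = ToyTracer()


@toy_tracer.trace_function()
def chunk(text: str, size: int) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]


@toy_tracer.trace_function("ingest")
def ingest(text: str) -> int:
    return len(chunk(text, 4))


ingest("What I Worked On")
for s in toy_tracer.spans:
    print(s.name, "| parent:", s.parent, "| output:", s.output)
```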

First, instantiate a Tracer object. The name passed to the tracer ("Paul-Graham-Demo-Project") is also the project name shown in the UI.

In [6]:
tracer: LastMileTracer = get_lastmile_tracer("Paul-Graham-Demo-Project")

In [7]:
tracer

<lastmile_eval.rag.debugger.tracing.lastmile_tracer.LastMileTracer at 0x2c11c1870>

Next, set up the ingestion pipeline, with tracing added to each step.

In [8]:
chroma_client = chromadb.Client()

@tracer.trace_function()  # Decorate the function with the tracer
def chunk_document(file_path: str, chunk_size: int = 1000) -> list[str]:
    """
    Chunk a text file into a list of strings based on the specified chunk size.

    Args:
        file_path (str): The path to the text file.
        chunk_size (int): The desired number of characters in each chunk.

    Returns:
        list[str]: A list of strings, where each string represents a chunk of text.
    """
    with open(file_path, "r") as file:
        text = file.read()

    chunks: list[str] = []
    for i in range(0, len(text), chunk_size):
        chunks.append(text[i:i + chunk_size])

    return chunks

@tracer.trace_function()
def run_ingestion_flow() -> chromadb.Collection:
    collection = chroma_client.get_or_create_collection(name="paul_graham_collection")
    tracer.mark_rag_ingestion_trace_event("Ingesting Paul Graham's essay")

    document_chunks = chunk_document("data/paul_graham/paul_graham_essay.txt")
    document_ids = [f"chunk_{i}" for i in range(len(document_chunks))]

    collection.add(
        ids=document_ids,
        documents=document_chunks, # ex: ["What I Worked On", "February 2021", ...]
    )
    return collection

In [9]:
# Reuse the collection if this cell has already run in this session; otherwise ingest.
try:
    print(f"collection: {collection}")
except NameError:
    collection = run_ingestion_flow()

**Important: Linking the Ingestion Trace to the Query Pipeline**

The ingestion pipeline's trace data has an ID associated with it. Passing this ID when tracing the query pipeline links the ingestion and query steps, giving you a complete view of your RAG system.
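
The linking idea can be sketched with plain records: each query event stores the ID of the ingestion trace it ran against, so all queries served from one ingested index can be grouped together. The record layout and IDs below are hypothetical; they only mirror the shape of the calls in this notebook, where every `mark_rag_query_trace_event(...)` call receives `ingestion_trace_id`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEvent:
    query_trace_id: str
    ingestion_trace_id: str
    query: str


def group_by_ingestion(events: list[QueryEvent]) -> dict[str, list[str]]:
    """Map each ingestion trace ID to the query trace IDs linked to it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for event in events:
        grouped[event.ingestion_trace_id].append(event.query_trace_id)
    return dict(grouped)


events = [
    QueryEvent("q1", "ingest-A", "What did the author do growing up?"),
    QueryEvent("q2", "ingest-A", "What is Bel?"),
    QueryEvent("q3", "ingest-B", "What is Y Combinator?"),
]
print(group_by_ingestion(events))  # {'ingest-A': ['q1', 'q2'], 'ingest-B': ['q3']}
```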

Here is how you get the latest ingestion trace ID which you can use when setting up the tracing for the Query Pipeline.

In [10]:
# Fetch the ID of the most recent ingestion trace event.
ingestion_trace_id = list_ingestion_trace_events(take=1)["ingestionTraces"][0]["id"]

print(ingestion_trace_id)

clwsjzygn00t0qpnak5cilo54
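
The one-liner above assumes the response contains at least one ingestion trace and that the most recent one comes first. A hypothetical helper (not part of `lastmile-eval`) can make that assumption explicit and fail with a clearer message when the list is empty. It relies only on the response shape used above, `{"ingestionTraces": [{"id": ...}, ...]}`.

```python
from typing import Any


def first_trace_id(response: dict[str, Any], key: str = "ingestionTraces") -> str:
    """Return the `id` of the first record under `key`, or raise a readable error."""
    records = response.get(key) or []
    if not records:
        raise LookupError(f"No records found under {key!r}; has the pipeline been run and traced?")
    return records[0]["id"]


sample = {"ingestionTraces": [{"id": "clwsjzygn00t0qpnak5cilo54"}]}
print(first_trace_id(sample))  # clwsjzygn00t0qpnak5cilo54
```

In the notebook it would be used as `first_trace_id(list_ingestion_trace_events(take=1))`.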


<a name="trace_query"></a>

#### Trace Query Pipeline
Now that the document ingestion pipeline is built and traced, let's build a query pipeline. We will use an OpenAI model (`gpt-3.5-turbo`) to generate responses to user queries. As with the ingestion pipeline, we trace each step with `@tracer.trace_function()`, and we wrap the LLM call in a `tracer.start_as_current_span(...)` block.

**NOTE:** The document ingestion pipeline and the query pipeline are traced separately, so we link them by passing `ingestion_trace_id` to each query trace event.
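
Stripped of tracing and external services, the query pipeline is three steps: retrieve context, resolve the prompt, and call the LLM. The sketch below stubs out the retriever and the LLM (both hypothetical, so it runs offline) and records the same sequence of events the traced version marks: `QueryReceived`, `ContextRetrieved`, `PromptResolved`, and `LLMOutputReceived`. Events here are plain tuples, not the `lastmile_eval` event classes.

```python
from typing import Callable

TEMPLATE = "Context:\n{context_str}\nQuery: {query_str}\nAnswer:"


def run_pipeline(
    query: str,
    retrieve: Callable[[str, int], list[str]],
    llm: Callable[[str], str],
    top_k: int = 3,
) -> tuple[str, list[tuple[str, object]]]:
    """Run retrieve -> resolve prompt -> LLM and return (output, event log)."""
    events: list[tuple[str, object]] = [("QueryReceived", query)]
    contexts = retrieve(query, top_k)
    events.append(("ContextRetrieved", contexts))
    prompt = TEMPLATE.replace("{context_str}", "\n\n\n".join(contexts)).replace("{query_str}", query)
    events.append(("PromptResolved", prompt))
    output = llm(prompt)
    events.append(("LLMOutputReceived", output))
    return output, events


def fake_retrieve(query: str, k: int) -> list[str]:
    """Offline stand-in for the ChromaDB query (illustrative snippets)."""
    return ["He wrote short stories.", "He programmed an IBM 1401."][:k]


def fake_llm(prompt: str) -> str:
    """Offline stand-in for the OpenAI call."""
    return "stub answer"


answer, log = run_pipeline("What did the author do growing up?", fake_retrieve, fake_llm)
print([name for name, _ in log])
```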

In [11]:
# Temporary stand-in: looks up hard-coded query trace IDs from a previous run
# for the demo questions, so evaluations can be attached to those traces.
# Any other query returns None.
# Note: the last two questions currently map to the same trace ID.
def mock_get_last_rag_query_trace_id_written(user_query: str) -> str | None:
    recorded_trace_ids = {
        "What two main things did Paul Graham work on before college, outside of school?": "clwshzblp002qqynladsy4mno",
        "What was the key realization Paul Graham had about artificial intelligence during his first year of grad school at Harvard?": "clwshzdg800h9pex6058i633h",
        "How did Paul Graham and his partner Robert Morris get their initial idea and start working on what became their startup Viaweb?": "clwshzfl300qvqpna2wasu8vv",
        "What were some of the novel approaches and advantages that Y Combinator introduced compared to traditional venture capital firms when it first started?": "clwshzk620086quyzm84aa0zr",
        "What ambitious programming language project did Paul Graham work on intensively for 4 years from 2015-2019, and what was unique about the goal and approach of this language called Bel?": "clwshzk620086quyzm84aa0zr",
    }

    # Alternative (not yet wired up): look up recent query traces instead.
    # from lastmile_eval.rag.debugger.tracing import list_query_trace_events
    # traces = [r["id"] for r in list_query_trace_events(take=50)["queryTraces"]]

    return recorded_trace_ids.get(user_query, None)

In [12]:
from typing import Optional

LLM_NAME = "gpt-3.5-turbo"

PROMPT_TEMPLATE = """
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer:
"""

@tracer.trace_function("retrieve-context")  # Decorate the function with the tracer
def retrieve_context(query_string: str, top_k: int = 5) -> list[str]:
    """
    Retrieve the top-k most relevant contexts for the query string
    from the ChromaDB collection.
    """
    tracer.register_param("similarity_top_k", top_k)  # Register parameters associated with your RAG pipeline setup
    chroma_retrieval_results = collection.query(query_texts=[query_string], n_results=top_k)
    # Results are grouped per query text; we sent one query, so take the first list.
    documents_parsed_as_strings = list(chroma_retrieval_results["documents"][0])

    tracer.mark_rag_query_trace_event(ContextRetrieved(context=documents_parsed_as_strings), ingestion_trace_id)

    return documents_parsed_as_strings

@tracer.trace_function("resolve-prompt")
def resolve_prompt(user_query: str, retrieved_contexts: list[str]):
    resolved_prompt = PROMPT_TEMPLATE.replace(
        "{context_str}", "\n\n\n".join(retrieved_contexts)
    ).replace("{query_str}", user_query)
    tracer.mark_rag_query_trace_event(
        PromptResolved(fully_resolved_prompt=resolved_prompt), ingestion_trace_id
    )
    return resolved_prompt

@tracer.trace_function("query-root-span") # You can provide a custom name for the root span
def run_query_flow(user_query: str, ingestion_trace_id: str) -> tuple[str, Optional[str]]:
    tracer.mark_rag_query_trace_event(QueryReceived(query=user_query), ingestion_trace_id)

    retrieved_contexts = retrieve_context(user_query, top_k=3)
    resolved_prompt = resolve_prompt(user_query, retrieved_contexts)

    with tracer.start_as_current_span("call-llm") as _llm_span:
        openai_client = openai.Client(api_key=os.getenv("OPENAI_API_KEY"))
        response = openai_client.chat.completions.create(
            model=LLM_NAME,
            messages=[{"role": "user", "content": resolved_prompt}],
        )
        output: str = response.choices[0].message.content
        tracer.mark_rag_query_trace_event(LLMOutputReceived(llm_output=output), ingestion_trace_id)

    # Returning the RAG query trace ID allows the evaluation framework to
    # associate evaluations with the trace.
    # For now this uses the hard-coded lookup defined above; the intended
    # replacement is:
    # rag_query_trace_id = tracer.get_last_rag_query_trace_id_written()
    rag_query_trace_id = mock_get_last_rag_query_trace_id_written(user_query)

    return output, rag_query_trace_id
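
`resolve_prompt` fills the template with chained `str.replace` calls. Because `PROMPT_TEMPLATE` uses `{...}` placeholders, `str.format` would give the same result here: it substitutes values literally, so braces inside a retrieved chunk are safe. The practical difference shows up if the template itself ever gains literal braces (for example, a JSON answer format), which breaks `str.format` but not `replace`. The strings below are made up for illustration.

```python
TEMPLATE = "Context: {context_str}\nQuery: {query_str}\nAnswer:"
context = "A chunk that mentions {x} in passing."  # braces inside retrieved text
query = "What is Bel?"

via_replace = TEMPLATE.replace("{context_str}", context).replace("{query_str}", query)
via_format = TEMPLATE.format(context_str=context, query_str=query)
print(via_replace == via_format)  # True: braces in the values are not re-parsed

# A template with literal braces (e.g. a JSON example) breaks str.format ...
json_template = 'Reply as {"answer": ...}\nQuery: {query_str}'
try:
    json_template.format(query_str=query)
except (KeyError, ValueError) as err:
    print("format failed:", repr(err))
# ... while str.replace only touches the exact placeholder text.
print(json_template.replace("{query_str}", query))
```

One caveat of chained `replace` calls: a chunk that happened to contain the literal text `{query_str}` would also be substituted.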


Let's try an example user query.

In [13]:
response, rag_query_trace_id = run_query_flow("What did the author do growing up?", ingestion_trace_id)

print(f"Response: {response}")

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Response: The author did not mention specifically what they did growing up in the provided context information. The author mentioned taking art classes at Harvard, being in a PhD program in computer science, dropping out of RISD (Rhode Island School of Design), and later moving to England with their family. The focus of the information provided is more on the author's academic and professional pursuits rather than their upbringing.


<a name="access_data"></a>

#### Access Raw Trace Data
The trace data for the ingestion and query pipelines have IDs associated with them. The raw trace data is sent to both Jaeger and Postgres, as shown in the two cells below. The same data is available in a much more user-friendly form in the RAG Debugger UI, covered in the next section.

In [14]:
# Print trace data from Jaeger
get_trace_data(get_latest_ingestion_trace_id())

{'data': [{'traceID': '0f9e8840cc19bc74521a5a6ca1a6e710',
   'spans': [{'traceID': '0f9e8840cc19bc74521a5a6ca1a6e710',
     'spanID': 'ed357e98334aa294',
     'operationName': '1 - run_ingestion_flow',
     'references': [],
     'startTime': 1717031117164798,
     'duration': 6751323,
     'tags': [{'key': 'input', 'type': 'string', 'value': '{}'},
      {'key': 'output',
       'type': 'string',
       'value': '"<output not json-serializable>: All logged values must be JSON-serializable: name=\'paul_graham_collection\' id=UUID(\'32d821ef-93c4-47ee-aa9a-9174d8e9cb6a\') metadata=None tenant=\'default_tenant\' database=\'default_database\'"'},
      {'key': 'span.kind', 'type': 'string', 'value': 'internal'},
      {'key': 'internal.span.format', 'type': 'string', 'value': 'otlp'}],
     'logs': [],
     'processID': 'p1',
    {'traceID': '0f9e8840cc19bc74521a5a6ca1a6e710',
     'spanID': '206b207efcb676a7',
     'operationName': '2 - chunk_document',
     'references': [{'refType': 'C
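
The Jaeger payload is deeply nested, so a small helper makes it easier to scan. The sketch below assumes the shape shown in the output above (`data` → traces → `spans`, each with `operationName` and `duration`) and Jaeger's convention of reporting `duration` in microseconds. The sample is trimmed by hand from that output; the second span's duration is a made-up value, since the real one is cut off.

```python
from typing import Any


def summarize_spans(trace_payload: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (operationName, duration in ms) for every span in a Jaeger-style payload."""
    rows = []
    for trace in trace_payload.get("data", []):
        for span in trace.get("spans", []):
            rows.append((span["operationName"], span["duration"] / 1000))  # µs -> ms
    return rows


sample_payload = {
    "data": [{
        "traceID": "0f9e8840cc19bc74521a5a6ca1a6e710",
        "spans": [
            {"operationName": "1 - run_ingestion_flow", "duration": 6751323},
            {"operationName": "2 - chunk_document", "duration": 1500},  # made-up value
        ],
    }]
}
for name, ms in summarize_spans(sample_payload):
    print(f"{name}: {ms:.1f} ms")
```

In the notebook, the same helper could be applied to `get_trace_data(get_latest_ingestion_trace_id())`.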

In [15]:
# Print trace data from Postgres
ingestion_trace_events = list_ingestion_trace_events(take=1)
pd.DataFrame.from_records(ingestion_trace_events["ingestionTraces"]).rename(
    columns={"id": "ragIngestionTraceEventId"}
)

| | ragIngestionTraceEventId | createdAt | updatedAt | paramSet | eventName | eventData | input | output | metadata | traceId | creatorId | projectId | organizationId | visibility | active | annotations | feedback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | clwsjzygn00t0qpnak5cilo54 | 2024-05-30T01:05:24.023Z | 2024-05-30T01:05:24.023Z | {} | | {} | | | | f9e8840cc19bc74521a5a6ca1a6e710 | clp1m7n3l0062qpqnd4nyabbl | clwshp0wz0058quyz3xb8juvr | | MEMBER | True | [] | [] |


<a name="view_ui"></a>

#### View Traces in RAG Debugger UI
At this point, we have a traced ingestion pipeline and a traced query pipeline, and we can view the traces in the RAG Debugger UI. In a separate terminal, run the following command to launch the RAG Debugger:

`rag-debug launch`

Open your web browser and navigate to the URL provided by the RAG Debugger (for example, http://localhost:8080/).

Navigate to the Traces tab. You should see the traces for the ingestion pipeline and the query pipeline. It will look something like this:

<img width="1792" alt="Screenshot 2024-05-16 at 12 00 55 PM" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/adfc429e-7533-4d98-8bc7-acfc00a703f8">


<a name="debug"></a>

## Step 3: Debug and Optimize your RAG System
1. Measure and Evaluate Performance
2. Identify Issues
3. Iterate and Optimize RAG System

Evaluation is a crucial part of LLM development: to improve and debug your RAG system, you need a way to measure it. Evaluation metrics (also called evaluators) measure the quality of LLM-generated results. Evaluators can take various inputs, including the generated response, ground truth data, and retrieved context, and typically output a numeric score from 0 to 1.

Our first step is to run evaluations on data we pass into our RAG system and gather metrics we can analyze in the RAG Debugger UI.
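
To make the evaluator contract concrete (a response and a ground truth in, a score between 0 and 1 out), here is a deliberately simple, hypothetical evaluator: the fraction of distinct ground-truth words that also appear in the response. It is not the `relevance` evaluator used below, which appears to be LLM-based (its output shows `llm_classify` progress bars).

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_recall(response: str, ground_truth: str) -> float:
    """Fraction of distinct ground-truth tokens that also appear in the response (0 to 1)."""
    truth_tokens = tokenize(ground_truth)
    if not truth_tokens:
        return 0.0
    return len(truth_tokens & tokenize(response)) / len(truth_tokens)


print(token_recall("He wrote short stories and programmed.", "short stories and programming"))  # 0.75
```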

<a name="measure"></a>

#### Measure and Evaluate Performance


To evaluate our RAG system, we'll create a TestSet containing questions to ask the system. We'll compare the system's responses to ground truth answers using evaluation metrics to assess the quality and effectiveness of the pipeline in providing accurate and relevant answers based on the ingested document (Paul Graham essay).

In [16]:
user_questions = [
    "What two main things did Paul Graham work on before college, outside of school?",
    "What was the key realization Paul Graham had about artificial intelligence during his first year of grad school at Harvard?",
    "How did Paul Graham and his partner Robert Morris get their initial idea and start working on what became their startup Viaweb?",
    "What were some of the novel approaches and advantages that Y Combinator introduced compared to traditional venture capital firms when it first started?",
    "What ambitious programming language project did Paul Graham work on intensively for 4 years from 2015-2019, and what was unique about the goal and approach of this language called Bel?"
]

ground_truth_answers = [
    "The author first interacted with programming on a mainframe computer, using punch cards to input Fortran code, which was a challenging and time-consuming process",
    "The transition from the IBM 1401 to microcomputers like the TRS-80 represented a significant step forward in terms of both programming capabilities and user interaction.",
    "A turning point came after reading Nick Bostrom's \"Superintelligence,\" which presented a persuasive argument on the potential of Artificial Intelligence (AI)",
    "Heinlein's \"The Moon is a Harsh Mistress\" and Terry Winograd's SHRDLU heavily influenced the author's decision to pursue AI",
    "The author considered the AI practices during his first year of grad school as a \"hoax\" because they didn't meet his expectations for understanding and interpreting natural language accurately.",
]

For each question in the TestSet, we'll compute a **Relevance score** that measures how closely the system's response matches the corresponding ground truth answer. Since each question is associated with a specific trace, we'll obtain a Relevance score for each trace, allowing us to assess the performance of the RAG pipeline at a granular level.



In [17]:
# In this case, we are using just the default Relevance evaluation metric.
evaluator_names = {"relevance"}

evaluate_result = run_and_evaluate(
    project_id=None,
    evaluators=evaluator_names,
    run_query_fn=partial(
        run_query_flow,
        ingestion_trace_id=ingestion_trace_id
    ),
    inputs=user_questions,
    ground_truths=ground_truth_answers,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
llm_classify |██████████| 5/5 (100.0%) | ⏳ 00:02<00:00 |  2.15it/s
llm_classify |██████████| 5/5 (100.0%) | ⏳ 00:02<00:00 |  1.99it/s
llm_classify |██████████| 5/5 (100.0%) | ⏳ 00:02<00:00 |  2.20it/s
llm_classify |██████████| 5/5 (100.0%) | ⏳ 00:02<00:00 |  1.78it/s
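
`run_and_evaluate` takes a query function that accepts only the query, which is why `functools.partial` binds `ingestion_trace_id` in advance. The sketch below shows that contract with offline stand-ins: a hypothetical stub returning `(output, trace_id)` like `run_query_flow`, and a toy exact-match evaluator. The loop is only a guess at the general shape of such a runner, not `lastmile_eval`'s implementation.

```python
from functools import partial
from typing import Callable, Optional

QueryFn = Callable[[str], tuple[str, Optional[str]]]


def stub_query_flow(user_query: str, ingestion_trace_id: str) -> tuple[str, Optional[str]]:
    """Hypothetical stand-in for run_query_flow: echoes the query and fakes a trace ID."""
    return f"answer to: {user_query}", f"{ingestion_trace_id}/{user_query}"


def exact_match(output: str, truth: str) -> float:
    """Toy evaluator: 1.0 if the output equals the ground truth, else 0.0."""
    return float(output == truth)


def evaluate_all(run_query_fn: QueryFn, inputs: list[str], ground_truths: list[str],
                 evaluator: Callable[[str, str], float]) -> list[dict]:
    """Run each input through the pipeline and score it against its ground truth."""
    if len(inputs) != len(ground_truths):
        raise ValueError("inputs and ground_truths must have the same length")
    rows = []
    for query, truth in zip(inputs, ground_truths):
        output, trace_id = run_query_fn(query)
        rows.append({"query": query, "trace_id": trace_id, "score": evaluator(output, truth)})
    return rows


run_fn = partial(stub_query_flow, ingestion_trace_id="ingest-A")  # now takes only the query
results = evaluate_all(run_fn, ["q1", "q2"], ["answer to: q1", "something else"], exact_match)
print(results)
```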


In [19]:
print(f"""
    {evaluate_result.success=}
    {evaluate_result.message=}

    {evaluate_result.evaluation_result_id=}
    {evaluate_result.example_set_id=}
""")

print("Trace-level metrics:")
display(evaluate_result.df_metrics_trace)
print("Dataset-level metrics:")
display(evaluate_result.df_metrics_dataset)


    evaluate_result.success=True
    evaluate_result.message='{"id":"clwsk0ir600aiquyz6k805fzo","createdAt":"2024-05-30T01:05:50.322Z","updatedAt":"2024-05-30T01:05:50.322Z","name":"Evaluation Result","paramSet":{"similarity_top_k":3},"testSetId":"clwsk0ajg00t5qpna5g4ht1kk","creatorId":"clp1m7n3l0062qpqnd4nyabbl","projectId":null,"organizationId":null,"visibility":"MEMBER","metadata":null,"active":true}'

    evaluate_result.evaluation_result_id='clwsk0ir600aiquyz6k805fzo'
    evaluate_result.example_set_id='clwsk0ajg00t5qpna5g4ht1kk'

Trace-level metrics:


| | exampleSetId | exampleId | metricName | value |
|---|---|---|---|---|
| 0 | clwsk0ajg00t5qpna5g4ht1kk | clwsk0ajk00t7qpnau52lx2fg | relevance | 1.0 |
| 1 | clwsk0ajg00t5qpna5g4ht1kk | clwsk0ajk00t8qpnanntnsfo2 | relevance | 1.0 |
| 2 | clwsk0ajg00t5qpna5g4ht1kk | clwsk0ajk00t9qpnaaxcnplyn | relevance | 1.0 |
| 3 | clwsk0ajg00t5qpna5g4ht1kk | clwsk0ajk00taqpnaujkqougs | relevance | 1.0 |
| 4 | clwsk0ajg00t5qpna5g4ht1kk | clwsk0ajk00tbqpnaaq139pq4 | relevance | 1.0 |


Dataset-level metrics:


| | exampleSetId | metricName | value |
|---|---|---|---|
| 0 | clwsk0ajg00t5qpna5g4ht1kk | relevance_mean | 1.0 |
| 0 | clwsk0ajg00t5qpna5g4ht1kk | relevance_std | 0.0 |
| 0 | clwsk0ajg00t5qpna5g4ht1kk | relevance_count | 5.0 |
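
The dataset-level rows summarize the trace-level scores. A minimal sketch of that aggregation with the standard library follows. Whether `lastmile_eval` uses the sample or the population standard deviation is not visible here (with identical scores both are 0.0), so the sample version below is an assumption.

```python
import statistics


def summarize(metric: str, scores: list[float]) -> dict[str, float]:
    """Aggregate per-trace scores into mean / std / count, named like the table above."""
    return {
        f"{metric}_mean": statistics.fmean(scores),
        # Sample standard deviation (assumption); undefined for a single score, so report 0.0.
        f"{metric}_std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        f"{metric}_count": float(len(scores)),
    }


print(summarize("relevance", [1.0, 1.0, 1.0, 1.0, 1.0]))
# {'relevance_mean': 1.0, 'relevance_std': 0.0, 'relevance_count': 5.0}
```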


<a name="identify"></a>

#### Identify Issues

<a name="iterate"></a>

#### Iterate and Optimize RAG System

*This section is a work in progress.*