# RAG Workbench - Getting Started Tutorial

In this notebook, we showcase how to use the [RAG Workbench](https://rag.lastmileai.dev/) to optimize your information retrieval systems. We will evaluate a demo RAG system, which enables question-answering over [Paul Graham's essays](https://www.paulgraham.com/worked.html) using `gpt-3.5-turbo`.

Check out our [LastMile Evaluation notebook](https://github.com/lastmile-ai/eval-cookbook/tree/main) for more examples and tutorials.

## Notebook Outline
* [Step 1: Install and Setup](#install)
* [Step 2: Build and Trace RAG System](#trace)
  * [Download Data](#download_data)
  * [Trace Document Ingestion Pipeline](#trace_ingestion)
  * [Trace Query Pipeline](#trace_query)
  * [View Traces in RAG Workbench UI](#view_ui)
* [Step 3: Debug and Optimize your RAG System](#debug)
  * [Measure and Evaluate Performance](#measure)
  * [View Results in RAG Workbench UI](#identify)

<a name="install" id="install"></a>
## Step 1: Install and Setup

First install the required packages.

In [None]:
!pip install chromadb
!pip install lastmile-eval --upgrade
!pip install jsonref
# openai and python-dotenv are imported later in this notebook
!pip install openai python-dotenv

Import the modules used in this tutorial.

In [2]:
import chromadb
import pandas as pd
import openai

from dataclasses import asdict
from lastmile_eval.rag.debugger.api import (
    LastMileTracer,
    Node,
    RetrievedNode,
)
from lastmile_eval.rag.debugger.tracing import (
    get_lastmile_tracer,
    list_ingestion_trace_events,
    get_latest_ingestion_trace_id,
    get_trace_data,
)
from lastmile_eval.rag.debugger.common.types import RagFlowType
from functools import partial
from lastmile_eval.rag.debugger.api.evaluation import (
    run_and_evaluate,
)

We also need the following API tokens/keys:

* **LastMile AI API Token:** Go to the [LastMile Settings page](https://lastmileai.dev/settings?page=tokens). You will need to first create a LastMile AI account.
* **OpenAI API Key:** Go to [OpenAI API Keys page](https://platform.openai.com/account/api-keys) to create and access your OpenAI API Key.

Set the keys either in **Google Colab Secrets** or in a `.env` file in your working directory, then run the code cell below. Avoid pasting keys directly into the notebook.

In [3]:
import os

try:
    # If running on Google Colab, use userdata to securely input keys
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
    LASTMILE_API_TOKEN = userdata.get('LASTMILE_API_TOKEN')
except ModuleNotFoundError:
    # If running locally, load keys from .env file
    from dotenv import load_dotenv
    load_dotenv()
    OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
    LASTMILE_API_TOKEN = os.getenv('LASTMILE_API_TOKEN')

os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
os.environ['LASTMILE_API_TOKEN'] = LASTMILE_API_TOKEN
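
If either key is missing, the cell above fails with a `TypeError`, because `os.environ` only accepts string values and both lookups return `None` for an unset key. A minimal sketch of a clearer check is shown below. The helper name `missing_env_vars` is hypothetical and not part of any LastMile package.

```python
import os

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

# Use an explicit example mapping so the real environment is not touched.
example_env = {"OPENAI_API_KEY": "sk-example", "LASTMILE_API_TOKEN": ""}
missing = missing_env_vars(["OPENAI_API_KEY", "LASTMILE_API_TOKEN"], example_env)
print(missing)  # ['LASTMILE_API_TOKEN']
```

Calling `missing_env_vars([...])` without the second argument checks the real environment instead.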

<a name="trace" id="trace"></a>

## Step 2: Build and Trace RAG System

**Note:** If you are using OpenAI, LangChain, or LlamaIndex, we offer auto-instrumentation for tracing (no manual setup required).

<a name="download_data" id="download_data"></a>

#### Download Data

In [None]:
!mkdir -p 'data/paul_graham/'
!curl 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -o 'data/paul_graham/paul_graham_essay.txt'

<a name="trace_ingestion" id="trace_ingestion"></a>

#### Trace Document Ingestion Pipeline
In this step, we chunk the Paul Graham essay and store the chunks in ChromaDB, a vector database that converts the chunks into vector embeddings for efficient indexing and retrieval.

We use **LastMile Tracing** to monitor the ingestion process and log key parameters.



In [5]:
# Instantiate LastMile Tracer object
PROJECT_NAME = "LM-Tutorial"

ingestion_tracer: LastMileTracer = get_lastmile_tracer(
    tracer_name="my-tracer",
    project_name=PROJECT_NAME,
    rag_flow_type=RagFlowType.INGESTION,
)

Set up the ingestion pipeline below with LastMile Tracing.

In [6]:
chroma_client = chromadb.Client()

# Decorate chunking function with LastMile Tracer
@ingestion_tracer.trace_function()
def chunk_document(file_path: str, chunk_size: int = 1000) -> list[Node]:
    """
    Chunk a text file into a list of Nodes based on the specified chunk size.

    Args:
        file_path (str): The path to the text file.
        chunk_size (int): The desired number of characters in each chunk.

    Returns:
        list[Node]: A list of Nodes, where each node contains info
            representing a chunk of text.
    """
    with open(file_path, "r") as file:
        text = file.read()

    nodes: list[Node] = []
    for i in range(0, len(text), chunk_size):
        nodes.append(
            Node(
                id=f"node{i}",
                text=text[i:i + chunk_size],
            )
        )

    # Add the chunking event to LastMile Tracer
    ingestion_tracer.add_chunking_event(
        output_nodes=nodes,
        filepath=file_path,
        metadata={"chunk_size": chunk_size},
    )
    # Log chunk size as a parameter
    ingestion_tracer.register_ingestion_chunk_size(chunk_size)


    return nodes

# Decorate ingestion function with LastMile Tracer
@ingestion_tracer.trace_function()
def run_ingestion_flow() -> chromadb.Collection:
    filepath = "data/paul_graham/paul_graham_essay.txt"
    document_nodes: list[Node] = chunk_document(filepath)

    collection = chroma_client.create_collection(name="paul_graham_collection")
    collection.add(
        ids=[node.id for node in document_nodes],
        documents=[node.text for node in document_nodes]
    )

    # Add the synthesize event to LastMile Tracer
    ingestion_tracer.add_synthesize_event(
        input=filepath,
        output=[asdict(node) for node in document_nodes],
    )

    return collection

In [None]:
collection = run_ingestion_flow()
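
To make the chunking behavior concrete, here is a small sketch of the same fixed-size slicing on a toy string, without the tracer or ChromaDB. The function name `chunk_text` is ours, for illustration only. Note that the node IDs are character offsets (`node0`, `node4`, ...) rather than sequential indices, and that the last chunk can be shorter than `chunk_size`.

```python
def chunk_text(text: str, chunk_size: int) -> list[tuple[str, str]]:
    """Split text into fixed-size character chunks, returning (id, chunk) pairs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (f"node{start}", text[start:start + chunk_size])
        for start in range(0, len(text), chunk_size)
    ]

chunks = chunk_text("abcdefghij", chunk_size=4)
print(chunks)  # [('node0', 'abcd'), ('node4', 'efgh'), ('node8', 'ij')]
```

Because the split is purely character-based, a chunk boundary can fall in the middle of a word or sentence. This is one reason chunk size is worth logging as a parameter.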

**Important - Link Ingestion Pipeline to Query Pipeline**

Next, fetch the ID of the latest ingestion trace. Passing this ID to the tracing events of the query pipeline links the ingestion and query traces, giving you an end-to-end view of your RAG system.

In [None]:
ingestion_trace_id = list_ingestion_trace_events(take=1)["ingestionTraces"][0]["id"]

print(ingestion_trace_id)
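
The one-liner above raises an `IndexError` if no ingestion trace has been recorded yet. A defensive variant might look like the sketch below. It assumes only the response shape implied by the indexing above (a dict with an `ingestionTraces` list of dicts that carry an `id`). The helper name and the `trace-123` value are placeholders.

```python
from typing import Any, Optional

def first_ingestion_trace_id(response: dict[str, Any]) -> Optional[str]:
    """Return the ID of the first listed ingestion trace, or None if there is none."""
    traces = response.get("ingestionTraces") or []
    if not traces:
        return None
    return traces[0].get("id")

# Hand-built example responses standing in for the real API result.
print(first_ingestion_trace_id({"ingestionTraces": [{"id": "trace-123"}]}))  # trace-123
print(first_ingestion_trace_id({"ingestionTraces": []}))  # None
```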

<a name="trace_query" id="trace_query"></a>

#### Trace Query Pipeline
Now we trace the query pipeline, which uses `gpt-3.5-turbo` to generate responses to user queries. We link the query pipeline trace to the ingestion pipeline trace using `ingestion_trace_id`.



In [9]:
from typing import Optional

LLM_NAME = "gpt-3.5-turbo"

PROMPT_TEMPLATE = """
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer:
"""

# Instantiate Tracer for the Query Pipeline
tracer: LastMileTracer = get_lastmile_tracer(
    tracer_name="my-tracer",
    project_name=PROJECT_NAME,
    rag_flow_type=RagFlowType.QUERY,
)

# Decorate retrieval function with LastMile Tracer
@tracer.trace_function("retrieve-context")
def retrieve_context(
    query_string: str,
    ingestion_trace_id: str,
    top_k: int = 5,
) -> list[RetrievedNode]:
    """
    Retrieve the top-k most relevant contexts based on the query string
    from the ChromaDB collection.
    """
    # Log Top K as a parameter
    tracer.register_retrieval_top_k(top_k)

    chroma_retrieval_results = collection.query(query_texts=query_string, n_results=top_k)

    retrieved_nodes: list[RetrievedNode] = []
    for i in range(len(chroma_retrieval_results.get("documents")[0])):
        retrieved_nodes.append(
            RetrievedNode(
                text=chroma_retrieval_results.get("documents")[0][i],
                id=chroma_retrieval_results.get("ids")[0][i],
                # Invert the distance so that closer chunks get higher scores
                score=1/chroma_retrieval_results.get("distances")[0][i],
            )
        )

    # Add the retrieval event to LastMile Tracer
    tracer.add_retrieval_event(
        query=query_string,
        retrieved_nodes=retrieved_nodes,
        metadata={"top_k": top_k},
        ingestion_trace_id=ingestion_trace_id,
    )

    return retrieved_nodes

# Decorate prompt resolution function with LastMile Tracer
@tracer.trace_function("resolve-prompt")
def resolve_prompt(
    user_query: str,
    retrieved_nodes: list[RetrievedNode],
    ingestion_trace_id: str,
) -> str:
    retrieved_texts = [node.text for node in retrieved_nodes]
    resolved_prompt = PROMPT_TEMPLATE.replace(
        "{context_str}", "\n\n\n".join(retrieved_texts)
    ).replace("{query_str}", user_query)

    # Add the prompt template event to LastMile Tracer
    tracer.add_template_event(
        prompt_template=PROMPT_TEMPLATE,
        resolved_prompt=resolved_prompt,
        ingestion_trace_id=ingestion_trace_id,
    )
    return resolved_prompt

# Decorate the top-level query function (retrieve, resolve prompt, call LLM) with LastMile Tracer
@tracer.trace_function("query-root-span")
def run_query_flow(user_query: str, ingestion_trace_id: str) -> str:
    retrieved_nodes = retrieve_context(user_query, ingestion_trace_id, top_k=3)
    resolved_prompt = resolve_prompt(
        user_query,
        retrieved_nodes,
        ingestion_trace_id,
    )

    # Start span on LastMile Tracer
    with tracer.start_as_current_span("call-llm") as _llm_span:
        openai_client = openai.Client(api_key=os.getenv("OPENAI_API_KEY"))
        response = openai_client.chat.completions.create(
            model=LLM_NAME,
            messages=[{"role": "user", "content": resolved_prompt}],
        )
        output: str = response.choices[0].message.content

        # Add the LLM query event to LastMile Tracer
        tracer.add_query_event(
            query=resolved_prompt,
            llm_output=output,
            metadata={"llm_name": LLM_NAME},
            ingestion_trace_id=ingestion_trace_id,
        )

        # Log query model as a parameter
        tracer.register_query_model(LLM_NAME)


    return output
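
The retrieval step indexes into `[0]` because a Chroma query result holds one inner list per query text, and we send a single query. The sketch below reproduces that flattening on a hand-built result dict, so it runs without ChromaDB. The function name `flatten_query_result` and the zero-distance guard are our additions. The cell above divides by the distance directly, which would fail on an exact match with distance 0.

```python
def flatten_query_result(result: dict) -> list[dict]:
    """Flatten the first query's hits into one dict per retrieved chunk."""
    ids = result["ids"][0]
    documents = result["documents"][0]
    distances = result["distances"][0]
    hits = []
    for node_id, text, distance in zip(ids, documents, distances):
        # Guard against a zero distance (an exact match) before inverting.
        score = 1 / distance if distance > 0 else float("inf")
        hits.append({"id": node_id, "text": text, "score": score})
    return hits

# Hand-built stand-in for a single-query result with two hits.
example_result = {
    "ids": [["node0", "node1000"]],
    "documents": [["first chunk", "second chunk"]],
    "distances": [[0.5, 2.0]],
}
hits = flatten_query_result(example_result)
print([(hit["id"], hit["score"]) for hit in hits])  # [('node0', 2.0), ('node1000', 0.5)]
```

Returning infinity for an exact match is only one option. Depending on how scores are consumed downstream, a bounded form such as `1 / (1 + distance)` may be a better fit.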


Let's try an example user query.

In [10]:
response = run_query_flow("What did the author do growing up?", ingestion_trace_id)

print(f"Response: {response}")

Response: The author did not have a background in drawing in high school but started taking art classes at Harvard and eventually dropped out of RISD to teach himself how to paint.


<a name="view_ui" id="view_ui"></a>

#### View Traces in RAG Workbench UI
Now, you can view the traces of both the ingestion and query pipelines in the RAG Workbench UI.

1. Go to your terminal and install the UI package:

  `pip install "lastmile-eval[ui]"`

2. Export your `LASTMILE_API_TOKEN`, replacing `<your-api-token>` with your actual API token:

  `export LASTMILE_API_TOKEN="<your-api-token>"`

3. Run the following command in your terminal:

  `rag-debug launch`

4. Click on the provided URL (e.g., http://127.0.0.1:8000)

5. Go to the **Traces Tab** and select **Project 'LM-Tutorial'**. You should see a single trace for the test run executed from this notebook, displaying the traced ingestion and query pipelines.

<img width="600" alt="Screenshot 2024-06-17 at 1 56 46 PM" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/3e96e7fe-ad7e-427e-955f-532409044664">



<a name="debug" id="debug"></a>

## Step 3: Debug and Optimize your RAG System

Evaluation metrics are essential for measuring and improving your RAG system. An evaluator assesses the quality of an LLM-generated result: it takes inputs such as the response, the ground truth, and the retrieved context, and outputs a numeric score between 0 and 1. To begin, we'll run evaluations on data passed through our RAG system and analyze the metrics in the RAG Workbench UI.
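
To illustrate the shape of an evaluator (text inputs in, a score between 0 and 1 out), here is a deliberately simple stand-in based on word overlap. It is not one of the LastMile evaluators and makes no claim about how they are implemented. The function name `token_overlap_score` is ours.

```python
def token_overlap_score(response: str, ground_truth: str) -> float:
    """Jaccard overlap of lowercase word sets, a value between 0 and 1."""
    response_tokens = set(response.lower().split())
    truth_tokens = set(ground_truth.lower().split())
    if not response_tokens and not truth_tokens:
        return 1.0  # two empty texts are treated as identical
    return len(response_tokens & truth_tokens) / len(response_tokens | truth_tokens)

score = token_overlap_score("Paul Graham wrote essays", "Paul Graham wrote code")
print(score)  # 0.6 (3 shared words out of 5 distinct words)
```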

<a name="measure" id="measure"></a>

#### Measure and Evaluate Performance


We'll test questions with known answers and use evaluators to assess the quality of the RAG system's responses against the ground truth.

In [13]:
user_questions = [
    "What two main things did Paul Graham work on before college, outside of school?",
    "What was the key realization Paul Graham had about artificial intelligence during his first year of grad school at Harvard?",
    "How did Paul Graham and his partner Robert Morris get their initial idea and start working on what became their startup Viaweb?",
    "What were some of the novel approaches and advantages that Y Combinator introduced compared to traditional venture capital firms when it first started?",
    "What ambitious programming language project did Paul Graham work on intensively for 4 years from 2015-2019, and what was unique about the goal and approach of this language called Bel?"
]

ground_truth_answers = [
    "Before college, Paul Graham worked on solving mathematical problems and participated in programming competitions.",
    "During his first year of grad school at Harvard, Paul Graham realized that it was possible to make computers understand natural language by using statistical methods.",
    "Paul Graham and his partner Robert Morris got their initial idea for Viaweb by realizing that they could build an online store builder that would make it easy for non-programmers to create their own e-commerce websites.",
    "Compared to traditional venture capital firms, Y Combinator introduced the novel approaches of funding startups through seed-stage investments and providing mentorship and support to founders.",
    "From 2015-2019, Paul Graham worked intensively on the programming language project called Bel, which aimed to create a language that made it easy to write programs that were concise, expressive, and readable."
]
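
The questions and ground truths are matched by position, so the two lists need the same length and order. A quick sanity check could look like the sketch below. The helper name `pair_questions_with_answers` is hypothetical, and the toy lists stand in for the ones defined above.

```python
def pair_questions_with_answers(questions: list[str], answers: list[str]) -> list[tuple[str, str]]:
    """Pair each question with the ground truth at the same index."""
    if len(questions) != len(answers):
        raise ValueError(
            f"Expected equal lengths, got {len(questions)} questions "
            f"and {len(answers)} answers"
        )
    return list(zip(questions, answers))

pairs = pair_questions_with_answers(["Q1", "Q2"], ["A1", "A2"])
print(pairs)  # [('Q1', 'A1'), ('Q2', 'A2')]
```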

For each question, we'll run three of our LastMile evaluators (Faithfulness, Relevance Score, Similarity Score). [Read about these metrics](https://rag.lastmileai.dev/docs/features/eval_metrics).

Since each question is associated with a specific trace, we'll obtain the three scores for each trace, allowing us to assess the performance of the RAG pipeline at a granular level.

In [None]:
# Specify our evaluators
evaluator_names = {"faithfulness", "relevance", "similarity"}

# Runs the RAG query pipeline (with tracing) on test set and evaluates responses
evaluate_result = run_and_evaluate(
    project_name=PROJECT_NAME,
    evaluators=evaluator_names,
    run_query_fn=partial(
        run_query_flow,
        ingestion_trace_id=ingestion_trace_id
    ),
    inputs=user_questions,
    ground_truths=ground_truth_answers,
)
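
`run_query_flow` takes two arguments, while the evaluation helper is given only a list of questions. `functools.partial` bridges the gap by fixing `ingestion_trace_id` ahead of time, leaving a callable that needs just the query. The sketch below shows the mechanism with a hypothetical stand-in function. We assume, but have not verified, that `run_and_evaluate` calls `run_query_fn` once per input.

```python
from functools import partial

def answer(user_query: str, ingestion_trace_id: str) -> str:
    # Hypothetical stand-in for run_query_flow; it only echoes its inputs.
    return f"[{ingestion_trace_id}] {user_query}"

# Bind the trace ID once; the resulting callable takes only the query.
answer_for_trace = partial(answer, ingestion_trace_id="trace-123")
outputs = [answer_for_trace(question) for question in ["Q1", "Q2"]]
print(outputs)  # ['[trace-123] Q1', '[trace-123] Q2']
```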

<a name="identify" id="identify"></a>

#### View Results in RAG Workbench UI

Now, you can view the evaluation results in the RAG Workbench UI:

1. Go to your terminal and install the UI package:

  `pip install "lastmile-eval[ui]"`

2. Export your `LASTMILE_API_TOKEN`, replacing `<your-api-token>` with your actual API token:

  `export LASTMILE_API_TOKEN="<your-api-token>"`

3. Run the following command in your terminal:

  `rag-debug launch`

4. Click on the provided URL (e.g., http://127.0.0.1:8000)

5. Navigate to the **Evaluation Console Tab**, select **Project 'LM-Tutorial'**, and click on the latest evaluation run to view your results.

<img width="850" alt="eval_set_view" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/d0ef6d37-40c0-456e-9288-20469304da6c">


6. We can see all data for our latest evaluation run. Let's debug the response to a specific question. Click the debug icon.

<img width="850" alt="Screenshot 2024-06-18 at 5 51 44 PM" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/2e2bc98e-6ab3-4a7e-a3bb-816f49154f85">


7. In the Trace overview, we can see all the steps (via our LastMile Tracer) used to generate the output for this specific question.

<img width="850" alt="trace_overview" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/d283cd3f-da69-43e0-b719-8c4fbee29607">

8. Let's look at the `retrieve-context` step. It lists every chunk that was retrieved for this question.

<img width="850" alt="retrieved_docs" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/5adfc530-f7be-4b24-9adf-6469bfc30356">


9. For the LLM call step, we can **'Debug Prompts'** to switch out different models, edit the system prompt, etc. in real time!

<img width="850" alt="run_prompt" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/4c71af4a-a057-4629-9aa6-02778c711f5a">



