# LastMile Instrumentor for LlamaIndex

In this notebook, we showcase how to use the **LastMile Tracing SDK** to auto-instrument tracing for your LlamaIndex applications. With tracing set up automatically, you can debug your RAG application using LastMile's RAG Debugger.

## Notebook Outline
* [Step 1: Setup](#setup)
* [Step 2: Configure the LastMile Instrumentor](#step2)
* [Step 3: Load and Process Docs with LlamaIndex](#step3)
* [Step 4: Create an Index and Query Engine with LlamaIndex](#step4)
* [Step 5: Query the Index with LlamaIndex](#step5)
* [Step 6: View Trace Data in RAG Debugger](#step6)


<a name="setup"></a>
# Step 1: Setup

To begin, we need to install a few packages, including `llama-index` and `tracing-auto-instrumentation` (the package that provides the LastMile instrumentor used below).


In [None]:
%pip install -q html2text llama-index pandas pyarrow tqdm
%pip install -q llama-index-readers-web
%pip install -q llama-index-callbacks-openinference
%pip install -q --upgrade llama-index-embeddings-openai openai
%pip install -q --upgrade "tracing-auto-instrumentation[llama-index]"


Import the necessary libraries for this example:

In [2]:
import os
import textwrap

import llama_index.core

from tracing_auto_instrumentation.llama_index import LlamaIndexCallbackHandler

Before we start this tutorial, we need the following tokens/keys:

* LastMile AI API Token: Go to the [LastMile Settings page](https://lastmileai.dev/settings?page=tokens). You will need to first create a LastMile AI account.
* OpenAI API Key: Go to the [OpenAI API Keys page](https://platform.openai.com/account/api-keys) to create and access your OpenAI API key.

We're using Google Colab's Secret Manager to set our tokens in this notebook.

In [3]:
from google.colab import userdata

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
os.environ['LASTMILE_API_TOKEN'] = userdata.get('LASTMILE_API_TOKEN')
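
Outside Colab, `google.colab` cannot be imported. Here is a minimal sketch of one possible fallback, assuming it is acceptable to read each key from an existing environment variable or from an interactive prompt. `resolve_secret` is a hypothetical helper written for this example. It is not part of the LastMile or LlamaIndex APIs.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def resolve_secret(
    name: str,
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return env[name], prompting for it (and storing it) when it is missing."""
    value = env.get(name, "").strip()
    if not value:
        # Nothing usable in the environment: ask the user without echoing input.
        value = prompt(f"Enter {name}: ").strip()
        if not value:
            raise ValueError(f"{name} must not be empty")
        env[name] = value
    return value
```

Calling `resolve_secret("OPENAI_API_KEY")` and `resolve_secret("LASTMILE_API_TOKEN")` would then populate `os.environ` in a local Jupyter session.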

<a name="step2"></a>

# Step 2: Configure the LastMile Instrumentor

Next, we need to configure the LastMile Instrumentor by setting the global handler for LlamaIndex.

In [4]:
import llama_index.core

from tracing_auto_instrumentation.llama_index import LlamaIndexCallbackHandler

# Register the LastMile handler globally so LlamaIndex calls are traced
# under this project name.
llama_index.core.global_handler = LlamaIndexCallbackHandler(
    project_name="LlamaIndex with Paul Graham",
)

<a name="step3"></a>

# Step 3: Load and Process Documents


In [5]:
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.web import SimpleWebPageReader

# Fetch the Paul Graham essay from the LlamaIndex repository.
documents = SimpleWebPageReader().load_data(
    [
        "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt"
    ]
)

# Split the essay into sentence-aware chunks (nodes).
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)
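
`SentenceSplitter` breaks each document into chunks while trying to keep sentences intact. As a rough illustration of that idea only, the standard-library sketch below packs whole sentences into chunks of at most `max_chars` characters. It is not LlamaIndex's algorithm, which measures chunk size in tokens and supports overlap between chunks. `split_into_chunks` and `max_chars` are hypothetical names chosen for this example.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks no longer than max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Naive sentence boundary: whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned string plays the role of one node: small enough to embed and retrieve on its own, but never cut in the middle of a sentence.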

<a name="step4"></a>

# Step 4: Create an Index and Query Engine

In [6]:
from llama_index.embeddings.openai import OpenAIEmbedding

# Build the vector index from the nodes parsed in Step 3, embedding them with OpenAI.
index = VectorStoreIndex(nodes, embed_model=OpenAIEmbedding())
query_engine = index.as_query_engine()
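
Conceptually, the index stores one embedding vector per node. The query engine embeds the question, retrieves the most similar nodes, and passes them to the LLM. The toy sketch below covers the retrieval step only and uses bag-of-words counts in place of OpenAI embeddings. `embed`, `cosine`, and `top_k` are hypothetical helpers for illustration and are not the LlamaIndex API.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Stand-in 'embedding': lowercase word counts (a real index uses a learned model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

Word counts only match exact words, which is why real pipelines use learned embeddings. The ranking logic is the same idea either way.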


<a name="step5"></a>

# Step 5: Query the Index

In [7]:
max_characters_per_line = 80
queries = [
    "What did Paul Graham do growing up?",
    "When and how did Paul Graham's mother die?",
    "What, in Paul Graham's opinion, is the most distinctive thing about YC?"
]
for query in queries:
    response = query_engine.query(query)
    print("Query")
    print("=====")
    print(textwrap.fill(query, max_characters_per_line))
    print()
    print("Response")
    print("========")
    print(textwrap.fill(str(response), max_characters_per_line))
    print()

Query
=====
What did Paul Graham do growing up?

Response
========
Growing up, Paul Graham worked on writing short stories and programming. He
started programming on an IBM 1401 in 9th grade using an early version of
Fortran. Later, he got a TRS-80 computer and wrote simple games, a rocket
prediction program, and a word processor. Despite his interest in programming,
he initially planned to study philosophy in college before eventually switching
to AI.

Query
=====
When and how did Paul Graham's mother die?

Response
========
Paul Graham's mother died when he was 18 years old, from a brain tumor.

Query
=====
What, in Paul Graham's opinion, is the most distinctive thing about YC?

Response
========
The most distinctive thing about Y Combinator, according to Paul Graham, is that
instead of deciding for himself what to work on, the problems come to him. Every
6 months, a new batch of startups brings their problems, which then become the
focus of YC. This engagement with a variety of startup problems and the
opport

<a name="step6"></a>

# Step 6: View Trace Data in RAG Debugger
Now we can view the trace data from our LlamaIndex application in the RAG Debugger UI.

#### From your terminal:

Export your `LASTMILE_API_TOKEN`:

```bash
export LASTMILE_API_TOKEN="<your-api-token>"
```

Run this CLI command to launch the UI:

```bash
rag-debug launch
```
Navigate to the 'Traces' page, which lists all traces recorded for the project "LlamaIndex with Paul Graham". Use the project selector in the top-right corner to choose the project.

<img width="973" alt="Screenshot 2024-05-28 at 11 37 26 AM" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/9568199a-5404-4254-aaf5-f87ac5a2f562"/>

Let's click into the Trace.

<img width="973" alt="Screenshot 2024-05-28 at 11 38 05 AM" src="https://github.com/lastmile-ai/aiconfig/assets/81494782/cc1e451c-472f-4b56-9bfc-5508326d12d9"/>

Here we can see all the spans that were generated for us automatically. They help us debug and pinpoint issues in our application, especially if we add our own logging on top of the auto-instrumentor.
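
To make the idea of spans concrete, here is a toy standard-library sketch that models a trace as a tree of named, timed spans. It only illustrates the concept and does not show how the LastMile SDK or LlamaIndex callbacks are implemented. `Span` and `ToyTracer` are hypothetical names for this example.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    name: str
    start: float
    end: Optional[float] = None
    children: List["Span"] = field(default_factory=list)


class ToyTracer:
    """Records nested spans; the innermost open span is the parent of new ones."""

    def __init__(self) -> None:
        self.roots: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        current = Span(name=name, start=time.perf_counter())
        siblings = self._stack[-1].children if self._stack else self.roots
        siblings.append(current)
        self._stack.append(current)
        try:
            yield current
        finally:
            # Close the span even if the wrapped code raises.
            current.end = time.perf_counter()
            self._stack.pop()


tracer = ToyTracer()
with tracer.span("query"):
    with tracer.span("retrieve"):
        pass  # e.g. embed the question and look up similar nodes
    with tracer.span("synthesize"):
        pass  # e.g. call the LLM with the retrieved context
```

After this runs, `tracer.roots` holds one `query` span with two children. A trace view shows the same kind of parent-child structure for a single request.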