# Debugging LlamaIndex

I've been developing RAG (Retrieval Augmented Generation) apps with LlamaIndex and contributing to the main project as well. Over time I've picked up a few tricks that help me debug the pipelines I build more effectively. This is a collection of all my tips and tricks.


## Content
### 1. [Tracing your steps with `CallbackManager`](#Tracing-your-steps-with-CallbackManager)
### 2. [Further Explorations with `LlamaDebugHandler`](#Furthur-Explorations-with-LlamaDebugHandler)
### 3. [Setting up wandb for experiment tracking and tracing](#Setting-up-wandb-for-experiment-tracking-and-tracing)

# Tracing your steps with `CallbackManager`

Using the `CallbackManager` allows you to trace the steps LlamaIndex takes to generate a response and see how long each step took.

For this example, let's compare two indices, `ListIndex` and `VectorStoreIndex`, and see how the outputs differ.

But first things first: let's import everything and initialize a `service_context` that uses the `callback_manager` we create. Then we call `set_global_service_context` so that this service context is used throughout.

In [1]:
from langchain.chat_models import ChatOpenAI
from llama_index import set_global_service_context
from llama_index import ListIndex, VectorStoreIndex, ServiceContext, LLMPredictor
from llama_index.callbacks import CallbackManager, LlamaDebugHandler, CBEventType

llm_predictor = LLMPredictor(
    llm=ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)
)

llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager, 
    llm_predictor=llm_predictor
)

set_global_service_context(service_context)

In [2]:
# load the data
from llama_index import SimpleDirectoryReader

docs = SimpleDirectoryReader("./what_i_worked_on_pg/").load_data()
len(docs)

1

Now let's create the first index.

In [3]:
vector_index = VectorStoreIndex.from_documents(docs)

**********
Trace: index_construction
    |_node_parsing ->  0.124317 seconds
      |_chunking ->  0.122693 seconds
    |_embedding ->  1.381548 seconds
    |_embedding ->  0.986437 seconds
**********


And voilà! You can see the trace in action. It shows the different steps the index construction took. You can see that LlamaIndex made 2 calls to the embedding endpoint to create the embeddings for the chunks.

Now let's try `ListIndex`.

In [4]:
list_index = ListIndex.from_documents(docs)

**********
Trace: index_construction
    |_node_parsing ->  0.130536 seconds
      |_chunking ->  0.128729 seconds
**********


And we have a different output. If you know `ListIndex`, you know that it doesn't compute embeddings; it just chunks the docs and stores them.

That was index creation, but how about query time? Let's find out.

In [5]:
# util func to help me :)
def query(question, index):
    qe = index.as_query_engine()
    r = qe.query(question)
    
    return r

In [6]:
r = query("what did the author do growing up?", vector_index)

**********
Trace: query
    |_query ->  2.390294 seconds
      |_retrieve ->  0.33361 seconds
        |_embedding ->  0.329092 seconds
      |_synthesize ->  2.05655 seconds
        |_llm ->  2.031409 seconds
**********


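Here you can see the steps LlamaIndex took: the retrieve step embeds the question, and the synthesize step makes a single LLM call, which accounts for most of the roughly 2.4 seconds the query took.

Let's try something a bit more complicated, like our `ListIndex`.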

In [7]:
r = query("what did the author do growing up?", list_index)

**********
Trace: query
    |_query ->  100.628079 seconds
      |_retrieve ->  0.003366 seconds
      |_synthesize ->  100.624557 seconds
        |_llm ->  4.596029 seconds
        |_llm ->  5.08475 seconds
        |_llm ->  6.206133 seconds
        |_llm ->  10.300521 seconds
        |_llm ->  8.189844 seconds
        |_llm ->  9.047571 seconds
        |_llm ->  11.563737 seconds
        |_llm ->  10.239132 seconds
        |_llm ->  15.381177 seconds
        |_llm ->  19.77576 seconds
**********


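As you can imagine, this is a very neat tool for getting a better understanding of the steps happening internally in LlamaIndex. Here, for example, the trace shows that the `ListIndex` query made 10 LLM calls during synthesis, which together account for nearly all of the roughly 100 seconds the query took. It is especially useful when debugging, and something I've turned to time and time again when building complex indexes with LlamaIndex.

But what if you want a bit more info?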
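
The printed traces are plain text, so they can be tallied with a few lines of Python. This is a minimal sketch, assuming the trace format shown in this notebook (`|_step ->  N seconds`); `summarize_trace` is a helper name I made up for illustration and is not part of LlamaIndex.

```python
import re
from collections import defaultdict

# Matches trace lines such as "        |_llm ->  4.596029 seconds".
TRACE_LINE = re.compile(r"\|_(\w+)\s*->\s*([\d.]+) seconds")


def summarize_trace(trace_text):
    """Return {step_name: (call_count, total_seconds)} for a printed trace."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in trace_text.splitlines():
        match = TRACE_LINE.search(line)
        if match:
            name, seconds = match.group(1), float(match.group(2))
            totals[name][0] += 1
            totals[name][1] += seconds
    return {name: (count, total) for name, (count, total) in totals.items()}


# The ListIndex query trace from above, pasted in as text.
trace = """
Trace: query
    |_query ->  100.628079 seconds
      |_retrieve ->  0.003366 seconds
      |_synthesize ->  100.624557 seconds
        |_llm ->  4.596029 seconds
        |_llm ->  5.08475 seconds
        |_llm ->  6.206133 seconds
        |_llm ->  10.300521 seconds
        |_llm ->  8.189844 seconds
        |_llm ->  9.047571 seconds
        |_llm ->  11.563737 seconds
        |_llm ->  10.239132 seconds
        |_llm ->  15.381177 seconds
        |_llm ->  19.77576 seconds
"""

summary = summarize_trace(trace)
for name, (count, total) in summary.items():
    print(f"{name:<12}{count:>3} call(s) {total:>10.3f} s")
```

For the `ListIndex` trace this reports 10 `llm` calls totalling roughly 100.4 seconds, which makes it obvious where the query time goes.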

# Furthur Explorations with `LlamaDebugHandler`

> Note: The `LlamaDebugHandler` is in beta and subject to change, so these examples might become outdated. Check the [docs](https://gpt-index.readthedocs.io/en/latest/examples/callbacks/LlamaDebugHandler.html) and [code]() for up-to-date info.


LlamaIndex uses callbacks to log the events that are happening. These callbacks are exposed via the `LlamaDebugHandler` for debugging purposes. If you can parse them, it is much easier to know exactly what is happening when you work with LlamaIndex.

The `CallbackManager` logs many different events. Here is the full list:
```python
CHUNKING = "chunking"
NODE_PARSING = "node_parsing"
EMBEDDING = "embedding"
LLM = "llm"
QUERY = "query"
RETRIEVE = "retrieve"
SYNTHESIZE = "synthesize"
TREE = "tree"
SUB_QUESTIONS = "sub_questions"
```

They also have different payload types, which are listed here:
```python
DOCUMENTS = "documents"  # list of documents before parsing
CHUNKS = "chunks"  # list of text chunks
NODES = "nodes"  # list of nodes
PROMPT = "formatted_prompt"  # formatted prompt sent to LLM
RESPONSE = "response"  # response from LLM
TEMPLATE = "template"  # template used in LLM call
QUERY_STR = "query_str"  # query used for query engine
SUB_QUESTIONS = "sub_questions"  # list of sub question & answer pairs
```

Now let's see this in action.

In [53]:
# clear the events logged up till now
llama_debug.flush_event_logs()

In [54]:
r = query("what did the author do growing up?", list_index)

In [65]:
from llama_index.callbacks import CBEventType

es = llama_debug.get_events()
len(es)

26

In [66]:
for e in es:
    print(e.event_type)

CBEventType.QUERY
CBEventType.RETRIEVE
CBEventType.RETRIEVE
CBEventType.SYNTHESIZE
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.LLM
CBEventType.SYNTHESIZE
CBEventType.QUERY
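
A long listing like this is easier to read as a tally. Here is a small sketch that counts events by type; `FakeEvent` is a hypothetical stand-in I use so the snippet runs on its own, and the only thing it assumes about real events is the `event_type` attribute used in the loop above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a logged event; only `event_type` is used."""
    event_type: str


def tally_event_types(events):
    """Count how many logged events there are of each type."""
    return Counter(e.event_type for e in events)


# Same sequence as the listing above, written with plain strings.
events = (
    [FakeEvent("query"), FakeEvent("retrieve"), FakeEvent("retrieve"), FakeEvent("synthesize")]
    + [FakeEvent("llm")] * 20
    + [FakeEvent("synthesize"), FakeEvent("query")]
)

tally = tally_event_types(events)
for event_type, count in tally.most_common():
    print(f"{event_type:<12}{count}")
```

In the notebook itself you would pass the real list instead, i.e. `tally_event_types(llama_debug.get_events())`.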


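This represents one `ListIndex` query and can be quite helpful when debugging. Each step is logged twice, once when it starts and once when it ends, which is why the 10 LLM calls show up as 20 `CBEventType.LLM` entries. If you want to see all the LLM calls made, run `get_llm_inputs_outputs()` to get the LLM event pairs.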

In [68]:
es = llama_debug.get_llm_inputs_outputs()
len(es)

10
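
Once you have start/end pairs, you can work out how long each call took. The sketch below makes two assumptions that you should check against your installed version: that each pair is a `(start, end)` pair of events, and that each event carries a `time` string in the format `%m/%d/%Y, %H:%M:%S.%f`. `FakeEvent`, `pair_durations` and the sample timestamps are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

# ASSUMPTION: timestamp format of `event.time`; verify before relying on it.
TIME_FORMAT = "%m/%d/%Y, %H:%M:%S.%f"


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a logged event; only `time` is used."""
    time: str


def pair_durations(event_pairs, time_format=TIME_FORMAT):
    """Return the elapsed seconds for each (start, end) event pair."""
    durations = []
    for start, end in event_pairs:
        started = datetime.strptime(start.time, time_format)
        ended = datetime.strptime(end.time, time_format)
        durations.append((ended - started).total_seconds())
    return durations


# Made-up sample timestamps, not taken from a real run.
pairs = [
    (FakeEvent("06/22/2023, 15:21:40.000000"), FakeEvent("06/22/2023, 15:21:44.500000")),
    (FakeEvent("06/22/2023, 15:21:44.600000"), FakeEvent("06/22/2023, 15:21:50.100000")),
]

durations = pair_durations(pairs)
slowest = max(range(len(durations)), key=durations.__getitem__)
print(f"slowest call: #{slowest} ({durations[slowest]:.1f} s)")
```

If the assumptions hold, passing the result of `llama_debug.get_llm_inputs_outputs()` in place of `pairs` would give one duration per LLM call.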

To view the events sequentially you can access the `sequential_events` property.

In [69]:
e = llama_debug.sequential_events
len(e)

26

But honestly, debugging like this is very hard and non-intuitive. We need easier ways to do this. We'll cover those next.

# Setting up wandb for experiment tracking and tracing

(soon)

In [59]:
from llama_index.callbacks import WandbCallbackHandler

wandb_callback = WandbCallbackHandler(run_args={
    "project": "LlamaIndex",
})

callback_manager = CallbackManager([llama_debug, wandb_callback])
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager, 
    llm_predictor=llm_predictor
)

set_global_service_context(service_context)

In [60]:
vector_index = VectorStoreIndex.from_documents(docs)

**********
Trace: index_construction
    |_embedding ->  1.630853 seconds
    |_embedding ->  1.937581 seconds
**********


wandb: Logged trace tree to W&B.


In [61]:
wandb_callback.persist_index(vector_index, index_name="vector_index")

wandb: Adding directory to artifact (/home/jjmachan/jjmachan/explodinggradients/notes/retrieval/wandb/run-20230622_152138-hhq061lj/files/storage)... Done. 0.0s


In [62]:
r = query("what did the author do growing up?", vector_index)

**********
Trace: query
    |_query ->  2.01595 seconds
      |_retrieve ->  0.463314 seconds
        |_embedding ->  0.457025 seconds
      |_synthesize ->  1.552477 seconds
        |_llm ->  1.534402 seconds
**********


wandb: Logged trace tree to W&B.
