# Debugging LlamaIndex

I've been developing RAG (Retrieval Augmented Generation) apps with LlamaIndex, and also helping develop and contributing to the main project. Over time I've picked up a few tricks that help me debug the pipelines I build more effectively. This is a collection of those tips and tricks.


## Contents
### 1. [Tracing your steps with `CallbackManager`](#Tracing-your-steps-with-CallbackManager)
### 2. [Further Explorations with `LlamaDebugHandler`](#Further-Explorations-with-LlamaDebugHandler)
### 3. [Setting up wandb for experiment tracking and tracing](#Setting-up-wandb-for-experiment-tracking-and-tracing)

# Tracing your steps with `CallbackManager`

The `CallbackManager` lets you trace the steps LlamaIndex takes to generate a response, along with how long each step took.
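Here's a toy sketch of the idea: a manager that records nested, timed events and prints them in the same tree shape as the traces below. `ToyCallbackManager`, `event` and `format_trace` are hypothetical names I made up for illustration. This is not LlamaIndex's API. The real `CallbackManager` forwards event start/end notifications to its handlers, such as `LlamaDebugHandler`.

```python
import time
from contextlib import contextmanager


class ToyCallbackManager:
    """Hypothetical stand-in: records nested, timed events and prints them as a trace."""

    def __init__(self):
        self.events = []  # one dict per event, in the order the events *started*
        self._depth = 0   # current nesting level

    @contextmanager
    def event(self, name):
        # Record the event on start so parents are listed before their children.
        record = {"name": name, "depth": self._depth, "seconds": None}
        self.events.append(record)
        self._depth += 1
        start = time.perf_counter()
        try:
            yield
        finally:
            # Fill in the duration once the wrapped block (and its children) finish.
            record["seconds"] = time.perf_counter() - start
            self._depth -= 1

    def format_trace(self, trace_name):
        # Indent each event by its depth, mimicking the LlamaDebugHandler output below.
        lines = ["*" * 10, f"Trace: {trace_name}"]
        for r in self.events:
            indent = "    " + "  " * r["depth"]
            lines.append(f"{indent}|_{r['name']} ->  {r['seconds']:.6f} seconds")
        lines.append("*" * 10)
        return "\n".join(lines)


cm = ToyCallbackManager()
with cm.event("query"):
    with cm.event("retrieve"):
        with cm.event("embedding"):
            time.sleep(0.01)  # pretend to call the embedding endpoint
    with cm.event("synthesize"):
        with cm.event("llm"):
            time.sleep(0.02)  # pretend to call the LLM
print(cm.format_trace("query"))
```

A parent's duration always includes its children's durations. Keep that in mind when reading the real traces.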

For this example, let's compare two indices, `ListIndex` and `VectorStoreIndex`, and see how their traces differ.

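But first things first: let's import everything, create a `CallbackManager` with a `LlamaDebugHandler`, and initialize a `ServiceContext` that uses it. `print_trace_on_end=True` makes the handler print the trace each time a top-level operation finishes. Then we call `set_global_service_context` so that this service context is used throughout.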

```python
from langchain.chat_models import ChatOpenAI
from llama_index import set_global_service_context
from llama_index import ListIndex, VectorStoreIndex, ServiceContext, LLMPredictor
from llama_index.callbacks import CallbackManager, LlamaDebugHandler, CBEventType

llm_predictor = LLMPredictor(
    llm=ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)
)

llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager,
    llm_predictor=llm_predictor
)

set_global_service_context(service_context)
```

```python
# load the data
from llama_index import SimpleDirectoryReader

docs = SimpleDirectoryReader("./what_i_worked_on_pg/").load_data()
len(docs)
```

```
1
```

Now let's create the first index, a `VectorStoreIndex`:

```python
vector_index = VectorStoreIndex.from_documents(docs)
```

```
**********
Trace: index_construction
    |_node_parsing ->  0.126203 seconds
      |_chunking ->  0.125512 seconds
    |_embedding ->  1.223936 seconds
    |_embedding ->  1.190142 seconds
**********
```


And voilà! You can see the trace in action. It breaks index construction into its steps: parsing the document into nodes (chunking), then two calls to the embedding endpoint to create embeddings for those chunks.
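When traces get long, counting events by eye gets tedious. Here's a small helper sketch that parses a printed trace and tallies call counts and total time per event name. `summarize_trace` is a hypothetical helper of mine, and it assumes the printed format stays exactly `|_<name> ->  <seconds> seconds` as shown above.

```python
import re
from collections import defaultdict

# Trace copied from the index-construction output above.
TRACE = """\
**********
Trace: index_construction
    |_node_parsing ->  0.126203 seconds
      |_chunking ->  0.125512 seconds
    |_embedding ->  1.223936 seconds
    |_embedding ->  1.190142 seconds
**********
"""

# Matches lines like "    |_embedding ->  1.223936 seconds".
LINE_RE = re.compile(r"\|_(\w+) ->\s+([\d.]+) seconds")


def summarize_trace(trace_text):
    """Return {event_name: (call_count, total_seconds)} for a printed trace."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for match in LINE_RE.finditer(trace_text):
        name, seconds = match.group(1), float(match.group(2))
        counts[name] += 1
        totals[name] += seconds
    return {name: (counts[name], totals[name]) for name in counts}


for name, (count, total) in summarize_trace(TRACE).items():
    print(f"{name}: {count} call(s), {total:.3f} s")
```

Nested events are already included in their parents' times (`chunking` sits inside `node_parsing`). Compare totals per event name, and don't add them up across names.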

Now let's try `ListIndex`:

```python
list_index = ListIndex.from_documents(docs)
```

```
**********
Trace: index_construction
    |_node_parsing ->  0.125731 seconds
      |_chunking ->  0.125057 seconds
**********
```


And we get a different trace. If you know `ListIndex`, you know it doesn't create embeddings at all. It just chunks the documents and stores the nodes, which is why the trace has no `embedding` step.
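To illustrate the difference, here's a toy model of the two construction paths using a fake embedder that counts its calls. Everything here is hypothetical: `build_list_index`, `build_vector_index`, `CountingEmbedder`, the 100-character chunks and `batch_size=10` are illustrative choices. My guess is that the two `embedding` events above correspond to batches of chunks, but treat that as an assumption, not a description of LlamaIndex's internals.

```python
def chunk(text, chunk_size):
    """Split text into fixed-size character chunks (a stand-in for node parsing)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


class CountingEmbedder:
    """Fake embedding model that counts how many times it is called."""

    def __init__(self):
        self.calls = 0

    def embed(self, texts):
        self.calls += 1
        # Toy 2-d "embedding": length and vowel count of each chunk.
        return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in texts]


def build_list_index(text, chunk_size=100):
    # List-style index: chunk the text and store the nodes; no embeddings.
    return {"nodes": chunk(text, chunk_size)}


def build_vector_index(text, embedder, chunk_size=100, batch_size=10):
    # Vector-style index: chunk the text, then embed the chunks batch by batch.
    nodes = chunk(text, chunk_size)
    embeddings = []
    for i in range(0, len(nodes), batch_size):
        embeddings.extend(embedder.embed(nodes[i:i + batch_size]))
    return {"nodes": nodes, "embeddings": embeddings}


text = "lorem ipsum " * 125  # 1,500 characters -> 15 chunks of 100
embedder = CountingEmbedder()
toy_vector = build_vector_index(text, embedder)
toy_list = build_list_index(text)
print(len(toy_list["nodes"]), "chunks;", embedder.calls, "embedding calls for the vector index")
```

Both paths share the chunking step. Only the vector path pays for embedding calls, which matches the shape of the two traces.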

That was index construction, but what about query time? Let's find out.

```python
# small helper: build a query engine for `index` and ask it `question`
def query(question, index):
    qe = index.as_query_engine()
    r = qe.query(question)

    return r
```

```python
r = query("what did the author do growing up?", vector_index)
```

```
**********
Trace: query
    |_query ->  2.269069 seconds
      |_retrieve ->  0.70633 seconds
        |_embedding ->  0.696948 seconds
      |_synthesize ->  1.562535 seconds
        |_llm ->  1.542432 seconds
**********
```


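Here you can see the steps LlamaIndex took. `retrieve` made one embedding call to embed the question and find the most relevant chunks. `synthesize` then made a single LLM call to generate the answer.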
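The retrieve-then-synthesize shape of that trace can be sketched in a few lines: rank stored chunk embeddings by cosine similarity to the query embedding, keep the top k, and put them into one prompt for one LLM call. `retrieve`, `synthesize`, `fake_llm`, the 2-d vectors and `top_k=2` are all hypothetical, chosen for illustration. This is not LlamaIndex's retriever or synthesizer code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding, node_embeddings, top_k=2):
    """Return the indices of the top_k nodes most similar to the query."""
    ranked = sorted(
        range(len(node_embeddings)),
        key=lambda i: cosine(query_embedding, node_embeddings[i]),
        reverse=True,
    )
    return ranked[:top_k]


def synthesize(question, chunks, llm):
    """Put all retrieved chunks into a single prompt: one LLM call."""
    context = "\n".join(chunks)
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")


llm_prompts = []


def fake_llm(prompt):
    """Stand-in LLM that records every prompt it receives."""
    llm_prompts.append(prompt)
    return "a fake answer"


chunks = ["chunk about writing", "chunk about programming", "chunk about painting"]
node_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query_embedding = [1.0, 0.0]  # pretend this came from the one embedding call

top = retrieve(query_embedding, node_embeddings)
answer = synthesize("what did the author do growing up?", [chunks[i] for i in top], fake_llm)
print(top, len(llm_prompts))
```

Only the top-k chunks reach the LLM, so synthesis here costs one call no matter how many chunks the index holds.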

Now let's try something a bit more complicated: the same question against our `ListIndex`.

```python
r = query("what did the author do growing up?", list_index)
```

```
**********
Trace: query
    |_query ->  206.851148 seconds
      |_retrieve ->  0.019339 seconds
      |_synthesize ->  206.831567 seconds
        |_llm ->  5.387623 seconds
        |_llm ->  11.923166 seconds
        |_llm ->  15.879682 seconds
        |_llm ->  11.992212 seconds
        |_llm ->  20.846005 seconds
        |_llm ->  22.670804 seconds
        |_llm ->  36.684416 seconds
        |_llm ->  25.109205 seconds
        |_llm ->  26.02517 seconds
        |_llm ->  30.040593 seconds
**********
```


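This trace looks very different. Retrieval takes almost no time, because a `ListIndex` simply hands over all of its nodes. Synthesis, however, makes ten LLM calls, and together they account for nearly all of the ~207 seconds. As you can imagine, this is a very neat tool for understanding what is happening inside LlamaIndex. It's especially useful when debugging, and I've turned to it time and time again when building complex indexes with LlamaIndex.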
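Why ten `llm` events? My understanding is that the response synthesizer builds the answer up refine-style: it answers from the first piece of context, then sends one more LLM call per remaining piece, each waiting on the previous answer. Here's a toy sketch of that loop. `refine_synthesize`, `fake_llm` and the prompt wording are hypothetical. Depending on the response mode, LlamaIndex may also pack several chunks into each prompt, so real call counts need not equal chunk counts.

```python
def refine_synthesize(question, chunks, llm):
    """Answer from the first chunk, then refine the answer once per remaining chunk."""
    answer = llm(f"Context:\n{chunks[0]}\n\nQuestion: {question}\nAnswer:")
    for chunk in chunks[1:]:
        # Each call depends on the previous answer, so the calls run one after another.
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context:\n{chunk}\n\n"
            f"Refine the answer to: {question}"
        )
    return answer


prompts = []


def fake_llm(prompt):
    """Stand-in LLM that records prompts and returns a versioned answer."""
    prompts.append(prompt)
    return f"answer v{len(prompts)}"


final = refine_synthesize(
    "what did the author do growing up?", [f"chunk {i}" for i in range(10)], fake_llm
)
print(len(prompts), final)
```

Because each call waits on the previous answer, total synthesis time is roughly the sum of the individual LLM calls. The ten `llm` times above add up to about 206.6 s, close to the 206.8 s spent in `synthesize`.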

But what if you want a bit more info?

# Further Explorations with `LlamaDebugHandler`

(tomorrow)

# Setting up wandb for experiment tracking and tracing

(soon)