# Llama-Index Quickstart

In this quickstart you will create a simple LlamaIndex app, log it with TruLens, and get feedback on an LLM response.

For evaluation, we will leverage the "hallucination triad" of groundedness, context relevance and answer relevance.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb)

## Setup

### Install dependencies
Install the dependencies for this notebook if you don't have them already.

In [1]:
%pip install -qU "trulens_eval>=0.19.2" "llama_index>0.9.17" "html2text>=2020.1.16" qdrant_client python-dotenv ipywidgets streamlit_jupyter "litellm>=1.15.1" google-cloud-aiplatform
# 'google-generativeai>=0.3.0'
# %pip install -qU trulens_eval==0.19.1 llama_index>0.9.15 html2text>=2020.1.16 qdrant_client python-dotenv

%load_ext dotenv

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install -U google-cloud-core google-cloud-aiplatform

Note: you may need to restart the kernel to use updated packages.


### Add API keys
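For this quickstart, you will need a Gemini API key, read below from the `GEMINI_API_KEY` environment variable. The key is used for the Gemini embeddings and LLM. The TruLens evaluation provider calls Gemini through LiteLLM on Vertex AI, which is initialized with `aiplatform.init` below.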

In [1]:
# import os
# from google.colab import userdata
# GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"] = userdata.get('GEMINI_API_KEY')
# os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

In [13]:
import os
%load_ext dotenv
%dotenv
GOOGLE_API_KEY = os.environ["GEMINI_API_KEY"]
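
If the key is missing, the cell above fails with a bare `KeyError`. A minimal sketch of a friendlier check is shown here; `missing_keys` is a hypothetical helper, not part of TruLens or LlamaIndex, and it only inspects the environment without loading anything itself.

```python
import os
from typing import Iterable, List, Mapping


def missing_keys(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# GEMINI_API_KEY is the variable name this notebook reads.
missing = missing_keys(["GEMINI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.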

In [3]:
from google.cloud import aiplatform

# LiteLLM uses this for Vertex AI models, including Gemini.
# The TruLens evaluation provider calls Gemini through the LiteLLM wrapper.
# Replace the project and location with your own.
aiplatform.init(project="fovi-site", location="us-west1")

### Import from LlamaIndex and TruLens

In [4]:
from trulens_eval import Tru

tru = Tru(database_redact_keys=True)

🦑 Tru initialized with db url sqlite:///default.sqlite .
🔒 Secret keys will not be included in the database.


### Create Simple LLM Application

This example uses LlamaIndex with a Gemini LLM and Gemini embeddings, backed by an in-memory Qdrant vector store.

In [5]:
from llama_index.readers.web import SimpleWebPageReader

DOCUMENTS = SimpleWebPageReader(html_to_text=True).load_data(["http://paulgraham.com/worked.html"])

In [15]:
import qdrant_client
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.embeddings import GeminiEmbedding
from llama_index.llms import Gemini
from llama_index.vector_stores import QdrantVectorStore


def load_llamaindex_app():
    # Create a local Qdrant vector store
    # client = qdrant_client.QdrantClient(path="qdrant_gemini_3")
    client = qdrant_client.QdrantClient(location=":memory:")

    vector_store = QdrantVectorStore(client=client, collection_name="collection")
    # Use Gemini as the embedding model
    embed_model = GeminiEmbedding(model_name="models/embedding-001", api_key=GOOGLE_API_KEY)

    service_context = ServiceContext.from_defaults(
        llm=Gemini(api_key=GOOGLE_API_KEY), embed_model=embed_model
    )
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    index = VectorStoreIndex.from_documents(
        DOCUMENTS,
        service_context=service_context,
        storage_context=storage_context,
        show_progress=True,
    )

    return index.as_query_engine()

query_engine = load_llamaindex_app()

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/23 [00:00<?, ?it/s]

### Send your first request

In [16]:
RESPONSE = query_engine.query("What does the author say about their education?")
print(RESPONSE)

The author initially studied philosophy in college but found it boring and switched to AI. They then pursued a PhD in computer science at Harvard while also taking art classes. After completing their PhD, they applied to art schools and was accepted into the BFA program at RISD. They also received an invitation to take the entrance exam at the Accademia di Belli Arti in Florence, which they passed. However, the author ultimately decided to attend RISD.


In [16]:
RESPONSE = query_engine.query("Where did the author go to school?")
print(RESPONSE)

The author went to Harvard for a PhD program in computer science, RISD for a BFA program, and the Accademia di Belli Arti in Florence for an entrance exam.


In [12]:
RESPONSE = query_engine.query("Who was the author's Harvard PhD advisor?")
print(RESPONSE)

The provided context does not mention the author's Harvard PhD advisor, so I cannot answer this question.


In [17]:
RESPONSE = query_engine.query("who was Tom Cheatham to the author?")
print(RESPONSE)

Tom Cheatham was the author's advisor in the PhD program in computer science at Harvard.


In [14]:
RESPONSE = query_engine.query("who is Tom? why is he in this story?")
print(RESPONSE)

Tom Cheatham is a professor at Harvard University. He is mentioned in the story because the narrator was taking art classes at Harvard while pursuing a PhD in computer science. Tom Cheatham was the narrator's advisor, and he was very easy-going about the narrator's choice to take art classes. He never said anything about it, even though it was unusual for a PhD student to be taking art classes.


In [15]:
RESPONSE = query_engine.query(
    "what is this story about?  what are the most important things the author want the reader to learn?"
)

print(RESPONSE)

This story is about the author's journey from being a philosophy student to becoming an AI researcher. The author initially believed that philosophy was the study of ultimate truths, but later realized that it was mostly concerned with edge cases that other fields ignored. The author then switched to AI, inspired by a novel and a documentary. However, the author later realized that the AI research at the time was a hoax, as it was limited to a small subset of formal language and could not truly understand natural language.

The most important things the author wants the reader to learn are:
- The limitations of AI research at the time, particularly the inability to truly understand natural language.
- The importance of having realistic expectations about what AI can achieve.
- The need for a new approach to AI research that goes beyond the traditional methods of representing concepts with explicit data structures.


## Initialize Feedback Function(s)
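
Each feedback function in this section compares two parts of a record: question and answer, question and each retrieved chunk, or retrieved chunks and answer. The sketch below illustrates only that pairing and the per-chunk averaging. It is a toy under stated assumptions: `toy_overlap` is a hypothetical word-overlap score standing in for the LLM grader, and `toy_triad` is not a TruLens function.

```python
from statistics import mean
from typing import Dict, List


def toy_overlap(reference: str, candidate: str) -> float:
    """Toy score in [0, 1]: fraction of candidate words found in reference."""
    ref_words = set(reference.lower().split())
    cand_words = candidate.lower().split()
    if not cand_words:
        return 0.0
    return sum(word in ref_words for word in cand_words) / len(cand_words)


def toy_triad(question: str, chunks: List[str], answer: str) -> Dict[str, float]:
    """Score one record on the three axes (chunks must be non-empty)."""
    return {
        # Answer relevance: question against the final answer, one score.
        "answer_relevance": toy_overlap(answer, question),
        # Context relevance: question against each chunk, then averaged.
        "context_relevance": mean(
            [toy_overlap(chunk, question) for chunk in chunks]
        ),
        # Groundedness: answer against all chunks taken together.
        "groundedness": toy_overlap(" ".join(chunks), answer),
    }


scores = toy_triad(
    question="why drop ai",
    chunks=["the author drop ai", "painting class"],
    answer="author drop ai",
)
print(scores)
```

Averaging over chunks loosely mirrors `.aggregate(np.mean)` in the cell below, and scoring the answer against all chunks at once loosely mirrors `.collect()`.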

In [6]:
import numpy as np
from trulens_eval import Feedback, LiteLLM, TruLlama
from trulens_eval.feedback import Groundedness

# Initialize provider class
GEMINI_PROVIDER = LiteLLM(model_engine="gemini-pro")

GROUNDED = Groundedness(groundedness_provider=GEMINI_PROVIDER)

# Define a groundedness feedback function
f_groundedness = (
    Feedback(GROUNDED.groundedness_measure_with_cot_reasons)
    .on(TruLlama.select_source_nodes().node.text.collect())
    .on_output()
    .aggregate(GROUNDED.grounded_statements_aggregator)
)

# Question/answer relevance between overall question and answer.
f_qa_relevance = Feedback(GEMINI_PROVIDER.relevance).on_input_output()

# Question/statement relevance between question and each context chunk.
f_qs_relevance = (
    Feedback(GEMINI_PROVIDER.qs_relevance)
    .on_input()
    .on(TruLlama.select_source_nodes().node.text)
    .aggregate(np.mean)
)

✅ In groundedness_measure_with_cot_reasons, input source will be set to __record__.app.query.rets.source_nodes[:].node.text.collect() .
✅ In groundedness_measure_with_cot_reasons, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In qs_relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In qs_relevance, input statement will be set to __record__.app.query.rets.source_nodes[:].node.text .


## Instrument app for logging with TruLens

In [7]:
tru_query_engine_recorder = TruLlama(
    query_engine,
    tru=tru,
    app_id="PaulGrahamB",
    initial_app_loader=load_llamaindex_app,
    feedbacks=[f_groundedness, f_qa_relevance, f_qs_relevance],
)

<function load_llamaindex_app at 0x154dc6b60>


In [None]:
# Beware, this can expose the API key(s) in the notebook.
mymodel = tru_query_engine_recorder.model_dump()
tru_query_engine_recorder.model_validate(mymodel)

In [8]:
tru_query_engine_recorder = tru.Llama(
    query_engine,
    app_id="PaulGrahamC",
    initial_app_loader=load_llamaindex_app,
    feedbacks=[f_groundedness, f_qa_relevance, f_qs_relevance],
)

<function load_llamaindex_app at 0x154dc6b60>


In [9]:
# Record a query by using the recorder as a context manager
with tru_query_engine_recorder as _:
    response = query_engine.query("Why did the author drop AI?")
    print(response)

The author dropped AI because they realized that the way AI was practiced at the time was a hoax. They believed that the whole way of doing AI, with explicit data structures representing concepts, was not going to work and would never lead to the creation of truly intelligent machines like Mike from the novel _The Moon is a Harsh Mistress_.
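
Conceptually, calls made inside the `with` block are captured as records, while calls outside it are not. The sketch below shows that pattern in isolation. It is not how TruLens is implemented: `ToyRecorder` is a hypothetical class, and unlike TruLens, which instruments the app's own methods, this toy only sees calls routed through the recorder.

```python
from typing import Callable, List, Tuple


class ToyRecorder:
    """Hypothetical recorder: keeps (input, output) pairs while active."""

    def __init__(self, func: Callable[[str], str]) -> None:
        self._func = func
        self._active = False
        self.records: List[Tuple[str, str]] = []

    def __enter__(self) -> "ToyRecorder":
        self._active = True
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Returning None lets any exception from the block propagate.
        self._active = False

    def query(self, prompt: str) -> str:
        result = self._func(prompt)
        if self._active:
            self.records.append((prompt, result))
        return result


recorder = ToyRecorder(lambda prompt: prompt.upper())
recorder.query("not recorded")  # outside the block, so nothing is stored
with recorder as rec:
    rec.query("recorded")
print(recorder.records)
```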


## Explore in a Dashboard

In [10]:
list(tru.get_apps())

[{'tru_class_info': {'name': 'TruLlama',
   'module': {'package_name': 'trulens_eval',
    'module_name': 'trulens_eval.tru_llama'},
   'bases': [{'name': 'TruLlama',
     'module': {'package_name': 'trulens_eval',
      'module_name': 'trulens_eval.tru_llama'},
     'bases': None},
    {'name': 'App',
     'module': {'package_name': 'trulens_eval',
      'module_name': 'trulens_eval.app'},
     'bases': None},
    {'name': 'AppDefinition',
     'module': {'package_name': 'trulens_eval',
      'module_name': 'trulens_eval.schema'},
     'bases': None},
    {'name': 'SerialModel',
     'module': {'package_name': 'trulens_eval.utils',
      'module_name': 'trulens_eval.utils.serial'},
     'bases': None},
    {'name': 'WithClassInfo',
     'module': {'package_name': 'trulens_eval.utils',
      'module_name': 'trulens_eval.utils.pyschema'},
     'bases': None},
    {'name': 'BaseModel',
     'module': {'package_name': 'pydantic', 'module_name': 'pydantic.main'},
     'bases': None},
    {

In [12]:
len(list(tru.get_apps()))

2

In [10]:
from trulens_eval.schema import AppDefinition
from pprint import pp

for app in AppDefinition.get_loadable_apps():
    print(app['app_id'])
    pp(app)

PaulGrahamB
{'tru_class_info': {'name': 'TruLlama',
                    'module': {'package_name': 'trulens_eval',
                               'module_name': 'trulens_eval.tru_llama'},
                    'bases': [{'name': 'TruLlama',
                               'module': {'package_name': 'trulens_eval',
                                          'module_name': 'trulens_eval.tru_llama'},
                               'bases': None},
                              {'name': 'App',
                               'module': {'package_name': 'trulens_eval',
                                          'module_name': 'trulens_eval.app'},
                               'bases': None},
                              {'name': 'AppDefinition',
                               'module': {'package_name': 'trulens_eval',
                                          'module_name': 'trulens_eval.schema'},
                               'bases': None},
                              {'name': 'SerialModel',

In [11]:
tru.run_dashboard() # open a local streamlit app to explore

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.


Accordion(children=(VBox(children=(VBox(children=(Label(value='STDOUT'), Output())), VBox(children=(Label(valu…

Dashboard started at http://192.168.86.200:8501 .


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

In [13]:
from trulens_eval.schema import AppDefinition

for app_json in AppDefinition.get_loadable_apps():
    print(app_json["app_id"])
    print(app_json['app'])

PaulGrahamB
{'__tru_non_serialized_object': {'cls': {'name': 'RetrieverQueryEngine', 'module': {'package_name': 'llama_index.query_engine', 'module_name': 'llama_index.query_engine.retriever_query_engine'}, 'bases': None}, 'id': 5755054672, 'init_bindings': None}}
PaulGrahamC
{'__tru_non_serialized_object': {'cls': {'name': 'RetrieverQueryEngine', 'module': {'package_name': 'llama_index.query_engine', 'module_name': 'llama_index.query_engine.retriever_query_engine'}, 'bases': None}, 'id': 5755054672, 'init_bindings': None}}


In [14]:
tru.stop_dashboard(force=True) # stop if needed

Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard.

Note: Feedback functions evaluated in deferred mode can be seen on the "Progress" page of the TruLens dashboard.

## Or view results directly in your notebook

In [16]:
tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,relevance,qs_relevance,groundedness_measure_with_cot_reasons,relevance_calls,qs_relevance_calls,groundedness_measure_with_cot_reasons_calls,latency,total_tokens,total_cost
0,LlamaIndex_App1,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_52bf02f03f3c55b3593b4e1c8441facb,"""Why did the author drop AI?""","""The author dropped AI because they realized t...",-,"{""record_id"": ""record_hash_52bf02f03f3c55b3593...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2023-12-22T13:19:57.307562"", ""...",2023-12-22T13:20:03.164893,0.9,1.0,1.0,[{'args': {'prompt': 'Why did the author drop ...,[{'args': {'question': 'Why did the author dro...,[{'args': {'source': ['Though I liked programm...,5,0,0.0
1,PaulGrahamX,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_3538328d0bb2a0a3b042d913a8312bad,"""Why did the author drop AI?""","""The author dropped AI because they realized t...",-,"{""record_id"": ""record_hash_3538328d0bb2a0a3b04...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2023-12-22T13:23:21.180615"", ""...",2023-12-22T13:23:26.580532,0.9,1.0,1.0,[{'args': {'prompt': 'Why did the author drop ...,[{'args': {'question': 'Why did the author dro...,[{'args': {'source': ['Though I liked programm...,5,0,0.0
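
To compare apps, it can help to average each feedback column per `app_id`. The sketch below does this with plain dictionaries so it runs without the database. The two rows copy the feedback values shown in the output above, and `mean_feedback_by_app` is a hypothetical helper rather than a TruLens API.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Mapping, Sequence

FEEDBACK_COLUMNS = [
    "relevance",
    "qs_relevance",
    "groundedness_measure_with_cot_reasons",
]


def mean_feedback_by_app(
    rows: Iterable[Mapping[str, object]],
    columns: Sequence[str] = FEEDBACK_COLUMNS,
) -> Dict[str, Dict[str, float]]:
    """Average each feedback column per app_id, skipping missing values."""
    grouped: Dict[str, Dict[str, List[float]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for row in rows:
        for column in columns:
            value = row.get(column)
            if value is not None:
                grouped[str(row["app_id"])][column].append(float(value))
    return {
        app: {column: mean(values) for column, values in per_app.items()}
        for app, per_app in grouped.items()
    }


rows = [
    {"app_id": "LlamaIndex_App1", "relevance": 0.9, "qs_relevance": 1.0,
     "groundedness_measure_with_cot_reasons": 1.0},
    {"app_id": "PaulGrahamX", "relevance": 0.9, "qs_relevance": 1.0,
     "groundedness_measure_with_cot_reasons": 1.0},
]
summary = mean_feedback_by_app(rows)
print(summary)
```

On the DataFrame itself, the pandas equivalent would be along the lines of `df.groupby("app_id")[FEEDBACK_COLUMNS].mean()`.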


In [28]:
def load_llamaindex_app():
    # Without a service_context, LlamaIndex falls back to its default LLM and
    # embedding model rather than the Gemini models configured earlier.
    index = VectorStoreIndex.from_documents(DOCUMENTS)
    return index.as_query_engine()

APP2 = load_llamaindex_app()
# tru.Llama cannot be told which Tru instance to use, so construct TruLlama directly.
tru_app2 = TruLlama(
    APP2,
    tru=tru,
    app_id="llamaindex_appZZ",
    initial_app_loader=load_llamaindex_app,
    feedbacks=[f_groundedness, f_qa_relevance, f_qs_relevance],
)

In [29]:
tru.add_app(tru_app2)

In [30]:
from trulens_eval.appui import AppUI

AUI = AppUI(
    app=tru_app2,
    app_selectors=[],
    record_selectors=[
        "app.retriever.retrieve[0].rets[:].score",
        "app.retriever.retrieve[0].rets[:].node.text",
    ],
)

VBox(children=(HBox(children=(VBox(children=(VBox(children=(VBox(children=(HBox(children=(HTML(value='<b>human…
