# Llama-Index Quickstart

In this quickstart you will create a simple LlamaIndex app, log it with TruLens, and get feedback on its LLM responses.

For evaluation, we will leverage the "hallucination triad" of groundedness, context relevance and answer relevance.
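
As a rough illustration of how the three scores fit together, the sketch below keeps them in one small container and reports the weakest one. The `TriadScores` class, the 0.7 threshold, and the example values are assumptions made for illustration; they are not part of TruLens.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TriadScores:
    """Scores in [0, 1] for one question/answer record (hypothetical container)."""

    groundedness: float       # is the answer supported by the retrieved context?
    context_relevance: float  # is the retrieved context relevant to the question?
    answer_relevance: float   # does the answer address the question?

    def weakest(self) -> Tuple[str, float]:
        """Return the name and value of the lowest-scoring check."""
        scores = {
            "groundedness": self.groundedness,
            "context_relevance": self.context_relevance,
            "answer_relevance": self.answer_relevance,
        }
        name = min(scores, key=scores.get)
        return name, scores[name]

    def passes(self, threshold: float = 0.7) -> bool:
        """True only if all three checks meet the (assumed) threshold."""
        return self.weakest()[1] >= threshold


scores = TriadScores(groundedness=1.0, context_relevance=0.7, answer_relevance=0.8)
print(scores.weakest())
print(scores.passes())
```

Looking at the weakest score first tends to point at the stage to debug: retrieval for context relevance, generation for groundedness or answer relevance.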

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb)

## Setup

### Install dependencies
Let's install the dependencies for this notebook if we don't have them already.

In [1]:
%pip install -qU "trulens_eval>=0.19.2" "llama_index>0.9.17" "html2text>=2020.1.16" qdrant_client python-dotenv ipywidgets streamlit_jupyter "litellm>=1.15.1" google-cloud-aiplatform 
# 'google-generativeai>=0.3.0'
# %pip install -qU trulens_eval==0.19.1 llama_index>0.9.15 html2text>=2020.1.16 qdrant_client python-dotenv

%load_ext dotenv

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install -U google-cloud-core google-cloud-aiplatform

Note: you may need to restart the kernel to use updated packages.


### Add API keys
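For this quickstart, you will need a Gemini API key, read here from the `GEMINI_API_KEY` environment variable. The key is used for the Gemini LLM and embeddings in LlamaIndex. Evaluation also runs on Gemini, called through LiteLLM on Vertex AI, which is configured below with `aiplatform.init`.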

In [4]:
# import os
# from google.colab import userdata
# GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"] = userdata.get('GEMINI_API_KEY')
# os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

In [2]:
%load_ext dotenv

In [3]:
import os
%dotenv
GOOGLE_API_KEY = os.environ["GEMINI_API_KEY"]
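
Reading the key with `os.environ[...]` raises a bare `KeyError` when the variable is missing. A minimal sketch of a friendlier check is shown here; `require_env` is a hypothetical helper name, not part of `dotenv` or TruLens.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it to your .env file or export it before starting the notebook."
        )
    return value
```

With this helper, the assignment above could be written as `GOOGLE_API_KEY = require_env("GEMINI_API_KEY")`.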

In [None]:
from google.cloud import aiplatform

# LiteLLM uses this Vertex AI configuration for Vertex AI models, including Gemini.
# The TruLens evaluation provider calls Gemini through the LiteLLM wrapper.
aiplatform.init(
    project="fovi-site",  # the author's project; use your own Google Cloud project ID
    location="us-west1",
)

### Import from LlamaIndex and TruLens

In [4]:
from trulens_eval import Tru

tru = Tru(database_redact_keys=True)

🦑 Tru initialized with db url sqlite:///default.sqlite .
🔒 Secret keys will not be included in the database.


### Create Simple LLM Application

This example uses LlamaIndex with Gemini as both the LLM and the embedding model, backed by a local Qdrant vector store.

In [5]:
from llama_index.readers.web import SimpleWebPageReader

documents = SimpleWebPageReader(
    html_to_text=True
).load_data(["http://paulgraham.com/worked.html"])
documents

[Document(id_='22b74553-bf09-465f-b4a3-2259e847ea3a', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='e63f93d1899b11fdd72d48ae487da624571439d97f87ca63fa5cc9753df3549e', text='![](https://s.turbifycdn.com/aah/paulgraham/essays-5.gif)|\n![](https://sep.turbifycdn.com/ca/Img/trans_1x1.gif)|\n[![](https://s.turbifycdn.com/aah/paulgraham/essays-6.gif)](index.html)  \n  \n| ![What I Worked On](https://s.turbifycdn.com/aah/paulgraham/what-i-worked-\non-4.gif)  \n  \nFebruary 2021  \n  \nBefore college the two main things I worked on, outside of school, were\nwriting and programming. I didn\'t write essays. I wrote what beginning writers\nwere supposed to write then, and probably still are: short stories. My stories\nwere awful. They had hardly any plot, just characters with strong feelings,\nwhich I imagined made them deep.  \n  \nThe first programs I tried writing were on the IBM 1401 that our school\ndistrict used for what

In [6]:

from llama_index import VectorStoreIndex, StorageContext, ServiceContext
from llama_index.embeddings import GeminiEmbedding
from llama_index.llms import Gemini
from llama_index.vector_stores import QdrantVectorStore
import qdrant_client
# from llama_index.vector_stores import ChromaVectorStore
# import chromadb

# Create a local Qdrant vector store
client = qdrant_client.QdrantClient(path="qdrant_gemini_3")

vector_store = QdrantVectorStore(client=client, collection_name="collection")

# # initialize client, setting path to save data
# db = chromadb.PersistentClient(path="./chroma_db")

# # create collection
# chroma_collection = db.get_or_create_collection("quickstart")

# # assign chroma as the vector_store to the context
# vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Use Gemini as the embedding model
embed_model = GeminiEmbedding(
    model_name="models/embedding-001", api_key=GOOGLE_API_KEY
)
service_context = ServiceContext.from_defaults(
    llm=Gemini(api_key=GOOGLE_API_KEY), embed_model=embed_model
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    storage_context=storage_context,
    show_progress=True,
)

query_engine = index.as_query_engine()

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/23 [00:00<?, ?it/s]
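
To give a feel for what a vector index does at query time, here is a toy retrieval sketch in plain Python. It is not LlamaIndex's or Qdrant's implementation: the bag-of-words `embed` function stands in for a real embedding model, and the three chunks are made-up sample text.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list, top_k: int = 2) -> list:
    """Return the top_k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


# Made-up chunks for illustration only.
chunks = [
    "the author studied philosophy in college",
    "the author painted still lifes in new york",
    "the author wrote lisp programs for a startup",
]
print(retrieve("what did the author study in college", chunks, top_k=1))
```

A real query engine then passes the retrieved chunks to the LLM as context for the answer, and it is this retrieved context that the feedback functions later inspect.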

### Send your first request

In [7]:
response = query_engine.query("What does the author say about their education?")
print(response)

The author initially studied philosophy in college but found it boring and switched to AI. They then pursued a PhD in computer science at Harvard while also taking art classes. After completing their PhD, they applied to art schools and was accepted into the BFA program at RISD. They also received an invitation to take the entrance exam at the Accademia di Belli Arti in Florence, which they passed. However, the author ultimately decided to attend RISD.


In [11]:
response = query_engine.query("Where did the author go to school?")
print(response)

Harvard, RISD, and Accademia di Belli Arti


In [12]:
response = query_engine.query("Who was the author's Harvard PhD advisor?")
print(response)

The provided context does not mention the author's Harvard PhD advisor, so I cannot answer this question.


In [13]:
response = query_engine.query("who was Tom Cheatham to the author?")
print(response)

Tom Cheatham was the author's advisor in the PhD program in computer science at Harvard.


In [14]:
response = query_engine.query("who is Tom? why is he in this story?")
print(response)

Tom Cheatham is a professor at Harvard University. He is mentioned in the story because the narrator was taking art classes at Harvard while pursuing a PhD in computer science. Tom Cheatham was the narrator's advisor, and he was very easy-going about the narrator's choice to take art classes. He never said anything about it, even though it was unusual for a PhD student to be taking art classes.


In [15]:
response = query_engine.query("what is this story about?  what are the most important things the author want the reader to learn?")
print(response)

This story is about the author's journey from being a philosophy student to becoming an AI researcher. The author initially believed that philosophy was the study of ultimate truths, but later realized that it was mostly concerned with edge cases that other fields ignored. The author then switched to AI, inspired by a novel and a documentary. However, the author later realized that the AI research at the time was a hoax, as it was limited to a small subset of formal language and could not truly understand natural language.

The most important things the author wants the reader to learn are:
- The limitations of AI research at the time, particularly the inability to truly understand natural language.
- The importance of having realistic expectations about what AI can achieve.
- The need for a new approach to AI research that goes beyond the traditional methods of representing concepts with explicit data structures.


## Initialize Feedback Function(s)

In [10]:
from trulens_eval import Feedback, TruLlama
from trulens_eval.feedback import Groundedness
from trulens_eval import LiteLLM
import numpy as np

# import litellm
# litellm.set_verbose=True

# Initialize provider class
gemini_provider = LiteLLM(model_engine="gemini-pro")

grounded = Groundedness(groundedness_provider=gemini_provider)

# Define a groundedness feedback function
f_groundedness = Feedback(grounded.groundedness_measure_with_cot_reasons).on(
    TruLlama.select_source_nodes().node.text.collect()
    ).on_output(
    ).aggregate(grounded.grounded_statements_aggregator)

# Question/answer relevance between overall question and answer.
f_qa_relevance = Feedback(gemini_provider.relevance).on_input_output()

# Question/statement relevance between question and each context chunk.
f_qs_relevance = Feedback(gemini_provider.qs_relevance).on_input().on(
    TruLlama.select_source_nodes().node.text
    ).aggregate(np.mean)

✅ In groundedness_measure_with_cot_reasons, input source will be set to __record__.app.query.rets.source_nodes[:].node.text.collect() .
✅ In groundedness_measure_with_cot_reasons, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In qs_relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In qs_relevance, input statement will be set to __record__.app.query.rets.source_nodes[:].node.text .
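
The context relevance feedback above is evaluated once per retrieved chunk and then reduced to a single number with `np.mean`. The sketch below mimics that score-then-aggregate pattern using only the standard library. `toy_relevance` is a hypothetical word-overlap scorer standing in for the LLM-based `qs_relevance`, and the chunks are made-up text.

```python
from statistics import mean


def toy_relevance(question: str, chunk: str) -> float:
    """Hypothetical scorer: fraction of question words that appear in the chunk."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def context_relevance(question: str, chunks: list) -> float:
    """Score each retrieved chunk separately, then average the scores."""
    per_chunk = [toy_relevance(question, chunk) for chunk in chunks]
    return mean(per_chunk) if per_chunk else 0.0


question = "why did the author drop ai"
chunks = ["the author decided to drop ai", "the author moved to florence"]
print(context_relevance(question, chunks))
```

Averaging is only one possible choice. An aggregator such as `min` would instead penalize a single irrelevant chunk more heavily.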


## Instrument app for logging with TruLens

In [11]:
tru_query_engine_recorder = TruLlama(query_engine,
    app_id='LlamaIndex_App1',
    feedbacks=[f_groundedness, f_qa_relevance, f_qs_relevance])

In [12]:
# Run the query inside the recorder's context manager so that it is logged
with tru_query_engine_recorder as recording:
    response = query_engine.query("Why did the author drop AI?")
    print(response)

The author dropped AI because they realized that the way AI was practiced at the time was a hoax. They believed that the whole way of doing AI, with explicit data structures representing concepts, was not going to work and would never lead to the creation of truly intelligent machines like Mike from the novel _The Moon is a Harsh Mistress_.
kwargs[caching]: False; litellm.cache: None

LiteLLM completion() model= gemini-pro; provider = vertex_ai

LiteLLM: Params passed to completion() {'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stop': None, 'max_tokens': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'model': 'gemini-pro', 'custom_llm_provider': 'vertex_ai', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None}

LiteLLM: Non-Default params passed to completion() {}
self.optional_params: {}
PRE-API-CALL ADDITIONAL ARGS: {'complete_input_dict': {}, 'request_str': 'llm_model = GenerativeModel(gemini-pro)\nchat = llm_model.start_chat()\nchat.send_message(You are a INFORMATION OVERLAP classifier providing the overlap of information between a SOURCE and STATEMENT.\nFor every sentence in the statement, please answer with this template:\n\nTEMPLATE: \nStatement Sentence: <Sentence>,
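
Note that the cell above calls `query_engine.query(...)` on the original engine, not on the recorder; the call is captured because it happens inside the `with` block. The toy sketch below shows one way such a pattern can work, by temporarily wrapping a method. It is an illustration only: `ToyApp` and `ToyRecorder` are hypothetical, and TruLens' actual instrumentation is more involved.

```python
class ToyApp:
    """Stand-in for a query engine (hypothetical)."""

    def query(self, question: str) -> str:
        return f"echo: {question}"


class ToyRecorder:
    """Logs calls to app.query made inside a `with` block (illustration only)."""

    def __init__(self, app):
        self.app = app
        self.records = []
        self._original = None

    def __enter__(self):
        self._original = self.app.query

        def recorded_query(question):
            answer = self._original(question)
            self.records.append({"input": question, "output": answer})
            return answer

        self.app.query = recorded_query  # wrap the method while recording
        return self

    def __exit__(self, exc_type, exc, tb):
        self.app.query = self._original  # restore the unwrapped method
        return False  # do not swallow exceptions


app = ToyApp()
with ToyRecorder(app) as recording:
    app.query("Why did the author drop AI?")

app.query("This call happens outside the block and is not recorded.")
print(recording.records)
```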

## Explore in a Dashboard

In [22]:
tru.run_dashboard() # open a local streamlit app to explore

# tru.run_dashboard_in_jupyter() # open a streamlit app in the notebook

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.


Dashboard started at http://localhost:8501 .


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

In [17]:
tru.stop_dashboard(force=True) # stop if needed

Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard.

Note: Feedback functions evaluated in deferred mode appear on the "Progress" page of the TruLens dashboard.

## Or view results directly in your notebook

In [31]:
tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all

| | app_id | app_json | type | record_id | input | output | tags | record_json | cost_json | perf_json | ts | relevance | qs_relevance | groundedness_measure_with_cot_reasons | relevance_calls | qs_relevance_calls | groundedness_measure_with_cot_reasons_calls | latency | total_tokens | total_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LlamaIndex_App1 | {"tru_class_info": {"name": "TruLlama", "modul... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_0a7f91df583c9a8587daf873ff21473e | "Why did the author drop AI?" | "The author dropped AI because they realized t... | - | {"record_id": "record_hash_0a7f91df583c9a8587d... | {"n_requests": 0, "n_successful_requests": 0, ... | {"start_time": "2023-12-21T17:52:43.762856", "... | 2023-12-21T17:52:49.408424 | 1.0 | 1.0 | 1.0 | [{'args': {'prompt': 'Why did the author drop ... | [{'args': {'question': 'Why did the author dro... | [{'args': {'source': ['Though I liked programm... | 5 | 0 | 0.0 |
| 1 | llamaindex_app | {"tru_class_info": {"name": "TruLlama", "modul... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_6d036cd509a02a1b4b80bedab7daf3f6 | "who is Mike?" | "Mike is an intelligent computer mentioned in ... | - | {"record_id": "record_hash_6d036cd509a02a1b4b8... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-21T18:51:05.702451", "... | 2023-12-21T18:51:07.735358 | | | | | | | 5 | 2135 | 0.003218 |
| 2 | llamaindex_app | {"tru_class_info": {"name": "TruLlama", "modul... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_cbba8570f5fe7f0f16fe2628dc1f31c3 | "who is Robert?" | "Robert is a person mentioned in the context w... | - | {"record_id": "record_hash_cbba8570f5fe7f0f16f... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-21T18:52:23.234527", "... | 2023-12-21T18:52:25.731222 | | | | | | | 2 | 2099 | 0.003158 |
| 3 | llamaindex_app | {"tru_class_info": {"name": "TruLlama", "modul... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_45aa04a5fc41c66e40b686b09ba5313b | "what is Robert's full name?" | "The context information does not provide Robe... | - | {"record_id": "record_hash_45aa04a5fc41c66e40b... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-21T18:52:56.970487", "... | 2023-12-21T18:52:58.876144 | | | | | | | 2 | 2085 | 0.003123 |
| 4 | llamaindex_appZZ | {"tru_class_info": {"name": "TruLlama", "modul... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_af1ee4ab87a425343e762d1346dd4610 | "what is required to create real AI?" | "Understanding natural language and bridging t... | - | {"record_id": "record_hash_af1ee4ab87a425343e7... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-21T19:44:43.922278", "... | 2023-12-21T19:44:46.196627 | 0.8 | 0.7 | 1.0 | [{'args': {'prompt': 'what is required to crea... | [{'args': {'question': 'what is required to cr... | [{'args': {'source': ['By which I mean the sor... | 5 | 2137 | 0.003211 |
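
Some records in the output above have no feedback scores, so any per-app summary has to skip the missing values. The sketch below shows one way to do that with the standard library. The rows copy only the `app_id` and feedback columns from the output above, with `None` marking a missing score; `mean_feedback` is a hypothetical helper, not a TruLens function.

```python
from collections import defaultdict
from statistics import mean

METRICS = ["relevance", "qs_relevance", "groundedness_measure_with_cot_reasons"]

# app_id and feedback columns copied from the records shown above.
rows = [
    {"app_id": "LlamaIndex_App1", "relevance": 1.0, "qs_relevance": 1.0,
     "groundedness_measure_with_cot_reasons": 1.0},
    {"app_id": "llamaindex_app", "relevance": None, "qs_relevance": None,
     "groundedness_measure_with_cot_reasons": None},
    {"app_id": "llamaindex_app", "relevance": None, "qs_relevance": None,
     "groundedness_measure_with_cot_reasons": None},
    {"app_id": "llamaindex_app", "relevance": None, "qs_relevance": None,
     "groundedness_measure_with_cot_reasons": None},
    {"app_id": "llamaindex_appZZ", "relevance": 0.8, "qs_relevance": 0.7,
     "groundedness_measure_with_cot_reasons": 1.0},
]


def mean_feedback(rows, metrics):
    """Average each metric per app_id, skipping records that have no score."""
    per_app = defaultdict(lambda: {m: [] for m in metrics})
    for row in rows:
        scores = per_app[row["app_id"]]
        for m in metrics:
            if row.get(m) is not None:
                scores[m].append(row[m])
    return {
        app_id: {m: (mean(v) if v else None) for m, v in scores.items()}
        for app_id, scores in per_app.items()
    }


summary = mean_feedback(rows, METRICS)
for app_id, scores in summary.items():
    print(app_id, scores)
```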


In [28]:
def load_llamaindex_app():
    # from llama_index import VectorStoreIndex
    index = VectorStoreIndex.from_documents(documents)    
    query_engine = index.as_query_engine()

    return query_engine

app2 = load_llamaindex_app()
# Use TruLlama directly rather than tru.Llama(...), because tru.Llama
# can't specify which Tru instance to use.
tru_app2 = TruLlama(
    app2,
    tru=tru,
    app_id="llamaindex_appZZ",
    initial_app_loader=load_llamaindex_app,
    feedbacks=[f_groundedness, f_qa_relevance, f_qs_relevance]
)

In [29]:
tru.add_app(tru_app2)

In [30]:
from trulens_eval.appui import AppUI

aui = AppUI(
    app=tru_app2,
    app_selectors=[],
    record_selectors=[
        "app.retriever.retrieve[0].rets[:].score",
        "app.retriever.retrieve[0].rets[:].node.text",
    ],
)
aui.widget

kwargs[caching]: False; litellm.cache: None

LiteLLM completion() model= gemini-pro; provider = vertex_ai

LiteLLM: Params passed to completion() {'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stop': None, 'max_tokens': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'model': 'gemini-pro', 'custom_llm_provider': 'vertex_ai', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None}

LiteLLM: Non-Default params passed to completion() {}
self.optional_params: {}
PRE-API-CALL ADDITIONAL ARGS: {'complete_input_dict': {}, 'request_str': 'llm_model = GenerativeModel(gemini-pro)\nchat = llm_model.start_chat()\nchat.send_message(You are a RELEVANCE grader; providing the relevance of the given STATEMENT to the given QUESTION.\nRespond only as a number from 0 to 10 where 0 is the least relevant and 10 is the most relevant. \n\nA few additional scoring gui
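
The `record_selectors` passed to `AppUI` above are path strings that pick values out of a recorded call: named steps, an index such as `[0]`, and `[:]` to fan out over every item in a list. The sketch below is a simplified resolver for paths of that shape over nested dicts and lists. It is not TruLens' selector implementation, and the `record` layout is a made-up toy shaped to match the paths.

```python
import re

# One path step: a name, an integer index like [0], or the slice [:].
_STEP = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]|\[(:)\]")


def resolve(path: str, data):
    """Resolve a dotted path with [i] and [:] steps against nested dicts/lists."""
    current = [data]    # working set of matched values
    fanned_out = False  # becomes True once a [:] step is seen
    for match in _STEP.finditer(path):
        key, index, all_items = match.groups()
        if key is not None:
            current = [item[key] for item in current]
        elif index is not None:
            current = [item[int(index)] for item in current]
        else:
            current = [child for item in current for child in item]
            fanned_out = True
    return current if fanned_out else current[0]


# Toy record (hypothetical layout, for illustration only).
record = {
    "app": {"retriever": {"retrieve": [
        {"rets": [
            {"score": 0.9, "node": {"text": "chunk A"}},
            {"score": 0.4, "node": {"text": "chunk B"}},
        ]},
    ]}}
}

print(resolve("app.retriever.retrieve[0].rets[:].score", record))
print(resolve("app.retriever.retrieve[0].rets[:].node.text", record))
```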