# Components in LlamaIndex

This notebook is part of the [Hugging Face Agents Course](https://www.hf.co/learn/agents-course), a free course that takes you from beginner to expert in building agents.

![Agents course share](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png)

Alfred is hosting a party and needs to find relevant information on the personas who will attend. We will therefore use a `QueryEngine` to index and search through a database of personas.

## Let's install the dependencies

We will install the dependencies for this unit.

In [15]:
!pip install llama-index datasets llama-index-callbacks-arize-phoenix arize-phoenix llama-index-vector-stores-chroma llama-index-llms-huggingface-api llama-index-embeddings-huggingface -U -q



Next, let's log in to Hugging Face to use the serverless Inference API.

In [16]:
from huggingface_hub import login

login()


## Create a `QueryEngine` for retrieval augmented generation

### Setting up the persona database

We will be using personas from the [dvilasuero/finepersonas-v0.1-tiny dataset](https://huggingface.co/datasets/dvilasuero/finepersonas-v0.1-tiny). This dataset contains 5K personas that will be attending the party!

Let's load the dataset and store each persona as a text file in the `data` directory.

In [17]:
from datasets import load_dataset
from pathlib import Path

dataset = load_dataset(path="dvilasuero/finepersonas-v0.1-tiny", split="train")

Path("data").mkdir(parents=True, exist_ok=True)
for i, persona in enumerate(dataset):
    with open(Path("data") / f"persona_{i}.txt", "w") as f:
        f.write(persona["persona"])

Awesome! Now that we have a local directory with all the personas that will be attending the party, we can load and index them.
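The file-writing step above can be wrapped in a small helper, which makes it easy to check on a couple of sample strings before writing all 5K personas. This is a minimal sketch: the `write_personas` helper and the sample strings are hypothetical, and the explicit `encoding="utf-8"` is an assumption added here so the result does not depend on the platform default.

```python
import tempfile
from pathlib import Path


def write_personas(personas, out_dir):
    """Write each persona string to out_dir/persona_<i>.txt and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(personas):
        path = out_dir / f"persona_{i}.txt"
        # Explicit encoding keeps behaviour identical across operating systems.
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


# Try the helper on two made-up personas in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = ["A travel writer who blogs about trains.", "An AI researcher."]
    written = write_personas(sample, tmp)
    file_count = len(list(Path(tmp).glob("persona_*.txt")))
    first_text = written[0].read_text(encoding="utf-8")
```

With the real dataset, the same helper could be called as `write_personas(dataset["persona"], "data")`, assuming the column is named `persona` as in the cell above.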

### Loading and embedding persona documents

We will use the `SimpleDirectoryReader` to load the persona descriptions from the `data` directory. This will return a list of `Document` objects. 

In [18]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="data")
documents = reader.load_data()
len(documents)

5000

Now that we have a list of `Document` objects, we can use the `IngestionPipeline` to create nodes from the documents and prepare them for the `QueryEngine`. We will use the `SentenceSplitter` to split the documents into smaller chunks and the `HuggingFaceEmbedding` to embed the chunks.

In [19]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ]
)

# run the pipeline sync or async
nodes = await pipeline.arun(documents=documents[:10])
nodes

2025-08-25 16:58:05,486 - INFO - Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5
2025-08-25 16:58:06,814 - INFO - 1 prompt is loaded, with the key: query
2025-08-25 16:58:06,875 - ERROR - Failed to export span batch code: 401, reason: 

[TextNode(id_='c740fd9f-61a2-4aed-b627-cd3638786e08', embedding=[-0.0418381430208683, 0.004143779166042805, 0.03124382160604, -0.006319718435406685, 0.0038116266950964928, -0.032063160091638565, 0.020751167088747025, 0.022821927443146706, -0.05821932479739189, -0.07639218866825104, 0.009779002517461777, -0.02090546302497387, -0.022619878873229027, 0.04885818809270859, -0.003836399642750621, -0.001568099483847618, -0.014094950631260872, 0.08483941853046417, 0.004979403223842382, 0.004527567885816097, 0.01080176793038845, -0.05665913224220276, 0.06588956713676453, -0.022347483783960342, -0.04901585355401039, 0.030768534168601036, 0.032757386565208435, -0.01692076213657856, -0.00020077178487554193, -0.12324990332126617, -0.04046152904629707, 0.023721201345324516, 0.005328468047082424, 0.03003883920609951, 0.05875980854034424, 0.009523406624794006, -0.009102941490709782, 0.05340903624892235, -0.007393943145871162, 0.017748605459928513, 0.01183661725372076, -0.0034299117978662252, -0.022076

As you can see, we have created a list of `Node` objects, which are chunks of text from the original documents, each with its embedding attached. Let's explore how we can add these nodes to a vector store.
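To build intuition for what "splitting documents into chunks" means, here is a simplified, standard-library sketch that greedily packs whole sentences into chunks. It is not the `SentenceSplitter` implementation: the `split_into_chunks` function, the character-based `max_chars` limit, and the regular expression are all assumptions chosen for illustration.

```python
import re


def split_into_chunks(text, max_chars=80):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "Alfred hosts a party. Guests arrive at eight. Dinner is served at nine."
chunks = split_into_chunks(text, max_chars=50)
```

Here the first two sentences fit together within 50 characters, so they share a chunk, while the third starts a new one. Keeping sentences intact is the design choice that matters: each chunk stays readable on its own when it is later retrieved.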

### Storing and indexing documents

Since we are using an ingestion pipeline, we can directly attach a vector store to the pipeline to populate it.
In this case, we will use `Chroma` to store our documents.
Let's run the pipeline again with the vector store attached. 
The `IngestionPipeline` caches the operations so this should be fast!

In [20]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection(name="alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ],
    vector_store=vector_store,
)

nodes = await pipeline.arun(documents=documents[:10])
len(nodes)

2025-08-25 16:58:07,644 - INFO - Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5
2025-08-25 16:58:08,865 - INFO - 1 prompt is loaded, with the key: query
2025-08-25 16:58:08,924 - ERROR - Failed to export span batch code: 401, reason: 

10
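The idea behind caching a transformation can be sketched in a few lines: hash the input and only do the work when that hash has not been seen before. This is a conceptual illustration only. The `CachedTransform` class is hypothetical, and how LlamaIndex actually keys and scopes its cache (for example, whether a newly constructed pipeline shares results with an earlier one) is not shown here.

```python
import hashlib


class CachedTransform:
    """Wrap a text-to-text function and memoise results by content hash."""

    def __init__(self, fn):
        self.fn = fn
        self.cache = {}
        self.calls = 0  # number of times the wrapped function actually ran

    def __call__(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.fn(text)
        return self.cache[key]


# str.upper stands in for an expensive step such as embedding a chunk.
upper = CachedTransform(str.upper)
first = upper("alfred")
second = upper("alfred")  # served from the cache, fn is not called again
```

After both calls, `upper.calls` is still 1, because the second call with identical input is answered from the cache.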

We can create a `VectorStoreIndex` from the vector store and use it to query the documents by passing the vector store and embedding model to the `from_vector_store()` method.

In [21]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model
)

2025-08-25 16:58:09,747 - INFO - Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5
2025-08-25 16:58:10,847 - INFO - 1 prompt is loaded, with the key: query


We don't need to persist the index separately: the Chroma `PersistentClient` writes the collection to the directory path we passed (`./alfred_chroma_db`), and the `ChromaVectorStore` reads from and writes to that collection.
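Conceptually, searching a vector index means embedding the query and ranking the stored vectors by similarity to it. The sketch below shows that ranking step with plain cosine similarity over a tiny in-memory dictionary. The three-dimensional vectors and node names are made up for illustration, and real vector stores such as Chroma typically use approximate nearest-neighbour search rather than this exhaustive loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the ids of the k stored vectors most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), node_id)
        for node_id, vec in stored.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [node_id for _, node_id in scored[:k]]


# Hypothetical embeddings; real ones have hundreds of dimensions.
stored = {
    "travel_writer": [0.9, 0.1, 0.0],
    "ai_researcher": [0.1, 0.9, 0.2],
    "chef": [0.0, 0.2, 0.9],
}
result = top_k([1.0, 0.2, 0.0], stored, k=2)
```

The query vector points mostly along the first dimension, so `travel_writer` ranks first, followed by `ai_researcher`.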

### Querying the index

Now that we have our index, we can use it to query the documents.
Let's create a `QueryEngine` from the index and use it to query the documents using a specific response mode.


In [22]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
import nest_asyncio

nest_asyncio.apply()  # This is needed to run the query engine
llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
response = query_engine.query(
    "Respond using a persona that describes author and travel experiences?"
)
response

2025-08-25 16:58:11,068 - ERROR - Failed to export span batch code: 401, reason: 


HfHubHTTPError: 404 Client Error: Not Found for url: https://router.huggingface.co/hf-inference/models/Qwen/Qwen2.5-Coder-32B-Instruct/v1/chat/completions (Request ID: Root=1-68acce63-7155b82b10cac8157297f01e;3342769f-a9e4-4d3b-b350-e88e6cadaa60)
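The `tree_summarize` response mode used above combines retrieved chunks hierarchically: groups of chunks are summarized, then the summaries themselves are summarized, until a single answer remains. A rough sketch of that recursion follows. The `fan_in` parameter and `fake_summarize` function are hypothetical stand-ins, since the real mode groups chunks by how many fit in the LLM prompt and calls the LLM at each step.

```python
def tree_summarize(chunks, summarize, fan_in=2):
    """Repeatedly summarize groups of fan_in texts until one text remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    level = list(chunks)
    while len(level) > 1:
        # Each pass replaces every group of fan_in texts with one summary.
        level = [
            summarize(level[i : i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


def fake_summarize(texts):
    # Hypothetical stand-in for an LLM call: just show the grouping.
    return "(" + " + ".join(texts) + ")"


result = tree_summarize(["a", "b", "c"], fake_summarize)
```

With three chunks and `fan_in=2`, the first pass produces `(a + b)` and `(c)`, and the second pass merges them into the final result, which makes the tree shape visible in the parentheses.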

## Evaluation and observability

LlamaIndex provides **built-in evaluation tools to assess response quality.**
These evaluators leverage LLMs to analyze responses across different dimensions.
We can now check whether the response is faithful to the retrieved persona context, that is, whether the answer is supported by the source text.

In [None]:
from llama_index.core.evaluation import FaithfulnessEvaluator

# evaluate whether the response is supported by its source nodes
evaluator = FaithfulnessEvaluator(llm=llm)
eval_result = evaluator.evaluate_response(response=response)
eval_result.passing

True

If these LLM-based evaluators do not give enough insight, we can inspect the full trace of a query with the Arize Phoenix tool, after creating an account at [LlamaTrace](https://llamatrace.com/login) and generating an API key.

In [23]:
import llama_index
import os

PHOENIX_API_KEY = "<PHOENIX_API_KEY>"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
llama_index.core.set_global_handler(
    "arize_phoenix", endpoint="https://llamatrace.com/v1/traces"
)
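Rather than pasting the key into the notebook, one option is to read it from an environment variable and fail early when it is missing. This is a minimal sketch: the `phoenix_headers` helper is hypothetical, and the `api_key=<value>` header format is simply copied from the cell above.

```python
import os


def phoenix_headers(env=os.environ):
    """Build the OTLP header string from the PHOENIX_API_KEY variable."""
    api_key = env.get("PHOENIX_API_KEY")
    if not api_key:
        # Failing here is clearer than authentication errors at export time.
        raise RuntimeError("Set the PHOENIX_API_KEY environment variable first")
    return f"api_key={api_key}"


# A plain dict stands in for os.environ so the example needs no real key.
headers = phoenix_headers({"PHOENIX_API_KEY": "example-key"})
```

In the notebook, the result would be assigned as `os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = phoenix_headers()`. HTTP 401 responses from a trace endpoint generally indicate failed authentication, so checking the key up front can save some debugging.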




Now, we can query the index and see the response in the Arize Phoenix tool.

In [24]:
response = query_engine.query(
    "What is the name of someone who is interested in AI and technology?"
)
response

2025-08-25 17:01:24,432 - ERROR - Failed to export span batch code: 401, reason: 


HfHubHTTPError: 404 Client Error: Not Found for url: https://router.huggingface.co/hf-inference/models/Qwen/Qwen2.5-Coder-32B-Instruct/v1/chat/completions (Request ID: Root=1-68accf24-207091230d1d864d58af4b9e;c5209639-37fb-45f4-8c31-a432d605e984)

We can then go to [LlamaTrace](https://llamatrace.com/login) and explore the trace of the query and its response.

![arize-phoenix](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/arize.png)    