# Components in LlamaIndex

This notebook is part of the [Hugging Face Agents Course](https://www.hf.co/learn/agents-course), a free course that takes you from beginner to expert in building agents.

![Agents course share](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png)

Alfred is hosting a party and needs to find relevant information on the personas who will be attending. To help him, we will use a `QueryEngine` to index and search through a database of personas.

## Let's install the dependencies

We will install the dependencies for this unit.

In [1]:
%pip install -U -q \
    llama-index \
    datasets \
    llama-index-callbacks-arize-phoenix \
    llama-index-vector-stores-chroma \
    llama-index-llms-huggingface-api \
    llama-index-embeddings-huggingface-api \
    ipywidgets 

Note: you may need to restart the kernel to use updated packages.


Next, let's log in to Hugging Face to use the serverless Inference APIs.

In [2]:
from huggingface_hub import login

login()


## Create a `QueryEngine` for retrieval augmented generation

### Setting up the persona database

We will be using personas from the [dvilasuero/finepersonas-v0.1-tiny dataset](https://huggingface.co/datasets/dvilasuero/finepersonas-v0.1-tiny). This dataset contains 5K personas that will be attending the party!

Let's load the dataset and store each persona as a text file in the `data` directory.

In [3]:
from datasets import load_dataset
from pathlib import Path

dataset = load_dataset(path="dvilasuero/finepersonas-v0.1-tiny", split="train")

Path("data").mkdir(parents=True, exist_ok=True)
for i, persona in enumerate(dataset):
    with open(Path("data") / f"persona_{i}.txt", "w") as f:
        f.write(persona["persona"])

Awesome! Now that we have a local directory with all the personas that will be attending the party, we can load and index them.

### Loading and embedding persona documents

We will use the `SimpleDirectoryReader` to load the persona descriptions from the `data` directory. This will return a list of `Document` objects.

In [4]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="data")
documents = reader.load_data()
len(documents)

5000
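To make the idea of a `Document` more concrete, here is a minimal standard-library sketch of what a directory loader does conceptually: read each text file and pair its content with some file metadata. `SimpleDocument` and `load_text_files` are hypothetical names invented for this illustration, and the two persona files are made up. This is not how `SimpleDirectoryReader` is implemented.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus file metadata."""

    text: str
    metadata: dict = field(default_factory=dict)


def load_text_files(input_dir: Path) -> list[SimpleDocument]:
    """Read every .txt file in input_dir into a SimpleDocument, sorted by name."""
    docs = []
    for path in sorted(input_dir.glob("*.txt")):
        docs.append(
            SimpleDocument(
                text=path.read_text(encoding="utf-8"),
                metadata={"file_name": path.name, "file_size": path.stat().st_size},
            )
        )
    return docs


# Demo on a throwaway directory with two made-up persona files.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "persona_0.txt").write_text("A local historian.", encoding="utf-8")
    (tmp_dir / "persona_1.txt").write_text("A travel writer.", encoding="utf-8")
    demo_docs = load_text_files(tmp_dir)

print(len(demo_docs), demo_docs[0].metadata["file_name"])
```

Keeping metadata next to the text is what later lets a response point back to the file a retrieved chunk came from.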

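Now that we have a list of `Document` objects, we can use the `IngestionPipeline` to create nodes from the documents and prepare them for the `QueryEngine`. We will use the `SentenceSplitter` to split the documents into smaller chunks and an embedding model to embed the chunks. The course uses `HuggingFaceInferenceAPIEmbedding` for this step; this notebook swaps in `OllamaEmbedding` with a locally served copy of the same `bge-small-en-v1.5` model and keeps the Hugging Face lines as comments.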

In [8]:
%pip install llama-index-embeddings-ollama

!ollama pull qllama/bge-small-en-v1.5:f16 

Note: you may need to restart the kernel to use updated packages.
pulling manifest 
pulling 6dbfaeaa2837... 100% ▕████████████████▏  67 MB
pulling ac02c0b5e300... 100% ▕████████████████▏  262 B
verifying sha256 digest 
writing manifest 
success


In [9]:
# from llama_index.embeddings.huggingface_api import HuggingFaceInferenceAPIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.embeddings.ollama import OllamaEmbedding

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        # HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5"),
        OllamaEmbedding(model_name="qllama/bge-small-en-v1.5:f16")
    ]
)

# run the pipeline asynchronously; pipeline.run(...) is the synchronous variant
nodes = await pipeline.arun(documents=documents[:10])
nodes

[TextNode(id_='c9bd5497-692f-40cd-906c-dfb9b7d75e1e', embedding=[-0.1155925840139389, 0.11756384372711182, 0.31548774242401123, -0.010442741215229034, 0.062471162527799606, -0.21860802173614502, 0.07422478497028351, 0.1554882973432541, -0.8126384615898132, -0.8672986030578613, -0.15084494650363922, -0.4180627465248108, -0.4126204550266266, 0.3191247582435608, -0.08667249977588654, 0.028504356741905212, -0.043927405029535294, 0.8400282859802246, 0.16205215454101562, -0.22673261165618896, -0.2809491455554962, -0.43867677450180054, 0.568784773349762, -0.006498023867607117, -0.21258544921875, 0.1770147681236267, 0.46626436710357666, -0.1517353057861328, 0.02527783066034317, -1.0909438133239746, -0.32064780592918396, 0.08379112184047699, 0.05516097694635391, 0.1199626475572586, 0.4842248558998108, 0.4430839419364929, -0.12091891467571259, 0.7001889944076538, 0.20285490155220032, -0.15056031942367554, 0.009695194661617279, 0.05401137098670006, -0.2609137296676636, 0.5221009254455566, 0.61416

As you can see, we have created a list of `Node` objects, which are chunks of text from the original documents together with their embeddings. Let's explore how we can add these nodes to a vector store.
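Before moving on, the chunking step can be pictured with a small standard-library sketch that greedily packs whole sentences into chunks under a character budget. This is a simplified illustration only: `split_into_chunks`, its `max_chars` parameter, and the sample persona text are made up for this example, and the real `SentenceSplitter` works on tokens and supports overlap between chunks.

```python
import re


def split_into_chunks(text: str, max_chars: int = 80) -> list[str]:
    """Greedily pack whole sentences into chunks, aiming for max_chars each.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Naive sentence boundary: split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Made-up persona text, used only to exercise the function.
persona = (
    "An anthropologist studying Cypriot culture. "
    "She has lived in Nicosia for ten years. "
    "Her research covers festivals and food."
)
chunks = split_into_chunks(persona, max_chars=80)
for chunk in chunks:
    print(repr(chunk))
```

Short texts fit into a single chunk, which is consistent with the ten persona files in this notebook yielding ten nodes.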

### Storing and indexing documents

Since we are using an ingestion pipeline, we can directly attach a vector store to the pipeline to populate it.
In this case, we will use `Chroma` to store our documents.
Let's run the pipeline again with the vector store attached.
The `IngestionPipeline` caches each node and transformation combination, so re-running a pipeline on unchanged input is fast.

In [10]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection(name="alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        # HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5"),
        OllamaEmbedding(model_name="qllama/bge-small-en-v1.5:f16")

    ],
    vector_store=vector_store,
)

nodes = await pipeline.arun(documents=documents[:10])
len(nodes)

10

We can create a `VectorStoreIndex` from the vector store and use it to query the documents by passing the vector store and embedding model to the `from_vector_store()` method.

In [11]:
from llama_index.core import VectorStoreIndex
# from llama_index.embeddings.huggingface_api import HuggingFaceInferenceAPIEmbedding

# embed_model = HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5")
embed_model = OllamaEmbedding(model_name="qllama/bge-small-en-v1.5:f16")
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model
)

We don't need to worry about persisting the index to disk ourselves: the Chroma `PersistentClient` behind the `ChromaVectorStore` automatically saves the data to the directory path we passed.
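Conceptually, querying a vector index means embedding the query and returning the stored nodes whose vectors are closest to it. The sketch below uses cosine similarity over a toy in-memory store. The three-dimensional vectors, the `toy_store` dict, and the `top_k` helper are invented for illustration, and the distance metric a real vector store uses depends on how its collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), node_id) for node_id, vec in store.items()]
    scored.sort(reverse=True)
    return [node_id for _, node_id in scored[:k]]


# Toy "embeddings"; real ones have hundreds of dimensions.
toy_store = {
    "persona_0": [1.0, 0.0, 0.0],
    "persona_1": [0.8, 0.6, 0.0],
    "persona_2": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], toy_store, k=2))
```

This is also why the index needs the same embedding model that was used at ingestion time: query and node vectors are only comparable when they live in the same space.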

### Querying the index

Now that we have our index, we can use it to query the documents.
Let's create a `QueryEngine` from the index and use it to query the documents using a specific response mode.


In [12]:
%pip install llama-index-llms-ollama

Note: you may need to restart the kernel to use updated packages.


In [13]:
# from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.llms.ollama import Ollama 
import nest_asyncio

nest_asyncio.apply()  # This is needed to run the query engine
# llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
llm = Ollama(model="myaniu/qwen2.5-1m:7b")
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
    embed_model=OllamaEmbedding(model_name="qllama/bge-small-en-v1.5:f16")
)
response = query_engine.query(
    "Respond using a persona that describes author and travel experiences?"
)
response

Response(response="I have spent many years as an anthropologist in Cyprus, immersing myself deeply into Cypriot culture, history, and society. My journey began by exploring the ancient sites around Cyprus, from the imposing walls of Kition to the grandiose architecture of Paphos. The experiences and stories I've collected over these years have shaped my understanding not just as a researcher but also as someone who has lived among the people.\n\nOne of the most profound aspects of Cypriot culture that fascinates me is their hospitality, which is unparalleled anywhere else. It's evident in every home visit, market exchange, or casual conversation on the streets. The warmth and genuine interest people have towards strangers make a lasting impression.\n\nThe traditional festivals are another highlight; they showcase the rich heritage through vibrant costumes, music, dance, and food. Each festival offers a unique glimpse into various aspects of Cypriot life. Witnessing how these traditions
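The `tree_summarize` response mode used above can be pictured as a bottom-up reduction: groups of retrieved chunks are summarized, then the summaries are summarized again, until a single answer remains. The sketch below is a simplified illustration under that assumption. `tree_summarize`, `fan_in`, and `toy_summarize` are hypothetical names, and the "summarizer" just brackets its inputs instead of calling an LLM with the query.

```python
from typing import Callable


def tree_summarize(
    chunks: list[str],
    summarize: Callable[[list[str]], str],
    fan_in: int = 2,
) -> str:
    """Summarize groups of fan_in texts, level by level, until one text remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    level = list(chunks)
    while True:
        # Build the next level of the tree from groups of the current one.
        level = [summarize(level[i : i + fan_in]) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]


def toy_summarize(texts: list[str]) -> str:
    """Hypothetical stand-in for an LLM call: joins and brackets its inputs."""
    return "(" + " + ".join(texts) + ")"


print(tree_summarize(["a", "b", "c"], toy_summarize))  # ((a + b) + (c))
```

The nesting of the brackets in the printed result mirrors the shape of the tree: each pair of parentheses corresponds to one summarization call.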

## Evaluation and observability

LlamaIndex provides **built-in evaluation tools to assess response quality.**
These evaluators leverage LLMs to analyze responses across different dimensions.
We can now check whether the response is faithful to the retrieved persona, that is, whether it is supported by the source text rather than made up.

In [14]:
from llama_index.core.evaluation import FaithfulnessEvaluator

# check whether the response is supported by its retrieved source nodes
evaluator = FaithfulnessEvaluator(llm=llm)
eval_result = evaluator.evaluate_response(response=response)
eval_result.passing

True
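The pattern behind an LLM-based faithfulness check can be sketched as follows: show a judge the retrieved context and the generated statement, ask for a YES or NO verdict, and pass if some context supports the statement. This is a simplified illustration, not the `FaithfulnessEvaluator` implementation. `JUDGE_PROMPT`, `is_faithful`, and `toy_judge` are hypothetical, and `toy_judge` replaces the LLM with a crude word-overlap test so the example runs offline.

```python
from typing import Callable

JUDGE_PROMPT = (
    "Context:\n{context}\n\n"
    "Statement:\n{statement}\n\n"
    "Is the statement supported by the context? Answer YES or NO."
)


def is_faithful(statement: str, contexts: list[str], judge: Callable[[str], str]) -> bool:
    """Pass if the judge answers YES for at least one context chunk."""
    for context in contexts:
        answer = judge(JUDGE_PROMPT.format(context=context, statement=statement))
        if answer.strip().upper().startswith("YES"):
            return True
    return False


def toy_judge(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: YES if every statement word occurs in the context."""
    context_part, rest = prompt.split("\n\nStatement:\n", 1)
    statement_part = rest.split("\n\nIs the statement", 1)[0]
    context_words = set(context_part.lower().split())
    statement_words = set(statement_part.lower().split())
    return "YES" if statement_words <= context_words else "NO"


contexts = ["An anthropologist in Cyprus studying local festivals."]
print(is_faithful("anthropologist in Cyprus", contexts, toy_judge))
print(is_faithful("engineer in Cyprus", contexts, toy_judge))
```

Because the verdict comes from a model, a pass or fail is itself a judgment that can be wrong, which is one reason to also inspect traces.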

If one of these LLM-based evaluators does not give enough context, we can inspect the response with the Arize Phoenix tool, after creating an account at [LlamaTrace](https://llamatrace.com/login) and generating an API key. The cell below loads environment variables from a local `.env` file before registering the handler.

In [15]:
import llama_index
import os
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()


llama_index.core.set_global_handler(
    "arize_phoenix", 
    endpoint="https://llamatrace.com/v1/traces"
)


Now, we can query the index and see the response in the Arize Phoenix tool.

In [16]:
response = query_engine.query(
    "What is the name of someone that is interested in AI and technology?"
)
response

Response(response="The provided context does not mention anyone specifically who is interested in AI and technology. The description refers to an anthropologist or cultural expert focused on Cypriot culture, history, and society. There is no information given about any individual's interest in AI and technology.", source_nodes=[NodeWithScore(node=TextNode(id_='372496be-c985-453a-b534-bbbdbfdd02fa', embedding=None, metadata={'file_path': '/home/daimler/workspaces/agents-course-huggingface/2.2-llamaindex/data/persona_1.txt', 'file_name': 'persona_1.txt', 'file_type': 'text/plain', 'file_size': 266, 'creation_date': '2025-03-11', 'last_modified_date': '2025-03-11'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_

We can then go to [LlamaTrace](https://llamatrace.com/login) and explore the process and response.

![arize-phoenix](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/arize.png)    