# Components in LlamaIndex

This notebook is part of the [Hugging Face Agents Course](https://www.hf.co/learn/agents-course), a free course that takes you from beginner to expert in building agents.

![Agents course share](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png)

Alfred is hosting a party and needs to find relevant information about the personas who will attend. We will therefore use a `QueryEngine` to index and search a database of personas.

## Let's install the dependencies

We will install the dependencies for this unit.

In [19]:
%pip install -U \
    llama-index \
    datasets \
    llama-index-callbacks-arize-phoenix \
    llama-index-vector-stores-chroma \
    llama-index-embeddings-ollama \
    llama-index-llms-ollama
# Alternative: use the hosted Hugging Face integrations instead of Ollama
# %pip install -U llama-index-llms-huggingface-api llama-index-embeddings-huggingface-api

Note: you may need to restart the kernel to use updated packages.


## Create a `QueryEngine` for retrieval augmented generation

### Setting up the persona database

We will be using personas from the [dvilasuero/finepersonas-v0.1-tiny dataset](https://huggingface.co/datasets/dvilasuero/finepersonas-v0.1-tiny). This dataset contains 5K personas that will be attending the party!

Let's load the dataset and store each persona as a text file in the `data` directory.

In [2]:
from datasets import load_dataset
from pathlib import Path

dataset = load_dataset(path="dvilasuero/finepersonas-v0.1-tiny", split="train")

Path("data").mkdir(parents=True, exist_ok=True)
for i, persona in enumerate(dataset):
    with open(Path("data") / f"persona_{i}.txt", "w") as f:
        f.write(persona["persona"])

Awesome! Now that we have a local directory with all the personas who will attend the party, we can load and index them.

### Loading and embedding persona documents

We will use the `SimpleDirectoryReader` to load the persona descriptions from the `data` directory. This will return a list of `Document` objects.

In [3]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="data")
documents = reader.load_data()
len(documents)

5000

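Now that we have a list of `Document` objects, we can use the `IngestionPipeline` to create nodes from the documents and prepare them for the `QueryEngine`. We will use the `SentenceSplitter` to split the documents into smaller chunks and `OllamaEmbedding` to embed the chunks locally with the `bge-small-en-v1.5` model (the commented-out `HuggingFaceInferenceAPIEmbedding` lines show the hosted alternative). First, we pull the embedding model with Ollama.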

In [4]:
!ollama pull qllama/bge-small-en-v1.5:f16

pulling manifest
pulling 6dbfaeaa2837... 100% ▕████████████████▏  67 MB
pulling ac02c0b5e300... 100% ▕████████████████▏  262 B
verifying sha256 digest
writing manifest
success


In [5]:
# from llama_index.embeddings.huggingface_api import HuggingFaceInferenceAPIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.embeddings.ollama import OllamaEmbedding

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        # HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5"),
        OllamaEmbedding(model_name="qllama/bge-small-en-v1.5:f16")
    ]
)

# run the pipeline asynchronously on the first 10 documents (pipeline.run is the sync variant)
nodes = await pipeline.arun(documents=documents[:10])
nodes

[TextNode(id_='0f4dcdcd-8bef-4833-8511-bfe055e25466', embedding=[-0.11368885636329651, 0.11777554452419281, 0.31672120094299316, -0.011371240019798279, 0.061826758086681366, -0.2176019549369812, 0.0729525089263916, 0.15565034747123718, -0.8124821186065674, -0.867414653301239, -0.15169256925582886, -0.4183608889579773, -0.4135409891605377, 0.31847918033599854, -0.0855540782213211, 0.028311938047409058, -0.04249300807714462, 0.8410949110984802, 0.16153240203857422, -0.22631394863128662, -0.2809866666793823, -0.43955379724502563, 0.5681192278862, -0.006019406020641327, -0.21419525146484375, 0.17594265937805176, 0.46561259031295776, -0.15278299152851105, 0.02525971829891205, -1.091186761856079, -0.32091477513313293, 0.08397220075130463, 0.05588915944099426, 0.11885492503643036, 0.4839937090873718, 0.4431559443473816, -0.12187705934047699, 0.7001597881317139, 0.20205941796302795, -0.15109406411647797, 0.010180503129959106, 0.053661853075027466, -0.2615140676498413, 0.5227049589157104, 0.614

As you can see, we have created a list of `Node` objects, which are chunks of text from the original documents together with their embeddings. Let's explore how we can add these nodes to a vector store.
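To make the idea of a node as a chunk of text more concrete, here is a minimal, library-free sketch of sentence-based chunking. It is not how `SentenceSplitter` is implemented (the real splitter works with token counts and chunk overlap); the `chunk_sentences` helper, its `max_chars` parameter, and the sample persona text are made up for illustration.

```python
import re


def chunk_sentences(text: str, max_chars: int = 100) -> list[str]:
    """Greedily pack whole sentences into chunks, targeting max_chars characters.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    # Naive sentence boundary: split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Hypothetical persona text, not taken from the dataset.
persona = (
    "An anthropologist interested in Cypriot culture. "
    "They study history and society. "
    "They travel across the island for field research."
)
chunks = chunk_sentences(persona, max_chars=100)
for chunk in chunks:
    print(len(chunk), chunk)
```

Each chunk would then be embedded separately, which is why one long document can yield several nodes.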

### Storing and indexing documents

Since we are using an ingestion pipeline, we can directly attach a vector store to the pipeline to populate it.
In this case, we will use `Chroma` to store our documents.
Let's run the pipeline again with the vector store attached.
The `IngestionPipeline` caches the operations so this should be fast!

In [6]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection(name="alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        # HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5"),
        OllamaEmbedding(model_name="qllama/bge-small-en-v1.5:f16"),
    ],
    vector_store=vector_store,
)

nodes = await pipeline.arun(documents=documents[:10])
len(nodes)

10
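The caching mentioned above can be pictured with a small sketch. This is a rough mental model only, assuming a cache keyed on a hash of the input text plus the transformation name; LlamaIndex's actual cache keys and storage may differ. `fake_embed` and `cached_transform` are hypothetical stand-ins, not library functions.

```python
import hashlib

cache: dict[str, list[float]] = {}
calls = {"count": 0}


def fake_embed(text: str) -> list[float]:
    """Stand-in for an expensive embedding call (hypothetical, not a real model)."""
    calls["count"] += 1
    return [float(len(text)), float(text.count(" "))]


def cached_transform(text: str, transform_name: str = "fake_embed") -> list[float]:
    """Return the cached result when the same text and transformation were seen before."""
    key = hashlib.sha256(f"{transform_name}:{text}".encode("utf-8")).hexdigest()
    if key not in cache:
        # Cache miss: do the expensive work once and remember the result.
        cache[key] = fake_embed(text)
    return cache[key]


first = cached_transform("A persona who loves travel")
second = cached_transform("A persona who loves travel")  # served from the cache
print(calls["count"])
```

The expensive function runs only once for identical input, which is the effect that makes a repeated ingestion run cheap.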

We can create a `VectorStoreIndex` from the vector store and use it to query the documents by passing the vector store and embedding model to the `from_vector_store()` method.

In [7]:
from llama_index.core import VectorStoreIndex
# from llama_index.embeddings.huggingface_api import HuggingFaceInferenceAPIEmbedding

# embed_model = HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5")
embed_model = OllamaEmbedding(model_name="qllama/bge-small-en-v1.5:f16")
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model
)

We don't need to worry about persisting the index to disk: the Chroma `PersistentClient` automatically saves the collection under the directory path we passed (`./alfred_chroma_db`).
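Conceptually, a vector index answers a query by embedding it and returning the stored nodes whose vectors are most similar. The sketch below shows that idea with brute-force cosine similarity over a toy in-memory store. It is an illustration under simplifying assumptions, not how Chroma searches internally; the three-dimensional vectors, the node names, and the `top_k` helper are invented for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), node_id) for node_id, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [node_id for _, node_id in scored[:k]]


# Toy embeddings; real ones come from the embedding model.
toy_store = {
    "persona_travel": [0.9, 0.1, 0.0],
    "persona_ai": [0.1, 0.9, 0.2],
    "persona_cooking": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]
results = top_k(query_vec, toy_store, k=2)
print(results)
```

This is also why the same embedding model must be used for indexing and for querying: similarity is only meaningful between vectors from the same model.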

### Querying the index

Now that we have our index, we can use it to query the documents.
Let's create a `QueryEngine` from the index and use it to query the documents using a specific response mode.


In [8]:
# from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.llms.ollama import Ollama 
import nest_asyncio

nest_asyncio.apply()  # Allow nested event loops so the query engine can run inside the notebook
# llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
llm = Ollama(model="myaniu/qwen2.5-1m:7b")
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
    embed_model=OllamaEmbedding(model_name="qllama/bge-small-en-v1.5:f16")
)
response = query_engine.query(
    "Respond using a persona that describes author and travel experiences?"
)
response

Response(response="Hello! I have spent many years as an anthropologist in Cyprus, delving into its rich cultural tapestry. My travels across this island have allowed me to immerse myself in the lives of Cypriots, understanding their customs and ways deeply. Through my research and personal experiences, I have witnessed firsthand how history shapes society and culture here. If you're interested in learning more about Cyprus's unique heritage, feel free to ask!", source_nodes=[NodeWithScore(node=TextNode(id_='372496be-c985-453a-b534-bbbdbfdd02fa', embedding=None, metadata={'file_path': '/home/daimler/workspaces/agents-course-huggingface/2.2-llamaindex/data/persona_1.txt', 'file_name': 'persona_1.txt', 'file_type': 'text/plain', 'file_size': 266, 'creation_date': '2025-03-11', 'last_modified_date': '2025-03-11'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_
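The `tree_summarize` response mode used above can be pictured as a recursive reduction: groups of chunks are summarized, the summaries are summarized again, and so on until one answer remains. The sketch below is a simplified illustration, assuming a fixed group size (`fan_in`) where the real mode packs chunks according to the prompt's size budget; `stub_summarize` is a hypothetical stand-in for the LLM call.

```python
from typing import Callable


def tree_summarize(
    chunks: list[str],
    summarize: Callable[[list[str]], str],
    fan_in: int = 2,
) -> str:
    """Repeatedly summarize groups of `fan_in` chunks until one answer remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    level = list(chunks)
    while True:
        # Summarize each group of neighbouring chunks into a single string.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]


def stub_summarize(group: list[str]) -> str:
    """Stand-in for an LLM call: just shows which inputs were combined."""
    return "(" + " + ".join(group) + ")"


answer = tree_summarize(["a", "b", "c"], stub_summarize)
print(answer)
```

The nesting in the printed result mirrors the tree: leaves are the retrieved chunks and the root is the final response.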

## Evaluation and observability

LlamaIndex provides **built-in evaluation tools to assess response quality.**
These evaluators leverage LLMs to analyze responses across different dimensions.
We can now check whether the response is faithful to the retrieved persona context.

In [9]:
from llama_index.core.evaluation import FaithfulnessEvaluator

# check whether the response is supported by its source nodes
evaluator = FaithfulnessEvaluator(llm=llm)
eval_result = evaluator.evaluate_response(response=response)
eval_result.passing

True

If these LLM-based evaluators do not give enough insight, we can inspect the full trace of a query with Arize Phoenix, after creating an account at [LlamaTrace](https://llamatrace.com/login) and generating an API key.

In [20]:
%pip install -q arize-phoenix


Note: you may need to restart the kernel to use updated packages.


In [21]:


import llama_index.core
from dotenv import load_dotenv

# Load environment variables (such as the LlamaTrace API key settings) from a .env file
load_dotenv()

# Send traces of LlamaIndex calls to Arize Phoenix (LlamaTrace)
llama_index.core.set_global_handler(
    "arize_phoenix",
    endpoint="https://llamatrace.com/v1/traces",
)
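The cell above relies on the `.env` file to supply the API key, but does not show how the key reaches the exporter. A minimal sketch of one way to wire it up follows. It assumes the key is stored in a variable named `PHOENIX_API_KEY` (a hypothetical name) and that the trace endpoint reads it from the `OTEL_EXPORTER_OTLP_HEADERS` environment variable in the form `api_key=<key>`; check the Phoenix documentation for the exact header expected by your account.

```python
import os
from typing import MutableMapping


def configure_phoenix_auth(env: MutableMapping[str, str] = os.environ) -> bool:
    """Set the OTLP auth header from PHOENIX_API_KEY; return False if the key is missing."""
    api_key = env.get("PHOENIX_API_KEY")  # assumed variable name
    if not api_key:
        return False
    # Assumed header format: api_key=<key>
    env["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={api_key}"
    return True


# Demonstrate on a plain dict so the real environment is left untouched.
demo_env = {"PHOENIX_API_KEY": "<PHOENIX_API_KEY>"}
configured = configure_phoenix_auth(demo_env)
print(configured, demo_env["OTEL_EXPORTER_OTLP_HEADERS"])
```

In a notebook, this would be called after `load_dotenv()` and before `set_global_handler`, so the header is in place when the handler is created.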


Now, we can query the index and see the response in the Arize Phoenix tool.

In [23]:
response = query_engine.query(
    "What is the name of someone who is interested in AI and technology?"
)
response

Response(response='The given context does not provide any information about someone interested in AI and technology. It only describes a cultural expert focused on Cypriot culture, history, and society. Therefore, there is no name available to answer the query based on the provided information.', source_nodes=[NodeWithScore(node=TextNode(id_='812a3c97-9036-42c6-9e4d-98c84eb113d3', embedding=None, metadata={'file_path': '/home/daimler/workspaces/agents-course-huggingface/2.2-llamaindex/data/persona_1.txt', 'file_name': 'persona_1.txt', 'file_type': 'text/plain', 'file_size': 266, 'creation_date': '2025-03-13', 'last_modified_date': '2025-03-13'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='19c80b98-6507-

We can then go to [LlamaTrace](https://llamatrace.com/login) and explore the trace of the query and its response.