In [1]:
!pip install llama-index datasets llama-index-callbacks-arize-phoenix arize-phoenix arize-otel python-dotenv llama-index-vector-stores-chroma llama-index-llms-huggingface-api llama-index-embeddings-huggingface -U -q

Next, let's log in to Hugging Face so we can use the serverless Inference API.

In [12]:

from huggingface_hub import login

# Login to HF (make sure this completes successfully)
login()




## Create a `QueryEngine` for retrieval augmented generation

### Setting up the persona database

We will be using personas from the [dvilasuero/finepersonas-v0.1-tiny dataset](https://huggingface.co/datasets/dvilasuero/finepersonas-v0.1-tiny). This dataset contains 5K personas that will be attending the party!

Let's load the dataset and store it as files in the `data` directory.
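The loading step itself is not shown in this notebook. A minimal sketch of what it could look like, assuming each record exposes its text under a `persona` field (an assumption about the dataset schema) and using a hypothetical helper named `write_personas`:

```python
from pathlib import Path
from typing import Iterable, Mapping


def write_personas(records: Iterable[Mapping[str, str]], out_dir: str = "data") -> int:
    """Write one text file per record and return how many files were written."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    count = 0
    for i, record in enumerate(records):
        # "persona" is the assumed name of the text field in each record.
        (target / f"persona_{i}.txt").write_text(record["persona"], encoding="utf-8")
        count += 1
    return count


# Stand-in records so the sketch runs offline. With the `datasets` library you
# could pass load_dataset("dvilasuero/finepersonas-v0.1-tiny", split="train").
sample = [
    {"persona": "A librarian who curates local history archives."},
    {"persona": "A data scientist focused on natural language processing."},
]
written = write_personas(sample, out_dir="data_example")
print(written)
```

Writing one file per persona keeps each description as a separate document when the directory is read back later.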

Awesome! Now that we have a local directory with all the personas that will be attending the party, we can load and index them.

### Loading and embedding persona documents

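We will use the `SimpleDirectoryReader` to load the persona descriptions from the `data` directory; in this notebook they live in `../data/pdfs` (read recursively) and `../data/csv`. Each call to `load_data()` returns a list of `Document` objects.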

In [2]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="../data/pdfs", recursive=True)
documents = reader.load_data()
reader = SimpleDirectoryReader(input_dir="../data/csv")
csv_documents = reader.load_data()
print(len(documents))
print(len(csv_documents))




54
1


Now that we have a list of `Document` objects, we can use the `IngestionPipeline` to create nodes from the documents and prepare them for the `QueryEngine`. We will use the `SentenceSplitter` to split the documents into smaller chunks and the `HuggingFaceEmbedding` to embed the chunks.

In [4]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

# Merge PDF documents and CSV documents
all_documents = documents + csv_documents
print(f"Total documents to process: {len(all_documents)}")

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ]
)

# run the pipeline sync or async with ALL documents (PDFs + CSV)
nodes = await pipeline.arun(documents=all_documents)
nodes


2025-11-11 16:45:07,277 - INFO - Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5


Total documents to process: 55


2025-11-11 16:45:10,350 - INFO - 1 prompt is loaded, with the key: query


[TextNode(id_='ff6c6bc1-2197-4f7a-8428-9a8118dafe47', embedding=[-0.033207911998033524, -0.05875500291585922, -0.026154305785894394, -0.004648653790354729, 0.010754942893981934, -0.04465577378869057, 0.062280815094709396, 0.036211732774972916, -0.018917420879006386, 0.009323179721832275, 0.024359703063964844, -0.01600527949631214, 0.055051546543836594, 0.0141454441472888, 0.04092458263039589, -0.0015298736980184913, 0.011020015925168991, -0.05334288626909256, 0.0005489602335728705, -0.022573716938495636, 0.02484746277332306, -0.0598822757601738, 0.02829374559223652, -0.04114038869738579, -0.011526383459568024, 0.05382836237549782, 0.0034343567676842213, -0.11891495436429977, -0.050054024904966354, -0.16778776049613953, 0.022245435044169426, -0.007317075505852699, 0.09168977290391922, 0.02738754265010357, -0.01886448636651039, 0.03932519257068634, -0.0002847245486918837, -0.011868719942867756, -0.014978978782892227, -0.02347179688513279, -0.0750996544957161, -0.07003649324178696, 0.0103

As you can see, we have created a list of `Node` objects, which are just chunks of text from the original documents, each with its embedding attached. Let's explore how we can add these nodes to a vector store.
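To make the idea of "chunks of text" concrete, here is a simplified illustration of sentence-based chunking. It is a sketch only, not the algorithm `SentenceSplitter` uses: the helper `split_into_chunks` and its character budget `max_chars` are hypothetical, and the real splitter works with its own size and overlap settings.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars becomes its own (oversize) chunk.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = (
    "Alfred is hosting a party. Many guests will attend. "
    "Each guest has a short persona description. The descriptions are indexed for search."
)
chunks = split_into_chunks(text, max_chars=60)
for chunk in chunks:
    print(repr(chunk))
```

Splitting on sentence boundaries rather than at fixed offsets keeps each chunk readable on its own, which matters because each chunk is embedded and retrieved independently.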

### Storing and indexing documents

Since we are using an ingestion pipeline, we can directly attach a vector store to the pipeline to populate it.
In this case, we will use `Chroma` to store our documents.
Let's run the pipeline again with the vector store attached. 
The `IngestionPipeline` caches the operations so this should be fast!

In [5]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection(name="alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ],
    vector_store=vector_store,
)

# Use the merged documents (PDFs + CSV)
nodes = await pipeline.arun(documents=all_documents)
len(nodes)


2025-11-11 16:45:50,858 - INFO - Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
2025-11-11 16:45:51,035 - INFO - Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5
2025-11-11 16:45:51,035 - INFO - Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5
2025-11-11 16:45:54,478 - INFO - 1 prompt is loaded, with the key: query
2025-11-11 16:45:54,478 - INFO - 1 prompt is loaded, with the key: query


58

We can create a `VectorStoreIndex` from the vector store and use it to query the documents by passing the vector store and embedding model to the `from_vector_store()` method.

In [6]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model
)

2025-11-11 16:46:15,851 - INFO - Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5


2025-11-11 16:46:17,719 - INFO - 1 prompt is loaded, with the key: query


We don't need to worry about persisting the index to disk: the nodes and embeddings live in the Chroma collection behind the `ChromaVectorStore`, and the `PersistentClient` saves that collection under the directory path we passed (`./alfred_chroma_db`).

### Querying the index

Now that we have our index, we can use it to query the documents.
Let's create a `QueryEngine` from the index and use it to query the documents using a specific response mode.
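Conceptually, a query engine built on a vector index first embeds the query and fetches the most similar nodes, then hands them to the LLM. The retrieval half can be sketched with toy vectors as below. This is an illustration under stated assumptions: the helpers `cosine_similarity` and `top_k` are hypothetical, the three-dimensional vectors are made up, and the similarity search Chroma actually performs may use a different metric and index structure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], store: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k (node_id, score) pairs most similar to the query vector."""
    scored = [(node_id, cosine_similarity(query_vec, vec)) for node_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real ones come from the embedding model.
store = {
    "node-nlp": [0.9, 0.1, 0.0],
    "node-vision": [0.1, 0.9, 0.0],
    "node-systems": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.0]
results = top_k(query_vec, store, k=2)
print([node_id for node_id, _ in results])  # prints ['node-nlp', 'node-vision']
```

The query and the documents must be embedded with the same model for these scores to be meaningful, which is why the same `embed_model` is passed when building the index.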


In [7]:
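from huggingface_hub import get_token
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
import nest_asyncio

nest_asyncio.apply()  # This is needed to run the query engine

# Reuse the token saved by login() above (or HF_TOKEN from the environment)
hf_token = get_token()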

llm = HuggingFaceInferenceAPI(
    model_name="Qwen/Qwen2.5-Coder-32B-Instruct",
    temperature=0.7,
    max_tokens=100,
    token=hf_token,
    provider="auto"
)

query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
response = query_engine.query(
    "Find a faculty member who has interests in machine learning and data science, specifically in deep learning and natural language processing."
)
response

Response(response='Jie Cao is a faculty member with interests in machine learning, natural language processing, dialogue and discourse, structured prediction, and trustworthy AI for education and healthcare, which aligns with the specified interests in deep learning and natural language processing.', source_nodes=[NodeWithScore(node=TextNode(id_='dade4e38-9845-434e-b133-21fee244611e', embedding=None, metadata={'file_path': '/Users/armanrad/Documents/Projects/Repos/radmanesh/dsai-recommender/notebooks/../data/csv/DSAI-Faculties.csv', 'file_name': 'DSAI-Faculties.csv', 'file_type': 'text/csv', 'file_size': 14394, 'creation_date': '2025-11-11', 'last_modified_date': '2025-11-11'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: Relate
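The `tree_summarize` response mode used above combines the retrieved chunks hierarchically: groups of chunks are summarized with respect to the query, and those partial answers are summarized again until a single answer remains. A rough sketch of that control flow follows. It is an assumption-laden illustration: the function below, its `fan_out` parameter, and the `fake_llm` stand-in are hypothetical and do not reproduce LlamaIndex's prompts or packing logic.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str],
    query: str,
    summarize: Callable[[str, List[str]], str],
    fan_out: int = 2,
) -> str:
    """Repeatedly merge groups of fan_out texts until one answer remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    level = chunks
    while True:
        # One summarize call per group; the results form the next tree level.
        level = [
            summarize(query, level[i : i + fan_out])
            for i in range(0, len(level), fan_out)
        ]
        if len(level) == 1:
            return level[0]


calls: List[int] = []


def fake_llm(query: str, texts: List[str]) -> str:
    # Stand-in for an LLM call: records the group size and joins the texts.
    calls.append(len(texts))
    return " + ".join(texts)


answer = tree_summarize(["chunk A", "chunk B", "chunk C"], "Who fits?", fake_llm)
print(answer)  # prints chunk A + chunk B + chunk C
print(calls)   # prints [2, 1, 2]
```

The trade-off is more LLM calls in exchange for never having to fit all retrieved text into a single prompt.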

In [18]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
import nest_asyncio

nest_asyncio.apply()  # This is needed to run the query engine

llm = HuggingFaceInferenceAPI(
    model_name="Qwen/Qwen2.5-Coder-32B-Instruct",
    temperature=0.7,
    max_tokens=100,
    token=hf_token,
    provider="auto"
)

# Load the sample proposal PDF
proposal_reader = SimpleDirectoryReader(input_files=["../data/sample_proposal.pdf"])
proposal_docs = proposal_reader.load_data()
proposal_text = "\n".join([doc.text for doc in proposal_docs])

# Create query with the proposal content
query_text = f"""Based on the following research proposal, find the most suitable faculty members:

{proposal_text}

Find faculty members whose research interests and expertise align with this proposal."""

query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
response = query_engine.query(query_text)
response

2025-11-11 17:18:13,654 - ERROR - Failed to export span batch code: 401, reason: 

Response(response='Faculty members whose research interests and expertise align with the proposal on "Autonomous Coding Agents for Software Development" include Prof. Jie Cao and Prof. Charles Nicholson from the University of Oklahoma.\n\nTheir work in areas such as multi-agent systems, large language models, and reinforcement learning matches the objectives of developing a collaborative reasoning architecture and dynamic role assignment protocols for coding agents. Additionally, their involvement in AI-driven systems and methodologies supports the integration of retrieval-augmented generation and the evaluation processes outlined in the proposal.', source_nodes=[NodeWithScore(node=TextNode(id_='a8d9f145-042e-406e-baac-9ee51ab7803b', embedding=None, metadata={'page_label': '4', 'file_name': 'Jie Cao | Dialogue, NLP, ML.pdf', 'file_path': '/Users/armanrad/Documents/Projects/Repos/radmanesh/dsai-recommender/notebooks/../data/pdfs/Jie Cao | Dialogue, NLP, ML.pdf', 'file_type': 'applicatio

## Evaluation and observability

LlamaIndex provides **built-in evaluation tools to assess response quality.**
These evaluators leverage LLMs to analyze responses across different dimensions.
We can now check whether the response is faithful to the retrieved context, that is, whether the source nodes actually support the answer.

In [8]:
from llama_index.core.evaluation import FaithfulnessEvaluator

# evaluate the last response against its retrieved source nodes
evaluator = FaithfulnessEvaluator(llm=llm)
eval_result = evaluator.evaluate_response(response=response)
eval_result.passing

True
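The idea behind an LLM-based faithfulness check can be sketched as follows: ask a judge model whether the retrieved context supports the response and turn its answer into a pass/fail flag. This is a simplified illustration, not the `FaithfulnessEvaluator` implementation. The function `is_faithful`, the prompt wording, and `stub_judge` (a string-matching stand-in for a real LLM call) are all hypothetical.

```python
from typing import Callable, List


def is_faithful(response: str, contexts: List[str], judge: Callable[[str], str]) -> bool:
    """Pass if the judge says at least one context supports the response."""
    for context in contexts:
        prompt = (
            "Is the statement supported by the context? Answer YES or NO.\n"
            f"Statement: {response}\nContext: {context}"
        )
        if judge(prompt).strip().upper().startswith("YES"):
            return True
    return False


def stub_judge(prompt: str) -> str:
    # Stand-in for an LLM: answers YES only if the statement appears in the context.
    statement = prompt.split("Statement: ", 1)[1].split("\nContext: ", 1)[0]
    context = prompt.split("\nContext: ", 1)[1]
    return "YES" if statement.lower() in context.lower() else "NO"


contexts = [
    "Profile: the guest studies marine biology in Lisbon.",
    "Profile: the guest teaches piano.",
]
print(is_faithful("The guest studies marine biology", contexts, stub_judge))  # prints True
print(is_faithful("The guest is an astronaut", contexts, stub_judge))         # prints False
```

A passing result only says the answer is grounded in what was retrieved; it does not say the retrieved nodes were the right ones for the question.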

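If one of these LLM-based evaluators does not give enough context, we can inspect the full trace of the response using the Arize Phoenix tool, after creating an account at [LlamaTrace](https://llamatrace.com/login) and generating an API key. The cell below reads its credentials from a `.env` file as `ARIZE_SPACE_ID` and `ARIZE_API_KEY`.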

In [16]:
import os
import dotenv
from arize.otel import register
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

# Load environment variables from .env file
dotenv.load_dotenv()

# Setup OTel via Arize's convenience function
tracer_provider = register(
    space_id=os.getenv("ARIZE_SPACE_ID"),
    api_key=os.getenv("ARIZE_API_KEY"),
    project_name="my-llamaindex-app" # Choose a project name
)

# Instrument LlamaIndex
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)



🔭 OpenTelemetry Tracing Details 🔭
|  Arize Project: my-llamaindex-app
|  Span Processor: BatchSpanProcessor
|  Collector Endpoint: otlp.arize.com
|  Transport: gRPC
|  Transport Headers: {'authorization': '****', 'api_key': '****', 'arize-space-id': '****', 'space_id': '****', 'arize-interface': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.
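Because `os.getenv` returns `None` for an unset variable, `register` can be called with missing credentials without any immediate complaint. A small pre-flight check, sketched here with a hypothetical helper `missing_credentials`, makes that visible before any spans are exported. The variable names are the ones read in the cell above; the example values are made up.

```python
import os
from typing import Iterable, List, Mapping


def missing_credentials(
    names: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names whose value is unset or blank in env."""
    return [name for name in names if not env.get(name, "").strip()]


required = ["ARIZE_SPACE_ID", "ARIZE_API_KEY"]

# Demonstration with an explicit mapping instead of the real environment.
example_env = {"ARIZE_SPACE_ID": "space-123", "ARIZE_API_KEY": "  "}
print(missing_credentials(required, example_env))  # prints ['ARIZE_API_KEY']
```

In the notebook you would call `missing_credentials(required)` right after `dotenv.load_dotenv()` and stop if the returned list is not empty.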



Now, we can query the index and see the response in the Arize Phoenix tool.

In [17]:
response = query_engine.query(
    "What is the name of someone who is interested in AI and technology?"
)
response

2025-11-11 16:56:52,785 - ERROR - Failed to export span batch code: 401, reason: 

Response(response='Ta l a y e h  R a z z a g h i is someone who is interested in AI and technology, focusing on developing data-driven models with applications in healthcare, cyber-physical systems, and smart manufacturing.', source_nodes=[NodeWithScore(node=TextNode(id_='c1c8e975-1165-4676-ad57-0c66f8943245', embedding=None, metadata={'page_label': '3', 'file_name': 'Dimitris Diochnos : home.pdf', 'file_path': '/Users/armanrad/Documents/Projects/Repos/radmanesh/dsai-recommender/notebooks/../data/pdfs/Dimitris Diochnos : home.pdf', 'file_type': 'application/pdf', 'file_size': 67249, 'creation_date': '2025-11-11', 'last_modified_date': '2025-11-11'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='9d4d0670-3

We can then go to [LlamaTrace](https://llamatrace.com/login) and explore the trace of the query and its response.

![arize-phoenix](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/llama-index/arize.png)    