# RAG Agent using Llama-Index

[llama-index](https://developers.llamaindex.ai/) was originally created as a framework for RAG: indexing text so that an LLM can retrieve it at query time. Despite the name, it is not tied to [Meta's Llama models](https://www.llama.com/) and works with many LLM providers. It has since expanded considerably, into agents and more.

Let's use it to create a search agent that can also produce visualizations and charts.

In [1]:
# %pip install -qU openai
# %pip install -qU llama-index
# %pip install -qU pydantic
# %pip install -qU sentence-transformers
# %pip install -qU llama-index-llms-azure-openai
# %pip install -qU llama-index-embeddings-azure-openai
# %pip install -qU plotly
# %pip install -qU datasets python-dotenv matplotlib

In [24]:
import json
import os
import re
import warnings
from typing import Optional

import matplotlib.pyplot as plt
import plotly.express as px
from datasets import load_dataset
from dotenv import load_dotenv

# load from llama_index its VectorStoreIndex, and Document representation
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI

warnings.filterwarnings("ignore")

### Configure Azure OpenAI

In [3]:
# Load environment variables from .env file
load_dotenv()

llm_model_name = "gpt-5-mini"
llm_deploy_name = "gpt-5-mini"

aoai_api_key = os.getenv("OPENAI_API_KEY")
aoai_endpoint = "https://aa-dsa-training-msca.openai.azure.com/"
aoai_api_version = "2024-12-01-preview"

llm = AzureOpenAI(
    engine=llm_deploy_name,
    api_key=aoai_api_key,
    azure_endpoint=aoai_endpoint,
    api_version=aoai_api_version,
    temperature=1,
)

llm_embedding_model_name = "text-embedding-3-large"

Settings.llm = llm

Settings.embed_model = AzureOpenAIEmbedding(
    model=llm_embedding_model_name,
    deployment_name=llm_embedding_model_name,
    api_key=aoai_api_key,
    azure_endpoint=aoai_endpoint,
    api_version=aoai_api_version,
)
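Note that `os.getenv` returns `None` when a variable is unset, so a missing key would only surface later as an authentication error. The following is a minimal sketch of a fail-fast check; `require_env` and `DEMO_API_KEY` are hypothetical names used for illustration and are not part of the notebook or of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value


# Demo only: set a placeholder so the example runs without a real key.
os.environ.setdefault("DEMO_API_KEY", "placeholder-value")
print(require_env("DEMO_API_KEY"))
```

In the configuration cell, the same helper could wrap the lookup of `OPENAI_API_KEY` so that a missing `.env` entry is reported immediately.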

### Dataset

Let's spice it up. We'll use a different dataset this time:  
**Dataset: _Financial QA 10K_**  
**Source:** [Hugging Face](https://huggingface.co/datasets/virattt/financial-qa-10K)

In [4]:
financial_qa_ds = load_dataset("virattt/financial-qa-10K", split="train")

Each row contains:

* Question
* Answer
* Context - text paragraph that contains the information required to answer the question; core of our RAG system.
* Ticker - symbol of the company the context belongs to.
* Filing - filing identifier (which year the data came from).

In [5]:
financial_qa_ds[0]

{'question': 'What area did NVIDIA initially focus on before expanding to other computationally intensive fields?',
 'answer': 'NVIDIA initially focused on PC graphics.',
 'context': 'Since our original focus on PC graphics, we have expanded to several other large and important computationally intensive fields.',
 'ticker': 'NVDA',
 'filing': '2023_10K'}

To make it easy on our poor laptops, we'll create a subset of the dataset:

In [6]:
financial_qa_ds_small = financial_qa_ds.select(range(300))

Next, we will convert these contexts into [LlamaIndex Documents](https://developers.llamaindex.ai/python/framework/module_guides/loading/documents_and_nodes/).  
In llama-index, a **Document** is a generic abstraction for any piece of text: a PDF, API output, or text extracted from a search.

Documents in llama-index are rather smart. They can hold metadata, such as the creation or modification date, the source URL, or the author.  
They can also store relationships to other chunks, such as the ones that come before or after them.

In [7]:
documents = [Document(text=row["context"]) for row in financial_qa_ds_small]
documents[0].text

'Since our original focus on PC graphics, we have expanded to several other large and important computationally intensive fields.'
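The cell above stores only the text of each context. As a sketch of how the metadata mentioned earlier could be kept as well, the hypothetical helper below (`to_document_kwargs` is not part of llama-index) builds the `text` and `metadata` keyword arguments that `Document` accepts, using the `ticker` and `filing` columns of the dataset.

```python
def to_document_kwargs(row: dict) -> dict:
    """Build keyword arguments for a Document, keeping ticker and filing as metadata."""
    return {
        "text": row["context"],
        "metadata": {"ticker": row["ticker"], "filing": row["filing"]},
    }


# The first row of the dataset, as shown earlier in the notebook.
sample_row = {
    "question": "What area did NVIDIA initially focus on before expanding to other computationally intensive fields?",
    "answer": "NVIDIA initially focused on PC graphics.",
    "context": "Since our original focus on PC graphics, we have expanded to several other large and important computationally intensive fields.",
    "ticker": "NVDA",
    "filing": "2023_10K",
}

kwargs = to_document_kwargs(sample_row)
print(kwargs["metadata"])

# In the notebook this could become (assumption, not run here):
# documents = [Document(**to_document_kwargs(row)) for row in financial_qa_ds_small]
```

Keeping the ticker and filing on each Document would make it possible to tell apart contexts from different companies at retrieval time.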

## Build a Vector Index & Create a Query Engine

LlamaIndex splits documents into chunks, embeds each of them, and then stores the embeddings in a vector index.  
Then, the vector index is wrapped into a query engine.

In [8]:
index = VectorStoreIndex.from_documents(documents, show_progress=True)
query_engine = index.as_query_engine()

Parsing nodes:   0%|          | 0/300 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/300 [00:00<?, ?it/s]
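To make the retrieval step less of a black box, here is a toy sketch of what a vector index does conceptually: rank stored chunks by cosine similarity to the query vector and return the top matches. The three-dimensional vectors and chunk names are made up for illustration; real embeddings come from the embedding model and have far more dimensions, and this is not how `VectorStoreIndex` is implemented internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the ids of the k chunks most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in scored[:k]]


# Hand-made "embeddings" for three chunks (illustrative values only).
toy_index = {
    "chunk_graphics": [0.9, 0.1, 0.0],
    "chunk_turnover": [0.1, 0.9, 0.1],
    "chunk_tax": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
print(top_k(query, toy_index, k=2))
```

The query engine adds one more step on top of this: the retrieved chunks are passed to the LLM as context for writing the answer.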

## Test the simple RAG

In [9]:
def finance_agent(question):
    """A simple retrieval agent."""
    return query_engine.query(question).response

In [10]:
finance_agent("What risks does the company mention?")

'Risks mentioned include:\n\n- Adverse global and regional economic conditions (slow growth, recession) that reduce demand.\n- High unemployment, inflation, tighter credit and higher interest rates.\n- Currency fluctuations.\n- Changes in fiscal or monetary policy and financial market volatility.\n- Declines in consumer income or asset values that reduce consumer confidence and spending.\n- Financial stress or failure among suppliers, contract manufacturers, logistics providers, distributors, carriers and developers (including inability to obtain credit or insolvency).\n- Increased credit and collectibility risk on trade receivables and failure of derivative counterparties or other financial institutions.\n- Limitations on the company’s ability to issue new debt, reduced liquidity, and declines in fair value of financial instruments.\n- Political events, trade and international disputes, war, terrorism, natural disasters, public health issues, industrial accidents and other business in

In [13]:
row = financial_qa_ds_small[42]
true_answer = row["answer"]
question = row["question"]

In [14]:
predicted_answer = finance_agent(question)

In [18]:
print("Query:", question)
print("Golden Truth:", true_answer)
print("Model Prediction:", predicted_answer)

Query: What was the overall turnover rate at the company in fiscal year 2023?
Golden Truth: The overall turnover rate at the company in fiscal year 2023 was 5.3%.
Model Prediction: The overall turnover rate in fiscal year 2023 was 5.3%.
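Comparing answers by eye does not scale beyond a handful of rows. One rough way to automate the check, sketched below, is to verify that every number in the reference answer also appears in the prediction. `numbers_match` is a hypothetical helper, and the heuristic is deliberately crude: it ignores wording, does not handle thousands separators, and passes trivially when the reference contains no numbers.

```python
import re

# Integers or decimals, optionally followed by a percent sign.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?%?")


def numbers_match(truth: str, prediction: str) -> bool:
    """True when every number in the reference answer also appears in the prediction."""
    expected = set(NUMBER_PATTERN.findall(truth))
    found = set(NUMBER_PATTERN.findall(prediction))
    return expected <= found


truth = "The overall turnover rate at the company in fiscal year 2023 was 5.3%."
prediction = "The overall turnover rate in fiscal year 2023 was 5.3%."
print(numbers_match(truth, prediction))
```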


## Building an agent

Let's enrich our agent with some tools.

In [20]:
import plotly.express as px
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.tools import FunctionTool, QueryEngineTool

### 1. RAG Tool

In [21]:
financial_rag = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="financial_rag",
    description="Retrieve relevant financial context from the dataset.",
)

### 2. Visualization tool

The LLM will pass the tool a JSON object in this format:

`{`

`  "chart_type": "...",`

`  "labels": [...],`

`  "values": [...]`

`}`

The tool will then clean the numbers and render the matching Plotly chart.

In [25]:
def visualize_numbers(
    chart_type: str,
    labels: list,
    values: list,
    title: Optional[str] = None,
    x_label: Optional[str] = None,
    y_label: Optional[str] = None,
):
    """Plot numeric data with Plotly."""
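    clean_values = []
    for v in values:
        try:
            # strip percent signs and thousands separators before converting
            clean_values.append(float(str(v).replace("%", "").replace(",", "")))
        except ValueError as ex:
            # skip values that cannot be parsed as numbers
            print(f"Skipping non-numeric value {v!r}: {ex}")

    # fall back to generic labels when the counts no longer match
    if len(labels) != len(clean_values):
        labels = [f"Value {i + 1}" for i in range(len(clean_values))]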

    # choose chart
    if chart_type == "comparison":
        fig = px.bar(x=labels[:2], y=clean_values[:2], title=title)
    elif chart_type == "pie":
        fig = px.pie(names=labels, values=clean_values, title=title)
    elif chart_type == "line":
        fig = px.line(x=labels, y=clean_values, markers=True, title=title)
    else:
        fig = px.bar(x=labels, y=clean_values, title=title)

    fig.update_layout(xaxis_title=x_label, yaxis_title=y_label)

    fig.show()
    return "Visualization done."

In [26]:
visualize_tool = FunctionTool.from_defaults(
    name="visualize_numbers",
    fn=visualize_numbers,
    description="Visualize numeric values using Plotly.",
)
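The cleaning step inside `visualize_numbers` can be tried on its own, without opening a chart. The sketch below isolates that logic in a hypothetical helper, `clean_numeric`, which is not defined in the notebook; the sample inputs are made up.

```python
def clean_numeric(values):
    """Convert strings such as '19%' or '1,350' to floats, skipping unparseable entries."""
    cleaned = []
    for v in values:
        try:
            cleaned.append(float(str(v).replace("%", "").replace(",", "")))
        except ValueError:
            # not a number after stripping '%' and ',': drop it
            continue
    return cleaned


print(clean_numeric(["19%", "1,350", 81, "n/a"]))
```

Because unparseable entries are dropped, the number of values can end up shorter than the number of labels, which is why the tool falls back to generic labels in that case.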

## Multi-Tool Agent

In [27]:
agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[financial_rag, visualize_tool],
    llm=Settings.llm,
    verbose=True,
    system_prompt="""
You are a financial analysis agent with access to tools.

Your job is to:
- understand the user's question,
- retrieve relevant financial context when needed,
- extract useful numeric information,
- choose the most suitable visualization type (e.g. bar chart for categories, line chart for time series, pie chart for percentages),
- and decide whether visualization adds value.

You may use tools when they genuinely help answer the question.
If a tool is unnecessary, answer directly.

You decide autonomously whether to use:
- the retrieval tool (financial_rag),
- the visualization tool (visualize_numbers),
- or no tool at all.

Use your own reasoning to judge whether a visualization is helpful. 
If it would not help, do not visualize.

If you choose to visualize:
- produce a JSON object with "chart_type", "labels", "values", "title",
- you may optionally include "x_label" and "y_label" in the JSON object 
if you think they improve the clarity of the visualization.
- then call the visualization tool.

If visualization is not useful:
- produce no chart

After the JSON (whether you visualized or not),
give the final written answer.
""",
)

In [28]:
response = await agent.run(
    "How did the revenue from automotive regulatory credits change in 2023 compared to 2022?"
)

print(response)

The revenue from automotive regulatory credits rose by $14 million in 2023 versus 2022, an increase of about 1%.


In [29]:
response = await agent.run(
    "Show me how did the revenue from automotive regulatory credits change in 2023 compared to 2022?"
)

print(response)

Short answer: Automotive regulatory credits revenue rose by $14 million, an increase of about 1% in 2023 versus 2022 (per the financial dataset).

Interpretation: that is a very small change — essentially flat year‑over‑year — so regulatory credits remained a minor and stable contributor to overall revenue in 2023.

Would you like:
- a chart showing the two years side-by-side (I can retry the plot), or
- the regulatory‑credits amount as a percentage of total revenue for each year?


In [30]:
response = await agent.run(
    "What percentage of the NVIDIA global workforce was female at the end of fiscal year 2023?"
)

print(response)

19% — at the end of NVIDIA’s fiscal year 2023, 19% of the global workforce was female.

Would you like a breakdown by region, job level, or a trend over multiple years?


In [31]:
response = await agent.run(
    "Visualize the percentage of the NVIDIA global workforce that was female at the end of fiscal year 2023 to all workforce?"
)

print(response)

I created a pie-chart showing female vs. all other employees in NVIDIA’s global workforce at the end of fiscal year 2023.

Summary:
- Female: 19%
- All other employees: 81%

If you’d like, I can:
- provide the visual in a different file format (PNG, SVG),
- compare this to prior years or industry peers (if you want those, tell me which companies or years),
- break down females by role/region if data is available.


In [0]:
response = await agent.run(
    "Visualize the percentage of Black or African American and Hispanic or Latino NVIDIA employees in the workforce in the United States to all employees at the end of 2023?"
)

print(response)

Here's what I found and how it’s visualized:

- NVIDIA does not disclose separate percentages for Black or African American and Hispanic or Latino employees within the United States. They report a combined share for these two groups in the US.
- The combined share of Black or African American and Hispanic or Latino employees in the United States is 6% of NVIDIA’s US workforce.

Visualization:
- Type: Pie chart
- Segments:
  - Black or African American & Hispanic/Latino (US): 6%
  - Other US Employees: 94%

Interpretation:
- As of the end of 2023, 6% of NVIDIA’s US employees belong to either Black or African American or Hispanic or Latino categories when combined, with the remaining 94% in other demographic groups. Note that the data do not separate the two groups individually in NVIDIA’s disclosures. If you need more granular breakdown, we would need NVIDIA’s internal diversity data or a more detailed external report that specifies each group separately.



In [0]:
response = await agent.run(
    "What amount did NVIDIA record as an acquisition termination cost in fiscal year 2023?"
)

print(response)

NVIDIA recorded $1.35 billion as an acquisition termination cost in fiscal year 2023.



In [0]:
response = await agent.run(
    "What is the anticipated total capital investment range for fiscal year 2024 related to property and equipment?"
)

print(response)

Approximately $1.10 billion to $1.30 billion (for fiscal year 2024, related to property and equipment).



In [0]:
response = await agent.run(
    "Show me how did the valuation allowance of NVIDIA increase from 2022 to 2023"
)

print(response)

Here’s how the NVIDIA valuation allowance changed from 2022 to 2023, based on the year-end amounts:

- 2022 year-end valuation allowance: $0.907 billion
- 2023 year-end valuation allowance: $1.48 billion

Change: +$0.573 billion (roughly a 63% increase)

Context and interpretation:
- The increase reflects a larger valuation allowance recorded at year-end 2023 related to deferred tax assets (capital loss carryforwards and other state/foreign deferred tax assets) that management did not expect to realize.
- The change in the effective tax rate supports this: 2022 showed an 8% tax expense, while 2023 shows a 50% tax benefit largely due to the release of the valuation allowance on U.S. federal and certain state deferred tax assets. In other words, management’s outlook improved enough to release some of the allowance for realization, but the overall year-end balance still increased to $1.48B due to the remaining unrealized portion.

Chart:
- A bar chart was generated showing the year-end va


In [0]:
response = await agent.run(
    "Show me how did cash, cash equivalents, and marketable securities of NVIDIA change from the end of fiscal year 2022 to January 29, 2023"
)

print(response)

Here’s the change in NVIDIA’s cash, cash equivalents, and marketable securities between the end of fiscal year 2022 and January 29, 2023:

- End of fiscal year 2022: $21.21 billion
- January 29, 2023: $13.30 billion

What this means:
- The cash-related balance decreased by about $7.91 billion.
- This decline could be due to factors such as share repurchases, debt repayment, or investing activity, but to confirm the exact causes we’d need the cash flow statement and balance sheet notes for that period.

Visualization summary:
- A bar chart was generated comparing the two periods:
  - End of fiscal year 2022: 21.21
  - January 29, 2023: 13.30
- Title: NVIDIA: Cash, cash equivalents, and marketable securities
- Y-axis: USD billions

If you’d like, I can pull the exact cash flow movements (net cash used in investing, financing, and operations) from NVIDIA’s quarterly reports for a more detailed interpretation.



In [0]:
response = await agent.run("Visualize the total gross margin of Apple for years 2021, 2022, 2023")

print(response)

Here is the visualization of Apple’s total gross margin for 2021–2023.

Chart: Apple Gross Margin by Year
- 2021: $152,836 million
- 2022: $170,782 million
- 2023: $169,148 million

If you’d like, I can also add a percentage gross margin (gross margin as a percent of revenue) for each year, or compare to revenue to show margin trends.



## Your Turn

Let's add guardrails to the model.  
Think of a few guidelines and make sure every answer complies with them.
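As one possible starting point, the sketch below checks a finished answer against two made-up guidelines: no investment advice and a maximum length. The guideline names, the regular expression, and the `check_answer` helper are all assumptions for illustration; real guardrails would need rules that fit your own use case, and could also be enforced with a second LLM call instead of pattern matching.

```python
import re

# Hypothetical guidelines, chosen only to illustrate the idea.
ADVICE_PATTERN = re.compile(
    r"\b(?:you should (?:buy|sell)|guaranteed returns?)\b", re.IGNORECASE
)
MAX_ANSWER_LENGTH = 1500


def check_answer(answer: str) -> list:
    """Return the names of violated guidelines (an empty list means the answer passes)."""
    violations = []
    if ADVICE_PATTERN.search(answer):
        violations.append("no_investment_advice")
    if len(answer) > MAX_ANSWER_LENGTH:
        violations.append("max_length")
    return violations


print(check_answer("NVIDIA recorded $1.35 billion as an acquisition termination cost in fiscal year 2023."))
print(check_answer("You should buy NVDA now."))
```

In the notebook, the agent's output could be checked with `check_answer(str(response))` before it is shown to the user.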