This notebook shows how to use LlamaIndex to build agentic LLM applications.

In [None]:
from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv('COHERE_API_KEY')
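
If the key is missing from `.env`, `os.getenv` silently returns `None` and the problem only shows up later as an authentication error. A minimal sketch of a fail-fast check follows, assuming a hypothetical helper named `require_env` (it is not part of python-dotenv or LlamaIndex) and a placeholder variable name so the snippet runs without real credentials:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper for this notebook, not a library API.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; check your .env file."
        )
    return value


# Placeholder variable and value, used only so the demo runs without a real key.
os.environ["EXAMPLE_API_KEY"] = "dummy-value"
print(require_env("EXAMPLE_API_KEY"))
```

In the notebook, the same helper could wrap the real variable names after `load_dotenv()` has run.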

Setting the API Key

In [27]:
from llama_index.llms.cohere import Cohere

llm = Cohere(model="command-r-plus", api_key=api_key)
response = llm.complete("William Shakespeare is ")
print(response)

William Shakespeare is considered one of the greatest playwrights in English literature and is often called England's national poet. The Bard of Avon's legacy continues to influence literature and performance to this day.


In [39]:
from llama_index.core.base.llms.types import ChatMessage
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="Explain about LLMs"),
]
question = messages[-1].content
chat_response = llm.chat(messages)
print(f"Q: {question}")
print(f"A: {chat_response.message.content}")

Q: Explain about LLMs
A: Large Language Models (LLMs) are a type of artificial intelligence (AI) that has been trained on massive amounts of text data from the internet, books, and other sources. They use deep learning techniques to generate and understand language at a level that is comparable to, or even surpasses, that of humans in some tasks. 

At their core, LLMs consist of neural networks with billions or even trillions of parameters. These parameters are adjusted during training as the model learns patterns and relationships in the data. The training process involves exposing the LLM to a diverse range of texts, such as books, articles, social media posts, and even code, to teach it the intricacies of language, including grammar, syntax, semantics, and context. 

One of the key advantages of LLMs is their ability to generate human-like responses or text based on a prompt. They can continue a story, answer questions, generate emails, and even create code, depending on their train

Creating the tools:

In [106]:

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and returns the product"""
    print("Multiplying")
    return a * b

def add(a: float, b: float) -> float:
    """Add two numbers and returns the sum"""
    return a + b
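
Type hints and docstrings matter here because a tool wrapper typically derives the tool's name, description and parameters from them. The sketch below shows the idea with the standard library only. `describe_tool` and `subtract` are hypothetical names for illustration, and this is not how LlamaIndex builds its tool metadata internally:

```python
import inspect


def subtract(a: float, b: float) -> float:
    """Subtract b from a and return the difference"""
    return a - b


def describe_tool(fn) -> dict:
    """Build a simple description of a function from its signature and docstring."""
    signature = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        # Record each parameter's annotated type by name, e.g. "float".
        "parameters": {
            name: param.annotation.__name__
            for name, param in signature.parameters.items()
        },
    }


print(describe_tool(subtract))
```

A function without a docstring or annotations would give the model much less to go on when deciding which tool to call.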

In [107]:
from llama_index.llms.google_genai import GoogleGenAI
from dotenv import load_dotenv
import os

load_dotenv()

api_key = os.getenv('GEMINI_API_KEY')

llm = GoogleGenAI(
    model="models/gemini-2.0-flash", 
    api_key=api_key
)

Now we initialize an agent using LlamaIndex's prebuilt FunctionCallingAgent:

In [115]:
from llama_index.core.agent import FunctionCallingAgent
from llama_index.core.tools import FunctionTool
tools = [
    FunctionTool.from_defaults(fn=multiply),
    FunctionTool.from_defaults(fn=add)
]
workflow = FunctionCallingAgent.from_tools(
    tools,
    llm=llm,
    system_prompt="You are an agent that performs basic mathematical operations only using the tools provided.",
)

Now we send in a prompt:

In [117]:
response = workflow.chat("What is 20+(2*4)?")
print(response)

Multiplying
20+(2*4) is 28.
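
The "Multiplying" line in the output shows that the agent actually called the `multiply` tool before answering. As a rough mental model only, the dispatch part of that process can be sketched as a lookup from tool names to functions. The hand-written `plan` stands in for the tool calls a model might request, and the `"$prev"` placeholder is a made-up convention for this sketch:

```python
import operator
from typing import Callable, Dict, List, Tuple

# Registry mapping tool names to callables (stand-ins for the notebook's tools).
TOOL_REGISTRY: Dict[str, Callable[[float, float], float]] = {
    "multiply": operator.mul,
    "add": operator.add,
}

# Hypothetical plan for "20+(2*4)"; "$prev" means "result of the previous call".
plan: List[Tuple[str, object, object]] = [
    ("multiply", 2, 4),
    ("add", 20, "$prev"),
]


def run_plan(steps):
    """Execute each requested tool call in order, threading results through."""
    previous = None
    for tool_name, a, b in steps:
        args = [previous if arg == "$prev" else arg for arg in (a, b)]
        previous = TOOL_REGISTRY[tool_name](*args)
    return previous


print(run_plan(plan))  # 28
```

In the real agent, the model decides the plan at run time and the framework handles the lookup and argument passing.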



Using pre-existing tools from LlamaHub:

In [128]:
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.tools.yahoo_finance import YahooFinanceToolSpec

finance_tools = YahooFinanceToolSpec().to_tool_list()
finance_tools.extend([multiply, add]) # adding our custom tools multiply and add
workflow = FunctionAgent(
    name="Agent",
    description="Useful for performing financial operations.",
    llm=GoogleGenAI(model="models/gemini-2.0-flash", api_key=api_key,temperature = 2),
    tools=finance_tools,
    system_prompt="You are a helpful assistant and if you cannot use tools allow the LLM model to use its creativity. Please mark all responses(by specifiying tools : ) given as generated by tools or model as well ",
)
response = await workflow.run(
    user_msg="What's the current stock price of NVIDIA and predict its growth over the next 2 years?"
)
print(response)

Current stock price of NVIDIA (NVDA) is $144.12. (tools: stock_basic_info)

Predicting stock growth is inherently speculative and depends on numerous factors that are impossible to foresee with certainty. Here's a blend of optimistic and cautious considerations for the next 2 years (Model):

*   **AI and Data Center Growth:** NVIDIA is a major player in AI, data centers, and high-performance computing. Continued expansion in these areas could drive significant revenue growth.
*   **Gaming Market:** The gaming market's cyclical nature could present fluctuations. However, new GPU technologies and cloud gaming initiatives could offset potential downturns.
*   **Automotive and Other Emerging Markets:** NVIDIA's automotive partnerships and its forays into robotics and other areas could create new revenue streams.
*   **Competition:** Competition in the GPU and AI chip markets is intense. Competitor advancements could impact NVIDIA's market share and profitability.
*   **Economic Factors:** 

Adding context to the workflows:

In [131]:
from llama_index.core.workflow import Context
workflow = FunctionAgent(
    name="Contextual bot",
    llm=GoogleGenAI(model="models/gemini-2.0-flash", api_key=api_key, temperature=2),
    system_prompt="You are an assistant answer queries as provided",
)
ctx = Context(workflow)
response = await workflow.run(user_msg="Hi, my name is Abhinav!", ctx=ctx)
print(response)
response = await workflow.run(user_msg= "Can you provide me with today's weather in Bengaluru,India",ctx = ctx)
print(response)

Hi Abhinav! It's nice to meet you. How can I help you today?

Okay, I can help with that! Please give me a moment to access the latest weather information for Bengaluru, India.

Okay, here's what I'm seeing for Bengaluru, India as of right now (this is a real-time estimate, so it's subject to change):

*   **Temperature:** The current temperature is around 27°C.
*   **Condition:** Sunny
*   **Wind:** Calm Winds

I hope you have a good day Abhinav.
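
The second reply still addresses the user by name because both runs share the same `ctx`. Conceptually, the context carries earlier messages forward so that each new request is answered with the history in view. The sketch below illustrates that idea with a hypothetical `ChatHistory` class; it is not the LlamaIndex `Context` implementation:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChatHistory:
    """Hypothetical stand-in for the state a context carries between runs."""

    messages: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def as_prompt(self) -> str:
        """Render every stored message so a model could see the full history."""
        return "\n".join(f"{role}: {content}" for role, content in self.messages)


history = ChatHistory()
history.add("user", "Hi, my name is Abhinav!")
history.add("assistant", "Hi Abhinav! It's nice to meet you.")
history.add("user", "Can you provide me with today's weather in Bengaluru, India")

# The name from the first turn is still visible when the second question arrives.
print(history.as_prompt())
```

Creating a fresh context for each run would drop that history, so the agent would no longer know the user's name.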



Now we will learn how to stream output, which is useful when a response is long or slow to produce. For this we will use the Tavily tool, which takes some time to execute. Tavily plays the role of the retriever in a RAG pipeline, but instead of searching a dataset that you provide, it extracts information from the internet.

In [137]:
from llama_index.tools.tavily_research import TavilyToolSpec
import os
load_dotenv()
tavily_tool = TavilyToolSpec(api_key=os.getenv("TAVILY_API_KEY"))
llm = GoogleGenAI(model="models/gemini-2.0-flash", api_key=api_key,temperature = 2)
workflow = FunctionAgent(
    tools=tavily_tool.to_tool_list(),
    llm=llm,
    system_prompt="You're a helpful assistant that can search the web for information.",
)
from llama_index.core.agent.workflow import AgentStream
handler = workflow.run(user_msg="What's the networth of Elon Musk currently?")

async for event in handler.stream_events():
    if isinstance(event, AgentStream):
        print(event.delta, end="", flush=True)


As of May 2025, according to the Bloomberg Billionaires Index, Elon Musk's net worth is estimated to be US$381 billion. Forbes estimates his net worth to be US$424.7 billion.
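
The `async for` loop above consumes partial results as they arrive instead of waiting for the full answer. The same pattern can be tried without any API key using a plain async generator. In this sketch `fake_token_stream` is a hypothetical stand-in for a model's stream, and the short sleep only imitates generation latency:

```python
import asyncio
from typing import AsyncIterator


async def fake_token_stream(text: str) -> AsyncIterator[str]:
    """Yield a response word by word, pausing briefly between chunks."""
    for word in text.split(" "):
        await asyncio.sleep(0.01)
        yield word + " "


async def collect_stream(text: str) -> str:
    """Print each chunk as soon as it arrives and return the full text."""
    chunks = []
    async for delta in fake_token_stream(text):
        print(delta, end="", flush=True)
        chunks.append(delta)
    print()
    return "".join(chunks).strip()


# In a notebook cell, use `await collect_stream(...)` instead of asyncio.run.
streamed = asyncio.run(collect_stream("Streaming shows partial output early"))
```

Printing with `end=""` and `flush=True` is what makes the chunks appear on one line as they are produced.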

Now we will create workflows with multiple agents

In [148]:
import os
load_dotenv()
tavily_tool = TavilyToolSpec(api_key=os.getenv("TAVILY_API_KEY"))
search_web = tavily_tool.to_tool_list()[0]
from llama_index.core.agent.workflow import AgentOutput
llm = GoogleGenAI(model="models/gemini-2.0-flash", api_key=api_key,temperature = 0)
async def record_notes(ctx: Context, notes: str, notes_title: str) -> str:
    """Useful for recording notes on a given topic."""
    current_state = await ctx.get("state")
    if "research_notes" not in current_state:
        current_state["research_notes"] = {}
    current_state["research_notes"][notes_title] = notes
    await ctx.set("state", current_state)
    return "Notes recorded."
async def write_report(ctx: Context, report_content: str) -> str:
    """Useful for writing a report on a given topic."""
    current_state = await ctx.get("state")
    current_state["report_content"] = report_content
    await ctx.set("state", current_state)
    return "Report written."


async def review_report(ctx: Context, review: str) -> str:
    """Useful for reviewing a report and providing feedback."""
    current_state = await ctx.get("state")
    current_state["review"] = review
    await ctx.set("state", current_state)
    return "Report reviewed."
from llama_index.core.agent.workflow import AgentWorkflow, FunctionAgent

research_agent = FunctionAgent(
    name="ResearchAgent",
    description="Useful for searching the web for information on a given topic and recording notes on the topic.",
    system_prompt=(
        "You are the ResearchAgent that can search the web for information on a given topic and record notes on the topic. "
        "Once notes are recorded and you are satisfied, you should hand off control to the WriteAgent to write a report on the topic."
    ),
    llm=llm,
    tools=[search_web, record_notes],
    can_handoff_to=["WriteAgent"],
)
write_agent = FunctionAgent(
    name="WriteAgent",
    description="Useful for writing a report on a given topic.",
    system_prompt=(
        "You are the WriteAgent that can write a report on a given topic. "
        "Your report should be in a markdown format. The content should be grounded in the research notes. "
        "Once the report is written, you should get feedback at least once from the ReviewAgent."
    ),
    llm=llm,
    tools=[write_report],
    can_handoff_to=["ReviewAgent", "ResearchAgent"],
)

review_agent = FunctionAgent(
    name="ReviewAgent",
    description="Useful for reviewing a report and providing feedback.",
    system_prompt=(
        "You are the ReviewAgent that can review a report and provide feedback. "
        "Your feedback should either approve the current report or request changes for the WriteAgent to implement."
    ),
    llm=llm,
    tools=[review_report],
    can_handoff_to=["WriteAgent"],
)
agent_workflow = AgentWorkflow(
    agents=[research_agent, write_agent, review_agent],
    root_agent=research_agent.name,
    initial_state={
        "research_notes": {},
        "report_content": "Not written yet.",
        "review": "Review required.",
    },
)
handler = agent_workflow.run(
    user_msg="""
    Write me a report on the history of the web. Briefly describe the history
    of the world wide web, including the development of the internet and the
    development of the web, including 21st century developments.
"""
)
async for event in handler.stream_events():
    if isinstance(event, AgentOutput):
        if event.response.content:
            print("Output:", event.response.content)

Output: Okay, I have gathered some information on the history of the World Wide Web. Here are my notes:

**History of the World Wide Web**

*   **Inventor:** Tim Berners-Lee
*   **Inception:** March 12, 1989
*   **Location:** CERN
*   Berners-Lee published a summary of the WWW project on August 6, 1991, inviting collaborators.
*   The Web is a service that operates over the Internet.
*   The development of the World Wide Web was begun in 1989 by Tim Berners-Lee and his colleagues at CERN.
*   They created HTTP, which standardized communication between servers and clients.
*   On 30 April 1993, CERN put the World Wide Web software in the public domain.

I am now handing off to the WriteAgent to write a report on this topic.

Output: Okay, I will now write a report on the history of the web based on the research notes.

```markdown
# A Brief History of the World Wide Web

The World Wide Web, often mistakenly used as a synonym for the Internet, is a global information medium that users ca

Now we will create a simple workflow:

In [None]:
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
    
)

class MyWorkflow(Workflow):
    @step
    async def my_step(self, ev: StartEvent) -> StopEvent:
        # do something here
        return StopEvent(result="Hello, world!")


w = MyWorkflow(timeout=10, verbose=False)
result = await w.run()
print(result)

Hello, world!


Creating custom events:

In [None]:
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
    Event,
)
class FirstEvent(Event):
    first_output: str


class SecondEvent(Event):
    second_output: str

class MyWorkflow(Workflow):
    @step
    async def step_one(self, ev: StartEvent) -> FirstEvent:
        print(ev.first_input)
        return FirstEvent(first_output="First step complete.")

    @step
    async def step_two(self, ev: FirstEvent) -> SecondEvent:
        print(ev.first_output)
        return SecondEvent(second_output="Second step complete.")

    @step
    async def step_three(self, ev: SecondEvent) -> StopEvent:
        print(ev.second_output)
        return StopEvent(result="Workflow complete.")


w = MyWorkflow(timeout=10, verbose=False)
result = await w.run(first_input="Start the workflow.")
print(result)


Start the workflow.
First step complete.
Second step complete.
Workflow complete.
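
Notice that the steps are never wired together explicitly: each step's type annotation says which event it accepts, and the event it returns decides which step runs next. The sketch below imitates that routing idea with the standard library. All names in it (`Start`, `Middle`, `Stop`, `build_routes`, `run`) are hypothetical, and it is not how LlamaIndex implements workflows:

```python
from typing import get_type_hints


class Start:
    pass


class Middle:
    pass


class Stop:
    def __init__(self, result: str):
        self.result = result


def step_one(ev: Start) -> Middle:
    return Middle()


def step_two(ev: Middle) -> Stop:
    return Stop(result="done")


def build_routes(*steps):
    """Map each accepted event type to the step that handles it."""
    return {get_type_hints(fn)["ev"]: fn for fn in steps}


def run(routes, event):
    """Keep dispatching on the event's type until a Stop event appears."""
    while not isinstance(event, Stop):
        event = routes[type(event)](event)
    return event.result


routes = build_routes(step_one, step_two)
print(run(routes, Start()))  # done
```

This is why adding a new custom event and a step that accepts it is enough to extend the chain.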


Now we will create loops in a workflow:

In [6]:
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
    Event,
)
import random
class LoopEvent(Event):
    loop_output: str

class Myworkflow(Workflow):
    @step
    # Named loop_step so the method does not shadow the imported `step` decorator.
    async def loop_step(self, ev: StartEvent | LoopEvent) -> StopEvent | LoopEvent:
        if isinstance(ev, StartEvent):
            print(ev.first_input)
        if random.randint(0,1) == 0:
            print("Bad thing")
            return LoopEvent(loop_output = "Back to first step")
        else:
            print("Good thing")
            return StopEvent(result = "Event over!")

w = Myworkflow(timeout = 20,verbose = False)
result = await w.run(first_input = "Let's start!!!")
print(result)

Let's start!!!
Bad thing
Bad thing
Good thing
Event over!


Now we will do branching in workflows:

In [9]:
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
    Event,
)
import random
class BranchA1Event(Event):
    payload: str


class BranchA2Event(Event):
    payload: str


class BranchB1Event(Event):
    payload: str


class BranchB2Event(Event):
    payload: str


class BranchWorkflow(Workflow):
    @step
    async def start(self, ev: StartEvent) -> BranchA1Event | BranchB1Event:
        if random.randint(0, 1) == 0:
            print("Go to branch A")
            return BranchA1Event(payload="Branch A")
        else:
            print("Go to branch B")
            return BranchB1Event(payload="Branch B")

    @step
    async def step_a1(self, ev: BranchA1Event) -> BranchA2Event:
        print(ev.payload)
        return BranchA2Event(payload=ev.payload)

    @step
    async def step_b1(self, ev: BranchB1Event) -> BranchB2Event:
        print(ev.payload)
        return BranchB2Event(payload=ev.payload)

    @step
    async def step_a2(self, ev: BranchA2Event) -> StopEvent:
        print(ev.payload)
        return StopEvent(result="Branch A complete.")

    @step
    async def step_b2(self, ev: BranchB2Event) -> StopEvent:
        print(ev.payload)
        return StopEvent(result="Branch B complete.")
    
w = BranchWorkflow(timeout = 20)
result = await w.run()
print(f"Iteration 1: {result}")
result = await w.run()
print(f"Iteration 2: {result}")

Go to branch A
Branch A
Branch A
Iteration 1: Branch A complete.
Go to branch B
Branch B
Branch B
Iteration 2: Branch B complete.
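
Because the branch is chosen with `random.randint`, the two iterations above can take different paths on every execution. When a repeatable run is needed, for example while debugging one branch, the random generator can be seeded. A minimal sketch with hypothetical helper names:

```python
import random


def pick_branch(rng: random.Random) -> str:
    """Choose a branch the same way the workflow's start step does."""
    return "A" if rng.randint(0, 1) == 0 else "B"


def branch_sequence(seed: int, n: int) -> list:
    """Return the branches chosen over n runs for a given seed."""
    rng = random.Random(seed)
    return [pick_branch(rng) for _ in range(n)]


run_1 = branch_sequence(seed=7, n=5)
run_2 = branch_sequence(seed=7, n=5)
print(run_1 == run_2)  # True: the same seed gives the same sequence of branches
```

In the notebook, calling `random.seed(...)` before `w.run()` should have a comparable effect, since the workflow uses the module-level generator.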


Now we will look at running steps concurrently. With concurrency, several calls to a function are in progress at the same time and share resources, so each call advances piece by piece, typically while the others are waiting. This differs from parallelism, where the instances would run simultaneously on separate resources.

In [14]:
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
    Event,
    Context,
)
import asyncio
import random
class StepTwoEvent(Event):
    query:str
class StepThreeEvent(Event):
    result :str

class ConcurrentFlow(Workflow):
    @step
    async def start(self, ctx: Context, ev: StartEvent) -> StepTwoEvent:
        ctx.send_event(StepTwoEvent(query="Query 1"))
        ctx.send_event(StepTwoEvent(query="Query 2"))
        ctx.send_event(StepTwoEvent(query="Query 3"))

    @step(num_workers=4)
    async def step_two(self, ctx: Context, ev: StepTwoEvent) -> StepThreeEvent:
        print("Running query ", ev.query)
        await asyncio.sleep(random.randint(1, 5))
        return StepThreeEvent(result=ev.query)

    @step
    async def step_three(self, ctx: Context, ev: StepThreeEvent) -> StopEvent:
        # wait until we receive 3 events
        result = ctx.collect_events(ev, [StepThreeEvent] * 3)
        if result is None:
            return None

        # do something with all 3 results together
        print(result)
        return StopEvent(result="Done")
c = ConcurrentFlow(timeout = 20)
result = await c.run()

Running query  Query 1
Running query  Query 2
Running query  Query 3
[StepThreeEvent(result='Query 1'), StepThreeEvent(result='Query 3'), StepThreeEvent(result='Query 2')]
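
The same effect can be observed with plain `asyncio`, independent of LlamaIndex. In the sketch below three simulated queries each wait 0.3 seconds, yet the total time stays close to 0.3 seconds rather than 0.9, because the waits overlap on a single event loop. `run_query` and the delays are made up for illustration:

```python
import asyncio
import time


async def run_query(name: str, delay: float) -> str:
    """Simulate a slow query; the sleep hands control back to the event loop."""
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # gather starts all three coroutines and returns results in argument order.
    results = await asyncio.gather(
        run_query("Query 1", 0.3),
        run_query("Query 2", 0.3),
        run_query("Query 3", 0.3),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


# In a notebook cell, use `await main()` instead of asyncio.run.
results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

Unlike `collect_events` in the workflow above, which returns events in arrival order, `asyncio.gather` keeps the order of its arguments.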


Now we will create sub-workflows by subclassing. Suppose a workflow has a step that sends an email, and in one variant you also want to send a text message. You can define the main workflow once and use it as the parent class of a custom workflow that overrides that step and adds a new one:

In [16]:
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
    Event,
    Context,
)


class Step2Event(Event):
    query: str


class Step3Event(Event):
    query: str


class MainWorkflow(Workflow):
    @step
    async def start(self, ev: StartEvent) -> Step2Event:
        print("Starting up")
        return Step2Event(query=ev.query)

    @step
    async def step_two(self, ev: Step2Event) -> Step3Event:
        print("Sending an email")
        return Step3Event(query=ev.query)

    @step
    async def step_three(self, ev: Step3Event) -> StopEvent:
        print("Finishing up")
        return StopEvent(result=ev.query)

class Step2BEvent(Event):
    query: str


class CustomWorkflow(MainWorkflow):
    @step
    async def step_two(self, ev: Step2Event) -> Step2BEvent:
        print("Sending an email")
        return Step2BEvent(query=ev.query)

    @step
    async def step_two_b(self, ev: Step2BEvent) -> Step3Event:
        print("Also sending a text message")
        return Step3Event(query=ev.query)
w = CustomWorkflow(timeout=10, verbose=False)
result = await w.run(query="Initial query")
print(result)

Starting up
Sending an email
Also sending a text message
Finishing up
Initial query
