# Level 3: Advanced Agent Capabilities with Prompt Chaining and ReAct Agent

Building on the simple agent introduced in [Level 2](Level2_simple_agent_with_websearch.ipynb), this tutorial continues the agent-focused section of our series by introducing techniques that make the agent smarter and more autonomous: **Prompt Chaining** and the **ReAct (Reasoning + Acting) framework**. These approaches allow the agent to complete multi-step tasks, dynamically choose tools, and adjust its behavior based on context.

- **Prompt Chaining** connects multiple prompts into a coherent sequence, allowing the agent to maintain context and perform multi-step reasoning across tool invocations. 
- **ReAct Agent** combines reasoning and acting steps in a loop, enabling the agent to make decisions, use tools dynamically, and adapt based on intermediate results. 

## Overview

In this notebook, you'll explore three agent configurations:
1. **Simple Agent (Baseline)** – Uses a single web search tool.
2. **Prompt Chaining** – Performs structured, multi-step reasoning by chaining prompts and responses.
3. **ReAct Agent** – Dynamically plans and executes actions using a loop of reasoning and tool use.


## Prerequisites

Before starting this notebook, ensure that you have:
- Followed the instructions in the [Setup Guide](./Level1_getting_started_with_Llama_Stack.ipynb) notebook. 
- A Tavily API key. It is critical for this notebook to run correctly. You can register for one at [https://tavily.com/](https://tavily.com/).

Add `TAVILY_SEARCH_API_KEY="tvly-dev-your-key"` (with your own key) to the `env.example` file.

## 1. Setting Up this Notebook
We will start with a few imports needed for this demo only.

In [1]:
from llama_stack_client import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.lib.agents.react.agent import ReActAgent
from llama_stack_client.lib.agents.react.tool_parser import ReActOutput
import sys
sys.path.append('.') 
from src.client_tools import get_location

Next, we will initialize our environment as described in detail in our ["Getting Started" notebook](./Level1_getting_started_with_Llama_Stack.ipynb). Please refer to it for additional explanations.

In [2]:
# for accessing the environment variables
import os
from dotenv import load_dotenv
load_dotenv(override=True)

# for communication with Llama Stack
from llama_stack_client import LlamaStackClient

# pretty print of the results returned from the model/agent
import sys
sys.path.append('.')  
from src.utils import step_printer
from termcolor import cprint

base_url = os.getenv("REMOTE_BASE_URL")


# Tavily search API key is required for some of our demos and must be provided to the client upon initialization.
# We will cover it in the agentic demos that use the respective tool. Please ignore this parameter for all other demos.
tavily_search_api_key = os.getenv("TAVILY_SEARCH_API_KEY")
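# os.getenv returns None when the variable is unset, so check for that before calling len().
if not tavily_search_api_key or len(tavily_search_api_key) != 41:
    raise ValueError("TAVILY_SEARCH_API_KEY is missing or does not look valid (expected 41 characters).")
else:
    provider_data = {"tavily_search_api_key": tavily_search_api_key}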


client = LlamaStackClient(
    base_url=base_url,
    provider_data=provider_data
)
    
print("Connected to Llama Stack server")

# model_id for the model you wish to use that is configured with the Llama Stack server
model_id = "llama3-2-3b" # "deepseek-r1-0528-qwen3-8b-bnb-4bit"

temperature = float(os.getenv("TEMPERATURE", 0.0))
if temperature > 0.0:
    top_p = float(os.getenv("TOP_P", 0.95))
    strategy = {"type": "top_p", "temperature": temperature, "top_p": top_p}
else:
    strategy = {"type": "greedy"}

max_tokens = 5000

# sampling_params will later be used to pass the parameters to Llama Stack Agents/Inference APIs
sampling_params = {
    "strategy": strategy,
    "max_tokens": max_tokens,
}

# Use a real boolean: any non-empty string (even "False") would be truthy in `if stream:`.
stream = True

print(f"Inference Parameters:\n\tModel: {model_id}\n\tSampling Parameters: {sampling_params}\n\tstream: {stream}")

Connected to Llama Stack server
Inference Parameters:
	Model: llama3-2-3b
	Sampling Parameters: {'strategy': {'type': 'greedy'}, 'max_tokens': 5000}
	stream: True
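
The temperature handling in the cell above can be read as a small pure function. The sketch below is a hypothetical helper (`build_sampling_params` is not part of Llama Stack or this repository) that mirrors the same logic, assuming the same defaults of `top_p=0.95` and `max_tokens=5000`:

```python
def build_sampling_params(temperature: float, top_p: float = 0.95, max_tokens: int = 5000) -> dict:
    """Hypothetical helper mirroring the notebook's temperature/top_p logic."""
    if temperature > 0.0:
        # Sample from the nucleus of the distribution when temperature is positive.
        strategy = {"type": "top_p", "temperature": temperature, "top_p": top_p}
    else:
        # Temperature of zero means deterministic, greedy decoding.
        strategy = {"type": "greedy"}
    return {"strategy": strategy, "max_tokens": max_tokens}


greedy_params = build_sampling_params(0.0)
sampled_params = build_sampling_params(0.7, top_p=0.9)
print(greedy_params)
print(sampled_params)
```

With `TEMPERATURE` unset, the notebook takes the greedy branch, which matches the output shown above.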


## 2. Simple Agent (Baseline)
This section uses the same agent setup as the [Level 2 notebook](Level2_simple_agent_with_websearch.ipynb).

In [3]:
agent = Agent(
    client, 
    model=model_id,
    instructions="""You are a helpful websearch assistant. When you are asked to search the latest you must use a tool. 
            Whenever a tool is called, be sure to return the response in a friendly and helpful tone.
            """ ,
    tools=["builtin::websearch"],
    sampling_params=sampling_params
)
user_prompts = [
    "Are there any immediate weather-related risks in my area that could disrupt network connectivity or system availability?",
]
for prompt in user_prompts:
    print("\n"+"="*50)
    cprint(f"Processing user query: {prompt}", "blue")
    print("="*50)
    session_id = agent.create_session("web-session")
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
        stream=stream
    )
    if stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way.


Processing user query: Are there any immediate weather-related risks in my area that could disrupt network connectivity or system availability?
inference> 
tool_execution> Tool:brave_search Args:{'query': 'weather-related risks in my area'}
inference> Based on the search results, it appears that there are no immediate weather-related risks in your area that could disrupt network connectivity or system availability. However, it's always a good idea to check the current weather conditions and forecasts to ensure that you

### Output Analysis

In this example, since the agent is unaware of the user's location, it hallucinates one and generates an incorrect search query. This misidentification leads to inaccurate information about potential weather-related risks.

This is where Prompt Chaining comes in. Prompt chaining allows the agent to:
1. Maintain context across multiple queries
2. Chain multiple tools together
3. Use previous interactions to inform current decisions
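
To make the first point concrete, here is a toy sketch of why turns that share a session can see earlier messages while a fresh session cannot. `ToySessionStore` is a hypothetical stand-in written for illustration only; it is not the Llama Stack session implementation.

```python
from collections import defaultdict


class ToySessionStore:
    """Hypothetical stand-in for server-side session storage (not the Llama Stack API)."""

    def __init__(self):
        # Maps each session_id to the list of messages seen so far.
        self._history = defaultdict(list)

    def create_turn(self, session_id: str, user_message: str) -> list:
        # Record the new message, then return a copy of the full context
        # that a model would receive for this turn.
        self._history[session_id].append({"role": "user", "content": user_message})
        return list(self._history[session_id])


store = ToySessionStore()
store.create_turn("shared", "Where am I?")
shared_context = store.create_turn("shared", "Any weather risks in my area?")
fresh_context = store.create_turn("other", "Any weather risks in my area?")

print(len(shared_context))  # the shared session carries both messages
print(len(fresh_context))   # the fresh session only sees the latest one
```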

Let’s see how prompt chaining can improve the accuracy of the response.

## 3. Prompt chaining with websearch tool and client tool

In this section, we demonstrate a more sophisticated use case that combines the use of two tools: location detection and web search.

1. **Automatic Location Detection**: Use the `get_location` client tool (defined in `src/client_tools.py`) to automatically determine the user's current location.
2. **Contextual Search**: Leverage the detected location to formulate the correct websearch query.

For example, when a user asks "Are there any weather-related risks in my area that could disrupt network connectivity or system availability?", the agent will:
- First detect the user's current location using `get_location`.
- Then use that location to search for nearby weather-related risks.
- Finally, present a comprehensive response.

This demonstrates how the builtin websearch tool and custom client tools can work together to provide intelligent, context-aware responses without requiring explicit location input from the user.
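
The data flow described above, where one tool's output feeds the next tool's input, can be sketched with plain functions. Everything here is an assumption made for illustration: `get_location_stub` and `web_search_stub` are hypothetical stand-ins that return canned values, and `build_weather_query` is an invented helper. In the notebook itself, the model decides when to call the real tools.

```python
def get_location_stub() -> str:
    """Hypothetical stand-in for the `get_location` client tool."""
    return "Columbus, Ohio, US"


def web_search_stub(query: str) -> list:
    """Hypothetical stand-in for the web search tool; returns canned text."""
    return [f"(canned result for: {query})"]


def build_weather_query(location: str) -> str:
    # Replace the ambiguous phrase "my area" with the concrete location.
    return f"severe weather alerts {location}"


def answer_weather_question() -> dict:
    location = get_location_stub()         # step 1: detect the location
    query = build_weather_query(location)  # step 2: build a contextual query
    results = web_search_stub(query)       # step 3: search with that query
    return {"location": location, "query": query, "results": results}


outcome = answer_weather_question()
print(outcome["query"])
```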

In [4]:
agent = Agent(
    client, 
    model=model_id,
    instructions="""You are a helpful assistant. 
    When a user asks about their location, you MUST use the get_location tool. When you are asked to search the latest news, you MUST use the websearch tool.
    """ ,
    tools=[get_location, "builtin::websearch"],
    sampling_params=sampling_params
)
user_prompts = [
    "Where am I?",
    "Are there any immediate weather-related risks in my area that could disrupt network connectivity or system availability?"
]
session_id = agent.create_session("prompt-chaining-session")  # for prompt chaining, queries must share the same session_id.
for prompt in user_prompts:
    print("\n"+"="*50)
    cprint(f"Processing user query: {prompt}", "blue")
    print("="*50)
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
        stream=stream
    )

    if stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way.


Processing user query: Where am I?
inference> 
tool_execution> Tool:get_location Args:{'query': 'location'}
tool_execution> Tool:get_location Response:"Your current location is: Columbus, Ohio, US"
inference> 
tool_execution> Tool:get_location Args:{'query': 'location'}
tool_execution> Tool:get_location Response:"Your current location is: Columbus, Ohio, US"
inference> 
tool_execution> Tool:brave_search Args:{'query': 'latest news'}
tool_execution> Tool:brave_search Response:{"query": "latest news", "top_k": [{"title": "Breaking News, Latest News and Videos | CNN", "url": "https://www.cnn.com/", "content": "View the latest news and breaking news today for U.S., world, weather, entertainment, politics and health at CNN.com.", "score": 0.522617, "raw_content": null}, {"title": "Fox News - Breaking News Updates | Latest News Headlines | Photos ...", "url": "https://ww

## 4. ReAct Agent with websearch tool and client tool

This section demonstrates the ReAct (Reasoning and Acting) framework in action.

Here is a walkthrough of how the ReAct agent will tackle this same "weather near me" problem:

When asked "Are there any weather-related risks in my area that could disrupt network connectivity or system availability?", the agent will:

1. **Reason** that it needs to get location information first.
2. **Act** by calling the `get_location` client tool.
3. **Observe** the location result.
4. **Reason** that it now needs to search for weather in that location.
5. **Act** by calling the `websearch` tool with the observed location.
6. **Observe** the search results and process them into a final answer.

Unlike prompt chaining, which follows fixed steps, ReAct dynamically breaks down tasks and adapts its approach based on the results of each step. This makes it more flexible and capable of handling complex, real-world queries effectively.
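
The reason/act/observe cycle can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the `ReActAgent` implementation: `scripted_model` is a hypothetical stand-in for the LLM, the tools are stubs with canned results, and the JSON shape (`thought`, `action`, `answer`, with `tool_params` as a plain dict) is a simplification chosen for this example.

```python
import json


def get_location_stub() -> str:
    """Hypothetical stand-in for the `get_location` client tool."""
    return "Columbus, Ohio, US"


def web_search_stub(query: str) -> str:
    """Hypothetical stand-in for the web search tool."""
    return f"(canned search result for: {query})"


# Tool registry: each entry takes a dict of parameters and returns a string.
TOOLS = {
    "get_location": lambda params: get_location_stub(),
    "websearch": lambda params: web_search_stub(params["query"]),
}


def scripted_model(observations: list) -> str:
    """Stand-in for the LLM: chooses the next step from what has been observed."""
    if not observations:
        step = {"thought": "I need the user's location first.",
                "action": {"tool_name": "get_location", "tool_params": {}},
                "answer": None}
    elif len(observations) == 1:
        step = {"thought": "Now I can search for weather risks there.",
                "action": {"tool_name": "websearch",
                           "tool_params": {"query": f"weather alerts {observations[0]}"}},
                "answer": None}
    else:
        step = {"thought": "I have enough information to answer.",
                "action": None,
                "answer": f"Summary based on: {observations[-1]}"}
    return json.dumps(step)


def react_loop(max_steps: int = 5):
    observations, trace = [], []
    for _ in range(max_steps):
        step = json.loads(scripted_model(observations))  # Reason
        trace.append(step["thought"])
        if step["answer"] is not None:
            return step["answer"], trace                 # final answer ends the loop
        action = step["action"]
        result = TOOLS[action["tool_name"]](action["tool_params"])  # Act
        observations.append(result)                      # Observe
    raise RuntimeError("No final answer within max_steps")


answer, trace = react_loop()
print(answer)
```

The `max_steps` cap is a design choice for the sketch: a loop driven by model output needs some bound so that a model that never produces an answer cannot run forever.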

We switch models here, from `llama3-2-3b` to the more powerful reasoning model `deepseek-r1-0528-qwen3-8b-bnb-4bit`, which reasons more accurately about the tools it will use. Reasoning models output their \<thought\> process first.

In [6]:
model_id = "deepseek-r1-0528-qwen3-8b-bnb-4bit" # "llama3-2-3b"

agent = ReActAgent(
    client=client,
    model=model_id,
    tools=[get_location, "builtin::websearch"],
    response_format={
        "type": "json_schema",
        "json_schema": ReActOutput.model_json_schema(),
    },
    sampling_params=sampling_params,
)
user_prompts = [
    "Are there any immediate weather-related risks in my area that could disrupt network connectivity or system availability?"
]
session_id = agent.create_session("web-session")
for prompt in user_prompts:
    cprint(f"Processing user query: {prompt}", "blue")
    print("="*50)
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
        stream=stream
    )
    if stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way.

Processing user query: Are there any immediate weather-related risks in my area that could disrupt network connectivity or system availability?
inference> {
    "thought": "I need to determine if there are any immediate weather-related risks in the user's area. However, I don't have the user's location. I can use the get_location tool to get the user's location, but the tool description says 'Provide the location upon request.'

## Key Takeaways
- This notebook demonstrated how to build more capable agents using Prompt Chaining and the ReAct framework.
- It showed how agents can maintain context across multiple steps and perform structured, multi-step reasoning.
- It highlighted how ReAct enables dynamic tool selection and adaptive decision-making based on intermediate results.
- These techniques enhance agent autonomy and make agents more suitable for complex operational tasks.

For further extensions, continue exploring in the next notebook: [Agents and MCP](Level4_agents_and_mcp.ipynb).