# Level 2: Simple Agentic 

This tutorial gives new users an overview of how to use llama-stack's built-in agent framework effectively. It demonstrates basic single-tool usage and provides guidance on switching between a local development environment and a remote Kubernetes cluster.

## Overview

This tutorial walks you through building your own AI agent that can search the web, in three steps:
1. Connecting to a llama-stack server (remote or local).
2. Configuring an agent for tool use.
3. Running the agent.


## Prerequisites

Before starting, ensure you have the following:
- Access to a remote cluster or a local Podman setup.
- A Tavily API key. You can register for one at [https://tavily.com/](https://tavily.com/).

### Setting your ENV variables:

Use the [`.env.example`](../../../.env.example) file as a template to create a new file called `.env`, and make sure it defines the environment variables below.

- `REMOTE_BASE_URL` (string): the URL of your llama-stack instance, if you are using a remote connection.
- `TAVILY_SEARCH_API_KEY` (string): your API key for Tavily search. You can get one at https://tavily.com/home.

### Installing dependencies

This code requires `llama-stack` and `llama-stack-client`, both at version `0.2.2`. Let's begin by installing them:


In [1]:
# !pip install llama-stack-client==0.2.2 llama-stack==0.2.2
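
After installing, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch using only the standard library; the helper name `version_mismatches` is hypothetical and not part of llama-stack.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions this tutorial was written against.
EXPECTED = {"llama-stack": "0.2.2", "llama-stack-client": "0.2.2"}


def version_mismatches(expected, get_version=version):
    """Map each package whose installed version differs to what was found (None if absent)."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = get_version(package)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[package] = found
    return mismatches


print(version_mismatches(EXPECTED) or "Versions match.")
```

An empty result means both packages are installed at the expected version; otherwise the dictionary shows which package to reinstall.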

## Building a Web Search Agent
### 1. Connecting to a llama-stack server.
#### 1.1 Setting Up the Environment
- Import the necessary libraries.
- Define the user variables.
- Choose a connection option:
    - **Remote Server**: Set `remote=True` and provide the remote llama-stack endpoint.
    - **Local Server**: Set `remote=False` and provide the local llama-stack endpoint (default port: 8321).
    
    For detailed llama-stack server setup instructions, refer to:
    - [Remote Setup Guide](../../../kubernetes/README.md)
    - [Local Setup Guide](../../../local_setup_guide.md)

In [2]:
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.lib.agents.react.agent import ReActAgent
from llama_stack_client.lib.agents.react.tool_parser import ReActOutput
from llama_stack_client import LlamaStackClient
import os
import sys
sys.path.append('..')  
from src.utils import step_printer
from termcolor import cprint
from dotenv import load_dotenv
load_dotenv()

remote = True  # Use the `remote` variable to switch between a local development environment and a remote Kubernetes cluster.
stream_output = False # Set to True to stream the output of the agent.
inference_model = "ibm-granite/granite-3.2-8b-instruct"
print(f"Model: {inference_model}")

tavily_search_api_key = os.getenv("TAVILY_SEARCH_API_KEY")  # necessary for websearch tool
if remote:
    base_url = os.getenv("REMOTE_BASE_URL")
else:
    base_url = "http://localhost:8321"

Model: ibm-granite/granite-3.2-8b-instruct
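
Note that when `remote = True` and `REMOTE_BASE_URL` is unset, `os.getenv` returns `None`, and the problem only surfaces later when the client tries to connect. A small sketch of the same selection logic with an early failure is shown here; `resolve_base_url` is a hypothetical helper, not something the tutorial's repository provides.

```python
import os

LOCAL_BASE_URL = "http://localhost:8321"  # default local port used in this tutorial


def resolve_base_url(remote, env=None):
    """Pick the llama-stack endpoint, failing early if the remote URL is missing."""
    env = os.environ if env is None else env
    if not remote:
        return LOCAL_BASE_URL
    url = env.get("REMOTE_BASE_URL")
    if not url:
        raise ValueError("REMOTE_BASE_URL must be set when remote=True")
    return url


print(resolve_base_url(remote=False))
```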


#### 1.2 Client Initialization
- Initialize the `LlamaStackClient` with the appropriate base URL and provider data.  
- Connect the llama-stack client to the llama-stack server using the base URL.

In [3]:
client = LlamaStackClient(
        base_url=base_url,
        provider_data={"tavily_search_api_key": tavily_search_api_key}
)
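
Before configuring an agent, it may be worth checking that the server actually serves the chosen model. The sketch below assumes that `client.models.list()` returns objects exposing an `identifier` attribute; verify this against your installed client version. `model_is_registered` is a hypothetical helper, and a stand-in client is used so the example runs without a server.

```python
from types import SimpleNamespace


def model_is_registered(client, model_id):
    """Return True if `model_id` appears among the models the server reports."""
    return any(model.identifier == model_id for model in client.models.list())


# Stand-in client so the sketch runs without a server; with a real
# LlamaStackClient, pass the `client` created above instead.
fake_client = SimpleNamespace(
    models=SimpleNamespace(
        list=lambda: [SimpleNamespace(identifier="ibm-granite/granite-3.2-8b-instruct")]
    )
)
print(model_is_registered(fake_client, "ibm-granite/granite-3.2-8b-instruct"))
```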

### 2. Configuring an agent for tool use.
- **Agent Initialization**: Create an `Agent` instance with the desired model, instructions, and tools.

- **Instructions**: The `instructions` parameter, also referred to as the system prompt, specifies the agent's role and behavior. In this example, the agent is configured as a helpful web search assistant. It is instructed to use a tool whenever a web search is required and to respond in a friendly and helpful tone.

- **Tools**: The `tools` parameter defines the tools available to the agent. In this case, the `builtin::websearch` tool is used, which enables the agent to perform web searches. This tool is essential for retrieving up-to-date information from the web. Internally, it leverages Tavily Search to execute the search queries efficiently.

- **How It Works**: When a user query is provided, the agent processes the input and determines whether a tool is required to fulfill the request. If the query involves retrieving information from the web, the agent invokes the `builtin::websearch` tool. The tool interacts with Tavily Search to fetch real-time data, which is then processed and returned to the user in a friendly and helpful tone. This workflow ensures that the agent can handle a wide range of queries effectively.

For more details on the `builtin::websearch` tool and its capabilities, refer to the [Llama-stack tools documentation](https://llama-stack.readthedocs.io/en/latest/building_applications/tools.html#web-search-providers). 
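
The workflow described above (inference, then tool execution, then a final inference) can be illustrated with a toy loop. This is only a conceptual sketch with stubbed components, not llama-stack's implementation; `fake_model`, `fake_websearch`, and `run_turn` are all hypothetical names.

```python
def fake_model(messages):
    """Stand-in for the LLM: request a search first, then answer from the tool result."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool_call": {"name": "websearch", "query": last["content"]}}
    return {"content": f"Here is what I found: {last['content']}"}


def fake_websearch(query):
    """Stand-in for the search tool; a real tool would call a search API."""
    return f"(stub results for '{query}')"


def run_turn(prompt, model=fake_model, tools=None):
    """Alternate inference and tool execution until the model returns plain content."""
    tools = tools or {"websearch": fake_websearch}
    messages = [{"role": "user", "content": prompt}]
    steps = []
    while True:
        reply = model(messages)
        steps.append("inference")
        call = reply.get("tool_call")
        if call is None:
            return reply["content"], steps
        # The model asked for a tool: run it and feed the result back.
        result = tools[call["name"]](call["query"])
        steps.append("tool_execution")
        messages.append({"role": "tool", "content": result})


answer, steps = run_turn("latest OpenShift release")
print(steps)
print(answer)
```

The recorded step sequence mirrors the three-step pattern an agent turn typically produces when a single tool call is needed.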

In [4]:
agent = Agent(
    client, 
    model=inference_model,
    instructions="""You are a helpful websearch assistant. When you are asked to search the web you must use a tool. 
            Whenever a tool is called, be sure return the response in a friendly and helpful tone.
            """ ,
    tools=["builtin::websearch"],
    sampling_params={"max_tokens":4096}
)

### 3. Running the agent.
- Define user prompts to interact with the agent.
- Use the agent to create a session and process user queries.
- Display the agent's responses for each query.

In [5]:
user_prompts = [
    "What’s latest in OpenShift?",
    "Search for recent news articles about advancements in quantum computing.",
    "Find coffee shops near Boston Convention and Exhibition Center that are open now, have Wi-Fi.",
]
for prompt in user_prompts:
    print("\n"+"="*50)
    cprint(f"Processing user query: {prompt}", "blue")
    print("="*50)
    session_id = agent.create_session("web-session")
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
        stream=stream_output
    )
    if stream_output:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way. 


Processing user query: What’s latest in OpenShift?

---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: brave_search, Arguments: {'query': 'latest updates on OpenShift'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
Here are the latest updates on OpenShift:

1. **Red Hat unveils OpenShift 4.18: Enhanced security and virtualization**
   - This version introduces improvements in network flexibility with User Defined Networks and Border Gateway Protocol for pods and virtual machines.
   - For a detailed list of updates and improvements, check out the [OpenShift 4.18 Release Notes](https://www.redhat.com/en/blog/what-you-need-to-know-red-hat-openshift-418).

2. **Experience the many enhancements in OpenShift 4.18**
   - This article provides an overview of the latest features, including updates for plug-ins and templates.
   - For a comprehensive review, refer to [What's new in Red Hat Developer Hub 1.4](https://developers.redhat.com/blog/2025/02/25/whats-new-developers-red-hat-openshift-418).

3. **Red Hat OpenShift 4.17: What you need to know**
   - OpenShift 4.17, based on Kubernetes 1.30 and CRI-O 1.30, offers expanded control p


---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
Here are some recent news articles about advancements in quantum computing:

1. **Quantum Computers News -- ScienceDaily**
   - [Read the latest news in developing quantum computers.](https://www.sciencedaily.com/news/computers_math/quantum_computers/)
   - Content: "New Type of Quantum Computer Studies the Dance of Elementary Particles; Friday, March 21, 2025."
   - Score: 0.77893573

2. **Quantum Computing News -- ScienceDaily**
   - [Read the latest about the development of quantum computers.](https://www.sciencedaily.com/news/matter_energy/quantum_computing/)
   - Score: 0.6897452

3. **A blueprint for making quantum computers easier to program. MIT News**
   - [A CSAIL study highlights why it is so challenging to program a quantum computer to run a quantum algorithm, and offers a conceptual model for a more user-friendly quantum computer.](https://news.mit.edu/topic/quantum-computing)
   - Content: "April 16, 20


---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
Here are some coffee shops near the Boston Convention and Exhibition Center that are open now and offer Wi-Fi:

1. **Cocorico Boulangerie** - Reviews on Yelp mention their delicious pastries and coffee. [Check it out here](https://www.yelp.com/search?find_desc=Coffee&find_near=boston-convention-and-exhibition-center-boston)

2. **Blue Bottle Coffee** - Known for their high-quality coffee, this spot might be a great choice for coffee lovers. [More details here](https://www.yelp.com/search?find_desc=Coffee&find_near=boston-convention-and-exhibition-center-boston)

3. **Flour Bakery + Café** - Besides coffee, they're also famous for their baked goods. [Find out more](https://www.yelp.com/search?find_desc=Coffee&find_near=boston-convention-and-exhibition-center-boston)

4. **Tatte Bakery & Cafe** - This café is popular for its Israeli-inspired pastries and coffee. [Check their details here](https://www.yelp.com/search?fi
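
`step_printer` is imported from this repository's `src.utils` and is not shown in this notebook. As a rough idea of what such a helper can do, the sketch below builds one summary line per step. The attribute names `api_model_response`, `tool_calls`, `tool_name`, and `content` are assumptions about the step objects and should be checked against the installed client; the two classes are stand-ins named after the step types seen in the log.

```python
from types import SimpleNamespace


def summarize_steps(steps):
    """Build one summary line per step, numbered from 1, using the step's class name."""
    lines = []
    for number, step in enumerate(steps, start=1):
        line = f"Step {number}: {type(step).__name__}"
        response = getattr(step, "api_model_response", None)  # assumed attribute
        if response is not None:
            tool_calls = getattr(response, "tool_calls", None) or []
            if tool_calls:
                names = ", ".join(call.tool_name for call in tool_calls)
                line += f" (tool call: {names})"
            elif getattr(response, "content", ""):
                line += " (model response)"
        lines.append(line)
    return lines


# Stand-in step classes so the sketch runs without a server.
class InferenceStep(SimpleNamespace):
    pass


class ToolExecutionStep(SimpleNamespace):
    pass


demo = [
    InferenceStep(
        api_model_response=SimpleNamespace(
            content="", tool_calls=[SimpleNamespace(tool_name="brave_search")]
        )
    ),
    ToolExecutionStep(),
    InferenceStep(
        api_model_response=SimpleNamespace(content="Here are the results", tool_calls=[])
    ),
]
print("\n".join(summarize_steps(demo)))
```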

## Output Analysis
Here, we can observe that the `builtin::websearch` tool is used to perform a web search (it appears in the step log under the tool name `brave_search`). The outputs are displayed in the notebook with color-coded text to help interpret the process:

- **Blue Text**: Represents the user's input or query.
- **Yellow Text**: Displays the LLM's inference response. 
- **Green Text**: Indicates the tool execution process, such as the tool being called and the query being sent to the web search API.
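
Colored terminal and notebook text of this kind is produced with ANSI escape sequences, which is what `termcolor`'s `cprint` emits. A minimal standard-library sketch of the same idea follows; the `colorize` helper is hypothetical, and 34, 33, and 32 are the standard ANSI foreground codes for blue, yellow, and green.

```python
ANSI_CODES = {"blue": 34, "yellow": 33, "green": 32}
RESET = "\x1b[0m"  # restores the default text color


def colorize(text, color):
    """Wrap `text` in the ANSI escape sequence for `color`, followed by a reset."""
    return f"\x1b[{ANSI_CODES[color]}m{text}{RESET}"


print(colorize("Processing user query: ...", "blue"))
print(colorize("Model response ...", "yellow"))
print(colorize("Tool execution ...", "green"))
```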

Great! We can see that the model returned relevant and recent information about OpenShift. This was only possible because the agent can call tools such as web search, which lets it retrieve real-time data.

For the second query, the model fetched the latest news articles, including two from March 2025. This is notable because the Granite 3.2 8B model, released in February 2025, has a knowledge cutoff date of April 2024, so the agent is extending beyond the model's static knowledge by retrieving current information.

For the third query about nearby coffee shops, the agent effectively used the `builtin::websearch` tool to provide practical, location-specific results with valid coffee shops and links, highlighting its real-world applicability.  

## Testing location-based queries


### Basic web search tool agent in one shot

In [6]:
agent = Agent(
    client, 
    model=inference_model,
    instructions="""You are a helpful websearch assistant. When you are asked to search the web you must use a tool. 
            Whenever a tool is called, be sure return the response in a friendly and helpful tone.
            """ ,
    tools=["builtin::websearch"],
    sampling_params={"max_tokens":4096}
)
user_prompts = [
    "Find coffee shops near me that are open now, have Wi-Fi.",
]
for prompt in user_prompts:
    print("\n"+"="*50)
    cprint(f"Processing user query: {prompt}", "blue")
    print("="*50)
    session_id = agent.create_session("web-session")
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
        stream=stream_output
    )
    if stream_output:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way. 


Processing user query: Find coffee shops near me that are open now, have Wi-Fi.

---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: brave_search, Arguments: {'query': 'coffee shops open now with Wi-Fi near me'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
Sure, I found some coffee shops near you that are open now and offer Wi-Fi. Here are the top results:

1. **Best Coffee With Wifi Near Me - Yelp**
   - [Check it out on Yelp](https://www.yelp.com/nearme/coffee-with-wifi)
   - Yelp has compiled a list of coffee shops near you that offer Wi-Fi. It's based on over 7 million business reviews and opinions from Yelpers.

2. **12 LA Coffee Shops With Really Fast Wifi - The Infatuation**
   - [Read more on The Infatuation](https://www.theinfatuation.com/los-angeles/guides/coffee-shops-with-really-fast-wifi-la)
   - This article lists 12 coffee shops in Los Angeles known for their fast Wi-Fi, along with their unique features.

3. **The 16 Best Los Angeles Coffee Shops With Free Wi-Fi**
   - [Explore on Eater LA](https://la.eater.com/maps/los-angeles-best-coffee-shops-free-wifi)
   - Eater LA has a map of the best coffee shops in LA that offer free Wi-Fi. One of them is noted 

## Output Insight
If the prompt just says "near me", the model assumes the user is in Dearborn, MI. This could be related to the default system prompt in the Granite model.
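
One way to avoid an assumed location is to make it explicit before the prompt reaches the agent. The sketch below is a hypothetical pre-processing helper, not a llama-stack feature; `add_location` and its simple phrase matching are illustrative assumptions.

```python
import re

# Matches the phrase "near me" as whole words, in any letter case.
NEAR_ME = re.compile(r"\bnear me\b", flags=re.IGNORECASE)


def add_location(prompt, location):
    """Replace 'near me' with an explicit location; leave other prompts unchanged."""
    # A function replacement avoids any special handling of backslashes in `location`.
    return NEAR_ME.sub(lambda _match: f"near {location}", prompt)


print(add_location(
    "Find coffee shops near me that are open now, have Wi-Fi.",
    "the Boston Convention and Exhibition Center",
))
```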

### Web search agent + Prompt chaining 

In [12]:
agent = Agent(
    client, 
    model=inference_model,
    instructions="""You are a helpful websearch assistant. When you are asked to search location based queries you must use the websearch tool. Provide reference for each recommendation.
            """ ,
    tools=["builtin::websearch"],
    sampling_params={"max_tokens":4096}
)
user_prompts = [
    "Where am i?",
    "Im at the Boston Convention and Exhibition Center",
    "Find coffee shops near me that are open now, have Wi-Fi."
]
session_id = agent.create_session("web-session")  # for prompt chaining, queries must share the same session_id.
for prompt in user_prompts:
    print("\n"+"="*50)
    cprint(f"Processing user query: {prompt}", "blue")
    print("="*50)
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
        stream=stream_output
    )
    if stream_output:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way. 


Processing user query: Where am i?

---------- 📍 Step 1: InferenceStep ----------
🤖 Model Response:
I'm sorry for the confusion, but as a text-based AI, I don't have the ability to determine your physical location. I can help you find information about a location if you provide me with a name or details.


Processing user query: Im at the Boston Convention and Exhibition Center

---------- 📍 Step 1: InferenceStep ----------
🤖 Model Response:
The Boston Convention and Exhibition Center is located at 415 Summer Street, Boston, MA 02210, United States. Here are some details about the location:

1. It's the largest convention center in New England, covering over 1.7 million square feet.
2. It's situated in the Seaport District of Boston, near the waterfront.
3. The nearest major airport is General Edward Lawrence Logan International Airport, about 4 miles away.
4. It's also close to several hotels, restaurants, and attractions in the area.

Reference(s):
1
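
Why the shared `session_id` matters can be illustrated with a toy model of session state. This is not how llama-stack stores sessions internally; `ToySessionStore` is a hypothetical stand-in that only shows that turns in one session see earlier messages, while a new session starts empty.

```python
from collections import defaultdict
from itertools import count


class ToySessionStore:
    """Keeps a separate prompt history per session id."""

    def __init__(self):
        self._ids = count(1)
        self._history = defaultdict(list)

    def create_session(self, name):
        """Return a new, unique session id derived from `name`."""
        return f"{name}-{next(self._ids)}"

    def create_turn(self, session_id, prompt):
        """Record the prompt and return every prompt the session has seen so far."""
        self._history[session_id].append(prompt)
        return list(self._history[session_id])


store = ToySessionStore()
shared = store.create_session("web-session")
store.create_turn(shared, "Im at the Boston Convention and Exhibition Center")
context = store.create_turn(shared, "Find coffee shops near me")
print(len(context))  # both prompts are visible in the shared session

fresh = store.create_session("web-session")
print(len(store.create_turn(fresh, "Find coffee shops near me")))  # only this prompt
```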

### Web search ReAct Agent

In [8]:
agent = ReActAgent(
            client=client,
            model=inference_model,
            instructions="""You are a helpful websearch assistant. 
            When you are asked to search the web you must use a tool. 
            You are a helpful assistant that uses reasoning to solve problems step by step. Break down complex problems into simpler steps.
            For location based queries, ask me to provide my location and then use the tool to search.
            """,
            tools=["builtin::websearch"],
            response_format={
                "type": "json_schema",
                "json_schema": ReActOutput.model_json_schema(),
            },
            sampling_params={"max_tokens":4096},
        )
user_prompts = [
    "Find coffee shops near me that are open now, have Wi-Fi."
]
session_id = agent.create_session("web-session")
for prompt in user_prompts:
    print("\n"+"="*50)
    cprint(f"Processing user query: {prompt}", "blue")
    print("="*50)
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
        stream=stream_output
    )
    if stream_output:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way. 


Processing user query: Find coffee shops near me that are open now, have Wi-Fi.

---------- 📍 Step 1: InferenceStep ----------
🤖 Model Response:
{"thought": "I need to ask the user for their location to provide accurate results. Then, I will use the 'brave_search' tool to find coffee shops near the user's location that are currently open and offer Wi-Fi."

,"action": {
    "tool_name": "brave_search",
    "tool_params": [
        {
            "name": "query",
            "value": "coffee shops open now with Wi-Fi near me"
        }
    ]
}

,"answer": "I'm sorry for the inconvenience, but I need to know your current location to provide accurate results. Could you please share your location?"}
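
The ReAct response is a JSON object with `thought`, `action`, and `answer` fields, so it can be pulled apart with the standard `json` module. The sketch below assumes the structure seen in the output above; `parse_react_output` is a hypothetical helper, and the sample string is a shortened stand-in rather than the actual model output.

```python
import json


def parse_react_output(raw):
    """Split a ReAct-style JSON string into thought, tool call, and answer."""
    data = json.loads(raw)
    action = data.get("action") or {}
    # tool_params arrives as a list of {"name": ..., "value": ...} pairs.
    params = {p["name"]: p["value"] for p in action.get("tool_params", [])}
    return {
        "thought": data.get("thought"),
        "tool_name": action.get("tool_name"),
        "tool_params": params,
        "answer": data.get("answer"),
    }


raw = """
{"thought": "I need the user's location first.",
 "action": {"tool_name": "brave_search",
            "tool_params": [{"name": "query", "value": "coffee shops near me"}]},
 "answer": "Could you please share your location?"}
"""
parsed = parse_react_output(raw)
print(parsed["tool_name"], parsed["tool_params"])
print(parsed["answer"])
```

In the logged response, both `action` and `answer` are populated at once, so code consuming this output may need to decide which of the two takes priority.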



## Key Takeaways
- This tutorial demonstrates how to set up and use a single tool with an AI agent.
- It highlights the flexibility of switching between remote and local environments.
- By following these steps, users can quickly get started with AI agents and explore their capabilities.

For more advanced use cases, please check our other example Jupyter notebooks.