# Building Agents with Llama Stack Agentic Capabilities

This notebook demonstrates building **agentic systems** using Llama Stack's native capabilities for tool calling with **MCP (Model Context Protocol)** integration.

For more information, see the [Llama Stack Responses API docs](https://llamastack.github.io/docs/building_applications/responses_vs_agents#lls-agents-api) and the [OpenAI Responses API reference](https://platform.openai.com/docs/api-reference/responses).

## Configuration

This notebook reads its settings from environment variables defined in a `.env` file in the project root.
Create your own `.env` file based on `.env.example`.
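
As a rough sketch, the check below reports which of the variables read later in this notebook are unset. The variable names come from the cells below; treating all four as required, and the helper name `missing_env_vars`, are assumptions made for illustration.

```python
import os

# Variable names taken from the cells in this notebook.
# Treating all of them as required is an assumption.
REQUIRED_VARS = [
    "LLAMA_STACK_BASE_URL",
    "INFERENCE_MODEL",
    "MCP_WEATHER_SERVER_URL",
    "MCP_OCP_SERVER_URL",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set")
```

Run it after the `.env` file has been loaded (see the import cell below); before that point, only variables already exported in the shell are visible.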

## Install Required Packages

Install the Llama Stack client (its version must match the server version):

In [None]:
%pip install -q "llama-stack-client==0.2.22" "python-dotenv"
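
To confirm which client version actually ended up in the kernel, a small check along these lines may help. It uses only the standard library; the helper name `installed_version` is hypothetical, and comparing the result against the version your server reports is left to you.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


client_version = installed_version("llama-stack-client")
print(f"llama-stack-client version: {client_version or 'not installed'}")
```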

## Import Dependencies

In [None]:
# Core imports
import os
import sys
from pathlib import Path
from dotenv import load_dotenv, find_dotenv

# --- Load environment variables ---
# Automatically detect the nearest .env (walks up from current directory)
env_path = find_dotenv(usecwd=True)
if env_path:
    load_dotenv(env_path)
    print(f"📁 Loading environment from: {env_path}")
    print("✅ .env file FOUND and loaded")
else:
    default_path = Path.cwd() / ".env"
    print(f"📁 No .env found via find_dotenv; checked: {default_path}")
    print("⚠️  .env file NOT FOUND")

# --- Verify Python interpreter / kernel ---
print(f"\n🐍 Python: {sys.executable}")

# Detect if running inside a virtual environment
in_venv = (
    hasattr(sys, "real_prefix") or
    (getattr(sys, "base_prefix", sys.prefix) != sys.prefix) or
    "VIRTUAL_ENV" in os.environ or
    "CONDA_PREFIX" in os.environ
)

if in_venv:
    print("✅ Using virtual environment - CORRECT!")
else:
    print("⚠️  Using global Python - Consider switching kernel!")
    print("   Click 'Select Kernel' → Choose 'Python (byo-agentic-framework)'")

## Configure Llama Stack Connection

Create a client using the Llama Stack base URL:

In [None]:
from llama_stack_client import LlamaStackClient

# Get configuration from environment variables
LLAMA_STACK_BASE_URL = os.getenv("LLAMA_STACK_BASE_URL")
LLAMA_STACK_OPENAI_ENDPOINT = os.getenv("LLAMA_STACK_OPENAI_ENDPOINT")

print("🌐 Llama Stack Configuration:")
print(f"   Base URL: {LLAMA_STACK_BASE_URL}")
print(f"   OpenAI Endpoint: {LLAMA_STACK_OPENAI_ENDPOINT}")

# Create client using the base URL
client = LlamaStackClient(base_url=LLAMA_STACK_BASE_URL)
print("✅ Llama Stack client created")
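
Because `os.getenv` returns `None` for an unset variable, it can be worth validating the base URL before handing it to the client. The following is a minimal sketch using only the standard library; `check_base_url` is a hypothetical helper, and the example URL is an illustrative value rather than one taken from this project.

```python
from urllib.parse import urlparse


def check_base_url(url):
    """Return the URL unchanged if it looks like an http(s) URL, else raise ValueError."""
    if not url:
        raise ValueError("LLAMA_STACK_BASE_URL is not set")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URL: {url!r}")
    return url


# Hypothetical value, for illustration only
print(check_base_url("http://localhost:8321"))
```

In the cell above, this could wrap the environment lookup, for example `check_base_url(os.getenv("LLAMA_STACK_BASE_URL"))`, so that a missing setting fails early with a readable message.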

## Helper Functions

Utility for pretty-printing response objects:

In [None]:
import json
from datetime import date

def pretty_print(obj) -> None:
    """
    Print object as formatted JSON.
    Handles nested objects and lists.
    """
    def recursive_serializer(o):
        if hasattr(o, '__dict__'):
            return o.__dict__
        if isinstance(o, date):
            return o.isoformat()
        raise TypeError(f"Object of type {o.__class__.__name__} is not JSON serializable")

    data_to_serialize = obj.__dict__ if hasattr(obj, "__dict__") else obj
    print(json.dumps(data_to_serialize, indent=2, default=recursive_serializer))
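
To illustrate how the `default=` hook walks nested objects, here is a self-contained variant that returns the JSON string instead of printing it. The name `to_pretty_json` and the sample object are hypothetical stand-ins; real response objects from the client may carry additional fields.

```python
import json
from datetime import date
from types import SimpleNamespace


def to_pretty_json(obj) -> str:
    """Serialize obj to indented JSON, falling back to __dict__ for plain objects."""
    def fallback(o):
        # json.dumps calls this only for values it cannot serialize itself
        if isinstance(o, date):
            return o.isoformat()
        if hasattr(o, "__dict__"):
            return vars(o)
        raise TypeError(f"Object of type {o.__class__.__name__} is not JSON serializable")

    return json.dumps(obj, indent=2, default=fallback)


# Stand-in object for illustration; not a real API response
sample = SimpleNamespace(
    id="resp_123",
    created=date(2025, 1, 15),
    output=[SimpleNamespace(type="message", role="assistant")],
)
print(to_pretty_json(sample))
```

Returning the string rather than printing it makes the helper easier to reuse in logs or tests.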

## Example 1: Agent with Tools Available (No Tool Usage)

Test the agent with a basic question. The agent has tools available but determines they're not needed for this query:

In [None]:
# Configuration
INFERENCE_MODEL = os.getenv("INFERENCE_MODEL")
MCP_WEATHER_SERVER_URL = os.getenv("MCP_WEATHER_SERVER_URL")

EXAMPLE_PROMPT = "What is the longest river in Spain?"

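print(f"🤖 Model: {INFERENCE_MODEL}")
print(f"💬 Prompt: {EXAMPLE_PROMPT}\n")

# Use Llama Stack's Responses API with MCP tools available.
# The weather tool is offered, but the model should not need it for this prompt.
responses = client.responses.create(
    model=INFERENCE_MODEL,
    input=EXAMPLE_PROMPT,
    tools=[
        {
            "type": "mcp",  # Server-side MCP
            "server_url": MCP_WEATHER_SERVER_URL,
            "server_label": "weather",
        }
    ],
)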

print("=" * 80)
print("FINAL RESPONSE:")
print("=" * 80)
print(responses.output_text)

## Example 2: Agentic Tool Execution

Test the agent with a weather query. The agent will:
1. **Discover** available MCP tools
2. **Decide** which tool to use (getforecast)
3. **Execute** the tool call with appropriate arguments
4. **Synthesize** the tool results into a natural language response

In [None]:
# Configuration
INFERENCE_MODEL = os.getenv("INFERENCE_MODEL")
MCP_WEATHER_SERVER_URL = os.getenv("MCP_WEATHER_SERVER_URL")

EXAMPLE_PROMPT = "What is the weather in Boston? Use the getforecast tool."

print(f"🤖 Model: {INFERENCE_MODEL}")
print(f"🌦️  MCP Server: {MCP_WEATHER_SERVER_URL}")
print(f"💬 Prompt: {EXAMPLE_PROMPT}\n")

# Use Llama Stack's Responses API
agent_responses = client.responses.create(
    model=INFERENCE_MODEL,
    input=EXAMPLE_PROMPT,
    tools=[
        {
            "type": "mcp",  # Server-side MCP
            "server_url": MCP_WEATHER_SERVER_URL,
            "server_label": "weather",
        }
    ],
)

# Display execution trace
print("🤖 Agent Execution Trace:")
print("-" * 80)

for i, output in enumerate(agent_responses.output):
    print(f"\n[Output {i + 1}] Type: {output.type}")
    
    if output.type == "mcp_list_tools":
        print(f"  Server: {output.server_label}")
        print(f"  Tools available: {[t.name for t in output.tools]}")
    
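    elif output.type == "mcp_call":
        print(f"  Tool called: {output.name}")
        print(f"  Arguments: {output.arguments}")
        # output may be None when the call failed, so guard before slicing
        result = output.output or ""
        suffix = "..." if len(result) > 200 else ""
        print(f"  Result: {result[:200]}{suffix}")  # Truncate long output
        if output.error:
            print(f"  Error: {output.error}")
    
    elif output.type == "message":
        print(f"  Role: {output.role}")
        # Print the first content part only if it exists and carries text
        if output.content and hasattr(output.content[0], 'text'):
            print(f"  Content: {output.content[0].text}")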

print("\n" + "=" * 80)
print("FINAL RESPONSE:")
print("=" * 80)
print(agent_responses.output_text)

## Example 3: Agentic Tool Execution - Kubernetes MCP Server

Test the agent with an OpenShift query. The agent will:
1. **Discover** available MCP tools
2. **Decide** which tool to use (here, one that lists pods)
3. **Execute** the tool call with appropriate arguments
4. **Synthesize** the tool results into a natural language response

In [None]:
# Configuration
INFERENCE_MODEL = os.getenv("INFERENCE_MODEL")
MCP_OCP_SERVER_URL = os.getenv("MCP_OCP_SERVER_URL")

EXAMPLE_PROMPT = "Give me a list of the pods in the ai-bu-shared namespace. Use the k8s tool."

print(f"🤖 Model: {INFERENCE_MODEL}")
print(f"🌦️  MCP Server: {MCP_OCP_SERVER_URL}")
print(f"💬 Prompt: {EXAMPLE_PROMPT}\n")

# Use Llama Stack's Responses API
agent_responses = client.responses.create(
    model=INFERENCE_MODEL,
    input=EXAMPLE_PROMPT,
    tools=[
        {
            "type": "mcp",  # Server-side MCP
            "server_url": MCP_OCP_SERVER_URL,
            "server_label": "k8s",
        }
    ],
)

# Display execution trace
print("🤖 Agent Execution Trace:")
print("-" * 80)

for i, output in enumerate(agent_responses.output):
    print(f"\n[Output {i + 1}] Type: {output.type}")
    
    if output.type == "mcp_list_tools":
        print(f"  Server: {output.server_label}")
        print(f"  Tools available: {[t.name for t in output.tools]}")
    
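    elif output.type == "mcp_call":
        print(f"  Tool called: {output.name}")
        print(f"  Arguments: {output.arguments}")
        # output may be None when the call failed, so guard before slicing
        result = output.output or ""
        suffix = "..." if len(result) > 200 else ""
        print(f"  Result: {result[:200]}{suffix}")  # Truncate long output
        if output.error:
            print(f"  Error: {output.error}")
    
    elif output.type == "message":
        print(f"  Role: {output.role}")
        # Print the first content part only if it exists and carries text
        if output.content and hasattr(output.content[0], 'text'):
            print(f"  Content: {output.content[0].text}")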

print("\n" + "=" * 80)
print("FINAL RESPONSE:")
print("=" * 80)
print(agent_responses.output_text)
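
Examples 2 and 3 repeat the same trace loop, so it could be factored into a helper. The sketch below is one possible shape: `format_trace` is a hypothetical name, it reads only the fields already used above (`type`, `server_label`, `tools`, `name`, `arguments`, `output`, `error`, `role`, `content`), and the sample items are stand-ins built with `SimpleNamespace`, not real server output.

```python
from types import SimpleNamespace


def format_trace(output_items, max_result_chars=200):
    """Return one summary line per output item of a Responses API result."""
    lines = []
    for i, item in enumerate(output_items, start=1):
        if item.type == "mcp_list_tools":
            names = [t.name for t in item.tools]
            detail = f"server={item.server_label} tools={names}"
        elif item.type == "mcp_call":
            # The result may be None when the call failed
            result = (getattr(item, "output", None) or "")[:max_result_chars]
            detail = f"tool={item.name} args={item.arguments} result={result!r}"
            if getattr(item, "error", None):
                detail += f" error={item.error}"
        elif item.type == "message":
            texts = [c.text for c in item.content if hasattr(c, "text")]
            detail = f"role={item.role} text={' '.join(texts)!r}"
        else:
            detail = "(not summarized)"
        lines.append(f"[{i}] {item.type}: {detail}")
    return lines


# Stand-in items for illustration; real ones come from agent_responses.output
fake_output = [
    SimpleNamespace(
        type="mcp_list_tools",
        server_label="weather",
        tools=[SimpleNamespace(name="getforecast")],
    ),
    SimpleNamespace(
        type="mcp_call",
        name="getforecast",
        arguments='{"city": "Boston"}',
        output=None,
        error="timeout",
    ),
    SimpleNamespace(
        type="message",
        role="assistant",
        content=[SimpleNamespace(text="The forecast is unavailable.")],
    ),
]

for line in format_trace(fake_output):
    print(line)
```

With a live response, the call would be `format_trace(agent_responses.output)`.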