# Amazon Bedrock Recipe: Arize Phoenix Integration with Bedrock Agents

## Overview
Arize AI delivers comprehensive observability tools specifically designed for AI applications. The platform is available in two versions:

**Arize AX:** An enterprise solution offering advanced monitoring capabilities

**Arize Phoenix:** An open-source platform making tracing and evaluation accessible to all developers

The integration between Arize AI and Amazon Bedrock Agents delivers three primary benefits:

**Comprehensive Traceability:** Gain visibility into every step of your agent’s execution path, from initial user query through knowledge retrieval and action execution

**Systematic Evaluation Framework:** Apply consistent evaluation methodologies to measure and understand agent performance

**Data-Driven Optimization:** Run structured experiments to compare different agent configurations and identify optimal settings

### Context
Amazon Bedrock Agents is a fully managed capability in Amazon Bedrock that allows you to build AI agents that can complete tasks by interacting with enterprise systems, data sources, and APIs. These agents can understand user requests in natural language, break down complex tasks into steps, retrieve relevant information, and take actions to fulfill user requests. With Bedrock Agents, you can create conversational assistants that can answer questions, provide recommendations, and perform actions on behalf of users, all while maintaining context throughout the conversation.


The integration between Arize AI and Amazon Bedrock Agents provides developers with powerful capabilities for tracing, evaluating, and monitoring AI agent applications.
Phoenix's tracing and span analysis capabilities are especially valuable during the prototyping and debugging stages.

#### Use Case
This notebook demonstrates the integration between Arize Phoenix and Amazon Bedrock Agents, which provides observability outside of AWS tooling.

#### Implementation
In this notebook, we show how to integrate Amazon Bedrock Agents and Arize Phoenix using the Arize AI platform. We will configure agent observability, send traces to Arize Phoenix, and evaluate the results using a single agent.


## Prerequisites
An AWS account with appropriate IAM permissions for Amazon Bedrock Agents and model access.

### Python Dependencies

To run this notebook, you'll need to install some libraries in your environment:


In [None]:
%pip install -r requirements.txt --quiet

First, let's load the libraries:

In [None]:
# load the libraries
import time
import boto3
import logging
import os
import nest_asyncio
from phoenix.otel import register
from openinference.instrumentation import using_metadata

nest_asyncio.apply()

### Set Phoenix Environment Variables

This example uses [Phoenix Cloud](https://app.phoenix.arize.com), a free hosted version of Phoenix. If you'd prefer, you can [self-host Phoenix](https://docs.arize.com/phoenix/self-hosting) instead.


In [None]:
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
if not os.environ.get("PHOENIX_CLIENT_HEADERS"):
    os.environ["PHOENIX_CLIENT_HEADERS"] = "api_key=" + input("Enter your Phoenix API key: ")

### Connect to Phoenix
Now you can connect your notebook to a Phoenix instance.

The `auto_instrument` flag below will search your environment for any openinference-instrumentation packages, and call any that are found. Because you installed the openinference-instrumentation-bedrock library, any calls you make to Bedrock or Bedrock agents will be automatically instrumented and sent to Phoenix.

In [None]:
project_name = "Amazon Bedrock Agent Example"

tracer_provider = register(project_name=project_name, auto_instrument=True)

Configure metadata for the span. The `using_metadata` context manager adds metadata to the current OpenTelemetry Context. OpenInference auto-instrumentors read this Context and pass the metadata as a span attribute, following the OpenInference semantic conventions. The metadata must be a dictionary with string keys.

In [None]:
# metadata for filtering
metadata = {
    "agent": "bedrock-agent",
    "env": "development",
}
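
Because the metadata must be a dictionary with string keys, it can help to check that before any spans are created. The following is a minimal sketch; `validate_metadata` is a hypothetical helper introduced here for illustration and is not part of OpenInference or Phoenix.

```python
def validate_metadata(metadata):
    """Return metadata unchanged if it is a dict whose keys are all strings."""
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a dictionary")
    bad_keys = [key for key in metadata if not isinstance(key, str)]
    if bad_keys:
        raise TypeError(f"metadata keys must be strings, got: {bad_keys!r}")
    return metadata


# Same values as the metadata cell above
metadata = validate_metadata({"agent": "bedrock-agent", "env": "development"})
```

Failing early with a clear message is usually easier to debug than a span that silently lacks its metadata attribute.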

### AWS Credentials
Before using Amazon Bedrock, ensure that your AWS credentials are configured correctly. You can set them up using the AWS CLI or by setting environment variables. This notebook assumes that the credentials are already configured.


In [None]:
# Create the client to invoke Agents in Amazon Bedrock:
session = boto3.Session()
REGION = session.region_name
bedrock_agent_runtime = session.client(service_name="bedrock-agent-runtime", region_name=REGION)

### Amazon Bedrock Agent


We assume you've already created an [Amazon Bedrock Agent](https://docs.aws.amazon.com/bedrock/latest/userguide/agents.html). If you don't have one already you can follow the **[instructions here]()** to set up an example agent.

Configure your agent's **ID** and (optionally) alias ID in the cell below. You can find these by looking up your agent in the ["Agents" page on the AWS Console for Amazon Bedrock](https://console.aws.amazon.com/bedrock/home?#/agents) or CLI.

The Agent ID should be ten characters, uppercase, and alphanumeric.

In [None]:
agent_id = ""  # <- Configure your Bedrock Agent ID
agent_alias_id = "ZZCHKSYYJE"  # <- Optionally set a different Alias ID if you have one
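
An empty or mistyped agent ID only surfaces as an API error later, so a quick format check can save a round trip. This sketch encodes the rule stated above (ten uppercase alphanumeric characters); `is_valid_agent_id` is a hypothetical helper, and the pattern is an assumption based on that description rather than an official AWS specification.

```python
import re

# Pattern taken from the description above: ten uppercase alphanumeric characters.
AGENT_ID_PATTERN = re.compile(r"[A-Z0-9]{10}")


def is_valid_agent_id(value):
    """Return True if value looks like a Bedrock Agent ID."""
    return isinstance(value, str) and AGENT_ID_PATTERN.fullmatch(value) is not None
```

For example, calling `is_valid_agent_id(agent_id)` before invoking the agent catches the case where the placeholder above was left empty.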

Before moving on, let's validate that `invoke_agent` works correctly. The content of the response is not important; we are simply testing the API call.

In [None]:
print(f"Trying to invoke alias {agent_alias_id} of agent {agent_id}...")
agent_resp = bedrock_agent_runtime.invoke_agent(
    agentAliasId=agent_alias_id,
    agentId=agent_id,
    inputText="Hello!",
    sessionId="dummy-session",
)
if "completion" in agent_resp:
    print("✅ Got response")
else:
    raise ValueError(f"No 'completion' in agent response:\n{agent_resp}")

## Run your Agent
You're now ready to run your Bedrock Agent.

In [None]:
# Attach the metadata to every span created inside run()
@using_metadata(metadata)
def run(input_text):
    session_id = f"default-session1_{int(time.time())}"

    attributes = dict(
        inputText=input_text,
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        enableTrace=True,
    )
    response = bedrock_agent_runtime.invoke_agent(**attributes)

    # Stream the response
    for event in response["completion"]:
        if "chunk" in event:
            print(event)
            chunk_data = event["chunk"]
            if "bytes" in chunk_data:
                output_text = chunk_data["bytes"].decode("utf8")
                print(output_text)
        elif "trace" in event:
            print(event["trace"])

In [None]:
run("What are the starters in the childrens menu?")  ## your prompt to the agent
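
The `run` function above prints each event as it arrives. If you would rather return the final answer as a single string, the same event handling can be factored out as in the sketch below. `collect_completion` is a hypothetical helper, and the sample events are made up to mirror the `chunk` and `trace` keys handled in `run`.

```python
def collect_completion(events):
    """Join streamed chunk bytes into one string and gather trace events."""
    byte_parts = []
    traces = []
    for event in events:
        if "chunk" in event:
            chunk_bytes = event["chunk"].get("bytes")
            if chunk_bytes is not None:
                byte_parts.append(chunk_bytes)
        elif "trace" in event:
            traces.append(event["trace"])
    # Decode once at the end so a multi-byte character split across
    # two chunks is still decoded correctly.
    return b"".join(byte_parts).decode("utf-8"), traces


# Hypothetical events shaped like the ones handled in run() above
sample_events = [
    {"trace": {"step": "example"}},
    {"chunk": {"bytes": b"Hello, "}},
    {"chunk": {"bytes": b"world"}},
]
answer, traces = collect_completion(sample_events)
```

In the notebook, the call would be `collect_completion(response["completion"])` in place of the printing loop.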

## View your Traces in Phoenix

You should now be able to see traces in your Phoenix dashboard:

![image trace1](./images/trace_1.png)
![image trace2](./images/trace_2.png)


## Filter your Traces in Phoenix

Arize offers three ways to filter and search across traces:

- Use AI Search with natural language 

- Use AI Search to construct the filter syntax 

- Directly use Filter Syntax
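
As an illustration of the third option, the metadata attached earlier (`agent` and `env`) is a natural thing to filter on. The sketch below builds a filter string of the form `metadata['key'] == 'value'`. This form is an assumption about the filter syntax, so check it against the Phoenix documentation for your version; `metadata_filter` is a hypothetical helper.

```python
def metadata_filter(**conditions):
    """Build a filter expression that matches all given metadata values.

    Values are inserted as-is, so they should not contain single quotes.
    """
    clauses = [f"metadata['{key}'] == '{value}'" for key, value in conditions.items()]
    return " and ".join(clauses)


filter_expression = metadata_filter(agent="bedrock-agent", env="development")
print(filter_expression)
```

The resulting string can be pasted into the filter bar in the Phoenix UI.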

You should now be able to see a subset of traces in your Phoenix dashboard:

![image filter1](./images/filter_1.png)


# Evaluating your Agent

Phoenix also includes built-in LLM evaluations and code-based experiment testing. In this next section, you'll add agent tool-calling evaluations to your traces.

Up until now, you've only used the lighter-weight Phoenix OTEL tracing library. To run evals, you'll need to install the full library.

In [None]:
%pip install arize-phoenix --quiet

In [None]:
# load the libraries
import re
import json
import phoenix as px
from phoenix.evals import (
    TOOL_CALLING_PROMPT_RAILS_MAP,
    TOOL_CALLING_PROMPT_TEMPLATE,
    BedrockModel,
    llm_classify,
)
from phoenix.trace import SpanEvaluations
from phoenix.trace.dsl import SpanQuery

px.launch_app()

In [None]:
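# Extract the user's question from a span's JSON-encoded input
def process_question(query):
    # Strip single quotes, escaped double quotes, and stray backslashes
    # so that the remaining string parses as JSON
    query = re.sub(r'\'', '', query)
    query = re.sub(r'\\"','', query)
    query = re.sub(r'\\+', '', query)
    dict_data = json.loads(query)

    # Return the content of the first message, or "" if it is missing
    return dict_data.get("messages", [{}])[0].get("content", "")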
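
`process_question` raises an exception if a span's input is not valid JSON or if its `messages` list is empty. A more defensive variant is sketched below. `extract_first_message` is a hypothetical helper, the payload shape is assumed from the function above, and it deliberately omits the quote and backslash stripping, so it only applies when the input is already valid JSON.

```python
import json


def extract_first_message(raw):
    """Return the content of the first message, or "" if raw is not the expected JSON."""
    try:
        payload = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        return ""
    if not isinstance(payload, dict):
        return ""
    # Fall back to a single empty message when the list is missing or empty
    messages = payload.get("messages") or [{}]
    first = messages[0]
    return first.get("content", "") if isinstance(first, dict) else ""
```

Returning an empty string keeps a later `DataFrame.apply` call from failing on a single malformed row.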


In [None]:
query = (
    SpanQuery()
    .where(
        # Filter for the `LLM` span kind.
        # The filter condition is a string of valid Python boolean expression.
        "span_kind == 'LLM' and 'evaluation' not in input.value"
    )
    .select(
        question="input.value",
        outputs="output.value",
    )
)
trace_df = px.Client().query_spans(query, project_name=project_name)

In [None]:
# Apply the function using lambda
trace_df["question"] = trace_df["question"].apply(
    lambda x: process_question(x) if isinstance(x, str) else x
)

In [None]:
# Extract the names of all tool calls from a span's output
def extract_tool_calls(output_value):
    tool_calls = []
    try:
        # Look for tool calls within <function_calls> tags
        if "<function_calls>" in output_value:
            # Find all tool_name tags
            tool_name_pattern = r"<tool_name>(.*?)</tool_name>"
            tool_names = re.findall(tool_name_pattern, output_value)

            # Keep only non-empty tool names
            tool_calls.extend(name for name in tool_names if name)
    except Exception as e:
        print(f"Error extracting tool calls: {e}")

    return tool_calls

# Apply the function to each row of the `outputs` column
trace_df["tool_call"] = trace_df["outputs"].apply(
    lambda x: extract_tool_calls(x) if isinstance(x, str) else []
)

# Display the tool calls found
print("Tool calls found in traces:", trace_df["tool_call"].sum())
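
To see what the extraction matches, here is the same regular expression applied to a made-up output string. The tool name and the surrounding tags are hypothetical and only illustrate the `<function_calls>` / `<tool_name>` layout that `extract_tool_calls` looks for.

```python
import re

# Hypothetical LLM output containing one tool call
sample_output = (
    "<function_calls><invoke>"
    "<tool_name>get::menu-lookup::getMenu</tool_name>"
    "</invoke></function_calls>"
)

tool_names = re.findall(r"<tool_name>(.*?)</tool_name>", sample_output)
print(tool_names)
```

The non-greedy `(.*?)` matters when one output contains several tool calls, because it stops at the first closing tag instead of spanning all of them.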

In [None]:
# Keep only rows where tool_call is not empty (has at least one tool call).
# Copy the result so that adding columns later does not modify a slice.
trace_df = trace_df[trace_df["tool_call"].apply(lambda x: len(x) > 0)].copy()

# Show the dataframe
trace_df.head()

In [None]:
# Include tool definitions (replace this text with descriptions of your own agent's tools)
trace_df["tool_definitions"] = (
    "phoenix-traces retrieves the latest trace information from Phoenix, phoenix-experiments retrieves the latest experiment information from Phoenix, phoenix-datasets retrieves the latest dataset information from Phoenix"
)

In [None]:
rails = list(TOOL_CALLING_PROMPT_RAILS_MAP.values())

# Use an LLM as a judge to evaluate the response
eval_model = BedrockModel(session=session, model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0")

response_classifications = llm_classify(
    data=trace_df,
    template=TOOL_CALLING_PROMPT_TEMPLATE,
    model=eval_model,
    rails=rails,
    provide_explanation=True,
)

# Convert each label to a numeric score: 1 for "correct", 0 otherwise
response_classifications["score"] = (
    response_classifications["label"] == "correct"
).astype(int)
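
Before logging, a quick summary of the labels gives a sense of how the agent did overall. The sketch below works on a plain list of labels; `summarize_labels` is a hypothetical helper, and the sample labels are made up.

```python
from collections import Counter


def summarize_labels(labels):
    """Return label counts and the fraction of labels equal to "correct"."""
    counts = Counter(labels)
    total = sum(counts.values())
    accuracy = counts["correct"] / total if total else 0.0
    return counts, accuracy


# Hypothetical labels for illustration
counts, accuracy = summarize_labels(["correct", "incorrect", "correct", "correct"])
print(counts, accuracy)
```

In the notebook, the input would be `response_classifications["label"].tolist()`.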

## Logging evaluations to Phoenix

In [None]:
# log the evaluations
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Tool Calling Eval", dataframe=response_classifications),
)

You should now see your evaluation labels in Phoenix!

![image eval1](./images/eval_1.png)
![image eval2](./images/eval_2.png)


# The End