## Advanced Data Analysis using Amazon Bedrock AgentCore Code Interpreter - Tutorial (Strands)
This tutorial demonstrates how to create an AI agent that performs advanced data analysis by executing Python code. We use Amazon Bedrock AgentCore Code Interpreter to run the code that the LLM generates.

This tutorial shows how to use Amazon Bedrock AgentCore Code Interpreter to:
1. Set up a sandbox environment
2. Configure a Strands-based agent that performs advanced data analysis by generating code from the user's query
3. Execute that code in the sandbox using Code Interpreter
4. Display the results to the user

## Prerequisites
- An AWS account with access to Amazon Bedrock AgentCore Code Interpreter
- The necessary IAM permissions to create and manage Code Interpreter resources
- The required Python packages installed (including boto3, bedrock-agentcore, and strands)
- An IAM role with permissions to invoke models on Amazon Bedrock
  - Access to the Claude 3.7 Sonnet model in the US West (Oregon) (us-west-2) region (the default model for the Strands SDK)

## Your IAM execution role should have the following IAM policy attached

~~~json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock-agentcore:CreateCodeInterpreter",
        "bedrock-agentcore:StartCodeInterpreterSession",
        "bedrock-agentcore:InvokeCodeInterpreter",
        "bedrock-agentcore:StopCodeInterpreterSession",
        "bedrock-agentcore:DeleteCodeInterpreter",
        "bedrock-agentcore:ListCodeInterpreters",
        "bedrock-agentcore:GetCodeInterpreter"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:log-group:/aws/bedrock-agentcore/code-interpreter*"
    }
  ]
}
~~~
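You can sanity-check a policy document before attaching it by parsing it and listing the actions it grants. The sketch below is a hypothetical helper, not an AWS tool. It does plain string matching, so it does not expand wildcards such as `bedrock-agentcore:*`. It also ignores `Deny` statements and conditions. For an authoritative answer, use the IAM policy simulator.

~~~python
import json

# Abbreviated copy of the first statement of the policy above (for illustration).
POLICY_JSON = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock-agentcore:StartCodeInterpreterSession",
        "bedrock-agentcore:InvokeCodeInterpreter",
        "bedrock-agentcore:StopCodeInterpreterSession"
      ],
      "Resource": "*"
    }
  ]
}
"""

def allowed_actions(policy: dict) -> set:
    """Collect every action named in an Allow statement (hypothetical helper)."""
    actions = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        # IAM accepts "Action" as a single string or as a list of strings.
        if isinstance(action, str):
            action = [action]
        actions.update(action)
    return actions

def missing_actions(policy: dict, required: list) -> list:
    """Return the required actions that no Allow statement names verbatim."""
    granted = allowed_actions(policy)
    return [a for a in required if a not in granted]

policy = json.loads(POLICY_JSON)
session_actions = [
    "bedrock-agentcore:StartCodeInterpreterSession",
    "bedrock-agentcore:InvokeCodeInterpreter",
    "bedrock-agentcore:StopCodeInterpreterSession",
]
print(missing_actions(policy, session_actions))  # []
~~~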

## How it works

The code execution sandbox lets agents process user queries safely by creating an isolated environment with a code interpreter, a shell, and a file system. After the large language model (LLM) selects the tool to use, the code runs within this session, and the results are returned to the user or to the agent for synthesis.

![architecture local](code-interpreter.png)
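The toy below is purely local and makes the "stateful session" idea concrete. Each call runs in the same namespace, so a variable defined in one call is visible in the next. Code Interpreter sessions behave this way across executions. `ToySandbox` is hypothetical and is not how Code Interpreter is implemented. It runs code in your own Python process and offers none of the isolation the managed sandbox provides.

~~~python
import contextlib
import io

class ToySandbox:
    """Local stand-in for a stateful code session (hypothetical; NOT isolated)."""

    def __init__(self):
        # One namespace shared by every call, like a live interpreter session.
        self.namespace = {}

    def execute(self, code: str) -> dict:
        """Run `code` in the shared namespace and capture what it prints."""
        stdout = io.StringIO()
        try:
            with contextlib.redirect_stdout(stdout):
                exec(code, self.namespace)
            return {"isError": False, "stdout": stdout.getvalue()}
        except Exception as exc:
            # Report failures as data instead of raising, as a tool result would.
            return {"isError": True, "stdout": stdout.getvalue(), "stderr": repr(exc)}

sandbox = ToySandbox()
sandbox.execute("total = 2 + 3")               # first "tool call" defines a variable
result = sandbox.execute("print(total * 10)")  # second call reuses it
print(result)  # {'isError': False, 'stdout': '50\n'}
~~~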

## 1. Setting Up the Environment

First, let's import the necessary libraries and initialize our Code Interpreter client.

The default session timeout is 900 seconds (15 minutes). Because we will perform a detailed analysis of our data, we start the session with a slightly longer timeout of 1200 seconds (20 minutes).
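This small sketch shows the timeout arithmetic and estimates how much time is left in a session. It assumes the timeout is counted from the moment the session starts. Check the service documentation for the exact semantics. `session_deadline` and `seconds_remaining` are hypothetical helpers, not part of the `bedrock_agentcore` SDK.

~~~python
from datetime import datetime, timedelta, timezone

DEFAULT_TIMEOUT_S = 900    # default session timeout (15 minutes)
TUTORIAL_TIMEOUT_S = 1200  # value this tutorial passes to code_client.start() (20 minutes)

def session_deadline(started_at: datetime, timeout_s: int) -> datetime:
    """When a session started at `started_at` would time out (assumes timeout counts from start)."""
    return started_at + timedelta(seconds=timeout_s)

def seconds_remaining(started_at: datetime, timeout_s: int, now: datetime) -> float:
    """Seconds left before the timeout, clamped at zero."""
    return max(0.0, (session_deadline(started_at, timeout_s) - now).total_seconds())

start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(DEFAULT_TIMEOUT_S // 60, TUTORIAL_TIMEOUT_S // 60)   # 15 20
print(session_deadline(start, TUTORIAL_TIMEOUT_S))         # 2025-01-01 12:20:00+00:00
print(seconds_remaining(start, TUTORIAL_TIMEOUT_S, start + timedelta(minutes=18)))  # 120.0
~~~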

~~~python
!pip install --upgrade -r requirements.txt
~~~

~~~python
from bedrock_agentcore.tools.code_interpreter_client import CodeInterpreter
from strands import Agent, tool
import json
import pandas as pd
from typing import Dict, Any

# Initialize the Code Interpreter client in a supported AWS region
code_client = CodeInterpreter('us-west-2')
# Start a session with a 20-minute timeout; the call returns the session ID
code_client.start(session_timeout_seconds=1200)
~~~

~~~text
'01K01J7Z3DH445CJ0YW14BK2GY'
~~~

## 2. Reading the Local Data File

Now we'll read the contents of our sample data file. The file contains random data in four columns (Name, Preferred_City, Preferred_Animal, and Preferred_Thing) and roughly 300,000 records.

A little later, we will analyze this file with an agent to understand its distributions and outliers.

~~~python
df_data = pd.read_csv("samples/data.csv")
df_data.head()
~~~

| Unnamed: 0 | Name | Preferred_City | Preferred_Animal | Preferred_Thing |
|---|---|---|---|---|
| 0 | Betty Ramirez | Dallas | Elephant | Sofa |
| 1 | Jennifer Green | Naples | Bee | Shirt |
| 2 | John Lopez | Helsinki | Zebra | Wallet |
| 3 | Susan Gonzalez | Beijing | Chicken | Phone |
| 4 | Jennifer Wright | Buenos Aires | Goat | Wallet |
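The `Unnamed: 0` column in the preview is what pandas produces when a CSV's first header cell is empty. This usually means a row index was saved into the file. The sketch below assumes that is the case here. Passing `index_col=0` to `pd.read_csv` turns that column into the DataFrame index instead of a data column. The three rows are copied from the preview above.

~~~python
import io

import pandas as pd

# A few rows in the same layout as samples/data.csv (empty first header cell assumed).
csv_text = """,Name,Preferred_City,Preferred_Animal,Preferred_Thing
0,Betty Ramirez,Dallas,Elephant,Sofa
1,Jennifer Green,Naples,Bee,Shirt
2,John Lopez,Helsinki,Zebra,Wallet
"""

with_extra = pd.read_csv(io.StringIO(csv_text))            # keeps the saved index as a column
as_index = pd.read_csv(io.StringIO(csv_text), index_col=0)  # uses it as the row index

print(with_extra.columns.tolist())
# ['Unnamed: 0', 'Name', 'Preferred_City', 'Preferred_Animal', 'Preferred_Thing']
print(as_index.columns.tolist())
# ['Name', 'Preferred_City', 'Preferred_Animal', 'Preferred_Thing']
~~~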


~~~python
def read_file(file_path: str) -> str:
    """Helper function to read file content with error handling"""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read()
    except FileNotFoundError:
        print(f"Error: The file '{file_path}' was not found.")
        return ""
    except Exception as e:
        print(f"An error occurred: {e}")
        return ""

# Read the raw CSV text so it can be written into the sandbox
data_file_content = read_file("samples/data.csv")
~~~
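`read_file` returns an empty string on failure, so it can be worth confirming that the text is non-empty and parses as CSV before uploading it. The sketch below is a hypothetical helper built on the standard `csv` module. In the notebook you could call `summarize_csv_text(data_file_content)` and check that `rows` is greater than zero.

~~~python
import csv
import io

def summarize_csv_text(text: str) -> dict:
    """Parse CSV text in memory and report its header, data-row count, and size (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])                   # [] when the text is empty
    row_count = sum(1 for row in reader if row)  # skip blank lines
    return {"header": header, "rows": row_count, "bytes": len(text.encode("utf-8"))}

sample = ",Name,Preferred_City\n0,Betty Ramirez,Dallas\n1,Jennifer Green,Naples\n"
summary = summarize_csv_text(sample)
print(summary["header"], summary["rows"])  # ['', 'Name', 'Preferred_City'] 2
~~~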

## 3. Preparing Files for the Sandbox Environment

We'll create a structure that defines the files we want to create in the sandbox environment.

~~~python
# Each entry describes one file to create in the sandbox: its path and its text content
files_to_create = [
    {
        "path": "data.csv",
        "text": data_file_content,
    }
]
~~~
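If you have several local files, you can build the same list of `{"path", "text"}` entries in a loop. This is a minimal sketch. `build_files_payload` is a hypothetical helper, and placing each file at the sandbox root under its base name is an assumption that mirrors the single-file example above.

~~~python
import os
import tempfile

def build_files_payload(paths) -> list:
    """Build a list of {"path", "text"} entries like files_to_create (hypothetical helper)."""
    payload = []
    for local_path in paths:
        with open(local_path, "r", encoding="utf-8") as fh:
            text = fh.read()
        # Assumption: each file goes to the sandbox root under its base name.
        payload.append({"path": os.path.basename(local_path), "text": text})
    return payload

# Usage with a throwaway local file.
with tempfile.TemporaryDirectory() as tmp:
    local = os.path.join(tmp, "data.csv")
    with open(local, "w", encoding="utf-8") as fh:
        fh.write("Name\nBetty Ramirez\n")
    example_payload = build_files_payload([local])

print(example_payload)  # [{'path': 'data.csv', 'text': 'Name\nBetty Ramirez\n'}]
~~~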

## 4. Creating a Helper Function for Tool Invocation

This helper function will make it easier to call sandbox tools and handle their responses. Within an active session, you can execute code in supported languages (Python, JavaScript), access libraries based on your dependencies configuration, generate visualizations, and maintain state between executions.

~~~python
def call_tool(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Helper function to invoke sandbox tools

    Args:
        tool_name (str): Name of the tool to invoke
        arguments (Dict[str, Any]): Arguments to pass to the tool

    Returns:
        str: JSON-formatted result of the first event in the response stream
    """
    response = code_client.invoke(tool_name, arguments)
    # The result arrives as stream events; return the first one serialized as JSON
    for event in response["stream"]:
        return json.dumps(event["result"])
~~~
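Because `call_tool` returns a JSON string, callers usually parse it before acting on the result. The sketch below is a hypothetical helper. It assumes the result shape visible in this tutorial's outputs: an `isError` flag plus a `content` list whose `text` items carry a `text` field. The sample string is the `writeFiles` result printed later in this tutorial.

~~~python
import json
from typing import Any, Dict

def parse_tool_result(result_json: str) -> Dict[str, Any]:
    """Summarize a call_tool() result as {"ok": bool, "text": str} (hypothetical helper)."""
    result = json.loads(result_json)
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    return {"ok": not result.get("isError", False), "text": "\n".join(texts)}

sample = '{"content": [{"type": "text", "text": "Successfully wrote all 1 files"}], "isError": false}'
print(parse_tool_result(sample))  # {'ok': True, 'text': 'Successfully wrote all 1 files'}
~~~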

## 5. Write the Data File to the Code Sandbox

Now we'll write our data file into the sandbox environment and verify that it was created successfully.

~~~python
# Write files to sandbox
writing_files = call_tool("writeFiles", {"content": files_to_create})
print("Writing files result:")
print(writing_files)

# Verify files were created
listing_files = call_tool("listFiles", {"path": ""})
print("\nFiles in sandbox:")
print(listing_files)
~~~

~~~text
Writing files result:
{"content": [{"type": "text", "text": "Successfully wrote all 1 files"}], "isError": false}

Files in sandbox:
{"content": [{"type": "resource_link", "uri": "file:///log", "name": "log", "description": "Directory"}, {"type": "resource_link", "mimeType": "text/csv", "uri": "file:///data.csv", "name": "data.csv", "description": "File"}, {"type": "resource_link", "uri": "file:///.ipython", "name": ".ipython", "description": "Directory"}], "isError": false}
~~~
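To turn the listing into a programmatic check instead of eyeballing it, you can pull out only the regular files. This is a hypothetical helper. It relies only on the fields visible in the `listFiles` output above, where `resource_link` entries are marked `File` or `Directory` in their `description`. In the notebook, `assert "data.csv" in sandbox_file_names(listing_files)` would confirm the upload.

~~~python
import json

def sandbox_file_names(list_result_json: str) -> list:
    """Extract regular-file names from a listFiles result (hypothetical helper)."""
    result = json.loads(list_result_json)
    return [
        item["name"]
        for item in result.get("content", [])
        if item.get("type") == "resource_link" and item.get("description") == "File"
    ]

listing = (
    '{"content": [{"type": "resource_link", "uri": "file:///log", "name": "log", "description": "Directory"}, '
    '{"type": "resource_link", "mimeType": "text/csv", "uri": "file:///data.csv", "name": "data.csv", "description": "File"}, '
    '{"type": "resource_link", "uri": "file:///.ipython", "name": ".ipython", "description": "Directory"}], "isError": false}'
)
print(sandbox_file_names(listing))  # ['data.csv']
~~~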


## 6. Perform Advanced Analysis Using a Strands-Based Agent

Now we will configure an agent to perform data analysis on the data file that we uploaded into the sandbox above.

### 6.1 System Prompt Definition
Define the behavior and capabilities of the AI assistant. We instruct our assistant to always validate its answers through code execution and data-based reasoning.

~~~python
SYSTEM_PROMPT = """You are a helpful AI assistant that validates all answers through code execution using the tools provided. DO NOT Answer questions without using the tools

VALIDATION PRINCIPLES:
1. When making claims about code, algorithms, or calculations - write code to verify them
2. Use execute_python to test mathematical calculations, algorithms, and logic
3. Create test scripts to validate your understanding before giving answers
4. Always show your work with actual code execution
5. If uncertain, explicitly state limitations and validate what you can

APPROACH:
- If asked about a programming concept, implement it in code to demonstrate
- If asked for calculations, compute them programmatically AND show the code
- If implementing algorithms, include test cases to prove correctness
- Document your validation process for transparency
- The sandbox maintains state between executions, so you can refer to previous results

TOOL AVAILABLE:
- execute_python: Run Python code and see output

RESPONSE FORMAT: The execute_python tool returns a JSON response with:
- sessionId: The sandbox session ID
- id: Request ID
- isError: Boolean indicating if there was an error
- content: Array of content objects with type and text/data
- structuredContent: For code execution, includes stdout, stderr, exitCode, executionTime

For successful code execution, the output will be in content[0].text and also in structuredContent.stdout.
Check isError field to see if there was an error.

Be thorough, accurate, and always validate your answers when possible."""
~~~
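The response format described in the prompt can also be consumed directly in your own code. The sketch below is a hypothetical helper that reads `structuredContent` using only the field names listed in `SYSTEM_PROMPT`. The `example` dict is hand-written to match that description and is not real service output.

~~~python
from typing import Any, Dict

def summarize_execution(result: Dict[str, Any]) -> str:
    """Return stdout on success, or a short error line (hypothetical helper)."""
    structured = result.get("structuredContent", {})
    if result.get("isError") or structured.get("exitCode", 0) != 0:
        return f"error (exit {structured.get('exitCode')}): {structured.get('stderr', '')}".strip()
    return structured.get("stdout", "")

# Hand-written example shaped like the description above; NOT real service output.
example = {
    "isError": False,
    "content": [{"type": "text", "text": "42\n"}],
    "structuredContent": {"stdout": "42\n", "stderr": "", "exitCode": 0, "executionTime": 0.01},
}
print(repr(summarize_execution(example)))  # '42\n'
~~~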

### 6.2 Code Execution Tool Definition
Next, we define the function that the agent will use as a tool to run code in the code sandbox. The `@tool` decorator marks the function as a custom tool for the agent.


~~~python
# Define and configure the code interpreter tool
@tool
def execute_python(code: str, description: str = "") -> str:
    """Execute Python code in the sandbox.

    Args:
        code: The Python code to execute
        description: Optional short description of what the code does
    """

    if description:
        code = f"# {description}\n{code}"

    # Print the generated code before executing it
    print(f"\n Generated Code: {code}")

    # Call the invoke method to execute the generated code within the initialized code interpreter session
    response = code_client.invoke("executeCode", {
        "code": code,
        "language": "python",
        "clearContext": False
    })
    for event in response["stream"]:
        return json.dumps(event["result"])
~~~
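The arguments the tool sends to `executeCode` can be built by a small pure function, which makes them easy to inspect and test without a live session. This is a hypothetical refactor that uses the same keys as `execute_python` above. It exposes `clearContext` as a parameter. Judging by the name, `True` presumably starts from a clean interpreter state, whereas the tutorial keeps `False` so that variables persist between calls. Confirm the exact semantics in the service documentation.

~~~python
from typing import Any, Dict

def build_execute_payload(code: str, description: str = "", clear_context: bool = False) -> Dict[str, Any]:
    """Build the executeCode arguments used by execute_python (hypothetical helper)."""
    if description:
        # Same behavior as the tool: prepend the description as a comment line.
        code = f"# {description}\n{code}"
    return {"code": code, "language": "python", "clearContext": clear_context}

payload = build_execute_payload("print(df.shape)", description="Check dataset size")
print(payload["code"])
# prints:
# # Check dataset size
# print(df.shape)
~~~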

### 6.3 Agent Configuration
We create and configure an agent using the Strands SDK, providing it with the system prompt and the tool defined above so it can execute the generated code.

~~~python
# Configure the Strands agent, including the tool(s)
agent = Agent(
    tools=[execute_python],
    system_prompt=SYSTEM_PROMPT,
    callback_handler=None,
)
~~~

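## 7. Agent Invocation and Response Processing
We invoke the agent with our query and process its streamed response.

Note: The cells below use `async for` at the top level. This works in Jupyter because the notebook already runs an event loop. In a plain Python script, wrap the loop in an `async def` function and run it with `asyncio.run()`.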
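This sketch shows the script-side pattern. `collect_text` accumulates the `"data"` chunks the same way the notebook cells do. `fake_stream` is a hypothetical stand-in for `agent.stream_async(query)`, so the example runs without AWS access. With the real agent you would pass `agent.stream_async(query)` instead. `asyncio.run()` raises an error if an event loop is already running, as in Jupyter, which is why the notebook uses top-level `async for`.

~~~python
import asyncio
from typing import AsyncIterator, Dict

async def fake_stream(chunks) -> AsyncIterator[Dict[str, str]]:
    """Hypothetical stand-in for agent.stream_async(query): yields events with a "data" key."""
    for chunk in chunks:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield {"data": chunk}
    yield {"event": "illustrative non-text event"}  # ignored by collect_text

async def collect_text(stream) -> str:
    """Accumulate streamed text chunks, skipping events without a "data" key."""
    text = ""
    async for event in stream:
        if "data" in event:
            text += event["data"]
    return text

# In a plain script (no running event loop), drive the coroutine with asyncio.run().
full_text = asyncio.run(collect_text(fake_stream(["Hello, ", "world"])))
print(full_text)  # Hello, world
~~~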

### 7.1 Query to Perform Exploratory Data Analysis (EDA)

Let's start with a query that instructs the agent to perform exploratory data analysis on the data file in the code sandbox environment.

~~~python
query = "Perform exploratory data analysis(EDA) on the file 'data.csv'. Tell me about distributions and outlier values."

# Invoke the agent asynchronously and stream the response
response_text = ""
async for event in agent.stream_async(query):
    if "data" in event:
        # Stream text response
        chunk = event["data"]
        response_text += chunk
        print(chunk, end="")
~~~

~~~text
I'll perform an exploratory data analysis (EDA) on the data.csv file to examine distributions and identify any outliers. Let me work through this step by step.I see we're missing some libraries. Let me try without seaborn and use the standard libraries:Now let's analyze each categorical column in more detail:Let's check for patterns in preferences and analyze distributions visually:Let's check for outliers and get more detailed statistics:Let's further explore interesting patterns and relationships in the data:Based on the comprehensive exploratory data analysis (EDA) of the data.csv file, here's a summary of the findings:

## 1. Dataset Overview

- **Size**: The dataset contains 299,130 records with 4 columns: Name, Preferred_City, Preferred_Animal, and Preferred_Thing.
- **Data types**: All columns contain categorical (object) data.
- **Unique values**:
  - 1,722 unique names (combinations of first and last names)
  - 55 unique cities
  - 50 unique animals
  - 51 unique things

## 2.
~~~
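To check the agent's EDA yourself, note that every column is categorical. One simple outlier heuristic for such data is to flag categories whose share of rows falls below a threshold. The sketch below shows that idea. It is one possible approach, not necessarily what the agent did, and the 5% threshold and the toy column are arbitrary and for illustration only. On the real data you could call, for example, `rare_categories(df_data["Preferred_City"], 0.01)`.

~~~python
import pandas as pd

def rare_categories(series: pd.Series, min_share: float) -> pd.Series:
    """Return the categories whose share of rows is below `min_share`."""
    shares = series.value_counts(normalize=True)  # fraction of rows per category
    return shares[shares < min_share]

# Toy column for illustration only (not the tutorial's data).
animals = pd.Series(["Bee"] * 45 + ["Goat"] * 40 + ["Zebra"] * 13 + ["Crocodile"] * 2)
print(animals.nunique())                         # 4
print(rare_categories(animals, min_share=0.05))  # only Crocodile, at a 0.02 share
~~~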

### 7.2 Query to Extract Information

Now, let's instruct the agent to extract specific information from the data file in the code sandbox environment.

~~~python
query = "Within the file 'data.csv', how many individuals with the first name 'Kimberly' have 'Crocodile' as their favourite animal?"

# Invoke the agent asynchronously and stream the response
response_text = ""
async for event in agent.stream_async(query):
    if "data" in event:
        # Stream text response
        chunk = event["data"]
        response_text += chunk
        print(chunk, end="")
~~~

~~~text
I'll help you analyze the data.csv file to find individuals named 'Kimberly' who have 'Crocodile' as their favorite animal. Let me use Python to check this data file.
 Generated Code: # Checking if data.csv exists and examining its structure
# First, let's check if the file exists
import os
import pandas as pd

# Check if the file exists
if os.path.exists('data.csv'):
    print(f"File 'data.csv' exists. Reading the file...")
    # Try to read the CSV file
    try:
        df = pd.read_csv('data.csv')
        print("File successfully loaded. Here's a preview:")
        print(df.head())
        print(f"\nTotal rows in the file: {len(df)}")
        print(f"Columns in the file: {df.columns.tolist()}")
    except Exception as e:
        print(f"Error reading the file: {e}")
else:
    print(f"File 'data.csv' does not exist in the current directory.")
    print(f"Current directory contents: {os.listdir()}")
Great! The data.csv file exists and has been successfully loaded. Now I can see it has
~~~
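Because the local `df_data` DataFrame holds the same file, you can compute this answer independently and compare it with the agent's. The sketch below assumes the `Name` column holds "First Last" strings, as in the preview earlier, and matches the first name exactly and case-sensitively. `count_first_name_with_animal` is a hypothetical helper, and the toy rows are for illustration only. On the real data you would call `count_first_name_with_animal(df_data, "Kimberly", "Crocodile")`.

~~~python
import pandas as pd

def count_first_name_with_animal(df: pd.DataFrame, first_name: str, animal: str) -> int:
    """Count rows whose first name equals `first_name` and whose Preferred_Animal equals `animal`."""
    first = df["Name"].str.split(n=1).str[0]  # first whitespace-separated token
    mask = (first == first_name) & (df["Preferred_Animal"] == animal)
    return int(mask.sum())

# Toy rows for illustration only (not the tutorial's data).
toy = pd.DataFrame({
    "Name": ["Kimberly Lee", "Kimberly Hall", "Kim Berly", "Susan Gonzalez"],
    "Preferred_Animal": ["Crocodile", "Bee", "Crocodile", "Crocodile"],
})
print(count_first_name_with_animal(toy, "Kimberly", "Crocodile"))  # 1
~~~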

## 8. Cleanup

Finally, we clean up by stopping the Code Interpreter session. When you have finished using a session, stop it to release resources and avoid unnecessary charges.

~~~python
# Stop the Code Interpreter session
code_client.stop()
print("Code Interpreter session stopped successfully!")
~~~

~~~text
Code Interpreter session stopped successfully!
~~~
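In a script, you can guarantee that the session is stopped even when the analysis raises an exception by wrapping the work in `try`/`finally`. This is a minimal sketch. `run_in_session` is a hypothetical helper that relies only on the `start(session_timeout_seconds=...)` and `stop()` calls used in this tutorial. `FakeClient` is a stand-in so the example runs without AWS. With the real client, you would pass `code_client` instead.

~~~python
class FakeClient:
    """Hypothetical stand-in exposing the start()/stop() methods used in this tutorial."""

    def __init__(self):
        self.calls = []

    def start(self, session_timeout_seconds: int = 900):
        self.calls.append(("start", session_timeout_seconds))

    def stop(self):
        self.calls.append(("stop",))

def run_in_session(client, work, session_timeout_seconds: int = 1200):
    """Start a session, run work(client), and always stop the session afterwards."""
    client.start(session_timeout_seconds=session_timeout_seconds)
    try:
        return work(client)
    finally:
        client.stop()  # runs even if work() raises, so the session is not left running

fake = FakeClient()
try:
    run_in_session(fake, lambda c: 1 / 0)  # simulate a failing analysis step
except ZeroDivisionError:
    pass
print(fake.calls)  # [('start', 1200), ('stop',)]
~~~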
