## Data Analysis Agent using FireworksAI + Amazon Bedrock AgentCore Code Interpreter

This tutorial demonstrates how to create an AI agent that performs advanced data analysis through code execution using Python. We use Amazon Bedrock AgentCore Code Interpreter to run code that is generated by the LLM running on FireworksAI.

This tutorial is an adaptation of the [AgentCore data analysis tutorial](https://github.com/awslabs/amazon-bedrock-agentcore-samples/blob/main/01-tutorials/05-AgentCore-tools/01-Agent-Core-code-interpreter/03-advanced-data-analysis-with-agent-using-code-interpreter/strands-agent-advanced-data-analysis-code-interpreter.ipynb).

We will demonstrate how to use Amazon Bedrock AgentCore Code Interpreter to:
1. Set up a sandbox environment
2. Configure a Strands-based agent that performs advanced data analysis by generating code based on the user query
3. Run top OSS coding models on FireworksAI (Qwen 3 Coder, DeepSeek, Kimi, etc.)
4. Execute code in a sandbox environment using Code Interpreter
5. Display the results back to the user

## Prerequisites
- An AWS account with Bedrock AgentCore Code Interpreter access
- IAM permissions to create and manage Code Interpreter resources
- Required Python packages installed (including boto3, bedrock-agentcore, and strands)
- An IAM role with permissions to invoke models on Amazon Bedrock
- A FireworksAI API key; if you don't have one, get one [here](https://app.fireworks.ai/settings/users/api-keys)

## Your IAM execution role should have the following IAM policy attached

~~~json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock-agentcore:CreateCodeInterpreter",
                "bedrock-agentcore:StartCodeInterpreterSession",
                "bedrock-agentcore:InvokeCodeInterpreter",
                "bedrock-agentcore:StopCodeInterpreterSession",
                "bedrock-agentcore:DeleteCodeInterpreter",
                "bedrock-agentcore:ListCodeInterpreters",
                "bedrock-agentcore:GetCodeInterpreter"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:log-group:/aws/bedrock-agentcore/code-interpreter*"
        }
    ]
}
~~~

## How it works

The code execution sandbox lets agents process user queries safely by creating an isolated environment with a code interpreter, shell, and file system. After the Large Language Model selects a tool, the generated code is executed within this session, and the result is returned to the user or the agent for synthesis.

![architecture local](images/code-interpreter.png)

## 1. Setting Up the Environment

First, let's import the necessary libraries and initialize our Code Interpreter client.

The default session timeout is 900 seconds (15 minutes). Here we start the session with a slightly longer timeout of 1200 seconds (20 minutes), since we will perform detailed analysis on our data.

In [1]:
!make setup

Setting up local environment...
'uv' is already installed.
Virtual environment already exists.
Installing dependencies...
uv pip install -r requirements.txt
Audited 13 packages in 86ms


In [46]:
from bedrock_agentcore.tools.code_interpreter_client import CodeInterpreter
from strands import Agent, tool
import json
from typing import Dict, Any
from strands.models.openai import OpenAIModel
from dotenv import load_dotenv
import os

load_dotenv()

FIREWORKS_API_KEY = os.getenv("FIREWORKS_API_KEY")

assert FIREWORKS_API_KEY is not None, "FIREWORKS_API_KEY not found in environment variables"

# Initialize the Code Interpreter within a supported AWS region.
code_client = CodeInterpreter('us-west-2')
code_client.start(session_timeout_seconds=1200)

'01K6CE9JXACRSTCVNDDHV0GMA8'
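The cell above checks for the API key with `assert`, which Python skips when run with the `-O` flag. If this code is moved out of the notebook, an explicit check may be safer. The sketch below is one way to do that; `require_env` is a hypothetical helper, not part of any library used here.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both "not set" and "set to an empty string"
        raise RuntimeError(f"{name} is not set; add it to your environment or .env file")
    return value
```

With this in place, `FIREWORKS_API_KEY = require_env("FIREWORKS_API_KEY")` would replace the `os.getenv` call and the `assert`.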

## 2. Downloading data from Kaggle

We will use an open-source dataset from Kaggle that contains GDP by country for the years 2020-2025. The dataset is available [here](https://www.kaggle.com/datasets/codebynadiia/gdp-per-country-20202025).

In [3]:
import kagglehub
from kagglehub import KaggleDatasetAdapter

df = kagglehub.dataset_load(
  KaggleDatasetAdapter.PANDAS,
  handle="codebynadiia/gdp-per-country-20202025",
  path="2020-2025.csv"
)

df.to_csv("data/gdp_data.csv", index=False)

# Drop NaN values to keep things clean
df = df.dropna()

print(f"Dataset schema {df.columns}")
print()
print("First 5 records:", df.head())

Dataset schema Index(['Country', '2020', '2021', '2022', '2023', '2024', '2025'], dtype='object')

First 5 records:                Country    2020      2021      2022      2023      2024  \
1              Albania   15271   18086.0   19185.0   23388.0   27259.0   
2              Algeria  164774  185850.0  225709.0  247789.0  264913.0   
3              Andorra    2885    3325.0    3376.0    3786.0    4038.0   
4               Angola   66521   84375.0  142442.0  109764.0  115946.0   
5  Antigua and Barbuda    1412    1602.0    1867.0    2006.0    2225.0   

       2025  
1   28372.0  
2  268885.0  
3    4035.0  
4  113343.0  
5    2373.0  
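Note that the CSV is written before `dropna()` is called, so the copy that later goes into the sandbox can still contain rows with missing values (the preview above starts at index 1, which suggests row 0 was dropped locally). A minimal sketch for spotting such rows with the standard `csv` module follows. The sample text is made up for illustration and `rows_with_missing` is a hypothetical helper.

```python
import csv
import io
from typing import List


def rows_with_missing(csv_text: str) -> List[str]:
    """Return the first-column value of every row that has an empty or absent field."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    flagged = []
    for row in reader:
        if not row:
            continue  # Ignore completely blank lines
        if len(row) < len(header) or any(cell.strip() == "" for cell in row):
            flagged.append(row[0])
    return flagged


# Made-up sample, not real dataset values
sample_csv = "Country,2020,2021\nA,1,2\nB,3,\nC,5,6\n"
print(rows_with_missing(sample_csv))  # ['B']
```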


## 3. Preparing Files for Sandbox Environment

We'll create a structure that defines the files we want to create in the sandbox environment.

In [4]:
def read_file(file_path: str) -> str:
    """Helper function to read file content with error handling"""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read()
    except FileNotFoundError:
        print(f"Error: The file '{file_path}' was not found.")
        return ""
    except Exception as e:
        print(f"An error occurred: {e}")
        return ""


In [5]:
files_to_create = [
                {
                    "path": "data/gdp_data.csv",
                    "text": read_file("data/gdp_data.csv")
                }]
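If more than one file needs to go into the sandbox, the same structure can be built in a loop. The sketch below is illustrative only: `build_file_entries` is a hypothetical helper, not part of the AgentCore SDK, and it assumes the `path` and `text` keys shown above are all that is needed per file.

```python
from pathlib import Path
from typing import Dict, List


def build_file_entries(paths: List[str]) -> List[Dict[str, str]]:
    """Build {"path", "text"} entries for every file that exists and is non-empty."""
    entries = []
    for p in paths:
        file_path = Path(p)
        if not file_path.is_file():
            # Skip missing files instead of uploading an empty placeholder
            print(f"Skipping missing file: {p}")
            continue
        text = file_path.read_text(encoding="utf-8")
        if text:
            entries.append({"path": p, "text": text})
    return entries
```

Unlike `read_file`, which returns an empty string on failure, this variant leaves unreadable files out of the list, so an empty file is never written to the sandbox by accident.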

## 4. Creating Helper Function for Tool Invocation

This helper function will make it easier to call sandbox tools and handle their responses. Within an active session, you can execute code in supported languages (Python, JavaScript), access libraries based on your dependencies configuration, generate visualizations, and maintain state between executions.

In [6]:
def call_tool(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Helper function to invoke sandbox tools

    Args:
        tool_name (str): Name of the tool to invoke
        arguments (Dict[str, Any]): Arguments to pass to the tool

    Returns:
        str: JSON-formatted result of the first event in the response stream
    """
    response = code_client.invoke(tool_name, arguments)
    for event in response["stream"]:
        return json.dumps(event["result"])
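`call_tool` returns the first stream event and silently returns `None` if the stream is empty. A slightly more defensive variant might look like the sketch below. It assumes the response shape seen in this tutorial's outputs (a `stream` of events, each carrying a `result` dict with `content` and `isError`). `first_result` and `SandboxToolError` are hypothetical names, and the function takes the response as an argument so it can be tried without a live session.

```python
from typing import Any, Dict


class SandboxToolError(RuntimeError):
    """Raised when a sandbox tool reports a failure (hypothetical helper type)."""


def first_result(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the first `result` payload from a tool response, or raise."""
    for event in response.get("stream", []):
        result = event.get("result")
        if result is None:
            continue  # Skip events that carry no result payload
        if result.get("isError"):
            raise SandboxToolError(f"Tool call failed: {result.get('content')}")
        return result
    raise SandboxToolError("Tool call returned no result events")


# Hand-built response that mimics the output shape shown in this tutorial
fake_response = {
    "stream": [
        {
            "result": {
                "content": [{"type": "text", "text": "Successfully wrote all 1 files"}],
                "isError": False,
            }
        }
    ]
}
print(first_result(fake_response)["content"][0]["text"])
```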

## 5. Write data file to Code Sandbox

Now we'll write our data file into the sandbox environment and verify that it was created successfully.

In [7]:
# Write files to sandbox
writing_files = call_tool("writeFiles", {"content": files_to_create})
print("Writing files result:")
print(writing_files)

# Verify files were created
listing_files = call_tool("listFiles", {"path": ""})
print("\nFiles in sandbox:")
print(listing_files)

Writing files result:
{"content": [{"type": "text", "text": "Successfully wrote all 1 files"}], "isError": false}

Files in sandbox:
{"content": [{"type": "resource_link", "uri": "file:///log", "name": "log", "description": "Directory"}, {"type": "resource_link", "uri": "file:///data", "name": "data", "description": "Directory"}, {"type": "resource_link", "uri": "file:///.ipython", "name": ".ipython", "description": "Directory"}], "isError": false}
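The listing comes back as a JSON string, so the entry names can be pulled out with the standard `json` module. The sketch below uses a shortened copy of the output above as sample input; `entry_names` is a hypothetical helper.

```python
import json
from typing import List


def entry_names(listing_json: str) -> List[str]:
    """Extract the `name` of every resource_link entry in a listFiles result."""
    listing = json.loads(listing_json)
    return [
        item["name"]
        for item in listing.get("content", [])
        if item.get("type") == "resource_link"
    ]


# Shortened copy of the listing printed above
sample = json.dumps({
    "content": [
        {"type": "resource_link", "uri": "file:///log", "name": "log", "description": "Directory"},
        {"type": "resource_link", "uri": "file:///data", "name": "data", "description": "Directory"},
    ],
    "isError": False,
})
print(entry_names(sample))  # ['log', 'data']
```

Seeing `data` in this list indicates that the upload landed under the `data/` directory, so code running in the sandbox would need to read `data/gdp_data.csv` rather than `gdp_data.csv`.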


## 6. Perform Advanced Analysis using Strands based Agent

Now we will configure an agent to perform data analysis on the data file that we uploaded into the sandbox above.

### 6.1 System Prompt Definition
Define the behavior and capabilities of the AI assistant. We instruct our assistant to always validate answers through code execution and data-based reasoning.

In [8]:
from constants import DATA_SCIENTIST_SYSTEM_PROMPT
print(DATA_SCIENTIST_SYSTEM_PROMPT)


    You are an expert data analysis AI assistant specializing in economic and statistical analysis. You have access to a GDP dataset containing country-level data from 2020-2025 with columns: 'Country', '2020', '2021', '2022', '2023', '2024', '2025'.
                    
    You MUST validate all answers through code execution using the tools provided. DO NOT answer questions without using the tools.
    
    DATA ANALYSIS PRINCIPLES:
    1. Always load and examine the dataset before answering questions
    2. Verify all statistical calculations, trends, and comparisons through code
    3. Use pandas for data manipulation and analysis, and matplotlib for data visualization
    4. Create visualizations when helpful to illustrate findings
    5. Show your analytical work with actual code execution
    6. Validate data quality and handle missing values appropriately
    
    VALIDATION PRINCIPLES:
    1. When making claims about calculations or trends - write code to verify them
    2. U

### 6.2 Code Execution Tool Definition
Next we define the function that the agent will use as a tool to run code in the code sandbox. We use the `@tool` decorator to annotate the function as a custom tool for the agent.

Within an active code interpreter session, you can execute code in supported languages (Python, JavaScript), access libraries based on your dependencies configuration, generate visualizations, and maintain state between executions.

In [9]:
#Define and configure the code interpreter tool
@tool
def execute_python(code: str, description: str = "") -> str:
    """Execute Python code in the sandbox."""

    if description:
        code = f"# {description}\n{code}"

    # Print the generated code before executing it
    print(f"\n Generated Code: {code}")

    # Invoke executeCode within the already-started code interpreter session
    response = code_client.invoke("executeCode", {
        "code": code,
        "language": "python",
        "clearContext": False
    })
    for event in response["stream"]:
        return json.dumps(event["result"])
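One small caveat in the tool above: if the model passes a multi-line `description`, only the first line ends up commented and the remaining lines are executed as code. A possible fix, sketched here with a hypothetical helper `with_description`, is to comment every line of the description.

```python
def with_description(code: str, description: str = "") -> str:
    """Prefix `code` with `description`, commenting out every description line."""
    if not description.strip():
        return code
    header = "\n".join(f"# {line}" for line in description.strip().splitlines())
    return f"{header}\n{code}"


print(with_description("print(1 + 1)", "Add two numbers\nand print the result"))
```

Inside `execute_python`, the `if description:` branch could then be replaced with `code = with_description(code, description)`.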

### 6.3 Agent Configuration
We create and configure an agent using the Strands SDK. We provide it with the system prompt and the tool we defined above to execute the generated code.

We use [Qwen 3 Coder 480B](https://app.fireworks.ai/models/fireworks/qwen3-coder-480b-a35b-instruct), a state-of-the-art (SOTA) open-source model from the Qwen family.

In [10]:
model = OpenAIModel(
    client_args={
        "api_key": FIREWORKS_API_KEY,
        "base_url": "https://api.fireworks.ai/inference/v1",
    },
    model_id="accounts/fireworks/models/qwen3-coder-480b-a35b-instruct",
    params={
        "max_tokens": 5000,
        "temperature": 0.0,
    }
)

agent = Agent(
    model=model,
    tools=[execute_python],
    system_prompt=DATA_SCIENTIST_SYSTEM_PROMPT,
    callback_handler=None
)

## 7. Agent Invocation and Response Processing
We invoke the agent with our query and process the agent's response.


Note: The `async for` loop used here requires an environment with a running event loop, such as a Jupyter notebook.
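Notebooks already run an event loop, which is why a cell can use `async for` at the top level. In a plain Python script the same loop has to be wrapped in a coroutine and driven with `asyncio.run`. The sketch below shows the pattern with a stand-in async generator (`fake_stream`, hypothetical) in place of `agent.stream_async`, so that it runs without credentials.

```python
import asyncio
from typing import AsyncIterator, Callable, Dict


async def fake_stream(query: str) -> AsyncIterator[Dict[str, str]]:
    """Stand-in for agent.stream_async: yields events shaped like {"data": chunk}."""
    for chunk in ["Loading ", "the ", "dataset..."]:
        yield {"data": chunk}
    yield {"other": "non-text event"}  # Events without "data" are ignored below


async def collect_text(
    stream_fn: Callable[[str], AsyncIterator[Dict[str, str]]], query: str
) -> str:
    """Consume a stream of events and concatenate the text chunks."""
    response_text = ""
    async for event in stream_fn(query):
        if "data" in event:
            response_text += event["data"]
    return response_text


if __name__ == "__main__":
    # In a real script, pass agent.stream_async instead of fake_stream
    print(asyncio.run(collect_text(fake_stream, "Describe the dataset")))
```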

### 7.1 Query to perform Exploratory Data Analysis (EDA)

Let's start with a query that instructs the agent to perform exploratory data analysis on the data file in the code sandbox environment.

In [11]:
query = ("Load the file 'gdp_data.csv' and perform some simple exploratory data analysis (EDA) on it. Tell me about distributions and outlier values. "
         "Prepare a short final report with your findings.")

# Invoke the agent asynchronously and stream the response
response_text = ""
async for event in agent.stream_async(query):
    if "data" in event:
        # Stream text response
        chunk = event["data"]
        response_text += chunk
        print(chunk, end="")

I'll help you load and analyze the GDP dataset. Let me start by loading the file and performing exploratory data analysis.


 Generated Code: # Loading the GDP dataset and displaying basic information
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the GDP dataset
df = pd.read_csv('gdp_data.csv')

# Display basic information about the dataset
print("Dataset Info:")
print(df.info())
print("\nFirst few rows:")
print(df.head())
print("\nDataset shape:", df.shape)
I apologize for the error. It seems the file 'gdp_data.csv' doesn't exist in the current directory. Let me check what files are available for us to work with.


 Generated Code: # Checking available files in the current directory
import os

# List all files in the current directory
files = os.listdir('.')
print("Available files:")
for file in files:
    print(file)
I see that there's a 'data' directory. Let me check what's inside that directory, as the GDP data might be stored there.


 Generated Co

Let's double-check that these insights are correct and not hallucinated.

In [22]:
# Calculate year-over-year growth (in percent) for each country
for year in range(2021, 2026):
    prev = str(year - 1)
    df[f"growth_{year}"] = 100 * (df[str(year)] - df[prev]) / df[prev]
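The loop above applies the usual percentage-change formula, growth = 100 × (current − previous) / previous. As a quick sanity check on a single row, here it is in plain Python for Albania, using the 2024 and 2025 values from the preview printed earlier. `pct_growth` is a hypothetical helper.

```python
def pct_growth(previous: float, current: float) -> float:
    """Year-over-year growth in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return 100 * (current - previous) / previous


# Albania, from the first rows of the dataset: 2024 = 27259.0, 2025 = 28372.0
albania_2025 = pct_growth(27259.0, 28372.0)
print(f"Albania growth 2025: {albania_2025:.2f}%")  # Albania growth 2025: 4.08%
```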

### Double-check the agent's work

In [35]:
print(f'Countries with the highest growth in 2025:\n {df.sort_values(by="growth_2025", ascending=False).loc[:, ["Country", "growth_2025"]].head(3)}')
print()
print(f'Countries with the largest contraction in 2025:\n {df.sort_values(by="growth_2025", ascending=True).loc[:, ["Country", "growth_2025"]].head(3)}')

Countries with the highest growth in 2025:
      Country  growth_2025
28   Burundi    42.209572
72     Haiti    27.904228
104   Malawi    18.326693

Countries with the largest contraction in 2025:
          Country  growth_2025
159  South Sudan   -26.276968
57      Ethiopia   -17.932827
79          Iran   -15.034994


## 8. Cleanup

Finally, we'll clean up by stopping the Code Interpreter session. Once you have finished using a session, it should be stopped to release resources and avoid unnecessary charges.

In [47]:
# Stop the Code Interpreter session
code_client.stop()
print("Code Interpreter session stopped successfully!")

Code Interpreter session stopped successfully!
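If a cell fails before the cleanup step runs, the session stays open until it times out. One way to guard against that is to wrap the session in a context manager so that `stop()` always runs. This is a sketch, not part of the SDK: `sandbox_session` is a hypothetical helper that relies only on the `start(session_timeout_seconds=...)` and `stop()` calls used in this tutorial, and `FakeClient` exists purely to show the behavior offline.

```python
from contextlib import contextmanager


@contextmanager
def sandbox_session(client, timeout_seconds: int = 1200):
    """Start a session on `client` and always stop it, even if the body raises."""
    client.start(session_timeout_seconds=timeout_seconds)
    try:
        yield client
    finally:
        client.stop()


class FakeClient:
    """Offline stand-in that records calls; swap in CodeInterpreter('us-west-2')."""

    def __init__(self):
        self.calls = []

    def start(self, session_timeout_seconds: int):
        self.calls.append(("start", session_timeout_seconds))

    def stop(self):
        self.calls.append(("stop",))


fake = FakeClient()
try:
    with sandbox_session(fake) as session:
        raise RuntimeError("analysis failed")  # Simulate a failing analysis step
except RuntimeError:
    pass
print(fake.calls)  # [('start', 1200), ('stop',)]
```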
