In [33]:
import pandas as pd
import plotly.express as px
import os
from string import Template
from smolagents import CodeAgent, OpenAIServerModel


agent = CodeAgent(
    tools=[], # Tools are not explicitly needed as CodeAgent can use python_interpreter by default
    model=OpenAIServerModel(model_id="gpt-4o", api_key=os.environ["OPENAI_API_KEY"]),  # Read the key from the environment; never hard-code secrets in a notebook
    additional_authorized_imports=["pandas", "plotly"],
    verbosity_level=1
)

SYSTEM_PROMPT = Template("""
I want you to analyse this question ${question} using python with Pandas and Plotly. You are provided with a dataset balance_data_2020_2023.csv containing financial fundamentals of US companies from 2020 to 2023. All columns in the dataset are fully populated with no missing values (non-NaN).
Your job is to answer user questions and create graphs. If possible generate plotly graphs and write them as html file. Never show the html graph – you are unable to display them in the environment in which you execute code. All the information you need in regards to the dataset, including all columns follows:

**Dataset Overview**:
- **Country**: United States
- **Years Covered**: 2020, 2021, 2022, 2023

**Dataset Columns**:

1. **Ticker** (`string`): The stock ticker symbol representing each company.
2. **Fiscal Year** (`integer`): The fiscal year for which the data is reported.
3. **Report Date** (`datetime`): The date when the fiscal year report was filed.
4. **Revenue** (`float`): Total revenue generated by the company in the fiscal year.
5. **Cost of Revenue** (`float`): Direct costs attributable to the production of the goods sold by the company.
6. **Gross Profit** (`float`): Revenue minus the Cost of Revenue.
7. **Operating Expenses** (`float`): Expenses incurred through normal business operations.
8. **Selling, General & Administrative** (`float`): Combined costs related to selling, general, and administrative functions.
9. **Research & Development** (`float`): Costs associated with research and development activities.
10. **Depreciation & Amortization** (`float`): Non-cash expenses related to the reduction in value of assets.
11. **Non-Operating Income (Loss)** (`float`): Income or loss from non-core business activities.
12. **Interest Expense, Net** (`float`): Net interest expenses after accounting for interest income.
13. **Pretax Income (Loss), Adj.** (`float`): Income before taxes, adjusted for extraordinary items.
14. **Abnormal Gains (Losses)** (`float`): Gains or losses that are unusual or infrequent.
15. **Pretax Income (Loss)** (`float`): Income before taxes without adjustments.
16. **Income Tax (Expense) Benefit, Net** (`float`): Net income tax expense or benefit.
17. **Income (Loss) from Continuing Operations** (`float`): Income or loss from ongoing business operations.
18. **Net Extraordinary Gains (Losses)** (`float`): Gains or losses from extraordinary items.
19. **Net Income** (`float`): Total profit after all expenses and taxes.
20. **Net Income (Common)** (`float`): Net income attributable to common shareholders.

Your task is to generate Python code that analyzes the provided dataset based on user queries and produces interactive HTML visualizations using Plotly.
""")

user_question_1 = SYSTEM_PROMPT.substitute(question = "Can show me which company was most successful in terms of revenue in 2023? Do the analysis step by step and show me the code.")
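The cell above builds the prompt with `string.Template`. As a minimal sketch of how that substitution behaves, the example below uses a shortened, hypothetical stand-in prompt (not the notebook's `SYSTEM_PROMPT`). It shows that `substitute()` raises `KeyError` when a placeholder is missing, that `$$` escapes a literal dollar sign (relevant for financial prompts), and that `safe_substitute()` leaves unknown placeholders untouched.

```python
from string import Template

# Hypothetical, shortened stand-in for the notebook's SYSTEM_PROMPT.
PROMPT = Template("Analyse this question: ${question} Budget: $$10 per run.")

# substitute() fills every placeholder; "$$" becomes a literal "$".
filled = PROMPT.substitute(question="Which company had the highest revenue in 2023?")
print(filled)

# A placeholder without a matching keyword raises KeyError.
try:
    Template("Missing: ${other}").substitute(question="x")
    missing_raised = False
except KeyError as exc:
    missing_raised = True
    print(f"missing placeholder: {exc}")

# safe_substitute() keeps unknown placeholders as written instead of raising.
partial = Template("${question} / ${other}").safe_substitute(question="Q")
print(partial)
```

If user questions can themselves contain `$` characters, they are safe as substituted values; only the template text needs the `$$` escape.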


In [34]:
print(agent.system_prompt_template)

You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can.
To do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code.
To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences.

At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.
Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '<end_code>' sequence.
During each intermediate step, you can use 'print()' to save whatever important information you will then need.
These print outputs will then appear in the 'Observation:' field, which will be available as input for the next step.
In the end you have to return a final answer using the `final_answer` tool.

Here are a few examples using n

In [35]:
import os
import shutil
import uuid

DATASET_FILENAME = "balance_data_2020_2023.csv" # Define dataset filename as a constant

def run_agent_in_directory(agent, user_question, output_dir_base="user_question_output", dataset_filename=DATASET_FILENAME):
    """
    Runs the agent in a specific directory, copies the dataset, and ensures outputs are saved there.

    Args:
        agent: The initialized CodeAgent instance.
        user_question (str): The user's question to run the agent with.
        output_dir_base (str): Base name for the output directory (will be made unique).
        dataset_filename (str): The filename of the dataset to copy.

    Returns:
        tuple: (response, output_directory), where response is the value returned by
        agent.run() and output_directory is the path where output files are saved.
    """
    original_working_directory = os.getcwd()
    output_directory = os.path.join(original_working_directory, output_dir_base + "_" + str(uuid.uuid4()))
    os.makedirs(output_directory, exist_ok=True)

    # Copy dataset file into the output directory
    dataset_source_path = os.path.join(original_working_directory, dataset_filename) # Assumes dataset is in main dir
    dataset_destination_path = os.path.join(output_directory, dataset_filename)
    if os.path.exists(dataset_source_path):
        shutil.copy2(dataset_source_path, dataset_destination_path) # Use copy2 to preserve metadata
        print(f"Dataset '{dataset_filename}' copied to output directory: {output_directory}")
    else:
        print(f"Warning: Dataset file '{dataset_filename}' not found in main directory. Agent might fail.")


    os.chdir(output_directory)

    try:
        response_chart = agent.run(user_question, reset=False)
        return response_chart, output_directory
    finally:
        os.chdir(original_working_directory)

# Run the agent inside a fresh, uniquely named output directory
response_chart_output, output_directory = run_agent_in_directory(agent, user_question_1)

print(f"Response to: '{user_question_1}'")
print(response_chart_output)
print(f"\nGraphs should be in directory: {output_directory}")


if os.path.exists(output_directory) and os.path.isdir(output_directory):
    html_files = [f for f in os.listdir(output_directory) if f.endswith(".html")]
    if html_files:
        print("\nFound HTML graph files:")
        for html_file in html_files:
            file_path = os.path.join(output_directory, html_file)
            print(f"- {file_path}")
    else:
        print("\nNo HTML graph files found in the output directory.")
else:
    print("\nOutput directory not found or is not a directory.")

Dataset 'balance_data_2020_2023.csv' copied to output directory: /Users/paulroeseler/Documents/GitHub/agent_analytics/user_question_output_df1245f3-0c21-4b79-9072-ba28f8937302


Response to: '
I want you to analyse this question Can show me which company was most successful in terms of revenue in 2023? Do the analysis step by step and show me the code. using python with Pandas and Plotly. You are provided with a dataset balance_data_2020_2023.csv containing financial fundamentals of US companies from 2020 to 2023. All columns in the dataset are fully populated with no missing values (non-NaN).
Your job is to answer user questions and create graphs. If possible generate plotly graphs and write them as html file. Never show the html graph – you are unable to display them in the environment in which you execute code. All the information you need in regards to the dataset, including all columns follows:

**Dataset Overview**:
- **Country**: United States
- **Years Covered**: 2020, 2021, 2022, 2023

**Dataset Columns**:

1. **Ticker** (`string`): The stock ticker symbol representing each company.
2. **Fiscal Year** (`integer`): The fiscal year for which the data is
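`run_agent_in_directory` switches the working directory and restores it in a `finally` block. The same pattern can be packaged as a reusable context manager; the sketch below is one possible way to do that, and the name `working_directory` is hypothetical, not part of the notebook or of smolagents. On Python 3.11 and later, `contextlib.chdir` in the standard library offers similar behavior.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the process working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        # Stand-in for the agent writing a Plotly chart with a relative path.
        with open("chart.html", "w") as fh:
            fh.write("<html></html>")
    # Back in the original directory; list what was written to the temp dir.
    html_files = sorted(f for f in os.listdir(tmp) if f.endswith(".html"))
restored = os.getcwd()

print(html_files)
print(restored == start)
```

Note that `os.chdir` affects the whole process, so this approach assumes only one agent run is active at a time.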

In [40]:
agent.logs

[SystemPromptStep(system_prompt='You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can.\nTo do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code.\nTo solve the task, you must plan forward to proceed in a series of steps, in a cycle of \'Thought:\', \'Code:\', and \'Observation:\' sequences.\n\nAt each step, in the \'Thought:\' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.\nThen in the \'Code:\' sequence, you should write the code in simple Python. The code sequence must end with \'<end_code>\' sequence.\nDuring each intermediate step, you can use \'print()\' to save whatever important information you will then need.\nThese print outputs will then appear in the \'Observation:\' field, which will be available as input for the next step.\nIn the end you have to return a final answer using 