In [None]:
from autogen_ext.models.openai import OpenAIChatCompletionClient
import json
import os
import re
from datetime import datetime, timezone
import uuid

from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.messages import TextMessage
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
from autogen_agentchat.ui import Console
from dotenv import load_dotenv

load_dotenv()

# Model configuration (left as-is)
ANTHROPIC_MODEL = os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-20250514")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4.1-mini")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY2")

# Directory configuration
TEMP_DIR = "temp"
CSV_PATH = os.path.join(TEMP_DIR, "data.csv")

DATA_ANALYZER_SYSTEM_MESSAGE = """
You are an expert-level data analyst agent. Your purpose is to write and execute Python code to analyze financial 
data and present the findings. You will receive a file named `data.csv` and a question from the user.

Your **first and most important step** is to determine if the user's request is **specific** or **broad**.


-----

## 🔁 Execution Protocol

You must follow this turn-based process without deviation.

**Step 1: Plan**

  - Start your response by stating whether the user's request is **broad** or **specific**.
  - Briefly outline the plan for the Python script you will write.

**Step 2: Code**

  - Write all necessary Python code in a **single, complete code block.**
  - The script must conform to the **Python Scripting Guidelines** detailed below.

**Step 3: Wait for Execution**

  - After providing the code block, **end your turn.**
  - **DO NOT** write anything else. Do not explain the code. Do not predict the results.
  - Wait for the executor agent to run the script and provide you with the results (e.g., `stdout`, `stderr`, or file paths).

**Step 4: Review the Output**

  - Once you receive the console output and/or file paths, review them carefully.
  - If the script failed (e.g., errors, missing libraries, no console output), your next step is to debug. Provide a command 
  to install missing packages or submit a corrected script.

**Step 5: Provide the Final Answer**

  - **Only after the code executes successfully** and you have reviewed the actual console output, provide your final answer.
  - Your answer should be a comprehensive explanation of the financial insights based **exclusively on the results provided by the executor.**
  - Conclude your final response with the word **STOP**.

-----

## 📝 Workflows

Your script's objective is determined by the type of question. The following instructions describe what your **Python script** must accomplish.

### **Workflow 1: Broad Questions**

For open-ended requests ("Analyze the data"), generate a **single Python script** that creates a comprehensive, web-friendly markdown report named `output/report.md`.

  - **Report Generation:** The script itself must generate the entire report as a markdown string and write it to the file.
  - **Executive Summary:** The script should begin the report with an "# Executive Summary" section containing 3-4 key bullet points.
  - **Aggregations & Charts:** The script must perform all required calculations (e.g., spend by month, category, cardholder etc.) and generate all required charts as PNG files saved to the `output/` directory.
  - **Insights & Tables:** For every table or chart the script adds to the report, it must also append a concise, 1-2 line summary of the key financial insight.
  - **Embed Images:** The script must embed all charts directly into the markdown file using a Base64 data URI. **Do not use simple file links.**

### **Workflow 2: Specific Questions**

For targeted questions ("Who spent the most?"), write a Python script that directly calculates or visualizes the answer.

  - **Direct Output:** The script should print calculations and data (e.g., formatted pandas DataFrames) directly to the console.
  - **Visualizations:** If a plot is necessary, the script must save it as a PNG file in the `output/` directory and print a confirmation message.
  - **Insights:** For every output (table or chart), the script must print a concise, 1-2 line summary of the key insight to the console.
  - **No Markdown Report:** Do not generate a full `.md` report for specific questions.

-----

## 🐍 Python Scripting Guidelines

**EVERY Python script you write MUST adhere to the following template and rules.**

```python
# ----------------- BOILERPLATE START -----------------
import sys
import os
import json
import glob
import traceback
import base64
from pathlib import Path
import subprocess
import importlib

# Ensure critical libraries are installed
def ensure_package(package_name, import_name=None):
    if import_name is None:
        import_name = package_name
    try:
        return importlib.import_module(import_name)
    except ImportError:
        print(f"⏳ Installing {package_name}...")
        sys.stdout.flush()
        subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
        return importlib.import_module(import_name)

# Helper function to embed images in markdown
def embed_image(image_path, report_content):
    try:
        print(f"🖼️ Embedding image: {image_path}")
        sys.stdout.flush()
        image_data = base64.b64encode(Path(image_path).read_bytes()).decode()
        report_content += f"\\n![{Path(image_path).stem}](data:image/png;base64,{image_data})\\n\\n"
    except Exception as e:
        error_msg = f"*Error embedding image {image_path}: {e}*"
        print(f"⚠️ {error_msg}")
        sys.stdout.flush()
        report_content += f"\\n{error_msg}\\n\\n"
    return report_content

# File discovery function
def find_data_file():
    print("🔍 Searching for data file...")
    sys.stdout.flush()
    possible_paths = ['temp/data.csv', './data.csv', '/workspace/data.csv', '/workspace/temp/data.csv']
    for path in possible_paths:
        if os.path.exists(path):
            print(f"✅ Found data file: {path}")
            sys.stdout.flush()
            return path
    print("❌ No data file found in common locations. Please ensure 'data.csv' is available.")
    sys.stdout.flush()
    return None

try:
    print("📊 Starting data analysis...")
    sys.stdout.flush()
    
    # Ensure packages and load them
    pd = ensure_package("pandas")
    plt = ensure_package("matplotlib", "matplotlib.pyplot")
    
    # Find and load the data file
    data_file_path = find_data_file()
    if data_file_path is None:
        sys.exit(1) # Exit if no file is found

    print(f"📂 Loading data from: {data_file_path}")
    sys.stdout.flush()
    # The input is a CSV file, so load it with pandas rather than json.load
    df = pd.read_csv(data_file_path)
    print("✅ Data loaded successfully.")
    sys.stdout.flush()

# ----------------- BOILERPLATE END -----------------

    # <<< YOUR ANALYSIS CODE GOES HERE >>>
    # Perform data prep, calculations, and generate outputs (charts/reports).
    # Remember: The 'output/' directory is pre-created for you.


# ----------------- ERROR HANDLING START -----------------
except FileNotFoundError as e:
    print(f"⚠️ File not found error: {e}. Please ensure the data file exists.")
    sys.stdout.flush()
    sys.exit(1)
except json.JSONDecodeError as e:
    print(f"⚠️ JSON parsing error in data file: {e}")
    sys.stdout.flush()
    sys.exit(1)
except Exception as e:
    print(f"An unexpected error occurred: {e}")
    print(f"Traceback: {traceback.format_exc()}")
    sys.stdout.flush()
    sys.exit(1)
finally:
    print("✅ Analysis script finished.")
    sys.stdout.flush()
# ----------------- ERROR HANDLING END -----------------
```

**Key Scripting Rules:**

  - **Use `sys.stdout.flush()` after every `print()` statement** to ensure console output is visible.
  - The `output/` directory is pre-created. Save all artifacts (charts, reports) there. **Do not create any other directories.**
  - All charts must be saved as `.png` files. **Do not use pie charts or subplots.**
  - After saving any file, the script **must** print a confirmation message (e.g., `print("✅ Successfully saved file: output/chart.png")`).
 """

def get_openai_client():
    """Get configured OpenAI model client."""
    return OpenAIChatCompletionClient(
        model=OPENAI_MODEL,
        api_key=OPENAI_API_KEY
    )

model_client = get_openai_client()

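code_executor = LocalCommandLineCodeExecutor(work_dir=TEMP_DIR)

# The system message promises a pre-created output/ directory inside the working directory
os.makedirs(os.path.join(TEMP_DIR, "output"), exist_ok=True)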

# Ensure the CSV exists
if not os.path.exists(CSV_PATH):
    raise FileNotFoundError(f"CSV not found at {CSV_PATH}. Please place 'data.csv' in the 'temp' directory.")

data_analyzer_agent = AssistantAgent(
        name="Data_Analyzer",
        model_client=model_client,
        system_message=DATA_ANALYZER_SYSTEM_MESSAGE,
        description="Data analysis agent that processes and analyzes data."
    )

code_executor_agent = CodeExecutorAgent(
        name="Python_Code_Executor",
        code_executor=code_executor,
        description="Python code executor agent that runs Python scripts locally."
    )

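# Stop once the analyzer ends its final answer with "STOP", as the system message requires
from autogen_agentchat.conditions import TextMentionTermination
text_mention_termination = TextMentionTermination("STOP")

# Team setup: the analyzer goes first so it can plan and emit code before the executor runs
generator_team = RoundRobinGroupChat(
        participants=[data_analyzer_agent, code_executor_agent],
        termination_condition=text_mention_termination,
        max_turns=100
    )

# User task (text taken from the recorded run shown in this cell's output)
task = TextMessage(
    content=(
        "You are given a CSV file named data.csv located in the working directory for this run. "
        "The CSV columns are exactly: bank_name, cardholder, transaction_date, description, amount, Category.\n\n"
        "Follow the Execution Protocol and Python Scripting Guidelines in your system prompt. "
        "Decide if the question is broad or specific and act accordingly.\n\n"
        "User question: Analyze the CSV and produce a comprehensive markdown report with monthly spend "
        "trends by cardholder and top merchants, including at least 5 charts."
    ),
    source="user"
)

# Run and stream to console (works in Jupyter via top-level await)
generator_result = await Console(generator_team.run_stream(task=task))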


---------- TextMessage (user) ----------
You are given a CSV file named data.csv located in the working directory for this run. The CSV columns are exactly: bank_name, cardholder, transaction_date, description, amount, Category.

Follow the Execution Protocol and Python Scripting Guidelines in your system prompt. Decide if the question is broad or specific and act accordingly.

User question: Analyze the CSV and produce a comprehensive markdown report with monthly spend trends by cardholder and top merchants, including at least 5 charts.
---------- TextMessage (Data_Analyzer) ----------
Step 1: Plan
The user's request is broad. They want a comprehensive analysis report with monthly spend trends by cardholder and top merchants, including at least 5 charts. The plan is to:

- Load and clean the data.
- Aggregate monthly spend trends by cardholder.
- Identify top merchants by total spend.
- Generate at least 5 charts including:
  1. Monthly spend trend by cardholder (line or bar chart)
  

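The system message above asks generated scripts to embed charts in the markdown report as Base64 data URIs instead of file links. The following is a minimal, standard-library-only sketch of that idea. The helper name `to_data_uri_markdown` and the stand-in image bytes are assumptions for illustration, not part of the notebook; a real script would pass the path of a chart saved under `output/`.

```python
import base64
import tempfile
from pathlib import Path


def to_data_uri_markdown(image_path: Path, alt_text: str) -> str:
    """Return a markdown image tag whose source is an inline Base64 data URI."""
    encoded = base64.b64encode(image_path.read_bytes()).decode("ascii")
    return f"![{alt_text}](data:image/png;base64,{encoded})"


# Stand-in bytes (assumption): only the 8-byte PNG signature, not a real chart.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    chart_path = Path(tmp_dir) / "chart.png"
    chart_path.write_bytes(PNG_SIGNATURE)
    markdown_tag = to_data_uri_markdown(chart_path, "chart")

print(markdown_tag)
```

Because the image bytes travel inside the markdown text itself, the report stays readable when `report.md` is moved without its `output/` folder, at the cost of a larger file.
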
In [3]:
from autogen_ext.models.openai import OpenAIChatCompletionClient
import os
from datetime import datetime, timezone
import uuid

from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.messages import TextMessage
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
from autogen_agentchat.ui import Console
from dotenv import load_dotenv

load_dotenv()

# Model configuration (left as-is)
ANTHROPIC_MODEL = os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-20250514")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4.1-mini")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY2")

# Directory configuration
TEMP_DIR = "temp"
CSV_PATH = os.path.join(TEMP_DIR, "data.csv")
CSV_ABS_PATH = os.path.abspath(CSV_PATH)

# Ensure source CSV exists in temp (do NOT copy it elsewhere)
if not os.path.exists(CSV_PATH):
    raise FileNotFoundError(f"CSV not found at {CSV_PATH}. Please place 'data.csv' in the 'temp' directory.")

# Create a per-message/run working directory under temp and an output/ directory inside it
run_id = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S_") + uuid.uuid4().hex[:8]
WORK_DIR = os.path.join(TEMP_DIR, f"run_{run_id}")
OUTPUT_DIR = os.path.join(WORK_DIR, "output")
os.makedirs(OUTPUT_DIR, exist_ok=True)

# System message template (CSV-aware; includes Category column; reads directly from temp/data.csv)
DATA_ANALYZER_SYSTEM_MESSAGE_TEMPLATE = """
You are an expert-level data analyst agent. Your purpose is to write and execute Python code to analyze financial
data and present the findings. You will receive a question from the user and must analyze the CSV file located at:
- ABSOLUTE PATH: {CSV_ABS_PATH}
- RELATIVE PATH (from project root): temp/data.csv

CSV schema (exact column names; the file has a header row):
- bank_name, cardholder, transaction_date, description, amount, Category

Notes:
- transaction_date is a date-like string.
- amount is numeric (may include negatives for refunds/credits).
- Category is a human-assigned category string (e.g., "Food & Dining", "Bills & Subscriptions", etc.).

Do NOT move or copy the CSV file. Read it directly from temp/data.csv (absolute path provided above).

Your first and most important step is to determine if the user's request is specific or broad.

-----
## 🔁 Execution Protocol
You must follow this turn-based process without deviation.
Step 1: Plan
  - Start your response by stating whether the user's request is broad or specific.
  - Briefly outline the plan for the Python script you will write.
Step 2: Code
  - Write all necessary Python code in a single, complete code block.
  - The script must conform to the Python Scripting Guidelines detailed below.
Step 3: Wait for Execution
  - After providing the code block, end your turn.
  - DO NOT write anything else. Do not explain the code. Do not predict the results.
  - Wait for the executor agent to run the script and provide you with the results (stdout, stderr, or file paths).
Step 4: Review the Output
  - Once you receive the console output and/or file paths, review them carefully.
  - If the script failed (errors, missing libraries, no console output), your next step is to debug. Provide a command
    to install missing packages or submit a corrected script.
Step 5: Provide the Final Answer
  - Only after the code executes successfully and you have reviewed the actual console output, provide your final answer.
  - Your answer should be a comprehensive explanation of the financial insights based exclusively on the results provided by the executor.
  - If appropriate, include a concise markdown summary and references to generated charts or the markdown report.
  - Conclude your final response with the word STOP.

-----
## 📝 Workflows
Your script's objective is determined by the type of question. The following instructions describe what your Python script must accomplish.

Workflow 1: Broad Questions
For open-ended requests ("Analyze the data"), generate a single Python script that creates a comprehensive, web-friendly markdown report named output/report.md.
  - Report Generation: The script itself must generate the entire report as a markdown string and write it to the file.
  - Executive Summary: The report must begin with an "# Executive Summary" section containing 3-4 key bullet points.
  - Aggregations & Charts: Perform calculations (e.g., spend by month, by Category, by cardholder, by bank, and top merchants inferred from the description) and generate charts as PNG files saved to the output/ directory.
  - Insights & Tables: For every table or chart added to the report, also append a concise, 1-2 line summary of the key insight.
  - Embed Images: Embed all charts directly into the markdown file using a Base64 data URI. Do not use simple file links.

Workflow 2: Specific Questions
For targeted questions ("Who spent the most?"), write a Python script that directly calculates or visualizes the answer.
  - Direct Output: Print calculations and data (e.g., formatted pandas DataFrames) directly to the console.
  - Visualizations: If a plot is necessary, save it as a PNG file in the output/ directory and print a confirmation message.
  - Insights: For every output (table or chart), print a concise, 1-2 line summary of the key insight to the console.
  - No Markdown Report: Do not generate a full .md report for specific questions.

-----
## 🐍 Python Scripting Guidelines
EVERY Python script you write MUST adhere to the following template and rules.
```python
# ----------------- BOILERPLATE START -----------------
import sys
import os
import glob
import traceback
import base64
from pathlib import Path
import subprocess
import importlib

# Ensure critical libraries are installed
def ensure_package(package_name, import_name=None):
    if import_name is None:
        import_name = package_name
    try:
        return importlib.import_module(import_name)
    except ImportError:
        print(f"⏳ Installing {package_name}...")
        sys.stdout.flush()
        subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
        return importlib.import_module(import_name)

# Helper function to embed images in markdown
def embed_image(image_path, report_content):
    try:
        print(f"🖼️ Embedding image: {image_path}")
        sys.stdout.flush()
        image_data = base64.b64encode(Path(image_path).read_bytes()).decode()
        report_content += f"\\n![{Path(image_path).stem}](data:image/png;base64,{image_data})\\n\\n"
    except Exception as e:
        error_msg = f"*Error embedding image {image_path}: {e}*"
        print(f"⚠️ {error_msg}")
        sys.stdout.flush()
        report_content += f"\\n{error_msg}\\n\\n"
    return report_content

# File discovery function: prefer the absolute temp/data.csv path first, then common fallbacks.
def find_data_file():
    print("🔍 Searching for data file...")
    sys.stdout.flush()
    candidates = [
        r"{CSV_ABS_PATH}",  # raw string so Windows backslashes are not treated as escapes
        "temp/data.csv",
        "../temp/data.csv",
        "../../temp/data.csv",
        "../../../temp/data.csv",
        "data.csv",  # Last resort
    ]
    for path in candidates:
        if os.path.exists(path):
            print(f"✅ Found data file: {path}")
            sys.stdout.flush()
            return path
    print("❌ No data file found in expected locations. Please ensure 'temp/data.csv' exists.")
    sys.stdout.flush()
    return None

try:
    print("📊 Starting data analysis...")
    sys.stdout.flush()

    # Ensure packages and load them
    pd = ensure_package("pandas")
    np = ensure_package("numpy")
    plt = ensure_package("matplotlib", "matplotlib.pyplot")
    seaborn = ensure_package("seaborn")

    # Headless backend for plots
    try:
        import matplotlib
        matplotlib.use("Agg")
    except Exception:
        pass

    # Controller pre-creates the output directory. Ensure it exists.
    os.makedirs("output", exist_ok=True)

    # Find and load the data file (CSV)
    data_file_path = find_data_file()
    if data_file_path is None:
        sys.exit(1)  # Exit if no file is found

    print(f"📂 Loading data from: {data_file_path}")
    sys.stdout.flush()

    # Read CSV robustly (handle BOM and strings cleanly; defer date parsing)
    df = pd.read_csv(
        data_file_path,
        encoding="utf-8-sig",
        dtype={
            "bank_name": "string",
            "cardholder": "string",
            "description": "string",
            "Category": "string",
        }
    )

    # Normalize/clean fields
    if "amount" in df.columns:
        # Strip currency symbols, commas, and parentheses for negatives
        amt_series = df["amount"].astype(str).str.strip()
        amt_series = (
            amt_series.str.replace(r"[,$]", "", regex=True)
                      .str.replace(r"\\(", "-", regex=True)
                      .str.replace(r"\\)", "", regex=True)
        )
        df["amount"] = pd.to_numeric(amt_series, errors="coerce")

    if "transaction_date" in df.columns:
        # Coerce to datetime without deprecated infer flags
        df["transaction_date"] = pd.to_datetime(df["transaction_date"], errors="coerce", utc=False)

    for col in ["bank_name", "cardholder", "description", "Category"]:
        if col in df.columns:
            df[col] = df[col].fillna("").astype("string").str.strip()

    # Drop rows with missing critical fields
    before = len(df)
    df = df.dropna(subset=["transaction_date", "amount"])
    after = len(df)
    print(f"🧹 Dropped {before - after} rows with invalid dates or amounts.")
    sys.stdout.flush()

    # If still empty, print diagnostics to help debugging
    if df.empty:
        print("⚠️ DataFrame is empty after cleaning. Dumping diagnostics:")
        print("Columns:", list(df.columns))
        try:
            print("Head of CSV (first 10 lines):")
            with open(data_file_path, 'r', encoding='utf-8-sig') as fh:
                for i, line in enumerate(fh):
                    if i > 9:
                        break
                    print(line.rstrip())
        except Exception as e:
            print("Failed to show CSV head:", e)
        sys.stdout.flush()
# ----------------- BOILERPLATE END -----------------
    # <<< YOUR ANALYSIS CODE GOES HERE >>>
    # Perform data prep, calculations, and generate outputs (charts/reports).
    # Remember: The 'output/' directory is pre-created for you.

# ----------------- ERROR HANDLING START -----------------
except FileNotFoundError as e:
    print(f"⚠️ File not found error: {e}. Please ensure the data file exists at temp/data.csv.")
    sys.stdout.flush()
    sys.exit(1)
except Exception as e:
    print(f"An unexpected error occurred: {e}")
    print(f"Traceback: {traceback.format_exc()}")
    sys.stdout.flush()
    sys.exit(1)
finally:
    print("✅ Analysis script finished.")
    sys.stdout.flush()
# ----------------- ERROR HANDLING END -----------------
```
Key Scripting Rules:
  - Use sys.stdout.flush() after every print() statement to ensure console output is visible.
  - The output/ directory is pre-created. Save all artifacts (charts, reports) there. Do not create any other directories.
  - All charts must be saved as .png files. Do not use pie charts or subplots.
  - After saving any file, the script must print a confirmation message (e.g., print("✅ Successfully saved file: output/chart.png")).
"""

# Inject the absolute CSV path into the system message
DATA_ANALYZER_SYSTEM_MESSAGE = DATA_ANALYZER_SYSTEM_MESSAGE_TEMPLATE.replace("{CSV_ABS_PATH}", CSV_ABS_PATH)

def get_openai_client():
    """Get configured OpenAI model client."""
    return OpenAIChatCompletionClient(
        model=OPENAI_MODEL,
        api_key=OPENAI_API_KEY
    )

model_client = get_openai_client()

# Code executor attached to this run-specific working directory (per-message outputs live under temp/run_*/output)
code_executor = LocalCommandLineCodeExecutor(work_dir=WORK_DIR)

# Agents
data_analyzer_agent = AssistantAgent(
    name="Data_Analyzer",
    model_client=model_client,
    system_message=DATA_ANALYZER_SYSTEM_MESSAGE,
    description="Data analysis agent that processes and analyzes CSV data directly from temp/data.csv."
)

code_executor_agent = CodeExecutorAgent(
    name="Python_Code_Executor",
    code_executor=code_executor,
    description="Python code executor agent that runs Python scripts locally."
)

# Termination condition to avoid infinite loops
termination_condition = MaxMessageTermination(max_messages=12)

# Team setup: start with the analyzer so it can plan and emit code first
generator_team = RoundRobinGroupChat(
    participants=[data_analyzer_agent, code_executor_agent],
    termination_condition=termination_condition,
    max_turns=100
)

# Provide your question here. Adjust as needed.
user_question = (
    "Analyze the CSV and produce a comprehensive markdown report with monthly spend trends by cardholder, "
    "by Category, and top merchants inferred from the description field, including at least two charts."
)
# Example of a specific question:
# user_question = "Which Category has the highest total spend this year, and which cardholder is the top contributor? Include a bar chart."

# Compose the user task message
task = TextMessage(
    content=(
        "You must read the CSV directly from temp/data.csv using the absolute path provided in your system prompt. "
        "Do not move or copy the file. Follow the Execution Protocol and Python Scripting Guidelines.\n\n"
        f"User question: {user_question}\n\n"
        f"Per-run output directory (already created): {OUTPUT_DIR}. "
        "Your script should save into the relative path 'output/' which maps to that directory."
    ),
    source="user"
)

# Run and stream to console (works in Jupyter via top-level await)
generator_result = await Console(generator_team.run_stream(task=task))

print(f"\nRun directory: {WORK_DIR}")
print(f"Output directory: {OUTPUT_DIR}")
print("If a report was generated, it should be at: output/report.md relative to the run directory above.")
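
The cell above injects the CSV path with `str.replace` rather than `str.format`. A likely reason (an assumption, since the notebook does not say so) is that the template contains literal braces from the embedded f-strings, which `str.format` would try to interpret as fields. The sketch below shows the difference on a shortened, hypothetical template; `TEMPLATE` and `csv_abs_path` are illustrative names and values only.

```python
# Shortened, hypothetical stand-in for the system message template.
TEMPLATE = '''
Read the CSV at: {CSV_ABS_PATH}
Example line from the generated script: print(f"Loading {data_file_path}")
'''

csv_abs_path = "/project/temp/data.csv"  # hypothetical path for illustration

# str.replace touches only the exact placeholder and leaves other braces alone.
rendered = TEMPLATE.replace("{CSV_ABS_PATH}", csv_abs_path)

# str.format treats every {...} as a field, so the unrelated braces make it fail.
try:
    TEMPLATE.format(CSV_ABS_PATH=csv_abs_path)
    format_failed = False
except (KeyError, IndexError, ValueError):
    format_failed = True

print(rendered)
print("str.format failed:", format_failed)
```

An alternative would be to double every literal brace in the template and use `str.format`, but that makes a long prompt containing Python code harder to read and maintain.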


---------- TextMessage (user) ----------
You must read the CSV directly from temp/data.csv using the absolute path provided in your system prompt. Do not move or copy the file. Follow the Execution Protocol and Python Scripting Guidelines.

User question: Analyze the CSV and produce a comprehensive markdown report with monthly spend trends by cardholder, by Category, and top merchants inferred from the description field, including at least two charts.

Per-run output directory (already created): temp/run_20250830_065226_3eac066b/output. Your script should save into the relative path 'output/' which maps to that directory.
---------- TextMessage (Data_Analyzer) ----------
Step 1: Plan

The user's request is broad; they want a comprehensive markdown report analyzing financial transactions with monthly spend trends by cardholder, by Category, and top merchants inferred from the description. The report must include at least two charts embedded as base64 images.

Plan:
- Load and clean data