# Reflexion Pattern Demo: CSV Data Analysis

This notebook demonstrates a **Reflexion** pattern with an AI agent:
- The agent generates **dynamic Python code** to analyze CSV/Excel data.
- It **executes** that code in a controlled environment.
- If execution raises an **error**, the agent **reflects** on it and **self-corrects** the code. An empty or `None` result is flagged as unsatisfactory, but in this demo it does not trigger a retry.

We’ll organize the code into **four sections**:
1. **Data Executor** (executes generated Python code safely)
2. **LLM Adapter** (a mock function to simulate LLM behavior)
3. **CSV Reflexion Agent** (the core logic that orchestrates code generation, execution, and correction)
4. **Main Orchestration** (loads sample data and runs the agent)

In a real system, you would use a more robust sandbox for code execution, integrate a genuine LLM API, and add more sophisticated error handling.

## Section 1: Data Executor
This module simulates a **controlled environment** to execute LLM-generated Python code strings. We expose:
- `pd` (pandas)
- `csv_data` (the DataFrame)
- `io` (for potential I/O)
- A placeholder variable `result` which the code can set.
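
As a minimal sketch of the mechanism, the snippet below runs a code string with `exec` against a namespace dictionary and reads `result` back afterwards. The names `run_snippet`, `values`, and `MISSING` are illustrative and not part of this notebook; a plain list stands in for the DataFrame, and a sentinel is assumed here so that a snippet which never assigns `result` can be told apart from one that assigns `None`.

```python
# Illustrative only: a plain list stands in for the DataFrame.
MISSING = object()  # hypothetical sentinel: "result was never set"


def run_snippet(code_str, values):
    """Execute code_str with 'values' exposed; return whatever it bound to 'result'."""
    namespace = {"values": values}
    exec(code_str, namespace)
    return namespace.get("result", MISSING)


total = run_snippet("result = sum(values)", [10, 5, 3])
unset = run_snippet("x = 1", [10, 5, 3])
print(total)             # 18
print(unset is MISSING)  # True
```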

In production, you’d replace this with a true sandbox (Docker, restricted interpreter, etc.) for safety.
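
One modest step in that direction, sketched here under stated assumptions, is to run the generated code in a separate Python process with a timeout. The function name `run_in_subprocess` and the `timeout_s` parameter are hypothetical. A child process isolates interpreter state and lets runaway code be stopped, but it is not a security boundary by itself: the code can still reach the file system and the network.

```python
import subprocess
import sys


def run_in_subprocess(code_str, timeout_s=5):
    """Run code_str in a fresh Python process; return (success, stdout_or_error)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code_str],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"Timed out after {timeout_s} seconds"
    if proc.returncode != 0:
        # The traceback on stderr is what a reflection prompt would carry
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


ok, out = run_in_subprocess("print(2 + 3)")
bad, err = run_in_subprocess("raise ValueError('boom')")
```

Passing a DataFrame to the child process (for example through a temporary file) is left out of this sketch.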

In [None]:
# data_executor.py

import io
import pandas as pd

def run_generated_code(code_str, csv_data):
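    """
    Executes the LLM-generated Python code and returns (success, result_or_error).
    - success = True if the code ran without raising an exception
    - result_or_error = the value the code assigned to 'result' (None if it
      never set one), or the error message

    Note: exec() is not a sandbox; the code can still use builtins and imports.
    """
    # Namespace exposed to the generated code
    namespace = {
        "pd": pd,
        "csv_data": csv_data,
        "io": io,
        "result": None,
    }

    try:
        # A single namespace (rather than separate globals and locals) lets
        # functions and comprehensions in the generated code see csv_data and pd.
        exec(code_str, namespace)
        # 'result' is pre-seeded with None, so it stays None if the code never sets it
        return True, namespace.get("result")
    except Exception as e:
        # Include the exception type: str(KeyError) alone is just the missing key
        return False, f"{type(e).__name__}: {e}"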

## Section 2: LLM Adapter
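Here, we define a **mock** LLM call function (`call_llm`). In a real environment, this would be replaced by calls to an actual LLM API (e.g., OpenAI, Anthropic). For demonstration, it returns an initial code snippet, or a “corrected” snippet if the prompt contains the `REFLECT_ON_ERROR:` marker. Both snippets contain the same working code, so with the sample data the first attempt succeeds and the reflection branch is never reached.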
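
To actually see a reflection step, one option is a mock whose first answer contains a deliberate bug. The sketch below is an assumption for illustration, not part of the original notebook: `call_llm_flaky` is a hypothetical name, and the misspelled column `'Catgory'` is chosen so that pandas raises a `KeyError` when the first snippet is executed.

```python
def call_llm_flaky(prompt, temperature=0.2):
    """Hypothetical mock LLM: the first attempt is buggy, the reflection fixes it."""
    if "REFLECT_ON_ERROR:" in prompt:
        # "Corrected" snippet with the right column name
        return "result = csv_data.groupby('Category')['Value'].sum()\n"
    # First attempt misspells the column name on purpose
    return "result = csv_data.groupby('Catgory')['Value'].sum()\n"


first_attempt = call_llm_flaky("User Query: sum 'Value' by 'Category'")
second_attempt = call_llm_flaky("REFLECT_ON_ERROR: KeyError: 'Catgory'")
```

Swapping this in for `call_llm` should make the agent log one execution error and one corrected snippet before returning the grouped sums.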

In [None]:
# llm_adapter.py

def call_llm(prompt, temperature=0.2):
    """
    Simulated LLM call. If 'REFLECT_ON_ERROR:' is in the prompt, we return a 'corrected' snippet.
    Otherwise, we return an initial code snippet. 'temperature' is accepted but ignored by this mock.
    """
    if "REFLECT_ON_ERROR:" in prompt:
        # LLM sees an error and attempts a fix
        return (
            "# Corrected code snippet:\n"
            "import pandas as pd\n"
            "result = csv_data.groupby('Category')['Value'].sum()\n"
        )
    else:
        # First attempt code snippet
        return (
            "# Generated code snippet:\n"
            "import pandas as pd\n"
            "# Let's assume we want to group by 'Category' and calculate sum of 'Value'\n"
            "result = csv_data.groupby('Category')['Value'].sum()\n"
        )

## Section 3: CSV Reflexion Agent
This class handles the **Reflexion** loop:
1. Prompt the LLM for code.
2. **Execute** the code.
3. If there’s an error, create a **reflection prompt** with the error and get corrected code.
4. Retry until success or until hitting a maximum number of reflections.

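After a successful run, the agent also checks whether the result is "satisfactory" (not `None`, not empty). An unsatisfactory result is reported to the caller but does not trigger another reflection in this demo.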
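
The steps above can be sketched as a small generic loop. This is an illustrative variant rather than the class defined in this section: `reflexion_loop`, `fake_generate`, `fake_execute`, and the `is_satisfactory` callback are hypothetical names, a plain list replaces the DataFrame, and, unlike the agent in this notebook, the sketch also reflects when a result is judged unsatisfactory.

```python
def reflexion_loop(generate, execute, is_satisfactory, task, max_reflections=2):
    """Generate, execute, and retry with feedback; return (outcome, reflections_used)."""
    prompt = task
    for attempt in range(max_reflections + 1):
        code = generate(prompt)
        success, outcome = execute(code)
        if success and is_satisfactory(outcome):
            return outcome, attempt
        # Feed the failure (or the unsatisfactory result) back to the generator
        problem = outcome if not success else f"unsatisfactory result: {outcome!r}"
        prompt = f"REFLECT_ON_ERROR: {problem}\nTask: {task}\nPrevious code:\n{code}"
    return None, max_reflections


def fake_generate(prompt):
    """Stub LLM: buggy first answer, fixed answer after a reflection prompt."""
    if "REFLECT_ON_ERROR:" in prompt:
        return "result = sum(values)"
    return "result = sum(vals)"  # NameError: 'vals' is not defined


def fake_execute(code):
    """Stub executor: run the code against a tiny namespace."""
    namespace = {"values": [10, 5, 3]}
    try:
        exec(code, namespace)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, namespace.get("result")


outcome, reflections_used = reflexion_loop(
    fake_generate, fake_execute, lambda r: r is not None, "Sum the values"
)
print(outcome, reflections_used)  # 18 1
```

Injecting the generator and executor as callables keeps the loop itself easy to test without an LLM or a sandbox.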

In [None]:
# csv_reflexion_agent.py

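# These imports apply when each cell is saved as its own module. In a notebook,
# call_llm and run_generated_code are already defined by the cells above.
try:
    from llm_adapter import call_llm
    from data_executor import run_generated_code
except ImportError:
    pass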

class CSVReflexionAgent:
    def __init__(self, csv_data, max_reflections=2):
        """
        :param csv_data: A pandas DataFrame
        :param max_reflections: How many times to allow auto-correction
        """
        self.csv_data = csv_data
        self.max_reflections = max_reflections
        self.conversation_log = []

    def analyze_query(self, user_query: str):
        """
        Main method:
        1. Prompt LLM for code.
        2. Execute.
        3. If error, reflect.
        4. Return final result or error message.
        """
        # Step 1: Initial code generation
        initial_prompt = (
            f"User Query: {user_query}\n"
            "You have a pandas DataFrame called 'csv_data'. Generate Python code to achieve the user's goal.\n"
            "The code should store the final result in a variable named 'result'."
        )

        code_snippet = call_llm(initial_prompt)
        self.conversation_log.append(f"Initial code:\n{code_snippet}\n")

        success, outcome = run_generated_code(code_snippet, self.csv_data)

        reflections = 0
        while not success and reflections < self.max_reflections:
            reflections += 1
            error_msg = outcome
            self.conversation_log.append(f"Execution error: {error_msg}\n")

            # Step 2: Reflection prompt. Include the original query and the failing
            # code so the LLM has the context it needs to repair the snippet.
            reflection_prompt = (
                f"REFLECT_ON_ERROR: The code failed with the following error:\n{error_msg}\n"
                f"User Query: {user_query}\n"
                f"Failing code:\n{code_snippet}\n"
                "Please correct the code."
            )
            code_snippet = call_llm(reflection_prompt)
            self.conversation_log.append(f"Corrected code:\n{code_snippet}\n")

            success, outcome = run_generated_code(code_snippet, self.csv_data)

        # If success, check if outcome is meaningful
        if success:
            if outcome is None or (hasattr(outcome, "empty") and outcome.empty):
                self.conversation_log.append("The result is empty or None. Possibly unsatisfactory.\n")
                return "No meaningful data was produced by the analysis."
            else:
                return outcome
        else:
            # Even after max reflections, we failed
            self.conversation_log.append("Exceeded max reflections. No solution found.\n")
            return "Sorry, I couldn't generate a valid code snippet."

## Section 4: Main Orchestration
We simulate loading some CSV data (here, we just create a small DataFrame in memory). We then instantiate the **CSVReflexionAgent** and provide a user query (e.g., "Please sum the 'Value' column by 'Category'.").
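
For comparison, here is a sketch of how the same sample could be parsed from actual CSV text with `pd.read_csv`. The `csv_text` string is made up to mirror the in-memory sample; reading from a file path instead of `io.StringIO` works the same way.

```python
import io

import pandas as pd

# Made-up CSV text mirroring the in-memory sample data
csv_text = """Category,Value,Comment
A,10,foo
B,5,bar
A,3,baz
C,8,lorem
B,2,ipsum
A,6,dolor
"""

df = pd.read_csv(io.StringIO(csv_text))

# The aggregation the user query asks for: sum of 'Value' per 'Category'
totals = df.groupby("Category")["Value"].sum()
print(totals)
```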

Run the final cell and observe the result, along with any **reflection** steps if an error occurs.

In [None]:
# main.py

import pandas as pd

if __name__ == "__main__":
    # Sample data simulating CSV contents
    sample_data = {
        "Category": ["A", "B", "A", "C", "B", "A"],
        "Value": [10, 5, 3, 8, 2, 6],
        "Comment": ["foo", "bar", "baz", "lorem", "ipsum", "dolor"]
    }
    df = pd.DataFrame(sample_data)

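    # Instantiate our Reflexion-based agent. The import only applies when the cells
    # are saved as modules; in a notebook the class is already defined above.
    try:
        from csv_reflexion_agent import CSVReflexionAgent
    except ImportError:
        pass
    agent = CSVReflexionAgent(csv_data=df, max_reflections=2)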

    # User query
    user_query = "Please sum the 'Value' column by 'Category'."

    # Analyze the query
    final_result = agent.analyze_query(user_query)

    print("=== Final Analysis Result ===")
    print(final_result)
    print("\n=== Conversation Log ===")
    for entry in agent.conversation_log:
        print(entry)

## How to Run
1. **Run all cells** in this notebook.
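2. Observe the **Final Analysis Result**. With the mock LLM above, the first code snippet works, so the conversation log shows only the initial code. If a snippet raises an error (for example, after swapping in a real LLM), the log also shows the error and each **reflection** attempt.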

## Key Takeaways
1. **Reflexion Loop**: The agent tries some code and, if execution fails, **reflects** with an error prompt to produce a corrected version. Empty or `None` results are detected as unsatisfactory; feeding them back into the same loop is a natural extension.
2. **Self-Correction**: The LLM sees the error details (exception message) and attempts to fix the code.
3. **Sandboxing**: In production, you'd want a safer environment than a raw `exec()`. Docker containers, a separate process with time and memory limits, or a restricted interpreter can help.
4. **Practical Use**: This can be extended to handle more complex data transformations, logic errors (not just exceptions), or iterative improvement such as performance tuning or scaling to bigger data.
5. **Limitations**: Ensure you have a maximum number of reflections, and that you handle any possible infinite loops or repetitive errors. In real systems, guard against malicious or resource-intensive code.
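
Two of these guards can be sketched with the standard library alone. Both helpers below are hypothetical additions, not part of the notebook: `find_disallowed_imports` statically scans a snippet for imports outside an assumed allowlist, and `is_repeated_error` detects when the latest error is identical to the previous one, so the loop can stop early. A static import scan is only a cheap first filter (it can be bypassed, for instance through `__import__`), so it complements a sandbox rather than replacing one.

```python
import ast

ALLOWED_IMPORTS = {"pandas", "math"}  # hypothetical allowlist for this sketch


def find_disallowed_imports(code_str):
    """Return the sorted top-level modules imported outside the allowlist."""
    tree = ast.parse(code_str)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return sorted(found - ALLOWED_IMPORTS)


def is_repeated_error(error_history, new_error):
    """True when the newest error matches the previous one verbatim."""
    return bool(error_history) and error_history[-1] == new_error


snippet = "import os\nimport pandas as pd\nfrom subprocess import run\n"
print(find_disallowed_imports(snippet))  # ['os', 'subprocess']
```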
