# Reflexion Pattern with OpenAI: CSV Data Analysis

In this notebook, we show the **Reflexion** pattern for CSV data analysis. The agent:
1. **Generates code** to analyze a CSV (via OpenAI).
2. **Executes** the code locally.
3. If there's an error, the agent sends the **error message** to OpenAI, gets a **corrected** snippet, and **retries**.



## 1. Data Executor
We run the model-generated Python code with `exec()` in a small, predefined namespace. This is not a security sandbox; in a real application you'd want proper isolation, such as a separate process or container.

In [None]:
import io
import pandas as pd

def run_generated_code(code_str, csv_data):
    """
    Executes the model-generated Python code in a predefined namespace.
    Returns (success, result_or_error).
    - success = True if the code ran without raising an exception
    - result_or_error = value of the `result` variable, or the error message
    """
    # Use one dict as both globals and locals. With separate dicts, functions
    # defined by the generated code could not see `pd` or `csv_data`.
    env = {
        "pd": pd,
        "csv_data": csv_data,
        "io": io,
        "result": None,
    }

    try:
        exec(code_str, env)
        return True, env.get("result")
    except Exception as e:
        # Include the exception type so the reflection prompt is more informative
        return False, f"{type(e).__name__}: {e}"
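
One detail of `exec` is worth illustrating when deciding how to pass namespaces to it. When separate globals and locals dicts are supplied, a function defined inside the executed snippet resolves free names through globals only, so names placed in locals are invisible to it. The sketch below uses `math` as a stand-in for `pd`, and the snippet string is made up for illustration:

```python
import math

# A made-up snippet that defines a helper function and then calls it.
snippet = (
    "def area(r):\n"
    "    return math.pi * r * r\n"
    "result = area(1.0)\n"
)

# Separate globals/locals: `math` lives in locals, which `area` cannot see.
split_env = {"math": math, "result": None}
try:
    exec(snippet, {}, split_env)
    split_outcome = "ok"
except NameError as exc:
    split_outcome = f"NameError: {exc}"

# Single namespace: the same dict serves as globals and locals.
shared_env = {"math": math, "result": None}
exec(snippet, shared_env)

print(split_outcome)         # a NameError about `math`
print(shared_env["result"])  # pi, the area of a unit circle
```

Generated analysis code often defines small helper functions, so this difference matters in practice.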

## 2. OpenAI Adapter (Reflexion)
We create a function `call_openai_for_code` that:
- Takes a **prompt** (e.g., “User Query: …” or “REFLECT_ON_ERROR: …”).
- Calls the **OpenAI** client.
- Returns the **code snippet** from `.choices[0].message.content`.

We use the model `gpt-4o` (substitute whichever model you have access to) and a short `developer` message that sets the assistant's behavior. In practice, you’ll adapt the system/developer/user messages as needed.

In [None]:
from openai import OpenAI
import os
import re

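# The client reads OPENAI_API_KEY from the environment. Set it in your shell
# before starting the notebook, or pass it explicitly: OpenAI(api_key="...").
# Do not assign an empty string here; that would override a key that is already set.
client = OpenAI()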

def call_openai_for_code(prompt, temperature=0.2):
    """
    Calls the OpenAI chat completions endpoint with a developer message and a user message.
    Expects the LLM to return a Python code snippet.
    """
    messages = [
        {"role": "developer", "content": "You are a helpful code-generation assistant. Return only Python code, with no explanation."},
        {"role": "user", "content": prompt}
    ]

    completion = client.chat.completions.create(
        model="gpt-4o",  # or 'gpt-3.5-turbo' or another model
        messages=messages,
        temperature=temperature,
        max_tokens=300,
    )

    # message.content can be None (e.g., on a refusal), so fall back to ""
    code_snippet = clean_code(completion.choices[0].message.content or "")
    print(code_snippet)
    return code_snippet

# Function to clean up code (remove markdown fences such as ```python ... ```)
def clean_code(code):
    code = code.strip()
    # Capture the body of the first fenced block, skipping an optional language
    # tag. If the closing fence is missing (truncated reply), take the rest.
    match = re.search(r"```[A-Za-z]*[ \t]*\n(.*?)(?:```|\Z)", code, re.DOTALL)
    if match:
        code = match.group(1)
    # Do not strip the word 'python' globally: it may legitimately appear
    # inside the generated code (strings, comments, identifiers).
    return code.strip()
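
Because the adapter only *expects* a code snippet, a cheap static check before execution can catch replies that are not valid Python or that never assign `result`. The following is a minimal sketch using the standard `ast` module; `check_snippet` is a hypothetical helper, not part of the notebook, and it does not execute anything:

```python
import ast

def check_snippet(code, required_name="result"):
    """Hypothetical helper: return (ok, message) from a purely static check."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return False, f"SyntaxError: {exc.msg} (line {exc.lineno})"
    # Collect every name that appears as an assignment target.
    assigned = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }
    if required_name not in assigned:
        return False, f"Snippet never assigns '{required_name}'."
    return True, "ok"

print(check_snippet("result = csv_data.groupby('Category')['Value'].sum()"))
print(check_snippet("print(csv_data.head())"))
print(check_snippet("result = (1 +"))
```

The returned message could be fed into a reflection prompt in the same way as a runtime error.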

## 3. CSV Reflexion Agent
Implements the **Reflexion** loop:
1. Generate initial code from the user query (via `call_openai_for_code`).
2. Execute it. On error, send the error message back to the model, get corrected code, and retry.
3. Return final result or error message.

In [None]:
class CSVReflexionAgent:
    def __init__(self, csv_data, max_reflections=2):
        self.csv_data = csv_data
        self.max_reflections = max_reflections
        self.conversation_log = []

    def analyze_query(self, user_query: str):
        initial_prompt = (
            f"User Query: {user_query}\n"
            "You have a pandas DataFrame named 'csv_data'. Generate Python code to achieve the user's goal.\n"
            "Store the final result in 'result'."
        )

        # 1) Get initial code from OpenAI
        code_snippet = call_openai_for_code(initial_prompt)
        self.conversation_log.append(f"Initial code:\n{code_snippet}\n")

        success, outcome = run_generated_code(code_snippet, self.csv_data)

        reflections = 0
        while not success and reflections < self.max_reflections:
            reflections += 1
            error_msg = outcome
            self.conversation_log.append(f"Error: {error_msg}\n")

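            # 2) Reflection prompt. Each API call is stateless, so include the
            #    original request and the failing code along with the error.
            reflection_prompt = (
                "REFLECT_ON_ERROR: The code below failed.\n"
                f"User Query: {user_query}\n"
                "You have a pandas DataFrame named 'csv_data'. "
                "Store the final result in 'result'.\n"
                f"Code:\n{code_snippet}\n"
                f"Error:\n{error_msg}\n"
                "Please correct the code so it works."
            )

            # Keep the latest attempt so the next reflection sees the right code
            code_snippet = call_openai_for_code(reflection_prompt)
            self.conversation_log.append(f"Corrected code:\n{code_snippet}\n")

            success, outcome = run_generated_code(code_snippet, self.csv_data)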

        # Final check
        if success:
            if outcome is None or (hasattr(outcome, "empty") and outcome.empty):
                self.conversation_log.append("Result is empty. Possibly unsatisfactory.")
                return "No meaningful data produced."
            else:
                return outcome
        else:
            self.conversation_log.append("Exceeded max reflections. No solution.")
            return "Sorry, couldn't generate valid code."
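
The loop can be exercised without an API key by swapping the model call for a scripted stand-in. The sketch below is a simplified, self-contained version of the same generate / execute / reflect cycle; `run_snippet`, `reflexion_loop`, and `fake_llm` are hypothetical names introduced here for illustration, and the "model" simply replays two canned replies:

```python
def run_snippet(code, env):
    """Execute code in a copy of env; return (success, result_or_error)."""
    namespace = dict(env, result=None)
    try:
        exec(code, namespace)
        return True, namespace["result"]
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"

def reflexion_loop(generate, query, env, max_reflections=2):
    """generate(prompt) -> code. On failure, retry with code and error in the prompt."""
    log = []
    code = generate(f"User Query: {query}")
    success, outcome = run_snippet(code, env)
    log.append((code, success, outcome))
    for _ in range(max_reflections):
        if success:
            break
        prompt = (
            f"User Query: {query}\n"
            f"Previous code:\n{code}\n"
            f"Error:\n{outcome}\n"
            "Please correct the code."
        )
        code = generate(prompt)
        success, outcome = run_snippet(code, env)
        log.append((code, success, outcome))
    return success, outcome, log

# Scripted stand-in for the model: the first reply has a typo, the second fixes it.
scripted_replies = iter([
    "result = sum(valeus)",  # deliberate NameError
    "result = sum(values)",
])

def fake_llm(prompt):
    return next(scripted_replies)

success, outcome, log = reflexion_loop(fake_llm, "Sum the values.", {"values": [10, 5, 3]})
print(success, outcome, len(log))  # True 18 2
```

The first attempt fails with a `NameError`, the error is fed back, and the second attempt succeeds, so the log holds two entries.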

## 4. Main Orchestration
Here, we create some dummy CSV data, instantiate the **Reflexion agent**, and run a sample query like “Please sum the 'Value' column by 'Category'.” We then print out the final result (or error) plus the conversation log.

In [None]:
if __name__ == "__main__":
    # Example CSV Data
    sample_data = {
        "Category": ["A", "B", "A", "C", "B", "A"],
        "Value": [10, 5, 3, 8, 2, 6],
        "Comment": ["foo", "bar", "baz", "lorem", "ipsum", "dolor"]
    }

    df = pd.DataFrame(sample_data)

    # Create the agent
    agent = CSVReflexionAgent(csv_data=df, max_reflections=2)

    # User's data analysis request
    user_query = "Please sum the 'Value' column by 'Category'."

    # Let the agent handle it
    result = agent.analyze_query(user_query)

    print("=== Final Analysis Result ===")
    print(result)

    print("\n=== Conversation Log ===")
    for entry in agent.conversation_log:
        print(entry)
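
For reference, the sample query has a known answer that can be computed directly. The snippet below is one plausible form of the code the model might generate, not its guaranteed output; it rebuilds the two relevant columns of the sample data so it can run on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "C", "B", "A"],
    "Value": [10, 5, 3, 8, 2, 6],
})

# Sum 'Value' within each 'Category'; groupby sorts the keys by default.
result = df.groupby("Category")["Value"].sum()

# Convert to plain Python ints for a compact printout.
print({k: int(v) for k, v in result.items()})  # {'A': 19, 'B': 7, 'C': 8}
```

Comparing the agent's final result against these totals is a quick sanity check.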

## How to Use
1. **Install** `openai` and set your `OPENAI_API_KEY` environment variable, or configure `client = OpenAI(api_key="...")`.
2. **Run all cells**. The final cell will demonstrate a sample user query. If the initial code fails, the agent tries up to 2 reflections.
3. The outcome might be something like a `pandas.Series` or `DataFrame` grouping result (e.g., sums by category).
4. Inspect the `conversation_log` for each step of code generation and error correction.

## Key Points
- **Reflexion**: We feed errors back to the LLM so it can correct the generated code.
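- **Sandbox**: The generated code runs through a local `exec()`, which provides no isolation. In real applications, run it in an isolated process or container.
- **Single** or **Multiple** Reflection Steps: `max_reflections=2` caps the number of retries, so the loop always terminates even if the model never produces working code.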
- **Prompt Engineering**: You can refine the system/developer messages for more specialized code generation or debugging instructions.
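
As a first step toward the safer isolation mentioned above, generated code can be run in a separate interpreter process with a timeout. This is only a sketch under stated assumptions: `run_in_subprocess` is a hypothetical helper, a child process is not a security sandbox (it still has your user's file and network access), and passing `csv_data` across the process boundary (for example via a temporary CSV file) is left out:

```python
import subprocess
import sys

def run_in_subprocess(code, timeout_s=5.0):
    """Hypothetical helper: run code in a fresh interpreter; return (success, output_or_error)."""
    try:
        proc = subprocess.run(
            # -I: isolated mode (ignores PYTHON* environment variables and user site-packages)
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"Timed out after {timeout_s} s"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()

print(run_in_subprocess("print(1 + 1)"))                     # (True, '2')
print(run_in_subprocess("while True: pass", timeout_s=1.0))  # (False, 'Timed out after 1.0 s')
```

A crash or an infinite loop in the generated code then cannot take down the notebook kernel, and the captured `stderr` can serve as the error message for reflection.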
