In [1]:
from langgraph.graph import StateGraph, END
from typing import TypedDict
from langchain_ollama.chat_models import ChatOllama

In [2]:
class AgentState(TypedDict):
    user_input: str
    dataset_code: str
    model_code: str
    interpretation: str

In [3]:
llm = ChatOllama(model="gemma3:latest", temperature=0.2)

In [4]:
# Agent 1 - Generate Synthetic Fraud Dataset Using Faker

def generate_synthetic_data(state: AgentState) -> AgentState:
    prompt = """
You are a Python data engineer.

Generate Python code that:
- Uses `Faker` and `pandas` to create a synthetic fraud detection dataset
- Contains 1000 rows with columns: name, email, transaction_amount, transaction_date, location, is_fraud
- Randomly assigns ~5% of rows as fraud (is_fraud = 1)
- Saves the dataset to 'synthetic_fraud.csv'
- Is fully executable and self-contained
"""
    response = llm.invoke(prompt)
    return {**state, "dataset_code": response.content}


In [5]:
# Agent 2 - Build Classification Model using PyCaret

def build_model_with_pycaret(state: AgentState) -> AgentState:
    prompt = f"""
You are a data scientist.

Write end-to-end Python code using PyCaret to:
1. Load the dataset 'synthetic_fraud.csv'
2. Setup classification using 'is_fraud' as the target column
3. Compare models
4. Select the best performing model
5. Evaluate the model and display metrics

Assume the dataset was generated using:
{state['dataset_code']}
"""
    response = llm.invoke(prompt)
    return {**state, "model_code": response.content}


In [6]:
# Agent 3 - Interpret Model Results

def interpret_results(state: AgentState) -> AgentState:
    prompt = f"""
You are a machine learning analyst.

Given the following PyCaret code:
{state['model_code']}

Please summarize:
- Which model performed best?
- What are the key performance metrics (AUC, accuracy)?
- Any recommendations based on the results
"""
    response = llm.invoke(prompt)
    return {**state, "interpretation": response.content}


In [7]:
# Create LangGraph and Define Flow

graph = StateGraph(AgentState)

# nodes
graph.add_node("generate_data", generate_synthetic_data)
graph.add_node("build_model", build_model_with_pycaret)
graph.add_node("interpret", interpret_results)

graph.set_entry_point("generate_data")
graph.add_edge("generate_data", "build_model")
graph.add_edge("build_model", "interpret")
graph.set_finish_point("interpret")

app = graph.compile()
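
Because the graph is a straight line, each node receives the state left by the previous node and returns it with one more key filled in. The sketch below imitates that hand-off in plain Python. It is not how LangGraph is implemented; the stub functions, `run_linear`, and the placeholder strings are illustrative assumptions standing in for the three LLM-backed nodes.

```python
from typing import Callable, Dict, List

State = Dict[str, str]


def stub_generate(state: State) -> State:
    # Stand-in for generate_synthetic_data: adds "dataset_code".
    return {**state, "dataset_code": "# dataset code"}


def stub_build(state: State) -> State:
    # Stand-in for build_model_with_pycaret: reads "dataset_code", adds "model_code".
    return {**state, "model_code": f"# model code based on: {state['dataset_code']}"}


def stub_interpret(state: State) -> State:
    # Stand-in for interpret_results: reads "model_code", adds "interpretation".
    return {**state, "interpretation": f"summary of: {state['model_code']}"}


def run_linear(nodes: List[Callable[[State], State]], state: State) -> State:
    # Call the nodes in order, feeding each one the previous node's output.
    for node in nodes:
        state = node(state)
    return state


final_state = run_linear(
    [stub_generate, stub_build, stub_interpret],
    {"user_input": "demo"},
)
print(sorted(final_state))
```

Each stub copies the incoming state with `{**state, ...}`, so earlier keys such as `user_input` survive to the end even though no node reads them.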


In [8]:

# Provide initial user input
user_question = "Create a fraud detection model using synthetic data and interpret the results"

# Run the LangGraph workflow
result = app.invoke({"user_input": user_question})


In [9]:
from IPython.display import Markdown, display

print("Synthetic Data Generation Code")
display(Markdown(f"```python\n{result['dataset_code']}\n```"))


Synthetic Data Generation Code


```python
import pandas as pd
import numpy as np
from faker import Faker

# Initialize Faker
fake = Faker()

# Set the number of rows
num_rows = 1000

# Generate data
data = {
    'name': [fake.name() for _ in range(num_rows)],
    'email': [fake.email() for _ in range(num_rows)],
    'transaction_amount': np.random.uniform(10, 1000, num_rows),
    'transaction_date': pd.to_datetime([''.join(fake.date_time_this_decade()) for _ in range(num_rows)]),
    'location': [fake.city() for _ in range(num_rows)],
    'is_fraud': np.random.choice([0, 1], size=num_rows, p=[0.95, 0.05])  # 5% fraud
}

# Create DataFrame
df = pd.DataFrame(data)

# Save to CSV
df.to_csv('synthetic_fraud.csv', index=False)

print("synthetic_fraud.csv created successfully!")
```
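
Chat models frequently wrap generated code in their own Markdown fence, so placing `response.content` inside a second fence, as the display cells do, can produce a fence within a fence. A minimal sketch of a helper that removes one outer fence before display follows; `strip_code_fence` is a hypothetical name, and the pattern assumes the whole response is a single fenced block.

```python
import re

# One outer fence: opening backticks with an optional language tag,
# the body, then closing backticks at the very end of the text.
_FENCE_RE = re.compile(r"^\s*```[\w+-]*[ \t]*\n(.*?)\n?```\s*$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Return the body of a fenced block, or the stripped text if unfenced."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


sample = "```python\nimport pandas as pd\nprint('ok')\n```\n"
print(strip_code_fence(sample))
```

In cell 9, the helper would be applied to `result['dataset_code']` before the f-string adds its own fence. Responses that mix prose and code would need a more careful extraction step.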

In [10]:

print("PyCaret Classification Code")
display(Markdown(f"```python\n{result['model_code']}\n```"))


PyCaret Classification Code


```python
import pandas as pd
import numpy as np
from pycaret.classification import *
import time

# Load the dataset 'synthetic_fraud.csv'
df = pd.read_csv('synthetic_fraud.csv')

# Print the first few rows of the dataset
print(df.head())

# Setup classification using 'is_fraud' as the target column
s = setup(data=df, target='is_fraud', session_id=42, verbose=True)

# Compare models
time.sleep(5)  # Allow models to train - increase if needed
best_model = compare_models(n_models=5, fold=5, sort='AUC', verbose=True)

# Select the best performing model
best_model_name = best_model.index[np.argmax(best_model['AUC'])]
print(f"\nBest performing model: {best_model_name}")

# Evaluate the model and display metrics
predictions = s.predict(predictions=best_model_name)
predictions = pd.DataFrame({'is_fraud': predictions})
predictions = predictions.merge(df, on='index')
print("\nPredictions:")
print(predictions.head())

# Evaluate the model
evaluate_model(best_model_name)

# Get the best model
best_model_name = s.retrain_model(best_model_name)
print(f"\nRetrained model: {best_model_name}")

# Evaluate the retrained model
evaluate_model(best_model_name)
```

In [11]:
print("Model Interpretation")
display(Markdown(result['interpretation']))


Model Interpretation


Okay, here's a summary of the PyCaret code execution and recommendations, based on the provided snippet:

**Summary of Execution:**

The code performs a standard PyCaret classification workflow to detect fraudulent transactions. Here's a breakdown of the steps:

1.  **Data Loading & Setup:** Loads the 'synthetic_fraud.csv' dataset and sets up a classification task using 'is_fraud' as the target variable.  A session ID of 42 is used for reproducibility.
2.  **Model Comparison:**  `compare_models` is used to train and evaluate 5 different classification models (likely Logistic Regression, XGBoost, Random Forest, etc.) using 5-fold cross-validation. The models are sorted by AUC (Area Under the ROC Curve).
3.  **Best Model Selection:** The model with the highest AUC is identified as the `best_model_name`.
4.  **Initial Evaluation:** The `best_model_name` is evaluated using `evaluate_model`, providing initial performance metrics.
5.  **Retraining:** The `best_model_name` is retrained using `s.retrain_model()`.
6.  **Retrained Evaluation:** The retrained model is evaluated again.

**Key Performance Metrics (Based on the Code):**

*   **Best Performing Model:** The code identifies the model with the highest AUC as the `best_model_name`.  The specific model name isn't explicitly printed, but the code indicates that it's the model with the highest AUC.
*   **AUC:** The AUC value for the best model is used to rank the models.
*   **Accuracy:** The `evaluate_model` function will also calculate and display the accuracy metric. However, the code snippet doesn't show the final accuracy value.

**Recommendations:**

1.  **Retrieve and Analyze Accuracy:** The most critical missing piece is the actual accuracy value for the best model.  You *must* examine the output of `evaluate_model(best_model_name)` to see the final accuracy score.  This is the primary metric for judging the model's performance.

2.  **Consider Precision and Recall:** While AUC is a good overall measure, it's crucial to also look at precision and recall, especially in fraud detection.  A model with high AUC might have a high false positive rate (many non-fraudulent transactions flagged as fraud).  The `evaluate_model` function should provide these metrics.

3.  **Increase `time.sleep()`:** The `time.sleep(5)` call is a deliberate pause to allow the models to train.  If the training process is consistently slow, you might need to increase this duration.

4.  **Further Exploration:**  After obtaining the accuracy, precision, and recall, consider:
    *   **Feature Importance:**  PyCaret can provide insights into which features are most important for predicting fraud. This can help you understand the underlying patterns.
    *   **Threshold Tuning:**  The default threshold for classifying a transaction as fraudulent might not be optimal. Experiment with different thresholds to balance precision and recall based on your business needs.

5.  **Data Quality:** The synthetic nature of the dataset might limit the generalizability of the model.  If possible, evaluate the model on a real-world dataset.
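
The precision, recall, and threshold recommendations above can be made concrete with a few lines of plain Python. The sketch below computes both metrics at two thresholds. The labels and scores are made-up illustrative values, not output from the notebook's model, and `precision_recall` is a hypothetical helper.

```python
from typing import List, Tuple


def precision_recall(
    y_true: List[int], scores: List[float], threshold: float
) -> Tuple[float, float]:
    # Flag a transaction as fraud when its score reaches the threshold.
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative data only: 3 fraud cases among 10 transactions.
y_true = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.45, 0.60, 0.15, 0.55, 0.90]

for threshold in (0.5, 0.3):
    p, r = precision_recall(y_true, scores, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 catches every fraud case but flags more legitimate transactions. With roughly 5% fraud, a model that predicts "not fraud" for every row already reaches about 95% accuracy, which is why precision and recall say more than accuracy for this dataset.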

To provide a more definitive answer, I would need to see the output of the `evaluate_model` calls to get the actual accuracy, precision, and recall values.
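
The interpretation can only describe the code because the workflow hands `model_code` to the third agent without ever running it. One possible way to close that gap, sketched here under assumptions, is to run a generated script in a separate Python process and capture what it prints, so that real output can be added to the interpretation prompt. `run_generated_code` is a hypothetical helper, not part of the notebook, and model-generated code should only be executed in an isolated environment.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def run_generated_code(code: str, timeout: float = 60.0) -> Tuple[int, str, str]:
    """Run a script in a fresh interpreter; return (exit code, stdout, stderr)."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "generated.py"
        script.write_text(code, encoding="utf-8")
        # The script runs inside the temporary directory, so any files it
        # writes there are deleted when the directory is cleaned up.
        completed = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return completed.returncode, completed.stdout, completed.stderr


exit_code, stdout, stderr = run_generated_code("print(1 + 1)")
print(exit_code, stdout.strip())
```

A non-zero exit code with a traceback in `stderr` is useful too: it could be fed back to the model as a request to repair the script. If the dataset and model scripts need to share `synthetic_fraud.csv`, they would have to run in the same working directory rather than separate temporary ones.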
