# DataAgent LangGraph Integration Example

This notebook demonstrates how to integrate DataAgent tools with LangGraph for automated data analysis workflows.

## Overview

LangGraph is a framework for building stateful, multi-actor applications with LLMs. DataAgent's tools are called like ordinary Python functions, so each one can be wrapped in a LangGraph node and chained into an analysis pipeline.

## Key Concepts

- **State Management**: LangGraph manages state across workflow steps
- **Tool Integration**: DataAgent tools can be used as nodes in the workflow
- **Automated Analysis**: Complete data analysis pipelines can be automated
- **Recommendations**: Rule-based suggestions derived from model metrics such as accuracy and R-squared
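
As a rough, library-free illustration of the first two concepts, the sketch below treats each node as a function that receives a state dictionary and returns it, then walks a fixed chain of edges. The names `run_graph`, `double`, and `describe` are hypothetical, and this is not LangGraph's API, only the idea behind it.

```python
from typing import Callable, Dict, Optional

State = Dict[str, object]
Node = Callable[[State], State]


def run_graph(nodes: Dict[str, Node], edges: Dict[str, Optional[str]],
              entry: str, state: State) -> State:
    """Follow edges from the entry node, passing state through each node."""
    current: Optional[str] = entry
    while current is not None:
        state = nodes[current](state)
        current = edges.get(current)  # None means the chain has ended
    return state


def double(state: State) -> State:
    """Toy node: doubles the stored value and records the step."""
    state["value"] = state["value"] * 2
    state["current_step"] = "doubled"
    return state


def describe(state: State) -> State:
    """Toy node: writes a short summary of the current value."""
    state["summary"] = f"value is {state['value']}"
    state["current_step"] = "described"
    return state


final = run_graph(
    nodes={"double": double, "describe": describe},
    edges={"double": "describe", "describe": None},
    entry="double",
    state={"value": 21, "current_step": "initialized"},
)
print(final["summary"])  # value is 42
```

The rest of this notebook follows the same shape, with DataAgent calls inside the node functions.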

## Installation

```bash
pip install "datagent[langgraph]"
```

In [None]:
# Import required libraries
import datagent
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from typing import Dict, Any, TypedDict
import warnings
warnings.filterwarnings('ignore')

print(f"DataAgent version: {datagent.__version__}")

## 1. Define State Structure

First, let's define the state structure that will be used throughout our LangGraph workflow.

In [None]:
# Define the state structure for our LangGraph workflow
class AnalysisState(TypedDict):
    data: pd.DataFrame
    target: pd.Series
    analysis_results: Dict[str, Any]
    current_step: str
    recommendations: list

print("State structure defined successfully!")
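
One point worth keeping in mind: `TypedDict` describes the expected keys for static type checkers but does not validate anything at runtime. The sketch below uses a simplified, hypothetical `MiniState` (plain Python types instead of pandas objects) to show this.

```python
from typing import Any, Dict, List, TypedDict


class MiniState(TypedDict):
    analysis_results: Dict[str, Any]
    current_step: str
    recommendations: List[str]


# Calling the class just builds an ordinary dict.
state = MiniState(analysis_results={}, current_step="initialized", recommendations=[])
print(type(state) is dict)  # True

# No runtime check happens: a wrong type is accepted silently,
# and only a static checker such as mypy would flag it.
state["current_step"] = 123  # type: ignore[typeddict-item]
print(state["current_step"])  # 123

# The declared keys are available if manual validation is wanted.
missing = set(MiniState.__annotations__) - set(state)
print(missing)  # set()
```

If stricter guarantees are needed, the check on `__annotations__` can be run at the start of each node.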

## 2. Data Loading Step

This step loads and prepares the data for analysis.

In [None]:
def load_data(state: AnalysisState) -> AnalysisState:
    """Load and prepare data"""
    print("Loading iris dataset...")
    
    iris = load_iris()
    X = pd.DataFrame(iris.data, columns=iris.feature_names)
    y = pd.Series(iris.target)
    
    state["data"] = X
    state["target"] = y
    state["current_step"] = "data_loaded"
    state["analysis_results"] = {}
    state["recommendations"] = []
    
    print(f"Loaded dataset with shape: {X.shape}")
    print(f"Target classes: {np.unique(y)}")
    return state

# Test the data loading step
initial_state = AnalysisState(
    data=pd.DataFrame(),
    target=pd.Series(dtype="int64"),
    analysis_results={},
    current_step="initialized",
    recommendations=[]
)

state = load_data(initial_state)
print(f"\nCurrent step: {state['current_step']}")
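
`load_data` mutates the state it receives and returns it. LangGraph nodes may also return only the keys they changed, and the framework merges those into the state. The sketch below imitates that merge with a hypothetical `apply_update` helper and a toy loader; it does not use LangGraph itself, and the rows are placeholder numbers.

```python
from typing import Any, Dict


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new state with the node's returned keys overwriting old values."""
    return {**state, **update}


def load_numbers(state: Dict[str, Any]) -> Dict[str, Any]:
    """Toy loader: returns only the keys it sets and leaves the input untouched."""
    rows = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]]  # placeholder rows
    return {"data": rows, "current_step": "data_loaded"}


initial = {"data": [], "current_step": "initialized", "recommendations": []}
state = apply_update(initial, load_numbers(initial))

print(state["current_step"])    # data_loaded
print(initial["current_step"])  # initialized (the original dict is unchanged)
```

Returning partial updates keeps each node focused on the keys it owns, at the cost of one extra merge step when running nodes by hand.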

## 3. Sklearn Analysis Step

This step performs machine learning analysis using DataAgent's sklearn tools.

In [None]:
def analyze_with_sklearn(state: AnalysisState) -> AnalysisState:
    """Analyze data using sklearn tools"""
    print("Running sklearn analysis...")
    
    try:
        # Create DataFrame with target column for sklearn tools
        df = state["data"].copy()
        df['target'] = state["target"]
        
        # Use DataAgent's sklearn tool
        result = datagent.universal_sklearn_estimator(
            estimator_name="random_forest_classifier",
            data=df,
            target_column="target",
            test_size=0.2,
            random_state=42,
            n_estimators=100
        )
        
        state["analysis_results"]["sklearn"] = result
        state["current_step"] = "sklearn_completed"
        
        # Add recommendations based on results
        accuracy = result["metrics"]["accuracy"]
        if accuracy > 0.95:
            state["recommendations"].append("Excellent model performance! Consider using this model for production.")
        elif accuracy > 0.85:
            state["recommendations"].append("Good model performance. Consider hyperparameter tuning for improvement.")
        else:
            state["recommendations"].append("Model performance could be improved. Consider feature engineering or different algorithms.")
            
        print(f"Sklearn analysis completed. Accuracy: {accuracy:.4f}")
        
    except Exception as e:
        print(f"Error in sklearn analysis: {e}")
        state["analysis_results"]["sklearn"] = {"error": str(e)}
    
    return state

# Test the sklearn analysis step
state = analyze_with_sklearn(state)
print(f"\nCurrent step: {state['current_step']}")
print(f"Recommendations: {len(state['recommendations'])}")
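
The accuracy thresholds in this step are plain rules, so they can be pulled into a small pure function that is easy to test on its own. `recommend_from_accuracy` is a hypothetical helper name; the cut-offs (0.95 and 0.85) and the messages mirror the ones used in `analyze_with_sklearn`.

```python
def recommend_from_accuracy(accuracy: float) -> str:
    """Map a test accuracy in [0, 1] to a recommendation string."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError(f"accuracy must be between 0 and 1, got {accuracy}")
    if accuracy > 0.95:
        return "Excellent model performance! Consider using this model for production."
    if accuracy > 0.85:
        return "Good model performance. Consider hyperparameter tuning for improvement."
    return ("Model performance could be improved. "
            "Consider feature engineering or different algorithms.")


# Sample accuracies chosen only to exercise each branch.
for acc in (0.97, 0.90, 0.60):
    print(f"{acc:.2f} -> {recommend_from_accuracy(acc)}")
```

Note that the comparisons are strict, so an accuracy of exactly 0.95 falls into the "Good" band.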

## 4. Statsmodels Analysis Step

This step performs statistical analysis using DataAgent's statsmodels tools.

In [None]:
def analyze_with_statsmodels(state: AnalysisState) -> AnalysisState:
    """Analyze data using statsmodels tools"""
    print("Running statsmodels analysis...")
    
    try:
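        # Create a combined dataset for statsmodels. The iris columns are named
        # like "sepal length (cm)", so rename them to match the formula below.
        df = state["data"].copy()
        df.columns = [c.replace(" (cm)", "").replace(" ", "_") for c in df.columns]
        df['target'] = state["target"]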
        
        # Use DataAgent's statsmodels tool
        result = datagent.universal_linear_models(
            model_name="ols",
            data=df,
            formula="target ~ sepal_length + sepal_width + petal_length + petal_width"
        )
        
        state["analysis_results"]["statsmodels"] = result
        state["current_step"] = "statsmodels_completed"
        
        # Add recommendations based on results
        r_squared = result.get("r_squared", 0)
        if r_squared > 0.9:
            state["recommendations"].append("Strong linear relationship detected. Linear models are appropriate.")
        elif r_squared > 0.7:
            state["recommendations"].append("Moderate linear relationship. Consider non-linear models for better fit.")
        else:
            state["recommendations"].append("Weak linear relationship. Consider non-linear models or feature engineering.")
            
        print(f"Statsmodels analysis completed. R-squared: {r_squared:.4f}")
        
    except Exception as e:
        print(f"Error in statsmodels analysis: {e}")
        state["analysis_results"]["statsmodels"] = {"error": str(e)}
    
    return state

# Test the statsmodels analysis step
state = analyze_with_statsmodels(state)
print(f"\nCurrent step: {state['current_step']}")
print(f"Total recommendations: {len(state['recommendations'])}")
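
scikit-learn's iris loader names its columns `sepal length (cm)`, `sepal width (cm)`, and so on, while formula strings need plain identifiers such as `sepal_length`. The sketch below shows one way to derive formula-safe names; `to_formula_names` is a hypothetical helper, not part of DataAgent or statsmodels.

```python
import re
from typing import List


def to_formula_names(columns: List[str]) -> List[str]:
    """Turn labels like 'sepal length (cm)' into identifiers like 'sepal_length'."""
    cleaned = []
    for name in columns:
        name = re.sub(r"\s*\(.*?\)", "", name)    # drop unit suffixes such as "(cm)"
        name = re.sub(r"\W+", "_", name.strip())  # replace non-word runs with "_"
        cleaned.append(name.strip("_").lower())
    return cleaned


iris_columns = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
safe = to_formula_names(iris_columns)
formula = "target ~ " + " + ".join(safe)
print(formula)
# target ~ sepal_length + sepal_width + petal_length + petal_width
```

With a pandas DataFrame, the same helper could be applied as `df.columns = to_formula_names(list(df.columns))` before the target column is added.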

## 5. Report Generation Step

This step generates a comprehensive analysis report.

In [None]:
def generate_report(state: AnalysisState) -> AnalysisState:
    """Generate a comprehensive analysis report"""
    print("Generating analysis report...")
    
    report = {
        "summary": {
            "dataset_shape": state["data"].shape,
            "target_classes": len(state["target"].unique()),
            "analysis_steps": state["current_step"]
        },
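        # Copy the results so the report does not end up referencing itself
        # once it is stored under state["analysis_results"]["report"] below.
        "results": dict(state["analysis_results"]),
        "recommendations": list(state["recommendations"])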
    }
    
    state["analysis_results"]["report"] = report
    state["current_step"] = "report_generated"
    
    print("Report generated successfully!")
    return state

# Test the report generation step
state = generate_report(state)
print(f"\nCurrent step: {state['current_step']}")
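
A report that is stored inside the same dictionary it summarizes can end up referencing itself, which makes `json.dumps` fail with a circular-reference error. The sketch below builds the report from copies so it stays serializable. `build_report` is a hypothetical helper, and the accuracy value is a placeholder rather than a real result.

```python
import json
from typing import Any, Dict, List, Tuple


def build_report(shape: Tuple[int, int], n_classes: int,
                 results: Dict[str, Any], recommendations: List[str]) -> Dict[str, Any]:
    """Assemble a report from copies so it never points back at live state."""
    return {
        "summary": {"dataset_shape": list(shape), "target_classes": n_classes},
        "results": dict(results),  # shallow copy avoids a self-reference
        "recommendations": list(recommendations),
    }


analysis_results = {"sklearn": {"metrics": {"accuracy": 0.9}}}  # placeholder value
report = build_report((150, 4), 3, analysis_results, ["Good model performance."])
analysis_results["report"] = report  # safe: report["results"] is a separate dict

text = json.dumps(report, indent=2)
print("report" in report["results"])  # False
```

The tuple shape is converted to a list up front so the JSON round trip returns the same structure.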

## 6. Report Display Step

This step displays the analysis report in a formatted way.

In [None]:
def print_report(state: AnalysisState) -> AnalysisState:
    """Print the analysis report"""
    print("\n" + "="*60)
    print("DATAAGENT ANALYSIS REPORT")
    print("="*60)
    
    report = state["analysis_results"]["report"]
    
    print(f"\nDataset Summary:")
    print(f"  Shape: {report['summary']['dataset_shape']}")
    print(f"  Target classes: {report['summary']['target_classes']}")
    
    print(f"\nAnalysis Results:")
    
    # Sklearn results
    if "sklearn" in report["results"] and "error" not in report["results"]["sklearn"]:
        sklearn_result = report["results"]["sklearn"]
        print(f"  Scikit-learn Random Forest:")
        print(f"    Accuracy: {sklearn_result['metrics']['accuracy']:.4f}")
        print(f"    Precision: {sklearn_result['metrics']['precision']:.4f}")
        print(f"    Recall: {sklearn_result['metrics']['recall']:.4f}")
        print(f"    F1 Score: {sklearn_result['metrics']['f1']:.4f}")
    
    # Statsmodels results
    if "statsmodels" in report["results"] and "error" not in report["results"]["statsmodels"]:
        statsmodels_result = report["results"]["statsmodels"]
        print(f"  Statsmodels OLS:")
        print(f"    R-squared: {statsmodels_result.get('r_squared', 'N/A')}")
        print(f"    Adjusted R-squared: {statsmodels_result.get('adj_r_squared', 'N/A')}")
        print(f"    AIC: {statsmodels_result.get('aic', 'N/A')}")
    
    print(f"\nRecommendations:")
    for i, rec in enumerate(report["recommendations"], 1):
        print(f"  {i}. {rec}")
    
    print("\n" + "="*60)
    
    return state

# Display the final report
state = print_report(state)
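
In `print_report`, the sklearn metrics are formatted with `:.4f`, while the statsmodels values fall back to `'N/A'` and are printed unformatted. A small helper can handle both cases uniformly. `fmt_metric` is a hypothetical name and the metric values are placeholders.

```python
from typing import Any


def fmt_metric(value: Any, digits: int = 4) -> str:
    """Format numbers with fixed precision; fall back to 'N/A' for anything else."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "N/A"
    return f"{value:.{digits}f}"


metrics = {"r_squared": 0.93, "aic": None}  # placeholder values
print(f"R-squared: {fmt_metric(metrics.get('r_squared'))}")  # R-squared: 0.9300
print(f"AIC: {fmt_metric(metrics.get('aic'))}")              # AIC: N/A
print(f"BIC: {fmt_metric(metrics.get('bic'))}")              # BIC: N/A
```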

## 7. Complete Workflow Execution

Now let's run the complete workflow from start to finish.

In [None]:
def main():
    """Main function to run the LangGraph-style workflow"""
    print("DataAgent LangGraph Integration Example")
    print("="*50)
    
    # Initialize state
    state = AnalysisState(
        data=pd.DataFrame(),
        target=pd.Series(dtype="int64"),
        analysis_results={},
        current_step="initialized",
        recommendations=[]
    )
    
    # Run the workflow steps
    state = load_data(state)
    state = analyze_with_sklearn(state)
    state = analyze_with_statsmodels(state)
    state = generate_report(state)
    state = print_report(state)
    
    print("\nWorkflow completed successfully!")
    return state

# Run the complete workflow
final_state = main()
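
Because each analysis step catches its own exceptions and stores `{"error": ...}` in the results, the workflow can reach the end even when a step failed. The sketch below shows one way to check for that before declaring success. `failed_steps` is a hypothetical helper and the sample results are placeholders.

```python
from typing import Any, Dict, List


def failed_steps(analysis_results: Dict[str, Any]) -> List[str]:
    """Return the names of steps whose result is a dict containing an 'error' key."""
    return [
        name for name, result in analysis_results.items()
        if isinstance(result, dict) and "error" in result
    ]


results = {
    "sklearn": {"metrics": {"accuracy": 0.9}},                # placeholder success
    "statsmodels": {"error": "formula could not be parsed"},  # placeholder failure
}
failures = failed_steps(results)
if failures:
    print(f"Workflow finished with errors in: {', '.join(failures)}")
else:
    print("Workflow completed successfully!")
```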

## 8. LangGraph Integration

The same node functions plug into LangGraph's `StateGraph` unchanged. The cell below keeps the code inside a string so that the notebook still runs without LangGraph installed:

In [None]:
# Example LangGraph integration (commented out as it requires langgraph installation)
"""
from langgraph.graph import StateGraph, END

# Create the graph
workflow = StateGraph(AnalysisState)

# Add nodes
workflow.add_node("load_data", load_data)
workflow.add_node("sklearn_analysis", analyze_with_sklearn)
workflow.add_node("statsmodels_analysis", analyze_with_statsmodels)
workflow.add_node("generate_report", generate_report)
workflow.add_node("print_report", print_report)

# Define the flow
workflow.set_entry_point("load_data")
workflow.add_edge("load_data", "sklearn_analysis")
workflow.add_edge("sklearn_analysis", "statsmodels_analysis")
workflow.add_edge("statsmodels_analysis", "generate_report")
workflow.add_edge("generate_report", "print_report")
workflow.add_edge("print_report", END)  # mark where the workflow finishes

# Compile the graph
app = workflow.compile()

# Run the workflow
initial_state = AnalysisState(
    data=pd.DataFrame(),
    target=pd.Series(dtype="int64"),
    analysis_results={},
    current_step="initialized",
    recommendations=[]
)

result = app.invoke(initial_state)
"""

print("LangGraph integration example shown above (commented out)")
print("To use with LangGraph, install: pip install langgraph")
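
LangGraph also supports conditional edges through `add_conditional_edges`, where a function inspects the state and returns a key that is mapped to the next node. The routing function itself is plain Python, so it can be sketched and tested without LangGraph. `route_after_sklearn` is a hypothetical name, and the sample states are placeholders.

```python
from typing import Any, Dict


def route_after_sklearn(state: Dict[str, Any]) -> str:
    """Skip the statsmodels step when the sklearn step recorded an error."""
    sklearn_result = state.get("analysis_results", {}).get("sklearn", {})
    if "error" in sklearn_result:
        return "generate_report"
    return "statsmodels_analysis"


# With LangGraph installed, this function could replace the fixed edge out of
# "sklearn_analysis" (kept as a comment so the sketch runs without LangGraph):
#
# workflow.add_conditional_edges(
#     "sklearn_analysis",
#     route_after_sklearn,
#     {"statsmodels_analysis": "statsmodels_analysis",
#      "generate_report": "generate_report"},
# )

ok_state = {"analysis_results": {"sklearn": {"metrics": {"accuracy": 0.9}}}}
bad_state = {"analysis_results": {"sklearn": {"error": "boom"}}}
print(route_after_sklearn(ok_state))   # statsmodels_analysis
print(route_after_sklearn(bad_state))  # generate_report
```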

## 9. Summary

This example demonstrates how DataAgent can be integrated into LangGraph workflows:

### Key Benefits:

1. **Stateful Workflows**: Each step maintains and updates state
2. **Modular Design**: Each analysis step is a separate function
3. **Automated Insights**: Rule-based recommendations derived from the analysis results
4. **Comprehensive Analysis**: Both ML and statistical analysis in one workflow
5. **Extensible**: Easy to add new analysis steps or modify existing ones

### Workflow Steps:

1. **Data Loading**: Load and prepare data
2. **Sklearn Analysis**: Perform machine learning analysis
3. **Statsmodels Analysis**: Perform statistical analysis
4. **Report Generation**: Create comprehensive report
5. **Report Display**: Format and display results

### Next Steps:

1. Install LangGraph: `pip install langgraph`
2. Remove the triple quotes around the LangGraph integration code above so that it runs
3. Add more analysis steps (clustering, feature engineering, etc.)
4. Integrate with LLM agents for more sophisticated recommendations
5. Deploy as a production workflow