# DataAgent Basic Usage Example

This notebook demonstrates how to use the scikit-learn and statsmodels tools in the DataAgent package (imported as `datagent`).

## Overview

DataAgent provides a unified interface for:
- **Scikit-learn tools**: Machine learning estimators with automated parameter validation
- **Statsmodels tools**: Statistical analysis including linear models, GLM, nonparametric methods, and more

## Installation

```bash
pip install datagent
```

In [None]:
# Import required libraries
import datagent
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
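import warnings

# Silence all library warnings to keep the notebook output short;
# remove this line when debugging
warnings.filterwarnings('ignore')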

print(f"DataAgent version: {datagent.__version__}")

## 1. Scikit-learn Example

Let's start with a machine learning example using the iris dataset.

In [None]:
# Load iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)

print(f"Dataset shape: {X.shape}")
print(f"Target classes: {np.unique(y)}")
print(f"Feature names: {list(X.columns)}")

# Display first few rows
X.head()

In [None]:
# Create DataFrame with target column for sklearn tools
df = X.copy()
df['target'] = y

# Use universal sklearn estimator
result = datagent.universal_sklearn_estimator(
    estimator_name="random_forest_classifier",
    data=df,
    target_column="target",
    test_size=0.2,
    random_state=42,
    n_estimators=100
)

print(f"Model: {result['estimator_name']}")
print(f"Accuracy: {result['metrics']['accuracy']:.4f}")
print(f"Precision: {result['metrics']['precision']:.4f}")
print(f"Recall: {result['metrics']['recall']:.4f}")
print(f"F1 Score: {result['metrics']['f1']:.4f}")
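
For comparison, here is a minimal sketch of what a call like the one above might correspond to in plain scikit-learn. It assumes that `universal_sklearn_estimator` splits the data with `train_test_split`, fits the named estimator, and reports weighted-average precision, recall, and F1. The notebook does not confirm any of these details, and names such as `df_iris` and `manual_metrics` are illustrative only.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Rebuild the iris DataFrame with a "target" column, as in the cell above
iris = load_iris()
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
df_iris["target"] = iris.target

# Separate features from the target column
features = df_iris.drop(columns="target")
labels = df_iris["target"]

# Hold out 20% of the rows for evaluation (assumed to mirror test_size=0.2)
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

# Fit the estimator; reusing random_state here is an assumption
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Weighted averaging is an assumption; DataAgent may average differently
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0
)

manual_metrics = {
    "accuracy": accuracy,
    "precision": precision,
    "recall": recall,
    "f1": f1,
}
for name, value in manual_metrics.items():
    print(f"{name}: {value:.4f}")
```

If the numbers from this sketch differ from those reported by DataAgent, the likely causes are a different split, a different averaging mode, or additional preprocessing inside the library.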

## 2. Statsmodels Example

Now let's demonstrate statistical analysis using statsmodels tools.

In [None]:
# Simulate data for linear regression: y = 2*x1 + 1.5*x2 + noise (sd 0.5)
# Note: this rebinds X and y (and df below) from the iris example
np.random.seed(42)
n = 100
X = np.random.randn(n, 2)
y = 2 * X[:, 0] + 1.5 * X[:, 1] + np.random.randn(n) * 0.5

df = pd.DataFrame({
    'y': y,
    'x1': X[:, 0],
    'x2': X[:, 1]
})

print(f"Dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")

# Display first few rows
df.head()

In [None]:
# Use universal linear model
result = datagent.universal_linear_models(
    model_name="ols",
    data=df,
    formula="y ~ x1 + x2"
)

print(f"Model: {result.get('model_name', 'OLS')}")
print(f"R-squared: {result.get('r_squared', 'N/A')}")
print(f"Adjusted R-squared: {result.get('adj_r_squared', 'N/A')}")
print(f"AIC: {result.get('aic', 'N/A')}")
print(f"BIC: {result.get('bic', 'N/A')}")

# Print coefficients if available
if 'params' in result:
    print("\nCoefficients:")
    for param, value in result['params'].items():
        print(f"  {param}: {value:.4f}")
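
The same fit can be reproduced directly with the statsmodels formula API, which is a useful cross-check on the values above. The sketch below assumes that the `ols` model name maps to `statsmodels.formula.api.ols` and that the result keys (`r_squared`, `adj_r_squared`, `aic`, `bic`, `params`) correspond to the attributes shown; neither assumption is confirmed by the notebook. The names `ols_df` and `manual_summary` are illustrative, and the data is simulated with a different random generator, so the numbers will not match the cell above exactly.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate data with the same structure: y = 2*x1 + 1.5*x2 + noise
rng = np.random.default_rng(42)
n = 100
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = 2 * x1 + 1.5 * x2 + rng.standard_normal(n) * 0.5
ols_df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# Fit ordinary least squares from the same formula string
fit = smf.ols("y ~ x1 + x2", data=ols_df).fit()

# Collect the statistics under the key names printed above
# (the mapping from keys to attributes is an assumption)
manual_summary = {
    "r_squared": fit.rsquared,
    "adj_r_squared": fit.rsquared_adj,
    "aic": fit.aic,
    "bic": fit.bic,
    "params": fit.params.to_dict(),
}

print(f"R-squared: {manual_summary['r_squared']:.4f}")
for param, value in manual_summary["params"].items():
    print(f"  {param}: {value:.4f}")
```

The estimated coefficients for `x1` and `x2` should land close to the true values of 2 and 1.5 used to generate the data, with an intercept near zero.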

## 3. Available Models

Let's explore what models are available in DataAgent.

In [None]:
# Get available sklearn models
sklearn_models = datagent.get_available_sklearn_models()
print(f"Available sklearn models: {len(sklearn_models)}")
print("\nSample sklearn models:")
for model in list(sklearn_models.keys())[:10]:
    print(f"  - {model}")

# Get available statsmodels linear models
linear_models = datagent.get_linear_available_models()
print(f"\nAvailable linear models: {len(linear_models)}")
print("\nSample linear models:")
for model in list(linear_models.keys())[:5]:
    print(f"  - {model}")

## 4. Summary

DataAgent provides a unified interface for both machine learning and statistical analysis:

- **Easy to use**: A single function call fits a model and returns its metrics or fit statistics
- **Comprehensive**: Covers both sklearn estimators and statsmodels models
- **Flexible**: Supports various model types, with estimator parameters such as `n_estimators` passed as keyword arguments
- **Well-documented**: Clear error messages and results

### Next Steps

1. Explore more models in the available model lists
2. Try different parameters and configurations
3. Check out the LangGraph integration example
4. Use DataAgent in your own data analysis workflows