# EquiML Quick Start Tutorial

**Welcome to EquiML!** This notebook will guide you through building your first fair AI model.

## What you'll learn:
- Load and analyze data for bias
- Train a fair machine learning model
- Evaluate fairness and performance
- Generate comprehensive reports

## Prerequisites:
- Basic Python knowledge
- Understanding of machine learning concepts (helpful but not required)

Let's build fair AI together! 🚀

## Step 1: Import EquiML and Load Data

First, let's import the EquiML components and load a sample dataset.

In [None]:
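# Import EquiML components
# The imports below use the `src` package, so add the repository root
# (two directories above this notebook) to the import path first
import sys
sys.path.append('../..')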

from src.data import Data
from src.model import Model
from src.evaluation import EquiMLEvaluation
from src.monitoring import BiasMonitor

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print("✅ EquiML imported successfully!")

In [None]:
# Load the Adult Income dataset
data = Data(
    dataset_path='../../tests/data/adult.csv',
    sensitive_features=['sex']  # We want to ensure fairness by gender
)

data.load_data()

print(f"📊 Dataset loaded: {len(data.df)} rows, {len(data.df.columns)} columns")
print(f"🏷️  Columns: {list(data.df.columns)}")

# Preview the data
data.df.head()

## Step 2: Analyze Data for Potential Bias

Before building our model, let's examine the data for potential bias.
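
The check in the next cell boils down to comparing the rate of the favourable outcome (`>50K`) between groups. As a minimal sketch of that arithmetic in plain Python, on toy data invented for illustration (not drawn from the Adult dataset), with `positive_rate_gap` as a hypothetical helper name rather than an EquiML function:

```python
from collections import Counter

def positive_rate_gap(groups, outcomes, positive):
    """Return per-group positive-outcome rates and the largest gap between groups."""
    totals = Counter(groups)
    positives = Counter(g for g, y in zip(groups, outcomes) if y == positive)
    rates = {g: positives[g] / totals[g] for g in totals}
    return rates, max(rates.values()) - min(rates.values())

# Toy data: 3 of 10 men and 1 of 10 women have the favourable outcome
groups = ["Male"] * 10 + ["Female"] * 10
outcomes = [">50K"] * 3 + ["<=50K"] * 7 + [">50K"] * 1 + ["<=50K"] * 9

rates, gap = positive_rate_gap(groups, outcomes, positive=">50K")
print(rates)
print(f"gap = {gap:.1%}")
```

A gap of zero means every group receives the favourable outcome at the same rate. A non-zero gap in the raw data does not by itself say why the rates differ, only that a model trained on this data may reproduce the difference.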

In [None]:
# Analyze target variable distribution by gender
print("📈 Target variable distribution by gender:")
gender_income = pd.crosstab(data.df['sex'], data.df['income'])
print(gender_income)

# Calculate outcome rates by gender
outcome_rates = gender_income.div(gender_income.sum(axis=1), axis=0)
print("\n📊 Outcome rates by gender:")
print(outcome_rates)

# Visualize the bias (matplotlib is already imported in Step 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Raw counts
gender_income.plot(kind='bar', ax=ax1, title='Income Distribution by Gender (Counts)')
ax1.set_ylabel('Count')
ax1.tick_params(axis='x', rotation=0)

# Outcome rates
outcome_rates.plot(kind='bar', ax=ax2, title='Income Distribution by Gender (Rates)')
ax2.set_ylabel('Rate')
ax2.tick_params(axis='x', rotation=0)

plt.tight_layout()
plt.show()

# Calculate bias score
high_income_rates = outcome_rates['>50K']
bias_score = abs(high_income_rates['Male'] - high_income_rates['Female'])
print(f"\n⚖️  Raw bias score: {bias_score:.1%}")

if bias_score > 0.2:
    print("🔴 HIGH BIAS detected - immediate action needed!")
elif bias_score > 0.1:
    print("🟡 MODERATE BIAS detected - improvement recommended")
else:
    print("🟢 LOW BIAS detected - good fairness baseline")

## Step 3: Preprocess Data with Bias Mitigation

Now let's preprocess the data and apply bias mitigation techniques.
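
To give a feel for what reweighing does, here is a sketch of the standard technique: each sample gets the weight `P(group) * P(label) / P(group, label)`, so that group and label look statistically independent once the weights are applied. This is an assumption about what `method='reweighing'` does in EquiML, whose implementation is not shown in this notebook. The function names and toy data are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each sample by P(group) * P(label) / P(group, label)."""
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    pair_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * pair_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, group, positive=1):
    """Positive-label rate within one group, using the sample weights."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    pos = sum(w for g, y, w in zip(groups, labels, weights)
              if g == group and y == positive)
    return pos / total

# Toy data: group "M" has 4 of 6 positives, group "F" has 1 of 4
groups = ["M"] * 6 + ["F"] * 4
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_m = weighted_positive_rate(groups, labels, weights, "M")
rate_f = weighted_positive_rate(groups, labels, weights, "F")
print(f"weighted positive rate: M={rate_m:.2f}, F={rate_f:.2f}")
```

After weighting, both groups show the same positive rate as the dataset overall, which is the property a downstream learner that honours sample weights can take advantage of.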

In [None]:
# Preprocess the data
print("🔧 Preprocessing data...")

data.preprocess(
    target_column='income',
    numerical_features=['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week'],
    categorical_features=['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']
)

print(f"✅ Preprocessing complete: {data.X.shape[1]} features created")

# Apply bias mitigation
print("\n⚖️  Applying bias mitigation...")
data.apply_bias_mitigation(method='reweighing')
print("✅ Bias mitigation applied")

# Handle class imbalance
print("\n📊 Handling class imbalance...")
data.handle_class_imbalance(method='class_weights')
print("✅ Class imbalance handled")

# Split data
data.split_data(test_size=0.2, random_state=42)
print(f"\n🔀 Data split: {len(data.X_train)} training, {len(data.X_test)} testing")

## Step 4: Train Fair AI Model

Now let's train a model with fairness constraints.
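
The next cell locates the sensitive feature by its column-name prefix, because encoding renames `sex` to something like `sex_<value>`. A small sketch of that lookup with an explicit error when nothing matches; `find_encoded_columns` and the column names are hypothetical and chosen only for illustration:

```python
def find_encoded_columns(columns, feature, sep="_"):
    """Return the encoded columns derived from `feature`, e.g. 'sex_Male' for 'sex'."""
    prefix = f"{feature}{sep}"
    matches = [c for c in columns if c.startswith(prefix)]
    if not matches:
        raise KeyError(f"No columns starting with {prefix!r}; got {list(columns)}")
    return matches

# Hypothetical column names after one-hot encoding
columns = ["age", "hours-per-week", "sex_Male", "race_White", "race_Black"]

sex_columns = find_encoded_columns(columns, "sex")
feature_columns = [c for c in columns if c not in sex_columns]
print("sensitive:", sex_columns)
print("features: ", feature_columns)
```

Failing loudly here is a deliberate choice: indexing an empty list with `[0]` would raise an `IndexError` that says nothing about which feature was missing.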

In [None]:
# Prepare features for training
print("🎯 Preparing features for training...")

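# Find sensitive feature columns (they get renamed during preprocessing)
sex_columns = [col for col in data.X_train.columns if col.startswith('sex_')]
if not sex_columns:
    raise ValueError("No column starting with 'sex_' found in data.X_train")
sensitive_feature_column = sex_columns[0]
print(f"📋 Sensitive feature column: {sensitive_feature_column}")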

# Separate sensitive features from training features
sensitive_features_train = data.X_train[sensitive_feature_column]
X_train = data.X_train.drop(columns=[sensitive_feature_column])
sensitive_features_test = data.X_test[sensitive_feature_column]
X_test = data.X_test.drop(columns=[sensitive_feature_column])

print(f"✅ Features prepared: {X_train.shape[1]} features for training")

# Create and train fair model
print("\n🤖 Training fair AI model...")

model = Model(
    algorithm='robust_random_forest',           # Use robust algorithm
    fairness_constraint='demographic_parity'   # Ensure fair treatment
)

# Apply stability improvements
model.apply_stability_improvements(
    X_train, data.y_train,
    sensitive_features_train,
    stability_method='comprehensive'
)

# Train the model
model.train(
    X_train, data.y_train,
    sensitive_features=sensitive_features_train
)

print("✅ Fair AI model trained successfully!")

## Step 5: Comprehensive Evaluation

Let's evaluate our model for both performance and fairness.
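
Demographic parity difference is commonly defined as the gap between the highest and lowest group selection rates, where a selection rate is the share of positive predictions in a group. The sketch below computes it next to accuracy on toy predictions. It assumes EquiML's `demographic_parity_difference` follows this common definition; `evaluate_by_group` is a hypothetical helper.

```python
def evaluate_by_group(y_true, y_pred, groups):
    """Overall accuracy plus per-group selection rates and their max-min gap."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    selection_rates = {}
    for g in sorted(set(groups)):
        preds = [p for p, member in zip(y_pred, groups) if member == g]
        selection_rates[g] = sum(preds) / len(preds)
    dp_diff = max(selection_rates.values()) - min(selection_rates.values())
    return {
        "accuracy": accuracy,
        "selection_rates": selection_rates,
        "demographic_parity_difference": dp_diff,
    }

# Toy labels and predictions for two groups of four samples each
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

report = evaluate_by_group(y_true, y_pred, groups)
print(report)
```

Note that the toy model is equally accurate on both groups while still selecting group `a` three times as often as group `b`, which is why accuracy and fairness are reported separately.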

In [None]:
# Make predictions
print("🔮 Making predictions...")
predictions = model.predict(X_test)
print(f"✅ Predictions made for {len(predictions)} test samples")

# Comprehensive evaluation
print("\n📊 Running comprehensive evaluation...")
evaluation = EquiMLEvaluation()
metrics = evaluation.evaluate(
    model, X_test, data.y_test,
    y_pred=predictions,
    sensitive_features=sensitive_features_test
)

print("✅ Evaluation completed!")

# Display key results
print("\n🎯 KEY RESULTS:")
print(f"📈 Accuracy: {metrics['accuracy']:.1%}")
print(f"📈 F1-Score: {metrics['f1_score']:.1%}")

if 'demographic_parity_difference' in metrics:
    dp_diff = abs(metrics['demographic_parity_difference'])
    print(f"⚖️  Demographic Parity: {dp_diff:.1%}")
    
    if dp_diff <= 0.1:
        print("   🏆 EXCELLENT fairness achieved!")
    elif dp_diff <= 0.2:
        print("   🥈 GOOD fairness - minor improvements possible")
    else:
        print("   ⚠️  BIAS detected - consider additional mitigation")

## Step 6: Set Up Bias Monitoring

Let's set up bias monitoring and run it on the test-set predictions.
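
One simple way to think about ongoing monitoring is a sliding window over recent predictions that flags a violation when the selection-rate gap between groups exceeds a threshold. The class below is a sketch of that idea only. It is not how `BiasMonitor` is implemented, and the class name, window size and threshold are all hypothetical.

```python
from collections import deque

class WindowedParityMonitor:
    """Track selection rates per group over the most recent predictions."""

    def __init__(self, window_size=100, threshold=0.1):
        self.window = deque(maxlen=window_size)  # old entries drop off automatically
        self.threshold = threshold

    def update(self, group, prediction):
        self.window.append((group, int(prediction)))

    def check(self):
        totals, positives = {}, {}
        for group, prediction in self.window:
            totals[group] = totals.get(group, 0) + 1
            positives[group] = positives.get(group, 0) + prediction
        rates = {g: positives[g] / totals[g] for g in totals}
        if len(rates) < 2:
            return None  # need at least two groups to compare
        gap = max(rates.values()) - min(rates.values())
        return {"rates": rates, "gap": gap, "violation": gap > self.threshold}

monitor = WindowedParityMonitor(window_size=4, threshold=0.2)

# First batch: group "a" is always selected, group "b" only half the time
for group, pred in [("a", 1), ("b", 1), ("a", 1), ("b", 0)]:
    monitor.update(group, pred)
first = monitor.check()

# Second batch replaces the window: both groups are always selected
for group, pred in [("a", 1), ("b", 1), ("a", 1), ("b", 1)]:
    monitor.update(group, pred)
second = monitor.check()

print(first)
print(second)
```

A fixed-size window keeps the check cheap and makes it react to recent drift, at the cost of noisy rates when the window is small.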

In [None]:
# Set up bias monitoring
print("🛡️  Setting up bias monitoring...")

monitor = BiasMonitor(sensitive_features=['sex'])

# Monitor current predictions
monitoring_result = monitor.monitor_predictions(
    predictions,
    pd.DataFrame({sensitive_feature_column: sensitive_features_test}),
    data.y_test.values
)

violations = len(monitoring_result['violations'])
print(f"🔍 Bias violations detected: {violations}")

if violations == 0:
    print("✅ No bias violations detected on the test set")
else:
    print("⚠️  Bias issues detected:")
    for violation in monitoring_result['violations']:
        print(f"   - {violation}")

# Get monitoring summary
summary = monitor.get_monitoring_summary()
print("\n📋 Monitoring Summary:")
for key, value in summary.items():
    print(f"   {key}: {value}")

## Step 7: Generate Comprehensive Report

Finally, let's generate a detailed HTML report with actionable recommendations.
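
At its simplest, an HTML report is a template filled in with the metrics dictionary. The sketch below uses only the standard library to show the idea. It is not EquiML's report template, and `render_report` and the metric values are hypothetical.

```python
from html import escape
from string import Template

# Minimal page layout; $title and $rows are filled in by render_report
PAGE = Template("""<html><body>
<h1>$title</h1>
<table>
$rows
</table>
</body></html>""")

def render_report(metrics, title="Model report"):
    """Render a dict of numeric metrics as a small HTML table."""
    rows = "\n".join(
        f"<tr><td>{escape(str(name))}</td><td>{value:.3f}</td></tr>"
        for name, value in metrics.items()
    )
    return PAGE.substitute(title=escape(title), rows=rows)

# Made-up metric values, for illustration only
metrics = {"accuracy": 0.85, "demographic_parity_difference": 0.04}
html_report = render_report(metrics, title="Quick start <demo>")
print(html_report)
```

Escaping names and titles before substitution keeps stray `<` or `&` characters from breaking the page. The resulting string can be written to disk with `pathlib.Path(...).write_text(html_report)`.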

In [None]:
# Generate comprehensive report
print("📝 Generating comprehensive report...")

evaluation.generate_report(
    metrics,
    output_path='quick_start_report.html',
    template_path='../../src/report_template.html'
)

print("✅ Report generated: quick_start_report.html")
print("   Open this file in your browser to see detailed results!")

# Display summary
print("\n🎉 CONGRATULATIONS!")
print("You've successfully built your first fair AI model!")
print("\nWhat you accomplished:")
print("✅ Loaded and analyzed data for bias")
print("✅ Applied bias mitigation techniques")
print("✅ Trained a stable, robust AI model")
print("✅ Evaluated fairness across gender groups")
print("✅ Set up bias monitoring")
print("✅ Generated comprehensive analysis report")

print("\n🚀 Next steps:")
print("1. Try different algorithms (robust_xgboost, robust_ensemble)")
print("2. Experiment with different fairness constraints")
print("3. Test with your own datasets")
print("4. Explore advanced features in our comprehensive guides")
print("\n🌟 Join the fair AI movement: https://github.com/mkupermann/EquiML")