# IAM ML Framework - Example Analysis

This notebook demonstrates how to use the IAM ML Framework for identity and access management data analysis.

## What You'll Learn

1. Loading and exploring IAM data
2. Data preprocessing and feature engineering
3. Training ML models for approval prediction
4. Generating actionable insights
5. Creating visualizations

## Prerequisites

Make sure you have:
- Run `pip install -r requirements.txt`
- Either generated sample data with `python data/generate_dummy_data.py`, or configured your MySQL connection in `config/db_config.yaml`


In [None]:
# Import required libraries
import sys
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Add src to path
sys.path.insert(0, '../src')

from database import DatabaseConnector
from preprocessing import DataPreprocessor
from model_training import ModelTrainer
from insights import InsightsGenerator
from visualization import Visualizer

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
sns.set_style('whitegrid')

print("All imports successful!")
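`sys.path.insert(0, '../src')` only works when the notebook's working directory sits exactly one level below the project root. A slightly more robust sketch, assuming the framework modules live in a top-level `src/` directory, walks up from the current directory until it finds `src/`. `find_src_dir` and `add_src_to_path` are hypothetical helpers, not part of the framework.

```python
import sys
from pathlib import Path
from typing import Optional


def find_src_dir(start: Path, name: str = "src") -> Path:
    """Return the first `name/` directory found in `start` or any of its parents."""
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"No '{name}' directory found at or above {start}")


def add_src_to_path(start: Optional[Path] = None) -> Path:
    """Prepend the located src/ directory to sys.path (only once) and return it."""
    src = find_src_dir((start or Path.cwd()).resolve())
    if str(src) not in sys.path:
        sys.path.insert(0, str(src))
    return src
```

Calling `add_src_to_path()` in place of the hard-coded insert lets the same cell run from the project root or from a nested folder.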


## Step 1: Load Data

`DatabaseConnector` loads the tables from MySQL or from CSV files, depending on your configuration in `config/db_config.yaml`; `config/schema_config.yaml` defines which tables are loaded.
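How `DatabaseConnector` decides between MySQL and CSV isn't shown in this notebook. If you only have CSV files (for example, the output of `generate_dummy_data.py`), the minimal sketch below builds the same `{table_name: DataFrame}` mapping that `fetch_all_tables()` is iterated as. It assumes one CSV file per table, named after the table; `load_tables_from_csv` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict

import pandas as pd


def load_tables_from_csv(data_dir: Path) -> Dict[str, pd.DataFrame]:
    """Load every CSV in `data_dir` into a dict keyed by file stem.

    Assumes one file per table, e.g. decision_history.csv -> 'decision_history'.
    """
    tables = {}
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        tables[csv_path.stem] = pd.read_csv(csv_path)
    if not tables:
        raise FileNotFoundError(f"No CSV files found in {data_dir}")
    return tables
```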


In [None]:
# Initialize database connector
db = DatabaseConnector(
    db_config_path='../config/db_config.yaml',
    schema_config_path='../config/schema_config.yaml'
)

# Connect and load tables
db.connect()
tables = db.fetch_all_tables()

print(f"\nLoaded {len(tables)} tables:")
for table_name, df in tables.items():
    print(f"  - {table_name}: {len(df)} rows, {len(df.columns)} columns")


## Step 2: Explore the Data

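Let's start with the decision history table, which records the approval and rejection decisions the models will learn to predict. The same calls work for any other table in `tables`.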


In [None]:
# Decision History - tracks approval/rejection decisions
print("=== Decision History ===")
decision_history = tables['decision_history']
print(f"Shape: {decision_history.shape}")
print(f"\nColumns: {list(decision_history.columns)}")
print("\nFirst few rows:")
display(decision_history.head())

print("\nDecision distribution:")
print(decision_history['decision'].value_counts())
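To get a quick overview of every loaded table rather than only `decision_history`, a small profiling helper can summarize size, missing values, and duplicate rows per table. This is a sketch; `profile_tables` is a hypothetical helper that only assumes `tables` maps names to DataFrames.

```python
from typing import Dict

import pandas as pd


def profile_tables(tables: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """One summary row per table: size, missing cells, and exact duplicate rows."""
    rows = []
    for name, df in tables.items():
        rows.append({
            "table": name,
            "rows": len(df),
            "columns": df.shape[1],
            "missing_cells": int(df.isna().sum().sum()),
            "duplicate_rows": int(df.duplicated().sum()),
        })
    return pd.DataFrame(rows).set_index("table")
```

In the notebook, `display(profile_tables(tables))` shows the overview in a single table.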


## Step 3: Preprocess Data

The framework handles cleaning, merging, and encoding automatically.
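`process_pipeline` hides the individual steps, and what it actually does depends on `ml_config` and `merge_strategy`. As a rough illustration of what "cleaning, merging, and encoding" typically means for IAM tables, here is a pandas sketch. The two-table layout, the `user_id` join key, and the `"approved"` label value are all hypothetical.

```python
import pandas as pd


def simple_preprocess(decisions: pd.DataFrame, users: pd.DataFrame,
                      key: str = "user_id", target_col: str = "decision"):
    """Clean, merge, and one-hot encode two hypothetical IAM tables."""
    # Clean: drop exact duplicate rows and rows without a label.
    decisions = decisions.drop_duplicates().dropna(subset=[target_col])
    users = users.drop_duplicates(subset=[key])

    # Merge: a left join keeps every decision; validate guards against duplicate users.
    merged = decisions.merge(users, on=key, how="left", validate="many_to_one")

    # Encode: binary target, drop the join key, one-hot encode text columns.
    target = (merged[target_col] == "approved").astype(int)
    features = pd.get_dummies(merged.drop(columns=[target_col, key]), dtype=int)
    return features, target
```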


In [None]:
# Get configuration
ml_config = db.get_ml_config()
merge_strategy = db.get_merge_strategy()

# Initialize preprocessor
preprocessor = DataPreprocessor(ml_config, merge_strategy)

# Run preprocessing pipeline
features, target = preprocessor.process_pipeline(tables)

print("\n=== Preprocessing Results ===")
print(f"Features shape: {features.shape}")
print(f"Target shape: {target.shape}")
print(f"\nFeature columns: {list(features.columns)[:10]}...")
print("\nTarget distribution:")
print(target.value_counts())


## Step 4: Train ML Models

Train multiple classification models to predict approval/rejection.
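The notebook doesn't show how `ModelTrainer` holds out evaluation data. If you evaluate models yourself, a stratified split keeps the approved/rejected mix the same in the training and test sets. The sketch below uses pandas only and assumes `features` and `target` share a unique index; scikit-learn's `train_test_split(..., stratify=target)` does the same job. `stratified_split` is a hypothetical helper.

```python
import pandas as pd


def stratified_split(features: pd.DataFrame, target: pd.Series,
                     test_frac: float = 0.2, seed: int = 42):
    """Hold out `test_frac` of each class so both splits keep the class mix.

    Assumes `features` and `target` share the same unique index.
    """
    test_idx = target.groupby(target).sample(frac=test_frac, random_state=seed).index
    is_test = target.index.isin(test_idx)
    return (features[~is_test], features[is_test],
            target[~is_test], target[is_test])
```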


In [None]:
# Initialize trainer
trainer = ModelTrainer(ml_config, models_dir='../models')

# Train models
results = trainer.train(features, target)

print("\n=== Training Results ===")
print(f"\nModels trained: {list(results['models'].keys())}")

# Display metrics
for model_name, metrics in results['metrics'].items():
    print(f"\n{model_name.upper()}:")
    print(f"  Accuracy: {metrics.get('accuracy', 0):.4f}")
    print(f"  F1-Score: {metrics.get('f1_score', 0):.4f}")
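The notebook doesn't say whether `f1_score` is binary, macro-, or weighted-averaged, so the following is an assumption: a dependency-free sketch of accuracy and binary F1, applied to a majority-class baseline. The comparison is useful when approvals heavily outnumber rejections, because a model that always predicts "approved" can then score well on accuracy while never catching a rejection.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: F1 is 0 (and this avoids dividing by zero)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Majority-class baseline on a hypothetical 80/20 split (1 = approved, 0 = rejected)
y_true = [1] * 8 + [0] * 2
baseline = [1] * len(y_true)
print(accuracy(y_true, baseline))                          # 0.8
print(round(f1_binary(y_true, baseline, positive=1), 4))   # 0.8889
print(f1_binary(y_true, baseline, positive=0))             # 0.0
```

A trained model's scores are only meaningful if they clearly beat this baseline, especially on the rejected class.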


## Step 5: Generate Insights

Extract actionable insights for IAM operations.


In [None]:
# Get original data for context
cleaned_tables = {name: preprocessor.clean_dataframe(df, name) 
                 for name, df in tables.items()}
original_data = preprocessor.merge_tables(cleaned_tables)

# Generate insights
insights_config = db.get_insights_config()
insights_gen = InsightsGenerator(insights_config, models_dir='../models', output_dir='../outputs/insights')

insights = insights_gen.generate_all_insights(features, target, original_data)

print("\n=== Insights Generated ===")
for key in insights.keys():
    print(f"  - {key}")
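Which insights `generate_all_insights` produces is controlled by `insights_config`. To make "access reduction opportunities" concrete, here is a sketch of one common heuristic: flagging entitlements that haven't been used recently. The `grants` table, its `last_used` column, and the 90-day threshold are illustrative assumptions, not the framework's logic.

```python
import pandas as pd


def stale_access_candidates(grants: pd.DataFrame, as_of: str,
                            max_idle_days: int = 90) -> pd.DataFrame:
    """Grants idle for more than `max_idle_days` as of `as_of`, plus never-used grants."""
    last_used = pd.to_datetime(grants["last_used"])
    idle_days = (pd.Timestamp(as_of) - last_used).dt.days  # NaN where never used
    stale = idle_days.isna() | (idle_days > max_idle_days)
    return (grants.assign(idle_days=idle_days)[stale]
            .sort_values("idle_days", ascending=False, na_position="first"))
```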


## Step 6: View Insights Report

Generate a text report summarizing the insights.


In [None]:
# Generate and display report
report = insights_gen.generate_report()
print(report)
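`generate_report()` returns a ready-made string. If you want your own summary (for example, to paste into a ticket), a minimal sketch that renders a dict of insights as plain text follows. It assumes only that `insights` maps names to scalars, lists, or nested dicts, which may not match the framework's actual structure; `format_report` is hypothetical.

```python
from collections.abc import Mapping
from typing import Any


def format_report(insights: Mapping, title: str = "IAM Insights Report") -> str:
    """Render a mapping of insight name -> value as an underlined plain-text report."""
    lines = [title, "=" * len(title)]
    for section, value in insights.items():
        heading = str(section).replace("_", " ").title()
        lines += ["", heading, "-" * len(heading)]
        if isinstance(value, Mapping):
            lines.extend(f"  {k}: {v}" for k, v in value.items())
        elif isinstance(value, (list, tuple)):
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"  {value}")
    return "\n".join(lines)
```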


## Step 7: Create Visualizations

Generate the full set of charts and save them to `../outputs/visualizations`.


In [None]:
# Create visualizations
viz = Visualizer(output_dir='../outputs/visualizations')
viz.generate_all_visualizations(features, target, original_data, insights)

print(f"\nVisualizations saved to: {viz.output_dir}")
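If you add your own charts and want them saved next to the generated ones, Matplotlib's object-oriented `Figure` API can render straight to files without touching pyplot's global state or the notebook's inline backend. A sketch; `save_figure` and `decision_counts_figure` are hypothetical helpers.

```python
from pathlib import Path

from matplotlib.figure import Figure


def decision_counts_figure(counts: dict) -> Figure:
    """Bar chart of decision counts, built without pyplot."""
    fig = Figure(figsize=(6, 4))
    ax = fig.subplots()
    ax.bar(list(counts), list(counts.values()))
    ax.set_xlabel("Decision")
    ax.set_ylabel("Count")
    ax.set_title("Decisions")
    return fig


def save_figure(fig: Figure, output_dir, name: str, dpi: int = 150) -> Path:
    """Write `fig` as <output_dir>/<name>.png, creating the directory if needed."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{name}.png"
    fig.savefig(path, dpi=dpi)
    return path
```

For example, `save_figure(decision_counts_figure(decision_history['decision'].value_counts().to_dict()), viz.output_dir, 'decision_counts')`.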


## Visualization Examples

Let's display one key chart inline: the risk score distribution for each decision outcome.


In [None]:
# Risk score distribution by decision
if 'risk_score' in original_data.columns and 'decision' in original_data.columns:
    # Drop rows missing either value so every plotted group has real scores
    plot_data = original_data[['decision', 'risk_score']].dropna()
    # Shared bin edges keep the overlaid histograms directly comparable
    bins = np.histogram_bin_edges(plot_data['risk_score'], bins=30)

    plt.figure(figsize=(10, 6))
    for decision, group in plot_data.groupby('decision'):
        plt.hist(group['risk_score'], bins=bins, alpha=0.5, label=str(decision))

    plt.xlabel('Risk Score')
    plt.ylabel('Frequency')
    plt.title('Risk Score Distribution by Decision')
    plt.legend()
    plt.tight_layout()
    plt.show()
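The histogram shows the shape of each distribution; a numeric companion that's easier to act on is the approval rate within risk-score bands. This sketch assumes `decision` uses the literal label `"approved"` and that equal-width bands are acceptable (`pd.qcut` would give equal-count bands instead). `approval_rate_by_risk` is a hypothetical helper.

```python
import pandas as pd


def approval_rate_by_risk(df: pd.DataFrame, n_buckets: int = 4,
                          approved_label: str = "approved") -> pd.DataFrame:
    """Approval rate and request count per equal-width risk-score band."""
    data = df[["risk_score", "decision"]].dropna()
    buckets = pd.cut(data["risk_score"], bins=n_buckets)
    approved = data["decision"].eq(approved_label)
    return approved.groupby(buckets, observed=True).agg(
        approval_rate="mean", requests="count"
    )
```

In the notebook, `display(approval_rate_by_risk(original_data))` shows whether approvals actually drop as risk rises.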


## Key Takeaways

This notebook demonstrated:

1. **Data Loading**: Automatically fetch data from MySQL or CSV files
2. **Preprocessing**: Clean, merge, and encode IAM data
3. **ML Training**: Build classification models for approval prediction
4. **Feature Importance**: Understand key decision drivers
5. **Insights**: Identify access reduction opportunities and risk trends
6. **Visualizations**: Create actionable dashboards

## Next Steps

- **Customize**: Edit `config/schema_config.yaml` to define your own tables
- **Extend**: Add new models in `src/model_training.py`
- **Deploy**: Use trained models for real-time predictions
- **Automate**: Run `main.py` to execute the complete pipeline end to end
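For real-time predictions, the tricky part is often reproducing the training feature layout for a single incoming request: one-hot encoding one row yields only the categories present in that row. The sketch below assumes the models were trained on `pd.get_dummies`-style columns and that you persist the training column list next to the model. How `ModelTrainer` actually saves models in `../models` isn't shown here, so `save_bundle`, `load_bundle`, and `align_request` are hypothetical. Only unpickle files you trust.

```python
import pickle
from typing import Mapping, Sequence

import pandas as pd


def save_bundle(path: str, model, training_columns: Sequence[str]) -> None:
    """Persist the model together with the exact feature columns it was trained on."""
    with open(path, "wb") as f:
        pickle.dump({"model": model, "columns": list(training_columns)}, f)


def load_bundle(path: str):
    """Load a bundle written by save_bundle (trusted files only)."""
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    return bundle["model"], bundle["columns"]


def align_request(request: Mapping[str, object], training_columns: Sequence[str]) -> pd.DataFrame:
    """One-hot encode a single request and match the training feature layout."""
    row = pd.get_dummies(pd.DataFrame([request]), dtype=int)
    # Missing training columns become 0; categories unseen in training are dropped.
    return row.reindex(columns=list(training_columns), fill_value=0)
```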


In [None]:
# Disconnect from database
db.disconnect()
print("Analysis complete!")
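If a cell fails between `db.connect()` and this final cell, the connection stays open. A small context manager guarantees `disconnect()` runs even when an error occurs. `connected` is a hypothetical helper that relies only on the `connect()` and `disconnect()` methods this notebook already calls.

```python
from contextlib import contextmanager


@contextmanager
def connected(connector):
    """Call connector.connect() on entry and always call disconnect() on exit."""
    connector.connect()
    try:
        yield connector
    finally:
        connector.disconnect()
```

With it, Step 1 could read `with connected(db): tables = db.fetch_all_tables()`, and the connection closes even if loading fails.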
