# Workshop 4: Kaggle System Simulation
## CIBMTR - Equity in post-HCT Survival Predictions

This notebook implements the two simulations required to validate the system architecture designed in the previous workshops.

### Simulations:
1. **Simulation 1**: Data-Driven Machine Learning
2. **Simulation 2**: Event-Based Cellular Automata

### Context:
- **Workshop 1**: Systems analysis; identified sensitive variables and chaotic behavior
- **Workshop 2**: System design with a 7-module architecture (M1-M7)
- **Workshop 3**: Project management with quality thresholds

## 1. Initial Setup

In [None]:
# Install dependencies if needed (e.g., on Google Colab)
# !pip install pandas numpy matplotlib seaborn scikit-learn

In [None]:
import sys
import os

# Add the src directory to the import path
sys.path.insert(0, '../src')

# Import configuration constants (RANDOM_STATE, EQUITY_COLUMN, quality thresholds, ...)
from config import *

# Import project modules
from m1_preprocessing import preprocess_pipeline
from m2_equity_analysis import run_equity_analysis, plot_equity_analysis
from m3_feature_selection import run_feature_selection, plot_feature_importance
from simulation1_ml import run_simulation1, plot_simulation1_results
from simulation2_automata import run_simulation2, plot_automata_evolution, compare_scenarios
from m5_fairness import run_fairness_calibration, plot_fairness_metrics
from m6_uncertainty import run_uncertainty_quantification, plot_uncertainty_analysis

# Make sure the output directory for figures and the JSON summary exists
os.makedirs('../results', exist_ok=True)

print("✓ Modules imported successfully")

In [None]:
# Visualization settings
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

print("✓ Visualization setup complete")

## 2. Load and Upload Data

**Instructions for Google Colab:**
1. Run the cell below
2. Upload the `train.csv` file when prompted
3. Optionally, upload `data_dictionary.csv`

In [None]:
# Google Colab: upload the files interactively
try:
    from google.colab import files
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    print("Upload train.csv:")
    uploaded = files.upload()
    train_path = list(uploaded.keys())[0]

    print("\nUpload data_dictionary.csv? (optional)")
    try:
        uploaded_dict = files.upload()
        dict_path = list(uploaded_dict.keys())[0]
    except Exception:  # e.g., no file selected (empty upload -> IndexError)
        dict_path = None
        print("Continuing without the data dictionary...")
else:
    # Local execution
    train_path = '../data/train.csv'
    dict_path = '../data/data_dictionary.csv'
    print("Using local paths:")
    print(f"  - train_path: {train_path}")
    print(f"  - dict_path: {dict_path}")

## 3. Module M1: Data Preprocessing

In [None]:
# Run the preprocessing pipeline (pass the data dictionary only if the file exists)
df_processed, numeric_features, categorical_features, encoders, scaler = preprocess_pipeline(
    train_path,
    dict_path if dict_path and os.path.exists(dict_path) else None
)

print(f"\nPreprocessed dataset: {df_processed.shape}")

## 4. Module M2: Equity Analysis

In [None]:
# Run the equity analysis
equity_results = run_equity_analysis(df_processed)

# Plot the results
if not equity_results['equity_stats'].empty:
    fig_equity = plot_equity_analysis(equity_results['equity_stats'], '../results/equity_analysis.png')
    plt.show()

## 5. Module M3: Feature Selection

In [None]:
# Run feature selection
feature_results = run_feature_selection(df_processed)

# Plot feature importances
fig_features = plot_feature_importance(feature_results['importance_df'], '../results/feature_importance.png')
plt.show()

# Show the selected features
print("\nSelected features:")
for i, feat in enumerate(feature_results['selected_features'], 1):
    print(f"  {i}. {feat}")

---
# SIMULATION 1: Data-Driven Machine Learning

This simulation trains a classical ML model (Gradient Boosting) to predict post-HCT survival and analyzes:
- Model variability across different random seeds
- Sensitivity to small perturbations (chaos, the butterfly effect)
- Compliance with the quality thresholds
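A minimal sketch of the first two analyses, assuming a binary target and using scikit-learn's `GradientBoostingClassifier` on synthetic data from `make_classification` rather than the CIBMTR data. The helper names `seed_variability` and `butterfly_effect`, the seeds, the noise scales and `subsample=0.8` are illustrative assumptions, not necessarily what `run_simulation1` does. Setting `subsample` below 1.0 makes boosting stochastic, so the seed actually changes the fitted model; the coefficient of variation (std / mean) of accuracy is one way to express the variability that the final summary compares against `INSTABILITY_THRESHOLD`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed data (assumption: binary target)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)


def seed_variability(seeds):
    """Train one model per seed; return the accuracies and their coefficient of variation."""
    accuracies = []
    for seed in seeds:
        # subsample < 1.0 => stochastic gradient boosting, so the seed matters
        model = GradientBoostingClassifier(subsample=0.8, random_state=seed)
        model.fit(X_train, y_train)
        accuracies.append(accuracy_score(y_test, model.predict(X_test)))
    accuracies = np.array(accuracies)
    return accuracies, accuracies.std() / accuracies.mean()


def butterfly_effect(model, X, epsilon, seed=0):
    """Fraction of predictions that flip when Gaussian noise of scale epsilon is added."""
    rng = np.random.default_rng(seed)
    X_perturbed = X + rng.normal(scale=epsilon, size=X.shape)
    return np.mean(model.predict(X) != model.predict(X_perturbed))


accuracies, accuracy_cv = seed_variability(seeds=[0, 1, 2, 3, 4])
base_model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
flip_rates = {eps: butterfly_effect(base_model, X_test, eps) for eps in (0.001, 0.01, 0.1)}

print(f"accuracy CV across seeds: {accuracy_cv:.4f}")
print("prediction flip rate by perturbation size:", flip_rates)
```

A flip rate that grows sharply for tiny perturbations would indicate the kind of sensitivity to initial conditions that the chaos analysis looks for.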

In [None]:
# Run Simulation 1
simulation1_results = run_simulation1(df_processed, feature_results['selected_features'])

In [None]:
# Plot the Simulation 1 results
fig_sim1 = plot_simulation1_results(
    simulation1_results['results_df'],
    simulation1_results['chaos_df'],
    simulation1_results['best_model'],
    simulation1_results['features'],
    '../results/simulation1_results.png'
)
plt.show()

In [None]:
# Simulation 1 summary table
print("\n" + "="*60)
print("SUMMARY TABLE - SIMULATION 1")
print("="*60)
print(simulation1_results['results_df'].to_string(index=False))
print("\nChaos Analysis:")
print(simulation1_results['chaos_df'].to_string(index=False))

---
# SIMULATION 2: Event-Based Cellular Automata

This simulation uses cellular automata to model emergent behavior in how patient states evolve after HCT.

### Automaton States:
- **State 0 (Green)**: Stable patient
- **State 1 (Yellow)**: At-risk patient
- **State 2 (Red)**: Event (death or failure)

### Transition Rules:
- Stable → At risk: if ≥3 neighbors are at risk OR a chaotic event occurs
- At risk → Event: if ≥4 neighbors are in the event state OR with a progression probability
- At risk → Stable: with a recovery probability
- Event → At risk: with a partial-recovery probability
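The transition rules above can be sketched as one synchronous update over a NumPy grid. Several details are assumptions not stated in the notebook: each cell is one patient, the neighbors are the 8 surrounding cells (Moore neighborhood) on a wrap-around grid, the "chaotic event" is a small random chance `p_chaos`, and all probability values are placeholders. The real `run_simulation2` may use a different neighborhood, parameters or update order.

```python
import numpy as np

STABLE, RISK, EVENT = 0, 1, 2  # green, yellow, red


def count_neighbors(grid, state):
    """Count the 8 Moore neighbors of each cell that are in `state` (wrap-around edges)."""
    mask = (grid == state).astype(int)
    total = np.zeros_like(mask)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            total += np.roll(np.roll(mask, dr, axis=0), dc, axis=1)
    return total


def step(grid, rng, p_chaos=0.01, p_progress=0.05, p_recover=0.10, p_partial=0.02):
    """Apply one synchronous update of the transition rules (placeholder probabilities)."""
    risk_neighbors = count_neighbors(grid, RISK)
    event_neighbors = count_neighbors(grid, EVENT)
    u = rng.random(grid.shape)  # one uniform draw per cell
    new = grid.copy()

    # Stable -> At risk: >= 3 at-risk neighbors OR a random "chaotic" event
    stable = grid == STABLE
    new[stable & ((risk_neighbors >= 3) | (u < p_chaos))] = RISK

    # At risk -> Event: >= 4 event neighbors OR random progression
    risk = grid == RISK
    to_event = risk & ((event_neighbors >= 4) | (u < p_progress))
    new[to_event] = EVENT

    # At risk -> Stable: recovery, only for cells that did not progress
    # (~to_event already implies u >= p_progress, so this band has width p_recover)
    new[risk & ~to_event & (u < p_progress + p_recover)] = STABLE

    # Event -> At risk: partial recovery
    new[(grid == EVENT) & (u < p_partial)] = RISK
    return new


rng = np.random.default_rng(42)
grid = rng.choice([STABLE, RISK, EVENT], size=(50, 50), p=[0.80, 0.15, 0.05])
event_rates = [np.mean(grid == EVENT)]
for _ in range(30):
    grid = step(grid, rng)
    event_rates.append(np.mean(grid == EVENT))

volatility = np.std(np.diff(event_rates))  # assumed definition of "volatility"
print(f"initial event rate: {event_rates[0]:.4f}, final: {event_rates[-1]:.4f}, volatility: {volatility:.4f}")
```

Tracking the fraction of red cells per step yields quantities analogous to `initial_event_rate`, `final_event_rate` and `volatility` in `emergence_metrics`; the exact definitions used by the project module may differ.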

In [None]:
# Run Simulation 2
simulation2_results = run_simulation2(df_processed)

In [None]:
# Plot the automaton's evolution
fig_sim2 = plot_automata_evolution(
    simulation2_results['automata'],
    '../results/simulation2_evolution.png'
)
plt.show()

In [None]:
# Emergent behavior metrics
print("\n" + "="*60)
print("EMERGENT BEHAVIOR METRICS")
print("="*60)
for key, value in simulation2_results['emergence_metrics'].items():
    if isinstance(value, float):
        print(f"  {key}: {value:.4f}")
    else:
        print(f"  {key}: {value}")

## 6. Module M5: Fairness Calibration (Optional)

Applies fairness calibration to reduce disparities between demographic groups.
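One common way to reduce disparities after training is to use group-specific decision thresholds. The sketch below is a minimal example, not the M5 implementation: it picks a threshold per group so that every group gets roughly the same positive-prediction rate (demographic parity) and measures the gap as the largest minus the smallest rate across groups. The helper names, the synthetic Beta-distributed scores and the choice of demographic parity as the criterion are all assumptions; `run_fairness_calibration` may use a different technique or fairness metric.

```python
import numpy as np


def group_thresholds(y_proba, groups, target_rate):
    """Per-group threshold so that about `target_rate` of each group is predicted positive."""
    thresholds = {}
    for g in np.unique(groups):
        scores = y_proba[groups == g]
        # The (1 - target_rate) quantile leaves about target_rate of the scores above it
        thresholds[g] = np.quantile(scores, 1 - target_rate)
    return thresholds


def apply_thresholds(y_proba, groups, thresholds):
    """Binarize each score with the threshold of its own group."""
    cutoffs = np.array([thresholds[g] for g in groups])
    return (y_proba >= cutoffs).astype(int)


def positive_rate_gap(y_pred, groups):
    """Demographic-parity gap: largest minus smallest positive-prediction rate."""
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)


rng = np.random.default_rng(0)
groups = np.array(["A"] * 500 + ["B"] * 500)
# Synthetic scores: group A skews higher than group B
y_proba = np.concatenate([rng.beta(5, 3, 500), rng.beta(3, 5, 500)])

y_pred_before = (y_proba >= 0.5).astype(int)
thresholds = group_thresholds(y_proba, groups, target_rate=y_pred_before.mean())
y_pred_after = apply_thresholds(y_proba, groups, thresholds)

print(f"gap before: {positive_rate_gap(y_pred_before, groups):.3f}")
print(f"gap after:  {positive_rate_gap(y_pred_after, groups):.3f}")
```

Equalizing positive rates this way ignores the true labels and usually costs some accuracy; criteria such as equalized odds condition on the true outcome instead.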

In [None]:
# Prepare data for fairness calibration
from sklearn.model_selection import train_test_split

X = feature_results['X']
y = feature_results['y']

# Get the demographic groups, if available
if EQUITY_COLUMN in df_processed.columns:
    groups = df_processed.loc[X.index, EQUITY_COLUMN]

    # Split the data (the model is not retrained here)
    X_train, X_test, y_train, y_test, groups_train, groups_test = train_test_split(
        X, y, groups, test_size=0.2, random_state=RANDOM_STATE, stratify=y
    )

    # Reuse the best model from Simulation 1; X_test is guaranteed to be a holdout only if
    # Simulation 1 split the data the same way (same test_size and random_state)
    best_model = simulation1_results['best_model']
    y_pred = best_model.predict(X_test)
    y_proba = best_model.predict_proba(X_test)[:, 1]

    # Run fairness calibration
    fairness_results = run_fairness_calibration(
        y_test.values, y_pred, y_proba, groups_test.values
    )

    # Plot
    fig_fairness = plot_fairness_metrics(
        fairness_results['fairness_after'],
        '../results/fairness_metrics.png'
    )
    plt.show()
else:
    print(f"⚠ Column '{EQUITY_COLUMN}' not found. Skipping fairness calibration.")

## 7. Module M6: Uncertainty Quantification (Optional)

Quantifies the uncertainty of the model's predictions.
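A simple way to quantify prediction uncertainty is a bootstrap ensemble: refit copies of the model on resampled data and look at the spread of the predicted probabilities. The sketch below returns the same four quantities that `plot_uncertainty_analysis` receives (mean, standard deviation, lower and upper bound), but the bootstrap approach, the 95% percentile interval (`alpha=0.05`), the number of resamples and the synthetic data are assumptions; `run_uncertainty_quantification` may compute them differently. Like the notebook call, it predicts on the same `X` it is given, so the spread reflects model variability rather than out-of-sample error.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


def bootstrap_uncertainty(model, X, y, n_bootstrap=20, alpha=0.05, seed=0):
    """Refit clones of `model` on bootstrap resamples and summarize P(class 1)."""
    X, y = np.asarray(X), np.asarray(y)  # also accepts pandas objects
    rng = np.random.default_rng(seed)
    n = len(X)
    all_proba = []
    for _ in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        fitted = clone(model).fit(X[idx], y[idx])
        all_proba.append(fitted.predict_proba(X)[:, 1])
    all_proba = np.vstack(all_proba)  # shape: (n_bootstrap, n_samples)
    return {
        "mean_predictions": all_proba.mean(axis=0),
        "std_predictions": all_proba.std(axis=0),
        "lower_bound": np.quantile(all_proba, alpha / 2, axis=0),
        "upper_bound": np.quantile(all_proba, 1 - alpha / 2, axis=0),
    }


X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = GradientBoostingClassifier(n_estimators=50, random_state=0)
results = bootstrap_uncertainty(model, X, y)
print(f"mean predictive std: {results['std_predictions'].mean():.4f}")
```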

In [None]:
# Run uncertainty quantification (uses X and y defined in the M5 cell above)
uncertainty_results = run_uncertainty_quantification(
    simulation1_results['best_model'],
    X,
    y
)

# Plot
fig_uncertainty = plot_uncertainty_analysis(
    uncertainty_results['mean_predictions'],
    uncertainty_results['std_predictions'],
    uncertainty_results['lower_bound'],
    uncertainty_results['upper_bound'],
    '../results/uncertainty_analysis.png'
)
plt.show()

---
## 8. Final Summary and Conclusions

In [None]:
print("="*70)
print("FINAL SUMMARY - WORKSHOP 4")
print("="*70)

print("\n📊 SIMULATION 1: DATA-DRIVEN ML")
print("-"*40)
print(f"  Best accuracy: {simulation1_results['best_accuracy']:.4f}")
print(f"  Best AUC: {simulation1_results['best_auc']:.4f}")
print(f"  Stability: {'✓ PASS' if simulation1_results['variability_analysis']['stability_ok'] else '✗ FAIL'}")
print(f"  Accuracy target: {'✓ PASS' if simulation1_results['variability_analysis']['accuracy_ok'] else '✗ FAIL'}")

print("\n🔬 SIMULATION 2: CELLULAR AUTOMATA")
print("-"*40)
print(f"  Initial event rate: {simulation2_results['emergence_metrics']['initial_event_rate']:.4f}")
print(f"  Final event rate: {simulation2_results['emergence_metrics']['final_event_rate']:.4f}")
print(f"  Trend: {simulation2_results['emergence_metrics']['trend']}")
print(f"  Volatility: {simulation2_results['emergence_metrics']['volatility']:.4f}")

print("\n📈 QUALITY THRESHOLDS (Workshop 3)")
print("-"*40)
print(f"  ACCURACY_TARGET (≥{ACCURACY_TARGET}): {'✓' if simulation1_results['best_accuracy'] >= ACCURACY_TARGET else '✗'}")
print(f"  INSTABILITY_THRESHOLD (≤{INSTABILITY_THRESHOLD}): {'✓' if simulation1_results['variability_analysis']['accuracy_cv'] <= INSTABILITY_THRESHOLD else '✗'}")
print(f"  BIAS_THRESHOLD (≤{BIAS_THRESHOLD}): {'✓' if equity_results['max_disparity'] <= BIAS_THRESHOLD else '✗'}")

print("\n" + "="*70)

## 9. Save Results

In [None]:
import json
from datetime import datetime

# Build the results summary; float()/bool() convert NumPy scalars, which json cannot serialize
results_summary = {
    'timestamp': datetime.now().isoformat(),
    'simulation1': {
        'best_accuracy': float(simulation1_results['best_accuracy']),
        'best_auc': float(simulation1_results['best_auc']),
        'stability_ok': bool(simulation1_results['variability_analysis']['stability_ok']),
        'accuracy_ok': bool(simulation1_results['variability_analysis']['accuracy_ok'])
    },
    'simulation2': {
        'initial_event_rate': float(simulation2_results['emergence_metrics']['initial_event_rate']),
        'final_event_rate': float(simulation2_results['emergence_metrics']['final_event_rate']),
        'trend': str(simulation2_results['emergence_metrics']['trend'])
    },
    'equity': {
        'max_disparity': float(equity_results['max_disparity']),
        'bias_detected': bool(equity_results['bias_detected'])
    }
}

# Save the summary
with open('../results/workshop4_summary.json', 'w') as f:
    json.dump(results_summary, f, indent=2)

print("✓ Results saved to '../results/workshop4_summary.json'")

---
### References

- Workshop 1: Systems Analysis
- Workshop 2: System Design (M1-M7 Architecture)
- Workshop 3: Project Management and Quality Control
- [Kaggle Competition: CIBMTR - Equity in post-HCT Survival Predictions](https://www.kaggle.com/competitions/equity-post-HCT-survival-predictions)