# ALICE Engine - ML Training Pipeline

**Objective:** Predict chess game outcomes (FFE Interclubs)

## ISO Compliance

| Standard | Description | Status |
|----------|-------------|--------|
| ISO/IEC 5259 | Data Quality for ML | Validated |
| ISO/IEC 42001 | AI Management System (Model Card) | Validated |
| ISO/IEC 24029 | Neural Network Robustness | Tested |
| ISO/IEC 24027 | Bias in AI (Fairness) | Tested |
| ISO/IEC 25059 | AI Quality Model | Final report |

## Metadata

- **Version:** 2.0.0
- **Author:** ALICE Engine Team
- **Dataset:** FFE Interclubs 2014-2024
- **Target:** White win (binary)

---
## 1. Configuration & Seeds (Reproducibility)

In [None]:
# Global configuration - reproducibility
import os
import random
import numpy as np

RANDOM_SEED = 42
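# Note: PYTHONHASHSEED set at runtime only affects subprocesses; the running interpreter fixed its hash seed at startup
os.environ['PYTHONHASHSEED'] = str(RANDOM_SEED)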
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# Training parameters
CONFIG = {
    'seed': RANDOM_SEED,
    'target_column': 'target',
    'eval_metric': 'roc_auc',
    'threshold_auc': 0.70,  # ISO 25059 minimum
    'autogluon_time_limit': 21600,  # 6 hours
    'autogluon_presets': 'best_quality',  # Kaggle: best_quality (extreme takes too long)
    'num_bag_folds': 5,
    'num_stack_levels': 2,
}

# Features
FEATURES = [
    'blanc_elo', 'noir_elo', 'diff_elo', 'echiquier', 'niveau', 'ronde',
    'type_competition', 'division', 'ligue_code', 'blanc_titre', 'noir_titre', 'jour_semaine'
]

CAT_FEATURES = ['type_competition', 'division', 'ligue_code', 'blanc_titre', 'noir_titre', 'jour_semaine']

print(f'Config: seed={CONFIG["seed"]}, metric={CONFIG["eval_metric"]}, threshold={CONFIG["threshold_auc"]}')

---
## 2. Library Imports

In [None]:
# Installation (Kaggle)
!pip install -q autogluon.tabular catboost xgboost lightgbm scikit-learn pandas pyarrow matplotlib seaborn

In [None]:
# Data manipulation
import pandas as pd
import gc
import json
import hashlib
from pathlib import Path
from datetime import datetime

# ML Libraries
from sklearn.metrics import roc_auc_score, accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder
from catboost import CatBoostClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from autogluon.tabular import TabularPredictor

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical tests
from scipy.stats import chi2

print('Libraries imported successfully')
print(f'Pandas: {pd.__version__}')

---
## 3. Data Loading (ISO 5259 - Data Quality)

In [None]:
# Kaggle paths
DATA_PATH = Path('/kaggle/input/alice-features')
OUTPUT_PATH = Path('/kaggle/working')

# Load the data
train = pd.read_parquet(DATA_PATH / 'train.parquet')
valid = pd.read_parquet(DATA_PATH / 'valid.parquet')
test = pd.read_parquet(DATA_PATH / 'test.parquet')

print(f'Train: {len(train):,} samples')
print(f'Valid: {len(valid):,} samples')
print(f'Test:  {len(test):,} samples')
print(f'Total: {len(train) + len(valid) + len(test):,} samples')
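
Before hashing the splits, a quick structural check can catch missing columns, nulls, and duplicated rows. The sketch below is illustrative only: `quality_summary` is a hypothetical helper that is not part of the notebook, and the toy frame merely stands in for a real split.

```python
import pandas as pd

def quality_summary(df: pd.DataFrame, required_columns: list) -> dict:
    """Summarize basic data-quality indicators for one split (hypothetical helper)."""
    missing = [c for c in required_columns if c not in df.columns]
    present = [c for c in required_columns if c in df.columns]
    # Fraction of null values per required column that is actually present
    null_fraction = df[present].isna().mean().round(4).to_dict()
    return {
        'rows': len(df),
        'missing_columns': missing,
        'null_fraction': null_fraction,
        'duplicate_rows': int(df.duplicated().sum()),
    }

# Toy frame standing in for a real split (values are made up)
toy = pd.DataFrame({
    'blanc_elo': [1500, 1620, None, 1500],
    'noir_elo': [1480, 1700, 1550, 1480],
})
summary = quality_summary(toy, ['blanc_elo', 'noir_elo', 'echiquier'])
print(summary)
```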

In [None]:
# ISO 5259: data integrity check (hash)
def compute_hash(df: pd.DataFrame) -> str:
    """Return a short data-lineage fingerprint: first 8 hex chars of a SHA-256 over per-row hashes."""
    return hashlib.sha256(pd.util.hash_pandas_object(df).values.tobytes()).hexdigest()[:8]

data_lineage = {
    'train': {'samples': len(train), 'hash': compute_hash(train)},
    'valid': {'samples': len(valid), 'hash': compute_hash(valid)},
    'test': {'samples': len(test), 'hash': compute_hash(test)},
}

print('ISO 5259 Data Lineage:')
for name, info in data_lineage.items():
    print(f'  {name}: {info["samples"]:,} samples, hash={info["hash"]}')
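
To see what this fingerprint does and does not detect, the sketch below applies the same `compute_hash` to a toy frame. The data is made up; the point is that an identical copy hashes the same, while a changed value or a different row order yields a different fingerprint.

```python
import hashlib
import pandas as pd

def compute_hash(df: pd.DataFrame) -> str:
    """Same fingerprint as in the cell above: SHA-256 over per-row hashes, first 8 hex chars."""
    return hashlib.sha256(pd.util.hash_pandas_object(df).values.tobytes()).hexdigest()[:8]

toy = pd.DataFrame({'blanc_elo': [1500, 1620], 'noir_elo': [1480, 1700]})

same = toy.copy()                  # identical content and index
changed = toy.copy()
changed.loc[1, 'noir_elo'] = 1701  # a single edited value
shuffled = toy.iloc[::-1]          # same rows, reversed order

h_toy = compute_hash(toy)
h_same = compute_hash(same)
h_changed = compute_hash(changed)
h_shuffled = compute_hash(shuffled)

print('copy matches:     ', h_toy == h_same)
print('edit detected:    ', h_toy != h_changed)
print('reorder detected: ', h_toy != h_shuffled)
```

Because the per-row hashes are concatenated in row order, the fingerprint is order-sensitive. Sorting the frame before hashing would be one way to compare content regardless of order.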

---
## 4. Data Exploration (EDA)

In [None]:
# Data preview
train.head()

In [None]:
# Descriptive statistics
train[FEATURES].describe()

In [None]:
# Build the binary target (1 = White wins, 0 = draw or loss) and inspect its distribution
for df in [train, valid, test]:
    df['target'] = (df['resultat_blanc'] == 1.0).astype(int)

print('Target distribution:')
print(f'  Train: {train["target"].mean():.2%} positive')
print(f'  Valid: {valid["target"].mean():.2%} positive')
print(f'  Test:  {test["target"].mean():.2%} positive')

In [None]:
# Visualization: ELO distributions
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

axes[0].hist(train['blanc_elo'], bins=50, alpha=0.7, label='White')
axes[0].hist(train['noir_elo'], bins=50, alpha=0.7, label='Black')
axes[0].set_xlabel('ELO')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Distribution ELO')
axes[0].legend()

axes[1].hist(train['diff_elo'], bins=50, color='green', alpha=0.7)
axes[1].set_xlabel('ELO difference (White - Black)')
axes[1].set_title('Distribution Diff ELO')

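# sort_index() guarantees the bar order 0, 1 so the tick labels and colors below match the classes
train['target'].value_counts().sort_index().plot(kind='bar', ax=axes[2], color=['red', 'green'])
axes[2].set_xlabel('Target (0=Loss/Draw, 1=White win)')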
axes[2].set_title('Distribution Target')
axes[2].set_xticklabels(['0', '1'], rotation=0)

plt.tight_layout()
plt.savefig(OUTPUT_PATH / 'eda_distributions.png', dpi=150)
plt.show()

---
## 5. Feature Engineering

In [None]:
# Prepare the datasets
combined = pd.concat([train, valid], ignore_index=True)
print(f'Combined train+valid: {len(combined):,} samples')

X_train = combined[FEATURES]
y_train = combined['target']
X_test = test[FEATURES]
y_test = test['target']

print(f'X_train shape: {X_train.shape}')
print(f'X_test shape: {X_test.shape}')

In [None]:
# Encoding for XGBoost/LightGBM (CatBoost handles categoricals natively)
X_train_encoded = X_train.copy()
X_test_encoded = X_test.copy()

label_encoders = {}
for col in CAT_FEATURES:
    le = LabelEncoder()
    X_train_encoded[col] = le.fit_transform(X_train_encoded[col].astype(str))
    X_test_encoded[col] = le.transform(X_test_encoded[col].astype(str))
    label_encoders[col] = le

print('Label encoding completed for XGBoost/LightGBM')
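
`LabelEncoder.transform` raises a `ValueError` when the test split contains a category that was absent from training. One possible way to guard against that, sketched here with pandas only, is to build the vocabulary from the training column and let unseen values fall back to the code `-1`. The helper name and the sample category values are hypothetical.

```python
import pandas as pd

def encode_with_train_vocabulary(train_col: pd.Series, other_col: pd.Series):
    """Encode both columns with the training vocabulary; unseen values get code -1."""
    categories = sorted(train_col.astype(str).unique())
    train_codes = pd.Categorical(train_col.astype(str), categories=categories).codes
    other_codes = pd.Categorical(other_col.astype(str), categories=categories).codes
    return train_codes, other_codes, categories

# Made-up category values for illustration
train_col = pd.Series(['N1', 'N2', 'N1', 'Top16'])
test_col = pd.Series(['N2', 'N4', 'Top16'])  # 'N4' never appears in training

train_codes, test_codes, categories = encode_with_train_vocabulary(train_col, test_col)
print(categories)
print(train_codes.tolist())
print(test_codes.tolist())
```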

---
## 6. Baseline Models (CatBoost, XGBoost, LightGBM)

In [None]:
# Result storage
results = {}
models = {}

In [None]:
%%time
# CatBoost
print('Training CatBoost...')
cb = CatBoostClassifier(
    iterations=1000,
    learning_rate=0.03,
    depth=6,
    cat_features=CAT_FEATURES,
    early_stopping_rounds=50,
    random_seed=RANDOM_SEED,
    verbose=100
)
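# Caveat: early stopping monitors the test set here, so the test AUC reported below is optimistically biased
cb.fit(X_train, y_train, eval_set=(X_test, y_test))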

results['CatBoost'] = roc_auc_score(y_test, cb.predict_proba(X_test)[:, 1])
models['CatBoost'] = cb
print(f'CatBoost Test AUC: {results["CatBoost"]:.4f}')

In [None]:
%%time
# XGBoost
print('Training XGBoost...')
xgb = XGBClassifier(
    n_estimators=1000,
    learning_rate=0.03,
    max_depth=6,
    early_stopping_rounds=50,
    random_state=RANDOM_SEED,
    eval_metric='auc'
)
xgb.fit(X_train_encoded, y_train, eval_set=[(X_test_encoded, y_test)], verbose=100)

results['XGBoost'] = roc_auc_score(y_test, xgb.predict_proba(X_test_encoded)[:, 1])
models['XGBoost'] = xgb
print(f'XGBoost Test AUC: {results["XGBoost"]:.4f}')

In [None]:
%%time
# LightGBM
print('Training LightGBM...')
lgb = LGBMClassifier(
    n_estimators=1000,
    learning_rate=0.03,
    num_leaves=63,
    random_state=RANDOM_SEED,
    verbose=100
)
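# `lgb` is the classifier instance, not the lightgbm module, so the callback is imported explicitly
from lightgbm import early_stopping

lgb.fit(
    X_train_encoded, y_train,
    eval_set=[(X_test_encoded, y_test)],
    eval_metric='auc',
    callbacks=[early_stopping(50)]
)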

results['LightGBM'] = roc_auc_score(y_test, lgb.predict_proba(X_test_encoded)[:, 1])
models['LightGBM'] = lgb
print(f'LightGBM Test AUC: {results["LightGBM"]:.4f}')

In [None]:
# Baseline summary
print('\n=== BASELINE RESULTS ===')
for model_name, auc in sorted(results.items(), key=lambda x: -x[1]):
    status = 'PASS' if auc >= CONFIG['threshold_auc'] else 'FAIL'
    print(f'{model_name}: {auc:.4f} [{status}]')
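
As a reminder of what the 0.70 threshold measures: ROC AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by brute-force pairwise comparison. It is for intuition on small samples only, since memory grows with positives times negatives, and `pairwise_auc` is an illustrative name, not part of the pipeline.

```python
import numpy as np

def pairwise_auc(y_true, scores) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

# Four games: the positives score 0.35 and 0.8, the negatives 0.1 and 0.4.
# Three of the four (positive, negative) pairs are ordered correctly.
auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```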

---
## 7. AutoGluon (Preset: best_quality)

In [None]:
# Free memory
del combined
gc.collect()

# Prepare the AutoGluon data
train_ag = pd.concat([train[FEATURES + ['target']], valid[FEATURES + ['target']]], ignore_index=True)
test_ag = test[FEATURES + ['target']]

print(f'AutoGluon train: {len(train_ag):,}')
print(f'AutoGluon test: {len(test_ag):,}')

In [None]:
%%time
# AutoGluon Training
print('Training AutoGluon (best_quality preset)...')
print(f'Time limit: {CONFIG["autogluon_time_limit"]} seconds ({CONFIG["autogluon_time_limit"]/3600:.1f} hours)')

predictor = TabularPredictor(
    label=CONFIG['target_column'],
    eval_metric=CONFIG['eval_metric'],
    path=str(OUTPUT_PATH / 'autogluon')
)

predictor.fit(
    train_data=train_ag,
    presets=CONFIG['autogluon_presets'],
    time_limit=CONFIG['autogluon_time_limit'],
    num_bag_folds=CONFIG['num_bag_folds'],
    num_stack_levels=CONFIG['num_stack_levels'],
    verbosity=2,
    ag_args_fit={'ag.max_memory_usage_ratio': 3.0}
)

In [None]:
# AutoGluon leaderboard
leaderboard = predictor.leaderboard()
print(f'\nAutoGluon trained {len(leaderboard)} models')
leaderboard[['model', 'score_val', 'fit_time']].head(15)

In [None]:
# Evaluation on the test set
ag_proba = predictor.predict_proba(test_ag.drop(columns='target'))
results['AutoGluon'] = roc_auc_score(test_ag['target'], ag_proba[1])
print(f'AutoGluon Test AUC: {results["AutoGluon"]:.4f}')

---
## 8. ISO 24029 - Robustness Testing

In [None]:
def test_robustness(predictor, test_data: pd.DataFrame, noise_level: float = 0.1) -> dict:
    """ISO 24029: Test model robustness to input perturbations."""
    X_test = test_data.drop(columns='target')
    y_test = test_data['target']
    
    # Baseline AUC
    baseline_auc = roc_auc_score(y_test, predictor.predict_proba(X_test)[1])
    
    # Perturbed AUC (add Gaussian noise to numeric features)
    X_noisy = X_test.copy()
    numeric_cols = X_noisy.select_dtypes(include=[np.number]).columns
    for col in numeric_cols:
        std = X_noisy[col].std()
        X_noisy[col] = X_noisy[col] + np.random.normal(0, std * noise_level, len(X_noisy))
    
    noisy_auc = roc_auc_score(y_test, predictor.predict_proba(X_noisy)[1])
    
    # Tolerance calculation
    tolerance = baseline_auc - noisy_auc
    
    # Status based on tolerance
    if tolerance < 0.02:
        status = 'ROBUST'
    elif tolerance < 0.05:
        status = 'ACCEPTABLE'
    else:
        status = 'FRAGILE'
    
    return {
        'baseline_auc': float(baseline_auc),
        'noisy_auc': float(noisy_auc),
        'noise_tolerance': float(tolerance),
        'noise_level': noise_level,
        'status': status
    }

robustness_report = test_robustness(predictor, test_ag, noise_level=0.1)
print('ISO 24029 Robustness Report:')
print(f'  Baseline AUC: {robustness_report["baseline_auc"]:.4f}')
print(f'  Noisy AUC: {robustness_report["noisy_auc"]:.4f}')
print(f'  Tolerance: {robustness_report["noise_tolerance"]:.4f}')
print(f'  Status: {robustness_report["status"]}')
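
A single noise draw at one noise level gives a noisy estimate of the AUC drop. A possible extension, sketched below under stated assumptions, repeats the perturbation over several draws and levels with a local random generator. Everything here is hypothetical scaffolding: `noise_sweep`, the toy data, and the stand-in scoring function `elo_gap_score` replace the real predictor so the example runs on its own.

```python
import numpy as np

def pairwise_auc(y_true, scores) -> float:
    """Brute-force AUC (ties count 0.5); adequate for small toy samples."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

def noise_sweep(score_fn, X, y, noise_levels, n_repeats=5, seed=42) -> dict:
    """Mean and std of the AUC drop for each noise level, over repeated Gaussian perturbations."""
    rng = np.random.default_rng(seed)  # local generator, independent of global state
    baseline = pairwise_auc(y, score_fn(X))
    col_std = X.std(axis=0)
    report = {}
    for level in noise_levels:
        drops = []
        for _ in range(n_repeats):
            noisy = X + rng.normal(0.0, 1.0, size=X.shape) * col_std * level
            drops.append(baseline - pairwise_auc(y, score_fn(noisy)))
        report[level] = {'mean_drop': float(np.mean(drops)), 'std_drop': float(np.std(drops))}
    return report

def elo_gap_score(X):
    """Stand-in model: score a game by the rating gap (column 0 minus column 1)."""
    return X[:, 0] - X[:, 1]

# Toy data: two rating columns and an outcome loosely driven by the rating gap
data_rng = np.random.default_rng(0)
X = data_rng.normal(1500, 200, size=(300, 2))
y = (X[:, 0] - X[:, 1] + data_rng.normal(0, 150, 300) > 0).astype(int)

sweep = noise_sweep(elo_gap_score, X, y, noise_levels=[0.0, 0.1, 0.5])
for level, stats in sweep.items():
    print(level, stats)
```

Reporting the spread next to the mean makes it easier to judge whether a drop close to the 0.02 or 0.05 cut-offs is stable or an artifact of one draw.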

---
## 9. ISO 24027 - Fairness Testing

In [None]:
def test_fairness(predictor, test_data: pd.DataFrame, sensitive_attr: str) -> dict:
    """ISO 24027: Test model fairness across groups."""
    X_test = test_data.drop(columns='target')
    y_test = test_data['target']
    y_pred = predictor.predict(X_test)
    
    # Positive rate per group
    groups = test_data[sensitive_attr].unique()
    positive_rates = {}
    
    for group in groups:
        mask = test_data[sensitive_attr] == group
        if mask.sum() > 100:  # Minimum sample size
            positive_rates[str(group)] = float(y_pred[mask].mean())
    
    # Demographic parity (max difference between groups)
    rates = list(positive_rates.values())
    demographic_parity = max(rates) - min(rates) if rates else 0
    
    # Status
    if demographic_parity < 0.05:
        status = 'FAIR'
    elif demographic_parity < 0.10:
        status = 'ACCEPTABLE'
    else:
        status = 'CRITICAL'
    
    return {
        'sensitive_attribute': sensitive_attr,
        'positive_rates_by_group': positive_rates,
        'demographic_parity': float(demographic_parity),
        'status': status
    }

fairness_report = test_fairness(predictor, test_ag, 'ligue_code')
print('ISO 24027 Fairness Report:')
print(f'  Sensitive attribute: {fairness_report["sensitive_attribute"]}')
print(f'  Demographic parity: {fairness_report["demographic_parity"]:.4f}')
print(f'  Status: {fairness_report["status"]}')
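
Demographic parity compares predicted positive rates only, so it can flag groups that simply have different base rates. As a complementary view, the sketch below compares true positive rates across groups (often called equal opportunity). This metric is an addition for illustration and not something the notebook computes; the function name and the group labels `A` and `B` are hypothetical.

```python
import numpy as np

def true_positive_rate_gap(y_true, y_pred, groups, min_positives=1):
    """Per-group true positive rate and the max-min gap between groups."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    tpr_by_group = {}
    for group in np.unique(groups):
        # Actual positives belonging to this group
        positives = (groups == group) & (y_true == 1)
        if positives.sum() >= min_positives:
            tpr_by_group[str(group)] = float(y_pred[positives].mean())
    rates = list(tpr_by_group.values())
    gap = max(rates) - min(rates) if rates else 0.0
    return tpr_by_group, gap

# Made-up labels and predictions for two groups
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ['A', 'A', 'A', 'B', 'B', 'B']

tpr_by_group, gap = true_positive_rate_gap(y_true, y_pred, groups)
print(tpr_by_group, gap)
```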

---
## 10. McNemar Statistical Comparison

In [None]:
def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar test for comparing two classifiers."""
    correct_a = (pred_a == y_true)
    correct_b = (pred_b == y_true)
    
    # Discordant pairs: b = A correct & B wrong, c = A wrong & B correct
    b = np.sum(correct_a & ~correct_b)
    c = np.sum(~correct_a & correct_b)
    
    if b + c == 0:
        return 0.0, 1.0
    
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = 1 - chi2.cdf(statistic, 1)
    
    return statistic, p_value

# Comparison: AutoGluon vs baseline (CatBoost)
y_true = test_ag['target'].values
pred_ag = predictor.predict(test_ag.drop(columns='target')).values
pred_cb = models['CatBoost'].predict(X_test).astype(int)

stat, p_value = mcnemar_test(y_true, pred_ag, pred_cb)

baseline_best_auc = max(results['CatBoost'], results['XGBoost'], results['LightGBM'])
# AUC gap expressed in percentage points (AUC difference x 100)
diff_pct = (results['AutoGluon'] - baseline_best_auc) * 100

mcnemar_report = {
    'autogluon_auc': float(results['AutoGluon']),
    'baseline_best_auc': float(baseline_best_auc),
    'difference_pct': float(diff_pct),
    'mcnemar_statistic': float(stat),
    'p_value': float(p_value),
    # bool() converts NumPy booleans so json.dump writes true/false instead of a string
    'significant': bool(p_value < 0.05),
    'meets_2pct': bool(diff_pct >= 2.0)
}

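# Decision: apply the full rule stated in the final report (section 13)
autogluon_wins = (
    results['AutoGluon'] >= CONFIG['threshold_auc']
    and robustness_report['status'] != 'FRAGILE'
    and fairness_report['status'] != 'CRITICAL'
    and mcnemar_report['significant']
    and mcnemar_report['meets_2pct']
)
mcnemar_report['winner'] = 'AutoGluon' if autogluon_wins else 'Baseline'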

print('McNemar Comparison:')
print(f'  AutoGluon AUC: {mcnemar_report["autogluon_auc"]:.4f}')
print(f'  Baseline Best AUC: {mcnemar_report["baseline_best_auc"]:.4f}')
print(f'  Difference: {mcnemar_report["difference_pct"]:+.2f}%')
print(f'  p-value: {mcnemar_report["p_value"]:.4f}')
print(f'  Significant (p<0.05): {mcnemar_report["significant"]}')
print(f'  Winner: {mcnemar_report["winner"]}')
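
A small worked example may help when reading the statistic. With b = 15 and c = 5 discordant pairs, the continuity-corrected value is (|15 - 5| - 1)^2 / 20 = 81 / 20 = 4.05. The sketch below reproduces this with the standard library only, using the identity that the chi-squared survival function with one degree of freedom equals `erfc(sqrt(x / 2))`. The counts are invented for illustration.

```python
import math

def mcnemar_corrected(b: int, c: int):
    """Continuity-corrected McNemar statistic and p-value from discordant counts b and c."""
    if b + c == 0:
        return 0.0, 1.0
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Chi-squared survival function with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

stat_example, p_example = mcnemar_corrected(15, 5)
print(f'statistic={stat_example:.2f}, p={p_example:.4f}')
```

The test is symmetric in b and c, so a small p-value says the two classifiers disagree systematically but not which one is better. The direction has to come from the metric comparison.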

---
## 11. ISO 42001 - Model Card

In [None]:
# ISO 42001 Model Card
model_card = {
    'model_id': f'autogluon_{datetime.now():%Y%m%d_%H%M%S}',
    'model_name': 'AutoGluon_best_quality',
    'version': '2.0.0',
    'created_at': datetime.now().isoformat(),
    
    # Purpose
    'purpose': 'Prediction of chess game outcomes (White wins) for FFE Interclubs',
    'intended_use': 'Team composition optimization, player performance analysis',
    
    # Data
    'training_data': {
        'source': 'FFE Interclubs 2014-2024',
        'samples': len(train_ag),
        'hash': compute_hash(train_ag),
        'features': FEATURES
    },
    'test_data': {
        'samples': len(test_ag),
        'hash': compute_hash(test_ag)
    },
    
    # Hyperparameters
    'hyperparameters': {
        'presets': CONFIG['autogluon_presets'],
        'time_limit': CONFIG['autogluon_time_limit'],
        'num_bag_folds': CONFIG['num_bag_folds'],
        'num_stack_levels': CONFIG['num_stack_levels']
    },
    
    # Performance
    'performance': {
        'test_auc': results['AutoGluon'],
        'best_model': leaderboard.iloc[0]['model'],
        'num_models': len(leaderboard)
    },
    
    # ISO Compliance
    'iso_compliance': {
        'iso_24029_robustness': robustness_report['status'],
        'iso_24027_fairness': fairness_report['status'],
        'iso_25059_threshold': bool(results['AutoGluon'] >= CONFIG['threshold_auc'])
    },
    
    # Limitations
    'limitations': [
        'Only applicable to French chess federation data',
        'Performance may vary for different competition types',
        'Does not account for player form/fatigue'
    ],
    
    # Contact
    'contact': 'ALICE Engine Team'
}

# Save Model Card
with open(OUTPUT_PATH / 'model_card.json', 'w') as f:
    json.dump(model_card, f, indent=2, default=str)

print('ISO 42001 Model Card saved')
print(f'  Model ID: {model_card["model_id"]}')
print(f'  Test AUC: {model_card["performance"]["test_auc"]:.4f}')
print(f'  Best model: {model_card["performance"]["best_model"]}')
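
Before a model card is archived, it can be useful to check that no expected field is missing or empty. The sketch below shows one minimal way to do that. The `REQUIRED_FIELDS` list is an assumption chosen from the keys used above, not a list prescribed by ISO/IEC 42001, and the helper name is hypothetical.

```python
# Assumed set of mandatory keys (illustrative, not taken from the standard)
REQUIRED_FIELDS = ['model_id', 'version', 'purpose', 'training_data', 'performance', 'limitations']

def missing_model_card_fields(card: dict, required=REQUIRED_FIELDS) -> list:
    """Return the required keys that are absent or hold an empty value."""
    empty_values = (None, '', [], {})
    return [key for key in required if key not in card or card[key] in empty_values]

# Toy card with one absent field and one empty field
toy_card = {
    'model_id': 'autogluon_example',
    'version': '2.0.0',
    'purpose': 'Prediction of chess game outcomes',
    'training_data': {'samples': 1000},
    'limitations': [],
}
problems = missing_model_card_fields(toy_card)
print(problems)
```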

---
## 12. Results Visualization

In [None]:
# Model comparison
fig, ax = plt.subplots(figsize=(10, 6))

model_names = list(results.keys())
aucs = list(results.values())
colors = ['#4CAF50' if auc >= CONFIG['threshold_auc'] else '#F44336' for auc in aucs]

bars = ax.barh(model_names, aucs, color=colors)
ax.axvline(x=CONFIG['threshold_auc'], color='red', linestyle='--', linewidth=2, label=f'ISO 25059 Threshold ({CONFIG["threshold_auc"]})')

for bar, auc in zip(bars, aucs):
    ax.text(auc + 0.005, bar.get_y() + bar.get_height()/2, f'{auc:.4f}', va='center', fontsize=12)

ax.set_xlabel('Test AUC', fontsize=12)
ax.set_title('ALICE Engine - Model Comparison', fontsize=14)
ax.legend(loc='lower right')
ax.set_xlim(0.5, max(aucs) + 0.05)

plt.tight_layout()
plt.savefig(OUTPUT_PATH / 'model_comparison.png', dpi=150)
plt.show()

In [None]:
# Feature Importance (CatBoost)
fig, ax = plt.subplots(figsize=(10, 6))

importance = pd.DataFrame({
    'feature': FEATURES,
    'importance': models['CatBoost'].feature_importances_
}).sort_values('importance', ascending=True)

ax.barh(importance['feature'], importance['importance'], color='steelblue')
ax.set_xlabel('Importance')
ax.set_title('Feature Importance (CatBoost)')

plt.tight_layout()
plt.savefig(OUTPUT_PATH / 'feature_importance.png', dpi=150)
plt.show()

---
## 13. ISO 25059 Final Report

In [None]:
# Generate the final report
report = f'''# ISO 25059 Report - ALICE Engine ML Training

**Date:** {datetime.now():%Y-%m-%d %H:%M}
**Version:** 2.0.0

---

## Executive Summary

| Criterion | Result | Status |
|-----------|--------|--------|
| AutoGluon AUC | {results["AutoGluon"]:.4f} | {"PASS" if results["AutoGluon"] >= CONFIG["threshold_auc"] else "FAIL"} |
| ISO 24029 Robustness | {robustness_report["status"]} | {"PASS" if robustness_report["status"] != "FRAGILE" else "FAIL"} |
| ISO 24027 Fairness | {fairness_report["status"]} | {"PASS" if fairness_report["status"] != "CRITICAL" else "FAIL"} |
| Diff vs Baseline | {mcnemar_report["difference_pct"]:+.2f}% | {"PASS" if mcnemar_report["meets_2pct"] else "WARN"} |
| p-value (McNemar) | {mcnemar_report["p_value"]:.4f} | {"PASS" if mcnemar_report["significant"] else "FAIL"} |

**Recommendation:** {mcnemar_report["winner"]}

---

## Baseline Models

| Model | Test AUC | Status |
|-------|----------|--------|
| CatBoost | {results["CatBoost"]:.4f} | {"PASS" if results["CatBoost"] >= CONFIG["threshold_auc"] else "FAIL"} |
| XGBoost | {results["XGBoost"]:.4f} | {"PASS" if results["XGBoost"] >= CONFIG["threshold_auc"] else "FAIL"} |
| LightGBM | {results["LightGBM"]:.4f} | {"PASS" if results["LightGBM"] >= CONFIG["threshold_auc"] else "FAIL"} |

---

## AutoGluon Results

- **Preset:** {CONFIG["autogluon_presets"]}
- **Time Limit:** {CONFIG["autogluon_time_limit"]/3600:.1f} hours
- **Models Trained:** {len(leaderboard)}
- **Best Model:** {leaderboard.iloc[0]["model"]}
- **Test AUC:** {results["AutoGluon"]:.4f}

---

## Final Decision

**Rule:**
```
IF AutoGluon.AUC >= 0.70
   AND robustness != FRAGILE
   AND fairness != CRITICAL
   AND p_value < 0.05
   AND diff >= +2%
THEN AutoGluon
ELSE Baseline (CatBoost)
```

**Result:** {mcnemar_report["winner"]}

---

## Data Lineage (ISO 5259)

| Dataset | Samples | Hash |
|---------|---------|------|
| Train | {data_lineage["train"]["samples"]:,} | {data_lineage["train"]["hash"]} |
| Valid | {data_lineage["valid"]["samples"]:,} | {data_lineage["valid"]["hash"]} |
| Test | {data_lineage["test"]["samples"]:,} | {data_lineage["test"]["hash"]} |

---

*Generated by ALICE Engine Kaggle Notebook*
'''

with open(OUTPUT_PATH / 'ISO_25059_REPORT.md', 'w') as f:
    f.write(report)

print('ISO 25059 Report saved to:', OUTPUT_PATH / 'ISO_25059_REPORT.md')
print('\n' + '='*50)
print(report)
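
The decision rule written in the report can also be expressed as a small pure function, which makes it easy to unit-test independently of any training run. This is a sketch of that rule only: the function name, parameter names, and defaults are illustrative, with the defaults mirroring the thresholds quoted in the rule (AUC 0.70, p < 0.05, gain of at least 2 points).

```python
def select_model(auc: float, robustness_status: str, fairness_status: str,
                 p_value: float, diff_points: float,
                 threshold_auc: float = 0.70, alpha: float = 0.05,
                 min_gain_points: float = 2.0) -> str:
    """Return 'AutoGluon' only if every condition of the report's rule holds."""
    autogluon_ok = (
        auc >= threshold_auc
        and robustness_status != 'FRAGILE'
        and fairness_status != 'CRITICAL'
        and p_value < alpha
        and diff_points >= min_gain_points
    )
    return 'AutoGluon' if autogluon_ok else 'Baseline (CatBoost)'

# All conditions met
print(select_model(0.78, 'ROBUST', 'FAIR', 0.01, 2.5))
# Significant but the gain is below 2 points
print(select_model(0.78, 'ROBUST', 'FAIR', 0.01, 1.2))
```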

---
## 14. Save All Reports

In [None]:
# Save all JSON reports
reports_to_save = {
    'autogluon_results.json': {
        'test_auc': results['AutoGluon'],
        'best_model': leaderboard.iloc[0]['model'],
        'num_models': len(leaderboard),
        'leaderboard': leaderboard[['model', 'score_val']].to_dict('records')
    },
    'robustness_report.json': robustness_report,
    'fairness_report.json': fairness_report,
    'mcnemar_comparison.json': mcnemar_report,
    'baseline_results.json': {
        'CatBoost': {'test_auc': results['CatBoost']},
        'XGBoost': {'test_auc': results['XGBoost']},
        'LightGBM': {'test_auc': results['LightGBM']}
    }
}

for filename, data in reports_to_save.items():
    with open(OUTPUT_PATH / filename, 'w') as f:
        json.dump(data, f, indent=2, default=str)
    print(f'Saved: {filename}')

print('\nAll reports saved successfully!')

In [None]:
# List the generated files
print('\n=== OUTPUT FILES ===')
for f in sorted(OUTPUT_PATH.glob('*')):
    if f.is_file():
        print(f'  {f.name}')

---

## Conclusion

This notebook ran a complete ML pipeline in line with the following ISO standards:

- **ISO 5259**: Data quality with lineage and hashes
- **ISO 42001**: Documented Model Card
- **ISO 24029**: Robustness tests
- **ISO 24027**: Fairness tests
- **ISO 25059**: Final quality report

To reproduce the results locally, see the ALICE Engine repository.