# Training the Smart Chess Neural Network on Google Colab

This notebook trains the neural network used for chess position evaluation on a Google Colab GPU.

**Project path on Drive:** `MyDrive/smart_chess_drive/smart-chess`

## Instructions
1. Go to **Runtime > Change runtime type > GPU** (T4 or better)
2. Run the cells in order, with one exception: the cell that locates the dataset and defines `DATASET_CSV` sits near the end of the notebook and must be run before section 8
3. Models are saved automatically to your Drive

## 1. GPU check

In [None]:
# Vérifier la disponibilité du GPU
!nvidia-smi

Mon Oct 27 13:17:19 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   49C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

## 2. Mounting Google Drive

In [None]:
# Monter Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 3. Project path configuration

In [None]:
# Définir le chemin vers le projet sur votre Drive
import os
import sys

PROJECT_PATH = '/content/drive/MyDrive/smart_chess_drive/smart-chess'
os.chdir(PROJECT_PATH)
sys.path.insert(0, PROJECT_PATH)

print(f"Répertoire de travail: {os.getcwd()}")
print(f"\nContenu du répertoire:")
for item in sorted(os.listdir('.')):
    print(f"  - {item}")

Répertoire de travail: /content/drive/MyDrive/smart_chess_drive/smart-chess

Contenu du répertoire:
  - .git
  - .gitignore
  - README.md
  - ai
  - docs
  - prototypes


## 4. Installing dependencies

In [None]:
# Installer les packages nécessaires
!pip install -q torch torchvision torchaudio
!pip install -q numpy matplotlib tqdm

print("✓ Installation terminée")

✓ Installation terminée


## 5. Checking the PyTorch environment

In [None]:
import torch
import numpy as np

print("=" * 60)
print("CONFIGURATION SYSTÈME")
print("=" * 60)
print(f"PyTorch version: {torch.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"\nCUDA disponible: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Nom du GPU: {torch.cuda.get_device_name(0)}")
    props = torch.cuda.get_device_properties(0)
    print(f"Mémoire GPU totale: {props.total_memory / 1e9:.2f} GB")
    print(f"Compute Capability: {props.major}.{props.minor}")
else:
    print("⚠️ ATTENTION: GPU non disponible, l'entraînement sera très lent!")
    print("   Allez dans Runtime > Change runtime type > GPU")

print("=" * 60)

CONFIGURATION SYSTÈME
PyTorch version: 2.8.0+cu126
NumPy version: 2.0.2

CUDA disponible: True
CUDA version: 12.6
Nom du GPU: Tesla T4
Mémoire GPU totale: 15.83 GB
Compute Capability: 7.5


## 6. Importing the project modules

In [None]:
# Importer les modules nécessaires depuis le projet (robuste à l'emplacement du repo sur Drive)
import os
import sys
import importlib

# Make sure PROJECT_PATH is defined; the `ai` folder is also added to sys.path below
PROJECT_PATH = '/content/drive/MyDrive/smart_chess_drive/smart-chess'

# Alternative location (if the repo was copied into /content)
ALT_PATH = '/content/smart-chess'

# Pick a path that exists
if not os.path.isdir(PROJECT_PATH) and os.path.isdir(ALT_PATH):
    PROJECT_PATH = ALT_PATH

if not os.path.isdir(PROJECT_PATH):
    raise FileNotFoundError(f"Répertoire projet introuvable: {PROJECT_PATH}. Montez Drive et vérifiez le chemin.")

# Derive the `ai` folder only once the final PROJECT_PATH is known,
# otherwise it would still point at Drive after falling back to ALT_PATH
AI_SUBDIR = os.path.join(PROJECT_PATH, 'ai')

# Ajouter au sys.path si nécessaire
if PROJECT_PATH not in sys.path:
    sys.path.insert(0, PROJECT_PATH)
if AI_SUBDIR not in sys.path and os.path.isdir(AI_SUBDIR):
    sys.path.insert(0, AI_SUBDIR)

# Se placer dans le répertoire projet
os.chdir(PROJECT_PATH)

print('Répertoire de travail:', os.getcwd())
print('\nQuelques fichiers à la racine du projet:')
print(sorted(os.listdir(PROJECT_PATH))[:50])
print('\nContenu du dossier ai/:')
print(sorted(os.listdir(AI_SUBDIR))[:100])

# Diagnostic d'import direct pour le module Chess
try:
    import Chess
    print('\n✅ Import direct `Chess` OK (module trouvé via sys.path)')
except Exception as e:
    print('\n❌ Import direct `Chess` a échoué:', e)
    print('Vérifiez que `ai/Chess.py` existe et que le dossier ai/ est dans sys.path')

# Maintenant importer le module d'entraînement (trainer)
try:
    import ai.NN.train_torch as trainer
    import ai.NN.torch_nn_evaluator as torch_eval
    from ai.Chess_v2 import Chess
    print('\n✓ Modules importés avec succès!')
except Exception as e:
    print('\n❌ Erreur d\'import lors de l\'import du trainer:', e)
    raise


Répertoire de travail: /content/drive/MyDrive/smart_chess_drive/smart-chess

Quelques fichiers à la racine du projet:
['.git', '.gitignore', 'README.md', 'ai', 'docs', 'prototypes']

Contenu du dossier ai/:
['AI_reduction', 'Chess.py', 'ChessInteractifv2.py', 'Chess_v2.py', 'NN', 'Null_move_AI', 'Old_AI', 'Player.py', 'Profile', 'Tests.py', '__init__.py', '__pycache__', 'alphabeta.py', 'alphabeta_engine.py', 'alphabeta_engine_v2.py', 'analyze_reduction_overhead.py', 'base_engine.py', 'check_dataset_stats.py', 'check_gpu.py', 'check_performance.py', 'chess_model_checkpoint.pt', 'debug_conversion.py', 'engine_match.py', 'evaluator.py', 'example_move_reduction.py', 'fast_evaluator.py', 'journal-experiments.md', 'optimized_chess.py', 'profile_report_1760344602.txt', 'test_depth_6_performance.py', 'test_depth_6_quick.py', 'test_depth_effectiveness.py', 'test_engines_v2.py', 'test_evaluator_performance.py', 'test_generalization.py', 'test_move_reduction.py', 'test_null_move.py', 'test_null_m

## 7. Training configuration

In [None]:
# Paramètres d'entraînement
CONFIG = {
    # Génération de données
    'num_games': 10000,          # Nombre de parties à générer pour l'entraînement

    # Hyperparamètres
    'batch_size': 256,           # Taille du batch (augmenter si GPU puissant)
    'epochs': 50,                # Nombre d'époques d'entraînement
    'learning_rate': 0.001,      # Taux d'apprentissage

    # Configuration système
    'device': 'cuda' if torch.cuda.is_available() else 'cpu',
    'num_workers': 2,            # Workers pour le DataLoader

    # Sauvegarde
    'checkpoint_path': 'ai/chess_model_checkpoint.pt',
    'save_interval': 5,          # Sauvegarder tous les N époques
}

print("=" * 60)
print("CONFIGURATION DE L'ENTRAÎNEMENT")
print("=" * 60)
for key, value in CONFIG.items():
    print(f"{key:20s}: {value}")
print("=" * 60)

if CONFIG['device'] == 'cpu':
    print("\n⚠️ ATTENTION: Entraînement sur CPU détecté!")
    print("   Réduisez num_games et epochs pour un test rapide.")

CONFIGURATION DE L'ENTRAÎNEMENT
num_games           : 10000
batch_size          : 256
epochs              : 50
learning_rate       : 0.001
device              : cuda
num_workers         : 2
checkpoint_path     : ai/chess_model_checkpoint.pt
save_interval       : 5
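
`CONFIG` is only a notebook dictionary, while the launch cell later in this notebook sets module-level constants such as `trainer.BATCH_SIZE` by hand. The sketch below shows one way such overrides could be applied in a single step. It is illustrative only: `trainer_stub`, `KEY_TO_CONSTANT`, and `apply_overrides` are hypothetical names, and the stub stands in for the real `ai.NN.train_torch` module.

```python
from types import SimpleNamespace

# Stand-in for the imported training module (hypothetical starting values).
trainer_stub = SimpleNamespace(BATCH_SIZE=128, EPOCHS=20, LEARNING_RATE=0.001)

# Hypothetical mapping from CONFIG keys to module-level constant names.
KEY_TO_CONSTANT = {
    "batch_size": "BATCH_SIZE",
    "epochs": "EPOCHS",
    "learning_rate": "LEARNING_RATE",
}


def apply_overrides(module, config, mapping=KEY_TO_CONSTANT):
    """Copy mapped config values onto attributes that already exist on module.

    Returns the names of the constants whose value actually changed.
    """
    changed = []
    for key, constant in mapping.items():
        if key not in config:
            continue
        # Refuse to create new attributes: a typo would otherwise pass silently.
        if not hasattr(module, constant):
            raise AttributeError(f"{constant} is not defined on the target module")
        if getattr(module, constant) != config[key]:
            setattr(module, constant, config[key])
            changed.append(constant)
    return changed


changed = apply_overrides(
    trainer_stub, {"batch_size": 256, "epochs": 50, "learning_rate": 0.001}
)
print(changed)  # ['BATCH_SIZE', 'EPOCHS']
```

Rejecting unknown constants makes a misspelled name fail loudly instead of adding an attribute that the training script never reads.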


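## 8. Loading the training data

This step loads pre-evaluated positions (FEN strings with their evaluations) from a CSV file on Drive through `trainer.load_data`, so the notebook applies the same preprocessing as the training script. It needs `DATASET_CSV`, which is set by the dataset-location cell near the end of the notebook, so run that cell first.
**Note:** in the run recorded below, loading about 12.8 million positions took roughly 24 seconds.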

In [None]:
from tqdm import tqdm
import time

print("Chargement du dataset (depuis chessData)...")

# Use DATASET_CSV, which is defined by the dataset-location cell near the end of the notebook
dataset_path = globals().get('DATASET_CSV')

if dataset_path is None:
    raise FileNotFoundError('No dataset path defined. Mount Drive, place the CSV file in MyDrive/smart_chess_drive/ and run the cell that defines DATASET_CSV')

start_time = time.time()

# Utiliser la fonction de chargement du script d'entraînement pour assurer le même prétraitement
fens, evaluations = trainer.load_data(dataset_path) # Pass the dataset_path explicitly

# Variables attendues plus bas dans le notebook
X_train = fens
y_train = evaluations

elapsed_time = time.time() - start_time

print("\n" + "=" * 60)
print("DONNÉES CHARGÉES")
print("=" * 60)
print(f"Nombre total de positions: {len(X_train):,}")
print(f"Temps écoulé: {elapsed_time:.1f}s ({elapsed_time/60:.1f} min)")
print("=" * 60)

# Statistiques sur les évaluations
print(f"\nStatistiques sur les évaluations:")
print(f"  Min: {y_train.min():.4f}")
print(f"  Max: {y_train.max():.4f}")
print(f"  Moyenne: {y_train.mean():.4f}")
print(f"  Écart-type: {y_train.std():.4f}")

Chargement du dataset (depuis chessData)...
📂 Chargement du dataset depuis /content/drive/MyDrive/smart_chess_drive/chessData.csv...
🧹 Nettoyage : 190154 lignes corrompues supprimées.
✅ 12,767,881 positions valides chargées.

DONNÉES CHARGÉES
Nombre total de positions: 12,767,881
Temps écoulé: 23.8s (0.4 min)

Statistiques sur les évaluations:
  Min: -15.3120
  Max: 15.3190
  Moyenne: 0.0455
  Écart-type: 0.8139
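
The positions loaded above are FEN strings, whereas the model created later in this notebook takes 768 inputs. A common way to obtain 768 features is a one-hot encoding of 12 piece types over 64 squares. The sketch below illustrates that idea only: the project's actual encoder is not shown in this notebook, so the plane order in `PIECES` and the function name `encode_fen` are assumptions.

```python
PIECES = "PNBRQKpnbrqk"  # assumed plane order: white pieces first, then black


def encode_fen(fen):
    """Return a 768-element one-hot list (12 piece planes x 64 squares)."""
    placement = fen.split()[0]  # only the piece-placement field is used here
    features = [0.0] * (len(PIECES) * 64)
    square = 0  # FEN lists ranks from 8 down to 1, files from a to h
    for char in placement:
        if char == "/":
            continue  # rank separator
        if char.isdigit():
            square += int(char)  # run of empty squares
        else:
            features[PIECES.index(char) * 64 + square] = 1.0
            square += 1
    if square != 64:
        raise ValueError(f"malformed FEN placement: {placement!r}")
    return features


start = encode_fen("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
print(len(start), sum(start))  # 768 32.0
```

This encoding ignores side to move, castling rights, and en passant. Whether the project's features include them cannot be told from this notebook.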


In [None]:
import inspect
import ai.NN.train_torch as trainer

try:
    # Get the source code of the load_data function
    source_code = inspect.getsource(trainer.load_data)
    print("Source code of trainer.load_data:")
    print("=" * 60)
    print(source_code)
    print("=" * 60)
except TypeError:
    print("Could not get source code for trainer.load_data. It might not be a function defined in the file.")
except FileNotFoundError:
    print("Could not find the train_torch.py file.")
except Exception as e:
    print(f"An error occurred while trying to get source code: {e}")

Source code of trainer.load_data:
def load_data(filepath: str):
    """Charge le dataset FEN,Evaluation et le nettoie."""
    print(f"📂 Chargement du dataset depuis {filepath}...")
    
    df = pd.read_csv(
        filepath, 
        names=['FEN', 'Evaluation'], 
        skiprows=1,
        comment='#'
    )
    
    initial_count = len(df)
    df.dropna(inplace=True)
    cleaned_count = len(df)
    
    if initial_count > cleaned_count:
        print(f"🧹 Nettoyage : {initial_count - cleaned_count} lignes corrompues supprimées.")
    
    fens = df['FEN'].values
    EVAL_SCALE_FACTOR = 1000.0
    evaluations = (df['Evaluation'].astype(int).values) / EVAL_SCALE_FACTOR
    
    print(f"✅ {len(fens):,} positions valides chargées.")
    return fens, evaluations
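
The listing above combines `skiprows=1`, `comment='#'`, and `dropna()`. With `comment='#'`, pandas ignores the rest of a line from the first `#`, so a row whose evaluation starts with `#` (for example a mate score such as `#+1`) ends up with a missing value and is then dropped. That this explains the removed lines is an assumption about the CSV contents, since the log only reports a count. The plain-Python emulation below mirrors those three steps on made-up sample rows. It is a sketch, not the pandas implementation.

```python
SAMPLE_ROWS = [
    "FEN,Evaluation",
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1,-10",
    "rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq - 0 2,+56",
    "6k1/5ppp/8/8/8/8/5PPP/3R2K1 w - - 0 1,#+1",
]
EVAL_SCALE_FACTOR = 1000.0  # same constant as in load_data


def clean_rows(lines):
    """Emulate skiprows=1, comment='#', dropna() and the final scaling."""
    fens, evaluations, dropped = [], [], 0
    for line in lines[1:]:                # skiprows=1: skip the header row
        line = line.split("#", 1)[0]      # comment='#': ignore the rest of the line
        fen, _, raw = line.partition(",")
        if not fen or not raw.strip():    # dropna(): a missing field drops the row
            dropped += 1
            continue
        fens.append(fen)
        evaluations.append(int(raw) / EVAL_SCALE_FACTOR)
    return fens, evaluations, dropped


fens_demo, evals_demo, dropped_demo = clean_rows(SAMPLE_ROWS)
print(evals_demo, dropped_demo)  # [-0.01, 0.056] 1
```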



In [None]:
import os

file_path = os.path.join(PROJECT_PATH, 'ai/NN/train_torch.py')

# Read the content of the file
with open(file_path, 'r') as f:
    content = f.read()

# Assuming the load_data function signature is currently load_data():
# We need to find the function definition and modify it to accept dataset_path
# This is a simple string replacement and might need adjustment based on the actual code
old_def = 'def load_data():'
new_def = 'def load_data(dataset_path):'
old_data_loading_line = "df = pd.read_csv('C:\\\\Users\\\\gauti\\\\OneDrive\\\\Documents\\\\UE commande\\\\chessData.csv')" # This is a guess, may need adjustment
new_data_loading_line = "df = pd.read_csv(dataset_path)"


if old_def in content and old_data_loading_line in content:
    content = content.replace(old_def, new_def)
    content = content.replace(old_data_loading_line, new_data_loading_line)
    # Write the modified content back to the file
    with open(file_path, 'w') as f:
        f.write(content)
    print(f"Successfully modified {file_path} to accept and use dataset_path in load_data function.")
elif old_def in content:
     print(f"Found function definition '{old_def}', but could not find the specific data loading line '{old_data_loading_line}' to replace.")
     print("Please inspect the `load_data` function in `ai/NN/train_torch.py` and manually update the file path to use the `dataset_path` argument.")
else:
    print(f"Could not find the function definition '{old_def}' in {file_path}. Please inspect the file manually.")

Could not find the function definition 'def load_data():' in /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/NN/train_torch.py. Please inspect the file manually.


## 9. Creating the dataset and dataloader

In [None]:
from torch.utils.data import DataLoader
from ai.NN.train_torch import ChessDataset # Import ChessDataset

# Créer le dataset
dataset = ChessDataset(X_train, y_train)

# Créer le dataloader
train_loader = DataLoader(
    dataset,
    batch_size=CONFIG['batch_size'],
    shuffle=True,
    num_workers=CONFIG['num_workers'],
    pin_memory=(CONFIG['device'] == 'cuda')
)

print("=" * 60)
print("DATALOADER CONFIGURÉ")
print("=" * 60)
print(f"Taille du dataset: {len(dataset):,} échantillons")
print(f"Nombre de batches: {len(train_loader):,}")
print(f"Taille du batch: {CONFIG['batch_size']}")
print(f"Dernière batch: {len(dataset) % CONFIG['batch_size'] or CONFIG['batch_size']} échantillons")
print("=" * 60)

DATALOADER CONFIGURÉ
Taille du dataset: 12,767,881 échantillons
Nombre de batches: 49,875
Taille du batch: 256
Dernière batch: 137 échantillons
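
The batch count and final batch size reported above follow from integer arithmetic. The sketch below reproduces them, assuming the loader keeps the final partial batch (the `DataLoader` default, `drop_last=False`). `batch_layout` is a hypothetical helper name.

```python
def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the final batch), keeping a partial last batch."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    num_batches = -(-num_samples // batch_size)  # ceiling division on integers
    last = num_samples - (num_batches - 1) * batch_size
    return num_batches, last


print(batch_layout(12_767_881, 256))  # (49875, 137)
print(batch_layout(500_000, 128))     # (3907, 32)
```

The second call corresponds to the 3907 steps per epoch shown in the training progress bars, where 500,000 positions are sampled per epoch and the script's `BATCH_SIZE` is 128.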


## 10. Creating the model

In [None]:
# Créer le modèle et le déplacer sur le device approprié
from ai.NN.torch_nn_evaluator import TorchNNEvaluator # Import TorchNNEvaluator from torch_nn_evaluator

model = TorchNNEvaluator().to(CONFIG['device'])

# Afficher l'architecture
print("=" * 60)
print("ARCHITECTURE DU MODÈLE")
print("=" * 60)
print(model)
print("=" * 60)

# Compter les paramètres
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"\nNombre total de paramètres: {total_params:,}")
print(f"Paramètres entraînables: {trainable_params:,}")
print(f"Device: {CONFIG['device']}")

# Estimer la taille mémoire du modèle
param_size_mb = total_params * 4 / (1024 ** 2)  # 4 bytes par float32
print(f"Taille estimée du modèle: {param_size_mb:.2f} MB")

ARCHITECTURE DU MODÈLE
TorchNNEvaluator(
  (l1): Linear(in_features=768, out_features=256, bias=True)
  (l2): Linear(in_features=256, out_features=256, bias=True)
  (l3): Linear(in_features=256, out_features=1, bias=True)
  (dropout1): Dropout(p=0.3, inplace=False)
  (dropout2): Dropout(p=0.3, inplace=False)
  (leaky_relu): LeakyReLU(negative_slope=0.01)
)

Nombre total de paramètres: 262,913
Paramètres entraînables: 262,913
Device: cuda
Taille estimée du modèle: 1.00 MB
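
The parameter count above can be checked by hand: each `Linear` layer has `in_features * out_features` weights plus `out_features` biases. The sketch below does that sum for the 768 → 256 → 256 → 1 stack and, for comparison, for a hidden size of 512 (the value the launch cell later assigns to `trainer.HIDDEN_SIZE`). `mlp_parameter_count` is a hypothetical helper name.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights and biases of a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


small = mlp_parameter_count([768, 256, 256, 1])
wide = mlp_parameter_count([768, 512, 512, 1])
print(small, wide)                       # 262913 656897
print(f"{small * 4 / 1024 ** 2:.2f} MB")  # 1.00 MB (4 bytes per float32)
```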


In [None]:
import os

file_path = os.path.join(PROJECT_PATH, 'ai/NN/torch_nn_evaluator.py')

try:
    with open(file_path, 'r') as f:
        content = f.read()
    print(f"Content of {file_path}:")
    print("=" * 60)
    print(content)
    print("=" * 60)
except FileNotFoundError:
    print(f"Error: File not found at {file_path}")
except Exception as e:
    print(f"An error occurred while reading the file: {e}")

Content of /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/NN/torch_nn_evaluator.py:
import numpy as np
import torch
import torch.nn as nn
from Chess import Chess


class TorchNNEvaluator(nn.Module):
    """PyTorch implementation équivalente du `NeuralNetworkEvaluator` en NumPy.

    - architecture: Linear(input -> hidden) -> LeakyReLU -> Dropout -> Linear(hidden -> hidden) -> LeakyReLU -> Dropout -> Linear(hidden -> out)
    - fournit des helpers pour charger/sauver au format .npz (compatibilité avec l'ancien code NumPy)
    - fournit des helpers pour checkpoint/restore PyTorch (optimizer.state_dict)
    - Support GPU automatique
    """

    def __init__(self, input_size=768, hidden_size=256, output_size=1, dropout=0.3, leaky_alpha=0.01):
        super().__init__()
        self.l1 = nn.Linear(input_size, hidden_size)
        self.l2 = nn.Linear(hidden_size, hidden_size)
        self.l3 = nn.Linear(hidden_size, output_size)
        self.dropout1 = nn.Dropout(p=dropout)
       

## 11. Training the model

Training is run by `trainer.main()`, which is called from the launch cell near the end of this notebook. Checkpoints are saved automatically to your Drive.

In [None]:
# This cell is no longer needed as trainer.main() handles the training loop.
# The training will be started by running cell 9887d4b8.
# You can keep this cell as a placeholder or delete it if you prefer.
# The training history will be available after trainer.main() completes if the script returns it or saves it.

print("The training process is handled by calling trainer.main() in cell 9887d4b8.")
print("Please run cell 9887d4b8 to start the training.")

# Keep the history variable assignment as a placeholder if the script returns it
# history = None # Or whatever trainer.main() might return

The training process is handled by calling trainer.main() in cell 9887d4b8.
Please run cell 9887d4b8 to start the training.


In [None]:
import os

file_path = os.path.join(PROJECT_PATH, 'ai/NN/train_torch.py')

try:
    with open(file_path, 'r') as f:
        content = f.read()

    # Remove the verbose=True argument from ReduceLROnPlateau
    old_scheduler_init = "patience=LR_PATIENCE, verbose=True"
    new_scheduler_init = "patience=LR_PATIENCE" # Remove verbose argument

    if old_scheduler_init in content:
        content = content.replace(old_scheduler_init, new_scheduler_init)
        # Write the modified content back to the file
        with open(file_path, 'w') as f:
            f.write(content)
        print(f"Successfully removed 'verbose=True' from ReduceLROnPlateau in {file_path}.")
    else:
        print(f"'verbose=True' not found in ReduceLROnPlateau initialization in {file_path}. No changes made.")

except FileNotFoundError:
    print(f"Error: File not found at {file_path}")
except Exception as e:
    print(f"An error occurred while trying to modify the file: {e}")

Successfully removed 'verbose=True' from ReduceLROnPlateau in /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/NN/train_torch.py.


In [None]:
# @title
import os

file_path = os.path.join(PROJECT_PATH, 'ai/NN/train_torch.py')

try:
    with open(file_path, 'r') as f:
        content = f.read()
    print(f"Content of {file_path}:")
    print("=" * 60)
    print(content)
    print("=" * 60)
except FileNotFoundError:
    print(f"Error: File not found at {file_path}")
except Exception as e:
    print(f"An error occurred while reading the file: {e}")

Content of /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/NN/train_torch.py:
"""
Script d'entraînement PyTorch optimisé pour GPU
Compatible avec Google Colab et machines locales avec GPU
"""
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm
import os

from Chess import Chess
from ai.NN.torch_nn_evaluator import TorchNNEvaluator, save_weights_npz, load_from_npz, torch_save_checkpoint, torch_load_checkpoint

# --- CONFIGURATION DE L'ENTRAÎNEMENT ---
DATASET_PATH = "C:\\Users\\gauti\\OneDrive\\Documents\\UE commande\\chessData.csv"  # Adapté pour Colab (fichier à la racine)
WEIGHTS_FILE = "chess_nn_weights.npz"
CHECKPOINT_FILE = "chess_model_checkpoint.pt"

# Architecture
HIDDEN_SIZE = 256
DROPOUT = 0.3
LEAKY_ALPHA = 0.01

# Hyperparamètres
LEARNING_RATE = 0.001
WEIGHT_DECAY = 1e-4  # L2 regularization (AdamW)
EPOCHS = 20
BATCH_SIZE = 128  # Plus grand pour G

## 12. Visualizing the results

In [None]:
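import matplotlib.pyplot as plt

# `history` must map metric names to per-epoch lists, e.g. {'loss': [...], 'mae': [...]}.
# trainer.main() runs the training loop, so no earlier cell assigns it.
if 'history' not in globals():
    raise NameError("`history` is not defined: build it from the training metrics before plotting")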

# Configurer le style des graphiques
plt.style.use('seaborn-v0_8-darkgrid')
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Graphique 1: Loss
axes[0].plot(history['loss'], linewidth=2, color='#2E86AB', label='Training Loss')
axes[0].set_xlabel('Époque', fontsize=12)
axes[0].set_ylabel('Loss (MSE)', fontsize=12)
axes[0].set_title('Évolution de la perte pendant l\'entraînement', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=10)
axes[0].grid(True, alpha=0.3)

# Afficher les valeurs min/max
min_loss = min(history['loss'])
max_loss = max(history['loss'])
axes[0].axhline(y=min_loss, color='green', linestyle='--', alpha=0.5, label=f'Min: {min_loss:.6f}')
axes[0].legend(fontsize=10)

# Graphique 2: MAE (si disponible)
if 'mae' in history:
    axes[1].plot(history['mae'], linewidth=2, color='#F77F00', label='MAE')
    axes[1].set_xlabel('Époque', fontsize=12)
    axes[1].set_ylabel('MAE', fontsize=12)
    axes[1].set_title('Erreur absolue moyenne', fontsize=14, fontweight='bold')
    axes[1].legend(fontsize=10)
    axes[1].grid(True, alpha=0.3)

    min_mae = min(history['mae'])
    axes[1].axhline(y=min_mae, color='green', linestyle='--', alpha=0.5, label=f'Min: {min_mae:.6f}')
    axes[1].legend(fontsize=10)
else:
    axes[1].text(0.5, 0.5, 'MAE non disponible',
                ha='center', va='center', fontsize=14, transform=axes[1].transAxes)
    axes[1].set_xticks([])
    axes[1].set_yticks([])

plt.tight_layout()
plt.savefig('training_history.png', dpi=150, bbox_inches='tight')
plt.show()

# Afficher les statistiques finales
print("\n" + "=" * 60)
print("STATISTIQUES FINALES")
print("=" * 60)
print(f"Perte finale: {history['loss'][-1]:.6f}")
print(f"Perte minimale: {min_loss:.6f} (époque {history['loss'].index(min_loss) + 1})")
if 'mae' in history:
    print(f"MAE final: {history['mae'][-1]:.6f}")
    print(f"MAE minimal: {min_mae:.6f} (époque {history['mae'].index(min_mae) + 1})")
print("=" * 60)
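
The plotting cell reads a `history` dictionary that no earlier cell defines, because the training loop lives inside `trainer.main()`. One possible workaround, sketched below, is to rebuild per-epoch metrics from captured training output. The excerpt copies the log format printed by the training run in this notebook. `history_from_log` and the `rmse` / `mae` keys are hypothetical, and the plotting cell expects a `loss` key, so the result would still need adapting.

```python
import re

# Excerpt in the format printed by the training run in this notebook.
LOG_EXCERPT = """
EPOCH 1/20 - Évaluation sur 5,000 positions
  RMSE:        0.6477  (baseline: 0.8621)
  MAE:         0.2583
EPOCH 2/20 - Évaluation sur 5,000 positions
  RMSE:        0.5960  (baseline: 0.8320)
  MAE:         0.2640
"""

# Matches lines such as "  RMSE:        0.6477  (baseline: 0.8621)".
METRIC_LINE = re.compile(r"^\s*(RMSE|MAE):\s+([0-9.]+)", re.MULTILINE)


def history_from_log(text):
    """Collect per-epoch RMSE and MAE values in the order they appear."""
    history = {"rmse": [], "mae": []}
    for name, value in METRIC_LINE.findall(text):
        history[name.lower()].append(float(value))
    return history


history_demo = history_from_log(LOG_EXCERPT)
print(history_demo)
```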

## 13. Saving the final model

In [None]:
import datetime

# Timestamp pour identifier cette sauvegarde
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

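# Save the full model together with the training history.
# Caution: `model` is the instance created in section 10. trainer.main() builds and
# trains its own model, so load the trained weights into `model` before running this cell.
final_model_path = f'ai/chess_model_final_{timestamp}.pt'
torch.save({
    'epoch': CONFIG['epochs'],
    'model_state_dict': model.state_dict(),
    'config': CONFIG,
    'history': globals().get('history'),  # None when no history was recorded
    'timestamp': timestamp,
}, final_model_path)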

print("=" * 60)
print("SAUVEGARDE DES MODÈLES")
print("=" * 60)
print(f"✓ Modèle final: {final_model_path}")

# Sauvegarder aussi au format .npz pour compatibilité avec l'ancien code
weights_path = 'ai/NN/chess_nn_weights.npz'
weights = {name: param.cpu().detach().numpy() for name, param in model.named_parameters()}
np.savez(weights_path, **weights)
print(f"✓ Poids .npz: {weights_path}")

# Copier aussi le checkpoint dans NN/
import shutil
checkpoint_backup = f'ai/NN/chess_model_checkpoint_{timestamp}.pt'
if os.path.exists(CONFIG['checkpoint_path']):
    shutil.copy(CONFIG['checkpoint_path'], checkpoint_backup)
    print(f"✓ Checkpoint backup: {checkpoint_backup}")

print("=" * 60)
print("\n✅ Tous les fichiers sont sauvegardés sur votre Google Drive!")
print(f"   Chemin: {PROJECT_PATH}")

## 14. Testing the model on random positions

In [None]:
# Passer le modèle en mode évaluation
model.eval()

# Tester sur quelques positions aléatoires
num_tests = 10
test_indices = np.random.choice(len(X_train), num_tests, replace=False)

print("=" * 60)
print(f"TEST SUR {num_tests} POSITIONS ALÉATOIRES")
print("=" * 60)

errors = []

with torch.no_grad():
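    for i, idx in enumerate(test_indices, 1):
        # X_train holds FEN strings, so take the encoded features from the dataset instead.
        # Assumption: ChessDataset.__getitem__ returns (features, target).
        features, _ = dataset[int(idx)]
        x = torch.as_tensor(features, dtype=torch.float32).unsqueeze(0).to(CONFIG['device'])
        y_true = y_train[idx]
        y_pred = model(x).cpu().numpy().reshape(-1)[0]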
        error = abs(y_true - y_pred)
        errors.append(error)

        print(f"\nPosition {i}:")
        print(f"  Évaluation réelle:  {y_true:+8.4f}")
        print(f"  Prédiction modèle:  {y_pred:+8.4f}")
        print(f"  Erreur absolue:     {error:8.4f}")

        # Indicateur visuel de la qualité
        if error < 0.1:
            print(f"  Qualité: ✅ Excellente")
        elif error < 0.3:
            print(f"  Qualité: ✓ Bonne")
        elif error < 0.5:
            print(f"  Qualité: ⚠ Moyenne")
        else:
            print(f"  Qualité: ❌ Faible")

print("\n" + "=" * 60)
print("STATISTIQUES DES TESTS")
print("=" * 60)
print(f"Erreur moyenne: {np.mean(errors):.4f}")
print(f"Erreur médiane: {np.median(errors):.4f}")
print(f"Erreur min:     {np.min(errors):.4f}")
print(f"Erreur max:     {np.max(errors):.4f}")
print(f"Écart-type:     {np.std(errors):.4f}")
print("=" * 60)
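
The cell above reports absolute errors only, whereas the training log reports RMSE against a baseline and an improvement percentage. The logged figures are consistent with the baseline being the standard deviation of the targets (the RMSE of always predicting their mean) and the improvement being `1 - RMSE / baseline`, for example `1 - 0.6477 / 0.8621 ≈ 24.9%`. The sketch below computes the same quantities under that reading. `regression_report` is a hypothetical helper, and the use of the population standard deviation is an assumption.

```python
import math
import statistics


def regression_report(targets, predictions):
    """Return RMSE, MAE, a predict-the-mean baseline and the relative improvement."""
    n = len(targets)
    errors = [p - t for p, t in zip(predictions, targets)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    baseline = statistics.pstdev(targets)  # RMSE of always predicting the target mean
    improvement = 100.0 * (1.0 - rmse / baseline)
    return {"rmse": rmse, "mae": mae, "baseline": baseline, "improvement_pct": improvement}


# Made-up values, for illustration only.
report = regression_report([1.0, -1.0, 0.5, -0.5], [0.8, -0.6, 0.5, -0.9])
print({name: round(value, 4) for name, value in report.items()})
```

A negative improvement would mean the model does worse than a constant prediction.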

## 15. Summary and generated files

In [None]:
print("\n" + "="*60)
print("📊 RÉSUMÉ DE L'ENTRAÎNEMENT")
print("="*60)
print(f"\n📍 Projet: {PROJECT_PATH}")
print(f"\n⚙️ Configuration:")
print(f"   • Parties générées: {CONFIG['num_games']:,}")
print(f"   • Positions d'entraînement: {len(X_train):,}")
print(f"   • Époques: {CONFIG['epochs']}")
print(f"   • Batch size: {CONFIG['batch_size']}")
print(f"   • Learning rate: {CONFIG['learning_rate']}")
print(f"   • Device: {CONFIG['device']}")

print(f"\n📈 Résultats:")
print(f"   • Perte finale: {history['loss'][-1]:.6f}")
print(f"   • Perte minimale: {min(history['loss']):.6f}")
if 'mae' in history:
    print(f"   • MAE final: {history['mae'][-1]:.6f}")

print(f"\n💾 Fichiers sauvegardés sur Drive:")
files_to_check = [
    final_model_path,
    CONFIG['checkpoint_path'],
    weights_path,
    'training_history.png'
]

for filepath in files_to_check:
    if os.path.exists(filepath):
        size = os.path.getsize(filepath) / (1024 * 1024)  # Convertir en MB
        print(f"   ✓ {filepath} ({size:.2f} MB)")
    else:
        print(f"   ✗ {filepath} (non trouvé)")

print("\n" + "="*60)
print("✅ ENTRAÎNEMENT TERMINÉ AVEC SUCCÈS!")
print("="*60)
print("\nTous les fichiers sont automatiquement synchronisés avec votre Google Drive.")
print("Vous pouvez fermer ce notebook en toute sécurité.\n")

In [None]:
# Localiser le dataset sur Google Drive et préparer le dossier de checkpoints
import os
from glob import glob

# Chemin attendu du dossier contenant le dataset (donné par l'user)
# Updated based on user's feedback that the file is directly in smart_chess_drive
DATASET_DIR = '/content/drive/MyDrive/smart_chess_drive/'

# Chercher un fichier .csv dans DATASET_DIR
DATASET_CSV = None
if os.path.exists(DATASET_DIR):
    csvs = glob(os.path.join(DATASET_DIR, '*.csv'))
    if len(csvs) > 0:
        # Several CSVs may match and glob order is not guaranteed: sort for a reproducible choice
        DATASET_CSV = sorted(csvs)[0]
        print(f'✅ Dataset CSV trouvé: {DATASET_CSV}')
    else:
        print(f'❌ Aucun fichier .csv trouvé dans {DATASET_DIR}. Placez votre fichier chessData.csv dans ce dossier.')
else:
    print(f'❌ Dossier dataset introuvable: {DATASET_DIR}. Vérifiez le chemin sur votre Drive.')

# Créer un dossier de checkpoints dans le repo sur Drive (persistant)
CKPT_DIR = '/content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints'
os.makedirs(CKPT_DIR, exist_ok=True)
print('Dossier de checkpoints (créé si manquant):', CKPT_DIR)

# Exposer variables utiles
print('\nVariables exposées:')
print(' DATASET_CSV =', DATASET_CSV)
print(' CKPT_DIR =', CKPT_DIR)

✅ Dataset CSV trouvé: /content/drive/MyDrive/smart_chess_drive/chessData.csv
Dossier de checkpoints (créé si manquant): /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints

Variables exposées:
 DATASET_CSV = /content/drive/MyDrive/smart_chess_drive/chessData.csv
 CKPT_DIR = /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints
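
The cell above takes whichever CSV the glob returns first, which becomes ambiguous once the folder holds more than one CSV file. A more explicit selection could prefer the expected file name and fall back to a deterministic choice, as sketched below. `pick_dataset` is a hypothetical helper, and the demonstration runs in a temporary directory rather than on Drive.

```python
import os
import tempfile
from glob import glob


def pick_dataset(directory, preferred_name="chessData.csv"):
    """Return the preferred CSV if present, else the alphabetically first CSV, else None."""
    preferred = os.path.join(directory, preferred_name)
    if os.path.isfile(preferred):
        return preferred
    candidates = sorted(glob(os.path.join(directory, "*.csv")))
    return candidates[0] if candidates else None


# Demonstration in a throwaway directory containing several CSV files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b_other.csv", "chessData.csv", "a_notes.csv"):
        open(os.path.join(tmp, name), "w").close()
    chosen = os.path.basename(pick_dataset(tmp))

print(chosen)  # chessData.csv
```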


In [None]:
# Configurer et lancer le script d'entraînement `ai.NN.train_torch` en adaptant les chemins pour Colab/Drive
import os
import importlib

if DATASET_CSV is None:
    raise FileNotFoundError(f"Dataset non trouvé dans: {DATASET_DIR}")

# Importer le module d'entraînement
import ai.NN.train_torch as trainer

# Reload the module to pick up recent changes
importlib.reload(trainer)


# Rediriger les chemins dataset et checkpoints vers Drive
trainer.DATASET_PATH = DATASET_CSV
trainer.CHECKPOINT_FILE = os.path.join(CKPT_DIR, os.path.basename(trainer.CHECKPOINT_FILE))
trainer.WEIGHTS_FILE = os.path.join(CKPT_DIR, os.path.basename(trainer.WEIGHTS_FILE))

# Harmonisation des tailles de batch (Option B):
# On force le module trainer à utiliser la valeur définie dans CONFIG['batch_size']
try:
    trainer.BATCH_SIZE = CONFIG['batch_size']
    print(f'✅ Harmonisation: trainer.BATCH_SIZE = {trainer.BATCH_SIZE}')
except Exception as e:
    print('⚠️ Impossible de définir trainer.BATCH_SIZE:', e)

# --- Apply user-requested global hyperparameter changes ---
# Increase hidden layer size to 512 and decrease per-epoch samples to 50k
try:
    trainer.HIDDEN_SIZE = 512
    trainer.MAX_SAMPLES = 50_000
    print(f"✅ Applied global changes: trainer.HIDDEN_SIZE={trainer.HIDDEN_SIZE}, trainer.MAX_SAMPLES={trainer.MAX_SAMPLES}")
except Exception as e:
    print('⚠️ Impossible de définir trainer.HIDDEN_SIZE / trainer.MAX_SAMPLES:', e)

# Optionally set other CONFIG parameters from the notebook if needed
# trainer.EPOCHS = CONFIG['epochs']
# trainer.LEARNING_RATE = CONFIG['learning_rate']
# trainer.DEVICE = CONFIG['device']

# Optionnel: réduire pour test rapide (décommentez si besoin)
# trainer.EPOCHS = 2
# trainer.MAX_SAMPLES = 5000

print('Configuration trainer:')
print(' DATASET_PATH=', trainer.DATASET_PATH)
print(' CHECKPOINT_FILE=', trainer.CHECKPOINT_FILE)
print(' WEIGHTS_FILE=', trainer.WEIGHTS_FILE)
print(' EPOCHS=', trainer.EPOCHS)
print(' MAX_SAMPLES=', trainer.MAX_SAMPLES)


# Lancer l'entraînement
trainer.main()


🖥️  Device: cuda
🚀 GPU: Tesla T4
💾 GPU Memory: 15.83 GB
Configuration trainer:
 DATASET_PATH= /content/drive/MyDrive/smart_chess_drive/chessData.csv
 CHECKPOINT_FILE= /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_model_checkpoint.pt
 WEIGHTS_FILE= /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_nn_weights.npz
 EPOCHS= 20
 MAX_SAMPLES= 500000
📂 Chargement du dataset depuis /content/drive/MyDrive/smart_chess_drive/chessData.csv...
🧹 Nettoyage : 190154 lignes corrompues supprimées.
✅ 12,767,881 positions valides chargées.

📊 Dataset complet: 12,767,881 positions
📥 Chargement du checkpoint PyTorch: /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_model_checkpoint.pt
✅ Checkpoint chargé (step 20)

Configuration:
  Dataset complet: 12,767,881 positions
  Échantillon/epoch: 500,000 positions
  Architecture: 768 → 256 → 256 → 1
  Dropout: 0.3
  LeakyReLU alpha: 0.01
  Learning rate: 0.001 (AdamW, weight decay: 0.000

Epoch 1/20:   0%|          | 8/3907 [00:00<00:54, 71.93it/s, loss=0.5618]


[DEBUG batch 0] targets mean=0.0927 std=0.5291; preds mean=0.0462 std=0.3324; RMSE=0.4315; corr=0.5868


Epoch 1/20: 100%|██████████| 3907/3907 [00:43<00:00, 89.11it/s, loss=0.6169]



🔍 Évaluation epoch 1...

EPOCH 1/20 - Évaluation sur 5,000 positions
  RMSE:        0.6477  (baseline: 0.8621)
  MAE:         0.2583
  Amélioration: +24.9% vs baseline
  Corrélation: 0.6779
  Std preds:   0.4511  (cible: 0.8621)
  Mean preds:  0.0390  (cible: 0.0280)
  →  Apprentissage en cours

💾 Nouveau meilleur RMSE: 0.6477 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_model_checkpoint.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_nn_weights.npz

[Epoch 2] 🎲 Échantillonnage: 500,000 positions sur 12,767,881
🔥 Warmup epoch 2/3: LR = 0.000700


Epoch 2/20: 100%|██████████| 3907/3907 [00:41<00:00, 94.00it/s, loss=0.6237]



🔍 Évaluation epoch 2...

EPOCH 2/20 - Évaluation sur 5,000 positions
  RMSE:        0.5960  (baseline: 0.8320)
  MAE:         0.2640
  Amélioration: +28.4% vs baseline
  Corrélation: 0.7008
  Std preds:   0.5345  (cible: 0.8320)
  Mean preds:  -0.0043  (cible: 0.0202)
  →  Apprentissage en cours

💾 Nouveau meilleur RMSE: 0.5960 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_model_checkpoint.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_nn_weights.npz

[Epoch 3] 🎲 Échantillonnage: 500,000 positions sur 12,767,881
🔥 Warmup epoch 3/3: LR = 0.001000


Epoch 3/20: 100%|██████████| 3907/3907 [00:42<00:00, 90.88it/s, loss=0.6300]



🔍 Évaluation epoch 3...

EPOCH 3/20 - Évaluation sur 5,000 positions
  RMSE:        0.6007  (baseline: 0.7926)
  MAE:         0.2528
  Amélioration: +24.2% vs baseline
  Corrélation: 0.6530
  Std preds:   0.5032  (cible: 0.7926)
  Mean preds:  0.0393  (cible: 0.0213)
  →  Apprentissage en cours


[Epoch 4] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 4/20: 100%|██████████| 3907/3907 [00:43<00:00, 90.32it/s, loss=0.6367]



🔍 Évaluation epoch 4...

EPOCH 4/20 - Évaluation sur 5,000 positions
  RMSE:        0.6376  (baseline: 0.8382)
  MAE:         0.2635
  Amélioration: +23.9% vs baseline
  Corrélation: 0.6627
  Std preds:   0.4455  (cible: 0.8382)
  Mean preds:  0.0345  (cible: 0.0542)
  →  Apprentissage en cours


[Epoch 5] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 5/20: 100%|██████████| 3907/3907 [00:41<00:00, 93.25it/s, loss=0.6403]



🔍 Évaluation epoch 5...

EPOCH 5/20 - Évaluation sur 5,000 positions
  RMSE:        0.5827  (baseline: 0.7798)
  MAE:         0.2463
  Amélioration: +25.3% vs baseline
  Corrélation: 0.6693
  Std preds:   0.4626  (cible: 0.7798)
  Mean preds:  0.0355  (cible: 0.0517)
  →  Apprentissage en cours

💾 Nouveau meilleur RMSE: 0.5827 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_model_checkpoint.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_nn_weights.npz

[Epoch 6] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 6/20: 100%|██████████| 3907/3907 [00:42<00:00, 91.13it/s, loss=0.6292]



🔍 Évaluation epoch 6...

EPOCH 6/20 - Évaluation sur 5,000 positions
  RMSE:        0.6059  (baseline: 0.8022)
  MAE:         0.2631
  Amélioration: +24.5% vs baseline
  Corrélation: 0.6633
  Std preds:   0.4507  (cible: 0.8022)
  Mean preds:  0.0673  (cible: 0.0569)
  →  Apprentissage en cours


[Epoch 7] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 7/20: 100%|██████████| 3907/3907 [00:41<00:00, 93.31it/s, loss=0.6340]



🔍 Évaluation epoch 7...

EPOCH 7/20 - Évaluation sur 5,000 positions
  RMSE:        0.5934  (baseline: 0.7871)
  MAE:         0.2517
  Amélioration: +24.6% vs baseline
  Corrélation: 0.6584
  Std preds:   0.4910  (cible: 0.7871)
  Mean preds:  0.0575  (cible: 0.0384)
  →  Apprentissage en cours


[Epoch 8] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 8/20: 100%|██████████| 3907/3907 [00:42<00:00, 91.16it/s, loss=0.6330]



🔍 Évaluation epoch 8...

EPOCH 8/20 - Évaluation sur 5,000 positions
  RMSE:        0.6583  (baseline: 0.8617)
  MAE:         0.2633
  Amélioration: +23.6% vs baseline
  Corrélation: 0.6507
  Std preds:   0.4895  (cible: 0.8617)
  Mean preds:  0.0381  (cible: 0.0521)
  →  Apprentissage en cours


[Epoch 9] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 9/20: 100%|██████████| 3907/3907 [00:43<00:00, 90.09it/s, loss=0.6270]



🔍 Évaluation epoch 9...

EPOCH 9/20 - Évaluation sur 5,000 positions
  RMSE:        0.6124  (baseline: 0.8446)
  MAE:         0.2572
  Amélioration: +27.5% vs baseline
  Corrélation: 0.6981
  Std preds:   0.4930  (cible: 0.8446)
  Mean preds:  0.0379  (cible: 0.0360)
  →  Apprentissage en cours


[Epoch 10] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 10/20: 100%|██████████| 3907/3907 [00:43<00:00, 89.71it/s, loss=0.6263]



🔍 Évaluation epoch 10...

EPOCH 10/20 - Évaluation sur 5,000 positions
  RMSE:        0.6193  (baseline: 0.8211)
  MAE:         0.2538
  Amélioration: +24.6% vs baseline
  Corrélation: 0.6623
  Std preds:   0.4735  (cible: 0.8211)
  Mean preds:  0.0590  (cible: 0.0464)
  →  Apprentissage en cours


[Epoch 11] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 11/20: 100%|██████████| 3907/3907 [00:43<00:00, 89.91it/s, loss=0.6248]



🔍 Évaluation epoch 11...

EPOCH 11/20 - Évaluation sur 5,000 positions
  RMSE:        0.5758  (baseline: 0.7928)
  MAE:         0.2418
  Amélioration: +27.4% vs baseline
  Corrélation: 0.7046
  Std preds:   0.4357  (cible: 0.7928)
  Mean preds:  0.0392  (cible: 0.0370)
  →  Apprentissage en cours

💾 Nouveau meilleur RMSE: 0.5758 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_model_checkpoint.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_nn_weights.npz

[Epoch 12] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 12/20: 100%|██████████| 3907/3907 [00:43<00:00, 90.06it/s, loss=0.6223]



🔍 Évaluation epoch 12...

EPOCH 12/20 - Évaluation sur 5,000 positions
  RMSE:        0.6220  (baseline: 0.8393)
  MAE:         0.2541
  Amélioration: +25.9% vs baseline
  Corrélation: 0.6743
  Std preds:   0.5152  (cible: 0.8393)
  Mean preds:  0.0380  (cible: 0.0505)
  →  Apprentissage en cours


[Epoch 13] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 13/20: 100%|██████████| 3907/3907 [00:43<00:00, 89.97it/s, loss=0.6241]



🔍 Évaluation epoch 13...

EPOCH 13/20 - Évaluation sur 5,000 positions
  RMSE:        0.6435  (baseline: 0.8653)
  MAE:         0.2567
  Amélioration: +25.6% vs baseline
  Corrélation: 0.6920
  Std preds:   0.4446  (cible: 0.8653)
  Mean preds:  0.0425  (cible: 0.0366)
  →  Apprentissage en cours


[Epoch 14] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 14/20: 100%|██████████| 3907/3907 [00:44<00:00, 88.35it/s, loss=0.6218]



🔍 Évaluation epoch 14...

EPOCH 14/20 - Évaluation sur 5,000 positions
  RMSE:        0.5362  (baseline: 0.7461)
  MAE:         0.2336
  Amélioration: +28.1% vs baseline
  Corrélation: 0.6971
  Std preds:   0.4835  (cible: 0.7461)
  Mean preds:  0.0610  (cible: 0.0503)
  →  Apprentissage en cours

💾 Nouveau meilleur RMSE: 0.5362 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_model_checkpoint.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_nn_weights.npz

[Epoch 15] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 15/20: 100%|██████████| 3907/3907 [00:43<00:00, 90.20it/s, loss=0.6187]



🔍 Évaluation epoch 15...

EPOCH 15/20 - Évaluation sur 5,000 positions
  RMSE:        0.5871  (baseline: 0.7916)
  MAE:         0.2466
  Amélioration: +25.8% vs baseline
  Corrélation: 0.6729
  Std preds:   0.4928  (cible: 0.7916)
  Mean preds:  0.0359  (cible: 0.0518)
  →  Apprentissage en cours


[Epoch 16] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 16/20: 100%|██████████| 3907/3907 [00:43<00:00, 89.59it/s, loss=0.6172]



🔍 Évaluation epoch 16...

EPOCH 16/20 - Évaluation sur 5,000 positions
  RMSE:        0.5899  (baseline: 0.8220)
  MAE:         0.2468
  Amélioration: +28.2% vs baseline
  Corrélation: 0.6976
  Std preds:   0.5419  (cible: 0.8220)
  Mean preds:  0.0371  (cible: 0.0472)
  →  Apprentissage en cours


[Epoch 17] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 17/20: 100%|██████████| 3907/3907 [00:43<00:00, 89.26it/s, loss=0.6182]



🔍 Évaluation epoch 17...

EPOCH 17/20 - Évaluation sur 5,000 positions
  RMSE:        0.5967  (baseline: 0.8040)
  MAE:         0.2532
  Amélioration: +25.8% vs baseline
  Corrélation: 0.6725
  Std preds:   0.4964  (cible: 0.8040)
  Mean preds:  0.0312  (cible: 0.0389)
  →  Apprentissage en cours


[Epoch 18] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 18/20: 100%|██████████| 3907/3907 [00:43<00:00, 90.55it/s, loss=0.6159]



🔍 Évaluation epoch 18...

EPOCH 18/20 - Évaluation sur 5,000 positions
  RMSE:        0.6148  (baseline: 0.8624)
  MAE:         0.2557
  Amélioration: +28.7% vs baseline
  Corrélation: 0.7115
  Std preds:   0.5101  (cible: 0.8624)
  Mean preds:  0.0517  (cible: 0.0532)
  →  Apprentissage en cours


[Epoch 19] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 19/20: 100%|██████████| 3907/3907 [00:43<00:00, 90.14it/s, loss=0.6082]



🔍 Évaluation epoch 19...

EPOCH 19/20 - Évaluation sur 5,000 positions
  RMSE:        0.5778  (baseline: 0.8123)
  MAE:         0.2394
  Amélioration: +28.9% vs baseline
  Corrélation: 0.7165
  Std preds:   0.4690  (cible: 0.8123)
  Mean preds:  0.0319  (cible: 0.0296)
  →  Apprentissage en cours


[Epoch 20] 🎲 Échantillonnage: 500,000 positions sur 12,767,881


Epoch 20/20: 100%|██████████| 3907/3907 [00:43<00:00, 89.44it/s, loss=0.6116]



🔍 Évaluation epoch 20...

EPOCH 20/20 - Évaluation sur 5,000 positions
  RMSE:        0.5785  (baseline: 0.7720)
  MAE:         0.2433
  Amélioration: +25.1% vs baseline
  Corrélation: 0.6639
  Std preds:   0.4806  (cible: 0.7720)
  Mean preds:  0.0381  (cible: 0.0565)
  →  Apprentissage en cours


🎉 Entraînement terminé!
📊 Meilleur RMSE: 0.5362

💾 Sauvegarde finale...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_model_checkpoint.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_nn_weights.npz
✅ Modèle sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_model_checkpoint.pt et /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/chess_nn_weights.npz
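
In every evaluation block above, the reported baseline equals the target standard deviation (the `cible` value on the `Std preds` line). That is the RMSE of a model that always predicts the target mean. The improvement percentage matches `(baseline - RMSE) / baseline`. The baseline also changes from epoch to epoch, which suggests the 5,000 evaluation positions are redrawn each time. The best RMSE (0.5362 at epoch 14) coincides with the lowest baseline (0.7461), so the improvement ratio is probably a steadier signal than the raw RMSE. The sketch below recomputes these quantities with the standard library only. The names `improvement_pct` and `regression_report` and the toy data are illustrative assumptions, not code from `train_torch.py`.

```python
import math
import statistics


def improvement_pct(baseline: float, rmse: float) -> float:
    """Relative RMSE reduction versus the baseline, in percent."""
    return 100.0 * (baseline - rmse) / baseline


def regression_report(targets, preds):
    """Compute RMSE, MAE, mean-predictor baseline, improvement and Pearson correlation."""
    n = len(targets)
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / n)
    mae = sum(abs(p - t) for p, t in zip(preds, targets)) / n
    # A constant predictor that outputs the target mean has an RMSE equal to
    # the population standard deviation of the targets.
    baseline = statistics.pstdev(targets)
    # Pearson correlation, written out so it runs on any Python 3 version.
    mt, mp = statistics.fmean(targets), statistics.fmean(preds)
    cov = sum((t - mt) * (p - mp) for t, p in zip(targets, preds))
    var_t = sum((t - mt) ** 2 for t in targets)
    var_p = sum((p - mp) ** 2 for p in preds)
    corr = cov / math.sqrt(var_t * var_p)
    return {
        "rmse": rmse,
        "mae": mae,
        "baseline": baseline,
        "improvement_pct": improvement_pct(baseline, rmse),
        "corr": corr,
    }


# Check against two lines of the log above (epoch 1 and epoch 14).
print(f"{improvement_pct(0.8621, 0.6477):.1f}")  # 24.9
print(f"{improvement_pct(0.7461, 0.5362):.1f}")  # 28.1

# Toy data, for illustration only.
print(regression_report([-1.0, -0.5, 0.0, 0.5, 1.0], [-0.6, -0.4, 0.1, 0.3, 0.7]))
```

Evaluating every epoch on one fixed set of positions would make the per-epoch RMSE values directly comparable.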


In [None]:
# @title Show the source of ai/NN/train_torch.py
import os

file_path = os.path.join(PROJECT_PATH, 'ai/NN/train_torch.py')

try:
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    print(f"Content of {file_path}:")
    print("=" * 60)
    print(content)
    print("=" * 60)
except FileNotFoundError:
    print(f"Error: File not found at {file_path}")
except Exception as e:
    print(f"An error occurred while reading the file: {e}")

Content of /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/NN/train_torch.py:
"""
PyTorch training script optimized for GPU
Compatible with Google Colab and local machines with a GPU
"""
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm
import os

from Chess import Chess
from ai.NN.torch_nn_evaluator import TorchNNEvaluator, save_weights_npz, load_from_npz, torch_save_checkpoint, torch_load_checkpoint

# --- TRAINING CONFIGURATION ---
DATASET_PATH = "C:\\Users\\gauti\\OneDrive\\Documents\\UE commande\\chessData.csv"  # Local Windows path; must be overridden on Colab
WEIGHTS_FILE = "chess_nn_weights.npz"
CHECKPOINT_FILE = "chess_model_checkpoint.pt"

# Architecture
HIDDEN_SIZE = 256
DROPOUT = 0.3
LEAKY_ALPHA = 0.01

# Hyperparameters
LEARNING_RATE = 0.001
WEIGHT_DECAY = 1e-4  # Decoupled weight decay (AdamW), not a plain L2 penalty
EPOCHS = 20
BATCH_SIZE = 128  # Plus grand pour G
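
The progress-bar totals in this notebook's training logs follow from `BATCH_SIZE = 128`: the number of batches per epoch is the per-epoch sample size divided by the batch size, rounded up. The sketch below checks that arithmetic for the three sample sizes that appear in the logs. It also counts the weights of a 768 → 256 → 256 → 1 network, assuming plain fully connected layers with bias terms. The layer definition lives in `torch_nn_evaluator.py` and is not shown here, so that count is an assumption. Both helper names are illustrative.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)


def dense_param_count(layer_sizes) -> int:
    """Weights plus biases of a stack of fully connected layers (assumed layout)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


print(batches_per_epoch(500_000, 128))  # 3907, as in the 20-epoch run
print(batches_per_epoch(100_000, 128))  # 782, as in the smoke tests
print(batches_per_epoch(200_000, 128))  # 1563, as in the extended smoke tests
print(dense_param_count([768, 256, 256, 1]))  # 262913
```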

In [None]:
# Automated smoke tests: 3 short runs to compare configurations
# - Builds a fixed validation set and launches 3 short experiments (EPOCHS=3, MAX_SAMPLES=50k)
# - Saves separate checkpoints and evaluates each model on the fixed validation set

import time
import os
import numpy as np
from torch.utils.data import DataLoader

print('Lancement des smoke tests (rapides).\n')

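# Both names are used below and must have been defined by earlier cells
if 'trainer' not in globals() or 'CKPT_DIR' not in globals():
    raise RuntimeError('trainer (ai.NN.train_torch) and CKPT_DIR must be defined before running this cell.')

# Make sure the dataset is in memory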
if 'X_train' not in globals() or 'y_train' not in globals():
    print('X_train/y_train non trouvés en mémoire — chargement léger depuis trainer.DATASET_PATH (peut prendre du temps)...')
    fens, evaluations = trainer.load_data(trainer.DATASET_PATH)
    X_train = fens
    y_train = evaluations

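# Build a fixed validation set (deterministic seed).
# Note: these positions are drawn from X_train and are not removed from it,
# so trainer.main() may also sample them for training; this is not a strict hold-out.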
val_size = min(5000, len(X_train))
rs = np.random.RandomState(42)
val_idx = rs.choice(len(X_train), size=val_size, replace=False)
val_fens = X_train[val_idx]
val_targets = y_train[val_idx]
print(f'Validation fixe : {val_size} positions (seed=42)')

# Save the original settings so they can be restored after each run
orig_keys = ['HIDDEN_SIZE','DROPOUT','LEARNING_RATE','WEIGHT_DECAY','BATCH_SIZE','EPOCHS','MAX_SAMPLES','CHECKPOINT_FILE','WEIGHTS_FILE','EVAL_MAX_SAMPLES']
orig = {k: getattr(trainer, k) for k in orig_keys if hasattr(trainer, k)}

# Experiments to run (overrides applied on top of orig)
experiments = [
    {'name': 'baseline', 'HIDDEN_SIZE': orig.get('HIDDEN_SIZE', 256), 'DROPOUT': orig.get('DROPOUT', 0.3), 'LEARNING_RATE': orig.get('LEARNING_RATE', 0.001)},
    {'name': 'bigger', 'HIDDEN_SIZE': 512, 'DROPOUT': 0.2, 'LEARNING_RATE': 5e-4},
    {'name': 'smaller_lr', 'HIDDEN_SIZE': 512, 'DROPOUT': 0.2, 'LEARNING_RATE': 1e-4},
]

results = []

for exp in experiments:
    print('\n' + '='*80)
    print(f"Exp: {exp['name']}")
    print('='*80)

    # Set quick test params
    trainer.EPOCHS = 3
    trainer.MAX_SAMPLES = 50_000
    trainer.EVAL_MAX_SAMPLES = 2000

    # Apply experiment overrides
    trainer.HIDDEN_SIZE = exp['HIDDEN_SIZE']
    trainer.DROPOUT = exp['DROPOUT']
    trainer.LEARNING_RATE = exp['LEARNING_RATE']

    # Use separate checkpoint/weights files to avoid overwriting
    ckpt_path = os.path.join(CKPT_DIR, f"smoke_{exp['name']}.pt")
    weights_path = os.path.join(CKPT_DIR, f"smoke_{exp['name']}.npz")
    trainer.CHECKPOINT_FILE = ckpt_path
    trainer.WEIGHTS_FILE = weights_path

    print('Parameters:')
    print(f" HIDDEN_SIZE={trainer.HIDDEN_SIZE}, DROPOUT={trainer.DROPOUT}, LR={trainer.LEARNING_RATE}")
    print(f" EPOCHS={trainer.EPOCHS}, MAX_SAMPLES={trainer.MAX_SAMPLES}")
    print(f" CHECKPOINT -> {trainer.CHECKPOINT_FILE}")

    # Run training (blocking)
    t0 = time.time()
    try:
        trainer.main()
    except Exception as e:
        print('Erreur pendant trainer.main():', e)
        # continue to evaluation attempt (if checkpoint exists)
    t1 = time.time()
    print(f"Run time: {t1-t0:.1f}s")

    # Load model from checkpoint
    try:
        model = trainer.TorchNNEvaluator(hidden_size=trainer.HIDDEN_SIZE, dropout=trainer.DROPOUT, leaky_alpha=trainer.LEAKY_ALPHA)
        optimizer = trainer.optim.AdamW(model.parameters(), lr=trainer.LEARNING_RATE, weight_decay=trainer.WEIGHT_DECAY)
        model, optim_state, step = trainer.torch_load_checkpoint(trainer.CHECKPOINT_FILE, model, optimizer, device=trainer.DEVICE)

        # Evaluation on fixed val set
        eval_dataset = trainer.ChessDataset(val_fens, val_targets)
        eval_loader = DataLoader(eval_dataset, batch_size=max(1, trainer.BATCH_SIZE//2), shuffle=False)
        rmse, mae, corr, preds, targets = trainer.evaluate_model(model, eval_loader, trainer.DEVICE)
        print(f"Eval results — RMSE: {rmse:.4f}, MAE: {mae:.4f}, Corr: {corr:.4f}")
        results.append({'exp': exp['name'], 'rmse': rmse, 'mae': mae, 'corr': corr})
    except FileNotFoundError:
        print('Checkpoint not found, skipping evaluation for this experiment.')
        results.append({'exp': exp['name'], 'rmse': None, 'mae': None, 'corr': None})
    except Exception as e:
        print('Erreur lors de l\'évaluation:', e)
        results.append({'exp': exp['name'], 'rmse': None, 'mae': None, 'corr': None})

    # Restore original trainer settings
    for k, v in orig.items():
        setattr(trainer, k, v)

print('\n' + '='*80)
print('Résumé des smoke tests:')
for r in results:
    print(r)
print('='*80)

# End of smoke tests


Lancement des smoke tests (rapides).

Validation fixe : 5000 positions (seed=42)

Exp: baseline
Parameters:
 HIDDEN_SIZE=256, DROPOUT=0.3, LR=0.001
 EPOCHS=3, MAX_SAMPLES=100000
 CHECKPOINT -> /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_baseline.pt
📂 Chargement du dataset depuis /content/drive/MyDrive/smart_chess_drive/chessData.csv...
🧹 Nettoyage : 190154 lignes corrompues supprimées.
✅ 12,767,881 positions valides chargées.

📊 Dataset complet: 12,767,881 positions
🆕 Création d'un nouveau réseau...

Configuration:
  Dataset complet: 12,767,881 positions
  Échantillon/epoch: 100,000 positions
  Architecture: 768 → 256 → 256 → 1
  Dropout: 0.3
  LeakyReLU alpha: 0.01
  Learning rate: 0.001 (AdamW, weight decay: 0.0001)
  LR Warmup: True (0.0001 → 0.001)
  LR Scheduler: True (patience: 2)
  Batch size: 128
  Epochs: 3
  Device: cuda


[Epoch 1] 🎲 Échantillonnage: 100,000 positions sur 12,767,881
🔥 Warmup epoch 1/3: LR = 0.000400


Epoch 1/3:   1%|▏         | 10/782 [00:00<00:07, 97.04it/s, loss=0.8523]


[DEBUG batch 0] targets mean=-0.0954 std=0.8398; preds mean=0.0197 std=0.0206; RMSE=0.8495; corr=-0.0816


Epoch 1/3: 100%|██████████| 782/782 [00:08<00:00, 93.69it/s, loss=0.7908]



🔍 Évaluation epoch 1...

EPOCH 1/3 - Évaluation sur 2,000 positions
  RMSE:        0.7484  (baseline: 0.8009)
  MAE:         0.3023
  Amélioration: +6.5% vs baseline
  Corrélation: 0.3596
  Std preds:   0.2819  (cible: 0.8009)
  Mean preds:  0.0008  (cible: 0.0412)
  ⚠  Faible amélioration - vérifier hyperparamètres

💾 Nouveau meilleur RMSE: 0.7484 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_baseline.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_baseline.npz

[Epoch 2] 🎲 Échantillonnage: 100,000 positions sur 12,767,881
🔥 Warmup epoch 2/3: LR = 0.000700


Epoch 2/3: 100%|██████████| 782/782 [00:08<00:00, 95.37it/s, loss=0.7620]



🔍 Évaluation epoch 2...

EPOCH 2/3 - Évaluation sur 2,000 positions
  RMSE:        0.7313  (baseline: 0.8047)
  MAE:         0.3036
  Amélioration: +9.1% vs baseline
  Corrélation: 0.4233
  Std preds:   0.2952  (cible: 0.8047)
  Mean preds:  0.0497  (cible: 0.0156)
  ⚠  Faible amélioration - vérifier hyperparamètres

💾 Nouveau meilleur RMSE: 0.7313 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_baseline.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_baseline.npz

[Epoch 3] 🎲 Échantillonnage: 100,000 positions sur 12,767,881
🔥 Warmup epoch 3/3: LR = 0.001000


Epoch 3/3: 100%|██████████| 782/782 [00:08<00:00, 94.23it/s, loss=0.7419]



🔍 Évaluation epoch 3...

EPOCH 3/3 - Évaluation sur 2,000 positions
  RMSE:        0.7975  (baseline: 0.9379)
  MAE:         0.3361
  Amélioration: +15.0% vs baseline
  Corrélation: 0.5428
  Std preds:   0.3849  (cible: 0.9379)
  Mean preds:  0.0718  (cible: 0.0605)
  →  Apprentissage en cours


🎉 Entraînement terminé!
📊 Meilleur RMSE: 0.7313

💾 Sauvegarde finale...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_baseline.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_baseline.npz
✅ Modèle sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_baseline.pt et /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_baseline.npz
Run time: 45.5s
Eval results — RMSE: 0.7478, MAE: 0.2981, Corr: 0.4215

Exp: bigger
Parameters:
 HIDDEN_SIZE=512, DROPOUT=0.2, LR=0.0005
 EPOCHS=3, MAX_SAMPLES=100000
 CHECKPOINT -> /content/d

Epoch 1/3:   1%|▏         | 10/782 [00:00<00:08, 94.63it/s, loss=0.8632]


[DEBUG batch 0] targets mean=0.1051 std=0.3792; preds mean=0.0599 std=0.0168; RMSE=0.3844; corr=-0.1255


Epoch 1/3: 100%|██████████| 782/782 [00:08<00:00, 95.84it/s, loss=0.7811]



🔍 Évaluation epoch 1...

EPOCH 1/3 - Évaluation sur 2,000 positions
  RMSE:        0.7916  (baseline: 0.8562)
  MAE:         0.3106
  Amélioration: +7.5% vs baseline
  Corrélation: 0.3909
  Std preds:   0.2610  (cible: 0.8562)
  Mean preds:  0.0385  (cible: 0.0525)
  ⚠  Faible amélioration - vérifier hyperparamètres

💾 Nouveau meilleur RMSE: 0.7916 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_bigger.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_bigger.npz

[Epoch 2] 🎲 Échantillonnage: 100,000 positions sur 12,767,881
🔥 Warmup epoch 2/3: LR = 0.000367


Epoch 2/3: 100%|██████████| 782/782 [00:08<00:00, 93.07it/s, loss=0.7477]



🔍 Évaluation epoch 2...

EPOCH 2/3 - Évaluation sur 2,000 positions
  RMSE:        0.7304  (baseline: 0.7993)
  MAE:         0.2937
  Amélioration: +8.6% vs baseline
  Corrélation: 0.4136
  Std preds:   0.2700  (cible: 0.7993)
  Mean preds:  0.0538  (cible: 0.0693)
  ⚠  Faible amélioration - vérifier hyperparamètres

💾 Nouveau meilleur RMSE: 0.7304 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_bigger.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_bigger.npz

[Epoch 3] 🎲 Échantillonnage: 100,000 positions sur 12,767,881
🔥 Warmup epoch 3/3: LR = 0.000500


Epoch 3/3: 100%|██████████| 782/782 [00:08<00:00, 94.17it/s, loss=0.7412]



🔍 Évaluation epoch 3...

EPOCH 3/3 - Évaluation sur 2,000 positions
  RMSE:        0.6829  (baseline: 0.7569)
  MAE:         0.2861
  Amélioration: +9.8% vs baseline
  Corrélation: 0.4321
  Std preds:   0.3429  (cible: 0.7569)
  Mean preds:  0.0592  (cible: 0.0449)
  ⚠  Faible amélioration - vérifier hyperparamètres

💾 Nouveau meilleur RMSE: 0.6829 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_bigger.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_bigger.npz

🎉 Entraînement terminé!
📊 Meilleur RMSE: 0.6829

💾 Sauvegarde finale...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_bigger.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_bigger.npz
✅ Modèle sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/

Epoch 1/3:   1%|          | 9/782 [00:00<00:08, 89.44it/s, loss=0.7861]


[DEBUG batch 0] targets mean=-0.0480 std=0.8090; preds mean=0.0281 std=0.0190; RMSE=0.8120; corr=0.0426


Epoch 1/3: 100%|██████████| 782/782 [00:08<00:00, 94.23it/s, loss=0.7964]



🔍 Évaluation epoch 1...

EPOCH 1/3 - Évaluation sur 2,000 positions
  RMSE:        0.7115  (baseline: 0.7599)
  MAE:         0.2874
  Amélioration: +6.4% vs baseline
  Corrélation: 0.3623
  Std preds:   0.2119  (cible: 0.7599)
  Mean preds:  0.0239  (cible: 0.0463)
  ⚠  Faible amélioration - vérifier hyperparamètres

💾 Nouveau meilleur RMSE: 0.7115 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_smaller_lr.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_smaller_lr.npz

[Epoch 2] 🎲 Échantillonnage: 100,000 positions sur 12,767,881
🔥 Warmup epoch 2/3: LR = 0.000100


Epoch 2/3: 100%|██████████| 782/782 [00:08<00:00, 96.92it/s, loss=0.7769]



🔍 Évaluation epoch 2...

EPOCH 2/3 - Évaluation sur 2,000 positions
  RMSE:        0.8157  (baseline: 0.8568)
  MAE:         0.3187
  Amélioration: +4.8% vs baseline
  Corrélation: 0.3081
  Std preds:   0.2931  (cible: 0.8568)
  Mean preds:  0.0621  (cible: 0.0674)
  ⚠  Faible amélioration - vérifier hyperparamètres


[Epoch 3] 🎲 Échantillonnage: 100,000 positions sur 12,767,881
🔥 Warmup epoch 3/3: LR = 0.000100


Epoch 3/3: 100%|██████████| 782/782 [00:08<00:00, 93.15it/s, loss=0.7590]



🔍 Évaluation epoch 3...

EPOCH 3/3 - Évaluation sur 2,000 positions
  RMSE:        0.8246  (baseline: 0.8923)
  MAE:         0.3201
  Amélioration: +7.6% vs baseline
  Corrélation: 0.3859
  Std preds:   0.3133  (cible: 0.8923)
  Mean preds:  0.0292  (cible: 0.0648)
  ⚠  Faible amélioration - vérifier hyperparamètres


🎉 Entraînement terminé!
📊 Meilleur RMSE: 0.7115

💾 Sauvegarde finale...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_smaller_lr.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_smaller_lr.npz
✅ Modèle sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_smaller_lr.pt et /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_smaller_lr.npz
Run time: 44.7s
Eval results — RMSE: 0.7484, MAE: 0.2953, Corr: 0.4226

Résumé des smoke tests:
{'exp': 'baseline', 'rmse': 0.7478238344192505, 'mae': 0.29814785
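
The `Warmup` lines in the logs above are consistent with a linear ramp from 0.0001 to the target learning rate over three epochs: 0.0004, 0.0007 and 0.001 for the baseline run, and 0.000367 then 0.0005 for epochs 2 and 3 of the `bigger` run. The sketch below reproduces those values under that assumed linear formula. The actual implementation in `train_torch.py` is not shown in this notebook, and `warmup_lr` is a hypothetical name.

```python
def warmup_lr(epoch: int, target_lr: float, start_lr: float = 1e-4, warmup_epochs: int = 3) -> float:
    """Linear ramp from start_lr to target_lr; epoch is 1-based.

    Assumed formula, inferred from the logged values only. After the warmup
    the target rate is returned; any later scheduler step is not modelled.
    """
    if epoch >= warmup_epochs:
        return target_lr
    return start_lr + (target_lr - start_lr) * epoch / warmup_epochs


for name, lr in [("baseline", 1e-3), ("bigger", 5e-4), ("smaller_lr", 1e-4)]:
    print(name, [f"{warmup_lr(e, lr):.6f}" for e in (1, 2, 3)])
```

Under this reading, the `smaller_lr` configuration has a target equal to the warmup start value, so its warmup is a no-op and the run trains at a constant 0.0001, which matches the logged `LR = 0.000100` for epochs 2 and 3.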

In [None]:
# Parameterizable extended smoke tests: repeated runs and a fast mode
# Usage:
# - set FAST_MODE=True for very fast iterations (EPOCHS=1, MAX_SAMPLES=20k)
# - set REPS to repeat each experiment over several seeds

import time
import os
import numpy as np
import pandas as pd
from torch.utils.data import DataLoader

print('\n=== Smoke-extended démarré ===')

# User parameters
FAST_MODE = False        # True -> very fast (useful for quick checks)
EPOCHS_EXT = 5
MAX_SAMPLES_EXT = 50_000
REPS = 1                # number of repetitions per configuration
BASE_SEED = 42

if FAST_MODE:
    EPOCHS_EXT = 1
    MAX_SAMPLES_EXT = 20_000

print(f"FAST_MODE={FAST_MODE}, EPOCHS={EPOCHS_EXT}, MAX_SAMPLES={MAX_SAMPLES_EXT}, REPS={REPS}")

# Make sure the trainer module is loaded
if 'trainer' not in globals():
    raise RuntimeError('Le module trainer (ai.NN.train_torch) doit être importé avant d\'exécuter cette cellule.')

# Load the data if needed
if 'X_train' not in globals() or 'y_train' not in globals():
    print('X_train/y_train non présents en mémoire — chargement via trainer.load_data(...) (peut être long)')
    fens, evaluations = trainer.load_data(trainer.DATASET_PATH)
    X_train = fens
    y_train = evaluations

# Build a fixed validation set (drawn from X_train, so not a strict hold-out)
val_size = min(5000, len(X_train))
rs = np.random.RandomState(42)
val_idx = rs.choice(len(X_train), size=val_size, replace=False)
val_fens = X_train[val_idx]
val_targets = y_train[val_idx]
print(f'Validation fixe : {val_size} positions (seed=42)')

# Keep the original settings
orig_keys = ['HIDDEN_SIZE','DROPOUT','LEARNING_RATE','WEIGHT_DECAY','BATCH_SIZE','EPOCHS','MAX_SAMPLES','CHECKPOINT_FILE','WEIGHTS_FILE','EVAL_MAX_SAMPLES']
orig = {k: getattr(trainer, k) for k in orig_keys if hasattr(trainer, k)}

# Experiments to run
experiments = [
    {'name': 'baseline', 'HIDDEN_SIZE': orig.get('HIDDEN_SIZE', 256), 'DROPOUT': orig.get('DROPOUT', 0.3), 'LEARNING_RATE': orig.get('LEARNING_RATE', 0.001)},
    {'name': 'bigger', 'HIDDEN_SIZE': 512, 'DROPOUT': 0.2, 'LEARNING_RATE': 5e-4},
    {'name': 'smaller_lr', 'HIDDEN_SIZE': 512, 'DROPOUT': 0.2, 'LEARNING_RATE': 1e-4},
]

results = []

for exp in experiments:
    for rep in range(REPS):
        seed = BASE_SEED + rep
        run_name = f"{exp['name']}_r{rep+1}_s{seed}"
        print('\n' + '='*80)
        print(f"Run: {run_name}")
        print('='*80)

        # Apply the short-run settings
        trainer.EPOCHS = EPOCHS_EXT
        trainer.MAX_SAMPLES = MAX_SAMPLES_EXT
        trainer.EVAL_MAX_SAMPLES = min(2000, MAX_SAMPLES_EXT//50)

        # Apply the experiment overrides
        trainer.HIDDEN_SIZE = exp['HIDDEN_SIZE']
        trainer.DROPOUT = exp['DROPOUT']
        trainer.LEARNING_RATE = exp['LEARNING_RATE']

        # Separate checkpoints per run
        ckpt_path = os.path.join(CKPT_DIR, f"smoke_ext_{run_name}.pt")
        weights_path = os.path.join(CKPT_DIR, f"smoke_ext_{run_name}.npz")
        trainer.CHECKPOINT_FILE = ckpt_path
        trainer.WEIGHTS_FILE = weights_path

        print('Params:', f"H={trainer.HIDDEN_SIZE}, dropout={trainer.DROPOUT}, lr={trainer.LEARNING_RATE}")
        print('Run params:', f"EPOCHS={trainer.EPOCHS}, MAX_SAMPLES={trainer.MAX_SAMPLES}")
        print('Checkpoint:', trainer.CHECKPOINT_FILE)

        # Fix seeds where possible (some GPU kernels can remain non-deterministic)
        import random
        import torch
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        if 'cuda' in str(getattr(trainer, 'DEVICE', '')):
            torch.cuda.manual_seed_all(seed)

        t0 = time.time()
        try:
            trainer.main()
        except Exception as e:
            print('Erreur pendant trainer.main():', e)
        t1 = time.time()
        print(f'Run time: {t1-t0:.1f}s')

        # Evaluate on the fixed validation set
        try:
            model = trainer.TorchNNEvaluator(hidden_size=trainer.HIDDEN_SIZE, dropout=trainer.DROPOUT, leaky_alpha=getattr(trainer,'LEAKY_ALPHA',0.01))
            optimizer = trainer.optim.AdamW(model.parameters(), lr=trainer.LEARNING_RATE, weight_decay=getattr(trainer,'WEIGHT_DECAY',1e-4))
            model, opt_state, step = trainer.torch_load_checkpoint(trainer.CHECKPOINT_FILE, model, optimizer, device=trainer.DEVICE)

            eval_dataset = trainer.ChessDataset(val_fens, val_targets)
            eval_loader = DataLoader(eval_dataset, batch_size=max(1, trainer.BATCH_SIZE//2), shuffle=False)
            rmse, mae, corr, preds, targets = trainer.evaluate_model(model, eval_loader, trainer.DEVICE)
            print(f"Eval — RMSE: {rmse:.4f}, MAE: {mae:.4f}, Corr: {corr:.4f}")
            results.append({'run': run_name, 'exp': exp['name'], 'seed': seed, 'rmse': rmse, 'mae': mae, 'corr': corr, 'ckpt': trainer.CHECKPOINT_FILE})
        except FileNotFoundError:
            print('Checkpoint introuvable, évaluation sautée.')
            results.append({'run': run_name, 'exp': exp['name'], 'seed': seed, 'rmse': None, 'mae': None, 'corr': None, 'ckpt': trainer.CHECKPOINT_FILE})
        except Exception as e:
            print('Erreur pendant l\'évaluation:', e)
            results.append({'run': run_name, 'exp': exp['name'], 'seed': seed, 'rmse': None, 'mae': None, 'corr': None, 'ckpt': trainer.CHECKPOINT_FILE})

        # Restore the original settings
        for k, v in orig.items():
            setattr(trainer, k, v)

print('\n=== Résumé smoke-extended ===')
df = pd.DataFrame(results)
print(df)

# Save the summary as CSV
summary_path = os.path.join(CKPT_DIR, 'smoke_extended_summary.csv')
df.to_csv(summary_path, index=False)
print('Résumé sauvegardé dans', summary_path)



=== Smoke-extended démarré ===
FAST_MODE=False, EPOCHS=5, MAX_SAMPLES=200000, REPS=1
Validation fixe : 5000 positions (seed=42)

Run: baseline_r1_s42
Params: H=256, dropout=0.3, lr=0.001
Run params: EPOCHS=5, MAX_SAMPLES=200000
Checkpoint: /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_baseline_r1_s42.pt
📂 Chargement du dataset depuis /content/drive/MyDrive/smart_chess_drive/chessData.csv...
🧹 Nettoyage : 190154 lignes corrompues supprimées.
✅ 12,767,881 positions valides chargées.

📊 Dataset complet: 12,767,881 positions
🆕 Création d'un nouveau réseau...

Configuration:
  Dataset complet: 12,767,881 positions
  Échantillon/epoch: 200,000 positions
  Architecture: 768 → 256 → 256 → 1
  Dropout: 0.3
  LeakyReLU alpha: 0.01
  Learning rate: 0.001 (AdamW, weight decay: 0.0001)
  LR Warmup: True (0.0001 → 0.001)
  LR Scheduler: True (patience: 2)
  Batch size: 128
  Epochs: 5
  Device: cuda


[Epoch 1] 🎲 Échantillonnage: 200,000 positions sur 12,767,881
🔥 Wa

Epoch 1/5:   1%|          | 10/1563 [00:00<00:16, 96.28it/s, loss=0.8121]


[DEBUG batch 0] targets mean=0.0066 std=0.5872; preds mean=0.0398 std=0.0211; RMSE=0.5890; corr=-0.0212


Epoch 1/5: 100%|██████████| 1563/1563 [00:16<00:00, 93.64it/s, loss=0.7683] 



🔍 Évaluation epoch 1...

EPOCH 1/5 - Évaluation sur 2,000 positions
  RMSE:        0.7690  (baseline: 0.8461)
  MAE:         0.3093
  Amélioration: +9.1% vs baseline
  Corrélation: 0.4204
  Std preds:   0.3180  (cible: 0.8461)
  Mean preds:  0.0279  (cible: 0.0530)
  ⚠  Faible amélioration - vérifier hyperparamètres

💾 Nouveau meilleur RMSE: 0.7690 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_baseline_r1_s42.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_baseline_r1_s42.npz

[Epoch 2] 🎲 Échantillonnage: 200,000 positions sur 12,767,881
🔥 Warmup epoch 2/3: LR = 0.000700


Epoch 2/5: 100%|██████████| 1563/1563 [00:16<00:00, 94.14it/s, loss=0.7514]



🔍 Évaluation epoch 2...

EPOCH 2/5 - Évaluation sur 2,000 positions
  RMSE:        0.6892  (baseline: 0.7700)
  MAE:         0.2941
  Amélioration: +10.5% vs baseline
  Corrélation: 0.4477
  Std preds:   0.3750  (cible: 0.7700)
  Mean preds:  0.0466  (cible: 0.0428)
  →  Apprentissage en cours

💾 Nouveau meilleur RMSE: 0.6892 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_baseline_r1_s42.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_baseline_r1_s42.npz

[Epoch 3] 🎲 Échantillonnage: 200,000 positions sur 12,767,881
🔥 Warmup epoch 3/3: LR = 0.001000


Epoch 3/5: 100%|██████████| 1563/1563 [00:16<00:00, 94.53it/s, loss=0.7330]



🔍 Évaluation epoch 3...

EPOCH 3/5 - Évaluation sur 2,000 positions
  RMSE:        0.7087  (baseline: 0.7935)
  MAE:         0.2947
  Amélioration: +10.7% vs baseline
  Corrélation: 0.4514
  Std preds:   0.3618  (cible: 0.7935)
  Mean preds:  0.0751  (cible: 0.0449)
  →  Apprentissage en cours


[Epoch 4] 🎲 Échantillonnage: 200,000 positions sur 12,767,881


Epoch 4/5: 100%|██████████| 1563/1563 [00:16<00:00, 94.32it/s, loss=0.7103]



🔍 Évaluation epoch 4...

EPOCH 4/5 - Évaluation sur 2,000 positions
  RMSE:        0.7154  (baseline: 0.8226)
  MAE:         0.2909
  Amélioration: +13.0% vs baseline
  Corrélation: 0.5043
  Std preds:   0.3408  (cible: 0.8226)
  Mean preds:  0.0697  (cible: 0.0283)
  →  Apprentissage en cours


[Epoch 5] 🎲 Échantillonnage: 200,000 positions sur 12,767,881


Epoch 5/5: 100%|██████████| 1563/1563 [00:16<00:00, 95.96it/s, loss=0.7139]



🔍 Évaluation epoch 5...

EPOCH 5/5 - Évaluation sur 2,000 positions
  RMSE:        0.7420  (baseline: 0.8668)
  MAE:         0.3103
  Amélioration: +14.4% vs baseline
  Corrélation: 0.5216
  Std preds:   0.3995  (cible: 0.8668)
  Mean preds:  0.0839  (cible: 0.0555)
  →  Apprentissage en cours


🎉 Entraînement terminé!
📊 Meilleur RMSE: 0.6892

💾 Sauvegarde finale...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_baseline_r1_s42.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_baseline_r1_s42.npz
✅ Modèle sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_baseline_r1_s42.pt et /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_baseline_r1_s42.npz
Run time: 105.9s
Eval — RMSE: 0.7096, MAE: 0.2863, Corr: 0.5116
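
In each per-epoch block above, the `baseline` value equals the target standard deviation printed on the `Std preds` line (`cible`). That is what the RMSE of a constant predictor that always outputs the target mean would be. The sketch below reproduces the logged `Amélioration` percentages under that assumption. `baseline_rmse` and `improvement_pct` are illustrative names, and the use of the population standard deviation is a guess.

```python
import math

def baseline_rmse(targets):
    """RMSE of a constant predictor that always outputs the target mean."""
    mean = sum(targets) / len(targets)
    return math.sqrt(sum((t - mean) ** 2 for t in targets) / len(targets))

def improvement_pct(rmse, baseline):
    """Relative RMSE reduction versus the baseline, in percent."""
    return 100.0 * (baseline - rmse) / baseline

# (RMSE, baseline) pairs copied from the baseline_r1_s42 log above
logged = [(0.7690, 0.8461), (0.6892, 0.7700), (0.7087, 0.7935),
          (0.7154, 0.8226), (0.7420, 0.8668)]
for epoch, (rmse, base) in enumerate(logged, start=1):
    print(f"epoch {epoch}: {improvement_pct(rmse, base):+.1f}% vs baseline")
```

The baseline changes from epoch to epoch (0.8461, 0.7700, 0.7935, ...), which suggests that the 2,000 evaluation positions are redrawn every epoch. If so, `Meilleur RMSE` compares values measured on different samples. The relative improvement, which rises steadily from 9.1% to 14.4%, is probably the more reliable trend.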

Run: bigger_r1_s42
Params: H=512, dropout=0.2, lr=0.0005
Run params: EPOCHS=5, MAX_



[DEBUG batch 0] targets mean=0.0171 std=0.5975; preds mean=0.0703 std=0.0156; RMSE=0.5992; corr=0.0533


Epoch 1/5: 100%|██████████| 1563/1563 [00:16<00:00, 94.86it/s, loss=0.7670]



🔍 Évaluation epoch 1...

EPOCH 1/5 - Évaluation sur 2,000 positions
  RMSE:        0.7691  (baseline: 0.8461)
  MAE:         0.3104
  Amélioration: +9.1% vs baseline
  Corrélation: 0.4199
  Std preds:   0.3238  (cible: 0.8461)
  Mean preds:  0.0250  (cible: 0.0530)
  ⚠  Faible amélioration - vérifier hyperparamètres

💾 Nouveau meilleur RMSE: 0.7691 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_bigger_r1_s42.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_bigger_r1_s42.npz

[Epoch 2] 🎲 Échantillonnage: 200,000 positions sur 12,767,881
🔥 Warmup epoch 2/3: LR = 0.000367


Epoch 2/5: 100%|██████████| 1563/1563 [00:16<00:00, 93.70it/s, loss=0.7467]



🔍 Évaluation epoch 2...

EPOCH 2/5 - Évaluation sur 2,000 positions
  RMSE:        0.6976  (baseline: 0.7700)
  MAE:         0.2915
  Amélioration: +9.4% vs baseline
  Corrélation: 0.4266
  Std preds:   0.3335  (cible: 0.7700)
  Mean preds:  0.0022  (cible: 0.0428)
  ⚠  Faible amélioration - vérifier hyperparamètres

💾 Nouveau meilleur RMSE: 0.6976 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_bigger_r1_s42.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_bigger_r1_s42.npz

[Epoch 3] 🎲 Échantillonnage: 200,000 positions sur 12,767,881
🔥 Warmup epoch 3/3: LR = 0.000500


Epoch 3/5: 100%|██████████| 1563/1563 [00:16<00:00, 93.12it/s, loss=0.7267]



🔍 Évaluation epoch 3...

EPOCH 3/5 - Évaluation sur 2,000 positions
  RMSE:        0.7056  (baseline: 0.7935)
  MAE:         0.2929
  Amélioration: +11.1% vs baseline
  Corrélation: 0.4578
  Std preds:   0.3505  (cible: 0.7935)
  Mean preds:  0.0469  (cible: 0.0449)
  →  Apprentissage en cours


[Epoch 4] 🎲 Échantillonnage: 200,000 positions sur 12,767,881


Epoch 4/5: 100%|██████████| 1563/1563 [00:16<00:00, 93.54it/s, loss=0.6997]



🔍 Évaluation epoch 4...

EPOCH 4/5 - Évaluation sur 2,000 positions
  RMSE:        0.6933  (baseline: 0.8226)
  MAE:         0.2841
  Amélioration: +15.7% vs baseline
  Corrélation: 0.5655
  Std preds:   0.3274  (cible: 0.8226)
  Mean preds:  0.0659  (cible: 0.0283)
  →  Apprentissage en cours

💾 Nouveau meilleur RMSE: 0.6933 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_bigger_r1_s42.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_bigger_r1_s42.npz

[Epoch 5] 🎲 Échantillonnage: 200,000 positions sur 12,767,881


Epoch 5/5: 100%|██████████| 1563/1563 [00:16<00:00, 92.05it/s, loss=0.6999]



🔍 Évaluation epoch 5...

EPOCH 5/5 - Évaluation sur 2,000 positions
  RMSE:        0.7186  (baseline: 0.8668)
  MAE:         0.3050
  Amélioration: +17.1% vs baseline
  Corrélation: 0.5688
  Std preds:   0.4142  (cible: 0.8668)
  Mean preds:  0.0995  (cible: 0.0555)
  →  Apprentissage en cours


🎉 Entraînement terminé!
📊 Meilleur RMSE: 0.6933

💾 Sauvegarde finale...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_bigger_r1_s42.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_bigger_r1_s42.npz
✅ Modèle sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_bigger_r1_s42.pt et /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_bigger_r1_s42.npz
Run time: 106.4s
Eval — RMSE: 0.6877, MAE: 0.2863, Corr: 0.5567
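
The `Warmup epoch k/3` lines in these runs are consistent with a linear ramp from a start rate of 0.0001 to each run's target rate over three epochs. The sketch below is a reconstruction from the logged values, not the project's scheduler. `warmup_lr` and its parameters are hypothetical names.

```python
def warmup_lr(epoch, target_lr, start_lr=1e-4, warmup_epochs=3):
    """Linear warmup; epoch is 1-based. Returns target_lr once warmup is over."""
    if epoch >= warmup_epochs:
        return target_lr
    return start_lr + (target_lr - start_lr) * epoch / warmup_epochs

# Target rates taken from the "Params:" lines of the three runs
for name, target in [("baseline", 1e-3), ("bigger", 5e-4), ("smaller_lr", 1e-4)]:
    rates = [f"{warmup_lr(epoch, target):.6f}" for epoch in (2, 3)]
    print(name, rates)
```

For `lr=0.0001` the start and target rates coincide, so the ramp is flat and that run effectively trains without warmup.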

Run: smaller_lr_r1_s42
Params: H=512, dropout=0.2, lr=0.0001
Run params: EPOCHS=5, MAX_SAMP



[DEBUG batch 0] targets mean=0.0171 std=0.5975; preds mean=0.0703 std=0.0156; RMSE=0.5992; corr=0.0533


Epoch 1/5: 100%|██████████| 1563/1563 [00:16<00:00, 93.36it/s, loss=0.7834]



🔍 Évaluation epoch 1...

EPOCH 1/5 - Évaluation sur 2,000 positions
  RMSE:        0.7738  (baseline: 0.8461)
  MAE:         0.3097
  Amélioration: +8.5% vs baseline
  Corrélation: 0.4130
  Std preds:   0.2835  (cible: 0.8461)
  Mean preds:  0.0275  (cible: 0.0530)
  ⚠  Faible amélioration - vérifier hyperparamètres

💾 Nouveau meilleur RMSE: 0.7738 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_smaller_lr_r1_s42.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_smaller_lr_r1_s42.npz

[Epoch 2] 🎲 Échantillonnage: 200,000 positions sur 12,767,881
🔥 Warmup epoch 2/3: LR = 0.000100


Epoch 2/5: 100%|██████████| 1563/1563 [00:17<00:00, 91.79it/s, loss=0.7517]



🔍 Évaluation epoch 2...

EPOCH 2/5 - Évaluation sur 2,000 positions
  RMSE:        0.7011  (baseline: 0.7700)
  MAE:         0.2888
  Amélioration: +8.9% vs baseline
  Corrélation: 0.4156
  Std preds:   0.3176  (cible: 0.7700)
  Mean preds:  0.0099  (cible: 0.0428)
  ⚠  Faible amélioration - vérifier hyperparamètres

💾 Nouveau meilleur RMSE: 0.7011 - Sauvegarde...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_smaller_lr_r1_s42.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_smaller_lr_r1_s42.npz

[Epoch 3] 🎲 Échantillonnage: 200,000 positions sur 12,767,881
🔥 Warmup epoch 3/3: LR = 0.000100


Epoch 3/5: 100%|██████████| 1563/1563 [00:16<00:00, 92.32it/s, loss=0.7288]



🔍 Évaluation epoch 3...

EPOCH 3/5 - Évaluation sur 2,000 positions
  RMSE:        0.7161  (baseline: 0.7935)
  MAE:         0.2941
  Amélioration: +9.8% vs baseline
  Corrélation: 0.4312
  Std preds:   0.3495  (cible: 0.7935)
  Mean preds:  0.0364  (cible: 0.0449)
  ⚠  Faible amélioration - vérifier hyperparamètres


[Epoch 4] 🎲 Échantillonnage: 200,000 positions sur 12,767,881


Epoch 4/5: 100%|██████████| 1563/1563 [00:16<00:00, 93.12it/s, loss=0.7043]



🔍 Évaluation epoch 4...

EPOCH 4/5 - Évaluation sur 2,000 positions
  RMSE:        0.7098  (baseline: 0.8226)
  MAE:         0.2882
  Amélioration: +13.7% vs baseline
  Corrélation: 0.5228
  Std preds:   0.3267  (cible: 0.8226)
  Mean preds:  0.0658  (cible: 0.0283)
  →  Apprentissage en cours


[Epoch 5] 🎲 Échantillonnage: 200,000 positions sur 12,767,881


Epoch 5/5: 100%|██████████| 1563/1563 [00:16<00:00, 93.14it/s, loss=0.7053]



🔍 Évaluation epoch 5...

EPOCH 5/5 - Évaluation sur 2,000 positions
  RMSE:        0.7439  (baseline: 0.8668)
  MAE:         0.3054
  Amélioration: +14.2% vs baseline
  Corrélation: 0.5209
  Std preds:   0.3759  (cible: 0.8668)
  Mean preds:  0.0659  (cible: 0.0555)
  →  Apprentissage en cours


🎉 Entraînement terminé!
📊 Meilleur RMSE: 0.7011

💾 Sauvegarde finale...
Checkpoint PyTorch sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_smaller_lr_r1_s42.pt
Poids sauvegardés (npz) dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_smaller_lr_r1_s42.npz
✅ Modèle sauvegardé dans /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_smaller_lr_r1_s42.pt et /content/drive/MyDrive/smart_chess_drive/smart-chess/ai/checkpoints/smoke_ext_smaller_lr_r1_s42.npz
Run time: 106.7s
Eval — RMSE: 0.6988, MAE: 0.2826, Corr: 0.5368
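
Each run ends with an `Eval` line, which makes the three configurations easy to compare directly from the captured output. The sketch below parses the `Run:` and `Eval` lines and ranks the runs by RMSE. `parse_runs` is an illustrative helper, not part of the project, and the sample text contains only lines copied from the log above.

```python
import re

RUN_RE = re.compile(r"^Run: (\S+)")
EVAL_RE = re.compile(r"Eval — RMSE: ([\d.]+), MAE: ([\d.]+), Corr: ([\d.]+)")

def parse_runs(log_text):
    """Return {run_name: (rmse, mae, corr)} from 'Run:' / 'Eval' line pairs."""
    results, current = {}, None
    for line in log_text.splitlines():
        run_match = RUN_RE.match(line)
        if run_match:
            current = run_match.group(1)
            continue
        eval_match = EVAL_RE.search(line)
        if eval_match and current is not None:
            results[current] = tuple(float(g) for g in eval_match.groups())
    return results

log = """\
Run: baseline_r1_s42
Eval — RMSE: 0.7096, MAE: 0.2863, Corr: 0.5116
Run: bigger_r1_s42
Eval — RMSE: 0.6877, MAE: 0.2863, Corr: 0.5567
Run: smaller_lr_r1_s42
Eval — RMSE: 0.6988, MAE: 0.2826, Corr: 0.5368
"""
runs = parse_runs(log)
for name, (rmse, mae, corr) in sorted(runs.items(), key=lambda kv: kv[1][0]):
    print(f"{name:20s} RMSE={rmse:.4f} MAE={mae:.4f} Corr={corr:.4f}")
```

On these numbers `bigger_r1_s42` ranks first. With `REPS=1` and a single seed, gaps of 0.01 to 0.02 in RMSE may fall within run-to-run noise.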

=== Résumé smoke-extended ===
                 run         exp  seed      r