# 🚀 Google Colab Setup

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ogautier1980/sandbox-ml/blob/main/cours/11_series_temporelles/11_demo_deep_learning_ts.ipynb)

**If you are running this notebook on Google Colab**, run the following cell to install the dependencies.

In [None]:
# Install dependencies (Google Colab only)
import sys

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print('📦 Installing packages...')

    # Core ML packages
    !pip install -q numpy pandas matplotlib seaborn scikit-learn

    # Detect the chapter and install its specific dependencies
    notebook_name = '11_demo_deep_learning_ts.ipynb'  # Will be replaced automatically

    # Ch 06-08: Deep Learning
    if any(x in notebook_name for x in ['06_', '07_', '08_']):
        !pip install -q torch torchvision torchaudio

    # Ch 08: NLP
    if '08_' in notebook_name:
        !pip install -q transformers datasets tokenizers
        if 'rag' in notebook_name:
            !pip install -q sentence-transformers faiss-cpu rank-bm25

    # Ch 09: Reinforcement Learning
    if '09_' in notebook_name:
        !pip install -q gymnasium[classic-control]

    # Ch 04: Boosting
    if '04_' in notebook_name and 'boosting' in notebook_name:
        !pip install -q xgboost lightgbm catboost

    # Ch 05: Advanced clustering
    if '05_' in notebook_name:
        !pip install -q umap-learn

    # Ch 11: Time series
    if '11_' in notebook_name:
        !pip install -q statsmodels prophet

    # Ch 12: Advanced vision
    if '12_' in notebook_name:
        !pip install -q ultralytics timm segmentation-models-pytorch

    # Ch 13: Recommendation
    if '13_' in notebook_name:
        !pip install -q scikit-surprise implicit

    # Ch 14: MLOps
    if '14_' in notebook_name:
        !pip install -q mlflow fastapi pydantic

    print('✅ Installation complete!')
else:
    print('ℹ️  Local environment detected; packages are already installed.')

# Chapter 11 - Deep Learning for Time Series

**Objectives:**
- Build sliding windows for LSTM input
- Implement LSTM and GRU for univariate forecasting
- Multivariate forecasting with additional features
- Attention mechanism for time series
- Compare DL with classical models (ARIMA)
- Anomaly detection from prediction error

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, TensorDataset

# Sklearn
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

# Device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")
print(f"PyTorch version: {torch.__version__}")

## 1. Data Generation

A synthetic time series with trend, seasonality, and noise.
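
Written out, the series built by the generator in the next cell is the sum of a constant level, a quadratic trend, two sinusoidal seasonalities, and Gaussian noise. The symbols $y_t$ and $\varepsilon_t$ are notation introduced here; the coefficients are read from the code:

$$
y_t = 100 + 0.05\,t + 0.0001\,t^2 + 20 \sin\!\left(\frac{2\pi t}{365}\right) + 5 \sin\!\left(\frac{2\pi t}{7}\right) + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0,\, 5^2)
$$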

In [None]:
def generate_complex_timeseries(n=1000, seed=42):
    """
    Generate a synthetic time series with trend, two seasonalities, and noise.
    """
    np.random.seed(seed)
    
    t = np.arange(n)
    
    # Nonlinear trend
    trend = 0.05 * t + 0.0001 * t**2
    
    # Multiple seasonalities
    seasonality_annual = 20 * np.sin(2 * np.pi * t / 365)
    seasonality_weekly = 5 * np.sin(2 * np.pi * t / 7)
    
    # Noise
    noise = np.random.normal(0, 5, n)
    
    # Full series
    y = 100 + trend + seasonality_annual + seasonality_weekly + noise
    
    # Dates
    dates = pd.date_range(start='2020-01-01', periods=n, freq='D')
    
    df = pd.DataFrame({
        'date': dates,
        'value': y
    })
    df.set_index('date', inplace=True)
    
    return df

# Generate data
df = generate_complex_timeseries(n=1200)

print(f"Generated series: {len(df)} observations")
print(f"Period: {df.index.min()} to {df.index.max()}")

# Visualization
plt.figure(figsize=(14, 5))
plt.plot(df.index, df['value'], color='blue', linewidth=1)
plt.title('Generated Time Series', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 2. Data Preparation - Sliding Windows

Convert the series into input/target sequences for the LSTM/GRU.
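
To see what the transformation produces, here is a minimal sketch on a six-value toy series using plain Python lists. The helper `sliding_windows` and the toy values are illustrative only; it follows the same indexing as the notebook's `create_sequences`, without NumPy.

```python
def sliding_windows(series, window_size, horizon=1):
    """Return (X, y): each X[i] holds window_size values, y[i] the next horizon values."""
    X, y = [], []
    for i in range(len(series) - window_size - horizon + 1):
        X.append(series[i:i + window_size])
        y.append(series[i + window_size:i + window_size + horizon])
    return X, y


toy_series = [10, 20, 30, 40, 50, 60]
X, y = sliding_windows(toy_series, window_size=3, horizon=1)

# Each row pairs three consecutive inputs with the value that follows them.
for inputs, target in zip(X, y):
    print(inputs, "->", target)
```

A series of length `n` yields `n - window_size - horizon + 1` samples, so the six values above give three input/target pairs.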

In [None]:
def create_sequences(data, window_size, horizon=1):
    """
    Build sliding windows over a series.
    
    Parameters:
    - data: 1D array, or 2D array (time, features) for the multivariate case
    - window_size: length of the input window
    - horizon: number of steps to predict
    
    Returns:
    - X: (n_samples, window_size) for 1D data,
         (n_samples, window_size, n_features) for 2D data
    - y: (n_samples, horizon) for 1D data,
         (n_samples, horizon, n_features) for 2D data
    """
    X, y = [], []
    
    for i in range(len(data) - window_size - horizon + 1):
        X.append(data[i:i+window_size])
        y.append(data[i+window_size:i+window_size+horizon])
    
    return np.array(X), np.array(y)

# Parameters
WINDOW_SIZE = 30  # Use the previous 30 days as input
HORIZON = 1       # Predict 1 day ahead

# Data
data = df['value'].values

# Normalization (IMPORTANT: fit on the training set only)
scaler = MinMaxScaler(feature_range=(0, 1))

# Train/Val/Test split (70/15/15)
train_size = int(len(data) * 0.7)
val_size = int(len(data) * 0.15)

train_data = data[:train_size]
val_data = data[train_size:train_size+val_size]
test_data = data[train_size+val_size:]

print(f"Train size: {len(train_data)}")
print(f"Val size: {len(val_data)}")
print(f"Test size: {len(test_data)}")

# Fit the scaler on the training set only
scaler.fit(train_data.reshape(-1, 1))

# Transform
train_scaled = scaler.transform(train_data.reshape(-1, 1)).flatten()
val_scaled = scaler.transform(val_data.reshape(-1, 1)).flatten()
test_scaled = scaler.transform(test_data.reshape(-1, 1)).flatten()

# Build sequences
X_train, y_train = create_sequences(train_scaled, WINDOW_SIZE, HORIZON)
X_val, y_val = create_sequences(val_scaled, WINDOW_SIZE, HORIZON)
X_test, y_test = create_sequences(test_scaled, WINDOW_SIZE, HORIZON)

print(f"\nX_train shape: {X_train.shape}  # (samples, window_size)")
print(f"y_train shape: {y_train.shape}  # (samples, horizon)")
print(f"\nExample:")
print(f"X[0] (first 5 of 30 values): {X_train[0][:5]}...")
print(f"y[0] (target): {y_train[0]}")

In [None]:
# Convert to PyTorch tensors
X_train_t = torch.FloatTensor(X_train).unsqueeze(-1)  # (N, L, 1)
y_train_t = torch.FloatTensor(y_train)

X_val_t = torch.FloatTensor(X_val).unsqueeze(-1)
y_val_t = torch.FloatTensor(y_val)

X_test_t = torch.FloatTensor(X_test).unsqueeze(-1)
y_test_t = torch.FloatTensor(y_test)

print(f"X_train_t shape: {X_train_t.shape}  # (batch, seq_len, features)")
print(f"y_train_t shape: {y_train_t.shape}")

# DataLoaders
BATCH_SIZE = 32

train_dataset = TensorDataset(X_train_t, y_train_t)
val_dataset = TensorDataset(X_val_t, y_val_t)
test_dataset = TensorDataset(X_test_t, y_test_t)

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

print(f"\nTrain batches: {len(train_loader)}")
print(f"Val batches: {len(val_loader)}")
print(f"Test batches: {len(test_loader)}")

## 3. LSTM Model

### 3.1 Architecture

In [None]:
class LSTMForecaster(nn.Module):
    def __init__(self, input_size=1, hidden_size=64, num_layers=2, 
                 output_size=1, dropout=0.2):
        super(LSTMForecaster, self).__init__()
        
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # LSTM layers
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0
        )
        
        # Fully connected
        self.fc = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        # x: (batch, seq_len, input_size)
        
        # LSTM
        lstm_out, (h_n, c_n) = self.lstm(x)
        # lstm_out: (batch, seq_len, hidden_size)
        
        # Take the output at the last time step
        last_output = lstm_out[:, -1, :]  # (batch, hidden_size)
        
        # Prediction
        output = self.fc(last_output)  # (batch, output_size)
        
        return output

# Instantiate
lstm_model = LSTMForecaster(
    input_size=1,
    hidden_size=64,
    num_layers=2,
    output_size=HORIZON,
    dropout=0.2
).to(device)

print(lstm_model)
print(f"\nNumber of parameters: {sum(p.numel() for p in lstm_model.parameters() if p.requires_grad):,}")

### 3.2 Training
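
Early stopping, as used by the training loop in this section, halts training once the validation loss has not improved for `patience` consecutive epochs and keeps the best weights seen so far. The bookkeeping can be sketched on a plain list of losses. `early_stopping_trace` and the toy loss values are illustrative, not part of the notebook:

```python
def early_stopping_trace(val_losses, patience):
    """Return (best_epoch, stop_epoch) as 0-based indices for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    waited = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
        if waited >= patience:
            return best_epoch, epoch  # stop here, keep the weights from best_epoch
    return best_epoch, len(val_losses) - 1  # patience was never exhausted


best_epoch, stop_epoch = early_stopping_trace([1.0, 0.8, 0.9, 0.85, 0.95], patience=2)
print(best_epoch, stop_epoch)
```

With these toy losses the best epoch is index 1 and training stops at index 3, after two epochs without improvement.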

In [None]:
def train_model(model, train_loader, val_loader, num_epochs=100, lr=0.001, patience=15):
    """
    Train the model with early stopping.
    """
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    train_losses = []
    val_losses = []
    best_val_loss = np.inf
    patience_counter = 0
    best_model_state = None
    
    for epoch in range(num_epochs):
        # Training
        model.train()
        train_loss = 0.0
        
        for batch_X, batch_y in train_loader:
            batch_X = batch_X.to(device)
            batch_y = batch_y.to(device)
            
            # Forward
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            
            # Backward
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
        
        train_loss /= len(train_loader)
        train_losses.append(train_loss)
        
        # Validation
        model.eval()
        val_loss = 0.0
        
        with torch.no_grad():
            for batch_X, batch_y in val_loader:
                batch_X = batch_X.to(device)
                batch_y = batch_y.to(device)
                
                outputs = model(batch_X)
                loss = criterion(outputs, batch_y)
                val_loss += loss.item()
        
        val_loss /= len(val_loader)
        val_losses.append(val_loss)
        
        # Early stopping
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            patience_counter = 0
            # Clone the tensors: state_dict().copy() is shallow, so the saved
            # weights would keep changing as training continues
            best_model_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
        else:
            patience_counter += 1
        
        if (epoch + 1) % 10 == 0:
            print(f"Epoch [{epoch+1}/{num_epochs}], "
                  f"Train Loss: {train_loss:.6f}, Val Loss: {val_loss:.6f}")
        
        if patience_counter >= patience:
            print(f"\nEarly stopping at epoch {epoch+1}")
            break
    
    # Restore the best model
    if best_model_state is not None:
        model.load_state_dict(best_model_state)
    
    return train_losses, val_losses

# Training
print("Training LSTM...\n")
train_losses, val_losses = train_model(
    lstm_model, 
    train_loader, 
    val_loader, 
    num_epochs=100, 
    lr=0.001,
    patience=15
)

print("\n✅ Training complete")

In [None]:
# Loss curves
plt.figure(figsize=(12, 5))
plt.plot(train_losses, label='Train Loss', color='blue')
plt.plot(val_losses, label='Validation Loss', color='orange')
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('Learning Curves - LSTM', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### 3.3 Evaluation
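
The three metrics reported in this section are computed on the original (de-normalized) scale. With $y_i$ the actual values, $\hat{y}_i$ the predictions, and $n$ the number of test points (notation introduced here), they are:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}, \qquad
\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$$

MAPE is undefined when any $y_i = 0$ and unstable when values are close to zero. The synthetic series used here stays well away from zero, so the metric is usable.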

In [None]:
def evaluate_model(model, test_loader, scaler):
    """
    Evaluate the model on the test set.
    """
    model.eval()
    predictions = []
    actuals = []
    
    with torch.no_grad():
        for batch_X, batch_y in test_loader:
            batch_X = batch_X.to(device)
            outputs = model(batch_X)
            
            predictions.extend(outputs.cpu().numpy())
            actuals.extend(batch_y.numpy())
    
    predictions = np.array(predictions).flatten()
    actuals = np.array(actuals).flatten()
    
    # Inverse transform (back to the original scale)
    predictions_orig = scaler.inverse_transform(predictions.reshape(-1, 1)).flatten()
    actuals_orig = scaler.inverse_transform(actuals.reshape(-1, 1)).flatten()
    
    # Metrics
    mae = mean_absolute_error(actuals_orig, predictions_orig)
    rmse = np.sqrt(mean_squared_error(actuals_orig, predictions_orig))
    mape = np.mean(np.abs((actuals_orig - predictions_orig) / actuals_orig)) * 100
    
    print("\n=== Test Set Metrics ===")
    print(f"MAE:  {mae:.4f}")
    print(f"RMSE: {rmse:.4f}")
    print(f"MAPE: {mape:.2f}%")
    
    return predictions_orig, actuals_orig, {'MAE': mae, 'RMSE': rmse, 'MAPE': mape}

# Evaluate the LSTM
lstm_predictions, lstm_actuals, lstm_metrics = evaluate_model(lstm_model, test_loader, scaler)

In [None]:
# Plot predictions
plt.figure(figsize=(14, 6))
plt.plot(lstm_actuals, label='Actual', color='green', linewidth=2)
plt.plot(lstm_predictions, label='LSTM predictions', color='red', linestyle='--', alpha=0.8)
plt.title('LSTM Predictions on the Test Set', fontsize=14, fontweight='bold')
plt.xlabel('Time step')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Prediction error
errors = lstm_actuals - lstm_predictions

plt.figure(figsize=(14, 5))
plt.plot(errors, color='purple', alpha=0.7)
plt.axhline(0, color='black', linestyle='--', linewidth=1)
plt.fill_between(range(len(errors)), errors, alpha=0.3, color='purple')
plt.title('LSTM Prediction Errors', fontsize=14, fontweight='bold')
plt.xlabel('Time step')
plt.ylabel('Error (Actual - Prediction)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

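## 4. GRU Model

A simplified variant of the LSTM: it uses two gates (update and reset) instead of three and has no separate cell state, so it has fewer parameters for the same hidden size.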
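
To put a number on the difference: an LSTM layer has four weight blocks (three gates plus the candidate cell state) and a GRU layer has three, so for equal sizes a GRU has three quarters of the recurrent parameters. The count below is a rough sketch that assumes PyTorch's layout of one input matrix, one hidden matrix, and two bias vectors per block. `recurrent_param_count` is a helper written for this illustration:

```python
def recurrent_param_count(n_blocks, input_size, hidden_size, num_layers):
    """Count weights and biases of a stacked recurrent layer.

    n_blocks is 4 for an LSTM and 3 for a GRU. Each block has an input
    weight matrix, a hidden weight matrix, and two bias vectors.
    """
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        per_block = (hidden_size * layer_input      # input-to-hidden weights
                     + hidden_size * hidden_size    # hidden-to-hidden weights
                     + 2 * hidden_size)             # two bias vectors
        total += n_blocks * per_block
        layer_input = hidden_size  # deeper layers read the previous layer's output
    return total


lstm_params = recurrent_param_count(4, input_size=1, hidden_size=64, num_layers=2)
gru_params = recurrent_param_count(3, input_size=1, hidden_size=64, num_layers=2)
print(lstm_params, gru_params)  # 50432 37824
```

Adding the 65 parameters of the final linear layer (64 weights plus 1 bias) should give the totals printed for the two models in this notebook, under the layout assumption stated above.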

In [None]:
class GRUForecaster(nn.Module):
    def __init__(self, input_size=1, hidden_size=64, num_layers=2, 
                 output_size=1, dropout=0.2):
        super(GRUForecaster, self).__init__()
        
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # GRU layers
        self.gru = nn.GRU(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0
        )
        
        # Fully connected
        self.fc = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        # GRU
        gru_out, h_n = self.gru(x)
        
        # Output at the last time step
        last_output = gru_out[:, -1, :]
        
        # Prediction
        output = self.fc(last_output)
        
        return output

# Instantiate
gru_model = GRUForecaster(
    input_size=1,
    hidden_size=64,
    num_layers=2,
    output_size=HORIZON,
    dropout=0.2
).to(device)

print(gru_model)
print(f"\nNumber of parameters: {sum(p.numel() for p in gru_model.parameters() if p.requires_grad):,}")

# Training
print("\nTraining GRU...\n")
gru_train_losses, gru_val_losses = train_model(
    gru_model, 
    train_loader, 
    val_loader, 
    num_epochs=100,
    lr=0.001,
    patience=15
)

# Evaluation
gru_predictions, gru_actuals, gru_metrics = evaluate_model(gru_model, test_loader, scaler)

## 5. LSTM with Attention
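
The attention layer in this section scores every LSTM output with a linear layer, normalizes the scores with a softmax over the time axis, and uses the weighted sum of the outputs as a context vector. For a single sequence, the pooling step reduces to the sketch below. The scores are given directly instead of being learned, and `attention_pool` is an illustrative helper, not the notebook's module:

```python
import math


def attention_pool(hidden_states, scores):
    """Softmax the scores over time, then return (weights, weighted sum of hidden states)."""
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context


# Three time steps, hidden size 2; equal scores give uniform weights.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_pool(states, scores=[0.0, 0.0, 0.0])
print(weights, context)

# A dominant score concentrates almost all the weight on one time step.
peaked_weights, peaked_context = attention_pool(states, scores=[0.0, 0.0, 100.0])
print(peaked_weights, peaked_context)
```

The weights always sum to 1, which is what makes them readable as a distribution over time steps.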

In [None]:
class LSTMWithAttention(nn.Module):
    def __init__(self, input_size=1, hidden_size=64, num_layers=2, 
                 output_size=1, dropout=0.2):
        super(LSTMWithAttention, self).__init__()
        
        # LSTM
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0
        )
        
        # Attention
        self.attention = nn.Linear(hidden_size, 1)
        
        # Output
        self.fc = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        # LSTM
        lstm_out, _ = self.lstm(x)  # (batch, seq_len, hidden)
        
        # Attention scores
        scores = self.attention(lstm_out)  # (batch, seq_len, 1)
        attention_weights = torch.softmax(scores, dim=1)
        
        # Context vector (weighted sum)
        context = torch.sum(attention_weights * lstm_out, dim=1)  # (batch, hidden)
        
        # Prediction
        output = self.fc(context)
        
        return output, attention_weights

# Instantiate
attention_model = LSTMWithAttention(
    input_size=1,
    hidden_size=64,
    num_layers=2,
    output_size=HORIZON,
    dropout=0.2
).to(device)

print(attention_model)

# Training function adapted to the attention model
def train_attention_model(model, train_loader, val_loader, num_epochs=100, lr=0.001, patience=15):
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    train_losses = []
    val_losses = []
    best_val_loss = np.inf
    patience_counter = 0
    best_model_state = None
    
    for epoch in range(num_epochs):
        model.train()
        train_loss = 0.0
        
        for batch_X, batch_y in train_loader:
            batch_X = batch_X.to(device)
            batch_y = batch_y.to(device)
            
            outputs, _ = model(batch_X)  # Ignore the attention weights during training
            loss = criterion(outputs, batch_y)
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
        
        train_loss /= len(train_loader)
        train_losses.append(train_loss)
        
        model.eval()
        val_loss = 0.0
        
        with torch.no_grad():
            for batch_X, batch_y in val_loader:
                batch_X = batch_X.to(device)
                batch_y = batch_y.to(device)
                
                outputs, _ = model(batch_X)
                loss = criterion(outputs, batch_y)
                val_loss += loss.item()
        
        val_loss /= len(val_loader)
        val_losses.append(val_loss)
        
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            patience_counter = 0
            # Clone the tensors: state_dict().copy() is shallow
            best_model_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
        else:
            patience_counter += 1
        
        if (epoch + 1) % 10 == 0:
            print(f"Epoch [{epoch+1}/{num_epochs}], Train: {train_loss:.6f}, Val: {val_loss:.6f}")
        
        if patience_counter >= patience:
            print(f"\nEarly stopping at epoch {epoch+1}")
            break
    
    if best_model_state is not None:
        model.load_state_dict(best_model_state)
    
    return train_losses, val_losses

# Training
print("\nTraining LSTM with Attention...\n")
att_train_losses, att_val_losses = train_attention_model(
    attention_model,
    train_loader,
    val_loader,
    num_epochs=100,
    lr=0.001,
    patience=15
)

print("\n✅ Training complete")

In [None]:
# Evaluation with attention visualization
attention_model.eval()
attention_predictions = []
attention_actuals = []
attention_weights_list = []

with torch.no_grad():
    for batch_X, batch_y in test_loader:
        batch_X = batch_X.to(device)
        outputs, att_weights = attention_model(batch_X)
        
        attention_predictions.extend(outputs.cpu().numpy())
        attention_actuals.extend(batch_y.numpy())
        attention_weights_list.append(att_weights.cpu().numpy())

attention_predictions = np.array(attention_predictions).flatten()
attention_actuals = np.array(attention_actuals).flatten()

# Inverse transform
att_pred_orig = scaler.inverse_transform(attention_predictions.reshape(-1, 1)).flatten()
att_act_orig = scaler.inverse_transform(attention_actuals.reshape(-1, 1)).flatten()

# Metrics
att_mae = mean_absolute_error(att_act_orig, att_pred_orig)
att_rmse = np.sqrt(mean_squared_error(att_act_orig, att_pred_orig))
att_mape = np.mean(np.abs((att_act_orig - att_pred_orig) / att_act_orig)) * 100

print("\n=== LSTM + Attention Metrics ===")
print(f"MAE:  {att_mae:.4f}")
print(f"RMSE: {att_rmse:.4f}")
print(f"MAPE: {att_mape:.2f}%")

# Plot the attention weights for one example
example_att_weights = attention_weights_list[0][0].squeeze()  # First example of the first batch

plt.figure(figsize=(12, 4))
plt.bar(range(len(example_att_weights)), example_att_weights, color='steelblue')
plt.title('Attention Weights - Example', fontsize=14, fontweight='bold')
plt.xlabel('Position in the sequence (t-30 to t-1)')
plt.ylabel('Attention weight')
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

## 6. Model Comparison

In [None]:
# Comparison table
comparison = pd.DataFrame({
    'LSTM': lstm_metrics,
    'GRU': gru_metrics,
    'LSTM+Attention': {'MAE': att_mae, 'RMSE': att_rmse, 'MAPE': att_mape}
})

print("\n=== Deep Learning Model Comparison ===")
print(comparison.T)

# Comparative plot
plt.figure(figsize=(14, 7))
plt.plot(lstm_actuals[:200], label='Actual', color='black', linewidth=2)
plt.plot(lstm_predictions[:200], label='LSTM', color='blue', linestyle='--', alpha=0.7)
plt.plot(gru_predictions[:200], label='GRU', color='green', linestyle='--', alpha=0.7)
plt.plot(att_pred_orig[:200], label='LSTM+Attention', color='red', linestyle='--', alpha=0.7)
plt.title('Prediction Comparison (first 200 points)', fontsize=14, fontweight='bold')
plt.xlabel('Time step')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 7. Anomaly Detection

Use the prediction error to flag anomalies: a point whose absolute error exceeds a threshold is treated as anomalous.
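
The cell below derives the threshold from the same test errors it then screens, so large anomalies inflate the threshold. One common variation, shown here only as a sketch, calibrates the threshold on errors from a period assumed to be normal (for example the validation set) and applies it to new errors. The helper name and the toy error values are made up for illustration:

```python
import math
import statistics


def three_sigma_threshold(reference_errors):
    """Mean + 3 * population standard deviation of the reference absolute errors."""
    return statistics.fmean(reference_errors) + 3 * statistics.pstdev(reference_errors)


# Toy absolute errors: a bounded, deterministic pattern standing in for model errors.
reference_errors = [abs(math.sin(i)) for i in range(100)]
new_errors = [abs(math.sin(i)) for i in range(100, 200)]
new_errors[40] = 5.0  # inject one obvious anomaly

threshold = three_sigma_threshold(reference_errors)
anomaly_indices = [i for i, e in enumerate(new_errors) if e > threshold]
print(anomaly_indices)
```

Only the injected point exceeds the threshold, because the remaining toy errors never exceed 1. The population standard deviation is used to match the default behaviour of NumPy's `std`.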

In [None]:
# Prediction errors
errors_abs = np.abs(lstm_actuals - lstm_predictions)

# Threshold: mean + 3 * standard deviation
threshold = errors_abs.mean() + 3 * errors_abs.std()

# Detect anomalies
anomalies = errors_abs > threshold
anomaly_indices = np.where(anomalies)[0]

print(f"\n=== Anomaly Detection ===")
print(f"Mean error: {errors_abs.mean():.4f}")
print(f"Standard deviation: {errors_abs.std():.4f}")
print(f"Threshold: {threshold:.4f}")
print(f"Anomalies detected: {anomalies.sum()} / {len(anomalies)} ({100*anomalies.sum()/len(anomalies):.2f}%)")

# Visualization
plt.figure(figsize=(14, 8))

# Predictions
plt.subplot(2, 1, 1)
plt.plot(lstm_actuals, label='Actual', color='green', linewidth=2)
plt.plot(lstm_predictions, label='Predictions', color='blue', linestyle='--', alpha=0.7)
plt.scatter(anomaly_indices, lstm_actuals[anomaly_indices], 
            color='red', s=100, marker='x', label='Anomalies', zorder=5)
plt.title('Anomaly Detection - Time Series', fontsize=14, fontweight='bold')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)

# Errors
plt.subplot(2, 1, 2)
plt.plot(errors_abs, color='purple', alpha=0.7, label='Absolute error')
plt.axhline(threshold, color='red', linestyle='--', linewidth=2, label=f'Threshold = {threshold:.2f}')
plt.scatter(anomaly_indices, errors_abs[anomaly_indices], 
            color='red', s=100, marker='x', zorder=5)
plt.title('Prediction Errors', fontsize=12)
plt.xlabel('Time step')
plt.ylabel('Error')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

if len(anomaly_indices) > 0:
    print(f"\nFirst anomalies detected (indices):")
    print(anomaly_indices[:10])

## 8. Multi-Step Forecasting

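Predict several steps ahead. The approach here is direct: the output layer has one unit per forecast step. The cell only prepares the data and the model; it does not train it.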
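
An alternative to a model with `HORIZON_MULTI` outputs is recursive forecasting, which reuses a one-step model and feeds each prediction back into the input window. A minimal sketch follows, where `one_step` stands for any hypothetical one-step predictor (here a simple linear extrapolation rather than a trained network):

```python
def recursive_forecast(one_step, window, horizon):
    """Predict `horizon` steps by repeatedly appending each prediction to the window."""
    window = list(window)  # copy so the caller's data is untouched
    forecasts = []
    for _ in range(horizon):
        next_value = one_step(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward by one step
    return forecasts


def linear_extrapolation(window):
    """Toy one-step predictor: continue the last observed difference."""
    return window[-1] + (window[-1] - window[-2])


forecasts = recursive_forecast(linear_extrapolation, [1, 2, 3], horizon=3)
print(forecasts)  # [4, 5, 6]
```

Recursive forecasts tend to accumulate error because later steps depend on earlier predictions; the direct approach avoids that feedback at the cost of a wider output layer.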

In [None]:
# Build data for multi-step forecasting (predict 7 days)
HORIZON_MULTI = 7

X_multi, y_multi = create_sequences(train_scaled, WINDOW_SIZE, HORIZON_MULTI)
X_multi_t = torch.FloatTensor(X_multi).unsqueeze(-1)
y_multi_t = torch.FloatTensor(y_multi)

print(f"X_multi shape: {X_multi_t.shape}  # (N, 30, 1)")
print(f"y_multi shape: {y_multi_t.shape}  # (N, 7)")

# Multi-step model
multi_model = LSTMForecaster(
    input_size=1,
    hidden_size=64,
    num_layers=2,
    output_size=HORIZON_MULTI,
    dropout=0.2
).to(device)

print(f"\nModel for predicting {HORIZON_MULTI} steps ahead")
print(multi_model)

## Conclusion

In this notebook, we explored:

1. **Data preparation**: sliding windows, normalization
2. **LSTM**: the classic architecture for time series
3. **GRU**: a lighter, faster variant
4. **Attention**: focusing on the most relevant time steps
5. **Evaluation**: MAE, RMSE, MAPE
6. **Anomaly detection**: via prediction error

**Key points:**
- LSTM/GRU capture complex temporal dependencies
- Attention weights offer some insight into which time steps the model relies on
- Normalization is crucial (fit on the training set only)
- Sliding windows turn a series into a supervised dataset
- Early stopping limits overfitting

**When to use DL vs. classical models:**
- **ARIMA**: stationary series (or ones made stationary by differencing), linear patterns, little data
- **LSTM/GRU**: nonlinear patterns, long-term dependencies, large amounts of data
- **Attention/Transformers**: very long sequences, cases where interpretability matters