# 🚀 Google Colab Setup

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ogautier1980/sandbox-ml/blob/main/cours/11_series_temporelles/11_exercices.ipynb)

**If you are running this notebook on Google Colab**, run the following cell to install the dependencies.

In [None]:
# Install dependencies (Google Colab only)
import sys

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print('📦 Installing packages...')

    # Core ML packages
    !pip install -q numpy pandas matplotlib seaborn scikit-learn

    # Detect the chapter and install its specific dependencies
    notebook_name = '11_exercices.ipynb'  # Will be replaced automatically

    # Ch 06-08: Deep Learning
    if any(x in notebook_name for x in ['06_', '07_', '08_']):
        !pip install -q torch torchvision torchaudio

    # Ch 08: NLP
    if '08_' in notebook_name:
        !pip install -q transformers datasets tokenizers
        if 'rag' in notebook_name:
            !pip install -q sentence-transformers faiss-cpu rank-bm25

    # Ch 09: Reinforcement Learning
    if '09_' in notebook_name:
        !pip install -q gymnasium[classic-control]

    # Ch 04: Boosting
    if '04_' in notebook_name and 'boosting' in notebook_name:
        !pip install -q xgboost lightgbm catboost

    # Ch 05: Advanced clustering
    if '05_' in notebook_name:
        !pip install -q umap-learn

    # Ch 11: Time series
    if '11_' in notebook_name:
        !pip install -q statsmodels prophet

    # Ch 12: Advanced vision
    if '12_' in notebook_name:
        !pip install -q ultralytics timm segmentation-models-pytorch

    # Ch 13: Recommendation
    if '13_' in notebook_name:
        !pip install -q scikit-surprise implicit

    # Ch 14: MLOps
    if '14_' in notebook_name:
        !pip install -q mlflow fastapi pydantic

    print('✅ Installation complete!')
else:
    print('ℹ️  Local environment detected, packages are already installed.')

# Chapter 11 - Time Series: Exercises

This notebook contains 3 hands-on exercises with complete solutions.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Time series
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Sklearn
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")

---

## Exercise 1: Forecasting Monthly Sales with ARIMA

**Context:** You work for a retailer that wants to forecast its monthly sales.

**Objectives:**
1. Generate synthetic monthly sales with trend and seasonality
2. Analyze the series (decomposition, stationarity)
3. Fit an ARIMA model
4. Forecast the next 12 months
5. Evaluate with RMSE and MAPE
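
As an illustration of objective 5, here is a minimal sketch of the two metrics in plain Python (standard library only, so it runs outside the notebook). The helper names `rmse` and `mape` and the sample values are hypothetical. RMSE is expressed in the unit of the data, while MAPE is a percentage and is undefined when an actual value equals zero.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the data."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is 0."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical monthly sales and forecasts, for illustration only
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(f"RMSE: {rmse(actual, predicted):.2f}")   # about 12.91
print(f"MAPE: {mape(actual, predicted):.2f}%")  # about 6.67%
```

The 20-unit miss dominates the RMSE because errors are squared, whereas MAPE weighs the 10-unit miss on 100 and the 20-unit miss on 200 equally (10% each).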

---

### Data

In [None]:
# TODO: Generate monthly sales (5 years = 60 months)
# - Upward trend: +2% per month
# - Yearly seasonality (peak in December)
# - Gaussian noise

def generate_monthly_sales(n_months=60, base_sales=10000, trend_rate=0.02, noise_std=500, seed=42):
    # TODO: Implement
    pass

# Generate
# sales_df = generate_monthly_sales(n_months=60)

# TODO: Plot the series

### Exploratory Analysis

In [None]:
# TODO: Decompose the series (trend, seasonality, residual)
# Use seasonal_decompose with period=12

In [None]:
# TODO: Stationarity test (ADF test)
# If the series is non-stationary, difference it

### ARIMA Model

In [None]:
# TODO: Train/test split (80/20)
# TODO: Select the ARIMA parameters (p, d, q)
# TODO: Fit the ARIMA model
# TODO: Predict on the test set
# TODO: Compute RMSE and MAPE

---

## SOLUTION Exercise 1

---

In [None]:
# SOLUTION: Data generation
def generate_monthly_sales(n_months=60, base_sales=10000, trend_rate=0.02, noise_std=500, seed=42):
    np.random.seed(seed)
    
    dates = pd.date_range(start='2019-01-01', periods=n_months, freq='MS')
    t = np.arange(n_months)
    
    # Trend: compound growth of trend_rate per month
    trend = base_sales * (1 + trend_rate) ** t
    
    # Seasonality: the series starts in January (t = 0), so December is t % 12 == 11.
    # A cosine shifted by 11 months peaks there.
    seasonality = 2000 * np.cos(2 * np.pi * (t - 11) / 12)
    
    # Noise
    noise = np.random.normal(0, noise_std, n_months)
    
    # Sales
    sales = trend + seasonality + noise
    
    df = pd.DataFrame({
        'date': dates,
        'sales': sales
    })
    df.set_index('date', inplace=True)
    
    return df

# Generate
sales_df = generate_monthly_sales(n_months=60)

print(f"Generated sales: {len(sales_df)} months")
print(sales_df.head())

# Visualization
plt.figure(figsize=(14, 5))
plt.plot(sales_df.index, sales_df['sales'], marker='o', color='blue')
plt.title('Monthly Sales - 5 Years', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Sales ($)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# SOLUTION: Decomposition
decomposition = seasonal_decompose(sales_df['sales'], model='additive', period=12)

fig, axes = plt.subplots(4, 1, figsize=(14, 10))
decomposition.observed.plot(ax=axes[0], title='Observed Sales', color='blue')
decomposition.trend.plot(ax=axes[1], title='Trend', color='green')
decomposition.seasonal.plot(ax=axes[2], title='Seasonality', color='orange')
decomposition.resid.plot(ax=axes[3], title='Residuals', color='red')
plt.tight_layout()
plt.show()

In [None]:
# SOLUTION: ADF test
result = adfuller(sales_df['sales'])
print(f"ADF Statistic: {result[0]:.6f}")
print(f"p-value: {result[1]:.6f}")

if result[1] < 0.05:
    print("✅ Stationary series")
else:
    print("❌ Non-stationary series -> differencing required")

# First-order differencing
sales_diff = sales_df['sales'].diff().dropna()
result_diff = adfuller(sales_diff)
print(f"\nAfter differencing:")
print(f"ADF Statistic: {result_diff[0]:.6f}")
print(f"p-value: {result_diff[1]:.6f}")

In [None]:
# SOLUTION: ACF and PACF
fig, axes = plt.subplots(1, 2, figsize=(14, 4))
plot_acf(sales_diff, lags=20, ax=axes[0])
plot_pacf(sales_diff, lags=20, ax=axes[1])
plt.tight_layout()
plt.show()

In [None]:
# SOLUTION: ARIMA
train_size = int(len(sales_df) * 0.8)
train = sales_df['sales'][:train_size]
test = sales_df['sales'][train_size:]

print(f"Train: {len(train)} months")
print(f"Test: {len(test)} months")

# ARIMA(1,1,1), chosen from the ACF/PACF plots
model = ARIMA(train, order=(1, 1, 1))
fitted = model.fit()

print(f"\nAIC: {fitted.aic:.2f}")
print(f"BIC: {fitted.bic:.2f}")

# Forecast over the test horizon
forecast = fitted.forecast(steps=len(test))

# Metrics
rmse = np.sqrt(mean_squared_error(test, forecast))
mape = np.mean(np.abs((test - forecast) / test)) * 100

print(f"\n=== Metrics ===")
print(f"RMSE: ${rmse:.2f}")
print(f"MAPE: {mape:.2f}%")

# Visualization
plt.figure(figsize=(14, 6))
plt.plot(train.index, train, label='Train', color='blue')
plt.plot(test.index, test, label='Test (actual)', color='green', marker='o')
plt.plot(test.index, forecast, label='ARIMA forecast', color='red', linestyle='--', marker='x')
plt.axvline(train.index[-1], color='black', linestyle=':', label='Train/Test Split')
plt.title('Monthly Sales Forecast - ARIMA(1,1,1)', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Sales ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

---

## Exercise 2: Forecasting Electricity Consumption with an LSTM

**Context:** Predict hourly electricity consumption using multivariate features.

**Objectives:**
1. Build a multivariate series (consumption, temperature, hour of day, day of week)
2. Feature engineering (lags, rolling statistics)
3. Create sliding windows
4. Train an LSTM
5. Evaluate the predictions
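
To make objective 3 concrete, the sketch below builds sliding windows over a toy multivariate series using plain Python lists. The function name `make_windows` and the toy values are hypothetical. Each sample pairs `window_size` consecutive feature rows with the target value of the step that immediately follows the window.

```python
def make_windows(features, target, window_size):
    """Pair each block of `window_size` consecutive rows with the next target value."""
    X, y = [], []
    for i in range(len(features) - window_size):
        X.append(features[i:i + window_size])  # rows i .. i+window_size-1
        y.append(target[i + window_size])      # value right after the window
    return X, y


# Toy series: 6 time steps, 2 features per step (hypothetical values)
features = [[t, 10 * t] for t in range(6)]
target = [100 + t for t in range(6)]

X, y = make_windows(features, target, window_size=3)

print(len(X), len(X[0]), len(X[0][0]))  # 3 3 2  -> (samples, window, features)
print(X[0], "->", y[0])                 # [[0, 0], [1, 10], [2, 20]] -> 103
```

A series of length `n` yields `n - window_size` samples, and no window ever contains the value it is asked to predict.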

---

### Data

In [None]:
# TODO: Generate hourly data over 30 days
# Features:
# - temperature: 15-30 °C with a daily cycle
# - hour: 0-23
# - weekday: 0-6
# - consumption: correlated with temperature and hour (daytime peak)

### Feature Engineering and Preprocessing

In [None]:
# TODO: Create additional features
# - Consumption lags (lag 1, 24, 168)
# - Rolling mean (24 h window)
# - Cyclical features for hour (sin/cos)

### Multivariate LSTM

In [None]:
# TODO: Create sliding windows over the multivariate features
# TODO: Normalize the data
# TODO: Define the LSTM architecture
# TODO: Train
# TODO: Evaluate

---

## SOLUTION Exercise 2

---

In [None]:
# SOLUTION: Data generation
def generate_energy_consumption(n_hours=30*24, seed=42):
    np.random.seed(seed)
    
    dates = pd.date_range(start='2024-01-01', periods=n_hours, freq='H')
    
    # Time features, converted to NumPy arrays so that the masked assignment
    # on hour_effect below works (a pandas Index is immutable)
    hour = dates.hour.to_numpy()
    weekday = dates.weekday.to_numpy()
    
    # Temperature (daily cycle + noise)
    t = np.arange(n_hours)
    temp = 20 + 5 * np.sin(2 * np.pi * t / 24 - np.pi/2) + np.random.normal(0, 1, n_hours)
    
    # Consumption (depends on hour and temperature):
    # base + daytime peak (positive from 6:00 to 18:00, highest at noon) + temperature effect + weekend effect
    consumption_base = 5000
    hour_effect = 1500 * np.sin(2 * np.pi * (hour - 6) / 24)
    hour_effect[hour_effect < 0] = 0
    temp_effect = 50 * (temp - 20)
    weekend_effect = -500 * ((weekday == 5) | (weekday == 6))
    noise = np.random.normal(0, 200, n_hours)
    
    consumption = consumption_base + hour_effect + temp_effect + weekend_effect + noise
    
    df = pd.DataFrame({
        'date': dates,
        'temperature': temp,
        'hour': hour,
        'weekday': weekday,
        'consumption': consumption
    })
    df.set_index('date', inplace=True)
    
    return df

energy_df = generate_energy_consumption(n_hours=30*24)

print(f"Generated data: {len(energy_df)} hours")
print(energy_df.head(10))

# Visualization
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

axes[0].plot(energy_df.index, energy_df['consumption'], color='blue', linewidth=0.8)
axes[0].set_title('Hourly Electricity Consumption', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Consumption (kWh)')
axes[0].grid(True, alpha=0.3)

axes[1].plot(energy_df.index, energy_df['temperature'], color='red', linewidth=0.8)
axes[1].set_title('Temperature', fontsize=12)
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Temperature (°C)')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# SOLUTION: Feature Engineering
df_features = energy_df.copy()

# Lags (the 168-hour lag from the exercise statement is not used in this solution:
# with only 720 hourly rows it would discard a full week of data)
df_features['consumption_lag1'] = df_features['consumption'].shift(1)
df_features['consumption_lag24'] = df_features['consumption'].shift(24)

# Rolling mean (24 h)
df_features['consumption_rolling_mean_24'] = df_features['consumption'].rolling(window=24).mean()

# Cyclical encoding of the hour
df_features['hour_sin'] = np.sin(2 * np.pi * df_features['hour'] / 24)
df_features['hour_cos'] = np.cos(2 * np.pi * df_features['hour'] / 24)

# Drop the rows left with NaN by the lags and the rolling window
df_features = df_features.dropna()

print(f"Features created: {df_features.shape[1]} columns")
print(df_features.columns.tolist())
print(f"\nData after feature engineering: {len(df_features)} hours")

In [None]:
# SOLUTION: Preparing the data for the LSTM
# Features: every column except consumption (the target)
feature_cols = ['temperature', 'hour', 'weekday', 'consumption_lag1', 
                'consumption_lag24', 'consumption_rolling_mean_24', 
                'hour_sin', 'hour_cos']

X = df_features[feature_cols].values
y = df_features['consumption'].values

# Train/Val/Test split
train_size = int(len(X) * 0.7)
val_size = int(len(X) * 0.15)

X_train = X[:train_size]
y_train = y[:train_size]
X_val = X[train_size:train_size+val_size]
y_val = y[train_size:train_size+val_size]
X_test = X[train_size+val_size:]
y_test = y[train_size+val_size:]

# Normalization (scalers are fitted on the training split only)
scaler_X = MinMaxScaler()
scaler_y = MinMaxScaler()

X_train_scaled = scaler_X.fit_transform(X_train)
X_val_scaled = scaler_X.transform(X_val)
X_test_scaled = scaler_X.transform(X_test)

y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).flatten()
y_val_scaled = scaler_y.transform(y_val.reshape(-1, 1)).flatten()
y_test_scaled = scaler_y.transform(y_test.reshape(-1, 1)).flatten()

# Sliding windows
def create_sequences_multivariate(X, y, window_size):
    X_seq, y_seq = [], []
    for i in range(len(X) - window_size):
        X_seq.append(X[i:i+window_size])
        y_seq.append(y[i+window_size])
    return np.array(X_seq), np.array(y_seq)

WINDOW_SIZE = 24  # 24h

X_train_seq, y_train_seq = create_sequences_multivariate(X_train_scaled, y_train_scaled, WINDOW_SIZE)
X_val_seq, y_val_seq = create_sequences_multivariate(X_val_scaled, y_val_scaled, WINDOW_SIZE)
X_test_seq, y_test_seq = create_sequences_multivariate(X_test_scaled, y_test_scaled, WINDOW_SIZE)

print(f"X_train_seq shape: {X_train_seq.shape}  # (samples, window, features)")
print(f"y_train_seq shape: {y_train_seq.shape}")

# Tensors
X_train_t = torch.FloatTensor(X_train_seq)
y_train_t = torch.FloatTensor(y_train_seq)
X_val_t = torch.FloatTensor(X_val_seq)
y_val_t = torch.FloatTensor(y_val_seq)
X_test_t = torch.FloatTensor(X_test_seq)
y_test_t = torch.FloatTensor(y_test_seq)

# DataLoaders
train_dataset = TensorDataset(X_train_t, y_train_t)
val_dataset = TensorDataset(X_val_t, y_val_t)
test_dataset = TensorDataset(X_test_t, y_test_t)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

In [None]:
# SOLUTION: Multivariate LSTM model
class LSTMMultivariate(nn.Module):
    def __init__(self, input_size, hidden_size=64, num_layers=2, dropout=0.2):
        super(LSTMMultivariate, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, 
                           batch_first=True, dropout=dropout if num_layers > 1 else 0)
        self.fc = nn.Linear(hidden_size, 1)
    
    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        last_output = lstm_out[:, -1, :]  # output at the last time step
        output = self.fc(last_output)
        return output.squeeze(-1)  # (batch, 1) -> (batch,), also safe for a batch of size 1

model = LSTMMultivariate(input_size=len(feature_cols), hidden_size=64, num_layers=2).to(device)

print(model)
print(f"\nParameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")

In [None]:
# SOLUTION: Training
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

num_epochs = 50
train_losses = []
val_losses = []

for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    
    for batch_X, batch_y in train_loader:
        batch_X = batch_X.to(device)
        batch_y = batch_y.to(device)
        
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        train_loss += loss.item()
    
    train_loss /= len(train_loader)
    train_losses.append(train_loss)
    
    # Validation
    model.eval()
    val_loss = 0.0
    
    with torch.no_grad():
        for batch_X, batch_y in val_loader:
            batch_X = batch_X.to(device)
            batch_y = batch_y.to(device)
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            val_loss += loss.item()
    
    val_loss /= len(val_loader)
    val_losses.append(val_loss)
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Train: {train_loss:.6f}, Val: {val_loss:.6f}")

print("\n✅ Training complete")

In [None]:
# SOLUTION: Evaluation
model.eval()
predictions = []
actuals = []

with torch.no_grad():
    for batch_X, batch_y in test_loader:
        batch_X = batch_X.to(device)
        outputs = model(batch_X)
        predictions.extend(outputs.cpu().numpy())
        actuals.extend(batch_y.numpy())

predictions = np.array(predictions)
actuals = np.array(actuals)

# Inverse transform
predictions_orig = scaler_y.inverse_transform(predictions.reshape(-1, 1)).flatten()
actuals_orig = scaler_y.inverse_transform(actuals.reshape(-1, 1)).flatten()

# Metrics
mae = mean_absolute_error(actuals_orig, predictions_orig)
rmse = np.sqrt(mean_squared_error(actuals_orig, predictions_orig))
mape = np.mean(np.abs((actuals_orig - predictions_orig) / actuals_orig)) * 100

print("\n=== Multivariate LSTM Metrics ===")
print(f"MAE:  {mae:.2f} kWh")
print(f"RMSE: {rmse:.2f} kWh")
print(f"MAPE: {mape:.2f}%")

# Visualization
plt.figure(figsize=(14, 6))
plt.plot(actuals_orig[:200], label='Actual', color='green', linewidth=1.5)
plt.plot(predictions_orig[:200], label='LSTM predictions', color='red', linestyle='--', alpha=0.8)
plt.title('Electricity Consumption Forecast - Multivariate LSTM', fontsize=14, fontweight='bold')
plt.xlabel('Hours')
plt.ylabel('Consumption (kWh)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

---

## Exercise 3: Anomaly Detection in a Time Series

**Context:** Monitor bank transactions and detect fraud.

**Objectives:**
1. Generate a transaction series with injected anomalies
2. Train an LSTM to predict normal transactions
3. Detect anomalies from the prediction error
4. Evaluate with precision/recall when labels are available
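
Objectives 3 and 4 can be illustrated on a toy example before involving any model. The sketch below flags every point whose absolute prediction error exceeds a threshold, then computes precision and recall by hand. The errors, labels, threshold value, and the helper name `precision_recall` are all hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical absolute prediction errors and ground-truth labels
errors = [0.1, 0.2, 3.0, 0.1, 0.9, 2.5, 0.2, 0.1]
labels = [0, 0, 1, 0, 0, 1, 1, 0]
threshold = 0.8  # arbitrary value chosen for the illustration

flags = [1 if e > threshold else 0 for e in errors]
precision, recall = precision_recall(labels, flags)

print(flags)                        # [0, 0, 1, 0, 1, 1, 0, 0]
print(f"Precision: {precision:.3f}")  # 0.667 (one false alarm)
print(f"Recall:    {recall:.3f}")     # 0.667 (one missed anomaly)
```

Raising the threshold above 0.9 would remove the false alarm and push precision to 1, while recall would stay at 2/3 because the anomaly with a small error (0.2) is still missed.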

---

## SOLUTION Exercise 3

---

In [None]:
# SOLUTION: Generation with anomalies
def generate_transactions_with_anomalies(n=1000, anomaly_rate=0.05, seed=42):
    np.random.seed(seed)
    
    # Normal transactions (daily pattern + noise)
    t = np.arange(n)
    base = 1000
    daily_pattern = 300 * np.sin(2 * np.pi * t / 24)
    noise = np.random.normal(0, 50, n)
    transactions = base + daily_pattern + noise
    
    # Inject anomalies (extreme values)
    n_anomalies = int(n * anomaly_rate)
    anomaly_indices = np.random.choice(n, size=n_anomalies, replace=False)
    
    labels = np.zeros(n, dtype=int)
    for idx in anomaly_indices:
        # Anomaly = transaction 3-5x larger than normal
        transactions[idx] *= np.random.uniform(3, 5)
        labels[idx] = 1
    
    dates = pd.date_range(start='2024-01-01', periods=n, freq='H')
    
    df = pd.DataFrame({
        'date': dates,
        'amount': transactions,
        'is_anomaly': labels
    })
    df.set_index('date', inplace=True)
    
    return df

transactions_df = generate_transactions_with_anomalies(n=1000, anomaly_rate=0.05)

print(f"Generated transactions: {len(transactions_df)}")
print(f"Anomalies: {int(transactions_df['is_anomaly'].sum())} ({100*transactions_df['is_anomaly'].mean():.1f}%)")

# Visualization
plt.figure(figsize=(14, 6))
normal = transactions_df[transactions_df['is_anomaly'] == 0]
anomalies = transactions_df[transactions_df['is_anomaly'] == 1]

plt.plot(normal.index, normal['amount'], color='blue', linewidth=0.8, label='Normal')
plt.scatter(anomalies.index, anomalies['amount'], color='red', s=100, marker='x', 
           label=f'Anomalies ({len(anomalies)})', zorder=5)
plt.title('Bank Transactions with Anomalies', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Amount ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
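# SOLUTION: Data preparation
data = transactions_df['amount'].values
labels = transactions_df['is_anomaly'].values

# Normalization: fit the scaler on the first 70% of the series (the training
# portion) only, then apply it to the whole series
n_fit = int(len(data) * 0.7)
scaler = MinMaxScaler()
scaler.fit(data[:n_fit].reshape(-1, 1))
data_scaled = scaler.transform(data.reshape(-1, 1)).flatten()

# Windows
WINDOW_SIZE = 24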

def create_sequences(data, labels, window_size):
    X, y, y_labels = [], [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i+window_size])
        y.append(data[i+window_size])
        y_labels.append(labels[i+window_size])
    return np.array(X), np.array(y), np.array(y_labels)

X, y, y_labels = create_sequences(data_scaled, labels, WINDOW_SIZE)

# Split (70/30)
train_size = int(len(X) * 0.7)
X_train = X[:train_size]
y_train = y[:train_size]
X_test = X[train_size:]
y_test = y[train_size:]
y_test_labels = y_labels[train_size:]

# Tensors
X_train_t = torch.FloatTensor(X_train).unsqueeze(-1)
y_train_t = torch.FloatTensor(y_train)
X_test_t = torch.FloatTensor(X_test).unsqueeze(-1)
y_test_t = torch.FloatTensor(y_test)

train_loader = DataLoader(TensorDataset(X_train_t, y_train_t), batch_size=32, shuffle=True)
test_loader = DataLoader(TensorDataset(X_test_t, y_test_t), batch_size=32, shuffle=False)

print(f"Train: {len(X_train)}, Test: {len(X_test)}")

In [None]:
# SOLUTION: LSTM for anomaly detection
class LSTMAnomaly(nn.Module):
    def __init__(self, input_size=1, hidden_size=32, num_layers=1):
        super(LSTMAnomaly, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)
    
    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        output = self.fc(lstm_out[:, -1, :])
        return output.squeeze(-1)  # (batch, 1) -> (batch,), also safe for a batch of size 1

anomaly_model = LSTMAnomaly(input_size=1, hidden_size=32).to(device)

# Training
criterion = nn.MSELoss()
optimizer = optim.Adam(anomaly_model.parameters(), lr=0.001)

num_epochs = 30
for epoch in range(num_epochs):
    anomaly_model.train()
    train_loss = 0.0
    
    for batch_X, batch_y in train_loader:
        batch_X = batch_X.to(device)
        batch_y = batch_y.to(device)
        
        outputs = anomaly_model(batch_X)
        loss = criterion(outputs, batch_y)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        train_loss += loss.item()
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {train_loss/len(train_loader):.6f}")

print("\n✅ Training complete")

In [None]:
# SOLUTION: Anomaly detection
anomaly_model.eval()
predictions = []
actuals = []

with torch.no_grad():
    for batch_X, batch_y in test_loader:
        batch_X = batch_X.to(device)
        outputs = anomaly_model(batch_X)
        predictions.extend(outputs.cpu().numpy())
        actuals.extend(batch_y.numpy())

predictions = np.array(predictions)
actuals = np.array(actuals)

# Prediction error
errors = np.abs(actuals - predictions)

# Threshold (mean + 3 std of the test errors)
threshold = errors.mean() + 3 * errors.std()
detected_anomalies = errors > threshold

print(f"\n=== Anomaly Detection ===")
print(f"Threshold: {threshold:.6f}")
print(f"Detected anomalies: {detected_anomalies.sum()} / {len(detected_anomalies)}")
print(f"True anomalies (labels): {int(y_test_labels.sum())}")

# Evaluation against the ground-truth labels
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

# Cast both arrays to int so the metrics receive the same label type
y_true = y_test_labels.astype(int)
y_pred = detected_anomalies.astype(int)

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print(f"\nPrecision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1-Score:  {f1:.3f}")

print(f"\nConfusion matrix:")
print(confusion_matrix(y_true, y_pred))

# Visualization
plt.figure(figsize=(14, 8))

plt.subplot(2, 1, 1)
plt.plot(actuals, label='Actual', color='blue', linewidth=1)
plt.plot(predictions, label='Predictions', color='green', linestyle='--', alpha=0.7)
true_anomalies = np.where(y_test_labels == 1)[0]
detected = np.where(detected_anomalies)[0]
plt.scatter(true_anomalies, actuals[true_anomalies], color='red', s=100, marker='o', 
           label='True anomalies', zorder=5)
plt.scatter(detected, actuals[detected], color='orange', s=50, marker='x', 
           label='Detected', zorder=4)
plt.title('Anomaly Detection - Transactions', fontsize=14, fontweight='bold')
plt.ylabel('Amount (normalized)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(2, 1, 2)
plt.plot(errors, color='purple', alpha=0.7)
plt.axhline(threshold, color='red', linestyle='--', linewidth=2, label=f'Threshold = {threshold:.4f}')
plt.scatter(detected, errors[detected], color='red', s=50, marker='x', zorder=5)
plt.title('Prediction Errors', fontsize=12)
plt.xlabel('Time step')
plt.ylabel('Absolute error')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## Conclusion of the Exercises

### Exercise 1 - ARIMA
- Decomposition and a stationarity test are essential first steps
- Differencing makes the series stationary
- (p, d, q) are selected from the ACF/PACF plots
- ARIMA works well for univariate series with linear patterns
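
The differencing step can be checked by hand on a small series. The sketch below (plain Python, hypothetical values and helper names) differences a series with a linear trend and then rebuilds it, which is what the `d=1` term of an ARIMA model does internally before and after fitting.

```python
def difference(series):
    """First-order differencing: y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Invert first-order differencing by cumulative summation."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


# Hypothetical series: linear trend of +5 per step plus an alternating +/-2 pattern
series = [100 + 5 * t + (2 if t % 2 else -2) for t in range(8)]
diffs = difference(series)

print(series)  # [98, 107, 108, 117, 118, 127, 128, 137]
print(diffs)   # [9, 1, 9, 1, 9, 1, 9]
print(undifference(series[0], diffs) == series)  # True
```

The original series keeps growing, while the differenced series oscillates around a constant level (about 5, the slope of the trend), which is the behavior a stationarity test looks for.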

### Exercise 2 - Multivariate LSTM
- Feature engineering is crucial (lags, rolling statistics, cyclical features)
- The LSTM captures complex dependencies between features
- Appropriate normalization and windowing matter
- Well suited to forecasting with exogenous variables
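
The value of cyclical features is easiest to see with distances. In the sketch below (standard library only, helper names hypothetical), the raw hour puts 23:00 and 00:00 at opposite ends of the scale, whereas the sin/cos encoding places them as close together as any other pair of consecutive hours.

```python
import math


def encode_hour(hour):
    """Map an hour (0-23) to a point on the unit circle."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


def distance(a, b):
    """Euclidean distance between two encoded hours."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


d_23_0 = distance(encode_hour(23), encode_hour(0))    # across midnight
d_11_12 = distance(encode_hour(11), encode_hour(12))  # ordinary consecutive hours
d_0_12 = distance(encode_hour(0), encode_hour(12))    # opposite sides of the day

print(f"23h vs 0h:  {d_23_0:.4f}")   # about 0.2611
print(f"11h vs 12h: {d_11_12:.4f}")  # about 0.2611
print(f"0h vs 12h:  {d_0_12:.4f}")   # 2.0000
```

Both coordinates are needed: with the sine alone, 00:00 and 12:00 would map to the same value.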

### Exercise 3 - Anomaly Detection
- The model learns the normal pattern (here the training split still contains the injected anomalies, about 5% of the points)
- Anomalies = large prediction errors
- The threshold is derived from the error distribution
- The threshold sets the precision/recall trade-off
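
One caveat of a mean + 3 std threshold is that the anomalies themselves inflate both the mean and the standard deviation. The sketch below compares it with a median + MAD rule, a different technique that the solution does not use, shown here only as a possible alternative. The error values are hypothetical, and the factor 1.4826 is the usual constant that makes the MAD comparable to a standard deviation under a normal distribution.

```python
import statistics


def mean_std_threshold(errors, k=3.0):
    """Mean + k population standard deviations (same rule as errors.std() in NumPy)."""
    return statistics.fmean(errors) + k * statistics.pstdev(errors)


def median_mad_threshold(errors, k=3.0):
    """Median + k scaled median absolute deviations (robust to outliers)."""
    med = statistics.median(errors)
    mad = statistics.median([abs(e - med) for e in errors])
    return med + k * 1.4826 * mad


# Hypothetical absolute errors: 8 normal points and 2 obvious anomalies
errors = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.11, 2.0, 2.2]

t_std = mean_std_threshold(errors)
t_mad = median_mad_threshold(errors)

flagged_std = [e for e in errors if e > t_std]
flagged_mad = [e for e in errors if e > t_mad]

print(f"mean + 3 std:   threshold {t_std:.3f}, flagged {flagged_std}")
print(f"median + 3 MAD: threshold {t_mad:.3f}, flagged {flagged_mad}")
```

On this toy data the mean + 3 std threshold lands near 2.9 and flags nothing, while the median + MAD threshold stays near 0.18 and flags both large errors. With a much smaller share of anomalies the two rules behave more similarly.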

**General key points:**
- Always visualize the data first
- Use a time series split (never classic shuffled cross-validation)
- Fit normalization on the training set only
- Use appropriate metrics (MAE, RMSE, MAPE)
- Choose the model according to the nature of the data
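
The time series split mentioned above can be sketched as an expanding window: each fold trains on everything before its test block, so the model never sees the future. The generator below is a hypothetical helper written in plain Python for illustration; scikit-learn's `TimeSeriesSplit` serves the same purpose in practice.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices); each test block follows its training data."""
    for k in range(n_splits):
        test_end = n_samples - (n_splits - 1 - k) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for this many splits")
        yield list(range(test_start)), list(range(test_start, test_end))


splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))

for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  ->  test {test_idx}")
# train 0..3  ->  test [4, 5]
# train 0..5  ->  test [6, 7]
# train 0..7  ->  test [8, 9]
```

Any scaler would then be fitted inside each fold on the training indices only, in line with the normalization rule in the list above.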