# 🚀 Google Colab Setup

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ogautier1980/sandbox-ml/blob/main/cours/11_series_temporelles/11_demo_arima_prophet.ipynb)

**If you are running this notebook on Google Colab**, run the following cell to install the dependencies.

In [None]:
# Install dependencies (Google Colab only)
import sys

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print('📦 Installing packages...')

    # Core ML packages
    !pip install -q numpy pandas matplotlib seaborn scikit-learn

    # Detect the chapter and install its specific dependencies
    notebook_name = '11_demo_arima_prophet.ipynb'  # Will be replaced automatically

    # Ch 06-08: Deep Learning
    if any(x in notebook_name for x in ['06_', '07_', '08_']):
        !pip install -q torch torchvision torchaudio

    # Ch 08: NLP
    if '08_' in notebook_name:
        !pip install -q transformers datasets tokenizers
        if 'rag' in notebook_name:
            !pip install -q sentence-transformers faiss-cpu rank-bm25

    # Ch 09: Reinforcement Learning
    if '09_' in notebook_name:
        !pip install -q gymnasium[classic-control]

    # Ch 04: Boosting
    if '04_' in notebook_name and 'boosting' in notebook_name:
        !pip install -q xgboost lightgbm catboost

    # Ch 05: Advanced clustering
    if '05_' in notebook_name:
        !pip install -q umap-learn

    # Ch 11: Time series
    if '11_' in notebook_name:
        !pip install -q statsmodels prophet

    # Ch 12: Advanced vision
    if '12_' in notebook_name:
        !pip install -q ultralytics timm segmentation-models-pytorch

    # Ch 13: Recommendation
    if '13_' in notebook_name:
        !pip install -q scikit-surprise implicit

    # Ch 14: MLOps
    if '14_' in notebook_name:
        !pip install -q mlflow fastapi pydantic

    print('✅ Installation complete!')
else:
    print('ℹ️  Local environment detected, packages are already installed.')

# Chapter 11 - Time Series: ARIMA & Prophet

**Objectives:**
- Analyze and visualize time series
- Test for stationarity (ADF test)
- Decompose a series into trend, seasonality, and residual
- Model with ARIMA and SARIMA
- Use Facebook's Prophet for forecasting
- Evaluate forecasts with appropriate metrics

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Time series
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Prophet
try:
    from prophet import Prophet
    PROPHET_AVAILABLE = True
except ImportError:
    print("Prophet is not installed. To install it: pip install prophet")
    PROPHET_AVAILABLE = False

# Metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

print("Libraries loaded successfully")

## 1. Generating Synthetic Data

Let's create a time series with a trend, seasonality, and noise.

In [None]:
def generate_timeseries(n=365*3, trend_coef=0.1, seasonal_period=365, noise_std=10, seed=42):
    """
    Generate a time series with trend, seasonality, and noise

    Parameters:
    - n: number of points
    - trend_coef: linear trend coefficient
    - seasonal_period: period of the seasonality
    - noise_std: standard deviation of the noise
    - seed: random seed, for reproducibility
    """
    np.random.seed(seed)

    # Dates
    dates = pd.date_range(start='2021-01-01', periods=n, freq='D')

    # Linear trend
    t = np.arange(n)
    trend = trend_coef * t

    # Seasonality (annual when seasonal_period=365)
    seasonality = 20 * np.sin(2 * np.pi * t / seasonal_period)

    # Noise
    noise = np.random.normal(0, noise_std, n)

    # Full series
    y = 100 + trend + seasonality + noise
    
    # DataFrame
    df = pd.DataFrame({
        'date': dates,
        'value': y,
        'trend': 100 + trend,
        'seasonality': seasonality,
        'noise': noise
    })
    df.set_index('date', inplace=True)
    
    return df

# Generate the data
df = generate_timeseries(n=365*3, trend_coef=0.05, seasonal_period=365, noise_std=8)

print(f"Time series generated: {len(df)} observations")
print(f"Period: {df.index.min()} to {df.index.max()}")
print("\nFirst observations:")
print(df.head())

In [None]:
# Visualization
fig, axes = plt.subplots(4, 1, figsize=(14, 12))

# Full series
axes[0].plot(df.index, df['value'], label='Time series', color='blue')
axes[0].set_title('Full Time Series', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Value')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Trend
axes[1].plot(df.index, df['trend'], label='Trend', color='green')
axes[1].set_title('Trend Component', fontsize=12)
axes[1].set_ylabel('Trend')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Seasonality
axes[2].plot(df.index, df['seasonality'], label='Seasonality', color='orange')
axes[2].set_title('Seasonal Component', fontsize=12)
axes[2].set_ylabel('Seasonality')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

# Noise
axes[3].plot(df.index, df['noise'], label='Noise', color='red', alpha=0.5)
axes[3].set_title('Noise Component', fontsize=12)
axes[3].set_ylabel('Noise')
axes[3].set_xlabel('Date')
axes[3].legend()
axes[3].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 2. Exploratory Analysis

### 2.1 Descriptive Statistics

In [None]:
print("=== Descriptive Statistics ===")
print(df['value'].describe())
print(f"\nSkewness: {df['value'].skew():.4f}")
print(f"Kurtosis: {df['value'].kurtosis():.4f}")

### 2.2 Decomposing the Series

In [None]:
# Additive decomposition
decomposition = seasonal_decompose(df['value'], model='additive', period=365)

fig, axes = plt.subplots(4, 1, figsize=(14, 10))

decomposition.observed.plot(ax=axes[0], title='Observed Series', color='blue')
axes[0].set_ylabel('Observed')
axes[0].grid(True, alpha=0.3)

decomposition.trend.plot(ax=axes[1], title='Trend', color='green')
axes[1].set_ylabel('Trend')
axes[1].grid(True, alpha=0.3)

decomposition.seasonal.plot(ax=axes[2], title='Seasonality', color='orange')
axes[2].set_ylabel('Seasonality')
axes[2].grid(True, alpha=0.3)

decomposition.resid.plot(ax=axes[3], title='Residuals', color='red')
axes[3].set_ylabel('Residuals')
axes[3].set_xlabel('Date')
axes[3].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
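
As a rough illustration of what an additive decomposition does, the sketch below estimates the trend with a centered moving average and the seasonal component by averaging the detrended values at each position in the cycle. It is a simplified stand-in, not the `seasonal_decompose` implementation; the helper name `naive_additive_decompose` and the toy series (period 12) are assumptions made for the example.

```python
import numpy as np

def naive_additive_decompose(y, period):
    """Hypothetical helper: split y into trend + seasonal + residual."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Centered moving average; an even period needs half weights at both ends
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # the edges stay NaN, as no full window fits there
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Average the detrended values at each position in the cycle
    detrended = y - trend
    seasonal_means = np.array(
        [np.nanmean(detrended[i::period]) for i in range(period)]
    )
    seasonal_means -= seasonal_means.mean()  # center the seasonal pattern on zero
    seasonal = np.tile(seasonal_means, n // period + 1)[:n]
    resid = y - trend - seasonal
    return trend, seasonal, resid

# Toy series: linear trend plus a sine wave of period 12, no noise
t = np.arange(120)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
trend, seasonal, resid = naive_additive_decompose(y, period=12)
```

On this noise-free toy series the moving average recovers the linear trend exactly (away from the edges) and the residual is numerically zero; with noisy data the residual absorbs whatever the two other components do not explain.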

### 2.3 Stationarity Test (ADF)

In [None]:
def adf_test(timeseries, name=''):
    """
    Augmented Dickey-Fuller test for stationarity
    """
    print(f"\n=== ADF Test {name} ===")
    result = adfuller(timeseries.dropna())

    print(f"ADF Statistic: {result[0]:.6f}")
    print(f"p-value: {result[1]:.6f}")
    print(f"Lags used: {result[2]}")
    print(f"Number of observations: {result[3]}")

    print("\nCritical values:")
    for key, value in result[4].items():
        print(f"  {key}: {value:.3f}")

    if result[1] < 0.05:
        print("\n✅ Result: series is STATIONARY (reject H0, p < 0.05)")
    else:
        print("\n❌ Result: series is NON-STATIONARY (fail to reject H0, p >= 0.05)")

    return result

# Test on the original series
adf_result = adf_test(df['value'], name='Original Series')
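
For intuition about what the ADF statistic measures, here is a minimal sketch of the simplest (non-augmented) Dickey-Fuller regression with a constant: regress the first difference on the lagged level and take the t-statistic of the slope. `adfuller` additionally includes lagged differences and derives p-values from the Dickey-Fuller distribution, which this sketch does not attempt; the helper name `df_statistic` is hypothetical.

```python
import numpy as np

def df_statistic(y):
    """Hypothetical helper: t-statistic of gamma in dy_t = alpha + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
noise = rng.normal(size=500)
white_noise = noise              # stationary
random_walk = np.cumsum(noise)   # has a unit root

stat_wn = df_statistic(white_noise)
stat_rw = df_statistic(random_walk)
print(f"white noise: {stat_wn:.2f}, random walk: {stat_rw:.2f}")
# The asymptotic 5% critical value for this regression is about -2.86:
# a statistic well below it rejects the unit-root hypothesis.
```

The white-noise statistic is strongly negative, so the unit-root hypothesis is rejected, whereas a random walk typically yields a value much closer to zero.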

### 2.4 Making the Series Stationary by Differencing

In [None]:
# First difference
df['value_diff1'] = df['value'].diff()

# Second difference (if needed)
df['value_diff2'] = df['value_diff1'].diff()

# Visualization
fig, axes = plt.subplots(3, 1, figsize=(14, 10))

axes[0].plot(df.index, df['value'], color='blue')
axes[0].set_title('Original Series', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Value')
axes[0].grid(True, alpha=0.3)

axes[1].plot(df.index, df['value_diff1'], color='green')
axes[1].set_title('After First Differencing', fontsize=12)
axes[1].set_ylabel('Difference 1')
axes[1].grid(True, alpha=0.3)

axes[2].plot(df.index, df['value_diff2'], color='orange')
axes[2].set_title('After Second Differencing', fontsize=12)
axes[2].set_ylabel('Difference 2')
axes[2].set_xlabel('Date')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# ADF tests
adf_test(df['value_diff1'], name='After First Differencing')
adf_test(df['value_diff2'], name='After Second Differencing')
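
Why differencing helps: the first difference of a linear trend is a constant, and a second difference does the same for a quadratic trend. A minimal sketch on toy data (values chosen for the example), which also shows that a cumulative sum undoes the first difference:

```python
import numpy as np

t = np.arange(10, dtype=float)
linear = 100 + 0.5 * t
quadratic = 100 + 0.5 * t + 0.1 * t**2

d1_linear = np.diff(linear)              # constant slope: 0.5 everywhere
d2_quadratic = np.diff(quadratic, n=2)   # constant: 2 * 0.1 = 0.2 everywhere

# Invert the first difference: start value + cumulative sum of the differences
rebuilt = np.r_[linear[0], linear[0] + np.cumsum(d1_linear)]
```

Unlike `Series.diff()`, which keeps the original length and puts `NaN` in the first position, `np.diff` returns an array that is one element shorter per differencing step.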

### 2.5 ACF and PACF

In [None]:
# ACF and PACF of the differenced series
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# ACF
plot_acf(df['value_diff1'].dropna(), lags=40, ax=axes[0])
axes[0].set_title('Autocorrelation (ACF)', fontsize=12, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# PACF
plot_pacf(df['value_diff1'].dropna(), lags=40, ax=axes[1])
axes[1].set_title('Partial Autocorrelation (PACF)', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
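
The sample autocorrelation that `plot_acf` displays can be computed by hand. The sketch below does so for a simulated AR(1) process with coefficient 0.7 (an arbitrary choice for the example), whose theoretical ACF decays as 0.7^k; the helper name `sample_acf` is hypothetical.

```python
import numpy as np

def sample_acf(x, nlags):
    """Hypothetical helper: sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([1.0] + [x[k:] @ x[:-k] / denom for k in range(1, nlags + 1)])

# Simulate an AR(1) process: x_t = phi * x_{t-1} + eps_t
rng = np.random.default_rng(42)
phi, n = 0.7, 5000
eps = rng.normal(size=n)
x = np.zeros(n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + eps[i]

acf_values = sample_acf(x, nlags=5)
print(np.round(acf_values, 2))  # should decay roughly geometrically from 1
```

As a rule of thumb when reading the two plots: for an AR(p) process the PACF cuts off after lag p while the ACF decays gradually, and for an MA(q) process the ACF cuts off after lag q.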

## 3. ARIMA Model

### 3.1 Selecting the Parameters (p, d, q)

In [None]:
# Train/Test Split (80/20)
train_size = int(len(df) * 0.8)
train = df['value'][:train_size]
test = df['value'][train_size:]

print(f"Train size: {len(train)} observations")
print(f"Test size: {len(test)} observations")
print(f"Train period: {train.index.min()} to {train.index.max()}")
print(f"Test period: {test.index.min()} to {test.index.max()}")

In [None]:
# Grid search to find the best ARIMA parameters
def arima_grid_search(train_data, p_range, d_range, q_range):
    """
    Grid search that selects the ARIMA order with the lowest AIC
    """
    best_aic = np.inf
    best_params = None
    best_model = None
    
    results = []
    
    for p in p_range:
        for d in d_range:
            for q in q_range:
                try:
                    model = ARIMA(train_data, order=(p, d, q))
                    fitted = model.fit()
                    aic = fitted.aic
                    
                    results.append({
                        'p': p, 'd': d, 'q': q,
                        'AIC': aic,
                        'BIC': fitted.bic
                    })
                    
                    if aic < best_aic:
                        best_aic = aic
                        best_params = (p, d, q)
                        best_model = fitted
                        
                except Exception:
                    # Skip orders that fail to fit
                    continue
    
    return best_params, best_model, pd.DataFrame(results)

# Search
print("Searching for the best ARIMA parameters...")
best_params, best_model, results_df = arima_grid_search(
    train,
    p_range=range(0, 4),
    d_range=range(0, 3),
    q_range=range(0, 4)
)

print(f"\nBest parameters: ARIMA{best_params}")
print(f"AIC: {best_model.aic:.2f}")
print(f"BIC: {best_model.bic:.2f}")

# Top 10 models
print("\nTop 10 models (by AIC):")
print(results_df.sort_values('AIC').head(10))
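
The grid search ranks models by AIC. As a reminder of what that number trades off, here is a minimal sketch of both criteria computed from a log-likelihood, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The log-likelihood values and parameter counts below are made up for illustration and do not come from the fitted models.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_likelihood

n_obs = 876  # size of an 80% training split of 1095 points

# name -> (log-likelihood, number of estimated parameters); made-up values
candidates = {"small": (-3100.0, 3), "large": (-3098.5, 6)}

for name, (llf, k) in candidates.items():
    print(name, round(aic(llf, k), 1), round(bic(llf, k, n_obs), 1))
```

Here the larger model fits slightly better but loses on both criteria because of its three extra parameters. One caveat worth keeping in mind: AIC values are generally not considered comparable between models with different differencing orders d, so the ranking is most meaningful among candidates that share the same d.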

### 3.2 Training and Forecasting

In [None]:
# Model summary
print(best_model.summary())

In [None]:
# Forecasts on the test set
forecast = best_model.forecast(steps=len(test))

# Visualization
plt.figure(figsize=(14, 6))
plt.plot(train.index, train, label='Train', color='blue')
plt.plot(test.index, test, label='Test (Actual)', color='green')
plt.plot(test.index, forecast, label=f'ARIMA{best_params} Forecast', color='red', linestyle='--')
plt.axvline(train.index[-1], color='black', linestyle=':', label='Train/Test Split')
plt.title(f'ARIMA{best_params} Forecast', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### 3.3 Evaluation

In [None]:
def evaluate_forecast(y_true, y_pred, model_name='Model'):
    """
    Compute and print forecasting metrics
    """
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    # MAPE divides by y_true, so it assumes the actual values are never zero
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

    print(f"\n=== {model_name} Metrics ===")
    print(f"MAE (Mean Absolute Error):       {mae:.4f}")
    print(f"MSE (Mean Squared Error):        {mse:.4f}")
    print(f"RMSE (Root Mean Squared Error):  {rmse:.4f}")
    print(f"MAPE (Mean Absolute % Error):    {mape:.2f}%")
    
    return {'MAE': mae, 'MSE': mse, 'RMSE': rmse, 'MAPE': mape}

arima_metrics = evaluate_forecast(test.values, forecast.values, f'ARIMA{best_params}')
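
To make the four metrics concrete, here is a small worked example on three made-up points, computed directly with NumPy rather than through `evaluate_forecast`:

```python
import numpy as np

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

errors = y_true - y_pred                        # [-10, 20, 0]
mae = np.mean(np.abs(errors))                   # (10 + 20 + 0) / 3 = 10
mse = np.mean(errors**2)                        # (100 + 400 + 0) / 3
rmse = np.sqrt(mse)                             # penalizes the larger error more than MAE does
mape = np.mean(np.abs(errors / y_true)) * 100   # (10% + 10% + 0%) / 3

print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  MAPE={mape:.2f}%")
```

The errors of 10 and 20 count the same in MAPE because each is 10% of its actual value, while RMSE exceeds MAE because squaring weights the larger error more heavily. MAPE is undefined when an actual value is zero; the synthetic series in this notebook stays well away from zero, so this is not an issue here.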

### 3.4 Residual Diagnostics

In [None]:
# Diagnostic plot
best_model.plot_diagnostics(figsize=(14, 10))
plt.tight_layout()
plt.show()

## 4. SARIMA Model (with Seasonality)

SARIMA extends ARIMA with seasonal terms; here it is used to capture the annual seasonality.
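
The seasonal `D` in SARIMA(p,d,q)(P,D,Q,s) applies a lag-s difference, y_t - y_{t-s}, which cancels any pattern that repeats exactly every s steps. A minimal sketch on a toy series with period 12 (chosen small for readability; this notebook's data uses s = 365):

```python
import numpy as np

s = 12
t = np.arange(5 * s, dtype=float)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / s)

# Lag-s (seasonal) difference: the sine term cancels,
# and the linear trend leaves a constant 0.5 * s = 6
seasonal_diff = y[s:] - y[:-s]

# A further first difference (d = 1) removes that constant as well
both = np.diff(seasonal_diff)
```

Each seasonal difference costs s observations at the start of the series, which is one reason long periods such as 365 need several full cycles of data.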

In [None]:
# SARIMA with annual seasonality (period = 365 days)
# Parameters: SARIMA(p,d,q)(P,D,Q,s)
# Note: with s = 365 the state-space model is very large, so fitting can be slow and memory-intensive
sarima_order = (1, 1, 1)  # Non-seasonal
seasonal_order = (1, 1, 1, 365)  # Seasonal, with period 365

print(f"Training SARIMA{sarima_order}x{seasonal_order}...")

sarima_model = SARIMAX(train, 
                       order=sarima_order,
                       seasonal_order=seasonal_order)
sarima_fitted = sarima_model.fit(disp=False)

print(f"\nAIC: {sarima_fitted.aic:.2f}")
print(f"BIC: {sarima_fitted.bic:.2f}")

In [None]:
# SARIMA forecasts
sarima_forecast = sarima_fitted.forecast(steps=len(test))

# Visualization
plt.figure(figsize=(14, 6))
plt.plot(train.index, train, label='Train', color='blue')
plt.plot(test.index, test, label='Test (Actual)', color='green')
plt.plot(test.index, sarima_forecast, label=f'SARIMA{sarima_order}x{seasonal_order}', 
         color='purple', linestyle='--')
plt.axvline(train.index[-1], color='black', linestyle=':', label='Train/Test Split')
plt.title('SARIMA Forecast', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Metrics
sarima_metrics = evaluate_forecast(test.values, sarima_forecast.values, 'SARIMA')

## 5. Facebook's Prophet

A simple, robust framework for forecasting with multiple seasonalities.
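
Prophet represents a seasonal component with a Fourier series added to a piecewise-linear trend. The sketch below is not Prophet: it fits a single linear trend plus a few yearly Fourier terms by ordinary least squares on data resembling this notebook's synthetic series, purely to illustrate the idea. The helper name `fourier_features` and the Fourier order of 3 are assumptions made for the example.

```python
import numpy as np

def fourier_features(t, period, order):
    """Hypothetical helper: sin/cos pairs for harmonics 1..order."""
    cols = []
    for k in range(1, order + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

# Synthetic data: level 100, slope 0.05, yearly sine of amplitude 20, noise std 8
rng = np.random.default_rng(42)
n = 365 * 3
t = np.arange(n, dtype=float)
y = 100 + 0.05 * t + 20 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 8, n)

# Design matrix: intercept, linear trend, yearly Fourier terms
X = np.column_stack([np.ones(n), t, fourier_features(t, period=365, order=3)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope, first_sin = coef[0], coef[1], coef[2]

# Forecast 30 days ahead by evaluating the same features at future times
t_future = np.arange(n, n + 30, dtype=float)
X_future = np.column_stack(
    [np.ones(30), t_future, fourier_features(t_future, period=365, order=3)]
)
y_future = X_future @ coef
```

The fitted slope and the coefficient of the first sine term should land close to the generating values (0.05 and 20). Prophet goes further than this sketch, notably with trend changepoints, regularizing priors, and uncertainty intervals.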

In [None]:
if PROPHET_AVAILABLE:
    # Prepare the data in Prophet's format ('ds' and 'y' columns)
    train_prophet = pd.DataFrame({
        'ds': train.index,
        'y': train.values
    })

    # Create and train the model
    prophet_model = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=False,
        daily_seasonality=False,
        seasonality_mode='additive',
        changepoint_prior_scale=0.05  # Flexibility of the trend changepoints
    )

    print("Training the Prophet model...")
    prophet_model.fit(train_prophet)
    print("✅ Training complete")
else:
    print("⚠️ Prophet not available. Install with: pip install prophet")

In [None]:
if PROPHET_AVAILABLE:
    # Build the dataframe of dates to predict (history + test horizon)
    future = prophet_model.make_future_dataframe(periods=len(test), freq='D')

    # Predictions
    prophet_forecast = prophet_model.predict(future)

    # Extract the predictions for the test set
    prophet_test_pred = prophet_forecast.iloc[-len(test):]['yhat'].values

    # Prophet visualization
    fig1 = prophet_model.plot(prophet_forecast, figsize=(14, 6))
    plt.title('Prophet Forecast', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

    # Components (trend + seasonality)
    fig2 = prophet_model.plot_components(prophet_forecast, figsize=(14, 8))
    plt.tight_layout()
    plt.show()

    # Metrics
    prophet_metrics = evaluate_forecast(test.values, prophet_test_pred, 'Prophet')

## 6. Model Comparison

In [None]:
# Comparison table
comparison = pd.DataFrame({
    f'ARIMA{best_params}': arima_metrics,
    'SARIMA': sarima_metrics,
})

if PROPHET_AVAILABLE:
    comparison['Prophet'] = prophet_metrics

print("\n=== Model Comparison ===")
print(comparison.T)

# Best model (lowest RMSE)
best_model_name = comparison.loc['RMSE'].idxmin()
print(f"\n🏆 Best model (RMSE): {best_model_name}")

In [None]:
# Comparative visualization
plt.figure(figsize=(14, 7))
plt.plot(train.index, train, label='Train', color='blue', alpha=0.7)
plt.plot(test.index, test, label='Test (Actual)', color='green', linewidth=2)
plt.plot(test.index, forecast, label=f'ARIMA{best_params}', 
         color='red', linestyle='--', alpha=0.8)
plt.plot(test.index, sarima_forecast, label='SARIMA', 
         color='purple', linestyle='--', alpha=0.8)

if PROPHET_AVAILABLE:
    plt.plot(test.index, prophet_test_pred, label='Prophet', 
             color='orange', linestyle='--', alpha=0.8)

plt.axvline(train.index[-1], color='black', linestyle=':', linewidth=2, label='Train/Test Split')
plt.title('Forecast Comparison - ARIMA vs SARIMA vs Prophet', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 7. Multi-Horizon Forecasting

In [None]:
# Forecast over several horizons (7, 30, 90, 180 days)
horizons = [7, 30, 90, 180]

fig, axes = plt.subplots(2, 2, figsize=(16, 12))
axes = axes.flatten()

for idx, horizon in enumerate(horizons):
    # ARIMA forecast
    forecast_h = best_model.forecast(steps=horizon)

    # Plot
    ax = axes[idx]
    ax.plot(train.index[-180:], train.values[-180:], label='Train (last 180 days)', color='blue')

    # Future dates
    future_dates = pd.date_range(start=train.index[-1] + timedelta(days=1), periods=horizon, freq='D')
    ax.plot(future_dates, forecast_h, label=f'{horizon}-day forecast', color='red', linestyle='--', marker='o')

    ax.axvline(train.index[-1], color='black', linestyle=':', label='Forecast Start')
    ax.set_title(f'Horizon: {horizon} days', fontsize=12, fontweight='bold')
    ax.set_xlabel('Date')
    ax.set_ylabel('Value')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
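
Point forecasts at longer horizons are less certain. For the simplest case, a random walk, i.e. ARIMA(0,1,0), the h-step forecast error has standard deviation σ√h, so the 95% interval widens with the square root of the horizon. The sketch below evaluates this for the four horizons used above, with σ = 8 assumed purely for illustration; intervals for the fitted model would differ (statsmodels results expose them through `get_forecast(...).conf_int()`).

```python
import math

sigma = 8.0                    # assumed one-step error standard deviation
horizons = [7, 30, 90, 180]

# Half-width of the 95% interval for a random-walk forecast at each horizon
half_widths = {h: 1.96 * sigma * math.sqrt(h) for h in horizons}

for h, w in half_widths.items():
    print(f"h = {h:3d} days: forecast +/- {w:.1f}")
```

Under this assumption, the interval at 180 days is about five times wider than at 7 days (the ratio is the square root of 180/7), which is why a long-horizon point forecast should be read with caution.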

## Conclusion

In this notebook, we explored:

1. **Exploratory analysis**: decomposition, stationarity, ACF/PACF
2. **ARIMA**: a classical model for series made stationary by differencing
3. **SARIMA**: an extension with seasonality
4. **Prophet**: Facebook's robust forecasting framework
5. **Evaluation**: MAE, RMSE, MAPE
6. **Comparison**: performance analysis of each model

**Key points:**
- Always test for stationarity before fitting ARIMA
- Use ACF/PACF to guide the choice of (p, q)
- Use SARIMA when the seasonality is clear
- Prophet works well on real-world data with outliers and missing values
- Validate with a time series split (not classic shuffled cross-validation)
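
The last point can be sketched as follows: in time-series cross-validation each fold trains on the past and tests on the block that immediately follows, so no future information leaks into training. The generator below is a minimal expanding-window illustration with a hypothetical name (`expanding_window_splits`); scikit-learn offers `TimeSeriesSplit` for the same purpose.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Hypothetical helper: yield (train_indices, test_indices) in time order."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("Not enough samples for the requested splits")
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        train_idx = list(range(0, test_start))                       # everything before the test block
        test_idx = list(range(test_start, test_start + test_size))   # the next block in time
        yield train_idx, test_idx

splits = list(expanding_window_splits(n_samples=20, n_splits=3, test_size=4))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  ->  test {test_idx[0]}..{test_idx[-1]}")
```

Each fold's training window grows while its test block moves forward, so every model is evaluated only on observations that come after the data it was fitted on.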

**Next step**: Deep Learning for time series (LSTM, GRU)