# PIR - LSTM n°3 Hyperparameters Optimization

This notebook is used for hyperparameter optimization for the model implemented in LSTMn°3.ipynb

We aim to optimize the LSTM model's hyperparameters by using data up to June 2022 for training and validation, and the remaining part of 2022 as the test set.

## 1: Imports

In [None]:
# Custom utility functions used in the project
import utils

# Disable GPU to avoid TensorFlow conflicts
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf

import sklearn
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_percentage_error

from keras.models import Sequential
from keras.layers import Input, LSTM, Dense
from keras.optimizers import Adam
from sklearn.metrics import mean_absolute_percentage_error

2026-01-10 19:19:43.718130: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2026-01-10 19:19:43.758821: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2026-01-10 19:19:46.563444: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


The hyperparameters optimization is performed using Optuna's TPESampler, which provides an efficient and scalable alternative to grid and random search.

In [2]:
# imports
import optuna
import tqdm
from optuna.samplers import TPESampler

  from .autonotebook import tqdm as notebook_tqdm


---

## 2: Hyperparameters Optimization

In [3]:
# Load the data
x = pd.read_csv('train_f_x.csv')
y = pd.read_csv('y_train_sncf.csv')

In [None]:
# Data preparation
df_per_station_train, df_per_station_test = utils.prepare_train_data(x, y)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['station_id'] = data['station'].map(station_mapping)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['station_id'] = data['station'].map(station_mapping)


Hyperparameter tuning is conducted on a subset of 30 stations (out of 439) to significantly reduce runtime, under the assumption that optimal hyperparameters generalize across stations.

In [None]:
# Sampling
sample_size = 30
seed = 50
stations_sample = utils.sample_stations(df_per_station_train, n=sample_size, seed=seed)

### 2.1: First attempt

The first optimization attempt included the choice of the activation function between 'tanh' and 'relu'. However, the use of ReLU activation later led to gradient explosion issues, which motivated us to restrict the activation function to 'tanh' in subsequent experiment.

In [None]:
def objective(trial):
    # Hyperparameters Grid
    seq_len = trial.suggest_int("seq_len", 30, 120, step=15)
    units = trial.suggest_int("units", 20, 128)
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    activation = trial.suggest_categorical("activation", ["tanh", "relu"])
    
    total_error = 0 # Initialization

    for name_station in stations_sample:
        try:
            df_train = df_per_station_train[name_station]
            df_test = df_per_station_test[name_station]

            X_train = df_train[['job', 'ferie', 'vacances']]
            y_train = df_train['y']
            X_test = df_test[['job', 'ferie', 'vacances']]
            y_test = df_test['y']

            # Scaling
            scaler_X = MinMaxScaler()
            scaler_y = MinMaxScaler()
            X_train_scaled = scaler_X.fit_transform(X_train)
            y_train_scaled = scaler_y.fit_transform(y_train.values.reshape(-1, 1))
            X_test_scaled = scaler_X.transform(X_test)

            # Training sequences
            X_train_seq, y_train_seq = utils.create_sequences_random(
                pd.DataFrame(X_train_scaled),
                pd.DataFrame(y_train_scaled),
                seq_len
            )

            # Test_sequences
            X_test_full = np.vstack([X_train_scaled[-seq_len:], X_test_scaled])
            X_test_seq = np.array([
                X_test_full[i:i+seq_len]
                for i in range(len(X_test))
            ]) # Parenthèse fermante ajoutée ici

            # Model definition
            model = Sequential([
                LSTM(units=units, activation=activation, input_shape=(seq_len, X_train.shape[1])),
                Dense(1)
            ])
            model.compile(optimizer=Adam(learning_rate=lr), loss='mse')

            # Short training for hyperparameter optimization
            model.fit(X_train_seq, y_train_seq, epochs=10, batch_size=batch_size, verbose=0)

            # Prediction and error computation
            y_pred_scaled = model.predict(X_test_seq, verbose=0)
            y_pred = scaler_y.inverse_transform(y_pred_scaled)
            
            rmse = np.sqrt(mean_absolute_percentage_error(y_test, y_pred))
            total_error += rmse
            
        except Exception as e:
            print(f"Error for station {name_station}: {e}")
            continue

    return total_error / len(stations_sample)

[I 2026-01-10 19:24:05,580] A new study created in memory with name: no-name-02f302f7-d30f-42f2-b4f7-f0054dab2895
  super().__init__(**kwargs)
  super().__init__(**kwargs)




  super().__init__(**kwargs)




  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
Best trial: 0. Best value: 0.942126:   3%|▎         | 1/30 [05:06<2:28:05, 306.39s/it]

[I 2026-01-10 19:29:11,972] Trial 0 finished with value: 0.9421255748370522 and parameters: {'seq_len': 60, 'units': 123, 'learning_rate': 0.0029106359131330704, 'batch_size': 16, 'activation': 'relu'}. Best is trial 0 with value: 0.9421255748370522.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 19:35:31,411] Trial 1 finished with value: 1.1187995223556242 and parameters: {'seq_len': 90, 'units': 97, 'learning_rate': 0.00010994335574766199, 'batch_size': 16, 'activation': 'relu'}. Best is trial 0 with value: 0.9421255748370522.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 19:38:09,911] Trial 2 finished with value: 1.045903920675061 and parameters: {'seq_len': 60, 'units': 77, 'learning_rate': 0.0007309539835912913, 'batch_size': 32, 'activation': 'relu'}. Best is trial 0 with value: 0.9421255748370522.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 19:41:55,283] Trial 3 finished with value: 1.0453459654440935 and parameters: {'seq_len': 75, 'units': 105, 'learning_rate': 0.00025081156860452336, 'batch_size': 32, 'activation': 'tanh'}. Best is trial 0 with value: 0.9421255748370522.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 19:44:51,874] Trial 4 finished with value: 0.887891697543871 and parameters: {'seq_len': 30, 'units': 123, 'learning_rate': 0.00853618986286683, 'batch_size': 16, 'activation': 'tanh'}. Best is trial 4 with value: 0.887891697543871.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 19:47:08,023] Trial 5 finished with value: 1.1120116324992453 and parameters: {'seq_len': 30, 'units': 73, 'learning_rate': 0.00011715937392307068, 'batch_size': 16, 'activation': 'relu'}. Best is trial 4 with value: 0.887891697543871.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


Erreur station 4PO: Input contains NaN.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 19:49:34,054] Trial 6 finished with value: 0.8811674938054782 and parameters: {'seq_len': 75, 'units': 40, 'learning_rate': 0.00869299151113955, 'batch_size': 32, 'activation': 'relu'}. Best is trial 6 with value: 0.8811674938054782.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 19:50:52,947] Trial 7 finished with value: 1.2092592281048673 and parameters: {'seq_len': 30, 'units': 41, 'learning_rate': 0.00012315571723666037, 'batch_size': 32, 'activation': 'tanh'}. Best is trial 6 with value: 0.8811674938054782.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 19:52:28,926] Trial 8 finished with value: 1.1975592177856502 and parameters: {'seq_len': 45, 'units': 79, 'learning_rate': 0.00019135880487692312, 'batch_size': 64, 'activation': 'tanh'}. Best is trial 6 with value: 0.8811674938054782.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 19:54:22,792] Trial 9 finished with value: 0.8573765398983191 and parameters: {'seq_len': 30, 'units': 108, 'learning_rate': 0.002592475660475161, 'batch_size': 32, 'activation': 'tanh'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 19:56:21,316] Trial 10 finished with value: 1.0481295081923623 and parameters: {'seq_len': 120, 'units': 20, 'learning_rate': 0.0019182740262525914, 'batch_size': 64, 'activation': 'tanh'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)


Erreur station Z76: Input contains NaN.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


Erreur station O5P: Input contains NaN.


  super().__init__(**kwargs)
  super().__init__(**kwargs)


Erreur station 9IN: Input contains NaN.


  super().__init__(**kwargs)


Erreur station FK3: Input contains NaN.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 19:59:35,368] Trial 11 finished with value: 0.864229889585404 and parameters: {'seq_len': 105, 'units': 48, 'learning_rate': 0.009375135398100842, 'batch_size': 32, 'activation': 'relu'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


Erreur station FK3: Input contains NaN.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


Erreur station H1M: Input contains NaN.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:03:37,722] Trial 12 finished with value: 0.9049992742109894 and parameters: {'seq_len': 120, 'units': 58, 'learning_rate': 0.003949511766981021, 'batch_size': 32, 'activation': 'relu'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:07:11,968] Trial 13 finished with value: 0.9709983069080469 and parameters: {'seq_len': 105, 'units': 54, 'learning_rate': 0.0009140759296807803, 'batch_size': 32, 'activation': 'tanh'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


Erreur station 4PO: Input contains NaN.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


Erreur station AZV: Input contains NaN.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:11:26,672] Trial 14 finished with value: 0.8909784761126527 and parameters: {'seq_len': 90, 'units': 99, 'learning_rate': 0.0050675368554089625, 'batch_size': 32, 'activation': 'relu'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:14:00,983] Trial 15 finished with value: 0.9866942411000993 and parameters: {'seq_len': 105, 'units': 20, 'learning_rate': 0.0018572147968243416, 'batch_size': 32, 'activation': 'tanh'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:15:42,895] Trial 16 finished with value: 1.1167019425418983 and parameters: {'seq_len': 60, 'units': 60, 'learning_rate': 0.0004502794484217855, 'batch_size': 64, 'activation': 'relu'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:18:48,167] Trial 17 finished with value: 0.9485998639428864 and parameters: {'seq_len': 105, 'units': 38, 'learning_rate': 0.0017585973231673874, 'batch_size': 32, 'activation': 'tanh'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:21:20,458] Trial 18 finished with value: 0.9250176186336487 and parameters: {'seq_len': 45, 'units': 112, 'learning_rate': 0.005489884890147933, 'batch_size': 32, 'activation': 'tanh'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:24:11,198] Trial 19 finished with value: 0.9894206436413073 and parameters: {'seq_len': 90, 'units': 87, 'learning_rate': 0.003008679905667747, 'batch_size': 64, 'activation': 'relu'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:26:11,644] Trial 20 finished with value: 0.9859716348667315 and parameters: {'seq_len': 45, 'units': 67, 'learning_rate': 0.009897665586885443, 'batch_size': 32, 'activation': 'relu'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:28:43,296] Trial 21 finished with value: 0.9535107954047446 and parameters: {'seq_len': 75, 'units': 41, 'learning_rate': 0.007090806649144605, 'batch_size': 32, 'activation': 'relu'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:31:01,488] Trial 22 finished with value: 0.9338235654203529 and parameters: {'seq_len': 75, 'units': 33, 'learning_rate': 0.005265764426677396, 'batch_size': 32, 'activation': 'relu'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:34:09,219] Trial 23 finished with value: 1.000272759011749 and parameters: {'seq_len': 90, 'units': 49, 'learning_rate': 0.003485644548553413, 'batch_size': 32, 'activation': 'relu'}. Best is trial 9 with value: 0.8573765398983191.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


Erreur station DMX: Input contains NaN.


  super().__init__(**kwargs)


Erreur station 94K: Input contains NaN.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


Erreur station FK3: Input contains NaN.


  super().__init__(**kwargs)
  super().__init__(**kwargs)


Erreur station AZV: Input contains NaN.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:37:19,107] Trial 24 finished with value: 0.7504421447393268 and parameters: {'seq_len': 120, 'units': 32, 'learning_rate': 0.009866823292399145, 'batch_size': 32, 'activation': 'relu'}. Best is trial 24 with value: 0.7504421447393268.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:40:27,229] Trial 25 finished with value: 0.9330791184045873 and parameters: {'seq_len': 120, 'units': 32, 'learning_rate': 0.0015241767831214777, 'batch_size': 32, 'activation': 'relu'}. Best is trial 24 with value: 0.7504421447393268.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:43:59,611] Trial 26 finished with value: 0.8632311773220852 and parameters: {'seq_len': 105, 'units': 49, 'learning_rate': 0.006260320009195335, 'batch_size': 32, 'activation': 'tanh'}. Best is trial 24 with value: 0.7504421447393268.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:47:05,632] Trial 27 finished with value: 0.9229687447310729 and parameters: {'seq_len': 120, 'units': 28, 'learning_rate': 0.002527066432305605, 'batch_size': 32, 'activation': 'tanh'}. Best is trial 24 with value: 0.7504421447393268.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)


[I 2026-01-10 20:50:17,634] Trial 28 finished with value: 0.9482492738799326 and parameters: {'seq_len': 105, 'units': 90, 'learning_rate': 0.005728484654098207, 'batch_size': 64, 'activation': 'tanh'}. Best is trial 24 with value: 0.7504421447393268.


  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
  super().__init__(**kwargs)
Best trial: 24. Best value: 0.750442: 100%|██████████| 30/30 [1:31:31<00:00, 183.06s/it]

[I 2026-01-10 20:55:37,401] Trial 29 finished with value: 0.9671746971594827 and parameters: {'seq_len': 60, 'units': 125, 'learning_rate': 0.001216147966541517, 'batch_size': 16, 'activation': 'tanh'}. Best is trial 24 with value: 0.7504421447393268.
Meilleurs paramètres : {'seq_len': 120, 'units': 32, 'learning_rate': 0.009866823292399145, 'batch_size': 32, 'activation': 'relu'}





In [None]:
# Execution
run = 0
if run == 1:
    study = optuna.create_study(direction="minimize", sampler=TPESampler(seed=42))
    study.optimize(objective, n_trials=30, show_progress_bar=True)
else:
    print("run == 0")

In [None]:
print("Best hyperparameters :", study.best_params)

Meilleurs paramètres : {'seq_len': 120, 'units': 32, 'learning_rate': 0.009866823292399145, 'batch_size': 32, 'activation': 'relu'}


### 2.2: Second attempt

Second hyperparameter optimization attempt unsing only the tanh activation function.

In [None]:
def objective(trial):
    # Hyperparameters Grid
    seq_len = trial.suggest_int("seq_len", 70, 120, step=15)
    units = trial.suggest_int("units", 20, 128)
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])

    total_error = 0 # Initialization

    for name_station in stations_sample:
        try:
            df_train = df_per_station_train[name_station]
            df_test = df_per_station_test[name_station]

            X_train = df_train[['job', 'ferie', 'vacances']]
            y_train = df_train['y']
            X_test = df_test[['job', 'ferie', 'vacances']]
            y_test = df_test['y']

            # Scaling
            scaler_X = MinMaxScaler()
            scaler_y = MinMaxScaler()
            X_train_scaled = scaler_X.fit_transform(X_train)
            y_train_scaled = scaler_y.fit_transform(y_train.values.reshape(-1, 1))
            X_test_scaled = scaler_X.transform(X_test)

            # Training sequences
            X_train_seq, y_train_seq = utils.create_sequences_random(
                pd.DataFrame(X_train_scaled),
                pd.DataFrame(y_train_scaled),
                seq_len
            )

            # Test sequences
            X_test_full = np.vstack([X_train_scaled[-seq_len:], X_test_scaled])
            X_test_seq = np.array([
                X_test_full[i:i+seq_len]
                for i in range(len(X_test))
            ])

            # Model definition
            model = Sequential([
                Input(shape=(seq_len, X_train.shape[1])),
                LSTM(units=units, activation='tanh'),
                Dense(1)
            ])
            model.compile(optimizer=Adam(learning_rate=learning_rate), loss='mse')

            # Short training for hyperparameter optimization
            model.fit(X_train_seq, y_train_seq, epochs=10, batch_size=batch_size, verbose=0)

            # Prediction and error computation
            y_pred_scaled = model.predict(X_test_seq, verbose=0)
            y_pred = scaler_y.inverse_transform(y_pred_scaled)
            
            rmse = np.sqrt(mean_absolute_percentage_error(y_test, y_pred))
            total_error += rmse
            
        except Exception as e:
            print(f"Error for station {name_station}: {e}")
            continue

    return total_error / len(stations_sample)

[I 2026-01-10 21:36:57,874] A new study created in memory with name: no-name-ce056424-157f-4764-ae12-1d9b6c954f2b


[I 2026-01-10 21:43:58,071] Trial 0 finished with value: 0.9315012836572233 and parameters: {'seq_len': 85, 'units': 123, 'learning_rate': 0.0029106359131330704, 'batch_size': 16}. Best is trial 0 with value: 0.9315012836572233.




[I 2026-01-10 21:47:04,823] Trial 1 finished with value: 0.9615602158418163 and parameters: {'seq_len': 70, 'units': 114, 'learning_rate': 0.0015930522616241021, 'batch_size': 64}. Best is trial 0 with value: 0.9315012836572233.




[I 2026-01-10 21:49:31,994] Trial 2 finished with value: 1.2335605563603198 and parameters: {'seq_len': 115, 'units': 43, 'learning_rate': 0.0002310201887845295, 'batch_size': 64}. Best is trial 0 with value: 0.9315012836572233.




[I 2026-01-10 21:51:39,186] Trial 3 finished with value: 0.9074125672829692 and parameters: {'seq_len': 85, 'units': 51, 'learning_rate': 0.0016738085788752138, 'batch_size': 64}. Best is trial 3 with value: 0.9074125672829692.




[I 2026-01-10 21:56:07,622] Trial 4 finished with value: 1.0245908193717899 and parameters: {'seq_len': 85, 'units': 105, 'learning_rate': 0.00025081156860452336, 'batch_size': 32}. Best is trial 3 with value: 0.9074125672829692.




[I 2026-01-10 21:59:15,244] Trial 5 finished with value: 1.2161192715568458 and parameters: {'seq_len': 100, 'units': 38, 'learning_rate': 0.00013492834268013249, 'batch_size': 32}. Best is trial 3 with value: 0.9074125672829692.




[I 2026-01-10 22:01:01,386] Trial 6 finished with value: 0.9718310570679785 and parameters: {'seq_len': 85, 'units': 30, 'learning_rate': 0.0023359635026261607, 'batch_size': 64}. Best is trial 3 with value: 0.9074125672829692.




[I 2026-01-10 22:07:08,119] Trial 7 finished with value: 0.9392081081835582 and parameters: {'seq_len': 70, 'units': 119, 'learning_rate': 0.00032927591344236165, 'batch_size': 16}. Best is trial 3 with value: 0.9074125672829692.




[I 2026-01-10 22:10:18,196] Trial 8 finished with value: 0.9671408617797503 and parameters: {'seq_len': 100, 'units': 40, 'learning_rate': 0.00869299151113955, 'batch_size': 32}. Best is trial 3 with value: 0.9074125672829692.




[I 2026-01-10 22:14:13,664] Trial 9 finished with value: 1.2189532518073742 and parameters: {'seq_len': 100, 'units': 120, 'learning_rate': 0.00015030900645056822, 'batch_size': 64}. Best is trial 3 with value: 0.9074125672829692.




[I 2026-01-10 22:17:07,552] Trial 10 finished with value: 1.0629371098650802 and parameters: {'seq_len': 115, 'units': 69, 'learning_rate': 0.000724359894840497, 'batch_size': 64}. Best is trial 3 with value: 0.9074125672829692.




[I 2026-01-10 22:22:18,760] Trial 11 finished with value: 0.8691048082951898 and parameters: {'seq_len': 85, 'units': 76, 'learning_rate': 0.004546404153956352, 'batch_size': 16}. Best is trial 11 with value: 0.8691048082951898.


In [None]:
# Execution
run = 0
if run == 1:
    study = optuna.create_study(direction="minimize", sampler=TPESampler(seed=42))
    study.optimize(objective, n_trials=30, show_progress_bar=True)
else:
    print("run == 0")

In [None]:
print("Best hyperparameters :", study.best_params)