<h3 style="text-align: center; font-family: Arial, sans-serif; color: #4CAF50;">Models for Failure Prediction (Model 1)</h3>
<ul style="font-family: Arial, sans-serif; font-size: 12pt; color: #333;">
    <li><strong>Recurrent neural networks (RNN):</strong>
        <ul style="font-size: 11pt; color: #555;">
            <li>In particular <strong>LSTM</strong> (Long Short-Term Memory) and <strong>GRU</strong> (Gated Recurrent Units), to capture temporal dependencies.</li>
        </ul>
    </li>
    <li><strong>Temporal Fusion Transformer (TFT):</strong>
        <ul style="font-size: 11pt; color: #555;">
            <li>An attention-based deep learning model designed for multi-horizon forecasting on multivariate time series; it handles both continuous and categorical variables and captures complex relationships between them.</li>
        </ul>
    </li>
    <li><strong>Gradient Boosting (XGBoost, LightGBM, CatBoost):</strong>
        <ul style="font-size: 11pt; color: #555;">
            <li>To exploit non-linear relationships and interactions between variables.</li>
        </ul>
    </li>
    <li><strong>Random Forest:</strong>
        <ul style="font-size: 11pt; color: #555;">
            <li>A robust classification approach, relatively interpretable through feature importances.</li>
        </ul>
    </li>
    <li><strong>Support Vector Machines (SVM):</strong>
        <ul style="font-size: 11pt; color: #555;">
            <li>Can be accurate in well-defined settings, but requires careful preprocessing (in particular feature scaling), and kernel SVMs scale poorly to very large datasets such as this one.</li>
        </ul>
    </li>
    <li><strong>Autoencoders:</strong>
        <ul style="font-size: 11pt; color: #555;">
            <li>To detect anomalies by learning a representation of normal behaviour: samples that are reconstructed poorly (high reconstruction error) are flagged as anomalous.</li>
        </ul>
    </li>
</ul>
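The autoencoder idea (learn normal behaviour, then flag samples that are reconstructed poorly) can be illustrated without a neural network. The sketch below swaps the autoencoder for its linear counterpart, a PCA projection computed with NumPy's SVD; the synthetic data, the number of components `k` and the 99th-percentile threshold are illustrative assumptions, not values from this notebook.

```python
import numpy as np

def fit_linear_autoencoder(X_normal, k):
    """Learn a k-dimensional linear encoding of normal data (PCA computed with an SVD)."""
    mean = X_normal.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(X_normal - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruction_error(X, mean, components):
    """Encode each row into k values, decode it back and return the squared error per row."""
    codes = (X - mean) @ components.T   # encoder: n x k
    X_hat = codes @ components + mean   # decoder: back to n x d
    return ((X - X_hat) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
# Synthetic "normal" behaviour: 5 sensors driven by 2 hidden factors plus small noise;
# sensor index 2 normally carries only noise
latent = rng.normal(size=(1000, 2))
mixing = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 1.0]])
X_normal = latent @ mixing + 0.05 * rng.normal(size=(1000, 5))

mean, components = fit_linear_autoencoder(X_normal, k=2)
# Threshold: 99th percentile of the error on normal data (an arbitrary choice)
threshold = np.percentile(reconstruction_error(X_normal, mean, components), 99)

# Anomalous sample: a normal row with an unusual value on sensor 2
x_anomaly = X_normal[:1] + np.array([[0.0, 0.0, 3.0, 0.0, 0.0]])
print(reconstruction_error(x_anomaly, mean, components) > threshold)
```

A real autoencoder replaces the two matrix products with non-linear encoder and decoder networks, but the decision rule (error above a threshold learned on normal data) stays the same.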


<h3 style="text-align: center; font-family: Arial, sans-serif; color: #4CAF50;">Notes on the Modelling Approaches</h3>
<ul style="font-family: Arial, sans-serif; font-size: 12pt; color: #333;">
    <li><strong>Single observation (e.g. classic Random Forest, gradient boosting):</strong>
        <ul style="font-size: 11pt; color: #555;">
            <li>Each row of the dataset (one observation with its features) is processed independently.</li>
            <li>The model has no notion of temporal dependency between observations, unless lagged values are added explicitly as features.</li>
        </ul>
    </li>
    <li><strong>Time window (not used in classic Random Forest):</strong>
        <ul style="font-size: 11pt; color: #555;">
            <li>Temporal context (past observations) is exploited by models designed for time series, such as:
                <ul style="font-size: 11pt; color: #555;">
                    <li><strong>LSTM</strong> / <strong>GRU</strong> (recurrent networks).</li>
                    <li><strong>TFT</strong> (Temporal Fusion Transformer).</li>
                    <li><strong>ARIMA, SARIMA, Prophet</strong>, etc.</li>
                </ul>
            </li>
            <li>In this case, the model receives a sequence of past observations (for example the last 30 time steps), which lets it capture temporal relationships.</li>
        </ul>
    </li>
</ul>
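The difference between the two approaches can be illustrated with lagged features: a row-based model such as a Random Forest only sees temporal context if past values are copied into the current row. Minimal pandas sketch; the helper name `add_lag_features` and the chosen lags are illustrative assumptions, and the toy values are the first five `Oil_temperature` readings of the data preview below.

```python
import pandas as pd

def add_lag_features(df, columns, lags):
    """Append columns <col>_lag<k> holding the value observed k rows earlier.

    Assumes rows are sorted by time with a regular sampling interval and no other
    missing values (dropna below would also drop rows with unrelated NaNs).
    """
    out = df.copy()
    for col in columns:
        for k in lags:
            out[f"{col}_lag{k}"] = out[col].shift(k)
    # The first max(lags) rows have no complete history
    return out.dropna().reset_index(drop=True)

# Toy series sampled every 10 seconds
toy = pd.DataFrame({
    "timestamp": pd.date_range("2020-02-01", periods=5, freq="10s"),
    "Oil_temperature": [53.600, 53.675, 53.600, 53.425, 53.475],
})
print(add_lag_features(toy, ["Oil_temperature"], lags=[1, 2]))
```

Each remaining row now carries its own value plus the two previous readings, so a tabular model can use a short history without any change to its algorithm.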


In [1]:
import pandas as pd
import numpy as np
from datetime import datetime
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Input
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.regularizers import l2

In [12]:
# Load the data
df = pd.read_csv("../Datasources/MetroPT3_imputed_final.csv", delimiter=",", decimal=".", index_col=0)
# Replace the index read from the file with a clean 0..n-1 RangeIndex
df.reset_index(drop=True, inplace=True)

In [None]:
display(df)

<ul style="text-align: center;font-family: times, serif; font-size:14pt; color:Red;">
<strong>################################################################################################</strong>    
<strong>###############################  DECLARATIONS / INITIALISATION  ###############################</strong>
<strong>################################################################################################</strong>    
</ul>

In [13]:
# Convert the timestamp column to datetime (unparseable values become NaT)
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
#display(df.dtypes)
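The LSTM windows built later are taken by row position, so they implicitly assume a regular sampling interval (10 seconds in the data preview). A minimal sketch to check this after the conversion; the function name `sampling_report` and the 10-second default are assumptions, and the example values are hypothetical.

```python
import pandas as pd

def sampling_report(timestamps, expected="10s"):
    """Count unparsed timestamps (NaT) and gaps larger than the expected interval."""
    ts = pd.Series(timestamps)
    n_nat = int(ts.isna().sum())
    deltas = ts.dropna().diff().dropna()
    gaps = deltas[deltas > pd.Timedelta(expected)]
    largest = gaps.max() if len(gaps) else pd.Timedelta(0)
    return {"n_nat": n_nat, "n_gaps": len(gaps), "largest_gap": largest}

# Hypothetical example: one unparseable value and one 50-second gap
raw = pd.Series(["2020-02-01 00:00:00", "2020-02-01 00:00:10", "not a date",
                 "2020-02-01 00:01:00", "2020-02-01 00:01:10"])
ts = pd.to_datetime(raw, errors="coerce")
print(sampling_report(ts))
```

On the notebook data this would be called as `sampling_report(df['timestamp'])`; any gap means that some windows span a discontinuity in time.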

In [14]:
continuous_features = ["TP2", "DV_pressure", "Oil_temperature", "Motor_current", "Reservoirs"]
categorical_features = ["COMP", "DV_eletric", "Towers", "LPS", "Pressure_switch", "Oil_level", "Caudal_impulses"]

In [15]:
# Keep only the timestamp, the target ('panne') and the continuous and categorical features
columns_to_keep = ["timestamp", "panne"] + continuous_features + categorical_features
df = df[columns_to_keep]

In [7]:
display(df)

,timestamp,panne,TP2,DV_pressure,Oil_temperature,Motor_current,Reservoirs,COMP,DV_eletric,Towers,LPS,Pressure_switch,Oil_level,Caudal_impulses
0,2020-02-01 00:00:00,0,-0.012,-0.024,53.600,0.0400,9.358,1.0,0.0,1.0,0.0,1.0,1.0,1.0
1,2020-02-01 00:00:10,0,-0.014,-0.022,53.675,0.0400,9.348,1.0,0.0,1.0,0.0,1.0,1.0,1.0
2,2020-02-01 00:00:20,0,-0.012,-0.022,53.600,0.0425,9.338,1.0,0.0,1.0,0.0,1.0,1.0,1.0
3,2020-02-01 00:00:30,0,-0.012,-0.022,53.425,0.0400,9.328,1.0,0.0,1.0,0.0,1.0,1.0,1.0
4,2020-02-01 00:00:40,0,-0.012,-0.022,53.475,0.0400,9.318,1.0,0.0,1.0,0.0,1.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1841755,2020-09-01 03:59:10,0,-0.014,-0.022,59.675,0.0425,8.918,1.0,0.0,1.0,0.0,1.0,1.0,1.0
1841756,2020-09-01 03:59:20,0,-0.014,-0.020,59.600,0.0450,8.904,1.0,0.0,1.0,0.0,1.0,1.0,1.0
1841757,2020-09-01 03:59:30,0,-0.014,-0.022,59.600,0.0425,8.892,1.0,0.0,1.0,0.0,1.0,1.0,1.0
1841758,2020-09-01 03:59:40,0,-0.012,-0.022,59.550,0.0450,8.878,1.0,0.0,1.0,0.0,1.0,1.0,1.0



<ul style="text-align: center;font-family: times, serif; font-size:14pt; color:Red;">
<strong>################################################################################################</strong>    
<strong>####################################  END OF DECLARATIONS  ####################################</strong>
<strong>################################################################################################</strong>    
</ul>

In [None]:
# Keep the columns whose name does not end with '_is_missing'
columns_without_is_missing = [col for col in df.columns if not col.endswith('_is_missing')]

In [None]:
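# Names of the missing-value indicator columns ('<feature>_is_missing')
# Note: these columns (and TP3, H1, MPG) were removed by the column selection above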
cols_is_missing = [col + '_is_missing' for col in ['TP2', 'TP3', 'H1', 'DV_pressure', 'Reservoirs', 
                                                   'Oil_temperature', 'Motor_current', 'COMP', 'DV_eletric', 
                                                   'Towers', 'MPG', 'LPS', 'Pressure_switch', 'Oil_level', 
                                                   'Caudal_impulses']]

In [None]:
# Convert the categorical columns to the pandas 'category' dtype
for col in categorical_features:
    df[col] = df[col].astype("category")


In [None]:
pd.set_option('display.max_columns', None)
display(df.head(5))

In [None]:
# Display floats with two decimals
pd.options.display.float_format = '{:,.2f}'.format

# Global descriptive statistics, excluding the `_is_missing` columns
# (describe() summarises numeric columns only by default)
print("Descriptive statistics:")
display(df[columns_without_is_missing].describe())

# Missing values per column (count and percentage)
print("\nMissing values per column:")
missing_counts = df.isnull().sum()
missing_percent = (df.isnull().mean() * 100).map("{:,.2f}%".format)
missing_stats = pd.DataFrame({
    'Missing values': missing_counts.map("{:,}".format),
    'Missing (%)': missing_percent
})

# Display the statistics
display(missing_stats)


<ul style="font-family: times, serif; font-size:14pt; color:blue;">
<strong>TIME-SERIES MODELS: LSTM (LONG SHORT-TERM MEMORY)</strong>
</ul>

<ul style="text-align: center;font-family: times, serif; font-size:14pt; color:Red;">
<strong>########################################################################</strong>    
<strong>################################ TEST 1  ################################</strong>
<strong>########################################################################</strong>    
</ul>

In [114]:
###################################################################
################ Train: Failures 1, 2 & 3          ################
################ Test : Failure 4                  ################
###################################################################


# Continuous and categorical columns
continuous_features  = ["TP2", "DV_pressure", "Oil_temperature", "Motor_current", "Reservoirs"]
categorical_features = ["COMP", "DV_eletric", "Towers", "LPS", "Pressure_switch", "Oil_level", "Caudal_impulses"]
target = "panne"

# Scale the continuous columns to [0, 1]
# Note: the scaler is fitted on the whole dataset (training and test periods), so the test
# period's minima and maxima leak into training; fitting on training rows only avoids this.
scaler = MinMaxScaler()
df[continuous_features] = scaler.fit_transform(df[continuous_features])

# Training and test periods
train_periods = [{'start': '2020-02-01 00:00:00', 'end': '2020-06-07 14:30:00'}]
test_periods  = [{'start': '2020-06-07 14:30:10', 'end': '2020-09-01 03:59:50'}]

# Row indices of the training period
start_train = pd.Timestamp(train_periods[0]['start'])
end_train   = pd.Timestamp(train_periods[0]['end'])
train_indices = df[(df['timestamp'] >= start_train) & (df['timestamp'] <= end_train)].index.tolist()

# Row indices of the test period
start_test = pd.Timestamp(test_periods[0]['start'])
end_test   = pd.Timestamp(test_periods[0]['end'])
test_indices = df[(df['timestamp'] >= start_test) & (df['timestamp'] <= end_test)].index.tolist()

# Training and test sets (float32 is the Keras default dtype and halves memory vs float64)
X_train = df.loc[train_indices].drop(columns=['timestamp', 'panne']).to_numpy(dtype="float32")
y_train = df.loc[train_indices, 'panne'].values

X_test = df.loc[test_indices].drop(columns=['timestamp', 'panne']).to_numpy(dtype="float32")
y_test = df.loc[test_indices, 'panne'].values

# Build sliding windows: each sample holds `sequence_length` consecutive rows and its
# label is the 'panne' value of the row that immediately follows the window
def create_sequences(X, y, sequence_length=30):
    X_seq, y_seq = [], []
    for i in range(len(X) - sequence_length):
        X_seq.append(X[i:i + sequence_length])
        y_seq.append(y[i + sequence_length])
    return np.array(X_seq), np.array(y_seq)

# Windows for the training and test sets
sequence_length = 30
X_train_seq, y_train_seq = create_sequences(X_train, y_train, sequence_length)
X_test_seq, y_test_seq = create_sequences(X_test, y_test, sequence_length)


###################### Dataset summary ###########################
display(df.loc[train_indices].drop(columns=['timestamp', 'panne']).columns)

# Compute and print the duration of each period
for i, period in enumerate(train_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time
    print(f"Train period {i}: {duration}")

for i, period in enumerate(test_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time
    print(f"Test period  {i}: {duration}")
print("---------------------------------")
# Class distribution of the training labels
values, counts = np.unique(y_train, return_counts=True)
print("Class distribution in y_train:")
for value, count in zip(values, counts):
    print(f"Class {value}: {count} observations")
####################################################################

Index(['TP2', 'DV_pressure', 'Oil_temperature', 'Motor_current', 'Reservoirs',
       'COMP', 'DV_eletric', 'Towers', 'LPS', 'Pressure_switch', 'Oil_level',
       'Caudal_impulses'],
      dtype='object')

Train period 1: 127 days, 14:30:00
Test period  1: 85 days, 13:29:40
---------------------------------
Class distribution in y_train:
Class 0: 1072354 observations
Class 1: 29877 observations
Class 2: 270 observations
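The scaler in the cell above is fitted on the full `df`, so the test period's minimum and maximum influence the training inputs. A minimal NumPy sketch of the leakage-free alternative, assuming the continuous columns are extracted as raw arrays before any scaling; the helper names and toy values are hypothetical. With scikit-learn, the same idea is `fit_transform` on the training rows followed by `transform` on the test rows.

```python
import numpy as np

def fit_minmax(X_train):
    """Learn the per-column minimum and range on the training rows only."""
    col_min = X_train.min(axis=0)
    col_range = X_train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return col_min, col_range

def apply_minmax(X, col_min, col_range):
    """Scale with the training statistics: (X - min) / (max - min)."""
    return (X - col_min) / col_range

X_train_raw = np.array([[50.0, 9.0], [60.0, 8.0], [55.0, 10.0]])
X_test_raw = np.array([[65.0, 9.5]])  # first value lies outside the training range

col_min, col_range = fit_minmax(X_train_raw)
print(apply_minmax(X_train_raw, col_min, col_range))
print(apply_minmax(X_test_raw, col_min, col_range))
```

Test values may fall outside [0, 1]; this is expected and reflects what the model would actually see in production.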


In [32]:
# Build the LSTM model
model = Sequential([
    Input(shape=(X_train_seq.shape[1], X_train_seq.shape[2])),  # avoids the Keras `input_shape` warning
    LSTM(128, return_sequences=True),
    Dropout(0.2),
    LSTM(64),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(3, activation='softmax')  # 3 classes: 0 (no failure), 1 (failure), 2 (warning)
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Early-stopping callback
early_stopping = EarlyStopping(
    monitor='val_loss',        # monitor the validation loss
    patience=5,                # epochs without improvement before stopping
    restore_best_weights=True  # restore the weights of the best epoch
)

# Train with early stopping
# Note: the test set doubles as validation data, so early stopping selects the epoch on
# test data and the reported test scores are optimistic
history = model.fit(
    X_train_seq, y_train_seq,
    validation_data=(X_test_seq, y_test_seq),
    epochs=50,
    batch_size=64,
    callbacks=[early_stopping]
)

# Evaluate the model
loss, accuracy = model.evaluate(X_test_seq, y_test_seq)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

# Save the model (.keras is the native Keras format, .h5 the legacy HDF5 format)
model.save("lstm_panne_model_1.h5")
model.save("lstm_panne_model_1.keras")

print("LSTM model trained and saved.")




Epoch 1/50
[1m17227/17227[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m882s[0m 51ms/step - accuracy: 0.9732 - loss: 0.0558 - val_accuracy: 0.9977 - val_loss: 0.0115
Epoch 2/50
[1m17227/17227[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m861s[0m 50ms/step - accuracy: 0.9783 - loss: 0.0456 - val_accuracy: 0.9968 - val_loss: 0.0220
Epoch 3/50
[1m17227/17227[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m862s[0m 50ms/step - accuracy: 0.9832 - loss: 0.0397 - val_accuracy: 0.9977 - val_loss: 0.0111
Epoch 4/50
[1m17227/17227[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m858s[0m 50ms/step - accuracy: 0.9836 - loss: 0.0386 - val_accuracy: 0.9970 - val_loss: 0.0230
Epoch 5/50
[1m17227/17227[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m850s[0m 49ms/step - accuracy: 0.9875 - loss: 0.0334 - val_accuracy: 0.9972 - val_loss: 0.0200
Epoch 6/50
[1m17227/17227[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m862s[0m 49ms/step - accuracy: 0.9845 - loss: 0.0372 - val_accuracy: 0.9979 - val



ValueError: Invalid filepath extension for saving. Please add either a `.keras` extension for the native Keras format (recommended) or a `.h5` extension. Use `model.export(filepath)` if you want to export a SavedModel for use with TFLite/TFServing/etc. Received: filepath=lstm_panne_saved_model.
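Because early stopping monitors `val_loss` on the test set, the test set also selects the model. A minimal sketch of a chronological validation split taken from the end of the training windows instead; the 20 % fraction and the `gap` of dropped windows are assumptions, and `chronological_split` is a hypothetical helper.

```python
import numpy as np

def chronological_split(X_seq, y_seq, val_fraction=0.2, gap=0):
    """Keep time order: the last `val_fraction` of the windows become the validation set.

    `gap` windows are dropped before the validation block so that no training window
    (or its label) shares a row with the validation windows; gap >= sequence_length is enough.
    """
    n_val = int(len(X_seq) * val_fraction)
    split = len(X_seq) - n_val
    train_end = max(split - gap, 0)
    return (X_seq[:train_end], y_seq[:train_end]), (X_seq[split:], y_seq[split:])

# Toy data: 10 time-ordered windows of 3 steps with 2 features
X_seq = np.arange(10 * 3 * 2, dtype=np.float32).reshape(10, 3, 2)
y_seq = np.arange(10)

(X_tr, y_tr), (X_val, y_val) = chronological_split(X_seq, y_seq, val_fraction=0.2, gap=3)
print(len(X_tr), len(X_val), y_val)
```

In the notebook this would become `chronological_split(X_train_seq, y_train_seq, gap=sequence_length)`, with the result passed as `validation_data` and the test set evaluated only once at the end. It is worth checking that the validation block actually contains failure labels, otherwise `val_loss` says little about failures.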

In [37]:
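from sklearn.metrics import confusion_matrix, classification_report

# Predicted class = index of the highest softmax probability
y_pred = np.argmax(model.predict(X_test_seq), axis=1)
print(confusion_matrix(y_test_seq, y_pred))
# zero_division=0 reports 0 instead of warning for classes that are never predicted
print(classification_report(y_test_seq, y_pred, zero_division=0))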


[1m23101/23101[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m266s[0m 11ms/step
[[737518      0      0]
 [  1621      0      0]
 [    90      0      0]]




              precision    recall  f1-score   support

           0       1.00      1.00      1.00    737518
           1       0.00      0.00      0.00      1621
           2       0.00      0.00      0.00        90

    accuracy                           1.00    739229
   macro avg       0.33      0.33      0.33    739229
weighted avg       1.00      1.00      1.00    739229



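The 1.00 accuracy of TEST 1 hides the fact that no failure or warning is detected. Computing the majority-class baseline and the per-class recall from the printed confusion matrix makes this explicit: here the model's accuracy is exactly the accuracy of always predicting class 0. Minimal NumPy sketch; the matrix is copied from the output above.

```python
import numpy as np

# Confusion matrix printed above for TEST 1 (rows = true class, columns = predicted class)
cm = np.array([[737518, 0, 0],
               [  1621, 0, 0],
               [    90, 0, 0]])

accuracy = np.trace(cm) / cm.sum()
majority_baseline = cm.sum(axis=1).max() / cm.sum()  # always predict the most frequent class
recall_per_class = np.diag(cm) / cm.sum(axis=1)
balanced_accuracy = recall_per_class.mean()           # mean of the per-class recalls

print(f"accuracy={accuracy:.4f}  baseline={majority_baseline:.4f}  balanced={balanced_accuracy:.4f}")
print("recall per class:", recall_per_class)
```

Balanced accuracy (equivalently the macro-averaged recall of the report) is a more informative headline metric than accuracy for this imbalanced target.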


<ul style="text-align: center;font-family: times, serif; font-size:14pt; color:Red;">
<strong>########################################################################</strong>    
<strong>################################ TEST 2  ################################</strong>
<strong>########################################################################</strong>    
</ul>

In [113]:
###################################################################
################ Train: Failure 2                  ################
################ Test : Failure 3                  ################
###################################################################


# Continuous and categorical columns
continuous_features  = ["TP2", "DV_pressure", "Oil_temperature", "Motor_current", "Reservoirs"]
categorical_features = ["COMP", "DV_eletric", "Towers", "LPS", "Pressure_switch", "Oil_level", "Caudal_impulses"]
target = "panne"

# Scale the continuous columns to [0, 1]
# Note: fitted on the whole dataset, including the test period (data leakage)
scaler = MinMaxScaler()
df[continuous_features] = scaler.fit_transform(df[continuous_features])

# Training and test periods
train_periods = [{'start': '2020-05-29 22:30:00', 'end': '2020-05-30 07:00:00'}]
test_periods  = [{'start': '2020-06-05 06:00:00', 'end': '2020-06-07 19:30:00'}]

# Row indices of the training period
start_train = pd.Timestamp(train_periods[0]['start'])
end_train   = pd.Timestamp(train_periods[0]['end'])
train_indices = df[(df['timestamp'] >= start_train) & (df['timestamp'] <= end_train)].index.tolist()

# Row indices of the test period
start_test = pd.Timestamp(test_periods[0]['start'])
end_test   = pd.Timestamp(test_periods[0]['end'])
test_indices = df[(df['timestamp'] >= start_test) & (df['timestamp'] <= end_test)].index.tolist()

# Training and test sets (float32 is the Keras default dtype and halves memory vs float64)
X_train = df.loc[train_indices].drop(columns=['timestamp', 'panne']).to_numpy(dtype="float32")
y_train = df.loc[train_indices, 'panne'].values

X_test = df.loc[test_indices].drop(columns=['timestamp', 'panne']).to_numpy(dtype="float32")
y_test = df.loc[test_indices, 'panne'].values

# Build sliding windows: each sample holds `sequence_length` consecutive rows and its
# label is the 'panne' value of the row that immediately follows the window
def create_sequences(X, y, sequence_length=30):
    X_seq, y_seq = [], []
    for i in range(len(X) - sequence_length):
        X_seq.append(X[i:i + sequence_length])
        y_seq.append(y[i + sequence_length])
    return np.array(X_seq), np.array(y_seq)

# Windows for the training and test sets
sequence_length = 30
X_train_seq, y_train_seq = create_sequences(X_train, y_train, sequence_length)
X_test_seq, y_test_seq = create_sequences(X_test, y_test, sequence_length)

###################### Dataset summary ###########################
display(df.loc[train_indices].drop(columns=['timestamp', 'panne']).columns)

# Compute and print the duration of each period
for i, period in enumerate(train_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time
    print(f"Train period {i}: {duration}")

for i, period in enumerate(test_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time
    print(f"Test period  {i}: {duration}")
print("---------------------------------")
# Class distribution of the training labels
values, counts = np.unique(y_train, return_counts=True)
print("Class distribution in y_train:")
for value, count in zip(values, counts):
    print(f"Class {value}: {count} observations")
####################################################################

Index(['TP2', 'DV_pressure', 'Oil_temperature', 'Motor_current', 'Reservoirs',
       'COMP', 'DV_eletric', 'Towers', 'LPS', 'Pressure_switch', 'Oil_level',
       'Caudal_impulses'],
      dtype='object')

Train period 1: 8:30:00
Test period  1: 2 days, 13:30:00
---------------------------------
Class distribution in y_train:
Class 0: 630 observations
Class 1: 2341 observations
Class 2: 90 observations
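The `create_sequences` loop appends one window per row, which is slow and memory-hungry for the 1.1 million-row training periods of TESTS 1 and 3. A minimal sketch of an equivalent, loop-free version using NumPy's `sliding_window_view` (NumPy 1.20 or later); `create_sequences_fast` is a hypothetical name, and the random data only serves to check that both versions agree.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_sequences_loop(X, y, sequence_length=30):
    """Reference version, same logic as the notebook's create_sequences."""
    X_seq, y_seq = [], []
    for i in range(len(X) - sequence_length):
        X_seq.append(X[i:i + sequence_length])
        y_seq.append(y[i + sequence_length])
    return np.array(X_seq), np.array(y_seq)

def create_sequences_fast(X, y, sequence_length=30):
    """Same windows without a Python loop; assumes len(X) > sequence_length."""
    # sliding_window_view puts the window axis last: (n - L + 1, n_features, L)
    windows = sliding_window_view(X, sequence_length, axis=0)
    # Reorder to (n - L + 1, L, n_features) and drop the last window, which has no label
    X_seq = windows.transpose(0, 2, 1)[:-1]
    y_seq = y[sequence_length:]
    return X_seq, y_seq

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 12)).astype(np.float32)
y = rng.integers(0, 3, size=100)
X_a, y_a = create_sequences_loop(X, y, 30)
X_b, y_b = create_sequences_fast(X, y, 30)
print(X_b.shape, np.array_equal(X_a, X_b), np.array_equal(y_a, y_b))
```

The fast version returns a read-only view that shares memory with `X`; wrap it in `np.ascontiguousarray` if a writable copy is needed.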


In [40]:
# Build the LSTM model
model = Sequential([
    Input(shape=(X_train_seq.shape[1], X_train_seq.shape[2])),  # avoids the Keras `input_shape` warning
    LSTM(128, return_sequences=True),
    Dropout(0.2),
    LSTM(64),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(3, activation='softmax')  # 3 classes: 0 (no failure), 1 (failure), 2 (warning)
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Early-stopping callback
early_stopping = EarlyStopping(
    monitor='val_loss',        # monitor the validation loss
    patience=5,                # epochs without improvement before stopping
    restore_best_weights=True  # restore the weights of the best epoch
)

# Train with early stopping
# Note: the test set doubles as validation data, so the reported test scores are optimistic
history = model.fit(
    X_train_seq, y_train_seq,
    validation_data=(X_test_seq, y_test_seq),
    epochs=30,
    batch_size=64,
    callbacks=[early_stopping]
)

# Evaluate the model
loss, accuracy = model.evaluate(X_test_seq, y_test_seq)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

# Save the model (.keras is the native Keras format, .h5 the legacy HDF5 format)
model.save("lstm_panne_model_2.h5")
model.save("lstm_panne_model_2.keras")

print("LSTM model trained and saved.")


Epoch 1/30
48/48 ━━━━━━━━━━━━━━━━━━━━ 13s 171ms/step - accuracy: 0.8193 - loss: 0.4521 - val_accuracy: 0.2404 - val_loss: 1.8353
Epoch 2/30
48/48 ━━━━━━━━━━━━━━━━━━━━ 7s 158ms/step - accuracy: 0.8981 - loss: 0.3383 - val_accuracy: 0.9070 - val_loss: 0.3131
Epoch 3/30
48/48 ━━━━━━━━━━━━━━━━━━━━ 8s 160ms/step - accuracy: 0.9575 - loss: 0.1836 - val_accuracy: 0.9141 - val_loss: 0.3504
Epoch 4/30
48/48 ━━━━━━━━━━━━━━━━━━━━ 8s 161ms/step - accuracy: 0.9608 - loss: 0.1626 - val_accuracy: 0.9142 - val_loss: 0.4825
Epoch 5/30
48/48 ━━━━━━━━━━━━━━━━━━━━ 8s 162ms/step - accuracy: 0.9620 - loss: 0.1556 - val_accuracy: 0.9143 - val_loss: 0.4958
Epoch 6/30
48/48 ━━━━━━━━━━━━━━━━━━━━ 8s 158ms/step - accuracy: 0.9644 - loss: 0.1439 - val_accuracy: 0.9144 - val_loss: 0.5176
Epoch 7/30
48/48 ━━━━━━━━



Test Loss: 0.31308713555336, Test Accuracy: 0.9070146083831787
LSTM model trained and saved.


In [41]:
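from sklearn.metrics import confusion_matrix, classification_report

# Predicted class = index of the highest softmax probability
y_pred = np.argmax(model.predict(X_test_seq), axis=1)
print(confusion_matrix(y_test_seq, y_pred))
# zero_division=0 reports 0 instead of warning for classes that are never predicted
print(classification_report(y_test_seq, y_pred, zero_division=0))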

[1m691/691[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 12ms/step
[[ 1154  1966     0]
 [    0 18901     0]
 [   43    47     0]]
              precision    recall  f1-score   support

           0       0.96      0.37      0.53      3120
           1       0.90      1.00      0.95     18901
           2       0.00      0.00      0.00        90

    accuracy                           0.91     22111
   macro avg       0.62      0.46      0.49     22111
weighted avg       0.91      0.91      0.89     22111





<ul style="text-align: center;font-family: times, serif; font-size:14pt; color:Red;">
<strong>########################################################################</strong>    
<strong>################################ TEST 3  ################################</strong>
<strong>########################################################################</strong>    
</ul>

In [115]:
###################################################################
################ Train: Failures 2, 3 & 4          ################
################ Test : Failure 1                  ################
###################################################################

# Continuous and categorical columns
continuous_features  = ["TP2", "DV_pressure", "Oil_temperature", "Motor_current", "Reservoirs"]
categorical_features = ["COMP", "DV_eletric", "Towers", "LPS", "Pressure_switch", "Oil_level", "Caudal_impulses"]
target = "panne"

# Scale the continuous columns to [0, 1]
# Note: fitted on the whole dataset, including the test period (data leakage)
scaler = MinMaxScaler()
df[continuous_features] = scaler.fit_transform(df[continuous_features])

# Training and test periods
# Note: here the training period (April to September) comes after the test period
# (February to April), so the model is evaluated on earlier data than it was trained on
train_periods = [{'start': '2020-04-18 23:59:10', 'end': '2020-09-01 03:59:50'}]
test_periods  = [{'start': '2020-02-01 00:00:00', 'end': '2020-04-18 23:59:00'}]

# Row indices of the training period
start_train = pd.Timestamp(train_periods[0]['start'])
end_train   = pd.Timestamp(train_periods[0]['end'])
train_indices = df[(df['timestamp'] >= start_train) & (df['timestamp'] <= end_train)].index.tolist()

# Row indices of the test period
start_test = pd.Timestamp(test_periods[0]['start'])
end_test   = pd.Timestamp(test_periods[0]['end'])
test_indices = df[(df['timestamp'] >= start_test) & (df['timestamp'] <= end_test)].index.tolist()

# Training and test sets (float32 is the Keras default dtype and halves memory vs float64)
X_train = df.loc[train_indices].drop(columns=['timestamp', 'panne']).to_numpy(dtype="float32")
y_train = df.loc[train_indices, 'panne'].values

X_test = df.loc[test_indices].drop(columns=['timestamp', 'panne']).to_numpy(dtype="float32")
y_test = df.loc[test_indices, 'panne'].values

# Build sliding windows: each sample holds `sequence_length` consecutive rows and its
# label is the 'panne' value of the row that immediately follows the window
def create_sequences(X, y, sequence_length=30):
    X_seq, y_seq = [], []
    for i in range(len(X) - sequence_length):
        X_seq.append(X[i:i + sequence_length])
        y_seq.append(y[i + sequence_length])
    return np.array(X_seq), np.array(y_seq)

# Windows for the training and test sets
sequence_length = 30
X_train_seq, y_train_seq = create_sequences(X_train, y_train, sequence_length)
X_test_seq, y_test_seq = create_sequences(X_test, y_test, sequence_length)


###################### Dataset summary ###########################
display(df.loc[train_indices].drop(columns=['timestamp', 'panne']).columns)

# Compute and print the duration of each period
for i, period in enumerate(train_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time
    print(f"Train period {i}: {duration}")

for i, period in enumerate(test_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time
    print(f"Test period  {i}: {duration}")
print("---------------------------------")
# Class distribution of the training labels
values, counts = np.unique(y_train, return_counts=True)
print("Class distribution in y_train:")
for value, count in zip(values, counts):
    print(f"Class {value}: {count} observations")
####################################################################

Index(['TP2', 'DV_pressure', 'Oil_temperature', 'Motor_current', 'Reservoirs',
       'COMP', 'DV_eletric', 'Towers', 'LPS', 'Pressure_switch', 'Oil_level',
       'Caudal_impulses'],
      dtype='object')

Train period 1: 135 days, 4:00:40
Test period  1: 77 days, 23:59:00
---------------------------------
Class distribution in y_train:
Class 0: 1144712 observations
Class 1: 22863 observations
Class 2: 270 observations
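The number of steps per epoch in the training logs (17227 for TEST 1, 18248 for TEST 3) follows directly from the data size, and the same arithmetic gives the memory needed to materialise all windows. A small worked example, assuming 30-step windows, 12 features, batches of 64 and float32 storage (4 bytes per value); `window_stats` is a hypothetical helper.

```python
import math

def window_stats(n_rows, sequence_length=30, n_features=12, batch_size=64, bytes_per_value=4):
    """Number of windows, training steps per epoch and size of the window array in bytes."""
    n_windows = n_rows - sequence_length
    steps_per_epoch = math.ceil(n_windows / batch_size)
    n_bytes = n_windows * sequence_length * n_features * bytes_per_value
    return n_windows, steps_per_epoch, n_bytes

# TEST 3 training period: labelled rows from the class distribution printed above
n_rows = 1144712 + 22863 + 270
n_windows, steps, n_bytes = window_stats(n_rows)
print(n_windows, steps, f"{n_bytes / 1e9:.2f} GB as float32")
```

The result (1,167,815 windows, 18,248 steps, about 1.68 GB in float32 and twice that in float64) matches the progress bar of the TEST 3 training log and explains the roughly 40 minutes per epoch.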


In [43]:
# Build the LSTM model (deeper, with L2 regularisation)
model = Sequential([
    Input(shape=(X_train_seq.shape[1], X_train_seq.shape[2])),  # avoids the Keras `input_shape` warning
    LSTM(256, return_sequences=True, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    LSTM(128, return_sequences=True, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    LSTM(64, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    Dense(64, activation='relu', kernel_regularizer=l2(0.01)),
    Dense(3, activation='softmax')  # 3 classes: 0 (no failure), 1 (failure), 2 (warning)
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Early-stopping callback
early_stopping = EarlyStopping(
    monitor='val_loss',        # monitor the validation loss
    patience=5,                # epochs without improvement before stopping
    restore_best_weights=True  # restore the weights of the best epoch
)

# Train with early stopping
# Note: the test set doubles as validation data, so the reported test scores are optimistic
history = model.fit(
    X_train_seq, y_train_seq,
    validation_data=(X_test_seq, y_test_seq),
    epochs=30,
    batch_size=64,
    callbacks=[early_stopping]
)

# Evaluate the model
loss, accuracy = model.evaluate(X_test_seq, y_test_seq)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

# Save the model (.keras is the native Keras format, .h5 the legacy HDF5 format)
model.save("lstm_panne_model_3.h5")
model.save("lstm_panne_model_3.keras")

print("LSTM model trained and saved.")


Epoch 1/30
18248/18248 ━━━━━━━━━━━━━━━━━━━━ 2536s 139ms/step - accuracy: 0.9799 - loss: 0.1878 - val_accuracy: 0.9871 - val_loss: 0.0709
Epoch 2/30
18248/18248 ━━━━━━━━━━━━━━━━━━━━ 2565s 141ms/step - accuracy: 0.9802 - loss: 0.0987 - val_accuracy: 0.9871 - val_loss: 0.0707
Epoch 3/30
18248/18248 ━━━━━━━━━━━━━━━━━━━━ 2584s 142ms/step - accuracy: 0.9804 - loss: 0.0979 - val_accuracy: 0.9871 - val_loss: 0.0700
Epoch 4/30
18248/18248 ━━━━━━━━━━━━━━━━━━━━ 2600s 142ms/step - accuracy: 0.9803 - loss: 0.0982 - val_accuracy: 0.9871 - val_loss: 0.0704
Epoch 5/30
18248/18248 ━━━━━━━━━━━━━━━━━━━━ 2648s 145ms/step - accuracy: 0.9803 - loss: 0.0983 - val_accuracy: 0.9871 - val_loss: 0.0705
Epoch 6/30
18248/18248 ━━━━━━━━━━━━━━━━━━━━ 2672s 146ms/step - accuracy: 0.9804 - loss: 0.0978 - val_accuracy: 



Test Loss: 0.07002128660678864, Test Accuracy: 0.9870526790618896
LSTM model trained and saved.


In [44]:
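from sklearn.metrics import confusion_matrix, classification_report

# Predicted class = index of the highest softmax probability
y_pred = np.argmax(model.predict(X_test_seq), axis=1)
print(confusion_matrix(y_test_seq, y_pred))
# zero_division=0 reports 0 instead of warning for classes that are never predicted
print(classification_report(y_test_seq, y_pred, zero_division=0))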

[1m21059/21059[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m616s[0m 29ms/step
[[665160      0      0]
 [  8635      0      0]
 [    90      0      0]]




              precision    recall  f1-score   support

           0       0.99      1.00      0.99    665160
           1       0.00      0.00      0.00      8635
           2       0.00      0.00      0.00        90

    accuracy                           0.99    673885
   macro avg       0.33      0.33      0.33    673885
weighted avg       0.97      0.99      0.98    673885



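As in TEST 1, the model predicts class 0 for every window, which is consistent with classes 1 and 2 making up only about 2 % of the training labels. One common mitigation is to weight the loss by inverse class frequency. The sketch below computes "balanced" weights, n_samples / (n_classes * n_samples_in_class), the same formula scikit-learn uses for `class_weight="balanced"`; the helper name is hypothetical, and whether this actually improves detection here is an untested assumption.

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * n_samples_in_class)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Rebuild a label vector with the TEST 3 training distribution printed above
y_train_example = np.repeat([0, 1, 2], [1144712, 22863, 270])
class_weight = balanced_class_weights(y_train_example)
print(class_weight)
```

The resulting dictionary can be passed to Keras as `model.fit(..., class_weight=class_weight)`. The weight of class 2 is very large (above 1000), so capping it or resampling the training windows may be needed to keep training stable.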


<ul style="text-align: center;font-family: times, serif; font-size:14pt; color:Red;">
<strong>########################################################################</strong>    
<strong>################################ TEST 4  ################################</strong>
<strong>########################################################################</strong>    
</ul>

In [111]:
###################################################################
################ Train : Panne1 & Panne2           ################
################ Test  : Panne3 & Panne4  #########################
###################################################################


import pandas as pd
import numpy as np
from datetime import datetime
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.regularizers import l2

# Continuous and categorical columns
continuous_features  = ["TP2", "DV_pressure", "Oil_temperature", "Motor_current", "Reservoirs"]
categorical_features = ["COMP", "DV_eletric", "Towers", "LPS", "Pressure_switch", "Oil_level", "Caudal_impulses"]
columns_to_exclude   = ['timestamp', 'panne', 'LPS', 'Pressure_switch', 'Oil_level', 'Caudal_impulses']

target = "panne"

# Normalize the continuous columns
# Note: the scaler is fit on the whole DataFrame (training and test periods together)
scaler = MinMaxScaler()
df[continuous_features] = scaler.fit_transform(df[continuous_features])

# Define the training and test periods
train_periods = [{'start': '2020-02-01 00:00:00', 'end': '2020-05-30 06:00:00'}]
test_periods  = [{'start': '2020-05-30 06:00:10', 'end': '2020-09-01 03:59:50'}]

# Row indices of the training period
start_train = pd.Timestamp(train_periods[0]['start'])
end_train = pd.Timestamp(train_periods[0]['end'])
train_indices = df[(df['timestamp'] >= start_train) & (df['timestamp'] <= end_train)].index.tolist()

# Row indices of the test period
start_test = pd.Timestamp(test_periods[0]['start'])
end_test = pd.Timestamp(test_periods[0]['end'])
test_indices = df[(df['timestamp'] >= start_test) & (df['timestamp'] <= end_test)].index.tolist()

# Build the training and test sets
X_train = df.loc[train_indices].drop(columns=columns_to_exclude).values
y_train = df.loc[train_indices, 'panne'].values

X_test = df.loc[test_indices].drop(columns=columns_to_exclude).values
y_test = df.loc[test_indices, 'panne'].values

# Build sliding-window sequences: each window of `sequence_length` rows
# is labelled with the target of the row that immediately follows it
def create_sequences(X, y, sequence_length=30):
    X_seq, y_seq = [], []
    for i in range(len(X) - sequence_length):
        X_seq.append(X[i:i + sequence_length])
        y_seq.append(y[i + sequence_length])
    return np.array(X_seq), np.array(y_seq)

# Sequences for the training and test sets
sequence_length = 30
X_train_seq, y_train_seq = create_sequences(X_train, y_train, sequence_length)
X_test_seq, y_test_seq = create_sequences(X_test, y_test, sequence_length)

###################### Dataset summary ###########################
display(df.loc[train_indices].drop(columns=columns_to_exclude).columns)

# Compute and print the duration of each period
for i, period in enumerate(train_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time  # period duration
    print(f"Période Train {i} : {duration}")

for i, period in enumerate(test_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time  # period duration
    print(f"Période Test  {i} : {duration}")
print("---------------------------------")
# Class distribution
values, counts = np.unique(y_train, return_counts=True)
print("Distribution des modalités dans y_train :")
for value, count in zip(values, counts):
    print(f"Modalité {value} : {count} observations")
####################################################################

Index(['TP2', 'DV_pressure', 'Oil_temperature', 'Motor_current', 'Reservoirs',
       'COMP', 'DV_eletric', 'Towers'],
      dtype='object')

Période Train 1 : 119 days, 6:00:00
Période Test  1 : 93 days, 21:59:40
---------------------------------
Distribution des modalités dans y_train :
Modalité 0 : 1019165 observations
Modalité 1 : 10976 observations
Modalité 2 : 180 observations
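`create_sequences` builds the windows in a Python loop and copies every window. Each row lands in up to 30 windows, so for the roughly 1.03 million training rows above the sequence array is about 30 times the size of `X_train`. A vectorized alternative produces the same windows and labels as a read-only view, assuming NumPy 1.20 or later, where `numpy.lib.stride_tricks.sliding_window_view` is available. The helper name `create_sequences_vectorized` is hypothetical. Keras may still materialize the array when it is passed to `model.fit`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def create_sequences_vectorized(X, y, sequence_length=30):
    """Same output as create_sequences, without the Python loop.

    Window i is X[i:i + sequence_length]; its label is y[i + sequence_length].
    Requires len(X) >= sequence_length.
    """
    # With axis=0 the window axis is appended last: shape (n - L + 1, n_features, L)
    windows = sliding_window_view(X, window_shape=sequence_length, axis=0)
    # Reorder to (n - L + 1, L, n_features), then drop the last window,
    # which has no following row to provide a label
    X_seq = windows.transpose(0, 2, 1)[:-1]
    y_seq = y[sequence_length:]
    return X_seq, y_seq


# Small demo on random data with 8 features, as in the notebook
X_demo = np.random.rand(100, 8)
y_demo = np.random.randint(0, 3, size=100)
X_seq, y_seq = create_sequences_vectorized(X_demo, y_demo)
print(X_seq.shape, y_seq.shape)  # (70, 30, 8) (70,)
```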


In [77]:

from tensorflow.keras.layers import Input

# Build the LSTM model (an explicit Input layer replaces the `input_shape`
# argument on the first LSTM, which Keras 3 warns against)
model = Sequential([
    Input(shape=(X_train_seq.shape[1], X_train_seq.shape[2])),  # (sequence_length, n_features)
    LSTM(256, return_sequences=True, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    LSTM(128, return_sequences=True, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    LSTM(64, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    Dense(64, activation='relu', kernel_regularizer=l2(0.01)),
    Dense(3, activation='softmax')  # 3 classes: 0 (no failure), 1 (failure), 2 (warning)
])


model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Early-stopping callback
early_stopping = EarlyStopping(
    monitor='val_loss',        # watch the validation loss
    patience=5,                # epochs without improvement before stopping
    restore_best_weights=True  # restore the weights of the best epoch
)

# Train the model with early stopping
# (the test set is passed as validation data, so it also drives early stopping)
history = model.fit(
    X_train_seq, y_train_seq,
    validation_data=(X_test_seq, y_test_seq),
    epochs=30,
    batch_size=64,
    callbacks=[early_stopping]
)

# Evaluate the model
loss, accuracy = model.evaluate(X_test_seq, y_test_seq)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

# Save the model (legacy HDF5 format and native Keras format)
model.save("lstm_panne_model_4.h5")
model.save("lstm_panne_model_4.keras")

print("Modèle LSTM entraîné et sauvegardé avec succès.")

Epoch 1/30




[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m312s[0m 26s/step - accuracy: 0.3952 - loss: 6.4827 - val_accuracy: 0.6528 - val_loss: 5.2700
Epoch 2/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m300s[0m 25s/step - accuracy: 0.5375 - loss: 4.6920 - val_accuracy: 0.6547 - val_loss: 3.8072
Epoch 3/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m300s[0m 25s/step - accuracy: 0.5163 - loss: 3.4880 - val_accuracy: 0.6716 - val_loss: 2.7213
Epoch 4/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m321s[0m 25s/step - accuracy: 0.5148 - loss: 2.6688 - val_accuracy: 0.6097 - val_loss: 2.5478
Epoch 5/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m323s[0m 25s/step - accuracy: 0.5362 - loss: 2.1252 - val_accuracy: 0.6346 - val_loss: 2.0081
Epoch 6/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m321s[0m 25s/step - accuracy: 0.5477 - loss: 1.7217 - val_accuracy: 0.6157 - val_loss: 1.7918
Epoch 7/30
[1m13/13[0m [32m━━━━━━━━━



Test Loss: 1.03300940990448, Test Accuracy: 0.6414580345153809
Modèle LSTM entraîné et sauvegardé avec succès.
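In this cell `validation_data` is the test set itself. `EarlyStopping` with `restore_best_weights=True` therefore picks the epoch using the same data later reported as the test score, which makes that score optimistic. A minimal alternative holds out the last fraction of the chronologically ordered training sequences for validation. It is shown here as an assumption, not as what the notebook does. The helper `chronological_split` and its `val_fraction` parameter are hypothetical.

```python
import numpy as np


def chronological_split(X_seq, y_seq, val_fraction=0.2):
    """Hypothetical helper: hold out the last `val_fraction` of time-ordered sequences.

    Nothing is shuffled, so every validation window starts after every training
    window starts. Windows on either side of the cut still share up to
    `sequence_length` raw rows.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    n_val = int(round(len(X_seq) * val_fraction))
    cut = len(X_seq) - n_val
    return (X_seq[:cut], y_seq[:cut]), (X_seq[cut:], y_seq[cut:])


X_demo = np.zeros((100, 30, 8))
y_demo = np.arange(100)
(X_tr, y_tr), (X_val, y_val) = chronological_split(X_demo, y_demo, val_fraction=0.2)
print(len(X_tr), len(X_val))  # 80 20

# With the notebook's objects (not executed here):
# (X_tr, y_tr), (X_val, y_val) = chronological_split(X_train_seq, y_train_seq)
# model.fit(X_tr, y_tr, validation_data=(X_val, y_val), callbacks=[early_stopping], ...)
```

Failures are rare in this data, so the held-out tail may contain few or no class 1 and class 2 labels. Check this with `np.unique(y_val, return_counts=True)` before relying on `val_loss`.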


In [78]:
from sklearn.metrics import confusion_matrix, classification_report

y_pred = np.argmax(model.predict(X_test_seq), axis=1)
print(confusion_matrix(y_test_seq, y_pred))
print(classification_report(y_test_seq, y_pred))

[1m10859/10859[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m370s[0m 34ms/step
[[203716  93774  30739]
 [     0  19141      0]
 [    38     28     24]]
              precision    recall  f1-score   support

           0       1.00      0.62      0.77    328229
           1       0.17      1.00      0.29     19141
           2       0.00      0.27      0.00        90

    accuracy                           0.64    347460
   macro avg       0.39      0.63      0.35    347460
weighted avg       0.95      0.64      0.74    347460



<ul style="text-align: center;font-family: times, serif; font-size:14pt; color:Red;">
<strong>########################################################################</strong>    
<strong>################################ TEST 5  ################################</strong>
<strong>########################################################################</strong>    
</ul>

In [105]:
###################################################################################################
################ Train : Panne1 & Panne2 & Panne3  (15 min + 15 min + 15 min) per failure   ######
################ Test  : Panne4  (15 min + 15 min + 15 min)     ####################################
###################################################################################################

# Continuous and categorical columns
continuous_features = ["TP2", "DV_pressure", "Oil_temperature", "Motor_current", "Reservoirs"]
categorical_features = ["COMP", "DV_eletric", "Towers", "LPS", "Pressure_switch", "Oil_level", "Caudal_impulses"]
columns_to_exclude = ['timestamp', 'panne', 'LPS', 'Pressure_switch', 'Oil_level', 'Caudal_impulses']

target = "panne"

# Normalize the continuous columns
scaler = MinMaxScaler()
df[continuous_features] = scaler.fit_transform(df[continuous_features])

# Define the training and test periods
train_periods = [
    {'start': '2020-04-17 23:30:00', 'end': '2020-04-18 00:15:00'},  # Panne1
    {'start': '2020-05-29 23:00:00', 'end': '2020-05-29 23:45:00'},  # Panne2
    {'start': '2020-06-05 09:30:00', 'end': '2020-06-05 10:15:00'}   # Panne3
]

test_periods = [{'start': '2020-07-15 14:00:00', 'end': '2020-07-15 14:45:00'}]  # Panne4

# Row indices of each training period
train_indices = []
for period in train_periods:
    start_train = pd.Timestamp(period['start'])
    end_train = pd.Timestamp(period['end'])
    indices = df[(df['timestamp'] >= start_train) & (df['timestamp'] <= end_train)].index.tolist()
    train_indices.extend(indices)


# Row indices of the test period
start_test = pd.Timestamp(test_periods[0]['start'])
end_test = pd.Timestamp(test_periods[0]['end'])
test_indices = df[(df['timestamp'] >= start_test) & (df['timestamp'] <= end_test)].index.tolist()

# Build the training and test sets
X_train = df.loc[train_indices].drop(columns=columns_to_exclude).values
y_train = df.loc[train_indices, 'panne'].values

X_test = df.loc[test_indices].drop(columns=columns_to_exclude).values
y_test = df.loc[test_indices, 'panne'].values

# Build sliding-window sequences: each window of `sequence_length` rows
# is labelled with the target of the row that immediately follows it
def create_sequences(X, y, sequence_length=30):
    X_seq, y_seq = [], []
    for i in range(len(X) - sequence_length):
        X_seq.append(X[i:i + sequence_length])
        y_seq.append(y[i + sequence_length])
    return np.array(X_seq), np.array(y_seq)

# Sequences for the training and test sets
sequence_length = 30
X_train_seq, y_train_seq = create_sequences(X_train, y_train, sequence_length)
X_test_seq, y_test_seq = create_sequences(X_test, y_test, sequence_length)

###################### Dataset summary ###########################
display(df.loc[train_indices].drop(columns=columns_to_exclude).columns)

# Compute and print the duration of each period
for i, period in enumerate(train_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time  # period duration
    print(f"Période Train {i} : {duration}")

for i, period in enumerate(test_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time  # period duration
    print(f"Période Test  {i} : {duration}")
print("---------------------------------")
# Class distribution
values, counts = np.unique(y_train, return_counts=True)
print("Distribution des modalités dans y_train :")
for value, count in zip(values, counts):
    print(f"Modalité {value} : {count} observations")
####################################################################

Index(['TP2', 'DV_pressure', 'Oil_temperature', 'Motor_current', 'Reservoirs',
       'COMP', 'DV_eletric', 'Towers'],
      dtype='object')

Période Train 1 : 0:45:00
Période Train 2 : 0:45:00
Période Train 3 : 0:45:00
Période Test  1 : 0:45:00
---------------------------------
Distribution des modalités dans y_train :
Modalité 0 : 270 observations
Modalité 1 : 273 observations
Modalité 2 : 270 observations
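In Tests 5 and 6 the three 45-minute training windows are concatenated before `create_sequences` runs. As a result, about 30 sequences at each of the two boundaries mix the end of one failure window with the start of the next, even though the windows are weeks apart. The sketch below builds the sequences period by period and only then stacks them. It reproduces the notebook's `create_sequences` logic so that it runs on its own. The `create_sequences_per_period` helper and its `periods` argument, a list of `(X, y)` arrays with one pair per period, are hypothetical.

```python
import numpy as np


def create_sequences(X, y, sequence_length=30):
    # Same logic as the notebook: window X[i:i + L], label y[i + L]
    X_seq, y_seq = [], []
    for i in range(len(X) - sequence_length):
        X_seq.append(X[i:i + sequence_length])
        y_seq.append(y[i + sequence_length])
    return np.array(X_seq), np.array(y_seq)


def create_sequences_per_period(periods, sequence_length=30):
    """Hypothetical helper: build windows inside each (X, y) period, then stack them.

    No window (or its label) mixes rows from two different periods.
    """
    X_parts, y_parts = [], []
    for X_p, y_p in periods:
        X_s, y_s = create_sequences(X_p, y_p, sequence_length)
        if len(X_s):  # skip periods with no more than sequence_length rows
            X_parts.append(X_s)
            y_parts.append(y_s)
    return np.concatenate(X_parts), np.concatenate(y_parts)


# Demo: three periods of 271 rows x 8 features, as in Test 5 (813 training rows)
rng = np.random.default_rng(0)
periods = [(rng.random((271, 8)), rng.integers(0, 3, 271)) for _ in range(3)]
X_seq, y_seq = create_sequences_per_period(periods)
print(X_seq.shape)  # (723, 30, 8)
```

With the notebook's data, each `(X, y)` pair would come from that period's own rows, e.g. `df.loc[idx].drop(columns=columns_to_exclude).values` and `df.loc[idx, 'panne'].values`. Concatenating first yields 783 sequences (813 - 30). Splitting per period yields 723 (3 × 241), which leaves out the 60 cross-boundary windows.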


In [106]:
from tensorflow.keras.layers import Input

# Build the LSTM model
model = Sequential([
    Input(shape=(X_train_seq.shape[1], X_train_seq.shape[2])),  # (sequence_length, n_features)
    LSTM(256, return_sequences=True, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    LSTM(128, return_sequences=True, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    LSTM(64, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    Dense(64, activation='relu', kernel_regularizer=l2(0.01)),
    Dense(3, activation='softmax')  # 3 classes: 0 (no failure), 1 (failure), 2 (warning)
])


model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Early-stopping callback
early_stopping = EarlyStopping(
    monitor='val_loss',        # watch the validation loss
    patience=5,                # epochs without improvement before stopping
    restore_best_weights=True  # restore the weights of the best epoch
)

# Train the model with early stopping
# (the test set is passed as validation data, so it also drives early stopping)
history = model.fit(
    X_train_seq, y_train_seq,
    validation_data=(X_test_seq, y_test_seq),
    epochs=30,
    batch_size=64,
    callbacks=[early_stopping]
)

# Evaluate the model
loss, accuracy = model.evaluate(X_test_seq, y_test_seq)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

# Save the model
model.save("lstm_panne_model_5.keras")

print("Modèle LSTM entraîné et sauvegardé avec succès.")

Epoch 1/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 232ms/step - accuracy: 0.4577 - loss: 6.4586 - val_accuracy: 0.3983 - val_loss: 5.2145
Epoch 2/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 184ms/step - accuracy: 0.5039 - loss: 4.7081 - val_accuracy: 0.4689 - val_loss: 3.8999
Epoch 3/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 178ms/step - accuracy: 0.5386 - loss: 3.4944 - val_accuracy: 0.4606 - val_loss: 3.0526
Epoch 4/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 178ms/step - accuracy: 0.5394 - loss: 2.6690 - val_accuracy: 0.4523 - val_loss: 2.5122
Epoch 5/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 179ms/step - accuracy: 0.5460 - loss: 2.1034 - val_accuracy: 0.4647 - val_loss: 2.1682
Epoch 6/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 184ms/step - accuracy: 0.5598 - loss: 1.7300 - val_accuracy: 0.4730 - val_loss: 1.9119
Epoch 7/30
[1m13/13[0m [

In [107]:
from sklearn.metrics import confusion_matrix, classification_report

y_pred = np.argmax(model.predict(X_test_seq), axis=1)
print(confusion_matrix(y_test_seq, y_pred))
print(classification_report(y_test_seq, y_pred))

[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 105ms/step
[[ 7 35 18]
 [ 0 91  0]
 [34 20 36]]
              precision    recall  f1-score   support

           0       0.17      0.12      0.14        60
           1       0.62      1.00      0.77        91
           2       0.67      0.40      0.50        90

    accuracy                           0.56       241
   macro avg       0.49      0.51      0.47       241
weighted avg       0.53      0.56      0.51       241



<ul style="text-align: center;font-family: times, serif; font-size:14pt; color:Red;">
<strong>########################################################################</strong>    
<strong>################################ TEST 6  ################################</strong>
<strong>########################################################################</strong>    
</ul>

In [116]:
###################################################################################################
################ Train : Panne1 & Panne2 & Panne3  (15 min + 15 min + 15 min) per failure   ######
################ Test  : Panne4  (all)                    #########################################
###################################################################################################

# Continuous and categorical columns
continuous_features = ["TP2", "DV_pressure", "Oil_temperature", "Motor_current", "Reservoirs"]
categorical_features = ["COMP", "DV_eletric", "Towers", "LPS", "Pressure_switch", "Oil_level", "Caudal_impulses"]
columns_to_exclude = ['timestamp', 'panne', 'LPS', 'Pressure_switch', 'Oil_level', 'Caudal_impulses']

target = "panne"

# Normalize the continuous columns
scaler = MinMaxScaler()
df[continuous_features] = scaler.fit_transform(df[continuous_features])

# Define the training and test periods
train_periods = [
    {'start': '2020-04-17 23:30:00', 'end': '2020-04-18 00:15:00'},  # Panne1
    {'start': '2020-05-29 23:00:00', 'end': '2020-05-29 23:45:00'},  # Panne2
    {'start': '2020-06-05 09:30:00', 'end': '2020-06-05 10:15:00'}   # Panne3
]

test_periods = [{'start': '2020-06-07 14:30:10', 'end': '2020-09-01 03:59:50'}]  # Panne4

# Row indices of each training period
train_indices = []
for period in train_periods:
    start_train = pd.Timestamp(period['start'])
    end_train = pd.Timestamp(period['end'])
    indices = df[(df['timestamp'] >= start_train) & (df['timestamp'] <= end_train)].index.tolist()
    train_indices.extend(indices)


# Row indices of the test period
start_test = pd.Timestamp(test_periods[0]['start'])
end_test = pd.Timestamp(test_periods[0]['end'])
test_indices = df[(df['timestamp'] >= start_test) & (df['timestamp'] <= end_test)].index.tolist()

# Build the training and test sets
X_train = df.loc[train_indices].drop(columns=columns_to_exclude).values
y_train = df.loc[train_indices, 'panne'].values

X_test = df.loc[test_indices].drop(columns=columns_to_exclude).values
y_test = df.loc[test_indices, 'panne'].values

# Build sliding-window sequences: each window of `sequence_length` rows
# is labelled with the target of the row that immediately follows it
def create_sequences(X, y, sequence_length=30):
    X_seq, y_seq = [], []
    for i in range(len(X) - sequence_length):
        X_seq.append(X[i:i + sequence_length])
        y_seq.append(y[i + sequence_length])
    return np.array(X_seq), np.array(y_seq)

# Sequences for the training and test sets
sequence_length = 30
X_train_seq, y_train_seq = create_sequences(X_train, y_train, sequence_length)
X_test_seq, y_test_seq = create_sequences(X_test, y_test, sequence_length)

###################### Dataset summary ###########################
display(df.loc[train_indices].drop(columns=columns_to_exclude).columns)

# Compute and print the duration of each period
for i, period in enumerate(train_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time  # period duration
    print(f"Période Train {i} : {duration}")

for i, period in enumerate(test_periods, 1):
    start_time = datetime.strptime(period['start'], '%Y-%m-%d %H:%M:%S')
    end_time = datetime.strptime(period['end'], '%Y-%m-%d %H:%M:%S')
    duration = end_time - start_time  # period duration
    print(f"Période Test  {i} : {duration}")
print("---------------------------------")
# Class distribution
values, counts = np.unique(y_train, return_counts=True)
print("Distribution des modalités dans y_train :")
for value, count in zip(values, counts):
    print(f"Modalité {value} : {count} observations")
####################################################################

Index(['TP2', 'DV_pressure', 'Oil_temperature', 'Motor_current', 'Reservoirs',
       'COMP', 'DV_eletric', 'Towers'],
      dtype='object')

Période Train 1 : 0:45:00
Période Train 2 : 0:45:00
Période Train 3 : 0:45:00
Période Test  1 : 85 days, 13:29:40
---------------------------------
Distribution des modalités dans y_train :
Modalité 0 : 270 observations
Modalité 1 : 273 observations
Modalité 2 : 270 observations
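Each cell calls `scaler.fit_transform` on the whole `df`. The minimum and maximum of the test period therefore also shape the features the model sees during training. The sketch below fits `MinMaxScaler` on the training rows only and reuses those statistics for the test rows. The toy DataFrame and the `scale_train_test` helper are hypothetical. Test values outside the training range simply map outside [0, 1].

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def scale_train_test(df, train_idx, test_idx, columns):
    """Hypothetical helper: fit the scaler on training rows only, then transform both splits."""
    scaler = MinMaxScaler()
    train = df.loc[train_idx, columns].copy()
    test = df.loc[test_idx, columns].copy()
    train[columns] = scaler.fit_transform(train[columns])  # learn min/max from training rows
    test[columns] = scaler.transform(test[columns])        # reuse them; no refit on test rows
    return train, test, scaler


df_toy = pd.DataFrame({"TP2": [0.0, 5.0, 10.0, 20.0], "Reservoirs": [1.0, 2.0, 3.0, 4.0]})
train, test, scaler = scale_train_test(df_toy, [0, 1, 2], [3], ["TP2", "Reservoirs"])
print(train.to_numpy())
print(test.to_numpy())
```

Here the training rows span 0 to 10 for `TP2` and 1 to 3 for `Reservoirs`. The test row (20, 4) therefore scales to (2.0, 1.5). With the notebook's data, `train_indices`, `test_indices` and `continuous_features` would take the place of the toy arguments.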


In [117]:
from tensorflow.keras.layers import Input

# Build the LSTM model
model = Sequential([
    Input(shape=(X_train_seq.shape[1], X_train_seq.shape[2])),  # (sequence_length, n_features)
    LSTM(256, return_sequences=True, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    LSTM(128, return_sequences=True, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    LSTM(64, kernel_regularizer=l2(0.01)),
    Dropout(0.3),
    Dense(64, activation='relu', kernel_regularizer=l2(0.01)),
    Dense(3, activation='softmax')  # 3 classes: 0 (no failure), 1 (failure), 2 (warning)
])


model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Early-stopping callback
early_stopping = EarlyStopping(
    monitor='val_loss',        # watch the validation loss
    patience=5,                # epochs without improvement before stopping
    restore_best_weights=True  # restore the weights of the best epoch
)

# Train the model with early stopping
# (the test set is passed as validation data, so it also drives early stopping)
history = model.fit(
    X_train_seq, y_train_seq,
    validation_data=(X_test_seq, y_test_seq),
    epochs=30,
    batch_size=64,
    callbacks=[early_stopping]
)

# Evaluate the model
loss, accuracy = model.evaluate(X_test_seq, y_test_seq)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")

# Save the model
model.save("lstm_panne_model_6.keras")

print("Modèle LSTM entraîné et sauvegardé avec succès.")

Epoch 1/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m673s[0m 56s/step - accuracy: 0.4368 - loss: 6.4831 - val_accuracy: 0.3313 - val_loss: 5.6249
Epoch 2/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m703s[0m 59s/step - accuracy: 0.4873 - loss: 4.7548 - val_accuracy: 0.6584 - val_loss: 3.8265
Epoch 3/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m763s[0m 64s/step - accuracy: 0.5079 - loss: 3.5358 - val_accuracy: 0.6932 - val_loss: 2.7623
Epoch 4/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m701s[0m 58s/step - accuracy: 0.5442 - loss: 2.6898 - val_accuracy: 0.6627 - val_loss: 2.2320
Epoch 5/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m669s[0m 56s/step - accuracy: 0.5342 - loss: 2.1424 - val_accuracy: 0.6093 - val_loss: 2.0352
Epoch 6/30
[1m13/13[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m667s[0m 56s/step - accuracy: 0.5401 - loss: 1.8088 - val_accuracy: 0.6083 - val_loss: 1.8853
Epoch 7/30
[1m13/13[0m [3

In [118]:
from sklearn.metrics import confusion_matrix, classification_report

y_pred = np.argmax(model.predict(X_test_seq), axis=1)
print(confusion_matrix(y_test_seq, y_pred))
print(classification_report(y_test_seq, y_pred))

[1m23101/23101[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m794s[0m 34ms/step
[[498351 169300  69867]
 [    38   1583      0]
 [    39     28     23]]
              precision    recall  f1-score   support

           0       1.00      0.68      0.81    737518
           1       0.01      0.98      0.02      1621
           2       0.00      0.26      0.00        90

    accuracy                           0.68    739229
   macro avg       0.34      0.64      0.28    739229
weighted avg       1.00      0.68      0.80    739229
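Accuracy (0.68) and the weighted averages are dominated by class 0, which holds 737,518 of the 739,229 test sequences. Recomputing precision and recall directly from the confusion matrix above makes the trade-off explicit. Class 1 is detected almost every time (recall 0.98), but only about 1 in 100 sequences predicted as class 1 really belongs to it, because 169,300 class-0 sequences are also predicted as 1. The snippet below simply redoes that arithmetic with NumPy on the printed matrix.

```python
import numpy as np

# Confusion matrix printed for Test 6 (rows = true class, columns = predicted class)
cm = np.array([
    [498351, 169300, 69867],
    [    38,   1583,     0],
    [    39,     28,    23],
])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # divide by column totals (predicted counts)
recall = true_positives / cm.sum(axis=1)     # divide by row totals (actual counts)
f1 = 2 * precision * recall / (precision + recall)

for label, p, r, f in zip(["0", "1", "2"], precision, recall, f1):
    print(f"class {label}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

Rounded to two decimals, these values match the `classification_report` output: for class 1, precision 1583 / 170911 ≈ 0.01 and recall 1583 / 1621 ≈ 0.98.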

