In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# Topic: EX2 - Turbofan RUL Prediction
**Task**: Predict the remaining useful life (RUL) of turbofan engines based on given sensor data (time series data). It is a regression problem.
**Data**: Turbofan engine degradation simulation data (NASA) - [Link](https://data.nasa.gov/dataset/Turbofan-Engine-Degradation-Simulation-Data-Set/vrks-gjie). See also the topic [introduction notebook](https://github.com/nina-prog/damage-propagation-modeling/blob/2fb8c1a1102a48d7abbf04e4031807790a913a99/notebooks/Turbofan%20remaining%20useful%20life%20Prediction.ipynb).

**Subtasks**:
1. Perform a deep **exploratory data analysis (EDA)** on the given data.
2. Implement a more efficient **sliding window method** for time series data analysis.
3. Apply **traditional machine learning methods** (SOTA) to predict the remaining useful life. This includes data preparation, feature extraction, feature selection, model selection, and model parameter optimization. -> 🎯 **Focus on this task**: data preparation and feature selection (feature extraction is part of the sliding window method).
4. Create **neural network models** to predict the remaining useful life. Includes different architectures like Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), or Attention Models. Note: You can search for SOTA research papers and reproduce current state-of-the-art models.


# Imports + Settings

In [3]:
!pip install colorlog

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [4]:
# standard-library and third-party imports
import pandas as pd
import numpy as np
import os
from typing import List, Union
import time
from tqdm.notebook import tqdm
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.neighbors import LocalOutlierFactor
from sklearn.covariance import EllipticEnvelope
from scipy import stats
from scipy.stats import multivariate_normal, zscore
from scipy.stats.mstats import winsorize  # public import path instead of the private _mstats_basic module

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset, random_split
from torch.optim.lr_scheduler import StepLR

In [5]:
# source code
os.chdir("../") # set working directory to root of project
#os.getcwd() # check current working directory


from src.utils import load_data, load_config, train_val_split_by_group
from src.rolling_window_creator import RollingWindowDatasetCreator, calculate_RUL
from src.data_cleaning import identify_missing_values, identify_single_unique_features, format_dtype, clean_data
import src.nn_utils as nu
import src.transformer_fred as tff

In [6]:
# settings
sns.set_style("whitegrid")
sns.set_palette("Set2")
sns.set(rc={"figure.dpi":100, 'savefig.dpi':200})
sns.set_context('notebook')

In [7]:
np.random.seed(42)

# Paths

In [8]:
PATH_TO_CONFIG = "configs/config.yaml"

# Load config + Data

In [9]:
config = load_config(PATH_TO_CONFIG) # config is dict

In [10]:
%%time
train_data, test_data, test_RUL_data = load_data(config_path=PATH_TO_CONFIG, dataset_num=1)

2024-05-28 11:11:04 [src.utils:60] [INFO] >>>> Loading data set 1...
2024-05-28 11:11:04 [src.utils:89] [INFO] >>>> Loaded raw data for dataset 1.
2024-05-28 11:11:04 [src.utils:90] [INFO] >>>> Train Data: (20631, 26)
2024-05-28 11:11:04 [src.utils:91] [INFO] >>>> Test Data: (13096, 26)
2024-05-28 11:11:04 [src.utils:92] [INFO] >>>> Test RUL Data: (100, 1)
CPU times: user 63.5 ms, sys: 13.8 ms, total: 77.3 ms
Wall time: 77.4 ms


In [11]:
# count unit numbers in test set
print(f"Number of unique unit numbers in test set: {test_data['UnitNumber'].nunique()}")
# minimum number of cycles per unit in the test set --> the window size should not exceed this value, otherwise the shortest units yield no complete window (e.g. a window size of 11 is too large for a unit with only 10 cycles)
print("Min number of cycles in test set for a unit number: ", test_data.groupby("UnitNumber")["Cycle"].count().min())

Number of unique unit numbers in test set: 100
Min number of cycles in test set for a unit number:  31
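
The check above only reports the shortest trajectory. To see which units a given window size would actually exclude, a small helper can list every unit with fewer rows than the window. This is a minimal sketch on toy data; `units_shorter_than` is a hypothetical helper, not part of `src`, and it assumes the `UnitNumber` column name used in this notebook.

```python
import pandas as pd

def units_shorter_than(df: pd.DataFrame, window_size: int,
                       group_column: str = "UnitNumber") -> list:
    """Return the units whose trajectory has fewer rows than `window_size`."""
    lengths = df.groupby(group_column).size()
    return sorted(lengths[lengths < window_size].index.tolist())

# Toy data: unit 1 has 5 cycles, unit 2 has 3 cycles.
toy = pd.DataFrame({
    "UnitNumber": [1, 1, 1, 1, 1, 2, 2, 2],
    "Cycle":      [1, 2, 3, 4, 5, 1, 2, 3],
})
short_units = units_shorter_than(toy, window_size=4)
print(short_units)
```

Applied to `test_data`, an empty result means every unit can supply at least one full window of that size.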


---
Test Data Cleaning Functionality and its impact on Rolling Window Creation

In [12]:
# clean data (with outlier removal, where no samples are dropped but the outliers are replaced, method='winsorize')
# TODO: outsource settings to config file
cleaned_train, cleaned_test = clean_data(train_data, test_data, method='winsorize', ignore_columns=['UnitNumber', 'Cycle'], threshold_missing=0.1, threshold_corr=0.0, contamination=0.05)

2024-05-28 11:11:06 [src.data_cleaning:134] [INFO] >>>> Cleaning train and test data...
2024-05-28 11:11:06 [src.data_cleaning:136] [INFO] >>>> Formatting column types...
2024-05-28 11:11:06 [src.data_cleaning:69] [DEBUG] >>>> Found 0 categorical columns: []
2024-05-28 11:11:06 [src.data_cleaning:69] [DEBUG] >>>> Found 0 categorical columns: []
2024-05-28 11:11:06 [src.data_cleaning:141] [INFO] >>>> Handling duplicates...
2024-05-28 11:11:06 [src.data_cleaning:146] [INFO] >>>> Removing outliers...
2024-05-28 11:11:06 [src.outlier_detection:150] [DEBUG] >>>> Removing outliers using method: winsorize ...
2024-05-28 11:11:06 [src.outlier_detection:98] [DEBUG] >>>> Found 1031 outliers to be replaced (winsorized).
2024-05-28 11:11:06 [src.outlier_detection:100] [DEBUG] >>>> Original DataFrame shape: (20631, 26), Resulting Da
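
The log reports that outliers are replaced rather than dropped, which keeps every cycle of every unit available for windowing. The idea can be illustrated with percentile clipping, a close variant of winsorizing. This is only a sketch: `winsorize_column` and its percentile limits are assumptions for illustration, and the actual logic in `src.outlier_detection` (including how `contamination` is used) is not shown in this notebook.

```python
import numpy as np

def winsorize_column(values: np.ndarray, lower_pct: float = 5.0,
                     upper_pct: float = 95.0) -> np.ndarray:
    """Clip values to the given percentiles instead of dropping rows."""
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, lo, hi)

# One extreme reading (100.0) among otherwise small values.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
clipped = winsorize_column(x, lower_pct=0.0, upper_pct=75.0)
print(clipped)  # the extreme value is pulled down to the 75th percentile
```

The output has the same length as the input, so the time axis of each unit stays intact.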

In [13]:
cleaned_train_data = calculate_RUL(cleaned_train, time_column= "Cycle", group_column= "UnitNumber")
cleaned_test_data = nu.calculate_RUL_test(cleaned_test, test_RUL_data)
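
The two calls above attach RUL labels differently for train and test data. Training units run until failure, so the label is the distance to the unit's last cycle. Test units are cut off early, so the provided final RUL has to be added on top. The sketch below shows that arithmetic on toy data; `add_train_rul` and `add_test_rul` are hypothetical stand-ins, not the implementations in `src`.

```python
import pandas as pd

def add_train_rul(df, time_column="Cycle", group_column="UnitNumber"):
    """Training units run to failure, so RUL = last cycle - current cycle."""
    out = df.copy()
    last_cycle = out.groupby(group_column)[time_column].transform("max")
    out["RUL"] = last_cycle - out[time_column]
    return out

def add_test_rul(df, final_rul, time_column="Cycle", group_column="UnitNumber"):
    """Test units are truncated; `final_rul` maps unit -> RUL at its last row."""
    out = df.copy()
    last_cycle = out.groupby(group_column)[time_column].transform("max")
    out["RUL"] = out[group_column].map(final_rul) + (last_cycle - out[time_column])
    return out

toy = pd.DataFrame({"UnitNumber": [1, 1, 1, 2, 2], "Cycle": [1, 2, 3, 1, 2]})
train_labeled = add_train_rul(toy)
test_labeled = add_test_rul(toy, final_rul={1: 10, 2: 20})
print(train_labeled["RUL"].tolist())
print(test_labeled["RUL"].tolist())
```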

In [14]:
# Group by 'UnitNumber' and get the size of each group
group_sizes = test_data.groupby('UnitNumber').size()

# Calculate min, max, and mean of the group sizes
min_size = group_sizes.min()
max_size = group_sizes.max()
mean_size = group_sizes.mean()
sd_size = group_sizes.std()

print(f"Min group size: {min_size}")
print(f"Max group size: {max_size}")
print(f"Mean group size: {mean_size}")
print(f"Sd group size: {sd_size}")

Min group size: 31
Max group size: 303
Mean group size: 130.96
Sd group size: 53.593479175185195


In [15]:
train_data.describe()

statistic,UnitNumber,Cycle,Operation Setting 1,Operation Setting 2,Operation Setting 3,Sensor Measure 1,Sensor Measure 2,Sensor Measure 3,Sensor Measure 4,Sensor Measure 5,...,Sensor Measure 12,Sensor Measure 13,Sensor Measure 14,Sensor Measure 15,Sensor Measure 16,Sensor Measure 17,Sensor Measure 18,Sensor Measure 19,Sensor Measure 20,Sensor Measure 21
count,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,...,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0,20631.0
mean,51.506568,108.807862,-9e-06,2e-06,100.0,518.67,642.680934,1590.523119,1408.933782,14.62,...,521.41347,2388.096152,8143.752722,8.442146,0.03,393.210654,2388.0,100.0,38.816271,23.289705
std,29.227633,68.88099,0.002187,0.000293,0.0,6.537152e-11,0.500053,6.13115,9.000605,3.3947e-12,...,0.737553,0.071919,19.076176,0.037505,1.556432e-14,1.548763,0.0,0.0,0.180746,0.108251
min,1.0,1.0,-0.0087,-0.0006,100.0,518.67,641.21,1571.04,1382.25,14.62,...,518.69,2387.88,8099.94,8.3249,0.03,388.0,2388.0,100.0,38.14,22.8942
25%,26.0,52.0,-0.0015,-0.0002,100.0,518.67,642.325,1586.26,1402.36,14.62,...,520.96,2388.04,8133.245,8.4149,0.03,392.0,2388.0,100.0,38.7,23.2218
50%,52.0,104.0,0.0,0.0,100.0,518.67,642.64,1590.1,1408.04,14.62,...,521.48,2388.09,8140.54,8.4389,0.03,393.0,2388.0,100.0,38.83,23.2979
75%,77.0,156.0,0.0015,0.0003,100.0,518.67,643.0,1594.38,1414.555,14.62,...,521.95,2388.14,8148.31,8.4656,0.03,394.0,2388.0,100.0,38.95,23.3668
max,100.0,362.0,0.0087,0.0006,100.0,518.67,644.53,1616.91,1441.49,14.62,...,523.38,2388.56,8293.72,8.5848,0.03,400.0,2388.0,100.0,39.43,23.6184
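
Several columns in the summary have identical `min` and `max` (for example `Sensor Measure 1` at 518.67), yet their reported `std` is a tiny non-zero number such as 6.5e-11 from floating-point rounding. Testing `std == 0` would therefore miss them, while counting distinct values does not. The sketch below shows this on toy data; `constant_columns` is a hypothetical helper and not necessarily how `identify_single_unique_features` works.

```python
import pandas as pd

def constant_columns(df: pd.DataFrame) -> list:
    """Columns holding a single distinct value (no information for a model)."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]

toy = pd.DataFrame({
    "Sensor Measure 1": [518.67, 518.67, 518.67],
    "Sensor Measure 2": [641.82, 642.15, 642.35],
    "Operation Setting 3": [100.0, 100.0, 100.0],
})
constant = constant_columns(toy)
print(constant)
```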


In [27]:
## create dataset
window_size = 85
train_data, test_data = nu.scale_data(cleaned_train_data, cleaned_test_data)
#train, val = train_val_split_by_group(train_data)

X_train, y_train = nu.create_sliding_window(train_data, window_size = window_size)

# Cap the training RUL targets at the largest RUL value found in the test labels
y_train = np.clip(y_train, a_min=None, a_max=test_RUL_data["RUL"].max())

#X_val, y_val = nu.create_sliding_window(val, window_size = window_size)

#test_data = nu.scale_data(cleaned_test_data)
X_test, y_test = nu.create_sliding_window(test_data, typ = "test", window_size = window_size)
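
The windowing and target capping in this cell can be sketched for a single unit with NumPy. The sketch assumes each window is labeled with the RUL at its last cycle and that units shorter than the window yield no windows; `make_windows` is a hypothetical function, and `nu.create_sliding_window` may handle both points differently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(features, rul, window_size):
    """Build all length-`window_size` windows for ONE unit.

    features: (n_cycles, n_features); rul: (n_cycles,)
    Returns X: (n_windows, window_size, n_features) and y: (n_windows,).
    """
    if len(features) < window_size:
        return np.empty((0, window_size, features.shape[1])), np.empty((0,))
    # sliding_window_view puts the window axis last, so move it to axis 1
    X = sliding_window_view(features, window_size, axis=0).transpose(0, 2, 1)
    y = rul[window_size - 1:]  # label = RUL at the last cycle of each window
    return X, y

features = np.arange(10, dtype=float).reshape(5, 2)  # 5 cycles, 2 features
rul = np.array([4.0, 3.0, 2.0, 1.0, 0.0])
X, y = make_windows(features, rul, window_size=3)
y_capped = np.clip(y, a_min=None, a_max=1.5)  # cap early-life targets
print(X.shape, y, y_capped)
```

`sliding_window_view` returns a view rather than a copy, so the windows cost no extra memory until they are modified or stacked across units.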

In [31]:
# Model and training hyperparameters
seq_len, batch_size, feature_size = X_train.shape[1], 64, X_train.shape[2]
num_heads, num_layers, project_dim  = 8, 1, 12 * 4 * 2
num_epochs = 450
learning_rate = 0.0001

print(seq_len)
# Create dataset and dataloaders
train_dataset = tff.TurbofanDataset(X_train, y_train)
#val_dataset = nu.TurbofanDataset(X_val, y_val)
test_dataset = tff.TurbofanDataset(X_test, y_test)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
#val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
    
# Initialize model, criterion, optimizer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = tff.TransformerModel(feature_size, num_heads, num_layers, project_dim = project_dim, window_size = seq_len, dropout = 0.1).to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
scheduler = StepLR(optimizer, step_size=30, gamma=0.5)


print(f"The model has in total {tff.count_parameters(model)} parameters!!")
    
# Training loop
for epoch in range(num_epochs):
    train_loss = tff.train_model(model, train_loader, criterion, optimizer, device)
    #val_loss = evaluate_model(model, val_loader, criterion, device)
    test_loss = tff.evaluate_model(model, test_loader, criterion, device)
    scheduler.step()
    
    print(f"Epoch {epoch+1}/{num_epochs}, Train_L: {train_loss:.2f}, Test_L: {test_loss:.2f}, Test_RMSE: {np.sqrt(test_loss):.2f} ")

85
The model has in total 2973611 parameters!!
Epoch 1/450, Train_L: 1437.44, Test_L: 812.03, Test_RMSE: 28.50 
Epoch 2/450, Train_L: 255.63, Test_L: 1010.26, Test_RMSE: 31.78 
Epoch 3/450, Train_L: 168.56, Test_L: 1025.67, Test_RMSE: 32.03 
Epoch 4/450, Train_L: 135.21, Test_L: 659.92, Test_RMSE: 25.69 
Epoch 5/450, Train_L: 116.21, Test_L: 758.20, Test_RMSE: 27.54 
Epoch 6/450, Train_L: 104.35, Test_L: 537.76, Test_RMSE: 23.19 
Epoch 7/450, Train_L: 99.39, Test_L: 516.76, Test_RMSE: 22.73 
Epoch 8/450, Train_L: 87.98, Test_L: 461.62, Test_RMSE: 21.49 
Epoch 9/450, Train_L: 78.18, Test_L: 615.90, Test_RMSE: 24.82 
Epoch 10/450, Train_L: 78.53, Test_L: 557.48, Test_RMSE: 23.61 
Epoch 11/450, Train_L: 73.66, Test_L: 595.11, Test_RMSE: 24.39 
Epoch 12/450, Train_L: 64.92, Test_L: 597.38, Test_RMSE: 24.44 
Epoch 13/450, Train_L: 62.00, Test_L: 506.90, Test_RMSE: 22.51 
Epoch 14/450, Train_L: 58.37, Test_L: 584.28, Test_RMSE: 24.17 
Epoch 15/450, Train_L: 55.45, Test_L: 558.71, Test_RMSE: 

KeyboardInterrupt: 
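
The run above uses `StepLR(step_size=30, gamma=0.5)` with a starting learning rate of 0.0001 and calls `scheduler.step()` once per epoch. The resulting schedule is a simple closed form, sketched below. `step_lr` is a hypothetical helper that reproduces the arithmetic only and does not touch the optimizer.

```python
def step_lr(initial_lr: float, epoch: int, step_size: int = 30,
            gamma: float = 0.5) -> float:
    """Learning rate in effect during `epoch` (0-based) under a StepLR schedule."""
    return initial_lr * gamma ** (epoch // step_size)

for epoch in (0, 29, 30, 120, 449):
    print(epoch, step_lr(1e-4, epoch))
```

Over the planned 450 epochs the rate is halved 14 times, which leaves it near 6e-9 in the final epochs, so very little learning happens late in the schedule.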

In [19]:
## 128 --> epoch 120: 19.99 size 140

torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'model_FD1_20.pth')

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from ray import tune, train
from ray.tune.schedulers import ASHAScheduler
from ray.tune import CLIReporter
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader
from sklearn.preprocessing import StandardScaler

# Uses the project helpers and the cleaned data frames (`cleaned_train_data`, `cleaned_test_data`) from the earlier cells.

# Define training function
def train_model(config, checkpoint_dir=None):
    window_size = config["window_size"]
    project_dim = config["project_dim"]
    num_heads = config["num_heads"]
    
    # Scale train and test data with the same call signature as in the training cell above
    train_data, val_data = nu.scale_data(cleaned_train_data, cleaned_test_data)
    X_train, y_train = nu.create_sliding_window(train_data, window_size=window_size)

    # NOTE: the test set doubles as the validation set here, so the tuned val_loss is optimistic
    X_val, y_val = nu.create_sliding_window(val_data, window_size=window_size)

    # Create datasets and dataloaders (same classes as in the training cell above)
    train_dataset = tff.TurbofanDataset(X_train, y_train)
    val_dataset = tff.TurbofanDataset(X_val, y_val)

    train_loader = DataLoader(train_dataset, batch_size=config["batch_size"], shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=config["batch_size"], shuffle=False)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = tff.TransformerModel(feature_size=X_train.shape[2], num_heads=num_heads, num_layers=config["num_layers"], project_dim=project_dim, window_size=window_size).to(device)
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.0001)
    scheduler = StepLR(optimizer, step_size=30, gamma=0.5)
    
    for epoch in range(config["num_epochs"]):
        model.train()
        running_loss = 0.0
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            targets = targets.view(-1, 1)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)
        train_loss = running_loss / len(train_loader.dataset)
        
        model.eval()
        running_loss = 0.0
        with torch.no_grad():
            for inputs, targets in val_loader:
                inputs, targets = inputs.to(device), targets.to(device)
                targets = targets.view(-1, 1)
                outputs = model(inputs)
                loss = criterion(outputs, targets)
                running_loss += loss.item() * inputs.size(0)
        val_loss = running_loss / len(val_loader.dataset)
        
        train.report({"val_loss":val_loss, "train_loss":train_loss})
        scheduler.step()

# Define search space and Ray Tune configuration
search_space = {
    "window_size": tune.choice([120, 130 ,140, 150, 160]),
    "project_dim": tune.choice([48, 96, 192]),
    "num_heads": tune.choice([8, 12, 16, 24]),
    "num_layers": 1,
    "batch_size": 64,
    "num_epochs": 80  # Reduced for quicker tuning
}

# Use ASHAScheduler for efficient hyperparameter search
scheduler = ASHAScheduler(
    metric="val_loss",
    mode="min",
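    max_t=175,  # note: each trial reports at most config["num_epochs"] = 80 iterations, so this cap is never reached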
    grace_period=25,
    reduction_factor=2
)

# Configure the reporter
reporter = CLIReporter(
    metric_columns=["val_loss", "train_loss", "training_iteration"]
)

# Run hyperparameter search
result = tune.run(
    train_model,
    resources_per_trial={"cpu": 1, "gpu": 1},
    config=search_space,
    num_samples=30,
    scheduler=scheduler,
    progress_reporter=reporter
)

# Get the best trial
best_trial = result.get_best_trial("val_loss", "min", "last")
print("Best trial config: {}".format(best_trial.config))
print("Best trial final validation loss: {}".format(best_trial.last_result["val_loss"]))


2024-05-25 12:52:21,472	INFO worker.py:1749 -- Started a local Ray instance.
2024-05-25 12:52:21,871	INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `tune.run(...)`.
2024-05-25 12:52:21,873	INFO tune.py:614 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949
2024-05-25 12:52:21,904	INFO tensorboardx.py:193 -- pip install "ray[tune]" to see TensorBoard files.


== Status ==
Current time: 2024-05-25 12:52:22 (running for 00:00:00.90)
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 100.000: None | Iter 50.000: None | Iter 25.000: None
Logical resource usage: 1.0/8 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A100)
Result logdir: /tmp/ray/session_2024-05-25_12-52-18_786758_14481/artifacts/2024-05-25_12-52-21/train_model_2024-05-25_12-52-21/driver_artifacts
Number of trials: 30/30 (30 PENDING)
+-------------------------+----------+-------+-------------+--------------+---------------+---------------+
| Trial name              | status   | loc   |   num_heads |   num_layers |   project_dim |   window_size |
|-------------------------+----------+-------+-------------+--------------+---------------+---------------|
| train_model_a09f9_00000 | PENDING  |       |          16 |            1 |            48 |           150 |
| train_model_a09f9_00001 | PENDING  |       |          12 |            1 |           192 |           160 |
| train_model_a09f9_000

Trial name,train_loss,val_loss
train_model_a09f9_00000,178.882,1096.28
train_model_a09f9_00001,124.018,874.072
train_model_a09f9_00002,929.325,1830.81
train_model_a09f9_00003,1503.6,2951.39
train_model_a09f9_00004,214.339,1729.8
train_model_a09f9_00005,237.613,1161.53
train_model_a09f9_00006,1804.95,3238.29
train_model_a09f9_00007,162.766,2396.52
train_model_a09f9_00008,141.342,1313.15
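
The `Bracket` line in the status output lists the iterations at which ASHA compares trials: 25, 50 and 100. These follow from `grace_period=25`, `reduction_factor=2` and `max_t=175` as a geometric sequence. The sketch below reproduces that spacing; `asha_rungs` is a hypothetical helper for illustration and not Ray Tune's internal implementation.

```python
def asha_rungs(grace_period: int, reduction_factor: int, max_t: int) -> list:
    """Iterations at which successive halving compares trials (single bracket)."""
    rungs = []
    milestone = grace_period
    while milestone <= max_t:
        rungs.append(milestone)
        milestone *= reduction_factor
    return rungs

rungs = asha_rungs(grace_period=25, reduction_factor=2, max_t=175)
print(rungs)

# Rungs a trial can actually reach when it reports only 80 iterations
reachable = [r for r in rungs if r <= 80]
print(reachable)
```

Because each trial here runs for 80 epochs, only the first two rungs can stop a trial early; the rung at iteration 100 stays empty.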


== Status ==
Current time: 2024-05-25 12:52:32 (running for 00:00:10.98)
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 100.000: None | Iter 50.000: None | Iter 25.000: None
Logical resource usage: 1.0/8 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A100)
Result logdir: /tmp/ray/session_2024-05-25_12-52-18_786758_14481/artifacts/2024-05-25_12-52-21/train_model_2024-05-25_12-52-21/driver_artifacts
Number of trials: 30/30 (29 PENDING, 1 RUNNING)
+-------------------------+----------+-------------------+-------------+--------------+---------------+---------------+------------+--------------+----------------------+
| Trial name              | status   | loc               |   num_heads |   num_layers |   project_dim |   window_size |   val_loss |   train_loss |   training_iteration |
|-------------------------+----------+-------------------+-------------+--------------+---------------+---------------+------------+--------------+----------------------|
| train_model_a09f9_00000 | RUNNING  | 