# Turbofan Engine Degradation Simulation
 
Engine degradation simulation was carried out using the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS). 

Four data sets were simulated under different combinations of operational conditions and fault modes. Several sensor channels were recorded to characterize fault evolution. The data set was provided by the NASA Ames Prognostics Center of Excellence (PCoE).

Each data set consists of multiple multivariate time series and is further divided into training and test subsets. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with a different degree of initial wear and manufacturing variation, which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance. These settings are also included in the data. The data is contaminated with sensor noise.

The engine is operating normally at the start of each time series and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate. A vector of true Remaining Useful Life (RUL) values is also provided for the test data.
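
To make the test-set labelling concrete, here is a small worked sketch in plain Python. The numbers are invented: a test engine whose series stops at cycle 5 and whose true RUL value (from the provided vector) is 3, so it would have failed at cycle 8. The function name `rul_per_cycle` is hypothetical.

```python
def rul_per_cycle(cycles, rul_at_last_cycle):
    """RUL at every recorded cycle, given the RUL at the last recorded cycle."""
    last_cycle = max(cycles)
    # Every cycle before the last one has that many extra cycles still to run
    return [rul_at_last_cycle + (last_cycle - c) for c in cycles]

test_cycles = [1, 2, 3, 4, 5]   # made-up truncated test series
true_rul_at_end = 3             # made-up entry from the RUL vector

ruls = rul_per_cycle(test_cycles, true_rul_at_end)
print(ruls)  # [7, 6, 5, 4, 3]
```

For a training series the same function would apply with `rul_at_last_cycle=0`, assuming the last recorded cycle is treated as the point of failure.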

The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle; each column is a different variable.

The columns correspond to:
- unit number
- time, in cycles
- operational setting 1
- operational setting 2
- operational setting 3
- sensor measurement 1
- sensor measurement 2
- ...
- sensor measurement 21
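
As a minimal sketch of reading this layout, the snippet below builds the 26 column names and parses a two-row sample from an in-memory string. The sample values and the helper name `cmapss_columns` are made up for illustration, and the trailing spaces on each row are an assumption about how the raw files are formatted.

```python
import io
import pandas as pd

def cmapss_columns():
    """Return the 26 column names in file order (hypothetical helper)."""
    settings = ['operational setting {}'.format(i) for i in range(1, 4)]
    sensors = ['sensor measurement {}'.format(i) for i in range(1, 22)]
    return ['unit number', 'time'] + settings + sensors

# Two made-up rows: unit, cycle, 3 settings, 21 sensor values, trailing spaces
row_a = '1 1 -0.0007 -0.0004 100.0 ' + ' '.join(['1.5'] * 21) + '  '
row_b = '1 2 0.0019 -0.0003 100.0 ' + ' '.join(['2.5'] * 21) + '  '
sample = io.StringIO(row_a + '\n' + row_b + '\n')

# sep=r'\s+' splits on any run of whitespace, so trailing spaces add no extra columns
df = pd.read_csv(sample, sep=r'\s+', header=None, names=cmapss_columns())
print(df.shape)
```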



| Data set | Train trajectories | Test trajectories | Conditions | Fault modes |
|---|---|---|---|---|
| FD001 | 100 | 100 | One (sea level) | One (HPC degradation) |
| FD002 | 260 | 259 | Six | One (HPC degradation) |
| FD003 | 100 | 100 | One (sea level) | Two (HPC degradation, fan degradation) |
| FD004 | 248 | 249 | Six | Two (HPC degradation, fan degradation) |

A. Saxena and K. Goebel (2008). “Turbofan Engine Degradation Simulation Data Set”, NASA Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA

## A selection of papers which use this dataset

Performance Benchmarking and Analysis of Prognostic Methods for CMAPSS Datasets

https://doi.org/10.36001/ijphm.2014.v5i2.2236

Remaining useful life estimation via transformer encoder enhanced by a gated convolutional unit

https://link.springer.com/article/10.1007/s10845-021-01750-x

Variational encoding approach for interpretable assessment of remaining useful life estimation

https://www.sciencedirect.com/science/article/pii/S0951832022000321?via%3Dihub

Exploratory Data Analysis of the N-CMAPSS Dataset for Prognostics

https://ieeexplore.ieee.org/document/9673064







In [None]:
import pandas
import os
files={'RUL':['RUL_FD001.txt','RUL_FD002.txt','RUL_FD003.txt','RUL_FD004.txt'],
        'test':['test_FD001.txt','test_FD002.txt','test_FD003.txt','test_FD004.txt'],
        'train':['train_FD001.txt','train_FD002.txt','train_FD003.txt','train_FD004.txt']}
tt_header = ['unit number',
             'time',
             'operational setting 1',
             'operational setting 2',
             'operational setting 3',
             'sensor measurement 1',
             'sensor measurement 2',
             'sensor measurement 3',
             'sensor measurement 4',
             'sensor measurement 5',
             'sensor measurement 6',
             'sensor measurement 7',
             'sensor measurement 8',
             'sensor measurement 9',
             'sensor measurement 10',
             'sensor measurement 11',
             'sensor measurement 12',
             'sensor measurement 13',
             'sensor measurement 14',
             'sensor measurement 15',
             'sensor measurement 16',
             'sensor measurement 17',
             'sensor measurement 18',
             'sensor measurement 19',
             'sensor measurement 20',
             'sensor measurement 21']
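# The files are space-separated; index_col=False stops pandas from using the first column
# as the index when a line carries more fields than names (e.g. trailing delimiters).
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
trainingdata = pandas.read_csv(os.path.join('data/CMAPSS',files['train'][0]),sep=' ',names=tt_header, index_col=False)
testingdata = pandas.read_csv(os.path.join('data/CMAPSS',files['test'][0]),sep=' ',names=tt_header, index_col=False)
# One true RUL value per test unit, in unit order
testrul = pandas.read_csv(os.path.join('data/CMAPSS',files['RUL'][0]),sep=' ',names=['RUL'],index_col=False)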



In [None]:
tt_header

In [None]:
trainingdata

In [None]:
testingdata

In [None]:
testrul['RUL'].loc[0]

# Remaining Useful Life

In the training data, each unit runs until failure, so the remaining useful life at any cycle is the last recorded cycle for that unit minus the current cycle.

This needs to be calculated and added to the data.
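
A compact way to compute this is a grouped transform. The sketch below uses a small made-up table rather than the real files; the column names match those used in this notebook.

```python
import pandas as pd

# Made-up training data: unit 1 fails at cycle 3, unit 2 at cycle 4
toy = pd.DataFrame({
    'unit number': [1, 1, 1, 2, 2, 2, 2],
    'time':        [1, 2, 3, 1, 2, 3, 4],
})

# Last recorded cycle of each unit, repeated on every row of that unit
failure_time = toy.groupby('unit number')['time'].transform('max')

# Remaining useful life = cycles left until the last recorded cycle
toy['RUL'] = failure_time - toy['time']
print(toy)
```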

In [None]:
units = list(trainingdata['unit number'].unique())

In [None]:
# Inspect the cycle counter for the first unit
trainingdata[trainingdata['unit number']==units[0]]['time']

In [None]:
unit_failure_times = {}
for a in units:
    unit_failure_times[a] = max(trainingdata[trainingdata['unit number']==a]['time'])

In [None]:
unit_failure_times

In [None]:
trainingdata['RUL'] = trainingdata.apply(lambda r: unit_failure_times[r['unit number']] - r['time'],axis=1)

In [None]:
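# The RUL file holds the RUL at the LAST recorded cycle of each test unit, so earlier cycles add the cycles still to come
rul_at_end = testingdata['unit number'].map(lambda u: testrul['RUL'].loc[u - 1])
testingdata['RUL'] = rul_at_end + testingdata.groupby('unit number')['time'].transform('max') - testingdata['time']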

In [None]:
import matplotlib.pyplot as plt


In [None]:
fig, ax = plt.subplots()
unitnumber=2
for a in range(1,22):  # sensors 1 to 21 inclusive
    ax.plot(trainingdata[trainingdata['unit number']==unitnumber]['RUL'],
            trainingdata[trainingdata['unit number']==unitnumber]['sensor measurement {}'.format(a)])

plt.show()

In [None]:
fig, ax = plt.subplots()
unitnumber=1
for a in range(1,22):  # sensors 1 to 21 inclusive
    ax.plot(testingdata[testingdata['unit number']==unitnumber]['RUL'],
            testingdata[testingdata['unit number']==unitnumber]['sensor measurement {}'.format(a)])

plt.show()

In [None]:
import sklearn

In [None]:
from sklearn.decomposition import PCA

Our features for training are all the data apart from the unit number, RUL, and time

In [None]:
features = tt_header[2:]

In [None]:
X_train = trainingdata.loc[:,features].values
X_test = testingdata.loc[:,features].values

In [None]:
y_train = trainingdata.loc[:,'RUL'].values
y_test = testingdata.loc[:,'RUL'].values

In [None]:
from sklearn.neural_network import MLPRegressor

In [None]:
y_train

In [None]:
regr = MLPRegressor(random_state=1, max_iter=5000000).fit(X_train, y_train)

In [None]:
regr.score(X_test, y_test)

In [None]:
regr.predict(X_test[:2])
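
Since the stated objective is the RUL at the end of each test series, it may be more informative to score only the last recorded cycle of each unit rather than every row. The sketch below shows that selection and a root-mean-square error on made-up numbers; the prediction values and the helper name `last_cycle_rmse` are assumptions for illustration, not output of the model above.

```python
import math

def last_cycle_rmse(units, times, y_true, y_pred):
    """RMSE over the last recorded cycle of each unit (hypothetical helper)."""
    last = {}  # unit -> (time, true RUL, predicted RUL) at its latest cycle
    for u, t, yt, yp in zip(units, times, y_true, y_pred):
        if u not in last or t > last[u][0]:
            last[u] = (t, yt, yp)
    squared = [(yt - yp) ** 2 for _, yt, yp in last.values()]
    return math.sqrt(sum(squared) / len(squared))

# Made-up rows for two test units
units  = [1, 1, 1, 2, 2]
times  = [1, 2, 3, 1, 2]
y_true = [12, 11, 10, 21, 20]
y_pred = [15, 12, 13, 25, 24]

print(last_cycle_rmse(units, times, y_true, y_pred))
```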

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
x = StandardScaler().fit_transform(X_train)
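
If the scaled features are later fed to a model, the test data would need the same transformation, using the mean and standard deviation learned from the training data only. A minimal sketch on random stand-in arrays (not the C-MAPSS data, and with made-up variable names):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))  # stand-in for X_train
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(50, 3))    # stand-in for X_test

scaler = StandardScaler().fit(X_train_demo)    # learn mean/std from training data only
x_train_scaled = scaler.transform(X_train_demo)
x_test_scaled = scaler.transform(X_test_demo)  # reuse the training statistics

print(x_train_scaled.mean(axis=0).round(6))  # approximately zero per column
```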

In [None]:


# Wrap the scaled array in a DataFrame and preview the first rows
scaled_x = pandas.DataFrame(data = x, columns = features)
scaled_x.head()



In [None]:
pca = PCA(n_components=2)

In [None]:
principalComponents = pca.fit_transform(x)

In [None]:
principalDf = pandas.DataFrame(data = principalComponents
             , columns = ['principal component 1', 'principal component 2'])

In [None]:
principalDf.head(5)

In [None]:
principalDf['RUL'] = list(trainingdata['RUL'].copy())
principalDf['unit number'] = list(trainingdata['unit number'].copy())

In [None]:
principalDf.head()

In [None]:
fig, ax = plt.subplots()
unitnumber=2
ax.scatter(principalDf[principalDf['unit number']==unitnumber]['RUL'],
            principalDf[principalDf['unit number']==unitnumber]['principal component 2'])

plt.show()

In [None]:
fig, ax = plt.subplots()
for a in principalDf['unit number'].unique():
    ax.scatter(principalDf[principalDf['unit number']==a]['principal component 1'],
                principalDf[principalDf['unit number']==a]['principal component 2'],label='Unit {}'.format(a))
plt.legend(bbox_to_anchor=(1,1), loc="upper left")
plt.show()

In [None]:
fig, ax = plt.subplots()
for a in principalDf['unit number'].unique():
    ax.plot(principalDf[principalDf['unit number']==a]['RUL'],
                principalDf[principalDf['unit number']==a]['principal component 1'],label='Unit {}'.format(a))
plt.legend(bbox_to_anchor=(1,1), loc="upper left")
plt.show()

In [None]:
fig, ax = plt.subplots()
for a in principalDf['unit number'].unique():
    ax.plot(principalDf[principalDf['unit number']==a]['RUL'],
                principalDf[principalDf['unit number']==a]['principal component 2'],label='Unit {}'.format(a))
plt.legend(bbox_to_anchor=(1,1), loc="upper left")
plt.show()

In [None]:
pca.explained_variance_ratio_
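
The ratios returned here give the fraction of total variance captured by each retained component. As a hedged sketch of how one might use them to decide how many components to keep, the snippet below fits a full PCA on random stand-in data and finds the smallest number of components reaching a 95% threshold. The threshold and the variable names are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in data: 5 columns driven by 2 latent factors plus small noise
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X_demo = latent @ mixing + 0.05 * rng.normal(size=(300, 5))

pca_full = PCA().fit(X_demo)  # keep all components
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the threshold
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)
print(cumulative.round(3), n_keep)
```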