# Predictive Maintenance 

This assignment covers predictive maintenance. Predictive maintenance problems address predicting when a machine will need maintenance before it breaks down. The problem arises wherever machines require regular maintenance, for example in manufacturing, fleet operations, and train maintenance.

This assignment will use the [Predictive Maintenance Dataset](https://archive.ics.uci.edu/ml/datasets/AI4I+2020+Predictive+Maintenance+Dataset). The dataset consists of 10 000 data points stored as rows with 14 features in columns. The 'machine failure' label indicates whether the machine has failed in a particular data point.

# Learning Objectives
- Perform model tuning based on hyperparameters.
- Select the best model after attempting multiple models.
- Perform recursive feature elimination, producing a statistically significant improvement over a model without feature selection.

In [1]:
import pandas as pd
import numpy as np
from sklearn import preprocessing, metrics
from sklearn.model_selection import train_test_split


df = pd.read_csv('ai4i2020.csv')
print(df.info())
df.head(20)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 9 columns):
UDI                        10000 non-null int64
Product ID                 10000 non-null object
Type                       10000 non-null object
Air temperature [K]        10000 non-null object
Process temperature [K]    10000 non-null object
Rotational speed [rpm]     10000 non-null int64
Torque [Nm]                10000 non-null float64
Tool wear [min]            10000 non-null int64
Machine failure            10000 non-null int64
dtypes: float64(1), int64(4), object(4)
memory usage: 703.2+ KB
None


Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure
0,1,M14860,M,298.1,308.6,1551,42.8,0,0
1,2,L47181,L,298.2,308.7,1408,46.3,3,0
2,3,L47182,L,298.1,308.5,1498,49.4,5,0
3,4,L47183,L,298.2,308.6,1433,39.5,7,0
4,5,L47184,L,298.2,308.7,1408,40.0,9,0
5,6,M14865,M,298.1,308.6,1425,41.9,11,0
6,7,L47186,L,298.1,308.6,1558,42.4,14,0
7,8,L47187,L,298.1,308.6,1527,40.2,16,0
8,9,M14868,M,298.3,308.7,1667,28.6,18,0
9,10,M14869,M,298.5,309.0,1741,28.0,21,0


Question 1.1:  Write a command that will calculate the number of unique values for each feature in the training data.

In [2]:
# Command(s)
for col in df.columns:
    print(f'Feature: {col}')
    print(f'Unique values: {len(df[col].unique())}')
    print()

Feature: UDI
Unique values: 10000

Feature: Product ID
Unique values: 10000

Feature: Type
Unique values: 3

Feature: Air temperature [K]
Unique values: 93

Feature: Process temperature [K]
Unique values: 82

Feature: Rotational speed [rpm]
Unique values: 941

Feature: Torque [Nm]
Unique values: 577

Feature: Tool wear [min]
Unique values: 246

Feature: Machine failure
Unique values: 2
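
The loop above can also be written as a single pandas call. The sketch below uses a small made-up frame (the values are illustrative, not taken from the dataset) and `DataFrame.nunique()`. One difference to keep in mind: `nunique()` ignores NaN by default, while `len(df[col].unique())` counts NaN as a value.

```python
import pandas as pd

# Small made-up frame standing in for the assignment data (illustrative values only)
toy = pd.DataFrame({
    'Type': ['M', 'L', 'L', 'H'],
    'Torque [Nm]': [42.8, 46.3, 46.3, 39.5],
    'Machine failure': [0, 0, 1, 0],
})

# nunique() returns the number of distinct non-null values per column
unique_counts = toy.nunique()
print(unique_counts)
```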



Question 1.2: Determine if the data contains any missing values, and replace the values with np.nan. Missing values would be '?'.

In [3]:
df = df.replace('?', np.nan)
for col in df.columns:
    print(f'{col}: {df[col].isnull().sum()}')

UDI: 0
Product ID: 0
Type: 0
Air temperature [K]: 140
Process temperature [K]: 183
Rotational speed [rpm]: 0
Torque [Nm]: 0
Tool wear [min]: 0
Machine failure: 0
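
An alternative to replacing '?' after loading is to tell pandas about the placeholder at read time. This is a minimal sketch on an in-memory CSV with made-up column names and values; `na_values='?'` makes `read_csv` parse the placeholder as NaN, so the affected columns come back numeric instead of `object`.

```python
import io
import pandas as pd

# Made-up CSV text with '?' marking missing readings (not the real file)
csv_text = "temp,speed\n298.1,1551\n?,1408\n298.3,?\n"

# na_values tells read_csv which strings to treat as missing
toy = pd.read_csv(io.StringIO(csv_text), na_values='?')

# Count missing values per column
missing_per_column = toy.isna().sum()
print(missing_per_column)
print(toy.dtypes)
```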


Question 1.3: Replace all missing values with the mean. Change column types to numeric.

In [4]:
# only numeric columns have a concept of mean, so we don't want to include text columns, id columns,
# or our binary target column

numeric_columns = df.columns[3:8]
for col in numeric_columns:
    df[col] = pd.to_numeric(df[col])
    df[col] = df[col].fillna(df[col].mean())

for col in df.columns:
    print(f'{col}: {df[col].isnull().sum()}')

UDI: 0
Product ID: 0
Type: 0
Air temperature [K]: 0
Process temperature [K]: 0
Rotational speed [rpm]: 0
Torque [Nm]: 0
Tool wear [min]: 0
Machine failure: 0
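
To see the conversion and imputation step in isolation, here is a small sketch on a made-up Series. It assumes that any leftover non-numeric string should also count as missing, which is why it passes `errors='coerce'`; the cell above does not need this because '?' was already replaced with NaN.

```python
import numpy as np
import pandas as pd

# Made-up temperature readings stored as strings, with two unusable entries
raw = pd.Series(['298.1', '298.3', np.nan, 'n/a'], name='Air temperature [K]')

# errors='coerce' turns anything that cannot be parsed into NaN
numeric = pd.to_numeric(raw, errors='coerce')

# The mean is computed over the valid readings only, then used to fill the gaps
filled = numeric.fillna(numeric.mean())
print(filled)
```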


Question 1.4: Drop UDI and 'Product ID' from the data

In [5]:
df = df.drop(['UDI', 'Product ID'], axis=1)

In [6]:
df.columns = df.columns.str.replace(r'\[|\]', '', regex=True) # xgboost doesn't like brackets in feature names
df.head(5)

Unnamed: 0,Type,Air temperature K,Process temperature K,Rotational speed rpm,Torque Nm,Tool wear min,Machine failure
0,M,298.1,308.6,1551,42.8,0,0
1,L,298.2,308.7,1408,46.3,3,0
2,L,298.1,308.5,1498,49.4,5,0
3,L,298.2,308.6,1433,39.5,7,0
4,L,298.2,308.7,1408,40.0,9,0


Question 2.1: Split the data into training and testing taking into consideration 'Machine failure' as the target (y)

In [7]:
from sklearn.model_selection import train_test_split

X = df.drop('Machine failure', axis=1)
y = df['Machine failure']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)
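
Because failures are rare in this dataset, a plain random split can leave the test set with a slightly different failure rate than the training set. The sketch below is an optional variation, not what the cell above does: it passes `stratify=` to `train_test_split` on made-up data so that both parts keep roughly the same class ratio. The names `X_toy` and `y_toy` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up imbalanced data: 30 failures out of 1000 rows
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({'x': rng.normal(size=1000)})
y_toy = pd.Series([1] * 30 + [0] * 970)

# stratify=y_toy keeps the failure rate similar in the train and test parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, stratify=y_toy, random_state=99
)

print(y_tr.mean(), y_te.mean())
```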

Question 2.2: Apply [One-Hot Encoding](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html) to the data. Make sure to fit on the training data and transform both the training and test data.

In [8]:
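# encode the categorical feature
X_train = pd.get_dummies(X_train)
# align the test columns to the training columns so both sets share the same features
X_test = pd.get_dummies(X_test).reindex(columns=X_train.columns, fill_value=0)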

Question 2.3: Apply [SMOTE](https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.SMOTE.html) to the training data since there is class imbalance.

In [11]:
from imblearn.over_sampling import SMOTE

In [12]:
X_col = X_train.columns
sm = SMOTE(random_state=99)
X_train, y_train = sm.fit_resample(X_train, y_train)  # fit_sample was removed in newer imbalanced-learn releases
X_train = pd.DataFrame(X_train, columns=X_col)
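
To make it clearer what SMOTE adds to the training set, here is a simplified NumPy illustration of its core idea: each synthetic row is placed at a random position on the line between a minority-class sample and one of its nearest minority-class neighbours. This is a sketch of the interpolation step only, not the imbalanced-learn implementation; the function name `smote_like_oversample` and its parameters are made up for this example.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    # Pairwise Euclidean distances between minority samples
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest minority neighbours of every sample
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new row
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the new row at a random position between base and neighbour
    gaps = rng.random((n_new, 1))
    return X_min[base] + gaps * (X_min[chosen] - X_min[base])

# Made-up minority-class points at the corners of the unit square
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_oversample(minority, n_new=6)
print(synthetic)
```

Only the training data is resampled in the cell above; the test set keeps its original class balance so that the evaluation reflects real conditions.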

Question 3.1: Train five machine learning models, [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html), [SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html), [KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html), [DecisionTreeClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html), and [XGBClassifier](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier), on the training data, and evaluate their performance on the test dataset. Use default hyperparameter values.

In [13]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost.sklearn import XGBClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix


In [14]:
# Build models (you can do this either combined or separately)

models = {
    'Logistic Regression': LogisticRegression(max_iter=1000, random_state=5),
    'Support Vector Machine': SVC(),
    'K-NN': KNeighborsClassifier(),
    'Decision Tree': DecisionTreeClassifier(),
    'XGBoost': XGBClassifier()
}

# fit each model on the resampled training data and report test-set metrics
for name, clf in models.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(name)
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
    print()



Logistic Regression
              precision    recall  f1-score   support

           0       0.99      0.82      0.90      2902
           1       0.13      0.79      0.22        98

    accuracy                           0.82      3000
   macro avg       0.56      0.80      0.56      3000
weighted avg       0.96      0.82      0.88      3000

[[2384  518]
 [  21   77]]

Support Vector Machine
              precision    recall  f1-score   support

           0       0.97      0.99      0.98      2902
           1       0.21      0.08      0.12        98

    accuracy                           0.96      3000
   macro avg       0.59      0.54      0.55      3000
weighted avg       0.94      0.96      0.95      3000

[[2871   31]
 [  90    8]]

K-NN
              precision    recall  f1-score   support

           0       0.99      0.89      0.94      2902
           1       0.17      0.68      0.28        98

    accuracy                           0.88      3000
   macro avg       0.58  

Question 3.2: Perform recursive feature elimination (3 features) on the dataset using a logistic regression classifier with max_iter=1000, random_state=5. Is there any difference in the results? Explain.

In [15]:
from sklearn.feature_selection import RFE

clf = LogisticRegression(max_iter=1000, random_state=5)
selector = RFE(clf, n_features_to_select=3)
selector.fit(X_train, y_train)
selector.support_



array([ True,  True, False, False, False, False,  True, False])

In [16]:
selections = X_train.columns[selector.support_]
clf = LogisticRegression(max_iter=1000, random_state=5)
clf.fit(X_train[selections], y_train) # select only those features
y_pred = clf.predict(X_test[selections])
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))



              precision    recall  f1-score   support

           0       0.98      0.59      0.74      2902
           1       0.05      0.59      0.09        98

    accuracy                           0.59      3000
   macro avg       0.51      0.59      0.41      3000
weighted avg       0.95      0.59      0.72      3000

[[1724 1178]
 [  40   58]]


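Compared to the previous logistic regression model, this feature-selected version performed worse on every metric: recall for the failure class fell from 0.79 to 0.59, and overall accuracy fell from 0.82 to 0.59. This suggests that three features are too few and that we should not eliminate so many.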
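
One way to avoid guessing the number of features is to let cross-validation choose it. The sketch below uses scikit-learn's `RFECV` on synthetic data from `make_classification`; it is not run on the assignment data, and the choices of `scoring='recall'`, five folds, and scaling the features first are assumptions made for illustration. Scaling matters here because RFE ranks features by coefficient size, which depends on each feature's units.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the real features (illustration only)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, n_informative=4, n_redundant=0,
    weights=[0.9, 0.1], random_state=5,
)

# Put all features on the same scale so coefficient sizes are comparable
X_scaled = StandardScaler().fit_transform(X_demo)

# RFECV removes one feature per step and keeps the count with the best CV score
selector = RFECV(
    LogisticRegression(max_iter=1000, random_state=5),
    step=1, cv=5, scoring='recall', min_features_to_select=1,
)
selector.fit(X_scaled, y_demo)

print(selector.n_features_)   # number of features kept
print(selector.support_)      # boolean mask of the kept features
```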

Q.4. Create a new text cell in your Notebook: Complete a 50-100 word summary (or short description of your thinking in applying this week's learning to the solution) of your experience in this assignment. Include:
What was your incoming experience with this model, if any? What steps did you take, and what obstacles did you encounter? How do you link this exercise to real-world machine learning problem-solving? (What steps were missing? What else do you need to learn?) This summary allows your instructor to know how you are doing and to allot points for your effort in thinking and planning, and in making connections to real-world work.

I was surprised that recursive feature elimination made our model perform so much worse. It is possible that this simply isn't the best dataset for that method; it appears that most of our features are useful for making predictions.

With the full feature set, model quality varied a lot. I would say that XGBoost performed best, with the decision tree a close second. In an application of this type, we want to catch as many real failures as possible (high recall on the failure class), even at the cost of more false positives. Since XGBoost had the best recall score, it is probably the most useful classifier.
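
The recall argument can be checked directly from the confusion matrices printed earlier. The sketch below recomputes recall and precision for the failure class from the three complete matrices that appear in this notebook; the dictionary labels and the helper `failure_metrics` are names chosen here. scikit-learn lays a binary confusion matrix out as `[[TN, FP], [FN, TP]]`.

```python
import numpy as np

# Confusion matrices copied from the outputs above (rows: true class, columns: predicted)
reported = {
    'Logistic Regression (all features)': np.array([[2384, 518], [21, 77]]),
    'Support Vector Machine': np.array([[2871, 31], [90, 8]]),
    'Logistic Regression (RFE, 3 features)': np.array([[1724, 1178], [40, 58]]),
}

def failure_metrics(cm):
    """Return (recall, precision) for the positive 'failure' class."""
    tn, fp, fn, tp = cm.ravel()
    recall = tp / (tp + fn)       # share of real failures that were caught
    precision = tp / (tp + fp)    # share of failure alarms that were real
    return recall, precision

summary = {name: failure_metrics(cm) for name, cm in reported.items()}
for name, (recall, precision) in summary.items():
    print(f'{name}: recall={recall:.2f}, precision={precision:.2f}')
```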