# Exploring data

We will explore the data using pandas.

Note that pandas can load data from a [CSV file](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) (as we do here) or from a [database](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html).
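
As a minimal sketch of the database route, the snippet below builds a throwaway in-memory SQLite table and reads it back with `pd.read_sql`. The table name `machines`, its columns, and its two rows are made up for illustration; they are not the dataset used in this notebook.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for a real one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE machines (udi INTEGER, type TEXT, torque REAL)")
conn.executemany(
    "INSERT INTO machines VALUES (?, ?, ?)",
    [(1, "M", 42.8), (2, "L", 46.3)],
)
conn.commit()

# read_sql runs the query and returns the result as a DataFrame
df_sql = pd.read_sql("SELECT * FROM machines ORDER BY udi", conn)
conn.close()
print(df_sql)
```

For a real database you would replace the SQLite connection with a connection to your own server; the query and the resulting DataFrame are handled the same way.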

In [1]:
import pandas as pd

df = pd.read_csv('./data/predictive_maintenance.csv')
# Keep only the first 1000 rows
df = df[0:1000]

# Preprocess data

We want to make the data ready for ML training. Since our ML model needs numeric values, we convert `Type` into a numeric value:

- H (High quality machine): 2
- M (Medium quality machine): 1
- L (Low quality machine): 0

In [2]:
df['Numeric Type'] = df['Type'].replace({'H': 2, 'M': 1, 'L': 0})
df

Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,Failure Type,Numeric Type
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,No Failure,1
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,No Failure,0
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,No Failure,0
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,No Failure,0
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,No Failure,0
...,...,...,...,...,...,...,...,...,...,...,...
995,996,L48175,L,296.3,307.3,1566,35.8,175,0,No Failure,0
996,997,M15856,M,296.3,307.2,1286,51.1,177,0,No Failure,1
997,998,M15857,M,296.3,307.2,1446,45.9,180,0,No Failure,1
998,999,M15858,M,296.4,307.2,2071,19.4,183,0,No Failure,1
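
The same encoding can also be written with `Series.map`, which turns any value missing from the mapping into `NaN`, so unexpected categories are easy to spot. The sketch below uses a small made-up frame rather than the real CSV.

```python
import pandas as pd

type_map = {"H": 2, "M": 1, "L": 0}

# Toy frame; "X" is a deliberately invalid category
toy = pd.DataFrame({"Type": ["M", "L", "H", "X"]})
toy["Numeric Type"] = toy["Type"].map(type_map)

# Rows whose Type was not in the mapping end up as NaN
unmapped = toy[toy["Numeric Type"].isna()]
print(unmapped)
```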


# Splitting data into input (X) and target (y)

Typical supervised learning needs a set of input features and a target to predict. We will name them `X` and `y` respectively.

In [3]:
X = df[['Numeric Type', 'Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]']]
X

Unnamed: 0,Numeric Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min]
0,1,298.1,308.6,1551,42.8,0
1,0,298.2,308.7,1408,46.3,3
2,0,298.1,308.5,1498,49.4,5
3,0,298.2,308.6,1433,39.5,7
4,0,298.2,308.7,1408,40.0,9
...,...,...,...,...,...,...
995,0,296.3,307.3,1566,35.8,175
996,1,296.3,307.2,1286,51.1,177
997,1,296.3,307.2,1446,45.9,180
998,1,296.4,307.2,2071,19.4,183


In [4]:
y = df['Failure Type']
y

0      No Failure
1      No Failure
2      No Failure
3      No Failure
4      No Failure
          ...    
995    No Failure
996    No Failure
997    No Failure
998    No Failure
999    No Failure
Name: Failure Type, Length: 1000, dtype: object

# Split data into training and test sets

In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
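
The call above draws a different random split on every run. A minimal sketch of a reproducible split follows, using toy data. `random_state=42` is an arbitrary choice, and `stratify` keeps the class proportions the same in both parts, which can matter when failures are rare.

```python
from sklearn.model_selection import train_test_split

# Toy data: 20 samples, 15 labelled "ok" and 5 labelled "fail"
X_toy = [[i] for i in range(20)]
y_toy = ["ok"] * 15 + ["fail"] * 5

# random_state fixes the shuffle; stratify keeps the 3:1 class ratio
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(len(X_tr), len(X_te))  # 16 4
print(y_te.count("fail"))    # 1
```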

# Perform training

We will use a `RandomForestClassifier` model and fit it on the training data.

In [6]:
from sklearn.ensemble import RandomForestClassifier

# Training
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Get the model's accuracy score on the test data

Now let's evaluate the model on the test set.

In [7]:
from sklearn.metrics import accuracy_score

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

accuracy

0.975
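
Accuracy is the fraction of test rows whose predicted label matches the true label. If most rows are `No Failure`, as the printed sample suggests, a high accuracy can hide poor performance on the rare failure classes, so a per-class view is often useful. A small sketch with made-up labels:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only
y_true = ["No Failure", "No Failure", "No Failure", "Power Failure"]
y_hat = ["No Failure", "No Failure", "No Failure", "No Failure"]

# 3 of 4 predictions match, so accuracy is 0.75
acc = accuracy_score(y_true, y_hat)

# Rows are true classes, columns are predicted classes
labels = ["No Failure", "Power Failure"]
cm = confusion_matrix(y_true, y_hat, labels=labels)
print(acc)
print(cm)
```

Here the only `Power Failure` row is missed, which the second row of the confusion matrix makes visible even though the accuracy looks reasonable.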

# Using model for prediction

We will use the following data for prediction:
- UDI: 11, Failure type: No Failure
- UDI: 51, Failure type: Power Failure

See [./data/predictive_maintenance.csv](./data/predictive_maintenance.csv)

In [8]:

X_predict_sample = [
    [2, 298.4, 308.9, 1782, 23.9, 24],
    [0, 298.9, 309.1, 2861, 4.6, 143]
]
clf.feature_names_in_ = None  # Workaround: clear the stored feature names so predicting on a plain list does not trigger a feature-name warning
clf.predict(X_predict_sample)


array(['No Failure', 'Power Failure'], dtype=object)

# Save the model

Once we are satisfied with the model, we can serialize it so that we can use it later.

In [9]:
import pickle

# Save the trained model to a file
with open('manual-model.pkl', 'wb') as model_file:
    pickle.dump(clf, model_file)

# Load the model and simulate a JSON request

In [10]:
import json

# Load the model
with open('manual-model.pkl', 'rb') as model_file:
    loaded_clf = pickle.load(model_file)

# Run a prediction from a JSON request string
def predict(json_request_str: str) -> str:
    x_dict = json.loads(json_request_str)
    type_map = {'H': 2, 'M': 1, 'L': 0}
    X = [[
        type_map[x_dict['type']],
        x_dict['air_temperature'],
        x_dict['process_temperature'],
        x_dict['rotational_speed'],
        x_dict['torque'],
        x_dict['total_wear']
    ]]
    y = loaded_clf.predict(X)
    return y[0]

json_request_str = '{"type": "H", "air_temperature": 298.4, "process_temperature": 308.9, "rotational_speed": 1782, "torque": 23.9, "total_wear": 24}'
predict(json_request_str)



'No Failure'
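
The `predict` function above raises a bare `KeyError` if the request lacks a field or uses an unknown machine type. One possible way to validate the request first is sketched below. `parse_request`, `TYPE_MAP`, and `REQUIRED_FIELDS` are hypothetical names introduced here, not part of the notebook.

```python
import json

TYPE_MAP = {"H": 2, "M": 1, "L": 0}
REQUIRED_FIELDS = [
    "type", "air_temperature", "process_temperature",
    "rotational_speed", "torque", "total_wear",
]

def parse_request(json_request_str: str) -> list:
    """Turn a JSON request into one feature row, or raise ValueError."""
    data = json.loads(json_request_str)
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["type"] not in TYPE_MAP:
        raise ValueError(f"unknown type: {data['type']!r}")
    # Same feature order as the training columns
    return [TYPE_MAP[data["type"]]] + [data[f] for f in REQUIRED_FIELDS[1:]]

row = parse_request(
    '{"type": "H", "air_temperature": 298.4, "process_temperature": 308.9, '
    '"rotational_speed": 1782, "torque": 23.9, "total_wear": 24}'
)
print(row)  # [2, 298.4, 308.9, 1782, 23.9, 24]
```

The returned row can then be wrapped in a list and passed to the loaded model's `predict` method, as in the cell above.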