### Training notebook
Objective: Build a classification model that identifies which machines are likely to fail, based on sensor data collected during the manufacturing process.

Importing the required libraries:

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

In [2]:
SEED = 20
np.random.seed(SEED)

Loading the training file:

In [3]:
# dataset with 9 information columns (features) and the variable to be predicted ("failure_type")
df_train = pd.read_csv('desafio_manutencao_preditiva_treino.csv')
df_train.head()

Unnamed: 0,udi,product_id,type,air_temperature_k,process_temperature_k,rotational_speed_rpm,torque_nm,tool_wear_min,failure_type
0,1,M14860,M,298.1,308.6,1551,42.8,0,No Failure
1,2,L47181,L,298.2,308.7,1408,46.3,3,No Failure
2,5,L47184,L,298.2,308.7,1408,40.0,9,No Failure
3,6,M14865,M,298.1,308.6,1425,41.9,11,No Failure
4,7,L47186,L,298.1,308.6,1558,42.4,14,No Failure


Separating the features from the 'failure_type' target column:

In [4]:
x = df_train[['air_temperature_k', 'process_temperature_k', 'rotational_speed_rpm', 'torque_nm', 'tool_wear_min']]
y = df_train['failure_type']
x.head()

Unnamed: 0,air_temperature_k,process_temperature_k,rotational_speed_rpm,torque_nm,tool_wear_min
0,298.1,308.6,1551,42.8,0
1,298.2,308.7,1408,46.3,3
2,298.2,308.7,1408,40.0,9
3,298.1,308.6,1425,41.9,11
4,298.1,308.6,1558,42.4,14


Transforming the values (encoding the categorical 'type' column as integers):

In [5]:
df_train['type'].unique()

array(['M', 'L', 'H'], dtype=object)

In [7]:
troca = {'M': 0, 'L': 1, 'H': 2}
x['type'] = df_train['type'].map(troca)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  x['type'] = df_train['type'].map(troca)
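The warning above appears because `x` was created by selecting columns from `df_train`, so pandas cannot tell whether the assignment is meant to reach the original frame. A minimal sketch of one common way to avoid it, assuming an explicit `.copy()` is acceptable here; the small `df_demo` frame is made up for illustration and is not the challenge data:

```python
import pandas as pd

# Hypothetical miniature of the training frame (illustrative values only).
df_demo = pd.DataFrame({
    "type": ["M", "L", "H", "L"],
    "torque_nm": [42.8, 46.3, 40.0, 41.9],
})

troca = {"M": 0, "L": 1, "H": 2}

# Take an explicit copy so the new column is added to an independent frame.
x_demo = df_demo[["torque_nm"]].copy()
x_demo["type"] = df_demo["type"].map(troca)

# Categories missing from the mapping would become NaN, so check for them.
n_unmapped = int(x_demo["type"].isna().sum())
print(x_demo)
print("unmapped categories:", n_unmapped)
```

Checking for NaN after `map` is worthwhile because a category that is absent from `troca` is silently turned into a missing value.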


In [8]:
x.head()

Unnamed: 0,air_temperature_k,process_temperature_k,rotational_speed_rpm,torque_nm,tool_wear_min,type
0,298.1,308.6,1551,42.8,0,0
1,298.2,308.7,1408,46.3,3,1
2,298.2,308.7,1408,40.0,9,1
3,298.1,308.6,1425,41.9,11,0
4,298.1,308.6,1558,42.4,14,1


Splitting the dataset into training and test sets:

In [23]:
train_x, test_x, train_y, test_y = train_test_split(x, y, stratify=y, test_size=0.20)
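The `stratify=y` argument keeps the class proportions of `y` the same in both parts, which matters here because failures are rare. The sketch below illustrates the effect with made-up labels (90 'No Failure' rows and 10 'Power Failure' rows, not the real data). Passing `random_state` explicitly is an optional addition for reproducibility, not something the notebook does:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

SEED = 20

# Made-up imbalanced labels: 90% 'No Failure', 10% 'Power Failure'.
y_demo = pd.Series(["No Failure"] * 90 + ["Power Failure"] * 10)
x_demo = pd.DataFrame({"torque_nm": np.linspace(30.0, 60.0, num=100)})

train_x, test_x, train_y, test_y = train_test_split(
    x_demo, y_demo, stratify=y_demo, test_size=0.20, random_state=SEED
)

# Both parts keep the 90/10 ratio of the full label set.
train_counts = train_y.value_counts().to_dict()
test_counts = test_y.value_counts().to_dict()
print(train_counts)
print(test_counts)
```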

Training the model:

In [24]:
model = DecisionTreeClassifier(criterion='entropy', max_depth=3)
model.fit(train_x, train_y)
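With `max_depth=3` the tree can have at most 8 leaves, so its decision rules are short enough to read directly. A hedged sketch of how they could be inspected with `export_text`, using synthetic stand-in data (the feature values and the labelling rule are invented for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: two features, label depends on the torque value.
rng = np.random.default_rng(20)
torque = rng.uniform(20.0, 70.0, size=200)
wear = rng.uniform(0.0, 250.0, size=200)
features = np.column_stack([torque, wear])
labels = np.where(torque > 60.0, "Power Failure", "No Failure")

demo_model = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=20)
demo_model.fit(features, labels)

# Text dump of the learned splits, one line per node.
rules = export_text(demo_model, feature_names=["torque_nm", "tool_wear_min"])
print(rules)
print("depth:", demo_model.get_depth(), "leaves:", demo_model.get_n_leaves())
```

On the real data the same call would take `model` and `list(train_x.columns)` as arguments.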

#### Validation and metrics

Checking the model's performance:

In [44]:
from sklearn import metrics
from sklearn.metrics import confusion_matrix, classification_report

In [45]:
# Metrics function (accuracy, confusion matrix, classification report)
def get_metrics(predict, ref):
    accuracy = metrics.accuracy_score(ref, predict)
    cm = confusion_matrix(ref, predict)
    class_report = classification_report(ref, predict)

    return accuracy, cm, class_report

In [47]:
pred_y = model.predict(test_x)
accuracy, cm, class_report = get_metrics(pred_y, test_y)

  _warn_prf(average, modifier, msg_start, len(result))
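The `_warn_prf` output is the tail of scikit-learn's undefined-metric warning: precision is undefined for a class the model never predicts, so it is reported as 0. The sketch below uses made-up labels and assumes a scikit-learn version that supports the `zero_division` argument of `classification_report` (0.22 or later). Setting it makes that choice explicit and silences the warning:

```python
from sklearn.metrics import classification_report

# Made-up labels: the classifier never predicts 'Power Failure'.
ref = ["No Failure", "No Failure", "No Failure", "Power Failure"]
predict = ["No Failure", "No Failure", "No Failure", "No Failure"]

# zero_division=0 reports undefined precision as 0 without emitting a warning.
report = classification_report(ref, predict, zero_division=0, output_dict=True)
print(report["Power Failure"])
```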


In [48]:
print("Accuracy: {:.2f}".format(accuracy))
print('Confusion Matrix: \n', cm)
print(class_report)

Accuracy: 0.97
Confusion Matrix: 
 [[   0   15    0    0    0    0]
 [   0 1288    0    0    0    0]
 [   0   10    0    0    0    0]
 [   0    7    0    6    0    0]
 [   0    2    0    0    0    0]
 [   0    6    0    0    0    0]]
                          precision    recall  f1-score   support

Heat Dissipation Failure       0.00      0.00      0.00        15
              No Failure       0.97      1.00      0.98      1288
      Overstrain Failure       0.00      0.00      0.00        10
           Power Failure       1.00      0.46      0.63        13
         Random Failures       0.00      0.00      0.00         2
       Tool Wear Failure       0.00      0.00      0.00         6

                accuracy                           0.97      1334
               macro avg       0.33      0.24      0.27      1334
            weighted avg       0.95      0.97      0.96      1334
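The 0.97 accuracy mostly reflects the 'No Failure' class. The figures in the report can be recomputed from the confusion matrix printed above (rows are true classes, columns are predictions). The sketch below only re-derives those numbers with NumPy:

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true, columns: predicted).
cm = np.array([
    [0,   15, 0, 0, 0, 0],
    [0, 1288, 0, 0, 0, 0],
    [0,   10, 0, 0, 0, 0],
    [0,    7, 0, 6, 0, 0],
    [0,    2, 0, 0, 0, 0],
    [0,    6, 0, 0, 0, 0],
])

# Accuracy: correctly classified samples (diagonal) over all samples.
accuracy = np.trace(cm) / cm.sum()

# Recall per class: diagonal entry divided by the row total (the support).
recall = np.diag(cm) / cm.sum(axis=1)

# Macro recall: unweighted mean over the six classes.
macro_recall = recall.mean()

print(f"accuracy: {accuracy:.2f}")
print("recall per class:", np.round(recall, 2))
print(f"macro recall: {macro_recall:.2f}")
```

Four of the six classes have a recall of 0, which the macro average exposes and the accuracy hides.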



Exporting the model:

In [56]:
import pickle

with open('model.pkl', 'wb') as file:
    pickle.dump(model, file)
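To use the exported model later, it can be loaded back with `pickle.load`. A minimal round-trip sketch, using a tiny stand-in model and a temporary file rather than the real `model.pkl`; the feature values are invented:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Stand-in model trained on invented data (not the notebook's model).
features = np.array([[40.0, 0.0], [45.0, 10.0], [65.0, 200.0], [68.0, 220.0]])
labels = np.array(["No Failure", "No Failure", "Power Failure", "Power Failure"])
demo_model = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=20)
demo_model.fit(features, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    # Export, as in the cell above.
    with open(path, "wb") as file:
        pickle.dump(demo_model, file)

    # Load it back; only unpickle files from a trusted source.
    with open(path, "rb") as file:
        loaded_model = pickle.load(file)

# The restored model gives the same predictions as the original.
same_predictions = bool(
    (loaded_model.predict(features) == demo_model.predict(features)).all()
)
print(same_predictions)
```

The input passed to `predict` must have the same columns, in the same order, as the training data (here the five sensor columns followed by `type`).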