# MLP with scikit-learn

Source: https://www.pluralsight.com/guides/machine-learning-neural-networks-scikit-learn

## Problem to be analyzed

Develop a classifier that analyzes patient data and classifies each person as diabetic or non-diabetic.

#### Dataset

Samples - 768
Columns - 9 (8 features plus the target)

Pregnancies - Number of times pregnant.

Glucose - Plasma glucose concentration.

BloodPressure - Diastolic blood pressure (mm Hg).

SkinThickness - Skin fold thickness (mm).

Insulin - 2-hour serum insulin (mu U/ml).

BMI - Body mass index (weight in kg / (height in m)^2).

DiabetesPedigreeFunction - Diabetes pedigree function.

Age - Age in years.

Outcome - "1" indicates the presence of diabetes and "0" its absence.

### Step by step:


 Step 1 - Load the required libraries and modules.

 Step 2 - Read the data and run basic checks on it.

 Step 3 - Create arrays for the features and the response variable.

 Step 4 - Create the training and test sets.

 Step 5 - Build the neural network model, predict with it, and evaluate it.

In [None]:
import os

os.environ['KAGGLE_CONFIG_DIR'] = "/content"

!chmod 600 /content/kaggle.json

!kaggle datasets download -d uciml/pima-indians-diabetes-database
!unzip pima-indians-diabetes-database.zip -d /content/kaggle/

Downloading pima-indians-diabetes-database.zip to /content
  0% 0.00/8.91k [00:00<?, ?B/s]
100% 8.91k/8.91k [00:00<00:00, 8.18MB/s]
Archive:  pima-indians-diabetes-database.zip
  inflating: /content/kaggle/diabetes.csv  


In [None]:
# Libraries for data handling
import pandas as pd
import numpy as np

# Model and data-splitting utilities used in this notebook
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

In [None]:
df = pd.read_csv('/content/kaggle/diabetes.csv') 
df.head()

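| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |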


In [None]:
print(df.shape)
df.describe().transpose()

(768, 9)


| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Pregnancies | 768.0 | 3.845052 | 3.369578 | 0.0 | 1.0 | 3.0 | 6.0 | 17.0 |
| Glucose | 768.0 | 120.894531 | 31.972618 | 0.0 | 99.0 | 117.0 | 140.25 | 199.0 |
| BloodPressure | 768.0 | 69.105469 | 19.355807 | 0.0 | 62.0 | 72.0 | 80.0 | 122.0 |
| SkinThickness | 768.0 | 20.536458 | 15.952218 | 0.0 | 0.0 | 23.0 | 32.0 | 99.0 |
| Insulin | 768.0 | 79.799479 | 115.244002 | 0.0 | 0.0 | 30.5 | 127.25 | 846.0 |
| BMI | 768.0 | 31.992578 | 7.88416 | 0.0 | 27.3 | 32.0 | 36.6 | 67.1 |
| DiabetesPedigreeFunction | 768.0 | 0.471876 | 0.331329 | 0.078 | 0.24375 | 0.3725 | 0.62625 | 2.42 |
| Age | 768.0 | 33.240885 | 11.760232 | 21.0 | 24.0 | 29.0 | 41.0 | 81.0 |
| Outcome | 768.0 | 0.348958 | 0.476951 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |


In [None]:
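target_column = ['Outcome']
# All columns except the target. A set has no guaranteed order, so the order of
# `predictors` (and of the columns of X) may differ between Python sessions.
predictors = list(set(df.columns) - set(target_column))
# Scale each predictor to [0, 1] by dividing by its column maximum (all minima are >= 0).
df[predictors] = df[predictors]/df[predictors].max()
df.describe().transpose()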

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Pregnancies | 768.0 | 0.22618 | 0.19821 | 0.0 | 0.058824 | 0.176471 | 0.352941 | 1.0 |
| Glucose | 768.0 | 0.60751 | 0.160666 | 0.0 | 0.497487 | 0.58794 | 0.704774 | 1.0 |
| BloodPressure | 768.0 | 0.566438 | 0.158654 | 0.0 | 0.508197 | 0.590164 | 0.655738 | 1.0 |
| SkinThickness | 768.0 | 0.207439 | 0.161134 | 0.0 | 0.0 | 0.232323 | 0.323232 | 1.0 |
| Insulin | 768.0 | 0.094326 | 0.136222 | 0.0 | 0.0 | 0.036052 | 0.150414 | 1.0 |
| BMI | 768.0 | 0.47679 | 0.117499 | 0.0 | 0.406855 | 0.4769 | 0.545455 | 1.0 |
| DiabetesPedigreeFunction | 768.0 | 0.19499 | 0.136913 | 0.032231 | 0.100723 | 0.153926 | 0.258781 | 1.0 |
| Age | 768.0 | 0.410381 | 0.145188 | 0.259259 | 0.296296 | 0.358025 | 0.506173 | 1.0 |
| Outcome | 768.0 | 0.348958 | 0.476951 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
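
Dividing a column by a positive constant divides its mean and standard deviation by the same constant, so the scaled statistics can be cross-checked by hand against the unscaled ones. A minimal sketch, using values copied from the two `describe()` outputs; the names `raw_stats` and `scaled_stats` are illustrative only.

```python
# Raw summary statistics copied from the first describe() output.
raw_stats = {
    "Glucose": {"mean": 120.894531, "std": 31.972618, "max": 199.0},
    "Age": {"mean": 33.240885, "std": 11.760232, "max": 81.0},
}

# Dividing a column by its maximum divides its mean and std by that maximum too.
scaled_stats = {
    col: {"mean": s["mean"] / s["max"], "std": s["std"] / s["max"]}
    for col, s in raw_stats.items()
}

for col, s in scaled_stats.items():
    print(col, round(s["mean"], 6), round(s["std"], 6))
```

The printed values should agree with the `mean` and `std` columns of the scaled table for `Glucose` and `Age`, up to rounding.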


In [None]:
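# Feature matrix (768 x 8) and target array
X = df[predictors].values
y = df[target_column].values

# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=40)
print(X_train.shape)
print(X_test.shape)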

(537, 8)
(231, 8)
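
The two shapes follow from the sample count and `test_size`. The sketch below reproduces the arithmetic, assuming that a fractional `test_size` is rounded up to a whole number of rows and that the remaining rows go to training; that rounding rule is an assumption about `train_test_split`, not something stated in the source.

```python
import math

n_samples = 768      # rows in the dataset
test_size = 0.30     # fraction passed to train_test_split

# Assumption: the test count is rounded up and training gets the rest.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```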


In [None]:
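from sklearn.neural_network import MLPClassifier

# Three hidden layers of 8 units each, ReLU activations, Adam optimizer
mlp = MLPClassifier(hidden_layer_sizes=(8,8,8), activation='relu', solver='adam', max_iter=500)
# ravel() flattens y_train from shape (n, 1) to (n,), which avoids a DataConversionWarning
mlp.fit(X_train, y_train.ravel())

# Predict on both splits to compare training and test performance
predict_train = mlp.predict(X_train)
predict_test = mlp.predict(X_test)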
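
To get a feel for the size of this network, its trainable parameters can be counted by hand. The sketch assumes 8 input features and a single output unit for the binary target; the output-layer width is an assumption here, and `layer_sizes` and `n_params` are illustrative names.

```python
# Layer widths: 8 inputs, three hidden layers of 8 units, 1 output unit (assumed).
layer_sizes = [8, 8, 8, 8, 1]

# Each layer has (inputs x outputs) weights plus one bias per output unit.
n_params = sum(
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
)

print(n_params)
```

On a fitted model, the actual array shapes can be inspected through `mlp.coefs_` and `mlp.intercepts_`.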


### Evaluation metric

 We evaluate the model using accuracy, the fraction of cases that were classified correctly.

 For a binary classifier, accuracy = (TP+TN)/(TP+TN+FP+FN), where:

     True Positives (TP) are cases with positive labels that were correctly classified as positive.
     True Negatives (TN) are cases with negative labels that were correctly classified as negative.
     False Positives (FP) are cases with negative labels that were incorrectly classified as positive.
     False Negatives (FN) are cases with positive labels that were incorrectly classified as negative.
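
The formula can be applied directly to the four counts. A minimal sketch, using the test-set confusion matrix reported later in this notebook and reading it in scikit-learn's layout for labels 0 and 1 (rows are true labels, columns are predictions, so the matrix is `[[TN, FP], [FN, TP]]`); the helper name `accuracy` is illustrative.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Counts read from the test-set confusion matrix [[124, 18], [38, 51]].
acc = accuracy(tp=51, tn=124, fp=18, fn=38)
print(acc)
```

The result should match the test accuracy printed by `accuracy_score`.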

In [None]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Metrics on the training set
print('Accuracy: ', accuracy_score(y_train,predict_train))
print('Confusion matrix: \n', confusion_matrix(y_train,predict_train))
print('Classification report: \n', classification_report(y_train,predict_train))

Accuracy:  0.7951582867783985
Confusion matrix: 
 [[318  40]
 [ 70 109]]
Classification report: 
               precision    recall  f1-score   support

           0       0.82      0.89      0.85       358
           1       0.73      0.61      0.66       179

    accuracy                           0.80       537
   macro avg       0.78      0.75      0.76       537
weighted avg       0.79      0.80      0.79       537
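
The per-class figures in the report can be re-derived from the training confusion matrix. A sketch under the usual definitions (precision = TP / predicted positives, recall = TP / actual positives, F1 = their harmonic mean), treating each class in turn as the "positive" one; `class_metrics` is a hypothetical helper, not a scikit-learn function.

```python
# Training confusion matrix from above: rows are true labels, columns are predictions.
cm = [[318, 40],
      [70, 109]]

def class_metrics(cm, k):
    """Precision, recall and F1 for class k of a square confusion matrix."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)   # column sum
    actual_k = sum(cm[k])                     # row sum (the "support")
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for k in (0, 1):
    p, r, f = class_metrics(cm, k)
    print(k, round(p, 2), round(r, 2), round(f, 2))
```

The lower recall for class 1 reflects the 70 diabetic cases in the training set that the model labeled as non-diabetic.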



In [None]:
# Metrics on the held-out test set
print('Accuracy: ',accuracy_score(y_test,predict_test))
print('Confusion matrix: \n', confusion_matrix(y_test,predict_test))
print('Classification report: \n', classification_report(y_test,predict_test))

Accuracy:  0.7575757575757576
Confusion matrix: 
 [[124  18]
 [ 38  51]]
Classification report: 
               precision    recall  f1-score   support

           0       0.77      0.87      0.82       142
           1       0.74      0.57      0.65        89

    accuracy                           0.76       231
   macro avg       0.75      0.72      0.73       231
weighted avg       0.76      0.76      0.75       231



### Prediction on a real sample

In [None]:
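# One sample whose features are already divided by the column maxima;
# the values must follow the same column order as `predictors`.
X_real = np.asarray([0.48484848, 0.57228018, 0.30864198, 0.67213115, 0.86934673,
                     0.17647059, 0.54964539, 0.88305785]).reshape(1, -1)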

In [None]:
mlp.predict(X_real)[0]

0
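
The vector `X_real` is already scaled. The sketch below shows how such a vector could be built from raw measurements, dividing each by the column maximum from the unscaled `describe()` output. The raw values and the feature order are assumptions, chosen so that the result reproduces `X_real`; in practice the order must match the `predictors` list used when the model was trained.

```python
# Column maxima taken from the unscaled describe() table.
col_max = {
    "Pregnancies": 17.0, "Glucose": 199.0, "BloodPressure": 122.0,
    "SkinThickness": 99.0, "Insulin": 846.0, "BMI": 67.1,
    "DiabetesPedigreeFunction": 2.42, "Age": 81.0,
}

# Hypothetical raw measurements for one patient (assumed, not from the source).
raw_sample = {
    "Pregnancies": 3, "Glucose": 173, "BloodPressure": 82,
    "SkinThickness": 48, "Insulin": 465, "BMI": 38.4,
    "DiabetesPedigreeFunction": 2.137, "Age": 25,
}

# Assumed feature order; in practice use the `predictors` list of the trained model.
feature_order = ["SkinThickness", "BMI", "Age", "BloodPressure",
                 "Glucose", "Pregnancies", "Insulin", "DiabetesPedigreeFunction"]

# Apply the same max-scaling that was applied to the training data.
scaled_sample = [raw_sample[name] / col_max[name] for name in feature_order]
print([round(v, 8) for v in scaled_sample])
```

Keeping the maxima and the feature order next to the model avoids feeding it values in a different scale or column order than the one it was trained on.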