<head>
  <meta name="author" content="Rogério de Oliveira">
  <meta institution="author" content="Universidade Presbiteriana Mackenzie">
</head>

<img src="http://meusite.mackenzie.br/rogerio/mackenzie_logo/UPM.2_horizontal_vermelho.jpg" width=300, align="right"> 

<h1 align=left><font size = 8, style="color:rgb(200,0,0)"><b>Deep Learning</b></font></h1> 
<a href="mailto:rogerio.oliveira@mackenzie.br">Rogério de Oliveira</a><br>

<br>
<br>

---


# Simple neuron
 
An artificial neuron of the **perceptron** type computes a linear combination of its inputs and applies an activation function, such as $sign$, $tanh$, or $relu$, to produce an output.

$$ f(X) = sign( w_0 + w_1 x_1 + ... + w_n x_n ) $$

Training the neuron means adjusting the weights $w_i$ according to a cost function (for example, a measure of the prediction error) computed when estimating the output $f(X) \cong y$.

$$ \min_{W} \sum || f(X)- y || $$

In this way, you can view the learning of a neuron as an optimization problem.
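
As an illustration of this optimization view, the following is a minimal sketch of a perceptron trained with the classic perceptron update rule on the logical AND function. The function names, the learning rate, the number of epochs, and the choice of AND as training data are assumptions made for this example; they are not part of the original material.

```python
import numpy as np

def perceptron_predict(X, w):
    # w[0] is the bias w_0; w[1:] are the input weights w_1..w_n
    z = w[0] + X @ w[1:]
    # sign activation, treating sign(0) as +1
    return np.where(z >= 0, 1, -1)

def perceptron_fit(X, y, lr=1.0, epochs=50):
    # Start from zero weights and correct them after each misclassified sample
    w = np.zeros(X.shape[1] + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if w[0] + xi @ w[1:] >= 0 else -1
            update = lr * (yi - pred)   # zero when the prediction is correct
            w[0] += update
            w[1:] += update * xi
    return w

# Logical AND with labels in {-1, +1}: a linearly separable problem
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([-1, -1, -1, 1])

w_and = perceptron_fit(X_and, y_and)
and_pred = perceptron_predict(X_and, w_and)
print(and_pred)
```

Because AND is linearly separable, the update rule stops changing the weights once every sample is classified correctly.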


# Neural networks 
 
A single neuron, however, has a quite limited learning capacity: it can only solve linearly separable problems. For example, it cannot learn a function such as **XOR**.


The function $XOR(X) \rightarrow y$:

```
  X     y
 0 0    0
 0 1    1
 1 0    1
 1 1    0
```

which is not linearly separable. 

To overcome this limitation, we can arrange multiple neurons in layers. The outputs of the neurons in one layer serve as inputs to the next layer. The layers between the initial (input) layer and the final (output) layer are the hidden layers of the network.

Training the network follows the same principle, although it is more complex: the weights $w_i$ are adjusted according to the prediction error obtained when estimating the output $f(X) \cong y$.

$$ \min_{W} \sum || f(X)- y || $$

This learning procedure is called *backpropagation*.

Now visit http://playground.tensorflow.org/ for a demonstration.
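
To make the role of a hidden layer concrete, the sketch below hand-picks (rather than learns) the weights of a tiny network with one hidden layer of two step-activation neurons that computes XOR. The specific weights and the names `W_hidden`, `b_hidden`, `w_out`, and `b_out` are illustrative assumptions; a trained network would typically arrive at different values.

```python
import numpy as np

def step(z):
    # Step activation: 1 when the weighted sum is positive, 0 otherwise
    return (z > 0).astype(int)

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: the first unit behaves like OR, the second like AND
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])

# Output unit: fires when OR is active and AND is not
w_out = np.array([1.0, -1.0])
b_out = -0.5

hidden = step(X_xor @ W_hidden + b_hidden)
xor_pred = step(hidden @ w_out + b_out)
print(xor_pred)   # [0 1 1 0]
```

Each hidden unit draws one straight line in the input plane; the output unit combines the two half-planes, which a single neuron cannot do.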

# General Scheme for Supervised Models and MLP

Supervised learning models all follow a fairly general scheme in [`scikit-learn`](https://scikit-learn.org/stable/supervised_learning.html) and in many other frameworks. Supervised [neural network](https://scikit-learn.org/stable/modules/neural_networks_supervised.html) models can be implemented with `scikit-learn` and follow the same structure. 

# Toy-example: `Remain` or `Leave`?

Below is a *toy example*: a dataset of student grades, from 0 to 5, in courses `A, B, C, D`, together with whether the student remains in the program at the end of the semester (`R`, *remain*) or leaves it (`L`, *leave*).

Although the example is simple, many other problems of interest fit the same pattern, such as customer *churn*, *fraud/non-fraud*, *credit/non-credit*, *defect/not-defect*, *benign/malignant*, etc.

<img src="http://meusite.mackenzie.br/rogerio/TIC/SUP_ML.png" width=1000, align="center">
Fig. 1. General scheme for supervised models, here using a decision tree.


<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcS2b8v5LvJ-2VoobTCraNw_0hOQS5Mf-zlc0Q&usqp=CAU" width=600, align="center"> 



# Obtaining the Data

In [1]:
import pandas as pd
students = pd.DataFrame({'A':[3, 5, 1, 1, 4, 2, 1, 5, 2, 4, 4, 2, 2, 2, 5, 5, 3, 4, 5, 4],
                         'B':[1, 1, 4, 5, 2, 4, 2, 3, 2, 1, 4, 3, 5, 2, 4, 1, 3, 2, 3, 2],
                         'C':[2, 1, 5, 3, 3, 1, 3, 4, 1, 4, 2, 4, 4, 1, 4, 1, 5, 1, 4, 3],
                         'D':[2, 3, 3, 1, 1, 2, 3, 4, 2, 4, 3, 5, 4, 2, 3, 1, 2, 2, 3, 2],
                         'status':['R', 'L', 'R', 'L', 'L', 'L', 'R', 'R', 'L', 'R', 'L', 'R', 'R', 'L', 'L', 'L', 'R', 'L', 'L', 'L']})
print(students)
new_students = pd.DataFrame({'A':[5, 4, 4, 3],
                             'B':[1, 4, 1, 3],
                             'C':[1, 1, 2, 3],
                             'D':[2, 4, 1, 3],
                             'status':['?', '?', '?', '?']})
print(new_students)


    A  B  C  D status
0   3  1  2  2      R
1   5  1  1  3      L
2   1  4  5  3      R
3   1  5  3  1      L
4   4  2  3  1      L
5   2  4  1  2      L
6   1  2  3  3      R
7   5  3  4  4      R
8   2  2  1  2      L
9   4  1  4  4      R
10  4  4  2  3      L
11  2  3  4  5      R
12  2  5  4  4      R
13  2  2  1  2      L
14  5  4  4  3      L
15  5  1  1  1      L
16  3  3  5  2      R
17  4  2  1  2      L
18  5  3  4  3      L
19  4  2  3  2      L
   A  B  C  D status
0  5  1  1  2      ?
1  4  4  1  4      ?
2  4  1  2  1      ?
3  3  3  3  3      ?


# Defining the Predictor Variables `X` and the Dependent Variable `y`

At this point the data are expected to be ready for the model, having been analyzed and transformed in the Data Understanding and Data Preparation phases (handling of missing values, one-hot encoding, normalization, and other necessary transformations).

In [2]:
X = students[['A','B','C','D']]       
y = students.status

# Splitting into Training and Test Sets `X_train, X_test, y_train, y_test`

In [3]:
from sklearn.model_selection import train_test_split
seed = 1984

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=seed)
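
The `stratify=y` argument asks `train_test_split` to keep the class proportions roughly equal in both subsets. A small sketch that checks this on the `status` labels of the toy dataset follows; the names `rows`, `train_counts`, and `test_counts` are assumptions introduced only for this illustration.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# The 'status' column of the toy dataset: 12 'L' and 8 'R'
status = ['R', 'L', 'R', 'L', 'L', 'L', 'R', 'R', 'L', 'R',
          'L', 'R', 'R', 'L', 'L', 'L', 'R', 'L', 'L', 'L']
rows = list(range(len(status)))   # stand-in for the feature rows

rows_train, rows_test, status_train, status_test = train_test_split(
    rows, status, test_size=0.3, stratify=status, random_state=1984)

# Count how many samples of each class ended up in each subset
train_counts = Counter(status_train)
test_counts = Counter(status_test)
print('train:', dict(train_counts))
print('test: ', dict(test_counts))
```

With 20 samples and `test_size=0.3`, the test set holds 6 samples, split 4 `L` to 2 `R`, which mirrors the 60/40 proportion of the full dataset.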

# Declaring the Model `clf`

In [4]:
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(criterion='gini',
                              max_depth=None,
                              random_state=seed)

# Training the model `fit()`

In [5]:
clf.fit(X_train, y_train)

DecisionTreeClassifier(random_state=1984)

# Predicting the Test Set `predict()`

In [6]:
y_pred = clf.predict(X_test)


# Evaluating the model metrics 

The choice of metrics depends on the type of model. Here we use only accuracy.

In [7]:
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:0.3f}')

Accuracy: 1.000


# Predicting the new cases

In [8]:
X_new = new_students[['A','B','C','D']]    

y_pred = clf.predict(X_new)

print(y_pred)

['L' 'R' 'L' 'R']


**Note** The scheme presented here is introductory, intended purely for teaching purposes, and is therefore a considerably simplified model. In particular, it does not account for multiple runs of the models or techniques such as cross-validation, and it uses a single evaluation metric, accuracy. A follow-up on this topic should include cross-validation, other metrics (ROC, F1-score, recall, etc.), and a discussion of overfitting. 

# Complete General Scheme

Putting all the cells together in a single block of code gives the following:

In [9]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from sklearn.tree import DecisionTreeClassifier

seed = 1984

X = students[['A','B','C','D']]       
y = students.status

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=seed)

clf = DecisionTreeClassifier(criterion='gini',
                              max_depth=None,
                              random_state=seed)

#
# Alternatively, we could use another model, such as an SVC, in place of the decision tree:
#
# from sklearn import svm
# clf = svm.SVC()
#
# All the other statements could be kept without any change.
#

clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:0.3f}')

X_new = new_students[['A','B','C','D']]    

y_pred = clf.predict(X_new)

print(y_pred)


Accuracy: 1.000
['L' 'R' 'L' 'R']


Note that

```
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(criterion='gini',
                              max_depth=None,
                              random_state=seed)
```

could be replaced by any other model, for example a Support Vector Machine:

```
from sklearn import svm
clf = svm.SVC()
```

with all the other statements kept without any change.



# More Metrics (optional)

In [10]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

y_pred = clf.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
print(cm)
    
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)

print(classification_report(y_test, y_pred))

[[4 0]
 [0 2]]
1.0
              precision    recall  f1-score   support

           L       1.00      1.00      1.00         4
           R       1.00      1.00      1.00         2

    accuracy                           1.00         6
   macro avg       1.00      1.00      1.00         6
weighted avg       1.00      1.00      1.00         6
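
The `scikit-learn` metric functions take the true labels first and the predictions second. With perfect predictions the order makes no visible difference, but in general swapping the arguments transposes the confusion matrix and exchanges precision and recall. The sketch below shows this on made-up labels; `y_true_demo` and `y_pred_demo` are hypothetical values invented for the illustration, not results from the toy dataset.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: one 'L' sample is wrongly predicted as 'R'
y_true_demo = ['L', 'L', 'L', 'R', 'R']
y_pred_demo = ['L', 'L', 'R', 'R', 'R']

# Rows are true classes, columns are predicted classes
cm_correct = confusion_matrix(y_true_demo, y_pred_demo, labels=['L', 'R'])
# Swapping the arguments yields the transposed matrix
cm_swapped = confusion_matrix(y_pred_demo, y_true_demo, labels=['L', 'R'])
print(cm_correct)
print(cm_swapped)

# Precision and recall for class 'L' with the arguments in the right order
precision_L = precision_score(y_true_demo, y_pred_demo, pos_label='L')
recall_L = recall_score(y_true_demo, y_pred_demo, pos_label='L')
print(precision_L, recall_L)
```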



Alternatively

In [11]:
clf.score(X_test, y_test)

1.0

# Exercise

Modify the general scheme presented above to apply a neural network model, the [MLP](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier).

```
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(solver='lbfgs', 
              alpha = ...,
              max_iter = ...,
              hidden_layer_sizes = ..., 
              random_state = ...)

```


**Note** 

> Activation function: 
* `activation`: `{'identity', 'logistic', 'tanh', 'relu'}`, default `'relu'`

> Solvers: 
* L-BFGS: use for small datasets.
* Adam: use for large datasets (default).
* SGD: stochastic gradient descent; requires careful setting of parameters such as learning rate, momentum, etc.

In [12]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from sklearn.neural_network import MLPClassifier

seed = 1984

X = students[['A','B','C','D']]       
y = students.status

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=seed)

clf = MLPClassifier(solver='lbfgs', 
              alpha = 0.01,
              max_iter = 100,
              hidden_layer_sizes = (2,2), 
              random_state = seed)

clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:0.3f}')

X_new = new_students[['A','B','C','D']]    

y_pred = clf.predict(X_new)

print(y_pred)


Accuracy: 1.000
['L' 'L' 'L' 'R']


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
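
The warning above reports that L-BFGS stopped at the `max_iter = 100` limit before converging, and it suggests two remedies: more iterations or scaled inputs. The following is a minimal sketch that applies both. The `StandardScaler` pipeline, the value `max_iter=2000`, and the name `scaled_mlp` are assumptions chosen for illustration; they are not part of the original solution, and the resulting accuracy and predictions may differ from those shown above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

seed = 1984

# Same toy dataset as in the cells above
students = pd.DataFrame({
    'A': [3, 5, 1, 1, 4, 2, 1, 5, 2, 4, 4, 2, 2, 2, 5, 5, 3, 4, 5, 4],
    'B': [1, 1, 4, 5, 2, 4, 2, 3, 2, 1, 4, 3, 5, 2, 4, 1, 3, 2, 3, 2],
    'C': [2, 1, 5, 3, 3, 1, 3, 4, 1, 4, 2, 4, 4, 1, 4, 1, 5, 1, 4, 3],
    'D': [2, 3, 3, 1, 1, 2, 3, 4, 2, 4, 3, 5, 4, 2, 3, 1, 2, 2, 3, 2],
    'status': ['R', 'L', 'R', 'L', 'L', 'L', 'R', 'R', 'L', 'R',
               'L', 'R', 'R', 'L', 'L', 'L', 'R', 'L', 'L', 'L']})

X = students[['A', 'B', 'C', 'D']]
y = students['status']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=seed)

# The scaler is fitted on the training data only, then the MLP is trained
scaled_mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver='lbfgs', alpha=0.01, max_iter=2000,
                  hidden_layer_sizes=(2, 2), random_state=seed))
scaled_mlp.fit(X_train, y_train)

test_pred = scaled_mlp.predict(X_test)
test_acc = scaled_mlp.score(X_test, y_test)
print(f'L-BFGS iterations used: {scaled_mlp[-1].n_iter_}')
print(f'Test accuracy: {test_acc:0.3f}')
```

Wrapping the scaler and the classifier in one pipeline means the same scaling is applied automatically when predicting new cases.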


# Displaying the MLP weights

This code runs on the model from the previous exercise (`clf`).

In [13]:
import numpy as np

# Shape of each weight matrix: (units in the previous layer, units in the next layer)
print('\nLayers: ')
for W in clf.coefs_:
  print(np.shape(W))

# Weights (W) and biases (B) of each layer
for i, (W, b) in enumerate(zip(clf.coefs_, clf.intercepts_)):
  print(f'\nW{i}: ')
  print(W)
  print(f'\nB{i}: ')
  print(b)




Layers: 
(4, 2)
(2, 2)
(2, 1)

W0: 
[[ 1.0703103  -0.27709637]
 [ 1.53423257 -0.42054466]
 [-1.0931328   0.30892174]
 [-1.50550167  0.46162624]]

B0: 
[1.16101666 0.33004816]

W1: 
[[ 2.66789684e+00  7.15011937e-04]
 [-7.21939314e-01 -3.02613406e-02]]

B1: 
[ 0.41857034 -0.8053998 ]

W2: 
[[-2.77532442]
 [-0.03938361]]

B2: 
[10.97377198]
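
To see how these matrices produce a prediction, the sketch below recomputes the forward pass of a small `MLPClassifier` by hand from `coefs_` and `intercepts_` and compares it with `predict_proba`. It assumes the default `relu` activation in the hidden layers and a logistic output unit, which is what `MLPClassifier` uses for binary problems. The synthetic data and the names `X_demo`, `y_demo`, and `mlp` are invented for the illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic binary problem with 4 features (illustrative data only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

mlp = MLPClassifier(solver='lbfgs', activation='relu',
                    hidden_layer_sizes=(2, 2), max_iter=200, random_state=0)
mlp.fit(X_demo, y_demo)

# Hidden layers: linear combination followed by relu
a = X_demo
for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
    a = np.maximum(0, a @ W + b)

# Output layer: linear combination followed by the logistic function
z = a @ mlp.coefs_[-1] + mlp.intercepts_[-1]
manual_proba = 1.0 / (1.0 + np.exp(-z.ravel()))

# Probability of the positive class as computed by scikit-learn
sklearn_proba = mlp.predict_proba(X_demo)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))
```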


# Exercise

In this exercise you will predict breast cancer diagnoses from features already extracted from diagnostic images. You can use the preformatted data [here](http://meusite.mackenzie.br/rogerio/DLA2021S1/breast_cancer.csv), which avoids working with the raw data from the original source.

> [Decision Tree Model in the Diagnosis of Breast Cancer](https://niklausliu.github.io/files/Yi-Decision%20Tree%20Model%20in%20the%20Diagnosis%20of%20Breast%20Cancer.pdf)

> [Breast Cancer Data](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic))

Analyze the data and, before building your neural model, check whether prior data treatment is needed, such as:

* *Feature selection*, excluding attributes from training (&#X1F44D;)
* Handling missing data (&#X1F44E;)
* *One-hot encoding* to convert categorical data (&#X1F44E;) 
* Data normalization (&#X1F44D;)

Report your model's accuracy for normalized and non-normalized predictor variables. What do you conclude?


In [16]:
import pandas as pd
breast = pd.read_csv('http://meusite.mackenzie.br/rogerio/DLA2021S1/breast_cancer.csv')
breast.head(100)

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,842302,M,17.990,10.38,122.80,1001.0,0.11840,0.27760,0.300100,0.147100,...,25.38,17.33,184.60,2019.0,0.1622,0.66560,0.71190,0.26540,0.4601,0.11890
1,842517,M,20.570,17.77,132.90,1326.0,0.08474,0.07864,0.086900,0.070170,...,24.99,23.41,158.80,1956.0,0.1238,0.18660,0.24160,0.18600,0.2750,0.08902
2,84300903,M,19.690,21.25,130.00,1203.0,0.10960,0.15990,0.197400,0.127900,...,23.57,25.53,152.50,1709.0,0.1444,0.42450,0.45040,0.24300,0.3613,0.08758
3,84348301,M,11.420,20.38,77.58,386.1,0.14250,0.28390,0.241400,0.105200,...,14.91,26.50,98.87,567.7,0.2098,0.86630,0.68690,0.25750,0.6638,0.17300
4,84358402,M,20.290,14.34,135.10,1297.0,0.10030,0.13280,0.198000,0.104300,...,22.54,16.67,152.20,1575.0,0.1374,0.20500,0.40000,0.16250,0.2364,0.07678
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,86208,M,20.260,23.03,132.40,1264.0,0.09078,0.13130,0.146500,0.086830,...,24.22,31.59,156.10,1750.0,0.1190,0.35390,0.40980,0.15730,0.3689,0.08368
96,86211,B,12.180,17.84,77.79,451.1,0.10450,0.07057,0.024900,0.029410,...,12.83,20.92,82.14,495.2,0.1140,0.09358,0.04980,0.05882,0.2227,0.07376
97,862261,B,9.787,19.94,62.11,294.5,0.10240,0.05301,0.006829,0.007937,...,10.92,26.29,68.81,366.1,0.1316,0.09473,0.02049,0.02381,0.1934,0.08988
98,862485,B,11.600,12.84,74.34,412.6,0.08983,0.07525,0.041960,0.033500,...,13.06,17.16,82.96,512.5,0.1431,0.18510,0.19220,0.08449,0.2772,0.08756


In [21]:
# Your code

# Mapping the 'diagnosis' feature to 0 (malignant, "M") or 1 (benign, "B")
diag = {"M":0, "B":1}
breast['diagnosis'] = breast['diagnosis'].map(diag)

In [22]:
breast.head()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,842302,0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,842517,0,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,84300903,0,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,84348301,0,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,84358402,0,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [34]:
# Correlation matrix; Styler.format(precision=2) replaces the older set_precision(2)
breast.corr().style.background_gradient().format(precision=2)

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,radius_se,texture_se,perimeter_se,area_se,smoothness_se,compactness_se,concavity_se,concave points_se,symmetry_se,fractal_dimension_se,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
id,1.0,-0.04,0.075,0.1,0.073,0.097,-0.013,9.6e-05,0.05,0.044,-0.022,-0.053,0.14,-0.0075,0.14,0.18,0.097,0.034,0.055,0.079,-0.017,0.026,0.082,0.065,0.08,0.11,0.01,-0.003,0.023,0.035,-0.044,-0.03
diagnosis,-0.04,1.0,-0.73,-0.42,-0.74,-0.71,-0.36,-0.6,-0.7,-0.78,-0.33,0.013,-0.57,0.0083,-0.56,-0.55,0.067,-0.29,-0.25,-0.41,0.0065,-0.078,-0.78,-0.46,-0.78,-0.73,-0.42,-0.59,-0.66,-0.79,-0.42,-0.32
radius_mean,0.075,-0.73,1.0,0.32,1.0,0.99,0.17,0.51,0.68,0.82,0.15,-0.31,0.68,-0.097,0.67,0.74,-0.22,0.21,0.19,0.38,-0.1,-0.043,0.97,0.3,0.97,0.94,0.12,0.41,0.53,0.74,0.16,0.0071
texture_mean,0.1,-0.42,0.32,1.0,0.33,0.32,-0.023,0.24,0.3,0.29,0.071,-0.076,0.28,0.39,0.28,0.26,0.0066,0.19,0.14,0.16,0.0091,0.054,0.35,0.91,0.36,0.34,0.078,0.28,0.3,0.3,0.11,0.12
perimeter_mean,0.073,-0.74,1.0,0.33,1.0,0.99,0.21,0.56,0.72,0.85,0.18,-0.26,0.69,-0.087,0.69,0.74,-0.2,0.25,0.23,0.41,-0.082,-0.0055,0.97,0.3,0.97,0.94,0.15,0.46,0.56,0.77,0.19,0.051
area_mean,0.097,-0.71,0.99,0.32,0.99,1.0,0.18,0.5,0.69,0.82,0.15,-0.28,0.73,-0.066,0.73,0.8,-0.17,0.21,0.21,0.37,-0.072,-0.02,0.96,0.29,0.96,0.96,0.12,0.39,0.51,0.72,0.14,0.0037
smoothness_mean,-0.013,-0.36,0.17,-0.023,0.21,0.18,1.0,0.66,0.52,0.55,0.56,0.58,0.3,0.068,0.3,0.25,0.33,0.32,0.25,0.38,0.2,0.28,0.21,0.036,0.24,0.21,0.81,0.47,0.43,0.5,0.39,0.5
compactness_mean,9.6e-05,-0.6,0.51,0.24,0.56,0.5,0.66,1.0,0.88,0.83,0.6,0.57,0.5,0.046,0.55,0.46,0.14,0.74,0.57,0.64,0.23,0.51,0.54,0.25,0.59,0.51,0.57,0.87,0.82,0.82,0.51,0.69
concavity_mean,0.05,-0.7,0.68,0.3,0.72,0.69,0.52,0.88,1.0,0.92,0.5,0.34,0.63,0.076,0.66,0.62,0.099,0.67,0.69,0.68,0.18,0.45,0.69,0.3,0.73,0.68,0.45,0.75,0.88,0.86,0.41,0.51
concave points_mean,0.044,-0.78,0.82,0.29,0.85,0.82,0.55,0.83,0.92,1.0,0.46,0.17,0.7,0.021,0.71,0.69,0.028,0.49,0.44,0.62,0.095,0.26,0.83,0.29,0.86,0.81,0.45,0.67,0.75,0.91,0.38,0.37


In [62]:
fList = list(breast.columns)

In [63]:
# Removing features whose absolute correlation with 'diagnosis' is below 0.1
for j in fList:
    cVal = abs(breast['diagnosis'].corr(breast[j]))
    if cVal < 0.1:
        breast = breast.drop(columns=[j])

In [65]:
breast.head()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,0,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,0,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,0,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,0,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [67]:
# Normalizing the data
from sklearn.preprocessing import MinMaxScaler

# Create a scaler object
scaler = MinMaxScaler()
# Fit and transform the data (the 0/1 'diagnosis' column is left unchanged by min-max scaling)
breastNorm = pd.DataFrame(scaler.fit_transform(breast), columns=breast.columns)
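
The exercise asks for a comparison between normalized and non-normalized predictors. One possible way to set it up is sketched below. It assumes that `load_breast_cancer` from `scikit-learn`, which ships the Wisconsin Diagnostic data with `0` for malignant and `1` for benign, can stand in for the CSV file. The MLP hyperparameters and the names `raw_acc` and `norm_acc` are also assumptions. Unlike the cell above, the scaler here is fitted on the training split only, so the test set does not influence the scaling.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

seed = 1984

# Stand-in for the CSV data: 30 numeric features, binary target
X_bc, y_bc = load_breast_cancer(return_X_y=True)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_bc, y_bc, test_size=0.3, stratify=y_bc, random_state=seed)

def make_mlp():
    # Same architecture for both runs, so only the scaling differs
    return MLPClassifier(solver='lbfgs', alpha=0.01, max_iter=500,
                         hidden_layer_sizes=(10,), random_state=seed)

# Model trained on the raw (non-normalized) features
raw_model = make_mlp().fit(Xb_train, yb_train)
raw_acc = raw_model.score(Xb_test, yb_test)

# Model trained on min-max normalized features
norm_model = make_pipeline(MinMaxScaler(), make_mlp()).fit(Xb_train, yb_train)
norm_acc = norm_model.score(Xb_test, yb_test)

print(f'Accuracy without normalization: {raw_acc:0.3f}')
print(f'Accuracy with normalization:    {norm_acc:0.3f}')
```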

# Applying `Cross-Validation`

This code runs on the model and datasets from the previous exercise (`clf, X_train, y_train, X_test, y_test`).

<img src="https://scikit-learn.org/stable/_images/grid_search_cross_validation.png" width=600, align="center">

In [16]:
from sklearn.model_selection import cross_val_score

cv = cross_val_score(clf, X_train, y_train, cv=10)
test_score = clf.fit(X_train, y_train).score(X_test, y_test)

print('CV accuracy score: %0.3f' % np.mean(cv))
print('Test accuracy score: %0.3f' % (test_score))

CV accuracy score: 0.904
Test accuracy score: 0.953
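
`cross_val_score` returns one score per fold, so the spread across folds can be reported alongside the mean. The sketch below shows this with a k-nearest-neighbors classifier inside a pipeline, so that the scaler is refitted on the training part of every fold. The use of `load_breast_cancer` as a stand-in for the course CSV, the choice of classifier, and the names `folds` and `fold_scores` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X_cv, y_cv = load_breast_cancer(return_X_y=True)

# 10 stratified folds, shuffled with a fixed seed for repeatability
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=1984)

# Scaling happens inside each fold, using only that fold's training data
cv_model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
fold_scores = cross_val_score(cv_model, X_cv, y_cv, cv=folds)

print('Per-fold accuracy:', np.round(fold_scores, 3))
print(f'CV accuracy: {fold_scores.mean():0.3f} +/- {fold_scores.std():0.3f}')
```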


# Comparing Several Models (optional)

In [17]:
models = [ ]

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(criterion='gini',
                              max_depth=None,
                              random_state=seed)
models.append(['Decision Tree', clf])

from sklearn import neighbors
n_neighbors = 5                                     
clf = neighbors.KNeighborsClassifier(n_neighbors)    

models.append(['Knn', clf])

from sklearn.naive_bayes import GaussianNB, BernoulliNB
clf = BernoulliNB()

models.append(['Naive Bayes', clf])

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=10)

models.append(['Random Forest', clf])

from sklearn import svm
clf = svm.SVC()

models.append(['Support Vector Machines', clf])

from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(solver='lbfgs', 
              alpha = 1e-5,
              max_iter = 10000,
              hidden_layer_sizes = (3, 10, 2), 
              random_state = seed)

models.append(['MLP Neural Network', clf])

from sklearn.metrics import accuracy_score
for model in models:
    clf = model[1]
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print('Model ' + model[0] + ' accuracy: {:0.3f}'.format(accuracy))

Model Decision Tree accuracy: 0.942
Model Knn accuracy: 0.977
Model Naive Bayes accuracy: 0.977
Model Random Forest accuracy: 0.953
Model Support Vector Machines accuracy: 0.977
Model MLP Neural Network accuracy: 0.953
