# Using Scikit-Learn Neural Network Class to classify MNIST



# About

Yann LeCun's MNIST is the most "used" dataset in Machine Learning and often serves as the field's "Hello World" problem. It is old but still very useful: even Geoffrey Hinton's [Capsule Network](https://en.wikipedia.org/wiki/Capsule_neural_network) uses MNIST as a test.

Nowadays almost any tutorial will steer the student toward the PyTorch library for MNIST. Here we solve it with "plain" Python instead, either by writing the algorithm from scratch or by using `MLPClassifier` from the conventional machine learning library [Scikit-Learn](https://scikit-learn.org/stable/).

To understand what the dataset contains, it is worth reading https://www.openml.org/d/554


In [1]:
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier
from sklearn.decomposition import PCA

In [2]:
# load the data from https://www.openml.org/d/554
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)

X = X / 255.0

# use the traditional train/test split
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]
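
The scaling and positional split in the cell above can be checked on a small synthetic array without downloading anything. The sketch below is only an illustration: `scale_and_split` is a hypothetical helper, the random "images" are made up, and the string labels are an assumption about how OpenML delivers the MNIST targets.

```python
import numpy as np


def scale_and_split(X, y, n_train):
    """Scale 8-bit pixel values to [0, 1] and split the rows by position."""
    X = np.asarray(X, dtype=np.float64) / 255.0
    y = np.asarray(y)
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


# Synthetic stand-in for MNIST: 10 "images" of 4 pixels each (hypothetical data).
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(10, 4))
# Labels as strings (assumption: this mirrors the OpenML target column).
y_demo = np.array([str(i % 10) for i in range(10)])

X_tr, X_te, y_tr, y_te = scale_and_split(X_demo, y_demo, n_train=6)
print(X_tr.shape, X_te.shape)
print("min:", X_tr.min(), "max:", X_tr.max())
```

Because the split is purely positional, it reproduces the traditional 60,000/10,000 partition only if the rows arrive in the original order.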

Next, we build an MLP model with a single hidden layer. You are expected to modify the model and the training parameters until you find the best MLP you can:

In [3]:
# MLP model without PCA
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=50, alpha=1e-4,
                    solver='sgd', verbose=10, random_state=1, learning_rate_init=0.1)

print("Training MLP without PCA...")
mlp.fit(X_train, y_train)

Training MLP without PCA...
Iteration 1, loss = 0.29472131
Iteration 2, loss = 0.12177005
Iteration 3, loss = 0.08714874
Iteration 4, loss = 0.06943661
Iteration 5, loss = 0.05769795
Iteration 6, loss = 0.04738813
Iteration 7, loss = 0.04063357
Iteration 8, loss = 0.03374012
Iteration 9, loss = 0.02797139
Iteration 10, loss = 0.02464722
Iteration 11, loss = 0.02036596
Iteration 12, loss = 0.01651301
Iteration 13, loss = 0.01448737
Iteration 14, loss = 0.01197540
Iteration 15, loss = 0.00978637
Iteration 16, loss = 0.00857836
Iteration 17, loss = 0.00701300
Iteration 18, loss = 0.00587422
Iteration 19, loss = 0.00499287
Iteration 20, loss = 0.00426864
Iteration 21, loss = 0.00363586
Iteration 22, loss = 0.00316218
Iteration 23, loss = 0.00298604
Iteration 24, loss = 0.00268774
Iteration 25, loss = 0.00250510
Iteration 26, loss = 0.00231729
Iteration 27, loss = 0.00218511
Iteration 28, loss = 0.00202351
Iteration 29, loss = 0.00190881
Iteration 30, loss = 0.00182350
Iteration 31, loss = 0.0

| Parameter | Value |
|---|---|
| hidden_layer_sizes | (100,) |
| activation | 'relu' |
| solver | 'sgd' |
| alpha | 0.0001 |
| batch_size | 'auto' |
| learning_rate | 'constant' |
| learning_rate_init | 0.1 |
| power_t | 0.5 |
| max_iter | 50 |
| shuffle | True |
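
With `hidden_layer_sizes=(100,)` and `activation='relu'`, the fitted model maps each 784-pixel image through one hidden layer of 100 units to 10 class probabilities. As a rough "from scratch" illustration, the forward pass can be sketched in NumPy as below. This is not Scikit-Learn's implementation: `mlp_forward` is a hypothetical function, the weights are random rather than trained, and the softmax output layer is an assumption about the multiclass case.

```python
import numpy as np


def mlp_forward(X, W1, b1, W2, b2):
    """Forward pass: input -> ReLU hidden layer -> softmax class probabilities."""
    hidden = np.maximum(0.0, X @ W1 + b1)                 # ReLU activation
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)   # avoid overflow in exp
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 784, 100, 10

# Random, untrained weights (illustrative only).
W1 = rng.normal(scale=0.01, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

X_batch = rng.random((5, n_features))   # 5 fake images with pixels in [0, 1)
probs = mlp_forward(X_batch, W1, b1, W2, b2)
print(probs.shape)
print(probs.sum(axis=1))
```

Training would then adjust `W1`, `b1`, `W2` and `b2` by gradient descent on the cross-entropy loss, which is the quantity reported as `loss` in the log above.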


In [4]:
print(f"Training-set accuracy (without PCA): {mlp.score(X_train, y_train):.3f}")
print(f"Test-set accuracy (without PCA): {mlp.score(X_test, y_test):.3f}")

Training-set accuracy (without PCA): 1.000
Test-set accuracy (without PCA): 0.980
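
The search for a better MLP can be made systematic with a cross-validated grid search, so that the test set is used only once at the end. The sketch below is a minimal example under stated assumptions: it uses Scikit-Learn's small bundled `load_digits` dataset (8x8 images) as a stand-in for MNIST so that it runs quickly, and the grid values are arbitrary choices, not recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

# Small stand-in dataset (assumption: used here only to keep the run short).
X_small, y_small = load_digits(return_X_y=True)
X_small = X_small / 16.0  # pixel values in this dataset range from 0 to 16

X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.25, random_state=1, stratify=y_small
)

# Hypothetical grid: two hidden-layer widths and two L2 penalties.
param_grid = {
    "hidden_layer_sizes": [(50,), (100,)],
    "alpha": [1e-4, 1e-2],
}
base = MLPClassifier(solver="sgd", learning_rate_init=0.1,
                     max_iter=50, random_state=1)

# 3-fold cross-validation on the training portion only.
search = GridSearchCV(base, param_grid, cv=3)
search.fit(X_tr, y_tr)

print(search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.3f}")
print(f"Test accuracy: {search.score(X_te, y_te):.3f}")
```

Selecting hyperparameters by cross-validation rather than by test accuracy keeps the final test score an honest estimate of generalization.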


As an exercise, apply PCA again to this dataset, retrain the model, and compare training-set and test-set performance in both cases (that is, with and without PCA).

Once that is done, submit your code file through AVA.


In [5]:
# Reduce dimensionality from 784 to 100 components
# (note: the PCA is fit on all 70,000 samples, test rows included)
pca = PCA(n_components=100)
X_pca = pca.fit_transform(X)

# Split into train/test again
X_train_pca, X_test_pca = X_pca[:60000], X_pca[60000:]
y_train, y_test = y[:60000], y[60000:]
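
The PCA in the cell above is fitted on the full array `X`, so the test images also shape the projection. One common alternative is to fit the PCA on the training rows only, for example by chaining it with the classifier in a pipeline. The sketch below illustrates that idea under assumptions: it uses the bundled `load_digits` dataset as a small stand-in for MNIST, and `n_components=20` is an arbitrary value chosen for 64-pixel images.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Small stand-in dataset (assumption: used here only to keep the run short).
X_small, y_small = load_digits(return_X_y=True)
X_small = X_small / 16.0  # pixel values in this dataset range from 0 to 16

X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.25, random_state=1, stratify=y_small
)

pipe = make_pipeline(
    PCA(n_components=20, random_state=1),
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=50, alpha=1e-4,
                  solver="sgd", random_state=1, learning_rate_init=0.1),
)

# fit() learns the PCA projection from the training rows only;
# score() reuses that projection to transform the test rows.
pipe.fit(X_tr, y_tr)

print(f"Train accuracy: {pipe.score(X_tr, y_tr):.3f}")
print(f"Test accuracy: {pipe.score(X_te, y_te):.3f}")
```

On a dataset as large as MNIST the difference between the two approaches is likely small, but the pipeline form makes the separation between training and test data explicit.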

In [6]:
# MLP model with PCA
mlp_pca = MLPClassifier(hidden_layer_sizes=(100,), max_iter=50, alpha=1e-4,
                        solver='sgd', verbose=10, random_state=1, learning_rate_init=0.1)

print("Training MLP with PCA...")
mlp_pca.fit(X_train_pca, y_train)

Training MLP with PCA...
Iteration 1, loss = 0.27239283
Iteration 2, loss = 0.11626254
Iteration 3, loss = 0.08595327
Iteration 4, loss = 0.06800528
Iteration 5, loss = 0.05684109
Iteration 6, loss = 0.04857310
Iteration 7, loss = 0.04209162
Iteration 8, loss = 0.03687012
Iteration 9, loss = 0.03225769
Iteration 10, loss = 0.02838275
Iteration 11, loss = 0.02480549
Iteration 12, loss = 0.02134989
Iteration 13, loss = 0.01946926
Iteration 14, loss = 0.01748456
Iteration 15, loss = 0.01534034
Iteration 16, loss = 0.01356220
Iteration 17, loss = 0.01211656
Iteration 18, loss = 0.01074876
Iteration 19, loss = 0.00957521
Iteration 20, loss = 0.00815735
Iteration 21, loss = 0.00776996
Iteration 22, loss = 0.00705377
Iteration 23, loss = 0.00640832
Iteration 24, loss = 0.00551386
Iteration 25, loss = 0.00527970
Iteration 26, loss = 0.00485134
Iteration 27, loss = 0.00449096
Iteration 28, loss = 0.00409158
Iteration 29, loss = 0.00379580
Iteration 30, loss = 0.00364874
Iteration 31, loss = 0.0



| Parameter | Value |
|---|---|
| hidden_layer_sizes | (100,) |
| activation | 'relu' |
| solver | 'sgd' |
| alpha | 0.0001 |
| batch_size | 'auto' |
| learning_rate | 'constant' |
| learning_rate_init | 0.1 |
| power_t | 0.5 |
| max_iter | 50 |
| shuffle | True |


In [7]:
print(f"Training-set accuracy (with PCA): {mlp_pca.score(X_train_pca, y_train):.3f}")
print(f"Test-set accuracy (with PCA): {mlp_pca.score(X_test_pca, y_test):.3f}")

Training-set accuracy (with PCA): 1.000
Test-set accuracy (with PCA): 0.981
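
When comparing the two runs, it can help to look at how much variance the retained components keep, since a projection that preserves most of the variance would be expected to give similar accuracy. The sketch below shows one way to inspect this with `explained_variance_ratio_`. It is an illustration under assumptions: it runs on the bundled `load_digits` dataset rather than MNIST, and the 95% threshold is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Small stand-in dataset (assumption: used here only to keep the run short).
X_small, _ = load_digits(return_X_y=True)
X_small = X_small / 16.0  # pixel values in this dataset range from 0 to 16

# Fit a PCA that keeps every component, then accumulate the variance ratios.
pca_full = PCA().fit(X_small)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the threshold.
threshold = 0.95
n_needed = int(np.searchsorted(cumulative, threshold) + 1)

print(f"Components needed for {threshold:.0%} of the variance: {n_needed}")
print(f"Variance kept by the first 10 components: {cumulative[9]:.3f}")
```

Applied to the notebook's `pca` object, `pca.explained_variance_ratio_.sum()` would give the fraction of variance kept by the 100 components used above.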
