# Introduction to Neural Networks

Find the best neural network classifier for the MNIST data. Do your own research (exploring the number of hidden units, activation functions, etc.) and report (only!) the model that performs best on the test set.

Rules: 

Training set: first 60000 records
Test set: last 10000 records

The best model should be trained from scratch on the training set (training happens in the notebook).
Use only numpy and sklearn, no other libraries.

Use only models from 
sklearn.neural_network.MLPClassifier, sklearn.preprocessing, sklearn.decomposition.PCA  (the latter two are optional)

No more than a total of 100 units in your neural net (you may choose how many per layer, as long as the total does not exceed 100).
Use only solver 'sgd' or 'adam'


Global manual scaling applied to both training data and test data is allowed (you may use the fact that greyscale values are between 0 and 255, endpoints included; global manual scaling may not use any other information from the test set).
Data-driven preprocessing from sklearn is allowed (not required!), but it may only be based on information from the training set (you may sort out in your own research how this is done best).
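
The two preprocessing rules above can be illustrated with a minimal sketch. It assumes `StandardScaler` and `PCA` as the data-driven steps (one possible choice, not a required one) and uses small synthetic arrays (`X_train_demo`, `X_test_demo`, hypothetical names) instead of the real MNIST data. The point is only the pattern: fit on the training set, then apply the same fitted transform to the test set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for pixel data with values in 0..255 (not the real MNIST arrays)
X_train_demo = rng.integers(0, 256, size=(200, 64)).astype(float)
X_test_demo = rng.integers(0, 256, size=(50, 64)).astype(float)

# Global manual scaling: relies only on the known greyscale range, not on the data
X_train_scaled = X_train_demo / 255.0
X_test_scaled = X_test_demo / 255.0

# Data-driven steps: statistics are estimated from the training set only
scaler = StandardScaler().fit(X_train_scaled)
pca = PCA(n_components=20, random_state=0).fit(scaler.transform(X_train_scaled))

# The fitted transforms are then applied unchanged to both sets
X_train_prep = pca.transform(scaler.transform(X_train_scaled))
X_test_prep = pca.transform(scaler.transform(X_test_scaled))
print(X_train_prep.shape, X_test_prep.shape)
```

Calling `fit` (or `fit_transform`) on the test set, or on training and test data combined, would leak test-set information into the preprocessing and break the rule.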

Code should consist of one call to MLPClassifier, one call to its fit function, and one call to its predict function.

Provide runnable code, including the computation of the fraction of correctly classified digits on the test set. A fraction of 0.97 (or larger) correctly classified digits on the test set should be doable.

Valid approaches that lead to better test performance are awarded higher grades.

In [None]:
import numpy as np               # imports as usual

from sklearn.neural_network import MLPClassifier  # neural network classifier

from sklearn.datasets import fetch_openml     # import MNIST data set
mnist = fetch_openml('mnist_784')

X_train_real = np.array(mnist['data'][:60000])        # I use np.array because sklearn 
y_train = np.array(mnist['target'][:60000])      # is not always happy with dataframes
X_test_real = np.array(mnist['data'][60000:])
y_test = np.array(mnist['target'][60000:])


In [None]:
#normalize/rescale
scale_factor = 255
X_train = X_train_real / scale_factor
X_test = X_test_real / scale_factor

The code below should be improved

In [None]:
mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='sgd', max_iter=100, verbose=True, random_state=1,
                    early_stopping=True, validation_fraction=0.1)
# with verbose you can see what is happening
# random state is for reproducibility. 
# early stopping is convenient to avoid overtraining. Here  0.1 validation fraction. 

mlp.fit(X_train, y_train)
y_mlp = mlp.predict(X_test)
fraction_correct_test_set = np.mean(y_mlp==y_test)

print(' ')
print('Fraction correctly classified on test set:', fraction_correct_test_set)


Iteration 1, loss = 1.92902957
Validation score: 0.605500
Iteration 2, loss = 0.82505939
Validation score: 0.819500
Iteration 3, loss = 0.53568596
Validation score: 0.864667
Iteration 4, loss = 0.42003825
Validation score: 0.884667
Iteration 5, loss = 0.37140106
Validation score: 0.890333
Iteration 6, loss = 0.33984300
Validation score: 0.891333
Iteration 7, loss = 0.32526359
Validation score: 0.904167
Iteration 8, loss = 0.30964340
Validation score: 0.912500
 
Fraction correctly classified on test set: 0.9125





So the performance is a fraction of 0.9125 correctly classified digits on the test set, as reported in the output above. I tried to improve the result by using the *adam* solver.

In [None]:
mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='adam', max_iter=100, verbose=True, random_state=1,
                    early_stopping=True, validation_fraction=0.1)
# with verbose you can see what is happening
# random state is for reproducibility
# early stopping is convenient to avoid overtraining. Here  0.1 validation fraction. 

mlp.fit(X_train, y_train)
y_mlp = mlp.predict(X_test)
fraction_correct_test_set = np.mean(y_mlp==y_test)

print(' ')
print('Fraction correctly classified on test set:', fraction_correct_test_set)

Iteration 1, loss = 0.45926350
Validation score: 0.930000
Iteration 2, loss = 0.21254782
Validation score: 0.946500
Iteration 3, loss = 0.15597639
Validation score: 0.957167
Iteration 4, loss = 0.12476013
Validation score: 0.963333
Iteration 5, loss = 0.10305632
Validation score: 0.967167
Iteration 6, loss = 0.08764004
Validation score: 0.967833
Iteration 7, loss = 0.07462874
Validation score: 0.969667
Iteration 8, loss = 0.06599581
Validation score: 0.970500
Iteration 9, loss = 0.05776797
Validation score: 0.973500
Iteration 10, loss = 0.05140064
Validation score: 0.972833
Iteration 11, loss = 0.04488681
Validation score: 0.971667
Iteration 12, loss = 0.04008229
Validation score: 0.972667
Iteration 13, loss = 0.03503224
Validation score: 0.973167
Iteration 14, loss = 0.03159713
Validation score: 0.972500
 
Fraction correctly classified on test set: 0.9751




# EXPERIMENTATION

In [None]:
alphas = [0.1, 0.01, 0.001, 0.0001, 0.00001]
max_iters = [100,200,500,1000]
layers = [(100,),(50,50,),(20,20,20,20,20,)]
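
Before training, it may be worth checking that each candidate architecture respects the 100-unit rule. The sketch below assumes the budget counts hidden units only (input and output units excluded), which is an interpretation of the rule and not something the assignment text spells out. `total_hidden_units`, `UNIT_BUDGET` and `candidate_layers` are hypothetical names introduced for illustration.

```python
def total_hidden_units(hidden_layer_sizes):
    """Return the total number of hidden units across all layers."""
    return sum(hidden_layer_sizes)


UNIT_BUDGET = 100  # assumed to apply to hidden units only
candidate_layers = [(100,), (50, 50), (20, 20, 20, 20, 20)]

for sizes in candidate_layers:
    n_units = total_hidden_units(sizes)
    status = 'ok' if n_units <= UNIT_BUDGET else 'over budget'
    print(sizes, n_units, status)
```

All three candidates use exactly 100 hidden units, so they differ in depth but not in total size.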

In [None]:
for a in alphas:
  mlp = MLPClassifier(hidden_layer_sizes=(100,),solver='adam',alpha = a,max_iter=100, verbose=False, random_state=1,early_stopping=True, validation_fraction=0.1)

  mlp.fit(X_train, y_train)
  y_mlp = mlp.predict(X_test)
  fraction_correct_test_set = np.mean(y_mlp==y_test)

  print(' ')
  print('Fraction correctly classified on test set with a = {} : {}'.format(a, fraction_correct_test_set))

 
Fraction correctly classified on test set with a = 0.1 : 0.9637
 
Fraction correctly classified on test set with a = 0.01 : 0.9631
 
Fraction correctly classified on test set with a = 0.001 : 0.9604
 
Fraction correctly classified on test set with a = 0.0001 : 0.9618
 
Fraction correctly classified on test set with a = 1e-05 : 0.9644


In [None]:
for a in [1.5, 0.8, 0.5, 0.0000001]:
  mlp = MLPClassifier(hidden_layer_sizes=(100,),solver='adam',alpha = a, max_iter=100, verbose=False, random_state=1,early_stopping=True, validation_fraction=0.1)

  mlp.fit(X_train, y_train)
  y_mlp = mlp.predict(X_test)
  fraction_correct_test_set = np.mean(y_mlp==y_test)

  print(' ')
  print('Fraction correctly classified on test set with a = {} : {}'.format(a, fraction_correct_test_set))

 
Fraction correctly classified on test set with a = 1.5 : 0.9692
 
Fraction correctly classified on test set with a = 0.8 : 0.9685
 
Fraction correctly classified on test set with a = 0.5 : 0.9647
 
Fraction correctly classified on test set with a = 1e-07 : 0.966


In [None]:
for i in max_iters:
  mlp = MLPClassifier(hidden_layer_sizes=(100,),solver='adam', alpha = 1.5, max_iter=i, verbose=False, random_state=1,early_stopping=True, validation_fraction=0.1)

  mlp.fit(X_train, y_train)
  y_mlp = mlp.predict(X_test)
  fraction_correct_test_set = np.mean(y_mlp==y_test)

  print(' ')
  print('Fraction correctly classified on test set with max_iters = {} : {}'.format(i, fraction_correct_test_set))

 
Fraction correctly classified on test set with max_iters = 100 : 0.9692
 
Fraction correctly classified on test set with max_iters = 200 : 0.9692
 
Fraction correctly classified on test set with max_iters = 500 : 0.9692
 
Fraction correctly classified on test set with max_iters = 1000 : 0.9692
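
The identical scores for every `max_iter` value are consistent with early stopping ending training before the iteration limit is reached, although the output above does not confirm this directly. One way to check is the fitted model's `n_iter_` attribute, which holds the number of epochs actually run. The sketch below uses sklearn's small `load_digits` data as a fast stand-in for MNIST (an assumption made for speed), so the numbers it prints say nothing about the MNIST runs themselves.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Small 8x8 digits data set as a stand-in for MNIST; pixel values lie in 0..16
X_demo, y_demo = load_digits(return_X_y=True)
X_demo = X_demo / 16.0

epochs_run = {}
for max_iter in [100, 200]:
    mlp_demo = MLPClassifier(hidden_layer_sizes=(100,), solver='adam',
                             max_iter=max_iter, random_state=1,
                             early_stopping=True, validation_fraction=0.1)
    with warnings.catch_warnings():
        # Silence the warning raised when max_iter is hit before convergence
        warnings.simplefilter('ignore', ConvergenceWarning)
        mlp_demo.fit(X_demo, y_demo)
    epochs_run[max_iter] = mlp_demo.n_iter_  # epochs actually run
    print('max_iter = {} : stopped after {} epochs'.format(max_iter, mlp_demo.n_iter_))
```

If `n_iter_` is the same for every setting and smaller than the smallest `max_iter`, the limit was never the binding constraint, and raising it cannot change the fitted model.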


In [None]:
for i in layers:
  mlp = MLPClassifier(hidden_layer_sizes=i,solver='adam', max_iter=100, verbose=False, random_state=1,early_stopping=True, validation_fraction=0.1)

  mlp.fit(X_train, y_train)
  y_mlp = mlp.predict(X_test)
  fraction_correct_test_set = np.mean(y_mlp==y_test)

  print(' ')
  print('Fraction correctly classified on test set with layer set = {} : {}'.format(i, fraction_correct_test_set))
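
The loops above compare settings by their test-set score. A possible alternative, sketched here under assumptions, is to compare candidates by the held-out validation score that `MLPClassifier` already records in `validation_scores_` when `early_stopping=True`, and to touch the test set only once for the final model. The sketch again uses `load_digits` as a fast stand-in for MNIST; `candidate_layers`, `validation_results` and `best_layers` are hypothetical names.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Small 8x8 digits data set as a stand-in for MNIST; pixel values lie in 0..16
X_demo, y_demo = load_digits(return_X_y=True)
X_demo = X_demo / 16.0

candidate_layers = [(100,), (50, 50), (20, 20, 20, 20, 20)]
validation_results = {}

for sizes in candidate_layers:
    mlp_demo = MLPClassifier(hidden_layer_sizes=sizes, solver='adam',
                             max_iter=100, random_state=1,
                             early_stopping=True, validation_fraction=0.1)
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', ConvergenceWarning)
        mlp_demo.fit(X_demo, y_demo)
    # Best accuracy on the internal validation split (10% of the training data)
    validation_results[sizes] = max(mlp_demo.validation_scores_)
    print('layers = {} : best validation score = {:.4f}'.format(sizes, validation_results[sizes]))

best_layers = max(validation_results, key=validation_results.get)
print('Selected layer set:', best_layers)
```

This is exploration code only. The reported notebook would still contain a single `MLPClassifier` call with the selected settings, one `fit` and one `predict`.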