# Block 6 Exercise 1: Non-Linear Classification

## MNIST Data
We return to the MNIST data set on handwritten digits to compare non-linear classification algorithms ...   

In [2]:
#imports 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml

In [3]:
# Load data from https://www.openml.org/d/554
# as_frame=False returns NumPy arrays, so X.min() / X.max() below yield scalars
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)


In [4]:
# the full MNIST data set contains 70k samples of the digits 0-9 as 28x28 grayscale images (represented as 784-dimensional vectors)
np.shape(X)

(70000, 784)

In [5]:
X.min()

0.0

In [6]:
# look at the max value in the data (the min value is shown above)
X.max()

255.0
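The two values above show that the pixel intensities span 0 to 255. As a small illustration, the sketch below reshapes one 784-dimensional vector into its 28x28 image layout and rescales it to [0, 1]. It uses a synthetic vector instead of the real MNIST data, assumes row-major pixel order, and all variable names are made up for this example.

```python
import numpy as np

# Synthetic stand-in for one MNIST row: 784 intensities in 0..255
rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=784).astype(np.float64)

# Restore the 28x28 image layout (assuming row-major pixel order)
image = sample.reshape(28, 28)

# Map intensities from [0, 255] to [0, 1]
image_scaled = image / 255.0

print(image.shape, image_scaled.min(), image_scaled.max())
```

With the real data, `X[i].reshape(28, 28)` could be passed to `plt.imshow` to look at a single digit.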

In [32]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.55)
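The split above is random, so it changes on every run. A minimal sketch of a repeatable, class-balanced split follows; it uses synthetic data as a stand-in for MNIST, and the `random_state` and `stratify` settings are suggestions that are not part of the original cell.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 samples, 10 balanced classes
rng = np.random.default_rng(0)
X_demo = rng.random((1000, 784))
y_demo = np.repeat(np.arange(10), 100)

# random_state makes the split repeatable; stratify keeps the class proportions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.55, stratify=y_demo, random_state=0
)

print(X_tr.shape, X_te.shape)
print(np.bincount(y_tr))  # number of training samples per class
```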

### E1.1: Cross-Validation and Support Vector Machines
Train and optimize a C-SVM classifier on MNIST (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)
* use an RBF kernel
* use *random search* with cross-validation to find the best settings for *gamma* and *C* (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV)
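The steps listed above can be sketched as follows. This is only an illustration under stated assumptions: it uses the small `load_digits` data set bundled with scikit-learn as a stand-in for MNIST, puts a `MinMaxScaler` in front of the SVM, and samples *C* and *gamma* from hypothetical log-uniform ranges.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Small bundled stand-in for MNIST (8x8 digits, no download needed)
X_d, y_d = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_d, y_d, train_size=0.75, random_state=0)

# Scaling lives inside the pipeline, so it is refit on each CV training fold
pipe = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))

# Hypothetical search ranges, sampled on a log scale
param_distributions = {
    "svc__C": loguniform(1e-1, 1e3),
    "svc__gamma": loguniform(1e-4, 1e0),
}

search = RandomizedSearchCV(pipe, param_distributions, n_iter=5, cv=3, random_state=0)
search.fit(X_tr, y_tr)

print(search.best_params_)
print(search.best_score_, search.score(X_te, y_te))
```

Sampling on a log scale is a common choice for *C* and *gamma* because useful values often differ by orders of magnitude; whether these particular ranges suit the full MNIST data is untested here.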

In [36]:
from scipy.stats import uniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# uniform(loc, scale) samples from [loc, loc + scale]:
# C is drawn from [0.1, 10.1], gamma from [0.1, 1.1]
distributions = dict(C=uniform(0.1, 10.0), gamma=uniform(0.1, 1.0))

# SVC uses the RBF kernel by default; max_iter=250 caps the solver iterations to bound the runtime
clf = RandomizedSearchCV(SVC(max_iter=250), distributions, random_state=0, n_jobs=10, n_iter=5)

clf.fit(X_train, y_train)



RandomizedSearchCV(estimator=SVC(max_iter=250), n_iter=5, n_jobs=10,
                   param_distributions={'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x0000026645CDE0A0>,
                                        'gamma': <scipy.stats._distn_infrastructure.rv_frozen object at 0x0000026645CDEA60>},
                   random_state=0)

In [37]:
clf.best_params_

{'C': 5.588135039273247, 'gamma': 0.8151893663724195}
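One thing worth checking before relying on these values: the search above was run on the raw pixel values (0 to 255), and the RBF kernel is `exp(-gamma * ||x - x'||^2)`. The sketch below evaluates that formula for two synthetic vectors, once on raw intensities with the gamma found above and once on inputs scaled to [0, 1] with `gamma = 1 / n_features`. The second gamma is an illustrative choice and the helper `rbf_kernel_value` is a hypothetical name.

```python
import numpy as np

def rbf_kernel_value(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2) for two vectors."""
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))

# Two synthetic "images" with raw intensities in 0..255
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=784).astype(np.float64)
b = rng.integers(0, 256, size=784).astype(np.float64)

gamma_found = 0.8151893663724195  # best gamma reported above

# Raw inputs: the squared distance is huge, so the kernel underflows
k_raw = rbf_kernel_value(a, b, gamma_found)

# Inputs scaled to [0, 1] with gamma = 1 / n_features (illustrative choice)
k_unit = rbf_kernel_value(a / 255, b / 255, 1.0 / 784)

print(k_raw, k_unit)
```

Random noise vectors are an extreme case, since real digits are more similar to each other, but the comparison suggests that scaling the inputs before tuning *gamma* may be worthwhile.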

### E1.2: Pipelines and simple Neural Networks
Split the MNIST data into train and test sets, then train and evaluate a simple Multi-Layer Perceptron (MLP) network. Since the non-linear activation functions of MLPs are sensitive to the scaling of the input (recall the *sigmoid* function), we need to scale all input values to [0,1].

* combine all steps of your training in a SKL pipeline (https://scikit-learn.org/stable/modules/compose.html#pipeline)
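* use a SKL scaler to scale the data (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html); note that `StandardScaler` standardizes each feature to zero mean and unit variance, while `MinMaxScaler` maps each feature to [0,1]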
* MLP Parameters: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
    * use an *SGD* solver
    * use *tanh* as activation function
    * compare networks with 1, 2 and 3 layers, use different numbers of neurons per layer
    * adjust training parameters *alpha* (regularization) and *learning rate* - how sensitive is the model to these parameters?
    * Hint: do not change all parameters at the same time, split into several experiments
* How hard is it to find the best parameters? How many experiments would you need to find the best parameters?
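The pipeline described in the list above might look like the following minimal sketch. It assumes the bundled `load_digits` data as a small stand-in for MNIST and a `MinMaxScaler` to reach the [0,1] range; the layer size, learning rate and iteration count are hypothetical values chosen for the small data set.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Small bundled stand-in for MNIST
X_d, y_d = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_d, y_d, train_size=0.75, random_state=0)

# Scaler and classifier in one estimator: the scaler is fit on the training data only
mlp_pipe = make_pipeline(
    MinMaxScaler(),  # maps each feature to [0, 1]
    MLPClassifier(
        hidden_layer_sizes=(50,),  # hypothetical size for the small data set
        solver="sgd",
        activation="tanh",
        learning_rate_init=0.1,    # hypothetical value
        max_iter=300,
        random_state=0,
    ),
)
mlp_pipe.fit(X_tr, y_tr)

test_accuracy = mlp_pipe.score(X_te, y_te)
print(test_accuracy)
```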
    


In [18]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.neural_network import MLPClassifier
# plot_confusion_matrix was removed in scikit-learn 1.2 and is not used below
from sklearn.metrics import accuracy_score, confusion_matrix

In [19]:
# train on 25% of the samples and test on the remaining 75%
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.25)

In [20]:
def run_hp_set(mlp_clf):
    """Fit a StandardScaler + MLP pipeline on the training split and return the test accuracy."""
    mlp = make_pipeline(StandardScaler(), mlp_clf)
    mlp.fit(X_train, y_train)
    return accuracy_score(y_test, mlp.predict(X_test))

#### E1.2.1 Layer-Size (1-3)

In [17]:
run_hp_set(MLPClassifier(hidden_layer_sizes=(100,), alpha=0.0001, learning_rate_init=0.001, random_state=0, activation='tanh', solver='sgd'))



0.9380380952380952

In [45]:
run_hp_set(MLPClassifier(hidden_layer_sizes=(100,100), alpha=0.0001, learning_rate_init=0.001, random_state=0, activation='tanh', solver='sgd'))



0.9605142857142858

In [46]:
run_hp_set(MLPClassifier(hidden_layer_sizes=(100,100,100), alpha=0.0001, learning_rate_init=0.001, random_state=0, activation='tanh', solver='sgd'))



0.9564571428571429

In [56]:
run_hp_set(MLPClassifier(hidden_layer_sizes=(100,6,10), alpha=0.0001, learning_rate_init=0.001, random_state=0, activation='tanh', solver='sgd'))

0.9429142857142857
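The four runs above differ only in `hidden_layer_sizes`, so they could also be collected in a loop. The sketch below shows one way to do that; it runs on the bundled `load_digits` data as a stand-in for MNIST, and the learning rate and iteration count are hypothetical, so the accuracies are not comparable to the ones reported above.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_d, y_d = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_d, y_d, train_size=0.75, random_state=0)

# Architectures to compare; everything else is held fixed
architectures = [(100,), (100, 100), (100, 100, 100), (100, 6, 10)]

results = {}
for layers in architectures:
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=layers, solver="sgd", activation="tanh",
                      learning_rate_init=0.01, max_iter=300, random_state=0),
    )
    model.fit(X_tr, y_tr)
    results[layers] = model.score(X_te, y_te)  # test accuracy per architecture

for layers, acc in results.items():
    print(layers, round(acc, 4))
```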

#### E1.2.2 Learning Rate

In [18]:
run_hp_set(MLPClassifier(alpha=0.0001, learning_rate_init=0.0001, random_state=0, activation='tanh', solver='sgd'))



0.9059428571428572

In [20]:
run_hp_set(MLPClassifier(hidden_layer_sizes=(100,6,10), alpha=0.01, learning_rate_init=0.01, random_state=0, activation='tanh', solver='sgd'))

0.9260190476190476

In [21]:
run_hp_set(MLPClassifier(hidden_layer_sizes=(100,6,10), alpha=0.01, learning_rate_init=0.1, random_state=0, activation='tanh', solver='sgd'))

0.9183619047619047

In [22]:
run_hp_set(MLPClassifier(hidden_layer_sizes=(100,6,10), alpha=0.01, learning_rate_init=1, random_state=0, activation='tanh', solver='sgd'))

0.7656761904761905
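One possible intuition for the drop at `learning_rate_init=1` is that a step size that is too large overshoots the minimum. The toy sketch below shows this for plain gradient descent on `f(w) = w**2`; it is not the MLP's loss surface and ignores momentum and mini-batches, and the function name is made up for the illustration.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w     # derivative of w**2
        w = w - lr * grad  # each step multiplies w by (1 - 2 * lr)
    return w

for lr in (0.01, 0.1, 1.0, 1.1):
    print(lr, gradient_descent(lr))
```

For small rates the iterate shrinks toward 0, slowly for 0.01 and faster for 0.1. At a rate of 1.0 it only flips sign on every step without getting closer, and at 1.1 its magnitude grows.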

#### E1.2.3 Alpha

In [23]:
run_hp_set(MLPClassifier(alpha=0.001, learning_rate_init=0.0001, random_state=0, activation='tanh', solver='sgd'))



0.9059428571428572

In [24]:
run_hp_set(MLPClassifier(alpha=0.00001, learning_rate_init=0.0001, random_state=0, activation='tanh', solver='sgd'))



0.9059428571428572

In [23]:
run_hp_set(MLPClassifier(alpha=0.1, learning_rate_init=0.0001, random_state=0, activation='tanh', solver='sgd'))



0.9040571428571429
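On the question of how many experiments a full search would need: the sketch below counts the combinations of the values tried in the individual experiments above, using `ParameterGrid` from scikit-learn. The grid is only an assumption about what an exhaustive search over these values would cover.

```python
from sklearn.model_selection import ParameterGrid

# Values taken from the individual experiments above
grid = {
    "hidden_layer_sizes": [(100,), (100, 100), (100, 100, 100), (100, 6, 10)],
    "learning_rate_init": [0.0001, 0.001, 0.01, 0.1, 1],
    "alpha": [0.00001, 0.0001, 0.001, 0.01, 0.1],
}

# Exhaustive grid: every value combined with every other value
n_combinations = len(ParameterGrid(grid))

# Varying one parameter at a time: roughly one run per listed value
n_one_at_a_time = sum(len(values) for values in grid.values())

print(n_combinations, n_one_at_a_time)
```

Varying one parameter at a time is much cheaper, but it can miss interactions between parameters, for example between the learning rate and the network depth.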