# Block 6 Exercise 1: Non-Linear Classification

## MNIST Data
We return to the MNIST data set of handwritten digits to compare non-linear classification algorithms.

In [1]:
#imports 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml

In [2]:
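# Load data from https://www.openml.org/d/554
# as_frame=False returns NumPy arrays, which the array slicing further down relies on
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)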


In [3]:
# the full MNIST data set contains 70k samples of the digits 0-9 as 28x28 grayscale images (flattened to 784-dim vectors)
np.shape(X)

(70000, 784)

In [4]:
# minimum pixel value in the data
X.min()

0.0

In [5]:
# maximum pixel value in the data
X.max()

255.0

### E1.1: Cross-Validation and Support Vector Machines
Train and optimize a C-SVM classifier on MNIST (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)
* use an RBF kernel
* use *random search* with cross-validation to find the best settings for *gamma* and *C* (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV)

In [6]:
from sklearn.model_selection import train_test_split
# default split: 75% train, 25% test; no random_state is set, so the split differs between runs
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [7]:
from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform

In [8]:
# baseline: SVC with default settings (RBF kernel, C=1.0, gamma='scale') on the full training set
svm = SVC()
svm.fit(X_train, y_train)

SVC()

In [9]:
svm.score(X_test,y_test)

0.9794285714285714

In [10]:
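# candidate values for the random search: 7 * 6 = 42 combinations, of which n_iter=10 are sampled
svm = SVC(kernel='rbf')
parameters = dict(C=[0.2, 4, 6, 9, 10, 7, 200], gamma=['scale', 12, 5, 4, 0.2, 'auto'])
search = RandomizedSearchCV(svm, parameters, n_jobs=3, n_iter=10, cv=3, random_state=0)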

In [11]:
# search on the first 500 training samples only, which keeps the run time short
res = search.fit(X_train[:500, :], y_train[:500])

In [12]:
res.best_params_

{'gamma': 'scale', 'C': 7}

In [13]:
res.best_score_

0.8779429093619989
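
The search above draws from short hand-picked lists. `RandomizedSearchCV` also accepts distributions, so each iteration can sample a fresh value instead of picking from a list. The following is a minimal sketch under assumptions, not the notebook's solution: it uses the small `load_digits` data set bundled with scikit-learn as a stand-in for MNIST so that it runs in seconds, scales the pixels to [0, 1] inside a pipeline, and samples `C` and `gamma` log-uniformly. The ranges and the names `demo_search`, `Xd_train`, etc. are illustrative choices.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digits shipped with scikit-learn, not MNIST.
X_demo, y_demo = load_digits(return_X_y=True)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=0)

# The scaler sits inside the pipeline so it is refit on every CV training fold.
pipe = make_pipeline(MinMaxScaler(), SVC(kernel='rbf'))

# Log-uniform sampling spreads the draws evenly across orders of magnitude.
# The ranges are illustrative assumptions, not tuned values.
param_distributions = {
    'svc__C': loguniform(1e-1, 1e3),
    'svc__gamma': loguniform(1e-4, 1e0),
}

demo_search = RandomizedSearchCV(pipe, param_distributions, n_iter=10, cv=3,
                                 random_state=0)
demo_search.fit(Xd_train, yd_train)

print(demo_search.best_params_)
print(demo_search.best_score_)              # mean CV accuracy of the best draw
print(demo_search.score(Xd_test, yd_test))  # accuracy on the held-out split
```

Parameters of a pipeline step are addressed as `<step name>__<parameter>`, which is why the keys are `svc__C` and `svc__gamma`.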

In [14]:
res.cv_results_

{'mean_fit_time': array([0.26042684, 0.25207631, 0.35411588, 0.2900544 , 0.26038202,
        0.25577068, 0.24707063, 0.25570218, 0.25667834, 0.25607189]),
 'std_fit_time': array([0.01121672, 0.01780371, 0.01574318, 0.00382183, 0.00337821,
        0.00315547, 0.00339411, 0.00111043, 0.00129252, 0.00155184]),
 'mean_score_time': array([0.0687751 , 0.07541498, 0.09681654, 0.08570099, 0.06860582,
        0.07267698, 0.07151238, 0.06854033, 0.06847183, 0.07240089]),
 'std_score_time': array([0.00273244, 0.00806169, 0.00881677, 0.00364875, 0.00176519,
        0.00153409, 0.00198874, 0.00135497, 0.00091437, 0.00155686]),
 'param_gamma': masked_array(data=['scale', 'scale', 4, 0.2, 0.2, 12, 0.2, 'auto', 12, 12],
              mask=[False, False, False, False, False, False, False, False,
                    False, False],
        fill_value='?',
             dtype=object),
 'param_C': masked_array(data=[7, 200, 10, 0.2, 4, 10, 10, 4, 200, 7],
              mask=[False, False, False, False, Fals
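
The raw `cv_results_` dictionary is hard to scan. A short loop over `rank_test_score`, `mean_test_score`, `std_test_score` and `params` gives a compact ranking. The sketch below rebuilds a small search on `load_digits` (an assumption made so the example is self-contained; the candidate values are illustrative) and prints one line per sampled setting.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digits shipped with scikit-learn, not MNIST.
X_demo, y_demo = load_digits(return_X_y=True)

# Illustrative candidate lists: 4 * 4 = 16 combinations, 8 of them are sampled.
demo_params = dict(C=[0.2, 4, 10, 200], gamma=['scale', 'auto', 0.001, 0.2])
demo_search = RandomizedSearchCV(SVC(), demo_params, n_iter=8, cv=3, random_state=0)
demo_res = demo_search.fit(X_demo[:500], y_demo[:500])

results = demo_res.cv_results_
order = np.argsort(results['rank_test_score'])  # best setting first

for i in order:
    rank = int(results['rank_test_score'][i])
    mean = results['mean_test_score'][i]
    std = results['std_test_score'][i]
    print(f"rank {rank:2d}  acc {mean:.3f} +/- {std:.3f}  {results['params'][i]}")
```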

### E1.2: Pipelines and simple Neural Networks
Split the MNIST data into train and test sets, then train and evaluate a simple Multi-Layer Perceptron (MLP) network. Since the non-linear activation functions of MLPs are sensitive to the scale of the input (recall the *sigmoid* function), we need to scale all input values to [0,1].
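
To see why the input scale matters, the following sketch passes a few pixel intensities through a single `tanh` unit, once raw and once divided by 255. The weight `w = 0.1` and the absence of a bias are illustrative assumptions. The point is only that large inputs push the unit into saturation, where its gradient is practically zero.

```python
import numpy as np

pixels = np.array([0.0, 64.0, 128.0, 255.0])  # intensities in the raw 0..255 range

# Illustrative assumption: one unit with weight 0.1 and no bias.
w = 0.1
raw_out = np.tanh(w * pixels)
scaled_out = np.tanh(w * pixels / 255.0)

# The derivative of tanh is 1 - tanh^2; it vanishes where the unit saturates.
raw_grad = 1.0 - raw_out ** 2
scaled_grad = 1.0 - scaled_out ** 2

print("raw    out:", raw_out, "grad:", raw_grad)
print("scaled out:", scaled_out, "grad:", scaled_grad)
```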

* combine all steps of your training in a SKL pipeline (https://scikit-learn.org/stable/modules/compose.html#pipeline)
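* use a SKL-scaler to scale the data (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html); note that `StandardScaler` standardizes each feature to zero mean and unit variance, whereas `MinMaxScaler` maps it to [0,1]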
* MLP Parameters: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
    * use a *SGD* solver
    * use *tanh* as activation function
    * compare networks with 1, 2 and 3 layers, use different numbers of neurons per layer
    * adjust training parameters *alpha* (regularization) and *learning rate* - how sensitive is the model to these parameters?
    * Hint: do not change all parameters at the same time, split into several experiments
* How hard is it to find the best parameters? How many experiments would you need to find the best parameters?
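
For the last question, a rough count helps: an exhaustive grid needs one experiment per combination of parameter values, so the number of runs is the product of the list lengths. The sketch below counts this with `ParameterGrid`. The candidate values are hypothetical and serve only to illustrate the arithmetic.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical candidate values, chosen only to illustrate the counting.
grid = ParameterGrid({
    'hidden_layer_sizes': [(32,), (64,), (64, 64), (128, 64), (64, 64, 64)],
    'alpha': [1e-4, 1e-3, 1e-2, 1e-1],
    'learning_rate_init': [1e-3, 1e-2, 1e-1],
})

n_settings = len(grid)    # 5 * 4 * 3 combinations
n_fits = n_settings * 3   # with 3-fold CV, every setting is fit three times
print(n_settings, n_fits)  # 60 180
```

Every additional parameter multiplies the count again, which is the motivation for changing one parameter at a time or for sampling settings at random.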
    


In [15]:
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

In [16]:
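# two hidden layers with 64 tanh units each, trained with SGD; alpha is the L2 regularization strength
nn = make_pipeline(
    StandardScaler(),
    MLPClassifier(random_state=1, activation='tanh', hidden_layer_sizes=(64, 64),
                  solver='sgd', alpha=0.01, max_iter=100))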

In [17]:
nn.fit(X_train,y_train)



Pipeline(steps=[('standardscaler', StandardScaler()),
                ('mlpclassifier',
                 MLPClassifier(activation='tanh', alpha=0.01,
                               hidden_layer_sizes=(64, 64), max_iter=100,
                               random_state=1, solver='sgd'))])

In [18]:
nn.score(X_test,y_test)

0.9550285714285714

In [19]:
# same network with weaker regularization (alpha=0.001)
nn2 = make_pipeline(
    StandardScaler(),
    MLPClassifier(random_state=1, activation='tanh', hidden_layer_sizes=(64, 64),
                  solver='sgd', alpha=0.001, max_iter=100))

In [20]:
nn2.fit(X_train,y_train)



Pipeline(steps=[('standardscaler', StandardScaler()),
                ('mlpclassifier',
                 MLPClassifier(activation='tanh', alpha=0.001,
                               hidden_layer_sizes=(64, 64), max_iter=100,
                               random_state=1, solver='sgd'))])

In [21]:
nn2.score(X_test,y_test)

0.9550285714285714

In [22]:
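# alpha=0.0001 and max_iter=200 are the MLPClassifier defaults, so they do not appear in the printed pipeline
nn3 = make_pipeline(
    StandardScaler(),
    MLPClassifier(random_state=1, activation='tanh', hidden_layer_sizes=(64, 64),
                  solver='sgd', alpha=0.0001, max_iter=200))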

In [23]:
nn3.fit(X_train,y_train)



Pipeline(steps=[('standardscaler', StandardScaler()),
                ('mlpclassifier',
                 MLPClassifier(activation='tanh', hidden_layer_sizes=(64, 64),
                               random_state=1, solver='sgd'))])

In [24]:
nn3.score(X_test,y_test)

0.9579428571428571

In [25]:
# same settings with a larger iteration budget (max_iter=300)
nn4 = make_pipeline(
    StandardScaler(),
    MLPClassifier(random_state=1, activation='tanh', hidden_layer_sizes=(64, 64),
                  solver='sgd', alpha=0.0001, max_iter=300))

In [26]:
nn4.fit(X_train,y_train)

Pipeline(steps=[('standardscaler', StandardScaler()),
                ('mlpclassifier',
                 MLPClassifier(activation='tanh', hidden_layer_sizes=(64, 64),
                               max_iter=300, random_state=1, solver='sgd'))])

In [27]:
nn4.score(X_test,y_test)

0.9586857142857143

In [28]:
# same settings with max_iter=400
nn5 = make_pipeline(
    StandardScaler(),
    MLPClassifier(random_state=1, activation='tanh', hidden_layer_sizes=(64, 64),
                  solver='sgd', alpha=0.0001, max_iter=400))

In [29]:
nn5.fit(X_train,y_train)

Pipeline(steps=[('standardscaler', StandardScaler()),
                ('mlpclassifier',
                 MLPClassifier(activation='tanh', hidden_layer_sizes=(64, 64),
                               max_iter=400, random_state=1, solver='sgd'))])

In [30]:
nn5.score(X_test,y_test)

0.9586857142857143
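
The runs above vary `alpha` and `max_iter` for a fixed (64, 64) architecture. `nn4` and `nn5` reach the same test score, which would be consistent with training stopping before the iteration cap, although that is not verified here. The fitted `MLPClassifier` exposes the number of epochs actually run as `n_iter_`, so this can be checked directly. The sketch below does so while comparing 1, 2 and 3 hidden layers. It uses `load_digits` as a fast stand-in for MNIST (an assumption), so the accuracies are not comparable to the ones above.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 8x8 digits shipped with scikit-learn, not MNIST.
X_demo, y_demo = load_digits(return_X_y=True)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=0)

scores = {}
epochs = {}
for layers in [(64,), (64, 64), (64, 64, 64)]:
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(random_state=1, activation='tanh', hidden_layer_sizes=layers,
                      solver='sgd', alpha=0.0001, max_iter=300))
    model.fit(Xd_train, yd_train)

    mlp = model.named_steps['mlpclassifier']
    epochs[layers] = mlp.n_iter_  # epochs actually run, at most max_iter
    scores[layers] = model.score(Xd_test, yd_test)
    print(layers, "epochs run:", epochs[layers], "test acc:", round(scores[layers], 3))
```

If `n_iter_` is below `max_iter`, raising `max_iter` further cannot change the result for the same `random_state`.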