# Cognitive Algorithms - Assignment 6 (30 points)
Cognitive Algorithms        
Summer term 2018      
Technische Universität Berlin     
Fachgebiet Maschinelles Lernen 

**Due on July 16, 2018, 10am via ISIS**
                    
After completing all tasks, run the whole notebook so that the content of each cell is properly displayed. Make sure that the code was run and the entire output (e.g. figures) is printed. Print the notebook as a PDF file and again make sure that all lines are readable; use line breaks ('\') in the Python code if necessary. Points will be deducted if code or content is not readable!
           
**Upload the PDF file that contains a copy of your notebook on ISIS.** 

Group:     
Members:     

# Part 1: Theory (15 points)
---
### Task 1: Multiple Choice Questions (2 points)
**A)** A Multilayer Perceptron can be used for ...           
- [x] classification                   
- [x] regression                       

**B)** The training algorithm of the MLP mainly consists of two phases. Which statement about the backward phase is true?       
- [x] the error is computed for each neuron, starting with the neurons in the output layer
- [ ] the error is computed for each neuron, starting with the neurons in the input layer

### Task 2: Learning Procedure (5 points)
Before we can use an MLP for a given task, we have to train it. This training procedure (here: batch mode) consists of the steps listed below. However, the steps are not in the correct order. Please put them into the correct order.

1. FOR EACH input vector
1. END FOR EACH
1. REPEAT until stopping criterion is fulfilled
1. END REPEAT
1. compute the error of the neurons in the hidden layer
1. update the hidden layer weights
1. Initialize all weights
1. compute the activation of each neuron of the hidden layer
1. update the output layer weights
1. compute the error of the output neuron
1. compute the activation of the output layer neurons

**[Your solution for task 2 here]**       
1. Initialize all weights
1. REPEAT until stopping criterion is fulfilled
1. FOR EACH input vector
1. compute the activation of each neuron of the hidden layer
1. compute the activation of the output layer neurons
1. compute the error of the output neuron
1. compute the error of the neurons in the hidden layer
1. END FOR EACH
1. update the output layer weights
1. update the hidden layer weights
1. END REPEAT
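
The batch-mode training procedure of this task can be sketched in NumPy as follows. This is only a minimal illustration and not the ```scikit-learn``` implementation: the XOR toy data, the tanh hidden activation, the linear output neuron, the squared error, the learning rate ```eta``` and the fixed number of epochs used as stopping criterion are all assumptions made for brevity. Note that in batch mode the weight changes are accumulated over all input vectors and applied once per pass.

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy data (assumption): XOR, one target value per input vector
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 0.])

n_hidden = 4      # number of hidden neurons (assumption)
eta = 0.05        # learning rate (assumption)
n_epochs = 2000   # stopping criterion: fixed number of passes (assumption)


def mean_squared_error(W_hid, b_hid, w_out, b_out):
    z = np.tanh(np.dot(X, W_hid.T) + b_hid)
    y = np.dot(z, w_out) + b_out
    return np.mean((y - t) ** 2)


# Initialize all weights
W_hid = rng.normal(scale=0.5, size=(n_hidden, X.shape[1]))
b_hid = np.zeros(n_hidden)
w_out = rng.normal(scale=0.5, size=n_hidden)
b_out = 0.0

loss_before = mean_squared_error(W_hid, b_hid, w_out, b_out)

# REPEAT until stopping criterion is fulfilled
for epoch in range(n_epochs):
    grad_W_hid = np.zeros_like(W_hid)
    grad_b_hid = np.zeros_like(b_hid)
    grad_w_out = np.zeros_like(w_out)
    grad_b_out = 0.0

    # FOR EACH input vector
    for x_i, t_i in zip(X, t):
        z = np.tanh(np.dot(W_hid, x_i) + b_hid)        # hidden layer activations
        y = np.dot(w_out, z) + b_out                   # output layer activation
        delta_out = y - t_i                            # error of the output neuron
        delta_hid = delta_out * w_out * (1 - z ** 2)   # errors of the hidden neurons
        # accumulate the gradients over the whole batch
        grad_w_out += delta_out * z
        grad_b_out += delta_out
        grad_W_hid += np.outer(delta_hid, x_i)
        grad_b_hid += delta_hid
    # END FOR EACH

    # update the output layer weights, then the hidden layer weights
    w_out -= eta * grad_w_out
    b_out -= eta * grad_b_out
    W_hid -= eta * grad_W_hid
    b_hid -= eta * grad_b_hid
# END REPEAT

loss_after = mean_squared_error(W_hid, b_hid, w_out, b_out)
print('mean squared error before training:', loss_before)
print('mean squared error after training: ', loss_after)
```

In online (stochastic) mode the two update steps would instead move inside the FOR EACH loop.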

### Task 3: Linear Activation Function (8 points)
It can be shown that if all neurons of a multilayer perceptron have a linear activation function, any number of layers can be reduced to a two-layer input-output model.  
Consider an MLP with $N$ input neurons $x_n$, one hidden layer with $K$ neurons $z_k$ and a single output $y$. ${\alpha_k}_n$ denotes the weight connecting neuron $x_n$ to $z_k$, and $\beta_k$ the weight connecting $z_k$ to the output. ${\alpha_k}_0$ and $\beta_0$ are the bias weights. All neurons have a linear activation function, thus the output of a hidden neuron can be written as
$$z_k = -{\alpha_k}_0 + \sum_{n=1}^{N} {\alpha_k}_n x_n$$
and the total output becomes 
$$y = -\beta_0 + \sum_{k=1}^{K} \beta_k z_k$$

**A) (4 points)** Draw a graph representing the MLP and annotate it with the relevant variables (input, hidden and output neurons, bias and weights).

**[Your answer for 3A here]**

**B) (4 points)** Show that there exists an MLP without hidden layers, which models the same function.

**[Your answer for 3B here]**
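
Substituting $z_k$ into $y$ gives $y = -\left(\beta_0 + \sum_{k=1}^{K} \beta_k {\alpha_k}_0\right) + \sum_{n=1}^{N} \left(\sum_{k=1}^{K} \beta_k {\alpha_k}_n\right) x_n$, which has the form $y = -w_0 + \sum_{n=1}^{N} w_n x_n$. The following sketch checks this identity numerically. The names ```w``` and ```w0``` for the collapsed weights, the sizes $N=5$, $K=3$ and the random weight values are assumptions made for illustration; a numerical check does not replace the derivation asked for in the task.

```python
import numpy as np

rng = np.random.RandomState(0)
N, K = 5, 3                       # number of input and hidden neurons (arbitrary)

alpha = rng.normal(size=(K, N))   # alpha[k, n]: weight from x_n to z_k
alpha0 = rng.normal(size=K)       # hidden bias weights alpha_k0
beta = rng.normal(size=K)         # weights from z_k to the output y
beta0 = rng.normal()              # output bias weight beta_0


def mlp_with_hidden_layer(x):
    z = -alpha0 + alpha.dot(x)    # z_k = -alpha_k0 + sum_n alpha_kn x_n
    return -beta0 + beta.dot(z)   # y = -beta_0 + sum_k beta_k z_k


# Collapsed weights of the equivalent model without a hidden layer
w = beta.dot(alpha)               # w_n = sum_k beta_k alpha_kn
w0 = beta0 + beta.dot(alpha0)     # w_0 = beta_0 + sum_k beta_k alpha_k0


def mlp_without_hidden_layer(x):
    return -w0 + w.dot(x)         # y = -w_0 + sum_n w_n x_n


x = rng.normal(size=N)
difference = abs(mlp_with_hidden_layer(x) - mlp_without_hidden_layer(x))
print('absolute difference between both outputs:', difference)
```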

# Part 2: Programming (15 points)
---
As in the first assignment, the aim is to recognize handwritten digits. This time you will not train a linear perceptron, but a non-linear multilayer perceptron (MLP). You won’t have to implement it – we just want you to play around with existing code and modify it slightly. We are using the ```scikit-learn``` implementation, which is documented here:  
http://scikit-learn.org/stable/modules/neural_networks_supervised.html  
You might have to install ```scikit-learn``` beforehand. Follow the instructions on their webpage to do so.  
This time we will use the full MNIST dataset.

Below you find the code to load the MNIST dataset and to train an MLP.

In [1]:
import numpy as np
from scipy.ndimage import convolve
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.externals import joblib
import os.path
import scipy as sp
import pylab as pl
%matplotlib inline

In [2]:
PATH = 'mlp_model.pkl'

if __name__ == '__main__':
    print('Fetching and loading MNIST data')
    # load the MNIST dataset
    mnist = fetch_mldata('MNIST original')
    # split the dataset into two arrays: X holds the samples as feature vectors (pixel values),
    #                                    y holds the target digit of each sample
    X, y = mnist.data, mnist.target
    # scale the pixel values to [0, 1] and split X and y randomly into a training set (75%)
    # and a test set (25%); random_state=0 seeds the random number generator
    X_train, X_test, y_train, y_test = train_test_split(X / 255., y, test_size=0.25, random_state=0)

    print('Got MNIST with %d training- and %d test samples' % (len(y_train), len(y_test)))
    print('Digit distribution in whole dataset:', np.bincount(y.astype('int64')))

    clf = None
    # Load the model from file if one exists
    if os.path.exists(PATH):
        print('Loading model from file.')
        clf = joblib.load(PATH).best_estimator_
    # else train model
    else:
        print('Training model.')
        # Candidate hidden layer architectures for the grid search:
        # one hidden layer with 256 neurons,
        # one hidden layer with 512 neurons,
        # and three hidden layers with 128, 256 and 128 neurons
        params = {'hidden_layer_sizes': [(256,), (512,), (128, 256, 128,)]}
        # Create the scikit-learn MLP classifier; verbose=10 prints the training progress.
        # learning_rate='adaptive' keeps the learning rate at its initial value (default 0.001)
        # as long as the training loss keeps decreasing, but it only takes effect with
        # solver='sgd'; the default solver ('adam') ignores it.
        mlp = MLPClassifier(verbose=10, learning_rate='adaptive')
        # Exhaustive grid search: fits one model per parameter combination in params and
        # scores each with 5-fold cross-validation (cv=5), printing its progress (verbose=10)
        # and running the fits in parallel on all available cores (n_jobs=-1)
        clf = GridSearchCV(mlp, params, verbose=10, n_jobs=-1, cv=5)
        # run the grid search on the training set
        clf.fit(X_train, y_train)
        print('Finished with grid search with best mean cross-validated score:', clf.best_score_)
        print('Best params appeared to be', clf.best_params_)
        # save the fitted grid search object to file
        joblib.dump(clf, PATH)
        # keep the estimator with the best cross-validation score (refit on the whole
        # training set) as the classifier
        clf = clf.best_estimator_

Fetching and loading MNIST data
Got MNIST with 52500 training- and 17500 test samples
('Digit distribution in whole dataset:', array([6903, 7877, 6990, 7141, 6824, 6313, 6876, 7293, 6825, 6958]))
Training model.
Fitting 5 folds for each of 3 candidates, totalling 15 fits
[CV] hidden_layer_sizes=(256,) .......................................
[CV] hidden_layer_sizes=(256,) .......................................
[CV] hidden_layer_sizes=(256,) .......................................
[CV] hidden_layer_sizes=(256,) .......................................
Iteration 1, loss = 0.40642654
Iteration 1, loss = 0.41297862
Iteration 1, loss = 0.40986989
Iteration 1, loss = 0.41142810
Iteration 2, loss = 0.17356502
Iteration 2, loss = 0.17806639
Iteration 2, loss = 0.17954160
Iteration 2, loss = 0.18402350
Iteration 3, loss = 0.12499738
Iteration 3, loss = 0.12821320
Iteration 3, loss = 0.12847990
Iteration 3, loss = 0.13025889
Iteration 4, loss = 0.09665224
Iteration 4, loss = 0.09746064
Iteration 

Iteration 19, loss = 0.00234996
Iteration 16, loss = 0.00406724
Iteration 34, loss = 0.00118447
Iteration 17, loss = 0.00320654
Iteration 20, loss = 0.00194831
Iteration 17, loss = 0.00302918
Iteration 35, loss = 0.01925881
Training loss did not improve more than tol=0.000100 for two consecutive epochs. Stopping.
[CV] .. hidden_layer_sizes=(256,), score=0.972370426829, total= 2.4min
[CV] hidden_layer_sizes=(512,) .......................................


[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:  4.1min


Iteration 18, loss = 0.00264193
Iteration 21, loss = 0.00164648
Iteration 18, loss = 0.00275922
Iteration 19, loss = 0.00224224
Iteration 1, loss = 0.36705543
Iteration 22, loss = 0.00157695
Iteration 19, loss = 0.00284498
Iteration 20, loss = 0.00202360
Iteration 2, loss = 0.15619505
Iteration 23, loss = 0.02113121
Iteration 20, loss = 0.00224578
Iteration 21, loss = 0.00178836
Iteration 3, loss = 0.10459253
Iteration 24, loss = 0.01315171
Training loss did not improve more than tol=0.000100 for two consecutive epochs. Stopping.
Iteration 21, loss = 0.00179684
[CV] .. hidden_layer_sizes=(512,), score=0.974676313785, total= 2.7min
[CV] hidden_layer_sizes=(512,) .......................................
Iteration 22, loss = 0.00151342
Iteration 4, loss = 0.08346322
Iteration 22, loss = 0.00162237
Iteration 1, loss = 0.34216352
Iteration 23, loss = 0.00139090
Iteration 5, loss = 0.05841853
Iteration 23, loss = 0.00135531
Iteration 24, loss = 0.00129817
Iteration 6, loss = 0.04708451
Iterat

[Parallel(n_jobs=-1)]: Done  10 out of  15 | elapsed:  7.2min remaining:  3.6min


Iteration 1, loss = 0.39722883
Iteration 25, loss = 0.00137260
Iteration 20, loss = 0.01169141
Iteration 1, loss = 0.40835696
Iteration 2, loss = 0.14947995
Iteration 21, loss = 0.00922874
Training loss did not improve more than tol=0.000100 for two consecutive epochs. Stopping.
[CV]  hidden_layer_sizes=(128, 256, 128), score=0.97438583127, total= 1.6min
[CV] hidden_layer_sizes=(128, 256, 128) ..............................
Iteration 26, loss = 0.00122344
Iteration 2, loss = 0.17099341
Iteration 3, loss = 0.09737475
Iteration 3, loss = 0.11231319
Iteration 27, loss = 0.00114278
Iteration 4, loss = 0.07164677
Iteration 1, loss = 0.39376984
Iteration 4, loss = 0.07961975
Iteration 5, loss = 0.05052172
Iteration 2, loss = 0.13824705
Iteration 28, loss = 0.00105494
Iteration 5, loss = 0.05970548
Iteration 6, loss = 0.04095516
Iteration 3, loss = 0.09740214
Iteration 29, loss = 0.03345718
Training loss did not improve more than tol=0.000100 for two consecutive epochs. Stopping.
[CV] ...... 

[Parallel(n_jobs=-1)]: Done  12 out of  15 | elapsed:  7.6min remaining:  1.9min


Iteration 7, loss = 0.03437354
Iteration 4, loss = 0.07072031
Iteration 7, loss = 0.03669869
Iteration 8, loss = 0.02477591
Iteration 5, loss = 0.05275092
Iteration 8, loss = 0.02755860
Iteration 9, loss = 0.01802605
Iteration 6, loss = 0.05792011
Iteration 9, loss = 0.02321964
Iteration 10, loss = 0.01453453
Iteration 7, loss = 0.03344756
Iteration 10, loss = 0.01786765
Iteration 11, loss = 0.01567504
Iteration 8, loss = 0.04729200
Iteration 11, loss = 0.01681748
Iteration 12, loss = 0.01227599
Iteration 9, loss = 0.02190519
Iteration 12, loss = 0.01444114
Iteration 13, loss = 0.01290528
Iteration 10, loss = 0.01777878
Iteration 13, loss = 0.01830027
Iteration 14, loss = 0.01395799
Iteration 11, loss = 0.01296514
Iteration 14, loss = 0.01147425
Iteration 15, loss = 0.00978983
Iteration 12, loss = 0.01265858
Iteration 15, loss = 0.00940699
Iteration 16, loss = 0.01315197
Iteration 13, loss = 0.00910297
Iteration 16, loss = 0.00777036
Iteration 17, loss = 0.00903016
Iteration 14, loss =

[Parallel(n_jobs=-1)]: Done  15 out of  15 | elapsed:  8.2min finished


Iteration 1, loss = 0.31480014
Iteration 2, loss = 0.13265725
Iteration 3, loss = 0.08846696
Iteration 4, loss = 0.06458865
Iteration 5, loss = 0.04799160
Iteration 6, loss = 0.03642691
Iteration 7, loss = 0.02812341
Iteration 8, loss = 0.02251528
Iteration 9, loss = 0.01718398
Iteration 10, loss = 0.01316793
Iteration 11, loss = 0.01070230
Iteration 12, loss = 0.00836069
Iteration 13, loss = 0.00677829
Iteration 14, loss = 0.00476518
Iteration 15, loss = 0.00403162
Iteration 16, loss = 0.00353455
Iteration 17, loss = 0.00276550
Iteration 18, loss = 0.00275566
Iteration 19, loss = 0.00262856
Iteration 20, loss = 0.00176854
Iteration 21, loss = 0.00141403
Iteration 22, loss = 0.00120354
Iteration 23, loss = 0.00108292
Iteration 24, loss = 0.00102183
Iteration 25, loss = 0.00096207
Iteration 26, loss = 0.00090132
Training loss did not improve more than tol=0.000100 for two consecutive epochs. Stopping.
('Finished with grid search with best mean cross-validated score:', 0.9768)
('Best par

**A) (4 points)** Briefly explain in your own words what the code does. Ideally explain it line-by-line (```print``` statements can be omitted). You can write short comments directly in the code.

**B) (1 point)**  Run the code (this may take a while when running it for the first time). What are the training and testing errors?         
*Hint: The current progress is printed on the Jupyter Notebook terminal.* 

In [4]:
print(clf.score(X_train, y_train))
print(clf.score(X_test, y_test))

1.0
0.9822285714285715


The training error is **~[0.0]**.                      
The test error is **~[0.01777142857]**.             
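
The two errors follow from the accuracies printed above: for a classifier, ```score``` returns the mean accuracy, so the error is one minus that value. The short sketch below only redoes this arithmetic with the two printed numbers; the variable names are chosen for illustration.

```python
# Accuracies as printed by clf.score(...) in the cell above
train_accuracy = 1.0
test_accuracy = 0.9822285714285715

# error = fraction of misclassified samples = 1 - accuracy
train_error = 1.0 - train_accuracy
test_error = 1.0 - test_accuracy

print('training error: %.4f' % train_error)
print('test error:     %.4f' % test_error)
```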

**C) (4 points)** What does ```GridSearchCV``` do? Do we really need this function? Explain your decision.

**[Your answer for C here]**         
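
To see what ```GridSearchCV``` does on a scale that runs in seconds, here is a minimal sketch. It is not the setup of this assignment: it assumes the small 8x8 digits dataset bundled with ```scikit-learn``` as a stand-in for MNIST, and the two tiny candidate architectures and ```cv=3``` are chosen only to keep it fast. Each candidate is scored by cross-validation on the training set alone; the test set is used only once at the end.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data (assumption): 8x8 digit images with pixel values in 0..16
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X / 16., y, test_size=0.25, random_state=0)

# Two small candidate architectures (assumption, chosen for speed)
params = {'hidden_layer_sizes': [(16,), (32,)]}
mlp = MLPClassifier(random_state=0)

# One cross-validated score per candidate; the best candidate is then
# refit on the whole training set (refit=True is the default)
search = GridSearchCV(mlp, params, cv=3)
search.fit(X_train, y_train)

for candidate, mean_score in zip(search.cv_results_['params'],
                                 search.cv_results_['mean_test_score']):
    print(candidate, 'mean cross-validated accuracy:', mean_score)

print('best parameters:', search.best_params_)
print('accuracy of the refit best model on the test set:',
      search.score(X_test, y_test))
```

Without the grid search, one would have to fix ```hidden_layer_sizes``` by hand; selecting it by looking at the test error would make the reported test error optimistic.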

**D) (3 points)** What role does the ```random_state``` parameter play in ```train_test_split```? What happens if we leave it out?

**[Your answer for D here]**
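
The effect of ```random_state``` can be tried out on a toy array. The sketch below assumes 20 consecutive integers as stand-in data; it shows that a fixed seed reproduces the same split, while two unseeded calls are shuffled independently and will usually differ from run to run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(20)   # stand-in data (assumption)

# Same seed: the shuffle, and therefore the split, is reproduced exactly
_, test_a = train_test_split(data, test_size=0.25, random_state=0)
_, test_b = train_test_split(data, test_size=0.25, random_state=0)
same_split = np.array_equal(test_a, test_b)
print('identical test sets with random_state=0:', same_split)

# No seed: every call draws a new shuffle
_, test_c = train_test_split(data, test_size=0.25)
_, test_d = train_test_split(data, test_size=0.25)
print('unseeded test sets:', test_c, test_d)
```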

**E) (3 points)** We now want to compare an MLP without any hidden units with a single Perceptron. To do so, first train an MLP without a hidden layer by changing the given code. Print its training and test error. Compare the result to the one you obtained when training the Perceptron on the ```USPS``` dataset (assignment 2). Is this MLP then the same algorithm as the Perceptron of assignment 2?

In [None]:
# your code for task E) here

The training error is **~[training error]**.                   
The test error is **~[test error]**.              

**[Your answer for E here]**
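
One possible way to set up an MLP without a hidden layer is sketched below. It assumes that passing an empty tuple as ```hidden_layer_sizes``` connects the inputs directly to the outputs, and it uses the small 8x8 digits dataset bundled with ```scikit-learn``` as a stand-in so that it runs quickly; the errors it prints are therefore not the MNIST results asked for in this task. The names ```clf_no_hidden```, ```train_error``` and ```test_error``` and the value of ```max_iter``` are chosen for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data (assumption): 8x8 digit images instead of the full MNIST set
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X / 16., y, test_size=0.25, random_state=0)

# Empty tuple (assumption): no hidden layer, inputs feed the output layer directly
clf_no_hidden = MLPClassifier(hidden_layer_sizes=(), max_iter=500, random_state=0)
clf_no_hidden.fit(X_train, y_train)

# error = 1 - mean accuracy
train_error = 1.0 - clf_no_hidden.score(X_train, y_train)
test_error = 1.0 - clf_no_hidden.score(X_test, y_test)
print('training error:', train_error)
print('test error:    ', test_error)

# coefs_ holds one weight matrix per layer transition; here only input -> output
print('weight matrix shapes:', [W.shape for W in clf_no_hidden.coefs_])
```

When comparing with the Perceptron, keep in mind that ```MLPClassifier``` still minimizes a log-loss over softmax outputs with a gradient-based solver, which differs from the perceptron learning rule.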