######  The University of Melbourne, School of Computing and Information Systems
# COMP30027 Machine Learning, 2021 Semester 1

## Week 11 - Practical Workshop

### NOTE:  You will need the newer (18.1) build of `scikit-learn` for its neural network support.


### Exercise 1.
The Multilayer Perceptron is available from (newer builds of) `scikit-learn` as `sklearn.neural_network.MLPClassifier`.


In [3]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from collections import Counter

### Exercise 1.(a) 
Build a default Multilayer Perceptron to classify the `Iris` data. Evaluate its cross-validation accuracy.

In [4]:
iris = datasets.load_iris()
X = iris.data
y = iris.target
print('X:', X.shape, 'y:', set(y))


clf = MLPClassifier(max_iter=2000)

print('corss-val acc:', np.mean(cross_val_score(clf, X, y, cv=5)))
clf.fit(X, y)


X: (150, 4) y: {0, 1, 2}
corss-val acc: 0.9866666666666667


MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(100,), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=2000,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)

### Exercise 1.(b) 
Check the `coefs_` and `n_layers_` attributes of the fitted classifier to examine the resulting neural network.

In [13]:
print(clf.coefs_)
print('parameter shapes:',[p.shape for p in clf.coefs_])
print('num layers:', clf.n_layers_)

[array([[ 3.66718038e-01,  3.88247175e-01, -6.15693510e-02,
         5.79906789e-01, -4.35990579e-02,  5.04288895e-02,
        -1.90279648e-01,  6.71765998e-01, -6.95615506e-02,
         1.28235058e-01],
       [ 1.50653818e-01,  4.13385199e-02,  2.24216036e-02,
         9.79004094e-01, -9.55817039e-02, -4.91173544e-01,
         1.83898936e-02, -4.23969816e-01, -1.80312634e-01,
         1.17055071e-01],
       [-3.29298310e-01, -4.90696148e-01, -4.07649037e-03,
        -8.08397568e-01,  5.05216225e-06,  7.24227623e-01,
         2.53221120e-02, -5.88011125e-01, -2.71407286e-02,
         7.54511339e-01],
       [ 6.03154447e-01, -4.28797783e-01,  8.96386201e-02,
        -8.76669929e-01, -3.86065952e-06,  1.97945034e-01,
         7.37566983e-02, -1.68779330e-01, -2.48014333e-02,
        -3.49660796e-02]]), array([[ 1.79203342e-02, -1.06248373e-02,  5.50521381e-01,
        -2.07344564e-01,  3.90209148e-01, -1.85515280e-01,
         3.86661993e-01,  3.49371971e-01, -6.89183338e-04,
        

### Exercise 2.
One important issue with this Multilayer Perceptron is that it is sensitive to the scale of the input attribute values.
### Exercise 2.(a) 
Read up on the `StandardScaler` , and re-scale the `Iris` data so that each attribute has a *mean* of 0 and a *variance* of 1. Evaluate and examine the resulting neural network built on the re-scaled data.


In [7]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
clf = MLPClassifier(max_iter=2000)
#it is cheating because the mean and variance are estimated using both training and test data
print('corss-val cheating standardised features acc:', np.mean(cross_val_score(clf, scaler.fit_transform(X), y, cv=5))) 


corss-val cheating standardised features acc: 0.9600000000000002


### Exercise 2.(c) 
(Harder) Calculating the _mean_ and _variance_ on the entire data set (before splitting into train/test sets) is cheating slightly. Write a re-scale function that calculates the scaling factors for the training data, and applies the scaler to the test data. Then, write a wrapper function that uses this to cross-validate.

In [8]:
clf = MLPClassifier(max_iter=2000)
#this way we don't cheat read more on pipelines https://scikit-learn.org/stable/modules/compose.html
pipeline = Pipeline([('transformer', scaler), ('estimator', clf)])
print('corss-val noncheating standardised features acc:', np.mean(cross_val_score(pipeline, X, y, cv=5)))


corss-val noncheating standardised features acc: 0.9666666666666668


*You might not see reduction in performance for the noncheating method, but in general it is best to standardise only the training data (fit_transform), and then apply the transformation to the test data (transform).*

*Also you didn't see improvements with standardisation, which might be the result of the neural network not being tuned well in terms of regularisation, and number/size of the layers.*

### Exercise 3 
You can coerce the Multilayer Perceptron to have specifically–sized hidden layers using the *hidden_layer_sizes* parameter.
### Exercise 3.(a) 
Train a Multilayer Perceptron on the two-class `Abalone` data, and examine the resulting neural
network.


In [3]:
def convert_class(raw, num_class=2):
    raw = int(raw)
    if num_class == 2:
        if raw<=10: return 0
        else: return 1
    elif num_class == 3:
        if raw <= 8:
            return 0
        elif 9<=raw<=10:
            return 1
        elif 11<=raw:
            return 2
    elif num_class == 29:
        return raw

def load_abalone(addsex=False, num_class=2):
    X, y = [], []
    with open('abalone.data', 'r') as fin:
        for line in fin:
            atts = line[:-1].split(",")
            if not addsex:
                X.append(atts[1:-1])
            else:
                sex = atts[0]
                if sex == "M": sex = 0
                elif sex=="I": sex = 1
                elif sex=="F": sex = 2
                else: sex = 3
                
                X.append([sex] + atts[1:-1])
            y.append(convert_class(atts[-1], num_class))
    X = np.array(X, dtype=float)
    return X, y

X, y = load_abalone(addsex=False, num_class=2)
print('X:', X.shape, 'y:', set(y))

clf = MLPClassifier(max_iter=2000)
clf.fit(X,y)
print(clf.coefs_)

X: (4177, 7) y: {0, 1}
[array([[ 3.52572688e-001, -1.23919665e-001,  3.42827444e-001,
        -4.92489901e-002,  3.61520912e-001,  3.03376111e-001,
        -1.08845783e-001,  2.15759509e-001,  7.83537638e-006,
        -1.49217546e-001, -7.89495359e-002,  1.16375903e-001,
        -1.82286447e-001,  2.78605252e-145, -2.94521679e-001,
         1.81483691e-001, -1.01313416e-158,  3.10248577e-001,
        -2.93549303e-160, -4.56630795e-141, -1.49350241e-178,
        -3.42014059e-002, -3.75905327e-001,  1.00452307e-001,
         3.61646417e-001, -9.32571047e-002,  3.33744542e-001,
         4.81296718e-002, -3.21699808e-001,  5.74118266e-143,
        -6.79989459e-155,  8.41724209e-002,  3.34750452e-001,
        -5.18586062e-177,  7.93962566e-007,  3.31397062e-001,
         1.00345517e-001, -1.88890983e-001,  3.12171001e-001,
        -1.51902774e-001,  7.64060414e-002, -6.14498822e-002,
         7.94123030e-002,  6.68564925e-002, -8.84254741e-002,
         3.21908409e-171,  1.19866498e-001,  1

### Exercise 3.(b) 
(Harder) Change the size and/or number of hidden layers. How are the resulting weights affected? Can you discern any relationship between the weights for layers of varying sizes?

In [9]:
clf = MLPClassifier(hidden_layer_sizes=[10, 10, 4], max_iter=2000)
#this way we don't cheat read more on pipelines https://scikit-learn.org/stable/modules/compose.html
clf.fit(X, y)
print(clf.coefs_)

[array([[ 3.76084391e-01, -1.13947094e-01, -1.34522862e-01,
         8.95557788e-01, -6.08703456e-01, -1.30231565e-02,
         2.31100192e-01, -5.94617879e-02, -3.34515546e-01,
        -9.37085264e-02],
       [ 8.46616908e-02, -2.60303015e-01, -2.26368975e-01,
         6.40072301e-01,  5.43843139e-01, -4.64420425e-02,
         2.01323783e-01, -3.31400500e-03,  4.90019264e-01,
        -2.27951739e-01],
       [-3.61717586e-01,  7.03168093e-01,  7.24304751e-01,
        -5.93953680e-01,  3.62272864e-01,  1.41123756e-05,
         4.26361947e-01, -5.16114856e-03,  1.90923783e-01,
         1.78550563e-01],
       [-6.98956889e-01,  4.00416081e-01,  3.35867411e-01,
        -2.73298097e-01, -4.95713229e-01,  1.76467202e-12,
         4.58799692e-02, -1.29637374e-01, -1.33141264e-01,
        -2.09096557e-01]]), array([[ 8.43171933e-03,  1.78581510e-01,  4.86077367e-01,
        -2.52841651e-02, -1.00125552e+00,  3.71859576e-03,
        -5.03138632e-02,  6.22128033e-01,  1.68278243e-02,
        

### Exercise 4. 
There are a couple of different tune-able parameters for the MLPClassifier , mostly dealing with the weight optimisation — however, it is often worthwhile to tune the Regularisation parameter (α).
### Exercise 4.(a) 
Try varying orders of α between 10 and 10 −5 for a Multilayer Perceptron built on the two-class `Abalone` data. How much variance in cross-validation accuracy do you observe?


In [10]:
alphas = [np.power(10.0, i) for i in range(-7, 2)]
print(alphas)

for alpha in alphas:
    clf = MLPClassifier(max_iter=2000, alpha=alpha)
    pipeline = Pipeline([('transformer', scaler), ('estimator', clf)])
    scores = cross_val_score(pipeline, X, y, cv=5)
    print('alpha: {} mean_acc: {} standard_dev_acc: {}'.format(alpha, np.mean(scores), np.std(scores)))

[1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0]
alpha: 1e-07 mean_acc: 0.9666666666666668 standard_dev_acc: 0.029814239699997188
alpha: 1e-06 mean_acc: 0.9666666666666668 standard_dev_acc: 0.029814239699997188
alpha: 1e-05 mean_acc: 0.9666666666666668 standard_dev_acc: 0.029814239699997188
alpha: 0.0001 mean_acc: 0.9666666666666668 standard_dev_acc: 0.029814239699997188
alpha: 0.001 mean_acc: 0.9600000000000002 standard_dev_acc: 0.024944382578492935
alpha: 0.01 mean_acc: 0.96 standard_dev_acc: 0.024944382578492935
alpha: 0.1 mean_acc: 0.9666666666666668 standard_dev_acc: 0.029814239699997188
alpha: 1.0 mean_acc: 0.9666666666666668 standard_dev_acc: 0.029814239699997188
alpha: 10.0 mean_acc: 0.9466666666666665 standard_dev_acc: 0.03399346342395189


### Exercise 4.(b) 
Read up on the `GridSearchCV` utility, to help you in tuning the performance of the *Multilayer Perceptron*. Split the data into a training–and–tuning partition, and a test partition. What is the value of the regularisation parameter that `GridSearchCV` comes up with? How does the test accuracy compare to the default (un-tuned) `MLPClassifier` ?

In [11]:
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split

X_train, X_devtest, y_train, y_devtest = train_test_split(X, y, test_size=0.4, random_state=42)
X_dev, X_test, y_dev, y_test = train_test_split(X_devtest, y_devtest, test_size=0.5, random_state=42)

clf.fit(X_train, y_train)
print('MLP acc without tuning:', clf.score(X_test, y_test))

hidden_sizes = [[100], [10, 10]]
#arguments of MLPClassifier and a list of values for them to search and find the best.
param_grid = {'alpha': alphas, 'hidden_layer_sizes':hidden_sizes}


gs = GridSearchCV(estimator=clf,
                  param_grid=param_grid,
                  scoring='accuracy',
                  cv=3,
                  n_jobs=2,
                  verbose=1)
gs.fit(X_train, y_train)

best_params = gs.best_params_
print('best_params', best_params)
clf = MLPClassifier(max_iter=2000, **best_params)
clf.fit(X_train, y_train)
print('acc with best params:', clf.score(X_test, y_test))


MLP acc without tuning: 1.0
Fitting 3 folds for each of 18 candidates, totalling 54 fits


[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:   20.1s
[Parallel(n_jobs=2)]: Done  54 out of  54 | elapsed:   24.7s finished


best_params {'alpha': 1e-05, 'hidden_layer_sizes': [10, 10]}
acc with best params: 1.0
