# Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow

# By Aurélien Géron

## Chapter 3: Classification 


### Exercise 1.

We are supposed to use `KNeighborsClassifier` on the MNIST dataset.

In [1]:
## This part is from Géron's notebook.
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Is this notebook running on Colab or Kaggle?
IS_COLAB = "google.colab" in sys.modules
IS_KAGGLE = "kaggle_secrets" in sys.modules

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# Common imports
import numpy as np
import os
import timeit


# to make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

In [2]:
## This part is from Géron's notebook.
# Loading data
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
#check the keys:
mnist.keys()

dict_keys(['data', 'target', 'frame', 'categories', 'feature_names', 'target_names', 'DESCR', 'details', 'url'])

In [3]:
# We assign the data to X and the labels to Y. The first "m_NG" examples are used for training.
## This data is already shuffled.
# reading X and Y
X,Y = mnist['data'], mnist['target'] 

#changing labels from string to integers
Y = Y.astype(np.uint8)

#split:
#m_NG is the "m" that Andrew Ng uses for the number of training examples.
m_NG = 60000
X_train, X_test, Y_train, Y_test= X[:m_NG], X[m_NG:], Y[:m_NG], Y[m_NG:]

In [4]:
# scaling the data, X_train

from sklearn.preprocessing import StandardScaler
std_scalar = StandardScaler()

start = timeit.default_timer()

X_train_scaled = std_scalar.fit_transform(X_train)

stop = timeit.default_timer()
print('Time: ', (stop - start),'sec.')  

Time:  1.0522802790000014 sec.


In [5]:
## We define a k-nearest-neighbors classifier and a parameter grid.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

kn_clf= KNeighborsClassifier()


par_grid_search=[{'n_neighbors':[3,4,5,6,7], 'weights':['uniform','distance']}]

In [6]:
#defining the search function
grid_search=GridSearchCV(kn_clf,par_grid_search,cv=5,scoring='accuracy',verbose=2)

In [7]:
#performing the search
grid_search.fit(X_train_scaled,Y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV] END .....................n_neighbors=3, weights=uniform; total time=  18.5s
[CV] END .....................n_neighbors=3, weights=uniform; total time=  13.7s
[CV] END .....................n_neighbors=3, weights=uniform; total time=  12.7s
[CV] END .....................n_neighbors=3, weights=uniform; total time=  12.8s
[CV] END .....................n_neighbors=3, weights=uniform; total time=  13.0s
[CV] END ....................n_neighbors=3, weights=distance; total time=  12.5s
[CV] END ....................n_neighbors=3, weights=distance; total time=  12.4s
[CV] END ....................n_neighbors=3, weights=distance; total time=  12.3s
[CV] END ....................n_neighbors=3, weights=distance; total time=  12.4s
[CV] END ....................n_neighbors=3, weights=distance; total time=  12.6s
[CV] END .....................n_neighbors=4, weights=uniform; total time=  16.5s
[CV] END .....................n_neighbors=4, wei

GridSearchCV(cv=5, estimator=KNeighborsClassifier(),
             param_grid=[{'n_neighbors': [3, 4, 5, 6, 7],
                          'weights': ['uniform', 'distance']}],
             scoring='accuracy', verbose=2)

In [8]:
#which one was the best?
grid_search.best_params_

{'n_neighbors': 4, 'weights': 'distance'}

In [10]:
# Let's look at all the performances
cv_res=grid_search.cv_results_
for acc,params in zip(cv_res["mean_test_score"],cv_res["params"]):
    print(acc,params)

0.9427833333333332 {'n_neighbors': 3, 'weights': 'uniform'}
0.9442833333333335 {'n_neighbors': 3, 'weights': 'distance'}
0.9408666666666667 {'n_neighbors': 4, 'weights': 'uniform'}
0.9465166666666667 {'n_neighbors': 4, 'weights': 'distance'}
0.94205 {'n_neighbors': 5, 'weights': 'uniform'}
0.94435 {'n_neighbors': 5, 'weights': 'distance'}
0.9405000000000001 {'n_neighbors': 6, 'weights': 'uniform'}
0.9447166666666666 {'n_neighbors': 6, 'weights': 'distance'}
0.9406666666666667 {'n_neighbors': 7, 'weights': 'uniform'}
0.94235 {'n_neighbors': 7, 'weights': 'distance'}
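
In every row above, `weights='distance'` scores higher than `weights='uniform'` for the same `n_neighbors`. To make the difference between the two options concrete, here is a minimal sketch of the voting step only. The helper `knn_vote` is hypothetical (it is not part of scikit-learn), and the distances and labels are made-up toy values.

```python
import numpy as np

def knn_vote(distances, labels, weights="uniform"):
    """Pick a label from the k nearest neighbors of one query point.

    distances: distances from the query to its k neighbors
    labels:    class labels of those neighbors
    weights:   "uniform" counts every neighbor once,
               "distance" weights every neighbor by 1 / distance
    """
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    if weights == "uniform":
        w = np.ones_like(distances)
    elif weights == "distance":
        if np.any(distances == 0):
            # In this sketch, exact matches decide the vote on their own,
            # which avoids dividing by zero.
            w = (distances == 0).astype(float)
        else:
            w = 1.0 / distances
    else:
        raise ValueError("weights must be 'uniform' or 'distance'")
    classes = np.unique(labels)
    scores = np.array([w[labels == c].sum() for c in classes])
    return classes[np.argmax(scores)]

# Toy example: one very close neighbor labeled 9, three farther ones labeled 4.
toy_distances = [1.0, 4.0, 5.0, 6.0]
toy_labels = [9, 4, 4, 4]

print(knn_vote(toy_distances, toy_labels, "uniform"))   # majority wins: 4
print(knn_vote(toy_distances, toy_labels, "distance"))  # the close neighbor wins: 9
```

With an even number of neighbors such as 4, a uniform vote can end in a tie (two against two), while distance weights make exact ties unlikely.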


In [11]:
# Performance on the test set.
from sklearn.metrics import accuracy_score

X_test_scaled = std_scalar.transform(X_test.astype(np.float64))
Y_test_pred = grid_search.predict(X_test_scaled)

accuracy_score(Y_test, Y_test_pred)

0.9489

# Discussion

 - First, I checked `'n_neighbors':[5,10,30]` and the best was 5, so I searched around 5, which is the grid used above.
 - In the [notebook](https://github.com/ageron/handson-ml2) provided by Géron on GitHub, there is no scaling; they use the original data. But the final conclusion is the same, namely that `n_neighbors=4` with `weights='distance'` is the best.
 - They get a better performance with the original data! Here we get around `95%` on the test set and they get `97%`. It is remarkable that the performance is better without scaling! In the book, scaling the data increases the performance of the `SGDClassifier` by around 5%.
 - Although they get `97%` accuracy, their notebook carries a warning about a very long running time, i.e. 16 hours! With scaling it took only a few minutes here, even though I also considered two more values for `'n_neighbors'`. So this is a big difference!
 - Note that with scaling we get around `95%` using the `KNeighborsClassifier`. This is already about `6%` better than the `SGDClassifier`.
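
One thing to keep in mind when comparing scaled and unscaled results: standardization gives every pixel unit variance, including pixels that are almost always zero. Whether this explains the accuracy gap described above is an assumption, not something tested in this notebook. The sketch below only illustrates the mechanism on made-up toy data, computing the same per-column mean and standard deviation that standardization is based on. The column names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical toy data with two "pixels":
# - center: varies over the full 0..255 range
# - border: zero almost everywhere, with a faint value of 3 in five samples
center = rng.integers(0, 256, size=n).astype(float)
border = np.zeros(n)
border[:5] = 3.0
X = np.column_stack([center, border])

# Standardize each column: subtract the mean, divide by the standard deviation.
mean = X.mean(axis=0)
std = X.std(axis=0)
std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
X_scaled = (X - mean) / std

# Both columns now have unit variance, so the faint border value of 3
# sits roughly 14 standard deviations from the mean.
print(X_scaled.std(axis=0))
print(X_scaled[:5, 1])

# The full-range center pixel never gets further than about 2 standard deviations.
print(np.abs(X_scaled[:, 0]).max())
```

In Euclidean distance, a faint border pixel can therefore weigh more after scaling than a large change in a central pixel, which is one way scaling could hurt a distance-based classifier.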
 

### Exercise 2. 

We are supposed to add some "artificial data" to the dataset by shifting each image left, right, up and down by one pixel.

Shifting a matrix by one column or row can be done by multiplying it with a shift matrix. `numpy` has a related function, `roll`, which wraps the pixels around the edge. But I follow the suggestion in the book and use `shift` from `scipy`.
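
The difference between the two approaches mentioned above can be seen on a small array. This is a sketch with a 3x3 stand-in for a 28x28 image. The names `img`, `S`, `rolled` and the others are only for illustration.

```python
import numpy as np

img = np.arange(1, 10).reshape(3, 3)  # small stand-in for a 28x28 image

# np.roll wraps around: the last column reappears on the left.
rolled = np.roll(img, shift=1, axis=1)

# Shift matrix: ones on the first superdiagonal.
S = np.eye(3, k=1, dtype=img.dtype)

# Right-multiplying moves every column one step to the right
# and fills the first column with zeros.
shifted_right = img @ S

# Left-multiplying by the transpose moves every row one step down
# and fills the first row with zeros.
shifted_down = S.T @ img

print(rolled)
print(shifted_right)
print(shifted_down)
```

The `shift` call used in this notebook, with `mode="constant"` and `cval=0`, fills the vacated pixels with zeros like the matrix version, rather than wrapping like `np.roll`.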

The `shift_op` function and the way I construct the *augmented* data are from the notebook by Géron. Apart from renaming variables to match the ones I have been using, I make a few changes:

 - I first tried to shift all the images, which is of course possible. But scaling the resulting dataset froze my notebook, so I shift only part of the data.

 - In addition, I added a few lines to measure how long things take.
 
 - I permute (**shuffle**) the data before feeding it to the fit.
 
 - I ran a grid search like the one above, but the result was the same. So I just reuse the best parameters here.
 
 - As noted in Géron's notebook, the accuracy on the test set increases by about `0.6%`, from `94.9%` to `95.5%`. So it increases, but by less than a percent.

In [12]:
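from scipy.ndimage import shift  # scipy.ndimage.interpolation is deprecated

def shift_op(x, l_r, u_d):
    """Shift a flattened 28x28 image by l_r columns and u_d rows, filling with zeros."""
    x = x.reshape((28, 28))
    # axis 0 indexes rows (up/down), axis 1 indexes columns (left/right)
    x_shifted = shift(x, [u_d, l_r], cval=0, mode="constant")
    return x_shifted.reshape([-1])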

In [13]:
start = timeit.default_timer()

X_aug=[]
Y_aug=[]

# number of images used for augmentation. Here it is 10000, but I write it in terms of m_NG
N_aug = m_NG//6

for l_r,u_d in [[1,0],[-1,0],[0,1],[0,-1]]:
    for image,num in zip(X_train[:N_aug],Y_train[:N_aug]):
        X_aug.append(shift_op(image,l_r,u_d))
        Y_aug.append(num)
        
stop = timeit.default_timer()
print('Time: ', (stop - start),'sec.')  

Time:  4.533125458999848 sec.


In [14]:
#change lists to numpy arrays
X_aug = np.asarray(X_aug)
Y_aug = np.asarray(Y_aug)

#check the dimensions
X_aug.shape,Y_aug.shape

((40000, 784), (40000,))

In [15]:
# merge all the data
start = timeit.default_timer()
X_train_with_aug=np.concatenate((X_train,X_aug),axis=0)
Y_train_with_aug=np.concatenate((Y_train,Y_aug),axis=0)
stop = timeit.default_timer()

print('Time: ', (stop - start),'sec.')  

Time:  0.3302451280001151 sec.


In [16]:
#check the dimensions
X_train_with_aug.shape,Y_train_with_aug.shape

((100000, 784), (100000,))

In [17]:
#scaling
X_t_w_a_scaled = std_scalar.fit_transform(X_train_with_aug)

In [18]:
#permute
permute = np.random.permutation(X_t_w_a_scaled.shape[0])
X_permuted, Y_permuted = X_t_w_a_scaled[permute], Y_train_with_aug[permute]
X_permuted.shape, Y_permuted.shape

((100000, 784), (100000,))

In [19]:
kn_clf_aug= KNeighborsClassifier(**grid_search.best_params_)

kn_clf_aug.fit(X_permuted,Y_permuted)

KNeighborsClassifier(n_neighbors=4, weights='distance')

In [20]:
X_test_scaled = std_scalar.transform(X_test)
Y_test_pred = kn_clf_aug.predict(X_test_scaled)
accuracy_score(Y_test, Y_test_pred)

0.9552
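
To put the two test accuracies side by side, the arithmetic can be written out as a short sketch. It uses only the two numbers printed in this notebook (`0.9489` without augmentation, `0.9552` with it). The variable names are only for illustration.

```python
acc_base = 0.9489  # test accuracy without augmentation
acc_aug = 0.9552   # test accuracy with the shifted images added

# Absolute gain in accuracy, in percentage points (about 0.63).
abs_gain_points = (acc_aug - acc_base) * 100

# The same change seen from the error side: the test error drops
# from about 5.11% to about 4.48%, a relative reduction of roughly 12%.
err_base = 1 - acc_base
err_aug = 1 - acc_aug
rel_err_reduction = (err_base - err_aug) / err_base

print(f"absolute gain: {abs_gain_points:.2f} percentage points")
print(f"relative error reduction: {rel_err_reduction:.1%}")
```

So the gain is below one percentage point in accuracy, but it removes roughly one in eight of the remaining test errors.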