## Step 1 - import relevant packages

In [128]:
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import SGDClassifier
from sklearn import metrics
import numpy as np

## Step 2 - load the Fashion-MNIST dataset

In [129]:
fashion_mnist = keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

## Step 3 - take a subset of the data set (3000 for training and 1000 for testing)

In [130]:
# combine the original training and test splits to get the complete dataset
x_data = np.concatenate((x_train, x_test), axis=0)
y_data = np.concatenate((y_train, y_test), axis=0)

test_size = 1000 / (1000 + 3000)
X_train, X_test, Y_train, Y_test = train_test_split(x_data, y_data, test_size=test_size, random_state=0)

x_train = X_train[:3000]
y_train = Y_train[:3000]
x_test = X_test[:1000]
y_test = Y_test[:1000]

print(x_train.shape)
print(x_test.shape)

(3000, 28, 28)
(1000, 28, 28)


## Step 4 - perform necessary reshaping of the data for the classifiers

In [131]:
x_train = x_train.reshape(x_train.shape[0], -1)
x_test = x_test.reshape(x_test.shape[0], -1)

print(x_train.shape)
print(x_test.shape)

(3000, 784)
(1000, 784)


## Step 5 - initialise the classifier model

In [132]:
knn_classifier = KNeighborsClassifier(n_neighbors=3)
dt_classifier = DecisionTreeClassifier()
sgd_classifier = SGDClassifier(max_iter=250)


## Step 6 - fit the model to the training data

In [133]:
knn_classifier.fit(x_train, y_train)
dt_classifier.fit(x_train, y_train)
sgd_classifier.fit(x_train, y_train)

SGDClassifier(max_iter=250)

## Step 7 - use the trained/fitted model to evaluate the testing data

In [134]:
y_knn_predict = knn_classifier.predict(x_test)
y_dt_predict = dt_classifier.predict(x_test)
y_sgd_predict = sgd_classifier.predict(x_test)

## Step 8 - report the performance of each classifier

### KNN classifier

In [135]:
knn_accuracy = metrics.accuracy_score(y_test, y_knn_predict)
knn_precision = metrics.precision_score(y_test, y_knn_predict, average='macro')
knn_recall = metrics.recall_score(y_test, y_knn_predict, average='macro')
knn_f1 = metrics.f1_score(y_test, y_knn_predict, average='macro')
knn_confusion_matrix = metrics.confusion_matrix(y_test, y_knn_predict)

print('The accuracy score of KNN is:')
print(knn_accuracy)
print('The precision score of KNN is:')
print(knn_precision)
print('The recall score of KNN is:')
print(knn_recall)
print('The f1 score of KNN is:')
print(knn_f1)
print('The confusion matrix of KNN:')
print(knn_confusion_matrix)

The accuracy score of KNN is:
0.777
The precision score of KNN is:
0.7815910643829962
The recall score of KNN is:
0.7763271214957356
The f1 score of KNN is:
0.7732931062179594
The confusion matrix of KNN:
[[ 83   1   2   4   1   0   7   0   1   0]
 [  3 108   1   1   1   0   1   0   0   0]
 [  5   0  78   0  15   0  13   0   0   0]
 [ 12   2   1  70   5   0   3   0   1   0]
 [  0   0  17   7  61   0  16   0   0   0]
 [  0   0   0   0   0  68   0  20   0  17]
 [ 21   0  14   2   6   0  49   0   0   0]
 [  0   0   0   0   0   2   0  70   0   5]
 [  4   0   1   0   0   0   3   1  97   0]
 [  0   0   0   0   0   3   1   3   0  93]]
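The macro-averaged scores can be recomputed from the confusion matrix alone, which makes it easier to see which classes pull the averages down. The sketch below copies the KNN matrix printed above and assumes the scikit-learn convention that rows are true labels and columns are predicted labels; the variable names are illustrative.

```python
import numpy as np

# KNN confusion matrix copied from the output above
# (rows = true labels, columns = predicted labels).
cm = np.array([
    [83, 1, 2, 4, 1, 0, 7, 0, 1, 0],
    [3, 108, 1, 1, 1, 0, 1, 0, 0, 0],
    [5, 0, 78, 0, 15, 0, 13, 0, 0, 0],
    [12, 2, 1, 70, 5, 0, 3, 0, 1, 0],
    [0, 0, 17, 7, 61, 0, 16, 0, 0, 0],
    [0, 0, 0, 0, 0, 68, 0, 20, 0, 17],
    [21, 0, 14, 2, 6, 0, 49, 0, 0, 0],
    [0, 0, 0, 0, 0, 2, 0, 70, 0, 5],
    [4, 0, 1, 0, 0, 0, 3, 1, 97, 0],
    [0, 0, 0, 0, 0, 3, 1, 3, 0, 93],
])

true_positives = np.diag(cm)          # correctly classified images per class
true_counts = cm.sum(axis=1)          # images that really belong to each class
predicted_counts = cm.sum(axis=0)     # images assigned to each class

accuracy = true_positives.sum() / cm.sum()
per_class_precision = true_positives / predicted_counts
per_class_recall = true_positives / true_counts
per_class_f1 = 2 * true_positives / (true_counts + predicted_counts)

# Macro averaging: unweighted mean of the per-class scores.
macro_precision = per_class_precision.mean()
macro_recall = per_class_recall.mean()
macro_f1 = per_class_f1.mean()

print(accuracy, macro_precision, macro_recall, macro_f1)
print(np.round(per_class_recall, 3))
```

The per-class recall shows that class 6 is recognised least often (49 of 92 images), so it lowers the macro average more than the well-separated classes raise it.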


### Compare the accuracy of our KNN classifier with the scores presented in the paper

Our result is worse than the accuracy reported in the paper. When initializing the KNN classifier, we set only n_neighbors to 3 and left the other parameters at their defaults. We therefore compare our result only with the paper's KNN classifiers that use the default values for the other parameters (i.e. weights=uniform, p=2).
With n_neighbors=3 our accuracy is only 0.777, whereas the paper reports 0.849 for n_neighbors=5 and 0.847 for n_neighbors=9.

There may be three reasons. First, KNN is sensitive to the choice of n_neighbors (k), and n_neighbors=3 may be too small for this dataset. Second, we use only part of the dataset (3000 training images), so there may not be enough training examples. Third, the subset we draw may not have exactly the same data distribution as the full dataset used in the paper, which also affects the accuracy of the trained classifier.
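One way to test the first explanation is to refit KNN for several values of n_neighbors and compare the accuracies. The following is only a sketch: `knn_accuracy_by_k` is a hypothetical helper, and the demo uses small synthetic blobs as a stand-in so it runs without downloading Fashion-MNIST.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def knn_accuracy_by_k(x_train, y_train, x_test, y_test, k_values=(1, 3, 5, 9)):
    """Return {k: accuracy on (x_test, y_test)} for each n_neighbors value."""
    scores = {}
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k)
        model.fit(x_train, y_train)
        scores[k] = model.score(x_test, y_test)  # mean accuracy
    return scores


# Synthetic stand-in data (an assumption, not Fashion-MNIST):
# three Gaussian blobs in 20 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(3, 20))
labels = rng.integers(0, 3, size=400)
features = centers[labels] + rng.normal(size=(400, 20))

demo_scores = knn_accuracy_by_k(
    features[:300], labels[:300], features[300:], labels[300:]
)
print(demo_scores)
```

In the notebook the same helper could be called with the flattened `x_train`, `y_train`, `x_test`, `y_test` from Step 4. If k is actually chosen this way, a separate validation split is preferable to the test set so the reported accuracy stays unbiased.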

### DT classifier

In [138]:
dt_accuracy = metrics.accuracy_score(y_test, y_dt_predict)
dt_precision = metrics.precision_score(y_test, y_dt_predict, average='macro')
dt_recall = metrics.recall_score(y_test, y_dt_predict, average='macro')
dt_f1 = metrics.f1_score(y_test, y_dt_predict, average='macro')
dt_confusion_matrix = metrics.confusion_matrix(y_test, y_dt_predict)

print('The accuracy score of DT is:')
print(dt_accuracy)
print('The precision score of DT is:')
print(dt_precision)
print('The recall score of DT is:')
print(dt_recall)
print('The f1 score of DT is:')
print(dt_f1)
print('The confusion matrix of DT:')
print(dt_confusion_matrix)

The accuracy score of DT is:
0.713
The precision score of DT is:
0.7141606824331195
The recall score of DT is:
0.7093676968454571
The f1 score of DT is:
0.710221565751715
The confusion matrix of DT:
[[ 72   0   2   5   0   0  16   1   3   0]
 [  2 106   0   2   1   0   4   0   0   0]
 [  5   2  63   2  16   1  19   0   3   0]
 [ 10   3   3  66   2   0   5   1   4   0]
 [  2   0  17  14  55   0  13   0   0   0]
 [  0   2   0   0   0  82   0  15   2   4]
 [ 17   0  11   0  14   0  47   0   3   0]
 [  0   0   0   0   0  14   0  56   0   7]
 [  1   0   1   1   3   4  10   0  86   0]
 [  0   0   0   0   0   6   1  11   2  80]]


### Compare the accuracy of our DT classifier with the scores presented in the paper

Our result is worse than the accuracy reported in the paper. When initializing the DT classifier, we keep all parameters at their default values (i.e. criterion=gini, splitter=best). We therefore compare our result only with the paper's DT classifiers that use default parameters. The accuracy scores of those classifiers in the paper are between 0.78 and 0.79, whereas our accuracy is 0.713.

There may be two reasons. First, we use only part of the dataset (3000 training images), so there may not be enough training examples. Second, the subset we draw may not have exactly the same data distribution as the full dataset used in the paper, which also affects the accuracy of the trained classifier.
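The distribution concern can be checked directly. The row sums of the confusion matrices above show that the 1000 test images range from 77 to 115 per class, whereas the full Fashion-MNIST dataset is balanced. A possible remedy, sketched below, is to draw the subset with the `stratify` argument of `train_test_split`; `stratified_subset` is a hypothetical helper and the demo data is synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_subset(x, y, n_train, n_test, seed=0):
    """Draw n_train / n_test samples whose class proportions follow y."""
    return train_test_split(
        x, y,
        train_size=n_train,
        test_size=n_test,
        stratify=y,          # keep class proportions in both subsets
        random_state=seed,
    )


# Synthetic stand-in (an assumption): 10 balanced classes, 700 samples each.
y_demo = np.repeat(np.arange(10), 700)
x_demo = np.zeros((y_demo.size, 4))

x_tr, x_te, y_tr, y_te = stratified_subset(x_demo, y_demo, n_train=300, n_test=100)

# Per-class counts in each subset.
train_counts = np.bincount(y_tr, minlength=10)
test_counts = np.bincount(y_te, minlength=10)
print(train_counts)
print(test_counts)
```

In the notebook this would correspond to calling the helper with `x_data`, `y_data`, 3000 and 1000. It replaces the slicing in Step 3, so the selected images, and therefore the scores, would differ from the ones reported here.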



### SGD classifier

In [137]:
sgd_accuracy = metrics.accuracy_score(y_test, y_sgd_predict)
sgd_precision = metrics.precision_score(y_test, y_sgd_predict, average='macro')
sgd_recall = metrics.recall_score(y_test, y_sgd_predict, average='macro')
sgd_f1 = metrics.f1_score(y_test, y_sgd_predict, average='macro')
sgd_confusion_matrix = metrics.confusion_matrix(y_test, y_sgd_predict)

print('The accuracy score of SGD is:')
print(sgd_accuracy)
print('The precision score of SGD is:')
print(sgd_precision)
print('The recall score of SGD is:')
print(sgd_recall)
print('The f1 score of SGD is:')
print(sgd_f1)
print('The confusion matrix of SGD:')
print(sgd_confusion_matrix)

The accuracy score of SGD is:
0.769
The precision score of SGD is:
0.7890365416046505
The recall score of SGD is:
0.7666215736392156
The f1 score of SGD is:
0.752531024161388
The confusion matrix of SGD:
[[ 80   1   2  10   3   0   0   0   3   0]
 [  2 107   1   3   1   0   1   0   0   0]
 [  3   0  66   1  37   0   2   0   2   0]
 [  5   1   1  78   8   0   1   0   0   0]
 [  0   0   7   8  85   0   1   0   0   0]
 [  0   0   0   0   2  82   1   7   4   9]
 [ 16   0  13   5  41   0  15   0   2   0]
 [  0   0   0   0   0   2   0  71   1   3]
 [  2   0   0   4   3   0   0   1  96   0]
 [  0   0   0   0   0   1   0  10   0  89]]
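To see where the SGD classifier loses most of its accuracy, the largest off-diagonal entries of the confusion matrix can be listed. This sketch copies the matrix printed above (rows are true labels, columns are predicted labels) and uses the standard Fashion-MNIST class names; the variable names are illustrative.

```python
import numpy as np

# Standard Fashion-MNIST label names, indexed by class id.
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# SGD confusion matrix copied from the output above.
sgd_cm = np.array([
    [80, 1, 2, 10, 3, 0, 0, 0, 3, 0],
    [2, 107, 1, 3, 1, 0, 1, 0, 0, 0],
    [3, 0, 66, 1, 37, 0, 2, 0, 2, 0],
    [5, 1, 1, 78, 8, 0, 1, 0, 0, 0],
    [0, 0, 7, 8, 85, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 2, 82, 1, 7, 4, 9],
    [16, 0, 13, 5, 41, 0, 15, 0, 2, 0],
    [0, 0, 0, 0, 0, 2, 0, 71, 1, 3],
    [2, 0, 0, 4, 3, 0, 0, 1, 96, 0],
    [0, 0, 0, 0, 0, 1, 0, 10, 0, 89],
])

# Zero the diagonal so only misclassifications remain.
errors = sgd_cm.copy()
np.fill_diagonal(errors, 0)

# Indices of the three largest error counts, largest first.
flat_order = np.argsort(errors, axis=None)[::-1][:3]
rows, cols = np.unravel_index(flat_order, errors.shape)
top_pairs = [(int(t), int(p), int(errors[t, p])) for t, p in zip(rows, cols)]

for true_id, pred_id, count in top_pairs:
    print(f"{class_names[true_id]} predicted as {class_names[pred_id]}: {count}")
```

For this matrix the dominant errors are Shirt predicted as Coat (41 images) and Pullover predicted as Coat (37 images), which also explains why macro recall and F1 sit below macro precision.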


### Compare the accuracy of our SGD classifier with the scores presented in the paper

Our result is worse than the accuracy reported in the paper. When initializing the SGD classifier, we set only max_iter to 250 to limit the maximum number of iterations and left the other parameters at their defaults. We therefore compare our result only with the paper's SGD classifiers that use default parameters (loss=hinge, penalty=l2). The accuracy of the SGD classifier in the paper is 0.819, whereas our accuracy is 0.769.


There may be three reasons. First, we set max_iter to 250, whereas the paper uses 1000. Our model may therefore train for far fewer iterations than the model in the paper, which may lead to underfitting and lower accuracy. Second, we use only part of the dataset (3000 training images), so there may not be enough training examples. Third, the subset we draw may not have exactly the same data distribution as the full dataset used in the paper, which also affects the accuracy of the trained classifier.
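Whether max_iter actually limits training can be checked through the fitted model's `n_iter_` attribute. With the default `tol=1e-3`, `SGDClassifier` stops early once the loss stops improving, so max_iter is only an upper bound. The sketch below uses a hypothetical helper, `sgd_iterations_used`, and synthetic stand-in data rather than Fashion-MNIST.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def sgd_iterations_used(x_train, y_train, max_iter_values=(250, 1000), seed=0):
    """Return {max_iter: epochs actually run} for each max_iter setting."""
    used = {}
    for max_iter in max_iter_values:
        model = SGDClassifier(max_iter=max_iter, random_state=seed)
        model.fit(x_train, y_train)
        # n_iter_ is the number of epochs run before the stopping
        # criterion was met (maximum over the per-class binary fits).
        used[max_iter] = int(model.n_iter_)
    return used


# Synthetic stand-in data (an assumption): three Gaussian blobs.
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(3, 20))
labels = rng.integers(0, 3, size=300)
features = centers[labels] + rng.normal(size=(300, 20))

iterations = sgd_iterations_used(features, labels)
print(iterations)
```

If `n_iter_` comes out well below 250 on the real data, the smaller max_iter is unlikely to explain the gap, and the other two reasons become more plausible.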