# Loading CIFAR-10

In the following cells we determine the number of images for each split and load the images.

In [1]:
from google.colab import drive
drive.mount('/content/drive')
!pip install scipy==1.1.0
%cd "drive/My Drive/Colab Notebooks/ART - Assignment 1"

import random
import numpy as np
from data_process import get_CIFAR10_data
import math
from scipy.spatial import distance
from models import KNN, Perceptron, SVM, Softmax, LinearClassifier, LogisticRegression
from kaggle_submission import output_submission_csv
%matplotlib inline


In [0]:

# You can change these numbers for experimentation
# For submission we will use the default values 
TRAIN_IMAGES = 49000
VAL_IMAGES = 1000
TEST_IMAGES = 5000

In [0]:
data = get_CIFAR10_data(TRAIN_IMAGES, VAL_IMAGES, TEST_IMAGES)
X_train, y_train = data['X_train'], data['y_train']
X_val, y_val = data['X_val'], data['y_val']
X_test, y_test = data['X_test'], data['y_test']

Convert the sets of images from dimensions of **(N, 3, 32, 32) -> (N, 3072)** where N is the number of images so that each **3x32x32** image is represented by a single vector.

In [0]:
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
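As a quick sanity check of the reshape above, the sketch below applies the same call to a small made-up array. The name `toy_images` and its contents are assumptions for illustration only and are not part of the assignment data.

```python
import numpy as np

# Hypothetical stand-in for a batch of 2 images with shape (N, 3, 32, 32).
toy_images = np.arange(2 * 3 * 32 * 32).reshape(2, 3, 32, 32)

# Keep the first axis (N) and flatten everything else into one vector.
toy_flat = np.reshape(toy_images, (toy_images.shape[0], -1))

print(toy_flat.shape)  # (2, 3072)
```

Each row of the result is the corresponding image flattened in row-major order, so no pixel values are changed, only their layout.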

### Get Accuracy

This function computes how well your model performs using accuracy as a metric.

In [0]:
def get_acc(pred, y_test):
    return np.sum(y_test==pred)/len(y_test)*100
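To make the metric concrete, here is a small worked example on made-up labels. The function `accuracy_percent` mirrors `get_acc` above, and the toy arrays are assumptions for illustration only.

```python
import numpy as np

def accuracy_percent(pred, y_true):
    """Percentage of predictions that match the true labels."""
    pred = np.asarray(pred)
    y_true = np.asarray(y_true)
    return np.mean(pred == y_true) * 100

toy_pred = np.array([3, 5, 5, 0])
toy_true = np.array([3, 5, 1, 0])

# 3 of the 4 predictions match, so the accuracy is 75.0
toy_acc = accuracy_percent(toy_pred, toy_true)
print(toy_acc)
```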

# K-Nearest Neighbors

The kNN classifier consists of two stages:

- During training, the classifier takes the training data and simply remembers it
- During testing, kNN classifies every test image by comparing to all training images and selecting the class that is most common among the k most similar training examples

In this exercise you will implement these steps by writing efficient, vectorized code. Your final implementation should not use for loops over individual test and train examples. Instead, you should calculate distances between vectorized forms of the datasets. You may refer to the `scipy.spatial.distance.cdist` function to do this efficiently.
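One possible shape for such a vectorized prediction step is sketched below. It is not the `KNN` class from `models`; the function name `knn_predict`, the Euclidean metric, and the tie-breaking rule (smallest label wins) are all assumptions made for this illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_predict(X_train, y_train, X_query, k):
    """Predict integer labels for X_query by majority vote of k neighbors."""
    # Pairwise Euclidean distances, shape (num_query, num_train).
    dists = cdist(X_query, X_train, metric="euclidean")
    # Column indices of the k smallest distances in each row.
    nearest = np.argpartition(dists, k - 1, axis=1)[:, :k]
    neighbor_labels = y_train[nearest]  # shape (num_query, k)

    # Count votes per class without looping over examples.
    num_query = X_query.shape[0]
    num_classes = int(y_train.max()) + 1
    votes = np.zeros((num_query, num_classes), dtype=int)
    rows = np.repeat(np.arange(num_query), k)
    np.add.at(votes, (rows, neighbor_labels.ravel()), 1)

    # argmax returns the smallest label when several classes tie.
    return votes.argmax(axis=1)

# Toy data: two well-separated clusters (made up for illustration).
toy_X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
toy_y = np.array([0, 0, 0, 1, 1, 1])
toy_pred = knn_predict(toy_X, toy_y, np.array([[0.2, 0.1], [10.5, 10.5]]), k=3)
print(toy_pred)  # [0 1]
```

On the full dataset the distance matrix has one entry per (query, training) pair, so it may be necessary to process the query images in chunks to keep memory use manageable.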

The following code:
- Creates an instance of the KNN classifier class with k = 5
- Trains the classifier on the training data using the `train` function
- Uses the `predict` function to predict labels for the test data

### Training KNN

In [0]:
knn = KNN(5)
knn.train(X_train, y_train)

### Find best k on validation

The value of k is an important hyperparameter for the KNN classifier. We will choose the best k by examining the performance of classifiers trained with different k values on the validation set.

It's not necessary to try many different values of k for the purposes of this exercise. You may increase k by a factor of 2 each iteration, up to around k=100, to get a sense of classifier performance for different k values.

**Modify the code below to loop through different values of k, train a KNN classifier for each k, and output the validation accuracy for each of the classifiers. Be sure to note your best k below as well.**

In [0]:
# TO DO : Experiment with different values of k
k = 5
knn = KNN(k)
knn.train(X_train, y_train)

pred_knn = knn.predict(X_val)
print('The validation accuracy is given by : %f' % (get_acc(pred_knn, y_val)))

The validation accuracy is given by : 36.100000
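One way to organize the search over k is sketched below. The helper `sweep_k` and its `score_fn` argument are hypothetical names introduced here; `score_fn(k)` is assumed to train a classifier with the given k and return its validation accuracy.

```python
def sweep_k(score_fn, max_k=100):
    """Evaluate score_fn at k = 1, 2, 4, ... up to max_k and return the best k."""
    results = {}
    k = 1
    while k <= max_k:
        results[k] = score_fn(k)
        k *= 2  # double k each iteration
    best_k = max(results, key=results.get)
    return best_k, results

# Toy scoring function (made up) that peaks at k = 8.
toy_best_k, toy_results = sweep_k(lambda k: -abs(k - 8))
print(toy_best_k)         # 8
print(list(toy_results))  # [1, 2, 4, 8, 16, 32, 64]
```

In this notebook, `score_fn` could wrap the cell above: build `KNN(k)`, call `train(X_train, y_train)`, and return `get_acc(knn.predict(X_val), y_val)`.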


### Testing KNN

Finally, once you have found the best k according to your experiments on the validation set, retrain a classifier with the best k and test your classifier on the test set.

In [0]:
best_k = 3
knn = KNN(best_k)
knn.train(X_train, y_train)

In [0]:
pred_knn = knn.predict(X_test)
print('The testing accuracy is given by : %f' % (get_acc(pred_knn, y_test)))

The testing accuracy is given by : 35.400000


### KNN Kaggle Submission

Once you are satisfied with your solution and test accuracy, output a file with your test set predictions to submit to Kaggle for Assignment 1 KNN. Use the following code to do so:

In [0]:
output_submission_csv('knn_submission.csv', knn.predict(X_test))

# Support Vector Machines (with SGD)

Next, you will implement a "soft margin" SVM. In this formulation you will maximize the margin between positive and negative training examples and penalize margin violations using a hinge loss.

We will optimize the SVM loss using SGD. This means you must compute the gradient of the loss function with respect to the model weights. You will use this gradient to update the model weights.

SVM optimized with SGD has 3 hyperparameters that you can experiment with :
- **Learning rate** - as for the Perceptron, this parameter scales how much the weights change in each gradient update.
- **Epochs** - as for the Perceptron, the number of passes over the training data.
- **Regularization constant** - Hyperparameter to determine the strength of regularization. In this case it is a coefficient on the term which maximizes the margin.

You will implement the SVM using SGD in **models/SVM.py**.
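The gradient computation described above can be sketched as follows. The contents of **models/SVM.py** are not shown here, so everything in this block is an assumption: it uses the multiclass hinge loss with a margin of 1, an L2 penalty of `0.5 * reg * ||W||^2`, and a hypothetical function name `svm_sgd_step`.

```python
import numpy as np

def svm_sgd_step(W, X_batch, y_batch, lr, reg):
    """One gradient step on the multiclass hinge loss.

    W has shape (num_classes, dim); X_batch has shape (n, dim);
    y_batch holds integer labels of shape (n,).
    """
    n = X_batch.shape[0]
    idx = np.arange(n)
    scores = X_batch @ W.T                      # (n, num_classes)
    correct = scores[idx, y_batch][:, None]     # score of the true class
    margins = np.maximum(0.0, scores - correct + 1.0)
    margins[idx, y_batch] = 0.0                 # true class is not penalized
    loss = margins.sum() / n + 0.5 * reg * np.sum(W * W)

    # Each violated margin adds +x to that class and -x to the true class.
    mask = (margins > 0).astype(float)
    mask[idx, y_batch] = -mask.sum(axis=1)
    grad = mask.T @ X_batch / n + reg * W
    return W - lr * grad, loss

# Toy check on made-up data: the loss should drop after one step.
toy_W = np.zeros((2, 2))
toy_X = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_y = np.array([0, 1])
toy_W1, toy_loss0 = svm_sgd_step(toy_W, toy_X, toy_y, lr=0.1, reg=0.0)
_, toy_loss1 = svm_sgd_step(toy_W1, toy_X, toy_y, lr=0.1, reg=0.0)
print(toy_loss0, toy_loss1)
```

In a stochastic setting the same step would be applied to small random batches of the training data once per iteration, for the chosen number of epochs.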

The following code:
- Creates an instance of the SVM classifier class
- Trains the classifier on the training data using the `train` function
- Uses the `predict` function to compute the training accuracy as well as the testing accuracy

### Train SVM

In [0]:
svm = SVM(2e-07,18000)
svm.train(X_train, y_train)
#gradient=svm.calc_gradient(X_train,y_train)

In [0]:
pred_svm = svm.predict(X_train)
print(pred_svm)
print('The training accuracy is given by : %f' % (get_acc(pred_svm, y_train)))

[6 1 9 ... 4 6 8]
The training accuracy is given by : 37.344898


### Validate SVM

In [0]:
pred_svm = svm.predict(X_val)
print('The validation accuracy is given by : %f' % (get_acc(pred_svm, y_val)))

The validation accuracy is given by : 38.800000


### Test SVM

In [0]:
pred_svm = svm.predict(X_test)
print('The testing accuracy is given by : %f' % (get_acc(pred_svm, y_test)))

The testing accuracy is given by : 37.540000


In [0]:
#calculating the best learning rate and regularisation strengths
learn_rates = [0.5e-7, 1e-7, 2e-7, 6e-7]
reg_strengths = [500,5000,18000,20000,25000,30000,35000,40000,45000,50000]
bestParameters = [0, 0]
bestAcc = -1
bestModel = None
numClasses = np.max(y_train) + 1

for rs in reg_strengths:
    for lr in learn_rates:
        #print(str(lr)+"     "+str(rs))
        classifier = SVM(lr, rs)
        classifier.train(X_train, y_train)
        pred_svm = classifier.predict(X_val)
        valAcc = get_acc(pred_svm, y_val)
        #print("Regularization rate:",rs)
        #print("Learning Rate:",lr)
        #print("Accuracy:",valAcc)
        if valAcc > bestAcc:
            bestAcc = valAcc
            bestModel = classifier
            bestParameters = [lr,rs]
print("<------------------------------>")
print("Best Parameters:",bestParameters)
print("<------------------------------>")

<------------------------------>
Best Parameters: [2e-07, 18000]
<------------------------------>


### SVM Kaggle Submission

Once you are satisfied with your solution and test accuracy, output a file with your test set predictions to submit to Kaggle for Assignment 1 SVM. Use the following code to do so:

In [0]:
output_submission_csv('svm_submission.csv', svm.predict(X_test))

# Logistic Regression


This classifier performs binary classification only: it predicts whether a particular instance belongs to a particular class or not. Change the `class_val` variable to select which class to detect.

The class values of the CIFAR-10 dataset are:

*   Airplane ----> 0
*   Automobile --> 1
*   Bird --------> 2
*   Cat ---------> 3
*   Deer --------> 4
*   Dog ---------> 5
*   Frog --------> 6
*   Horse -------> 7
*   Ship --------> 8
*   Truck -------> 9
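The one-vs-rest setup described above can be sketched as follows. The `LogisticRegression` class from `models` is not shown here, so this block is only an assumed illustration: the function names, the zero initialization, and the 0.5 decision threshold are all hypothetical. The sketch follows the notebook's layout, in which each column of `X` is one example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, num_iterations=2000, learning_rate=0.005):
    """Batch gradient descent. X: (dim, n), one example per column; y: (n,) in {0, 1}."""
    dim, n = X.shape
    w = np.zeros(dim)
    b = 0.0
    for _ in range(num_iterations):
        p = sigmoid(w @ X + b)             # predicted probabilities, shape (n,)
        w -= learning_rate * (X @ (p - y)) / n
        b -= learning_rate * np.mean(p - y)
    return w, b

def predict_logistic(w, b, X):
    """Return 1 where the predicted probability is at least 0.5, else 0."""
    return (sigmoid(w @ X + b) >= 0.5).astype(int)

# Binary labels for class 5 (dog) from made-up multiclass labels.
toy_labels = np.array([5, 3, 5, 0])
toy_y = np.where(toy_labels == 5, 1, 0)          # [1, 0, 1, 0]
toy_X = np.array([[2.0, -2.0, 3.0, -1.0]])       # one feature, four examples
toy_w, toy_b = train_logistic(toy_X, toy_y, num_iterations=200, learning_rate=0.5)
toy_pred = predict_logistic(toy_w, toy_b, toy_X)
print(toy_pred)
```

Because only one of the ten classes is labeled positive, a classifier that always predicts 0 would already be right on roughly 90% of a class-balanced dataset, which is worth keeping in mind when reading the accuracies below.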

### Train Logistic Regression

In [0]:
class_val = 5
yy_train = np.where(data["y_train"]==class_val, 1, 0).T
yy_val = np.where(data["y_val"]==class_val, 1, 0).T 
yy_test = np.where(data["y_test"]==class_val, 1, 0).T 

XX_train=X_train.T
yy_train=yy_train.T

XX_test=X_test.T
yy_test=yy_test.T

XX_val=X_val.T
yy_val=yy_val.T

XX_train = XX_train/255.
XX_test = XX_test/255.
XX_val = XX_val/255.

lgr = LogisticRegression()
lgr.train(XX_train, yy_train, num_iterations = 2000, learning_rate = 0.005)


In [0]:
pred_lgr = lgr.predict(XX_train)
print('The training accuracy is given by : %f' % (get_acc(pred_lgr, yy_train)))

The training accuracy is given by : 90.189796


### Validate Logistic Regression

In [0]:
pred_lgr = lgr.predict(XX_val)
print('The validation accuracy is given by : %f' % (get_acc(pred_lgr, yy_val)))

The validation accuracy is given by : 90.600000


### Testing Logistic Regression

In [0]:
pred_lgr = lgr.predict(XX_test)
print('The testing accuracy is given by : %f' % (get_acc(pred_lgr, yy_test)))

The testing accuracy is given by : 90.320000


### Logistic Regression Kaggle Submission

Once you are satisfied with your solution and test accuracy, output a file with your test set predictions to submit to Kaggle for Assignment 1 Logistic Regression. Use the following code to do so:

In [0]:
output_submission_csv('logistic_submission.csv', lgr.predict(XX_test))

# Linear Classifier


This linear classifier classifies the CIFAR-10 dataset. The dataset is loaded as tensors from the PyTorch library, and the corresponding gradients and loss are computed in tensor format.


In [0]:
lc = LinearClassifier()

### Train Linear Classifier

In [0]:
lc.train()

[1,  2000] loss: 2.160
[1,  4000] loss: 1.838
[1,  6000] loss: 1.639
[1,  8000] loss: 1.553
[1, 10000] loss: 1.495
[1, 12000] loss: 1.452
Finished Training
[2,  2000] loss: 1.378
[2,  4000] loss: 1.341
[2,  6000] loss: 1.327
[2,  8000] loss: 1.295
[2, 10000] loss: 1.278
[2, 12000] loss: 1.244
Finished Training


In [0]:
pred_y = lc.predict('train')
print(pred_y)

Accuracy of the network on the 50000 train images: 57 %
tensor([[ 0.6791, -3.3771,  1.1327,  1.0285,  3.3371,  1.4880, -1.0712,  2.5845,
         -2.6819, -2.0401],
        [-0.6433,  2.0178, -2.1142, -0.9707, -1.8392, -0.9683, -0.0438, -1.0062,
         -0.0644,  3.7347],
        [-1.4226, -3.3271,  2.4809,  2.5203,  1.7453,  2.2421,  2.5180, -0.8288,
         -3.0340, -3.3931],
        [-1.4034, -3.1537,  4.4823,  1.2480,  2.9038,  0.9080,  5.2580, -2.0016,
         -3.5879, -3.9760]])
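The tensor printed above appears to hold one row of ten class scores per image. The `LinearClassifier` code is not shown here, so that reading is an assumption. Under it, class labels can be recovered by taking the index of the largest score in each row, as in this NumPy sketch that uses the first two printed rows:

```python
import numpy as np

# First two rows of the scores printed above, copied by hand.
scores = np.array([
    [0.6791, -3.3771, 1.1327, 1.0285, 3.3371,
     1.4880, -1.0712, 2.5845, -2.6819, -2.0401],
    [-0.6433, 2.0178, -2.1142, -0.9707, -1.8392,
     -0.9683, -0.0438, -1.0062, -0.0644, 3.7347],
])

# CIFAR-10 class names in label order 0..9.
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

predicted = scores.argmax(axis=1)                  # index of the highest score
predicted_names = [class_names[i] for i in predicted]
print(predicted, predicted_names)  # [4 9] ['deer', 'truck']
```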


### Testing Linear Classifier

In [0]:
pred_y = lc.predict('test')
print(pred_y)

Accuracy of the network on the 10000 test images: 55 %
tensor([[-2.6300, -3.8116,  2.0837,  3.3655,  2.6499,  2.8926,  4.1150, -0.1600,
         -4.2937, -4.1655],
        [-2.7961, -4.2933,  1.5953,  3.7190,  0.3255,  5.9770,  1.2937,  1.7417,
         -4.4888, -3.4388],
        [ 2.0419,  0.0887,  3.5738, -0.6992,  3.1226,  0.5898, -2.1230, -0.8175,
         -4.0420, -2.2996],
        [-0.7495, -3.2200, -0.3087,  0.5260,  4.2669,  1.6696, -0.6438,  6.4248,
         -4.0104, -1.7967]])


### Linear Classifier Kaggle Submission

Once you are satisfied with your solution and test accuracy, output a file with your test set predictions to submit to Kaggle for Assignment 1. Use the following code to do so:

In [0]:
output_submission_csv('logistic_submission.csv', lc.predict('test'))

Accuracy of the network on the 10000 test images: 55 %
