# COMS 4995_002 Deep Learning Assignment 1
Due on Thursday, Feb 8, 11:59pm

This assignment can be done in groups of at most 2 students. Everyone must submit on Courseworks individually.

Write down the UNIs of your group (if applicable)

Member 1: Leon Stilwell, ls3223

Member 2: Saahil Jain, sj2675

In [59]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy.misc
import glob
import sys
# you shouldn't need to make any more imports

In [130]:
class NeuralNetwork(object):
    """
    Abstraction of neural network.
    Stores parameters, activations, cached values. 
    Provides necessary functions for training and prediction. 
    """
    def __init__(self, layer_dimensions, drop_prob=0.0, reg_lambda=0.0):
        """
        Initializes the weights and biases for each layer
        :param layer_dimensions: (list) number of nodes in each layer
        :param drop_prob: drop probability for dropout layers. Only required in part 2 of the assignment
        :param reg_lambda: regularization parameter. Only required in part 2 of the assignment
        """
        np.random.seed(1)
        
        self.parameters = {}
        self.num_layers = len(layer_dimensions) - 1
        self.drop_prob = drop_prob
        self.reg_lambda = reg_lambda
        
        # init parameters
        for layer in range(1, self.num_layers + 1):
            curActivationUnits = layer_dimensions[layer]
            prevActivationUnits = layer_dimensions[layer - 1]
            weight = np.divide(np.random.normal(0, 1, (curActivationUnits, prevActivationUnits)), np.sqrt(prevActivationUnits))
            bias = np.zeros(curActivationUnits)
            self.parameters[layer] = [weight, bias]

    def affineForward(self, A, W, b):
        """
        Forward pass for the affine layer.
        :param A: input matrix, shape (L, S), where L is the number of hidden units in the previous layer and S is
        the number of samples
        :returns: the affine product WA + b, along with the cache required for the backward pass
        """
        Z = np.dot(W, A)
        for i in range(len(Z)):
            Z[i] = Z[i] + b[i]
        return Z, [W, A, b, Z]
        
    def activationForward(self, A, activation="relu"):
        """
        Common interface to access all activation functions.
        :param A: input to the activation function
        :param prob: activation funciton to apply to A. Just "relu" for this assignment.
        :returns: activation(A)
        """ 
        if(activation == "relu"):
            ret = self.relu(A)
        return ret

    def relu(self, X):
        A = np.maximum(0, X)
        return A
            
    def dropout(self, A, prob):
        """
        :param A:
        :param prob: drop prob
        :returns: tuple (A, M) 
            WHERE
            A is matrix after applying dropout
            M is dropout mask, used in the backward pass
        """
        M = np.random.rand(A.shape[0], A.shape[1])
        M = (M > prob) * 1.0
        M /= (1 - prob)
        A *= M
        return A, M

    def forwardPropagation(self, X):
        """
        Runs an input X through the neural network to compute activations
        for all layers. Returns the output computed at the last layer along
        with the cache required for backpropagation.
        :returns: (tuple) AL, cache
            WHERE 
            AL is activation of last layer
            cache is cached values for each layer that
                     are needed in further steps
        """
        cache = []
        cache.append([]) # Empty cache for layer 1
        Z, cacheLayer = self.affineForward(X, self.parameters[1][0], self.parameters[1][1])
        A = self.activationForward(Z)
        cache.append(cacheLayer)
        for layer in range(2, self.num_layers):
            Z, cacheLayer = self.affineForward(A, self.parameters[layer][0], self.parameters[layer][1])
            A = self.activationForward(Z)
            cache.append(cacheLayer)
        Z, cacheLayer = self.affineForward(A, self.parameters[self.num_layers][0], self.parameters[self.num_layers][1])
        AL = self.softmax(Z)
        cache.append(cacheLayer)
        
        return AL, cache
    
    def costFunction(self, AL, y):
        """
        :param AL: Activation of last layer, shape (num_classes, S)
        :param y: labels, shape (S)
        :param alpha: regularization parameter
        :returns cost, dAL: A scalar denoting cost and the gradient of cost
        """
        # compute loss
        m = y.shape[0]
        correct_label_prob = AL[y, range(m)]
        cost = -np.sum(np.log(correct_label_prob)) / m
        
        #if self.reg_lambda > 0:
            # add regularization
            # reg =
        
        # gradient of cost
        dAL = AL
        dAL[y, range(AL.shape[1])] = dAL[y, range(AL.shape[1])] - 1
        return cost, dAL
    
    def softmax(self, X):
        return np.exp(X) / np.sum(np.exp(X), axis = 0)

    def affineBackward(self, dA_prev, cache):
        """
        Backward pass for the affine layer.
        :param dA_prev: gradient from the next layer.
        :param cache: cache returned in affineForward
        :returns dA: gradient on the input to this layer
                 dW: gradient on the weights
                 db: gradient on the bias
        """
        W = cache[0]
        A = cache[1]
        b = cache[2]
        Z = cache[3]
        dZ_prev = np.multiply(dA_prev, self.relu_derivative(Z))
        dA = np.dot(W.transpose(), dZ_prev)
        dW = np.dot(dZ_prev, A.transpose())
        db = np.mean(dZ_prev, axis = 1) # Aggregate samples
        
        return dA, dW, db
    
    def affineBackwardLastLayer(self, dA_prev, Y, cache):
        W = cache[0]
        A = cache[1]
        b = cache[2]
        Z = cache[3]
        dZ_prev = dA_prev
        dA = np.dot(W.transpose(), dZ_prev)
        dW = np.dot(dZ_prev, A.transpose())
        db = np.mean(dZ_prev, axis = 1) # Aggregate samples
        
        return dA, dW, db

#     def activationBackward(self, dA, cache, activation="relu"):
#         """
#         Interface to call backward on activation functions.
#         In this case, it's just relu. 
#         """
#         We decide not to use activationBackward, see affineBackward
#
        
    def relu_derivative(self, cached_x):
        # We don't use dx
        relu_d = 1 * (cached_x > 0)
        return relu_d
    
    def dropout_backward(self, dA, cache):
        return dA

    def backPropagation(self, dAL, Y, cache):
        """
        Run backpropagation to compute gradients on all paramters in the model
        :param dAL: gradient on the last layer of the network. Returned by the cost function.
        :param Y: labels
        :param cache: cached values during forwardprop
        :returns gradients: dW and db for each weight/bias
        """
        gradients = {}
        dA, dW, db = self.affineBackwardLastLayer(dAL, Y, cache[self.num_layers])
        gradients[self.num_layers] = [dW, db]
        
        for i in range(self.num_layers - 1):
            layer = self.num_layers - 1 - i
            dA, dW, db = self.affineBackward(dA, cache[layer])
            gradients[layer] = [dW, db]
            
            #if self.drop_prob > 0:
                #call dropout_backward
            
        #if self.reg_lambda > 0:
            # add gradients from L2 regularization to each dW
        
        return gradients

    def updateParameters(self, gradients, alpha):
        """
        :param gradients: gradients for each weight/bias
        :param alpha: step size for gradient descent 
        """
        # W = W - alpha * gradients [for each layer]
        for layer in range(1, self.num_layers + 1):
            weight = self.parameters[layer][0]
            bias = self.parameters[layer][1]
            gradientW = gradients[layer][0]
            gradientB = gradients[layer][1]
            self.parameters[layer][0] = weight - alpha * gradientW
            self.parameters[layer][1] = bias - alpha * gradientB
    
    def calculateAccuracy(self, y_actual, y_prediction):
        correct = 0
        for i in range(len(y_actual)):
            if y_prediction[i] == y_actual[i]:
                correct = correct + 1
        accuracy = correct / len(y_actual) * 100
        return accuracy

    def train(self, X, y, iters=1000, alpha=0.0001, batch_size=100, print_every=100):
        """
        :param X: input samples, each column is a sample
        :param y: labels for input samples, y.shape[0] must equal X.shape[1]
        :param iters: number of training iterations
        :param alpha: step size for gradient descent
        :param batch_size: number of samples in a minibatch
        :param print_every: no. of iterations to print debug info after
        """
        # Shuffle data sets
        shuffledIndices = np.random.permutation(X.shape[1])
        X_shuffled = X[:, shuffledIndices]
        y_shuffled = y[shuffledIndices]
        
        # Split into training and validation sets
        splitIndex = int(.9 * len(X[0])) # 90% train, 10% validation
        X_train = X_shuffled[:, :splitIndex]
        y_train = y_shuffled[:splitIndex]
        X_validation = X_shuffled[:, splitIndex:]
        y_validation = y_shuffled[splitIndex:]
        
        # Train on batches, Test on train / validation sets
        startIndex = 0
        endIndex = startIndex + batch_size
        for i in range(0, iters):
            # get minibatch
            X_mini, y_mini = self.get_batch(X_train, y_train, batch_size, startIndex, endIndex)
            
            # forward prop
            AL, cache = self.forwardPropagation(X_mini)

            # compute loss
            cost, dAL = self.costFunction(AL, y_mini)

            # compute gradients
            gradients = self.backPropagation(dAL, y_mini, cache)

            # update weights and biases based on gradient
            self.updateParameters(gradients, alpha)

            if i % print_every == 0:
                # print cost, train and validation set accuracies
                print("Metrics for Iteration " + str(i))
                
                # cost
                print("     Cost: " + str(cost))
                
                # train accuracy
                y_train_prediction = self.predict(X_train)
                train_accuracy = self.calculateAccuracy(y_train, y_train_prediction)
                print("     Training accuracy: " + "{0:.3f}".format(train_accuracy) + " percent")
                
                # validation accuracy
                y_validation_prediction = self.predict(X_validation)
                validation_accuracy = self.calculateAccuracy(y_validation, y_validation_prediction)
                print("     Validation accuracy: " + "{0:.3f}".format(validation_accuracy) + " percent")

            # update indices
            startIndex = startIndex + batch_size
            endIndex = endIndex + batch_size
            if(endIndex > len(y_train)):
                shuffledIndices = np.random.permutation(X.shape[1])
                X_train = X[:, shuffledIndices]
                y_train = y[shuffledIndices]
                startIndex = 0
                endIndex = startIndex + batch_size
            
                
    def predict(self, X):
        """
        Make predictions for each sample
        """
        AL, cache = self.forwardPropagation(X)
        y_pred = np.argmax(AL, axis = 0)

        return y_pred

    def get_batch(self, X, y, batch_size, startIndex, endIndex):
        """
        Return minibatch of samples and labels
        
        :param X, y: samples and corresponding labels
        :param batch_size: minibatch size
        :returns: (tuple) X_batch, y_batch
        """
        X_batch = X[:, startIndex : endIndex]
        y_batch = y[startIndex : endIndex]
        
        return X_batch, y_batch

In [131]:
# Helper functions, DO NOT modify this

def get_img_array(path):
    """
    Given path of image, returns it's numpy array
    """
    return scipy.misc.imread(path)

def get_files(folder):
    """
    Given path to folder, returns list of files in it
    """
    filenames = [file for file in glob.glob(folder+'*/*')]
    filenames.sort()
    return filenames

def get_label(filepath, label2id):
    """
    Files are assumed to be labeled as: /path/to/file/999_frog.png
    Returns label for a filepath
    """
    tokens = filepath.split('/')
    label = tokens[-1].split('_')[1][:-4]
    if label in label2id:
        return label2id[label]
    else:
        sys.exit("Invalid label: " + label)

In [132]:
# Functions to load data, DO NOT change these

def get_labels(folder, label2id):
    """
    Returns vector of labels extracted from filenames of all files in folder
    :param folder: path to data folder
    :param label2id: mapping of text labels to numeric ids. (Eg: automobile -> 0)
    """
    files = get_files(folder)
    y = []
    for f in files:
        y.append(get_label(f,label2id))
    return np.array(y)

def one_hot(y, num_classes=10):
    """
    Converts each label index in y to vector with one_hot encoding
    """
    y_one_hot = np.zeros((y.shape[0], num_classes))
    y_one_hot[y] = 1
    return y_one_hot.T

def get_label_mapping(label_file):
    """
    Returns mappings of label to index and index to label
    The input file has list of labels, each on a separate line.
    """
    with open(label_file, 'r') as f:
        id2label = f.readlines()
        id2label = [l.strip() for l in id2label]
    label2id = {}
    count = 0
    for label in id2label:
        label2id[label] = count
        count += 1
    return id2label, label2id

def get_images(folder):
    """
    returns numpy array of all samples in folder
    each column is a sample resized to 30x30 and flattened
    """
    files = get_files(folder)
    images = []
    count = 0
    
    for f in files:
        count += 1
        if count % 10000 == 0:
            print("Loaded {}/{}".format(count,len(files)))
        img_arr = get_img_array(f)
        img_arr = img_arr.flatten() / 255.0
        images.append(img_arr)
    X = np.column_stack(images)

    return X

def get_train_data(data_root_path):
    """
    Return X and y
    """
    train_data_path = data_root_path + 'train'
    id2label, label2id = get_label_mapping(data_root_path + 'labels.txt')
    print(label2id)
    X = get_images(train_data_path)
    y = get_labels(train_data_path, label2id)
    return X, y

def save_predictions(filename, y):
    """
    Dumps y into .npy file
    """
    np.save(filename, y)

In [137]:
# Load the data
data_root_path = 'cifar10-hw1/'
X_train, y_train = get_train_data(data_root_path) # this may take a few minutes
X_test = get_images(data_root_path + 'test')
print('Data loading done')

{'automobile': 1, 'horse': 7, 'deer': 4, 'frog': 6, 'dog': 5, 'bird': 2, 'airplane': 0, 'truck': 9, 'cat': 3, 'ship': 8}
Loaded 10000/50000
Loaded 20000/50000
Loaded 30000/50000
Loaded 40000/50000
Loaded 50000/50000
Loaded 10000/10000
Data loading done


## Part 1

#### Simple fully-connected deep neural network
I achieve validation accuracy of 52.92% in final iteration.

In [133]:
layer_dimensions = [X_train.shape[0], 400, 200, 50, 10]  # including the input and output layers
NN = NeuralNetwork(layer_dimensions)
NN.train(X_train, y_train, iters=5000, alpha=0.001, batch_size=100, print_every=100)

Metrics for Iteration 0
     Cost: 2.31618804101
     Training accuracy: 10.762 percent
     Validation accuracy: 10.800 percent
Metrics for Iteration 100
     Cost: 2.00905667779
     Training accuracy: 21.529 percent
     Validation accuracy: 20.920 percent
Metrics for Iteration 200
     Cost: 2.18300802418
     Training accuracy: 29.156 percent
     Validation accuracy: 27.720 percent
Metrics for Iteration 300
     Cost: 1.94265341061
     Training accuracy: 28.356 percent
     Validation accuracy: 28.200 percent
Metrics for Iteration 400
     Cost: 1.86188646622
     Training accuracy: 33.318 percent
     Validation accuracy: 32.760 percent
Metrics for Iteration 500
     Cost: 1.82142407766
     Training accuracy: 37.420 percent
     Validation accuracy: 36.920 percent
Metrics for Iteration 600
     Cost: 1.75873650464
     Training accuracy: 36.560 percent
     Validation accuracy: 36.680 percent
Metrics for Iteration 700
     Cost: 1.76325894479
     Training accuracy: 33.868 per

In [143]:
y_predicted = NN.predict(X_test)
save_predictions('ans1-UNI', y_predicted)

In [145]:
# test if your numpy file has been saved correctly
loaded_y = np.load('ans1-UNI.npy')
print(loaded_y.shape)
loaded_y[:10]

(10000,)


array([3, 9, 0, 4, 5, 1, 9, 4, 8, 1])

## Part 2: Improving the performance

In [None]:
NN2 = NeuralNetwork(layer_dimensions, drop_prob=0, reg_lambda=0)
NN2.train(X_train, y_train, iters=1000, alpha=0.00001, batch_size=1000, print_every=10)

In [None]:
y_predicted2 = NN2.predict(X)
save_predictions(y_predicted, 'ans2-UNI')

Write down results for Part 2 here:
...