## Exercise - DL Tutorial 03

Please complete the following notebook and submit your solutions to manuel.milling@informatik.uni-augsburg.de and maurice.gerczuk@informatik.uni-augsburg.de by May 05.

## student name: Anastasia Karsten, Pavlo Mospan

Load mnist data 

In [1]:
import numpy as np
#numpy random seed
np.random.seed(42)

trainx, trainy, testx, testy = np.load('mnist.npy', allow_pickle=True)
print("Trainx shape: {}".format(trainx.shape))
print("Trainy shape: {}".format(trainy.shape))
print("Testx shape:  {}".format(testx.shape))
print("Testy shape:  {}".format(testy.shape))


Trainx shape: (60000, 784)
Trainy shape: (60000,)
Testx shape:  (10000, 784)
Testy shape:  (10000,)


1.   Implement sigmoid-function.


In [2]:
def sigmoid(X):
    """
    X: input: shape (num_samples, num_neurons)
    return: element-wise application of sigmoid function
    """
    return(1/(1+np.exp(-X)))

# test = np.array([[2,3], [4,5]])
# print(sigmoid(test))


2.   Implement forward propagation for one layer.

In [3]:
def fcc_one_layer(H, W, b, activation):
    """
    H: input: shape (num_samples, num_neurons_in)
    W: weights: shape (num_neurons_in, num_neurons_out)
    b: bias: shape (num_neurons_out,)
    activation: activation function: python method
    return: forward propagation of layer.
    """
    return(activation(np.add(np.matmul(H, W),b)))

3.  Implement the softmax function

In [4]:
def softmax(X):
    """
    X: input: shape (num_samples, num_neurons)
    return: softmax function applied to neuron axis.
    """
    return(np.exp(X) / np.sum(np.exp(X), axis=0))

# test = np.array([[2,3], [4,5]])
# print(softmax(test))

4.   Implement the neural network class with weight- and bias-initialization. Note: First initialise all weights, then all biases in order to compare your results for the given random seed. 

5.   Implement the full forward propagation for our neural network and the given architecture.

In [5]:
class fcc:
    def __init__(self, n_input, n_hidden1, n_hidden2, n_out):
        """
        n_input: number of neurons in input layer: int
        n_hidden1: number of neurons in hidden layer 1: int
        n_hidden2: number of neurons in hidden layer 2: int
        n_out: number of neurons in output layer: int
        """
        self.W1 = np.random.randn(n_input, n_hidden1)
        self.W2 = np.random.randn(n_hidden1, n_hidden2)
        self.W3 = np.random.randn(n_hidden2, n_out)

        # self.b0 = np.random.randn(n_input)
        self.b1 = np.random.randn(n_hidden1)
        self.b2 = np.random.randn(n_hidden2)
        self.b3 = np.random.randn(n_out)
        
        self.numberOfParams = self.W1.size + self.W2.size + self.W3.size + len(self.b1) + len(self.b2) + len(self.b3)


    def forward_propagation(self, X):
        """
        X: input: shape (num_samples, num_pixels)
        return: predicition of the neural network
        """
        step1 = fcc_one_layer(X, self.W1, self.b1, sigmoid)
        step2 = fcc_one_layer(step1, self.W2, self.b2, sigmoid)
        return(fcc_one_layer(step2, self.W3, self.b3, softmax))


6.   Implement a function for the cross-entropy.

In [6]:
def cross_entropy(predictions, labels):
    """
    predictions: predicted probabilities for classes: shape (num_samples, num_classes)
    labels: correct classes: shape (num_samples,)
    return: cross_entropy averaged across all samples 
    """
    # return np.sum(-np.log(predictions[range(len(labels))], labels))/len(labels)
    
    m = labels.shape[0]
    log_likelihood = -np.log(predictions[range(m),labels])
    loss = np.sum(log_likelihood) / m
    return loss


7.   Implement a function for the accuracy.

In [7]:
def accuracy(predictions, labels):
    """
    predictions: predicted probabilities for classes: shape (num_samples, num_classes)
    labels: correct classes: shape (num_samples,)
    return: accuracy of the predictions
    """
    rights = 0

    pr = np.argmax(predictions, axis=1)
    for i in range(len(pr)):
        if pr[i] == labels[i]:
            rights += 1
    return rights/len(labels)

8.   Evaluate the loss and the accuracy of your network.

In [8]:
network = fcc(784, 400, 400, 10)
predictions = network.forward_propagation(trainx)
print("Cross-Entropy: ", cross_entropy(predictions, trainy))
print("Accuracy:", accuracy(predictions, trainy))

Cross-Entropy:  28.691579283356656
Accuracy: 0.09853333333333333


9.   Which are the parameters we can tune to improve the performance of our network? How many trainable parameters (scalars) does our network have in total?

In [9]:
# We can tune parameters of hidden layers (hidden1, hidden2), since they brin complexity to the model. 
# We could make less / more neurons in each layer.

# Trainable parameters : 
# n_iput * n_hidden1 + b1 + n_hidden1 * n_hidden2 + b2 + n_hidden2 * out + b3 = 
# = 784 * 400 + 400 + 400 * 400 + 400 + 400 * 10 + 10 = 478,410

network.numberOfParams

478410

10.  Why did we implement two different evaluation metrics of our system (cross-entropy and accuracy)? What are the main differences between the two and why can’t/shouldn’t we use them interchangeably?

Accuracy and cross-entropy measure different things. Cross-entropy loss is the difference between the predicted value of model and the true value, thus awarding lower loss to predictions which are closer to the class label. 
<b>A loss function is used to optimize a machine learning algorithm.</b>

The accuracy, on the other hand, is a binary true/false for a particular sample. It tends to increase with the decrease in loss. <b>An accuracy metric is used to measure the algorithm’s performance (accuracy) in an interpretable way.</b> Loss is not always a good and trustworthy performance measure, cause if 90% of the samples are “red”, then the model would have good accuracy score if it simply predicts “red” every time.

So, cross-entropy is a continuous variable i.e. it’s best when predictions are close to 1 (for true labels) and close to 0 (for false ones) and accuracy is discrete valued.

Sometimes loss is getting better while accuracy is getting worse, or vice versa. If the model becomes over-confident in its predictions, a single false prediction will increase the loss unproportionally compared to the (minor) drop in accuracy. An over-confident (overfit) model can have good accuracy but bad loss.

The other reason why we cannot use them both inerchangeably is the fact, that accuracy isn’t differentiable so it can’t be used for back-propagation by the learning algorithm and we need a differentiable loss function to act as a good proxy for accuracy.