## __Loss Function__

Let's start with the code from the previous notebook:

In [1]:
```python
import numpy as np

class Dense:
    """Fully connected layer: output = inputs @ weights + biases."""
    def __init__(self, input_neurons, output_neurons):
        self.weights = np.random.randn(input_neurons, output_neurons)
        self.biases = np.zeros((1, output_neurons))

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases

class ReLu:
    def forward(self, inputs):
        # Zero out negative values, keep positive ones unchanged
        self.output = np.maximum(0, inputs)

class Softmax:
    def forward(self, inputs):
        # Subtract each row's max before exponentiating to avoid overflow
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize each row so its probabilities sum to 1
        self.output = exp_values / np.sum(exp_values, axis=1, keepdims=True)

# 784 inputs -> 128 -> 64 -> 10 class scores
layer_one = Dense(784, 128)
layer_two = Dense(128, 64)
layer_three = Dense(64, 10)

relu_one = ReLu()
relu_two = ReLu()
softmax = Softmax()
```

So let's learn how loss is calculated. Consider a picture of a ball that can be red, green, or blue. A model looking at it will output a probability distribution over the three colors, such as $\hat{y}=[0.7, 0.1, 0.2]$, where the values are the predicted probabilities for green, red, and blue respectively. We use these probabilities instead of plain accuracy because, besides telling us whether the model's prediction was wrong, they tell us how wrong it was, and they let training push the model to become more confident in its correct answers. The label of that picture is green, which can be written as the one-hot distribution $y=[1, 0, 0]$. To calculate the loss for a sample, we take the natural log of each value in $\hat{y}$, multiply it by the corresponding value in $y$, add the products up, and multiply the sum by -1:

$L = -(1 \times \ln(0.7) + 0 \times \ln(0.1) + 0 \times \ln(0.2)) \approx 0.3567$
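The arithmetic for this example can be checked in a few lines of Python, using the same $\hat{y}$ and $y$ defined above (ordered green, red, blue):

```python
import math

# Predicted distribution and one-hot label from the example: [green, red, blue]
y_hat = [0.7, 0.1, 0.2]
y = [1, 0, 0]

# Categorical cross-entropy: -sum(y_i * ln(y_hat_i))
loss = -sum(y_i * math.log(p_i) for y_i, p_i in zip(y, y_hat))
print(round(loss, 4))  # 0.3567
```

Note that the loss is positive: $\ln(0.7)$ is negative, and the leading minus sign flips it.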

### Loss function: $L = -\sum_{i} y_{i} \ln(\hat{y}_{i})$
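To see the formula applied to a whole batch, here is a minimal NumPy sketch with a made-up batch of three samples and one-hot labels (the values are hypothetical). Summing $y_i \ln(\hat{y}_i)$ across each row gives the per-sample loss, and because each row of $y$ is one-hot, the same numbers come out of simply picking the probability of the true class:

```python
import numpy as np

# Hypothetical batch of 3 predictions over 3 classes (each row sums to 1)
y_hat = np.array([[0.7, 0.1, 0.2],
                  [0.1, 0.5, 0.4],
                  [0.02, 0.9, 0.08]])
# One-hot true labels, one row per sample
y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0]])

# Per-sample loss: -sum over classes of y_i * ln(y_hat_i)
per_sample = -np.sum(y * np.log(y_hat), axis=1)

# Because y is one-hot, only the true-class term survives the sum
true_class = np.argmax(y, axis=1)
shortcut = -np.log(y_hat[np.arange(len(y_hat)), true_class])

print(per_sample)
print(np.allclose(per_sample, shortcut))  # True
print(per_sample.mean())                  # average loss over the batch
```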

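Since $y$ is one-hot, every term except the one for the true class is multiplied by 0, so the loss of a sample simplifies to the negative natural log of the probability the model predicted for the true class: $L = -\ln(\hat{y}_{k})$, where $k$ is the index of the correct class. Now let's implement the loss.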

In [2]:
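```python
class Loss:
    def calculate(self, y_pred, y_true):
        # y_true holds one integer class label per sample
        samples = len(y_pred)
        # Clip so log() never receives 0 (which would give -inf)
        y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
        # For each row i, pick the predicted probability of class y_true[i]
        correct_confidences = y_pred[range(samples), y_true]
        # Mean negative log-likelihood over the batch
        return -np.mean(np.log(correct_confidences))
```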

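First we get the number of samples in the batch, so that we can index every sample at once. We clip the predicted values to the range $[10^{-7}, 1-10^{-7}]$, because if the model predicts exactly 0 for the correct class, $\ln(0)$ is $-\infty$ and the loss becomes infinite. Then, using NumPy's integer-array indexing (row $i$, column `y_true[i]`), we extract the predicted probability of the correct class for each sample; this means `y_true` must contain integer class labels rather than one-hot vectors. Finally we take the negative log of those confidences and average them to get the loss for the whole batch, which is returned. Now let's load the dataset and run it through the network.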
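Before running it on the real data, here is a small sketch of what `Loss.calculate` does on a hand-made batch. The class is copied so the snippet runs on its own, and the batch values are hypothetical. It shows the integer-array indexing, what clipping prevents when a true-class probability is exactly 0, and that one-hot labels would first need converting with `np.argmax`:

```python
import numpy as np

# Same logic as the Loss class above, repeated so this snippet is self-contained
class Loss:
    def calculate(self, y_pred, y_true):
        samples = len(y_pred)
        # Keep probabilities away from exactly 0 and exactly 1
        y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
        # Row i, column y_true[i]: predicted probability of the true class
        correct_confidences = y_pred[range(samples), y_true]
        return -np.mean(np.log(correct_confidences))

# Hypothetical batch: 3 samples, 3 classes
y_pred = np.array([[0.7, 0.1, 0.2],
                   [0.1, 0.5, 0.4],
                   [0.0, 0.9, 0.1]])  # the third sample gives its true class 0
y_true = np.array([0, 1, 0])          # integer class labels, not one-hot

# Integer-array indexing picks one entry per row: 0.7, 0.5 and 0.0
print(y_pred[range(3), y_true])

loss = Loss().calculate(y_pred, y_true)
print(loss)  # finite, because 0.0 was clipped up to 1e-7

# Without clipping, the third sample contributes -log(0) = inf
with np.errstate(divide="ignore"):
    print(-np.mean(np.log(y_pred[range(3), y_true])))  # inf

# One-hot labels can be converted to the integer form this class expects
y_one_hot = np.eye(3)[y_true]
print(Loss().calculate(y_pred, np.argmax(y_one_hot, axis=1)) == loss)  # True
```

The clipped sample still dominates the average (its loss is $-\ln(10^{-7}) \approx 16.1$), so clipping keeps the number finite without hiding that the prediction was badly wrong.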

In [3]:
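```python
import pickle

# Load the dataset and flatten each sample into a single row of 784 values
with open('dataset.p', 'rb') as file:
    X, y = pickle.load(file)
X = X.reshape(X.shape[0], -1)

loss_function = Loss()

# Forward pass: Dense -> ReLU -> Dense -> ReLU -> Dense -> Softmax
layer_one.forward(X)
relu_one.forward(layer_one.output)
layer_two.forward(relu_one.output)
relu_two.forward(layer_two.output)
layer_three.forward(relu_two.output)
softmax.forward(layer_three.output)

# Compare the predicted probabilities with the integer labels
loss = loss_function.calculate(softmax.output, y)
print(loss)
```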

14.53908326617032


The loss comes out at around 14.5. That is to be expected, since we have not trained the model yet and its weights are still random. Training is what we will do next time, when we implement backpropagation.
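For a sense of scale, here is a minimal sketch (the batch size is hypothetical) of the loss a model would get by assigning equal probability to all 10 classes, along with the largest per-sample value the clipping in `Loss` allows. A result around 14 sits far above the uniform baseline and not far below that cap, which suggests the untrained network assigns very low probability to the correct class on most samples rather than being merely uncertain.

```python
import numpy as np

num_classes = 10  # matches the 10 outputs of layer_three
samples = 5       # hypothetical batch size

# A "know-nothing" model: equal probability for every class
uniform = np.full((samples, num_classes), 1 / num_classes)
labels = np.arange(samples) % num_classes  # any labels give the same result

baseline = -np.mean(np.log(uniform[range(samples), labels]))
print(baseline)         # ln(10), about 2.3026

# Largest per-sample loss the clipped Loss class can report
print(-np.log(1e-7))    # about 16.12
```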