# Chapter 5 : Calculating Network Error with Loss

Pg. 111-130

## Subsections
1. Introduction
2. Categorical Cross-Entropy Loss
3. The Categorical Cross-Entropy Loss Class
4. Combining everything up to this point
5. Accuracy Calculation

# Introduction

### Loss Function
- also known as the cost function
- algorithm that quantifies how wrong a model is
- ideally we want the loss to be 0
- why not calculate the error of a model from argmax accuracy? because accuracy ignores how confident each prediction was
- we strive to increase the correct confidence and decrease misplaced confidence

# Categorical Cross-Entropy Loss

- used for classification problems
- model has softmax activation function for output layer
- this means the model outputs a probability distribution
- categorical cross-entropy compares two probability distributions: the "ground truth" probabilities and the predicted probabilities
- it is the most commonly used loss function with a softmax activation on the output layer
- equation: L_i = -sum_j(y_i,j * log(y_hat_i,j))
- L_i : loss for a single sample i
- y_i,j : ground truth probability of class j for sample i
- y_hat_i,j : predicted probability of class j for sample i
- we can simplify the equation further when the targets are one-hot encoded
- new equation: L_i = -log(correct_class_confidence), i.e. L_i = -log(y_hat_i,k), where k is the index of the true class

In [94]:
# example softmax output for cross-entropy loss
softmax_output = [0.7, 0.1, 0.2]

# example of it one-hot encoded
# means only 1 value is 1 and rest are 0
one_hot_targets = [1, 0, 0]

In [95]:
# example
import math

# example output from output layer of nn
softmax_output = [0.7, 0.1, 0.2]
# ground truth
target_output = [1, 0, 0]

loss = -(math.log(softmax_output[0])*target_output[0] + # 0.35667494393873245
         math.log(softmax_output[1])*target_output[1] + # 0
         math.log(softmax_output[2])*target_output[2]) # 0

print(f"Full Loss: {loss}")
print(f"1st Output Loss: {-(math.log(softmax_output[0])*target_output[0])}")
print(f"2nd Output Loss: {-(math.log(softmax_output[1])*target_output[1])}")
print(f"3rd Output Loss: {-(math.log(softmax_output[2])*target_output[2])}")

Full Loss: 0.35667494393873245
1st Output Loss: 0.35667494393873245
2nd Output Loss: 0.0
3rd Output Loss: 0.0


You can see from the outputs that only the 1st (true) class contributes a value and therefore matters. The rest don't even need to be calculated: their targets are 0, and anything multiplied by 0 is 0.
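
The shortcut can be checked directly: with a one-hot target, the full sum and the single negative log of the true-class confidence should agree. A minimal sketch, where `true_index` is a name introduced here for illustration:

```python
import math

softmax_output = [0.7, 0.1, 0.2]
target_output = [1, 0, 0]

# full categorical cross-entropy: sum over every class
full_loss = -sum(t * math.log(p) for p, t in zip(softmax_output, target_output))

# shortcut: only the true class contributes
true_index = target_output.index(1)  # illustrative name, not from the chapter
short_loss = -math.log(softmax_output[true_index])

print(full_loss, short_loss)
```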

In [96]:
# more analysis
print(f"Softmax Output [0]: {softmax_output[0]}")
print(f"target output[0] : {target_output[0]}")
print(f"Loss: {-(math.log(softmax_output[0])*target_output[0])}")

Softmax Output [0]: 0.7
target output[0] : 1
Loss: 0.35667494393873245


The great thing about cross entropy loss is it takes into account the confidence of the model. If the model is very confident and wrong, the loss will be very high. If the model is very confident and right, the loss will be very low. If the model is not confident, the loss will be somewhere in the middle.

In [97]:
# categorical cross-entropy loss takes into account confidence
higher_conf = [0.22, 0.6, 0.18]
lower_conf = [0.32, 0.36, 0.32]
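
Both distributions pick class 1 under argmax, so accuracy alone cannot tell them apart. A small sketch, assuming class 1 is the true class (that assumption is not stated in the notes), suggests that the loss can:

```python
import math

higher_conf = [0.22, 0.6, 0.18]
lower_conf = [0.32, 0.36, 0.32]
true_class = 1  # assumed target index, for illustration only

# argmax gives the same predicted class for both distributions
pred_high = higher_conf.index(max(higher_conf))
pred_low = lower_conf.index(max(lower_conf))

# the loss is smaller for the more confident correct prediction
loss_high = -math.log(higher_conf[true_class])
loss_low = -math.log(lower_conf[true_class])

print(pred_high, pred_low)
print(loss_high, loss_low)
```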

In [98]:
import math

# note: these are raw log values; the loss is the negation of each
print(f"Confidence of 1 has Loss value = {math.log(1.)}") # Confidence of 1 means 100% sure
# [1, 0, 0]
print(f"0.95 : {math.log(0.95)}")
# [0.95, 0.05, 0]
print(f"0.9 : {math.log(0.9)}")
print(f"0.8 : {math.log(0.8)}")
print(f"0.2 : {math.log(0.2)}")
print(f"0.1 : {math.log(0.1)}")
print(f"0.05 : {math.log(0.05)}")
print(f"0.01 : {math.log(0.01)}")

Confidence of 1 has Loss value = 0.0
0.95 : -0.05129329438755058
0.9 : -0.10536051565782628
0.8 : -0.2231435513142097
0.2 : -1.6094379124341003
0.1 : -2.3025850929940455
0.05 : -2.995732273553991
0.01 : -4.605170185988091
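
Negating those logarithms gives the actual loss values. As a sketch, the same confidences can be converted in one NumPy call to see that the loss grows as confidence in the correct class shrinks:

```python
import numpy as np

# confidence assigned to the correct class, from high to nearly absent
confidences = np.array([0.95, 0.9, 0.8, 0.2, 0.1, 0.05, 0.01])

# loss is the negative natural log of each confidence
losses = -np.log(confidences)

for c, l in zip(confidences, losses):
    print(f"confidence {c:.2f} -> loss {l:.4f}")
```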


### Log
- log is short for logarithm
- it solves for x in a^x = b
- example: 10^x = 100 --> x = log10(100) = 2
- Euler's number: e ≈ 2.71828
- logarithms with e as the base are known as natural logarithms: ln(x) = log(x) = log_e(x)
- any mention of log in these notes means the natural logarithm
- example: e^x = 5.2 --> x = log(5.2)
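
The two examples in the list above can be checked numerically. This sketch uses only the standard `math` module:

```python
import math

# 10^x = 100  ->  x = log10(100)
x_base10 = math.log10(100)

# e^x = 5.2  ->  x = ln(5.2); math.log defaults to base e
x_natural = math.log(5.2)

print(x_base10)
print(x_natural)
print(math.exp(x_natural))  # back to roughly 5.2
```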

In [99]:
import numpy as np

b = 5.2
print(np.log(b))

1.6486586255873816


In [100]:
# confirm this by exponentiating our result
import math

print(math.e ** 1.6486586255873816)

5.199999999999999


### Loss Calculation

We have 2 more things to do. First, handle batches of samples; second, make the negative log calculation pick the confidence at each sample's target index dynamically.

In [101]:
# probabilities for 3 samples
# batches
softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
class_targets = [0, # dog
                 1, # cat
                 1] # cat
# class target 0 intended for 0.7
# class target 1 intended for 0.5
# class target 1 intended for 0.9

for targ_idx, distribution in zip(class_targets, softmax_outputs):
    print(distribution[targ_idx])

0.7
0.5
0.9


In [102]:
# simplify above with numpy
softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
class_targets = np.array([0, 1, 1])
print(softmax_outputs[[0, 1, 2], class_targets])

[0.7 0.5 0.9]


In [103]:
# simplify above even more with range
print("Confidences: ")

print(softmax_outputs[
    range(len(softmax_outputs)), class_targets
])
# returns the confidence at the target index for each sample

Confidences: 
[0.7 0.5 0.9]
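
Indexing with two equal-length sequences pairs them element by element: row 0 with `class_targets[0]`, row 1 with `class_targets[1]`, and so on. As an alternative sketch (not from the chapter), `np.take_along_axis` performs the same per-row selection:

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
class_targets = np.array([0, 1, 1])

# paired integer indexing: (row i, column class_targets[i])
by_index = softmax_outputs[np.arange(len(softmax_outputs)), class_targets]

# same selection; the index array needs the same number of dimensions as the data
by_take = np.take_along_axis(
    softmax_outputs, class_targets[:, np.newaxis], axis=1
).ravel()

print(by_index)
print(by_take)
```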


In [104]:
# applying negative log to confidences
print("Negative Log: ")
print(-np.log(softmax_outputs[
    range(len(softmax_outputs)), class_targets
]))

Negative Log: 
[0.35667494 0.69314718 0.10536052]


In [105]:
# average loss per batch - numpy

neg_log = -np.log(softmax_outputs[
    range(len(softmax_outputs)), class_targets
])
print(f"Losses : {neg_log}")

# average loss
average_loss = np.mean(neg_log)
print(f"Average Loss: {average_loss}")


Losses : [0.35667494 0.69314718 0.10536052]
Average Loss: 0.38506088005216804


In [106]:
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
class_targets = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 1, 0]])

print(f"Shape of Class Targets : {len(class_targets.shape)}")
# if 2 dimensional, then it is one-hot encoded
# if 1 dimensional, then it is sparse

# probabilities for target values
# only if categorical labels
if len(class_targets.shape) == 1:
    print("Dealing with - Sparse")
    correct_confidences = softmax_outputs[
        range(len(softmax_outputs)),
        class_targets
    ]

# mask values - only for one-hot encoded labels
elif len(class_targets.shape) == 2:
    print("Dealing with - One-hot Encoded")
    correct_confidences = np.sum(
        softmax_outputs * class_targets,
        axis=1)

# losses
neg_log = -np.log(correct_confidences)

# average loss
average_loss = np.mean(neg_log)
print(f"Average Loss: {average_loss}")

Shape of Class Targets : 2
Dealing with - One-hot Encoded
Average Loss: 0.38506088005216804
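
To confirm that the two label formats lead to the same result, the branching logic can be wrapped in a function and called with both. `mean_cross_entropy` is a name made up for this sketch, not something from the chapter:

```python
import numpy as np

def mean_cross_entropy(probs, targets):
    """Average categorical cross-entropy for sparse or one-hot targets."""
    targets = np.asarray(targets)
    if targets.ndim == 1:      # sparse class indices
        correct = probs[np.arange(len(probs)), targets]
    elif targets.ndim == 2:    # one-hot rows
        correct = np.sum(probs * targets, axis=1)
    else:
        raise ValueError("targets must be 1- or 2-dimensional")
    return np.mean(-np.log(correct))

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])

sparse_loss = mean_cross_entropy(softmax_outputs, [0, 1, 1])
one_hot_loss = mean_cross_entropy(softmax_outputs,
                                  [[1, 0, 0], [0, 1, 0], [0, 1, 0]])
print(sparse_loss, one_hot_loss)
```

Raising an error for any other number of dimensions avoids the silent `NameError` the original `if`/`elif` would hit if neither branch ran.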


In [107]:
# dealing with confidence of 0
import numpy as np

print(-np.log(0))
# emits "RuntimeWarning: divide by zero encountered in log"
# log(0) is undefined; NumPy returns -inf, so the negated value is inf

inf



In [108]:
# a confidence of 0 can't be passed to log, so clip predictions on both sides first
# clipping the upper side by the same amount means a confidence of 1 gives a loss very close to 0 rather than exactly 0
print(-np.log(1-1e-7))

1.0000000494736474e-07


In [109]:
# use numpy to clip
y_pred = [0, 1, 0]

y_pred_clipped = np.clip(y_pred, 1e-7, 1-1e-7)
print(y_pred_clipped)

[1.000000e-07 9.999999e-01 1.000000e-07]
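
With clipping in place, a prediction that puts zero confidence on the true class yields a large but finite loss instead of `inf`. A minimal sketch, assuming the true class is index 0:

```python
import numpy as np

y_pred = np.array([0.0, 1.0, 0.0])
true_class = 0  # assumed target index, for illustration only

# clip to [1e-7, 1 - 1e-7] so log never sees an exact 0
y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
loss = -np.log(y_pred_clipped[true_class])

print(loss)  # about 16.12, i.e. -log(1e-7)
```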


# The Categorical Cross-Entropy Loss Class

In [110]:
# common loss class
class Loss:

    # calculates data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):

        # calculate sample losses
        sample_losses = self.forward(output, y)
        print("Sample Losses: ", sample_losses)

        # calculate mean loss
        data_loss = np.mean(sample_losses)
        print(f"Data Loss: {data_loss}")

        # return loss
        return data_loss

In [111]:
# cross-entropy loss
# inherits from the Loss class
class Loss_CategoricalCrossentropy(Loss):

    # forward pass
    def forward(self, y_pred, y_true):

        # num of samples in batch
        samples = len(y_pred)
        print(f"Samples: {samples}")

        # clip data to prevent log(0), which NumPy reports as a divide-by-zero warning
        # clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1-1e-7)
        print(f"Clipped Predictions: {y_pred_clipped}")

        # probabilities for target values
        # only if categorical labels
        if len(y_true.shape) == 1:
            print("Dealing with - Sparse")
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]
            print(f"Correct Confidences: {correct_confidences}")
        
        # mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            print("Dealing with - One-hot Encoded")
            correct_confidences = np.sum(
                y_pred_clipped * y_true,
                axis=1
            )
            print(f"Correct Confidences: {correct_confidences}")
        
        # losses
        negative_log_likelihoods = -np.log(correct_confidences)
        print(f"Negative Log Likelihoods: {negative_log_likelihoods}")

        return negative_log_likelihoods

In [112]:
loss_function = Loss_CategoricalCrossentropy()
loss = loss_function.calculate(softmax_outputs, class_targets)
print(f"Loss: {loss}")

Samples: 3
Clipped Predictions: [[0.7  0.1  0.2 ]
 [0.1  0.5  0.4 ]
 [0.02 0.9  0.08]]
Dealing with - One-hot Encoded
Correct Confidences: [0.7 0.5 0.9]
Negative Log Likelihoods: [0.35667494 0.69314718 0.10536052]
Sample Losses:  [0.35667494 0.69314718 0.10536052]
Data Loss: 0.38506088005216804
Loss: 0.38506088005216804


# Combining everything up to this point

In [119]:
import numpy as np
import nnfs
from nnfs.datasets import spiral_data

nnfs.init()

# dense layer
class Layer_Dense:

    # layer initialization
    def __init__(self, n_inputs, n_neurons):
        # initialize weights and biases
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
    
    # forward pass
    def forward(self, inputs):
        # calculate output values from inputs, weights, and biases
        self.output = np.dot(inputs, self.weights) + self.biases

# ReLU activation
class Activation_ReLU:

    # forward pass
    def forward(self, inputs):
        # calculate output values from inputs
        self.output = np.maximum(0, inputs)

# softmax activation
class Activation_Softmax:

    # forward pass
    def forward(self, inputs):

        # get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))

        # normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)

        self.output = probabilities

# common loss class
class Loss:

    # calculates data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):

        # calculate sample losses
        sample_losses = self.forward(output, y)
        #print("Sample Losses: ", sample_losses)

        # calculate mean loss
        data_loss = np.mean(sample_losses)
        #print(f"Data Loss: {data_loss}")

        # return loss
        return data_loss
        
# cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):
     
     # forward pass
    def forward(self, y_pred, y_true):
         
        # number of samples in a batch
        samples = len(y_pred)
         
        # clip data to prevent log(0), which NumPy reports as a divide-by-zero warning
        # clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
         
        # probabilities for target values
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[range(samples), y_true]
             
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)
         
        # losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods
    
# create dataset
X, y = spiral_data(samples=100, classes=3)

# create dense layer with 2 input features and 3 output values
dense1 = Layer_Dense(2, 3)

# create ReLU activation (to be used with Dense Layer)
activation1 = Activation_ReLU()

# create second dense layer with 3 input features (as we take output of previous layer here) and 3 output values
dense2 = Layer_Dense(3, 3)

# create softmax activation (to be used with Dense Layer)
activation2 = Activation_Softmax()

# create loss function
loss_function = Loss_CategoricalCrossentropy()

# perform a forward pass of our training data through this layer
dense1.forward(X)

# perform a forward pass through activation function
# takes the output of first dense layer here
activation1.forward(dense1.output)

# perform a forward pass through second Dense Layer
# takes outputs of activation function of first layer as inputs
dense2.forward(activation1.output)

# perform a forward pass through activation function
# takes the output of second dense layer here
activation2.forward(dense2.output)

# let's see output of the first few samples:
print(f"Output of Few Samples : \n{activation2.output[:5]}")

# perform a forward pass through loss function
# takes the output of second dense layer's activation here and returns loss
loss = loss_function.calculate(activation2.output, y)

# print loss value
print('Loss:', loss)


Output of Few Samples : 
[[0.33333334 0.33333334 0.33333334]
 [0.33333316 0.3333332  0.33333364]
 [0.33333287 0.3333329  0.33333418]
 [0.3333326  0.33333263 0.33333477]
 [0.33333233 0.3333324  0.33333528]]
Loss: 1.0986104


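Each class gets a confidence of about 0.33 because the model's weights are randomly initialized, so its predictions are close to a uniform guess. The loss is poor for the same reason: the model has not learned anything yet.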
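
As a quick check of the loss value above, a perfectly uniform guess over three classes can be evaluated directly:

```python
import numpy as np

# every class gets probability 1/3, so the true class does too
uniform_loss = -np.log(1 / 3)
print(uniform_loss)  # approximately 1.0986
```

The small gap to the printed `1.0986104` is expected, since the network's outputs are only approximately uniform and `nnfs.init()` switches NumPy's default dtype to float32.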

# Accuracy Calculation

In [117]:
import numpy as np

# probabilities of 3 samples
softmax_outputs = np.array([[0.7, 0.2, 0.1],
                            [0.5, 0.1, 0.4],
                            [0.02, 0.9, 0.08]])
# target (ground-truth) labels for 3 samples
class_targets = np.array([0, 1, 1])

# calculate values along second axis (axis of index 1)
predictions = np.argmax(softmax_outputs, axis=1)
print(f"Predictions: {predictions}")
# if targets are one-hot encoded - convert them
if len(class_targets.shape) == 2:
    class_targets = np.argmax(class_targets, axis=1)
    print(f"Class Targets (one hot encode): {class_targets}")
# True evaluates to 1; False to 0
accuracy = np.mean(predictions == class_targets)
print(f"Accuracy: {accuracy}")

Predictions: [0 0 1]
Accuracy: 0.6666666666666666
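
The one-hot branch in the cell above is never exercised because the targets are sparse. A sketch of the same calculation with one-hot targets, which should give the same accuracy:

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.2, 0.1],
                            [0.5, 0.1, 0.4],
                            [0.02, 0.9, 0.08]])
# same labels as before, written as one-hot rows
class_targets = np.array([[1, 0, 0],
                          [0, 1, 0],
                          [0, 1, 0]])

predictions = np.argmax(softmax_outputs, axis=1)

# convert one-hot rows back to class indices
if class_targets.ndim == 2:
    class_targets = np.argmax(class_targets, axis=1)

accuracy = np.mean(predictions == class_targets)
print(accuracy)
```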


In [118]:
# code to add to Full Code up to this Point

# calculate accuracy from output of activation2 and targets
# calculate values along second axis (axis=1)
predictions = np.argmax(activation2.output, axis=1)
if len(y.shape) == 2:
    y = np.argmax(y, axis=1)
accuracy = np.mean(predictions == y)

# print accuracy
print('Accuracy:', accuracy)

Accuracy: 0.34
