# Chapter 5 - Loss

## Categorical Cross-Entropy

$$
L_i = -\sum_j y_{i,j}\log(\hat{y}_{i,j})
$$

Where:
  * $L_i$ is the loss value for sample $i$
  * $i$ is the sample index
  * $j$ is the label/output index
  * $y$ is the target value
  * $\hat{y}$ is the predicted value

In [20]:
# imports
import math
import numpy as np
import nnfs
from nnfs.datasets import spiral_data

In [3]:
# data
softmax_output = [0.7, 0.1, 0.2] # y hat
target_output = [1, 0, 0] # y

In [4]:
# calculate loss function
loss = -(
    math.log(softmax_output[0]) * target_output[0] +
    math.log(softmax_output[1]) * target_output[1] +
    math.log(softmax_output[2]) * target_output[2]
)
print(loss)

0.35667494393873245


Since `target_output[1]` and `target_output[2]` are 0, their terms contribute nothing to the sum and can be omitted.
And since `target_output[0]` is 1, the remaining term simplifies to:

In [5]:
softmax_output = [0.7, 0.1, 0.2] # y hat
loss = -math.log(softmax_output[0])
print(loss)

0.35667494393873245
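
The same simplification can be written out from the first equation. This is a short sketch, assuming a one-hot target; the symbol $k$ (the index of the correct class for sample $i$) is introduced here only for illustration:

$$
L_i = -\sum_j y_{i,j}\log(\hat{y}_{i,j}) = -\log(\hat{y}_{i,k}), \quad \text{where } y_{i,k} = 1 \text{ and } y_{i,j} = 0 \text{ for } j \neq k
$$

For the example above, $k = 0$, so $L_i = -\log(0.7) \approx 0.357$.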


-------

In [6]:
# Probabilities for 3 samples
softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])

class_targets = [0, 1, 1]  # dog, cat, cat

indexed_distributions = zip(class_targets, softmax_outputs)
print(list(indexed_distributions))

for target_idx, distribution in zip(class_targets, softmax_outputs):
    print(distribution[target_idx])

[(0, array([0.7, 0.1, 0.2])), (1, array([0.1, 0.5, 0.4])), (1, array([0.02, 0.9 , 0.08]))]
0.7
0.5
0.9


In [7]:
# First index: row numbers from 0 to the last index. We have as many
# indices as distributions, so we can use a range().
# Second index: the target class for each row.
print(
    softmax_outputs[
        range(len(softmax_outputs)),
        class_targets
    ]
)


[0.7 0.5 0.9]
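
To make the indexing above less opaque, here is a minimal sketch comparing it with the explicit loop from the previous cell. The names `looped`, `rows`, and `indexed` are illustrative and do not appear elsewhere in the chapter.

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
class_targets = [0, 1, 1]

# Explicit loop: pick one confidence per row
looped = np.array([row[t] for row, t in zip(softmax_outputs, class_targets)])

# Integer-array indexing: selects the elements at (0, 0), (1, 1), (2, 1)
rows = np.arange(len(softmax_outputs))
indexed = softmax_outputs[rows, class_targets]

# Both approaches select the same values
print(np.array_equal(looped, indexed))
```

NumPy pairs the two index sequences element by element, which is why one row index is needed per target.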


In [8]:
print(
    -np.log(
        softmax_outputs[
            range(len(softmax_outputs)),
            class_targets
        ]
    )
)

[0.35667494 0.69314718 0.10536052]


In [9]:
# we want an average loss per batch to have an idea about how our model is doing during training
neg_log = -np.log(
    softmax_outputs[
        range(len(softmax_outputs)), class_targets
    ]
)
average_loss = np.mean(neg_log)
print(average_loss)

0.38506088005216804


Targets can be:
  * **one-hot encoded**: all values are zeros except for a 1 at the correct label's position
  * **sparse**: each value is the index of the correct class

We have to check whether the targets are one-hot encoded and handle them differently if that's the case.

The check can be performed by counting the dimensions:
  * if targets are single-dimensional (like a list), they are sparse
  * if there are 2 dimensions (like a list of lists), then there is a set of one-hot encoded vectors.

In the second case, we implement the first equation from this chapter instead of indexing the confidences at the target labels: multiply the confidences by the targets, which zeroes out every value except the one at the correct label, then sum along each row (axis 1).

To do this, we add a test for the number of dimensions to the code we just wrote, move the calculation of the log values outside this new if statement, and handle the one-hot encoded labels following the first equation:

In [10]:
softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
class_targets = np.array([[1, 0, 0],
                          [0, 1, 0],
                          [0, 1, 0]])


# Probabilities for target values
if len(class_targets.shape) == 1: # only if categorical labels
    correct_confidences = softmax_outputs[
        range(len(softmax_outputs)),
        class_targets
    ]
elif len(class_targets.shape) == 2: # only for one-hot encoded labels
    correct_confidences = np.sum(
        softmax_outputs * class_targets,
        axis=1
    )

neg_log = -np.log(correct_confidences)
average_loss = np.mean(neg_log)
print(average_loss)

0.38506088005216804
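
As a quick check that the two target formats are interchangeable, the sketch below converts sparse labels to one-hot vectors and compares both code paths. Using `np.eye` for the conversion is one common approach and an assumption here, not something the chapter prescribes; the variable names are illustrative.

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
sparse_targets = np.array([0, 1, 1])

# Indexing the identity matrix by label yields one one-hot row per label
n_classes = softmax_outputs.shape[1]
one_hot_targets = np.eye(n_classes, dtype=int)[sparse_targets]

# Sparse path: index the confidence at each target label
conf_sparse = softmax_outputs[np.arange(len(softmax_outputs)), sparse_targets]

# One-hot path: mask with the targets and sum along each row
conf_one_hot = np.sum(softmax_outputs * one_hot_targets, axis=1)

print(one_hot_targets)
print(np.allclose(conf_sparse, conf_one_hot))
```

Both paths pick out the same confidences, so the average loss is the same regardless of how the targets are encoded.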


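A confidence of exactly 0 at the target class is a problem, because the logarithm is unbounded near 0:

$$
\lim_{x\rightarrow 0^+} \log(x) = -\infty
$$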

In [11]:
-np.log(0) # RuntimeWarning: divide by zero encountered in log

inf

In [12]:
np.e**(-np.inf)

0.0

In [17]:
# Clip values into [1e-7, 1 - 1e-7] so the log never receives exactly 0 or 1
print(np.clip(0, 1e-7, 1 - 1e-7))
print(np.clip(1, 1e-7, 1 - 1e-7))

1e-07
0.9999999
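
The effect of clipping on the batch loss can be seen in a small sketch. The confidences below are hypothetical values chosen so that one sample assigns 0 to its target class.

```python
import numpy as np

# Hypothetical confidences at the target class for three samples;
# the second sample assigns 0 to the correct label
correct_confidences = np.array([0.7, 0.0, 0.9])

# Without clipping, one infinite sample loss makes the mean infinite
with np.errstate(divide='ignore'):  # silence the divide-by-zero warning
    raw_losses = -np.log(correct_confidences)
print(np.mean(raw_losses))

# With clipping, the worst-case sample loss is capped at -log(1e-7)
clipped = np.clip(correct_confidences, 1e-7, 1 - 1e-7)
clipped_losses = -np.log(clipped)
print(clipped_losses)
print(np.mean(clipped_losses))
```

With clipping, the second sample's loss is large (roughly 16.1) but finite, so the mean remains usable as a training signal.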


## Categorical Cross-Entropy Class

In [18]:
# Common loss class
class Loss:

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):

        # Calculate sample losses
        sample_losses = self.forward(output, y)

        # Calculate mean loss
        data_loss = np.mean(sample_losses)

        # Return loss
        return data_loss


# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):

        # Number of samples in a batch
        samples = len(y_pred)

        # Clip data to prevent log(0), which would give an infinite loss
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]

        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(
                y_pred_clipped * y_true,
                axis=1
            )

        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods

loss_function = Loss_CategoricalCrossentropy()
loss = loss_function.calculate(softmax_outputs, class_targets)
print(loss)


0.38506088005216804


## Combining Everything Together

In [23]:
nnfs.init()

# Dense layer
class Layer_Dense:

    # Layer initialization
    def __init__(self, n_inputs, n_neurons):
        # Initialize weights and biases
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases


# ReLU activation
class Activation_ReLU:

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs
        self.output = np.maximum(0, inputs)




# Softmax activation
class Activation_Softmax:

    # Forward pass
    def forward(self, inputs):

        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)

        self.output = probabilities


# Common loss class
class Loss:

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):

        # Calculate sample losses
        sample_losses = self.forward(output, y)

        # Calculate mean loss
        data_loss = np.mean(sample_losses)

        # Return loss
        return data_loss


# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):

        # Number of samples in a batch
        samples = len(y_pred)

        # Clip data to prevent log(0), which would give an infinite loss
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)



        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]

        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(
                y_pred_clipped*y_true,
                axis=1
            )

        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods



# Create dataset
X, y = spiral_data(samples=100, classes=3)

# Create Dense layer with 2 input features and 3 output values
dense1 = Layer_Dense(2, 3)

# Create ReLU activation (to be used with Dense layer):
activation1 = Activation_ReLU()

# Create second Dense layer with 3 input features (as we take output
# of previous layer here) and 3 output values
dense2 = Layer_Dense(3, 3)

# Create Softmax activation (to be used with Dense layer):
activation2 = Activation_Softmax()

# Create loss function
loss_function = Loss_CategoricalCrossentropy()

# Perform a forward pass of our training data through this layer
dense1.forward(X)

# Perform a forward pass through activation function
# it takes the output of first dense layer here
activation1.forward(dense1.output)


# Perform a forward pass through second Dense layer
# it takes outputs of activation function of first layer as inputs
dense2.forward(activation1.output)

# Perform a forward pass through activation function
# it takes the output of second dense layer here
activation2.forward(dense2.output)

# Let's see output of the first few samples:
print(activation2.output[:5])

# Perform a forward pass through loss function
# it takes the output of second activation function and returns loss
loss = loss_function.calculate(activation2.output, y)

# Print loss value
print('loss:', loss)

# Calculate accuracy from output of activation2 and targets
# argmax along axis 1 gives the predicted class index for each sample
predictions = np.argmax(activation2.output, axis=1)
if len(y.shape) == 2:
    y = np.argmax(y, axis=1)
accuracy = np.mean(predictions==y)

# Print accuracy
print('acc:', accuracy)

[[0.33333334 0.33333334 0.33333334]
 [0.3333332  0.3333332  0.33333364]
 [0.3333329  0.33333293 0.3333342 ]
 [0.3333326  0.33333263 0.33333477]
 [0.33333233 0.3333324  0.33333528]]
loss: 1.0986104
acc: 0.34
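
The printed loss can be sanity-checked against a simple baseline. The sketch below assumes a model that predicts a perfectly uniform distribution over the 3 classes, which is close to what the untrained network outputs above; `uniform` and `baseline_loss` are illustrative names.

```python
import numpy as np

n_classes = 3

# A perfectly uniform prediction assigns 1/3 to every class
uniform = np.full(n_classes, 1.0 / n_classes)

# Whatever the correct class is, its confidence is 1/3,
# so every sample loss is -log(1/3) = log(3)
baseline_loss = -np.log(uniform[0])
print(baseline_loss)
```

This gives approximately 1.0986, matching the loss printed above. Likewise, an accuracy of 0.34 is about what random guessing among 3 balanced classes would achieve, as expected before any training.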


[Final code for chapter](https://github.com/Sentdex/nnfs_book/tree/main/Chapter_5)