# Implementing and Analyzing Custom Loss Functions in PyTorch

This task sequence introduces the development of custom loss functions in PyTorch, with a focus on applying theoretical knowledge to practical implementation.

In [1]:
import torch
import torch.nn as nn

class L1Loss(nn.Module):
    """
    L1 Loss, also known as Mean Absolute Error (MAE).
    """
    def forward(self, y_pred, y_true):
        """
        Forward pass for L1 loss using PyTorch operations.

        :param y_pred: Predicted values (Tensor).
        :param y_true: Ground truth values (Tensor).
        :return: Scalar tensor representing the L1 loss.
        """
        return torch.mean(torch.abs(y_pred - y_true))


# Example usage
if __name__ == "__main__":
    # Define sample predicted values and ground truth values for testing the implementation
    y_pred = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
    y_true = torch.tensor([1.5, 2.5, 3.0, 4.5])

    # Initialize custom L1Loss
    criterion = L1Loss()

    # Compute the loss using L1Loss class and print it
    loss = criterion(y_pred, y_true)
    print(f"L1 Loss: {loss}")

    # Perform a backward pass to compute gradients (optional demonstration of PyTorch's autograd)
    loss.backward()
    print(f"Gradients on y_pred: {y_pred.grad}")

L1 Loss: 0.375
Gradients on y_pred: tensor([-0.2500, -0.2500,  0.0000, -0.2500])
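
As a quick sanity check, the custom loss can be compared with PyTorch's built-in `torch.nn.functional.l1_loss`, which averages over all elements by default. The sketch below is a minimal, self-contained version of that check; the helper name `l1_loss_manual` is introduced here for illustration and is not part of the notebook.

```python
import torch
import torch.nn.functional as F

def l1_loss_manual(y_pred, y_true):
    # Same formula as the custom L1Loss.forward above
    return torch.mean(torch.abs(y_pred - y_true))

y_pred = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
y_true = torch.tensor([1.5, 2.5, 3.0, 4.5])

manual = l1_loss_manual(y_pred, y_true)
builtin = F.l1_loss(y_pred, y_true)  # reduction="mean" by default

# Both values should agree with the 0.375 printed above
print(manual.item(), builtin.item())
```

Note that the third element has zero residual, where the absolute value is not differentiable; autograd reports a gradient of 0 there, which matches the `0.0000` entry in the printed gradients.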


In [2]:
class L2Loss(nn.Module):
    """
    L2 Loss, also known as Mean Squared Error (MSE).
    """
    def forward(self, y_pred, y_true):
        """
        Forward pass for L2 loss using PyTorch operations.
        :param y_pred: Predicted values (Tensor).
        :param y_true: Ground truth values (Tensor).
        :return: Scalar tensor representing the L2 loss.
        """
        return torch.mean((y_pred - y_true) ** 2)


# Example usage
if __name__ == "__main__":
    # Define sample predicted values and ground truth values for testing the implementation
    # Ensure y_pred and y_true are PyTorch tensors
    y_pred = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
    y_true = torch.tensor([1.5, 2.5, 3.0, 4.5])

    # Initialize custom L2Loss
    criterion = L2Loss()

    # Compute the loss using L2Loss class and print it
    loss = criterion(y_pred, y_true)
    print(f"L2 Loss: {loss}")

    # Perform a backward pass to compute gradients (optional demonstration of PyTorch's autograd)
    loss.backward()
    print(f"Gradients on y_pred: {y_pred.grad}")

L2 Loss: 0.1875
Gradients on y_pred: tensor([-0.2500, -0.2500,  0.0000, -0.2500])
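
The gradients printed for L2 are identical to those for L1, which can look suspicious. It is a coincidence of the sample data: the MSE gradient is $2(\hat{y}_i - y_i)/N$, and with residuals of $-0.5$ and $N = 4$ this equals $-0.25$, the same value as $\text{sign}(\hat{y}_i - y_i)/N$ for L1. The following sketch, which assumes the same sample tensors, checks the loss against `F.mse_loss` and the gradient against the analytic formula.

```python
import torch
import torch.nn.functional as F

y_pred = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
y_true = torch.tensor([1.5, 2.5, 3.0, 4.5])

# Same formula as the custom L2Loss.forward above
loss = torch.mean((y_pred - y_true) ** 2)
loss.backward()

# Analytic gradient of the mean squared error: 2 * (y_pred - y_true) / N
analytic_grad = 2 * (y_pred.detach() - y_true) / y_pred.numel()

matches_builtin = torch.allclose(loss, F.mse_loss(y_pred, y_true))
matches_analytic = torch.allclose(y_pred.grad, analytic_grad)
print(matches_builtin, matches_analytic)
```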


In [3]:
class BCELoss(nn.Module):
    """
    Binary Cross-Entropy (BCE) Loss implemented for PyTorch.
    Note: PyTorch already provides nn.BCELoss, but implementing it manually can be educational.
    """
    def forward(self, y_pred, y_true):
        """
        Forward pass for BCE loss using PyTorch operations.

        :param y_pred: Predicted probabilities (Tensor) with values in range [0, 1].
        :param y_true: Ground truth values (Tensor) with binary values 0 or 1.
        :return: Scalar tensor representing the BCE loss.
        """
        epsilon = 1e-12  # Small value to avoid numerical instability for log(0)
        log_p = torch.log(torch.clamp(y_pred, epsilon, 1.0))
        log_not_p = torch.log(torch.clamp(1 - y_pred, epsilon, 1.0))
        bce_loss = -torch.mean(y_true * log_p + (1 - y_true) * log_not_p)
        return bce_loss


# Example usage
if __name__ == "__main__":
    # Define sample predicted values and ground truth values for testing the implementation
    # Ensure y_pred and y_true are PyTorch tensors
    y_pred = torch.tensor([0.9, 0.2, 0.8, 0.1], requires_grad=True)
    y_true = torch.tensor([1, 0, 1, 0], dtype=torch.float32)

    # Initialize custom BCELoss
    criterion = BCELoss()

    # Compute the loss using BCELoss class and print it
    loss = criterion(y_pred, y_true)
    print(f"BCE Loss: {loss}")

    # Perform a backward pass to compute gradients (optional demonstration of PyTorch's autograd)
    loss.backward()
    print(f"Gradients on y_pred: {y_pred.grad}")

BCE Loss: 0.16425204277038574
Gradients on y_pred: tensor([-0.2778,  0.3125, -0.3125,  0.2778])
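
To confirm the manual formula, it can be compared with `F.binary_cross_entropy`, which expects probabilities, and with `F.binary_cross_entropy_with_logits`, which expects raw logits. The sketch below is illustrative only: the logits are recovered from the sample probabilities with `torch.logit` purely so that the three values can be compared; a real model would output the logits directly.

```python
import torch
import torch.nn.functional as F

y_pred = torch.tensor([0.9, 0.2, 0.8, 0.1])
y_true = torch.tensor([1.0, 0.0, 1.0, 0.0])

# Same formula as the custom BCELoss.forward above
eps = 1e-12
manual = -torch.mean(
    y_true * torch.log(torch.clamp(y_pred, eps, 1.0))
    + (1 - y_true) * torch.log(torch.clamp(1 - y_pred, eps, 1.0))
)

# Built-in loss on probabilities
builtin = F.binary_cross_entropy(y_pred, y_true)

# Built-in loss on logits (logits reconstructed here for illustration only)
logits = torch.logit(y_pred)
from_logits = F.binary_cross_entropy_with_logits(logits, y_true)

print(manual.item(), builtin.item(), from_logits.item())
```

When the model produces logits, the logit-based variant is usually the safer choice, since it avoids computing `log` of a probability that has already saturated to 0 or 1.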


In [4]:
class CELoss(nn.Module):
    """
    Implement the Cross-Entropy Loss for multi-class classification in PyTorch.
    """
    def __init__(self):
        super(CELoss, self).__init__()

    def forward(self, logits, targets):
        """
        Forward pass for Cross-Entropy loss.

        :param logits: Logits from the model (Tensor). Shape: [batch_size, num_classes].
        :param targets: Ground truth class indices (Tensor). Shape: [batch_size].
        :return: Scalar tensor representing the CE loss.
        """
        batch_size = logits.size(0)
        log_softmax = logits - torch.log(torch.sum(torch.exp(logits), dim=1, keepdim=True))
        ce_loss = -torch.sum(log_softmax[range(batch_size), targets]) / batch_size
        return ce_loss


# Example usage
if __name__ == "__main__":
    # Define sample predicted values and ground truth values for testing the implementation
    # Ensure y_pred and y_true are PyTorch tensors
    y_pred = torch.tensor([[0.5, 0.3, 0.2], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]], requires_grad=True)
    y_true = torch.tensor([0, 1, 2])  # Assuming the ground truth class indices

    # Initialize custom CELoss
    criterion = CELoss()

    # Compute the loss using CELoss class and print it
    loss = criterion(y_pred, y_true)
    print(f"CE Loss: {loss}")

    # Perform a backward pass to compute gradients (optional demonstration of PyTorch's autograd)
    loss.backward()
    print(f"Gradients on y_pred: {y_pred.grad}")

CE Loss: 0.8266606330871582
Gradients on y_pred: tensor([[-0.2031,  0.1066,  0.0965],
        [ 0.0830, -0.1661,  0.0830],
        [ 0.0955,  0.0955, -0.1909]])
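
The log-softmax in `CELoss` exponentiates the raw logits, which overflows for large values. The sketch below contrasts that computation with a variant based on `torch.logsumexp`; the function names `ce_naive` and `ce_stable` are hypothetical helpers introduced for this comparison. On the sample logits both agree with `F.cross_entropy`, while on a deliberately extreme input only the stable variant stays finite.

```python
import torch
import torch.nn.functional as F

def ce_naive(logits, targets):
    # Same computation as CELoss.forward above
    batch_size = logits.size(0)
    log_softmax = logits - torch.log(torch.sum(torch.exp(logits), dim=1, keepdim=True))
    return -torch.sum(log_softmax[torch.arange(batch_size), targets]) / batch_size

def ce_stable(logits, targets):
    # Illustrative variant: logsumexp avoids overflow inside exp()
    batch_size = logits.size(0)
    log_softmax = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -torch.sum(log_softmax[torch.arange(batch_size), targets]) / batch_size

logits = torch.tensor([[0.5, 0.3, 0.2], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
targets = torch.tensor([0, 1, 2])
reference = F.cross_entropy(logits, targets)
print(ce_naive(logits, targets).item(), ce_stable(logits, targets).item(), reference.item())

# Extreme logits: exp(1000) is inf in float32, so the naive version breaks down
big_logits = torch.tensor([[1000.0, 0.0]])
big_targets = torch.tensor([0])
naive_big = ce_naive(big_logits, big_targets)
stable_big = ce_stable(big_logits, big_targets)
print(naive_big.item(), stable_big.item())
```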


# Implementing Custom Activation Functions in PyTorch

Note: The backward pass for Softmax involves a full Jacobian rather than an elementwise derivative, so the Softmax class below implements only the forward pass and relies on PyTorch's autograd for the backward pass.

In [5]:
class ReLU(nn.Module):
    """
    Implement the ReLU activation function.
    """
    def __init__(self):
        super(ReLU, self).__init__()

    def forward(self, x):
        """
        Forward pass for ReLU.
        :param x: Input tensor.
        :return: Output tensor where ReLU(x) = max(0, x).
        """
        # Cache the input so backward() can tell where it was positive
        self.input = x
        return torch.maximum(torch.zeros_like(x), x)


    def backward(self, grad_output):
        """
        Backward pass for custom ReLU.
        :param grad_output: Gradient tensor of the output.
        :return: Gradient tensor for the input.
        """
        # Gradient of ReLU is 1 for input > 0; otherwise, it's 0
        mask = (self.input > 0).to(grad_output.dtype)
        return grad_output * mask


# Example usage
if __name__ == "__main__":
    # Define a sample input tensor
    x = torch.tensor([-1.0, 0.0, 1.0, 2.0], requires_grad=True)

    # Initialize the custom ReLU activation function
    custom_relu = ReLU()

    # Compute the activation using the custom ReLU class
    activated_x_custom = custom_relu(x)

    # Perform a backward pass to compute gradients for the custom implementation
    gradients_custom = custom_relu.backward(torch.ones_like(activated_x_custom))

    # Print the outputs and gradients from the custom implementation
    print("Custom ReLU output:", activated_x_custom)
    print("Custom ReLU gradients:", gradients_custom)

    # Reset gradients to zero before another backward pass
    x.grad = None

    # Compute the activation using PyTorch's built-in relu function
    activated_x_torch = torch.relu(x)

    # Perform a backward pass to compute gradients for PyTorch's implementation
    activated_x_torch.backward(torch.ones_like(x))
    gradients_torch = x.grad

    # Print the outputs and gradients from PyTorch's implementation
    print("PyTorch ReLU output:", activated_x_torch)
    print("PyTorch ReLU gradients:", gradients_torch)

Custom ReLU output: tensor([0., 0., 1., 2.], grad_fn=<MaximumBackward0>)
Custom ReLU gradients: tensor([0., 0., 1., 1.])
PyTorch ReLU output: tensor([0., 0., 1., 2.], grad_fn=<ReluBackward0>)
PyTorch ReLU gradients: tensor([0., 0., 1., 1.])
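
The `backward` method defined on the `ReLU` module above is called by hand; autograd never invokes it. One way to plug a hand-written gradient into autograd itself is `torch.autograd.Function`. The following is a minimal sketch of that approach, not the notebook's method; the class name `ReLUFunction` is introduced here for illustration.

```python
import torch

class ReLUFunction(torch.autograd.Function):
    """Illustrative ReLU with a hand-written gradient that autograd will call."""

    @staticmethod
    def forward(ctx, x):
        # Save the input so backward() knows where it was positive
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass the upstream gradient through only where the input was positive
        return grad_output * (x > 0).to(grad_output.dtype)

x = torch.tensor([-1.0, 0.0, 1.0, 2.0], requires_grad=True)

# Gradient via the custom Function
ReLUFunction.apply(x).backward(torch.ones_like(x))
custom_grad = x.grad.clone()

# Gradient via the built-in op
x.grad = None
torch.relu(x).backward(torch.ones_like(x))
builtin_grad = x.grad.clone()

print(custom_grad, builtin_grad)
```

With this pattern, calling `.backward()` on any downstream scalar runs the custom gradient automatically, so no manual `backward` call is needed.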


In [9]:
class Sigmoid(nn.Module):
    """
    Implement the Sigmoid activation function.
    """
    def __init__(self):
        super(Sigmoid, self).__init__()

    def forward(self, x):
        """
        Forward pass for Sigmoid.
        :param x: Input tensor.
        :return: Output tensor where Sigmoid(x) = 1 / (1 + exp(-x)).
        """
        self.output = 1 / (1 + torch.exp(-x))
        return self.output


    def backward(self, grad_output):
        """
        Backward pass for custom Sigmoid.
        :param grad_output: Gradient tensor of the output.
        :return: Gradient tensor for the input.
        """
        sigmoid_grad = self.output * (1 - self.output) * grad_output
        return sigmoid_grad


# Example usage
if __name__ == "__main__":
    # Define a sample input tensor
    x = torch.tensor([-1.0, 0.0, 1.0, 2.0], requires_grad=True)

    # Initialize the custom Sigmoid activation function
    custom_sigmoid = Sigmoid()

    # Compute the activation using the custom Sigmoid class
    activated_x_custom = custom_sigmoid(x)

    # Perform a backward pass to compute gradients for the custom implementation
    gradients_custom = custom_sigmoid.backward(torch.ones_like(activated_x_custom))

    # Print the outputs and gradients from the custom implementation
    print("Custom Sigmoid output:", activated_x_custom)
    print("Custom Sigmoid gradients:", gradients_custom)
    
    # Reset gradients to zero before another backward pass
    x.grad = None
    
    # Compute the activation using PyTorch's built-in sigmoid function
    activated_x_torch = torch.sigmoid(x)

    # Perform a backward pass to compute gradients for PyTorch's implementation
    activated_x_torch.backward(torch.ones_like(x))
    gradients_torch = x.grad

    # Print the outputs and gradients from PyTorch's implementation
    print("PyTorch Sigmoid output:", activated_x_torch)
    print("PyTorch Sigmoid gradients:", gradients_torch)

Custom Sigmoid output: tensor([0.2689, 0.5000, 0.7311, 0.8808], grad_fn=<MulBackward0>)
Custom Sigmoid gradients: tensor([0.1966, 0.2500, 0.1966, 0.1050], grad_fn=<MulBackward0>)
PyTorch Sigmoid output: tensor([0.2689, 0.5000, 0.7311, 0.8808], grad_fn=<SigmoidBackward0>)
PyTorch Sigmoid gradients: tensor([0.1966, 0.2500, 0.1966, 0.1050])
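
An all-ones upstream gradient only exercises the local derivative $\sigma(x)(1 - \sigma(x))$. A slightly stronger check uses an arbitrary upstream gradient and compares the manual formula with `torch.autograd.grad`. The sketch below assumes the same formulas as the `Sigmoid` class above; the values in `upstream` are made up for illustration.

```python
import torch

x = torch.tensor([-1.0, 0.0, 1.0, 2.0], requires_grad=True)
upstream = torch.tensor([0.5, -1.0, 2.0, 3.0])  # arbitrary illustrative upstream gradient

# Same formula as Sigmoid.forward
out = 1 / (1 + torch.exp(-x))

# Same formula as Sigmoid.backward, evaluated without tracking gradients
with torch.no_grad():
    manual_grad = out * (1 - out) * upstream

# Reference: vector-Jacobian product computed by autograd
(autograd_grad,) = torch.autograd.grad(out, x, grad_outputs=upstream)

print(manual_grad)
print(autograd_grad)
```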


In [10]:
class Tanh(nn.Module):
    """
    Implement the Tanh activation function.
    """
    def __init__(self):
        super(Tanh, self).__init__()

    def forward(self, x):
        """
        Forward pass for Tanh.
        :param x: Input tensor.
        :return: Output tensor where Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)).
        """
        self.output = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))
        return self.output
    

    def backward(self, grad_output):
        """
        Backward pass for custom Tanh.
        :param grad_output: Gradient tensor of the output.
        :return: Gradient tensor for the input.
        """
        # Gradient of tanh function
        grad_input = (1 - self.output ** 2) * grad_output
        return grad_input


# Example usage
if __name__ == "__main__":
    # Define a sample input tensor
    x = torch.tensor([-1.0, 0.0, 1.0, 2.0], requires_grad=True)

    # Initialize the custom Tanh activation function
    custom_tanh = Tanh()

    # Compute the activation using the custom Tanh class
    activated_x_custom = custom_tanh(x)

    # Perform a backward pass to compute gradients for the custom implementation
    gradients_custom = custom_tanh.backward(torch.ones_like(activated_x_custom))

    # Print the outputs and gradients from the custom implementation
    print("Custom Tanh output:", activated_x_custom)
    print("Custom Tanh gradients:", gradients_custom)
    
    # Reset gradients to zero before another backward pass
    x.grad = None
    
    # Compute the activation using PyTorch's built-in Tanh function
    activated_x_torch = torch.tanh(x)

    # Perform a backward pass to compute gradients for PyTorch's implementation
    activated_x_torch.backward(torch.ones_like(x))
    gradients_torch = x.grad

    # Print the outputs and gradients from PyTorch's implementation
    print("PyTorch Tanh output:", activated_x_torch)
    print("PyTorch Tanh gradients:", gradients_torch)

Custom Tanh output: tensor([-0.7616,  0.0000,  0.7616,  0.9640], grad_fn=<DivBackward0>)
Custom Tanh gradients: tensor([0.4200, 1.0000, 0.4200, 0.0707], grad_fn=<MulBackward0>)
PyTorch Tanh output: tensor([-0.7616,  0.0000,  0.7616,  0.9640], grad_fn=<TanhBackward0>)
PyTorch Tanh gradients: tensor([0.4200, 1.0000, 0.4200, 0.0707])
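
The textbook formula used in `Tanh.forward` works for moderate inputs but overflows for large magnitudes, because `exp(x)` becomes `inf` in float32 and the ratio turns into `inf / inf`. The sketch below shows this and one possible rewrite that only exponentiates non-positive numbers; `tanh_stable` is a hypothetical helper, not part of the notebook.

```python
import torch

def tanh_naive(x):
    # Same formula as Tanh.forward above
    return (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))

def tanh_stable(x):
    # tanh(x) = sign(x) * (1 - e^{-2|x|}) / (1 + e^{-2|x|})
    e = torch.exp(-2 * torch.abs(x))  # always in (0, 1], so it cannot overflow
    return torch.sign(x) * (1 - e) / (1 + e)

x = torch.tensor([-100.0, -1.0, 0.0, 1.0, 100.0])

naive = tanh_naive(x)      # the +/-100 entries come out as nan
stable = tanh_stable(x)
reference = torch.tanh(x)

print(naive)
print(stable)
print(reference)
```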


In [20]:
class Softmax(nn.Module):
    """
    Implement the Softmax activation function.
    """
    def __init__(self):
        super(Softmax, self).__init__()

    def forward(self, x, dim=1):
        """
        Forward pass for Softmax.
        :param x: Input tensor.
        :param dim: The dimension Softmax would be applied to.
        :return: Output tensor after applying Softmax.
        """
        # Subtract the maximum value in each row for numerical stability
        max_vals, _ = torch.max(x, dim=dim, keepdim=True)
        exp_x = torch.exp(x - max_vals)
        softmax_output = exp_x / torch.sum(exp_x, dim=dim, keepdim=True)
        return softmax_output


# Example usage
if __name__ == "__main__":
    # Define a sample input tensor
    x = torch.tensor([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]], requires_grad=True)

    # Initialize the custom Softmax activation function
    custom_softmax = Softmax()

    # Compute the activation using the custom Softmax class
    activated_x_custom = custom_softmax(x)

    # Print the outputs from the custom implementation
    print("Custom Softmax output:", activated_x_custom)
    
    # Compute the activation using PyTorch's built-in Softmax function
    activated_x_torch = torch.softmax(x, dim=1)
    
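    # Perform a backward pass through the custom Softmax output; the gradients are computed by PyTorch's autograd.
    # Each softmax row sums to 1, so with an all-ones upstream gradient the result is zero up to floating-point error.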
    activated_x_custom.backward(torch.ones_like(x))
    gradients_torch = x.grad
    
    # Print the outputs and gradients from PyTorch's implementation
    print("PyTorch Softmax output:", activated_x_torch)
    print("PyTorch Softmax gradients:", gradients_torch)

Custom Softmax output: tensor([[0.0900, 0.2447, 0.6652],
        [0.0900, 0.2447, 0.6652]], grad_fn=<DivBackward0>)
PyTorch Softmax output: tensor([[0.0900, 0.2447, 0.6652],
        [0.0900, 0.2447, 0.6652]], grad_fn=<SoftmaxBackward0>)
PyTorch Softmax gradients: tensor([[-8.0666e-09, -2.1927e-08,  2.9994e-08],
        [-8.0666e-09, -2.1927e-08,  2.9994e-08]])
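
The gradients above are essentially zero because every softmax row sums to 1, so an all-ones upstream gradient carries no information. For a general upstream gradient $g$, the Softmax Jacobian $\partial s_i / \partial x_j = s_i(\delta_{ij} - s_j)$ gives the backward formula $s \odot (g - \sum_j g_j s_j)$ per row. The sketch below checks that formula against autograd; the values in `upstream` are arbitrary and chosen only for illustration.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]], requires_grad=True)
upstream = torch.tensor([[1.0, 0.0, -1.0],
                         [0.5, 2.0, 0.0]])  # arbitrary illustrative upstream gradient

s = torch.softmax(x, dim=1)

# Manual vector-Jacobian product: s * (g - sum_j g_j * s_j), applied row by row
s_const = s.detach()
manual_grad = s_const * (upstream - (upstream * s_const).sum(dim=1, keepdim=True))

# Reference computed by autograd
(autograd_grad,) = torch.autograd.grad(s, x, grad_outputs=upstream)

print(manual_grad)
print(autograd_grad)
```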


# Task: Connecting Sigmoid and Softmax Functions

The sigmoid and softmax functions are foundational to machine learning, particularly in classification tasks. While the sigmoid function is traditionally used for binary classification, the softmax function generalizes this concept to multi-class problems. The sigmoid function can be seen as a special case of the softmax function when the output space consists of two classes.

Consider a binary classification problem and the general form of the softmax function for an arbitrary vector $\mathbf{z}$ with components $z_i$ for $i = 1, \ldots, K$ classes. The softmax function is defined as:

$$
\text{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^K e^{z_j}}
$$

The task is to demonstrate that the softmax function simplifies to the sigmoid function in the context of binary classification.

**Express the Softmax Function for Two Classes:**
Show the softmax function for a two-class system and define the components of the vector $\mathbf{z} $ as arbitrary logits without specifying any particular values.

For a two-class system, let $\mathbf{z} = [z_1, z_2]$. Then the softmax function becomes:
$$\text{softmax}(\mathbf{z})_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_2}}$$
$$\text{softmax}(\mathbf{z})_2 = \frac{e^{z_2}}{e^{z_1} + e^{z_2}}$$


**Derive the Sigmoid Function from Softmax:**
Simplify the expression for the probability of the first class and show how it is equivalent to the sigmoid function for an arbitrary logit.

For binary classification, let $z_1$ be the logit for the positive class (class 1) and $z_2$ the logit for the negative class (class 2). Since the probabilities must sum to one, we have:

$$ P(y=1|\mathbf{z}) + P(y=2|\mathbf{z}) = 1 $$

so only one of the two probabilities is free. Moreover, softmax is invariant to adding the same constant to every logit: multiplying the numerator and denominator by $e^{-z_2}$ leaves the ratio unchanged. Only the difference $z = z_1 - z_2$ matters, so we can set $z_1 = z$ and $z_2 = 0$ without loss of generality:

$$ P(y=1|\mathbf{z}) = \frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{e^{z_1 - z_2}}{e^{z_1 - z_2} + 1} = \frac{e^{z}}{e^{z} + e^{0}} = \frac{e^{z}}{e^{z} + 1} $$

Finally, dividing the numerator and denominator by $e^{z}$ gives the sigmoid function:

$$ P(y=1|\mathbf{z}) = \frac{e^{z}}{e^{z} + 1} = \frac{1}{1 + e^{-z}} = \text{sigmoid}(z) $$
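
The identity can also be checked numerically. The sketch below draws a few random logit pairs (the seed and sample size are arbitrary choices) and compares the first softmax probability with the sigmoid of the logit difference $z_1 - z_2$.

```python
import torch

torch.manual_seed(0)
z = torch.randn(5, 2)  # five arbitrary logit pairs [z_1, z_2]

# Probability of class 1 from the two-class softmax
p_softmax = torch.softmax(z, dim=1)[:, 0]

# Sigmoid applied to the logit difference z_1 - z_2
p_sigmoid = torch.sigmoid(z[:, 0] - z[:, 1])

print(p_softmax)
print(p_sigmoid)
```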