# HW1-B. A weird operation function

## About this notebook

This notebook was used in the 50.039 Deep Learning course at the Singapore University of Technology and Design.

**Author:** Matthieu DE MARI (matthieu_demari@sutd.edu.sg)

**Version:** 1.0 (01/02/2025)

**Requirements:**
- Python 3
- Matplotlib
- Numpy
- Pandas
- Torch
- Torchmetrics

## 0. Imports and CUDA

In [1]:
%%capture
! pip install torchmetrics

In [2]:
# Matplotlib
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
# Numpy
import numpy as np
# Pandas
import pandas as pd
# Torch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchmetrics.classification import BinaryAccuracy
# Helper functions (additional file)
from helper_functions import *
#from hidden_functions import *

In [3]:
# Use GPU if available, else use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


## 4. Writing a custom operation function

In this section, we will consider a weird operation function $f(x)$, whose behavior is described as follows:

$$ f(x) = w \cdot x + b + \alpha \cdot \tanh(w' \cdot x + b') $$

Here, $w$ and $b$ are the weights and biases learned by an `nn.Linear` layer, which implements the linear transformation $w \cdot x + b$. The parameters $w'$ and $b'$ belong to a second, separate `nn.Linear` layer of the same shape, whose output $w' \cdot x + b'$ is passed through the hyperbolic tangent activation function $\tanh$. Finally, $\alpha$ is a trainable scalar parameter, initialized to 0.5 and adjusted during training along with the other parameters, which scales the contribution of the $\tanh$ term.


**Question 7:** In the code below, we will define a *WeirdLayer* object, implementing our operation function. As before, there are a few None variables that probably need to be replaced. Show your code for the *WeirdLayer* object in your report. You should probably use the *torch.tanh(x)* function in your implementation.

In [4]:
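class WeirdLayer(nn.Module):
    def __init__(self, input_dim, output_dim, alpha_init=0.5):
        super().__init__()
        # Linear branch: w . x + b
        self.linear = nn.Linear(input_dim, output_dim)
        # Trainable scalar alpha, registered as a Parameter so the optimizer updates it
        self.alpha = nn.Parameter(torch.tensor(alpha_init, dtype=torch.float32))
        # Separate linear layer feeding the non-linear branch: w' . x + b'
        self.fc_for_tanh = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        # f(x) = w . x + b + alpha * tanh(w' . x + b')
        linear_output = self.linear(x)
        tanh_output = torch.tanh(self.fc_for_tanh(x))
        return linear_output + self.alpha * tanh_output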
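As a sanity check, the sketch below recomputes $f(x) = w \cdot x + b + \alpha \cdot \tanh(w' \cdot x + b')$ by hand from the layer's stored parameters and compares it with the output of `forward`. The *WeirdLayer* definition is repeated so the snippet runs on its own; the batch size of 4, the random seed and the tolerance are arbitrary choices for illustration. Note that `nn.Linear` stores its weight with shape `(output_dim, input_dim)`, hence the transpose.

```python
import torch
import torch.nn as nn

class WeirdLayer(nn.Module):
    def __init__(self, input_dim, output_dim, alpha_init=0.5):
        super().__init__()
        self.linear = nn.Linear(input_dim, output_dim)
        self.alpha = nn.Parameter(torch.tensor(alpha_init, dtype=torch.float32))
        self.fc_for_tanh = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x) + self.alpha * torch.tanh(self.fc_for_tanh(x))

torch.manual_seed(0)
layer = WeirdLayer(input_dim=2, output_dim=10)
x = torch.randn(4, 2)  # a batch of 4 two-dimensional inputs

# Recompute f(x) manually from the stored parameters
W, b = layer.linear.weight, layer.linear.bias
W2, b2 = layer.fc_for_tanh.weight, layer.fc_for_tanh.bias
manual = x @ W.T + b + layer.alpha * torch.tanh(x @ W2.T + b2)

# Small tolerance to absorb float32 rounding differences between the two computations
print(torch.allclose(layer(x), manual, atol=1e-5))  # True
```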

**Question 8:** Is the gradient of WeirdLayer with respect to $ \alpha $ computable?

You may use the function *test_act_object()*, called below, which will run a few test cases on your *WeirdLayer* object. As before, if you have correctly figured out the code to use in Question 7, you will be able to pass all test cases.

In [5]:
# Define our weird operation function
act_fun = WeirdLayer(input_dim = 2, output_dim = 10)

In [6]:
# Running test function for our operation function object
test_act_object(act_fun)

--- Test case (activation function): Checking for correct output shape.
Testing forward on a Tensor of values.
Retrieved shape: torch.Size([1, 10])
Expected shape: (1, 10)
Test case result: Passed
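Regarding Question 8, one way to reason about it: $\alpha$ enters $f(x)$ only through a product with the $\tanh$ term, so for every output component

$$ \frac{\partial f(x)}{\partial \alpha} = \tanh(w' \cdot x + b') $$

which is defined and finite for any input. The minimal sketch below checks this with autograd, using plain tensors `W`, `b`, `W2`, `b2` and `alpha` as hypothetical stand-ins for the parameters of *WeirdLayer*. Because `backward()` needs a scalar, the outputs are summed, so the expected gradient is the sum of all the $\tanh$ values. The same check could be run on the layer itself by calling `act_fun(x).sum().backward()` on a batch `x` of shape `(N, 2)` and inspecting `act_fun.alpha.grad`.

```python
import torch

torch.manual_seed(0)
input_dim, output_dim = 2, 10

# Plain tensors standing in for the parameters of the two nn.Linear layers and alpha
W = torch.randn(output_dim, input_dim, requires_grad=True)
b = torch.randn(output_dim, requires_grad=True)
W2 = torch.randn(output_dim, input_dim, requires_grad=True)
b2 = torch.randn(output_dim, requires_grad=True)
alpha = torch.tensor(0.5, requires_grad=True)

x = torch.randn(4, input_dim)
f = x @ W.T + b + alpha * torch.tanh(x @ W2.T + b2)

# Reduce to a scalar so that backward() can be called
f.sum().backward()

# d(sum f)/d(alpha) = sum over all outputs of tanh(W' x + b')
with torch.no_grad():
    expected = torch.tanh(x @ W2.T + b2).sum()

print(alpha.grad, expected)
print(torch.allclose(alpha.grad, expected, atol=1e-4))  # True
```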


**Question 9:** How does this layer differ from a standard fully connected layer?

**ANSWER 9:**
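A standard fully connected layer implements the purely linear (affine) function $f(x) = w \cdot x + b$. The WeirdLayer adds to it a non-linear term $\alpha \cdot \tanh(w' \cdot x + b')$, computed from a second, separate linear layer, so it holds twice as many weights and biases plus the extra scalar $\alpha$. This creates an adjustable non-linearity inside the layer: the trainable parameter $\alpha$ controls how much the $\tanh$ component contributes. When $\alpha = 0$, the layer reduces exactly to a standard linear layer, and for small $|\alpha|$ it stays close to one. As $|\alpha|$ grows, the non-linear effects become stronger; since $\tanh$ takes values in $(-1, 1)$, the non-linear term shifts each output by less than $|\alpha|$.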
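The two claims in this answer (extra parameters, and exact reduction to a linear layer when $\alpha = 0$) can be checked with the short sketch below. It repeats the *WeirdLayer* definition so it runs on its own; the helper `n_params` is a hypothetical name introduced here for illustration. It copies the linear branch of the WeirdLayer into a standard `nn.Linear(2, 10)`, sets $\alpha$ to 0, and compares the two outputs.

```python
import torch
import torch.nn as nn

class WeirdLayer(nn.Module):
    def __init__(self, input_dim, output_dim, alpha_init=0.5):
        super().__init__()
        self.linear = nn.Linear(input_dim, output_dim)
        self.alpha = nn.Parameter(torch.tensor(alpha_init, dtype=torch.float32))
        self.fc_for_tanh = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x) + self.alpha * torch.tanh(self.fc_for_tanh(x))

def n_params(module):
    # Hypothetical helper: total number of trainable scalars in a module
    return sum(p.numel() for p in module.parameters())

torch.manual_seed(0)
weird = WeirdLayer(input_dim=2, output_dim=10)
standard = nn.Linear(2, 10)

# 2*10 weights + 10 biases = 30; WeirdLayer has two such layers plus alpha
print(n_params(standard), n_params(weird))  # 30 61

# Copy the linear branch into the standard layer and switch off the tanh branch
with torch.no_grad():
    standard.weight.copy_(weird.linear.weight)
    standard.bias.copy_(weird.linear.bias)
    weird.alpha.fill_(0.0)

x = torch.randn(5, 2)
print(torch.allclose(weird(x), standard(x)))  # True
```

With $\alpha = 0$, the $\tanh$ branch is multiplied by zero, so the WeirdLayer output matches the standard layer; its `fc_for_tanh` parameters then also receive zero gradient until $\alpha$ moves away from 0.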

## What is next?

Our task continues in Notebook 1-C.