# Torch Modules
Modules are the basic building blocks of neural networks in PyTorch. A module can be a layer, an activation function, a loss function, or an entire network, and modules can be combined to build larger modules. In this notebook, we will learn how to create modules and how to combine them into a neural network.

In [None]:
import torch.nn as nn
import torch.nn.functional as F
import torch

class Model(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5) # Learn more about this in other sections
        self.conv2 = nn.Conv2d(20, 1, 5)
    
    def forward(self, x):
        x = F.relu(self.conv1(x))
        return self.conv2(x)
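
To see what this module does to its input, it helps to trace the tensor shapes. The sketch below repeats the class so that it runs on its own and assumes a batch of four single-channel 28x28 images (an input size chosen only for illustration). Each 5x5 convolution without padding trims 4 pixels from the height and the width.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 1, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return self.conv2(x)

net = Model()
x = torch.randn(4, 1, 28, 28)  # Assumed input: (batch, channels, height, width)
out = net(x)                   # Calling the module runs forward()
print(out.shape)               # Spatial size goes 28 -> 24 -> 20

# Count the learnable parameters (weights and biases of both convolutions)
n_params = sum(p.numel() for p in net.parameters())
print(n_params)
```

Note that the module is called directly as net(x) rather than through net.forward(x); this is the usual way to run a module.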


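A module can be used as a submodule of another module. Assigning it to an attribute in __init__ registers it, so its parameters become part of the parent module's parameters.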

In [None]:
class SuperModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.model1 = Model()
        self.model2 = Model()
    
    def forward(self, x):
        x = self.model1(x) # Pass the input tensor (not the constant 1) to the first submodule
        return self.model2(x)
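
One way to check that submodules are registered is to inspect the parent module. The following is a minimal sketch that repeats both classes so that it runs on its own; it lists the direct children and the parameter names that the parent exposes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 1, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return self.conv2(x)

class SuperModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model1 = Model()
        self.model2 = Model()

    def forward(self, x):
        x = self.model1(x)
        return self.model2(x)

parent = SuperModel()

# Direct children are named after the attributes they were assigned to
child_names = [name for name, _ in parent.named_children()]
print(child_names)

# Parameter names are prefixed with the path through the submodules
keys = list(parent.state_dict().keys())
print(keys)
```

Because the children are registered, anything that walks the parent (an optimizer receiving parent.parameters(), for example) also reaches the parameters of model1 and model2.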

nn.Sequential is a module that contains other modules and applies them in sequence to produce its output.

In [None]:
model = nn.Sequential(
    SuperModel(),
    nn.ReLU(), # Note that we use nn.ReLU and not F.relu here, since nn.Sequential needs a module.
    SuperModel()
)
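
As a sanity check on what "applies them in sequence" means, the sketch below builds a small nn.Sequential from linear layers (the layer sizes are arbitrary, chosen for illustration) and compares its output with calling the same layers by hand.

```python
import torch
import torch.nn as nn

seq = nn.Sequential(
    nn.Linear(3, 4),
    nn.ReLU(),
    nn.Linear(4, 2),
)

x = torch.randn(5, 3)

# nn.Sequential supports indexing, so the same layers can be called one by one
manual = seq[2](seq[1](seq[0](x)))

same = torch.allclose(seq(x), manual)
print(same)
```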

## Learning XOR

Note! It is VERY important that the target vector has shape (N, 1) and not (N,). The model output has shape (N, 1); if the target has shape (N,), the loss function broadcasts the two to shape (N, N), emits only a warning, and returns the wrong loss value.
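
The effect of a mismatched target shape is easy to reproduce. In this sketch the prediction is made up so that it equals the target exactly; the correct loss is therefore 0, yet the flat target yields a nonzero value.

```python
import warnings
import torch
import torch.nn as nn

loss_func = nn.MSELoss()
pred = torch.tensor([[0.], [1.], [2.], [3.]])  # Shape (4, 1), like a model output
target_flat = torch.tensor([0., 1., 2., 3.])   # Shape (4,)

# Matching shapes: the prediction equals the target, so the loss is 0
good = loss_func(pred, target_flat.reshape(4, 1)).item()

# Mismatched shapes: PyTorch warns, then broadcasts both tensors to (4, 4)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # Silence the shape warning for this demo only
    bad = loss_func(pred, target_flat).item()

print(good, bad)  # 0.0 2.5
```

The second value is the mean of all 16 pairwise squared differences, not the mean over the 4 matching pairs.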

In [None]:
data = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
target = torch.tensor([0., 1., 1., 0.]).reshape(4,1)

# Let's define a very small neural network.
perceptron = nn.Sequential(
    nn.Linear(2, 2),
    nn.Sigmoid(),
    nn.Linear(2, 1),
)

loss_func = nn.MSELoss()
# Let's do a forward pass on this fake data:
pred = perceptron(data)

# Let's calculate the loss
loss = loss_func(pred, target) 

gradient_before = next(perceptron.parameters()).grad # None: no backward pass has run yet

# Now we must do backpropagation
loss.backward() # This stores the gradient of the loss for every model parameter in that parameter's .grad attribute

gradient_after = next(perceptron.parameters()).grad # Now we have a gradient!

# Now we need an optimizer. Stochastic Gradient Descent (SGD) is the most common choice; here we use Adam.
optim = torch.optim.Adam(perceptron.parameters(), lr = 0.03)
optim.step() # Update the parameters using the stored gradients

print(pred)
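
To make the role of optim.step() concrete, the sketch below uses plain SGD instead of the Adam optimizer from the cell above, because its update rule is simple to verify by hand: each parameter becomes p - lr * p.grad. The layer, data, and learning rate are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(2, 1)
lr = 0.1
sgd = torch.optim.SGD(layer.parameters(), lr=lr)  # Plain SGD: no momentum, no weight decay

x = torch.tensor([[0., 1.], [1., 0.]])
y = torch.tensor([[1.], [1.]])

loss = nn.MSELoss()(layer(x), y)
loss.backward()  # Fills layer.weight.grad and layer.bias.grad

# What the weights should be after one step of plain SGD
expected = layer.weight.detach() - lr * layer.weight.grad
sgd.step()

matches = torch.allclose(layer.weight.detach(), expected)
print(matches)
```

Adam follows the same pattern (read .grad, update the parameter in place) but rescales each update using running averages of past gradients.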

Let's put this in a loop until our loss is less than 0.1!

In [None]:
from tqdm import tqdm

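losses = []

for iteration in tqdm(range(1000)):
    optim.zero_grad() # Clear gradients left over from the previous backward pass
    pred = perceptron(data)
    loss = loss_func(pred, target)

    loss.backward()
    optim.step()

    if iteration % 10 == 0:
        losses.append(loss.item()) # Record every 10th loss for plotting

    if loss.item() < 0.1: # Stop early once the loss is below 0.1
        break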

import matplotlib.pyplot as plt
plt.plot(losses)
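
The loop calls optim.zero_grad() on every iteration because backward() adds to .grad instead of overwriting it. A small sketch with a made-up two-element parameter shows the accumulation:

```python
import torch

w = torch.tensor([1., 2.], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

(w ** 2).sum().backward()
first = w.grad.clone()        # Gradient of sum(w^2) is 2w

(w ** 2).sum().backward()     # No zero_grad(): the new gradient is added on top
accumulated = w.grad.clone()

opt.zero_grad()               # Clear the stored gradient
(w ** 2).sum().backward()
fresh = w.grad.clone()

print(first, accumulated, fresh)
```

Without the reset, every optimizer step would use the sum of all gradients computed so far rather than the gradient of the current loss.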

We can use .apply() to call a function on a model and, recursively, on every one of its submodules. This is useful for things like weight initialization.

In [None]:
def reset_weights(m):
    if isinstance(m, nn.Linear):
        m.reset_parameters()

perceptron.apply(reset_weights)
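
The same pattern works for a custom initialization. The sketch below assumes Xavier-uniform weights and zero biases purely as an example scheme; the helper name init_weights is hypothetical, and the network is rebuilt here so that the block runs on its own.

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Called once for every submodule, and finally for the container itself
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)  # Assumed scheme, chosen for illustration
        nn.init.zeros_(m.bias)

net = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1))
net.apply(init_weights)

# Every linear layer should now have an all-zero bias
biases_are_zero = all(
    torch.all(m.bias == 0).item() for m in net if isinstance(m, nn.Linear)
)
print(biases_are_zero)
```

The isinstance check matters because .apply() also visits nn.Sigmoid and the nn.Sequential container, which have no weight or bias to initialize.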