## Block

In [10]:
import torch 
from torch import nn
from torch.autograd import Variable
import torch.nn.functional as F

Below we define a network procedurally, by chaining layers with `nn.Sequential`.


In [11]:
torch.manual_seed(100)
net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10)) 
# No nn.Softmax() layer is needed here: nn.CrossEntropyLoss() applies log-softmax internally when computing the loss
x = torch.randn(2,20)
net(x)

tensor([[-0.0041,  0.1994, -0.0439, -0.3023,  0.1296, -0.2967, -0.1734,  0.0149,
         -0.1510, -0.2858],
        [ 0.0819, -0.0980, -0.0153, -0.0092,  0.0740, -0.3273, -0.4315,  0.0735,
          0.0503, -0.1280]], grad_fn=<AddmmBackward>)

In [12]:
net._modules  # this is an OrderedDict

OrderedDict([('0', Linear(in_features=20, out_features=256, bias=True)),
             ('1', ReLU()),
             ('2', Linear(in_features=256, out_features=10, bias=True))])

In [13]:
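net.children  # without parentheses this displays the bound method; call net.children() to iterate over the layers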

<bound method Module.children of Sequential(
  (0): Linear(in_features=20, out_features=256, bias=True)
  (1): ReLU()
  (2): Linear(in_features=256, out_features=10, bias=True)
)>
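As a minimal sketch of how the child modules can actually be inspected: `children()` and `named_children()` are methods and need to be called, and `nn.Sequential` also supports integer indexing. The layer sizes simply repeat the example above.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

# children() returns an iterator over the direct submodules, in order
child_types = [type(m).__name__ for m in net.children()]

# named_children() also yields the string keys stored in net._modules
child_names = [name for name, _ in net.named_children()]

# Sequential supports integer indexing into its layers
first_layer = net[0]

print(child_types)
print(child_names)
print(first_layer.weight.shape)
```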

Recall that in Ch3 we defined the MLP as a class: its `__init__()` creates each layer with `nn.Linear()` and `nn.ReLU()`, and its `forward()` specifies how the output is computed from the input. So **a model written in this object-oriented way is essentially a BLOCK!**
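For reference, here is a minimal sketch of what such a class-based block might look like. The attribute names `hidden` and `out` are illustrative assumptions, and the layer sizes are copied from the `nn.Sequential` example above rather than from Ch3.

```python
import torch
from torch import nn
import torch.nn.functional as F


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers are created once, when the block is constructed
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, x):
        # forward() describes how the output is computed from the input
        return self.out(F.relu(self.hidden(x)))


mlp = MLP()
y = mlp(torch.randn(2, 20))
print(y.shape)
```

Assigning an `nn.Module` to an attribute registers it as a child, so `mlp.children()` yields the two linear layers without any extra bookkeeping.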

## Sequential Block

In [14]:
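class MySequential(nn.Sequential):
    def __init__(self):
        super().__init__()

    def add_module(self, block):
        # Note: this overrides nn.Module.add_module(name, module) with a
        # one-argument version. Keys of `_modules` must be strings, so use
        # the running index as the name, as nn.Sequential does
        self._modules[str(len(self._modules))] = block  # ordered dict

    def forward(self, x):
        # Apply the blocks in the order in which they were added
        for block in self._modules.values():
            x = block(x)
        return x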

At its core is the `add_module` method, which adds a block to the ordered dictionary of children. The blocks are then executed in sequence when forward propagation is invoked. Let’s see what the MLP looks like now.

In [17]:
net = MySequential()
net.add_module(nn.Linear(20, 256))
net.add_module(nn.ReLU())
net.add_module(nn.Linear(256, 10))
torch.manual_seed(100)
x = torch.randn(2, 20)
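net(x)
# Note: the seed is set after the layers were created, so the weights were
# already drawn from an unseeded generator and differ from run to run; only x
# is reproducible. Call torch.manual_seed() before constructing the layers to
# make the initialization reproducible as well.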

tensor([[ 0.0037,  0.2254, -0.3256,  0.0614, -0.0765,  0.2539, -0.0049, -0.1747,
          0.2823, -0.0172],
        [-0.0225,  0.0785, -0.0906, -0.0450, -0.1233,  0.1334, -0.0704, -0.3179,
         -0.0714, -0.2188]], grad_fn=<AddmmBackward>)
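The effect of where the seed is set can be checked with a small sketch. It uses the built-in `nn.Sequential` instead of `MySequential` to stay self-contained, and `build_net` is a hypothetical helper introduced only for this illustration.

```python
import torch
from torch import nn


def build_net():
    # Hypothetical helper: same architecture as the MLP above
    return nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))


# Seed BEFORE construction: weights and input are both reproducible
torch.manual_seed(100)
net_a = build_net()
x_a = torch.randn(2, 20)

torch.manual_seed(100)
net_b = build_net()
x_b = torch.randn(2, 20)

same_output = torch.equal(net_a(x_a), net_b(x_b))

# Seed AFTER construction: only the input is reproducible
net_c = build_net()
torch.manual_seed(100)
x_c = torch.randn(2, 20)

net_d = build_net()
torch.manual_seed(100)
x_d = torch.randn(2, 20)

same_input = torch.equal(x_c, x_d)
same_weights = torch.equal(net_c[0].weight, net_d[0].weight)

print(same_output, same_input, same_weights)
```

Layer construction consumes random numbers from the global generator, so the seed has to be set before the first layer is created if the initial weights should match across runs.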

## Executing Code in the Forward Propagation Function

The `Sequential` class makes model construction easy,
allowing us to assemble new architectures
without having to define our own class.
However, not all architectures are simple daisy chains.
When greater flexibility is required,
we will want to define our own blocks.
For example, we might want to execute
Python's control flow within the forward propagation function.
Moreover, we might want to perform
arbitrary mathematical operations,
not simply relying on predefined neural network layers.

You might have noticed that until now,
all of the operations in our networks
have acted upon our network's activations
and its parameters.
Sometimes, however, we might want to
incorporate terms
that are neither the result of previous layers
nor updatable parameters.
We call these *constant parameters*.
Say for example that we want a layer
that calculates the function
$f(\mathbf{x},\mathbf{w}) = c \cdot \mathbf{w}^\top \mathbf{x}$,
where $\mathbf{x}$ is the input, $\mathbf{w}$ is our parameter,
and $c$ is some specified constant
that is not updated during optimization.
So we implement a `FixedHiddenMLP` class as follows.

In [None]:
class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Random weights that do not track gradients and therefore stay
        # constant during training
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        # Use the created constant parameters, as well as the `relu` and `mm`
        # functions
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        # Reuse the fully connected layer. Both calls go through the same
        # `nn.Linear` instance, so they share one weight matrix and one bias,
        # and gradients from both uses accumulate into the same parameters
        X = self.linear(X)
        # Control flow
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()
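The two claims in the comments (the random weights are constant, and the linear layer is shared) can be checked with a short sketch. The class is repeated so the block runs on its own; the input shape `(2, 20)` is an assumption chosen to match the earlier examples.

```python
import torch
from torch import nn
import torch.nn.functional as F


class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain tensor attribute: not registered as a parameter
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)  # same layer instance, hence shared parameters
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()


net = FixedHiddenMLP()

# Only the shared linear layer contributes parameters, and only once
param_names = [name for name, _ in net.named_parameters()]

out = net(torch.rand(2, 20))
out.backward()

# The linear layer receives gradients; the constant weights do not
linear_has_grad = net.linear.weight.grad is not None
constant_has_grad = net.rand_weight.grad is not None

print(param_names, linear_has_grad, constant_has_grad)
```

Because `rand_weight` is a plain attribute, it is not included in `state_dict()` and is not moved by `net.to(device)`. If that matters, registering the tensor with `register_buffer` is one common alternative.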