# Building Models

In [1]:
print("Building Models with PyTorch")

Building Models with PyTorch


In [2]:
import torch

class TinyModel(torch.nn.Module):
    def __init__(self):
        super(TinyModel, self).__init__()
        self.linear1 = torch.nn.Linear(100, 200)
        self.activation = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
        self.softmax = torch.nn.Softmax()

    def forward(self, x):
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        x = self.softmax(x)
        return x

tinymodel = TinyModel()

print('The Model:\n', tinymodel)

print('\n\nJust one layer:\n', tinymodel.linear2)

print('\n\nModel Params:\n')
for param in tinymodel.parameters():
    print(param)

print('\n\nLayer Params:')
for param in tinymodel.linear2.parameters():
    print(param)

The Model:
 TinyModel(
  (linear1): Linear(in_features=100, out_features=200, bias=True)
  (activation): ReLU()
  (linear2): Linear(in_features=200, out_features=10, bias=True)
  (softmax): Softmax(dim=None)
)


Just one layer:
 Linear(in_features=200, out_features=10, bias=True)


Model Params:

Parameter containing:
tensor([[ 0.0933,  0.0112,  0.0278,  ...,  0.0813,  0.0355, -0.0155],
        [-0.0538,  0.0095,  0.0097,  ...,  0.0780, -0.0580, -0.0880],
        [ 0.0297, -0.0545, -0.0400,  ..., -0.0958, -0.0323,  0.0478],
        ...,
        [-0.0847, -0.0996, -0.0570,  ..., -0.0636,  0.0311,  0.0149],
        [-0.0492, -0.0312,  0.0997,  ...,  0.0221,  0.0615,  0.0968],
        [ 0.0979, -0.0021,  0.0966,  ..., -0.0898,  0.0638, -0.0793]],
       requires_grad=True)
Parameter containing:
tensor([ 4.9517e-02,  6.9481e-03,  6.2736e-02,  6.5808e-02, -1.5988e-02,
        -2.6240e-02,  5.5974e-02,  9.1646e-02,  9.4806e-02,  5.9449e-02,
        -8.7631e-02, -1.3541e-02, -7.6675e-02,  7.9

This shows the fundamental structure of a PyTorch model: there is an `__init__()` method that defines the layers and other components of a model, and a `forward()` method where the computation gets done. Note that we can print the model, or any of its submodules, to learn about its structure.
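As a minimal sketch of how such a model is called, the snippet below rebuilds the same layer stack under a hypothetical name, `TinyClassifier`, and passes a random batch through it. Two details are assumptions that do not appear in the cell above: the batch size of 4, and `dim=1` on the softmax, chosen so that each row of the output sums to 1 (leaving `dim` unset triggers a deprecation warning in recent PyTorch releases).

```python
import torch

class TinyClassifier(torch.nn.Module):  # hypothetical name, same layers as TinyModel
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(100, 200)
        self.activation = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
        self.softmax = torch.nn.Softmax(dim=1)  # assumption: normalize over the class dimension

    def forward(self, x):
        x = self.activation(self.linear1(x))
        return self.softmax(self.linear2(x))

model = TinyClassifier()
batch = torch.rand(4, 100)   # assumed batch of 4 samples with 100 features each
probs = model(batch)         # calling the module runs forward()

print(probs.shape)           # one row of 10 class probabilities per sample
print(probs.sum(dim=1))      # each row sums to (approximately) 1

# Count the learnable parameters: weights and biases of both linear layers.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```

Calling `model(batch)` rather than `model.forward(batch)` is the usual convention, since the call operator also runs any registered hooks.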

## Linear Layers

In [4]:
lin = torch.nn.Linear(3,2)
x = torch.rand(1,3)

print('Input:')
print(x)

print('\n\nWeight and Bias Parameters:')
for param in lin.parameters():
    print(param)

y = lin(x)
print('\n\nOutput:')
print(y)

Input:
tensor([[0.1708, 0.4724, 0.6928]])


Weight and Bias Parameters:
Parameter containing:
tensor([[-0.3162, -0.4083,  0.3630],
        [ 0.3860,  0.1159, -0.3086]], requires_grad=True)
Parameter containing:
tensor([ 0.1978, -0.0044], requires_grad=True)


Output:
tensor([[ 0.2024, -0.0975]], grad_fn=<AddmmBackward0>)
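The output above is the affine map `y = x @ W.T + b` applied to the input, where `W` has shape `(out_features, in_features)`. The sketch below checks this by hand on a fresh random layer; the variable name `manual` is introduced only for illustration.

```python
import torch

lin = torch.nn.Linear(3, 2)
x = torch.rand(1, 3)

# Recompute the layer's output directly from its parameters.
manual = x @ lin.weight.T + lin.bias

print(lin.weight.shape, lin.bias.shape)  # weight is (out_features, in_features)
print(torch.allclose(lin(x), manual))    # the two results agree
```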


## Convolutional Layers

In [5]:
import torch.nn.functional as F

class LeNet(torch.nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = torch.nn.Conv2d(1,6,3)
        self.conv2 = torch.nn.Conv2d(6,16,3)
        # an affine operation: y = Wx + b
        self.fc1 = torch.nn.Linear(16*6*6, 120)
        self.fc2 = torch.nn.Linear(120,84)
        self.fc3 = torch.nn.Linear(84,10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]     # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
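To see where the `16*6*6` input size of `fc1` comes from, the following sketch traces tensor shapes through the two convolution and pooling stages. It assumes a single-channel 32x32 input, which the cell above does not state; a different input size would give a different flattened length.

```python
import torch
import torch.nn.functional as F

conv1 = torch.nn.Conv2d(1, 6, 3)
conv2 = torch.nn.Conv2d(6, 16, 3)

x = torch.rand(1, 1, 32, 32)  # assumed input: batch of 1, 1 channel, 32x32 pixels

# 3x3 convolution without padding: 32 -> 30, then 2x2 max pooling: 30 -> 15
x = F.max_pool2d(F.relu(conv1(x)), (2, 2))
print(x.shape)

# 3x3 convolution: 15 -> 13, then 2x2 max pooling (rounding down): 13 -> 6
x = F.max_pool2d(F.relu(conv2(x)), 2)
print(x.shape)

# Flatten everything except the batch dimension: 16 * 6 * 6 = 576 features
flat = torch.flatten(x, 1)
print(flat.shape)
```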



## Recurrent Layers

Recurrent neural networks (or RNNs) are used for sequential data - anything from time-series measurements from a scientific instrument to natural language sentences to DNA nucleotides. An RNN does this by maintaining a hidden state that acts as a sort of memory for what it has seen in the sequence so far.



In [None]:
class LSTMTagger(torch.nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim
        self.word_embeddings = torch.nn.Embedding(vocab_size, embedding_dim)
        self.lstm = torch.nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = torch.nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence),1,-1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
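The sketch below walks one sentence through the same three stages (embedding, LSTM, linear projection) to show the tensor shapes involved. All sizes and the word indices are made up for illustration and are not taken from the tutorial.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes chosen only for this example.
vocab_size, embedding_dim, hidden_dim, tagset_size = 20, 6, 8, 3

embedding = torch.nn.Embedding(vocab_size, embedding_dim)
lstm = torch.nn.LSTM(embedding_dim, hidden_dim)
hidden2tag = torch.nn.Linear(hidden_dim, tagset_size)

sentence = torch.tensor([4, 11, 0, 7, 2])  # five made-up word indices

embeds = embedding(sentence)               # (5, embedding_dim)
# nn.LSTM expects (seq_len, batch, features) by default, so add a batch axis of 1.
lstm_out, (h_n, c_n) = lstm(embeds.view(len(sentence), 1, -1))
tag_space = hidden2tag(lstm_out.view(len(sentence), -1))
tag_scores = F.log_softmax(tag_space, dim=1)

print(lstm_out.shape)   # one hidden state per word
print(h_n.shape)        # final hidden state only
print(tag_scores.shape) # one row of log-probabilities per word
```

For a single-layer, unidirectional LSTM like this one, the last entry of `lstm_out` is the same tensor as the final hidden state `h_n`, which is the "memory" carried across the sequence.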


Link for reference: https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html

## Transformers

Link: https://pytorch.org/docs/stable/nn.html#transformer-layers

## Data Manipulation Layers

### Maxpooling

Max pooling (and its twin, min pooling) reduces a tensor by combining cells and assigning the maximum value of the input cells to the output cell, as in the pooling steps of the LeNet example above. If you look closely at the values below, you’ll see that each value in the max-pooled output is the maximum of one 3x3 quadrant of the 6x6 input.


In [6]:
my_tensor = torch.rand(1,6,6)
print(my_tensor)

maxpool_layer = torch.nn.MaxPool2d(3)
print(maxpool_layer(my_tensor))

tensor([[[0.8154, 0.7255, 0.4330, 0.3691, 0.9561, 0.9617],
         [0.4603, 0.1120, 0.2590, 0.0756, 0.9736, 0.0066],
         [0.0914, 0.9699, 0.6992, 0.4234, 0.8032, 0.6409],
         [0.1872, 0.4598, 0.9655, 0.6025, 0.6152, 0.7163],
         [0.2440, 0.2840, 0.2116, 0.5312, 0.6664, 0.0310],
         [0.6662, 0.3239, 0.9236, 0.4888, 0.2591, 0.5335]]])
tensor([[[0.9699, 0.9736],
         [0.9655, 0.7163]]])
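One way to confirm the quadrant behaviour is to compute the block maxima by hand and compare. The sketch below uses a tensor of consecutive integers instead of random values so the result is easy to check by eye; the reshape-based `manual` computation is an illustration, not how the layer is implemented internally.

```python
import torch

# 6x6 grid holding 0..35, with a leading channel dimension of 1.
t = torch.arange(36, dtype=torch.float32).reshape(1, 6, 6)

pooled = torch.nn.MaxPool2d(3)(t)

# Split rows and columns into 2 groups of 3, then take the max inside each 3x3 block.
blocks = t.reshape(1, 2, 3, 2, 3)
manual = blocks.amax(dim=(2, 4))

print(pooled)
print(torch.equal(pooled, manual))
```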


### Normalization

Normalization layers re-center and normalize the output of one layer before feeding it to another. Centering and scaling the intermediate tensors has a number of beneficial effects, such as letting you use higher learning rates without exploding/vanishing gradients.

In [7]:
my_tensor = torch.rand(1,4,4) * 20 + 5
print(my_tensor)

print(my_tensor.mean())

norm_layer = torch.nn.BatchNorm1d(4)
normed_tensor = norm_layer(my_tensor)
print(normed_tensor)

print(normed_tensor.mean())

tensor([[[17.4006, 16.6246, 19.8065, 23.2332],
         [23.0658, 16.5272, 11.1158, 20.3753],
         [11.1270, 20.6154, 24.7695, 12.9475],
         [ 8.4940,  5.5755, 13.5272, 24.9915]]])
tensor(16.8873)
tensor([[[-0.7250, -1.0266,  0.2100,  1.5416],
         [ 1.1791, -0.2770, -1.4821,  0.5800],
         [-1.1212,  0.5842,  1.3309, -0.7940],
         [-0.6282, -1.0223,  0.0513,  1.5992]]],
       grad_fn=<NativeBatchNormBackward0>)
tensor(-2.9802e-08, grad_fn=<MeanBackward0>)
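For a `(batch, channels, length)` input, `BatchNorm1d` computes its statistics per channel, which is why each row of the normalized output above is centered on zero. The sketch below reproduces the layer's training-mode output by hand, assuming the layer is freshly constructed (scale 1, shift 0) and using the biased variance; the names `mean`, `var`, and `manual` are introduced only for this example.

```python
import torch

x = torch.rand(1, 4, 4) * 20 + 5
bn = torch.nn.BatchNorm1d(4)   # modules start in training mode
out = bn(x)

# Per-channel statistics: reduce over the batch and length dimensions.
mean = x.mean(dim=(0, 2), keepdim=True)
var = x.var(dim=(0, 2), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)

print(torch.allclose(out, manual, atol=1e-4))
print(out.mean(dim=(0, 2)))    # approximately zero for every channel
```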


Link: Batch Normalization | https://arxiv.org/abs/1502.03167

### Dropout

Dropout layers work by randomly setting parts of the input tensor to zero during training; they are turned off for inference. This forces the model to learn from a masked, or reduced, version of its input.

Below, you can see the effect of dropout on a sample tensor. You can use the optional `p` argument to set the probability of an individual element being dropped; if you don’t, it defaults to 0.5.



In [9]:
my_tensor = torch.rand(1,4,4)

dropout = torch.nn.Dropout(p=0.4)

print(dropout(my_tensor))
print(dropout(my_tensor))
print(dropout(my_tensor))

tensor([[[0.0000, 0.3705, 0.9996, 1.5912],
         [0.7045, 0.3990, 0.8228, 0.9637],
         [0.6504, 0.4500, 0.0000, 1.3733],
         [0.0000, 0.8640, 0.0000, 0.4564]]])
tensor([[[0.9955, 0.3705, 0.9996, 0.0000],
         [0.7045, 0.3990, 0.0000, 0.0000],
         [0.6504, 0.4500, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.4564]]])
tensor([[[0.0000, 0.3705, 0.9996, 1.5912],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.4500, 0.0000, 0.0000],
         [0.4581, 0.0000, 0.0000, 0.4564]]])
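Some values in the output above are larger than 1 (for example 1.5912) even though the input came from `torch.rand`. This is because, in training mode, the surviving elements are scaled by `1 / (1 - p)` so the expected sum stays the same. The sketch below illustrates this on a tensor of ones and shows that the layer becomes a no-op after `eval()`; the seed and the all-ones input are choices made for this example.

```python
import torch

torch.manual_seed(0)           # arbitrary seed, only to make reruns repeatable
x = torch.ones(1, 4, 4)
dropout = torch.nn.Dropout(p=0.4)

dropout.train()                # training mode: elements are zeroed at random
y = dropout(x)
print(y)                       # entries are either 0 or 1 / (1 - 0.4)

dropout.eval()                 # inference mode: dropout is disabled
z = dropout(x)
print(torch.equal(z, x))
```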
