In [4]:
import torch
from torch import nn  # nn stands for "neural network"

## `nn.Linear`

To create a linear layer, you need to pass it the number of input dimensions and the number of output dimensions. The linear object initialized as `nn.Linear(10, 2)` will take in an $n\times10$ matrix and return an $n\times2$ matrix, where each of the $n$ rows has had the same linear transformation applied to it.

In [7]:
linear = nn.Linear(10, 2)
example_input = torch.randn(3, 10)
example_output = linear(example_input)
example_output

tensor([[-0.5473,  0.8653],
        [ 0.2047,  0.1283],
        [-1.6952,  0.0803]], grad_fn=<AddmmBackward0>)
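
To make "the same linear transformation" concrete, the following minimal sketch reproduces the layer's output by hand from its `weight` and `bias` attributes. The seed and the variable names `x` and `manual` are chosen here only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the check is repeatable

linear = nn.Linear(10, 2)
x = torch.randn(3, 10)

# nn.Linear stores its weight with shape (out_features, in_features)
# and its bias with shape (out_features,), so the layer computes x @ W^T + b.
manual = x @ linear.weight.T + linear.bias

print(linear.weight.shape)                            # torch.Size([2, 10])
print(torch.allclose(linear(x), manual, atol=1e-6))   # True
```

Every row of `x` is multiplied by the same weight matrix and shifted by the same bias, which is why the batch size $n$ never appears in the layer's constructor.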

## `nn.ReLU`

`nn.ReLU()` will create an object that, when it receives a tensor, applies the ReLU activation function to it. A ReLU non-linearity sets all negative numbers in a tensor to zero and leaves the rest unchanged. In general, the simplest neural networks are composed of a series of linear transformations, each followed by an activation function.

In [8]:
relu = nn.ReLU()
relu_output = relu(example_output)
relu_output

tensor([[0.0000, 0.8653],
        [0.2047, 0.1283],
        [0.0000, 0.0803]], grad_fn=<ReluBackward0>)
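
As a small sanity check, ReLU can be compared against an element-wise clamp at zero. This is only a sketch on a hand-written tensor (the values in `x` are made up for illustration), not a description of how PyTorch implements the operation internally.

```python
import torch
from torch import nn

# Hand-picked values so the effect is easy to read off
x = torch.tensor([[-0.5, 0.8],
                  [0.2, -1.7]])

relu = nn.ReLU()
out = relu(x)

# ReLU(x) = max(x, 0), applied element by element
clamped = torch.clamp(x, min=0)

print(out)
print(torch.equal(out, clamped))  # True
```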

## `nn.BatchNorm1d`

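`nn.BatchNorm1d` applies batch normalization, a technique that rescales a batch of $n$ inputs, feature by feature, so that the mean and standard deviation stay consistent from batch to batch. Its argument is the number of features, which is 2 here to match the output of the linear layer above.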

In [9]:
batchnorm = nn.BatchNorm1d(2)
batchnorm_output = batchnorm(relu_output)
batchnorm_output

tensor([[-0.7067,  1.4121],
        [ 1.4135, -0.6392],
        [-0.7067, -0.7728]], grad_fn=<NativeBatchNormBackward0>)
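
The normalization can be checked by hand. The sketch below assumes a freshly created layer in its default training mode, where the learnable scale and shift start at 1 and 0, so the output should match a plain per-feature standardization of the batch. The input `x` and the names `mean`, `var`, and `manual` are illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(8, 2) * 3 + 5  # batch of 8 rows, 2 features, far from mean 0 / std 1

batchnorm = nn.BatchNorm1d(2)  # 2 = number of features (columns)
out = batchnorm(x)             # new modules are in training mode: batch statistics are used

# Manual version: standardize each column with the batch mean and (biased) variance
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)
manual = (x - mean) / torch.sqrt(var + batchnorm.eps)

print(torch.allclose(out, manual, atol=1e-5))
print(out.mean(dim=0))                  # close to 0 for each feature
print(out.std(dim=0, unbiased=False))   # close to 1 for each feature
```

After calling `batchnorm.eval()`, the layer would instead use its running estimates of the mean and variance, so this equality is specific to training mode.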

## `nn.Sequential`

`nn.Sequential` creates a single operation that performs a sequence of operations in order. For example, you can write a neural network layer with batch normalization as follows:

In [10]:
mlp_layer = nn.Sequential(
    nn.Linear(5, 2),
    nn.BatchNorm1d(2),
    nn.ReLU()
)

test_example = torch.randn(5,5) + 1
print("input: ")
print(test_example)
print("output: ")
print(mlp_layer(test_example))

input: 
tensor([[ 2.5846e+00,  4.9193e-01,  5.8913e-01,  3.0832e+00, -2.0777e-01],
        [ 3.9364e-01,  8.0023e-01,  2.2960e+00,  9.3966e-01,  4.4342e-01],
        [ 1.6927e+00,  2.3627e+00,  1.9047e-01,  6.2804e-01,  3.3643e-01],
        [ 2.0554e+00,  1.2780e+00,  9.9691e-01,  8.5513e-01, -9.9841e-02],
        [ 4.9251e-04,  6.6356e-01,  1.0199e+00, -1.9589e-01,  1.8946e+00]])
output: 
tensor([[0.2432, 1.8097],
        [0.9818, 0.0000],
        [0.0000, 0.0000],
        [0.0000, 0.1172],
        [1.0129, 0.0000]], grad_fn=<ReluBackward0>)
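
One way to see what `nn.Sequential` does is to apply its sub-modules one at a time and compare the result with calling the container directly. This is a sketch with a freshly built `mlp_layer`; the variable `step_by_step` is introduced here for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
mlp_layer = nn.Sequential(
    nn.Linear(5, 2),
    nn.BatchNorm1d(2),
    nn.ReLU()
)
x = torch.randn(5, 5) + 1

# Feed the output of each sub-module into the next one, in order
step_by_step = x
for layer in mlp_layer:
    step_by_step = layer(step_by_step)

print(torch.allclose(mlp_layer(x), step_by_step))  # True
print(type(mlp_layer[0]).__name__)                 # Linear
```

The container can be iterated over and indexed, so individual layers such as `mlp_layer[0]` remain accessible after it is built.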


## Optimizers

To create an optimizer in PyTorch, you'll need to use the `torch.optim` module, often imported as `optim`.   
`optim.Adam` corresponds to the Adam optimizer. To create an optimizer object, you'll need to pass it the parameters to be optimized and the learning rate, `lr`, as well as any other parameters specific to the optimizer.

For all `nn` objects, you can access their parameters using their `parameters()` method, which returns an iterator over the parameters (wrap it in `list(...)` if you need an actual list), as follows:

In [11]:
import torch.optim as optim
adam_opt = optim.Adam(mlp_layer.parameters(), lr=1e-1)
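
To see what is actually handed to the optimizer, the sketch below lists the parameters of a freshly built `mlp_layer` by name using `named_parameters()`. The variable names `names` and `num_params` are chosen here for illustration.

```python
import torch
from torch import nn
import torch.optim as optim

mlp_layer = nn.Sequential(
    nn.Linear(5, 2),
    nn.BatchNorm1d(2),
    nn.ReLU()
)

# named_parameters() yields (name, parameter) pairs; ReLU has no parameters
names = []
for name, param in mlp_layer.named_parameters():
    names.append(name)
    print(name, tuple(param.shape))

# Linear: 5*2 weights + 2 biases; BatchNorm1d: 2 scales + 2 shifts
num_params = sum(p.numel() for p in mlp_layer.parameters())
print(num_params)  # 16

adam_opt = optim.Adam(mlp_layer.parameters(), lr=1e-1)
print(adam_opt.param_groups[0]["lr"])  # 0.1
```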

## Training Loop

A basic training step in PyTorch consists of four parts:


1.   Set all of the gradients to zero using `opt.zero_grad()`
2.   Calculate the loss, `loss`
3.   Calculate the gradients with respect to the loss using `loss.backward()`
4.   Update the parameters being optimized using `opt.step()`

That might look like the following code (and you'll notice that if you run it several times, the loss goes down):


In [12]:
train_example = torch.randn(100,5) + 1
adam_opt.zero_grad()

# We'll use a simple loss function of mean distance from 1
# torch.abs takes the absolute value of a tensor
cur_loss = torch.abs(1 - mlp_layer(train_example)).mean()

cur_loss.backward()
adam_opt.step()
print(cur_loss)

tensor(0.7543, grad_fn=<MeanBackward0>)
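
Rather than re-running the cell by hand, the same four steps can be wrapped in a loop. The following is a minimal sketch that assumes a fresh `mlp_layer`, a fixed seed, a single reused batch, and 50 steps; all of these choices are for illustration only.

```python
import torch
from torch import nn
import torch.optim as optim

torch.manual_seed(0)
mlp_layer = nn.Sequential(
    nn.Linear(5, 2),
    nn.BatchNorm1d(2),
    nn.ReLU()
)
adam_opt = optim.Adam(mlp_layer.parameters(), lr=1e-1)
train_example = torch.randn(100, 5) + 1

losses = []
for step in range(50):
    adam_opt.zero_grad()                                          # 1. reset gradients
    cur_loss = torch.abs(1 - mlp_layer(train_example)).mean()     # 2. compute the loss
    cur_loss.backward()                                           # 3. compute gradients
    adam_opt.step()                                               # 4. update parameters
    losses.append(cur_loss.item())  # .item() stores a plain float, not the graph

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Skipping `zero_grad()` would make the gradients from successive steps accumulate, which is why it comes first in each iteration.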


# New `nn` Classes

You can also create new classes which extend `nn.Module`. For these classes, all attributes assigned on the object, as in `self.layer` or `self.param`, will automatically be treated as parameters if they are themselves `nn` objects or if they are tensors wrapped in `nn.Parameter`.

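The `__init__` function defines what will happen when the object is created. The parent initializer has to run before any layers or parameters are assigned, so the first line of the init function of a class, for example `WellNamedClass`, should be `super(WellNamedClass, self).__init__()` (in Python 3, `super().__init__()` is equivalent).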

The `forward` function defines what runs if you create that object `model` and pass it a tensor `x`, as in `model(x)`. If you choose the function signature `(self, x)`, then each call of the forward function gets two pieces of information: `self`, which is a reference to the object through which you can access all of its parameters, and `x`, which is the current tensor for which you'd like to return `y`.

One class might look like the following:

In [13]:
class ExampleModule(nn.Module):
    def __init__(self, input_dims, output_dims):
        super(ExampleModule, self).__init__()
        self.linear = nn.Linear(input_dims, output_dims)
        self.exponent = nn.Parameter(torch.tensor(1.))

    def forward(self, x):
        x = self.linear(x)

        # ** is element-wise exponentiation, the same operator as in
        # plain Python. Note that a negative base raised to a
        # non-integer exponent produces NaN.
        x = x ** self.exponent
        
        return x

In [14]:
example_model = ExampleModule(10, 2)
list(example_model.parameters())

[Parameter containing:
 tensor(1., requires_grad=True),
 Parameter containing:
 tensor([[-0.0128, -0.1707, -0.0394,  0.2377,  0.2220, -0.0957,  0.0123,  0.2168,
           0.3161, -0.0836],
         [-0.2541,  0.2870,  0.0894,  0.1800, -0.2800, -0.1936, -0.0727, -0.3009,
          -0.0031,  0.0273]], requires_grad=True),
 Parameter containing:
 tensor([-0.2845,  0.3094], requires_grad=True)]
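
The registration rule can be checked directly. The sketch below uses a hypothetical `ScaledLinear` class (not part of the original notebook) with one `nn` object, one `nn.Parameter`, and one plain tensor attribute, and then lists which of them show up as parameters.

```python
import torch
from torch import nn

class ScaledLinear(nn.Module):  # hypothetical class, for illustration only
    def __init__(self, input_dims, output_dims):
        super().__init__()
        self.linear = nn.Linear(input_dims, output_dims)  # nn object: registered
        self.scale = nn.Parameter(torch.tensor(2.0))      # nn.Parameter: registered
        self.offset = torch.tensor(1.0)                   # plain tensor: NOT registered

    def forward(self, x):
        return self.scale * self.linear(x) + self.offset

model = ScaledLinear(10, 2)

names = [name for name, _ in model.named_parameters()]
print(names)  # ['scale', 'linear.weight', 'linear.bias']

out = model(torch.randn(3, 10))
print(out.shape)  # torch.Size([3, 2])
```

Because `offset` is a plain tensor, an optimizer built from `model.parameters()` would never update it. The module's own parameters are listed before those of its sub-modules, which matches the ordering in the output above.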