## Introduction
[PyTorch](https://pytorch.org/) is a machine learning framework used in both academia and industry for a wide range of applications. PyTorch started off as a more flexible alternative to [TensorFlow](https://www.tensorflow.org/), another popular machine learning framework. At the time of its release, `PyTorch` appealed to users because of its user-friendly nature: rather than requiring a static graph to be defined before any operation is performed, as `TensorFlow` did, `PyTorch` let users define their operations as they go, an approach that `TensorFlow` also adopted in later releases. Although `TensorFlow` is more widely preferred in industry, `PyTorch` is often the preferred machine learning framework for researchers. If you would like to learn more about the differences between the two, you can check out [this](https://blog.udacity.com/2020/05/pytorch-vs-tensorflow-what-you-need-to-know.html) blog post.

Now that we have learned enough about the background of `PyTorch`, let's start by importing it into our notebook. To install `PyTorch`, you can follow the instructions here. Alternatively, you can open this notebook using `Google Colab`, which already has `PyTorch` installed in its base kernel. Once you are done with the installation process, run the following cell:

Adapted from the following resources:
* "Word Window Classification" tutorial notebook by Matt Lamm, from Winter 2020 offering of CS224N
* Official PyTorch Documentation on Deep Learning with PyTorch: A 60 Minute Blitz by Soumith Chintala
* Stanford CS224N PyTorch Tutorial
* Cornell Tech CS5787 PyTorch Tutorial

In [2]:
!pip3 install torch torchvision torchaudio

Collecting torch
  Downloading torch-2.2.2-cp310-none-macosx_10_9_x86_64.whl.metadata (25 kB)
Collecting networkx (from torch)
  Downloading networkx-3.3-py3-none-any.whl.metadata (5.1 kB)
Downloading torch-2.2.2-cp310-none-macosx_10_9_x86_64.whl (150.8 MB)
Downloading networkx-3.3-py3-none-any.whl (1.7 MB)
Installing collected packages: networkx, torch
Successfully installed networkx-3.3 torch-2.2.2

[notice] A new release of pip is available: 23.3.1 -> 24.0
[notice] To update, run: pip install --upgrade pip


In [3]:
import torch
import torch.nn as nn

In [5]:
# !nvidia-smi

We are all set to start our tutorial. Let's dive in!

## Tensors

Tensors are the most basic building blocks in `PyTorch`. Tensors are similar to matrices, but they have extra properties and can have more than two dimensions. For example, a square image with 256 pixels on each side can be represented by a `3x256x256` tensor, where the first dimension, of size 3, holds the color channels: red, green and blue.
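
As a small illustration of the image example, the sketch below builds a placeholder tensor with that shape. The all-zeros content and the variable name `image` are assumptions made purely for illustration; `torch.zeros` is covered later in this section.

```python
import torch

# A hypothetical 256x256 RGB image: 3 color channels, 256 rows, 256 columns
image = torch.zeros(3, 256, 256)

print(image.shape)     # torch.Size([3, 256, 256])
print(image.dim())     # 3 (number of dimensions)
print(image[0].shape)  # torch.Size([256, 256]), the first (red) channel alone
```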


### Tensor Initialization
There are several ways to instantiate tensors in `PyTorch`, which we will go through next.

#### **From a Python List**

We can initialize a tensor from a `Python` list, which could include sublists. The dimensions and the data types will be automatically inferred by `PyTorch` when we use [`torch.tensor()`](https://pytorch.org/docs/stable/generated/torch.tensor.html).


In [None]:
# Initialize a tensor from a Python List
data = [
        [0, 1],
        [2, 3],
        [4, 5]
       ]
x_python = torch.tensor(data)

# Print the tensor
x_python

tensor([[0, 1],
        [2, 3],
        [4, 5]])

In [None]:
x_python.shape

torch.Size([3, 2])

We can also call `torch.tensor()` with the optional `dtype` parameter, which will set the data type. Some useful datatypes to be familiar with are: `torch.bool`, `torch.float`, and `torch.long`.

In [None]:
# We are using the dtype to create a tensor of particular type
x_float = torch.tensor(data, dtype=torch.float)
x_float

tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])

In [None]:
# We are using the dtype to create a tensor of particular type
x_bool = torch.tensor(data, dtype=torch.bool)
x_bool

tensor([[False,  True],
        [ True,  True],
        [ True,  True]])

In [None]:
# Can we multiply these?
x_bool * x_float

tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])

In [None]:
x_python.dtype

torch.int64

We can also get the same tensor in our specified data type using methods such as `float()`, `long()` etc.

In [None]:
x_python.double().dtype

torch.float64

We can also use the `torch.FloatTensor`, `torch.LongTensor`, and `torch.Tensor` classes to instantiate a tensor of a particular type. `LongTensor`s are particularly important in NLP, as many methods that deal with indices require the indices to be passed as a `LongTensor`, which holds 64-bit integers.

In [None]:
# `torch.Tensor` defaults to float
# Same as torch.FloatTensor(data)
x = torch.Tensor(data)
x

tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])

#### **From a NumPy Array**
We can also initialize a tensor from a `NumPy` array.

In [None]:
import numpy as np

# Initialize a tensor from a NumPy array
ndarray = np.array(data)
x_numpy = torch.from_numpy(ndarray)

# Print the tensor
x_numpy

tensor([[0, 1],
        [2, 3],
        [4, 5]])
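
One detail worth knowing is that `torch.from_numpy()` shares memory with the source array, whereas `torch.tensor()` copies the data. The sketch below illustrates this; the variable names `arr`, `shared`, and `copied` are made up for the example.

```python
import numpy as np
import torch

arr = np.array([[0, 1], [2, 3]])
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # makes an independent copy

arr[0, 0] = 100                 # modify the NumPy array in place

print(shared[0, 0].item())      # 100: the change is visible through the tensor
print(copied[0, 0].item())      # 0: the copy is unaffected
```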

#### **From a Tensor**
We can also initialize a tensor from another tensor, using the following methods:

* `torch.ones_like(old_tensor)`: Initializes a tensor of `1s`.
* `torch.zeros_like(old_tensor)`: Initializes a tensor of `0s`.
* `torch.rand_like(old_tensor)`: Initializes a tensor where all the elements are sampled from a uniform distribution between `0` and `1`.
* `torch.randn_like(old_tensor)`: Initializes a tensor where all the elements are sampled from a normal distribution.

All of these methods preserve the tensor properties of the original tensor passed in, such as the `shape` and `device`, which we will cover in a bit.

In [None]:
# Initialize a base tensor
x = torch.tensor([[1., 2], [3, 4]])
x

tensor([[1., 2.],
        [3., 4.]])

In [None]:
# Initialize a tensor of 0s
x_zeros = torch.zeros_like(x)
x_zeros

tensor([[0., 0.],
        [0., 0.]])

In [None]:
# Initialize a tensor of 1s
x_ones = torch.ones_like(x)
x_ones

tensor([[1., 1.],
        [1., 1.]])

In [None]:
# Initialize a tensor where each element is sampled from a uniform distribution
# between 0 and 1
x_rand = torch.rand_like(x)
x_rand

tensor([[0.7780, 0.7595],
        [0.8201, 0.6796]])

In [None]:
# Initialize a tensor where each element is sampled from a normal distribution
x_randn = torch.randn_like(x)
x_randn

tensor([[-0.7930,  1.7701],
        [-1.1557,  0.7682]])

#### **By Specifying a Shape**
We can also instantiate tensors by specifying their shapes (which we will cover in more detail in a bit). The methods we could use follow the ones in the previous section:
* `torch.zeros()`
* `torch.ones()`
* `torch.rand()`
* `torch.randn()`

In [None]:
# Initialize a 4x2x2 tensor of 0s
shape = (4, 2, 2)
x_zeros = torch.zeros(shape) # x_zeros = torch.zeros(4, 2, 2) is an alternative
x_zeros

tensor([[[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]]])

#### **With `torch.arange()`**
We can also create a tensor with `torch.arange(end)`, which returns a `1-D` tensor with elements ranging from `0` to `end-1`. We can use the optional `start` and `step` parameters to create tensors with different ranges.  

In [None]:
# Create a tensor with values 0-9
x = torch.arange(10)
x

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
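
To illustrate the optional `start` and `step` parameters mentioned above, here is a short sketch. The specific values and variable names are arbitrary choices for the example.

```python
import torch

from_two = torch.arange(2, 10)         # start=2, end=10
every_third = torch.arange(2, 10, 3)   # start=2, end=10, step=3
fractional = torch.arange(0, 1, 0.25)  # a non-integer step gives a float tensor

print(from_two)     # tensor([2, 3, 4, 5, 6, 7, 8, 9])
print(every_third)  # tensor([2, 5, 8])
print(fractional)   # tensor([0.0000, 0.2500, 0.5000, 0.7500])
```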

### Tensor Properties

Tensors have a few properties that are important for us to cover, namely the `dtype`, `shape`, and `device` properties.

#### Data Type

The `dtype` property lets us see the data type of a tensor.

In [None]:
# Initialize a 3x2 tensor, with 3 rows and 2 columns
x = torch.ones(3, 2)
x.dtype

torch.float32

#### Shape

The `shape` property tells us the shape of our tensor. This can help us identify how many dimensions our tensor has as well as how many elements exist in each dimension.

In [None]:
# Initialize a 3x2 tensor, with 3 rows and 2 columns
x = torch.Tensor([[1, 2], [3, 4], [5, 6]])
x

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

In [None]:
# Print out its shape
# Same as x.size()
x.shape

torch.Size([3, 2])

In [None]:
# Print out the number of elements in a particular dimension
# 0th dimension corresponds to the rows
x.shape[0]

3

In [None]:
len(x)

3

We can also get the size of a particular dimension with the `size()` method.


In [None]:
# Get the size of the 0th dimension
x.size(0)

3

In [None]:
x.shape

torch.Size([3, 2])

We can change the shape of a tensor with the `view()` method.

In [None]:
# Example use of view()
# x_view shares the same memory as x, so changing one changes the other
x_view = x.view(2, 3)
x_view

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [None]:
# We can ask PyTorch to infer the size of a dimension with -1
x_view = x.view(3, -1)
x_view

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

We can also use the `torch.reshape()` function for a similar purpose. There is a subtle difference between `reshape()` and `view()`: `view()` requires the data to be stored contiguously in memory. You can refer to [this](https://stackoverflow.com/questions/49643225/whats-the-difference-between-reshape-and-view-in-pytorch) StackOverflow answer for more information. In simple terms, contiguous means that the way our data is laid out in memory is the same as the order in which we would read elements from it. Non-contiguous tensors arise because some methods, such as `transpose()` and `view()`, do not actually change how our data is stored in memory. They only change the meta information about our tensor, so that when we use it we see the elements in the order we expect.

`reshape()` calls `view()` internally if the data is stored contiguously; if not, it returns a copy. The difference here isn't too important for basic tensors, but if you perform operations that make the underlying storage of the data non-contiguous (such as taking a transpose), you will have issues using `view()`. If you would like to match the way your tensor is stored in memory to how it is used, you can use the `contiguous()` method.

In [None]:
# Change the shape of x to be 2x3
# x_reshaped could be a reference to or copy of x
x_reshaped = torch.reshape(x, (2, 3))
x_reshaped

tensor([[1., 2., 3.],
        [4., 5., 6.]])
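
The contiguity issue described above can be seen on a small transposed tensor. This is a minimal sketch with made-up variable names; it catches the `RuntimeError` that `view()` raises rather than relying on the exact error message.

```python
import torch

x = torch.arange(6).reshape(2, 3)
x_t = x.T                       # transpose: same storage, different strides

print(x.is_contiguous())        # True
print(x_t.is_contiguous())      # False

view_failed = False
try:
    x_t.view(6)                 # view() cannot flatten this layout
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

flat_copy = x_t.reshape(6)            # reshape() falls back to copying
flat_view = x_t.contiguous().view(6)  # contiguous() copies, then view() works
print(flat_copy)                      # tensor([0, 3, 1, 4, 2, 5])
```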

We can use the `torch.unsqueeze(x, dim)` function to add a dimension of size `1` at the provided `dim`, where `x` is the tensor. We can also use the corresponding `torch.squeeze(x)`, which removes the dimensions of size `1`.


In [None]:
# Initialize a 5x2 tensor, with 5 rows and 2 columns
x = torch.arange(10).reshape(5, 2)
x

tensor([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])

In [None]:
# Add a new dimension of size 1 at the 1st dimension
x = x.unsqueeze(1)
x.shape

torch.Size([5, 1, 2])

In [None]:
# Squeeze the dimensions of x by getting rid of all the dimensions with 1 element
x = x.squeeze()
x.shape

torch.Size([5, 2])
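
`squeeze()` also accepts an optional dimension argument, in which case only that dimension is removed, and only if its size is `1`. A short sketch on an arbitrary example shape:

```python
import torch

x = torch.zeros(5, 1, 2, 1)

all_removed = x.squeeze()    # every size-1 dimension is removed
one_removed = x.squeeze(1)   # only dimension 1 is removed
unchanged = x.squeeze(0)     # dimension 0 has size 5, so nothing changes

print(all_removed.shape)     # torch.Size([5, 2])
print(one_removed.shape)     # torch.Size([5, 2, 1])
print(unchanged.shape)       # torch.Size([5, 1, 2, 1])
```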

If we want to get the total number of elements in a tensor, we can use the `numel()` method.

In [None]:
x

tensor([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])

In [None]:
# Get the number of elements in tensor.
x.numel()

10

#### **Device**
The `device` property tells `PyTorch` where to store our tensor. Where a tensor is stored determines which device, `GPU` or `CPU`, handles the computations involving it. We can find the device of a tensor with the `device` property.

In [None]:
# Initialize an example tensor
x = torch.Tensor([[1, 2], [3, 4]])
x

tensor([[1., 2.],
        [3., 4.]])

In [None]:
# Get the device of the tensor
x.device

device(type='cpu')

In [None]:
y = torch.arange(4).reshape((2, 2))
y = y.to('cuda')
# y + x

In [None]:
y.device

device(type='cuda', index=0)

In [None]:
x + y

RuntimeError: ignored

In [None]:
x.cuda() + y

In [None]:
x + y.cpu()
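
Since the cells above fail on a machine without a GPU, a common pattern is to pick the device at runtime and move every tensor to it. The following is a sketch of that pattern, not part of the original notebook; the variable name `device` is an arbitrary choice.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([[1., 2.], [3., 4.]]).to(device)
y = torch.arange(4).reshape(2, 2).to(device)

# Both operands now live on the same device, so the addition succeeds
result = x + y
print(result.device)
print(result.cpu())
```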

### Tensor Indexing
In `PyTorch` we can index tensors, similar to `NumPy`.

In [None]:
# Initialize an example tensor
x = torch.Tensor([
                  [[1, 2], [3, 4]],
                  [[5, 6], [7, 8]],
                  [[9, 10], [11, 12]]
                 ])
x

In [None]:
x.shape

In [None]:
# Access the 0th element, which is the first 2x2 matrix
x[0] # Equivalent to x[0, :]

We can also index into multiple dimensions with `:`.

In [None]:
# Get the top left element of each element in our tensor
x[:, 0, 0]

We can also access arbitrary elements in each dimension.

In [None]:
# Print x again to see our tensor
x

In [None]:
# Let's access the 0th and 1st elements, each twice
i = torch.tensor([0, 0, 1, 1])
x[i]

In [None]:
# Let's access the 0th elements of the 1st and 2nd elements
i = torch.tensor([1, 2])
j = torch.tensor([0])
x[i, j]

We can get a `Python` scalar value from a tensor with `item()`.

In [None]:
x[0, 0, 0]

In [None]:
x[0, 0, 0].item()
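
Because the indexing cells above are shown without their outputs, the sketch below repeats them on the same `3x2x2` tensor and records the results. The tensor is rebuilt here with `torch.arange` only to keep the example self-contained.

```python
import torch

# Same values as the example tensor above: 1..12 arranged as 3 matrices of 2x2
x = torch.arange(1, 13, dtype=torch.float).reshape(3, 2, 2)

first = x[0]                    # the first 2x2 matrix
top_left = x[:, 0, 0]           # top-left element of each matrix
repeated = x[torch.tensor([0, 0, 1, 1])]
rows = x[torch.tensor([1, 2]), torch.tensor([0])]  # index tensors broadcast together

print(first.shape)     # torch.Size([2, 2])
print(top_left)        # tensor([1., 5., 9.])
print(repeated.shape)  # torch.Size([4, 2, 2])
print(rows)            # first row of the 2nd and 3rd matrices
print(x[0, 0, 0].item())  # 1.0, a plain Python float
```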

### Operations
PyTorch operations are very similar to those of `NumPy`. We can work with both scalars and other tensors.


In [None]:
# Create an example tensor
x = torch.ones((3, 2, 2))
x

In [None]:
# Perform elementwise addition
# Use - for subtraction
x + 2

In [None]:
# Perform elementwise multiplication
# Use / for division
x * 2

We can apply the same operations between different tensors of compatible sizes.


In [None]:
# Create a 4x3 tensor of 6s
a = torch.ones((4,3)) * 6
a

In [None]:
# Create a 1D tensor of 2s
b = torch.ones(3) * 2
b

In [None]:
# Divide a by b
a / b

In [None]:
a.shape, b.shape
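
The division above works because of broadcasting: shapes are compared from the last dimension backwards, and each pair of sizes must match or one of them must be `1`. The sketch below, with made-up tensors `col` and `bad`, shows one compatible and one incompatible case.

```python
import torch

a = torch.ones(4, 3) * 6
b = torch.ones(3) * 2        # shape (3,) is broadcast across the 4 rows
col = torch.ones(4, 1) * 3   # shape (4, 1) is broadcast across the 3 columns
bad = torch.ones(4)          # shape (4,) does not line up with the last dim of a

by_row = a / b
by_col = a / col
print(by_row.shape, by_col.shape)  # both torch.Size([4, 3])

broadcast_failed = False
try:
    a / bad
except RuntimeError:
    broadcast_failed = True
print("incompatible shapes raise an error:", broadcast_failed)
```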

We can use `tensor.matmul(other_tensor)` for matrix multiplication and `tensor.T` for transpose. Matrix multiplication can also be performed with `@`.

In [None]:
# Alternative to a.matmul(b)
# a @ b.T returns the same result since b is 1D tensor and the 2nd dimension
# is inferred
(a @ b).shape
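
As a worked check of the shapes involved in matrix multiplication, the sketch below multiplies the same `a` and `b`, plus a hypothetical `3x5` matrix `c` added for the example.

```python
import torch

a = torch.ones(4, 3) * 6
b = torch.ones(3) * 2
c = torch.ones(3, 5)     # hypothetical second matrix for illustration

mat_vec = a @ b          # (4, 3) @ (3,)   -> (4,)
mat_mat = a @ c          # (4, 3) @ (3, 5) -> (4, 5)

print(mat_vec.shape)     # torch.Size([4])
print(mat_vec)           # each entry is 3 * (6 * 2) = 36
print(mat_mat.shape)     # torch.Size([4, 5])
print(a.T.shape)         # torch.Size([3, 4])
```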

We can take the mean and standard deviation along a certain dimension with the methods `mean(dim)` and `std(dim)`. That is, if we want to get the mean `3x2` matrix in a `4x3x2` matrix, we would set the `dim` to be 0. We can call these methods with no parameter to get the mean and standard deviation for the whole tensor. To use `mean` and `std` our tensor should be a floating point type.

In [None]:
# Create an example tensor
m = torch.tensor(
    [
     [1., 1.],
     [2., 2.],
     [3., 3.],
     [4., 4.]
    ]
)

print("Mean: {}".format(m.mean()))
print("Mean in the 0th dimension: {}".format(m.mean(0)))
print("Mean in the 1st dimension: {}".format(m.mean(1)))

In [None]:
print("Standard deviation:", m.std().item())
print("Median:", m.median().item())
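
To make the effect of the `dim` argument concrete, here are the means of the same tensor `m` with their values worked out. The `keepdim=True` call is an extra shown for illustration; it keeps the reduced dimension with size `1`.

```python
import torch

m = torch.tensor([[1., 1.], [2., 2.], [3., 3.], [4., 4.]])

overall = m.mean()                 # average of all 8 elements
per_column = m.mean(0)             # collapses the 4 rows
per_row = m.mean(1)                # collapses the 2 columns
kept = m.mean(1, keepdim=True)     # same values, shape (4, 1)

print(overall)      # tensor(2.5000)
print(per_column)   # tensor([2.5000, 2.5000])
print(per_row)      # tensor([1., 2., 3., 4.])
print(kept.shape)   # torch.Size([4, 1])
```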

We can concatenate tensors using `torch.cat`.



In [None]:
# Concatenate in dimension 0 and 1
a_cat0 = torch.cat([a, a, a], dim=0)
a_cat1 = torch.cat([a, a, a], dim=1)

print("Initial shape: {}".format(a.shape))
print("Shape after concatenation in dimension 0: {}".format(a_cat0.shape))
print("Shape after concatenation in dimension 1: {}".format(a_cat1.shape))

Most of the operations in `PyTorch` are not in place. However, `PyTorch` offers in-place versions of many operations, available by adding an underscore (`_`) at the end of the method name.

In [None]:
# Print our tensor
a

In [None]:
# add() is not in place
a.add(a)
a

In [None]:
# add_() is in place
a.add_(a)
a
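
The difference is easier to see on a small tensor with known values. A minimal sketch, using a fresh `2x2` tensor so that it does not depend on the cells above:

```python
import torch

a = torch.ones(2, 2)

b = a.add(a)   # returns a new tensor; a itself is unchanged
print(a)       # still all 1s
print(b)       # all 2s

a.add_(a)      # modifies a in place
print(a)       # now all 2s
```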

## Autograd
`PyTorch` and other machine learning libraries are known for their automatic differentiation feature. That is, once we have defined the set of operations that need to be performed, the framework itself can figure out how to compute the gradients. We can call the `backward()` method to ask `PyTorch` to calculate the gradients, which are then stored in the `grad` attribute.

In [None]:
# Create an example tensor
# requires_grad=True tells PyTorch to track operations on x and store its gradient
x = torch.tensor([2.], requires_grad=True)

# Print the gradient if it is calculated
# Currently None since backward() has not been called yet
print(x.grad)

In [None]:
# Calculating the gradient of y with respect to x
y = x * x * 3 # 3x^2
y.backward()
x.grad # d(y)/d(x) = d(3x^2)/d(x) = 6x = 12

Let's run backprop from a different tensor again to see what happens.

In [None]:
z = x * x * 3 # 3x^2
z.backward()
x.grad

We can see that the `x.grad` is updated to be the sum of the gradients calculated so far. When we run backprop in a neural network, we sum up all the gradients for a particular neuron before making an update. This is exactly what is happening here! This is also the reason why we need to run `zero_grad()` in every training iteration (more on this later). Otherwise our gradients would keep building up from one training iteration to the other, which would cause our updates to be wrong.
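
The accumulation, and the effect of resetting the gradient, can be sketched on the same `3x^2` example. Here the gradient is zeroed directly with `x.grad.zero_()`; this stands in for the optimizer's `zero_grad()` call used later in the tutorial.

```python
import torch

x = torch.tensor([2.], requires_grad=True)

(3 * x * x).backward()
first = x.grad.clone()        # 6x = 12

(3 * x * x).backward()
accumulated = x.grad.clone()  # 12 + 12 = 24, gradients were summed

x.grad.zero_()                # reset the stored gradient in place
(3 * x * x).backward()
after_reset = x.grad.clone()  # back to 12

print(first, accumulated, after_reset)
```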

# Extended Content

## Neural Network Module

So far we have looked into the tensors, their properties and basic operations on tensors. These are especially useful to get familiar with if we are building the layers of our network from scratch. We will utilize these in Assignment 3, but moving forward, we will use predefined blocks in the `torch.nn` module of `PyTorch`. We will then put together these blocks to create complex networks. Let's start by importing this module with an alias so that we don't have to type `torch` every time we use it.

In [None]:
import torch.nn as nn

### **Linear Layer**
We can use `nn.Linear(H_in, H_out)` to create a linear layer. This will take a tensor of `(N, *, H_in)` dimensions and output a tensor of `(N, *, H_out)`. The `*` denotes that there could be an arbitrary number of dimensions in between. The linear layer performs the operation `Ax+b`, where `A` and `b` are initialized randomly. If we don't want the linear layer to learn the bias parameters, we can initialize our layer with `bias=False`.

In [None]:
# Create the inputs
input = torch.ones(2,3,4)
# (N, *, H_in) -> (N, *, H_out)


# Make a linear layer transforming (N, *, H_in) dimensional inputs to
# (N, *, H_out) dimensional outputs
linear = nn.Linear(4, 2)
linear_output = linear(input)
linear_output

In [None]:
list(linear.parameters()) # Ax + b
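
To connect the layer to the `Ax+b` formula, the sketch below recomputes its output by hand from the layer's `weight` and `bias` attributes. Note that `nn.Linear` stores the weight with shape `(H_out, H_in)`, so the manual version multiplies by its transpose. The names `inputs` and `manual` are chosen for this example.

```python
import torch
import torch.nn as nn

linear = nn.Linear(4, 2)
inputs = torch.ones(2, 3, 4)

print(linear.weight.shape)  # torch.Size([2, 4]), i.e. (H_out, H_in)
print(linear.bias.shape)    # torch.Size([2])

# Apply the affine map by hand to the last dimension of the input
manual = inputs @ linear.weight.T + linear.bias
layer_output = linear(inputs)

print(layer_output.shape)                    # torch.Size([2, 3, 2])
print(torch.allclose(layer_output, manual))  # True
```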

### **Other Module Layers**
There are several other preconfigured layers in the `nn` module. Some commonly used examples are `nn.Conv2d`, `nn.ConvTranspose2d`, `nn.BatchNorm1d`, `nn.BatchNorm2d`, `nn.Upsample` and `nn.MaxPool2d` among many others. We will learn more about these as we progress in the course. For now, the only important thing to remember is that we can treat each of these layers as plug and play components: we will be providing the required dimensions and `PyTorch` will take care of setting them up.

### **Activation Function Layer**
We can also use the `nn` module to apply activation functions to our tensors. Activation functions are used to add non-linearity to our network. Some examples of activation functions are `nn.ReLU()`, `nn.Sigmoid()` and `nn.LeakyReLU()`. Activation functions operate on each element separately, so the shape of the tensors we get as an output is the same as the shape of the ones we pass in.

In [None]:
linear_output

In [None]:
sigmoid = nn.Sigmoid()
output = sigmoid(linear_output)
output

### **Functional Form**

Activation functions can be applied as layers (like with `nn.Sigmoid`) or using the functional form via `torch.nn.functional`:

In [None]:
# Functional form: gives the same result as sigmoid(linear_output) above
torch.nn.functional.sigmoid(linear_output)

### **Putting the Layers Together**
So far we have seen that we can create layers and pass the output of one as the input of the next. Instead of creating intermediate tensors and passing them around, we can use `nn.Sequential`, which does exactly that.

In [None]:
block = nn.Sequential(
    nn.Linear(4, 2),
    nn.Sigmoid()
)

input = torch.ones(2,3,4)
output = block(input)
output

### Custom Modules

Instead of using the predefined modules, we can also build our own by extending the `nn.Module` class. For example, we can build `nn.Linear` (which also extends `nn.Module`) on our own using the tensors introduced earlier! We can also build new, more complex modules, such as a custom neural network. You will be practicing these in the later assignment.

To create a custom module, the first thing we have to do is to extend `nn.Module`. We can then initialize our parameters in the `__init__` function, starting with a call to the `__init__` function of the super class. All the class attributes we define which are `nn` module objects are registered as submodules, and their parameters can be learned during training. Plain tensors are not parameters, but they can be turned into parameters by wrapping them in the `nn.Parameter` class.

All classes extending `nn.Module` are also expected to implement a `forward(x)` function, where `x` is a tensor. This is the function that is called when an input is passed to our module, such as in `model(x)`.

In [None]:
import torch.nn as nn
class MultilayerPerceptron(nn.Module):
  def __init__(self, input_size, hidden_size):
    # Call to the __init__ function of the super class
    super().__init__()

    # Bookkeeping: Saving the initialization parameters
    self.input_size = input_size
    self.hidden_size = hidden_size

    # Defining of our model
    # There isn't anything specific about the naming of `self.model`. It could
    # be something arbitrary.
    self.model = nn.Sequential(
        nn.Linear(self.input_size, self.hidden_size),
        nn.ReLU(),
        nn.Linear(self.hidden_size, self.input_size),
        nn.Sigmoid()
    )

  def forward(self, x):
    output = self.model(x)
    return output

Here is an alternative way to define the same class. You can see that we can replace `nn.Sequential` by defining the individual layers in the `__init__` method and connecting them in the `forward` method.

In [None]:
class MultilayerPerceptron(nn.Module):
  def __init__(self, input_size, hidden_size):
    # Call to the __init__ function of the super class
    super(MultilayerPerceptron, self).__init__()

    # Bookkeeping: Saving the initialization parameters
    self.input_size = input_size
    self.hidden_size = hidden_size

    # Defining of our layers
    self.linear = nn.Linear(self.input_size, self.hidden_size) # Ax + b ==> A.shape? b.shape? (go from size 5 to 3)
    self.relu = nn.ReLU()
    self.linear2 = nn.Linear(self.hidden_size, self.input_size) # Ax + b ==> A.shape? b.shape? (go from size 3 to 5)
    self.sigmoid = nn.Sigmoid()

  def forward(self, x):
    linear = self.linear(x)
    relu = self.relu(linear)
    linear2 = self.linear2(relu)
    output = self.sigmoid(linear2)
    return output

<i><b style="color: red">Question: How many parameters are in this model?</b></i>

Now that we have defined our class, we can instantiate it and see what it does.

In [None]:
import torch

# Make a sample input
input = torch.randn(2, 5)

# Create our model
model = MultilayerPerceptron(5, 3)

# Pass our input through our model
model(input)

tensor([[0.5510, 0.4670, 0.5036, 0.3651, 0.4176],
        [0.3549, 0.4204, 0.4885, 0.4946, 0.4341]], grad_fn=<SigmoidBackward0>)

We can inspect the parameters of our model with `named_parameters()` and `parameters()` methods.

In [None]:
num_parameters = 0
for n, p in model.named_parameters():
  print(n, 'has', p.numel(), 'parameters')
  num_parameters += p.numel()
print('-' * 40)
print('total num parameters:', num_parameters)

linear.weight has 15 parameters
linear.bias has 3 parameters
linear2.weight has 15 parameters
linear2.bias has 5 parameters
----------------------------------------
total num parameters: 38
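
The counts printed above can be reproduced by hand: each linear layer has `in_features * out_features` weights plus `out_features` biases. A short arithmetic check for the sizes used here (`input_size=5`, `hidden_size=3`):

```python
input_size, hidden_size = 5, 3

# First layer: 5 -> 3
first_layer = input_size * hidden_size + hidden_size    # 15 weights + 3 biases
# Second layer: 3 -> 5
second_layer = hidden_size * input_size + input_size    # 15 weights + 5 biases

total = first_layer + second_layer
print(first_layer, second_layer, total)  # 18 20 38
```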


In [None]:
print(model)

MultilayerPerceptron(
  (linear): Linear(in_features=5, out_features=3, bias=True)
  (relu): ReLU()
  (linear2): Linear(in_features=3, out_features=5, bias=True)
  (sigmoid): Sigmoid()
)
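
Returning to the earlier point that a linear layer can be built from tensors wrapped in `nn.Parameter`, here is a minimal sketch. The class name `MyLinear`, the initialization scale, and the `scale` attribute are hypothetical choices for illustration; this is not how `nn.Linear` itself initializes its weights.

```python
import torch
import torch.nn as nn

class MyLinear(nn.Module):
    """Hypothetical hand-written linear layer, for illustration only."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # Wrapping tensors in nn.Parameter registers them as learnable parameters
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # A plain tensor attribute is NOT registered as a parameter
        self.scale = torch.ones(1)

    def forward(self, x):
        return x @ self.weight.T + self.bias

layer = MyLinear(5, 3)
names = [name for name, _ in layer.named_parameters()]
out = layer(torch.randn(2, 5))

print(names)      # ['weight', 'bias']
print(out.shape)  # torch.Size([2, 3])
```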


## Optimization
We have shown how gradients are calculated with the `backward()` function. Having the gradients isn't enough for our models to learn. We also need to know how to update the parameters of our models. This is where optimizers come in. The `torch.optim` module contains several optimizers that we can use. Some popular examples are `optim.SGD` and `optim.Adam`. When initializing an optimizer, we pass our model parameters, which can be accessed with `model.parameters()`, telling the optimizer which values it will be optimizing. Optimizers also have a learning rate (`lr`) parameter, which determines how big an update will be made in every step. Different optimizers have different hyperparameters as well.

In [None]:
import torch.optim as optim
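
To see what a single optimizer step does, the sketch below applies plain `optim.SGD` (no momentum) to one parameter. The toy loss `(w - 3)^2` and the learning rate of `0.1` are assumptions chosen so the arithmetic is easy to follow.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0], requires_grad=True)
sgd = optim.SGD([w], lr=0.1)

loss = ((w - 3) ** 2).sum()   # toy loss with its minimum at w = 3
loss.backward()               # d(loss)/dw = 2 * (1 - 3) = -4
grad_value = w.grad.item()

sgd.step()                    # w <- w - lr * grad = 1 - 0.1 * (-4) = 1.4
print(grad_value, w.item())
```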

Once we have our optimizer, we can define a `loss` that we want to optimize for. We can either define the loss ourselves, or use one of the predefined loss functions in `PyTorch`, such as `nn.BCELoss()`. Let's put everything together now! We will start by creating some dummy data.

In [None]:
import torch
# Create the y data
y = torch.ones(10, 5)

# Add some noise to our goal y to generate our x
# We want our model to predict our original data, despite the noise
x = y + torch.randn_like(y)
x

tensor([[-0.2214,  0.6889,  0.5493,  1.2362,  1.9521],
        [ 0.5472,  1.7272,  0.9165,  1.0015,  1.1384],
        [-0.1626,  0.7759,  0.1111,  2.1401,  0.2645],
        [ 1.4272,  1.7919,  1.3914,  0.4673,  1.6070],
        [ 0.9705,  1.5223,  1.0252,  2.3606,  1.0656],
        [ 2.2029,  0.7358, -0.3979,  2.0095, -0.7031],
        [ 1.6055,  0.0042, -0.8418, -1.1823,  3.0799],
        [ 0.5836,  1.7840,  0.8600,  1.2844,  1.0638],
        [ 1.8731, -0.1323, -0.1827,  0.6417, -0.2170],
        [ 0.8074,  1.4787,  0.6010,  0.4693, -0.0743]])

Now, we can define our model, optimizer and the loss function.

In [None]:
# Instantiate the model
model = MultilayerPerceptron(5, 3)

# Define the optimizer
adam = optim.Adam(model.parameters(), lr=1)

# Define loss using a predefined loss function
loss_function = nn.BCELoss()

# Calculate how our model is doing now
y_pred = model(x)
loss_function(y_pred, y).item()

0.6781231164932251
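
As a cross-check of what `nn.BCELoss()` computes with its default mean reduction, the sketch below compares it against the binary cross-entropy formula written out by hand. The probabilities in `y_prob` are made-up values, not outputs of the model above.

```python
import torch
import torch.nn as nn

y_true = torch.ones(4, 3)
y_prob = torch.tensor([[0.9, 0.8, 0.7]] * 4)  # hypothetical predicted probabilities

builtin = nn.BCELoss()(y_prob, y_true)

# Binary cross-entropy averaged over all elements
manual = -(y_true * torch.log(y_prob)
           + (1 - y_true) * torch.log(1 - y_prob)).mean()

print(builtin.item(), manual.item())
print(torch.allclose(builtin, manual))  # True
```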

Let's see if we can have our model achieve a smaller loss. Now that we have everything we need, we can set up our training loop.

In [None]:
# Set the number of epochs, which determines the number of training iterations
n_epoch = 10

for epoch in range(n_epoch):
  # Set the gradients to 0
  adam.zero_grad()

  # Get the model predictions
  y_pred = model(x)

  # Get the loss
  loss = loss_function(y_pred, y)

  # Print stats
  print(f"Epoch {epoch}: training loss: {loss}")

  # Compute the gradients
  loss.backward()

  # Take a step to optimize the weights
  adam.step()


Epoch 0: training loss: 0.6781231164932251
Epoch 1: training loss: 0.26641646027565
Epoch 2: training loss: 0.11468765139579773
Epoch 3: training loss: 0.0518057644367218
Epoch 4: training loss: 0.025344399735331535
Epoch 5: training loss: 0.013451475650072098
Epoch 6: training loss: 0.007681891787797213
Epoch 7: training loss: 0.004674314986914396
Epoch 8: training loss: 0.0030038156546652317
Epoch 9: training loss: 0.002023308305069804


In [None]:
list(model.parameters())

[Parameter containing:
 tensor([[-4.5308, -3.9733, -4.4686, -4.0116, -3.8651],
         [-3.9764, -4.0669, -4.4570, -4.2890, -3.8520],
         [-3.8711, -4.1119, -4.1699, -4.0015, -4.5175]], requires_grad=True),
 Parameter containing:
 tensor([-4.3073, -4.4077, -4.2221], requires_grad=True),
 Parameter containing:
 tensor([[4.2747, 3.7099, 4.0836],
         [3.8719, 4.4511, 3.6660],
         [3.7980, 4.4729, 3.7731],
         [4.3358, 3.7851, 4.6103],
         [3.8353, 3.6951, 3.7058]], requires_grad=True),
 Parameter containing:
 tensor([6.5959, 6.7022, 6.5133, 6.4282, 6.5640], requires_grad=True)]

You can see that our loss is decreasing. Let's check the predictions of our model now and see if they are close to our original `y`, which was all `1s`.

In [None]:
# See how our model performs on the training data
y_pred = model(x)
y_pred

tensor([[0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986]], grad_fn=<SigmoidBackward0>)

In [None]:
# Create test data and check how our model performs on it
x2 = y + torch.randn_like(y)
y_pred = model(x2)
y_pred

tensor([[0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986],
        [0.9986, 0.9988, 0.9985, 0.9984, 0.9986]], grad_fn=<SigmoidBackward0>)

Great! Looks like our model almost perfectly learned to filter out the noise from the `x` that we passed in!
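
When a model is only being evaluated, as in the last two cells, gradient tracking is not needed. A common pattern, sketched here on a stand-in `nn.Sequential` model with the same layer sizes rather than the trained model above, is to wrap the forward pass in `torch.no_grad()`.

```python
import torch
import torch.nn as nn

# Stand-in model with the same layer sizes as MultilayerPerceptron(5, 3)
model = nn.Sequential(
    nn.Linear(5, 3),
    nn.ReLU(),
    nn.Linear(3, 5),
    nn.Sigmoid(),
)
x_test = torch.ones(10, 5) + torch.randn(10, 5)

model.eval()               # switch to evaluation mode
with torch.no_grad():      # do not build the autograd graph
    y_eval = model(x_test)

print(y_eval.shape)          # torch.Size([10, 5])
print(y_eval.requires_grad)  # False, so there is no grad_fn on the output
```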