# PyTorch learning


In this notebook I work through a set of tutorials about PyTorch. I start with https://blog.paperspace.com/pytorch-101-understanding-graphs-and-automatic-differentiation/ and cover a series of 5 tutorials. When I switch to a new tutorial, I mention it explicitly.

## Autograd

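Let's create a tensor. Note that *torch.Tensor(3, 5)* only allocates memory without initializing it, so the values are arbitrary.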

In [None]:
import torch
from torch import nn

In [None]:
tsr = torch.Tensor(3, 5)
tsr

tensor([[4.0387e-35, 0.0000e+00, 3.7835e-44, 0.0000e+00,        nan],
        [0.0000e+00, 1.3733e-14, 6.4069e+02, 4.3066e+21, 1.1824e+22],
        [4.3066e+21, 6.3828e+28, 3.8016e-39, 0.0000e+00, 1.4394e-35]])

To make autograd track the operations on a tensor, its *requires_grad* attribute must be set to True. There are 2 ways to do it.

In [None]:
# Way number 1
tsr.requires_grad = True
# Way number 2
t1 = torch.randn((3, 3), requires_grad=True)

Let's implement a simple model

In [None]:
a = torch.randn((3, 3), requires_grad=True)

w1 = torch.randn((3, 3), requires_grad=True)
w2 = torch.randn((3, 3), requires_grad=True)
w3 = torch.randn((3, 3), requires_grad=True)
w4 = torch.randn((3, 3), requires_grad=True)

b = w1 * a
c = w2 * a
d = w3 * b + w4 * c
L = 10 - d
print("The grad fn for a is ", a.grad_fn)
print("the grad fn for d is ", d.grad_fn)
print("the grad fn for b is ", b.grad_fn)

The grad fn for a is  None
the grad fn for d is  <AddBackward0 object at 0x7f99a279ff98>
the grad fn for b is  <MulBackward0 object at 0x7f99a279ff28>


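All mathematical operations are implemented by the *torch.autograd.Function* class, which defines both the forward computation and its backward (gradient) counterpart. That is why *grad_fn* is set for *b* and *d*, which are results of operations, but is None for the leaf tensor *a*.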
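To make the role of *torch.autograd.Function* more concrete, here is a minimal sketch of a custom operation with a hand-written backward pass. The class name `Square` and the input values are assumptions chosen for illustration; they do not come from the tutorial.

```python
import torch

class Square(torch.autograd.Function):
    """Hypothetical example op: y = x * x with a hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        # Save the input; backward needs it to compute dy/dx = 2x
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Chain rule: upstream gradient times the local derivative
        return grad_output * 2 * x

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = Square.apply(x)
y.sum().backward()

print(y.grad_fn)  # the backward node that autograd created for Square
print(x.grad)     # tensor([2., 4., 6.])
```

Built-in operations such as `*` and `+` follow the same forward/backward pattern, which is where nodes like `MulBackward0` and `AddBackward0` come from.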

Without arguments, *backward* can be called only on a scalar value. If I call it on a non-scalar tensor such as *L*, I get an error

In [None]:
L.backward()

RuntimeError: ignored


This can be overcome by passing *backward* a gradient tensor with the same shape as *L*, for example a tensor of ones

In [None]:
L.backward(torch.ones(L.shape))
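As a quick check of what the tensor of ones means, the sketch below compares it with reducing the output to a scalar first. The variable names (`loss_a`, `loss_b`) and the smaller one-weight model are assumptions for illustration.

```python
import torch

a = torch.randn(3, 3, requires_grad=True)
w = torch.randn(3, 3, requires_grad=True)

# Variant 1: pass a tensor of ones as the upstream gradient
loss_a = 10 - w * a
loss_a.backward(torch.ones_like(loss_a))
grad_from_ones = a.grad.clone()

# Reset the accumulated gradient before the second variant
a.grad = None

# Variant 2: reduce to a scalar first, then call backward without arguments
loss_b = 10 - w * a
loss_b.sum().backward()
grad_from_sum = a.grad.clone()

print(torch.allclose(grad_from_ones, grad_from_sum))  # True
```

Both variants give the same gradient because summing the output weights every element by one.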

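## Difference from TensorFlow

Graphs are created dynamically in PyTorch, so until the function *forward* is called, no graph is created.

During each run of the forward function, the buffers of the non-leaf nodes are filled with the values needed to compute gradients. After *backward* runs, these buffers are freed by default, while the gradients are accumulated in the *.grad* attribute of the leaf nodes. Calling *backward* a second time on the same graph therefore raises an error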

In [None]:
L.backward(torch.ones(L.shape))

RuntimeError: ignored

This allows networks to be created on the fly, since the graph is built only when the forward function is explicitly called.
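A small sketch of what "on the fly" means in practice: the number of operations recorded in the graph can depend on the data. The function name `forward`, the loop bounds, and the input values are assumptions for illustration.

```python
import torch

def forward(x, w):
    # The number of multiplications depends on the values in h,
    # so different inputs can build different graphs
    steps = 0
    h = x
    while h.norm() < 10 and steps < 5:
        h = h * w
        steps += 1
    return h.sum(), steps

w = torch.tensor(2.0, requires_grad=True)
x = torch.ones(3)

loss, steps = forward(x, w)
loss.backward()

print(steps)   # 3 multiplications were recorded for this input
print(w.grad)  # tensor(36.), the derivative of 3 * w**3 at w = 2
```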

## Useful operators

*requires_grad*: set it to False for frozen layers, since their gradients aren't needed.

*no_grad*: when doing inference, there's no need to build the graph and, therefore, memory can be saved.
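Both tools can be sketched on a small model. The architecture and tensor sizes below are assumptions chosen only to keep the example short.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer: its parameters will not receive gradients
for p in model[0].parameters():
    p.requires_grad = False

out = model(torch.randn(5, 4))
out.sum().backward()
print(model[0].weight.grad)              # None, the layer is frozen
print(model[2].weight.grad is not None)  # True

# Inference: no graph is recorded inside the no_grad block
with torch.no_grad():
    pred = model(torch.randn(5, 4))
print(pred.requires_grad)  # False
```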

# Building Your First Neural Network

In this part I cover the topics from https://blog.paperspace.com/pytorch-101-building-neural-networks/

*nn.Module* represents an arbitrary function **f** in PyTorch. To create a network, 2 methods need to be overridden: *\__init__* and *forward*

*nn.Sequential* is another class which is widely used in PyTorch. It accepts a set of *nn.Module* objects and returns an *nn.Sequential* object, which applies them in order

In [None]:
import torch
from torch import nn
import numpy as np

In [None]:
combinedNetwork = nn.Sequential(nn.Linear(5, 10), nn.Linear(10, 20))
type(combinedNetwork)

torch.nn.modules.container.Sequential

In [None]:
network = combinedNetwork(torch.from_numpy(np.random.randn(5)).float())
type(network)

torch.Tensor

The benefit of *DataLoader* is that, with its default collate function, it automatically converts the batches it yields (for example NumPy arrays and Python numbers) to the Tensor type
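The conversion can be seen with a dataset that returns NumPy values. The class `RandomNumpyDataset` and its sizes are hypothetical and exist only for this sketch.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class RandomNumpyDataset(Dataset):
    """Hypothetical dataset whose items are NumPy values, not tensors."""

    def __init__(self, n_samples=8, n_features=5):
        rng = np.random.default_rng(0)
        self.x = rng.standard_normal((n_samples, n_features)).astype(np.float32)
        self.y = rng.integers(0, 2, size=n_samples)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Returns a NumPy array and a NumPy integer
        return self.x[idx], self.y[idx]

loader = DataLoader(RandomNumpyDataset(), batch_size=4)
xb, yb = next(iter(loader))

print(type(xb), xb.shape)  # <class 'torch.Tensor'> torch.Size([4, 5])
print(type(yb), yb.shape)  # <class 'torch.Tensor'> torch.Size([4])
```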

## Going deep with neural nets

In this part, I cover https://blog.paperspace.com/pytorch-101-advanced/

The main difference between *nn.Module* and *nn.functional* is the following. The former has state (since each layer needs to keep its weights and gradients). The functions from *nn.functional* are stateless: any weights must be passed in explicitly on every call.

Another interesting class is *nn.Parameter*. To demonstrate it, I define a simple net
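The stateful/stateless distinction can be sketched with a linear layer. The alias `F` and the tensor sizes are conventions and assumptions, not something the tutorial prescribes.

```python
import torch
from torch import nn
import torch.nn.functional as F

x = torch.randn(2, 10)

# Stateful: the module creates and owns its weight and bias
layer = nn.Linear(10, 5)
out_module = layer(x)

# Stateless: the function receives the weights explicitly on every call
out_functional = F.linear(x, layer.weight, layer.bias)

print(torch.allclose(out_module, out_functional))  # True
print(len(list(layer.parameters())))               # 2 (weight and bias)
```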

In [None]:
class Mynet(nn.Module):
  def __init__(self):
    super().__init__()
    self.linear = nn.Linear(10, 5)

  def forward(self, x):
    output = self.linear(x)
    return output

mynet = Mynet()
list(mynet.parameters())

[Parameter containing:
 tensor([[ 0.0605, -0.1577, -0.2564,  0.2327, -0.0998,  0.3030, -0.1706, -0.0659,
          -0.0222, -0.1855],
         [-0.1012,  0.1191, -0.1285, -0.0612, -0.2749,  0.0721, -0.1269, -0.1609,
          -0.2924,  0.0676],
         [ 0.2205, -0.0171,  0.2662, -0.3061, -0.1241,  0.3135, -0.1262, -0.0601,
          -0.0219,  0.2227],
         [-0.2299, -0.0571, -0.2580, -0.1033,  0.2799, -0.1813, -0.0012,  0.3117,
          -0.2663, -0.3112],
         [-0.2855, -0.0405,  0.0629, -0.1601,  0.3151,  0.1530, -0.0834,  0.1996,
           0.0059, -0.0091]], requires_grad=True), Parameter containing:
 tensor([ 0.2642, -0.2559, -0.0846, -0.1772,  0.2134], requires_grad=True)]

As you see above, the parameters of the linear layer are automatically recognized by the net as its parameters

Every *nn.Module* assigned as an attribute inside the *\__init__* function has its parameters registered with the net. The exception is plain tensors, such as those created by *torch.ones* and *torch.zeros*, which are not registered.

To make such a tensor show up, it must be wrapped in the *nn.Parameter* class

In [None]:
class Mynet(nn.Module):
  def __init__(self):
    super().__init__()
    self.linear = nn.Linear(10, 5)
    self.inner = nn.Parameter(torch.zeros(4))

  def forward(self, x):
    output = self.linear(x)
    return output

mynet = Mynet()
list(mynet.parameters())

[Parameter containing:
 tensor([0., 0., 0., 0.], requires_grad=True), Parameter containing:
 tensor([[-0.2888, -0.0802,  0.0872, -0.1590, -0.0287, -0.0796,  0.1052, -0.0085,
          -0.2639,  0.2807],
         [-0.1082,  0.0433, -0.2152, -0.0853,  0.2812, -0.0872,  0.2499, -0.3032,
           0.2299, -0.2259],
         [ 0.1409,  0.1870, -0.3132, -0.2750,  0.0386,  0.0013, -0.1540,  0.1911,
          -0.1353, -0.0429],
         [ 0.1782, -0.0319, -0.0177, -0.1158,  0.1186, -0.1488,  0.1651, -0.0716,
          -0.2456, -0.1220],
         [ 0.0268,  0.1888, -0.2030, -0.1096,  0.3083, -0.1071, -0.1529,  0.1707,
           0.0433,  0.2223]], requires_grad=True), Parameter containing:
 tensor([-0.2261,  0.1739,  0.1476, -0.0485,  0.1648], requires_grad=True)]
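The difference between a plain tensor and an *nn.Parameter* is easier to see with *named_parameters*, which also returns the attribute names. The class `Demo` and its attribute names are hypothetical.

```python
import torch
from torch import nn

class Demo(nn.Module):  # hypothetical class for illustration
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)
        self.plain = torch.zeros(4)                  # plain tensor: not registered
        self.wrapped = nn.Parameter(torch.zeros(4))  # registered as a parameter

    def forward(self, x):
        return self.linear(x)

names = [name for name, _ in Demo().named_parameters()]
print(names)  # ['wrapped', 'linear.weight', 'linear.bias']
```

Only registered parameters are returned by *parameters()*, so an optimizer built from it would never update `plain`.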

When a stack of layers is stored in a plain Python list, the layers are not registered by the model

In [None]:
layers_list = [nn.Linear(10, 5), nn.Linear(5, 15), nn.Linear(15, 20)]

class Mynet(nn.Module):
  def __init__(self):
    super().__init__()
    self.layers = layers_list

  def forward(self, x):
    # A plain list is not callable, so apply the layers one by one
    for layer in self.layers:
      x = layer(x)
    return x

mynet = Mynet()
list(mynet.parameters())

[]

To fix this, *nn.ModuleList* (for layers) or *nn.ParameterList* (for parameters) must be used

In [None]:
layers_list = [nn.Linear(10, 5), nn.Linear(5, 15), nn.Linear(15, 20)]
parameter_list = [nn.Parameter(torch.ones(10))]

class Mynet(nn.Module):
  def __init__(self):
    super().__init__()
    self.layers = nn.ModuleList(layers_list)
    self.params = nn.ParameterList(parameter_list)

  def forward(self, x):
    # nn.ModuleList is a container, not a callable: iterate over its layers
    for layer in self.layers:
      x = layer(x)
    return x

mynet = Mynet()
list(mynet.parameters())

[Parameter containing:
 tensor([[ 0.2256,  0.1057, -0.0938,  0.1719, -0.0162, -0.0553, -0.1997,  0.2098,
          -0.2369,  0.1588],
         [-0.0916,  0.1186,  0.2654, -0.0181, -0.1121,  0.1290, -0.0324, -0.0633,
          -0.0599,  0.2931],
         [ 0.0052, -0.2441, -0.0056, -0.1030,  0.2593, -0.0070,  0.1628, -0.3102,
          -0.2755,  0.2872],
         [ 0.2851,  0.2772, -0.1978, -0.2040,  0.1724,  0.0808,  0.1916, -0.2662,
           0.1573,  0.2082],
         [ 0.0880,  0.0807, -0.1373, -0.2930, -0.0022,  0.1389,  0.0883, -0.1949,
           0.3004,  0.1947]], requires_grad=True), Parameter containing:
 tensor([0.1116, 0.1571, 0.2895, 0.0630, 0.1353], requires_grad=True), Parameter containing:
 tensor([[ 0.3368, -0.2188,  0.0900, -0.1622, -0.1693],
         [-0.1741, -0.3960, -0.3861,  0.3227, -0.0122],
         [ 0.2166,  0.3882, -0.3045, -0.3232,  0.2660],
         [ 0.4345,  0.2395,  0.0025,  0.3183,  0.4440],
         [ 0.1576, -0.1098,  0.3615,  0.3684,  0.0667],
     

### Modules


We can also access all modules (layers) of the model using the *modules* function. The first element it returns is the model itself

In [None]:
class Mynet(nn.Module):
  def __init__(self):
    super().__init__()
    self.linear = nn.Linear(10, 5)
    self.inner = nn.Parameter(torch.zeros(4))

  def forward(self, x):
    output = self.linear(x)
    return output

mynet = Mynet()

list(mynet.modules())

[Mynet(
   (linear): Linear(in_features=10, out_features=5, bias=True)
 ), Linear(in_features=10, out_features=5, bias=True)]

A similar function to *modules* is *named_modules*, which returns an iterator over (name, module) pairs. The name of the root module is an empty string.

In [None]:
for x in mynet.named_modules():
  print(x[0], x[1], "\n------------------------------------")

 Mynet(
  (linear): Linear(in_features=10, out_features=5, bias=True)
) 
------------------------------------
linear Linear(in_features=10, out_features=5, bias=True) 
------------------------------------
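One common use of *named_modules* is to select layers by type, for instance to re-initialize them. The sketch below zeroes the biases of all linear layers; the network and the choice of initialization are assumptions for illustration.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

linear_names = []
for name, module in net.named_modules():
    # Select layers by type; the root module and the ReLU are skipped
    if isinstance(module, nn.Linear):
        nn.init.zeros_(module.bias)
        linear_names.append(name)

print(linear_names)  # ['0', '2']
```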


### Different Learning Rates for Different Layers


It's possible to specify different learning rates for different layers of the net by passing the optimizer a list of parameter groups, each with its own options

In [None]:
# One dict per parameter group; each group can carry its own lr and momentum
optimizer = torch.optim.SGD([{"params": mynet.linear.parameters(), "lr": 0.001, "momentum": 0.9}])
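With a single group the effect is hard to see, so here is a sketch with two groups, where one overrides the learning rate and the other falls back to the optimizer's default. The network and the rate values are assumptions.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

optimizer = torch.optim.SGD(
    [
        {"params": net[0].parameters(), "lr": 0.001},  # per-group override
        {"params": net[2].parameters()},               # uses the default lr below
    ],
    lr=0.01,
    momentum=0.9,
)

lrs = [group["lr"] for group in optimizer.param_groups]
print(lrs)  # [0.001, 0.01]
```

Options given as keyword arguments, such as `momentum` here, act as defaults for every group that does not set them.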

### Saving models

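It's possible to save whole models in PyTorch using the *torch.save* and *torch.load* functions. Since the model object is pickled, the class definition must be available when it is loaded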


In [None]:
torch.save(mynet, "mynet.pth")
mynet = torch.load("mynet.pth")
print(mynet)

Mynet(
  (linear): Linear(in_features=10, out_features=5, bias=True)
)


If we need only the weights, we can save them using the *state_dict()* method and restore them with *load_state_dict()*


In [None]:
torch.save(mynet.state_dict(), "net_state_dict.pth")
mynet.load_state_dict(torch.load("net_state_dict.pth"))

<All keys matched successfully>
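To check that the weights really survive the round trip, the sketch below loads a saved state dict into a freshly initialized model and compares the tensors. The helper `make_model`, the architecture, and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

import torch
from torch import nn

def make_model():
    # Hypothetical architecture; loading needs the same structure as saving
    return nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

trained = make_model()
restored = make_model()  # fresh instance with different random weights

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "net_state_dict.pth")
    torch.save(trained.state_dict(), path)
    restored.load_state_dict(torch.load(path))

same = all(
    torch.equal(p, q)
    for p, q in zip(trained.state_dict().values(), restored.state_dict().values())
)
print(same)  # True
```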
