# Introduction to PyTorch for deep learning

Following the tutorial available at:
https://heartbeat.fritz.ai/introduction-to-pytorch-for-deep-learning-5b437cea90ac

## Basics

First of all, import the library:

In [1]:
# (if not already available: pip install torch)

import torch

The basic objects are float tensors, for example:

In [2]:
torch.FloatTensor([[20, 30, 40], [90, 60, 70]])

tensor([[20., 30., 40.],
        [90., 60., 70.]])

They support math operations, for example:

In [3]:
x = torch.FloatTensor([25])
y = torch.FloatTensor([30])
x + y

tensor([55.])
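
The other arithmetic operators work element-wise in the same way. A minimal sketch, assuming the same two values as above; the variable names `diff`, `prod` and `ratio` are illustrative, and `torch.tensor` is used here as an alternative constructor to `torch.FloatTensor`:

```python
import torch

x = torch.tensor([25.0])
y = torch.tensor([30.0])

diff = y - x    # element-wise subtraction
prod = x * y    # element-wise multiplication
ratio = y / x   # element-wise division

print(diff, prod, ratio)
```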

Additionally, matrix definitions and operations work as expected, for example:

In [4]:
matrix = torch.randn(4, 5) # fill it with random floats
print(matrix)
print(matrix.t()) # transpose

tensor([[-1.2644, -0.0329, -0.4050, -1.7556,  1.9857],
        [ 0.7800, -1.2804,  0.9665, -1.2011, -2.5846],
        [ 0.2083,  1.6292,  0.6508,  0.9250, -0.3651],
        [-2.6477,  2.4488, -1.2462,  0.0196, -0.1552]])
tensor([[-1.2644,  0.7800,  0.2083, -2.6477],
        [-0.0329, -1.2804,  1.6292,  2.4488],
        [-0.4050,  0.9665,  0.6508, -1.2462],
        [-1.7556, -1.2011,  0.9250,  0.0196],
        [ 1.9857, -2.5846, -0.3651, -0.1552]])
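
As a further illustration of matrix operations, the sketch below multiplies a matrix by its own transpose with the `@` operator. The name `gram` is illustrative, and the tolerance passed to `torch.allclose` is an assumption chosen to absorb floating-point rounding:

```python
import torch

matrix = torch.randn(4, 5)

# (4, 5) @ (5, 4) -> (4, 4)
gram = matrix @ matrix.t()
print(gram.shape)

# A matrix times its own transpose is symmetric (up to rounding)
print(torch.allclose(gram, gram.t(), atol=1e-5))
```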


PyTorch uses "automatic differentiation" (`torch.autograd`) to compute, during the backward pass, the gradients that are then used to adjust the weights of a neural network. For example:

In [5]:
# set requires_grad=True to keep track of all operations on a tensor
a = torch.tensor([3.0, 2.0], requires_grad=True)
b = torch.tensor([4.0, 7.0])
ab_sum = a + b
print(ab_sum)

ab_res = (ab_sum*8).sum()
# method .backward() computes all gradients
ab_res.backward()
print(ab_res)
print(a.grad)

tensor([7., 9.], grad_fn=<AddBackward0>)
tensor(128., grad_fn=<SumBackward0>)
tensor([8., 8.])
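
In the example above, `ab_res` is the sum of `8 * (a_i + b_i)`, so its derivative with respect to each `a_i` is 8, which matches `a.grad`. The sketch below applies the same check to a different function and also shows that gradients accumulate across `.backward()` calls unless they are reset. The functions `f` and `g` are illustrative choices, not part of the tutorial:

```python
import torch

a = torch.tensor([3.0, 2.0], requires_grad=True)

# f(a) = sum(a_i ** 2), so df/da_i = 2 * a_i
f = (a ** 2).sum()
f.backward()
grad_f = a.grad.clone()
print(grad_f)  # tensor([6., 4.])

# A second backward() adds to a.grad instead of overwriting it:
# g(a) = sum(8 * a_i), so dg/da_i = 8
g = (a * 8).sum()
g.backward()
grad_accumulated = a.grad.clone()
print(grad_accumulated)  # tensor([14., 12.])

# Reset the gradient in place before the next computation
a.grad.zero_()
print(a.grad)  # tensor([0., 0.])
```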


## A simple neural network

The module `torch.nn` is used to describe the structure of the network. For example:

In [6]:
# basic config
N = 64 # batch size
D_in = 1000 # input dimension
H = 100 # hidden layer dimension
D_out = 10 # output dimension

x = torch.randn(N, D_in) # random input batch
y = torch.randn(N, D_out) # random target batch

In [7]:
# model definition
model = torch.nn.Sequential( # sequential stack of layers
    torch.nn.Linear(D_in, H), # linear transformation from input to hidden
    torch.nn.ReLU(), # ReLU activation applied to the hidden layer output
    torch.nn.Linear(H, D_out), # linear tr. from hidden to output
)
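
To see what this definition creates, the parameters of each layer can be listed. The following is a small sketch that rebuilds the same model so it runs on its own; `n_params` and `out` are illustrative names:

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# Each Linear layer holds a weight matrix and a bias vector;
# ReLU has no trainable parameters
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 1000*100 + 100 + 100*10 + 10 = 101110

# A batch of N inputs is mapped to N outputs of size D_out
out = model(torch.randn(N, D_in))
print(out.shape)  # torch.Size([64, 10])
```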

In [8]:
# training options
loss_fn = torch.nn.MSELoss() # mean squared error between prediction and target
learning_rate = 1e-4 # step size used by the optimizer
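
As a quick check of what `MSELoss` computes with its default settings, the sketch below compares it with a hand-written mean of squared differences. The tensors `pred` and `target` are made-up values for illustration:

```python
import torch

pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[0.0, 2.0], [3.0, 6.0]])

loss_fn = torch.nn.MSELoss()  # default reduction is "mean"
builtin = loss_fn(pred, target)

# Squared differences are 1, 0, 0, 4; their mean is 1.25
manual = ((pred - target) ** 2).mean()

print(builtin.item(), manual.item())  # 1.25 1.25
```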

The module `torch.optim` is then used to update the weights:

In [9]:
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate) # use Adam optimizer

In [10]:
for t in range(500): # 500 training steps on the full batch
    y_pred = model(x) # forward pass: predict y from input x

    loss = loss_fn(y_pred, y) # compute the loss for this step
    print(t, loss.item()) # print step index and loss value

    optimizer.zero_grad() # reset gradients left over from the previous step
    loss.backward() # compute gradients
    optimizer.step() # update weights

0 1.0009000301361084
1 0.9748170375823975
2 0.9495235681533813
3 0.925037145614624
4 0.9011276960372925
5 0.8778361082077026
6 0.855163037776947
7 0.8330785036087036
8 0.8115586042404175
9 0.7907435297966003
10 0.7705259919166565
11 0.7508348822593689
12 0.7317702770233154
13 0.713363528251648
14 0.6954378485679626
15 0.6779881119728088
16 0.6612109541893005
17 0.6448585987091064
18 0.6289124488830566
19 0.6134655475616455
20 0.5984372496604919
21 0.583889901638031
22 0.5697553157806396
23 0.5560014247894287
24 0.5426000356674194
25 0.5295177698135376
26 0.5167689919471741
27 0.5043398141860962
28 0.49222299456596375
29 0.48041102290153503
30 0.46893179416656494
31 0.4577582776546478
32 0.4468485116958618
33 0.4361775517463684
34 0.4257792532444
35 0.41559961438179016
36 0.40561121702194214
37 0.3958286643028259
38 0.3863048255443573
39 0.37695854902267456
40 0.3678199052810669
41 0.35888442397117615
42 0.3501546084880829
43 0.34161895513534546
44 0.3332899808883667
45 0.32511895895004

343 7.588611197206774e-07
344 7.213429853436537e-07
345 6.856245704511821e-07
346 6.516065695905127e-07
347 6.192112778080627e-07
348 5.883815674678772e-07
349 5.590342198047438e-07
350 5.310820938575489e-07
351 5.044905719842063e-07
352 4.791862693309668e-07
353 4.551048107259703e-07
354 4.3218977907599765e-07
355 4.1038646259039524e-07
356 3.8965620774433773e-07
357 3.699273065649322e-07
358 3.51164715084451e-07
359 3.3331863846797205e-07
360 3.163484620927193e-07
361 3.002048458711215e-07
362 2.8487013992162247e-07
363 2.702881261029688e-07
364 2.5641796241870907e-07
365 2.432403789498494e-07
366 2.307197348727641e-07
367 2.1881723455408064e-07
368 2.0751195961565827e-07
369 1.9676575391258666e-07
370 1.8656240285963577e-07
371 1.76868553580789e-07
372 1.676576601994384e-07
373 1.589144602576198e-07
374 1.506104325699198e-07
375 1.427199549652869e-07
376 1.3523769837320287e-07
377 1.2812839145226462e-07
378 1.2138288241203554e-07
379 1.1497423457740297e-07
380 1.0889958446114179e-07
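
The three calls that end each iteration of the training loop above (`zero_grad`, `backward`, `step`) can be unpacked by hand. The sketch below does this for plain SGD rather than Adam, because the SGD update rule is a single line; the small layer, the data shapes and the learning rate are assumptions made for illustration only:

```python
import copy
import torch

torch.manual_seed(0)
x = torch.randn(8, 3)
y = torch.randn(8, 1)
lr = 0.01
loss_fn = torch.nn.MSELoss()

layer_opt = torch.nn.Linear(3, 1)
layer_manual = copy.deepcopy(layer_opt)  # identical starting weights

# Version 1: let the optimizer apply the update
optimizer = torch.optim.SGD(layer_opt.parameters(), lr=lr)
optimizer.zero_grad()
loss_fn(layer_opt(x), y).backward()
optimizer.step()

# Version 2: apply the same update by hand, p <- p - lr * grad
layer_manual.zero_grad()
loss_fn(layer_manual(x), y).backward()
with torch.no_grad():  # the update itself must not be tracked by autograd
    for p in layer_manual.parameters():
        p -= lr * p.grad

same = all(
    torch.allclose(p, q)
    for p, q in zip(layer_opt.parameters(), layer_manual.parameters())
)
print(same)  # True
```

Adam follows the same three-call pattern but keeps running averages of the gradients internally, so its update cannot be written as one line.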

## Custom neural networks

Creating a subclass of `torch.nn.Module` allows us to define custom modules. For example:

In [11]:
# custom module
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H) # input to hidden
        self.linear2 = torch.nn.Linear(H, D_out) # hidden to output

    def forward(self, x):
        h_relu = self.linear1(x).clamp(min=0) # clamp(min=0) acts as a ReLU
        y_pred = self.linear2(h_relu)
        return y_pred
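
The `forward` method uses `.clamp(min=0)` where the earlier `Sequential` model used `torch.nn.ReLU()`. A short sketch with made-up values shows that the two give the same result:

```python
import torch

h = torch.tensor([-2.0, -0.5, 0.0, 1.5])

clamped = h.clamp(min=0)  # negative entries are replaced by 0
relu = torch.relu(h)      # functional form of torch.nn.ReLU

print(torch.equal(clamped, relu))  # True
```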

In [12]:
# nn config
N, D_in, H, D_out = 64, 1000, 100, 10 # parameters as before

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = TwoLayerNet(D_in, H, D_out) # this time the model is created by calling the class above

In [13]:
# training
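criterion = torch.nn.MSELoss() # mean squared error between prediction and target
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4) # plain SGD this time, same learning rate
for t in range(500):
    y_pred = model(x) # forward pass (calls TwoLayerNet.forward)

    loss = criterion(y_pred, y)
    print(t, loss.item())

    optimizer.zero_grad() # reset gradients
    loss.backward() # compute gradients
    optimizer.step() # update weights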

0 1.0113317966461182
1 1.0111998319625854
2 1.0110678672790527
3 1.0109360218048096
4 1.010804295539856
5 1.0106724500656128
6 1.0105407238006592
7 1.0104089975357056
8 1.0102773904800415
9 1.010145664215088
10 1.0100140571594238
11 1.0098824501037598
12 1.0097509622573853
13 1.0096193552017212
14 1.0094878673553467
15 1.0093563795089722
16 1.0092250108718872
17 1.0090936422348022
18 1.0089622735977173
19 1.0088310241699219
20 1.0086997747421265
21 1.008568525314331
22 1.0084372758865356
23 1.0083061456680298
24 1.008175015449524
25 1.008043885231018
26 1.0079127550125122
27 1.0077816247940063
28 1.00765061378479
29 1.0075196027755737
30 1.007388710975647
31 1.0072576999664307
32 1.007126808166504
33 1.0069959163665771
34 1.0068650245666504
35 1.0067341327667236
36 1.0066033601760864
37 1.0064724683761597
38 1.0063416957855225
39 1.0062110424041748
40 1.0060802698135376
41 1.00594961643219
42 1.0058189630508423
43 1.0056883096694946
44 1.0055577754974365
45 1.0054272413253784
46 1.0052

391 0.9623396396636963
392 0.9622203707695007
393 0.9621011018753052
394 0.9619818925857544
395 0.9618626832962036
396 0.9617434740066528
397 0.9616243243217468
398 0.9615052342414856
399 0.9613861441612244
400 0.9612670540809631
401 0.9611479640007019
402 0.9610289931297302
403 0.960909903049469
404 0.9607909321784973
405 0.9606720209121704
406 0.9605530500411987
407 0.9604341387748718
408 0.9603152275085449
409 0.9601963758468628
410 0.9600775241851807
411 0.9599587321281433
412 0.959839940071106
413 0.9597212076187134
414 0.9596024751663208
415 0.959483802318573
416 0.95936518907547
417 0.9592465758323669
418 0.9591279625892639
419 0.9590094089508057
420 0.9588908553123474
421 0.9587723612785339
422 0.9586538076400757
423 0.9585353136062622
424 0.9584169387817383
425 0.9582984447479248
426 0.9581800699234009
427 0.958061695098877
428 0.957943320274353
429 0.9578250050544739
430 0.95770663022995
431 0.9575884342193604
432 0.9574701189994812
433 0.9573518633842468
434 0.95723366737365
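
The two training logs differ sharply: with the same learning rate of 1e-4, the Adam run falls below 1e-6, while the SGD run is still near 0.96 after more than 400 steps. The sketch below is one way to compare the two optimizers from identical initial weights. The helper `train`, the seed and the reduced step count are assumptions for illustration, and the exact numbers will vary between runs and machines:

```python
import copy
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)


def train(model, optimizer, steps=200):
    """Run `steps` full-batch updates and return the final loss."""
    loss_fn = torch.nn.MSELoss()
    for _ in range(steps):
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():  # evaluate after the last update
        return loss_fn(model(x), y).item()


base = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
# Both runs start from the same weights
model_sgd = copy.deepcopy(base)
model_adam = copy.deepcopy(base)

sgd_loss = train(model_sgd, torch.optim.SGD(model_sgd.parameters(), lr=1e-4))
adam_loss = train(model_adam, torch.optim.Adam(model_adam.parameters(), lr=1e-4))
print("SGD:", sgd_loss, "Adam:", adam_loss)
```

Plain SGD would likely need a larger learning rate or many more steps to reach a comparable loss on this problem.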