![hslu_logo.png](img/hslu_logo.png)

## Week 01

<hr style="border:1px solid black">

# Exercise: Introduction to PyTorch
---
---

*PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. It is recognized as one of the two most popular machine learning libraries alongside TensorFlow, offering free and open-source software released under the modified BSD license. Although the Python interface is more polished and the primary focus of development, PyTorch also has a C++ interface.* [Wikipedia](https://en.wikipedia.org/wiki/PyTorch).

Because PyTorch handles arrays (tensors) much like numpy does, you can quickly familiarize yourself with PyTorch if you have a good knowledge of numpy.

### Import PyTorch and numpy

In [None]:
import torch
import numpy as np

### Handling of tensors
#### Convert from numpy

In [None]:
array_np = np.ones([3,2])
print(array_np)

array_py = torch.from_numpy(array_np)
print(array_py)
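
One detail worth knowing: `torch.from_numpy` does not copy the data, so the tensor and the numpy array share the same memory, and the tensor inherits the numpy dtype (`float64` for `np.ones`). The following minimal sketch illustrates this; the names `array_copy` and `back_np` are chosen for illustration only.

```python
import numpy as np
import torch

array_np = np.ones([3, 2])
array_py = torch.from_numpy(array_np)   # shares memory with array_np

# an in-place change of the numpy array is visible in the tensor
array_np[0, 0] = 5.0
print(array_py[0, 0].item())

# the dtype is inherited from numpy (float64 by default)
print(array_py.dtype)

# torch.tensor(...) creates an independent copy instead
array_copy = torch.tensor(array_np)
array_np[0, 0] = 7.0
print(array_copy[0, 0].item())   # still the old value
print(array_py[0, 0].item())     # follows the numpy array

# back to numpy (for CPU tensors again without copying)
back_np = array_py.numpy()
print(type(back_np))
```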

#### Create from scratch

In [None]:
x = torch.ones([3,2],dtype=torch.float)
print(x)

y = torch.rand([2,3],dtype=torch.float)
print(y)


#### Operation on tensors 

Observe the following multiplications. The multiplication sign `*` represents an element-wise multiplication (including possible broadcasting as in numpy), whereas `@` is a multiplication in the mathematical sense, i.e., a matrix multiplication.


In [None]:
print(x*y.T)
print(x@y)
print(y@x)
print(y.T@x.T)
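
To make the difference between the two operators more tangible, the following sketch prints the resulting shapes and shows one broadcasting case. The tensor `row` is a hypothetical example value that does not appear elsewhere in this exercise.

```python
import torch

x = torch.ones([3, 2], dtype=torch.float)
y = torch.rand([2, 3], dtype=torch.float)

# element-wise product: the shapes must be equal or broadcastable
print((x * y.T).shape)   # [3, 2] * [3, 2] -> [3, 2]

# matrix products: the inner dimensions must agree
print((x @ y).shape)     # [3, 2] @ [2, 3] -> [3, 3]
print((y @ x).shape)     # [2, 3] @ [3, 2] -> [2, 2]

# broadcasting: the [1, 2] row is stretched along the first dimension of x
row = torch.tensor([[10.0, 20.0]])
scaled = x * row
print(scaled)

# the transpose of a product is the reversed product of the transposes
same = torch.allclose((x @ y).T, y.T @ x.T)
print(same)
```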


#### Indexing

Indexing of torch tensors works as in numpy.

In [None]:
print(y[:,0])
print(y[1,:])
print(y > 0.5)
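
The boolean tensor produced by a comparison such as `y > 0.5` can itself be used as an index. The sketch below uses fixed example values instead of random ones so that the result is reproducible; the names `mask`, `selected` and `col` are illustrative.

```python
import torch

y = torch.tensor([[0.1, 0.7, 0.3],
                  [0.9, 0.2, 0.6]])

mask = y > 0.5        # boolean tensor of the same shape as y
selected = y[mask]    # 1-D tensor with the selected elements (row by row)
print(selected)

# masked assignment: set all elements above 0.5 to zero (in place)
y[mask] = 0.0
print(y)

# slices are views: changing the slice changes y as well
col = y[:, 0]
col[0] = -1.0
print(y[0, 0].item())
```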

#### Assignment / Copy

Care has to be taken, as in numpy, when assigning a tensor `x` to a variable `y`. This does *not* create a deep copy: `y` refers to the same tensor, so changes to the elements of `y` will also change `x`.

In [None]:
x = torch.ones([3,2],dtype=torch.float)
y = torch.rand([2,3],dtype=torch.float)
y = x
print(x)
print(y)
y[0,0] = 2
print(x)
print(y)

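To obtain a deep copy, use the in-place method `copy_`, which copies the values of `x` into an existing tensor `y` of matching shape. `y` then remains a separate tensor with its own elements.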

In [None]:
x = torch.ones([3,2],dtype=torch.float)
y = torch.rand([3,2],dtype=torch.float)
y.copy_(x)
print(x)
print(y)
y[0,0] = 2
print(x)
print(y)
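
As an alternative sketch, `clone()` allocates the new tensor and copies the values in one step, so no target tensor has to be created beforehand. The name `z` is illustrative; `data_ptr()` returns the memory address of the first element and is used here only to show which tensors share storage.

```python
import torch

x = torch.ones([3, 2], dtype=torch.float)

y = x            # plain assignment: y is another name for the same tensor
z = x.clone()    # clone: new tensor with its own memory

y[0, 0] = 2
print(x[0, 0].item())   # changed through y
print(z[0, 0].item())   # unaffected

# compare the memory addresses of the underlying data
print(x.data_ptr() == y.data_ptr())
print(x.data_ptr() == z.data_ptr())
```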

### Autograd functionality
PyTorch can automatically keep track of the gradients with respect to selected tensors. In the following example, PyTorch tracks the gradients with respect to `W` and `b` when calculating the result `cost`. The gradients are computed by calling `cost.backward()` and are then available in the `grad` member of the respective tensor.

#### Automatic gradient using autograd

In [None]:
x = torch.tensor([[1,2,3]],dtype=torch.double).T
#declare W and b as tensors that require gradients
W = torch.randn([2,3],dtype=torch.double, requires_grad=True) 
b = torch.randn([2,1],dtype=torch.double, requires_grad=True) 

#calculate a function called 'cost'
a = W@x + b
cost = a.T@a
print(cost.item())

#now call backward() on 'cost' to determine the gradients of W and b
cost.backward()

#print the result
print(W.grad.numpy())
print(b.grad.numpy())

It is important to note that the `grad` member is not cleared automatically before a call to `backward()`. Thus, successive calls will accumulate the gradients.

In [None]:
#repeat the same calculation 
a = W@x + b
cost = a.T@a
print(cost.item())

#now call backward() on 'cost' to determine the gradients of W and b
cost.backward()

#print the result
print(W.grad.numpy())
print(b.grad.numpy())

#it will differ and increase with each call

Thus, the `grad` member of `W` and `b` must be reset before each call to `backward()`:

In [None]:
#repeat the same calculation 
a = W@x + b
cost = a.T@a
print(cost.item())

#clear the grad entry of W and b
W.grad = None
b.grad = None

#now call backward() on 'cost' to determine the gradients of W and b
cost.backward()

#print the result
print(W.grad.numpy())
print(b.grad.numpy())

#now the results are again identical to those of the first call
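
The accumulation and the effect of the reset can also be checked programmatically. The following self-contained sketch uses a hypothetical helper `compute_cost` and resets the gradients in place with `zero_()`, which is an alternative to assigning `None`.

```python
import torch

x = torch.tensor([[1, 2, 3]], dtype=torch.double).T
W = torch.randn([2, 3], dtype=torch.double, requires_grad=True)
b = torch.randn([2, 1], dtype=torch.double, requires_grad=True)

def compute_cost():
    # hypothetical helper: same calculation as in the cells above
    a = W @ x + b
    return a.T @ a

# first call: reference gradient
compute_cost().backward()
grad_first = W.grad.clone()

# second call without reset: the gradient is accumulated (doubled)
compute_cost().backward()
doubled = torch.allclose(W.grad, 2 * grad_first)
print(doubled)

# reset in place and repeat: the reference gradient is recovered
W.grad.zero_()
b.grad.zero_()
compute_cost().backward()
recovered = torch.allclose(W.grad, grad_first)
print(recovered)
```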

A further problem may arise once you want to update the values of `W` and `b` in place. Tensors that require gradients are restricted in their use: an in-place operation on such a tensor raises an error. You can always wrap such statements in a `with torch.no_grad():` block in order to suppress the gradient tracking.

In [None]:
#the in-place update (W += ...) raises a RuntimeError as long as the statement 'with torch.no_grad():' is commented out
#to fix it, uncomment the following line and indent the two update statements below it
#with torch.no_grad():
W += W.grad
W -= W.grad
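
A typical use of `torch.no_grad()` is a gradient descent update. The sketch below is only an illustration under assumptions: the value of `learning_rate` and the number of steps are chosen arbitrarily and are not part of the exercise.

```python
import torch

x = torch.tensor([[1, 2, 3]], dtype=torch.double).T
W = torch.randn([2, 3], dtype=torch.double, requires_grad=True)
b = torch.randn([2, 1], dtype=torch.double, requires_grad=True)
learning_rate = 0.01   # hypothetical value

costs = []
for step in range(5):
    # forward pass
    a = W @ x + b
    cost = a.T @ a
    costs.append(cost.item())

    # reset the gradients, then determine them for the current cost
    W.grad = None
    b.grad = None
    cost.backward()

    # in-place update of W and b without gradient tracking
    with torch.no_grad():
        W -= learning_rate * W.grad
        b -= learning_rate * b.grad

# the cost decreases with every step
print(costs)
```

After the loop, `W` and `b` still have `requires_grad=True`, so they can be used for further gradient calculations.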

#### Manual verification of gradient determination

For comparison we determine the gradient manually. Recall that 
$$
cost = \mathbf{a}^T \cdot \mathbf{a} = (\mathbf{W} \cdot \mathbf{x} + \mathbf{b})^T \cdot (\mathbf{W} \cdot \mathbf{x} + \mathbf{b}) 
$$
Thus it is straightforward to show that:
$$
\frac{\partial}{\partial \mathbf{W}} cost = 2 \cdot (\mathbf{W} \cdot \mathbf{x} + \mathbf{b}) \cdot \mathbf{x}^T = 2\cdot \mathbf{a} \cdot \mathbf{x}^T
$$
$$
\frac{\partial}{\partial \mathbf{b}} cost = 2 \cdot (\mathbf{W} \cdot \mathbf{x} + \mathbf{b})= 2\cdot \mathbf{a}
$$
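One way to see this is to write the expressions component-wise (the index names $i, j, k, l$ are introduced here for illustration only). With
$$
\mathbf{a}^T \cdot \mathbf{a} = \sum_i a_i^2, \qquad a_i = \sum_j W_{ij} \, x_j + b_i
$$
the chain rule gives, since $\partial a_i / \partial W_{kl} = \delta_{ik} \, x_l$ and $\partial a_i / \partial b_k = \delta_{ik}$,
$$
\frac{\partial}{\partial W_{kl}} \sum_i a_i^2 = \sum_i 2 \, a_i \, \delta_{ik} \, x_l = 2 \, a_k \, x_l, \qquad \frac{\partial}{\partial b_k} \sum_i a_i^2 = \sum_i 2 \, a_i \, \delta_{ik} = 2 \, a_k
$$
which are the entries of $2\cdot \mathbf{a} \cdot \mathbf{x}^T$ and $2\cdot \mathbf{a}$, respectively.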

In [None]:
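with torch.no_grad():
    #gradient with respect to W: outer product 2*a*x^T (shape [2,3])
    W_grad = 2*a@x.T
    #gradient with respect to b: 2*a (shape [2,1])
    b_grad = 2*a
    print(W_grad.numpy())
    print(b_grad.numpy())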
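
Instead of comparing the printed numbers by eye, the agreement with autograd can be checked with `torch.allclose`. This is a self-contained sketch; the names `W_grad_manual`, `b_grad_manual`, `W_matches` and `b_matches` are chosen for illustration.

```python
import torch

x = torch.tensor([[1, 2, 3]], dtype=torch.double).T
W = torch.randn([2, 3], dtype=torch.double, requires_grad=True)
b = torch.randn([2, 1], dtype=torch.double, requires_grad=True)

# gradients from autograd
a = W @ x + b
cost = a.T @ a
cost.backward()

# gradients from the manual formulas
with torch.no_grad():
    W_grad_manual = 2 * a @ x.T   # outer product, shape [2, 3]
    b_grad_manual = 2 * a         # shape [2, 1]

W_matches = torch.allclose(W.grad, W_grad_manual)
b_matches = torch.allclose(b.grad, b_grad_manual)
print(W_matches, b_matches)
```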

#### Numeric gradient determination

Finally, again for comparison, we determine the gradient numerically using finite differences.

In [None]:
eps = 1e-7
#no_grad: the finite differences only need values, no autograd graph
with torch.no_grad():
    #forward difference for each element of W
    W_grad = torch.zeros(W.shape,dtype=torch.double)
    for row in range(0, W.shape[0]):
        for col in range(0, W.shape[1]):
            dw = torch.zeros(W.shape,dtype=torch.double)
            dw[row,col] = eps
            a_eps = (W+dw)@x + b
            cost_eps = a_eps.T@a_eps
            W_grad[row,col] = (cost_eps - cost).item()/eps

    print(W_grad)

    #forward difference for each element of b
    b_grad = torch.zeros(b.shape,dtype=torch.double)
    for row in range(0, b.shape[0]):
        db = torch.zeros(b.shape,dtype=torch.double)
        db[row,0] = eps
        a_eps = W@x + b + db
        cost_eps = a_eps.T@a_eps
        b_grad[row,0] = (cost_eps - cost).item()/eps

    print(b_grad)
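
The forward difference used above has an error that grows linearly with `eps`. As a variation, the following sketch uses a central difference, which is exact for a quadratic cost up to round-off, and compares the result with autograd. The helper `cost_fn`, the name `W_grad_num` and the tolerance are assumptions made for this illustration.

```python
import torch

x = torch.tensor([[1, 2, 3]], dtype=torch.double).T
W = torch.randn([2, 3], dtype=torch.double, requires_grad=True)
b = torch.randn([2, 1], dtype=torch.double, requires_grad=True)

def cost_fn(W_val, b_val):
    # hypothetical helper: returns the cost as a Python float
    a = W_val @ x + b_val
    return (a.T @ a).item()

# reference gradient from autograd
a = W @ x + b
(a.T @ a).backward()

# central difference for each element of W
eps = 1e-6
W_grad_num = torch.zeros(W.shape, dtype=torch.double)
with torch.no_grad():
    for row in range(W.shape[0]):
        for col in range(W.shape[1]):
            dw = torch.zeros(W.shape, dtype=torch.double)
            dw[row, col] = eps
            diff = cost_fn(W + dw, b) - cost_fn(W - dw, b)
            W_grad_num[row, col] = diff / (2 * eps)

close = torch.allclose(W_grad_num, W.grad, atol=1e-5)
print(close)
```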