## <center> CSCI 6516 : PyTorch Intro </center>

This is a tutorial on PyTorch basics. These notes are just a starting point; you are encouraged to explore further. Ideally, use this notebook as a reference that you keep updating as you learn new things.

<b>Credits:</b> The content presented in this tutorial is what I have learned from my teachers, friends, and the Python community on Stack Overflow.


- PyTorch vs Numpy
- Compute Graph
  * Directed Acyclic Graph
  * Nodes having no parent nodes - Leaf Nodes in PyTorch [Inputs, Parameters, Expected Outputs]

In [None]:
import torch
from torch import nn
import torch.nn.functional as F
print(torch.__version__)

import numpy as np

2.1.0+cu121


## Tensor - basic unit of representation

* Tensor:
    * [Formal] An nth-rank tensor in m-dimensional space is a mathematical object that has n indices and m^n components and obeys certain transformation rules.
    * [Informal] An n-dimensional matrix.

In [None]:
torch.FloatTensor(np.random.randn(10,5))

tensor([[-1.3327, -1.0318,  1.0795, -0.7287,  0.9532],
        [-1.2961,  0.2298,  0.5695,  0.1179,  0.9346],
        [ 1.4082, -0.8558, -0.7098, -0.5607, -1.0040],
        [ 0.1338,  1.4050, -0.0794, -0.1851, -0.2972],
        [-0.3347, -1.4563, -0.4792,  0.6049, -1.1870],
        [-0.0572, -1.2766,  1.9515,  0.8081,  0.2033],
        [-0.7248, -0.2511, -0.3619, -1.2093,  0.0409],
        [-0.4883,  0.1910,  1.4972, -0.0207, -0.8428],
        [-1.4065, -1.3832, -1.0319,  1.3215, -0.1393],
        [-0.0472, -0.5751, -0.1017,  0.5011,  0.2971]])
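As a small aside on the PyTorch vs NumPy relationship, the sketch below shows two common ways of turning a NumPy array into a tensor. The variable names (a, t_shared, t_copy) are illustrative and not part of the original notebook.

```python
import numpy as np
import torch

a = np.random.randn(10, 5)                      # NumPy defaults to float64

t_shared = torch.from_numpy(a)                  # shares memory with `a`, keeps float64
t_copy = torch.tensor(a, dtype=torch.float32)   # independent float32 copy

a[0, 0] = 42.0                                  # modify the NumPy array in place
print(t_shared[0, 0].item())                    # 42.0, because the memory is shared
print(t_shared.dtype, t_copy.dtype)             # torch.float64 torch.float32
```

Sharing memory avoids a copy, but it also means in-place edits on either side are visible on the other.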

## Size and Stride of Tensor

In [None]:
x = torch.LongTensor(
    [
       [
           [1,2],
           [3,4]
       ],

       [
           [5,6],
           [7,8]
       ],

       [
           [9,10],
           [11,12]
       ]
    ])

x_2d = torch.LongTensor([[1,2],
                         [3,4]])

In [None]:
x_2d.shape

torch.Size([2, 2])

In [None]:
x_2d.stride()

(2, 1)
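One way to read the stride: it says how many elements to skip in the underlying flat storage to move one step along each dimension. The helper element_via_stride below is hypothetical and only meant to illustrate that arithmetic for a contiguous 2D tensor.

```python
import torch

x_2d = torch.tensor([[1, 2],
                     [3, 4]])

# For a contiguous tensor, reshape(-1) exposes the values in storage order.
flat = x_2d.reshape(-1)

def element_via_stride(t, flat_values, i, j):
    # Hypothetical helper: offset = storage_offset + i * stride[0] + j * stride[1]
    s0, s1 = t.stride()
    return flat_values[t.storage_offset() + i * s0 + j * s1].item()

print(element_via_stride(x_2d, flat, 1, 0))  # 3, same as x_2d[1, 0]
```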

<b> Reshape/View </b>

In [None]:
x.reshape(-1)

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [None]:
x.reshape(6,-1)

tensor([[ 1,  2],
        [ 3,  4],
        [ 5,  6],
        [ 7,  8],
        [ 9, 10],
        [11, 12]])

<b> Transpose </b>

.t (2-D tensors only), .transpose (swaps two dimensions) and .permute (reorders all dimensions)

In [None]:
x.stride()

(4, 2, 1)

In [None]:
x.permute(2,1,0).stride()

(1, 2, 4)
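The stride output above suggests that a transpose only changes the bookkeeping, not the data. A minimal sketch of that idea, rebuilding a tensor with the same values as x (the names a and b are illustrative):

```python
import torch

x = torch.arange(1, 13).reshape(3, 2, 2)   # same values as x above

a = x.transpose(0, 2)     # swap the first and last dimensions
b = x.permute(2, 1, 0)    # the same reordering, written as a full permutation

print(torch.equal(a, b))              # True
print(a.stride())                     # (1, 2, 4)
print(a.data_ptr() == x.data_ptr())   # True: a view over the same memory
print(a.is_contiguous())              # False: strides no longer match row-major order
```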

<b> Mechanics of Transpose and Reshape </b>

.contiguous, .storage, .data_ptr

In [None]:
x_2d.t().contiguous().stride()

(2, 1)

In [None]:
x_2d.t().contiguous().storage().data_ptr()==x_2d.storage().data_ptr()

False
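The result above (a different data pointer after .contiguous) hints at the practical difference between .view and .reshape. The following is a small sketch under the assumption of a transposed 2x2 tensor; the flag view_failed is just an illustrative variable.

```python
import torch

x_2d = torch.tensor([[1, 2],
                     [3, 4]])
xt = x_2d.t()                      # non-contiguous view, stride (1, 2)

# .view never copies, so it refuses layouts it cannot express with strides alone.
try:
    xt.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True

# .reshape falls back to copying when a view is not possible.
flat = xt.reshape(-1)

print(view_failed)                          # True
print(flat)                                 # tensor([1, 3, 2, 4])
print(flat.data_ptr() == x_2d.data_ptr())   # False: new memory was allocated
```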

#### Exercise

Reshape x into a 2D matrix such that column 1 has [1,2,3,4]; column 2 has [5,6,7,8] and column 3 has [9,10,11,12].

Challenge: Can you give another way of achieving the same?

In [None]:
x

## Arithmetic operations

<b> Addition/Subtraction </b>

In [None]:
x

tensor([[[ 1,  2],
         [ 3,  4]],

        [[ 5,  6],
         [ 7,  8]],

        [[ 9, 10],
         [11, 12]]])

In [None]:
x + 1

tensor([[[ 2,  3],
         [ 4,  5]],

        [[ 6,  7],
         [ 8,  9]],

        [[10, 11],
         [12, 13]]])

In [None]:
x + x

tensor([[[ 2,  4],
         [ 6,  8]],

        [[10, 12],
         [14, 16]],

        [[18, 20],
         [22, 24]]])

<b> Multiplication </b>

In [None]:
x_2d

tensor([[1, 2],
        [3, 4]])

In [None]:
x_2d * 5

tensor([[ 5, 10],
        [15, 20]])

In [None]:
x_2d / x_2d

tensor([[1., 1.],
        [1., 1.]])

In [None]:
x_2d @ x_2d

tensor([[ 7, 10],
        [15, 22]])

<b> Division </b>

In [None]:
x_2d.dtype

torch.int64

In [None]:
x_2d/3

tensor([[0.3333, 0.6667],
        [1.0000, 1.3333]])

In [None]:
(x_2d/x_2d).dtype

torch.float32
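Since / always performs true division and promotes integer tensors to floating point, integer division has to be requested explicitly. A short sketch, assuming the same x_2d as above:

```python
import torch

x_2d = torch.tensor([[1, 2],
                     [3, 4]])

true_div = x_2d / 3                                      # promoted to float32
floor_div = torch.div(x_2d, 3, rounding_mode="floor")    # stays int64

print(true_div.dtype)    # torch.float32
print(floor_div)         # tensor([[0, 0], [1, 1]])
print(floor_div.dtype)   # torch.int64
```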

<b> Broadcasting </b> <br>
Broadcasting automatically expands one or both arguments of an operation so that their shapes match.
* Dimensions of size 1 (and missing leading dimensions) are virtually replicated to match the other tensor; no data is copied.
* From the NumPy documentation:
<pre>
When operating on two arrays, NumPy compares their shapes element-wise. It starts with
the trailing dimensions, and works its way forward. Two dimensions are compatible when:
    * they are equal, or
    * one of them is 1
</pre>
* If the shapes are not compatible, an error is raised.
* Not every operator supports broadcasting.
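The compatibility rule quoted above can be sketched in a few lines of plain Python. The function broadcast_shape is a hypothetical helper written for illustration; torch.broadcast_shapes is the library call it is checked against.

```python
import torch

def broadcast_shape(a, b):
    # Hypothetical helper: compare shapes from the trailing dimension backwards.
    result = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1   # a missing leading dim behaves like 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"incompatible dimensions {da} and {db}")
        result.append(max(da, db))
    return tuple(reversed(result))

print(broadcast_shape((3, 2, 2), (1, 2)))                 # (3, 2, 2)
print(broadcast_shape((2, 2), (2, 1)))                    # (2, 2)
print(tuple(torch.broadcast_shapes((3, 2, 2), (1, 2))))   # (3, 2, 2)
```

Shapes such as (3, 2) and (3,) fail the check, because the trailing dimensions 2 and 3 differ and neither is 1.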

In [None]:
x_2d

tensor([[1, 2],
        [3, 4]])

In [None]:
y = np.array([[1,2]])
print(x_2d.shape,y.shape)
x_2d + y

torch.Size([2, 2]) (1, 2)


tensor([[2, 4],
        [4, 6]])

In [None]:
y = np.array([[1],[2]])
print(x_2d.shape, y.shape)
x_2d + y

torch.Size([2, 2]) (2, 1)


tensor([[2, 3],
        [5, 6]])

In [None]:
y = np.array([[1,2]])
print(x.shape, y.shape)
x * y

torch.Size([3, 2, 2]) (1, 2)


tensor([[[ 1,  4],
         [ 3,  8]],

        [[ 5, 12],
         [ 7, 16]],

        [[ 9, 20],
         [11, 24]]])

In [None]:
y = np.array([[1],[2]])
print(x_2d.shape, y.shape)
x_2d * y

torch.Size([2, 2]) (2, 1)


tensor([[1, 2],
        [6, 8]])

In [None]:
y = np.array([[1,2]])
print(x_2d.shape, y.shape)
x_2d / y

torch.Size([2, 2]) (1, 2)


tensor([[1., 1.],
        [3., 2.]], dtype=torch.float64)

In [None]:
y = np.array([[1],[2]])
print(x_2d.shape, y.shape)
x_2d / y

torch.Size([2, 2]) (2, 1)


tensor([[1.0000, 2.0000],
        [1.5000, 2.0000]], dtype=torch.float64)

<b>Perils of broadcasting</b>

In [None]:
p = torch.FloatTensor([[1],[2],[3]]).view(-1)
q = torch.FloatTensor([4,5,6])
print(p.shape,q.shape)
p+q

torch.Size([3]) torch.Size([3])


tensor([5., 7., 9.])
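The cell above flattens p with .view(-1) before adding. To see why that matters, here is a sketch of what happens when the flattening is forgotten; the squared-error "loss" is an assumed example, not something from the original notebook.

```python
import torch

p = torch.tensor([[1.], [2.], [3.]])   # shape (3, 1), e.g. a model output
q = torch.tensor([4., 5., 6.])         # shape (3,),  e.g. the targets

r = p + q
print(r.shape)                         # torch.Size([3, 3]): silently broadcast, no error

# The same mistake inside a loss gives a plausible-looking but wrong scalar.
loss_wrong = ((p - q) ** 2).mean()            # averages over 9 pairwise differences
loss_right = ((p.view(-1) - q) ** 2).mean()   # averages over 3 matching pairs
print(loss_wrong.item(), loss_right.item())
```

Printing shapes before combining tensors, as the cells above do, is a cheap guard against this.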

## Idea of dim

In [None]:
y = torch.LongTensor([[1,2,5],
                      [3,4,6]])

In [None]:
y.sum(dim=1)

tensor([ 8, 13])

In [None]:
y.sum(dim=0)

tensor([ 4,  6, 11])

In [None]:
y.sum(dim=0,keepdims=True).shape

torch.Size([1, 3])

In [None]:
x_2d

tensor([[1, 2],
        [3, 4]])

In [None]:
F.softmax(x_2d.float(),dim=0)

tensor([[0.1192, 0.1192],
        [0.8808, 0.8808]])
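A quick way to check which direction dim refers to is to sum the softmax output along the same dim and confirm that it gives ones. A minimal sketch (the names col and row are illustrative):

```python
import torch
import torch.nn.functional as F

x_2d = torch.tensor([[1., 2.],
                     [3., 4.]])

col = F.softmax(x_2d, dim=0)   # normalises down each column
row = F.softmax(x_2d, dim=1)   # normalises across each row

print(col.sum(dim=0))   # each column sums to 1
print(row.sum(dim=1))   # each row sums to 1
```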

**Unary Tensor functions**

sum, mean, max, min, etc

In [None]:
x_2d

In [None]:
x_2d.sum()

In [None]:
x_2d.sum(dim=1)

In [None]:
x_2d.sum(dim=0)

In [None]:
x_2d.sum(dim=1,keepdims=True)

In [None]:
# Manual softmax along dim=0; matches F.softmax(x_2d.float(),dim=0)
x_f = x_2d.float()
torch.exp(x_f)/torch.exp(x_f).sum(dim=0,keepdims=True)

tensor([[0.1192, 0.1192],
        [0.8808, 0.8808]])
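A few more reductions in the same spirit, as a sketch on the same 2x2 values. Note that mean is only defined for floating-point tensors, and that max along a dim returns both values and indices.

```python
import torch

x_2d = torch.tensor([[1, 2],
                     [3, 4]])

print(x_2d.sum())                   # tensor(10)
print(x_2d.float().mean(dim=0))     # tensor([2., 3.]); convert to float first

values, indices = x_2d.max(dim=1)   # row-wise maximum and its column position
print(values)                       # tensor([2, 4])
print(indices)                      # tensor([1, 1])
print(x_2d.min())                   # tensor(1)
```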

<b> Exercise </b> <br>
Find the sum of each of the 2x2 matrices of x.

In [None]:
x

Other important/useful functions:
- torch.log
- torch.abs
- torch.exp
- torch.cat
- torch.stack
- torch.where
- F.leaky_relu, F.sigmoid, F.tanh, F.conv*, etc
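Of the functions listed above, torch.cat, torch.stack and torch.where are the ones most often confused, so here is a brief sketch of how they differ. The tensors a and b are made up for the example.

```python
import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])

joined = torch.cat([a, b], dim=0)      # joins along an existing dim
stacked = torch.stack([a, b], dim=0)   # creates a new leading dim
masked = torch.where(a > 2, a, torch.zeros_like(a))   # element-wise select

print(joined.shape)    # torch.Size([4, 2])
print(stacked.shape)   # torch.Size([2, 2, 2])
print(masked)          # tensor([[0, 0], [3, 4]])
```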

## Autograd

Derivatives: grad,  retain_graph, create_graph

In [None]:
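x = torch.FloatTensor([1]).requires_grad_(True)
y = torch.FloatTensor([2]).requires_grad_(True)
f = x**3 + y**2

# First derivatives; retain_graph=True keeps the graph alive for later grad calls
print(torch.autograd.grad(f,x,retain_graph=True)[0])  # df/dx = 3x^2
print(torch.autograd.grad(f,y,retain_graph=True)[0])  # df/dy = 2y

# create_graph=True records the gradient computation so it can be differentiated again
df_dx = torch.autograd.grad(f,x,retain_graph=True,create_graph=True)[0]

# Second derivative: d2f/dx2 = 6x
torch.autograd.grad(df_dx,x)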

tensor([3.])
tensor([4.])


(tensor([6.]),)
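To see what retain_graph is protecting against, the sketch below calls torch.autograd.grad twice without it. It also shows the more common .backward() route, which accumulates into .grad. The flag second_call_failed is an illustrative variable.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
f = x ** 3

(g1,) = torch.autograd.grad(f, x)   # the graph's saved tensors are freed after this call
try:
    torch.autograd.grad(f, x)       # second pass over the same, now freed, graph
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

print(g1)                   # tensor([3.])
print(second_call_failed)   # True

# .backward() stores the gradient on the leaf tensor instead of returning it.
f2 = x ** 3
f2.backward()
print(x.grad)               # tensor([3.])
```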

Exercise: Taylor series, Jacobian-vector products and vector-Jacobian products
- Fill out the code to compute the Jacobian matrix.
- Confirm that the Jacobian matrix is computed correctly by constructing a linear approximation of the following network.
- Finally, fill out the jvp method to compute Jacobian-vector products directly, without computing the entire Jacobian matrix. Confirm the correctness of this method.

In [None]:
class Act(nn.Module):
  def __init__(self):
    super().__init__()

  def forward(self, x):
    return (x)

class Net(nn.Module):
  def __init__(self,num_dims=3):
    super().__init__()
    torch.random.manual_seed(23)

    self.num_dims = num_dims
    self.net = nn.Sequential(
        nn.Linear(num_dims,1024),
        Act(),
        nn.Linear(1024,num_dims+2)
    )

  def forward(self, x):
    return self.net(x)

  def jacobian(self, x):
    # Per-sample Jacobian of shape (batch, out_dims, num_dims)
    x = x.detach().requires_grad_(True)  # autograd needs x to track gradients
    y = self.forward(x)
    jacobian_matrix = torch.zeros(x.shape[0], y.shape[1], self.num_dims)
    for column in range(y.shape[1]):
      # Samples are independent, so summing over the batch still yields per-sample gradients
      gradients = torch.autograd.grad(y[:, column].sum(), x, retain_graph=True)[0]
      jacobian_matrix[:, column, :] = gradients
    return jacobian_matrix


  def jvp(self, x, v):
    # write code here
    pass

In [None]:
f = Net()
x = torch.arange(9).float().reshape(3,3)
f.jacobian(x)

In [None]:
delta = 0.1*torch.ones_like(x)
# First-order Taylor check: f(x + delta) should be close to f(x) + J(x) @ delta
linear_term = (f.jacobian(x) @ delta.unsqueeze(-1)).squeeze(-1)  # (batch, out_dims)
torch.square(f(x+delta) - (f(x) + linear_term)).sum()
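For checking your own jvp implementation, PyTorch ships reference routines in torch.autograd.functional. The sketch below uses them on a small stand-in network; net, its layer sizes, and the vectors v and u are assumptions made for this example and are not the Net class from the exercise.

```python
import torch
from torch import nn
from torch.autograd.functional import jacobian, jvp, vjp

torch.manual_seed(0)
# Hypothetical small network, used only to keep the example self-contained.
net = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 5))

x = torch.arange(3).float()   # a single input of shape (3,)
v = torch.ones(3)             # direction in input space
u = torch.ones(5)             # direction in output space

J = jacobian(net, x)          # full Jacobian, shape (5, 3)
_, jv = jvp(net, x, v)        # Jacobian-vector product J @ v, shape (5,)
_, uJ = vjp(net, x, u)        # vector-Jacobian product u @ J, shape (3,)

print(torch.allclose(jv, J @ v, atol=1e-5))
print(torch.allclose(uJ, u @ J, atol=1e-5))
```

Neither jvp nor vjp materialises J, which is the point of the last part of the exercise.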

<b> Leaf Tensors and Optimizers </b>

In [None]:
w = torch.tensor([[1.0]]).zero_().requires_grad_(True)
optim = torch.optim.SGD([w],lr=0.1)
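A possible continuation of the cell above, as a sketch: the optimizer updates the leaf tensor w in place using the gradients that backward() stores in w.grad. The quadratic loss and its target value 3.0 are assumptions chosen for illustration.

```python
import torch

w = torch.zeros(1, 1, requires_grad=True)   # a leaf tensor: created directly, not by an op
optim = torch.optim.SGD([w], lr=0.1)

print(w.is_leaf)         # True
print((w * 2).is_leaf)   # False: the result of an operation on a tensor that requires grad

for _ in range(100):
    optim.zero_grad()                # clear gradients accumulated in w.grad
    loss = ((w - 3.0) ** 2).sum()    # hypothetical objective with its minimum at w = 3
    loss.backward()                  # populate w.grad
    optim.step()                     # in-place update: w <- w - lr * w.grad

print(w.item())          # close to 3.0
```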