<a href="https://colab.research.google.com/github/reban87/PyTorch-Basics/blob/main/PyTorch_beginners.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## PyTorch Crash Course
### Overview
1. Tensor Basics
    - Create, Operations, NumPy, GPU Support
2. Autograd
    - Linear Regression Example
3. Training Loop with Model, Loss, and Optimizer
    - A typical PyTorch training pipeline
4. Neural Network
    - Also: GPU, Datasets, DataLoaders, Transforms, and Evaluation
5. CNN
    - Save / Load Model

## 1. Tensors
Everything in PyTorch is based on tensor operations. A tensor is a multi-dimensional matrix containing elements of a single data type:

In [10]:
import torch

# torch.empty(size): allocates memory without initializing it (values are arbitrary)
x = torch.empty(1) # 1-element vector (shape [1], not a 0-dim scalar)
print("empty(1):", x)
x = torch.empty(3) # vector
print("empty(3):", x)
x = torch.empty(2, 3) # matrix
print("empty(2,3):", x)
x = torch.empty(2, 2, 3) # tensor, 3 dimensions
print("empty(2,2,3):", x)
x = torch.empty(2,2,3,4) # tensor 4 dimensions
print("empty(2,2,3,4):", x)

# torch.rand(size): random numbers sampled uniformly from [0, 1)
x = torch.rand(1)
print("rand(1):", x)
x = torch.rand(3)
print("rand(3):", x)
x = torch.rand(2,3)
print("rand(2,3):", x)

# torch.zeros(size): fill with 0
# torch.ones(size): fill with 1
x = torch.zeros(5,3)
print("zeros(5,3):", x)
x = torch.ones(5,3)
print("ones(5,3):", x)

empty(1): tensor([5.5294e+12])
empty(3): tensor([7.3061e-16, 3.1476e-41, 0.0000e+00])
empty(2,3): tensor([[2.2300e+18, 3.1472e-41, 2.0852e+18],
        [3.1472e-41, 1.1210e-43, 0.0000e+00]])
empty(2,2,3): tensor([[[7.2980e-16, 3.1476e-41, 0.0000e+00],
         [1.4013e-45, 8.9683e-44, 0.0000e+00]],

        [[1.1210e-43, 0.0000e+00, 7.1007e-16],
         [3.1476e-41, 0.0000e+00, 0.0000e+00]]])
empty(2,2,3,4): tensor([[[[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
          [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
          [ 1.4013e-45,  0.0000e+00,  0.0000e+00,  0.0000e+00]],

         [[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
          [ 0.0000e+00,  0.0000e+00,  8.4078e-45,  0.0000e+00],
          [ 1.4013e-45,  0.0000e+00, -1.7014e+38,  1.1515e-40]]],


        [[[ 2.6905e-43,  0.0000e+00,  1.1210e-43,  0.0000e+00],
          [ 7.2899e-16,  3.1476e-41,  5.5294e+12,  4.3362e-41],
          [ 3.1389e-43,  0.0000e+00,  3.5873e-43,  0.0000e+00]],
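The values printed for `torch.empty` above are whatever happened to be in memory, so they differ between runs and machines. Here is a small sketch, using only the creation functions shown above plus the in-place `fill_`. It checks the guarantees that the other functions do make.

```python
import torch

# torch.empty makes no promise about its contents, so overwrite it before reading.
e = torch.empty(2, 3)
e.fill_(7.0)  # in-place fill: every element is now 7.0
print(e)

# torch.rand samples uniformly from the half-open interval [0, 1).
r = torch.rand(1000)
print(r.min().item() >= 0.0, r.max().item() < 1.0)

# zeros/ones are fully initialized and use the default dtype (float32).
z = torch.zeros(5, 3)
o = torch.ones(5, 3)
print(z.sum().item(), o.sum().item(), z.dtype)
```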

   

In [21]:
# Check the size of the Tensor
print(x.size())
print("shape:",x.shape)

torch.Size([5, 3])
shape: torch.Size([5, 3])
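`.size()` and `.shape` are two spellings of the same thing. The sketch below also shows how to read a single dimension and how to count dimensions and elements. It assumes a `ones(5, 3)` tensor like the one printed above.

```python
import torch

x = torch.ones(5, 3)

# .size() and .shape return the same torch.Size object (a subclass of tuple).
print(x.size() == x.shape)    # True
print(x.size(0), x.shape[1])  # 5 3  (length of a single dimension)
print(x.ndim)                 # 2    (number of dimensions)
print(x.numel())              # 15   (total number of elements)
```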


In [24]:
# Check the data type
print(x.dtype)

# specify the dtype explicitly (the default is float32)
x = torch.ones(5,3, dtype=torch.float16)
print(x)

print(x.dtype)

torch.float32
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float16)
torch.float16


In [26]:
# construct from data
x = torch.tensor([5.5, 3])
print(x, x.dtype)

tensor([5.5000, 3.0000]) torch.float32
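`torch.tensor` infers the dtype from the Python data, which is why the example above ends up as `float32`. A short sketch of that inference and of converting between dtypes follows. The comments assume the default floating-point dtype has not been changed with `torch.set_default_dtype`.

```python
import torch

# dtype inference from Python data
print(torch.tensor([1, 2, 3]).dtype)      # torch.int64 (all integers)
print(torch.tensor([5.5, 3]).dtype)       # torch.float32 (any float -> default float dtype)
print(torch.tensor([True, False]).dtype)  # torch.bool

# .to(dtype) and shortcuts like .long() return a converted copy
x = torch.tensor([5.5, 3])
h = x.to(torch.float16)
i = x.long()       # float -> int64 truncates toward zero
print(h.dtype, i)  # torch.float16 tensor([5, 3])
```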


In [30]:
# requires_grad argument
# This tells PyTorch that it will need to calculate the gradients for this tensor
# later in your optimization steps,
# i.e. this is a variable in your model that you want to optimize.
# Conversely, a tensor that should stay fixed during training (e.g. a non-learnable
# positional encoding) keeps the default requires_grad=False.
x = torch.tensor([5.5, 4], requires_grad=True)
print(x)

tensor([5.5000, 4.0000], requires_grad=True)
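The sketch below makes the trainable-versus-fixed distinction concrete. The names `w` and `pos` are illustrative only: `w` stands for a trainable parameter and `pos` for a hypothetical fixed table. Only tensors created with `requires_grad=True` receive a gradient, and only floating-point (or complex) tensors are allowed to require one.

```python
import torch

w = torch.tensor([5.5, 4.0], requires_grad=True)  # trainable (illustrative)
pos = torch.tensor([0.0, 1.0])                     # hypothetical fixed table, requires_grad=False

out = (w * pos).sum()
out.backward()
print(w.grad)    # tensor([0., 1.]): d(out)/dw equals pos
print(pos.grad)  # None: no gradient is tracked for pos

# Integer tensors cannot require gradients.
try:
    torch.tensor([1, 2], requires_grad=True)
except RuntimeError as err:
    print("RuntimeError:", err)
```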


### Operations with Tensors

In [36]:
# Operations
x = torch.ones(2,2)
y = torch.rand(2,2)

# element-wise addition
z = x + y
# equivalent: z = torch.add(x, y)

# in-place addition: every method with a trailing underscore is an in-place operation,
# i.e. it modifies the tensor it is called on (here y)
# z = y.add_(x)

print(x)
print(y)
print(z)

tensor([[1., 1.],
        [1., 1.]])
tensor([[0.1514, 0.8546],
        [0.6015, 0.1006]])
tensor([[1.1514, 1.8546],
        [1.6015, 1.1006]])
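The commented-out `y.add_(x)` line above behaves differently from `torch.add`. The sketch below shows the difference, using fixed values instead of `torch.rand` so the result is predictable. The in-place version overwrites `y`; the out-of-place version leaves its inputs alone.

```python
import torch

x = torch.ones(2, 2)
y = torch.tensor([[0.5, 1.5], [2.5, 3.5]])

z = torch.add(x, y)    # out-of-place: returns a new tensor, y is unchanged
print(y[0, 0].item())  # 0.5

y.add_(x)              # in-place (trailing underscore): y itself is modified
print(y[0, 0].item())  # 1.5
print(torch.equal(y, z))  # True: same values as the out-of-place result
```

Note that `add_` also returns the modified tensor. So in `z = y.add_(x)`, `z` and `y` refer to the same tensor.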


In [37]:
# subtraction
z = x - y
z = torch.sub(x, y)

# element-wise multiplication (not matrix multiplication)
z = x * y
z = torch.mul(x,y)

# division
z = x / y
z = torch.div(x,y)
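All of the operators in the cell above work element-wise. In particular, `*` and `torch.mul` are not matrix multiplication. The sketch below contrasts the two, using the `@` operator / `torch.matmul` for the matrix product and fixed values in place of random ones.

```python
import torch

x = torch.ones(2, 2)
y = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

elementwise = x * y   # same as torch.mul(x, y): [[1, 2], [3, 4]]
matrix = x @ y        # same as torch.matmul(x, y): [[4, 6], [4, 6]]
print(elementwise)
print(matrix)
print(torch.equal(matrix, torch.matmul(x, y)))  # True
print(y / y)          # element-wise division: all ones
```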

In [42]:
# Slicing
x = torch.rand(5,3)
print(x)
print("x[:,0]", x[:,0]) # all rows, column 0
print("x[0, :]", x[0, :]) # row 0, all columns
print("x[1,1]", x[1,1]) # row 1, column 1 (still a tensor)

# .item() returns the value as a Python number; it only works if the tensor has exactly one element
print("x[1,1].item()", x[1,1].item())

tensor([[0.2326, 0.7650, 0.1902],
        [0.7890, 0.1312, 0.7895],
        [0.6212, 0.9069, 0.2796],
        [0.2609, 0.1997, 0.6165],
        [0.1663, 0.0283, 0.0931]])
x[:,0] tensor([0.2326, 0.7890, 0.6212, 0.2609, 0.1663])
x[0, :] tensor([0.2326, 0.7650, 0.1902])
x[1,1] tensor(0.1312)
x[1,1].item() 0.13119006156921387
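Two details of the slicing cell above are worth checking directly. First, basic slicing returns a view that shares memory with the original tensor. Second, `.item()` refuses multi-element tensors. A minimal sketch, using `.clone()` to get an independent copy:

```python
import torch

x = torch.rand(5, 3)

col = x[:, 0]      # basic slicing returns a view sharing memory with x
col.fill_(0.0)     # modifying the view ...
print(x[:, 0])     # ... also changes x: all zeros

col_copy = x[:, 1].clone()  # .clone() makes an independent copy
col_copy.fill_(-1.0)
print((x[:, 1] >= 0).all().item())  # True: x is unchanged

# .item() only works for tensors with exactly one element.
try:
    x[0, :].item()
except RuntimeError as err:
    print("RuntimeError:", err)
```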


### NumPy
Converting a PyTorch tensor to a NumPy array and vice versa is straightforward.

In [45]:
a = torch.ones(5)
print(a)

# torch to NumPy with .numpy()
b = a.numpy()
print(b)
print(type(b))

tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
<class 'numpy.ndarray'>


In [46]:
# Careful: If the Tensor is on the CPU (not the GPU),
# both objects will share the same memory location, so changing one
# will also change the other
a.add_(1)
print(a)
print(b)

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]
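The memory sharing shown above is not always what you want, and `.numpy()` also refuses some tensors. The sketch below shows two things. Cloning before the conversion gives an independent array. The chain `.detach().cpu().numpy()` works both for tensors that require grad and for GPU tensors; calling `.numpy()` directly on those raises an error.

```python
import torch

a = torch.ones(3)

# .numpy() shares memory; convert a clone if you need an independent array.
b_shared = a.numpy()
b_copy = a.clone().numpy()
a.add_(1)
print(b_shared, b_copy)  # [2. 2. 2.] [1. 1. 1.]

# A tensor that requires grad must be detached first (and a GPU tensor moved to the CPU).
w = torch.ones(3, requires_grad=True)
try:
    w.numpy()
except RuntimeError as err:
    print("RuntimeError:", err)
arr = w.detach().cpu().numpy()
print(arr)  # [1. 1. 1.]
```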


In [48]:
# numpy to torch with .from_numpy(x), or torch.tensor() to copy it
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
c = torch.tensor(a)
print(a)
print(b)
print(c)

# again, be careful when modifying: b shares memory with a, while c is an independent copy
a += 1
print(a)
print(b)
print(c)

[1. 1. 1. 1. 1.]
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
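The output above shows that arrays created by NumPy arrive as `float64`, because that is NumPy's default float type. The sketch below confirms the memory sharing with `np.shares_memory`. It also converts to `float32`, the dtype PyTorch uses by default for floats. `.float()` has to produce a new tensor here, so the result no longer tracks the NumPy array.

```python
import numpy as np
import torch

a = np.ones(5)               # NumPy defaults to float64
b = torch.from_numpy(a)      # shares memory with a, keeps float64
print(b.dtype)               # torch.float64
print(np.shares_memory(a, b.numpy()))  # True

b32 = b.float()              # float64 -> float32 creates a copy
a += 1
print(b32.dtype, b32[0].item())  # torch.float32 1.0 (the copy did not change)
print(b[0].item())               # 2.0 (the shared tensor did)
```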


### GPU Support
By default, all tensors are created on the CPU. We can move them to the GPU (if one is available) or create them directly on the GPU.

In [49]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [55]:
x = torch.rand(2,2).to(device) # move the tensor to the selected device (the GPU if available)
# x = x.to("cpu")  # move it back to the CPU
print(x)

x = torch.rand(2,2, device=device)  # or create it directly on the device
print(x)

tensor([[0.9995, 0.6914],
        [0.7331, 0.2176]], device='cuda:0')
tensor([[0.9149, 0.6454],
        [0.4184, 0.7588]], device='cuda:0')
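A device-agnostic version of the pattern above runs unchanged on machines with or without a GPU. The sketch below follows the same `device` selection as the earlier cell. Operands of an operation generally have to live on the same device; mixing a CPU tensor with a GPU tensor typically raises a `RuntimeError`. Results are moved back with `.cpu()`, for example before calling `.numpy()`.

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.rand(2, 2, device=device)  # created directly on the device
b = torch.ones(2, 2).to(device)      # moved after creation
c = a + b                            # both operands are on the same device

c_cpu = c.cpu()                      # back to the CPU
print(c.device.type, c_cpu.device)   # "cuda cpu" on a GPU machine, "cpu cpu" otherwise
print(c_cpu.numpy().shape)           # (2, 2)
```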


## 2. Autograd

The autograd package provides automatic differentiation for all operations on tensors. Generally speaking, `torch.autograd` is an engine for computing vector-Jacobian products: it computes partial derivatives by applying the chain rule.
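What "vector-Jacobian product" means in practice: when an output `y` is not a scalar, `y.backward(v)` needs a vector `v` with the same shape as `y`. It then accumulates `v^T J` into `x.grad`, where `J` is the Jacobian of `y` with respect to `x`. Here is a minimal sketch with `y = x ** 2`, whose Jacobian is diagonal with entries `2 * x`. The weighting vector `v` is an arbitrary illustrative choice.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2                          # Jacobian dy/dx = diag(2 * x)

v = torch.tensor([0.1, 1.0, 0.01])  # arbitrary weighting vector, same shape as y
y.backward(v)                       # x.grad = v^T J = v * 2 * x
print(x.grad)                       # approximately [0.2, 4.0, 0.06]

# For a scalar output, backward() implicitly uses v = 1: the ordinary gradient.
x2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
(x2 ** 2).sum().backward()
print(x2.grad)                      # tensor([2., 4., 6.])
```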

To track operations on a tensor, set `requires_grad=True`:

In [56]:
import torch
# requires_grad = True -> tracks all operations on the tensor.
x = torch.randn(3, requires_grad=True) # torch.randn samples from the standard normal distribution
y = x + 2

# y was created as a result of an operation, so it has a grad_fn attribute.
# grad_fn: references a Function that has created the Tensor
print(x) # created by the user -> grad_fn is None
print(y)
print(y.grad_fn)

tensor([ 1.4742,  1.7714, -0.1978], requires_grad=True)
tensor([3.4742, 3.7714, 1.8022], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x78df5984bb20>


In [57]:
# Do more operations on y
z = y * y * 3
print(z)
z = z.mean()
print(z)

tensor([36.2093, 42.6714,  9.7439], grad_fn=<MulBackward0>)
tensor(29.5415, grad_fn=<MeanBackward0>)


In [58]:
# Let's compute the gradients with backpropagation
# When we finish our computation we can call .backward() and have all the gradients computed automatically.
# The gradient is accumulated into the .grad attribute of the tensor.
# It is the partial derivative of the function w.r.t. that tensor.

print(x.grad)
z.backward()
print(x.grad) # dz/dx

# !!! Careful !!! backward() accumulates (adds) the gradient into the .grad attribute on every call.
# !!! During optimization, reset it every iteration, e.g. with optimizer.zero_grad() !!!

None
tensor([6.9483, 7.5429, 3.6044])
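For this graph, `z = mean(3 * y^2)` with `y = x + 2` over three elements, so `z = sum(y^2)` and `dz/dx_i = 2 * y_i`. Doubling the printed `y` values above reproduces the printed `x.grad` up to display rounding. The sketch below checks that relation and also shows the accumulation warning in action: a second backward pass adds to `.grad` instead of replacing it. The helper `compute_z` is a hypothetical name that simply rebuilds the same graph. Here, `x.grad.zero_()` plays the role that `optimizer.zero_grad()` plays for model parameters.

```python
import torch

x = torch.randn(3, requires_grad=True)

def compute_z(x):
    # hypothetical helper: rebuilds the graph from the cells above
    y = x + 2
    return (y * y * 3).mean()  # equals sum((x + 2)^2)

compute_z(x).backward()
first_ok = torch.allclose(x.grad, 2 * (x.detach() + 2))
print(first_ok)   # True: dz/dx = 2 * (x + 2)

# A second backward pass on a freshly built graph ADDS to x.grad.
compute_z(x).backward()
second_ok = torch.allclose(x.grad, 4 * (x.detach() + 2))
print(second_ok)  # True: the gradients accumulated

x.grad.zero_()    # reset in place before the next iteration
print(x.grad)     # tensor([0., 0., 0.])
```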


### Stop a tensor from tracking history
Some operations should not be part of the gradient computation, for example updating the weights inside the training loop, or running the model for evaluation after training. To stop a tensor from tracking history, we can use:

- `x.requires_grad_(False)`
- `x.detach()`
- wrap in `with torch.no_grad():`

In [59]:
# .requires_grad_(...) changes an existing flag in-place.
a = torch.randn(2, 2)
b = (a * a).sum()
print(a.requires_grad)
print(b.grad_fn)

a.requires_grad_(True)
b = (a * a).sum()
print(a.requires_grad)
print(b.grad_fn)

False
None
True
<SumBackward0 object at 0x78df598870d0>


In [60]:
# .detach(): get a new Tensor with the same content but no gradient computation:
a = torch.randn(2, 2, requires_grad=True)
b = a.detach()
print(a.requires_grad)
print(b.requires_grad)

True
False
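`.detach()` stops gradient tracking but does not copy the data. The detached tensor shares storage with the original, so in-place changes made through it are visible in both. If the original was already used in a graph, such changes can make a later `backward()` call fail. When an independent copy is needed, `.detach().clone()` is the usual combination, as this minimal sketch shows:

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
b = a.detach()          # no gradient tracking, but the SAME underlying storage

b.mul_(3)               # in-place change through the detached tensor ...
print(a)                # ... is visible in a: all values are now 3.0

c = a.detach().clone()  # clone() gives an independent copy
c.zero_()
print(a.sum().item())   # 12.0: a is unaffected by changes to c
```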


In [61]:
# wrap in 'with torch.no_grad():'
a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
with torch.no_grad():
    b = a ** 2
    print(b.requires_grad)

True
False
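Updating the weights, mentioned at the start of this subsection, is the typical place for `torch.no_grad()`. The following minimal gradient-descent sketch fits a single weight `w` to hypothetical toy data generated from `y = 2x`. The data, learning rate, and number of steps are illustrative assumptions. Autograd tracks the loss, while the update is wrapped in `torch.no_grad()` so it does not become part of the graph. The gradient is reset after each step because `backward()` accumulates.

```python
import torch

# Hypothetical toy data following y = 2 * x.
X = torch.tensor([1.0, 2.0, 3.0, 4.0])
Y = 2 * X

w = torch.tensor(0.0, requires_grad=True)  # single trainable weight
learning_rate = 0.01                       # illustrative value

for _ in range(100):
    loss = ((w * X - Y) ** 2).mean()  # mean squared error, tracked by autograd
    loss.backward()                   # computes d(loss)/dw into w.grad

    with torch.no_grad():             # the update itself must not be tracked
        w -= learning_rate * w.grad

    w.grad.zero_()                    # reset, since backward() accumulates

print(f"w = {w.item():.4f}")          # close to 2.0
```

Doing the update in place under `torch.no_grad()` keeps `w` the same leaf tensor with `requires_grad=True`, so it keeps receiving gradients in the next iteration.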
