<a href="https://colab.research.google.com/github/rickygrosvenor-pramanick/learn-ml/blob/main/pytorch/pytorch_fundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## PyTorch Fundamentals

Resources: https://www.learnpytorch.io/00_pytorch_fundamentals/

In [1]:
!nvidia-smi

/bin/bash: line 1: nvidia-smi: command not found


In [2]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
print(torch.__version__)

2.3.1+cu121


## Introduction to Tensors

### Creating Tensors

https://pytorch.org/docs/stable/tensors.html

https://pytorch.org/docs/stable/generated/torch.tensor.html

In [3]:
# Scalar
scalar = torch.tensor(7)
scalar

tensor(7)

In [4]:
scalar.ndim

0

In [5]:
# Convert the tensor back to a Python int
scalar.item()

7

In [6]:
# Vector
vector = torch.tensor([7, 7])
vector

tensor([7, 7])

In [7]:
vector.ndim

1

In [8]:
vector.shape

torch.Size([2])

In [9]:
# MATRIX
MATRIX = torch.tensor([[7, 8],
                       [9, 10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

In [10]:
MATRIX.ndim

2

In [11]:
MATRIX[0]

tensor([7, 8])

In [12]:
MATRIX.shape

torch.Size([2, 2])

In [13]:
# TENSOR
TENSOR = torch.tensor([[[1, 2, 3, 4],
                        [3, 6, 9 ,12],
                        [2, 4, 6, 8]]])
TENSOR

tensor([[[ 1,  2,  3,  4],
         [ 3,  6,  9, 12],
         [ 2,  4,  6,  8]]])

In [14]:
TENSOR.ndim

3

In [15]:
TENSOR.shape

torch.Size([1, 3, 4])

In [16]:
TENSOR[0]

tensor([[ 1,  2,  3,  4],
        [ 3,  6,  9, 12],
        [ 2,  4,  6,  8]])

In [17]:
TENSOR[0][1]

tensor([ 3,  6,  9, 12])

## Random Tensors
We've established tensors represent some form of data.

And machine learning models such as neural networks manipulate and seek patterns within tensors.

But when building machine learning models with PyTorch, it's rare you'll create tensors by hand (like we've been doing).

Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

In essence:

Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...

As a data scientist, you can define how the machine learning model starts (initialization), looks at data (representation) and updates (optimization) its random numbers.
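To make the loop above a little more concrete, here is a minimal sketch of "start with random numbers -> look at data -> update random numbers". Everything in it is an assumption made for illustration: the toy data (`y = 3 * x`), the names `w` and `learning_rate`, and the plain gradient-descent update. It is not the approach used later in these notes, just one simple way the idea could look.

```python
import torch

torch.manual_seed(0)  # make the random starting point repeatable

# Toy data (hypothetical): the pattern to discover is y = 3 * x
x = torch.arange(0., 10.)
y = 3.0 * x

# Initialization: start with a random number
w = torch.rand(1, requires_grad=True)
learning_rate = 0.01  # hypothetical step size

for _ in range(200):
    # Look at the data: how far off is the current guess?
    loss = ((w * x - y) ** 2).mean()
    loss.backward()

    # Optimization: nudge the random number to reduce the error
    with torch.no_grad():
        w -= learning_rate * w.grad
    w.grad.zero_()

print(f"Learned weight: {w.item():.4f}")  # should end up very close to 3
```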

We'll get hands on with these steps later on.

For now, let's see how to create a tensor of random numbers.

We can do so using `torch.rand()` and passing in the size parameter.

In [18]:
## Creating Random Tensors with `torch.rand()`
random = torch.rand(size=(1,3,4))
random

tensor([[[0.4810, 0.6869, 0.2154, 0.4532],
         [0.4893, 0.0229, 0.5147, 0.6158],
         [0.2290, 0.6951, 0.7865, 0.1972]]])

In [19]:
random.shape

torch.Size([1, 3, 4])

In [20]:
## Creating a Random Tensor which is similar to an image tensor
random_image = torch.rand(size=(224, 244, 3)) # height, width, colour channels (R, G, B)
random_image.shape

torch.Size([224, 244, 3])

In [21]:
random_image.ndim

3

## Zeros and ones

Sometimes you'll just want to fill tensors with zeros or ones.

This happens a lot with masking (like masking some of the values in one tensor with zeros to let a model know not to learn them).

Let's create a tensor full of zeros with `torch.zeros()`.

Again, the `size` parameter comes into play.

In [22]:
# Create a tensor of zeros
zero_tensor = torch.zeros(size=(3,4,2))
zero_tensor

tensor([[[0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.],
         [0., 0.],
         [0., 0.]]])

In [23]:
# Create a tensor of ones
ones_tensor = torch.ones(size=(3,4), dtype=int)
ones_tensor

tensor([[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]])
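The masking idea mentioned at the start of this section can be sketched as follows. This is only an illustration: the names `values` and `mask`, and the choice to hide the middle column, are assumptions made for the example.

```python
import torch

# Some example data (hypothetical values)
values = torch.tensor([[1., 2., 3.],
                       [4., 5., 6.]])

# Start with a mask of ones (keep everything)...
mask = torch.ones(size=(2, 3))
# ...then set the middle column to zero (hide it)
mask[:, 1] = 0

# Element-wise multiplication zeroes out the masked positions
masked = values * mask
print(masked)
```

Positions where the mask is `1` keep their value and positions where it is `0` become `0`.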

## Creating a range and tensors like

Sometimes you might want a range of numbers, such as 1 to 10 or 0 to 100.

You can use `torch.arange(start, end, step)` to do so.

Where:

* `start` = start of range (e.g. 0)
* `end` = end of range, not included in the result (e.g. 10)
* `step` = the gap between consecutive values (e.g. 1)


In [24]:
# Use torch.arange(), torch.range() is deprecated
zero_to_ten_deprecated = torch.range(0, 10) # Note: this may return an error in the future

# Create a range of values 0 to 10
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten


tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
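A short sketch of how `step` changes the result. The variable names `evens` and `quarters` are made up for this example.

```python
import torch

# Every second value from 0 up to (but not including) 10
evens = torch.arange(start=0, end=10, step=2)
print(evens)     # tensor([0, 2, 4, 6, 8])

# Float arguments give a float tensor
quarters = torch.arange(start=0, end=1, step=0.25)
print(quarters)  # tensor([0.0000, 0.2500, 0.5000, 0.7500])
```

Note that `end` itself never appears in the output, which matches the `0` to `9` result above.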



Sometimes you might want one tensor of a certain type with the same shape as another tensor.

For example, a tensor of all zeros with the same shape as a previous tensor.

To do so you can use `torch.zeros_like(input)` or `torch.ones_like(input)` which return a tensor filled with zeros or ones in the same shape as the input respectively.


In [25]:
# Can also create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten) # will have same shape
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
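As a small sketch of the same idea with `torch.ones_like()`: the result copies the shape and, unless told otherwise, the datatype of the input. The names `template`, `ten_ones` and `ten_float_zeros` are just for this example.

```python
import torch

template = torch.arange(start=0, end=10, step=1)  # int64 tensor of shape [10]

# Same shape and same dtype as the template
ten_ones = torch.ones_like(input=template)
print(ten_ones, ten_ones.dtype)

# Same shape, but with the dtype overridden
ten_float_zeros = torch.zeros_like(input=template, dtype=torch.float32)
print(ten_float_zeros, ten_float_zeros.dtype)
```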

## Tensor Datatypes
**Note:** Datatype mismatches are one of the three most common errors you'll run into in PyTorch and deep learning:

1. Tensors not having the right datatype for an operation
2. Tensors not having the right shape for an operation
3. Tensors not on the right device for an operation (e.g. one on the CPU and one on the GPU)
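Here is a rough sketch of what the three errors can look like in practice. The tensors `a` and `b` and the `caught` list are hypothetical names for this example, and the exact error messages may differ between PyTorch versions, so they are printed rather than assumed. The device check only runs if a GPU is available.

```python
import torch

a = torch.ones(2, 3)                     # float32 by default
b = torch.ones(3, 2, dtype=torch.int64)  # integer tensor

caught = []  # records which kinds of error were raised

# 1. Datatype mismatch: float32 @ int64
try:
    torch.matmul(a, b)
except RuntimeError as error:
    caught.append("dtype")
    print(f"Datatype error: {error}")

# 2. Shape mismatch: (2, 3) @ (2, 3), inner dimensions differ
try:
    torch.matmul(a, a)
except RuntimeError as error:
    caught.append("shape")
    print(f"Shape error: {error}")

# 3. Device mismatch: only possible to show when a GPU is present
if torch.cuda.is_available():
    try:
        torch.matmul(a, a.T.to("cuda"))
    except RuntimeError as error:
        caught.append("device")
        print(f"Device error: {error}")

print(caught)
```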


In [26]:
# Float32 tensor - the default floating-point datatype (single precision)
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the dtype from the data (torch.float32 for Python floats)
                               device=None, # defaults to None, which uses the current default device (the CPU unless changed)
                               requires_grad=False) # if True, autograd records operations performed on the tensor
float_32_tensor

tensor([3., 6., 9.])

In [27]:
float_32_tensor.dtype

torch.float32

In [28]:
float_16_tensor = float_32_tensor.type(torch.float16)

float_16_tensor.dtype

torch.float16

Once you've created tensors (or someone else or a PyTorch module has created them for you), you might want to get some information from them.

We've seen these before but three of the most common attributes you'll want to find out about tensors are:

* `shape` - what shape is the tensor? (some operations require specific shape rules)
* `dtype` - what datatype are the elements within the tensor stored in?
* `device` - what device is the tensor stored on? (usually GPU or CPU)

Remember to use the following:
1. `tensor.shape`
2. `tensor.dtype`
3. `tensor.device`

Let's create a random tensor and find out details about it.

In [29]:
# Create a Tensor
some_tensor = torch.rand(size=(3, 4))

# Find out details about it
print(some_tensor)
print(f"Datatype is: {some_tensor.dtype}")
print(f"Shape is: {some_tensor.shape}")
print(f"Device is: {some_tensor.device}")

tensor([[0.1245, 0.6715, 0.9942, 0.8416],
        [0.9077, 0.6311, 0.3031, 0.7863],
        [0.3301, 0.2724, 0.3393, 0.1151]])
Datatype is: torch.float32
Shape is: torch.Size([3, 4])
Device is: cpu
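Since the device shows up as `cpu` here, a common pattern is to pick the device at runtime and move tensors to it. The sketch below assumes nothing beyond `torch.cuda.is_available()` and `Tensor.to()`; the names `device` and `tensor_on_device` are just for this example.

```python
import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

some_tensor = torch.rand(size=(3, 4))

# .to() returns a tensor on the requested device
tensor_on_device = some_tensor.to(device)

print(f"Chosen device: {device}")
print(f"Tensor device: {tensor_on_device.device}")
```

On a machine without a GPU (like the one used for these notes) both lines print `cpu`.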


## Manipulating Tensors

In deep learning, data (images, text, video, audio, protein structures, etc) gets represented as tensors.

A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on tensors to create a representation of the patterns in the input data.

These operations are often a wonderful dance between:

* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication

And that's it. Sure there are a few more here and there but these are the basic building blocks of neural networks.

Stack these building blocks in the right way and you can create the most sophisticated of neural networks (just like Lego!).

### Basic Operations

Let's start with a few of the fundamental operations: addition (`+`), subtraction (`-`) and multiplication (`*`).

They work just as you think they would.

In [30]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10


tensor([11, 12, 13])

In [31]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])

Notice how the tensor values above didn't end up being `tensor([110, 120, 130])`. This is because the values inside the tensor don't change unless they're reassigned.

In [32]:
# Subtract and reassign
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [33]:
# Add and reassign
tensor = tensor + 10
tensor

tensor([1, 2, 3])

PyTorch also has a bunch of built-in functions like `torch.mul()` (short for multiplication, with `torch.multiply()` as an alias) and `torch.add()` to perform basic operations.

In [34]:
# Can also use torch functions
torch.multiply(tensor, 10)

tensor([10, 20, 30])

In [35]:
# Original tensor is still unchanged
tensor

tensor([1, 2, 3])

In [36]:
# Element-wise multiplication (each element multiplies its equivalent, index 0->0, 1->1, 2->2)
print(tensor, "*", tensor)
print("Equals:", tensor * tensor)

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals: tensor([1, 4, 9])
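For reference, here is a sketch of the function versions of the basic operations, plus an in-place variant. In-place methods (the ones ending in an underscore, such as `add_()`) are the exception to the "values don't change unless reassigned" rule, so the example works on a copy. The variable names are just for illustration.

```python
import torch

tensor = torch.tensor([1, 2, 3])

# Function equivalents of +, -, * and /
added = torch.add(tensor, 10)       # tensor([11, 12, 13])
subtracted = torch.sub(tensor, 10)  # tensor([-9, -8, -7])
multiplied = torch.mul(tensor, 10)  # tensor([10, 20, 30])
divided = torch.div(tensor, 2)      # true division, so the result is float
print(added, subtracted, multiplied, divided)

# In-place version: modifies the tensor it is called on
working_copy = tensor.clone()
working_copy.add_(10)
print(working_copy)  # tensor([11, 12, 13])
print(tensor)        # the original is still tensor([1, 2, 3])
```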


## Matrix Multiplication
**Recall from Linear Algebra**:

1. The inner dimensions must match:
    * `(3, 2) @ (3, 2)` won't work
    * `(2, 3) @ (3, 2)` will work
    * `(3, 2) @ (2, 3)` will work
2. The resulting matrix has the shape of the outer dimensions:
    * `(2, 3) @ (3, 2)` -> `(2, 2)`
    * `(3, 2) @ (2, 3)` -> `(3, 3)`

**Note:** `@` is Python's operator for matrix multiplication.

PyTorch implements matrix multiplication in the `torch.matmul()` function.
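The shape rules can be checked quickly with random tensors. This is a small sketch; `a` and `b` are placeholder names and their values don't matter, only their shapes.

```python
import torch

a = torch.rand(size=(2, 3))
b = torch.rand(size=(3, 2))

# Inner dimensions match, output takes the outer dimensions
print((a @ b).shape)  # torch.Size([2, 2])
print((b @ a).shape)  # torch.Size([3, 3])

# "@" and torch.matmul() compute the same thing
print(torch.equal(a @ b, torch.matmul(a, b)))

# Inner dimensions don't match: (3, 2) @ (3, 2)
try:
    b @ b
    shape_error_raised = False
except RuntimeError as error:
    shape_error_raised = True
    print(f"Error: {error}")
```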


In [37]:
tensor = torch.tensor([x for x in range(1, 4)])
tensor.shape

torch.Size([3])

In [38]:
# Element-wise multiplication
tensor * tensor

tensor([1, 4, 9])

In [39]:
# Matrix multiplication (for two 1-D tensors this is the dot product)
torch.matmul(tensor, tensor)

tensor(14)
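To see where the `14` comes from, here is the same dot product written out by hand. It is a sketch for understanding only (the loop is much slower than `torch.matmul()` on large tensors), and `manual_result` is a name made up for the example.

```python
import torch

tensor = torch.tensor([1, 2, 3])

# Multiply matching elements and add them up: 1*1 + 2*2 + 3*3
manual_result = 0
for i in range(len(tensor)):
    manual_result += tensor[i] * tensor[i]
print(manual_result)  # tensor(14)

# The same value via element-wise multiplication followed by a sum
print((tensor * tensor).sum())       # tensor(14)

# And via matrix multiplication
print(torch.matmul(tensor, tensor))  # tensor(14)
```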

### Shape Errors

In [40]:
# Shapes need to be compatible (inner dimensions must match)
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

torch.matmul(tensor_A, tensor_B) # (this will error)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

We can make matrix multiplication work between `tensor_A` and `tensor_B` by making their inner dimensions match.

One of the ways to do this is with a transpose (switch the dimensions of a given tensor).

You can perform transposes in PyTorch using either:

* `torch.transpose(input, dim0, dim1)` - where `input` is the tensor to transpose and `dim0` and `dim1` are the dimensions to be swapped.
* `tensor.T` - where `tensor` is the tensor to transpose.
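For a 2-D tensor the two options give the same result, as this short sketch shows. `tensor_B` is redefined here so the snippet runs on its own; `via_function` and `via_attribute` are names chosen for the example.

```python
import torch

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

# Swap dimension 0 (rows) with dimension 1 (columns)
via_function = torch.transpose(tensor_B, 0, 1)

# Shorthand for the same thing on a 2-D tensor
via_attribute = tensor_B.T

print(via_function.shape)                        # torch.Size([2, 3])
print(torch.equal(via_function, via_attribute))  # True
```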


In [41]:
# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)


tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [42]:
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)


tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [43]:
# The operation works when tensor_B is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output)
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])


### Tensor Aggregation

In [44]:
# Create a tensor
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [46]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}") # this will error due to x being a tensor of int64s
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")


Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450
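The sketch below shows the integer `mean()` problem directly, along with a couple of ways around it. The exact error message depends on the PyTorch version, so it is printed rather than assumed; `mean_error_raised` and the other variable names are just for this example.

```python
import torch

x = torch.arange(0, 100, 10)  # int64 tensor

# Taking the mean of an integer tensor raises an error
try:
    x.mean()
    mean_error_raised = False
except RuntimeError as error:
    mean_error_raised = True
    print(f"Error: {error}")

# Option 1: convert to float first
mean_from_float = x.float().mean()

# Option 2: ask torch.mean() to do the conversion
mean_with_dtype = torch.mean(x, dtype=torch.float32)

print(mean_from_float, mean_with_dtype)  # tensor(45.) tensor(45.)

# The other aggregations also have function forms
print(torch.min(x), torch.max(x), torch.sum(x))
```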


### Positional Min/Max

You can also find the index of a tensor where the maximum or minimum value occurs with `torch.argmax()` and `torch.argmin()` respectively.

In [47]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")


Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
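The returned index can be used to look the value back up, and for tensors with more than one dimension you can ask for the index along a particular `dim`. A small sketch, where `matrix` is a made-up example:

```python
import torch

tensor = torch.arange(10, 100, 10)

# Use the index to retrieve the value itself
max_index = tensor.argmax()
print(tensor[max_index])  # tensor(90), same as tensor.max()

# With 2-D data, argmax() without dim works on the flattened tensor
matrix = torch.tensor([[1, 9, 3],
                       [7, 2, 8]])
print(matrix.argmax())       # tensor(1)

# With dim=1 you get the position of the max in each row
print(matrix.argmax(dim=1))  # tensor([1, 2])
```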


### Changing Tensor Datatype

As mentioned, a common issue with deep learning operations is having your tensors in different datatypes.

If one tensor is in `torch.float64` and another is in `torch.float32`, you might run into some errors.

But there's a fix.

You can change the datatypes of tensors using `torch.Tensor.type(dtype=None)` where the dtype parameter is the datatype you'd like to use.

First we'll create a tensor and check its datatype (the default is `torch.float32`).

In [48]:
# Create a tensor and check its datatype
tensor = torch.arange(10., 100., 10.)
tensor.dtype


torch.float32

In [49]:
# Create a float16 tensor
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

In [50]:
# Create an int8 tensor
tensor_int8 = tensor.type(torch.int8)
tensor_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)
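`Tensor.type()` isn't the only way to convert. As a sketch, `Tensor.to()` and shortcut methods such as `Tensor.half()` give the same result here. The last lines show that simple element-wise operations between different float types are promoted automatically rather than raising an error. Variable names are illustrative only.

```python
import torch

tensor = torch.arange(10., 100., 10.)  # float32

# Three ways to get a float16 copy
via_type = tensor.type(torch.float16)
via_to = tensor.to(torch.float16)
via_half = tensor.half()
print(via_type.dtype, via_to.dtype, via_half.dtype)

# Element-wise maths between float32 and float16 is promoted to float32
mixed = tensor * via_to
print(mixed.dtype)  # torch.float32
```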

### Reshaping, Stacking, Squeezing, and Unsqueezing

Often you'll want to reshape or change the dimensions of your tensors without actually changing the values inside them.

* Reshaping - Reshape an input tensor to a defined shape
* View - Return a view of an input tensor with a certain shape, sharing the same memory as the original tensor
* Stacking - Combine multiple tensors on top of each other (`torch.vstack`) or side-by-side (`torch.hstack`); `torch.stack` joins them along a new dimension
* Squeeze - Remove all dimensions of size `1` from a tensor
* Unsqueeze - Add a dimension of size `1` to a target tensor
* Permute - Return a view of the input with its dimensions permuted (swapped) in a certain way
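The last three operations in the list can be sketched as follows. The tensors and names (`row`, `image`, and so on) are made up for this example, with `image` following the height, width, colour channels layout used earlier.

```python
import torch

row = torch.arange(1., 10.).reshape(1, 9)
print(row.shape)         # torch.Size([1, 9])

# Squeeze: drop the dimension of size 1
squeezed = row.squeeze()
print(squeezed.shape)    # torch.Size([9])

# Unsqueeze: add a dimension of size 1 at position 0
unsqueezed = squeezed.unsqueeze(dim=0)
print(unsqueezed.shape)  # torch.Size([1, 9])

# Permute: reorder dimensions, e.g. (height, width, channels) -> (channels, height, width)
image = torch.rand(size=(224, 224, 3))
permuted = image.permute(2, 0, 1)
print(permuted.shape)    # torch.Size([3, 224, 224])

# Permute returns a view, so both tensors share the same data
image[0, 0, 0] = 42.0
print(permuted[0, 0, 0])  # tensor(42.)
```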

In [63]:
# Create a tensor
import torch
x = torch.arange(1., 10.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.Size([9]))

In [64]:
# Add an extra dimension
x_reshaped = x.reshape(shape=[1, 9])
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [65]:
# We can also reshape into compatible shapes like such
x_reshaped_2 = x.reshape(shape=(3,3))
x_reshaped_2

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])

In [66]:
# Change view (keeps same data as original but changes view)
# See more: https://stackoverflow.com/a/54507446/7900723
z = x.view(size=(1,9))
z, z.shape


(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

Remember though, changing the view of a tensor with `torch.Tensor.view()` only creates a new view of the same underlying data.

So changing values through the view changes the original tensor too.

In [67]:
# Changing z changes x
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]]),
 tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.]))
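If you want a reshaped tensor that does not affect the original, one option is to copy the data first with `clone()`. A minimal sketch, where `shared` and `independent` are names chosen for the example:

```python
import torch

x = torch.arange(1., 10.)

shared = x.view(1, 9)               # same memory as x
independent = x.clone().view(1, 9)  # separate copy of the data

shared[:, 0] = 5         # this also changes x
independent[:, 1] = 100  # this does not

print(x)  # tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.])

# Comparing memory addresses confirms which tensors share data
print(shared.data_ptr() == x.data_ptr())       # True
print(independent.data_ptr() == x.data_ptr())  # False
```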

In [68]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=0) # for 1-D inputs, dim=0 gives the same result as torch.vstack
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.]])

In [69]:
x_stacked = torch.stack([x, x, x, x], dim=1) # dim=1 puts each input in its own column (like torch.column_stack for 1-D inputs)
x_stacked

tensor([[5., 5., 5., 5.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.],
        [4., 4., 4., 4.],
        [5., 5., 5., 5.],
        [6., 6., 6., 6.],
        [7., 7., 7., 7.],
        [8., 8., 8., 8.],
        [9., 9., 9., 9.]])
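To compare `torch.stack()` with its relatives, here is a sketch of the shapes each one produces for the same 1-D input. `torch.stack()` always creates a new dimension, while `torch.hstack()` on 1-D tensors simply joins them end to end. `x` is recreated so the snippet runs on its own.

```python
import torch

x = torch.arange(1., 10.)  # shape [9]

stacked_dim0 = torch.stack([x, x, x, x], dim=0)
stacked_dim1 = torch.stack([x, x, x, x], dim=1)
v_stacked = torch.vstack([x, x, x, x])
h_stacked = torch.hstack([x, x, x, x])
col_stacked = torch.column_stack([x, x, x, x])

print(stacked_dim0.shape)  # torch.Size([4, 9])
print(stacked_dim1.shape)  # torch.Size([9, 4])
print(v_stacked.shape)     # torch.Size([4, 9])
print(h_stacked.shape)     # torch.Size([36])
print(col_stacked.shape)   # torch.Size([9, 4])

print(torch.equal(stacked_dim0, v_stacked))    # True
print(torch.equal(stacked_dim1, col_stacked))  # True
```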