# 00. PyTorch Fundamentals
 
Learning PyTorch through Daniel Bourke's video course: "Learn PyTorch for deep learning in a day. Literally."

Link: https://www.youtube.com/watch?v=Z_ikDlimN6A

His github: https://dbourke.link/pt-github

In [4]:
import pandas as pd
import numpy as np
import torch
import sklearn
import matplotlib
import torchinfo, torchmetrics

## Introduction to tensors
### Creating Tensors

PyTorch tensors are created using `torch.tensor()` (the tensor class itself is `torch.Tensor`) - https://pytorch.org/docs/stable/tensors.html
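The two names are easy to mix up. As a sketch (assuming the default dtype has not been changed with `torch.set_default_dtype`): `torch.tensor()` infers the dtype from the data, while calling the `torch.Tensor` class always produces the default floating point type.

```python
import torch

# torch.tensor() infers the dtype from the Python data
from_factory = torch.tensor([1, 2, 3])
# torch.Tensor(...) always uses the default floating point dtype
from_class = torch.Tensor([1, 2, 3])

print(from_factory.dtype)  # torch.int64
print(from_class.dtype)    # torch.float32
```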

In [6]:
# scalar
scalar = torch.tensor(7)
scalar

tensor(7)

In [7]:
scalar.ndim

0

In [8]:
scalar.item()

7

In [9]:
vector = torch.tensor([7,7])
vector

tensor([7, 7])

In [10]:
vector.ndim

1

In [11]:
vector.shape

torch.Size([2])

In [12]:
# MATRIX
MATRIX = torch.tensor([
    [7,8],[9,10]
])

In [13]:
MATRIX.ndim

2

In [14]:
MATRIX[1]

tensor([ 9, 10])

In [15]:
MATRIX.shape

torch.Size([2, 2])

In [16]:
# TENSOR
TENSOR = torch.tensor([
    [
        [1, 2, 3],
        [3, 6, 9],
        [2, 2, 6]
    ],
    [
        [1, 2, 3],
        [3, 6, 9],
        [2, 2, 6]
    ]
])

In [17]:
TENSOR.ndim

3

In [18]:
TENSOR.shape

torch.Size([2, 3, 3])

In [19]:
TENSOR[1]


tensor([[1, 2, 3],
        [3, 6, 9],
        [2, 2, 6]])

![Illustration of scalar, vector, matrix and tensor dimensions](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-pytorch-different-tensor-dimensions.png)

In my case the shape is [2, 3, 3] because the tensor holds two 3x3 matrices (2D arrays).
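One way to check that reading of the shape, as a small sketch reusing the same values as `TENSOR` above (the variable name `example` is just for illustration): `ndim` is the number of entries in `shape`, and indexing the first dimension returns one of the 3x3 matrices.

```python
import torch

# Same values as TENSOR above: two 3x3 matrices stacked together
example = torch.tensor([[[1, 2, 3], [3, 6, 9], [2, 2, 6]],
                        [[1, 2, 3], [3, 6, 9], [2, 2, 6]]])

# ndim is the number of entries in shape (one per level of nested brackets)
print(example.ndim, len(example.shape))  # 3 3
# numel() multiplies the shape entries together: 2 * 3 * 3
print(example.numel())  # 18
# Indexing the first dimension returns one of the 3x3 matrices
print(example[0].shape)  # torch.Size([3, 3])
```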

### Random tensors

Why random tensors?

Random tensors are important because many neural networks learn by starting with tensors full of random numbers and then adjusting those numbers to better represent the data.

`Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers `
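To make that loop concrete, here is a minimal sketch under made-up assumptions: a single weight, toy data that follows y = 2x, and plain gradient descent using autograd. None of these values come from the course; they only illustrate "start random, look at data, update".

```python
import torch

torch.manual_seed(0)

# Toy data (made up for illustration): y = 2 * x
x = torch.arange(10) / 10
y = 2 * x

# Start with a random number
weight = torch.rand(1, requires_grad=True)

learning_rate = 0.5
for step in range(100):
    # Look at data: predict and measure how wrong the prediction is
    loss = ((weight * x - y) ** 2).mean()
    # Update the random number in the direction that reduces the loss
    loss.backward()
    with torch.no_grad():
        weight -= learning_rate * weight.grad
        weight.grad.zero_()

print(weight)  # close to 2
```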

Torch random tensors - https://pytorch.org/docs/stable/generated/torch.rand.html

In [23]:
# Create a random tensor of size (3,4)
random_tensor = torch.rand(3,4)

random_tensor

tensor([[0.3419, 0.2426, 0.6098, 0.2004],
        [0.9822, 0.5209, 0.5324, 0.7372],
        [0.3283, 0.5919, 0.4178, 0.2906]])

In [24]:
random_tensor.ndim

2

In [25]:
random_tensor = torch.rand(5,5,5)

In [26]:
random_tensor.ndim

3

In [27]:
#Create a random tensor with similar shape to an image tensor
random_image_size_tensor = torch.rand(size = (244,244,3)) # [height, width, colour_channels]
random_image_size_tensor.shape , random_image_size_tensor.ndim

(torch.Size([244, 244, 3]), 3)

![Illustration of an image encoded in a Tensor](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-tensor-shape-example-of-image.png)

In [29]:
# Create a tensor of all zeros
zeros = torch.zeros(3,3,3)
zeros

tensor([[[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]])

In [30]:
# Create a tensor of all ones
ones = torch.ones(3,3,3)
ones

tensor([[[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.],
         [1., 1., 1.]]])

In [31]:
ones.dtype

torch.float32

In [32]:
random_tensor.dtype

torch.float32

In [33]:
# Don't use torch.range(): it is deprecated (and, unlike torch.arange(), it includes the end value)
torch.range(0,10)


tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

In [34]:
# Use torch.arange(start, end, step) instead; the end value is excluded
one_to_thousand=torch.arange(0,1001,100)
one_to_thousand

tensor([   0,  100,  200,  300,  400,  500,  600,  700,  800,  900, 1000])

In [35]:
# Create a tensor of zeros with the same shape as another tensor
ten_zeros = torch.zeros_like(one_to_thousand)
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

### Tensor datatypes

**Note:** Tensor datatypes are one of the three most common sources of errors you'll run into with PyTorch & deep learning:
1. Tensors not the right datatype
2. Tensors not the right shape
3. Tensors not on the right device
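A small sketch of the first kind of error: element-wise operations promote mixed dtypes, but matrix multiplication does not, so a float32 tensor and an int64 tensor can multiply element-wise yet fail with `torch.matmul`. The tensors here are illustrative.

```python
import torch

float_tensor = torch.tensor([1.0, 2.0, 3.0])  # torch.float32
int_tensor = torch.tensor([1, 2, 3])          # torch.int64

# Element-wise multiplication promotes both to float32, so it works
print((float_tensor * int_tensor).dtype)  # torch.float32

# Matrix multiplication does not promote dtypes and raises an error
try:
    torch.matmul(float_tensor, int_tensor)
    matmul_failed = False
except RuntimeError as error:
    matmul_failed = True
    print(error)

# Fix: cast one tensor so both dtypes match
print(torch.matmul(float_tensor, int_tensor.type(torch.float32)))  # tensor(14.)
```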

In [37]:
# float32 tensor
float_32_tensor = torch.tensor([3.0 , 6.0,  9.0],
                               dtype=None, # Datatype of the tensor (None lets PyTorch infer it, float32 here)
                               device=None, # Device the tensor lives on, e.g. CPU or GPU ("cuda"); None means the default (CPU)
                               requires_grad=False ) # Whether or not to track gradients with this tensor's operations
float_32_tensor ,float_32_tensor.dtype

(tensor([3., 6., 9.]), torch.float32)

In [38]:
float_16_tensor = float_32_tensor.type(torch.float16)
float_16_tensor

tensor([3., 6., 9.], dtype=torch.float16)

In [39]:
new_tensor = float_16_tensor * float_32_tensor
new_tensor, new_tensor.dtype

(tensor([ 9., 36., 81.]), torch.float32)

In [40]:
# Integer tensors default to torch.int64, despite this variable's name
int_32_tensor = torch.tensor([3 , 6 , 9])
int_32_tensor, int_32_tensor.dtype

(tensor([3, 6, 9]), torch.int64)

In [41]:
float_32_tensor * int_32_tensor

tensor([ 9., 36., 81.])

### Getting information from tensors

1. **Datatype** - to get the `dtype` of a tensor, use `tensor.dtype`
2. **Shape** - to get the `shape` of a tensor, use `tensor.shape`
3. **Device** - to get the `device` of a tensor, use `tensor.device`
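Since these three attributes cover the three common errors, it can help to print them together. `describe` below is a hypothetical helper for illustration, not part of PyTorch.

```python
import torch

def describe(t: torch.Tensor) -> str:
    """Hypothetical helper: summarise dtype, shape and device in one line."""
    return f"dtype={t.dtype}, shape={tuple(t.shape)}, device={t.device}"

print(describe(torch.rand(3, 3, 3)))
# dtype=torch.float32, shape=(3, 3, 3), device=cpu
```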

In [43]:
random_tensor = torch.rand(3,3,3)

In [44]:
int_32_tensor.dtype, int_32_tensor.shape, int_32_tensor.device

(torch.int64, torch.Size([3]), device(type='cpu'))

### Changing device

#### Processing Devices

* **CPU**: `"cpu"`
* **GPU**: `"cuda"`, or `"cuda:X"` to choose a specific GPU by index when you have several
* **Others**: `"mps"`, `"xpu"`, `"xla"` or `"meta"`


*Same logic applies to running models and model training*
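A sketch of device-agnostic setup that also covers Apple Silicon, assuming PyTorch 1.12 or newer (where `torch.backends.mps` exists). `pick_device` is a hypothetical helper name; the same `.to(device)` call moves both the tensor and the model.

```python
import torch

def pick_device() -> torch.device:
    # Prefer an NVIDIA GPU, then Apple Silicon, otherwise fall back to the CPU
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()

# Tensors and models are moved the same way
x = torch.rand(2, 2).to(device)
model = torch.nn.Linear(2, 1).to(device)

print(x.device, next(model.parameters()).device)
```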

In [46]:
# Check if CUDA (GPU) is available
print(torch.cuda.is_available())  # Returns True if CUDA is available, otherwise False

# Check if MPS (Apple Silicon) is available
print(torch.backends.mps.is_available())  # Returns True if MPS is available, otherwise False

True
False


In [47]:
# Define the GPU device (if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create a tensor and move it to the GPU through function chaining
tensor_gpu = torch.tensor([1, 2, 3]).to(device)

In [48]:
# Create a tensor directly on the GPU (requires CUDA)
tensor_gpu = torch.tensor([1, 2, 3], device='cuda')

In [49]:
# With multiple GPUs, create a tensor directly on a specific GPU by index
device = torch.device('cuda:0')  # Use the first GPU
tensor_gpu = torch.tensor([1, 2, 3], device=device)


In [50]:
# Create a tensor on the CPU
tensor_cpu = torch.tensor([1, 2, 3])

# Move it to the GPU (requires CUDA)
tensor_gpu = tensor_cpu.to('cuda')

# Move it back to the CPU
tensor_cpu_again = tensor_gpu.to('cpu')

### Manipulating Tensors (Tensor operations)

Tensor operations include:
* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication
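One detail worth knowing before the examples: these operations return a new tensor and leave the original unchanged unless you reassign it. PyTorch also offers in-place versions whose names end in an underscore, such as `add_()`. A short sketch:

```python
import torch

tensor = torch.tensor([1.0, 2.0, 3.0])

# Regular operations return a new tensor and leave the original unchanged
result = tensor + 10
print(tensor)  # tensor([1., 2., 3.])
print(result)  # tensor([11., 12., 13.])

# Methods ending in an underscore modify the tensor in place
tensor.add_(10)
print(tensor)  # tensor([11., 12., 13.])
```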

#### Arithmetic operations

In [53]:
# Create a tensor and add 10 to it
tensor = torch.rand(3)
tensor + 10

tensor([10.2540, 10.7716, 10.1775])

In [54]:
# Multiply by 10
tensor * 10

tensor([2.5397, 7.7157, 1.7753])

In [55]:
# Subtract 10
tensor - 10

tensor([-9.7460, -9.2284, -9.8225])

In [56]:
# Divide by 10
tensor / 10

tensor([0.0254, 0.0772, 0.0178])

In [57]:
new_tensor = torch.rand(3)

# Element-wise multiplication of two tensors
print(f"{tensor} * {new_tensor} = {tensor * new_tensor}")

tensor([0.2540, 0.7716, 0.1775]) * tensor([0.3844, 0.0576, 0.9072]) = tensor([0.0976, 0.0445, 0.1610])


#### PyTorch built-in functions

In [59]:
torch.mul(tensor, 10) # Multiplication

tensor([2.5397, 7.7157, 1.7753])

In [60]:
torch.matmul(new_tensor, tensor) # Matrix multiplication (for two 1D tensors this is the dot product)

tensor(0.3032)

In [61]:
torch.div(tensor, 10) # Division

tensor([0.0254, 0.0772, 0.0178])

In [62]:
torch.add(tensor, 10) # addition

tensor([10.2540, 10.7716, 10.1775])

In [63]:
torch.sub(tensor, 10) # subtraction

tensor([-9.7460, -9.2284, -9.8225])

#### Comparison of execution time

Built-in functions use vectorized operations, which run much faster than looping over elements in Python.
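With only three elements the difference is hard to see, so here is a sketch on a larger random vector (10,000 elements, an arbitrary choice) that times a Python loop against `torch.matmul` with `time.perf_counter` and checks both give the same dot product. Exact timings depend on your machine.

```python
import time
import torch

torch.manual_seed(0)
vector = torch.rand(10_000)

# Dot product with a Python loop: one small tensor operation per element
start = time.perf_counter()
loop_result = 0.0
for value in vector:
    loop_result += value * value
loop_seconds = time.perf_counter() - start

# Same dot product as one vectorized call
start = time.perf_counter()
vectorized_result = torch.matmul(vector, vector)
vectorized_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, matmul: {vectorized_seconds:.6f}s")
# Float32 sums in a different order can differ slightly, so compare with a tolerance
print(torch.isclose(loop_result, vectorized_result, rtol=1e-3))
```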

In [65]:
%%time
# Manual dot product with a Python loop
value = 0
for i in range(len(tensor)):
    value += tensor[i] * tensor[i]
print(value)

CPU times: total: 0 ns
Wall time: 2 ms


In [66]:
%%time
torch.matmul(tensor, tensor) # Dot product (matrix multiplication of two 1D tensors)

CPU times: total: 0 ns
Wall time: 0 ns


tensor(0.6913)

### Matrix multiplication

Two main ways of performing multiplication in neural networks and deep learning:

1. Element-wise multiplication
2. Matrix multiplication (dot product)

More information on multiplying matrices: https://www.mathsisfun.com/algebra/matrix-multiplying.html

There are two main rules for performing matrix multiplication:
1. The **inner dimensions** must match:
   * `(3,2) @ (3,2)` won't work
   * `(2,3) @ (3,2)` will work
   * `(3,2) @ (2,3)` will work
2. The resulting matrix has the shape of the **outer dimensions**:
   * `(2,3) @ (3,2)` -> `(2,2)`
   * `(3,2) @ (2,3)` -> `(3,3)`
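The two rules can be written as a tiny shape calculator and checked against PyTorch. `matmul_shape` is a hypothetical helper that only handles 2D shapes; `torch.matmul` itself also supports 1D, batched and broadcast inputs.

```python
import torch

def matmul_shape(shape_a, shape_b):
    """Hypothetical helper: apply the two rules for 2D matrix multiplication."""
    rows_a, inner_a = shape_a
    inner_b, cols_b = shape_b
    # Rule 1: the inner dimensions must match
    if inner_a != inner_b:
        raise ValueError(f"inner dimensions differ: {inner_a} != {inner_b}")
    # Rule 2: the result has the shape of the outer dimensions
    return (rows_a, cols_b)

for a, b in [((2, 3), (3, 2)), ((3, 2), (2, 3))]:
    predicted = matmul_shape(a, b)
    actual = tuple(torch.matmul(torch.rand(a), torch.rand(b)).shape)
    print(a, "@", b, "->", predicted, actual)
```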


In [68]:
# Example 1
torch.matmul(torch.rand(10,10), torch.rand(10,10)).shape

torch.Size([10, 10])

In [69]:
# Example 2
result = torch.matmul(torch.rand(2,10), torch.rand(10,2))
result, result.shape

(tensor([[1.8918, 1.8990],
         [1.9147, 2.2229]]),
 torch.Size([2, 2]))

In [70]:
# Example 3 : Error
try:
    torch.matmul(torch.rand(2,10), torch.rand(2,10)).shape
except Exception as error:
    print(error)

mat1 and mat2 shapes cannot be multiplied (2x10 and 2x10)


### One of the most common errors in deep learning: shape errors

In [72]:
# Shapes for matrix multiplication
try:
    tensor_A = torch.tensor([
        [1, 2],
        [7, 3],
        [5, 10]])

    tensor_B = torch.tensor([
        [6, 2],
        [7, 5],
        [5, 15]])

    # torch.mm multiplies two 2D matrices; torch.matmul does the same for 2D inputs
    # but also handles 1D, batched and broadcast inputs
    torch.mm(tensor_A, tensor_B)

except Exception as error:
    print(error)

mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)


In [73]:
tensor_A.shape

torch.Size([3, 2])

In [74]:
tensor_B.shape

torch.Size([3, 2])

To fix our shape issues, we can manipulate the shape of one of our tensors using a transpose operation, which swaps its dimensions

In [76]:
tensor_B.T # Transposed

tensor([[ 6,  7,  5],
        [ 2,  5, 15]])

In [77]:
tensor_A.shape, tensor_B.T.shape # Inner dimensions are the same !

(torch.Size([3, 2]), torch.Size([2, 3]))

In [78]:
#The matrix multiplication works when tensor_B is transposed

torch.mm(tensor_A, tensor_B.T) # Success!!

tensor([[ 10,  17,  35],
        [ 48,  64,  80],
        [ 50,  85, 175]])
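Two related points, sketched with the same `tensor_A` and `tensor_B`: the `@` operator is shorthand for matrix multiplication, and transposing `tensor_A` instead of `tensor_B` also satisfies the inner-dimension rule, but gives a result with different outer dimensions.

```python
import torch

tensor_A = torch.tensor([[1, 2], [7, 3], [5, 10]])
tensor_B = torch.tensor([[6, 2], [7, 5], [5, 15]])

# The @ operator gives the same result as torch.mm here
print(torch.equal(tensor_A @ tensor_B.T, torch.mm(tensor_A, tensor_B.T)))  # True

# Transposing tensor_A instead: (2, 3) @ (3, 2) -> (2, 2)
print((tensor_A.T @ tensor_B).shape)  # torch.Size([2, 2])
```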

## Finding the min, max, mean, sum, etc. (tensor aggregation)

In [80]:
#Create a tensor
x = torch.arange(0,101,10.)
x

tensor([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])

In [81]:
xInt = x.type(torch.int32)

In [82]:
# Find the min value
torch.min(x) , x.min() 

(tensor(0.), tensor(0.))

In [83]:
# Find the max value
torch.max(x) , x.max() 

(tensor(100.), tensor(100.))

In [84]:
# Find the mean
torch.mean(x), x.mean() # success

(tensor(50.), tensor(50.))

In [85]:
# Find the mean of an integer tensor
# torch.mean() only works on floating point or complex dtypes, so this raises an error
try:
    torch.mean(xInt) , xInt.mean() 
except Exception as error:
    print(error)

mean(): could not infer output dtype. Input dtype must be either a floating point or complex dtype. Got: Int
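The usual fix is to cast to a floating point dtype first. The sketch below also shows the `dim` argument, which aggregates along a single dimension instead of over the whole tensor (the small `matrix` is made up for illustration).

```python
import torch

x_int = torch.arange(0, 101, 10, dtype=torch.int32)

# Cast to a floating point dtype before taking the mean
print(x_int.type(torch.float32).mean())  # tensor(50.)

# Aggregations can also run along one dimension with dim=
matrix = torch.arange(6.0).reshape(2, 3)  # [[0., 1., 2.], [3., 4., 5.]]
print(matrix.sum(dim=0))   # column sums: tensor([3., 5., 7.])
print(matrix.mean(dim=1))  # row means: tensor([1., 4.])
```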


In [86]:
# Find the sum 
torch.sum(xInt) , xInt.sum()

(tensor(550), tensor(550))

## Finding the positional min and max

In [88]:
x

tensor([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])

In [89]:
#Find the index of the maxValue
torch.argmax(x), x.argmax() 


(tensor(10), tensor(10))

In [90]:
#Find the index of the minValue
torch.argmin(x), x.argmin() 

(tensor(0), tensor(0))
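A short sketch of how these fit together: when given `dim`, `torch.max` returns both the values and their indices, and the index from `argmax` can be used to look the value back up.

```python
import torch

x = torch.arange(0, 101, 10.)

# With dim=, torch.max returns both the largest value and its index
values, indices = torch.max(x, dim=0)
print(values, indices)  # tensor(100.) tensor(10)

# The index from argmax can be used to look the value back up
print(x[x.argmax()])  # tensor(100.)
```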

## Reshaping, stacking, squeezing and unsqueezing tensors

* Reshaping - reshapes an input tensor to a defined shape
* View - returns a view of an input tensor with a certain shape that shares the same memory as the original tensor
* Stacking - combines multiple tensors along a new dimension (`torch.stack`), on top of each other (`torch.vstack`) or side by side (`torch.hstack`)
* Squeeze - removes all dimensions of size `1` from a tensor
* Unsqueeze - adds a dimension of size `1` to a tensor at a given position
* Permute - returns a view of the input with its dimensions permuted (swapped) in a certain order
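To see how the stacking variants differ, here is a sketch on two small 1D tensors, with `torch.cat` added for comparison (it joins along an existing dimension rather than creating a new one).

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

print(torch.stack([a, b], dim=0).shape)  # torch.Size([2, 3]): new dimension in front
print(torch.stack([a, b], dim=1).shape)  # torch.Size([3, 2]): new dimension at the end
print(torch.vstack([a, b]).shape)        # torch.Size([2, 3]): on top of each other
print(torch.hstack([a, b]).shape)        # torch.Size([6]): side by side
print(torch.cat([a, b]).shape)           # torch.Size([6]): joined along the existing dimension
```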

In [92]:
# Create a tensor
x = torch.arange(1.,10.)

x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.Size([9]))

In [93]:
# Try to add an extra dimension with an incompatible shape
try:
    x_reshaped = x.reshape(1,7) # The new shape must keep the total number of elements (9)
    x_reshaped, x_reshaped.shape
except Exception as error:
    print(error) # This raises an error because 1 x 7 != 9

shape '[1, 7]' is invalid for input of size 9


In [94]:
# Add an extra dimension
x_reshaped = x.reshape(1,9) 
x_reshaped, x_reshaped.shape #This works because 1x9 = 9

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [95]:
# Add an extra dimension
x_reshaped = x.reshape(1,3,3)  #This works because 1x3x3 = 9
x_reshaped, x_reshaped.shape

(tensor([[[1., 2., 3.],
          [4., 5., 6.],
          [7., 8., 9.]]]),
 torch.Size([1, 3, 3]))

In [96]:
# View: the same data with a different shape (not a copy)
z = x.view(3,3)
z , z.shape

(tensor([[1., 2., 3.],
         [4., 5., 6.],
         [7., 8., 9.]]),
 torch.Size([3, 3]))

In [97]:
# z and x share the same memory: z is a view, not a copy,
# so changing z also changes x
z[0,0] = 10
z , x

(tensor([[10.,  2.,  3.],
         [ 4.,  5.,  6.],
         [ 7.,  8.,  9.]]),
 tensor([10.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.]))
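If you need an independent copy instead, `clone()` copies the data first. A sketch comparing the two (the `data_ptr()` check confirms the view starts at the same memory address as `x`):

```python
import torch

x = torch.arange(1., 10.)
z_view = x.view(3, 3)          # shares memory with x
z_copy = x.clone().view(3, 3)  # clone() copies the data first

z_view[0, 0] = 10
z_copy[0, 1] = 20

print(z_view.data_ptr() == x.data_ptr())  # True
print(x)  # tensor([10.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
```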

In [183]:
# Stack copies of x along a new dimension (dim=1 places each copy in a column)
x_stacked = torch.stack([x,x,x,x], dim = 1) 
x_stacked

tensor([[10., 10., 10., 10.],
        [ 2.,  2.,  2.,  2.],
        [ 3.,  3.,  3.,  3.],
        [ 4.,  4.,  4.,  4.],
        [ 5.,  5.,  5.,  5.],
        [ 6.,  6.,  6.,  6.],
        [ 7.,  7.,  7.,  7.],
        [ 8.,  8.,  8.,  8.],
        [ 9.,  9.,  9.,  9.]])

In [201]:
x_squeezed = x.squeeze() # x has no dimensions of size 1, so squeeze() leaves it unchanged
x_squeezed

tensor([10.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [203]:
y = torch.tensor([[[1,2,3,4,5]]])
y , y.shape

(tensor([[[1, 2, 3, 4, 5]]]), torch.Size([1, 1, 5]))

In [205]:
y_squeezed = y.squeeze() # Remove all dimensions of size 1
y_squeezed, y_squeezed.shape # y had shape [1, 1, 5] and y_squeezed has shape [5]

(tensor([1, 2, 3, 4, 5]), torch.Size([5]))

In [237]:
y_unsqueezed = y_squeezed.unsqueeze(dim = 0) # Add a single dimension on the 0th dimension (dim = 0)
y_unsqueezed

tensor([[1, 2, 3, 4, 5]])

In [239]:
x_unsqueezed = x.unsqueeze(dim = 1) # Add a single dimension on the 1st dimension (dim = 1)
x_unsqueezed

tensor([[10.],
        [ 2.],
        [ 3.],
        [ 4.],
        [ 5.],
        [ 6.],
        [ 7.],
        [ 8.],
        [ 9.]])

In [255]:
#Permutation
x = torch.randn(2,3,5)
x.size() # = x.shape

torch.Size([2, 3, 5])

In [249]:
torch.permute(x , (2,0,1)).size()  # Reorder the dimensions
# New dim 0 is old dim 2, new dim 1 is old dim 0, new dim 2 is old dim 1: [2, 3, 5] -> [5, 2, 3]

torch.Size([5, 2, 3])

In [263]:
# Real use case: image from [height, width, colour_channels] to [colour_channels, height, width]
x_original = torch.rand(size = (224,224, 3)) # [height, width, colour_channels]

x_permuted = x_original.permute(2,0,1)

x_permuted.shape

torch.Size([3, 224, 224])

In [267]:
x_original[0,0,0] = 100

In [269]:
x_permuted[0,0,0]

tensor(100.)

In [271]:
x_permuted[0,0,0] = 10

In [273]:
x_original[0,0,0]

tensor(10.)
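Because `permute()` returns a view that only changes strides, the result is usually not contiguous in memory. A sketch of a consequence: `view()` then fails, while `reshape()` (or `.contiguous()` followed by `view()`) works because it copies when needed.

```python
import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# permute() only changes strides, so the result is not contiguous in memory
print(x_permuted.is_contiguous())  # False

# view() needs compatible strides and fails here
try:
    x_permuted.view(-1)
    view_failed = False
except RuntimeError as error:
    view_failed = True
    print("view failed:", error)

# reshape() copies when needed; so does contiguous() before view()
print(x_permuted.reshape(-1).shape)            # torch.Size([150528])
print(x_permuted.contiguous().view(-1).shape)  # torch.Size([150528])
```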

## Indexing (selecting data from tensors)
Indexing with PyTorch is similar to indexing with NumPy.

In [279]:
x = torch.arange(1,28).reshape(3,3,3)
x , x.shape

(tensor([[[ 1,  2,  3],
          [ 4,  5,  6],
          [ 7,  8,  9]],
 
         [[10, 11, 12],
          [13, 14, 15],
          [16, 17, 18]],
 
         [[19, 20, 21],
          [22, 23, 24],
          [25, 26, 27]]]),
 torch.Size([3, 3, 3]))

In [281]:
x[0]

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [283]:
x[0, 1] # or x[0][1]

tensor([4, 5, 6])

In [292]:
x[0, 1 ,2] # or x[0][1][2]

tensor(6)

In [294]:
try:
    x[4]
except Exception as error:
    print(f"{error}")

index 4 is out of bounds for dimension 0 with size 3


In [321]:
x[2, :] # : selects all elements of that dimension

tensor([[19, 20, 21],
        [22, 23, 24],
        [25, 26, 27]])

In [323]:
# Return the whole row at index [2, 1]
x[2, 1, :] 

tensor([22, 23, 24])

In [325]:
# Return the row at index [2, 1], starting from its second element (index 1)
x[2, 1, 1:] 

tensor([23, 24])
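To check the claim that indexing mirrors NumPy, here is a sketch that applies the same index expressions to a tensor and an equivalent NumPy array, plus a boolean mask (which also works in both libraries).

```python
import numpy as np
import torch

x = torch.arange(1, 28).reshape(3, 3, 3)
x_np = np.arange(1, 28).reshape(3, 3, 3)

# The same index expressions select the same values in both libraries
print(x[2, 1, 1:], x_np[2, 1, 1:])
print(x[:, 0, 0], x_np[:, 0, 0])  # first element of every 3x3 block

# Boolean masks work too: keep only the even values
print(x[x % 2 == 0])
```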

## PyTorch tensors & NumPy
NumPy is a popular scientific Python numerical computing library.

And because of this, PyTorch has functionality to interact with it.

* Data in NumPy, want a PyTorch tensor -> `torch.from_numpy(ndarray)`

* PyTorch tensor -> NumPy -> `torch.Tensor.numpy()`
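One detail that matters in practice: `torch.from_numpy()` shares memory with the array, while `torch.tensor()` copies it. A sketch showing that an in-place change to the array reaches only the shared tensor, and that NumPy's float64 default carries over to the tensor:

```python
import numpy as np
import torch

array = np.arange(1, 8)
shared = torch.from_numpy(array)  # shares memory with array
copied = torch.tensor(array)      # copies the data

# An in-place change to the array shows up in the shared tensor only
array += 1
print(shared)  # values 2 to 8
print(copied)  # values 1 to 7

# NumPy floats default to float64, which carries over to the tensor
print(torch.from_numpy(np.ones(3)).dtype)  # torch.float64
```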

In [345]:
# Numpy array to tensor
import torch
import numpy as np

array = np.arange(1,8)
tensor = torch.from_numpy(array)
array , array.dtype , tensor

(array([1, 2, 3, 4, 5, 6, 7]),
 dtype('int32'),
 tensor([1, 2, 3, 4, 5, 6, 7], dtype=torch.int32))

In [349]:
# `array + 1` creates a new array, so the tensor (which shares memory
# with the original array through torch.from_numpy) is unchanged
array = array + 1 
array , tensor

(array([2, 3, 4, 5, 6, 7, 8]),
 tensor([1, 2, 3, 4, 5, 6, 7], dtype=torch.int32))

In [351]:
# Tensor to NumPy array
tensor = torch.ones(7)

numpy_tensor = tensor.numpy()

tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

## Reproducibility (trying to take random out of random)

In short, how a neural network learns:

`Start with random numbers -> tensor operations -> update random numbers to try and make them better representations of the data -> again -> again -> again`

To make randomness reproducible, we can use the **same random seed** with `torch.manual_seed()`
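A seed fixes the whole sequence of random numbers, not each call, so two calls after one `torch.manual_seed()` still differ. A sketch of that, plus `torch.Generator`, which keeps a reproducible stream separate from the global one:

```python
import torch

torch.manual_seed(42)
first = torch.rand(2, 2)
second = torch.rand(2, 2)  # continues the same sequence, so it differs from first
print(torch.equal(first, second))  # False

# Two generators seeded the same way produce the same numbers
generator_a = torch.Generator().manual_seed(42)
generator_b = torch.Generator().manual_seed(42)
print(torch.equal(torch.rand(2, 2, generator=generator_a),
                  torch.rand(2, 2, generator=generator_b)))  # True
```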

In [362]:
torch.rand(3,3) # Every time we run this, the values will be different

tensor([[0.0688, 0.2926, 0.1974],
        [0.5002, 0.7387, 0.6784],
        [0.5851, 0.0456, 0.1940]])

In [372]:
# Let's make random but reproducible tensors

# Set the random seed
random_seed = 42

# Call manual_seed before each "random" operation
torch.manual_seed(random_seed) 
random_tensor_C = torch.rand(2,2)

torch.manual_seed(random_seed)
random_tensor_D = torch.rand(2,2)

random_tensor_D == random_tensor_C

tensor([[True, True],
        [True, True]])

## Running tensors and PyTorch objects on GPUs (and making faster computations)

GPUs = faster computation on numbers, thanks to CUDA + NVIDIA hardware + PyTorch working behind the scenes

### Best Practices 
https://pytorch.org/docs/main/notes/cuda.html#best-practices

In [377]:
# Check the available NVIDIA GPU(s) with the nvidia-smi command-line tool
!nvidia-smi

Tue Oct 15 17:10:08 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 552.44                 Driver Version: 552.44         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 2060      WDDM  |   00000000:01:00.0  On |                  N/A |
| N/A   58C    P8             13W /  115W |    1158MiB /   6144MiB |     19%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [379]:
torch.cuda.is_available()

True

In [383]:
# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [385]:
# Count number of gpus
torch.cuda.device_count()

1
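Building on `device_count()`, a sketch that lists every visible CUDA device by name with `torch.cuda.get_device_name()` (the loop simply does nothing on a CPU-only machine), and uses `is_cuda` as a quick check of where a tensor lives:

```python
import torch

# Print the name of every CUDA device PyTorch can see
for index in range(torch.cuda.device_count()):
    print(index, torch.cuda.get_device_name(index))

# is_cuda is False here: tensors are created on the CPU by default
print(torch.tensor([1, 2, 3]).is_cuda)  # False
```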

### Putting tensors and models on the GPU 

In [392]:
# Create a tensor (defaults to the CPU)
tensor = torch.tensor([1,2,3])

# Tensor not on GPU
print(tensor, tensor.device)


tensor([1, 2, 3]) cpu


In [394]:
# Move tensor to GPU (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3], device='cuda:0')

In [400]:
# If a tensor is on the GPU, it can't be converted to NumPy directly
try:
    tensor_on_gpu.numpy()
except Exception as error:
    print(error)

can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.


In [406]:
# To fix this error, move the tensor back to the CPU first (tensor_on_gpu.cpu() also works)

tensor_back_on_CPU = tensor_on_gpu.to('cpu')

tensor_back_on_CPU.device, tensor_back_on_CPU.numpy()

(device(type='cpu'), array([1, 2, 3], dtype=int64))
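A related case you will hit with model parameters: `.numpy()` also refuses tensors that track gradients. A sketch of the usual chain, `detach()` then `cpu()` then `numpy()`, which works whether the tensor lives on the GPU or already on the CPU (where `cpu()` is a no-op):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
weights = torch.rand(3, requires_grad=True, device=device)

# Stop tracking gradients, copy to host memory, then convert to NumPy
array = weights.detach().cpu().numpy()
print(array.shape, array.dtype)  # (3,) float32
```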