## 00. PyTorch Fundamentals

Resource notebook:
https://www.learnpytorch.io/00_pytorch_fundamentals/

If you have a question, go to
https://github.com/mrdbourke/pytorch-deep-learning/discussions
and click the "New Discussion" button.

Import PyTorch along with some common data science packages: pandas, NumPy, and Matplotlib.

In [1]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print(torch.__version__)

print("Hello, I'm excited to learn PyTorch")

2.3.0+cu121
Hello, I'm excited to learn PyTorch


2.3.0 is the version of PyTorch, and cu121 indicates the CUDA version the build targets (12.1).

In [2]:
!nvidia-smi

Sat Jun  1 13:54:17 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   49C    P8              12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

## Introduction to Tensors

### Creating tensors

PyTorch tensors are created using `torch.tensor()`:
https://pytorch.org/docs/stable/tensors.html

In [3]:
# scalar
scalar = torch.tensor(7)
scalar

tensor(7)

In [4]:
scalar.ndim

0

In [5]:
# Get tensor back as Python int (Variable names are lowercase)
scalar.item()

7

In [6]:
# Vector (magnitude and direction) (Variable names are lowercase)
vector = torch.tensor([7,7])
vector

tensor([7, 7])

In [7]:
vector.ndim

1

In [8]:
# Shape is [2]: one dimension containing two elements
vector.shape

torch.Size([2])

In [9]:
# MATRIX (Variable names are uppercase)
MATRIX = torch.tensor([[7, 8], [9, 10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

In [10]:
MATRIX.ndim
# ndim is 2: count the nested square brackets at the start of the tensor

2

In [11]:
# Indexing starts at 0; MATRIX[0] returns the first row (the first inner bracket)
MATRIX[0]

tensor([7, 8])

In [12]:
# The shape gives the number of elements along each dimension (2 rows, 2 columns)
MATRIX.shape

torch.Size([2, 2])

In [13]:
# Tensor (Variable names are uppercase)
TENSOR = torch.tensor([[[7, 8, 9], [10, 11, 12], [13, 14, 15]]])
TENSOR

tensor([[[ 7,  8,  9],
         [10, 11, 12],
         [13, 14, 15]]])

In [14]:
TENSOR.ndim

3

In [15]:
TENSOR.shape

torch.Size([1, 3, 3])

In [16]:
TENSOR[0]

tensor([[ 7,  8,  9],
        [10, 11, 12],
        [13, 14, 15]])

In [17]:
TENSOR[0,1]

tensor([10, 11, 12])

In [18]:
TENSOR[0,1,2]

tensor(12)
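To tie `ndim`, `shape`, and indexing together, here is a minimal sketch that reuses the `TENSOR` values from above. It is only an illustration: each extra index removes one leading dimension, and `ndim` is the length of `shape`.

```python
import torch

TENSOR = torch.tensor([[[7, 8, 9], [10, 11, 12], [13, 14, 15]]])

# ndim is the number of entries in the shape
ndim_matches = TENSOR.ndim == len(TENSOR.shape)

# Each additional index removes one leading dimension
shapes = [
    TENSOR.shape,           # all three dimensions
    TENSOR[0].shape,        # first dimension indexed away
    TENSOR[0, 1].shape,     # first two dimensions indexed away
    TENSOR[0, 1, 2].shape,  # a single element: empty shape
]

print(ndim_matches)
print(shapes)
print(TENSOR.numel())  # total number of elements (product of the shape)
```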

### Random tensors

Why random tensors? Many neural networks learn by starting with tensors full of random numbers and then adjusting those random numbers to better represent the data.

Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers

https://pytorch.org/docs/stable/generated/torch.rand.html
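The "start random, look at data, update" loop can be sketched with a single random weight. This is a toy illustration, not how a full network is trained: the data (`y = 2 * x`), the learning rate `lr`, and the 100 steps are all assumptions chosen for the example.

```python
import torch

torch.manual_seed(0)

# Toy data (assumed for illustration): the "pattern" to learn is y = 2 * x
x = torch.tensor([1.0, 2.0, 3.0])
y = 2.0 * x

# Start with a random number
w = torch.rand(1, requires_grad=True)
lr = 0.05  # learning rate (assumed value)

for step in range(100):
    loss = ((w * x - y) ** 2).mean()  # look at the data
    loss.backward()                   # work out how to adjust w
    with torch.no_grad():
        w -= lr * w.grad              # update the random number
    w.grad.zero_()

print(w.item())  # should end up close to 2.0
```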

In [19]:
# Create a random tensor of shape (size) (3, 4)
random_tensor = torch.rand(3, 4)
random_tensor

tensor([[0.4256, 0.9660, 0.4370, 0.4421],
        [0.4183, 0.9871, 0.1943, 0.5863],
        [0.9140, 0.6782, 0.8927, 0.1802]])

In [20]:
random_tensor.ndim

2

In [21]:
# Create a random tensor with similar shape to an image tensor
random_image_size_tensor = torch.rand(size=(3,224,224)) # colour channel, height, width
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([3, 224, 224]), 3)

In [22]:
# Challenge is to create your own random tensor

In [23]:
my_random_tensor1 = torch.rand(size=(5, 96))
my_random_tensor2 = torch.rand(size=(1,4,32))
my_random_tensor2

tensor([[[0.5273, 0.8586, 0.2849, 0.6407, 0.4241, 0.2747, 0.5739, 0.2134,
          0.7639, 0.2942, 0.3607, 0.3654, 0.8014, 0.8517, 0.5137, 0.2409,
          0.7408, 0.8420, 0.7584, 0.5072, 0.6319, 0.7301, 0.1510, 0.1756,
          0.3215, 0.5639, 0.2101, 0.4462, 0.7855, 0.4120, 0.2717, 0.9194],
         [0.9206, 0.9103, 0.4294, 0.5264, 0.2758, 0.3500, 0.1618, 0.4769,
          0.3827, 0.4339, 0.5708, 0.7805, 0.1047, 0.3576, 0.4420, 0.7927,
          0.4105, 0.6345, 0.2404, 0.3870, 0.4117, 0.2547, 0.2757, 0.9512,
          0.6161, 0.7843, 0.0713, 0.5886, 0.9849, 0.4102, 0.5460, 0.0461],
         [0.7815, 0.6174, 0.9325, 0.7073, 0.3488, 0.1084, 0.5732, 0.9919,
          0.3504, 0.1286, 0.6995, 0.7674, 0.9896, 0.6016, 0.8284, 0.4304,
          0.7643, 0.0065, 0.4707, 0.2720, 0.7803, 0.4891, 0.7635, 0.0424,
          0.3952, 0.8444, 0.2919, 0.0494, 0.2836, 0.6943, 0.5591, 0.7792],
         [0.4872, 0.8183, 0.2505, 0.7580, 0.5918, 0.2014, 0.7044, 0.1746,
          0.7870, 0.9269, 0.0457, 0

In [24]:
my_random_tensor3 = torch.rand(size=(3,4,5))
my_random_tensor3

tensor([[[0.3609, 0.8773, 0.5425, 0.4263, 0.6445],
         [0.6980, 0.6062, 0.9386, 0.9333, 0.1538],
         [0.7775, 0.2810, 0.2521, 0.5469, 0.3214],
         [0.7517, 0.9857, 0.7816, 0.9906, 0.5373]],

        [[0.2719, 0.6813, 0.2321, 0.2083, 0.0855],
         [0.5878, 0.5952, 0.8133, 0.1111, 0.8292],
         [0.8716, 0.0204, 0.7159, 0.2853, 0.4609],
         [0.8440, 0.7792, 0.2071, 0.2593, 0.0777]],

        [[0.0333, 0.3179, 0.4629, 0.2370, 0.7006],
         [0.4372, 0.8137, 0.1931, 0.5746, 0.7620],
         [0.6706, 0.7454, 0.8354, 0.0465, 0.6493],
         [0.7017, 0.8772, 0.5221, 0.3368, 0.5497]]])

In [25]:
my_random_tensor4 = torch.rand(size=(2,4,5,6))
my_random_tensor4

tensor([[[[0.4741, 0.2235, 0.9621, 0.2040, 0.4313, 0.7788],
          [0.3918, 0.3348, 0.7035, 0.9693, 0.8881, 0.8614],
          [0.0998, 0.1077, 0.7291, 0.7097, 0.1558, 0.0216],
          [0.1207, 0.9492, 0.9111, 0.0935, 0.1557, 0.2225],
          [0.3727, 0.7275, 0.0261, 0.7182, 0.3944, 0.2015]],

         [[0.4640, 0.4185, 0.3139, 0.4340, 0.1491, 0.9981],
          [0.4800, 0.7944, 0.7259, 0.7756, 0.9983, 0.4535],
          [0.1320, 0.8448, 0.5625, 0.1238, 0.9664, 0.3870],
          [0.0581, 0.0991, 0.9179, 0.8368, 0.0071, 0.4885],
          [0.3161, 0.1816, 0.4240, 0.2823, 0.0187, 0.9763]],

         [[0.0452, 0.8000, 0.2128, 0.7527, 0.1661, 0.3321],
          [0.4009, 0.1969, 0.0904, 0.1380, 0.5756, 0.9457],
          [0.1627, 0.8181, 0.6308, 0.4276, 0.3276, 0.6927],
          [0.4566, 0.5318, 0.0202, 0.4087, 0.1670, 0.9171],
          [0.6487, 0.5160, 0.7028, 0.0933, 0.5933, 0.4069]],

         [[0.4718, 0.3782, 0.0143, 0.6224, 0.8519, 0.4835],
          [0.4001, 0.3106, 0.4750,

In [26]:
my_random_tensor4.ndim

4

### Zeros and ones


In [27]:
# Create a tensor of all zeros.  Helpful for creating a mask
zeros = torch.zeros(3,4)
zeros


tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [28]:
# Create a tensor of all ones.  Helpful for creating a mask
ONES = torch.ones(3,4)
ONES

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [29]:
# Data type using <variable.dtype>
ONES.dtype

torch.float32
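As a rough sketch of the "mask" use case mentioned in the comments above: multiplying by a tensor of zeros and ones keeps some values and blanks out the rest. The `values` tensor and the choice of which row to keep are made up for this example.

```python
import torch

values = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])

# Start from all zeros, then switch on the entries we want to keep
mask = torch.zeros(2, 2)
mask[0, :] = 1  # keep only the first row (arbitrary choice for illustration)

masked = values * mask  # element-wise: kept where mask is 1, zeroed where mask is 0
print(masked)
```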

### Create a range of tensors and tensor-like

In [30]:
# torch.range() is deprecated and shows a warning; use torch.arange() instead
one_to_ten = torch.arange(1, 11)
my_range = torch.arange(start=1, end=11, step=1)
one_to_ten
my_range

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [31]:
# Create a tensor of zeros with the same shape and datatype as another tensor
ten_zeros = torch.zeros_like(input=one_to_ten)
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

### Tensor Data Types

**Note:** Tensor datatypes are one of the three big sources of errors you'll run into with PyTorch and deep learning:
1. Tensors not the right datatype
2. Tensors not the right shape
3. Tensors not on the right device

In [32]:
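# Despite the variable name, this tensor is float16 because dtype=torch.float16 is passed
# (pass dtype=None or dtype=torch.float32 to get the default float32)
float_32_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16, # datatype of the tensor (e.g. float16, float32)
                                                device=None,         # device the tensor is on (None = default)
                                                requires_grad=False) # whether to track gradients for this tensor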

float_32_tensor.dtype

torch.float16

In [33]:
float_32_tensor

tensor([3., 6., 9.], dtype=torch.float16)

In [34]:
float_16_tensor = float_32_tensor.type(torch.float16)
float_16_tensor

tensor([3., 6., 9.], dtype=torch.float16)

In [35]:
float_16_tensor * float_32_tensor

tensor([ 9., 36., 81.], dtype=torch.float16)

In [36]:
# Note: torch.long is int64 (not int32), despite the variable name
int_32_tensor = torch.tensor([3, 6, 9], dtype=torch.long)
int_32_tensor

tensor([3, 6, 9])

In [37]:
float_32_tensor * int_32_tensor

tensor([ 9., 36., 81.], dtype=torch.float16)
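When two tensors of different datatypes are combined, PyTorch picks a result datatype for you. The sketch below uses fresh tensors (`f16`, `f32`, `i64` are names chosen for this example) to show what that looks like for a few combinations, plus an explicit conversion with `.to()`.

```python
import torch

f16 = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)
f32 = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float32)
i64 = torch.tensor([3, 6, 9], dtype=torch.int64)

mixed_floats = f16 * f32     # result is promoted to the wider float type
float_and_int = f16 * i64    # float wins over integer, result stays float16
explicit = f16.to(torch.float32)  # convert explicitly when you need a specific dtype

print(mixed_floats.dtype)
print(float_and_int.dtype)
print(explicit.dtype)
```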

### Getting information from tensor attributes
1. Tensors not the right datatype - to get a tensor's datatype, use `tensor.dtype`
2. Tensors not the right shape - to get a tensor's shape, use `tensor.shape`
3. Tensors not on the right device - to get a tensor's device, use `tensor.device`

In [38]:
# Create a tensor
some_tensor = torch.rand(3,4)
some_tensor

tensor([[0.8139, 0.7369, 0.2037, 0.8726],
        [0.5225, 0.2784, 0.7119, 0.7679],
        [0.8738, 0.3327, 0.3194, 0.4472]])

In [39]:
# Find details about some tensor
print(some_tensor)
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Device of tensor: {some_tensor.device}")

tensor([[0.8139, 0.7369, 0.2037, 0.8726],
        [0.5225, 0.2784, 0.7119, 0.7679],
        [0.8738, 0.3327, 0.3194, 0.4472]])
Datatype of tensor: torch.float32
Shape of tensor: torch.Size([3, 4])
Device of tensor: cpu


In [40]:
x = torch.randn(3,3)
print('Original device:', x.device) # should be 'cpu'

Original device: cpu


In [41]:
x = x.to('cuda')
print('New device:', x.device)

New device: cuda:0


In [42]:
x = x.to('cpu')

### Manipulating Tensors (tensor operations)

Tensor operations include
- Addition
- Subtraction
- Multiplication (element-wise)
- Division
- Matrix multiplication

In [43]:
# Create a tensor
t1 = torch.tensor([1,2,3])
t1 + 10

tensor([11, 12, 13])

In [44]:
# Multiply tensor by 10
t1 = t1*10
t1


tensor([10, 20, 30])

In [45]:
# Subtract
t1 - 10

tensor([ 0, 10, 20])

In [46]:
# Try out PyTorch built-in functions
torch.mul(t1, 10)

tensor([100, 200, 300])
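For completeness, here is a small sketch of the function forms of the other basic operations from the list above. The tensor `t` is a fresh example tensor; note that none of these calls change `t` itself unless you reassign the result.

```python
import torch

t = torch.tensor([1, 2, 3])

added = torch.add(t, 10)       # same as t + 10
subtracted = torch.sub(t, 10)  # same as t - 10
multiplied = torch.mul(t, 10)  # same as t * 10
divided = torch.div(t, 10)     # same as t / 10 (true division gives a float tensor)

print(added, subtracted, multiplied, divided)
print(t)  # the original tensor is unchanged
```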

## Matrix multiplication

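There are two main ways of performing multiplication in neural networks and deep learning:
1. Element-wise multiplication
2. Matrix multiplication (dot product)

Matrix multiplication pairs the row elements of matrix A with the column elements of matrix B, multiplies each pair, and sums the results.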

https://www.mathsisfun.com/algebra/matrix-multiplying.html
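To make the "rows of A with columns of B" rule concrete, here is a minimal pure-Python sketch. The helper name `matmul_by_hand` and the example matrices are made up for illustration; in practice you would use `torch.matmul()`.

```python
def matmul_by_hand(a, b):
    """Multiply two matrices given as lists of lists."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    # The inner dimensions must match: (rows_a x cols_a) @ (rows_b x cols_b)
    if cols_a != rows_b:
        raise ValueError(f"shapes cannot be multiplied ({rows_a}x{cols_a} and {rows_b}x{cols_b})")
    # Entry [i][j] is the dot product of row i of a with column j of b
    return [
        [sum(a[i][k] * b[k][j] for k in range(cols_a)) for j in range(cols_b)]
        for i in range(rows_a)
    ]

A = [[1, 2], [3, 4], [5, 6]]   # 3x2
B = [[7, 8, 9], [10, 11, 12]]  # 2x3

result = matmul_by_hand(A, B)  # 3x3: the outer dimensions of A and B
print(result)
```

The result has the shape of the outer dimensions (3x3 here), while the inner dimensions (2 and 2) must agree.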


In [47]:
# Element wise multiplication
print(t1, "*", t1)
print(f"Element-wise multiply equals: {t1 * t1}")

tensor([10, 20, 30]) * tensor([10, 20, 30])
Element-wise multiply equals: tensor([100, 400, 900])


In [48]:
# Matrix multiplication
torch.matmul(t1, t1)
print(t1, "dot", t1)
print(f"Dot product equals: {torch.matmul(t1,t1)}")

tensor([10, 20, 30]) dot tensor([10, 20, 30])
Dot product equals: 1400


In [49]:
t1

tensor([10, 20, 30])

In [50]:
# Matrix multiplication by hand:
10*10 + 20*20 + 30*30

1400

In [51]:
%%time
value = 0
for i in range(len(t1)):
  value += t1[i] * t1[i]
  print(value)

tensor(100)
tensor(500)
tensor(1400)
CPU times: user 1.23 ms, sys: 0 ns, total: 1.23 ms
Wall time: 1.24 ms


In [52]:
%%time
torch.matmul(t1,t1)

CPU times: user 45 µs, sys: 0 ns, total: 45 µs
Wall time: 48.4 µs


tensor(1400)

# **One of the most common errors in deep learning (shape errors)**
Because much of deep learning is multiplying and performing operations on matrices, and matrices have strict rules about which shapes and sizes can be combined, one of the most common errors you'll run into in deep learning is a shape mismatch.

In [53]:
import torch
# Shapes need to be in the right way
tensor_A = torch.tensor([[1, 2],
                         [2, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

tensor_C = torch.matmul(tensor_A, tensor_B)  # (this will error)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

We can make matrix multiplication work between tensor_A and tensor_B by making their inner dimensions match.

One of the ways to do this is with a **transpose** (switch the dimensions of a given tensor).

*   torch.transpose(input, dim0, dim1) - where input is the desired tensor to transpose and dim0 and dim1 are the dimensions to be swapped.
*   tensor.T - where tensor is the desired tensor to transpose.

Let's try the latter.

In [54]:
# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [2., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [55]:
# View tensor_A and tensor_B.T
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2.],
        [2., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [57]:
# The operation works when tensor_B is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output)
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 54.,  60.,  66.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])
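The other option listed earlier, `torch.transpose()`, gives the same result as `.T` for a 2-D tensor. The sketch below recreates `tensor_A` and `tensor_B` so it runs on its own, and also shows that transposing the first tensor instead changes the output shape.

```python
import torch

tensor_A = torch.tensor([[1., 2.], [2., 4.], [5., 6.]])
tensor_B = torch.tensor([[7., 10.], [8., 11.], [9., 12.]])

# Swap dimension 0 and dimension 1
B_transposed = torch.transpose(tensor_B, 0, 1)
same_as_T = torch.equal(B_transposed, tensor_B.T)

out_3x3 = tensor_A @ tensor_B.T  # (3, 2) @ (2, 3) -> (3, 3)
out_2x2 = tensor_A.T @ tensor_B  # (2, 3) @ (3, 2) -> (2, 2)

print(same_as_T)
print(out_3x3.shape, out_2x2.shape)
```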


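You can also use torch.mm(), which performs the same matrix multiplication as torch.matmul() for 2-D tensors (unlike torch.matmul(), it does not broadcast).

In [56]:
# torch.mm multiplies two 2-D matrices (no broadcasting)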
torch.mm(tensor_A, tensor_B.T)

tensor([[ 27.,  30.,  33.],
        [ 54.,  60.,  66.],
        [ 95., 106., 117.]])

Without the transpose, the rules of matrix multiplication aren't fulfilled and we get an error like the one above.

**Note:** A matrix multiplication is also referred to as the **dot product** of two matrices.

Neural networks are full of matrix multiplications and dot products.

The torch.nn.Linear() module (we'll see this in action later on), also known as the *feed-forward* layer or fully connected layer, implements a matrix multiplication between an input and a weights matrix A.

y = x * A^T + b

Where:


*   *x* is the input to the layer (deep learning is a stack of layers like torch.nn.Linear() and others on top of each other)
*   *A* is the *weights* matrix created by the layer, this starts out as random numbers that get adjusted as a neural network learns to better represent patterns in the data (notice the "*T*", that's because the weights matrix gets transposed).

  **Note:** You might also often see *W* or another letter like *X* used to showcase the weights matrix.
*   *b* is the bias term used to slightly offset the weights and inputs.
*   *y* is the output (a manipulation of the input in the hopes to discover patterns in it).

This is a linear function (you may have seen something like *y = mx + b* in high school or elsewhere), and can be used to draw a straight line.

Let's play around with a linear layer.

Try changing the values of `in_features` and `out_features` below and see what happens.

Do you notice anything to do with the shapes?

In [58]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2,  # in_features = matches the inner (last) dimension of the input
                         out_features=6) # out_features = size of the last dimension of the output
x = tensor_A
output = linear(x)
print(f"Input shape: {x.shape}\n")
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([3, 2])

Output:
tensor([[ 2.2368,  1.2292,  0.4714,  0.3864,  0.1309,  0.9838],
        [ 3.9513,  2.3627,  0.6018,  0.8727, -0.2832,  1.8631],
        [ 6.7469,  3.1648,  0.4224,  0.6705,  0.5493,  3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])
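To check the formula against the layer, the sketch below rebuilds the same kind of layer and compares its output with a manual `x @ A^T + b`, using the layer's `weight` and `bias` attributes. The input values are copied from `tensor_A`; the tolerance passed to `torch.allclose()` is an assumption to allow for small floating-point differences.

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)

x = torch.tensor([[1., 2.], [2., 4.], [5., 6.]])  # shape (3, 2)

# The layer stores its weights with shape (out_features, in_features)
print(linear.weight.shape, linear.bias.shape)

# y = x @ A^T + b, computed by hand
manual = x @ linear.weight.T + linear.bias
matches = torch.allclose(linear(x), manual, atol=1e-6)

print(matches)
print(manual.shape)  # (3, 2) @ (2, 6) -> (3, 6)
```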


If you've never done it before, matrix multiplication can be a confusing topic at first.

But, after you've played around with it a few times and even cracked open a few neural networks, you'll notice it's everywhere.

**Remember, matrix multiplication is all you need.**

# Finding the min, max, mean, sum, etc. (aggregation)
Now that we've seen a few ways to manipulate tensors, let's run through a few ways to aggregate them (go from more values to fewer values).

First, we'll create a tensor and then find the max, min, mean, and sum of it.

In [59]:
# Create a tensor
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

Now let's perform some aggregation.

In [60]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}")  # this will error
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450


**Note:** You may find some methods such as torch.mean() require tensors to be in torch.float32 (the most common) or another specific datatype, otherwise the operation will fail.

You can also do the same as above with torch methods.

In [61]:
torch.max(x), torch.min(x), torch.mean(x.type(torch.float32)), torch.sum(x)

(tensor(90), tensor(0), tensor(45.), tensor(450))
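There are a few equivalent ways to get a float tensor before calling `mean()`, and aggregations can also run along a single dimension with the `dim` argument. The 2-D tensor `m` below is made up for this sketch.

```python
import torch

x = torch.arange(0, 100, 10)  # integer tensor

# Three ways to convert to float32 before taking the mean
mean_a = x.type(torch.float32).mean()
mean_b = x.float().mean()
mean_c = x.to(torch.float32).mean()

# Aggregating along one dimension of a 2-D tensor
m = torch.tensor([[1., 2.],
                  [3., 4.]])
col_sums = m.sum(dim=0)    # collapse the rows: one sum per column
row_means = m.mean(dim=1)  # collapse the columns: one mean per row

print(mean_a, mean_b, mean_c)
print(col_sums, row_means)
```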

# Positional min/max
You can also find the index of a tensor where the max or min occurs with `torch.argmax()` and `torch.argmin()` respectively.

This is helpful in case you just want the position where the highest (or lowest) value is and not the actual value itself (we'll see this in a later section when using the softmax activation function).



In [62]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns the index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
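The index returned by `argmax()` can be used to look the value back up, and with a `dim` argument it returns one index per row. The `scores` tensor here is a made-up stand-in for per-class scores, just to illustrate the idea.

```python
import torch

tensor = torch.arange(10, 100, 10)

idx = tensor.argmax()  # position of the largest value
value = tensor[idx]    # indexing with that position gives the value back

# Hypothetical scores for 2 samples over 3 classes
scores = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1]])
predicted = scores.argmax(dim=1)  # index of the highest score in each row

print(idx, value)
print(predicted)
```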


# Change tensor datatype
As mentioned, a common issue with deep learning operations is having your tensors in different datatypes.

If one tensor is in `torch.float64` and another is in `torch.float32`, you might run into some errors.

But, there's a fix.

You can change the datatypes of tensors using `torch.Tensor.type(dtype=None)` where the `dtype` parameter is the datatype you'd like to use.

First, we'll create a tensor and check its datatype (the default is `torch.float32`).

In [63]:
# Create a tensor and check its datatype
tensor1 = torch.arange(10., 100., 10.)
tensor1.dtype

torch.float32

Now we'll create another tensor the same as before but change its datatype to `torch.float16`.

In [64]:
# Create a float16 tensor
tensor1_float16 = tensor1.type(torch.float16)
tensor1_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

And, we can do something similar to make a `torch.int8` tensor.

In [66]:
# Create an int8 tensor
tensor1_int8 = tensor1.type(torch.int8)
tensor1_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)

# Reshaping, stacking, squeezing, and unsqueezing
Often you'll want to reshape or change the dimensions of your tensors without actually changing the values inside them.

- Reshaping - reshapes an input tensor to a defined shape
- View - returns a view of an input tensor with a certain shape while sharing the same memory as the original tensor
- Stacking - combines multiple tensors on top of each other (vstack) or side by side (hstack)
- Squeeze - removes all dimensions of size `1` from a tensor
- Unsqueeze - adds a dimension of size `1` to a target tensor
- Permute - returns a view of the input with its dimensions permuted (swapped) in a certain way

In [67]:
# Let's create a tensor
import torch
x = torch.arange(1., 10.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.Size([9]))

In [68]:
# Add an extra dimension
x_reshaped = x.reshape(1, 9)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [69]:
# Change the view
z = x.view(1, 9)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [70]:
# Changing z changes x (because a view of a tensor shares the same memory as the original input)
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]]),
 tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.]))
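One way to confirm the memory sharing, and to avoid it when you want an independent copy, is sketched below. It starts from a fresh `x` so it can run on its own; comparing `data_ptr()` values is just one possible check.

```python
import torch

x = torch.arange(1., 10.)
z = x.view(1, 9)

# A view points at the same underlying memory as the original
shares_memory = z.data_ptr() == x.data_ptr()

# clone() makes a copy first, so changes no longer affect x
independent = x.clone().view(1, 9)
independent[:, 0] = 5

print(shares_memory)
print(x[0], independent[0, 0])  # x keeps its original first value
```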

In [71]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=0)
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.]])
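The `dim` argument controls where the new dimension goes, and `torch.vstack()` / `torch.hstack()` cover the "on top of each other" and "side by side" cases. A small sketch with a short example tensor `v` (chosen so the shapes are easy to read):

```python
import torch

v = torch.tensor([1., 2., 3.])

stacked_dim0 = torch.stack([v, v], dim=0)  # new dimension first: rows are copies of v
stacked_dim1 = torch.stack([v, v], dim=1)  # new dimension last: columns are copies of v
v_stacked = torch.vstack([v, v])           # on top of each other
h_stacked = torch.hstack([v, v])           # side by side, stays 1-D

print(stacked_dim0.shape, stacked_dim1.shape)
print(v_stacked.shape, h_stacked.shape)
```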

In [72]:
# torch.squeeze() - removes all single dimensions from a target tensor
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")

# Remove extra dimensions from x_reshaped
x_squeezed = x_reshaped.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]])
Previous shape: torch.Size([1, 9])

New tensor: tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.])
New shape: torch.Size([9])
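Unsqueezing does the reverse of squeezing: it adds a dimension of size 1 at the position given by `dim`. A minimal sketch, using a fresh `x` so it runs independently:

```python
import torch

x = torch.arange(1., 10.)  # shape (9,)

as_row = x.unsqueeze(dim=0)     # add a dimension at the front
as_column = x.unsqueeze(dim=1)  # add a dimension at the end

# squeeze() undoes the unsqueeze
round_trip = as_row.squeeze()

print(as_row.shape, as_column.shape, round_trip.shape)
```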


In [73]:
# torch.permute - rearranges the dimensions of a target tensor in a specified order
x_original = torch.rand(size=(224, 224, 3)) # [height, width, colour_channels]

# Permute the original tensor to rearrange the axis (or dim) order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}") # [colour_channels, height, width]

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])


In [74]:
# permute returns a view, so changing x_original also changes x_permuted
x_original[0, 0, 0] = 728218
x_original[0, 0, 0], x_permuted[0, 0, 0]

(tensor(728218.), tensor(728218.))

# Indexing (selecting data from tensors)
Indexing with PyTorch is similar to indexing with NumPy.

In [75]:
# Create a tensor
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [76]:
# Let's index on our new tensor
x[0]

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [77]:
# Let's index on the middle bracket (dim=1)
x[0][0]

tensor([1, 2, 3])

In [78]:
# Let's index on the innermost bracket (last dimension)
x[0][1][1]

tensor(5)

In [79]:
# You can also use ":" to select "all" of a target dimension
x[:, 0]

tensor([[1, 2, 3]])

In [80]:
# Get all values of 0th and 1st dimensions but only index 1 of the 2nd dimension
x[:, :, 1]

tensor([[2, 5, 8]])

In [81]:
# Get all values of the 0th dimension but only index 1 of the 1st and 2nd dimensions
x[:, 1, 1]

tensor([5])

In [82]:
# Get index 0 of the 0th and 1st dimension and all values of 2nd dimension
x[0, 0, :]

tensor([1, 2, 3])

In [83]:
# Index on x to return 9
print(f"Index on x to return 9: {x[0][2][2]}")

# Index on x to return 3, 6, 9
print(f"Index on x to return 3, 6, 9: {x[:, :, 2]}")

Index on x to return 9: 9
Index on x to return 3, 6, 9: tensor([[3, 6, 9]])


# PyTorch tensors & NumPy
NumPy is a popular Python library for scientific and numerical computing.
Because of this, PyTorch has functionality to interact with it:
- NumPy array -> PyTorch tensor: `torch.from_numpy(ndarray)`
- PyTorch tensor -> NumPy array: `torch.Tensor.numpy()`



In [84]:
# NumPy array to tensor
import torch
import numpy as np

array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array) # warning: when converting from numpy -> pytorch, pytorch reflects numpy's default datatype of float64 unless specified otherwise
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [85]:
# Change the value of array, what will this do to 'tensor'?
# (array + 1 creates a new array, so 'tensor' keeps the old values)
array = array + 1
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [86]:
#Tensor to NumPy array
tensor = torch.ones(7)
numpy_tensor = tensor.numpy()
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

In [87]:
# Change the tensor, what happens to 'numpy_tensor'?
# (tensor + 1 creates a new tensor, so 'numpy_tensor' keeps the old values)
tensor = tensor + 1
tensor, numpy_tensor

(tensor([2., 2., 2., 2., 2., 2., 2.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
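The cells above reassign `array` and `tensor` to new objects, which is why the other side does not change. With in-place operations the picture is different, because `torch.from_numpy()` and `Tensor.numpy()` share memory with their source on the CPU. A small sketch with fresh variable names:

```python
import numpy as np
import torch

# NumPy -> PyTorch: in-place changes to the array show up in the tensor
array = np.arange(1.0, 4.0)
from_array = torch.from_numpy(array)
array += 1  # in-place: modifies the shared memory

# PyTorch -> NumPy: in-place changes to the tensor show up in the array
ones = torch.ones(3)
as_numpy = ones.numpy()
ones.add_(1)  # in-place add (trailing underscore)

# Convert to float32 if you don't want NumPy's default float64
as_float32 = torch.from_numpy(np.arange(1.0, 4.0)).type(torch.float32)

print(from_array, as_numpy, as_float32.dtype)
```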

# Reproducibility (trying to take the random out of random)
In short, how a neural network learns:

`start with random numbers -> tensor operations -> update random numbers to try and make them a better representation of the data -> again -> again -> again ...`

To reduce the randomness in neural networks and PyTorch, there is the concept of a **random seed**.

Essentially what the random seed does is "flavour" the randomness.

Extra resources for reproducibility:

- https://pytorch.org/docs/stable/notes/randomness.html
- https://en.wikipedia.org/wiki/Random_seed

In [88]:
import torch

# Create two random tensors
random_tensor_A = torch.rand(3,4)
random_tensor_B = torch.rand(3,4)

print(random_tensor_A)
print(random_tensor_B)
print(random_tensor_A == random_tensor_B)

tensor([[0.8016, 0.3649, 0.6286, 0.9663],
        [0.7687, 0.4566, 0.5745, 0.9200],
        [0.3230, 0.8613, 0.0919, 0.3102]])
tensor([[0.9536, 0.6002, 0.0351, 0.6826],
        [0.3743, 0.5220, 0.1336, 0.9666],
        [0.9754, 0.8474, 0.8988, 0.1105]])
tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])


In [89]:
# Let's make some random but reproducible tensors
import torch

# Set the random seed
RANDOM_SEED = 42
torch.manual_seed(RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

torch.manual_seed(RANDOM_SEED)
random_tensor_D = torch.rand(3, 4)

print(random_tensor_C)
print(random_tensor_D)
print(random_tensor_C == random_tensor_D)

tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])
tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
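Calling `torch.manual_seed()` sets the global seed, which is why it is called before each `torch.rand()` above. As an alternative sketch, a separate `torch.Generator` can carry its own seed and be passed to `torch.rand()` via the `generator` argument; the variable names here are just for illustration.

```python
import torch

# Two generators seeded identically produce the same sequence
gen_a = torch.Generator().manual_seed(42)
gen_b = torch.Generator().manual_seed(42)

first_a = torch.rand(3, 4, generator=gen_a)
first_b = torch.rand(3, 4, generator=gen_b)
same_first_draw = torch.equal(first_a, first_b)

# Drawing again from the same generator continues the sequence,
# so the second draw differs from the first
second_a = torch.rand(3, 4, generator=gen_a)
second_differs = not torch.equal(first_a, second_a)

print(same_first_draw, second_differs)
```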


# Running tensors and PyTorch objects on the GPUs (and making faster computations)
GPUs = faster computations on numbers, thanks to CUDA + NVIDIA hardware + PyTorch working behind the scenes to make everything hunky dory (good).

## 1. Getting a GPU
1. Easiest - Use Google Colab for a free GPU (options to upgrade)
2. Use your own GPU - takes a little bit of setup and requires the investment of purchasing a GPU. There are a lot of options; see this post for what to get:
https://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/
3. Use cloud computing - GCP, AWS, and Azure allow you to rent computers in the cloud and access them

For options 2 and 3, PyTorch + GPU drivers (CUDA) take a little bit of setting up. To do this, refer to the PyTorch setup documentation:
https://pytorch.org/get-started/locally/


In [90]:
!nvidia-smi

Sat Jun  1 14:27:39 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   72C    P0              33W /  70W |    105MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

## 2. Check for GPU access with PyTorch

In [91]:
# Check for GPU access with PyTorch
import torch
torch.cuda.is_available()

True

Since PyTorch is capable of running compute on the GPU or CPU, it's best practice to set up device-agnostic code:

https://pytorch.org/docs/stable/notes/cuda.html#best-practices

E.g. run on GPU if available, else default to CPU

In [92]:
# Set up device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [93]:
# Count number of devices
torch.cuda.device_count()

1
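With a `device` variable in place, tensors can be created directly on that device or moved to it, and the same code runs whether or not a GPU is present. A minimal sketch (the tensor names are just for this example):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Create a tensor directly on the chosen device
created_on_device = torch.ones(2, 3, device=device)

# Or create it on the CPU and move it
moved_to_device = torch.zeros(2, 3).to(device)

# Operations need both tensors on the same device
result = created_on_device + moved_to_device

print(result.device)
```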

## 3. Putting tensors (and models) on the GPU
The reason we want our tensors/models on the GPU is that using a GPU results in faster computations.

In [94]:
# Create a tensor (default on the CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)

tensor([1, 2, 3]) cpu


In [95]:
# Move tensor to GPU (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3], device='cuda:0')

## 4. Moving tensors back to the CPU

In [96]:
# If a tensor is on the GPU, it can't be converted to NumPy directly
tensor_on_gpu.numpy()

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

In [97]:
# To fix the GPU tensor with NumPy issue, first copy the tensor to the CPU
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu

array([1, 2, 3])

In [98]:
tensor_on_gpu

tensor([1, 2, 3], device='cuda:0')
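A small helper can make the conversion work regardless of where a tensor lives. The function name `to_numpy` is hypothetical (it is not part of PyTorch); the sketch calls `detach()` as well, on the assumption that the tensor might be tracking gradients.

```python
import torch

def to_numpy(t):
    """Hypothetical helper: return a NumPy copy/view of a tensor on any device."""
    # detach() drops gradient tracking, cpu() copies to host memory if needed
    return t.detach().cpu().numpy()

device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_on_device = torch.tensor([1, 2, 3], device=device)

array = to_numpy(tensor_on_device)
print(array, type(array).__name__)
print(tensor_on_device.device)  # the original tensor stays where it was
```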

# Exercises & Extra-curriculum
See exercises for this notebook here:
https://www.learnpytorch.io/00_pytorch_fundamentals/#exercises

See the template exercises notebook for this module here:
https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/exercises/00_pytorch_fundamentals_exercises.ipynb

In [3]:
from google.colab import drive
drive.mount('Colab_Notebooks')

Mounted at Colab_Notebooks

