<a href="https://colab.research.google.com/github/lirichardil/PyTorch/blob/main/00_Pytorch_fundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!nvidia-smi # to check the GPU status

Mon Dec  8 20:20:22 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   66C    P8             11W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

## 00. PyTorch Fundamentals

Resource: https://www.learnpytorch.io/00_pytorch_fundamentals/

In [None]:
import torch #colab comes with torch pre-installed.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#print(torch.__version__)

## 01. Intro to Tensor

Tensors are a way to represent data (images, text, numbers) as multi-dimensional arrays of numbers.



### Creating Tensors
Tensors are created using torch.tensor().

Resource: https://docs.pytorch.org/docs/stable/tensors.html

In [None]:
# scalar
scalar = torch.tensor(7)

scalar

tensor(7)

In [None]:
scalar.ndim # a scalar has zero dimensions

0

In [None]:
scalar.item() # get tensor back as python int


7

In [None]:
# Vector
vector = torch.tensor([7,7])
vector.ndim


1

In [None]:
vector.shape

torch.Size([2])

In [None]:
# MATRIX

MATRIX = torch.tensor([[7,8],
                       [9,10]])
MATRIX.ndim


2

In [None]:
MATRIX.shape

torch.Size([2, 2])

In [None]:
# TENSOR
TENSOR = torch.tensor([[[1,2,3],
                        [3,6,9],
                        [2,4,5]]])
TENSOR.ndim


3

In [None]:
TENSOR.shape

torch.Size([1, 3, 3])

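The shape [1, 3, 3] means we have one 3x3 matrix: the outermost dimension has size 1, and each of the two inner dimensions has size 3.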

In [None]:
TENSOR2 = torch.tensor([[[1,2,3],
                        [3,6,9],
                        [2,4,5]],
                        [[1,2,3],
                        [3,6,9],
                        [2,4,5]]])
TENSOR2.shape

torch.Size([2, 3, 3])

### Random Tensors
Why random tensors?

Machine learning models such as neural networks manipulate and seek patterns within tensors.

But when building machine learning models with PyTorch, it's rare you'll create tensors by hand (like what we've been doing).

Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

For now, let's see how to create a tensor of random numbers.

In [None]:
# Create a random tensor of size (3, 4)
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype

(tensor([[0.3483, 0.2447, 0.1550, 0.1115],
         [0.1759, 0.3014, 0.8638, 0.0659],
         [0.1253, 0.6901, 0.2059, 0.7935]]),
 torch.float32)

The flexibility of torch.rand() is that we can adjust the size to be whatever we want.

For example, say you wanted a random tensor in the common image shape of [224, 224, 3] ([height, width, color_channels]).

In [None]:
# Create a random tensor of size (224, 224, 3)
random_image_size_tensor = torch.rand(size=(224, 224, 3))
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([224, 224, 3]), 3)

### Zeros and Ones
Sometimes you'll just want to fill a tensor with zeros or ones.

This happens a lot with masking (like masking some of the values in one tensor with zeros to let a model know not to learn them).

Let's create a tensor full of zeros with torch.zeros()

Again, the size parameter comes into play.

In [None]:
Zeros = torch.zeros(size=(3,4))
Zeros

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [None]:
ones = torch.ones(size=(3,4))
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [None]:
ones.dtype
# float32 is the default datatype: a floating-point tensor in PyTorch starts off as float32 unless you specify otherwise.

torch.float32

### Creating a range and tensors like

Sometimes you might want a range of numbers, such as 1 to 10 or 0 to 100.

You can use torch.arange(start, end, step) to do so.

Where:

start = start of range (e.g. 0)

end = end of range, not included in the result (e.g. 10)

step = how many steps in between each value (e.g. 1)

In [None]:
onetoten_step2 = torch.arange(0,10,2)
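
As a quick check of how torch.arange() treats its arguments, here is a minimal sketch. The variable names are illustrative only. The main point is that `end` is excluded from the result, and that the datatype follows the type of the arguments.

```python
import torch

# end is exclusive: 0, 2, 4, 6, 8 (10 is not included)
evens = torch.arange(0, 10, 2)
print(evens)

# Integer arguments give an int64 tensor, float arguments give float32
ints = torch.arange(1, 11)              # 1 through 10
floats = torch.arange(0.0, 1.0, 0.25)   # 0.0, 0.25, 0.5, 0.75
print(ints.dtype, floats.dtype)
```

To include 10 in a range, pass an `end` one step past it, as in `torch.arange(1, 11)`.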

Sometimes you might want one tensor of a certain type with the same shape as another tensor.

For example, a tensor of all zeros with the same shape as a previous tensor.

In [None]:
onetoten_zerolike = torch.zeros_like(input=onetoten_step2)
print(onetoten_zerolike)

onetoten_onelike = torch.ones_like(input=onetoten_step2)
onetoten_onelike

tensor([0, 0, 0, 0, 0])


tensor([1, 1, 1, 1, 1])

### Tensor DataType

There are many different tensor datatypes available in PyTorch.

Some are specific for CPU and some are better for GPU.

Getting to know which one can take some time.

Generally if you see torch.cuda anywhere, the tensor is being used for GPU (since Nvidia GPUs use a computing toolkit called CUDA).

The most common type (and generally the default) is torch.float32 or torch.float.

This is referred to as "32-bit floating point".

But there's also 16-bit floating point (torch.float16 or torch.half) and 64-bit floating point (torch.float64 or torch.double).

And to confuse things even more there's also 8-bit, 16-bit, 32-bit and 64-bit integers.

Note: An integer is a whole number like 7, whereas a float has a decimal point, like 7.0.

In [None]:
# float32 tensor: each number takes up 32 bits in memory
# The default floating-point datatype is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which is torch.float32 or whatever datatype is passed
                               device=None, # defaults to None, which uses the default tensor type "cpu" or "cuda"
                               requires_grad=False) # if True, operations performed on the tensor are recorded (PyTorch will track gradients)

float_32_tensor.dtype

#Float 16
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work

float_16_tensor.dtype


torch.float16

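Datatype and device mismatches are two of the most common issues you'll come across in PyTorch. For example, one of your tensors is torch.float32 and the other is torch.float16 (PyTorch often likes tensors to be the same format).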

Or one of your tensors is on the CPU and the other is on the GPU (PyTorch likes calculations between tensors to be on the same device).
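
To make the datatype case concrete, here is a small sketch with illustrative variable names. It assumes a CPU-only setup. Element-wise operations promote mixed float16/float32 inputs to the wider type, but other operations may raise an error instead, so an explicit cast is the safer habit.

```python
import torch

a16 = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)
b32 = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float32)

# Element-wise operations promote to the wider dtype
product = a16 * b32
print(product.dtype)

# An explicit cast makes the intended dtype obvious
a32 = a16.to(torch.float32)   # a16.type(torch.float32) also works
print(a32.dtype)
```

Casting once, close to where the tensor is created, tends to be easier to debug than relying on promotion rules.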


### Getting information from tensors

Once you've created tensors (or someone else or a PyTorch module has created them for you), you might want to get some information from them.

We've seen these before but three of the most common attributes you'll want to find out about tensors are:

* shape - what shape is the tensor? (some operations require specific shape rules)
* dtype - what datatype are the elements within the tensor stored in?
* device - what device is the tensor stored on? (usually GPU or CPU)

<div class="Tensor data">
<b>Note:</b>
Datatype mismatches are one of the three big errors we will run into in deep learning:

1. Tensors not the right datatype
2. Tensors not the right shape
3. Tensors not on the right device
</div>

In [None]:
# Create a tensor
some_tensor = torch.rand(3, 4)

# Find out details about it
print(some_tensor)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}") # will default to CPU

tensor([[0.4009, 0.8449, 0.8831, 0.5428],
        [0.3072, 0.4211, 0.2349, 0.7760],
        [0.1953, 0.8125, 0.5523, 0.1872]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


### Manipulating tensors (tensor operations)


In deep learning, data (images, text, video, audio, protein structures, etc) gets represented as tensors.

A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on tensors to create a representation of the patterns in the input data.

These operations are often a wonderful dance between:

* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication

And that's it. Sure, there are a few more here and there, but these are the basic building blocks of neural networks.

Stacking these building blocks in the right way, you can create the most sophisticated of neural networks (just like lego!).

#### Basic operations

Let's start with addition (+), subtraction (-) and multiplication (*).

In [None]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
print(tensor + 10)
# Multiply it by 10
print(tensor * 10)

# Notice that the tensor itself doesn't change because the result hasn't been reassigned.

tensor([11, 12, 13])
tensor([10, 20, 30])


In [None]:
#to reassign

tensor = tensor - 10
print(tensor)

tensor([-9, -8, -7])


Another way to perform these operations is with PyTorch's built-in functions, such as torch.mul().


In [None]:
torch.mul(tensor, 10)
print(tensor) # torch.mul() returns a new tensor, so the original is unchanged

tensor = torch.mul(tensor, 10)
print(tensor) # the result has now been reassigned to tensor

tensor([-9, -8, -7])
tensor([-90, -80, -70])


#### Matrix multiplication (is all you need)


One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication.

PyTorch implements matrix multiplication functionality in the torch.matmul() method.

The main two rules for matrix multiplication to remember are:

The inner dimensions must match:

- (3, 2) @ (3, 2) won't work
- (2, 3) @ (3, 2) will work
- (3, 2) @ (2, 3) will work

The resulting matrix has the shape of the outer dimensions:
- (2, 3) @ (3, 2) -> (2, 2)
- (3, 2) @ (2, 3) -> (3, 3)

Note: "@" in Python is the symbol for matrix multiplication.

Resource: You can see all of the rules for matrix multiplication using torch.matmul() in the PyTorch documentation.
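
The two shape rules above can be checked directly. This is a minimal sketch with random inputs. The names `a`, `b` and `mismatch_raised` are illustrative only.

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

print((a @ b).shape)   # inner dims 3 and 3 match -> outer dims (2, 2)
print((b @ a).shape)   # inner dims 2 and 2 match -> outer dims (3, 3)

# Mismatched inner dimensions raise a RuntimeError
try:
    torch.matmul(b, b)  # (3, 2) @ (3, 2)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"Shape error: {err}")
```

Printing the shapes of both operands before a multiplication is often the quickest way to spot this kind of mismatch.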

In [None]:
#let's create a tensor to perform element-wise multiplication and matrix multiplication.

tensor = torch.tensor([1, 2, 3])
tensor.shape

torch.Size([3])

The difference between element-wise multiplication and matrix multiplication is the addition of values.

For our tensor variable with values [1, 2, 3]:

Element-wise -> tensor * tensor = [1 * 1,2 * 2,3 * 3] = [1,4,9]

Matrix multiplication (aka dot product for 1-D tensors) -> tensor @ tensor = [1 * 1 + 2 * 2 + 3 * 3] = [14]


In [None]:
tensor*tensor

tensor([1, 4, 9])

In [None]:
torch.matmul(tensor, tensor)

# torch.mm() is a stricter variant that only accepts 2-D matrices, so it would not work on these 1-D tensors.

tensor(14)

In [None]:
tensor@tensor

tensor(14)

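torch.matmul() and the @ operator are equivalent (@ calls torch.matmul() under the hood). Both are much faster than doing the multiplication by hand with Python for loops, as the timings below show.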

In [None]:
%%time
# Matrix multiplication by hand
# (avoid doing operations with for loops at all cost, they are computationally expensive)
value = 0
for i in range(len(tensor)):
  value += tensor[i] * tensor[i]
value

CPU times: user 992 µs, sys: 15 µs, total: 1.01 ms
Wall time: 1.25 ms


tensor(14)

In [None]:
%%time
torch.matmul(tensor, tensor)

CPU times: user 33 µs, sys: 5 µs, total: 38 µs
Wall time: 40.8 µs


tensor(14)

#### One of the most common errors in deep learning (shape errors)

Because much of deep learning is multiplying and performing operations on matrices and matrices have a strict rule about what shapes and sizes can be combined, one of the most common errors you'll run into in deep learning is shape mismatches.

For example, the cell below creates tensor_A and tensor_B, both of shape (3, 2). Their inner dimensions (2 and 3) don't match, so multiplying them raises an error.

In [None]:
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)
tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

torch.mm(tensor_A, tensor_B)
# Raises an error: the inner dimensions don't match (3x2 and 3x2).

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

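One way to fix this is to transpose one of the tensors (switch its dimensions) so that the inner dimensions match. You can perform transposes in PyTorch using either:

* torch.transpose(input, dim0, dim1) - where input is the desired tensor to transpose and dim0 and dim1 are the dimensions to be swapped.
* tensor.T - where tensor is the desired tensor to transpose.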



In [None]:
tensor_BT = torch.transpose(tensor_B, 0, 1)

print(torch.mm(tensor_A, tensor_BT))


tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])


In [None]:
tensor_B.T,tensor_B.T.shape # the same element being re-arranged

print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match")
print("Output:\n")
output = torch.mm(tensor_A, tensor_B.T)
print(output)
print(f"\nOutput shape: {output.shape}")


Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])
New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])
Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match
Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])


#### The torch.nn.Linear() module

The torch.nn.Linear() module (we'll see this in action later on), also known as a feed-forward layer or fully connected layer, implements a matrix multiplication between an input x and a weights matrix A.

$y = x \cdot A^T + b$

Where:

* x is the input to the layer (deep learning is a stack of layers like torch.nn.Linear() and others on top of each other).
* A is the weights matrix created by the layer. It starts out as random numbers that get adjusted as a neural network learns to better represent patterns in the data (notice the "T": the weights matrix gets transposed).
  * Note: You might also often see W or another letter like X used to showcase the weights matrix.
* b is the bias term used to slightly offset the weights and inputs.
* y is the output (a manipulation of the input in the hopes of discovering patterns in it).

This is a linear function (you may have seen something like $y = mx+b$ in high school or elsewhere), and can be used to draw a straight line!

Let's play around with a linear layer.

Try changing the values of in_features and out_features below and see what happens.

Do you notice anything to do with the shapes?

In [None]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = matches inner dimension of input
                         out_features=6) # out_features = describes outer value
print(f"tensor_A is {tensor_A}")
x = tensor_A
output = linear(x)
print(f"Input shape: {x.shape}\n")
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

tensor_A is tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
Input shape: torch.Size([3, 2])

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])
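
To connect the layer back to the formula, the sketch below recomputes the output by hand from the layer's `weight` and `bias` attributes. The names `layer`, `inputs`, `manual` and `layer_out` are illustrative. The comparison uses a small tolerance because the two computation paths can differ by floating-point rounding.

```python
import torch

torch.manual_seed(42)
layer = torch.nn.Linear(in_features=2, out_features=6)
inputs = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])

# weight has shape (out_features, in_features), hence the transpose
print(layer.weight.shape, layer.bias.shape)

manual = inputs @ layer.weight.T + layer.bias   # y = x . A^T + b
layer_out = layer(inputs)
print(torch.allclose(manual, layer_out, atol=1e-6))
```

This also explains the shapes: `in_features` has to match the last dimension of the input, and `out_features` becomes the last dimension of the output.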


## Finding the min, max, mean, sum, etc (tensor aggregation)

In [None]:
# Create a tensor

x = torch.arange(0,100,10)

x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [None]:

print(f"Minimum - x.min: {x.min()}")
print(f"Minimum - torch.min(x): {torch.min(x)}")
print(f"Maximum - x.max: {x.max()}")
print(f"Maximum - torch.max(x):{torch.max(x)}")


Minimum - x.min: 0
Minimum - torch.min(x): 0
Maximum - x.max: 90
Maximum - torch.max(x):90


In [None]:
#find the mean
torch.mean(x) # raises an error: x has dtype int64 (long), and torch.mean() needs a floating-point or complex dtype


RuntimeError: mean(): could not infer output dtype. Input dtype must be either a floating point or complex dtype. Got: Long

In [None]:
# Convert to a floating-point dtype first (complex dtypes also work).

torch.mean(x.type(torch.float32))
#or
x.type(torch.float32).mean()


tensor(45.)
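
There are several equivalent ways to do this conversion. The sketch below, using an illustrative `values` tensor, shows three of them producing the same mean.

```python
import torch

values = torch.arange(0, 100, 10)   # dtype is torch.int64
print(values.dtype)

# Three equivalent ways to get a float32 copy before calling mean()
mean_a = values.type(torch.float32).mean()
mean_b = values.to(torch.float32).mean()
mean_c = values.float().mean()
print(mean_a, mean_b, mean_c)
```

Each call returns a new tensor, so `values` itself stays int64.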

In [None]:
# find the sum
torch.sum(x), x.sum()

(tensor(450), tensor(450))

## Position min/max


You can also find the index of a tensor where the max or minimum occurs with torch.argmax() and torch.argmin() respectively.

This is helpful in case you just want the position where the highest (or lowest) value is and not the actual value itself (we'll see this in a later section when using the softmax activation function).

In [None]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
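
For tensors with more than one dimension, argmax() also accepts a `dim` argument. The sketch below uses a made-up `scores` tensor (two rows of three values) to show the difference.

```python
import torch

scores = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1]])

# Without dim, argmax indexes into the flattened tensor
flat_index = scores.argmax()

# With dim=1, argmax returns one index per row
row_indices = scores.argmax(dim=1)
print(flat_index, row_indices)

# The index can be used to look the value back up
first_row_best = scores[0, row_indices[0]]
print(first_row_best)
```

The per-row form is the one typically used to turn a batch of prediction scores into predicted class indices.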


## Reshaping, stacking, squeezing and unsqueezing tensors

* Reshaping - reshapes an input tensor to a defined shape
* View - returns a view of an input tensor in a certain shape while keeping the same memory as the original tensor (it shows the tensor in a different shape without changing the original tensor's shape)
* Stacking - combines multiple tensors on top of each other (vstack) or side by side (hstack)
* Squeeze - removes all `1` dimensions from a tensor
* Unsqueeze - adds a `1` dimension to a target tensor
* Permute - returns a view of the input with its dimensions permuted (swapped) in a certain way

Why do any of these?

Because deep learning models (neural networks) are all about manipulating tensors in some way. And because of the rules of matrix multiplication, if you've got shape mismatches, you'll run into errors. These methods help you make sure the right elements of your tensors are mixing with the right elements of other tensors.

Let's try them out.

First, we'll create a tensor.

In [None]:
import torch
x = torch.arange(1., 10.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.Size([9]))

In [None]:
# Add an extra dimension
x_reshaped = x.reshape(1, 9) # reshape to (rows, columns) = (1, 9)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [None]:
#x_reshaped = x.reshape(3, 3)
#x_reshaped, x_reshaped.shape

We can also change the view with torch.Tensor.view(), called here as x.view().

In [None]:
z = x.view(1,9)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [None]:
# Changing z changes x (because a view of a tensor shares the same memory as the original tensor)
z[:,0] = 5
z,x
z.shape,x.shape

(torch.Size([1, 9]), torch.Size([9]))
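
The difference between view() and reshape() shows up with non-contiguous tensors. The following is a small sketch with illustrative names: view() only works when the requested shape is compatible with the existing memory layout, while reshape() copies the data when it has to.

```python
import torch

base = torch.arange(6.)
viewed = base.view(2, 3)

# A view points at the same underlying memory as the original
print(viewed.data_ptr() == base.data_ptr())

# Transposing makes the tensor non-contiguous, so view() fails...
transposed = viewed.T
try:
    transposed.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# ...while reshape() falls back to copying the data
flattened = transposed.reshape(6)
print(view_failed, flattened)
```

If you need the result to share memory with the original, view() is the stricter and therefore safer choice. If you only care about the shape, reshape() is more forgiving.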

In [None]:
# Stack several tensors along a new dimension

x_stacked = torch.stack([x,x,x,x], dim=0)
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.],
        [5., 2., 3., 4., 5., 6., 7., 8., 9.]])

In [None]:
x_stacked = torch.stack([x,x,x,x], dim=1)
x_stacked

tensor([[5., 5., 5., 5.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.],
        [4., 4., 4., 4.],
        [5., 5., 5., 5.],
        [6., 6., 6., 6.],
        [7., 7., 7., 7.],
        [8., 8., 8., 8.],
        [9., 9., 9., 9.]])

In [None]:
x_stacked = torch.stack([x,x,x,x], dim=-1)
x_stacked

tensor([[5., 5., 5., 5.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.],
        [4., 4., 4., 4.],
        [5., 5., 5., 5.],
        [6., 6., 6., 6.],
        [7., 7., 7., 7.],
        [8., 8., 8., 8.],
        [9., 9., 9., 9.]])
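
torch.stack() always creates a new dimension, which is what separates it from torch.cat(), torch.vstack() and torch.hstack(). A short comparison on a small illustrative tensor `row`:

```python
import torch

row = torch.arange(1., 4.)                   # shape (3,)

stacked = torch.stack([row, row], dim=0)     # new dimension -> (2, 3)
vstacked = torch.vstack([row, row])          # rows on top of each other -> (2, 3)
hstacked = torch.hstack([row, row])          # side by side -> (6,)
concatenated = torch.cat([row, row], dim=0)  # joins along an existing dim -> (6,)

for name, t in [("stack", stacked), ("vstack", vstacked),
                ("hstack", hstacked), ("cat", concatenated)]:
    print(name, tuple(t.shape))
```

For 1-D inputs, vstack matches stack along dim 0 and hstack matches cat. The results differ once the inputs have more dimensions.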

In [None]:
# torch.squeeze() removes all dimensions of size 1 from a tensor, leaving only dimensions larger than 1
print(f"Previous Tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")


Previous Tensor: tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]])
Previous shape: torch.Size([1, 9])


In [None]:
x_squeezed = x_reshaped.squeeze()
x_squeezed, x_squeezed.shape
print(f"New Tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

New Tensor: tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.])
New shape: torch.Size([9])


We can see that there is only one pair of square brackets after squeezing, and the shape is just [9]: a vector of 9 elements.

#### Refresher on Dim

Say we have a torch.Size of [1, 9], i.e. 1 row by 9 columns. Then dim 0 is the first element of the size (the number of rows, 1), dim 1 is the second element (the number of columns, 9), and so on.

In [None]:
# torch.unsqueeze() adds a dimension of size 1 to a target tensor at a specific dim.
print(f"Previous tensor: {x_squeezed}")
print(f"Previous shape: {x_squeezed.shape}")

# Add an extra dimension with unsqueeze
x_unsqueezed_dim0 = x_squeezed.unsqueeze(dim=0)
print(f"After shape: {x_unsqueezed_dim0.shape},{x_unsqueezed_dim0}")
x_unsqueezed_dim1 = x_squeezed.unsqueeze(dim=1)
print(f"After shape: {x_unsqueezed_dim1.shape},{x_unsqueezed_dim1}")

Previous tensor: tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.])
Previous shape: torch.Size([9])
After shape: torch.Size([1, 9]),tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]])
After shape: torch.Size([9, 1]),tensor([[5.],
        [2.],
        [3.],
        [4.],
        [5.],
        [6.],
        [7.],
        [8.],
        [9.]])
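
squeeze() can also target a single dimension, and unsqueeze() accepts negative dims. The sketch below uses an illustrative tensor `padded` of shape (1, 3, 1).

```python
import torch

padded = torch.zeros(1, 3, 1)

print(padded.squeeze().shape)        # all size-1 dims removed
print(padded.squeeze(dim=0).shape)   # only dim 0 removed
print(padded.squeeze(dim=1).shape)   # dim 1 has size 3, so nothing changes

# Negative dims count from the end: -1 appends a new last dimension
column = torch.zeros(3).unsqueeze(dim=-1)
print(column.shape)
```

Passing `dim` explicitly avoids accidentally removing a batch dimension that happens to have size 1.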


In [None]:
# torch.permute() rearranges the dimensions of a target tensor in a specified order and returns a view (shares memory)

x_original = torch.rand(size=(224,224,3)) # height, width, color channels

# Permute the original tensor to rearrange the axis (or dim) order
x_permuted = x_original.permute(2,0,1) # shifts axis 0->1, 1->2, 2->0

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}") # color channels, height, width


Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])
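
Because permute() returns a view, the permuted tensor and the original share memory. A minimal sketch with a small stand-in image (the names `image_hwc` and `image_chw` are illustrative):

```python
import torch

image_hwc = torch.rand(size=(4, 4, 3))    # height, width, color channels
image_chw = image_hwc.permute(2, 0, 1)    # color channels, height, width

# Writing to the original is visible through the permuted view
image_hwc[0, 0, 0] = 0.5
print(image_chw[0, 0, 0])

# permute() only changes how memory is indexed, so the view is not contiguous
print(image_chw.is_contiguous())
image_chw_copy = image_chw.contiguous()   # independent, contiguous copy
```

Calling `.contiguous()` (or `.clone()`) is one way to get a tensor that no longer tracks changes to the original.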


## Indexing (selecting data from tensors)


Indexing with PyTorch is similar to indexing with NumPy.

In [None]:
# Create a tensor
import torch
x = torch.arange(1,10).reshape(1,3,3)

x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [None]:
# Let's index on our new tensor

x[0] # index on the first dimension ->> outer bracket
#notice that there is one less bracket. Now the size is 3,3


tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [None]:
# let's index on the middle bracket
x[0][0] #or x[0,0]

tensor([1, 2, 3])

In [None]:
# Let's index on the innermost bracket (last dimension)

x[0][0][0] # the first element of the 0 dim, the first element of 1 dim and first element of the 2 dim

tensor(1)

In [None]:
x[0][2][2]

tensor(9)

In [None]:
# you can also use ":" to select all of a target dimension

x[:,0]


tensor([[1, 2, 3]])

In [None]:
# Get all values of the 0th and 1st dimensions but only index 1 of the 2nd dimension

x[:,:,1]

tensor([[2, 5, 8]])

In [None]:
# Get all values of the 0th dimension but only the index 1 value of the 1st and 2nd dimensions
x[:,1,1]

tensor([5])
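
Indices and slices can be mixed freely. As a further sketch on the same kind of (1, 3, 3) tensor, here named `grid` for illustration:

```python
import torch

grid = torch.arange(1, 10).reshape(1, 3, 3)

last_row = grid[0, 2, :]          # index 0 of dim 0, index 2 of dim 1, all of dim 2
last_column = grid[0, :, 2]       # index 0 of dim 0, all of dim 1, index 2 of dim 2
top_left_block = grid[0, :2, :2]  # slices keep a range of indices
print(last_row, last_column, top_left_block, sep="\n")
```

A plain integer index removes that dimension from the result, while a slice such as `:` or `:2` keeps it.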

## PyTorch tensors & NumPy

NumPy is a popular Python library for scientific numerical computing.

And because of this, PyTorch has functionality to interact with it.

* Data in NumPy array -> PyTorch tensor -> `torch.from_numpy(ndarray)`

* PyTorch tensor -> NumPy -> `torch.Tensor.numpy()`


In [None]:
import torch
import numpy as np


array = np.arange(1.0,8.0)

tensor = torch.from_numpy(array)
array,tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

We see that the tensor's datatype after conversion is float64. That is because NumPy's default datatype for floating-point arrays is float64, and torch.from_numpy() keeps the array's datatype.

In [None]:
array.dtype

dtype('float64')

In [None]:
torch.arange(1.0,8.0).dtype

torch.float32

In [None]:
# to change the data type we can do

tensor = torch.from_numpy(array).type(torch.float32)
tensor.dtype

torch.float32

In [None]:
# Change the value of the array. What does this do to the tensor?

array = array + 1
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]), tensor([1., 2., 3., 4., 5., 6., 7.]))

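We can see that only the array changes; the tensor keeps its original values. That is because array = array + 1 creates a new array rather than modifying the original in place, and the tensor was already a float32 copy made by .type(torch.float32).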

In [None]:
# Tensor to NumPy Array

tensor = torch.ones(7)
numpy_tensor = tensor.numpy()
# The NumPy array is float32, the same datatype as the original tensor.
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

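Reassigning the tensor (for example, tensor = tensor + 1) creates a new tensor, so numpy_tensor does not change. Be aware, though, that tensor.numpy() returns an array that shares memory with a CPU tensor, so in-place changes to the tensor (such as tensor.add_(1)) do change numpy_tensor.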
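
Whether a tensor and a NumPy array affect each other depends on how they are modified. The sketch below, with illustrative variable names and CPU tensors only, contrasts in-place changes (which act on shared memory) with reassignment (which creates a new object).

```python
import numpy as np
import torch

# Tensor -> NumPy: in-place changes to the tensor appear in the array
ones_tensor = torch.ones(3)
shared_array = ones_tensor.numpy()
ones_tensor.add_(1)               # in-place add on the shared memory
print(shared_array)

# Reassignment creates a new tensor, so the array no longer follows it
ones_tensor = ones_tensor + 1
print(ones_tensor, shared_array)

# NumPy -> tensor: torch.from_numpy() shares memory in the same way
source = np.ones(3)
shared_tensor = torch.from_numpy(source)
source += 1                       # in-place, unlike source = source + 1
print(shared_tensor)
```

If you need an independent copy, one option is to call `.clone()` on the tensor or `.copy()` on the array before modifying it.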

## Reproducibility (trying to take the random out of random)

As you learn more about neural networks and machine learning, you'll start to discover how much randomness plays a part.

Well, pseudorandomness, that is. Computers as designed are fundamentally deterministic (each step is predictable), so the randomness they create is simulated (though there is debate on this too, but since I'm not a computer scientist, I'll let you find out more yourself).

How does this relate to neural networks and deep learning then?

We've discussed neural networks start with random numbers to describe patterns in data (these numbers are poor descriptions) and try to improve those random numbers using tensor operations (and a few other things we haven't discussed yet) to better describe patterns in data.

In short:

start with random numbers -> tensor operations -> try to make better (again and again and again)

Although randomness is nice and powerful, sometimes you'd like there to be a little less randomness.

Why?

So you can perform repeatable experiments.

For example, you create an algorithm capable of achieving X performance.

And then your friend tries it out to verify you're not crazy.

How could they do such a thing?

That's where reproducibility comes in.

In other words, can you get the same (or very similar) results on your computer running the same code as I get on mine?

Let's see a brief example of reproducibility in PyTorch.

We'll start by creating two random tensors. Since they're random, you'd expect them to be different, right?

In [None]:
import torch
#Create two random tensors

random_tensor_A = torch.rand(3,4)
random_tensor_B = torch.rand(3,4)

print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")

Tensor A:
tensor([[0.8016, 0.3649, 0.6286, 0.9663],
        [0.7687, 0.4566, 0.5745, 0.9200],
        [0.3230, 0.8613, 0.0919, 0.3102]])

Tensor B:
tensor([[0.9536, 0.6002, 0.0351, 0.6826],
        [0.3743, 0.5220, 0.1336, 0.9666],
        [0.9754, 0.8474, 0.8988, 0.1105]])



Just as you might've expected, the tensors come out with different values.

But what if you wanted to create two random tensors with the same values?

As in, the tensors would still contain random values but they would be of the same flavour.

That's where torch.manual_seed(seed) comes in, where seed is an integer (like 42 but it could be anything) that flavours the randomness.

Let's try it out by creating some more flavoured random tensors.

In [None]:
import torch
import random

# Set the random seed
RANDOM_SEED=14 # try changing this to different values and see what happens to the numbers below
torch.manual_seed(seed=RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

# Have to reset the seed every time a new rand() is called
# Without this, tensor_D would be different to tensor_C
torch.random.manual_seed(seed=RANDOM_SEED) # try commenting this line out and seeing what happens
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")
random_tensor_C == random_tensor_D

Tensor C:
tensor([[0.5695, 0.0047, 0.9303, 0.7257],
        [0.8295, 0.7683, 0.0600, 0.1453],
        [0.2924, 0.5292, 0.1466, 0.8305]])

Tensor D:
tensor([[0.5695, 0.0047, 0.9303, 0.7257],
        [0.8295, 0.7683, 0.0600, 0.1453],
        [0.2924, 0.5292, 0.1466, 0.8305]])

Does Tensor C equal Tensor D? (anywhere)


tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
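
An alternative to resetting the global seed before every call is to pass a dedicated torch.Generator. This is a sketch of that approach, not part of the original walkthrough. The names `gen_a`, `gen_b` and the `sample_*` tensors are illustrative.

```python
import torch

# A dedicated generator keeps seeding local instead of global
gen_a = torch.Generator().manual_seed(14)
gen_b = torch.Generator().manual_seed(14)

sample_a = torch.rand(3, 4, generator=gen_a)
sample_b = torch.rand(3, 4, generator=gen_b)
print(torch.equal(sample_a, sample_b))

# Drawing again from the same generator advances its state
sample_a_next = torch.rand(3, 4, generator=gen_a)
print(torch.equal(sample_a, sample_a_next))
```

This also illustrates why the seed has to be reset before each torch.rand() call in the cell above: every draw moves the generator's state forward.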

## Running tensors and PyTorch objects on GPUs (making faster computations)

GPUs = faster computation on numbers, thanks to CUDA + NVIDIA hardware + PyTorch.

### 1.  Getting a GPU

1. Easiest: use Google Colab for a free GPU (with options to upgrade).
2. Use your own GPU. This takes a little setup.
3. Use cloud computing. GCP, AWS and Azure let you rent computers and access them.

For options 2 and 3, PyTorch and the GPU drivers (CUDA) require some setup. Refer to the PyTorch setup documentation.



In [None]:
!nvidia-smi

Mon Dec  8 20:20:40 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   64C    P8             11W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

### 2. Check for GPU access with PyTorch


In [None]:
import torch
torch.cuda.is_available()

True

Now, let's say you wanted to set up your code so it runs on the GPU if one is available and on the CPU otherwise.

That way, if you or someone decides to run your code, it'll work regardless of the computing device they're using.

Let's create a device variable to store what kind of device is available.

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'


If the above outputs "cuda", it means we can set all of our PyTorch code to use the available CUDA device (a GPU), and if it outputs "cpu", our PyTorch code will stick with the CPU.

Note: In PyTorch, it's best practice to write [device agnostic code](https://pytorch.org/docs/master/notes/cuda.html#device-agnostic-code). This means code that'll run on CPU (always available) or GPU (if available).

In [None]:
# Count number of devices

torch.cuda.device_count()

1

### 3. Putting tensors (and models) on the GPU

Using a GPU results in faster computations.

In [None]:
# Create a tensor (default on CPU)

tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)

tensor([1, 2, 3]) cpu


In [None]:
# Move tensor to GPU (if available)

tensor_on_gpu = tensor.to(device)
# This ties back to the device variable we set previously:
# device = "cuda" if torch.cuda.is_available() else "cpu"
# so this line works whether or not a GPU is available.
tensor_on_gpu

tensor([1, 2, 3], device='cuda:0')
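
The same pattern applies to models. As a minimal device-agnostic sketch (the names `inputs`, `model` and `outputs` are illustrative), tensors can be created directly on the target device and modules moved with `.to(device)`:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Create data directly on the target device
inputs = torch.rand(3, 2, device=device)

# Modules are moved with .to(device) as well
model = torch.nn.Linear(in_features=2, out_features=1).to(device)
outputs = model(inputs)

print(inputs.device, next(model.parameters()).device, outputs.device)
```

The model and its inputs need to be on the same device, otherwise the forward pass raises an error.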

### 4. Moving Tensors back to CPU

In some cases we need to move tensors back to the CPU, for example when working with NumPy, since NumPy only works with CPU memory.

In [None]:
tensor_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_on_cpu

array([1, 2, 3])
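
Calling `.numpy()` directly on a GPU tensor, or on a tensor that tracks gradients, raises an error. One way to handle both cases is a small helper like the one below. `to_numpy` is a hypothetical name used for illustration, not a PyTorch function.

```python
import torch

def to_numpy(t):
    """Hypothetical helper: convert a tensor on any device to a NumPy array."""
    # detach() drops gradient tracking, cpu() is a no-op for CPU tensors
    return t.detach().cpu().numpy()

device = "cuda" if torch.cuda.is_available() else "cpu"
weights = torch.ones(3, device=device, requires_grad=True)

result_array = to_numpy(weights * 2)
print(result_array)
```

The helper works the same way whether or not a GPU is present, which keeps the surrounding code device agnostic.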