## What is PyTorch?
PyTorch is an open source machine learning and deep learning framework.

## What can PyTorch be used for?
PyTorch allows you to manipulate and process data and write machine learning algorithms using Python code.

## Who uses PyTorch?
Many of the world's largest technology companies such as Meta (Facebook), Tesla and Microsoft as well as artificial intelligence research companies such as OpenAI use PyTorch to power research and bring machine learning to their products.

## Why use PyTorch?
Machine learning researchers love using PyTorch. As of February 2022, PyTorch was the most used deep learning framework on Papers With Code, a website that tracks machine learning research papers and the code repositories attached to them.

PyTorch also takes care of many things behind the scenes, such as GPU acceleration (making your code run faster).

So you can focus on manipulating data and writing algorithms and PyTorch will make sure it runs fast.

And if companies such as Tesla and Meta (Facebook) use it to build models they deploy to power hundreds of applications, drive thousands of cars and deliver content to billions of people, it's clearly capable on the development front too.

In [1]:
import torch
torch.__version__

'2.6.0+cu124'

## Introduction to tensors ([Tensor docs](https://pytorch.org/docs/stable/tensors.html))
Tensors are the fundamental building block of PyTorch.
Tensors are a generalization of matrices to higher dimensions.
Tensors are similar to NumPy arrays, with some important additions:
- Tensors can run on GPUs (graphics processing units) to accelerate computing.
- Tensors are the data structure that PyTorch deep learning models (neural networks) are built from.

Their job is to represent data in a numerical way.

For example, you could represent an image as a tensor with shape [3, 224, 224] which would mean [colour_channels, height, width], as in the image has 3 colour channels (red, green, blue), a height of 224 pixels and a width of 224 pixels.
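As a minimal sketch of the shape convention above, the snippet below builds a placeholder "image" tensor (all zeros, not a real image) and unpacks its dimensions. The variable names are illustrative only.

```python
import torch

# Placeholder "image": 3 colour channels, 224 pixels high, 224 pixels wide
image = torch.zeros(3, 224, 224)

# torch.Size behaves like a tuple, so it can be unpacked
colour_channels, height, width = image.shape
print(colour_channels, height, width)
print(image.ndim)  # number of dimensions
```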

## Creating tensors
The first thing we're going to create is a scalar.
A scalar is a single number and in tensor-speak it's a zero dimension tensor.

In [2]:
scalar = torch.tensor(7)
scalar

tensor(7)

In [3]:
scalar.ndim

0

In [4]:
scalar.item()

7

Okay, now let's see a vector.

A vector is a single dimension tensor but can contain many numbers.

As in, you could have a vector [3, 2] to describe [bedrooms, bathrooms] in your house. Or you could have [3, 2, 2] to describe [bedrooms, bathrooms, car_parks] in your house.

The important trend here is that a vector is flexible in what it can represent (the same with tensors).

In [5]:
vector = torch.tensor([7, 7])
vector

tensor([7, 7])

In [6]:
vector.ndim

1

In [7]:
vector.shape

torch.Size([2])

In [8]:
MATRIX = torch.tensor([[7, 8],
                       [9, 10]])
print(MATRIX)

tensor([[ 7,  8],
        [ 9, 10]])


In [9]:
MATRIX.ndim

2

In [10]:
MATRIX.shape

torch.Size([2, 2])

In [11]:
MATRIX.size()

torch.Size([2, 2])

In [12]:
TENSOR = torch.tensor([[[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]]])
TENSOR

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

In [13]:
TENSOR.ndim

3

In [14]:
TENSOR.shape

torch.Size([1, 3, 3])

In [15]:
TENSOR.size()

torch.Size([1, 3, 3])

<img src="Images/00-pytorch-different-tensor-dimensions.png">

## Random tensors
We've established tensors represent some form of data.

And machine learning models such as neural networks manipulate and seek patterns within tensors.

But when building machine learning models with PyTorch, it's rare you'll create tensors by hand (like what we've been doing).

Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

In essence:

Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...

As a data scientist, you can define how the machine learning model starts (initialization), looks at data (representation) and updates (optimization) its random numbers.
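The "start random, look at data, update" loop can be sketched in a few lines. This is only an illustration under assumptions: the toy data (`y = 2 * x`), the learning rate and the number of steps are made up, and it relies on PyTorch's automatic differentiation (`requires_grad`, `backward()`), which hasn't been covered yet.

```python
import torch

torch.manual_seed(0)

# Toy data following the pattern y = 2 * x (an assumption for illustration)
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

# Initialization: start with a random number
weight = torch.rand(1, requires_grad=True)
learning_rate = 0.05

for step in range(200):
    prediction = weight * x                    # look at data
    loss = ((prediction - y) ** 2).mean()      # how wrong is the current number?
    loss.backward()                            # compute the gradient of the loss
    with torch.no_grad():
        weight -= learning_rate * weight.grad  # update the number
    weight.grad.zero_()                        # reset the gradient for the next step

print(weight)  # should end up close to 2
```

The random starting value matters little here; the repeated updates pull `weight` towards the value that best describes the data.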

In [16]:
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype

(tensor([[0.2524, 0.3894, 0.2999, 0.3894],
         [0.8470, 0.0857, 0.1151, 0.4692],
         [0.1140, 0.8895, 0.4464, 0.5055]]),
 torch.float32)

In [17]:
random_image_size_tensor = torch.rand(size=(224, 224, 3))
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([224, 224, 3]), 3)

## Zeros and ones
Sometimes you'll just want to fill tensors with zeros or ones.

This happens a lot with masking (like masking some of the values in one tensor with zeros to let a model know not to learn them).
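A rough sketch of the masking idea: multiplying element-wise by a tensor of ones and zeros keeps some values and zeroes out the rest. The values and the choice of which column to mask are arbitrary.

```python
import torch

values = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])

# Start from a mask of ones (keep everything), then zero out the last column
mask = torch.ones(size=(2, 3))
mask[:, 2] = 0

masked_values = values * mask  # element-wise multiplication
print(masked_values)
```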

In [19]:
zeros = torch.zeros(size=(3, 4))
zeros, zeros.dtype

(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

In [20]:
ones = torch.ones(size=(3, 4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)

In [21]:
zero_to_ten = torch.arange(start=0, end=10, step=2)
zero_to_ten, zero_to_ten.dtype, zero_to_ten.ndim, zero_to_ten.shape

(tensor([0, 2, 4, 6, 8]), torch.int64, 1, torch.Size([5]))

In [22]:
ten_zeros = torch.zeros_like(input=zero_to_ten)
ten_zeros, ten_zeros.dtype, ten_zeros.ndim, ten_zeros.shape

(tensor([0, 0, 0, 0, 0]), torch.int64, 1, torch.Size([5]))

In [23]:
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which is torch.float32 or whatever datatype is passed
                               device=None, # defaults to None, which uses the default tensor type
                               requires_grad=False) # if True, operations performed on the tensor are recorded
float_32_tensor, float_32_tensor.dtype, float_32_tensor.ndim, float_32_tensor.shape

(tensor([3., 6., 9.]), torch.float32, 1, torch.Size([3]))

Aside from shape issues (tensor shapes don't match up), two of the other most common issues you'll come across in PyTorch are datatype and device issues.

For example, one of your tensors is torch.float32 and the other is torch.float16 (PyTorch often likes tensors to be the same format).

Or one of your tensors is on the CPU and the other is on the GPU (PyTorch likes calculations between tensors to be on the same device).
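Here is a small sketch of a datatype issue and one way to resolve it. It assumes a matrix multiplication between a float32 and an int64 tensor, which recent PyTorch versions reject. Many element-wise operations instead promote types silently, so not every mismatch raises an error.

```python
import torch

a = torch.ones(2, 3, dtype=torch.float32)
b = torch.ones(3, 2, dtype=torch.int64)

try:
    torch.mm(a, b)  # mixed dtypes in a matrix multiplication
except RuntimeError as error:
    print(f"Datatype mismatch: {error}")

# Convert one tensor so both share the same datatype
result = torch.mm(a, b.to(a.dtype))
print(result, result.dtype)
```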

In [24]:
float_16_tensor = float_32_tensor.type(torch.float16) # torch.half would also work
float_16_tensor, float_16_tensor.dtype, float_16_tensor.ndim, float_16_tensor.shape

(tensor([3., 6., 9.], dtype=torch.float16), torch.float16, 1, torch.Size([3]))

In [25]:
some_tensor = torch.rand(3, 4)

print(some_tensor)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is on: {some_tensor.device}")

tensor([[0.1026, 0.3963, 0.6784, 0.9381],
        [0.5713, 0.9725, 0.7737, 0.4507],
        [0.4507, 0.1358, 0.2803, 0.1061]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is on: cpu


Note: When you run into issues in PyTorch, they very often have to do with one of the three attributes above. So when the error messages show up, sing yourself a little song called "what, what, where":

"what shape are my tensors? what datatype are they and where are they stored? what shape, what datatype, where where where"

## Tensor operations

In [26]:
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [27]:
tensor * 10 # torch.multiply(tensor, 10)

tensor([10, 20, 30])

In [28]:
tensor - 10

tensor([-9, -8, -7])

In [29]:
tensor / 10

tensor([0.1000, 0.2000, 0.3000])

In [30]:
# elementwise operations
tensor * tensor

tensor([1, 4, 9])

In [37]:
tensor = torch.tensor([1, 2, 3]) # don't assign to the name `torch`, it would shadow the module
tensor.shape

torch.Size([3])

In [33]:
# The built-in torch.matmul() method is faster than writing the multiplication by hand with a Python loop.
tensor.matmul(tensor)

tensor(14)

In [34]:
tensor @ tensor

tensor(14)
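To see where the 14 comes from, the sketch below computes the same result by hand (1*1 + 2*2 + 3*3) with a Python loop and compares it with `torch.matmul()`. The loop is for illustration only; in practice the built-in method is the one to use.

```python
import torch

tensor = torch.tensor([1, 2, 3])

# Matrix multiplication by hand: multiply matching elements, then add them up
manual_result = 0
for i in range(len(tensor)):
    manual_result += tensor[i] * tensor[i]

builtin_result = torch.matmul(tensor, tensor)
print(manual_result, builtin_result)
```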

## One of the most common errors in deep learning (shape errors)
Because much of deep learning is multiplying and performing operations on matrices and matrices have a strict rule about what shapes and sizes can be combined, one of the most common errors you'll run into in deep learning is shape mismatches.

We can make matrix multiplication work between tensor_A and tensor_B by making their inner dimensions match.

One of the ways to do this is with a transpose (switch the dimensions of a given tensor).

You can perform transposes in PyTorch using either:

- `torch.transpose(input, dim0, dim1)`, where `input` is the tensor to transpose and `dim0` and `dim1` are the dimensions to swap.
- `tensor.T`, where `tensor` is the tensor to transpose.
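A minimal sketch of the problem and the fix, using two placeholder tensors of ones (the names `A` and `B` are illustrative). Multiplying shapes (3, 2) and (3, 2) fails because the inner dimensions (2 and 3) differ. Transposing the second tensor gives (3, 2) and (2, 3), which works.

```python
import torch

A = torch.ones(3, 2)
B = torch.ones(3, 2)

try:
    torch.matmul(A, B)  # inner dimensions 2 and 3 don't match
    shape_error = None
except RuntimeError as error:
    shape_error = error
    print(f"Shape mismatch: {error}")

# Transposing B makes the inner dimensions match: (3, 2) @ (2, 3) -> (3, 3)
fixed = torch.matmul(A, B.T)

# For a 2-D tensor, .T and torch.transpose(input, 0, 1) give the same result
same_transpose = torch.equal(B.T, torch.transpose(B, 0, 1))
print(fixed.shape, same_transpose)
```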

In [38]:
import torch

tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

In [39]:
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])


In [40]:
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}")
print(f"New shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.T.shape}")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape}")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output)
print(f"\nOutput Shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])
New shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([2, 3])
Multiplying: torch.Size([3, 2]) * torch.Size([2, 3])
Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output Shape: torch.Size([3, 3])


In [43]:
print(torch.mm(tensor_A, tensor_B.T))

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])


## Linear
Applies a linear transformation to the incoming data:
$$
y = xA^T + b
$$

In [47]:
torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6, bias=True, device=None, dtype=None)

x = tensor_A
output = linear(x)
print(f"Input shape: {x.shape}\n")
print(f"Output: {output}\n")
print(f"Output shape: {output.shape}\n")

Input shape: torch.Size([3, 2])

Output: tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])
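As a sanity check on the formula, the sketch below applies `y = xA^T + b` by hand using the layer's own `weight` (the matrix `A`) and `bias` (the vector `b`) and compares the result with the layer's output. The input values mirror `tensor_A` from above but are redefined here so the snippet runs on its own.

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)

x = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])

layer_output = linear(x)
manual_output = x @ linear.weight.T + linear.bias  # y = xA^T + b

print(linear.weight.shape)  # A is stored with shape [out_features, in_features]
print(torch.allclose(layer_output, manual_output))
```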



## Aggregation

In [48]:
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [49]:
print(f"Maximum: {x.max()}")
print(f"Minimum: {x.min()}")
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")

Maximum: 90
Minimum: 0
Mean: 45.0
Sum: 450


In [51]:
torch.max(x), torch.min(x), torch.mean(x.type(torch.float32)), torch.sum(x)

(tensor(90), tensor(0), tensor(45.), tensor(450))
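The comment "won't work without float datatype" can be seen directly. In the sketch below, calling `mean()` on an integer tensor raises an error, and converting to float32 first resolves it.

```python
import torch

x = torch.arange(0, 100, 10)  # dtype is torch.int64

try:
    x.mean()  # mean() expects a floating point (or complex) datatype
    mean_error = None
except RuntimeError as error:
    mean_error = error
    print(f"Mean failed: {error}")

mean_value = x.type(torch.float32).mean()
print(mean_value)
```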

In [52]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0


In [53]:
tensor = torch.arange(10., 100., 10.)
tensor.dtype

torch.float32

In [55]:
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

In [56]:
tensor_int8 = tensor.type(torch.int8)
tensor_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)

## Reshaping, stacking, squeezing and unsqueezing
| Method | One-line description |
| --- | --- |
| `torch.reshape(input, shape)` | Reshapes `input` to `shape` (if compatible), can also use `torch.Tensor.reshape()`. |
| `Tensor.view(shape)` | Returns a view of the original tensor in a different shape but shares the same data as the original tensor. |
| `torch.stack(tensors, dim=0)` | Concatenates a sequence of `tensors` along a new dimension (`dim`), all tensors must be the same size. |
| `torch.squeeze(input)` | Squeezes `input` to remove all the dimensions of size 1. |
| `torch.unsqueeze(input, dim)` | Returns `input` with a dimension of size 1 added at `dim`. |
| `torch.permute(input, dims)` | Returns a view of the original `input` with its dimensions permuted (rearranged) to `dims`. |

Why do any of these?

Because deep learning models (neural networks) are all about manipulating tensors in some way. And because of the rules of matrix multiplication, if you've got shape mismatches, you'll run into errors. These methods help you make sure the right elements of your tensors are mixing with the right elements of other tensors.

In [58]:
import torch
x = torch.arange(1., 8.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

In [59]:
x_reshaped = x.reshape(1, 7)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [60]:
# Change view (keeps same data as original but changes view)
z = x.view(1, 7)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

Both `Tensor.view()` and `torch.reshape()` change the shape of a tensor, but they differ in one important way. `view()` always returns a view of the original tensor: the result shares the same underlying data, so a change in one is reflected in the other. In exchange, it requires the new shape to be compatible with the tensor's existing memory layout (in practice, this usually means the tensor must be contiguous), and it raises an error otherwise, for example when you try to flatten a transposed tensor without calling `.contiguous()` first.

In contrast, `torch.reshape()`, introduced in version 0.4, works on both contiguous and non-contiguous tensors. It returns a view when possible and a copy otherwise, depending on the memory layout, so you cannot rely on `reshape()` to share data with the original tensor. As per the PyTorch developers, if you need a copy, use `clone()`, and if you need shared storage, use `view()`. In summary, use `reshape()` for general reshaping and `view()` when you explicitly require shared data and the tensor is contiguous.
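The difference can be sketched with a transposed (non-contiguous) tensor. The names below are illustrative: `view()` fails on the transposed tensor, `reshape()` succeeds by copying, and `view()` on the contiguous original shares data.

```python
import torch

base = torch.arange(6).reshape(2, 3)
transposed = base.T  # a non-contiguous view of base

try:
    transposed.view(6)  # layout is incompatible with a flat view
    view_error = None
except RuntimeError as error:
    view_error = error
    print(f"view() failed: {error}")

flat = transposed.reshape(6)  # works, but has to copy the data here
flat[0] = 100                 # modify the reshaped tensor...
base_after_copy_edit = base[0, 0].item()
print(base_after_copy_edit)   # ...and base is unaffected, since flat is a copy

shared = base.view(6)         # base is contiguous, so view() works
shared[0] = 100
print(base[0, 0])             # now base changes as well
```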

In [61]:
# Changing z changes x
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))

In [64]:
x_stacked = torch.stack([x, x ,x ,x], dim=0) # try changing dim to dim=1 and see what happens
x_stacked, x_stacked.shape

(tensor([[5., 2., 3., 4., 5., 6., 7.],
         [5., 2., 3., 4., 5., 6., 7.],
         [5., 2., 3., 4., 5., 6., 7.],
         [5., 2., 3., 4., 5., 6., 7.]]),
 torch.Size([4, 7]))
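For the "try changing dim" suggestion in the cell above, here is a short sketch comparing both options. With `dim=0` the copies of `x` become rows; with `dim=1` they become columns.

```python
import torch

x = torch.arange(1., 8.)

stacked_dim0 = torch.stack([x, x, x, x], dim=0)  # each x becomes a row
stacked_dim1 = torch.stack([x, x, x, x], dim=1)  # each x becomes a column

print(stacked_dim0.shape, stacked_dim1.shape)
```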

In [65]:
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")

x_squeezed = x_reshaped.squeeze()
print(f"New tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
Previous shape: torch.Size([1, 7])
New tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
New shape: torch.Size([7])


In [66]:
print(f"Previous tensor: {x_squeezed}")
print(f"Previous shape: {x_squeezed.shape}")

# Add an extra dimension with unsqueeze
x_unsqueezed = x_squeezed.unsqueeze(dim=0)
print(f"\nNew tensor: {x_unsqueezed}")
print(f"New shape: {x_unsqueezed.shape}")

Previous tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
Previous shape: torch.Size([7])

New tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
New shape: torch.Size([1, 7])


In [67]:
x_original = torch.rand(size=(224, 224, 3))

# Permute the original tensor to rearrange the axis order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0

print(f"Original shape: {x_original.shape}")
print(f"Permuted shape: {x_permuted.shape}")

Original shape: torch.Size([224, 224, 3])
Permuted shape: torch.Size([3, 224, 224])


Because permuting returns a view (shares the same data as the original), the values in the permuted tensor will be the same as the original tensor and if you change the values in the view, it will change the values of the original.
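A quick way to check this is sketched below, using a tensor of zeros so the change is easy to spot.

```python
import torch

x_original = torch.zeros(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# Change a single value through the permuted view
x_permuted[0, 0, 0] = 1.0

print(x_original[0, 0, 0])  # the original tensor sees the change

# Both tensors point at the same underlying memory
same_storage = x_original.data_ptr() == x_permuted.data_ptr()
print(same_storage)
```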

## Indexing (selecting data from tensors)

In [68]:
# Create a tensor
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [69]:
# Let's index bracket by bracket
print(f"First square bracket:\n{x[0]}")
print(f"Second square bracket: {x[0][0]}")
print(f"Third square bracket: {x[0][0][0]}")

First square bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second square bracket: tensor([1, 2, 3])
Third square bracket: 1


In [73]:
# Get all values of 0th dimension and the 0 index of 1st dimension
print(x[:, 0])

tensor([[1, 2, 3]])


In [74]:
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
print(x[:, :, 1])

tensor([[2, 5, 8]])


In [75]:
# Get all values of the 0 dimension but only the 1 index value of the 1st and 2nd dimension
x[:, 1, 1]

tensor([5])

In [76]:
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension
x[0, 0, :] # same as x[0][0]

tensor([1, 2, 3])

## PyTorch tensors & NumPy

In [80]:
import torch
import numpy as np

array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

By default, NumPy creates floating-point arrays with the datatype float64, and if you convert one to a PyTorch tensor, it'll keep the same datatype (as above).

However, many PyTorch calculations default to using float32.

So if you want to convert your NumPy array (float64) -> PyTorch tensor (float64) -> PyTorch tensor (float32), you can use `tensor = torch.from_numpy(array).type(torch.float32)`.
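The conversion chain described above, as a short runnable sketch (the variable names are illustrative):

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)  # NumPy float64 array

tensor_float64 = torch.from_numpy(array)                      # keeps float64
tensor_float32 = torch.from_numpy(array).type(torch.float32)  # converted to float32

print(array.dtype, tensor_float64.dtype, tensor_float32.dtype)
```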

In [81]:
array = array + 1  # tensor unchanged, but if we use array += 1 tensor will also change.
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))
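The comment in the cell above hints at an important detail: `torch.from_numpy()` shares memory with the NumPy array. The sketch below contrasts an in-place update (`array += 1`), which the tensor sees, with a reassignment (`array = array + 1`), which creates a new array and leaves the tensor alone.

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)  # shares memory with array

array += 1         # in-place: modifies the shared memory
print(tensor)      # the tensor reflects the change

array = array + 1  # creates a new array; the name no longer refers to the shared memory
print(tensor)      # the tensor stays as it was after the in-place update
print(array)
```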

In [82]:
tensor = torch.ones(7)
numpy_tensor = tensor.numpy()
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

In [83]:
# Change the tensor, keep the array the same
tensor = tensor + 1
tensor, numpy_tensor

(tensor([2., 2., 2., 2., 2., 2., 2.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))


## Reproducibility (trying to take the random out of random)

As you learn more about neural networks and machine learning, you'll start to discover how much randomness plays a part.

Well, pseudorandomness that is. After all, a computer is designed to be fundamentally deterministic (each step is predictable), so the randomness it creates is simulated randomness (though there is debate on this too, but since I'm not a computer scientist, I'll let you find out more yourself).

How does this relate to neural networks and deep learning then?

We've discussed neural networks start with random numbers to describe patterns in data (these numbers are poor descriptions) and try to improve those random numbers using tensor operations (and a few other things we haven't discussed yet) to better describe patterns in data.

In short:

start with random numbers -> tensor operations -> try to make better (again and again and again)

Although randomness is nice and powerful, sometimes you'd like there to be a little less randomness.

Why?

So you can perform repeatable experiments.

For example, you create an algorithm capable of achieving X performance.

And then your friend tries it out to verify you're not crazy.

How could they do such a thing?

That's where reproducibility comes in.

In other words, can you get the same (or very similar) results on your computer running the same code as I get on mine?

In [84]:
import torch

random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")
print(f"Are they equal? {random_tensor_A == random_tensor_B}")

Tensor A:
tensor([[0.8016, 0.3649, 0.6286, 0.9663],
        [0.7687, 0.4566, 0.5745, 0.9200],
        [0.3230, 0.8613, 0.0919, 0.3102]])

Tensor B:
tensor([[0.9536, 0.6002, 0.0351, 0.6826],
        [0.3743, 0.5220, 0.1336, 0.9666],
        [0.9754, 0.8474, 0.8988, 0.1105]])

Are they equal? tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])


In [88]:
import torch

RANDOM_SEED = 42 # try changing this to different values and see what happens to the numbers below
torch.manual_seed(RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

# Have to reset the seed every time a new rand() is called
# Without this, tensor_D would be different to tensor_C
torch.manual_seed(RANDOM_SEED) # try commenting this line out and seeing what happens
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Are they equal? {random_tensor_C == random_tensor_D}")

Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Are they equal? tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
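To see what happens without the second `torch.manual_seed()` call, the sketch below seeds once and draws twice, then reseeds and draws again. It uses `torch.equal()`, which returns a single `True`/`False` rather than an element-wise comparison.

```python
import torch

torch.manual_seed(42)
first = torch.rand(3, 4)
second = torch.rand(3, 4)  # continues the same random stream, so values differ

torch.manual_seed(42)      # reset the stream back to its start
third = torch.rand(3, 4)

print(torch.equal(first, second))
print(torch.equal(first, third))
```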


Key Point: Exact reproducibility across different PyTorch versions, commits, or platforms (CPU vs GPU) is not guaranteed, even with the same seed.

However, you can reduce nondeterminism on a specific platform, device, and PyTorch version by doing two things:

- Control sources of randomness (seeds).
- Configure PyTorch to use deterministic algorithms, which ensures operations return the same result for the same input.

⚠️ Note: Deterministic operations can be slower but are helpful during development (e.g., debugging, testing).
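One possible way to combine the two steps is sketched below. The helper name `seed_everything` is hypothetical (it is not part of PyTorch), and the set of seeds to control depends on which libraries your code uses. With deterministic algorithms enabled, operations that have no deterministic implementation may raise an error, and some CUDA operations may need extra configuration.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Hypothetical helper: seed common sources of randomness and request determinism."""
    random.seed(seed)                 # Python's built-in random module
    np.random.seed(seed)              # NumPy's global random state
    torch.manual_seed(seed)           # PyTorch random number generators
    torch.cuda.manual_seed_all(seed)  # all GPUs (ignored if CUDA is unavailable)
    torch.backends.cudnn.benchmark = False    # avoid nondeterministic algorithm selection
    torch.use_deterministic_algorithms(True)  # prefer deterministic implementations


seed_everything(42)
run_1 = torch.rand(3)

seed_everything(42)
run_2 = torch.rand(3)

print(torch.equal(run_1, run_2))
```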



In [89]:
!nvidia-smi

Wed Apr 23 14:30:36 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 572.70                 Driver Version: 572.70         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 3050 ...  WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   51C    P8              3W /   60W |     290MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [90]:
# Check for GPU
import torch
torch.cuda.is_available()

True

In [91]:
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [92]:
# Count number of devices
torch.cuda.device_count()

1

In [93]:
# Create tensor (default on CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)

# Move tensor to GPU (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3]) cpu


tensor([1, 2, 3], device='cuda:0')

In [94]:
# If tensor is on GPU, can't transform it to NumPy (this will error)
tensor_on_gpu.numpy()

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

In [95]:
# Instead, copy the tensor back to cpu
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu

array([1, 2, 3])
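A device-agnostic variant of the same round trip is sketched below. It assumes nothing about the machine: if no GPU is available, `device` falls back to `"cpu"`, and calling `.cpu()` on a tensor that is already on the CPU simply returns it.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Create the tensor directly on the chosen device
tensor_on_device = torch.tensor([1, 2, 3], device=device)

# .cpu() makes the conversion to NumPy safe on either device
array = tensor_on_device.cpu().numpy()
print(tensor_on_device.device, array)
```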

In [96]:
tensor_on_gpu

tensor([1, 2, 3], device='cuda:0')

## Exercise

In [98]:
tensor_A = torch.randn(7, 7)
tensor_A

tensor([[-1.1441,  0.3383,  1.6992,  0.0109, -0.3387, -1.3407, -0.5854],
        [ 0.5362,  1.0563, -1.4692,  1.4332,  0.7440, -0.4816, -1.0495],
        [ 0.6039, -1.7223, -1.3543, -0.4976,  0.4747, -2.5095,  0.4880],
        [ 0.7846,  0.0286,  0.6408, -1.3527,  0.2191,  0.5526, -0.1853],
        [ 0.7528,  0.4048,  0.1785,  0.2649,  0.5886, -0.5797, -0.1691],
        [ 1.9312,  1.0119, -1.4364, -1.1299, -0.1360,  1.6354,  0.6793],
        [ 0.4405,  1.1415,  0.0186, -1.8058,  0.9254, -0.3753,  1.0331]])

In [99]:
tensor_B = torch.randn(1, 7)
tensor_B

tensor([[ 0.8448, -0.1627,  1.3187,  0.5707,  1.2832,  0.0538, -0.3380]])

In [103]:
print(tensor_A.matmul(tensor_B.T))

tensor([[ 9.1637e-01],
        [ 4.4508e-01],
        [-9.7049e-01],
        [ 1.1046e+00],
        [ 1.7379e+00],
        [-1.3885e+00],
        [-1.5606e-03]])


In [105]:
RANDOM_SEED = 0
torch.manual_seed(RANDOM_SEED)
tensor_C = torch.randn(7, 7)

torch.manual_seed(RANDOM_SEED)
tensor_D = torch.randn(1, 7)

print(tensor_C.matmul(tensor_D.T))

tensor([[-3.1132],
        [-0.4052],
        [ 2.2938],
        [-4.9556],
        [-0.9954],
        [ 1.9452],
        [ 2.2410]])


In [106]:
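RANDOM_SEED = 0
# Note: torch.cuda.manual_seed() seeds the GPU generator only. The tensors below are
# generated on the CPU and then moved, so this seed does not control their values.
# To draw on the GPU directly, pass device=device to torch.randn().
torch.cuda.manual_seed(RANDOM_SEED)
tensor_E = torch.randn(7, 7).to(device)

torch.cuda.manual_seed(RANDOM_SEED)
tensor_F = torch.randn(1, 7).to(device)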

print(tensor_E.matmul(tensor_F.T))

tensor([[-0.5522],
        [ 1.1752],
        [ 0.5535],
        [ 0.4791],
        [-0.0275],
        [ 1.3365],
        [-0.8435]], device='cuda:0')


In [108]:
RANDOM_SEED = 1234
torch.manual_seed(RANDOM_SEED)
tensor_G = torch.randn(2, 3).to(device)

torch.manual_seed(RANDOM_SEED)
tensor_H = torch.randn(2, 3).to(device)

output = tensor_G.matmul(tensor_H.T)
print(output)

tensor([[ 1.1872, -0.7458],
        [-0.7458,  0.6755]], device='cuda:0')


In [109]:
print(f"Min: {output.min()}")
print(f"Max: {output.max()}")

Min: -0.7457990646362305
Max: 1.1872471570968628


In [110]:
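tensor_I = torch.randn(1, 1, 1, 10)

# Note: this seed is set after tensor_I was created, so it does not affect tensor_I.
# Move it above torch.randn() to make tensor_I reproducible.
torch.manual_seed(7)
tensor_J = tensor_I.squeeze()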

print(f"Tensor I: {tensor_I}")
print(f" Tensor I shape: {tensor_I.shape}")
print(f"Tensor J: {tensor_J}")
print(f" Tensor J shape: {tensor_J.shape}")

Tensor I: tensor([[[[ 0.2310,  0.6931, -0.2669,  2.1785,  0.1021, -0.2590, -0.1549,
           -1.3706, -0.1319,  0.8848]]]])
 Tensor I shape: torch.Size([1, 1, 1, 10])
Tensor J: tensor([ 0.2310,  0.6931, -0.2669,  2.1785,  0.1021, -0.2590, -0.1549, -1.3706,
        -0.1319,  0.8848])
 Tensor J shape: torch.Size([10])
