# Introduction to PyTorch

## What is PyTorch?
[PyTorch](https://pytorch.org/) is an **open-source deep learning framework** that provides a flexible and efficient platform for building and training machine learning and deep learning models. It emphasizes **dynamic computation graphs** (define-by-run), making it intuitive and easy to debug compared to older static-graph frameworks.

---

## Who Developed PyTorch?
PyTorch was originally developed by **Meta AI (formerly Facebook AI Research, FAIR)** and is maintained together with the open-source community. Since 2022 the project has been governed by the PyTorch Foundation under the Linux Foundation. PyTorch was first released in 2016.

---

## What Can PyTorch Be Used For?
PyTorch is widely used in both **research** and **production**, including:
- **Computer Vision** (image classification, object detection, segmentation)
- **Natural Language Processing (NLP)** (transformers, text classification, translation, chatbots)
- **Generative Models** (GANs, diffusion models)
- **Reinforcement Learning**
- **Time Series Forecasting**
- Deployment in **mobile, cloud, and production environments**

---

## Other Deep Learning Frameworks
Besides PyTorch, several other popular frameworks exist:
- **TensorFlow** (by Google)  
- **Keras** (high-level API that traditionally ran on a TensorFlow backend; Keras 3 also supports JAX and PyTorch backends)  
- **MXNet** (Apache)  
- **JAX** (by Google, optimized for research)   

---

## Installation Guide

Go to https://pytorch.org/ and find the "Install PyTorch" section. Based on the options you select for your system, the website shows the command to install PyTorch.

<img src="https://learn.microsoft.com/en-us/windows/ai/images/tutorials/pytorch/anaconda-setup.png" alt="Alt text" width="1000" height="400">

---

## Advantages of PyTorch

- **Dynamic Computation Graphs:** A dynamic computation graph is built on the fly as your code runs (define-by-run), so you can change the model's behaviour at runtime without recompiling the graph.

- **Automatic Gradient (AutoGrad) Computation**: Autograd is powered by the dynamic computation graph — it records operations as they happen, then replays them backward to compute gradients.

- **Pythonic & Intuitive:** Feels natural to Python developers; integrates well with NumPy and other libraries.

- **Strong Community Support:** Huge open-source ecosystem, lots of tutorials, and rapid research adoption.

- **Production Ready:** PyTorch has TorchScript, ONNX, and TorchServe for deployment.

- **Integration with Hugging Face & Ecosystem:** Many cutting-edge models (e.g., transformers, diffusion models) are built on PyTorch.

- **GPU Acceleration Made Easy:** If the system supports it, PyTorch can run on NVIDIA GPUs through CUDA. CUDA stands for `Compute Unified Device Architecture`, a parallel computing platform and API model created by NVIDIA. A simple `.to("cuda")` or `.cuda()` call moves models and data to the GPU.

- **Widely Adopted in Academia & Industry:** A large share of state-of-the-art (SOTA) research papers use PyTorch. Many of the world's largest technology companies, such as Meta (Facebook), Tesla, and Microsoft, as well as artificial intelligence research companies such as [OpenAI](https://openai.com/index/openai-pytorch/), use PyTorch to power research and bring machine learning to their products.

In short: PyTorch is the go-to deep learning framework today because of its balance of flexibility, ease of use, and performance.
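
To make the define-by-run and autograd points above concrete, here is a minimal sketch. The function name `forward` and the input values are illustrative assumptions, not part of any PyTorch API; the point is only that plain Python control flow decides which operations get recorded, and `backward()` then differentiates whichever path actually ran.

```python
import torch

def forward(x: torch.Tensor) -> torch.Tensor:
    # Ordinary Python control flow picks the operations that get recorded.
    if x.sum() > 0:
        return (x ** 2).sum()
    return (x * 3).sum()

# Positive input: the x**2 branch runs, so the gradient is 2 * x
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = forward(x)
y.backward()
print(y, x.grad)

# Negative input: the x*3 branch runs, so the gradient is 3 everywhere
x_neg = torch.tensor([-1.0, -2.0], requires_grad=True)
y_neg = forward(x_neg)
y_neg.backward()
print(y_neg, x_neg.grad)
```

The same function produces two different graphs depending on the data, with no separate compilation step.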


## PyTorch vs NumPy

- PyTorch offers operations very similar to NumPy's (broadcasting, indexing, etc.), which makes PyTorch easy to learn if you have worked with NumPy before.

- However, PyTorch also supports GPU computation.

- PyTorch also supports automatic differentiation: it keeps track of the computation graph and of the operations that created each tensor.
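
As a rough side-by-side sketch of these points (the variable names are only illustrative), the same broadcasting expression works in both libraries, while gradient tracking is something only the PyTorch version offers:

```python
import numpy as np
import torch

# Broadcasting a row across a matrix looks the same in both libraries
np_matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
np_row = np.array([10.0, 20.0])
np_result = np_matrix + np_row

pt_matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
pt_row = torch.tensor([10.0, 20.0])
pt_result = pt_matrix + pt_row
print(np_result)
print(pt_result)

# Only PyTorch can differentiate through the computation
w = torch.tensor(2.0, requires_grad=True)
loss = (pt_matrix * w).sum()
loss.backward()
print(w.grad)  # d(loss)/dw = sum of pt_matrix entries
```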

## Importing PyTorch 

You import PyTorch with `import torch`, not `import pytorch`.

In [1]:
import torch
torch.__version__

'2.8.0+cu126'

## Introduction to Tensors

Tensors are the fundamental building block of machine learning.

Their job is to represent data in a numerical way.

For example, you could represent an image as a tensor with shape [3, 224, 224] which would mean `[colour_channels, height, width]`, as in the image has 3 colour channels (red, green, blue), a height of 224 pixels and a width of 224 pixels.

![](https://camo.githubusercontent.com/1ed28f5c8dc4e8d8390c00f2ab9407b3dddeab375d3657b70310f6699bdb6890/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6d7264626f75726b652f7079746f7263682d646565702d6c6561726e696e672f6d61696e2f696d616765732f30302d74656e736f722d73686170652d6578616d706c652d6f662d696d6167652e706e67)

The basic building block in PyTorch is called `torch.Tensor`.

Think of `torch.Tensor` as `numpy.ndarray` on steroids. Its operations are very similar to those of `numpy.ndarray`; on top of that, it supports dynamic computation graphs with AutoGrad and GPU computation.

Mathematically, tensors are a generalization of vectors and matrices. Computationally, tensors are data containers, like arrays.

Tensors are classified by rank, that is, by their number of dimensions.

- A `rank-0` tensor is a scalar.
- A `rank-1` tensor is a vector.
- A `rank-2` tensor is a matrix, and so on.

In [2]:
# scalar
scalar = torch.tensor(7)
scalar

tensor(7)

That means although scalar is a single number, it's of type torch.Tensor.

We can check the dimensions (or rank) of a tensor using the `ndim` attribute.

In [3]:
scalar.ndim

0

Now let's look at a vector.

A vector is a one-dimensional tensor, but it can contain many numbers.

In [4]:
# Vector
vector = torch.tensor([7, 8])
vector

tensor([7, 8])

In [5]:
vector.ndim

1

Another important concept for tensors is their `shape` attribute. The shape tells you how the elements inside them are arranged.

In [6]:
vector.shape

torch.Size([2])

The above returns `torch.Size([2])` which means our vector has a shape of `[2]`. This is because of the two elements we placed inside the square brackets `([7, 8])`.

In [7]:
# Matrix
MATRIX = torch.tensor([[7, 8], 
                       [9, 10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

In [8]:
MATRIX.ndim

2

In [9]:
MATRIX.shape

torch.Size([2, 2])

In [10]:
# Tensor (general n-dimensional array)
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]],
                        [[4, 5, 6],
                        [6, 9, 12],
                        [8, 16, 20]]])
TENSOR

tensor([[[ 1,  2,  3],
         [ 3,  6,  9],
         [ 2,  4,  5]],

        [[ 4,  5,  6],
         [ 6,  9, 12],
         [ 8, 16, 20]]])

In [11]:
TENSOR.ndim

3

In [12]:
TENSOR.shape

torch.Size([2, 3, 3])

The dimensions go outer to inner.

That means there are 2 blocks (the outermost dimension), each of which is a 3 by 3 matrix.
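
As a quick check of how the shape `[2, 3, 3]` reads from the outside in, the sketch below rebuilds the same tensor and peels off one dimension at a time:

```python
import torch

TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]],
                       [[4, 5, 6],
                        [6, 9, 12],
                        [8, 16, 20]]])

print(len(TENSOR))        # size of the outermost dimension
print(TENSOR[0].shape)    # each outer entry is a 3 by 3 matrix
print(TENSOR[1][2])       # last row of the second matrix
```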

![](https://camo.githubusercontent.com/1b7db4d1eac472685f864c4c5e3e457c20a8856c52d006b5047126eda0f97540/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6d7264626f75726b652f7079746f7263682d646565702d6c6561726e696e672f6d61696e2f696d616765732f30302d7079746f7263682d646966666572656e742d74656e736f722d64696d656e73696f6e732e706e67)

|  Name  |                                          What is it?                                          |                                  Number of dimensions                                 |
|:------:|:---------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------:|
| scalar | a single number                                                                               | 0                                                                                     |
| vector | a collection of numbers arranged like a 1D array                                              | 1                                                                                     |
| matrix | a 2-dimensional array of numbers                                                              | 2                                                                                     |
| tensor | an n-dimensional array of numbers                                                             | can be any number, a 0-dimension tensor is a scalar, a 1-dimension tensor is a vector |

### Random Tensors

In [13]:
# Create a random tensor of size (3, 4)
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype

(tensor([[0.2056, 0.3297, 0.2002, 0.6002],
         [0.9499, 0.8281, 0.7788, 0.9420],
         [0.1265, 0.2694, 0.5041, 0.9766]]),
 torch.float32)

In [14]:
random_tensor.ndim

2
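
Random tensors differ on every run, which can make results hard to compare. A minimal sketch, assuming you want repeatable values, is to set the seed with `torch.manual_seed()` before each draw (the seed value 42 is arbitrary):

```python
import torch

torch.manual_seed(42)
first = torch.rand(size=(3, 4))

torch.manual_seed(42)              # reset the seed before drawing again
second = torch.rand(size=(3, 4))

print(torch.equal(first, second))  # same seed, same values
# torch.rand samples uniformly from the interval [0, 1)
print(first.min().item() >= 0, first.max().item() < 1)
```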

### Zeros and ones

In [15]:
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 4))
zeros, zeros.dtype

(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

In [16]:
# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)

### Creating a range and tensors like

Sometimes you might want a range of numbers, such as 1 to 10 or 0 to 100.

You can use `torch.arange(start, end, step)` to do so.

Where:

- start = start of range (e.g. 0)
- end = end of range, exclusive (e.g. 10 stops at 9)
- step = difference between consecutive values (e.g. 1)

In [17]:
# Create a range of values 0 to 9 (end=10 is exclusive)
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
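
Since the `end` value itself is not included in the result, here is a short sketch of how to include it and how `step` changes the spacing (the variable names are illustrative):

```python
import torch

# To include 10, set end one step past it
zero_to_ten_inclusive = torch.arange(start=0, end=11, step=1)
print(zero_to_ten_inclusive)

# A larger step skips values
evens = torch.arange(start=0, end=10, step=2)
print(evens)
```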

Sometimes you might want one tensor of a certain type with the same shape as another tensor.

For example, a tensor of all zeros with the same shape as a previous tensor.

To do so you can use `torch.zeros_like(input)` or `torch.ones_like(input)` which return a tensor filled with zeros or ones in the same shape as the input respectively.

In [18]:
# Can also create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten) # will have same shape
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [19]:
# Can also create a tensor of ones similar to another tensor
ten_ones = torch.ones_like(input=zero_to_ten) # will have same shape
ten_ones

tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
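
The `*_like` functions copy the datatype of the input as well as its shape, which is why the outputs above are integers. The sketch below, with illustrative names, shows this with a float input and shows the optional `dtype` argument overriding it:

```python
import torch

float_source = torch.rand(size=(2, 3))          # float32 input

zeros_float = torch.zeros_like(float_source)    # inherits shape and float32
ones_int = torch.ones_like(float_source, dtype=torch.int64)  # same shape, dtype overridden

print(zeros_float.shape, zeros_float.dtype)
print(ones_int.shape, ones_int.dtype)
```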

### Tensor datatypes

There are many different [tensor datatypes available in PyTorch](https://docs.pytorch.org/docs/stable/tensors.html#data-types).

Some are specific for CPU and some are better for GPU.

Generally if you see `torch.cuda` anywhere, the tensor is being used for GPU (since Nvidia GPUs use a computing toolkit called CUDA).

The most common type (and generally the default) is `torch.float32` or `torch.float`.

This is referred to as "32-bit floating point".

But there's also 16-bit floating point (`torch.float16` or `torch.half`) and 64-bit floating point (`torch.float64` or `torch.double`).

There's also 8-bit, 16-bit, 32-bit and 64-bit integers.
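
A small sketch to confirm the aliases and defaults mentioned above. It assumes the global default dtype has not been changed (for example with `torch.set_default_dtype`):

```python
import torch

# Short aliases refer to the very same dtype objects
print(torch.float is torch.float32)
print(torch.half is torch.float16)
print(torch.double is torch.float64)

# Without an explicit dtype, it is inferred from the data
int_tensor = torch.tensor([1, 2, 3])
float_tensor = torch.tensor([1.0, 2.0, 3.0])
print(int_tensor.dtype, float_tensor.dtype)

# Bytes used per element for each floating point width
sizes = {}
for dtype in (torch.float16, torch.float32, torch.float64):
    sizes[dtype] = torch.tensor([1.0], dtype=dtype).element_size()
print(sizes)
```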

In [20]:
# Default datatype for tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the dtype from the data (torch.float32 for Python floats)
                               device=None, # defaults to None, which uses the default 'cpu'
                               requires_grad=False) # if True, operations performed on the tensor are recorded 

float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

In [21]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=torch.float16) # torch.half would also work

float_16_tensor.dtype, float_16_tensor.device

(torch.float16, device(type='cpu'))

In [22]:
float_32_tensor_gpu = torch.tensor([5.0, 8.0, 11.0],
                               dtype=None, 
                               device='cuda:0', # device is 'cuda'
                               requires_grad=False)

float_32_tensor_gpu.shape, float_32_tensor_gpu.dtype, float_32_tensor_gpu.device

(torch.Size([3]), torch.float32, device(type='cuda', index=0))

We can also use `.to('cuda')` to convert a CPU native tensor to a GPU one.

In [23]:
another_float_32_tensor_gpu = float_32_tensor.to('cuda')

another_float_32_tensor_gpu.shape, another_float_32_tensor_gpu.dtype, another_float_32_tensor_gpu.device

(torch.Size([3]), torch.float32, device(type='cuda', index=0))

### Tensor operations

In [24]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [25]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])

PyTorch also has a bunch of built-in functions like `torch.mul()` (short for multiplication) or `torch.multiply()` and `torch.add()` to perform basic operations.

In [26]:
# Can also use torch functions
torch.mul(tensor, 10)

tensor([10, 20, 30])

One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication.

PyTorch implements matrix multiplication in the `torch.matmul()` function.

The main two rules for matrix multiplication to remember are:

1. The inner dimensions must match:
- `(3, 2) @ (3, 2)` won't work
- `(2, 3) @ (3, 2)` will work
- `(3, 2) @ (2, 3)` will work

2. The resulting matrix has the shape of the outer dimensions:
- `(2, 3) @ (3, 2)` -> (2, 2)
- `(3, 2) @ (2, 3)` -> (3, 3)


> **Note:** "@" in Python is the symbol for matrix multiplication.
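
The two rules can be checked directly with random matrices. This is only a sketch; the names `a`, `b` and `mismatch_failed` are illustrative:

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

print((a @ b).shape)   # (2, 3) @ (3, 2) -> (2, 2)
print((b @ a).shape)   # (3, 2) @ (2, 3) -> (3, 3)

# (3, 2) @ (3, 2): inner dimensions 2 and 3 differ, so this raises
try:
    b @ b
    mismatch_failed = False
except RuntimeError as err:
    mismatch_failed = True
    print(f"RuntimeError: {err}")
```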

In [27]:
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

a*b  # element wise multiplication [1*4, 2*5, 3*6]; Hadamard product

tensor([ 4, 10, 18])

In [28]:
torch.matmul(a, b) # dot product of two 1-D tensors: 1*4 + 2*5 + 3*6 = 32

tensor(32)

In [29]:
# Shapes need to line up: the inner dimensions must match
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11], 
                         [9, 12]], dtype=torch.float32)

torch.matmul(tensor_A, tensor_B) # (this will error)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

In [30]:
# The operation works when tensor_B is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output) 
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])


Neural networks are full of matrix multiplications and dot products.

$$ y = x \cdot W^T + b $$

The `torch.nn.Linear()` module (we'll see this in action later on), also known as a feed-forward layer or fully connected layer, implements a matrix multiplication between an input $x$ and a weights matrix $W$.


Where:

- $x$ is the input to the layer (deep learning is a stack of layers like `torch.nn.Linear()` and others on top of each other).
- $W$ is the weights matrix created by the layer. It starts out as random numbers that get adjusted as a neural network learns to better represent patterns in the data (notice the "T", that's because the weights matrix gets transposed).

- $b$ is the bias term used to slightly offset the weights and inputs.
- $y$ is the output (a manipulation of the input in the hopes to discover patterns in it).

Let's play around with a linear layer.

Try changing the values of `in_features` and `out_features` below and see what happens.

In [31]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = matches inner dimension of input 
                         out_features=6) # out_features = describes outer value 
x = tensor_A
output = linear(x)
print(f"Input shape: {x.shape}\n")
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([3, 2])

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])


In [32]:
linear.weight, linear.bias

(Parameter containing:
 tensor([[ 0.5406,  0.5869],
         [-0.1657,  0.6496],
         [-0.1549,  0.1427],
         [-0.3443,  0.4153],
         [ 0.6233, -0.5188],
         [ 0.6146,  0.1323]], requires_grad=True),
 Parameter containing:
 tensor([ 0.5224,  0.0958,  0.3410, -0.0998,  0.5451,  0.1045],
        requires_grad=True))
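
To connect the layer back to the formula, the sketch below recomputes the output by hand from `linear.weight` and `linear.bias` and compares it with what the layer returns. The input values mirror `tensor_A`; the name `manual` is illustrative.

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)
x = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])

# The weight is stored as (out_features, in_features), hence the transpose
print(linear.weight.shape)

# y = x @ W^T + b, computed manually
manual = x @ linear.weight.T + linear.bias
print(torch.allclose(linear(x), manual))
```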

### Aggregations

In [33]:
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [34]:
x.dtype

torch.int64

In [35]:
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450
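
The comment about the float datatype deserves a closer look: `mean()` is not defined for integer tensors. A minimal sketch of the failure and of a few equivalent ways around it (variable names are illustrative):

```python
import torch

x = torch.arange(0, 100, 10)   # dtype is torch.int64

# Calling mean() directly on an integer tensor raises an error
try:
    x.mean()
    int_mean_worked = True
except RuntimeError as err:
    int_mean_worked = False
    print(f"RuntimeError: {err}")

# Equivalent ways to get the mean as a float
mean_a = x.type(torch.float32).mean()
mean_b = x.float().mean()
mean_c = x.mean(dtype=torch.float32)
print(mean_a, mean_b, mean_c)
```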


You can also find the index at which the maximum or minimum occurs with `torch.argmax()` and `torch.argmin()` respectively.

This is helpful in case you just want the position of the highest (or lowest) value and not the actual value itself (we'll see this in a later section when using the softmax activation function).

In [36]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
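
For tensors with more than one dimension, `argmax()` without arguments returns an index into the flattened tensor, while passing `dim` returns one index per slice along that dimension. A small sketch with made-up scores:

```python
import torch

# Two rows of made-up class scores
scores = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1]])

flat_index = scores.argmax()     # index into the flattened tensor
per_row = scores.argmax(dim=1)   # best column for each row

print(flat_index)
print(per_row)
```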


### Changing the datatype

A common issue with deep learning operations is having your tensors in different datatypes.

If one tensor is in `torch.float64` and another is in `torch.float32`, you might run into some errors.

But there's a fix.

You can change the datatypes of tensors using `torch.Tensor.type(dtype=None)` where the dtype parameter is the datatype you'd like to use.

First we'll create a tensor and check its datatype (the default is `torch.float32`).

In [37]:
# Create a tensor and check its datatype
tensor = torch.arange(10., 100., 10.)
tensor.dtype

torch.float32

In [38]:
# Create a float16 tensor
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)

In [39]:
# Create an int8 tensor
tensor_int8 = tensor.type(torch.int8)
tensor_int8

tensor([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)
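
Besides `type()`, the `to()` method and shortcuts such as `half()` perform the same conversion. The sketch below also shows that many element-wise operations promote mixed float types automatically, so not every mismatch leads to an error (variable names are illustrative):

```python
import torch

tensor = torch.arange(10., 100., 10.)      # float32

via_type = tensor.type(torch.float16)
via_to = tensor.to(torch.float16)
via_half = tensor.half()
print(via_type.dtype, via_to.dtype, via_half.dtype)

# float32 * float64 is promoted to float64 for element-wise multiplication
mixed = tensor * tensor.to(torch.float64)
print(mixed.dtype)
```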

### Reshaping, stacking, squeezing and unsqueezing

Often you'll want to reshape or change the dimensions of your tensors without actually changing the values inside them.

To do so, some popular methods are:

|            Method           |                                             One-line description                                            |
|:---------------------------:|:-----------------------------------------------------------------------------------------------------------:|
| `torch.reshape(input, shape)` | Reshapes input to shape (if compatible), can also use torch.Tensor.reshape().                               |
| `Tensor.view(shape)`          | Returns a view of the original tensor in a different shape but shares the same data as the original tensor. |
| `torch.stack(tensors, dim=0)` | Concatenates a sequence of tensors along a new dimension (dim), all tensors must be same size.              |
| `torch.squeeze(input)`        | Squeezes input to remove all the dimensions of size 1.                                                      |
| `torch.unsqueeze(input, dim)` | Returns input with a dimension of size 1 added at dim.                                                      |
| `torch.permute(input, dims)`  | Returns a view of the original input with its dimensions permuted (rearranged) to dims.                     |

In [40]:
# Create a tensor
import torch
x = torch.arange(1., 8.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

In [41]:
# Add an extra dimension
x_reshaped = x.reshape(1, 7)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [42]:
# Change view (keeps same data as original but changes view)
# See more: https://stackoverflow.com/a/54507446/7900723
z = x.view(1, 7)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

Remember though, changing the view of a tensor with `torch.Tensor.view()` really only creates a new view of the same underlying data.

So changing the values in the view changes the original tensor too.

In [43]:
# Changing z changes x
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))
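
One way to see the sharing, and to avoid it when you need an independent copy, is sketched below. Comparing `data_ptr()` values and calling `clone()` first are just one possible approach; the name `independent` is illustrative.

```python
import torch

x = torch.arange(1., 8.)
z = x.view(1, 7)

# Both tensors point at the same underlying memory
print(z.data_ptr() == x.data_ptr())

# clone() copies the data first, so edits no longer reach x
independent = x.clone().view(1, 7)
independent[:, 0] = 5
print(independent, x)
```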

In [44]:
# Stack tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=0) # try changing dim to dim=1 and see what happens
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.]])
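
For the suggestion to try `dim=1`, here is a short sketch of what changes. With `dim=0` each input becomes a row; with `dim=1` each input becomes a column.

```python
import torch

x = torch.arange(1., 8.)

stacked_rows = torch.stack([x, x, x, x], dim=0)   # new dimension in front
stacked_cols = torch.stack([x, x, x, x], dim=1)   # new dimension after the existing one

print(stacked_rows.shape)
print(stacked_cols.shape)
```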

In [45]:
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")

# Remove extra dimension from x_reshaped
x_squeezed = x_reshaped.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
Previous shape: torch.Size([1, 7])

New tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
New shape: torch.Size([7])


In [46]:
print(f"Previous tensor: {x_squeezed}")
print(f"Previous shape: {x_squeezed.shape}")

# Add an extra dimension with unsqueeze
x_unsqueezed = x_squeezed.unsqueeze(dim=0)
print(f"\nNew tensor: {x_unsqueezed}")
print(f"New shape: {x_unsqueezed.shape}")

Previous tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
Previous shape: torch.Size([7])

New tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
New shape: torch.Size([1, 7])
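
Both methods also accept a specific dimension. The sketch below uses an illustrative tensor of shape `[1, 7, 1]` to show that `squeeze(dim=...)` only removes that dimension if its size is 1, and that `unsqueeze` can add the new dimension at either end:

```python
import torch

x = torch.zeros(1, 7, 1)
print(x.squeeze().shape)         # all size-1 dimensions removed
print(x.squeeze(dim=0).shape)    # only dimension 0 removed
print(x.squeeze(dim=1).shape)    # dimension 1 has size 7, so nothing changes

v = torch.arange(1., 8.)
print(v.unsqueeze(dim=0).shape)  # row vector
print(v.unsqueeze(dim=1).shape)  # column vector
```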


In [47]:
# Create tensor with specific shape
x_original = torch.rand(size=(224, 224, 3))

# Permute the original tensor to rearrange the axis order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])


> **Note:** Because permuting returns a view (shares the same data as the original), the values in the permuted tensor will be the same as the original tensor and if you change the values in the view, it will change the values of the original.
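
The note can be verified with a small sketch: writing through the permuted view shows up in the original at the correspondingly rearranged index. The index `(2, 5, 7)` and the value `123.0` are arbitrary choices for illustration.

```python
import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# Index (channel=2, row=5, col=7) in the view is (row=5, col=7, channel=2) in the original
x_permuted[2, 5, 7] = 123.0
print(x_original[5, 7, 2])

# The view reuses the original memory with different strides
print(x_permuted.is_contiguous())
```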

### Indexing (selecting data from tensors)

Sometimes you'll want to select specific data from tensors (for example, only the first column or second row).

To do so, you can use indexing.

If you've ever done indexing on Python lists or NumPy arrays, indexing in PyTorch with tensors is very similar.

In [48]:
# Create a tensor 
import torch
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

Indexing values goes outer dimension -> inner dimension (check out the square brackets).

In [49]:
# Let's index bracket by bracket
print(f"First square bracket:\n{x[0]}") 
print(f"Second square bracket: {x[0][0]}") 
print(f"Third square bracket: {x[0][0][0]}")

First square bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second square bracket: tensor([1, 2, 3])
Third square bracket: 1


In [50]:
# Get all values of 0th dimension and the 0 index of 1st dimension
x[:, 0]

tensor([[1, 2, 3]])

In [51]:
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
x[:, :, 1]

tensor([[2, 5, 8]])

In [52]:
# Get all values of the 0 dimension but only the 1 index value of the 1st and 2nd dimension
x[:, 1, 1]

tensor([5])

In [53]:
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension 
x[0, 0, :] # same as x[0][0]

tensor([1, 2, 3])
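
As a short practice sketch on the same tensor, here is how to pick out the single value 9 and the last column:

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

bottom_right = x[0, 2, 2]     # last row, last column
last_col_keepdim = x[:, :, 2] # keeps the outer dimension
last_col = x[0, :, 2]         # drops the outer dimension

print(bottom_right)
print(last_col_keepdim)
print(last_col)
```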

### PyTorch tensors & NumPy

Since NumPy is a popular Python numerical computing library, PyTorch has functionality to interact with it nicely.

The two main methods you'll want to use for NumPy to PyTorch (and back again) are:

- `torch.from_numpy(ndarray)` - NumPy array -> PyTorch tensor.
- `torch.Tensor.numpy()` - PyTorch tensor -> NumPy array.

In [54]:
# NumPy array to tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [55]:
# array + 1 creates a new array, so the tensor (which shares memory with the old array) is unchanged
array = array + 1
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [56]:
# Tensor to NumPy array
tensor = torch.ones(7) # create a tensor of ones with dtype=float32
numpy_tensor = tensor.numpy() # will be dtype=float32 unless changed
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

In [57]:
# tensor + 1 creates a new tensor, so the array (which shares memory with the old tensor) is unchanged
tensor = tensor + 1
tensor, numpy_tensor

(tensor([2., 2., 2., 2., 2., 2., 2.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
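
The examples above reassign the variable, which creates a new object. An in-place change behaves differently, because `torch.from_numpy()` shares memory with the source array. A minimal sketch of the difference, using `torch.tensor()` to make an independent copy:

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)     # shares memory with array

array += 1                           # in-place change is visible through the tensor
print(tensor)

independent = torch.tensor(array)    # torch.tensor() copies the data
array += 1                           # no longer affects the copy
print(tensor)
print(independent)
```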

### Running tensors on GPUs (and making faster computations)

Deep learning algorithms require a lot of numerical operations.

And by default these operations are often done on a CPU (central processing unit).

However, there's another common piece of hardware called a GPU (graphics processing unit), which is often much faster at performing the specific types of operations neural networks need (matrix multiplications) than CPUs.

Your computer might have one.

If so, you should look to use it whenever you can to train neural networks because chances are it'll speed up the training time dramatically.

> **Note:** When I reference "GPU" throughout this course, I'm referencing an NVIDIA GPU with CUDA enabled (CUDA [Compute Unified Device Architecture] is a computing platform and API that allows GPUs to be used for general-purpose computing and not just graphics).

To check if you've got access to an NVIDIA GPU, you can run `!nvidia-smi`, where the `!` (also called bang) means "run this on the command line".

In [58]:
!nvidia-smi

Sun Sep 14 12:48:11 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 577.02                 Driver Version: 577.02         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 4050 ...  WDDM  |   00000000:01:00.0  On |                  N/A |
| N/A   41C    P8              2W /   65W |    1050MiB /   6141MiB |     25%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [59]:
# Check for GPU
import torch
torch.cuda.is_available()

True

In [60]:
# Set device type
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

If the above outputs `"cuda"`, we can set all of our PyTorch code to use the available CUDA device (a GPU); if it outputs `"cpu"`, our PyTorch code will stick with the CPU.

> **Note:** In PyTorch, it's best practice to write device-agnostic code. This means code that'll run on CPU (always available) or GPU (if available).
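
A minimal sketch of what device-agnostic code can look like: pick the device once, then create or move everything with it. The tiny `torch.nn.Linear` model and the input values are placeholders for illustration.

```python
import torch

# Pick the device once, near the top of the script
device = "cuda" if torch.cuda.is_available() else "cpu"

# Move the model and create the data on that same device
model = torch.nn.Linear(in_features=2, out_features=1).to(device)
inputs = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]], device=device)

outputs = model(inputs)
print(outputs.shape, outputs.device)
```

The same script then runs unchanged on machines with or without a GPU.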

If you want to do faster computing you can use a GPU but if you want to do much faster computing, you can use multiple GPUs.

You can count the number of GPUs PyTorch has access to using `torch.cuda.device_count()`.

In [61]:
# Count number of devices
torch.cuda.device_count()

1

In [62]:
# Create tensor (default on CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)

# Move tensor to GPU (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu, tensor_on_gpu.device

tensor([1, 2, 3]) cpu


(tensor([1, 2, 3], device='cuda:0'), device(type='cuda', index=0))

Let's try using the `torch.Tensor.numpy()` method on our tensor_on_gpu.

In [63]:
# If tensor is on GPU, can't transform it to NumPy (this will error)
tensor_on_gpu.numpy()

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Instead, to get a tensor back to CPU and usable with NumPy we can use `Tensor.cpu()`.

This copies the tensor to CPU memory so it's usable with NumPy.

In [65]:
# Instead, copy the tensor back to cpu
tensor_back_on_cpu = tensor_on_gpu.cpu()
tensor_back_on_cpu, tensor_back_on_cpu.device

(tensor([1, 2, 3]), device(type='cpu'))
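
Putting this together, a common pattern (sketched here under the assumption that you want code that works on either device) is to chain `.cpu().numpy()`, adding `.detach()` first when the tensor tracks gradients, since `numpy()` cannot be called directly on a tensor with `requires_grad=True`:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Works whether the tensor lives on the GPU or is already on the CPU
tensor_on_device = torch.tensor([1, 2, 3], device=device)
array = tensor_on_device.cpu().numpy()
print(array)

# Tensors that track gradients need detach() before numpy()
tracked = torch.tensor([1.0, 2.0], requires_grad=True, device=device)
tracked_array = tracked.detach().cpu().numpy()
print(tracked_array)
```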