<a href="https://colab.research.google.com/github/parthnijh/intro-to-pytorch/blob/main/Introduction_to_Pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import torch
import pandas as pd
import matplotlib.pyplot as plt

# Intro to tensors

## Creating a tensor


In [3]:
# scalar

scalar=torch.tensor(7)
scalar

tensor(7)

In [4]:
scalar.item() ## get the tensor's value back as a Python int

7

In [5]:
vector=torch.tensor([7,7])
vector

tensor([7, 7])

In [6]:
vector.ndim

1

In [7]:
vector.shape

torch.Size([2])

In [8]:
## matrix
matrix =torch.tensor([[1,2],[3,4]])
matrix.ndim
matrix.shape

torch.Size([2, 2])

In [9]:
TENSOR=torch.tensor([[[1,2,3],[3,6,9],[10,5,4]]])

In [10]:
TENSOR.ndim
TENSOR.shape

torch.Size([1, 3, 3])

In [11]:
TENSOR[0][0]

tensor([1, 2, 3])

# Random Tensors

In [12]:
random_tensors=torch.rand(3,4)
random_tensors

tensor([[0.0959, 0.2881, 0.0152, 0.6888],
        [0.9312, 0.8907, 0.9553, 0.4083],
        [0.0405, 0.5006, 0.4327, 0.7738]])

In [13]:
# Create a random tensor with similar shape to an image tensor
random_tensors=torch.rand(224,224,3)
random_tensors

tensor([[[2.8139e-01, 5.1415e-01, 2.8356e-01],
         [3.6027e-01, 7.9067e-01, 9.1618e-01],
         [8.0335e-01, 4.9817e-01, 2.0782e-01],
         ...,
         [9.9728e-01, 6.7460e-01, 6.2561e-01],
         [3.5746e-01, 8.3760e-01, 2.9762e-01],
         [4.8188e-01, 8.7092e-01, 3.7772e-01]],

        [[1.5687e-01, 3.6145e-01, 2.2563e-01],
         [7.3280e-01, 4.2220e-01, 3.8050e-01],
         [1.1500e-03, 1.3647e-01, 9.8924e-01],
         ...,
         [7.8489e-01, 8.2361e-02, 5.1623e-01],
         [8.1505e-01, 9.0028e-01, 2.8622e-01],
         [4.8607e-01, 3.2484e-01, 4.8690e-01]],

        [[1.7543e-01, 3.1394e-01, 2.0730e-01],
         [9.0008e-01, 9.1673e-01, 6.8814e-01],
         [6.4043e-01, 7.2646e-01, 5.5427e-01],
         ...,
         [3.6660e-01, 8.5652e-01, 2.7333e-01],
         [1.9376e-01, 2.3107e-01, 2.8628e-04],
         [3.6331e-01, 6.3509e-01, 1.3918e-01]],

        ...,

        [[9.3933e-03, 2.2816e-01, 8.5552e-01],
         [6.0085e-01, 9.6411e-01, 9.7818e-01]

# Create tensors of zeros and ones

In [14]:
zeroes=torch.zeros(3,4)
zeroes

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [15]:
ones=torch.ones(3,4)
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

# Create a range tensor (and tensors like it)

In [16]:
t=torch.arange(1,11,2)
t

tensor([1, 3, 5, 7, 9])

In [17]:
torchzeros=torch.zeros_like(t)
torchzeros

tensor([0, 0, 0, 0, 0])

# Tensor datatypes

There are many different tensor datatypes available in PyTorch.

Some are specific to the CPU and some are better suited to the GPU.

Getting to know which one can take some time.

Generally if you see torch.cuda anywhere, the tensor is being used for GPU (since Nvidia GPUs use a computing toolkit called CUDA).

The most common type (and generally the default) is torch.float32 or torch.float.

This is referred to as "32-bit floating point".

But there's also 16-bit floating point (torch.float16 or torch.half) and 64-bit floating point (torch.float64 or torch.double).

And to confuse things even more there's also 8-bit, 16-bit, 32-bit and 64-bit integers.

Plus more!

Note: An integer is a whole number like 7, whereas a float has a decimal part, like 7.0.

The reason for all of these is to do with precision in computing.

Precision is the amount of detail used to describe a number.

The higher the precision value (8, 16, 32, 64 bits), the more detail, and hence data, used to express a number.

This matters in deep learning and numerical computing because you're performing so many operations: the more detail you have to calculate on, the more compute you have to use.

So lower-precision datatypes are generally faster to compute on but can sacrifice some performance on evaluation metrics like accuracy.
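To make the trade-off above concrete, here is a small sketch that compares the storage cost and the rounding error of the three floating-point dtypes. It assumes only standard PyTorch calls (`Tensor.element_size()` returns bytes per element); the value 0.1 is an arbitrary choice because it has no exact binary representation.

```python
import torch

dtypes = (torch.float16, torch.float32, torch.float64)

# Storage cost per element grows with precision
for dtype in dtypes:
    t = torch.tensor([0.1], dtype=dtype)
    print(dtype, t.element_size(), "bytes per element")

# Rounding error when storing 0.1 at each precision
exact = 0.1
errors = {
    dtype: abs(torch.tensor(exact, dtype=dtype).item() - exact)
    for dtype in dtypes
}
print(errors)  # the error shrinks as precision increases
```

Half precision uses a quarter of the memory of float64 but stores 0.1 far less accurately.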

Resources:

- See the PyTorch documentation for a list of all available tensor datatypes.
- Read the Wikipedia page for an overview of what precision in computing is.

Let's see how to create some tensors with specific datatypes. We can do so using the `dtype` parameter.

In [18]:
# Default datatype for floating-point tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the datatype from the data (float32 for Python floats)
                               device=None, # defaults to None, which uses the default device (the CPU unless changed)
                               requires_grad=False) # if True, operations performed on the tensor are recorded for autograd

float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

In [19]:
float16=float_32_tensor.type(torch.float16)
float16

tensor([3., 6., 9.], dtype=torch.float16)

# Getting information from tensors

Once you've created tensors (or someone else or a PyTorch module has created them for you), you might want to get some information from them.

We've seen these before, but three of the most common attributes you'll want to find out about tensors are:

- `shape` - what shape is the tensor? (some operations require specific shape rules)
- `dtype` - what datatype are the elements within the tensor stored in?
- `device` - what device is the tensor stored on? (usually GPU or CPU)

Let's create a random tensor and find out details about it.


In [20]:

# Create a tensor
some_tensor = torch.rand(3, 4)

# Find out details about it
print(some_tensor)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}") # will default to CPU

tensor([[0.2409, 0.8203, 0.5262, 0.5018],
        [0.4165, 0.4152, 0.1742, 0.2942],
        [0.9587, 0.2805, 0.1833, 0.8111]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


# Manipulating tensors (tensor operations)

In deep learning, data (images, text, video, audio, protein structures, etc.) gets represented as tensors.

A model learns by investigating those tensors and performing a series of operations (could be 1,000,000s+) on tensors to create a representation of the patterns in the input data.

These operations are often a wonderful dance between:

- Addition
- Subtraction
- Multiplication (element-wise)
- Division
- Matrix multiplication

And that's it. Sure, there are a few more here and there, but these are the basic building blocks of neural networks.

Stacking these building blocks in the right way, you can create the most sophisticated of neural networks (just like lego!).
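As one illustration of stacking these building blocks, a fully connected (linear) layer is just a matrix multiplication followed by an addition. The sketch below builds one by hand with made-up random weights and checks it against `torch.nn.functional.linear`, which computes `x @ weight.T + bias`. The variable names and shapes are arbitrary choices for the example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)

# A batch of 4 inputs with 3 features each
x = torch.rand(4, 3)
# Hypothetical weights and bias for a layer mapping 3 features to 2 outputs
weight = torch.rand(2, 3)
bias = torch.rand(2)

# Matrix multiplication + addition (the bias is broadcast across the batch)
manual = x @ weight.T + bias

# PyTorch's built-in linear function computes the same thing
builtin = F.linear(x, weight, bias)

print(manual.shape)                     # torch.Size([4, 2])
print(torch.allclose(manual, builtin))  # True
```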

## Basic operations
Let's start with a few of the fundamental operations: addition (+), subtraction (-) and multiplication (*).

In [21]:
tensor=torch.tensor([1,2,3])
tensor+10

tensor([11, 12, 13])

In [22]:
tensor*10

tensor([10, 20, 30])

In [23]:
tensor-10

tensor([-9, -8, -7])

In [24]:
## try out PyTorch's built-in functions
torch.mul(tensor,10)

tensor([10, 20, 30])

In [25]:
torch.add(tensor,10)

tensor([11, 12, 13])

## Matrix multiplication (is all you need)
One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication.

PyTorch implements matrix multiplication functionality in the torch.matmul() function.

The two main rules for matrix multiplication to remember are:

1. The inner dimensions must match:
   - (3, 2) @ (3, 2) won't work
   - (2, 3) @ (3, 2) will work
   - (3, 2) @ (2, 3) will work
2. The resulting matrix has the shape of the outer dimensions:
   - (2, 3) @ (3, 2) -> (2, 2)
   - (3, 2) @ (2, 3) -> (3, 3)

Note: "@" in Python is the symbol for matrix multiplication.

Resource: You can see all of the rules for matrix multiplication using torch.matmul() in the PyTorch documentation.
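The two shape rules can be checked directly. This short sketch uses random tensors (the names `a` and `b` are arbitrary) and catches the `RuntimeError` PyTorch raises when the inner dimensions don't match.

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

# Inner dimensions match (3 and 3): result takes the outer dimensions
print((a @ b).shape)  # torch.Size([2, 2])
print((b @ a).shape)  # torch.Size([3, 3])

# Inner dimensions do not match (2 and 3): PyTorch raises a RuntimeError
mismatch_raised = False
try:
    b @ b
except RuntimeError as err:
    mismatch_raised = True
    print("Shape mismatch:", err)
```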

Let's create a tensor and perform element-wise multiplication and matrix multiplication on it.

In [26]:
torch.matmul(tensor,tensor) ## matrix multiplication; for two 1-D tensors this is the dot product (1*1 + 2*2 + 3*3 = 14)

tensor(14)

In [27]:
tensor*tensor ## element-wise multiplication

tensor([1, 4, 9])

In [28]:
torch.matmul(torch.rand(2,3),torch.rand(3,2))

tensor([[0.6732, 0.6725],
        [0.8080, 0.7050]])

## If the first matrix has shape (m, n) and the second has shape (n, q), the result has shape (m, q)

In [29]:
torch.matmul(torch.rand(2,3),torch.rand(3,2)).shape

torch.Size([2, 2])

In [30]:
# Shapes need to be in the right way
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)
tensor_A.shape
# torch.matmul(tensor_A, tensor_B) # (this will error: the inner dimensions 2 and 3 don't match)

torch.Size([3, 2])
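One common way to make `tensor_A` and `tensor_B` compatible is to transpose one of them with `.T`, which swaps the two dimensions of a 2-D tensor. The sketch below recreates both tensors so it runs on its own; which operand you transpose depends on the result shape you need.

```python
import torch

tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)
tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

# Transposing one operand makes the inner dimensions line up
out_1 = tensor_A @ tensor_B.T  # (3, 2) @ (2, 3) -> (3, 3)
out_2 = tensor_A.T @ tensor_B  # (2, 3) @ (3, 2) -> (2, 2)

print(out_1.shape, out_2.shape)  # torch.Size([3, 3]) torch.Size([2, 2])
print(out_2)                     # tensor([[ 76., 103.], [100., 136.]])
```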

In [31]:
tensor_A.ndim

2

In [32]:
x=torch.arange(0,100,10)
torch.min(x)

tensor(0)

In [33]:
x.min()

tensor(0)

In [34]:
## find the max
torch.max(x)

tensor(90)

In [35]:
x.dtype

torch.int64

In [36]:
torch.mean(x.type(torch.float32))

tensor(45.)

In [37]:
## torch.mean() requires a floating-point (or complex) tensor, so the int64 tensor x is converted to float32 first
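A quick sketch of what happens without the conversion: `torch.mean()` on an integer tensor raises a `RuntimeError`, while any floating-point dtype (not only float32) works.

```python
import torch

x = torch.arange(0, 100, 10)  # int64 by default
print(x.dtype)                # torch.int64

# torch.mean() on an integer tensor raises a RuntimeError
int_mean_failed = False
try:
    torch.mean(x)
except RuntimeError as err:
    int_mean_failed = True
    print("Error:", err)

# Converting to a floating-point dtype first works
print(torch.mean(x.type(torch.float32)))  # tensor(45.)
print(x.double().mean())                  # tensor(45., dtype=torch.float64)
```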

In [38]:
torch.sum(x)

tensor(450)

# Finding the positional min and max

In [39]:
torch.argmin(x)

tensor(0)

In [40]:
torch.argmax(x)

tensor(9)
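For tensors with more than one dimension, `torch.argmax()` and `torch.argmin()` without a `dim` argument return an index into the *flattened* tensor. Passing `dim` gives one index per slice instead. The small 2x3 tensor below is an arbitrary example.

```python
import torch

m = torch.tensor([[3, 9, 1],
                  [7, 2, 8]])

# Without dim, the index refers to the flattened tensor [3, 9, 1, 7, 2, 8]
print(torch.argmax(m))         # tensor(1)
print(torch.argmin(m))         # tensor(2)

# With dim=1, you get one index per row
print(torch.argmax(m, dim=1))  # tensor([1, 2])
print(torch.argmin(m, dim=1))  # tensor([2, 1])
```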

# Reshaping, stacking, squeezing and unsqueezing tensors

| Method | One-line description |
|--------|----------------------|
| torch.reshape(input, shape) | Reshapes input to shape (if compatible); can also use torch.Tensor.reshape(). |
| Tensor.view(shape) | Returns a view of the original tensor in a different shape but shares the same data as the original tensor. |
| torch.stack(tensors, dim=0) | Concatenates a sequence of tensors along a new dimension (dim); all tensors must be the same size. |
| torch.squeeze(input) | Squeezes input to remove all the dimensions of size 1. |
| torch.unsqueeze(input, dim) | Returns input with a dimension of size 1 added at dim. |
| torch.permute(input, dims) | Returns a view of the original input with its dimensions permuted (rearranged) to dims. |
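As a quick reference, the sketch below calls each method from the table on a 9-element tensor and prints the resulting shapes. The intermediate shapes such as `(1, 9, 1)` are arbitrary choices made to give `squeeze` and `permute` something to work on.

```python
import torch

x = torch.arange(1., 10.)  # shape (9,)

print(torch.reshape(x, (3, 3)).shape)                   # torch.Size([3, 3])
print(x.view(1, 9).shape)                               # torch.Size([1, 9])
print(torch.stack([x, x], dim=0).shape)                 # torch.Size([2, 9])
print(torch.squeeze(x.view(1, 9, 1)).shape)             # torch.Size([9])
print(torch.unsqueeze(x, dim=0).shape)                  # torch.Size([1, 9])
print(torch.permute(x.view(1, 3, 3), (2, 0, 1)).shape)  # torch.Size([3, 1, 3])
```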


In [41]:
import torch
x=torch.arange(1.,10.)
x.dtype

torch.float32

In [42]:
x.shape

torch.Size([9])

In [43]:
x.ndim

1

In [44]:
x_reshape=x.reshape(3,3)

In [45]:
x_reshape

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])

In [46]:
x_reshape[0]=4
x,x_reshape

(tensor([4., 4., 4., 4., 5., 6., 7., 8., 9.]),
 tensor([[4., 4., 4.],
         [4., 5., 6.],
         [7., 8., 9.]]))

In [47]:
x.is_contiguous()


True

In [48]:
# view x with a new shape
z=x.view(1,9)

## changing z changes x because they share the same memory; reshape() also returns a view when the memory layout allows it, otherwise it creates a copy


In [49]:
z[0]=3

In [50]:
z

tensor([[3., 3., 3., 3., 3., 3., 3., 3., 3.]])

In [51]:
x

tensor([3., 3., 3., 3., 3., 3., 3., 3., 3.])

In [52]:
# Stack three copies of z along a new last dimension (dim=2): (1, 9) -> (1, 9, 3)
xst=torch.stack([z,z,z],dim=2)
xst

tensor([[[3., 3., 3.],
         [3., 3., 3.],
         [3., 3., 3.],
         [3., 3., 3.],
         [3., 3., 3.],
         [3., 3., 3.],
         [3., 3., 3.],
         [3., 3., 3.],
         [3., 3., 3.]]])

In [53]:
x_reshape.shape

torch.Size([3, 3])

In [54]:
x_reshape.squeeze().shape

torch.Size([3, 3])

In [55]:
x_reshape.shape

torch.Size([3, 3])

### Why you cannot squeeze a `(3, 3)` tensor

`torch.squeeze()` removes **only those dimensions whose size is exactly `1`**.

A tensor with shape:

`(3, 3)`

has **no dimensions of size 1**, so there is **nothing to remove**.

Because of this, calling:

`x.squeeze()`

does **not change the shape**, and the result remains:

`(3, 3)`

PyTorch does this intentionally to avoid destroying meaningful data.  
Only dimensions that do not add information (size = 1) are removed.

**Rule to remember:**

Squeeze removes dimensions of size `1` —  
if a dimension’s size is greater than `1`, it cannot be squeezed.
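For contrast, here is a sketch with a tensor that does have size-1 dimensions. Passing a `dim` to `squeeze()` removes only that dimension, and asking it to squeeze a dimension whose size is not 1 simply leaves the shape unchanged. The shape `(1, 3, 1, 3)` is an arbitrary example.

```python
import torch

t = torch.rand(1, 3, 1, 3)

print(t.squeeze().shape)   # torch.Size([3, 3]): every size-1 dim removed
print(t.squeeze(0).shape)  # torch.Size([3, 1, 3]): only dim 0 removed
print(t.squeeze(1).shape)  # torch.Size([1, 3, 1, 3]): dim 1 has size 3, so nothing changes
```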


In [56]:
# unsqueeze adds a dimension instead of removing one

### What does `unsqueeze()` do?

`torch.unsqueeze()` **adds a new dimension of size `1`** at a specified position.

It does **not change the data**, only how the tensor is shaped.

For a tensor with shape:

`(3, 3)`

Using:

`x.unsqueeze(0)`

adds a new dimension at position `0`, giving:

`(1, 3, 3)`

Using:

`x.unsqueeze(2)`

adds a new dimension at position `2`, giving:

`(3, 3, 1)`

**Key points to remember:**

- `unsqueeze()` always increases the number of dimensions by `1`
- The new dimension always has size `1`
- No data is copied; the result is a view that shares memory with the original
- `squeeze(dim)` undoes `unsqueeze(dim)`
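The key points above can be verified in a few lines. This sketch also uses a negative `dim` (counted from the end) and modifies the unsqueezed tensor to show that it shares memory with the original.

```python
import torch

x = torch.rand(3, 3)

print(x.unsqueeze(0).shape)   # torch.Size([1, 3, 3])
print(x.unsqueeze(2).shape)   # torch.Size([3, 3, 1])
print(x.unsqueeze(-1).shape)  # torch.Size([3, 3, 1]): negative dims count from the end

# squeeze(dim) undoes unsqueeze(dim)
y = x.unsqueeze(0)
print(torch.equal(y.squeeze(0), x))  # True

# unsqueeze returns a view, so writing through y also changes x
y[0, 0, 0] = 100.0
print(x[0, 0])                # tensor(100.)
```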


In [57]:
# Create tensor with specific shape
x_original = torch.rand(size=(224, 224, 3))

# Permute the original tensor to rearrange the axis order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])


### Difference between `view()`, `reshape()`, and `permute()`

**1. `view()`**
- Changes the shape of a tensor **without copying data**
- Returns a new view that shares the **same underlying memory**
- Requires a memory layout compatible with the new shape (in practice, usually a **contiguous** tensor)
- Changing values in the view also changes the original tensor

**2. `reshape()`**
- Changes the shape of a tensor
- Tries to return a **view** first
- If a view is not possible, it **creates a copy**
- The original tensor’s shape remains unchanged
- Value changes affect the original tensor **only if** memory is shared

**3. `permute()`**
- Reorders (rearranges) the dimensions of a tensor
- Does **not** change the number of elements
- Returns a view with a **different dimension order**
- Often produces a **non-contiguous** tensor
- Commonly used to switch between layouts (e.g., NCHW ↔ NHWC)

**Key differences at a glance:**

| Operation | Changes shape | Reorders dimensions | Copies data | Requires contiguous memory |
|---------|---------------|---------------------|-------------|----------------------------|
| `view()` | Yes | No | No | Yes |
| `reshape()` | Yes | No | Sometimes | No |
| `permute()` | No | Yes | No | No |

**Rule to remember:**

- Use `view()` when you want a guarantee that no data is copied (it raises an error instead of copying)
- Use `reshape()` when you want safety and flexibility
- Use `permute()` when you need to change dimension order
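The difference between `view()` and `reshape()` shows up with non-contiguous tensors, such as the result of a transpose or `permute()`. In this sketch `view()` refuses to flatten a transposed tensor, while `reshape()` silently copies; `data_ptr()` (the address of the first element) is used only to show that new memory was allocated.

```python
import torch

x = torch.arange(6).reshape(2, 3)
xt = x.T                   # transpose: shape (3, 2), a non-contiguous view
print(xt.is_contiguous())  # False

# view() cannot flatten this non-contiguous tensor
view_failed = False
try:
    xt.view(6)
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

# reshape() falls back to copying the data
flat = xt.reshape(6)
print(flat)                             # tensor([0, 3, 1, 4, 2, 5])
print(flat.data_ptr() == x.data_ptr())  # False: new memory was allocated

# Calling .contiguous() first also makes view() possible (again via a copy)
print(xt.contiguous().view(6))          # tensor([0, 3, 1, 4, 2, 5])
```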


# Indexing (selecting data from tensors)

In [58]:
import torch
x=torch.arange(1,10).reshape(1,3,3)


In [59]:
x

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

In [60]:
x[0]

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [61]:
x[0,2,2]

tensor(9)

In [62]:
x[:,0]

tensor([[1, 2, 3]])

In [63]:
x[:,:,1]

tensor([[2, 5, 8]])

In [64]:
x[:,1,1]

tensor([5])

In [65]:
x[0,0,:]

tensor([1, 2, 3])

In [66]:
x[:,2,2]

tensor([9])

In [67]:
x[:,:,2]

tensor([[3, 6, 9]])

In [68]:
x[0][0][2]

tensor(3)

In [69]:
x[:,:,0]

tensor([[1, 4, 7]])
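The outputs above follow one pattern: an integer index removes that dimension, while a slice (`:`) keeps it. That is why `x[:,0]` has an extra pair of brackets compared with `x[0,0]`. The sketch below checks this on the same tensor and adds a partial slice (`1:`) as a further example.

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

# An integer index removes that dimension; a slice keeps it
print(x[0, 0].shape)  # torch.Size([3])
print(x[:, 0].shape)  # torch.Size([1, 3])
print(x[0, :, 1])     # tensor([2, 5, 8]): middle column
print(x[0, 1:, 1:])   # bottom-right 2x2 block: rows 1-2, columns 1-2
```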


# PyTorch tensors & NumPy
Since NumPy is a popular Python numerical computing library, PyTorch has functionality to interact with it nicely.

The two main methods you'll want to use for NumPy to PyTorch (and back again) are:

- `torch.from_numpy(ndarray)` - NumPy array -> PyTorch tensor
- `torch.Tensor.numpy()` - PyTorch tensor -> NumPy array

In [70]:
import numpy as np
array=np.arange(1.0,8.0)
tensor=torch.from_numpy(array)
array,tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [71]:
# torch.from_numpy() keeps the array's dtype; NumPy defaults to float64, so the tensor is float64 (not PyTorch's default float32)

In [72]:
array=array+1
array

array([2., 3., 4., 5., 6., 7., 8.])

In [73]:
tensor

tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64)

In [74]:
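# The tensor is unchanged because `array = array + 1` creates a new array.
# torch.from_numpy() shares memory with the original array, so an in-place change such as `array += 1` would show up in the tensor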


In [75]:
tensor=torch.ones(7)
nu=tensor.numpy()

In [76]:
## tensor.numpy() keeps the tensor's dtype, so this array is float32 (torch.ones() defaults to float32)

In [77]:
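# The array and the CPU tensor share memory: reassigning `tensor = tensor + 1` leaves `nu` unchanged, but an in-place change such as `tensor.add_(1)` also changes `nu`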
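Because memory is shared, whether a change propagates depends on whether the operation is in place or creates a new object. This sketch demonstrates both directions of the conversion on the CPU; the variable names mirror the cells above.

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)  # shares memory with `array`

array += 1                        # in-place change...
print(tensor)                     # ...is visible in the tensor (values now 2 to 8)

array = array + 1                 # reassignment creates a new array...
print(tensor)                     # ...so the tensor keeps its previous values

t = torch.ones(7)
nu = t.numpy()                    # float32 array sharing memory with `t`
t.add_(1)                         # in-place tensor update
print(nu)                         # [2. 2. 2. 2. 2. 2. 2.]
print(nu.dtype)                   # float32
```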

In [78]:
torch.cuda.is_available()

True

In [79]:
device = "cuda" if torch.cuda.is_available() else "cpu"

In [80]:
torch.cuda.device_count()

1

In [81]:
tensor=torch.tensor([1,2,3],device="cpu")
print(tensor,tensor.device)

tensor([1, 2, 3]) cpu


In [82]:
tensorgpu=tensor.to(device)

In [83]:
tensorgpu

tensor([1, 2, 3], device='cuda:0')

In [85]:
tensorgpu.numpy()

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

In [86]:
tensorgpu.cpu().numpy()

array([1, 2, 3])
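Putting the last few cells together, a device-agnostic round trip looks roughly like the sketch below. It relies on `.cpu()` returning the tensor itself when it is already on the CPU, so the same code runs whether or not a GPU is present.

```python
import torch

# Pick the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

t = torch.tensor([1, 2, 3]).to(device)
print(t.device)

# .cpu() is a no-op for tensors already on the CPU, so this works on either device
arr = t.cpu().numpy()
print(arr)  # [1 2 3]
```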