# 🧠 PyTorch Fundamentals
---
This notebook provides a comprehensive introduction to **PyTorch**, covering essential topics, theory, functions, and practical examples. 

📘 **Official Documentation**: [https://pytorch.org/docs/stable/index.html](https://pytorch.org/docs/stable/index.html)

📘 **Official PyTorch Cheatsheet**: [https://docs.pytorch.org/tutorials/beginner/ptcheat.html](https://docs.pytorch.org/tutorials/beginner/ptcheat.html)

📘 **Notebook Resource**: [https://www.learnpytorch.io/00_pytorch_fundamentals/](https://www.learnpytorch.io/00_pytorch_fundamentals/)

---

## 📑 Contents
1. What is PyTorch?
2. Tensors in PyTorch
3. Tensor Operations
4. Indexing and Slicing
5. Autograd / Gradients
6. Loss Functions
7. Optimization
8. Neural Network Modules
9. GPU Acceleration with CUDA
10. Saving and Loading Models
11. Summary


In [None]:
!nvidia-smi

In [2]:
import torch
print(torch.__version__)
print(torch.cuda.is_available())

2.6.0+cu118
True


## Introduction to Tensors

### Creating Tensors

In [9]:
# scalar

scalar = torch.tensor(7)
scalar

tensor(7)

In [11]:
# dimensions
scalar.ndim

0

In [15]:
scalar.item()       # returns the value as a Python int

7

In [12]:
# Vector
vector = torch.tensor([7,7])
vector

tensor([7, 7])

In [16]:
vector.ndim

1

In [17]:
vector.shape

torch.Size([2])

In [20]:
# Matrix
MATRIX = torch.tensor([[7,8],[9,10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

In [21]:
MATRIX.ndim

2

In [23]:
MATRIX.shape

torch.Size([2, 2])

In [25]:
# Tensor
TENSOR = torch.tensor([[[1,2,3],[4,5,6],[7,8,9]]])
TENSOR

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

In [26]:
TENSOR.ndim

3

In [27]:
TENSOR.shape

torch.Size([1, 3, 3])

## Random Tensors

In [14]:
# 2D
random_tensor = torch.rand(3,4)
random_tensor

tensor([[0.6109, 0.9391, 0.0976, 0.6724],
        [0.4682, 0.1958, 0.3150, 0.9964],
        [0.3927, 0.4063, 0.8833, 0.6341]])

In [10]:
random_tensor.ndim

2

In [16]:
# 3D
random_tensor = torch.rand(4,3,2)
random_tensor

tensor([[[0.0046, 0.9711],
         [0.5808, 0.7848],
         [0.1656, 0.8431]],

        [[0.8961, 0.9369],
         [0.2559, 0.7328],
         [0.8091, 0.4358]],

        [[0.5315, 0.0542],
         [0.2342, 0.3462],
         [0.5998, 0.0997]],

        [[0.0203, 0.3451],
         [0.6138, 0.3920],
         [0.5740, 0.6097]]])

In [12]:
random_tensor.ndim

3

In [17]:
# Tensor of zeros

zeros = torch.zeros(size=(3,3))
zeros

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

In [18]:
# Tensor of ones

ones = torch.ones(size=(3,4))
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [19]:
ones.dtype

torch.float32

In [25]:
# Tensor in a range
nm = torch.arange(0,10,2)
nm

tensor([0, 2, 4, 6, 8])

In [27]:
# Tensor like

ten_zero = torch.zeros_like(nm)
ten_zero

tensor([0, 0, 0, 0, 0])
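
The `*_like` constructors copy the shape, dtype, and device of an existing tensor unless told otherwise. A minimal sketch of that behaviour follows; the variable names are chosen only for illustration:

```python
import torch

base = torch.arange(0, 10, 2)                    # int64, shape (5,)

z = torch.zeros_like(base)                       # inherits shape and dtype (int64)
o = torch.ones_like(base, dtype=torch.float32)   # same shape, dtype overridden
r = torch.rand_like(o)                           # rand_like needs a floating-point input

print(z.dtype, o.dtype, r.shape)
```

Overriding `dtype` is useful when the template tensor holds integers but the new tensor must hold floats.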

## Tensor Datatypes

In [35]:
# Float32 tensor (dtype=None lets PyTorch infer the dtype; Python floats become float32)

f32 = torch.tensor([3.0,6.0,9.0],dtype=None)
f32.dtype

torch.float32

In [31]:
# Float 16 tensor

f16 = torch.tensor([3.0,6.0,9.0],dtype=torch.float16)
f16.dtype

torch.float16

### Type conversion

In [33]:
f16_tensor = f32.type(torch.float16)
f16_tensor

tensor([3., 6., 9.], dtype=torch.float16)
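
Casting can also be written with `Tensor.to`, and operations that mix dtypes follow PyTorch's type-promotion rules. The sketch below illustrates both; the variable names are illustrative only:

```python
import torch

f32 = torch.tensor([3.0, 6.0, 9.0])     # float32 by default
f16 = f32.to(torch.float16)             # same effect as f32.type(torch.float16)
i64 = torch.tensor([1, 2, 3])           # int64 by default

# Mixing dtypes promotes the result to the wider floating-point type.
mixed = f16 * f32
print(mixed.dtype)          # torch.float32
print((i64 * f32).dtype)    # torch.float32
```

Neither cast modifies `f32` in place; each returns a new tensor.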

## Getting information from tensors
1. Tensor Datatype  - `tensor.dtype`
2. Tensor Shape     - `tensor.shape`
3. Tensor Device    - `tensor.device`

In [38]:
anytensor = torch.rand(3,4)
anytensor

tensor([[0.0752, 0.8115, 0.8622, 0.3307],
        [0.5111, 0.3151, 0.5759, 0.7082],
        [0.1241, 0.9233, 0.7214, 0.2037]])

In [40]:
# Details
print(anytensor)
print('Datatype of tensor: ', anytensor.dtype)
print('Shape of tensor: ', anytensor.shape)
print('Device of tensor: ', anytensor.device)

tensor([[0.0752, 0.8115, 0.8622, 0.3307],
        [0.5111, 0.3151, 0.5759, 0.7082],
        [0.1241, 0.9233, 0.7214, 0.2037]])
Datatype of tensor:  torch.float32
Shape of tensor:  torch.Size([3, 4])
Device of tensor:  cpu


## Manipulating Tensors
### Tensor Operations
- Addition
- Subtraction
- Multiplication (Element-wise)
- Division
- Matrix Multiplication 

In [41]:
tensor = torch.tensor([1,2,3])
tensor + 10

tensor([11, 12, 13])

In [42]:
tensor - 5

tensor([-4, -3, -2])

In [43]:
tensor * 80

tensor([ 80, 160, 240])

In [44]:
tensor / 10

tensor([0.1000, 0.2000, 0.3000])

In [None]:
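# Matrix multiplication / dot product (also available as the @ operator)
# For two 1-D tensors, matmul and dot both return the scalar dot product.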
torch.matmul(tensor,tensor) 
torch.dot(tensor,tensor)

tensor(14)
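
For 2-D inputs, matrix multiplication requires the inner dimensions to match, and the result takes the outer dimensions. A small sketch contrasting it with element-wise multiplication (the matrices `A` and `B` are made up for illustration):

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])    # shape (3, 2)
B = torch.tensor([[7., 8., 9.], [10., 11., 12.]])   # shape (2, 3)

elementwise = A * A            # same shape in, same shape out: (3, 2)
product = A @ B                # inner dimensions 2 and 2 match: result is (3, 3)
same = torch.matmul(A, B)      # the @ operator is shorthand for torch.matmul

print(elementwise.shape, product.shape)
print(torch.equal(product, same))
```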

In [49]:
# Transpose (a no-op on a 1-D tensor; on a 2-D tensor it swaps rows and columns)
tensor.T

tensor([1, 2, 3])
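
On a 1-D tensor `.T` returns the values unchanged, as the output above shows. Transposition matters for 2-D tensors, where it is a common fix for shape mismatches in matrix multiplication. A minimal sketch with illustrative random matrices:

```python
import torch

A = torch.rand(3, 2)
B = torch.rand(3, 2)

# A @ B fails because the inner dimensions are 2 and 3.
try:
    A @ B
except RuntimeError as err:
    print("shape mismatch:", type(err).__name__)

# Transposing B gives shape (2, 3), so the inner dimensions line up.
C = A @ B.T
print(B.T.shape, C.shape)
```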

## Tensor Aggregation (Min, Max, Mean, Sum)

In [50]:
tensor = torch.arange(0,100,10)
tensor

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [60]:
tensor.min(), torch.min(tensor)

(tensor(0), tensor(0))

In [52]:
tensor.max()

tensor(90)

In [58]:
# mean() needs a floating-point dtype; this tensor is int64, so cast while reducing
tensor.mean(dtype=torch.float32)

tensor(45.)

In [59]:
tensor.sum()

tensor(450)

### Finding the positional index

In [61]:
tensor.argmin()

tensor(0)

In [62]:
tensor.argmax()

tensor(9)
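
The same reductions accept a `dim` argument to aggregate along one axis of a multi-dimensional tensor instead of over all elements. A short sketch on a made-up 2x3 matrix:

```python
import torch

m = torch.tensor([[1, 5, 3],
                  [9, 2, 4]])

col_max = m.max(dim=0)         # named tuple of (values, indices), one entry per column
row_sum = m.sum(dim=1)         # one sum per row
row_argmax = m.argmax(dim=1)   # column index of the largest value in each row

print(col_max.values, col_max.indices)   # tensor([9, 5, 4]) tensor([1, 0, 1])
print(row_sum)                            # tensor([ 9, 15])
print(row_argmax)                         # tensor([1, 0])
```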

## Reshaping, Stacking, Squeezing, Unsqueezing, ... Tensors

- Reshaping:    Rearranges the elements of a tensor into a new shape with the same total number of elements; returns a view when possible and a copy otherwise.
- View:         Returns a tensor of the requested shape that shares memory with the input.
- Stacking:     Joins multiple tensors along a new dimension, so the result has one more dimension than the inputs. (`torch.hstack` and `torch.vstack` are related helpers that concatenate along an existing dimension, after promoting 1-D inputs to 2-D in the case of `vstack`.)
- Squeezing:    Removes dimensions of size 1 from a tensor's shape.
- Unsqueezing:  Adds a dimension of size 1 at a specified position in a tensor's shape.
- Permute:      Returns a view of the input with its dimensions reordered.
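
The practical consequence of "view" versus "copy" in the list above is that writes through a view show up in the original tensor. A minimal sketch, using `data_ptr()` to compare the underlying memory addresses:

```python
import torch

x = torch.arange(1.0, 10.0)
v = x.view(3, 3)               # view: no copy, same underlying storage
v[0, 0] = 100.0                # writing through the view changes x as well
print(x[0])                    # tensor(100.)
print(x.data_ptr() == v.data_ptr())    # True

p = v.permute(1, 0)            # permute also returns a view, but a non-contiguous one
print(p.is_contiguous())       # False
r = p.reshape(9)               # reshape falls back to a copy when a view is impossible
print(r.data_ptr() == x.data_ptr())    # False
```

Use `.clone()` when an independent copy is needed regardless of memory layout.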

In [40]:
import torch
tensor = torch.arange(1.0,10.)
tensor, tensor.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.Size([9]))

In [4]:
tensor.reshape(3,3)

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])

In [5]:
tensor.view(3,3)

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])

In [None]:
# Stack along a new first dimension: each input becomes a row
torch.stack([tensor, tensor, tensor],dim=0).shape

torch.Size([3, 9])

In [25]:
# Stack along a new second dimension: each input becomes a column
torch.stack([tensor, tensor, tensor],dim=1).shape

torch.Size([9, 3])
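
`torch.stack` always creates a new dimension, whereas `torch.cat` grows an existing one. The sketch below compares the resulting shapes for 1-D inputs, including the `hstack`/`vstack` helpers:

```python
import torch

t = torch.arange(1.0, 10.0)           # shape (9,)

stacked = torch.stack([t, t, t])      # new dimension: (3, 9)
catted = torch.cat([t, t, t])         # existing dimension grows: (27,)
v = torch.vstack([t, t, t])           # (3, 9) for 1-D inputs
h = torch.hstack([t, t, t])           # (27,) for 1-D inputs

print(stacked.shape, catted.shape, v.shape, h.shape)
```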

In [33]:
# Add a leading dimension of size 1, then squeeze it away again
r_tensor = tensor.reshape(1,9)
print(r_tensor.shape)
s_tensor = r_tensor.squeeze()
s_tensor.shape

torch.Size([1, 9])


torch.Size([9])

In [39]:
us_tensor = torch.unsqueeze(s_tensor, dim=0)
us_tensor.shape

torch.Size([1, 9])

In [None]:
# permute (returns a view that shares memory with the original tensor)

tensor = torch.rand(2,3,5)

print(tensor)
print(tensor.shape)

p_tensor = torch.permute(tensor, (2,0,1))
print(p_tensor)
print(p_tensor.shape)

tensor([[[0.7457, 0.0568, 0.3926, 0.6394, 0.1810],
         [0.7443, 0.9939, 0.2039, 0.4428, 0.3932],
         [0.3008, 0.3278, 0.8890, 0.7036, 0.7722]],

        [[0.9955, 0.8555, 0.3286, 0.9879, 0.7837],
         [0.0158, 0.4958, 0.5395, 0.0318, 0.3989],
         [0.4702, 0.7677, 0.3712, 0.3610, 0.0878]]])
torch.Size([2, 3, 5])
tensor([[[0.7457, 0.7443, 0.3008],
         [0.9955, 0.0158, 0.4702]],

        [[0.0568, 0.9939, 0.3278],
         [0.8555, 0.4958, 0.7677]],

        [[0.3926, 0.2039, 0.8890],
         [0.3286, 0.5395, 0.3712]],

        [[0.6394, 0.4428, 0.7036],
         [0.9879, 0.0318, 0.3610]],

        [[0.1810, 0.3932, 0.7722],
         [0.7837, 0.3989, 0.0878]]])
torch.Size([5, 2, 3])


## Indexing (Selecting data from Tensors)


In [60]:
tensor = torch.arange(1,10).reshape(1,3,3)
tensor, tensor.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [48]:
tensor[0]

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [49]:
tensor[0][0]

tensor([1, 2, 3])

In [55]:
tensor[0][1][1]

tensor(5)

In [51]:
tensor[0][2][2]

tensor(9)

In [54]:
tensor[:,1,1]

tensor([5])

In [64]:
tensor[:,:,2]

tensor([[3, 6, 9]])
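
Chained indexing (`tensor[0][1][1]`) and comma-separated indexing (`tensor[0, 1, 1]`) select the same element, and the comma form combines naturally with slices and boolean masks. A brief sketch on the same 1x3x3 tensor:

```python
import torch

t = torch.arange(1, 10).reshape(1, 3, 3)

print(torch.equal(t[0][1][1], t[0, 1, 1]))   # True: same element either way
print(t[0, :, 0])       # first column: tensor([1, 4, 7])
print(t[0, 1:, :2])     # rows 1-2, columns 0-1: tensor([[4, 5], [7, 8]])
print(t[t > 6])         # boolean mask: tensor([7, 8, 9])
```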

## PyTorch Tensors and NumPy

#### NumPy Array to PyTorch Tensor

In [68]:
import torch
import numpy as np

array = np.arange(1.0,8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

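#### PyTorch Tensor to NumPy Array
NumPy arrays default to float64, whereas PyTorch tensors default to float32. Each conversion keeps the dtype of its source: `torch.from_numpy` above produced a float64 tensor, and `tensor.numpy()` below produces a float32 array.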

In [73]:
tensor = torch.ones(7)
np_tensor = tensor.numpy()
tensor, np_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

In [74]:
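# id() shows these are two distinct Python objects; it says nothing about the data buffer.
# tensor.numpy() actually shares its underlying memory with the tensor.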
id(tensor), id(np_tensor)

(3040558081968, 3040558276720)
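
The `id()` values above only distinguish the two Python wrapper objects. For CPU tensors, both `Tensor.numpy()` and `torch.from_numpy` share the underlying data buffer, which an in-place update makes visible. A minimal sketch:

```python
import numpy as np
import torch

t = torch.ones(3)
a = t.numpy()                 # shares memory with t
t.add_(1)                     # in-place update on the tensor
print(a)                      # [2. 2. 2.]

arr = np.zeros(3)
s = torch.from_numpy(arr)     # also shares memory with arr
arr += 5                      # in-place update on the array
print(s)                      # tensor([5., 5., 5.], dtype=torch.float64)

independent = torch.tensor(arr)   # torch.tensor always copies the data
arr += 1
print(independent)            # still tensor([5., 5., 5.], dtype=torch.float64)
```

An out-of-place update such as `t = t + 1` creates a new tensor and rebinds the name, so the array would keep its old values in that case.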

## Reproducibility 

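Reproducibility is the ability to get the exact same results when running the same code with the same inputs multiple times. This is crucial for debugging and comparing models. In the cell below, `torch.manual_seed` is called again before the second `torch.rand`; that resets the random number generator, which is why the two tensors match.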

In [113]:
import torch

RANDOM_SEED = 42

torch.manual_seed(RANDOM_SEED)
random_tensor_A = torch.rand(3,3)

torch.manual_seed(RANDOM_SEED)
random_tensor_B = torch.rand(3,3)

print(random_tensor_A)
print(random_tensor_B)
print(random_tensor_A == random_tensor_B )



tensor([[0.8823, 0.9150, 0.3829],
        [0.9593, 0.3904, 0.6009],
        [0.2566, 0.7936, 0.9408]])
tensor([[0.8823, 0.9150, 0.3829],
        [0.9593, 0.3904, 0.6009],
        [0.2566, 0.7936, 0.9408]])
tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])
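
Without the second `torch.manual_seed` call the generator keeps advancing, so consecutive draws differ. An alternative that avoids touching global state is to pass an explicit `torch.Generator`; this is a sketch of that approach, not something the cell above uses:

```python
import torch

torch.manual_seed(42)
a = torch.rand(2, 2)
b = torch.rand(2, 2)           # seed not reset: the global generator has advanced
print(torch.equal(a, b))       # False

# Two independent generators seeded identically produce identical draws.
g1 = torch.Generator().manual_seed(42)
g2 = torch.Generator().manual_seed(42)
c = torch.rand(2, 2, generator=g1)
d = torch.rand(2, 2, generator=g2)
print(torch.equal(c, d))       # True
```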


## Running Tensors on GPUs

In [115]:
!nvidia-smi

Wed Jun 18 16:48:48 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.94                 Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce GTX 1060 3GB  WDDM  |   00000000:01:00.0  On |                  N/A |
| 24%   54C    P0             26W /  120W |    2228MiB /   3072MiB |     16%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [116]:
import torch

torch.cuda.is_available()

True

In [118]:
# Setup device agnostic code

device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

'cuda'

In [119]:
# Count number of devices

torch.cuda.device_count()

1

## Putting tensors & model on the GPU

In [None]:
# Creating Tensor on cpu

tensor = torch.tensor([1,2,3])

print(tensor, tensor.device)

tensor([1, 2, 3]) cpu


In [133]:
# Creating Tensor on GPU

tensor = torch.tensor([1,2,3],device=device)

print(tensor, tensor.device)

tensor([1, 2, 3], device='cuda:0') cuda:0


In [136]:
# Transferring from CPU to GPU

tensor_cpu = torch.tensor([1,2,3])
print(tensor_cpu, tensor_cpu.device)

tensor_gpu = tensor_cpu.to(device=device)

print(tensor_gpu, tensor_gpu.device)

tensor([1, 2, 3]) cpu
tensor([1, 2, 3], device='cuda:0') cuda:0


In [165]:
# Transferring from GPU to CPU

tensor_gpu = torch.tensor([1,2,3],device=device)
print(tensor_gpu, tensor_gpu.device)

tensor_cpu = tensor_gpu.to(device="cpu")

print(tensor_cpu, tensor_cpu.device)

tensor([1, 2, 3], device='cuda:0') cuda:0
tensor([1, 2, 3]) cpu


## NumPy only works with tensors on CPU

In [168]:
tensor_gpu.numpy()

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

In [170]:
tensor_gpu.cpu().numpy()

array([1, 2, 3], dtype=int64)
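
Calling `.cpu()` on a tensor that is already on the CPU is harmless, so the same conversion can be written once for any device. The helper below, `to_numpy`, is a hypothetical name introduced for illustration; `.detach()` is included on the assumption that the tensor may track gradients, in which case `.numpy()` would also refuse to run:

```python
import numpy as np
import torch

def to_numpy(t: torch.Tensor) -> np.ndarray:
    """Hypothetical helper: convert a tensor on any device to a NumPy array."""
    # detach() drops gradient tracking; cpu() copies to host memory only if needed.
    return t.detach().cpu().numpy()

device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.tensor([1, 2, 3], device=device)

arr = to_numpy(t)
print(type(arr).__name__, arr)
```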