**What is a Tensor?**

A tensor is a multi-dimensional array, similar to a NumPy array, that can also run on GPUs and track gradients for automatic differentiation. Tensors are the building blocks of PyTorch.

🔹 Scalars (0D Tensor) → Single number

🔹 Vectors (1D Tensor) → List of numbers

🔹 Matrices (2D Tensor) → Table of numbers

🔹 Higher-Dimensional Tensors (3D, 4D, etc.) → Like image/video data

**1. Creating Tensors**:
Manually creating tensors

In [None]:
import torch

In [None]:
# Scalar (0D)

scalar = torch.tensor(5)
print(scalar, scalar.shape)

tensor(5) torch.Size([])


In [None]:
# Vector (1D)
vector = torch.tensor([1, 2, 3])
print(vector, vector.shape)

tensor([1, 2, 3]) torch.Size([3])


In [None]:
# Matrix (2D)
matrix = torch.tensor([ [1, 2, 3], [4, 5, 6] ])
print(matrix, matrix.shape)

tensor([[1, 2, 3],
        [4, 5, 6]]) torch.Size([2, 3])


In [None]:
# 3D Tensor (for comparison, an RGB image is a 3D tensor of shape [3, height, width])
tensor_3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(tensor_3d, tensor_3d.shape)

tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]]) torch.Size([2, 2, 2])


**1.1 -  Creating Tensors with Special Values**

In [None]:
# All zeros
zeros = torch.zeros(3, 3)
print(zeros)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])


In [None]:
# All ones
ones = torch.ones(2, 2)
print(ones)


tensor([[1., 1.],
        [1., 1.]])


In [None]:
# Random values
random = torch.rand(3, 3)
print(random)

tensor([[0.4193, 0.3982, 0.6006],
        [0.5107, 0.9279, 0.6164],
        [0.3213, 0.0068, 0.5915]])


In [None]:
# Identity matrix
eye = torch.eye(3)
print(eye)

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])


In [None]:
# Tensor with specific data type
float_tensor = torch.ones(3, 3, dtype=torch.float32)
print(float_tensor.dtype)

torch.float32


**2. Tensor Operations (Math & Indexing)**

**2.1 - Basic Math Operations**

In [None]:
a = torch.tensor([2, 3, 4])
b = torch.tensor([1, 5, 7])

# Element-Wise Operations
print(a + b)
print(a - b)
print(a * b)   # (Element-wise multiplication)
print(a / b)

tensor([ 3,  8, 11])
tensor([ 1, -2, -3])
tensor([ 2, 15, 28])
tensor([2.0000, 0.6000, 0.5714])


In [None]:
A = torch.tensor([[1, 2], [3, 4]])
B = torch.tensor([[5, 6],
                  [7, 8]])

# Element-Wise multiplication
print(A * B)

# Matrix multiplication
matrix_multiplication = torch.matmul(A, B)
print(matrix_multiplication)

tensor([[ 5, 12],
        [21, 32]])
tensor([[19, 22],
        [43, 50]])


Matrix multiplication is a fundamental operation in linear algebra.


matrix_multiplication = torch.matmul(A, B)


To calculate the result, you take the rows of the first matrix (A) and multiply them by the columns of the second matrix (B) in a specific way. Here's the breakdown:


Result (top-left): (1 × 5) + (2 × 7) = 19

Result (top-right): (1 × 6) + (2 × 8) = 22

Result (bottom-left): (3 × 5) + (4 × 7) = 43

Result (bottom-right): (3 × 6) + (4 × 8) = 50


Therefore, the result of the matrix multiplication is:

[[19, 22],

  [43, 50]]
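
The row-by-column rule above can be checked directly. The following is a small sketch (the name `manual` is only for illustration) that recomputes each entry as the sum of products of a row of A and a column of B, then compares it with torch.matmul and the equivalent `@` operator.

```python
import torch

A = torch.tensor([[1, 2], [3, 4]])
B = torch.tensor([[5, 6], [7, 8]])

# Entry (i, j) is the sum of products of row i of A and column j of B
manual = torch.zeros(2, 2, dtype=A.dtype)
for i in range(2):
    for j in range(2):
        manual[i, j] = (A[i, :] * B[:, j]).sum()

print(manual)
print(torch.equal(manual, torch.matmul(A, B)))  # True
print(torch.equal(manual, A @ B))               # True: @ is shorthand for matmul
```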

**2.2 - Indexing & Slicing (Like NumPy)**

In [None]:
x = torch.tensor([[10, 20, 30], [40, 50, 60]])

# Get single element
print(x[0, 1])

# Get a row
print(x[1])

# Get first column
print(x[:, 0])

# Get sub-matrix
print(x[0:2, 1:3])

tensor(20)
tensor([40, 50, 60])
tensor([10, 40])
tensor([[20, 30],
        [50, 60]])


**3. NumPy ↔ PyTorch Conversion**

 Convert NumPy Array to PyTorch Tensor

In [None]:
import numpy as np

In [None]:
numpy_array = np.array([1, 2, 3])
torch_tensor = torch.from_numpy(numpy_array)

print('numpy array:', numpy_array)
print(torch_tensor)
print(torch_tensor.dtype)

numpy array: [1 2 3]
tensor([1, 2, 3])
torch.int64


 Convert PyTorch Tensor to NumPy

In [None]:
new_np_array = torch_tensor.numpy()
print(new_np_array)     # [1 2 3]
print(type(new_np_array))    # <class 'numpy.ndarray'>

[1 2 3]
<class 'numpy.ndarray'>
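
One detail worth knowing: torch.from_numpy() does not copy the data, so the tensor and the array share the same memory. The sketch below (variable names are illustrative) contrasts it with torch.tensor(), which makes an independent copy.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # makes an independent copy

arr[0] = 100                     # modify the NumPy array in place

print(shared)   # first element is now 100
print(copied)   # first element is still 1
```

If the tensor and the array must not affect each other, copying explicitly (as torch.tensor() does here) is the safer choice.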


**4. Tensor Reshaping**

Changing Tensor Shape

In [None]:
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(x.shape)

torch.Size([2, 3])


 .view() vs .reshape()

In [None]:
# Change shape (2x3 → 3x2)
x_view = x.view(3, 2)
print('x_view:', x_view.shape)

# Alternative method
x_reshape = x.reshape(3, 2)
print('x_reshape:', x_reshape.shape)

x_view: torch.Size([3, 2])
x_reshape: torch.Size([3, 2])


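Key difference: **.view()** always returns a view that shares memory with the original tensor, and raises an error if the tensor's memory layout does not allow it (for example, after a transpose). **.reshape()** returns a view when possible and makes a copy otherwise.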
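
The difference only shows up on tensors that are not contiguous in memory. As a minimal sketch, a transposed tensor is used here as the non-contiguous case; the variable names are illustrative.

```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]])
xt = x.t()                       # transpose: same data, non-contiguous layout
print(xt.is_contiguous())        # False

try:
    xt.view(6)                   # view cannot reinterpret this layout
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat = xt.reshape(6)             # reshape falls back to copying the data
print(flat)                      # tensor([1, 4, 2, 5, 3, 6])

# A view shares memory with its source; the reshape above made a copy
v = x.view(6)
v[0] = 99
print(x[0, 0].item())            # 99
print(flat[0].item())            # 1 (unaffected)
```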

**.squeeze() & .unsqueeze()** : These remove or add dimensions of size 1 (for example, [1, 3] → [3] or [3] → [1, 3]).

**.squeeze() → Removes dimensions of size 1**

In [None]:
x = torch.tensor([[[1, 2, 3]]])    # Shape: [1, 1, 3]
print(x.shape)              # Output: torch.Size([1, 1, 3])

x_squeezed = x.squeeze()
print(x_squeezed.shape)   # Output: torch.Size([3]) (removes [1, 1])

torch.Size([1, 1, 3])
torch.Size([3])


**.unsqueeze() → Adds a new dimension**

In [None]:
y = torch.tensor([1, 2, 3])
print(y.shape)

y_unsqueezed = y.unsqueeze(0)      # Adds a new batch dimension
print(y_unsqueezed.shape)          # Output: torch.Size([1, 3])

y_unsqueezed = y.unsqueeze(1)      # Adds a new column dimension
print(y_unsqueezed.shape)          # Output: torch.Size([3, 1])

torch.Size([3])
torch.Size([1, 3])
torch.Size([3, 1])


Key Differences and When to Use Them:

**squeeze()** removes dimensions of size 1. Use it when you want to simplify a tensor by getting rid of unnecessary single-sized dimensions.

**unsqueeze()** adds a dimension of size 1. Use it when you need to add a dimension that is required by a function or model, such as a batch dimension.
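
Calling squeeze() without arguments removes every size-1 dimension, which can also drop a batch dimension of size 1 by accident. A short sketch of the more targeted form, where a dimension index is passed explicitly:

```python
import torch

x = torch.zeros(1, 1, 3)

print(x.squeeze(0).shape)     # torch.Size([1, 3]): only dim 0 is removed
print(x.squeeze(2).shape)     # torch.Size([1, 1, 3]): dim 2 has size 3, so nothing changes
print(x.unsqueeze(-1).shape)  # torch.Size([1, 1, 3, 1]): negative indices count from the end

# squeeze() followed by matching unsqueeze() calls restores the original shape
restored = x.squeeze().unsqueeze(0).unsqueeze(0)
print(restored.shape)         # torch.Size([1, 1, 3])
```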

**Broadcasting**

In [None]:
a = torch.tensor([[1], [2], [3]])  # Shape: [3, 1]
b = torch.tensor([4, 5, 6])        # Shape: [3] -> treated as [1, 3]

print(a + b)

tensor([[5, 6, 7],
        [6, 7, 8],
        [7, 8, 9]])


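PyTorch compares the shapes starting from the last dimension and automatically **expands dimensions of size 1 (or missing dimensions) to match the other tensor**. Here both tensors are expanded to shape [3, 3] before the addition.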
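
To see which shapes are compatible without allocating any data, torch.broadcast_shapes() can be used. The sketch below also shows a pair of shapes that cannot be broadcast and one way to fix it; the tensors are illustrative.

```python
import torch

print(torch.broadcast_shapes((3, 1), (3,)))        # torch.Size([3, 3])
print(torch.broadcast_shapes((2, 1, 4), (3, 1)))   # torch.Size([2, 3, 4])

a = torch.ones(3, 2)
b = torch.ones(3)
try:
    a + b   # trailing dims are 2 and 3: neither is 1, so this fails
except RuntimeError as err:
    print("cannot broadcast:", type(err).__name__)

# Adding an explicit size-1 dimension makes the shapes compatible
print((a + b.unsqueeze(1)).shape)   # torch.Size([3, 2])
```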

**Autograd (Automatic Differentiation)**

Autograd is a core feature in PyTorch that allows automatic calculation of gradients. This is crucial for training models, as gradients are used to adjust weights during optimization.

**requires_grad – Tracking gradients for a tensor**

**backward() function - Computing the gradients**




The **requires_grad** attribute tells PyTorch to track operations on a tensor so that it can compute gradients later when performing the backward pass.


The **backward()** function computes the derivatives (gradients) of tensors that require gradients. These gradients are stored in the .grad attribute of the tensor.


In [None]:
# Create a tensor with requires_grad=True to track computations on it
x = torch.tensor([2.0, 3.0], requires_grad=True)

# Perform some operations on it
y = x**2 + 3*x + 1       # y = x^2 + 3x + 1

print(x.requires_grad)   # True
print(y.requires_grad)   # Output: True, because y depends on x

# backward() needs a scalar (a single number) to start backpropagation,
# but y is a vector. Summing its elements gives a scalar "loss".
loss = y.sum()

# Compute the gradients of loss with respect to x
loss.backward()

# Access gradients
print(x.grad)  # Output: tensor([7., 9.]), i.e. d(loss)/dx = 2x + 3 for each element of x

True
True
tensor([7., 9.])


Note: by default, the **backward()** function expects the tensor you call it on to be a scalar (a single number).

y = x**2 + 3*x + 1 is not a single value (scalar) if x is a tensor with more than one element. It is a tensor of the same shape as x, which is why the example sums it before calling backward().
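
Summing is not the only option. As a sketch, backward() also accepts a `gradient` argument with one weight per element of the output; passing a tensor of ones gives the same result as summing first.

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = x**2 + 3*x + 1

try:
    y.backward()                      # y has two elements, so this raises
except RuntimeError as err:
    print("backward failed:", type(err).__name__)

# Pass a weight for each element of y.
# Using ones is equivalent to calling y.sum().backward().
y.backward(gradient=torch.ones_like(y))
print(x.grad)                         # tensor([7., 9.])
```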

**detach() – Disconnecting from the computation graph**

Sometimes, you may want to stop a tensor from tracking gradients. This can be useful when you don’t want to compute gradients for a specific tensor (e.g., during inference).

The detach() method returns a new tensor that shares data with the original tensor but doesn’t track gradients.



In [None]:
# Create a tensor with requires_grad=True
a = torch.tensor([2.0], requires_grad=True)

# Perform some operations
b = a * 2

# Now, detach 'b' from the computation graph
b_detached = b.detach()


# 'b' tracks gradients, but 'b_detached' doesn't
print(b.requires_grad)           # True
print(b_detached.requires_grad)  # False


True
False


**torch.no_grad() – No gradient tracking for inference**

During inference there is no need to compute gradients, since you're only making predictions. The torch.no_grad() context manager temporarily disables gradient tracking, saving memory and speeding up computations.

This is useful during evaluation or testing when we don’t need gradients.


In [None]:
# Create a tensor with requires_grad=True
x = torch.tensor([2.0, 3.0], requires_grad=True)

# Inside torch.no_grad(), PyTorch does not track operations
with torch.no_grad():
  y = x**2 + 3*x + 1

  print(x.requires_grad)    # True
  print(y.requires_grad)    # No gradient tracking here.  Output: False

True
False


**NN Module (Neural Networks)**

In PyTorch, models are typically built by subclassing torch.nn.Module, which keeps track of the network's layers and parameters and defines the forward pass.

**torch.nn.Module – Base class for all models**

To define a custom model, you subclass torch.nn.Module and implement the forward() method, which defines how input data passes through the network.

In [None]:
import torch
import torch.nn as nn

In [None]:
# Define a simple feedforward neural network (FFNN)
class SimpleNN(nn.Module):
  def __init__(self):
    super(SimpleNN, self).__init__()

    # Define layers (Fully connected layer)
    self.fc1 = nn.Linear(2, 4)     # Input: 2 features, Output: 4 features
    self.fc2 = nn.Linear(4, 1)     # Input: 4 features, Output: 1 feature


  def forward(self, x):
    # Forward pass through the network (data flows through layers)
    x = torch.relu(self.fc1(x))     # Apply ReLU activation after the first layer
    x = self.fc2(x)   # Second layer
    return x

# Create an instance of the model
model = SimpleNN()

# Example input (2 features)
input_data = torch.tensor([1.0, 2.0])

# Get the model's output (forward pass)
output = model(input_data)
print(output)


tensor([0.1676], grad_fn=<ViewBackward0>)


**torch.nn.Parameter – Defining parameters in the model**


torch.nn.Parameter is a special type of tensor used to define learnable parameters (like weights and biases) inside a model. PyTorch automatically tracks these parameters for gradient computation.

In [None]:
# Manually defining a learnable parameter
class CustomLayer(nn.Module):
  def __init__(self):
    super(CustomLayer, self).__init__()
    self.my_param = nn.Parameter(torch.randn(3, 3))   # 3x3 learnable matrix

  def forward(self, x):
    return torch.matmul(x, self.my_param)

layer = CustomLayer()
print(layer.my_param)


Parameter containing:
tensor([[-0.2248,  0.8450, -0.0895],
        [-0.5436,  1.0730, -0.6326],
        [-0.2650, -1.7898, -0.8515]], requires_grad=True)


**self.my_param** is a learnable parameter in the model. An optimizer created from the model's parameters() will update it during training.
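
What makes nn.Parameter special is registration: only tensors wrapped in it show up in parameters() and are therefore seen by an optimizer. A minimal sketch, using a hypothetical `Demo` module that holds one parameter and one plain tensor:

```python
import torch
import torch.nn as nn

class Demo(nn.Module):   # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(3, 3))   # registered as a parameter
        self.scale = torch.ones(3)                       # plain tensor: not registered

    def forward(self, x):
        return torch.matmul(x, self.weight) * self.scale

demo = Demo()
names = [name for name, _ in demo.named_parameters()]
print(names)   # ['weight']
```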

**torch.nn.Sequential – Simplified model definition**

**nn.Sequential** allows you to build models by stacking layers in a sequential order without having to explicitly define the forward() function.

In [None]:
# Define a simple model using Sequential
model = nn.Sequential(
    nn.Linear(2, 4),    # Input: 2 features, Output: 4 features
    nn.ReLU(),          # ReLU activation
    nn.Linear(4, 1)      # Input: 4 features, Output: 1 feature
)


# Example input
input_data = torch.tensor([1.0, 2.0])

# Forward pass through the model
output = model(input_data)
print(output)

tensor([-0.4216], grad_fn=<ViewBackward0>)


With Sequential, the layers are automatically applied in the order you define them. It's a quick way to define simple models.
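
The examples above pass a single sample, but layers such as nn.Linear operate on the last dimension, so the same model also accepts a batch. A short sketch with an arbitrary batch of 8 samples:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1))

batch = torch.randn(8, 2)    # 8 samples, 2 features each
out = model(batch)
print(out.shape)             # torch.Size([8, 1])

# Sequential supports len() and indexing, which helps when inspecting a model
print(len(model))            # 3
print(model[0])              # Linear(in_features=2, out_features=4, bias=True)
```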

**torch.nn.functional – Functional API for operations**

torch.nn.functional provides a collection of functions (like activation functions, loss functions, etc.) that don’t create new layers or parameters but can still be used inside forward() function.

Example: Using ReLU activation and cross-entropy loss from torch.nn.functional:

In [None]:
import torch.nn.functional as F

# Example input
x = torch.tensor([[-1.0, 2.0], [1.0, -2.0]])

# Applying ReLU using functional API
x_relu = F.relu(x)
print(x_relu)

# Cross-entropy loss example (for classification)
y_true = torch.tensor([0, 1])  # True labels
y_pred = torch.tensor([[0.2, 0.8], [0.9, 0.1]])  # Raw scores (logits), one row per sample
loss = F.cross_entropy(y_pred, y_true)
print(loss)

tensor([[0., 2.],
        [1., 0.]])
tensor(1.1043)


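In the above example, F.relu() applies the ReLU activation, and F.cross_entropy() computes the cross-entropy loss. Note that F.cross_entropy() expects raw scores (logits) rather than probabilities, because it applies log-softmax internally.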
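
To make the internals of F.cross_entropy() concrete, the sketch below reproduces it in two explicit steps, F.log_softmax() followed by F.nll_loss(), on the same values as the example. The variable names are illustrative.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[0.2, 0.8], [0.9, 0.1]])
targets = torch.tensor([0, 1])

loss = F.cross_entropy(logits, targets)

# The same computation in two explicit steps
log_probs = F.log_softmax(logits, dim=1)
manual = F.nll_loss(log_probs, targets)

print(loss, manual)                   # both are approximately 1.1043
print(torch.allclose(loss, manual))   # True
```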



**Optimizers** in PyTorch

Optimizers are crucial in updating model parameters during training to minimize the loss function. PyTorch provides several optimizers, the most common being SGD (Stochastic Gradient Descent) and Adam.

**torch.optim.SGD – Stochastic Gradient Descent**

SGD is one of the most commonly used optimization algorithms in machine learning. It updates the parameters by computing the gradient of the loss with respect to the model parameters and moving in the opposite direction of the gradient (i.e., minimizing the loss).

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

In [None]:
# Define a simple model
model = nn.Linear(2, 2)  # A simple Linear model

# Define a simple loss function
criterion = nn.MSELoss()

# Define the optimizer: Stochastic Gradient Descent
optimizer = optim.SGD(model.parameters(), lr=0.01)    # learning rate = 0.01

# Dummy data
data = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # Inputs do not need requires_grad
target = torch.tensor([[1.0, 0.0], [0.0, 1.0]])

# Training loop
for epoch in range(100):
  # Zero the gradients before the backward pass
  optimizer.zero_grad()

  # Forward pass: Compute predicted y by passing x to the model
  output = model(data)

  # Compute the loss
  loss = criterion(output, target)

  # Backward pass: Compute gradients
  loss.backward()

  # Update model parameters using optimizer
  optimizer.step()

  # Print loss every 10 epochs
  if epoch % 10 == 0:
    print(f"Epoch [{epoch+1}/100], Loss: {loss.item()}")

Epoch [1/100], Loss: 3.3989672660827637
Epoch [11/100], Loss: 0.26185113191604614
Epoch [21/100], Loss: 0.15574973821640015
Epoch [31/100], Loss: 0.1468612104654312
Epoch [41/100], Loss: 0.14128848910331726
Epoch [51/100], Loss: 0.13601897656917572
Epoch [61/100], Loss: 0.13094893097877502
Epoch [71/100], Loss: 0.12606796622276306
Epoch [81/100], Loss: 0.12136895209550858
Epoch [91/100], Loss: 0.11684507131576538


In [None]:

# Same setup as the SGD example, but using the Adam optimizer
model = nn.Linear(2, 2)

criterion = nn.MSELoss()

optimizer = optim.Adam(model.parameters(), lr=0.01)

data = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # Inputs do not need requires_grad
target = torch.tensor([[1.0, 0.0], [0.0, 1.0]])

for epoch in range(100):
  optimizer.zero_grad()
  output = model(data)
  loss = criterion(output, target)
  loss.backward()
  optimizer.step()

  if epoch % 10 == 0:
    print(f"Epoch [{epoch+1}/100], Loss: {loss.item()}")

Epoch [1/100], Loss: 7.031500339508057
Epoch [11/100], Loss: 4.196232795715332
Epoch [21/100], Loss: 2.24301815032959
Epoch [31/100], Loss: 1.0958170890808105
Epoch [41/100], Loss: 0.5368722081184387
Epoch [51/100], Loss: 0.3110317587852478
Epoch [61/100], Loss: 0.23197989165782928
Epoch [71/100], Loss: 0.20627081394195557
Epoch [81/100], Loss: 0.19760601222515106
Epoch [91/100], Loss: 0.19300217926502228


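Key differences between SGD and Adam:

**SGD** applies the same learning rate to every parameter (fixed unless you add a scheduler). It can converge slowly and can sometimes stall on plateaus or in poor local minima.

**Adam** adapts the step size per parameter using running estimates of the first and second moments of the gradients, which often gives faster and more stable convergence.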
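
For plain SGD (no momentum or weight decay, as in the example above), the update is simply `param = param - lr * grad`. The following sketch checks one optimizer step against that formula; the seed and the `expected` list are illustrative.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(2, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01)

data = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[1.0, 0.0], [0.0, 1.0]])

loss = nn.MSELoss()(model(data), target)
loss.backward()

# What plain SGD is expected to produce: param - lr * grad
expected = [p.detach() - 0.01 * p.grad for p in model.parameters()]

optimizer.step()

results = [torch.allclose(p.detach(), e) for p, e in zip(model.parameters(), expected)]
print(results)   # [True, True] (weight and bias)
```

Adam's update is more involved because it keeps per-parameter running averages, so it cannot be reproduced with a one-line formula like this.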

**Learning Rate Scheduling (lr_scheduler)**

Learning rate scheduling adjusts the learning rate during training to improve convergence. A common pattern is to start with a larger rate for fast initial progress and lower it after a certain number of epochs so the optimizer can settle into a minimum.


PyTorch provides several learning rate scheduling techniques, such as reducing the learning rate by a factor after a certain number of epochs or based on validation performance.



In [None]:
# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Define a learning rate scheduler
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# Training loop with learning rate scheduling
for epoch in range(100):
  optimizer.zero_grad()
  output = model(data)
  loss = criterion(output, target)
  loss.backward()
  optimizer.step()

  # Step the scheduler to update the learning rate
  scheduler.step()

  if epoch % 10 == 0:
    print(f"Epoch: [{epoch+1}/100] Loss: {loss.item()} Learning Rate: {scheduler.get_last_lr()[0]}")

    # get_last_lr() returns the most recent learning rate set by the scheduler,
    # i.e. the rate the optimizer will use next. Use it rather than get_lr() for logging.

Epoch: [1/100] Loss: 1.8784157873596996e-07 Learning Rate: 0.01
Epoch: [11/100] Loss: 0.0005542826256714761 Learning Rate: 0.01
Epoch: [21/100] Loss: 0.00010649542673490942 Learning Rate: 0.01
Epoch: [31/100] Loss: 2.3619895728188567e-06 Learning Rate: 0.001
Epoch: [41/100] Loss: 3.98743122786982e-06 Learning Rate: 0.001
Epoch: [51/100] Loss: 1.3967067502562713e-07 Learning Rate: 0.001
Epoch: [61/100] Loss: 3.4494672718210495e-07 Learning Rate: 0.0001
Epoch: [71/100] Loss: 1.9325361222399806e-07 Learning Rate: 0.0001
Epoch: [81/100] Loss: 9.005836432152137e-08 Learning Rate: 0.0001
Epoch: [91/100] Loss: 5.0339355794903895e-08 Learning Rate: 1e-05


StepLR(optimizer, step_size, gamma): This scheduler reduces the learning rate by a factor of gamma every step_size epochs.

scheduler.step(): Updates the learning rate after each epoch.

scheduler.get_last_lr(): Returns the most recently computed learning rate(s), one per parameter group.
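
The StepLR schedule has a simple closed form: during epoch `e` the rate is `base_lr * gamma ** (e // step_size)`. The sketch below compares that formula with the rate stored in the optimizer; it uses SGD and performs no real training, and the variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

base_lr, step_size, gamma = 0.01, 30, 0.1
matches = []
for epoch in range(100):
    # Closed form for the rate in effect during this epoch
    expected = base_lr * gamma ** (epoch // step_size)
    actual = optimizer.param_groups[0]["lr"]
    matches.append(abs(actual - expected) < 1e-12)

    optimizer.step()      # no gradients were computed, so parameters are unchanged
    scheduler.step()

print(all(matches))   # True
```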


**Model Saving and Loading**

In PyTorch, saving and loading models is straightforward using torch.save() and torch.load().

**Saving a model (torch.save())**

There are two ways to save models:

Saving the model state_dict (recommended; saves only the model's learned parameters).

Saving the entire model (including structure and parameters).

In [None]:
# Save model state_dict
torch.save(model.state_dict(), "model_state_dict.pth")


**Loading a model (torch.load())**

To load the saved model, we first need to initialize the model structure and then load the saved state_dict.

In [None]:
# Define the model (same as when it was saved)
model = nn.Linear(2, 2)

# Load the saved state_dict
model.load_state_dict(torch.load("model_state_dict.pth"))

# Set the model to evaluation mode (important for inference)
model.eval()
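
A quick way to confirm that saving and loading worked is to compare the parameters before and after. This sketch writes to a temporary directory so it leaves no files behind; the `restored` and `same` names are illustrative.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(2, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_state_dict.pth")
    torch.save(model.state_dict(), path)

    restored = nn.Linear(2, 2)              # fresh model with different random weights
    restored.load_state_dict(torch.load(path))

restored.eval()
same = all(
    torch.equal(p, q)
    for p, q in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)   # True
```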


**Data Handling**

torch.utils.data.Dataset & DataLoader

**torch.utils.data.Dataset – Base class for datasets**

The Dataset class is the starting point for working with any dataset in PyTorch. You subclass Dataset to define how your data is loaded and accessed.

The two key methods to implement when subclassing Dataset are:

__len__() – Returns the size of the dataset (total number of samples).

__getitem__() – Fetches a single sample from the dataset.

In [6]:
import torch
from torch.utils.data import Dataset

In [14]:
# A simple custom dataset class
class SimpleDataset(Dataset):
  def __init__(self):
    # Example data (input, target)
    self.data = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]])
    self.labels = torch.tensor([0, 1, 0, 1])

  def __len__(self):
    # Return the number of samples
    return len(self.data)

  def __getitem__(self, idx):
    # Return a sample and its label
    return self.data[idx], self.labels[idx]

# Create dataset instance
dataset = SimpleDataset()

# Access data
print(len(dataset))   # Output: 4 (number of samples)
print(dataset[0])     # Output: (tensor([1, 2]), tensor(0)) (first sample)

4
(tensor([1, 2]), tensor(0))


**DataLoader – Efficient batching and data shuffling**

DataLoader is responsible for batching, shuffling, and loading data in parallel using multiple workers. It can take in a Dataset and return batches of data for training.

In [15]:
from torch.utils.data import DataLoader

# Create DataLoader instance
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# Iterate through the DataLoader
for batch_data, batch_labels in dataloader:
  print(batch_data, batch_labels)

tensor([[5, 6],
        [1, 2]]) tensor([0, 0])
tensor([[3, 4],
        [7, 8]]) tensor([1, 1])


**batch_size=2**: Means each batch will contain 2 samples.

**shuffle=True**: Data will be shuffled before being batched.

**Custom Dataset & collate_fn**

Sometimes, you need to create a custom dataset where the standard behavior of Dataset and DataLoader doesn't fit exactly, such as when you need to customize how batches are combined or handle variable-length sequences.

**Creating a Custom Dataset**

Let’s say you're working with text data where each word has a different length. You can use the collate_fn to handle custom batching logic.

In [16]:
from torch.utils.data import Dataset, DataLoader
import torch
import numpy as np

In [21]:
# Custom dataset with variable-length data
class CustomTextDataset(Dataset):
  def __init__(self):
    # Example text data with different lengths
    self.data = ['hello', 'world', 'this', 'is', 'a', 'custom', 'dataset']

  def __len__(self):
    return len(self.data)

  def __getitem__(self, idx):
    # Convert each word to a tensor of character indices
    return torch.tensor([ord(c) for c in self.data[idx]])

# Custom collate function to handle padding for variable-length sequences
def collate_fn(batch):
  # Pad sequences with zeros to the length of the longest one in the batch
  max_len = max(len(item) for item in batch)
  # Keep the padding in the same integer dtype as the character indices
  padded_batch = [torch.cat([item, torch.zeros(max_len - len(item), dtype=item.dtype)]) for item in batch]
  return torch.stack(padded_batch)


# Create the dataset
dataset = CustomTextDataset()

# Use DataLoader with custom collate_fn
dataloader = DataLoader(dataset, batch_size=2, collate_fn=collate_fn)

# Iterate through the DataLoader
for batch in dataloader:
  print(batch)

tensor([[104, 101, 108, 108, 111],
        [119, 111, 114, 108, 100]])
tensor([[116, 104, 105, 115],
        [105, 115,   0,   0]])
tensor([[ 97,   0,   0,   0,   0,   0],
        [ 99, 117, 115, 116, 111, 109]])
tensor([[100,  97, 116,  97, 115, 101, 116]])


Here, **collate_fn** is used to pad variable-length sequences (words) to the same length in each batch. Without it, PyTorch would throw an error due to mismatched sequence lengths.



**__init__**: Just stores the text data.

**__len__**: Returns the number of words.

**__getitem__**: This is the key. It converts each word into a tensor of character indices (ASCII values). For example, "hello" becomes tensor([104, 101, 108, 108, 111]). This numerical representation is what PyTorch can work with.
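
PyTorch also ships a helper for this kind of padding, torch.nn.utils.rnn.pad_sequence. As a sketch of an alternative to the hand-written padding, assuming zero is an acceptable padding value, it can be applied to a list of variable-length tensors like this (the word list and `lengths` are illustrative):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

words = ["is", "this"]
batch = [torch.tensor([ord(c) for c in w]) for w in words]

# Pad every sequence to the length of the longest one
padded = pad_sequence(batch, batch_first=True, padding_value=0)
print(padded.shape)   # torch.Size([2, 4])
print(padded.dtype)   # torch.int64

# Keeping the original lengths lets later code ignore the padding
lengths = torch.tensor([len(item) for item in batch])
print(lengths)        # tensor([2, 4])
```

Wrapped in a function, the pad_sequence call could be passed to DataLoader as collate_fn in the same way as the custom function above.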

**torchvision.transforms for Data Augmentation**

In Computer Vision (CV), data augmentation is crucial for increasing the diversity of the training data by applying random transformations to the input images. torchvision.transforms provides common image transformations.

**Common Transformations in torchvision.transforms**

**Resizing**: Resize images to a fixed size.

**Random Cropping**: Randomly crop a portion of the image.

**Random Horizontal Flip**: Flip the image randomly.

**Normalization**: Normalize pixel values (mean and standard deviation).

In [22]:
from torchvision import transforms
from PIL import Image

In [26]:
# Example image (use your own image here)
img = Image.open('/content/cars.jpg')

# Define a sequence of transformations
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Apply transformations to the image
img_transformed = transform(img)

print(img_transformed.shape)     # Example output: torch.Size([3, 128, 128])

torch.Size([3, 128, 128])


**Using Data Augmentation with DataLoader**

You can apply transformations directly to datasets during data loading by passing them to the transform argument of a Dataset. Here's how you'd do that:

In [27]:
from torchvision import datasets

# Define transformations to apply to each image
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load a dataset (e.g., CIFAR-10) and apply transformations
dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

# Create DataLoader to batch and shuffle data
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Iterate through the DataLoader
for images, labels in dataloader:
  print(images.shape)   # Example output: torch.Size([32, 3, 128, 128])
  break   # Just show one batch

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170M/170M [00:01<00:00, 94.6MB/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
torch.Size([32, 3, 128, 128])


**GPU Acceleration in PyTorch**

**to(device) – Moving tensors/models to a specific device (GPU or CPU)**

PyTorch allows you to run operations on either CPU or GPU. The GPU (Graphics Processing Unit) accelerates computations, especially for deep learning tasks. To move a tensor or model to the GPU, you use the to() method, specifying the device.

In [1]:
import torch

# Check if GPU is available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor on the CPU
x = torch.tensor([1.0, 2.0, 3.0])

# Move tensor to GPU if available
x = x.to(device)
print(x.device)   # Output: cuda:0 if GPU is used, cpu if not

cuda:0


**torch.device("cuda")** refers to the **GPU**.

**torch.device("cpu")** refers to the **CPU**.

**cuda() & cpu() – Moving tensors/models between CPU and GPU**

You can also use **.cuda()** to move tensors to the GPU and **.cpu()** to move tensors back to the **CPU**. Moving back to the CPU is needed, for example, before converting a tensor to a NumPy array. Note that .cuda() raises an error when no GPU is available, so .to(device) is the more portable choice.

In [4]:
# Tensor on CPU
tensor_cpu = torch.tensor([1.0, 2.0, 3.0])

# Move tensor to GPU
tensor_gpu = tensor_cpu.cuda()
print(tensor_gpu.device)   # Output: cuda:0

# Move tensor back to CPU
tensor_cpu_back = tensor_gpu.cpu()
print(tensor_cpu_back.device)   # Output: cpu

cuda:0
cpu
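
In practice the model and its inputs must live on the same device. A device-agnostic sketch that runs with or without a GPU (the layer sizes and values are arbitrary):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(3, 1).to(device)                     # moves the parameters
inputs = torch.tensor([[1.0, 2.0, 3.0]]).to(device)   # moves the data

with torch.no_grad():
    output = model(inputs)

print(output.device.type == device.type)   # True

# NumPy only works with CPU memory, so move the result back first
result = output.cpu().numpy()
print(result.shape)   # (1, 1)
```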


**torch.cuda.amp – Mixed Precision for faster training**

Mixed Precision Training uses both 16-bit (half-precision) and 32-bit (single-precision) floating-point numbers during training. It can speed up training and reduce memory usage, usually without a noticeable loss in model accuracy. PyTorch’s Automatic Mixed Precision (AMP) provides an easy way to enable mixed precision.

In [8]:
import torch
from torch.cuda.amp import autocast, GradScaler

# Create a model and move it to GPU
model = torch.nn.Linear(2, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Initialize GradScaler for mixed precision
scaler = GradScaler()

# Dummy data and target
data = torch.randn(4, 2).cuda()
target = torch.randn(4, 2).cuda()

# Training loop with mixed precision
for epoch in range(100):
  optimizer.zero_grad()

  # Automatically use mixed precision for the forward pass
  with autocast():
    output = model(data)
    loss = torch.nn.functional.mse_loss(output, target)

  # Scale the loss, then run the backward pass
  scaler.scale(loss).backward()

  # Step the optimizer with scaler to handle gradients
  scaler.step(optimizer)
  scaler.update()

  if epoch % 10 == 0:
    print(f"Epoch {epoch}, Loss: {loss.item()}")



Epoch 0, Loss: 1.6919735670089722
Epoch 10, Loss: 1.1706044673919678
Epoch 20, Loss: 0.8119431734085083
Epoch 30, Loss: 0.5912909507751465
Epoch 40, Loss: 0.4518594443798065
Epoch 50, Loss: 0.36078619956970215
Epoch 60, Loss: 0.2986535429954529
Epoch 70, Loss: 0.2544364929199219
Epoch 80, Loss: 0.2213820219039917
Epoch 90, Loss: 0.19566218554973602


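**autocast()**: This context manager runs eligible operations in lower precision (typically FP16 on GPUs) and keeps the remaining operations in FP32.

**GradScaler()**: Scales the loss before the backward pass, which scales the gradients by the same factor and prevents them from underflowing in FP16.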

Explanation:

**Model and Optimizer:**

**model = torch.nn.Linear(2, 2).cuda()**:

Creates a simple linear model and moves it to the GPU using .cuda(). Mixed precision training is most effective on GPUs.

**optimizer = torch.optim.SGD(...)**:

Creates the optimizer (SGD in this case).

**GradScaler**:

**scaler = GradScaler()**:

Initializes the GradScaler. The GradScaler is essential for mixed precision training. It handles the scaling of the loss to prevent underflow issues that can occur when using lower precision.

**Data and Target:**

**data = torch.randn(4, 2).cuda() and target = torch.randn(4, 2).cuda()**:

Creates dummy input data and target values and moves them to the GPU.

**Training Loop:**

The training loop iterates for the specified number of epochs.

**Autocast (Mixed Precision Forward Pass)**:

**with autocast()**:

This context manager enables mixed precision for the code within its block. PyTorch will automatically cast some operations to lower precision (typically FP16 or BF16) to speed up computation.

**output = model(data) and loss = ...**:

The forward pass and loss calculation are performed within the autocast context.

**Loss Scaling and Backward Pass:**

**scaler.scale(loss).backward()**:

This is crucial for mixed precision. The scaler.scale(loss) part scales the loss up. This helps prevent gradients from becoming too small in lower precision, which can lead to underflow. The .backward() call then computes the gradients for the scaled loss.

**Optimizer Step and Scaler Update:**

**scaler.step(optimizer):**

scaler.step(optimizer) unscales the gradients before the optimizer updates the parameters. If the gradients contain inf or NaN values, it skips the update for that iteration.

**scaler.update():** This updates the scaling factor used by the GradScaler for the next iteration. It lowers the factor when inf or NaN gradients were found and raises it again after a run of successful steps, which maintains numerical stability.
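
The underflow problem that loss scaling addresses can be shown with plain tensors, without a GPU. In this sketch the gradient value 1e-8 and the scale factor 2**16 are assumptions chosen for illustration, not values taken from the training run above.

```python
import torch

tiny_grad = 1e-8        # hypothetical gradient value, below the smallest FP16 number
scale = 65536.0         # 2**16, an example scale factor

# Stored directly in FP16, the value underflows to zero
unscaled = torch.tensor(tiny_grad, dtype=torch.float16)
print(unscaled)                 # tensor(0., dtype=torch.float16)

# Scaled first, the value is large enough to be represented in FP16
scaled = torch.tensor(tiny_grad * scale, dtype=torch.float16)
print(scaled.item() > 0)        # True

# Unscale in FP32, as happens before the optimizer step
recovered = scaled.float() / scale
print(recovered.item())         # close to 1e-8
```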

**Model Deployment & Debugging**

**torch.jit.trace – JIT Compilation for optimizing models**

TorchScript is an intermediate representation of a PyTorch model, and JIT (Just-In-Time) compilation lets you optimize the model for deployment. torch.jit.trace runs the model once on an example input and records the operations that were executed, producing a TorchScript version that can be saved and run without the original Python class.

In [9]:
import torch

# Define a simple model
class SimpleModel(torch.nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = torch.nn.Linear(2, 2)

    def forward(self, x):
        return self.fc(x)

# Instantiate model and move it to GPU
model = SimpleModel().cuda()

# Create example input (matching the model input size)
input_data = torch.randn(1, 2).cuda()

# Trace the model with the input data
traced_model = torch.jit.trace(model, input_data)

# Save the traced model
traced_model.save("traced_model.pt")

# Load and run the traced model for inference
loaded_model = torch.jit.load("traced_model.pt")
output = loaded_model(input_data)
print(output)


tensor([[-0.7818, -0.0381]], device='cuda:0', grad_fn=<AddmmBackward0>)


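torch.jit.trace(): Traces the model to create a TorchScript version of it, which can be optimized and deployed independently of Python. Because tracing records only the operations executed for the example input, data-dependent control flow (such as an if statement on tensor values) is not captured.
The model can then be saved, loaded, and run wherever the PyTorch runtime is available (for example, from C++ via LibTorch), without needing a Python interpreter.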
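
One limitation of tracing is worth seeing in action: a Python branch that depends on tensor values is frozen at trace time. The sketch below uses a hypothetical function `branchy` and runs on the CPU; warnings are silenced only because tracing warns about exactly this situation.

```python
import warnings

import torch

def branchy(x):   # hypothetical function with data-dependent control flow
    if x.sum() > 0:
        return x * 2
    return x * -1

positive = torch.tensor([1.0, 2.0])
negative = torch.tensor([-1.0, -2.0])

with warnings.catch_warnings():
    warnings.simplefilter("ignore")          # tracing warns about the Python branch
    traced = torch.jit.trace(branchy, positive)

print(branchy(negative))   # tensor([1., 2.]): eager code takes the second branch
print(traced(negative))    # tensor([-2., -4.]): the trace still multiplies by 2
```

For models with this kind of branching, torch.jit.script is the usual alternative, since it compiles the control flow instead of recording one execution path.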

**torch.profiler – Performance Analysis for Debugging and Optimization**

To analyze and optimize the performance of your PyTorch model, you can use torch.profiler. It helps you understand where the bottlenecks are during training, so you can focus on improving the critical parts.

In [10]:
import torch
import torch.profiler

# Define a simple model
class SimpleModel(torch.nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = torch.nn.Linear(2, 2)

    def forward(self, x):
        return self.fc(x)

# Instantiate model
model = SimpleModel().cuda()

# Create example input
input_data = torch.randn(1, 2).cuda()

# Profile 100 forward passes
with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA],
    record_shapes=True,
    profile_memory=True,
    with_stack=True
) as prof:
    for _ in range(100):
        model(input_data)  # Forward pass

# Export the profile to Chrome Trace format
prof.export_chrome_trace("model_profile.json")
# Print profiling results
print(prof.key_averages().table(sort_by="cpu_time_total"))


-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg       CPU Mem  Self CPU Mem      CUDA Mem  Self CUDA Mem    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                           aten::linear        18.96%       1.541ms        99.74%       8.109ms      81.090us       0.000us         0.00%     427.412us       4.274us           0 b           0 b      50.00 Kb           0 

**torch.profiler.profile()**: This context manager profiles the execution of your model, both on CPU and GPU.

**key_averages().table()**: Displays the profiling results in a table, showing which operations took the most time.