# Introduction to PyTorch

Welcome to the `01_intro_to_pytorch` notebook. This is part of a portfolio designed to showcase foundational PyTorch concepts and techniques that will be utilized in later projects. 

Here, I cover essential topics such as setting up the environment, working with tensors, leveraging GPU acceleration, and implementing automatic differentiation. Through various exercises, this notebook will show how to create and manipulate tensors, build and train simple neural networks, and evaluate model performance. 

This notebook lays the groundwork for more advanced PyTorch applications in subsequent projects.

Also, keep in mind that these notebooks follow a "question-and-answer" format for active-learning purposes. Instead of just presenting explanatory code, I try to actively recall (or look up) the answer to each problem I face, which could be as simple as loading libraries or as complex as fine-tuning models.

## What is PyTorch?

PyTorch is an open-source deep learning framework originally developed by Facebook's (now Meta's) AI Research lab. It provides a flexible and intuitive platform for building and training neural networks.
PyTorch's key features include dynamic computation graphs, which are built on the fly as the code runs and therefore let you write and debug models with ordinary Python control flow and tools, and support for GPU acceleration, which enables much faster computation.
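
To make "dynamic computation graph" concrete, here is a minimal sketch (the function `scale_until_large` is a hypothetical example, not part of PyTorch): the number of loop iterations depends on the tensor's values, and autograd still differentiates through whatever actually ran.

```python
import torch

def scale_until_large(x):
    # Python control flow that depends on tensor values: the number of loop
    # iterations (and therefore the recorded graph) is decided at run time
    y = x
    while y.norm() < 10:
        y = y * 2
    return y.sum()

x = torch.tensor([1.0, 2.0], requires_grad=True)
out = scale_until_large(x)  # the loop runs 3 times for this input, so out = sum(8 * x)
out.backward()
print(out)     # tensor(24., grad_fn=<SumBackward0>)
print(x.grad)  # tensor([8., 8.])
```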

With its extensive library of tools and utilities, PyTorch is widely used for both research and production in machine learning and artificial intelligence projects. These projects include:

- **Natural Language Processing (NLP)**: Building models for text classification, sentiment analysis, and machine translation.
- **Computer vision**: Implementing image classification, object detection, and image generation tasks.
- **Reinforcement learning**: Developing algorithms for game playing and decision-making processes.
- **Generative Adversarial Networks (GANs)**: Generating realistic images, videos, and other synthetic data.
- **Time series analysis**: Forecasting and anomaly detection in sequential data.
- **Speech processing**: Building models for speech recognition (speech to text) and speech synthesis (text to speech).
- **Robotics**: Developing intelligent control systems for robotic movements and actions.
- **Healthcare**: Predictive modeling and medical image analysis for diagnostics and treatment planning.

## Setting up the environment

##### **Q1: How do you install the base PyTorch libraries using a Jupyter notebook?**

In [2]:
# %pip install torch torchvision torchaudio  # uncomment to run; %pip installs into the active kernel's environment

##### **Q2: How do you import the base PyTorch libraries for later use?**

In [3]:
import torch

print(torch.__version__)

2.3.1+cu121


## PyTorch basics

##### **Q3: How do you create a tensor in PyTorch? Provide examples of different ways to create tensors.**

In [4]:
# From a list
tensor_from_list = torch.tensor([1, 2, 3, 4])
print(tensor_from_list)

tensor([1, 2, 3, 4])


In [5]:
# Zeros tensor
zeros_tensor = torch.zeros(3, 3)
print(zeros_tensor)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])


In [6]:
# Ones tensor
ones_tensor = torch.ones(2, 4)
print(ones_tensor)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])


In [7]:
# Random values
random_tensor = torch.rand(3, 2)
print(random_tensor)

tensor([[0.2381, 0.0810],
        [0.4930, 0.4038],
        [0.5550, 0.0527]])


In [8]:
# From a NumPy array
import numpy as np

numpy_array = np.array([[1, 2], [3, 4]])
tensor_from_numpy = torch.tensor(numpy_array)
print(tensor_from_numpy)

tensor([[1, 2],
        [3, 4]], dtype=torch.int32)


In [9]:
# With a specific data type
float_tensor = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float32)
print(float_tensor)

int_tensor = torch.tensor([1, 2, 3], dtype=torch.int32)
print(int_tensor)

tensor([1., 2., 3.])
tensor([1, 2, 3], dtype=torch.int32)


In [10]:
# Uninitialized
uninitialized_tensor = torch.empty(2, 3)
print(uninitialized_tensor)  # Values are whatever was already in the allocated memory; useful for speed when every element will be overwritten anyway

tensor([[3.7516e-01, 1.7530e-42, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])


In [11]:
# Using a range
range_tensor = torch.arange(0, 10, step=2)
print(range_tensor)

tensor([0, 2, 4, 6, 8])


In [12]:
# Using linspace()
linspace_tensor = torch.linspace(0, 1, steps=5)
print(linspace_tensor)

tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])


##### **Q4: How do you perform basic tensor operations such as addition and multiplication?**

In [13]:
# Create two tensors for the exercise
tensor1 = torch.tensor([1, 2, 3])
tensor2 = torch.tensor([4, 5, 6])

# Element-wise addition
result = tensor1 + tensor2
print(result)

tensor([5, 7, 9])


In [14]:
# Using torch.add()
result = torch.add(tensor1, tensor2)
print(result)

tensor([5, 7, 9])


In [15]:
# Element-wise subtraction
result = tensor2 - tensor1
print(result)

tensor([3, 3, 3])


In [16]:
# Using torch.sub()
result = torch.sub(tensor2, tensor1)
print(result)

tensor([3, 3, 3])


In [17]:
# Element-wise multiplication
result = tensor1 * tensor2
print(result)

tensor([ 4, 10, 18])


In [18]:
# Using torch.mul()
result = torch.mul(tensor1, tensor2)
print(result)

tensor([ 4, 10, 18])


In [19]:
# Element-wise division
result = tensor2 / tensor1
print(result)

tensor([4.0000, 2.5000, 2.0000])


In [20]:
# Using torch.div()
result = torch.div(tensor2, tensor1)
print(result)

tensor([4.0000, 2.5000, 2.0000])


In [21]:
# Examples for matrix operations
tensor1 = torch.tensor([[1, 2], [3, 4]])
tensor2 = torch.tensor([[5, 6], [7, 8]])

# Matrix multiplication
result = torch.matmul(tensor1, tensor2)
print(result)

tensor([[19, 22],
        [43, 50]])


In [22]:
# Using the @ operator
result = tensor1 @ tensor2
print(result)

tensor([[19, 22],
        [43, 50]])


In [23]:
# Broadcasting (i.e., arithmetic operations on tensors of different shapes)
tensor1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
tensor2 = torch.tensor([1, 2, 3])

result = tensor1 + tensor2
print(result)

tensor([[2, 4, 6],
        [5, 7, 9]])
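
The broadcasting rule behind the cell above: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1 (a missing leading dimension counts as 1). A small sketch with made-up values, including a case that fails:

```python
import torch

col = torch.tensor([[1], [2], [3]])   # shape (3, 1)
row = torch.tensor([10, 20, 30, 40])  # shape (4,), treated as (1, 4)

# Size-1 (or missing) dimensions are stretched to match the other tensor
grid = col + row
print(grid.shape)  # torch.Size([3, 4])
print(grid)

# torch.broadcast_shapes computes the result shape without doing any arithmetic
print(torch.broadcast_shapes((3, 1), (4,)))  # torch.Size([3, 4])

# Trailing dimensions 3 and 4 differ and neither is 1, so this raises an error
try:
    torch.ones(2, 3) + torch.ones(4)
    broadcast_ok = True
except RuntimeError as err:
    broadcast_ok = False
    print("Cannot broadcast:", err)
```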


##### **Q5: How do you slice and index tensors in PyTorch?**

In [24]:
# Indexing a single element
tensor = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
element = tensor[0, 1]  # Access the element at row 0, column 1
print(element)

tensor(2)


In [25]:
# Basic slicing
slice_tensor = tensor[:2, 1:]  # Slice the first two rows and columns from the second to the end
print(slice_tensor)

tensor([[2, 3],
        [5, 6]])


In [26]:
# Slicing with steps
step_slice = tensor[::2, ::2]  # Slice every second element along both dimensions
print(step_slice)

tensor([[1, 3],
        [7, 9]])


In [27]:
# Ellipsis (...) stands in for all preceding dimensions
tensor = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
ellipsis_slice = tensor[..., 1]  # Select index 1 of the last dimension (here, the last element of each innermost pair)
print(ellipsis_slice)

tensor([[2, 4],
        [6, 8]])


In [28]:
# Boolean indexing
tensor = torch.tensor([1, 2, 3, 4, 5, 6])
bool_index = tensor[tensor > 3]  # Select elements greater than 3
print(bool_index)

tensor([4, 5, 6])


In [29]:
# Indexing with a tensor of indices
tensor = torch.tensor([10, 20, 30, 40, 50])
indices = torch.tensor([0, 2, 4])
advanced_index = tensor[indices]  # Select elements at positions 0, 2, and 4
print(advanced_index)

tensor([10, 30, 50])


In [30]:
# Indexing + slicing
tensor = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
combined = tensor[1:, :2]  # Rows from the second to the end, first two columns
print(combined)

tensor([[4, 5],
        [7, 8]])


##### **Q6: How do you change the shape of a tensor in PyTorch?**

In [31]:
# Using reshape()
tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])

reshaped_tensor = tensor.reshape(3, 2)
print(reshaped_tensor)

tensor([[1, 2],
        [3, 4],
        [5, 6]])


In [32]:
# Using view()
viewed_tensor = tensor.view(3, 2)
print(viewed_tensor)

tensor([[1, 2],
        [3, 4],
        [5, 6]])


In [33]:
# Using transpose()
transposed_tensor = tensor.transpose(0, 1)
print(transposed_tensor)

tensor([[1, 4],
        [2, 5],
        [3, 6]])


In [34]:
# Create a 3D tensor
tensor_3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Use permute()
permuted_tensor = tensor_3d.permute(2, 0, 1)
print(permuted_tensor)

tensor([[[1, 3],
         [5, 7]],

        [[2, 4],
         [6, 8]]])


In [35]:
# Using flatten()
flattened_tensor = tensor.flatten()
print(flattened_tensor)

tensor([1, 2, 3, 4, 5, 6])
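
`reshape()` and `view()` gave the same result above because `tensor` is contiguous. They differ once the memory layout changes, for example after `transpose()`. A short sketch of that difference:

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
t_t = t.transpose(0, 1)     # shape (3, 2); shares memory with t
print(t_t.is_contiguous())  # False: elements are no longer laid out row by row

# view() never copies, so it fails when the memory layout is incompatible
try:
    t_t.view(6)
    view_ok = True
except RuntimeError as err:
    view_ok = False
    print("view() failed:", err)

# reshape() falls back to a copy when needed; contiguous() makes the copy explicit
flat_reshape = t_t.reshape(6)
flat_view = t_t.contiguous().view(6)
print(flat_reshape)  # tensor([1, 4, 2, 5, 3, 6])
print(flat_view)     # tensor([1, 4, 2, 5, 3, 6])
```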


##### **Q7: How do you concatenate two tensors in PyTorch?**

In [36]:
# Concatenate along a specified dimension
tensor1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
tensor2 = torch.tensor([[7, 8, 9], [10, 11, 12]])

# Along the first dimension (rows)
concatenated_tensor = torch.cat((tensor1, tensor2), dim=0)
print(concatenated_tensor, '\n')

# Along the second dimension (columns)
concatenated_tensor = torch.cat((tensor1, tensor2), dim=1)
print(concatenated_tensor)

tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]]) 

tensor([[ 1,  2,  3,  7,  8,  9],
        [ 4,  5,  6, 10, 11, 12]])


In [37]:
# Stack along a new dimension
stacked_tensor = torch.stack((tensor1, tensor2), dim=0)
print(stacked_tensor, '\n')

stacked_tensor = torch.stack((tensor1, tensor2), dim=1)
print(stacked_tensor)

tensor([[[ 1,  2,  3],
         [ 4,  5,  6]],

        [[ 7,  8,  9],
         [10, 11, 12]]]) 

tensor([[[ 1,  2,  3],
         [ 7,  8,  9]],

        [[ 4,  5,  6],
         [10, 11, 12]]])


##### **Q8: How do you convert a NumPy array to a PyTorch tensor and vice versa?**

In [38]:
# Create a NumPy array
numpy_array = np.array([[1, 2, 3], [4, 5, 6]])

# Convert the NumPy array to a PyTorch tensor
torch_tensor = torch.from_numpy(numpy_array)
print(torch_tensor)

tensor([[1, 2, 3],
        [4, 5, 6]], dtype=torch.int32)


In [39]:
# Create a PyTorch tensor
torch_tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])

# Convert the PyTorch tensor to a NumPy array
numpy_array = torch_tensor.numpy()
print(numpy_array)

[[1 2 3]
 [4 5 6]]


In [40]:
# Avoiding memory sharing by creating a copy
torch_tensor_copy = torch.from_numpy(numpy_array.copy())
numpy_array_copy = torch_tensor.numpy().copy()

print(torch_tensor_copy, '\n') 
print(numpy_array_copy)

tensor([[1, 2, 3],
        [4, 5, 6]]) 

[[1 2 3]
 [4 5 6]]
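
The reason the `.copy()` calls matter: `torch.from_numpy()` and `Tensor.numpy()` (on CPU tensors) share memory with their source, whereas `torch.tensor()` always copies. A minimal demonstration with illustrative values:

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])
shared = torch.from_numpy(arr)  # shares the same memory as arr
copied = torch.tensor(arr)      # always copies the data

arr[0] = 100.0                  # modify the NumPy array in place
print(shared)  # first element is now 100: the change is visible through the tensor
print(copied)  # unchanged: still starts with 1

# The reverse direction also shares memory for CPU tensors
t = torch.zeros(3)
arr_view = t.numpy()
t.add_(1)                       # in-place update of the tensor
print(arr_view)  # [1. 1. 1.]
```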


##### **Q9: How do you get the size and shape of a tensor in PyTorch?**

In [41]:
# Get the size of a tensor
tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])

size = tensor.size()
print(size)

torch.Size([2, 3])


In [42]:
# Get the shape of the tensor
shape = tensor.shape
print(shape)

torch.Size([2, 3])


In [43]:
# Get number of dimensions
num_dimensions = tensor.ndimension()
print(num_dimensions)

2


In [44]:
# Get size of a specific dimension
rows = tensor.size(0)
cols = tensor.size(1)
print(f"Rows: {rows}, Columns: {cols}")

Rows: 2, Columns: 3


##### **Q10: How do you use advanced indexing techniques in PyTorch?**

In [45]:
# Boolean indexing
tensor = torch.tensor([1, 2, 3, 4, 5, 6])

mask = tensor > 3
selected_elements = tensor[mask]
print(selected_elements)

tensor([4, 5, 6])


In [46]:
# Indexing with another tensor
tensor = torch.tensor([10, 20, 30, 40, 50])

indices = torch.tensor([0, 2, 4])
selected_elements = tensor[indices]
print(selected_elements)

tensor([10, 30, 50])


In [47]:
# Indexing with a list of indices
tensor = torch.tensor([[1, 2], [3, 4], [5, 6]])

selected_rows = tensor[[0, 2]]
print(selected_rows)

tensor([[1, 2],
        [5, 6]])


In [48]:
# Boolean mask on a 2D tensor (the result is flattened to 1D)
tensor = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

mask = tensor > 4

selected_elements = tensor[mask]
print(selected_elements)

tensor([5, 6, 7, 8, 9])


In [49]:
# Ellipsis indexing
tensor = torch.randn(3, 4, 5)

selected_elements = tensor[..., 1]
print(selected_elements.shape)

torch.Size([3, 4])


In [50]:
# Using torch.gather()
tensor = torch.tensor([[1, 2], [3, 4], [5, 6]])

indices = torch.tensor([[0, 0], [1, 0], [0, 1]])

gathered_tensor = torch.gather(tensor, 1, indices)
print(gathered_tensor)

tensor([[1, 1],
        [4, 3],
        [5, 6]])
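
`torch.gather()` is easier to read once written out as a loop: with `dim=1`, `out[i][j] = src[i][idx[i][j]]`. The sketch below checks that equivalence on the same data and then shows a common use, picking each row's score for its label (the `scores` and `labels` values are made up):

```python
import torch

src = torch.tensor([[1, 2], [3, 4], [5, 6]])
idx = torch.tensor([[0, 0], [1, 0], [0, 1]])
gathered = torch.gather(src, 1, idx)

# With dim=1, gather computes out[i][j] = src[i][idx[i][j]]
manual = torch.empty_like(idx)
for i in range(idx.size(0)):
    for j in range(idx.size(1)):
        manual[i, j] = src[i, idx[i, j]]
print(torch.equal(gathered, manual))  # True

# Typical use: pick each row's score for its true class label
scores = torch.tensor([[0.1, 0.7, 0.2],
                       [0.5, 0.3, 0.2]])
labels = torch.tensor([1, 0])
picked = scores.gather(1, labels.unsqueeze(1)).squeeze(1)
print(picked)  # tensor([0.7000, 0.5000])
```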


## GPU acceleration

##### **Q11: How do you check if a GPU is available in PyTorch?**

In [53]:
if torch.cuda.is_available():
    print("GPU is available")
    device = torch.device("cuda")
    gpu_name = torch.cuda.get_device_name(0)
    print(f"GPU Name: {gpu_name}")
else:
    device = torch.device("cpu")
    print("GPU is not available")

GPU is available
GPU Name: NVIDIA GeForce RTX 3050 Laptop GPU


##### **Q12: How do you move tensors to GPU and perform operations on them?**

In [54]:
# Create a tensor
tensor_cpu = torch.tensor([1.0, 2.0, 3.0])

# Move the tensor to GPU
tensor_gpu = tensor_cpu.to(device)

# Alternatively, .cuda() moves the tensor to the current CUDA device (it fails if no GPU is available)
# tensor_gpu = tensor_cpu.cuda()

print(tensor_gpu)

tensor([1., 2., 3.], device='cuda:0')


In [55]:
# Perform operations on GPU
another_tensor_gpu = torch.tensor([4.0, 5.0, 6.0]).to(device)

result = tensor_gpu + another_tensor_gpu
print(result)

tensor([5., 7., 9.], device='cuda:0')


In [60]:
# Move an entire model to GPU
import torch.nn as nn

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(3, 3)

    def forward(self, x):
        return self.fc(x)

In [69]:
# Instantiate the model
model = SimpleModel()

# Move the model to GPU
model.to(device)

# Create input tensor and move to GPU
input_tensor = torch.tensor([1.0, 2.0, 3.0]).to(device)

# Perform a forward pass
output = model(input_tensor)
print(output)

tensor([ 0.2044, -0.3446,  1.1305], device='cuda:0', grad_fn=<ViewBackward0>)


##### **Q13: How do you measure the time taken for tensor operations on GPU versus CPU?**

In [72]:
# Time for CPU operations
import time

tensor_cpu = torch.randn(10000, 10000)

# Measure the time
start_time = time.time()
result_cpu = tensor_cpu @ tensor_cpu  # Matrix multiplication
end_time = time.time()

cpu_time = end_time - start_time
print(f"Time for CPU operation: {cpu_time} seconds")

Time for CPU operation: 3.9019978046417236 seconds


In [73]:
# Time for GPU
if torch.cuda.is_available():
    device = torch.device("cuda")
    tensor_gpu = tensor_cpu.to(device)

    # Warm up the GPU, then wait for the warm-up to finish before starting the timer
    _ = tensor_gpu @ tensor_gpu
    torch.cuda.synchronize()

    # Measure the time
    start_time = time.time()
    result_gpu = tensor_gpu @ tensor_gpu  # Matrix multiplication
    torch.cuda.synchronize()  # Wait for GPU operations to complete
    end_time = time.time()

    gpu_time = end_time - start_time
    print(f"Time for GPU operation: {gpu_time} seconds")
else:
    print("GPU is not available")

Time for GPU operation: 0.8348789215087891 seconds


In [74]:
# Alternative: use the timeit module to take the average from multiple operations
import timeit

tensor_cpu = torch.randn(10000, 10000)

# Time for a CPU operation
cpu_time = timeit.timeit('tensor_cpu @ tensor_cpu', globals=globals(), number=10)
print(f"Average time for CPU operation over 10 runs: {cpu_time / 10} seconds")

Average time for CPU operation over 10 runs: 4.267933060001814 seconds


In [75]:
# timeit on GPU
if torch.cuda.is_available():
    device = torch.device("cuda")
    tensor_gpu = tensor_cpu.to(device)

    # Warm up the GPU, then wait for the warm-up to finish before timing
    _ = tensor_gpu @ tensor_gpu
    torch.cuda.synchronize()

    # Measure the time
    def gpu_operation():
        _ = tensor_gpu @ tensor_gpu
        torch.cuda.synchronize()  # Wait for GPU operations to complete

    gpu_time = timeit.timeit(gpu_operation, number=10)
    print(f"Average time for GPU operation over 10 runs: {gpu_time / 10} seconds")
else:
    print("GPU is not available")

Average time for GPU operation over 10 runs: 0.4593087299988838 seconds
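
PyTorch also ships `torch.utils.benchmark`, whose `Timer` (according to its documentation) runs warm-ups and synchronizes CUDA on its own, so the manual `synchronize()` bookkeeping above is not needed. A sketch, assuming a smaller 1000x1000 matrix to keep it quick; the timings you get will differ:

```python
import torch
import torch.utils.benchmark as benchmark

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(1000, 1000, device=device)

# Timer handles warm-up and CUDA synchronization itself
timer = benchmark.Timer(stmt="x @ x", globals={"x": x})
measurement = timer.timeit(10)  # execute the statement 10 times
print(f"{device}: {measurement.mean * 1e3:.3f} ms per matmul (mean of 10 runs)")
```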


##### **Q14: How do you handle tensors when working with multiple GPUs?**

In [76]:
# Check the number of available GPUs
n_gpus = torch.cuda.device_count()
print(f"Number of available GPUs: {n_gpus}")

# List available GPUs
if n_gpus > 1:
    for i in range(n_gpus):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("Multiple GPUs are not available.")

Number of available GPUs: 1
Multiple GPUs are not available.
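
Beyond listing the GPUs, the main rule for tensors on several devices is that all operands of a single operation must live on the same device. A device-agnostic sketch (it falls back to the CPU when no GPU is present; the data and the round-robin split are illustrative assumptions):

```python
import torch

n_gpus = torch.cuda.device_count()
# One device object per GPU (cuda:0, cuda:1, ...); fall back to the CPU
devices = [torch.device(f"cuda:{i}") for i in range(n_gpus)] or [torch.device("cpu")]

data = torch.randn(8, 4)
chunks = data.chunk(len(devices))  # split the rows across the devices

# Each partial result is computed on the device that holds its chunk
partials = [(chunk.to(dev) ** 2).sum() for chunk, dev in zip(chunks, devices)]

# Move the partial results to one device before combining them
total = torch.stack([p.to(devices[0]) for p in partials]).sum()
print(total.device, torch.allclose(total.cpu(), (data ** 2).sum()))
```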


In [79]:
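# If multiple GPUs are available, the model can be wrapped in torch.nn.DataParallel, e.g.:
# (the PyTorch docs recommend DistributedDataParallel over DataParallel for multi-GPU training)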
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)

model = SimpleModel()

if n_gpus > 1:
    model = nn.DataParallel(model)
model = model.to(device)

input_tensor = torch.randn(64, 10).to(device)
output = model(input_tensor)
print(output)

tensor([[ 0.5793,  1.2301,  0.3336, -0.0730,  0.4936, -0.0848,  0.4817, -0.2437,
         -0.9768, -0.0174],
        [-1.1025, -0.7745,  0.1287, -0.5809, -0.5398, -0.9629, -0.4118, -0.7461,
          0.1665, -0.1868],
        [-0.1357,  0.5453, -0.2181,  1.6246, -0.1711,  1.6612,  0.1437, -0.3999,
         -0.3275,  0.7510],
        [-0.2530,  0.1424,  0.6478,  0.0620,  0.4956,  0.3044,  0.2441,  0.3048,
          0.1568,  0.2696],
        [-0.5817,  0.6161, -0.8001,  0.7301,  0.3400,  0.7400,  0.2450, -0.2854,
         -0.1619,  0.8682],
        [ 0.0060,  0.3469,  0.6073,  0.0050,  0.9235,  0.3002,  0.7779, -0.0766,
          0.1940,  0.3077],
        [-0.4639, -0.1702,  0.0768,  0.0903, -0.2852, -0.2345, -0.2375, -0.9020,
          0.1045, -0.3338],
        [-0.7160, -0.4117,  0.3285, -0.2942, -0.2763, -0.2346, -0.3866, -0.3820,
          0.8440, -0.7600],
        [ 0.0342,  0.8916,  0.2006,  1.3373,  0.6596,  1.5486,  0.0214,  0.6591,
         -0.2724,  0.8520],
        ... (output truncated; the full tensor has 64 rows)

## Automatic differentiation

##### **Q15: How do you enable automatic differentiation in PyTorch and compute gradients?**

In [85]:
# Create a tensor with requires_grad=True to enable automatic differentiation
x = torch.tensor([2.0, 3.0], requires_grad=True)
print(x)

tensor([2., 3.], requires_grad=True)


In [86]:
# Define a simple operation
y = x[0] * x[1] + x[1] ** 2
print(y)

tensor(15., grad_fn=<AddBackward0>)


In [87]:
# Compute gradients
y.backward()

# Print gradients
print(x.grad)

tensor([3., 8.])
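
As a sanity check, the result matches the analytic gradient of $y = x_0 x_1 + x_1^2$, namely $\partial y/\partial x_0 = x_1 = 3$ and $\partial y/\partial x_1 = x_0 + 2x_1 = 8$. The sketch below uses `torch.autograd.grad`, which returns gradients directly instead of accumulating them into `.grad`:

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = x[0] * x[1] + x[1] ** 2

# Returns a tuple of gradients, one per input; x.grad is left untouched
(grad,) = torch.autograd.grad(y, x)

# Analytic gradient: dy/dx0 = x1, dy/dx1 = x0 + 2*x1
with torch.no_grad():
    analytic = torch.stack([x[1], x[0] + 2 * x[1]])

print(grad)      # tensor([3., 8.])
print(analytic)  # tensor([3., 8.])
print(x.grad)    # None: nothing was accumulated
```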


In [88]:
# Disable gradient calculation (when no longer needed)
with torch.no_grad():
    z = x[0] * x[1] + x[1] ** 2
    print(z)

tensor(15.)


In [89]:
# Alternatively, use torch.set_grad_enabled(False) as a context manager,
# which restores the previous setting on exit
with torch.set_grad_enabled(False):
    z = x[0] * x[1] + x[1] ** 2
    print(z)

tensor(15.)


##### **Q16: How do you stop PyTorch from tracking history on tensors?**

In [90]:
# Use torch.no_grad()
# Create a tensor with requires_grad=True
x = torch.tensor([2.0, 3.0], requires_grad=True)

# Perform operations without tracking history
with torch.no_grad():
    y = x[0] * x[1] + x[1] ** 2
    print(y)

tensor(15.)


In [91]:
# Verify that no gradient is tracked
print(y.requires_grad)

False


In [92]:
# Can also use .detach()
x = torch.tensor([2.0, 3.0], requires_grad=True)

# Detach the tensor from the computation graph
x_detached = x.detach()

# Perform operations on the detached tensor
y = x_detached[0] * x_detached[1] + x_detached[1] ** 2
print(y)

tensor(15.)


In [93]:
# Verify that no gradient is tracked
print(x_detached.requires_grad)

False
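
One caveat worth remembering: `.detach()` stops gradient tracking but does not copy the data, so in-place writes to the detached tensor also change the original. A small sketch:

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
x_detached = x.detach()  # no gradient tracking, but the SAME underlying storage
x_detached[0] = 10.0     # in-place write through the detached tensor
print(x)                 # first element of x is now 10

# Use .detach().clone() for an independent copy
x_copy = x.detach().clone()
x_copy[1] = -1.0
print(x)                 # unchanged by the write to x_copy
```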


##### **Q17: How do you manually zero the gradients in PyTorch?**

In [100]:
# Use optimizer.zero_grad()
import torch.optim as optim

model = nn.Linear(10, 1)

# Define a loss function and an optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Dummy input and target tensors
input_tensor = torch.randn(10)
target_tensor = torch.randn(1)

# Forward pass
output = model(input_tensor)
loss = criterion(output, target_tensor)

print(output)
print(loss)

# Backward pass
optimizer.zero_grad()  # Reset gradients
loss.backward()        # Compute gradients
optimizer.step()       # Update model parameters

tensor([-0.3890], grad_fn=<ViewBackward0>)
tensor(0.3583, grad_fn=<MseLossBackward0>)


In [101]:
# You can also manually zero gradients for specific tensors
x = torch.tensor([2.0, 3.0], requires_grad=True)

y = x[0] * x[1] + x[1] ** 2

y.backward()

# Print gradients before zeroing
print(x.grad)

tensor([3., 8.])


In [102]:
# Manually zero the gradients
x.grad.zero_()

# Verify gradients are zeroed
print(x.grad)

tensor([0., 0.])


##### **Q18: How do you use the `backward()` method for computing gradients?**

In [103]:
# First, set up tensors and a simple operation
x = torch.tensor([2.0, 3.0], requires_grad=True)

y = x[0] * x[1] + x[1] ** 2
print(y)

tensor(15., grad_fn=<AddBackward0>)


In [104]:
# Compute and print gradients
y.backward()

print(x.grad) 

tensor([3., 8.])


In [105]:
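# Non-scalar outputs need a `gradient` argument (here, ones, equivalent to backpropagating z.sum()).
# Note: gradients accumulate, so x.grad becomes [3, 8] + 2*x = [3, 8] + [4, 6]
z = x ** 2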

# Compute gradients with a gradient argument
z.backward(torch.tensor([1.0, 1.0]))

print(x.grad)

tensor([ 7., 14.])


## Building a simple neural network

##### **Q19: How do you define a simple neural network using `nn.Module` in PyTorch?**

In [111]:
# Import required libraries - which have already been imported in previous cells
# import torch
# import torch.nn as nn
# import torch.optim as optim

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


In [112]:
# Define a simple neural network class with a forward method in it
class SimpleNeuralNetwork(nn.Module):
    def __init__(self):
        super(SimpleNeuralNetwork, self).__init__()
        # Define layers
        self.fc1 = nn.Linear(10, 50)  # Input layer to hidden layer
        self.fc2 = nn.Linear(50, 20)  # Hidden layer to hidden layer
        self.fc3 = nn.Linear(20, 1)   # Hidden layer to output layer

    def forward(self, x):
        # Define forward pass
        x = torch.relu(self.fc1(x))  # Apply ReLU activation function
        x = torch.relu(self.fc2(x))  # Apply ReLU activation function
        x = self.fc3(x)              # Output layer
        return x

In [113]:
# Build up the model
model = SimpleNeuralNetwork().to(device)

# Define loss function and optimizer
criterion = nn.MSELoss().to(device)                 # Mean Squared Error Loss
optimizer = optim.SGD(model.parameters(), lr=0.01)  # Stochastic Gradient Descent

# Dummy input and target tensors for demonstration
inputs = torch.randn(5, 10).to(device)   # 5 samples, each with 10 features
targets = torch.randn(5, 1).to(device)   # 5 samples, each with 1 target value

In [114]:
# Training loop
num_epochs = 1000
for epoch in range(num_epochs):
    # Zero the gradients
    optimizer.zero_grad()
    
    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    
    # Backward pass
    loss.backward()
    
    # Update weights
    optimizer.step()
    
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Epoch [100/1000], Loss: 0.1220
Epoch [200/1000], Loss: 0.0190
Epoch [300/1000], Loss: 0.0020
Epoch [400/1000], Loss: 0.0003
Epoch [500/1000], Loss: 0.0001
Epoch [600/1000], Loss: 0.0000
Epoch [700/1000], Loss: 0.0000
Epoch [800/1000], Loss: 0.0000
Epoch [900/1000], Loss: 0.0000
Epoch [1000/1000], Loss: 0.0000


##### **Q20: How do you initialize the weights and biases of a neural network?**

In [115]:
# Use the torch.nn.init module
import torch.nn.init as init

class SimpleNeuralNetwork(nn.Module):
    def __init__(self):
        super(SimpleNeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 20)
        self.fc3 = nn.Linear(20, 1)
        self.initialize_weights()

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def initialize_weights(self):
        # Apply Xavier initialization to weights and zero initialization to biases
        init.xavier_uniform_(self.fc1.weight)
        init.zeros_(self.fc1.bias)
        init.xavier_uniform_(self.fc2.weight)
        init.zeros_(self.fc2.bias)
        init.xavier_uniform_(self.fc3.weight)
        init.zeros_(self.fc3.bias)

In [116]:
model = SimpleNeuralNetwork()
print(model)

SimpleNeuralNetwork(
  (fc1): Linear(in_features=10, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=20, bias=True)
  (fc3): Linear(in_features=20, out_features=1, bias=True)
)


In [117]:
# Alternative: use custom initialization functions
def custom_weights_init(m):
    if isinstance(m, nn.Linear):
        # Apply custom initialization
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        nn.init.constant_(m.bias, 0.0)

class SimpleNeuralNetwork(nn.Module):
    def __init__(self):
        super(SimpleNeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 20)
        self.fc3 = nn.Linear(20, 1)
        self.apply(custom_weights_init)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

In [118]:
model = SimpleNeuralNetwork()
print(model)

SimpleNeuralNetwork(
  (fc1): Linear(in_features=10, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=20, bias=True)
  (fc3): Linear(in_features=20, out_features=1, bias=True)
)


In [119]:
# Another option: He (Kaiming) initialization, which is designed for ReLU activations
class SimpleNeuralNetwork(nn.Module):
    def __init__(self):
        super(SimpleNeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 20)
        self.fc3 = nn.Linear(20, 1)
        self.initialize_weights()

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def initialize_weights(self):
        # Apply He initialization to weights and zero initialization to biases
        init.kaiming_uniform_(self.fc1.weight, nonlinearity='relu')
        init.zeros_(self.fc1.bias)
        init.kaiming_uniform_(self.fc2.weight, nonlinearity='relu')
        init.zeros_(self.fc2.bias)
        init.kaiming_uniform_(self.fc3.weight, nonlinearity='linear')  # The output layer has no ReLU after it
        init.zeros_(self.fc3.bias)

In [120]:
model = SimpleNeuralNetwork()
print(model)

SimpleNeuralNetwork(
  (fc1): Linear(in_features=10, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=20, bias=True)
  (fc3): Linear(in_features=20, out_features=1, bias=True)
)


##### **Q21: How do you add multiple layers to a neural network?**

In [121]:
# Use nn.Sequential
class SimpleSequentialNN(nn.Module):
    def __init__(self):
        super(SimpleSequentialNN, self).__init__()
        self.network = nn.Sequential(
            nn.Linear(10, 50),
            nn.ReLU(),
            nn.Linear(50, 20),
            nn.ReLU(),
            nn.Linear(20, 1)
        )

    def forward(self, x):
        return self.network(x)

In [122]:
model = SimpleSequentialNN()
print(model)

SimpleSequentialNN(
  (network): Sequential(
    (0): Linear(in_features=10, out_features=50, bias=True)
    (1): ReLU()
    (2): Linear(in_features=50, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=1, bias=True)
  )
)


In [123]:
# Or, define each layer explicitly
class SimpleExplicitNN(nn.Module):
    def __init__(self):
        super(SimpleExplicitNN, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.fc2 = nn.Linear(50, 20)
        self.fc3 = nn.Linear(20, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

In [124]:
model = SimpleExplicitNN()
print(model)

SimpleExplicitNN(
  (fc1): Linear(in_features=10, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=20, bias=True)
  (fc3): Linear(in_features=20, out_features=1, bias=True)
  (relu): ReLU()
)


In [125]:
# Or, add multiple layers w/ different architectures
class ComplexNN(nn.Module):
    def __init__(self):
        super(ComplexNN, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        # Fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 128)  # Assuming input images are 28x28
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(-1, 64 * 7 * 7)  # Flatten the tensor
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

In [126]:
model = ComplexNN()
print(model)

ComplexNN(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=3136, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.5, inplace=False)
)
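
`ComplexNN` uses `nn.Dropout(0.5)`, which only acts in training mode. During training it zeroes each element with probability `p` and scales the survivors by `1 / (1 - p)`; in evaluation mode it is the identity. A quick sketch on a tensor of ones:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()           # training mode: random zeros, survivors scaled by 1 / (1 - 0.5) = 2
train_out = dropout(x)
print(train_out)          # a mix of 0.0 and 2.0

dropout.eval()            # evaluation mode: dropout does nothing
eval_out = dropout(x)
print(eval_out)           # all ones
```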


## Loss function and optimizer

##### **Q22: How do you define a loss function and an optimizer for your neural network?**
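
A minimal sketch, assuming a hypothetical 3-class classifier with 4 input features and random data: `nn.CrossEntropyLoss` takes raw logits (it applies log-softmax internally) plus integer class labels, and the optimizer (here Adam) is built from `model.parameters()`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical classifier: 4 input features -> 3 classes
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))

# CrossEntropyLoss expects raw logits (no softmax) and integer class indices
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

inputs = torch.randn(8, 4)          # batch of 8 samples
labels = torch.randint(0, 3, (8,))  # one class index per sample

logits = model(inputs)              # shape (8, 3)
loss = criterion(logits, labels)    # scalar tensor

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```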

##### **Q23: How do you use different types of optimizers in PyTorch?**

##### **Q24: How do you adjust the learning rate during training?**
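
One common approach is a learning-rate scheduler from `torch.optim.lr_scheduler`. A sketch with `StepLR`, which multiplies the learning rate by `gamma` every `step_size` epochs; the model and numbers are illustrative, and the actual forward/backward pass is left out.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate every 10 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

lrs = []
for epoch in range(30):
    # A real loop would compute the loss and call loss.backward() here;
    # with no gradients, optimizer.step() leaves the weights unchanged
    optimizer.step()
    scheduler.step()  # step the scheduler once per epoch, after optimizer.step()
    lrs.append(scheduler.get_last_lr()[0])

print(lrs[0], lrs[9], lrs[19], lrs[29])  # approximately 0.1 0.05 0.025 0.0125
```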

## Training the model

##### **Q25: How do you create a training loop to train your neural network in PyTorch?**

## Evaluation and inference

##### **Q26: How do you evaluate your model's performance and make predictions on new data?**

##### **Q27: How do you calculate the accuracy of your model?**
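
For classification, a minimal sketch (hypothetical untrained 3-class model, random data) is to switch to evaluation mode, disable gradient tracking, take the `argmax` of the logits as the prediction, and compare with the labels:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)  # hypothetical 3-class classifier
inputs = torch.randn(20, 4)
labels = torch.randint(0, 3, (20,))

model.eval()             # put layers such as dropout/batchnorm in evaluation mode
with torch.no_grad():    # no computation graph is needed for evaluation
    logits = model(inputs)
    preds = logits.argmax(dim=1)  # predicted class per sample
    accuracy = (preds == labels).float().mean().item()

print(f"Accuracy: {accuracy:.2%}")
```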

##### **Q28: How do you handle model evaluation for regression tasks?**

##### **Q29: How do you handle model evaluation for classification tasks?**

##### **Q30: How do you use confusion matrices to evaluate model performance?**

## Saving and loading models

##### **Q31: How do you save and load a PyTorch model?**
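
A sketch of the approach recommended in the PyTorch docs: save the model's `state_dict` (its parameters) rather than the whole object, then re-create the architecture and load the parameters into it. The file name is a hypothetical placeholder in the system temp directory.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(10, 1)

# Save only the parameters, not the pickled model object
path = os.path.join(tempfile.gettempdir(), "simple_model.pt")  # hypothetical file name
torch.save(model.state_dict(), path)

# To load: re-create the same architecture, then load the saved parameters
restored = nn.Linear(10, 1)
restored.load_state_dict(torch.load(path, weights_only=True))
restored.eval()

x = torch.randn(3, 10)
with torch.no_grad():
    same_output = torch.allclose(model(x), restored(x))
print(same_output)  # True
```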

##### **Q32: How do you save and load model checkpoints during training?**

## Custom datasets and DataLoaders

##### **Q33: How do you use PyTorch's DataLoader to load a dataset in batches?**
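
A minimal sketch with dummy in-memory data: wrap the tensors in a `TensorDataset` and let `DataLoader` handle batching and shuffling. With 100 samples and `batch_size=32`, the last batch holds the remaining 4 samples (since `drop_last` defaults to `False`).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.randn(100, 10)  # 100 samples, 10 features (dummy data)
targets = torch.randn(100, 1)

dataset = TensorDataset(features, targets)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_sizes = []
for batch_x, batch_y in loader:
    batch_sizes.append(batch_x.size(0))
    print(batch_x.shape, batch_y.shape)

print(batch_sizes)  # [32, 32, 32, 4]
```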

##### **Q34: How do you implement a custom dataset in PyTorch?**
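
A map-style dataset only needs to subclass `torch.utils.data.Dataset` and implement `__len__` and `__getitem__`. The sketch below uses a hypothetical dataset of noisy points on the line y = 2x + 1 and plugs it into a `DataLoader`:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class LinearFunctionDataset(Dataset):
    """Hypothetical dataset of (x, y) pairs with y = 2x + 1 plus a little noise."""

    def __init__(self, n_samples=200):
        self.x = torch.linspace(-1, 1, n_samples).unsqueeze(1)  # shape (n, 1)
        self.y = 2 * self.x + 1 + 0.05 * torch.randn_like(self.x)

    def __len__(self):
        # Number of samples in the dataset
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (input, target) pair
        return self.x[idx], self.y[idx]

dataset = LinearFunctionDataset()
loader = DataLoader(dataset, batch_size=50, shuffle=True)
x_batch, y_batch = next(iter(loader))
print(len(dataset), x_batch.shape)  # 200 torch.Size([50, 1])
```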

##### **Q35: How do you apply data transformations using `torchvision.transforms`?**

##### **Q36: How do you handle data augmentation in PyTorch?**

## Conclusion

## Further exercises

##### **Q37: How do you create a tensor of shape (2, 3) filled with zeros and then with ones?**

##### **Q38: How do you train a neural network to predict the output of a simple linear function?**

##### **Q39: How do you experiment with different optimizers and learning rates to see their effect on training?**

##### **Q40: How do you visualize the training loss and accuracy over epochs in PyTorch?**

##### **Q41: How do you implement dropout regularization in a neural network using PyTorch?**