# A First Shot at Deep Learning with PyTorch

In this notebook, we are going to take the first step into the world of deep learning using PyTorch.


## Importing the libraries

As with any programming exercise, the first step is to import the necessary libraries. We are going to use Google Colab to program our neural network; it ships with PyTorch preinstalled, so we only need to import the PyTorch modules we use.

In [None]:
## The usual imports
import torch
import torch.nn as nn

## print out the pytorch version used
print(torch.__version__)

2.5.1+cu121


## The Neural Network

![alt text](https://drive.google.com/uc?export=view&id=1Lpi4VPBfAV3JkOLopcsGK4L8dyxmPF1b)

Before building and training a neural network, the first step is to process and prepare the data. In this notebook, we are going to use synthetic data (i.e., fake data), so we won't be using any real-world data.

For the sake of simplicity, we are going to use the following input and output **pairs converted to tensors**, which is how data is typically represented in the world of deep learning. The x values represent the input, with dimension `(6,1)`, and the y values represent the output, with the same dimension.

**A tensor is a multi-dimensional array that serves as the fundamental data structure in PyTorch. It is similar to a NumPy array, with the added ability to run on GPUs for accelerated computation.**

Why use tensors?

* Versatility: they can represent scalars, vectors, matrices, and higher-dimensional data.
* Performance: they integrate seamlessly with GPUs for faster computations.
* Core of deep learning: tensors store input data, model parameters, and activations during training.
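
To make the versatility point concrete, here is a minimal sketch that builds tensors of increasing rank and shows the NumPy interoperability mentioned above. The variable names (`scalar`, `vector`, `matrix`, `cube`, `array`, `shared`) are chosen only for illustration and are not used elsewhere in this notebook.

```python
import numpy as np
import torch

# Tensors of increasing rank: scalar (0-D), vector (1-D), matrix (2-D), 3-D array
scalar = torch.tensor(3.0)
vector = torch.tensor([1.0, 2.0, 3.0])
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
cube = torch.zeros(2, 3, 4)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(name, "ndim:", t.ndim, "shape:", tuple(t.shape))

# NumPy interop: torch.from_numpy shares memory with the source array (CPU only)
array = np.array([1.0, 2.0, 3.0], dtype=np.float32)
shared = torch.from_numpy(array)
array[0] = 10.0
print(shared)  # the tensor reflects the change made through the NumPy array
```

Because `torch.from_numpy` shares memory rather than copying, use `torch.tensor(array)` instead when an independent copy is wanted.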


The objective of the neural network model that we are going to build and train is to automatically learn patterns that better characterize the relationship between the `x` and `y` values. Essentially, the model learns the relationship that exists between inputs and outputs which can then be used to predict the corresponding `y` value for any given input `x`.

In [None]:
## our data in tensor form
x = torch.tensor([[-1.0],  [0.0], [1.0], [2.0], [3.0], [4.0]], dtype=torch.float)
y = torch.tensor([[-3.0], [-1.0], [1.0], [3.0], [5.0], [7.0]], dtype=torch.float)

In [None]:
tensor = torch.tensor([1, 2, 3], dtype=torch.float32)
if torch.cuda.is_available():
    tensor = tensor.to("cuda")
    print("Tensor on GPU:", tensor)
else:
    print("Tensor on CPU:", tensor)

Tensor on GPU: tensor([1., 2., 3.], device='cuda:0')


In [None]:
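## print the size of both tensors
## (a cell only echoes its last expression, so we print each one explicitly)
print(x.size())
print(tensor.size())

torch.Size([6, 1])
torch.Size([3])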

## The Neural Network Components

### Model

* Below we define a single linear layer named `layer1` with one input feature and one output feature, i.e. size `(1, 1)`. For the purpose of this tutorial, we won't set the `weights` explicitly; PyTorch initializes them for us. The `nn.Linear(...)` module applies a linear transformation ($y = xA^T + b$) to its input. We leave out the bias for now by setting `bias=False`, so the layer reduces to $y = xA^T$.


* `nn.Sequential` wraps the defined `layer1` in a Sequential model, which chains layers together in the order they are defined. This makes it easier to add and organize additional layers if needed, and a single call to the model runs the forward pass through every layer in turn, without calling the layers one by one.
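
As a quick check of the formula above, the following sketch compares the output of a bias-free `nn.Linear` with the matrix product $xA^T$ computed by hand. The names `demo_layer`, `demo_model`, and `inputs` are illustrative, and the weight is overwritten with a known value purely so the result is easy to read; the notebook's own model keeps its random initialization.

```python
import torch
import torch.nn as nn

demo_layer = nn.Linear(1, 1, bias=False)

# Overwrite the random initial weight with a known value (illustration only)
with torch.no_grad():
    demo_layer.weight.fill_(2.0)

inputs = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1)
by_layer = demo_layer(inputs)                  # shape (3, 1)
by_hand = inputs @ demo_layer.weight.T         # x A^T, no bias term

print(by_layer.squeeze().tolist())             # [2.0, 4.0, 6.0]
print(torch.allclose(by_layer, by_hand))       # True

# Wrapping the layer in nn.Sequential gives the same forward pass
demo_model = nn.Sequential(demo_layer)
print(torch.allclose(demo_model(inputs), by_hand))  # True
```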





In [None]:
## Neural network with a single linear layer (1 input, 1 output, no bias)
layer1 = nn.Linear(1,1, bias=False)
model = nn.Sequential(layer1)

### Loss and Optimizer
* The loss function, `nn.MSELoss()`, tells the model how well it has learned the relationship between the input and output. It computes the Mean Squared Error (MSE): the average squared difference between the predicted output and the actual target.

* The optimizer's primary role is to lower that loss value by adjusting the model's weights. Here we use `torch.optim.SGD`, which implements the Stochastic Gradient Descent (SGD) optimization algorithm.
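
The two descriptions above can be sketched with small hand-checkable numbers. The example below computes the MSE manually and with `nn.MSELoss()`, then performs a single SGD update on one weight. All values and names (`pred`, `target`, `w`, `sgd`, `demo_loss`) are made up for illustration and are separate from the model trained in this notebook.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[1.0], [2.0], [3.0]])
target = torch.tensor([[2.0], [2.0], [5.0]])

# MSE = mean of the squared differences: (1 + 0 + 4) / 3
mse_manual = ((pred - target) ** 2).mean()
mse_builtin = nn.MSELoss()(pred, target)
print(mse_manual.item(), mse_builtin.item())   # both about 1.6667

# One SGD step on a single weight: w_new = w - lr * dL/dw
w = torch.tensor([1.0], requires_grad=True)
sgd = torch.optim.SGD([w], lr=0.1)
demo_loss = ((w * 3.0 - 6.0) ** 2).sum()       # dL/dw = 2 * (3w - 6) * 3 = -18 at w = 1
demo_loss.backward()
sgd.step()
print(w.item())                                # about 1 - 0.1 * (-18) = 2.8
```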

In [None]:
## loss function
criterion = nn.MSELoss()

## optimizer algorithm
## the learning rate (lr) sets the size of the steps taken to minimize the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

## Training the Neural Network Model
We now have all the components we need to train our model. Below is the training code.

In simple terms, we train the model by feeding it the input and output pairs for a number of rounds (each round is called an `epoch`). After a series of forward and backward steps, the model gradually learns the relationship between the x and y values. This is visible in the decrease of the computed `loss`.

In [None]:
## training
model.train()  # training mode; it changes nothing for a plain linear layer, but is a good habit
for ITER in range(150):
    ## forward
    optimizer.zero_grad()  # clear the gradients left over from the previous step
    output = model(x)
    loss = criterion(output, y)

    ## backward + update model params
    loss.backward()   # backpropagation: gradient of the loss with respect to each model parameter
    optimizer.step()  # update the model parameters using the computed gradients

    ## loss.item() converts the one-element loss tensor to a Python float for display
    print('Epoch: %d | Loss: %.4f' % (ITER, loss.item()))

model.eval()  # switch to evaluation mode once training is done

Epoch: 0 | Loss: 38.3069
Epoch: 1 | Loss: 30.9098
Epoch: 2 | Loss: 24.9625
Epoch: 3 | Loss: 20.1808
Epoch: 4 | Loss: 16.3362
Epoch: 5 | Loss: 13.2451
Epoch: 6 | Loss: 10.7599
Epoch: 7 | Loss: 8.7617
Epoch: 8 | Loss: 7.1551
Epoch: 9 | Loss: 5.8634
Epoch: 10 | Loss: 4.8249
Epoch: 11 | Loss: 3.9899
Epoch: 12 | Loss: 3.3186
Epoch: 13 | Loss: 2.7788
Epoch: 14 | Loss: 2.3448
Epoch: 15 | Loss: 1.9959
Epoch: 16 | Loss: 1.7154
Epoch: 17 | Loss: 1.4898
Epoch: 18 | Loss: 1.3085
Epoch: 19 | Loss: 1.1627
Epoch: 20 | Loss: 1.0454
Epoch: 21 | Loss: 0.9512
Epoch: 22 | Loss: 0.8754
Epoch: 23 | Loss: 0.8145
Epoch: 24 | Loss: 0.7655
Epoch: 25 | Loss: 0.7261
Epoch: 26 | Loss: 0.6944
Epoch: 27 | Loss: 0.6690
Epoch: 28 | Loss: 0.6485
Epoch: 29 | Loss: 0.6320
Epoch: 30 | Loss: 0.6188
Epoch: 31 | Loss: 0.6082
Epoch: 32 | Loss: 0.5996
Epoch: 33 | Loss: 0.5927
Epoch: 34 | Loss: 0.5872
Epoch: 35 | Loss: 0.5828
Epoch: 36 | Loss: 0.5792
Epoch: 37 | Loss: 0.5763
Epoch: 38 | Loss: 0.5740
Epoch: 39 | Loss: 0.5721
... (output truncated)

## Testing the Model
After training the model, we can test its predictive capability by passing it a new input. Below is a simple example of how to do this with our model.

In [None]:
## test the model
sample = torch.tensor([10.0], dtype=torch.float)
predicted = model(sample)
print(predicted.detach().item())

17.096769332885742
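
The training data follow `y = 2x - 1`, so a perfect fit would predict 19 for `x = 10`. Because the layer was created with `bias=False`, the model can only represent lines through the origin, and the best it can do is the least-squares slope. The sketch below computes that slope in closed form; `x_demo`, `y_demo`, `w_star`, and `best_loss` are illustrative names, and the data are copied from the cell that defines `x` and `y`.

```python
import torch

x_demo = torch.tensor([[-1.0], [0.0], [1.0], [2.0], [3.0], [4.0]])
y_demo = torch.tensor([[-3.0], [-1.0], [1.0], [3.0], [5.0], [7.0]])

# Least-squares slope for a line through the origin: w* = sum(x*y) / sum(x*x) = 53 / 31
w_star = (x_demo * y_demo).sum() / (x_demo * x_demo).sum()

# Smallest MSE reachable without a bias term
best_loss = ((w_star * x_demo - y_demo) ** 2).mean()

print(w_star.item())            # about 1.7097
print(best_loss.item())         # about 0.5645, the plateau seen in the training log
print((w_star * 10.0).item())   # about 17.097, matching the prediction above
```

This suggests the remaining loss is a limit of the bias-free model rather than a sign that training needs more epochs; a layer created with `bias=True` would be able to drive the loss toward zero on this data.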


## 1. Exploring Tensor Basics ##

Exercise:

**Part 1**: Basic Operations
Create a 2D tensor of random values with size (3, 3).
Perform the following:
- Transpose the tensor.
- Compute its mean and standard deviation.
- Move the tensor to a GPU (if available).

**Part 2**: Arithmetic Operations
Create another random tensor of the same size (3, 3) and:
- Add the two tensors.
- Multiply them element-wise.

**Part 3**: Reshaping and Broadcasting
- Reshape one of the tensors to (9,) and back to (3, 3).

**Part 4**: Advanced Tensor Indexing
Perform slicing to extract:
- The first row.
- The last column.
- All elements greater than a certain value (e.g., 0.5).


In [None]:
import torch

# Part 1: Basic Operations
# Create a 2D tensor
tensor = torch.rand(3, 3)
print("Original Tensor:\n", tensor)

# Transpose the tensor
tensor_t = tensor.T
print("Transposed Tensor:\n", tensor_t)

# Compute mean and standard deviation
mean = tensor.mean()
std = tensor.std()
print("Mean:", mean.item(), "Std Dev:", std.item())

# Move to GPU if available
if torch.cuda.is_available():
    tensor_gpu = tensor.to("cuda")
    print("Tensor on GPU:\n", tensor_gpu)

# Part 2: Arithmetic Operations
# Create another random tensor
tensor2 = torch.rand(3, 3)

# Element-wise addition
sum_tensor = tensor + tensor2
print("Element-wise Addition:\n", sum_tensor)

# Element-wise multiplication
product_tensor = tensor * tensor2
print("Element-wise Multiplication:\n", product_tensor)

# Part 3: Reshaping
# Reshape tensor
reshaped_tensor = tensor.view(9)
print("Reshaped Tensor:\n", reshaped_tensor)

# Reshape back to original
reshaped_back_tensor = reshaped_tensor.view(3, 3)
print("Reshaped Back Tensor:\n", reshaped_back_tensor)

# Part 4: Advanced Tensor Indexing
# Extract specific elements
first_row = tensor[0, :]
print("First Row:\n", first_row)

last_column = tensor[:, -1]
print("Last Column:\n", last_column)

elements_greater_than_05 = tensor[tensor > 0.5]
print("Elements Greater Than 0.5:\n", elements_greater_than_05)


Original Tensor:
 tensor([[0.7126, 0.7316, 0.5398],
        [0.5471, 0.3772, 0.1180],
        [0.8992, 0.2857, 0.0736]])
Transposed Tensor:
 tensor([[0.7126, 0.5471, 0.8992],
        [0.7316, 0.3772, 0.2857],
        [0.5398, 0.1180, 0.0736]])
Mean: 0.47609981894493103 Std Dev: 0.28407466411590576
Tensor on GPU:
 tensor([[0.7126, 0.7316, 0.5398],
        [0.5471, 0.3772, 0.1180],
        [0.8992, 0.2857, 0.0736]], device='cuda:0')
Element-wise Addition:
 tensor([[1.0868, 1.7132, 1.0443],
        [0.5667, 0.3821, 0.2061],
        [1.4185, 0.5656, 1.0166]])
Element-wise Multiplication:
 tensor([[0.2667, 0.7181, 0.2723],
        [0.0107, 0.0019, 0.0104],
        [0.4669, 0.0800, 0.0694]])
Reshaped Tensor:
 tensor([0.7126, 0.7316, 0.5398, 0.5471, 0.3772, 0.1180, 0.8992, 0.2857, 0.0736])
Reshaped Back Tensor:
 tensor([[0.7126, 0.7316, 0.5398],
        [0.5471, 0.3772, 0.1180],
        [0.8992, 0.2857, 0.0736]])
First Row:
 tensor([0.7126, 0.7316, 0.5398])
Last Column:
 tensor([0.5398, 0.118

In [None]:
# Create a random 2D tensor
tensor = torch.rand(3, 3)

# Transpose the tensor
tensor_t = tensor.T

# Compute mean and standard deviation
mean = tensor.mean()
std = tensor.std()

# Move to GPU if available
if torch.cuda.is_available():
    tensor_gpu = tensor.to("cuda")
    print("Tensor on GPU:", tensor_gpu)

print("Original Tensor:", tensor)
print("Transposed Tensor:", tensor_t)
print("Mean:", mean.item(), "Std Dev:", std.item())


Tensor on GPU: tensor([[0.8545, 0.5221, 0.1670],
        [0.9671, 0.0054, 0.8206],
        [0.5487, 0.8343, 0.3922]], device='cuda:0')
Original Tensor: tensor([[0.8545, 0.5221, 0.1670],
        [0.9671, 0.0054, 0.8206],
        [0.5487, 0.8343, 0.3922]])
Transposed Tensor: tensor([[0.8545, 0.9671, 0.5487],
        [0.5221, 0.0054, 0.8343],
        [0.1670, 0.8206, 0.3922]])
Mean: 0.5679792165756226 Std Dev: 0.33296021819114685
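
Part 3 of the exercise mentions broadcasting, which the solution cells above do not exercise. Here is a minimal sketch of how it works, using fixed values instead of random ones so the results can be checked by hand; `grid`, `row`, and `col` are illustrative names.

```python
import torch

grid = torch.arange(9, dtype=torch.float32).reshape(3, 3)   # rows: [0,1,2], [3,4,5], [6,7,8]
row = torch.tensor([10.0, 20.0, 30.0])                      # shape (3,)
col = torch.tensor([[1.0], [2.0], [3.0]])                   # shape (3, 1)

# (3, 3) + (3,): the row vector is added to every row of the grid
print(grid + row)

# (3, 3) * (3, 1): each row of the grid is scaled by the matching entry of col
print(grid * col)

# (3, 1) + (3,): both operands are expanded to a common (3, 3) shape
print((col + row).shape)
```

Broadcasting compares shapes from the trailing dimension backwards; dimensions are compatible when they are equal or when one of them is 1.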


## 2. Neural Network Architecture ##
Exercise:

**Part 1**: Add Multiple Hidden Layers
- Modify the current network to include multiple hidden layers.
- Use `nn.Sequential` to define the model with at least 3 hidden layers.

**Part 2**: Experiment with Activation Functions
- Insert different activation functions (e.g., ReLU, Tanh, Sigmoid) between the layers.
- Compare the performance of the model with each activation function.

**Part 3**: Adjust the Size of Hidden Layers
- Vary the size (number of neurons) in each hidden layer, e.g., try [10, 20, 10] neurons across three layers.
- Experiment with smaller or larger layer sizes.

Finally, compare the results across all the architectures.

In [None]:
import torch.nn as nn

# Define a function to create and train a model
def create_and_train_model(hidden_layers, activations, input_size=1, output_size=1, epochs=150, lr=0.01):
    # Build the model dynamically
    layers = []
    in_features = input_size
    for i, hidden_size in enumerate(hidden_layers):
        layers.append(nn.Linear(in_features, hidden_size))
        layers.append(activations[i])  # Add activation function
        in_features = hidden_size
    layers.append(nn.Linear(in_features, output_size))  # Output layer
    model = nn.Sequential(*layers)

    # Optimizer and criterion
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.MSELoss()

    # Train the model
    for ITER in range(epochs):
        model.train()
        output = model(x)
        loss = criterion(output, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return model, loss.item()

# Experiment with various architectures
architectures = [
    {"hidden_layers": [10], "activations": [nn.ReLU()]},
    {"hidden_layers": [10, 20, 10], "activations": [nn.ReLU(), nn.ReLU(), nn.ReLU()]},
    {"hidden_layers": [15, 15], "activations": [nn.Tanh(), nn.Tanh()]},
    {"hidden_layers": [10, 10, 10], "activations": [nn.Sigmoid(), nn.Sigmoid(), nn.Sigmoid()]}
]

results = []
for arch in architectures:
    model, loss = create_and_train_model(arch["hidden_layers"], arch["activations"])
    results.append({"Architecture": arch["hidden_layers"], "Activation": arch["activations"], "Loss": loss})

# Print results
for res in results:
    print(f"Architecture: {res['Architecture']}, Activations: {[type(act).__name__ for act in res['Activation']]}, Loss: {res['Loss']}")



Architecture: [10], Activations: ['ReLU'], Loss: 0.022010646760463715
Architecture: [10, 20, 10], Activations: ['ReLU', 'ReLU', 'ReLU'], Loss: 0.06420841068029404
Architecture: [15, 15], Activations: ['Tanh', 'Tanh'], Loss: 0.10088322311639786
Architecture: [10, 10, 10], Activations: ['Sigmoid', 'Sigmoid', 'Sigmoid'], Loss: 11.584339141845703
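
When comparing architectures, keep in mind that each model starts from different random weights, so part of the gap between final losses may come from initialization rather than from the architecture itself. One common way to make runs repeatable is to fix the seed before building each model. The sketch below assumes a hypothetical helper, `build_mlp`, that is not part of the solution above; it also counts the trainable parameters of the `[10, 20, 10]` architecture.

```python
import torch
import torch.nn as nn

def build_mlp(hidden_sizes, seed):
    """Hypothetical helper: build a ReLU MLP (1 input, 1 output) after fixing the seed."""
    torch.manual_seed(seed)
    layers, in_features = [], 1
    for size in hidden_sizes:
        layers += [nn.Linear(in_features, size), nn.ReLU()]
        in_features = size
    layers.append(nn.Linear(in_features, 1))
    return nn.Sequential(*layers)

net_a = build_mlp([10, 20, 10], seed=0)
net_b = build_mlp([10, 20, 10], seed=0)

# Same seed, same architecture: the initial weights are identical
same_init = all(torch.equal(p, q) for p, q in zip(net_a.parameters(), net_b.parameters()))

# Trainable parameters: (1*10 + 10) + (10*20 + 20) + (20*10 + 10) + (10*1 + 1)
n_params = sum(p.numel() for p in net_a.parameters())

print(same_init, n_params)   # True 461
```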


## 3. Loss Function Experimentation ##

Choose at least 3 different loss functions to experiment with (e.g., `nn.MSELoss()`, `nn.L1Loss()`).

- Train the model with each loss function using the same training data.
- Record the final loss for each loss function (the exercise suggests 50 epochs; the solution below trains for 150).
- Compare the results: print the final loss for each loss function.
- Summarize how the choice of loss function impacts the model's learning.

In [None]:
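# List of loss functions to test
# Note: nn.HingeEmbeddingLoss expects targets of 1 or -1, so its value on this
# regression data is not a meaningful regression error.
# Note: with default arguments, nn.SmoothL1Loss (beta=1) and nn.HuberLoss (delta=1)
# compute the same loss; differences in their results come from the random initial weights.
loss_functions = {
    "MSELoss": nn.MSELoss(),
    "L1Loss": nn.L1Loss(),
    "SmoothL1Loss": nn.SmoothL1Loss(),
    "HingeEmbeddingLoss": nn.HingeEmbeddingLoss(),
    "HuberLoss": nn.HuberLoss()
}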

final_losses = []

# Loop through each loss function
for loss_name, criterion in loss_functions.items():
    print(f"\nTesting Loss Function: {loss_name}")

    # Reinitialize the model and optimizer for each experiment
    model = nn.Sequential(
        nn.Linear(1, 1, bias=False)
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Train the model
    for ITER in range(150):
        model.train()
        output = model(x)
        loss = criterion(output, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Record the final loss
    final_losses.append((loss_name, loss.detach().item()))
    print(f"Final Loss for {loss_name}: {loss.detach().item()}")

# Print all results
print("\nLoss Function Results:")
for loss_name, loss in final_losses:
    print(f"Loss Function: {loss_name}, Final Loss: {loss}")




Testing Loss Function: MSELoss
Final Loss for MSELoss: 0.5645161271095276

Testing Loss Function: L1Loss
Final Loss for L1Loss: 0.6113112568855286

Testing Loss Function: SmoothL1Loss
Final Loss for SmoothL1Loss: 0.2750014364719391

Testing Loss Function: HingeEmbeddingLoss
Final Loss for HingeEmbeddingLoss: 0.8345963358879089

Testing Loss Function: HuberLoss
Final Loss for HuberLoss: 0.27561065554618835

Loss Function Results:
Loss Function: MSELoss, Final Loss: 0.5645161271095276
Loss Function: L1Loss, Final Loss: 0.6113112568855286
Loss Function: SmoothL1Loss, Final Loss: 0.2750014364719391
Loss Function: HingeEmbeddingLoss, Final Loss: 0.8345963358879089
Loss Function: HuberLoss, Final Loss: 0.27561065554618835
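
The final values above are measured on different scales, so they are not directly comparable: each loss function scores the same residuals differently. The sketch below evaluates four of the losses on one hand-picked set of residuals (0.5, 1, and 3). The names `pred` and `target` and their values are illustrative only.

```python
import torch
import torch.nn as nn

pred = torch.tensor([0.0, 0.0, 0.0])
target = torch.tensor([0.5, 1.0, 3.0])      # residuals of size 0.5, 1 and 3

mse = nn.MSELoss()(pred, target)            # (0.25 + 1 + 9) / 3
l1 = nn.L1Loss()(pred, target)              # (0.5 + 1 + 3) / 3
smooth = nn.SmoothL1Loss()(pred, target)    # quadratic for small residuals, linear for large ones
huber = nn.HuberLoss()(pred, target)        # same as SmoothL1Loss with default arguments

print("MSE:", mse.item())                   # about 3.4167, dominated by the residual of 3
print("L1:", l1.item())                     # 1.5
print("SmoothL1:", smooth.item())           # about 1.0417 = (0.125 + 0.5 + 2.5) / 3
print("Huber:", huber.item())               # about 1.0417
```

MSE penalizes the large residual far more heavily than the other losses do, which is why it is more sensitive to outliers.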


## 4. Learning Rate Exploration ##
Exercise:

Select 5 different learning rates (e.g., 0.1, 0.01, 0.001, 0.0001, 0.2).
For each learning rate:
1. Train the model using the same training data.
2. Record the final loss (the exercise suggests 50 epochs; the solution below trains for 150).
3. Compare the results for all learning rates:
   - Print the final loss for each learning rate.
   - Observe and explain how the learning rate affects convergence.


In [None]:
# List of learning rates to test
learning_rates = [0.1, 0.01, 0.001, 0.0001, 0.2]
final_losses = []

# Loop through each learning rate
for lr in learning_rates:
    print(f"\nTesting learning rate: {lr}")

    # Reinitialize the model and optimizer for each experiment
    model = nn.Sequential(
        nn.Linear(1, 1, bias=False)
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.MSELoss()

    # Train the model
    for ITER in range(150):
        model.train()
        output = model(x)
        loss = criterion(output, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Record the final loss
    final_losses.append((lr, loss.detach().item()))
    print(f"Final Loss for learning rate {lr}: {loss.detach().item()}")

# Print all results
print("\nLearning Rate Results:")
for lr, loss in final_losses:
    print(f"Learning Rate: {lr}, Final Loss: {loss}")



Testing learning rate: 0.1
Final Loss for learning rate 0.1: 0.5645161271095276

Testing learning rate: 0.01
Final Loss for learning rate 0.01: 0.5645161271095276

Testing learning rate: 0.001
Final Loss for learning rate 0.001: 1.4162936210632324

Testing learning rate: 0.0001
Final Loss for learning rate 0.0001: 16.821443557739258

Testing learning rate: 0.2
Final Loss for learning rate 0.2: 1704851456.0

Learning Rate Results:
Learning Rate: 0.1, Final Loss: 0.5645161271095276
Learning Rate: 0.01, Final Loss: 0.5645161271095276
Learning Rate: 0.001, Final Loss: 1.4162936210632324
Learning Rate: 0.0001, Final Loss: 16.821443557739258
Learning Rate: 0.2, Final Loss: 1704851456.0
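
One way to explain these results, under the assumption of full-batch gradient descent on the bias-free model with MSE loss (as in the cell above): the loss is a parabola in the single weight `w`, and each update multiplies the distance to the optimum by `1 - lr * curvature`, where the curvature is `2 * mean(x^2)`. The run diverges once that factor exceeds 1 in absolute value. The sketch below works this out for the tested learning rates; `x_lr`, `curvature`, and `lr_limit` are illustrative names.

```python
import torch

x_lr = torch.tensor([[-1.0], [0.0], [1.0], [2.0], [3.0], [4.0]])

# For loss(w) = mean((w*x - y)^2), the second derivative is 2 * mean(x^2) = 2 * 31 / 6
curvature = 2.0 * (x_lr ** 2).mean()
lr_limit = 2.0 / curvature                  # gradient descent diverges above this value

print("largest stable learning rate:", lr_limit.item())   # about 0.1935

for rate in [0.1, 0.01, 0.001, 0.0001, 0.2]:
    # each step multiplies the distance to the optimum by this factor
    factor = 1.0 - rate * curvature.item()
    verdict = "diverges" if abs(factor) > 1 else "converges"
    print(rate, round(factor, 4), verdict)
```

Read this way, 0.2 sits just above the limit and blows up, 0.1 and 0.01 shrink the error quickly enough to reach the 0.5645 plateau within 150 epochs, and 0.001 and 0.0001 have factors so close to 1 that they are still far from the optimum when training stops.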
