# PyTorch Progressive Exercises

1. [Introduction to PyTorch: Overview of PyTorch and its features](#1)

2. [Creating Tensors: Understanding tensors and their properties](#2)

3. [Tensor Manipulation: Reshaping tensors](#3)

4. [Autograd: Automatic Differentiation: Understanding the autograd system](#4)

5. [Building a Simple Neural Network: Understanding the architecture of neural networks](#5)

6. [Loss Functions and Optimizers: Understanding loss functions (e.g., Mean Squared Error, Cross-Entropy)](#6)

7. [Training the Neural Network: Preparing the training loop](#7)

8. [Introduction to the Iris Dataset: Overview of the Iris dataset and its features](#8)

9. [Classifying the Iris Dataset: Adapting the neural network to handle the Iris dataset](#9)

10. [Evaluating the Model: Making predictions on the test set](#10)

Summary: Recap of key concepts learned throughout the tutorial

Additional resources for further learning

## Prelude

### Machine learning versus deep learning

The main difference between machine learning (ML) and deep learning (DL) lies in how much of the model's structure is specified by hand. In traditional machine learning, algorithms such as decision trees or support vector machines follow a fixed, well-defined procedure to make predictions from the data they are trained on. These models rely heavily on feature engineering, where humans decide which features are most relevant for the task at hand.

In deep learning, especially with neural networks, the model is more loosely defined and flexible. Instead of relying on hand-picked features, neural networks learn complex patterns and representations directly from the data. The architecture (the number and size of the layers) is chosen before training; what changes during training is the set of weights and biases, which the model adjusts to minimize errors. This allows deep learning to excel in tasks where the patterns are not easily defined or where large amounts of unstructured data, like images or text, need to be processed.

Essentially, deep learning involves more data-driven learning with less reliance on explicit human-defined instructions compared to traditional machine learning approaches.

### Neural Network

A neural network, like the human brain, can process various types of data, such as images, text, and sounds. It learns patterns through layers of connected units called neurons. These layers transform data step by step, enabling the network to recognize images, understand language, or make predictions. As the network trains, it improves its performance, making it adaptable to different tasks.

Each neuron holds an intermediate value, called an activation, which is the output of the neuron after processing its inputs. This activation is passed to the next layer of neurons. The activation function is the mathematical function applied to the neuron’s weighted input (plus bias) to introduce non-linearity, allowing the network to learn complex patterns. Common activation functions include ReLU, sigmoid, and tanh.
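
As a small illustration of the activation functions named above, the sketch below applies ReLU, sigmoid, and tanh to the same handful of values. The input values are arbitrary and chosen only for illustration.

```python
import torch

# Arbitrary pre-activation values (weighted input plus bias), for illustration only
z = torch.tensor([-2.0, 0.0, 2.0])

relu_out = torch.relu(z)        # negative values become 0, positive values pass through
sigmoid_out = torch.sigmoid(z)  # squashes every value into the range (0, 1)
tanh_out = torch.tanh(z)        # squashes every value into the range (-1, 1)

print("ReLU:   ", relu_out)
print("Sigmoid:", sigmoid_out)
print("Tanh:   ", tanh_out)
```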

The width of the network refers to the number of neurons in each layer, while the depth refers to the number of layers. Together, width and depth determine the network's complexity and its capacity to learn from data.


### Object-Oriented Design in PyTorch

PyTorch adopts an object-oriented design pattern, making it flexible and user-friendly for building neural networks. By defining models as classes that inherit from `nn.Module`, users can take advantage of inheritance and method overriding. This structure allows for easy customization and extension of models.

For instance, in the code example in [section 9](#9), the class `IrisNN` inherits from `nn.Module`, which provides access to built-in methods and features like parameter management. The `__init__` method defines the layers (fully connected layers in this case), while the `forward()` method is overridden to specify the flow of data through the network during the forward pass. This encapsulation of functionality within classes not only makes the code clean and modular but also allows for easy reuse and modification.

This object-oriented approach aligns with PyTorch's dynamic computation graph, meaning users can change the architecture or computations on the fly, unlike more static frameworks.

By separating the model architecture, loss function, and optimizer, PyTorch enables fine control over every aspect of training. The explicit use of `train()` and `eval()` methods ensures that training-specific behaviors like dropout are correctly managed.
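
To make the effect of `train()` and `eval()` concrete, here is a minimal sketch using a standalone `nn.Dropout` layer. The layer, the dropout probability, and the input of ones are assumptions chosen for illustration; they are not part of the models built later in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random dropout mask is repeatable

dropout = nn.Dropout(p=0.5)  # zeroes each element with probability 0.5 in training mode
x = torch.ones(8)

dropout.train()              # training mode: dropout is active
train_out = dropout(x)       # surviving elements are scaled by 1 / (1 - p) = 2

dropout.eval()               # evaluation mode: dropout passes the input through unchanged
eval_out = dropout(x)

print("training mode:", train_out)
print("eval mode:    ", eval_out)
```

In training mode each element is either dropped (0) or scaled up (2), while in evaluation mode the output equals the input.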

Moreover, PyTorch’s design makes it intuitive for debugging, as it feels like writing regular Python code. The combination of inheritance, method overriding, and the dynamic nature of PyTorch allows users to create complex models while keeping the code easy to manage and extend.

This flexibility is one of the reasons PyTorch is widely regarded as highly accessible for both beginners and advanced users.

***

### <a id="1">1. Introduction to PyTorch</a>

In [None]:
# Install PyTorch (run this in your terminal, not in Python code)
# pip install torch torchvision

### <a id="2">2. Creating Tensors</a>

In [None]:
import torch

# Create a 1D tensor
tensor_1d = torch.tensor([1, 2, 3])
print("1D Tensor:", tensor_1d)

# Create a 2D tensor
tensor_2d = torch.tensor([[1, 2], [3, 4]])
print("2D Tensor:", tensor_2d)

# Create a 3D tensor
tensor_3d = torch.tensor([[[1], [2]], [[3], [4]]])
print("3D Tensor:", tensor_3d)

1D Tensor: tensor([1, 2, 3])
2D Tensor: tensor([[1, 2],
        [3, 4]])
3D Tensor: tensor([[[1],
         [2]],

        [[3],
         [4]]])
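
Beyond its values, a tensor carries a few properties that matter later on, such as its shape and element type. The following sketch inspects them for the 3D tensor created above; the conversion to `float32` at the end is an added illustration, not something the original cell does.

```python
import torch

tensor_3d = torch.tensor([[[1], [2]], [[3], [4]]])

print("shape:", tensor_3d.shape)    # size along each dimension
print("ndim: ", tensor_3d.ndim)     # number of dimensions
print("dtype:", tensor_3d.dtype)    # element type; Python ints become 64-bit integers
print("numel:", tensor_3d.numel())  # total number of elements

# Neural network layers work with floating-point tensors by default
float_tensor = tensor_3d.to(torch.float32)
print("dtype after conversion:", float_tensor.dtype)
```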


### <a id="3">3. Tensor Manipulation</a>

In [None]:
# Reshape tensor
reshaped_tensor = tensor_2d.view(4, 1)
print("Reshaped Tensor:", reshaped_tensor)

# Slicing and indexing
slice_tensor = tensor_2d[:, 1]
print("Sliced Tensor:", slice_tensor)

# Concatenating tensors
tensor_a = torch.tensor([[1, 2], [3, 4]])
tensor_b = torch.tensor([[5, 6]])
concat_tensor = torch.cat((tensor_a, tensor_b), dim=0)
print("Concatenated Tensor:", concat_tensor)

Reshaped Tensor: tensor([[1],
        [2],
        [3],
        [4]])
Sliced Tensor: tensor([2, 4])
Concatenated Tensor: tensor([[1, 2],
        [3, 4],
        [5, 6]])
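
One detail worth knowing about `view()` is that it only works when the tensor's memory layout allows it. The sketch below, which is an addition to the exercise rather than part of it, shows `view()` failing on a transposed tensor and `reshape()` handling the same case.

```python
import torch

tensor_2d = torch.tensor([[1, 2], [3, 4]])

# -1 asks PyTorch to infer that dimension from the number of elements
flat = tensor_2d.view(-1)
print("Flattened:", flat)

# A transpose shares the same data but is no longer laid out contiguously
transposed = tensor_2d.t()
print("Contiguous after transpose?", transposed.is_contiguous())

try:
    transposed.view(4)          # view cannot reinterpret non-contiguous memory
    view_failed = False
except RuntimeError:
    view_failed = True
print("view() failed on the transpose:", view_failed)

# reshape() copies the data when a view is not possible
reshaped = transposed.reshape(4)
print("reshape() result:", reshaped)
```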


### <a id="4">4. Autograd: Automatic Differentiation</a>

*Purpose:* Compute the derivative y' of a function y and evaluate it at a given value of x.

*Concept:*
NEURAL NETWORK: A neural network correlates input and output through the training process, which involves iterative forward and backward propagation, along with weight adjustments. A layer in a neural network consists of a set of neurons that perform specific transformations on the input data using weights.

For example, consider a simple neural network designed to predict housing prices based on features like size and number of bedrooms. In forward propagation, the network processes input data (like a house's size of 1500 square feet and 3 bedrooms) to produce a predicted price (e.g., \$300,000). This predicted price is then compared to the actual price (e.g., \$350,000) to calculate a loss, which quantifies the prediction error: the difference between the actual and predicted values (\$50,000). A lower loss indicates a more accurate correlation between input and output.

During backward propagation, it is essential to compute the gradients of the loss with respect to the weights. In this context, weights can be thought of as scale factors that determine how much influence each input feature has on the output. If we think of a simple linear correlation like y=scale×x, the "scale" represents the weight. The gradient indicates how much the loss will change with a slight adjustment to the weight. For example, a gradient value of −0.1 suggests that increasing the weight slightly will decrease the loss, guiding the model toward better predictions.

The relationship used to correlate inputs to each layer in a neural network is typically represented as: y=weight×x+bias.

This equation includes both a weight (scale factor) and a bias (offset), allowing the model to fit the data more flexibly. The bias term helps the model make predictions even when the input is zero.
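
The relationship above can be checked directly against PyTorch's fully connected layer. This is a minimal sketch: the input values (a house size in thousands of square feet and a bedroom count) are hypothetical, and the weights are whatever random values `nn.Linear` initializes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # repeatable random initial weights

layer = nn.Linear(in_features=2, out_features=1)

# Hypothetical input: house size (in thousands of square feet) and bedrooms
x = torch.tensor([[1.5, 3.0]])

# What the layer computes internally: x times the transposed weights, plus bias
manual = x @ layer.weight.T + layer.bias
builtin = layer(x)

print("weight:", layer.weight.data)
print("bias:  ", layer.bias.data)
print("manual: ", manual)
print("builtin:", builtin)
```

With two inputs, the layer holds one weight per input plus a single bias, so `y = weight × x + bias` becomes a weighted sum of both features.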

While the core of each layer operates on linear equations, neural networks also incorporate non-linear activation functions (like ReLU, sigmoid, or tanh) applied after these linear transformations. These activation functions can be seen as conditional functions that transform input values based on certain conditions, create non-linear mappings, and impose constraints on the output. ReLU, sigmoid, and tanh add no trainable parameters of their own; instead, the non-linearity they introduce is what lets stacked layers learn patterns that a single linear equation cannot represent.

Overall, a neural network can be viewed as a network of connected intermediate values, where each connection involves linear equations adjusted by weights and biases. The ultimate goal of the training process is to find a set of weights and biases that correlate the input (features like size and number of bedrooms) and output (predicted price) as precisely as possible, achieving minimal loss. The effectiveness of this correlation is measured by the loss itself, making it a critical metric for evaluating the network's performance. Ultimately, minimizing the loss leads to improved accuracy in the network's predictions.

GRADIENTS: In PyTorch, gradients are essential for the backpropagation process used in training neural networks.

When a tensor is created with `requires_grad=True`, PyTorch tracks all operations involving that tensor, allowing for automatic differentiation. This capability enables the computation of gradients when the `.backward()` method is called, typically on a loss tensor after a forward pass.

Backpropagation itself is a systematic method for updating the weights of a neural network to minimize the error between predicted outputs and actual targets. It involves two main steps:

- first, the forward pass calculates the output and loss;
- second, the backward pass computes the gradients of the loss with respect to each weight using the chain rule of calculus.

These gradients indicate how much each weight should be adjusted to reduce the loss. By repeatedly performing this process across multiple epochs, the network learns to improve its predictions by optimizing its weights based on the computed gradients, ultimately leading to better performance on the given task.
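
The forward pass, backward pass, and weight adjustment described above can be sketched by hand for a single weight. The model `prediction = w * x`, the numbers, and the learning rate of 0.1 are all assumptions picked so the arithmetic is easy to follow.

```python
import torch

# Hypothetical one-weight model: prediction = w * x
w = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(2.0)
target = torch.tensor(6.0)   # the ideal weight would be 3.0

# Forward pass: compute the prediction and the squared-error loss
loss = (w * x - target) ** 2          # (2 - 6)^2 = 16

# Backward pass: d(loss)/dw = 2 * (w*x - target) * x = 2 * (-4) * 2 = -16
loss.backward()
grad = w.grad.item()

# Weight update: step against the gradient, without tracking the update itself
learning_rate = 0.1
with torch.no_grad():
    w -= learning_rate * w.grad       # 1.0 - 0.1 * (-16) = 2.6
w.grad.zero_()                        # clear the gradient before the next step

new_loss = ((w * x - target) ** 2).item()   # (5.2 - 6)^2 = 0.64
print("gradient:", grad)
print("updated weight:", w.item())
print("loss before:", loss.item(), "loss after:", new_loss)
```

The negative gradient says that increasing the weight lowers the loss, and a single step moves the weight from 1.0 toward the ideal value of 3.0.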

In [None]:
# Create a tensor and track gradients
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 2*x + 1

# Backward pass
y.backward()
print("Gradient:", x.grad)

Gradient: tensor(6.)
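
The printed gradient can be verified by differentiating the function by hand and substituting x = 2:

$$
y = x^2 + 2x + 1, \qquad \frac{dy}{dx} = 2x + 2, \qquad \left.\frac{dy}{dx}\right|_{x=2} = 2(2) + 2 = 6
$$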


### <a id="5">5. Building a Simple Neural Network</a>

Let's create a simple neural network called SimpleNN. The network has two parts: an input layer with 2 neurons (or inputs) and an output layer with 1 neuron (or output).

It uses a fully connected (linear) layer, meaning all inputs are connected to the output. The forward function takes an input x (which would have two values) and applies the linear transformation to produce a single output.

The network essentially learns a weighted sum of the two inputs plus a bias, and adjusts these weights and bias during training to improve its predictions.

In [None]:
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc = nn.Linear(2, 1)  # Input layer: 2 neurons, output layer: 1 neuron

    def forward(self, x):
        return self.fc(x)

# Initialize the network
model = SimpleNN()
print(model)

SimpleNN(
  (fc): Linear(in_features=2, out_features=1, bias=True)
)


### <a id="6">6. Loss Functions and Optimizers</a>

In this code, two key parts of training a neural network are set up: the loss function and the optimizer. The loss function, `nn.MSELoss()`, measures how well the model's predictions match the actual values by calculating the average of the squared differences. Bigger errors are punished more since the differences are squared, helping the model learn to make better predictions.

The optimizer is `optim.SGD`, which uses the Stochastic Gradient Descent method to update the model's parameters (the weights and biases). The parameters are the values the network adjusts to improve accuracy. The optimizer changes these values based on how much the predictions deviate from the actual values, with the goal of reducing the error.

The argument `model.parameters()` tells the optimizer which parts of the model to update. The learning rate, set to `0.01`, controls how big each adjustment is. A smaller rate like this ensures the updates are slow and careful, so the model gradually improves without jumping too far from the correct solution.

Together, the loss function and optimizer work to guide the network's learning and improve its accuracy over time.

In [None]:
import torch.optim as optim

# Define loss function and optimizer
criterion = nn.MSELoss()  # Mean Squared Error
optimizer = optim.SGD(model.parameters(), lr=0.01)
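
To see what these two objects compute, the sketch below reproduces both by hand on made-up numbers: the MSE as an average of squared differences, and a plain SGD step as `parameter - lr * gradient`. The tensors and the single parameter `w` are hypothetical and separate from the `model` defined earlier.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Mean Squared Error: average of the squared differences
predictions = torch.tensor([2.0, 4.0])
targets = torch.tensor([1.0, 2.0])
mse_builtin = nn.MSELoss()(predictions, targets)
mse_manual = ((predictions - targets) ** 2).mean()   # (1 + 4) / 2 = 2.5
print("MSE:", mse_builtin.item(), mse_manual.item())

# Plain SGD update: parameter <- parameter - lr * gradient
w = torch.tensor([1.0], requires_grad=True)   # hypothetical single parameter
sgd = optim.SGD([w], lr=0.01)

loss = ((3.0 * w) ** 2).sum()   # gradient is 18 * w = 18 at w = 1
loss.backward()
expected = w.item() - 0.01 * w.grad.item()    # 1 - 0.01 * 18 = 0.82
sgd.step()
print("after step:", w.item(), "expected:", expected)
```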

### <a id="7">7. Training the Neural Network</a>

This code sets up and trains a neural network to mimic the behavior of an XOR logic gate. The XOR problem is a classic example in machine learning where the output is 1 if one, and only one, of the inputs is 1, and the output is 0 otherwise. The input data consists of four pairs of values: [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], and [1.0, 1.0]. These pairs represent all possible input combinations for the XOR gate. The corresponding labels define the correct outputs for these inputs: 0 for [0.0, 0.0] and [1.0, 1.0], and 1 for [0.0, 1.0] and [1.0, 0.0].

The training loop runs for 100 epochs, where each epoch represents one pass through the entire dataset. At the start of each epoch, the model is set to training mode using `model.train()`, preparing it to adjust its parameters. The gradients are reset with `optimizer.zero_grad()` to ensure that gradients from the previous step don't accumulate into the current one. The model then takes the input data and produces its output based on its current parameters.

Next, the loss is calculated using the criterion (Mean Squared Error), which compares the model's output to the actual labels. This loss measures how far off the model's predictions are from the correct XOR outputs. After calculating the loss, `loss.backward()` computes the gradients, which the optimizer uses to update the model's weights and biases through `optimizer.step()`.

Finally, every 10 epochs, the current epoch number and the value of the loss are printed. This provides feedback on how well the network is learning. The loss decreases at first but then levels off near 0.25 instead of approaching 0: `SimpleNN` is a single linear layer, and XOR is not linearly separable, so the best this model can do is predict roughly 0.5 for every input. Learning XOR properly requires at least one hidden layer with a non-linear activation.

In [None]:
# Dummy data for training
data = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = torch.tensor([[0.0], [1.0], [1.0], [0.0]])  # XOR problem

# Training loop
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    outputs = model(data)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item()}')

Epoch 0, Loss: 1.1114723682403564
Epoch 10, Loss: 0.7044702172279358
Epoch 20, Loss: 0.49115628004074097
Epoch 30, Loss: 0.3792296350002289
Epoch 40, Loss: 0.32038646936416626
Epoch 50, Loss: 0.28934672474861145
Epoch 60, Loss: 0.27287906408309937
Epoch 70, Loss: 0.2640575170516968
Epoch 80, Loss: 0.2592557370662689
Epoch 90, Loss: 0.2565743923187256
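
The loss above settles near 0.25. The following sketch, which is an addition to the exercise, checks two things under stated assumptions: a least-squares solve shows that 0.25 is the best MSE any purely linear model can reach on XOR, and a small network with one hidden ReLU layer and hand-picked (not trained) weights reproduces XOR exactly.

```python
import torch
import torch.nn as nn

data = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

# Best possible linear fit: solve least squares for [w1, w2, bias]
A = torch.cat([data, torch.ones(4, 1)], dim=1)   # extra column of ones for the bias
solution = torch.linalg.lstsq(A, labels).solution
linear_mse = ((A @ solution - labels) ** 2).mean().item()
print("best linear weights and bias:", solution.flatten())
print("best linear MSE:", linear_mse)

# A 2-2-1 network with ReLU and hand-picked weights that represents XOR exactly:
# h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), output = h1 - 2 * h2
hidden = nn.Linear(2, 2)
output = nn.Linear(2, 1)
with torch.no_grad():
    hidden.weight.copy_(torch.tensor([[1.0, 1.0], [1.0, 1.0]]))
    hidden.bias.copy_(torch.tensor([0.0, -1.0]))
    output.weight.copy_(torch.tensor([[1.0, -2.0]]))
    output.bias.zero_()
    xor_out = output(torch.relu(hidden(data)))
print("hidden-layer network output:", xor_out.flatten())
```

The hand-picked weights only demonstrate that a hidden layer makes XOR representable; a trained network would find its own, different weights.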


### <a id="8">8. Introduction to the Iris Dataset</a>

This section prepares the Iris dataset for training a machine learning model. The code first loads the dataset, where `X` holds the features (four measurements per flower: sepal length, sepal width, petal length, and petal width) and `y` holds the labels (flower species). The features are then standardized with `StandardScaler`, so that each one has zero mean and unit variance. After that, the dataset is split into two parts: 80% for training and 20% for testing, with `random_state=42` making the split reproducible. This prepares the data for building and evaluating a model.

In [None]:
#!pip install scikit-learn

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Normalize the data
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
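
The cell above scales the full dataset before splitting it. A common alternative, sketched here as an assumption rather than as the notebook's method, is to split first and fit the scaler on the training split only, so that no test-set statistics influence preprocessing. The `_alt` and `_raw` variable names are hypothetical and chosen to avoid overwriting the variables used in the rest of the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Split the raw data first (same 80/20 split and seed as above)
X_train_raw, X_test_raw, y_train_alt, y_test_alt = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Learn the mean and standard deviation from the training split only
scaler_alt = StandardScaler()
X_train_alt = scaler_alt.fit_transform(X_train_raw)
X_test_alt = scaler_alt.transform(X_test_raw)   # reuse the training statistics

print("train shape:", X_train_alt.shape)   # 120 samples, 4 features
print("test shape: ", X_test_alt.shape)    # 30 samples, 4 features
print("feature names:", iris.feature_names)
```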

### <a id="9">9. Classifying the Iris Dataset</a>

Let's apply the dataset to a simple neural network model for classifying the Iris dataset.

The model, `IrisNN`, has two layers: the first layer (`fc1`) takes 4 input features (the attributes of the iris flowers) and connects to 10 hidden units. The second layer (`fc2`) outputs 3 values, corresponding to the three possible flower species. The `forward` function applies the ReLU activation to the output of the first layer, and then passes the result through the second layer to get the final output.

The loss function used is `CrossEntropyLoss()`, which is suitable for multi-class classification problems, and the optimizer is Adam, which updates the model's parameters to reduce the loss. The data, `X_train` and `y_train`, are converted to tensors for use with PyTorch.

The training loop runs for 100 epochs. In each epoch, the model computes its predictions for the training data, calculates the loss by comparing these predictions to the actual labels, computes the gradients of the loss, and updates the model's parameters using the optimizer. Every 10 epochs, the current loss is printed to monitor the training progress.

In [None]:
class IrisNN(nn.Module):
    def __init__(self):
        super(IrisNN, self).__init__()
        self.fc1 = nn.Linear(4, 10)  # 4 input features
        self.fc2 = nn.Linear(10, 3)   # 3 output classes

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Initialize the Iris model
model_iris = IrisNN()

# Define loss function and optimizer
criterion_iris = nn.CrossEntropyLoss()
optimizer_iris = optim.Adam(model_iris.parameters(), lr=0.01)

# Convert data to tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)

# Training loop for the Iris dataset
for epoch in range(100):
    model_iris.train()
    optimizer_iris.zero_grad()
    outputs = model_iris(X_train_tensor)
    loss = criterion_iris(outputs, y_train_tensor)
    loss.backward()
    optimizer_iris.step()
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item()}')

Epoch 0, Loss: 1.1678049564361572
Epoch 10, Loss: 0.8244679570198059
Epoch 20, Loss: 0.5571746230125427
Epoch 30, Loss: 0.4026944935321808
Epoch 40, Loss: 0.32383206486701965
Epoch 50, Loss: 0.26537033915519714
Epoch 60, Loss: 0.21246005594730377
Epoch 70, Loss: 0.16835327446460724
Epoch 80, Loss: 0.1347096562385559
Epoch 90, Loss: 0.11175578087568283


A low loss means the model's predictions are close to the actual values. The loss measures how far off the predictions are from the true outputs; it is related to accuracy but is not the same thing, since accuracy only counts whether the predicted class is correct.

The unit of loss depends on the specific loss function used. `CrossEntropyLoss()`, used here, has no physical unit: it represents how far the predicted probability distribution is from the true distribution. Mean Squared Error (MSE), by contrast, is expressed in the square of the units of the predicted quantity, though in machine learning the values are often treated as abstract numbers.

A loss of exactly 0 would mean the model predicts every output perfectly. This is possible in rare cases but generally unlikely, especially with noisy or imperfect real-world data, where a small but non-zero loss is more realistic.

A training loss that approaches zero can also be a sign of overfitting, where the model fits the training data closely but performs poorly on new, unseen data.
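
To show what the cross-entropy value represents, this sketch recomputes it by hand for two made-up samples. The logits and targets are hypothetical; the only assumption about the API is that `nn.CrossEntropyLoss` takes raw scores and integer class labels, as it does in the training loop above.

```python
import torch
import torch.nn as nn

# Hypothetical raw scores (logits) for 2 samples and 3 classes
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0]])
targets = torch.tensor([0, 2])   # integer class labels, as in y_train_tensor

builtin = nn.CrossEntropyLoss()(logits, targets)

# By hand: softmax turns logits into probabilities, then average the
# negative log-probability assigned to the correct class
probs = torch.softmax(logits, dim=1)
picked = probs[torch.arange(len(targets)), targets]
manual = -torch.log(picked).mean()

print("probabilities:\n", probs)
print("builtin loss:", builtin.item())
print("manual loss: ", manual.item())
```

The more probability the model assigns to the correct class, the closer the loss gets to 0.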

### <a id="10">10. Evaluating the Model</a>

Let's check how well the neural network performs on the *test data*. First, the test data (`X_test`) is converted into a tensor using PyTorch, so that it is in the format the model expects for evaluation.

Next, the model is put into evaluation mode with `model_iris.eval()`. This mode switches off training-only behaviors such as dropout, so the model gives stable, repeatable predictions. It does not by itself stop gradient tracking; that is the job of the `torch.no_grad()` block, which skips the bookkeeping needed for backpropagation and so saves memory and time during evaluation.

Inside the evaluation block, the model predicts the outputs for the test data. The `torch.max()` function is used to find the predicted class for each test sample by selecting the class with the highest output score. This helps the model decide which class label to assign to each input.

Finally, the accuracy is calculated by comparing the predicted labels with the actual test labels (`y_test`). It measures the percentage of correct predictions, giving a clear idea of how well the model performs on data it hasn’t seen before.

The accuracy is printed as a percentage, showing the overall performance of the model.

In [None]:
# Prepare test data
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)

# Evaluate the model
model_iris.eval()
with torch.no_grad():
    test_outputs = model_iris(X_test_tensor)
    _, predicted = torch.max(test_outputs, 1)

# Calculate accuracy
accuracy = (predicted.numpy() == y_test).mean()
print(f'Accuracy: {accuracy * 100:.2f}%')

Accuracy: 96.67%
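
The prediction and accuracy steps can be traced on a tiny hand-made example. The scores and labels below are hypothetical, chosen so that exactly one of the four samples is misclassified.

```python
import torch

# Hypothetical output scores for 4 samples and 3 classes
scores = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.1, 0.2, 0.9],
                       [1.2, 0.4, 0.3]])
true_labels = torch.tensor([0, 1, 2, 1])   # the last sample is misclassified

values, predicted = torch.max(scores, 1)   # highest score and its index per row
same_as_argmax = torch.equal(predicted, torch.argmax(scores, dim=1))

accuracy = (predicted == true_labels).float().mean().item()   # 3 of 4 correct
print("predicted classes:", predicted)
print("matches argmax:", same_as_argmax)
print(f"Accuracy: {accuracy * 100:.2f}%")
```

Since only the index of the highest score is needed, `torch.argmax(scores, dim=1)` gives the same predictions as the second return value of `torch.max`.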
