# Neural networks for regression and classification tasks with PyTorch

Artificial neural networks (ANNs) are used for a range of tasks, including regression (predicting continuous values) and classification (assigning categories). In this guide, we will explore how to build and train ANNs using PyTorch.

Neural networks in PyTorch consist of layers of interconnected neurons that process input data to make predictions. The main components of a neural network include:

- **Input Layer**: Receives the input data.
- **Hidden Layers**: Intermediate layers that learn patterns in the data.
- **Output Layer**: Outputs the final prediction.

#### Steps in building neural networks with PyTorch

1. Define the model architecture: Choose the number of layers and neurons.
2. Define the loss function and optimizer.
3. Train the model: Use training data to learn weights.
4. Evaluate the model: Assess performance on test data.
5. Make predictions: Use the trained model on new data.
6. Save and load the model.

In this guide, we'll build Feedforward Neural Networks (FFNN) for two types of tasks:
1. **Regression**: Predicting continuous values.
2. **Classification**: Categorizing data into discrete classes.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing, load_iris
from sklearn.model_selection import train_test_split

## 1. Neural network for regression
We will start by building a neural network to predict house prices using the California housing dataset. This dataset contains features related to housing prices in California. It includes attributes like median income, housing age, and average rooms per household, among others.

In [2]:
# Load the California housing dataset
housing = fetch_california_housing()
X = housing.data
y = housing.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)  # Convert to tensor and set data type to float
X_test = torch.tensor(X_test, dtype=torch.float32)  # Convert to tensor and set data type to float
y_train = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)  # Add a dimension for the output
y_test = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

### Step 1.1: Define the model architecture
Let's define a simple feedforward neural network for regression.

In [3]:
# Define the neural network class
class RegressionModel(nn.Module):
    def __init__(self):
        super(RegressionModel, self).__init__()
        self.fc1 = nn.Linear(in_features=X_train.shape[1], out_features=64)  # Input layer to first hidden layer with 64 neurons
        self.fc2 = nn.Linear(in_features=64, out_features=32)  # First hidden layer to second hidden layer with 32 neurons
        self.fc3 = nn.Linear(in_features=32, out_features=1)  # Second hidden layer to output layer with 1 neuron

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # Apply ReLU activation after first hidden layer
        x = torch.relu(self.fc2(x))  # Apply ReLU activation after second hidden layer
        x = self.fc3(x)  # Output layer (no activation needed for regression)
        return x

# Instantiate the model
model = RegressionModel()

**Explanation**:

- **Defining the model class**: In PyTorch, we define a neural network by creating a class that inherits from `nn.Module`. This class contains the architecture of the neural network. Inside this class, we define the layers in the `__init__` method and the forward pass in the `forward` method.
    - Using a class structure allows for better organization of the model's components. It encapsulates the network's architecture and behavior (i.e., the forward pass) within a single, reusable object. This is particularly useful for more complex models and custom operations.
    - By inheriting from `nn.Module`, the model class gains access to PyTorch’s underlying features, such as automatic differentiation, easy layer management, and built-in methods for training and evaluation.
    ```python
    class MyModel(nn.Module):
        def __init__(self):
            super(MyModel, self).__init__()
            # Define layers here

        def forward(self, x):
            # Define forward pass here
            return x
    ```
    - `super(MyModel, self).__init__()`: This line calls the constructor of the `nn.Module` class, initializing the model correctly.
    - The `__init__` method is where we define the layers of our model.
    - The `forward` method is where we define the computation that happens when we pass data through the model.

- **Defining layers**: Layers are typically defined in the `__init__` method in the model class. This is where we specify the network’s architecture by creating the layers as class attributes. The `__init__` method is the constructor of the class. When an instance of the model is created, this method is automatically called to initialize the network's layers. Each layer is represented as an instance of a PyTorch layer class from the `torch.nn` module (like `nn.Linear` for fully connected layers, `nn.Conv2d` for convolutional layers). Defining layers as attributes of the class (using `self`) makes them part of the model’s state, allowing PyTorch to track them for operations like backpropagation and parameter updates.
  ```python
  def __init__(self):
      super(MyModel, self).__init__()
      self.fc1 = nn.Linear(in_features, out_features)
      self.fc2 = nn.Linear(in_features, out_features)
  ```
    - The linear layers (`nn.Linear`) are fully connected layers (the equivalent of a dense layer in TensorFlow), in which each neuron is connected to every neuron in the previous layer. Each layer is assigned to an attribute of the class (for example, `self.fc1`). The `in_features` and `out_features` arguments specify the number of input and output units of the layer, respectively. The `in_features` of the first layer must match the number of features in the input data, and the `in_features` of every later layer must match the `out_features` of the layer before it.

- **Defining forward pass**: The `forward` method in the model class describes how the input data is transformed as it passes through the layers of the network. This method specifies the order in which layers are applied and the computation that occurs when we pass input data through the layers. The forward pass is separated into its own method to clearly define how input data is processed by the network. This structure makes it easy to modify or extend the forward pass without affecting the network’s architecture.
    - **Input transformation**: The `forward` method takes the input data (e.g., features of an image, a row of tabular data) and processes it through the network layers. Each layer performs specific computations, such as linear transformations or activations, to transform the data step-by-step. The order and type of transformations are defined in this method.
      ```python
      def forward(self, x):
          # Transformation through layers
          return x
      ```
      
  - **Layer applications**: Typically, data is passed through one or more layers. Each layer applies a specific operation defined in the `__init__` method, such as a linear transformation (`nn.Linear`), convolution (`nn.Conv2d`), or recurrent processing (`nn.LSTM`). The data is modified according to the layer’s function.
      ```python
      x = self.layer1(x)
      x = self.layer2(x)
      ```
      
  - **Activation functions**: Often, the output of a layer is passed through an activation function to introduce non-linearity into the model. Common activation functions include ReLU (`torch.relu`), Sigmoid (`torch.sigmoid`), and Tanh (`torch.tanh`). These functions help the network learn complex patterns by allowing the model to represent non-linear relationships.
       ```python
       x = torch.relu(self.layer1(x))
       ```

  - **Sequential processing**: The layers and activation functions are applied sequentially. The output of one layer or operation becomes the input to the next. This sequence of operations defines how the input is transformed into the final prediction or output.
      ```python
      x = torch.relu(self.layer1(x))  # Apply ReLU activation
      x = self.layer2(x)  # Apply second layer
      ```

  - **Output layer**: The final layer produces the output of the network. For regression tasks, this is typically a single linear layer without an activation function. For classification tasks, the layer usually returns raw scores (logits); a softmax can convert them to class probabilities, but it is left out of the model when the loss function (such as `nn.CrossEntropyLoss`) applies it internally.
      ```python
      x = self.output_layer(x)  # Output layer for final predictions
      ```

  - **Returning the output**: The `forward` method returns the final output after all transformations are applied. This output represents the network’s prediction or result, which is then used for loss calculation during training or for making predictions during inference.
      ```python
      return x
      ```

- **Creating and using the model**: Once our model class is defined, we can create an instance of it. This instance can then be used to perform tasks like training or making predictions.
    ```python
    model = MyModel()
    ```
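
As a rough illustration of what `nn.Module` keeps track of, the sketch below rebuilds the same 64/32/1 architecture and inspects it. The class name `TinyRegressor` and the `n_features` constructor argument are assumptions made for this example (the guide's `RegressionModel` reads the input size from `X_train` instead); passing the size explicitly keeps the class usable when the training data is not in scope.

```python
import torch
import torch.nn as nn


class TinyRegressor(nn.Module):
    """Hypothetical stand-in for RegressionModel with an explicit input size."""

    def __init__(self, n_features):
        super().__init__()
        self.fc1 = nn.Linear(n_features, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)


model = TinyRegressor(n_features=8)  # the California housing data has 8 features

# Layers assigned to self are registered automatically, so their
# weights and biases show up in named_parameters().
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # (8*64 + 64) + (64*32 + 32) + (32*1 + 1) = 2689

# Calling the model runs forward(); a batch of 4 rows gives 4 predictions.
batch = torch.randn(4, 8)
out = model(batch)
print(out.shape)  # torch.Size([4, 1])
```

Note that `nn.Linear` stores its weight with shape `(out_features, in_features)`, which is why `fc1.weight` is printed as `(64, 8)`.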

### Step 1.2: Define the loss function and optimizer
After defining the model, we need to specify how it will learn by defining the loss function and the optimizer.

In [4]:
# Define loss function and optimizer
criterion_mse = nn.MSELoss()  # Mean squared error loss for regression
criterion_mae = nn.L1Loss()   # Mean absolute error loss for regression

optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer

**Explanation**:

- **Loss function**: Measures how far the model’s predictions are from the actual values. In PyTorch, loss functions are provided by the `torch.nn` module. The choice of loss function depends on the type of problem we are solving:
    - MSE loss (`nn.MSELoss()`): MSE is commonly used for regression tasks where we predict continuous values, measuring the average squared difference between actual and predicted values.
    - MAE loss (`nn.L1Loss()`): MAE is also used for regression tasks, calculating the average absolute difference between actual and predicted values. It is less sensitive to outliers compared to MSE.
    - Cross-entropy loss (`nn.CrossEntropyLoss()`): Used for multi-class classification problems. It combines log-softmax and negative log-likelihood loss in one function, so it expects raw logits rather than probabilities. It measures the difference between the predicted probability distribution and the true distribution.
    - Binary cross-entropy loss (`nn.BCELoss()`): Used for binary classification problems where the output is a single probability (typically produced by a sigmoid) for the positive class. It penalizes predicted probabilities that are far from the actual 0/1 label. When the model outputs raw logits instead, `nn.BCEWithLogitsLoss()` applies the sigmoid internally.

- **Optimizer**: The optimizer uses the gradients (which are calculated using backpropagation) to adjust the parameters in a way that minimizes the loss. PyTorch provides several optimizers in the `torch.optim` module. Some of the most commonly used ones are:
    - Stochastic gradient descent (`optim.SGD`): A basic optimization algorithm that updates model parameters based on the gradient of the loss function. In SGD, parameters are updated by taking small steps in the direction that decreases the loss, proportional to the gradient of the loss with respect to the parameters.
    - Adam (`optim.Adam`): An adaptive optimizer that combines momentum with RMSProp-style scaling, adjusting the step size for each parameter individually based on running estimates of the first and second moments of the gradients.

  Both optimizers take:
    - Model parameters (`model.parameters()`): This method returns all the learnable parameters of the model (i.e., the weights and biases of the network), so that the optimizer knows which variables to update during training.
    - Learning rate (`lr`): This hyperparameter controls how much the weights are adjusted during each optimization step.
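
The difference between the two regression losses, and what a single optimizer step does, can be checked on a few hand-picked numbers. This is a minimal sketch with made-up values (the tensors and the toy loss `w ** 2` are assumptions for illustration, not part of the housing example):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy predictions and targets; the third prediction is far off (an outlier).
preds = torch.tensor([2.0, 3.0, 10.0])
targets = torch.tensor([2.5, 3.0, 4.0])

mse = nn.MSELoss()(preds, targets)  # (0.25 + 0 + 36) / 3
mae = nn.L1Loss()(preds, targets)   # (0.5 + 0 + 6) / 3
print(f"MSE: {mse.item():.4f}, MAE: {mae.item():.4f}")  # MSE: 12.0833, MAE: 2.1667

# One SGD step on loss = w**2, starting from w = 1.0 with lr = 0.1.
w = torch.tensor(1.0, requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = w ** 2
optimizer.zero_grad()
loss.backward()      # d(loss)/dw = 2 * w = 2.0
optimizer.step()     # w <- w - lr * grad = 1.0 - 0.1 * 2.0
print(w.item())      # approximately 0.8
```

The single outlier dominates the MSE because its error is squared, while it contributes only linearly to the MAE.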

### Step 1.3: Train the model
Now, we will train the model using our training data.

In [5]:
# Prepare the DataLoader
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)

# Training loop
num_epochs = 20

for epoch in range(num_epochs):
    for inputs, targets in train_loader:
        # Forward pass
        outputs = model(inputs)
        loss = criterion_mse(outputs, targets) # Calculate mse loss
        mae_metric = criterion_mae(outputs, targets) # Calculate mae metric

        # Backward pass and optimization
        optimizer.zero_grad()  # Clear gradients
        loss.backward()  # Compute gradients
        optimizer.step()  # Update weights

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}, MAE: {mae_metric.item():.4f}')

Epoch [1/20], Loss: 0.2949, MAE: 0.4321
Epoch [2/20], Loss: 0.3609, MAE: 0.4798
Epoch [3/20], Loss: 0.2324, MAE: 0.3626
Epoch [4/20], Loss: 0.2797, MAE: 0.4249
Epoch [5/20], Loss: 0.2151, MAE: 0.3183
Epoch [6/20], Loss: 0.4589, MAE: 0.4458
Epoch [7/20], Loss: 0.3397, MAE: 0.4017
Epoch [8/20], Loss: 0.3328, MAE: 0.3763
Epoch [9/20], Loss: 0.2225, MAE: 0.3703
Epoch [10/20], Loss: 0.3482, MAE: 0.3989
Epoch [11/20], Loss: 0.1467, MAE: 0.2827
Epoch [12/20], Loss: 0.2593, MAE: 0.3874
Epoch [13/20], Loss: 0.1358, MAE: 0.2801
Epoch [14/20], Loss: 0.3271, MAE: 0.3946
Epoch [15/20], Loss: 0.1647, MAE: 0.2790
Epoch [16/20], Loss: 0.2150, MAE: 0.3677
Epoch [17/20], Loss: 0.1741, MAE: 0.3077
Epoch [18/20], Loss: 0.6339, MAE: 0.4920
Epoch [19/20], Loss: 0.3624, MAE: 0.4298
Epoch [20/20], Loss: 0.2121, MAE: 0.3409


**Explanation**:

- **Data preparation**: Before training a neural network in PyTorch, we need to prepare our data. This often involves splitting the data into batches and shuffling it so that the model learns effectively, which is typically handled with `TensorDataset` and `DataLoader`.
    - **TensorDataset**: This utility creates a dataset object that combines input data (features) with target data (labels). It is a simple way to handle datasets that can be represented as tensors which can be fed into a `DataLoader` and then into the model during training. `TensorDataset` is useful for creating a dataset when the data is already in tensor form.
        ```python
        train_dataset = TensorDataset(features, labels)
        ```
        
    - **DataLoader**: This is a PyTorch class that handles batching, shuffling, and loading data efficiently. It takes a dataset (like one created with `TensorDataset`) and returns batches of data during training. It handles the process of iterating through our dataset.
        ```python
        train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
        ```
        
        - **Batching** (`batch_size`): Data is split into smaller batches of a defined size (e.g., `batch_size=32`). This allows the model to train on smaller chunks of data rather than the entire dataset at once, which is more efficient and helps with generalization. A batch size of 32 means that 32 samples are processed before the model's weights are updated.
        - **Shuffling** (`shuffle`): Setting `shuffle=True` randomizes the order of data at the beginning of each epoch, which helps prevent the model from learning patterns in the data order instead of the data itself.


- **The Training Loop** (`for epoch in range(num_epochs)`): The training loop is where the model learns from the data by iterating through it multiple times (epochs) and adjusting its weights to minimize the loss. In PyTorch, we need to explicitly define the training loop. This is a key difference from TensorFlow’s Keras API, where methods like `model.fit()` abstract away the details of the training process.
    - **Number of epochs**: Training is usually done over several epochs. An epoch is one complete pass through the entire training dataset. We usually train for multiple epochs to allow the model to improve its predictions gradually.
    - **Iterating over batches** (`for inputs, targets in train_loader:`): For each epoch, the data is divided into batches, and the model processes one batch at a time. PyTorch provides easy iteration over these batches using the DataLoader, which gives you batches of input data (`inputs`) and their corresponding targets (`targets`).
        - **inputs, targets**: These variables represent the features and labels for the current batch. `train_loader` automatically handles fetching batches and feeding them into the loop.
    - **Forward pass** (`outputs = model(inputs)`): This step passes the inputs through the model to generate predictions. Calling `model(inputs)` automatically invokes the model's `forward()` method, which produces outputs based on the current weights.
    - **Loss calculation** (`loss = criterion_mse(outputs, targets)`): After the forward pass, the loss (or error) is calculated by comparing the model’s predictions (`outputs`) with the actual targets (`targets`).
        - **criterion_mse**: This is the MSE loss function defined earlier, and it is the quantity being minimized. `criterion_mae` is computed on the same batch only as a monitoring metric; it is never backpropagated.
    - **Backward pass (backpropagation)**: After calculating the loss, we need to update the model's weights to reduce it. This is done through backpropagation, which computes the gradient of the loss with respect to each model parameter (i.e., how much a small change in that parameter changes the loss). These gradients are then used to update the weights. PyTorch requires that we trigger this process explicitly.
        ```python
        optimizer.zero_grad()  # Clear the old gradients
        loss.backward()  # Compute new gradients
        ```
        - **Clearing gradients** (`optimizer.zero_grad()`): clears the gradients from the previous iteration to avoid accumulation. In PyTorch, gradients accumulate by default, so they need to be reset before each new backpropagation step.
        - **Computing gradients** (`loss.backward()`): calculates the gradients of the loss with respect to the model parameters.
    - **Update weights** (`optimizer.step()`): Finally, the optimizer updates the model’s parameters based on the gradients computed in the backward pass. This step modifies the model’s weights in a way that minimizes the loss, using the specific optimization algorithm (like Adam or SGD) that was defined earlier.
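
To see why the gradients have to be cleared, the following sketch runs `backward()` twice on a single scalar parameter. The tensor `w` and the expression `3 * w` are assumptions chosen so the gradient is easy to verify by hand:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(3 * w)/dw = 3
(3 * w).backward()
first = w.grad.item()

# Second backward pass without clearing: the new gradient is added on top.
(3 * w).backward()
accumulated = w.grad.item()

# Reset the stored gradient by hand; optimizer.zero_grad() serves the same
# purpose for every parameter the optimizer manages.
w.grad.zero_()
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 3.0 6.0 3.0
```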

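It’s common to print the loss after each epoch to monitor the model’s progress. `loss.item()` retrieves the scalar value of the loss as a Python float. Note that the loop above prints the loss and MAE of the last batch of each epoch rather than an average over the whole epoch, which is why the printed values fluctuate from one epoch to the next.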
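
A smoother progress signal is the mean loss over all batches of an epoch. The sketch below shows one way to track it, using synthetic data and a single linear layer so that it runs on its own; the data, the learning rate of 0.05, and the names `running_loss` and `epoch_losses` are assumptions for this example rather than part of the housing model.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (an assumption for this sketch): y = X @ w + noise
X = torch.randn(256, 3)
true_w = torch.tensor([[1.5], [-2.0], [0.5]])
y = X @ true_w + 0.1 * torch.randn(256, 1)

loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
print(len(loader))  # 256 / 32 = 8 batches per epoch

model = nn.Linear(3, 1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.05)

epoch_losses = []
for epoch in range(5):
    running_loss = 0.0
    for inputs, targets in loader:
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Weight each batch loss by its size so the epoch value is a true mean.
        running_loss += loss.item() * inputs.size(0)

    epoch_losses.append(running_loss / len(loader.dataset))
    print(f"Epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```

Weighting by `inputs.size(0)` matters when the dataset size is not a multiple of the batch size, because the final batch is then smaller than the others.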


### Step 1.4: Evaluate the model
After training, we evaluate the model's performance on a test dataset.

In [6]:
# Evaluate the model on test data
model.eval()  # Set the model to evaluation mode
with torch.no_grad():  # Disable gradient computation
    predictions = model(X_test)
    test_loss = criterion_mse(predictions, y_test)
    mae_metric = criterion_mae(predictions, y_test)
    
print(f'Test loss: {test_loss.item():.4f}, MAE: {mae_metric.item():.4f}')

Test loss: 0.3004, MAE: 0.3822


**Explanation**:

- **Setting to evaluation mode** (`model.eval()`): First, we switch the model to evaluation mode. This is important because certain layers, like dropout and batch normalization, behave differently during training and evaluation. For example, dropout layers will be deactivated during evaluation to use the full network capacity. This method ensures that the model operates in a deterministic way, appropriate for making predictions on test data.
- **Disabling gradient computation** (`torch.no_grad()`): This is a context manager that disables gradient computation. During evaluation or inference we are not training the model, so gradients are not needed, and skipping them saves memory and speeds up computation.
    - We use it as a context manager (`with`) to ensure that gradient computation is disabled only for that specific block of code. Operations inside the block are not recorded for backpropagation, which avoids bugs where gradients are tracked when they shouldn’t be.
- **Making predictions on test data** (`predictions = model(X_test)`): Here, we pass the test features (`X_test`) through the model to get predictions. The model’s `forward()` method is called, processing the input data through the network layers to produce output predictions.
- **Calculating test loss** (`test_loss = criterion_mse(predictions, y_test)`): The loss on the test data is computed by comparing the model’s predictions to the actual test targets (`y_test`). `criterion_mse` is the MSE loss function defined earlier; the MAE is computed the same way with `criterion_mae`.
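
`model.eval()` and `torch.no_grad()` do different jobs, which a small experiment makes visible. The two-layer `nn.Sequential` with dropout below is an assumption for this sketch (the regression model in this guide has no dropout layer):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

layer.train()                 # training mode: dropout zeroes random activations
train_out = layer(x)          # each call draws a new random mask
print(train_out)

layer.eval()                  # evaluation mode: dropout is a no-op
eval_out_a = layer(x)
eval_out_b = layer(x)
print(torch.equal(eval_out_a, eval_out_b))  # True

print(eval_out_a.requires_grad)   # True: eval() alone does not stop gradient tracking

with torch.no_grad():
    no_grad_out = layer(x)
print(no_grad_out.requires_grad)  # False
```

In other words, `eval()` changes how layers behave, while `no_grad()` changes whether operations are recorded for backpropagation; evaluation code normally uses both.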


### Step 1.5: Make predictions
Finally, we can use the trained model to make predictions on new data.

In [7]:
# Make predictions
with torch.no_grad():
    predictions = model(X_test)
    
print(predictions[:5])

tensor([[0.4923],
        [1.0944],
        [4.1475],
        [2.6493],
        [2.9261]])


The explanations for this section are similar to those in the last section. Since we already set the model to evaluation mode using `model.eval()` in the previous section, we don’t need to set it again here. The model remains in evaluation mode unless explicitly switched back to training mode.
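
One practical detail when predicting on genuinely new data: the inputs need the same preprocessing as the training features. The following is a minimal sketch, assuming a hypothetical `predict` helper, an untrained placeholder model, and synthetic stand-in data. The mean and standard deviation are computed with plain tensor operations here, which mirrors what a fitted `StandardScaler` stores; in this guide, `scaler.transform` would perform that step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for raw training features (100 rows, 8 columns); not real housing data.
X_train_raw = torch.rand(100, 8) * 10

# Statistics learned from the training set only.
mean = X_train_raw.mean(dim=0)
std = X_train_raw.std(dim=0, unbiased=False)  # population std, as StandardScaler uses

# Untrained placeholder with the same input and output sizes as the guide's model.
model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)


def predict(model, raw_features):
    """Hypothetical helper: standardize raw rows, then run the model without gradients."""
    model.eval()
    scaled = (raw_features - mean) / std
    with torch.no_grad():
        return model(scaled)


new_rows = torch.rand(2, 8) * 10
preds = predict(model, new_rows)
print(preds.shape)  # torch.Size([2, 1])
```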


### Step 1.6: Save and load the model
Saving the trained model allows us to reuse it later without retraining.

In [8]:
# Save the model
torch.save(model.state_dict(), 'regression_model.pth')

# Load the model
loaded_model = RegressionModel()
loaded_model.load_state_dict(torch.load('regression_model.pth'))

<All keys matched successfully>

**Explanation**:
In PyTorch, saving and loading models is done using `torch.save()` and `torch.load()`. PyTorch typically saves the model's state dictionary, which contains the model's parameters.

- **Save model**: `torch.save()` is used here to save the model's state dictionary, which includes all the parameters (weights and biases) of the model, but not the model architecture itself.
    ```python
    torch.save(model.state_dict(), 'file_path.pth')
    ```
    
    - **`model.state_dict()`**: This method returns a dictionary containing the model's parameters. The dictionary maps each parameter's name (for example, `fc1.weight` or `fc1.bias`) to the tensor that holds its values.
    - **`'file_path.pth'`**: The file path where the state dictionary will be saved. The `.pth` extension is commonly used in PyTorch for saving models.
    - In TensorFlow, saving with `model.save()` saves the entire model, including the architecture, optimizer state, and weights. PyTorch separates these concerns, typically focusing on saving just the model parameters.
- **Load model**:
    ```python
    loaded_model = MyModel()
    loaded_model.load_state_dict(torch.load('file_path.pth'))
    ```
    
    - **`loaded_model = MyModel()`**: We first create an instance of the model class (in this case, `MyModel`), which builds the architecture with randomly initialized weights. In a clean environment, the class must be defined again with the same architecture that was used when saving, so that the saved weights can be loaded into the matching layers.
    - **`torch.load()`**: This function loads the saved state dictionary from the file. It essentially reads the parameters from the file and returns them as a dictionary.
    - **`loaded_model.load_state_dict()`**: This method loads the saved parameters (from the state dictionary) into the model instance. This restores the model's parameters to the state they were in when the model was saved.
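
A quick way to confirm that a save/load round trip worked is to compare the outputs of the original and the reloaded model on the same input. This sketch writes to a temporary directory so it leaves no files behind; the `build_model` helper, the small architecture, and the file name are assumptions for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # The same architecture must be rebuilt before loading the weights.
    return nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))


torch.manual_seed(0)
model = build_model()
x = torch.randn(3, 8)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_weights.pth")  # hypothetical file name
    torch.save(model.state_dict(), path)

    loaded_model = build_model()  # fresh instance with different random weights
    loaded_model.load_state_dict(torch.load(path))

loaded_model.eval()  # switch to evaluation mode before inference

state_keys = list(model.state_dict().keys())
print(state_keys)  # ['0.weight', '0.bias', '2.weight', '2.bias']

with torch.no_grad():
    same = torch.equal(model(x), loaded_model(x))
print(same)  # True
```

Calling `eval()` on the loaded model is a sensible habit, since a freshly constructed module starts in training mode.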
    
    
## 2. Neural network for classification

In this section, we will build a neural network to classify iris flowers into one of three species using the Iris dataset. This dataset contains features such as sepal length, sepal width, petal length, and petal width, which will be used to predict the species of the flower.

In [9]:
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)  # Convert labels to long for classification
y_test = torch.tensor(y_test, dtype=torch.long)

The process of loading and splitting the Iris dataset is similar to what we did for the regression task with the California Housing dataset. As in the regression task, we standardize the features and convert them into PyTorch tensors. The primary difference here is that the labels (`y_train` and `y_test`) are converted to the long (64-bit integer) data type, which `nn.CrossEntropyLoss` expects when targets are given as class indices.

### Step 2.1: Define the model architecture

We will define a simple feedforward neural network for classification.

In [10]:
# Define the neural network class
class ClassificationModel(nn.Module):
    def __init__(self):
        super(ClassificationModel, self).__init__()
        self.fc1 = nn.Linear(in_features=X_train.shape[1], out_features=64) # Input layer to first hidden layer with 64 neurons
        self.fc2 = nn.Linear(in_features=64, out_features=32) # First hidden layer to second hidden layer with 32 neurons
        self.fc3 = nn.Linear(in_features=32, out_features=3)  # Second hidden layer to output layer with 3 neurons (one for each class)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)  # Output layer (logits) - no activation here, as we will apply softmax in the loss function for classification
        return x

# Instantiate the model
model = ClassificationModel()

**Explanation**:

- **Defining the model class**: The structure of the model class and the process of defining layers in the `__init__` method and the forward pass in the `forward` method are similar to the regression model.
- **Output layer**: The key difference here is in the output layer (`fc3`). For classification, the number of output units corresponds to the number of classes (in this case, 3 for the Iris dataset). Unlike regression, where the output was a single continuous value, here we have multiple output units, each producing a raw score (logit) for one class.
- **Softmax activation**: Unlike in regression, where we used no activation or a linear activation at the output layer, classification tasks typically require converting the raw outputs (logits) into probabilities. This is often done using a softmax function. However, instead of applying softmax in the `forward` method, we let the loss function handle this, which is typical in PyTorch.
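
When actual probabilities are needed, for example to report a confidence alongside a prediction, softmax can be applied to the logits outside the model. A minimal sketch with hand-written logits (the values are assumptions for illustration):

```python
import torch

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

probs = torch.softmax(logits, dim=1)  # normalize across the class dimension
row_sums = probs.sum(dim=1)           # each row sums to 1 up to rounding
print(row_sums)

# Softmax preserves ordering, so the predicted class is the same either way.
print(logits.argmax(dim=1))  # tensor([0, 2])
print(probs.argmax(dim=1))   # tensor([0, 2])
```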

### Step 2.3: Define the loss function and optimizer
Next, we need to define how the model will learn by specifying the loss function and the optimizer.

In [11]:
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()  # Cross-entropy loss for multi-class classification
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer

**Explanation**:
- **Loss function**: Unlike regression, where we used MSE and MAE loss, classification tasks require a different approach. For multi-class classification, we use cross entropy loss (`nn.CrossEntropyLoss()`).
  - **Cross entropy loss**: This loss function measures the difference between the predicted class probabilities (after applying softmax internally) and the actual class labels. It expects raw logits (outputs from the model before softmax is applied). The smaller the cross-entropy loss, the closer the predicted probabilities are to the true labels. It combines two steps into one: applying the softmax activation function to the output logits to get probabilities, and then calculating the negative log-likelihood of these probabilities given the true labels.
- **Optimizer**: The choice of optimizer, like Adam (`optim.Adam`), remains the same as in regression.
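
The two steps folded into `nn.CrossEntropyLoss` can be reproduced explicitly to check that they give the same number. The logits and targets below are made-up values for this sketch:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.3, 1.2, 2.5]])
targets = torch.tensor([0, 2])  # class indices, dtype torch.int64 (long)

ce = nn.CrossEntropyLoss()(logits, targets)

# The same value built from its two parts.
log_probs = F.log_softmax(logits, dim=1)
nll = F.nll_loss(log_probs, targets)
print(torch.allclose(ce, nll))  # True

# By hand for the first sample only: -log(softmax(logits)[target]).
row = [2.0, 0.5, 0.1]
manual_first = -math.log(math.exp(row[0]) / sum(math.exp(v) for v in row))
first_sample = nn.CrossEntropyLoss()(logits[:1], targets[:1])
print(abs(first_sample.item() - manual_first) < 1e-5)  # True

print(targets.dtype)  # torch.int64
```

Because the loss already applies log-softmax, adding a softmax layer to the model's `forward` method as well would apply the normalization twice.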

### Step 2.4: Train the model
Now, we will train the model using the training data.

In [12]:
# Prepare the DataLoader
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)

# Training loop
num_epochs = 20

for epoch in range(num_epochs):
    for inputs, targets in train_loader:
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, targets)  # Calculate cross-entropy loss

        # Backward pass and optimization
        optimizer.zero_grad()  # Clear gradients
        loss.backward()  # Compute gradients
        optimizer.step()  # Update weights

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Epoch [1/20], Loss: 1.1545
Epoch [2/20], Loss: 1.0609
Epoch [3/20], Loss: 1.0571
Epoch [4/20], Loss: 0.9860
Epoch [5/20], Loss: 0.9736
Epoch [6/20], Loss: 0.9107
Epoch [7/20], Loss: 0.8850
Epoch [8/20], Loss: 0.7855
Epoch [9/20], Loss: 0.7750
Epoch [10/20], Loss: 0.6983
Epoch [11/20], Loss: 0.6373
Epoch [12/20], Loss: 0.6996
Epoch [13/20], Loss: 0.6634
Epoch [14/20], Loss: 0.5226
Epoch [15/20], Loss: 0.4867
Epoch [16/20], Loss: 0.5028
Epoch [17/20], Loss: 0.3194
Epoch [18/20], Loss: 0.3007
Epoch [19/20], Loss: 0.3168
Epoch [20/20], Loss: 0.3777


**Explanation**:

The steps here are similar to the regression task, where we use `DataLoader` to handle batching and shuffling, and the training loop consists of forward passes, loss calculations, and weight updates using backpropagation.
- **Loss calculation**: The key difference is in the loss calculation, where we use the cross-entropy loss instead of MSE.

### Step 2.5: Evaluate the model
After training, we evaluate the model’s performance on the test dataset.

In [13]:
# Evaluate the model
model.eval()  # Set the model to evaluation mode

with torch.no_grad():  # Disable gradient calculation for evaluation
    outputs = model(X_test)
    _, predicted = torch.max(outputs, 1)  # Get the class with the highest probability
    accuracy = (predicted == y_test).sum().item() / y_test.size(0)  # Calculate accuracy

print(f'Accuracy: {accuracy * 100:.2f}%')

Accuracy: 93.33%


**Explanation**:

- **Evaluation mode**: As with regression, we switch the model to evaluation mode using `model.eval()` to disable dropout and other training-specific behavior. We also disable gradient calculation with `torch.no_grad()`.
- **Prediction and accuracy**: For classification, after getting the model's outputs (logits), we use `torch.max(outputs, 1)` to get the index of the class with the highest score. Because softmax preserves the ordering of the logits, this is also the class with the highest probability. This index corresponds to the predicted class. The accuracy is then calculated by comparing these predictions to the true labels (`y_test`).
  - **torch.max**: Called with a dimension argument, this function finds the maximum value in each row of the output tensor (one logit per class) and returns both the value and its index. The index corresponds to the predicted class.
  - **Accuracy**: The accuracy metric is straightforward for classification tasks, representing the percentage of correctly classified samples.
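
The accuracy computation can be traced on a tiny batch where the answer is easy to verify by eye. The logits and labels here are made up for this sketch, with one deliberate misclassification in the third row:

```python
import torch

logits = torch.tensor([[2.0, 1.0, 0.0],
                       [0.0, 3.0, 1.0],
                       [1.0, 0.0, 4.0],
                       [5.0, 0.0, 0.0]])
labels = torch.tensor([0, 1, 1, 0])

values, predicted = torch.max(logits, 1)  # max value and its index per row
print(values)     # tensor([2., 3., 4., 5.])
print(predicted)  # tensor([0, 1, 2, 0])

# torch.argmax returns only the indices.
print(torch.equal(predicted, torch.argmax(logits, dim=1)))  # True

accuracy = (predicted == labels).sum().item() / labels.size(0)
print(f"Accuracy: {accuracy * 100:.2f}%")  # Accuracy: 75.00%
```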
  
### Step 2.6: Save and load the model

Saving the trained model allows us to reuse it later without retraining.

In [14]:
# Save the model
torch.save(model.state_dict(), 'classification_model.pth')

# Load the model
loaded_model = ClassificationModel()
loaded_model.load_state_dict(torch.load('classification_model.pth'))

<All keys matched successfully>

The process of saving and loading the model is the same as described in the regression section. The difference is the model class (`ClassificationModel`), which must match the architecture of the saved model. Once the model is loaded, it can be used for making predictions on new data or further fine-tuning.