## Simple Neural Network using PyTorch

This guide shows how to build and train a simple feedforward neural network in PyTorch. The network has one hidden layer with ReLU activation and is trained on randomly generated data, so it serves as a basic introduction to the PyTorch workflow rather than as a useful model.

### Imports and Setup

In [4]:
import torch
import torch.nn as nn
import torch.optim as optim

# Set random seed for reproducibility
torch.manual_seed(42)

<torch._C.Generator at 0x115398c30>

- torch: The core PyTorch library for tensor operations.
- torch.nn: Contains modules and classes for building neural networks.
- torch.optim: Provides optimization algorithms like Adam, SGD, etc.
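- torch.manual_seed(42): Seeds PyTorch's random number generator so that the weight initialization and the random data generated below are reproducible across runs. The call returns the generator object, which is why the cell prints `<torch._C.Generator at ...>`.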
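
To see what seeding buys you, the following minimal sketch re-seeds the generator and draws the same tensor twice. The names `a` and `b` are illustrative and not part of the original notebook.

```python
import torch

# Seeding resets PyTorch's default random number generator.
torch.manual_seed(42)
a = torch.randn(3)

# Re-seeding with the same value replays the same sequence of draws.
torch.manual_seed(42)
b = torch.randn(3)

print(torch.equal(a, b))  # True: both draws start from the same generator state
```

The same mechanism is what makes the initial weights and the dummy data in this guide repeatable, provided the random calls happen in the same order after seeding.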

### Neural Network Architecture

In [5]:
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.hidden = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.output = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.hidden(x)
        x = self.relu(x)
        x = self.output(x)
        return x

- SimpleNN: A class that defines the neural network architecture.
    - `__init__`: Initializes the network layers.
        - nn.Linear(input_size, hidden_size): Defines a fully connected layer from input to hidden layer.
        - nn.ReLU(): Applies the ReLU activation function.
        - nn.Linear(hidden_size, output_size): Defines a fully connected layer from hidden to output layer.
    - forward: Defines the forward pass of the network, specifying how data flows through the layers.
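
The data flow described above can be traced by checking tensor shapes at each stage. This is a small sketch with hypothetical sizes (4 inputs, 8 hidden units, 3 outputs, batch of 5) chosen only for illustration; the layers are created standalone instead of inside `SimpleNN`.

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen only for this illustration.
hidden = nn.Linear(4, 8)   # maps 4 input features to 8 hidden units
relu = nn.ReLU()           # element-wise max(0, x); leaves the shape unchanged
output = nn.Linear(8, 3)   # maps 8 hidden units to 3 outputs

x = torch.randn(5, 4)      # a batch of 5 samples with 4 features each
h = relu(hidden(x))        # hidden activations, all non-negative after ReLU
out = output(h)

print(x.shape, h.shape, out.shape)
```

Only the last dimension changes from layer to layer; the batch dimension passes through untouched, which is why the same model handles a batch of 100 samples or a single one.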

### Network Parameters and Instantiation

In [6]:
# Define network parameters
input_size = 10
hidden_size = 20
output_size = 2

# Create an instance of the neural network
model = SimpleNN(input_size, hidden_size, output_size)

- input_size: Number of input features.
- hidden_size: Number of neurons in the hidden layer.
- output_size: Number of output features.
- model: An instance of the SimpleNN class.
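
With these sizes the number of trainable parameters can be counted directly. The sketch below uses an `nn.Sequential` stand-in with the same layers as `SimpleNN` so that the block runs on its own; that substitution is an assumption made for brevity, not the notebook's code.

```python
import torch.nn as nn

input_size, hidden_size, output_size = 10, 20, 2

# Stand-in for SimpleNN with the same layers, written with nn.Sequential
# so that this block is self-contained.
model = nn.Sequential(
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, output_size),
)

# Each Linear layer holds a weight matrix and a bias vector.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

total = sum(p.numel() for p in model.parameters())
print(total)  # 10*20 + 20 + 20*2 + 2 = 262
```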

### Loss Function and Optimizer

In [7]:
# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

- criterion: The loss function used to measure the difference between the predicted and actual values. Here, Mean Squared Error (MSE) is used.
- optimizer: The optimization algorithm used to update the model's weights. Adam adapts the step size of each parameter using running averages of the gradients and their squares; `lr=0.01` sets the base learning rate.
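
As a quick check of what the criterion computes, the sketch below compares `nn.MSELoss` with the formula written out by hand. The `pred` and `target` tensors are made-up values used only for this comparison.

```python
import torch
import torch.nn as nn

# Hypothetical predictions and targets, used only for this check.
pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[1.5, 2.0], [2.0, 6.0]])

criterion = nn.MSELoss()  # default reduction="mean" averages over all elements
loss_builtin = criterion(pred, target)

# Squared differences are 0.25, 0, 1 and 4; their mean is 5.25 / 4.
loss_manual = ((pred - target) ** 2).mean()

print(loss_builtin.item(), loss_manual.item())  # both 1.3125
```

Note that the mean runs over every element, so with two outputs per sample the loss is averaged over both the batch and the output dimension.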

### Data Generation

In [8]:
# Generate some dummy data
X = torch.randn(100, input_size)
y = torch.randn(100, output_size)

- X: Randomly generated input data with 100 samples, each having `input_size` features.
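- y: Randomly generated target data with 100 samples, each having `output_size` features. Because y is drawn independently of X, there is no real relationship to learn; the data only exercises the training code.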
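
A useful reference point for the loss values printed later is the error of the best constant prediction. The sketch below is an illustration under stated assumptions: it draws its own targets with an arbitrary seed and computes that baseline, which for standard-normal targets lands near 1.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for this illustration
y = torch.randn(100, 2)

# Under MSE, the best constant prediction is the per-column mean of the targets.
baseline_pred = y.mean(dim=0, keepdim=True).expand_as(y)
baseline_mse = ((y - baseline_pred) ** 2).mean()

# For targets drawn from a standard normal this is roughly the variance, i.e. about 1.
print(baseline_mse.item())
```

Since the inputs carry no information about the targets, a training loss well below this baseline most likely reflects the network memorizing the 100 training samples, not learning something that generalizes.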

### Training Loop

In [9]:
# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X)
    loss = criterion(outputs, y)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print progress
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Epoch [10/100], Loss: 0.7376
Epoch [20/100], Loss: 0.6285
Epoch [30/100], Loss: 0.5232
Epoch [40/100], Loss: 0.4220
Epoch [50/100], Loss: 0.3297
Epoch [60/100], Loss: 0.2574
Epoch [70/100], Loss: 0.2041
Epoch [80/100], Loss: 0.1678
Epoch [90/100], Loss: 0.1422
Epoch [100/100], Loss: 0.1259


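- num_epochs: Number of times the entire dataset is passed through the network. Because the whole dataset is fed as a single batch here, each epoch performs exactly one parameter update.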
- Forward pass: Computes the output of the network.
- Loss computation: Calculates the loss using the criterion.
- Backward pass: Computes the gradient of the loss with respect to the model parameters.
- Optimizer step: Updates the model parameters based on the computed gradients.
- Progress printing: Outputs the loss every 10 epochs to monitor training progress.
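
The steps listed above can also be packaged into a reusable function that records the loss at every epoch. This is one possible sketch: the `train` helper and the `nn.Sequential` stand-in for `SimpleNN` are assumptions introduced here so the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(model, X, y, num_epochs=100, lr=0.01):
    """Full-batch training; returns the loss recorded at every epoch."""
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    losses = []
    for _ in range(num_epochs):
        outputs = model(X)            # forward pass
        loss = criterion(outputs, y)  # loss computation
        optimizer.zero_grad()         # clear gradients from the previous step
        loss.backward()               # backward pass
        optimizer.step()              # parameter update
        losses.append(loss.item())
    return losses

torch.manual_seed(42)
# Stand-in for SimpleNN(10, 20, 2) so that this block is self-contained.
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))
X = torch.randn(100, 10)
y = torch.randn(100, 2)

losses = train(model, X, y)
print(f"first: {losses[0]:.4f}, last: {losses[-1]:.4f}")
```

Keeping the full loss history makes it easy to plot the curve or to compare different learning rates afterwards.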



The three calls in the backward pass and optimization step deserve a closer look:

1. **`optimizer.zero_grad()`**:
   - **Purpose**: This line resets the gradients of all model parameters before the new gradients are computed.
   - **Why it's needed**: PyTorch accumulates gradients by default: every call to `backward()` adds to the values already stored in `.grad`. Without a reset, the gradients from the previous iteration would be added to those of the current one, leading to incorrect updates. Resetting them ensures that each update uses only the gradients of the current iteration.

2. **`loss.backward()`**:
   - **Purpose**: This line computes the gradient of the loss with respect to each parameter (weight and bias) in the model using backpropagation.
   - **How it works**: PyTorch automatically computes the gradients by following the chain rule of calculus. It traverses the computation graph from the loss backward to each parameter, calculating the partial derivatives of the loss with respect to each parameter.
   - **Result**: After this call, each parameter in the model has its `.grad` attribute populated with the gradient of the loss with respect to that parameter.

3. **`optimizer.step()`**:
   - **Purpose**: This line updates the model parameters using the gradients computed in the previous step.
   - **How it works**: The optimizer adjusts each parameter by a small amount in the direction that reduces the loss. The specific adjustment depends on the optimization algorithm being used (e.g., Adam, SGD). For instance, in the case of the Adam optimizer, it uses the gradients along with running averages of past gradients and squared gradients to compute the parameter updates.
   - **Effect**: This step effectively moves the model parameters in the direction that minimizes the loss, iteratively improving the model's performance on the training data.
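
The three calls can be observed on a toy problem. The sketch below uses a single hypothetical parameter `w` and a hand-written update in place of an optimizer; the update shown is plain gradient descent (the rule used by SGD without momentum), not Adam.

```python
import torch

# A single hypothetical parameter w with loss = (3w - 6)^2.
w = torch.tensor([1.0], requires_grad=True)

def loss_fn(w):
    return ((w * 3.0 - 6.0) ** 2).sum()

# d(loss)/dw = 2 * (3w - 6) * 3 = -18 at w = 1.
loss_fn(w).backward()
first_grad = w.grad.clone()

# Without a reset, a second backward() adds to the stored gradient.
loss_fn(w).backward()
accumulated_grad = w.grad.clone()

# Clearing the gradient gives a clean slate; optimizer.zero_grad() plays
# this role for every parameter the optimizer manages.
w.grad.zero_()
loss_fn(w).backward()

# Plain gradient-descent update: w <- w - lr * grad.
lr = 0.01
with torch.no_grad():
    w -= lr * w.grad

print(first_grad.item(), accumulated_grad.item(), w.item())
```

The first gradient is -18, the accumulated one is -36, and the update moves `w` from 1 to about 1.18, that is, toward the minimum at `w = 2`.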

### Model Testing

In [10]:
# Test the model
test_input = torch.randn(1, input_size)
prediction = model(test_input)
print("Test input:", test_input)
print("Model prediction:", prediction)

Test input: tensor([[ 0.3649,  1.0394, -0.9494,  0.3799,  0.2059, -1.2399,  0.5463, -0.4714,
          1.1313,  1.8746]])
Model prediction: tensor([[1.7288, 2.0155]], grad_fn=<AddmmBackward0>)


- test_input: A single random input sample for testing the model.
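- prediction: The model's output for the test input, demonstrating how to use the trained model for inference. The `grad_fn=<AddmmBackward0>` in the output shows that autograd is still tracking the computation; for inference this tracking is usually switched off with `torch.no_grad()`.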
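
A common inference pattern puts the model in evaluation mode and disables gradient tracking. The following sketch assumes an untrained `nn.Sequential` stand-in for the trained `SimpleNN`, since only the calling pattern matters here.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
# Stand-in for the trained SimpleNN so that this block is self-contained
# (untrained here; only the inference pattern matters).
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))

model.eval()  # switches layers such as dropout or batch norm to inference behaviour
test_input = torch.randn(1, 10)

with torch.no_grad():  # skip building the autograd graph
    prediction = model(test_input)

print(prediction.shape)          # torch.Size([1, 2])
print(prediction.requires_grad)  # False
```

For this particular network `model.eval()` changes nothing, because it has no dropout or normalization layers, but calling it is a cheap habit that avoids surprises once such layers are added.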

### Consolidated Code

In [11]:
import torch
import torch.nn as nn
import torch.optim as optim

# Define the neural network architecture
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.hidden = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.output = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.hidden(x)
        x = self.relu(x)
        x = self.output(x)
        return x

# Set random seed for reproducibility
torch.manual_seed(42)

# Define network parameters
input_size = 10
hidden_size = 20
output_size = 2

# Create an instance of the neural network
model = SimpleNN(input_size, hidden_size, output_size)

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Generate some dummy data
X = torch.randn(100, input_size)
y = torch.randn(100, output_size)

# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X)
    loss = criterion(outputs, y)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print progress
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

print("Training complete!")

# Test the model
test_input = torch.randn(1, input_size)
prediction = model(test_input)
print("Test input:", test_input)
print("Model prediction:", prediction)

Epoch [10/100], Loss: 0.7376
Epoch [20/100], Loss: 0.6285
Epoch [30/100], Loss: 0.5232
Epoch [40/100], Loss: 0.4220
Epoch [50/100], Loss: 0.3297
Epoch [60/100], Loss: 0.2574
Epoch [70/100], Loss: 0.2041
Epoch [80/100], Loss: 0.1678
Epoch [90/100], Loss: 0.1422
Epoch [100/100], Loss: 0.1259
Training complete!
Test input: tensor([[ 0.3649,  1.0394, -0.9494,  0.3799,  0.2059, -1.2399,  0.5463, -0.4714,
          1.1313,  1.8746]])
Model prediction: tensor([[1.7288, 2.0155]], grad_fn=<AddmmBackward0>)
