## Prompt to ChatGPT-4o:
Write a neural network in PyTorch that can add two numbers. 
Call it AddNet. 
It should add two whole numbers that are between 0 and 100.
Goal 1 is to get it to add two arbitrary whole numbers in a defined range with the lowest possible error.
Goal 2 is to keep the neural net and training definitions as simple as possible while still accomplishing Goal 1.
At each relevant step of your network definition and training, write about why you made the choice to use that "technique".

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Define the range of input numbers
NUM_RANGE = 101  # From 0 to 100 inclusive

# Define the AddNet model
class AddNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Simple feedforward network with 1 hidden layer
        self.fc1 = nn.Linear(2, 10)  # 2 inputs (two numbers) -> 10 hidden neurons
        self.fc2 = nn.Linear(10, 1)  # 10 hidden neurons -> 1 output (sum)
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))  # ReLU activation for non-linearity
        x = self.fc2(x)  # No activation for output layer (regression task)
        return x

# Step 1: Initialize the model, loss function, and optimizer
model = AddNet()
criterion = nn.MSELoss()  # Mean Squared Error Loss for regression
optimizer = optim.Adam(model.parameters(), lr=0.01)  # Adam optimizer for fast convergence

# Why this architecture:
# - Input: Two numbers per row of a float tensor of shape (batch, 2) (e.g., [[45, 30]])
# - Hidden layer: A small fully connected layer with 10 neurons balances simplicity and capacity
# - ReLU: Adds non-linearity; addition is itself a linear function, so this is extra flexibility rather than a requirement
# - Output: Single value (sum), no activation because this is a regression problem

# Step 2: Generate training data
def generate_data(size):
    x_data = np.random.randint(0, NUM_RANGE, size=(size, 2))  # Random pairs of integers
    y_data = np.sum(x_data, axis=1, keepdims=True)  # Compute sums
    return torch.tensor(x_data, dtype=torch.float32), torch.tensor(y_data, dtype=torch.float32)

train_x, train_y = generate_data(10000)  # Generate 10,000 training examples

# Step 3: Train the model
epochs = 500
for epoch in range(epochs):
    optimizer.zero_grad()  # Clear gradients
    predictions = model(train_x)  # Forward pass
    loss = criterion(predictions, train_y)  # Compute loss
    loss.backward()  # Backward pass
    optimizer.step()  # Update weights

    # Print progress
    if (epoch + 1) % 50 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

# Why this training setup:
# - Loss: MSE is suitable for regression because it penalizes large errors quadratically
# - Optimizer: Adam is chosen for its adaptive per-parameter learning rates, which need little tuning
# - Epochs: 500 full-batch updates bring the MSE below 0.2 in the run logged below, though the loss is still falling slowly
# - Data: 10,000 random pairs drawn (with repeats) from the 101 x 101 = 10,201 possible inputs, so some pairs are never seen in training

# Step 4: Test the model
test_x, test_y = generate_data(1000)  # Generate test data
with torch.no_grad():  # No gradient computation during testing
    test_predictions = model(test_x)
    test_loss = criterion(test_predictions, test_y)
    print(f"Test Loss: {test_loss.item():.4f}")

# Step 5: Demonstrate prediction
example = torch.tensor([[23, 45]], dtype=torch.float32)
with torch.no_grad():
    prediction = model(example).item()
    print(f"Prediction for input {example.tolist()}: {prediction:.2f} (Expected: {sum(example.tolist()[0])})")


Epoch [50/500], Loss: 143.0241
Epoch [100/500], Loss: 8.8742
Epoch [150/500], Loss: 0.6528
Epoch [200/500], Loss: 0.4755
Epoch [250/500], Loss: 0.3702
Epoch [300/500], Loss: 0.3032
Epoch [350/500], Loss: 0.2593
Epoch [400/500], Loss: 0.2286
Epoch [450/500], Loss: 0.2055
Epoch [500/500], Loss: 0.1877
Test Loss: 0.1801
Prediction for input [[23.0, 45.0]]: 68.49 (Expected: 68.0)
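
The test loss above is a mean squared error, so its square root gives the typical error in the same units as the sum: sqrt(0.1801) is roughly 0.42. Since the task is adding whole numbers, it can also help to ask how often the rounded prediction is exactly right. The following is a minimal sketch of such a summary; the helper name `summarize_errors` is hypothetical and not part of the original notebook, and the sample values are hand-written for illustration, apart from the 68.49 vs. 68 pair taken from the printed example.

```python
import torch

def summarize_errors(predictions: torch.Tensor, targets: torch.Tensor) -> dict:
    """Return RMSE, worst absolute error, and exact-match rate after rounding."""
    errors = predictions - targets
    rmse = torch.sqrt(torch.mean(errors ** 2)).item()
    max_abs = errors.abs().max().item()
    # Fraction of predictions that equal the true sum once rounded to an integer
    exact = (torch.round(predictions) == targets).float().mean().item()
    return {"rmse": rmse, "max_abs_error": max_abs, "exact_match_rate": exact}

# Illustrative values only (not output from the trained model),
# except 68.49 vs 68.0, which is the example prediction printed above.
preds = torch.tensor([[68.49], [10.20], [150.60]])
targets = torch.tensor([[68.0], [10.0], [150.0]])
print(summarize_errors(preds, targets))
```

In the notebook itself, the same helper could be called as `summarize_errors(test_predictions, test_y)` after Step 4.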

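With Goal 2 (simplicity) in mind, it is worth noting that addition is a linear function of its inputs: a + b = 1*a + 1*b. A single linear layer without a bias can therefore represent it exactly. The sketch below is a hypothetical baseline, not part of the original answer: it sets both weights to 1 by hand instead of training, and checks every pair in the 0 to 100 range. Whether gradient descent reaches these weights in practice depends on the learning rate and input scaling, which this sketch does not test.

```python
import torch
import torch.nn as nn

# Hypothetical baseline (an assumption for illustration): one linear layer, no bias.
linear_adder = nn.Linear(2, 1, bias=False)
with torch.no_grad():
    linear_adder.weight.fill_(1.0)  # y = 1*a + 1*b

# Build every pair (a, b) with 0 <= a, b <= 100
values = torch.arange(101, dtype=torch.float32)
pairs = torch.cartesian_prod(values, values)  # shape (10201, 2)

with torch.no_grad():
    sums = linear_adder(pairs)

# Compare against the true sums for all pairs
max_error = (sums - pairs.sum(dim=1, keepdim=True)).abs().max().item()
print(f"Pairs checked: {pairs.shape[0]}, max error: {max_error}")
```

This gives a reference point for AddNet: any hidden layer and activation add capacity beyond what the task strictly needs, so the residual error in the log above comes from training rather than from a limit of the architecture.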