### Homework 3 - Jeffrey Gong

### Part a (L1 Regularization)

#### Introduction
This was my first time using PyTorch (I had previously used JAX). The assignment was a good way to learn PyTorch, and it was rewarding to implement my own version of L1 regularization.

#### Results
As shown below, I achieved an L2 relative error of 0.016, below the 0.05 threshold, after training for 100,000 epochs with a learning rate of 0.001 and an L1 penalty strength (`l1_lambda`) of 0.00001.
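For reference, the training objective and the reported error metric can be written out as follows. This is my reading of the code below (mean-squared error plus an L1 penalty summed over every weight and bias, evaluated on the noisy training points), not a formula stated in the assignment; $\hat{u}_\theta$ denotes the network output, $y_i = f(x_i)$ plus Gaussian noise with standard deviation 0.1, and $x_m$ are the noise-free test points.

$$
\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{u}_\theta(x_i) - y_i\bigr)^2 + \lambda \sum_{j} \lvert \theta_j \rvert,
\qquad N = 80,\ \lambda = 10^{-5},
$$

$$
\varepsilon_{\mathrm{rel}} = \frac{\lVert \hat{u}_\theta - f \rVert_2}{\lVert f \rVert_2}
= \frac{\sqrt{\sum_{m=1}^{M} \bigl(\hat{u}_\theta(x_m) - f(x_m)\bigr)^2}}{\sqrt{\sum_{m=1}^{M} f(x_m)^2}},
\qquad M = 1000.
$$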

```python
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

# Define the oscillatory target function:
#   f(x) = 5 + sum_{k=1}^{4} sin(k x)   for x < 0
#   f(x) = cos(10 x)                    for x >= 0
def f(x):
    result = torch.zeros_like(x)
    result[x < 0] = 5 + sum(torch.sin(k * x[x < 0]) for k in range(1, 5))
    result[x >= 0] = torch.cos(10 * x[x >= 0])
    return result
```
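The boolean-mask assignment above works, but the same piecewise function can also be written with `torch.where`, which evaluates both branches on every point and selects one per element. The sketch below is an optional alternative, not what the notebook uses; `f_mask` is a copy of `f` and `f_where` is a hypothetical name for the rewritten version.

```python
import math

import torch


def f_mask(x: torch.Tensor) -> torch.Tensor:
    # Boolean-mask version, copied from the cell above.
    result = torch.zeros_like(x)
    result[x < 0] = 5 + sum(torch.sin(k * x[x < 0]) for k in range(1, 5))
    result[x >= 0] = torch.cos(10 * x[x >= 0])
    return result


def f_where(x: torch.Tensor) -> torch.Tensor:
    # Evaluate both branches on every point, then pick one per element.
    left = 5 + sum(torch.sin(k * x) for k in range(1, 5))  # branch for x < 0
    right = torch.cos(10 * x)                              # branch for x >= 0
    return torch.where(x < 0, left, right)


x = torch.linspace(-math.pi, math.pi, 101)
print(torch.allclose(f_mask(x), f_where(x)))  # True
```

Both branches are finite everywhere, so evaluating the unused branch is harmless here and avoids the in-place writes into a zero tensor.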

```python
# Generate training data: 80 evenly spaced points on [-pi, pi]
x_train = torch.linspace(-np.pi, np.pi, 80)
y_train = f(x_train) + torch.randn(x_train.size()) * 0.1  # add Gaussian noise (std 0.1)

# Generate testing data: 1000 noise-free points on the same interval
x_test = torch.linspace(-np.pi, np.pi, 1000)
y_test = f(x_test)
```
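The noise in `y_train` (and the network's initial weights) come from PyTorch's random number generator, which the notebook never seeds, so rerunning it will give a slightly different error than 0.016. A minimal sketch of seeding the data generation is below; `make_data` and its parameters are hypothetical names, and calling `torch.manual_seed` before `ReLUNet()` as well would also fix the weight initialization. Bit-for-bit reproducibility across PyTorch versions or hardware is not guaranteed.

```python
import math

import torch


def make_data(seed: int = 0, n_train: int = 80, n_test: int = 1000, noise_std: float = 0.1):
    """Hypothetical helper: noisy training set and clean test set from a fixed seed."""
    torch.manual_seed(seed)  # fixes the Gaussian noise draw below

    def f(x):
        result = torch.zeros_like(x)
        result[x < 0] = 5 + sum(torch.sin(k * x[x < 0]) for k in range(1, 5))
        result[x >= 0] = torch.cos(10 * x[x >= 0])
        return result

    x_train = torch.linspace(-math.pi, math.pi, n_train)
    y_train = f(x_train) + torch.randn(x_train.size()) * noise_std
    x_test = torch.linspace(-math.pi, math.pi, n_test)
    y_test = f(x_test)
    return x_train, y_train, x_test, y_test


a = make_data(seed=0)
b = make_data(seed=0)
print(torch.equal(a[1], b[1]))  # True: same seed gives the same noisy targets
```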

```python
# Define the neural network: 1 -> 50 -> 50 -> 1 with ReLU activations
class ReLUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 50)   # input layer to first hidden layer (50 neurons)
        self.fc2 = nn.Linear(50, 50)  # first hidden layer to second hidden layer (50 neurons)
        self.fc3 = nn.Linear(50, 1)   # second hidden layer to output layer

    def forward(self, x):
        x = torch.relu(self.fc1(x.unsqueeze(1)))  # (N,) -> (N, 1) -> (N, 50)
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x.squeeze(-1)  # (N, 1) -> (N,); squeeze(-1) keeps the batch dim even when N == 1
```
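As a quick sanity check on the architecture, the sketch below counts the trainable parameters, which is exactly the set of values the L1 penalty sums over, and confirms the output shape. `ReLUNet` is restated so the block runs on its own.

```python
import torch
import torch.nn as nn


class ReLUNet(nn.Module):
    # Same 1 -> 50 -> 50 -> 1 architecture as above, restated so this block is self-contained.
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 50)
        self.fc2 = nn.Linear(50, 50)
        self.fc3 = nn.Linear(50, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x.unsqueeze(1)))
        x = torch.relu(self.fc2(x))
        return self.fc3(x).squeeze(-1)


model = ReLUNet()
# Weights plus biases per layer: (1*50 + 50) + (50*50 + 50) + (50*1 + 1)
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 2701

out = model(torch.linspace(-1.0, 1.0, 7))
print(out.shape)  # torch.Size([7])
```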

```python
# Instantiate the model, loss function, and optimizer
model = ReLUNet()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
l1_lambda = 0.00001  # regularization strength (best value found so far)

# Add an L1 penalty over all parameters (weights and biases) to the loss
def l1_regularization(model, loss):
    l1_norm = sum(p.abs().sum() for p in model.parameters())
    return loss + l1_lambda * l1_norm  # out-of-place add avoids mutating the loss tensor
```
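The function above penalizes every parameter, biases included. A common variant penalizes only the weight matrices. The sketch below compares the two on a toy layer with hand-set values and also checks the gradient of the penalty; `l1_penalty_all`, `l1_penalty_weights_only`, and `lam` are hypothetical names, and the weights-only version is not what this notebook uses.

```python
import torch
import torch.nn as nn


def l1_penalty_all(model: nn.Module) -> torch.Tensor:
    # Sum of |theta| over every parameter, as in l1_regularization above.
    return sum(p.abs().sum() for p in model.parameters())


def l1_penalty_weights_only(model: nn.Module) -> torch.Tensor:
    # Variant (not used above): skip biases, penalize weight tensors only.
    return sum(p.abs().sum() for name, p in model.named_parameters() if name.endswith("weight"))


# Toy layer with hand-set values so the penalties are easy to check by hand.
layer = nn.Linear(2, 1)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[1.0, -2.0]]))
    layer.bias.fill_(-3.0)

print(l1_penalty_all(layer).item())           # 6.0 = |1| + |-2| + |-3|
print(l1_penalty_weights_only(layer).item())  # 3.0 = |1| + |-2|

# For theta != 0 the gradient of lam * |theta| is lam * sign(theta): the same
# magnitude for every nonzero parameter, unlike L2, whose gradient scales with theta.
lam = 0.5
(lam * l1_penalty_all(layer)).backward()
print(layer.weight.grad)  # lam * sign(weight)
print(layer.bias.grad)    # lam * sign(bias)
```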

```python
# Train the model (full batch: all 80 training points at every step)
epochs = 100000
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    y_pred = model(x_train)
    loss = criterion(y_pred, y_train)
    loss = l1_regularization(model, loss)  # apply L1 regularization
    loss.backward()
    optimizer.step()

    # The printed loss includes the L1 penalty term
    if epoch % 500 == 0 or epoch == epochs - 1:
        print(f'Epoch {epoch}, Loss: {loss.item()}')
```

```text
Epoch 0, Loss: 8.764226913452148
Epoch 500, Loss: 0.2782718241214752
Epoch 1000, Loss: 0.17133542895317078
Epoch 1500, Loss: 0.12010836601257324
Epoch 2000, Loss: 0.08732875436544418
Epoch 2500, Loss: 0.0655033066868782
Epoch 3000, Loss: 0.05131649971008301
Epoch 3500, Loss: 0.04194338247179985
Epoch 4000, Loss: 0.03606581687927246
Epoch 4500, Loss: 0.03214931860566139
Epoch 5000, Loss: 0.029873527586460114
Epoch 5500, Loss: 0.02694035694003105
Epoch 6000, Loss: 0.025153154507279396
Epoch 6500, Loss: 0.024354897439479828
Epoch 7000, Loss: 0.023468781262636185
Epoch 7500, Loss: 0.022691641002893448
Epoch 8000, Loss: 0.021713515743613243
Epoch 8500, Loss: 0.02313448116183281
Epoch 9000, Loss: 0.021611884236335754
Epoch 9500, Loss: 0.01984192430973053
Epoch 10000, Loss: 0.019190937280654907
Epoch 10500, Loss: 0.01917617954313755
Epoch 11000, Loss: 0.019613847136497498
Epoch 11500, Loss: 0.019601916894316673
Epoch 12000, Loss: 0.01784708723425865
Epoch 12500, Loss: 0.017682155594229698
...
```
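Because the printed loss mixes the data fit with the L1 penalty, it can be hard to tell which term is changing. The sketch below logs the two terms separately. It runs on a hypothetical stand-in problem (fitting `sin(x)` with a smaller `nn.Sequential` network for 201 epochs) so it finishes quickly; `history` and `l1_term` are illustrative names, and the real run would use `ReLUNet`, `x_train`, and `y_train` instead.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in problem: fit sin(x) on [-pi, pi] so the loop finishes quickly.
x = torch.linspace(-math.pi, math.pi, 64).unsqueeze(1)  # shape (64, 1)
y = torch.sin(x)

model = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
l1_lambda = 1e-5

history = []  # (epoch, data MSE, weighted L1 term)
model.train()  # set once; the mode does not change inside the loop
for epoch in range(201):
    optimizer.zero_grad()
    data_loss = criterion(model(x), y)
    l1_term = l1_lambda * sum(p.abs().sum() for p in model.parameters())
    loss = data_loss + l1_term
    loss.backward()
    optimizer.step()

    if epoch % 50 == 0:
        # Log the two terms separately so the data-fit trend is not hidden by the penalty.
        history.append((epoch, data_loss.item(), l1_term.item()))
        print(f"Epoch {epoch}, MSE: {data_loss.item():.3e}, L1 term: {l1_term.item():.3e}")
```

With `l1_lambda = 1e-5` the L1 term is typically small next to the MSE early on, so most of the early drop in the notebook's printed loss likely comes from the data term.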

```python
# Calculate the L2 relative error on the test grid
model.eval()
with torch.no_grad():
    y_pred_test = model(x_test)
    l2_norm = torch.sqrt(torch.sum((y_pred_test - y_test) ** 2))
    f_norm = torch.sqrt(torch.sum(y_test ** 2))
    l2_relative_error = l2_norm / f_norm
    print(f'L2 Relative Error: {l2_relative_error.item()}')
```

```text
L2 Relative Error: 0.016352219507098198
```
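The same metric can be computed with `torch.linalg.norm`, which defaults to the Euclidean norm for a 1-D tensor. The sketch below wraps it in a hypothetical helper, `l2_relative_error`, and checks it on a hand-computed toy case. Because both norms run over the same grid points, the ratio does not depend on how the sums are normalized (it equals the ratio of RMS values).

```python
import torch


def l2_relative_error(pred: torch.Tensor, target: torch.Tensor) -> float:
    # ||pred - target||_2 / ||target||_2 over all grid points.
    return (torch.linalg.norm(pred - target) / torch.linalg.norm(target)).item()


target = torch.tensor([3.0, 4.0])         # ||target||_2 = 5
pred = target + torch.tensor([0.3, 0.4])  # ||pred - target||_2 = 0.5
err = l2_relative_error(pred, target)
print(err)         # approximately 0.1
print(err < 0.05)  # False: this toy error would miss the 0.05 threshold
```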
