# Homework 5 (Full mark: 100pt)

You can use ``Google Colab`` if you would like to use GPU.
- https://colab.research.google.com/notebooks/welcome.ipynb
- https://theorydb.github.io/dev/2019/08/23/dev-ml-colab/

# 1. Regression (50pt)

**For this question, using PyTorch, implement the 1) ridge regression and 2) Lasso. You can refer to the tutorial link below for how to implement linear regression. Note that the ridge regression is the linear regression with L2 penalty, and the Lasso is the linear regression with L1 penalty. You should use ````Boston```` dataset as shown in the code below. You should not only write the code for the models, but also train them and show the test MSE.**
- https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/01-basics/linear_regression/main.py

<div>
<img src="figures/regressions.png" width="700"/>
</div>

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [11]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import numpy as np
# To fix the random seed
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# load data
boston = pd.read_csv( 'drive/MyDrive//data/Boston.csv').drop('Unnamed: 0', axis=1)
data = torch.FloatTensor(boston.values)
X = data[:,:-1] # Input (X)
y = data[:,-1].reshape(-1, 1) # Ground Truth (y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

In [45]:
num_epochs = 1000
learning_rate = 0.0003

class Ridge(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        ## Write your answer here
        self.layer = nn.Linear(input_dim, 1)

    def forward(self, X):
        ## Write your answer here
        pred = self.layer(X)
        return pred


class Lasso(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        ## Write your answer here
        self.layer = nn.Linear(input_dim, 1)

    def forward(self, X):
        ## Write your answer here
        pred = self.layer(X)
        return pred


## Write your answer here (Training code)

ridge_lambda = 0.1
# Training setup
input_dim = X_train.shape[1]
ridge_model = Ridge(input_dim)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(ridge_model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):

  y_pred = ridge_model(X_train)
  mse_loss = criterion(y_pred, y_train)

        # (L2 norm squared) loss
  ridge_loss = sum(torch.norm(param, 2)**2 for name, param in ridge_model.named_parameters() if 'weight' in name)

        # Combine MSE loss and Ridge loss
  total_loss = mse_loss + ridge_lambda * ridge_loss

  optimizer.zero_grad()
  total_loss.backward()
  optimizer.step()


  ridge_model.eval()
  with torch.no_grad():
    y_test_pred = ridge_model(X_test)
    test_loss = criterion(y_test_pred, y_test)
   # test_mse.append(test_loss.item())

    print(f'Epoch {epoch+1}/{num_epochs}, Ridge Training Loss: {total_loss.item():.4f}, Ridge Test MSE: {test_loss.item():.4f}')








Epoch 1/1000, Ridge Training Loss: 627.1058, Ridge Test MSE: 525.3033
Epoch 2/1000, Ridge Training Loss: 618.6272, Ridge Test MSE: 518.0363
Epoch 3/1000, Ridge Training Loss: 610.3033, Ridge Test MSE: 510.9347
Epoch 4/1000, Ridge Training Loss: 602.1368, Ridge Test MSE: 504.0005
Epoch 5/1000, Ridge Training Loss: 594.1301, Ridge Test MSE: 497.2353
Epoch 6/1000, Ridge Training Loss: 586.2856, Ridge Test MSE: 490.6405
Epoch 7/1000, Ridge Training Loss: 578.6050, Ridge Test MSE: 484.2177
Epoch 8/1000, Ridge Training Loss: 571.0904, Ridge Test MSE: 477.9677
Epoch 9/1000, Ridge Training Loss: 563.7432, Ridge Test MSE: 471.8914
Epoch 10/1000, Ridge Training Loss: 556.5649, Ridge Test MSE: 465.9891
Epoch 11/1000, Ridge Training Loss: 549.5566, Ridge Test MSE: 460.2611
Epoch 12/1000, Ridge Training Loss: 542.7192, Ridge Test MSE: 454.7074
Epoch 13/1000, Ridge Training Loss: 536.0532, Ridge Test MSE: 449.3275
Epoch 14/1000, Ridge Training Loss: 529.5591, Ridge Test MSE: 444.1208
Epoch 15/1000, 

In [43]:
lasso_lambda = 0.1
# Training setup
input_dim = X_train.shape[1]
lasso_model = Ridge(input_dim)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(lasso_model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):

  y_pred = lasso_model(X_train)
  lasso_mse_loss = criterion(y_pred, y_train)

        # (L1 norm squared) loss
  lasso_loss = sum(torch.norm(param, 1) for name, param in ridge_model.named_parameters() if 'weight' in name)

        # Combine MSE loss and lasso loss
  total_loss = lasso_mse_loss + lasso_lambda * lasso_loss

  optimizer.zero_grad()
  total_loss.backward()
  optimizer.step()


  lasso_model.eval()
  with torch.no_grad():
    y_test_pred1 = lasso_model(X_test)
    lasso_test_loss = criterion(y_test_pred1, y_test)


    print(f'Epoch {epoch+1}/{num_epochs}, Lasso Training Loss: {total_loss.item():.4f}, Lasso Test MSE: {lasso_test_loss.item():.4f}')

Epoch 1/1000, Lasso Training Loss: 4185.4033, Lasso Test MSE: 4666.4141
Epoch 2/1000, Lasso Training Loss: 4157.4219, Lasso Test MSE: 4635.4517
Epoch 3/1000, Lasso Training Loss: 4129.5991, Lasso Test MSE: 4604.6582
Epoch 4/1000, Lasso Training Loss: 4101.9360, Lasso Test MSE: 4574.0386
Epoch 5/1000, Lasso Training Loss: 4074.4358, Lasso Test MSE: 4543.5938
Epoch 6/1000, Lasso Training Loss: 4047.1018, Lasso Test MSE: 4513.3281
Epoch 7/1000, Lasso Training Loss: 4019.9353, Lasso Test MSE: 4483.2422
Epoch 8/1000, Lasso Training Loss: 3992.9380, Lasso Test MSE: 4453.3418
Epoch 9/1000, Lasso Training Loss: 3966.1140, Lasso Test MSE: 4423.6260
Epoch 10/1000, Lasso Training Loss: 3939.4644, Lasso Test MSE: 4394.0991
Epoch 11/1000, Lasso Training Loss: 3912.9929, Lasso Test MSE: 4364.7637
Epoch 12/1000, Lasso Training Loss: 3886.6992, Lasso Test MSE: 4335.6221
Epoch 13/1000, Lasso Training Loss: 3860.5874, Lasso Test MSE: 4306.6738
Epoch 14/1000, Lasso Training Loss: 3834.6580, Lasso Test MS

# 2. Autoencoder (50pt)
**Autoencoder is an unsupervised neural network model for learning representations of the input. In the figure below, you can see the structure of an autoencoder network. Given the original input image, we first encode the image using ``Encoder`` to a compressed representation, and reconstruct the image by using ``Decoder`` given the compressed representation. The compressed represesntation can be used as the dimension reduced representation of the original input. In this regard, autoencoder is also known as a model for dimensionality reduction (Recall PCA was also a method for dimensionality reduction). Note that the encoder and the decoder can be any neural network model such as MLP, CNN, MLP, etc.**

**For this question, you will use MNIST dataset to implement two versions of autoencoders: 1) An MLP-based autoencoder, and 2) A CNN-based autoencoder. The figure below shows an example of the MLP-based autoencoder. For the MLP-based autoencoder, you should follow the structure of the autoencoder shown in the figure below. For the CNN-based autoencoder, you are free to choose the architecture. You only need to implement ``Autoencoder_MLP`` class and ``Autoencoder_CNN`` class. Note that for ``Autoencoder_MLP``, you should flatten the original image into a vector, whereas for ``Autoencoder_CNN``, you can use the original image without any modification. After implementing these two classes, save the figures and observe the results. Write a few sentences to describe your findings. You do not need to submit the saved figure and the dataset for your final submission.**

<div>
<img src="figures/autoencoder.png" width="700"/>
</div>

In [17]:
import os

import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST
from torchvision.utils import save_image

# Make a directory "saved_img" if it does not exist
if not os.path.exists('./saved_img'):
    os.mkdir('./saved_img')


def to_img(x):
    x = 0.5 * (x + 1)
    x = x.clamp(0, 1)
    x = x.view(x.size(0), 1, 28, 28)
    return x


num_epochs = 100
batch_size = 128
learning_rate = 1e-3

img_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])
])

dataset = MNIST('drive/MyDrive/data', transform=img_transform)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)


class Autoencoder_MLP(nn.Module):
    def __init__(self):
        super(Autoencoder_MLP, self).__init__()
        ## Write your answer here
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 12),
            nn.ReLU(),
            nn.Linear(12, 3)
        )
        self.decoder = nn.Sequential(
            nn.Linear(3, 12),
            nn.ReLU(),
            nn.Linear(12, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, 28 * 28),
            nn.Tanh()
        )

    def forward(self, x):
        ## Write your answer here
        x = self.encoder(x)
        x = self.decoder(x)
        return x




# Uncomment below correspondingly

model = Autoencoder_MLP().cuda()
# model = Autoencoder_CNN().cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)



for epoch in range(num_epochs):
    for data in dataloader:
        img, _ = data
        img = img.cuda()  # Move your images to GPU here

        # Flatten the image for Autoencoder_MLP (You can remove this line for Autoencoder_CNN)
        img = img.view(img.size(0), -1)

        # forward pass
        output = model(img)
        loss = criterion(output, img)
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Print log and save images
    print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, num_epochs, loss.item()))
    if epoch % 10 == 0:
        pic = to_img(output.cpu().data)
        save_image(pic, './saved_img/image_{}.png'.format(epoch))

epoch [1/100], loss:0.1862
epoch [2/100], loss:0.1730
epoch [3/100], loss:0.1608
epoch [4/100], loss:0.1637
epoch [5/100], loss:0.1420
epoch [6/100], loss:0.1640
epoch [7/100], loss:0.1513
epoch [8/100], loss:0.1333
epoch [9/100], loss:0.1437
epoch [10/100], loss:0.1464
epoch [11/100], loss:0.1441
epoch [12/100], loss:0.1376
epoch [13/100], loss:0.1233
epoch [14/100], loss:0.1300
epoch [15/100], loss:0.1418
epoch [16/100], loss:0.1284
epoch [17/100], loss:0.1278
epoch [18/100], loss:0.1340
epoch [19/100], loss:0.1278
epoch [20/100], loss:0.1285
epoch [21/100], loss:0.1380
epoch [22/100], loss:0.1305
epoch [23/100], loss:0.1377
epoch [24/100], loss:0.1282
epoch [25/100], loss:0.1351
epoch [26/100], loss:0.1367
epoch [27/100], loss:0.1338
epoch [28/100], loss:0.1344
epoch [29/100], loss:0.1312
epoch [30/100], loss:0.1294
epoch [31/100], loss:0.1214
epoch [32/100], loss:0.1275
epoch [33/100], loss:0.1235
epoch [34/100], loss:0.1331
epoch [35/100], loss:0.1425
epoch [36/100], loss:0.1281
e

In [26]:
class Autoencoder_CNN(nn.Module):
    def __init__(self):
        super(Autoencoder_CNN, self).__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(2, stride=2),
            nn.Conv2d(16, 8, 3, stride=2, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(2, stride=1)
        )

        # Decoder
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 3, stride=2),
            nn.ReLU(True),
            nn.ConvTranspose2d(16, 8, 5, stride=3, padding=1),
            nn.ReLU(True),
            nn.ConvTranspose2d(8, 1, 2, stride=2, padding=1),
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

model = Autoencoder_CNN().cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)

for epoch in range(num_epochs):
    for data in dataloader:
        img, _ = data
        img = img.cuda()  # Move your images to GPU here

        # forward pass
        output = model(img)
        loss = criterion(output, img)
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Print log and save images
    print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, num_epochs, loss.item()))
    if epoch % 10 == 0:
        pic = to_img(output.cpu().data)
        save_image(pic, './saved_img/CNN_image_{}.png'.format(epoch))

epoch [1/100], loss:0.2236
epoch [2/100], loss:0.1839
epoch [3/100], loss:0.1597
epoch [4/100], loss:0.1388
epoch [5/100], loss:0.1349
epoch [6/100], loss:0.1249
epoch [7/100], loss:0.1336
epoch [8/100], loss:0.1320
epoch [9/100], loss:0.1282
epoch [10/100], loss:0.1157
epoch [11/100], loss:0.1182
epoch [12/100], loss:0.1171
epoch [13/100], loss:0.1183
epoch [14/100], loss:0.1178
epoch [15/100], loss:0.1139
epoch [16/100], loss:0.1176
epoch [17/100], loss:0.1209
epoch [18/100], loss:0.1136
epoch [19/100], loss:0.1077
epoch [20/100], loss:0.1137
epoch [21/100], loss:0.1102
epoch [22/100], loss:0.1033
epoch [23/100], loss:0.1082
epoch [24/100], loss:0.1077
epoch [25/100], loss:0.1152
epoch [26/100], loss:0.1083
epoch [27/100], loss:0.1165
epoch [28/100], loss:0.1133
epoch [29/100], loss:0.1011
epoch [30/100], loss:0.1057
epoch [31/100], loss:0.1152
epoch [32/100], loss:0.1127
epoch [33/100], loss:0.1041
epoch [34/100], loss:0.1047
epoch [35/100], loss:0.0985
epoch [36/100], loss:0.0978
e

I have noticed that as the number of epochs increases, the reconstructed images become clearer. However, the improvement is not consistent. The loss decreases initially, but then starts to fluctuate, sometimes increasing instead of continuously decreasing. This might be due as the model trains, it might start to overfit the training data, capturing noise rather than the underlying pattern, which can cause the loss to fluctuate.