<a href="https://colab.research.google.com/github/lolobq/ECGR-5105-Intro_To_Machine_Learning/blob/main/Homework6/Homework6Problem1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Import Libraries and Data

In [7]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np

# Problem 1a

Develop a Fully Connected Neural Network with only one hidden layer (size of 32) to predict the housing value for the housing dataset. Make sure to include all input features. Compare your training loss value and validation results against the linear regression you implemented in Homework 5. Can you compare your model complexity (number of trainable parameters) against linear regression? Note: Perform 20%, and 80% split for training and validation.

In [8]:
# Import csv data from my GitHub repo
housing_url = 'https://raw.githubusercontent.com/lolobq/ECGR-5105-Intro_To_Machine_Learning/master/Homework6/Housing.csv'
data = pd.read_csv(housing_url)

# Map string variables to binary values
variable_list = ['mainroad', 'guestroom', 'basement', 'hotwaterheating', 'airconditioning', 'prefarea']

def binary_mapping(x):
  return x.map({'no' : 0, 'yes' : 1})

data[variable_list] = data[variable_list].apply(binary_mapping)
data = data.drop('furnishingstatus', axis=1)

# Assuming the target variable is 'housing_value', adjust accordingly
y = data['price'].values
data = data.drop('price', axis=1)
x = data.values

# Split the dataset into training and validation sets
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.8, random_state=42)

# Standardize the input features
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)

# Standardize the output features
scaler_y = StandardScaler()
y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).flatten()
y_val_scaled = scaler_y.fit_transform(y_val.reshape(-1, 1)).flatten()

# Convert data to PyTorch tensors
x_train_tensor = torch.tensor(x_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train_scaled, dtype=torch.float32)
x_val_tensor = torch.tensor(x_val, dtype=torch.float32)
y_val_tensor = torch.tensor(y_val_scaled, dtype=torch.float32)

# Create DataLoader for training and validation sets
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

val_dataset = TensorDataset(x_val_tensor, y_val_tensor)
val_loader = DataLoader(val_dataset, batch_size=32)

# Define the model
model = nn.Sequential(
    nn.Linear(x_train.shape[1], 32),
    nn.Tanh(),
    nn.Linear(32, 1)
)

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

# Training loop
epochs = 5000
for epoch in range(epochs+1):
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs.squeeze(), targets)
        loss.backward()
        optimizer.step()

    # Validation
    model.eval()
    with torch.no_grad():
        val_outputs = model(x_val_tensor)
        val_loss = criterion(val_outputs.squeeze(), y_val_tensor)
        if epoch % 500 == 0:
          print(f'Epoch {epoch}/{epochs}, Training Loss: {loss.item()}, Validation Loss: {val_loss.item()}')

Epoch 0/5000, Training Loss: 0.8740482330322266, Validation Loss: 1.2669895887374878
Epoch 500/5000, Training Loss: 0.270549476146698, Validation Loss: 0.4624500572681427
Epoch 1000/5000, Training Loss: 0.285550057888031, Validation Loss: 0.48047706484794617
Epoch 1500/5000, Training Loss: 0.30756011605262756, Validation Loss: 0.47917163372039795
Epoch 2000/5000, Training Loss: 0.39476126432418823, Validation Loss: 0.47147655487060547
Epoch 2500/5000, Training Loss: 0.24433428049087524, Validation Loss: 0.4641323387622833
Epoch 3000/5000, Training Loss: 0.09919007867574692, Validation Loss: 0.4574287533760071
Epoch 3500/5000, Training Loss: 0.30647051334381104, Validation Loss: 0.4536994397640228
Epoch 4000/5000, Training Loss: 0.15413716435432434, Validation Loss: 0.44924670457839966
Epoch 4500/5000, Training Loss: 0.37633734941482544, Validation Loss: 0.44976645708084106
Epoch 5000/5000, Training Loss: 0.25254371762275696, Validation Loss: 0.44664818048477173


# Problem 1b

We will increase the network complexity by adding two additional hidden layers, the hidden layers overall. My suggestions for the size of layers are: 32, 64, 16, respectively. Please redesign the network and compare your training loss value and validation results against the linear regression you implemented in Homework 5 and Problem 1.a. Can you compare your model complexity? Note: Use the same 20%, and 80% split for training and validation.

In [9]:
data = pd.read_csv(housing_url)

# Map string variables to binary values
variable_list = ['mainroad', 'guestroom', 'basement', 'hotwaterheating', 'airconditioning', 'prefarea']

def binary_mapping(x):
    return x.map({'no': 0, 'yes': 1})

data[variable_list] = data[variable_list].apply(binary_mapping)
data = data.drop('furnishingstatus', axis=1)

# Assuming the target variable is 'housing_value', adjust accordingly
y = data['price'].values
data = data.drop('price', axis=1)
x = data.values

# Split the dataset into training and validation sets
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.8, random_state=42)

# Standardize the input features
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)

# Standardize the output features
scaler_y = StandardScaler()
y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).flatten()
y_val_scaled = scaler_y.fit_transform(y_val.reshape(-1, 1)).flatten()

# Convert data to PyTorch tensors
x_train_tensor = torch.tensor(x_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train_scaled, dtype=torch.float32)
x_val_tensor = torch.tensor(x_val, dtype=torch.float32)
y_val_tensor = torch.tensor(y_val_scaled, dtype=torch.float32)

# Create DataLoader for training and validation sets
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

val_dataset = TensorDataset(x_val_tensor, y_val_tensor)
val_loader = DataLoader(val_dataset, batch_size=32)

# Define the model with the first hidden layer having 32 neurons
model = nn.Sequential(
    nn.Linear(x_train.shape[1], 32),  # First hidden layer with 32 neurons
    nn.Tanh(),
    nn.Linear(32, 64),  # Second hidden layer with 64 neurons
    nn.Tanh(),
    nn.Linear(64, 16),  # Third hidden layer with 16 neurons
    nn.Tanh(),
    nn.Linear(16, 1)  # Output layer with 1 neuron
)

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

# Training loop
epochs = 5000
for epoch in range(epochs + 1):
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs.squeeze(), targets)
        loss.backward()
        optimizer.step()

    # Validation
    model.eval()
    with torch.no_grad():
        val_outputs = model(x_val_tensor)
        val_loss = criterion(val_outputs.squeeze(), y_val_tensor)
        if epoch % 500 == 0:
            print(f'Epoch {epoch}/{epochs}, Training Loss: {loss.item()}, Validation Loss: {val_loss.item()}')

Epoch 0/5000, Training Loss: 1.186692237854004, Validation Loss: 1.1106796264648438
Epoch 500/5000, Training Loss: 0.26614847779273987, Validation Loss: 0.41631677746772766
Epoch 1000/5000, Training Loss: 0.571448028087616, Validation Loss: 0.46390071511268616
Epoch 1500/5000, Training Loss: 0.3825329542160034, Validation Loss: 0.468228816986084
Epoch 2000/5000, Training Loss: 0.07898759841918945, Validation Loss: 0.4665388762950897
Epoch 2500/5000, Training Loss: 0.10419765114784241, Validation Loss: 0.4629688858985901
Epoch 3000/5000, Training Loss: 0.08431097865104675, Validation Loss: 0.4802996814250946
Epoch 3500/5000, Training Loss: 0.19947503507137299, Validation Loss: 0.5038023591041565
Epoch 4000/5000, Training Loss: 0.04506227746605873, Validation Loss: 0.530514121055603
Epoch 4500/5000, Training Loss: 0.08443551510572433, Validation Loss: 0.5555590391159058
Epoch 5000/5000, Training Loss: 0.07330196350812912, Validation Loss: 0.5717246532440186
