## **Assignment 3**

### Overview:
In this assignment, you will explore the field of time series forecasting using deep learning techniques. The importance of this task lies in its real-world applications, such as predicting energy consumption, stock prices, and more. You will work with synthetic training and testing datasets to create, train, and evaluate feedforward neural networks using PyTorch. The assignment consists of several key tasks that build upon each other, with opportunities for feature engineering, model architecture design, and regularization techniques.

Dataset: I have created the training and testing datasets synthetically to mimic real-world energy consumption data. These datasets are provided in the assignment folder.

Training Data: Canvas->Files->Assigment3->energy_consumption_train.csv
Test Data: Canvas->Files->Assigment3->energy_consumption_test.csv

Tasks:

### 1. Feature Engineering (2 points):
In this section, you will create meaningful features from the provided `Date` column in the training and testing datasets. Consider features such as year, month, day of the week, day of the month, public holidays, and any others you find relevant. Feature engineering is open-ended, and the features listed here are only suggestions. There are no wrong answers and no penalty, as long as you implement at least three basic features.
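
One optional feature that the task description does not name is a cyclical (sin/cos) encoding of month and weekday, so that December and January end up close together instead of 11 units apart. The sketch below is an illustration only: the helper name `add_cyclical_features` and the new column names are hypothetical and are not part of the assignment or the notebook's pipeline.

```python
import numpy as np
import pandas as pd

def add_cyclical_features(df, date_col="Date"):
    """Return a copy of df with sin/cos encodings of month and weekday (hypothetical helper)."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    # month runs 1..12, so shift to 0..11 before mapping onto the circle
    month_angle = 2 * np.pi * (dates.dt.month - 1) / 12
    # weekday runs 0..6 (Monday=0)
    weekday_angle = 2 * np.pi * dates.dt.weekday / 7
    out["month_sin"] = np.sin(month_angle)
    out["month_cos"] = np.cos(month_angle)
    out["weekday_sin"] = np.sin(weekday_angle)
    out["weekday_cos"] = np.cos(weekday_angle)
    return out

# three made-up dates: January, July, December
demo = pd.DataFrame({"Date": ["2022-01-15", "2022-07-15", "2022-12-15"]})
demo = add_cyclical_features(demo)
print(demo[["month_sin", "month_cos"]].round(3))
```

With this encoding, December sits next to January on the circle while July sits on the opposite side, which a raw month number cannot express.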



In [72]:
'''#importing necessary libraries'''
#for dataframe
import pandas as pd
import calendar #for day of week
import holidays #holidays with the dates

#scaling & one-hot & mse/mae
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import mean_squared_error, mean_absolute_error

#pytorch for nn
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
import torch.optim as optim

#plotting & arrays
import matplotlib.pyplot as plt
import numpy as np

In [73]:
'''Feature Engineering'''
#load csv into pd df
train_df = pd.read_csv('energytrain.csv')
test_df = pd.read_csv('energytest.csv')

#print first 3 in both df
print(train_df.head(3))
print(test_df.head(3))

         Date  Energy_Consumption_kWh
0  2022-04-11               600.45796
1  2020-03-13               435.67424
2  2021-07-21               404.28419
         Date  Energy_Consumption_kWh
0  2022-10-30              723.311422
1  2021-12-08              375.930672
2  2022-01-13              642.631686


In [74]:
#turning date into datetime format
train_df['Date'] = pd.to_datetime(train_df['Date'])
test_df['Date'] = pd.to_datetime(test_df['Date'])

#apply the same date features to both dataframes
dfs = [train_df, test_df]

#US public holidays, built once and reused for both dataframes
us_holidays = holidays.US()

for df in dfs:
    df['y'] = df['Date'].dt.year
    df['m'] = df['Date'].dt.month
    df['d'] = df['Date'].dt.day
    df['day_of_week'] = df['Date'].dt.weekday  #monday=0, ..., sunday=6
    #df['DayName'] = df['Date'].dt.day_name()  #'Monday', 'Tuesday', etc.

    #holiday (1 if yes, 0 if no)
    df['holiday'] = df['Date'].apply(lambda x: 1 if x in us_holidays else 0)

#drop original column
train_df.drop(columns=['Date'], inplace=True)
test_df.drop(columns=['Date'], inplace=True)

#print & check
print(train_df.head(3))
print(test_df.head(3))

   Energy_Consumption_kWh     y  m   d  day_of_week  holiday
0               600.45796  2022  4  11            0        0
1               435.67424  2020  3  13            4        0
2               404.28419  2021  7  21            2        0
   Energy_Consumption_kWh     y   m   d  day_of_week  holiday
0              723.311422  2022  10  30            6        0
1              375.930672  2021  12   8            2        0
2              642.631686  2022   1  13            3        0


In [75]:
#separating target
y_train = train_df[['Energy_Consumption_kWh']].values.astype('float32')
y_test = test_df[['Energy_Consumption_kWh']].values.astype('float32')

In [76]:
#SCALING
scaler = StandardScaler()
target_scaler = StandardScaler()

numeric = ['y', 'm', 'd']
train_df[numeric] = scaler.fit_transform(train_df[numeric]).astype('float32')
test_df[numeric] = scaler.transform(test_df[numeric]).astype('float32')

y_train = target_scaler.fit_transform(y_train).astype('float32')
y_test = target_scaler.transform(y_test).astype('float32')

In [77]:
#ONE-HOT
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore') #ignore unseen categories in test

categorical_features = ['day_of_week']
train_cat_encoded = encoder.fit_transform(train_df[categorical_features]).astype('float32')
test_cat_encoded = encoder.transform(test_df[categorical_features]).astype('float32')

#convert encoded categorical features to DataFrame
train_cat_df = pd.DataFrame(train_cat_encoded, columns=encoder.get_feature_names_out(categorical_features))
test_cat_df = pd.DataFrame(test_cat_encoded, columns=encoder.get_feature_names_out(categorical_features))

#reset indices to align
train_cat_df.index = train_df.index
test_cat_df.index = test_df.index

#concatenate encoded categorical features back to dataset & drop what we changed
train_df = pd.concat([train_df.drop(columns=categorical_features), train_cat_df], axis=1)
test_df = pd.concat([test_df.drop(columns=categorical_features), test_cat_df], axis=1)

#convert to PyTorch tensors
X_train = torch.tensor(train_df.drop(columns=['Energy_Consumption_kWh']).values, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)

X_test = torch.tensor(test_df.drop(columns=['Energy_Consumption_kWh']).values, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)

#create pytorch data loader
batch_size = 64
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

#print to verify
print("X_train shape:", X_train.shape, "| y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape, "| y_test shape:", y_test.shape)

X_train shape: torch.Size([8000, 11]) | y_train shape: torch.Size([8000, 1])
X_test shape: torch.Size([2000, 11]) | y_test shape: torch.Size([2000, 1])


### 2. Neural Network Design (2 points):
You will design a feedforward neural network using PyTorch and the nn.Module framework. This network will serve as your baseline model for energy consumption forecasting. You'll need to write the training and testing code and report two key metrics: Mean Squared Error (MSE) and Mean Absolute Error (MAE). Explain what these metrics represent and their significance in evaluating the model's performance.

Code References for metrics

1. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html

2. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html
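
The task asks what the two metrics represent. MSE is the mean of the squared errors, $\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$, so it is in squared target units and weights large misses heavily. MAE is the mean of the absolute errors, $\frac{1}{n}\sum_i |y_i - \hat{y}_i|$, so it stays in the target's own units and is less sensitive to outliers. A minimal sketch with made-up numbers (not taken from the dataset) shows both computed by hand and with scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# made-up values, for illustration only
y_true = np.array([400.0, 500.0, 600.0, 700.0])
y_pred = np.array([410.0, 490.0, 600.0, 740.0])

errors = y_pred - y_true              # [10, -10, 0, 40]
mse_manual = np.mean(errors ** 2)     # (100 + 100 + 0 + 1600) / 4 = 450
mae_manual = np.mean(np.abs(errors))  # (10 + 10 + 0 + 40) / 4 = 15

# the scikit-learn functions give the same numbers
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(mse, mae)
```

The single 40-unit miss contributes 1600 of the 1800 total squared error but only 40 of the 60 total absolute error, which is why MSE reacts more strongly to occasional large mistakes.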



In [78]:
'''Neural Network Design'''
# define the neural network - baseline
class BaseEnergyConsumptionNN(nn.Module):
    def __init__(self, input_size):
        super(BaseEnergyConsumptionNN, self).__init__()

        self.fc1 = nn.Linear(input_size, 128)
        self.relu1 = nn.ReLU()

        self.fc2 = nn.Linear(128, 64)
        self.relu2 = nn.ReLU()

        self.fc3 = nn.Linear(64, 32)
        self.relu3 = nn.ReLU()

        self.fc4 = nn.Linear(32, 1)

    def forward(self, x):
        x = self.relu1(self.fc1(x))
        x = self.relu2(self.fc2(x))
        x = self.relu3(self.fc3(x))
        return self.fc4(x)


### 3. Model Modification with Dropout (2 points):
Building upon your baseline model, you will modify the neural network by adding dropout layers at several points in the architecture. Train this modified network using the training data and report the evaluation metrics (MSE and MAE). Share your observations and insights regarding the impact of dropout on model performance.
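
As a small illustration of what a dropout layer does, the sketch below applies `nn.Dropout` to a tensor of ones in training mode and in evaluation mode. The tensor size and the rate of 0.5 are arbitrary choices for the demo, not values from the assignment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()          # training mode: each element is zeroed with probability p
train_out = drop(x)   # surviving elements are scaled by 1 / (1 - p) = 2.0

drop.eval()           # evaluation mode: dropout passes the input through unchanged
eval_out = drop(x)

print("train:", train_out)
print("eval: ", eval_out)
```

This is why the model must be switched with `model.train()` before training and `model.eval()` before computing test metrics: otherwise the test predictions would be made with randomly silenced units.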


In [79]:
# adding dropout
class EnergyConsumptionNN_Dropout(nn.Module):
    def __init__(self, input_size, dropout_rate=0.2):
        super(EnergyConsumptionNN_Dropout, self).__init__()

        self.fc1 = nn.Linear(input_size, 128)
        self.relu1 = nn.ReLU()
        self.drop1 = nn.Dropout(dropout_rate)

        self.fc2 = nn.Linear(128, 64)
        self.relu2 = nn.ReLU()
        self.drop2 = nn.Dropout(dropout_rate)

        self.fc3 = nn.Linear(64, 32)
        self.relu3 = nn.ReLU()
        self.drop3 = nn.Dropout(dropout_rate)

        self.fc4 = nn.Linear(32, 1)

    def forward(self, x):
        x = self.drop1(self.relu1(self.fc1(x)))
        x = self.drop2(self.relu2(self.fc2(x)))
        x = self.drop3(self.relu3(self.fc3(x)))
        return self.fc4(x)


In [80]:
# training function - without l1/l2 regularization
def train_model(model, train_loader, num_epochs=100, loss_threshold=0.1):
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.0005)

    for epoch in range(num_epochs):
        model.train()
        epoch_loss = 0

        for batch_X, batch_y in train_loader:
            optimizer.zero_grad()
            predictions = model(batch_X)
            loss = criterion(predictions, batch_y)
            loss.backward()
            optimizer.step()

            epoch_loss += loss.item()

        avg_loss = epoch_loss / len(train_loader)
        print(f"epoch [{epoch+1}/{num_epochs}] | train loss: {avg_loss:.4f}")

        if avg_loss < loss_threshold:
            print(f"stopping early at epoch {epoch+1} due to low training loss.")
            break




### 4. Regularization with L1 and L2 (2 points):
Extend your modified network further by implementing both L1 and L2 regularization techniques on the error. Retrain the network with dropout layers and regularization and report the resulting metrics (MSE and MAE). Provide insights into how regularization affects the model's performance and whether it helps mitigate overfitting.
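
To make the two penalties concrete, the sketch below computes them for a tiny layer with hand-picked weights. It uses the textbook forms, the sum of absolute values for L1 and the sum of squares for L2; the lambda values and the stand-in data loss are illustrative assumptions. A penalty built from `torch.norm(p, 2)` is the unsquared norm, which is related but not identical.

```python
import torch
import torch.nn as nn

# tiny layer with hand-picked weights so the penalties are easy to check
layer = nn.Linear(2, 1)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[3.0, -4.0]]))
    layer.bias.zero_()

l1_lambda, l2_lambda = 0.001, 0.001  # illustrative values

# L1: sum of absolute values -> |3| + |-4| + |0| = 7
l1_penalty = sum(p.abs().sum() for p in layer.parameters())
# L2 (squared form): sum of squares -> 9 + 16 + 0 = 25
l2_penalty = sum(p.pow(2).sum() for p in layer.parameters())

data_loss = torch.tensor(0.5)  # stand-in for the MSE on one batch
total_loss = data_loss + l1_lambda * l1_penalty + l2_lambda * l2_penalty
print(l1_penalty.item(), l2_penalty.item(), total_loss.item())
```

L1 tends to push individual weights to exactly zero, while L2 shrinks all weights smoothly; both add to the reported training loss, so a regularized run will show a higher loss for the same fit.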



In [81]:
# training with l1 & l2
def train_model_with_regularization(model, train_loader, num_epochs=100, loss_threshold=0.1, l1_lambda=0.001, l2_lambda=0.001):
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.0005)

    for epoch in range(num_epochs):
        model.train()
        epoch_loss = 0

        for batch_X, batch_y in train_loader:
            optimizer.zero_grad()
            predictions = model(batch_X)
            loss = criterion(predictions, batch_y)

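            # apply L1 & L2 regularization
            # note: torch.norm(p, 2) is the unsquared L2 norm of each tensor;
            # the textbook L2 (ridge) penalty uses the squared norm, p.pow(2).sum()
            l1_reg = sum(torch.norm(p, 1) for p in model.parameters()) * l1_lambda
            l2_reg = sum(torch.norm(p, 2) for p in model.parameters()) * l2_lambda
            loss += l1_reg + l2_reg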

            loss.backward()
            optimizer.step()

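            # includes the L1/L2 penalty, so this is not directly comparable
            # to the train loss printed by train_model
            epoch_loss += loss.item()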

        avg_loss = epoch_loss / len(train_loader)
        print(f"epoch [{epoch+1}/{num_epochs}] | train loss: {avg_loss:.4f}")

        if avg_loss < loss_threshold:
            print(f"stopping early at epoch {epoch+1} due to low training loss.")
            break


In [82]:
# instantiate models
input_size = X_train.shape[1]
baseline_model = BaseEnergyConsumptionNN(input_size)
dropout_model = EnergyConsumptionNN_Dropout(input_size)
dropout_model_l1l2 = EnergyConsumptionNN_Dropout(input_size)  # same dropout model but trained with L1 & L2

# train baseline model (no dropout, no L1/L2)
print("\ntraining baseline model:")
train_model(baseline_model, train_loader)

# train dropout model (no L1/L2)
print("\ntraining dropout model:")
train_model(dropout_model, train_loader)

# train dropout model with L1 & L2
print("\ntraining dropout model with L1 & L2:")
train_model_with_regularization(dropout_model_l1l2, train_loader)

# evaluate models
models = {
    "Baseline (No Dropout, No L1/L2)": baseline_model,
    "Dropout Model (No L1/L2)": dropout_model,
    "Dropout Model (With L1 & L2)": dropout_model_l1l2
}

results = {}
for name, model in models.items():
    model.eval()
    with torch.no_grad():
        y_pred = model(X_test).numpy()
    mse = mean_squared_error(y_test.numpy(), y_pred)
    mae = mean_absolute_error(y_test.numpy(), y_pred)
    results[name] = (mse, mae)



training baseline model:
epoch [1/100] | train loss: 1.0019
epoch [2/100] | train loss: 1.0004
epoch [3/100] | train loss: 0.9991
epoch [4/100] | train loss: 0.9976
epoch [5/100] | train loss: 0.9979
epoch [6/100] | train loss: 0.9968
epoch [7/100] | train loss: 0.9950
epoch [8/100] | train loss: 0.9945
epoch [9/100] | train loss: 0.9932
epoch [10/100] | train loss: 0.9922
epoch [11/100] | train loss: 0.9912
epoch [12/100] | train loss: 0.9904
epoch [13/100] | train loss: 0.9889
epoch [14/100] | train loss: 0.9881
epoch [15/100] | train loss: 0.9875
epoch [16/100] | train loss: 0.9864
epoch [17/100] | train loss: 0.9864
epoch [18/100] | train loss: 0.9850
epoch [19/100] | train loss: 0.9844
epoch [20/100] | train loss: 0.9828
epoch [21/100] | train loss: 0.9833
epoch [22/100] | train loss: 0.9808
epoch [23/100] | train loss: 0.9810
epoch [24/100] | train loss: 0.9803
epoch [25/100] | train loss: 0.9802
epoch [26/100] | train loss: 0.9792
epoch [27/100] | train loss: 0.9776
epoch [28/1

In [87]:
# print final comparison
print("\nmodel comparison")
for name, (mse, mae) in results.items():
    print(f"{name} - test mse: {mse:.4f}, test mae: {mae:.4f}")


model comparison
Baseline (No Dropout, No L1/L2) - test mse: 1.0802, test mae: 0.8230
Dropout Model (No L1/L2) - test mse: 1.0468, test mae: 0.8099
Dropout Model (With L1 & L2) - test mse: 1.0245, test mae: 0.8029
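
These errors are in standardized units, because the target was scaled with `StandardScaler` before training. As a rough point of reference, the sketch below uses synthetic stand-in data (not the assignment files) to show what a constant predictor that always outputs the training mean would score, and how `inverse_transform` maps predictions back to kWh. The variable names and the synthetic distribution are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error

rng = np.random.default_rng(0)
# synthetic stand-in for the kWh target; not the assignment data
y_train_kwh = rng.normal(500.0, 120.0, size=(1000, 1))
y_test_kwh = rng.normal(500.0, 120.0, size=(250, 1))

# fit the scaler on the training target only, then apply it to the test target
target_scaler = StandardScaler().fit(y_train_kwh)
y_test_std = target_scaler.transform(y_test_kwh)

# a constant predictor that always outputs the training mean (0 after scaling)
mean_pred_std = np.zeros_like(y_test_std)
baseline_mse = mean_squared_error(y_test_std, mean_pred_std)

# map standardized predictions back to kWh before computing MAE
mean_pred_kwh = target_scaler.inverse_transform(mean_pred_std)
baseline_mae_kwh = mean_absolute_error(y_test_kwh, mean_pred_kwh)

print(f"mean-predictor mse (standardized): {baseline_mse:.3f}")
print(f"mean-predictor mae (kWh): {baseline_mae_kwh:.1f}")
```

On a standardized target, the mean predictor lands close to an MSE of 1.0, so a model's test MSE is easiest to interpret relative to that value. Reporting MAE after `inverse_transform` gives an error in kWh, which is easier to relate to the original data.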


## Insights

**Baseline Model (No Dropout, No L1/L2)**
- **Train Loss:** Steadily decreased to **0.9397**.
- **Test MSE:** **1.0802** | **Test MAE:** **0.8230**.
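- Highest test error of the three models; the gap between train loss and test MSE suggests **slight overfitting**. Because the target was standardized, an MSE near 1.0 is roughly what always predicting the training mean would score.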

**Dropout Model (No L1/L2)**
- **Train Loss:** Stabilized around **0.9704**.
- **Test MSE:** **1.0468** | **Test MAE:** **0.8099**.
- **Better generalization** than baseline, slight performance improvement.

**Dropout Model (With L1 & L2 Regularization)**
- **Train Loss:** Started high due to regularization but stabilized at **1.0007**.
- **Test MSE:** **1.0245** | **Test MAE:** **0.8029**.
- Best test performance **(lowest MSE & MAE)**, which suggests less overfitting.


### I tried batch normalization

I added batch normalization to try to stabilize training and improve generalization. The idea was to normalize activations so the model could learn faster.

- **Loss was actually higher** with BN applied.
- The model didn't generalize better: test MSE stayed the same or worsened.
- Removing BN resulted in **lower training loss and better performance**.
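
The batch normalization code is not shown in this notebook, so the following is only a guess at what such a variant could look like: the baseline layer sizes with an `nn.BatchNorm1d` placed between each linear layer and its ReLU. The class name `EnergyConsumptionNN_BN` and the placement of the BN layers are assumptions, not the model that produced the observations above.

```python
import torch
import torch.nn as nn

class EnergyConsumptionNN_BN(nn.Module):  # hypothetical name, not from the notebook
    def __init__(self, input_size):
        super().__init__()
        # same widths as the baseline (128 -> 64 -> 32 -> 1), with BN before each ReLU
        self.net = nn.Sequential(
            nn.Linear(input_size, 128), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Linear(128, 64), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Linear(64, 32), nn.BatchNorm1d(32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

model = EnergyConsumptionNN_BN(input_size=11)
model.eval()  # evaluation mode: BN uses running statistics instead of batch statistics
with torch.no_grad():
    out = model(torch.randn(4, 11))
print(out.shape)
```

Like dropout, batch normalization behaves differently in training and evaluation mode, so a comparison against the other models is only fair if `model.eval()` is called before computing test metrics.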

### Final Takeaways
- **Dropout + L1 & L2 is the winner** – best test error.  
- **Dropout alone helps**, but adding L1 & L2 **reduces overfitting** even more.  
- **Baseline overfits slightly** – good, but not the best for real-world generalization.  
