# <div align="center"> COSC 2673/2793 | Machine Learning </div>

## <div align="center"> Assignment 2 - Joseph Packham (s3838978) and Kylie Nguyen (s3946026) </div>


# Introduction

This report will cover the process of producing a machine learning model that will predict energy usage...


In [None]:
# importing packages
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pydot as pyd  # needed by tf.keras.utils.plot_model (together with Graphviz)
import seaborn as sns  # for scatterplots and heatmaps
import tensorflow as tf
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import (
    MinMaxScaler,
    PolynomialFeatures,
    PowerTransformer,
    StandardScaler,
)
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils import compute_sample_weight
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.layers import BatchNormalization, Dropout
from tensorflow.keras.losses import Huber
from tensorflow.keras.metrics import MeanAbsoluteError
from tensorflow.keras.optimizers import Adam

In [None]:
# read in CSV file and display first 5 rows
energyUse_df = pd.read_csv("./dataset/UCI-electricity/UCI_data.csv", delimiter=",")
energyUse_df.head()

# Exploratory Data Analysis

First, the data is investigated through exploratory data analysis (EDA). The dataframe has 19735 rows and 28 columns: one column is the target variable (energy usage in Wh), one is a timestamp (date), and the remaining columns are the attributes. According to the description of the data, these attributes cover the temperature and humidity of different rooms in the house and outside, along with a few other weather-related variables such as pressure and windspeed. Two of the variables are listed as "Random Variable". Using the .info() function, it is confirmed that there are no null values within the dataset.


In [None]:
# check for any null values, using shape to compare
print("Shape of Energy Use dataframe: ", energyUse_df.shape, "\n")

energyUse_df.info()

Using the describe function, the count, mean, standard deviation, quartiles, and minimum and maximum values of each variable are returned. These show that, although the ranges of the humidity and temperature variables are relatively similar, the ranges of some other variables differ greatly. For example, Windspeed ranges from 0 to 14, whereas the target energy ranges from 10 to 1110. This suggests that feature scaling should be applied later in the process, since variables with much larger ranges can dominate gradient-based and distance-based learning algorithms and slow their convergence.
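To illustrate what feature scaling does, the sketch below applies scikit-learn's MinMaxScaler to two made-up columns whose ranges echo those quoted above (0 to 14 and 10 to 1110). The values are synthetic, not taken from the dataset. After scaling, both columns lie in [0, 1], so neither dominates purely because of its units.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic columns: the first spans 0-14 (like Windspeed),
# the second spans 10-1110 (a much wider range). Values are illustrative only.
X = np.array([
    [0.0, 10.0],
    [7.0, 560.0],
    [14.0, 1110.0],
])

# MinMaxScaler maps each column independently via x' = (x - min) / (max - min)
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled)
```

Both columns become 0, 0.5 and 1 in this toy case, because each middle value sits exactly halfway through its column's range.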


In [None]:
energyUse_df.describe()

# Data distribution

In order to observe the distribution of each variable, histograms are plotted for every variable other than date, since date is stored as a string (object type) and cannot be plotted directly as a histogram.


In [None]:
# get list of columns other than date
columns = (energyUse_df.columns).difference(["date"])
# plot histogram for all variables other than date
plt.figure(figsize=(20, 20))
for i, column in enumerate(columns):
    plt.subplot(6, 5, i + 1)
    plt.hist(energyUse_df[column], alpha=0.3, color="b", density=True)
    plt.title(column)
    plt.xticks(rotation="vertical")

# lay out all subplots once, after they have been created
plt.tight_layout()
plt.show()

> **Observations:**
>
> - A number of attributes appear to be skewed, e.g. RH_5, RH_out and T2.
> - The two random variables appear to be approximately uniformly distributed.
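The skewness noted above can also be checked numerically rather than only by eye. A minimal sketch using pandas' DataFrame.skew is shown below; the helper name skewed_columns and the |skew| > 0.5 threshold are illustrative assumptions (a common rule of thumb), not values used elsewhere in this report, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd


def skewed_columns(df: pd.DataFrame, threshold: float = 0.5) -> pd.Series:
    """Return sample skewness of numeric columns with |skew| > threshold,
    ordered from most to least skewed in absolute value (hypothetical helper)."""
    skew = df.select_dtypes(include="number").skew()
    skewed = skew[skew.abs() > threshold]
    return skewed.reindex(skewed.abs().sort_values(ascending=False).index)


# Synthetic demo: one right-skewed column and one roughly symmetric column
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "right_skewed": rng.exponential(scale=1.0, size=1_000),
    "symmetric": rng.normal(size=1_000),
})

result = skewed_columns(demo)
print(result)
```

On the real data, skewed_columns(energyUse_df.drop(columns=["date"])) would give a candidate list of attributes for a power transform.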


In [None]:
# display boxplot for the target, energy usage, variable
plt.boxplot(energyUse_df["TARGET_energy"])
plt.title("Energy Usage")
plt.show()

After displaying the boxplot for the target variable, it is observed that there are a number of outliers above the upper limit. These values are dropped to prevent such extreme values from disproportionately affecting the model. The outliers are removed using the IQR method: any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is treated as an outlier.
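As a worked example of the IQR rule on a small made-up series (not the energy data), the sketch below computes the two fences and keeps only values inside them. The helper iqr_bounds is hypothetical and mirrors the calculation in the next cell.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the IQR fences (Q1 - k*IQR, Q3 + k*IQR) for a series."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy data with one obvious outlier
toy = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 1000])

# Q1 = 30 and Q3 = 70 (linear interpolation), so IQR = 40
lower, upper = iqr_bounds(toy)
print(lower, upper)  # -30.0 130.0

# between() is inclusive on both ends, so values equal to a fence are kept
kept = toy[toy.between(lower, upper)]
print(kept.tolist())
```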


In [None]:
# get the quantiles and IQR
q1 = energyUse_df["TARGET_energy"].quantile(0.25)
q3 = energyUse_df["TARGET_energy"].quantile(0.75)
IQR = q3 - q1

# calculate lower and upper limits
lowerLimit = q1 - (1.5 * IQR)
upperLimit = q3 + (1.5 * IQR)

# keep only rows whose target lies within the IQR limits (inclusive), dropping outliers
energyUse_df = energyUse_df.loc[
    energyUse_df["TARGET_energy"].between(lowerLimit, upperLimit)
]

# display boxplot without outliers
plt.boxplot(energyUse_df["TARGET_energy"])
plt.title("Energy Usage")
plt.show()

In [None]:
energyUse_df.shape

# Relationship between variables

Using scatterplots, the relationship between the target variable (energy usage) and each of the other attributes in the dataframe is explored.


In [None]:


# plot scatterplots for all features against target variable
plt.figure(figsize=(20, 20))
for i, column in enumerate(columns):
    plt.subplot(6, 5, i + 1)
    sns.scatterplot(data=energyUse_df, x=column, y="TARGET_energy")
    plt.title(column)
    # rotate tick labels on every subplot, not only the last one
    plt.xticks(rotation="vertical")

plt.tight_layout()
plt.show()

In [None]:
# # get list of columns other than date and target
# columns = (energyUse_df.columns).difference(["date", "TARGET_energy"])

# g = sns.PairGrid(data=energyUse_df, vars=columns, hue="TARGET_energy")
# g.map(sns.scatterplot)
# plt.show()

> **Observations:**
>
> - Some plots suggest an approximately linear relationship between an attribute and the target, which a linear regression model may be able to capture.
> - Other plots suggest a non-linear relationship, which may require a non-linear model to capture.
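Pearson correlation (the default of DataFrame.corr) only measures linear association, so a relationship that is monotonic but curved can look weaker than it really is. A small sketch on synthetic data compares Pearson with Spearman rank correlation, computed here as the Pearson correlation of the ranks.

```python
import numpy as np
import pandas as pd

# Synthetic, monotonic but non-linear relationship: y = x**3
x = pd.Series(np.linspace(-3, 3, 101))
y = x ** 3

pearson = x.corr(y)                 # linear association only
spearman = x.rank().corr(y.rank())  # Pearson on ranks = Spearman's rho

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Spearman is 1 here because y increases whenever x does, while Pearson is noticeably lower. On the real data, energyUse_df.drop(columns=["date"]).corr(method="spearman")["TARGET_energy"] would give each attribute's rank correlation with the target.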


In [None]:
# get df without date column
energyUse_df_noDate = energyUse_df.drop(columns=["date"])

# plot correlation plot
f, ax = plt.subplots(figsize=(11, 9))
corr = energyUse_df_noDate.corr()
ax = sns.heatmap(
    corr,
    vmin=-1,
    vmax=1,
    center=0,
    cmap=sns.diverging_palette(20, 220, n=200),
    square=True,
)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90, horizontalalignment="right")

> **Observations:**
>
> - Variables relating to temperature are highly positively correlated with each other, and variables relating to humidity are similarly highly positively correlated with each other.
> - Variables involving temperature generally have either a slight positive or a slight negative correlation with variables involving humidity.
> - RH_6, the humidity outside the building (north side), appears to be quite negatively correlated with the temperature variables.
> - The two random variables do not appear to be correlated with any other variable, apart from being highly correlated with each other (and, trivially, with themselves).
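Because the temperature variables (and, separately, the humidity variables) are strongly correlated with one another, it can help to list the most redundant pairs explicitly. Strongly correlated inputs can make individual linear regression coefficients unstable, even when predictions are unaffected. A sketch follows; the function name high_corr_pairs and the 0.9 threshold are assumptions for illustration, and the demo frame is synthetic.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(df: pd.DataFrame, threshold: float = 0.9) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation exceeds threshold
    (hypothetical helper). Each pair is listed once; self-correlations are skipped."""
    corr = df.corr().abs()
    cols = corr.columns
    pairs = {
        (cols[i], cols[j]): corr.iat[i, j]
        for i in range(len(cols))
        for j in range(i + 1, len(cols))  # upper triangle only
        if corr.iat[i, j] > threshold
    }
    return pd.Series(pairs, dtype=float).sort_values(ascending=False)


rng = np.random.default_rng(42)
t1 = rng.normal(20, 2, size=500)
demo = pd.DataFrame({
    "T1": t1,
    "T2": t1 + rng.normal(0, 0.1, size=500),  # near-copy of T1
    "rv1": rng.uniform(0, 50, size=500),      # unrelated noise
})

result = high_corr_pairs(demo)
print(result)
```

On the real data, high_corr_pairs(energyUse_df_noDate) would list the redundant pairs visible in the heatmap.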


In [None]:
energyUse_df["TARGET_energy"].hist(figsize=(5, 5))
plt.xlabel("Energy Usage")
plt.ylabel("frequency")
plt.show()

# Non-Neural Network - Linear Regression

### Data Splitting

In [None]:
from sklearn.model_selection import train_test_split

# split the dataset into 70% train and 15% test and 15% val
with pd.option_context("mode.chained_assignment", None):
    LR_train, LR_test = train_test_split(
        energyUse_df, test_size=0.3, shuffle=True, random_state=42
    )
    LR_test, LR_val = train_test_split(
        LR_test, test_size=0.5, shuffle=True, random_state=42
    )

# Separate the target and the attributes
LR_X_train = LR_train.drop(["TARGET_energy", "date"], axis=1)
LR_y_train = LR_train["TARGET_energy"]

LR_X_test = LR_test.drop(["TARGET_energy", "date"], axis=1)
LR_y_test = LR_test["TARGET_energy"]

LR_X_val = LR_val.drop(["TARGET_energy", "date"], axis=1)
LR_y_val = LR_val["TARGET_energy"]

print("LR_X_train shape: ", LR_X_train.shape)
print("LR_y_train shape: ", LR_y_train.shape)
print("LR_X_test shape: ", LR_X_test.shape)
print("LR_y_test shape: ", LR_y_test.shape)
print("LR_X_val shape: ", LR_X_val.shape)
print("LR_y_val shape: ", LR_y_val.shape)
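The two-stage split above first holds out 30% of the rows and then halves that holdout, giving roughly 0.3 × 0.5 = 15% each for the test and validation sets. A minimal sketch (the helper name three_way_split is hypothetical) checks the proportions on a 1000-row toy frame. train_test_split rounds the test portion up, so on other dataset sizes the counts may differ by one row.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def three_way_split(df, holdout_size=0.3, random_state=42):
    """Split df into train/val/test: hold out holdout_size, then halve the holdout."""
    train, holdout = train_test_split(
        df, test_size=holdout_size, shuffle=True, random_state=random_state
    )
    test, val = train_test_split(
        holdout, test_size=0.5, shuffle=True, random_state=random_state
    )
    return train, val, test


demo = pd.DataFrame({"x": np.arange(1000)})
train, val, test = three_way_split(demo)
print(len(train), len(val), len(test))  # 700 150 150
```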

In [None]:
energyUse_df_X = energyUse_df.drop(["TARGET_energy", "date"], axis=1)

# plotting histograms of both training and test datasets
plt.figure(figsize=(20, 20))
for i, col in enumerate(energyUse_df_X.columns):
    plt.subplot(6, 5, i + 1)
    plt.hist(LR_X_train[col], alpha=0.3, color="b", density=True)
    plt.hist(LR_X_test[col], alpha=0.3, color="r", density=True)
    plt.title(col)
    plt.xticks(rotation="vertical")
    plt.tight_layout()

### Base Model, Unscaled Data

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import math

# unscaled
model_us_lr = LinearRegression().fit(LR_X_train, LR_y_train)
LR_y_val_pred_US = model_us_lr.predict(LR_X_val)

r2_us_lr = r2_score(LR_y_val, LR_y_val_pred_US)
print(
    "The R^2 score for the linear regression model (without feature scaling) is: {:.3f}".format(
        r2_us_lr
    )
)

MSE_us_lr = np.square(np.subtract(LR_y_val, LR_y_val_pred_US)).mean()
RMSE_us_lr = math.sqrt(MSE_us_lr)

print(
    "The RMSE score for the linear regression model (without feature scaling) is: {:.3f}".format(
        RMSE_us_lr
    )
)

In [None]:
# predicting using linear model and plotting predicted vs actual values

fig, energyUse_LinearRegression = plt.subplots()
energyUse_LinearRegression.scatter(
    LR_y_val, LR_y_val_pred_US, s=25, cmap=plt.cm.coolwarm, zorder=10
)

lims = [
    np.min(
        [energyUse_LinearRegression.get_xlim(), energyUse_LinearRegression.get_ylim()]
    ),
    np.max(
        [energyUse_LinearRegression.get_xlim(), energyUse_LinearRegression.get_ylim()]
    ),
]

energyUse_LinearRegression.plot(lims, lims, "k--", alpha=0.75, zorder=0)
energyUse_LinearRegression.plot(
    lims,
    [
        np.mean(LR_y_train),
    ]
    * 2,
    "r--",
    alpha=0.75,
    zorder=0,
)
energyUse_LinearRegression.set_aspect("equal")
energyUse_LinearRegression.set_xlim(lims)
energyUse_LinearRegression.set_ylim(lims)

plt.xlabel("Actual Energy Use")
plt.ylabel("Predicted Energy Use")

plt.show()

In [None]:
# plot residuals for unscaled
fig, ax = plt.subplots()
ax.scatter(LR_y_val, LR_y_val - LR_y_val_pred_US, s=25, cmap=plt.cm.coolwarm, zorder=10)

xlims = ax.get_xlim()
ax.plot(
    xlims,
    [
        0.0,
    ]
    * 2,
    "k--",
    alpha=0.75,
    zorder=0,
)
ax.set_xlim(xlims)

plt.xlabel("Actual Energy Use")
plt.ylabel("Residual")

plt.show()

### Model with MinMaxScaling and Power Transforming

In [None]:
# scaling all features, normalising skewed features
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import PowerTransformer

logNorm_attributes = [
    "RH_1",
    "T2",
    "T3",
    "RH_3",
    "RH_4",
    "T5",
    "RH_5",
    "T6",
    "RH_6",
    "T7",
    "RH_7",
    "RH_8",
    "T9",
    "RH_9",
    "T_out",
    "Press_mm_hg",
    "RH_out",
    "Windspeed",
    "Visibility",
]
minmax_attributes = list(
    set(energyUse_df_X.columns).difference(set(logNorm_attributes))
)

LR_X_train_scaled = LR_X_train.copy()
LR_X_val_scaled = LR_X_val.copy()

minmaxscaler = MinMaxScaler().fit(LR_X_train_scaled.loc[:, minmax_attributes])
LR_X_train_scaled.loc[:, minmax_attributes] = minmaxscaler.transform(
    LR_X_train_scaled.loc[:, minmax_attributes]
)
LR_X_val_scaled.loc[:, minmax_attributes] = minmaxscaler.transform(
    LR_X_val_scaled.loc[:, minmax_attributes]
)

powertransformer = PowerTransformer(method="yeo-johnson", standardize=False).fit(
    LR_X_train.loc[:, logNorm_attributes]
)
LR_X_train_scaled.loc[:, logNorm_attributes] = powertransformer.transform(
    LR_X_train.loc[:, logNorm_attributes]
)
LR_X_val_scaled.loc[:, logNorm_attributes] = powertransformer.transform(
    LR_X_val.loc[:, logNorm_attributes]
)

minmaxscaler_pt = MinMaxScaler().fit(LR_X_train_scaled.loc[:, logNorm_attributes])
LR_X_train_scaled.loc[:, logNorm_attributes] = minmaxscaler_pt.transform(
    LR_X_train_scaled.loc[:, logNorm_attributes]
)
LR_X_val_scaled.loc[:, logNorm_attributes] = minmaxscaler_pt.transform(
    LR_X_val_scaled.loc[:, logNorm_attributes]
)
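The scaling steps above (Yeo-Johnson followed by min-max scaling for the skewed attributes, min-max scaling only for the rest, all fitted on the training set) could alternatively be expressed with scikit-learn's ColumnTransformer and Pipeline. This keeps the fitted statistics together and avoids repeating the code when new features are added. A sketch under those assumptions; build_preprocessor and the two demo columns are illustrative, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PowerTransformer


def build_preprocessor(skewed_cols, other_cols):
    """Yeo-Johnson + min-max for skewed columns; min-max only for the others."""
    skewed_pipe = Pipeline([
        ("power", PowerTransformer(method="yeo-johnson", standardize=False)),
        ("minmax", MinMaxScaler()),
    ])
    return ColumnTransformer([
        ("skewed", skewed_pipe, list(skewed_cols)),
        ("other", MinMaxScaler(), list(other_cols)),
    ])


# Synthetic demo frame (illustrative values only)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "Windspeed": rng.exponential(2.0, size=200),  # right-skewed
    "T1": rng.normal(21, 1.5, size=200),          # roughly symmetric
})

pre = build_preprocessor(["Windspeed"], ["T1"])
X_train_t = pre.fit_transform(X_demo.iloc[:150])  # fit on training rows only
X_val_t = pre.transform(X_demo.iloc[150:])        # reuse the fitted statistics
print(X_train_t.min(axis=0), X_train_t.max(axis=0))
```

With the notebook's variables this would be pre = build_preprocessor(logNorm_attributes, minmax_attributes) followed by pre.fit_transform(LR_X_train). ColumnTransformer drops any column not listed (remainder="drop" by default) and outputs columns in transformer order, so every feature must appear in one of the two lists.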

In [None]:
# plot all histograms after scaling and normalisation
plt.figure(figsize=(20, 20))
for i, col in enumerate(LR_X_train_scaled.columns):
    plt.subplot(6, 5, i + 1)
    plt.hist(LR_X_train_scaled[col], alpha=0.3, color="b", density=True)
    plt.hist(LR_X_val_scaled[col], alpha=0.3, color="r", density=True)
    plt.title(col)
    plt.xticks(rotation="vertical")
    plt.tight_layout()

In [None]:
# fitting a linear regression model
model_scaled_lr = LinearRegression().fit(LR_X_train_scaled, LR_y_train)

# predicting using linear model and plotting predicted vs actual values
LR_y_val_pred_scaled = model_scaled_lr.predict(LR_X_val_scaled)

fig, energyUse_LinearRegression = plt.subplots()
energyUse_LinearRegression.scatter(
    LR_y_val, LR_y_val_pred_scaled, s=25, cmap=plt.cm.coolwarm, zorder=10
)

lims = [
    np.min(
        [energyUse_LinearRegression.get_xlim(), energyUse_LinearRegression.get_ylim()]
    ),
    np.max(
        [energyUse_LinearRegression.get_xlim(), energyUse_LinearRegression.get_ylim()]
    ),
]

energyUse_LinearRegression.plot(lims, lims, "k--", alpha=0.75, zorder=0)
energyUse_LinearRegression.plot(
    lims,
    [
        np.mean(LR_y_train),
    ]
    * 2,
    "r--",
    alpha=0.75,
    zorder=0,
)
energyUse_LinearRegression.set_aspect("equal")
energyUse_LinearRegression.set_xlim(lims)
energyUse_LinearRegression.set_ylim(lims)

plt.xlabel("Actual Energy Use")
plt.ylabel("Predicted Energy Use")

plt.show()

In [None]:
# plot residuals for scaled
fig, ax = plt.subplots()
ax.scatter(
    LR_y_val, LR_y_val - LR_y_val_pred_scaled, s=25, cmap=plt.cm.coolwarm, zorder=10
)

xlims = ax.get_xlim()
ax.plot(
    xlims,
    [
        0.0,
    ]
    * 2,
    "k--",
    alpha=0.75,
    zorder=0,
)
ax.set_xlim(xlims)

plt.xlabel("Actual Energy Use")
plt.ylabel("Residual")

plt.show()

In [None]:
# scaled
r2_lr_scaled = r2_score(LR_y_val, LR_y_val_pred_scaled)

print(
    "The R^2 score for the linear regression model (with feature scaling) is: {:.3f}".format(
        r2_lr_scaled
    )
)

MSE_lr_scaled = np.square(np.subtract(LR_y_val, LR_y_val_pred_scaled)).mean()
RMSE_lr_scaled = math.sqrt(MSE_lr_scaled)

print(
    "The RMSE score for the linear regression model (with feature scaling) is: {:.3f}".format(
        RMSE_lr_scaled
    )
)

### Day of Week Column + Scaled & Transformed data

In [None]:
# add a day-of-week feature derived from the date to see whether it improves the model
energyUse_df["date"] = pd.to_datetime(energyUse_df["date"], format="%Y-%m-%d %H:%M:%S")
energyUse_df["day_of_week"] = energyUse_df["date"].dt.dayofweek

# split the dataset into 70% train and 15% test and 15% val
with pd.option_context("mode.chained_assignment", None):
    LR_train, LR_test = train_test_split(
        energyUse_df, test_size=0.3, shuffle=True, random_state=42
    )
    LR_test, LR_val = train_test_split(
        LR_test, test_size=0.5, shuffle=True, random_state=42
    )

# Separate the target and the attributes
LR_X_train = LR_train.drop(["TARGET_energy", "date"], axis=1)
LR_y_train = LR_train["TARGET_energy"]

LR_X_test = LR_test.drop(["TARGET_energy", "date"], axis=1)
LR_y_test = LR_test["TARGET_energy"]

LR_X_val = LR_val.drop(["TARGET_energy", "date"], axis=1)
LR_y_val = LR_val["TARGET_energy"]

print("LR_X_train shape: ", LR_X_train.shape)
print("LR_y_train shape: ", LR_y_train.shape)
print("LR_X_test shape: ", LR_X_test.shape)
print("LR_y_test shape: ", LR_y_test.shape)
print("LR_X_val shape: ", LR_X_val.shape)
print("LR_y_val shape: ", LR_y_val.shape)

In [None]:
# scaling all features, normalising skewed features
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import PowerTransformer

logNorm_attributes = [
    "RH_1",
    "T2",
    "T3",
    "RH_3",
    "RH_4",
    "T5",
    "RH_5",
    "T6",
    "RH_6",
    "T7",
    "RH_7",
    "RH_8",
    "T9",
    "RH_9",
    "T_out",
    "Press_mm_hg",
    "RH_out",
    "Windspeed",
    "Visibility",
]
# use the current training columns so the new day_of_week feature is also min-max scaled
minmax_attributes = list(
    set(LR_X_train.columns).difference(set(logNorm_attributes))
)

LR_X_train_scaled = LR_X_train.copy()
LR_X_val_scaled = LR_X_val.copy()

minmaxscaler = MinMaxScaler().fit(LR_X_train_scaled.loc[:, minmax_attributes])
LR_X_train_scaled.loc[:, minmax_attributes] = minmaxscaler.transform(
    LR_X_train_scaled.loc[:, minmax_attributes]
)
LR_X_val_scaled.loc[:, minmax_attributes] = minmaxscaler.transform(
    LR_X_val_scaled.loc[:, minmax_attributes]
)

powertransformer = PowerTransformer(method="yeo-johnson", standardize=False).fit(
    LR_X_train.loc[:, logNorm_attributes]
)
LR_X_train_scaled.loc[:, logNorm_attributes] = powertransformer.transform(
    LR_X_train.loc[:, logNorm_attributes]
)
LR_X_val_scaled.loc[:, logNorm_attributes] = powertransformer.transform(
    LR_X_val.loc[:, logNorm_attributes]
)

minmaxscaler_pt = MinMaxScaler().fit(LR_X_train_scaled.loc[:, logNorm_attributes])
LR_X_train_scaled.loc[:, logNorm_attributes] = minmaxscaler_pt.transform(
    LR_X_train_scaled.loc[:, logNorm_attributes]
)
LR_X_val_scaled.loc[:, logNorm_attributes] = minmaxscaler_pt.transform(
    LR_X_val_scaled.loc[:, logNorm_attributes]
)

In [None]:
# fitting a linear regression model
model_scaled_lr_wDayOfWeek = LinearRegression().fit(LR_X_train_scaled, LR_y_train)

# predicting using linear model and plotting predicted vs actual values
LR_y_val_pred_dayOfWeek = model_scaled_lr_wDayOfWeek.predict(LR_X_val_scaled)

fig, energyUse_wDayOfWeek_LinearRegression = plt.subplots()
energyUse_wDayOfWeek_LinearRegression.scatter(
    LR_y_val, LR_y_val_pred_dayOfWeek, s=25, cmap=plt.cm.coolwarm, zorder=10
)

lims = [
    np.min(
        [
            energyUse_wDayOfWeek_LinearRegression.get_xlim(),
            energyUse_wDayOfWeek_LinearRegression.get_ylim(),
        ]
    ),
    np.max(
        [
            energyUse_wDayOfWeek_LinearRegression.get_xlim(),
            energyUse_wDayOfWeek_LinearRegression.get_ylim(),
        ]
    ),
]

energyUse_wDayOfWeek_LinearRegression.plot(lims, lims, "k--", alpha=0.75, zorder=0)
energyUse_wDayOfWeek_LinearRegression.plot(
    lims,
    [
        np.mean(LR_y_train),
    ]
    * 2,
    "r--",
    alpha=0.75,
    zorder=0,
)
energyUse_wDayOfWeek_LinearRegression.set_aspect("equal")
energyUse_wDayOfWeek_LinearRegression.set_xlim(lims)
energyUse_wDayOfWeek_LinearRegression.set_ylim(lims)

plt.xlabel("Actual Energy Use")
plt.ylabel("Predicted Energy Use")

plt.show()

In [None]:
fig, ax = plt.subplots()
ax.scatter(
    LR_y_val, LR_y_val - LR_y_val_pred_dayOfWeek, s=25, cmap=plt.cm.coolwarm, zorder=10
)

xlims = ax.get_xlim()
ax.plot(
    xlims,
    [
        0.0,
    ]
    * 2,
    "k--",
    alpha=0.75,
    zorder=0,
)
ax.set_xlim(xlims)

plt.xlabel("Actual Energy Use")
plt.ylabel("Residual")

plt.show()

In [None]:
# scaled + dayOfWeek
r2_lr = r2_score(LR_y_val, LR_y_val_pred_dayOfWeek)

print(
    "The R^2 score for the linear regression model (with feature scaling + dayOfWeek) is: {:.3f}".format(
        r2_lr
    )
)

MSE_lr = np.square(np.subtract(LR_y_val, LR_y_val_pred_dayOfWeek)).mean()
RMSE_lr = math.sqrt(MSE_lr)

print(
    "The RMSE score for the linear regression model (with feature scaling + dayOfWeek) is: {:.3f}".format(
        RMSE_lr
    )
)

### Day of Week Column + Unscaled & Untransformed data

In [None]:
# fitting a linear regression model
model_us_lr_wDayOfWeek = LinearRegression().fit(LR_X_train, LR_y_train)

# predicting using linear model and plotting predicted vs actual values
LR_y_val_pred_dayOfWeek_us = model_us_lr_wDayOfWeek.predict(LR_X_val)

fig, energyUse_wDayOfWeek_LinearRegression = plt.subplots()
energyUse_wDayOfWeek_LinearRegression.scatter(
    LR_y_val, LR_y_val_pred_dayOfWeek_us, s=25, cmap=plt.cm.coolwarm, zorder=10
)

lims = [
    np.min(
        [
            energyUse_wDayOfWeek_LinearRegression.get_xlim(),
            energyUse_wDayOfWeek_LinearRegression.get_ylim(),
        ]
    ),
    np.max(
        [
            energyUse_wDayOfWeek_LinearRegression.get_xlim(),
            energyUse_wDayOfWeek_LinearRegression.get_ylim(),
        ]
    ),
]

energyUse_wDayOfWeek_LinearRegression.plot(lims, lims, "k--", alpha=0.75, zorder=0)
energyUse_wDayOfWeek_LinearRegression.plot(
    lims,
    [
        np.mean(LR_y_train),
    ]
    * 2,
    "r--",
    alpha=0.75,
    zorder=0,
)
energyUse_wDayOfWeek_LinearRegression.set_aspect("equal")
energyUse_wDayOfWeek_LinearRegression.set_xlim(lims)
energyUse_wDayOfWeek_LinearRegression.set_ylim(lims)

plt.xlabel("Actual Energy Use")
plt.ylabel("Predicted Energy Use")

plt.show()

In [None]:
fig, ax = plt.subplots()
ax.scatter(
    LR_y_val, LR_y_val - LR_y_val_pred_dayOfWeek_us, s=25, cmap=plt.cm.coolwarm, zorder=10
)

xlims = ax.get_xlim()
ax.plot(
    xlims,
    [
        0.0,
    ]
    * 2,
    "k--",
    alpha=0.75,
    zorder=0,
)
ax.set_xlim(xlims)

plt.xlabel("Actual Energy Use")
plt.ylabel("Residual")

plt.show()

In [None]:
# unscaled + dayOfWeek
r2_lr = r2_score(LR_y_val, LR_y_val_pred_dayOfWeek_us)

print(
    "The R^2 score for the linear regression model (unscaled + dayOfWeek) is: {:.3f}".format(
        r2_lr
    )
)

MSE_lr = np.square(np.subtract(LR_y_val, LR_y_val_pred_dayOfWeek_us)).mean()
RMSE_lr = math.sqrt(MSE_lr)

print(
    "The RMSE score for the linear regression model (unscaled + dayOfWeek) is: {:.3f}".format(
        RMSE_lr
    )
)

In [None]:
# evaluate the chosen linear regression model (unscaled + dayOfWeek) on the held-out test set
LR_y_test_pred_dayOfWeek_us = model_us_lr_wDayOfWeek.predict(LR_X_test)
r2_lr = r2_score(LR_y_test, LR_y_test_pred_dayOfWeek_us)

print(
    "The test-set R^2 score for the linear regression model (unscaled + dayOfWeek) is: {:.3f}".format(
        r2_lr
    )
)

MSE_lr = np.square(np.subtract(LR_y_test, LR_y_test_pred_dayOfWeek_us)).mean()
RMSE_lr = math.sqrt(MSE_lr)

print(
    "The test-set RMSE score for the linear regression model (unscaled + dayOfWeek) is: {:.3f}".format(
        RMSE_lr
    )
)

# Creating Neural Network


In [None]:
# function to plot learning curves (loss and one metric) for training vs validation
def plot_learning_curve(
    train_loss, val_loss, train_metric, val_metric, metric_name="MeanAbsoluteError"
):
    # loss curve
    plt.figure(figsize=(10, 5))
    plt.plot(train_loss, "r--")
    plt.plot(val_loss, "b--")
    plt.xlabel("epochs")
    plt.ylabel("Loss")
    plt.legend(["train", "val"], loc="upper left")
    plt.show()

    # metric curve
    plt.figure(figsize=(10, 5))
    plt.plot(train_metric, "r--")
    plt.plot(val_metric, "b--")
    plt.xlabel("epochs")
    plt.ylabel(metric_name)
    plt.legend(["train", "val"], loc="upper left")
    plt.show()


# function for residual plot (residual = actual - predicted); `model` is a display name
def plot_residuals(model, val_y, y_pred):
    residuals = val_y - y_pred

    fig, ax = plt.subplots()
    ax.scatter(val_y, residuals, s=25, zorder=10)

    # horizontal reference line at zero residual
    ax.axhline(y=0, color="k", linestyle="-", linewidth=1, alpha=0.75, zorder=0)

    plt.xlabel("Energy Usage")
    plt.ylabel("Residuals")
    plt.title(f"Residuals Plot for {model}")
    plt.grid(True)
    plt.show()


# function to create a scatter plot of actual vs predicted values
def scatter_plot(val_y, y_pred, model):
    plt.scatter(val_y, y_pred)
    plt.xlabel("Actual Energy Usage")
    plt.ylabel("Predicted Energy Usage")
    plt.title(f"Actual vs Predicted Energy Usage for {model}")
    # red dashed line: perfect predictions (predicted == actual)
    plt.plot([val_y.min(), val_y.max()], [val_y.min(), val_y.max()], "r--")
    plt.show()


# function to calculate and print R^2, RMSE and MAE
def calculate_metrics(model, val_y, y_pred):
    r2 = r2_score(val_y, y_pred)
    rmse = np.sqrt(mean_squared_error(val_y, y_pred))
    mae = mean_absolute_error(val_y, y_pred)

    print(f"R2 Score for {model}: {r2}")
    print(f"Root Mean Squared Error for {model}: {rmse}")
    print(f"Mean Absolute Error for {model}: {mae}")
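The metrics reported by calculate_metrics follow the standard definitions: RMSE = sqrt(mean((y - ŷ)²)), MAE = mean(|y - ŷ|) and R² = 1 - SS_res / SS_tot. A tiny, hand-checkable sketch compares these formulas with scikit-learn on three made-up points.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 4.0])

residuals = y_true - y_pred                   # [0, 0, -1]
rmse = np.sqrt(np.mean(residuals ** 2))       # sqrt(1/3)
mae = np.mean(np.abs(residuals))              # 1/3
ss_res = np.sum(residuals ** 2)               # 1
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # (1-2)^2 + 0 + (3-2)^2 = 2
r2 = 1 - ss_res / ss_tot                      # 1 - 1/2 = 0.5

print(rmse, mae, r2)
```

An R² of 0 corresponds to always predicting the mean of y_true, and a negative R² means the model does worse than that baseline.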


In [None]:
# split the data into training, testing and validation sets

with pd.option_context("mode.chained_assignment", None):
    train_data, test_data = train_test_split(
        energyUse_df, test_size=0.3, shuffle=True, random_state=42
    )
    test_data, val_data = train_test_split(
        test_data, test_size=0.5, shuffle=True, random_state=42
    )

# remove the target column from the data
X_train = train_data.drop(columns=["TARGET_energy", "date"])
y_train = train_data["TARGET_energy"]

X_test = test_data.drop(columns=["TARGET_energy", "date"])
y_test = test_data["TARGET_energy"]

X_val = val_data.drop(columns=["TARGET_energy", "date"])
y_val = val_data["TARGET_energy"]

# train data - used to train the model
# validation data - used to tune the hyperparameters
# test data - used to evaluate the final model

# print the shapes of the data
print("X_train shape: ", X_train.shape)
print("y_train shape: ", y_train.shape)
print("X_val shape: ", X_val.shape)
print("y_val shape: ", y_val.shape)
print("X_test shape: ", X_test.shape)
print("y_test shape: ", y_test.shape)

## Base Neural Network on Unscaled Data


In [None]:
# base model values
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 256  # this can be tuned later
OUTPUT_CLASSES = 1

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(HIDDEN_LAYER_DIM, activation="relu"),
        tf.keras.layers.Dense(OUTPUT_CLASSES, activation="linear"),
    ]
)

model.summary()
tf.keras.utils.plot_model(model, show_shapes=True)

# compile model
model.compile(
    optimizer="adam", loss="mean_squared_error", metrics=["mean_absolute_error"]
)

# train the model
history = model.fit(
    X_train, y_train, validation_data=(X_val, y_val), epochs=50, verbose=1
)

In [None]:
# plot learning curve
plot_learning_curve(
    history.history["loss"],
    history.history["val_loss"],
    history.history["mean_absolute_error"],
    history.history["val_mean_absolute_error"],
)
#plot residual plot
y_pred = model.predict(X_val)
y_pred = y_pred.flatten()
plot_residuals("Base Model", y_val, y_pred)

#scatter plot
scatter_plot(y_val, y_pred, "Base Model")

#calculate metrics
calculate_metrics("Base Model", y_val, y_pred)

#### Observations

- The model performs extremely poorly and needs significant improvement.
  - Overfitting is a major issue.


In [None]:
# Scale the data
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)
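StandardScaler transforms each feature as z = (x - μ) / σ, where μ and σ are the mean and standard deviation learned from the data it is fitted on (the population standard deviation, ddof = 0). This is why the scaler is fitted on X_train only and then reused for the validation and test sets. The sketch below checks the formula against the scaler on made-up numbers.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix with two columns on very different scales
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 600.0]])

scaler = StandardScaler().fit(X)

# np.std defaults to ddof=0, matching StandardScaler
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaler.transform(X), manual))  # True
```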


In [None]:

model2 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(HIDDEN_LAYER_DIM, activation="relu"),
        tf.keras.layers.Dense(OUTPUT_CLASSES),
    ]
)

model2.summary()
tf.keras.utils.plot_model(model2, show_shapes=True)


# compile model
model2.compile(
    optimizer="adam", loss="mean_squared_error", metrics=["mean_absolute_error"]
)

# train the model
history2 = model2.fit(
    X_train_scaled, y_train, validation_data=(X_val_scaled, y_val), epochs=50, verbose=1
)

In [None]:
# plot learning curve
plot_learning_curve(
    history2.history["loss"],
    history2.history["val_loss"],
    history2.history["mean_absolute_error"],
    history2.history["val_mean_absolute_error"],
)

#plot residual plot
y_pred = model2.predict(X_val_scaled)
y_pred = y_pred.flatten()
plot_residuals("Scaled Model", y_val, y_pred)

#scatter plot
scatter_plot(y_val, y_pred, "Scaled Model")

#calculate metrics
calculate_metrics("Scaled Model", y_val, y_pred)


- The model may still be overfitting.


In [None]:
# change batchsize

INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 256  # this can be tuned later
OUTPUT_CLASSES = 1
BATCH_SIZE = 64

# create model
model3 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(HIDDEN_LAYER_DIM, activation="relu"),
        tf.keras.layers.Dense(OUTPUT_CLASSES),
    ]
)

# compile model
model3.compile(
    optimizer="adam", loss="mean_squared_error", metrics=["mean_absolute_error"]
)

# train the model
history3 = model3.fit(
    X_train_scaled,
    y_train,
    validation_data=(X_val_scaled, y_val),
    epochs=50,
    verbose=1,
    batch_size=BATCH_SIZE,
)

In [None]:
# plot learning curve
plot_learning_curve(
    history3.history["loss"],
    history3.history["val_loss"],
    history3.history["mean_absolute_error"],
    history3.history["val_mean_absolute_error"],
)

# plot residual plot
y_pred = model3.predict(X_val_scaled)
y_pred = y_pred.flatten()
plot_residuals("Model 3", y_val, y_pred)

# scatter plot
scatter_plot(y_val, y_pred, "Model 3")

# calculate metrics
calculate_metrics("Model 3", y_val, y_pred)

The model is overfitting.


In [None]:
# try regularisation
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512  # this can be tuned later
OUTPUT_CLASSES = 1
BATCH_SIZE = 128

# create model
model4 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l1(0.01),
        ),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model4.compile(
    optimizer="adam", loss="mean_squared_error", metrics=["mean_absolute_error"]
)

# train the model
history5 = model4.fit(
    X_train_scaled,
    y_train,
    validation_data=(X_val_scaled, y_val),
    epochs=50,
    verbose=1,
    batch_size=BATCH_SIZE,
)

In [None]:
# plot learning curve
plot_learning_curve(
    history5.history["loss"],
    history5.history["val_loss"],
    history5.history["mean_absolute_error"],
    history5.history["val_mean_absolute_error"],
)

# plot residual plot
y_pred = model4.predict(X_val_scaled)
y_pred = y_pred.flatten()
plot_residuals("Model 4", y_val, y_pred)

# scatter plot
scatter_plot(y_val, y_pred, "Model 4")

# calculate metrics
calculate_metrics("Model 4", y_val, y_pred)

- The model is still overfitting.
- The standard scaler appears to work better than the min-max scaler.

In [None]:
# tune regularisation
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512  # this can be tuned later
OUTPUT_CLASSES = 1
BATCH_SIZE = 128
REGULARIZATIONFACTOR = 5

# create model
model5 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model5.compile(
    optimizer="adam",
    loss='mean_squared_error',
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history6 = model5.fit(
    X_train_scaled,
    y_train,
    validation_data=(X_val_scaled, y_val),
    epochs=50,
    verbose=1,
    batch_size=BATCH_SIZE,
)

In [None]:
# plot learning curve
plot_learning_curve(
    history6.history["loss"],
    history6.history["val_loss"],
    history6.history["mean_absolute_error"],
    history6.history["val_mean_absolute_error"],
)

# plot residual plot
y_pred = model5.predict(X_val_scaled)
y_pred = y_pred.flatten()
plot_residuals("Model 5", y_val, y_pred)

# scatter plot
scatter_plot(y_val, y_pred, "Model 5")

# calculate metrics
calculate_metrics("Model 5", y_val, y_pred)

- Performance is still not great, but the model no longer appears to overfit.
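For reference, Keras' L2 kernel regularizer adds factor × Σ w² over the layer's kernel weights to the training loss (the L1 regularizer used in model 4 adds factor × Σ |w| instead). A numpy-only sketch of that arithmetic follows; l2_penalty is a hypothetical helper and the weights are made up. With REGULARIZATIONFACTOR = 5 the penalty is 500 times larger than with 0.01 for the same weights, which pushes the weights strongly toward zero; this may explain why the overfitting disappears while overall accuracy stays modest. Because the reported training loss includes this penalty, the mean_absolute_error curves give a cleaner view of how well the model fits.

```python
import numpy as np


def l2_penalty(weights: np.ndarray, factor: float) -> float:
    """Penalty an L2 kernel regularizer adds to the loss: factor * sum(w**2)."""
    return float(factor * np.sum(np.square(weights)))


# Made-up kernel: sum of squares = 0.25 + 1 + 4 + 0 = 5.25
w = np.array([[0.5, -1.0], [2.0, 0.0]])

print(l2_penalty(w, 0.01))
print(l2_penalty(w, 5.0))  # 26.25
```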

# MODEL 6 - Scaled Data, L2 Regularisation, DropOut and Batch Normalization


In [None]:

# tune regularisation
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 256  # this can be tuned later
OUTPUT_CLASSES = 1
BATCH_SIZE = 64
REGULARIZATIONFACTOR = 0.01

# create model
model6 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(0.5),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model6.compile(
    optimizer="adam",
    loss="mean_squared_error",
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history7 = model6.fit(
    X_train_scaled,
    y_train,
    validation_data=(X_val_scaled, y_val),
    epochs=50,
    verbose=1,
    batch_size=BATCH_SIZE,
)
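
Model 6 adds `Dropout(0.5)` after the hidden layer. Keras uses inverted dropout: during training each activation is zeroed with probability `rate` and the surviving activations are scaled by `1 / (1 - rate)`, so the expected activation is unchanged and the layer simply passes its input through at inference. A minimal NumPy sketch of that behaviour (an illustration, not Keras' own code):

```python
import numpy as np

def inverted_dropout(x, rate, training, rng):
    """Zero each element with probability `rate` and rescale the rest by 1/(1-rate)."""
    if not training or rate == 0.0:
        return x  # identity at inference time
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(42)
activations = np.ones((1000, 256))
dropped = inverted_dropout(activations, rate=0.5, training=True, rng=rng)
print("fraction zeroed:", np.mean(dropped == 0))  # close to 0.5
print("mean activation:", dropped.mean())         # close to 1.0, so the expectation is preserved
```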

In [None]:
# plot learning curve
plot_learning_curve(
    history7.history["loss"],
    history7.history["val_loss"],
    history7.history["mean_absolute_error"],
    history7.history["val_mean_absolute_error"],
)



# residual plot
y_pred = model6.predict(X_val_scaled)
y_pred = y_pred.flatten()
plot_residuals("Model 6", y_val, y_pred)

# scatter plot
scatter_plot(y_val, y_pred, "Model 6")

# calculate metrics
calculate_metrics("Model 6", y_val, y_pred)

- Still not performing well, but better than the initial models.
- Next, we will try creating features that capture the time-series nature of the data.

In [None]:
# convert the date column to datetime
energyUse_df["date"] = pd.to_datetime(energyUse_df["date"], format="%Y-%m-%d %H:%M:%S")
# create time-based features
energyUse_df['hour'] = energyUse_df['date'].dt.hour
energyUse_df['day'] = energyUse_df['date'].dt.day
energyUse_df['month'] = energyUse_df['date'].dt.month
energyUse_df['year'] = energyUse_df['date'].dt.year

# drop rows with missing values
energyUse_df = energyUse_df.dropna()

# resplit the data
train_data, test_data = train_test_split(energyUse_df, test_size=0.3, shuffle=True, random_state=42)
test_data, val_data = train_test_split(test_data, test_size=0.5, shuffle=True, random_state=42)

# remove the target column from the data
X_train = train_data.drop(columns=["TARGET_energy", "date"])
y_train = train_data["TARGET_energy"]

X_test = test_data.drop(columns=["TARGET_energy", "date"])
y_test = test_data["TARGET_energy"]

X_val = val_data.drop(columns=["TARGET_energy", "date"])
y_val = val_data["TARGET_energy"]

# train data - used to train the model
# validation data - used to tune the hyperparameters
# test data - used to evaluate the final model

# print the shapes of the data
print("X_train shape: ", X_train.shape)
print("y_train shape: ", y_train.shape)
print("X_val shape: ", X_val.shape)
print("y_val shape: ", y_val.shape)
print("X_test shape: ", X_test.shape)
print("y_test shape: ", y_test.shape)
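
The two chained `train_test_split` calls produce a 70/15/15 split: the first call holds out 30% of the rows and the second halves that hold-out into test and validation sets. A quick check on a synthetic frame (the 1,000-row size is arbitrary) confirms the proportions and that the three sets do not overlap:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# synthetic stand-in for energyUse_df (row count chosen arbitrarily)
df = pd.DataFrame({"x": np.arange(1000), "TARGET_energy": np.arange(1000) % 50})

train_df, hold_out = train_test_split(df, test_size=0.3, shuffle=True, random_state=42)
test_df, val_df = train_test_split(hold_out, test_size=0.5, shuffle=True, random_state=42)

print(len(train_df), len(val_df), len(test_df))  # 700 150 150

# the index sets are disjoint, so no row is used for both training and evaluation
assert set(train_df.index).isdisjoint(val_df.index)
assert set(train_df.index).isdisjoint(test_df.index)
assert set(val_df.index).isdisjoint(test_df.index)
```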

In [None]:
# Scale the data
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

In [None]:
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 256  # this can be tuned later
OUTPUT_CLASSES = 1
BATCH_SIZE = 64
REGULARIZATIONFACTOR = 0.01

# create model
model7 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(0.5),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model7.compile(
    optimizer="adam",
    loss="mean_squared_error",
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history8 = model7.fit(
    X_train_scaled,
    y_train,
    validation_data=(X_val_scaled, y_val),
    epochs=50,
    verbose=1,
    batch_size=BATCH_SIZE,
)

In [None]:
#power transform data
scaler = PowerTransformer(method='yeo-johnson').fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_val_transformed = scaler.transform(X_val)
X_test_transformed = scaler.transform(X_test)



In [None]:
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512  # this can be tuned later
OUTPUT_CLASSES = 1
BATCH_SIZE = 128
REGULARIZATIONFACTOR = 0.05

# create model
model8 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(0.5),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(0.5),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model8.compile(
    optimizer="adam",
    loss="mean_squared_error",
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history9 = model8.fit(
    X_train_transformed,
    y_train,
    validation_data=(X_val_transformed, y_val),
    epochs=50,
    verbose=1,
    batch_size=BATCH_SIZE,
)

In [None]:
#plot learning curve
plot_learning_curve(
    history9.history["loss"],
    history9.history["val_loss"],
    history9.history["mean_absolute_error"],
    history9.history["val_mean_absolute_error"],
)

# plot residual plot
y_pred = model8.predict(X_val_transformed)
y_pred = y_pred.flatten()
plot_residuals("Model 8", y_val, y_pred)

#scatter plot
scatter_plot(y_val, y_pred, "Model 8")

#calculate metrics
calculate_metrics("Model 8", y_val, y_pred)


In [None]:
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 256
REGULARIZATIONFACTOR = 0.015
DROPOUT = 0.5
learningrate = 0.001
optimizer = Adam(learning_rate=learningrate)

# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=15)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.05, patience=10)

# create model
model9 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),

        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model9.compile(
    optimizer=optimizer,
    loss="mean_squared_error",
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history10 = model9.fit(
    X_train_scaled,
    y_train,
    validation_data=(X_val_scaled, y_val),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)

The model is overfitting, so further parameter tuning is required.

In [None]:
#Further tuning of model parameters
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 512
REGULARIZATIONFACTOR = 0.0275
DROPOUT = 0.55
learningrate = 0.0015
optimizer = Adam(learning_rate=learningrate)

# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=15)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.05, patience=15)

# create model
model10 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model10.compile(
    optimizer=optimizer,
    loss="mean_squared_error",
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history11 = model10.fit(
    X_train_transformed,
    y_train,
    validation_data=(X_val_transformed, y_val),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)
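
From the Model 10 cell onwards, `EarlyStopping` and `ReduceLROnPlateau` are given the same patience. Both callbacks count epochs without improvement in `val_loss`, so on a flat plateau they fire at the same epoch end and training stops before the reduced learning rate is ever used. The simulation below is a simplified re-implementation of only the wait-counter logic (not Keras' own code), assuming the defaults `min_delta=0` for `EarlyStopping` and `min_delta=1e-4`, `cooldown=0` for `ReduceLROnPlateau`, on a hypothetical loss curve:

```python
def simulate_callbacks(val_losses, es_patience, lr_patience, lr=0.0015,
                       factor=0.05, lr_min_delta=1e-4):
    """Simplified stand-in for the wait counters of EarlyStopping and ReduceLROnPlateau.

    Returns (stop epoch or None, epochs at which the LR was reduced, final LR).
    """
    es_best = lr_best = float("inf")
    es_wait = lr_wait = 0
    lr_reductions = []
    for epoch, loss in enumerate(val_losses):
        # ReduceLROnPlateau: an improvement must beat the best value by min_delta
        if loss < lr_best - lr_min_delta:
            lr_best, lr_wait = loss, 0
        else:
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr *= factor
                lr_reductions.append(epoch)
                lr_wait = 0
        # EarlyStopping: any decrease counts as an improvement (min_delta=0)
        if loss < es_best:
            es_best, es_wait = loss, 0
        else:
            es_wait += 1
            if es_wait >= es_patience:
                return epoch, lr_reductions, lr
    return None, lr_reductions, lr


# hypothetical curve: improves for three epochs, then plateaus
losses = [1.0, 0.9, 0.8] + [0.8] * 20

print(simulate_callbacks(losses, es_patience=15, lr_patience=15))  # LR cut and stop at the same epoch
print(simulate_callbacks(losses, es_patience=15, lr_patience=10))  # LR cut 5 epochs before stopping
```

Giving `ReduceLROnPlateau` a shorter patience than `EarlyStopping` (as in the Model 9 cell, 10 versus 15) lets the lower learning rate run for a few epochs before training stops.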

In [None]:
# r2 score
# Generate predictions
y_val_pred = model10.predict(X_val_transformed)
# Flatten predictions to 1D array
y_val_pred = y_val_pred.flatten()

#residual plot
plot_residuals("Model 10", y_val, y_val_pred)

#scatter plot
scatter_plot(y_val, y_val_pred, "Model 10")

#calculate metrics
calculate_metrics("Model 10", y_val, y_val_pred)


In [None]:
# drop RH_6
X_train = X_train.drop(columns=["RH_6"])
X_val = X_val.drop(columns=["RH_6"])
X_test = X_test.drop(columns=["RH_6"])
# power transform data
scaler = PowerTransformer(method="yeo-johnson").fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_val_transformed = scaler.transform(X_val)
X_test_transformed = scaler.transform(X_test)
#standard scale data
scaler = StandardScaler().fit(X_train_transformed)
X_train_transformedandscaled = scaler.transform(X_train_transformed)
X_val_transformedandscaled = scaler.transform(X_val_transformed)
X_test_transformedandscaled = scaler.transform(X_test_transformed)
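
scikit-learn's `PowerTransformer` standardises its output by default (`standardize=True`), so the training data already has zero mean and unit variance after the Yeo-Johnson step, and the extra `StandardScaler` fitted on `X_train_transformed` should be close to an identity transform. A quick check on synthetic right-skewed data (the exponential sample is an assumption, not the energy dataset):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, StandardScaler

rng = np.random.default_rng(0)
X = rng.exponential(scale=3.0, size=(500, 3))  # right-skewed synthetic features

pt = PowerTransformer(method="yeo-johnson")  # standardize=True is the default
X_pt = pt.fit_transform(X)
print("means:", X_pt.mean(axis=0).round(6))
print("stds: ", X_pt.std(axis=0).round(6))

# a StandardScaler fitted afterwards barely changes the training data
X_pt_scaled = StandardScaler().fit_transform(X_pt)
print("max change:", np.abs(X_pt_scaled - X_pt).max())
```

The second scaling step is therefore harmless but largely redundant; `PowerTransformer(method="yeo-johnson", standardize=False)` followed by `StandardScaler` would make the two stages explicit.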


In [None]:

INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 512
REGULARIZATIONFACTOR = 0.0275
DROPOUT = 0.55
learningrate = 0.0015
optimizer = Adam(learning_rate=learningrate)

# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=15)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.05, patience=15)

# create model
model11 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model11.compile(
    optimizer=optimizer,
    loss="mean_squared_error",
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history12 = model11.fit(
    X_train_transformedandscaled,
    y_train,
    validation_data=(X_val_transformedandscaled, y_val),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)

In [None]:
# r2 score
# Generate predictions (on the same power-transformed and scaled inputs used for training)
y_val_pred = model11.predict(X_val_transformedandscaled)
# Flatten predictions to 1D array
y_val_pred = y_val_pred.flatten()

# Calculate R2 score
r2 = r2_score(y_val, y_val_pred)

print(f"R2 score: {r2}")


plt.scatter(y_val, y_val_pred)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted Values")
plt.plot(
    [y_val.min(), y_val.max()], [y_val.min(), y_val.max()], color="red"
)  # y=x line
plt.show()

#residual plot
plot_residuals("Model 11", y_val, y_val_pred)

In [None]:
# drop T6
X_train = X_train.drop(columns=["T6"])
X_val = X_val.drop(columns=["T6"])
X_test = X_test.drop(columns=["T6"])
# power transform data
scaler = PowerTransformer(method="yeo-johnson").fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_val_transformed = scaler.transform(X_val)
X_test_transformed = scaler.transform(X_test)
# standard scale data
scaler = StandardScaler().fit(X_train_transformed)
X_train_transformedandscaled = scaler.transform(X_train_transformed)
X_val_transformedandscaled = scaler.transform(X_val_transformed)
X_test_transformedandscaled = scaler.transform(X_test_transformed)
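
The Yeo-Johnson transform and the standard scaling are fitted as two separate objects, which makes it easy for an evaluation cell to pass a model inputs that went through only one of the two steps. A sketch of chaining them in a single scikit-learn `Pipeline`, so every split is guaranteed the same preprocessing (`preprocess` is a hypothetical name and the gamma-distributed data is synthetic):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

rng = np.random.default_rng(1)
X_tr = rng.gamma(shape=2.0, scale=2.0, size=(400, 4))  # synthetic training features
X_va = rng.gamma(shape=2.0, scale=2.0, size=(100, 4))  # synthetic validation features

# hypothetical single preprocessing object: Yeo-Johnson, then standard scaling
preprocess = make_pipeline(PowerTransformer(method="yeo-johnson"), StandardScaler())
preprocess.fit(X_tr)  # both steps are fitted on the training data only

X_tr_ready = preprocess.transform(X_tr)
X_va_ready = preprocess.transform(X_va)  # identical steps for every split

# equivalent to the notebook's two-step version
pt = PowerTransformer(method="yeo-johnson").fit(X_tr)
ss = StandardScaler().fit(pt.transform(X_tr))
two_step_va = ss.transform(pt.transform(X_va))
print(np.allclose(X_va_ready, two_step_va))  # True
```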

In [None]:
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 512
REGULARIZATIONFACTOR = 0.0275
DROPOUT = 0.55
learningrate = 0.0015
optimizer = Adam(learning_rate=learningrate)

# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=15)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.05, patience=15)

# create model
model12 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model12.compile(
    optimizer=optimizer,
    loss="mean_squared_error",
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history13 = model12.fit(
    X_train_transformedandscaled,
    y_train,
    validation_data=(X_val_transformedandscaled, y_val),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)

In [None]:
# r2 score

# Generate predictions
y_val_pred = model12.predict(X_val_transformedandscaled)
# Flatten predictions to 1D array
y_val_pred = y_val_pred.flatten()

# Calculate R2 score
r2 = r2_score(y_val, y_val_pred)


print(f"R2 score: {r2}")
# Calculate RMSE
rmse = np.sqrt(mean_squared_error(y_val, y_val_pred))
print(f"RMSE: {rmse}")

# Calculate MAE
mae = mean_absolute_error(y_val, y_val_pred)
print(f"MAE: {mae}")

# Calculate MSE (loss)
mse = mean_squared_error(y_val, y_val_pred)
print(f"MSE: {mse}")


plt.scatter(y_val, y_val_pred)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted Values")
plt.plot(
    [y_val.min(), y_val.max()], [y_val.min(), y_val.max()], color="red"
)  # y=x line
plt.show()

# residual plot
plot_residuals("Model 12", y_val, y_val_pred)

In [None]:
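# add a column for the number of seconds since midnight
energyUse_df["date"] = pd.to_datetime(energyUse_df["date"])
energyUse_df["seconds_from_midnight"] = (
    energyUse_df["date"].dt.hour * 3600
    + energyUse_df["date"].dt.minute * 60
    + energyUse_df["date"].dt.second
)
# add a column for the day of the week (Monday=0, Sunday=6)
energyUse_df["day_of_week"] = energyUse_df["date"].dt.dayofweek

# re-split so the new columns are carried into the train/validation/test sets
# (same row count and random_state, so each row lands in the same split as before)
train_data, test_data = train_test_split(energyUse_df, test_size=0.3, shuffle=True, random_state=42)
test_data, val_data = train_test_split(test_data, test_size=0.5, shuffle=True, random_state=42)

X_train = train_data.drop(columns=["TARGET_energy", "date"])
y_train = train_data["TARGET_energy"]

X_test = test_data.drop(columns=["TARGET_energy", "date"])
y_test = test_data["TARGET_energy"]

X_val = val_data.drop(columns=["TARGET_energy", "date"])
y_val = val_data["TARGET_energy"]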

# power transform data
scaler = PowerTransformer(method="yeo-johnson").fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_val_transformed = scaler.transform(X_val)
X_test_transformed = scaler.transform(X_test)
# standard scale data
scaler = StandardScaler().fit(X_train_transformed)
X_train_transformedandscaled = scaler.transform(X_train_transformed)
X_val_transformedandscaled = scaler.transform(X_val_transformed)
X_test_transformedandscaled = scaler.transform(X_test_transformed)

# train data - used to train the model
# validation data - used to tune the hyperparameters
# test data - used to evaluate the final model

# print the shapes of the data
print("X_train shape: ", X_train.shape)
print("y_train shape: ", y_train.shape)
print("X_val shape: ", X_val.shape)
print("y_val shape: ", y_val.shape)
print("X_test shape: ", X_test.shape)
print("y_test shape: ", y_test.shape)
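
The `seconds_from_midnight` feature is the time of day in seconds, and `dt.dayofweek` numbers the days from Monday (0) to Sunday (6). The sketch below checks the hour/minute/second arithmetic against an equivalent formulation that subtracts each day's midnight (`dt.normalize()`), using two made-up timestamps:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2016-01-11 17:00:00", "2016-01-16 06:30:10"]))

seconds = dates.dt.hour * 3600 + dates.dt.minute * 60 + dates.dt.second
# same quantity: elapsed time since that day's midnight
seconds_alt = (dates - dates.dt.normalize()).dt.total_seconds()

day_of_week = dates.dt.dayofweek             # Monday=0 ... Sunday=6
is_weekend = (day_of_week >= 5).astype(int)  # Saturday or Sunday

print(seconds.tolist())      # [61200, 23410]
print(day_of_week.tolist())  # [0, 5]
print(is_weekend.tolist())   # [0, 1]
```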

In [None]:
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 512
REGULARIZATIONFACTOR = 0.0275
DROPOUT = 0.55
learningrate = 0.0015
optimizer = Adam(learning_rate=learningrate)

# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=15)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.05, patience=15)

# create model
model13 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model13.compile(
    optimizer=optimizer,
    loss="mean_squared_error",
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history14 = model13.fit(
    X_train_transformedandscaled,
    y_train,
    validation_data=(X_val_transformedandscaled, y_val),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)

# r2 score
# Generate predictions (on the same power-transformed and scaled inputs used for training)
y_val_pred = model13.predict(X_val_transformedandscaled)
# Flatten predictions to 1D array
y_val_pred = y_val_pred.flatten()

# Calculate R2 score
r2 = r2_score(y_val, y_val_pred)

print(f"R2 score: {r2}")


plt.scatter(y_val, y_val_pred)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted Values")
plt.plot(
    [y_val.min(), y_val.max()], [y_val.min(), y_val.max()], color="red"
)  # y=x line
plt.show()

# residual plot
plot_residuals("Model 13", y_val, y_val_pred)

In [None]:
# drop hour, day, month and year
X_train = X_train.drop(columns=["hour", "day", "month", "year"])
X_val = X_val.drop(columns=["hour", "day", "month", "year"])
X_test = X_test.drop(columns=["hour", "day", "month", "year"])
# power transform data
scaler = PowerTransformer(method="yeo-johnson").fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_val_transformed = scaler.transform(X_val)
X_test_transformed = scaler.transform(X_test)
# standard scale data
scaler = StandardScaler().fit(X_train_transformed)
X_train_transformedandscaled = scaler.transform(X_train_transformed)
X_val_transformedandscaled = scaler.transform(X_val_transformed)
X_test_transformedandscaled = scaler.transform(X_test_transformed)


In [None]:
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 512
REGULARIZATIONFACTOR = 0.0275
DROPOUT = 0.55
learningrate = 0.0015
optimizer = Adam(learning_rate=learningrate)

# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=15)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.05, patience=15)

# create model
model14 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model14.compile(
    optimizer=optimizer,
    loss="mean_squared_error",
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history15 = model14.fit(
    X_train_transformedandscaled,
    y_train,
    validation_data=(X_val_transformedandscaled, y_val),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)

# r2 score
# Generate predictions (on the same power-transformed and scaled inputs used for training)
y_val_pred = model14.predict(X_val_transformedandscaled)
# Flatten predictions to 1D array
y_val_pred = y_val_pred.flatten()

# Calculate R2 score
r2 = r2_score(y_val, y_val_pred)

print(f"R2 score: {r2}")


plt.scatter(y_val, y_val_pred)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted Values")
plt.plot(
    [y_val.min(), y_val.max()], [y_val.min(), y_val.max()], color="red"
)  # y=x line
plt.show()

# residual plot
plot_residuals("Model 14", y_val, y_val_pred)

In [None]:
# add a column flagging weekend days (day_of_week 5 = Saturday, 6 = Sunday)
X_train["is_weekend"] = (X_train["day_of_week"] >= 5).astype(int)
X_val["is_weekend"] = (X_val["day_of_week"] >= 5).astype(int)
X_test["is_weekend"] = (X_test["day_of_week"] >= 5).astype(int)

# power transform data
scaler = PowerTransformer(method="yeo-johnson").fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_val_transformed = scaler.transform(X_val)
X_test_transformed = scaler.transform(X_test)
# standard scale data
scaler = StandardScaler().fit(X_train_transformed)
X_train_transformedandscaled = scaler.transform(X_train_transformed)
X_val_transformedandscaled = scaler.transform(X_val_transformed)
X_test_transformedandscaled = scaler.transform(X_test_transformed)

#shape of the data
print("X_train shape: ", X_train.shape)

In [None]:
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 512
REGULARIZATIONFACTOR = 0.0275
DROPOUT = 0.55
learningrate = 0.0015
optimizer = Adam(learning_rate=learningrate)

# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=15)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.05, patience=15)

# create model
model15 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model15.compile(
    optimizer=optimizer,
    loss="mean_squared_error",
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history16 = model15.fit(
    X_train_transformedandscaled,
    y_train,
    validation_data=(X_val_transformedandscaled, y_val),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)

# r2 score
# Generate predictions
y_val_pred = model15.predict(X_val_transformedandscaled)
# Flatten predictions to 1D array
y_val_pred = y_val_pred.flatten()

# Calculate R2 score
r2 = r2_score(y_val, y_val_pred)

print(f"R2 score: {r2}")


plt.scatter(y_val, y_val_pred)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted Values")
plt.plot(
    [y_val.min(), y_val.max()], [y_val.min(), y_val.max()], color="red"
)  # y=x line
plt.show()

# residual plot
plot_residuals("Model 15", y_val, y_val_pred)

In [None]:
#get metrics
calculate_metrics("Model 15", y_val, y_val_pred)

## Ensemble Learning

In [None]:
from IPython.display import display

# change date column to datetime
energyUse_df["date"] = pd.to_datetime(energyUse_df["date"])

# add seconds from midnight column
energyUse_df["seconds_from_midnight"] = (
    energyUse_df["date"].dt.hour * 3600
    + energyUse_df["date"].dt.minute * 60
    + energyUse_df["date"].dt.second
)

# add day_of_week column (Monday=0, Sunday=6)
energyUse_df["day_of_week"] = energyUse_df["date"].dt.dayofweek

# add is_weekend column
energyUse_df["is_weekend"] = (energyUse_df["day_of_week"] >= 5).astype(int)

# describe the data and show the first 5 rows
# (a notebook cell only renders its last expression, so describe() is passed to display())
display(energyUse_df.describe())
energyUse_df.head()

In [None]:

# drop rows with missing values
energyUse_df = energyUse_df.dropna()

# split the data
train_data, test_data = train_test_split(energyUse_df, test_size=0.3, shuffle=True, random_state=42)
test_data, val_data = train_test_split(test_data, test_size=0.5, shuffle=True, random_state=42)

# rebalance the training set by manually duplicating the instances with high target values
# find the instances where the target is greater than 100
high_values = train_data[train_data["TARGET_energy"] > 100]
print("Number of instances with target greater than 100: ", high_values.shape[0])


# upsample the high values in the training set only; appending these training rows to the
# validation and test sets would leak training data into evaluation and inflate the scores
train_data = pd.concat([train_data, high_values], ignore_index=True)


# separate the features from the target (and drop the date column)
X_train = train_data.drop(columns=["TARGET_energy", "date"])
y_train = train_data["TARGET_energy"]
X_val = val_data.drop(columns=["TARGET_energy", "date"])
y_val = val_data["TARGET_energy"]
X_test = test_data.drop(columns=["TARGET_energy", "date"])
y_test = test_data["TARGET_energy"]

# drop the random variable columns
X_train = X_train.drop(columns=["rv1", "rv2"])
X_val = X_val.drop(columns=["rv1", "rv2"])
X_test = X_test.drop(columns=["rv1", "rv2"])

# drop RH_6 and T6
X_train = X_train.drop(columns=["RH_6", "T6"])
X_val = X_val.drop(columns=["RH_6", "T6"])
X_test = X_test.drop(columns=["RH_6", "T6"])
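
Oversampling is only safe on the training split: if duplicated training rows also appear in the validation or test sets, the model is partly evaluated on rows it has already seen, which inflates the scores. A sketch on synthetic data (the threshold of 100 mirrors the notebook; the data itself is made up) that duplicates high-target rows in the training set only and confirms the evaluation sets share no rows with it:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "feature": rng.normal(size=1000),
    "TARGET_energy": rng.exponential(scale=60.0, size=1000),  # right-skewed synthetic target
})

train_df, hold_out = train_test_split(df, test_size=0.3, shuffle=True, random_state=42)
test_df, val_df = train_test_split(hold_out, test_size=0.5, shuffle=True, random_state=42)

# duplicate the high-target rows in the training set only
high = train_df[train_df["TARGET_energy"] > 100]
train_up = pd.concat([train_df, high])  # original index kept so overlaps can be checked

print("training rows before/after:", len(train_df), len(train_up))
print("share of high targets in training:",
      round((train_df["TARGET_energy"] > 100).mean(), 3), "->",
      round((train_up["TARGET_energy"] > 100).mean(), 3))

# no training row (original or duplicate) appears in validation or test
assert set(train_up.index).isdisjoint(val_df.index)
assert set(train_up.index).isdisjoint(test_df.index)
```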



In [None]:
# power transform the data, then standard scale it
scaler = PowerTransformer(method="yeo-johnson").fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_val_transformed = scaler.transform(X_val)
X_test_transformed = scaler.transform(X_test)
scaler = StandardScaler().fit(X_train_transformed)
X_train_transformedandscaled = scaler.transform(X_train_transformed)
X_val_transformedandscaled = scaler.transform(X_val_transformed)
X_test_transformedandscaled = scaler.transform(X_test_transformed)

In [None]:
# weighted loss function
def weighted_loss(y_true, y_pred):
    y_true_float = tf.cast(y_true, tf.float32)
    weights = tf.sqrt(y_true_float) + 1
    return tf.reduce_sum(weights * tf.square(y_true_float - y_pred))
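
The custom loss weights each squared error by `sqrt(y_true) + 1`, so errors on high-consumption readings cost more. The NumPy re-implementation below makes the weighting concrete on three hand-picked values. Note that the notebook's version uses `tf.reduce_sum`, so its value grows with the batch size (and so does its size relative to the L2 penalties); a batch mean (`tf.reduce_mean`) is the more common convention for a weighted MSE. Both reductions are shown:

```python
import numpy as np

def weighted_squared_error(y_true, y_pred, reduce="sum"):
    """NumPy version of weighted_loss: (sqrt(y) + 1) * (y - y_hat)**2, summed or averaged."""
    y_true = np.asarray(y_true, dtype=np.float32)
    y_pred = np.asarray(y_pred, dtype=np.float32)
    weights = np.sqrt(y_true) + 1
    per_sample = weights * np.square(y_true - y_pred)
    return per_sample.sum() if reduce == "sum" else per_sample.mean()

y_true = [0.0, 4.0, 100.0]
y_pred = [1.0, 4.0, 90.0]
# weights = [1, 3, 11]; squared errors = [1, 0, 100]; weighted = [1, 0, 1100]
print(weighted_squared_error(y_true, y_pred, "sum"))   # 1101.0
print(weighted_squared_error(y_true, y_pred, "mean"))  # 367.0
```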

In [None]:
# copy of models - so I don't have to keep scrolling
# model ensemble 1
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 256
REGULARIZATIONFACTOR = 0.03
DROPOUT = 0.50
learningrate = 0.002
DELTA = 1.0  # Huber delta; not used by weighted_loss in this cell
optimizer = Adam(learning_rate=learningrate)



# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=20)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.05, patience=20)


# create model
model_ensemble_1 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model_ensemble_1.compile(
    optimizer=optimizer,
    loss=weighted_loss,
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history_ensemble_1 = model_ensemble_1.fit(
    X_train_transformedandscaled,
    y_train,
    # sample_weight=sample_weights,
    validation_data=(X_val_transformedandscaled, y_val),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)

In [None]:
# get predictions
y_val_pred = model_ensemble_1.predict(X_val_transformedandscaled)
# flatten predictions
y_val_pred = y_val_pred.flatten()

plot_residuals("Model Ensemble 1", y_val, y_val_pred)

# get scores
# Calculate R2 score
r2 = r2_score(y_val, y_val_pred)
print(f"R2 score: {r2}")
# Calculate RMSE
rmse = np.sqrt(mean_squared_error(y_val, y_val_pred))
print(f"RMSE: {rmse}")
# Calculate MAE
mae = mean_absolute_error(y_val, y_val_pred)
print(f"MAE: {mae}")
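
For reference, the scores printed above follow the usual definitions: RMSE is the square root of the mean squared error, MAE is the mean absolute error, and R² is `1 - SS_res / SS_tot`. A small NumPy check against scikit-learn on made-up values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([50.0, 60.0, 100.0, 400.0])
y_hat = np.array([55.0, 58.0, 120.0, 300.0])

err = y_true - y_hat
rmse = np.sqrt(np.mean(err ** 2))
mae = np.mean(np.abs(err))
r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(rmse, mae, r2)
```

Here the single 400 Wh reading contributes 10,000 of the 10,429 total squared error, which is why RMSE (about 51) is much larger than MAE (31.75) when a few high-consumption readings are badly predicted.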

In [None]:
# model 15 (redefines model15 and history16 from earlier: four hidden layers and Huber loss)
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 256
REGULARIZATIONFACTOR = 0.035
DROPOUT = 0.50
learningrate = 0.0015
optimizer = Adam(learning_rate=learningrate)

# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=15)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.07, patience=15)

# create model
model15 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model15.compile(
    optimizer=optimizer,
    loss=Huber(),
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history16 = model15.fit(
    X_train_transformedandscaled,
    y_train,
    validation_data=(X_val_transformedandscaled, y_val),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)
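Model 15 replaces the weighted loss with Keras's `Huber()` loss, whose default `delta` is 1.0. The NumPy sketch below reproduces the per-sample formula, which is quadratic for small errors and linear for large ones, and averages it over the batch. It illustrates the formula and is not the Keras source.

```python
import numpy as np


def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5*e**2 where |e| <= delta, else delta*(|e| - 0.5*delta)."""
    error = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    quadratic = 0.5 * error ** 2
    linear = delta * (error - 0.5 * delta)
    return float(np.mean(np.where(error <= delta, quadratic, linear)))


# a 0.5 Wh error is penalised quadratically, a 2 Wh error only linearly
print(huber([10.0, 50.0], [10.5, 52.0]))
```

Because the targets are in Wh and span tens to hundreds of units, most residuals are likely to exceed `delta=1.0`. With that setting the loss will behave much like MAE, while a larger `delta` would keep more errors in the quadratic region.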

In [None]:
# plot residual plot
# get predictions
y_val_pred = model15.predict(X_val_transformedandscaled)
# flatten predictions
y_val_pred = y_val_pred.flatten()
plot_residuals("Model 15", y_val, y_val_pred)

# calculate metrics
calculate_metrics("Model 15", y_val, y_val_pred)

In [None]:
# model 16
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 256
REGULARIZATIONFACTOR = 0.03
DROPOUT = 0.50
learningrate = 0.0015
optimizer = Adam(learning_rate=learningrate)

# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=15)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.07, patience=15)

# create model
model16 = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model16.compile(
    optimizer=optimizer,
    loss=weighted_loss,
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history17 = model16.fit(
    X_train_transformedandscaled,
    y_train,
    validation_data=(X_val_transformedandscaled, y_val),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)
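Every network in this section pairs `EarlyStopping(patience=15)` with `ReduceLROnPlateau(patience=15)`. The simplified simulation below re-implements only the two patience counters in plain Python on a made-up validation-loss curve. It ignores `cooldown` and `min_lr`, and it assumes the Keras defaults `min_delta=0` for early stopping and `min_delta=1e-4` for the learning-rate schedule. On the plateau shown here, the learning rate is cut in the same epoch that training stops, so the reduced rate is never used.

```python
def simulate_plateau_callbacks(val_losses, lr=0.0015, patience=15, factor=0.07,
                               es_min_delta=0.0, lr_min_delta=1e-4):
    """Return (stop_epoch, final_lr, lr_changes) for a sequence of val_loss values.

    Simplified model of the EarlyStopping and ReduceLROnPlateau patience
    counters; cooldown and min_lr are ignored.
    """
    es_best = lr_best = float("inf")
    es_wait = lr_wait = 0
    lr_changes = []  # epochs at which the learning rate was reduced
    for epoch, loss in enumerate(val_losses):
        # ReduceLROnPlateau: reset on improvement, otherwise count towards patience
        if loss < lr_best - lr_min_delta:
            lr_best, lr_wait = loss, 0
        else:
            lr_wait += 1
            if lr_wait >= patience:
                lr *= factor
                lr_changes.append(epoch)
                lr_wait = 0
        # EarlyStopping: same counter, but training ends when patience runs out
        if loss < es_best - es_min_delta:
            es_best, es_wait = loss, 0
        else:
            es_wait += 1
            if es_wait >= patience:
                return epoch, lr, lr_changes
    return len(val_losses) - 1, lr, lr_changes


# hypothetical curve: improves for 3 epochs, then plateaus
losses = [1.0, 0.9, 0.8] + [0.8] * 20
stop_epoch, final_lr, lr_changes = simulate_plateau_callbacks(losses)
print(stop_epoch, final_lr, lr_changes)
```

If the schedule is meant to take effect, giving `ReduceLROnPlateau` a smaller patience than `EarlyStopping` (for example 5 versus 15) would let the reduced rate run for several epochs before training stops.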

In [None]:
# plot residual plot
# get predictions
y_val_pred = model16.predict(X_val_transformedandscaled)
# flatten predictions
y_val_pred = y_val_pred.flatten()
plot_residuals("Model 16", y_val, y_val_pred)

# calculate metrics
calculate_metrics("Model 16", y_val, y_val_pred)
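Each hidden `Dense` layer above uses `kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR)`. The Keras L2 regularizer adds `factor * sum(W**2)` for each kernel to the training loss, with no 1/2 factor. The NumPy sketch below computes that penalty for a list of hypothetical weight matrices, which helps when judging how large a factor such as 0.03 is relative to the data term of the loss.

```python
import numpy as np


def l2_penalty(kernels, factor):
    """Total penalty factor * sum(W**2), summed over every regularised kernel."""
    return float(sum(factor * np.sum(np.square(w)) for w in kernels))


# two small hypothetical kernels; biases are left out because the models
# above only set kernel_regularizer
kernels = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[0.5], [-0.5]])]
print(l2_penalty(kernels, factor=0.03))
```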


In [None]:
# model 21 - decision tree regressor
model21 = DecisionTreeRegressor(max_depth=30, min_samples_split=10, min_samples_leaf=10)

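# weight each sample by the square of its target value
weights = np.power(y_train, 2)

model21.fit(X_train_transformedandscaled, y_train, sample_weight=weights)

# predict on the validation set with the decision tree
y_val_pred = model21.predict(X_val_transformedandscaled)

# plot residuals
plot_residuals("Model 21", y_val, y_val_pred)

# calculate metrics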
calculate_metrics("Model 21", y_val, y_val_pred)
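Model 21 passes `sample_weight=y_train**2` to `DecisionTreeRegressor`. With the default squared-error criterion, each leaf predicts the weighted mean of the training targets that fall into it, so squaring the target pulls leaf values strongly towards the high-usage samples. The short NumPy sketch below shows this for one hypothetical leaf containing a 10 Wh sample and a 100 Wh sample.

```python
import numpy as np


def weighted_leaf_value(y_leaf, weights):
    """Prediction of a squared-error leaf: the weighted mean of its targets."""
    return float(np.average(y_leaf, weights=weights))


y_leaf = np.array([10.0, 100.0])
print(weighted_leaf_value(y_leaf, np.ones_like(y_leaf)))  # unweighted leaf
print(weighted_leaf_value(y_leaf, np.power(y_leaf, 2)))   # weights = y**2, as in model 21
```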





In [None]:
# model 22 - ensemble of two neural networks (sub-model for low target values)
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 256
REGULARIZATIONFACTOR = 0.03
DROPOUT = 0.50
learningrate = 0.002
DELTA = 1.0
optimizer = Adam(learning_rate=learningrate)

# target threshold (Wh) separating the low- and high-usage sub-models
threshold = 80

# Split the training data into two subsets
X_train_low = X_train_transformedandscaled[y_train <= threshold]
y_train_low = y_train[y_train <= threshold]
X_train_high = X_train_transformedandscaled[y_train > threshold]
y_train_high = y_train[y_train > threshold]
# Split the validation data into two subsets
X_val_low = X_val_transformedandscaled[y_val <= threshold]
y_val_low = y_val[y_val <= threshold]
X_val_high = X_val_transformedandscaled[y_val > threshold]
y_val_high = y_val[y_val > threshold]

# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=15)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.05, patience=15)


# create model_low_values
model_low_values = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model_low_values.compile(
    optimizer=optimizer,
    loss=weighted_loss,
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history_low_values = model_low_values.fit(
    X_train_low,
    y_train_low,
    # sample_weight=sample_weights,
    validation_data=(X_val_low, y_val_low),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)


In [None]:
# model 22 (continued) - sub-model for high target values
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 256
REGULARIZATIONFACTOR = 0.03
DROPOUT = 0.50
learningrate = 0.002
DELTA = 1.0
optimizer = Adam(learning_rate=learningrate)
# create model_high_values
model_high_values = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model_high_values.compile(
    optimizer=optimizer,
    loss=weighted_loss,
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history_high_values = model_high_values.fit(
    X_train_high,
    y_train_high,
    # sample_weight=sample_weights,
    validation_data=(X_val_high, y_val_high),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)

In [None]:
print(X_val_low.shape)
print(X_val_high.shape)

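# Use the appropriate model to make predictions for each subset
# (the subsets were split on the true target y_val, so this routing is not available for unseen data)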
y_val_pred_low = model_low_values.predict(X_val_low).flatten()
y_val_pred_high = model_high_values.predict(X_val_high).flatten()

# Concatenate the predictions and the true values
y_val_pred = np.concatenate([y_val_pred_low, y_val_pred_high])
y_val_true = np.concatenate([y_val[y_val <= threshold], y_val[y_val > threshold]])

# Flatten the predictions
y_val_pred = y_val_pred.flatten()
# Calculate R2 score
r2 = r2_score(y_val_true, y_val_pred)
print(f"R2 score: {r2}")
# Calculate RMSE
rmse = np.sqrt(mean_squared_error(y_val_true, y_val_pred))
print(f"RMSE: {rmse}")
# plot residuals
plot_residuals("Model 22", y_val_true, y_val_pred)

# scatter plot of actual vs predicted values
plt.scatter(y_val_true, y_val_pred)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted Values")
plt.plot(
    [y_val_true.min(), y_val_true.max()], [y_val_true.min(), y_val_true.max()], color="red"
)  # y=x line
plt.show()
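The cell above concatenates the low and high predictions, which reorders the rows, so the true values have to be concatenated in the same order. An alternative sketch keeps the original row order with `np.where`. The boolean `is_high` mask is an assumption here. It can come from any routing rule, but to score unseen data it must be computed without using the true target.

```python
import numpy as np


def combine_routed_predictions(pred_low, pred_high, is_high):
    """Pick each row's prediction from the sub-model it was routed to, keeping row order."""
    is_high = np.asarray(is_high, dtype=bool)
    return np.where(is_high, np.asarray(pred_high).ravel(), np.asarray(pred_low).ravel())


# hypothetical example: both sub-models predict every row, the mask picks one
pred_low = np.array([40.0, 55.0, 60.0])
pred_high = np.array([150.0, 210.0, 180.0])
is_high = np.array([False, True, False])
print(combine_routed_predictions(pred_low, pred_high, is_high))
```

With the notebook's models, this would mean calling `model_low_values.predict` and `model_high_values.predict` on all of `X_val_transformedandscaled` and comparing the combined result directly with `y_val`.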


In [None]:
# list of models
models = [model16, model15, model14, model21]


# stacking models
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 256
learningrate = 0.0015
optimizer = Adam(learning_rate=learningrate)

# create meta features for training data: one column per base model
meta_features = np.column_stack(
    [model.predict(X_train_transformedandscaled).reshape(-1, 1) for model in models]
    + [
        model_low_values.predict(X_train_transformedandscaled).reshape(-1, 1),
        model_high_values.predict(X_train_transformedandscaled).reshape(-1, 1),
    ]
)

# create meta features for validation data
meta_val_features = np.column_stack(
    [model.predict(X_val_transformedandscaled).reshape(-1, 1) for model in models]
    + [
        model_low_values.predict(X_val_transformedandscaled).reshape(-1, 1),
        model_high_values.predict(X_val_transformedandscaled).reshape(-1, 1),
    ]
)


meta_model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(meta_features.shape[1],)),  # one input per base model (4 + 2 = 6)
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_regularizer=regularizers.l2(0.5),  # L2 regularization
        ),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_regularizer=regularizers.l2(0.5),  # L2 regularization
        ),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_regularizer=regularizers.l2(0.5),  # L2 regularization
        ),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)
# Compile the meta-model
meta_model.compile(
    loss=weighted_loss,
    optimizer=optimizer,
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history_meta = meta_model.fit(
    meta_features,
    y_train,
    validation_data=(meta_val_features, y_val),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
)


# Make final predictions
final_predictions = meta_model.predict(meta_val_features)

# Calculate and print scores
r2 = r2_score(y_val, final_predictions)
rmse = np.sqrt(mean_squared_error(y_val, final_predictions))
mae = mean_absolute_error(y_val, final_predictions)
print(f"R2 score: {r2}")
print(f"RMSE: {rmse}")
print(f"MAE: {mae}")

# Create scatter and residual plots
plt.scatter(y_val, final_predictions)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted Values")
plt.plot(
    [y_val.min(), y_val.max()], [y_val.min(), y_val.max()], color="red"
)  # y=x line
plt.show()

# flatten predictions
final_predictions_flat = final_predictions.flatten()
# Plot residuals
plot_residuals("Model Ensemble", y_val, final_predictions_flat)
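The meta-model above is trained on predictions that the base models make for their own training rows. Those predictions are typically more accurate than predictions on unseen rows, so the meta-model can learn to trust them too much. A common alternative, which is a different technique from the one used here, builds the meta-features from out-of-fold predictions. The sketch below shows the idea with NumPy and a least-squares base learner standing in for the notebook's networks. `fit_linear`, `predict_linear` and `out_of_fold_predictions` are hypothetical helpers.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept column (stand-in base model)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def out_of_fold_predictions(X, y, n_folds=5, seed=0):
    """Predict each row with a model that never saw that row during fitting."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.empty(len(X))
    for held_out in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), held_out)
        coef = fit_linear(X[train_idx], y[train_idx])
        oof[held_out] = predict_linear(coef, X[held_out])
    return oof


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 4.0  # noiseless hypothetical target
oof = out_of_fold_predictions(X, y)
meta_features_oof = np.column_stack([oof])  # one column per base model
print(meta_features_oof.shape)
```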

In [None]:
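# weighted loss function (copy of the earlier definition)
# each squared error is weighted by sqrt(y_true) + 1, so high-usage samples count more;
# the result is summed over the batch rather than averaged
def weighted_loss(y_true, y_pred):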
    y_true_float = tf.cast(y_true, tf.float32)
    weights = tf.sqrt(y_true_float) + 1
    return tf.reduce_sum(weights * tf.square(y_true_float - y_pred))
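To see what `weighted_loss` rewards, the NumPy mirror below applies the same per-sample weight, sqrt(y) + 1, to the squared errors. `weighted_loss_np` is a hypothetical helper and is not used by the models. Because the function uses `tf.reduce_sum` rather than `tf.reduce_mean`, its value grows with the number of samples in a batch. Changing `BATCH_SIZE` therefore also changes the scale of the loss relative to the L2 penalties.

```python
import numpy as np


def weighted_loss_np(y_true, y_pred):
    """NumPy mirror of weighted_loss: sum of (sqrt(y) + 1) * (y - y_hat)**2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.sqrt(y_true) + 1.0
    return float(np.sum(weights * np.square(y_true - y_pred)))


# the same 10 Wh error on a low-usage and a high-usage sample
print(weighted_loss_np([20.0], [30.0]))
print(weighted_loss_np([400.0], [410.0]))
# duplicating the batch doubles the summed loss
print(weighted_loss_np([20.0, 20.0], [30.0, 30.0]))
```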

In [None]:
# Best Model - model 22

# model 22 - two neural networks, one trained on low and one on high target values
INPUT_DIM = X_train.shape[1]
HIDDEN_LAYER_DIM = 512
OUTPUT_CLASSES = 1
BATCH_SIZE = 256
REGULARIZATIONFACTOR = 0.03
DROPOUT = 0.50
learningrate = 0.002
DELTA = 1.0
optimizer = Adam(learning_rate=learningrate)

# target threshold (Wh) separating the low- and high-usage sub-models
threshold = 100

# Split the training data into two subsets
X_train_low = X_train_transformedandscaled[y_train <= threshold]
y_train_low = y_train[y_train <= threshold]
X_train_high = X_train_transformedandscaled[y_train > threshold]
y_train_high = y_train[y_train > threshold]

# Split the validation data into two subsets
X_val_low = X_val_transformedandscaled[y_val <= threshold]
y_val_low = y_val[y_val <= threshold]
X_val_high = X_val_transformedandscaled[y_val > threshold]
y_val_high = y_val[y_val > threshold]

# Define the callbacks
early_stopping = EarlyStopping(monitor="val_loss", patience=15)
reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.05, patience=15)


# create model_low_values
model_low_values = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model_low_values.compile(
    optimizer=optimizer,
    loss=weighted_loss,
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history_low_values = model_low_values.fit(
    X_train_low,
    y_train_low,
    validation_data=(X_val_low, y_val_low),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)

# create model_high_values
model_high_values = tf.keras.models.Sequential(
    [
        tf.keras.layers.Input(shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(
            HIDDEN_LAYER_DIM,
            activation="relu",
            kernel_initializer="he_normal",
            kernel_regularizer=regularizers.l2(REGULARIZATIONFACTOR),
        ),
        Dropout(DROPOUT),
        BatchNormalization(),
        tf.keras.layers.Dense(OUTPUT_CLASSES, kernel_initializer="he_normal"),
    ]
)

# compile model
model_high_values.compile(
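    # use a fresh optimizer instance; sharing one Adam object between two models can fail in recent TensorFlow versions
    optimizer=Adam(learning_rate=learningrate),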
    loss=weighted_loss,
    metrics=["mean_absolute_error", tf.keras.metrics.RootMeanSquaredError()],
)

# train the model
history_high_values = model_high_values.fit(
    X_train_high,
    y_train_high,
    validation_data=(X_val_high, y_val_high),
    epochs=100,
    verbose=2,
    batch_size=BATCH_SIZE,
    callbacks=[early_stopping, reduce_lr],
)

# Use the appropriate model to make predictions for each subset
y_val_pred_low = model_low_values.predict(X_val_low).flatten()
y_val_pred_high = model_high_values.predict(X_val_high).flatten()

# Concatenate the predictions and the true values
y_val_pred = np.concatenate([y_val_pred_low, y_val_pred_high])
y_val_true = np.concatenate([y_val[y_val <= threshold], y_val[y_val > threshold]])

# Flatten the predictions
y_val_pred = y_val_pred.flatten()
# calculate metrics
calculate_metrics("Model 22", y_val_true, y_val_pred)

# plot residuals
plot_residuals("Model 22", y_val_true, y_val_pred)

# scatter plot of actual vs predicted values
scatter_plot(y_val_true, y_val_pred, "Best Model")

In [None]:
# best model on test data
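# split test data by the true target (this routing uses y_test itself, so it cannot be applied to new, unlabelled data)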
X_test_low = X_test_transformedandscaled[y_test <= threshold]
y_test_low = y_test[y_test <= threshold]
X_test_high = X_test_transformedandscaled[y_test > threshold]
y_test_high = y_test[y_test > threshold]

# Use the appropriate model to make predictions for each subset
y_test_pred_low = model_low_values.predict(X_test_low).flatten()
y_test_pred_high = model_high_values.predict(X_test_high).flatten()

# Concatenate the predictions and the true values
y_test_pred = np.concatenate([y_test_pred_low, y_test_pred_high])
y_test_true = np.concatenate([y_test[y_test <= threshold], y_test[y_test > threshold]])

# Flatten the predictions
y_test_pred = y_test_pred.flatten()
# calculate metrics
calculate_metrics("Model 22", y_test_true, y_test_pred)

# plot residuals
plot_residuals("Best Model on Test Data", y_test_true, y_test_pred)

# scatter plot of actual vs predicted values
scatter_plot(y_test_true, y_test_pred, "Best Model - Test Data")