# Predicting Residential EV Charging Loads using Neural Networks



In [None]:
# Setup - import basic data libraries
import numpy as np
import pandas as pd

## Task Group 1 - Load, Inspect, and Merge Datasets

### Task 1

The file `'datasets/EV charging reports.csv'` contains electric vehicle (EV) charging data from residential apartment buildings in Norway. The data include user and garage identifiers, plug-in and plug-out times, charging loads, and the dates of the charging sessions.

The file is imported to a pandas DataFrame named `ev_charging_reports`.

Use the `.head()` method to preview the first five rows.

In [None]:
ev_charging_reports = pd.read_csv('datasets/EV charging reports.csv')
ev_charging_reports.head()

<details><summary style="display:list-item; font-size:16px; color:blue;">What is the structure of the dataset?</summary>

- **session_ID** - the unique id for each EV charging session
- **Garage_ID** - the unique id for the garage of the apartment
- **User_ID** - the unique id for each user
- **User_private** - 1.0 indicates private charge point spaces and 0.0 indicates shared charge point spaces
- **Shared_ID** - the unique id if shared charge point spaces are used
- **Start_plugin** - the plug-in date and time in the format (day.month.year hour:minute)
- **Start_plugin_hour** - the plug-in date and time rounded to the start of the hour
- **End_plugout** - the plug-out date and time in the format (day.month.year hour:minute)
- **End_plugout_hour** - the plug-out date and time rounded to the start of the hour
- **El_kWh** - the charged energy in kWh (charging loads)
- **Duration_hours** - the duration of the EV connection time per session
- **Plugin_category** - the plug-in time categorized by early/late night, morning, afternoon, and evening
- **Duration_category** - the plug-in duration categorized by 3 hour groups
- **month_plugin_{month}** - the month of the plug-in session
- **weekdays_plugin_{day}** - the day of the week of the plug-in session

</details>

### Task 2

The file `'datasets/Local traffic distribution.csv'` is imported to a pandas DataFrame named `traffic_reports`. This dataset contains the hourly local traffic density counts at 5 nearby traffic locations. 

Preview the first five rows.

In [None]:
traffic_reports = pd.read_csv('datasets/Local traffic distribution.csv')
traffic_reports.head()

<details><summary style="display:list-item; font-size:16px; color:blue;">What is the structure of the dataset?</summary>

- **Date_from** - the starting time in the format (day.month.year hour:minute)
- **Date_to** - the ending time in the format (day.month.year hour:minute)
- **Location 1 to 5** - the number of vehicles counted each hour at each of the five traffic locations

</details>
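The notebook works with these timestamps as raw strings, which is enough for matching identical values. If they ever need to be treated as real timestamps, a minimal sketch might look like the following. The sample values are made up, and `Date_from_parsed` is a hypothetical column name used only for illustration.

```python
import pandas as pd

# Hypothetical sample values in the documented day.month.year hour:minute format
sample = pd.DataFrame({
    "Date_from": ["21.12.2018 10:00", "01.02.2019 23:00"],
})

# An explicit format stops pandas from guessing month-first for ambiguous dates
sample["Date_from_parsed"] = pd.to_datetime(
    sample["Date_from"], format="%d.%m.%Y %H:%M"
)

print(sample["Date_from_parsed"].dt.month.tolist())  # month numbers of each row
print(sample["Date_from_parsed"].dt.hour.tolist())   # hour of day of each row
```

Passing `format` explicitly matters here because a value such as `01.02.2019` could otherwise be read as either January 2 or February 1.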


### Task 3

We'd like to use the traffic data to help our model. Charging rates at the same location may differ depending on how many cars are charging at once, so local traffic counts may give the model a useful signal.

Merge the `ev_charging_reports` and `traffic_reports` datasets into a DataFrame named `ev_charging_traffic` using the columns:

- `Start_plugin_hour` in `ev_charging_reports`
- `Date_from` in `traffic_reports`

In [None]:
# merge into a new DataFrame so that `ev_charging_reports` is left unchanged
ev_charging_traffic = ev_charging_reports.merge(traffic_reports,
                                                left_on='Start_plugin_hour',
                                                right_on='Date_from')

ev_charging_traffic.head()
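`DataFrame.merge` performs an inner join by default, so any charging session whose plug-in hour has no matching traffic row is silently dropped. The sketch below shows one way to check for that, using toy stand-in data. All values are made up, and `traffic_count` is a hypothetical column name.

```python
import pandas as pd

# Toy stand-ins for the two datasets (values are made up for illustration)
charging = pd.DataFrame({
    "session_ID": [1, 2, 3],
    "Start_plugin_hour": ["21.12.2018 10:00", "21.12.2018 11:00", "22.12.2018 09:00"],
})
traffic = pd.DataFrame({
    "Date_from": ["21.12.2018 10:00", "21.12.2018 11:00"],
    "traffic_count": [120, 95],  # hypothetical column name
})

# how="left" keeps every charging session; indicator=True adds a `_merge` column
checked = charging.merge(
    traffic,
    left_on="Start_plugin_hour",
    right_on="Date_from",
    how="left",
    indicator=True,
)

# Rows marked "left_only" are the ones a default inner merge would have dropped
unmatched = checked[checked["_merge"] == "left_only"]
print(len(unmatched), "session(s) without traffic data")
```

Comparing the row count before and after the merge is a cheaper version of the same check.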

### Task 4

Use `.info()` to inspect the merged dataset. 

In [None]:
ev_charging_traffic.info()

<details><summary style="display:list-item; font-size:16px; color:blue;">What do we notice about merged dataset under inspection?</summary>

We see that there are 39 columns and 6,833 rows in our merged dataset.

Some notable things we might have to address:

- We expected columns like `El_kWh` and `Duration_hours` to be floats but they are actually object data types.

- There are many identifying columns like `session_ID` and `User_ID` that might not be useful for training.

</details>

## Task Group 2 - Data Cleaning and Preparation

### Task 5

We will drop the columns that are not needed by our model:

```py
['session_ID', 'Garage_ID', 'User_ID', 
                'Shared_ID',
                'Plugin_category','Duration_category', 
                'Start_plugin', 'Start_plugin_hour', 'End_plugout', 'End_plugout_hour', 
                'Date_from', 'Date_to']
```

In [None]:
drop_columns = ['session_ID', 'Garage_ID', 'User_ID', 
                'Shared_ID',
                'Plugin_category','Duration_category', 
                'Start_plugin', 'Start_plugin_hour', 'End_plugout', 'End_plugout_hour', 
                'Date_from', 'Date_to']

ev_charging_traffic = ev_charging_traffic.drop(columns=drop_columns)  # `columns=` already implies axis=1
ev_charging_traffic.head()

### Task 6

Replace the decimal comma "," with a decimal point "." in both `El_kWh` and `Duration_hours`.

In [None]:
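# replace the decimal comma in every string (object) column
for column in ev_charging_traffic.columns:
    if ev_charging_traffic[column].dtype == 'object':
        # regex=False treats ',' as a literal character
        ev_charging_traffic[column] = ev_charging_traffic[column].str.replace(',', '.', regex=False)

ev_charging_traffic.head()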

### Task 7

Next, convert the data types of all the columns of `ev_charging_traffic` to floats.

In [None]:
ev_charging_traffic = ev_charging_traffic.astype(float)
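The two cleaning steps above (replace the decimal comma, then cast to float) can be illustrated on a tiny made-up table. The sketch also shows an alternative: `pd.read_csv` accepts a `decimal` argument that handles decimal commas at load time. The `;` separator and the values here are assumptions for the toy example, not a description of the real files.

```python
import io

import pandas as pd

# Made-up CSV text with decimal commas; the ';' separator is an assumption
csv_text = "El_kWh;Duration_hours\n0,3;0,05\n12,5;3,25\n"

# Option 1: load as strings, replace the comma, then cast (the notebook's approach)
raw = pd.read_csv(io.StringIO(csv_text), sep=";")
converted = raw.apply(
    lambda col: col.str.replace(",", ".", regex=False)
).astype(float)

# Option 2: let the parser interpret the decimal comma while loading
parsed = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

print(converted.dtypes)
print(parsed.dtypes)
```

Both options should end with float columns. Option 2 avoids the intermediate string columns, but it only works when the decimal marker is known before loading.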

## Task Group 3 - Train Test Split

Next, let's split the dataset into training and testing datasets. 

The training data will be used to train the model and the testing data will be used to evaluate the model.

### Task 8

We create two datasets from `ev_charging_traffic`:

- `X` contains only the input numerical features
- `y` contains only the target column `El_kWh`

In [None]:
numerical_features = ev_charging_traffic.drop(['El_kWh'], axis=1).columns
X = ev_charging_traffic[numerical_features]

y = ev_charging_traffic['El_kWh']

### Task 9

Split `X` and `y` into training and testing datasets using sklearn. The training set should use 80% of the data. Set the `random_state` parameter to `2`.

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    train_size=0.80,
                                                    test_size=0.20,
                                                    random_state=2) # set a random seed - do not modify

print("Training size:", X_train.shape)
print("Testing size:", X_test.shape)
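To see what `train_size` and `random_state` do in isolation, here is a small sketch on synthetic data (not the EV dataset). The `_demo` names are hypothetical and exist only in this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features
X_demo = np.arange(300).reshape(100, 3)
y_demo = np.arange(100)

split_a = train_test_split(X_demo, y_demo, train_size=0.80, random_state=2)
split_b = train_test_split(X_demo, y_demo, train_size=0.80, random_state=2)

X_train_demo, X_test_demo, y_train_demo, y_test_demo = split_a
print(X_train_demo.shape, X_test_demo.shape)  # 80 training rows, 20 testing rows

# The same seed reproduces the same shuffled partition
print(np.array_equal(split_a[1], split_b[1]))
```

Fixing the seed is what makes the MSE values reported later in the notebook comparable between runs.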

## Task Group 4 - Linear Regression Baseline

This section is optional, but useful. We want to compare our neural network to a basic linear regression. 



### Task 10

Train a Linear Regression model using the training data to predict EV charging loads.

The linear regression will be used as a baseline to compare against the neural network we will train later.

In [None]:
from sklearn.linear_model import LinearRegression

linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

### Task 11

Evaluate the linear regression baseline by calculating the MSE on the testing data. Use `mean_squared_error` from `sklearn.metrics`.

Save the testing MSE to the variable `test_mse` and print it out.

In [None]:
from sklearn.metrics import mean_squared_error

linear_test_predictions = linear_model.predict(X_test)
test_mse = mean_squared_error(y_test, linear_test_predictions)
print("Linear Regression - Test Set MSE:", test_mse)

The mean squared error is around `131.4`. Taking the square root gives a root mean squared error (RMSE) of about `11.5`, so the linear regression's predictions are typically off by roughly `11.5 kWh`.
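The square-root step can be checked with a couple of lines. The MSE value below is the approximate figure quoted above, hard-coded for illustration rather than read from `test_mse`.

```python
import math

test_mse_reported = 131.4  # approximate test MSE quoted above, in kWh^2

# The square root brings the error back to the units of the target (kWh)
rmse = math.sqrt(test_mse_reported)
print(f"RMSE: {rmse:.1f} kWh")
```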

## Task Group 5 - Train a Neural Network Using PyTorch
Let's now create a neural network using PyTorch to predict EV charging loads.

In [None]:
import torch
from torch import nn
from torch import optim

In [None]:
# Convert training set
X_train_tensor = torch.tensor(X_train.values, dtype=torch.float)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float).view(-1,1)

# Convert testing set
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float).view(-1,1)
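The `.view(-1, 1)` call reshapes the targets into a column so that they match the `(N, 1)` output of the network. A minimal sketch with made-up numbers shows why this matters: subtracting a flat `(N,)` tensor from an `(N, 1)` tensor broadcasts to `(N, N)` instead of comparing element by element.

```python
import torch

y_flat = torch.tensor([1.0, 2.0, 3.0])  # shape (3,), like a 1-D target array
y_col = y_flat.view(-1, 1)              # shape (3, 1), one target per row

print(y_flat.shape, y_col.shape)

# Mismatched shapes broadcast to a 3x3 grid of differences
print((y_col - y_flat).shape)

# Matching shapes give one difference per sample
print((y_col - y_col).shape)
```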

### Task 14


We create a sequential neural network with the following architecture:

- an input layer with the number of nodes equal to the number of training features
- a first hidden layer with `56` nodes and a ReLU activation
- a second hidden layer with `26` nodes and a ReLU activation
- an output layer with `1` node



In [None]:

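model = nn.Sequential(
    nn.Linear(26, 56),  # 26 input features -> first hidden layer (56 nodes)
    nn.ReLU(),
    nn.Linear(56, 26),  # first hidden layer -> second hidden layer (26 nodes)
    nn.ReLU(),
    nn.Linear(26, 1)    # second hidden layer -> single output (predicted El_kWh)
)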
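As a rough sense of the model's size, the number of trainable parameters follows directly from the layer shapes: each linear layer holds `inputs * outputs` weights plus one bias per output. The sketch below is plain arithmetic and does not need PyTorch.

```python
# (inputs, outputs) for each nn.Linear layer in the architecture above
layer_shapes = [(26, 56), (56, 26), (26, 1)]

# Each linear layer holds inputs * outputs weights plus one bias per output
params_per_layer = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
total_params = sum(params_per_layer)

print(params_per_layer, total_params)
```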

### Task 15
We define the loss function and optimizer used for training:

- set the MSE loss function to the variable `loss`
- set the Adam optimizer to the variable `optimizer` with a learning rate of `0.0007`

In [None]:
loss = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0007)

### Task 16

We create a training loop to train our neural network for 3000 epochs.

We also keep track of the training loss by printing out the MSE every 500 epochs.

In [None]:
num_epochs = 3000 # number of training iterations
for epoch in range(num_epochs):
    outputs = model(X_train_tensor) # forward pass 
    mse = loss(outputs, y_train_tensor) # calculate the loss 
    mse.backward() # backward pass
    optimizer.step() # update the weights and biases
    optimizer.zero_grad() # reset the gradients to zero

    # keep track of the loss during training
    if (epoch + 1) % 500 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], MSE Loss: {mse.item()}')

### Task 17

We save the neural network in the `models` directory using the path `models/model.pth`.

In [None]:
torch.save(model, 'models/model.pth')  
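The cell above saves the entire model object. A common alternative is to save only the parameters through `state_dict()` and rebuild the architecture before loading. The sketch below illustrates that pattern in a temporary directory. `build_model` and `model_state.pth` are hypothetical names, and the network is freshly initialised rather than the trained one.

```python
import os
import tempfile

import torch
from torch import nn


def build_model():
    """Recreate the notebook's layer sizes so the sketch is self-contained."""
    return nn.Sequential(
        nn.Linear(26, 56), nn.ReLU(),
        nn.Linear(56, 26), nn.ReLU(),
        nn.Linear(26, 1),
    )


demo_model = build_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_state.pth")  # hypothetical file name
    torch.save(demo_model.state_dict(), path)        # save only the parameters

    restored = build_model()                         # architecture is rebuilt first
    restored.load_state_dict(torch.load(path))       # then the parameters are loaded

# Both networks now produce identical outputs for the same input
x = torch.zeros(1, 26)
print(torch.equal(demo_model(x), restored(x)))
```

The trade-off is that the loading code must know the architecture, whereas a file saved with `torch.save(model, ...)` carries it along.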

### Task 18

We now evaluate the neural network on the testing set. 

We save the testing data loss to the variable `test_loss` and use `.item()` to extract and print out the loss. 

In [None]:
# using the neural network `model` trained above
model.eval() # set the model to evaluation mode
with torch.no_grad(): # disable gradient calculations
    predictions = model(X_test_tensor) # generate EV charging load predictions
    test_loss = loss(predictions, y_test_tensor) # calculate testing set MSE loss

print('Neural Network - Test Set MSE:', test_loss.item()) # print testing set MSE

### Task 19

The same model was run for 4500 epochs locally. That model is saved as `models/model4500.pth`. Load this model using PyTorch and evaluate it. How well does the longer-trained model perform?

In [None]:
# load the model; the file stores the full model object, so PyTorch 2.6+
# (where `weights_only` defaults to True) needs weights_only=False
model4500 = torch.load('models/model4500.pth', weights_only=False)

# using the loaded neural network `model4500`
model4500.eval() # set the model to evaluation mode
with torch.no_grad(): # disable gradient calculations
    predictions = model4500(X_test_tensor) # generate EV charging load predictions
    test_loss = loss(predictions, y_test_tensor) # calculate testing set MSE loss

print('Neural Network - Test Set MSE:', test_loss.item()) # print testing set MSE

The additional training lowered the test MSE to about `115.2`, a `12%` improvement over our linear regression baseline. This suggests that the nonlinearity introduced by the neural network helps on this task.
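The percentage can be reproduced from the two test MSE values quoted in this notebook. Both numbers are hard-coded approximations here, not values read from the trained models.

```python
linear_mse = 131.4  # linear regression test MSE quoted earlier
nn_mse = 115.2      # test MSE of the 4500-epoch neural network quoted above

# Relative reduction in MSE compared with the baseline
relative_improvement = (linear_mse - nn_mse) / linear_mse
print(f"{relative_improvement:.1%}")
```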