In this lesson, we will read in the data collected from the Arduinos and see whether it improves our predictions of energy use.

# Step 1 - Loading libraries and data

In [None]:
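import matplotlib.pyplot as plt  # Matplotlib allows us to draw graphs
import numpy as np  # NumPy allows us to perform numerical calculations quickly
import pandas as pd  # pandas provides tools for working with tables of data
import datetime
try:
    from prophet import Prophet  # the package has been called 'prophet' since version 1.0
except ImportError:
    from fbprophet import Prophet  # older releases were published as 'fbprophet'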

In [None]:
energy = pd.read_csv('../input/school-smartmeter-data/meter-amr-readings-1200050359109.csv')
energy.head()

The data has been read in successfully, but we need one column for the energy readings, rather than one per time point. We can use the 'melt' function to do this. 

In [None]:
energy = pd.melt(energy, id_vars=['Reading Date', 'One Day Total kWh', 'Status', 'Substitute Date'], var_name='Time', value_name="Energy")
energy.head()
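
To see what 'melt' does, here is a minimal sketch on a tiny made-up table. The column names `00:00` and `00:30` and all of the numbers are assumptions for illustration only, not values from the smart meter file.

```python
import pandas as pd

# Hypothetical wide table: one row per day, one column per time point
wide_table = pd.DataFrame({
    'Reading Date': ['2021-06-01', '2021-06-02'],
    '00:00': [1.0, 1.5],
    '00:30': [2.0, 2.5],
})

# melt keeps 'Reading Date' fixed and stacks the time columns into rows,
# so every reading ends up in a single 'Energy' column
long_table = pd.melt(wide_table, id_vars=['Reading Date'],
                     var_name='Time', value_name='Energy')
print(long_table)
```

Two days with two readings each become four rows, one per reading.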

Next, we need to combine the date and time columns into a single datetime column for Python to work with.

In [None]:
energy['Timestamp'] = pd.to_datetime(energy['Reading Date'] + " " + energy['Time'], format='%Y-%m-%d %H:%M')
print('Start of data collection: ', energy['Timestamp'].min())
print('End of data collection: ', energy['Timestamp'].max())
energy = energy.sort_values(by=['Timestamp'])

# energy['Timestamp'] = energy['Timestamp'].mask(energy['Timestamp'].dt.year == 2020, energy['Timestamp'] + pd.offsets.DateOffset(year=2021))
energy = energy[(energy['Timestamp']>"2021-05-27")]
# energy = energy[(energy['Timestamp']>"2021-05-27") & (energy['Timestamp']<"2021-06-08")]
energy.head()
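
The same steps can be tried on a small made-up table. This is only a sketch: the three readings below are invented, and the date format is assumed to match the one used above.

```python
import pandas as pd

# Hypothetical readings, deliberately out of order
readings = pd.DataFrame({
    'Reading Date': ['2021-06-01', '2021-06-01', '2021-05-31'],
    'Time': ['00:30', '00:00', '23:30'],
})

# Join the two text columns and parse the result as a datetime
readings['Timestamp'] = pd.to_datetime(
    readings['Reading Date'] + ' ' + readings['Time'],
    format='%Y-%m-%d %H:%M')

# Put the rows in time order
readings = readings.sort_values(by='Timestamp')

# A date with no time means midnight, so '>' drops the 00:00 reading itself
recent = readings[readings['Timestamp'] > '2021-06-01']
print(recent)
```

Note that the filter uses a strict comparison, so a reading taken exactly at midnight on the cut-off date is left out.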


Now, we can read in the extra data you've collected with your Arduinos.

In [None]:
arduino = pd.read_csv('../input/supplemental-data/all_arduino_data_trim.csv')

arduino_location = 'STAFF'
# choose from ['Temperature', 'Light values', 'Motion detected ']
data_type = 'Motion detected '

arduino_id = arduino_location + "_" + data_type

# edit the column name to pick out your dataset
# .copy() gives us an independent table, so changing it does not trigger a pandas warning
arduino_data = arduino[['Timestamp', arduino_id]].copy()
arduino_data['Timestamp'] = pd.to_datetime(arduino_data['Timestamp'], format='%Y-%m-%d %H:%M')

print('\n\nStart of data collection: ', arduino_data['Timestamp'].min())
print('End of data collection: ', arduino_data['Timestamp'].max())
print(arduino_data.head())


# Step 2 - Comparing the data



Now, we can plot the energy use data and the Arduino data to see if they show a relationship.

In [None]:
fig, ax1 = plt.subplots(figsize=(15, 6))
plt.xticks( rotation=25 )
ax2 = ax1.twinx()  

ax1.plot(arduino_data['Timestamp'], arduino_data[arduino_id], color='blue')
ax1.set_xlabel('Time')
ax1.set_ylabel(data_type)
ax1.legend([data_type], loc="upper left")

ax2.plot(energy['Timestamp'], energy['Energy'], color='red')
ax2.set_ylabel('Energy use')
ax2.legend(['Energy use'], loc='upper right')
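
A plot shows a relationship by eye. One way to put a number on it is a correlation coefficient, sketched below on made-up values. The `Motion` column and every number here are assumptions for illustration, not real measurements.

```python
import pandas as pd

# Hypothetical timestamps shared by both tables
times = pd.to_datetime(['2021-06-01 09:00', '2021-06-01 09:30',
                        '2021-06-01 10:00', '2021-06-01 10:30'])

energy_demo = pd.DataFrame({'Timestamp': times, 'Energy': [10.0, 12.0, 15.0, 11.0]})
sensor_demo = pd.DataFrame({'Timestamp': times, 'Motion': [1, 2, 4, 1]})

# Line the two tables up on their timestamps, then compute Pearson's r
paired = energy_demo.merge(sensor_demo, on='Timestamp')
r = paired['Energy'].corr(paired['Motion'])
print('Correlation:', r)
```

A value near 1 or -1 suggests a strong linear relationship and a value near 0 suggests a weak one. Correlation alone does not show that one quantity causes the other.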


# Step 3 - Preparing the data for modelling

We need to split the data into a training set and a testing set, to see how well the forecasting algorithm works.

In [None]:
train_data = energy[['Timestamp', 'Energy']][(energy['Timestamp']> "2021-05-27 13:00:00") & (energy['Timestamp']< "2021-06-08")]
test_data = energy[['Timestamp', 'Energy']][(energy['Timestamp']> "2021-06-08") & (energy['Timestamp']< "2021-06-10")]

# We can use the pandas 'merge' function to add the Arduino data as a column to the energy data frame

train_data_arduino = train_data.merge(arduino_data, on='Timestamp')
test_data_arduino = test_data.merge(arduino_data, on='Timestamp')

train_data.columns = ['ds', 'y']
test_data.columns = ['ds', 'y']
train_data_arduino.columns = ['ds', 'y', data_type]
test_data_arduino.columns = ['ds', 'y', data_type]
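
By default, 'merge' keeps only the rows whose timestamp appears in both tables, so the merged table can be shorter than the energy table. The sketch below shows this on made-up data. The `Temperature` values and the timestamps are assumptions for illustration.

```python
import pandas as pd

# Hypothetical energy readings at three time points
energy_demo = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2021-06-01 09:00', '2021-06-01 09:30', '2021-06-01 10:00']),
    'Energy': [10.0, 12.0, 15.0],
})

# Hypothetical sensor readings, with the 09:30 reading missing
arduino_demo = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2021-06-01 09:00', '2021-06-01 10:00']),
    'Temperature': [19.5, 21.0],
})

# Default (inner) merge: only timestamps found in both tables survive
merged = energy_demo.merge(arduino_demo, on='Timestamp')

# how='left' keeps every energy row and fills the gap with NaN instead
merged_left = energy_demo.merge(arduino_demo, on='Timestamp', how='left')

print(len(energy_demo), len(merged), len(merged_left))
```

It is worth comparing the row counts before and after merging, so you know how many readings each model is actually trained and tested on.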


# Step 4 - Training a forecasting model

Here, we will use a forecasting package called Prophet to train a model to predict energy usage. We will leave out yearly modelling, since we don't have enough data to learn about energy use over a year.

In [None]:
model = Prophet(daily_seasonality=True, weekly_seasonality=True)
model.fit(train_data)

forecast = model.predict(test_data)
fig = model.plot(forecast)


In [None]:
fig = model.plot_components(forecast)

Next, we train a second model that also uses the Arduino data as an extra input, which Prophet calls a regressor.

In [None]:
model_plus = Prophet(daily_seasonality=True, weekly_seasonality=True)
model_plus.add_regressor(data_type)
model_plus.fit(train_data_arduino)

forecast_plus = model_plus.predict(test_data_arduino)
fig = model_plus.plot(forecast_plus)


We can break the model down into its components: an overall trend, a weekly pattern, a daily pattern, and the effect of the Arduino data. There is no yearly component, because we left yearly modelling out.

In [None]:
fig = model_plus.plot_components(forecast_plus)

We can compare the forecasted energy use values to the real energy use for the same time period. First, we can use a scatter plot to view the correlation between forecasted and real values. We can add a diagonal line to show where points would fall if the predictions were perfect.

In [None]:
plt.scatter(x=test_data['y'], y=forecast['yhat'], alpha=0.5, label='Basic energy model')
# use the merged test set here so the real values line up row by row with forecast_plus
plt.scatter(x=test_data_arduino['y'], y=forecast_plus['yhat'], alpha=0.5, label='Model with Arduino data')
# diagonal line: points on this line are perfect predictions
plt.plot([-20,140],[-20,140], ls="--", c=".3")
# plt.xlim(-20,140)
# plt.ylim(-20,140)
plt.xlabel('Real energy use')
plt.ylabel('Predicted energy use')
plt.legend()

Next, we can plot real and predicted energy use over time for the model with no Arduino data. 

In [None]:
fig, ax1 = plt.subplots()
# rotate the date labels so they don't overlap
plt.xticks( rotation=25 )

ax1.plot(test_data['ds'], test_data['y'], color='red')
ax1.set_xlabel('Timestamp')
ax1.set_ylabel('Energy use')

ax1.plot(forecast['ds'], forecast['yhat'], color='blue')
ax1.legend(['Energy use', 'Predicted energy use'], loc='upper right')


We can compare this to the predictions from the model that uses Arduino data. 

In [None]:
fig, ax1 = plt.subplots()
# rotate the date labels so they don't overlap
plt.xticks( rotation=25 )

ax1.plot(test_data['ds'], test_data['y'], color='red')
ax1.set_xlabel('Timestamp')
ax1.set_ylabel('Energy use')

ax1.plot(forecast_plus['ds'], forecast_plus['yhat'], color='blue')
ax1.legend(['Energy use', 'Predicted energy use'], loc='upper right')
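
Plots are useful, but a single error score makes it easier to say which model is closer to the real values. The sketch below computes two common scores, mean absolute error (MAE) and root mean squared error (RMSE), on made-up numbers. The helper function names and all three arrays are assumptions for illustration, not results from the models above.

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, ignoring their sign."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

def root_mean_squared_error(actual, predicted):
    """Like MAE, but large errors are penalised more heavily."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical real values and two sets of hypothetical predictions
actual = np.array([10.0, 20.0, 30.0])
basic_pred = np.array([12.0, 17.0, 33.0])
plus_pred = np.array([11.0, 19.0, 31.0])

print('Basic model MAE:', mean_absolute_error(actual, basic_pred))
print('Arduino model MAE:', mean_absolute_error(actual, plus_pred))
print('Basic model RMSE:', root_mean_squared_error(actual, basic_pred))
print('Arduino model RMSE:', root_mean_squared_error(actual, plus_pred))
```

With the data in this lesson, you could pass in `test_data['y'].to_numpy()` and `forecast['yhat'].to_numpy()` for the basic model, and the matching columns of `test_data_arduino` and `forecast_plus` for the second model. A lower score means the predictions are closer to the real energy use.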
