
Data Set Information:

Data Link: https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption

This archive contains 2075259 measurements gathered between December 2006 and November 2010 (47 months).

Notes:

1. (global_active_power*1000/60 - sub_metering_1 - sub_metering_2 - sub_metering_3) represents the active energy consumed every minute (in watt-hours) in the household by electrical equipment not measured in sub-meterings 1, 2 and 3.

2. The dataset contains some missing values in the measurements (nearly 1.25% of the rows). All calendar timestamps are present in the dataset, but for some timestamps the measurement values are missing: a missing value is represented by the absence of a value between two consecutive semicolon attribute separators. For instance, the dataset shows missing values on April 28, 2007.
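As a worked instance of note 1, the sketch below applies the formula to the first measurement shown later in this notebook (4.216 kW with sub-meterings of 0, 1 and 17 Wh). The function name `unmetered_energy_wh` is an illustrative assumption, not part of the dataset documentation.

```python
def unmetered_energy_wh(global_active_power_kw, sub1_wh, sub2_wh, sub3_wh):
    """Active energy (Wh) used in one minute by equipment outside sub-meterings 1-3."""
    # A minute-averaged power in kW becomes Wh for that minute:
    # multiply by 1000 (kW -> W) and divide by 60 (minutes per hour)
    total_wh = global_active_power_kw * 1000 / 60
    return total_wh - sub1_wh - sub2_wh - sub3_wh


# First row of the dataset: 4.216 kW, sub-meterings 0, 1 and 17 Wh
first_row = unmetered_energy_wh(4.216, 0.0, 1.0, 17.0)
print(round(first_row, 3))  # 52.267
```

Of the roughly 70.3 Wh consumed in that minute, about 52.3 Wh came from equipment outside the three sub-metered circuits.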

Attribute Information:

1. date: date in format dd/mm/yyyy
2. time: time in format hh:mm:ss
3. global_active_power: household global minute-averaged active power (in kilowatt)
4. global_reactive_power: household global minute-averaged reactive power (in kilowatt)
5. voltage: minute-averaged voltage (in volt)
6. global_intensity: household global minute-averaged current intensity (in ampere)
7. sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
8. sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
9. sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.

In [104]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [105]:
import pandas as pd
import numpy as np


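# Note: the file is semicolon-delimited, so with the default comma separator
# each row (including the header row) is read into a single column
df = pd.read_csv('/content/drive/MyDrive/household_power_consumption.txt', header=None)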

In [106]:
df.head()

Unnamed: 0,0
0,Date;Time;Global_active_power;Global_reactive_...
1,16/12/2006;17:24:00;4.216;0.418;234.840;18.400...
2,16/12/2006;17:25:00;5.360;0.436;233.630;23.000...
3,16/12/2006;17:26:00;5.374;0.498;233.290;23.000...
4,16/12/2006;17:27:00;5.388;0.502;233.740;23.000...


In [107]:
df.shape

(2075260, 1)
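The single-column shape above comes from reading a semicolon-delimited file with the default comma separator. A minimal sketch of an alternative, assuming the file layout shown in this notebook: passing `sep=';'` and `na_values='?'` to `pd.read_csv` yields nine typed columns directly. The three-row sample is illustrative, and its last row only mimics how a missing measurement might look.

```python
import io

import pandas as pd

# Illustrative sample in the file's layout; the last row mimics a missing measurement
sample = io.StringIO(
    "Date;Time;Global_active_power;Global_reactive_power;Voltage;"
    "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3\n"
    "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000\n"
    "16/12/2006;17:25:00;5.360;0.436;233.630;23.000;0.000;1.000;16.000\n"
    "28/4/2007;00:21:00;?;?;?;?;?;?;\n"
)

# sep=';' splits the fields; na_values turns the '?' placeholder into NaN,
# so the measurement columns are parsed as floats instead of strings
parsed = pd.read_csv(sample, sep=';', na_values=['?'])

print(parsed.shape)
print(parsed.dtypes)
```

With numeric dtypes from the start, the later string splitting and header handling would not be needed.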

In [108]:
# Splitting the single column into multiple columns
df_new = df[0].str.split(';', expand=True)

In [109]:
df_new.shape

(2075260, 9)

In [110]:
df_new.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,Date,Time,Global_active_power,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
1,16/12/2006,17:24:00,4.216,0.418,234.840,18.400,0.000,1.000,17.000
2,16/12/2006,17:25:00,5.360,0.436,233.630,23.000,0.000,1.000,16.000
3,16/12/2006,17:26:00,5.374,0.498,233.290,23.000,0.000,2.000,17.000
4,16/12/2006,17:27:00,5.388,0.502,233.740,23.000,0.000,1.000,17.000


In [111]:
# Setting the first row as the columns header
df_new.columns = df_new.iloc[0]
df_new = df_new[1:]

In [112]:
#Resetting the index of the DataFrame
df_new.reset_index(drop=True, inplace=True)

In [113]:
df_new.shape

(2075259, 9)

In [114]:
df_new.head()

Unnamed: 0,Date,Time,Global_active_power,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
0,16/12/2006,17:24:00,4.216,0.418,234.84,18.4,0.0,1.0,17.0
1,16/12/2006,17:25:00,5.36,0.436,233.63,23.0,0.0,1.0,16.0
2,16/12/2006,17:26:00,5.374,0.498,233.29,23.0,0.0,2.0,17.0
3,16/12/2006,17:27:00,5.388,0.502,233.74,23.0,0.0,1.0,17.0
4,16/12/2006,17:28:00,3.666,0.528,235.68,15.8,0.0,1.0,17.0


In [115]:
df_new.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075259 entries, 0 to 2075258
Data columns (total 9 columns):
 #   Column                 Dtype 
---  ------                 ----- 
 0   Date                   object
 1   Time                   object
 2   Global_active_power    object
 3   Global_reactive_power  object
 4   Voltage                object
 5   Global_intensity       object
 6   Sub_metering_1         object
 7   Sub_metering_2         object
 8   Sub_metering_3         object
dtypes: object(9)
memory usage: 142.5+ MB


In [116]:
# Checking for missing values
# (isnull() only counts NaN/None; '?' placeholder strings are not counted)
df_new.isnull().sum()

0
Date                     0
Time                     0
Global_active_power      0
Global_reactive_power    0
Voltage                  0
Global_intensity         0
Sub_metering_1           0
Sub_metering_2           0
Sub_metering_3           0
dtype: int64
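The zero counts above do not mean the data is complete: every column is still a string, and `isnull()` does not treat a `'?'` string as missing. A small sketch on an illustrative two-column frame shows how the placeholders could be counted, either directly or after coercing to numbers.

```python
import pandas as pd

# Illustrative frame: every value is a string, as after str.split
raw = pd.DataFrame({
    "Global_active_power": ["4.216", "?", "5.360"],
    "Voltage": ["234.840", "?", "233.630"],
})

# '?' is an ordinary string, so nothing is reported as null
null_total = raw.isnull().sum().sum()

# Count the '?' markers per column
placeholder_counts = (raw == "?").sum()

# Coerce to numbers: '?' becomes NaN and is then visible to isna()
numeric = raw.apply(pd.to_numeric, errors="coerce")
nan_counts = numeric.isna().sum()

print(null_total)
print(placeholder_counts)
print(nan_counts)
```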

In [117]:
# Task: using a daily sampling rate (sum), divide the data into a train and test set.
# The last 300 days are the test set and the first (x - 300) days are the training set,
# where x is the length of the daily dataset.

# Converting the Date column (dd/mm/yyyy) to datetime with an explicit format
df_new['Date'] = pd.to_datetime(df_new['Date'], format='%d/%m/%Y')
df_new.set_index('Date', inplace=True)


In [118]:
df_new.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2075259 entries, 2006-12-16 to 2010-11-26
Data columns (total 8 columns):
 #   Column                 Dtype 
---  ------                 ----- 
 0   Time                   object
 1   Global_active_power    object
 2   Global_reactive_power  object
 3   Voltage                object
 4   Global_intensity       object
 5   Sub_metering_1         object
 6   Sub_metering_2         object
 7   Sub_metering_3         object
dtypes: object(8)
memory usage: 142.5+ MB
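Dates in this dataset are day-first (dd/mm/yyyy), so a string such as `02/01/2007` is ambiguous unless the format is stated. The sketch below, on an illustrative two-row frame, parses the date and time together with an explicit format string. Building a combined timestamp column named `ds` is an assumption made for illustration, not a step this notebook performs.

```python
import pandas as pd

frame = pd.DataFrame({
    "Date": ["16/12/2006", "02/01/2007"],
    "Time": ["17:24:00", "00:00:00"],
})

# An explicit day-first format removes any guessing about day/month order
frame["ds"] = pd.to_datetime(
    frame["Date"] + " " + frame["Time"],
    format="%d/%m/%Y %H:%M:%S",
)

print(frame["ds"])
```

The second row is read as 2 January 2007, not 1 February 2007.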


In [119]:
# Renaming columns to match Prophet's expectations
# Renaming the datetime index column
df_new1 = df_new.rename_axis('ds', axis='index').rename(columns={'Global_active_power': 'y'})

In [120]:
df_new1.describe()

Unnamed: 0,Time,y,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
count,2075259,2075259,2075259.0,2075259,2075259.0,2075259.0,2075259.0,2075259.0
unique,1440,4187,533.0,2838,222.0,89.0,82.0,33.0
top,17:24:00,?,0.0,?,1.0,0.0,0.0,0.0
freq,1442,25979,481561.0,25979,172785.0,1880175.0,1436830.0,852092.0


In [121]:
df_new1.head()

Unnamed: 0_level_0,Time,y,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
ds,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2006-12-16,17:24:00,4.216,0.418,234.84,18.4,0.0,1.0,17.0
2006-12-16,17:25:00,5.36,0.436,233.63,23.0,0.0,1.0,16.0
2006-12-16,17:26:00,5.374,0.498,233.29,23.0,0.0,2.0,17.0
2006-12-16,17:27:00,5.388,0.502,233.74,23.0,0.0,1.0,17.0
2006-12-16,17:28:00,3.666,0.528,235.68,15.8,0.0,1.0,17.0


In [122]:


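# Resampling to daily frequency and summing the values
# Note: the columns are still strings (object dtype) here, so sum()
# concatenates them instead of adding numbers, as the output below shows
daily_data = df_new1.resample('D').sum()
# Resetting the index to make the datetime index a column
daily_data.reset_index(inplace=True)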

In [123]:
daily_data.head()

Unnamed: 0,ds,Time,y,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
0,2006-12-16,17:24:0017:25:0017:26:0017:27:0017:28:0017:29:...,4.2165.3605.3745.3883.6663.5203.7023.7003.6683...,0.4180.4360.4980.5020.5280.5220.5200.5200.5100...,234.840233.630233.290233.740235.680235.020235....,18.40023.00023.00023.00015.80015.00015.80015.8...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,1.0001.0002.0001.0001.0002.0001.0001.0001.0002...,17.00016.00017.00017.00017.00017.00017.00017.0...
1,2006-12-17,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,1.0441.5203.0382.9742.8462.8482.8582.4720.6580...,0.1520.2200.1940.1940.1980.1980.2020.2080.2440...,242.730242.200240.140239.970240.390240.590241....,4.4007.40012.60012.40011.80011.80011.80010.800...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,2.0001.0002.0001.0002.0001.0001.0002.0001.0002...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...
2,2006-12-18,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.2780.2380.2080.2060.2060.2060.2040.2040.2120...,0.1260.0560.0000.0000.0000.0000.0000.0000.0000...,246.170246.400246.460245.940245.980245.560245....,1.2001.0000.8000.8000.8000.8000.8000.8001.0001...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,2.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...
3,2006-12-19,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.4140.5040.4080.4160.3940.4100.3900.4080.3880...,0.2420.3260.2340.2240.2140.2200.2180.2220.2180...,241.190241.500241.770242.350241.840241.960242....,2.0002.6002.0002.0001.8002.0001.8002.0001.8002...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,1.0002.0002.0001.0002.0001.0001.0002.0001.0002...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...
4,2006-12-20,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.8240.9580.9740.9580.9761.0701.0761.0600.8980...,0.0580.0560.0620.0600.0640.1840.2020.1980.2000...,245.570244.780244.740245.140245.180245.020244....,3.4003.8004.0003.8004.0004.4004.4004.4003.8004...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0001.0000.0000.0000.0000.0001...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...


In [124]:
print(daily_data['y'].head(10))

0    4.2165.3605.3745.3883.6663.5203.7023.7003.6683...
1    1.0441.5203.0382.9742.8462.8482.8582.4720.6580...
2    0.2780.2380.2080.2060.2060.2060.2040.2040.2120...
3    0.4140.5040.4080.4160.3940.4100.3900.4080.3880...
4    0.8240.9580.9740.9580.9761.0701.0761.0600.8980...
5    1.8141.4741.4781.7541.4661.4081.4021.3961.4001...
6    0.2060.2060.3280.3140.3100.3080.3060.3060.3060...
7    2.3282.3162.3420.5641.0082.3322.3181.5900.2281...
8    5.3765.3403.6843.5665.3885.2483.3385.1244.9144...
9    0.5860.5840.6480.6540.6560.6480.5620.5160.4840...
Name: y, dtype: object
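The values above are concatenated strings because the columns were still of object dtype when they were summed. A minimal sketch of the numeric alternative, on an illustrative four-row series: convert to numbers first (turning `'?'` into NaN), then resample, so the daily sum is an arithmetic total.

```python
import pandas as pd

idx = pd.to_datetime([
    "2006-12-16 17:24", "2006-12-16 17:25",
    "2006-12-17 00:00", "2006-12-17 00:01",
])
minute = pd.DataFrame({"y": ["4.216", "5.360", "1.044", "?"]}, index=idx)

# Convert strings to floats; the '?' placeholder becomes NaN
minute["y"] = pd.to_numeric(minute["y"], errors="coerce")

# With a numeric column, the daily sum adds values (NaN is skipped)
daily = minute["y"].resample("D").sum()

print(daily)
```

Here the first day sums to 9.576 and the second to 1.044, since the missing reading is skipped.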


In [125]:
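# Extracting the first decimal number from each concatenated string
# (this keeps only the first regex match, not a daily total)
daily_data['y'] = daily_data['y'].str.extract(r'(\d+\.\d+)', expand=False)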

In [126]:
# Converting the 'y' column to numeric
daily_data['y'] = pd.to_numeric(daily_data['y'])
daily_data.columns

Index(['ds', 'Time', 'y', 'Global_reactive_power', 'Voltage',
       'Global_intensity', 'Sub_metering_1', 'Sub_metering_2',
       'Sub_metering_3'],
      dtype='object', name=0)
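To see what the extraction pattern returns on a concatenated value, the sketch below applies the same regular expression with Python's `re` module to a shortened, illustrative version of the first day's string.

```python
import re

# "4.216" + "5.360" + "5.374" joined without separators
concatenated = "4.2165.3605.374"

# \d+\.\d+ matches digits, one dot, then digits up to the next non-digit
first_match = re.search(r"(\d+\.\d+)", concatenated).group(1)

print(first_match)  # 4.2165
```

The match runs into the first digit of the next reading, so the resulting `y` is close to the first minute's value rather than a sum over the day.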

In [127]:
#Determining the length of the dataset
x = len(daily_data)
x

1442

In [128]:
# The last 300 days are the test set and the first (x - 300) days are the training set,
# where x is the length of the dataset.
# Splitting the data into train and test sets
train_df = daily_data.iloc[:x-300]
test_df = daily_data.iloc[x-300:]

In [129]:
train_df.head()

Unnamed: 0,ds,Time,y,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
0,2006-12-16,17:24:0017:25:0017:26:0017:27:0017:28:0017:29:...,4.2165,0.4180.4360.4980.5020.5280.5220.5200.5200.5100...,234.840233.630233.290233.740235.680235.020235....,18.40023.00023.00023.00015.80015.00015.80015.8...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,1.0001.0002.0001.0001.0002.0001.0001.0001.0002...,17.00016.00017.00017.00017.00017.00017.00017.0...
1,2006-12-17,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,1.0441,0.1520.2200.1940.1940.1980.1980.2020.2080.2440...,242.730242.200240.140239.970240.390240.590241....,4.4007.40012.60012.40011.80011.80011.80010.800...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,2.0001.0002.0001.0002.0001.0001.0002.0001.0002...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...
2,2006-12-18,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.278,0.1260.0560.0000.0000.0000.0000.0000.0000.0000...,246.170246.400246.460245.940245.980245.560245....,1.2001.0000.8000.8000.8000.8000.8000.8001.0001...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,2.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...
3,2006-12-19,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.414,0.2420.3260.2340.2240.2140.2200.2180.2220.2180...,241.190241.500241.770242.350241.840241.960242....,2.0002.6002.0002.0001.8002.0001.8002.0001.8002...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,1.0002.0002.0001.0002.0001.0001.0002.0001.0002...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...
4,2006-12-20,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.824,0.0580.0560.0620.0600.0640.1840.2020.1980.2000...,245.570244.780244.740245.140245.180245.020244....,3.4003.8004.0003.8004.0004.4004.4004.4003.8004...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0001.0000.0000.0000.0000.0001...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...


In [130]:
test_df.head()

Unnamed: 0,ds,Time,y,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3
1142,2010-01-31,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.27,0.0000.0000.0000.0000.0000.0000.0000.0000.1000...,244.330244.460244.010244.400244.320244.470244....,1.0001.2001.0001.2001.0001.2001.0001.4001.6001...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0001.0000.0000.0000.0000.0000.0001.0000.0000...,1.0000.0001.0001.0000.0001.0001.0000.0001.0001...
1143,2010-02-01,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.346,0.0000.0000.0000.0000.0000.0000.0000.0840.1140...,245.130245.290244.200244.660244.500245.790244....,1.4001.4001.4001.4001.4001.4001.4001.8001.8001...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,1.0001.0000.0001.0001.0000.0001.0001.0000.0001...
1144,2010-02-02,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,1.6861,0.0560.0480.0460.0480.0460.0000.0580.0580.0600...,245.190244.580245.320244.790244.870243.610243....,6.8006.8006.6006.6005.8005.8006.0006.0006.0006...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,19.00019.00019.00019.00019.00019.00019.00019.0...
1145,2010-02-03,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,1.6421,0.0880.0840.0880.0860.0880.0880.0880.0860.0880...,243.050242.160243.100242.840243.640243.430243....,6.6006.6006.6006.6006.6006.6006.6006.6006.6006...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,19.00019.00018.00019.00019.00019.00019.00018.0...
1146,2010-02-04,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,1.3601,0.0000.0880.0780.0820.0740.0800.0740.0820.0760...,243.070243.440241.700241.110240.090240.240240....,5.6006.0005.8006.0005.8005.8005.8005.8005.8005...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0001.0000.0000.0000.0000.0000.0001...,19.00019.00018.00019.00018.00018.00019.00018.0...


In [131]:
# Printing the lengths of the train and test sets to verify
print(f"Training set length: {len(train_df)} days")
print(f"Test set length: {len(test_df)} days")

Training set length: 1142 days
Test set length: 300 days
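The split is chronological: the test set is the tail of the series, so no future observation leaks into training. A small sketch on a synthetic 1442-day date range (the range is generated here for illustration and carries no measurements) reproduces the same lengths and boundary date.

```python
import pandas as pd

# Synthetic daily index with the same start date and length as daily_data
daily = pd.DataFrame({"ds": pd.date_range("2006-12-16", periods=1442, freq="D")})

cutoff = len(daily) - 300
train = daily.iloc[:cutoff]   # first x - 300 days
test = daily.iloc[cutoff:]    # last 300 days

print(len(train), len(test))
print(test["ds"].iloc[0].date(), test["ds"].iloc[-1].date())
```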


In [132]:
from prophet import Prophet

# Using Facebook Prophet to train a univariate time series model
# with the time column ('ds') and Global_active_power ('y')

# Training the Prophet model
model = Prophet()
model.fit(train_df)

INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmpudg5ef41/ke15hbyc.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmpudg5ef41/n4f5t3_2.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=10152', 'data', 'file=/tmp/tmpudg5ef41/ke15hbyc.json', 'init=/tmp/tmpudg5ef41/n4f5t3_2.json', 'output', 'file=/tmp/tmpudg5ef41/prophet_modelnbo6zn87/prophet_model-20240618204710.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
20:47:10 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
20:47:10 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


<prophet.forecaster.Prophet at 0x7c4d28169750>

In [133]:
# Making future DataFrame for prediction
future = model.make_future_dataframe(periods=300)
forecast = model.predict(future)


In [134]:
# Extracting the forecasted values for the test period
forecast_test = forecast[['ds', 'yhat']].iloc[-300:]


In [135]:
# Merge the actual and predicted values
test_df = test_df.set_index('ds').join(forecast_test.set_index('ds'))

In [136]:
# Dropping rows with missing values
test_df = test_df.dropna()

In [137]:
#Evaluating the model performance
from sklearn.metrics import mean_absolute_error, mean_squared_error,mean_absolute_percentage_error

mae = mean_absolute_error(test_df['y'], test_df['yhat'])
mse = mean_squared_error(test_df['y'], test_df['yhat'])
mape= mean_absolute_percentage_error(test_df['y'], test_df['yhat'])
rmse = mse ** 0.5

print(f"Mean Absolute Error: {mae}")
print(f"Mean Squared Error: {mse}")
print(f"Root Mean Squared Error: {rmse}")
print(f"Mean Absolute Percentage Error: {mape}")

Mean Absolute Error: 0.4247229008931635
Mean Squared Error: 0.4636532933920061
Root Mean Squared Error: 0.6809209156664275
Mean Absolute Percentage Error: 0.6739842256759565
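For reference, the four metrics can be written out directly with NumPy. This is a sketch on toy arrays, not on the notebook's data; note that MAPE is expressed as a fraction here, which is also how scikit-learn's `mean_absolute_percentage_error` reports it, so 0.67 corresponds to 67%.

```python
import numpy as np

# Toy values for illustration only
y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([2.0, 2.0, 2.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))                    # mean absolute error
mse = np.mean(errors ** 2)                       # mean squared error
rmse = np.sqrt(mse)                              # root of this model's own MSE
mape = np.mean(np.abs(errors) / np.abs(y_true))  # fraction, not percent

print(mae, mse, rmse, mape)
```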


In [138]:
#Preparing the additional regressors
extra_regressors = daily_data[['ds','Global_reactive_power', 'Voltage', 'Global_intensity', 'Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3']].copy()
extra_regressors.columns = ['ds','add1', 'add2', 'add3', 'add4', 'add5', 'add6']

In [139]:
# Renaming the regressor columns to add1 ... add6
# (repeats the assignment made in the previous cell)
extra_regressors.columns = ['ds'] + ['add' + str(i) for i in range(1, 7)]

In [140]:
for col in extra_regressors.columns[1:]:  # Start from the second column as the first is 'ds'
    extra_regressors[col] = pd.to_numeric(extra_regressors[col], errors='coerce') # 'coerce' will replace non-numeric values with NaN

extra_regressors.columns

Index(['ds', 'add1', 'add2', 'add3', 'add4', 'add5', 'add6'], dtype='object')

In [141]:
print('y' in train_df.columns)

True


In [142]:
extra_regressors=extra_regressors.dropna()
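Because the daily regressor columns hold concatenated strings, `pd.to_numeric(..., errors='coerce')` turns them into NaN, and `dropna()` then removes those rows. The sketch below, on an illustrative two-day frame, traces what a later left merge followed by `fillna(0)` would produce in that situation.

```python
import pandas as pd

days = pd.to_datetime(["2006-12-16", "2006-12-17"])

# Concatenated strings, in the style of the daily_data columns
regs = pd.DataFrame({"ds": days, "add1": ["0.4180.436", "0.1520.220"]})

# Not valid numbers, so both values are coerced to NaN
regs["add1"] = pd.to_numeric(regs["add1"], errors="coerce")
kept = regs.dropna()   # every row contained NaN, so nothing is kept

# A left merge finds no matching regressor rows; fillna(0) zeroes them
train = pd.DataFrame({"ds": days, "y": [4.2165, 1.0441]})
merged = train.merge(kept, on="ds", how="left").fillna(0)

print(len(kept))
print(merged)
```

A regressor that is zero on every row carries no information, which would be consistent with the `add1` values of 0.0 in the forecast output further down.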

In [143]:
#Training the Prophet model with additional regressors

# Merging training data with extra regressors using left join
merged_df = train_df.merge(extra_regressors, on='ds', how='left')
merged_df.fillna(0, inplace=True)
print(merged_df.isna().sum())
merged_df.columns

ds                       0
Time                     0
y                        0
Global_reactive_power    0
Voltage                  0
Global_intensity         0
Sub_metering_1           0
Sub_metering_2           0
Sub_metering_3           0
add1                     0
add2                     0
add3                     0
add4                     0
add5                     0
add6                     0
dtype: int64


Index(['ds', 'Time', 'y', 'Global_reactive_power', 'Voltage',
       'Global_intensity', 'Sub_metering_1', 'Sub_metering_2',
       'Sub_metering_3', 'add1', 'add2', 'add3', 'add4', 'add5', 'add6'],
      dtype='object')

In [144]:
model1 = Prophet()
model1.add_regressor('add1')
model1.add_regressor('add2')
model1.add_regressor('add3')
model1.add_regressor('add4')
model1.add_regressor('add5')
model1.add_regressor('add6')

<prophet.forecaster.Prophet at 0x7c4d35e9cf40>

In [145]:
model1.fit(merged_df)

INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmpudg5ef41/pdnulrvi.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmpudg5ef41/bxivg7ts.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=36923', 'data', 'file=/tmp/tmpudg5ef41/pdnulrvi.json', 'init=/tmp/tmpudg5ef41/bxivg7ts.json', 'output', 'file=/tmp/tmpudg5ef41/prophet_modelqz9x47tk/prophet_model-20240618204711.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
20:47:11 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
20:47:11 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


<prophet.forecaster.Prophet at 0x7c4d35e9cf40>

In [146]:
# Making the future DataFrame for prediction
future1 = model1.make_future_dataframe(periods=300)
future1 = future1.merge(extra_regressors, on='ds', how='left')
# Note: this inspects `future` (the first model's frame), which has no regressor columns
print(future.isna().sum())
future1.fillna(0, inplace=True)

ds    0
dtype: int64


In [147]:
forecast1 = model1.predict(future1)
forecast1.head()

Unnamed: 0,ds,trend,yhat_lower,yhat_upper,trend_lower,trend_upper,add1,add1_lower,add1_upper,add2,...,weekly,weekly_lower,weekly_upper,yearly,yearly_lower,yearly_upper,multiplicative_terms,multiplicative_terms_lower,multiplicative_terms_upper,yhat
0,2006-12-16,1.013136,0.28159,2.169218,1.013136,1.013136,0.0,0.0,0.0,0.0,...,0.051846,0.051846,0.051846,0.144746,0.144746,0.144746,0.0,0.0,0.0,1.209728
1,2006-12-17,1.012571,0.744743,2.722242,1.012571,1.012571,0.0,0.0,0.0,0.0,...,0.530965,0.530965,0.530965,0.16134,0.16134,0.16134,0.0,0.0,0.0,1.704876
2,2006-12-18,1.012006,0.062874,2.058742,1.012006,1.012006,0.0,0.0,0.0,0.0,...,-0.129341,-0.129341,-0.129341,0.177712,0.177712,0.177712,0.0,0.0,0.0,1.060377
3,2006-12-19,1.011441,0.139311,2.119687,1.011441,1.011441,0.0,0.0,0.0,0.0,...,-0.069447,-0.069447,-0.069447,0.193467,0.193467,0.193467,0.0,0.0,0.0,1.135462
4,2006-12-20,1.010877,0.070934,1.984959,1.010877,1.010877,0.0,0.0,0.0,0.0,...,-0.188179,-0.188179,-0.188179,0.20822,0.20822,0.20822,0.0,0.0,0.0,1.030917


In [148]:
#Extracting the forecasted values for the test period
forecast_test1 = forecast1[['ds', 'yhat']].iloc[-300:]
forecast1.columns

Index(['ds', 'trend', 'yhat_lower', 'yhat_upper', 'trend_lower', 'trend_upper',
       'add1', 'add1_lower', 'add1_upper', 'add2', 'add2_lower', 'add2_upper',
       'add3', 'add3_lower', 'add3_upper', 'add4', 'add4_lower', 'add4_upper',
       'add5', 'add5_lower', 'add5_upper', 'add6', 'add6_lower', 'add6_upper',
       'additive_terms', 'additive_terms_lower', 'additive_terms_upper',
       'extra_regressors_additive', 'extra_regressors_additive_lower',
       'extra_regressors_additive_upper', 'weekly', 'weekly_lower',
       'weekly_upper', 'yearly', 'yearly_lower', 'yearly_upper',
       'multiplicative_terms', 'multiplicative_terms_lower',
       'multiplicative_terms_upper', 'yhat'],
      dtype='object')

In [149]:
test_df.head()
test_df.reset_index(inplace=True)
test_df.head()

Unnamed: 0,ds,Time,y,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3,yhat
0,2010-01-31,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.27,0.0000.0000.0000.0000.0000.0000.0000.0000.1000...,244.330244.460244.010244.400244.320244.470244....,1.0001.2001.0001.2001.0001.2001.0001.4001.6001...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0001.0000.0000.0000.0000.0000.0001.0000.0000...,1.0000.0001.0001.0000.0001.0001.0000.0001.0001...,1.681964
1,2010-02-01,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.346,0.0000.0000.0000.0000.0000.0000.0000.0840.1140...,245.130245.290244.200244.660244.500245.790244....,1.4001.4001.4001.4001.4001.4001.4001.8001.8001...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,1.0001.0000.0001.0001.0000.0001.0001.0000.0001...,1.014897
2,2010-02-02,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,1.6861,0.0560.0480.0460.0480.0460.0000.0580.0580.0600...,245.190244.580245.320244.790244.870243.610243....,6.8006.8006.6006.6005.8005.8006.0006.0006.0006...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,19.00019.00019.00019.00019.00019.00019.00019.0...,1.078063
3,2010-02-03,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,1.6421,0.0880.0840.0880.0860.0880.0880.0880.0860.0880...,243.050242.160243.100242.840243.640243.430243....,6.6006.6006.6006.6006.6006.6006.6006.6006.6006...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,19.00019.00018.00019.00019.00019.00019.00018.0...,0.960979
4,2010-02-04,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,1.3601,0.0000.0880.0780.0820.0740.0800.0740.0820.0760...,243.070243.440241.700241.110240.090240.240240....,5.6006.0005.8006.0005.8005.8005.8005.8005.8005...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0001.0000.0000.0000.0000.0000.0001...,19.00019.00018.00019.00018.00018.00019.00018.0...,1.020618


In [150]:
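# test_df already holds the first model's yhat, so after the join 'yhat_actual' is the
# baseline model's prediction (not the observed value) and 'yhat_predicted' is model1's
test_df1 = test_df.set_index('ds').join(forecast_test1.set_index('ds'), how='left', lsuffix='_actual', rsuffix='_predicted')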

In [151]:
test_df1.head()

Unnamed: 0_level_0,Time,y,Global_reactive_power,Voltage,Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3,yhat_actual,yhat_predicted
ds,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
2010-01-31,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.27,0.0000.0000.0000.0000.0000.0000.0000.0000.1000...,244.330244.460244.010244.400244.320244.470244....,1.0001.2001.0001.2001.0001.2001.0001.4001.6001...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0001.0000.0000.0000.0000.0000.0001.0000.0000...,1.0000.0001.0001.0000.0001.0001.0000.0001.0001...,1.681964,1.670123
2010-02-01,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,0.346,0.0000.0000.0000.0000.0000.0000.0000.0840.1140...,245.130245.290244.200244.660244.500245.790244....,1.4001.4001.4001.4001.4001.4001.4001.8001.8001...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,1.0001.0000.0001.0001.0000.0001.0001.0000.0001...,1.014897,1.018099
2010-02-02,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,1.6861,0.0560.0480.0460.0480.0460.0000.0580.0580.0600...,245.190244.580245.320244.790244.870243.610243....,6.8006.8006.6006.6005.8005.8006.0006.0006.0006...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,19.00019.00019.00019.00019.00019.00019.00019.0...,1.078063,1.081362
2010-02-03,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,1.6421,0.0880.0840.0880.0860.0880.0880.0880.0860.0880...,243.050242.160243.100242.840243.640243.430243....,6.6006.6006.6006.6006.6006.6006.6006.6006.6006...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,19.00019.00018.00019.00019.00019.00019.00018.0...,0.960979,0.960858
2010-02-04,00:00:0000:01:0000:02:0000:03:0000:04:0000:05:...,1.3601,0.0000.0880.0780.0820.0740.0800.0740.0820.0760...,243.070243.440241.700241.110240.090240.240240....,5.6006.0005.8006.0005.8005.8005.8005.8005.8005...,0.0000.0000.0000.0000.0000.0000.0000.0000.0000...,0.0000.0000.0001.0000.0000.0000.0000.0000.0001...,19.00019.00018.00019.00018.00018.00019.00018.0...,1.020618,1.023851
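Both frames in this join carry a `yhat` column, which is why pandas needs suffixes to tell them apart. A minimal sketch using the first test row's values from the table above; the suffix names `_baseline` and `_regressors` are hypothetical alternatives chosen to describe which model produced each column.

```python
import pandas as pd

day = pd.to_datetime(["2010-01-31"])

# Left frame: observed y plus the first model's forecast
left = pd.DataFrame({"y": [0.27], "yhat": [1.681964]}, index=day)
# Right frame: the forecast from the model with regressors
right = pd.DataFrame({"yhat": [1.670123]}, index=day)

# Suffixes are applied only to the overlapping column name
joined = left.join(right, how="left", lsuffix="_baseline", rsuffix="_regressors")

print(joined.columns.tolist())
```

The observed value stays in `y`; both suffixed columns are predictions.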


In [152]:
#Evaluating the model performance
from sklearn.metrics import mean_absolute_error, mean_squared_error,mean_absolute_percentage_error

mae1 = mean_absolute_error(test_df1['y'], test_df1['yhat_predicted'])
mse1 = mean_squared_error(test_df1['y'], test_df1['yhat_predicted'])
mape1 = mean_absolute_percentage_error(test_df1['y'], test_df1['yhat_predicted'])
rmse1 = mse1 ** 0.5

print(f"Mean Absolute Error: {mae1:.4f}")
print(f"Mean Squared Error: {mse1:.4f}")
print(f"Root Mean Squared Error: {rmse1:.4f}")
print(f"Mean Absolute Percentage Error: {mape1:.4f}")

Mean Absolute Error: 0.4250
Mean Squared Error: 0.4643
Root Mean Squared Error: 0.6814
Mean Absolute Percentage Error: 0.6707
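The RMSE of each model is the square root of that model's own MSE. A minimal sketch recomputing both from the MSE values printed in the two evaluation cells; the variable names are illustrative.

```python
import math

# MSE values as printed by the two evaluation cells
mse_baseline = 0.4636532933920061     # Prophet without regressors
mse_regressors = 0.4643183373265933   # Prophet with add1 ... add6

rmse_baseline = math.sqrt(mse_baseline)
rmse_regressors = math.sqrt(mse_regressors)

print(round(rmse_baseline, 4), round(rmse_regressors, 4))  # 0.6809 0.6814
```

The two models differ only in the third decimal place, so the added regressors did not change the forecast error in any meaningful way.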
