MIT License

Copyright (c) Microsoft Corporation. All rights reserved.

This notebook is adapted from Francesca Lazzeri's Energy Demand Forecast Workbench workshop.

Copyright (c) 2021 PyLadies Amsterdam, Alyona Galyeva

# Linear regression with recursive feature elimination

In [34]:
%matplotlib inline
import os
import pickle
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import pandas as pd
import numpy as np
from azureml.core import Workspace, Dataset
from azureml.core.experiment import Experiment
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFECV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import mean_absolute_percentage_error

This notebook shows how to train a linear regression model to forecast future energy demand. Specifically, the model is trained to predict energy demand in period $t+1$, one hour ahead of the current period $t$. This is known as 'one-step' time series forecasting because we predict a single period into the future.
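To make the one-step setup concrete, here is a minimal sketch of how a target for period $t+1$ can be derived from an hourly series by shifting it back one step. The series values and the column name `load_t_plus_1` are made up for illustration; the processed dataset used in this notebook may already encode the target differently.

```python
import pandas as pd

# Hypothetical hourly load series; the values are invented for illustration.
idx = pd.date_range("2021-05-01 00:00", periods=5, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"load": [100.0, 110.0, 105.0, 120.0, 115.0]}, index=idx)

# The target for row t is the load observed one hour later, at t+1.
df["load_t_plus_1"] = df["load"].shift(-1)

# The last row has no future value to learn from, so it is dropped.
supervised = df.dropna()
print(supervised)
```

Each remaining row pairs the information available at time $t$ with the value the model should predict for $t+1$.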

In [3]:
WORKDIR = os.getcwd()
MODEL_NAME = "linear_regression"

In [5]:
ws = Workspace.from_config()

In [18]:
train_ds = Dataset.get_by_name(ws, name="capstone_data_processed")
print(train_ds.name, train_ds.version)

capstone_data_processed 2


In [23]:
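# Training window: every row up to and including 2021-05-31 00:00:00.
# The timestamp index is discarded afterwards because it is not a model feature.
train = (
    train_ds.to_pandas_dataframe()
    .set_index(['data_index_'])
    .loc[:'2021-05-31 00:00:00']
    .reset_index(drop=True)
)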

In [31]:
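# Test window: the week that starts at the training cutoff. Label slicing with .loc
# includes both endpoints, so a row stamped exactly 2021-05-31 00:00:00 falls in
# both train and test.
test = (
    train_ds.to_pandas_dataframe()
    .set_index(['data_index_'])
    .loc['2021-05-31 00:00:00':'2021-06-06 00:00:00']
    .reset_index(drop=True)
)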
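The two cells above split the data chronologically with label-based slices that share the same cutoff. As a small sketch on made-up data (the frame, `cutoff`, and `test_strict` are illustrative names, not part of the notebook), this shows that `.loc` label slices include both endpoints, and one possible way to keep the boundary row out of the test set:

```python
import pandas as pd

# Hypothetical hourly frame spanning the cutoff used above; values are invented.
idx = pd.date_range("2021-05-30 22:00", periods=5, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"load": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

cutoff = pd.Timestamp("2021-05-31 00:00:00")

# Label slices are inclusive, so the cutoff row lands in both pieces.
train_part = df.loc[:cutoff]
test_part = df.loc[cutoff:]
shared = train_part.index.intersection(test_part.index)
print(len(shared), "row(s) shared between train and test")

# One option: a strict comparison keeps the cutoff row in train only.
test_strict = df.loc[df.index > cutoff]
print(len(test_strict), "row(s) in the strict test split")
```

With a single shared row the effect on the reported error is likely small, but a strict boundary keeps the evaluation fully out of sample.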

Create the design matrix: each column represents a model feature and each row is a training example. The *load* column is the target, so it is dropped from the features; the timestamp index (*data_index_*) was already discarded above by `reset_index(drop=True)`.

In [32]:
# Fit on every column except the target, then predict the held-out week.
lr = LinearRegression()
lr.fit(train.drop(['load'], axis=1), train["load"])
y_pred = lr.predict(test.drop(['load'], axis=1))
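The cell above fits the regression on all available columns. The notebook title and imports point to recursive feature elimination, although no cell applies it. As a hedged sketch only, this is how `RFECV` could be combined with `TimeSeriesSplit` so that each validation fold comes after its training fold in time. The data is synthetic, and `n_splits=3` and the scoring choice are assumptions rather than settings taken from this notebook.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

# Synthetic data: only the first two of five features drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Recursively drop the weakest feature (by coefficient magnitude) and score
# each feature count with time-ordered cross-validation folds.
selector = RFECV(
    estimator=LinearRegression(),
    step=1,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_squared_error",
)
selector.fit(X, y)

selected = selector.support_  # boolean mask over the feature columns
print("Selected features:", np.flatnonzero(selected))
```

On the real data, the fitted `selector` could stand in for the plain `LinearRegression`, since it exposes `predict` on the reduced feature set.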

In [35]:
# MAPE on the held-out week; scikit-learn returns a fraction, not a percentage.
mean_absolute_percentage_error(test["load"], y_pred)

0.017433209773943137
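The score above is a fraction, so it corresponds to an error of roughly 1.7%. The mean absolute percentage error is $\frac{1}{n}\sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{|y_i|}$. A minimal hand-rolled version on made-up numbers is sketched below; the helper `mape` is illustrative and, unlike scikit-learn's implementation, does not guard against zero values in `y_true`.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Absolute error of each prediction relative to the true value, averaged.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

# Invented values: relative errors are 10%, 5% and 0%.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
score = mape(y_true, y_pred)
print(f"MAPE: {score:.3f} ({score:.1%})")
```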

In [36]:

# Serialize the fitted model to <WORKDIR>/linear_regression.pkl for later reuse.
with open(os.path.join(WORKDIR, MODEL_NAME + '.pkl'), 'wb') as f:
    pickle.dump(lr, f)
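To reuse the saved model, the pickle file can be read back with `pickle.load`. The sketch below runs the full round trip on a toy model in a temporary directory, so it does not touch the file written above; the data and the temporary path are assumptions for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy model standing in for the fitted regression; the data is invented.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "linear_regression.pkl")
    # Write the model, then read it back from disk.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions.
same = np.allclose(model.predict(X), restored.predict(X))
print("Predictions match after reload:", same)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.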