# Multi-variable linear regression

Multivariable linear regression is used to model a single variable, typically an energy consumption, which we call the *dependent* variable y. The model is a linear combination of *explanatory* (or *independent*) variables x, such as weather measurements or building occupancy. More information can be found on <a href="https://en.wikipedia.org/wiki/Linear_regression" target="_blank">Wikipedia</a>.
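To make "linear combination of explanatory variables" concrete, here is a minimal sketch of an ordinary least-squares fit with NumPy. The data are synthetic and the variable names (`hdd`, `occupancy`) are illustrative assumptions; this is not the opengrid implementation, only the underlying idea: find the intercept and coefficients that minimise the squared difference between the model and y.

```python
import numpy as np

# Synthetic, hypothetical data: monthly gas use driven by
# heating degree-days (HDD) and occupancy.
rng = np.random.default_rng(0)
n = 12
hdd = rng.uniform(0, 400, size=n)          # heating degree-days per month
occupancy = rng.uniform(50, 100, size=n)   # e.g. average number of occupants
noise = rng.normal(0, 5, size=n)
# "True" model used to generate the data: y = 200 + 3*HDD + 1.5*occupancy
y = 200 + 3.0 * hdd + 1.5 * occupancy + noise

# Design matrix: a column of ones for the intercept, then one column per x.
X = np.column_stack([np.ones(n), hdd, occupancy])

# Ordinary least squares: find coef minimising ||y - X @ coef||^2.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b_hdd, b_occ = coef
print(f"fitted: y = {intercept:.1f} + {b_hdd:.2f}*HDD + {b_occ:.2f}*occupancy")
```

The fitted coefficients should land close to the values used to generate the data (3.0 per degree-day and 1.5 per occupant), with the gap caused by the added noise.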

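The model is static: it assumes that the dependent variable in a given period depends only on the explanatory variables in that same period, not on earlier periods. The data set should therefore not contain dynamic effects. For buildings, dynamic effects are mostly negligible on a weekly basis unless the building has a very high thermal inertia.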
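Aggregating the data to a coarser time step is the usual way to keep such dynamic effects out of the data set. The sketch below sums hypothetical hourly values into weekly totals with pandas; the constant 500 Wh per hour is made-up data chosen so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly gas consumption (Wh per hour) for four full weeks,
# starting on Monday 2016-01-04.
index = pd.date_range("2016-01-04", periods=4 * 7 * 24, freq="h")
hourly = pd.Series(np.full(len(index), 500.0), index=index)

# Sum hourly values into weekly totals. pandas' 'W' frequency means
# weeks ending on Sunday, labelled with that Sunday's date.
weekly = hourly.resample("W").sum()
print(weekly)
```

Each week then holds 7 x 24 x 500 = 84000 Wh.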

A typical use of this analysis is to create a model of, e.g., the gas consumption of a building, and then use this model to detect and quantify changes in that consumption. For example, the savings resulting from a new gas boiler can be computed as the difference between the consumption predicted by the model and the actual consumption.
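The savings calculation itself is a simple difference. A minimal sketch with hypothetical monthly figures, where `predicted` stands for the model's baseline (trained before the retrofit) and `actual` for the measured consumption afterwards:

```python
import pandas as pd

# Hypothetical monthly values in kWh after a boiler replacement.
idx = pd.date_range("2016-07-01", periods=6, freq="MS")
predicted = pd.Series([900.0, 850.0, 1000.0, 1800.0, 2600.0, 3200.0], index=idx)
actual = pd.Series([820.0, 790.0, 910.0, 1620.0, 2350.0, 2900.0], index=idx)

# Savings per month = predicted (baseline) minus measured consumption.
savings = predicted - actual
total_savings = savings.sum()
relative_savings = total_savings / predicted.sum()
print(savings)
print(f"Total: {total_savings:.0f} kWh ({relative_savings:.1%})")
```

With these made-up numbers the total saving is 960 kWh, about 9.3% of the predicted consumption.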


### Imports and loading data

```python
import opengrid as og
import pandas as pd
```

```python
df = og.datasets.get('gas_2016_hour')
# for this demo, we only compute a model for the sensor 313b
df = df[['313b']]
# load the hourly weather data
dfw = og.datasets.get('weather_2016_hour')
```

### Compute degree-days

First, we compute heating and cooling degree-days for a range of base temperatures. More information on the computation of degree-days can be found in [this demo](https://opengridcc.github.io/analysis/degree-days).
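As a rough illustration of what a degree-day is, the sketch below uses the simple daily-mean method: heating degree-days are how far the daily mean temperature falls below the base temperature, and cooling degree-days how far it rises above it. The helper names `heating_degree_days` and `cooling_degree_days` and the `HDD_<base>` column names are hypothetical, and `og.library.weather.compute_degree_days` may use a different method or naming.

```python
import pandas as pd

def heating_degree_days(daily_mean_temp: pd.Series, base: float) -> pd.Series:
    """Hypothetical helper: degrees by which the daily mean is below `base` (0 otherwise)."""
    return (base - daily_mean_temp).clip(lower=0)

def cooling_degree_days(daily_mean_temp: pd.Series, base: float) -> pd.Series:
    """Hypothetical helper: degrees by which the daily mean is above `base` (0 otherwise)."""
    return (daily_mean_temp - base).clip(lower=0)

# Four days of made-up daily mean temperatures in degrees Celsius.
temps = pd.Series([2.0, 10.0, 16.5, 21.0],
                  index=pd.date_range("2016-01-01", periods=4, freq="D"))

# One column per heating base temperature, mirroring range(8, 18, 2).
hdd = pd.DataFrame({f"HDD_{b}": heating_degree_days(temps, b) for b in range(8, 18, 2)})
cdd18 = cooling_degree_days(temps, base=18)
print(hdd)
```

Computing several base temperatures lets the regression pick the one that best matches the building's actual heating balance point.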

```python
%matplotlib inline
```

```python
# resample the weather data to daily means
dfw = dfw.resample('D').mean()
# compute heating and cooling degree-days for several base temperatures;
# .bfill() fills missing values with the next valid value
dfw_HDD = og.library.weather.compute_degree_days(ts=dfw['temperature'],
                                                  heating_base_temperatures=range(8, 18, 2),
                                                  cooling_base_temperatures=range(16, 26, 2)).bfill()
```

```python
# resample the gas consumption to daily values and add the weather data and the degree-days
df_day = df.resample('D').sum()/1000. # kWh/day
df_day = pd.concat([df_day, dfw, dfw_HDD], axis=1).loc['2016']
```

### Create a monthly model for the gas consumption

```python
# resample to monthly data (labelled with the first day of each month)
df_month = df_day.resample('MS').sum()
# use the first half of 2016 (January up to and including June) as training data;
# the model will then be used to predict the second half of 2016
df_training = df_month.loc[:'2016-06-01']
```
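The idea of training on one period and predicting another can be sketched with plain NumPy least squares on synthetic, noise-free data. This is not what `og.MultiVarLinReg` does internally (it may, for example, select variables automatically); the names `demo_month`, `demo_train` and `demo_test` are hypothetical and chosen so they do not overwrite the notebook's own variables.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly heating degree-days and gas use (kWh), noise-free for clarity.
idx = pd.date_range("2016-01-01", periods=12, freq="MS")
hdd = pd.Series([450, 380, 330, 200, 90, 20, 5, 10, 60, 180, 320, 420],
                index=idx, dtype=float)
demo_month = pd.DataFrame({"HDD": hdd, "gas": 500 + 4.0 * hdd})

# Label-based slicing on a DatetimeIndex includes the end label,
# so the training set runs from January up to and including June.
demo_train = demo_month.loc[:"2016-06-01"]
demo_test = demo_month.loc["2016-07-01":]

# Fit gas = b0 + b1 * HDD on the training months only (ordinary least squares).
X_train = np.column_stack([np.ones(len(demo_train)), demo_train["HDD"].to_numpy()])
coef, *_ = np.linalg.lstsq(X_train, demo_train["gas"].to_numpy(), rcond=None)

# Apply the fitted coefficients to the second half of the year.
X_test = np.column_stack([np.ones(len(demo_test)), demo_test["HDD"].to_numpy()])
demo_predicted = pd.Series(X_test @ coef, index=demo_test.index)
print(coef)
```

Because the synthetic data contain no noise, the fit recovers the intercept of 500 kWh and slope of 4 kWh per degree-day, and the predictions for July to December match the generated values exactly.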

```python
# create the model
mvlr = og.MultiVarLinReg(df_training, y='313b')
mvlr.do_analysis()
# now use the entire year for making a plot (will also make the predictions)
mvlr.plot(df=df_month)
```

```python
# we can also obtain some statistical info about the obtained model
print(mvlr.fit.summary())
```
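Assuming `mvlr.fit` is a statsmodels regression results object, its summary includes, among others, R-squared and adjusted R-squared. A minimal sketch of how these two statistics follow from observed and predicted values; the helper `r_squared` and the numbers are hypothetical.

```python
import numpy as np

def r_squared(y, y_hat, n_params):
    """Hypothetical helper: return (R^2, adjusted R^2).

    n_params is the number of explanatory variables, excluding the intercept.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
    r2 = 1.0 - ss_res / ss_tot
    n = len(y)
    # Adjusted R^2 penalises extra explanatory variables.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
    return r2, adj_r2

y = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
y_hat = [10.5, 11.5, 15.5, 18.5, 24.5, 29.5]
r2, adj = r_squared(y, y_hat, n_params=1)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}")
```

Adjusted R-squared is always a bit lower than R-squared; the gap grows with the number of explanatory variables relative to the number of data points, which matters for a monthly model trained on only six months.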