# Multi-variable linear regression

Multivariable linear regression is used to create a model of a single variable, typically an energy consumption. We call this the *dependent* variable. The model is constructed as a linear combination of *explanatory* variables, such as weather measurements or occupancy. More information can be found on <a href="https://en.wikipedia.org/wiki/Linear_regression" target="_blank">Wikipedia</a>.
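
As a minimal sketch of what "a linear combination of explanatory variables" means, the snippet below fits `gas ≈ b0 + b1*hdd + b2*occupancy` by ordinary least squares with NumPy. The data and the names `hdd`, `occupancy` and `gas` are invented for illustration. This is not the OpenGrid implementation, only the underlying idea.

```python
import numpy as np

# Synthetic explanatory variables (hypothetical values, for illustration only)
hdd = np.array([10.0, 8.0, 5.0, 2.0, 0.0, 1.0, 4.0, 9.0])       # heating degree-days
occupancy = np.array([5.0, 5.0, 4.0, 5.0, 2.0, 3.0, 5.0, 5.0])  # occupied days

# Synthetic dependent variable built from known coefficients, without noise
gas = 20.0 + 12.0 * hdd + 3.0 * occupancy

# Design matrix: a column of ones for the intercept plus one column per variable
X = np.column_stack([np.ones_like(hdd), hdd, occupancy])

# Ordinary least squares: coefficients that minimise ||X @ coef - gas||
coef, residuals, rank, _ = np.linalg.lstsq(X, gas, rcond=None)
intercept, b_hdd, b_occupancy = coef
print(intercept, b_hdd, b_occupancy)
```

Because the synthetic data contain no noise, the fit recovers the coefficients used to build `gas`. With real measurements the coefficients are estimates with uncertainty.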


The model is static, so the data set should not contain dynamic effects. For buildings, dynamic effects are mostly negligible on a weekly basis unless the building has a very high thermal inertia.

A typical use of this analysis is to create a model of, e.g., the gas consumption of a building, and then use this model to detect and quantify changes in the gas consumption. For example, the savings resulting from a new gas boiler can be computed as the difference between the consumption predicted by the model and the actual consumption.
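
The savings calculation can be sketched as follows, assuming a single explanatory variable and invented weekly numbers. The baseline model is fitted on data from before the intervention and then applied to the weather after it. All names and values here are hypothetical.

```python
import numpy as np

# Hypothetical baseline period (before the new boiler), weekly values
hdd_before = np.array([30.0, 25.0, 18.0, 10.0, 4.0, 12.0])
gas_before = np.array([700.0, 600.0, 460.0, 300.0, 180.0, 340.0])  # kWh/week

# Fit the baseline model: gas ≈ intercept + slope * hdd
slope, intercept = np.polyfit(hdd_before, gas_before, deg=1)

# Hypothetical reporting period (after the new boiler)
hdd_after = np.array([28.0, 20.0, 15.0, 6.0])
gas_after = np.array([590.0, 440.0, 350.0, 200.0])  # measured kWh/week

# Consumption the old installation would have had under the new weather
gas_predicted = intercept + slope * hdd_after

# Savings = predicted baseline consumption - actual consumption
savings = gas_predicted - gas_after
total_savings = savings.sum()
print(savings, total_savings)
```

A positive value means the building used less gas than the baseline model predicts for the same weather.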


### Imports and loading data

In [None]:
import opengrid as og
from opengrid.library import plotting as og_plot
import pandas as pd
from joule import meta, filter_meta, remove_outliers

In [None]:
plt = og.plot_style()

In [None]:
# load weather
dfw = pd.read_pickle('weather_leuven_daily.pkl')
dfw = dfw.filter(items=['cloudCover', 'dewPoint', 'humidity', 'precipIntensity', 'windSpeed', 'temperature'])
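# keep only the columns that can be converted to float;
# iterate over a copy of the column list because columns may be dropped in the loop
for col in list(dfw.columns):
    try:
        dfw[col] = dfw[col].apply(float)
    except (ValueError, TypeError):
        dfw.drop(columns=[col], inplace=True)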

### Compute degree-days

First we compute heating degree-days for different base temperatures. More information on the computation of degree-days can be found in [this demo](https://opengridcc.github.io/analysis/degree-days).
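
As a rough sketch of the common definition, the heating degree-days for one day are `max(base - T, 0)`, with `T` the daily mean temperature and `base` the base temperature. The helper `heating_degree_days` and the column names `HDD_8` ... `HDD_16` below are hypothetical. OpenGrid's `compute_degree_days` may differ in naming and in details.

```python
import pandas as pd

def heating_degree_days(temperature, base):
    """Return daily heating degree-days: max(base - temperature, 0)."""
    return (base - temperature).clip(lower=0)

# Hypothetical daily mean temperatures in degrees Celsius
temperature = pd.Series(
    [2.0, 9.5, 14.0, 18.0],
    index=pd.date_range("2017-01-01", periods=4, freq="D"),
)

# One column per base temperature, using the same range(8, 18, 2) as this notebook
hdd = pd.DataFrame(
    {"HDD_{}".format(base): heating_degree_days(temperature, base)
     for base in range(8, 18, 2)}
)
print(hdd)
```

Computing several base temperatures gives the regression a set of candidate explanatory variables to choose from.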

In [None]:
# resample weather data to daily values and compute degree-days
dfw = dfw.resample('D').mean()
dfw_HDD = og.library.weather.compute_degree_days(ts=dfw['temperature'],
                                                  heating_base_temperatures=range(8, 18, 2),
                                                  cooling_base_temperatures=range(16, 26, 2)).bfill()

### Create monthly and weekly models for the gas consumption

In [None]:
for building in meta['RecordNumber'].unique():
    record_name = filter_meta(RecordNumber=building)['RecordName'].iloc[0]
    print('\nResults for {}\n'.format(record_name))
    try:
        ts = pd.read_pickle('data/NaturalGas_{}.pkl'.format(building)).sum(axis=1)*4
        ts = ts[ts.index.year==2017]
        ts.name = record_name
    except Exception:
        print('Cannot load {}'.format(building))
    else:
        if ts.empty:
            print('No Data for {}'.format(building))
            continue

        ts_day = ts.resample('D').sum()/1000. # kWh/day
        # remove days with incomplete data
        ts_day = ts_day[ts_day.diff() > ts_day.mean()/1e6]
        ts_day = remove_outliers(ts_day)
        df_day = pd.concat([ts_day, dfw, dfw_HDD], axis=1).loc['2017']
        df_day = df_day.dropna()
        if df_day.empty:
            print('No data after cleaning for {}'.format(building))
            continue

        df_day[record_name].plot()
        plt.show()

        # resample to monthly data and plot 
        df_month = df_day.resample('MS').sum()
        # create the model
        mvlr = og.MultiVarLinReg(df_month, y=record_name)
        mvlr.do_analysis()

        print(mvlr.fit.summary())
        mvlr.plot()
        plt.show()
        
        # resample to weekly data and plot
        df_week = df_day.resample('W').sum()
        # create the model
        mvlr = og.MultiVarLinReg(df_week, y=record_name)
        mvlr.do_analysis()

        print(mvlr.fit.summary())
        mvlr.plot()
        plt.show()
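
The daily-to-monthly and daily-to-weekly aggregation used in the loop above can be tried in isolation. The sketch below uses an invented series of 1 kWh/day so that each sum equals the number of days in the bin.

```python
import pandas as pd

# Hypothetical daily consumption: 1 kWh/day for January and February 2017
daily = pd.Series(1.0, index=pd.date_range("2017-01-01", "2017-02-28", freq="D"))

# 'MS' labels each monthly bin with the first day of the month
monthly = daily.resample("MS").sum()

# 'W' creates weekly bins that end on Sunday and are labelled with that Sunday
weekly = daily.resample("W").sum()

print(monthly)
print(weekly)
```

Note that the first and last weekly bins can cover fewer than seven days (here one and two days), so their sums are not directly comparable to those of full weeks.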

        
        