# The `eensight` functionality for composing linear model features

Features for linear regression are composed by `LinearModelFeatures` in `eensight.features.compose`.

The models in `eensight` rely on their `LinearModelFeatures` composer to generate the regression's [design matrix](https://en.wikipedia.org/wiki/Design_matrix).
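
As a rough illustration of what a design matrix is, the sketch below one-hot encodes a categorical column and stacks it next to a numerical column using only NumPy. The toy arrays and variable names are made up for this example, and it does not reproduce how `LinearModelFeatures` builds its matrix.

```python
import numpy as np

# Toy inputs (hypothetical): a categorical feature and a numerical one.
months = np.array([1, 1, 2, 3, 2])
temperature = np.array([4.0, 6.5, 9.0, 14.0, 8.0])

# One-hot encode the categorical feature: one column per distinct month.
categories = np.unique(months)
month_onehot = (months[:, None] == categories[None, :]).astype(float)

# Stack the encoded blocks side by side to obtain the design matrix:
# one row per observation, one column per regression parameter.
design_matrix = np.column_stack([month_onehot, temperature])

print(design_matrix.shape)
```

Each row of `month_onehot` contains exactly one 1, so the three month columns together play the role of a month-specific intercept.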

A `LinearModelFeatures` composer can get information about features and encoders either from the application's data catalog or through the composer's own API.

In [1]:
%load_ext autoreload
%autoreload 2

In [60]:
import calendar
import numpy as np
import pandas as pd

In [3]:
from eensight.utils.jupyter import load_catalog
from eensight.features.compose import LinearModelFeatures
from eensight.models.seasonal import SeasonalDecomposer

## Using the data catalog

The model details are read from a YAML file:

```yaml
add_features:
  time:
    feature: null 
    type: datetime
    remainder: passthrough
    subset: month, hourofweek

regressors:
  month:
    feature: month
    type: categorical
    max_n_categories: null 
    stratify_by: null 
    excluded_categories: null 
    encode_as: onehot

  tow:
    feature: hourofweek
    type: categorical
    max_n_categories: null 
    stratify_by: null 
    excluded_categories: null 
    encode_as: onehot 
  
  spl_temperature:
    feature: temperature
    type: spline
    n_knots: 4
    degree: 2
    strategy: quantile
    extrapolation: constant
    interaction_only: true

interactions:
  tow, spl_temperature:
    tow:
      max_n_categories: 2
      stratify_by: null 
```
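
The `time` entry under `add_features` requests two derived columns, `month` and `hourofweek`. The sketch below shows what such columns could contain, using only the standard library. The hour-of-week convention (hours counted from Monday 00:00) and the helper `hour_of_week` are assumptions for this example; the exact definition used by `eensight` is not shown in this document.

```python
from datetime import datetime


def hour_of_week(ts: datetime) -> int:
    """Hours elapsed since Monday 00:00, in 0..167 (assumed convention)."""
    return ts.weekday() * 24 + ts.hour


timestamps = [
    datetime(2021, 3, 1, 0),   # a Monday, midnight
    datetime(2021, 3, 3, 15),  # a Wednesday, 15:00
    datetime(2021, 3, 7, 23),  # a Sunday, 23:00
]

# Build the two derived features for every timestamp.
features = [{"month": ts.month, "hourofweek": hour_of_week(ts)} for ts in timestamps]

for row in features:
    print(row)
```

Under this convention `hourofweek` takes up to 168 distinct values, which is why the YAML treats it as a categorical feature.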

In [30]:
catalog = load_catalog('demo', model='towt')

In [31]:
model_structure = catalog.load('model_structure')

In [33]:
composer = LinearModelFeatures(model_structure=model_structure)

The composer's `fit` method calls two methods, `_create_transformers` and `_create_encoders`. We can also call them directly to see what each one does:

In [34]:
composer._create_transformers()

The feature generators are applied in the same order that they were declared in the YAML configuration file.

In [None]:
for item in composer.transformers_:
    print(item)

In [36]:
composer._create_encoders()

In [None]:
for name, encoder in composer.encoders_['main_effects'].items():
    print('--->', name)
    print(encoder)

In [None]:
for pair_name, encoder in composer.encoders_['interactions'].items():
    print('--->', pair_name)
    print(encoder)

We can load some data from the catalog:

In [39]:
train_input = catalog.load('train.root_input')

`train_input` is a partitioned dataset. We only care about the temperature dataset for now:

In [40]:
load_data = train_input['temperature']
data = load_data()

# Add dummy consumption data
data['consumption'] = 10 * data['temperature'] + np.random.randn(len(data))

# Set time index
data['timestamp'] = pd.to_datetime(data['timestamp'])
data = data.drop_duplicates(subset=['timestamp'])
data = data.set_index('timestamp')

In [41]:
X = data[['temperature']]
y = data[['consumption']]

In [42]:
composer = LinearModelFeatures(model_structure=model_structure)
composer = composer.fit(X, y)

After fitting, a composer has a `component_names_` attribute:

In [None]:
composer.component_names_

It also has a `component_matrix` attribute that shows how the different columns of the design matrix correspond to the different components. This allows us to break down a model's prediction into the additive contribution of each component.

In [None]:
composer.component_matrix
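
To make the additive breakdown concrete, here is a minimal NumPy sketch. It assumes a 0/1 membership matrix with one row per design-matrix column and one column per component; the actual layout of `component_matrix` may differ, and the arrays below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix (6 samples, 5 columns) and fitted coefficients.
X_design = rng.normal(size=(6, 5))
coef = np.array([0.5, -1.0, 2.0, 0.3, 1.5])

# Assumed layout: rows are design-matrix columns, columns are components,
# and a 1 marks that the column belongs to the component.
membership = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [0, 1],
])

# Contribution of a component = sum over its columns of (column * coefficient).
contributions = (X_design * coef) @ membership

# Because every column belongs to exactly one component, the contributions
# add up to the full linear prediction.
prediction = X_design @ coef
print(np.allclose(contributions.sum(axis=1), prediction))
```

The result has one column per component, so each row splits a single prediction into its per-component parts.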

In [45]:
design_matrix = composer.transform(X)

In [46]:
assert design_matrix.shape[0] == X.shape[0]
assert design_matrix.shape[1] == composer.n_parameters

## Using the API

In [47]:
composer = LinearModelFeatures()

Feature generators can be added using the `add_new_feature` method. 

The feature generators are applied in the same order that they were added.

In [48]:
composer = composer.add_new_feature(name='time', enc_type='datetime', 
                                    remainder='passthrough', 
                                    subset=['month', 'hourofweek'])

Main effects can be added using the `add_main_effect` method:

In [49]:
composer = composer.add_main_effect(name='tow', enc_type='categorical', feature='hourofweek')

Interactions can be added using the `add_interaction` method:

In [50]:
composer = composer.add_interaction(left_name='tow', right_name='spl_temperature',
                                    left_enc_type='categorical', right_enc_type='spline',
                                    left_feature='hourofweek', right_feature='temperature',
                                    tow=dict(
                                        min_samples_leaf=20,
                                        max_n_categories=2
                                    ), 
                                    spl_temperature=dict(
                                        degree=2,
                                        n_knots=4
                                    )
)
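
One common way to form an interaction between a one-hot encoded categorical feature and a set of numerical basis columns is to multiply every indicator column with every basis column, row by row. The sketch below illustrates that idea with NumPy. It is not taken from the `eensight` source, and a simple polynomial basis stands in for a real spline basis.

```python
import numpy as np

# One-hot block for a categorical feature with 2 categories (4 samples).
onehot = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)

# Stand-in numerical basis with 3 columns (a spline basis would go here).
x = np.array([0.0, 1.0, 2.0, 3.0])
basis = np.column_stack([np.ones_like(x), x, x ** 2])

# Row-wise product of every one-hot column with every basis column.
interaction = (onehot[:, :, None] * basis[:, None, :]).reshape(len(x), -1)

print(interaction.shape)
```

Each sample activates only the basis columns of its own category, so a linear model fitted on `interaction` learns a separate curve per category.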

In [51]:
composer = composer.fit(X, y)

In [None]:
for item in composer.transformers_:
    print(item)

In [None]:
for name, encoder in composer.encoders_['main_effects'].items():
    print('--->', name)
    print(encoder)

In [None]:
for pair_name, encoder in composer.encoders_['interactions'].items():
    print('--->', pair_name)
    print(encoder)

An example of using the `LinearModelFeatures` API can be found in `eensight.models.seasonal.SeasonalDecomposer`:

### `eensight.models.seasonal.SeasonalDecomposer` 

Seasonal decomposition model for time series data.
    
    Parameters
    ----------
    feature : str
        The name of the time series feature to decompose.
    dt : str, default=None
        The name of the input dataframe's column that contains datetime information.
        If None, it is assumed that the datetime information is provided by the
        input dataframe's index.
    add_trend : bool, default=False
        If True, a linear time trend will be added.
    yearly_seasonality : str, bool or int, default='auto'
        Fit yearly seasonality. Can be 'auto', True, False, or a number of
        Fourier terms to generate.
    weekly_seasonality : str, bool or int, default='auto'
        Fit weekly seasonality. Can be 'auto', True, False, or a number of
        Fourier terms to generate.
    daily_seasonality : str, bool or int, default='auto'
        Fit daily seasonality. Can be 'auto', True, False, or a number of
        Fourier terms to generate.
    alpha : float, default=1
        Parameter for the underlying ridge estimator. It must be a positive float.
        Regularization improves the conditioning of the problem and reduces the
        variance of the estimates. Larger values specify stronger regularization.

In [62]:
model = SeasonalDecomposer(
    'consumption',
    add_trend=True,
    yearly_seasonality="auto",
    weekly_seasonality=False,
    daily_seasonality=False,
)

We can add a different daily seasonality for each day of the week:

In [63]:
dates = data.index.to_series()
columns_before = data.columns
X = data[['consumption']].copy()

X["dayofweek"] = dates.dt.dayofweek.map(lambda x: calendar.day_abbr[x])
X = X.merge(pd.get_dummies(X["dayofweek"]), left_index=True, right_index=True).drop(
    "dayofweek", axis=1
)

for i in range(7):
    day = calendar.day_abbr[i]
    model.add_seasonality(
        f"daily_on_{day}", period=1, fourier_order=4, condition_name=day
    )
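
The idea behind a conditional seasonality can be sketched with NumPy: generate sine and cosine terms for a given period and Fourier order, then zero them out wherever the condition column is 0. This is an assumption about the general technique (it mirrors the convention popularised by Prophet-style models), not a copy of what `add_seasonality` does internally, and `fourier_terms` is a hypothetical helper.

```python
import numpy as np


def fourier_terms(t_days, period, order):
    """Return sin/cos pairs for harmonics 1..order; shape (n, 2 * order)."""
    t = np.asarray(t_days, dtype=float)
    harmonics = np.arange(1, order + 1)
    angles = 2.0 * np.pi * np.outer(t, harmonics) / period
    return np.hstack([np.sin(angles), np.cos(angles)])


# Hourly samples covering two days, expressed in days since the start.
t_days = np.arange(48) / 24.0

# Hypothetical indicator column: 1 during the first day, 0 during the second.
is_first_day = (t_days < 1.0).astype(float)

# Daily seasonality (period of 1 day) with 4 harmonics.
daily = fourier_terms(t_days, period=1, order=4)

# Conditional version: active only where the indicator equals 1.
conditional = daily * is_first_day[:, None]

print(daily.shape, conditional.shape)
```

With one such masked block per day of the week, the model can fit a different daily profile for each day.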

In [None]:
X.head()

In [65]:
model = model.fit(X)

In [None]:
for item in model.composer_.transformers_:
    print(item)

In [None]:
for name, encoder in model.composer_.encoders_['main_effects'].items():
    print('--->', name)
    print(encoder)

In [None]:
for pair_name, encoder in model.composer_.encoders_['interactions'].items():
    print('--->', pair_name)
    print(encoder)

In [None]:
model.composer_.component_matrix