# Feature selection

Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons: to simplify models to make them easier to interpret, to reduce training time, to avoid the curse of dimensionality, to improve generalization by reducing overfitting (formally, variance reduction), and others.

Skforecast is compatible with the **feature selection methods** implemented in [scikit-learn](https://scikit-learn.org/stable/modules/feature_selection.html) and [feature-engine](https://feature-engine.trainindata.com/en/latest/api_doc/selection/index.html) libraries. There are several methods for feature selection, but the most common are: 

**Recursive feature elimination**

Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination ([RFE](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html)) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features, and the importance of each feature is obtained either by a specific attribute (such as `coef_`, `feature_importances_`) or by a `callable`. Then, the least important features are pruned from the current set of features. This procedure is repeated recursively on the pruned set until the desired number of features to select is eventually reached. [`RFECV`](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html#sklearn.feature_selection.RFECV) performs RFE in a cross-validation loop to find the optimal number of features.

**Sequential Feature Selection**

Sequential Feature Selection ([`SFS`](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SequentialFeatureSelector.html)) can be either forward or backward, with the `direction` parameter controlling whether forward or backward SFS is used.

+ **Forward-SFS** is a greedy procedure that iteratively finds the best new feature to add to the set of selected features. It starts with zero features and finds the one that maximizes a cross-validated score when an estimator is trained on that single feature. Once this first feature is selected, the procedure is repeated, adding one new feature to the set of selected features. The procedure stops when the desired number of selected features is reached, as determined by the `n_features_to_select` parameter.

+ **Backward-SFS** follows the same idea, but works in the opposite direction. Instead of starting with no features and greedily adding features, it starts with all features and greedily removes features from the set. 

In general, forward and backward selection do not produce equivalent results. Also, one can be much faster than the other depending on the requested number of selected features: if we have 10 features and ask for 7 selected features, forward selection would need to perform 7 iterations while backward selection would only need to perform 3.

[`SFS`](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SequentialFeatureSelector.html) differs does not require the underlying model to expose a `coef_` or `feature_importances_` attribute. However, it may be slower compared to the other approaches, considering that more models have to be evaluated. For example in backward selection, the iteration going from $m$ features to $m - 1$ features using k-fold cross-validation requires fitting $m * k$ models to be evaluated.

**Feature selection based on threshold (SelectFromModel)**

[`SelectFromModel`](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFromModel.html) can be used along with any estimator that has a `coef_` or `feature_importances_` attribute after fitting. Features are considered unimportant and removed, if the corresponding `coef_` or `feature_importances_` values are below the given `threshold` parameter. In addition to specifying the `threshold` numerically, there are built-in heuristics for finding a threshold using a string argument. Available heuristics are `'mean'`, `'median'` and float multiples of these, such as `'0.1*mean'`.

This method is very fast compared to the others because it does not require any additional model training. However, it does not evaluate the impact of feature removal on the model. It is often used for an initial selection before applying another more computationally expensive feature selection method.

**Minimum Redundancy Maximum Relevance (MRMR)**

Minimum Redundancy Maximum Relevance (MRMR) is a filter-based feature selection method that aims to identify a subset of features that are both highly relevant to the target variable and minimally redundant with respect to each other. Relevance is typically measured using mutual information between each feature and the target, while redundancy is assessed via the mutual information between pairs of features. By optimizing both criteria, mRMR helps reduce overfitting and improve model interpretability, especially in high-dimensional settings. The [`MRMR`](https://feature-engine.trainindata.com/en/latest/user_guide/selection/MRMR.html) class from [feature-engine](https://feature-engine.trainindata.com/en/latest/index.html) can be used to implement this method.

<div class="admonition note" name="html-admonition" style="background: rgba(0,191,191,.1); padding-top: 0px; padding-bottom: 6px; border-radius: 8px; border-left: 8px solid #00bfa5; border-color: #00bfa5; padding-left: 10px; padding-right: 10px;">

<p class="title">
    <i style="font-size: 18px; color:#00bfa5;"></i>
    <b style="color: #00bfa5;">&#128161 Tip</b>
</p>

Feature selection is a powerful tool for improving the performance of machine learning models. However, it is computationally expensive and can be time-consuming. Since the goal is to find the best subset of features, not the best model, it is not necessary to use the entire data set or a highly complex model. Instead, it is recommended to use a <b>small subset of the data and a simple model</b>. Once the best subset of features has been identified, the model can then be trained using the entire dataset and a more complex configuration.
<br><br>
For example, in this use case, the model is an <code>LGMBRegressor</code> with 900 trees and a maximum depth of 7. However, to find the best subset of features, only 100 trees and a maximum depth of 5 are used.

</div>

## Feature selection with skforecast

The `select_features` and `select_features_multiseries` functions can be used to select the best subset of features (autoregressive and exogenous variables). These functions are compatible with the feature selection methods implemented in the scikit-learn library. The available parameters are:

- `forecaster`: Forecaster of type `ForecasterRecursive`,  `ForecasterDirect`, `ForecasterRecursiveMultiSeries` or `ForecasterDirectMultiVariate`.

- `selector`: Feature selector from `sklearn.feature_selection`. For example, `RFE` or `RFECV`.

- `y` or `series`: Target time series to which the feature selection will be applied.

- `exog`: Exogenous variables.

- `select_only`: Decide what type of features to include in the selection process. 
        
    + If `'autoreg'`, only autoregressive features (lags and window features) are evaluated by the selector. All exogenous features are included in the output `selected_exog`.

    + If `'exog'`, only exogenous features are evaluated without the presence of autoregressive features. All autoregressive features are included in the outputs `selected_lags` and `selected_window_features`.

    + If `None`, all features are evaluated by the selector.

- `force_inclusion`: Features to force include in the final list of selected features.
        
    + If `list`, list of feature names to force include.
    
    + If `str`, regular expression to identify features to force include. For example, if `force_inclusion="^sun_"`, all features that begin with "sun_" will be included in the final list of selected features.

- `subsample`: Proportion of records to use for feature selection.

- `random_state`: Sets a seed for the random subsample so that the subsampling process is always deterministic.

- `verbose`: Print information about feature selection process.

These functions return three `list`:

- `selected_lags`: List of selected lags.

- `selected_window_features`: List of selected window features.

- `selected_exog`: List of selected exogenous features.

## Libraries and data

In [1]:
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.feature_selection import RFECV
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import StandardScaler
from feature_engine.selection import MRMR

from skforecast.datasets import fetch_dataset
from skforecast.preprocessing import RollingFeatures
from skforecast.recursive import ForecasterRecursive
from skforecast.recursive import ForecasterRecursiveMultiSeries
from skforecast.feature_selection import select_features
from skforecast.feature_selection import select_features_multiseries

In [2]:
# Download data
# ==============================================================================
data = fetch_dataset(name="bike_sharing_extended_features")
data.head(3)

bike_sharing_extended_features
------------------------------
Hourly usage of the bike share system in the city of Washington D.C. during the
years 2011 and 2012. In addition to the number of users per hour, the dataset
was enriched by introducing supplementary features. Addition includes calendar-
based variables (day of the week, hour of the day, month, etc.), indicators for
sunlight, incorporation of rolling temperature averages, and the creation of
polynomial features generated from variable pairs. All cyclic variables are
encoded using sine and cosine functions to ensure accurate representation.
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository.
https://doi.org/10.24432/C5W894.
Shape of the dataset: (17352, 90)


Unnamed: 0_level_0,users,weather,month_sin,month_cos,week_of_year_sin,week_of_year_cos,week_day_sin,week_day_cos,hour_day_sin,hour_day_cos,...,temp_roll_mean_1_day,temp_roll_mean_7_day,temp_roll_max_1_day,temp_roll_min_1_day,temp_roll_max_7_day,temp_roll_min_7_day,holiday_previous_day,holiday_next_day,temp,holiday
date_time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2011-01-08 00:00:00,25.0,mist,0.5,0.866025,0.120537,0.992709,-0.781832,0.62349,0.258819,0.965926,...,8.063334,10.127976,9.02,6.56,18.86,4.92,0.0,0.0,7.38,0.0
2011-01-08 01:00:00,16.0,mist,0.5,0.866025,0.120537,0.992709,-0.781832,0.62349,0.5,0.866025,...,8.029166,10.113334,9.02,6.56,18.86,4.92,0.0,0.0,7.38,0.0
2011-01-08 02:00:00,16.0,mist,0.5,0.866025,0.120537,0.992709,-0.781832,0.62349,0.707107,0.707107,...,7.995,10.103572,9.02,6.56,18.86,4.92,0.0,0.0,7.38,0.0


In [3]:
# Data selection (reduce data size to speed up the example)
# ==============================================================================
data = data.drop(columns="weather")
data = data.loc["2012-01-01 00:00:00":]

## Create forecaster

A forecasting model is created to predict the number of users using the last 48 values (last two days) and the exogenous features available in the dataset.

In [4]:
# Create forecaster
# ==============================================================================
window_features = RollingFeatures(
                      stats        = ['mean', 'mean', 'sum'],
                      window_sizes = [24, 48, 24]
                  )

forecaster = ForecasterRecursive(
                 regressor       = LGBMRegressor(
                                       n_estimators = 900,
                                       random_state = 15926,
                                       max_depth    = 7,
                                       verbose      = -1
                                   ),
                 lags            = 48,
                 window_features = window_features
             )

## Feature selection with Recursive Feature Elimination (RFECV)


### Selection of autoregressive and exogenous features

By default, the `select_features` function selects the best subset of autoregressive and exogenous features.

In [5]:
# Feature selection (autoregressive and exog) with scikit-learn RFECV
# ==============================================================================
import warnings
warnings.filterwarnings(
    "ignore",
    message="X does not have valid feature names, but .* was fitted with feature names"
)
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(estimator=regressor, step=1, cv=3, min_features_to_select=25)
selected_lags, selected_window_features, selected_exog = select_features(
    forecaster      = forecaster,
    selector        = selector,
    y               = data["users"],
    exog            = data.drop(columns="users"),
    select_only     = None,
    force_inclusion = None,
    subsample       = 0.5,
    random_state    = 123,
    verbose         = True,
)

Recursive feature elimination (RFECV)
-------------------------------------
Total number of records available: 8712
Total number of records used for feature selection: 4356
Number of features available: 139
    Lags            (n=48)
    Window features (n=3)
    Exog            (n=88)
Number of features selected: 58
    Lags            (n=36) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 34, 35, 36, 40, 41, 42, 44, 46, 47, 48]
    Window features (n=1) : ['roll_mean_24']
    Exog            (n=21) : ['hour_day_sin', 'hour_day_cos', 'poly_month_cos__week_of_year_sin', 'poly_month_cos__hour_day_cos', 'poly_week_of_year_sin__week_day_sin', 'poly_week_of_year_sin__week_day_cos', 'poly_week_of_year_sin__hour_day_sin', 'poly_week_of_year_sin__hour_day_cos', 'poly_week_of_year_sin__sunset_hour_cos', 'poly_week_of_year_cos__week_day_sin', 'poly_week_of_year_cos__week_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_c

Then, the Forecaster model is trained with the selected features. As the window features are generated with the `RollingFeatures` class, the selected window features must be included manually creating a new object.

In [6]:
# Train forecaster with selected features
# ==============================================================================
new_window_features = RollingFeatures(
                          stats        = ['mean'],
                          window_sizes = 24
                      )

forecaster = ForecasterRecursive(
                 regressor       = LGBMRegressor(
                                       n_estimators = 900,
                                       random_state = 15926,
                                       max_depth    = 7,
                                       verbose      = -1
                                   ),
                 lags            = selected_lags,
                 window_features = new_window_features
             )

forecaster.fit(y=data["users"], exog=data[selected_exog])

### Selection on a subset of features

+ If `select_only = 'autoreg'`, only autoregressive features (lags or custom predictors) are evaluated by the selector. All exogenous features are included in the output `selected_exog`.

+ If `select_only = 'exog'`, exogenous features are evaluated by the selector in the absence of autoregressive features. All autoregressive features are included in the outputs `selected_lags` and `selected_window_features`.

In [7]:
# Feature selection (only autoregressive) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(estimator=regressor, step=1, cv=3, min_features_to_select=25)

selected_lags, selected_window_features, selected_exog = select_features(
    forecaster  = forecaster,
    selector    = selector,
    y           = data["users"],
    exog        = data.drop(columns="users"),
    select_only = 'autoreg',
    subsample   = 0.5,
    verbose     = True,
)

Recursive feature elimination (RFECV)
-------------------------------------
Total number of records available: 8712
Total number of records used for feature selection: 4356
Number of features available: 125
    Lags            (n=36)
    Window features (n=1)
    Exog            (n=88)
Number of features selected: 33
    Lags            (n=33) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 22, 23, 24, 25, 26, 28, 29, 30, 32, 34, 35, 36, 40, 41, 42, 44, 46, 48]
    Window features (n=0) : []
    Exog            (n=88) : ['month_sin', 'month_cos', 'week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'sunrise_hour_sin', 'sunrise_hour_cos', 'sunset_hour_sin', 'sunset_hour_cos', 'poly_month_sin__month_cos', 'poly_month_sin__week_of_year_sin', 'poly_month_sin__week_of_year_cos', 'poly_month_sin__week_day_sin', 'poly_month_sin__week_day_cos', 'poly_month_sin__hour_day_sin', 'poly_month_sin__hour_day_cos', 'poly_month_sin__sunrise_hour_

In [8]:
# Check all exogenous features are selected
# ==============================================================================
len(selected_exog) == data.drop(columns="users").shape[1]

True

In [9]:
# Feature selection (only exog) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(estimator=regressor, step=1, cv=3, min_features_to_select=25)

selected_lags, selected_window_features, selected_exog = select_features(
    forecaster  = forecaster,
    selector    = selector,
    y           = data["users"],
    exog        = data.drop(columns="users"),
    select_only = 'exog',
    subsample   = 0.5,
    verbose     = True,
)

Recursive feature elimination (RFECV)
-------------------------------------
Total number of records available: 8712
Total number of records used for feature selection: 4356
Number of features available: 125
    Lags            (n=36)
    Window features (n=1)
    Exog            (n=88)
Number of features selected: 61
    Lags            (n=36) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 34, 35, 36, 40, 41, 42, 44, 46, 47, 48]
    Window features (n=1) : ['roll_mean_24']
    Exog            (n=61) : ['week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'poly_month_sin__week_of_year_sin', 'poly_month_sin__week_of_year_cos', 'poly_month_sin__week_day_sin', 'poly_month_sin__week_day_cos', 'poly_month_sin__hour_day_sin', 'poly_month_sin__hour_day_cos', 'poly_month_cos__week_of_year_sin', 'poly_month_cos__week_day_sin', 'poly_month_cos__week_day_cos', 'poly_month_cos__hour_day_sin', 'poly

In [10]:
# Check all autoregressive features are selected
# ==============================================================================
print("Same lags :", len(selected_lags) == len(forecaster.lags))
print("Same window features :", len(selected_window_features) == len(forecaster.window_features))

Same lags : True
Same window features : True


### Force selection of specific features

The `force_inclusion` argument can be used to force the selection of certain features. To illustrate this, a non-informative feature is added to the data set, `noise`. This feature contains no information about the target variable and therefore should not be selected by the feature selector. However, if we force the inclusion of this feature, it will be included in the final list of selected features.

In [11]:
# Add non-informative feature
# ==============================================================================
data['noise'] = np.random.normal(size=len(data))

In [12]:
# Feature selection (only exog) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(estimator=regressor, step=1, cv=3, min_features_to_select=10)

selected_lags, selected_window_features, selected_exog = select_features(
    forecaster      = forecaster,
    selector        = selector,
    y               = data["users"],
    exog            = data.drop(columns="users"),
    select_only     = 'exog',
    force_inclusion = ["noise"],
    subsample       = 0.5,
    verbose         = True,
)

Recursive feature elimination (RFECV)
-------------------------------------
Total number of records available: 8712
Total number of records used for feature selection: 4356
Number of features available: 126
    Lags            (n=36)
    Window features (n=1)
    Exog            (n=89)
Number of features selected: 78
    Lags            (n=36) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 34, 35, 36, 40, 41, 42, 44, 46, 47, 48]
    Window features (n=1) : ['roll_mean_24']
    Exog            (n=78) : ['month_sin', 'month_cos', 'week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'sunrise_hour_sin', 'poly_month_sin__week_of_year_sin', 'poly_month_sin__week_of_year_cos', 'poly_month_sin__week_day_sin', 'poly_month_sin__week_day_cos', 'poly_month_sin__hour_day_sin', 'poly_month_sin__hour_day_cos', 'poly_month_sin__sunrise_hour_cos', 'poly_month_sin__sunset_hour_sin', 'poly_month_sin__sun

In [13]:
# Check if "noise" is in selected_exog
# ==============================================================================
"noise" in selected_exog

True

## Feature selection with Sequential Feature Selection (SFS)

Sequential Feature Selection is a robust method for selecting features, but it is **computationally expensive**. When the data set is very large, one way to reduce the computational cost is to use a single validation split to evaluate each candidate model instead of cross-validation (default).

In [14]:
# Feature selection (only exog) with scikit-learn SequentialFeatureSelector
# ==============================================================================
regressor = LGBMRegressor(n_estimators=50, max_depth=3, random_state=15926, verbose=-1)
selector = SequentialFeatureSelector(
               estimator            = forecaster.regressor,
               n_features_to_select = 25,
               direction            = "forward",
               cv                   = ShuffleSplit(n_splits=1, test_size=0.3, random_state=951),
               scoring              = "neg_mean_absolute_error",
           )

selected_lags, selected_window_features, selected_exog = select_features(
    forecaster   = forecaster,
    selector     = selector,
    y            = data["users"],
    exog         = data.drop(columns="users"),
    select_only  = 'exog',
    subsample    = 0.2,
    random_state = 123,
    verbose      = True,
)

Recursive feature elimination (SequentialFeatureSelector)
---------------------------------------------------------
Total number of records available: 8712
Total number of records used for feature selection: 1742
Number of features available: 126
    Lags            (n=36)
    Window features (n=1)
    Exog            (n=89)
Number of features selected: 25
    Lags            (n=36) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 34, 35, 36, 40, 41, 42, 44, 46, 47, 48]
    Window features (n=1) : ['roll_mean_24']
    Exog            (n=25) : ['week_of_year_sin', 'week_day_sin', 'hour_day_sin', 'hour_day_cos', 'sunset_hour_sin', 'poly_month_sin__week_day_cos', 'poly_month_cos__week_of_year_cos', 'poly_month_cos__sunset_hour_cos', 'poly_week_of_year_sin__sunset_hour_sin', 'poly_week_of_year_cos__sunrise_hour_cos', 'poly_week_of_year_cos__sunset_hour_cos', 'poly_week_day_sin__week_day_cos', 'poly_week_day_sin__sunrise_hour_sin', 'poly_week

## Feature selection with Minimum Redundancy Maximum Relevance (MRMR)

Minimum Redundancy Maximum Relevance (MRMR) is a filter-based feature selection method that is **model-agnostic and fast to compute**, making it suitable for high-dimensional datasets. However, it relies on statistical criteria like mutual information rather than evaluating model performance directly, so it may be less tailored to a specific estimator compared to wrapper methods.

In [15]:
# Feature selection with feature-engine MRMR
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = MRMR(method="MIQ", max_features=25, regression=True, cv=3)

selected_lags, selected_window_features, selected_exog = select_features(
    forecaster   = forecaster,
    selector     = selector,
    y            = data["users"],
    exog         = data.drop(columns="users"),
    select_only  = None,
    subsample    = 0.5,
    random_state = 123,
    verbose      = True,
)

Recursive feature elimination (MRMR)
------------------------------------
Total number of records available: 8712
Total number of records used for feature selection: 4356
Number of features available: 126
    Lags            (n=36)
    Window features (n=1)
    Exog            (n=89)
Number of features selected: 25
    Lags            (n=17) : [1, 2, 3, 4, 5, 9, 11, 16, 22, 23, 24, 25, 26, 34, 46, 47, 48]
    Window features (n=0) : []
    Exog            (n=8) : ['hour_day_cos', 'poly_month_sin__month_cos', 'poly_week_day_cos__hour_day_sin', 'poly_week_day_cos__sunrise_hour_cos', 'holiday_previous_day', 'temp', 'holiday', 'noise']


## Combination of feature selection methods

Combining feature selection methods can help speed up the process. An effective approach is to first use `SelectFromModel` to eliminate the less important features, and then use `SequentialFeatureSelector` to determine the best subset of features from this reduced list. This two-step method often improves efficiency by focusing on the most important features.

In [16]:
# Feature selection (autoregressive and exog) with SelectFromModel + SequentialFeatureSelector
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)

# Step 1: Select the 70% most important features with SelectFromModel
selector_1 = SelectFromModel(
                 estimator    = regressor,
                 max_features = int(data.shape[1] * 0.7),
                 threshold    = -np.inf
             )
selected_lags_1, selected_window_features_1, selected_exog_1 = select_features(
    forecaster  = forecaster,
    selector    = selector_1,
    y           = data["users"],
    exog        = data.drop(columns="users"),
    select_only = None,
    subsample   = 0.2,
    verbose     = True,
)

Recursive feature elimination (SelectFromModel)
-----------------------------------------------
Total number of records available: 8712
Total number of records used for feature selection: 1742
Number of features available: 126
    Lags            (n=36)
    Window features (n=1)
    Exog            (n=89)
Number of features selected: 62
    Lags            (n=36) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 34, 35, 36, 40, 41, 42, 44, 46, 47, 48]
    Window features (n=1) : [np.str_('roll_mean_24')]
    Exog            (n=25) : [np.str_('week_of_year_sin'), np.str_('hour_day_sin'), np.str_('hour_day_cos'), np.str_('poly_month_sin__hour_day_sin'), np.str_('poly_month_cos__week_of_year_sin'), np.str_('poly_month_cos__week_day_sin'), np.str_('poly_month_cos__week_day_cos'), np.str_('poly_week_of_year_sin__week_day_sin'), np.str_('poly_week_of_year_sin__week_day_cos'), np.str_('poly_week_of_year_sin__hour_day_sin'), np.str_('poly_week_of

In [17]:
# Step 2: Select the 25 most important features with SequentialFeatureSelector
window_features_1 = RollingFeatures(stats=['mean'], window_sizes=24)
forecaster.set_lags(lags=selected_lags_1)
forecaster.set_window_features(window_features=window_features_1)

selector_2 = SequentialFeatureSelector(
                 estimator            = regressor,
                 n_features_to_select = 25,
                 direction            = "forward",
                 cv                   = ShuffleSplit(n_splits=1, test_size=0.3, random_state=951),
                 scoring              = "neg_mean_absolute_error",
             )

selected_lags, selected_window_features, selected_exog = select_features(
    forecaster  = forecaster,
    selector    = selector_2,
    y           = data["users"],
    exog        = data[selected_exog_1],
    select_only = None,
    subsample   = 0.2,
    verbose     = True,
)

Recursive feature elimination (SequentialFeatureSelector)
---------------------------------------------------------
Total number of records available: 8712
Total number of records used for feature selection: 1742
Number of features available: 62
    Lags            (n=36)
    Window features (n=1)
    Exog            (n=25)
Number of features selected: 25
    Lags            (n=15) : [1, 6, 8, 11, 16, 19, 24, 25, 28, 31, 32, 36, 41, 44, 48]
    Window features (n=0) : []
    Exog            (n=10) : ['hour_day_sin', 'hour_day_cos', 'poly_week_of_year_cos__week_day_sin', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_cos__hour_day_cos', 'poly_hour_day_sin__hour_day_cos', 'poly_hour_day_cos__sunset_hour_sin', 'temp_roll_mean_1_day', 'temp']


## Feature Selection in Global Forecasting Models

As with univariate forecasting models, feature selection can be applied to global forecasting models (multi-series). In this case, the `select_features_multiseries` function is used. This function has the same parameters as `select_features`, but the `y` parameter is replaced by `series`.

- `forecaster`: Forecaster of type `ForecasterRecursiveMultiSeries` or `ForecasterDirectMultiVariate`.

- `selector`: Feature selector from `sklearn.feature_selection`. For example, `RFE` or `RFECV`.

- `series`: Target time series to which the feature selection will be applied.

- `exog`: Exogenous variables.

- `select_only`: Decide what type of features to include in the selection process. 
        
    + If `'autoreg'`, only autoregressive features (lags or custom predictors) are evaluated by the selector. All exogenous features are included in the output `selected_exog`.

    + If `'exog'`, only exogenous features are evaluated without the presence of autoregressive features. All autoregressive features are included in the outputs `selected_lags` and `selected_window_features`.

    + If `None`, all features are evaluated by the selector.

- `force_inclusion`: Features to force include in the final list of selected features.
        
    + If `list`, list of feature names to force include.
    
    + If `str`, regular expression to identify features to force include. For example, if `force_inclusion="^sun_"`, all features that begin with "sun_" will be included in the final list of selected features.

- `subsample`: Proportion of records to use for feature selection.

- `random_state`: Sets a seed for the random subsample so that the subsampling process is always deterministic.

- `verbose`: Print information about feature selection process.

In [18]:
# Data
# ==============================================================================
data = fetch_dataset(name="items_sales")

items_sales
-----------
Simulated time series for the sales of 3 different items.
Simulated data.
Shape of the dataset: (1097, 3)


In [19]:
# Create exogenous features based on the calendar
# ==============================================================================
data["month"] = data.index.month
data["day_of_week"] = data.index.dayofweek
data["day_of_month"] = data.index.day
data["week_of_year"] = data.index.isocalendar().week
data["quarter"] = data.index.quarter
data["is_month_start"] = data.index.is_month_start.astype(int)
data["is_month_end"] = data.index.is_month_end.astype(int)
data["is_quarter_start"] = data.index.is_quarter_start.astype(int)
data["is_quarter_end"] = data.index.is_quarter_end.astype(int)
data["is_year_start"] = data.index.is_year_start.astype(int)
data["is_year_end"] = data.index.is_year_end.astype(int)
data.head()

Unnamed: 0_level_0,item_1,item_2,item_3,month,day_of_week,day_of_month,week_of_year,quarter,is_month_start,is_month_end,is_quarter_start,is_quarter_end,is_year_start,is_year_end
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
2012-01-01,8.253175,21.047727,19.429739,1,6,1,52,1,1,0,1,0,1,0
2012-01-02,22.777826,26.578125,28.009863,1,0,2,1,1,0,0,0,0,0,0
2012-01-03,27.549099,31.751042,32.078922,1,1,3,1,1,0,0,0,0,0,0
2012-01-04,25.895533,24.567708,27.252276,1,2,4,1,1,0,0,0,0,0,0
2012-01-05,21.379238,18.191667,20.357737,1,3,5,1,1,0,0,0,0,0,0


In [20]:
# Create forecaster
# ==============================================================================
forecaster = ForecasterRecursiveMultiSeries(
    regressor       = LGBMRegressor(n_estimators=900, random_state=159, max_depth=7, verbose=-1),
    lags            = 24,
    window_features = RollingFeatures(stats=['mean', 'mean', 'mean'], window_sizes=[24, 48, 72])
)

In [21]:
# Feature selection (autoregressive and exog) with scikit-learn RFECV
# ==============================================================================
series_columns = ["item_1", "item_2", "item_3"]
exog_columns = [col for col in data.columns if col not in series_columns]
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(estimator=regressor, step=1, cv=3, min_features_to_select=25)

selected_lags, selected_window_features, selected_exog = select_features_multiseries(
    forecaster      = forecaster,
    selector        = selector,
    series          = data[series_columns],
    exog            = data[exog_columns],
    select_only     = None,
    force_inclusion = None,
    subsample       = 0.5,
    random_state    = 123,
    verbose         = True,
)

Recursive feature elimination (RFECV)
-------------------------------------
Total number of records available: 3075
Total number of records used for feature selection: 1537
Number of features available: 38
    Lags            (n=24)
    Window features (n=3)
    Exog            (n=11)
Number of features selected: 28
    Lags            (n=24) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
    Window features (n=2) : ['roll_mean_24', 'roll_mean_72']
    Exog            (n=2) : ['day_of_week', 'week_of_year']


Once the best subset of features has been selected, the global forecasting model is trained with the selected features.

In [22]:
# Train forecaster with selected features
# ==============================================================================
new_window_features = RollingFeatures(stats=['mean', 'mean'], window_sizes=[24, 72])
forecaster.set_lags(lags=selected_lags)
forecaster.set_window_features(window_features=new_window_features)

forecaster.fit(series=data[series_columns], exog=data[selected_exog])
forecaster