# Feature Lagging

In time series forecasting, lagged features use values from preceding time steps as inputs for predicting future observations. The underlying assumption of time series analysis is that historical observations influence future ones.

By incorporating lag features, models can capture temporal dependencies and patterns inherent in the data, such as seasonality and trends. For instance, the previous month's sales figure can be a strong indicator of the current month's sales figure. Lagged features let forecasting models use this historical information, which can improve the accuracy and robustness of predictions. In essence, lag features connect past observations to future predictions, which makes them a staple of time series forecasting.

Even simple forecasting models can yield surprisingly robust results with lagged features, because they exploit the temporal structure of the data.
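
As a rough illustration of how simple such a model can be, the sketch below uses the lag-1 value directly as the forecast (often called a naive or persistence forecast) and scores it with mean absolute error. The five values are the first `number_sold` values from the data preview in this notebook; the names `naive_forecast` and `mae` are illustrative, not part of any library.

```python
import pandas as pd

# First five daily values of number_sold from the data preview
y = pd.Series(
    [801, 810, 818, 796, 808],
    index=pd.date_range("2010-01-01", periods=5, freq="D"),
    name="number_sold",
)

# Naive (persistence) forecast: predict each day with the previous day's value
naive_forecast = y.shift(1)

# The first day has no previous value, so it is excluded from the error
errors = (y - naive_forecast).dropna()
mae = errors.abs().mean()
print(f"MAE of the lag-1 forecast: {mae:.2f}")
```

A baseline like this is mainly useful as a yardstick: a more elaborate model should at least beat it.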

In [12]:
import pandas as pd
import numpy as np

import plotly.express as px

In [19]:
df = pd.read_csv('../data/train.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

df['store'] = df['store'].astype('category')
df['product'] = df['product'].astype('category')
df.head()

| Date | store | product | number_sold |
|---|---|---|---|
| 2010-01-01 | 0 | 0 | 801 |
| 2010-01-02 | 0 | 0 | 810 |
| 2010-01-03 | 0 | 0 | 818 |
| 2010-01-04 | 0 | 0 | 796 |
| 2010-01-05 | 0 | 0 | 808 |


In [24]:
df['number_sold'].head(1000).to_clipboard()

## Visualizing the lagged feature
Imagine a simple time series in which a feature of interest has values at various timestamps. A lagged feature can then be visualized as:

![Feature Lag](../img/TimeSeries_lag.png)

<img src="../img/TrainingIcons/Warning.png" alt="Image" width="100" height="100"> 

Note: Lagging a feature creates nulls at the earliest timestamps, where no prior value exists. In the given example, no lag value exists for `2010-01-01` in either `Feature Lag 1` or `Feature Lag 2`. You can either impute these values (fill them based on domain knowledge or a specific strategy) or drop the records where nulls are created.
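
A minimal sketch of this behaviour using pandas `shift`, with the first five `number_sold` values from the data preview. The column names `lag_1` and `lag_2` are illustrative, and back-filling is only one of several possible imputation strategies.

```python
import pandas as pd

# Toy series: the first five number_sold values from the data preview
sales = pd.DataFrame(
    {"number_sold": [801, 810, 818, 796, 808]},
    index=pd.date_range("2010-01-01", periods=5, freq="D", name="Date"),
)

# shift(k) moves values k rows down, so each row sees the value from k days earlier
sales["lag_1"] = sales["number_sold"].shift(1)
sales["lag_2"] = sales["number_sold"].shift(2)

# Option 1: drop the rows where a lag could not be computed
dropped = sales.dropna()

# Option 2: impute the missing lags; back-filling copies the next known value upward
imputed = sales.bfill()

print(sales)
```

Note that back-filling copies a later value into an earlier row, so dropping the affected rows is usually the safer choice when the series is long enough to spare them.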

## Where would you use a lag feature?

In time series forecasting, lagged features can be created for both the independent features (the predictors) and the dependent feature(s) (the predicted values).

An example of dependent feature lagging arises in industries where a weekly pattern exists (e.g., Walmart sales spike every weekend): knowing what the value of the target variable was 7 days ago can have high predictive power.
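
A 7-day lag of the target could be sketched as follows. Because the dataset holds one series per `store` and `product` combination, the shift is applied within each group so that values from one series do not leak into another. The toy frame and the column name `number_sold_lag_7` are made up for illustration, and the sketch assumes one row per day with no gaps, since `shift` counts rows rather than calendar days.

```python
import numpy as np
import pandas as pd

# Toy data: two stores, one product, 14 consecutive days each (values are synthetic)
dates = pd.date_range("2010-01-01", periods=14, freq="D")
toy = pd.concat(
    [
        pd.DataFrame(
            {
                "Date": dates,
                "store": s,
                "product": 0,
                "number_sold": np.arange(14) + 100 * s,
            }
        )
        for s in (0, 1)
    ],
    ignore_index=True,
)

# Sort so that rows within each series are in chronological order
toy = toy.sort_values(["store", "product", "Date"])

# Shift within each (store, product) series: the value from 7 rows (days) earlier
toy["number_sold_lag_7"] = toy.groupby(["store", "product"])["number_sold"].shift(7)

print(toy[toy["store"] == 1].head(10))
```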

On the other hand, independent features can be lagged to account for the delayed effects of past events. For example, the revenue of a roofing company could be expected to rise in the days or weeks after a local storm.
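
The roofing example could be sketched like this. The data is invented purely for illustration, the column names (`storm`, `revenue`, `storm_lag_k`, `storm_last_3d`) are hypothetical, and the 3-day window is an arbitrary choice. Filling the leading gaps with 0 assumes that no storm occurred before the data begins.

```python
import pandas as pd

# Invented daily data: a storm indicator and the company's revenue
roofing = pd.DataFrame(
    {
        "storm": [0, 1, 0, 0, 0, 0],
        "revenue": [10, 10, 12, 18, 15, 11],
    },
    index=pd.date_range("2024-03-01", periods=6, freq="D", name="Date"),
)

# Lag the predictor: was there a storm 1, 2 or 3 days ago?
# fill_value=0 encodes the assumption that no storm preceded the data
for k in (1, 2, 3):
    roofing[f"storm_lag_{k}"] = roofing["storm"].shift(k, fill_value=0)

# Summary feature: did a storm occur at any point in the previous 3 days?
lag_cols = [f"storm_lag_{k}" for k in (1, 2, 3)]
roofing["storm_last_3d"] = roofing[lag_cols].max(axis=1)

print(roofing)
```

A model trained on `storm_lag_k` or `storm_last_3d` can then associate a revenue increase with a storm that happened a few days earlier, which the unlagged `storm` column alone cannot express.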