## **Feature lagging**

Lagging a time series means shifting its values forward in time by one or more steps, so that each row of the lagged series holds the value the original series had one or more steps earlier. These "lagged" copies are useful for investigating serial dependence and cycles in the data. Adding them as features lets the prediction model use past values of the series, which can help it capture such dependencies and may improve forecast accuracy.

https://www.kaggle.com/code/ryanholbrook/time-series-as-features
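As a minimal illustration of the shift described above, the sketch below lags a toy series by one and two steps. The series values and the column names `y`, `y_lag1` and `y_lag2` are made up for this example and are not taken from the curtailment data.

```python
import pandas as pd

# Toy series with made-up values, purely for illustration.
s = pd.Series([10, 20, 30, 40], name="y")

lags = pd.DataFrame({
    "y": s,
    "y_lag1": s.shift(1),  # value from one step earlier
    "y_lag2": s.shift(2),  # value from two steps earlier
})
print(lags)
```

The first row of `y_lag1` and the first two rows of `y_lag2` are `NaN`, because no earlier observation exists to fill them. This is why rows are dropped after lagging.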

In [1]:
import os
import pandas as pd
import numpy as np
from scipy import stats as ss
import datetime as dt

from statsmodels.graphics.tsaplots import plot_pacf
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
df = pd.read_csv("/content/drive/MyDrive/ms_wind_curtailment_prediction/curtailment_target_features.csv", sep=";", index_col=0)

In [4]:
# Features to lag: every column except 'redispatch' and 'level'
df_lag = df.drop(['redispatch', 'level'], axis=1)

# 'redispatch' and 'level' are carried over unlagged
df_lagged = pd.DataFrame(index=df.index)
df_lagged['redispatch'] = df['redispatch']
df_lagged['level'] = df['level']

for feature in df_lag.columns:  # 'redispatch' and 'level' were dropped above, so they are not lagged here
    df_lagged[feature] = df[feature]
    for lag in range(1, 6):  # lags 1 to 5
        df_lagged[f'{feature}_lag{lag}'] = df_lag[feature].shift(lag)

# Drops the first 5 rows (NaN from shifting) and any row with missing values in the source data; maybe better ways
df_lagged.dropna(inplace=True)
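One possible alternative to assigning the lag columns one by one is to build them per lag and join them in a single `pd.concat`. The sketch below is only an illustration: `add_lags` is a hypothetical helper, and the toy frame with a `wind` column is made up. Note that the resulting column order groups columns by lag rather than by feature.

```python
import pandas as pd

def add_lags(frame, columns, lags):
    """Return a copy of `frame` with `<col>_lag<k>` columns appended (hypothetical helper)."""
    shifted = [
        frame[columns].shift(k).add_suffix(f"_lag{k}")  # one block of columns per lag
        for k in lags
    ]
    return pd.concat([frame, *shifted], axis=1)

# Toy data: 'wind' is a made-up feature; 'redispatch' stays unlagged.
toy = pd.DataFrame({"wind": [1.0, 2.0, 3.0, 4.0], "redispatch": [0, 1, 0, 1]})
toy_lagged = add_lags(toy, ["wind"], lags=range(1, 3)).dropna()
print(toy_lagged)
```

Building all lag columns first and concatenating once avoids repeated column insertion into the same DataFrame, which can be slow when there are many features.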

tbd.

In [6]:
# Lags identified as significant (autocorrelation outside the confidence intervals); hard-coded here
significant_lags = [48, 96]

df_extended = df_lagged.copy()

# Create lagged features
for lag in significant_lags:
    df_extended[f'redispatch_lag_{lag}'] = df_extended['redispatch'].shift(lag)
    df_extended[f'level_lag_{lag}'] = df_extended['level'].shift(lag)

# Drop rows with NaN values resulting from the shifting
df_extended.dropna(inplace=True)
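The lags 48 and 96 are hard-coded in the cell above. One way such lags could be screened programmatically is sketched below. It uses the plain sample autocorrelation with the approximate white-noise band ±1.96/√n, which is an assumption of this sketch; the notebook imports `plot_pacf`, which works on the partial autocorrelation instead and may flag different lags. The function `significant_acf_lags` and the synthetic sine signal are hypothetical and are not derived from the curtailment data.

```python
import numpy as np

def significant_acf_lags(x, max_lag, z=1.96):
    """Lags whose sample autocorrelation lies outside the band +/- z / sqrt(n)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()              # centre the series
    n = len(x)
    denom = np.dot(x, x)          # lag-0 sum of squares
    bound = z / np.sqrt(n)        # approximate white-noise confidence band
    significant = []
    for k in range(1, max_lag + 1):
        r_k = np.dot(x[:-k], x[k:]) / denom  # sample autocorrelation at lag k
        if abs(r_k) > bound:
            significant.append(k)
    return significant

# Synthetic signal with a period of 48 steps (made-up data).
t = np.arange(48 * 20)
signal = np.sin(2 * np.pi * t / 48)
found = significant_acf_lags(signal, max_lag=96)
print(48 in found, 96 in found)
```

On strongly periodic data most lags will exceed the band, so in practice the list would still need to be narrowed, for example to the peaks at multiples of the period.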

In [None]:
df_lagged.to_csv("/content/drive/MyDrive/ms_wind_curtailment_prediction/lagged_curtailment_target_features.csv", sep=";")

In [7]:
df_extended.to_csv("/content/drive/MyDrive/ms_wind_curtailment_prediction/lagged_curtailment_target_features_extended.csv", sep=";")