## **Feature lagging**

Lagging a time series means shifting its values forward in time by one or more steps, so that each observation appears later than it does in the original series. In a lag-k copy, the value at time t is the original value from time t-k. Adding such lagged copies as features lets the prediction model use past values of a series. This can reveal serial dependence or cycles in the data and may improve forecast accuracy.

https://www.kaggle.com/code/ryanholbrook/time-series-as-features

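Here is a minimal illustration of what a lag does on a made-up series. The column name `wind_power`, the values and the 15-minute spacing are assumptions for the example. `shift(k)` moves every value k rows later, so row t of the lag-k column holds the value from row t-k, and the first k rows become NaN. Those NaN rows are why the notebook calls `dropna()` after creating lags.

```python
import pandas as pd

# Toy series with 15-minute spacing; values and name are illustrative only.
s = pd.Series([10.0, 12.0, 15.0, 11.0],
              index=pd.date_range("2023-01-01", periods=4, freq="15min"),
              name="wind_power")

lagged = pd.DataFrame({
    "wind_power": s,
    "wind_power_lag1": s.shift(1),  # value observed one step earlier (NaN in row 0)
    "wind_power_lag2": s.shift(2),  # value observed two steps earlier (NaN in rows 0-1)
})
print(lagged)
```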
In [1]:
import os
import pandas as pd
import numpy as np
from scipy import stats as ss
import datetime as dt

from statsmodels.graphics.tsaplots import plot_pacf
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [16]:
# search Google Drive for the data file, load it and convert the index to datetime
def search_file(directory, filename):
    """Return the full path of the first file named `filename` below `directory`, or None."""
    for root, dirs, files in os.walk(directory):
        if filename in files:
            return os.path.join(root, filename)
    return None

search_directory = '/content/drive/My Drive'
file_name = 'curtailment_target_features.csv'
file_path = search_file(search_directory, file_name)
if file_path is None:
    raise FileNotFoundError(f"{file_name} not found under {search_directory}")

df = pd.read_csv(file_path, sep=';', index_col=0)
df.index = pd.to_datetime(df.index)

In [17]:
# drop features that are highly correlated with other features
df.drop(['wind_speed_m/s', 'radiation_global_J/m2', 'wind_direction_gust_max_degrees'], axis=1, inplace=True)

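The notebook does not show how the three dropped columns were identified. One common screen, assumed here rather than taken from the original, lists feature pairs whose absolute Pearson correlation exceeds a threshold. The helper `correlated_pairs`, the 0.9 threshold and the toy columns `a`, `b`, `c` are hypothetical. Deciding which column of each flagged pair to drop remains a manual choice.

```python
import numpy as np
import pandas as pd

def correlated_pairs(features: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """List feature pairs whose absolute Pearson correlation exceeds `threshold` (hypothetical helper)."""
    corr = features.corr().abs()
    cols = corr.columns
    rows = []
    # Walk the upper triangle only, so each pair is reported once and the diagonal is skipped.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iat[i, j] > threshold:
                rows.append((cols[i], cols[j], corr.iat[i, j]))
    result = pd.DataFrame(rows, columns=["feature_a", "feature_b", "abs_corr"])
    return result.sort_values("abs_corr", ascending=False, ignore_index=True)

# Toy data: "b" is almost a linear copy of "a", "c" is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "a": x,
    "b": 2 * x + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
print(correlated_pairs(demo))

# Hypothetical use on the notebook data, before the drop above:
# correlated_pairs(df.drop(['redispatch', 'level'], axis=1))
```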
In [18]:
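# features: keep the targets unlagged and add lag-1 and lag-2 copies of every other feature
df_lag = df.drop(['redispatch', 'level'], axis=1)
df_lagged = pd.DataFrame(index=df.index)
df_lagged['redispatch'] = df['redispatch']
df_lagged['level'] = df['level']

for feature in df_lag.columns:
    df_lagged[feature] = df[feature]
    df_lagged[feature + '_lag1'] = df_lag[feature].shift(1)  # value one step earlier
    df_lagged[feature + '_lag2'] = df_lag[feature].shift(2)  # value two steps earlier

# drop rows with missing values, including the first two rows, whose lags are undefined
df_lagged.dropna(inplace=True)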

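The feature loop above can be generalised into a helper that accepts any list of lags. The sketch below is a hypothetical refactoring (`add_lags` and the toy frame are not part of the original notebook). It builds all lag columns at once with `pd.concat`, which avoids pandas' fragmentation warning when many columns are inserted one at a time. It also checks first that the index is evenly spaced: `shift(k)` moves values by k rows, not by k time steps, so gaps in the timestamps would silently misalign the lags. Unlike the notebook, it places all lag columns after the original columns.

```python
import pandas as pd

def add_lags(df: pd.DataFrame, columns, lags, dropna: bool = True) -> pd.DataFrame:
    """Append `<col>_lag<k>` for every column in `columns` and every k in `lags` (hypothetical helper)."""
    # shift() works on row positions, so rows must be evenly spaced in time
    # for "lag k" to mean "k time steps earlier".
    steps = df.index.to_series().diff().dropna()
    if steps.nunique() > 1:
        raise ValueError("index is not evenly spaced; resample or reindex before lagging")

    # Build every lag column in one concat instead of inserting them one by one.
    lagged = pd.concat(
        {f"{col}_lag{k}": df[col].shift(k) for col in columns for k in lags},
        axis=1,
    )
    out = pd.concat([df, lagged], axis=1)
    # The first max(lags) rows contain NaNs introduced by shifting.
    return out.dropna() if dropna else out

# Toy frame with a regular 15-minute index.
idx = pd.date_range("2023-01-01", periods=5, freq="15min")
toy = pd.DataFrame({"redispatch": [0, 1, 0, 1, 1],
                    "temp": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)
print(add_lags(toy, ["temp"], lags=[1, 2]))

# Hypothetical use mirroring the cell above:
# df_lagged = add_lags(df, df.columns.drop(['redispatch', 'level']), lags=[1, 2])
```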
In [19]:
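# target variables: add lags of 'redispatch' and 'level' at each lag in significant_lags
significant_lags = [48, 96]
df_extended = df_lagged.copy()

for lag in significant_lags:
    df_extended[f'redispatch_lag_{lag}'] = df_extended['redispatch'].shift(lag)
    df_extended[f'level_lag_{lag}'] = df_extended['level'].shift(lag)

# drop the first max(significant_lags) rows, where the target lags are undefined
df_extended.dropna(inplace=True)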

In [20]:
folder_path = '/content/drive/My Drive/wind_curtailment_prediction'

if not os.path.exists(folder_path):
    os.makedirs(folder_path)
    print("Folder created successfully.")
else:
    print("Folder already exists.")

Folder already exists.


In [21]:
# save the dataset with lag-1 and lag-2 features to the folder checked above
df_lagged.to_csv(os.path.join(folder_path, "lagged_curtailment_target_features.csv"), sep=";")

In [22]:
# save the dataset that additionally contains the target lags
df_extended.to_csv(os.path.join(folder_path, "lagged_curtailment_target_features_extended.csv"), sep=";")
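`to_csv` stores the DatetimeIndex as plain text, so a later notebook has to reload the files with the same separator and re-parse the index, as the loading cell at the top does. The sketch below wraps that in a hypothetical helper, `load_semicolon_csv`, and checks a round trip on a toy frame written to a temporary directory rather than to Google Drive.

```python
import os
import tempfile

import pandas as pd

def load_semicolon_csv(path: str) -> pd.DataFrame:
    """Read a file written with DataFrame.to_csv(path, sep=';') back with a DatetimeIndex (hypothetical helper)."""
    frame = pd.read_csv(path, sep=";", index_col=0)
    frame.index = pd.to_datetime(frame.index)  # the index was stored as text
    return frame

# Round-trip check on a toy frame; the data and file name are illustrative only.
idx = pd.date_range("2023-01-01", periods=3, freq="15min")
toy = pd.DataFrame({"redispatch": [0, 1, 0], "temp_lag1": [1.5, 2.5, 3.5]}, index=idx)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")
    toy.to_csv(path, sep=";")
    restored = load_semicolon_csv(path)
print(restored)

# Hypothetical use on the saved output:
# df_extended = load_semicolon_csv(os.path.join(folder_path, "lagged_curtailment_target_features_extended.csv"))
```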