# Feature Engineering - Lag and Rolling Window Features

## Overview

This notebook performs feature engineering for electricity load forecasting. We create temporal features (lag features and rolling window statistics) from the load data, then merge all datasets (load, weather, calendar) into a comprehensive dataset ready for modeling.

### Key Principles

1. **No Data Leakage**: Only use past information that would be available at prediction time
2. **Temporal Features**: Capture temporal dependencies through lag features (48h, 72h, 96h, 120h, 144h, 168h)
3. **Statistical Summaries**: Use rolling window features (mean, std, min, max) to capture trends and variability
4. **Data Integration**: Combine multiple data sources (load, weather, calendar) into a unified dataset
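
The leakage rule in principle 1 can be illustrated with a small sketch. This is not the `FeatureEngineer` implementation, which lives in `../src` and is not shown in this notebook; the helper name `add_lag`, the `steps_per_hour` parameter, and the toy values are assumptions, and the data is assumed to be on a gap-free 15-minute grid. A lag built with a positive `shift` only reads earlier rows, so every feature value was already observed at the row's timestamp.

```python
import pandas as pd

def add_lag(df: pd.DataFrame, col: str, hours: int, steps_per_hour: int = 4) -> pd.DataFrame:
    """Hypothetical helper: add a lag column that only looks backwards in time."""
    out = df.copy()
    # At 15-minute resolution one hour is 4 rows, so an N-hour lag shifts by 4 * N rows.
    out[f"{col}_lag_{hours}h"] = out[col].shift(hours * steps_per_hour)
    return out

# Toy series: 3 hours of 15-minute data (values are made up for illustration).
toy = pd.DataFrame({
    "datetime": pd.date_range("2015-01-01", periods=12, freq="15min", tz="UTC"),
    "load_MW": [float(i) for i in range(12)],
})
toy = add_lag(toy, "load_MW", hours=1)
print(toy.tail(3))
```

The first four rows of the new column are `NaN` because no observation exists one hour before them; a negative shift would instead pull future values into the row and leak the target.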

### Process

1. Load electricity consumption data
2. Create lag features (historical values)
3. Create rolling window statistics
4. Merge with weather and calendar data
5. Save the merged dataset for modeling


In [1]:
import sys
sys.path.append('../src')

from feature_engineering import FeatureEngineer
import pandas as pd
import numpy as np
from pathlib import Path


In [2]:
engineer = FeatureEngineer()

print(f"Lag hours: {engineer.lag_hours}")
print(f"Rolling window hours: {engineer.window_hours}")


Lag hours: [48, 72, 96, 120, 144, 168]
Rolling window hours: [48, 72, 96, 120, 144, 168]


In [3]:
# Read the load data and parse timestamps as timezone-aware UTC
load_path = "../data/raw/hungary_load_data_2015_2024.csv"
df_load = pd.read_csv(load_path)
df_load['datetime'] = pd.to_datetime(df_load['datetime'], utc=True)

print(f"Load data: {len(df_load):,} rows")
print(f"Date range: {df_load['datetime'].min()} - {df_load['datetime'].max()}")
print("\nFirst 5 rows:")
print(df_load.head())


Load data: 350,688 rows
Date range: 2015-01-01 00:00:00+00:00 - 2024-12-31 23:45:00+00:00

First 5 rows:
                   datetime  load_MW
0 2015-01-01 00:00:00+00:00  4164.73
1 2015-01-01 00:15:00+00:00  4106.20
2 2015-01-01 00:30:00+00:00  4053.31
3 2015-01-01 00:45:00+00:00  3952.49
4 2015-01-01 01:00:00+00:00  3863.72
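
The row count above is consistent with a gap-free 15-minute grid: 2015 through 2024 spans 3,653 days (2016, 2020 and 2024 are leap years), and 3,653 × 96 quarter-hours gives 350,688. A quick check using only the standard library, with the start and end timestamps taken from the output above:

```python
from datetime import datetime, timedelta, timezone

start = datetime(2015, 1, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 12, 31, 23, 45, tzinfo=timezone.utc)
step = timedelta(minutes=15)

# Number of 15-minute timestamps from start to end, both endpoints included.
expected_rows = (end - start) // step + 1
print(f"{expected_rows:,}")  # 350,688
```

Matching counts do not prove that every timestamp is present exactly once (a duplicate could mask a gap), so a check on `df_load['datetime'].diff()` would still be worthwhile.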


In [4]:
# Generate lag features (48h, 72h, 96h, 120h, 144h, 168h)
df_load = engineer.create_lag_features(df_load, target_col='load_MW')

print(f"\nLag features added. Total columns: {len(df_load.columns)}")
print("\nLag columns:")
lag_cols = [col for col in df_load.columns if 'lag' in col]
print(lag_cols)
print("\nFirst 10 rows (lag examples):")
print(df_load[['datetime', 'load_MW'] + lag_cols].head(10))


2025-12-10 22:11:17,981 - INFO - Creating lag features: [48, 72, 96, 120, 144, 168] hours
2025-12-10 22:11:17,986 - INFO -   ✓ load_MW_lag_48h created
2025-12-10 22:11:17,987 - INFO -   ✓ load_MW_lag_72h created
2025-12-10 22:11:17,987 - INFO -   ✓ load_MW_lag_96h created
2025-12-10 22:11:17,990 - INFO -   ✓ load_MW_lag_120h created
2025-12-10 22:11:17,991 - INFO -   ✓ load_MW_lag_144h created
2025-12-10 22:11:17,994 - INFO -   ✓ load_MW_lag_168h created



Lag features added. Total columns: 8

Lag columns:
['load_MW_lag_48h', 'load_MW_lag_72h', 'load_MW_lag_96h', 'load_MW_lag_120h', 'load_MW_lag_144h', 'load_MW_lag_168h']

First 10 rows (lag examples):
                   datetime  load_MW  load_MW_lag_48h  load_MW_lag_72h  \
0 2015-01-01 00:00:00+00:00  4164.73              NaN              NaN   
1 2015-01-01 00:15:00+00:00  4106.20              NaN              NaN   
2 2015-01-01 00:30:00+00:00  4053.31              NaN              NaN   
3 2015-01-01 00:45:00+00:00  3952.49              NaN              NaN   
4 2015-01-01 01:00:00+00:00  3863.72              NaN              NaN   
5 2015-01-01 01:15:00+00:00  3805.69              NaN              NaN   
6 2015-01-01 01:30:00+00:00  3711.49              NaN              NaN   
7 2015-01-01 01:45:00+00:00  3638.79              NaN              NaN   
8 2015-01-01 02:00:00+00:00  3588.39              NaN              NaN   
9 2015-01-01 02:15:00+00:00  3527.63          
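
A row-based shift equals a time lag only while the grid has no missing timestamps. How `create_lag_features` handles gaps is not visible here, so the following is a sketch of one alternative rather than a description of it: the lag is looked up by timestamp through a self-merge. The helper name `lag_by_timestamp` and the toy values are assumptions.

```python
import pandas as pd

def lag_by_timestamp(df, col, hours, time_col="datetime"):
    """Hypothetical helper: look up the value observed exactly `hours` earlier."""
    past = df[[time_col, col]].copy()
    # Move each observation forward in time so it lines up with the row that should see it.
    past[time_col] = past[time_col] + pd.Timedelta(hours=hours)
    past = past.rename(columns={col: f"{col}_lag_{hours}h"})
    return df.merge(past, on=time_col, how="left")

idx = pd.date_range("2015-01-01", periods=8, freq="15min", tz="UTC")
toy = pd.DataFrame({"datetime": idx, "load_MW": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]})
toy = toy.drop(index=5).reset_index(drop=True)  # simulate one missing reading (01:15)

out = lag_by_timestamp(toy, "load_MW", hours=1)
print(out)
```

In this toy frame the 01:30 row receives the 00:30 value (12.0) from the timestamp lookup, whereas `toy["load_MW"].shift(4)` would hand it the 00:15 value (11.0) because the missing row moved everything up by one position.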

In [5]:
# Generate rolling window features (mean, std, min, max)
df_load = engineer.create_rolling_features(df_load, target_col='load_MW')

print(f"\nRolling window features added. Total columns: {len(df_load.columns)}")
print("\nRolling window columns:")
rolling_cols = [col for col in df_load.columns if 'rolling' in col]
print(rolling_cols)
print("\nFirst 10 rows (rolling examples):")
print(df_load[['datetime', 'load_MW'] + rolling_cols[:4]].head(10))


2025-12-10 22:11:22,058 - INFO - Creating rolling window features: [48, 72, 96, 120, 144, 168] hours
2025-12-10 22:11:22,106 - INFO -   ✓ 48h rolling window features created (mean, std, min, max)
2025-12-10 22:11:22,135 - INFO -   ✓ 72h rolling window features created (mean, std, min, max)
2025-12-10 22:11:22,166 - INFO -   ✓ 96h rolling window features created (mean, std, min, max)
2025-12-10 22:11:22,197 - INFO -   ✓ 120h rolling window features created (mean, std, min, max)
2025-12-10 22:11:22,227 - INFO -   ✓ 144h rolling window features created (mean, std, min, max)
2025-12-10 22:11:22,257 - INFO -   ✓ 168h rolling window features created (mean, std, min, max)



Rolling window features added. Total columns: 32

Rolling window columns:
['load_MW_rolling_mean_48h', 'load_MW_rolling_std_48h', 'load_MW_rolling_min_48h', 'load_MW_rolling_max_48h', 'load_MW_rolling_mean_72h', 'load_MW_rolling_std_72h', 'load_MW_rolling_min_72h', 'load_MW_rolling_max_72h', 'load_MW_rolling_mean_96h', 'load_MW_rolling_std_96h', 'load_MW_rolling_min_96h', 'load_MW_rolling_max_96h', 'load_MW_rolling_mean_120h', 'load_MW_rolling_std_120h', 'load_MW_rolling_min_120h', 'load_MW_rolling_max_120h', 'load_MW_rolling_mean_144h', 'load_MW_rolling_std_144h', 'load_MW_rolling_min_144h', 'load_MW_rolling_max_144h', 'load_MW_rolling_mean_168h', 'load_MW_rolling_std_168h', 'load_MW_rolling_min_168h', 'load_MW_rolling_max_168h']

First 10 rows (rolling examples):
                   datetime  load_MW  load_MW_rolling_mean_48h  \
0 2015-01-01 00:00:00+00:00  4164.73                       NaN   
1 2015-01-01 00:15:00+00:00  4106.20                       NaN   
2 2015-01-01
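
Whether `create_rolling_features` includes the current row in each window cannot be told from the output above. As a sketch of one leakage-safe variant (not the notebook's actual method), the series can be shifted before the window is applied, so the statistic at time t summarises only rows strictly before t. The helper name `add_rolling`, the `steps_per_hour` parameter, and the toy values are assumptions.

```python
import pandas as pd

def add_rolling(df, col, hours, steps_per_hour=4):
    """Hypothetical helper: rolling mean/std/min/max over past rows only."""
    out = df.copy()
    window = hours * steps_per_hour
    # shift(1) drops the current observation from the window; min_periods=window
    # keeps the feature NaN until a full window of history exists.
    past = out[col].shift(1).rolling(window=window, min_periods=window)
    for name in ("mean", "std", "min", "max"):
        out[f"{col}_rolling_{name}_{hours}h"] = getattr(past, name)()
    return out

toy = pd.DataFrame({
    "datetime": pd.date_range("2015-01-01", periods=12, freq="15min", tz="UTC"),
    "load_MW": [float(i) for i in range(12)],  # made-up values
})
toy = add_rolling(toy, "load_MW", hours=1)
print(toy[["load_MW", "load_MW_rolling_mean_1h", "load_MW_rolling_max_1h"]].head(6))
```

If the forecast must respect the same availability gap as the shortest lag, the shift would be `48 * steps_per_hour` instead of 1; that is an assumption about the forecasting setup, not something this notebook states.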

In [6]:
# Save the load data with lag and rolling features
output_load_path = "../data/processed/hungary_load_with_features_2015_2024.csv"
Path(output_load_path).parent.mkdir(parents=True, exist_ok=True)
df_load.to_csv(output_load_path, index=False)

print(f"✅ Load data saved: {output_load_path}")
print(f"Total rows: {len(df_load):,}")
print(f"Total columns: {len(df_load.columns)}")


✅ Load data saved: ../data/processed/hungary_load_with_features_2015_2024.csv
Total rows: 350,688
Total columns: 32
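
The 32 columns break down as 2 original columns, 6 lag columns, and 6 windows × 4 statistics = 24 rolling columns. The short sketch below rebuilds the expected names from the lag and window lists printed earlier, following the naming pattern seen in the outputs, which is handy as a schema check before modeling.

```python
lag_hours = [48, 72, 96, 120, 144, 168]
window_hours = [48, 72, 96, 120, 144, 168]
stats = ["mean", "std", "min", "max"]

# Expected schema, in the same order as the printed column lists.
columns = ["datetime", "load_MW"]
columns += [f"load_MW_lag_{h}h" for h in lag_hours]
columns += [f"load_MW_rolling_{s}_{h}h" for h in window_hours for s in stats]

print(len(columns))  # 32
```

Comparing `columns` against `list(df_load.columns)` would flag a missing or renamed feature early.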


In [7]:
# Merge all datasets (load, weather, calendar)
merged_df = engineer.merge_all_datasets(
    load_path=output_load_path,
    weather_path="../data/raw/hungary_weather_2015_2024.csv",
    calendar_path="../data/raw/hungary_calendar_2015_2024.csv",
    output_path="../data/processed/hungary_merged_dataset_2015_2024.csv"
)

print("\n📊 Merged dataset:")
print(f"Total rows: {len(merged_df):,}")
print(f"Total columns: {len(merged_df.columns)}")
print(f"Date range: {merged_df['datetime'].min()} - {merged_df['datetime'].max()}")
print("\nColumns:")
print(list(merged_df.columns))


2025-12-10 22:11:34,566 - INFO - Merging datasets...
2025-12-10 22:11:34,567 - INFO - Reading load data: ../data/processed/hungary_load_with_features_2015_2024.csv
2025-12-10 22:11:35,681 - INFO -   ✓ 350,688 rows
2025-12-10 22:11:35,682 - INFO - Reading weather data: ../data/raw/hungary_weather_2015_2024.csv
2025-12-10 22:11:36,146 - INFO -   ✓ 350,688 rows
2025-12-10 22:11:36,146 - INFO - Reading calendar data: ../data/raw/hungary_calendar_2015_2024.csv
2025-12-10 22:11:37,433 - INFO -   ✓ 350,688 rows
2025-12-10 22:11:37,434 - INFO - Merging load and weather...
2025-12-10 22:11:37,457 - INFO -   ✓ 350,688 rows
2025-12-10 22:11:37,458 - INFO - Merging with calendar...
2025-12-10 22:11:37,622 - INFO -   ✓ 350,688 rows
2025-12-10 22:11:37,828 - INFO - ✅ All datasets merged: 350,688 rows, 79 columns
2025-12-10 22:11:49,856 - INFO - ✅ Merged data saved: ../data/processed/hungary_merged_dataset_2015_202


📊 Merged dataset:
Total rows: 350,688
Total columns: 79
Date range: 2015-01-01 00:00:00+00:00 - 2024-12-31 23:45:00+00:00

Columns:
['datetime', 'load_MW', 'load_MW_lag_48h', 'load_MW_lag_72h', 'load_MW_lag_96h', 'load_MW_lag_120h', 'load_MW_lag_144h', 'load_MW_lag_168h', 'load_MW_rolling_mean_48h', 'load_MW_rolling_std_48h', 'load_MW_rolling_min_48h', 'load_MW_rolling_max_48h', 'load_MW_rolling_mean_72h', 'load_MW_rolling_std_72h', 'load_MW_rolling_min_72h', 'load_MW_rolling_max_72h', 'load_MW_rolling_mean_96h', 'load_MW_rolling_std_96h', 'load_MW_rolling_min_96h', 'load_MW_rolling_max_96h', 'load_MW_rolling_mean_120h', 'load_MW_rolling_std_120h', 'load_MW_rolling_min_120h', 'load_MW_rolling_max_120h', 'load_MW_rolling_mean_144h', 'load_MW_rolling_std_144h', 'load_MW_rolling_min_144h', 'load_MW_rolling_max_144h', 'load_MW_rolling_mean_168h', 'load_MW_rolling_std_168h', 'load_MW_rolling_min_168h', 'load_MW_rolling_max_168h', 'temperature_2m', 'hdd', 'cdd', 'hou
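
The internals of `merge_all_datasets` are not shown here. The log only establishes that all three inputs and the result have 350,688 rows, which is what a join on a complete, unique `datetime` key would produce. As a hedged sketch of such a merge, the stand-in below uses left joins with `validate="one_to_one"`; the function name `merge_on_datetime`, the join type, and the calendar column `is_holiday` are assumptions, while `temperature_2m` appears in the column list above.

```python
import pandas as pd

def merge_on_datetime(load, weather, calendar, key="datetime"):
    """Hypothetical stand-in for a three-way merge keyed on the timestamp."""
    # validate="one_to_one" raises MergeError if either side has duplicate keys,
    # which would otherwise silently multiply rows.
    merged = load.merge(weather, on=key, how="left", validate="one_to_one")
    merged = merged.merge(calendar, on=key, how="left", validate="one_to_one")
    return merged

idx = pd.date_range("2015-01-01", periods=4, freq="15min", tz="UTC")
load = pd.DataFrame({"datetime": idx, "load_MW": [4164.73, 4106.20, 4053.31, 3952.49]})
weather = pd.DataFrame({"datetime": idx, "temperature_2m": [1.0, 1.0, 0.9, 0.9]})  # made-up values
calendar = pd.DataFrame({"datetime": idx, "is_holiday": [1, 1, 1, 1]})  # hypothetical column

merged = merge_on_datetime(load, weather, calendar)
print(merged.shape)
```

A row count that stays equal to the load table's, as in the log above, is the quick sign that no key was duplicated or dropped along the way.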

In [8]:
print("\n📈 First 5 rows:")
print(merged_df.head())

print("\n📊 Data types:")
print(merged_df.dtypes)

print("\n📊 Missing values:")
missing = merged_df.isnull().sum()
print(missing[missing > 0])

print("\n📊 Summary statistics:")
print(merged_df.describe())



📈 First 5 rows:
                   datetime  load_MW  load_MW_lag_48h  load_MW_lag_72h  \
0 2015-01-01 00:00:00+00:00  4164.73              NaN              NaN   
1 2015-01-01 00:15:00+00:00  4106.20              NaN              NaN   
2 2015-01-01 00:30:00+00:00  4053.31              NaN              NaN   
3 2015-01-01 00:45:00+00:00  3952.49              NaN              NaN   
4 2015-01-01 01:00:00+00:00  3863.72              NaN              NaN   

   load_MW_lag_96h  load_MW_lag_120h  load_MW_lag_144h  load_MW_lag_168h  \
0              NaN               NaN               NaN               NaN   
1              NaN               NaN               NaN               NaN   
2              NaN               NaN               NaN               NaN   
3              NaN               NaN               NaN               NaN   
4              NaN               NaN               NaN               NaN   

   load_MW_rolling_mean_48h  load_MW_rolling_std_48h  ...  \
0               
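
The leading `NaN` values in the lag columns are the warm-up period: a lag of h hours has nothing to read for the first 4 × h rows. Assuming the lags are plain row shifts on a gap-free 15-minute grid and the raw load has no other missing values (neither is confirmed by the truncated output), the expected counts can be worked out as follows.

```python
STEPS_PER_HOUR = 4  # 15-minute data
TOTAL_ROWS = 350_688  # row count reported above
lag_hours = [48, 72, 96, 120, 144, 168]

# Each lag leaves its first `hours * 4` rows undefined.
expected_nans = {f"load_MW_lag_{h}h": h * STEPS_PER_HOUR for h in lag_hours}

# Rows where every lag is populated start after the longest warm-up (168h).
usable_rows = TOTAL_ROWS - max(expected_nans.values())

print(expected_nans["load_MW_lag_48h"], expected_nans["load_MW_lag_168h"], usable_rows)
```

Comparing `expected_nans` with the `missing` series printed by the cell above would show whether any gaps exist beyond the warm-up rows; the rolling columns depend on the window settings inside `FeatureEngineer` and are left out of this estimate.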