## Feature Engineering
This notebook will:

- Create time-based features (day_of_week, month, year, week_of_year)
- Create lag features (sales_last_week, sales_two_weeks_ago)
- Create rolling averages (rolling_4_weeks, rolling_8_weeks)
- Convert the identifier and store-metadata columns to the category dtype and save the final dataset as data/merged_feature_engineered.parquet

In [2]:
import pandas as pd
import numpy as np


In [3]:
# Load cleaned data
data = pd.read_csv("data/merged_cleaned.csv", parse_dates=["date"])
print("Loaded merged_cleaned.csv with shape:", data.shape)


Loaded merged_cleaned.csv with shape: (3054348, 14)


#### Time-based Features

In [4]:
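# Calendar features derived from the date column
data["day_of_week"] = data["date"].dt.dayofweek  # Monday=0, Sunday=6
data["month"] = data["date"].dt.month
data["year"] = data["date"].dt.year
# ISO week number; isocalendar() returns a nullable UInt32 column, so cast to a plain integer
data["week_of_year"] = data["date"].dt.isocalendar().week.astype("int32")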
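
Two details of these calendar features are easy to miss: dayofweek counts Monday as 0, and week_of_year is the ISO week number, which can disagree with year around New Year. A small sketch on hypothetical toy dates shows both effects:

```python
import pandas as pd

# Hypothetical toy dates spanning the 2016/2017 year boundary
toy = pd.DataFrame({"date": pd.to_datetime(["2016-12-31", "2017-01-01", "2017-01-02"])})

toy["day_of_week"] = toy["date"].dt.dayofweek  # Monday=0, Sunday=6
toy["year"] = toy["date"].dt.year
toy["week_of_year"] = toy["date"].dt.isocalendar().week.astype("int32")

print(toy)
```

Here 2017-01-01 (a Sunday) has year 2017 but ISO week 52, because that week belongs to ISO year 2016. If the mismatch matters for a model, the ISO year from isocalendar().year can be paired with week_of_year instead of the calendar year.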

#### Lag Features

In [5]:

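# Sort so each (store_nbr, family) series is in chronological order before shifting
data = data.sort_values(by=["store_nbr", "family", "date"])

# Lags are row-based: shift(7) means "7 days earlier" only if every
# store/family series has exactly one row per day with no gaps
grouped_sales = data.groupby(["store_nbr", "family"])["sales"]
data["sales_last_week"] = grouped_sales.shift(7)
data["sales_two_weeks_ago"] = grouped_sales.shift(14)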
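
shift(7) moves values by seven rows, not seven days; the two coincide only when each store/family series has one row per day with no gaps. A minimal sketch of a date-aware alternative follows. The helper add_calendar_lag is hypothetical (not part of this notebook or of pandas): it joins each row to the value from exactly N calendar days earlier, and the toy series deliberately skips one day to show where the two approaches diverge.

```python
import pandas as pd


def add_calendar_lag(df, keys, col, days, name):
    """Hypothetical helper: value of `col` exactly `days` calendar days earlier.

    Rows whose earlier date is missing get NaN instead of a value from an
    older row, which is what a row-based shift() would return.
    """
    lagged = df[keys + ["date", col]].copy()
    # Move each observation forward so it lines up with the row it should feed
    lagged["date"] = lagged["date"] + pd.Timedelta(days=days)
    lagged = lagged.rename(columns={col: name})
    # A left merge keeps every original row, in the original order
    return df.merge(lagged, on=keys + ["date"], how="left")


# Hypothetical toy series for one store/family with 2017-01-04 missing
dates = pd.date_range("2017-01-01", "2017-01-10")
dates = dates[dates != pd.Timestamp("2017-01-04")]
toy = pd.DataFrame({
    "store_nbr": 1,
    "family": "GROCERY I",
    "date": dates,
    "sales": dates.day.astype(float),  # sales equal the day of the month
})

keys = ["store_nbr", "family"]
toy["row_lag_7"] = toy.groupby(keys)["sales"].shift(7)
toy = add_calendar_lag(toy, keys, "sales", 7, "calendar_lag_7")
print(toy[["date", "sales", "row_lag_7", "calendar_lag_7"]])
```

Because of the gap, row_lag_7 for 2017-01-09 picks up the 2017-01-01 value, while calendar_lag_7 picks up 2017-01-02. On a gap-free series the two columns are identical, so the row-based shift above is adequate as long as the cleaned data has no missing dates.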


#### Rolling Features

In [7]:
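# Mean of the previous 28 and 56 rows per store/family (4 and 8 weeks of daily data).
# shift(1) excludes the current day's sales; min_periods=1 lets early rows average fewer days.
grouped_sales = data.groupby(["store_nbr", "family"])["sales"]
data["rolling_4_weeks"] = grouped_sales.transform(lambda x: x.shift(1).rolling(28, min_periods=1).mean())
data["rolling_8_weeks"] = grouped_sales.transform(lambda x: x.shift(1).rolling(56, min_periods=1).mean())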
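
The transform(lambda ...) calls above run a Python function once per store/family group. A minimal sketch of an equivalent formulation using pandas' groupby().rolling(), which may be faster when there are many groups, is shown below. The helper name add_lagged_rolling_mean is hypothetical, and like the original it assumes the frame is already sorted by key and date.

```python
import pandas as pd


def add_lagged_rolling_mean(df, keys, col, window, name):
    """Hypothetical helper: per-group mean of the previous `window` rows of `col`.

    Should match groupby(keys)[col].transform(
        lambda x: x.shift(1).rolling(window, min_periods=1).mean()).
    """
    # Shift within each group so the current row never sees its own value
    shifted = df.groupby(keys)[col].shift(1)
    rolled = (
        shifted.groupby([df[k] for k in keys])
        .rolling(window, min_periods=1)
        .mean()
    )
    # groupby().rolling() prepends the group keys to the index; drop them so
    # the result aligns with df's original row labels
    df[name] = rolled.reset_index(level=list(range(len(keys))), drop=True)
    return df


# Hypothetical toy data: two stores, rows already sorted by key and date
toy = pd.DataFrame({
    "store_nbr": [1, 1, 1, 1, 2, 2, 2],
    "family": ["A"] * 7,
    "sales": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0],
})
toy = add_lagged_rolling_mean(toy, ["store_nbr", "family"], "sales", 2, "rolling_2")
print(toy)
```

With a window of 2, store 1 gets NaN, 1.0, 1.5, 2.5 and store 2 starts again at NaN, so values never leak across groups.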


In [13]:

# Store the identifier and store-metadata columns as pandas categoricals
categorical_cols = ["store_nbr", "family", "city", "state", "type", "cluster"]
for col in categorical_cols:
    data[col] = data[col].astype("category")

# Writing Parquet requires the pyarrow or fastparquet package
data.to_parquet("data/merged_feature_engineered.parquet", index=False)
print("✅ Feature-engineered dataset saved as 'merged_feature_engineered.parquet'")


✅ Feature-engineered dataset saved as 'merged_feature_engineered.parquet'
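
Converting a column to category stores each distinct value once and replaces the strings with small integer codes, which can noticeably reduce memory for a frame of this size (about 3 million rows). A quick sketch on a hypothetical column of repeated city names illustrates the difference:

```python
import pandas as pd

# Hypothetical column: a few distinct city names repeated many times
cities = pd.Series(["Quito", "Guayaquil", "Cuenca"] * 10_000)

as_object = cities.memory_usage(deep=True)
as_category = cities.astype("category").memory_usage(deep=True)

print(f"strings: {as_object:,} bytes, category: {as_category:,} bytes")
```

The exact byte counts depend on the pandas version and string backend, but the categorical version should be much smaller whenever the number of distinct values is small relative to the number of rows.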
