# 02 - Weekly Aggregation

In this notebook, we aggregate cleaned transactional sales data into
weekly time series per product to prepare it for demand forecasting.

In [1]:
import pandas as pd

## Load Cleaned Data

In [2]:
df = pd.read_csv("../data/processed/clean_sales.csv")
df["datetime"] = pd.to_datetime(df["datetime"])

Weeks are defined using pandas weekly periods (W, which is shorthand for W-SUN): each week runs from Monday through Sunday and is labeled with its Monday start date.

In [3]:
df["week"] = df["datetime"].dt.to_period("W").dt.start_time
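
As a quick sanity check of the week definition, the sketch below maps a few made-up dates (not taken from the sales data) to their week start. It assumes the default pandas behavior, where the W alias ends each period on Sunday.

```python
import pandas as pd

# Made-up dates for illustration: a Sunday, the following Monday, and a Wednesday.
sample = pd.Series(pd.to_datetime(["2024-01-07", "2024-01-08", "2024-01-10"]))

# "W" is an alias for "W-SUN": periods end on Sunday, so they start on Monday.
week_start = sample.dt.to_period("W").dt.start_time

print(week_start.dt.strftime("%Y-%m-%d").tolist())
# ['2024-01-01', '2024-01-08', '2024-01-08']
```

The Sunday falls into the week that started on the previous Monday, while the Monday opens a new week.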

## Select Relevant Columns

At this stage, we keep only the necessary columns required for weekly demand aggregation:
- product_name: identifies the product
- week: represents the weekly time period
- qty_sold: quantity sold per transaction


In [4]:
df = df[["product_name", "week", "qty_sold"]]

## Weekly Aggregation per Product

We aggregate transactional sales into total weekly demand per product
by summing the quantity sold within each week.


In [5]:
weekly_df = (
    df
    .groupby(["product_name", "week"], as_index=False)
    .agg({"qty_sold": "sum"})
)
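
To illustrate what the aggregation does, here is a minimal sketch on made-up transactions: two sales of the same product in the same week collapse into a single weekly row. The frame `tx` and its values are hypothetical.

```python
import pandas as pd

# Made-up transactions: product A sells twice in the week starting 2024-01-01.
tx = pd.DataFrame({
    "product_name": ["A", "A", "B"],
    "week": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-01"]),
    "qty_sold": [3, 4, 1],
})

# One row per (product, week), with quantities summed.
weekly = tx.groupby(["product_name", "week"], as_index=False)["qty_sold"].sum()
print(weekly)
```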

## Ensure Continuous Weekly Time Series

Weeks with no sales are explicitly added with zero demand, so that every product
has exactly one row for each week between the first and last observed week.

In [6]:
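# Full range of week start dates between the first and last observed week
all_weeks = pd.period_range(
    weekly_df["week"].min(),
    weekly_df["week"].max(),
    freq="W"
).to_timestamp()

final_df = []

for product, group in weekly_df.groupby("product_name"):
    # Reindex onto the full week range; weeks without sales get qty_sold = 0
    group = group.set_index("week").reindex(all_weeks, fill_value=0)
    # fill_value=0 also lands in product_name for the added rows, so restore it
    group["product_name"] = product
    group = group.reset_index().rename(columns={"index": "week"})
    final_df.append(group)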

final_df = pd.concat(final_df, ignore_index=True)
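
The same zero-filling can also be expressed without an explicit loop by reindexing onto a full product x week grid. The sketch below is an alternative formulation on a toy frame, not the method used above. The names `toy_weekly`, `toy_weeks`, `grid`, and `toy_full` are hypothetical, and the weekly frequency W-MON assumes weeks are labeled by their Monday start.

```python
import pandas as pd

# Toy stand-in for weekly_df (values are made up for illustration).
toy_weekly = pd.DataFrame({
    "product_name": ["A", "A", "B"],
    "week": pd.to_datetime(["2024-01-01", "2024-01-15", "2024-01-08"]),
    "qty_sold": [5, 2, 7],
})

# Every Monday between the first and last observed week.
toy_weeks = pd.date_range(
    toy_weekly["week"].min(), toy_weekly["week"].max(), freq="W-MON"
)

# Full product x week grid; combinations absent from the data get qty_sold = 0.
grid = pd.MultiIndex.from_product(
    [toy_weekly["product_name"].unique(), toy_weeks],
    names=["product_name", "week"],
)
toy_full = (
    toy_weekly
    .set_index(["product_name", "week"])
    .reindex(grid, fill_value=0)
    .reset_index()
)
print(toy_full)
```

With many products, building the grid once avoids the per-product Python loop, at the cost of materializing all product-week combinations in memory at once.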

## Aggregation Validation Checks

In [15]:
# Check duplicates & negative values
print(f"Duplicate rows: {final_df.duplicated(['product_name', 'week']).sum()}")
print(f"Negative qty rows: {(final_df['qty_sold'] < 0).sum()}")

Duplicate rows: 0
Negative qty rows: 0
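
Beyond duplicates and negative values, it may be worth confirming that zero-filling did not change total quantities and that every product covers the same, gap-free set of weeks. The helper `check_weekly_grid` below is a hypothetical sketch, shown on made-up frames rather than on the notebook's data.

```python
import pandas as pd

def check_weekly_grid(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Hypothetical helper: compare a weekly frame before and after zero-filling."""
    weeks_per_product = after.groupby("product_name")["week"].nunique()
    ordered = after.sort_values(["product_name", "week"])
    gaps = ordered.groupby("product_name")["week"].diff().dropna()
    return {
        # Zero-filling must not add or lose any quantity.
        "total_preserved": bool(before["qty_sold"].sum() == after["qty_sold"].sum()),
        # Every product should cover the same number of weeks.
        "equal_length": bool(weeks_per_product.nunique() == 1),
        # Consecutive weeks within a product should be exactly 7 days apart.
        "no_gaps": bool((gaps == pd.Timedelta(days=7)).all()),
    }

# Made-up example data: three consecutive Mondays, two products.
weeks = pd.date_range("2024-01-01", periods=3, freq="W-MON")
before = pd.DataFrame({
    "product_name": ["A", "A", "B"],
    "week": [weeks[0], weeks[2], weeks[1]],
    "qty_sold": [5, 2, 7],
})
after = pd.DataFrame({
    "product_name": ["A"] * 3 + ["B"] * 3,
    "week": list(weeks) * 2,
    "qty_sold": [5, 0, 2, 0, 7, 0],
})

report = check_weekly_grid(before, after)
print(report)
```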


## Save Weekly Aggregated Data

The aggregated dataset will be used for exploratory data analysis
and forecasting model development.


In [7]:
final_df.to_csv("../data/processed/weekly_sales.csv", index=False)

## Aggregation Output Review & Observed Issues

The weekly aggregation process successfully transformed transactional sales
data into structured weekly time series per product.

However, a detailed inspection of the aggregated output revealed several
critical issues that must be addressed before exploratory analysis
or forecasting:

### Observed Issues

- **High Zero Inflation**  
  The majority of products exhibit long sequences of zero demand,
  indicating infrequent or irregular sales patterns.

- **Short Product Lifecycles**  
  Many products appear for only a few weeks of non-zero sales,
  followed by extended periods of zero demand.

- **Long-Tail Product Distribution**  
  A large number of products contribute minimal total demand,
  while a small subset accounts for most sales volume.

- **Heterogeneous Product Behavior**  
  Products differ significantly in category, packaging, and sales dynamics,
  making a single global forecasting approach inappropriate at this stage.
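
The first three observations can be quantified per product. The sketch below is one possible way to do so, shown on a made-up frame with the same columns as the aggregated output. The names `toy`, `zero_share`, `active_weeks`, and `summary` are hypothetical, and the numbers are not results from the sales data.

```python
import pandas as pd

def zero_share(s: pd.Series) -> float:
    """Fraction of weeks with no sales."""
    return float((s == 0).mean())

def active_weeks(s: pd.Series) -> int:
    """Number of weeks with at least one unit sold."""
    return int((s > 0).sum())

# Toy stand-in for the aggregated output (made-up values): 2 products x 4 weeks.
toy = pd.DataFrame({
    "product_name": ["A"] * 4 + ["B"] * 4,
    "week": list(pd.date_range("2024-01-01", periods=4, freq="W-MON")) * 2,
    "qty_sold": [10, 0, 20, 10, 0, 0, 5, 0],
})

summary = (
    toy.groupby("product_name")["qty_sold"]
    .agg(total_qty="sum", zero_share=zero_share, active_weeks=active_weeks)
    .sort_values("total_qty", ascending=False)
)

# Cumulative share of total demand, largest products first (long-tail view).
summary["cum_demand_share"] = summary["total_qty"].cumsum() / summary["total_qty"].sum()
print(summary)
```

A high `zero_share` points to zero inflation, a low `active_weeks` to a short lifecycle, and a `cum_demand_share` that approaches 1 after only a few rows to a long-tail distribution.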

### Conclusion

Not all aggregated products are suitable for time series forecasting.
Proceeding directly to EDA or modeling would likely surface misleading
patterns and produce unstable models.

As a result, a dedicated **product filtering step** is required to identify
a subset of products with sufficient sales regularity and volume for
reliable demand forecasting.
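
As a rough illustration of what such a filtering step might look like, the sketch below keeps only products that meet a minimum number of active weeks and a minimum total quantity. The function `select_forecastable`, its parameters, and the threshold values are assumptions made for illustration. The actual criteria are not defined in this notebook.

```python
import pandas as pd

def count_active_weeks(s: pd.Series) -> int:
    """Number of weeks with at least one unit sold."""
    return int((s > 0).sum())

def select_forecastable(
    weekly: pd.DataFrame, min_active_weeks: int, min_total_qty: float
) -> pd.DataFrame:
    """Hypothetical filter: keep products that meet both thresholds."""
    stats = weekly.groupby("product_name")["qty_sold"].agg(
        total_qty="sum", active_weeks=count_active_weeks
    )
    mask = (stats["active_weeks"] >= min_active_weeks) & (
        stats["total_qty"] >= min_total_qty
    )
    keep = stats.index[mask]
    return weekly[weekly["product_name"].isin(keep)].reset_index(drop=True)

# Made-up data: product A sells regularly, product B sells once.
toy = pd.DataFrame({
    "product_name": ["A"] * 4 + ["B"] * 4,
    "week": list(pd.date_range("2024-01-01", periods=4, freq="W-MON")) * 2,
    "qty_sold": [10, 0, 20, 10, 0, 0, 5, 0],
})

# Threshold values here are placeholders, not recommendations.
filtered = select_forecastable(toy, min_active_weeks=2, min_total_qty=10)
print(filtered["product_name"].unique().tolist())
```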
