# 03 – Aggregation and Window Features

Capturing Behavioral Patterns Over Time and Groups

## Objective

This notebook focuses on **aggregation and window-based feature engineering**, covering:

- Why aggregations often outperform raw features
- Group-level vs time-window aggregations
- Rolling, cumulative, and trend-based features
- Customer behavior summarization
- Leakage-safe aggregation design

It answers one question:

How do we summarize historical behavior into stable, predictive signals without leaking future information?


## Why Aggregation Features Matter

Single observations are noisy.

Aggregations:
- Reduce variance
- Capture long-term behavior
- Encode consistency and trends
- Align with how businesses reason about customers

Many production ML systems built on customer-level tabular data rely heavily on aggregation features.

## Imports and Dataset

In [10]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns


# Dataset
df = pd.read_csv("../datasets/synthetic_subscription_customer_activity.csv")
df.head()


Unnamed: 0,customer_id,snapshot_date,avg_monthly_usage,support_tickets,satisfaction_level,churn
0,0.0,2022-01-01,69.15,0,3.17,0
1,0.0,2022-02-01,81.23,0,3.64,0
2,0.0,2022-03-01,84.65,2,3.61,0
3,0.0,2022-04-01,74.6,2,3.15,0
4,0.0,2022-05-01,78.87,0,3.22,0


## Step 1 – Dataset Assumptions

We assume **event-level or periodic customer data**, such as:

- Monthly usage records
- Monthly support interactions
- Satisfaction scores over time

Key columns:
- `customer_id`
- `snapshot_date`
- Behavioral measures
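
These assumptions can be checked before any features are built. The following is a minimal sketch on a toy frame; `check_panel` and `REQUIRED_COLUMNS` are hypothetical names introduced for illustration, and the toy rows are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical constant: the key columns the notebook relies on.
REQUIRED_COLUMNS = {"customer_id", "snapshot_date"}


def check_panel(frame: pd.DataFrame) -> None:
    """Raise if the frame violates the assumed customer-by-period layout."""
    missing = REQUIRED_COLUMNS - set(frame.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Each customer should have at most one row per snapshot date.
    dupes = frame.duplicated(subset=["customer_id", "snapshot_date"])
    if dupes.any():
        raise ValueError(f"{int(dupes.sum())} duplicate customer/snapshot rows")


# Toy data for illustration only.
toy = pd.DataFrame({
    "customer_id": [0, 0, 1],
    "snapshot_date": ["2022-01-01", "2022-02-01", "2022-01-01"],
    "avg_monthly_usage": [69.15, 81.23, 40.0],
})
check_panel(toy)  # passes silently on a well-formed frame
```

Failing early on duplicate customer/snapshot rows matters because rolling and cumulative features would otherwise count the same period twice.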



## Step 2 – Ensure Time Ordering

### Datetime Parsing and Sorting

In [7]:
df["snapshot_date"] = pd.to_datetime(df["snapshot_date"])
df = df.sort_values(["customer_id", "snapshot_date"])
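
Every window feature later in the notebook depends on this ordering, so it can be worth asserting it explicitly. A minimal sketch on a deliberately shuffled toy frame (the rows are illustrative, not from the dataset):

```python
import pandas as pd

# Toy rows, deliberately out of order.
toy = pd.DataFrame({
    "customer_id": [1, 0, 0, 1],
    "snapshot_date": ["2022-02-01", "2022-02-01", "2022-01-01", "2022-01-01"],
})
toy["snapshot_date"] = pd.to_datetime(toy["snapshot_date"])
toy = toy.sort_values(["customer_id", "snapshot_date"]).reset_index(drop=True)

# Dates should be non-decreasing within every customer after sorting.
ordered = toy.groupby("customer_id")["snapshot_date"].apply(
    lambda s: s.is_monotonic_increasing
)
assert ordered.all()
```

The `reset_index(drop=True)` call is optional here; the notebook's own cells align on the existing index and do not require it.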


## Step 3 – Simple Group Aggregations

Summarize each customer's behavior over their full history.


### Aggregations per Customer

In [8]:
customer_agg = (
    df.groupby("customer_id")
      .agg(
          avg_usage=("avg_monthly_usage", "mean"),
          max_usage=("avg_monthly_usage", "max"),
          total_support_tickets=("support_tickets", "sum"),
          avg_satisfaction=("satisfaction_level", "mean")
      )
)

customer_agg.head()


customer_id,avg_usage,max_usage,total_support_tickets,avg_satisfaction
0.0,73.628889,85.44,7,3.288889
1.0,32.936111,75.54,4,3.237778
2.0,79.097778,90.85,9,3.797778
3.0,67.541111,85.98,7,3.836667
4.0,41.126667,50.96,8,2.396667


## Step 4 – Rolling Window Features

Rolling windows capture **recent behavior**, often more predictive than lifetime averages.

### 3-Month Rolling Usage

In [11]:
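# Mean of the current and up to two preceding snapshots, computed per customer.
df["usage_3m_avg"] = (
    df.groupby("customer_id")["avg_monthly_usage"]
      .rolling(window=3, min_periods=1)
      .mean()
      # groupby().rolling() prepends customer_id to the index;
      # dropping that level restores alignment with df's original index.
      .reset_index(level=0, drop=True)
)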

df[["customer_id", "snapshot_date", "usage_3m_avg"]].head()


Unnamed: 0,customer_id,snapshot_date,usage_3m_avg
0,0.0,2022-01-01,69.15
1,0.0,2022-02-01,75.19
2,0.0,2022-03-01,78.343333
3,0.0,2022-04-01,80.16
4,0.0,2022-05-01,79.373333
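
The same rolling average can be written with `transform`, which keeps the original index and so avoids the `reset_index` step. The sketch below reuses the five usage values shown for customer 0 and adds a made-up second customer to show that windows do not cross customer boundaries.

```python
import pandas as pd

toy = pd.DataFrame({
    # Customer 0 values come from the output above; customer 1 is invented.
    "customer_id": [0, 0, 0, 0, 0, 1, 1],
    "avg_monthly_usage": [69.15, 81.23, 84.65, 74.60, 78.87, 10.0, 20.0],
})

# transform returns a result aligned to toy's index, one value per input row.
toy["usage_3m_avg"] = (
    toy.groupby("customer_id")["avg_monthly_usage"]
       .transform(lambda s: s.rolling(window=3, min_periods=1).mean())
)
```

Customer 1's first value is 10.0 rather than an average that borrows from customer 0, because the window is evaluated inside each group.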


## Step 5 – Rolling Volatility Features

Volatility often signals instability or dissatisfaction.


### Rolling Std

In [12]:
df["usage_3m_std"] = (
    df.groupby("customer_id")["avg_monthly_usage"]
      .rolling(window=3, min_periods=2)
      .std()
      .reset_index(level=0, drop=True)
)
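
To make the effect of `min_periods=2` concrete, here is a small worked check on the first three usage values of customer 0. It assumes pandas' default sample standard deviation (`ddof=1`).

```python
import math

import pandas as pd

usage = pd.Series([69.15, 81.23, 84.65])  # first three months of customer 0
rolling_std = usage.rolling(window=3, min_periods=2).std()

# Row 0 has a single observation, so min_periods=2 leaves it as NaN.
first_is_missing = math.isnan(rolling_std.iloc[0])

# Row 1 is the sample standard deviation of two points: |difference| / sqrt(2).
expected_second = abs(81.23 - 69.15) / math.sqrt(2)
```

The leading NaN per customer needs a deliberate decision downstream (impute, drop, or leave for a model that handles missing values).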


## Step 6 – Trend Features

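Trends capture the direction of change rather than the level. Here the trend is the month-over-month difference in usage, so the first row for each customer is missing.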

In [13]:
df["usage_trend"] = (
    df.groupby("customer_id")["avg_monthly_usage"]
      .diff()
)

df[["avg_monthly_usage", "usage_trend"]].head()


Unnamed: 0,avg_monthly_usage,usage_trend
0,69.15,
1,81.23,12.08
2,84.65,3.42
3,74.6,-10.05
4,78.87,4.27
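
A single month-over-month difference can be noisy. One possible alternative, not used in the original notebook, is the least-squares slope over a short window. The sketch below assumes a 3-month window; `window_slope` and the `usage_3m_slope` column are hypothetical names.

```python
import numpy as np
import pandas as pd


def window_slope(values: np.ndarray) -> float:
    """Least-squares slope of the values against their position 0..n-1."""
    positions = np.arange(len(values))
    return float(np.polyfit(positions, values, deg=1)[0])


toy = pd.DataFrame({
    "customer_id": [0, 0, 0, 0, 0],
    "avg_monthly_usage": [69.15, 81.23, 84.65, 74.60, 78.87],
})

# rolling(window=3) needs three observations, so the first two rows are NaN.
toy["usage_3m_slope"] = (
    toy.groupby("customer_id")["avg_monthly_usage"]
       .transform(lambda s: s.rolling(window=3).apply(window_slope, raw=True))
)
```

For three equally spaced points the slope reduces to (last − first) / 2, so the third row is (84.65 − 69.15) / 2 = 7.75.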


## Step 7 – Cumulative Features

Cumulative features reflect **long-term exposure**.


In [14]:
df["cumulative_support_tickets"] = (
    df.groupby("customer_id")["support_tickets"]
      .cumsum()
)
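
Note that `cumsum` includes the current row. If the current month's tickets are not yet known at prediction time, a prior-only total may be more appropriate. A minimal sketch, where `prior_support_tickets` is a hypothetical column name and customer 1's rows are invented:

```python
import pandas as pd

toy = pd.DataFrame({
    "customer_id": [0, 0, 0, 0, 0, 1, 1],
    "support_tickets": [0, 0, 2, 2, 0, 1, 3],
})

# Running total including the current month.
toy["cumulative_support_tickets"] = (
    toy.groupby("customer_id")["support_tickets"].cumsum()
)

# Running total of earlier months only: subtract the current month's count.
toy["prior_support_tickets"] = (
    toy["cumulative_support_tickets"] - toy["support_tickets"]
)
```

Each customer's running total restarts at their first row, so customer 1 does not inherit customer 0's tickets.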


## Step 8 – Window Selection Strategy

Window size should reflect:
- Business decision cycle
- Customer behavior frequency
- Data availability

Common windows:
- 3 months (short-term risk)
- 6 months (medium-term)
- 12 months (long-term)
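
If more than one window is wanted, the features can be generated in a loop rather than copied per window. This is a sketch on a toy 12-month history; the `WINDOWS` constant and the `usage_{n}m_avg` naming pattern are assumptions that extend the notebook's `usage_3m_avg` column.

```python
import pandas as pd

WINDOWS = (3, 6, 12)  # months, matching the common windows listed above

# Toy history: one customer with usage 1.0, 2.0, ..., 12.0.
toy = pd.DataFrame({
    "customer_id": [0] * 12,
    "avg_monthly_usage": [float(v) for v in range(1, 13)],
})

usage = toy.groupby("customer_id")["avg_monthly_usage"]
for months in WINDOWS:
    # Bind the window size as a default argument so each lambda keeps its own value.
    toy[f"usage_{months}m_avg"] = usage.transform(
        lambda s, w=months: s.rolling(window=w, min_periods=1).mean()
    )
```

Keeping the window list short and explicit makes it easier to justify each window against the business decision cycle.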

## Step 9 – Leakage Guardrails

Aggregation features must:
- Use only past data
- Respect snapshot cut-off dates
- Avoid future outcome information

Never aggregate **after** the prediction point.
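
One way to enforce these guardrails is to filter on an explicit cut-off date before aggregating. The sketch below uses a hypothetical helper, `aggregate_before_cutoff`, and assumes that a snapshot dated on the cut-off is already available at prediction time. If it is not, the comparison should be strict (`<`).

```python
import pandas as pd


def aggregate_before_cutoff(frame: pd.DataFrame, cutoff: str) -> pd.DataFrame:
    """Per-customer aggregates using only snapshots on or before the cut-off."""
    history = frame[frame["snapshot_date"] <= pd.Timestamp(cutoff)]
    return history.groupby("customer_id").agg(
        avg_usage=("avg_monthly_usage", "mean"),
        total_support_tickets=("support_tickets", "sum"),
    )


# First four months of customer 0, as shown in the dataset preview.
toy = pd.DataFrame({
    "customer_id": [0, 0, 0, 0],
    "snapshot_date": pd.to_datetime(
        ["2022-01-01", "2022-02-01", "2022-03-01", "2022-04-01"]
    ),
    "avg_monthly_usage": [69.15, 81.23, 84.65, 74.60],
    "support_tickets": [0, 0, 2, 2],
})

features = aggregate_before_cutoff(toy, "2022-02-01")
```

With a cut-off of 2022-02-01, the two tickets raised in March and April are excluded, so `total_support_tickets` is 0 rather than the full-history 4.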


## Step 10 – Feature Consolidation

At prediction time, features must be **collapsed to one row per customer**.

### Last Snapshot per Customer

In [15]:
snapshot_features = (
    df.sort_values("snapshot_date")
      .groupby("customer_id")
      .tail(1)
)

snapshot_features.head()


Unnamed: 0,customer_id,snapshot_date,avg_monthly_usage,support_tickets,satisfaction_level,churn,usage_3m_avg,usage_3m_std,usage_trend,cumulative_support_tickets
14219,789.0,2023-06-01,40.69,0,3.18,0,36.086667,7.174555,0.94,5
13949,774.0,2023-06-01,79.81,0,4.21,0,77.026667,2.525912,3.42,6
14327,795.0,2023-06-01,56.27,0,2.72,0,57.96,3.333482,0.46,4
13967,775.0,2023-06-01,13.86,0,3.61,0,14.27,2.08545,-2.67,1
14057,780.0,2023-06-01,97.25,1,2.66,0,98.6,1.678958,-0.82,5
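
Since the consolidated table must hold exactly one row per customer, that property can be asserted directly. The sketch below uses `drop_duplicates(keep="last")` as an equivalent to `groupby(...).tail(1)` on a toy frame with invented rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "customer_id": [0, 0, 1, 1, 1],
    "snapshot_date": pd.to_datetime(
        ["2022-01-01", "2022-02-01", "2022-01-01", "2022-02-01", "2022-03-01"]
    ),
    "usage_3m_avg": [69.15, 75.19, 10.0, 15.0, 20.0],
})

latest = (
    toy.sort_values(["customer_id", "snapshot_date"])
       .drop_duplicates(subset="customer_id", keep="last")
       .set_index("customer_id")
)

# One row per customer, holding that customer's most recent snapshot.
assert latest.index.is_unique
```

Customers may end on different dates (here 2022-02-01 and 2022-03-01), so it is worth confirming that the last snapshot is the intended prediction point for each of them.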


## Step 11 – Signal Sanity Checks

### Summary Statistics Check

In [16]:
aggregation_features = [
    "usage_3m_avg",
    "usage_3m_std",
    "usage_trend",
    "cumulative_support_tickets"
]

snapshot_features[aggregation_features].describe()


Unnamed: 0,usage_3m_avg,usage_3m_std,usage_trend,cumulative_support_tickets
count,800.0,800.0,800.0,800.0
mean,64.327792,6.065355,-0.029675,4.15375
std,37.444434,4.54542,9.164862,2.859691
min,5.0,0.0,-33.19,0.0
25%,36.72,2.698846,-4.77,2.0
50%,60.775,4.828168,0.0,4.0
75%,85.7625,8.268639,4.8625,6.0
max,247.373333,27.675847,43.1,16.0
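
The summary statistics above describe each feature's distribution but not its relationship to the target. A correlation against `churn` is one simple complement. The sketch below runs on invented toy values, so the resulting numbers say nothing about the real dataset.

```python
import pandas as pd

# Invented values for illustration only.
toy = pd.DataFrame({
    "usage_3m_avg": [80.0, 75.0, 20.0, 15.0, 60.0, 10.0],
    "cumulative_support_tickets": [1, 2, 6, 7, 2, 8],
    "churn": [0, 0, 1, 1, 0, 1],
})
feature_columns = ["usage_3m_avg", "cumulative_support_tickets"]

# Pearson correlation of each feature with the binary target.
target_corr = toy[feature_columns].corrwith(toy["churn"])
```

Pearson correlation only captures linear association, so a weak value does not by itself mean a feature is useless to a non-linear model.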


## Common Mistakes (Avoided)

- Using full-history aggregations for early predictions
- Mixing future and past data
- Ignoring window alignment
- Over-engineering too many windows


## Summary Table

| Feature Type | Example |
|-------------|--------|
| Group average | Mean usage |
| Rolling window | 3-month average |
| Volatility | Rolling std |
| Trend | Month-over-month delta |
| Cumulative | Total support tickets |


## Key Takeaways

- Aggregations reduce noise and improve stability
- Recent behavior often matters most
- Trends capture direction, not just level
- Leakage control is critical
- Aggregation design is a business decision


## Next Notebook

03_Feature_Engineering/

└── [04_feature_transformation_and_encoding.ipynb](04_feature_transformation_and_encoding.ipynb)
