# **Data Transformation**

## **11. Window Functions (Basic)**

In [1]:
import numpy as np 
import pandas as pd 

### 1. **What It Does and When to Use It**

#### ✅ **What it does:**

Window functions in pandas let you compute a value for each row from a **window of neighboring rows** (sliding, expanding, or exponentially weighted), often with respect to a specific **grouping** or **order**. They **retain the original row structure**: the result has one value per input row instead of one per group.

#### ✅ **When to use:**

* When you need **running totals**, **moving averages**, or **ranking**.
* When comparing values **relative to neighbors** (previous or next rows).
* When analyzing **time-series**, **grouped data**, or **trends** over a span.
* When aggregation is needed **without collapsing** the rows like `groupby().agg()` does.
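The last point can be illustrated with a minimal sketch that contrasts a collapsing aggregation with a row-preserving one. The `store`/`sales` frame below is made-up illustration data, not part of the examples later in this section.

```python
import pandas as pd

# Hypothetical illustration data: three sales per store
sales = pd.DataFrame({
    'store': ['A', 'A', 'A', 'B', 'B', 'B'],
    'sales': [10, 20, 30, 5, 15, 25],
})

# groupby().agg() collapses each group to a single row
collapsed = sales.groupby('store')['sales'].agg('sum')

# A grouped cumulative sum keeps one value per original row
sales['running_total'] = sales.groupby('store')['sales'].cumsum()

print(len(collapsed))  # one entry per store
print(len(sales))      # all original rows retained
```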


### 2. **Syntax and Core Parameters**

#### ✅ **Key methods**:

```python
df.rolling(window, min_periods=None, center=False, win_type=None).agg(func)
df.expanding(min_periods=1).agg(func)
df.ewm(span=span).agg(func)  # pass span by keyword; the first positional argument is com
```

| Function       | Description                              |
| -------------- | ---------------------------------------- |
| `.rolling()`   | Applies operations over a moving window  |
| `.expanding()` | Applies operations over all rows seen so far |
| `.ewm()`       | Exponentially weighted calculations      |

#### ✅ Common Parameters:

| Parameter     | Purpose                                                                                  |
| ------------- | ---------------------------------------------------------------------------------------- |
| `window`      | Number of rows in the window (e.g., `3`), or a time offset such as `'7D'` for datetime-like data |
| `min_periods` | Minimum non-null observations needed to produce a value; otherwise the result is `NaN`   |
| `center`      | Whether the label should be at the center of the window (default: right edge)            |
| `win_type`    | Window weighting type, e.g., `'boxcar'`, `'triang'` (requires SciPy)                     |
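As a rough sketch of how `min_periods` and `center` change the result, the snippet below applies three variants of a 3-row rolling mean to the same series. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([100, 120, 130, 150, 170, 180])

# Default: min_periods equals the window size, so the first two results are NaN
default = s.rolling(window=3).mean()

# min_periods=1 emits a value as soon as one observation is available
partial = s.rolling(window=3, min_periods=1).mean()

# center=True labels each result at the middle row of its window,
# so the NaNs move to the first and last rows
centered = s.rolling(window=3, center=True).mean()

print(pd.DataFrame({'default': default, 'partial': partial, 'centered': centered}))
```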


### 3. **Different Methods and Techniques**

| Method         | Description                         | Common Aggregations                               |
| -------------- | ----------------------------------- | ------------------------------------------------- |
| `.rolling()`   | Fixed-size sliding window           | `.mean()`, `.sum()`, `.std()`, `.min()`, `.max()` |
| `.expanding()` | Expands window from the start row   | `.mean()`, `.sum()`                               |
| `.ewm()`       | Exponentially weighted average      | `.mean()`, `.var()`                               |


### 4. **Common Pitfalls and Best Practices**

| Pitfall                                                                | Best Practice                                                            |
| ---------------------------------------------------------------------- | ------------------------------------------------------------------------ |
| NaNs appear at beginning (due to insufficient window size)             | Use `min_periods=1` if partial data is acceptable                        |
| Incorrect sorting causes invalid results                               | Always sort time-based data before using window functions                |
| Using `.rolling()` on non-numeric columns                              | Ensure window is applied to numeric data only                            |
| Assigning `.groupby().rolling()` output directly to a new column       | Result gains a group-key index level; drop it or use `transform()`       |
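The sorting pitfall is easy to reproduce. The sketch below uses hypothetical daily sales that arrive out of chronological order and compares a rolling mean before and after sorting.

```python
import pandas as pd

# Hypothetical daily sales that arrive out of chronological order
raw = pd.DataFrame({
    'day': pd.to_datetime(['2023-01-03', '2023-01-01', '2023-01-02', '2023-01-04']),
    'sales': [130, 100, 120, 150],
})

# Rolling over the unsorted rows pairs days in arrival order, not date order
unsorted_avg = raw['sales'].rolling(window=2).mean()

# Sorting by date first makes each window cover consecutive days
ordered = raw.sort_values('day').reset_index(drop=True)
ordered['2day_avg'] = ordered['sales'].rolling(window=2).mean()

print(unsorted_avg.tolist())
print(ordered)
```

The two results differ even though the underlying data is identical, which is why sorting belongs before any window call on time-based data.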


### 5. **Examples on Real/Pseudo Data**

#### 📌 Example 1: Rolling Mean (Moving Average)

In [2]:
df = pd.DataFrame({
    'day': pd.date_range('2023-01-01', periods=6),
    'sales': [100, 120, 130, 150, 170, 180]
})

df

|     | day        | sales |
| --- | ---------- | ----- |
| 0   | 2023-01-01 | 100   |
| 1   | 2023-01-02 | 120   |
| 2   | 2023-01-03 | 130   |
| 3   | 2023-01-04 | 150   |
| 4   | 2023-01-05 | 170   |
| 5   | 2023-01-06 | 180   |


In [4]:
df['3day_avg'] = df['sales'].rolling(window=3).mean()
df

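|     | day        | sales | 3day_avg   |
| --- | ---------- | ----- | ---------- |
| 0   | 2023-01-01 | 100   | NaN        |
| 1   | 2023-01-02 | 120   | NaN        |
| 2   | 2023-01-03 | 130   | 116.666667 |
| 3   | 2023-01-04 | 150   | 133.333333 |
| 4   | 2023-01-05 | 170   | 150.0      |
| 5   | 2023-01-06 | 180   | 166.666667 |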


#### 📌 Example 2: Expanding Sum

In [5]:
df['expanding_sum'] = df['sales'].expanding().sum()
df

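|     | day        | sales | 3day_avg   | expanding_sum |
| --- | ---------- | ----- | ---------- | ------------- |
| 0   | 2023-01-01 | 100   | NaN        | 100.0         |
| 1   | 2023-01-02 | 120   | NaN        | 220.0         |
| 2   | 2023-01-03 | 130   | 116.666667 | 350.0         |
| 3   | 2023-01-04 | 150   | 133.333333 | 500.0         |
| 4   | 2023-01-05 | 170   | 150.0      | 670.0         |
| 5   | 2023-01-06 | 180   | 166.666667 | 850.0         |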
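For data without missing values, an expanding sum gives the same numbers as a cumulative sum. The short check below, using the same sales figures, is a sketch of that relationship and also shows an expanding mean as a running average.

```python
import pandas as pd

sales = pd.Series([100, 120, 130, 150, 170, 180])

expanding_sum = sales.expanding().sum()  # float64 result
cumulative = sales.cumsum()              # keeps the integer dtype

# The values agree row by row; only the dtype differs
print((expanding_sum == cumulative).all())

# An expanding mean is the running average of everything seen so far
running_avg = sales.expanding().mean()
print(running_avg.iloc[-1])  # mean of all six values
```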


#### 📌 Example 3: Exponentially Weighted Moving Average (EWMA)

In [6]:
df['ewm_mean'] = df['sales'].ewm(span=3, adjust=False).mean()
df

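|     | day        | sales | 3day_avg   | expanding_sum | ewm_mean |
| --- | ---------- | ----- | ---------- | ------------- | -------- |
| 0   | 2023-01-01 | 100   | NaN        | 100.0         | 100.0    |
| 1   | 2023-01-02 | 120   | NaN        | 220.0         | 110.0    |
| 2   | 2023-01-03 | 130   | 116.666667 | 350.0         | 120.0    |
| 3   | 2023-01-04 | 150   | 133.333333 | 500.0         | 135.0    |
| 4   | 2023-01-05 | 170   | 150.0      | 670.0         | 152.5    |
| 5   | 2023-01-06 | 180   | 166.666667 | 850.0         | 166.25   |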
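The `ewm_mean` values can be reproduced by hand. With `adjust=False`, pandas uses the recursion `y[0] = x[0]`, `y[t] = (1 - alpha) * y[t-1] + alpha * x[t]`, where `alpha = 2 / (span + 1)`. The sketch below spells that out for `span=3` and compares it with the pandas result. The variable names are illustrative.

```python
import pandas as pd

sales = [100, 120, 130, 150, 170, 180]
span = 3
alpha = 2 / (span + 1)  # 0.5 for span=3

# Manual recursion for adjust=False
manual = [float(sales[0])]
for x in sales[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

pandas_result = pd.Series(sales).ewm(span=span, adjust=False).mean().tolist()

print(manual)         # [100.0, 110.0, 120.0, 135.0, 152.5, 166.25]
print(pandas_result)
```

Each new value is halfway between the previous smoothed value and the new observation, which is why recent data dominates the result.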


#### 📌 Example 4: Group-wise Rolling Average

In [7]:
df = pd.DataFrame({
    'store': ['A', 'A', 'A', 'B', 'B', 'B'],
    'sales': [10, 20, 30, 5, 15, 25]
})
df

|     | store | sales |
| --- | ----- | ----- |
| 0   | A     | 10    |
| 1   | A     | 20    |
| 2   | A     | 30    |
| 3   | B     | 5     |
| 4   | B     | 15    |
| 5   | B     | 25    |


In [8]:
df['rolling_avg'] = df.groupby('store')['sales'].rolling(2).mean().reset_index(level=0, drop=True)  # drop only the group level so values align on the original index
df

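|     | store | sales | rolling_avg |
| --- | ----- | ----- | ----------- |
| 0   | A     | 10    | NaN         |
| 1   | A     | 20    | 15.0        |
| 2   | A     | 30    | 25.0        |
| 3   | B     | 5     | NaN         |
| 4   | B     | 15    | 10.0        |
| 5   | B     | 25    | 20.0        |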
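When rows from different groups are interleaved rather than contiguous, one option is `transform()`, which returns values aligned to the original index. The sketch below uses a hypothetical reordered version of the same store data.

```python
import pandas as pd

# Hypothetical frame where the stores are interleaved rather than contiguous
mixed = pd.DataFrame({
    'store': ['A', 'B', 'A', 'B', 'A', 'B'],
    'sales': [10, 5, 20, 15, 30, 25],
})

# transform() returns a result aligned to the original index,
# so it can be assigned directly without touching the index
mixed['rolling_avg'] = (
    mixed.groupby('store')['sales']
         .transform(lambda s: s.rolling(2).mean())
)
print(mixed)
```

Each store's first row is `NaN` because its 2-row window is not yet full, and the remaining values match the per-store averages from the example above.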


### 6. **Real World Use Cases**

| Use Case                    | Description                                                               |
| --------------------------- | ------------------------------------------------------------------------- |
| **Stock market analysis**   | Rolling averages to smooth out noise in price data                        |
| **Sales forecasting**       | Moving averages to project future trends from past sales                  |
| **Anomaly detection**       | Compare current vs rolling stats to detect outliers                       |
| **Sensor data analysis**    | Detect spikes or trends using sliding windows                             |
| **Customer purchase trend** | Group by customer and apply window functions to track behavior            |
| **Credit scoring**          | Use expanding average of late payments or spending for behavioral scoring |
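As one possible sketch of the anomaly-detection use case, the snippet below flags readings that sit far from the rolling statistics of the preceding rows. The sensor values, the window of 4, and the threshold of 3 are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sensor readings with one obvious spike
readings = pd.Series([10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 10.1, 9.8])

# Statistics of the previous 4 readings; shift(1) excludes the current value
baseline_mean = readings.rolling(window=4).mean().shift(1)
baseline_std = readings.rolling(window=4).std().shift(1)

# Distance of each reading from its baseline, in standard deviations
z_score = (readings - baseline_mean) / baseline_std
is_outlier = z_score.abs() > 3  # threshold of 3 is an arbitrary choice here

print(readings[is_outlier])
```

Shifting the baseline by one row keeps the spike from inflating its own mean and standard deviation. Rows without a full baseline produce `NaN` scores and are not flagged.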


### ✅ Summary Table

| Function       | Use Case Example           | Notes                          |
| -------------- | -------------------------- | ------------------------------ |
| `.rolling()`   | 7-day avg temperature      | Fixed window                   |
| `.expanding()` | Cumulative sales or sum    | Expanding window               |
| `.ewm()`       | Smoothed customer behavior | More recent data weighted more |

