# Module 1.12: Plotting — Seeing the Patterns

> **Goal:** Build visual intuition for what's in your data.

Summary statistics lie. A mean of 10 could be steady demand, or wild swings between 0 and 100. The only way to know is to **look**.
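
To make that concrete, here is a small NumPy sketch with two made-up 10-week series. Both average 10 units a week, but one is constant and the other is nine zeros followed by a single 100.

```python
import numpy as np

# Two hypothetical 10-week demand series with the same mean
steady = np.full(10, 10.0)                 # 10 units every week
spiky = np.array([0.0] * 9 + [100.0])      # nothing for 9 weeks, then 100

for name, y in [("steady", steady), ("spiky", spiky)]:
    # Same mean, very different spread and zero count
    print(f"{name:>6}: mean={y.mean():.1f}  std={y.std():.1f}  zeros={int((y == 0).sum())}")
```

The means match, but the standard deviations are 0 and 30. Only the spiky series has zeros, and a line plot shows the difference instantly.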

This module is about seeing patterns—not measuring them yet, just recognizing what's there.

| This Module | Next Module (1.13) |
|-------------|--------------------|
| What do I see? | How strong is it? |
| Build intuition | Quantify patterns |
| Explore | Decide |

---

## 1. Setup and Load Data

```python
# =============================================================================
# SETUP
# =============================================================================

# --- Imports ---
import sys
import warnings
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# --- Path Configuration (before local imports) ---
MODULE_DIR = Path().resolve()
PROJECT_ROOT = MODULE_DIR.parent.parent
sys.path.insert(0, str(PROJECT_ROOT))

# --- Local Imports ---
import tsforge as tsf
from src import (
    CacheManager,
    ArtifactManager,
    get_notebook_name
)

# --- Settings ---
warnings.filterwarnings("ignore")
plt.style.use("seaborn-v0_8-whitegrid")

# --- Paths ---
DATA_DIR = PROJECT_ROOT / "data"
DATA_DIR.mkdir(exist_ok=True)

# --- Managers ---
NB_NAME = get_notebook_name()  # e.g., '1_06_first_contact'
cache = CacheManager(PROJECT_ROOT / ".cache" / NB_NAME)
artifacts = ArtifactManager(PROJECT_ROOT / "artifacts")

print(f"✓ Setup complete | Root: {PROJECT_ROOT.name} | Module: {NB_NAME[:4]}")
```

```
✓ Setup complete | Root: real-world-forecasting-foundations | Module: 1_11
```


```python
# --- Load Data from Module 1.08 ---
df = artifacts.load('1.08')
```

```
✓ Loaded '1.08' from 01_foundations/
   Shape: 6,848,887 × 20
```


---
## 2. The Plot-Question Framework

Before making a plot, ask: **What question am I trying to answer?**

| I want to know... | Use this plot | Function |
|-------------------|---------------|----------|
| Overall pattern | Line plot | `plot_timeseries()` |
| Is there a trend? | Smoothed line | `plot_timeseries(smooth_window=)` |
| Category differences | Grouped plots | `plot_timeseries(group_col=)` |
| Annual seasonality | Seasonal (week-of-year) | `plot_seasonal(freq='W')` |
| Monthly patterns | Seasonal (month) | `plot_seasonal(freq='M')` |
| How sparse is demand? | Intermittency scatter (ADI vs CV²) | `plot_intermittency(kind='scatter')` |
| Which values are common? | Histogram | `plot_distribution(kind='histogram')` |
| Demand persistence | ACF | `plot_autocorrelation()` (Module 1.13) |
| Signal vs noise | Decomposition | `plot_decomposition()` (Module 1.13) |

**Note:** We're working with weekly data, so we can see annual cycles (about 52 weeks per year) but not day-of-week patterns, which weekly aggregation hides.
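
Here is a quick pandas sketch of how weekly data hides day-of-week effects. It uses made-up daily numbers: 10 units on weekdays and 20 on weekends. Once these are summed into Sunday-ending weeks, every week totals the same 90 units and the weekend bump is gone. The `W-SUN` week boundary is an assumption for illustration, so check how your own data defines a week.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales: 10 units Mon-Fri, 20 units Sat-Sun, 4 full weeks
days = pd.date_range("2024-01-01", periods=28, freq="D")   # Mon 1 Jan .. Sun 28 Jan 2024
daily = pd.Series(np.where(days.dayofweek >= 5, 20, 10), index=days)

# Sum into weeks ending Sunday: the weekday/weekend contrast disappears
weekly = daily.resample("W-SUN").sum()
print(weekly)
```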

**Golden rule:** "Different plots answer different questions."

---

## 3. The Shape of Demand

### 3.1 Plot the Time Series

Start with the simplest question: **What does demand look like over time?**

```python
# Basic time series plot - sample 3 random series
tsf.plot_timeseries(
    df=df,
    id_col='unique_id',
    date_col='ds',
    value_col='y',
    ids=3,
    mode='dropdown'
)
```

### What to notice:

- **Level shifts** — Does the series jump up or down permanently? (new product launch, store closure)
- **Spikes** — One-off events that don't repeat (promotions, stockouts, data errors)
- **Rhythm** — Regular ups and downs that repeat (seasonality)
- **Drift** — Gradual movement up or down over time (trend)
- **Flatlines** — Long periods of zeros (intermittent demand)
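
Flatlines are easy to miss when you page through only a few series by eye. One quick screen is the longest run of consecutive zero weeks in each series. The sketch below uses this notebook's `unique_id` / `ds` / `y` column names on a toy panel. The helper `longest_zero_run` is hypothetical and not part of `tsforge`.

```python
import pandas as pd

def longest_zero_run(y: pd.Series) -> int:
    """Length of the longest streak of consecutive zeros in a series."""
    best = run = 0
    for value in y:
        run = run + 1 if value == 0 else 0   # extend or reset the current streak
        best = max(best, run)
    return best

# Toy panel in the same long format as df (unique_id, ds, y)
toy = pd.DataFrame({
    "unique_id": ["A"] * 8 + ["B"] * 8,
    "ds": list(pd.date_range("2024-01-07", periods=8, freq="W")) * 2,
    "y": [3, 0, 0, 0, 2, 0, 0, 5,   4, 5, 6, 4, 5, 6, 4, 5],
})

# Sort by date within each series so "consecutive" means consecutive weeks
flatlines = (
    toy.sort_values(["unique_id", "ds"])
       .groupby("unique_id")["y"]
       .apply(longest_zero_run)
)
print(flatlines)
```

The same `groupby(...).apply(longest_zero_run)` works on `df`. Because the loop is plain Python, expect it to take a while on millions of rows.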

### 3.2 Find the Trend

Add a moving average to highlight the underlying trend:

```python
# With smoothing to see trends more clearly
tsf.plot_timeseries(
    df=df,
    id_col='unique_id',
    date_col='ds',
    value_col='y',
    ids=3,
    mode='dropdown',
    smooth_window=4  # 4-week moving average
)
```

### What to notice:

- **Upward slope** → Growing demand. A model with no trend term will lag behind and underforecast.
- **Downward slope** → Declining demand. Product end-of-life?
- **Flat** → Stable. Simpler to forecast.
- **Kinked** → Trend changed direction. When? Why?

**The question:** Is the trend strong and persistent enough to need a trend component, or would modeling it just chase noise?
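
A 4-week moving average is the mean of each week and the three weeks before it, computed separately within each series. The pandas sketch below reproduces that on a toy panel. It assumes a trailing window. This section does not say whether `smooth_window` in `tsforge` uses a trailing or a centered window, so treat this as an illustration rather than the library's exact computation.

```python
import pandas as pd

toy = pd.DataFrame({
    "unique_id": ["A"] * 6 + ["B"] * 6,
    "ds": list(pd.date_range("2024-01-07", periods=6, freq="W")) * 2,
    "y": [1, 2, 3, 4, 5, 6,   10, 10, 10, 10, 10, 10],
})

# Trailing 4-week mean within each series (first 3 weeks have no full window -> NaN)
toy = toy.sort_values(["unique_id", "ds"])
toy["y_ma4"] = toy.groupby("unique_id")["y"].transform(lambda s: s.rolling(4).mean())
print(toy)
```

A trailing window only uses past weeks, so it lags turning points by a couple of weeks. A centered window (`rolling(4, center=True)`) lines up better with the raw series but uses future values.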

---

## 4. Zooming Out: Do Patterns Hold?

You saw something in one series—does it persist when you aggregate or compare groups?

### 4.1 Aggregating Up

What does total demand look like? Summing across series averages out much of the item-level noise, so a pattern that survives aggregation is more likely to be real.

```python
# Aggregate to total demand per week
total = df.groupby('ds', as_index=False)['y'].sum()
total['unique_id'] = 'TOTAL'

tsf.plot_timeseries(total, id_col='unique_id', date_col='ds', value_col='y', ids=1)
```

### What to notice:

- **Cleaner pattern** → Signal is real, noise cancelled out
- **Pattern disappears** → What you saw was noise or series-specific
- **New patterns emerge** → Hidden at item level, visible in aggregate
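
Why does aggregation help? If item-level noise is roughly independent across series, it partly cancels in the sum while a shared pattern adds up. The simulation below is entirely synthetic: 200 hypothetical items, each drawing Poisson demand around a common annual cycle. It compares how closely a single item and the total track that true cycle.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_weeks = 200, 156                        # 3 years of weekly data
t = np.arange(n_weeks)
signal = 1 + 0.3 * np.sin(2 * np.pi * t / 52)      # shared annual cycle

# Each item: Poisson demand around 3 x signal, with independent noise per item
lam = np.tile(3 * signal, (n_items, 1))            # shape (n_items, n_weeks)
sim_items = rng.poisson(lam)
sim_total = sim_items.sum(axis=0)                  # aggregate across items

corr_single = np.corrcoef(sim_items[0], signal)[0, 1]
corr_total = np.corrcoef(sim_total, signal)[0, 1]
print(f"one item vs signal: r = {corr_single:.2f}")
print(f"total vs signal:    r = {corr_total:.2f}")
```

Real items are not fully independent. Shocks that hit many items at once, such as a holiday, do not cancel, and that is exactly why they stand out in the aggregate.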

### 4.2 Comparing Groups

Do categories behave differently? Should you model them separately?

```python
tsf.plot_timeseries(
    df=df,
    id_col='unique_id',
    date_col='ds',
    value_col='y',
    group_col='cat_id',
    agg='mean'
)
```

### What to notice:

- **Different seasonality** → Model separately or add category features
- **One trends up, another down** → Definitely separate treatment
- **All move together** → Shared driver, global model may work

**Modeling implication:** Similar patterns → can share model parameters. Different patterns → may need separate models.
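
The `agg='mean'` choice matters. With a sum, a category with more items looks bigger even if each item sells the same amount. The sketch below is a rough pandas equivalent of averaging per category and week. It is an assumption about what the grouped plot aggregates, not `tsforge`'s code. It adds a correlation check for whether categories move together, using a toy panel with two FOODS items and one HOBBIES item.

```python
import pandas as pd

# Toy panel: two FOODS items, one HOBBIES item, four weeks
toy = pd.DataFrame({
    "unique_id": ["a1"] * 4 + ["a2"] * 4 + ["b1"] * 4,
    "cat_id": ["FOODS"] * 8 + ["HOBBIES"] * 4,
    "ds": list(pd.date_range("2024-01-07", periods=4, freq="W")) * 3,
    "y": [2, 4, 6, 8,   4, 6, 8, 10,   9, 7, 5, 3],
})

# Average item demand per category per week (one column per category)
cat_mean = toy.groupby(["ds", "cat_id"])["y"].mean().unstack("cat_id")

# Near +1: categories move together; near 0 or negative: they don't
print(cat_mean)
print(cat_mean.corr())
```

Here FOODS trends up while HOBBIES trends down, so the correlation is -1. In the terms above, that is a case for separate treatment.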

---

## 5. Seasonality: The Repeating Rhythm

Overlay years to see if the same weeks behave the same way.

```python
tsf.plot_seasonal(df, id_col='unique_id', date_col='ds', value_col='y', ids=1, freq='W')
```

### What to notice:

- **Peaks at the same week each year** → True seasonality
- **Peaks that move around** → Not seasonality—probably promotions or events
- **December spike** → Holiday effect, common in retail
- **Flat line across weeks** → No annual pattern
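
"Overlaying years" amounts to a pivot with one row per week of the year and one column per year. The sketch below builds that table with pandas on two years of synthetic weekly data that spikes in ISO week 50. Using ISO week numbering is an assumption here, and `plot_seasonal` may number weeks differently.

```python
import numpy as np
import pandas as pd

# Toy weekly series: Sundays from Jan 2022 to Dec 2023, spike in ISO week 50
toy = pd.DataFrame({"ds": pd.date_range("2022-01-09", periods=104, freq="W")})
iso = toy["ds"].dt.isocalendar()                  # columns: year, week, day
toy["year"] = iso["year"].astype(int)
toy["week"] = iso["week"].astype(int)
toy["y"] = np.where(toy["week"] == 50, 100, 10)

# Rows = week of year, columns = year: the "overlay" a seasonal plot draws
overlay = toy.pivot_table(index="week", columns="year", values="y", aggfunc="sum")

# Same peak week in every year -> seasonality; moving peak -> events/promotions
peak_week_by_year = overlay.idxmax()
print(peak_week_by_year)
```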

---

## 6. The Zero Problem

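Retail data is full of zeros, and how you treat them shapes which models and error metrics make sense. The Syntetos-Boylan scheme places each series on two axes: **ADI** (average demand interval, the average number of periods per nonzero demand) and **CV²** (the squared coefficient of variation of the nonzero demand sizes).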

```python
# CV² vs ADI scatter plot with Syntetos-Boylan quadrants
tsf.plot_intermittency(
    df=df,
    id_col='unique_id',
    date_col='ds',
    value_col='y',
    kind='scatter',
    show_thresholds=True,
    style={'title': 'Intermittency Classification (Syntetos-Boylan)'}
)
```

### The four quadrants:

| ADI (periods per nonzero demand) | CV² (variability of nonzero sizes) | Type | What it looks like |
|----------------------------------|------------------------------------|------|--------------------|
| Low (< 1.32) | Low (< 0.49) | **Smooth** | Regular, predictable demand |
| Low (< 1.32) | High (≥ 0.49) | **Erratic** | Frequent but wild swings |
| High (≥ 1.32) | Low (< 0.49) | **Intermittent** | Sparse but consistent when it happens |
| High (≥ 1.32) | High (≥ 0.49) | **Lumpy** | Sparse AND unpredictable: the hardest case |
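
The classification above can be sketched in a few lines of pandas, using the standard Syntetos-Boylan cut-offs (ADI 1.32, CV² 0.49). Some assumptions: ADI is computed as total periods divided by nonzero periods (one common variant), and CV² uses the population standard deviation (`ddof=0`). `classify_demand` is a hypothetical helper, not a `tsforge` function, and the example series are made up to land in each quadrant.

```python
import pandas as pd

# Standard Syntetos-Boylan cut-offs
ADI_CUTOFF = 1.32
CV2_CUTOFF = 0.49

def classify_demand(y: pd.Series) -> tuple[float, float, str]:
    """Return (ADI, CV², class) for one series of demand values."""
    nonzero = y[y > 0]
    if nonzero.empty:
        return float("inf"), float("nan"), "no demand"
    adi = len(y) / len(nonzero)                                 # periods per nonzero demand
    cv2 = float((nonzero.std(ddof=0) / nonzero.mean()) ** 2)    # spread of nonzero sizes
    if adi < ADI_CUTOFF:
        label = "smooth" if cv2 < CV2_CUTOFF else "erratic"
    else:
        label = "intermittent" if cv2 < CV2_CUTOFF else "lumpy"
    return adi, cv2, label

examples = {
    "smooth":       pd.Series([5, 6, 5, 4, 5, 6]),
    "erratic":      pd.Series([1, 20, 2, 25, 1, 30]),
    "intermittent": pd.Series([0, 0, 10, 0, 0, 10, 0, 0, 10]),
    "lumpy":        pd.Series([0, 0, 1, 0, 0, 30, 0, 0, 2]),
}
for name, y in examples.items():
    adi, cv2, label = classify_demand(y)
    print(f"{name:>12}: ADI={adi:.2f}  CV²={cv2:.2f}  -> {label}")
```

On the full panel, `df.groupby('unique_id')['y'].apply(classify_demand)` returns one `(ADI, CV², class)` tuple per series. Counting the classes gives a first answer to how many series are smooth versus lumpy.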

---

## 7. Distribution: What Values Are Common?

The shape of the distribution flags outliers and hints at whether a transformation (such as a log) would help.

```python
# Histogram view of value distributions (with dropdown selector)
tsf.plot_distribution(
    df=df,
    id_col='unique_id',
    date_col='ds',
    value_col='y',
    ids=4,
    kind='histogram',
    mode='dropdown',
    exclude_zeros=False,  # Include zeros to see full distribution
    bins=30,
    style={'title': 'Value Distributions by Series'}
)
```

### What to notice:

- **Spike at zero** → Intermittent demand
- **Long right tail** → A few weeks have huge demand—outliers or real?
- **Multiple peaks** → Mixture of behaviors (promo vs non-promo?)
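
Two numbers summarize the first two shapes: the share of zero weeks (the spike) and the skewness (the tail). The pandas sketch below uses one made-up series. It also shows how `log1p`, which computes log(1 + y), pulls in the right tail while keeping zeros at 0. A plain log would turn zeros into -inf. Whether a log transform suits your model is a judgment call, not something this check decides.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly sales for one item: several zeros and one huge week
y = pd.Series([0, 0, 0, 1, 1, 2, 2, 3, 5, 8, 13, 40])

zero_share = (y == 0).mean()            # height of the spike at zero
raw_skew = y.skew()                     # > 0 means a long right tail
log_skew = np.log1p(y).skew()           # log(1 + y) keeps zeros at 0

print(f"zero share: {zero_share:.0%}")
print(f"skew raw: {raw_skew:.2f}  skew log1p: {log_skew:.2f}")
```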

---

## 8. Key Takeaways

### 8.1 Pattern Observations

| Pattern | What I Saw |
|---------|------------|
| **Trend** | Slight upward drift in most series |
| **Seasonality** | December peaks visible; some categories have summer dips |
| **Intermittency** | Many series have long flat periods at zero |
| **Distribution** | Right-skewed with spike at zero |
| **Group differences** | FOODS steadier than HOBBIES |
| **Anomalies** | Some unexplained spikes—promotions? |

### 8.2 Open Questions for 1.13

- How strong is the trend? Strong enough to model, or just noise?
- How much of the series is explained by seasonality vs residual?
- How far back does demand "remember"? (autocorrelation)
- What proportion of series are smooth vs lumpy?

> We've seen the patterns. Now let's measure them.

---

## Next: Module 1.13

**Reading the Patterns — Measuring & Deciding**

- Decompose series to quantify trend and seasonal strength
- Use ACF to determine how far back to look
- Classify series by behavior type
- Translate patterns into modeling decisions