# Module 1.09: Diagnostics — The Big Picture (v2)

> **Goal:** Compute time series diagnostics and understand what patterns exist in our portfolio.

**5Q Lens:** Q4 (Data & Drivers) — Measure structure & chaos across portfolio


---

## 1. Setup

In [None]:
import warnings
import numpy as np
import pandas as pd
from tsfeatures import tsfeatures
import forecast_foundations as ff
from tsforge.eda.ts_features_extension import ADI
import tsforge as tsf

# NEW: Import from unified plots API
from tsforge.plots import (
    plot_portfolio_metrics,  # Convenience wrapper for portfolio metrics
    plot_bar,
    plot_distribution,
)

# Settings
warnings.filterwarnings('ignore')

---
## 2. Load Data

In [None]:
# Load the forecast-ready dataset from Module 1.08
weekly_df = pd.read_parquet('./data/1.08_data_preparation_output.parquet')

In [3]:
# Quick sanity check
weekly_df.head(3)

| | unique_id | ds | y | is_gap | item_id | store_id | dept_id | cat_id | state_id | wm_yr_wk | ... | year | snap_CA | snap_TX | snap_WI | event_name_1 | event_name_2 | event_name_3 | event_type_1 | event_type_2 | event_type_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FOODS_1_001_CA_1 | 2011-01-23 | 3.0 | 0 | FOODS_1_001 | CA_1 | FOODS_1 | FOODS | CA | 11101 | ... | 2011 | 0 | 0 | 0 | | | | | | |
| 1 | FOODS_1_001_CA_1 | 2011-01-30 | 9.0 | 0 | FOODS_1_001 | CA_1 | FOODS_1 | FOODS | CA | 11101 | ... | 2011 | 1 | 1 | 1 | | | | | | |
| 2 | FOODS_1_001_CA_1 | 2011-02-06 | 7.0 | 0 | FOODS_1_001 | CA_1 | FOODS_1 | FOODS | CA | 11102 | ... | 2011 | 1 | 1 | 1 | SuperBowl | | | Sporting | | |


---

<div style="text-align: center;">

## 3. Compute Diagnostics

<div style="background: linear-gradient(135deg, #2E86AB 0%, #1a5276 100%); color: white; padding: 12px 20px; border-radius: 8px; margin: 10px auto; max-width: 600px;">
<strong>What patterns exist in the data?</strong><br>
<em>50+ metrics from tsfeatures + tsforge that describe trend, seasonality, noise, intermittency, and more.</em>
</div>

</div>

### 3.1 Calculate diagnostics from `tsfeatures`

`tsfeatures` extracts dozens of time series characteristics automatically, giving us a first systematic look at the portfolio. We then add ADI (Average Demand Interval) from `tsforge` as a measure of intermittency.

In [None]:
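diagnostics = tsfeatures(weekly_df, freq=52, threads=8)  # default feature set; weekly data, 52 periods per year
# ADI (Average Demand Interval) per series, computed with tsforge
adi_by_series = weekly_df.groupby('unique_id')['y'].apply(lambda y: ADI(y.values, freq=52)["adi"])
adi_df = adi_by_series.reset_index(name='adi')
diagnostics = diagnostics.merge(adi_df, on='unique_id')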


✓ Loaded 'diagnostics'
   Module: 1_09 | Shape: 30,490 × 43
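
To make the `adi` column concrete, here is a minimal sketch of one common definition of the Average Demand Interval: the number of periods divided by the number of periods with non-zero demand. It is an illustration only, not the `tsforge` implementation. The helper name `simple_adi` and the toy series are assumptions, and `ADI` in `tsforge` may handle edge cases (for example leading zeros) differently.

```python
import numpy as np

def simple_adi(y):
    """Periods per non-zero demand occurrence (hypothetical helper, not tsforge's ADI)."""
    y = np.asarray(y, dtype=float)
    n_nonzero = np.count_nonzero(y)
    if n_nonzero == 0:
        return float("inf")  # no demand at all: the interval is undefined, treat as infinite
    return len(y) / n_nonzero

regular = [3, 9, 7, 4, 5, 6, 2, 8]  # demand in every week
sparse = [0, 0, 5, 0, 0, 0, 2, 0]   # demand in 2 of 8 weeks

adi_regular = simple_adi(regular)
adi_sparse = simple_adi(sparse)
print(adi_regular, adi_sparse)
```

Under this definition a series that sells every period has an ADI of 1, and the value grows as zero-demand periods become more frequent.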


### 3.3 Merge Hierarchy Metadata

Attach business dimensions so we can slice diagnostics by department, category, store.

In [23]:
# Get hierarchy from original data
hierarchy = (
    weekly_df[['unique_id', 'item_id', 'dept_id', 'cat_id', 'store_id', 'state_id']]
    .drop_duplicates(subset=['unique_id'])
)

# Merge
diagnostics = diagnostics.merge(hierarchy, on='unique_id', how='left')
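
A left merge like the one above can silently duplicate rows if the lookup table has repeated keys, or leave gaps if a key is missing. The sketch below shows one way to guard against both using pandas' `validate` and `indicator` options. The toy frames `diag_toy` and `hier_toy` are assumptions that stand in for `diagnostics` and `hierarchy`.

```python
import pandas as pd

# Toy stand-ins for the diagnostics table and the hierarchy lookup
diag_toy = pd.DataFrame({
    "unique_id": ["A_CA_1", "B_CA_1", "C_TX_1"],
    "trend": [0.20, 0.45, 0.31],
})
hier_toy = pd.DataFrame({
    "unique_id": ["A_CA_1", "B_CA_1", "C_TX_1"],
    "cat_id": ["FOODS", "FOODS", "HOBBIES"],
})

# validate raises MergeError if unique_id repeats on either side;
# indicator adds a _merge column that flags rows without a match
merged = diag_toy.merge(
    hier_toy, on="unique_id", how="left",
    validate="one_to_one", indicator=True,
)

n_unmatched = int((merged["_merge"] != "both").sum())
merged = merged.drop(columns="_merge")
print(f"rows: {len(merged)}, unmatched: {n_unmatched}")
```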

### 3.4 Preview Key Metrics

In [25]:
KEY_METRICS = ['trend', 'seasonal_strength', 'entropy', 'adi']

diagnostics[['unique_id', 'cat_id', 'dept_id'] + KEY_METRICS].head(10)

| | unique_id | cat_id | dept_id | trend | seasonal_strength | entropy | adi |
|---|---|---|---|---|---|---|---|
| 0 | FOODS_1_001_CA_1 | FOODS | FOODS_1 | 0.20445 | 0.376623 | 0.850729 | 1.105469 |
| 1 | FOODS_1_001_CA_2 | FOODS | FOODS_1 | 0.22328 | 0.439298 | 0.843529 | 1.105469 |
| 2 | FOODS_1_001_CA_3 | FOODS | FOODS_1 | 0.162804 | 0.384099 | 0.895521 | 1.118577 |
| 3 | FOODS_1_001_CA_4 | FOODS | FOODS_1 | 0.110839 | 0.479389 | 0.87317 | 1.276018 |
| 4 | FOODS_1_001_TX_1 | FOODS | FOODS_1 | 0.260977 | 0.376637 | 0.866836 | 1.200855 |
| 5 | FOODS_1_001_TX_2 | FOODS | FOODS_1 | 0.212788 | 0.397528 | 0.855971 | 1.156379 |
| 6 | FOODS_1_001_TX_3 | FOODS | FOODS_1 | 0.048173 | 0.417552 | 0.886295 | 1.181435 |
| 7 | FOODS_1_001_WI_1 | FOODS | FOODS_1 | 0.287522 | 0.423122 | 0.837675 | 1.119522 |
| 8 | FOODS_1_001_WI_2 | FOODS | FOODS_1 | 0.392052 | 0.434856 | 0.828196 | 1.288991 |
| 9 | FOODS_1_001_WI_3 | FOODS | FOODS_1 | 0.466792 | 0.466487 | 0.828913 | 1.524324 |
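
For intuition on the `trend` and `seasonal_strength` columns: strength measures of this kind are usually defined from a decomposition into trend, seasonal, and remainder components, as `max(0, 1 - Var(remainder) / Var(component + remainder))`. The sketch below applies that formula to synthetic components. It does not run a decomposition and is not the `tsfeatures` code; the function name `strength` and all toy values are assumptions.

```python
import numpy as np

def strength(component, remainder):
    """max(0, 1 - Var(R) / Var(component + R)); near 1 means the component dominates the noise."""
    return max(0.0, 1.0 - np.var(remainder) / np.var(component + remainder))

rng = np.random.default_rng(0)
t = np.arange(156)                            # three years of weekly observations
remainder = rng.normal(0.0, 1.0, t.size)      # unit-variance noise

strong_trend = 0.2 * t                        # clear upward drift
weak_trend = 0.001 * t                        # drift that is tiny relative to the noise
seasonal = 3.0 * np.sin(2 * np.pi * t / 52)   # yearly cycle

s_strong = strength(strong_trend, remainder)
s_weak = strength(weak_trend, remainder)
s_seasonal = strength(seasonal, remainder)
print(round(s_strong, 3), round(s_weak, 3), round(s_seasonal, 3))
```

Read this way, a value near 0 means the component is indistinguishable from noise, and a value near 1 means it explains almost all of the variation.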


---

<div style="text-align: center;">

## 4. Structure & Chaos Overview

<div style="background: linear-gradient(135deg, #4A90A4 0%, #2d5a6b 100%); color: white; padding: 12px 20px; border-radius: 8px; margin: 10px auto; max-width: 600px;">
<strong>One view of all key metrics</strong><br>
<em>Structure (trend, seasonality) vs Chaos (entropy, intermittency, variability)</em>
</div>

</div>

In [27]:
# Single call to visualize all key metrics
# Includes: histogram + KDE, median line, threshold with percentage

plot_distribution(
    diagnostics,
    columns=['trend', 'seasonal_strength', 'entropy', 'adi'],
    mode='facet',
    wrap=2,
    use_metric_defaults=True,  # Auto: thresholds, colors, clipping
    show_kde=True,
    show_median=True,
    show_threshold_pct=True,
)

---

<div style="text-align: center;">

## 5. Deep Dive: Individual Metrics

</div>

### 5.1 Structure Metrics (Learnable Patterns)

In [31]:
# Focus on structure metrics only (blue coloring)
plot_portfolio_metrics(
    diagnostics,
    metrics=['trend', 'seasonal_strength'],
    bins=40,
    wrap=2,
    show_kde=True,
    show_median=True,
    show_threshold=True,
    style={"title": "Structure Metrics: Trend & Seasonality"},
)

### 5.2 Chaos Metrics (Unpredictability)

In [32]:
# Focus on chaos metrics (orange coloring, auto-clipped for outliers)
plot_portfolio_metrics(
    diagnostics,
    metrics=['entropy', 'adi', 'cv2', 'lumpiness'],
    bins=40,
    wrap=2,
    show_kde=True,
    show_median=True,
    show_threshold=True,
    style={"title": "Chaos Metrics: Entropy, Intermittency, Variability"},
)
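
As a rough guide to reading the `entropy` metric, the sketch below computes a normalized spectral entropy from a plain periodogram: a series dominated by one frequency scores near 0, and a series with a flat spectrum scores near 1. This is a simplified illustration under stated assumptions. `tsfeatures` uses its own spectral estimator, so its values will not match this function exactly, and the name `spectral_entropy_sketch` is hypothetical.

```python
import numpy as np

def spectral_entropy_sketch(y):
    """Normalized Shannon entropy of the periodogram (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                          # remove the mean so the zero-frequency bin is empty
    power = np.abs(np.fft.rfft(y))[1:] ** 2   # periodogram without the zero-frequency bin
    total = power.sum()
    if total == 0:
        return 0.0                            # constant series: no spectral content
    p = power / total                         # treat the spectrum as a probability distribution
    p = p[p > 0]                              # skip empty bins to avoid log(0)
    return float(-(p * np.log(p)).sum() / np.log(power.size))

rng = np.random.default_rng(0)
t = np.arange(208)                            # four years of weekly observations
h_sine = spectral_entropy_sketch(np.sin(2 * np.pi * t / 52))    # pure yearly cycle
h_noise = spectral_entropy_sketch(rng.normal(0.0, 1.0, t.size)) # white noise
print(round(h_sine, 3), round(h_noise, 3))
```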

---
<div style="text-align: center;">

## 6. Portfolio Summary by Segment

</div>

### 6.1 By Category

In [33]:
from tsforge.display import style_heatmap_table

display(style_heatmap_table(
    diagnostics, 'cat_id', KEY_METRICS,
    'Mean Diagnostics by Category'
))

| cat_id | trend | seasonal_strength | entropy | adi |
|---|---|---|---|---|
| HOBBIES | 0.332 | 0.559 | 0.838 | 1.625 |
| HOUSEHOLD | 0.357 | 0.572 | 0.82 | 1.52 |
| FOODS | 0.445 | 0.598 | 0.776 | 1.339 |


In [34]:
# NEW: Visualize category comparison with bar chart
cat_summary = diagnostics.groupby('cat_id')[KEY_METRICS].mean().reset_index()

# Show trend by category
plot_bar(
    cat_summary,
    x='cat_id',
    y='trend',
    orientation='h',
    sort_by='trend',
    sort_ascending=False,
    show_values=True,
    value_format='.3f',
    thresholds=[0.5],
    threshold_labels=['Strong Trend'],
    style={"title": "Mean Trend Strength by Category"},
)

### 6.2 By Department

In [35]:
display(style_heatmap_table(
    diagnostics, 'dept_id', KEY_METRICS,
    'Mean Diagnostics by Department'
))

| dept_id | trend | seasonal_strength | entropy | adi |
|---|---|---|---|---|
| HOBBIES_1 | 0.345 | 0.542 | 0.826 | 1.484 |
| HOBBIES_2 | 0.297 | 0.607 | 0.871 | 2.017 |
| HOUSEHOLD_1 | 0.43 | 0.596 | 0.792 | 1.304 |
| HOUSEHOLD_2 | 0.283 | 0.547 | 0.849 | 1.743 |
| FOODS_1 | 0.358 | 0.539 | 0.803 | 1.372 |
| FOODS_2 | 0.475 | 0.641 | 0.777 | 1.408 |
| FOODS_3 | 0.454 | 0.592 | 0.768 | 1.297 |


In [36]:
# Department comparison: ADI (intermittency)
dept_summary = diagnostics.groupby('dept_id')[KEY_METRICS].mean().reset_index()

plot_bar(
    dept_summary,
    x='dept_id',
    y='adi',
    orientation='h',
    sort_by='adi',
    sort_ascending=False,
    show_values=True,
    value_format='.2f',
    thresholds=[1.32],
    threshold_labels=['Intermittent'],
    style={"title": "Mean ADI (Intermittency) by Department"},
)
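
A mean ADI per department can be pulled up by a few very sparse series, so it may help to also look at the share of series above the 1.32 threshold used in the chart. The sketch below shows that comparison on toy data. The frame `toy` and its values are assumptions; with the real table the same pattern would apply to `diagnostics` grouped by `dept_id`.

```python
import pandas as pd

ADI_THRESHOLD = 1.32  # same cutoff as the bar chart above

# Toy data: D1 has one very sparse series, D2 has three mildly intermittent ones
toy = pd.DataFrame({
    "dept_id": ["D1", "D1", "D1", "D2", "D2", "D2"],
    "adi": [1.0, 1.1, 4.0, 1.4, 1.5, 1.6],
})

summary = (
    toy.assign(is_intermittent=toy["adi"] > ADI_THRESHOLD)
    .groupby("dept_id")
    .agg(
        mean_adi=("adi", "mean"),
        share_intermittent=("is_intermittent", "mean"),  # mean of booleans = fraction True
    )
)
print(summary)
```

In this toy case D1 has the higher mean ADI but only a third of its series cross the threshold, while every series in D2 does.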

---

## 7. Save Output

In [None]:
# Save diagnostics for downstream modules
diagnostics.to_parquet('./1.09_diagnostics.parquet', index=False)