# Module 1.10: Diagnostics — Computing the Coordinates

> **Goal:** Generate the coordinates for the portfolio map (but do not map yet).

Before we can triage our portfolio, we need to measure each series along two dimensions:

- **Structure:** How much predictable signal exists (trend, seasonality, information)
- **Chaos:** How much unpredictable noise exists (entropy, lumpiness, gaps)

This module computes those scores. The next module (1.12) will use them to build the strategic map.

| This Module (1.10) | Next Module (1.12) |
|-------------------|--------------------|
| Compute metrics | Plot the map |
| Normalize scores | Assign archetypes |
| Sanity check distributions | Strategic triage |

---

## 1. Setup and Load Data

In [1]:
# =============================================================================
# SETUP
# =============================================================================

# --- Imports ---
import sys
import warnings
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# --- Path Configuration (before local imports) ---
MODULE_DIR = Path().resolve()
PROJECT_ROOT = MODULE_DIR.parent.parent
sys.path.insert(0, str(PROJECT_ROOT))

# --- Local Imports ---
import tsforge as tsf
from src import (
    CacheManager,
    ArtifactManager,
    get_notebook_name
)

# --- Settings ---
warnings.filterwarnings("ignore")
plt.style.use("seaborn-v0_8-whitegrid")

# --- Paths ---
DATA_DIR = PROJECT_ROOT / "data"
DATA_DIR.mkdir(exist_ok=True)

# --- Managers ---
NB_NAME = get_notebook_name()
cache = CacheManager(PROJECT_ROOT / ".cache" / NB_NAME)
artifacts = ArtifactManager(PROJECT_ROOT / "artifacts")

print(f"✓ Setup complete | Root: {PROJECT_ROOT.name} | Module: {NB_NAME[:4]}")

✓ Setup complete | Root: real-world-forecasting-foundations | Module: 1_10


In [2]:
# --- Load Data from Module 1.08 ---
df = artifacts.load('1.08')

✓ Loaded '1.08' from 01_foundations/
   Shape: 6,848,887 × 20


---

## 2. Context: What is XYZ Segmentation?

In classic inventory management, **ABC-XYZ segmentation** combines two dimensions:

- **ABC (Volume):** How much does this item sell? (A = high, C = low)
- **XYZ (Predictability):** How forecastable is demand? (X = stable, Z = erratic)

We've already computed volume metrics. Now we need the **XYZ dimension** — but instead of simple CV (coefficient of variation), we'll use a richer characterization:
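
For reference, the simple CV baseline mentioned above could be sketched as follows. This is only an illustration: the helper names are hypothetical, and the 0.5 / 1.0 cut-offs are assumed for the example rather than taken from this course.

```python
import statistics


def coefficient_of_variation(values):
    """CV = population standard deviation / mean (treated as infinite for zero mean)."""
    mean = statistics.fmean(values)
    if mean == 0:
        return float("inf")
    return statistics.pstdev(values) / mean


def xyz_class(cv, x_max=0.5, y_max=1.0):
    """Map a CV value to an X/Y/Z label. Thresholds are illustrative assumptions."""
    if cv <= x_max:
        return "X"
    if cv <= y_max:
        return "Y"
    return "Z"


# Two toy demand histories: one steady, one sporadic.
stable = [100, 102, 98, 101, 99, 100]
erratic = [0, 0, 50, 0, 0, 10]

for name, series in [("stable", stable), ("erratic", erratic)]:
    cv = coefficient_of_variation(series)
    print(f"{name}: CV={cv:.2f} -> {xyz_class(cv)}")
```

A single CV number cannot say whether high variability comes from a strong seasonal swing or from pure noise, which is the gap the two-axis approach is meant to close.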

### Our Approach: Structure vs Chaos

We measure predictability along two axes using our **LD6 metrics** (Lean Diagnostics, 6 metrics):

| Dimension | What It Measures | High Score Means |
|-----------|------------------|------------------|
| **Structure** | Predictable patterns | Strong trend, clear seasonality, high signal |
| **Chaos** | Unpredictable noise | High entropy, lumpy demand, sparse data |

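This gives us a 2D map instead of a 1D ranking, so two series with similar variability can still be told apart by how much learnable pattern they carry. That distinction is what drives the choice of forecasting strategy.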

---

## 3. Define Metric Groups

We separate our LD6 metrics into two groups based on what they measure:

In [3]:
# =============================================================================
# METRIC DEFINITIONS
# =============================================================================

# Structure metrics: indicators of predictable, learnable patterns
STRUCTURE_METRICS = [
    'trend_strength',      # How strong is the directional movement?
    'seasonal_strength',   # How strong is the repeating pattern?
    'mutual_information'   # How much signal is in the lagged values?
]

# Chaos metrics: indicators of noise, randomness, sparsity
CHAOS_METRICS = [
    'permutation_entropy', # How random is the ordering of values?
    'lumpiness',           # How inconsistent are the variance regimes?
    'intermittency_adi'    # How sparse is the demand? (avg periods between sales; weeks here)
]

# Combined list for computation
LD6_METRICS = STRUCTURE_METRICS + CHAOS_METRICS

print("Structure Group:", STRUCTURE_METRICS)
print("Chaos Group:", CHAOS_METRICS)

Structure Group: ['trend_strength', 'seasonal_strength', 'mutual_information']
Chaos Group: ['permutation_entropy', 'lumpiness', 'intermittency_adi']


### Metric Interpretation Guide

| Metric | Group | What It Measures | High Value Means |
|--------|-------|------------------|------------------|
| `trend_strength` | Structure | Strength of directional movement | Clear up/down trend |
| `seasonal_strength` | Structure | Strength of repeating cycles | Predictable seasonal pattern |
| `mutual_information` | Structure | Information in lagged values | Past predicts future |
| `permutation_entropy` | Chaos | Randomness of value ordering | Highly disordered, noisy |
| `lumpiness` | Chaos | Variance instability over time | Unstable, regime-switching |
| `intermittency_adi` | Chaos | Average inter-demand interval | Sparse, lots of zeros |
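
To make two of the chaos metrics concrete, here is a minimal sketch of permutation entropy and ADI on toy series. These are common textbook definitions written for illustration; the function names are hypothetical and `tsforge` may compute its versions differently (for example in how it handles ties, window order, or leading zeros).

```python
import math
from collections import Counter


def permutation_entropy(values, order=3):
    """Normalized permutation entropy in [0, 1]; ties are broken by position."""
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: (values[i + k], k)))
        for i in range(len(values) - order + 1)
    )
    total = sum(patterns.values())
    # Shannon entropy of the ordinal-pattern frequencies.
    entropy = sum((c / total) * math.log(total / c) for c in patterns.values())
    # Divide by the maximum possible entropy, log(order!).
    return entropy / math.log(math.factorial(order))


def average_demand_interval(values):
    """ADI as number of periods divided by number of periods with non-zero demand."""
    nonzero = sum(1 for v in values if v != 0)
    return len(values) / nonzero if nonzero else float("inf")


trending = [1, 2, 3, 4, 5, 6, 7, 8]   # perfectly ordered, sells every period
sparse = [0, 0, 5, 0, 0, 0, 3, 0]     # mostly zeros

print("entropy (trending):", permutation_entropy(trending))
print("entropy (sparse):  ", round(permutation_entropy(sparse), 3))
print("ADI (trending):", average_demand_interval(trending))
print("ADI (sparse):  ", average_demand_interval(sparse))
```

A monotone series produces a single ordinal pattern, so its entropy is 0, while the sparse series sells in only 2 of 8 periods and gets an ADI of 4.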

---

## 4. Compute LD6 Metrics

We compute all six metrics for every series in our dataset.

In [None]:
# =============================================================================
# COMPUTE METRICS
# =============================================================================

# Check cache first
metrics_raw = cache.load('ld6_metrics_raw')

if metrics_raw is None:
    print("Computing LD6 metrics (this may take a few minutes)...")
    
    # Compute metrics using tsforge
    metrics_raw = tsf.compute_features(
        df=df,
        id_col='unique_id',
        date_col='ds',
        value_col='y',
        features=LD6_METRICS,
        seasonal_period=52  # Weekly data with annual seasonality
    )
    
    cache.save(metrics_raw, 'ld6_metrics_raw')
    print(f"✓ Computed and cached metrics for {len(metrics_raw):,} series")
else:
    print(f"✓ Loaded cached metrics for {len(metrics_raw):,} series")

In [None]:
# Preview raw metrics
print("Raw Metrics Preview:")
print("=" * 60)
metrics_raw.head(10)

In [None]:
# Check for any missing values
print("Missing Values Check:")
print(metrics_raw[LD6_METRICS].isnull().sum())

# Basic statistics
print("\nRaw Metric Statistics:")
metrics_raw[LD6_METRICS].describe().round(3)

---

## 5. Normalize Metrics

Before averaging metrics into scores, we need to normalize them:

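1. **Clip outliers:** Cap each chaos metric at its own 95th percentile. Without this, a handful of extreme values would set the maximum, and min-max scaling would squeeze most series toward 0.
2. **Min-Max scale:** Transform every metric to the 0-1 range using `(x - min) / (max - min)` so the metrics can be averaged on equal footing.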

This ensures that no single metric dominates the score just because of its scale.
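
The effect of clipping before scaling can be seen on a small synthetic example. The values below are made up for illustration, and `min_max` is a hypothetical helper that follows the same convention as the scaling step in this notebook (constant columns map to 0.5).

```python
import numpy as np


def min_max(x):
    """Scale an array to [0, 1]; a constant array maps to 0.5."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.full(x.shape, 0.5)


# Hypothetical ADI-like values: 19 ordinary series plus one extreme outlier.
raw = np.append(np.linspace(1.0, 2.8, 19), 50.0)

# Option A: scale directly. The outlier defines the top of the range.
scaled_only = min_max(raw)

# Option B: cap at the 95th percentile first, then scale.
p95 = np.quantile(raw, 0.95)
clipped_then_scaled = min_max(np.minimum(raw, p95))

print("median without clipping:", round(float(np.median(scaled_only)), 3))
print("median with clipping:   ", round(float(np.median(clipped_then_scaled)), 3))
```

Without clipping, the typical series ends up near 0 and the metric contributes almost nothing to the averaged score. With clipping, the same series spreads over a usable part of the 0-1 range.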

In [None]:
# =============================================================================
# STEP 1: CLIP CHAOS METRICS AT 95TH PERCENTILE
# =============================================================================

metrics_clipped = metrics_raw.copy()

print("Clipping chaos metrics at 95th percentile:")
print("-" * 50)

for col in CHAOS_METRICS:
    p95 = metrics_raw[col].quantile(0.95)
    original_max = metrics_raw[col].max()
    
    # Clip values above 95th percentile
    metrics_clipped[col] = metrics_raw[col].clip(upper=p95)
    
    n_clipped = (metrics_raw[col] > p95).sum()
    print(f"  {col}: max {original_max:.3f} → {p95:.3f} ({n_clipped:,} values clipped)")

In [None]:
# =============================================================================
# STEP 2: MIN-MAX SCALING (0 TO 1)
# =============================================================================

metrics_normalized = metrics_clipped.copy()

print("\nApplying min-max normalization:")
print("-" * 50)

for col in LD6_METRICS:
    col_min = metrics_clipped[col].min()
    col_max = metrics_clipped[col].max()
    col_range = col_max - col_min
    
    if col_range > 0:
        metrics_normalized[col] = (metrics_clipped[col] - col_min) / col_range
    else:
        # If all values are the same, set to 0.5
        metrics_normalized[col] = 0.5
    
    print(f"  {col}: [{col_min:.3f}, {col_max:.3f}] → [0, 1]")

print("\n✓ All metrics now on 0-1 scale")

In [None]:
# Verify normalization
print("Normalized Metric Ranges:")
for col in LD6_METRICS:
    print(f"  {col}: [{metrics_normalized[col].min():.3f}, {metrics_normalized[col].max():.3f}]")

---

## 6. Calculate Super-Scores

Now we create two final coordinates by averaging the normalized metrics in each group:

- **Structure Score** = mean(trend_strength, seasonal_strength, mutual_information)
- **Chaos Score** = mean(permutation_entropy, lumpiness, intermittency_adi)

**Interpretation:**
- High Structure Score → Strong predictable patterns
- High Chaos Score → High noise/randomness/sparsity
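
As a worked example of the averaging, the sketch below scores three hypothetical series with made-up normalized values. It also shows one behavior worth knowing: pandas `mean(axis=1)` skips missing values by default, so a series with a missing metric is scored on the metrics that remain.

```python
import pandas as pd

structure_cols = ["trend_strength", "seasonal_strength", "mutual_information"]
chaos_cols = ["permutation_entropy", "lumpiness", "intermittency_adi"]

# Hypothetical, already-normalized metrics for three series.
toy = pd.DataFrame(
    {
        "trend_strength": [0.9, 0.2, 0.1],
        "seasonal_strength": [0.8, 0.1, 0.3],
        "mutual_information": [0.7, 0.3, 0.2],
        "permutation_entropy": [0.2, 0.9, 0.6],
        "lumpiness": [0.1, 0.8, 0.7],
        "intermittency_adi": [0.0, 0.7, float("nan")],  # series C lacks an ADI value
    },
    index=["A", "B", "C"],
)

# Row-wise means; NaN entries are ignored (skipna=True is the pandas default).
toy["structure_score"] = toy[structure_cols].mean(axis=1)
toy["chaos_score"] = toy[chaos_cols].mean(axis=1)

print(toy[["structure_score", "chaos_score"]].round(2))
```

Series A lands at high structure and low chaos, series B at the opposite corner. Series C has a chaos score averaged over two metrics instead of three, which may or may not be the desired treatment for missing values.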

In [None]:
# =============================================================================
# CALCULATE SUPER-SCORES
# =============================================================================

# Structure Score: average of structure metrics
metrics_normalized['structure_score'] = metrics_normalized[STRUCTURE_METRICS].mean(axis=1)

# Chaos Score: average of chaos metrics  
metrics_normalized['chaos_score'] = metrics_normalized[CHAOS_METRICS].mean(axis=1)

print("Super-Score Statistics:")
print("=" * 50)
print(f"\nStructure Score:")
print(f"  Mean: {metrics_normalized['structure_score'].mean():.3f}")
print(f"  Std:  {metrics_normalized['structure_score'].std():.3f}")
print(f"  Range: [{metrics_normalized['structure_score'].min():.3f}, {metrics_normalized['structure_score'].max():.3f}]")

print(f"\nChaos Score:")
print(f"  Mean: {metrics_normalized['chaos_score'].mean():.3f}")
print(f"  Std:  {metrics_normalized['chaos_score'].std():.3f}")
print(f"  Range: [{metrics_normalized['chaos_score'].min():.3f}, {metrics_normalized['chaos_score'].max():.3f}]")

In [None]:
# Preview the scored dataframe
score_cols = ['unique_id'] + LD6_METRICS + ['structure_score', 'chaos_score']
scores_df = metrics_normalized[score_cols].copy()

print("Scored DataFrame Preview:")
scores_df.head(10)

---

## 7. Sanity Check: Metric Distributions

Before passing these scores to the next module, let's verify the distributions look reasonable.

**What we're looking for:**
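- No extreme skew left after clipping (min-max scaling is linear, so it rescales a distribution without changing its shape)
- Reasonable spread (metrics differentiate series)
- No unexpected spikes at 0 or 1 (expected ones: intermittency, and a pile-up at 1 for each clipped chaos metric, where every value above the 95th percentile is capped)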
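
When reading the histograms, it helps to know what the normalization steps can and cannot change. The sketch below uses a synthetic right-skewed metric (an assumption for illustration, not data from this portfolio) to show that min-max scaling leaves skewness untouched and that 95th-percentile clipping places about 5% of values exactly at 1.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # synthetic, right-skewed


def skewness(x):
    """Sample skewness: mean of cubed z-scores."""
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())


def min_max(x):
    """Linear rescale to [0, 1] (assumes x is not constant)."""
    return (x - x.min()) / (x.max() - x.min())


scaled = min_max(raw)
clipped_scaled = min_max(np.minimum(raw, np.quantile(raw, 0.95)))

share_at_one = float((clipped_scaled == 1.0).mean())

print(f"skew raw: {skewness(raw):.2f} | skew after min-max: {skewness(scaled):.2f}")
print(f"skew after clip + min-max: {skewness(clipped_scaled):.2f}")
print(f"share of values at 1.0 after clipping: {share_at_one:.1%}")
```

A bar at 1 in the chaos histograms is therefore a by-product of clipping rather than a data problem, while a long right tail in an unclipped metric is a property of the data that scaling alone will not remove.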

In [None]:
# =============================================================================
# INDIVIDUAL METRIC DISTRIBUTIONS
# =============================================================================

fig, axes = plt.subplots(2, 3, figsize=(14, 8))
fig.suptitle('Normalized LD6 Metric Distributions', fontsize=14, fontweight='bold')

# Structure metrics (top row)
for idx, col in enumerate(STRUCTURE_METRICS):
    ax = axes[0, idx]
    ax.hist(metrics_normalized[col], bins=50, color='#2E86AB', edgecolor='white', alpha=0.8)
    ax.set_title(col, fontsize=11)
    ax.set_xlabel('Normalized Value')
    ax.set_ylabel('Count')
    ax.axvline(metrics_normalized[col].mean(), color='red', linestyle='--', label=f'mean={metrics_normalized[col].mean():.2f}')
    ax.legend(fontsize=8)

# Chaos metrics (bottom row)  
for idx, col in enumerate(CHAOS_METRICS):
    ax = axes[1, idx]
    ax.hist(metrics_normalized[col], bins=50, color='#E94F37', edgecolor='white', alpha=0.8)
    ax.set_title(col, fontsize=11)
    ax.set_xlabel('Normalized Value')
    ax.set_ylabel('Count')
    ax.axvline(metrics_normalized[col].mean(), color='darkred', linestyle='--', label=f'mean={metrics_normalized[col].mean():.2f}')
    ax.legend(fontsize=8)

# Add row labels
axes[0, 0].set_ylabel('STRUCTURE\nCount', fontsize=10)
axes[1, 0].set_ylabel('CHAOS\nCount', fontsize=10)

plt.tight_layout()
plt.show()

In [None]:
# =============================================================================
# SUPER-SCORE DISTRIBUTIONS
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
fig.suptitle('Super-Score Distributions', fontsize=14, fontweight='bold')

# Structure Score
ax = axes[0]
ax.hist(metrics_normalized['structure_score'], bins=50, color='#2E86AB', edgecolor='white', alpha=0.8)
ax.axvline(0.5, color='gray', linestyle=':', linewidth=2, label='midpoint')
ax.axvline(metrics_normalized['structure_score'].mean(), color='red', linestyle='--', 
           label=f'mean={metrics_normalized["structure_score"].mean():.2f}')
ax.set_title('Structure Score', fontsize=12)
ax.set_xlabel('Score (0=Low Structure, 1=High Structure)')
ax.set_ylabel('Count')
ax.legend()

# Chaos Score
ax = axes[1]
ax.hist(metrics_normalized['chaos_score'], bins=50, color='#E94F37', edgecolor='white', alpha=0.8)
ax.axvline(0.5, color='gray', linestyle=':', linewidth=2, label='midpoint')
ax.axvline(metrics_normalized['chaos_score'].mean(), color='darkred', linestyle='--',
           label=f'mean={metrics_normalized["chaos_score"].mean():.2f}')
ax.set_title('Chaos Score', fontsize=12)
ax.set_xlabel('Score (0=Low Chaos, 1=High Chaos)')
ax.set_ylabel('Count')
ax.legend()

plt.tight_layout()
plt.show()

In [None]:
# =============================================================================
# CORRELATION CHECK
# =============================================================================

print("Metric Correlations:")
print("=" * 50)
print("\nCorrelation between Structure and Chaos scores:")
corr = metrics_normalized['structure_score'].corr(metrics_normalized['chaos_score'])
print(f"  r = {corr:.3f}")

if abs(corr) < 0.3:
    print("  → Low correlation: good! These measure different things.")
elif abs(corr) < 0.6:
    print("  → Moderate correlation: some overlap, but still useful separation.")
else:
    print("  → High correlation: warning - these may be measuring similar things.")

---

## 8. Output & Save

We now have our scored dataframe ready for the portfolio architecture analysis in Module 1.12.

**Output columns:**
- `unique_id`: Series identifier
- 6 normalized LD6 metrics
- `structure_score`: Aggregate structure measure (0-1)
- `chaos_score`: Aggregate chaos measure (0-1)
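
Before saving, a lightweight check of this output contract can catch problems early. The sketch below is one possible validator, not part of the course code: `validate_scores` is a hypothetical helper, and the toy frames stand in for `scores_output`.

```python
import pandas as pd


def validate_scores(df, score_cols=("structure_score", "chaos_score")):
    """Return a list of problems found; an empty list means the frame looks sane."""
    problems = []
    if df["unique_id"].duplicated().any():
        problems.append("duplicate unique_id values")
    for col in score_cols:
        if df[col].isna().any():
            problems.append(f"{col} contains missing values")
        # between() is inclusive on both ends by default.
        if not df[col].dropna().between(0, 1).all():
            problems.append(f"{col} has values outside [0, 1]")
    return problems


good = pd.DataFrame(
    {"unique_id": ["a", "b"], "structure_score": [0.2, 0.9], "chaos_score": [0.5, 0.1]}
)
bad = pd.DataFrame(
    {"unique_id": ["a", "a"], "structure_score": [0.2, 1.4], "chaos_score": [0.5, 0.1]}
)

print("good:", validate_scores(good))
print("bad: ", validate_scores(bad))
```

Returning a list of messages instead of raising on the first failure makes it easier to see every issue in one pass.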

**NOT included (saved for 1.12):**
- ❌ No Structure × Chaos scatter plot
- ❌ No quadrant assignments
- ❌ No archetype labels (Stable/Complex/Messy/Low Signal)

In [None]:
# =============================================================================
# FINAL OUTPUT
# =============================================================================

# Prepare final output dataframe
output_cols = ['unique_id'] + LD6_METRICS + ['structure_score', 'chaos_score']
scores_output = metrics_normalized[output_cols].copy()

print("Final Output Summary:")
print("=" * 50)
print(f"Shape: {scores_output.shape[0]:,} series × {scores_output.shape[1]} columns")
print(f"\nColumns: {list(scores_output.columns)}")
print(f"\nPreview:")
scores_output.head(10)

In [None]:
# Save as artifact for Module 1.12
artifacts.save(
    scores_output, 
    name='1.10',
    description='LD6 diagnostic scores (structure + chaos) for portfolio segmentation'
)
print("\n✓ Saved artifact '1.10' for use in Module 1.12")

---

## Key Takeaways

**What we computed:**

1. **LD6 Metrics** — Six diagnostic metrics capturing structure and chaos
2. **Normalization** — Clipped outliers + min-max scaling for fair comparison
3. **Super-Scores** — Two aggregate coordinates for each series

**What the distributions tell us:**

- Structure scores show [describe observed pattern]
- Chaos scores show [describe observed pattern]
- The two dimensions are [correlated/uncorrelated] — [interpretation]

**What's next:**

Module 1.12 will use these scores to:
- Plot the Structure × Chaos map
- Assign each series to an archetype
- Identify representative "hero" series
- Build the portfolio risk matrix

---

## Next: Module 1.12

**Portfolio Architecture — The Strategic View**

- Plot the 2×2 Structure × Chaos map
- Assign archetypes: Stable / Complex / Messy / Low Signal
- Select and validate hero series
- Build the Department × Archetype risk matrix