# Module 1.14: Automating Readiness (One Script)

> **Goal:** Use the reusable readiness tool in `tsforge` to generate a summary report.

**Key Principles:**
- Standard metrics: min history, gaps, imputation %, frequency, sparsity
- Keep it fast and readable
- Highlight exceptions

---

## Prerequisites

**Inputs:**
- `./output/m5_weekly_clean.parquet` (from Module 1.9)

**What this module shows:**
- `forecast_readiness_report()` function
- `./output/readiness_report.txt`
- `./output/readiness_summary.parquet`
- `./output/ready_series.csv`

**Data Flow:**
```
Module 1.13 (manual quality checks)
    → Module 1.14 (automated readiness) ← YOU ARE HERE
        → Module 1.15 (backtest planning)
```

---

### `forecast_readiness_report()` in `tsforge`

One function that:
1. Takes a DataFrame
2. Runs all quality checks
3. Prints a readable summary
4. Returns structured results
5. Saves reports for audit trail
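To make the quality-check step concrete, here is a minimal pandas sketch of the kind of per-series statistics such a report could compute. It is not `tsforge`'s implementation: `sketch_readiness`, its `period_days` parameter, and the output column names are hypothetical, and it assumes weekly data where a gap means a missing week between a series' first and last date.

```python
import pandas as pd


def sketch_readiness(df, id_col="unique_id", date_col="ds", target_col="y",
                     min_history=52, max_gap_pct=10.0, max_zero_pct=70.0,
                     period_days=7):
    """Hypothetical stand-in for a readiness check (not the tsforge code)."""
    stats = df.groupby(id_col).agg(
        n_periods=(date_col, "count"),
        start=(date_col, "min"),
        end=(date_col, "max"),
        zero_pct=(target_col, lambda s: 100.0 * (s == 0).mean()),
    )
    # Periods we would expect if no week were missing between start and end
    expected = (stats["end"] - stats["start"]).dt.days // period_days + 1
    stats["gap_pct"] = 100.0 * (expected - stats["n_periods"]) / expected
    # A series is ready only if it passes all three thresholds
    stats["ready"] = (
        (stats["n_periods"] >= min_history)
        & (stats["gap_pct"] <= max_gap_pct)
        & (stats["zero_pct"] <= max_zero_pct)
    )
    return stats.reset_index()


# Toy data: four short weekly series, each illustrating one outcome
dates = pd.date_range("2024-01-06", periods=6, freq="7D")
toy = pd.concat([
    pd.DataFrame({"unique_id": "A", "ds": dates, "y": [3, 0, 2, 5, 1, 4]}),
    pd.DataFrame({"unique_id": "B", "ds": dates, "y": [0, 0, 0, 0, 0, 1]}),
    pd.DataFrame({"unique_id": "C", "ds": dates[:2], "y": [1, 2]}),
    pd.DataFrame({"unique_id": "D", "ds": dates[[0, 1, 2, 5]], "y": [1, 2, 3, 4]}),
], ignore_index=True)

stats = sketch_readiness(toy, min_history=4)
print(stats[["unique_id", "n_periods", "gap_pct", "zero_pct", "ready"]])
```

With `min_history=4`, only series `A` passes: `B` is too sparse, `C` is too short, and `D` is missing two of its six expected weeks.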

## 1. Setup

```python
import pandas as pd
import tsforge as tsf
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

Path('./output').mkdir(exist_ok=True)

print("✓ Setup complete")
```

```
✓ Setup complete
```


## 2. Run `forecast_readiness_report()`

```python
# Load data
df = pd.read_parquet('./output/m5_weekly_clean.parquet')

# Run readiness report
results = tsf.forecast_readiness_report(
    df=df,
    id_col='unique_id',
    date_col='ds',
    target_col='y',
    min_history=52,      # 1 year
    max_gap_pct=10.0,    # Max 10% gaps
    max_zero_pct=70.0    # Max 70% zeros
)
```

```
FORECAST READINESS REPORT
Generated: 2025-11-24T12:53:02

📊 DATASET
   Rows: 6,848,887
   Series: 30,490
   Date range: 2011-01-29 to 2016-06-25

✅ READINESS
   Ready: 26,747 (87.7%)
   Not ready: 3,743

📏 HISTORY (min required: 52)
   Range: 19 - 283 periods
   Below threshold: 156

🕳️  GAPS (max allowed: 10.0%)
   Series with gaps: 0
   Above threshold: 0

📉 SPARSITY (max zeros: 70.0%)
   Mean zero %: 28.5%
   Sparse series: 3596

⚠️  ISSUES
   • 156 series have insufficient history (<52)
   • 3596 series are too sparse (>70.0% zeros)
```
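The figures in the report can be cross-checked with a little arithmetic. The sketch below uses only numbers printed in the report. Two parts rest on assumptions: the overlap estimate assumes that history, gaps, and sparsity are the only checks deciding readiness (the report suggests this but does not state it), and the period count assumes strictly weekly, 7-day spacing.

```python
from datetime import date

# Numbers copied from the report output
n_rows = 6_848_887
n_series = 30_490
n_ready = 26_747
short_history = 156   # series below the 52-period minimum
too_sparse = 3_596    # series above 70% zeros

not_ready = n_series - n_ready
ready_pct = round(100 * n_ready / n_series, 1)
print(not_ready, ready_pct)   # 3743 87.7

# Assumption: only these checks decide readiness (no series failed the gap check).
# Then any excess of issue counts over not-ready series is the overlap.
failing_both = short_history + too_sparse - not_ready
print(failing_both)           # 9

# Assumption: weekly data, so the longest possible history spans the full date range
max_weeks = (date(2016, 6, 25) - date(2011, 1, 29)).days // 7 + 1
print(max_weeks)              # 283

mean_len = round(n_rows / n_series, 1)
print(mean_len)               # 224.6
```

Under those assumptions, 9 series are both too short and too sparse, and the 283-period maximum in the HISTORY section corresponds to a series that covers the entire date range.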



## 3. Save Results

```python
# Save series stats
results['series_stats'].to_parquet('./output/readiness_summary.parquet', index=False)
print("✓ Saved ./output/readiness_summary.parquet")

# Save ready series list
pd.DataFrame({'unique_id': results['ready_series']}).to_csv('./output/ready_series.csv', index=False)
print(f"✓ Saved ./output/ready_series.csv ({len(results['ready_series']):,} series)")

# Save text report
tsf.save_text_report(results, './output/readiness_report.txt')
print("✓ Saved ./output/readiness_report.txt")
```

```
✓ Saved ./output/readiness_summary.parquet
✓ Saved ./output/ready_series.csv (26,747 series)
✓ Saved ./output/readiness_report.txt
```


## 4. Key Takeaways

### The Main Function

```python
results = tsf.forecast_readiness_report(
    df=df,
    id_col='unique_id',
    date_col='ds',
    target_col='y',
    min_history=52,
    max_gap_pct=10.0,
    max_zero_pct=70.0
)
```

### What It Returns

| Key | Description |
|-----|-------------|
| `summary` | High-level metrics |
| `series_stats` | Per-series DataFrame |
| `issues` | List of problems found |
| `ready_series` | IDs ready for forecasting |
| `not_ready_series` | IDs not ready |

### Standard Metrics

- ✅ History length (min, max, below threshold)
- ✅ Gap percentage (series with gaps, above threshold)
- ✅ Sparsity (mean zero %, sparse count)
- ✅ Overall readiness verdict

### Best Practices

1. **Run before every modeling cycle**
2. **Save reports for audit trail**
3. **Adjust thresholds for your use case**
4. **Filter to ready series before training**
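As an illustration of the last practice, the saved ready-series list can be used to filter a DataFrame with plain pandas. This is a sketch on toy data, not the `tsforge` helper: the in-memory CSV stands in for `./output/ready_series.csv`, and `train_df` is a hypothetical name.

```python
import io

import pandas as pd

# Stand-in for ./output/ready_series.csv (a single 'unique_id' column)
ready_csv = io.StringIO("unique_id\nA\nC\n")
ready_ids = pd.read_csv(ready_csv)["unique_id"]

# Toy long-format data with the same columns used in this module
df = pd.DataFrame({
    "unique_id": ["A", "A", "B", "B", "C"],
    "ds": pd.to_datetime(
        ["2024-01-06", "2024-01-13", "2024-01-06", "2024-01-13", "2024-01-06"]
    ),
    "y": [1, 2, 0, 0, 5],
})

# Keep only rows whose series ID appears in the ready list
train_df = df[df["unique_id"].isin(ready_ids)].reset_index(drop=True)
print(train_df)
```

Filtering by ID, rather than re-running the checks, keeps the training set tied to the saved report.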

---

## What's Next

**Module 1.15: Backtest Plan - Rolling-Origin & Metrics**
- Define evaluation strategy
- Choose horizon, cutoffs, step size
- Select appropriate metrics

```python
print("=" * 65)
print("MODULE 1.14 COMPLETE")
print("=" * 65)
print("\nOutputs:")
print("  ./output/readiness_summary.parquet")
print("  ./output/readiness_report.txt")
print("  ./output/ready_series.csv")
print("\nFunctions:")
print("  forecast_readiness_report() - Main report function")
print("  get_ready_series() - Get ready IDs")
print("  get_problem_series() - Get series by issue type")
print("  filter_to_ready() - Filter DataFrame")
print("  prepare_for_modeling() - One-stop prep")
```