# FRED/ALFRED Data Pipeline (Simple Overview)
This notebook exports a **minimal delivery file** from a pre-built ALFRED pipeline output.

## What happened before this notebook
1. Start from the list of 125 macro series used by Fan et al.
2. Map paper mnemonics to API series IDs (including manual overrides).
3. Build a monthly decision-date snapshot using ALFRED real-time data where available.
4. Handle missing real-time history with backfill rules.
5. Apply McCracken-Ng `tcode` transformations.
6. Save the full audit workbook in `data/ALFRED/RAW/`.
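
Step 5 refers to the transformation codes of the McCracken-Ng FRED-MD database. The sketch below assumes the standard FRED-MD code definitions (1 = level, 2 = first difference, 3 = second difference, 4 = log, 5 = log first difference, 6 = log second difference, 7 = first difference of the percent change). The helper name `apply_tcode` is hypothetical and is not taken from the upstream pipeline, whose implementation may differ.

```python
import numpy as np
import pandas as pd


def apply_tcode(x: pd.Series, tcode: int) -> pd.Series:
    """Apply a FRED-MD style transformation code to one series (hypothetical helper)."""
    if tcode == 1:
        return x                            # level, no transformation
    if tcode == 2:
        return x.diff()                     # first difference
    if tcode == 3:
        return x.diff().diff()              # second difference
    if tcode == 4:
        return np.log(x)                    # log level
    if tcode == 5:
        return np.log(x).diff()             # log first difference
    if tcode == 6:
        return np.log(x).diff().diff()      # log second difference
    if tcode == 7:
        return (x / x.shift(1) - 1).diff()  # change in the percent change
    raise ValueError(f"Unknown tcode: {tcode}")


# Toy input: a series growing 10% per period.
levels = pd.Series([100.0, 110.0, 121.0])
growth = apply_tcode(levels, 5)
```

Each differencing step costs one leading observation, so codes 3, 6, and 7 leave the first two rows missing.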

## How series are treated
- **ALFRED available**: uses ALFRED vintage history.
- **FRED-only (no ALFRED history)**: backfilled from latest available FRED history.
- **Invalid/unmapped IDs**: remain missing and are documented in diagnostics.
- **Constructed series**: derived from component series when dependencies exist.
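
To illustrate the last rule, here is a minimal sketch of a constructed series that is computed only when both of its components are present. The helper `construct_difference` and the column names `A`, `B`, and `C` are hypothetical; the actual construction rules live in the upstream pipeline and are not shown in this notebook.

```python
import numpy as np
import pandas as pd


def construct_difference(panel: pd.DataFrame, left: str, right: str) -> pd.Series:
    """Return panel[left] - panel[right], or an all-NaN series if a component is absent."""
    if left not in panel.columns or right not in panel.columns:
        # A dependency is missing, so the constructed series stays missing.
        return pd.Series(np.nan, index=panel.index, dtype="float64")
    return panel[left] - panel[right]


# Toy panel with hypothetical component columns.
panel = pd.DataFrame({"A": [5.0, 6.0], "B": [2.0, 2.5]})
spread_ok = construct_difference(panel, "A", "B")       # both components exist
spread_missing = construct_difference(panel, "A", "C")  # "C" is not in the panel
```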

## Backfill policy used in the source workbook
- For missing cells, use the **latest vintage** when ALFRED vintages exist.
- If no ALFRED vintages exist, use the **latest FRED** history.
- Every fill choice is tracked in diagnostics (the `vintage_diag` sheet and the source tags in the full workbook).
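
The fallback order above can be sketched for a single series as follows. This is an illustration under stated assumptions, not the pipeline's code: the function `backfill_series`, its arguments, and the source tags `realtime`, `latest_vintage`, `latest_fred`, and `missing` are all hypothetical names.

```python
import numpy as np
import pandas as pd


def backfill_series(realtime, latest_vintage=None, latest_fred=None):
    """Fill gaps in a real-time series and tag the source of every cell (hypothetical helper)."""
    values = realtime.copy()
    source = pd.Series("realtime", index=values.index)
    source.loc[values.isna()] = "missing"

    # Policy: prefer the latest vintage when vintages exist, otherwise latest FRED.
    if latest_vintage is not None:
        fallback, tag = latest_vintage, "latest_vintage"
    else:
        fallback, tag = latest_fred, "latest_fred"

    if fallback is not None:
        fallback = fallback.reindex(values.index)
        fill = values.isna() & fallback.notna()  # only touch cells that are missing
        values.loc[fill] = fallback.loc[fill]
        source.loc[fill] = tag
    return values, source


# Toy example with made-up numbers.
idx = ["2000-01", "2000-02", "2000-03"]
realtime = pd.Series([np.nan, 2.0, np.nan], index=idx)
vintage = pd.Series([1.0, 2.5, np.nan], index=idx)
values, source = backfill_series(realtime, latest_vintage=vintage)
```

Existing real-time values are never overwritten, and a cell that no fallback can fill stays missing and is tagged as such.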

## What this notebook does
1. Loads `data/ALFRED/RAW/fred_balanced_snapshot_auditable_latest_vintage_backfill.xlsx`.
2. Keeps only:
   - `snapshot_raw_bfill_latest`
   - `snapshot_tcode_bfill_latest`
   - `vintage_diag`
3. Writes `data/ALFRED/non-revised_data.xlsx`.

## Important interpretation notes
- This export is **practical** but **not a strictly real-time dataset**: cells backfilled from the latest vintage or from latest FRED history may contain revised values that were not available on the decision date.
- Use `vintage_diag` to audit where vintage depth is limited and where backfill likely matters most.
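
As a rough sketch of such an audit, the snippet below lists series with shallow vintage history, ordered by how many cells were backfilled. The column names `series_id`, `n_vintages`, and `n_backfilled`, the threshold, and the numbers are made-up placeholders for illustration; check the actual `vintage_diag` header in the workbook and adapt accordingly.

```python
import pandas as pd

# Stand-in for the vintage_diag sheet; columns and values are hypothetical.
diag = pd.DataFrame({
    "series_id": ["S1", "S2", "S3"],
    "n_vintages": [240, 0, 12],
    "n_backfilled": [3, 300, 150],
})


def flag_shallow(diag: pd.DataFrame, min_vintages: int = 24) -> pd.DataFrame:
    """Return series with fewer than min_vintages vintages, most-backfilled first."""
    shallow = diag[diag["n_vintages"] < min_vintages]
    return shallow.sort_values("n_backfilled", ascending=False).reset_index(drop=True)


shallow = flag_shallow(diag)
```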


In [None]:
import pandas as pd

IN_XLSX = "data/ALFRED/RAW/fred_balanced_snapshot_auditable_latest_vintage_backfill.xlsx"
OUT_XLSX = "data/ALFRED/non-revised_data.xlsx"

KEEP_SHEETS = [
    "snapshot_raw_bfill_latest",
    "snapshot_tcode_bfill_latest",
    "vintage_diag",
]

# Open the workbook once, check that the required sheets exist, then load them.
with pd.ExcelFile(IN_XLSX) as xls:
    missing = [s for s in KEEP_SHEETS if s not in xls.sheet_names]
    if missing:
        raise ValueError(f"Missing required sheets in {IN_XLSX}: {missing}")
    export = {s: xls.parse(s) for s in KEEP_SHEETS}

# Print each sheet's shape as a quick sanity check.
for s, df in export.items():
    print(f"{s}: {df.shape}")


In [None]:
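# Write the selected sheets to the delivery workbook under their original names,
# without the DataFrame index column.
with pd.ExcelWriter(OUT_XLSX, engine="openpyxl") as w:
    for sheet in KEEP_SHEETS:
        export[sheet].to_excel(w, sheet_name=sheet, index=False)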

print("Wrote:", OUT_XLSX)
