
# RGOv2_5 — Data & Scope Revisions (MVP Lock)

**Purpose:** Ground the v2 strategic layer in the actual CSV schema. The strategy is unchanged; the implementation layer is locked for build.

**Status:** MVP scope frozen. Implementation-ready.

---

## 1. Actual Data Schema (Source of Truth)

### retail_customers.csv
| Column          | Use in v2 MVP |
|-----------------|----------------|
| customer_id     | Primary key    |
| age             | Not used       |
| household_size  | Not used       |
| loyalty_member  | Optional for tier (see §2) |

**Not present:** region, tier, signup_date, baseline_revenue, lifetime_value

### retail_weekly_sales.csv
| Column                | Use in v2 MVP |
|-----------------------|----------------|
| customer_id           | Join key       |
| week_start_date       | Time axis; join to stock.week_start |
| weekly_spend          | **Revenue** (treat as revenue in all logic/reports) |
| store_id              | Join to stock for operational attribution |
| week_number, month, … | As needed     |
| is_zero_spend         | Zero-spend signal (structural risk scoring) |
| is_high_value_customer| **Tier derivation** (see §2) |

**Not present:** product_category, units_sold

### stock_availability.csv
| Column          | Use in v2 MVP |
|-----------------|----------------|
| store_id        | Join to sales.store_id |
| sku             | For stockout definition (any SKU at store) |
| week_start      | Join to sales.week_start_date (align dates) |
| on_hand_units   | Stockout = 0 |

**Not present:** customer_id. Operational attribution is therefore **store-level** (see §4).
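A cheap guard before any processing is to confirm that each CSV actually carries the columns this document relies on. The sketch below uses only the standard library; `REQUIRED_COLUMNS`, `missing_columns`, and `check_csv` are hypothetical names, and `loyalty_member` is deliberately left out because it is optional.

```python
import csv
import io

# Columns the v2 MVP relies on, per the schema tables above (hypothetical config).
# Extra columns (age, household_size, week_number, ...) are allowed and ignored.
# loyalty_member is optional (only for the tier refinement), so it is not required.
REQUIRED_COLUMNS = {
    "retail_customers.csv": {"customer_id"},
    "retail_weekly_sales.csv": {
        "customer_id", "week_start_date", "weekly_spend", "store_id",
        "is_zero_spend", "is_high_value_customer",
    },
    "stock_availability.csv": {"store_id", "sku", "week_start", "on_hand_units"},
}


def missing_columns(filename: str, header: list[str]) -> set[str]:
    """Return the required MVP columns absent from a CSV header."""
    return REQUIRED_COLUMNS[filename] - set(header)


def check_csv(filename: str, text: str) -> set[str]:
    """Read only the header row of CSV text and report missing columns."""
    header = next(csv.reader(io.StringIO(text)))
    return missing_columns(filename, header)


sample = "customer_id,week_start_date,weekly_spend,store_id\nC1,2024-01-01,42.0,S1\n"
print(sorted(check_csv("retail_weekly_sales.csv", sample)))
# -> ['is_high_value_customer', 'is_zero_spend']
```

Failing fast on a missing column keeps schema drift from silently turning into empty joins later.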

---

## 2. Tier Derivation (No Tier Column)

**Rule (MVP):**

```
tier = "high"   if is_high_value_customer == True  (from sales)
tier = "standard"  otherwise
```

Optional refinement: incorporate `loyalty_member` via config (e.g., high + loyalty member → strategic). Default: the two tiers above.

Tier multipliers in the REI apply as defined in RGOv2_1. No new CSV columns are required.
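Because `is_high_value_customer` lives on weekly sales rows, it has to be collapsed to one value per customer. This document does not say how; the sketch below assumes the flag from the customer's **most recent week** wins. The function name and the `use_loyalty_tier` switch for the optional refinement are hypothetical.

```python
from datetime import date


def derive_tier(sales_rows, loyalty_member=False, use_loyalty_tier=False):
    """Derive one customer's tier from their weekly sales rows.

    Rows are assumed already parsed (week_start_date as date, flag as bool).
    Assumption: the most recent week's is_high_value_customer flag is used.
    """
    if not sales_rows:
        return "standard"
    latest = max(sales_rows, key=lambda r: r["week_start_date"])
    tier = "high" if latest["is_high_value_customer"] else "standard"
    # Optional refinement, off by default: high + loyalty member -> strategic.
    if use_loyalty_tier and tier == "high" and loyalty_member:
        tier = "strategic"
    return tier


rows = [
    {"week_start_date": date(2024, 1, 1), "is_high_value_customer": False},
    {"week_start_date": date(2024, 1, 8), "is_high_value_customer": True},
]
print(derive_tier(rows))  # -> high
```

Note that CSV readers return strings, so a raw `"False"` must be parsed to `False` first; `bool("False")` is `True`.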

---

## 3. Baseline and Current Revenue (Locked)

**Definitions (configurable):**

| Metric            | Definition |
|-------------------|------------|
| current_revenue   | Rolling average of `weekly_spend` over the **last 4 weeks** |
| baseline_revenue  | Rolling average of `weekly_spend` over the **6 weeks immediately preceding** the current window (trailing; does not overlap the current period) |

Config example:

```python
baseline_window_weeks = 6  # weeks immediately before the current window
current_window_weeks = 4   # most recent weeks, used for current_revenue
```

There is no static baseline in the customers table; both metrics are derived from `retail_weekly_sales.csv` only.
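The two windows can be computed per customer from a time-ordered list of `weekly_spend` values. This is a minimal sketch with a hypothetical function name; returning `None` when there are fewer than 10 weeks of history is an assumption, not a rule stated here.

```python
from statistics import mean


def revenue_windows(weekly_spend, baseline_window_weeks=6, current_window_weeks=4):
    """Return (baseline_revenue, current_revenue) for one customer.

    weekly_spend: that customer's values sorted oldest -> newest.
    current  = mean of the last current_window_weeks values.
    baseline = mean of the baseline_window_weeks values just before them,
               so the two windows never overlap.
    Assumption: None when history is shorter than both windows combined.
    """
    needed = baseline_window_weeks + current_window_weeks
    if len(weekly_spend) < needed:
        return None
    current = weekly_spend[-current_window_weeks:]
    baseline = weekly_spend[-needed:-current_window_weeks]
    return mean(baseline), mean(current)


spend = [100.0] * 6 + [80.0] * 4
print(revenue_windows(spend))  # -> (100.0, 80.0)
```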

---

## 4. Operational Attribution — Store-Level

**Join:** `(sales.store_id, sales.week_start_date)` ↔ `(stock.store_id, stock.week_start)`  
(Normalize `week_start_date` and `week_start` to the same weekday before joining.)
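One way to align the two date columns is to snap every date back to a fixed week-start weekday before building the join key. The sketch assumes Monday as that weekday (hypothetical; use whatever the data generator emits), and `align_week` / `join_key` are illustrative names.

```python
from datetime import date, timedelta

WEEK_START_WEEKDAY = 0  # hypothetical convention: 0 = Monday ... 6 = Sunday


def align_week(d: date, weekday: int = WEEK_START_WEEKDAY) -> date:
    """Snap a date back to the most recent occurrence of `weekday`."""
    return d - timedelta(days=(d.weekday() - weekday) % 7)


def join_key(store_id: str, d: date) -> tuple[str, date]:
    """Key shared by sales (store_id, week_start_date) and stock (store_id, week_start)."""
    return (store_id, align_week(d))


print(align_week(date(2024, 1, 3)))  # Wednesday -> 2024-01-01 (Monday)
```

Applying the same function to both `sales.week_start_date` and `stock.week_start` guarantees that a store-week matches regardless of which weekday each source used.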

**Stockout definition:** For a given store and week, if any SKU row has `on_hand_units == 0`, the store has a **stockout that week**.
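The stockout rule reduces to a set of store-weeks. This sketch assumes stock rows are already parsed (integer `on_hand_units`, aligned `week_start`); the function name is hypothetical.

```python
def stockout_weeks(stock_rows):
    """Return the set of (store_id, week_start) pairs with a stockout.

    A store-week is a stockout if ANY SKU row for it has on_hand_units == 0.
    """
    return {
        (row["store_id"], row["week_start"])
        for row in stock_rows
        if row["on_hand_units"] == 0
    }


stock = [
    {"store_id": "S1", "sku": "A", "week_start": "2024-01-01", "on_hand_units": 5},
    {"store_id": "S1", "sku": "B", "week_start": "2024-01-01", "on_hand_units": 0},
    {"store_id": "S2", "sku": "A", "week_start": "2024-01-01", "on_hand_units": 3},
]
print(stockout_weeks(stock))  # -> {('S1', '2024-01-01')}
```

A set makes the later per-customer lookup a constant-time membership test.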

**Attribution rule:**

- If the **customer’s store** had a stockout in a week in which the customer had a revenue gap (or decline), that week contributes to **operational** attribution.
- Derive: `stockout_overlap_weeks`, `percent_gap_due_to_stockout`, `stockout_impact_flag` from this store-level overlap.

No `product_category` or SKU-level customer sales data is required.
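The three derived fields can be sketched for one customer as follows. The exact formulas are not fixed in this document, so the sketch makes these **assumptions**: a gap week is one where `weekly_spend` is below `baseline_revenue`; the weekly gap is `baseline - weekly_spend`; `percent_gap_due_to_stockout` is the share of total gap amount falling in stockout weeks; and `impact_threshold = 50.0` is a hypothetical config value. Only the "gap" variant is shown, not week-over-week decline.

```python
def stockout_attribution(customer_weeks, stockouts, baseline, impact_threshold=50.0):
    """Store-level operational attribution for one customer.

    customer_weeks: dicts with store_id, week_start_date, weekly_spend
    stockouts:      set of (store_id, week_start) pairs from the stock data
    baseline:       the customer's baseline_revenue (weekly average)
    """
    total_gap = 0.0
    stockout_gap = 0.0
    overlap_weeks = 0
    for w in customer_weeks:
        gap = baseline - w["weekly_spend"]
        if gap <= 0:
            continue  # not a gap week under the assumed definition
        total_gap += gap
        # The customer's store for this week comes from the sales row itself.
        if (w["store_id"], w["week_start_date"]) in stockouts:
            overlap_weeks += 1
            stockout_gap += gap
    pct = 100.0 * stockout_gap / total_gap if total_gap > 0 else 0.0
    return {
        "stockout_overlap_weeks": overlap_weeks,
        "percent_gap_due_to_stockout": pct,
        "stockout_impact_flag": pct >= impact_threshold,
    }


weeks = [
    {"store_id": "S1", "week_start_date": "2024-01-01", "weekly_spend": 70.0},
    {"store_id": "S1", "week_start_date": "2024-01-08", "weekly_spend": 90.0},
    {"store_id": "S1", "week_start_date": "2024-01-15", "weekly_spend": 120.0},
]
# Gaps are 30 and 10; the 30 falls in a stockout week -> 75% of the gap.
print(stockout_attribution(weeks, {("S1", "2024-01-01")}, baseline=100.0))
```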

---

## 5. MVP Segment Aggregation (Only What We Have)

**In scope:**

- exposure_by_tier (tier from §2)
- exposure_by_root_cause

**Out of scope for MVP (v2.1+):**

- exposure_by_region (no region in data)
- exposure_by_category (no product/category in sales)
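The two in-scope aggregations are simple group-by sums over per-customer REI. The sketch assumes each customer record carries hypothetical keys `tier`, `root_cause`, and `rei`, and that each customer has a single primary root cause; if root causes are split fractionally, the REI would need to be apportioned instead.

```python
from collections import defaultdict


def aggregate_exposure(customers):
    """Sum REI by tier and by primary root cause (MVP segments only)."""
    by_tier = defaultdict(float)
    by_root_cause = defaultdict(float)
    for c in customers:
        by_tier[c["tier"]] += c["rei"]
        by_root_cause[c["root_cause"]] += c["rei"]
    return {
        "exposure_by_tier": dict(by_tier),
        "exposure_by_root_cause": dict(by_root_cause),
    }


result = aggregate_exposure([
    {"tier": "high", "root_cause": "operational", "rei": 100.0},
    {"tier": "standard", "root_cause": "behavioral", "rei": 40.0},
    {"tier": "high", "root_cause": "behavioral", "rei": 10.0},
])
print(result["exposure_by_tier"])  # -> {'high': 110.0, 'standard': 40.0}
```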

---

## 6. Removed From MVP Scope

| Item | Reason | When |
|------|--------|------|
| single_sku_dependency_ratio | No product/SKU in sales | v2.1+ if product-level sales exist |
| exposure_by_region | No region column | v2.1+ when available |
| exposure_by_category | No product category in sales | v2.1+ when available |
| Recurrence multiplier (REI) | Requires prior-run snapshot; adds state | v2.1 |
| Trend vs prior run (exposure_delta_percent, etc.) | Same snapshot dependency | v2.1 |

In the MVP, the REI formula has **no recurrence term** (recurrence = 1.0). All other multipliers (churn, structural, tier) are unchanged.

---

## 7. REI MVP Formula (No Recurrence)

```
REI = BaseGap × ChurnMultiplier × StructuralMultiplier × CustomerTierMultiplier
```

`RecurrenceMultiplier` is omitted in the MVP (treated as 1.0). The cap (e.g., `max_exposure_multiplier = 2.5`) still applies.
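A worked version of the MVP formula helps pin down where the cap sits. This sketch **assumes** the cap bounds the combined multiplier (not REI itself) and that `base_gap` is already non-negative; check both against RGOv2_1 before relying on it.

```python
def rei_mvp(base_gap, churn, structural, tier, max_exposure_multiplier=2.5):
    """MVP REI: BaseGap x Churn x Structural x Tier, with no recurrence term.

    Assumption: the cap applies to the product of multipliers.
    """
    recurrence = 1.0  # omitted in MVP; kept explicit for the v2.1 upgrade path
    combined = churn * structural * tier * recurrence
    return base_gap * min(combined, max_exposure_multiplier)


print(rei_mvp(200.0, 1.5, 1.2, 1.5))  # combined 2.7, capped to 2.5 -> 500.0
print(rei_mvp(100.0, 1.25, 1.0, 1.5))  # combined 1.875, under cap -> 187.5
```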

---

## 8. Naming Convention

- **Code/data:** use `weekly_spend` as the column name.
- **Config/reports:** may refer to “revenue” (the same quantity). State the mapping once, in config or docs: “revenue := weekly_spend.”
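Keeping the mapping in a single constant means report code never hard-codes the column name. `REVENUE_COLUMN` and `revenue` are hypothetical names for illustration.

```python
# Hypothetical config: the one place where "revenue" is defined.
REVENUE_COLUMN = "weekly_spend"  # revenue := weekly_spend


def revenue(row: dict) -> float:
    """Read revenue from a sales row via the configured column name."""
    return float(row[REVENUE_COLUMN])


print(revenue({"customer_id": "C1", "weekly_spend": "42.5"}))  # -> 42.5
```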

---

## 9. Summary: What’s In vs Out (MVP)

**In:**

- REI (multiplicative, capped; no recurrence).
- Structural risk scoring (decline weeks, zero-spend weeks, velocity, volatility only — no SKU dependency).
- Root cause attribution (behavioral, operational, structural, demand) with store-level stockout.
- Tier from `is_high_value_customer`; exposure_by_tier and exposure_by_root_cause only.
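For orientation, the four structural inputs can all be computed from one customer's weekly series without any SKU data. The definitions below are **hypothetical placeholders** (RGOv2_2 is the authority on the real ones), and the sketch stops at raw signals, before any scoring or weighting.

```python
from statistics import mean, pstdev


def structural_signals(weekly_spend, is_zero_spend):
    """Raw inputs to structural risk scoring (no SKU dependency).

    Hypothetical definitions, for illustration only:
      decline_weeks    - weeks with lower spend than the previous week
      zero_spend_weeks - weeks flagged is_zero_spend
      velocity         - mean week-over-week change in spend
      volatility       - population standard deviation of spend
    """
    changes = [b - a for a, b in zip(weekly_spend, weekly_spend[1:])]
    return {
        "decline_weeks": sum(1 for c in changes if c < 0),
        "zero_spend_weeks": sum(1 for z in is_zero_spend if z),
        "velocity": mean(changes) if changes else 0.0,
        "volatility": pstdev(weekly_spend) if weekly_spend else 0.0,
    }


signals = structural_signals([100.0, 80.0, 80.0, 0.0], [False, False, False, True])
print(signals["decline_weeks"], signals["zero_spend_weeks"])  # -> 2 1
```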

**Out (v2.1+):**

- Recurrence multiplier and trend vs prior run.
- Region/category segments and single_sku_dependency_ratio.

---

## 10. References

- Strategy: RGOv2.md  
- Data layers: RGOv2_0_Data_Proposal.md (interpret §3 stock attribution as store-level).  
- REI: RGOv2_1_REI_Formula.md (MVP = no recurrence).  
- Structural: RGOv2_2_StructuralRiskScoring.md (no SKU dependency).  
- Root cause: RGOv2_3_RootCauseAttributionLogic.md (operational = store-level stockout).  
- Data gen / baseline: RGOv2_4_DataGen.md (baseline/current as in §3 here; stock attribution as in §4 here).

---

**Document:** RGOv2_5_Data_and_Scope_Revisions.md  
**MVP scope:** Locked. Ready for state schema and orchestrator node plan.
