# Data Warehousing - Part 14: Advanced Fact Tables (Accumulating Snapshot & Factless Fact)

## 1. Accumulating Snapshot Fact Table
*   **Purpose:** Designed to track the entire lifecycle of a business process that has a definite start and end.
*   **Characteristics:**
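    *   Unlike transactional and periodic snapshot facts, rows in this table are **updated in place** as milestones occur (similar to SCD Type 1 overwrite behavior, applied to a fact).
    *   One row per lifecycle instance of an entity (e.g., one row per Order).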
    *   Multiple Date Keys representing milestones (Order Date, Ship Date, Delivery Date).
*   **Use Case:** Order Fulfillment, Loan Processing, Insurance Claims (anything with a workflow).

### Python Simulation: Order Lifecycle
Let's simulate an Order tracking system.
1.  **Day 1:** Order is placed.
2.  **Day 2:** Order is shipped.
3.  **Day 3:** Order is delivered.

In an accumulating snapshot, we update the *same row* as the order moves through the pipeline.

```python
import pandas as pd

# --- Day 1: Order Placed ---
# Initial state: only the Order Date is known; later milestones are null.
# (Many warehouses instead point them at a special "not yet occurred" row in the Date dimension.)
# The nullable "Int64" dtype keeps date keys as integers; np.nan would silently turn them into floats.
fact_accumulating = pd.DataFrame({
    'Order_ID': [1001],
    'Order_Date_Key': pd.array([20230110], dtype='Int64'),
    'Ship_Date_Key': pd.array([pd.NA], dtype='Int64'),      # Not happened yet
    'Delivery_Date_Key': pd.array([pd.NA], dtype='Int64'),  # Not happened yet
    'Quantity': [10],
    'Amount': [2000],
    'Status': ['Placed']
})

print("--- State 1: Order Placed ---")
print(fact_accumulating)  # use display() instead when running in Jupyter

# Boolean mask selecting the row for order 1001 (reused for every milestone)
is_order = fact_accumulating['Order_ID'] == 1001

# --- Day 2: Order Shipped ---
# We UPDATE the existing row (1001) with the Ship Date
fact_accumulating.loc[is_order, 'Ship_Date_Key'] = 20230111
fact_accumulating.loc[is_order, 'Status'] = 'Shipped'

print("\n--- State 2: Order Shipped (Row Updated) ---")
print(fact_accumulating)

# --- Day 3: Order Delivered ---
# We UPDATE the same row again; no new row is appended
fact_accumulating.loc[is_order, 'Delivery_Date_Key'] = 20230112
fact_accumulating.loc[is_order, 'Status'] = 'Delivered'

print("\n--- State 3: Order Delivered (Lifecycle Complete) ---")
print(fact_accumulating)
```
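The manual updates in the simulation all follow one pattern: locate the row by its natural key, fill in the milestone date key, and set the status. A minimal sketch of that pattern as a reusable load step, assuming a hypothetical `apply_milestone` helper and the same column names as the simulation (in a real warehouse this step would typically be a SQL `MERGE` or upsert):

```python
import pandas as pd

def apply_milestone(fact: pd.DataFrame, order_id: int, date_column: str,
                    date_key: int, status: str) -> pd.DataFrame:
    """Hypothetical helper: record one milestone on an existing accumulating-snapshot row."""
    mask = fact['Order_ID'] == order_id
    if not mask.any():
        # The row must be created at the first milestone (order placed) before it can be updated.
        raise KeyError(f"Order {order_id} has no row yet")
    updated = fact.copy()  # return a new frame instead of mutating the caller's
    updated.loc[mask, date_column] = date_key
    updated.loc[mask, 'Status'] = status
    return updated

# Start from the Day 1 row, then replay the later milestones as events.
snapshot = pd.DataFrame({
    'Order_ID': [1001],
    'Order_Date_Key': pd.array([20230110], dtype='Int64'),
    'Ship_Date_Key': pd.array([pd.NA], dtype='Int64'),
    'Delivery_Date_Key': pd.array([pd.NA], dtype='Int64'),
    'Status': ['Placed'],
})

events = [
    (1001, 'Ship_Date_Key', 20230111, 'Shipped'),
    (1001, 'Delivery_Date_Key', 20230112, 'Delivered'),
]
for order_id, column, key, status in events:
    snapshot = apply_milestone(snapshot, order_id, column, key, status)

# Still exactly one row for order 1001, now with all three milestone keys filled in.
print(snapshot)
```

Raising on an unknown order, rather than silently inserting, keeps the "one row per lifecycle" rule explicit: only the first milestone creates a row.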

### Why use this?
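It makes "Lag Analysis" (the time elapsed between milestones) simple and fast, because every milestone date sits on the same row.
*   *Question:* "What is the average time between Order and Shipment?"
*   *Query:* `AVG(Ship_Date - Order_Date)`, computed on actual date values (e.g., joined from the Date dimension), not on integer `YYYYMMDD` keys, whose difference is not a day count across month boundaries.
*   If this were a Transactional Fact, the order and the shipment would be two separate rows, so the calculation would first need a self-join or pivot to bring both dates together.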
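A minimal pandas sketch of that lag query, using hypothetical sample orders already in accumulating-snapshot form. The `YYYYMMDD` keys are converted to real dates with `pd.to_datetime` before subtracting; the `Order_To_Ship_Days` column name is an illustrative assumption:

```python
import pandas as pd

# Hypothetical orders in accumulating-snapshot form (one row per order).
orders = pd.DataFrame({
    'Order_ID': [1001, 1002, 1003],
    'Order_Date_Key': [20230110, 20230131, 20230115],
    'Ship_Date_Key': [20230111, 20230201, 20230118],
})

# Subtracting the raw keys is wrong across month boundaries
# (order 1002: 20230201 - 20230131 = 70, not 1), so convert to dates first.
order_date = pd.to_datetime(orders['Order_Date_Key'].astype(str), format='%Y%m%d')
ship_date = pd.to_datetime(orders['Ship_Date_Key'].astype(str), format='%Y%m%d')

# Lag in days per order: one subtraction on the same row, no self-join needed.
orders['Order_To_Ship_Days'] = (ship_date - order_date).dt.days

print(orders)
print("Average order-to-ship lag (days):", orders['Order_To_Ship_Days'].mean())
```

Some designs store such lags directly as additional measures on the snapshot row, computed when each milestone is loaded.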

---

## 2. Factless Fact Table
*   **Purpose:** Records events that occur but carry no numeric measure, OR records coverage (what *could* have happened) so that the *absence* of events can be detected.
*   **Characteristics:**
    *   Contains only Foreign Keys (Dimension Keys).
    *   No Measures (like Sales Amount or Quantity).
    *   Sometimes a dummy measure (like `1`) is added for counting rows.
*   **Use Cases:**
    1.  **Tracking Events:** Student Attendance (Student ID, Class ID, Date ID). The "fact" is that they showed up. There is no "amount" of attendance.
    2.  **Coverage / Eligibility:** Which products were *on promotion*? A promoted product that sold nothing leaves no row in the sales fact, so we need a separate record of the promotion to measure that "Lost Opportunity".

### Python Simulation: Student Attendance
We want to track which student attended which class on which date.

```python
import pandas as pd

# --- Factless Fact: Attendance ---
# Keys: Date, Student, Class
fact_attendance = pd.DataFrame({
    'Date_Key': [20230110, 20230110, 20230110],
    'Class_ID': ['Math-101', 'Math-101', 'Math-101'],
    'Student_ID': ['S001', 'S002', 'S003'],
    # Optional 1/0 flag (1=Present, 0=Absent). A strictly factless design
    # stores only rows for students who were present and has no flag at all.
    'Attendance_Flag': [1, 1, 0]
})

print("--- Factless Fact: Attendance ---")
# Notice: no "Amount" or "Price" columns. Just keys connecting dimensions.
print(fact_attendance)
```
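With no measure to sum, a factless table is analyzed by counting rows, and absence is found by comparing it against a table of what *could* have happened. A minimal sketch, assuming a strictly factless variant of the attendance table (present students only) and a hypothetical `enrollment` coverage table; it counts attendees per class and date, then uses an anti-join to list enrolled students with no attendance row:

```python
import pandas as pd

# Strictly factless attendance: one row per student who showed up (keys only).
attendance = pd.DataFrame({
    'Date_Key': [20230110, 20230110],
    'Class_ID': ['Math-101', 'Math-101'],
    'Student_ID': ['S001', 'S002'],
})

# Hypothetical enrollment (coverage) table: who was expected in class that day.
enrollment = pd.DataFrame({
    'Date_Key': [20230110, 20230110, 20230110],
    'Class_ID': ['Math-101', 'Math-101', 'Math-101'],
    'Student_ID': ['S001', 'S002', 'S003'],
})

# "How many students attended?" is a row count, not a SUM of a measure.
attendance_counts = attendance.groupby(['Date_Key', 'Class_ID']).size().rename('Attendees')
print(attendance_counts)

# Absence = enrolled but no matching attendance row (anti-join via merge indicator).
keys = ['Date_Key', 'Class_ID', 'Student_ID']
check = enrollment.merge(attendance, on=keys, how='left', indicator=True)
absent = check.loc[check['_merge'] == 'left_only', keys]
print(absent)
```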

### Python Simulation: Promotion Coverage
We want to know which products were on promotion, regardless of whether they sold.

```python
import pandas as pd

# --- Factless Fact: Promotion Coverage ---
# One row per product per date while the promotion is active; no measures.
fact_promo = pd.DataFrame({
    'Date_Key': [20230110, 20230110],
    'Product_ID': ['P1', 'P2'],
    'Promotion_ID': ['PROMO_WINTER_SALE', 'PROMO_WINTER_SALE']
})

print("\n--- Factless Fact: Promotion Coverage ---")
print(fact_promo)
```

*   **Analysis:** If we left join `fact_promo` (as the left table) with `fact_sales`, the promotion rows with no matching sales row are the products that were on promotion (`P1`, `P2`) but had **Zero** sales. This is vital for evaluating promotion effectiveness in marketing analysis.
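That left join can be sketched in pandas as follows. The `fact_sales` rows are hypothetical sample data at an assumed product-by-date grain; `indicator=True` adds a `_merge` column that marks promotion rows with no matching sales row as `'left_only'`:

```python
import pandas as pd

# Coverage: which products were on promotion on which date (factless).
fact_promo = pd.DataFrame({
    'Date_Key': [20230110, 20230110],
    'Product_ID': ['P1', 'P2'],
    'Promotion_ID': ['PROMO_WINTER_SALE', 'PROMO_WINTER_SALE'],
})

# Hypothetical sales for the same day: P2 sold, P1 did not, P3 sold without a promotion.
fact_sales = pd.DataFrame({
    'Date_Key': [20230110, 20230110],
    'Product_ID': ['P2', 'P3'],
    'Amount': [150, 80],
})

# Keep every promoted product (left side) and attach sales where they exist.
promo_vs_sales = fact_promo.merge(
    fact_sales, on=['Date_Key', 'Product_ID'], how='left', indicator=True
)

# Promoted but unsold: no matching sales row, so Amount is NaN and _merge is 'left_only'.
unsold_on_promo = promo_vs_sales.loc[promo_vs_sales['_merge'] == 'left_only',
                                     ['Date_Key', 'Product_ID', 'Promotion_ID']]
print(unsold_on_promo)
```

If sales are stored at line-item grain, aggregate them to product by date first so the join does not multiply promotion rows. Note that P3 never appears in the result: an inner or right-side view of sales alone could not reveal which promoted products were missing.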

---

## 3. Final Summary of All Fact Tables

| Fact Type | Granularity | Measures? | Updates? | Use Case |
| :--- | :--- | :--- | :--- | :--- |
| **Transactional** | Atomic (Line Item) | Yes | No (Append) | Detail analysis, Drill-down. |
| **Periodic Snapshot** | Aggregated (Day/Month) | Yes | No (Append) | Trends, Inventory, Performance. |
| **Accumulating Snapshot** | Per Lifecycle | Yes | **Yes** | Workflow analysis, Lag times. |
| **Factless** | Per Event | No | No | Attendance, Coverage, Associations. |

---

## 4. Conclusion
This concludes the Data Warehousing module. We have covered the entire journey from raw data in OLTP systems to sophisticated Fact and Dimension modeling techniques that power modern Business Intelligence.

**Key Skills Acquired:**
*   Designing Star Schemas.
*   Choosing the right Grain.
*   Handling History with SCDs.
*   Selecting the correct Fact Table type for the business problem.

Good luck with your data engineering journey!