# Data Warehousing - Part 13: Fact Tables (Transactional & Aggregated)

## 1. Introduction to Fact Tables
Fact tables are the heart of the Star Schema. They store the **Measures** (numbers) and foreign keys to Dimensions.
There are three main types of Fact Tables:
1.  **Transactional Fact Table:** Records every single event/transaction.
2.  **Periodic Snapshot Fact Table (Aggregated):** Records the state of things at a regular interval (e.g., Daily Stock Level).
3.  **Accumulating Snapshot Fact Table:** Records the milestones of a process (e.g., Order Placed -> Shipped -> Delivered).

In this session, we will focus on **Transactional** and **Periodic Snapshot (Aggregated)** facts.
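
For completeness, here is a minimal sketch of how the third type differs, assuming a hypothetical `df_fact_accumulating` table with one row per order and one date column per milestone (the column names and dates are illustrative, not from the course material). Unlike the other two types, the row is updated in place as the order progresses.

```python
import pandas as pd

# Hypothetical accumulating snapshot: one row per order, one column per milestone.
df_fact_accumulating = pd.DataFrame({
    'Order_ID': [101, 102],
    'Placed_Date': pd.to_datetime(['2023-01-01', '2023-01-01']),
    'Shipped_Date': pd.to_datetime([pd.NaT, pd.NaT]),    # not reached yet
    'Delivered_Date': pd.to_datetime([pd.NaT, pd.NaT]),  # not reached yet
})

# As order 101 moves through the process, the SAME row is updated (no new row).
is_101 = df_fact_accumulating['Order_ID'] == 101
df_fact_accumulating.loc[is_101, 'Shipped_Date'] = pd.Timestamp('2023-01-03')
df_fact_accumulating.loc[is_101, 'Delivered_Date'] = pd.Timestamp('2023-01-05')

# A typical derived measure: elapsed days between two milestones.
df_fact_accumulating['Days_To_Deliver'] = (
    df_fact_accumulating['Delivered_Date'] - df_fact_accumulating['Placed_Date']
).dt.days

print(df_fact_accumulating)
```

Order 102 keeps empty milestone columns until those events actually happen.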

---

## 2. Transactional Fact Table
*   **Grain:** Lowest level (Atomic). One row per transaction line item.
*   **Characteristics:**
    *   High volume (millions/billions of rows).
    *   Captures data as it happens.
    *   Most flexible for analysis (can be rolled up to any level).

### Python Simulation: Transactional Load
Let's simulate a Source System generating orders every day, and a Batch ETL loading them into the Fact Table.

```python
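import pandas as pd
# Note: display() is available by default in Jupyter/IPython notebooks.
# When running this as a plain script, use print() instead.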

# --- 1. Source System (OLTP) ---
# Orders placed over three days
source_data = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-03'],
    'Order_ID': [101, 102, 103, 104],
    'Product': ['Pen', 'Pencil', 'Pen', 'Notebook'],
    'Amount': [10, 5, 10, 20]
}
df_source = pd.DataFrame(source_data)

print("--- Source System (Transactional Data) ---")
display(df_source)

# --- 2. Transactional Fact Table (Target) ---
# In a transactional load, we move this data 'as is' (usually appending).
# Grain: One row per order per product.

df_fact_transactional = df_source.copy() # ETL Process (Extract -> Load)

print("\n--- Transactional Fact Table (Grain: Atomic) ---")
display(df_fact_transactional)
```

*Note: If we have 1 million orders, this table has 1 million rows. It is huge but contains every detail.*
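
To illustrate the "can be rolled up to any level" point, here is a small sketch that rebuilds the same four sample orders and aggregates them three different ways. The variable names (`df_fact`, `sales_by_day`, and so on) are illustrative only.

```python
import pandas as pd

# Same sample orders as above, rebuilt so this cell runs on its own.
df_fact = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-03'],
    'Order_ID': [101, 102, 103, 104],
    'Product': ['Pen', 'Pencil', 'Pen', 'Notebook'],
    'Amount': [10, 5, 10, 20],
})

# Because the grain is atomic, any roll-up is just a different GROUP BY.
sales_by_day = df_fact.groupby('Date', as_index=False)['Amount'].sum()
sales_by_product = df_fact.groupby('Product', as_index=False)['Amount'].sum()
total_sales = df_fact['Amount'].sum()

print(sales_by_day)
print(sales_by_product)
print("Total:", total_sales)
```

The reverse is not possible: once only the daily totals are stored, the individual orders cannot be recovered from them.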

---

## 3. Periodic Snapshot Fact Table (Aggregated)
*   **Grain:** Higher level (Aggregated). One row per period (Day/Month) per combination of dimension keys (e.g., one row per Month per Product).
*   **Characteristics:**
    *   Far fewer rows than the transactional fact it summarizes.
    *   Used for performance. If users only query "Monthly Sales", why query the 1-billion-row atomic table? Query the pre-aggregated monthly table instead.
    *   **Trade-off:** You lose detail. You cannot drill down to individual Order IDs from here.

### Python Simulation: Creating a Monthly Snapshot
Scenario: Amazon wants a report of Total Sales per Product per Month. Instead of scanning the huge Transactional Fact every time, we build an Aggregated Fact.

```python
# --- 3. Creating an Aggregated Fact (Periodic Snapshot) ---

# We assume 'df_fact_transactional' is our base.
# We convert dates to 'Month' periods.
df_fact_transactional['Month'] = pd.to_datetime(df_fact_transactional['Date']).dt.to_period('M')

# Aggregate logic: Group by Month and Product
df_fact_periodic = df_fact_transactional.groupby(['Month', 'Product'])['Amount'].sum().reset_index()

print("--- Periodic Snapshot Fact Table (Grain: Month x Product) ---")
display(df_fact_periodic)
```
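
As a rough illustration of the trade-off, the sketch below answers the same question ("January sales for Pen") from both tables. It rebuilds the sample data so it can run on its own; `df_atomic` and `df_monthly` are illustrative names. Both tables give the same number, but only the atomic table still has `Order_ID`.

```python
import pandas as pd

# Atomic fact (same sample orders as above).
df_atomic = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-03'],
    'Order_ID': [101, 102, 103, 104],
    'Product': ['Pen', 'Pencil', 'Pen', 'Notebook'],
    'Amount': [10, 5, 10, 20],
})
df_atomic['Month'] = pd.to_datetime(df_atomic['Date']).dt.to_period('M')

# Pre-aggregated monthly fact.
df_monthly = df_atomic.groupby(['Month', 'Product'], as_index=False)['Amount'].sum()

jan = pd.Period('2023-01', freq='M')

# Same question, two sources: the aggregate scans fewer rows.
from_atomic = df_atomic.loc[
    (df_atomic['Month'] == jan) & (df_atomic['Product'] == 'Pen'), 'Amount'
].sum()
from_aggregate = df_monthly.loc[
    (df_monthly['Month'] == jan) & (df_monthly['Product'] == 'Pen'), 'Amount'
].sum()

print(from_atomic, from_aggregate)          # both totals match
print('Order_ID' in df_monthly.columns)     # False: drill-down detail is gone
```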

### Use Case: Point-in-Time Snapshot (Inventory)
Periodic snapshots are most famous for **Inventory** or **Balance** data.
*   *Question:* "What was the stock level of Red Pens on Jan 31st?"
*   You cannot easily get this from a transactional table (you'd have to sum all incoming and subtract all outgoing items since the beginning of time).
*   Instead, we take a **Snapshot** every day/month.

```python
# Simulation: Inventory Snapshot
inventory_snapshot_data = {
    'Snapshot_Date': ['2023-01-31', '2023-02-28'],
    'Product': ['Red Pen', 'Red Pen'],
    'Quantity_On_Hand': [100, 120] # State of the world on that specific day
}
df_inv_snapshot = pd.DataFrame(inventory_snapshot_data)

print("\n--- Inventory Periodic Snapshot ---")
display(df_inv_snapshot)
```
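
For contrast, this is roughly what answering the same question from transactions alone would look like. The `df_movements` table and its values are hypothetical, chosen only so that the totals line up with the snapshot figures above; `stock_on` is an illustrative helper, not part of any library.

```python
import pandas as pd

# Hypothetical stock movements (positive = received, negative = sold).
df_movements = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-05', '2023-01-20', '2023-02-10', '2023-02-25']),
    'Product': ['Red Pen'] * 4,
    'Quantity_Change': [150, -50, 40, -20],
})

def stock_on(df, product, as_of):
    """Sum every movement for the product from the beginning of time up to as_of."""
    mask = (df['Product'] == product) & (df['Date'] <= pd.Timestamp(as_of))
    return int(df.loc[mask, 'Quantity_Change'].sum())

print(stock_on(df_movements, 'Red Pen', '2023-01-31'))  # 100
print(stock_on(df_movements, 'Red Pen', '2023-02-28'))  # 120
```

With four rows this is trivial, but every such query has to scan the full movement history, which is the cost the snapshot table avoids.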

---

## 4. Accumulating Snapshots (Concept)
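The video touched on this only briefly, describing it as a "Point in Time" view of accumulating values. Note that the running-total example below is a cumulative measure kept in a snapshot; the Accumulating Snapshot Fact Table defined in Section 1 is a different thing (one row per process instance, updated as each milestone is reached).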

If we track a metric like **"Total Subscribers to Date"**, this is an accumulating number.
*   Jan: 10 New Subscribers. (Total: 10)
*   Feb: 5 New Subscribers. (Total: 15)
*   Mar: 15 New Subscribers. (Total: 30)

A snapshot table holding the **Total** (30) allows executives to see the current state immediately without summing up the history.
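
The subscriber figures above can be reproduced with a cumulative sum. This is a minimal sketch; the DataFrame and column names are illustrative.

```python
import pandas as pd

# New subscribers per month, as in the example above.
df_subscribers = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar'],
    'New_Subscribers': [10, 5, 15],
})

# Running total stored alongside the monthly figure.
df_subscribers['Total_Subscribers'] = df_subscribers['New_Subscribers'].cumsum()

# The latest row already holds the current state; no need to re-sum history.
current_total = int(df_subscribers['Total_Subscribers'].iloc[-1])

print(df_subscribers)
print("Current total:", current_total)  # 30
```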

---

## 5. Summary Comparison

| Feature | Transactional Fact | Periodic Snapshot Fact |
| :--- | :--- | :--- |
| **Grain** | Atomic (Transaction/Line Item) | Aggregated (Day/Month/Week) |
| **Volume** | Huge (millions to billions of rows) | Moderate |
| **Detail** | High (Can see Order IDs) | Low (Summarized) |
| **Use Case** | Deep analysis, Drill-down | Trend analysis, Dashboarding, Inventory |
| **Loading** | Append new transactions | Aggregate and Insert period row |
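
The two loading patterns in the last row can be sketched as follows. This assumes a hypothetical February batch (`df_new_batch`) and a helper `to_monthly` introduced here for illustration. It also assumes each period is loaded once, after it is complete, with no late-arriving rows.

```python
import pandas as pd

def to_monthly(df):
    """Aggregate atomic rows to one row per Month per Product."""
    month = pd.to_datetime(df['Date']).dt.to_period('M')
    return (df.assign(Month=month)
              .groupby(['Month', 'Product'], as_index=False)['Amount'].sum())

# Existing facts (January sample data).
df_fact_txn = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-03'],
    'Order_ID': [101, 102, 103, 104],
    'Product': ['Pen', 'Pencil', 'Pen', 'Notebook'],
    'Amount': [10, 5, 10, 20],
})
df_fact_monthly = to_monthly(df_fact_txn)

# Hypothetical new batch arriving from the source system.
df_new_batch = pd.DataFrame({
    'Date': ['2023-02-01', '2023-02-02'],
    'Order_ID': [105, 106],
    'Product': ['Pen', 'Pen'],
    'Amount': [10, 10],
})

# Transactional load: append the new rows as they are.
df_fact_txn = pd.concat([df_fact_txn, df_new_batch], ignore_index=True)

# Periodic snapshot load: aggregate the new period, then insert its rows.
df_fact_monthly = pd.concat(
    [df_fact_monthly, to_monthly(df_new_batch)], ignore_index=True
)

print(len(df_fact_txn), "transactional rows")
print(len(df_fact_monthly), "monthly rows")
```

Two new orders add two rows to the transactional fact but only one row (February, Pen) to the monthly fact.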

---

## 6. Next Steps
Congratulations! You have completed the Data Warehousing course series.

**Recap of the Journey:**
1.  **Intro:** Architectures (OLTP, OLAP, Data Lake).
2.  **Modeling:** Measures, Attributes, Star Schemas.
3.  **Design:** Grain, KPIs, Bus Matrix.
4.  **ETL:** Loading Strategies (Full/Incremental).
5.  **Dimensions:** SCD Types (1, 2, 3), Junk, Role-Playing.
6.  **Facts:** Transactional vs. Snapshots.

You are now equipped with the theoretical and practical knowledge to design robust Data Warehouse solutions. Happy Coding!