# Data Warehousing - Part 6: Grain, KPIs, and Data Modeling Fundamentals

## 1. The Most Important Concept: Grain
If you get the "Grain" wrong, your Data Warehouse cannot answer the questions it was built for.

**Definition:** The Grain is the **level of detail** (or measurement) of a single row in a Fact table.
It answers the question: *"What does one row in this table represent?"*

### Common Grains in Retail
1.  **Lowest Grain (Atomic):** One row = One specific item in a specific transaction.
    *   *Example:* Order #1001, Item: Red Pen.
2.  **Aggregated Grain (Daily):** One row = Total sales of a Product per Day per Store.
    *   *Example:* Jan 1st, NY Store, Red Pens sold: 50.
3.  **Highly Aggregated Grain (Monthly):** One row = Total sales of a Store per Month.
    *   *Example:* Jan 2023, NY Store, Total Sales: $5000.

### Why is Grain Important?
*   **Flexibility:** Lower grain (more detail) allows you to answer more questions. If you store data at a "Monthly" grain, you **cannot** answer "What happened on Jan 15th?".
*   **Storage:** Lower grain requires more storage space.
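
The trade-off above can be sketched with a few made-up rows. The table, column names, and figures below are illustrative assumptions, not data from a real system. The daily table can answer the "Jan 15th" question, while the monthly roll-up stores fewer rows but has lost that detail.

```python
import pandas as pd

# Hypothetical fact table at daily grain: one row per store per day.
daily = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-14', '2023-01-15', '2023-01-16', '2023-02-01']),
    'Store': ['NY', 'NY', 'NY', 'NY'],
    'Sales': [100, 250, 150, 300],
})

# Roll up to monthly grain: one row per store per month.
monthly = (
    daily.assign(Month=daily['Date'].dt.to_period('M'))
         .groupby(['Month', 'Store'], as_index=False)['Sales'].sum()
)

# Only the daily table can answer "What happened on Jan 15th?"
jan_15_sales = daily.loc[daily['Date'] == pd.Timestamp('2023-01-15'), 'Sales'].sum()

print(monthly)
print("Sales on Jan 15th (from daily grain):", jan_15_sales)
print("Rows stored: daily =", len(daily), "| monthly =", len(monthly))
```

The monthly table holds only a January total and a February total, so nothing in it identifies what was sold on any single day.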

### Python Simulation: Determining Grain

```python
import pandas as pd

# Scenario: We have raw transaction data
raw_data = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-01'],
    'Store': ['NY', 'NY', 'NY'],
    'Product': ['Pen', 'Pencil', 'Pen'],
    'Qty': [1, 2, 5]
}
df_raw = pd.DataFrame(raw_data)

print("--- Atomic Grain (Transaction Level) ---")
# Grain: one row per product line of a transaction
# (a transaction ID column is omitted here for brevity)
print(df_raw)

print("\n--- Daily Store Grain (Aggregated) ---")
# Grain: one row per Date per Store (Product detail is lost!)
df_daily_grain = df_raw.groupby(['Date', 'Store'])['Qty'].sum().reset_index()
print(df_daily_grain)
```

*Key Takeaway: You can roll up (aggregate) additive measures such as quantity or sales amount from a lower grain to a higher grain, but you cannot drill down if you only store the high-level aggregate.*

---

## 2. KPI: Key Performance Indicators

**KPIs** are the *measurable values* that demonstrate how effectively a company is achieving its key business objectives. The entire purpose of building a Data Warehouse is to calculate and track these KPIs accurately.

### The Relationship: KPI -> Requirement -> Grain
1.  **KPI (The Business Need):** "We need to track 'Daily Sales Growth per Store'."
2.  **Requirement:** We need sales amounts recorded by date and by store.
3.  **Grain Decision:** To track "Daily" growth "per Store", our Fact table **must** have a grain of "Day per Store" or finer. Data stored at a "Month" grain cannot produce this KPI.
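
As a rough sketch of the chain above, the snippet below computes "Daily Sales Growth per Store" from a fact table at Date-and-Store grain. The table contents and the `Daily_Growth_Pct` column name are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical fact table at (Date, Store) grain.
sales = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-01', '2023-01-02', '2023-01-03']),
    'Store': ['NY', 'NY', 'NY', 'LA', 'LA', 'LA'],
    'Sales': [1000, 1500, 1200, 800, 800, 1000],
})

# Order rows chronologically within each store before comparing days.
sales = sales.sort_values(['Store', 'Date'])

# Growth versus the previous day, computed separately for each store.
# The first day of each store has no previous day, so it is NaN.
sales['Daily_Growth_Pct'] = (
    sales.groupby('Store')['Sales'].pct_change() * 100
).round(1)

print(sales)
```

If the same data were stored at a monthly grain, there would be no previous-day row to compare against, so this KPI could not be calculated.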

### Examples of KPIs
*   **Revenue:** Total Sales Amount.
*   **Profit Margin:** (Revenue - Cost) / Revenue.
*   **Customer Retention Rate:** % of customers from one period who purchase again in a later period.

```python
# Simulating KPI Calculation from a Fact Table
data = {
    'Date': ['Jan-01', 'Jan-02', 'Jan-03'],
    'Revenue': [1000, 1500, 1200],
    'Cost': [800, 900, 850]
}
df_kpi = pd.DataFrame(data)

# KPI 1: Profit
df_kpi['Profit'] = df_kpi['Revenue'] - df_kpi['Cost']

# KPI 2: Profit Margin %, expressed as a percentage (e.g. 20.0 means 20%)
df_kpi['Margin_Percent'] = (df_kpi['Profit'] / df_kpi['Revenue'] * 100).round(2)

print("--- KPI Report ---")
print(df_kpi)
```
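
Customer Retention Rate is not covered by the simulation above, so here is a minimal sketch of it. It assumes one common definition (the share of one month's customers who buy again the following month), and the customer IDs are made up. Note that this KPI needs a customer identifier on each fact row, which is another grain decision.

```python
# Hypothetical customer IDs seen in two consecutive months.
customers_jan = {'C1', 'C2', 'C3', 'C4'}
customers_feb = {'C2', 'C4', 'C5'}

# Customers who bought in January and came back in February.
returning = customers_jan & customers_feb

# Assumed definition: returning customers / customers in the earlier month.
retention_rate = len(returning) / len(customers_jan) * 100

print(f"Retention rate: {retention_rate:.1f}%")
```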

---

## 3. Where does Data Live? Facts vs. Dimensions

This is a preview of the next big topic, but it is essential for understanding Attributes and Measures.

### Fact Tables
*   **Content:** Contains **Measures** (Numbers) and Foreign Keys.
*   **Characteristics:** Long and narrow (Millions of rows, few columns).
*   **Examples:** `Sales_Fact`, `Inventory_Fact`.

### Dimension Tables
*   **Content:** Contains **Attributes** (Context/Description).
*   **Characteristics:** Short and wide (Few rows, many descriptive columns).
*   **Examples:** `Customer_Dim` (Name, Address, Age), `Product_Dim` (Name, Category, Color).

**The Golden Rule:**
> Measures sit in Fact Tables (99% of the time).
> Attributes sit in Dimension Tables.
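
To illustrate the rule, here is a small sketch in which a fact table holds the measures and a foreign key, and a dimension table holds the attributes. The `Product_Key` column and all values are assumptions for illustration. A report joins the two and sums the measures by an attribute.

```python
import pandas as pd

# Dimension table: descriptive attributes, one row per product.
product_dim = pd.DataFrame({
    'Product_Key': [1, 2],
    'Name': ['Red Pen', 'Pencil'],
    'Category': ['Pens', 'Pencils'],
})

# Fact table: a foreign key plus numeric measures, one row per sale line.
sales_fact = pd.DataFrame({
    'Product_Key': [1, 2, 1, 1],
    'Qty': [1, 2, 5, 3],
    'Sales_Amount': [2.0, 1.0, 10.0, 6.0],
})

# Join the fact to the dimension, then sum the measures by an attribute.
report = (
    sales_fact.merge(product_dim, on='Product_Key', how='left')
              .groupby('Category', as_index=False)[['Qty', 'Sales_Amount']].sum()
)

print(report)
```

The attributes (`Category`) decide how rows are grouped and labelled, while the measures (`Qty`, `Sales_Amount`) are what gets summed.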

### Interview Question: Is Price a Measure or Attribute?
*   **Scenario:** You have a product "Red Pen" with a price of $2.00.
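*   **Answer:** The product's standard Unit Price is usually an **Attribute** in the Product Dimension because it describes the product, and summing it across rows is meaningless.
*   **Exception:** When a transaction happens, `Extended_Sales_Amount` (Price * Qty) is recorded as a **Measure** in the Fact table, because it is a number you add up across rows.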
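
The answer can be sketched as follows: `Unit_Price` is looked up from the Product Dimension, and `Extended_Sales_Amount` is calculated for each fact row. The `Order_Id` and `Product_Key` columns and the quantities are assumptions for illustration.

```python
import pandas as pd

# Dimension: Unit_Price describes the product, so it lives here as an attribute.
product_dim = pd.DataFrame({
    'Product_Key': [1],
    'Name': ['Red Pen'],
    'Unit_Price': [2.00],
})

# Incoming order lines (hypothetical): what was bought and how many.
order_lines = pd.DataFrame({
    'Order_Id': [1001, 1002],
    'Product_Key': [1, 1],
    'Qty': [3, 10],
})

# Look up the price, then compute the measure stored on each fact row.
sales_fact = order_lines.merge(
    product_dim[['Product_Key', 'Unit_Price']], on='Product_Key', how='left'
)
sales_fact['Extended_Sales_Amount'] = sales_fact['Qty'] * sales_fact['Unit_Price']

print(sales_fact)
print("Total sales:", sales_fact['Extended_Sales_Amount'].sum())
```

Summing `Extended_Sales_Amount` gives a meaningful total, whereas summing `Unit_Price` across the same rows would not.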

---

## 4. Summary

| Concept | Definition | Importance |
| :--- | :--- | :--- |
| **Grain** | The level of detail of a single row. | Determines what questions the DW can answer. |
| **KPI** | Business metric (e.g., Profit %). | Defines the requirements for the DW design. |
| **Atomic Grain** | Lowest level of detail. | Most flexible, highest storage cost. |
| **Aggregated Grain** | Summarized data. | Less flexible, faster queries. |

---

## 5. Next Steps
In the next session, we will finally put everything together and explore the architectural patterns: **Dimensions, Facts, and the Star Schema**.