# Data Warehousing - Part 9: Advanced Dimension Types

## 1. Introduction
So far, we've dealt with standard dimensions like Product, Customer, and Store. However, real-world data modeling often presents scenarios where standard dimensions don't fit perfectly.

In this session, we will explore:
1.  **Junk Dimensions:** Handling low-cardinality flags and indicators.
2.  **Role-Playing Dimensions:** Using a single dimension for multiple purposes.
3.  **Degenerate Dimensions:** Attributes that live in the Fact table.
4.  **Slowly Changing Dimensions (SCD):** Introduction to handling history.

---

## 2. Junk Dimensions
A **Junk Dimension** is a collection of "junk" attributes, usually low-cardinality flags (Yes/No, True/False) or status indicators, combined into a single dimension table.

### The Problem: Too Many Dimensions
Imagine an Order Fact table with 10 flag columns: `Is_Shipped`, `Is_Packed`, `Is_Received`, `Is_Returned`, `Is_Gift_Wrapped`, etc.
Creating 10 separate dimension tables for these simple flags is inefficient and clutters the model.

### The Solution: Combine Them
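We create *one* dimension that contains all possible combinations of these flags. With *n* two-valued flags, that is at most 2^n rows: 8 rows for the 3 flags simulated below, and 1,024 rows for 10 flags.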

### Python Simulation: Creating a Junk Dimension

```python
import pandas as pd
import itertools

# Define the possible values for our flags
shipped_options = ['Yes', 'No']
packed_options = ['Yes', 'No']
returned_options = ['Yes', 'No']

# Generate all possible combinations (Cartesian Product)
combinations = list(itertools.product(shipped_options, packed_options, returned_options))

# Create the Junk Dimension DataFrame
dim_order_status_junk = pd.DataFrame(combinations, columns=['Is_Shipped', 'Is_Packed', 'Is_Returned'])

# Assign a Surrogate Key
dim_order_status_junk['Order_Status_Key'] = dim_order_status_junk.index + 1

print("--- Junk Dimension: Order Status ---")
print(f"Total Rows: {len(dim_order_status_junk)}")
display(dim_order_status_junk)

# Now, the Fact table only needs ONE Foreign Key column: Order_Status_Key
```
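To make the last comment concrete, here is a minimal sketch of how a load step might swap the raw flags for the single `Order_Status_Key`. The `staging_orders` table and the `fact_orders_junk` name are hypothetical, made up for illustration; the junk dimension is rebuilt so the cell runs on its own.

```python
import itertools
import pandas as pd

# Rebuild the junk dimension from the cell above so this sketch is self-contained
flag_columns = ['Is_Shipped', 'Is_Packed', 'Is_Returned']
dim_order_status_junk = pd.DataFrame(
    list(itertools.product(['Yes', 'No'], repeat=len(flag_columns))),
    columns=flag_columns,
)
dim_order_status_junk['Order_Status_Key'] = dim_order_status_junk.index + 1

# Hypothetical staging rows that still carry the raw flags (assumed data)
staging_orders = pd.DataFrame({
    'Order_ID': [1001, 1002, 1003],
    'Is_Shipped': ['Yes', 'No', 'Yes'],
    'Is_Packed': ['Yes', 'Yes', 'Yes'],
    'Is_Returned': ['No', 'No', 'Yes'],
})

# Look up the surrogate key by matching on all flag columns at once,
# then drop the flags so the fact table keeps only the foreign key
fact_orders_junk = (
    staging_orders
    .merge(dim_order_status_junk, on=flag_columns, how='left', validate='many_to_one')
    .drop(columns=flag_columns)
)

print(fact_orders_junk)
```

The `validate='many_to_one'` argument makes pandas raise an error if the junk dimension ever contains a duplicate flag combination, which would otherwise silently multiply fact rows.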

---

## 3. Role-Playing Dimensions
A **Role-Playing Dimension** is a single dimension table that is used multiple times in the same Fact table, but for different purposes (roles).

### The Classic Example: Date Dimension
In an Order Fact table, we often have:
*   `Order_Date`
*   `Ship_Date`
*   `Delivery_Date`

We **do not** create three separate physical tables (`Dim_Order_Date`, `Dim_Ship_Date`, `Dim_Delivery_Date`). Instead, we use **one** `Dim_Date` table and join to it three times, once per role, using table aliases or views.

### Python Simulation: Role-Playing

```python
import pandas as pd

# 1. The Physical Dimension Table
dim_date = pd.DataFrame({
    'Date_Key': [20230101, 20230102, 20230103, 20230105],
    'Full_Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-05'],
    'Day_Of_Week': ['Sunday', 'Monday', 'Tuesday', 'Thursday']
})

# 2. The Fact Table (Has multiple FKs pointing to Date)
fact_orders = pd.DataFrame({
    'Order_ID': [1001],
    'Order_Date_Key': [20230101],
    'Ship_Date_Key': [20230103]
})

print("--- Fact Table ---")
display(fact_orders)

# 3. Simulating the Join (Role Playing)
# Join 1: For Order Date
report = pd.merge(fact_orders, dim_date, left_on='Order_Date_Key', right_on='Date_Key')
report = report.rename(columns={'Full_Date': 'Order_Date_Actual'})

# Join 2: For Ship Date (using the same dim_date table).
# The suffixes disambiguate Date_Key and Day_Of_Week, which now exist on both sides.
report = pd.merge(report, dim_date, left_on='Ship_Date_Key', right_on='Date_Key', suffixes=('_Order', '_Ship'))
report = report.rename(columns={'Full_Date': 'Ship_Date_Actual'})

# Final View
final_view = report[['Order_ID', 'Order_Date_Actual', 'Ship_Date_Actual']]
print("\n--- Report with Role-Playing Dates ---")
display(final_view)
```
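The "aliases (views)" idea can also be sketched more directly: give each role its own prefixed view of the same physical table, so the column names never collide. This is one possible approach rather than the only one. The `role_view` helper, the `fact_orders_roles` table, and the added `Delivery_Date_Key` column are assumptions introduced for illustration.

```python
import pandas as pd

# The single physical date dimension (same rows as in the cell above)
dim_date = pd.DataFrame({
    'Date_Key': [20230101, 20230102, 20230103, 20230105],
    'Full_Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-05'],
    'Day_Of_Week': ['Sunday', 'Monday', 'Tuesday', 'Thursday'],
})

# Hypothetical fact row with three date roles (Delivery_Date_Key is assumed)
fact_orders_roles = pd.DataFrame({
    'Order_ID': [1001],
    'Order_Date_Key': [20230101],
    'Ship_Date_Key': [20230103],
    'Delivery_Date_Key': [20230105],
})

def role_view(dim, role):
    """Return a copy of `dim` with every column prefixed by the role name."""
    return dim.add_prefix(f'{role}_')

report_roles = fact_orders_roles
for role in ['Order', 'Ship', 'Delivery']:
    view = role_view(dim_date, role)
    # After prefixing, the view's key is named e.g. 'Order_Date_Key',
    # which matches the fact table's foreign key column for that role
    report_roles = report_roles.merge(view, on=f'{role}_Date_Key', how='left')

print(report_roles[['Order_ID', 'Order_Full_Date', 'Ship_Full_Date', 'Delivery_Full_Date']])
```

Because every view column carries its role prefix, no `suffixes` or `rename` calls are needed, and adding a fourth role is a one-word change to the list.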

---

## 4. Degenerate Dimensions
A **Degenerate Dimension** is a dimension attribute that is stored **directly in the Fact table**, rather than in a separate dimension table.

### Characteristics
*   It has no attributes of its own (no descriptive context).
*   It is usually a transaction identifier.

### Example: `Order_ID` or `Invoice_Number`
*   `Order_ID` is unique to the transaction.
*   If we created a separate `Dim_Order` table, it would contain nothing but the ID and would have 1 row for every 1 row in the Fact table (1:1 relationship). The extra join provides no compression or performance benefit.
*   **Decision:** We keep `Order_ID` inside the Fact table.

### Other Examples
*   `Bill_Of_Lading_Number`
*   `Ticket_Number`
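A short sketch shows why the identifier is still useful without its own table. It assumes a fact table at line-item grain, where one order spans several rows; the table name, keys, and amounts are made up for illustration.

```python
import pandas as pd

# Hypothetical line-item fact table: Order_ID sits next to the
# foreign keys and measures, with no Dim_Order table behind it
fact_order_lines = pd.DataFrame({
    'Order_ID': [1001, 1001, 1002],
    'Product_Key': [10, 11, 10],
    'Quantity': [2, 1, 5],
    'Sales_Amount': [40.0, 15.0, 100.0],
})

# The degenerate dimension groups line items back into orders
# directly, without any join
order_totals = (
    fact_order_lines
    .groupby('Order_ID', as_index=False)
    .agg(Line_Count=('Product_Key', 'size'), Total_Sales=('Sales_Amount', 'sum'))
)

print(order_totals)
```

Here `Order_ID` acts as a grouping key and as a way to trace a row back to the source system, which is typically all a degenerate dimension is used for.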

---

## 5. Introduction to SCD (Slowly Changing Dimensions)
**Slowly Changing Dimensions (SCDs)** describe how a dimension handles attribute values that change over time, which is crucial for preserving history.

*   **Scenario:** A customer moves from New York to California.
*   **The Question:** Do we update their address? If we do, all their *past* orders (made when they lived in NY) will now look like they were made in CA. This falsifies history.

### The Types of SCD
1.  **Type 1 (Overwrite):** Update the record in place. No history is kept, so the old value is lost.
2.  **Type 2 (Add Row):** Keep full history. Insert a new row (with a new surrogate key) for the customer with the new address and mark the old row as inactive.
3.  **Type 3 (Add Column):** Keep limited history. Add a `Previous_Address` column that holds the prior value alongside the current one.
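As a preview, the three types can be sketched on a one-row customer dimension. This is a minimal illustration under assumptions, not a full implementation: the column names (`Customer_Key`, `Customer_ID`, `State`, `Is_Current`, `Previous_State`) are hypothetical, and a state code stands in for the full address.

```python
import pandas as pd

# Hypothetical customer dimension before the move (column names are assumptions)
dim_customer = pd.DataFrame({
    'Customer_Key': [1],       # surrogate key
    'Customer_ID': ['C001'],   # business key from the source system
    'State': ['NY'],
    'Is_Current': [True],
})

# Type 1: overwrite in place, so the NY value is gone
type1 = dim_customer.copy()
type1.loc[type1['Customer_ID'] == 'C001', 'State'] = 'CA'

# Type 2: mark the old row inactive and append a new row with a new surrogate key
type2 = dim_customer.copy()
is_current_row = (type2['Customer_ID'] == 'C001') & type2['Is_Current']
type2.loc[is_current_row, 'Is_Current'] = False
new_row = pd.DataFrame({
    'Customer_Key': [type2['Customer_Key'].max() + 1],
    'Customer_ID': ['C001'],
    'State': ['CA'],
    'Is_Current': [True],
})
type2 = pd.concat([type2, new_row], ignore_index=True)

# Type 3: copy the old value into a "previous" column, then overwrite the current one
type3 = dim_customer.drop(columns='Is_Current')
moved = type3['Customer_ID'] == 'C001'
type3['Previous_State'] = None
type3.loc[moved, 'Previous_State'] = type3.loc[moved, 'State']
type3.loc[moved, 'State'] = 'CA'

print(type1, type2, type3, sep='\n\n')
```

With Type 2, old orders keep pointing at `Customer_Key` 1 (NY) while new orders use key 2 (CA), which is how the "falsified history" problem is avoided.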

We will explore these in depth with coding examples in the next notebook.

---

## 6. Summary Table

| Dimension Type | Description | Best Use Case |
| :--- | :--- | :--- |
| **Standard** | Normal descriptive table | Customer, Product, Store |
| **Junk** | Combination of low-cardinality flags | Status codes, Yes/No flags |
| **Role-Playing** | One table used multiple times | Date (Order, Ship, Delivery), Employee (Buyer, Seller) |
| **Degenerate** | Attribute stored in Fact table | Order ID, Invoice Number, Ticket ID |

---

## 7. Next Steps
In the next session, we will take a deep dive into **Slowly Changing Dimensions (SCD Types 1, 2, and 3)** and learn how to implement historical tracking in our Data Warehouse.