# Data Warehousing - Part 8: Conformed Dimensions & The Bus Matrix

## 1. The Problem: Siloed Data
In the previous demo, we built a Star Schema for **Sales (Orders)**. But a real company doesn't just sell things; it also manages inventory, pays employees, and buys from vendors.

If we build a separate Data Warehouse for Sales and a completely separate one for Inventory, we create **Data Silos**.
*   *Sales Team says:* "We sold 50 units of Product X."
*   *Inventory Team says:* "We don't have 'Product X', we call it 'Item X-Ray'."
*   **Result:** You cannot compare Sales vs. Inventory because the "Product" definition doesn't match.
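A minimal sketch of the silo, using hypothetical tables: Sales records the item as "Product X" and Inventory as "Item X-Ray". The table and column names, and the 30-unit stock figure, are invented for illustration. Joining the two systems on their own labels finds nothing to compare.

```python
import pandas as pd

# Hypothetical siloed tables: each team labels the same physical product differently.
sales_silo = pd.DataFrame({"Product": ["Product X"], "Quantity_Sold": [50]})
inventory_silo = pd.DataFrame({"Item": ["Item X-Ray"], "Quantity_On_Hand": [30]})  # 30 is made up

# Try to line the two systems up on their own labels.
comparison = pd.merge(
    sales_silo, inventory_silo, left_on="Product", right_on="Item", how="inner"
)
print(len(comparison))  # 0: no row matches, so Sales vs. Inventory cannot be compared
```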

## 2. The Solution: Conformed Dimensions
A **Conformed Dimension** is a dimension that has the **same meaning and content** when referred to by different Fact tables.

*   **Write Once, Read Everywhere:** You create the `Product Dimension` table **once**.
*   **Link Multiple Facts:** Both the `Sales_Fact` and `Inventory_Fact` tables point to this same `Product Dimension`.
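One way to build the shared dimension, sketched under assumptions: a hypothetical `key_map` table records which source-system label corresponds to which shared surrogate `Product_Key`, and a hypothetical `conform` helper swaps each source label for that key before the fact rows are loaded. Once both facts carry the same key, they can be compared through the single `Product` dimension.

```python
import pandas as pd

# The conformed Product dimension, built once for the whole company.
dim_product = pd.DataFrame({"Product_Key": [1], "Product_Name": ["Product X"]})

# Hypothetical key map: each source system's own label -> shared surrogate key.
key_map = pd.DataFrame({
    "Source_System": ["Sales", "Inventory"],
    "Source_Label": ["Product X", "Item X-Ray"],
    "Product_Key": [1, 1],
})

sales_src = pd.DataFrame({"Source_Label": ["Product X"], "Quantity_Sold": [50]})
inventory_src = pd.DataFrame({"Source_Label": ["Item X-Ray"], "Quantity_On_Hand": [30]})


def conform(src: pd.DataFrame, system: str) -> pd.DataFrame:
    """Hypothetical helper: replace a source-specific label with the shared Product_Key."""
    mapping = key_map.loc[key_map["Source_System"] == system, ["Source_Label", "Product_Key"]]
    # validate="many_to_one" raises MergeError if a label appears twice in the map
    out = src.merge(mapping, on="Source_Label", how="left", validate="many_to_one")
    if out["Product_Key"].isna().any():
        raise ValueError(f"{system}: label(s) missing from the key map")
    return out.drop(columns="Source_Label")


sales_fact = conform(sales_src, "Sales")
inventory_fact = conform(inventory_src, "Inventory")

# Both facts now point at the same dimension row and can be compared.
combined = (
    sales_fact.merge(inventory_fact, on="Product_Key")
    .merge(dim_product, on="Product_Key")
)
print(combined[["Product_Name", "Quantity_Sold", "Quantity_On_Hand"]])
```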

### Transcript Scenario
The transcript describes two processes:
1.  **Order Fact (Sales)**
2.  **Inventory Fact**

They share common dimensions: **Product, Staff, Time, Location**.
They have specific dimensions: **Customer** (Sales only), **Vendor** (Inventory only).
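The split between shared and process-specific dimensions is just a set intersection. The sketch below encodes the two facts from the scenario (the dictionary layout is an assumption for illustration); the intersection gives the dimensions that must be conformed, and whatever is left over belongs to one process only.

```python
# Dimensions attached to each fact table in the transcript scenario.
fact_dimensions = {
    "Order_Fact": {"Product", "Staff", "Time", "Location", "Customer"},
    "Inventory_Fact": {"Product", "Staff", "Time", "Location", "Vendor"},
}

# Dimensions used by every fact: these must be conformed.
shared = set.intersection(*fact_dimensions.values())

# Whatever remains is specific to a single business process.
specific = {fact: dims - shared for fact, dims in fact_dimensions.items()}

print("Must be conformed:", sorted(shared))
print("Process-specific:", specific)
```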

### Python Simulation: Drill-Across Query
Let's simulate two different business processes sharing a Conformed Dimension to answer a complex question: *"Did we oversell products that we didn't have in stock?"*

```python
import pandas as pd
from IPython.display import display  # built into Jupyter; explicit import for plain IPython

# --- 1. The Conformed Dimension (Shared) ---
# This table is the "Source of Truth" for Products across the entire company.
dim_product = pd.DataFrame({
    'Product_Key': [1, 2, 3],
    'Product_Name': ['Red Pen', 'Blue Pen', 'Notebook'],
    'Category': ['Stationery', 'Stationery', 'Paper']
})

print("--- Conformed Dimension: Product ---")
display(dim_product)

# --- 2. Fact Table 1: Orders (Sales Subject Area) ---
# Links to Product_Key
fact_orders = pd.DataFrame({
    'Date_Key': [20230101, 20230101],
    'Product_Key': [1, 2], # Sold Red Pen and Blue Pen
    'Quantity_Sold': [100, 50]
})

# --- 3. Fact Table 2: Inventory (Inventory Subject Area) ---
# Links to Product_Key
fact_inventory = pd.DataFrame({
    'Date_Key': [20230101, 20230101, 20230101],
    'Product_Key': [1, 2, 3], # Inventory for all items
    'Quantity_On_Hand': [80, 100, 200]
})

print("\n--- Fact 1: Orders ---")
display(fact_orders)
print("\n--- Fact 2: Inventory ---")
display(fact_inventory)
```

### Performing the Analysis
Because `Product_Key` is conformed (shared), we can join these two separate business processes easily.

```python
import numpy as np

# Attach product attributes to each fact (the same conformed dimension on both sides)
sales_analysis = pd.merge(fact_orders, dim_product, on='Product_Key')
inv_analysis = pd.merge(fact_inventory, dim_product, on='Product_Key')

# --- DRILL-ACROSS REPORT ---
# Compare Sales vs. Inventory by joining on the conformed keys:
# the same day (Date_Key) AND the same product (Product_Key).
# Category appears on both sides, so it is the column that receives the suffixes.
report = pd.merge(
    sales_analysis,
    inv_analysis,
    on=['Date_Key', 'Product_Key', 'Product_Name'],
    suffixes=('_Sold', '_InStock'),
)

# A metric spanning two departments (vectorised instead of a row-wise apply)
report['Stock_Status'] = np.where(
    report['Quantity_Sold'] > report['Quantity_On_Hand'], 'Oversold', 'OK'
)

cols = ['Product_Name', 'Quantity_Sold', 'Quantity_On_Hand', 'Stock_Status']
print("\n--- Integrated Business Report (Sales + Inventory) ---")
display(report[cols])
```
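Joining fact rows directly to each other is only safe here because each fact holds exactly one row per product per day. A minimal sketch with hypothetical data (Red Pen sold in two separate orders of 60 and 40) shows what happens otherwise: the single inventory row is repeated once per order, and any total of the stock figure is inflated. Summarising each fact to the shared grain first, then joining the two answer sets, avoids this. The names `orders`, `inventory`, `naive`, and `safe` are illustrative only.

```python
import pandas as pd

# Hypothetical: Red Pen was sold in two separate orders on the same day.
orders = pd.DataFrame({
    "Date_Key": [20230101, 20230101],
    "Product_Key": [1, 1],
    "Quantity_Sold": [60, 40],
})
inventory = pd.DataFrame({
    "Date_Key": [20230101],
    "Product_Key": [1],
    "Quantity_On_Hand": [80],
})

keys = ["Date_Key", "Product_Key"]

# Row-level fact-to-fact join: the one inventory row is duplicated per order.
naive = orders.merge(inventory, on=keys)
print(naive["Quantity_On_Hand"].sum())  # 160: stock is double-counted

# Drill-across: query each fact separately at the shared grain, then join.
sold = orders.groupby(keys, as_index=False)["Quantity_Sold"].sum()
on_hand = inventory.groupby(keys, as_index=False)["Quantity_On_Hand"].sum()
safe = sold.merge(on_hand, on=keys)
print(safe[["Quantity_Sold", "Quantity_On_Hand"]].iloc[0].tolist())  # [100, 80]
```

Keeping `Date_Key` in the grouping matters: an on-hand balance is a snapshot, so it is summed only within a day, never across days.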

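*Observation: 'Red Pen' shows 'Oversold' (sold 100, only 80 on hand). 'Notebook' does not appear at all, because the default inner join keeps only products present in both facts. This cross-process insight is only possible because the Product Dimension is conformed.*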
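To surface the products that the inner join silently drops, a sketch using pandas' `indicator=True` option on an outer merge: the added `_merge` column marks each row as `left_only`, `right_only`, or `both`. The data below repeats the toy tables from the simulation.

```python
import pandas as pd

# Same toy data as the simulation above.
dim_product = pd.DataFrame({
    "Product_Key": [1, 2, 3],
    "Product_Name": ["Red Pen", "Blue Pen", "Notebook"],
    "Category": ["Stationery", "Stationery", "Paper"],
})
fact_orders = pd.DataFrame({
    "Date_Key": [20230101, 20230101],
    "Product_Key": [1, 2],
    "Quantity_Sold": [100, 50],
})
fact_inventory = pd.DataFrame({
    "Date_Key": [20230101, 20230101, 20230101],
    "Product_Key": [1, 2, 3],
    "Quantity_On_Hand": [80, 100, 200],
})

keys = ["Date_Key", "Product_Key"]

# Outer join keeps every row from both facts; '_merge' records where it came from.
coverage = fact_orders.merge(fact_inventory, on=keys, how="outer", indicator=True)
coverage = coverage.merge(dim_product, on="Product_Key", how="left")

# Products stocked but not sold on that date (the ones the inner join dropped).
stocked_not_sold = coverage.loc[coverage["_merge"] == "right_only", "Product_Name"].tolist()
print(stocked_not_sold)  # ['Notebook']
```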

---

## 3. The Data Warehouse Bus Matrix
To plan which dimensions should be conformed, architects use a tool called the **Bus Matrix**. It is a grid showing the intersection of Business Processes (Rows) and Dimensions (Columns).

Let's visualize the Bus Matrix described in the video.

```python
# Simulating a Bus Matrix
bus_matrix_data = {
    'Business Process (Fact)': ['Order / Sales', 'Inventory', 'Procurement', 'HR / Payroll'],
    'Date_Dim': ['X', 'X', 'X', 'X'],      # Shared by all
    'Product_Dim': ['X', 'X', 'X', ''],    # Shared by Sales, Inv, Proc
    'Employee_Dim': ['X', 'X', 'X', 'X'],  # Shared by all
    'Customer_Dim': ['X', '', '', ''],     # Specific to Sales
    'Vendor_Dim': ['', 'X', 'X', '']       # Specific to Inv, Proc
}

df_bus_matrix = pd.DataFrame(bus_matrix_data)
# Set Index for better visualization
df_bus_matrix.set_index('Business Process (Fact)', inplace=True)

print("--- Data Warehouse Bus Matrix ---")
print("X = Dimension is used by this Fact")
display(df_bus_matrix)
```

### Key Takeaways from the Matrix:
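1.  **Date** and **Employee** are used by every business process, so they are the most widely conformed dimensions. Designing them correctly is critical.
2.  **Product** is shared by Sales, Inventory, and Procurement, but not by HR / Payroll.
3.  **Vendor** is shared by Inventory and Procurement, so it must be conformed as well.
4.  **Customer** is used only by Sales.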
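The takeaways above can be read off the matrix mechanically. A minimal sketch, rebuilding the same grid under the illustrative name `bus_matrix`: count the `X` marks per column, and treat any dimension used by two or more processes as one that needs to be conformed.

```python
import pandas as pd

# Same grid as the Bus Matrix above: rows = business processes, columns = dimensions.
bus_matrix = pd.DataFrame(
    {
        "Date_Dim": ["X", "X", "X", "X"],
        "Product_Dim": ["X", "X", "X", ""],
        "Employee_Dim": ["X", "X", "X", "X"],
        "Customer_Dim": ["X", "", "", ""],
        "Vendor_Dim": ["", "X", "X", ""],
    },
    index=["Order / Sales", "Inventory", "Procurement", "HR / Payroll"],
)

# Number of business processes that use each dimension.
usage = bus_matrix.eq("X").sum().sort_values(ascending=False, kind="stable")
print(usage)

# A dimension shared by two or more processes needs to be conformed.
conformed = usage[usage >= 2].index.tolist()
print("Must be conformed:", conformed)
```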

---

## 4. Summary

| Concept | Definition |
| :--- | :--- |
| **Conformed Dimension** | A dimension used by more than one Fact table with the same meaning. |
| **Drill Across** | Combining metrics from two or more fact tables (e.g., Sales and Inventory) by querying each one and joining the results on conformed dimensions. |
| **Bus Matrix** | A planning tool to identify which dimensions are shared across business processes. |

---

## 5. Next Steps
We have covered the logical design. In the next lectures, we will dive deeper into specific types of Dimensions and Facts, and then move into the final implementation details.