# Data Warehousing Architectures – Advanced Options


So far we’ve seen **centralized warehouses** and **data marts**. 
Organizations also have more advanced architectural choices.



## Centralized vs. Component-Based

- **Centralized Warehouse (EDW)**  
  - The usual default choice.  
  - Provides *one-stop shopping* for analytics.  
  - Requires strong **data governance** and cross-organizational cooperation.  
  - Small changes can ripple across the whole system.  
  - Platforms: relational databases (Oracle, SQL Server, Db2), data warehouse appliances, or even **data lakes** built on Hadoop or S3.  

- **Component-Based Warehouse**  
  - Multiple data environments (warehouses + marts) operating together.  
  - **Advantages**: Isolation of changes, mix-and-match technology, flexibility.  
  - **Challenges**: Risk of inconsistent data, complex cross-integration.  



## Enterprise Data Warehouse (EDW)

- Centralized warehouse serving most or all enterprise needs.  
- May use relational DBs, specialized appliances, or big data platforms.  
- Can overlap with data lakes for scalability and flexibility.  



## Architected Component Approaches

### Dependent Data Marts (Classic CIF Model)
- **Corporate Information Factory (CIF)** – proposed by *Bill Inmon*.  
- Combines an EDW with dependent marts.  
- Strict rules for architecture and access.  
- Rarely used today but historically significant.  

### Front-End Data Marts
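- Data marts sit **before** the warehouse: sources feed the marts, and the marts feed the warehouse.  
- Example: separate SAP systems (SAP NA and SAP International) each feed their own mart, and part of the data in each mart flows on into the warehouse.  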
- Useful when regional systems need their own primary analytics.  

### Data Mart-Only Bus (Kimball Approach)
- Proposed by *Ralph Kimball*.  
- Based on **conformed dimensions** (consistent definitions across marts: customers, products, employees, geographies).  
- Ensures “apples-to-apples” comparisons across marts.  
- Conceptually powerful but challenging to maintain.  
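
As a minimal sketch of why conformed dimensions enable apples-to-apples comparisons, the Python example below uses an in-memory SQLite database. The table names (`dim_customer`, `fact_sales`, `fact_tickets`), the sample rows, and the `totals_by_region` helper are all hypothetical and not part of the bus approach itself. The point is only that two marts sharing one customer dimension can be aggregated separately and then lined up on the same `region` values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One conformed customer dimension, shared by both marts.
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, region TEXT);
    -- Fact table of a hypothetical sales mart.
    CREATE TABLE fact_sales (customer_key INTEGER, amount REAL);
    -- Fact table of a hypothetical support mart.
    CREATE TABLE fact_tickets (customer_key INTEGER, ticket_count INTEGER);

    INSERT INTO dim_customer VALUES (1, 'EMEA'), (2, 'EMEA'), (3, 'APAC');
    INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0), (3, 200.0);
    INSERT INTO fact_tickets VALUES (1, 2), (3, 1), (3, 4);
""")


def totals_by_region(fact_table, measure):
    """Aggregate one mart's measure by the shared region attribute."""
    # Table and column names are fixed by this script, never user input.
    sql = (
        f"SELECT d.region, SUM(f.{measure}) "
        f"FROM {fact_table} AS f "
        "JOIN dim_customer AS d USING (customer_key) "
        "GROUP BY d.region"
    )
    return dict(conn.execute(sql).fetchall())


sales = totals_by_region("fact_sales", "amount")
tickets = totals_by_region("fact_tickets", "ticket_count")

# Drill across: line up both marts on the conformed region values.
drill_across = {
    region: (sales.get(region, 0.0), tickets.get(region, 0))
    for region in sorted(set(sales) | set(tickets))
}
print(drill_across)  # {'APAC': (200.0, 5), 'EMEA': (150.0, 2)}
```

If each mart kept its own customer table with its own region codes, the final step would need a mapping between the two, which is the maintenance burden the conformed dimension avoids.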



## Non-Architected / Federated Approach

- Collection of **independent data marts**.  
- No strict integration, often inconsistent.  
- Popular in the late 1990s as a fallback when centralized systems failed.  
- **Today**: Considered a *last resort*. Data lakes and modern warehouses are better alternatives.  



# Multi-Dimensional Databases (Cubes)

- Specialized databases optimized for **dimensional data**.  
- Not relational; inherently aware of measures and dimensions.  
- **Advantages**: Very fast query performance; ideal for smaller-scale warehouses and marts (under 100 GB).  
- **Disadvantages**: Less flexible, difficult structural changes, vendor lock-in.  
- **Modern Use**: Often combined with relational DBs. Example:  
  - Warehouse on RDBMS → downstream marts built on cubes.  
  - Mix-and-match architectures are common.  
- BI tools (Tableau, Power BI) typically abstract the architecture from end-users.  



# Operational Data Store (ODS)

- **Definition**: Like a warehouse, an ODS integrates data from multiple sources.  
- **Focus**: *Current operational data*, not historical.  
- **Key Traits**:  
  - Often near real-time data feeds from source systems.  
  - Supports operational BI: “Tell me what is happening right now.”  
  - Popular in the late 1990s and 2000s as a complement to warehouses.  

### ODS + Data Warehouse Coexistence
- **Parallel Feeds**: Sources feed both ODS and DW independently.  
- **Staged Feed**: Sources → ODS → DW (ODS acts as staging).  
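
The two feed topologies can be sketched in a few lines of Python. Everything here is a simplifying assumption: `load_ods` and `load_dw` are hypothetical helpers, the ODS is modeled as a dict that keeps only the latest row per key, the warehouse as an append-only list, and the staged warehouse is loaded once, after all changes have reached the ODS.

```python
def load_ods(ods, rows):
    """Keep only the latest row per key: the ODS holds current state."""
    for row in rows:
        ods[row["id"]] = dict(row)


def load_dw(dw, rows):
    """Append every row it is given: the warehouse accumulates history."""
    dw.extend(dict(row) for row in rows)


# Hypothetical change events from one source system, oldest first.
source_rows = [
    {"id": 1, "status": "placed"},
    {"id": 2, "status": "placed"},
    {"id": 1, "status": "shipped"},
]

# Parallel feeds: the sources load the ODS and the warehouse independently.
ods_parallel, dw_parallel = {}, []
load_ods(ods_parallel, source_rows)
load_dw(dw_parallel, source_rows)

# Staged feed: the sources load the ODS, and the warehouse loads from the ODS.
ods_staged, dw_staged = {}, []
load_ods(ods_staged, source_rows)
load_dw(dw_staged, ods_staged.values())

print(len(dw_parallel))  # 3: every change reached the warehouse
print(len(dw_staged))    # 2: only the current state per key
```

In this sketch the staged warehouse receives only what the ODS holds at load time, so the earlier `placed` state of order 1 never reaches it. How much that matters depends on how often the warehouse is loaded from the ODS.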

### Current Relevance
- Less common today due to:  
  - Reduced latency in modern warehouses.  
  - Real-time ingestion capabilities of big data platforms.  
- Still used in mission-critical operations.  
- Sometimes embedded as a component of **data lakes**.  



## Big Picture

Modern environments may combine:  
- **Data Warehouses**  
- **Data Lakes**  
- **Data Virtualization**  
- **Operational Data Stores**  
- **Business Intelligence Tools**  

Together, these support both **strategic and operational analytics** across the organization.  
