
## What is a Data Warehouse?

A **Data Warehouse** is a centralized system used for storing, organizing, and analyzing data collected from multiple source systems.

The classical and most widely accepted definition comes from **Bill Inmon**, often referred to as the *father of data warehousing*.

> A data warehouse is a **subject‑oriented, integrated, time‑variant, and non‑volatile** collection of data that supports management’s decision‑making process.



### Reporting Without a Data Warehouse (Manual & Fragmented)

![Manual Reporting Process](../images/manual_process.png)

In this scenario:
- Each analyst independently collects and transforms data
- Reports are generated using different tools (Excel, PowerPoint, BI tools)
- Data freshness and accuracy vary across reports

This approach does **not scale** and is error‑prone.



### Key Characteristics of a Data Warehouse

In 1990, **Bill Inmon** defined four essential characteristics, matching the four terms in the definition above:

1. **Integrated**  
   - Data from multiple operational and external systems is combined. Along the way it is:
     - Cleaned
     - Standardized
     - Unified into a consistent format

2. **Subject-Oriented**  
   - Data is organized by subject area, not by source application. Typical subject areas:
     - Sales
     - Customers
     - Finance
     - Products

3. **Time-Variant**  
   - Contains **historical data** (not just current state), enabling trend analysis over time.  
     - Daily snapshots
     - Monthly trends
     - Year‑over‑year comparisons

4. **Non-Volatile**  
   - Data is **stable between refresh cycles**.  
   - New or changed data arrives in periodic batch loads (e.g., nightly).  
   - Once loaded, data is not modified by transactional updates and is **not deleted**.  
   - Changes are recorded mostly by appending new rows, not by overwriting existing ones.

This ensures reporting stability.
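The four characteristics can be illustrated with a small sketch. Everything here is hypothetical and chosen only for illustration: the two source layouts (`crm_rows`, `shop_rows`), the `standardize` helper, and the `CustomerHistory` class. A real warehouse would use database tables, not Python lists.

```python
from datetime import date

# Hypothetical raw records from two source systems with different conventions.
crm_rows = [{"cust_id": "17", "name": " Alice SMITH ", "country": "us"}]
shop_rows = [{"customerId": 17, "fullName": "alice smith", "countryCode": "DE"}]


def standardize(customer_id, name, country):
    """Integrated: clean a customer record and unify it into one format."""
    return {
        "customer_id": int(customer_id),
        "name": " ".join(name.split()).title(),
        "country": country.strip().upper(),
    }


class CustomerHistory:
    """Subject-oriented: one store for the 'customers' subject area."""

    def __init__(self):
        self._rows = []  # append-only list of snapshot rows

    def load_batch(self, snapshot_date, records):
        """Non-volatile: a load only appends; existing rows are never changed."""
        for record in records:
            self._rows.append({"snapshot_date": snapshot_date, **record})

    def as_of(self, snapshot_date):
        """Time-variant: return the rows loaded for a given snapshot date."""
        return [r for r in self._rows if r["snapshot_date"] == snapshot_date]


history = CustomerHistory()

# Day 1: load the customer as seen by the CRM system.
day1 = [standardize(r["cust_id"], r["name"], r["country"]) for r in crm_rows]
history.load_batch(date(2024, 1, 1), day1)

# Day 2: the shop system reports the same customer with a new country.
day2 = [standardize(r["customerId"], r["fullName"], r["countryCode"]) for r in shop_rows]
history.load_batch(date(2024, 1, 2), day2)
```

After both loads, `history.as_of(date(2024, 1, 1))` still returns the day-one row with the original country, because the second load appended a new snapshot and left the earlier one untouched.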



### Visual Definition of a Data Warehouse

![Data Warehouse Definition](../images/dwh.jpg)



## Why Do Companies Need a Data Warehouse?

Without a proper data warehouse, organizations often face:

- Manual data collection
- Multiple inconsistent reports
- Delayed decision‑making
- High risk of human errors
- Poor handling of large data volumes

In such environments:
- Reports may be days or weeks out of sync
- Analysts spend more time cleaning data than analyzing it
- Business decisions are based on outdated or conflicting information



## How Data Flows into a Data Warehouse

- Data is **copied, not moved**, from operational systems.  
- Source systems continue to operate independently.  
- The warehouse is periodically refreshed (commonly once per day).  
- Data is often **restructured and reorganized** to optimize analytical queries.
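The points above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a production pipeline: the `orders` and `daily_sales` tables, their columns, and the `refresh` function are assumptions made up for this example, and two in-memory databases stand in for the operational system and the warehouse.

```python
import sqlite3

# Hypothetical operational system: one row per order.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, order_date TEXT, product TEXT, amount REAL)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01", "widget", 10.0),
        (2, "2024-01-01", "widget", 15.0),
        (3, "2024-01-01", "gadget", 7.5),
        (4, "2024-01-02", "widget", 20.0),
    ],
)
source.commit()

# Hypothetical warehouse: restructured as one row per day and product.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE daily_sales ("
    "sales_date TEXT, product TEXT, order_count INTEGER, total_amount REAL, "
    "PRIMARY KEY (sales_date, product))"
)


def refresh(source_conn, warehouse_conn, sales_date):
    """Copy one day of orders into the warehouse, aggregated for analysis."""
    # Read-only query: the source rows are copied, not moved or deleted.
    rows = source_conn.execute(
        "SELECT order_date, product, COUNT(*), SUM(amount) "
        "FROM orders WHERE order_date = ? "
        "GROUP BY order_date, product",
        (sales_date,),
    ).fetchall()
    # Re-running a refresh for the same day replaces that day's rows
    # instead of duplicating them.
    warehouse_conn.executemany(
        "INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?, ?)", rows
    )
    warehouse_conn.commit()
    return len(rows)


# Simulate two nightly refreshes.
refresh(source, warehouse, "2024-01-01")
refresh(source, warehouse, "2024-01-02")
```

The source database keeps all four orders after both refreshes, while the warehouse holds three pre-aggregated rows that are cheaper to query for daily sales reports.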



## Why Build a Data Warehouse?

Organizations invest time, resources, and money into building a data warehouse for two main reasons:

1. **Data-Driven Decision Making**  
   - Decisions are based on reliable data, not just intuition or experience.  
   - Enables analysis of the past and present, and prediction of the future.  
   - Advanced analytics can also explore the *unknown*, surfacing patterns nobody knew to look for.  

2. **One-Stop Shopping**  
   - All data is consolidated into a single repository.  
   - Eliminates the need to gather scattered data from multiple operational systems.  
   - Analysts can focus on *analysis*, not on repeatedly collecting and integrating data.  



## Data Warehousing and Business Intelligence (BI)

- **Business Intelligence (BI)** and **Data Warehousing** emerged around the same time (~1990).  
- They reinforced each other:  
  - BI popularized data warehousing by providing tools to extract value.  
  - Data warehouses fueled BI with integrated, historical data.  
- Together, they form the backbone of modern data-driven organizations.



## Centralized Data Warehouse‑Driven Reporting

With a data warehouse:
- Data ingestion is automated
- Transformations are standardized
- Reports consume data from a single trusted source

This results in:
- Faster insights (hours instead of weeks)
- Consistent reporting
- Less manual, error‑prone work for analysts



### Data Warehouse Flow

![Automated Data Warehouse Flow](../images/automated.jpg)

Key advantages:
- Automated ETL pipelines
- Integrated data sources
- Consistent historical tracking
- Scalable for large data volumes
