
## ETL Techniques – Extraction

### Extraction Methods
- Pull extraction
- Push extraction

### Extraction Types
- Full extraction
- Incremental extraction
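
The difference between the two extraction types can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a prescribed implementation: the `customers` table, its `updated_at` column, and the `extract_incremental` helper are assumptions made up for the example. A full extraction reads every row, while an incremental extraction reads only rows changed after the last stored watermark.

```python
import sqlite3


def extract_incremental(conn, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Keep the old watermark when nothing changed.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


# Hypothetical source table with ISO-formatted timestamps (they sort as text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "2024-01-01"), (2, "Ben", "2024-01-05"), (3, "Cy", "2024-01-09")],
)

# Full extraction: every row, every run.
full_rows = conn.execute("SELECT id, name, updated_at FROM customers").fetchall()

# Incremental extraction: only rows newer than the stored watermark.
changed_rows, watermark = extract_incremental(conn, "2024-01-03")
```

Storing the returned watermark between runs is what lets the next run skip rows it has already extracted.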

### Extraction Techniques
- Database queries
- File ingestion
- API calls
- Change Data Capture (CDC)
- Event streaming
- Web scraping

### ETL Overview Diagram

![ETL Concepts Overview](../images/etl1.png)


## ETL Techniques – Transformation

### Data Cleansing
- Removing duplicates
- Handling missing values
- Fixing invalid records
- Data type casting
- Trimming unwanted spaces
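
As a rough sketch, the cleansing steps above could be combined in one pass over the records. The field names (`name`, `country`, `amount`), the `"n/a"` placeholder, and the rule that negative amounts are invalid are assumptions chosen for illustration only.

```python
def cleanse(records):
    """Trim, cast, fill missing values, and drop invalid or duplicate rows."""
    seen = set()
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip()                 # trim spaces
        country = (rec.get("country") or "").strip() or "n/a"  # missing value
        try:
            amount = float(rec.get("amount"))                  # type casting
        except (TypeError, ValueError):
            continue                                           # invalid record
        if not name or amount < 0:
            continue                                           # invalid record
        key = (name.lower(), country.lower(), amount)
        if key in seen:
            continue                                           # duplicate
        seen.add(key)
        cleaned.append({"name": name, "country": country, "amount": amount})
    return cleaned


raw = [
    {"name": "  Ana ", "country": "DE", "amount": "10.5"},
    {"name": "Ana", "country": "DE", "amount": "10.5"},   # duplicate after trim
    {"name": "Ben", "country": None, "amount": "7"},      # missing country
    {"name": "Cy", "country": "FR", "amount": "abc"},     # amount cannot be cast
]
cleaned = cleanse(raw)
```

Here invalid records are dropped rather than repaired. Whether to drop, repair, or quarantine them is a design decision that depends on the project.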

### Data Standardization & Normalization
- Code‑to‑label mappings
- Date & currency standardization
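
A small sketch of both ideas, assuming a hypothetical gender code mapping and three accepted input date formats (neither comes from a specific source system). Codes are mapped to readable labels and dates are converted to ISO 8601 text.

```python
from datetime import datetime

# Hypothetical mapping and formats, chosen only for this example.
GENDER_LABELS = {"M": "Male", "F": "Female"}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def standardize_gender(code):
    """Map a source code to a label; unknown or missing codes become 'n/a'."""
    return GENDER_LABELS.get((code or "").strip().upper(), "n/a")


def standardize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


labels = [standardize_gender(c) for c in ("m", " F ", None, "x")]
dates = [standardize_date(d) for d in ("2024-03-05", "05/03/2024", "20240305", "n/a")]
```

Trying formats in a fixed order means ambiguous inputs (for example day-first versus month-first) are resolved by whichever format is listed first, so the list should reflect what the sources actually send.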

### Data Integration
- Merging multiple sources
- Conforming dimensions
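
One possible way to picture integration is a join of two sources on a conformed key. The source names (CRM, ERP), the field names, and the key normalization rule below are all hypothetical.

```python
def integrate(crm_rows, erp_rows):
    """Merge CRM and ERP customer rows on a conformed (trimmed, upper-case) key."""
    erp_by_key = {row["cust_key"].strip().upper(): row for row in erp_rows}
    merged = []
    for crm in crm_rows:
        key = crm["customer_key"].strip().upper()   # conform the key format
        erp = erp_by_key.get(key, {})               # left join: keep CRM rows
        merged.append(
            {
                "customer_key": key,
                "name": crm["name"],
                "country": erp.get("country", "n/a"),
            }
        )
    return merged


crm = [{"customer_key": "c1", "name": "Ana"}, {"customer_key": "C2", "name": "Ben"}]
erp = [{"cust_key": "C1 ", "country": "DE"}]
integrated = integrate(crm, erp)
```

Normalizing the key on both sides before matching is what makes the two sources comparable. Without it, `"c1"` and `"C1 "` would not join.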

### Business Rules
- Calculated metrics
- Flags & indicators
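
A calculated metric and a flag might look like the sketch below. The order fields, the margin formula, and the 1000.0 threshold are illustrative assumptions, not rules taken from a real business.

```python
def apply_business_rules(order, high_value_threshold=1000.0):
    """Add calculated metrics and a flag to an order record."""
    revenue = order["quantity"] * order["unit_price"]             # metric
    margin = revenue - order["quantity"] * order["unit_cost"]     # metric
    return {
        **order,
        "revenue": revenue,
        "margin": margin,
        "is_high_value": revenue >= high_value_threshold,         # flag
    }


order = {"quantity": 10, "unit_price": 150.0, "unit_cost": 100.0}
enriched = apply_business_rules(order)
```

The function returns a new record instead of changing the input, so the source-aligned data stays untouched.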



## ETL Techniques – Load

### Processing Types
- Batch processing
- Stream processing

### Load Methods
- Truncate & Insert
- Upsert (Update + Insert)
- Append‑only
- Merge (Insert, Update, Delete)
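
Two of these load methods can be sketched with `sqlite3`. The `dim_product` table and both helper functions are hypothetical. SQLite has no `TRUNCATE` statement, so an unfiltered `DELETE` stands in for it, and the `ON CONFLICT ... DO UPDATE` upsert syntax assumes SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)


def truncate_and_insert(conn, rows):
    """Empty the table, then reload it completely in one transaction."""
    with conn:
        conn.execute("DELETE FROM dim_product")
        conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", rows)


def upsert(conn, rows):
    """Insert new keys and update rows whose key already exists."""
    with conn:
        conn.executemany(
            "INSERT INTO dim_product (product_id, name, price) VALUES (?, ?, ?) "
            "ON CONFLICT(product_id) DO UPDATE SET "
            "name = excluded.name, price = excluded.price",
            rows,
        )


truncate_and_insert(conn, [(1, "Pen", 1.0), (2, "Pad", 3.0)])
upsert(conn, [(2, "Pad", 3.5), (3, "Ink", 6.0)])  # updates id 2, inserts id 3

loaded = conn.execute(
    "SELECT product_id, name, price FROM dim_product ORDER BY product_id"
).fetchall()
```

Truncate & Insert is simple and suits full extractions, while an upsert touches only the incoming keys and therefore pairs naturally with incremental extraction.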

### Slowly Changing Dimensions (SCD)
- SCD Type 0 – No change
- SCD Type 1 – Overwrite
- SCD Type 2 – Historical tracking
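
The following sketch shows one common way to implement SCD Type 2, using validity dates and a current-row flag. The column names (`valid_from`, `valid_to`, `is_current`) and the tracked attribute `city` are assumptions for illustration. Type 1 would simply overwrite `city` on the existing row instead.

```python
from datetime import date


def apply_scd2(dimension, incoming, today):
    """Close the current row for a changed key and append a new version."""
    current = next(
        (
            row
            for row in dimension
            if row["customer_id"] == incoming["customer_id"] and row["is_current"]
        ),
        None,
    )
    if current and current["city"] == incoming["city"]:
        return dimension                    # attribute unchanged: nothing to do
    if current:
        current["valid_to"] = today         # expire the old version
        current["is_current"] = False
    dimension.append(
        {
            "customer_id": incoming["customer_id"],
            "city": incoming["city"],
            "valid_from": today,
            "valid_to": None,               # open-ended: still valid
            "is_current": True,
        }
    )
    return dimension


dim = [
    {
        "customer_id": 1,
        "city": "Berlin",
        "valid_from": date(2023, 1, 1),
        "valid_to": None,
        "is_current": True,
    }
]
apply_scd2(dim, {"customer_id": 1, "city": "Munich"}, date(2024, 6, 1))
```

After the call, the dimension holds two rows for customer 1: the expired Berlin version and the current Munich version, so reports can be run against either point in time.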



## Data Warehouse Architectures

### Common Approaches
- Inmon Architecture
- Kimball Architecture
- Data Vault
- Medallion Architecture (Bronze, Silver, Gold)

This project follows the **Medallion Architecture** because its layered structure is easy to reason about, it scales well, and it is widely adopted in modern data platforms.



## Medallion Architecture Explained

### Bronze Layer
- Raw, unprocessed data
- One‑to‑one copy of the source data
- Used for traceability and debugging

### Silver Layer
- Cleaned and standardized data
- Basic transformations
- Still source‑aligned

### Gold Layer
- Business‑ready data
- Aggregations and analytics
- Optimized for reporting
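
The flow through the three layers can be sketched as three small functions. This is a toy, in-memory illustration under assumed field names (`region`, `amount`). In a real warehouse each layer would typically be a set of tables or views.

```python
def to_bronze(source_rows):
    """Bronze: copy the source rows as they are, with no changes."""
    return [dict(row) for row in source_rows]


def to_silver(bronze_rows):
    """Silver: clean and standardize, but keep the source-aligned grain."""
    silver = []
    for row in bronze_rows:
        if row["amount"] is None:
            continue                                   # drop unusable rows
        silver.append(
            {
                "region": row["region"].strip().upper(),
                "amount": float(row["amount"]),
            }
        )
    return silver


def to_gold(silver_rows):
    """Gold: aggregate into a business-ready total per region."""
    totals = {}
    for row in silver_rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


source = [
    {"region": " eu ", "amount": "10"},
    {"region": "EU", "amount": "5.5"},
    {"region": "us", "amount": None},
    {"region": "US", "amount": "2"},
]
bronze = to_bronze(source)
silver = to_silver(bronze)
gold = to_gold(silver)
```

Because Bronze keeps the untouched copy, any surprising number in Gold can be traced back through Silver to the exact source row.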



## Separation of Concerns (Key Architecture Principle)

Each layer must have:
- A **single responsibility**
- No duplicated logic
- Clear ownership

Examples:
- No business logic in Bronze
- No data cleansing in Gold

This principle ensures:
- Maintainability
- Scalability
- Debuggability



## Target Audience by Layer

| Layer  | Intended Users |
|------|---------------|
| Bronze | Data Engineers |
| Silver | Data Engineers, Analysts |
| Gold | Analysts, Business Users |

Access control is a **critical architectural decision**: each group should be granted access only to the layers intended for it.



## Conclusion

A well‑designed data warehouse:
- Enables fast and reliable analytics
- Reduces operational complexity
- Supports confident business decisions

Understanding **ETL and data architecture** is foundational for any aspiring data engineer.

This notebook forms the **theoretical backbone** for building real‑world SQL‑based data warehouse projects.
