# OVERVIEW OF THE MEDALLION ARCHITECTURE
---
> The Medallion Architecture is a simple and popular data design pattern used in Databricks to logically organize data in your lakehouse.

Think of it as a data quality factory. Its main job is to take data that is "raw and dirty" and progressively transform it into "clean and aggregated" data that is reliable and easy to use for business analysis.

This "factory" has three distinct quality layers, named after medals: Bronze, Silver, and Gold.


# 🥉 BRONZE LAYER (THE RAW DATA)
---
## WHAT IT IS
> This is the first stop for all data entering your lakehouse. The Bronze layer is a "landing zone" where you dump raw data from all your external source systems (like databases, application APIs, CSV/Excel uploads, etc.).

**The Golden Rule:** The data in the Bronze layer should be an exact, unaltered copy of the source data. You do not clean, filter, or change anything.

## KEY CHARACTERISTICS
**STATE:** Raw, messy, "dirty."

**PURPOSE:** To be a historical archive or "staging area." If there is ever a problem in your downstream tables (Silver or Gold), you can always rebuild them by re-running your code on the raw Bronze data, without having to ask the source system for the data again.

**EXAMPLE (Code):** This is exactly what your COPY INTO command did in our earlier steps. It took a raw CSV file and landed it in a Delta table. That table is a perfect Bronze table.
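
To make the "exact copy" rule concrete, here is a minimal sketch in plain Python. It is not the `COPY INTO` command itself: the CSV text, the sample values, and the `land_in_bronze` helper are all hypothetical, and a Python list stands in for the Delta table.

```python
import csv
import io

# Hypothetical raw CSV export from a source system (sample values are made up).
RAW_CSV = """model,year,region,price_usd
X5,2021,Europe,61000
X5,2021,Europe,61000
i3,2020,Asia,0
M3,twenty,Europe,72000
"""


def land_in_bronze(raw_text: str) -> list[dict[str, str]]:
    """Copy every source row as-is: no filtering, casting, or deduplication."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [dict(row) for row in reader]


# The list stands in for a Bronze table; every value is still a string.
bronze_sales = land_in_bronze(RAW_CSV)
print(len(bronze_sales))
```

Note that the duplicate row, the zero price, and the non-numeric year are all kept on purpose. Fixing them is the Silver layer's job.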

# 🥈 SILVER LAYER (THE CLEANED DATA)
---
## WHAT IT IS
> This is the "single source of truth" for the business. This is where the real data transformation happens. You write code (SQL or Python) to read from the Bronze layer(s) and apply all your business rules.

**The Golden Rule:** Data in the Silver layer must be validated, cleaned, and consistent. It should be "query-ready" for your data analysts.

## KEY CHARACTERISTICS
**STATE:** Cleaned, validated, deduplicated, and standardized.

### TRANSFORMATIONS (ETL):

- **Filtering:** Removing records with null or bad values (e.g., price_usd = 0).
- **Data Types:** Enforcing correct schemas (e.g., casting year from STRING to INT).
- **Joining:** Merging tables (e.g., joining your bmw_sales table with a customer_info table).
- **Deduplication:** Removing duplicate records.

**PURPOSE:** To provide a reliable, query-ready source for all major business entities (e.g., a "master sales table" or "master customer table").
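
The four transformations listed above can be sketched in plain Python. This is only an illustration under stated assumptions: in Databricks you would normally write this as SQL or PySpark against Delta tables. The `sale_id` and `customer_id` columns, the sample rows, and the `build_silver` helper are hypothetical.

```python
# Hypothetical Bronze rows for bmw_sales: every value is still a string, as landed.
# sale_id and customer_id are illustrative columns, not taken from the original data.
bronze_bmw_sales = [
    {"sale_id": "1", "customer_id": "c1", "year": "2021", "price_usd": "61000"},
    {"sale_id": "1", "customer_id": "c1", "year": "2021", "price_usd": "61000"},
    {"sale_id": "2", "customer_id": "c2", "year": "2020", "price_usd": "0"},
    {"sale_id": "3", "customer_id": "c1", "year": "twenty", "price_usd": "72000"},
    {"sale_id": "4", "customer_id": "c2", "year": "2022", "price_usd": "48000"},
]

# Hypothetical customer_info lookup: customer_id -> customer name.
customer_info = {"c1": "Alice", "c2": "Bob"}


def build_silver(bronze_rows, customers):
    """Apply type casting, filtering, deduplication, and a join, in that order."""
    silver = []
    seen_sale_ids = set()
    for row in bronze_rows:
        # Data types: cast strings to numbers; drop rows that cannot be cast.
        try:
            year = int(row["year"])
            price_usd = float(row["price_usd"])
        except (KeyError, ValueError):
            continue
        # Filtering: drop rows with a bad price (e.g., price_usd = 0).
        if price_usd <= 0:
            continue
        # Deduplication: keep only the first row seen for each sale_id.
        if row["sale_id"] in seen_sale_ids:
            continue
        seen_sale_ids.add(row["sale_id"])
        # Joining: attach the customer name (None if the customer is unknown).
        silver.append({
            "sale_id": row["sale_id"],
            "year": year,
            "price_usd": price_usd,
            "customer_name": customers.get(row["customer_id"]),
        })
    return silver


silver_sales = build_silver(bronze_bmw_sales, customer_info)
print(silver_sales)
```

Of the five Bronze rows, only sales 1 and 4 survive: the duplicate, the zero-price row, and the row with a non-numeric year are dropped, while the Bronze list itself is left untouched.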

# 🥇 GOLD LAYER (THE BUSINESS-READY DATA)
---
## WHAT IT IS
> This is the final, highly polished layer. Gold tables are not for general exploration; they are built for a specific purpose and are ready for final consumption.

**The Golden Rule:** Data in the Gold layer is aggregated and optimized for performance. It should directly answer a specific business question.

## KEY CHARACTERISTICS
**STATE:** Aggregated, summarized, and pre-computed.

### TRANSFORMATIONS (AGGREGATION):

- **Summarizing:** Using GROUP BY to calculate metrics (e.g., SUM(price_usd) with GROUP BY region, year).
- **Business Logic:** Calculating complex metrics like "year-over-year growth" or "total market share."

**PURPOSE:** To directly power Business Intelligence (BI) dashboards (for example, in Power BI or Tableau) or to serve as a "feature table" for a machine learning model. An analyst should be able to query a Gold table without writing any complex joins or logic.
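
As a rough sketch of the two Gold transformations, the plain Python below sums `price_usd` per region and year and then derives year-over-year growth from those totals. The sample Silver rows and both helper functions are hypothetical. In a real lakehouse this would more likely be a SQL `GROUP BY` query that writes to a Gold Delta table.

```python
from collections import defaultdict

# Hypothetical Silver rows: already cleaned, typed, and deduplicated.
silver_sales = [
    {"region": "Europe", "year": 2021, "price_usd": 60000.0},
    {"region": "Europe", "year": 2021, "price_usd": 40000.0},
    {"region": "Europe", "year": 2022, "price_usd": 125000.0},
    {"region": "Asia", "year": 2022, "price_usd": 48000.0},
]


def revenue_by_region_year(rows):
    """Summarizing: the equivalent of SUM(price_usd) ... GROUP BY region, year."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["region"], row["year"])] += row["price_usd"]
    return dict(totals)


def year_over_year_growth(totals):
    """Business logic: growth versus the same region's previous year."""
    growth = {}
    for (region, year), revenue in totals.items():
        previous = totals.get((region, year - 1))
        if previous:  # skip regions with no (or zero) revenue in the prior year
            growth[(region, year)] = (revenue - previous) / previous
    return growth


gold_revenue = revenue_by_region_year(silver_sales)
gold_yoy = year_over_year_growth(gold_revenue)
print(gold_revenue)
print(gold_yoy)
```

Both results are small, pre-computed lookups, so a dashboard can read them directly instead of re-aggregating the Silver rows on every query.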

# WHY USE THIS? (SUMMARY)
- **RELIABILITY:** You always know where to find data of a certain quality.
- **DEBUGGING:** If a dashboard (Gold) is wrong, you can check the Silver table it was built from. If the Silver table is wrong, you check the Bronze table. This makes it easy to find where a mistake was introduced.
- **REUSABILITY:** The Silver layer is a shared "source of truth" that can be used to build many different Gold tables without re-cleaning the data every time.