# Gold Layer - Business-Ready Data

**Purpose of This Layer**

The Gold layer represents the final, business-consumable data model in the Lakehouse architecture. It is designed to serve analytics, reporting, and decision-making needs by presenting clean, integrated, and standardized datasets in a star schema format.

While the Bronze and Silver layers focus on ingestion and data quality, the Gold layer focuses on clarity, usability, and alignment with how the business thinks about data.

**What This Layer Contains**

This layer exposes three logical views, each built from curated Silver-layer data:
- Customer Dimension (dim_customers): Provides a unified customer profile by combining CRM and ERP data, resolving conflicts between systems (for gender, CRM takes precedence and ERP is the fallback), and exposing standardized attributes such as country, gender, and marital status.
- Product Dimension (dim_products): Represents the current, active product catalog with consistent categorization, product hierarchy, and cost attributes. Historical product records are intentionally excluded to keep reporting simple and relevant.
- Sales Fact (fact_sales): Captures transactional sales data at the order-line level, linking each sale to its corresponding customer and product using surrogate keys.

Together, these views form a star schema, which is a widely used design pattern for Business Intelligence and reporting.
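
As a rough illustration of how the three views are meant to be queried together, the sketch below totals sales by customer country and product category. It uses only the view and column names defined in the cells further down this notebook; the grouping columns are just one example of a business question, and the query would be run in a `%sql` cell once the views exist.

```sql
-- Example star-schema query: join the fact view to both dimensions
-- through their surrogate keys, then aggregate.
SELECT
    c.country,
    p.category,
    SUM(f.sales_amount) AS total_sales,
    SUM(f.quantity)     AS total_quantity
FROM workspace.gold.fact_sales f
LEFT JOIN workspace.gold.dim_customers c
    ON f.customer_key = c.customer_key
LEFT JOIN workspace.gold.dim_products p
    ON f.product_key = p.product_key
GROUP BY c.country, p.category
ORDER BY total_sales DESC;
```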

**Design Principles Applied**

- Business-first modeling: Tables are structured around business entities (Customers, Products, Sales), not source-system tables.
- Single source of truth: All reporting and dashboards are built on top of these Gold views to ensure consistency.
- Separation of concerns:
    - Data cleaning and standardization happen in the Silver layer
    - Business-ready modeling happens here
    - Aggregations and KPIs are handled in the analytics or reporting layer (e.g., Power BI)
- Views instead of physical tables: Using views keeps the Gold layer lightweight, always up to date, and easy to evolve as business requirements change.

**Intended Usage**

The Gold layer is intended to be:
- Consumed directly by BI tools such as Power BI
- Used by analysts and business users without needing to understand raw source systems
- Materialized as physical tables in the future if performance or scale requires it

This layer bridges the gap between clean data and actionable insights, making it the foundation for dashboards, KPIs, and strategic analysis.
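
If materialization ever becomes necessary, one possible approach is to snapshot a view into a physical table with a CREATE TABLE ... AS SELECT statement. The sketch below is only an assumption about how that could look: the table name `fact_sales_materialized` is hypothetical, and the statement would need to be re-run (for example on a schedule) to stay current.

```sql
-- Hypothetical table name; copies the current contents of the view
-- into a physical table that BI tools can read without recomputation.
CREATE OR REPLACE TABLE workspace.gold.fact_sales_materialized AS
SELECT *
FROM workspace.gold.fact_sales;
```

Because the surrogate keys in the dimension views are generated with `ROW_NUMBER()` at query time, it is probably safest to refresh any materialized dimensions and facts in the same run so their keys stay aligned.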



## Create Dimension View: gold.dim_customers

In [0]:
%sql
DROP VIEW IF EXISTS workspace.gold.dim_customers;

CREATE VIEW workspace.gold.dim_customers AS
SELECT
    ROW_NUMBER() OVER (ORDER BY ci.cst_id) AS customer_key,   -- Surrogate key
    ci.cst_id                              AS customer_id,
    ci.cst_key                             AS customer_number,
    ci.cst_firstname                       AS first_name,
    ci.cst_lastname                        AS last_name,
    la.cntry                               AS country,
    ci.cst_marital_status                  AS marital_status,
    CASE
        WHEN ci.cst_gndr <> 'n/a' THEN ci.cst_gndr  -- CRM is the primary source for gender
        ELSE COALESCE(ca.gen, 'n/a')                -- Fallback to ERP data
    END                                    AS gender,
    ca.bdate                               AS birthdate,
    ci.cst_create_date                     AS create_date
FROM workspace.silver.crm_cust_info ci
LEFT JOIN workspace.silver.erp_cust_az12 ca
    ON ci.cst_key = ca.cid
LEFT JOIN workspace.silver.erp_loc_a101 la
    ON ci.cst_key = la.cid;
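
Because both ERP tables are attached with LEFT JOINs on `cst_key`, a duplicate `cid` in either table would multiply customer rows. A quick check along these lines may help confirm that the dimension holds one row per customer; it assumes the view above has been created and is expected to return no rows.

```sql
-- Expect zero rows: each customer_id should appear exactly once.
SELECT
    customer_id,
    COUNT(*) AS row_count
FROM workspace.gold.dim_customers
GROUP BY customer_id
HAVING COUNT(*) > 1;
```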


## Create Dimension View: gold.dim_products

In [0]:
%sql
DROP VIEW IF EXISTS workspace.gold.dim_products;

CREATE VIEW workspace.gold.dim_products AS
SELECT
    ROW_NUMBER() OVER (ORDER BY pn.prd_start_dt, pn.prd_key) AS product_key, -- Surrogate key
    pn.prd_id        AS product_id,
    pn.prd_key       AS product_number,
    pn.prd_nm        AS product_name,
    pn.cat_id        AS category_id,
    pc.cat           AS category,
    pc.subcat        AS subcategory,
    pc.maintenance   AS maintenance,
    pn.prd_cost      AS cost,
    pn.prd_line      AS product_line,
    pn.prd_start_dt  AS start_date
FROM workspace.silver.crm_prd_info pn
LEFT JOIN workspace.silver.erp_px_cat_g1v2 pc
    ON pn.cat_id = pc.id
WHERE pn.prd_end_dt IS NULL;  -- Keep only current product versions (no end date); historical rows are excluded
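
To get a feel for how much the `prd_end_dt IS NULL` filter removes, a count like the following can be run against the Silver table. This is only a sketch; it assumes, as the view does, that a NULL end date marks the current version of a product.

```sql
-- Compare all product versions with those still current (no end date).
SELECT
    COUNT(*)                                        AS total_versions,
    COUNT(CASE WHEN prd_end_dt IS NULL THEN 1 END)  AS current_versions
FROM workspace.silver.crm_prd_info;
```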


## Create Fact View: gold.fact_sales

In [0]:
%sql
DROP VIEW IF EXISTS workspace.gold.fact_sales;

CREATE VIEW workspace.gold.fact_sales AS
SELECT
    sd.sls_ord_num    AS order_number,
    pr.product_key    AS product_key,
    cu.customer_key   AS customer_key,
    sd.sls_order_dt   AS order_date,
    sd.sls_ship_dt    AS shipping_date,
    sd.sls_due_dt     AS due_date,
    sd.sls_sales      AS sales_amount,
    sd.sls_quantity   AS quantity,
    sd.sls_price      AS price
FROM workspace.silver.crm_sales_details sd
LEFT JOIN workspace.gold.dim_products pr
    ON sd.sls_prd_key = pr.product_number   -- Look up the product surrogate key
LEFT JOIN workspace.gold.dim_customers cu
    ON sd.sls_cust_id = cu.customer_id;     -- Look up the customer surrogate key
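
Since both dimensions are attached with LEFT JOINs, a sale whose product has no current version in `dim_products`, or whose customer is missing from `dim_customers`, will still appear in the fact view but with a NULL surrogate key. A check such as the one below, offered as a sketch rather than a required step, counts how many rows are affected.

```sql
-- Count fact rows that failed to match a dimension row.
SELECT
    COUNT(*)                                           AS total_rows,
    COUNT(CASE WHEN product_key IS NULL THEN 1 END)    AS missing_product_key,
    COUNT(CASE WHEN customer_key IS NULL THEN 1 END)   AS missing_customer_key
FROM workspace.gold.fact_sales;
```

Non-zero counts in the last two columns would suggest reviewing either the join keys or the decision to exclude historical products.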
