# Creating Silver Layer

Run the previous notebooks first so that the bronze (```br_*```) tables used below exist.

## Utilities

```python
# Make the Northwind schema the default for the unqualified table names below
database_name = "northwind"
spark.sql(f"USE SCHEMA {database_name}")
```

Out[1]: DataFrame[]

## Creating Silver Layer

Five tables will be created:
- br_products -> si_products: no transformation; rows are upserted (merged) on ```product_id```.
- br_order_details -> si_order_details: no transformation; rows are upserted (merged) on ```order_id``` and ```product_id```.
- br_orders -> si_orders: null values (present only in string columns) are replaced with "Not specified", and date columns are converted from timestamp to date.
- br_employees -> si_employees: null values in column ```region``` are replaced with "Not specified", and rows with a null ```reports_to``` are dropped. Date columns are also converted from timestamp to date.
- br_customers -> si_customers: null values in column ```region``` are replaced with "Not specified", and null values in columns ```postal_code``` and ```fax``` with "Not provided".
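The null-replacement rules above could also be kept as data and rendered into the `UPDATE` statements passed to `spark.sql`, so that each table's defaults live in one place. A minimal sketch, assuming a hypothetical helper `build_fill_update` and default values that contain no single quotes (the helper rejects them rather than escaping them):

```python
from typing import Dict


def build_fill_update(table: str, defaults: Dict[str, str]) -> str:
    """Render an UPDATE that replaces NULLs with a per-column default.

    Hypothetical helper: `table` and the column names are interpolated
    verbatim, so they must be trusted identifiers.
    """
    if not defaults:
        raise ValueError("at least one column is required")
    assignments = []
    for column, default in defaults.items():
        # Quotes are rejected instead of escaped to keep the sketch simple
        if "'" in default:
            raise ValueError(f"unsupported quote in default for {column!r}")
        assignments.append(f"{column} = COALESCE({column}, '{default}')")
    return f"UPDATE {table} SET " + ", ".join(assignments)


# Defaults for si_customers, taken from the list above
customer_sql = build_fill_update(
    "si_customers",
    {"region": "Not specified", "postal_code": "Not provided", "fax": "Not provided"},
)
print(customer_sql)
```

In the notebook the result would then be executed with `spark.sql(customer_sql)`.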

#### Table: "products"

```python
# Create an empty silver table with the bronze schema (LIMIT 0 copies no rows)
spark.sql("""
    CREATE TABLE IF NOT EXISTS si_products
    TBLPROPERTIES('quality'='silver')
    AS SELECT *
    FROM br_products
    LIMIT 0
""")

# Upsert: update products that already exist, insert new ones
spark.sql("""
    MERGE INTO si_products
    USING br_products
    ON si_products.product_id = br_products.product_id
    WHEN MATCHED THEN
      UPDATE SET *
    WHEN NOT MATCHED THEN
      INSERT *
""")
```

Out[2]: DataFrame[num_affected_rows: bigint, num_updated_rows: bigint, num_deleted_rows: bigint, num_inserted_rows: bigint]
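The `LIMIT 0` table plus `MERGE` pattern makes the cell safe to rerun: the first run inserts every bronze row, and later runs update matched rows instead of duplicating them. The stdlib sketch below imitates `WHEN MATCHED THEN UPDATE SET *` / `WHEN NOT MATCHED THEN INSERT *` on lists of dicts and reports counters named after the columns in the output above. It is an illustration, not Delta Lake's implementation, and it assumes the key is unique in the source (Delta rejects a merge in which several source rows match the same target row). The rows are toy values.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def merge_upsert(
    target: List[Row], source: List[Row], key: str
) -> Tuple[List[Row], Dict[str, int]]:
    """Upsert `source` into `target` on `key`; assumes `key` is unique in `source`."""
    merged: Dict[Any, Row] = {row[key]: dict(row) for row in target}
    updated = inserted = 0
    for row in source:
        if row[key] in merged:
            updated += 1   # WHEN MATCHED THEN UPDATE SET *
        else:
            inserted += 1  # WHEN NOT MATCHED THEN INSERT *
        merged[row[key]] = dict(row)  # both branches copy every column
    metrics = {
        "num_affected_rows": updated + inserted,
        "num_updated_rows": updated,
        "num_deleted_rows": 0,
        "num_inserted_rows": inserted,
    }
    return list(merged.values()), metrics


# Toy rows, not Northwind data
bronze = [{"product_id": 1, "name": "a"}, {"product_id": 2, "name": "b"}]
silver, first = merge_upsert([], bronze, "product_id")       # target starts empty (LIMIT 0)
silver, second = merge_upsert(silver, bronze, "product_id")  # rerun of the notebook
print(first["num_inserted_rows"], second["num_updated_rows"], len(silver))
```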

#### Table: "order_details"

```python
# Create an empty silver table with the bronze schema (LIMIT 0 copies no rows)
spark.sql("""
    CREATE TABLE IF NOT EXISTS si_order_details
    TBLPROPERTIES('quality'='silver')
    AS SELECT *
    FROM br_order_details
    LIMIT 0
""")

# Upsert on the composite key (order_id, product_id)
spark.sql("""
    MERGE INTO si_order_details
    USING br_order_details
    ON (si_order_details.order_id = br_order_details.order_id
      AND si_order_details.product_id = br_order_details.product_id)
    WHEN MATCHED THEN
      UPDATE SET *
    WHEN NOT MATCHED THEN
      INSERT *
""")
```

Out[3]: DataFrame[num_affected_rows: bigint, num_updated_rows: bigint, num_deleted_rows: bigint, num_inserted_rows: bigint]
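The merge condition uses both `order_id` and `product_id` because one order can contain several products, so `order_id` alone does not identify an order line. The toy sketch below indexes rows by a key tuple, under the assumption implied by the merge condition that each product appears at most once per order; with only `order_id`, two distinct lines collapse into one key, which in a `MERGE` would mean several source rows matching the same target row. `index_by` is a hypothetical helper.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def index_by(rows: List[Row], keys: Sequence[str]) -> Dict[Tuple[Any, ...], Row]:
    """Index rows by the tuple of `keys`; later rows overwrite earlier ones."""
    return {tuple(row[k] for k in keys): row for row in rows}


# Toy order lines: one order containing two different products
lines = [
    {"order_id": 1, "product_id": 10, "quantity": 5},
    {"order_id": 1, "product_id": 20, "quantity": 3},
]
print(len(index_by(lines, ["order_id"])))                # the two lines collide
print(len(index_by(lines, ["order_id", "product_id"])))  # each line keeps its own key
```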

#### Table: "orders"

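```python
# Create the silver table, converting the date columns from TIMESTAMP to DATE.
# An UPDATE cannot change a column's type, so the cast happens here
# (`* EXCEPT` moves the cast columns to the end of the schema).
spark.sql("""
    CREATE TABLE IF NOT EXISTS si_orders
    TBLPROPERTIES('quality'='silver')
    AS SELECT * EXCEPT (order_date, required_date, shipped_date),
        CAST(order_date AS DATE) AS order_date,
        CAST(required_date AS DATE) AS required_date,
        CAST(shipped_date AS DATE) AS shipped_date
    FROM br_orders
""")

# Replace NULLs in the string columns
spark.sql("""
    UPDATE si_orders
    SET ship_region = COALESCE(ship_region, 'Not specified'),
        ship_postal_code = COALESCE(ship_postal_code, 'Not specified')
""")
```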

Out[4]: DataFrame[num_affected_rows: bigint]
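In Spark SQL, `CAST(ts AS DATE)` keeps only the calendar date of a timestamp. The Python sketch below mimics that with `datetime.date()` on a toy value, and also shows what happens when such a date is written back into a column whose type is still `TIMESTAMP`: it is widened again and the time becomes midnight. This is why the column type only changes if the cast is applied when the column is defined, not in a later `UPDATE`.

```python
from datetime import date, datetime, time

ts = datetime(2024, 1, 15, 13, 45)  # toy timestamp with a time-of-day part
d: date = ts.date()                 # like CAST(ts AS DATE): the time is dropped
print(d)

# Storing the DATE in a column still typed TIMESTAMP widens it back to midnight
widened = datetime.combine(d, time.min)
print(widened)
```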

#### Table: "employees"

In [0]:
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS si_employees
    TBLPROPERTIES('quality'='silver')
    AS SELECT *
    FROM br_employees
""")

spark.sql(f"""
    DELETE FROM si_employees 
    WHERE reports_to IS NULL;
""")

spark.sql(f"""
    UPDATE si_employees 
    SET region = COALESCE(region, 'Not specified'),
        birth_date = CAST(birth_date AS DATE),
        hire_date = CAST(hire_date AS DATE);
""")

Out[5]: DataFrame[num_affected_rows: bigint]
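A null `reports_to` marks an employee with no manager, so the `DELETE` removes the top of the reporting hierarchy from `si_employees`; on a rerun there is nothing left to delete. The stdlib sketch below applies the same two steps in the same order (filter, then fill) to toy rows, with a hypothetical `clean_employees` helper that also returns how many rows were dropped.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def clean_employees(rows: List[Row]) -> Tuple[List[Row], int]:
    """Drop rows without a manager, then fill a missing region."""
    # DELETE ... WHERE reports_to IS NULL (copies rows so the input is untouched)
    kept = [dict(r) for r in rows if r["reports_to"] is not None]
    for r in kept:
        if r["region"] is None:
            r["region"] = "Not specified"  # COALESCE(region, 'Not specified')
    return kept, len(rows) - len(kept)


# Toy rows, not Northwind data
toy = [
    {"employee_id": 1, "region": "WA", "reports_to": None},
    {"employee_id": 2, "region": None, "reports_to": 1},
]
silver, deleted = clean_employees(toy)
print(deleted, silver)
```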

#### Table: "customers"

```python
# Create the silver table as a full copy of the bronze table
spark.sql("""
    CREATE TABLE IF NOT EXISTS si_customers
    TBLPROPERTIES('quality'='silver')
    AS SELECT *
    FROM br_customers
""")

# Replace NULLs with per-column defaults
spark.sql("""
    UPDATE si_customers
    SET region = COALESCE(region, 'Not specified'),
        postal_code = COALESCE(postal_code, 'Not provided'),
        fax = COALESCE(fax, 'Not provided')
""")
```

Out[6]: DataFrame[num_affected_rows: bigint]
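`COALESCE` returns its first non-NULL argument, so it replaces only genuine NULLs: an empty string passes through unchanged. The source does not say whether the bronze tables store missing postal codes or fax numbers as NULL or as `''`; if the latter occurs, something like `COALESCE(NULLIF(fax, ''), 'Not provided')` would treat both the same way. A stdlib sketch of the semantics, with a hypothetical `coalesce` function and toy values:

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def coalesce(*values: Optional[T]) -> Optional[T]:
    """Return the first argument that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


print(coalesce(None, "Not provided"))       # Not provided
print(coalesce("ABC 123", "Not provided"))  # ABC 123
print(repr(coalesce("", "Not provided")))   # '' (an empty string is not NULL)
```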