# Employee ETL Pipeline - Bronze Layer

In this first layer, we ingest data from the source as-is and apply almost no transformations.

Architecture Overview:  
![bronze.jpg](./bronze.jpg "bronze.jpg")


See more on _Medallion Architecture_: https://www.databricks.com/glossary/medallion-architecture

## Import Libraries

In [0]:
from pyspark.sql import functions as F

## Create Catalog and Schema (if needed)

In [0]:
%sql
-- Create a new catalog
CREATE CATALOG IF NOT EXISTS hrdata;

-- Set the current catalog for subsequent commands
USE CATALOG hrdata;

In [0]:
%sql
-- Create the Bronze layer schema
CREATE SCHEMA IF NOT EXISTS bronze;

## Connect to Cosmos DB Source

In [0]:
cosmos_endpoint = dbutils.secrets.get(scope="cosmosdb", key="cosmos-endpoint")
cosmos_key = dbutils.secrets.get(scope="cosmosdb", key="cosmos-key")
cosmos_database = "HRDatabase"
cosmos_employees_container = "employees"

In [0]:
cosmos_config = {
    "spark.cosmos.accountEndpoint": cosmos_endpoint,
    "spark.cosmos.accountKey": cosmos_key,
    "spark.cosmos.database": cosmos_database,
    "spark.cosmos.container": cosmos_employees_container,
    "spark.cosmos.read.customQuery": "SELECT * FROM c"
}

employees = spark.read.format("cosmos.oltp").options(**cosmos_config).load()
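
If more containers need to be read later, the option dictionary above could be assembled by a small helper. The following is a minimal sketch, not part of the original notebook: `build_cosmos_config` is a hypothetical name, and the endpoint and key in the example call are placeholders rather than real values. The option keys are the same ones used in the cell above.

```python
def build_cosmos_config(endpoint, key, database, container, query=None):
    """Assemble read options for the Cosmos DB Spark connector (hypothetical helper)."""
    config = {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": key,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
    }
    # Only set a custom query when one is supplied by the caller.
    if query is not None:
        config["spark.cosmos.read.customQuery"] = query
    return config


# Placeholder endpoint and key, for illustration only.
example_config = build_cosmos_config(
    endpoint="https://example.documents.azure.com:443/",
    key="<secret-from-dbutils>",
    database="HRDatabase",
    container="employees",
)
```

In the notebook, the result would be passed to the reader in the same way as `cosmos_config`, that is, via `.options(**example_config)`.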

## Preview the Data

In [0]:
display(employees.limit(5))

## Data Enrichment
We don't transform the data heavily, because heavy transformation is not part of the Bronze layer in a Medallion Architecture. We only add metadata that is useful downstream, such as the load timestamp.

In [0]:
# Add a load timestamp so each row records when it was ingested
employees = employees.withColumn("load_ts", F.current_timestamp())
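
Beyond the load timestamp, a Bronze table can also carry simple lineage columns. The sketch below is one possible way to do this and is not taken from the original notebook: the helper name `add_load_metadata` and the column names `source_system` and `source_container` are assumptions chosen for illustration.

```python
def add_load_metadata(df, source_system="cosmosdb", source_container="employees"):
    """Return df with a load timestamp and source lineage columns (hypothetical helper)."""
    # Imported here so that defining the helper does not itself require Spark.
    from pyspark.sql import functions as F

    return (
        df.withColumn("load_ts", F.current_timestamp())
          # Constant columns recording where the rows came from (assumed names).
          .withColumn("source_system", F.lit(source_system))
          .withColumn("source_container", F.lit(source_container))
    )
```

In this notebook it could be applied as `employees = add_load_metadata(employees)`, in place of the single `withColumn` call.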

## Write Employees Table as Delta Table

In [0]:
employees.write.format("delta") \
  .mode("overwrite") \
  .saveAsTable("hrdata.bronze.employees")
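
After the write, a quick sanity check can confirm that the table actually received rows. This is a minimal sketch under the assumption that an empty Bronze table indicates a failed load; `assert_table_not_empty` is a hypothetical helper, not part of the original notebook. It relies only on `spark.table(...)` and `DataFrame.count()`.

```python
def assert_table_not_empty(spark, table_name):
    """Raise if the given table has no rows; otherwise return the row count."""
    row_count = spark.table(table_name).count()
    if row_count == 0:
        raise ValueError(f"{table_name} is empty after load")
    return row_count
```

In the notebook it would be called as `assert_table_not_empty(spark, "hrdata.bronze.employees")`.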