# Loading Data to Silver Zone

In this notebook, we will:
* Ingest data from the **Bronze Zone** into the **Silver Zone** using Spark
* Use Spark Structured Streaming with **`trigger(availableNow=True)`** for batch loading
* Apply **load control** to the batch processes through the Structured Streaming **checkpoint**
* Use the **`awaitTermination()`** method to make the streaming queries run synchronously
* Combine Spark and SQL to load the data
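The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual load code: the function name `load_bronze_to_silver`, the table names, and the checkpoint path are all hypothetical, and it assumes a Spark version that supports `trigger(availableNow=True)` and `toTable()`.

```python
def load_bronze_to_silver(spark, source_table, target_table, checkpoint_path):
    """Run one incremental batch load with Structured Streaming.

    All arguments are illustrative; `spark` is an existing SparkSession.
    """
    query = (
        spark.readStream.table(source_table)            # read the source table as a stream
        .writeStream
        .option("checkpointLocation", checkpoint_path)  # checkpoint records what was already loaded
        .trigger(availableNow=True)                     # process all available data, then stop
        .toTable(target_table)                          # start the query, writing to the target table
    )
    query.awaitTermination()  # block until the load finishes (synchronous behavior)
    return query


# Hypothetical usage inside a notebook where `spark` already exists:
# load_bronze_to_silver(spark, "bronze.tb_example", "silver.tb_example",
#                       "/tmp/checkpoints/tb_example")
```

Because the checkpoint stores the progress of each run, calling the function again with the same checkpoint path should only pick up data that arrived since the previous run.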

## 1.0 Initial Setup

In [0]:
%run "/Users/cabreirajm@gmail.com/DataPipelineCabreira/Helpers/data_generator" 


## 2.0 Create `Silver Zone` Schema

In [0]:
spark.sql("CREATE DATABASE IF NOT EXISTS silver")

DataFrame[]

## 3.0 Business Requirements for Silver Zone

1. The ingestion needs to run in batch mode in order to avoid extra costs
    * Even though the API data is stored in the landing zone via streaming
2. API and batch data need to be stored in the same table
3. Each table will have a uuid column with a hash that identifies each record
4. We have to guarantee the correct data type for every column
5. We need to create better column names for each table
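Requirements 3 to 5 can be illustrated with a small helper that builds the `SELECT` statement for a silver table. This is only a sketch under stated assumptions: the helper `build_silver_select`, the table `bronze.tb_example`, and every column name below are hypothetical and do not come from the pipeline itself.

```python
def build_silver_select(source_table, key_column, column_map):
    """Build a SELECT that hashes the key, casts, and renames columns.

    column_map maps each source column to (new column name, target SQL type).
    All names passed in are illustrative.
    """
    # Requirement 3: a hash column that identifies each record
    projections = [f"md5(CAST({key_column} AS STRING)) AS uuid"]
    # Requirements 4 and 5: explicit data type and a clearer name per column
    for source_col, (new_name, sql_type) in column_map.items():
        projections.append(f"CAST({source_col} AS {sql_type}) AS {new_name}")
    select_list = ",\n    ".join(projections)
    return f"SELECT\n    {select_list}\nFROM {source_table}"


query = build_silver_select(
    "bronze.tb_example",
    "id",
    {"nm": ("customer_name", "STRING"), "vl": ("order_value", "DECIMAL(10,2)")},
)
print(query)
```

The resulting string could then be passed to `spark.sql(...)`, which fits the notebook's approach of combining Spark and SQL.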

## 4.0 Data Modeling

### 4.1 Courses Table

The table `tb_curso` is a **domain table** that stores the information about all available courses (the product). Its rows are added manually.
* We will use the **md5()** function to create the **curso_uuid** column from the course name
* The **data_carga** column contains the processing date

In [0]:
# Create the domain table only if it does not exist yet; each SELECT below defines one course row
spark.sql("""
    CREATE TABLE IF NOT EXISTS silver.tb_curso 
    AS
        SELECT 
            md5('Construindo o seu Primeiro Pipeline de Dados com o Databricks') AS curso_uuid,
            'Construindo o seu Primeiro Pipeline de Dados com o Databricks' AS nome_curso,
            'beginner' AS nivel_curso,
            589.90 AS valor_curso,
            getdate() AS data_carga

        UNION 

        SELECT 
            md5('Do Primeiro Pipeline ao Data Lakehouse com o Databricks') AS curso_uuid,
            'Do Primeiro Pipeline ao Data Lakehouse com o Databricks' AS nome_curso,
            'intermediate' AS nivel_curso,
            659.90 AS valor_curso,
            getdate() AS data_carga

        UNION 

        SELECT 
            md5('Construindo Pipelines de Dados usando o Spark Structured Streaming') AS curso_uuid,
            'Construindo Pipelines de Dados usando o Spark Structured Streaming' AS nome_curso,
            'intermediate' AS nivel_curso,
            549.90 AS valor_curso,
            getdate() AS data_carga
""")

spark.sql('SELECT * FROM silver.tb_curso').display()
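To see what the **curso_uuid** values look like outside Spark, the same hash can be reproduced with Python's standard library. This is a minimal sketch that assumes Spark's `md5()` hashes the UTF-8 bytes of the string and returns a lowercase hex digest; the helper name `course_uuid` is hypothetical.

```python
import hashlib


def course_uuid(course_name: str) -> str:
    """Return the hex MD5 digest of the UTF-8 encoded course name."""
    return hashlib.md5(course_name.encode("utf-8")).hexdigest()


course_names = [
    "Construindo o seu Primeiro Pipeline de Dados com o Databricks",
    "Do Primeiro Pipeline ao Data Lakehouse com o Databricks",
    "Construindo Pipelines de Dados usando o Spark Structured Streaming",
]

# The same name always yields the same 32-character hex string
for name in course_names:
    print(course_uuid(name), name)
```

Since the hash depends only on the course name, renaming a course would change its **curso_uuid**, which is worth keeping in mind when the table is maintained manually.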