<img src="https://github.com/richardcerny/bricksflow/raw/rc-bricksflow2.1/docs/img/databricks_icon.png?raw=true" width=100/>
# Bricksflow example 3.

## Bricksflow Development flow - DEMO
Bricksflow pipelines are developed following a standard process. The aim is to apply software engineering practices while still working interactively in Databricks notebooks. Follow the schema below when using Bricksflow.

For more information about the process, see the Bricksflow user training video: [Link](https://web.microsoftstream.com/video/e8e3ed9b-7944-4ea2-b314-8f0694853fcf)
<img src="https://github.com/richardcerny/bricksflow/raw/rc-bricksflow2.1/docs/img/development-flow.png?raw=true" width=1200/>

## Common console commands
- `console dbx:deploy --env=dev` to upload notebooks & configs from local to Databricks
- `console dbx:workspace:export --env=dev` to download notebooks from Databricks to local

Tip: Run `console` to get a list of the available commands.

In [0]:
%run ../../../app/install_master_package

In [0]:
from pyspark.sql import functions as F
from datetime import datetime
from logging import Logger
from datalakebundle.table.TableManager import TableManager
from pyspark.sql import SparkSession
from pyspark.sql.dataframe import DataFrame
from databricksbundle.notebook.decorators import dataFrameLoader, transformation, dataFrameSaver
from datalakebundle.table.TableNames import TableNames

In [0]:
@dataFrameLoader(display=False)
def read_table_bronze_covid_tbl_template_1_mask_usage(spark: SparkSession, tableNames: TableNames):
    return (
        spark
            .read
            .table(tableNames.getByAlias('bronze_covid.tbl_template_1_mask_usage'))
    )

In [0]:
@transformation(read_table_bronze_covid_tbl_template_1_mask_usage, display=False)
def add_execution_datetime(df: DataFrame):
    return (
        df
            # datetime.now() is evaluated once on the driver, so every row gets the same timestamp
            .withColumn('EXECUTE_DATETIME', F.lit(datetime.now()))
    )
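
The cells above chain a loader and a transformation by passing the upstream function to the downstream decorator. The following is a minimal plain-Python sketch of that idea, not Bricksflow's actual implementation: `sketch_loader`, `sketch_transformation` and the `result` attribute are hypothetical names, and plain lists of dicts stand in for Spark DataFrames.

```python
from typing import Any, Callable


def sketch_loader() -> Callable:
    """Hypothetical loader decorator: runs the function once and keeps its result."""
    def decorator(fn: Callable[[], Any]) -> Callable:
        fn.result = fn()  # assumption: the result is stored on the function object
        return fn
    return decorator


def sketch_transformation(*upstream: Callable) -> Callable:
    """Hypothetical transformation decorator: feeds upstream results into the function."""
    def decorator(fn: Callable[..., Any]) -> Callable:
        inputs = [u.result for u in upstream]  # results of the functions named in the decorator
        fn.result = fn(*inputs)
        return fn
    return decorator


@sketch_loader()
def read_rows():
    # Two rows taken from the sample output of this notebook
    return [
        {"COUNTYFP": 1001, "ALWAYS": 0.444},
        {"COUNTYFP": 1007, "ALWAYS": 0.572},
    ]


@sketch_transformation(read_rows)
def add_flag(rows):
    # Adds a derived column, analogous to withColumn on a DataFrame
    return [{**row, "MOSTLY_ALWAYS": row["ALWAYS"] > 0.5} for row in rows]


print(add_flag.result)
```

In this sketch each decorated function runs as soon as its cell is executed, which is what makes it possible to inspect intermediate results interactively.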

In [0]:
@transformation("%myparameter.myvalue%", add_execution_datetime_df, display=True)  # TODO: bug, "_df" must be appended when passing the function name
def add_parameter_from_config(config_yaml_parameter, df: DataFrame):
    print(config_yaml_parameter)
    return (
        df
            .withColumn('CONFIG_YAML_PARAMETER', F.lit(config_yaml_parameter))  # TODO: pipelineParams.randomVariable pipelineParams: Box
    )

COUNTYFP,NEVER,RARELY,SOMETIMES,FREQUENTLY,ALWAYS,INSERT_TS,EXECUTE_DATETIME,CONFIG_YAML_PARAMETER
1001,0.053,0.074,0.134,0.295,0.444,2021-01-13T09:32:20.557+0000,2021-01-13T09:37:17.965+0000,This is a sample string config value
1003,0.083,0.059,0.098,0.323,0.436,2021-01-13T09:32:20.557+0000,2021-01-13T09:37:17.965+0000,This is a sample string config value
1005,0.067,0.121,0.12,0.201,0.491,2021-01-13T09:32:20.557+0000,2021-01-13T09:37:17.965+0000,This is a sample string config value
1007,0.02,0.034,0.096,0.278,0.572,2021-01-13T09:32:20.557+0000,2021-01-13T09:37:17.965+0000,This is a sample string config value
1009,0.053,0.114,0.18,0.194,0.459,2021-01-13T09:32:20.557+0000,2021-01-13T09:37:17.965+0000,This is a sample string config value
1011,0.031,0.04,0.144,0.286,0.5,2021-01-13T09:32:20.557+0000,2021-01-13T09:37:17.965+0000,This is a sample string config value
1013,0.102,0.053,0.257,0.137,0.451,2021-01-13T09:32:20.557+0000,2021-01-13T09:37:17.965+0000,This is a sample string config value
1015,0.152,0.108,0.13,0.167,0.442,2021-01-13T09:32:20.557+0000,2021-01-13T09:37:17.965+0000,This is a sample string config value
1017,0.117,0.037,0.15,0.136,0.56,2021-01-13T09:32:20.557+0000,2021-01-13T09:37:17.965+0000,This is a sample string config value
1019,0.135,0.027,0.161,0.158,0.52,2021-01-13T09:32:20.557+0000,2021-01-13T09:37:17.965+0000,This is a sample string config value
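
The output above shows `CONFIG_YAML_PARAMETER` filled with the string behind the `%myparameter.myvalue%` placeholder. As a rough illustration of how such a dotted placeholder could be looked up, here is a plain-Python sketch. It assumes the parsed config is a nested mapping with a `myparameter` key containing `myvalue`; `resolve_placeholder` is a hypothetical helper and not the resolver Bricksflow uses.

```python
import re
from typing import Any, Mapping

# Matches strings of the form "%some.dotted.path%"
PLACEHOLDER = re.compile(r"^%([\w.]+)%$")


def resolve_placeholder(value: Any, config: Mapping[str, Any]) -> Any:
    """Return the config value named by '%a.b%', or the value unchanged if it is not a placeholder."""
    if not isinstance(value, str):
        return value
    match = PLACEHOLDER.match(value)
    if match is None:
        return value
    node: Any = config
    for key in match.group(1).split("."):
        node = node[key]  # raises KeyError when the path is missing from the config
    return node


# Assumed shape of the parsed YAML config; the string is the one shown in the output above
config = {"myparameter": {"myvalue": "This is a sample string config value"}}

resolved = resolve_placeholder("%myparameter.myvalue%", config)
print(resolved)
```

Raising `KeyError` on a missing path is a deliberate choice in this sketch: a misspelled parameter name fails loudly instead of silently writing the raw placeholder into the table.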


In [0]:
@dataFrameSaver(add_parameter_from_config)
def save_table_silver_covid_tbl_template_3_mask_usage(df: DataFrame, logger: Logger, tableNames: TableNames, tableManager: TableManager):
    # Resolve the physical table name from its alias
    outputTableName = tableNames.getByAlias('silver_covid.tbl_template_3_mask_usage')
    # Create the target table on the first run; otherwise append to it
    if tableManager.exists('silver_covid.tbl_template_3_mask_usage'):
        logger.info(f"Table {outputTableName} exists. Appending...")
    else:
        tableManager.create('silver_covid.tbl_template_3_mask_usage')

    logger.info(f"Saving data to table: {outputTableName}")
    (
        df
            .select(
                'COUNTYFP',
                'NEVER',
                'RARELY',
                'SOMETIMES',
                'FREQUENTLY',
                'ALWAYS',
                'EXECUTE_DATETIME',
                'CONFIG_YAML_PARAMETER',
            )
            .write
            .option('partitionOverwriteMode', 'dynamic')
            .insertInto(outputTableName)
    )