### Stream Customers data from cloud to Delta Lake
#### 1. Read files from cloud storage using DataStreamReader API
#### 2. Transform the dataframe to add the following columns:
- ##### Cloud file_path
- ##### Ingestion_date
#### 3. Write the transformed data stream to a Delta Lake table.

#### 1. Read files from cloud storage using DataStreamReader API 

- ##### Streaming reads from file sources do not infer the schema by default, so the schema must be defined explicitly.

In [0]:
from pyspark.sql.types import StructType, StructField, IntegerType, DateType, TimestampType, StringType

customers_schema = StructType(
                              fields = [StructField("Customer_Id", IntegerType()),
                                        StructField("Customer_Name", StringType()),
                                        StructField("date_of_birth", DateType()),
                                        StructField("telephone", StringType()),
                                        StructField("email", StringType()),
                                        StructField("member_since", DateType()),
                                        StructField("created_timestamp", TimestampType())
                                        ]
                              )
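
The same schema can also be written as a DDL string, which `.schema()` accepts in place of a `StructType`. The sketch below is only an illustration: `customers_schema_ddl` and `read_customers_stream` are hypothetical names that are not part of this notebook, and `spark` is assumed to be the session the notebook provides.

```python
# DDL-string form of the schema; column names and types mirror customers_schema.
customers_schema_ddl = ", ".join([
    "Customer_Id INT",
    "Customer_Name STRING",
    "date_of_birth DATE",
    "telephone STRING",
    "email STRING",
    "member_since DATE",
    "created_timestamp TIMESTAMP",
])


def read_customers_stream(spark, path):
    """Hypothetical helper: open a JSON stream at `path` using the DDL schema."""
    return (spark
            .readStream
            .format('json')
            .schema(customers_schema_ddl)  # a DDL string is accepted here as well
            .load(path))
```

The DDL form is shorter, while the `StructType` form makes it easier to build or modify the schema programmatically.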

In [0]:
df_customers_stream = (spark
                        .readStream
                        .format('json')
                        .schema(customers_schema)
                        .load('/Volumes/gizmobox/landing/operations_volume/customers_stream/')
                      )
display(df_customers_stream)

#### 2. Transform the dataframe to add the following columns:
- ##### Cloud file_path
- ##### Ingestion_date

In [0]:
from pyspark.sql.functions import col, current_timestamp

df_customers_stream_transformed = (df_customers_stream
                                                    .withColumn('file_path', col('_metadata.file_path'))
                                                    .withColumn('ingestion_date', current_timestamp())
                                  )
display(df_customers_stream_transformed)

#### 3. Write the transformed data stream to a Delta Lake table.

In [0]:
customers_streaming_query = (df_customers_stream_transformed
                                      .writeStream
                                      .format('delta')
                                      .option('checkpointLocation', '/Volumes/gizmobox/landing/operations_volume/customers_stream/_customers_checkpoint_location')
                                      .toTable('gizmobox.bronze.customers_stream')
                            )
display(customers_streaming_query)
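
While the query is running, the object returned by `.toTable()` can be inspected to check on its progress. A minimal sketch, assuming a running `StreamingQuery`; the helper name `summarize_query` is hypothetical, while `id`, `isActive`, `status` and `lastProgress` are attributes of the PySpark `StreamingQuery` class.

```python
def summarize_query(query):
    """Hypothetical helper: collect a few StreamingQuery attributes into a dict."""
    return {
        'id': query.id,                       # unique id of the query
        'is_active': query.isActive,          # True while the query is running
        'status': query.status,               # current status of the query
        'last_progress': query.lastProgress,  # most recent progress update, or None
    }
```

In this notebook it could be called as `summarize_query(customers_streaming_query)`.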

- #### With the default trigger, a writeStream query does not stop automatically; it has to be terminated manually, either by clicking the `Terminate` icon in the query block or by calling `.stop()` on the query.

In [0]:
customers_streaming_query.stop()
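
If the stream only needs to process the files that are already available and then finish, the `availableNow` trigger is one alternative to stopping the query by hand. The sketch below assumes a Spark version that supports `trigger(availableNow=True)` (3.3 or later); `write_available_now` is a hypothetical helper, not part of this notebook.

```python
def write_available_now(df, checkpoint_location, table_name):
    """Hypothetical helper: write all currently available data, then stop.

    Assumes `df` is a streaming DataFrame and that the Spark version
    supports the availableNow trigger.
    """
    query = (df
             .writeStream
             .format('delta')
             .option('checkpointLocation', checkpoint_location)
             .trigger(availableNow=True)  # process what is available, then terminate
             .toTable(table_name))
    query.awaitTermination()              # block until the query has finished
    return query
```

It could be called with `df_customers_stream_transformed`, the checkpoint path and the table name used above. Because the checkpoint records which files were already processed, a later run should pick up only the files that arrived since.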