# Stream Customer Data from Cloud Files to Delta Lake Using Auto Loader
1. Read files from cloud storage using Auto Loader
2. Transform the DataFrame to add the following columns
- file_path: cloud file path
- ingestion_date: current timestamp
3. Write the transformed data stream to a Delta Lake table

### 1. Read files from cloud storage using Auto Loader

In [0]:
customers_df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Where Auto Loader stores the inferred schema
    .option("cloudFiles.schemaLocation", "/Volumes/gizmobox/landing/operational_data/customers_autoloader/_schema")
    # Infer column types instead of reading every JSON field as a string
    .option("cloudFiles.inferColumnTypes", "true")
    # Override the inferred types for these columns
    .option("cloudFiles.schemaHints", "date_of_birth date, member_since date, created_timestamp timestamp")
    # Keep the schema fixed; unexpected fields are captured in _rescued_data
    .option("cloudFiles.schemaEvolutionMode", "rescue")
    .load("/Volumes/gizmobox/landing/operational_data/customers_autoloader/")
)
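
Because the stream is read in `rescue` mode, fields that do not match the tracked schema are not dropped; they are collected in Auto Loader's rescued data column. A minimal sketch for isolating those rows, assuming the default column name `_rescued_data` has not been overridden (the helper name `rows_with_rescued_data` is hypothetical):

```python
def rows_with_rescued_data(df):
    """Return only the rows for which Auto Loader rescued unexpected data."""
    # Assumption: the rescued data column keeps its default name, `_rescued_data`.
    # The column is null when every field in the record matched the schema.
    return df.filter("_rescued_data IS NOT NULL")


# In a Databricks notebook this could be inspected with, for example:
# display(rows_with_rescued_data(customers_df))
```

A non-empty result suggests the source files carry columns or types that the schema hints do not yet cover.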

### 2. Transform the DataFrame to add the following columns
- file_path: cloud file path
- ingestion_date: current timestamp

In [0]:
from pyspark.sql.functions import col, current_timestamp

# Record the source file and the ingestion time for every row
customers_transformed_df = (
    customers_df
    .withColumn("file_path", col("_metadata.file_path"))
    .withColumn("ingestion_date", current_timestamp())
)
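
The hidden `_metadata` column exposes more than the file path. The sketch below is one possible extension, not part of the original pipeline: it adds the file name and modification time through SQL expressions. The helper name `add_file_metadata` and the output column names `file_name` and `file_modified_at` are illustrative assumptions.

```python
def add_file_metadata(df):
    """Append extra source-file details taken from the hidden _metadata column."""
    return df.selectExpr(
        "*",  # keep every existing column
        "_metadata.file_name AS file_name",
        "_metadata.file_modification_time AS file_modified_at",
    )


# Example (applied to the source stream, where _metadata is available):
# customers_with_file_info_df = add_file_metadata(customers_df)
```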

### 3. Write the transformed data stream to a Delta Lake table

In [0]:
streaming_query = (
    customers_transformed_df.writeStream
    .format("delta")
    .option("checkpointLocation", "/Volumes/gizmobox/landing/operational_data/customers_autoloader/_checkpoint_stream")
    .option("mergeSchema", "true")
    .toTable("gizmobox.bronze.customers_autoloader")
)
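
With no trigger set, the query above keeps running until it is stopped. For scheduled, batch-style runs, a common alternative is the `availableNow` trigger, which processes the files currently available and then terminates. The following is a sketch under that assumption; the function name `write_available_now` and its parameters are hypothetical, and `trigger(availableNow=True)` requires a Spark version that supports it (3.3 or later).

```python
def write_available_now(df, table_name, checkpoint_path):
    """Process all files currently available, then let the stream finish."""
    query = (
        df.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .option("mergeSchema", "true")
        .trigger(availableNow=True)  # drain the current backlog, then terminate
        .toTable(table_name)
    )
    query.awaitTermination()  # block until the backlog has been written
    return query


# Example:
# write_available_now(
#     customers_transformed_df,
#     "gizmobox.bronze.customers_autoloader",
#     "/Volumes/gizmobox/landing/operational_data/customers_autoloader/_checkpoint_stream",
# )
```

Because the checkpoint records which files were already ingested, re-running this function should pick up only files that arrived since the previous run.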

In [0]:
streaming_query.stop()
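
`streaming_query.stop()` stops only the query held in that variable. If the write cell was re-run and an earlier handle was lost, the active queries can still be reached through `spark.streams.active`. A small sketch, with `stop_active_streams` as a hypothetical helper name:

```python
def stop_active_streams(spark):
    """Stop every active streaming query on this session and return their ids."""
    stopped = []
    for query in spark.streams.active:  # list of currently active StreamingQuery objects
        query.stop()
        stopped.append(str(query.id))
    return stopped


# Example:
# stop_active_streams(spark)
```

Note that this stops all streams on the session, not only the customers stream, so it is best suited to interactive cleanup.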

In [0]:
%sql
select *
from gizmobox.bronze.customers_autoloader;
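
Beyond selecting all rows, the `file_path` column added in step 2 makes it easy to check which files have been ingested. A minimal sketch in Python, assuming the table name used above (the helper name `rows_per_file` is hypothetical):

```python
def rows_per_file(spark, table_name="gizmobox.bronze.customers_autoloader"):
    """Return a DataFrame with the number of ingested rows per source file."""
    return spark.table(table_name).groupBy("file_path").count()


# Example:
# display(rows_per_file(spark))
```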