
### DLT Notes

We should not develop transformations directly in a DLT pipeline, because we cannot see the data while writing the code.
We can only see the data once the pipeline runs in development mode, which makes it hard to check each transformation as we write it.

Instead: read the DataFrame in a regular notebook, make the required transformations, and then convert the code into DLT.

Note: We cannot run a DLT pipeline within the notebook. Running DLT code on a regular cluster fails with the message: "The Delta Live Tables (DLT) module is not supported on this cluster. You should either create a new pipeline or use an existing pipeline to run DLT code."

We can create three things in DLT:
- Streaming table
- Materialized view
- Views (normal or streaming)

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *

In [0]:
#Read the dataframe

df = spark.read.format("delta")\
          .load("/Volumes/workspace/bronze/bronzevolume/bookings/data")

display(df)

In [0]:
#Make required Transformations

df = df.withColumn("amount",col("amount").cast(DoubleType()))\
        .withColumn("ModifiedDate",current_timestamp())\
        .withColumn("booking_date",to_date(col("booking_date")))\
        .drop("_rescued_data")

display(df)

### Convert the code into DLT

In [0]:
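#Load Data Incrementally

import dlt  # only available when this code runs as part of a DLT pipeline

@dlt.table(
    name="stage_bookings"
)
def stage_bookings():
    # Read the bronze Delta data as a stream so new records are picked up incrementally
    df = spark.readStream.format("delta")\
            .load("/Volumes/workspace/bronze/bronzevolume/bookings/data")
    return df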

In [0]:
#Create Streaming View

@dlt.view(
  name = "trans_bookings"
)
def trans_bookings():

    df = spark.readStream.table("stage_bookings")
    df = df.withColumn("amount",col("amount").cast(DoubleType()))\
            .withColumn("ModifiedDate",current_timestamp())\
            .withColumn("booking_date",to_date(col("booking_date")))\
            .drop("_rescued_data")

    return df

In [0]:
#Create Dictionary for rules

rules = {
    "rule1" : "booking_id IS NOT NULL",
    "rule2" : "passenger_id IS NOT NULL"
}
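
Before moving the rules into the pipeline, it can help to preview in the notebook which rows would violate them. A minimal sketch, assuming the `rules` dictionary above: the helper name `combine_rules` is hypothetical and not part of DLT, and it only builds a SQL predicate string in plain Python.

```python
# Hypothetical helper (not part of DLT): AND together all rule expressions.
rules = {
    "rule1": "booking_id IS NOT NULL",
    "rule2": "passenger_id IS NOT NULL",
}

def combine_rules(rules: dict) -> str:
    """Wrap each SQL rule in parentheses and join the rules with AND."""
    return " AND ".join(f"({rule_sql})" for rule_sql in rules.values())

valid_predicate = combine_rules(rules)
print(valid_predicate)
# (booking_id IS NOT NULL) AND (passenger_id IS NOT NULL)
```

In the exploration notebook, something like `df.filter(f"NOT ({valid_predicate})")` should then list the rows that break at least one rule. This holds for rules like these that never evaluate to NULL; rules that can return NULL would need extra care.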

In [0]:
#Write data from Streaming view into a Streaming Table

@dlt.table(
    name = "silver_bookings"
)
#@dlt.expect_all(rules) #Keeps records that violate the rules and only reports the violations
@dlt.expect_all_or_drop(rules) #Drops records that violate any of the rules
#@dlt.expect_all_or_fail(rules) #Fails the pipeline update if any record violates the rules
def silver_bookings():

    df = dlt.read_stream("trans_bookings")
    return df
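
The three expectation decorators differ only in what happens to a record that breaks a rule. The following is a plain-Python simulation of that behavior, not the DLT API: the function `apply_expectations`, the mode names `warn`, `drop` and `fail`, and the lambda predicates standing in for the SQL rules are all assumptions made for illustration.

```python
# Plain-Python simulation of the three expectation modes (illustration only).
# Lambdas stand in for the SQL rule expressions; nothing here calls DLT.
rules = {
    "rule1": lambda row: row.get("booking_id") is not None,
    "rule2": lambda row: row.get("passenger_id") is not None,
}

def apply_expectations(rows, rules, mode):
    """Return (kept_rows, violation_count) for mode 'warn', 'drop' or 'fail'."""
    if mode not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown mode: {mode}")
    kept, violations = [], 0
    for row in rows:
        if all(check(row) for check in rules.values()):
            kept.append(row)
            continue
        violations += 1
        if mode == "fail":
            # Stand-in for expect_all_or_fail: stop at the first bad record
            raise ValueError(f"record violates expectations: {row}")
        if mode == "warn":
            # Stand-in for expect_all: keep the record, only count the violation
            kept.append(row)
        # mode == "drop" (stand-in for expect_all_or_drop): record is not kept
    return kept, violations

rows = [
    {"booking_id": 1, "passenger_id": 10},
    {"booking_id": None, "passenger_id": 11},
]
warn_rows, warn_violations = apply_expectations(rows, rules, "warn")
drop_rows, drop_violations = apply_expectations(rows, rules, "drop")
print(len(warn_rows), warn_violations)  # 2 1
print(len(drop_rows), drop_violations)  # 1 1
```

In the pipeline itself the choice is made by picking one decorator. Here `expect_all_or_drop` is active, so `silver_bookings` should only contain rows where both IDs are present.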