## Create a Databricks notebook to load a Parquet file into the order Delta table

In [0]:
# Read the transaction Parquet file into a Spark DataFrame
filePath = "dbfs:/FileStore/GlobalRetail/bronze_layer/transaction/transaction_snappy.parquet"
df = spark.read.parquet(filePath)
display(df)

In [0]:
# Convert 'transaction_date' column to timestamp type
from pyspark.sql.functions import to_timestamp, col
new_df = df.withColumn("transaction_date", to_timestamp(col("transaction_date")))
display(new_df)

In [0]:
# Add ingestion timestamp column to the DataFrame
from pyspark.sql.functions import current_timestamp
final_df = new_df.withColumn("ingestion_timestamp", current_timestamp())
display(final_df)

- We store the data in a Delta Lake table, which serves as the foundational table format.
- Delta Lake enables us to insert, modify, merge, and remove data, while also supporting ACID transactions.
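
As a rough illustration of the merge capability mentioned above, the sketch below builds a Delta Lake `MERGE INTO` statement as a plain string. The key column `transaction_id` and the view name `transaction_updates` are hypothetical; they do not appear in this notebook and stand in for whatever uniquely identifies a transaction in your data.

```python
def build_merge_sql(target: str, source: str, key: str) -> str:
    """Return a MERGE statement that upserts rows from `source` into `target`."""
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical names: `transaction_updates` (temp view) and `transaction_id` (key column)
merge_sql = build_merge_sql("bronze_transactions", "transaction_updates", "transaction_id")
print(merge_sql)

# In a Databricks notebook, where `spark` is predefined, this could be run as:
# final_df.createOrReplaceTempView("transaction_updates")
# spark.sql(merge_sql)
```

This notebook itself only appends, which suits a bronze layer that keeps raw history; a merge like this would more likely be used downstream, where duplicates need to be resolved.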

In [0]:
# Write the DataFrame to a Delta table in append mode
spark.sql("use globalretail_bronze")
final_df.write.format("delta").mode("append").saveAsTable("bronze_transactions")

In [0]:
# Preview up to 180 rows of the Delta table to verify the load
spark.sql("select * from bronze_transactions limit 180").show()

- After loading the data from our Parquet file into the Delta Lake table, we move the processed file from the current folder to an archive folder so that it is not reprocessed.

In [0]:
# Generate an archive file path with the current timestamp for the processed transaction file
import datetime
archive_folder = "dbfs:/FileStore/GlobalRetail/bronze_layer/transaction/archive/"
# %S is seconds (00-59); lowercase %s is non-standard and platform-dependent
archive_filepath = archive_folder + '_' + datetime.datetime.now().strftime("%Y%m%d%H%M%S")
dbutils.fs.mv(filePath, archive_filepath)
print(archive_filepath)
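
The archive cell above names the archived file by timestamp only, so the original file name and extension are lost. One possible variant, sketched here with only the standard library, keeps both. The helper `build_archive_path` is a hypothetical name introduced for illustration, not part of the original notebook.

```python
import datetime
import posixpath


def build_archive_path(source_path, archive_folder, now=None):
    """Return an archive path of the form <folder>/<name>_<timestamp><ext>."""
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y%m%d%H%M%S")
    # Split "transaction_snappy.parquet" into its stem and extension
    stem, ext = posixpath.splitext(posixpath.basename(source_path))
    return posixpath.join(archive_folder, f"{stem}_{stamp}{ext}")


# A fixed timestamp is passed here only to make the example reproducible
example = build_archive_path(
    "dbfs:/FileStore/GlobalRetail/bronze_layer/transaction/transaction_snappy.parquet",
    "dbfs:/FileStore/GlobalRetail/bronze_layer/transaction/archive/",
    now=datetime.datetime(2024, 1, 31, 9, 5, 7),
)
print(example)
```

In the notebook, the result could then be passed to `dbutils.fs.mv(filePath, ...)` in place of `archive_filepath`.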