### Spark session configuration
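This cell enables two Fabric features that optimize how subsequent cells write Delta tables and how those tables are read. V-Order applies read-optimized sorting and encoding to Parquet files at write time. Optimize Write reduces the number of small files by writing fewer, larger ones. The third setting sets the Optimize Write bin size to 1073741824 bytes (1 GiB).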
Run this cell.

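```python
# Enable V-Order for Parquet files written in this session
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
# Enable Optimize Write for Delta tables
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")
# Set the Optimize Write bin size to 1073741824 bytes (1 GiB)
spark.conf.set("spark.microsoft.delta.optimizeWrite.binSize", "1073741824")
```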
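The `binSize` value is a byte count: 1073741824 is 1024³ bytes, or 1 GiB. The following is a minimal sketch that derives that value instead of hard-coding it and applies all three settings through any setter function. The names `gib_to_bytes`, `fabric_write_settings`, and `apply_settings` are hypothetical helpers, not part of Fabric or Spark; in the notebook you would pass `spark.conf.set` as the setter.

```python
from typing import Callable, Dict


def gib_to_bytes(gib: int) -> int:
    """Convert gibibytes to bytes (1 GiB = 1024**3 bytes)."""
    return gib * 1024 ** 3


def fabric_write_settings(bin_size_gib: int = 1) -> Dict[str, str]:
    """Return the three settings from the configuration cell.

    Values are kept as strings, matching the original cell.
    """
    return {
        "spark.sql.parquet.vorder.enabled": "true",
        "spark.microsoft.delta.optimizeWrite.enabled": "true",
        "spark.microsoft.delta.optimizeWrite.binSize": str(gib_to_bytes(bin_size_gib)),
    }


def apply_settings(setter: Callable[[str, str], None], settings: Dict[str, str]) -> None:
    """Apply each key/value pair with the given setter (e.g. spark.conf.set)."""
    for key, value in settings.items():
        setter(key, value)


# A plain dict stands in for the Spark session configuration here
applied: Dict[str, str] = {}
apply_settings(applied.__setitem__, fabric_write_settings())
print(applied["spark.microsoft.delta.optimizeWrite.binSize"])  # prints 1073741824
```

In a notebook attached to a Spark session, the equivalent call would be `apply_settings(spark.conf.set, fabric_write_settings())`.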

### Fact - Sale
This cell reads the raw Parquet sales data from `Files/wwi-raw-data/full/fact_sale_1y_full`, derives `Year`, `Quarter`, and `Month` columns from `InvoiceDateKey`, and overwrites the `fact_sale` Delta table, partitioned by `Year` and `Quarter`.
Run this cell.

```python
from pyspark.sql.functions import col, year, month, quarter

table_name = 'fact_sale'

# Read the raw sales fact data from the lakehouse Files area
df = spark.read.format("parquet").load('Files/wwi-raw-data/full/fact_sale_1y_full')

# Derive date-part columns from the invoice date
df = df.withColumn('Year', year(col("InvoiceDateKey")))
df = df.withColumn('Quarter', quarter(col("InvoiceDateKey")))
df = df.withColumn('Month', month(col("InvoiceDateKey")))

# Overwrite the Delta table, partitioned by Year and Quarter
df.write.mode("overwrite").format("delta").partitionBy("Year", "Quarter").save("Tables/" + table_name)
```
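To make the derived columns and the partition layout concrete, here is a plain-Python sketch of what `year`, `quarter`, and `month` return for one invoice date, and of the Hive-style `Year=<year>/Quarter=<quarter>` directory that a row lands in when the table is partitioned by `Year` and `Quarter`. The helpers `date_parts` and `partition_path`, and the sample date, are hypothetical and only illustrate the logic; Spark and Delta compute these for you. The quarter is derived as `(month - 1) // 3 + 1`.

```python
from datetime import date
from typing import Tuple


def date_parts(invoice_date: date) -> Tuple[int, int, int]:
    """Return (Year, Quarter, Month) the way Spark's year/quarter/month would."""
    quarter = (invoice_date.month - 1) // 3 + 1  # months 1-3 -> 1, ..., 10-12 -> 4
    return invoice_date.year, quarter, invoice_date.month


def partition_path(table_root: str, invoice_date: date) -> str:
    """Build the Hive-style partition directory for the Year and Quarter columns."""
    year, quarter, _ = date_parts(invoice_date)
    return f"{table_root}/Year={year}/Quarter={quarter}"


sample = date(2000, 5, 17)  # hypothetical invoice date
print(date_parts(sample))                          # (2000, 2, 5)
print(partition_path("Tables/fact_sale", sample))  # Tables/fact_sale/Year=2000/Quarter=2
```

Because `Month` is not a partition column, it is stored as an ordinary column inside the data files; filters on `Year` and `Quarter` are the ones that let Spark skip whole partition directories.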

### Dimensions
This cell loads the dimension tables, which provide descriptive context for the fact table. It defines a function that reads a table's raw Parquet data, drops the unused `Photo` column (for tables without that column, `drop` has no effect), and overwrites the Delta table of the same name.
It then calls the function for `dimension_city`, `dimension_customer`, `dimension_date`, `dimension_employee`, and `dimension_stock_item`.
Run this cell.

```python
def loadFullDataFromSource(table_name):
    # Read the raw dimension data for this table
    df = spark.read.format("parquet").load('Files/wwi-raw-data/full/' + table_name)
    # Drop the unused Photo column (no-op if the table has none)
    df = df.drop("Photo")
    # Overwrite the Delta table with the same name
    df.write.mode("overwrite").format("delta").save("Tables/" + table_name)

full_tables = [
    'dimension_city',
    'dimension_customer',
    'dimension_date',
    'dimension_employee',
    'dimension_stock_item'
]

# Load each dimension table
for table in full_tables:
    loadFullDataFromSource(table)
```
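The loop applies the same read, drop, and write steps to every table, varying only the name. Here is a minimal sketch of that pattern with the Spark calls replaced by plain data so the logic can be checked on its own. `plan_load` builds the source and target paths that `loadFullDataFromSource` uses, and `drop_column` mirrors how `DataFrame.drop` with a column name leaves the columns unchanged when that column is absent. Both helpers and the sample column lists are hypothetical.

```python
from typing import List, NamedTuple


class LoadPlan(NamedTuple):
    source: str  # raw Parquet folder the notebook reads
    target: str  # Delta table path the notebook writes


def plan_load(table_name: str) -> LoadPlan:
    """Return the paths loadFullDataFromSource uses for one table."""
    return LoadPlan(
        source="Files/wwi-raw-data/full/" + table_name,
        target="Tables/" + table_name,
    )


def drop_column(columns: List[str], name: str) -> List[str]:
    """Remove a column by name; like DataFrame.drop, a missing name is ignored."""
    return [c for c in columns if c != name]


full_tables = [
    "dimension_city",
    "dimension_customer",
    "dimension_date",
    "dimension_employee",
    "dimension_stock_item",
]

for plan in map(plan_load, full_tables):
    print(plan.source, "->", plan.target)

# Hypothetical column lists: one with a Photo column, one without
print(drop_column(["ID", "Name", "Photo"], "Photo"))  # ['ID', 'Name']
print(drop_column(["ID", "Name"], "Photo"))           # ['ID', 'Name']
```

Keeping the per-table logic in one function means adding another dimension only requires appending its name to `full_tables`.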