# Gold Layer – Sales Aggregation for Reporting

This notebook aggregates cleaned transaction data from the Silver layer into key business metrics for analytical use.  
The following fields are computed:

- Total daily sales (`TotalSales`)
- Number of items sold per day (`TotalItems`)
- Average order value (`AvgOrderValue`)

The resulting dataset is stored as a Gold Delta table named `gold_sales`, with one row per day, so that business intelligence tools can query it without rescanning transaction-level data.


## Step 1: Load Cleaned Data from Silver Table

The cleaned and validated dataset is loaded from the Delta table `silver_sales`.  
This dataset serves as the basis for generating aggregated business metrics in the Gold layer.
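
The aggregation in Step 2 relies on four columns of the Silver table: `InvoiceDate`, `SalesAmount`, `Quantity`, and `Invoice`. A quick check right after loading can surface a schema mismatch early. The helper below is a minimal sketch in plain Python; `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
# Hypothetical helper: these names are illustrative, not from the notebook.
REQUIRED_COLUMNS = ("InvoiceDate", "SalesAmount", "Quantity", "Invoice")

def missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]

# Example with a made-up column list that lacks SalesAmount
example = missing_columns(["Invoice", "InvoiceDate", "Quantity", "Price"])
print(example)  # ['SalesAmount']
```

In the notebook, the same function could be called as `missing_columns(df_silver.columns)`, with an empty list meaning all expected columns are present.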


In [0]:
# Load the Silver Delta table
df_silver = spark.read.format("delta").table("silver_sales")

# Preview the data
df_silver.display()


## Step 2: Aggregate Key Sales Metrics by Day

Business-level sales metrics are generated by aggregating cleaned transaction data on a daily basis.  
The `InvoiceDate` column is converted to a date type with `to_date`, which drops any time component so that rows can be grouped by calendar day.  
The following indicators are computed for each day:

- `TotalSales`: Total revenue generated
- `TotalItems`: Total number of items sold
- `AvgOrderValue`: Average revenue per order, computed as `TotalSales` divided by the number of distinct `Invoice` values that day
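
These three definitions can be checked by hand on a few rows. The sketch below uses plain Python and made-up line items (the values are illustrative, not taken from `silver_sales`) to show that `AvgOrderValue` divides daily revenue by the number of distinct invoices rather than by the number of line items. The function name `daily_metrics` is hypothetical.

```python
from collections import defaultdict

# Hypothetical line items: (InvoiceDate, Invoice, Quantity, SalesAmount)
rows = [
    ("2024-01-01", "A1", 2, 20.0),
    ("2024-01-01", "A1", 1, 10.0),
    ("2024-01-01", "A2", 3, 30.0),
    ("2024-01-02", "B1", 4, 50.0),
]

def daily_metrics(rows):
    """Aggregate line items into per-day TotalSales, TotalItems, AvgOrderValue."""
    sales = defaultdict(float)
    items = defaultdict(int)
    invoices = defaultdict(set)
    for date, invoice, quantity, amount in rows:
        sales[date] += amount
        items[date] += quantity
        invoices[date].add(invoice)  # a set keeps each invoice once per day
    return {
        date: {
            "TotalSales": sales[date],
            "TotalItems": items[date],
            "AvgOrderValue": sales[date] / len(invoices[date]),
        }
        for date in sorted(sales)
    }

metrics = daily_metrics(rows)
print(metrics["2024-01-01"])
# {'TotalSales': 60.0, 'TotalItems': 6, 'AvgOrderValue': 30.0}
```

On 2024-01-01 there are three line items but only two invoices, so the average order value is 60.0 / 2 = 30.0, not 60.0 / 3.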


In [0]:
# Import under an alias so that Spark's sum does not shadow Python's built-in sum
from pyspark.sql import functions as F

# Extract date only from InvoiceDate string column
df_silver = df_silver.withColumn("InvoiceDate", F.to_date(F.col("InvoiceDate")))

# Group by date and compute metrics
df_gold = df_silver.groupBy("InvoiceDate").agg(
    F.sum("SalesAmount").alias("TotalSales"),
    F.sum("Quantity").alias("TotalItems"),
    # Average order value = daily revenue / number of distinct invoices
    (F.sum("SalesAmount") / F.countDistinct("Invoice")).alias("AvgOrderValue")
)

# Preview results
df_gold.display()


## Step 3: Save Aggregated Metrics to Gold Delta Table

The final aggregated dataset is saved as a Delta table named `gold_sales`.  
The write uses `overwrite` mode, so each run replaces the table's previous contents. The resulting Gold layer table holds one row per day and can be read directly by dashboards and reports.
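
Before overwriting the table, it can be worth confirming that the aggregate has exactly one row per date and no null dates. Depending on Spark's ANSI setting, `to_date` may return null for strings it cannot parse, and such rows would collapse into a single null group. The check below is a minimal sketch in plain Python; `check_gold_rows` is a hypothetical helper, and the sample rows are made up.

```python
def check_gold_rows(rows):
    """Return a list of problems found in aggregated rows (empty means OK)."""
    problems = []
    dates = [row["InvoiceDate"] for row in rows]
    if any(d is None for d in dates):
        problems.append("null InvoiceDate group present")
    if len(set(dates)) != len(dates):
        problems.append("duplicate InvoiceDate rows")
    return problems

# Made-up sample; in the notebook the input might come from
# [r.asDict() for r in df_gold.collect()]
sample = [
    {"InvoiceDate": "2024-01-01", "TotalSales": 60.0},
    {"InvoiceDate": None, "TotalSales": 5.0},
]
print(check_gold_rows(sample))  # ['null InvoiceDate group present']
```

Collecting to the driver is reasonable here only because the Gold table has one row per day; for larger aggregates the same checks would be better expressed as Spark queries.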


In [0]:
# Save the result as a Gold Delta Table
df_gold.write.format("delta").mode("overwrite").saveAsTable("gold_sales")
