# Silver → Gold: Date Dimension

## Purpose
Create a calendar dimension table (`dim_date`) for time-based analytics.

## Source
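- Generated from a fixed date range chosen to cover every order date in `fact_orders` (the range is hard-coded below, not read from the table)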

## Output
- Gold: `dim_date`
- Grain: One row per date
- Range: 2016-09-01 to 2018-12-31 (padded to cover all orders, which run from 2016-09-04 to 2018-10-17)

## Attributes
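- Date components (year, quarter, month, day, weekday, day of year, week of year)
- Weekend flag (`is_weekend`)
- Month, day, and quarter names, plus `year_month` and `year_quarter` labels
- Fiscal periods and holiday-aware business day flags are not computed in this notebook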

**Author:** Kevin  
**Date:** Feb 9, 2026


In [0]:
from pyspark.sql.functions import (
    col, date_format, dayofmonth, dayofweek, dayofyear,
    weekofyear, year, month, quarter, when, current_timestamp,
    expr  # sequence/explode/to_date are only used inside the SQL string below
)

storage_account_name = "stgolistmigration"
account_key = ""  # redacted; supply via a secret store, never commit the real key

spark.conf.set(
    f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
    account_key
)

def get_gold_path(table):
    return f"abfss://gold@{storage_account_name}.dfs.core.windows.net/{table}/"

print("✅ Config loaded")


✅ Config loaded


In [0]:
print("📅 Generating date dimension...")

# Generate date range from 2016-09-01 to 2018-12-31 (covers all Olist orders)
df_dates = spark.sql("""
    SELECT explode(sequence(to_date('2016-09-01'), to_date('2018-12-31'), interval 1 day)) as date_key
""")

print(f"✅ Generated: {df_dates.count():,} dates")


📅 Generating date dimension...
✅ Generated: 852 dates
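
The row count can be cross-checked without Spark. This is a small sketch using only the standard library; it assumes the same inclusive start and end dates as the `sequence(...)` call above.

```python
from datetime import date

start, end = date(2016, 9, 1), date(2018, 12, 31)

# sequence(start, end, interval 1 day) includes both endpoints,
# so the expected row count is the day difference plus one.
expected_rows = (end - start).days + 1
print(expected_rows)  # 852
```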


In [0]:
print("🔄 Adding date attributes...")

df_dim_date = df_dates \
    .withColumn("date_id", date_format(col("date_key"), "yyyyMMdd").cast("int")) \
    .withColumn("year", year(col("date_key"))) \
    .withColumn("quarter", quarter(col("date_key"))) \
    .withColumn("month", month(col("date_key"))) \
    .withColumn("month_name", date_format(col("date_key"), "MMMM")) \
    .withColumn("day", dayofmonth(col("date_key"))) \
    .withColumn("day_of_week", dayofweek(col("date_key"))) \
    .withColumn("day_name", date_format(col("date_key"), "EEEE")) \
    .withColumn("day_of_year", dayofyear(col("date_key"))) \
    .withColumn("week_of_year", weekofyear(col("date_key"))) \
    .withColumn("is_weekend",
        # dayofweek() numbers days 1 = Sunday through 7 = Saturday
        col("day_of_week").isin(1, 7)
    ) \
    .withColumn("quarter_name", 
        expr("concat('Q', quarter)")
    ) \
    .withColumn("year_month", date_format(col("date_key"), "yyyy-MM")) \
    .withColumn("year_quarter", 
        expr("concat(year, '-Q', quarter)")
    ) \
    .withColumn("dim_load_timestamp", current_timestamp())

print(f"✅ Date dimension created: {df_dim_date.count():,} dates")

# Show sample
df_dim_date.limit(3).show(truncate=False, vertical=True)


🔄 Adding date attributes...
✅ Date dimension created: 852 dates
-RECORD 0----------------------------------------
 date_key           | 2016-09-01                 
 date_id            | 20160901                   
 year               | 2016                       
 quarter            | 3                          
 month              | 9                          
 month_name         | September                  
 day                | 1                          
 day_of_week        | 5                          
 day_name           | Thursday                   
 day_of_year        | 245                        
 week_of_year       | 35                         
 is_weekend         | false                      
 quarter_name       | Q3                         
 year_month         | 2016-09                    
 year_quarter       | 2016-Q3                    
 dim_load_timestamp | 2026-02-09 13:23:34.421156 
-RECORD 1----------------------------------------
 date_key           | 2016-09-02    
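
The column rules used above can be reproduced for a single date in plain Python, which is handy for spot-checking the sample record. This is a sketch, not part of the pipeline: `date_attributes` is a hypothetical helper, and it mirrors only a subset of the columns.

```python
from datetime import date


def date_attributes(d):
    """Mirror a subset of the dim_date columns for one date."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "date_id": int(d.strftime("%Y%m%d")),
        "year": d.year,
        "quarter": quarter,
        "month": d.month,
        # Spark's dayofweek() counts 1 = Sunday ... 7 = Saturday,
        # while isoweekday() counts 1 = Monday ... 7 = Sunday.
        "day_of_week": d.isoweekday() % 7 + 1,
        "day_of_year": d.timetuple().tm_yday,
        # Spark's weekofyear() follows ISO 8601 week numbering.
        "week_of_year": d.isocalendar()[1],
        "is_weekend": d.isoweekday() >= 6,
        "quarter_name": f"Q{quarter}",
        "year_month": d.strftime("%Y-%m"),
        "year_quarter": f"{d.year}-Q{quarter}",
    }


row = date_attributes(date(2016, 9, 1))
print(row["date_id"], row["day_of_week"], row["week_of_year"])  # 20160901 5 35
```

The values for 2016-09-01 agree with RECORD 0 in the output above, including `day_of_week = 5` for a Thursday under Spark's Sunday-first numbering.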

In [0]:
output_path = get_gold_path("dim_date")

print(f"💾 Writing to: {output_path}")

df_dim_date.write \
    .format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .save(output_path)

print("✅ dim_date complete!")
print(f"   Rows: {df_dim_date.count():,} dates")


💾 Writing to: abfss://gold@stgolistmigration.dfs.core.windows.net/dim_date/
✅ dim_date complete!
   Rows: 852 dates


In [0]:
print("🔍 Verifying...")

df_verify = spark.read.format("delta").load(output_path)

print(f"✅ Verified: {df_verify.count():,} dates")
# Compute both bounds in a single aggregation pass
min_date, max_date = df_verify.selectExpr("min(date_key)", "max(date_key)").first()
print(f"   Date range: {min_date} to {max_date}")

print("\nSample dates:")
# Sort first: row order is not guaranteed when reading back from Delta
df_verify.select("date_key", "year", "month_name", "day_name", "is_weekend").orderBy("date_key").limit(10).show()

print("\nWeekend vs Weekday:")
df_verify.groupBy("is_weekend").count().show()

print("🎉 dim_date → Gold complete!")


🔍 Verifying...
✅ Verified: 852 dates
   Date range: 2016-09-01 to 2018-12-31

Sample dates:
+----------+----+----------+---------+----------+
|  date_key|year|month_name| day_name|is_weekend|
+----------+----+----------+---------+----------+
|2016-09-01|2016| September| Thursday|     false|
|2016-09-02|2016| September|   Friday|     false|
|2016-09-03|2016| September| Saturday|      true|
|2016-09-04|2016| September|   Sunday|      true|
|2016-09-05|2016| September|   Monday|     false|
|2016-09-06|2016| September|  Tuesday|     false|
|2016-09-07|2016| September|Wednesday|     false|
|2016-09-08|2016| September| Thursday|     false|
|2016-09-09|2016| September|   Friday|     false|
|2016-09-10|2016| September| Saturday|      true|
+----------+----+----------+---------+----------+


Weekend vs Weekday:
+----------+-----+
|is_weekend|count|
+----------+-----+
|      true|  244|
|     false|  608|
+----------+-----+

🎉 dim_date → Gold complete!
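
The weekend versus weekday split reported above can also be confirmed independently of Spark. A minimal sketch with the standard library, assuming the same 2016-09-01 to 2018-12-31 range and treating Saturday and Sunday as the weekend:

```python
from datetime import date, timedelta

start, end = date(2016, 9, 1), date(2018, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# weekday() returns 0 = Monday ... 6 = Sunday, so 5 and 6 are the weekend.
weekend = sum(1 for d in days if d.weekday() >= 5)
weekday = len(days) - weekend
print(weekend, weekday)  # 244 608
```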
