# Silver Dim_date

**Table type:** Date master

**Target:** `dim_date`

## Reading Parquet data from Bronze

In [None]:
from pyspark.sql import SparkSession
from dotenv import load_dotenv
import os

load_dotenv("/home/jovyan/work/.env")
spark = SparkSession.builder.appName("silver_date").getOrCreate()

bronze_path = os.getenv("BRONZE_PATH")
silver_path = os.getenv("SILVER_PATH")


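A new Spark DataFrame is generated. Note that the query in this notebook does not read any Bronze table: the calendar is built from a generated date sequence.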
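`os.getenv` returns `None` when a variable is missing, and the failure would only surface later, when `os.path.join(silver_path, ...)` runs in the write cell. A minimal sketch of failing early, assuming a hypothetical helper named `require_env` (it is not part of this notebook) and a placeholder path used only so the example runs on its own:

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or empty.

    `require_env` is a hypothetical helper for illustration only.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Placeholder value (an assumption) so the sketch is self-contained;
# in the notebook the real value would come from the .env file.
os.environ.setdefault("SILVER_PATH", "/tmp/silver")
silver_path = require_env("SILVER_PATH")
```

The same call could wrap `BRONZE_PATH` if a later version of the notebook reads from Bronze.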

## Silver cleaning
### Cleaning rules applied:
* Correct data types
* Deduplication on the key (`date_id` for this table; the generated sequence yields one row per day)
* Avoid `*` in selects

### Transformation rules:
* A dummy record is added for consistency with nonexistent products (the query below does not add one for dates)
* Column names are standardized to a single language

In [9]:
# data-cleaning query for silver
query = """
WITH date_range AS (
    SELECT sequence(
        to_date('2023-01-01'),
        to_date('2025-12-31'),
        interval 1 day
    ) AS dates
),
calendar AS (
    SELECT explode(dates) AS date FROM date_range
)
SELECT
    CAST(date_format(date, 'yyyyMMdd') AS INT) AS date_id,
    date as date,
    year(date) AS year,
    month(date) AS month,
    day(date) AS day,
    weekofyear(date) AS week_of_year,
    date_format(date, 'yyyyMM') AS period,
    CAST(date_format(date, 'yyyyMM') AS INT) AS period_int,
    concat(year(date), '-', lpad(month(date),2,'0')) AS year_month,
    quarter(date) AS quarter,
    date_format(date, 'EEEE') AS day_name,
    date_format(date, 'E') AS day_name_short,
    dayofweek(date) AS day_of_week,   -- 1 = Sunday
    dayofyear(date) AS day_of_year,
    CASE WHEN dayofweek(date) IN (1,7) THEN 1 ELSE 0 END AS is_weekend,
    CASE WHEN date = current_date() THEN 1 ELSE 0 END AS is_today,
    CASE WHEN month(date) IN (1,4,7,10) AND day(date) = 1 THEN 1 ELSE 0 END AS is_quarter_start,
    CASE WHEN day(date) = 1 THEN 1 ELSE 0 END AS is_month_start,
    last_day(date) AS month_end_date,
    CASE WHEN date = last_day(date) THEN 1 ELSE 0 END AS is_month_end
FROM calendar
ORDER BY date
"""

# Execute the SQL query and get the result as a new DataFrame
sql_result_df = spark.sql(query)

# Display the results
sql_result_df.printSchema()
sql_result_df.show()

root
 |-- date_id: integer (nullable = true)
 |-- date: date (nullable = false)
 |-- year: integer (nullable = false)
 |-- month: integer (nullable = false)
 |-- day: integer (nullable = false)
 |-- week_of_year: integer (nullable = false)
 |-- period: string (nullable = false)
 |-- period_int: integer (nullable = true)
 |-- year_month: string (nullable = false)
 |-- quarter: integer (nullable = false)
 |-- day_name: string (nullable = false)
 |-- day_name_short: string (nullable = false)
 |-- day_of_week: integer (nullable = false)
 |-- day_of_year: integer (nullable = false)
 |-- is_weekend: integer (nullable = false)
 |-- is_today: integer (nullable = false)
 |-- is_quarter_start: integer (nullable = false)
 |-- is_month_start: integer (nullable = false)
 |-- month_end_date: date (nullable = false)
 |-- is_month_end: integer (nullable = false)

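The SQL expressions above can be cross-checked against a small pure-Python reference. This is a sketch, not the notebook's implementation: `calendar_row` is a hypothetical helper, it covers only a subset of the columns, and it assumes that `is_quarter_start` means the first day of a quarter.

```python
import calendar
from datetime import date


def calendar_row(d: date) -> dict:
    """Pure-Python reference for a few dim_date columns (illustrative only)."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    # Spark's dayofweek() numbers days 1 = Sunday ... 7 = Saturday,
    # while isoweekday() uses 1 = Monday ... 7 = Sunday.
    day_of_week = d.isoweekday() % 7 + 1
    return {
        "date_id": int(d.strftime("%Y%m%d")),
        "period": d.strftime("%Y%m"),
        "year_month": d.strftime("%Y-%m"),
        "quarter": (d.month - 1) // 3 + 1,
        "day_of_week": day_of_week,
        "is_weekend": int(day_of_week in (1, 7)),
        "is_month_start": int(d.day == 1),
        "is_month_end": int(d.day == last_day),
        # Assumption: "quarter start" means the first day of the quarter.
        "is_quarter_start": int(d.month in (1, 4, 7, 10) and d.day == 1),
    }


row = calendar_row(date(2024, 2, 29))
print(row["date_id"], row["day_of_week"], row["is_month_end"])  # prints: 20240229 5 1
```

Comparing a handful of such rows (a leap day, a year boundary, a weekend) with the Spark output is a cheap way to catch off-by-one errors in the flags.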

## Writing the data
* Everything is written as Parquet in the Silver folders.
* The write cell uses `overwrite` mode, which rebuilds the table in full on each run; an upsert would be needed for later bulk ingestions.
* SCD can be applied to preserve historical changes in dimensions.

**Improvement:** use Delta tables for merge control and a transaction log (ACID).
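The upsert semantics mentioned above can be illustrated in memory. This is only a sketch of the update-or-insert rule on a key, not a Spark or Delta implementation; `upsert_rows` and the sample rows are hypothetical.

```python
def upsert_rows(existing: list[dict], incoming: list[dict], key: str = "date_id") -> list[dict]:
    """Merge two row lists on `key`; incoming rows replace existing ones."""
    merged = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row  # update if the key exists, insert otherwise
    return [merged[k] for k in sorted(merged)]


# Hypothetical sample rows, reduced to two columns for brevity.
current = [{"date_id": 20230101, "is_today": 0}, {"date_id": 20230102, "is_today": 0}]
batch = [{"date_id": 20230102, "is_today": 1}, {"date_id": 20230103, "is_today": 0}]

result = upsert_rows(current, batch)
print([r["date_id"] for r in result])  # prints: [20230101, 20230102, 20230103]
```

With Delta tables the same rule is typically expressed as a `MERGE` on the key, which is one reason the improvement note suggests them.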

In [None]:
# Write dim_date to silver as parquet
output_path = os.path.join(silver_path, "dim_date.parquet")

sql_result_df.write.mode("overwrite").parquet(output_path)

# if needed, the logic can be changed to SCD to preserve history

## Validations

In [11]:
summary_stats = spark.read.parquet(output_path).describe()
summary_stats.show()

+-------+--------------------+------------------+------------------+------------------+------------------+------------------+------------------+----------+------------------+---------+--------------+------------------+------------------+-------------------+--------------------+-------------------+--------------------+--------------------+
|summary|             date_id|              year|             month|               day|      week_of_year|            period|        period_int|year_month|           quarter| day_name|day_name_short|       day_of_week|       day_of_year|         is_weekend|            is_today|   is_quarter_start|      is_month_start|        is_month_end|
+-------+--------------------+------------------+------------------+------------------+------------------+------------------+------------------+----------+------------------+---------+--------------+------------------+------------------+-------------------+--------------------+-------------------+--------------------
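Beyond `describe()`, a date dimension can be validated for completeness: the row count should equal the number of days in the range, with no gaps. A minimal sketch in plain Python, assuming the 2023-01-01 to 2025-12-31 range used in the query; `expected_day_count` and `missing_dates` are hypothetical helpers.

```python
from datetime import date, timedelta


def expected_day_count(start: date, end: date) -> int:
    """Number of calendar days in the inclusive range [start, end]."""
    return (end - start).days + 1


def missing_dates(present: set[date], start: date, end: date) -> list[date]:
    """Dates in [start, end] that are absent from `present`."""
    total = expected_day_count(start, end)
    all_days = (start + timedelta(days=i) for i in range(total))
    return [d for d in all_days if d not in present]


start, end = date(2023, 1, 1), date(2025, 12, 31)
print(expected_day_count(start, end))  # prints: 1096
```

The expected count can then be compared with `spark.read.parquet(output_path).count()`, and the collected `date` column can be passed to `missing_dates` to list any gaps.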