# 07 - Silver City Dimension: Full Enrichment

**Result Table Structure:**

| Column | Description |
|-------|----------|
| `city`, `country` | City information |
| `temperature_c`, `weather_code` | Weather |
| `rate_eur` | Always 1.0 (EUR is the base currency) |
| `rate_usd`, `rate_gbp`, `rate_jpy`, `rate_try`, `rate_aed`, `rate_cny` | All exchange rates against the EUR base |

**Why this structure?**
- The dim always carries every rate, regardless of which currencies appear in `silver_bookings`.
- Power BI can offer a dynamic currency filter.
- In the Gold layer, KPIs in any target currency are computed as `total_amount * rate_xxx`.
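The Gold-layer conversion can be sketched in plain Python. This sketch assumes that rates are quoted as units of the target currency per 1 EUR, which is consistent with `rate_eur = 1.0`. It also assumes that `total_amount` is already in EUR. If an amount is stored in another currency, it has to be divided by that currency's rate first (a cross rate). `convert` is a hypothetical helper, not part of the notebook.

```python
# Rates copied from the notebook output: units of each currency per 1 EUR
# (assumed quoting convention, consistent with rate_eur = 1.0).
RATES = {
    "EUR": 1.0,
    "USD": 1.177139,
    "GBP": 0.874196,
    "JPY": 182.436915,
    "TRY": 51.56482,
    "AED": 4.323083,
    "CNY": 8.135807,
}


def convert(amount, source, target, rates=RATES):
    """Hypothetical helper: convert an amount between currencies via EUR.

    amount / rates[source] expresses the amount in EUR; multiplying by
    rates[target] expresses it in the target currency. When source is
    "EUR", this reduces to the notebook's total_amount * rate_xxx.
    """
    return amount / rates[source] * rates[target]


print(f"{convert(100.0, 'EUR', 'USD'):.2f}")  # → 117.71
```

With EUR as the base, one rate column per currency is enough to convert between any pair. There is no need to store an N×N rate matrix.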

In [1]:
from pyspark.sql.functions import col, upper, trim, initcap, lit, regexp_replace

DIM_TABLE = "silver_city_dim"


## 1. Load Sources

In [2]:
df_bookings = spark.read.table("silver_bookings")
df_weather  = spark.read.table("bronze_weather")
df_rates    = spark.read.table("bronze_exchange_rates")

print(f"✅ Bookings: {df_bookings.count()} | Weather: {df_weather.count()} | Rates: {df_rates.count()}")


✅ Bookings: 1209341 | Weather: 244 | Rates: 6


## 2. Extract Unique Cities

In [3]:
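# Prefer the cleaned city column if the upstream notebook produced one
city_col = "city_clean" if "city_clean" in df_bookings.columns else "city"

# Normalize names (strip trailing dots, trim, title-case). distinct() runs on
# (city, country) pairs, so the same city can appear once per country.
df_cities = df_bookings.select(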
    initcap(trim(regexp_replace(col(city_col), r'\.+$', ''))).alias("city"),
    initcap(trim(col("country"))).alias("country")
).distinct().filter(
    col("city").isNotNull() &
    (col("city") != "Unknown") &
    (col("city") != "")
)

print(f"🌆 Unique Cities: {df_cities.count()}")
df_cities.show(5)


🌆 Unique Cities: 12272
+---------+--------+
|     city| country|
+---------+--------+
|Dubrovnik| Croatia|
| Toulouse|  France|
|     Giza| Unknown|
|   Patras|Malaysia|
| Lausanne| Denmark|
+---------+--------+
only showing top 5 rows
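The city normalization chain above can be checked outside Spark with a small plain-Python approximation. `normalize_city` and `spark_like_initcap` are hypothetical helpers. The second one assumes that Spark's `initcap` lowercases the string and then capitalizes the first character after each space. The regex step runs before `trim`, so a trailing dot that is followed by whitespace is not removed.

```python
import re


def spark_like_initcap(s):
    # Approximation of Spark's initcap: lowercase everything, then
    # uppercase the first character of each space-separated word.
    return " ".join(w[:1].upper() + w[1:] for w in s.lower().split(" "))


def normalize_city(raw):
    """Hypothetical mirror of initcap(trim(regexp_replace(city, r'\\.+$', '')))."""
    if raw is None:
        return None
    no_dots = re.sub(r"\.+$", "", raw)  # regexp_replace: drop trailing dots
    trimmed = no_dots.strip(" ")        # trim: drop leading/trailing spaces
    return spark_like_initcap(trimmed)  # initcap


print(normalize_city("new york.."))  # → New York
print(normalize_city("PARIS. "))     # → Paris.  (dot survives: a space follows it)
```

If values like `"PARIS. "` occur in `silver_bookings`, trimming before the regex (or matching `[.\s]+$` instead) would likely catch them. Title-casing also means `"unknown"` becomes `"Unknown"`, so the `!= "Unknown"` filter covers its case variants.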



## 3. Add Weather Data

In [4]:
# Normalize weather city names
df_w = df_weather.select(
    initcap(trim(col("city"))).alias("w_city"),
    col("temperature_c"),
    col("weather_code")
).dropDuplicates(["w_city"])

# JOIN cities with weather
df_with_weather = df_cities.join(
    df_w,
    df_cities.city == df_w.w_city,
    how="left"
).drop("w_city")

matched = df_with_weather.filter(col("temperature_c").isNotNull()).count()
print(f"🌡️ Weather matched: {matched} / {df_cities.count()} cities")


🌡️ Weather matched: 12134 / 12272 cities
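The left join keeps every (city, country) pair and fills the weather columns with nulls when no weather row matches. The join key is the city name alone, so a city listed under several countries gets the same weather on every row. The plain-Python sketch below reproduces these semantics. `left_join_weather` is a hypothetical helper. The sample rows are illustrative: the Dubrovnik/Toulouse values come from the output further down, and the other rows are made up.

```python
def left_join_weather(cities, weather_rows):
    """Hypothetical mirror of df_cities.join(df_w, city == w_city, how="left").

    cities: list of (city, country) pairs.
    weather_rows: list of (city, temperature_c, weather_code) tuples.
    """
    # One weather row per city: first occurrence wins here. Spark's
    # dropDuplicates makes no guarantee about which duplicate survives.
    by_city = {}
    for city, temp, code in weather_rows:
        by_city.setdefault(city, (temp, code))

    joined = []
    for city, country in cities:
        temp, code = by_city.get(city, (None, None))  # no match -> nulls
        joined.append({"city": city, "country": country,
                       "temperature_c": temp, "weather_code": code})
    return joined


# Illustrative sample data (partly hypothetical)
cities = [("Dubrovnik", "Croatia"), ("Dubrovnik", "Italy"),
          ("Toulouse", "France"), ("Nowhere", "Unknown")]
weather = [("Dubrovnik", 8.4, 1), ("Toulouse", 8.7, 2), ("Toulouse", 9.9, 3)]

rows = left_join_weather(cities, weather)
matched = sum(r["temperature_c"] is not None for r in rows)
print(f"Weather matched: {matched} / {len(rows)} cities")  # → Weather matched: 3 / 4 cities
```

This explains why 244 weather rows can match 12134 rows: each weather city fans out to every country it is paired with in the bookings.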


## 4. Pivot All Exchange Rates as Separate Columns
For every city, **all exchange rates** are stored in separate columns, independent of the booking currency.

In [5]:
# Normalize rates table
df_r = df_rates.select(
    upper(trim(col("target_currency"))).alias("currency"),
    col("rate").alias("rate_value")
).dropDuplicates(["currency"])

# Show available currencies
print("💱 Available exchange rates:")
df_r.show()

# Manually pull each rate (robust approach, no dynamic pivot needed)
def get_rate(currency_code):
    """Returns the exchange rate for a currency, or None if not found."""
    row = df_r.filter(col("currency") == currency_code).first()
    return float(row["rate_value"]) if row else None

rate_eur = 1.0
rate_usd = get_rate("USD")
rate_gbp = get_rate("GBP")
rate_jpy = get_rate("JPY")
rate_try = get_rate("TRY")
rate_aed = get_rate("AED")
rate_cny = get_rate("CNY")

print(f"EUR=1.0 | USD={rate_usd} | GBP={rate_gbp} | JPY={rate_jpy} | TRY={rate_try} | AED={rate_aed} | CNY={rate_cny}")


💱 Available exchange rates:
+--------+----------+
|currency|rate_value|
+--------+----------+
|     JPY|182.436915|
|     AED|  4.323083|
|     TRY|  51.56482|
|     USD|  1.177139|
|     CNY|  8.135807|
|     GBP|  0.874196|
+--------+----------+

EUR=1.0 | USD=1.177139 | GBP=0.874196 | JPY=182.436915 | TRY=51.56482 | AED=4.323083 | CNY=8.135807
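The long-to-wide step can also be done in a single pass over the rate rows. In PySpark that would mean one `collect()` instead of one `filter().first()` job per currency. The plain-Python sketch below treats a list of `(currency, rate)` tuples as a stand-in for the collected rows. `build_rate_columns` and `CURRENCIES` are hypothetical names. The sketch also reports which expected currencies are missing, so a gap can fail fast instead of silently producing a null column.

```python
CURRENCIES = ["USD", "GBP", "JPY", "TRY", "AED", "CNY"]


def build_rate_columns(rate_rows, currencies=CURRENCIES):
    """Pivot (currency, rate) rows into one rate_xxx value per currency.

    EUR is the base, so rate_eur is always 1.0. Codes are normalized with
    strip().upper(), roughly mirroring upper(trim(...)) in the cell above.
    """
    lookup = {}
    for code, value in rate_rows:
        lookup.setdefault(code.strip().upper(), float(value))  # first row wins

    columns = {"rate_eur": 1.0}
    missing = []
    for code in currencies:
        value = lookup.get(code)
        if value is None:
            missing.append(code)
        columns[f"rate_{code.lower()}"] = value  # None if the rate is absent
    return columns, missing


rows = [("JPY", 182.436915), ("AED", 4.323083), ("TRY", 51.56482),
        ("USD", 1.177139), ("CNY", 8.135807), ("GBP", 0.874196)]
columns, missing = build_rate_columns(rows)
print(columns["rate_usd"], missing)  # → 1.177139 []
```

In the notebook, the input could be built with `[(r["currency"], r["rate_value"]) for r in df_r.collect()]`, and a non-empty `missing` list could raise before the dim is written.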


## 5. Build Final Dim Table

In [6]:
# Add all rates as constant columns to every city row.
# Cast to double so a missing rate (None) still yields a typed column;
# an untyped lit(None) gives a NullType (void) column that Delta may reject or drop.
rate_columns = {
    "rate_eur": rate_eur,
    "rate_usd": rate_usd,
    "rate_gbp": rate_gbp,
    "rate_jpy": rate_jpy,
    "rate_try": rate_try,
    "rate_aed": rate_aed,
    "rate_cny": rate_cny,
}

df_dim = df_with_weather
for col_name, value in rate_columns.items():
    df_dim = df_dim.withColumn(col_name, lit(value).cast("double"))

# Save
df_dim.write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable(DIM_TABLE)

print(f"✅ {DIM_TABLE} saved! ({df_dim.count()} cities)")
df_dim.show(10, truncate=False)


✅ silver_city_dim saved! (12272 cities)
+------------+---------------+-------------+------------+--------+--------+--------+----------+--------+--------+--------+
|city        |country        |temperature_c|weather_code|rate_eur|rate_usd|rate_gbp|rate_jpy  |rate_try|rate_aed|rate_cny|
+------------+---------------+-------------+------------+--------+--------+--------+----------+--------+--------+--------+
|Dubrovnik   |Croatia        |8.4          |1           |1.0     |1.177139|0.874196|182.436915|51.56482|4.323083|8.135807|
|Toulouse    |France         |8.7          |2           |1.0     |1.177139|0.874196|182.436915|51.56482|4.323083|8.135807|
|Giza        |Unknown        |11.4         |1           |1.0     |1.177139|0.874196|182.436915|51.56482|4.323083|8.135807|
|Patras      |Malaysia       |9.3          |0           |1.0     |1.177139|0.874196|182.436915|51.56482|4.323083|8.135807|
|Lausanne    |Denmark        |5.9          |2           |1.0     |1.177139|0.874196|182.436915|51