# 🥇 GOLD LAYER

In [0]:
%run "./00 - DDL"


**We’ll create three Gold tables:**

- `attendance_baseline` - Baseline attendance for each team and season (all games)
- `attendance_by_promo` - Attendance metrics for promotional games, joined to the baseline for lift calculations
- `attendance_by_team_promo_type` - Metrics by team, promo type and season


### 1. `attendance_baseline`
This is the foundation. It represents what normal attendance looks like for each team in a season, across both promotional and non-promotional games.
This helps us answer questions like: “What’s the average crowd size when this team plays at home in a given season?”

### CREATE THE TABLE

In [0]:
spark.sql(
    f"""
CREATE OR REPLACE TABLE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BASELINE} (
    season INT COMMENT 'MLB season year (e.g., 2024). Used to compare trends across seasons.',
    home_team_name STRING COMMENT 'Full name of the home team (e.g., "New York Yankees"). Defines team identity for aggregation.',
    season_baseline_attendance DOUBLE COMMENT 'Average number of attendees per home game for the given team and season. Rounded to the nearest whole number.',
    total_home_games BIGINT COMMENT 'Count of distinct home games played by the team in that season for which valid attendance data exists.'
)
COMMENT 'Gold table defining each MLB team’s baseline home-game attendance by season. Serves as the benchmark for measuring attendance lift from promotions or events.';
          """
)

### POPULATE THE TABLE

In [0]:
# Catalog, schema, and table names are parameterized via f-string variables
spark.sql(f"""
INSERT OVERWRITE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BASELINE}
SELECT
    season,
    home_team_name,
    ROUND(AVG(attendance), 0) AS season_baseline_attendance,
    COUNT(DISTINCT gamePk) AS total_home_games
FROM {CATALOG}.{SILVER_SCHEMA}.{SILVER_GAMES_TABLE_ENRICHED}
WHERE attendance IS NOT NULL AND attendance > 0
GROUP BY season, home_team_name;
""")

This table becomes the “yardstick” we’ll compare promotional games against. It tells us what a “normal” game looks like for each team that season.
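To make the baseline logic concrete without a Spark cluster, here is a minimal pure-Python sketch of the same aggregation. The sample rows and the `attendance_baseline` helper are hypothetical and only mirror the column names used in the query above; they are not the Silver table itself.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the Silver games table.
games = [
    {"gamePk": 1, "season": 2024, "home_team_name": "Team A", "attendance": 30000},
    {"gamePk": 2, "season": 2024, "home_team_name": "Team A", "attendance": 34000},
    {"gamePk": 3, "season": 2024, "home_team_name": "Team A", "attendance": None},  # excluded: NULL
    {"gamePk": 4, "season": 2024, "home_team_name": "Team B", "attendance": 0},     # excluded: not > 0
    {"gamePk": 5, "season": 2024, "home_team_name": "Team B", "attendance": 21001},
]


def attendance_baseline(rows):
    """Average attendance and distinct game count per (season, home team)."""
    grouped = defaultdict(list)
    for row in rows:
        attendance = row["attendance"]
        # Same filter as the WHERE clause: attendance IS NOT NULL AND attendance > 0
        if attendance is not None and attendance > 0:
            grouped[(row["season"], row["home_team_name"])].append(row)

    result = {}
    for key, group in grouped.items():
        average = sum(r["attendance"] for r in group) / len(group)
        result[key] = {
            # Note: Python's round() rounds halves to even, while Spark SQL's
            # ROUND rounds halves up, so exact .5 averages can differ.
            "season_baseline_attendance": float(round(average)),
            "total_home_games": len({r["gamePk"] for r in group}),
        }
    return result


baseline = attendance_baseline(games)
print(baseline[(2024, "Team A")])
# {'season_baseline_attendance': 32000.0, 'total_home_games': 2}
```

Games with missing or zero attendance drop out of both the average and the game count, which is why `total_home_games` can be lower than the number of games actually played.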

### 2. `attendance_by_promo`

Next, create the promotion-level aggregation. We use the exploded promotions view (`promotions_exploded`), where each row represents a game–promotion pair, and join it to the baseline.
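The definition of the exploded view is not shown in this notebook, so the following is only a sketch of what "one row per game–promotion pair" means. The sample records, the `promotions` list field, and the `explode_promotions` helper are assumptions made for illustration.

```python
# Hypothetical game records; the real Silver schema is not shown here.
games = [
    {"gamePk": 101, "attendance": 38000, "promotions": ["Giveaway", "Fireworks"]},
    {"gamePk": 102, "attendance": 29000, "promotions": ["Theme Game"]},
    {"gamePk": 103, "attendance": 27000, "promotions": []},  # no promotion
]


def explode_promotions(rows):
    """Emit one output row per (game, promotion) pair."""
    exploded = []
    for game in rows:
        for promo in game["promotions"]:
            exploded.append(
                {
                    "gamePk": game["gamePk"],
                    "attendance": game["attendance"],
                    "promotion_type": promo,
                }
            )
    return exploded


pairs = explode_promotions(games)
print(len(pairs))                        # 3 rows
print(len({p["gamePk"] for p in pairs})) # 2 distinct games
```

In this sketch a game with two promotions appears twice, which is why counting distinct `gamePk` values rather than rows gives the number of games. Games without any promotion produce no rows at all, assuming the view behaves like a plain explode.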

### CREATE TABLE SCHEMA WITH DOCUMENTATION

In [0]:
spark.sql(f"""
CREATE OR REPLACE TABLE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_PROMO} (
    season STRING COMMENT 'MLB season year (e.g., 2024). Used to compare trends across seasons.',
    home_team_name STRING COMMENT 'Full name of the home team (e.g., "Chicago Cubs"). Defines the aggregation level.',
    venue_name STRING COMMENT 'Ballpark or stadium where the game was played.',
    promotion_type STRING COMMENT 'Type of promotion (e.g., Giveaway, Fireworks, Theme Game).',
    is_weekend BOOLEAN COMMENT 'TRUE if the game occurred on a Saturday or Sunday.',
    dayNight STRING COMMENT 'Game time classification: either "day" or "night".',
    avg_attendance DOUBLE COMMENT 'Average attendance for games of this promotion type and context.',
    avg_opponent_popularity DOUBLE COMMENT 'Average attendance drawn by opponents, used as a proxy for opponent popularity.',
    avg_home_win_pct DOUBLE COMMENT 'Average home team win percentage at the time of the promotion.',
    season_baseline_attendance DOUBLE COMMENT 'Baseline attendance for the team and season, taken from the attendance_baseline table.',
    attendance_lift DOUBLE COMMENT 'Absolute difference between promotion game attendance and the team’s baseline attendance.',
    attendance_lift_pct DOUBLE COMMENT 'Relative increase in attendance (percentage) over the team’s baseline.',
    num_games BIGINT COMMENT 'Number of home games with this promotion type and context observed.'
)
COMMENT 'Gold table aggregating MLB attendance metrics by promotion type, comparing promotional game attendance to team baselines to calculate attendance lift and lift percentage.';
""")


In [0]:
spark.sql(f"""
INSERT OVERWRITE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_PROMO}
SELECT
  p.season,
  p.home_team_name,
  p.venue_name,
  p.promotion_type,
  p.is_weekend,
  p.dayNight,
  ROUND(AVG(p.attendance), 0) AS avg_attendance,
  ROUND(AVG(p.opponent_avg_attendance), 0) AS avg_opponent_popularity,
  ROUND(AVG(p.home_team_win_pct), 3) AS avg_home_win_pct,
  b.season_baseline_attendance,
  ROUND(AVG(p.attendance) - b.season_baseline_attendance, 0) AS attendance_lift,
  ROUND(
    100 * (AVG(p.attendance) - b.season_baseline_attendance) / b.season_baseline_attendance,
    1
  ) AS attendance_lift_pct,
  COUNT(DISTINCT p.gamePk) AS num_games
FROM {CATALOG}.{SILVER_SCHEMA}.{SILVER_PROMOTIONS_VIEW} p
JOIN {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BASELINE} b
  ON p.season = b.season
 AND p.home_team_name = b.home_team_name
GROUP BY
  p.season,
  p.home_team_name,
  p.venue_name,
  p.promotion_type,
  p.is_weekend,
  p.dayNight,
  b.season_baseline_attendance;
  """)





**What’s happening here:**

- We average attendance per combination of:
  - team
  - season
  - venue
  - promotion type
  - weekend flag
  - time of day

- We compare that to each team’s baseline for the same season.

- The difference is the **attendance lift** (`attendance_lift`).

- Dividing that difference by the baseline and multiplying by 100 gives **attendance_lift_pct**.
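As a worked example of the two lift metrics, the arithmetic from the query can be reproduced in a few lines of Python. The numbers and the `attendance_lift` helper are made up for illustration.

```python
def attendance_lift(avg_attendance, baseline):
    """Return (absolute lift, percentage lift) relative to the baseline."""
    lift = avg_attendance - baseline
    lift_pct = 100 * lift / baseline
    # Mirrors ROUND(..., 0) and ROUND(..., 1) in the SQL above.
    return round(lift), round(lift_pct, 1)


# Hypothetical values: promo games average 35,200 fans against a 32,000 baseline.
lift, lift_pct = attendance_lift(avg_attendance=35200.0, baseline=32000.0)
print(lift, lift_pct)  # 3200 10.0
```

A negative result simply means the promotional games in that context drew fewer fans than the team's seasonal baseline.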

### 3. `attendance_by_team_promo_type`

This table rolls up `attendance_by_promo` by team, season, and promotion type. It averages the attendance lift and percentage lift over the venue, weekend, and day/night contexts of the previous table, and sums the number of games behind them.

In [0]:
spark.sql(f"""
CREATE OR REPLACE TABLE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_TEAM_AND_PROMO_TYPE} (
    season STRING COMMENT 'MLB season year (e.g., 2024). Defines the competitive year of play.',
    home_team_name STRING COMMENT 'Full name of the home team (e.g., "Los Angeles Dodgers"). Aggregation level for this dataset.',
    promotion_type STRING COMMENT 'Type of promotional event (e.g., Giveaway, Fireworks, Theme Game, Ticket Offer).',
    avg_lift DOUBLE COMMENT 'Average attendance increase (in absolute number of fans) across all games of this promotion type for the team in that season.',
    avg_lift_pct DOUBLE COMMENT 'Average percentage increase in attendance compared to the team’s baseline attendance for that season.',
    total_games BIGINT COMMENT 'Total number of home games analyzed for the given team, season, and promotion type.'
)
COMMENT 'Gold table summarizing team-level performance by promotion type, showing average attendance lift and lift percentage relative to baseline metrics.';
""")


In [0]:
spark.sql(f"""
INSERT OVERWRITE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_TEAM_AND_PROMO_TYPE}
SELECT
  season,
  home_team_name,
  promotion_type,
  ROUND(AVG(attendance_lift), 0) AS avg_lift,
  ROUND(AVG(attendance_lift_pct), 1) AS avg_lift_pct,
  SUM(num_games) AS total_games
FROM {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_PROMO}
GROUP BY season, home_team_name, promotion_type;
"""
)



This is useful for dashboards. For instance:

“Across all home games, fireworks promotions boosted attendance by +2,000 fans on average for the Atlanta Braves.”
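One caveat when reading "on average" here: `AVG(attendance_lift)` in the query above gives every context row from `attendance_by_promo` equal weight, regardless of how many games it covers. The sketch below, using made-up rows, shows how that can differ from a per-game average weighted by `num_games`. The weighted variant is an alternative for comparison, not what the notebook computes.

```python
# Hypothetical attendance_by_promo rows for one team, season, and promotion type.
contexts = [
    {"is_weekend": True,  "dayNight": "night", "attendance_lift": 4000.0, "num_games": 2},
    {"is_weekend": False, "dayNight": "night", "attendance_lift": 1000.0, "num_games": 8},
]


def unweighted_avg_lift(rows):
    """Average of context-level lifts, as AVG(attendance_lift) does."""
    return sum(r["attendance_lift"] for r in rows) / len(rows)


def weighted_avg_lift(rows):
    """Per-game average: each context weighted by its number of games."""
    total_games = sum(r["num_games"] for r in rows)
    return sum(r["attendance_lift"] * r["num_games"] for r in rows) / total_games


unweighted = unweighted_avg_lift(contexts)
weighted = weighted_avg_lift(contexts)
print(unweighted)  # 2500.0
print(weighted)    # 1600.0
```

If a dashboard needs the per-game figure, the SQL equivalent would be along the lines of `SUM(attendance_lift * num_games) / SUM(num_games)`. Which definition is appropriate depends on the question being asked.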