# 🥇 GOLD LAYER

In [0]:
%run "./00 - DDL"


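**We'll create three Gold tables:**

- `attendance_baseline` - Baseline attendance for each team and season (all home games with recorded attendance)
- `attendance_by_promo` - Attendance metrics for promotional games, joined to the baseline for lift calculations
- Team/promotion-type summary (`GOLD_ATTENDANCE_BY_TEAM_AND_PROMO_TYPE`) - Average lift per season, team, and promotion type, rolled up from `attendance_by_promo`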


### attendance_baseline
This is the foundation. It represents what normal attendance looks like for each team in a season, including both promotional and non-promotional home games.
This helps us answer questions like: "What's the average crowd size when this team plays at home in a given season?"

In [0]:
# schema

In [0]:
# parameterize
spark.sql(f"""
CREATE OR REPLACE TABLE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BASELINE} AS
SELECT
    season,
    home_team_name,
    ROUND(AVG(attendance), 0) AS team_avg_attendance,
    COUNT(DISTINCT gamePk) AS total_home_games
FROM {CATALOG}.{SILVER_SCHEMA}.all_games_enriched
WHERE attendance IS NOT NULL AND attendance > 0
GROUP BY season, home_team_name;
""")

This table becomes the "yardstick" we'll compare promotional games against. It tells us what a "normal" game looks like for each team that season.
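To make the baseline logic concrete outside Spark, here is a minimal pure-Python sketch of the same aggregation. It is not the notebook's code: it assumes each game is a dict with `season`, `home_team_name`, `gamePk`, and `attendance` keys (a hypothetical in-memory stand-in for `all_games_enriched`), and it uses HALF_UP rounding because that is the rounding mode Spark SQL's `ROUND` uses.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places=0):
    """Round like Spark SQL's ROUND (HALF_UP) instead of Python's round-half-to-even."""
    step = Decimal(1).scaleb(-places)  # e.g. places=1 -> Decimal("0.1")
    return float(Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP))


def attendance_baseline(games):
    """Hypothetical in-memory equivalent of the attendance_baseline query."""
    totals = defaultdict(float)    # running SUM(attendance) per (season, team)
    row_counts = defaultdict(int)  # rows feeding AVG(attendance)
    game_ids = defaultdict(set)    # distinct gamePk values per (season, team)
    for game in games:
        attendance = game["attendance"]
        # WHERE attendance IS NOT NULL AND attendance > 0
        if attendance is None or attendance <= 0:
            continue
        key = (game["season"], game["home_team_name"])
        totals[key] += attendance
        row_counts[key] += 1
        game_ids[key].add(game["gamePk"])
    return {
        key: {
            "team_avg_attendance": round_half_up(totals[key] / row_counts[key]),
            "total_home_games": len(game_ids[key]),
        }
        for key in totals
    }


# Made-up sample rows; the 0 and None attendance games are filtered out.
sample_games = [
    {"season": 2024, "home_team_name": "Team A", "gamePk": 1, "attendance": 30000},
    {"season": 2024, "home_team_name": "Team A", "gamePk": 2, "attendance": 35001},
    {"season": 2024, "home_team_name": "Team A", "gamePk": 3, "attendance": 0},
    {"season": 2024, "home_team_name": "Team A", "gamePk": 4, "attendance": None},
    {"season": 2024, "home_team_name": "Team B", "gamePk": 5, "attendance": 20000},
]
print(attendance_baseline(sample_games))
```

With these sample rows, Team A's baseline comes out to 32,501 across two games: the `0` and `None` rows are dropped before averaging, and 32,500.5 rounds half up.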

### attendance_by_promo

Next, create the promotion-level aggregation. We use the exploded promotions view (`promotions_exploded`), where each row represents a game-promotion pair, and join it to the baseline.
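An exploded view has one subtle consequence worth seeing in isolation. The sketch below is a hypothetical pure-Python stand-in for `promotions_exploded`. It assumes each game dict carries a `promotions` list and emits one row per game-promotion pair, dropping games with no promotions the way Spark's `explode` does (`explode_outer` would keep them with a null). If one game has two promotions of the same type, it appears twice in that group. That is presumably why the Gold query counts games with `COUNT(DISTINCT gamePk)`, while `AVG(p.attendance)` still weights such a game once per row.

```python
def explode_promotions(games):
    """Hypothetical sketch: one output row per (game, promotion) pair."""
    rows = []
    for game in games:
        # Games with no (or null) promotions produce no rows, like Spark's explode().
        for promo in game.get("promotions") or []:
            row = {k: v for k, v in game.items() if k != "promotions"}
            row["promotion_type"] = promo["promotion_type"]
            rows.append(row)
    return rows


# Made-up games: game 1 has two giveaways, game 3 has no promotion at all.
games = [
    {"gamePk": 1, "attendance": 40000,
     "promotions": [{"promotion_type": "Giveaway"}, {"promotion_type": "Giveaway"}]},
    {"gamePk": 2, "attendance": 30000, "promotions": [{"promotion_type": "Giveaway"}]},
    {"gamePk": 3, "attendance": 25000, "promotions": []},
]

giveaway_rows = [r for r in explode_promotions(games) if r["promotion_type"] == "Giveaway"]
row_count = len(giveaway_rows)                                    # 3 rows
distinct_games = len({r["gamePk"] for r in giveaway_rows})        # 2 games
row_avg = sum(r["attendance"] for r in giveaway_rows) / row_count  # game 1 counted twice
print(row_count, distinct_games, round(row_avg))
```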

In [0]:
# schema

In [0]:
spark.sql(f"""
CREATE OR REPLACE TABLE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_PROMO} AS
SELECT
  p.season,
  p.home_team_name,
  p.venue_name,
  p.promotion_type,
  p.is_weekend,
  p.dayNight,
  ROUND(AVG(p.attendance), 0) AS avg_attendance,
  ROUND(AVG(p.opponent_avg_attendance), 0) AS avg_opponent_popularity,
  ROUND(AVG(p.home_team_win_pct), 3) AS avg_home_win_pct,
  b.team_avg_attendance,
  ROUND(AVG(p.attendance) - b.team_avg_attendance, 0) AS attendance_lift,
  ROUND(
    100 * (AVG(p.attendance) - b.team_avg_attendance) / b.team_avg_attendance,
    1
  ) AS attendance_lift_pct,
  COUNT(DISTINCT p.gamePk) AS num_games
FROM {CATALOG}.{SILVER_SCHEMA}.{SILVER_PROMOTIONS_VIEW} p
JOIN {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BASELINE} b
  ON p.season = b.season
 AND p.home_team_name = b.home_team_name
WHERE p.attendance IS NOT NULL AND p.attendance > 0  -- same attendance filter as the baseline
GROUP BY
  p.season,
  p.home_team_name,
  p.venue_name,
  p.promotion_type,
  p.is_weekend,
  p.dayNight,
  b.team_avg_attendance;
  """)





**What's happening here:**

- We average attendance per combination of:
  - team
  - season
  - venue
  - promotion type
  - weekend flag
  - time of day

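- We compare that to each team's baseline for the same season.

- The difference between the group's average attendance and the baseline is the **attendance lift** (`attendance_lift`).

- The same difference expressed as a percentage of the baseline is **attendance_lift_pct**.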
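The lift arithmetic can also be sketched in plain Python. This is a hypothetical in-memory version, not the notebook's code: `promo_rows` stands in for the exploded view (assumed to have non-null `attendance`), and `baseline` maps `(season, home_team_name)` to `team_avg_attendance`. Rows with no matching baseline are dropped, mirroring the inner `JOIN`, and both lift columns are computed from the unrounded group average, as in the SQL.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# Same grouping columns as the GROUP BY in the attendance_by_promo query.
GROUP_KEYS = ("season", "home_team_name", "venue_name", "promotion_type", "is_weekend", "dayNight")


def round_half_up(value, places=0):
    """HALF_UP rounding, matching Spark SQL's ROUND."""
    step = Decimal(1).scaleb(-places)
    return float(Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP))


def attendance_by_promo(promo_rows, baseline):
    """Hypothetical sketch of the promotion-level aggregation and lift columns."""
    groups = defaultdict(list)
    for row in promo_rows:
        # Inner join on (season, home_team_name): unmatched rows are dropped.
        if (row["season"], row["home_team_name"]) in baseline:
            groups[tuple(row[k] for k in GROUP_KEYS)].append(row)
    results = []
    for key, rows in groups.items():
        base = baseline[(key[0], key[1])]
        avg = sum(r["attendance"] for r in rows) / len(rows)  # AVG(p.attendance)
        results.append({
            **dict(zip(GROUP_KEYS, key)),
            "avg_attendance": round_half_up(avg),
            "team_avg_attendance": base,
            "attendance_lift": round_half_up(avg - base),
            "attendance_lift_pct": round_half_up(100 * (avg - base) / base, 1),
            "num_games": len({r["gamePk"] for r in rows}),  # COUNT(DISTINCT p.gamePk)
        })
    return results


# Made-up data: two fireworks nights for Team A against a 32,000 baseline.
context = {"venue_name": "Park A", "promotion_type": "Fireworks", "is_weekend": True, "dayNight": "night"}
rows = [
    {"season": 2024, "home_team_name": "Team A", "gamePk": 1, "attendance": 36000, **context},
    {"season": 2024, "home_team_name": "Team A", "gamePk": 2, "attendance": 34000, **context},
    {"season": 2024, "home_team_name": "Team B", "gamePk": 9, "attendance": 10000, **context},  # no baseline
]
print(attendance_by_promo(rows, {(2024, "Team A"): 32000.0}))
```

Here the group averages 35,000 against a 32,000 baseline, so the lift is 3,000 fans and the percentage lift is 9.375%, which rounds half up to 9.4. The Team B row disappears because it has no baseline to join to.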

### Summarize by team and promotion type

In [0]:
spark.sql(f"""
CREATE OR REPLACE TABLE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_TEAM_AND_PROMO_TYPE} AS
SELECT
  season,
  home_team_name,
  promotion_type,
  ROUND(AVG(attendance_lift), 0) AS avg_lift,
  ROUND(AVG(attendance_lift_pct), 1) AS avg_lift_pct,
  SUM(num_games) AS total_games
FROM {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_PROMO}
GROUP BY season, home_team_name, promotion_type;
"""
)



This is useful for dashboards. For instance, it supports statements like:

"In a given season, fireworks promotions drew about 2,000 more fans than the Atlanta Braves' baseline, on average."
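The rollup is a plain `GROUP BY` over the per-context rows. A minimal Python sketch, with made-up numbers and a hypothetical function name, shows what `AVG(attendance_lift)` actually averages: each venue/weekend/day-night group counts once, regardless of how many games it contains. Rounding is omitted for brevity.

```python
from collections import defaultdict


def summarize_by_team_and_promo_type(promo_groups):
    """Hypothetical sketch of the team/promotion-type rollup (rounding omitted)."""
    buckets = defaultdict(list)
    for g in promo_groups:
        buckets[(g["season"], g["home_team_name"], g["promotion_type"])].append(g)
    summary = {}
    for key, groups in buckets.items():
        summary[key] = {
            # Unweighted mean over context groups, like AVG(attendance_lift) in the SQL.
            "avg_lift": sum(g["attendance_lift"] for g in groups) / len(groups),
            "total_games": sum(g["num_games"] for g in groups),  # SUM(num_games)
        }
    return summary


# Made-up input: two context groups for the same team and promotion type.
groups = [
    {"season": 2024, "home_team_name": "Team A", "promotion_type": "Fireworks",
     "attendance_lift": 3000, "num_games": 2},
    {"season": 2024, "home_team_name": "Team A", "promotion_type": "Fireworks",
     "attendance_lift": 1000, "num_games": 8},
]
summary = summarize_by_team_and_promo_type(groups)
print(summary[(2024, "Team A", "Fireworks")])

# A game-weighted alternative, if each game should count equally instead:
weighted = sum(g["attendance_lift"] * g["num_games"] for g in groups) / sum(g["num_games"] for g in groups)
print(weighted)
```

In this example the SQL-style average lift is 2,000 while the game-weighted figure is 1,400; which one suits a given dashboard is a design choice.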

In [0]:
# spark.sql(f"""
# CREATE OR REPLACE TABLE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_PROMO} (
#     season INT COMMENT 'MLB season year (e.g., 2024).',
#     home_team_name STRING COMMENT 'Home team name.',
#     venue_name STRING COMMENT 'Name of ballpark where game took place.',
#     promotion_types STRING COMMENT 'High-level promotion category (e.g., Giveaway, Theme Game).',
#     offer_type STRING COMMENT 'Operational classification (e.g., Day of Game Highlight).',
#     offer_name STRING COMMENT 'Specific offer name tied to this promotion.',
#     day_of_week STRING COMMENT 'Day of the week (Monday–Sunday).',
#     is_weekend BOOLEAN COMMENT 'True if game occurred on Saturday or Sunday.',
#     dayNight STRING COMMENT 'Day vs Night indicator.',
#     avg_attendance DOUBLE COMMENT 'Average attendance for games matching this group.',
#     team_avg_attendance DOUBLE COMMENT 'Average attendance for the same team and season (baseline).',
#     attendance_lift DOUBLE COMMENT 'Difference between promotion group attendance and team average.',
#     num_games BIGINT COMMENT 'Number of games in this aggregation.',
#     num_promotions BIGINT COMMENT 'Total distinct promotions counted in this aggregation.'
# )
# COMMENT 'Gold table summarizing attendance trends by promotion type and context. Used for dashboards and metric views.';
# """
# )


#### Your Semantic Layer

This metric view acts as the single source of truth for all measures used by Genie and AI/BI. Here, you define display names, formats, and synonyms to make your data conversationally accessible. For example, users can ask "show me attendance lift by giveaway type" and receive accurate, context-aware results.

In [0]:
# spark.sql(f"""
# CREATE OR REPLACE VIEW {CATALOG}.{SEMANTIC_SCHEMA}.{SEMANTIC_ATTENDANCE_IMPACT}
# WITH METRICS
# LANGUAGE YAML
# AS $$
# version: 1.1
# comment: "Unified MLB attendance impact metrics by promotion type, team, and season."
# source: {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_PROMO}

# dimensions:
#   - name: Season
#     expr: season
#     comment: "MLB season year (e.g., 2024)."

#   - name: Team
#     expr: home_team_name
#     comment: "Home team name."
#     synonyms: ['home team', 'franchise']

#   - name: Venue
#     expr: venue_name
#     comment: "Ballpark or stadium where the game took place."

#   - name: Promotion Type
#     expr: promotion_types
#     comment: "High-level category of promotion (e.g., Giveaway, Theme Game)."
#     synonyms: ['promotion category', 'promo type', 'event type']

#   - name: Offer Type
#     expr: offer_type
#     comment: "Operational classification of promotion (e.g., Day of Game Highlight)."

#   - name: Offer Name
#     expr: offer_name
#     comment: "Specific name of the promotion or giveaway."

#   - name: Day of Week
#     expr: day_of_week
#     comment: "Day on which the game was played."
#     synonyms: ['weekday', 'game day']

#   - name: Day/Night
#     expr: dayNight
#     comment: "Whether the game was played during the day or at night."

# measures:
#   - name: Average Attendance
#     expr: AVG(avg_attendance)
#     comment: "Average attendance for this group of games."
#     format:
#       type: number
#       decimal_places:
#         type: exact
#         places: 0

#   - name: Team Average Attendance
#     expr: AVG(team_avg_attendance)
#     comment: "Baseline average attendance for the same team and season."

#   - name: Attendance Lift
#     expr: AVG(attendance_lift)
#     comment: "Difference between group attendance and the team's seasonal average."
#     display_name: "Attendance Lift vs Team Average"
#     format:
#       type: number
#       decimal_places:
#         type: exact
#         places: 0
#       abbreviation: compact

#   - name: Number of Games
#     expr: SUM(num_games)
#     comment: "Count of games represented in this aggregation."
#     synonyms: ['games played']

#   - name: Number of Promotions
#     expr: SUM(num_promotions)
#     comment: "Count of distinct promotions across games."
#     synonyms: ['promo count', 'unique offers']

#   - name: Attendance Lift % 
#     expr: MEASURE(`Attendance Lift`) / MEASURE(`Team Average Attendance`)
#     comment: "Relative percentage increase in attendance compared to team average."
#     format:
#       type: percentage
#       decimal_places:
#         type: exact
#         places: 1
# $$;
# """
# )