In [0]:
%run "/Workspace/Users/suhmacc@fastmail.com/00 - DDL"


### STAGE 1: GOLD TABLE `gold.attendance_by_promo_type`

Aggregates attendance by season, home team, venue, promotion type, offer, day of week, weekend flag, and day/night for fast analytics and LLM-ready summaries.

In [0]:
spark.sql(f"""
CREATE OR REPLACE TABLE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_PROMO} (
    season INT COMMENT 'MLB season year (e.g., 2024).',
    home_team_name STRING COMMENT 'Home team name.',
    venue_name STRING COMMENT 'Name of ballpark where game took place.',
    promotion_types STRING COMMENT 'High-level promotion category (e.g., Giveaway, Theme Game).',
    offer_type STRING COMMENT 'Operational classification (e.g., Day of Game Highlight).',
    offer_name STRING COMMENT 'Specific offer name tied to this promotion.',
    day_of_week STRING COMMENT 'Day of the week (Monday–Sunday).',
    is_weekend BOOLEAN COMMENT 'True if game occurred on Saturday or Sunday.',
    dayNight STRING COMMENT 'Day vs Night indicator.',
    avg_attendance DOUBLE COMMENT 'Average attendance for games matching this group.',
    team_avg_attendance DOUBLE COMMENT 'Average attendance for the same team and season (baseline).',
    attendance_lift DOUBLE COMMENT 'Difference between promotion group attendance and team average.',
    num_games BIGINT COMMENT 'Number of games in this aggregation.',
    num_promotions BIGINT COMMENT 'Total distinct promotions counted in this aggregation.'
)
COMMENT 'Gold table summarizing attendance trends by promotion type and context. Used for dashboards and metric views.';
"""
)
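A Spark SQL `INSERT OVERWRITE ... SELECT` without an explicit column list maps columns by position, not by name, so the table definition above and the `SELECT` that feeds it need to list columns in the same order. The sketch below is one way to check that. The helper `first_mismatch` is hypothetical, and both lists are typed by hand for illustration; in the notebook the first list could instead come from `spark.table(...).columns`.

```python
from itertools import zip_longest

# Column order as declared in the CREATE TABLE statement.
ddl_columns = [
    "season", "home_team_name", "venue_name", "promotion_types",
    "offer_type", "offer_name", "day_of_week", "is_weekend", "dayNight",
    "avg_attendance", "team_avg_attendance", "attendance_lift",
    "num_games", "num_promotions",
]

# Output column order of the SELECT that populates the table.
select_columns = [
    "season", "home_team_name", "venue_name", "promotion_types",
    "offer_type", "offer_name", "day_of_week", "is_weekend", "dayNight",
    "avg_attendance", "team_avg_attendance", "attendance_lift",
    "num_games", "num_promotions",
]


def first_mismatch(expected, actual):
    """Return (position, expected_name, actual_name) of the first difference, or None."""
    for position, (want, got) in enumerate(zip_longest(expected, actual)):
        # zip_longest pads the shorter list with None, which also counts as a mismatch.
        if want is None or got is None or want.lower() != got.lower():
            return position, want, got
    return None


mismatch = first_mismatch(ddl_columns, select_columns)
print("columns aligned" if mismatch is None else f"mismatch at {mismatch}")
```

The comparison is case-insensitive because Spark resolves column names case-insensitively by default.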

### POPULATE GOLD TABLE

In [0]:
spark.sql(f"""
INSERT OVERWRITE {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_PROMO}
SELECT
    season,
    home_team_name,
    venue_name,
    promotion_types,
    offer_type,
    offer_name,
    day_of_week,
    is_weekend,
    dayNight,
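    AVG(attendance) AS avg_attendance,
    -- The window runs over the grouped rows, so the baseline is the unweighted mean of group averages per team and season.
    AVG(AVG(attendance)) OVER (PARTITION BY home_team_name, season) AS team_avg_attendance,
    AVG(attendance) - AVG(AVG(attendance)) OVER (PARTITION BY home_team_name, season) AS attendance_lift,
    COUNT(DISTINCT gamePk) AS num_games,
    -- offer_name is a GROUP BY key, so this is 1 per row (0 when offer_name is NULL).
    COUNT(DISTINCT offer_name) AS num_promotions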
FROM {CATALOG}.{SILVER_SCHEMA}.{ONLY_GAMES_WITH_PROMOS_CLEAN}
GROUP BY
    season,
    home_team_name,
    venue_name,
    promotion_types,
    offer_type,
    offer_name,
    day_of_week,
    is_weekend,
    dayNight;
"""
)
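To make the `AVG(AVG(attendance)) OVER (...)` expression concrete, here is a minimal pure-Python sketch that reproduces `avg_attendance`, `team_avg_attendance`, and `attendance_lift` on made-up numbers. The team name, attendance figures, and the `summarize` helper are all hypothetical, and the grouping is reduced to season, team, and promotion type for brevity.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (season, home_team_name, promotion_types, attendance)
rows = [
    (2024, "Team A", "Giveaway", 30000),
    (2024, "Team A", "Giveaway", 34000),
    (2024, "Team A", "Theme Game", 20000),
]


def summarize(rows):
    # GROUP BY season, team, promotion type -> AVG(attendance)
    groups = defaultdict(list)
    for season, team, promo, attendance in rows:
        groups[(season, team, promo)].append(attendance)
    group_avg = {key: mean(values) for key, values in groups.items()}

    # AVG(AVG(attendance)) OVER (PARTITION BY team, season):
    # the window sees one row per group, so this is a mean of group means.
    partitions = defaultdict(list)
    for (season, team, _promo), avg in group_avg.items():
        partitions[(season, team)].append(avg)
    baseline = {key: mean(values) for key, values in partitions.items()}

    return [
        {
            "season": season,
            "team": team,
            "promo": promo,
            "avg_attendance": avg,
            "team_avg_attendance": baseline[(season, team)],
            "attendance_lift": avg - baseline[(season, team)],
        }
        for (season, team, promo), avg in group_avg.items()
    ]


summary = summarize(rows)
for record in summary:
    print(record)

# For comparison: the mean over the raw rows, which weights each row equally.
row_weighted = mean(attendance for _, _, _, attendance in rows)
print("row-weighted team average:", row_weighted)
```

On this toy data the baseline is 26000 (the mean of 32000 and 20000), while the mean over the raw rows is 28000. The two diverge whenever groups contain different numbers of rows, which is worth keeping in mind when reading `attendance_lift`.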

### STAGE 2: METRIC VIEW `mlb_attendance_impact`


#### Your Semantic Layer

This metric view is the single source of truth for the measures used by Genie and AI/BI dashboards. It defines display names, formats, and synonyms so the data can be queried conversationally. For example, a user can ask, “show me attendance lift by giveaway type,” and the question can be answered with the Attendance Lift measure defined below.

In [0]:
spark.sql(f"""
CREATE OR REPLACE VIEW {CATALOG}.{SEMANTIC_SCHEMA}.{SEMANTIC_ATTENDANCE_IMPACT}
WITH METRICS
LANGUAGE YAML
AS $$
version: 1.1
comment: "Unified MLB attendance impact metrics by promotion type, team, and season."
source: {CATALOG}.{GOLD_SCHEMA}.{GOLD_ATTENDANCE_BY_PROMO}

dimensions:
  - name: Season
    expr: season
    comment: "MLB season year (e.g., 2024)."

  - name: Team
    expr: home_team_name
    comment: "Home team name."
    synonyms: ['home team', 'franchise']

  - name: Venue
    expr: venue_name
    comment: "Ballpark or stadium where the game took place."

  - name: Promotion Type
    expr: promotion_types
    comment: "High-level category of promotion (e.g., Giveaway, Theme Game)."
    synonyms: ['promotion category', 'promo type', 'event type']

  - name: Offer Type
    expr: offer_type
    comment: "Operational classification of promotion (e.g., Day of Game Highlight)."

  - name: Offer Name
    expr: offer_name
    comment: "Specific name of the promotion or giveaway."

  - name: Day of Week
    expr: day_of_week
    comment: "Day on which the game was played."
    synonyms: ['weekday', 'game day']

  - name: Day/Night
    expr: dayNight
    comment: "Whether the game was played during the day or at night."

measures:
  - name: Average Attendance
    expr: AVG(avg_attendance)
    comment: "Average attendance for this group of games."
    format:
      type: number
      decimal_places:
        type: exact
        places: 0

  - name: Team Average Attendance
    expr: AVG(team_avg_attendance)
    comment: "Baseline average attendance for the same team and season."

  - name: Attendance Lift
    expr: AVG(attendance_lift)
    comment: "Difference between group attendance and the team's seasonal average."
    display_name: "Attendance Lift vs Team Average"
    format:
      type: number
      decimal_places:
        type: exact
        places: 0
      abbreviation: compact

  - name: Number of Games
    expr: SUM(num_games)
    comment: "Sum of per-group game counts; a game with several promotions may be counted once per group it appears in."
    synonyms: ['games played']

  - name: Number of Promotions
    expr: SUM(num_promotions)
    comment: "Sum of per-group promotion counts; not deduplicated across groups."
    synonyms: ['promo count', 'unique offers']

  - name: Attendance Lift %
    expr: MEASURE(`Attendance Lift`) / MEASURE(`Team Average Attendance`)
    comment: "Relative percentage increase in attendance compared to team average."
    format:
      type: percentage
      decimal_places:
        type: exact
        places: 1
$$;
"""
)
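Once the view exists, measures are read with `MEASURE(...)` and grouped by one or more dimensions. The sketch below builds such a query as a string. The helper `build_measure_query` and the catalog and schema in the example view name are hypothetical; in the notebook the name would come from `CATALOG`, `SEMANTIC_SCHEMA`, and `SEMANTIC_ATTENDANCE_IMPACT`.

```python
def build_measure_query(view_name: str, dimension: str, measure: str) -> str:
    """Build a metric-view query returning one measure per value of one dimension."""
    # Backticks are needed because the dimension and measure names contain spaces.
    return (
        f"SELECT `{dimension}`, MEASURE(`{measure}`) AS value\n"
        f"FROM {view_name}\n"
        f"GROUP BY ALL\n"
        f"ORDER BY value DESC"
    )


# Hypothetical fully qualified name; substitute the notebook's own variables.
query = build_measure_query(
    view_name="main.semantic.mlb_attendance_impact",
    dimension="Promotion Type",
    measure="Attendance Lift",
)
print(query)

# In a Databricks notebook the query could then be run with:
# display(spark.sql(query))
```

Keeping the query builder separate from execution makes it easy to inspect the generated SQL before running it against the workspace.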