# PlayHQ Fixture Scraping

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ssardina/tapp-fixture/blob/main/playhq_scrape.ipynb)

This system scrapes game fixtures from [PlayHQ](http://playhq.com/) via its Public [API](https://support.playhq.com/hc/en-au/sections/4405422358297-PlayHQ-APIs). It produces a CSV file ready to be uploaded as a Schedule in [TeamApp](https://brunswickmagicbasketball.teamapp.com/).

The *Public* APIs only require two header parameters to get a successful response:

- `x-api-key` (also referred to as the Client ID) is provided by PlayHQ when you request access to the public API via their [support page](https://support.playhq.com/hc/en-au) or by emailing support@playhqsupport.zendesk.com. The key can be stored in a file `x_api_key.txt`; otherwise, the notebook asks for it interactively. In many cases, creating new API credentials is disabled for regular users and can only be done by a Super Administrator within the PlayHQ portal.
- `x-phq-tenant` identifies the sport/association; in this case, '`bv`'.


Detailed reference documentation for PlayHQ API can be found [here](https://docs.playhq.com/tech).
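As a minimal sketch of what those two headers look like on a request, the snippet below uses only Python's standard library (`urllib.request`, `getpass`) rather than the project's `playhq` module. The base URL `https://api.playhq.com` and the helper names `load_api_key` and `build_request` are assumptions for illustration; check the reference documentation for the actual endpoints.

```python
import getpass
from pathlib import Path
from urllib.request import Request

# Assumption: base URL of the PlayHQ public API; confirm against https://docs.playhq.com/tech
BASE_URL = "https://api.playhq.com"


def load_api_key(path="x_api_key.txt"):
    """Hypothetical helper: read the x-api-key from a file, else ask for it interactively."""
    key_file = Path(path)
    if key_file.exists():
        return key_file.read_text().strip()
    # getpass does not echo the key while it is typed
    return getpass.getpass("PlayHQ x-api-key: ").strip()


def build_request(url, api_key, tenant="bv"):
    """Hypothetical helper: prepare (without sending) a GET request carrying both PlayHQ headers."""
    headers = {"x-api-key": api_key, "x-phq-tenant": tenant}
    return Request(url, headers=headers, method="GET")
```

Sending such a request would then be `urllib.request.urlopen(req)` followed by `json.load` on the response; the notebook itself instead passes `X_API_KEY` and `X_TENANT` to `phq.PlayHQ`.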

**Contact:** Sebastian Sardina (sssardina@gmail.com)

In [None]:
# from IPython.core.interactiveshell import InteractiveShell
# InteractiveShell.ast_node_interactivity = "all"
import pandas as pd

# from tqdm.notebook import tqdm    # for progress bar: https://github.com/tqdm/tqdm
import re
import os
import calendar, datetime
import dtale

import utils
import playhq as phq

## 1. Configuration and set-up

We first configure and set up the application: we read the configuration variables from a config file and set the interval of game dates to scrape.

First, specify the following information:

1. The configuration file for the club and season.
2. The interval of game dates to scrape.
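The configuration file is loaded with `from config_... import *`, so it is a plain Python module. Judging from the names this notebook uses, it must define at least the values below. This is a hypothetical sketch with placeholder values; `tapp_team_name` here simply drops an assumed club prefix (`"Coburg "`, taken from the team names used later in this notebook). The notebook also passes a `tapp_game_name` callable to `phq.PlayHQ`, whose expected signature is not shown here, so it is left out.

```python
# config_example.py: hypothetical club configuration, all values are placeholders

CLUB_NAME = "<club name>"
ORG_ID = "<PlayHQ organisation id>"
SEASON_ID = "<PlayHQ season id>"
X_TENANT = "bv"                        # x-phq-tenant header
X_API_KEY = "<x-api-key>"              # or load it from x_api_key.txt
TIMEZONE = "Australia/Melbourne"       # used to localise game dates
OUTPUT_PATH = "output"                 # must already exist; the notebook stops otherwise
PLAYHQ_SEASON_URL = "<PlayHQ club fixture URL>"
DESC_TAPP = "<description template for game entries>"
DESC_BYE_TAPP = "<description template for BYE entries>"

CLUB_PREFIX = "Coburg "  # assumption: PlayHQ team names start with the club name


def tapp_team_name(playhq_name):
    """Map a PlayHQ team name to its TeamApp team name (here: drop the club prefix)."""
    if playhq_name.startswith(CLUB_PREFIX):
        return playhq_name[len(CLUB_PREFIX):]
    return playhq_name
```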

In [None]:
# Change this to import your club's own configuration
# from config_bmc_w23 import *
# from config_bmc_s23 import *
# from config_bmc_w24 import *
from config_cba_24 import *

# Set the game date interval to scrape (the last uncommented assignment wins)
GAME_DATE_START = datetime.date.today()  # by default, any game after today
# GAME_DATE_START = utils.next_day(calendar.SATURDAY)   # start from the next game day (used for Domestic)
GAME_DATE_START = datetime.date(2024, 7, 12) # start on a specific day (used for Rep - REP ROUND 5+)

WEEKS = 10   # how many weeks after date start we want to scrape (use 1 for just next game)
GAME_DATE_END = GAME_DATE_START + datetime.timedelta(days=WEEKS*7)


###############################################################
# DO NOT CHANGE FROM HERE
###############################################################

# Get nice game date format: Saturday August 06, 2022
GAME_DATE_START_TIMESTAMP = pd.to_datetime(GAME_DATE_START).tz_localize(TIMEZONE)
GAME_DATE_END_TIMESTAMP = pd.to_datetime(GAME_DATE_END).tz_localize(TIMEZONE)

GAME_DATE_START_NAME = utils.pretty_date(GAME_DATE_START_TIMESTAMP)
GAME_DATE_END_NAME = utils.pretty_date(GAME_DATE_END_TIMESTAMP)

# Create phq_club object
phq_club = phq.PlayHQ(CLUB_NAME, ORG_ID, X_API_KEY, X_TENANT, TIMEZONE, tapp_team_name, tapp_game_name)
if SEASON_ID is None:
    raise SystemExit("ERROR! Please specify SEASON_ID.")
SEASON_NAME = phq_club.get_season_name(SEASON_ID)
SEASON_COMPETITION = phq_club.get_season_competition(SEASON_ID)

print(f"Club name: {CLUB_NAME} (org. id: {ORG_ID[0:8]}****)")
print(f"Season: {SEASON_NAME} (season id: {SEASON_ID})")
print("Season competition:", SEASON_COMPETITION)
print("X-tenant:", X_TENANT, "x-api-key:", X_API_KEY[0:8]+"****")
print("Output path:", OUTPUT_PATH)
if not os.path.exists(OUTPUT_PATH):
    # f-string so the actual path is shown in the message
    raise SystemExit(f"ERROR! Output path {OUTPUT_PATH} is missing! Please create or link that path correctly to save data.")

print("Timezone:", TIMEZONE)
print(f"Game dates: {GAME_DATE_START_NAME} - {GAME_DATE_END_NAME}")
print("PlayHQ Club fixture:", PLAYHQ_SEASON_URL)
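The cell above turns `GAME_DATE_START` and `WEEKS` into a timezone-aware window with `pd.to_datetime(...).tz_localize(TIMEZONE)`. The same arithmetic in plain `datetime` is sketched below. `next_day` is a hypothetical stand-in for `utils.next_day`, assumed to return the first matching weekday strictly after the given day, which may differ from the project's helper; `date_window` is likewise a hypothetical name.

```python
import calendar
import datetime


def next_day(weekday, today=None):
    """Hypothetical stand-in for utils.next_day: the first date after `today`
    that falls on `weekday` (calendar.MONDAY == 0 ... calendar.SUNDAY == 6)."""
    if today is None:
        today = datetime.date.today()
    days_ahead = (weekday - today.weekday()) % 7 or 7  # never return `today` itself
    return today + datetime.timedelta(days=days_ahead)


def date_window(start, weeks, tz):
    """Return (start, end) as timezone-aware datetimes at local midnight, `weeks` weeks apart."""
    end = start + datetime.timedelta(days=weeks * 7)

    def at_midnight(day):
        return datetime.datetime.combine(day, datetime.time.min, tzinfo=tz)

    return at_midnight(start), at_midnight(end)


# e.g. next_day(calendar.SATURDAY) gives the coming Saturday
```

For example, `date_window(datetime.date(2024, 7, 12), 10, ZoneInfo("Australia/Melbourne"))` (with `from zoneinfo import ZoneInfo`, Python 3.9+) spans 70 days and ends on 2024-09-20.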

## 2. Get upcoming games for the club's teams

First, get the club's teams and sort them by age group.

In [None]:
teams_df = phq_club.get_season_teams(SEASON_ID)
teams_df.sort_values('age', ascending=True, inplace=True)
teams_df.reset_index(inplace=True, drop=True)

teams = teams_df['name'].values
print(f"Found {len(teams)} teams:", teams)

# teams_df

Use this only if you want to restrict the extraction to a subset of the teams.

In [None]:
TEAMS_FILTER = None
# TEAMS_FILTER = ["Coburg U16 Boys 5", "Coburg U16 Boys 3"]

if TEAMS_FILTER is not None:
    teams_df = teams_df.query("name in @TEAMS_FILTER")
    print(f"Kept {len(teams_df['name'].values)} teams:", teams_df['name'].values)
else:
    print("All teams selected.")
teams_df

Next, extract all games of these teams between the specified dates.

In [None]:
upcoming_games_df, team_errors = phq_club.get_games(teams_df, GAME_DATE_START_TIMESTAMP, GAME_DATE_END_TIMESTAMP)

if upcoming_games_df is not None:
    print(f'There were {upcoming_games_df.shape[0]} games extracted between {GAME_DATE_START_NAME} and {GAME_DATE_END_NAME}')
    # display() is needed: an expression inside an if-block is not rendered by the notebook
    display(upcoming_games_df[phq.GAMES_COLS])
else:
    print(f'No games between {GAME_DATE_START_NAME} and {GAME_DATE_END_NAME}')

if team_errors:
    print("Team errors:", team_errors)


During FINALS, some games may already be scheduled for the following weekend before the opponent is known, so they have only one competitor.
We list them for checking and then drop them, as they are not yet actual games.

In [None]:
# get the teams that have missing competitor and remove them from upcoming games
mask_no_competitors = upcoming_games_df['competitors'].apply(lambda x: len(x) == 1)

teams_pending_competitors = upcoming_games_df[mask_no_competitors].team_name.values
upcoming_games_df.drop(upcoming_games_df[mask_no_competitors].index, inplace=True)

print("Teams that have a pending competitor (finals?):")
teams_pending_competitors

Show the final upcoming games before converting them to TeamApp format.

In [None]:
# dtale.show(upcoming_games_df)
print("No of upcoming games:", upcoming_games_df.shape[0])

upcoming_games_df

In [None]:
# upcoming_games_df.query("team_name == 'Magic U14 Boys Diamond'")
# upcoming_games_df.loc[upcoming_games_df.team_name == 'Magic U12 Girls White']
# upcoming_games_df.iloc[0,:]

upcoming_games_df.query("team_name == 'Coburg U14 Girls 5'")

## 3. Convert to TeamApp CSV format

Next, we convert the PlayHQ upcoming games to TeamApp format so we can produce a CSV file to be imported into TeamApp.

This step can take a while, as it processes the games one by one and also obtains a short URL for each game.

In [None]:
if upcoming_games_df is None:
    raise SystemExit("There are no games to process. Exiting.")

games_tapps_df = phq_club.to_teamsapp_schedule(upcoming_games_df, desc_template=DESC_TAPP, game_duration=45)
print("Done computing the games for Teams App")

# determine the game day, if all games are played on a single day
game_day = None
single_game_day = (games_tapps_df['start_date'].drop_duplicates().size == 1)
if single_game_day:
    game_day = games_tapps_df.iloc[0]['start_date']
    print("All games are in the following day:", utils.pretty_date(game_day))
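`to_teamsapp_schedule` belongs to the project's `playhq` module and is not shown in this notebook. As a rough, hypothetical sketch of the per-game step, the function below turns one game into a row using column names that appear later in this notebook (`team_name`, `opponent`, `start_date`, `start_time`, `venue`, `court`) and uses the 45-minute `game_duration` to derive an `end_time`. That extra column, the `HH:MM` time format and the input shape are assumptions.

```python
import datetime


def to_tapp_row(team, opponent, start, venue, court, game_duration=45):
    """Hypothetical: one TeamApp schedule row for a game starting at datetime `start`."""
    end = start + datetime.timedelta(minutes=game_duration)
    return {
        "team_name": team,
        "opponent": opponent,
        "start_date": start.date(),
        "start_time": start.strftime("%H:%M"),
        "end_time": end.strftime("%H:%M"),  # assumption: TeamApp wants an end time
        "venue": venue,
        "court": court,
    }
```

For instance, a game starting at 09:30 would get an `end_time` of `10:15`.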

Inspect what the converted games, including their descriptions, will look like:

In [None]:
# games_tapps_df.sample(3)
# dtale.show(games_tapps_df)
games_tapps_df

Optionally, keep only the games after a particular date:

In [None]:
import datetime

START_DATE = None
# START_DATE = datetime.date(2023, 7, 21)

if START_DATE is not None:
    print("Keeping games after:", START_DATE)
    games_tapps_df = games_tapps_df[games_tapps_df['start_date'] > START_DATE]
else:
    print("Keeping all games")
games_tapps_df

In [None]:
# Inspect the game description of one team
# team = "12.2 Boys"
team = "U14 Boys Silver"

print("Description for:", team)
print(games_tapps_df.query("team_name == @team")['description'].values[0])

## 4. Append BYE games (if necessary)

We generate BYE entries for TeamApp ***only*** if all games are played on the same day.

In [None]:
days_games = games_tapps_df['start_date'].drop_duplicates().values

if len(days_games) == 1:
    game_day = days_games[0]
    print("Seems all games are played on the same day:", game_day)


In [None]:
bye_teams = False    # assume no bye games

if game_day is not None:
    game_day = games_tapps_df.iloc[0]['start_date']

    # Extract the date of the round
    # date = games_tapps_df.iloc[1]['start_date']
    print(f"Extract BYE games for games on {utils.pretty_date(game_day)}")

    playing_teams = upcoming_games_df['team_id'].tolist()
    bye_teams = teams_df.loc[~teams_df['id'].isin(playing_teams)]['name'].tolist()
    bye_teams = list(map(lambda x: tapp_team_name(x), bye_teams))

    if bye_teams:
        games_bye_df = phq_club.build_teamsapp_bye_schedule(bye_teams, game_day, DESC_BYE_TAPP)
        print(f"Bye teams ({len(bye_teams)}): ", bye_teams)

        games_tapps_df = pd.concat([games_tapps_df, games_bye_df])
        games_tapps_df.drop_duplicates(inplace=True)
        games_tapps_df.reset_index(inplace=True, drop=True)
    else:
        print("No BYE games this round...")
else:
    print("Games obtained are not on the same day, not computing BYE games...")

(bye_teams and games_bye_df)
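The BYE step above is essentially a set difference: every club team whose id does not appear among the teams with a game is on a BYE. That only makes sense when all games fall on one day; with several dates, a team missing from one date may still play on another. A standalone sketch, with a hypothetical function name and input shape (a dict from team id to PlayHQ name, kept in age order):

```python
def find_bye_teams(all_teams, playing_team_ids, to_tapp_name=None):
    """Return TeamApp names of club teams with no game on the (single) game day.

    all_teams: dict of team id -> PlayHQ team name, in the desired (age) order
    playing_team_ids: iterable of ids of teams that have a game that day
    to_tapp_name: optional function mapping a PlayHQ name to a TeamApp name
    """
    playing = set(playing_team_ids)  # O(1) membership tests
    names = [name for team_id, name in all_teams.items() if team_id not in playing]
    if to_tapp_name is not None:
        names = [to_tapp_name(name) for name in names]
    return names
```

Iterating over `all_teams` rather than over the set keeps the BYE list in the same age order as the teams.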

## 5. Final review

Finally, we report the games to be written to the Schedule CSV file. **CHECK THAT ALL IS GOOD TO GO!**

In particular, look for games that are scheduled but **PENDING**, i.e., without all details (time or venue).

In [None]:
print(games_tapps_df.columns.tolist())
games_tapps_df[['event_name', 'team_name', 'opponent', 'start_date', 'start_time', 'venue', 'court']]
# games_tapps_df
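To flag incomplete games programmatically instead of by eye, a small filter like the one below could help. It is a sketch that treats a game as pending when `start_time` or `venue` is missing or blank; how PlayHQ actually represents a pending game in these columns is an assumption, so adjust the check after looking at real data. The name `pending_games` is hypothetical.

```python
import pandas as pd


def pending_games(games_df, required=("start_time", "venue")):
    """Return the rows where any `required` column is missing (NaN/None) or blank."""
    subset = games_df[list(required)]
    missing = subset.isna()
    # Convert to str so .str.strip() works on any dtype; NaN is already caught above
    blank = subset.astype(str).apply(lambda col: col.str.strip() == "")
    return games_df[(missing | blank).any(axis=1)]
```

In the notebook, `pending_games(games_tapps_df)` would then list only the rows that need attention.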

We stop execution here in case the whole notebook is being run (e.g., with Run All).

In [None]:
raise SystemExit("Stop right there! Continue below to produce the CSV file if needed.")

## 6. Save to CSV file for TeamApp import

OK, we are ready to import into TeamApp.

### 6.1. Check changes against previous saves

If the schedule was generated before, check whether the new one differs from the one already saved.

First, let us define the files that we will save to disk.

In [None]:
now = datetime.datetime.now() # current date and time
now_str = now.strftime("%Y_%m_%d-%H_%M_%S")  # no ':' in file names (not allowed on Windows)

id_file = now_str
if game_day is not None:    # there is one date for all games!
    id_file = utils.compact_date(game_day)

file_csv = os.path.join(OUTPUT_PATH, f"schedule-teamsapp-{id_file}.csv")
file_upcoming_pkl = os.path.join(OUTPUT_PATH, f"upcoming_games_df-{id_file}.pkl")
file_games_tapps = os.path.join(OUTPUT_PATH, f"games_tapps_df-{id_file}.pkl")

print("Files to save:")
print(file_csv)
print(file_upcoming_pkl)
print(file_games_tapps)

if not os.path.exists(OUTPUT_PATH):
    raise SystemExit(f"ERROR! Output path {OUTPUT_PATH} is missing! Please create or link that path correctly to save data.")

Next, let's check whether a schedule was already saved for the upcoming game day.

In [None]:
cols = ['team_name', 'opponent', 'start_date', 'start_time', 'venue', 'court']

changed_games_df = None
if os.path.exists(file_games_tapps):
    print("There was already a schedule saved, recovering it to compare...")
    old_games_tapps_df = pd.read_pickle(file_games_tapps)

    teams_changed = pd.concat([games_tapps_df[cols], old_games_tapps_df[cols]]).drop_duplicates(keep=False)['team_name'].unique()
    print("Teams whose games have changed (updated, new, dropped):", teams_changed)

    old_games_df = old_games_tapps_df[cols].query("team_name in @teams_changed")
    new_games_df = games_tapps_df[cols].query("team_name in @teams_changed")
    # outer join so that new and dropped teams also appear (with NaN on the missing side)
    changed_games_df = new_games_df.merge(old_games_df, how="outer", on="team_name", suffixes=('_new', '_old'))
else:
    print("No previous schedule saved")

# Show changes if any...
changed_games_df
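The cell above finds the changed teams via a symmetric difference (`pd.concat` followed by `drop_duplicates(keep=False)`). An alternative, sketched below, is an outer `merge` on all compared columns with `indicator=True`, which labels each differing row as coming from the new or the old schedule. The function name `diff_schedules` is hypothetical.

```python
import pandas as pd


def diff_schedules(new_df, old_df, cols):
    """Rows present in only one of the two schedules, labelled 'new' or 'old'.

    An outer merge on all `cols` with indicator=True tags each row as 'both',
    'left_only' (new schedule) or 'right_only' (old schedule).
    """
    merged = new_df[cols].merge(old_df[cols], how="outer", on=cols, indicator=True)
    changed = merged[merged["_merge"] != "both"].copy()
    # _merge is categorical; convert to str before relabelling
    changed["_merge"] = changed["_merge"].astype(str).map({"left_only": "new", "right_only": "old"})
    return changed.rename(columns={"_merge": "version"})
```

Called as `diff_schedules(games_tapps_df, old_games_tapps_df, cols)`, a rescheduled game would show up twice, once as `old` and once as `new`, while unchanged games disappear.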

### 6.2. Write a TeamApp Schedule CSV & DataFrame Pickles

Finally, we save the data to a CSV file that can be imported into the [SCHEDULE of TeamApp for all Entries](https://brunswickmagicbasketball.teamapp.com/clubs/263995/events?_list=v1&team_id=all).

In [None]:
import shutil

print('Saving TeamApp schedule CSV file and DataFrames with id:', id_file)
# back up any existing file before overwriting it
for f in [file_csv, file_upcoming_pkl, file_games_tapps]:
    if os.path.exists(f):
        print("Backup file", f)
        shutil.copy(f, f + ".bak")

print('Saving CSV TeamApp schedule:', file_csv)
games_tapps_df.to_csv(file_csv, index=False)

print('Saving dataframe pickle:', file_upcoming_pkl)
upcoming_games_df.to_pickle(file_upcoming_pkl)
print('Saving dataframe pickle:', file_games_tapps)
games_tapps_df.to_pickle(file_games_tapps)

print(f"Finished saving CSV and DataFrame files: {now.strftime('%d/%m/%Y, %H:%M:%S')}")

# ------------ END FIXTURE PUBLISHING ------------
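The save cell backs up any existing file before overwriting it. The same pattern as a reusable helper (hypothetical name `backup_then_write`) takes the writer as a callable, so it works for both `to_csv` and `to_pickle`. It uses `shutil.copy2`, which, unlike `shutil.copy`, also preserves the file's modification time in the backup.

```python
import shutil
from pathlib import Path


def backup_then_write(path, write):
    """Copy an existing `path` to `<path>.bak`, then call `write(path)` to (re)create it."""
    path = Path(path)
    if path.exists():
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    write(path)
```

For example: `backup_then_write(file_csv, lambda p: games_tapps_df.to_csv(p, index=False))`.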

### Check a particular team

In [None]:
team = "U10 Girls Gold"

print(games_tapps_df.query("team_name == @team")['description'].values[0])
games_tapps_df.query("team_name == @team")[['team_name', 'opponent', 'start_date', 'start_time', 'venue', 'court']]
