# PlayHQ Fixture Scraping

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ssardina/tapp-fixture/blob/main/playhq_scrape.ipynb)

This system scrapes game fixtures from [PlayHQ](http://playhq.com/) via its public [API](https://support.playhq.com/hc/en-au/sections/4405422358297-PlayHQ-APIs). It produces a CSV file ready to be uploaded as a Schedule in [TeamApp](https://brunswickmagicbasketball.teamapp.com/).

The *Public* APIs only require two header parameters to get a successful response:

- `x-api-key` (also referred to as the Client ID) is provided by PlayHQ when you request access to the public API via their [support page](https://support.playhq.com/hc/en-au) or by emailing support@playhqsupport.zendesk.com. This key can be stored in a file `x_api_key.txt`; otherwise, the notebook will ask for it interactively. In many cases, the feature to create new API credentials is disabled for regular users and can only be actioned by a Super Administrator role within the PlayHQ portal.
- `x-phq-tenant` usually refers to the sport/association; in this case, `bv`.
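As an illustration of how the two header parameters travel with each call, here is a minimal sketch that builds (but does not send) a request using only the standard library. The base URL and the `/organisations/ORG_ID/seasons` path are assumptions for illustration; confirm the real endpoints in the PlayHQ API reference. `build_playhq_request` is a hypothetical helper, not part of `playhq.py`.

```python
import urllib.request

# Assumed base URL for illustration only; confirm paths in the PlayHQ API reference.
API_BASE = "https://api.playhq.com/v1"


def build_playhq_request(path: str, api_key: str, tenant: str) -> urllib.request.Request:
    """Build (without sending) a GET request that carries the two PlayHQ headers."""
    headers = {
        "x-api-key": api_key,    # Client ID issued by PlayHQ
        "x-phq-tenant": tenant,  # sport/association, e.g. "bv"
    }
    return urllib.request.Request(API_BASE + path, headers=headers, method="GET")


req = build_playhq_request("/organisations/ORG_ID/seasons", "my-client-id", "bv")
print(req.full_url)
# urllib stores header names via str.capitalize(), e.g. "X-api-key"
print(req.header_items())
# Sending it would be: urllib.request.urlopen(req)
```

Note that `urllib` normalises header names (`x-api-key` becomes `X-api-key`); HTTP header names are case-insensitive, so the server should see the same header.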


Detailed reference documentation for the PlayHQ API can be found [here](https://docs.playhq.com/tech).

**Contact:** Sebastian Sardina (sssardina@gmail.com)

In [1]:
# from IPython.core.interactiveshell import InteractiveShell
# InteractiveShell.ast_node_interactivity = "all"
import pandas as pd
import re
import os
import calendar, datetime
import configparser

# Set-up everything if running in Google Colab
if "COLAB_GPU" in os.environ:
  %pip install pyshorteners
  %pip install coloredlogs
  for f in ['utils.py', 'playhq.py']:
    if not os.path.exists(f):
      !wget "https://raw.githubusercontent.com/ssardina/tapp-fixture/main/{f}"

import utils
import playhq as phq

## 1. Configuration and set-up

We first configure and set up the application. This means reading configuration variables from a config file and setting the game day.

So, first of all, we specify:

1. Configuration file for the club and season.
2. Game date to scrape.
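The contents of `config_bmc.cfg` are not shown here; judging by the keys read in the configuration cell, it presumably has a `[main]` and a `[playhq]` section. The sketch below parses a hypothetical config with those keys using `configparser.read_string`; every value is a placeholder.

```python
import configparser

# Hypothetical config contents: keys match those read by the notebook, values are placeholders.
SAMPLE_CONFIG = """
[main]
CLUB_NAME = Example Basketball Club
TIMEZONE = Australia/Melbourne
SEASON = Winter 2022
OUTPUT_PATH = output

[playhq]
ORG_ID = 0000aaaa
X_TENANT = bv
X_API_KEY = replace-with-your-key
PLAYHQ_SEASON_URL = https://bv.playhq.com/
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)

print(config.sections())                # ['main', 'playhq']
print(config.get("playhq", "X_TENANT"))  # bv
```

`ConfigParser` lowercases option names on both read and lookup, so `config.get('main', 'CLUB_NAME')` still works. Its default interpolation treats `%` as special, so a value such as a URL containing `%20` would raise an error; `configparser.ConfigParser(interpolation=None)` avoids that.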

In [2]:
CONFIG_FILE = 'config_bmc.cfg'
# CONFIG_FILE = 'config_cba.cfg'

# set the game date
GAME_DATE = utils.next_day(calendar.SATURDAY)
# GAME_DATE = datetime.date(2022, 8, 27)
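The implementation of `utils.next_day` is not shown here. A minimal sketch of the same idea, assuming it returns today's date when today already is the requested weekday, uses modular arithmetic on `date.weekday()` (Monday is 0, matching `calendar.MONDAY`). `next_weekday` is a hypothetical name.

```python
import calendar
import datetime
from typing import Optional


def next_weekday(weekday: int, today: Optional[datetime.date] = None) -> datetime.date:
    """Next date on `weekday` (calendar.MONDAY=0 ... calendar.SUNDAY=6), counting `today` itself."""
    today = today or datetime.date.today()
    days_ahead = (weekday - today.weekday()) % 7  # 0 if today is already that weekday
    return today + datetime.timedelta(days=days_ahead)


wednesday = datetime.date(2022, 8, 24)
game_date = next_weekday(calendar.SATURDAY, wednesday)
print(game_date)  # 2022-08-27
print(game_date.strftime("%A %B %d, %Y (%Y/%m/%d)"))  # Saturday August 27, 2022 (2022/08/27)
```

If the real helper skips to the following week when run on a Saturday, replacing `% 7` with `(... - 1) % 7 + 1` gives that behaviour instead.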

In [3]:
config = configparser.ConfigParser()
config.read(CONFIG_FILE)

CLUB_NAME = config.get('main','CLUB_NAME')
TIMEZONE = config.get('main','TIMEZONE')
SEASON = config.get('main','SEASON')
OUTPUT_PATH = config.get('main','OUTPUT_PATH')

ORG_ID = config.get('playhq','ORG_ID')
X_TENANT = config.get('playhq','X_TENANT')
X_API_KEY = config.get('playhq','X_API_KEY')
PLAYHQ_SEASON_URL = config.get('playhq', 'PLAYHQ_SEASON_URL')

# Get a nice game date format, e.g.: Saturday August 06, 2022 (2022/08/06)
GAME_DATE_TIMESTAMP = pd.to_datetime(GAME_DATE).tz_localize(TIMEZONE)
GAME_DATE_NAME = GAME_DATE_TIMESTAMP.strftime("%A %B %d, %Y (%Y/%m/%d)")

# Create phq_club object
phq_club = phq.PlayHQ(CLUB_NAME, ORG_ID, X_API_KEY, X_TENANT, TIMEZONE)

season_id = phq_club.get_season_id(SEASON)
PLAYHQ_GAMES_URL = f"https://bv.playhq.com/org/{ORG_ID}/games?date={GAME_DATE_TIMESTAMP.strftime('%Y-%m-%d')}"

print(f"Sections in file {CONFIG_FILE}:", config.sections())

print(f"Club name: {CLUB_NAME} (org. id: {ORG_ID})")
print(f"Season: {SEASON} (season id: {season_id})")
print("X-tenant:", X_TENANT, "x-api-key:", X_API_KEY)
print("Output path:", OUTPUT_PATH)
print(f"Game date: {GAME_DATE_NAME} - Timezone: {TIMEZONE}")

print()
print("PlayHQ Club fixture:", PLAYHQ_SEASON_URL)
print("PlayHQ Admin games:", PLAYHQ_GAMES_URL)


Next, configure the game description templates that will be used later when generating the schedules.

In [None]:
DESC_BYE_TAPP = "Sorry, no game for the team in this round."
DESC_TAPP = """RSVP mandatory for the game.

Opponent: {opponent}
Venue: {venue} ({court})
Address: {address} {address_tips}
Google Maps coord: https://maps.google.com/?q={coord}

- Please ensure you arrive early and ready.
- Remember that shorts should have no pockets, and players should not wear bracelets or watches, as they are an injury risk.
- No food in the venue; please pick up your rubbish.
- Games will have 2x20 min halves.
- Each team needs to provide a scorer. TMs, please consider a roster.
- Players should not bring balls into the venue; game balls are provided by Magic in the coach's equipment bag.
- Beginner refs will be wearing green shirts. Please support and respect them through POSITIVE sideline behaviour.

Check the game in PlayHQ: {url_game}
Check the round in PlayHQ: {url_grade}
All clubs in PlayHQ: PLAYHQ_SEASON_URL
""".replace("PLAYHQ_SEASON_URL", PLAYHQ_SEASON_URL)

## 2. Get upcoming games for the club's teams

First, get the teams of the club.

In [None]:
teams_df = phq_club.get_season_teams(season_id)
teams_df

Next, extract all upcoming games for these club teams.

In [None]:
upcoming_games_df = phq_club.get_games(teams_df, GAME_DATE_TIMESTAMP)

if upcoming_games_df is not None:
    print(f'There were {upcoming_games_df.shape[0]} games extracted for game day: {GAME_DATE_NAME}')
    # display() is needed: an expression inside an if-block is not rendered automatically
    display(upcoming_games_df[phq.GAMES_COLS])
else:
    print("No games for date:", GAME_DATE_NAME)

## 3. Convert to TeamApp CSV format

Next, we convert the PlayHQ upcoming games to TeamApp format so we can produce a CSV file to be imported into TeamApp.

In [None]:
games_tapps_df = utils.to_teamsapp_schedule(upcoming_games_df, desc_template=DESC_TAPP, game_duration=45)
print("Done computing the games for Teams App")

games_tapps_df.sample(3)
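What `utils.to_teamsapp_schedule` does internally is not shown here. As a rough sketch of such a conversion, the snippet below flattens one game into a CSV row and derives the end time from a `game_duration` in minutes (45, as passed above). The column set, date format and helper names are assumptions for illustration; TeamApp's actual import template may use different headers.

```python
import csv
import datetime
import io
from typing import Dict, List

# Hypothetical column set for illustration; check TeamApp's import template for the real headers.
COLUMNS = ["team_name", "opponent", "start_date", "start_time", "end_time", "venue", "court", "description"]


def to_schedule_row(game: Dict, game_duration: int = 45) -> Dict[str, str]:
    """Flatten one game (with a `start` datetime) into a CSV row; end = start + duration."""
    start = game["start"]
    end = start + datetime.timedelta(minutes=game_duration)
    return {
        "team_name": game["team_name"],
        "opponent": game["opponent"],
        "start_date": start.strftime("%d/%m/%Y"),  # assumed date format
        "start_time": start.strftime("%H:%M"),
        "end_time": end.strftime("%H:%M"),
        "venue": game["venue"],
        "court": game["court"],
        "description": game.get("description", ""),
    }


def schedule_csv(rows: List[Dict[str, str]]) -> str:
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


game = {
    "team_name": "U16 Boys Diamond",
    "opponent": "Example Tigers",
    "start": datetime.datetime(2022, 8, 27, 9, 30),
    "venue": "Example Stadium",
    "court": "Court 2",
}
row = to_schedule_row(game)
print(schedule_csv([row]))
```

`csv.DictWriter` quotes fields that contain commas or newlines, which matters here because the game descriptions are multi-line.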

Inspect what the description of one of the games will look like:

In [None]:
# Inspect description game of one team
team = "U16 Boys Diamond"

print("Description for:", team)
print(games_tapps_df.query("team_name == @team")['description'].values[0])

## 4. Extract BYE games

We now extract the teams for which we couldn't scrape a game. In most cases this means a BYE for those teams.

In [None]:
print(f"Extract BYE games for games on {GAME_DATE_NAME}")

# Teams with no scraped game this round are assumed to be on a BYE
playing_teams = upcoming_games_df['team_id'].tolist() if upcoming_games_df is not None else []
bye_teams = teams_df.loc[~teams_df['id'].isin(playing_teams), 'name'].tolist()
# Keep the age-group part of the name (e.g., "U16 Boys Diamond"); fall back to the full name
bye_teams = [m.group(0) if (m := re.search(r"U.*", x)) else x for x in bye_teams]

if bye_teams:
    games_bye_df = utils.build_teamsapp_bye_schedule(bye_teams, GAME_DATE, DESC_BYE_TAPP)
    print(f"Bye teams ({len(bye_teams)}): ", bye_teams)
else:
    print("No BYE games this round...")

Finally, put together upcoming games and BYE games in a single table that will later be used to produce a CSV for TeamApp schedule import.

In [None]:
if bye_teams:
    team_apps_csv_df = pd.concat([games_tapps_df, games_bye_df])
    team_apps_csv_df.drop_duplicates(inplace=True)
    team_apps_csv_df.reset_index(inplace=True, drop=True)
else:
    team_apps_csv_df = games_tapps_df

team_apps_csv_df.sample(4)

## 5. Save to CSV file for TeamApp import

In this section, we produce the CSV file to be imported into TeamApp, as well as pickle files saving the computed dataframes.

### 5.1. Review the schedule

We start by reporting the games to be written into the schedule CSV file and **CHECKING THAT ALL IS GOOD TO GO!**

In particular, look for games that are scheduled but **PENDING** and missing some details (time or venue).
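Spotting incomplete games by eye is error-prone on a large club. A minimal sketch of an automated check, assuming a missing start time or venue shows up as an empty string or `None` (how PlayHQ marks a PENDING game is not shown here); the rows and `incomplete_games` helper are hypothetical.

```python
# Columns that must be filled in before a game can be published.
REQUIRED = ("start_time", "venue")


def incomplete_games(rows):
    """Return team names whose game is missing a start time or venue."""
    return [r["team_name"] for r in rows if any(not r.get(col) for col in REQUIRED)]


rows = [
    {"team_name": "U16 Boys Diamond", "start_time": "09:30", "venue": "Example Stadium"},
    {"team_name": "U10 Girls Gold", "start_time": "", "venue": "Example Stadium"},
    {"team_name": "U12 Girls Gold", "start_time": "11:00", "venue": None},
]
print(incomplete_games(rows))  # ['U10 Girls Gold', 'U12 Girls Gold']
```

Depending on how the BYE rows are built, they may also lack a venue and be flagged; those can be filtered out before the check.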

In [None]:
team_apps_csv_df.columns
team_apps_csv_df[['team_name', 'opponent', 'start_date', 'start_time', 'venue', 'court']]
# team_apps_csv_df

We stop execution here when running the whole notebook (e.g., with *Run All*).

In [None]:
raise SystemExit("Stop right there! Continue below to produce the CSV file if needed.")

### 5.2. Check changes with previous saves

If the schedule was generated before, check whether the new one differs from the one already saved.

First, let us define the files that we will save to disk.

In [None]:
import os

game_date_str = GAME_DATE.strftime('%Y_%m_%d')

file_csv = os.path.join(OUTPUT_PATH, f"schedule-teamsapp-{game_date_str}.csv")
file_upcoming_pkl = os.path.join(OUTPUT_PATH, f"upcoming_games_df-{game_date_str}.pkl")
file_team_apps_csv = os.path.join(OUTPUT_PATH, f"team_apps_csv_df-{game_date_str}.pkl")

Next, let's check if there was a saved file for the upcoming game day.

In [None]:
cols = ['team_name', 'opponent', 'start_date', 'start_time', 'venue', 'court']

changed_games_df = None
if os.path.exists(file_team_apps_csv):
    print("There was already a schedule saved, recovering it to compare...")
    old_team_apps_csv_df = pd.read_pickle(file_team_apps_csv)

    teams_changed = pd.concat([team_apps_csv_df[cols], old_team_apps_csv_df[cols]]).drop_duplicates(keep=False)['team_name'].unique()
    print("Teams whose games have changed (updated, new, dropped):", teams_changed)

    old_games_df = old_team_apps_csv_df[cols].query("team_name in @teams_changed")
    new_games_df = team_apps_csv_df[cols].query("team_name in @teams_changed")
    # Outer merge so that new and dropped games also show up (with NaN on the missing side)
    changed_games_df = new_games_df.merge(old_games_df, how="outer", on="team_name", suffixes=('_new', '_old'))

# Show changes if any...
(changed_games_df is not None) and changed_games_df
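The change detection above concatenates both schedules and calls `drop_duplicates(keep=False)`, which keeps only rows that are not identical in both: a symmetric difference. The sketch below expresses the same idea with Python sets on hypothetical rows. It is roughly equivalent; one difference is that the pandas version also discards a row that is duplicated within a single schedule.

```python
def changed_teams(new_rows, old_rows, cols):
    """Team names with a row present in only one schedule (updated, new or dropped games)."""
    new_set = {tuple(r[c] for c in cols) for r in new_rows}
    old_set = {tuple(r[c] for c in cols) for r in old_rows}
    team_idx = cols.index("team_name")
    # Symmetric difference: rows that are not identical in both schedules
    return sorted({row[team_idx] for row in new_set ^ old_set})


cols = ["team_name", "start_time", "venue"]
old = [
    {"team_name": "U10 Girls Gold", "start_time": "09:00", "venue": "Example Stadium"},
    {"team_name": "U16 Boys Diamond", "start_time": "10:00", "venue": "Example Stadium"},
]
new = [
    {"team_name": "U10 Girls Gold", "start_time": "09:30", "venue": "Example Stadium"},    # time moved
    {"team_name": "U16 Boys Diamond", "start_time": "10:00", "venue": "Example Stadium"},  # unchanged
    {"team_name": "U12 Girls Gold", "start_time": "11:00", "venue": "Example Stadium"},    # new game
]
print(changed_teams(new, old, cols))  # ['U10 Girls Gold', 'U12 Girls Gold']
```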

### 5.3. Write a TeamApp Schedule CSV & Dataframe Pickles

Finally, we save the data to a CSV file that can be imported into the [TeamApp schedule for all entries](https://brunswickmagicbasketball.teamapp.com/clubs/263995/events?_list=v1&team_id=all).

In [None]:
import datetime
import os
import shutil

now = datetime.datetime.now()  # current date and time
game_date_str = GAME_DATE.strftime('%Y_%m_%d')

# Create the output folder if needed
os.makedirs(OUTPUT_PATH, exist_ok=True)

print('Saving TeamApp schedule CSV file and dataframes for games:', GAME_DATE_NAME)

file_csv = os.path.join(OUTPUT_PATH, f"schedule-teamsapp-{game_date_str}.csv")
file_upcoming_pkl = os.path.join(OUTPUT_PATH, f"upcoming_games_df-{game_date_str}.pkl")
file_team_apps_csv = os.path.join(OUTPUT_PATH, f"team_apps_csv_df-{game_date_str}.pkl")

for f in [file_csv, file_upcoming_pkl, file_team_apps_csv]:
  if os.path.exists(f):
    print("Backup file", f)
    shutil.copy(f, f + ".bak")

print('File to save TeamApp schedule:', file_csv)
team_apps_csv_df.to_csv(file_csv, index=False)

print('Saving dataframe pickle:', file_upcoming_pkl)
upcoming_games_df.to_pickle(file_upcoming_pkl)
print('Saving dataframe pickle:', file_team_apps_csv)
team_apps_csv_df.to_pickle(file_team_apps_csv)

print(f"Finished saving csv and data-frame files: {now.strftime('%d/%m/%Y, %H:%M:%S')}")

# ------------ END FIXTURE PUBLISHING ------------

### Check a particular team

In [None]:
team = "U10 Girls Gold"

print(team_apps_csv_df.query("team_name == @team")['description'].values[0])
team_apps_csv_df.query("team_name == @team")[['team_name', 'opponent', 'start_date', 'start_time', 'venue', 'court']]
