# PlayHQ Fixture Scraping

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ssardina/tapp-fixture/blob/main/playhq_scrape.ipynb)

This system scrapes game fixtures from [PlayHQ](http://playhq.com/) via its Public [API](https://support.playhq.com/hc/en-au/sections/4405422358297-PlayHQ-APIs). It produces a CSV file ready to be uploaded as a Schedule in [TeamApp](https://brunswickmagicbasketball.teamapp.com/).

The *Public* APIs only require header parameters to get a successful response. The header includes the following components:

- `x-api-key` (also referred to as the Client ID) is provided by PlayHQ when you request access to the public API via their [support page](https://support.playhq.com/hc/en-au) or by emailing support@playhqsupport.zendesk.com. The key can be stored in a file `x_api_key.txt`; otherwise the notebook asks for it interactively. In many cases, creating new API credentials is disabled for regular users and can only be done by a Super Administrator role within the PlayHQ portal.
- `x-phq-tenant` usually refers to the sport/association; in this case `bv`.
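
As a rough illustration of the two header parameters described above, the sketch below builds the header dictionary and reads the key from `x_api_key.txt` when that file exists. The helper names `load_api_key` and `build_headers` are hypothetical and are not part of the notebook's `playhq` module, and the fallback value is a placeholder rather than a real key.

```python
from pathlib import Path


def load_api_key(path="x_api_key.txt"):
    """Return the API key stored in `path`, or None if the file is missing."""
    key_file = Path(path)
    if key_file.exists():
        return key_file.read_text().strip()
    return None


def build_headers(api_key, tenant="bv"):
    """Build the two header parameters the Public API expects."""
    return {"x-api-key": api_key, "x-phq-tenant": tenant}


# "my-client-id" is a placeholder (assumption), not a real key
headers = build_headers(load_api_key() or "my-client-id")
print(sorted(headers))
```

A dictionary like this is what an HTTP client would send with each request; the request itself is left out because the endpoints are covered by the reference documentation.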


Detailed reference documentation for PlayHQ API can be found [here](https://docs.playhq.com/tech).

**Contact:** Sebastian Sardina (sssardina@gmail.com)

In [1]:
# from IPython.core.interactiveshell import InteractiveShell
# InteractiveShell.ast_node_interactivity = "all"
import pandas as pd
import re
import os
import calendar, datetime
import dtale

import utils
import playhq as phq

## 1. Configuration and set-up

We first configure and set up the application. This means reading configuration variables from a config file and setting the game dates.

So, first of all, specify the following information:

1. Configuration file for the club and season.
2. Game dates interval to scrape.
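
The game date interval in item 2 is a start date plus a number of weeks. The sketch below shows one way to compute it with the standard library. `next_weekday` and `pretty_date` are hypothetical stand-ins written for this example; the notebook's own `utils.next_day` and `utils.pretty_date` may behave differently (for instance, on whether "next Saturday" includes today).

```python
import calendar
import datetime


def next_weekday(start, weekday):
    """Return the first date on or after `start` that falls on `weekday`."""
    days_ahead = (weekday - start.weekday()) % 7
    return start + datetime.timedelta(days=days_ahead)


def pretty_date(day):
    """Format a date as weekday, month, day, year, then a compact form."""
    return day.strftime("%A %B %d, %Y (%Y/%m/%d)")


# Start from the first Saturday on or after 10 November 2022, scrape 5 weeks
start = next_weekday(datetime.date(2022, 11, 10), calendar.SATURDAY)
end = start + datetime.timedelta(weeks=5)
print(pretty_date(start), "-", pretty_date(end))
```

Weekday and month names produced by `strftime` depend on the active locale, so the printed names are English only under the default locale.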

In [2]:
# Change this to import your club's own configuration
# from config_bmc import *
from config_cba import *

# Set the game date interval to scrape
GAME_DATE_START = datetime.date.today() # by default, any game after today
# GAME_DATE_START = utils.next_day(calendar.SATURDAY)   # start from next game day
# GAME_DATE_START = datetime.date(2022, 11, 10) # start on a specific day

WEEKS = 5   # how many weeks after date start we want to scrape
GAME_DATE_END = GAME_DATE_START + datetime.timedelta(days=WEEKS*7)


###############################################################
# DO NOT CHANGE FROM HERE
###############################################################

# Get nice game date format: Saturday August 06, 2022
GAME_DATE_START_TIMESTAMP = pd.to_datetime(GAME_DATE_START).tz_localize(TIMEZONE)
GAME_DATE_END_TIMESTAMP = pd.to_datetime(GAME_DATE_END).tz_localize(TIMEZONE)

GAME_DATE_START_NAME = utils.pretty_date(GAME_DATE_START_TIMESTAMP)
GAME_DATE_END_NAME = utils.pretty_date(GAME_DATE_END_TIMESTAMP)

# Create phq_club object
phq_club = phq.PlayHQ(CLUB_NAME, ORG_ID, X_API_KEY, X_TENANT, TIMEZONE, tapp_team_name, tapp_game_name)
season_id = phq_club.get_season_id(SEASON)

print(f"Club name: {CLUB_NAME} (org. id: {ORG_ID})")
print(f"Season: {SEASON} (season id: {season_id})")
print("X-tenant:", X_TENANT, "x-api-key:", X_API_KEY)
print("Output path:", OUTPUT_PATH)
print("Timezone:", TIMEZONE)
print(f"Game dates: {GAME_DATE_START_NAME} - {GAME_DATE_END_NAME}")
print("PlayHQ Club fixture:", PLAYHQ_SEASON_URL)

Club name: Coburg Giants Basketball Club (org. id: 5db1d983-5453-4b73-912a-457e72c273c3)
Season: 2023 (season id: 78824ad1-0ca6-46c5-9440-dbe20c948b2f)
X-tenant: bv x-api-key: f5d33c76-f858-49fa-8330-8e0e396219cd
Output path: CBA/fixture/
Timezone: Australia/Melbourne
Game dates: Saturday November 12, 2022 (2022/11/12) - Saturday December 17, 2022 (2022/12/17)
PlayHQ Club fixture: https://bit.ly/cba-vjbl23


## 2. Get upcoming games for club's teams

First, get the club's teams and sort them by age group.

In [3]:
teams_df = phq_club.get_season_teams(season_id)
teams_df.sort_values('age', ascending=True, inplace=True)
teams_df.reset_index(inplace=True, drop=True)

teams = teams_df['name'].values
print(f"Found {len(teams)} teams:", teams)

# teams_df
# teams_df.query("name == 'Coburg U12 Boys 1'")

Found 30 teams: ['Coburg U12 Boys 1' 'Coburg U12 Boys 2' 'Coburg U12 Boys 3'
 'Coburg U12 Boys 4' 'Coburg U12 Girls 1' 'Coburg U12 Girls 2'
 'Coburg U12 Girls 3' 'Coburg U14 Girls 4' 'Coburg U14 Girls 2'
 'Coburg U14 Girls 1' 'Coburg U14 Boys 5' 'Coburg U14 Girls 3'
 'Coburg U14 Boys 3' 'Coburg U14 Boys 2' 'Coburg U14 Boys 1'
 'Coburg U14 Boys 4' 'Coburg U16 Girls 4' 'Coburg U16 Girls 3'
 'Coburg U16 Girls 2' 'Coburg U16 Girls 1' 'Coburg U16 Boys 5'
 'Coburg U16 Boys 3' 'Coburg U16 Boys 2' 'Coburg U16 Boys 1'
 'Coburg U16 Boys 4' 'Coburg U18 Boys 4' 'Coburg U18 Boys 1'
 'Coburg U18 Boys 2' 'Coburg U18 Boys 3' 'Coburg U18 Girls 1']
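
In the output above, teams are grouped by age, but the order within an age group is not fixed (for example, U14 Girls 4, 2, 1), because sorting on the age alone leaves ties in whatever order they arrive. If a stable, readable order is wanted, a composite key can be used. The sketch below assumes team names follow the `U<age> <Boys|Girls> <number>` pattern seen above; `team_sort_key` is a hypothetical helper, not part of the notebook.

```python
import re


def team_sort_key(name):
    """Return (age, gender, number) parsed from a name like 'Coburg U14 Girls 2'."""
    match = re.search(r"U(\d+) (Boys|Girls) (\d+)", name)
    if match is None:
        return (float("inf"), "", 0)  # names that do not match go last
    age, gender, number = match.groups()
    return (int(age), gender, int(number))


names = [
    "Coburg U14 Girls 4",
    "Coburg U12 Boys 2",
    "Coburg U14 Boys 5",
    "Coburg U12 Boys 1",
]
ordered = sorted(names, key=team_sort_key)
print(ordered)
```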


Next, extract all games between the dates specified for these teams of the club.

In [4]:
upcoming_games_df = phq_club.get_games(teams_df, GAME_DATE_START_TIMESTAMP, GAME_DATE_END_TIMESTAMP)

if upcoming_games_df is not None:
    print(f'There were {upcoming_games_df.shape[0]} games extracted for game between {GAME_DATE_START_NAME} and {GAME_DATE_END_NAME}')
    upcoming_games_df[phq.GAMES_COLS]
else:
    print(f'No games between {GAME_DATE_START_NAME} and {GAME_DATE_END_NAME}')

2022-11-12 07:22:45 INFO Games extracted for team: Coburg U12 Boys 1
2022-11-12 07:22:46 INFO Games extracted for team: Coburg U12 Boys 2
2022-11-12 07:22:46 INFO Games extracted for team: Coburg U12 Boys 3
2022-11-12 07:22:46 INFO Games extracted for team: Coburg U12 Boys 4
2022-11-12 07:22:46 INFO Games extracted for team: Coburg U12 Girls 1
2022-11-12 07:22:47 INFO Games extracted for team: Coburg U12 Girls 2
2022-11-12 07:22:47 INFO Games extracted for team: Coburg U12 Girls 3
2022-11-12 07:22:48 INFO Games extracted for team: Coburg U14 Girls 4
2022-11-12 07:22:48 INFO Games extracted for team: Coburg U14 Girls 2
2022-11-12 07:22:48 INFO Games extracted for team: Coburg U14 Girls 1
2022-11-12 07:22:49 INFO Games extracted for team: Coburg U14 Boys 5
2022-11-12 07:22:49 INFO Games extracted for team: Coburg U14 Girls 3
2022-11-12 07:22:50 INFO Games extracted for team: Coburg U14 Boys 3
2022-11-12 07:22:50 INFO Games extracted for team: Coburg U14 Boys 2
2022-11-12 07:22:51 INFO Ga

There were 90 games extracted for game between Saturday November 12, 2022 (2022/11/12) and Saturday December 17, 2022 (2022/12/17)
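
Selecting games between two dates comes down to comparing timezone-aware timestamps. The sketch below illustrates that idea on plain dictionaries. It is not the implementation of `get_games`: the closed interval, the `games_in_window` helper, the fixed UTC+11 offset standing in for `Australia/Melbourne`, and the second game in the sample data are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+11 offset stands in for Australia/Melbourne in summer
TZ = timezone(timedelta(hours=11))


def games_in_window(games, start, end):
    """Keep games whose 'start' lies in the closed interval [start, end]."""
    return [game for game in games if start <= game["start"] <= end]


games = [
    {"team": "Coburg U12 Boys 1", "start": datetime(2022, 11, 18, 18, 40, tzinfo=TZ)},
    # made-up game outside the window, for illustration only
    {"team": "Coburg U12 Boys 1", "start": datetime(2022, 12, 23, 18, 40, tzinfo=TZ)},
]
window_start = datetime(2022, 11, 12, tzinfo=TZ)
window_end = datetime(2022, 12, 17, tzinfo=TZ)
selected = games_in_window(games, window_start, window_end)
print(len(selected), "game(s) in the window")
```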


In [5]:
dtale.show(upcoming_games_df)



## 3. Convert to TeamApp CSV format

Next, we convert the PlayHQ upcoming games to TeamApp format so we can produce a CSV file to be imported into TeamApp.

This step takes time, as it processes one game at a time and also obtains a short URL link for each game.

In [6]:
games_tapps_df = phq_club.to_teamsapp_schedule(upcoming_games_df, desc_template=DESC_TAPP, game_duration=45)
print("Done computing the games for Teams App")

# find out the game day, if all teams play on one single day
game_day = None
single_game_day = (games_tapps_df['start_date'].drop_duplicates().size == 1)
if single_game_day:
    game_day = games_tapps_df.iloc[0]['start_date']
    print("All games are in the following day:", utils.pretty_date(game_day))

Done computing the games for Teams App
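
To give a feel for what the conversion involves, the sketch below derives a TeamApp-style team name, event name, and end time from PlayHQ-style values. The functions are hypothetical and only mimic the naming visible in this notebook's outputs (such as `12.1 Boys` and `Game 12.1 Boys - Round 1`); the real mapping is supplied by `tapp_team_name` and `tapp_game_name` in the club's configuration file and may differ.

```python
import re
from datetime import datetime, timedelta


def short_team_name(playhq_name):
    """Turn 'Coburg U12 Boys 1' into '12.1 Boys'."""
    match = re.search(r"U(\d+) (Boys|Girls) (\d+)", playhq_name)
    if match is None:
        return playhq_name  # leave unexpected names untouched
    age, gender, number = match.groups()
    return f"{age}.{number} {gender}"


def event_name(team, round_name):
    """Build an event name such as 'Game 12.1 Boys - Round 1'."""
    return f"Game {team} - {round_name}"


def end_time(start, duration_min=45):
    """Add the game duration (in minutes) to the start time."""
    return start + timedelta(minutes=duration_min)


team = short_team_name("Coburg U12 Boys 1")
print(event_name(team, "Round 1"))
print(end_time(datetime(2022, 11, 18, 18, 40)).time())
```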


Inspect what the description of one of the games will look like:

In [7]:
# games_tapps_df.sample(3)
dtale.show(games_tapps_df)



In [8]:
# Inspect description game of one team
team = "12.1 Boys"
# team = "U12 Boys Purple"

print("Description for:", team)
print(games_tapps_df.query("team_name == @team")['description'].values[0])

Description for: 12.1 Boys
RSVP mandatory for the game.

Opponent: Waverley U12 Boys 1
Venue: Coburg Basketball Stadium (Court 1)
Address: 25 Outlook Road, Coburg North 
Google Maps coord: https://maps.google.com/?q=(-37.73315,144.97684)

- Please ensure you arrive early and ready.
- Games will have 4x10 min quarters.
- Each team needs to provide a scorer.
- Teamsheet fees are $125 per team (VC fee are $150); TEAMPAY is the CBA required teamsheet payment method only.

Check the game in PlayHQ: https://tinyurl.com/2398uogc
Check the round in PlayHQ: https://tinyurl.com/277g2lsl
Check all club's teams in PlayHQ: https://bit.ly/cba-vjbl23



## 4. Append BYE games (if necessary)

We generate BYE entries for TeamApp ***only*** if all the games are played on the same day.

In [10]:
bye_teams = False    # assume no bye games

if game_day is not None:
    game_day = games_tapps_df.iloc[0]['start_date']

    # Extract the date of the round
    # date = games_tapps_df.iloc[1]['start_date']
    print(f"Extract BYE games for games on {utils.pretty_date(game_day)}")

    playing_teams = upcoming_games_df['team_id'].tolist()
    bye_teams = teams_df.loc[~teams_df['id'].isin(playing_teams)]['name'].tolist()
    bye_teams = list(map(lambda x: re.search("U.*", x).group(0), bye_teams))

    if bye_teams:
        games_bye_df = phq_club.build_teamsapp_bye_schedule(bye_teams, game_day, DESC_BYE_TAPP)
        print(f"Bye teams ({len(bye_teams)}): ", bye_teams)

        games_tapps_df = pd.concat([games_tapps_df, games_bye_df])
        games_tapps_df.drop_duplicates(inplace=True)
        games_tapps_df.reset_index(inplace=True, drop=True)
    else:
        print("No BYE games this round...")
else:
    print("Games obtained are not on the same day, not computing BYE games...")

(bye_teams and games_bye_df)

Games obtained are not on the same day, not computing BYE games...


False
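
The BYE computation in the cell above is a set difference: the club's teams minus the teams that have a game, followed by trimming each name to start at the age group. The sketch below shows the same idea on plain Python data. The team ids and the `find_bye_teams` helper are made up for illustration.

```python
import re


def find_bye_teams(teams, playing_ids):
    """Return trimmed names (from 'U...' onwards) of teams without a game."""
    labels = []
    for team in teams:
        if team["id"] in playing_ids:
            continue
        match = re.search(r"U.*", team["name"])
        # fall back to the full name if there is no 'U' in it
        labels.append(match.group(0) if match else team["name"])
    return labels


# made-up ids, for illustration only
teams = [
    {"id": "t1", "name": "Coburg U12 Boys 1"},
    {"id": "t2", "name": "Coburg U12 Boys 2"},
    {"id": "t3", "name": "Coburg U14 Girls 1"},
]
byes = find_bye_teams(teams, playing_ids={"t1", "t3"})
print(byes)
```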

## 5. Final review

Finally, we report the games to be written into the Schedule CSV file and **CHECK THAT ALL IS GOOD TO GO!**

In particular, look for games that are scheduled but **PENDING** and missing details (time or venue).

In [12]:
games_tapps_df.columns
games_tapps_df[['event_name', 'team_name', 'opponent', 'start_date', 'start_time', 'venue', 'court']]
# games_tapps_df

| | event_name | team_name | opponent | start_date | start_time | venue | court |
|---|---|---|---|---|---|---|---|
| 0 | Game 12.1 Boys - Round 1 | 12.1 Boys | Waverley U12 Boys 1 | 2022-11-18 | 18:40:00 | Coburg Basketball Stadium | Court 1 |
| 1 | Game 12.1 Boys - Round 2 | 12.1 Boys | Camberwell U12 Boys 1 | 2022-11-25 | 18:30:00 | Balwyn High School | Court 3 |
| 2 | Game 12.1 Boys - Round 3 | 12.1 Boys | Blackburn U12 Boys 1 | 2022-12-02 | 19:30:00 | Boroondara Sports Complex | Court 4 |
| 3 | Game 12.2 Boys - Round 1 | 12.2 Boys | Nunawading U12 Boys 3 | 2022-11-18 | 18:40:00 | Coburg Basketball Stadium | Court 2 |
| 4 | Game 12.2 Boys - Round 2 | 12.2 Boys | Diamond Valley U12 Boys 3 | 2022-11-25 | 18:40:00 | Diamond Valley Sports and Fitness Centre | Court 2 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 85 | Game 18.3 Boys - Round 2 | 18.3 Boys | Broadmeadows U18 Boys 3 | 2022-11-25 | 20:40:00 | Coburg Basketball Stadium | Court 4 |
| 86 | Game 18.3 Boys - Round 3 | 18.3 Boys | Geelong United U18 Boys 5 | 2022-12-02 | 21:40:00 | AWA Alliance Bank Stadium | Court 3 |
| 87 | Game 18.1 Girls - Round 1 | 18.1 Girls | Diamond Valley U18 Girls 3 | 2022-11-18 | 21:40:00 | Coburg Basketball Stadium | Court 1 |
| 88 | Game 18.1 Girls - Round 2 | 18.1 Girls | Wallan U18 Girls 1 | 2022-11-25 | 21:40:00 | Mill Park Basketball Stadium | Court 2 |
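
Checking for games without a time or venue can be partly automated. The sketch below flags rows where a required field is empty. It assumes missing details show up as `None` or an empty string, which may not match how pending games actually appear in the data; `incomplete_games` and the second sample row are made up for the example.

```python
def incomplete_games(games, required=("start_time", "venue")):
    """Return the games that lack a value for any required field."""
    return [
        game for game in games
        if any(not game.get(field) for field in required)
    ]


games = [
    {"team_name": "12.1 Boys", "start_time": "18:40:00", "venue": "Coburg Basketball Stadium"},
    # made-up pending game with no venue yet
    {"team_name": "12.2 Boys", "start_time": "18:40:00", "venue": None},
]
pending = incomplete_games(games)
print([game["team_name"] for game in pending])
```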


We stop the execution here in case the whole Jupyter notebook is being run in one go.

In [None]:
raise SystemExit("Stop right there! Continue below to produce the CSV file if needed.")

## 6. Save to CSV file for TeamApp import

We are now ready to import into TeamApp.

### 6.2. Check changes with previous saves

If the schedule was generated before, check whether the new one differs from the one already saved.

First, let us define the files that we will save to disk.

In [13]:
now = datetime.datetime.now() # current date and time
now_str = now.strftime("%Y_%m_%d-%H:%M:%S")

id_file = now_str
if game_day is not None:    # there is one date for all games!
    id_file = utils.compact_date(game_day)

file_csv = os.path.join(OUTPUT_PATH, f"schedule-teamsapp-{id_file}.csv")
file_upcoming_pkl = os.path.join(OUTPUT_PATH, f"upcoming_games_df-{id_file}.pkl")
file_games_tapps = os.path.join(OUTPUT_PATH, f"games_tapps_df-{id_file}.pkl")

print("Files to save:")
print(file_csv)
print(file_upcoming_pkl)
print(file_games_tapps)

Files to save:
CBA/fixture/schedule-teamsapp-2022_11_12-07:35:44.csv
CBA/fixture/upcoming_games_df-2022_11_12-07:35:44.pkl
CBA/fixture/games_tapps_df-2022_11_12-07:35:44.pkl
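
Note that the timestamp-based names above contain `:` characters, which Windows does not allow in file names. If the notebook needs to run there, one option is a timestamp format without colons, sketched below. `file_id` is a hypothetical helper, not something the notebook defines.

```python
import datetime
import os


def file_id(moment):
    """Build a timestamp id without ':' so the name is also valid on Windows."""
    return moment.strftime("%Y_%m_%d-%H%M%S")


now = datetime.datetime(2022, 11, 12, 7, 35, 44)
file_csv = os.path.join("CBA", "fixture", f"schedule-teamsapp-{file_id(now)}.csv")
print(file_csv)
```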


Next, let's check if there was a saved file for the upcoming game day.

In [14]:
cols = ['team_name', 'opponent', 'start_date', 'start_time', 'venue', 'court']

changed_games_df = None
if os.path.exists(file_games_tapps):
    print("There was already a schedule saved, recovering it to compare...")
    old_games_tapps_df = pd.read_pickle(file_games_tapps)

    teams_changed = pd.concat([games_tapps_df[cols], old_games_tapps_df[cols]]).drop_duplicates(keep=False)['team_name'].unique()
    print("Teams whose games have changed (updated, new, dropped):", teams_changed)

    old_games_df = old_games_tapps_df[cols].query("team_name in @teams_changed")
    new_games_df = games_tapps_df[cols].query("team_name in @teams_changed")
    changed_games_df = new_games_df.merge(old_games_df, how="inner", on="team_name", suffixes=('_new', '_old'))

# Show changes if any...
changed_games_df
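
The comparison above keeps only rows that appear in exactly one of the two schedules, which is a symmetric difference. The sketch below plays the same role using Python sets of tuples. It is an illustration, not a drop-in replacement for the pandas code (duplicate rows within one schedule are handled differently), and the changed start time in the sample data is made up.

```python
COLS = ("team_name", "opponent", "start_date", "start_time", "venue", "court")


def changed_teams(new_games, old_games, cols=COLS):
    """Return sorted names of teams with a row present in only one schedule."""
    new_rows = {tuple(game[c] for c in cols) for game in new_games}
    old_rows = {tuple(game[c] for c in cols) for game in old_games}
    team_pos = cols.index("team_name")
    # '^' is the symmetric difference: rows in one set but not in both
    return sorted({row[team_pos] for row in new_rows ^ old_rows})


old = [
    dict(zip(COLS, ("12.1 Boys", "Waverley U12 Boys 1", "2022-11-18", "18:40:00", "Coburg Basketball Stadium", "Court 1"))),
    dict(zip(COLS, ("12.2 Boys", "Nunawading U12 Boys 3", "2022-11-18", "18:40:00", "Coburg Basketball Stadium", "Court 2"))),
]
# made-up change: the first game moves to a later start time
new = [dict(old[0], start_time="19:30:00"), old[1]]
print(changed_teams(new, old))
```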

### 6.3. Write a TeamApp Schedule CSV & DataFrame Pickles

Finally, we save the data to a CSV file that can be imported into the [Schedule of TeamApp for all entries](https://brunswickmagicbasketball.teamapp.com/clubs/263995/events?_list=v1&team_id=all).

In [15]:
import shutil

print('Saving TeamAPP schedule CSV file and Dataframes with id:', id_file)
for f in [file_csv, file_upcoming_pkl, file_games_tapps]:
  if os.path.exists(f):
    print("Backup file", f)
    shutil.copy(f, f + ".bak")

print('Saving CSV TeamApp schedule:', file_csv)
games_tapps_df.to_csv(file_csv, index=False)

print('Saving dataframe pickle:', file_upcoming_pkl)
upcoming_games_df.to_pickle(file_upcoming_pkl)
print('Saving dataframe pickle:', file_games_tapps)
games_tapps_df.to_pickle(file_games_tapps)

print(f"Finished saving CSV and DataFrame files: {now.strftime('%d/%m/%Y, %H:%M:%S')}")

Saving TeamAPP schedule CSV file and Dataframes with id: 2022_11_12-07:35:44
Saving CSV TeamApp schedule: CBA/fixture/schedule-teamsapp-2022_11_12-07:35:44.csv
Saving dataframe pickle: CBA/fixture/upcoming_games_df-2022_11_12-07:35:44.pkl
Saving dataframe pickle: CBA/fixture/games_tapps_df-2022_11_12-07:35:44.pkl
Finished saving CSV and DataFrame files: 12/11/2022, 07:35:44
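
The backup-then-write pattern used above can be tried in isolation. The sketch below uses only the standard library and a temporary folder, so it does not touch real schedule files. `save_with_backup` is a hypothetical helper and the rows are sample data.

```python
import csv
import os
import shutil
import tempfile


def save_with_backup(path, rows, fieldnames):
    """Copy an existing file to '<path>.bak', then write rows as CSV."""
    if os.path.exists(path):
        shutil.copy(path, path + ".bak")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "schedule.csv")
    fields = ["team_name", "start_date"]
    # the first save creates the file; the second save triggers the backup
    save_with_backup(target, [{"team_name": "12.1 Boys", "start_date": "2022-11-18"}], fields)
    save_with_backup(target, [{"team_name": "12.1 Boys", "start_date": "2022-11-25"}], fields)
    saved_files = sorted(os.listdir(folder))
    print(saved_files)
```

Only one backup generation is kept this way: each save overwrites the previous `.bak` file.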


# ------------ END FIXTURE PUBLISHING ------------

### Check a particular team

In [None]:
team = "U10 Girls Gold"

print(games_tapps_df.query("team_name == @team")['description'].values[0])
games_tapps_df.query("team_name == @team")[['team_name', 'opponent', 'start_date', 'start_time', 'venue', 'court']]
