# pyNBL: Basketball Statistic System for Australian NBL

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ssardina/pynbl/HEAD)

This notebook **incrementally** builds a set of stat tables from NBL basketball games:

1. A table of games played, with team names, points, venue, etc.
2. A table of (advanced) _stint lineup_ statistics for each game and each team. A **stint** is a lineup of players who are on court together, possibly over several separate intervals of the game. The stints of each team are extracted from the play-by-play data, and various statistics are computed for each of them.
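
To make the notion of a stint concrete, here is a minimal sketch of how lineups and their playing intervals could be derived from substitution events. The event format `(time_sec, player_out, player_in)`, the player labels, and the function name `build_stints` are hypothetical; the actual extraction lives in `bball_stats` and may work differently.

```python
from collections import defaultdict


def build_stints(starters, subs, game_end):
    """Group playing intervals by lineup (hypothetical event format).

    starters: iterable with the five starting players
    subs: list of (time_sec, player_out, player_in), sorted by time
    game_end: game length in seconds
    Returns a dict mapping each lineup (a frozenset of players)
    to the list of (start, end) intervals it played.
    """
    stints = defaultdict(list)
    lineup = set(starters)
    start = 0
    for time_sec, player_out, player_in in subs:
        if time_sec > start:
            # close the interval played by the current lineup
            stints[frozenset(lineup)].append((start, time_sec))
        lineup.discard(player_out)
        lineup.add(player_in)
        start = time_sec
    if game_end > start:
        # the last lineup plays until the end of the game
        stints[frozenset(lineup)].append((start, game_end))
    return dict(stints)


starters = ["A", "B", "C", "D", "E"]
subs = [(300, "A", "F"), (600, "F", "A")]
stints = build_stints(starters, subs, game_end=2400)
```

In this toy game the starting five form a single stint made of two intervals (before and after player `F` is on court), which is why one stint can cover several separate periods of play.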

Tables will be saved in CSV and Excel formats, as well as in [Pickle format](https://docs.python.org/3/library/pickle.html) for later recovery as pandas DataFrames.


The data for each game comes as a raw JSON file identified by the game id (e.g., `2087737`):

https://fibalivestats.dcd.shared.geniussports.com/data/2087737/data.json
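
As a rough sketch, the URL can be built from the game id and the file downloaded with the standard library alone. The names `BASE_URL`, `game_url`, and `fetch_game_json` are illustrative; the notebook itself relies on `tools.get_json_data`, whose implementation is not shown here.

```python
import json
from urllib.request import urlopen

# URL pattern taken from the example above, with the game id as placeholder
BASE_URL = "https://fibalivestats.dcd.shared.geniussports.com/data/{game_id}/data.json"


def game_url(game_id):
    """Build the URL of the raw JSON file for a game id."""
    return BASE_URL.format(game_id=game_id)


def fetch_game_json(game_id, timeout=30):
    """Download and parse the raw JSON data of a game (needs network access)."""
    with urlopen(game_url(game_id), timeout=timeout) as response:
        return json.load(response)


url = game_url(2087737)
```

Calling `fetch_game_json(2087737)` would return the parsed JSON as a Python dictionary, provided the game data is still available online.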

In [1]:
# Let's first load all required packages...
import os
from pathlib import Path
import pandas as pd
import numpy as np
import dtale

from config import *
import bball_stats
import tools


# Set folder with data files and Pickle tables saved on disk
DATA_DIR='data-21_22/'
FILES = dict()
FILES['stint_stats'] = Path(DATA_DIR, "stint_stats_df").with_suffix('.pkl')
FILES['stints'] = Path(DATA_DIR, "stints_df").with_suffix('.pkl')
FILES['games'] = Path(DATA_DIR, "games_df").with_suffix('.pkl')
FILES['players'] = Path(DATA_DIR, "players_df").with_suffix('.pkl')

## 1. Define games to scrape and saved data

First, set up the games we want to scrape and process, as well as the previously saved data (stored on file) that we want to append to.

In [4]:
# Games to be computed

#   Format: (game id, round number)
games_21_22 = [(1976446, 1), (1976447, 1), (1976448, 1), (1976452, 1), (1976454, 1), (2004608, 1), (2004609, 2), (1976449, 2), (1976451, 2), (1976453, 2), (1976455, 2), (1976458, 2), (1976456, 2), (1976457, 3), (1976459, 3), (1976460, 3), (1976461, 3), (1976462, 3), (1976463, 3), (1976464, 3), (2004610, 3), (1976465, 3), (1976468, 4), (1976469, 4), (1976474, 5), (1976473, 5), (1976482, 6), (2036215, 7), (2031329, 7), (2031330, 7), (2031332, 7), (2031333, 7), (2031334, 7), (2031335, 8), (2031336, 8), (2031337, 8), (2031338, 8), (2031340, 8), (2031341, 8), (2046695, 8), (2046696, 8), (2046697, 9), (2031342, 9), (2031343, 9), (2031344, 9), (2031345, 9), (2031346, 9), (2031347, 9), (2046698, 10), (2046700, 10), (2046701, 10), (2046702, 10), (2046703, 10), (2046704, 10), (2046706, 10), (2046707, 11), (2046709, 11), (2046710, 11), (2046711, 11), (2046712, 11), (2046713, 11), (2051763, 11), (2053811, 12), (2053812, 12), (2053813, 12), (2053814, 12), (2053815, 12), (2053816, 12), (2053817, 12), (2053818, 13), (2053819, 13), (2053820, 13), (2053821, 13), (2053822, 13), (2053823, 13), (2053824, 13), (2053825, 13), (2056454, 14), (2056455, 14), (2056457, 14), (2056458, 14), (2056461, 14), (2056462, 14), (2056460, 14), (2056463, 15), (2056464, 15), (2056466, 15), (2056467, 15), (2056469, 15), (2056471, 15), (2056472, 15), (2056473, 15), (2065653, 16), (2065654, 16), (2065655, 16), (2065656, 16), (2065657, 17), (2065658, 17), (2065659, 17), (2069165, 16), (2069166, 16), (2069167, 16), (2069168, 17), (2069169, 17), (2069170, 17), (2069171, 17), (2069172, 17), (2069175, 18), (2069177, 18), (2069179, 18), (2069181, 18), (2069183, 18), (2069184, 18), (2069186, 18), (2069187, 18), (2069191, 18), (2069192, 19), (2069194, 19), (2069196, 19), (2069199, 19), (2069202, 19), (2069203, 19), (2069204, 19), (2069190, 19), (2069193, 19), (2069195, 20), (2069197, 20), (2069198, 20), (2069200, 20), (2069201, 20), (2069205, 20), (2069173, 20), (2069174, 21), (2069176, 21), (2069178, 21), 
(2069180, 21), (2069182, 21), (2069188, 21), (2069189, 20)]

games_22_23 = [(2141127, 0), (2135117, 0), (2134935, 0), (2141126, 0), (2135116, 0), (2122060, 0), (2122059, 0)]

# games = [1976463]   # game with no "bugs" in subs
games = games_21_22

# Set to True to re-compute from scratch all tables
reload = True

## 2. Compute stat and game tables

Now, let us run the system that scrapes the games' data, computes stats and game info, and adds them to the initial tables of stats and games.

We start by loading all previously saved games, if any, since we want to append to that database without recomputing them.

In [5]:
# Load tables from saved files (if any)
saved_stint_stats_df = None
saved_stints_df = None
saved_games_df = None
saved_players_df = None
existing_games = []

if not reload:
    # load the stat dataframe already stored as a file
    print("Loading recorded dataframes from files")
    try:
        saved_stint_stats_df = pd.read_pickle(FILES['stint_stats'])
        saved_stints_df = pd.read_pickle(FILES['stints'])
        saved_games_df = pd.read_pickle(FILES['games'])
        saved_players_df = pd.read_pickle(FILES['players'])
        # collect game ids of all games recovered from file
        existing_games = saved_games_df.game_id.unique()
    except FileNotFoundError as e:
        print("Error loading Pickle files: ", e)
        saved_stint_stats_df = None
        saved_stints_df = None
        saved_games_df = None
        saved_players_df = None
        existing_games = []
else:
    existing_games = []

print(f"Recovered {len(existing_games)} games: ", existing_games)

# saved_stats_df['lineup'].apply(lambda x: len(x) > 5)
# saved_stats_df.loc[5,'lineup']
# saved_stats_df.loc[5]

# saved_stint_stats_df.sample(3)
# saved_games_df.sample(3)

Recovered 0 games:  []


It is now time to process games to extract:

1. Table of **games**.
2. Table of **players** who played in each game with their stats, for each team.
3. Table of **stints** in each game for each team.
4. Table of **stint stats** in each game for each team.
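
The incremental idea can be sketched on toy data: games whose ids are already in the saved table are skipped, and only the new records are appended. The helper `append_new_games` and the sample values are made up for illustration; the column names follow the games table built in this notebook.

```python
import pandas as pd


def append_new_games(saved_df, new_records):
    """Append records of games not yet present in saved_df (which may be None)."""
    existing = set() if saved_df is None else set(saved_df["game_id"])
    fresh = [rec for rec in new_records if rec["game_id"] not in existing]
    if not fresh:
        # nothing new to add: return what we had (or an empty table)
        return saved_df if saved_df is not None else pd.DataFrame()
    new_df = pd.DataFrame(fresh)
    if saved_df is None:
        return new_df
    return pd.concat([saved_df, new_df], ignore_index=True)


saved = pd.DataFrame([{"game_id": 1, "round": 1, "s1": 80, "s2": 70}])
new = [
    {"game_id": 1, "round": 1, "s1": 80, "s2": 70},  # already saved: skipped
    {"game_id": 2, "round": 1, "s1": 65, "s2": 90},
]
games = append_new_games(saved, new)
```

Using `ignore_index=True` in `pd.concat` gives the combined table a fresh 0..n-1 index, which has the same effect as calling `reset_index(drop=True)` afterwards.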

In [6]:
# collect here set of stat dfs and game info, one per game
#   then, we will put them together into different dataframes
stint_stats_dfs = []
stints_dfs = []
players_dfs = []
games_data = []

# Build data for each game, we'll put them together after....
for game in games:
    # get game_id and round no (if available)
    if isinstance(game, tuple):
        game_id, round_no = game
    else:
        game_id = game
        round_no = np.nan # no round info available

    # don't scrape game data if already loaded from file, skip it
    if game_id in existing_games:
        print(f"Game {game_id} was already saved on file; no scraping...")
        continue

    ##################################################################
    # !!! MAIN STEP: scrape and compute the actual stats for the game
    ##################################################################
    print(f"Computing game {game_id}...")

    # 1. Read game JSON file
    game_json = tools.get_json_data(game_id)

    result = bball_stats.build_game_stints_stats_df(game_json, game_id)
    game_stint_stats_df = result['stint_stats_df']   #  this is basically what we care, the stint stats
    game_stints_df = result['stints_df']
    game_team1, game_team2 = result['teams']

    # Add the game id column to game tables
    game_stint_stats_df.insert(0, 'game_id', game_id)
    game_stints_df.insert(0, 'game_id', game_id)

    # Extract players in the game
    players_df = bball_stats.get_players_stats(game_json)
    players_df.insert(0, 'game_id', game_id)

    # Add tables to collected set of tables, one per game
    stint_stats_dfs.append(game_stint_stats_df)
    stints_dfs.append(game_stints_df)
    players_dfs.append(players_df)

    # Next build the record for the game dataframe
    # first, extract date of game from HTML page
    try:
        game_info = tools.get_game_info(game_id)
    except Exception:  # game info could not be retrieved: record it as missing
        game_info = {"venue": np.nan, "date": np.nan}
    print(f"\t .... done: {game_team1[0]} ({game_team1[1]}) vs {game_team2[0]} ({game_team2[1]}) on {game_info['date']}")

    games_data.append({"game_id": game_id,
                        "date" : game_info['date'],
                        "round": round_no,
                        "team1": game_team1[0],
                        "team2": game_team2[0],
                        "s1": game_team1[1],
                        "s2": game_team2[1],
                        "winner": 1 if game_team1[1] > game_team2[1] else 2,
                        "venue" : game_info["venue"]}
                      )


#################################
# All games have been processed, now put all dfs together
#################################

# First, build a dataframe with all the game data collected
games_df = pd.DataFrame(games_data) if saved_games_df is None else pd.concat([saved_games_df, pd.DataFrame(games_data)])
games_df.reset_index(inplace=True, drop=True)

# Build players dataframe
players_df = pd.concat(players_dfs + ([saved_players_df] if saved_players_df is not None else []))
players_df.reset_index(inplace=True, drop=True)

# Build stint stats dataframe
stint_stats_df = pd.concat(stint_stats_dfs + ([saved_stint_stats_df] if saved_stint_stats_df is not None else []))
stint_stats_df.reset_index(inplace=True, drop=True)

# Build stints dataframe
stints_df = pd.concat(stints_dfs + ([saved_stints_df] if saved_stints_df is not None else []))
stints_df.reset_index(inplace=True, drop=True)

print("Number of games: ", games_df.shape[0])
stint_stats_df.sample(2)
stints_df.sample(2)
games_df.sample(5)
players_df.sample(5)

Computing game 1976446...




	 .... done: Tasmania JackJumpers (83) vs Brisbane Bullets (74) on 2021-12-03 00:00:00
Computing game 1976447...




	 .... done: Perth Wildcats (85) vs Adelaide 36ers (73) on 2021-12-03 00:00:00
Computing game 1976448...
	 .... done: South East Melbourne Phoenix (89) vs New Zealand Breakers (65) on 2021-12-04 00:00:00
Computing game 1976452...
	 .... done: Adelaide 36ers (71) vs Illawarra Hawks (81) on 2021-12-05 00:00:00
Computing game 1976454...
	 .... done: Sydney Kings (79) vs Melbourne United (74) on 2021-12-05 00:00:00
Computing game 2004608...
	 .... done: Perth Wildcats (90) vs Cairns Taipans (67) on 2021-12-05 00:00:00
Computing game 2004609...
	 .... done: Perth Wildcats (94) vs Brisbane Bullets (97) on 2021-12-12 00:00:00
Computing game 1976449...
	 .... done: Sydney Kings (84) vs Illawarra Hawks (92) on 2021-12-11 00:00:00
Computing game 1976451...
	 .... done: Cairns Taipans (69) vs Tasmania JackJumpers (62) on 2021-12-11 00:00:00
Computing game 1976453...
	 .... done: Melbourne United (86) vs South East Melbourne Phoenix (94) on 2021-12-12 00:00:00
Computing game 1976455...
	 .... done



	 .... done: Brisbane Bullets (83) vs New Zealand Breakers (88) on 2022-01-09 00:00:00
Computing game 2036215...




	 .... done: Illawarra Hawks (97) vs Sydney Kings (89) on 2022-01-13 00:00:00
Computing game 2031329...




	 .... done: New Zealand Breakers (78) vs Melbourne United (89) on 2022-01-14 00:00:00
Computing game 2031330...




	 .... done: Brisbane Bullets (100) vs South East Melbourne Phoenix (84) on 2022-01-15 00:00:00
Computing game 2031332...
	 .... done: Illawarra Hawks (84) vs Melbourne United (88) on 2022-01-16 00:00:00
Computing game 2031333...
	 .... done: Sydney Kings (75) vs New Zealand Breakers (82) on 2022-01-16 00:00:00
Computing game 2031334...




	 .... done: Adelaide 36ers (87) vs Perth Wildcats (74) on 2022-01-18 00:00:00
Computing game 2031335...
	 .... done: Brisbane Bullets (96) vs Sydney Kings (87) on 2022-01-21 00:00:00
Computing game 2031336...
	 .... done: Adelaide 36ers (78) vs Melbourne United (97) on 2022-01-22 00:00:00
Computing game 2031337...
	 .... done: Illawarra Hawks (78) vs Perth Wildcats (94) on 2022-01-22 00:00:00
Computing game 2031338...




	 .... done: Sydney Kings (97) vs Brisbane Bullets (73) on 2022-01-23 00:00:00
Computing game 2031340...




	 .... done: Tasmania JackJumpers (63) vs South East Melbourne Phoenix (76) on 2022-01-23 00:00:00
Computing game 2031341...
	 .... done: South East Melbourne Phoenix (87) vs Cairns Taipans (77) on 2022-01-25 00:00:00
Computing game 2046695...
	 .... done: Illawarra Hawks (100) vs Adelaide 36ers (89) on 2022-01-24 00:00:00
Computing game 2046696...
	 .... done: Brisbane Bullets (82) vs Melbourne United (84) on 2022-01-26 00:00:00
Computing game 2046697...
	 .... done: Illawarra Hawks (80) vs Perth Wildcats (94) on 2022-01-27 00:00:00
Computing game 2031342...
	 .... done: Tasmania JackJumpers (76) vs Adelaide 36ers (71) on 2022-01-28 00:00:00
Computing game 2031343...
	 .... done: Cairns Taipans (75) vs Illawarra Hawks (94) on 2022-01-29 00:00:00
Computing game 2031344...




	 .... done: Brisbane Bullets (73) vs South East Melbourne Phoenix (88) on 2022-01-29 00:00:00
Computing game 2031345...
	 .... done: Adelaide 36ers (88) vs Melbourne United (83) on 2022-01-30 00:00:00
Computing game 2031346...
	 .... done: Sydney Kings (96) vs Perth Wildcats (81) on 2022-01-30 00:00:00
Computing game 2031347...




	 .... done: New Zealand Breakers (59) vs Tasmania JackJumpers (83) on 2022-01-30 00:00:00
Computing game 2046698...
	 .... done: New Zealand Breakers (90) vs Illawarra Hawks (67) on 2022-02-02 00:00:00
Computing game 2046700...
	 .... done: Tasmania JackJumpers (77) vs Sydney Kings (70) on 2022-02-04 00:00:00
Computing game 2046701...




	 .... done: Brisbane Bullets (94) vs Cairns Taipans (102) on 2022-02-05 00:00:00
Computing game 2046702...
	 .... done: South East Melbourne Phoenix (79) vs Perth Wildcats (101) on 2022-02-05 00:00:00
Computing game 2046703...
	 .... done: Sydney Kings (84) vs New Zealand Breakers (65) on 2022-02-06 00:00:00
Computing game 2046704...
	 .... done: Melbourne United (85) vs Tasmania JackJumpers (94) on 2022-02-06 00:00:00
Computing game 2046706...
	 .... done: Illawarra Hawks (87) vs South East Melbourne Phoenix (88) on 2022-02-07 00:00:00
Computing game 2046707...




	 .... done: South East Melbourne Phoenix (87) vs Sydney Kings (92) on 2022-02-10 00:00:00
Computing game 2046709...
	 .... done: Brisbane Bullets (77) vs Adelaide 36ers (73) on 2022-02-11 00:00:00
Computing game 2046710...
	 .... done: Illawarra Hawks (87) vs Cairns Taipans (81) on 2022-02-12 00:00:00
Computing game 2046711...
	 .... done: Melbourne United (93) vs Perth Wildcats (87) on 2022-02-12 00:00:00
Computing game 2046712...
	 .... done: South East Melbourne Phoenix (83) vs Tasmania JackJumpers (71) on 2022-02-13 00:00:00
Computing game 2046713...
	 .... done: Sydney Kings (71) vs Brisbane Bullets (69) on 2022-02-13 00:00:00
Computing game 2051763...
	 .... done: Cairns Taipans (83) vs New Zealand Breakers (84) on 2022-02-14 00:00:00
Computing game 2053811...
	 .... done: Melbourne United (94) vs South East Melbourne Phoenix (87) on 2022-02-17 00:00:00
Computing game 2053812...
	 .... done: Illawarra Hawks (79) vs Cairns Taipans (54) on 2022-02-18 00:00:00
Computing game 205381



	 .... done: South East Melbourne Phoenix (98) vs Brisbane Bullets (94) on 2022-02-19 00:00:00
Computing game 2053814...
	 .... done: Sydney Kings (98) vs Perth Wildcats (95) on 2022-02-19 00:00:00
Computing game 2053815...
	 .... done: Adelaide 36ers (82) vs Cairns Taipans (71) on 2022-02-20 00:00:00
Computing game 2053816...
	 .... done: Melbourne United (108) vs New Zealand Breakers (73) on 2022-02-20 00:00:00
Computing game 2053817...
	 .... done: Illawarra Hawks (86) vs Tasmania JackJumpers (96) on 2022-02-20 00:00:00
Computing game 2053818...
	 .... done: Cairns Taipans (73) vs Brisbane Bullets (69) on 2022-02-24 00:00:00
Computing game 2053819...
	 .... done: Illawarra Hawks (87) vs Adelaide 36ers (71) on 2022-02-25 00:00:00
Computing game 2053820...
	 .... done: Brisbane Bullets (94) vs Tasmania JackJumpers (86) on 2022-02-26 00:00:00
Computing game 2053821...




	 .... done: South East Melbourne Phoenix (86) vs Perth Wildcats (80) on 2022-02-26 00:00:00
Computing game 2053822...
	 .... done: Adelaide 36ers (90) vs Sydney Kings (93) on 2022-02-27 00:00:00
Computing game 2053823...
	 .... done: Cairns Taipans (73) vs Melbourne United (89) on 2022-02-27 00:00:00
Computing game 2053824...
	 .... done: Tasmania JackJumpers (78) vs Perth Wildcats (89) on 2022-02-28 00:00:00
Computing game 2053825...
	 .... done: New Zealand Breakers (87) vs Illawarra Hawks (102) on 2022-03-01 00:00:00
Computing game 2056454...
	 .... done: Adelaide 36ers (76) vs South East Melbourne Phoenix (83) on 2022-03-04 00:00:00
Computing game 2056455...
	 .... done: Tasmania JackJumpers (66) vs New Zealand Breakers (62) on 2022-03-05 00:00:00
Computing game 2056457...
	 .... done: Melbourne United (95) vs Brisbane Bullets (83) on 2022-03-05 00:00:00
Computing game 2056458...
	 .... done: Adelaide 36ers (73) vs Perth Wildcats (92) on 2022-03-06 00:00:00
Computing game 2056461.



	 .... done: New Zealand Breakers (74) vs Brisbane Bullets (92) on 2022-03-07 00:00:00
Computing game 2056460...
	 .... done: Sydney Kings (98) vs Cairns Taipans (88) on 2022-03-06 00:00:00
Computing game 2056463...
	 .... done: Melbourne United (87) vs Perth Wildcats (97) on 2022-03-10 00:00:00
Computing game 2056464...
	 .... done: Cairns Taipans (69) vs Tasmania JackJumpers (85) on 2022-03-11 00:00:00
Computing game 2056466...
	 .... done: Brisbane Bullets (83) vs Perth Wildcats (95) on 2022-03-12 00:00:00
Computing game 2056467...




	 .... done: New Zealand Breakers (84) vs Adelaide 36ers (75) on 2022-03-12 00:00:00
Computing game 2056469...
	 .... done: Tasmania JackJumpers (81) vs Illawarra Hawks (77) on 2022-03-13 00:00:00
Computing game 2056471...




	 .... done: South East Melbourne Phoenix (90) vs Melbourne United (98) on 2022-03-13 00:00:00
Computing game 2056472...
	 .... done: Cairns Taipans (77) vs Sydney Kings (86) on 2022-03-13 00:00:00
Computing game 2056473...
	 .... done: New Zealand Breakers (102) vs Perth Wildcats (104) on 2022-03-14 00:00:00
Computing game 2065653...
	 .... done: Illawarra Hawks (103) vs South East Melbourne Phoenix (97) on 2022-03-17 00:00:00
Computing game 2065654...
	 .... done: Adelaide 36ers (83) vs Cairns Taipans (57) on 2022-03-18 00:00:00
Computing game 2065655...
	 .... done: Tasmania JackJumpers (65) vs Illawarra Hawks (91) on 2022-03-19 00:00:00
Computing game 2065656...
	 .... done: South East Melbourne Phoenix (89) vs Sydney Kings (91) on 2022-03-19 00:00:00
Computing game 2065657...
	 .... done: Melbourne United (101) vs Adelaide 36ers (74) on 2022-03-20 00:00:00
Computing game 2065658...




	 .... done: Brisbane Bullets (88) vs Cairns Taipans (98) on 2022-03-20 00:00:00
Computing game 2065659...
	 .... done: Perth Wildcats (95) vs New Zealand Breakers (85) on 2022-03-20 00:00:00
Computing game 2069165...
	 .... done: New Zealand Breakers (100) vs Brisbane Bullets (101) on 2022-03-24 00:00:00
Computing game 2069166...
	 .... done: Perth Wildcats (83) vs Tasmania JackJumpers (85) on 2022-03-24 00:00:00
Computing game 2069167...
	 .... done: Cairns Taipans (74) vs South East Melbourne Phoenix (86) on 2022-03-25 00:00:00
Computing game 2069168...
	 .... done: Brisbane Bullets (82) vs Tasmania JackJumpers (84) on 2022-03-26 00:00:00
Computing game 2069169...
	 .... done: Perth Wildcats (80) vs Sydney Kings (102) on 2022-03-26 00:00:00
Computing game 2069170...
	 .... done: Adelaide 36ers (100) vs South East Melbourne Phoenix (92) on 2022-03-27 00:00:00
Computing game 2069171...
	 .... done: Melbourne United (77) vs Illawarra Hawks (92) on 2022-03-27 00:00:00
Computing game 206



	 .... done: New Zealand Breakers (90) vs Cairns Taipans (93) on 2022-03-27 00:00:00
Computing game 2069175...
	 .... done: Illawarra Hawks (87) vs Brisbane Bullets (70) on 2022-03-31 00:00:00
Computing game 2069177...
	 .... done: Adelaide 36ers (72) vs Tasmania JackJumpers (80) on 2022-04-01 00:00:00
Computing game 2069179...
	 .... done: Melbourne United (90) vs Illawarra Hawks (96) on 2022-04-02 00:00:00
Computing game 2069181...
	 .... done: Cairns Taipans (90) vs South East Melbourne Phoenix (85) on 2022-04-02 00:00:00
Computing game 2069183...
	 .... done: Tasmania JackJumpers (83) vs Sydney Kings (103) on 2022-04-03 00:00:00
Computing game 2069184...
	 .... done: Brisbane Bullets (92) vs Adelaide 36ers (91) on 2022-04-03 00:00:00
Computing game 2069186...
	 .... done: New Zealand Breakers (77) vs Cairns Taipans (87) on 2022-04-04 00:00:00
Computing game 2069187...
	 .... done: Perth Wildcats (75) vs Melbourne United (84) on 2022-04-04 00:00:00
Computing game 2069191...
	 .... d



	 .... done: Adelaide 36ers (85) vs Brisbane Bullets (93) on 2022-04-11 00:00:00
Computing game 2069190...
	 .... done: Sydney Kings (82) vs Adelaide 36ers (90) on 2022-04-17 00:00:00
Computing game 2069193...
	 .... done: South East Melbourne Phoenix (80) vs Tasmania JackJumpers (84) on 2022-04-17 00:00:00
Computing game 2069195...
	 .... done: Perth Wildcats (106) vs Cairns Taipans (87) on 2022-04-16 00:00:00
Computing game 2069197...
	 .... done: Melbourne United (88) vs Brisbane Bullets (79) on 2022-04-16 00:00:00
Computing game 2069198...
	 .... done: New Zealand Breakers (86) vs Tasmania JackJumpers (88) on 2022-04-15 00:00:00
Computing game 2069200...
	 .... done: Perth Wildcats (70) vs Adelaide 36ers (82) on 2022-04-14 00:00:00
Computing game 2069201...
	 .... done: Illawarra Hawks (102) vs Sydney Kings (107) on 2022-04-14 00:00:00
Computing game 2069205...
	 .... done: New Zealand Breakers (70) vs Sydney Kings (76) on 2022-04-12 00:00:00
Computing game 2069173...
	 .... done: 



	 .... done: Sydney Kings (84) vs Illawarra Hawks (87) on 2022-04-24 00:00:00
Computing game 2069176...




	 .... done: New Zealand Breakers (60) vs Adelaide 36ers (93) on 2022-04-24 00:00:00
Computing game 2069178...
	 .... done: Cairns Taipans (112) vs Brisbane Bullets (98) on 2022-04-23 00:00:00
Computing game 2069180...




	 .... done: Tasmania JackJumpers (83) vs Melbourne United (61) on 2022-04-23 00:00:00
Computing game 2069182...
	 .... done: Perth Wildcats (77) vs Illawarra Hawks (82) on 2022-04-22 00:00:00
Computing game 2069188...
	 .... done: Cairns Taipans (77) vs Sydney Kings (87) on 2022-04-21 00:00:00
Computing game 2069189...
	 .... done: Melbourne United (92) vs Cairns Taipans (80) on 2022-04-18 00:00:00
Number of games:  139


| | game_id | tno | player | shirtNumber | sMinutes | sFieldGoalsMade | sFieldGoalsAttempted | sFieldGoalsPercentage | sThreePointersMade | sThreePointersAttempted | ... | photoT | photoS | playingPosition | starter | name | comp.sMinutesAverage | comp.sPointsAverage | comp.sReboundsTotalAverage | comp.sAssistsAverage | captain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1473 | 2053820 | 2 | Jarred Bairstow | 21 | 00:06:12 | 0 | 0 | 0 | 0 | 0 | ... | http://img.wh.sportingpulseinternational.com/d... | http://img.wh.sportingpulseinternational.com/d... | F | 0 | J. Bairstow | 8:01 | 1.7 | 1.8 | 0.4 | False |
| 1432 | 2053818 | 2 | Tyrell Harrison | 24 | 00:24:17 | 2 | 7 | 28 | 0 | 0 | ... | http://img.wh.sportingpulseinternational.com/c... | http://img.wh.sportingpulseinternational.com/c... | C | 0 | T. Harrison | 20:43 | 6.57 | 6.29 | 0.64 | False |
| 1916 | 2065654 | 2 | Keanu Pinder | 25 | 00:22:27 | 6 | 12 | 50 | 0 | 0 | ... | http://img.wh.sportingpulseinternational.com/4... | http://img.wh.sportingpulseinternational.com/4... | F | 1 | K. Pinder | 20:47 | 8.12 | 6.41 | 0.88 | False |
| 2357 | 2069187 | 2 | Yudai Baba | 18 | 00:17:44 | 1 | 2 | 50 | 1 | 2 | ... | http://img.wh.sportingpulseinternational.com/f... | http://img.wh.sportingpulseinternational.com/f... | G | 1 | Y. Baba | 17:53 | 4.0 | 0.0 | 1.5 | False |
| 169 | 1976451 | 1 | Bul Kuol | 42 | 00:15:11 | 1 | 4 | 25 | 1 | 4 | ... | http://img.wh.sportingpulseinternational.com/c... | http://img.wh.sportingpulseinternational.com/c... | F | 0 | B. Kuol | 13:11 | 3.0 | 1.5 | 1.0 | False |


If we want, we can run some sanity checks before saving to disk:

In [7]:
dtale.show(games_df)
dtale.show(stint_stats_df)
# dtale.show(stints_df)
# dtale.show(players_df)



In [9]:
import random
print("The shape of stats_df is:", stint_stats_df.shape)
# skip the first 4 columns and the last 49, keeping the block of per-team stats in between
stats_cols = list(stint_stats_df.columns[4:-49])
print("Stats cols:", stats_cols)

# build columns we want to show
cols = ['game_id' , 'tno', 'team', 'stint']
rnd_cols = random.sample(stats_cols, 8)
rnd_cols.extend([f"{x}_opp" for x in rnd_cols])
cols.extend(rnd_cols)

# show some sample of stats computed
stint_stats_df[cols].sample(5)

The shape of stats_df is: (4687, 99)
Stats cols: ['poss', 'ortg', 'drtg', 'nrtg', 'fga', 'fgm', 'fgp', 'pts', 'patra', 'patrm', 'patrp', '3pt_fga', '3pt_fgm', '3pt_fgp', '2pt_fga', '2pt_fgm', '2pt_fgp', 'fta', 'ftm', 'ftp', 'tsp', 'ast', 'astr', 'fgm_astp', 'stl', 'stlr', 'blk', 'blkr', 'tov', 'tovr', 'reb', 'dreb', 'drebc', 'drebp', 'oreb', 'odrec', 'orebp', 'trb', 'trbr', 'tov_bh', 'tov_bp', 'tov_ofoul', 'tov_3sec', 'tov_8sec', 'tov_24sec', 'opp_fga_blocked']


| | game_id | tno | team | stint | tsp | 3pt_fga | astr | fta | fga | reb | fgp | patrp | tsp_opp | 3pt_fga_opp | astr_opp | fta_opp | fga_opp | reb_opp | fgp_opp | patrp_opp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4041 | 2069202 | 2 | South East Melbourne Phoenix | 14 | 17.36 | 0.0 | 0.0 | 2.0 | 2.0 | 4.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 3.0 | 1.0 | 0.0 | 0.0 |
| 397 | 1976456 | 1 | Tasmania JackJumpers | 8 | 34.72 | 2.0 | 0.0 | 2.0 | 2.0 | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.0 | 2.0 | 0.0 | 0.0 |
| 2846 | 2056464 | 2 | Tasmania JackJumpers | 1 | 81.86 | 19.0 | 21.96 | 3.0 | 28.0 | 16.0 | 60.71 | 50.0 | 53.19 | 7.0 | 31.65 | 1.0 | 24.0 | 8.0 | 50.0 | 58.33 |
| 1230 | 2031338 | 2 | Brisbane Bullets | 18 | 61.48 | 0.0 | 0.0 | 1.0 | 2.0 | 0.0 | 50.0 | 0.0 | 50.0 | 1.0 | 0.0 | 0.0 | 2.0 | 1.0 | 50.0 | 0.0 |
| 200 | 2004608 | 2 | Cairns Taipans | 9 | 28.57 | 5.0 | 16.67 | 0.0 | 7.0 | 1.0 | 28.57 | 100.0 | 125.0 | 2.0 | 25.0 | 0.0 | 4.0 | 3.0 | 100.0 | 100.0 |


Sanity check: `(ortg, drtg)` (the team's offensive/defensive rating) should mirror `(drtg_opp, ortg_opp)` (the opponent's defensive/offensive rating):

In [10]:
# (ortg, drtg) should mirror (drtg_opp, ortg_opp)
stint_stats_df.iloc[4][['game_id' , 'team', 'poss', 'ortg', 'drtg', "poss_opp", "ortg_opp", "drtg_opp"]]

game_id                  1976446
team        Tasmania JackJumpers
poss                         3.0
ortg                      266.67
drtg                       100.0
poss_opp                     4.0
ortg_opp                   100.0
drtg_opp                  266.67
Name: 4, dtype: object
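
The same check can be run over every row rather than a single one. Below is a minimal sketch on toy data; the helper `check_ratings_mirror` and the tolerance of `0.01` (chosen because the ratings appear rounded to two decimals) are assumptions, not part of `bball_stats`.

```python
import numpy as np
import pandas as pd


def check_ratings_mirror(df, tol=0.01):
    """Return the rows where (ortg, drtg) does not mirror (drtg_opp, ortg_opp)."""
    # equal_nan=True: rows where both ratings are NaN count as matching
    ok = np.isclose(df["ortg"], df["drtg_opp"], atol=tol, equal_nan=True) & np.isclose(
        df["drtg"], df["ortg_opp"], atol=tol, equal_nan=True
    )
    return df[~ok]


# Toy rows: the first copies the values shown above, the second is deliberately broken.
toy = pd.DataFrame(
    {
        "ortg": [266.67, 110.0],
        "drtg": [100.0, 95.0],
        "ortg_opp": [100.0, 95.0],
        "drtg_opp": [266.67, 120.0],
    }
)
bad = check_ratings_mirror(toy)
```

On the real table, `check_ratings_mirror(stint_stats_df)` should come back empty if every stint is consistent; any row it returns is worth inspecting by hand.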

## 3. Save stats and games to files

We now save the full dataframes (stint stats, stints, games, and players) in various formats: binary (Pickle), CSV, and Excel.

This will allow us to re-load the data later and add more games to it more quickly.

In [11]:
import datetime
import os
import shutil
from pathlib import Path

# make a backup of existing tables on files
for pkl_file in FILES.values():
    for ext in ['.csv', '.xlsx', '.pkl']:
        file = Path(pkl_file).with_suffix(ext)
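        if file.exists():
            # print("Backup file", file)
            # keep the original extension in the backup name (e.g. games_df.csv.bak);
            # otherwise the three backups of a table would overwrite each other
            shutil.copy(file, file.with_suffix(ext + ".bak"))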

# dump stint stats dataframe
stint_stats_df.to_pickle(Path(DATA_DIR, "stint_stats_df").with_suffix(".pkl"))
stint_stats_df.to_csv(Path(DATA_DIR, "stint_stats_df").with_suffix(".csv"), index=False)
stint_stats_df.to_excel(Path(DATA_DIR, "stint_stats_df").with_suffix(".xlsx"), index=False)

# dump stints dataframe
stints_df.to_pickle(Path(DATA_DIR, "stints_df").with_suffix(".pkl"))
stints_df.to_csv(Path(DATA_DIR, "stints_df").with_suffix(".csv"), index=False)
stints_df.to_excel(Path(DATA_DIR, "stints_df").with_suffix(".xlsx"), index=False)

# dump game dataframe
games_df.to_pickle(Path(DATA_DIR, "games_df").with_suffix(".pkl"))
games_df.to_csv(Path(DATA_DIR, "games_df").with_suffix(".csv"), index=False)
games_df.to_excel(Path(DATA_DIR, "games_df").with_suffix(".xlsx"), index=False)

# dump players dataframe
players_df.to_pickle(Path(DATA_DIR, "players_df").with_suffix(".pkl"))
players_df.to_csv(Path(DATA_DIR, "players_df").with_suffix(".csv"), index=False)
players_df.to_excel(Path(DATA_DIR, "players_df").with_suffix(".xlsx"), index=False)

now = datetime.datetime.now() # current date and time
date_time = now.strftime("%m/%d/%Y, %H:%M:%S")
print(f"Finished saving at time: {date_time}")

Finished saving at time: 09/17/2022, 20:53:50
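
Since the same three calls are repeated for each table, the saving step could also be factored into a small helper. The sketch below is one possible shape, not the notebook's own code: `save_table` and its `formats` parameter are hypothetical, and backups keep the original extension (e.g. `games_df.csv.bak`) so that copies of different formats do not overwrite each other.

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd


def save_table(df, data_dir, name, formats=(".pkl", ".csv")):
    """Save df as <data_dir>/<name><ext> for each format, backing up old files first."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for ext in formats:
        path = data_dir / f"{name}{ext}"
        if path.exists():
            # keep the previous version next to the new one
            shutil.copy(path, path.with_name(path.name + ".bak"))
        if ext == ".pkl":
            df.to_pickle(path)
        elif ext == ".csv":
            df.to_csv(path, index=False)
        elif ext == ".xlsx":
            df.to_excel(path, index=False)  # needs an Excel engine such as openpyxl
        else:
            raise ValueError(f"Unsupported format: {ext}")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    demo = pd.DataFrame({"game_id": [1, 2], "round": [1, 1]})
    save_table(demo, tmp, "games_df")
    save_table(demo, tmp, "games_df")  # second call creates the .bak copies
    saved_names = sorted(p.name for p in Path(tmp).iterdir())
    reloaded = pd.read_pickle(Path(tmp) / "games_df.pkl")
```

With such a helper, saving a table in all three formats would be a single call, e.g. `save_table(games_df, DATA_DIR, "games_df", formats=(".pkl", ".csv", ".xlsx"))`.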


## 4. Inspection & analysis

We use the [dtale](https://pypi.org/project/dtale/) package for this.

In [None]:
dtale.show(stint_stats_df)
# dtale.show(stats_df[['tno', 'stint', 'poss', 'ortg', 'drtg', "poss_opp", "ortg_opp", "drtg_opp"]])

## 5. Some checks...

Check whether any stint lineup has more (or fewer) than 5 players. It can happen when the substitution data is inconsistent:

1. Game 2031329: player H. Besson comes out (wrongly?) at minute 10:00 of the 3rd period, but he keeps playing and then goes out again at 7:33.

In [None]:
stint_stats_df.shape
mask = stint_stats_df['lineup'].apply(lambda x: len(x) != 5)
stint_stats_df[mask]

# stats_df.iloc[941][['game_id', 'lineup']]
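
To see what the check above does, here is a small self-contained sketch on toy data. It assumes, as the `len(x)` call in the cell suggests, that each `lineup` entry is a collection of player names; the toy rows and the `n_players` column are made up.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "game_id": [1, 1, 1],
        "stint": [1, 2, 3],
        "lineup": [
            ("A", "B", "C", "D", "E"),
            ("A", "B", "C", "D", "E", "F"),  # six players: a substitution glitch
            ("A", "B", "C", "D"),            # four players
        ],
    }
)

# number of players in each lineup
sizes = toy["lineup"].apply(len)

# keep only the lineups that do not have exactly 5 players,
# and record their size to make the problem easier to spot
suspicious = toy[sizes != 5].assign(n_players=sizes[sizes != 5])
counts = suspicious["n_players"].tolist()
```

Adding the lineup size as a column makes it easy to tell apart lineups with a missing player from those with an extra one when inspecting the flagged rows.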