# BSS: Basketball Statistic System

This system aims to replicate the [euRobasketAu](https://github.com/jgalowe/euRobasketAu?organization=jgalowe&organization=jgalowe) R scripts in Python.

It scrapes the data and then converts the raw numbers into _advanced stats_.

The data is provided live by [Genius Sports](https://developer.geniussports.com/). The documentation for the Basketball feed can be found [here](https://developer.geniussports.com/livestats/tvfeed/index_basketball.html).

Messages are sent as JSON structures encoded in UTF-8.

An example of a raw JSON file:

https://fibalivestats.dcd.shared.geniussports.com/data/2087737/data.json
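
As a minimal sketch of what "JSON encoded in UTF-8" means in practice, the snippet below decodes a byte payload and parses it with the standard `json` module. The payload is hypothetical: its keys simply mirror column names from the play-by-play table shown later in this notebook and are not taken from the documented feed schema.

```python
import json

# Hypothetical message: the keys mirror play-by-play column names,
# not the documented feed schema.
raw_bytes = b'{"player": "J. Adams", "actionType": "2pt", "success": 0}'

# Decode the UTF-8 bytes to text, then parse the JSON structure into a dict.
message = json.loads(raw_bytes.decode("utf-8"))

print(message["player"], message["actionType"], message["success"])
```

Decoding explicitly makes the encoding assumption visible; `json.loads` also accepts UTF-8 bytes directly.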

In [1]:
# Let's first load all required packages...
import json  # https://docs.python.org/3/library/json.html
import os
import pandas as pd

# Load constants
from config import *
import tools

In [2]:
# Load relevant game data
# game_id = 742430
game_id = 2087737


pbp_df = tools.get_raw_pbp_fibalivestats(game_id)

# pbp_df.sample(20)
pbp_df.head()

Game data loaded from local file: data-2087737.json
Game Melbourne United (United) vs Tasmania JackJumpers (JackJumpers)


Unnamed: 0,team_name,team_short_name,clock,s1,s2,lead,tno,period,periodType,pno,player,success,actionType,actionNumber,previousAction,qualifier,subType,scoring
70,Melbourne United,JackJumpers,03:26:00,64,66,-2,1,4,REGULAR,13,M. Dellavedova,1,rebound,673,671.0,[],offensive,0
431,Melbourne United,JackJumpers,00:01:50,25,21,4,1,1,REGULAR,6,J. White,1,rebound,188,187.0,[],defensive,0
546,Tasmania JackJumpers,JackJumpers,08:06:00,3,0,3,2,1,REGULAR,11,J. Adams,0,2pt,24,,[pointsinthepaint],drivinglayup,1
232,Tasmania JackJumpers,JackJumpers,05:32:00,46,49,-3,2,3,REGULAR,8,S. McDaniel,1,substitution,458,,[],in,0
90,Tasmania JackJumpers,JackJumpers,04:44:00,63,66,-3,2,4,REGULAR,2,C. Steindl,1,foul,643,,"[shooting, 2freethrow]",personal,0
469,Melbourne United,JackJumpers,02:43:00,13,14,-1,1,1,REGULAR,14,A. Hukporti,1,block,141,140.0,[],,0
522,Melbourne United,JackJumpers,05:53:00,9,5,4,1,1,REGULAR,5,S. Ili,1,substitution,54,,[],out,0
384,Melbourne United,JackJumpers,06:53:00,29,24,5,1,2,REGULAR,6,J. White,1,foul,258,,"[shooting, 2freethrow]",personal,0
123,Tasmania JackJumpers,JackJumpers,07:28:00,59,62,-3,2,4,REGULAR,8,S. McDaniel,1,foul,598,,"[shooting, 2freethrow]",personal,0
102,Melbourne United,JackJumpers,05:36:00,63,66,-3,1,4,REGULAR,5,S. Ili,1,substitution,628,,[],in,0


In [24]:
# Check that no player name contains a digit or a comma.
# A raw string avoids the invalid-escape warning for \d;
# na=False treats rows without a player as non-matching.
pbp_df.loc[pbp_df['player'].str.contains(r'\d|,', na=False)]


Unnamed: 0,team_name,team_short_name,clock,s1,s2,lead,tno,period,periodType,pno,player,success,actionType,actionNumber,previousAction,qualifier,subType,scoring


Unnamed: 0,team_name,team_short_name,clock,s1,s2,lead,tno,period,periodType,pno,player,success,actionType,actionNumber,previousAction,qualifier,subType,scoring
563,,,10:00:00,0,0,0,0,1,REGULAR,0,,1,jumpball,4,,[],startperiod,0
564,,,10:00:00,0,0,0,0,1,REGULAR,0,,1,period,2,,[],start,0
565,,,10:00:00,0,0,0,0,1,REGULAR,0,,1,game,1,,[],start,0
561,Melbourne United,JackJumpers,09:56:00,0,0,0,1,1,REGULAR,10,J. Lual-Acuil Jr,1,jumpball,7,4,[],lost,0
562,Tasmania JackJumpers,JackJumpers,09:56:00,0,0,0,2,1,REGULAR,9,F. Krslovic,1,jumpball,6,4,[],won,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4,Tasmania JackJumpers,JackJumpers,00:11:60,71,76,-5,2,4,REGULAR,14,M. McIntosh,1,rebound,769,768,[],defensive,0
3,Tasmania JackJumpers,JackJumpers,00:06:80,71,76,-5,2,4,REGULAR,11,J. Adams,1,turnover,771,,[],badpass,0
2,Melbourne United,JackJumpers,00:05:10,73,76,-3,1,4,REGULAR,10,J. Lual-Acuil Jr,1,2pt,773,,"[fromturnover, pointsinthepaint]",dunk,1
0,,,00:00:00,73,76,-3,0,4,REGULAR,0,,1,game,776,,[confirmed],end,0


In [None]:
pbp_cols = list(pbp_df.columns)
pbp_cols

We convert some data types.

* `gt` and `clock`: we use [Timestamp](https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html), the pandas equivalent of `datetime`.
  * One could also consider using [Timedelta](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_timedelta.html).
  * `clock` uses the format `MM:SS:CC`, where `CC` is hundredths of a second. The `%f` directive (microseconds) pads the value with zeros on the right, so `00:05:10` becomes `00:00:05.100`, which is correct.
  * We can use `.dt.time` on a `datetime` column to extract just the time.
  * We could eventually pass `errors="coerce"` to get `NaT` instead of an exception for values that cannot be parsed.
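
The points above can be tried out on a few hand-written clock strings. This is only a sketch, not the notebook's actual cell: the `clocks` series is hypothetical (its valid values are copied from the outputs shown earlier, plus one deliberately malformed entry), and the `Timedelta` conversion via subtraction is one possible approach among others.

```python
import pandas as pd

# Hypothetical sample of `clock` strings in MM:SS:CC form, plus one malformed value.
clocks = pd.Series(["00:05:10", "09:56:00", "00:11:60", "bad"])

# %f reads "10" as a fraction of a second (0.10 s), i.e. 100000 microseconds.
# errors="coerce" turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(clocks, format="%M:%S:%f", errors="coerce")

# .dt.time keeps only the time-of-day part of each timestamp.
clock_time = parsed.dt.time

# Alternative: a Timedelta, which supports arithmetic on the game clock.
# Parsed values carry the default date 1900-01-01, so subtracting it
# leaves only the elapsed time.
clock_delta = parsed - pd.Timestamp("1900-01-01")

print(clock_time[0])                   # 00:00:05.100000
print(clock_delta[0].total_seconds())  # 5.1
print(parsed.isna().sum())             # 1 (the malformed entry)
```

A `Timedelta` column is convenient when clock values need to be subtracted or summed, whereas `datetime.time` objects do not support arithmetic.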

In [None]:
pbp_df['gt'] = pd.to_datetime(pbp_df['gt'], format="%M:%S").dt.time
pbp_df['clock'] = pd.to_datetime(pbp_df['clock'], format="%M:%S:%f").dt.time

# pd.to_datetime(pbp_df['clock'], format="%M:%S:%f").apply(lambda x: pd.Timestamp(x))


pbp_df.info()
# print(pbp_df['gt'].dtypes)

In [None]:
pbp_df.head()