# Jump Ball in an NBA Game

In this notebook I use Python to predict which player in an NBA game is more likely to win the opening jump ball.

https://sportsdatabase.com/NBA/query.html?sdql=&submit=++S+D+Q+L+%21++

The link above is a very useful source of sports stats, but it doesn't help in this case because it has no information about jump balls.

I am going to try `nba_api`, an API client for nba.com, and see if it works.

In [1]:
pip install nba_api


Collecting nba_api
  Downloading nba_api-1.4.1-py3-none-any.whl (261 kB)
     -------------------------------------- 261.7/261.7 kB 7.9 MB/s eta 0:00:00
Collecting certifi<2024.0.0,>=2023.7.22
  Downloading certifi-2023.11.17-py3-none-any.whl (162 kB)
     -------------------------------------- 162.5/162.5 kB 9.5 MB/s eta 0:00:00
Collecting requests<3.0,>=2.31
  Downloading requests-2.31.0-py3-none-any.whl (62 kB)
     -------------------------------------- 62.6/62.6 kB 556.7 kB/s eta 0:00:00
Collecting numpy<2.0.0,>=1.22.2
  Downloading numpy-1.26.2-cp39-cp39-win_amd64.whl (15.8 MB)
     --------------------------------------- 15.8/15.8 MB 22.5 MB/s eta 0:00:00
Installing collected packages: numpy, certifi, requests, nba_api
  Attempting uninstall: numpy
    Found existing installation: numpy 1.21.5
    Uninstalling numpy-1.21.5:
      Successfully uninstalled numpy-1.21.5
  Attempting uninstall: certifi
    Found existing installation: certifi 2022.9.14
    Uninstalling certifi-2022.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
daal4py 2021.6.0 requires daal==2021.4.0, which is not installed.
scipy 1.9.1 requires numpy<1.25.0,>=1.18.5, but you have numpy 1.26.2 which is incompatible.
numba 0.55.1 requires numpy<1.22,>=1.18, but you have numpy 1.26.2 which is incompatible.
conda-repo-cli 1.0.20 requires clyent==1.2.1, but you have clyent 1.2.2 which is incompatible.
conda-repo-cli 1.0.20 requires nbformat==5.4.0, but you have nbformat 5.5.0 which is incompatible.
conda-repo-cli 1.0.20 requires requests==2.28.1, but you have requests 2.31.0 which is incompatible.


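pip reports dependency conflicts: installing nba_api upgraded numpy and requests to versions newer than some packages already in my environment (scipy, numba, conda-repo-cli) allow. The conflicts are in those other packages, not in nba_api itself. As I don't know how to resolve them cleanly, I am going to skip that for now and simply check whether nba_api works.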
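To see which versions actually ended up installed after a warning like this, the standard library's `importlib.metadata` is enough. This is a small sketch; `installed_versions` is a hypothetical helper, and the package names are simply the ones mentioned in the pip messages above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Return {package name: installed version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages mentioned in the pip messages above
for name, ver in installed_versions(["numpy", "requests", "scipy", "numba", "nba_api"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

Comparing the printed versions with the ranges in the messages (for example, scipy 1.9.1 wanting `numpy<1.25.0`) shows which packages might misbehave. Installing nba_api in a separate virtual environment (`python -m venv`) is one common way to keep such pins apart.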

In [2]:
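from nba_api.stats.endpoints import playercareerstats

# Nikola Jokić
career = playercareerstats.PlayerCareerStats(player_id='203999')

# pandas DataFrames, one per result set (optional: pip install pandas);
# [0] is the first result set, SeasonTotalsRegularSeason
career_df = career.get_data_frames()[0]

# Raw JSON string (only a cell's last expression is displayed, hence the output below)
career.get_json()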

'{"resource": "playercareerstats", "parameters": {"PerMode": "Totals", "PlayerID": 203999, "LeagueID": null}, "resultSets": [{"name": "SeasonTotalsRegularSeason", "headers": ["PLAYER_ID", "SEASON_ID", "LEAGUE_ID", "TEAM_ID", "TEAM_ABBREVIATION", "PLAYER_AGE", "GP", "GS", "MIN", "FGM", "FGA", "FG_PCT", "FG3M", "FG3A", "FG3_PCT", "FTM", "FTA", "FT_PCT", "OREB", "DREB", "REB", "AST", "STL", "BLK", "TOV", "PF", "PTS"], "rowSet": [[203999, "2015-16", "00", 1610612743, "DEN", 21.0, 80, 55, 1733.0, 307, 600, 0.512, 28, 84, 0.333, 154, 190, 0.811, 181, 379, 560, 189, 79, 50, 104, 208, 796], [203999, "2016-17", "00", 1610612743, "DEN", 22.0, 73, 59, 2038.0, 494, 854, 0.578, 45, 139, 0.324, 188, 228, 0.825, 212, 506, 718, 359, 61, 55, 171, 214, 1221], [203999, "2017-18", "00", 1610612743, "DEN", 23.0, 75, 73, 2443.0, 504, 1010, 0.499, 111, 280, 0.396, 266, 313, 0.85, 195, 608, 803, 458, 90, 61, 210, 212, 1385], [203999, "2018-19", "00", 1610612743, "DEN", 24.0, 80, 80, 2504.0, 616, 1206, 0.511, 

Well, apparently it works. Let's explore it a little bit.
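The JSON above has a simple layout: each entry in `resultSets` has a `name`, a list of `headers` and a `rowSet` of rows. As a sketch, a result set can be turned into one dict per row with the standard library alone. `result_set_rows` is a hypothetical helper, not part of `nba_api`; `get_data_frames()` does essentially the same job with pandas.

```python
import json


def result_set_rows(payload, set_name):
    """Return the rows of one named result set as a list of dicts."""
    data = json.loads(payload) if isinstance(payload, str) else payload
    for result_set in data["resultSets"]:
        if result_set["name"] == set_name:
            headers = result_set["headers"]
            return [dict(zip(headers, row)) for row in result_set["rowSet"]]
    raise KeyError(f"no result set named {set_name!r}")


# Tiny excerpt in the same shape as career.get_json() above (columns trimmed)
sample = json.dumps({
    "resultSets": [{
        "name": "SeasonTotalsRegularSeason",
        "headers": ["PLAYER_ID", "SEASON_ID", "TEAM_ABBREVIATION", "PTS"],
        "rowSet": [[203999, "2015-16", "DEN", 796], [203999, "2016-17", "DEN", 1221]],
    }]
})
for row in result_set_rows(sample, "SeasonTotalsRegularSeason"):
    print(row["SEASON_ID"], row["PTS"])
```

With the real data, `result_set_rows(career.get_json(), "SeasonTotalsRegularSeason")` would return every season in the same form.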

In [3]:
from nba_api.live.nba.endpoints import scoreboard
import pandas as pd

# Today's scoreboard
games = scoreboard.ScoreBoard()
data_org = games.get_dict()["scoreboard"]["games"]

rows = []
for game in data_org:
    home, away = game["homeTeam"], game["awayTeam"]
    if home["score"] > away["score"]:
        winner = home["teamName"]
    elif away["score"] > home["score"]:
        winner = away["teamName"]
    else:
        winner = None  # tied or not started yet (e.g. 0-0), so no winner
    rows.append({
        "gameCode": game["gameCode"],
        "homeTeam": home["teamName"],
        "awayTeam": away["teamName"],
        "result": f"{home['score']}-{away['score']}",
        "winner": winner,
    })

# Passing the columns keeps the headers even when there are no games today
pd.DataFrame(rows, columns=["gameCode", "homeTeam", "awayTeam", "result", "winner"])

| gameCode | homeTeam | awayTeam | result | winner |
|---|---|---|---|---|


The previous DataFrame also has useful information, but 1) it only covers games played "today" and 2) it has no information about jump balls.

I haven't found a dataset with that information yet, but for now this page has good data: https://www.degendata.com/data

In [4]:
pip install nba_py

Note: you may need to restart the kernel to use updated packages.


In [None]:
from nba_py import game

# game.Boxscore("0022300258")

This module looked promising, but for some reason the call keeps loading and never returns anything. Still, it gave me a good idea: look at the play-by-play and take the opening jump ball to see who wins it.

In [4]:
gameID = "0022300260"
# Live play-by-play feed for a single game from the NBA CDN
initial_df = pd.read_json(f"https://cdn.nba.com/static/json/liveData/playbyplay/playbyplay_{gameID}.json")

In [5]:
initial_df

| | meta | game |
|---|---|---|
| version | 1 | |
| code | 200 | |
| request | http://nba.cloud/games/0022300260/playbyplay?F... | |
| time | 2023-11-29 23:55:52.646780 | |
| gameId | | 0022300260 |
| actions | | [{'actionNumber': 2, 'clock': 'PT12M00.00S', '... |


In [6]:
playbyplay = initial_df.loc["actions"]["game"]

In [7]:
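# Take the first start-of-period jump ball in period 1, i.e. the opening tip.
# Overtime periods also start with a jump ball, so keeping the last match could pick an overtime tip.
for play in playbyplay:
    if play["actionType"] == "jumpball" and play["descriptor"] == "startperiod" and play["period"] == 1:
        jb_data = [play]
        break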
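For reuse across many games, the same search can be wrapped in a function that stops at the first period-1 match, returns only the four fields used later, and returns `None` when a feed has no opening tip (instead of leaving `jb_data` undefined or stale). This is a sketch: `opening_jump_ball` is a hypothetical name, and the field names are the ones visible in the `jb_data` output below.

```python
def opening_jump_ball(actions):
    """Return winner/loser info for the opening tip, or None if none is found.

    `actions` is the list of play-by-play action dicts from the NBA CDN feed.
    """
    for action in actions:
        if (action.get("actionType") == "jumpball"
                and action.get("descriptor") == "startperiod"
                and action.get("period") == 1):
            return {
                "jumpBallWonPlayerName": action["jumpBallWonPlayerName"],
                "jumpBallWonPersonId": action["jumpBallWonPersonId"],
                "jumpBallLostPlayerName": action["jumpBallLostPlayerName"],
                "jumpBallLostPersonId": action["jumpBallLostPersonId"],
            }
    return None  # e.g. a feed without an opening jump-ball action


# Minimal fake feed: the opening tip of game 0022300260 plus a made-up overtime tip
actions = [
    {"actionType": "jumpball", "descriptor": "startperiod", "period": 1,
     "jumpBallWonPlayerName": "Nurkic", "jumpBallWonPersonId": 203994,
     "jumpBallLostPlayerName": "Poeltl", "jumpBallLostPersonId": 1627751},
    {"actionType": "jumpball", "descriptor": "startperiod", "period": 5,  # made up
     "jumpBallWonPlayerName": "Poeltl", "jumpBallWonPersonId": 1627751,
     "jumpBallLostPlayerName": "Nurkic", "jumpBallLostPersonId": 203994},
]
print(opening_jump_ball(actions)["jumpBallWonPlayerName"])  # Nurkic
```

In the notebook, `opening_jump_ball(playbyplay)` would return the same four fields as `jb_data[0]`.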

In [8]:
jb_data

[{'actionNumber': 4,
  'clock': 'PT11M55.00S',
  'timeActual': '2023-11-30T00:40:47.6Z',
  'period': 1,
  'periodType': 'REGULAR',
  'teamId': 1610612756,
  'teamTricode': 'PHX',
  'actionType': 'jumpball',
  'subType': 'recovered',
  'descriptor': 'startperiod',
  'qualifiers': [],
  'personId': 1626164,
  'x': None,
  'y': None,
  'possession': 1610612756,
  'scoreHome': '0',
  'scoreAway': '0',
  'edited': '2023-11-30T00:40:47Z',
  'orderNumber': 40000,
  'isTargetScoreLastPeriod': False,
  'xLegacy': None,
  'yLegacy': None,
  'isFieldGoal': 0,
  'jumpBallRecoveredName': 'D. Booker',
  'jumpBallRecoverdPersonId': 1626164,
  'side': None,
  'playerName': 'Booker',
  'playerNameI': 'D. Booker',
  'personIdsFilter': [1626164, 203994, 1627751],
  'jumpBallWonPlayerName': 'Nurkic',
  'jumpBallWonPersonId': 203994,
  'description': 'Jump Ball J. Nurkic vs. J. Poeltl: Tip to D. Booker',
  'jumpBallLostPlayerName': 'Poeltl',
  'jumpBallLostPersonId': 1627751}]

In [9]:
jumpBallWonPlayerName = []
jumpBallWonPersonId = []
jumpBallLostPlayerName = []
jumpBallLostPersonId = []
index = []

In [10]:
jumpBallWonPlayerName.append(jb_data[0]["jumpBallWonPlayerName"])
jumpBallWonPersonId.append(jb_data[0]["jumpBallWonPersonId"])
jumpBallLostPlayerName.append(jb_data[0]["jumpBallLostPlayerName"])
jumpBallLostPersonId.append(jb_data[0]["jumpBallLostPersonId"])
index.append(gameID)

In [11]:
jb_dict = {"jumpBallWonPlayerName":jumpBallWonPlayerName,"jumpBallWonPersonId":jumpBallWonPersonId,"jumpBallLostPlayerName":jumpBallLostPlayerName,"jumpBallLostPersonId":jumpBallLostPersonId}

In [12]:
jb = pd.DataFrame(jb_dict,index=index)

In [13]:
jb

| | jumpBallWonPlayerName | jumpBallWonPersonId | jumpBallLostPlayerName | jumpBallLostPersonId |
|---|---|---|---|---|
| 22300260 | Nurkic | 203994 | Poeltl | 1627751 |


This works for a single game, but we need data for the whole season, so we need every game ID of this season.

In [14]:
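from nba_api.stats.endpoints import leaguegamefinder

gamefinder = leaguegamefinder.LeagueGameFinder()
games = gamefinder.get_data_frames()[0]
# SEASON_ID ends with the season's starting year: '2023' is the 2023-24 season
games_2324 = games[games.SEASON_ID.str[-4:] == '2023']
# Keep NBA games only (IDs start with league code '00'); filter on games_2324 itself
# so the boolean mask matches the DataFrame's index
games_2324 = games_2324[games_2324.GAME_ID.str[:2] == '00']
games_2324.head()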

  games_2324 = games_2324[games.GAME_ID.str[:2] == '00']


| | SEASON_ID | TEAM_ID | TEAM_ABBREVIATION | TEAM_NAME | GAME_ID | GAME_DATE | MATCHUP | WL | MIN | PTS | ... | FT_PCT | OREB | DREB | REB | AST | STL | BLK | TOV | PF | PLUS_MINUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | 62023 | 1610612747 | LAL | Los Angeles Lakers | 62300001 | 2023-12-09 | LAL vs. IND | W | 242 | 123 | ... | 0.771 | 12 | 43 | 55 | 25 | 5 | 10 | 18 | 25 | 14.0 |
| 16 | 62023 | 1610612754 | IND | Indiana Pacers | 62300001 | 2023-12-09 | IND @ LAL | L | 238 | 109 | ... | 0.879 | 9 | 23 | 32 | 27 | 10 | 7 | 9 | 33 | -14.0 |
| 18 | 22023 | 1610612752 | NYK | New York Knicks | 22301227 | 2023-12-08 | NYK @ BOS | L | 240 | 123 | ... | 0.741 | 12 | 32 | 44 | 28 | 4 | 3 | 13 | 20 | -10.0 |
| 20 | 22023 | 1610612760 | OKC | Oklahoma City Thunder | 22301222 | 2023-12-08 | OKC vs. GSW | W | 265 | 138 | ... | 0.861 | 11 | 28 | 39 | 15 | 17 | 7 | 9 | 22 | 2.0 |
| 21 | 22023 | 1610612748 | MIA | Miami Heat | 22301220 | 2023-12-08 | MIA vs. CLE | L | 239 | 99 | ... | 0.667 | 7 | 30 | 37 | 23 | 5 | 5 | 18 | 23 | -12.0 |
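The two filters above rely on how the IDs are built. A game ID has ten characters: a two-character league code (`00` is the NBA, which the `GAME_ID.str[:2] == '00'` filter keeps), one digit for the season type, two digits for the season's starting year, and a five-digit game number. The season-type digits in this sketch follow a commonly used convention that this notebook does not verify, so treat them as an assumption. They are consistent with the table above, where game `0062300001` (shown without its leading zeros) has `SEASON_ID` 62023. `decode_game_id` is a hypothetical helper.

```python
SEASON_TYPES = {  # assumed meaning of the third character of a game ID
    "1": "preseason",
    "2": "regular season",
    "3": "all-star",
    "4": "playoffs",
    "5": "play-in",
    "6": "in-season tournament",
}


def decode_game_id(game_id):
    """Split a 10-character NBA game ID such as '0022300260' into its parts."""
    game_id = str(game_id).zfill(10)  # restore leading zeros lost to int conversion
    return {
        "league": game_id[:2],
        "season_type": SEASON_TYPES.get(game_id[2], "unknown"),
        "season_start": 2000 + int(game_id[3:5]),
        "game_number": int(game_id[5:]),
    }


print(decode_game_id("0022300260"))
# {'league': '00', 'season_type': 'regular season', 'season_start': 2023, 'game_number': 260}

# Keep only regular-season games from a list of IDs
ids = ["0012300001", "0022300260", "0062300001"]
print([g for g in ids if decode_game_id(g)["season_type"] == "regular season"])
# ['0022300260']
```

For example, filtering `game_ids` this way would drop the preseason games (IDs starting with `001`), which the `SEASON_ID` filter above keeps.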


In [15]:
game_ids = sorted(list(set(games_2324["GAME_ID"])))

In [16]:
len(game_ids)

395

Now we have every game ID of this season, so we repeat the process above for each game to build a DataFrame with the opening jump ball of every game this season.

In [17]:
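# Resume from the results saved by an earlier run (the jb.to_json cell below)
jb = pd.read_json("jb.json")
# read_json can turn ID-like index labels into integers; restore the 10-character string IDs
jb.index = jb.index.astype(str).str.zfill(10)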

In [19]:
from tqdm import tqdm

In [22]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [None]:
for game_id in tqdm(game_ids):
    if game_id in jb.index:
        continue  # already collected in an earlier run
    url = "https://cdn.nba.com/static/json/liveData/playbyplay/playbyplay_" + game_id + ".json"
    playbyplay = pd.read_json(url).loc["actions"]["game"]
    # Opening tip only: first start-of-period jump ball of period 1 (not overtime tips)
    jb_data = [play for play in playbyplay
               if play["actionType"] == "jumpball" and play["descriptor"] == "startperiod" and play["period"] == 1]
    if not jb_data:
        continue  # no opening jump ball found; don't reuse the previous game's data
    cols = ["jumpBallWonPlayerName", "jumpBallWonPersonId", "jumpBallLostPlayerName", "jumpBallLostPersonId"]
    row = pd.DataFrame({col: [jb_data[0][col]] for col in cols}, index=[game_id])
    # DataFrame.append is deprecated (removed in pandas 2.0); pd.concat replaces it
    jb = pd.concat([jb, row])
    

 71%|███████   | 280/395 [02:43<01:11,  1.60it/s]
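The loop above can also be written as a function that takes the download step as a parameter. That makes it easy to resume (skip IDs already collected), to skip games whose feed has no opening tip, and to test without network access. This is a sketch: `collect_opening_tips` and the `fetch_actions` callable are hypothetical names, and in the notebook `fetch_actions` would wrap the `pd.read_json(...)` call on the CDN URL.

```python
import time

OPENING_TIP_FIELDS = ("jumpBallWonPlayerName", "jumpBallWonPersonId",
                      "jumpBallLostPlayerName", "jumpBallLostPersonId")


def collect_opening_tips(game_ids, fetch_actions, done=None, pause=0.0):
    """Return {game_id: opening-tip record}, skipping IDs already in `done`.

    fetch_actions(game_id) must return that game's list of play-by-play action dicts.
    """
    results = dict(done or {})
    for game_id in game_ids:
        if game_id in results:
            continue  # collected in an earlier run, no need to download again
        for action in fetch_actions(game_id):
            if (action.get("actionType") == "jumpball"
                    and action.get("descriptor") == "startperiod"
                    and action.get("period") == 1):
                results[game_id] = {field: action[field] for field in OPENING_TIP_FIELDS}
                break  # stop at the opening tip; later tips (e.g. overtime) are ignored
        time.sleep(pause)  # optional delay between downloads
    return results


# Made-up feeds standing in for the CDN download (no network needed)
fake_feeds = {
    "0022300260": [{"actionType": "jumpball", "descriptor": "startperiod", "period": 1,
                    "jumpBallWonPlayerName": "Nurkic", "jumpBallWonPersonId": 203994,
                    "jumpBallLostPlayerName": "Poeltl", "jumpBallLostPersonId": 1627751}],
    "0022399999": [],  # hypothetical game whose feed has no jump-ball action
}
tips = collect_opening_tips(fake_feeds, fake_feeds.get)
print(list(tips))  # ['0022300260']
```

`pd.DataFrame.from_dict(tips, orient="index")` turns the result into a table shaped like `jb`.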

In [None]:
jb.to_json("jb.json")
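Saving between runs is what makes resuming possible. One caveat: by default `pd.read_json` tries to convert axis labels (its `convert_axes` option), so string IDs such as `'0022300260'` can come back as integers without their leading zeros, and then a check like `game_id not in jb.index` no longer matches. A stdlib sketch that keeps the IDs as strings; `save_checkpoint` and `load_checkpoint` are hypothetical helpers that store a plain `{game_id: record}` dict.

```python
import json
import os
import tempfile


def save_checkpoint(tips, path):
    """Write {game_id: record} as JSON via a temp file that is then swapped in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(tips, handle)
    os.replace(tmp_path, path)  # an interrupted write leaves the previous checkpoint untouched


def load_checkpoint(path):
    """Return the saved {game_id: record} dict (keys stay strings), or {} if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path) as handle:
        return json.load(handle)


# Round trip in a throwaway directory
folder = tempfile.mkdtemp()
path = os.path.join(folder, "jb_checkpoint.json")
save_checkpoint({"0022300260": {"jumpBallWonPlayerName": "Nurkic"}}, path)
print(list(load_checkpoint(path)))  # ['0022300260']
```

JSON object keys are always strings, so the IDs keep their leading zeros on the way back.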

In [18]:
jb

| | jumpBallWonPlayerName | jumpBallWonPersonId | jumpBallLostPlayerName | jumpBallLostPersonId |
|---|---|---|---|---|
| 12300001 | Gobert | 203497 | Lively II | 1641726 |
| 12300002 | Lively II | 1641726 | Gobert | 203497 |
| 12300003 | Davis | 203076 | Looney | 1626172 |
| 12300004 | Vucevic | 202696 | Lopez | 201572 |
| 12300005 | Metu | 1629002 | Bagley III | 1628963 |
| ... | ... | ... | ... | ... |
| 22300260 | Nurkic | 203994 | Poeltl | 1627751 |
| 22300261 | Jackson Jr. | 1628991 | Yurtseven | 1630209 |
| 22300262 | Valanciunas | 202685 | Morris Sr. | 202694 |
| 22300263 | Sengun | 1630578 | Jokic | 203999 |


Now we have the complete dataset of opening jump balls for this season, so we can start extracting statistics from it.

In [209]:
jb_personid = list(set(jb["jumpBallWonPersonId"]))
fill = list(set(jb["jumpBallLostPersonId"]))
for n in fill:
    if n not in jb_personid:
        jb_personid.append(n)

In [219]:
# One row per player who has taken at least one opening jump ball.
# Pass the list directly: index=[jb_personid] would build a one-level MultiIndex.
jb_analytics = pd.DataFrame({"Name": "", "Won": 0, "Total": 0, "Avg": 0.0}, index=jb_personid)

In [221]:
for gameID in game_ids:
    winner_id = jb.loc[gameID, "jumpBallWonPersonId"]
    loser_id = jb.loc[gameID, "jumpBallLostPersonId"]
    jb_analytics.loc[winner_id, "Name"] = jb.loc[gameID, "jumpBallWonPlayerName"]
    jb_analytics.loc[loser_id, "Name"] = jb.loc[gameID, "jumpBallLostPlayerName"]
    jb_analytics.loc[winner_id, "Won"] += 1
    jb_analytics.loc[winner_id, "Total"] += 1
    jb_analytics.loc[loser_id, "Total"] += 1

# Share of opening jump balls won, computed once after all games are counted
jb_analytics["Avg"] = jb_analytics["Won"] / jb_analytics["Total"]

In [222]:
# With a flat index this returns a plain number, so .values[0] is not needed
jb_analytics.loc[winner_id, "Won"]

6

In [223]:
jb_analytics

| | Name | Won | Total | Avg |
|---|---|---|---|---|
| 1631105 | Duren | 10 | 12 | 0.833333 |
| 1628418 | Bryant | 1 | 4 | 0.250000 |
| 1630209 | Yurtseven | 1 | 3 | 0.333333 |
| 1630596 | Mobley | 3 | 7 | 0.428571 |
| 1631109 | Williams | 12 | 20 | 0.600000 |
| ... | ... | ... | ... | ... |
| 203488 | Muscala | 0 | 1 | 0.000000 |
| 1630567 | Barnes | 0 | 1 | 0.000000 |
| 201577 | Lopez | 0 | 1 | 0.000000 |
| 1630191 | Stewart | 0 | 2 | 0.000000 |


This DataFrame has, for every player who has taken at least one opening jump ball, the number of jump balls won (`Won`), the number taken (`Total`) and the share won (`Avg`).
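The same table can also be built by counting first and dividing once, instead of updating `jb_analytics` row by row. A stdlib-only sketch with `collections.Counter`; `jump_ball_table` and its tuple input are illustrative choices, not part of the notebook.

```python
from collections import Counter


def jump_ball_table(tips):
    """tips: iterable of (winner_id, winner_name, loser_id, loser_name) tuples.

    Returns {person_id: (name, won, total, avg)} sorted by win share, best first.
    """
    won, total, names = Counter(), Counter(), {}
    for winner_id, winner_name, loser_id, loser_name in tips:
        won[winner_id] += 1
        total[winner_id] += 1
        total[loser_id] += 1
        names[winner_id], names[loser_id] = winner_name, loser_name
    table = {pid: (names[pid], won[pid], total[pid], won[pid] / total[pid]) for pid in total}
    # sorted() is stable, so players with equal shares keep their first-seen order
    return dict(sorted(table.items(), key=lambda item: item[1][3], reverse=True))


# The first two preseason tips from the jb table above
tips = [
    (203497, "Gobert", 1641726, "Lively II"),
    (1641726, "Lively II", 203497, "Gobert"),
]
print(jump_ball_table(tips))
# {203497: ('Gobert', 1, 2, 0.5), 1641726: ('Lively II', 1, 2, 0.5)}
```

With the notebook's data, the input could be `jb[["jumpBallWonPersonId", "jumpBallWonPlayerName", "jumpBallLostPersonId", "jumpBallLostPlayerName"]].itertuples(index=False)`.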

In [238]:
name = "Gobert"
jb_analytics[jb_analytics["Name"].str.contains(name)]

| | Name | Won | Total | Avg |
|---|---|---|---|---|
| 203497 | Gobert | 11 | 21 | 0.52381 |


In [243]:
name = "Markkanen"
jb_analytics[jb_analytics["Name"].str.contains(name)]

| | Name | Won | Total | Avg |
|---|---|---|---|---|
| 1628374 | Markkanen | 0 | 7 | 0.0 |
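Raw shares are noisy with so few jumps: Markkanen's 0 of 7 would say he can never win a tip. To get back to the question in the title (which of two players is more likely to win), one option, not used in this notebook, is to pull each share toward 50% with add-one (Laplace) smoothing and then combine two players with Bill James's log5 formula. Both choices are assumptions of this sketch, and the helper names are hypothetical.

```python
def smoothed_share(won, total, prior_wins=1, prior_total=2):
    """Add-one (Laplace) smoothing: pulls small samples toward 50%."""
    return (won + prior_wins) / (total + prior_total)


def log5(p_a, p_b):
    """Bill James's log5: P(A beats B) = pA(1-pB) / (pA(1-pB) + pB(1-pA))."""
    a_wins = p_a * (1 - p_b)
    b_wins = p_b * (1 - p_a)
    if a_wins + b_wins == 0:
        return 0.5  # both shares are 0 or both are 1: nothing to separate them
    return a_wins / (a_wins + b_wins)


gobert = smoothed_share(11, 21)     # 12/23, about 0.52
markkanen = smoothed_share(0, 7)    # 1/9, about 0.11
print(round(log5(gobert, markkanen), 3))  # 0.897
```

The formula only uses past win shares, so it ignores everything else that affects a tip (height, reach, the toss itself).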


I am going to leave the project here, but a natural follow-up would be to find who is most likely to score the first basket. The next steps would be:
- Use the play-by-play of every game to count how many attempts each team needs to score the first basket.
- Then, find which player is most likely to score it.
- Combine these three values (jump ball, first team to score, player most likely to score) into a single metric.
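The first step could start from the same play-by-play feed. This sketch assumes that each shot action has `isFieldGoal == 1`, a `teamTricode`, and a `shotResult` of `'Made'` or `'Missed'`. `isFieldGoal` and `teamTricode` appear in the jump-ball action above, but `shotResult` is an assumption about the feed that is worth checking against a real game. The helper name is hypothetical. It counts each team's field-goal attempts up to and including the first made basket.

```python
from collections import Counter


def attempts_until_first_basket(actions):
    """Return (team that scored first, Counter of field-goal attempts per team).

    Counts attempts up to and including the first made field goal.
    Returns (None, counts) if no made field goal is found.
    """
    attempts = Counter()
    for action in actions:
        if action.get("isFieldGoal") != 1:
            continue  # skip jump balls, rebounds, free throws and other non-shots
        team = action["teamTricode"]
        attempts[team] += 1
        if action.get("shotResult") == "Made":
            return team, attempts
    return None, attempts


# Made-up sequence with placeholder teams: AAA misses, BBB misses, AAA scores
actions = [
    {"actionType": "jumpball", "isFieldGoal": 0, "teamTricode": "AAA"},
    {"isFieldGoal": 1, "teamTricode": "AAA", "shotResult": "Missed"},
    {"isFieldGoal": 1, "teamTricode": "BBB", "shotResult": "Missed"},
    {"isFieldGoal": 1, "teamTricode": "AAA", "shotResult": "Made"},
]
first_team, counts = attempts_until_first_basket(actions)
print(first_team, dict(counts))  # AAA {'AAA': 2, 'BBB': 1}
```

Run over every game, the per-team counts give a first estimate of how quickly each team scores after the tip.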