# Datasets
A brainstorm of which datasets would be useful.

## By player per game
A dictionary of players and their stats in each of their games.

## By team per game
A dictionary of all teams and their overall stats for each game they've played.
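
One possible shape for the two per-game dictionaries, as a minimal sketch. The key layout (ID, then game ID, then stat name) and the names `build_datasets`, `player_games`, and `team_games` are assumptions, not a settled design. The sample rows reuse a few values from game `0021601224`, so the team totals here cover only those two players.

```python
from collections import defaultdict

# Sample rows: a subset of columns for two players from game 0021601224.
sample_rows = [
    {'GAME_ID': '0021601224', 'TEAM_ID': 1610612750, 'PLAYER_ID': 203952,
     'PTS': 21.0, 'REB': 2.0, 'AST': 3.0},
    {'GAME_ID': '0021601224', 'TEAM_ID': 1610612750, 'PLAYER_ID': 1626157,
     'PTS': 28.0, 'REB': 21.0, 'AST': 3.0},
]
STAT_NAMES = ('PTS', 'REB', 'AST')


def build_datasets(rows, stat_names=STAT_NAMES):
    """Build player -> game -> stats and team -> game -> summed stats."""
    player_games = defaultdict(dict)
    team_games = defaultdict(dict)
    for row in rows:
        stats = {name: row[name] for name in stat_names}
        player_games[row['PLAYER_ID']][row['GAME_ID']] = stats
        # Start a team's totals at zero the first time a game is seen.
        totals = team_games[row['TEAM_ID']].setdefault(
            row['GAME_ID'], dict.fromkeys(stat_names, 0.0))
        for name in stat_names:
            totals[name] += row[name]
    return dict(player_games), dict(team_games)


player_games, team_games = build_datasets(sample_rows)
```

Keying by ID rather than by name avoids trouble with duplicate or changed names, at the cost of needing a separate lookup for display.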

## Ratings
A dictionary of all players and their current ratings: an aggregate, not a game-by-game record.
Sources could include:
* NBA official stats
* 2K
* Other free stats sources

## Data structure
It might be useful to have custom data structures to access all this data. For example, I could have specific data structures for box scores, player stats, and team stats.
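
A rough sketch of what such structures could look like, using standard-library dataclasses. The class names, fields, and the choice to derive team points from player rows are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlayerStats:
    """One player's line in a single game (hypothetical field set)."""
    player_id: int
    player_name: str
    points: float = 0.0
    rebounds: float = 0.0
    assists: float = 0.0


@dataclass
class TeamStats:
    """One team's side of a box score, holding its player lines."""
    team_id: int
    abbreviation: str
    players: List[PlayerStats] = field(default_factory=list)

    @property
    def points(self) -> float:
        # Derived from the player lines rather than stored separately.
        return sum(p.points for p in self.players)


@dataclass
class BoxScore:
    """A single game, keyed by team ID."""
    game_id: str
    teams: Dict[int, TeamStats] = field(default_factory=dict)


wolves = TeamStats(1610612750, 'MIN', [
    PlayerStats(203952, 'Andrew Wiggins', points=21.0, rebounds=2.0, assists=3.0),
    PlayerStats(1626157, 'Karl-Anthony Towns', points=28.0, rebounds=21.0, assists=3.0),
])
box = BoxScore('0021601224', {wolves.team_id: wolves})
```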

# Using a tutorial
[Here](http://practicallypredictable.com/2017/12/21/web-scraping-nba-team-matchups-box-scores/) is a nice tutorial on how to scrape box score data.
A notebook viewer version is available [here](https://nbviewer.jupyter.org/github/practicallypredictable/posts/blob/master/basketball/nba/notebooks/scrape-stats_nba-team_matchups.ipynb).

In [1]:
from itertools import chain
from pathlib import Path
from time import sleep
import datetime as dt
import requests
from tqdm import tqdm
tqdm.monitor_interval = 0
import numpy as np
import pandas as pd
pd.options.display.max_rows = 999
pd.options.display.max_columns = 999

### User agent
I found my user agent string [here](https://www.whoishostingthis.com/tools/user-agent/).

In [2]:
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
              'AppleWebKit/537.36 (KHTML, like Gecko) ' +
              'Chrome/70.0.3538.77 Safari/537.36')

REQUEST_HEADERS = {
    'user-agent': USER_AGENT,
}

### Getting the game ID from the game-by-game page

Consider doing this day by day; it would make things much cleaner.

In [3]:
NBA_URL = 'https://stats.nba.com/stats/teamgamelogs'
NBA_ID = '00'

NBA_SEASON_TYPES = {
    'regular': 'Regular Season',
    'playoffs': 'Playoffs',
    'preseason': 'Pre Season',
}

season = '2016-17'
season_type = NBA_SEASON_TYPES['regular']

nba_params = {
    'LeagueID': NBA_ID,
    'Season': season,
    'SeasonType': season_type,
}
r = requests.get(NBA_URL, params=nba_params, headers=REQUEST_HEADERS, allow_redirects=False, timeout=15)
assert r.status_code == 200

In [4]:
json_dict = r.json() # Turns the json text into a python dict

In [5]:
games_dict = json_dict['resultSets'][0]
headers = games_dict['headers']
GAME_ID_COL = headers.index('GAME_ID')
DATE_COL = headers.index('GAME_DATE')

games_list = games_dict['rowSet']

game = games_list[0]
game_date = dt.datetime.strptime(game[DATE_COL][:10], '%Y-%m-%d')
game_id = game[GAME_ID_COL]
game_id

'0021601224'
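
The day-by-day idea could be sketched by grouping game IDs by date from the same `headers` and `rowSet` shape used above. The function name `game_ids_by_date` and the sample rows are made up for illustration. A set is used because the team game log presumably lists each game once per team.

```python
from collections import defaultdict
import datetime as dt


def game_ids_by_date(headers, rows):
    """Map each game date to the set of game IDs played that day."""
    id_col = headers.index('GAME_ID')
    date_col = headers.index('GAME_DATE')
    by_date = defaultdict(set)
    for row in rows:
        # Keep only the YYYY-MM-DD part, as in the cell above.
        day = dt.datetime.strptime(row[date_col][:10], '%Y-%m-%d').date()
        by_date[day].add(row[id_col])
    return dict(by_date)


# Made-up rows in the same layout (not real NBA data).
sample_headers = ['GAME_ID', 'GAME_DATE']
sample_rows = [
    ['G1', '2017-01-01T00:00:00'],
    ['G1', '2017-01-01T00:00:00'],  # same game, other team's row
    ['G2', '2017-01-02T00:00:00'],
]
schedule = game_ids_by_date(sample_headers, sample_rows)
```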

### Parsing the box score
Now that we have a way to get the game IDs, we have to parse the box score.

The XHR with individual player scores is `boxscoretraditionalv2`.

An example URL:

https://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&EndRange=28800&GameID=0021800162&RangeType=0&Season=2018-19&SeasonType=Regular+Season&StartPeriod=1&StartRange=0

I'm not sure what several of these parameters mean, so I'll compare them against another request. This example is from a different game, also a final box score:

https://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&EndRange=28800&GameID=0021800152&RangeType=0&Season=2018-19&SeasonType=Regular+Season&StartPeriod=1&StartRange=0

It has exactly the same parameters except for the GameID. I suspect that the other parameters change when the game is not final. At the time of writing there are no ongoing games, so I'll try again tomorrow.
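
The comparison can also be done mechanically rather than by eye. This is a small sketch using only the standard library; the helper name `differing_params` is an assumption.

```python
from urllib.parse import parse_qs, urlsplit


def differing_params(url_a, url_b):
    """Return {param: (values_in_a, values_in_b)} for params that differ."""
    params_a = parse_qs(urlsplit(url_a).query)
    params_b = parse_qs(urlsplit(url_b).query)
    keys = sorted(set(params_a) | set(params_b))
    return {k: (params_a.get(k), params_b.get(k))
            for k in keys if params_a.get(k) != params_b.get(k)}


url_1 = ('https://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10'
         '&EndRange=28800&GameID=0021800162&RangeType=0&Season=2018-19'
         '&SeasonType=Regular+Season&StartPeriod=1&StartRange=0')
url_2 = ('https://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10'
         '&EndRange=28800&GameID=0021800152&RangeType=0&Season=2018-19'
         '&SeasonType=Regular+Season&StartPeriod=1&StartRange=0')

diff = differing_params(url_1, url_2)
# diff == {'GameID': (['0021800162'], ['0021800152'])}
```

`parse_qs` returns a list of values per key, which is why each side of the tuple is a list.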

When a game is in progress, `boxscoretraditionalv2` is not called. Substituting a live game's GameID into the example URLs above returns null for every player's stats. Ways to handle this:
* Find the real-time box scores and update as we go
* Prevent any updates for unfinished games
    * Check for NULL in every box score value
    * Check something else to see if a game is live. This is preferable.
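
The NULL check from the list above could look roughly like this. It is only the fallback option, not the preferred live-game check, and the helper name `has_any_stats` and the default columns are assumptions. It asks whether any row has a value rather than whether all do, on the assumption (worth verifying) that players who did not play may have null stats even in a finished game.

```python
def has_any_stats(headers, rows, stat_names=('MIN', 'PTS')):
    """Return True if at least one player row has a non-null stat value."""
    cols = [headers.index(name) for name in stat_names]
    return any(row[c] is not None for row in rows for c in cols)


# Made-up rows in the headers/rowSet layout (not real NBA data).
sample_headers = ['PLAYER_NAME', 'MIN', 'PTS']
live_rows = [['Player A', None, None], ['Player B', None, None]]
final_rows = [['Player A', '30:19', 21.0], ['Player B', None, None]]

skip_update = not has_any_stats(sample_headers, live_rows)
```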
    
    

In [6]:
URL_GAME_BOXSCORE = 'https://stats.nba.com/stats/boxscoretraditionalv2'
boxscore_params = {
    'GameID': game_id,
    'Season': season,
    'SeasonType': season_type,
    'EndPeriod': '10',
    'EndRange': '28800',
    'RangeType': '0',
    'StartPeriod': '1',
    'StartRange': '0'
}
r_boxscore = requests.get(URL_GAME_BOXSCORE, params=boxscore_params, headers=REQUEST_HEADERS, allow_redirects=False, timeout=15)
assert r_boxscore.status_code == 200

In [7]:
json_boxscore = r_boxscore.json()

In [47]:
player_stats = json_boxscore['resultSets'][0] # 0 is player stats, 1 is team stats, 2 is starter bench stats
boxscore_headers = player_stats['headers'] 
boxscore_stats = player_stats['rowSet']

df_boxscore = pd.DataFrame(columns=boxscore_headers, data=boxscore_stats)
df_boxscore

| | GAME_ID | TEAM_ID | TEAM_ABBREVIATION | TEAM_CITY | PLAYER_ID | PLAYER_NAME | START_POSITION | COMMENT | MIN | FGM | FGA | FG_PCT | FG3M | FG3A | FG3_PCT | FTM | FTA | FT_PCT | OREB | DREB | REB | AST | STL | BLK | TO | PF | PTS | PLUS_MINUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21601224 | 1610612750 | MIN | Minnesota | 203952 | Andrew Wiggins | F | | 30:19 | 7.0 | 17.0 | 0.412 | 2.0 | 5.0 | 0.4 | 5.0 | 7.0 | 0.714 | 0.0 | 2.0 | 2.0 | 3.0 | 1.0 | 0.0 | 4.0 | 2.0 | 21.0 | -33.0 |
| 1 | 21601224 | 1610612750 | MIN | Minnesota | 203476 | Gorgui Dieng | F | | 27:21 | 3.0 | 8.0 | 0.375 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 5.0 | 8.0 | 2.0 | 0.0 | 0.0 | 3.0 | 2.0 | 6.0 | -30.0 |
| 2 | 21601224 | 1610612750 | MIN | Minnesota | 1626157 | Karl-Anthony Towns | C | | 36:09 | 12.0 | 17.0 | 0.706 | 3.0 | 4.0 | 0.75 | 1.0 | 1.0 | 1.0 | 1.0 | 20.0 | 21.0 | 3.0 | 1.0 | 0.0 | 0.0 | 2.0 | 28.0 | 2.0 |
| 3 | 21601224 | 1610612750 | MIN | Minnesota | 201575 | Brandon Rush | G | | 24:19 | 3.0 | 3.0 | 1.0 | 2.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 | 2.0 | 0.0 | 0.0 | 2.0 | 0.0 | 8.0 | -13.0 |
| 4 | 21601224 | 1610612750 | MIN | Minnesota | 1627739 | Kris Dunn | G | | 32:16 | 5.0 | 13.0 | 0.385 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 2.0 | 16.0 | 1.0 | 2.0 | 3.0 | 4.0 | 10.0 | 3.0 |
| 5 | 21601224 | 1610612750 | MIN | Minnesota | 201956 | Omri Casspi | | | 19:02 | 2.0 | 4.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 3.0 | 2.0 | 0.0 | 0.0 | 0.0 | 2.0 | 4.0 | -7.0 |
| 6 | 21601224 | 1610612750 | MIN | Minnesota | 203498 | Shabazz Muhammad | | | 21:52 | 9.0 | 18.0 | 0.5 | 1.0 | 6.0 | 0.167 | 3.0 | 5.0 | 0.6 | 1.0 | 1.0 | 2.0 | 1.0 | 2.0 | 0.0 | 1.0 | 1.0 | 22.0 | 10.0 |
| 7 | 21601224 | 1610612750 | MIN | Minnesota | 202332 | Cole Aldrich | | | 15:20 | 1.0 | 2.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 3.0 | 4.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 | 19.0 |
| 8 | 21601224 | 1610612750 | MIN | Minnesota | 1626145 | Tyus Jones | | | 28:07 | 6.0 | 9.0 | 0.667 | 3.0 | 4.0 | 0.75 | 2.0 | 2.0 | 1.0 | 1.0 | 3.0 | 4.0 | 7.0 | 2.0 | 1.0 | 1.0 | 1.0 | 17.0 | 10.0 |
| 9 | 21601224 | 1610612750 | MIN | Minnesota | 203940 | Adreian Payne | | | 5:15 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 14.0 |
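
The `MIN` column arrives as an `MM:SS` string, which is awkward to aggregate. This is a small sketch of a converter; the name `minutes_played` and the choice to treat a missing value as zero minutes are assumptions.

```python
def minutes_played(min_str):
    """Convert an 'MM:SS' string to minutes as a float; empty/None gives 0.0."""
    if not min_str:
        return 0.0
    minutes, seconds = min_str.split(':')
    return int(minutes) + int(seconds) / 60


wiggins_minutes = minutes_played('30:19')
payne_minutes = minutes_played('5:15')  # 5.25
```

With pandas this could be applied to the whole column, for example with `df_boxscore['MIN'].map(minutes_played)`.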
