# Datasets
A brainstorm of which datasets would be useful.

## By player per game
A dictionary of players and their stats in each of their games.

## By team per game
A dictionary of all teams and their overall stats for each game they've played.

## Ratings
A dictionary of all players and their current ratings. These are aggregate ratings, not game-by-game values.
Sources could include:
* NBA official stats
* 2K
* Other free stats sources

## Data structure
It might be useful to have custom data structures to access all this data. For example, I could have dedicated data structures for box scores, player stats, and team stats.
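One possible shape for these structures, as a minimal sketch using dataclasses. The class names (`PlayerLine`, `BoxScore`), the fields, and the `team_totals` helper are all hypothetical, and the values are made-up placeholders rather than real game data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlayerLine:
    """One player's stat line for a single game (hypothetical structure)."""
    player_name: str
    team: str
    points: int
    rebounds: int
    assists: int


@dataclass
class BoxScore:
    """All player lines for one game, identified by the game ID."""
    game_id: str
    players: List[PlayerLine] = field(default_factory=list)

    def team_totals(self) -> Dict[str, Dict[str, int]]:
        """Sum points, rebounds, and assists per team."""
        totals: Dict[str, Dict[str, int]] = {}
        for line in self.players:
            team = totals.setdefault(
                line.team, {'points': 0, 'rebounds': 0, 'assists': 0})
            team['points'] += line.points
            team['rebounds'] += line.rebounds
            team['assists'] += line.assists
        return totals


# Made-up placeholder values, not real game data.
example = BoxScore(
    game_id='EXAMPLE',
    players=[
        PlayerLine('Player A', 'AAA', points=10, rebounds=4, assists=2),
        PlayerLine('Player B', 'AAA', points=6, rebounds=1, assists=5),
        PlayerLine('Player C', 'BBB', points=12, rebounds=7, assists=0),
    ],
)
print(example.team_totals())
```

Keeping team stats as something derived from the player lines means there is only one copy of the data to keep consistent, though storing team totals separately may turn out to be more convenient.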

# Using a tutorial
[Here](http://practicallypredictable.com/2017/12/21/web-scraping-nba-team-matchups-box-scores/) is a nice tutorial on how to scrape box score data.
The same tutorial is also available in a nice notebook viewer version [here](https://nbviewer.jupyter.org/github/practicallypredictable/posts/blob/master/basketball/nba/notebooks/scrape-stats_nba-team_matchups.ipynb).

In [58]:
from itertools import chain
from pathlib import Path
from time import sleep
import datetime as dt
import requests
from tqdm import tqdm
tqdm.monitor_interval = 0
import numpy as np
import pandas as pd
pd.options.display.max_rows = 999
pd.options.display.max_columns = 999

### User agent
I found out what my user-agent string is [here](https://www.whoishostingthis.com/tools/user-agent/).

In [67]:
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
              'AppleWebKit/537.36 (KHTML, like Gecko) ' +
              'Chrome/70.0.3538.77 Safari/537.36')

REQUEST_HEADERS = {
    'user-agent': USER_AGENT,
}

### Getting the game ID from the game-by-game page

Think about doing this day by day. That would make it much cleaner.
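A day-by-day pass could start from something like the sketch below, which groups game IDs by date. The helper name `game_ids_by_date` is hypothetical, and the sample rows are placeholders. It assumes each row has `GAME_ID` and `GAME_DATE` columns, that the date string starts with `YYYY-MM-DD`, and that the same game may appear in more than one row (for example once per team), which is why the IDs are collected in a set.

```python
from collections import defaultdict
import datetime as dt


def game_ids_by_date(headers, rows):
    """Group unique game IDs by calendar date (hypothetical helper)."""
    game_id_col = headers.index('GAME_ID')
    date_col = headers.index('GAME_DATE')
    by_date = defaultdict(set)
    for row in rows:
        # Only the first 10 characters (YYYY-MM-DD) are parsed.
        game_date = dt.datetime.strptime(row[date_col][:10], '%Y-%m-%d').date()
        by_date[game_date].add(row[game_id_col])
    # Sort the dates, and the IDs within each date, for stable output.
    return {day: sorted(ids) for day, ids in sorted(by_date.items())}


# Placeholder rows, not real data.
sample_headers = ['GAME_ID', 'GAME_DATE']
sample_rows = [
    ['G1', '2000-01-01T00:00:00'],
    ['G1', '2000-01-01T00:00:00'],
    ['G2', '2000-01-01T00:00:00'],
    ['G3', '2000-01-02T00:00:00'],
    ['G3', '2000-01-02T00:00:00'],
]
for day, ids in game_ids_by_date(sample_headers, sample_rows).items():
    print(day, ids)
```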

In [13]:
NBA_URL = 'https://stats.nba.com/stats/teamgamelogs'
NBA_ID = '00'

NBA_SEASON_TYPES = {
    'regular': 'Regular Season',
    'playoffs': 'Playoffs',
    'preseason': 'Pre Season',
}

season = '2016-17'
season_type = NBA_SEASON_TYPES['regular']

nba_params = {
    'LeagueID': NBA_ID,
    'Season': season,
    'SeasonType': season_type,
}
r = requests.get(NBA_URL, params=nba_params, headers=REQUEST_HEADERS, allow_redirects=False, timeout=15)
assert r.status_code == 200

In [29]:
json_dict = r.json() # Turns the json text into a python dict

In [74]:
games_dict = json_dict['resultSets'][0]
headers = games_dict['headers']
GAME_ID_COL = headers.index('GAME_ID')
DATE_COL = headers.index('GAME_DATE')

games_list = games_dict['rowSet']

game = games_list[0]
game_date = dt.datetime.strptime(game[DATE_COL][:10], '%Y-%m-%d')
game_id = game[GAME_ID_COL]
game_id

'0021601217'
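Looking up column positions with `headers.index(...)` gets tedious once more than a couple of columns are needed. A small sketch of an alternative is to pair each row with the headers so that fields can be read by name. The helper name `result_set_to_records` is hypothetical, and the sample below is a hand-trimmed stand-in for one entry of `resultSets`, not a full response.

```python
def result_set_to_records(result_set):
    """Turn one entry of 'resultSets' into a list of dicts keyed by header."""
    headers = result_set['headers']
    return [dict(zip(headers, row)) for row in result_set['rowSet']]


# Hand-trimmed stand-in with the same headers/rowSet layout.
sample_result_set = {
    'name': 'Example',
    'headers': ['GAME_ID', 'TEAM_ABBREVIATION'],
    'rowSet': [['0021601217', 'DET']],
}
records = result_set_to_records(sample_result_set)
print(records[0]['GAME_ID'])
```

The same list of dicts can be passed to `pd.DataFrame(records)` if a table is more convenient.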

### Parsing the box score
Now that we have a way to get the game IDs, we "simply" have to parse the box score.

The XHR that returns the individual player stats is `boxscoretraditionalv2`.

An example URL is:

https://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&EndRange=28800&GameID=0021800162&RangeType=0&Season=2018-19&SeasonType=Regular+Season&StartPeriod=1&StartRange=0

I'm not sure what a bunch of these parameters are, so I'm going to compare them against another request. This is an example from a different game, also a final box score:

https://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&EndRange=28800&GameID=0021800152&RangeType=0&Season=2018-19&SeasonType=Regular+Season&StartPeriod=1&StartRange=0

It has exactly the same parameters except for the GameID. I suspect that the other parameters change when the game is not final. At the time of writing there are no ongoing games, so I shall try again tomorrow.
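The comparison can also be done mechanically rather than by eye. The sketch below parses the query strings of the two URLs above with `urllib.parse` and reports only the parameters whose values differ. The helper names `query_params` and `differing_params` are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url):
    """Return the query parameters of a URL as a flat dict."""
    # parse_qs returns a list per key; each key appears once here.
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


def differing_params(url_a, url_b):
    """Map each parameter that differs to its (url_a, url_b) values."""
    a, b = query_params(url_a), query_params(url_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


url_1 = ('https://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10'
         '&EndRange=28800&GameID=0021800162&RangeType=0&Season=2018-19'
         '&SeasonType=Regular+Season&StartPeriod=1&StartRange=0')
url_2 = ('https://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10'
         '&EndRange=28800&GameID=0021800152&RangeType=0&Season=2018-19'
         '&SeasonType=Regular+Season&StartPeriod=1&StartRange=0')
print(differing_params(url_1, url_2))
```

The same check can be rerun on a URL captured during a live game to see which of the other parameters change.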

In [77]:
URL_GAME_BOXSCORE = 'https://stats.nba.com/stats/boxscoretraditionalv2'
boxscore_params = {
    'GameID': game_id,
    'Season': season,
    'SeasonType': season_type,
    'EndPeriod': '10',
    'EndRange': '28800',
    'RangeType': '0',
    'StartPeriod': '1',
    'StartRange': '0'
}
r_boxscore = requests.get(URL_GAME_BOXSCORE, params=boxscore_params, headers=REQUEST_HEADERS, allow_redirects=False, timeout=15)
assert r_boxscore.status_code == 200

In [80]:
json_boxscore = r_boxscore.json()

In [81]:
json_boxscore

{'parameters': {'EndPeriod': 10,
  'EndRange': 28800,
  'GameID': '0021601217',
  'RangeType': 0,
  'StartPeriod': 1,
  'StartRange': 0},
 'resource': 'boxscore',
 'resultSets': [{'headers': ['GAME_ID',
    'TEAM_ID',
    'TEAM_ABBREVIATION',
    'TEAM_CITY',
    'PLAYER_ID',
    'PLAYER_NAME',
    'START_POSITION',
    'COMMENT',
    'MIN',
    'FGM',
    'FGA',
    'FG_PCT',
    'FG3M',
    'FG3A',
    'FG3_PCT',
    'FTM',
    'FTA',
    'FT_PCT',
    'OREB',
    'DREB',
    'REB',
    'AST',
    'STL',
    'BLK',
    'TO',
    'PF',
    'PTS',
    'PLUS_MINUS'],
   'name': 'PlayerStats',
   'rowSet': [['0021601217',
     1610612765,
     'DET',
     'Detroit',
     1626169,
     'Stanley Johnson',
     'F',
     '',
     '20:25',
     0,
     5,
     0.0,
     0,
     3,
     0.0,
     0,
     0,
     0.0,
     0,
     1,
     1,
     0,
     0,
     0,
     0,
     2,
     0,
     -8.0],
    ['0021601217',
     1610612765,
     'DET',
     'Detroit',
     202720,
     'Jon Leuer',