# Goal

The goal here is to predict the outcomes of NBA games to:
1. Practice scraping
2. Practice using neural nets

We'll use [Basketball Reference](https://www.basketball-reference.com/) for our stats, since the site allows scraping as long as it isn't done in a manner that adversely impacts site performance or access. To stay within that limit, we'll pause between requests and store the results once scraping is complete so we don't need to re-scrape the same pages. The trade-off is some ongoing maintenance on our end: as new results come in, we'll need to decide which games to scrape next, which is okay.
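As a rough illustration of the pause-and-store idea, here is a minimal sketch. The helper name `fetch_with_cache`, the `cache` folder, and the 3-second default pause are assumptions made for this example, not values taken from the site's policy. The `fetch` argument is injectable so the caching logic can be tried without sending any request.

```python
import time
from pathlib import Path


def fetch_with_cache(url, cache_dir="cache", pause_seconds=3.0, fetch=None):
    """Return the HTML for `url`, reading it from disk if it was saved before.

    Hypothetical helper: the name, folder, and default pause are illustrative.
    """
    # Use the last URL segment (e.g. "202110200CHO.html") as the file name.
    cache_path = Path(cache_dir) / url.rstrip("/").split("/")[-1]
    if cache_path.exists():
        # Already scraped: no request and no pause needed.
        return cache_path.read_text(encoding="utf-8")

    if fetch is None:
        # Default fetcher uses requests, imported lazily so the caching
        # logic itself has no third-party dependency.
        import requests

        def fetch(target):
            response = requests.get(target, timeout=30)
            response.raise_for_status()  # fail loudly on 4xx/5xx responses
            return response.text

    html = fetch(url)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(html, encoding="utf-8")
    time.sleep(pause_seconds)  # pause only after a real request
    return html
```

Calling the helper twice with the same URL should hit the site once; the second call is served from the saved file.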

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

# Grabbing Data

First, we'll grab data from just one game for practice to make sure we can get what we want:
* Score
* Box Score
* Advanced Stats

In [2]:
test_game = requests.get("https://www.basketball-reference.com/boxscores/202110200CHO.html")

# Checking output to make sure info retrieved properly
test_game

<Response [200]>

In [3]:
game_soup = BeautifulSoup(test_game.content, 'html.parser')

Next, we'll break down the data for one team (in this case, the Indiana Pacers).

In [20]:
IND = game_soup.select('div.section_wrapper.box-IND.box-IND-game')

In [50]:
IND_reg_subset = IND[0]

In [51]:
IND_reg_data = list(IND_reg_subset.find_all('table', class_='sortable'))[0]

In [52]:
players = IND_reg_data.find_all('th', class_='left')
players

[<th class="left" csk="Brogdon,Malcolm" data-append-csv="brogdma01" data-stat="player" scope="row"><a href="/players/b/brogdma01.html">Malcolm Brogdon</a></th>,
 <th class="left" csk="Sabonis,Domantas" data-append-csv="sabondo01" data-stat="player" scope="row"><a href="/players/s/sabondo01.html">Domantas Sabonis</a></th>,
 <th class="left" csk="Duarte,Chris" data-append-csv="duartch01" data-stat="player" scope="row"><a href="/players/d/duartch01.html">Chris Duarte</a></th>,
 <th class="left" csk="Turner,Myles" data-append-csv="turnemy01" data-stat="player" scope="row"><a href="/players/t/turnemy01.html">Myles Turner</a></th>,
 <th class="left" csk="Holiday,Justin" data-append-csv="holidju01" data-stat="player" scope="row"><a href="/players/h/holidju01.html">Justin Holiday</a></th>,
 <th class="left" csk="Craig,Torrey" data-append-csv="craigto01" data-stat="player" scope="row"><a href="/players/c/craigto01.html">Torrey Craig</a></th>,
 <th class="left" csk="Lamb,Jeremy" data-append-csv=

In [53]:
player_reg_stats = IND_reg_data.find_all('td', class_='right')
player_reg_stats

[<td class="right" csk="2418" data-stat="mp">40:18</td>,
 <td class="right" data-stat="fg">8</td>,
 <td class="right" data-stat="fga">20</td>,
 <td class="right" data-stat="fg_pct">.400</td>,
 <td class="right" data-stat="fg3">3</td>,
 <td class="right" data-stat="fg3a">10</td>,
 <td class="right" data-stat="fg3_pct">.300</td>,
 <td class="right" data-stat="ft">9</td>,
 <td class="right" data-stat="fta">9</td>,
 <td class="right" data-stat="ft_pct">1.000</td>,
 <td class="right iz" data-stat="orb">0</td>,
 <td class="right" data-stat="drb">4</td>,
 <td class="right" data-stat="trb">4</td>,
 <td class="right" data-stat="ast">11</td>,
 <td class="right iz" data-stat="stl">0</td>,
 <td class="right" data-stat="blk">1</td>,
 <td class="right" data-stat="tov">1</td>,
 <td class="right" data-stat="pf">2</td>,
 <td class="right" data-stat="pts">28</td>,
 <td class="right iz" data-stat="plus_minus">0</td>,
 <td class="right" csk="2359" data-stat="mp">39:19</td>,
 <td class="right" data-stat="f

In [54]:
IND_adv_subset = IND[1]

In [55]:
IND_adv_data = list(IND_adv_subset.find_all('table', class_='sortable'))[0]

In [87]:
player_adv_stats = IND_adv_data.find_all('td', class_='right')
player_adv_stats

[<td class="right" csk="2418" data-stat="mp">40:18</td>,
 <td class="right" data-stat="ts_pct">.584</td>,
 <td class="right" data-stat="efg_pct">.475</td>,
 <td class="right" data-stat="fg3a_per_fga_pct">.500</td>,
 <td class="right" data-stat="fta_per_fga_pct">.450</td>,
 <td class="right iz" data-stat="orb_pct">0.0</td>,
 <td class="right" data-stat="drb_pct">8.7</td>,
 <td class="right" data-stat="trb_pct">4.9</td>,
 <td class="right" data-stat="ast_pct">40.3</td>,
 <td class="right iz" data-stat="stl_pct">0.0</td>,
 <td class="right" data-stat="blk_pct">1.6</td>,
 <td class="right" data-stat="tov_pct">4.0</td>,
 <td class="right" data-stat="usg_pct">25.5</td>,
 <td class="right" data-stat="off_rtg">134</td>,
 <td class="right" data-stat="def_rtg">117</td>,
 <td class="right poptip" data-stat="bpm" data-tip="OBPM: 7.1&lt;br&gt; DBPM: -0.5&lt;br&gt; VORP: 7.2&lt;br&gt; &lt;em&gt;&lt;small&gt;VORP is prorated to 82 games&lt;/small&gt;&lt;/em&gt; ">6.5</td>,
 <td class="right" csk="235

# Converting data into a dataframe

Now that we have all of the stats for Indiana, let's build a dataframe.

In [77]:
cols = ['player', 'MP', 'FG', 'FGA', 'FG%', '3P', '3PA', '3P%', 'FT', 'FTA',
        'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS',
        '+/-', 'TS%', 'eFG%', '3PAr', 'FTr', 'ORB%', 'DRB%', 'TRB%', 'AST%',
        'STL%', 'BLK%', 'TOV%', 'USG%', 'ORTg', 'DRTg', 'BPM']

In [61]:
players

[<th class="left" csk="Brogdon,Malcolm" data-append-csv="brogdma01" data-stat="player" scope="row"><a href="/players/b/brogdma01.html">Malcolm Brogdon</a></th>,
 <th class="left" csk="Sabonis,Domantas" data-append-csv="sabondo01" data-stat="player" scope="row"><a href="/players/s/sabondo01.html">Domantas Sabonis</a></th>,
 <th class="left" csk="Duarte,Chris" data-append-csv="duartch01" data-stat="player" scope="row"><a href="/players/d/duartch01.html">Chris Duarte</a></th>,
 <th class="left" csk="Turner,Myles" data-append-csv="turnemy01" data-stat="player" scope="row"><a href="/players/t/turnemy01.html">Myles Turner</a></th>,
 <th class="left" csk="Holiday,Justin" data-append-csv="holidju01" data-stat="player" scope="row"><a href="/players/h/holidju01.html">Justin Holiday</a></th>,
 <th class="left" csk="Craig,Torrey" data-append-csv="craigto01" data-stat="player" scope="row"><a href="/players/c/craigto01.html">Torrey Craig</a></th>,
 <th class="left" csk="Lamb,Jeremy" data-append-csv=

In [71]:
test = players[0]
players[0]

<th class="left" csk="Brogdon,Malcolm" data-append-csv="brogdma01" data-stat="player" scope="row"><a href="/players/b/brogdma01.html">Malcolm Brogdon</a></th>

In [72]:
test.string

'Malcolm Brogdon'

In [76]:
# Skip the final <th>, which is not a player row
names = [player.string for player in players[:-1]]

names

['Malcolm Brogdon',
 'Domantas Sabonis',
 'Chris Duarte',
 'Myles Turner',
 'Justin Holiday',
 'Torrey Craig',
 'Jeremy Lamb',
 'T.J. McConnell',
 'Isaiah Jackson',
 'Goga Bitadze',
 'Oshae Brissett',
 'Brad Wanamaker',
 'Duane Washington']

In [78]:
player_reg_stats[0]

<td class="right" csk="2418" data-stat="mp">40:18</td>

In [86]:
# Collect minutes played from every cell tagged data-stat="mp"
mp_list = [cell.string for cell in player_reg_stats
           if cell.get('data-stat') == 'mp']

mp_list = mp_list[:-1]  # last value is the team's total minutes (240 in a regulation game)
mp_list

['40:18', '39:19', '32:42', '25:49', '25:43', '27:48', '24:14', '24:07']
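The minutes come back as `MM:SS` strings, which a model can't use directly. One possible conversion to decimal minutes is sketched below; the function name `minutes_played` is made up for this example. The same information is also available in seconds through the cell's `csk` attribute (`csk="2418"` for `40:18`).

```python
def minutes_played(mp):
    """Convert an 'MM:SS' string to minutes as a float; pass None through."""
    if mp is None:
        return None
    minutes, seconds = mp.split(":")
    return int(minutes) + int(seconds) / 60


# First three values from the output above
mp_strings = ['40:18', '39:19', '32:42']
mp_minutes = [round(minutes_played(mp), 2) for mp in mp_strings]
print(mp_minutes)  # [40.3, 39.32, 32.7]
```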

Looking through the HTML more closely, the `data-stat` values for regular stats are:
* `mp`
* `fg`
* `fga`
* `fg_pct`
* `fg3`
* `fg3a`
* `fg3_pct`
* `ft`
* `fta`
* `ft_pct`
* `orb`
* `drb`
* `trb`
* `ast`
* `stl`
* `blk`
* `tov`
* `pf`
* `pts`
* `plus_minus`

And the following values for advanced stats:
* `ts_pct`
* `efg_pct`
* `fg3a_per_fga_pct`
* `fta_per_fga_pct`
* `orb_pct`
* `drb_pct`
* `trb_pct`
* `ast_pct`
* `stl_pct`
* `blk_pct`
* `tov_pct`
* `usg_pct`
* `off_rtg`
* `def_rtg`
* `bpm`
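Since these keys line up one-to-one with the column labels used for the dataframe, a lookup table between the two may come in handy. The sketch below assumes the keys and labels are listed in the same order; the name `stat_labels` is illustrative.

```python
reg_keys = ['mp', 'fg', 'fga', 'fg_pct', 'fg3', 'fg3a', 'fg3_pct', 'ft',
            'fta', 'ft_pct', 'orb', 'drb', 'trb', 'ast', 'stl', 'blk',
            'tov', 'pf', 'pts', 'plus_minus']
reg_labels = ['MP', 'FG', 'FGA', 'FG%', '3P', '3PA', '3P%', 'FT', 'FTA',
              'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF',
              'PTS', '+/-']

adv_keys = ['ts_pct', 'efg_pct', 'fg3a_per_fga_pct', 'fta_per_fga_pct',
            'orb_pct', 'drb_pct', 'trb_pct', 'ast_pct', 'stl_pct', 'blk_pct',
            'tov_pct', 'usg_pct', 'off_rtg', 'def_rtg', 'bpm']
adv_labels = ['TS%', 'eFG%', '3PAr', 'FTr', 'ORB%', 'DRB%', 'TRB%', 'AST%',
              'STL%', 'BLK%', 'TOV%', 'USG%', 'ORTg', 'DRTg', 'BPM']

# Guard against a key or label being dropped or merged by accident
assert len(reg_keys) == len(reg_labels)
assert len(adv_keys) == len(adv_labels)

# Map each data-stat key to its display label
stat_labels = dict(zip(reg_keys + adv_keys, reg_labels + adv_labels))
```

The length checks are cheap insurance: a missing comma between two string literals silently joins them into one label, and the mismatch would show up here.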

Let's initialize some lists to fill when iterating through the extracted data.

In [97]:
# Regular stats
mp_list = []
fg_list = []
fga_list = []
fg_pct_list = []
fg3_list = []
fg3a_list = []
fg3_pct_list = []
ft_list = []
fta_list = []
ft_pct_list = []
orb_list = []
drb_list = []
trb_list = []
ast_list = []
stl_list = []
blk_list = []
tov_list = []
pf_list = []
pts_list = []
pm_list = []

# Advanced Stats
ts_pct_list = []
efg_pct_list = []
fg3a_per_fga_pct_list = []
fta_per_fga_pct_list = []
orb_pct_list = []
drb_pct_list = []
trb_pct_list = []
ast_pct_list = []
stl_pct_list = []
blk_pct_list = []
tov_pct_list = []
usg_pct_list = []
off_rtg_list = []
def_rtg_list = []
bpm_list = []


reg_stat_list = [mp_list, fg_list, fga_list, fg_pct_list, fg3_list, fg3a_list,
                 fg3_pct_list, ft_list, fta_list, ft_pct_list, orb_list, 
                 drb_list, trb_list, ast_list, stl_list, blk_list, tov_list, 
                 pf_list, pts_list, pm_list]
    
adv_stat_list = [ts_pct_list, efg_pct_list, fg3a_per_fga_pct_list, 
                 fta_per_fga_pct_list, orb_pct_list, drb_pct_list, trb_pct_list,
                 ast_pct_list, stl_pct_list, blk_pct_list, tov_pct_list, 
                 usg_pct_list, off_rtg_list, def_rtg_list, bpm_list]

reg_stat_val_list = ['mp', 'fg', 'fga', 'fg_pct', 'fg3', 'fg3a', 'fg3_pct', 
                     'ft','fta', 'ft_pct', 'orb', 'drb', 'trb', 'ast', 'stl',
                     'blk', 'tov', 'pf', 'pts', 'plus_minus']

adv_stat_val_list = ['ts_pct', 'efg_pct', 'fg3a_per_fga_pct', 
                     'fta_per_fga_pct', 'orb_pct', 'drb_pct', 'trb_pct',
                     'ast_pct', 'stl_pct', 'blk_pct', 'tov_pct', 'usg_pct',
                     'off_rtg', 'def_rtg', 'bpm']

We'll loop through each line of extracted data and add that info to the correct list.

In [98]:
# Loop through regular stats, appending each cell's value to the list for its data-stat
for cell in player_reg_stats:
    for stat_name, stat_values in zip(reg_stat_val_list, reg_stat_list):
        if cell.get('data-stat') == stat_name:
            stat_values.append(cell.string)

# Loop through advanced stats in the same way
for cell in player_adv_stats:
    for stat_name, stat_values in zip(adv_stat_val_list, adv_stat_list):
        if cell.get('data-stat') == stat_name:
            stat_values.append(cell.string)
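An alternative to keeping one named list per stat is a single dictionary keyed by `data-stat`. The sketch below is one way that could look, not the approach used in this notebook. `Cell` is a hypothetical stand-in for a BeautifulSoup tag so the example runs on its own; real tags offer the same `.get(...)` and `.string` interface.

```python
class Cell:
    """Stand-in for a bs4 Tag: supports .get(attr) and .string only."""

    def __init__(self, stat, text):
        self.attrs = {"data-stat": stat}
        self.string = text

    def get(self, key):
        return self.attrs.get(key)


def group_by_stat(cells, wanted):
    """Collect cell text into one list per data-stat key listed in `wanted`."""
    grouped = {stat: [] for stat in wanted}
    for cell in cells:
        stat = cell.get("data-stat")
        if stat in grouped:  # ignore cells whose stat we did not ask for
            grouped[stat].append(cell.string)
    return grouped


# A few cells copied from the regular box score output earlier
cells = [Cell("mp", "40:18"), Cell("fg", "8"), Cell("fga", "20"),
         Cell("pts", "28"), Cell("mp", "39:19")]
grouped = group_by_stat(cells, wanted=["mp", "fg", "fga"])
print(grouped)  # {'mp': ['40:18', '39:19'], 'fg': ['8'], 'fga': ['20']}
```

With the real data, `group_by_stat(player_reg_stats, reg_stat_val_list)` would take the place of the twenty separate lists.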

In [100]:
bpm_list

['6.5', '3.4', '8.5', '-5.1', '-0.1', '-0.6', '-16.0', '3.5', None]
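Every value extracted so far is a string, and an empty cell comes back as `None`. The trailing `None` here presumably belongs to the team totals row, though that is an assumption. Before modeling, the values need to become numbers. Below is a minimal sketch of that conversion; the helper name `to_float` is made up for this example.

```python
def to_float(value):
    """Convert a scraped stat string to float, leaving missing values as None."""
    return None if value is None else float(value)


# Values copied from the bpm_list output above
bpm_strings = ['6.5', '3.4', '8.5', '-5.1', '-0.1', '-0.6', '-16.0', '3.5', None]
bpm_values = [to_float(value) for value in bpm_strings]
print(bpm_values)  # [6.5, 3.4, 8.5, -5.1, -0.1, -0.6, -16.0, 3.5, None]
```

Percentages written without a leading zero, such as `'.400'`, are also handled, since `float('.400')` returns `0.4`.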