# NBA Injuries
***
## Goal: 
Build a model to predict the probability that a player misses a game due to injury within a given time frame.

## Approach:

### Part I: Data Preparation
Tasks:

1. Scrape injury history data from Pro Sports Transactions using Beautiful Soup
2. Scrape player statistics and information from NBA Stats using Beautiful Soup and Selenium and/or nba-api
3. Clean datasets
4. Merge the two datasets


***

## Part I: Data Preparation
### Task 1: Scrape injury data from http://www.prosportstransactions.com

In [4]:
import numpy as np
import pandas as pd

In [2]:
# Scrape injury data from http://www.prosportstransactions.com
# Collects all missed games due to injury from Oct 30, 2012 to Aug 12, 2020 (the date range in the search URL)

import requests
from bs4 import BeautifulSoup


def clear_bullet(s):
    return s.replace('• ', '')


# Scrape injury data from the site, 25 results per page
rows = []
for page in range(366):
    URL = 'http://www.prosportstransactions.com/basketball/Search/SearchResults.php?Player=&Team=&BeginDate=2012-10-30&EndDate=2020-08-12&InjuriesChkBx=yes&Submit=Search&start=' + str(25 * page)
    response = requests.get(URL)  # separate name so the loop counter is not overwritten
    soup = BeautifulSoup(response.content, 'html.parser')

    results = soup.find_all('tr', align='left')

    for result in results:
        entry = result.text.strip().split('\n')
        rows.append({'Date': entry[0], 'Team': entry[1], 'Player': entry[3], 'Injury': entry[4]})

# Build the DataFrame once; DataFrame.append was removed in pandas 2.0
nba_injuries = pd.DataFrame(rows, columns=['Date', 'Team', 'Player', 'Injury'])
        
        
# Data cleaning
# Remove all entries without a player name (i.e. returned to lineup)

# .copy() avoids a SettingWithCopyWarning on the assignment below
nba_injuries = nba_injuries[nba_injuries.Player != ' '].copy()
nba_injuries['Player'] = nba_injuries['Player'].apply(clear_bullet)


# Save as csv
nba_injuries.to_csv('injuries.csv', index=False)
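
The page URL in the loop above is assembled by string concatenation. As a minimal sketch, the same query string could be built with the standard library's `urlencode`, which keeps each search parameter visible and easy to change. `build_search_url`, `BASE_URL`, and `ROWS_PER_PAGE` are hypothetical names introduced here for illustration; the parameter names and values are copied from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = 'http://www.prosportstransactions.com/basketball/Search/SearchResults.php'
ROWS_PER_PAGE = 25  # taken from the `25 * page` offset in the scraping loop


def build_search_url(page_index, begin_date='2012-10-30', end_date='2020-08-12'):
    """Return the injury search URL for one zero-based results page (hypothetical helper)."""
    params = {
        'Player': '',
        'Team': '',
        'BeginDate': begin_date,
        'EndDate': end_date,
        'InjuriesChkBx': 'yes',
        'Submit': 'Search',
        'start': ROWS_PER_PAGE * page_index,
    }
    # urlencode preserves the dict order and escapes values where needed
    return f'{BASE_URL}?{urlencode(params)}'


print(build_search_url(2))
```

Inside the loop, `requests.get(build_search_url(page))` would then replace the concatenated string.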

### Task 2: Get player stats from NBA.com using nba_api

In [1]:
import nba_api.stats.static.players as players
from nba_api.stats import endpoints

In [3]:
# Get player bios for the 2018-19 season
bios1819 = endpoints.LeagueDashPlayerBioStats(season='2018-19').get_data_frames()[0]
bios1819.head()

| | PLAYER_ID | PLAYER_NAME | TEAM_ID | TEAM_ABBREVIATION | AGE | PLAYER_HEIGHT | PLAYER_HEIGHT_INCHES | PLAYER_WEIGHT | COLLEGE | COUNTRY | ... | GP | PTS | REB | AST | NET_RATING | OREB_PCT | DREB_PCT | USG_PCT | TS_PCT | AST_PCT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 203932 | Aaron Gordon | 1610612753 | ORL | 23.0 | 6-9 | 81 | 220 | Arizona | USA | ... | 78 | 1246 | 574 | 289 | 1.5 | 0.047 | 0.165 | 0.213 | 0.538 | 0.166 |
| 1 | 1628988 | Aaron Holiday | 1610612754 | IND | 22.0 | 6-1 | 73 | 185 | UCLA | USA | ... | 50 | 294 | 67 | 87 | 7.0 | 0.008 | 0.088 | 0.206 | 0.518 | 0.18 |
| 2 | 1627846 | Abdel Nader | 1610612760 | OKC | 25.0 | 6-6 | 78 | 225 | Iowa State | Egypt | ... | 61 | 241 | 116 | 20 | -9.5 | 0.017 | 0.139 | 0.148 | 0.522 | 0.044 |
| 3 | 201143 | Al Horford | 1610612738 | BOS | 33.0 | 6-10 | 82 | 245 | Florida | Dominican Republic | ... | 68 | 925 | 458 | 283 | 6.1 | 0.062 | 0.161 | 0.188 | 0.605 | 0.203 |
| 4 | 202329 | Al-Farouq Aminu | 1610612757 | POR | 28.0 | 6-9 | 81 | 220 | Wake Forest | USA | ... | 81 | 760 | 610 | 104 | 8.2 | 0.048 | 0.204 | 0.134 | 0.568 | 0.057 |
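
The call above fetches a single season, while the injury data from Task 1 runs from October 2012 to August 2020. Assuming bios are wanted for every season in that span, the season labels in the `'2018-19'` format could be generated with a small helper. This is only a sketch: `season_labels` is a hypothetical name, and the choice of 2012-13 through 2019-20 is inferred from the injury date range rather than stated in the notebook.

```python
def season_labels(first_start_year, last_start_year):
    """Return labels such as '2018-19' for seasons starting in the given years (hypothetical helper)."""
    return [
        f'{year}-{(year + 1) % 100:02d}'  # two-digit end year, zero-padded
        for year in range(first_start_year, last_start_year + 1)
    ]


seasons = season_labels(2012, 2019)
print(seasons)

# Each label could then be passed to the endpoint used above, for example:
# bios = endpoints.LeagueDashPlayerBioStats(season=seasons[0]).get_data_frames()[0]
```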
