<img src="https://imgur.com/3Ua9VYU.png" style="float: left; margin: 18px; height: 75px"> 

# Data Extraction & Wrangling
---

## NBA Game Total Score Prediction
---
## Problem Statement
Given the unpredictability of sports, there is never a sure-fire winning sports bet.
The goal of this project is to build a model that predicts the total score of upcoming NBA matchups, compares that prediction with the Over/Under lines offered by different sportsbooks, and recommends whether the total will land over or under the line and by how much. The aim is to give bettors an analytical edge when placing totals bets. To estimate each game's total, we will train machine learning models on historical NBA game data and choose the model with the best testing score.
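
The recommendation step described above comes down to comparing the model's predicted total with the sportsbook line. Here is a minimal sketch, assuming a predicted total is already available. `recommend_bet` is a hypothetical helper, not part of this notebook, and the prediction in the example is made up.

```python
def recommend_bet(predicted_total: float, line: float) -> tuple[str, float]:
    """Return ('Over' | 'Under' | 'Push', margin) for a predicted total vs. an Over/Under line."""
    margin = predicted_total - line
    if margin > 0:
        return 'Over', margin
    if margin < 0:
        return 'Under', -margin
    return 'Push', 0.0

# Orlando Magic vs. Atlanta Hawks line of 225.0 with a made-up prediction of 231.5
print(recommend_bet(231.5, 225.0))  # ('Over', 6.5)
```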
---

### Importing Libraries & API key
---

In [88]:
import major_key_alert  # personal .py file that stores the API key
odds_api_key = major_key_alert.odds_key
import requests
import pandas as pd
from bs4 import BeautifulSoup
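
The notebook keeps the key in a personal module (`major_key_alert.py`). An alternative is to read the key from an environment variable, which also keeps it out of version control. This is a sketch only: the variable name `ODDS_API_KEY` and the helper `load_odds_key` are hypothetical. With it, `odds_api_key = load_odds_key()` would replace the two `major_key_alert` lines.

```python
import os

def load_odds_key(var_name: str = 'ODDS_API_KEY') -> str:
    """Read The Odds API key from an environment variable (hypothetical variable name)."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f'Set the {var_name} environment variable to your API key.')
    return key
```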

---
### API for Upcoming NBA Games
---

In [89]:
def odds_data(sportkey, api_key, regions, markets, odds_format, bookmakers):
    """Return a DataFrame with one row per outcome, per market, per bookmaker, per upcoming game."""
    url = f'https://api.the-odds-api.com/v4/sports/{sportkey}/odds'
    # sport keys can be found at https://the-odds-api.com/liveapi/guides/v4/#overview
    params = {
        'api_key': api_key,         # API key
        'regions': regions,         # region of bookmakers (sites): us/uk/au/eu
        'markets': markets,         # odds market: h2h (moneyline)/spreads/totals/outrights
        'oddsFormat': odds_format,  # decimal or american
        'bookmakers': bookmakers,   # bookmaker(s)/site(s)
    }
    res = requests.get(url, params=params, timeout=30)
    if res.status_code != 200:
        # Fail loudly instead of returning a string that later DataFrame code would choke on
        raise RuntimeError(f'Error {res.status_code}: please review the input! Try again.')

    rows = []
    # The JSON is nested: game -> bookmakers -> markets -> outcomes
    for game in res.json():
        for bookmaker in game['bookmakers']:
            for market in bookmaker['markets']:
                for outcome in market['outcomes']:
                    # Each outcome holds name (e.g. Over/Under), price and, for totals, point;
                    # copy it and attach the game- and bookmaker-level fields
                    row = dict(outcome)
                    row['id'] = game['id']
                    row['sport_title'] = game['sport_title']
                    row['commence_time'] = game['commence_time']
                    row['home_team'] = game['home_team']
                    row['away_team'] = game['away_team']
                    row['title'] = bookmaker['title']
                    row['last_update'] = bookmaker['last_update']
                    rows.append(row)
    return pd.DataFrame(rows)
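
The nested loops above flatten the response (game → bookmakers → markets → outcomes) into one row per outcome. pandas can do the same with `pd.json_normalize`. The sketch below runs on a hand-written sample shaped like a `/v4/sports/{sport}/odds` response. The field values are illustrative, not real API output, and `flatten_odds` is a hypothetical helper.

```python
import pandas as pd

# Hand-written sample in the shape of an odds response (illustrative values only)
sample = [{
    'id': 'game-1',
    'sport_title': 'NBA',
    'commence_time': '2023-01-01T00:00:00Z',
    'home_team': 'Orlando Magic',
    'away_team': 'Atlanta Hawks',
    'bookmakers': [{
        'title': 'FanDuel',
        'last_update': '2022-12-31T20:00:00Z',
        'markets': [{
            'key': 'totals',
            'outcomes': [
                {'name': 'Over', 'price': -110, 'point': 225.0},
                {'name': 'Under', 'price': -110, 'point': 225.0},
            ],
        }],
    }],
}]

def flatten_odds(payload):
    """One row per outcome, with game- and bookmaker-level fields repeated on each row."""
    df = pd.json_normalize(
        payload,
        record_path=['bookmakers', 'markets', 'outcomes'],
        meta=['id', 'sport_title', 'commence_time', 'home_team', 'away_team',
              ['bookmakers', 'title'], ['bookmakers', 'last_update']],
    )
    # Rename the nested meta columns to match the names odds_data() uses
    return df.rename(columns={'bookmakers.title': 'title',
                              'bookmakers.last_update': 'last_update'})

print(flatten_odds(sample)[['name', 'point', 'home_team', 'title']])
```

The explicit loops make each level easy to follow. `json_normalize` is shorter but relies on the `record_path`/`meta` paths matching the response structure exactly.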

#### API Parameters for FanDuel NBA Totals Odds

In [90]:
sportkey='basketball_nba' #sport keys can be found at https://the-odds-api.com/liveapi/guides/v4/#overview
api_key=odds_api_key #personal API key
regions='us' #region of bookmakers(sites)- us/uk/au/eu
markets='totals' #odds market- h2h (moneyline)/spreads/totals/outrights
odds_format='american' #decimal or american
bookmakers1='fanduel' #bookmakers/site
fanduel=odds_data(sportkey,api_key,regions,markets,odds_format,bookmakers1)
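
`requests.get` encodes the `params` dict into the URL's query string. The standard-library sketch below shows roughly what is sent for the FanDuel totals request, using `urllib.parse.urlencode`, which produces essentially the same encoding. The key is a placeholder.

```python
from urllib.parse import urlencode

url = 'https://api.the-odds-api.com/v4/sports/basketball_nba/odds'
params = {
    'api_key': 'YOUR_KEY',   # placeholder, not a real key
    'regions': 'us',
    'markets': 'totals',
    'oddsFormat': 'american',
    'bookmakers': 'fanduel',
}
full_url = f'{url}?{urlencode(params)}'
print(full_url)
# https://api.the-odds-api.com/v4/sports/basketball_nba/odds?api_key=YOUR_KEY&regions=us&markets=totals&oddsFormat=american&bookmakers=fanduel
```

After a real call, `res.url` shows the exact URL that requests sent, which helps when debugging a non-200 response.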

#### Upcoming NBA Matchups and Over/Under Lines

In [94]:
matchups=fanduel[['home_team','away_team','point']].drop_duplicates(ignore_index=True)
matchups

|   | home_team | away_team | point |
|---|-----------|-----------|-------|
| 0 | Orlando Magic | Atlanta Hawks | 225.0 |
| 1 | Charlotte Hornets | Detroit Pistons | 227.5 |
| 2 | Indiana Pacers | Golden State Warriors | 238.0 |
| 3 | Chicago Bulls | New York Knicks | 225.5 |
| 4 | Toronto Raptors | Sacramento Kings | 232.5 |
| 5 | Oklahoma City Thunder | Miami Heat | 223.5 |
| 6 | San Antonio Spurs | Portland Trail Blazers | 230.0 |
| 7 | Dallas Mavericks | Cleveland Cavaliers | 215.5 |
| 8 | Denver Nuggets | Washington Wizards | 226.0 |
| 9 | Los Angeles Clippers | Minnesota Timberwolves | 222.5 |
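
Each game's totals market comes back as two outcome rows (Over and Under) that share the same `point`. Selecting `home_team`, `away_team` and `point` and then dropping duplicates therefore leaves one line per game. A toy illustration, with made-up prices:

```python
import pandas as pd

# Two outcome rows per game, as returned for the 'totals' market (illustrative prices)
odds = pd.DataFrame({
    'name': ['Over', 'Under', 'Over', 'Under'],
    'price': [-110, -110, -112, -108],
    'point': [225.0, 225.0, 227.5, 227.5],
    'home_team': ['Orlando Magic', 'Orlando Magic', 'Charlotte Hornets', 'Charlotte Hornets'],
    'away_team': ['Atlanta Hawks', 'Atlanta Hawks', 'Detroit Pistons', 'Detroit Pistons'],
})

matchups_demo = odds[['home_team', 'away_team', 'point']].drop_duplicates(ignore_index=True)
print(matchups_demo)
```

If the two outcomes of a game ever carried different points, that game would appear twice. `matchups_demo.duplicated(subset=['home_team', 'away_team']).any()` is a quick way to catch that case.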


#### Creating Dictionary for Mapping

In [95]:
team_abrv_url = 'https://en.wikipedia.org/wiki/Wikipedia:WikiProject_National_Basketball_Association/National_Basketball_Association_team_abbreviations'
team_abrv_res = requests.get(team_abrv_url, timeout=30)
team_abrv_soup = BeautifulSoup(team_abrv_res.text, 'html.parser')  # explicit parser avoids a bs4 warning
team_abrv_tbl = team_abrv_soup.find('table')

# Skip the header row; map each linked team name to its abbreviation (text[:-1] drops the trailing newline)
teams_abrv = {row.find('a').attrs['title']: row.find('td').text[:-1]
              for row in team_abrv_tbl.find('tbody').find_all('tr')[1:]}

# basketball-reference.com uses different abbreviations for these three teams
teams_abrv.update({'Brooklyn Nets': 'BRK', 'Charlotte Hornets': 'CHO', 'Phoenix Suns': 'PHO'})
teams_abrv

{'Atlanta Hawks': 'ATL',
 'Brooklyn Nets': 'BRK',
 'Boston Celtics': 'BOS',
 'Charlotte Hornets': 'CHO',
 'Chicago Bulls': 'CHI',
 'Cleveland Cavaliers': 'CLE',
 'Dallas Mavericks': 'DAL',
 'Denver Nuggets': 'DEN',
 'Detroit Pistons': 'DET',
 'Golden State Warriors': 'GSW',
 'Houston Rockets': 'HOU',
 'Indiana Pacers': 'IND',
 'Los Angeles Clippers': 'LAC',
 'Los Angeles Lakers': 'LAL',
 'Memphis Grizzlies': 'MEM',
 'Miami Heat': 'MIA',
 'Milwaukee Bucks': 'MIL',
 'Minnesota Timberwolves': 'MIN',
 'New Orleans Pelicans': 'NOP',
 'New York Knicks': 'NYK',
 'Oklahoma City Thunder': 'OKC',
 'Orlando Magic': 'ORL',
 'Philadelphia 76ers': 'PHI',
 'Phoenix Suns': 'PHO',
 'Portland Trail Blazers': 'POR',
 'Sacramento Kings': 'SAC',
 'San Antonio Spurs': 'SAS',
 'Toronto Raptors': 'TOR',
 'Utah Jazz': 'UTA',
 'Washington Wizards': 'WAS'}
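
The dictionary comprehension assumes that each body row of the Wikipedia table has the abbreviation in its first `<td>` and a link whose `title` is the franchise name. The sketch below runs on a small hand-written table with that layout. The HTML is illustrative, not copied from Wikipedia. It uses `get_text(strip=True)` instead of `text[:-1]` so the result does not depend on a trailing newline.

```python
from bs4 import BeautifulSoup

html = """
<table><tbody>
<tr><th>Abbreviation</th><th>Franchise</th></tr>
<tr><td>ATL
</td><td><a href="/wiki/Atlanta_Hawks" title="Atlanta Hawks">Atlanta</a>
</td></tr>
<tr><td>BKN
</td><td><a href="/wiki/Brooklyn_Nets" title="Brooklyn Nets">Brooklyn</a>
</td></tr>
</tbody></table>
"""

tbl = BeautifulSoup(html, 'html.parser').find('table')
abbrs = {row.find('a')['title']: row.find('td').get_text(strip=True)
         for row in tbl.find('tbody').find_all('tr')[1:]}   # [1:] skips the header row
print(abbrs)  # {'Atlanta Hawks': 'ATL', 'Brooklyn Nets': 'BKN'}
```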

#### Mapping Matchups 

In [106]:
matchups.replace({'home_team':teams_abrv,'away_team':teams_abrv},inplace=True)
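
`DataFrame.replace` with a nested dict swaps only the names it finds. Any unmatched name, for example a team the sportsbook spells differently, is silently left as-is. A stricter alternative uses `Series.map`, which turns unmatched names into `NaN` so they can be flagged. In this sketch, the helper `map_team_names` and the `'LA Clippers'` spelling used in the check are hypothetical.

```python
import pandas as pd

def map_team_names(df, mapping, cols=('home_team', 'away_team')):
    """Map full team names to abbreviations, raising if any name is missing from `mapping`."""
    out = df.copy()
    for col in cols:
        mapped = out[col].map(mapping)           # unmatched names become NaN
        missing = out.loc[mapped.isna(), col].unique()
        if len(missing):
            raise KeyError(f'No abbreviation for: {list(missing)}')
        out[col] = mapped
    return out

mapping = {'Orlando Magic': 'ORL', 'Atlanta Hawks': 'ATL'}
games = pd.DataFrame({'home_team': ['Orlando Magic'], 'away_team': ['Atlanta Hawks'], 'point': [225.0]})
print(map_team_names(games, mapping))
```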

#### Export as CSV

In [108]:
matchups.to_csv('../data/matchups.csv',index=False)

---
### Team Data Webscraper
---

In [111]:
def game_data(team_abrv, season):
    url = f'https://www.basketball-reference.com/teams/{team_abrv}/{season}/gamelog/'
    res = requests.get(url, timeout=30)
    if res.status_code != 200:
        raise RuntimeError(f'Error {res.status_code}: please review the input! Try again.')

    soup = BeautifulSoup(res.text, 'html.parser')
    stats = ['date_game','pts','fg','fga','fg_pct','fg3','fg3a','fg3_pct','ft','fta','ft_pct','trb','ast','stl','blk','tov']
    teams = []
    tbl = soup.find('table')
    for tr in tbl.find('tbody').find_all('tr'):
        # Team stats, then the opponent's version of every stat except the date
        df_col = {stat: tr.find(attrs={'data-stat': stat}) for stat in stats}
        df_col.update({f'opp_{stat}': tr.find(attrs={'data-stat': f'opp_{stat}'}) for stat in stats[1:]})
        teams.append(df_col)

    df = pd.DataFrame(teams)
    df.dropna(inplace=True)                 # drop rows missing any stat cell
    df = df.applymap(lambda x: x.text)      # Tag -> text (DataFrame.map in pandas >= 2.1)
    df = df[df.date_game != 'Date']         # drop repeated header rows
    # Percentages are floats, the date is a datetime, every other stat is an integer count
    dtypes = {col: 'float64' if col.endswith('_pct') else 'int64' for col in df.columns}
    dtypes['date_game'] = 'datetime64[ns]'
    df = df.astype(dtypes)

    # Most recent game first
    df = df.sort_values(by=['date_game'], ascending=False)

    outdf = pd.DataFrame(df[['date_game', 'pts']])
    for num in [1, 3, 6, 8]:
        # Newest first, each window covers `num` consecutive games; shifting the dates by `num`
        # labels the window with the game played right after it, so the features only use
        # games before the one being predicted. The date column is excluded from the sums.
        rolsums = df.drop(columns='date_game').rolling(num).sum().add_prefix(f'last{num}sum_')
        rolsums['date_game'] = df['date_game'].shift(num)
        out1 = rolsums[num:]
        outdf = pd.merge(outdf, out1, on='date_game')
    return outdf
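
The rolling/shift step in `game_data` is easy to misread. With games sorted newest first, `rolling(num).sum()` followed by shifting the dates by `num` gives each game the sum of the `num` games played before it. The toy check below uses made-up points and compares this method with the more familiar oldest-first form `shift(1).rolling(num).sum()`.

```python
import pandas as pd

games = pd.DataFrame({
    'date_game': pd.date_range('2022-01-01', periods=5, freq='D'),
    'pts': [100, 110, 120, 130, 140],   # made-up points
})
num = 2

# Method used in game_data(): newest first, rolling sum, then shift the dates by `num`
newest_first = games.sort_values('date_game', ascending=False)
rolsums = newest_first[['pts']].rolling(num).sum().add_prefix(f'last{num}sum_')
rolsums['date_game'] = newest_first['date_game'].shift(num)
notebook_way = rolsums[num:].sort_values('date_game').reset_index(drop=True)

# Oldest-first equivalent: shift(1) drops the current game, then sum the previous `num`
oldest_first = games.sort_values('date_game')
check = pd.DataFrame({
    f'last{num}sum_pts': oldest_first['pts'].shift(1).rolling(num).sum(),
    'date_game': oldest_first['date_game'],
}).dropna().reset_index(drop=True)

print(notebook_way)   # 2022-01-03: 210.0, 2022-01-04: 230.0, 2022-01-05: 250.0
```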


#### Webscraping Home and Away Team Data

In [112]:
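# First upcoming matchup (row 0 of `matchups`)
team_abrv1 = matchups['home_team'][0]
team_abrv2 = matchups['away_team'][0]
# basketball-reference labels a season by the year it ends: '2022' is the 2021-22 season
team1 = game_data(team_abrv1, '2022')
team2 = game_data(team_abrv2, '2022')
team1.to_csv('../data/team1data.csv', index=False)
team2.to_csv('../data/team2data.csv', index=False)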
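
The cell above scrapes only the first matchup (`matchups` row 0). If you want every team on the slate, the sketch below collects one gamelog URL per team using the same URL pattern as `game_data`. `gamelog_urls` is a hypothetical helper.

```python
def gamelog_urls(pairs, season):
    """Map each team abbreviation in (home, away) pairs to its basketball-reference gamelog URL."""
    urls = {}
    for home, away in pairs:
        for team in (home, away):
            # setdefault keeps first-seen order and avoids scraping a team twice
            urls.setdefault(team, f'https://www.basketball-reference.com/teams/{team}/{season}/gamelog/')
    return urls

pairs = [('ORL', 'ATL'), ('CHO', 'DET')]   # e.g. list(zip(matchups.home_team, matchups.away_team))
for team, url in gamelog_urls(pairs, '2022').items():
    print(team, url)
```

When looping over these URLs, pausing between requests (for example with `time.sleep`) keeps the scraper polite and less likely to be blocked.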


#### Upcoming Game Data (Prediction Data)

In [118]:
def recent_game_data(team_abrv, season):
    url = f'https://www.basketball-reference.com/teams/{team_abrv}/{season}/gamelog/'
    res = requests.get(url, timeout=30)
    if res.status_code != 200:
        raise RuntimeError(f'Error {res.status_code}: please review the input! Try again.')

    soup = BeautifulSoup(res.text, 'html.parser')
    stats = ['date_game','pts','fg','fga','fg_pct','fg3','fg3a','fg3_pct','ft','fta','ft_pct','trb','ast','stl','blk','tov']
    teams = []
    tbl = soup.find('table')
    for tr in tbl.find('tbody').find_all('tr'):
        # Team stats, then the opponent's version of every stat except the date
        df_col = {stat: tr.find(attrs={'data-stat': stat}) for stat in stats}
        df_col.update({f'opp_{stat}': tr.find(attrs={'data-stat': f'opp_{stat}'}) for stat in stats[1:]})
        teams.append(df_col)

    df = pd.DataFrame(teams)
    df.dropna(inplace=True)                 # drop rows missing any stat cell
    df = df.applymap(lambda x: x.text)      # Tag -> text (DataFrame.map in pandas >= 2.1)
    df = df[df.date_game != 'Date']         # drop repeated header rows
    # Percentages are floats, the date is a datetime, every other stat is an integer count
    dtypes = {col: 'float64' if col.endswith('_pct') else 'int64' for col in df.columns}
    dtypes['date_game'] = 'datetime64[ns]'
    df = df.astype(dtypes)

    # Oldest game first
    df = df.sort_values(by=['date_game'], ascending=True)

    outdf = pd.DataFrame(df[['date_game', 'pts']])
    for num in [1, 3, 6, 8]:
        # No date shift here: each window ends at (and includes) the labelled game, so the
        # latest row sums the `num` most recent games. The date column is excluded from the sums.
        rolsums = df.drop(columns='date_game').rolling(num).sum().add_prefix(f'last{num}sum_')
        rolsums['date_game'] = df['date_game']
        out1 = rolsums[num:]
        outdf = pd.merge(outdf, out1, on='date_game')
    # Keep only the most recent row as the input for the upcoming game
    return outdf.sort_values(by=['date_game'], ascending=False).head(1)
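
For the prediction row, `recent_game_data` sorts oldest first and does not shift the dates, so each window ends at and includes the labelled game. The latest row therefore holds the sums of the `num` most recent games, which is the same window the training features in `game_data` use for the game that follows. A toy check with made-up points:

```python
import pandas as pd

games = pd.DataFrame({
    'date_game': pd.date_range('2023-01-01', periods=5, freq='D'),
    'pts': [100, 110, 120, 130, 140],   # made-up points
})
num = 3

oldest_first = games.sort_values('date_game')
rolsums = oldest_first[['pts']].rolling(num).sum().add_prefix(f'last{num}sum_')
rolsums['date_game'] = oldest_first['date_game']   # no shift: window includes this game
latest = rolsums.sort_values('date_game', ascending=False).head(1)

print(latest)   # last3sum_pts = 120 + 130 + 140 = 390.0 on 2023-01-05
```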


#### Webscraping Home and Away Upcoming Game Data

In [119]:
# '2023' is the 2022-23 season; only each team's latest row is kept
team1recent = recent_game_data(team_abrv1, '2023')
team2recent = recent_game_data(team_abrv2, '2023')
team1recent.to_csv('../data/team1recent.csv',index=False)
team2recent.to_csv('../data/team2recent.csv',index=False)

---