<img src="https://imgur.com/3Ua9VYU.png" style="float: left; margin: 18px; height: 75px"> 

# Compiled Model Workflow
---

## NBA Game Total Score Prediction
---
## Problem Statement
With the unpredictability in sports, there is never be an sure-fire winning sportsbet.
The goal of this project is to create a model that returns the expected totals of upcoming NBA matchups and comparing that to Over/Under bets from different sportsbooks, recommending whether the total will fall within the over or under line and by how much. Ultimately giving players somewhat of an analysis and upperhand when betting on sportsbooks. To create the expected value for the total of the game, we will be implementing machine learning models on previous NBA game data. Choosing the model with the best testing score.
---

### Importing Libraries, Data, and API key
---

In [361]:
import major_key_alert  #personal API key .py file
odds_api_key=major_key_alert.odds_key  #personal API key .py file
import requests
from bs4 import BeautifulSoup

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import PolynomialFeatures,StandardScaler,MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression,RidgeCV,LassoCV,ElasticNetCV
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor,plot_tree
from sklearn.ensemble import RandomForestRegressor,AdaBoostRegressor,GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

---
### API for Upcoming NBA Games
---

In [362]:
def odds_data(sportkey,api_key,regions,markets,odds_format,bookmakers):
    url=f'https://api.the-odds-api.com/v4/sports/{sportkey}/odds'
    #sport keys can be found at https://the-odds-api.com/liveapi/guides/v4/#overview
    params={
        'api_key': api_key, #api key
        'regions': regions, #region of bookmakers(sites)- us/uk/au/eu
        'markets': markets, #odds market- moneyline/spreads/totals/outrights
        'oddsFormat': odds_format, #decimal or american
        'bookmakers': bookmakers #bookmakers/site
    }
    res=requests.get(url,params)
    if res.status_code != 200: 
        return f'Error {res.status_code}: please review the input! Try again.' 
    else:
        rows = []
  
        for data in res.json():
            data_id = data['id']
            data_sport_title=data['sport_title']
            data_commence_time=data['commence_time']
            data_home_team=data['home_team']
            data_away_team=data['away_team']

            data_bookmakers = data['bookmakers']

            for data2 in data_bookmakers:
                data2_title=data2['title']
                data2_last_update=data2['last_update']

                data2_markets=data2['markets']

                for data3 in data2_markets:
                    data3_key=data3['key']

                    data3_outcomes=data3['outcomes']

                    for row in data3_outcomes:
                        row['id']=data_id
                        row['sport_title']=data_sport_title
                        row['commence_time']=data_commence_time
                        row['home_team']=data_home_team
                        row['away_team']=data_away_team

                        row['title']=data2_title
                        row['last_update']=data2_last_update

                        rows.append(row)
        df = pd.DataFrame(rows)
        
    return df

#### API parameters for Fanduel NBA Total Odds

In [363]:
sportkey='basketball_nba' #sport keys can be found at https://the-odds-api.com/liveapi/guides/v4/#overview
api_key=odds_api_key #personal API key
regions='us' #region of bookmakers(sites)- us/uk/au/eu
markets='totals' #odds market- moneyline/spreads/totals/outrights
odds_format='american' #decimal or american
bookmakers1='fanduel' #bookmakers/site
fanduel=odds_data(sportkey,api_key,regions,markets,odds_format,bookmakers1)

#### Upcoming NBA Matchups and Over/Under line 

In [364]:
matchups=fanduel[['home_team','away_team','point']].drop_duplicates(ignore_index=True)
matchups

Unnamed: 0,home_team,away_team,point
0,Orlando Magic,Atlanta Hawks,245.5
1,Indiana Pacers,Golden State Warriors,237.5
2,Charlotte Hornets,Detroit Pistons,247.5
3,Toronto Raptors,Sacramento Kings,230.5
4,Chicago Bulls,New York Knicks,226.5
5,Oklahoma City Thunder,Miami Heat,222.5
6,San Antonio Spurs,Portland Trail Blazers,231.0
7,Dallas Mavericks,Cleveland Cavaliers,217.0
8,Denver Nuggets,Washington Wizards,225.0
9,Los Angeles Clippers,Minnesota Timberwolves,222.5


#### Creating Dictionary for Mapping

In [365]:
team_abrv_url='https://en.wikipedia.org/wiki/Wikipedia:WikiProject_National_Basketball_Association/National_Basketball_Association_team_abbreviations'
team_abrv_res=requests.get(team_abrv_url)
team_abrv_soup=BeautifulSoup(team_abrv_res.text)
team_abrv_tbl=team_abrv_soup.find('table')

teams_abrv={row.find('a').attrs['title']:row.find('td').text[:-1] for row in team_abrv_tbl.find('tbody').find_all('tr')[1:]}

teams_abrv.update({'Brooklyn Nets':'BRK'})
teams_abrv.update({'Charlotte Hornets':'CHO'})
teams_abrv.update({'Phoenix Suns':'PHO'})
teams_abrv

{'Atlanta Hawks': 'ATL',
 'Brooklyn Nets': 'BRK',
 'Boston Celtics': 'BOS',
 'Charlotte Hornets': 'CHO',
 'Chicago Bulls': 'CHI',
 'Cleveland Cavaliers': 'CLE',
 'Dallas Mavericks': 'DAL',
 'Denver Nuggets': 'DEN',
 'Detroit Pistons': 'DET',
 'Golden State Warriors': 'GSW',
 'Houston Rockets': 'HOU',
 'Indiana Pacers': 'IND',
 'Los Angeles Clippers': 'LAC',
 'Los Angeles Lakers': 'LAL',
 'Memphis Grizzlies': 'MEM',
 'Miami Heat': 'MIA',
 'Milwaukee Bucks': 'MIL',
 'Minnesota Timberwolves': 'MIN',
 'New Orleans Pelicans': 'NOP',
 'New York Knicks': 'NYK',
 'Oklahoma City Thunder': 'OKC',
 'Orlando Magic': 'ORL',
 'Philadelphia 76ers': 'PHI',
 'Phoenix Suns': 'PHO',
 'Portland Trail Blazers': 'POR',
 'Sacramento Kings': 'SAC',
 'San Antonio Spurs': 'SAS',
 'Toronto Raptors': 'TOR',
 'Utah Jazz': 'UTA',
 'Washington Wizards': 'WAS'}

#### Mapping Matchups 

In [366]:
matchups.replace({'home_team':teams_abrv,'away_team':teams_abrv},inplace=True)

In [367]:
matchups

Unnamed: 0,home_team,away_team,point
0,ORL,ATL,245.5
1,IND,GSW,237.5
2,CHO,DET,247.5
3,TOR,SAC,230.5
4,CHI,NYK,226.5
5,OKC,MIA,222.5
6,SAS,POR,231.0
7,DAL,CLE,217.0
8,DEN,WAS,225.0
9,LAC,MIN,222.5


---
### Team Data Webscraper
---

In [368]:
team_abrv1=matchups['home_team'][0] #replace 0 with index row for next matchup
team_abrv2=matchups['away_team'][0] #replace 0 with index row for next matchup
total=matchups['point'][0]#replace 0 with index row for next matchup

In [369]:
def game_data(team_abrv,season):
    url=f'https://www.basketball-reference.com/teams/{team_abrv}/{season}/gamelog/'
    res=requests.get(url)
    if res.status_code != 200: 
        return f'Error {res.status_code}: please review the input! Try again.' 
    else:
        soup=BeautifulSoup(res.text)
        teams = []
        stats=['date_game','pts','fg','fga','fg_pct','fg3','fg3a','fg3_pct','ft','fta','ft_pct','trb','ast','stl','blk','tov']
        tbl=soup.find('table')
        trs=tbl.find('tbody').find_all('tr')
        for tr in trs:
            df_col={}
            for num in range(0,len(stats)):
                df_col[stats[num]]=tr.find(attrs={'data-stat':stats[num]})
            for num in range(1,len(stats)):
                df_col[f'opp_{stats[num]}']=tr.find(attrs={'data-stat':f'opp_{stats[num]}'})

            teams.append(df_col)

        df=pd.DataFrame(teams)
        df.dropna(inplace=True)
        df=df.applymap(lambda x: x.text)
        df=df[df.date_game!='Date']
        df=df.astype({'date_game':'datetime64[ns]','pts':'int64','fg':'int64','fga':'int64','fg_pct':'float64','fg3':'int64','fg3a':'int64','fg3_pct':'float64',
        'ft':'int64','fta':'int64','ft_pct':'float64','trb':'int64','ast':'int64','stl':'int64','blk':'int64','tov':'int64',
        'opp_pts':'int64','opp_fg':'int64','opp_fga':'int64','opp_fg_pct':'float64','opp_fg3':'int64','opp_fg3a':'int64','opp_fg3_pct':'float64',
        'opp_ft':'int64','opp_fta':'int64','opp_ft_pct':'float64','opp_trb':'int64','opp_ast':'int64','opp_stl':'int64','opp_blk':'int64','opp_tov':'int64'})
        
        df=df.sort_values(by=['date_game'],ascending=False)
        
        outdf=pd.DataFrame(df[['date_game','pts']])
        for num in [1,3,6,8]:
            rolsums=df.rolling(num).sum().add_prefix(f'last{num}sum_')
            rolsums['date_game']=df['date_game'].shift(num)
            out1=rolsums[num:]

            outdf=pd.merge(outdf,out1,on='date_game')
    return outdf

In [370]:
team1=game_data(team_abrv1,'2022')
team2=game_data(team_abrv2,'2022')

#### Upcoming Game Data (Prediction Data)
---

In [371]:
def recent_game_data(team_abrv,season):
    url=f'https://www.basketball-reference.com/teams/{team_abrv}/{season}/gamelog/'
    res=requests.get(url)
    if res.status_code != 200: 
        return f'Error {res.status_code}: please review the input! Try again.' 
    else:
        soup=BeautifulSoup(res.text)
        teams = []
        stats=['date_game','pts','fg','fga','fg_pct','fg3','fg3a','fg3_pct','ft','fta','ft_pct','trb','ast','stl','blk','tov']
        tbl=soup.find('table')
        trs=tbl.find('tbody').find_all('tr')
        for tr in trs:
            df_col={}
            for num in range(0,len(stats)):
                df_col[stats[num]]=tr.find(attrs={'data-stat':stats[num]})
            for num in range(1,len(stats)):
                df_col[f'opp_{stats[num]}']=tr.find(attrs={'data-stat':f'opp_{stats[num]}'})

            teams.append(df_col)

        df=pd.DataFrame(teams)    
        df.dropna(inplace=True)
        df=df.applymap(lambda x: x.text)
        df=df[df.date_game!='Date']
        df=df.astype({'date_game':'datetime64[ns]','pts':'int64','fg':'int64','fga':'int64','fg_pct':'float64','fg3':'int64','fg3a':'int64','fg3_pct':'float64',
        'ft':'int64','fta':'int64','ft_pct':'float64','trb':'int64','ast':'int64','stl':'int64','blk':'int64','tov':'int64',
        'opp_pts':'int64','opp_fg':'int64','opp_fga':'int64','opp_fg_pct':'float64','opp_fg3':'int64','opp_fg3a':'int64','opp_fg3_pct':'float64',
        'opp_ft':'int64','opp_fta':'int64','opp_ft_pct':'float64','opp_trb':'int64','opp_ast':'int64','opp_stl':'int64','opp_blk':'int64','opp_tov':'int64'})
        
        df=df.sort_values(by=['date_game'],ascending=True)
        
        outdf=pd.DataFrame(df[['date_game','pts']])
        for num in [1,3,6,8]:
            rolsums=df.rolling(num).sum().add_prefix(f'last{num}sum_')
            rolsums['date_game']=df['date_game']
            out1=rolsums[num:]

            outdf=pd.merge(outdf,out1,on='date_game')
    return outdf.sort_values(by=['date_game'],ascending=False).head(1)


In [372]:
team1recent=recent_game_data(team_abrv1,'2023')
team2recent=recent_game_data(team_abrv2,'2023')

### Pipeline of Models 
---

In [373]:
lr_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('lr',LinearRegression())
])

rl_alphas = np.logspace(-3,0,5, 100)
ridge_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('ridge',RidgeCV(alphas=rl_alphas,scoring='r2',cv=5))
])


enet_alphas = np.linspace(0.5, 1.0, 100)
enet_ratio = 0.5
en_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('en',ElasticNetCV(alphas=enet_alphas,l1_ratio=enet_ratio,cv=5,random_state=42))
])

knn_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('knn',KNeighborsRegressor())
])

md=4 
mss=5
dt_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('dt',DecisionTreeRegressor(max_depth=md,min_samples_split=mss,random_state=42))
])

rf_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('rf',RandomForestRegressor(random_state=42))
])

ada_dt=AdaBoostRegressor(base_estimator=dt_pipe,random_state=42)

gb_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('gb',GradientBoostingRegressor(random_state=42))
])

### All Model Scores Together
---

In [374]:
team1df=pd.DataFrame()
# for team in team1_2_data:
allvar1=team1.drop(columns=['date_game','pts'])
top10_corr1=team1[pd.DataFrame(abs(team1.corr()['pts']).sort_values(ascending=False)).iloc[1:,:].reset_index()['index'][:10]]
top15_corr1=team1[pd.DataFrame(abs(team1.corr()['pts']).sort_values(ascending=False)).iloc[1:,:].reset_index()['index'][:15]]
top20_corr1=team1[pd.DataFrame(abs(team1.corr()['pts']).sort_values(ascending=False)).iloc[1:,:].reset_index()['index'][:20]]
all_X=[allvar1,top10_corr1,top15_corr1,top20_corr1]
all_X_name=['allvar','top10_corr','top15_corr','top20_corr']
for x in all_X:
    X=x
    y=team1['pts']
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)
    #allvar,top10_corr,top15_corr,top20_corr
    pipes=[lr_pipe,ridge_pipe,en_pipe,knn_pipe,dt_pipe,rf_pipe,ada_dt,gb_pipe]
    pipes_name={0:'lr_pipe',1:'ridge_pipe',2:'en_pipe',3:'knn_pipe',4:'dt_pipe',5:'rf_pipe',6:'ada_dt',7:'gb_pipe'}
    results=[]
    for pipe in pipes:
        pipe_fitted=pipe.fit(X_train,y_train)
        pipe_train_score=pipe_fitted.score(X_train,y_train)
        pipe_test_score=pipe_fitted.score(X_val,y_val)
        pipe_fitted_cross_val_score=cross_val_score(pipe_fitted, X_train, y_train, cv=5, n_jobs=-1).mean()
        pipe_fitted_RMSE_train=mean_squared_error(y_train,pipe_fitted.predict(X_train),squared=False)
        pipe_fitted_RMSE_val=mean_squared_error(y_val,pipe_fitted.predict(X_val),squared=False)
        
        results.append([pipe_train_score,pipe_test_score,pipe_fitted_cross_val_score,pipe_fitted_RMSE_train,pipe_fitted_RMSE_val])
        dfres = pd.DataFrame(results)
        team1df=team1df.append(dfres)
team1df.drop_duplicates(inplace=True)
team1df.rename(columns={0:'train_score',1:'test_score',2:'cross_val_score',3:'RMSE_train',4:'RMSE_val'},inplace=True)
team1df['name_of_estimator_pipe']=team1df.index.to_series().map(pipes_name)
team1df['name_of_estimator_pipe'].iloc[:8]='all_var_'+team1df['name_of_estimator_pipe'].iloc[:8]
team1df['name_of_estimator_pipe'].iloc[8:16]='top10_corr_'+team1df['name_of_estimator_pipe'].iloc[8:16]
team1df['name_of_estimator_pipe'].iloc[16:24]='top15_corr_'+team1df['name_of_estimator_pipe'].iloc[16:24]
team1df['name_of_estimator_pipe'].iloc[24:32]='top20_corr_'+team1df['name_of_estimator_pipe'].iloc[24:32]
team1df.set_index('name_of_estimator_pipe',inplace=True)



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  iloc._setitem_with_indexer(indexer, value)


In [375]:
team2df=pd.DataFrame()
# for team in team1_2_data:
allvar2=team2.drop(columns=['date_game','pts'])
top10_corr2=team2[pd.DataFrame(abs(team2.corr()['pts']).sort_values(ascending=False)).iloc[1:,:].reset_index()['index'][:10]]
top15_corr2=team2[pd.DataFrame(abs(team2.corr()['pts']).sort_values(ascending=False)).iloc[1:,:].reset_index()['index'][:15]]
top20_corr2=team2[pd.DataFrame(abs(team2.corr()['pts']).sort_values(ascending=False)).iloc[1:,:].reset_index()['index'][:20]]
all_X2=[allvar2,top10_corr2,top15_corr2,top20_corr2]
all_X_name=['allvar','top10_corr','top15_corr','top20_corr']
for x in all_X:
    X=x
    y=team2['pts']
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)
    #allvar,top10_corr,top15_corr,top20_corr
    pipes=[lr_pipe,ridge_pipe,en_pipe,knn_pipe,dt_pipe,rf_pipe,ada_dt,gb_pipe]
    pipes_name={0:'lr_pipe',1:'ridge_pipe',2:'en_pipe',3:'knn_pipe',4:'dt_pipe',5:'rf_pipe',6:'ada_dt',7:'gb_pipe'}
    results=[]
    for pipe in pipes:
        pipe_fitted=pipe.fit(X_train,y_train)
        pipe_train_score=pipe_fitted.score(X_train,y_train)
        pipe_test_score=pipe_fitted.score(X_val,y_val)
        pipe_fitted_cross_val_score=cross_val_score(pipe_fitted, X_train, y_train, cv=5, n_jobs=-1).mean()
        pipe_fitted_RMSE_train=mean_squared_error(y_train,pipe_fitted.predict(X_train),squared=False)
        pipe_fitted_RMSE_val=mean_squared_error(y_val,pipe_fitted.predict(X_val),squared=False)
        
        results.append([pipe_train_score,pipe_test_score,pipe_fitted_cross_val_score,pipe_fitted_RMSE_train,pipe_fitted_RMSE_val])
        dfres = pd.DataFrame(results)
        team2df=team2df.append(dfres)
team2df.drop_duplicates(inplace=True)
team2df.rename(columns={0:'train_score',1:'test_score',2:'cross_val_score',3:'RMSE_train',4:'RMSE_val'},inplace=True)
team2df['name_of_estimator_pipe']=team2df.index.to_series().map(pipes_name)
team2df['name_of_estimator_pipe'].iloc[:8]='all_var_'+team2df['name_of_estimator_pipe'].iloc[:8]
team2df['name_of_estimator_pipe'].iloc[8:16]='top10_corr_'+team2df['name_of_estimator_pipe'].iloc[8:16]
team2df['name_of_estimator_pipe'].iloc[16:24]='top15_corr_'+team2df['name_of_estimator_pipe'].iloc[16:24]
team2df['name_of_estimator_pipe'].iloc[24:32]='top20_corr_'+team2df['name_of_estimator_pipe'].iloc[24:32]
team2df.set_index('name_of_estimator_pipe',inplace=True)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  iloc._setitem_with_indexer(indexer, value)


### Top Test Score Model
---

In [376]:
team1df.loc[team1df['test_score'].idxmax()].name

'top10_corr_ada_dt'

In [377]:
team2df.loc[team2df['test_score'].idxmax()].name

'all_var_en_pipe'

### Top Model Predictions
---

In [378]:
X=top10_corr1
y=team1['pts']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

team1_model=ada_dt.fit(X_train,y_train)

X=team1recent[top10_corr1.columns]
y_pred1=team1_model.predict(X)
y_pred1

array([110.])

In [379]:
X=top10_corr2
y=team2['pts']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

team2_model=ridge_pipe.fit(X_train,y_train)

X=team2recent[top10_corr2.columns]
y_pred2=team2_model.predict(X)
y_pred2

array([117.14197592])

### Total Score Prediction 
---

In [380]:
print(f'Home team-{team_abrv1}:{y_pred1[0]},Away Team-{team_abrv2}:{y_pred2[0]}')
print(f'Predicted Total:{(y_pred1+y_pred2)[0]}')
print(f'Over/Under Line:{total}')
if total>(y_pred1+y_pred2)[0]:
    print('Bet Under!')
else:
    print('Bet Over!')

Home team-ORL:110.0,Away Team-ATL:117.14197591745466
Predicted Total:227.14197591745466
Over/Under Line:245.5
Bet Under!


---