<img src="https://imgur.com/3Ua9VYU.png" style="float: left; margin: 18px; height: 75px"> 

# Modeling & Accuracy of Models
---

## NBA Game Total Score Prediction
---
## Problem Statement
Sports are unpredictable, so there is never a sure-fire winning sports bet.
The goal of this project is to build a model that returns the expected total score of upcoming NBA matchups and compares it with the Over/Under lines from different sportsbooks, recommending whether the total will fall over or under the line and by how much. This gives bettors some analysis and an edge when betting with sportsbooks. To estimate the expected total of a game, we fit machine learning models on previous NBA game data and choose the model with the best testing score.

---
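
The comparison described in the problem statement can be sketched as follows. This is a minimal illustration, not the project's final code: the function name `recommend_total`, the predicted point values, and the sportsbook line are all hypothetical.

```python
def recommend_total(pred_home_pts, pred_away_pts, book_line):
    """Compare a predicted game total with a sportsbook Over/Under line.

    Returns the recommended side and the size of the gap in points.
    """
    predicted_total = pred_home_pts + pred_away_pts
    margin = predicted_total - book_line
    if margin > 0:
        side = "over"
    elif margin < 0:
        side = "under"
    else:
        side = "push"  # predicted total equals the line exactly
    return side, abs(margin)


# Hypothetical inputs: one predicted score per team and an illustrative line.
side, margin = recommend_total(108.5, 111.0, 221.5)
print(f"{side} by {margin:.1f} points")
```

The home and away predictions would come from the Team 1 and Team 2 models fitted below; the line would come from whichever sportsbook is being compared.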

### Importing Libraries & Data
---

In [112]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import PolynomialFeatures,StandardScaler,MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression,RidgeCV,LassoCV,ElasticNetCV
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor,plot_tree
from sklearn.ensemble import RandomForestRegressor,AdaBoostRegressor,GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

In [113]:
team1=pd.read_csv('../data/team1data.csv')
team2=pd.read_csv('../data/team2data.csv')

---
### Modeling
---

### Team 1 (Home Team)

In [114]:
#Train-Test-Split

In [115]:
X=team1.drop(columns=['date_game','pts'])
y=team1['pts']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

In [116]:
# Baseline: mean points scored (average score)

In [117]:
y.mean()

104.27027027027027
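
To make the baseline comparable with the RMSE values reported for each model, one option is to compute the RMSE of a predictor that always returns the mean. The sketch below assumes the mean is taken from the training split only (the cell above uses all of `y`); the helper name `baseline_rmse` and the numbers are hypothetical, not the project's data.

```python
import math


def baseline_rmse(y_train, y_val):
    """RMSE on y_val when every prediction is the mean of y_train."""
    mean_pred = sum(y_train) / len(y_train)
    squared_errors = [(y - mean_pred) ** 2 for y in y_val]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up point totals, for illustration only.
y_train_demo = [100, 104, 108, 112]
y_val_demo = [101, 111]
print(baseline_rmse(y_train_demo, y_val_demo))
```

A model is only useful for this task if its validation RMSE is lower than this baseline value.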

#### Linear Regression Pipeline
---

In [118]:
lr_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('lr',LinearRegression())
])

##### Linear Regression Scores

In [119]:
pipe_name=lr_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 1.0
val score: -0.8064376083819718
cross validate score: -1.3421238940206202
RMSE train: 3.0736098171946326e-14
RMSE val: 15.46883123936881
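
The `score` values printed above are R² (the coefficient of determination, which is what `score` returns for scikit-learn regressors), so a negative validation score means the model did worse on the held-out games than always predicting their mean. A minimal pure-Python sketch of that definition, using made-up numbers rather than the project's data:

```python
def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed by hand."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical values chosen so the arithmetic is easy to follow.
y_true_demo = [100, 110, 120]
mean_only = [110, 110, 110]   # always predict the mean
reversed_preds = [120, 110, 100]  # predictions that move the wrong way

print(r2_manual(y_true_demo, mean_only))
print(r2_manual(y_true_demo, reversed_preds))
```

The mean-only predictor scores exactly zero, and predictions that are further from the truth than the mean push the score below zero.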


#### Ridge Regression Pipeline
---

In [120]:
rl_alphas = np.logspace(-3, 0, 5)  # 5 candidate alphas, log-spaced from 1e-3 to 1
ridge_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('ridge',RidgeCV(alphas=rl_alphas,scoring='r2',cv=5))
])

##### Ridge Regression Scores

In [121]:
pipe_name=ridge_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.9991339625765945
val score: -0.7266327880130223
cross validate score: -1.3078580798271309
RMSE train: 0.3158303376506942
RMSE val: 15.123280598532718


#### ElasticNet Regression Pipeline
---

In [122]:
enet_alphas = np.linspace(0.5, 1.0, 100)
enet_ratio = 0.5
en_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('en',ElasticNetCV(alphas=enet_alphas,l1_ratio=enet_ratio,cv=5,random_state=42))
])

##### ElasticNet Regression Scores

In [123]:
pipe_name=en_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.07613958147651845
val score: -0.036364738413293685
cross validate score: -0.24638805330778793
RMSE train: 10.315462280901228
RMSE val: 11.716619661104348


#### K-Nearest Neighbor Regression Pipeline
---

In [124]:
knn_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('knn',KNeighborsRegressor())
])

##### K-Nearest Neighbor Scores

In [125]:
pipe_name=knn_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.19059091725362698
val score: -0.3151120654945643
cross validate score: -0.3017934366387178
RMSE train: 9.655384025400858
RMSE val: 13.198585782827896


#### Decision Tree Regression Pipeline
---

In [126]:
md=4 
mss=5
dt_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('dt',DecisionTreeRegressor(max_depth=md,min_samples_split=mss,random_state=42))
])

##### Decision Tree Regression Scores

In [127]:
pipe_name=dt_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.9321787612225487
val score: -1.0081864781881076
cross validate score: -0.84796496793176
RMSE train: 2.7949135166904275
RMSE val: 16.309777544081186


#### Random Forest Regression Pipeline
---

In [128]:
rf_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('rf',RandomForestRegressor(random_state=42))
])

##### Random Forest Regression Scores


In [129]:
pipe_name=rf_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.8615636994433027
val score: 0.015505871695075002
cross validate score: -0.1688541967287916
RMSE train: 3.993102952146448
RMSE val: 11.419644477828545


#### Ada Boost Regression Pipeline
---

In [130]:
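# Estimator passed positionally: the keyword is `base_estimator` before scikit-learn 1.2 and `estimator` from 1.2 onward
ada_dt=AdaBoostRegressor(dt_pipe,random_state=42)

##### Ada Boost Regression Scores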

In [131]:
boost_model=ada_dt.fit(X_train,y_train)
print(f'train score: {boost_model.score(X_train,y_train)}')
print(f'val score: {boost_model.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(boost_model, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,boost_model.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,boost_model.predict(X_val),squared=False)}')


train score: 0.9913659338535946
val score: 0.02090739809197184
cross validate score: -0.4859929029471031
RMSE train: 0.9972250991835281
RMSE val: 11.38827387315896


#### Gradient Boosting Regression Pipeline
---

In [132]:
gb_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('gb',GradientBoostingRegressor(random_state=42))
])

##### Gradient Boosting Regression Scores

In [133]:
pipe_name=gb_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.9999992888237811
val score: -0.3226444325601139
cross validate score: -0.2755707543807797
RMSE train: 0.009050532409340336
RMSE val: 13.23632957986325


#### All Model Scores
---

In [134]:
pipes=[lr_pipe,ridge_pipe,en_pipe,knn_pipe,dt_pipe,rf_pipe,ada_dt,gb_pipe]
pipes_name=['lr_pipe','ridge_pipe','en_pipe','knn_pipe','dt_pipe','rf_pipe','ada_dt','gb_pipe']
results=[]
for pipe in pipes:
    pipe_fitted=pipe.fit(X_train,y_train)
    pipe_train_score=pipe_fitted.score(X_train,y_train)
    pipe_test_score=pipe_fitted.score(X_val,y_val)
    pipe_fitted_cross_val_score=cross_val_score(pipe_fitted, X_train, y_train, cv=5, n_jobs=-1).mean()
    pipe_fitted_RMSE_train=mean_squared_error(y_train,pipe_fitted.predict(X_train),squared=False)
    pipe_fitted_RMSE_val=mean_squared_error(y_val,pipe_fitted.predict(X_val),squared=False)
    results.append((pipe_train_score,pipe_test_score,pipe_fitted_cross_val_score,pipe_fitted_RMSE_train,pipe_fitted_RMSE_val))
df = pd.DataFrame(results)
df.insert(0,'name_of_estimator_pipe',pipes_name)
df.rename(columns={0:'train_score',1:'test_score',2:'cross_val_score',3:'RMSE_train',4:'RMSE_val'},inplace=True)
df.to_csv('../data/team1_all_var_models.csv',index=False)
df  # last expression in the cell, so the table is displayed
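
Since the stated goal is to choose the model with the best testing score, one way to read the results is to rank the estimators by validation RMSE. The sketch below uses the Team 1 `RMSE val` values printed earlier in this notebook; the dictionary and variable names are introduced here for illustration only.

```python
# Validation RMSE values copied from the Team 1 outputs above.
val_rmse = {
    'lr_pipe': 15.46883123936881,
    'ridge_pipe': 15.123280598532718,
    'en_pipe': 11.716619661104348,
    'knn_pipe': 13.198585782827896,
    'dt_pipe': 16.309777544081186,
    'rf_pipe': 11.419644477828545,
    'ada_dt': 11.38827387315896,
    'gb_pipe': 13.23632957986325,
}

# Sort ascending: a lower RMSE means a smaller typical error in points.
ranked = sorted(val_rmse.items(), key=lambda item: item[1])
for name, rmse in ranked:
    print(f'{name:<12}{rmse:.3f}')

best_name, best_rmse = ranked[0]
```

The same ranking could be produced from the saved table by sorting on its `RMSE_val` column.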

---

### Team 2 (Away Team)

In [135]:
#Train-Test-Split

In [136]:
X=team2.drop(columns=['date_game','pts'])
y=team2['pts']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

In [137]:
# Baseline: mean points scored (average score)

In [138]:
y.mean()

114.5945945945946

#### Linear Regression Pipeline
---

In [139]:
lr_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('lr',LinearRegression())
])

##### Linear Regression Scores

In [140]:
pipe_name=lr_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 1.0
val score: 0.24796980326061568
cross validate score: -2.2435360853721185
RMSE train: 5.330074015153098e-14
RMSE val: 9.112858380256315


#### Ridge Regression Pipeline
---

In [141]:
rl_alphas = np.logspace(-3, 0, 5)  # 5 candidate alphas, log-spaced from 1e-3 to 1
ridge_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('ridge',RidgeCV(alphas=rl_alphas,scoring='r2',cv=5))
])

##### Ridge Regression Scores

In [142]:
pipe_name=ridge_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.99822805885722
val score: 0.2560531786790028
cross validate score: -2.1265994262354213
RMSE train: 0.5264481742287678
RMSE val: 9.06375019822038


#### ElasticNet Regression Pipeline
---

In [143]:
enet_alphas = np.linspace(0.5, 1.0, 100)
enet_ratio = 0.5
en_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('en',ElasticNetCV(alphas=enet_alphas,l1_ratio=enet_ratio,cv=5,random_state=42))
])

##### ElasticNet Regression Scores

In [144]:
pipe_name=en_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.09681512362551525
val score: -0.08549652050797496
cross validate score: -0.7639143136658403
RMSE train: 11.885550624303283
RMSE val: 10.948413695050103


#### K-Nearest Neighbor Regression Pipeline
---

In [145]:
knn_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('knn',KNeighborsRegressor())
])

##### K-Nearest Neighbor Scores

In [146]:
pipe_name=knn_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.20799603276641065
val score: -0.29562907510263203
cross validate score: -0.7127072800065423
RMSE train: 11.129986979753909
RMSE val: 11.961270835492355


#### Decision Tree Regression Pipeline
---

In [147]:
md=4 
mss=5
dt_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('dt',DecisionTreeRegressor(max_depth=md,min_samples_split=mss,random_state=42))
])

##### Decision Tree Regression Scores

In [148]:
pipe_name=dt_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.8495307679255811
val score: -1.3675765574441967
cross validate score: -2.0374708107273554
RMSE train: 4.851263117347572
RMSE val: 16.169217275944575


#### Random Forest Regression Pipeline
---

In [149]:
rf_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('rf',RandomForestRegressor(random_state=42))
])

##### Random Forest Regression Scores


In [150]:
pipe_name=rf_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.845601911067847
val score: -0.1288836633663364
cross validate score: -0.7267325694131873
RMSE train: 4.914189934451498
RMSE val: 11.165073219643478


#### Ada Boost Regression Pipeline
---

In [151]:
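# Estimator passed positionally: the keyword is `base_estimator` before scikit-learn 1.2 and `estimator` from 1.2 onward
ada_dt=AdaBoostRegressor(dt_pipe,random_state=42)

##### Ada Boost Regression Scores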

In [152]:
boost_model=ada_dt.fit(X_train,y_train)
print(f'train score: {boost_model.score(X_train,y_train)}')
print(f'val score: {boost_model.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(boost_model, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,boost_model.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,boost_model.predict(X_val),squared=False)}')


train score: 0.9892929015365628
val score: -0.10945295357364304
cross validate score: -1.156373101705609
RMSE train: 1.2940971358936406
RMSE val: 11.068567725167764


#### Gradient Boosting Regression Pipeline
---

In [153]:
gb_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('gb',GradientBoostingRegressor(random_state=42))
])

##### Gradient Boosting Regression Scores

In [154]:
pipe_name=gb_pipe.fit(X_train,y_train)
print(f'train score: {pipe_name.score(X_train,y_train)}')
print(f'val score: {pipe_name.score(X_val,y_val)}')
print(f'cross validate score: {cross_val_score(pipe_name, X_train, y_train, cv=5, n_jobs=-1).mean()}')
print(f'RMSE train: {mean_squared_error(y_train,pipe_name.predict(X_train),squared=False)}')
print(f'RMSE val: {mean_squared_error(y_val,pipe_name.predict(X_val),squared=False)}')


train score: 0.9999985692053074
val score: -0.3997590959284194
cross validate score: -0.8906265310077555
RMSE train: 0.014959588178529123
RMSE val: 12.432647791187614


#### All Model Scores
---

In [155]:
pipes=[lr_pipe,ridge_pipe,en_pipe,knn_pipe,dt_pipe,rf_pipe,ada_dt,gb_pipe]
pipes_name=['lr_pipe','ridge_pipe','en_pipe','knn_pipe','dt_pipe','rf_pipe','ada_dt','gb_pipe']
results=[]
for pipe in pipes:
    pipe_fitted=pipe.fit(X_train,y_train)
    pipe_train_score=pipe_fitted.score(X_train,y_train)
    pipe_test_score=pipe_fitted.score(X_val,y_val)
    pipe_fitted_cross_val_score=cross_val_score(pipe_fitted, X_train, y_train, cv=5, n_jobs=-1).mean()
    pipe_fitted_RMSE_train=mean_squared_error(y_train,pipe_fitted.predict(X_train),squared=False)
    pipe_fitted_RMSE_val=mean_squared_error(y_val,pipe_fitted.predict(X_val),squared=False)
    results.append((pipe_train_score,pipe_test_score,pipe_fitted_cross_val_score,pipe_fitted_RMSE_train,pipe_fitted_RMSE_val))
df = pd.DataFrame(results)
df.insert(0,'name_of_estimator_pipe',pipes_name)
df.rename(columns={0:'train_score',1:'test_score',2:'cross_val_score',3:'RMSE_train',4:'RMSE_val'},inplace=True)
df.to_csv('../data/team2_all_var_models.csv',index=False)
df  # last expression in the cell, so the table is displayed

---