<img src="https://imgur.com/3Ua9VYU.png" style="float: left; margin: 18px; height: 75px"> 

# Variable Testing
---

## NBA Game Total Score Prediction
---
## Problem Statement
Sports are unpredictable, so there is never a sure-fire winning sports bet.
The goal of this project is to build a model that predicts the total score of upcoming NBA matchups and compares that prediction with the Over/Under lines offered by different sportsbooks, recommending whether the total will fall over or under the line and by how much. This gives bettors some analysis and an upper hand when betting with sportsbooks. To produce the expected total for a game, we train machine learning models on previous NBA game data and choose the model with the best testing score.
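
The comparison between a predicted total and a sportsbook line can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `recommend_over_under`, its parameters, and the example numbers are all hypothetical.

```python
def recommend_over_under(pred_home_pts, pred_away_pts, sportsbook_line):
    """Compare a predicted game total with a sportsbook Over/Under line.

    All names and inputs are hypothetical placeholders.
    Returns the predicted total, the recommended side, and the margin.
    """
    predicted_total = pred_home_pts + pred_away_pts
    margin = predicted_total - sportsbook_line
    if margin > 0:
        side = "over"
    elif margin < 0:
        side = "under"
    else:
        side = "push"
    return predicted_total, side, abs(margin)


# Made-up example: predicted 112 + 108 = 220 points against a line of 215.5.
total, side, margin = recommend_over_under(112.0, 108.0, 215.5)
print(total, side, margin)
```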
---

### Importing Libraries & Data
---

In [87]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler,PolynomialFeatures,MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression,RidgeCV,LassoCV,ElasticNetCV
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor,plot_tree
from sklearn.ensemble import RandomForestRegressor,AdaBoostRegressor,GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

In [88]:
team1=pd.read_csv('../data/team1data.csv')
team2=pd.read_csv('../data/team2data.csv')

In [89]:
team1_all_var_models=pd.read_csv('../data/team1_all_var_models.csv')
team2_all_var_models=pd.read_csv('../data/team2_all_var_models.csv')

### Team 1 (Home Team)
---

### Correlation of Variables with the Target Variable (`pts`)
---

In [90]:
corr_sort=pd.DataFrame(abs(team1.corr()['pts']).sort_values(ascending=False))
corr_sort

index,pts
pts,1.000000e+00
last1sum_opp_fg_pct,2.847916e-01
last6sum_fg3,2.812545e-01
last3sum_fga,2.673379e-01
last6sum_fg3a,2.572938e-01
...,...
last8sum_opp_fta,9.389361e-03
last6sum_opp_fga,3.246802e-03
last3sum_blk,2.420858e-03
last6sum_fg,2.404411e-03


#### Creating Top 10, 15, and 20 Correlated Variables

In [91]:
# Rank features once by absolute correlation with `pts`; position 0 is `pts` itself (correlation 1.0), so skip it.
ranked_features=abs(team1.corr()['pts']).sort_values(ascending=False).index[1:]
top10_corr=ranked_features[:10]
top15_corr=ranked_features[:15]
top20_corr=ranked_features[:20]
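
The selection step ranks every column by the absolute value of its Pearson correlation with `pts` and keeps the first k. The sketch below shows the same idea on synthetic data; the helper `top_k_correlated` and the column names are made up for illustration. Taking the absolute value matters because a strongly negative correlation is as informative as a positive one.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the team data; the column names are made up.
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "feat_strong": rng.normal(size=n),
    "feat_negative": rng.normal(size=n),
    "feat_noise": rng.normal(size=n),
})
toy["pts"] = 3.0 * toy["feat_strong"] - 1.5 * toy["feat_negative"] + rng.normal(size=n)


def top_k_correlated(df, target, k):
    """Return the k column names with the largest |Pearson r| against `target`."""
    ranked = df.corr()[target].abs().drop(target).sort_values(ascending=False)
    return list(ranked.index[:k])


top2 = top_k_correlated(toy, "pts", 2)
print(top2)
```

Dropping the target by name, rather than by position, avoids relying on `pts` sorting first.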

### Pipeline of Models 
---

In [92]:
lr_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('lr',LinearRegression())
])

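# Five log-spaced alphas from 1e-3 to 1. The original call passed a fourth positional value (100), which np.logspace reads as `endpoint`, not as the number of points.
rl_alphas = np.logspace(-3,0,num=5)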
ridge_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('ridge',RidgeCV(alphas=rl_alphas,scoring='r2',cv=5))
])


enet_alphas = np.linspace(0.5, 1.0, 100)
enet_ratio = 0.5
en_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('en',ElasticNetCV(alphas=enet_alphas,l1_ratio=enet_ratio,cv=5,random_state=42))
])

knn_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('knn',KNeighborsRegressor())
])

md=4 
mss=5
dt_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('dt',DecisionTreeRegressor(max_depth=md,min_samples_split=mss,random_state=42))
])

rf_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('rf',RandomForestRegressor(random_state=42))
])

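# The estimator is passed positionally because the keyword was renamed
# from `base_estimator` to `estimator` in scikit-learn 1.2.
ada_dt=AdaBoostRegressor(dt_pipe,random_state=42)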

gb_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('gb',GradientBoostingRegressor(random_state=42))
])
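
A note on the ridge alpha grid: `np.logspace` takes `num` as its third positional argument and `endpoint` as its fourth, so a call with four positional values such as `np.logspace(-3, 0, 5, 100)` produces five candidates, not one hundred. The sketch below checks this; whether a 100-point grid was intended is an assumption, not something the notebook states.

```python
import numpy as np

# Signature: np.logspace(start, stop, num=50, endpoint=True, base=10.0, ...)
# With four positional arguments, 5 is read as `num` and 100 as `endpoint`.
alphas_positional = np.logspace(-3, 0, 5, 100)
print(len(alphas_positional))

# If a 100-point grid is wanted (an assumption), name the argument explicitly.
alphas_100 = np.logspace(-3, 0, num=100)
print(len(alphas_100), alphas_100[0], alphas_100[-1])
```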

### Models with Top 10 Correlated Variables
---

In [93]:
X=team1[top10_corr]
y=team1['pts']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

In [94]:
pipes=[lr_pipe,ridge_pipe,en_pipe,knn_pipe,dt_pipe,rf_pipe,ada_dt,gb_pipe]
pipes_name=['lr_pipe','ridge_pipe','en_pipe','knn_pipe','dt_pipe','rf_pipe','ada_dt','gb_pipe']
results=[]
for pipe in pipes:
    pipe_fitted=pipe.fit(X_train,y_train)
    pipe_train_score=pipe_fitted.score(X_train,y_train)
    pipe_test_score=pipe_fitted.score(X_val,y_val)
    pipe_fitted_cross_val_score=cross_val_score(pipe_fitted, X_train, y_train, cv=5, n_jobs=-1).mean()
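    # RMSE as the square root of MSE; this avoids the `squared` argument, which was deprecated in scikit-learn 1.4.
    pipe_fitted_RMSE_train=np.sqrt(mean_squared_error(y_train,pipe_fitted.predict(X_train)))
    pipe_fitted_RMSE_val=np.sqrt(mean_squared_error(y_val,pipe_fitted.predict(X_val)))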
    results.append((pipe_train_score,pipe_test_score,pipe_fitted_cross_val_score,pipe_fitted_RMSE_train,pipe_fitted_RMSE_val))
df10 = pd.DataFrame(results)
df10.insert(0,'name_of_estimator_pipe',pipes_name)
df10.rename(columns={0:'train_score',1:'test_score',2:'cross_val_score',3:'RMSE_train',4:'RMSE_val'},inplace=True)
df10['name_of_estimator_pipe']='top10corr_'+df10['name_of_estimator_pipe']
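
The same fit-and-score loop is repeated for each feature set, so it could be wrapped in a helper. The sketch below is one possible shape, run on synthetic data with two small pipelines; `evaluate_pipes`, its `prefix` parameter, and `toy_pipes` are hypothetical names, not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeRegressor


def evaluate_pipes(named_pipes, X_train, X_val, y_train, y_val, prefix=""):
    """Fit each estimator and collect R^2, mean 5-fold CV R^2, and RMSE."""
    rows = []
    for name, pipe in named_pipes.items():
        pipe.fit(X_train, y_train)
        rows.append({
            "name_of_estimator_pipe": prefix + name,
            "train_score": pipe.score(X_train, y_train),
            "test_score": pipe.score(X_val, y_val),
            # cross_val_score clones the estimator, so refitting here is safe.
            "cross_val_score": cross_val_score(pipe, X_train, y_train, cv=5).mean(),
            "RMSE_train": np.sqrt(mean_squared_error(y_train, pipe.predict(X_train))),
            "RMSE_val": np.sqrt(mean_squared_error(y_val, pipe.predict(X_val))),
        })
    return pd.DataFrame(rows)


# Synthetic data stands in for the team data.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

toy_pipes = {
    "lr_pipe": Pipeline([("mms", MinMaxScaler()), ("lr", LinearRegression())]),
    "dt_pipe": Pipeline([("mms", MinMaxScaler()),
                         ("dt", DecisionTreeRegressor(max_depth=4, random_state=42))]),
}
scores = evaluate_pipes(toy_pipes, X_train, X_val, y_train, y_val, prefix="top10corr_")
print(scores)
```

Keying the pipelines by name in a dictionary keeps each label attached to its estimator, instead of relying on two parallel lists staying in the same order.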

#### Top 10 Correlated Variable Model Scores
---

In [95]:
df10

index,name_of_estimator_pipe,train_score,test_score,cross_val_score,RMSE_train,RMSE_val
0,top10corr_lr_pipe,1.0,-15.138099,-8.302712,4.550182e-13,46.235143
1,top10corr_ridge_pipe,0.416339,0.024444,-0.16825,8.19909,11.367685
2,top10corr_en_pipe,0.168649,0.026052,-0.102165,9.785379,11.358317
3,top10corr_knn_pipe,0.307421,0.051141,-0.172896,8.931415,11.211066
4,top10corr_dt_pipe,0.850034,-0.211586,-0.771346,4.156066,12.66844
5,top10corr_rf_pipe,0.858085,-0.10043,-0.149719,4.042964,12.073333
6,top10corr_ada_dt,0.957208,0.345269,-0.292413,2.220069,9.312737
7,top10corr_gb_pipe,0.99935,-0.216694,-0.32909,0.2736336,12.695118
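
The `lr_pipe` row reports a train R² of 1.0 and a train RMSE near machine precision, alongside a strongly negative validation R². One possible explanation, offered as an assumption because the number of training rows is not shown here, is that the degree-2 polynomial expansion produces at least as many columns as there are training rows, which lets unregularized least squares fit the training data exactly. The sketch below counts the expanded columns for 10, 15, and 20 input variables; `n_poly_features` is a hypothetical helper name.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_inputs):
    """Count columns from PolynomialFeatures(include_bias=False) at its default degree of 2."""
    pf = PolynomialFeatures(include_bias=False)
    pf.fit(np.zeros((1, n_inputs)))
    return pf.n_output_features_


counts = {k: n_poly_features(k) for k in (10, 15, 20)}
print(counts)  # {10: 65, 15: 135, 20: 230}
```

For n inputs the count is n + n(n+1)/2: the original columns plus every square and pairwise product.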


### Models with Top 15 Correlated Variables
---

In [96]:
X=team1[top15_corr]
y=team1['pts']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

In [97]:
pipes=[lr_pipe,ridge_pipe,en_pipe,knn_pipe,dt_pipe,rf_pipe,ada_dt,gb_pipe]
pipes_name=['lr_pipe','ridge_pipe','en_pipe','knn_pipe','dt_pipe','rf_pipe','ada_dt','gb_pipe']
results=[]
for pipe in pipes:
    pipe_fitted=pipe.fit(X_train,y_train)
    pipe_train_score=pipe_fitted.score(X_train,y_train)
    pipe_test_score=pipe_fitted.score(X_val,y_val)
    pipe_fitted_cross_val_score=cross_val_score(pipe_fitted, X_train, y_train, cv=5, n_jobs=-1).mean()
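    # RMSE as the square root of MSE; this avoids the `squared` argument, which was deprecated in scikit-learn 1.4.
    pipe_fitted_RMSE_train=np.sqrt(mean_squared_error(y_train,pipe_fitted.predict(X_train)))
    pipe_fitted_RMSE_val=np.sqrt(mean_squared_error(y_val,pipe_fitted.predict(X_val)))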
    results.append((pipe_train_score,pipe_test_score,pipe_fitted_cross_val_score,pipe_fitted_RMSE_train,pipe_fitted_RMSE_val))
df15 = pd.DataFrame(results)
df15.insert(0,'name_of_estimator_pipe',pipes_name)
df15.rename(columns={0:'train_score',1:'test_score',2:'cross_val_score',3:'RMSE_train',4:'RMSE_val'},inplace=True)
df15['name_of_estimator_pipe']='top15corr_'+df15['name_of_estimator_pipe']

#### Top 15 Correlated Variable Model Scores
---

In [98]:
df15

index,name_of_estimator_pipe,train_score,test_score,cross_val_score,RMSE_train,RMSE_val
0,top15corr_lr_pipe,1.0,-2.733784,-3.333525,7.580879e-14,22.239276
1,top15corr_ridge_pipe,0.512509,0.172579,-0.452955,7.493227,10.4691
2,top15corr_en_pipe,0.190434,0.04229,-0.142929,9.656322,11.263229
3,top15corr_knn_pipe,0.226538,0.003328,-0.316219,9.438543,11.490054
4,top15corr_dt_pipe,0.558275,-1.261935,-0.458893,7.132827,17.309561
5,top15corr_rf_pipe,0.849454,-0.036298,-0.222695,4.164088,11.716245
6,top15corr_ada_dt,0.965939,-0.159799,-0.45729,1.980675,12.394737
7,top15corr_gb_pipe,0.999929,-0.153288,-0.317337,0.09042283,12.359898


### Models with Top 20 Correlated Variables
---

In [99]:
X=team1[top20_corr]
y=team1['pts']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

In [100]:
pipes=[lr_pipe,ridge_pipe,en_pipe,knn_pipe,dt_pipe,rf_pipe,ada_dt,gb_pipe]
pipes_name=['lr_pipe','ridge_pipe','en_pipe','knn_pipe','dt_pipe','rf_pipe','ada_dt','gb_pipe']
results=[]
for pipe in pipes:
    pipe_fitted=pipe.fit(X_train,y_train)
    pipe_train_score=pipe_fitted.score(X_train,y_train)
    pipe_test_score=pipe_fitted.score(X_val,y_val)
    pipe_fitted_cross_val_score=cross_val_score(pipe_fitted, X_train, y_train, cv=5, n_jobs=-1).mean()
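    # RMSE as the square root of MSE; this avoids the `squared` argument, which was deprecated in scikit-learn 1.4.
    pipe_fitted_RMSE_train=np.sqrt(mean_squared_error(y_train,pipe_fitted.predict(X_train)))
    pipe_fitted_RMSE_val=np.sqrt(mean_squared_error(y_val,pipe_fitted.predict(X_val)))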
    results.append((pipe_train_score,pipe_test_score,pipe_fitted_cross_val_score,pipe_fitted_RMSE_train,pipe_fitted_RMSE_val))
df20 = pd.DataFrame(results)
df20.insert(0,'name_of_estimator_pipe',pipes_name)
df20.rename(columns={0:'train_score',1:'test_score',2:'cross_val_score',3:'RMSE_train',4:'RMSE_val'},inplace=True)
df20['name_of_estimator_pipe']='top20corr_'+df20['name_of_estimator_pipe']

#### Top 20 Correlated Variable Model Scores
---

In [101]:
df20

index,name_of_estimator_pipe,train_score,test_score,cross_val_score,RMSE_train,RMSE_val
0,top20corr_lr_pipe,1.0,-2.206722,-1.995515,5.200053e-14,20.609937
1,top20corr_ridge_pipe,0.623089,0.182018,-0.65646,6.58878,10.40921
2,top20corr_en_pipe,0.205495,0.037924,-0.154389,9.566076,11.288874
3,top20corr_knn_pipe,0.175899,-0.000416,-0.462926,9.74262,11.511617
4,top20corr_dt_pipe,0.795969,-0.705502,-0.766049,4.847678,15.030456
5,top20corr_rf_pipe,0.8471,-0.011334,-0.166959,4.196521,11.574262
6,top20corr_ada_dt,0.976889,-0.040825,-0.244059,1.631521,11.741808
7,top20corr_gb_pipe,0.99996,-0.123752,-0.21703,0.06757944,12.200601


### All Model Scores Together
---

In [102]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
team1_all_together=pd.concat([team1_all_var_models,df10,df15,df20])
team1_all_together.set_index('name_of_estimator_pipe',inplace=True)
team1_all_together

name_of_estimator_pipe,train_score,test_score,cross_val_score,RMSE_train,RMSE_val
lr_pipe,1.0,-0.806438,-1.342124,3.07361e-14,15.468831
ridge_pipe,0.999134,-0.726633,-1.307858,0.3158303,15.123281
en_pipe,0.07614,-0.036365,-0.246388,10.31546,11.71662
knn_pipe,0.190591,-0.315112,-0.301793,9.655384,13.198586
dt_pipe,0.932179,-1.008186,-0.847965,2.794914,16.309778
rf_pipe,0.861564,0.015506,-0.168854,3.993103,11.419644
ada_dt,0.991366,0.020907,-0.485993,0.9972251,11.388274
gb_pipe,0.999999,-0.322644,-0.275571,0.009050532,13.23633
top10corr_lr_pipe,1.0,-15.138099,-8.302712,4.550182e-13,46.235143
top10corr_ridge_pipe,0.416339,0.024444,-0.16825,8.19909,11.367685


#### Top Test Score Model
---

In [103]:
team1_all_together.loc[team1_all_together['test_score'].idxmax()]

train_score        0.957208
test_score         0.345269
cross_val_score   -0.292413
RMSE_train         2.220069
RMSE_val           9.312737
Name: top10corr_ada_dt, dtype: float64

#### Top Cross Validation Score Model
---

In [104]:
team1_all_together.loc[team1_all_together['cross_val_score'].idxmax()]

train_score         0.168649
test_score          0.026052
cross_val_score    -0.102165
RMSE_train          9.785379
RMSE_val           11.358317
Name: top10corr_en_pipe, dtype: float64

#### Lowest Validation RMSE Model
---

In [105]:
team1_all_together.loc[team1_all_together['RMSE_val'].idxmin()]

train_score        0.957208
test_score         0.345269
cross_val_score   -0.292413
RMSE_train         2.220069
RMSE_val           9.312737
Name: top10corr_ada_dt, dtype: float64
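
The three lookups use `idxmax` for the R²-based scores, where higher is better, and `idxmin` for RMSE, where lower is better. Because `test_score` (R²) and `RMSE_val` are computed on the same validation set, they are monotonically related and will always pick the same model; the cross-validation score is the only independent criterion, and here it picks a different one. The sketch below reproduces the selection on three rows copied from the Team 1 tables.

```python
import pandas as pd

# Three rows copied from the Team 1 score tables.
scores = pd.DataFrame(
    {
        "test_score": [0.345269, 0.026052, 0.024444],
        "cross_val_score": [-0.292413, -0.102165, -0.16825],
        "RMSE_val": [9.312737, 11.358317, 11.367685],
    },
    index=["top10corr_ada_dt", "top10corr_en_pipe", "top10corr_ridge_pipe"],
)

# Higher is better for R^2-based scores; lower is better for RMSE.
best = {
    "test_score": scores["test_score"].idxmax(),
    "cross_val_score": scores["cross_val_score"].idxmax(),
    "RMSE_val": scores["RMSE_val"].idxmin(),
}
print(best)
```

A model that wins on a single validation split while its cross-validation score stays negative may simply have been favored by that split, so the disagreement is worth noting before a final choice is made.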

---


### Team 2 (Away Team)
---

### Correlation of Variables with the Target Variable (`pts`)
---

In [106]:
corr_sort=pd.DataFrame(abs(team2.corr()['pts']).sort_values(ascending=False))
corr_sort

index,pts
pts,1.000000
last3sum_fg3a,0.285994
last1sum_opp_stl,0.253894
last3sum_fta,0.251202
last8sum_opp_tov,0.248107
...,...
last8sum_fg,0.008508
last6sum_opp_fg3_pct,0.007529
last3sum_opp_ft,0.005301
last8sum_fga,0.005167


#### Creating Top 10, 15, and 20 Correlated Variables

In [107]:
# Rank features once by absolute correlation with `pts`; position 0 is `pts` itself (correlation 1.0), so skip it.
ranked_features=abs(team2.corr()['pts']).sort_values(ascending=False).index[1:]
top10_corr=ranked_features[:10]
top15_corr=ranked_features[:15]
top20_corr=ranked_features[:20]

### Pipeline of Models 
---

In [108]:
lr_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('lr',LinearRegression())
])

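# Five log-spaced alphas from 1e-3 to 1. The original call passed a fourth positional value (100), which np.logspace reads as `endpoint`, not as the number of points.
rl_alphas = np.logspace(-3,0,num=5)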
ridge_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('ridge',RidgeCV(alphas=rl_alphas,scoring='r2',cv=5))
])


enet_alphas = np.linspace(0.5, 1.0, 100)
enet_ratio = 0.5
en_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('en',ElasticNetCV(alphas=enet_alphas,l1_ratio=enet_ratio,cv=5,random_state=42))
])

knn_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('knn',KNeighborsRegressor())
])

md=4 
mss=5
dt_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('dt',DecisionTreeRegressor(max_depth=md,min_samples_split=mss,random_state=42))
])

rf_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('rf',RandomForestRegressor(random_state=42))
])

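# The estimator is passed positionally because the keyword was renamed
# from `base_estimator` to `estimator` in scikit-learn 1.2.
ada_dt=AdaBoostRegressor(dt_pipe,random_state=42)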

gb_pipe=Pipeline([
    ('mms',MinMaxScaler()),
    ('pf',PolynomialFeatures(include_bias=False)),
    ('gb',GradientBoostingRegressor(random_state=42))
])

### Models with Top 10 Correlated Variables
---

In [109]:
X=team2[top10_corr]
y=team2['pts']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

In [110]:
pipes=[lr_pipe,ridge_pipe,en_pipe,knn_pipe,dt_pipe,rf_pipe,ada_dt,gb_pipe]
pipes_name=['lr_pipe','ridge_pipe','en_pipe','knn_pipe','dt_pipe','rf_pipe','ada_dt','gb_pipe']
results=[]
for pipe in pipes:
    pipe_fitted=pipe.fit(X_train,y_train)
    pipe_train_score=pipe_fitted.score(X_train,y_train)
    pipe_test_score=pipe_fitted.score(X_val,y_val)
    pipe_fitted_cross_val_score=cross_val_score(pipe_fitted, X_train, y_train, cv=5, n_jobs=-1).mean()
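    # RMSE as the square root of MSE; this avoids the `squared` argument, which was deprecated in scikit-learn 1.4.
    pipe_fitted_RMSE_train=np.sqrt(mean_squared_error(y_train,pipe_fitted.predict(X_train)))
    pipe_fitted_RMSE_val=np.sqrt(mean_squared_error(y_val,pipe_fitted.predict(X_val)))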
    results.append((pipe_train_score,pipe_test_score,pipe_fitted_cross_val_score,pipe_fitted_RMSE_train,pipe_fitted_RMSE_val))
df10 = pd.DataFrame(results)
df10.insert(0,'name_of_estimator_pipe',pipes_name)
df10.rename(columns={0:'train_score',1:'test_score',2:'cross_val_score',3:'RMSE_train',4:'RMSE_val'},inplace=True)
df10['name_of_estimator_pipe']='top10corr_'+df10['name_of_estimator_pipe']

#### Top 10 Correlated Variable Model Scores
---

In [111]:
df10

index,name_of_estimator_pipe,train_score,test_score,cross_val_score,RMSE_train,RMSE_val
0,top10corr_lr_pipe,1.0,-19.177077,-12.891815,3.232059e-13,47.20262
1,top10corr_ridge_pipe,0.380404,0.348455,-0.506577,9.844308,8.482215
2,top10corr_en_pipe,0.167676,0.050476,-0.534281,11.40978,10.239766
3,top10corr_knn_pipe,0.364439,0.025115,-0.610826,9.970329,10.375612
4,top10corr_dt_pipe,0.635875,-0.201393,-2.16946,7.546686,11.518063
5,top10corr_rf_pipe,0.842834,0.189707,-0.654976,4.958038,9.459278
6,top10corr_ada_dt,0.943438,0.115731,-0.748404,2.974358,9.881643
7,top10corr_gb_pipe,0.998835,0.185963,-1.165829,0.4269233,9.481106


### Models with Top 15 Correlated Variables
---

In [112]:
X=team2[top15_corr]
y=team2['pts']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

In [113]:
pipes=[lr_pipe,ridge_pipe,en_pipe,knn_pipe,dt_pipe,rf_pipe,ada_dt,gb_pipe]
pipes_name=['lr_pipe','ridge_pipe','en_pipe','knn_pipe','dt_pipe','rf_pipe','ada_dt','gb_pipe']
results=[]
for pipe in pipes:
    pipe_fitted=pipe.fit(X_train,y_train)
    pipe_train_score=pipe_fitted.score(X_train,y_train)
    pipe_test_score=pipe_fitted.score(X_val,y_val)
    pipe_fitted_cross_val_score=cross_val_score(pipe_fitted, X_train, y_train, cv=5, n_jobs=-1).mean()
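    # RMSE as the square root of MSE; this avoids the `squared` argument, which was deprecated in scikit-learn 1.4.
    pipe_fitted_RMSE_train=np.sqrt(mean_squared_error(y_train,pipe_fitted.predict(X_train)))
    pipe_fitted_RMSE_val=np.sqrt(mean_squared_error(y_val,pipe_fitted.predict(X_val)))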
    results.append((pipe_train_score,pipe_test_score,pipe_fitted_cross_val_score,pipe_fitted_RMSE_train,pipe_fitted_RMSE_val))
df15 = pd.DataFrame(results)
df15.insert(0,'name_of_estimator_pipe',pipes_name)
df15.rename(columns={0:'train_score',1:'test_score',2:'cross_val_score',3:'RMSE_train',4:'RMSE_val'},inplace=True)
df15['name_of_estimator_pipe']='top15corr_'+df15['name_of_estimator_pipe']

#### Top 15 Correlated Variable Model Scores
---

In [114]:
df15

index,name_of_estimator_pipe,train_score,test_score,cross_val_score,RMSE_train,RMSE_val
0,top15corr_lr_pipe,1.0,-1.531368,-6.177195,2.122456e-13,16.719166
1,top15corr_ridge_pipe,0.46265,0.28694,-0.677283,9.167685,8.873602
2,top15corr_en_pipe,0.186924,0.104004,-0.565631,11.27708,9.946953
3,top15corr_knn_pipe,0.268359,-0.015117,-0.560889,10.69744,10.58754
4,top15corr_dt_pipe,0.766952,-1.117368,-1.703905,6.037458,15.290974
5,top15corr_rf_pipe,0.836792,0.188178,-0.723977,5.05245,9.468202
6,top15corr_ada_dt,0.970517,-0.072895,-0.659561,2.147408,10.884679
7,top15corr_gb_pipe,0.999706,0.046657,-1.046168,0.2143799,10.260335


### Models with Top 20 Correlated Variables
---

In [115]:
X=team2[top20_corr]
y=team2['pts']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.20, random_state=42)

In [116]:
pipes=[lr_pipe,ridge_pipe,en_pipe,knn_pipe,dt_pipe,rf_pipe,ada_dt,gb_pipe]
pipes_name=['lr_pipe','ridge_pipe','en_pipe','knn_pipe','dt_pipe','rf_pipe','ada_dt','gb_pipe']
results=[]
for pipe in pipes:
    pipe_fitted=pipe.fit(X_train,y_train)
    pipe_train_score=pipe_fitted.score(X_train,y_train)
    pipe_test_score=pipe_fitted.score(X_val,y_val)
    pipe_fitted_cross_val_score=cross_val_score(pipe_fitted, X_train, y_train, cv=5, n_jobs=-1).mean()
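    # RMSE as the square root of MSE; this avoids the `squared` argument, which was deprecated in scikit-learn 1.4.
    pipe_fitted_RMSE_train=np.sqrt(mean_squared_error(y_train,pipe_fitted.predict(X_train)))
    pipe_fitted_RMSE_val=np.sqrt(mean_squared_error(y_val,pipe_fitted.predict(X_val)))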
    results.append((pipe_train_score,pipe_test_score,pipe_fitted_cross_val_score,pipe_fitted_RMSE_train,pipe_fitted_RMSE_val))
df20 = pd.DataFrame(results)
df20.insert(0,'name_of_estimator_pipe',pipes_name)
df20.rename(columns={0:'train_score',1:'test_score',2:'cross_val_score',3:'RMSE_train',4:'RMSE_val'},inplace=True)
df20['name_of_estimator_pipe']='top20corr_'+df20['name_of_estimator_pipe']

#### Top 20 Correlated Variable Model Scores
---

In [117]:
df20

index,name_of_estimator_pipe,train_score,test_score,cross_val_score,RMSE_train,RMSE_val
0,top20corr_lr_pipe,1.0,-1.48412,-4.630326,5.717364e-14,16.562399
1,top20corr_ridge_pipe,0.5642,0.301046,-0.947902,8.256096,8.785394
2,top20corr_en_pipe,0.220607,0.133267,-0.596528,11.04102,9.783173
3,top20corr_knn_pipe,0.288207,-0.15774,-0.633201,10.55134,11.306871
4,top20corr_dt_pipe,0.758461,0.030532,-2.194954,6.146449,10.346745
5,top20corr_rf_pipe,0.857652,0.141128,-0.789508,4.718524,9.738704
6,top20corr_ada_dt,0.976192,0.057073,-0.895356,1.929702,10.204133
7,top20corr_gb_pipe,0.999929,-0.20257,-1.08732,0.1056874,11.523707


### All Model Scores Together
---

In [118]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
team2_all_together=pd.concat([team2_all_var_models,df10,df15,df20])
team2_all_together.set_index('name_of_estimator_pipe',inplace=True)
team2_all_together

name_of_estimator_pipe,train_score,test_score,cross_val_score,RMSE_train,RMSE_val
lr_pipe,1.0,0.24797,-2.243536,5.330074e-14,9.112858
ridge_pipe,0.998228,0.256053,-2.126599,0.5264482,9.06375
en_pipe,0.096815,-0.085497,-0.763914,11.88555,10.948414
knn_pipe,0.207996,-0.295629,-0.712707,11.12999,11.961271
dt_pipe,0.849531,-1.367577,-2.037471,4.851263,16.169217
rf_pipe,0.845602,-0.128884,-0.726733,4.91419,11.165073
ada_dt,0.989293,-0.109453,-1.156373,1.294097,11.068568
gb_pipe,0.999999,-0.399759,-0.890627,0.01495959,12.432648
top10corr_lr_pipe,1.0,-19.177077,-12.891815,3.232059e-13,47.20262
top10corr_ridge_pipe,0.380404,0.348455,-0.506577,9.844308,8.482215


#### Top Test Score Model
---

In [119]:
team2_all_together.loc[team2_all_together['test_score'].idxmax()]

train_score        0.380404
test_score         0.348455
cross_val_score   -0.506577
RMSE_train         9.844308
RMSE_val           8.482215
Name: top10corr_ridge_pipe, dtype: float64

#### Top Cross Validation Score Model
---

In [120]:
team2_all_together.loc[team2_all_together['cross_val_score'].idxmax()]

train_score        0.380404
test_score         0.348455
cross_val_score   -0.506577
RMSE_train         9.844308
RMSE_val           8.482215
Name: top10corr_ridge_pipe, dtype: float64

#### Lowest Validation RMSE Model
---

In [121]:
team2_all_together.loc[team2_all_together['RMSE_val'].idxmin()]

train_score        0.380404
test_score         0.348455
cross_val_score   -0.506577
RMSE_train         9.844308
RMSE_val           8.482215
Name: top10corr_ridge_pipe, dtype: float64

---
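
Since the project goal is the game total, it can help to see roughly what the two per-team validation errors imply for the sum. The sketch below combines the selected models' validation RMSE values under a loudly stated assumption: that the home and away prediction errors are unbiased and uncorrelated, so their variances add. In a real game the two scores likely move together (for example through pace), so this is a rough estimate only, coming out at about 12.6 points.

```python
import math

# Validation RMSE of the selected models, copied from the outputs above.
rmse_home = 9.312737  # Team 1: top10corr_ada_dt
rmse_away = 8.482215  # Team 2: top10corr_ridge_pipe

# Assumption: errors are unbiased and uncorrelated, so the variances add.
rmse_total = math.sqrt(rmse_home**2 + rmse_away**2)
print(round(rmse_total, 2))
```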