# Phase Two (Preliminary Models)


In [None]:
import pandas as pd
from joblib import load
from start import *

## Linear Model
Check that the following metric is in the right units and seems reasonable. The average cross-validated score is 0.8371. Because this repository only cares about predictive power, it does not include an analysis of the coefficients found by the linear regression.


In [None]:
results = pd.read_csv('models/nbastats2018-2019_run_05_26_22h_51m/LeastSquares_cvresults.csv')
pipeline = load('models/nbastats2018-2019_run_05_26_22h_51m/LeastSquares_bestestimator.joblib')
results.head()

## Ridge Regression
Check that the following metric is in the right units and seems reasonable. The best average cross-validated score is 0.8292 with the following parameters: {'ridge__alpha': 0.01}

Make sure the range of alpha values shown in the following image is wide enough. If it is not, update ``confs/template.ini`` and run again.
![title](models/nbastats2018-2019_run_05_26_22h_51m/RidgeRegression_hyperparam.png)
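
One quick way to back up the visual check is to test whether the best alpha sits on the edge of the searched range, which usually means the range should be widened. The following is a minimal sketch, assuming the `*_cvresults.csv` files follow the layout of scikit-learn's `GridSearchCV.cv_results_` (columns such as `param_ridge__alpha` and `rank_test_score`); that layout is an assumption, the helper name `best_on_grid_edge` is hypothetical, and the toy values are made up.

```python
import pandas as pd


def best_on_grid_edge(cv_results: pd.DataFrame, param_col: str,
                      rank_col: str = "rank_test_score") -> bool:
    """Return True if the top-ranked value of `param_col` is the smallest
    or largest value that was searched."""
    best_value = cv_results.loc[cv_results[rank_col].idxmin(), param_col]
    searched = cv_results[param_col]
    return bool(best_value == searched.min() or best_value == searched.max())


# Toy stand-in for RidgeRegression_cvresults.csv (made-up values,
# assumed column names).
toy_edge = pd.DataFrame({
    "param_ridge__alpha": [0.01, 0.1, 1.0, 10.0],
    "rank_test_score": [1, 2, 3, 4],
})
toy_inside = pd.DataFrame({
    "param_ridge__alpha": [0.01, 0.1, 1.0, 10.0],
    "rank_test_score": [3, 1, 2, 4],
})

edge_hit = best_on_grid_edge(toy_edge, "param_ridge__alpha")
inside_hit = best_on_grid_edge(toy_inside, "param_ridge__alpha")
print(edge_hit, inside_hit)
```

With the real file, the same helper could be called on `results` after loading it, provided the column names match.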


In [None]:
results = pd.read_csv('models/nbastats2018-2019_run_05_26_22h_51m/RidgeRegression_cvresults.csv')
pipeline = load('models/nbastats2018-2019_run_05_26_22h_51m/RidgeRegression_bestestimator.joblib')
results.head()

## Lasso Regression
Check that the following metric is in the right units and seems reasonable. The best average cross-validated score is 0.8469 with the following parameters: {'lasso__alpha': 0.001}

Make sure the range of alpha values shown in the following image is wide enough. If it is not, update ``confs/template.ini`` and run again.
![title](models/nbastats2018-2019_run_05_26_22h_51m/LassoRegression_hyperparam.png)


In [None]:
results = pd.read_csv('models/nbastats2018-2019_run_05_26_22h_51m/LassoRegression_cvresults.csv')
pipeline = load('models/nbastats2018-2019_run_05_26_22h_51m/LassoRegression_bestestimator.joblib')
results.head()

## SVR
Check that the following metric is in the right units and seems reasonable. The best average cross-validated score is 0.8534 with the following parameters: {'svr__C': 10.0, 'svr__kernel': 'rbf', 'svr__gamma': 0.1}

Analyze the C and gamma parameters using the following image. If the range is too small, update the config file and run it again.

![title](models/nbastats2018-2019_run_05_26_22h_51m/SVR_hyperparam.png)
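
The same C-versus-gamma view can be rebuilt as a table from the results file. This is a sketch under the assumption that `SVR_cvresults.csv` has `GridSearchCV`-style columns (`param_svr__C`, `param_svr__gamma`, `param_svr__kernel`, `mean_test_score`); the column names and the toy numbers are assumptions, not values from the actual run.

```python
import pandas as pd

# Toy stand-in for SVR_cvresults.csv (made-up scores, assumed column names).
toy = pd.DataFrame({
    "param_svr__kernel": ["rbf", "rbf", "rbf", "rbf"],
    "param_svr__C": [1.0, 1.0, 10.0, 10.0],
    "param_svr__gamma": [0.01, 0.1, 0.01, 0.1],
    "mean_test_score": [0.91, 0.88, 0.87, 0.85],
})

# Keep one kernel, then lay the scores out as a C (rows) by gamma (columns) grid.
rbf_only = toy[toy["param_svr__kernel"] == "rbf"]
grid = rbf_only.pivot_table(index="param_svr__C",
                            columns="param_svr__gamma",
                            values="mean_test_score")
print(grid)
```

If the best cell lies in a corner or along an edge of this grid, that suggests widening the corresponding range in the config file.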


In [None]:
results = pd.read_csv('models/nbastats2018-2019_run_05_26_22h_51m/SVR_cvresults.csv')
pipeline = load('models/nbastats2018-2019_run_05_26_22h_51m/SVR_bestestimator.joblib')
results.head()

## KNN Regression
Check that the following metric is in the right units and seems reasonable. The best average cross-validated score is 1.0152 with the following parameters: {'kneighborsregressor__n_neighbors': 35, 'kneighborsregressor__weights': 'uniform'}

Analyze the K hyperparameter using the following image. If the range is too small, update the config file and run it again.

![title](models/nbastats2018-2019_run_05_26_22h_51m/KNNRegression_hyperparam.png)


In [None]:
results = pd.read_csv('models/nbastats2018-2019_run_05_26_22h_51m/KNNRegression_cvresults.csv')
pipeline = load('models/nbastats2018-2019_run_05_26_22h_51m/KNNRegression_bestestimator.joblib')
results.head()

## Random Forest Regression
Check that the following metric is in the right units and seems reasonable. The best average cross-validated score is 0.7786 with the following parameters: {'randomforestregressor__max_features': 'auto', 'randomforestregressor__n_estimators': 40}

Analyze the hyperparameters (`max_features` and `n_estimators`) using the following image. If the range is too small, update the config file and run it again.

![title](models/nbastats2018-2019_run_05_26_22h_51m/RandomForestRegression_hyperparam.png)

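This is a graph of impurity-based feature importances (mean decrease in impurity). For a regression forest such as `RandomForestRegressor`, the impurity is a regression criterion (squared error by default in scikit-learn), not Gini impurity, which applies to classification.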
![title](models/nbastats2018-2019_run_05_26_22h_51m/RandomForestRegression_feature_importance.png)
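
To show what the plotted quantity is, here is a small sketch of impurity-based importances on synthetic data. It does not use the saved pipeline or the NBA data; the data, the seed, and the choice of three features are all made up for illustration. Only `n_estimators=40` mirrors the reported best parameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: only feature 0 drives the target, features 1 and 2 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

forest = RandomForestRegressor(n_estimators=40, random_state=0).fit(X, y)

# Mean decrease in impurity per feature, normalized to sum to 1.
importances = forest.feature_importances_
for i, value in enumerate(importances):
    print(f"feature {i}: {value:.3f}")
```

For the fitted pipeline loaded in the next cell, the forest would presumably be reachable through `pipeline.named_steps`, but the step name is an assumption to verify against the actual object.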


In [None]:
results = pd.read_csv('models/nbastats2018-2019_run_05_26_22h_51m/RandomForestRegression_cvresults.csv')
pipeline = load('models/nbastats2018-2019_run_05_26_22h_51m/RandomForestRegression_bestestimator.joblib')
results.head()

## Gradient Boosted Tree Regression
Check that the following metric is in the right units and seems reasonable. The best average cross-validated score is 0.7449 with the following parameters: {'xgbregressor__n_estimators': 100, 'xgbregressor__max_depth': 2, 'xgbregressor__learning_rate': 0.1}

Analyze the hyperparameters using the following image. If the range is too small, update the config file and run it again. Note that the x and y ranges are set automatically. If you need to analyze further, open the results CSV and analyze it yourself.

![title](models/nbastats2018-2019_run_05_26_22h_51m/GradientBoostedTreeRegression_hyperparam.png)
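
When three hyperparameters are searched at once, a single plot cannot show everything, so one option for the manual analysis is to summarize the score per value of one hyperparameter at a time. A minimal sketch, assuming the results CSV has `GridSearchCV`-style columns (`param_xgbregressor__max_depth`, `mean_test_score`, and so on); the helper name `marginal_summary` is hypothetical and the toy scores are made up.

```python
import pandas as pd


def marginal_summary(cv_results: pd.DataFrame, param_col: str,
                     score_col: str = "mean_test_score") -> pd.DataFrame:
    """Mean, min and max score for each searched value of one hyperparameter,
    aggregated over all settings of the other hyperparameters."""
    return cv_results.groupby(param_col)[score_col].agg(["mean", "min", "max"])


# Toy stand-in for GradientBoostedTreeRegression_cvresults.csv
# (made-up scores, assumed column names).
toy = pd.DataFrame({
    "param_xgbregressor__max_depth": [2, 2, 4, 4],
    "param_xgbregressor__learning_rate": [0.1, 0.3, 0.1, 0.3],
    "mean_test_score": [0.75, 0.78, 0.80, 0.86],
})

by_depth = marginal_summary(toy, "param_xgbregressor__max_depth")
print(by_depth)
```

Repeating the call for each `param_*` column gives a rough picture of which hyperparameter moves the score most.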


In [None]:
results = pd.read_csv('models/nbastats2018-2019_run_05_26_22h_51m/GradientBoostedTreeRegression_cvresults.csv')
pipeline = load('models/nbastats2018-2019_run_05_26_22h_51m/GradientBoostedTreeRegression_bestestimator.joblib')
results.head()

# Phase Three (Feature Selection)


In [8]:
import pandas as pd
feat_imps = pd.read_csv('models/nbastats2018-2019_run_05_26_22h_51m/feat_imp.csv')
feat_imps.head()

| Unnamed: 0 | lasso__feat_imp | randfor__feat_imp | xgboost__feat_imp |
|---|---|---|---|
| 0 | 1.0 | 0.017181 | 0.014479 |
| 1 | 1.0 | 0.027601 | 0.02271 |
| 2 | 1.0 | 0.123372 | 0.039445 |
| 3 | 0.0 | 0.092399 | 0.08601 |
| 4 | 1.0 | 0.007377 | 0.016555 |


![title](models/nbastats2018-2019_run_05_26_22h_51m/LassoRegression_featsel_improve.png)

![title](models/nbastats2018-2019_run_05_26_22h_51m/RandomForestRegression_featsel_improve.png)
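
The three importance columns are on different scales (the Lasso column looks like a 0/1 indicator, the tree columns are fractions), so one way to compare them is by rank. The sketch below uses only the five rows shown in the `feat_imps.head()` output above, not the full file. The `mean_tree_rank` column and the reading of `lasso__feat_imp == 0.0` as "dropped by Lasso" are assumptions made for illustration.

```python
import pandas as pd

# The first five rows of feat_imp.csv, copied from the output above.
feat_imps = pd.DataFrame({
    "lasso__feat_imp": [1.0, 1.0, 1.0, 0.0, 1.0],
    "randfor__feat_imp": [0.017181, 0.027601, 0.123372, 0.092399, 0.007377],
    "xgboost__feat_imp": [0.014479, 0.02271, 0.039445, 0.08601, 0.016555],
})

# Rank features within each tree model (1 = most important), then average.
tree_cols = ["randfor__feat_imp", "xgboost__feat_imp"]
feat_imps["mean_tree_rank"] = feat_imps[tree_cols].rank(ascending=False).mean(axis=1)
ordered = feat_imps.sort_values("mean_tree_rank")

# Features the tree models rank highly but Lasso appears to drop.
disagreements = ordered[(ordered["lasso__feat_imp"] == 0.0)
                        & (ordered["mean_tree_rank"] <= 2)]
print(ordered)
print(disagreements.index.tolist())
```

Rows flagged this way are worth a closer look before removing a feature based on a single model.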

In [1]:
import pandas as pd
all_results = pd.read_csv('models/nbastats2018-2019_run_05_26_22h_51m/all_results.csv')
all_results.head()
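
The best scores reported in Phase Two can also be lined up side by side. This is a minimal sketch that uses only the numbers quoted in the sections above. The layout of `all_results.csv` is not shown in this notebook, so the Series is built by hand rather than read from that file. The sort direction is an assumption: a KNN score above 1 hints at an error metric where lower is better, but the notebook does not confirm this, so flip `LOWER_IS_BETTER` if the metric is a higher-is-better score.

```python
import pandas as pd

# Best cross-validated score per model, as quoted in Phase Two.
scores = pd.Series({
    "LeastSquares": 0.8371,
    "RidgeRegression": 0.8292,
    "LassoRegression": 0.8469,
    "SVR": 0.8534,
    "KNNRegression": 1.0152,
    "RandomForestRegression": 0.7786,
    "GradientBoostedTreeRegression": 0.7449,
}, name="best_cv_score")

LOWER_IS_BETTER = True  # assumption, not stated in the notebook

ranking = scores.sort_values(ascending=LOWER_IS_BETTER)
print(ranking)
```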