# Salary Prediction for US Major League Baseball Players with Thirteen Machine Learning Models

In this project, thirteen machine learning models are used to predict the salaries of US Major League Baseball players. The Hitters data set described below is used to train and evaluate the models. The data is retrieved from "https://www.kaggle.com".

 
 
 
### Description
    
#### Context

This dataset is part of the R-package ISLR and is used in the related book by G. James et al. (2013) "An Introduction to Statistical Learning with applications in R" to demonstrate how Ridge regression and the LASSO are performed using R.

#### Content
This dataset was originally taken from the StatLib library which is maintained at Carnegie Mellon University. This is part of the data that was used in the 1988 ASA Graphics Section Poster Session. The salary data were originally from Sports Illustrated, April 20, 1987. The 1986 and career statistics were obtained from The 1987 Baseball Encyclopedia Update published by Collier Books, Macmillan Publishing Company, New York.

#### Format

A data frame with 322 observations of major league players on the following 20 variables.

- AtBat Number of times at bat in 1986
- Hits Number of hits in 1986
- HmRun Number of home runs in 1986
- Runs Number of runs in 1986
- RBI Number of runs batted in in 1986
- Walks Number of walks in 1986
- Years Number of years in the major leagues
- CAtBat Number of times at bat during his career
- CHits Number of hits during his career
- CHmRun Number of home runs during his career
- CRuns Number of runs during his career
- CRBI Number of runs batted in during his career
- CWalks Number of walks during his career
- League A factor with levels A and N indicating player’s league at the end of 1986
- Division A factor with levels E and W indicating player’s division at the end of 1986
- PutOuts Number of put outs in 1986
- Assists Number of assists in 1986
- Errors Number of errors in 1986
- Salary 1987 annual salary on opening day in thousands of dollars
- NewLeague A factor with levels A and N indicating player’s league at the beginning of 1987

#### Acknowledgements

Please cite/acknowledge: James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013) An Introduction to Statistical Learning with applications in R, www.StatLearning.com, Springer-Verlag, New York.




In [None]:
# Installing external libraries

!pip install xgboost
!pip install lightgbm
!pip install catboost

In [None]:
# Importing necessary libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import xgboost
import warnings

# Suppress all warnings (deprecation, future and user warnings included) to keep the output readable
warnings.filterwarnings("ignore")

from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.metrics import mean_squared_error,r2_score
from sklearn import model_selection
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn import neighbors
from sklearn.neighbors import LocalOutlierFactor, KNeighborsRegressor
from sklearn.preprocessing import scale, StandardScaler, RobustScaler
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor


In [None]:
# Reading Hitters data from kaggle server

df = pd.read_csv("../input/hitters/Hitters.csv")  

### Understanding Data

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.describe().T

In [None]:
df.shape

In [None]:
# detecting missing values 

df.isnull().sum()

The Salary variable has 59 missing values.

In [None]:
# The missingno package is needed to visualize the missing values.
# When working with Anaconda, it may have to be installed first:

%conda install -c conda-forge/label/cf202003 missingno


In [None]:
#Visualizing missing values

import missingno as msno
msno.bar(df);

In [None]:
# Clustered heatmap of the features whose absolute correlation with Salary is above 0.5
# (because of the > 0.5 filter, only those features are shown)

correlation_matrix = df.corr(numeric_only=True).round(2)  # numeric_only skips the categorical columns (pandas >= 1.5)
filtre=np.abs(correlation_matrix['Salary'])>0.50
corr_features=correlation_matrix.columns[filtre].tolist()
sns.clustermap(df[corr_features].corr(),annot=True,fmt=".2f")
plt.title('Correlation between features')
plt.show()

In [None]:
# Some of the variables are very highly correlated with each other. Normally this multicollinearity should be addressed, but I will leave it as it is here.
# Here I will drop the rows with missing values

df = df.dropna()
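
The highly correlated variables mentioned above can be listed explicitly before deciding whether to drop or combine any of them. The following is only a sketch on a small synthetic frame: the data, the reuse of three Hitters column names and the 0.9 cutoff are assumptions made for illustration, not part of the analysis in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a few numeric columns (illustrative data, not the real Hitters values).
rng = np.random.default_rng(0)
at_bat = rng.normal(400, 100, size=200)
toy = pd.DataFrame({
    "AtBat": at_bat,
    "Hits": 0.27 * at_bat + rng.normal(0, 5, size=200),  # strongly tied to AtBat
    "Walks": rng.normal(40, 15, size=200),                # unrelated to the other two
})

corr = toy.corr().abs()
# Keep only the upper triangle so that each pair is reported once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
high_pairs = upper.stack()
high_pairs = high_pairs[high_pairs > 0.9]  # 0.9 is an arbitrary illustrative cutoff
print(high_pairs)
```

Only the pair that was constructed to be collinear passes the cutoff; on the real data the same pattern would list the pairs to look at more closely.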

In [None]:
df.shape

In [None]:
df.sort_values('Salary', ascending = False).head()


In [None]:
# I have 3 categorical variables

df['League'].value_counts()

In [None]:
df['NewLeague'].value_counts()

In [None]:
df['Division'].value_counts()

In [None]:
# Transforming the nominal variables into dummy variables with pd.get_dummies. Each of them has only two categories, so with drop_first=True a single 0/1 column per variable remains, which is equivalent to label encoding. Full one-hot encoding is needed for nominal variables that have 3 or more categories.

df = pd.get_dummies(df, columns = ['League', 'Division', 'NewLeague'], drop_first = True)
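
To make the effect of `drop_first = True` concrete, here is a minimal sketch on a four-row toy frame (the rows are made up; only the column names and levels follow the data description above). For a two-level factor, the first level is dropped and a single indicator column for the second level remains.

```python
import pandas as pd

# Toy frame with two of the two-level factors described above (made-up rows).
toy = pd.DataFrame({"League": ["A", "N", "N", "A"],
                    "Division": ["E", "W", "E", "W"]})

encoded = pd.get_dummies(toy, columns=["League", "Division"], drop_first=True)
print(encoded.columns.tolist())  # ['League_N', 'Division_W']
```

This is why the later cells refer to columns named `League_N`, `Division_W` and `NewLeague_N`.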

In [None]:
df.head()

In [None]:
# For detecting outliers I will use LocalOutlierFactor with n_neighbors=20 and contamination='auto' (the default values).

clf=LocalOutlierFactor(n_neighbors=20, contamination='auto')
clf.fit_predict(df)
df_scores=clf.negative_outlier_factor_  # one score per row, kept in row order so that it can be used as a row mask later
np.sort(df_scores)[0:20]                # the 20 lowest (most outlying) scores

In [None]:
?LocalOutlierFactor

In [None]:
# I will take the score at index 5 of the sorted scores (the sixth lowest) as the threshold, because the scores after it change only slightly
# First, however, I will visualize the outlier scores

sns.boxplot(x=df_scores);

In [None]:
threshold=np.sort(df_scores)[5]
print(threshold)
df = df.loc[df_scores > threshold]
df = df.reset_index(drop=True)
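
`negative_outlier_factor_` holds one score per row, in row order, so a boolean mask built from it lines up with the rows only while the scores themselves are left unsorted. The sketch below illustrates this on synthetic points with one planted outlier; the data and the choice of the second-lowest score as threshold are assumptions for the example.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
points = rng.normal(0, 1, size=(100, 2))
points[0] = [15.0, 15.0]  # plant one obvious outlier in row 0

lof = LocalOutlierFactor(n_neighbors=20, contamination="auto")
lof.fit_predict(points)
scores = lof.negative_outlier_factor_  # row order, NOT sorted

threshold = np.sort(scores)[1]  # sort a copy only to pick the threshold
mask = scores > threshold       # still aligned with the rows of `points`
kept = points[mask]
print(scores.argmin(), kept.shape)
```

The planted point in row 0 receives the lowest score and is removed together with the row holding the second-lowest score.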

In [None]:
df.shape

In [None]:
# Standardization
# Salary is the dependent variable, and League_N, Division_W and NewLeague_N are dummy variables, so they are not scaled.
# First I drop them and standardize the remaining numeric independent variables.
# Finally I combine all of the variables again.

df_X=df.drop(['Salary','League_N','Division_W','NewLeague_N'], axis=1)
df_X.head()


In [None]:
from sklearn.preprocessing import StandardScaler
scaled_cols=StandardScaler().fit_transform(df_X)



scaled_cols=pd.DataFrame(scaled_cols, columns=df_X.columns)
scaled_cols.head()

In [None]:
cat_df=df.loc[:, "League_N":"NewLeague_N"]
cat_df.head()

In [None]:
Salary=pd.DataFrame(df['Salary'])

In [None]:
df=pd.concat([Salary,scaled_cols, cat_df], axis=1)
df.head()

In [None]:
# Dependent variable y = Salary; independent variables X = all variables except Salary

y = df['Salary']
X = df.drop('Salary', axis =1)

In [None]:
X

In [None]:
y

In [None]:
# We will evaluate the model results relative to the mean value of the target variable (y)

y.mean()

### MODELING

In [None]:
# Splitting the data into train and test sets
# The test size is 20% of the data and random_state is 46 for all of the models so that they can be compared

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.20, 
                                                    random_state=46)
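
In this notebook the features are standardized before the split, so the scaler has also seen the test rows. A common alternative is to put the scaler and the model in one pipeline that is fitted on the training part only. The sketch below shows that variant on synthetic data; the data, the coefficients and the choice of `Ridge` are assumptions for illustration and do not reproduce the results reported here.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustrative only).
rng = np.random.default_rng(46)
X_toy = rng.normal(size=(200, 5)) * [1, 10, 100, 1, 1]
y_toy = X_toy @ np.array([3.0, 0.5, 0.02, 0.0, -1.0]) + rng.normal(0, 0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.20, random_state=46)

# The scaler's mean and variance are learned from X_tr only,
# then reused unchanged when predicting on X_te.
pipe = make_pipeline(StandardScaler(), Ridge())
pipe.fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_te, pipe.predict(X_te)))
print(rmse)
```

A pipeline like this can also be passed to `GridSearchCV`, in which case the scaler is refitted inside every fold.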

### Linear Regression

In [None]:
linreg = LinearRegression()
model = linreg.fit(X_train,y_train)
y_pred = model.predict(X_test)
df_linreg_rmse = np.sqrt(mean_squared_error(y_test,y_pred))
df_linreg_rmse

##### The RMSE of the linear regression model is 382.00085575367274, while the y.mean() value is 538.2316872586872
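
To read these two numbers together, it helps to spell out what the RMSE is and to express it as a fraction of the mean salary. The first part of the sketch below computes an RMSE by hand on three made-up values; the second part only divides the two figures reported above.

```python
import numpy as np

# Made-up observed and predicted salaries (thousands of dollars), for illustration only.
y_true = np.array([100.0, 200.0, 300.0])
y_hat = np.array([110.0, 190.0, 330.0])

# RMSE = square root of the mean of the squared errors.
rmse_toy = np.sqrt(np.mean((y_true - y_hat) ** 2))

# Figures reported above for the linear regression model.
linreg_rmse = 382.00085575367274
salary_mean = 538.2316872586872
relative_error = linreg_rmse / salary_mean  # RMSE as a fraction of the mean salary

print(round(rmse_toy, 2), round(relative_error, 2))
```

For the linear model the error is about 0.71 of the mean salary, which is the sense in which the RMSE is compared with `y.mean()` in this notebook.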


### Ridge Regression

In [None]:
ridreg = Ridge()
model = ridreg.fit(X_train, y_train)
y_pred = model.predict(X_test)
df_ridreg_rmse = np.sqrt(mean_squared_error(y_test,y_pred))
df_ridreg_rmse 

### Lasso Regression

In [None]:
lasreg = Lasso()
model = lasreg.fit(X_train,y_train)
y_pred = model.predict(X_test)
df_lasreg_rmse = np.sqrt(mean_squared_error(y_test,y_pred))
df_lasreg_rmse

### Elastic Net Regression

In [None]:
enet = ElasticNet()
model = enet.fit(X_train,y_train)
y_pred = model.predict(X_test)
df_enet_rmse = np.sqrt(mean_squared_error(y_test,y_pred))
df_enet_rmse

### KNN (K-Nearest Neighbors)

In [None]:
knn = KNeighborsRegressor()
knn_model = knn.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)
df_knn_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_knn_rmse

### SVR (Support Vector Regression)

In [None]:
svr = SVR(kernel="linear")  # kernel must be passed by keyword in recent scikit-learn versions
svr_model = svr.fit(X_train, y_train)
y_pred = svr_model.predict(X_test)
df_svr_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_svr_rmse

### MLP (Multilayer Perceptron)
One of the Artificial Neural Network Models (ANN)

In [None]:
mlp = MLPRegressor()
mlp_model = mlp.fit(X_train, y_train)
y_pred = mlp_model.predict(X_test)
df_mlp_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_mlp_rmse

### CART (Classification and Regression Trees)

In [None]:
cart = DecisionTreeRegressor()
cart_model = cart.fit(X_train, y_train)
y_pred = cart_model.predict(X_test)
df_cart_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_cart_rmse

### Random Forests

In [None]:
rf = RandomForestRegressor()
rf_model = rf.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)
df_rf_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_rf_rmse

### GBM (Gradient Boosting Machines)

In [None]:
gbm = GradientBoostingRegressor()
gbm_model = gbm.fit(X_train, y_train)
y_pred = gbm_model.predict(X_test)
df_gbm_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_gbm_rmse

### XGBoost (Extreme Gradient Boosting)

In [None]:
xgb = XGBRegressor()
xgb_model = xgb.fit(X_train, y_train)
y_pred = xgb_model.predict(X_test)
df_xgb_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_xgb_rmse

### LightGBM

In [None]:
lgbm = LGBMRegressor()
lgbm_model = lgbm.fit(X_train, y_train)
y_pred = lgbm_model.predict(X_test)
df_lgbm_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_lgbm_rmse

### CatBoost (Category Boosting)

In [None]:
catb = CatBoostRegressor()
catb_model = catb.fit(X_train, y_train)
y_pred = catb_model.predict(X_test)
df_catb_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_catb_rmse

In [None]:
# Root Mean Squared Errors (RMSE) of the base models, computed in one loop
# CatBoostRegressor is not included in the loop because the run takes about 2 hours with it.
# I report it separately to save time

def compML(df, y, alg):
    model = alg().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    RMSE = np.sqrt(mean_squared_error(y_test, y_pred))
    model_name = alg.__name__
    print(model_name, "Model RMSE:", RMSE)

In [None]:
models = [LinearRegression, Ridge, Lasso, ElasticNet, KNeighborsRegressor, SVR, MLPRegressor, DecisionTreeRegressor, 
          RandomForestRegressor, GradientBoostingRegressor, XGBRegressor, LGBMRegressor] 

In [None]:
for model in models:
    compML(df, 'Salary', model)
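
A variant of the loop above returns the RMSE values instead of printing them, so that they can be sorted and reused. This is a sketch on synthetic data with three of the models; the helper name `rmse_of`, the data and the model selection are assumptions for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic linear data (illustrative only).
rng = np.random.default_rng(46)
X_toy = rng.normal(size=(200, 3))
y_toy = X_toy @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.3, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.20, random_state=46)

def rmse_of(alg, X_tr, y_tr, X_te, y_te):
    """Fit `alg` with its default settings and return the test RMSE."""
    model = alg().fit(X_tr, y_tr)
    return np.sqrt(mean_squared_error(y_te, model.predict(X_te)))

candidates = [LinearRegression, Ridge, DecisionTreeRegressor]
results = pd.Series(
    {alg.__name__: rmse_of(alg, X_tr, y_tr, X_te, y_te) for alg in candidates}
).sort_values()  # lowest RMSE first
print(results)
```

Passing the split explicitly, rather than reading it from global variables, keeps the helper usable with a different split.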

### Among the base machine learning models, the CatBoost (Category Boosting) model is the best at predicting salary, with an RMSE value of 258.90197830660554.

## Model Tuning

### Ridge Regression Model Tuning

In [None]:
# Hyperparameter optimization with cross-validation.
# We will try to tune the model by trying different alpha values.
# The default alpha value is 1.0 in Ridge regression.
# The best alpha value found will be used in the final model.

alpha = [0.1,0.01,0.001,0.2,0.3,0.5,0.8,0.9,1]
# The normalize argument was removed in scikit-learn 1.2; the features were already standardized above
ridreg_cv = RidgeCV(alphas = alpha, scoring = "neg_mean_squared_error", cv = 10)
ridreg_cv.fit(X_train, y_train)
ridreg_cv.alpha_

#Final Model 

ridreg_tuned = Ridge(alpha = ridreg_cv.alpha_).fit(X_train,y_train)
y_pred = ridreg_tuned.predict(X_test)
df_ridge_tuned_rmse = np.sqrt(mean_squared_error(y_test,y_pred))
df_ridge_tuned_rmse

### Lasso Regression Model Tuning

In [None]:
# Hyperparameter optimization with cross-validation.
# We will try to tune the model by trying different alpha values.
# The default alpha value is 1.0 in Lasso regression.
# The best alpha value found will be used in the final model.

alpha = [0.1,0.01,0.001,0.2,0.3,0.5,0.8,0.9,1]
# The normalize argument was removed in scikit-learn 1.2; the features were already standardized above
lasso_cv = LassoCV(alphas = alpha, cv = 10)
lasso_cv.fit(X_train, y_train)
lasso_cv.alpha_

# Final Model 

lasso_tuned = Lasso(alpha = lasso_cv.alpha_).fit(X_train,y_train)
y_pred = lasso_tuned.predict(X_test)
df_lasso_tuned_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_lasso_tuned_rmse

In [None]:
?Lasso

### Elastic Net Regression Model Tuning

In [None]:
?ElasticNet

In [None]:
# Hyperparameter optimization with cross-validation.
# We will try to tune the model by trying different alpha and l1_ratio values.
# The default alpha value is 1.0 and the default l1_ratio is 0.5 in ElasticNet regression.
# The best values found will be used in the final model.


enet_params = {"l1_ratio": [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1],
              "alpha":[0.1,0.01,0.001,0.2,0.3,0.5,0.8,0.9,1]}
enet = ElasticNet()
enet_model = enet.fit(X_train,y_train)
# The search is fitted on the training set only, so that the test set stays unseen during tuning
enet_cv = GridSearchCV(enet_model, enet_params, cv = 10).fit(X_train, y_train)
enet_cv.best_params_

#Final Model 

enet_tuned = ElasticNet(**enet_cv.best_params_).fit(X_train,y_train)
y_pred = enet_tuned.predict(X_test)
df_enet_tuned_rmse = np.sqrt(mean_squared_error(y_test,y_pred))
df_enet_tuned_rmse 

### KNN (K-Nearest Neighbors) Model Tuning

In [None]:
?knn

In [None]:
# n_neighbors : int, default=5 Number of neighbors to use by default for :meth:`kneighbors` queries.

knn_params = {"n_neighbors": np.arange(2,30,1)}
knn_cv_model = GridSearchCV(knn_model, knn_params, cv = 10).fit(X_train, y_train)
knn_cv_model.best_params_
knn_tuned = KNeighborsRegressor(**knn_cv_model.best_params_).fit(X_train, y_train)

# Final Model

y_pred = knn_tuned.predict(X_test)
df_knn_tuned_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_knn_tuned_rmse

In [None]:
knn_cv_model.best_params_

In [None]:
knn_cv_model.best_estimator_
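
When no `scoring` argument is given, `GridSearchCV` ranks the candidates with the estimator's own `score` method, which is R² for scikit-learn regressors. Since the models in this notebook are compared by RMSE, one option is to let the search optimize RMSE directly. The sketch below does this for KNN on synthetic data; the data is an assumption for the example, and the results here were not produced this way.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression problem (illustrative only).
rng = np.random.default_rng(46)
X_toy = rng.uniform(-3, 3, size=(150, 2))
y_toy = np.sin(X_toy[:, 0]) + 0.1 * rng.normal(size=150)

search = GridSearchCV(
    KNeighborsRegressor(),
    {"n_neighbors": np.arange(2, 30)},
    cv=10,
    scoring="neg_root_mean_squared_error",  # higher is better, hence the negative sign
)
search.fit(X_toy, y_toy)

cv_rmse = -search.best_score_  # flip the sign to get the cross-validated RMSE
print(search.best_params_, cv_rmse)
```

The same `scoring` argument can be added to the other `GridSearchCV` calls in this section.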

### SVR (Support Vector Regression) Model Tuning

In [None]:
?svr

In [None]:
# C : float, default=1.0 Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive.

svr_params = {'C': [0.01,0.001, 0.2, 0.1,0.5,0.8,0.9,1]}
svr_cv_model = GridSearchCV(svr_model, svr_params, cv = 5, n_jobs = -1, verbose =  2).fit(X_train, y_train)
svr_tuned = SVR(kernel='linear', **svr_cv_model.best_params_).fit(X_train, y_train)
y_pred = svr_tuned.predict(X_test)
df_svr_tuned_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_svr_tuned_rmse

In [None]:
svr_cv_model.best_params_

In [None]:
svr_cv_model.best_estimator_

### MLP (Multilayer Perceptron) Model Tuning
One of the Artificial Neural Network Models (ANN)

In [None]:
?mlp

In [None]:
# hidden_layer_sizes : tuple, length = n_layers - 2, default=(100,) The ith element represents the number of neurons in the ith hidden layer.
#alpha : float, default=0.0001

mlp_params = {"alpha": [0.1, 0.01, 0.02, 0.001, 0.0001], 
             "hidden_layer_sizes": [(10,20), (5,5), (100,100), (1000,100,10)]}
mlp_cv_model = GridSearchCV(mlp_model, mlp_params, cv = 10, verbose = 2, n_jobs = -1).fit(X_train, y_train)
mlp_tuned = MLPRegressor(**mlp_cv_model.best_params_).fit(X_train, y_train)
y_pred = mlp_tuned.predict(X_test)
df_mlp_tuned_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_mlp_tuned_rmse

In [None]:
mlp_cv_model.best_params_

In [None]:
mlp_cv_model.best_estimator_

### CART (Classification and Regression Trees) Model Tuning

In [None]:
?cart

* max_depth : int, default=None
    The maximum depth of the tree. If None, then nodes are expanded until
    all leaves are pure or until all leaves contain less than
    min_samples_split samples.

* min_samples_split : int or float, default=2
    The minimum number of samples required to split an internal node:

    - If int, then consider `min_samples_split` as the minimum number.
    - If float, then `min_samples_split` is a fraction and
      `ceil(min_samples_split * n_samples)` are the minimum
      number of samples for each split.

In [None]:
cart_params = {"max_depth": [2,3,4,5,10,20, 100, 1000],
              "min_samples_split": [2,5,10,30,50]}  # the duplicated value 10 is listed only once
cart_cv_model = GridSearchCV(cart_model, cart_params, cv = 10).fit(X_train, y_train)
cart_tuned = DecisionTreeRegressor(**cart_cv_model.best_params_).fit(X_train, y_train)
y_pred = cart_tuned.predict(X_test)
df_cart_tuned_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_cart_tuned_rmse

In [None]:
cart_cv_model.best_params_

In [None]:
cart_cv_model.best_estimator_

### Random Forests  Model Tuning

In [None]:
?RandomForestRegressor

- RandomForestRegressor default values
    - n_estimators=100
    - criterion='mse'
    - max_depth=None
    - min_samples_split=2
    - n_jobs=None
    - verbose=0
    

In [None]:
rf_params = {"max_depth": [5,8,10,None],
            "max_features": [2,5,10,15,17],
            "n_estimators": [100,200, 500, 1000],
            "min_samples_split": [2,5,10,20,30]}
rf_cv_model = GridSearchCV(rf_model, rf_params, cv = 10, n_jobs = -1, verbose = 2).fit(X_train, y_train)
rf_tuned = RandomForestRegressor(**rf_cv_model.best_params_).fit(X_train, y_train)
y_pred = rf_tuned.predict(X_test)
df_rf_tuned_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_rf_tuned_rmse

In [None]:
rf_cv_model.best_params_

In [None]:
rf_cv_model.best_estimator_

### GBM (Gradient Boosting Machines) Model Tuning

In [None]:
?gbm

Parameters:

- loss : {'ls', 'lad', 'huber', 'quantile'}, default='ls'

    loss function to be optimized. 'ls' refers to least squares
    regression. 'lad' (least absolute deviation) is a highly robust
    loss function solely based on order information of the input
    variables. 'huber' is a combination of the two. 'quantile'
    allows quantile regression (use `alpha` to specify the quantile).

- learning_rate : float, default=0.1

    learning rate shrinks the contribution of each tree by `learning_rate`.
    There is a trade-off between learning_rate and n_estimators.

- n_estimators : int, default=100

    The number of boosting stages to perform. Gradient boosting
    is fairly robust to over-fitting so a large number usually
    results in better performance.

- subsample : float, default=1.0

    The fraction of samples to be used for fitting the individual base
    learners. If smaller than 1.0 this results in Stochastic Gradient
    Boosting. `subsample` interacts with the parameter `n_estimators`.
    Choosing `subsample < 1.0` leads to a reduction of variance
    and an increase in bias.

- max_depth : int, default=3

    maximum depth of the individual regression estimators. The maximum
    depth limits the number of nodes in the tree. Tune this parameter
    for best performance; the best value depends on the interaction
    of the input variables.


In [None]:
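# This search takes a long time; running it with cv = 5 instead of cv = 10 shortens it.
# subsample must lie in (0, 1], so the invalid values 2 and 1.5 are left out of the grid.
# In scikit-learn >= 1.0 the losses "ls" and "lad" are named "squared_error" and "absolute_error".

gbm_params = {"learning_rate": [0.001,0.1,0.01, 0.05],
             "max_depth": [1,2,3,5,8,9,10],
             "n_estimators": [50,100,200,500,1000],
             "subsample": [1.0,0.4,0.5,0.7],
             "loss": ["squared_error","absolute_error","quantile"]}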
gbm_cv_model = GridSearchCV(gbm_model, gbm_params, cv = 10, n_jobs = -1, verbose = 2).fit(X_train, y_train)
gbm_tuned = GradientBoostingRegressor(**gbm_cv_model.best_params_).fit(X_train, y_train)                             
y_pred = gbm_tuned.predict(X_test)                             
df_gbm_tuned_rmse = np.sqrt(mean_squared_error(y_test, y_pred))                             
df_gbm_tuned_rmse      
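
The running time of a grid search grows with the product of the list lengths, multiplied by the number of folds. The sketch below counts the candidates for a grid shaped like the one above and draws a random subset of it, which is what `RandomizedSearchCV` evaluates when given the same dictionary and an `n_iter` value. The grid here is an illustrative assumption (only valid `subsample` values, no `loss` entry), and `n_iter=20` is an arbitrary choice.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Illustrative grid, modelled on the GBM search above.
grid = {"learning_rate": [0.001, 0.1, 0.01, 0.05],
        "max_depth": [1, 2, 3, 5, 8, 9, 10],
        "n_estimators": [50, 100, 200, 500, 1000],
        "subsample": [1.0, 0.4, 0.5, 0.7]}

n_candidates = len(ParameterGrid(grid))  # 4 * 7 * 5 * 4 combinations
n_fits = n_candidates * 10               # with cv = 10, one fit per fold

# A random subset of the grid; RandomizedSearchCV samples candidates the same way.
sampled = list(ParameterSampler(grid, n_iter=20, random_state=46))
print(n_candidates, n_fits, len(sampled))
```

With 20 sampled candidates and 10 folds the search needs 200 fits instead of 5600, at the cost of possibly missing the best combination in the grid.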

In [None]:
gbm_cv_model.best_params_

In [None]:
gbm_cv_model.best_estimator_

### XGBoost (Extreme Gradient Boosting) Model Tuning

In [None]:
?xgb

In [None]:

xgb_params = {"learning_rate": [0.1,0.01,0.5,0.7,0.8],
             "max_depth": [3,4,5,6,7,8],
             "n_estimators": [100,200,500,1000],
             "colsample_bytree": [0.5,0.7,0.8,0.9]}
xgb_cv_model  = GridSearchCV(xgb,xgb_params, cv = 10, n_jobs = -1, verbose = 2).fit(X_train, y_train)
xgb_tuned = XGBRegressor(**xgb_cv_model.best_params_).fit(X_train, y_train)
y_pred = xgb_tuned.predict(X_test)
df_xgb_tuned_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
df_xgb_tuned_rmse

In [None]:
xgb_cv_model.best_params_

In [None]:
xgb_cv_model.best_estimator_

### LightGBM Model Tuning

In [None]:
?lgbm

In [None]:
# Defaults: learning_rate = 0.1, n_estimators = 100, colsample_bytree = 1, max_depth = -1, n_jobs = -1
# max_depth must be an integer (-1 means no limit) and colsample_bytree must lie in (0, 1],
# so the invalid values are left out of the grid.

lgbm_params = {"learning_rate": [0.01,0.001, 0.1, 0.5, 1],
              "n_estimators": [50,80,100,200,500,1000],
              "max_depth": [-1,2,4,6,7,10],
              "colsample_bytree": [0.1,0.3,0.5,0.7,1]}
lgbm_cv_model = GridSearchCV(lgbm_model, lgbm_params, cv = 10, n_jobs = -1, verbose =2).fit(X_train, y_train)
lgbm_tuned = LGBMRegressor(**lgbm_cv_model.best_params_).fit(X_train, y_train)                              
y_pred = lgbm_tuned.predict(X_test)                              
df_lgbm_tuned_rmse = np.sqrt(mean_squared_error(y_test, y_pred))                              
df_lgbm_tuned_rmse  

In [None]:
lgbm_cv_model.best_params_

In [None]:
lgbm_cv_model.best_estimator_

### CatBoost (Category Boosting) Model Tuning

In [None]:
?catb

In [None]:
# I tried both 10 and 5 folds. Fitting with 5 folds (cv = 5) gave a better (lower) result: with 10 folds the RMSE value was 255, whereas with 5 folds it was 240.

catb_params = {"iterations": [200,500,100],
              "learning_rate": [0.01,0.1],
              "depth": [3,6,8]}
catb_cv_model = GridSearchCV(catb_model, catb_params, cv = 5, n_jobs = -1, verbose = 2).fit(X_train, y_train)
catb_tuned = CatBoostRegressor(**catb_cv_model.best_params_).fit(X_train, y_train)                            
y_pred = catb_tuned.predict(X_test) 
df_catb_tuned_rmse = np.sqrt(mean_squared_error(y_test, y_pred))                            
df_catb_tuned_rmse                            

In [None]:
catb_cv_model.best_params_

In [None]:
catb_cv_model.best_estimator_

### Comparison of the Results of the Base and Tuned Models

In [None]:

ComparableResults_df = pd.DataFrame({"LINEAR":[df_linreg_rmse],"RIDGE":[df_ridreg_rmse],"RIDGE TUNED":[df_ridge_tuned_rmse],
                             "LASSO":[df_lasreg_rmse],"LASSO TUNED":[df_lasso_tuned_rmse], 
                             "ELASTIC NET":[df_enet_rmse], "ELASTIC NET TUNED":[df_enet_tuned_rmse],
                             "KNN":[df_knn_rmse], "KNN TUNED":[df_knn_tuned_rmse],
                             "SVR":[df_svr_rmse], "SVR TUNED":[df_svr_tuned_rmse],
                             "MLP":[df_mlp_rmse], "MLP TUNED":[df_mlp_tuned_rmse],
                             "CART":[df_cart_rmse], "CART TUNED":[df_cart_tuned_rmse],
                             "RF":[df_rf_rmse], "RF TUNED":[df_rf_tuned_rmse],
                             "GBM":[df_gbm_rmse], "GBM TUNED":[df_gbm_tuned_rmse],
                             "XGBOOST":[df_xgb_rmse], "XGBOOST TUNED":[df_xgb_tuned_rmse],
                             "LightGBM":[df_lgbm_rmse], "LightGBM TUNED":[df_lgbm_tuned_rmse],
                             "CatBoost":[df_catb_rmse], "CatBoost TUNED":[df_catb_tuned_rmse]})

ComparableResults_df


In [None]:
ComparableResults_df.min(axis = 1, skipna = True)

In [None]:
ComparableResults_df.idxmin(axis=1)

In [None]:
ComparableResults_df.T
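
A one-row results table like the one above is easier to read when it is turned into a sorted column. The sketch below uses only the three RMSE values that are quoted in the text of this notebook; the remaining columns are omitted because their values are not shown here.

```python
import pandas as pd

# Only the RMSE values quoted in the text of this notebook.
reported = pd.DataFrame({"LINEAR": [382.00085575367274],
                         "CatBoost": [258.90197830660554],
                         "CatBoost TUNED": [240.560824]})

ranking = reported.iloc[0].sort_values()  # lowest RMSE first
best_model = ranking.index[0]
print(ranking)
print(best_model)
```

Applied to `ComparableResults_df`, the same two lines would rank all base and tuned models at once.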



## Results and Conclusion 



In this project, thirteen machine learning models were used to predict the salaries of US Major League Baseball players: Linear Regression, Ridge Regression, Lasso Regression, ElasticNet Regression, KNN (K-Nearest Neighbors), SVR (Support Vector Regression), MLP (Multilayer Perceptron), CART (Classification and Regression Trees), Random Forests, GBM (Gradient Boosting Machines), XGBoost (Extreme Gradient Boosting), LightGBM, and CatBoost (Category Boosting). For each model the root mean squared error (RMSE) was calculated on the test set. The RMSE is the square root of the mean squared difference between the predicted and the observed values, so it is expressed in the same units as Salary (thousands of dollars).

Hyperparameter optimization was then applied to the base models in an attempt to lower their RMSE values. CatBoost (Category Boosting) gave the lowest RMSE among both the base models (258.901978) and the tuned models (240.560824), so the tuned CatBoost model is the best model in this study. Even so, this error is large relative to the mean of the target variable (539.2295992217898): it amounts to roughly 45% of the mean salary.

In sum, the analyses and predictions indicate that the tuned CatBoost (Category Boosting) model is the best of the thirteen models for predicting a US Major League Baseball player's salary.
