<h1>Comparison of Different Models</h1>

In this notebook we compare the performance of several regression models. For each model we report the RMSE, MAE and R2 score, along with the combined training and prediction time.
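
For reference, the error metrics can be written out directly with NumPy. This is a minimal sketch on made-up toy values (not taken from the dataset); the helper name `regression_metrics` is hypothetical and is not used elsewhere in this notebook. The formulas are the standard definitions that scikit-learn's `mean_squared_error`, `mean_absolute_error` and `r2_score` implement.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R2 for two equally sized 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = np.mean(residuals ** 2)        # mean squared error
    rmse = np.sqrt(mse)                  # same units as the target
    mae = np.mean(np.abs(residuals))     # less sensitive to large errors

    ss_res = np.sum(residuals ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"RMSE": rmse, "MAE": mae, "R2": r2}

# Toy example: three perfect predictions and one that is off by 2.
# RMSE = 1.0, MAE = 0.5, R2 ≈ 0.2
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

Because RMSE squares the residuals before averaging, the single large error dominates it, while MAE weights all errors equally.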

In [2]:
import numpy as np
import pandas as pd
from tqdm import tqdm
import time
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, KFold
from sklearn.linear_model import Lasso,Ridge, ElasticNet, BayesianRidge, LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from xgboost import XGBRegressor
import warnings
from joblib import dump, load
from os import listdir, mkdir
from os.path import isdir, isfile, join
warnings.filterwarnings('ignore')

In [2]:
data = pd.read_csv('person_csvs/all_people.csv', index_col=0)
resp = [i for i in data.columns if 'RESP' in i and i != ' RESP']
data.drop(resp + ['Time [s]', 'sec'], axis=1, inplace=True)

In [3]:
data.head()

Unnamed: 0,RESP,PLETH,V,AVR,II,HR,PULSE,SpO2,PLETH_Min,V_Min,...,AVR_Mean,II_Mean,PLETH_Kurt,V_Kurt,AVR_Kurt,II_Kurt,PLETH_Skw,V_Skw,AVR_Skw,II_Skw
0,0.23088,2.7449,0.55518,0.70528,0.060059,96.0,102.0,99.0,1.0098,0.49023,...,0.598002,0.197546,-0.512857,14.546472,11.03829,10.266315,0.68131,3.618766,-2.722894,2.806052
1,0.23675,2.8524,0.5498,0.70039,0.064941,96.0,102.0,99.0,1.0098,0.49023,...,0.598002,0.197546,-0.512857,14.546472,11.03829,10.266315,0.68131,3.618766,-2.722894,2.806052
2,0.24114,2.913,0.49512,0.70528,0.055176,96.0,102.0,99.0,1.0098,0.49023,...,0.598002,0.197546,-0.512857,14.546472,11.03829,10.266315,0.68131,3.618766,-2.722894,2.806052
3,0.24579,2.9267,0.5249,0.52493,0.3999,96.0,102.0,99.0,1.0098,0.49023,...,0.598002,0.197546,-0.512857,14.546472,11.03829,10.266315,0.68131,3.618766,-2.722894,2.806052
4,0.24921,2.8974,0.62988,0.069892,1.145,96.0,102.0,99.0,1.0098,0.49023,...,0.598002,0.197546,-0.512857,14.546472,11.03829,10.266315,0.68131,3.618766,-2.722894,2.806052


Here we standardize the predictor features (zero mean, unit variance) using scikit-learn's StandardScaler.

In [4]:
SS = StandardScaler()
X = data.drop(' RESP', axis=1)
SS.fit(X, y = None)
y = data[' RESP'].values
X = SS.transform(X)
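
Note that the cell above fits the scaler on all rows, so the means and variances it learns include rows that later end up in the test split. One possible alternative, sketched here on synthetic data, is to wrap the scaler and the model in a pipeline so that the scaler is fit on the training rows only. The names `X_demo` and `y_demo` are placeholders, and `Ridge` merely stands in for any of the regressors compared in this notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the notebook's dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

# Split first, using the same proportions and seed as model_performance.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=10
)

# The pipeline fits StandardScaler on X_train only, then fits the model
# on the scaled training data.
pipe = make_pipeline(StandardScaler(), Ridge())
pipe.fit(X_train, y_train)

# At prediction time the training-set statistics are reused for X_test.
score = pipe.score(X_test, y_test)  # R2 on the held-out rows
scaler = pipe.named_steps["standardscaler"]
```

With this arrangement the test rows never influence the scaling parameters, which keeps the held-out scores closer to what one would see on unseen data.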

We experimented with all of the models shown below. The RandomForestRegressor proved to be the best. Uncomment as many models as you wish to train.
The code to save these models can also be found in the cell below. As the later models are rather large (e.g. KNN is ~850MB), it is commented out. If you do wish to use this feature, make sure you have enough disk space.
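
As a rough illustration of the save step, the sketch below dumps a small model with joblib, checks its size on disk, and loads it back. The temporary directory, the toy data and the `compress=3` setting are assumptions made for this example; the notebook's own (commented-out) code writes uncompressed files to `saved_models/`.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression

# Toy model so the example runs quickly (assumption: not the notebook's data).
X_demo = np.arange(10.0).reshape(-1, 1)
y_demo = 2 * X_demo.ravel() + 1
model = LinearRegression().fit(X_demo, y_demo)

# Write into a temporary directory rather than the working directory.
out_dir = os.path.join(tempfile.mkdtemp(), "saved_models")
os.makedirs(out_dir, exist_ok=True)
path = os.path.join(out_dir, "OLS_joblib")

# compress trades CPU time for a smaller file; 0 (the default) means none.
dump(model, path, compress=3)
size_bytes = os.path.getsize(path)

# Reload and confirm the restored model predicts identically.
restored = load(path)
same = np.allclose(restored.predict(X_demo), model.predict(X_demo))
```

Checking `size_bytes` after the first dump is a cheap way to estimate how much space the larger models would need before saving all of them.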

In [5]:
models = {
#           'OLS':LinearRegression(),
#           'ElasticNet':ElasticNet(),
#           'BayesianRidge':BayesianRidge(),
#           'Lasso':Lasso(),
#           'Ridge':Ridge(),
#           'KNN':KNeighborsRegressor(),
          'RFF':RandomForestRegressor(),
#           'Ada': AdaBoostRegressor(),
#           'XGB':XGBRegressor()
         }

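def model_performance(X, y):
    """Fit every model in the global `models` dict on a 70/30 train/test split
    and return a table of RMSE, MAE, R2 score and fit + predict time in seconds."""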
    times = []
    keys = []
    mean_squared_errors = []
    mean_abs_error = []
    r2_scores = []
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 10)

    for k, v in tqdm(models.items()):
        model = v
        t0 = time.time()
        model.fit(X_train, y_train)
        train_time = time.time()-t0
        t1 = time.time()
        pred = model.predict(X_test)
        predict_time = time.time() - t1
        pred = pd.Series(pred)
        Time_total = train_time + predict_time
        times.append(Time_total)
        r2_scores.append(r2_score(y_test,pred))
        mean_squared_errors.append(mean_squared_error(y_test,pred))
        mean_abs_error.append(mean_absolute_error(y_test,pred))
        keys.append(k)
        
#         if not isdir("saved_models"):
#             mkdir("saved_models") 
#         dump(model, "saved_models/"+k+'_joblib')
        
    table = pd.DataFrame({
                            'model':keys, 
                            'RMSE':mean_squared_errors, 
                            'MAE':mean_abs_error, 
                            'R2 score':r2_scores, 
                            'time':times
                        })
    
    # The 'RMSE' column holds MSE values at this point; take the square root.
    table['RMSE'] = np.sqrt(table['RMSE'])
    
    return table

In [3]:
model_performance(X, y)
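
The table above comes from a single 70/30 split, so each score depends on which rows happened to land in the test set. `KFold` is imported at the top of the notebook but not used; the sketch below shows one way it could give a mean and spread for the RMSE instead. The synthetic data, the five folds and the small `n_estimators` value are assumptions chosen to keep the example fast, not settings taken from this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic stand-in data (assumption: not the notebook's dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=150)

kf = KFold(n_splits=5, shuffle=True, random_state=10)
fold_rmse = []

for train_idx, test_idx in kf.split(X_demo):
    # A fresh model per fold, so no fitted state carries over between folds.
    model = RandomForestRegressor(n_estimators=20, random_state=10)
    model.fit(X_demo[train_idx], y_demo[train_idx])
    pred = model.predict(X_demo[test_idx])
    fold_rmse.append(np.sqrt(mean_squared_error(y_demo[test_idx], pred)))

mean_rmse = float(np.mean(fold_rmse))
std_rmse = float(np.std(fold_rmse))
```

Reporting `mean_rmse` together with `std_rmse` makes it easier to tell whether the gap between two models is larger than the fold-to-fold variation.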