# MODEL COMPARISON

In this notebook we compare the performance of several regression models on the same data. For each model we report the RMSE, MAE and R2 score, along with the time taken to fit and predict.

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

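# Load the data; the first CSV column is used as the row index.
data = pd.read_csv('csv/person_12.csv',index_col=0)
# Every column whose name contains 'RESP', except the target ' RESP' (note the leading space).
resp = [i for i in data.columns if 'RESP' in i and i!=' RESP']
# Drop those RESP-named columns together with the two time columns.
data.drop(resp+['Time [s]','sec'],axis=1,inplace=True)
data.head()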

| | RESP | PLETH | V | AVR | II | HR | PULSE | SpO2 | PLETH_Min | V_Min | ... | AVR_Mean | II_Mean | PLETH_Kurt | V_Kurt | AVR_Kurt | II_Kurt | PLETH_Skw | V_Skw | AVR_Skw | II_Skw |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.25806 | 0.59531 | 0.72157 | 0.85938 | -0.058594 | 93 | 92 | 96 | 0.37732 | 0.07451 | ... | 0.819469 | -0.024312 | -1.087751 | 10.346496 | 13.159851 | 13.949057 | 0.482318 | -3.057577 | -3.335702 | 3.582299 |
| 1 | 0.26393 | 0.59042 | 0.69608 | 0.69531 | -0.029297 | 93 | 92 | 96 | 0.37732 | 0.07451 | ... | 0.819469 | -0.024312 | -1.087751 | 10.346496 | 13.159851 | 13.949057 | 0.482318 | -3.057577 | -3.335702 | 3.582299 |
| 2 | 0.26979 | 0.58358 | 0.7 | 0.45508 | 0.17969 | 93 | 92 | 96 | 0.37732 | 0.07451 | ... | 0.819469 | -0.024312 | -1.087751 | 10.346496 | 13.159851 | 13.949057 | 0.482318 | -3.057577 | -3.335702 | 3.582299 |
| 3 | 0.27566 | 0.57771 | 0.32941 | 0.041016 | 0.84375 | 93 | 92 | 96 | 0.37732 | 0.07451 | ... | 0.819469 | -0.024312 | -1.087751 | 10.346496 | 13.159851 | 13.949057 | 0.482318 | -3.057577 | -3.335702 | 3.582299 |
| 4 | 0.2825 | 0.57283 | 0.078431 | -0.099609 | 1.3184 | 93 | 92 | 96 | 0.37732 | 0.07451 | ... | 0.819469 | -0.024312 | -1.087751 | 10.346496 | 13.159851 | 13.949057 | 0.482318 | -3.057577 | -3.335702 | 3.582299 |


Next we standardize the predictor columns with `StandardScaler`, so that each feature has zero mean and unit variance.

In [7]:
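from sklearn.preprocessing import StandardScaler
SS = StandardScaler()
# Predictors are every column except the target ' RESP' (note the leading space in the name).
X = data.drop(' RESP',axis=1)
y = data[' RESP'].values
# Learn each column's mean and standard deviation, then rescale X to zero mean and unit variance.
SS.fit(X)
X = SS.transform(X)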
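
The cell above fits the scaler on every row, including rows that later end up in the test split. One common alternative, sketched here on synthetic data rather than the notebook's CSV, is to wrap the scaler and the model in a pipeline so that the scaling statistics are learned from the training rows only. The array shapes, the choice of `Ridge` and all `*_demo` names are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's predictors and target (assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
y_demo = X_demo @ np.array([0.5, -0.2, 0.1, 0.0]) + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=10
)

# fit() learns the scaler's mean and variance from X_tr only,
# then fits Ridge on the scaled training rows.
pipe = make_pipeline(StandardScaler(), Ridge())
pipe.fit(X_tr, y_tr)

# At prediction time the training-set statistics are reused on X_te.
scaler = pipe.named_steps['standardscaler']
train_scaled_mean = scaler.transform(X_tr).mean(axis=0)
test_r2 = pipe.score(X_te, y_te)
print(train_scaled_mean.round(6), test_r2)
```

With this layout the test rows never influence the scaler, so the test metrics are a cleaner estimate of performance on unseen data.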

The next function compares several regression techniques, each with scikit-learn's default hyperparameters (no tuning):

* Linear Regression
* Elastic Net
* Bayesian Ridge
* Lasso
* Ridge
* KNN regression
* Random Forest regression

In [8]:
import time
import numpy as np
from sklearn.metrics import r2_score,mean_squared_error,mean_absolute_error
from sklearn.linear_model import Lasso,Ridge,ElasticNet, BayesianRidge, LinearRegression
from sklearn import neighbors
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

models = {'OLS':LinearRegression(),'ElasticNet':ElasticNet(),
          'BayesianRidge':BayesianRidge(),'Lasso':Lasso(),
         'Ridge':Ridge(),'KNN':neighbors.KNeighborsRegressor(),
         'rff':RandomForestRegressor()}




def model_performance(X,y):
    """Fit each model in `models` on a 70/30 split and tabulate test-set metrics."""
    times =[]
    keys = []
    root_mean_squared_errors = []
    mean_abs_error = []
    R2_scores = []
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)

    for k,v in models.items():
        model = v
        # Time fitting and prediction separately, then report their sum in seconds.
        t0=time.time()
        model.fit(X_train, y_train)
        train_time = time.time()-t0
        t1 = time.time()
        pred = model.predict(X_test)
        predict_time = time.time()-t1
        times.append(train_time+predict_time)
        R2_scores.append(r2_score(y_test,pred))
        # RMSE is the square root of the mean squared error.
        root_mean_squared_errors.append(np.sqrt(mean_squared_error(y_test,pred)))
        mean_abs_error.append(mean_absolute_error(y_test,pred))
        keys.append(k)
    table = pd.DataFrame({'model':keys, 'RMSE':root_mean_squared_errors,'MAE':mean_abs_error,'R2 score':R2_scores,'time':times})
    return table

model_performance(X,y)

| | MAE | R2 score | RMSE | model | time |
|---|---|---|---|---|---|
| 0 | 0.164318 | 0.538351 | 0.203807 | OLS | 0.033288 |
| 1 | 0.263664 | -1.2e-05 | 0.299962 | ElasticNet | 0.020659 |
| 2 | 0.164326 | 0.538362 | 0.203805 | BayesianRidge | 0.061279 |
| 3 | 0.263664 | -1.2e-05 | 0.299962 | Lasso | 0.019615 |
| 4 | 0.16432 | 0.538354 | 0.203807 | Ridge | 0.012678 |
| 5 | 0.076365 | 0.824398 | 0.125698 | KNN | 5.018693 |
| 6 | 0.065084 | 0.860567 | 0.112007 | rff | 3.19171 |


The table reports, for each model, the mean absolute error (MAE), R2 score and root mean squared error (RMSE) on the 30% test split, along with the combined time for fitting and prediction in seconds.
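
As a reminder of what the three metric columns measure, the following sketch computes MAE, RMSE and R2 directly from their definitions and compares them with scikit-learn's functions. The values in `y_true` and `y_pred` are invented for illustration and are unrelated to the notebook's data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions (assumption, not notebook data).
y_true = np.array([0.2, 0.4, 0.6, 0.8])
y_pred = np.array([0.25, 0.35, 0.65, 0.70])

residuals = y_true - y_pred

# MAE: average absolute residual.
mae = np.mean(np.abs(residuals))
# RMSE: square root of the average squared residual.
rmse = np.sqrt(np.mean(residuals ** 2))
# R2: one minus the residual sum of squares over the total sum of squares.
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# Cross-check against scikit-learn.
print(np.isclose(mae, mean_absolute_error(y_true, y_pred)))
print(np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred))))
print(np.isclose(r2, r2_score(y_true, y_pred)))
```

An R2 of 0 corresponds to always predicting the mean of `y_true`, and a negative R2 means the model does worse than that baseline.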

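The random forest regressor (`rff`) performs best, with the highest R2 (0.861) and the lowest RMSE (0.112) and MAE (0.065). KNN comes second (R2 0.824), although it is also the slowest at about 5 seconds. OLS, Ridge and Bayesian Ridge are nearly indistinguishable at R2 ≈ 0.538, while Lasso and ElasticNet score an R2 of about zero, which is no better than predicting the mean.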
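
The R2 of roughly zero for Lasso and ElasticNet in the table is consistent with the default penalty (`alpha=1.0`) shrinking every coefficient to zero, which leaves a model that predicts the training mean. The notebook does not inspect the coefficients, so this is a plausible explanation rather than a confirmed one. The sketch below reproduces the effect on synthetic, roughly standardized features with a small-scale target; all data and `*_demo` names are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic data: unit-variance features, target with a small spread (assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 5))
y_demo = 0.5 + 0.1 * X_demo[:, 0] + rng.normal(scale=0.05, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=10
)

strong = Lasso().fit(X_tr, y_tr)            # default alpha=1.0
weak = Lasso(alpha=0.001).fit(X_tr, y_tr)   # much lighter penalty

# With the default penalty every coefficient is driven to zero,
# so the model can only predict its intercept.
print(strong.coef_)
print(strong.score(X_te, y_te), weak.score(X_te, y_te))
```

If the same thing is happening in the notebook, trying a smaller `alpha` for these two models would be a natural first tuning step.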