# Modeling 

In this notebook, we will be predicting the different UPDRS scores for the patients at each time point. We will be using the chosen protein and peptide abundances for the predictions. We will be using two different models light gbm and SVM, and testing which one works best and gives the most optimal results. The reason that we chose light gbm rather than traditional gradient boosting models or random forest is its faster training time and higher accuracy. 

Load the libraries 

In [1]:
import warnings
warnings.filterwarnings('ignore')
import numpy as np
from numpy import arange
import pandas as pd
import lightgbm
from lightgbm import LGBMClassifier
from bayes_opt import BayesianOptimization
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.svm import SVC


Load the training and test datasets 

In [2]:
X_train=pd.read_csv("X_train.csv",index_col=0)
y_train=pd.read_csv('y_train.csv',index_col=0)
X_test=pd.read_csv("X_test.csv")
y_test=pd.read_csv('y_test.csv')


Load the selected protein and peptide abundances which are important for each of the UPDRS scores based on the boruta algorithm.

In [3]:
features_UPDRS1=pd.read_csv("features_UPDRS1",header=None)
features_UPDRS2=pd.read_csv("features_UPDRS2",header=None)
features_UPDRS3=pd.read_csv("features_UPDRS3",header=None)
features_UPDRS4=pd.read_csv("features_UPDRS4",header=None)

**We will be modelling each of the developed UPDRS scores seperately with a light gbm model and doing hyper parameter tuning to get the parameters with the best cross validation. We will be using bayesian optimization for hyperparameter tuning when using lightgbm  using five fold cross validation.

**We will first be selecting the UPDRS 1 features only for X_train and then predicting the UPDRS 1 scores

In [4]:
X_train_UPDRS1=X_train[features_UPDRS1.iloc[:,0].tolist()]

In [5]:
scaler = StandardScaler()
X_tr_scaled_UPDRS1 = scaler.fit_transform(X_train_UPDRS1)

In [6]:
def lgb_eval(num_leaves,max_depth,lambda_l2,lambda_l1,min_child_samples, min_data_in_leaf):
    params = {
        "objective" : "regression",
        "metric" : "RMSE",
        'is_unbalance': True,
        "num_leaves" : int(num_leaves),
        "max_depth" : int(max_depth),
        "lambda_l2" : lambda_l2,
        "lambda_l1" : lambda_l1,
        "num_threads" : 20,
        "min_child_samples" : int(min_child_samples),
        'min_data_in_leaf': int(min_data_in_leaf),
        "learning_rate" : 0.03,
        "subsample_freq" : 5,
        "verbosity" : -1
    }
    lgtrain = lightgbm.Dataset(X_tr_scaled_UPDRS1, y_train.updrs_1)
    cv_result = lightgbm.cv(params,
                       lgtrain,
                        num_boost_round=100,
                       stratified=False,
    callbacks=[
      lightgbm.early_stopping(stopping_rounds=1000),
    ], nfold=3)
    return cv_result['valid rmse-mean'][-1]

Apply the Bayesian optimizer to the function we created in the previous step to identify the best hyperparameters. We will run 10 iterations and set init_points = 2.

In [7]:
lgbBO = BayesianOptimization(lgb_eval,{'num_leaves': (25, 4000),
                                                'max_depth': (5, 63),
                                                'lambda_l2': (0.0, 0.05),
                                                'lambda_l1': (0.0, 0.05),
                                                'min_child_samples': (50, 10000),
                                                'min_data_in_leaf': (100, 2000)
                                                })

lgbBO.maximize(n_iter=10, init_points=2)

|   iter    |  target   | lambda_l1 | lambda_l2 | max_depth | min_ch... | min_da... | num_le... |
-------------------------------------------------------------------------------------------------
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 5.32832 + 0.0533255
| [0m1        [0m | [0m5.328    [0m | [0m0.03547  [0m | [0m0.0148   [0m | [0m12.41    [0m | [0m1.285e+03[0m | [0m1.85e+03 [0m | [0m3.875e+03[0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 5.32832 + 0.0533255
| [0m2        [0m | [0m5.328    [0m | [0m0.01732  [0m | [0m0.02653  [0m | [0m62.94    [0m | [0m2.767e+03[0m | [0m1.541e+03[0m | [0m172.6    [0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 5.32832 + 0.0533255
| [0m3     

We would like to see the best parameters and the mean RMSE for the training dataset prediction of UPDRS_1

In [9]:
lgbBO.max

{'target': 5.32832278411098,
 'params': {'lambda_l1': 0.03546828075193804,
  'lambda_l2': 0.0148034797957523,
  'max_depth': 12.405488064426054,
  'min_child_samples': 1284.883105835797,
  'min_data_in_leaf': 1850.1081453022489,
  'num_leaves': 3874.8351282247586}}

After cross validation the RMSE  5.32. We can see if adding the minimum visit month difference and visit_month can improve the RMSE.


We have to convert the visit_month_difference NA which is for the visit month 0, which should be converted to 0 before using it as a feature


In [12]:
X_train["visit_month_diff_min"] = X_train["visit_month_diff_min"].fillna(0)

In [13]:
X_train_UPDRS1=X_train[features_UPDRS1.iloc[:,0].tolist()+["visit_month_diff_min","visit_month"]]

X_tr_scaled_UPDRS1 = scaler.fit_transform(X_train_UPDRS1)

def lgb_eval(num_leaves,max_depth,lambda_l2,lambda_l1,min_child_samples, min_data_in_leaf):
    params = {
        "objective" : "regression",
        "metric" : "RMSE", 
        'is_unbalance': True,
        "num_leaves" : int(num_leaves),
        "max_depth" : int(max_depth),
        "lambda_l2" : lambda_l2,
        "lambda_l1" : lambda_l1,
        "num_threads" : 20,
        "min_child_samples" : int(min_child_samples),
        'min_data_in_leaf': int(min_data_in_leaf),
        "learning_rate" : 0.03,
        "subsample_freq" : 5,
        "verbosity" : -1
    }
    lgtrain = lightgbm.Dataset(X_tr_scaled_UPDRS1, y_train.updrs_1)
    cv_result = lightgbm.cv(params,
                       lgtrain,
                        num_boost_round=100,
                       stratified=False,
    callbacks=[
      lightgbm.early_stopping(stopping_rounds=1000),
    ], nfold=3)
    return cv_result['valid rmse-mean'][-1]
lgbBO = BayesianOptimization(lgb_eval,{'num_leaves': (25, 4000),
                                                'max_depth': (5, 63),
                                                'lambda_l2': (0.0, 0.05),
                                                'lambda_l1': (0.0, 0.05),
                                                'min_child_samples': (50, 10000),
                                                'min_data_in_leaf': (100, 2000)
                                                })

lgbBO.maximize(n_iter=10, init_points=2)


|   iter    |  target   | lambda_l1 | lambda_l2 | max_depth | min_ch... | min_da... | num_le... |
-------------------------------------------------------------------------------------------------
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 5.32832 + 0.0533255
| [0m1        [0m | [0m5.328    [0m | [0m0.007227 [0m | [0m0.03252  [0m | [0m7.724    [0m | [0m4.666e+03[0m | [0m1.08e+03 [0m | [0m1.822e+03[0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[99]	cv_agg's valid rmse: 5.06122 + 0.0238773
| [0m2        [0m | [0m5.061    [0m | [0m0.01674  [0m | [0m0.02403  [0m | [0m53.81    [0m | [0m2.239e+03[0m | [0m181.8    [0m | [0m664.1    [0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[100]	cv_agg's valid rmse: 5.12496 + 0.00576576
| [0m3 

Let us look at the best RMSE based on the best hyperparameters

In [14]:
lgbBO.max

{'target': 5.32832278411098,
 'params': {'lambda_l1': 0.007226950545954053,
  'lambda_l2': 0.03251507849571255,
  'max_depth': 7.723790898095893,
  'min_child_samples': 4665.753700337947,
  'min_data_in_leaf': 1080.472603537603,
  'num_leaves': 1822.339242886823}}

It does not seem to improve the RMSE to add the visit min difference and visit id. 

Let us add the RMSE values for the light gbm model for all the scores so that we can compare with SVM. 

In [15]:
RMSE_lightgbm={"UPDRS_1":lgbBO.max["target"]}

Let me see if the same model with UPDRS associated boruta associated features along with adding visit min difference and visit id works well for cross validation for the other UPDRS scores. Although adding visit min difference and visit id did not effect the RMSE in cross validation, adding the features did not cause any decrease in performance so we will continue to add them for looking at the other scores.

In [16]:
X_train_UPDRS2=X_train[features_UPDRS2.iloc[:,0].tolist()+["visit_month_diff_min","visit_month"]]

In [17]:
X_tr_scaled_UPDRS2 = scaler.fit_transform(X_train_UPDRS2)

In [18]:

def lgb_eval(num_leaves,max_depth,lambda_l2,lambda_l1,min_child_samples, min_data_in_leaf):
    params = {
        "objective" : "regression",
        "metric" : "RMSE",
        'is_unbalance': True,
        "num_leaves" : int(num_leaves),
        "max_depth" : int(max_depth),
        "lambda_l2" : lambda_l2,
        "lambda_l1" : lambda_l1,
        "num_threads" : 20,
        "min_child_samples" : int(min_child_samples),
        'min_data_in_leaf': int(min_data_in_leaf),
        "learning_rate" : 0.03,
        "subsample_freq" : 5,
        "verbosity" : -1
    }
    lgtrain = lightgbm.Dataset(X_tr_scaled_UPDRS2, y_train.updrs_2)
    cv_result = lightgbm.cv(params,
                       lgtrain,
                        num_boost_round=100,
                       stratified=False,
    callbacks=[
      lightgbm.early_stopping(stopping_rounds=1000),
    ], nfold=3)
    return cv_result['valid rmse-mean'][-1]
lgbBO = BayesianOptimization(lgb_eval,{'num_leaves': (25, 4000),
                                                'max_depth': (5, 63),
                                                'lambda_l2': (0.0, 0.05),
                                                'lambda_l1': (0.0, 0.05),
                                                'min_child_samples': (50, 10000),
                                                'min_data_in_leaf': (100, 2000)
                                                })

lgbBO.maximize(n_iter=10, init_points=2)


|   iter    |  target   | lambda_l1 | lambda_l2 | max_depth | min_ch... | min_da... | num_le... |
-------------------------------------------------------------------------------------------------
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 5.92679 + 0.225728
| [0m1        [0m | [0m5.927    [0m | [0m0.03546  [0m | [0m0.04793  [0m | [0m61.79    [0m | [0m7.081e+03[0m | [0m790.0    [0m | [0m626.1    [0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 5.92679 + 0.225728
| [0m2        [0m | [0m5.927    [0m | [0m0.04854  [0m | [0m0.03748  [0m | [0m18.67    [0m | [0m5.896e+03[0m | [0m1.564e+03[0m | [0m2.708e+03[0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[100]	cv_agg's valid rmse: 5.14934 + 0.2619
| [0m3        

In [19]:
lgbBO.max

{'target': 5.926793395486719,
 'params': {'lambda_l1': 0.035457163492870536,
  'lambda_l2': 0.047927783077294145,
  'max_depth': 61.791820568227806,
  'min_child_samples': 7080.577155513798,
  'min_data_in_leaf': 789.9894378533228,
  'num_leaves': 626.1395586289303}}

In [20]:
RMSE_lightgbm["UPDRS_2"]=lgbBO.max["target"]


The best RMSE for lightgb, is 5.93 for predicting UPDRS2

Let us see the performance of light gbm with UPDRS3. We will be using the upd23b_clinical_state_on_medication as a feature as well along with the important protein/peptide abundance scores and visit_month and minimum visit_difference. upd23b_clinical_state_on_medication  is supposed to effect the UPDRS3 scores according to the initially provided information.
Since our previous notebooks showed that upd23b_clinical_state_on_medication had many missing values and this itself may have some significance, we can replace missing values with a value like “Unknown” or “Missing” using the fillna() method. 

In [21]:
X_train["upd23b_clinical_state_on_medication"] = X_train.upd23b_clinical_state_on_medication.fillna("Unknown")

In [22]:
X_train_UPDRS3=X_train[features_UPDRS3.iloc[:,0].tolist()+["visit_month_diff_min","visit_month","upd23b_clinical_state_on_medication"]]

We need to transform the UPDRS3 scores as we have both numerical and categorical variables

In [23]:
numeric_columns = X_train_UPDRS3.select_dtypes(include=['int64', 'float64']).columns
categorical_columns =X_train_UPDRS3.select_dtypes(include=['object', 'bool']).columns

pipeline=ColumnTransformer([
    ('num',StandardScaler(),numeric_columns),
    ('cat',OneHotEncoder(),categorical_columns),
])

X_tr_scaled_UPDRS3=pipeline.fit_transform(X_train_UPDRS3)

In [23]:
def lgb_eval(num_leaves,max_depth,lambda_l2,lambda_l1,min_child_samples, min_data_in_leaf):
    params = {
        "objective" : "regression",
        "metric" : "RMSE", ""
        'is_unbalance': True,
        "num_leaves" : int(num_leaves),
        "max_depth" : int(max_depth),
        "lambda_l2" : lambda_l2,
        "lambda_l1" : lambda_l1,
        "num_threads" : 20,
        "min_child_samples" : int(min_child_samples),
        'min_data_in_leaf': int(min_data_in_leaf),
        "learning_rate" : 0.03,
        "subsample_freq" : 5,
        "verbosity" : -1
    }
    lgtrain = lightgbm.Dataset(X_tr_scaled_UPDRS3, y_train.updrs_3)
    cv_result = lightgbm.cv(params,
                       lgtrain,
                        num_boost_round=100,
                       stratified=False,
    callbacks=[
      lightgbm.early_stopping(stopping_rounds=1000),
    ], nfold=3)
    return cv_result['valid rmse-mean'][-1]
lgbBO = BayesianOptimization(lgb_eval,{'num_leaves': (25, 4000),
                                                'max_depth': (5, 63),
                                                'lambda_l2': (0.0, 0.05),
                                                'lambda_l1': (0.0, 0.05),
                                                'min_child_samples': (50, 10000),
                                                'min_data_in_leaf': (100, 2000)
                                                })

lgbBO.maximize(n_iter=10, init_points=2)


|   iter    |  target   | lambda_l1 | lambda_l2 | max_depth | min_ch... | min_da... | num_le... |
-------------------------------------------------------------------------------------------------
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 14.9414 + 0.321264
| [0m1        [0m | [0m14.94    [0m | [0m0.007741 [0m | [0m0.01431  [0m | [0m18.62    [0m | [0m9.365e+03[0m | [0m735.6    [0m | [0m3.467e+03[0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 14.9414 + 0.321264
| [0m2        [0m | [0m14.94    [0m | [0m0.002207 [0m | [0m0.01569  [0m | [0m61.78    [0m | [0m1.349e+03[0m | [0m1.714e+03[0m | [0m730.1    [0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[100]	cv_agg's valid rmse: 11.2784 + 0.458659
| [0m3      

In [119]:
X_train_UPDRS4=X_train[["C(UniMod_4)AEENC(UniMod_4)FIQK",
"LDEVKEQVAEVR","visit_month_diff_min","visit_month"]]

X_tr_scaled_UPDRS4 = scaler.fit_transform(X_train_UPDRS4)

def lgb_eval(num_leaves,max_depth,lambda_l2,lambda_l1,min_child_samples, min_data_in_leaf):
    params = {
        "objective" : "regression",
        "metric" : "RMSE", ""
        'is_unbalance': True,
        "num_leaves" : int(num_leaves),
        "max_depth" : int(max_depth),
        "lambda_l2" : lambda_l2,
        "lambda_l1" : lambda_l1,
        "num_threads" : 20,
        "min_child_samples" : int(min_child_samples),
        'min_data_in_leaf': int(min_data_in_leaf),
        "learning_rate" : 0.03,
        "subsample_freq" : 5,
        "verbosity" : -1
    }
    lgtrain = lightgbm.Dataset(X_tr_scaled_UPDRS4, y_train.updrs_4)
    cv_result = lightgbm.cv(params,
                       lgtrain,
                        num_boost_round=100,
                       stratified=False,
    callbacks=[
      lightgbm.early_stopping(stopping_rounds=1000),
    ], nfold=3)
    return cv_result['valid rmse-mean'][-1]
lgbBO = BayesianOptimization(lgb_eval,{'num_leaves': (25, 4000),
                                                'max_depth': (5, 63),
                                                'lambda_l2': (0.0, 0.05),
                                                'lambda_l1': (0.0, 0.05),
                                                'min_child_samples': (50, 10000),
                                                'min_data_in_leaf': (100, 2000)
                                                })

lgbBO.maximize(n_iter=10, init_points=2)


|   iter    |  target   | lambda_l1 | lambda_l2 | max_depth | min_ch... | min_da... | num_le... |
-------------------------------------------------------------------------------------------------
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 2.4313 + 0.216878
| [0m1        [0m | [0m2.431    [0m | [0m0.02343  [0m | [0m0.03025  [0m | [0m43.67    [0m | [0m7.206e+03[0m | [0m1.073e+03[0m | [0m2.595e+03[0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 2.4313 + 0.216878
| [0m2        [0m | [0m2.431    [0m | [0m0.03314  [0m | [0m0.01789  [0m | [0m12.35    [0m | [0m752.5    [0m | [0m659.2    [0m | [0m3.799e+03[0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 2.4313 + 0.216878
| [0m3        [0

In [24]:
lgbBO.max

{'target': 5.926793395486719,
 'params': {'lambda_l1': 0.035457163492870536,
  'lambda_l2': 0.047927783077294145,
  'max_depth': 61.791820568227806,
  'min_child_samples': 7080.577155513798,
  'min_data_in_leaf': 789.9894378533228,
  'num_leaves': 626.1395586289303}}

In [25]:
RMSE_lightgbm["UPDRS_3"]=lgbBO.max["target"]


The best RMSE for UPDRS3 is 14.94

Now for UPDRS4 using light gbm for analysing performance of training data.

In [26]:
X_train_UPDRS4=X_train[features_UPDRS4.iloc[:,0].tolist()+["visit_month_diff_min","visit_month"]]
X_tr_scaled_UPDRS4 = scaler.fit_transform(X_train_UPDRS4)
def lgb_eval(num_leaves,max_depth,lambda_l2,lambda_l1,min_child_samples, min_data_in_leaf):
    params = {
        "objective" : "regression",
        "metric" : "RMSE",
        'is_unbalance': True,
        "num_leaves" : int(num_leaves),
        "max_depth" : int(max_depth),
        "lambda_l2" : lambda_l2,
        "lambda_l1" : lambda_l1,
        "num_threads" : 20,
        "min_child_samples" : int(min_child_samples),
        'min_data_in_leaf': int(min_data_in_leaf),
        "learning_rate" : 0.03,
        "subsample_freq" : 5,
        "verbosity" : -1
    }
    lgtrain = lightgbm.Dataset(X_tr_scaled_UPDRS4, y_train.updrs_4)
    cv_result = lightgbm.cv(params,
                       lgtrain,
                        num_boost_round=100,
                       stratified=False,
    callbacks=[
      lightgbm.early_stopping(stopping_rounds=1000),
    ], nfold=3)
    return cv_result['valid rmse-mean'][-1]
lgbBO = BayesianOptimization(lgb_eval,{'num_leaves': (25, 4000),
                                                'max_depth': (5, 63),
                                                'lambda_l2': (0.0, 0.05),
                                                'lambda_l1': (0.0, 0.05),
                                                'min_child_samples': (50, 10000),
                                                'min_data_in_leaf': (100, 2000)
                                                })

lgbBO.maximize(n_iter=10, init_points=2)



|   iter    |  target   | lambda_l1 | lambda_l2 | max_depth | min_ch... | min_da... | num_le... |
-------------------------------------------------------------------------------------------------
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 2.4313 + 0.216878
| [0m1        [0m | [0m2.431    [0m | [0m0.0118   [0m | [0m0.04891  [0m | [0m42.13    [0m | [0m1.961e+03[0m | [0m1.675e+03[0m | [0m3.233e+03[0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 2.4313 + 0.216878
| [0m2        [0m | [0m2.431    [0m | [0m0.01008  [0m | [0m0.03571  [0m | [0m37.14    [0m | [0m3.188e+03[0m | [0m678.2    [0m | [0m2.492e+03[0m |
Training until validation scores don't improve for 1000 rounds
Did not meet early stopping. Best iteration is:
[1]	cv_agg's valid rmse: 2.4313 + 0.216878
| [0m3        [0

In [27]:

RMSE_lightgbm["UPDRS_4"]=lgbBO.max["target"]
print(lgbBO.max)

{'target': 2.4312964721402337, 'params': {'lambda_l1': 0.011797622311499563, 'lambda_l2': 0.0489110702424097, 'max_depth': 42.13032707410743, 'min_child_samples': 1960.846075329814, 'min_data_in_leaf': 1674.9644907790528, 'num_leaves': 3232.9345934308494}}


Best RMSE is 2.43.

Let me see how elastic net regression performs in cross validation in RMSE using the same features that we used for prediction for each of the UPDRS scores. We will be doing five fold cross validation using grid search for hyper parameter tuning looking at different kernels and regularization parameter C. We can use do an exhaustive search using grid search looking at all possible combinations rather than a smart bayesian based approach as we have fewer parameters to tune.

In [28]:
def pred(x,y):
    param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf', 'poly']}
    grid = GridSearchCV(SVC(), param_grid, refit = True, verbose = 3,scoring="neg_root_mean_squared_error") 
  # fitting the models for grid search 
    grid.fit(x, y) 
    best_params = grid.best_params_
    best_score = grid.best_score_
    return(best_params,best_score)

In [29]:
SVM_1=pred(X_train_UPDRS1, y_train.updrs_1)

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END .............C=0.1, kernel=linear;, score=-6.195 total time=   0.4s
[CV 2/5] END .............C=0.1, kernel=linear;, score=-6.094 total time=   0.4s
[CV 3/5] END .............C=0.1, kernel=linear;, score=-6.397 total time=   0.5s
[CV 4/5] END .............C=0.1, kernel=linear;, score=-6.028 total time=   0.4s
[CV 5/5] END .............C=0.1, kernel=linear;, score=-6.121 total time=   0.4s
[CV 1/5] END ................C=0.1, kernel=rbf;, score=-7.654 total time=   0.1s
[CV 2/5] END ................C=0.1, kernel=rbf;, score=-7.751 total time=   0.1s
[CV 3/5] END ................C=0.1, kernel=rbf;, score=-7.929 total time=   0.1s
[CV 4/5] END ................C=0.1, kernel=rbf;, score=-7.560 total time=   0.1s
[CV 5/5] END ................C=0.1, kernel=rbf;, score=-7.478 total time=   0.1s
[CV 1/5] END ...............C=0.1, kernel=poly;, score=-7.378 total time=   0.1s
[CV 2/5] END ...............C=0.1, kernel=poly;, 

In [34]:
SVM_updrs2=pred(X_train_UPDRS2, y_train.updrs_2)

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END .............C=0.1, kernel=linear;, score=-6.195 total time=   0.4s
[CV 2/5] END .............C=0.1, kernel=linear;, score=-6.094 total time=   0.4s
[CV 3/5] END .............C=0.1, kernel=linear;, score=-6.397 total time=   0.5s
[CV 4/5] END .............C=0.1, kernel=linear;, score=-6.028 total time=   0.4s
[CV 5/5] END .............C=0.1, kernel=linear;, score=-6.121 total time=   0.4s
[CV 1/5] END ................C=0.1, kernel=rbf;, score=-7.654 total time=   0.1s
[CV 2/5] END ................C=0.1, kernel=rbf;, score=-7.751 total time=   0.1s
[CV 3/5] END ................C=0.1, kernel=rbf;, score=-7.929 total time=   0.1s
[CV 4/5] END ................C=0.1, kernel=rbf;, score=-7.560 total time=   0.1s
[CV 5/5] END ................C=0.1, kernel=rbf;, score=-7.478 total time=   0.1s
[CV 1/5] END ...............C=0.1, kernel=poly;, score=-7.378 total time=   0.1s
[CV 2/5] END ...............C=0.1, kernel=poly;, 

In [38]:
SVM_updrs3=pred(X_train_UPDRS3, y_train.updrs_3)

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END ................C=0.1, kernel=linear;, score=nan total time=   0.0s
[CV 2/5] END ................C=0.1, kernel=linear;, score=nan total time=   0.0s
[CV 3/5] END ................C=0.1, kernel=linear;, score=nan total time=   0.0s
[CV 4/5] END ................C=0.1, kernel=linear;, score=nan total time=   0.0s
[CV 5/5] END ................C=0.1, kernel=linear;, score=nan total time=   0.0s
[CV 1/5] END ...................C=0.1, kernel=rbf;, score=nan total time=   0.0s
[CV 2/5] END ...................C=0.1, kernel=rbf;, score=nan total time=   0.0s
[CV 3/5] END ...................C=0.1, kernel=rbf;, score=nan total time=   0.0s
[CV 4/5] END ...................C=0.1, kernel=rbf;, score=nan total time=   0.0s
[CV 5/5] END ...................C=0.1, kernel=rbf;, score=nan total time=   0.0s
[CV 1/5] END ..................C=0.1, kernel=poly;, score=nan total time=   0.0s
[CV 2/5] END ..................C=0.1, kernel=poly

ValueError: could not convert string to float: 'Unknown'

In [None]:
SVM_4=pred(X_train_UPDRS4, y_train.updrs_4)

Fitting 5 folds for each of 9 candidates, totalling 45 fits
[CV 1/5] END .............C=0.1, kernel=linear;, score=-2.676 total time=   0.1s
[CV 2/5] END .............C=0.1, kernel=linear;, score=-2.880 total time=   0.0s
[CV 3/5] END .............C=0.1, kernel=linear;, score=-2.630 total time=   0.0s
[CV 4/5] END .............C=0.1, kernel=linear;, score=-2.441 total time=   0.1s
[CV 5/5] END .............C=0.1, kernel=linear;, score=-2.560 total time=   0.0s
[CV 1/5] END ................C=0.1, kernel=rbf;, score=-2.676 total time=   0.0s
[CV 2/5] END ................C=0.1, kernel=rbf;, score=-2.880 total time=   0.0s
[CV 3/5] END ................C=0.1, kernel=rbf;, score=-2.630 total time=   0.0s
[CV 4/5] END ................C=0.1, kernel=rbf;, score=-2.441 total time=   0.0s
[CV 5/5] END ................C=0.1, kernel=rbf;, score=-2.560 total time=   0.0s
[CV 1/5] END ...............C=0.1, kernel=poly;, score=-2.676 total time=   0.1s
[CV 2/5] END ...............C=0.1, kernel=poly;, 

In [36]:
RMSE_SVM={"UPDRS_1":SVM_1[1],"UPDRS_2":SVM_2[1]}
print(RMSE_SVM)

{'UPDRS_1': -6.166888257925175, 'UPDRS_2': -6.3800217813560005}
