# LR_RMH: Logistic Regression with additional features 

We extended the LR_BAS model with features we expected to improve its performance:
* patient medical history available during triage (e.g., history of diabetes)
* an MSA indicator for metropolitan status (whether the ER is located in a metropolitan area)
* Reason for Visit (RFV) codes split into digit vectors, so the model can capture their hierarchical semantics

The LR model performance improved: in the step-by-step runs below, the mean cross-validated ROC AUC rose from 0.8393 (RFV digit features only) to 0.8474 (all added features).
    


In [18]:
import pandas as pd
import numpy as np
import json
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn.model_selection import train_test_split

# Make the project's model and feature modules importable
import sys
sys.path.append("../../src/models/train_model")
import LR_model
sys.path.append("../../src/features")
import build_features, vital_signs_features, age_features, RFV_features

%matplotlib inline


## Model Training

In [19]:
with open('../../fileConfig.json') as config_file:
    fileConfig = json.load(config_file)

In [20]:
# Training model via LR_model.py method
LR_model.LR_RMH_model_training(fileConfig)

ROC_AUC = 0.8428 
ROC_AUC = 0.8330 
ROC_AUC = 0.8376 
ROC_AUC = 0.8777 
ROC_AUC = 0.8663 
ROC_AUC = 0.8420 
ROC_AUC = 0.8601 
ROC_AUC = 0.8447 
ROC_AUC = 0.8421 
ROC_AUC = 0.8273 
ROC AUC: 0.8474% (+/- 0.01%
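The ten `ROC_AUC` lines followed by a summary are the kind of output a 10-fold cross-validation produces. The internals of `LR_model.LR_RMH_model_training` are not shown in this notebook, so the following is only a minimal sketch of such a loop, assuming scikit-learn 0.18 or later. The synthetic data and the helper name `cross_validate_auc` are illustrative and are not the project's code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def cross_validate_auc(X, y, c=1.0, n_splits=10, seed=0):
    """Return the ROC AUC of a logistic regression on each held-out fold.

    Hypothetical helper: the real LR_model functions may differ.
    """
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X, y):
        model = LogisticRegression(C=c, max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        # Score with the predicted probability of the positive class
        proba = model.predict_proba(X[test_idx])[:, 1]
        scores.append(roc_auc_score(y[test_idx], proba))
    return np.array(scores)


# Synthetic stand-in for the predictors and target (not the CDC data)
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.85, 0.15], random_state=0)
scores = cross_validate_auc(X, y, c=1.0)
for s in scores:
    print("ROC_AUC = %.4f" % s)
print("ROC AUC: %.4f (+/- %.4f)" % (scores.mean(), scores.std()))
```

Stratified folds keep the share of positive cases roughly equal across folds, which matters when the critical outcome is rare.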


## Model Training, step by step

### Reading CDC File

In [2]:
pd.options.mode.chained_assignment = None  # default='warn'

In [6]:
#reading file
processedDirectory = fileConfig['dataDirectory'] + fileConfig['processedDirectory'] 
cdc_input = pd.read_csv(processedDirectory + 'ED_TOTAL_2009_2009.csv' )
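Building the path by string concatenation only works when both config values end with a path separator. A minimal sketch of a more forgiving alternative follows; the helper name `processed_csv_path` and the example config values are hypothetical, since the real `fileConfig.json` is not shown here.

```python
import os


def processed_csv_path(config, filename):
    """Build the path to a processed CSV from the two directory settings."""
    return os.path.normpath(
        os.path.join(config["dataDirectory"], config["processedDirectory"], filename)
    )


# Hypothetical config values, with and without trailing separators
with_slashes = {"dataDirectory": "../../data/", "processedDirectory": "processed/"}
without_slashes = {"dataDirectory": "../../data", "processedDirectory": "processed"}

path = processed_csv_path(without_slashes, "ED_TOTAL_2009_2009.csv")
print(path)
```

Note that `os.path.join` discards earlier components when a later one is an absolute path, so `processedDirectory` should stay relative for this to work.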



###  Feature Engineering

We add the new features incrementally to see the impact of each.

#### (1) New feature: splitting RFV codes as vectors

Each RFV code follows a hierarchy: the digits that make up the code encode which reasons for visit are similar (much like embeddings).

Here we split an RFV code such as '10302' into 5 inputs: 1, 0, 3, 0, 2.
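The digit split can be sketched as follows. `RFV_features.make_rfv_digit_features` is not shown in this notebook, so the function `rfv_to_digits` and the `rfv_digit_N` column names below are hypothetical, and the sketch assumes every code has exactly five digits.

```python
def rfv_to_digits(code, width=5):
    """Split an RFV code into a list of its digits, one integer per position."""
    text = str(code).strip()
    if len(text) != width or not text.isdigit():
        raise ValueError("expected a %d-digit RFV code, got %r" % (width, code))
    return [int(ch) for ch in text]


digits = rfv_to_digits('10302')
# Name the columns by position, most significant digit first (assumed naming)
features = dict(("rfv_digit_%d" % (i + 1), d) for i, d in enumerate(digits))
print(digits)
```

Codes that share leading digits then share leading feature values, which is how the hierarchy reaches the model.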

In [9]:
reload(build_features)
reload(RFV_features)
predictors, target = build_features.get_baseline_features(cdc_input)
predictors = RFV_features.make_rfv_digit_features(predictors, cdc_input)


In [10]:
list(predictors)

['Temp_Baseline',
 'Pulse_Baseline',
 'Sys_BP_Baseline',
 'Resp_Rate_Baseline',
 'Oxygen_Sat_Baseline',
 'Reason_Chest_Pain',
 'Reason_Abdominal_Pain',
 'Reason_Headache',
 'Reason_Shortness_of_Breath',
 'Reason_Back_Pain',
 'Reason_Cough',
 'Reason_Nausea_Vomiting',
 'Reason_Fever_Chills',
 'Reason_Syncope',
 'Reason_Dizziness',
 'Reason_Psychiatric_Complaint',
 'Reason_Nervous_System',
 'Reason_Cardiovascular_Other',
 'Reason_Ears_Eyes_Complaint',
 'Reason_Respiratory_Other',
 'Reason_Gastrointestinal_Other',
 'Reason_Genitourinary_Other',
 'Reason_Skin_Hair_Nails_Complaint',
 'Reason_Musculoskeletal_Other',
 'Reason_Injury_Poisoning',
 'Reason_Other',
 'Hypothermia',
 'Hyperthermia',
 'Bradycardia',
 'Mild_Tachycardia',
 'Moderate_Tachycardia',
 'Severe_Tachycardia',
 'Hypotension',
 'Hypertension',
 'Bradypnea',
 'Moderate_Tachypnea',
 'Severe_Tachypnea',
 'Mild_Hypoxia',
 'Severe_Hypoxia',
 'Age_18_30',
 'Age_31_40',
 'Age_41_50',
 'Age_51_60',
 'Age_61_70',
 'Age_71_80',
 'Age_81

#### Logistic Model

In [11]:
LR_model.cross_LR_Validation(predictors, target, c=1.0)

ROC_AUC = 0.8344 
ROC_AUC = 0.8279 
ROC_AUC = 0.8283 
ROC_AUC = 0.8749 
ROC_AUC = 0.8592 
ROC_AUC = 0.8303 
ROC_AUC = 0.8499 
ROC_AUC = 0.8351 
ROC_AUC = 0.8299 
ROC_AUC = 0.8228 
ROC AUC: 0.8393% (+/- 0.02%


#### (2) Adding MSA and medical history of chronic diseases
Fields of this type are used in other papers that predict critical outcomes, and they are available during triage.
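As a rough illustration of how such fields could become model inputs, the sketch below turns an MSA code and two history flags into 0/1 columns. The column names (`MSA`, `DIABETES`, `CHF`), their codings, and the helper `add_history_features` are assumptions made for the example; `build_features.get_all_features` may work differently.

```python
import pandas as pd

# Toy rows; column names and codings are assumptions for illustration
visits = pd.DataFrame({
    "MSA": [1, 2, 1, 2],        # assumed coding: 1 = metropolitan, 2 = non-metropolitan
    "DIABETES": [0, 1, 1, 0],   # assumed coding: 1 = history present
    "CHF": [0, 0, 1, 0],
})


def add_history_features(features, raw, history_columns=("DIABETES", "CHF")):
    """Return a copy of `features` with 0/1 MSA and medical-history columns."""
    out = features.copy()
    out["Is_Metropolitan"] = (raw["MSA"] == 1).astype(int)
    for col in history_columns:
        out["History_" + col.title()] = (raw[col] == 1).astype(int)
    return out


example_predictors = add_history_features(pd.DataFrame(index=visits.index), visits)
print(list(example_predictors))
```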

In [12]:
reload(build_features)
predictors, target = build_features.get_all_features(cdc_input)

In [13]:
list(predictors)

['Temp_Baseline',
 'Pulse_Baseline',
 'Sys_BP_Baseline',
 'Resp_Rate_Baseline',
 'Oxygen_Sat_Baseline',
 'Reason_Chest_Pain',
 'Reason_Abdominal_Pain',
 'Reason_Headache',
 'Reason_Shortness_of_Breath',
 'Reason_Back_Pain',
 'Reason_Cough',
 'Reason_Nausea_Vomiting',
 'Reason_Fever_Chills',
 'Reason_Syncope',
 'Reason_Dizziness',
 'Reason_Psychiatric_Complaint',
 'Reason_Nervous_System',
 'Reason_Cardiovascular_Other',
 'Reason_Ears_Eyes_Complaint',
 'Reason_Respiratory_Other',
 'Reason_Gastrointestinal_Other',
 'Reason_Genitourinary_Other',
 'Reason_Skin_Hair_Nails_Complaint',
 'Reason_Musculoskeletal_Other',
 'Reason_Injury_Poisoning',
 'Reason_Other',
 'Hypothermia',
 'Hyperthermia',
 'Bradycardia',
 'Mild_Tachycardia',
 'Moderate_Tachycardia',
 'Severe_Tachycardia',
 'Hypotension',
 'Hypertension',
 'Bradypnea',
 'Moderate_Tachypnea',
 'Severe_Tachypnea',
 'Mild_Hypoxia',
 'Severe_Hypoxia',
 'Age_18_30',
 'Age_31_40',
 'Age_41_50',
 'Age_51_60',
 'Age_61_70',
 'Age_71_80',
 'Age_81

### LR Model Training

In [14]:
reload(LR_model)
LR_model.cross_LR_Validation(predictors, target, c=1.0)

ROC_AUC = 0.8428 
ROC_AUC = 0.8330 
ROC_AUC = 0.8376 
ROC_AUC = 0.8777 
ROC_AUC = 0.8663 
ROC_AUC = 0.8420 
ROC_AUC = 0.8601 
ROC_AUC = 0.8447 
ROC_AUC = 0.8421 
ROC_AUC = 0.8273 
ROC AUC: 0.8474% (+/- 0.01%
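To compare the two step-by-step runs, the per-fold scores printed in this notebook can be averaged and differenced directly. This is a small check on the reported numbers, assuming Python 3 (for the `statistics` module) and assuming both runs used the same fold splits, which the notebook does not confirm.

```python
from statistics import mean

# Per-fold ROC AUC values copied from the two cross-validation outputs
rfv_only = [0.8344, 0.8279, 0.8283, 0.8749, 0.8592,
            0.8303, 0.8499, 0.8351, 0.8299, 0.8228]
all_features = [0.8428, 0.8330, 0.8376, 0.8777, 0.8663,
                0.8420, 0.8601, 0.8447, 0.8421, 0.8273]

# Paired differences are only meaningful if the folds match (an assumption)
gains = [b - a for a, b in zip(rfv_only, all_features)]

print("mean ROC AUC, RFV digits only: %.4f" % mean(rfv_only))
print("mean ROC AUC, all features:    %.4f" % mean(all_features))
print("folds improved: %d of %d" % (sum(g > 0 for g in gains), len(gains)))
```

The means come out to 0.8393 and 0.8474, matching the printed summaries, and every one of the ten folds scores higher with the full feature set.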
