# FNN_TE_ATT: Feedforward Neural Network with an Attention Layer

IDANN Model that predicts critical outcomes, with the attention layer.   

Interpretability is essential for the adoption of our FNN model by healthcare communities. We therefore implemented a feature attribution system based on an attention layer at the entry of the FNN. We were inspired by recent studies on healthcare DNN models from Google (Rajkomar et al., Jan 2018) and Georgia Tech (Sha et al., Aug 2017). Keras/TensorFlow did not offer a ready-to-use attention layer, so we implemented one for our model based on the formulas in Raffel et al. The attention layer learns a set of weights that represent the relative importance of each feature for a specific prediction. The resulting model not only became interpretable but also gained in performance.
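As a rough illustration of the mechanism described above, the sketch below follows the feed-forward attention form from Raffel et al.: each feature vector gets a score, a softmax turns the scores into weights that sum to 1, and the weighted sum of the feature vectors is the context vector. It is plain Python rather than the Keras layer in `attention_layer`, whose code is not shown here. The function name `feature_attention` and the fixed values of `w` and `b` are assumptions for illustration; in a trained layer they would be learned.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_attention(features, w, b):
    """Attention over a list of feature vectors.

    features: list of T vectors, each of length d (one vector per feature)
    w, b:     scoring parameters (hypothetical fixed values in this sketch)
    Returns the T attention weights and the d-dimensional context vector.
    """
    # e_t = tanh(w . h_t + b): one scalar score per feature vector
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h)) + b) for h in features]
    # alpha_t = softmax(e)_t: relative importance of each feature
    weights = softmax(scores)
    # c = sum_t alpha_t * h_t: weighted sum of the feature vectors
    d = len(features[0])
    context = [sum(a * h[j] for a, h in zip(weights, features)) for j in range(d)]
    return weights, context


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = feature_attention(features, w=[0.5, -0.5], b=0.0)
print(weights)   # one weight per feature vector
print(context)   # same dimension as each feature vector
```

Because the weights sum to 1 for every input, they can be read per prediction as the share of attention each feature received, which is what makes them usable for attribution.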

In [1]:
import pandas as pd
import numpy as np
import json 
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20

from keras.utils import plot_model
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
import pickle
import sys 
sys.path.append("../../src/models/train_model")
import NN_VE_model
import attention_layer
sys.path.append("../../src/features")
import build_features, vital_signs_features, age_features, RFV_features, RFV_text_vectorizing

%matplotlib inline

Using TensorFlow backend.


In [2]:
pd.options.mode.chained_assignment = None  # default='warn'

## Model Training 

In [3]:
with open('../../fileConfig.json') as config_file:
    fileConfig = json.load(config_file)

In [5]:
reload(NN_VE_model)
NN_VE_model.FNN_TE_ATT_model_training(fileConfig, 'ED_TOTAL_2009_2009.csv')

Creating text for embeddings
Vocabulary size: 1603
Average text length: 12.6051971547
Max text length: 122




AUROC: 85.22%
AUROC: 85.12%
AUROC: 84.87%
AUROC: 87.98%
AUROC: 87.65%
AUROC: 86.10%
AUROC: 87.80%
AUROC: 86.88%
AUROC: 85.33%
AUROC: 83.94%
ROC AUC: 86.09% (+/- 0.01%)


## Model Training, step by step

### Reading CDC File

In [6]:
#reading file
processedDirectory = fileConfig['dataDirectory'] + fileConfig['processedDirectory'] 
cdc_input = pd.read_csv(processedDirectory + 'ED_TOTAL_2009_2009.csv' )
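The cell above builds the input path by concatenating two entries of `fileConfig`. The sketch below shows the same lookup with `os.path.join`, which does not depend on whether the configured directories end with a separator. The keys `dataDirectory` and `processedDirectory` come from the notebook; the directory values are hypothetical, since the contents of `fileConfig.json` are not shown.

```python
import json
import os

# Hypothetical contents of fileConfig.json (only the two keys are from the notebook).
config_text = '{"dataDirectory": "../../data", "processedDirectory": "processed"}'
fileConfig = json.loads(config_text)

# os.path.join inserts a separator only where one is missing.
processed_directory = os.path.join(fileConfig["dataDirectory"],
                                   fileConfig["processedDirectory"])
csv_path = os.path.join(processed_directory, "ED_TOTAL_2009_2009.csv")
print(csv_path)
```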

### Feature Engineering

In [7]:
reload(build_features)
predictors, target = build_features.get_features(cdc_input, with_features_for_Embedding=True)  

Creating text for embeddings


In [8]:
list(predictors)

['Temp_Baseline',
 'Pulse_Baseline',
 'Sys_BP_Baseline',
 'Resp_Rate_Baseline',
 'Oxygen_Sat_Baseline',
 'Reason_Chest_Pain',
 'Reason_Abdominal_Pain',
 'Reason_Headache',
 'Reason_Shortness_of_Breath',
 'Reason_Back_Pain',
 'Reason_Cough',
 'Reason_Nausea_Vomiting',
 'Reason_Fever_Chills',
 'Reason_Syncope',
 'Reason_Dizziness',
 'Reason_Psychiatric_Complaint',
 'Reason_Nervous_System',
 'Reason_Cardiovascular_Other',
 'Reason_Ears_Eyes_Complaint',
 'Reason_Respiratory_Other',
 'Reason_Gastrointestinal_Other',
 'Reason_Genitourinary_Other',
 'Reason_Skin_Hair_Nails_Complaint',
 'Reason_Musculoskeletal_Other',
 'Reason_Injury_Poisoning',
 'Reason_Other',
 'Hypothermia',
 'Hyperthermia',
 'Bradycardia',
 'Mild_Tachycardia',
 'Moderate_Tachycardia',
 'Severe_Tachycardia',
 'Hypotension',
 'Hypertension',
 'Bradypnea',
 'Moderate_Tachypnea',
 'Severe_Tachypnea',
 'Mild_Hypoxia',
 'Severe_Hypoxia',
 'Age_18_30',
 'Age_31_40',
 'Age_41_50',
 'Age_51_60',
 'Age_61_70',
 'Age_71_80',
 'Age_81
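Many of the predictors listed above are binary indicators derived from a continuous value. As a minimal sketch of that idea, the function below builds the age indicators whose names are fully visible in the list. The bin edges are read off the column names; the function name `age_indicators` is an assumption, and the actual logic in `age_features` is not shown. The last column name is truncated in the output, so this sketch leaves it out.

```python
# Bin edges taken from the column names Age_18_30 ... Age_71_80.
AGE_BINS = [(18, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80)]


def age_indicators(age):
    """Return a dict of 0/1 indicators, one per age bin."""
    return {"Age_%d_%d" % (lo, hi): int(lo <= age <= hi) for lo, hi in AGE_BINS}


row = age_indicators(45)
print(row)  # only Age_41_50 is set to 1
```

Ages outside the listed bins produce all zeros here, which is a simplification of this sketch rather than a statement about the real feature set.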

### Vectorizing text for Embeddings

In [9]:
# concatenate all RFVn_text fields into a single text
# vectorize: map each word to a numeric id (the tokenizer holds the dictionary)
# pad every vectorized text with zeros so that all have the same length
# returns max_text_length: the maximum text length, MAX_VOCAB: the size of the dictionary
predictors, max_text_length, MAX_VOCAB, tokenizer = \
    RFV_text_vectorizing.vectorize_RFV_text(predictors, debug=False)


Vocabulary size: 1603
Average text length: 12.6051971547
Max text length: 122
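The steps in the comments above (map each word to an id, then pad with zeros to a common length) can be sketched in plain Python as follows. This is an illustration of the technique, not the code in `RFV_text_vectorizing`; the function name `vectorize_texts`, the whitespace tokenization, and the choice to pad at the end are assumptions.

```python
def vectorize_texts(texts):
    """Map words to integer ids and pad every sequence with zeros."""
    word_to_id = {}
    sequences = []
    for text in texts:
        ids = []
        for word in text.lower().split():
            if word not in word_to_id:
                # ids start at 1 so that 0 can be reserved for padding
                word_to_id[word] = len(word_to_id) + 1
            ids.append(word_to_id[word])
        sequences.append(ids)

    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [0] * (max_len - len(seq)) for seq in sequences]
    vocab_size = len(word_to_id) + 1  # +1 accounts for the padding id 0
    return padded, max_len, vocab_size, word_to_id


texts = ["chest pain", "shortness of breath chest pain", "headache"]
padded, max_len, vocab_size, word_to_id = vectorize_texts(texts)
print(padded)      # [[1, 2, 0, 0, 0], [3, 4, 5, 1, 2], [6, 0, 0, 0, 0]]
print(max_len)     # 5
print(vocab_size)  # 7
```

The vocabulary size and maximum length are what an embedding layer needs to know up front, which is why the notebook passes them to the model as `vocab_size` and `input_text_length`.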


In [10]:
with open("../../models/cdc_2009_att_text_tokenizer.pickle", "wb") as f:
    pickle.dump(tokenizer, f)
with open("../../models/cdc_2009_att_text_max_length.pickle", "wb") as f:
    pickle.dump(max_text_length, f)
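The tokenizer and the maximum text length are saved because new text has to be vectorized exactly as the training text was. A minimal round-trip sketch is shown below; it writes to a temporary directory and uses a plain dict as a stand-in for the tokenizer, both of which are assumptions for illustration.

```python
import os
import pickle
import tempfile

tokenizer_stub = {"chest": 1, "pain": 2}  # stand-in for the fitted tokenizer
max_text_length = 122                     # value reported in the log above

model_dir = tempfile.mkdtemp()            # hypothetical location for this sketch
tokenizer_path = os.path.join(model_dir, "tokenizer.pickle")
length_path = os.path.join(model_dir, "max_length.pickle")

# Save both objects, as the notebook cell does.
with open(tokenizer_path, "wb") as f:
    pickle.dump(tokenizer_stub, f)
with open(length_path, "wb") as f:
    pickle.dump(max_text_length, f)

# Later, for example at prediction time, load them back in binary read mode.
with open(tokenizer_path, "rb") as f:
    restored_tokenizer = pickle.load(f)
with open(length_path, "rb") as f:
    restored_length = pickle.load(f)
```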

In [11]:
list(predictors)

['Temp_Baseline',
 'Pulse_Baseline',
 'Sys_BP_Baseline',
 'Resp_Rate_Baseline',
 'Oxygen_Sat_Baseline',
 'Reason_Chest_Pain',
 'Reason_Abdominal_Pain',
 'Reason_Headache',
 'Reason_Shortness_of_Breath',
 'Reason_Back_Pain',
 'Reason_Cough',
 'Reason_Nausea_Vomiting',
 'Reason_Fever_Chills',
 'Reason_Syncope',
 'Reason_Dizziness',
 'Reason_Psychiatric_Complaint',
 'Reason_Nervous_System',
 'Reason_Cardiovascular_Other',
 'Reason_Ears_Eyes_Complaint',
 'Reason_Respiratory_Other',
 'Reason_Gastrointestinal_Other',
 'Reason_Genitourinary_Other',
 'Reason_Skin_Hair_Nails_Complaint',
 'Reason_Musculoskeletal_Other',
 'Reason_Injury_Poisoning',
 'Reason_Other',
 'Hypothermia',
 'Hyperthermia',
 'Bradycardia',
 'Mild_Tachycardia',
 'Moderate_Tachycardia',
 'Severe_Tachycardia',
 'Hypotension',
 'Hypertension',
 'Bradypnea',
 'Moderate_Tachypnea',
 'Severe_Tachypnea',
 'Mild_Hypoxia',
 'Severe_Hypoxia',
 'Age_18_30',
 'Age_31_40',
 'Age_41_50',
 'Age_51_60',
 'Age_61_70',
 'Age_71_80',
 'Age_81

## NN model

In [12]:
reload(NN_VE_model)
reload(attention_layer)
nn_model = NN_VE_model.create_model(l2=0.0001, n_units=100, apply_attention=True,
                                    embedding_nh=100,
                                    input_text_length=max_text_length,
                                    vocab_size=MAX_VOCAB)
nn_model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_70 (InputLayer)           (None, 1)            0                                            
__________________________________________________________________________________________________
input_71 (InputLayer)           (None, 1)            0                                            
__________________________________________________________________________________________________
input_72 (InputLayer)           (None, 1)            0                                            
__________________________________________________________________________________________________
input_62 (InputLayer)           (None, 1)            0                                            
__________________________________________________________________________________________________
embedding_

## Train Model

In [16]:
reload(NN_VE_model)
X_train, X_dev, y_train, y_dev = train_test_split(predictors, target, test_size = 0.1)
X_train_list = NN_VE_model.get_x_list(X_train)
X_dev_list = NN_VE_model.get_x_list(X_dev)
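With `test_size = 0.1`, scikit-learn holds out the ceiling of 10% of the rows and trains on the rest. The short check below reproduces the sample counts in the training log, assuming the "Train on ... validate on ..." line reports the two parts of this split.

```python
import math

n_train_logged, n_dev_logged = 21888, 2433   # counts from the training log
n_total = n_train_logged + n_dev_logged      # 24321 rows in total

n_dev = int(math.ceil(0.1 * n_total))        # held-out share, rounded up
n_train = n_total - n_dev                    # everything else is used for training
print(n_train, n_dev)
```

Since the split is random and no `random_state` is passed, each run of the cell produces a different partition, which is one reason single-split AUROC values vary between runs.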

In [17]:
reload(NN_VE_model)
reload(attention_layer)
roc_auc, cdc_model = NN_VE_model.train_model(X_train_list, y_train, X_dev_list, y_dev,
                                             num_epochs=40, l2=0.0001, n_units=50,
                                             apply_attention=True, embedding_nh=50, n_layers=3, att_l2=0.0001,
                                             input_text_length=max_text_length, vocab_size=MAX_VOCAB, verbose=True)

Train on 21888 samples, validate on 2433 samples
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40
AUROC: 84.88%


In [27]:
reload(NN_VE_model)
reload(attention_layer)
roc_auc, cdc_model = NN_VE_model.train_model(X_train_list, y_train, X_dev_list, y_dev,
                                             num_epochs=40, l2=0.0001, n_units=50,
                                             apply_attention=True, embedding_nh=50, n_layers=3, att_l2=0.0001,
                                             input_text_length=max_text_length, vocab_size=MAX_VOCAB, verbose=False)

AUROC: 87.81%
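The AUROC printed by `train_model` can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python. It illustrates the metric only and is not the code in `NN_VE_model`; the function name `auroc` and the toy labels and scores are assumptions.

```python
def auroc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, y_score))  # 0.75
```

The pairwise loop is quadratic in the number of cases, so it is only suitable for small examples; library implementations use a sort-based computation instead.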


## Cross Validation

In [15]:
reload(NN_VE_model)
NN_VE_model.cross_Validation(40, predictors, target, l2=0.0001, units_n=50, apply_attention=True,
                             embedding_nh=50, n_layers=3, att_l2=0.0001,
                             input_text_length=max_text_length, vocab_size=MAX_VOCAB)

AUROC: 85.24%
AUROC: 85.05%
AUROC: 84.91%
AUROC: 88.07%
AUROC: 87.63%
AUROC: 86.14%
AUROC: 87.64%
AUROC: 86.76%
AUROC: 85.40%
AUROC: 83.94%
ROC AUC: 86.08% (+/- 0.01%)
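The summary line can be reproduced from the ten fold scores above. The sketch below assumes the summary is the mean and the population standard deviation of the per-fold AUROC values; how `cross_Validation` computes it is not shown. Under that assumption the mean matches the printed 86.08%, while the spread comes out at roughly 1.3 percentage points. This suggests that the printed `+/- 0.01%` is a fraction (about 0.013) shown with a percent sign rather than a spread of 0.01 percentage points.

```python
from statistics import mean, pstdev

# Per-fold AUROC values (in percent) copied from the output above.
fold_auroc = [85.24, 85.05, 84.91, 88.07, 87.63,
              86.14, 87.64, 86.76, 85.40, 83.94]

mean_auroc = mean(fold_auroc)     # average over the 10 folds
spread = pstdev(fold_auroc)       # population standard deviation, in percentage points

print("ROC AUC: %.2f%% (+/- %.2f percentage points)" % (mean_auroc, spread))
```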


## Train Model with all data

This is the model that got the highest ROC AUC (using 10-fold cross-validation), so we use it for prediction. We now train it on all the data we have; the resulting model is the one used for prediction on the test data and in the API service.  
References:    
https://stats.stackexchange.com/questions/2306/feature-selection-for-final-model-when-performing-cross-validation-in-machine    
https://machinelearningmastery.com/train-final-machine-learning-model/
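The workflow described above, in which cross-validation estimates performance and the final model is then refit on all the data, can be sketched with a toy one-feature classifier. Everything in the block (the threshold "model", the synthetic data, the function names) is an assumption chosen to keep the example small; it stands in for the neural network and the CDC data.

```python
import random
from statistics import mean


def fit_threshold(xs, ys):
    """Toy 'model': the midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos) + mean(neg)) / 2.0


def accuracy(threshold, xs, ys):
    """Fraction of cases where 'x > threshold' agrees with the label."""
    return mean(1.0 if (x > threshold) == (y == 1) else 0.0 for x, y in zip(xs, ys))


def k_fold_indices(n, k, seed=0):
    """Shuffle the row indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Synthetic data: class 1 is centred at 2.0, class 0 at 0.0.
rng = random.Random(42)
ys = [i % 2 for i in range(200)]
xs = [rng.gauss(2.0 * y, 1.0) for y in ys]

# Step 1: cross-validation gives an estimate of out-of-sample performance.
scores = []
for fold in k_fold_indices(len(xs), k=10):
    held_out = set(fold)
    train = [i for i in range(len(xs)) if i not in held_out]
    t = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
    scores.append(accuracy(t, [xs[i] for i in fold], [ys[i] for i in fold]))
cv_estimate = mean(scores)

# Step 2: the model that is shipped is refit on all rows, with the same settings.
final_threshold = fit_threshold(xs, ys)
```

The ten fold models are discarded after step 1; their only purpose is to provide the performance estimate that is reported for the final model.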




In [12]:
X_train_list = NN_VE_model.get_x_list(predictors)


In [14]:
reload(NN_VE_model)
cdc_model = NN_VE_model.train_full_model(X_train_list, target,
                                         num_epochs=40, l2=0.0001, n_units=50,
                                         apply_attention=True, embedding_nh=50, n_layers=3, att_l2=0.0001,
                                         input_text_length=max_text_length, vocab_size=MAX_VOCAB, verbose=True)

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


In [15]:
cdc_model.save('../../models/cdc_2009_nn_att_text_embedding.H5')
# for w210 results AUC ROC  87.98% 

### Appendix: Other hyper-parameter values

```
NN_VE_model.cross_Validation (20, predictors, target,l2=0.0001,units_n = 50,apply_attention= True,
                              embedding_nh=50,   n_layers =3, att_l2=0.0001,
                             input_text_length=max_text_length,  vocab_size=MAX_VOCAB)
AUROC: 84.76%
AUROC: 84.43%
AUROC: 84.56%
AUROC: 88.17%
AUROC: 87.41%
AUROC: 85.60%
AUROC: 87.24%
AUROC: 86.49%
AUROC: 85.12%
AUROC: 83.62%
ROC AUC: 85.74% (+/- 0.01%)
```

```
NN_VE_model.cross_Validation (50, predictors, target,l2=0.0001,units_n = 50,apply_attention= True,
                              embedding_nh=50,   n_layers =3, att_l2=0.0001,
                             input_text_length=max_seq_length,  vocab_size=MAX_VOCAB)
AUROC: 85.00%
```

```
reload(NN_VE_model)
NN_VE_model.cross_Validation (10, predictors, target,l2=0.0001,units_n = 50,apply_attention= True,
                              embedding_nh=50,   n_layers =3, att_l2=0.0001,
                             input_text_length=max_seq_length,  vocab_size=MAX_VOCAB)
AUROC: 84.57%
AUROC: 84.02%
AUROC: 84.27%
AUROC: 88.04%
AUROC: 86.91%
AUROC: 85.35%
AUROC: 87.14%
AUROC: 86.55%
AUROC: 85.22%
AUROC: 83.15%
ROC AUC: 85.52% (+/- 0.01%)
```

```
reload(NN_VE_model)
NN_VE_model.cross_Validation (10, predictors, target,l2=0.0001,units_n = 50,apply_attention= True,
                              embedding_nh=50,   n_layers =2, att_l2=0.0001,
                             input_text_length=max_seq_length,  vocab_size=MAX_VOCAB)
AUROC: 84.36%
AUROC: 83.93%
AUROC: 84.34%
AUROC: 88.18%
AUROC: 87.14%
AUROC: 85.17%
AUROC: 87.09%
AUROC: 86.18%
AUROC: 85.22%
AUROC: 83.19%
ROC AUC: 85.48% (+/- 0.02%)
``` 

```
reload(NN_VE_model)
NN_VE_model.cross_Validation (5, predictors, target,l2=0.0001,units_n = 50,apply_attention= True,
                              embedding_nh=100,   n_layers =2, att_l2=0.0001,
                             input_text_length=max_seq_length,  vocab_size=MAX_VOCAB)
                             
AUROC: 84.17%
AUROC: 83.45%
AUROC: 83.74%
AUROC: 88.00%
AUROC: 86.77%
AUROC: 84.79%
AUROC: 86.78%
AUROC: 85.67%
AUROC: 85.17%
AUROC: 82.16%
ROC AUC: 85.07% (+/- 0.02%)
```
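The runs in this appendix and in the Cross Validation section can be compared by collecting their summary scores in one place. The sketch below does that and picks the configuration with the highest mean ROC AUC. It assumes that the first positional argument of `cross_Validation` is the number of epochs, which the notebook does not state explicitly. The run with 50 as first argument is left out because only one fold score is shown for it.

```python
# Mean ROC AUC (in percent) per configuration, copied from the outputs in this notebook.
runs = [
    {"epochs": 40, "n_layers": 3, "embedding_nh": 50, "mean_auroc": 86.08},
    {"epochs": 20, "n_layers": 3, "embedding_nh": 50, "mean_auroc": 85.74},
    {"epochs": 10, "n_layers": 3, "embedding_nh": 50, "mean_auroc": 85.52},
    {"epochs": 10, "n_layers": 2, "embedding_nh": 50, "mean_auroc": 85.48},
    {"epochs": 5, "n_layers": 2, "embedding_nh": 100, "mean_auroc": 85.07},
]

# Select the configuration with the highest cross-validated score.
best = max(runs, key=lambda run: run["mean_auroc"])
print(best)
```

The differences between these configurations are well under one percentage point, while the fold-to-fold variation within a single run spans several points, so the ranking should be read with some caution.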