# Description of the techniques used:

Metrics that can provide better insight are:

Confusion Matrix:  a table showing correct predictions and types of incorrect predictions.

Precision:    the number of true positives divided by all positive predictions. 

Precision is also called Positive Predictive Value. It is a measure of a classifier’s exactness. Low precision indicates a high number of false positives.

Recall:    the number of true positives divided by the number of positive values in the test data. 

The recall is also called Sensitivity or the True Positive Rate. It is a measure of a classifier’s completeness. Low recall indicates a high number of false negatives.

F1-Score: the weighted average of precision and recall.

Area Under ROC Curve (AUROC): AUROC represents the likelihood of your model distinguishing observations from two classes.


In other words, if you randomly select one observation from each class, what’s the probability that your model will be able to “rank” them correctly?


What is the SOFA Score?
The Sequential Organ Failure Assessment (SOFA) score is a scoring system that assesses the
performance of several organ systems in the body (neurologic, blood, liver, kidney, and blood
pressure/hemodynamics) and assigns a score based on the data obtained in each category. The
higher the SOFA score, the higher the likely mortality. 

SAPS II was designed to measure the severity of disease for patients admitted to Intensive care units aged 18 or more.

24 hours after admission to the ICU, the measurement has been completed and resulted in an integer point score between 0 and 163 and a predicted mortality between 0% and 100%. No new score can be calculated during the stay. If a patient is discharged from the ICU and readmitted, a new SAPS II score can be calculated.

This scoring system is mostly used to:

describe the morbidity of a patient when comparing the outcome with other patients.
describe the morbidity of a group of patients when comparing the outcome with another group of patients

 Kendall : rank correlation coefficient, commonly referred to as Kendall's τ coefficient (after the Greek letter τ, tau), is a statistic used to measure the ordinal association between two measured quantities. A τ test is a non-parametric hypothesis test for statistical dependence based on the τ coefficient.

 Pearson : It is the ratio between the covariance[3][circular reference] of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. 

The point of the random state is so that the train_test_split will return the same split each time, giving consistency to your model. For example, say you don't set a random state (or you set it to None). Doing this means that every time you run your model, the split will occur, and because the random state wasn't set the split will be different every time. In other words, the train and test sets won't always be the same - they will have different values in them.

SAPS II score above 35 indicates the mortality risk >80% [3]. The SOFA score predicts mortality risk at initial ICU admission and in the following hours based on the status of body systems: respiratory, cardiovascular (and specifically coagulation), hepatic, renal, and neurological. A total SOFA score >3 reflects organ failure

# Models

In [1]:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
import os
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score

In [2]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

root_dir = "/content/gdrive/My Drive/Colab Notebooks/ICU Project/"
directory = root_dir + 'assets/'

Mounted at /content/gdrive


In [3]:
# Reading DataSet
df = pd.read_csv(directory + 'f_means_df.csv')

In [4]:
info_df = pd.read_csv(directory + 'info.csv')

In [5]:
extracted_col = info_df["Gender"]
df = df.join(extracted_col)

In [6]:
df.columns

Index(['RecordID', 'In-hospital_death', 'ICUType', 'Age', 'SOFA', 'SAPS-I',
       'Weight', 'HR', 'BUN', 'Creatinine', 'GCS', 'Temp', 'HCT', 'Platelets',
       'WBC', 'Na', 'HCO3', 'k', 'Mg', 'Glucose', 'Urine', 'NISysABP',
       'NIDiasABP', 'NIMAP', 'pH', 'PaCO2', 'PaO2', 'DiasABP', 'SysABP', 'MAP',
       'FiO2', 'MechVent', 'Lactate', 'Gender'],
      dtype='object')

 **1-Applying SVM without handling the imbalanced class distribution**

In [7]:
X = df.drop('In-hospital_death', axis=1)
y = df['In-hospital_death']

In [8]:
print(X,y)

      RecordID  ICUType   Age  SOFA  SAPS-I     Weight          HR        BUN  \
0       132539      4.0  54.0     1       6  -1.000000   70.810811  10.500000   
1       132540      2.0  76.0     8      16  80.670588   80.794118  18.333333   
2       132541      3.0  44.0    11      21  56.700000   83.759259   4.666667   
3       132543      3.0  68.0     1       7  84.600000   70.983333  17.666667   
4       132545      3.0  88.0     2      17  -1.000000   74.958333  35.000000   
...        ...      ...   ...   ...     ...        ...         ...        ...   
3995    142603      4.0  61.0     7      13  80.375000   92.333333  29.666667   
3996    142618      3.0  57.0     8      21  93.500000  100.108434  13.000000   
3997    142626      4.0  83.0     4      16  70.000000   93.139241  28.250000   
3998    142638      3.0  74.0     7      19  65.000000   57.946429  57.000000   
3999    142671      3.0  37.0    10      22  87.400000   88.461538  89.250000   

      Creatinine        GCS

In [9]:
# Split the data into test and train sets

from sklearn.model_selection import train_test_split
  

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 20)

print("Number transactions X_train dataset: ", X_train.shape)
print("Number transactions y_train dataset: ", y_train.shape)
print("Number transactions X_test dataset: ", X_test.shape)
print("Number transactions y_test dataset: ", y_test.shape)

Number transactions X_train dataset:  (2800, 33)
Number transactions y_train dataset:  (2800,)
Number transactions X_test dataset:  (1200, 33)
Number transactions y_test dataset:  (1200,)


In [10]:
#  training the model without handling the imbalanced class distribution

from sklearn.svm import SVC
svclassifier = SVC(kernel='rbf')
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import metrics
print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
print('-------------------------------------------------------------------------')
print(confusion_matrix(y_test,y_pred))
print('-------------------------------------------------------------------------')
print(classification_report(y_test,y_pred))


area under curve (auc):  0.5
-------------------------------------------------------------------------
[[1040    0]
 [ 160    0]]
-------------------------------------------------------------------------
              precision    recall  f1-score   support

           0       0.87      1.00      0.93      1040
           1       0.00      0.00      0.00       160

    accuracy                           0.87      1200
   macro avg       0.43      0.50      0.46      1200
weighted avg       0.75      0.87      0.80      1200



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))




```
# The recall of the minority class is zero. It proves that the model is more biased towards majority class. And the classifier did nothing.
```



**Building the model using ANN without handling the class imbalance**

In [11]:
import tensorflow as tf
from tensorflow import keras

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 20)

In [15]:
def ANN(X_train,y_train,X_test,y_test,loss,weights):
  random_state =[20]
  for r in random_state:
    model=keras.Sequential([
        keras.layers.Dense(20,input_shape=(33,),activation='relu'),
        keras.layers.Dense(40,activation='relu'),
        keras.layers.Dense(1,activation='sigmoid')                    
    ])
    model.compile(optimizer='adam',loss=loss,metrics=['accuracy'])
    
    if weights==-1:
      model.fit(X_train,y_train,epochs=100)

    else:
      model.fit(X_train,y_train,epochs=100,class_weight=weights)
        
    print(model.evaluate(X_test,y_test))

    y_pred=model.predict(X_test)
    y_pred=np.round(y_pred)
    
    print('Classification Report :\n',classification_report(y_test,y_pred))
    print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
    print('-------------------------------------------------------------------------')
    print(accuracy_score(y_test,y_pred))
    print(confusion_matrix(y_test,y_pred))

    return y_pred

    

In [16]:
y_pred=ANN(X_train,y_train,X_test,y_test,'binary_crossentropy',0)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

**2-Using SMOTE technique to overSample the minority class**

In [23]:
# Using SMOTE Algorithm to handle the imbalance of the data

print("Before OverSampling, counts of label '1': {}".format(sum(y_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train == 0)))
  
# import SMOTE module from imblearn library
# pip install imblearn (if you don't have imblearn in your system)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 20)
X_train_res, y_train_res = sm.fit_resample(X_train, y_train.ravel())
  
print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape))
print('After OverSampling, the shape of train_y: {} \n'.format(y_train_res.shape))
  
print("After OverSampling, counts of label '1': {}".format(sum(y_train_res == 1)))
print("After OverSampling, counts of label '0': {}".format(sum(y_train_res == 0)))



Before OverSampling, counts of label '1': 394
Before OverSampling, counts of label '0': 2406 

After OverSampling, the shape of train_X: (4812, 33)
After OverSampling, the shape of train_y: (4812,) 

After OverSampling, counts of label '1': 2406
After OverSampling, counts of label '0': 2406




```
# SMOTE Algorithm has oversampled the minority instances and made it equal to majority class.
```



**3- Repeating ANN after applying SMOTE technique**

In [24]:
X_train_res.shape, y_train_res.shape

((4812, 33), (4812,))

In [37]:
def ANN(X_train_res,y_train_res,X_test,y_test,loss,weights):
  random_state =[20]
  for r in random_state:
      model=keras.Sequential([
        keras.layers.Dense(20,input_shape=(33,),activation='relu'),
        keras.layers.Dense(40,activation='relu'),
        keras.layers.Dense(1,activation='sigmoid')                            
         ])
      model.compile(optimizer='adam',loss=loss,metrics=['accuracy'])
    
      if weights==-1:
        model.fit(X_train_res,y_train_res,epochs=100)

      else:
        model.fit(X_train_res,y_train_res,epochs=100,class_weight=weights)
        
        print(model.evaluate(X_test,y_test))

        y_pred=model.predict(X_test)
        y_pred=np.round(y_pred)
    
        print('Classification Report :\n',classification_report(y_test,y_pred))
        print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
        print('-------------------------------------------------------------------------')
        print(accuracy_score(y_test,y_pred))
        print(confusion_matrix(y_test,y_pred))

  return y_pred


In [38]:
y_pred=ANN1(X_train_res,y_train_res,X_test,y_test,'binary_crossentropy',0)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

**3- Repeating SVM after applying SMOTE technique**

In [None]:
# Repeating SVM after applying SMOTE technique
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

random_state =[10,20,30,40,50,60,70]
for r in random_state:
  svclassifier = SVC(kernel='rbf',random_state=r)
  svclassifier.fit(X_train_res, y_train_res)
  y_pred = svclassifier.predict(X_test)
  from sklearn import metrics
  print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
  print('-------------------------------------------------------------------------')
  print(confusion_matrix(y_test,y_pred))
  print(classification_report(y_test,y_pred))





```
# The recall value of the minority class has improved to 49%
```




**4- Apply SVM-PCA after Applying SMOTE**


In [None]:
# Apply SVM-PCA after Applying SMOTE
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

random_state =[10,20,30,40,50,60,70]
for r in random_state:

      pca = PCA(n_components=15, whiten=True, random_state=r)
      svc = SVC(kernel='rbf', class_weight='balanced')
      model = make_pipeline(pca, svc)

      from sklearn.preprocessing import StandardScaler

      scaler = StandardScaler()
      scaled_X_train = scaler.fit_transform(X_train_res)
      scaled_X_test = scaler.transform(X_test)

      from sklearn.model_selection import GridSearchCV
      param_grid = {'svc__C': [1, 5, 10, 50],
                    'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
      grid = GridSearchCV(model, param_grid)

      %time grid.fit(scaled_X_train, y_train_res)
      print(grid.best_params_)

      model = grid.best_estimator_
      ypred = model.predict(scaled_X_test)
      from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
      from sklearn import metrics
      print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
      print('-------------------------------------------------------------------------')
      print(accuracy_score(y_test,ypred))
      print(confusion_matrix(y_test,ypred))
      print(classification_report(y_test,ypred))




```
# The accuarcy and recall has improved.
```



**5- Applying RandomForest after Applying SMOTE**

In [None]:
# Applying RandomForest after Applying SMOTE
# load library
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score

random_state =[10,20,30,40,50,60,70]
for r in random_state:
    rfc = RandomForestClassifier(random_state=r)

    # fit the predictor and target
    rfc.fit(X_train_res, y_train_res)

    # predict
    rfc_predict = rfc.predict(X_test)# check performance
    print('ROCAUC score:',roc_auc_score(y_test, rfc_predict))
    print('Accuracy score:',accuracy_score(y_test, rfc_predict))
    print('F1 score:',f1_score(y_test, rfc_predict))
    print("---------------------------------------------")



```
                                                  # Under Sampling (Near-Miss)
```



### **1- Appling NearMiss technique to Under-sample the majority class and see its accuracy and recall results.**

In [None]:
print("Before Undersampling, counts of label '1': {}".format(sum(y_train == 1)))
print("Before Undersampling, counts of label '0': {} \n".format(sum(y_train == 0)))
  
# apply near miss
from imblearn.under_sampling import NearMiss
nr = NearMiss()
  
X_train_miss, y_train_miss = nr.fit_resample(X_train, y_train.ravel())
  
print('After Undersampling, the shape of train_X: {}'.format(X_train_miss.shape))
print('After Undersampling, the shape of train_y: {} \n'.format(y_train_miss.shape))
  
print("After Undersampling, counts of label '1': {}".format(sum(y_train_miss == 1)))
print("After Undersampling, counts of label '0': {}".format(sum(y_train_miss == 0)))



```
# The NearMiss Algorithm has undersampled the majority instances and made it equal to majority class.
```

 

**2- Apply SVM after applying NearMiss**

In [None]:
from sklearn.svm import SVC
random_state =[10,20,30,40,50,60,70]
for r in random_state:
    svclassifier = SVC(kernel='rbf',random_state=r)
    svclassifier.fit(X_train_miss, y_train_miss)
    y_pred = svclassifier.predict(X_test)
    from sklearn.metrics import classification_report, confusion_matrix
    from sklearn import metrics
    print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
    print('-------------------------------------------------------------------------')
    print(confusion_matrix(y_test,y_pred))
    print(classification_report(y_test,y_pred))
    print('----------------------------------')



```
# The accuarcy slightly increased to 51%
```



**3- Apply SVM-PCA after Applying NearMiss**

In [None]:
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

random_state =[10,20,30,40,50,60,70]
for r in random_state:

    pca = PCA(n_components=15, whiten=True, random_state=r)
    svc = SVC(kernel='rbf', class_weight='balanced')
    model = make_pipeline(pca, svc)

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    scaled_X_train = scaler.fit_transform(X_train_miss)
    scaled_X_test = scaler.transform(X_test)

    from sklearn.model_selection import GridSearchCV
    param_grid = {'svc__C': [1, 5, 10, 50],
                  'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
    grid = GridSearchCV(model, param_grid)

    %time grid.fit(scaled_X_train, y_train_miss)
    print(grid.best_params_)

    model = grid.best_estimator_
    ypred = model.predict(scaled_X_test)
    from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
    from sklearn import metrics
    print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
    print('-------------------------------------------------------------------------')
    print(accuracy_score(y_test,ypred))
    print(confusion_matrix(y_test,ypred))
    print(classification_report(y_test,ypred))
    print('------------------------------------------')

**```
Using SVM with PCA gives us better recall results in both classes and better accuarcy.
```**

**4- Applying RandomForest Classifier after applying NearMiss**

In [None]:
# load library
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
random_state =[10,20,30,40,50,60,70]
for r in random_state:
    rfc = RandomForestClassifier(random_state=r)

    # fit the predictor and target
    rfc.fit(X_train_miss, y_train_miss)

    # predict
    rfc_predict = rfc.predict(X_test)# check performance
    print('ROCAUC score:',roc_auc_score(y_test, rfc_predict))
    print('Accuracy score:',accuracy_score(y_test, rfc_predict))
    print('F1 score:',f1_score(y_test, rfc_predict))
    print('-------------------------------------------------------')



# **LDA, NBayes, Logistic Regression with OverSampling**:




In [None]:
xl = df.drop('In-hospital_death', axis=1)
yl = df['In-hospital_death']

In [None]:
from sklearn.model_selection import train_test_split
Xl_train, Xl_test, yl_train, yl_test = train_test_split(xl, yl, test_size = 0.3, random_state = 0)

In [None]:
# Using SMOTE Algorithm to handle the imbalance of the data

print("Before OverSampling, counts of label '1': {}".format(sum(yl_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(yl_train == 0)))
  
# import SMOTE module from imblearn library
# pip install imblearn (if you don't have imblearn in your system)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 20)
Xl_train_res, yl_train_res = sm.fit_resample(Xl_train, yl_train.ravel())
  
print('After OverSampling, the shape of train_X: {}'.format(Xl_train_res.shape))
print('After OverSampling, the shape of train_y: {} \n'.format(yl_train_res.shape))
  
print("After OverSampling, counts of label '1': {}".format(sum(yl_train_res == 1)))
print("After OverSampling, counts of label '0': {}".format(sum(yl_train_res == 0)))

**1- Logistic regression after oversampling the data (SMOOT) :**

In [None]:
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
xtrain = sc_x.fit_transform(Xl_train_res)
xtest = sc_x.transform(Xl_test)
 
#print (xtrain[0:10, :])
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(Xl_train_res, yl_train_res)

y_pred = classifier.predict(Xl_test)

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(yl_test, y_pred)
 
print ("Confusion Matrix : \n", cm)
from sklearn.metrics import accuracy_score
from sklearn import metrics
print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
print('-------------------------------------------------------------------------')
print ("Accuracy : ", accuracy_score(yl_test, y_pred))



```
# True Positive + True Negative = 837 + 133
  False Positive + False Negative = 187 + 47
  Performance measure – Accuracy 
```



**2- Linear Disciminant Analysis (LDA) after Oversampling the data:**

In [None]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis 
#Fit the LDA model
model = LinearDiscriminantAnalysis()
model.fit(Xl_train_res,yl_train_res)
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import cross_val_score
#Define method to evaluate model
cv = RepeatedStratifiedKFold(n_splits=20, n_repeats=5, random_state=1)

#evaluate model
scores = cross_val_score(model, Xl_train_res, yl_train_res, scoring='accuracy', cv=cv, n_jobs=-1)
print(np.mean(scores))   

**3- Gaussian Naive Bayes after overSampling the data (SMOOT):**

In [None]:
#Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

#Create a Gaussian Classifier
gnb = GaussianNB()

#Train the model using the training sets
gnb.fit(Xl_train_res, yl_train_res)

#Predict the response for test dataset
y_pred = gnb.predict(Xl_test)

#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

print("area under curve (auc): ", metrics.roc_auc_score(yl_test,y_pred))
print('-------------------------------------------------------------------------')
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(yl_test, y_pred))

# **LDA, NBayes, Logistic Regression with UnderSampling**:

In [None]:
xl = df.drop('In-hospital_death', axis=1)
yl = df['In-hospital_death']

In [None]:
from sklearn.model_selection import train_test_split
Xu_train, Xu_test, yu_train, yu_test = train_test_split(xl, yl, test_size = 0.3, random_state = 0)

In [None]:
print("Before Undersampling, counts of label '1': {}".format(sum(yu_train == 1)))
print("Before Undersampling, counts of label '0': {} \n".format(sum(yu_train == 0)))
  
# apply near miss
from imblearn.under_sampling import NearMiss
nr = NearMiss()
  
X_train_miss, y_train_miss = nr.fit_resample(Xu_train, yu_train.ravel())
  
print('After Undersampling, the shape of train_X: {}'.format(X_train_miss.shape))
print('After Undersampling, the shape of train_y: {} \n'.format(y_train_miss.shape))
  
print("After Undersampling, counts of label '1': {}".format(sum(y_train_miss == 1)))
print("After Undersampling, counts of label '0': {}".format(sum(y_train_miss == 0)))

**1- Logistic regression after undersampling the data:**

In [None]:
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
xtrain = sc_x.fit_transform(X_train_miss)
xtest = sc_x.transform(Xu_test)
 
#print (xtrain[0:10, :])
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train_miss, y_train_miss)

y_pred = classifier.predict(Xu_test)

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(yu_test, y_pred)
 
print ("Confusion Matrix : \n", cm)
from sklearn import metrics
print("area under curve (auc): ", metrics.roc_auc_score(yu_test,y_pred))
print('-------------------------------------------------------------------------')
from sklearn.metrics import accuracy_score
print ("Accuracy : ", accuracy_score(yu_test, y_pred))

**2- Linear Disciminant Analysis (LDA) after undersampling :**

In [None]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis 
#Fit the LDA model
model = LinearDiscriminantAnalysis()
model.fit(X_train_miss,y_train_miss)
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import cross_val_score
#Define method to evaluate model
cv = RepeatedStratifiedKFold(n_splits=20, n_repeats=5, random_state=1)

#evaluate model
scores = cross_val_score(model, X_train_miss, y_train_miss, scoring='accuracy', cv=cv, n_jobs=-1)

print(np.mean(scores))   

**3- Gaussian Naive Bayes after underSampling the data (NearMiss):**

In [None]:
#Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

#Create a Gaussian Classifier
gnb = GaussianNB()

#Train the model using the training sets
gnb.fit(X_train_miss, y_train_miss)

#Predict the response for test dataset
y_pred = gnb.predict(Xl_test)

#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
print("area under curve (auc): ", metrics.roc_auc_score(yl_test,y_pred))
print('-------------------------------------------------------------------------')
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(yl_test, y_pred))

# Feature Selection  techniques:

The performance of machine learning algorithms can degrade with too many input variables.

So far the model were trained on all features without focusing on the features 
that might affect directly the rate of Mortaility in the ICU.


**1 - Using ‘Pearson’ method to find the highly correlated features :** 

In [39]:
# Getting the features that are highly correlated with the In-hospital-death 
cor = df.corr(method ='pearson')
#Correlation with output variable
cor_target = abs(cor["In-hospital_death"])
#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.14]
relevant_features

In-hospital_death    1.000000
SOFA                 0.177288
SAPS-I               0.156008
BUN                  0.224256
GCS                  0.221835
FiO2                 0.346808
MechVent             0.280123
Lactate              0.272493
Name: In-hospital_death, dtype: float64

In [40]:
# Creating a set of the 
lst =  df[['In-hospital_death','SOFA','SAPS-I', 'BUN', 'GCS', 'FiO2','MechVent','Lactate']]
df_pearson =lst.copy()

In [41]:
X_p = df_pearson.drop('In-hospital_death', axis=1)
y_p = df_pearson['In-hospital_death']

In [42]:
# Split the data into test and train sets

from sklearn.model_selection import train_test_split
  

Xp_train, Xp_test, yp_train, yp_test = train_test_split(X_p, y_p, test_size = 0.3, random_state = 20)


In [43]:
#  training the model without handling the imbalanced class distribution

from sklearn.svm import SVC
random_state =[10,20,30,40,50,60,70]
for r in random_state:
    rfc = RandomForestClassifier(random_state=r)
    svclassifier = SVC(kernel='rbf')
    svclassifier.fit(Xp_train, yp_train)
    yp_pred = svclassifier.predict(Xp_test)
    from sklearn.metrics import classification_report, confusion_matrix
    from sklearn import metrics
    print("area under curve (auc): ", metrics.roc_auc_score(yp_test,yp_pred))
    print('-------------------------------------------------------------------------')
    print(confusion_matrix(yp_test,yp_pred))
    print(classification_report(yp_test,yp_pred))
    print('----------------------------------------')

area under curve (auc):  0.503125
-------------------------------------------------------------------------
[[1040    0]
 [ 159    1]]
              precision    recall  f1-score   support

           0       0.87      1.00      0.93      1040
           1       1.00      0.01      0.01       160

    accuracy                           0.87      1200
   macro avg       0.93      0.50      0.47      1200
weighted avg       0.89      0.87      0.81      1200

----------------------------------------
area under curve (auc):  0.503125
-------------------------------------------------------------------------
[[1040    0]
 [ 159    1]]
              precision    recall  f1-score   support

           0       0.87      1.00      0.93      1040
           1       1.00      0.01      0.01       160

    accuracy                           0.87      1200
   macro avg       0.93      0.50      0.47      1200
weighted avg       0.89      0.87      0.81      1200

-----------------------------------

In [52]:
# Using SMOTE Algorithm to handle the imbalance of the data

print("Before OverSampling, counts of label '1': {}".format(sum(yp_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(yp_train == 0)))
  
# import SMOTE module from imblearn library
# pip install imblearn (if you don't have imblearn in your system)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 20)
Xp_train_res, yp_train_res = sm.fit_resample(Xp_train, yp_train.ravel())
  
print('After OverSampling, the shape of train_X: {}'.format(Xp_train_res.shape))
print('After OverSampling, the shape of train_y: {} \n'.format(yp_train_res.shape))
  
print("After OverSampling, counts of label '1': {}".format(sum(yp_train_res == 1)))
print("After OverSampling, counts of label '0': {}".format(sum(yp_train_res == 0)))


Before OverSampling, counts of label '1': 394
Before OverSampling, counts of label '0': 2406 

After OverSampling, the shape of train_X: (4812, 7)
After OverSampling, the shape of train_y: (4812,) 

After OverSampling, counts of label '1': 2406
After OverSampling, counts of label '0': 2406


**Repeating SVM after applying SMOTE and Feature Selection(Pearson)**

In [53]:
#Repeating SVM after applying SMOTE and Feature Selection
from sklearn.svm import SVC
random_state =[10,20,30,40,50,60,70]
for r in random_state:
    svclassifier = SVC(kernel='rbf',random_state=r)
    svclassifier.fit(Xp_train_res, yp_train_res)
    yp_pred = svclassifier.predict(Xp_test)
    from sklearn.metrics import classification_report, confusion_matrix
    from sklearn import metrics
    print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
    print('-------------------------------------------------------------------------')
    print(confusion_matrix(yp_test,yp_pred))
    print(classification_report(yp_test,yp_pred))
    print('-----------------------------------------')

area under curve (auc):  0.5718749999999999
-------------------------------------------------------------------------
[[729 311]
 [ 44 116]]
              precision    recall  f1-score   support

           0       0.94      0.70      0.80      1040
           1       0.27      0.72      0.40       160

    accuracy                           0.70      1200
   macro avg       0.61      0.71      0.60      1200
weighted avg       0.85      0.70      0.75      1200

-----------------------------------------
area under curve (auc):  0.5718749999999999
-------------------------------------------------------------------------
[[729 311]
 [ 44 116]]
              precision    recall  f1-score   support

           0       0.94      0.70      0.80      1040
           1       0.27      0.72      0.40       160

    accuracy                           0.70      1200
   macro avg       0.61      0.71      0.60      1200
weighted avg       0.85      0.70      0.75      1200

----------------------

In [54]:
def ANN(Xp_train_res,yp_train_res,Xp_test,yp_test,loss,weights):
    model=keras.Sequential([
        keras.layers.Dense(20,input_shape=(22,),activation='relu'),
        keras.layers.Dense(40,activation='relu'),
       
        keras.layers.Dense(1,activation='sigmoid')            
    ])
    model.compile(optimizer='adam',loss=loss,metrics=['accuracy'])
    
    if weights==-1:
        model.fit(Xp_train_res,y_train,epochs=100)

    else:
        model.fit(Xp_train_res,y_train,epochs=100,class_weight=weights)
        
    print(model.evaluate(Xp_test,yp_test))

    y_pred=model.predict(Xp_test)
    y_pred=np.round(y_pred)
    
    print('Classification Report :\n',classification_report(yp_test,y_pred))

    return y_pred

In [55]:
Xp_train_res.shape, yp_train_res

((4812, 7), array([0, 0, 1, ..., 1, 1, 1]))

In [56]:
y_pred=ANN(Xp_train_res,yp_train_res,Xp_test,y_test,'binary_crossentropy',0)

ValueError: ignored

**Repeating SVM-PCA after applying SMOTE and Feature Selection (Pearson)**

In [None]:
# Repeating SVM-PCA after applying SMOTE and Feature Selection (Pearson)
# Apply SVM-PCA after Applying SMOTE
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
random_state =[10,20,30,40,50,60,70]
for r in random_state:
        
    pca = PCA(n_components=6, whiten=True, random_state=r)
    svc = SVC(kernel='rbf', class_weight='balanced')
    model = make_pipeline(pca, svc)

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    scaled_X_train = scaler.fit_transform(Xp_train_res)
    scaled_X_test = scaler.transform(Xp_test)

    from sklearn.model_selection import GridSearchCV
    param_grid = {'svc__C': [1, 5, 10, 50],
                  'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
    grid = GridSearchCV(model, param_grid)

    %time grid.fit(scaled_X_train, yp_train_res)
    print(grid.best_params_)

    model = grid.best_estimator_
    ypred = model.predict(scaled_X_test)
    from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
    from sklearn import metrics
    print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
    print('-------------------------------------------------------------------------')
    print(accuracy_score(yp_test,ypred))
    print(confusion_matrix(yp_test,ypred))
    print(classification_report(yp_test,ypred))
    print('---------------------------------------')


**Repeating RandomForest after applying SMOTE and Feature Selection(Pearson)**

In [44]:
#Repeating RandomForest after applying SMOTE and Feature Selection(Pearson)
# load library
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
random_state =[10,20,30,40,50,60,70]
for r in random_state:
    rfc = RandomForestClassifier(random_state=r)

    # fit the predictor and target
    rfc.fit(Xp_train_res, yp_train_res)

    # predict
    rfc_predict = rfc.predict(Xp_test)# check performance
    print('ROCAUC score:',roc_auc_score(yp_test, rfc_predict))
    print('Accuracy score:',accuracy_score(yp_test, rfc_predict))
    print('F1 score:',f1_score(yp_test, rfc_predict))
    print('---------------------------------------------')

NameError: ignored

**2- Using 'Kendall' method to find the highly Correlated Features**

In [None]:
# Getting the features that are highly correlated with the In-hospital-death 
cor1 = df.corr(method ='kendall')
#Correlation with output variable
cor1_target = abs(cor1["In-hospital_death"])
#Selecting highly correlated features
relevant_features = cor1_target[cor1_target>0.14]
relevant_features

In [None]:
# Creating a set of the 
lst_1 =  df[['In-hospital_death', 'BUN', 'GCS', 'Urine','FiO2','MechVent','Lactate']]
df_kendall =lst_1.copy()

In [None]:
X_k = df_kendall.drop('In-hospital_death', axis=1)
y_k = df_kendall['In-hospital_death']

In [None]:
# Split the data into test and train sets

from sklearn.model_selection import train_test_split
  

Xk_train, Xk_test, yk_train, yk_test = train_test_split(X_k, y_k, test_size = 0.3, random_state = 20)


In [None]:
#  training the model without handling the imbalanced class distribution

from sklearn.svm import SVC
random_state =[10,20,30,40,50,60,70]
for r in random_state:
    svclassifier = SVC(kernel='rbf',random_state = r)
    svclassifier.fit(Xk_train, yk_train)
    yk_pred = svclassifier.predict(Xk_test)
    from sklearn.metrics import classification_report, confusion_matrix
    from sklearn import metrics
    print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
    print('-------------------------------------------------------------------------')
    print(confusion_matrix(yk_test,yk_pred))
    print(classification_report(yk_test,yk_pred))
    print('----------------------------------------')


```
# The classifier learned nothing as it is more biased towards the majority class

```



In [None]:
# Using SMOTE Algorithm to handle the imbalance of the data

print("Before OverSampling, counts of label '1': {}".format(sum(yk_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(yk_train == 0)))
  
# import SMOTE module from imblearn library
# pip install imblearn (if you don't have imblearn in your system)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 20)
Xk_train_res, yk_train_res = sm.fit_resample(Xk_train, yk_train.ravel())
  
print('After OverSampling, the shape of train_X: {}'.format(Xk_train_res.shape))
print('After OverSampling, the shape of train_y: {} \n'.format(yk_train_res.shape))
  
print("After OverSampling, counts of label '1': {}".format(sum(yk_train_res == 1)))
print("After OverSampling, counts of label '0': {}".format(sum(yk_train_res == 0)))





```
# Repeating SVM after applying SMOTE and Feature Selection(kendall)
```



**Repeating SVM after applying SMOTE and Feature Selection(Kendell)**

In [None]:
#Repeating SVM after applying SMOTE and Feature Selection
from sklearn.svm import SVC
random_state =[10,20,30,40,50,60,70]
for r in random_state:
    svclassifier = SVC(kernel='rbf',random_state=r)
    svclassifier.fit(Xk_train_res, yk_train_res)
    yk_pred = svclassifier.predict(Xk_test)
    from sklearn.metrics import classification_report, confusion_matrix
    from sklearn import metrics
    print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
    print('-------------------------------------------------------------------------')
    print(confusion_matrix(yk_test,yk_pred))
    print(classification_report(yk_test,yk_pred))
    print('-------------------------------------------')

**Repeating SVM-PCA after applying SMOTE and Feature Selection (Kendell)**





In [None]:
# Apply SVM-PCA after Applying SMOTE
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
random_state =[10,20,30,40,50,60,70]
for r in random_state:
    pca = PCA(n_components=6, whiten=True, random_state=r)
    svc = SVC(kernel='rbf', class_weight='balanced')
    model = make_pipeline(pca, svc)

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    scaled_X_train = scaler.fit_transform(Xk_train_res)
    scaled_X_test = scaler.transform(Xk_test)

    from sklearn.model_selection import GridSearchCV
    param_grid = {'svc__C': [1, 5, 10, 50],
                  'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
    grid = GridSearchCV(model, param_grid)

    %time grid.fit(scaled_X_train, yk_train_res)
    print(grid.best_params_)

    model = grid.best_estimator_
    ypred = model.predict(scaled_X_test)
    from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
    from sklearn import metrics
    print("area under curve (auc): ", metrics.roc_auc_score(y_test,y_pred))
    print('-------------------------------------------------------------------------')
    print(accuracy_score(yk_test,ypred))
    print(confusion_matrix(yk_test,ypred))
    print(classification_report(yk_test,ypred))
    print('---------------------------------------------------------------------------')



```
# Repeating RandomForest after applying SMOTE and Feature Selection(Kendell)
```



In [None]:
#Repeating RandomForest after applying SMOTE and Feature Selection(Pearson)
# load library
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
random_state =[10,20,30,40,50,60,70]
for r in random_state:
    rfc = RandomForestClassifier(random_state=r)

    # fit the predictor and target
    rfc.fit(Xk_train_res, yk_train_res)

    # predict
    rfc_predict = rfc.predict(Xk_test)# check performance
    print('ROCAUC score:',roc_auc_score(yk_test, rfc_predict))
    print('Accuracy score:',accuracy_score(yk_test, rfc_predict))
    print('F1 score:',f1_score(yk_test, rfc_predict))
    print('-----------------------------------------------------')


# Investigating the mortaility rate per each ICU-Type



```
# ICUType (1: Coronary Care Unit, 2: Cardiac Surgery Recovery Unit,
           3: Medical ICU, or 4: Surgical ICU)
```



**1 - Cornary Care Unit**

In [None]:
df['ICUType'].value_counts()



```
# Cornary Care Unit : ICU-Type =1
```



In [None]:
plt.figure(figsize = (8, 8))
ax = sns.countplot(df['ICUType'])
plt.title('Distribution of patients among ICUs')
for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
                f"{round(100 * height / len(df), 2)}%",
                ha = 'center')

In [None]:
# splitting the dataframe into 2 parts

df_icu_1 = df[df['ICUType']== 1]


In [None]:
df_icu_1['In-hospital_death'].value_counts()

In [None]:
X_icu1 = df_icu_1.drop('In-hospital_death', axis=1)
y_icu1 = df_icu_1['In-hospital_death']



```
# PCA-SVM -> Stratify
```



In [None]:
# Split the data into test and train sets

from sklearn.model_selection import train_test_split
  

Xicu1_train, Xicu1_test, yicu1_train, yicu1_test = train_test_split(X_icu1,y_icu1 , test_size = 0.3, random_state = 20)


In [None]:
# Apply SVM-PCA 
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

pca = PCA(n_components=4, whiten=True, random_state=20)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_X_train = scaler.fit_transform(Xicu1_train)
scaled_X_test = scaler.transform(Xicu1_test)

from sklearn.model_selection import GridSearchCV
param_grid = {'svc__C': [1, 5, 10, 50],
              'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)

%time grid.fit(scaled_X_train, yicu1_train)
print(grid.best_params_)

model = grid.best_estimator_
ypred = model.predict(scaled_X_test)

In [None]:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
print(accuracy_score(yicu1_test,ypred))
print(confusion_matrix(yicu1_test,ypred))
print(classification_report(yicu1_test,ypred))

In [None]:
sns.heatmap(confusion_matrix(yicu1_test,ypred),annot=True)



```
# Coronary Care Unit(Type1) : stydying the mortai 
```



**2: Cardiac Surgery Recovery Unit**

In [None]:
df_icu_2 = df[df['ICUType']== 2]


In [None]:
X_icu2 = df_icu_2.drop('In-hospital_death', axis=1)
y_icu2 = df_icu_2['In-hospital_death']

In [None]:
# Split the data into test and train sets

from sklearn.model_selection import train_test_split
  

Xicu2_train, Xicu2_test, yicu2_train, yicu2_test = train_test_split(X_icu2,y_icu2 , test_size = 0.3, random_state = 20)


In [None]:
# Apply SVM-PCA 
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

pca = PCA(n_components=4, whiten=True, random_state=20)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_X_train = scaler.fit_transform(Xicu2_train)
scaled_X_test = scaler.transform(Xicu2_test)

from sklearn.model_selection import GridSearchCV
param_grid = {'svc__C': [1, 5, 10, 50],
              'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)

%time grid.fit(scaled_X_train, yicu2_train)
print(grid.best_params_)

model = grid.best_estimator_
ypred = model.predict(scaled_X_test)

In [None]:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
print(accuracy_score(yicu2_test,ypred))
print(confusion_matrix(yicu2_test,ypred))
print(classification_report(yicu2_test,ypred))

**3: Medical ICU**

In [None]:
df_icu_3 = df[df['ICUType']== 3]

In [None]:
X_icu3 = df_icu_3.drop('In-hospital_death', axis=1)
y_icu3 = df_icu_3['In-hospital_death']

In [None]:
# Split the data into test and train sets

from sklearn.model_selection import train_test_split
  

Xicu3_train, Xicu3_test, yicu3_train, yicu3_test = train_test_split(X_icu3,y_icu3 , test_size = 0.3, random_state = 20)


In [None]:
# Using SMOTE Algorithm to handle the imbalance of the data

print("Before OverSampling, counts of label '1': {}".format(sum(yicu3_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(yicu3_train == 0)))
  
# import SMOTE module from imblearn library
# pip install imblearn (if you don't have imblearn in your system)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 20)
Xp3_train_res, yp3_train_res = sm.fit_resample(Xicu3_train, yicu3_train.ravel())
  
print('After OverSampling, the shape of train_X: {}'.format(Xp3_train_res.shape))
print('After OverSampling, the shape of train_y: {} \n'.format(yp3_train_res.shape))
  
print("After OverSampling, counts of label '1': {}".format(sum(yp3_train_res == 1)))
print("After OverSampling, counts of label '0': {}".format(sum(yp3_train_res == 0)))

In [None]:
# Apply SVM-PCA 
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

pca = PCA(n_components=4, whiten=True, random_state=20)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_X_train = scaler.fit_transform(Xp3_train_res)
scaled_X_test = scaler.transform(Xicu3_test)

from sklearn.model_selection import GridSearchCV
param_grid = {'svc__C': [1, 5, 10, 50],
              'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)

%time grid.fit(scaled_X_train, yp3_train_res)
print(grid.best_params_)

model = grid.best_estimator_
ypred = model.predict(scaled_X_test)

In [None]:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
print(accuracy_score(yicu3_test,ypred))
print(confusion_matrix(yicu3_test,ypred))
print(classification_report(yicu3_test,ypred))

**4: Surgical ICU**

In [None]:
df_icu_4 = df[df['ICUType']== 4]

In [None]:
X_icu4 = df_icu_4.drop('In-hospital_death', axis=1)
y_icu4 = df_icu_4['In-hospital_death']

In [None]:
# Split the data into test and train sets

from sklearn.model_selection import train_test_split
  

Xicu4_train, Xicu4_test, yicu4_train, yicu4_test = train_test_split(X_icu4,y_icu4 , test_size = 0.3, random_state = 20)


In [None]:
# Using SMOTE Algorithm to handle the imbalance of the data

print("Before OverSampling, counts of label '1': {}".format(sum(yicu4_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(yicu4_train == 0)))
  
# import SMOTE module from imblearn library
# pip install imblearn (if you don't have imblearn in your system)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 20)
Xp4_train_res, yp4_train_res = sm.fit_resample(Xicu4_train, yicu4_train.ravel())
  
print('After OverSampling, the shape of train_X: {}'.format(Xp4_train_res.shape))
print('After OverSampling, the shape of train_y: {} \n'.format(yp4_train_res.shape))
  
print("After OverSampling, counts of label '1': {}".format(sum(yp4_train_res == 1)))
print("After OverSampling, counts of label '0': {}".format(sum(yp4_train_res == 0)))

In [None]:
# Apply SVM-PCA 
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

pca = PCA(n_components=4, whiten=True, random_state=20)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_X_train = scaler.fit_transform(Xp4_train_res)
scaled_X_test = scaler.transform(Xicu4_test)

from sklearn.model_selection import GridSearchCV
param_grid = {'svc__C': [1, 5, 10, 50],
              'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)

%time grid.fit(scaled_X_train, yp4_train_res)
print(grid.best_params_)

model = grid.best_estimator_
ypred = model.predict(scaled_X_test)

In [None]:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
print(accuracy_score(yicu4_test,ypred))
print(confusion_matrix(yicu4_test,ypred))
print(classification_report(yicu4_test,ypred))



```
# This is formatted as code
```



In [None]:
#https://blog.exsilio.com/all/accuracy-precision-recall-f1-score-interpretation-of-performance-measures/

# stratifying the mortality trends by whether or not a patient was ventilated on ICU day 1,

Abbreviations
APACHE:
Acute Physiology and Chronic Health Evaluation

APS:
Acute Physiology Score

GI:
gastrointestinal

MPM:
Mortality Probability Model

PAC:
post-acute care

SAPS:
Simplified Acute Physiology Score.



```
# Time Series
These 37 variables may be observed once, more than once, or not at all in some cases:

Albumin (g/dL)
ALP [Alkaline phosphatase (IU/L)]
ALT [Alanine transaminase (IU/L)]
AST [Aspartate transaminase (IU/L)]
Bilirubin (mg/dL)
BUN [Blood urea nitrogen (mg/dL)]
Cholesterol (mg/dL)
Creatinine [Serum creatinine (mg/dL)]
DiasABP [Invasive diastolic arterial blood pressure (mmHg)]
FiO2 [Fractional inspired O2 (0-1)]
GCS [Glasgow Coma Score (3-15)]
Glucose [Serum glucose (mg/dL)]
HCO3 [Serum bicarbonate (mmol/L)]
HCT [Hematocrit (%)]
HR [Heart rate (bpm)]
K [Serum potassium (mEq/L)]
Lactate (mmol/L)
Mg [Serum magnesium (mmol/L)]
MAP [Invasive mean arterial blood pressure (mmHg)]
MechVent [Mechanical ventilation respiration (0:false, or 1:true)]
Na [Serum sodium (mEq/L)]
NIDiasABP [Non-invasive diastolic arterial blood pressure (mmHg)]
NIMAP [Non-invasive mean arterial blood pressure (mmHg)]
NISysABP [Non-invasive systolic arterial blood pressure (mmHg)]
PaCO2 [partial pressure of arterial CO2 (mmHg)]
PaO2 [Partial pressure of arterial O2 (mmHg)]
pH [Arterial pH (0-14)]
Platelets (cells/nL)
RespRate [Respiration rate (bpm)]
SaO2 [O2 saturation in hemoglobin (%)]
SysABP [Invasive systolic arterial blood pressure (mmHg)]
Temp [Temperature (°C)]
TropI [Troponin-I (μg/L)]
TropT [Troponin-T (μg/L)]
Urine [Urine output (mL)]
WBC [White blood cell count (cells/nL)]
Weight (kg)*
```



# Studying the patients who needed Mechanical Ventilation

In [None]:
df.columns

In [None]:
df_imp = df[['In-hospital_death','ICUType','Age','Gender','SOFA','SAPS-I','MechVent','FiO2']].copy()



```
# MechVent [Mechanical ventilation respiration (0:false, or 1:true)]
```



In [None]:
df_imp['MechVent'].value_counts()



```
# 67% of the patients used Mechanical ventilation respiration
```



*Tracking the patients who used MechVent :*

In [None]:
# splitting the dataframe into 2 parts

df_MechVent_true =  df[df['MechVent']== 1]


In [None]:
df_MechVent_true.columns

In [None]:
df_MechVent_true['In-hospital_death'].value_counts()

In [None]:
plt.figure(figsize = (8, 8))
ax = sns.countplot(df_MechVent_true['In-hospital_death'])
plt.title('Mortality Distribution in Ventilated Patients')
for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
                f"{round(100 * height / len(df_MechVent_true), 2)}%",
                ha = 'center')



```
# In-hospital death (0: survivor, or 1: died in-hospital)
# 79% of the patients who used the MechVent survived, while 21% were passed away
```



In [None]:
df_MechVent_true['ICUType'].value_counts()



```

ICUType (1: Coronary Care Unit, 2: Cardiac Surgery Recovery Unit,
         3: Medical ICU, or 4: Surgical ICU)

Cardiac (CCU or CTU): Individuals who have had a cardiac emergency,
 like a heart attack or sudden stoppage of their heart, 
 may become a patient in the cardiac ICU. 

Medical (MICU): Patients who require close observation and specialized treatment may be candidates for the medical ICU. 
Common conditions that patients present with include respiratory failure

Surgical (SICU): Patients who need surgery or who are recovering from surgery may be in the surgical ICU.

Cardiac Surgery Recovery Unit (also called the CSRU) are those who have had heart surgery.
```



In [None]:
plt.figure(figsize = (8, 8))
ax = sns.countplot(df_MechVent_true['ICUType'])
plt.title('Distribution of ventilated patients amongdifferent ICUs')
for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
                f"{round(100 * height / len(df_MechVent_true), 2)}%",
                ha = 'center')



```
#  Medical ICU has the highest percentage of ventilated patients. Cardiac Surgey and Surgical are almost the same capacity 
while the coronary care unit has the lowest number of ventilated patients.


```



***Studying the Patients in the Medical ICU who are most probably have a serious respiratory problems :***

In [None]:
# Split the dataset to focus only on patients assigned to (ie:ICUType =1)
df_MechVent_icu1 =  df_MechVent_true[df_MechVent_true['ICUType']== 1].copy()

In [None]:
df_MechVent_icu1.columns

In [None]:
df_MechVent_icu1['In-hospital_death'].value_counts()

In [None]:
plt.figure(figsize = (8, 8))
ax = sns.countplot(df_MechVent_icu1['In-hospital_death'])
plt.title('Mortaility Distribution in Patients who needed  MechVent in Coronary Care Unit')
for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
                f"{round(100 * height / len(df_MechVent_icu1), 2)}%",
                ha = 'center')

In [None]:
df_MechVent_icu1['Gender'].value_counts()

In [None]:
plt.figure(figsize = (8, 8))
ax = sns.countplot(df_MechVent_icu1['Gender'])
plt.title('Percentage of males and females in Medical ICU 1')
for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
                f"{round(100 * height / len(df_MechVent_icu1), 2)}%",
                ha = 'center')

In [None]:
#split data In-Hospital-death -> 0 w 1,  column MAP-> 
df_MechVent_true_0_1 =   df_MechVent_icu1[df_MechVent_icu1['In-hospital_death']== 0]
df_MechVent_true_1_1  =  df_MechVent_icu1[df_MechVent_icu1['In-hospital_death']== 1]

 'SOFA', 'SAPS-I',
       'Weight', 'HR', 'BUN', 'Creatinine', 'GCS', 'Temp', 'HCT', 'Platelets',
       'WBC', 'Na', 'HCO3', 'k', 'Mg', 'Glucose', 'Urine', 'NISysABP',
       'NIDiasABP', 'NIMAP', 'pH', 'PaCO2', 'PaO2', 'DiasABP', 'SysABP', 'MAP',
       'FiO2', 'MechVent', 'Lactate'

In [None]:
#ICU 1
df_MechVent_true_0_1["Lactate"].median()

In [None]:
# ICU 1
df_MechVent_true_1_1["Lactate"].median()

To conclude the results of the coronary care unit, the parameters that were outside the normal medical ranges or with high score of mortaility were:
 [Creatine,  PaCO2, SAPS-I, GCS, WBCs, Lactate, HCT, BUN ] 

In [None]:
# ICU 2
df_MechVent_icu2 =  df_MechVent_true[df_MechVent_true['ICUType']== 2].copy()

In [None]:
df_MechVent_true_0_2 =   df_MechVent_icu2[df_MechVent_icu2['In-hospital_death']== 0]
df_MechVent_true_1_2  =  df_MechVent_icu2[df_MechVent_icu2['In-hospital_death']== 1]

In [None]:
plt.figure(figsize = (8, 8))
ax = sns.countplot(df_MechVent_icu2['In-hospital_death'])
plt.title('Mortaility Distribution in Patients who needed  MechVent in Cardiac Surgey Recovery Unit')
for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
                f"{round(100 * height / len(df_MechVent_icu2), 2)}%",
                ha = 'center')

'SOFA', 'SAPS-I', 'Weight', 'HR', 'BUN', 'Creatinine', 'GCS', 'Temp', 'HCT', 'Platelets', 'WBC', 'Na', 'HCO3', 'k', 'Mg', 'Glucose', 'Urine', 'NISysABP', 'NIDiasABP', 'NIMAP', 'pH', 'PaCO2', 'PaO2', 'DiasABP', 'SysABP', 'MAP', 'FiO2', 'MechVent', 'Lactate'

In [None]:
df_MechVent_true_0_2["Lactate"].median()

In [None]:
df_MechVent_true_1_2["Lactate"].median()

Surgical ICU

In [None]:
df_MechVent_icu4 =  df_MechVent_true[df_MechVent_true['ICUType']== 4].copy()

In [None]:
df_MechVent_true_0_4 =   df_MechVent_icu4[df_MechVent_icu4['In-hospital_death']== 0]
df_MechVent_true_1_4  =  df_MechVent_icu4[df_MechVent_icu4['In-hospital_death']== 1]

In [None]:
plt.figure(figsize = (8, 8))
ax = sns.countplot(df_MechVent_icu4['In-hospital_death'])
plt.title('Mortaility Distribution in Patients who needed  MechVent in Surgical ICU')
for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
                f"{round(100 * height / len(df_MechVent_icu4), 2)}%",
                ha = 'center')

SOFA', 'SAPS-I', 'Weight', 'HR', 'BUN', 'Creatinine', 'GCS', 'Temp', 'HCT', 'Platelets', 'WBC', 'Na', 'HCO3', 'k', 'Mg', 'Glucose', 'Urine', 'NISysABP', 'NIDiasABP', 'NIMAP', 'pH', 'PaCO2', 'PaO2', 'DiasABP', 'SysABP', 'MAP', 'FiO2', 'MechVent', 'Lactate'

In [None]:
df_MechVent_true_0_4["Lactate"].median()

In [None]:
df_MechVent_true_1_4["Lactate"].median()

In [None]:
SOFA, 

In [None]:
# Split the dataset to focus only on patients assigned to the Medical ICU (ie:ICUType =3)
df_MechVent_icu3 =  df_MechVent_true[df_MechVent_true['ICUType']== 3].copy()

In [None]:
df_MechVent_icu3.columns

*Mortaility Distribution in Medical ICU*

In [None]:
df_MechVent_icu3['In-hospital_death'].value_counts()

In [None]:
plt.figure(figsize = (8, 8))
ax = sns.countplot(df_MechVent_icu3['In-hospital_death'])
plt.title('Mortaility Distribution in Patients who needed  MechVent in Medical ICU3')
for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
                f"{round(100 * height / len(df_MechVent_icu3), 2)}%",
                ha = 'center')



```
# 80% of the Patients were survived while 20.81% did not.
```



*Percentage of males and females in Medical ICU*

In [None]:
df_MechVent_icu3['Gender'].value_counts()

In [None]:
plt.figure(figsize = (8, 8))
ax = sns.countplot(df_MechVent_icu3['Gender'])
plt.title('Percentage of males and females in Medical ICU 3')
for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
                f"{round(100 * height / len(df_MechVent_icu3), 2)}%",
                ha = 'center')



```
# Gender (0: female, or 1: male). The percentage of Males are higher than females.
```



**1-Mortaility distribution between males and females in Medical ICU and ventilated:**

In [None]:
Features =['SOFA','FiO2','SAPS-I','MAP','DiasABP']
df_MechVent_icu3

In [None]:
#split data In-Hospital-death -> 0 w 1, mean column MAP-> 
df_MechVent_true_0 =   df_MechVent_icu3[df_MechVent_icu3['In-hospital_death']== 0]
df_MechVent_true_1  =  df_MechVent_icu3[df_MechVent_icu3['In-hospital_death']== 1]



```
# Maximum

SOFA Score	Mortality

0 to 6      	< 10%
7 to 9	     15 - 20%
10 to 12	   40 - 50%
13 to 14	   50 - 60%
15	           > 80%
15 to 24	     > 90%

```



In [None]:
# Mean of SOFA Score in Survived patients
SOFA_mean_0 = df_MechVent_true_0["SOFA"].median()
SOFA_mean_0

In [None]:
# Mean of SOFA Score in non-Survived patients
SOFA_mean_1 = df_MechVent_true_1["SOFA"].median()
SOFA_mean_1

In [None]:
SAPSI_mean_0 = df_MechVent_true_0["SAPS-I"].median()
SAPSI_mean_0

In [None]:
SAPSI_mean_1 = df_MechVent_true_1["SAPS-I"].median()
SAPSI_mean_1



```
#  70 and 100 mm Hg considered to be normal.
mean arterial pressure (MAP).
A MAP in this range indicates that there’s enough consistent pressure in your arteries to deliver blood throughout your body.


```



In [None]:
# Mean arterial Pressure in Survived patients
MAP_mean_0 = df_MechVent_true_0["MAP"].median()
MAP_mean_0

In [None]:
# Mean arterial Pressure in Non-Survived patients
MAP_mean_1 = df_MechVent_true_1["MAP"].median()
MAP_mean_1



```
# A BUN test can reveal whether your urea nitrogen levels are higher than normal, suggesting that your kidneys may not be working properly.
around 6 to 24 mg/dL is considered to be normal
```



In [None]:
BUN__0 = df_MechVent_true_0["BUN"].median()
BUN__0

In [None]:
BUN_mean_1 = df_MechVent_true_1["BUN"].median()
BUN_mean_1

In [None]:
Temp_mean_0 =df_MechVent_true_0["Temp"].median()
Temp_mean_0

In [None]:
Temp_mean_1 =df_MechVent_true_1["Temp"].median()
Temp_mean_1



```
# A normal platelet count ranges from 150,000 to 450,000 platelets per microliter of blood. Having more than 450,000 platelets is a 
condition called thrombocytosis; having less than 150,000 is known as thrombocytopenia
```



In [None]:
Platelets_mean_0 =df_MechVent_true_0["Platelets"].median()
Platelets_mean_0


In [None]:
Platelets_1 =df_MechVent_true_1["Platelets"].median()
Platelets_1

In [None]:
df_MechVent_true_1.columns

 
 
 

*   Creatine : 0.7 to 1.3 mg/dL , All patients had normal ranges
*   HR       : between 60 and 100bpm. Normal in both
*   BUN      : Abnormal in died patients
*   
*  Glasgow Coma Scale (GCS) :An initial score of less than 5 is associated with an 80% chance of being in a lasting vegetative state or death. An initial score of greater than 11 is associated with 90% chance of recovery. 

Temp : normal in both
HCT and Platelets same ranges in both categories

WBC (White blood cells ) is higher than normal range in died patients while inside the normal range on those who survived


NA(Sodium) : Normal in both category
HCO3 : Normal in both
K : normal in both
Mg : normal in both
Glucose Both had abnormal ranges but the died are much higher

Urine no data about the fluid in take so it can not be a fair comparison
NIMap: Normal in Both
PH: normal inboth

Partial pressure of carbon dioxide (PaCO2): 38 to 42 mm Hg
below normal in died

Partial pressure of oxygen (PaO2) :75 -> 100 abnormal in both

Lactate normal in Both



In [None]:
GCS_1 =df_MechVent_true_1["GCS"].median()
GCS_1

In [None]:
GCS_0=df_MechVent_true_0["GCS"].median()
GCS_0

In [None]:
# HIGH WBC in died 
df_MechVent_true_1["WBC"].median()




```
# HIGH WBC COUNT

A higher than normal WBC count is called leukocytosis. It may be due to:

Certain drugs or medicines (see list below)
Cigarette smoking
After spleen removal surgery
Infections, most often those caused by bacteria
Inflammatory disease (such as rheumatoid arthritis or allergy)
Leukemia or Hodgkin disease
Tissue damage (for example, burns)
Pregnancy
```



In [None]:
# Normal range in Survived
df_MechVent_true_0["WBC"].median()

In [None]:
df_MechVent_true_1["Glucose"].median()

In [None]:
df_MechVent_true_0["Glucose"].median()

In [None]:
# 
df_MechVent_true_1["HR"].median()

In [None]:
df_MechVent_true_0["HR"].median()

**SVM-PCA after SMOOT of the patients in the medical ICU only and selecting the features that shows abnormal ranges in the died patients and normal on the survived ones:**

In [None]:
# Creating a set of the 
lst_2 =  df_MechVent_icu3[['In-hospital_death','PaCO2', 'BUN', 'GCS', 'WBC','HCT','Lactate']]
df_imp1 =lst_2.copy()

In [None]:
X_imp1 = df_imp1.drop('In-hospital_death', axis=1)
y_imp1 = df_imp1['In-hospital_death']

In [None]:
# Split the data into test and train sets

from sklearn.model_selection import train_test_split
  

X_train, X_test, y_train, y_test = train_test_split(X_imp1, y_imp1, test_size = 0.3, random_state = 20)

In [None]:
# Using SMOTE Algorithm to handle the imbalance of the data

print("Before OverSampling, counts of label '1': {}".format(sum(y_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train == 0)))
  
# import SMOTE module from imblearn library
# pip install imblearn (if you don't have imblearn in your system)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 20)
X_train_res, y_train_res = sm.fit_resample(X_train, y_train.ravel())
  
print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape))
print('After OverSampling, the shape of train_y: {} \n'.format(y_train_res.shape))
  
print("After OverSampling, counts of label '1': {}".format(sum(y_train_res == 1)))
print("After OverSampling, counts of label '0': {}".format(sum(y_train_res == 0)))

In [None]:
# Apply SVM-PCA after Applying SMOTE
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

        
pca = PCA(n_components=3, whiten=True, random_state=20)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_X_train = scaler.fit_transform(X_train_res)
scaled_X_test = scaler.transform(X_test)

from sklearn.model_selection import GridSearchCV
param_grid = {'svc__C': [1, 5, 10, 50],
            'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)

%time grid.fit(scaled_X_train, y_train_res)
print(grid.best_params_)
model = grid.best_estimator_
ypred = model.predict(scaled_X_test)
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
from sklearn import metrics
print("area under curve (auc): ", metrics.roc_auc_score(y_test,ypred))
print('-------------------------------------------------------------------------')
print(accuracy_score(y_test,ypred))
print(confusion_matrix(y_test,ypred))
print(classification_report(y_test,ypred))
print('------------------------------------------------------------------------------')


Support Vector Machines (Kernels)

C parameter: Controls trade-off between classifying training points correctly and having a smooth decision boundary.
Small C (loose) makes the cost (penalty) of misclassification low (soft margin)
Large C (strict) makes the cost of misclassification high (hard margin), forcing the model to explain input data stricter and potentially over its
gamma parameter: Controls how far the influence of a single training set reaches.
Large gamma: close reach (closer data points have high weight)
Small gamma: far reach (more generalized solution)
degree parameter: Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.
Grid search is a popular way to find the right hyper-parameter values. Performing a large grid search first, then a refined grid search centred on the best results is frequently faster. Knowing what each hyper-parameter does can also help you identify the right part of the hyper-parameter space to search for.

In [None]:
# Studying patients at ICU 

# Investigating False Negative Analysis:

In [None]:
# Apply SVM-PCA after Applying SMOTE
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
random_state =[10,20,30,40,50,60,70]
for r in random_state:
    pca = PCA(n_components=6, whiten=True, random_state=20)
    svc = SVC(kernel='rbf', class_weight='balanced')
    model = make_pipeline(pca, svc)

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    scaled_X_train = scaler.fit_transform(X_train)
    scaled_X_test = scaler.transform(X_test)

    from sklearn.model_selection import GridSearchCV
    param_grid = {'svc__C': [1],
                  'svc__gamma': [0.0001]}
    grid = GridSearchCV(model, param_grid)

    %time grid.fit(scaled_X_train, y_train)
    print(grid.best_params_)

    model = grid.best_estimator_
    ypred = model.predict(scaled_X_test)
    from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
    print(accuracy_score(y_test,ypred))
    print(confusion_matrix(y_test,ypred))
    print(classification_report(y_test,ypred))
    print('---------------------------------------------------------------------------')

In [None]:
fn = X_test[(y_test == 1) & (ypred[:] == 0)]

In [None]:
fp = X_test[(y_test == 0) & (ypred[:] == 1)]

In [None]:
len(fn)

In [None]:
fn.head(25)

In [None]:
fp.head(25)

In [None]:
filter_fn = fn[['BUN','GCS','Urine','FiO2','MechVent','Lactate']]
filter_fn.corr()
#sns.heatmap(filter_fn)

In [None]:
sns.scatterplot(data=fn, x="BUN", y="GCS")

In [None]:
sns.scatterplot(data=df, x="FiO2", y="GCS")

The higher the FiO2 the higher the GCS

In [None]:
sns.scatterplot(data=fn, x="FiO2", y="GCS")

In [None]:
sns.scatterplot(data=fp, x="FiO2", y="GCS")

In [None]:
dead_df = df[df['In-hospital_death'] == 1]
dead_df.describe()

In [None]:
sns.scatterplot(data=dead_df, x="FiO2", y="GCS")

In [None]:
dead_df['ICUType'].value_counts()

In [None]:
type(y_test)

In [None]:
dead_test_df = y_test[y_test == 1]
dead_test_df.index

In [None]:
Filter_x_test_dead  = X_test[X_test.index.isin(dead_test_df.index)]

In [None]:
x_dead_icu_count = Filter_x_test_dead['ICUType'].value_counts()
x_dead_icu_count.rename('ICUType_Before', inplace=True)
x_dead_icu_count

In [None]:
fn_icu_count = fn['ICUType'].value_counts()
fn_icu_count.rename('ICUType_After', inplace=True)
fn_icu_count

In [None]:
before_after = pd.concat([x_dead_icu_count, fn_icu_count], axis=1)
before_after['Percentage'] = (before_after['ICUType_After'] / before_after['ICUType_Before']) * 100
before_after

In [None]:
before_after.index = before_after.index.astype("int")
sns.barplot(before_after.index, y="Percentage", data=before_after, order=before_after.sort_values('Percentage',ascending = False).index)

In [None]:
x_icu4 = df[df['ICUType'] == 4]
x_icu4.head()

In [None]:
x_test_icu4 = Filter_x_test_dead[Filter_x_test_dead['ICUType'] == 4]
x_test_icu4.head()

In [None]:
fn_icu4 = fn[fn['ICUType'] == 4]
fn_icu4.head()

In [None]:
x_icu4.describe()

In [None]:
x_test_icu4.describe()

In [None]:
fn_icu4.describe()

In [None]:
survived_df = df[df['In-hospital_death'] == 0]
survived_df.describe()

In [None]:
fp.describe()

In [None]:
fn.describe()

In [None]:
len(fn)

In [None]:
Both have SOFA 8 - > 10 to 15% of Mortaility
SAME 



Same                   Normal MAP win both
                            Normal Glucose 
                              PH, Mg

SOFA
SAPS



BUN, WBCs, PACO2 outside normal range died patients
PaO2 outside normal range in the surveyed and normal in the dead
Both Moderate GCS
Both abnormal lactate level, HCT
