# About the data

“How many beds should an ICU provide to serve the population?” 
“Which is the most critical body information to monitor in an ICU?”
The data consist of records from 12,000 ICU stays. All patients were adults who were admitted for a wide variety of reasons to cardiac, medical, surgical, and trauma ICUs. ICU stays of less than 48 hours have been excluded.


# General Descriptors

These six descriptors are collected at the time the patient is admitted to the ICU. Their associated time-stamps are set to 00:00 (thus they appear at the beginning of each patient's record).

      RecordID (a unique integer for each ICU stay)
      Age (years)
      Gender (0: female, or 1: male)
      Height (cm)
      ICUType (1: Coronary Care Unit, 2: Cardiac Surgery Recovery Unit,
               3: Medical ICU, or 4: Surgical ICU)
      Weight (kg)*

      * Weight appears both here and in the time series list below.
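Because the general descriptors carry a 00:00 timestamp, they can be separated from the time series by filtering on that timestamp and on the descriptor names. The sketch below assumes a long-format record with hypothetical `Time`, `Parameter`, and `Value` columns and made-up values. It pivots the 00:00 descriptor rows into one row per stay.

```python
import pandas as pd

# Hypothetical long-format record for one ICU stay (all values are made up).
record = pd.DataFrame({
    "RecordID": [1] * 7,
    "Time": ["00:00", "00:00", "00:00", "00:00", "00:07", "00:07", "01:37"],
    "Parameter": ["Age", "Gender", "ICUType", "Weight", "HR", "Temp", "HR"],
    "Value": [54.0, 0.0, 4.0, 76.0, 73.0, 35.1, 77.0],
})

# Descriptor names (RecordID is already a column here, so it is not listed).
DESCRIPTORS = ["Age", "Gender", "Height", "ICUType", "Weight"]

# A row is a descriptor if it is stamped 00:00 AND names a descriptor;
# a 00:00 Weight is treated as the descriptor value in this sketch.
is_descriptor = (record["Time"] == "00:00") & record["Parameter"].isin(DESCRIPTORS)

# One row per stay, one column per descriptor.
descriptors = record[is_descriptor].pivot(index="RecordID", columns="Parameter", values="Value")

# Everything else is time-series data.
time_series = record[~is_descriptor]
print(descriptors)
print(time_series)
```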

Severity scores:

      Acute Physiology and Chronic Health Evaluation (APACHE)
      Simplified Acute Physiology Score (SAPS)
      Sequential Organ Failure Assessment (SOFA)

# Time Series
These 37 variables may be observed once, more than once, or not at all in some cases:


    1.      Albumin (g/dL)
    2.      ALP [Alkaline phosphatase (IU/L)]
    3.      ALT [Alanine transaminase (IU/L)]
    4.      AST [Aspartate transaminase (IU/L)]
    5.      Bilirubin (mg/dL)
    6.      BUN [Blood urea nitrogen (mg/dL)]
    7.      Cholesterol (mg/dL)
    8.      Creatinine [Serum creatinine (mg/dL)]
    9.      DiasABP [Invasive diastolic arterial blood pressure (mmHg)]
    10.     FiO2 [Fractional inspired O2 (0-1)]
    11.     GCS [Glasgow Coma Score (3-15)]
    12.     Glucose [Serum glucose (mg/dL)]
    13.     HCO3 [Serum bicarbonate (mmol/L)]
    14.     HCT [Hematocrit (%)]
    15.     HR [Heart rate (bpm)]
    16.     K [Serum potassium (mEq/L)]
    17.     Lactate (mmol/L)
    18.     Mg [Serum magnesium (mmol/L)]
    19.     MAP [Invasive mean arterial blood pressure (mmHg)]
    20.     MechVent [Mechanical ventilation respiration (0:false, or 1:true)]
    21.     Na [Serum sodium (mEq/L)]
    22.     NIDiasABP [Non-invasive diastolic arterial blood pressure (mmHg)]
    23.     NIMAP [Non-invasive mean arterial blood pressure (mmHg)]
    24.     NISysABP [Non-invasive systolic arterial blood pressure (mmHg)]
    25.     PaCO2 [Partial pressure of arterial CO2 (mmHg)]
    26.     PaO2 [Partial pressure of arterial O2 (mmHg)]
    27.     pH [Arterial pH (0-14)]
    28.     Platelets (cells/nL)
    29.     RespRate [Respiration rate (bpm)]
    30.     SaO2 [O2 saturation in hemoglobin (%)]
    31.     SysABP [Invasive systolic arterial blood pressure (mmHg)]
    32.     Temp [Temperature (°C)]
    33.     TropI [Troponin-I (μg/L)]
    34.     TropT [Troponin-T (μg/L)]
    35.     Urine [Urine output (mL)]
    36.     WBC [White blood cell count (cells/nL)]
    37.     Weight (kg)*
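Since each variable may be observed once, more than once, or not at all, a quick way to see this per stay is to count readings per (RecordID, Parameter) pair and unstack with zeros for unobserved variables. A minimal sketch on a made-up long-format table with the `RecordID`, `Parameter`, and `Value` columns used later in this notebook:

```python
import pandas as pd

# Hypothetical long-format readings; values are made up.
readings = pd.DataFrame({
    "RecordID":  [1, 1, 1, 2, 2],
    "Parameter": ["HR", "HR", "Lactate", "HR", "Temp"],
    "Value":     [80.0, 85.0, 1.9, 102.0, 37.2],
})

# Number of observations of each variable per stay; 0 means "not observed at all".
obs_counts = (
    readings.groupby(["RecordID", "Parameter"]).size()
    .unstack(fill_value=0)
)
print(obs_counts)
```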





# ICU admission characteristics and mortality rates

In [1]:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
import os


In [2]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

root_dir = "/content/gdrive/My Drive/Colab Notebooks/ICU Project/"
directory = root_dir + 'assets/'

In [None]:
info_df = pd.read_csv(directory + 'info.csv')
readings_df = pd.read_csv(directory + 'readings.csv')
outcomes_df = pd.read_csv(directory + 'outcomes.csv')


In [None]:
readings_df.columns

In [None]:
outcomes_df.info()

In [None]:
readings_df["Parameter"].value_counts()

In [None]:
#readings_df.groupby("RecordID")[['HR']].mean()

# Exploring the Dataset

In [None]:
outcomes_df.head()

In [None]:
outcomes_df.shape


In [None]:
info_df.head()

In [None]:
info_df['Age'].min()

In [None]:
info_df['Age'].max()

In [None]:
readings_df.head()

In [None]:
df_merge = info_df.merge(outcomes_df)

In [None]:
df_merge.shape

In [None]:
df_merge.isnull().sum()

In [None]:
df_merge.isnull().sum() / len(df_merge)

In [None]:
# df_merge joins info_df and outcomes_df on their common columns, for use in the visualizations
df_merge = info_df.merge(outcomes_df)

In [None]:
df_merge.columns

In [None]:
df_merge.info()

In [None]:
df_merge.describe()

In [None]:
# How often each value occurs in a column:
# comparing patient counts across ICU types
df_merge["ICUType"].value_counts()

In [None]:
# Number of survivors (0) and in-hospital deaths (1)
df_merge["In-hospital_death"].value_counts()

In [None]:
#  0: female, or 1: male
df_merge["Gender"].value_counts()

In [None]:
readings_df.columns

In [None]:
# merge In-hospital with reading data
# Feature Correlation with the In-hospital_death
#print('Feature Correlation with target:')
#df_merge.corr()['In-hospital_death']

# Data Cleaning

In [None]:
df_merge['Gender'].value_counts()

In [None]:
# Gender is -1 where it was not recorded; drop those records
df_merge.drop(df_merge.index[df_merge['Gender'] == -1], inplace = True)
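Dropping rows works for Gender, but a -1 sentinel in numeric descriptors such as Height or Weight would silently distort means. A minimal alternative sketch, assuming -1 is the missing-value code in those columns too (the Gender filter suggests this, but it is an assumption here), converts the sentinel to NaN so pandas treats it as missing. The table is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical descriptor table; -1 is ASSUMED to mean "not recorded".
descriptors = pd.DataFrame({
    "RecordID": [1, 2, 3],
    "Gender": [0, -1, 1],
    "Height": [165.1, -1.0, 180.3],
    "Weight": [-1.0, 70.0, 82.5],
})

cols = ["Gender", "Height", "Weight"]
# Replace the sentinel with NaN so .mean(), .isnull(), fillna(), etc. handle it correctly.
descriptors[cols] = descriptors[cols].replace(-1, np.nan)
print(descriptors.isnull().sum())
```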

In [None]:
df_merge["ICUType"].values

In [None]:
df_merge["Age"].values

Range of Age:

In [None]:
df_merge["Age"].min()

In [None]:
df_merge["Age"].max()

Data Visualizations:

Mortality Distribution:

In-hospital death (0: survivor, or 1: died in-hospital)

In [None]:
df_merge['In-hospital_death'].value_counts()

In [None]:
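plt.figure(figsize = (8, 8))
# Pass the column name via x= (passing a Series positionally is deprecated in recent seaborn)
ax = sns.countplot(x = 'In-hospital_death', data = df_merge)
plt.title('Mortality Distribution')
# Label each bar with its share of all patients
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
            f"{round(100 * height / len(df_merge), 2)}%",
            ha = 'center')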
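The same percentages can be obtained without a plot: `value_counts(normalize=True)` returns proportions instead of counts. A short sketch on a made-up outcome column:

```python
import pandas as pd

# Hypothetical outcome column (0: survivor, 1: died in-hospital); values are made up.
outcome = pd.Series([0, 0, 0, 1, 0, 1, 0, 0], name="In-hospital_death")

# normalize=True divides each count by the total number of rows.
mortality_pct = (outcome.value_counts(normalize=True) * 100).round(2)
print(mortality_pct)
```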

In [None]:
#!pip install -U pandasql
#from pandasql import sqldf 
#mysql = lambda q: sqldf(q, globals())
#mysql("SELECT Gender FROM info_df LIMIT 5;")
#'''
#SELECT m.Gender, b.'In-hospital_death'
#FROM info_df AS m
#INNER JOIN 
##outcomes_df AS b
#ON m.RecordID = b.RecordID;
#'''
#mysql(query)

Mortality Rate Based on Gender:

In [None]:
plt.figure(figsize = (8, 8))
plt.title('Mortality by Gender')
ax = sns.countplot(x = 'Gender', data = df_merge, hue = 'In-hospital_death')
# Label each bar with its share of ALL patients (not the mortality rate within a gender)
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
            f"{round(100 * height / len(df_merge), 2)}%",
            ha = 'center')
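The bar labels in the gender plot divide each count by the total number of patients, so they show each bar's share of the whole cohort. The mortality rate within each gender can be read off a groupby mean, because the outcome is coded 0/1 and the mean of a 0/1 column is the fraction of 1s. A sketch on a made-up stand-in for df_merge:

```python
import pandas as pd

# Hypothetical stand-in for df_merge (values are made up).
df = pd.DataFrame({
    "Gender": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "In-hospital_death": [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
})

# Mean of the 0/1 outcome per group = in-hospital mortality rate per group (in %).
rate_by_gender = df.groupby("Gender")["In-hospital_death"].mean() * 100
print(rate_by_gender.round(1))
```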

Distribution across ICU types: number of patients per ICU type

In [None]:
# One plot per cell, so no half-empty subplot grid is needed
plt.figure(figsize = (9, 8))
plt.title('ICU Type Distribution')
ax = sns.countplot(x = 'ICUType', data = df_merge)
# Label each bar with its share of all patients
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
            f"{round(100 * height / len(df_merge), 2)}%",
            ha = 'center')


Percentage of in-hospital deaths by ICU type

In-hospital death (0: survivor, or 1: died in-hospital) 

In [None]:
# One plot per cell, so no half-empty subplot grid is needed
plt.figure(figsize = (9, 8))
plt.title('ICU Type Distribution by Mortality')
ax = sns.countplot(x = 'ICUType', data = df_merge, hue = 'In-hospital_death')
# Label each bar with its share of all patients
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x() + p.get_width() / 2.0, height + 3,
            f"{round(100 * height / len(df_merge), 2)}%",
            ha = 'center')
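To compare ICU types on the same scale, `pd.crosstab` with `normalize="index"` gives the percentage of survivors and deaths within each ICU type, so each row sums to 100. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical stand-in for df_merge (values are made up).
df = pd.DataFrame({
    "ICUType": [1, 1, 2, 2, 2, 3, 3, 4],
    "In-hospital_death": [0, 1, 0, 0, 0, 1, 1, 0],
})

# Rows: ICU type; columns: outcome (0/1); each row is normalized to 100%.
pct = pd.crosstab(df["ICUType"], df["In-hospital_death"], normalize="index") * 100
print(pct.round(1))
```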

In [None]:
readings_df.info()

In [None]:
readings_df.head(50)

In [None]:
readings_df['RecordID'].value_counts()

In [None]:
readings_df.isnull().sum()

In [None]:
outcomes_df.shape

Computing the mean of each patient's readings in a new dataset:

In [None]:
# The 37 time-series parameter names. A list keeps the original order, and the names
# must match readings_df['Parameter'] exactly (no trailing spaces).
data = ['Albumin', 'ALP', 'ALT', 'AST', 'Bilirubin', 'BUN', 'Cholesterol',
        'Creatinine', 'DiasABP', 'FiO2', 'GCS', 'Glucose', 'HCO3', 'HCT',
        'HR', 'K', 'Lactate', 'Mg', 'MAP', 'MechVent', 'Na', 'NIDiasABP',
        'NIMAP', 'NISysABP', 'PaCO2', 'PaO2', 'pH', 'Platelets', 'RespRate',
        'SaO2', 'SysABP', 'Temp', 'TropI', 'TropT', 'Urine', 'WBC', 'Weight']




# Aya: Creating a dataset of the mean, min, and max value of every parameter for each patient
##### 2 - Attempt to reduce the million records to 4,000 records (one per patient) #####

In [None]:
readings_df_copy = pd.read_csv(directory + 'readings.csv')


# grouped_multiple: a dataset containing the mean, min, and max values per patient per parameter
Saved to a CSV file named dfmeans_values_Aya.csv.

In [None]:
grouped_multiple = readings_df_copy.groupby(['RecordID', 'Parameter']).agg({'Value': ['mean', 'min', 'max']})
grouped_multiple.columns = ['mean', 'min', 'max']
grouped_multiple = grouped_multiple.reset_index()
print(grouped_multiple)
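An equivalent, slightly shorter form selects the `Value` column before aggregating, which yields flat `mean`/`min`/`max` columns directly instead of a MultiIndex that must be renamed. A sketch on made-up readings:

```python
import pandas as pd

# Hypothetical readings (values are made up).
readings = pd.DataFrame({
    "RecordID":  [1, 1, 1, 2, 2],
    "Parameter": ["HR", "HR", "HR", "HR", "Temp"],
    "Value":     [80.0, 90.0, 100.0, 70.0, 37.0],
})

# Aggregating a single column with a list of functions gives one flat column per function.
summary = (
    readings.groupby(["RecordID", "Parameter"])["Value"]
    .agg(["mean", "min", "max"])
    .reset_index()
)
print(summary)
```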

In [None]:
grouped_multiple.to_csv('dfmeans_values_Aya.csv')

In [None]:
grouped_multiple.isnull().sum()

In [None]:
# How many entries are there for each Parameter?
# Each row is one patient-parameter pair, so this counts the patients with at least one reading
grouped_multiple['Parameter'].value_counts()

In [None]:
# Preview without the min/max columns (not assigned, so grouped_multiple keeps them)
grouped_multiple.drop(['min', 'max'], axis=1)

In [None]:
grouped_multiple.isnull().sum()

Creating a new dataset, grouped_multiple_less_2000, that excludes parameters missing for more than 50% of the patients:

In [None]:
# Threshold: a parameter measured for fewer than 2,000 patients (i.e. missing for more than
# 50% of the 4,000 patients) is ignored
# Parameters to exclude, spelled as in the Parameter column (TropI/TropT, not TroponinI/TroponinT)
values = ['SaO2', 'AST', 'ALT', 'Bilirubin', 'ALP', 'Albumin', 'RespRate', 'TropT', 'Cholesterol', 'TropI']

# Keep only the rows whose Parameter is NOT in the list
grouped_multiple_less_2000 = grouped_multiple[~grouped_multiple['Parameter'].isin(values)]
grouped_multiple_less_2000
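Rather than hard-coding the excluded names, the list can be derived from the per-parameter counts, since grouped_multiple has one row per patient and parameter. A minimal sketch on a made-up stand-in table, with the threshold set (as an assumption) to half the number of distinct patients:

```python
import pandas as pd

# Hypothetical stand-in for grouped_multiple: one row per (RecordID, Parameter).
grouped = pd.DataFrame({
    "RecordID":  [1, 2, 3, 4, 1, 2, 3, 1],
    "Parameter": ["HR", "HR", "HR", "HR", "Lactate", "Lactate", "Lactate", "TropI"],
    "mean":      [80.0, 95.0, 72.0, 88.0, 2.1, 1.4, 3.0, 0.3],
})

n_patients = grouped["RecordID"].nunique()
threshold = 0.5 * n_patients  # keep parameters measured for at least half the patients

# value_counts() = number of patients with at least one reading of each parameter.
patients_per_param = grouped["Parameter"].value_counts()
sparse_params = patients_per_param[patients_per_param < threshold].index

kept = grouped[~grouped["Parameter"].isin(sparse_params)]
print(list(sparse_params), sorted(kept["Parameter"].unique()))
```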

In [None]:
grouped_multiple_less_2000['Parameter'].value_counts()

In [None]:
# Data Set of Weight
grouped_multiple_Weight = grouped_multiple[grouped_multiple.Parameter == 'Weight'  ]
grouped_multiple_Weight.rename(columns = {'mean':'Weight'}, inplace = True)
grouped_multiple_Weight = grouped_multiple_Weight.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_Weight.isnull().sum())

In [None]:
isinstance(grouped_multiple_Weight, pd.DataFrame)

In [None]:
grouped_multiple.loc[grouped_multiple.Parameter == 'HR']

In [None]:
# Data Set of HR 
grouped_multiple_HR  = grouped_multiple.loc[grouped_multiple.Parameter == 'HR']
grouped_multiple_HR.rename(columns = {'mean':'HR'}, inplace = True)
grouped_multiple_HR = grouped_multiple_HR.drop(['Parameter','min','max'], axis=1)
grouped_multiple_HR.shape

In [None]:
# Data Set of BUN
grouped_multiple_BUN  = grouped_multiple[grouped_multiple.Parameter == 'BUN']
grouped_multiple_BUN.rename(columns = {'mean':'BUN'}, inplace = True)
grouped_multiple_BUN= grouped_multiple_BUN.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_BUN.isnull().sum())

In [None]:
# Data Set of Creatinine
grouped_multiple_Creatinine  = grouped_multiple[grouped_multiple.Parameter == 'Creatinine']
grouped_multiple_Creatinine.rename(columns = {'mean':'Creatinine'}, inplace = True)
grouped_multiple_Creatinine = grouped_multiple_Creatinine.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_Creatinine.isnull().sum())

In [None]:
# Data Set of GCS
grouped_multiple_GCS  = grouped_multiple[grouped_multiple.Parameter == 'GCS']
grouped_multiple_GCS.rename(columns = {'mean':'GCS'}, inplace = True)
grouped_multiple_GCS = grouped_multiple_GCS.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_GCS.isnull().sum())

In [None]:
# Data Set of Temp
grouped_multiple_Temp  = grouped_multiple[grouped_multiple.Parameter == 'Temp']
grouped_multiple_Temp.rename(columns = {'mean':'Temp'}, inplace = True)
grouped_multiple_Temp = grouped_multiple_Temp.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_Temp.isnull().sum())

In [None]:
# Data Set of HCT
grouped_multiple_HCT  = grouped_multiple[grouped_multiple.Parameter == 'HCT']
grouped_multiple_HCT.rename(columns = {'mean':'HCT'}, inplace = True)
grouped_multiple_HCT = grouped_multiple_HCT.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_HCT.isnull().sum())

In [None]:
# Data Set of Platelets
grouped_multiple_Platelets  = grouped_multiple[grouped_multiple.Parameter == 'Platelets']
grouped_multiple_Platelets.rename(columns = {'mean':'Platelets'}, inplace = True)
grouped_multiple_Platelets = grouped_multiple_Platelets.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_Platelets.isnull().sum())

In [None]:
# Data Set of WBC
grouped_multiple_WBC  = grouped_multiple[grouped_multiple.Parameter == 'WBC'  ]
grouped_multiple_WBC.rename(columns = {'mean':'WBC'}, inplace = True)
grouped_multiple_WBC = grouped_multiple_WBC.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_WBC.isnull().sum())

In [None]:
# Data Set of Na
grouped_multiple_Na  = grouped_multiple[grouped_multiple.Parameter == 'Na']
grouped_multiple_Na.rename(columns = {'mean':'Na'}, inplace = True)
grouped_multiple_Na = grouped_multiple_Na.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_Na.isnull().sum())

In [None]:
# Data Set of HCO3 
#grouped_multiple.loc[grouped_multiple.Parameter == 'HCO3']
grouped_multiple_HCO3   = grouped_multiple[grouped_multiple.Parameter == 'HCO3']
grouped_multiple_HCO3.rename(columns = {'mean':'HCO3'}, inplace = True)
grouped_multiple_HCO3 = grouped_multiple_HCO3.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_HCO3.isnull().sum())

In [None]:
# Data Set of K (serum potassium), stored in a column named 'k'
grouped_multiple_k   = grouped_multiple.loc[grouped_multiple.Parameter == 'K']
grouped_multiple_k.rename(columns = {'mean':'k'}, inplace = True)
grouped_multiple_k = grouped_multiple_k.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_k.isnull().sum())

In [None]:
# Data Set of Mg
grouped_multiple_Mg   = grouped_multiple[grouped_multiple.Parameter == 'Mg']
grouped_multiple_Mg.rename(columns = {'mean':'Mg'}, inplace = True)
grouped_multiple_Mg = grouped_multiple_Mg.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_Mg.isnull().sum())

In [None]:
# Data Set of Glucose
grouped_multiple_Glucose   = grouped_multiple[grouped_multiple.Parameter == 'Glucose']
grouped_multiple_Glucose.rename(columns = {'mean':'Glucose'}, inplace = True)
grouped_multiple_Glucose= grouped_multiple_Glucose.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_Glucose.isnull().sum())

In [None]:
# Data Set of Urine
grouped_multiple_Urine   = grouped_multiple[grouped_multiple.Parameter == 'Urine']
grouped_multiple_Urine.rename(columns = {'mean':'Urine'}, inplace = True)
grouped_multiple_Urine= grouped_multiple_Urine.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_Urine.isnull().sum())


In [None]:
# Data Set of NISysABP
grouped_multiple_NISysABP   = grouped_multiple[grouped_multiple.Parameter == 'NISysABP']
grouped_multiple_NISysABP.rename(columns = {'mean':'NISysABP'}, inplace = True)
grouped_multiple_NISysABP= grouped_multiple_NISysABP.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_NISysABP.isnull().sum())

In [None]:
# Data Set of NIDiasABP
grouped_multiple_NIDiasABP   = grouped_multiple[grouped_multiple.Parameter == 'NIDiasABP']
grouped_multiple_NIDiasABP.rename(columns = {'mean':'NIDiasABP'}, inplace = True)
grouped_multiple_NIDiasABP= grouped_multiple_NIDiasABP.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_NIDiasABP.isnull().sum())

In [None]:
# Data Set of NIMAP
grouped_multiple_NIMAP   = grouped_multiple[grouped_multiple.Parameter == 'NIMAP']
grouped_multiple_NIMAP.rename(columns = {'mean':'NIMAP'}, inplace = True)
grouped_multiple_NIMAP= grouped_multiple_NIMAP.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_NIMAP.isnull().sum())

In [None]:
# Data Set of pH
grouped_multiple_pH   = grouped_multiple[grouped_multiple.Parameter == 'pH']
grouped_multiple_pH.rename(columns = {'mean':'pH'}, inplace = True)
grouped_multiple_pH= grouped_multiple_pH.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_pH.isnull().sum())

In [None]:
# Data Set of PaCO2
grouped_multiple_PaCO2   = grouped_multiple[grouped_multiple.Parameter == 'PaCO2']
grouped_multiple_PaCO2.rename(columns = {'mean':'PaCO2'}, inplace = True)
grouped_multiple_PaCO2= grouped_multiple_PaCO2.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_PaCO2.isnull().sum())

In [None]:
# Data Set of PaO2
grouped_multiple_PaO2   = grouped_multiple[grouped_multiple.Parameter == 'PaO2']
grouped_multiple_PaO2.rename(columns = {'mean':'PaO2'}, inplace = True)
grouped_multiple_PaO2= grouped_multiple_PaO2.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_PaO2.isnull().sum())

In [None]:
# Data Set of DiasABP
grouped_multiple_DiasABP   = grouped_multiple[grouped_multiple.Parameter == 'DiasABP']
grouped_multiple_DiasABP.rename(columns = {'mean':'DiasABP'}, inplace = True)
grouped_multiple_DiasABP= grouped_multiple_DiasABP.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_DiasABP.isnull().sum())

In [None]:
# Data Set of SysABP
grouped_multiple_SysABP   = grouped_multiple[grouped_multiple.Parameter == 'SysABP']
grouped_multiple_SysABP.rename(columns = {'mean':'SysABP'}, inplace = True)
grouped_multiple_SysABP= grouped_multiple_SysABP.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_SysABP.isnull().sum())

In [None]:
# Data Set of MAP
grouped_multiple_MAP   = grouped_multiple[grouped_multiple.Parameter == 'MAP']
grouped_multiple_MAP.rename(columns = {'mean':'MAP'}, inplace = True)
grouped_multiple_MAP= grouped_multiple_MAP.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_MAP.isnull().sum())

In [None]:
# Data Set of FiO2
grouped_multiple_FiO2   = grouped_multiple[grouped_multiple.Parameter == 'FiO2']
grouped_multiple_FiO2.rename(columns = {'mean':'FiO2'}, inplace = True)
grouped_multiple_FiO2= grouped_multiple_FiO2.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_FiO2.isnull().sum())

In [None]:
# Data Set of MechVent
grouped_multiple_MechVent   = grouped_multiple[grouped_multiple.Parameter == 'MechVent']
grouped_multiple_MechVent.rename(columns = {'mean':'MechVent'}, inplace = True)
grouped_multiple_MechVent= grouped_multiple_MechVent.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_MechVent.isnull().sum())

In [None]:
# Data Set of Lactate
grouped_multiple_Lactate   = grouped_multiple[grouped_multiple.Parameter == 'Lactate']
grouped_multiple_Lactate.rename(columns = {'mean':'Lactate'}, inplace = True)
grouped_multiple_Lactate= grouped_multiple_Lactate.drop(['Parameter','min','max'], axis=1)
print(grouped_multiple_Lactate.isnull().sum())
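The per-parameter cells above all repeat the same filter, rename, and drop steps. A more compact alternative (a sketch, not the approach used in this notebook) pivots the `mean` column so that each parameter becomes its own column in one call. `pivot` requires each (RecordID, Parameter) pair to occur once, which holds for grouped_multiple by construction. Note that the pivot keeps the original parameter names (e.g. 'K' rather than 'k'). The table below is made up.

```python
import pandas as pd

# Hypothetical stand-in for grouped_multiple (values are made up).
grouped = pd.DataFrame({
    "RecordID":  [1, 1, 2, 3],
    "Parameter": ["HR", "BUN", "HR", "BUN"],
    "mean":      [85.0, 20.0, 102.0, 31.0],
    "min":       [70.0, 18.0, 90.0, 30.0],
    "max":       [99.0, 22.0, 120.0, 32.0],
})

# One row per RecordID, one column per parameter; stays without a reading get NaN.
wide = grouped.pivot(index="RecordID", columns="Parameter", values="mean")
wide.columns.name = None  # drop the leftover "Parameter" axis label
wide = wide.reset_index()
print(wide)
```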

In [None]:
import pandas as pd
from functools import reduce

In [None]:
info_df.columns

In [None]:
df_survivals = outcomes_df[['RecordID','In-hospital_death']]
df_ICUTYPE = info_df[['RecordID','ICUType']]
df_Age = info_df[['RecordID','Age']]
df_SOFA = outcomes_df[['RecordID','SOFA']]
df_SAPSI = outcomes_df[['RecordID','SAPS-I']]

In [None]:
#define list of DataFrames
list_df_more_2000 = [df_survivals,df_ICUTYPE,df_Age,df_SOFA,df_SAPSI,grouped_multiple_Weight,grouped_multiple_HR,grouped_multiple_BUN,grouped_multiple_Creatinine,grouped_multiple_GCS,
           grouped_multiple_Temp,grouped_multiple_HCT ,grouped_multiple_Platelets, grouped_multiple_WBC, grouped_multiple_Na,
           grouped_multiple_HCO3,grouped_multiple_k,grouped_multiple_Mg,grouped_multiple_Glucose,grouped_multiple_Urine,
           grouped_multiple_NISysABP,grouped_multiple_NIDiasABP,grouped_multiple_NIMAP,grouped_multiple_pH,
           grouped_multiple_PaCO2,grouped_multiple_PaO2,grouped_multiple_DiasABP,grouped_multiple_SysABP,
           grouped_multiple_MAP,grouped_multiple_FiO2,grouped_multiple_MechVent,grouped_multiple_Lactate ]

In [None]:
#merge all DataFrames into one
final_df_means = reduce(lambda  left,right: pd.merge(left,right,on=['RecordID'],
                                            how='outer'), list_df_more_2000)
final_df_means.isnull().sum()
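`reduce` folds `pd.merge` over the list pairwise, left to right. With `how='outer'`, a patient is kept even when one of the tables lacks them, which is why the `isnull()` counts are nonzero; `how='inner'` would instead keep only patients present in every table. A sketch on made-up per-parameter tables:

```python
from functools import reduce

import pandas as pd

# Hypothetical per-parameter tables keyed by RecordID (values are made up).
hr = pd.DataFrame({"RecordID": [1, 2, 3], "HR": [85.0, 102.0, 77.0]})
bun = pd.DataFrame({"RecordID": [1, 3], "BUN": [20.0, 31.0]})
lactate = pd.DataFrame({"RecordID": [2, 4], "Lactate": [1.8, 4.2]})

# merge(merge(hr, bun), lactate), keeping every RecordID that appears in any table.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="RecordID", how="outer"),
    [hr, bun, lactate],
)
print(merged)
```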

# Final dataset

In [None]:
final_df_means.to_csv(directory + 'final_df_means.csv', index=False)

In [None]:
final_df_means = pd.read_csv(directory + 'final_df_means.csv')

In [None]:
final_df_means.info()

In [None]:
final_df_means.shape

In [None]:
final_df_means.isnull().sum()

In [None]:
# Cleaning the nulls from the data:
# split the dataframe into two parts based on In-hospital_death

df_0 = final_df_means[final_df_means['In-hospital_death']== 0]
df_1 = final_df_means[final_df_means['In-hospital_death']== 1]

In [None]:
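# Replace NaN values with the average of each column (computed within the survivor group).
# fillna(df_0.mean()) fills every column with its own mean; assigning the result avoids
# modifying a slice in place.
df_0 = df_0.fillna(df_0.mean())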

In [None]:
df_0


In [None]:
# Replace NaN values with the average of each column (computed within the in-hospital-death group)
df_1 = df_1.fillna(df_1.mean())
df_1

In [None]:
df_1.info()

In [None]:
conc_final_df = pd.concat([df_0, df_1])
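The split, fill, and concat cells aim to replace each missing value with its column's average within the same outcome class. The same idea can be expressed in a single step with `groupby(...).transform`, sketched below on a made-up frame. Because the outcome label drives the imputation, the filled values carry information about the outcome.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for final_df_means (values are made up).
df = pd.DataFrame({
    "In-hospital_death": [0, 0, 0, 1, 1],
    "HR":      [80.0, np.nan, 100.0, 120.0, np.nan],
    "Lactate": [1.0, 2.0, np.nan, np.nan, 5.0],
})

features = [c for c in df.columns if c != "In-hospital_death"]
# Within each outcome class, fill each feature's NaNs with that feature's class mean.
df[features] = df.groupby("In-hospital_death")[features].transform(
    lambda col: col.fillna(col.mean())
)
print(df)
```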

In [None]:
conc_final_df.to_csv(directory + 'f_means_df.csv', index=False)

In [None]:
conc_final_df.to_csv( 'f_means_df.csv', index=False)

In [None]:
conc_final_df = pd.read_csv(directory + 'f_means_df.csv')

In [None]:
conc_final_df.info()

In [None]:
conc_final_df.isnull().sum()

In [None]:
conc_final_df.corr()['In-hospital_death']
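The correlation column is easier to read when the target itself is dropped and the rest is ranked by absolute value, so strong negative associations (e.g. a lower GCS) rank alongside strong positive ones. A sketch on a made-up stand-in for conc_final_df:

```python
import pandas as pd

# Hypothetical stand-in for conc_final_df (values are made up).
df = pd.DataFrame({
    "In-hospital_death": [0, 0, 1, 1, 0, 1],
    "GCS":     [15, 14, 6, 8, 13, 5],
    "Lactate": [1.1, 1.5, 4.0, 3.2, 1.0, 5.1],
    "HR":      [80, 95, 85, 90, 88, 82],
})

# Correlation of each feature with the outcome, ranked by strength regardless of sign.
target_corr = df.corr()["In-hospital_death"].drop("In-hospital_death")
ranked = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)
print(ranked)
```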

In [None]:
conc_final_df.corr()['ICUType']

In [None]:
conc_final_df.columns

In [None]:
# Column names as created by the per-parameter renames (no 'mean_' prefix)
Features =  ['In-hospital_death', 'ICUType', 'Weight', 'HR',
       'BUN', 'Creatinine', 'GCS', 'Temp', 'HCT',
       'Platelets', 'WBC', 'Na', 'HCO3', 'k',
       'Mg', 'Glucose', 'Urine', 'NISysABP',
       'NIDiasABP', 'NIMAP', 'pH', 'PaCO2', 'PaO2',
       'DiasABP', 'SysABP', 'MAP', 'FiO2', 'MechVent',
       'Lactate']
Labels =   ['In-hospital_death']

In [None]:
conc_final_df.groupby('ICUType')['In-hospital_death'].mean().nlargest(20).plot.bar()

In [None]:
cor = conc_final_df.corr()
# Absolute correlation of every column with ICUType
cor_target = abs(cor["ICUType"])
# Selecting features whose absolute correlation with ICUType exceeds 0.2
relevant_features = cor_target[cor_target>0.2]
relevant_features