<a href="https://colab.research.google.com/github/nelsondressler/DragonCaveDS/blob/master/DragonCaveDS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **WiDS Datathon 2021 - DragonCaveDS Group**

## Analyzing MIT’s GOSSIS data to determine whether a patient has been diagnosed with *Diabetes Mellitus*

Members: Henrique Branco, Lorrayne Alberto, Marina Passos

### Data Dictionary:
 
- encounter_id: Unique identifier associated with a patient unit stay;

- hospital_id: Unique identifier associated with a hospital;

- age: The age of the patient on unit admission;

- bmi: The body mass index of the person on unit admission;

- elective_surgery: Whether the patient was admitted to the hospital for an elective surgical operation;

- ethnicity: The common national or cultural tradition which the person belongs to;

- gender: The genotypical sex of the patient;

- height: The height of the person on unit admission;

- hospital_admit_source: The location of the patient prior to being admitted to the hospital;

- icu_admit_source: The location of the patient prior to being admitted to the unit;

- icu_admit_type: The type of unit admission for the patient;

- icu_id: A unique identifier for the unit to which the patient was admitted;

- icu_stay_type: The type of unit stay (no description is given in the data dictionary; `admit` appears in the sample rows);

- icu_type: A classification which indicates the type of care the unit is capable of providing;

- pre_icu_los_days: The length of stay of the patient between hospital admission and unit admission;

- readmission_status: Whether the current unit stay is the second (or greater) stay at an ICU within the same hospitalization;

- weight: The weight (body mass) of the person on unit admission;

- albumin_apache: The albumin concentration measured during the first 24 hours which results in the highest APACHE III score;

- apache_2_diagnosis: The APACHE II diagnosis for the ICU admission;

- apache_3j_diagnosis: The APACHE III-J sub-diagnosis code which best describes the reason for the ICU admission;

- apache_post_operative: The APACHE operative status; 1 for post-operative, 0 for non-operative;

- arf_apache: Whether the patient had acute renal failure during the first 24 hours of their unit stay, defined as a 24 hour urine output <410 ml, creatinine >=133 micromol/L and no chronic dialysis;

- bilirubin_apache: The bilirubin concentration measured during the first 24 hours which results in the highest APACHE III score;

- bun_apache: The blood urea nitrogen concentration measured during the first 24 hours which results in the highest APACHE III score;

- creatinine_apache: The creatinine concentration measured during the first 24 hours which results in the highest APACHE III score;

- fio2_apache: The fraction of inspired oxygen from the arterial blood gas taken during the first 24 hours of unit admission which produces the highest APACHE III score for oxygenation;

- gcs_eyes_apache: The eye opening component of the Glasgow Coma Scale measured during the first 24 hours which results in the highest APACHE III score;

- gcs_motor_apache: The motor component of the Glasgow Coma Scale measured during the first 24 hours which results in the highest APACHE III score;

- gcs_unable_apache: Whether the Glasgow Coma Scale was unable to be assessed due to patient sedation;

- gcs_verbal_apache: The verbal component of the Glasgow Coma Scale measured during the first 24 hours which results in the highest APACHE III score;

- glucose_apache: The glucose concentration measured during the first 24 hours which results in the highest APACHE III score;

- heart_rate_apache: The heart rate measured during the first 24 hours which results in the highest APACHE III score;

- hematocrit_apache: The hematocrit measured during the first 24 hours which results in the highest APACHE III score;

- intubated_apache: Whether the patient was intubated at the time of the highest scoring arterial blood gas used in the oxygenation score;

- map_apache: The mean arterial pressure measured during the first 24 hours which results in the highest APACHE III score;

- paco2_apache: The partial pressure of carbon dioxide from the arterial blood gas taken during the first 24 hours of unit admission which produces the highest APACHE III score for oxygenation;

- paco2_for_ph_apache: The partial pressure of carbon dioxide from the arterial blood gas taken during the first 24 hours of unit admission which produces the highest APACHE III score for acid-base disturbance;

- pao2_apache: The partial pressure of oxygen from the arterial blood gas taken during the first 24 hours of unit admission which produces the highest APACHE III score for oxygenation;

- ph_apache: The pH from the arterial blood gas taken during the first 24 hours of unit admission which produces the highest APACHE III score for acid-base disturbance;

- resprate_apache: The respiratory rate measured during the first 24 hours which results in the highest APACHE III score;

- sodium_apache: The sodium concentration measured during the first 24 hours which results in the highest APACHE III score;

- temp_apache: The temperature measured during the first 24 hours which results in the highest APACHE III score;

- urineoutput_apache: The total urine output for the first 24 hours;

- ventilated_apache: Whether the patient was invasively ventilated at the time of the highest scoring arterial blood gas using the oxygenation scoring algorithm, including any mode of positive pressure ventilation delivered through a circuit attached to an endo-tracheal tube or tracheostomy;

- wbc_apache: The white blood cell count measured during the first 24 hours which results in the highest APACHE III score;

- d1_diasbp_invasive_max: The patient's highest diastolic blood pressure during the first 24 hours of their unit stay, invasively measured;

- d1_diasbp_invasive_min: The patient's lowest diastolic blood pressure during the first 24 hours of their unit stay, invasively measured;

- d1_diasbp_max: The patient's highest diastolic blood pressure during the first 24 hours of their unit stay, either non-invasively or invasively measured;

- d1_diasbp_min: The patient's lowest diastolic blood pressure during the first 24 hours of their unit stay, either non-invasively or invasively measured;

- d1_diasbp_noninvasive_max: The patient's highest diastolic blood pressure during the first 24 hours of their unit stay, non-invasively measured;

- d1_diasbp_noninvasive_min: The patient's lowest diastolic blood pressure during the first 24 hours of their unit stay, non-invasively measured;

- d1_heartrate_max: The patient's highest heart rate during the first 24 hours of their unit stay;

- d1_heartrate_min: The patient's lowest heart rate during the first 24 hours of their unit stay;

- d1_mbp_invasive_max: The patient's highest mean blood pressure during the first 24 hours of their unit stay, invasively measured;

- d1_mbp_invasive_min: The patient's lowest mean blood pressure during the first 24 hours of their unit stay, invasively measured;

- d1_mbp_max: The patient's highest mean blood pressure during the first 24 hours of their unit stay, either non-invasively or invasively measured;

- d1_mbp_min: The patient's lowest mean blood pressure during the first 24 hours of their unit stay, either non-invasively or invasively measured;

- d1_mbp_noninvasive_max: The patient's highest mean blood pressure during the first 24 hours of their unit stay, non-invasively measured;

- d1_mbp_noninvasive_min: The patient's lowest mean blood pressure during the first 24 hours of their unit stay, non-invasively measured;

- d1_resprate_max: The patient's highest respiratory rate during the first 24 hours of their unit stay;

- d1_resprate_min: The patient's lowest respiratory rate during the first 24 hours of their unit stay;

- d1_spo2_max: The patient's highest peripheral oxygen saturation during the first 24 hours of their unit stay;

- d1_spo2_min: The patient's lowest peripheral oxygen saturation during the first 24 hours of their unit stay;

- d1_sysbp_invasive_max: The patient's highest systolic blood pressure during the first 24 hours of their unit stay, invasively measured;

- d1_sysbp_invasive_min: The patient's lowest systolic blood pressure during the first 24 hours of their unit stay, invasively measured;

- d1_sysbp_max: The patient's highest systolic blood pressure during the first 24 hours of their unit stay, either non-invasively or invasively measured;

- d1_sysbp_min: The patient's lowest systolic blood pressure during the first 24 hours of their unit stay, either non-invasively or invasively measured;

- d1_sysbp_noninvasive_max: The patient's highest systolic blood pressure during the first 24 hours of their unit stay, non-invasively measured;

- d1_sysbp_noninvasive_min: The patient's lowest systolic blood pressure during the first 24 hours of their unit stay, non-invasively measured;

- d1_temp_max: The patient's highest core temperature during the first 24 hours of their unit stay, invasively measured;

- d1_temp_min: The patient's lowest core temperature during the first 24 hours of their unit stay;

- h1_diasbp_invasive_max: The patient's highest diastolic blood pressure during the first hour of their unit stay, invasively measured;

- h1_diasbp_invasive_min: The patient's lowest diastolic blood pressure during the first hour of their unit stay, invasively measured;

- h1_diasbp_max: The patient's highest diastolic blood pressure during the first hour of their unit stay, either non-invasively or invasively measured;

- h1_diasbp_min: The patient's lowest diastolic blood pressure during the first hour of their unit stay, either non-invasively or invasively measured;

- h1_diasbp_noninvasive_max: The patient's highest diastolic blood pressure during the first hour of their unit stay, non-invasively measured;

- h1_diasbp_noninvasive_min: The patient's lowest diastolic blood pressure during the first hour of their unit stay, non-invasively measured;

- h1_heartrate_max: The patient's highest heart rate during the first hour of their unit stay;

- h1_heartrate_min: The patient's lowest heart rate during the first hour of their unit stay;

- h1_mbp_invasive_max: The patient's highest mean blood pressure during the first hour of their unit stay, invasively measured;

- h1_mbp_invasive_min: The patient's lowest mean blood pressure during the first hour of their unit stay, invasively measured;

- h1_mbp_max: The patient's highest mean blood pressure during the first hour of their unit stay, either non-invasively or invasively measured;

- h1_mbp_min: The patient's lowest mean blood pressure during the first hour of their unit stay, either non-invasively or invasively measured;

- h1_mbp_noninvasive_max: The patient's highest mean blood pressure during the first hour of their unit stay, non-invasively measured;

- h1_mbp_noninvasive_min: The patient's lowest mean blood pressure during the first hour of their unit stay, non-invasively measured;

- h1_resprate_max: The patient's highest respiratory rate during the first hour of their unit stay;

- h1_resprate_min: The patient's lowest respiratory rate during the first hour of their unit stay;

- h1_spo2_max: The patient's highest peripheral oxygen saturation during the first hour of their unit stay;

- h1_spo2_min: The patient's lowest peripheral oxygen saturation during the first hour of their unit stay;

- h1_sysbp_invasive_max: The patient's highest systolic blood pressure during the first hour of their unit stay, invasively measured;

- h1_sysbp_invasive_min: The patient's lowest systolic blood pressure during the first hour of their unit stay, invasively measured;

- h1_sysbp_max: The patient's highest systolic blood pressure during the first hour of their unit stay, either non-invasively or invasively measured;

- h1_sysbp_min: The patient's lowest systolic blood pressure during the first hour of their unit stay, either non-invasively or invasively measured;

- h1_sysbp_noninvasive_max: The patient's highest systolic blood pressure during the first hour of their unit stay, non-invasively measured;

- h1_sysbp_noninvasive_min: The patient's lowest systolic blood pressure during the first hour of their unit stay, non-invasively measured;

- h1_temp_max: The patient's highest core temperature during the first hour of their unit stay, invasively measured;

- h1_temp_min: The patient's lowest core temperature during the first hour of their unit stay;

- d1_albumin_max: The highest albumin concentration of the patient in their serum during the first 24 hours of their unit stay;

- d1_albumin_min: The lowest albumin concentration of the patient in their serum during the first 24 hours of their unit stay;

- d1_bilirubin_max: The highest bilirubin concentration of the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_bilirubin_min: The lowest bilirubin concentration of the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_bun_max: The highest blood urea nitrogen concentration of the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_bun_min: The lowest blood urea nitrogen concentration of the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_calcium_max: The highest calcium concentration of the patient in their serum during the first 24 hours of their unit stay;

- d1_calcium_min: The lowest calcium concentration of the patient in their serum during the first 24 hours of their unit stay;

- d1_creatinine_max: The highest creatinine concentration of the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_creatinine_min: The lowest creatinine concentration of the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_glucose_max: The highest glucose concentration of the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_glucose_min: The lowest glucose concentration of the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_hco3_max: The highest bicarbonate concentration for the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_hco3_min: The lowest bicarbonate concentration for the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_hemaglobin_max: The highest hemoglobin concentration for the patient during the first 24 hours of their unit stay;

- d1_hemaglobin_min: The lowest hemoglobin concentration for the patient during the first 24 hours of their unit stay;

- d1_hematocrit_max: The highest volume proportion of red blood cells in a patient's blood during the first 24 hours of their unit stay, expressed as a fraction;

- d1_hematocrit_min: The lowest volume proportion of red blood cells in a patient's blood during the first 24 hours of their unit stay, expressed as a fraction;

- d1_inr_max: The highest international normalized ratio for the patient during the first 24 hours of their unit stay;

- d1_inr_min: The lowest international normalized ratio for the patient during the first 24 hours of their unit stay;

- d1_lactate_max: The highest lactate concentration for the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_lactate_min: The lowest lactate concentration for the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_platelets_max: The highest platelet count for the patient during the first 24 hours of their unit stay;

- d1_platelets_min: The lowest platelet count for the patient during the first 24 hours of their unit stay;

- d1_potassium_max: The highest potassium concentration for the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_potassium_min: The lowest potassium concentration for the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_sodium_max: The highest sodium concentration for the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_sodium_min: The lowest sodium concentration for the patient in their serum or plasma during the first 24 hours of their unit stay;

- d1_wbc_max: The highest white blood cell count for the patient during the first 24 hours of their unit stay;

- d1_wbc_min: The lowest white blood cell count for the patient during the first 24 hours of their unit stay;

- h1_albumin_max: The highest albumin concentration of the patient in their serum during the first hour of their unit stay;

- h1_albumin_min: The lowest albumin concentration of the patient in their serum during the first hour of their unit stay;

- h1_bilirubin_max: The highest bilirubin concentration of the patient in their serum or plasma during the first hour of their unit stay;

- h1_bilirubin_min: The lowest bilirubin concentration of the patient in their serum or plasma during the first hour of their unit stay;

- h1_bun_max: The highest blood urea nitrogen concentration of the patient in their serum or plasma during the first hour of their unit stay;

- h1_bun_min: The lowest blood urea nitrogen concentration of the patient in their serum or plasma during the first hour of their unit stay;

- h1_calcium_max: The highest calcium concentration of the patient in their serum during the first hour of their unit stay;

- h1_calcium_min: The lowest calcium concentration of the patient in their serum during the first hour of their unit stay;

- h1_creatinine_max: The highest creatinine concentration of the patient in their serum or plasma during the first hour of their unit stay;

- h1_creatinine_min: The lowest creatinine concentration of the patient in their serum or plasma during the first hour of their unit stay;

- h1_glucose_max: The highest glucose concentration of the patient in their serum or plasma during the first hour of their unit stay;

- h1_glucose_min: The lowest glucose concentration of the patient in their serum or plasma during the first hour of their unit stay;

- h1_hco3_max: The highest bicarbonate concentration for the patient in their serum or plasma during the first hour of their unit stay;

- h1_hco3_min: The lowest bicarbonate concentration for the patient in their serum or plasma during the first hour of their unit stay;

- h1_hemaglobin_max: The highest hemoglobin concentration for the patient during the first hour of their unit stay;

- h1_hemaglobin_min: The lowest hemoglobin concentration for the patient during the first hour of their unit stay;

- h1_hematocrit_max: The highest volume proportion of red blood cells in a patient's blood during the first hour of their unit stay, expressed as a fraction;

- h1_hematocrit_min: The lowest volume proportion of red blood cells in a patient's blood during the first hour of their unit stay, expressed as a fraction;

- h1_inr_max: The highest international normalized ratio for the patient during the first hour of their unit stay;

- h1_inr_min: The lowest international normalized ratio for the patient during the first hour of their unit stay;

- h1_lactate_max: The highest lactate concentration for the patient in their serum or plasma during the first hour of their unit stay;

- h1_lactate_min: The lowest lactate concentration for the patient in their serum or plasma during the first hour of their unit stay;

- h1_platelets_max: The highest platelet count for the patient during the first hour of their unit stay;

- h1_platelets_min: The lowest platelet count for the patient during the first hour of their unit stay;

- h1_potassium_max: The highest potassium concentration for the patient in their serum or plasma during the first hour of their unit stay;

- h1_potassium_min: The lowest potassium concentration for the patient in their serum or plasma during the first hour of their unit stay;

- h1_sodium_max: The highest sodium concentration for the patient in their serum or plasma during the first hour of their unit stay;

- h1_sodium_min: The lowest sodium concentration for the patient in their serum or plasma during the first hour of their unit stay;

- h1_wbc_max: The highest white blood cell count for the patient during the first hour of their unit stay;

- h1_wbc_min: The lowest white blood cell count for the patient during the first hour of their unit stay;

- d1_arterial_pco2_max: The highest arterial partial pressure of carbon dioxide for the patient during the first 24 hours of their unit stay;

- d1_arterial_pco2_min: The lowest arterial partial pressure of carbon dioxide for the patient during the first 24 hours of their unit stay;

- d1_arterial_ph_max: The highest arterial pH for the patient during the first 24 hours of their unit stay;

- d1_arterial_ph_min: The lowest arterial pH for the patient during the first 24 hours of their unit stay;

- d1_arterial_po2_max: The highest arterial partial pressure of oxygen for the patient during the first 24 hours of their unit stay;

- d1_arterial_po2_min: The lowest arterial partial pressure of oxygen for the patient during the first 24 hours of their unit stay;

- d1_pao2fio2ratio_max: The highest ratio of arterial partial pressure of oxygen to fraction of inspired oxygen (PaO2/FiO2) for the patient during the first 24 hours of their unit stay;

- d1_pao2fio2ratio_min: The lowest ratio of arterial partial pressure of oxygen to fraction of inspired oxygen (PaO2/FiO2) for the patient during the first 24 hours of their unit stay;

- h1_arterial_pco2_max: The highest arterial partial pressure of carbon dioxide for the patient during the first hour of their unit stay;

- h1_arterial_pco2_min: The lowest arterial partial pressure of carbon dioxide for the patient during the first hour of their unit stay;

- h1_arterial_ph_max: The highest arterial pH for the patient during the first hour of their unit stay;

- h1_arterial_ph_min: The lowest arterial pH for the patient during the first hour of their unit stay;

- h1_arterial_po2_max: The highest arterial partial pressure of oxygen for the patient during the first hour of their unit stay;

- h1_arterial_po2_min: The lowest arterial partial pressure of oxygen for the patient during the first hour of their unit stay;

- h1_pao2fio2ratio_max: The highest ratio of arterial partial pressure of oxygen to fraction of inspired oxygen (PaO2/FiO2) for the patient during the first hour of their unit stay;

- h1_pao2fio2ratio_min: The lowest ratio of arterial partial pressure of oxygen to fraction of inspired oxygen (PaO2/FiO2) for the patient during the first hour of their unit stay;

- aids: Whether the patient has a definitive diagnosis of acquired immune deficiency syndrome (AIDS) (not HIV positive alone);

- cirrhosis: Whether the patient has a history of heavy alcohol use with portal hypertension and varices, other causes of cirrhosis with evidence of portal hypertension and varices, or biopsy proven cirrhosis. This comorbidity does not apply to patients with a functioning liver transplant;

- hepatic_failure: Whether the patient has cirrhosis and additional complications including jaundice and ascites, upper GI bleeding, hepatic encephalopathy, or coma;

- immunosuppression: Whether the patient has had their immune system suppressed within six months prior to ICU admission for any of the following reasons: radiation therapy, chemotherapy, use of non-cytotoxic immunosuppressive drugs, or high dose steroids (at least 0.3 mg/kg/day of methylprednisolone or equivalent for at least 6 months);

- leukemia: Whether the patient has been diagnosed with acute or chronic myelogenous leukemia, acute or chronic lymphocytic leukemia, or multiple myeloma;

- lymphoma: Whether the patient has been diagnosed with non-Hodgkin lymphoma;

- solid_tumor_with_metastasis: Whether the patient has been diagnosed with any solid tumor carcinoma (including malignant melanoma) which has evidence of metastasis;

- diabetes_mellitus: Label (target variable): whether the patient has been diagnosed with diabetes mellitus, a chronic disease;
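
Many of the column names follow a pattern: a `d1_` or `h1_` prefix and a `_min` or `_max` suffix wrapped around the name of a measurement. As a rough sketch, assuming `d1` marks the first 24 hours and `h1` the first hour of the unit stay, the names can be grouped by measurement with a regular expression. The helper `group_windowed_columns` and the `sample_columns` list are illustrative only and are not part of the original notebook.

```python
import re
from collections import defaultdict

# Pattern for windowed measurements: <window>_<measure>_<stat>,
# e.g. "d1_glucose_max" or "h1_sysbp_noninvasive_min".
WINDOWED = re.compile(r"^(?P<window>d1|h1)_(?P<measure>.+)_(?P<stat>min|max)$")


def group_windowed_columns(columns):
    """Map each measure name to the (window, stat) pairs found for it."""
    groups = defaultdict(list)
    for name in columns:
        match = WINDOWED.match(name)
        if match:  # identifiers, APACHE covariates, etc. do not match
            groups[match["measure"]].append((match["window"], match["stat"]))
    return dict(groups)


# A handful of column names taken from the dictionary (hypothetical subset).
sample_columns = [
    "age", "bmi", "glucose_apache",
    "d1_glucose_max", "d1_glucose_min",
    "h1_glucose_max", "h1_glucose_min",
    "d1_sysbp_noninvasive_min",
]

groups = group_windowed_columns(sample_columns)
print(groups)
```

Grouping the names this way makes it easier to check that every measurement has all four of its `d1`/`h1` and `min`/`max` variants before building features such as ranges.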


In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [8]:
TrainingWiDS2021_df = pd.read_csv("https://github.com/nelsondressler/DragonCaveDS/raw/master/widsdatathon2021/TrainingWiDS2021.csv")

TrainingWiDS2021_df.head()

Unnamed: 0.1,Unnamed: 0,encounter_id,hospital_id,age,bmi,elective_surgery,ethnicity,gender,height,hospital_admit_source,icu_admit_source,icu_id,icu_stay_type,icu_type,pre_icu_los_days,readmission_status,weight,albumin_apache,apache_2_diagnosis,apache_3j_diagnosis,apache_post_operative,arf_apache,bilirubin_apache,bun_apache,creatinine_apache,fio2_apache,gcs_eyes_apache,gcs_motor_apache,gcs_unable_apache,gcs_verbal_apache,glucose_apache,heart_rate_apache,hematocrit_apache,intubated_apache,map_apache,paco2_apache,paco2_for_ph_apache,pao2_apache,ph_apache,resprate_apache,...,h1_hemaglobin_max,h1_hemaglobin_min,h1_hematocrit_max,h1_hematocrit_min,h1_inr_max,h1_inr_min,h1_lactate_max,h1_lactate_min,h1_platelets_max,h1_platelets_min,h1_potassium_max,h1_potassium_min,h1_sodium_max,h1_sodium_min,h1_wbc_max,h1_wbc_min,d1_arterial_pco2_max,d1_arterial_pco2_min,d1_arterial_ph_max,d1_arterial_ph_min,d1_arterial_po2_max,d1_arterial_po2_min,d1_pao2fio2ratio_max,d1_pao2fio2ratio_min,h1_arterial_pco2_max,h1_arterial_pco2_min,h1_arterial_ph_max,h1_arterial_ph_min,h1_arterial_po2_max,h1_arterial_po2_min,h1_pao2fio2ratio_max,h1_pao2fio2ratio_min,aids,cirrhosis,hepatic_failure,immunosuppression,leukemia,lymphoma,solid_tumor_with_metastasis,diabetes_mellitus
0,1,214826,118,68.0,22.732803,0,Caucasian,M,180.3,Floor,Floor,92,admit,CTICU,0.541667,0,73.9,2.3,113.0,502.01,0,0,0.4,31.0,2.51,,3.0,6.0,0.0,4.0,168.0,118.0,27.4,0,40.0,,,,,36.0,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,0,1
1,2,246060,81,77.0,27.421875,0,Caucasian,F,160.0,Floor,Floor,90,admit,Med-Surg ICU,0.927778,0,70.2,,108.0,203.01,0,0,,9.0,0.56,1.0,1.0,3.0,0.0,1.0,145.0,120.0,36.9,0,46.0,37.0,37.0,51.0,7.45,33.0,...,11.3,11.3,36.9,36.9,1.3,1.3,3.5,3.5,557.0,557.0,4.2,4.2,145.0,145.0,12.7,12.7,37.0,37.0,7.45,7.45,51.0,51.0,54.8,51.0,37.0,37.0,7.45,7.45,51.0,51.0,51.0,51.0,0,0,0,0,0,0,0,1
2,3,276985,118,25.0,31.952749,0,Caucasian,F,172.7,Emergency Department,Accident & Emergency,93,admit,Med-Surg ICU,0.000694,0,95.3,,122.0,703.03,0,0,,,,,3.0,6.0,0.0,5.0,,102.0,,0,68.0,,,,,37.0,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,0,0
3,4,262220,118,81.0,22.635548,1,Caucasian,F,165.1,Operating Room,Operating Room / Recovery,92,admit,CTICU,0.000694,0,61.7,,203.0,1206.03,1,0,,,,0.6,4.0,6.0,0.0,5.0,185.0,114.0,25.9,1,60.0,30.0,30.0,142.0,7.39,4.0,...,11.6,11.6,34.0,34.0,1.6,1.1,,,43.0,43.0,,,,,8.8,8.8,37.0,27.0,7.44,7.34,337.0,102.0,342.5,236.666667,36.0,33.0,7.37,7.34,337.0,265.0,337.0,337.0,0,0,0,0,0,0,0,0
4,5,201746,33,19.0,,0,Caucasian,M,188.0,,Accident & Emergency,91,admit,Med-Surg ICU,0.073611,0,,,119.0,601.01,0,0,,,,,,,,,,60.0,,0,103.0,,,,,16.0,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0,0,0,0,0


In [9]:
TrainingWiDS2021_df.shape

(130157, 181)
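
One of the 181 columns is `Unnamed: 0`, which is probably a row index that was written out when the CSV was saved (an assumption; the notebook does not say so). The sketch below uses a tiny in-memory CSV as a stand-in for the real file to show how such a column appears and how it could be dropped. The names `csv_text`, `toy_df`, and `toy_clean` are hypothetical.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV: the first column has an empty header,
# which is what a saved row index looks like.
csv_text = """,encounter_id,age,diabetes_mellitus
1,214826,68.0,1
2,246060,77.0,1
3,276985,25.0,0
"""

toy_df = pd.read_csv(io.StringIO(csv_text))
print(toy_df.columns.tolist())  # the empty header is read as "Unnamed: 0"

# Drop every column whose name starts with "Unnamed", if any are present.
unnamed_cols = [col for col in toy_df.columns if col.startswith("Unnamed")]
toy_clean = toy_df.drop(columns=unnamed_cols)
print(toy_clean.shape)
```

Passing `index_col=0` to `pd.read_csv` is an alternative that turns the column into the index at load time instead of dropping it afterwards.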

In [16]:
TrainingWiDS2021_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130157 entries, 0 to 130156
Columns: 181 entries, Unnamed: 0 to diabetes_mellitus
dtypes: float64(157), int64(18), object(6)
memory usage: 179.7+ MB
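
The summary reports six `object` columns, which will need encoding before most models can use them. A minimal sketch of listing the non-numeric columns and their number of distinct values is shown here on a small hand-made frame; `toy_df` and its values are illustrative and only loosely mirror the sample rows.

```python
import pandas as pd

# Hand-made stand-in for the training frame (hypothetical values).
toy_df = pd.DataFrame({
    "age": [68.0, 77.0, 25.0, 81.0],
    "gender": ["M", "F", "F", "F"],
    "icu_type": ["CTICU", "Med-Surg ICU", "Med-Surg ICU", "CTICU"],
    "diabetes_mellitus": [1, 1, 0, 0],
})

# Treat everything that is not numeric as categorical.
categorical_cols = toy_df.select_dtypes(exclude="number").columns.tolist()

# Number of distinct values per categorical column.
cardinality = toy_df[categorical_cols].nunique()
print(cardinality)
```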


In [13]:
TrainingWiDS2021_df.isnull().sum()

Unnamed: 0                        0
encounter_id                      0
hospital_id                       0
age                            4988
bmi                            4490
                               ... 
immunosuppression                 0
leukemia                          0
lymphoma                          0
solid_tumor_with_metastasis       0
diabetes_mellitus                 0
Length: 181, dtype: int64
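
Raw counts are hard to compare across 181 columns; for instance, `age` is missing in 4988 of 130157 rows, roughly 3.8%. A minimal sketch of ranking columns by their percentage of missing values follows, again on a small hand-made frame (`toy_df` and `missing_pct` are hypothetical names).

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for the training frame (hypothetical values).
toy_df = pd.DataFrame({
    "age": [68.0, 77.0, np.nan, 81.0],
    "bmi": [22.7, np.nan, np.nan, 22.6],
    "diabetes_mellitus": [1, 1, 0, 0],
})

missing_pct = (
    toy_df.isnull()
    .mean()                        # fraction of missing values per column
    .mul(100)                      # convert to a percentage
    .sort_values(ascending=False)  # most incomplete columns first
    .rename("missing_pct")
)
print(missing_pct)
```

On the real frame, the same expression with `TrainingWiDS2021_df` in place of `toy_df` would show which measurements are too sparse to use without imputation.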

In [17]:
TrainingWiDS2021_df.describe()

Unnamed: 0.1,Unnamed: 0,encounter_id,hospital_id,age,bmi,elective_surgery,height,icu_id,pre_icu_los_days,readmission_status,weight,albumin_apache,apache_2_diagnosis,apache_3j_diagnosis,apache_post_operative,arf_apache,bilirubin_apache,bun_apache,creatinine_apache,fio2_apache,gcs_eyes_apache,gcs_motor_apache,gcs_unable_apache,gcs_verbal_apache,glucose_apache,heart_rate_apache,hematocrit_apache,intubated_apache,map_apache,paco2_apache,paco2_for_ph_apache,pao2_apache,ph_apache,resprate_apache,sodium_apache,temp_apache,urineoutput_apache,ventilated_apache,wbc_apache,d1_diasbp_invasive_max,...,h1_hemaglobin_max,h1_hemaglobin_min,h1_hematocrit_max,h1_hematocrit_min,h1_inr_max,h1_inr_min,h1_lactate_max,h1_lactate_min,h1_platelets_max,h1_platelets_min,h1_potassium_max,h1_potassium_min,h1_sodium_max,h1_sodium_min,h1_wbc_max,h1_wbc_min,d1_arterial_pco2_max,d1_arterial_pco2_min,d1_arterial_ph_max,d1_arterial_ph_min,d1_arterial_po2_max,d1_arterial_po2_min,d1_pao2fio2ratio_max,d1_pao2fio2ratio_min,h1_arterial_pco2_max,h1_arterial_pco2_min,h1_arterial_ph_max,h1_arterial_ph_min,h1_arterial_po2_max,h1_arterial_po2_min,h1_pao2fio2ratio_max,h1_pao2fio2ratio_min,aids,cirrhosis,hepatic_failure,immunosuppression,leukemia,lymphoma,solid_tumor_with_metastasis,diabetes_mellitus
count,130157.0,130157.0,130157.0,125169.0,125667.0,130157.0,128080.0,130157.0,130157.0,130157.0,126694.0,51994.0,128472.0,129292.0,130157.0,130157.0,47597.0,104746.0,105275.0,30437.0,127967.0,127967.0,129448.0,127967.0,115461.0,129848.0,103399.0,130157.0,129737.0,30437.0,30437.0,30437.0,30437.0,129349.0,105638.0,123546.0,66990.0,130157.0,100682.0,35089.0,...,27367.0,27367.0,27201.0,27201.0,48944.0,48944.0,11690.0,11690.0,24428.0,24428.0,29336.0,29336.0,28376.0,28376.0,24171.0,24171.0,45696.0,45696.0,45350.0,45350.0,46147.0,46147.0,36818.0,36818.0,22491.0,22491.0,22308.0,22308.0,22712.0,22712.0,16760.0,16760.0,130157.0,130157.0,130157.0,130157.0,130157.0,130157.0,130157.0,130157.0
mean,65079.0,213000.856519,106.102131,61.995103,29.11026,0.18984,169.607219,662.428344,0.839933,0.0,83.791104,2.886149,185.492683,565.994296,0.207111,0.027997,1.201222,25.71807,1.481629,0.595735,3.48829,5.484828,0.011441,4.030203,160.141416,99.85453,32.975817,0.156626,87.193046,42.161246,42.161246,132.061737,7.352154,25.150603,137.94526,36.420638,1800.803417,0.330432,12.187662,79.261563,...,11.204166,11.088205,33.73183,33.349796,1.577788,1.463473,3.028198,2.976982,193.943057,193.123506,4.188984,4.147028,138.167205,137.879814,13.387873,13.336485,45.341451,38.535587,7.387687,7.322903,165.003814,102.957476,287.600071,224.005403,44.552966,43.341081,7.337283,7.327771,163.035835,145.949537,247.525419,239.617358,0.00103,0.016081,0.013599,0.025669,0.007307,0.004187,0.020852,0.216285
std,37573.233831,38109.828146,63.482277,16.82288,8.262776,0.392176,10.833085,304.259843,2.485337,0.0,24.963063,0.689812,85.858208,466.51085,0.405238,0.164965,2.351994,20.690041,1.543535,0.262922,0.939831,1.271039,0.106349,1.538528,90.701327,30.759505,6.834576,0.363449,41.908109,12.267414,12.267414,84.958826,0.098423,15.02473,5.30384,0.857584,1456.551481,0.47037,6.931023,21.69332,...,2.350633,2.381224,6.804903,6.963975,0.946477,0.737639,2.898524,2.854953,92.486473,92.692759,0.760183,0.749463,5.711628,5.658036,6.933161,6.915142,14.623775,10.996254,0.085651,0.11153,107.084058,61.514357,130.31962,119.119675,14.631359,14.052015,0.10593,0.108658,112.646743,100.211935,131.440167,128.562211,0.03207,0.125786,0.115819,0.158146,0.085166,0.064574,0.142888,0.411712
min,1.0,147000.0,1.0,0.0,14.844926,0.0,137.2,82.0,-0.25,0.0,38.6,1.2,101.0,0.01,0.0,0.0,0.1,4.0,0.3,0.21,1.0,1.0,0.0,1.0,39.0,30.0,16.2,0.0,40.0,18.0,18.0,31.0,6.96054,4.0,117.0,32.1,0.0,0.0,0.9,37.0,...,5.1,5.0,16.0,15.5,0.9,0.9,0.4,0.4,20.0,20.0,2.5,2.5,114.0,114.0,1.1,1.0898,18.4,14.9,7.05428,6.89,39.0,28.0,54.8,36.0,15.0,14.997,6.93,6.9,34.0,31.0,42.0,38.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,32540.0,180001.0,49.0,52.0,23.598006,0.0,162.5,427.0,0.045833,0.0,66.5,2.4,113.0,204.01,0.0,0.0,0.4,13.0,0.71,0.4,3.0,6.0,0.0,4.0,97.0,87.0,28.0,0.0,54.0,34.5,34.5,77.0,7.301,11.0,135.0,36.2,799.0488,0.0,7.5,65.0,...,9.6,9.4,29.0,28.4,1.1,1.1,1.3,1.25,131.0,130.0,3.7,3.7,136.0,135.0,8.6,8.5,36.2,32.0,7.34,7.27,88.0,68.0,192.205556,132.0,36.0,35.0,7.29,7.28,80.0,77.0,144.0,138.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,65079.0,213014.0,112.0,64.0,27.564749,0.0,170.1,653.0,0.155556,0.0,80.0,2.9,122.0,409.02,0.0,0.0,0.7,19.0,0.97,0.5,4.0,6.0,0.0,5.0,133.0,104.0,33.1,0.0,66.0,40.0,40.0,104.0,7.36,27.0,138.0,36.5,1454.976,0.0,10.47,76.0,...,11.2,11.1,33.7,33.3,1.3,1.2,2.0,2.0,179.0,179.0,4.1,4.1,139.0,138.0,12.1,12.1,42.9,37.0,7.39,7.34,127.0,85.0,275.0,205.0,42.0,41.0,7.35,7.34,119.9,108.0,228.125,218.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,97618.0,246002.0,165.0,75.0,32.803127,0.0,177.8,969.0,0.423611,0.0,96.8,3.4,301.0,703.03,0.0,0.0,1.1,31.0,1.53,0.85,4.0,6.0,0.0,5.0,195.0,120.0,37.9,0.0,124.0,47.0,47.0,156.0,7.414,36.0,141.0,36.7,2415.096,1.0,15.3,88.0,...,12.8,12.8,38.5,38.2,1.6,1.5,3.6,3.5,239.0,238.0,4.5,4.5,141.0,141.0,16.7,16.6,50.0,43.0,7.44,7.398,206.0,116.25,370.0,300.0,49.0,48.0,7.408,7.4,214.0,182.0,333.0,324.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,130157.0,279000.0,204.0,89.0,67.81499,1.0,195.59,1111.0,175.627778,0.0,186.0,4.6,308.0,2201.05,1.0,1.0,60.2,127.0,11.18,1.0,4.0,6.0,1.0,5.0,598.7,178.0,51.4,1.0,200.0,95.0,95.0,498.0,7.59,60.0,158.0,39.7,8716.669632,1.0,45.8,181.0,...,17.4,17.3,51.7,51.5,7.756,6.127,18.1,18.0195,585.0,585.0,7.2,7.1,157.0,157.0,44.102,44.102,111.0,85.912,7.62,7.55786,540.865,448.892,834.805,604.227778,111.505,107.0,7.57,7.563,534.905,514.905,720.0,654.813793,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
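
Two things stand out in this summary: `readmission_status` has a mean, standard deviation, minimum, and maximum of 0, so it is constant, and the mean of `diabetes_mellitus` is 0.216285, so about 21.6% of the rows are positive. A small sketch of checking both properties programmatically is given below; the frame `toy_df` is a hand-made stand-in, not the real data.

```python
import pandas as pd

# Hand-made stand-in for the training frame (hypothetical values).
toy_df = pd.DataFrame({
    "readmission_status": [0, 0, 0, 0, 0],
    "age": [68.0, 77.0, 25.0, 81.0, 19.0],
    "diabetes_mellitus": [1, 1, 0, 0, 0],
})

# Columns with at most one distinct non-missing value carry no signal.
constant_cols = [
    col for col in toy_df.columns if toy_df[col].nunique(dropna=True) <= 1
]
print(constant_cols)

# Share of each class in the target, to gauge class imbalance.
class_share = toy_df["diabetes_mellitus"].value_counts(normalize=True)
print(class_share)
```

With a positive rate near 22%, plain accuracy can be misleading, so a ranking metric such as ROC AUC is a common choice for evaluating models on a target like this.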
