
In [0]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Access data using Google BigQuery.
from google.colab import auth
from google.cloud import bigquery

In [3]:
auth.authenticate_user()
print('Authenticated')


Authenticated


In [0]:
project_id='eicudata'


Total number of hospitals in the database -

In [0]:
%%bigquery --project $project_id totalhosp
Select COUNT (DISTINCT hospitalid)
from `physionet-data.eicu_crd.hospital`;

In [6]:
totalhosp

,f0_
0,208
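The `%%bigquery` magic stores the result as a pandas DataFrame. Because the `COUNT` has no alias, BigQuery names the column `f0_`. A minimal sketch for pulling the single value out; the frame is rebuilt by hand from the output above so the snippet runs on its own (in the notebook you would use `totalhosp` directly):

```python
import pandas as pd

# Stand-in for the `totalhosp` DataFrame returned by the magic above.
totalhosp = pd.DataFrame({"f0_": [208]})

# The result is 1x1: take the single cell as a plain Python int.
num_hospitals = int(totalhosp.iloc[0, 0])
print("Total number of hospitals:", num_hospitals)
```

Adding an alias such as `AS num_hospitals` after `COUNT(DISTINCT hospitalid)` would give the column a readable name instead of `f0_`.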


Total number of ICUs (counted as distinct ward IDs) -

In [0]:
%%bigquery --project $project_id totalicu
SELECT COUNT (DISTINCT wardid )
FROM `physionet-data.eicu_crd.patient`


In [8]:
totalicu

,f0_
0,335
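`COUNT(DISTINCT wardid)` counts each ward once no matter how many unit stays it has, and it skips NULL ward IDs. The same semantics can be checked locally with Python's built-in `sqlite3` on a few made-up rows (the table below is hypothetical, not eICU data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patientunitstayid INTEGER, wardid INTEGER)")

# Hypothetical rows: ward 10 has three stays, ward 20 has one, one stay has no ward.
conn.executemany(
    "INSERT INTO patient VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10), (4, 20), (5, None)],
)

# COUNT(*) counts rows; COUNT(DISTINCT wardid) counts unique non-NULL wards.
total_rows, distinct_wards = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT wardid) FROM patient"
).fetchone()
print(total_rows, distinct_wards)  # 5 2
conn.close()
```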


Diagnoses that have ICD-9/ICD-10 codes available -

In [0]:
%%bigquery --project $project_id icdcode
select diagnosisstring,icd9code,count(*) as count
from `physionet-data.eicu_crd.diagnosis` 
where icd9code is not null
group by diagnosisstring, icd9code


In [10]:
icdcode

,diagnosisstring,icd9code,count
0,cardiovascular|shock / hypotension|hypovolemic...,"785.59, R57.1, R58",3249
1,cardiovascular|diseases of the aorta|aortic an...,"441.4, I71.4",727
2,cardiovascular|diseases of the aorta|aortic an...,"441.9, I71.9",81
3,pulmonary|disorders of the airways|asthma / br...,"493.90, J45",7593
4,oncology|hematologic malignancy|leukemia|chron...,"204.10, C91.10",394
...,...,...,...
3928,renal|disorder of acid base|metabolic acidosis...,"276.2, E87.2",5
3929,surgery|neurosurgical issues|seizures|from tumor,"345.90, R56.9",1
3930,pulmonary|respiratory failure|acute respirator...,"518.81, J96.00",1
3931,surgery|respiratory failure|ventilatory failur...,"518.81, J96.00",1


Top 10 most common APACHE admission diagnoses among patients -

In [0]:
%%bigquery --project $project_id topdiseases 
select apacheadmissiondx, count(apacheadmissiondx) as count
from `physionet-data.eicu_crd.patient` 
group by apacheadmissiondx 
order by count(apacheadmissiondx) desc
limit 10;

In [12]:
topdiseases

,apacheadmissiondx,count
0,,22996
1,"Sepsis, pulmonary",8862
2,"Infarction, acute myocardial (MI)",7228
3,"CVA, cerebrovascular accident/stroke",6647
4,"CHF, congestive heart failure",6617
5,"Sepsis, renal/UTI (including bladder)",5273
6,"Rhythm disturbance (atrial, supraventricular)",4827
7,Diabetic ketoacidosis,4825
8,Cardiac arrest (with or without respiratory ar...,4580
9,"CABG alone, coronary artery bypass grafting",4543
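The most frequent `apacheadmissiondx` value above is blank. Because `COUNT(apacheadmissiondx)` skips NULLs, a count of 22,996 suggests these are empty strings rather than NULLs. A sketch of excluding them in pandas, using the first three rows of the output:

```python
import pandas as pd

# First three rows of the `topdiseases` output above.
topdiseases = pd.DataFrame({
    "apacheadmissiondx": ["", "Sepsis, pulmonary", "Infarction, acute myocardial (MI)"],
    "count": [22996, 8862, 7228],
})

# Drop empty or whitespace-only diagnoses; fillna also guards against NULLs.
mask = topdiseases["apacheadmissiondx"].fillna("").str.strip() != ""
named = topdiseases[mask].reset_index(drop=True)
print(named)
```

Adding `WHERE apacheadmissiondx != ''` to the query would drop them before the `LIMIT 10` is applied, so ten named diagnoses would come back.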


Identifying patients with heart disease using a related drug name (clopidogrel) -

In [0]:
%%bigquery --project $project_id heartpatients 
select gender, age, ethnicity,apacheadmissiondx,drugname
from `physionet-data.eicu_crd.admissiondrug` a
join `physionet-data.eicu_crd.patient` b ON (
a.patientunitstayid =b.patientunitstayid
)
where drugname like '%CLOPIDOGREL%' # can also look for drug WARFARIN

In [14]:
heartpatients

,gender,age,ethnicity,apacheadmissiondx,drugname
0,Male,74,,"Endarterectomy, carotid",CLOPIDOGREL ...
1,Male,68,Hispanic,"Bleeding, upper GI",CLOPIDOGREL ...
2,Female,74,,"Endarterectomy, carotid",CLOPIDOGREL ...
3,Male,69,Caucasian,"Hematoma subdural, surgery for",CLOPIDOGREL ...
4,Male,53,Caucasian,Hemorrhage (for gastrointestinal bleeding GI-s...,CLOPIDOGREL ...
...,...,...,...,...,...
1724,Female,85,African American,"Hematologic medical, other",CLOPIDOGREL ...
1725,Male,77,Caucasian,"Infarction, acute myocardial (MI)",CLOPIDOGREL ...
1726,Male,73,African American,Seizures (primary-no structural brain disease),CLOPIDOGREL ...
1727,Male,73,African American,"CVA, cerebrovascular accident/stroke",CLOPIDOGREL ...
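The query's comment mentions WARFARIN as an alternative marker. In SQL that would be `drugname LIKE '%CLOPIDOGREL%' OR drugname LIKE '%WARFARIN%'`; note that BigQuery's `LIKE` is case-sensitive, so wrapping the column in `UPPER()` is safer. A sketch of the same filter in pandas with one case-insensitive regular expression. The rows are hypothetical, and the stay ID column assumes `patientunitstayid` is added to the SELECT:

```python
import pandas as pd

# Hypothetical admission-drug rows (not eICU data).
drugs = pd.DataFrame({
    "patientunitstayid": [101, 101, 102, 103, 104],
    "drugname": ["CLOPIDOGREL 75 MG TAB", "ASPIRIN 81 MG TAB", "Warfarin 5 mg tab", "INSULIN", None],
})

# Match either drug regardless of case; na=False treats missing names as no match.
pattern = r"CLOPIDOGREL|WARFARIN"
matched = drugs[drugs["drugname"].str.contains(pattern, case=False, na=False)]

# Count stays rather than drug records, since one stay can list several drugs.
print(matched["patientunitstayid"].nunique())
```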


Identifying diabetic patients using a drug name (glucagon is mainly used for severe hypoglycemia, so this is only a rough proxy) -

In [0]:
%%bigquery --project $project_id diabpatients 
select gender,age,ethnicity,apacheadmissiondx,admissionheight,drugname
from `physionet-data.eicu_crd.admissiondrug` a
join `physionet-data.eicu_crd.patient` b ON (
a.patientunitstayid =b.patientunitstayid
)
where drugname like '%GLUCAGON%'    #can also look for drug HUMULIN

In [16]:
diabpatients

,gender,age,ethnicity,apacheadmissiondx,admissionheight,drugname
0,Female,71,African American,"Embolus, pulmonary",162.6,"GLUCAGON,HUMAN RECOMBINANT ..."
1,Female,71,African American,"Embolus, pulmonary",162.6,"GLUCAGON,HUMAN RECOMBINANT ..."
2,Female,71,African American,"Embolus, pulmonary",162.6,"GLUCAGON,HUMAN RECOMBINANT ..."
3,Female,71,African American,"Embolus, pulmonary",162.6,"GLUCAGON,HUMAN RECOMBINANT ..."
4,Female,71,African American,"Embolus, pulmonary",162.6,"GLUCAGON,HUMAN RECOMBINANT ..."
...,...,...,...,...,...,...
143,Female,78,Asian,Emphysema/bronchitis,154.9,GLUCAGON EMERGENCY KIT ...
144,Female,53,Caucasian,Anaphylaxis,177.8,GLUCAGON EMERGENCY KIT ...
145,Male,60,Caucasian,Seizures (primary-no structural brain disease),170.2,GLUCAGON EMERGENCY KIT ...
146,Male,22,Other/Unknown,"Overdose, street drugs (opiates, cocaine, amph...",170.2,GLUCAGON EMERGENCY KIT ...
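Rows 0 to 4 above are identical, presumably because the same stay has several glucagon entries in `admissiondrug`. A sketch of collapsing exact duplicates with `drop_duplicates`, using rows copied from the output (`drugname` omitted because it is truncated there). Since the query does not select `patientunitstayid`, two different patients with identical demographics would also be merged; selecting the stay ID and deduplicating on it would be more reliable.

```python
import pandas as pd

cols = ["gender", "age", "ethnicity", "apacheadmissiondx", "admissionheight"]

# Rows 0-4 of the `diabpatients` output (identical) plus row 144.
rows = [["Female", "71", "African American", "Embolus, pulmonary", 162.6]] * 5
rows.append(["Female", "53", "Caucasian", "Anaphylaxis", 177.8])
diabpatients = pd.DataFrame(rows, columns=cols)

# Keep the first occurrence of each fully identical row.
unique_rows = diabpatients.drop_duplicates()
print(len(diabpatients), "->", len(unique_rows))  # 6 -> 2
```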


Most common treatments recorded for patients (top 5) -

In [0]:
%%bigquery --project $project_id procedures 
select treatmentstring, count (treatmentstring) as count
from `physionet-data.eicu_crd.treatment` 
group by treatmentstring
order by count(treatmentstring) desc
limit 5;

In [20]:
procedures

,treatmentstring,count
0,pulmonary|ventilation and oxygenation|mechanic...,117481
1,pulmonary|radiologic procedures / bronchoscopy...,65148
2,neurologic|pain / agitation / altered mentatio...,46055
3,renal|urinary catheters|foley catheter,40672
4,cardiovascular|intravenous fluid|normal saline...,40522
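`treatmentstring` is a pipe-delimited path from organ system down to the specific treatment, so splitting it gives a top-level category that can be counted on its own. A sketch using two complete strings that appear in this notebook's outputs; it assumes every string has exactly three levels (deeper paths would produce extra columns):

```python
import pandas as pd

treatments = pd.Series([
    "renal|urinary catheters|foley catheter",
    "cardiovascular|arrhythmias|digoxin",
])

# A single-character pattern is split literally; expand=True gives one column per level.
levels = treatments.str.split("|", expand=True)
levels.columns = ["system", "group", "treatment"]
print(levels)
```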


Identifying patients who received cardiovascular treatments -

In [0]:
%%bigquery --project $project_id heartprocedure 
select gender,age,ethnicity,apacheadmissiondx,treatmentstring
from `physionet-data.eicu_crd.treatment`  a
join `physionet-data.eicu_crd.patient` b ON (
a.patientunitstayid =b.patientunitstayid
)
where a.treatmentstring  like '%cardio%'

In [22]:
heartprocedure

,gender,age,ethnicity,apacheadmissiondx,treatmentstring
0,Female,75,Other/Unknown,Emphysema/bronchitis,cardiovascular|arrhythmias|digoxin
1,Female,75,Other/Unknown,Emphysema/bronchitis,cardiovascular|arrhythmias|digoxin
2,Female,75,Other/Unknown,Emphysema/bronchitis,cardiovascular|arrhythmias|calcium channel blo...
3,Female,75,Other/Unknown,Emphysema/bronchitis,cardiovascular|arrhythmias|calcium channel blo...
4,Male,46,Caucasian,"Pneumonia, other",cardiovascular|non-operative procedures|diagno...
...,...,...,...,...,...
860530,Male,74,Caucasian,Hematomas,cardiovascular|hypertension|vasodilating agent...
860531,Male,74,Caucasian,Hematomas,cardiovascular|myocardial ischemia / infarctio...
860532,Male,74,Caucasian,Hematomas,cardiovascular|intravenous fluid|Lactated Ring...
860533,Male,76,Caucasian,Encephalopathies (excluding hepatic),cardiovascular|intravenous fluid|normal saline...
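Each row above is one treatment record, so the 860,534 rows overcount patients, and `LIKE '%cardio%'` matches the substring anywhere in the path. A sketch that keeps only records whose top-level system is exactly `cardiovascular` and counts distinct stays. It assumes `patientunitstayid` is added to the SELECT; the stay IDs are hypothetical, while the treatment strings are complete examples from this notebook's outputs. In BigQuery, `STARTS_WITH(treatmentstring, 'cardiovascular|')` would express the same restriction.

```python
import pandas as pd

# Hypothetical stay IDs paired with treatment strings seen in the outputs.
records = pd.DataFrame({
    "patientunitstayid": [1, 1, 1, 2, 3],
    "treatmentstring": [
        "cardiovascular|arrhythmias|digoxin",
        "cardiovascular|arrhythmias|digoxin",
        "renal|urinary catheters|foley catheter",
        "cardiovascular|arrhythmias|digoxin",
        "renal|urinary catheters|foley catheter",
    ],
})

# Keep rows whose first path level is exactly "cardiovascular".
is_cardio = records["treatmentstring"].str.split("|").str[0] == "cardiovascular"
cardio = records[is_cardio]

print(len(cardio), "records from", cardio["patientunitstayid"].nunique(), "stays")
```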


Identifying heart patients who died before being discharged from the hospital -

In [0]:
%%bigquery --project $project_id death_heart
select gender, age, ethnicity,apacheadmissiondx,hospitaldischargestatus
from `physionet-data.eicu_crd.patient` 
where apacheadmissiondx like '%Cardi%' and hospitaldischargestatus like '%Expire%';

In [24]:
death_heart

,gender,age,ethnicity,apacheadmissiondx,hospitaldischargestatus
0,Male,88,Caucasian,"Cardiovascular medical, other",Expired
1,Female,63,African American,Cardiac arrest (with or without respiratory ar...,Expired
2,Male,84,Caucasian,Cardiac arrest (with or without respiratory ar...,Expired
3,Male,80,Hispanic,Cardiac arrest (with or without respiratory ar...,Expired
4,Male,59,Caucasian,Cardiac arrest (with or without respiratory ar...,Expired
...,...,...,...,...,...
2461,Male,83,Caucasian,Cardiac arrest (with or without respiratory ar...,Expired
2462,Female,78,Caucasian,Cardiac arrest (with or without respiratory ar...,Expired
2463,Male,86,Caucasian,Cardiac arrest (with or without respiratory ar...,Expired
2464,Male,88,Caucasian,Cardiac arrest (with or without respiratory ar...,Expired
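The query returns only the deaths; a death rate also needs the number of matching admissions. A sketch in pandas that computes both from a patient-level frame. The rows are hypothetical (the diagnosis strings are taken from earlier outputs), and the broader pattern is an assumption: the case-sensitive `'Cardi'` used above does not match diagnoses such as `Infarction, acute myocardial (MI)` or `CHF, congestive heart failure`, so those are added explicitly here.

```python
import pandas as pd

# Hypothetical patient rows (not eICU data).
patients = pd.DataFrame({
    "apacheadmissiondx": [
        "Cardiovascular medical, other",
        "Cardiovascular medical, other",
        "Infarction, acute myocardial (MI)",
        "CHF, congestive heart failure",
        "Sepsis, pulmonary",
    ],
    "hospitaldischargestatus": ["Expired", "Alive", "Alive", "Expired", "Expired"],
})

# Broader, case-sensitive pattern (assumption): 'Cardi' plus MI and CHF diagnoses.
pattern = r"Cardi|Infarction, acute myocardial|CHF"
heart = patients[patients["apacheadmissiondx"].str.contains(pattern, na=False)]

# Deaths over all matching admissions gives the in-hospital death rate.
deaths = (heart["hospitaldischargestatus"] == "Expired").sum()
rate = deaths / len(heart)
print(f"{deaths} of {len(heart)} heart admissions expired ({rate:.0%})")
```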


In [31]:
print('Number of Heart patients that die during their stay in the hospital - ', len(death_heart))

Number of Heart patients that die during their stay in the hospital -  2466


Identifying diabetic patients who died before being discharged from the hospital -

In [0]:
%%bigquery --project $project_id death_diab
select gender, age, ethnicity,apacheadmissiondx,hospitaldischargestatus
from `physionet-data.eicu_crd.patient` 
where apacheadmissiondx like '%Diab%' and hospitaldischargestatus like '%Expire%'

In [29]:
death_diab

,gender,age,ethnicity,apacheadmissiondx,hospitaldischargestatus
0,Male,28,Caucasian,Diabetic ketoacidosis,Expired
1,Male,36,Hispanic,Diabetic ketoacidosis,Expired
2,Female,44,Hispanic,Diabetic ketoacidosis,Expired
3,Female,53,Hispanic,Diabetic hyperglycemic hyperosmolar nonketotic...,Expired
4,Female,35,Caucasian,Diabetic ketoacidosis,Expired
5,Female,64,Caucasian,Diabetic ketoacidosis,Expired
6,Male,28,Caucasian,Diabetic ketoacidosis,Expired
7,Male,44,Caucasian,Diabetic ketoacidosis,Expired
8,Female,89,Caucasian,Diabetic hyperglycemic hyperosmolar nonketotic...,Expired
9,Female,63,Caucasian,Diabetic ketoacidosis,Expired
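Summarizing these patients by age needs numeric ages. eICU stores `age` as text and reports ages above 89 as the string `> 89`; treat that as an assumption to check against the schema of the dataset version in use. A sketch that converts the column before averaging, mapping `> 89` to 90 as an illustrative choice; the values are hypothetical:

```python
import pandas as pd

# Hypothetical age values in text form (assumption: "> 89" marks ages above 89).
age = pd.Series(["28", "36", "> 89", None, "64"])

# Map "> 89" to 90 (illustrative choice), then coerce anything unparseable to NaN.
age_num = pd.to_numeric(age.replace("> 89", "90"), errors="coerce")
print(age_num.tolist())
print("mean age:", age_num.mean())  # NaN values are skipped by mean()
```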


In [30]:
print("Number of Diabetic patients that die before being released from their stay -", len(death_diab))

Number of Diabetic patients that die before being released from their stay - 39
