## Predicting Readmission with Discharge Summaries

This notebook is a study exercise that follows Andrew Long's Introduction to Clinical Natural Language Processing: https://github.com/andrewwlong/mimic_bow/blob/master/mimic_bow_full.ipynb

### Model
: predict which patients are at risk of an unplanned readmission within 30 days, using their discharge summary notes

### Dataset
: MIMIC-III
- 1) ADMISSIONS : table containing admission and discharge dates
- 2) NOTEEVENTS : clinical notes for each hospitalization
- HADM_ID : hospitalization ID, present in both tables

## Preprocessing:

### 1. ADMISSION
- retrieve rows from ADMISSIONS (admission_type in {urgent, emergency, elective} only)
- convert strings to dates
- get the next unplanned admission
- calculate days until next admission

### 2. NOTES
- filter discharge summaries

### 3. MERGE
- select one summary per admission
- label the output (unplanned readmission within 30 days or not)
- make training/validation/test sets
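
Steps 2 and 3 of the outline can be sketched on toy data. This is only an illustration under stated assumptions: the frames `adm` and `notes`, their values, the `days_next_admit` and `output_label` column names, and the choice to keep the last summary per admission are all hypothetical and are not taken from the notebook's later cells.

```python
import pandas as pd

# Toy stand-in for the admissions table after step 1 (hypothetical values).
adm = pd.DataFrame({
    "subject_id": [1, 1, 2],
    "hadm_id": [100, 101, 200],
    "days_next_admit": [12.5, None, 45.0],  # None: no later unplanned admission
})

# Toy stand-in for NOTEEVENTS (hypothetical values).
notes = pd.DataFrame({
    "hadm_id": [100, 100, 101, 200, 200],
    "category": ["Discharge summary", "Nursing", "Discharge summary",
                 "Discharge summary", "Discharge summary"],
    "text": ["summary A", "nursing note", "summary B", "summary C", "addendum C"],
})

# 2. Keep only discharge summaries.
dis = notes[notes["category"] == "Discharge summary"]

# 3a. Keep one summary per admission (assumption: the last one listed).
dis_last = dis.groupby("hadm_id", as_index=False).last()

# 3b. Left-merge so that admissions without a summary are not silently dropped.
merged = adm.merge(dis_last[["hadm_id", "text"]], on="hadm_id", how="left")

# 3c. Label: 1 if the next unplanned admission is within 30 days, else 0.
#     A missing value compares as False, so it is labeled 0.
merged["output_label"] = (merged["days_next_admit"] < 30).astype(int)

print(merged[["hadm_id", "text", "output_label"]])
```

Only the first toy admission is labeled positive; the second has no later admission and the third was readmitted after more than 30 days.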

### 1. ADMISSION

#### 1.1
- retrieve admission data with admission_type **{elective or emergency or urgent}** 
- **{admittime, dischtime, deathtime}** are converted to timestamps where needed  
(format YYYY-MM-DD hh:mm:ss)

In [41]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sys
import datetime

sys.path.append('./db')

import alchemy_con
engine = alchemy_con.get_engine()

In [43]:
df_adm = pd.read_sql("""SELECT *
                        FROM ADMISSIONS
                        WHERE admission_type in ('ELECTIVE', 'EMERGENCY', 'URGENT')
                        ;""", engine)
df_adm.head()

Unnamed: 0,row_id,subject_id,hadm_id,admittime,dischtime,deathtime,admission_type,admission_location,discharge_location,insurance,language,religion,marital_status,ethnicity,edregtime,edouttime,diagnosis,hospital_expire_flag,has_chartevents_data
0,21,22,165315,2196-04-09 12:26:00,2196-04-10 15:54:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,DISC-TRAN CANCER/CHLDRN H,Private,,UNOBTAINABLE,MARRIED,WHITE,2196-04-09 10:06:00,2196-04-09 13:24:00,BENZODIAZEPINE OVERDOSE,0,1
1,22,23,152223,2153-09-03 07:15:00,2153-09-08 19:10:00,NaT,ELECTIVE,PHYS REFERRAL/NORMAL DELI,HOME HEALTH CARE,Medicare,,CATHOLIC,MARRIED,WHITE,NaT,NaT,CORONARY ARTERY DISEASE\CORONARY ARTERY BYPASS...,0,1
2,23,23,124321,2157-10-18 19:34:00,2157-10-25 14:00:00,NaT,EMERGENCY,TRANSFER FROM HOSP/EXTRAM,HOME HEALTH CARE,Medicare,ENGL,CATHOLIC,MARRIED,WHITE,NaT,NaT,BRAIN MASS,0,1
3,24,24,161859,2139-06-06 16:14:00,2139-06-09 12:48:00,NaT,EMERGENCY,TRANSFER FROM HOSP/EXTRAM,HOME,Private,,PROTESTANT QUAKER,SINGLE,WHITE,NaT,NaT,INTERIOR MYOCARDIAL INFARCTION,0,1
4,25,25,129635,2160-11-02 02:06:00,2160-11-05 14:55:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,HOME,Private,,UNOBTAINABLE,MARRIED,WHITE,2160-11-02 01:01:00,2160-11-02 04:27:00,ACUTE CORONARY SYNDROME,0,1
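
The query above depends on the local `alchemy_con` helper, which is not shown here. As a self-contained illustration of the same filter, the sketch below assumes an in-memory SQLite database with a small hypothetical ADMISSIONS table; the rows and the name `toy_adm` are made up, and the real notebook reads from its own database.

```python
import sqlite3
import pandas as pd

# In-memory database with a tiny, hypothetical ADMISSIONS table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ADMISSIONS (hadm_id INTEGER, admittime TEXT, admission_type TEXT)"
)
conn.executemany(
    "INSERT INTO ADMISSIONS VALUES (?, ?, ?)",
    [
        (1, "2196-04-09 12:26:00", "EMERGENCY"),
        (2, "2153-09-03 07:15:00", "ELECTIVE"),
        (3, "2150-01-01 00:00:00", "NEWBORN"),  # excluded by the WHERE clause
        (4, "2157-10-18 19:34:00", "URGENT"),
    ],
)

# Same filter as in the notebook; parse_dates converts the text column to timestamps.
toy_adm = pd.read_sql(
    """SELECT *
       FROM ADMISSIONS
       WHERE admission_type IN ('ELECTIVE', 'EMERGENCY', 'URGENT');""",
    conn,
    parse_dates=["admittime"],
)
conn.close()

print(toy_adm)
```

The NEWBORN row is filtered out, leaving three admissions.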


In [44]:
df_adm.shape

(51113, 19)

#### check data type for admittime and dischtime

In [45]:
df_adm.admittime.dtype

dtype('<M8[ns]')

In [46]:
df_adm.admittime[0]

Timestamp('2196-04-09 12:26:00')

#### for deathtime, convert to timestamp

In [47]:
# errors='coerce' turns missing or unparseable dates into NaT
df_adm.deathtime = pd.to_datetime(df_adm.deathtime,
                                  format='%Y-%m-%d %H:%M:%S',
                                  errors='coerce')

In [48]:
df_adm.deathtime.dtype

dtype('<M8[ns]')

In [49]:
df_adm.deathtime[0]

NaT

In [50]:
# columns
df_adm.columns

Index(['row_id', 'subject_id', 'hadm_id', 'admittime', 'dischtime',
       'deathtime', 'admission_type', 'admission_location',
       'discharge_location', 'insurance', 'language', 'religion',
       'marital_status', 'ethnicity', 'edregtime', 'edouttime', 'diagnosis',
       'hospital_expire_flag', 'has_chartevents_data'],
      dtype='object')

#### columns
- https://mimic.physionet.org/mimictables/admissions/

subject_id : patient id  
hadm_id : hospitalization id  
admittime : admission time, format YYYY-MM-DD hh:mm:ss  
dischtime : discharge time, same format  
deathtime : time of death, if the patient died  
admission_type : elective, emergency, newborn, urgent  
edregtime : time registered in the emergency department  
edouttime : time discharged from the emergency department  
  
Because of de-identification, dates can be shifted into the future.
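
According to the MIMIC documentation, the date shift is applied consistently within a patient, so differences between timestamps remain meaningful even though the absolute years are not. A minimal sketch, using the admit and discharge times of the first row shown earlier (the names `stay` and `los_days` are introduced here for illustration):

```python
import pandas as pd

# Admit and discharge times of hadm_id 165315, copied from the head() output.
stay = pd.DataFrame({
    "admittime": pd.to_datetime(["2196-04-09 12:26:00"]),
    "dischtime": pd.to_datetime(["2196-04-10 15:54:00"]),
})

# Length of stay in days: the shifted year (2196) does not affect the interval.
los_days = (stay["dischtime"] - stay["admittime"]).dt.total_seconds() / (24 * 60 * 60)

print(round(los_days.iloc[0], 2))  # 1.14
```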

In [51]:
# admission type
df_adm.groupby(df_adm.admission_type).size()

admission_type
ELECTIVE      7706
EMERGENCY    42071
URGENT        1336
dtype: int64

#### 1.2 get the next unplanned admission date, if one exists

In [53]:
# sort subject_id and admission date
df_adm.sort_values(['subject_id', 'admittime'], inplace=True)
df_adm.head()

Unnamed: 0,row_id,subject_id,hadm_id,admittime,dischtime,deathtime,admission_type,admission_location,discharge_location,insurance,language,religion,marital_status,ethnicity,edregtime,edouttime,diagnosis,hospital_expire_flag,has_chartevents_data
179,2,3,145834,2101-10-20 19:08:00,2101-10-31 13:58:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,SNF,Medicare,,CATHOLIC,MARRIED,WHITE,2101-10-20 17:09:00,2101-10-20 19:24:00,HYPOTENSION,0,1
180,3,4,185777,2191-03-16 00:28:00,2191-03-23 18:41:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,HOME WITH HOME IV PROVIDR,Private,,PROTESTANT QUAKER,SINGLE,WHITE,2191-03-15 13:10:00,2191-03-16 01:10:00,"FEVER,DEHYDRATION,FAILURE TO THRIVE",0,1
181,5,6,107064,2175-05-30 07:15:00,2175-06-15 16:00:00,NaT,ELECTIVE,PHYS REFERRAL/NORMAL DELI,HOME HEALTH CARE,Medicare,ENGL,NOT SPECIFIED,MARRIED,WHITE,NaT,NaT,CHRONIC RENAL FAILURE/SDA,0,1
182,8,9,150750,2149-11-09 13:06:00,2149-11-14 10:15:00,2149-11-14 10:15:00,EMERGENCY,EMERGENCY ROOM ADMIT,DEAD/EXPIRED,Medicaid,,UNOBTAINABLE,,UNKNOWN/NOT SPECIFIED,2149-11-09 11:13:00,2149-11-09 13:18:00,HEMORRHAGIC CVA,1,1
183,10,11,194540,2178-04-16 06:18:00,2178-05-11 19:00:00,NaT,EMERGENCY,EMERGENCY ROOM ADMIT,HOME HEALTH CARE,Private,,OTHER,MARRIED,WHITE,2178-04-15 20:46:00,2178-04-16 06:53:00,BRAIN MASS,0,1


In [66]:
# check for a single patient
df_adm.loc[df_adm.subject_id==99982]

Unnamed: 0,row_id,subject_id,hadm_id,admittime,dischtime,deathtime,admission_type,admission_location,discharge_location,insurance,language,religion,marital_status,ethnicity,edregtime,edouttime,diagnosis,hospital_expire_flag,has_chartevents_data
48568,58968,99982,151454,2156-11-28 11:56:00,2156-12-08 13:45:00,NaT,EMERGENCY,PHYS REFERRAL/NORMAL DELI,HOME HEALTH CARE,Medicare,ENGL,CATHOLIC,MARRIED,WHITE,NaT,NaT,TVR,0,1
48569,58969,99982,112748,2157-01-05 17:27:00,2157-01-12 13:00:00,NaT,EMERGENCY,CLINIC REFERRAL/PREMATURE,HOME,Medicare,ENGL,CATHOLIC,MARRIED,WHITE,2157-01-05 14:03:00,2157-01-05 18:50:00,SHORTNESS OF BREATH,0,1
48570,58970,99982,183791,2157-02-16 17:31:00,2157-02-22 20:36:00,NaT,EMERGENCY,TRANSFER FROM HOSP/EXTRAM,SHORT TERM HOSPITAL,Medicare,ENGL,CATHOLIC,MARRIED,WHITE,NaT,NaT,BIVENTRICULAR HEART FAILURE,0,1
