# Background

Dataset: https://physionet.org/content/mimiciii-demo/1.4/  
Data info: https://mimic.physionet.org/mimictables/callout/  
Full database: https://physionet.org/content/mimiciii/1.4/  
ICD9 codes scraped from: https://en.wikipedia.org/wiki/List_of_ICD-9_codes_001%E2%80%93139:_infectious_and_parasitic_diseases

## Definitions
SUBJECT_ID identifies a unique patient. The same SUBJECT_ID can appear more than once, which indicates a readmission.  

HADM_ID represents a single patient’s admission to the hospital; each value is unique in the ADMISSIONS table  
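
The two identifier rules above can be checked directly. The following is a minimal sketch on a made-up toy table (the values are illustrative, not taken from the dataset), assuming the admissions data is held in a pandas DataFrame with `SUBJECT_ID` and `HADM_ID` columns:

```python
import pandas as pd

# Toy admissions table; the values are invented for illustration.
admissions = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2, 3],
    "HADM_ID": [100, 101, 102, 103],
})

# Each admission should appear exactly once in the admissions table.
hadm_is_unique = admissions["HADM_ID"].is_unique

# Count admissions per patient; more than one means the patient was readmitted.
admissions_per_patient = admissions.groupby("SUBJECT_ID")["HADM_ID"].nunique()
readmitted = admissions_per_patient[admissions_per_patient > 1].index.tolist()

print(hadm_is_unique, readmitted)
```

Once the admissions are merged with diagnoses, a HADM_ID will repeat once per diagnosis row, so a uniqueness check like this only makes sense before the merge.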

ADMITTIME provides the date and time the patient was admitted to the hospital  

DISCHTIME provides the date and time the patient was discharged from the hospital  

DEATHTIME provides the time of in-hospital death for the patient  

ADMISSION_TYPE describes the type of the admission: ‘ELECTIVE’, ‘URGENT’, ‘NEWBORN’ or ‘EMERGENCY’. Emergency/urgent indicate unplanned medical care, and are often collapsed into a single category in studies. Elective indicates a previously planned hospital admission. Newborn indicates that the HADM_ID pertains to the patient’s birth  
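
Collapsing emergency and urgent admissions into one category could look like the sketch below. The label `UNPLANNED` is a hypothetical name chosen for illustration; it does not come from the dataset or from the cleaning module used later.

```python
import pandas as pd

# The four admission types listed above.
admission_type = pd.Series(["ELECTIVE", "URGENT", "NEWBORN", "EMERGENCY"])

# Map both unplanned types to a single (hypothetical) label.
collapsed = admission_type.replace(
    {"URGENT": "UNPLANNED", "EMERGENCY": "UNPLANNED"}
)

print(collapsed.tolist())
```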

ADMISSION_LOCATION provides information about the previous location of the patient prior to arriving at the hospital. There are 9 possible values:
* EMERGENCY ROOM ADMIT
* TRANSFER FROM HOSP/EXTRAM
* TRANSFER FROM OTHER HEALT
* CLINIC REFERRAL/PREMATURE
* ** INFO NOT AVAILABLE **
* TRANSFER FROM SKILLED NUR
* TRSF WITHIN THIS FACILITY
* HMO REFERRAL/SICK
* PHYS REFERRAL/NORMAL DELI  

EDREGTIME, EDOUTTIME: the times at which the patient was registered in and discharged from the emergency department  
  
DIAGNOSIS provides a preliminary, free-text diagnosis for the patient on hospital admission.  

HOSPITAL_EXPIRE_FLAG: This indicates whether the patient died within the given hospitalization. 1 indicates death in the hospital, and 0 indicates survival to hospital discharge.  

INSURANCE, LANGUAGE, RELIGION, MARITAL_STATUS, ETHNICITY columns describe patient demographics  

TRANSFERTIME is the time at which the patient moved from the previous service  

PREV_SERVICE and CURR_SERVICE are the previous and current service that the patient resides under.  

GENDER is the genotypical sex of the patient  

DOB is the date of birth of the given patient. Patients who are older than 89 years old at any time in the database have had their date of birth shifted to obscure their age and comply with HIPAA  
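
Because of this shift, a naive age calculation gives implausible values for the oldest patients (the MIMIC-III documentation describes their date of birth as set 300 years before the first admission). A direct subtraction of timestamps can also overflow pandas' nanosecond-based `Timedelta`, which spans roughly 292 years. The sketch below works around both by using calendar arithmetic and capping the result. The function name `age_at_admission` and the cap of 90 are assumptions for illustration, not necessarily what `clean_data` does.

```python
import pandas as pd

def age_at_admission(dob, admittime, cap=90):
    """Age in whole years at admission, capped for shifted dates of birth."""
    dob = pd.to_datetime(pd.Series(dob))
    admittime = pd.to_datetime(pd.Series(admittime))

    # Year difference, minus one if the birthday has not yet occurred that year.
    years = admittime.dt.year - dob.dt.year
    before_birthday = (admittime.dt.month < dob.dt.month) | (
        (admittime.dt.month == dob.dt.month) & (admittime.dt.day < dob.dt.day)
    )
    age = years - before_birthday.astype(int)

    # Shifted DOBs yield ages near 300; cap them at an assumed ceiling.
    return age.clip(upper=cap)

# Made-up dates: one ordinary patient, one with a shifted date of birth.
ages = age_at_admission(
    ["2080-03-15 00:00:00", "1850-01-01 00:00:00"],
    ["2150-03-14 10:00:00", "2150-06-01 00:00:00"],
)
print(ages.tolist())
```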

DOD is the date of death for the given patient  

SEQ_NUM provides the order of the ICD diagnoses recorded for the patient; diagnoses are ordered by priority  

ICD9_CODE contains the actual code corresponding to the diagnosis assigned to the patient for the given row.  

SUBMIT_WARDID identifies the ward from which the request was submitted.

In [5]:
import numpy as np
import pandas as pd
import datetime as dt

#visualization
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline

from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression  
from sklearn.model_selection import train_test_split

from importing_data import import_data
from hospital_data_cleaning import clean_data
from dummifying_data import dummifying_cat_cols

## Importing & Merging Datasets

In [2]:
hosp_data = import_data()

admissions imported
patients imported
diagnoses_icd imported
services imported
Sucess, all files were merged.
Success, appropriate columns were selected.
Success, files imported and columns selected.
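
The source of `import_data` is not shown here, so the following is only a sketch of how such a merge might be done with pandas, using made-up toy tables. It assumes PATIENTS joins on `SUBJECT_ID` and DIAGNOSES_ICD joins on `SUBJECT_ID` and `HADM_ID`; the services table is left out for brevity.

```python
import pandas as pd

# Toy stand-ins for three of the imported tables; values are invented.
admissions = pd.DataFrame({
    "SUBJECT_ID": [1, 2],
    "HADM_ID": [100, 101],
    "ADMISSION_TYPE": ["EMERGENCY", "ELECTIVE"],
})
patients = pd.DataFrame({"SUBJECT_ID": [1, 2], "GENDER": ["F", "M"]})
diagnoses_icd = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2],
    "HADM_ID": [100, 100, 101],
    "SEQ_NUM": [1, 2, 1],
    "ICD9_CODE": ["0389", "5849", "4019"],
})

# validate= makes pandas raise if the join keys are not unique where expected.
merged = (
    admissions
    .merge(patients, on="SUBJECT_ID", how="left", validate="many_to_one")
    .merge(diagnoses_icd, on=["SUBJECT_ID", "HADM_ID"], how="left",
           validate="one_to_many")
)

print(merged.shape)
```

After the second merge there is one row per diagnosis, so admission-level columns such as `ADMISSION_TYPE` repeat across the rows of the same HADM_ID.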


In [3]:
hosp_data_clean = clean_data(hosp_data)

Length of stay column was added.
Age column was added.
DOB column was dropped
There are 92 unique subject_ids in the origonal and new dataset and 92 unique hadm_ids.
Patients who died in the hospital were removed.
Languages were cleaned.
Marital status was cleaned.
Admissions locations were cleaned.
Religions were cleaned.
Ethnicities were cleaned.
Scraping beginning.
ICD9 codes scraped from Wikipedia.
Temp indicator column created.
ICD9 codes converted.
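
Two of the steps reported above, adding a length-of-stay column and removing in-hospital deaths, can be sketched as follows. This is an illustration on invented data, not the actual body of `clean_data`; the column name `LOS_DAYS` is an assumption.

```python
import pandas as pd

# Toy admissions with made-up timestamps.
df = pd.DataFrame({
    "HADM_ID": [100, 101],
    "ADMITTIME": pd.to_datetime(["2150-01-01 08:00:00", "2150-02-10 12:00:00"]),
    "DISCHTIME": pd.to_datetime(["2150-01-03 20:00:00", "2150-02-11 00:00:00"]),
    "HOSPITAL_EXPIRE_FLAG": [0, 1],
})

# Length of stay in fractional days, from admission to discharge.
df["LOS_DAYS"] = (df["DISCHTIME"] - df["ADMITTIME"]).dt.total_seconds() / 86400

# Keep only admissions where the patient survived to discharge.
survivors = df[df["HOSPITAL_EXPIRE_FLAG"] == 0].copy()

print(survivors[["HADM_ID", "LOS_DAYS"]])
```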


In [None]:
# Backup copy of the cleaned data; not used below, kept in case a later step goes wrong.
clean_data_saved = hosp_data_clean.copy()

## Creating Dummy Variables

In [6]:
hosp_dummied = dummifying_cat_cols(hosp_data_clean)

ICD9 dummy variables created.
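
As a rough idea of what this step might involve, the sketch below one-hot encodes a diagnosis category with `pd.get_dummies` and reduces the result to one row per admission. The column name `ICD9_CATEGORY` and its values are hypothetical; the actual `dummifying_cat_cols` implementation is not shown in this notebook.

```python
import pandas as pd

# Toy diagnosis rows; category names are invented for illustration.
diag = pd.DataFrame({
    "HADM_ID": [100, 100, 101],
    "ICD9_CATEGORY": ["infectious", "circulatory", "circulatory"],
})

# One indicator column per category.
dummies = pd.get_dummies(diag["ICD9_CATEGORY"], prefix="ICD9", dtype=int)

# Collapse to one row per admission: 1 if the admission has any diagnosis
# in that category, else 0.
per_admission = (
    pd.concat([diag[["HADM_ID"]], dummies], axis=1)
    .groupby("HADM_ID")
    .max()
)

print(per_admission)
```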
