## Notebook
- Examples using the database and featurizer modules

1. [Establish connection with GBQ](#1)
2. [Simple query from GBQ](#2)
3. [Creating a featurizer config](#3)
4. [Run Featurizer and save features](#4)

In [1]:
from utils_prediction.featurizer.mimic4_gbq import *
from utils_prediction.database import gbq_connect, gbq_query

#### 1. <a id='1'> Establish connection with GBQ </a>
- Requires a service account authentication file & a project_id (where tables are stored)

In [2]:
c = gbq_connect(
    service_account_json_path = '/hpf/projects/lsung/creds/gbq/mimic.json',
    project_id = 'mimic-iv-ches'
    )

Google Big Query Connection Established
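
Because the connection depends on a valid service account file, it can help to sanity-check the file before calling `gbq_connect`. The sketch below is not part of `utils_prediction`: `check_service_account_file` is a hypothetical helper, and the list of required fields is an assumption based on the usual layout of Google service-account key files.

```python
import json
from pathlib import Path

# Assumption: fields normally present in a Google service-account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_file(path):
    """Return a list of problems found in the key file (empty list means OK)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {key_path}"]
    try:
        info = json.loads(key_path.read_text())
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(info, dict):
        return ["top-level JSON value is not an object"]
    # Report every missing field, then check the declared credential type.
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - info.keys())]
    if info.get("type") != "service_account":
        problems.append("'type' is not 'service_account'")
    return problems
```

An empty list means the file looks structurally plausible; it does not prove that the credentials grant access to the project.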


#### 2. <a id='2'>Simple query</a>

In [3]:
q = """
    select * from `mimic-iv-ches.core.patients` limit 10
"""
df = gbq_query(c,q,verbose=True)
df


 Done!
 Took 2.34 s to process the query.
 Your query returned 10 rows, 6 columns.
 Total Memory Usage: 0.0 MB.




| | subject_id | gender | anchor_age | anchor_year | anchor_year_group | dod |
|---|---|---|---|---|---|---|
| 0 | 11301375 | F | 0 | 2110 | 2008 - 2010 | |
| 1 | 12653888 | F | 0 | 2111 | 2008 - 2010 | |
| 2 | 15227464 | F | 0 | 2111 | 2008 - 2010 | |
| 3 | 17572400 | F | 0 | 2111 | 2008 - 2010 | |
| 4 | 19661157 | F | 0 | 2114 | 2008 - 2010 | |
| 5 | 11093653 | F | 0 | 2114 | 2008 - 2010 | |
| 6 | 13352890 | F | 0 | 2115 | 2008 - 2010 | |
| 7 | 12731328 | F | 0 | 2116 | 2008 - 2010 | |
| 8 | 18814948 | F | 0 | 2117 | 2008 - 2010 | |
| 9 | 15385969 | F | 0 | 2117 | 2008 - 2010 | |


#### 3. <a id='3'>Example featurizer config</a>
- The dictionary below configures a query to extract features for the patients in the 'mimic4ds_inhospmort' cohort table
- The extracted features include counts (of concept_ids) and measurements (mapped to min, max, avg) from the following tables:
    - labs
    - prescriptions
    - diagnoses
    - procedures
    - hcpcsevents
    - chartevents
- The extracted features are grouped into the following time bins:
    - oldest record -> 180 days before ICU admission
    - 180 -> 30 days before ICU admission
    - 30 -> 7 days before ICU admission
    - 7 days before -> the time of ICU admission
    - first 4 hours of ICU admission (0.167 days)
- The extracted features are stored in a table saved as a parquet file under `save_fpath`
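
To make the time bins concrete, the sketch below turns a list of bin edges (in days relative to ICU admission) into consecutive intervals and labels. `bin_intervals` and `bin_labels` are hypothetical helpers, not functions from `utils_prediction`. The `-365000` lower bound for the "all history" bin is an assumption inferred from the feature column names in the featurizer output of this notebook.

```python
def bin_intervals(time_bins, include_all_history=True, history_start=-365000):
    """Turn bin edges (days relative to admission) into (start, end) pairs."""
    edges = [float(t) for t in time_bins]
    if include_all_history:
        # Assumption: "all history" is modelled as one very early edge.
        edges = [float(history_start)] + edges
    return list(zip(edges[:-1], edges[1:]))


def _fmt(x):
    # Render whole numbers without a trailing ".0" (e.g. -180.0 -> "-180").
    return str(int(x)) if x == int(x) else str(x)


def bin_labels(time_bins, include_all_history=True):
    """Build labels such as 'bin_-180_-30' for each consecutive interval."""
    return [
        f"bin_{_fmt(start)}_{_fmt(end)}"
        for start, end in bin_intervals(time_bins, include_all_history)
    ]


labels = bin_labels(['-180', '-30', '-7', '0', '0.167'])
print(labels)
# ['bin_-365000_-180', 'bin_-180_-30', 'bin_-30_-7', 'bin_-7_0', 'bin_0_0.167']

# The last edge is expressed in days: 0.167 days is roughly 4 hours.
print(round(0.167 * 24, 1))  # 4.0
```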

In [4]:
config = {
    'analysis_id':'mortality',
    'project_name':'mimic-iv-ches',
    'cohort_table':'cohorts.mimic4ds_inhospmort',
    'tables_to_build':{
        'count_labs':{
            'feature_tag':'lab',
            'feature_type':'count',
            'concept_table':'hosp.labevents',
            'concept_id':'itemid',
            'timestamp':'charttime'
            },
        'count_prescriptions':{
            'feature_tag':'presc',
            'feature_type':'count',
            'concept_table':'hosp.prescriptions',
            'concept_id':'ndc',
            'timestamp':'starttime'
            },
        'count_diagnosis':{
            'feature_tag':'diag',
            'feature_type':'count',
            'concept_table':'hosp.diagnoses_icd',
            'concept_id':'icd_code',
            'timestamp': 'discharge'
            },
        'count_procedures':{
            'feature_tag':'proc',
            'feature_type':'count',
            'concept_table':'hosp.procedures_icd',
            'concept_id':'icd_code',
            'timestamp': 'discharge'
            },
        'count_hcpcs':{
            'feature_tag':'hcpcs',
            'feature_type':'count',
            'concept_table':'hosp.hcpcsevents',
            'concept_id':'hcpcs_cd',
            'timestamp': 'discharge'
            },
        'meas_labs':{
            'feature_tag':'labs',
            'feature_type':'measurement',
            'concept_table':'hosp.labevents',
            'concept_id':'itemid',
            'timestamp':'charttime'
            },
        'meas_icucharts':{
            'feature_tag':'icucharts',
            'feature_type':'measurement',
            'concept_table':'icu.chartevents',
            'concept_id':'itemid',
            'timestamp':'charttime'
            }
        },
    'time_bins': ['-180','-30','-7','0','0.167'],
    'include_all_history': True,
    'save_fpath':'/hpf/projects/lsung/projects/utils_prediction/utils_prediction/demos/data/',
    'save_ftype':'parquet'
     }
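
Since a typo in the config only surfaces once the query runs, a quick structural check beforehand can save time. The sketch below is a hypothetical `validate_config` helper, not part of `utils_prediction`. The set of required keys and the two feature types are assumptions read off the config above; the featurizer itself may accept more or require less.

```python
# Assumption: every entry in 'tables_to_build' uses the keys seen in the config above.
REQUIRED_TABLE_KEYS = {
    "feature_tag", "feature_type", "concept_table", "concept_id", "timestamp"
}
# Assumption: only these two feature types appear in this notebook.
KNOWN_FEATURE_TYPES = {"count", "measurement"}


def validate_config(config):
    """Return a list of human-readable problems (empty list means none found)."""
    problems = []
    for name, spec in config.get("tables_to_build", {}).items():
        missing = REQUIRED_TABLE_KEYS - spec.keys()
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
        if spec.get("feature_type") not in KNOWN_FEATURE_TYPES:
            problems.append(
                f"{name}: unknown feature_type {spec.get('feature_type')!r}"
            )
    # Bin edges are stored as strings, so compare them numerically.
    edges = [float(t) for t in config.get("time_bins", [])]
    if edges != sorted(edges):
        problems.append("time_bins are not in ascending order")
    return problems


# A deliberately broken example: missing 'timestamp', misspelled feature type,
# and unsorted bin edges.
example = {
    "tables_to_build": {
        "count_labs": {
            "feature_tag": "lab",
            "feature_type": "counts",
            "concept_table": "hosp.labevents",
            "concept_id": "itemid",
        }
    },
    "time_bins": ["-30", "-180", "0"],
}
for problem in validate_config(example):
    print(problem)
```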

#### 4. <a id='4'>Run Featurizer and save</a>
- All featurizer settings and the save path are specified in the config above.

In [5]:
%%time
df = featurizer(config).featurize_and_save()
df.head(2)

Google Big Query Connection Established
saving features...
CPU times: user 2min 1s, sys: 35.9 s, total: 2min 37s
Wall time: 2min 41s


| | label | group_var | subject_id | insurance | marital_status | ethnicity | language | age | gender | `00002140701_presc_count_bin_0_0.167` | ... | `Z992 _diag_count_bin_-7_0` | `Z993 _diag_count_bin_-180_-30` | `Z993 _diag_count_bin_-30_-7` | `Z993 _diag_count_bin_-365000_-180` | `Z993 _diag_count_bin_-7_0` | `Z9981 _diag_count_bin_-180_-30` | `Z9981 _diag_count_bin_-30_-7` | `Z9981 _diag_count_bin_-365000_-180` | `Z9981 _diag_count_bin_-7_0` | `Z9989 _diag_count_bin_-180_-30` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2008 - 2010 | 12517625 | Other | SINGLE | WHITE | ENGLISH | 20 | F | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0 | 2008 - 2010 | 16230402 | Medicaid | MARRIED | HISPANIC/LATINO | OTHER | 28 | F | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
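
The count feature columns in the output above appear to follow the pattern `<concept_id>_<feature_tag>_count_bin_<start>_<end>`. The sketch below splits such a name back into its parts. The pattern is inferred only from the column names visible in this output, `parse_count_column` is a hypothetical helper, and measurement columns are not handled because their naming is not shown here.

```python
import re

# Assumption: count columns look like '<concept>_<tag>_count_bin_<start>_<end>',
# where <start> and <end> are day offsets relative to admission.
COUNT_COLUMN = re.compile(
    r"^(?P<concept>.+)_(?P<tag>[^_]+)_count_bin_"
    r"(?P<start>-?\d+(?:\.\d+)?)_(?P<end>-?\d+(?:\.\d+)?)$"
)


def parse_count_column(name):
    """Split a count feature column name into its parts, or return None."""
    match = COUNT_COLUMN.match(name)
    if match is None:
        return None
    return {
        # Some concept ids carry trailing whitespace (e.g. 'Z992 '), so strip it.
        "concept": match["concept"].strip(),
        "tag": match["tag"],
        "start_days": float(match["start"]),
        "end_days": float(match["end"]),
    }


print(parse_count_column("00002140701_presc_count_bin_0_0.167"))
print(parse_count_column("Z993 _diag_count_bin_-365000_-180"))
print(parse_count_column("subject_id"))  # not a feature column -> None
```

Parsing the names this way makes it possible to, for example, select all diagnosis counts from a given time bin without hard-coding column lists.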


#### Create a random sample of features

In [6]:
from utils_prediction.dataloader.mimic4 import *

In [7]:
data = dataloader(
    analysis_id='mortality',
    features_fpath='/hpf/projects/lsung/projects/utils_prediction/utils_prediction/demos/data/'
    ).load_features()

In [9]:
data.features = data.sample()

In [26]:
data.features.to_csv('data/mimic4_slice.csv',index=False)