#### Defining a cohort

A cohort is a table in which each row corresponds to a unique combination of `person_id`, `window_start_field` (e.g., `admit_date`), and `window_end_field` (e.g., `discharge_date`).
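
As a quick sanity check of this definition, the following minimal sketch tests whether a set of rows is unique on the three key fields. The helper `find_duplicate_windows` and the sample rows are hypothetical and not part of the package. For a pandas DataFrame, `df.duplicated(subset=list(KEYS)).any()` performs the same check.

```python
from collections import Counter

# Key fields assumed from the cohort definition above; the names are examples.
KEYS = ("person_id", "admit_date", "discharge_date")


def find_duplicate_windows(rows, keys=KEYS):
    """Return the key combinations that occur more than once in `rows`."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [combo for combo, n in counts.items() if n > 1]


# Illustrative rows: one person may appear several times,
# as long as each (person, start, end) combination is unique.
rows = [
    {"person_id": 1, "admit_date": "2020-01-01 08:00:00", "discharge_date": "2020-01-03 10:00:00"},
    {"person_id": 1, "admit_date": "2020-06-10 09:30:00", "discharge_date": "2020-06-12 14:00:00"},
    {"person_id": 2, "admit_date": "2020-01-01 08:00:00", "discharge_date": "2020-01-03 10:00:00"},
]

duplicates = find_duplicate_windows(rows)
print("valid cohort" if not duplicates else f"duplicate windows: {duplicates}")
```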

Here, we call a predefined set of transformations to define a cohort of hospital admissions. Refer to the source code for details on how this cohort is defined. In practice, a cohort can be defined arbitrarily, as long as it meets the specification described above and is stored in a table in Google BigQuery.

In [1]:
import pandas as pd
from datasets.cohorts.admissions import AdmissionCohort

#### Instantiate Admission Cohort

In [2]:
cohort = AdmissionCohort()



#### Configure Cohort
- `google_application_credentials`: path to the JSON file that stores the gcloud auth credentials. Defaults to "~/.config/gcloud/application_default_credentials.json", the location used after auth setup with the command `gcloud auth application-default login`
- `gcloud_project`: gcloud project [default "som-nero-nigam-starr"]
- `dataset_project`: project in which OMOP CDM dataset is stored [default "som-nero-nigam-starr"]
- `rs_dataset_project`: project in which cohort table is stored and to which the label table will be written [default "som-nero-nigam-starr"]
- `dataset`: name of the OMOP CDM dataset [default "starr_omop_cdm5_deid_20210723"]
- `rs_dataset`: name of the dataset in which cohort table is stored and to which the label table will be written
- `cohort_name`: name of the cohort
- `limit`: optional; restricts the number of rows in the cohort table (useful for debugging)
- `min_stay_hour`: optional; filters the cohort by the length (in hours) of the time window
- `limit_str`: optional; generated from `limit`, but can be specified manually
- `where_str`: optional; generated from `min_stay_hour`, but can be specified manually
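
The relationship between `limit` and `min_stay_hour` and their string counterparts can be sketched as follows. This is an assumption about how the strings might be built, not the package's actual implementation: the helper names, the exact SQL, and the use of BigQuery's `DATETIME_DIFF` on `admit_date` and `discharge_date` are illustrative. The only behavior taken from this notebook is that both strings are empty when the corresponding option is `None`.

```python
from typing import Optional


def build_limit_str(limit: Optional[int]) -> str:
    """Hypothetical helper: SQL LIMIT clause, or '' when no limit is set."""
    return "" if limit is None else f"LIMIT {int(limit)}"


def build_where_str(
    min_stay_hour: Optional[int],
    start_field: str = "admit_date",
    end_field: str = "discharge_date",
) -> str:
    """Hypothetical helper: SQL WHERE clause on window length, or ''."""
    if min_stay_hour is None:
        return ""
    # DATETIME_DIFF(a, b, HOUR) returns a - b in hours (BigQuery standard SQL).
    return f"WHERE DATETIME_DIFF({end_field}, {start_field}, HOUR) >= {int(min_stay_hour)}"


print(repr(build_limit_str(None)))
print(build_limit_str(1000))
print(build_where_str(24))
```

Passing `limit_str` or `where_str` directly would then bypass these defaults, which is useful when the filter cannot be expressed through `limit` or `min_stay_hour` alone.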

In [4]:
cohort.configure(
    rs_dataset='lguo_explore',
    cohort_name='test_refactor_admissions_rollup'
)

In [5]:
cohort.config

{'google_application_credentials': '/home/guolin1/.config/gcloud/application_default_credentials.json',
 'gcloud_project': 'som-nero-nigam-starr',
 'dataset_project': 'som-nero-nigam-starr',
 'rs_dataset_project': 'som-nero-nigam-starr',
 'dataset': 'starr_omop_cdm5_deid_20210723',
 'rs_dataset': 'lguo_explore',
 'cohort_name': 'test_refactor_admissions_rollup',
 'limit': None,
 'min_stay_hour': None,
 'limit_str': '',
 'where_str': ''}
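
Judging from the parameter descriptions above, the cohort table appears to live at `rs_dataset_project.rs_dataset.cohort_name`. Below is a minimal sketch of composing that fully qualified name from the config dictionary. `cohort_table_id` is a hypothetical helper, not a method of `AdmissionCohort`.

```python
def cohort_table_id(config: dict) -> str:
    """Hypothetical helper: 'project.dataset.table' for the cohort table."""
    return "{rs_dataset_project}.{rs_dataset}.{cohort_name}".format(**config)


# Subset of the config printed above
config = {
    "rs_dataset_project": "som-nero-nigam-starr",
    "rs_dataset": "lguo_explore",
    "cohort_name": "test_refactor_admissions_rollup",
}

table_id = cohort_table_id(config)
# Backticks quote the table path in BigQuery standard SQL
query = f"SELECT * FROM `{table_id}` LIMIT 1000"
print(query)
```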

#### Create Cohort Table on GBQ

In [6]:
cohort.create_cohort_table()

In [7]:
# Preview up to 1,000 rows of the newly created cohort table
df = pd.read_gbq("""
SELECT * FROM `som-nero-nigam-starr.lguo_explore.test_refactor_admissions_rollup` LIMIT 1000
""", use_bqstorage_api=True)



In [8]:
df.head(5)

|   | person_id | admit_date | discharge_date |
|---|---|---|---|
| 0 | 29923082 | 2018-10-16 13:30:00 | 2018-10-24 13:08:00 |
| 1 | 29923083 | 2018-07-22 20:22:00 | 2018-07-26 14:45:00 |
| 2 | 29923090 | 2018-02-04 08:05:00 | 2018-02-09 11:27:00 |
| 3 | 29923110 | 2015-06-27 17:03:00 | 2015-06-30 12:20:00 |
| 4 | 29923110 | 2020-10-27 09:06:00 | 2020-10-31 14:55:00 |
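
To relate these rows to the `min_stay_hour` option, the length of each time window can be computed in hours. The sketch below uses only the standard library and the rows shown above. The helper `stay_hours` is hypothetical, and the timestamp format is assumed to match the printed output.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed to match the printed timestamps


def stay_hours(admit: str, discharge: str) -> float:
    """Length of the window between admit and discharge, in hours."""
    delta = datetime.strptime(discharge, FMT) - datetime.strptime(admit, FMT)
    return delta.total_seconds() / 3600


# (person_id, admit_date, discharge_date) copied from the preview above
preview = [
    (29923082, "2018-10-16 13:30:00", "2018-10-24 13:08:00"),
    (29923083, "2018-07-22 20:22:00", "2018-07-26 14:45:00"),
    (29923090, "2018-02-04 08:05:00", "2018-02-09 11:27:00"),
    (29923110, "2015-06-27 17:03:00", "2015-06-30 12:20:00"),
    (29923110, "2020-10-27 09:06:00", "2020-10-31 14:55:00"),
]

for person_id, admit, discharge in preview:
    print(person_id, round(stay_hours(admit, discharge), 1))
```

With a DataFrame whose two date columns are datetime-typed, the same quantity is `(df.discharge_date - df.admit_date).dt.total_seconds() / 3600`. Note that `person_id` 29923110 appears twice with different windows, which is consistent with the cohort definition.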
