# Patient Basic
Create a patient cohort of all patients who have at least 1 claim with a code in `ref_cohort` between the dates of interest.

**Script**
* [scripts/coh/coh_basic.ipynb](./scripts/coh/coh_basic.ipynb)
 
**Parameters**
* `in/coh/coh_basic.xlsx[param]`
* `in/coh/coh_basic.xlsx[ref]`

**Input**
* None

**Output**
* `coh_basic_ref`
* `coh_basic_raven_diagnosis`
* `coh_basic_raven_pharmacy`
* `coh_basic_raven_procedure`
* `coh_pt`
* `coh_claim`

**Review**
* [scripts/coh/coh_basic.html](./scripts/coh/coh_basic.html)

In [1]:
#Import libraries for this notebook
import pandas as pd  
from drg_connect import Snowflake
import numpy as np
import pickle
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from workbook_writer import make_xlsx

#Load connection variables to connect_dict
with open('../../out/conn/connect_dict.pickle', 'rb') as handle:
    connect_dict = pickle.load(handle)

#Create engine to connect to Snowflake
snow = Snowflake(role=connect_dict['role'],
                 warehouse=connect_dict['warehouse'],
                 database=connect_dict['database'],
                 schema=connect_dict['schema'])

#Finish engine setup
engine = snow.engine
%load_ext sql_magic
%config SQL.conn_name = 'engine'  #Set the sql_magic connection engine
%config SQL.output_result = True  #Enable output to std out
%config SQL.notify_result = False #disable browser notifications

# Parameters
Python dictionary of parameters for the cohort basic procedure

**Input**  
* `in/coh/coh_basic.xlsx[param]`

**Output**
* Python variables named after parameters with the value


In [2]:
#Create system variables from excel into script and review values in dictionary
input_df = pd.read_excel('../../in/coh/coh_bool.xlsx', sheet_name='param', skiprows=4, dtype=str)
var_dict = dict(zip(input_df.parameter, input_df.value))
#Expose each parameter as a notebook variable so sql_magic can substitute it (avoids exec)
globals().update(var_dict)

#Check inputs
pd.DataFrame.from_dict(var_dict, orient='index')

Unnamed: 0,0
med_dx_start_dt,2017-01-01
med_dx_end_dt,2018-12-31
proc_rx_start_dt,2017-01-01
proc_rx_end_dt,2018-12-31
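
Because these values are later substituted into `BETWEEN` clauses as strings, a malformed or reversed date range would silently return no rows. A minimal sketch of a pre-flight check follows, using only the standard library; the helper name `validate_date_params` is hypothetical and not part of the notebook.

```python
from datetime import date


def validate_date_params(params):
    """Parse ISO date strings and check each *_start_dt is not after its *_end_dt.

    Hypothetical helper: assumes every parameter is a YYYY-MM-DD string.
    """
    parsed = {name: date.fromisoformat(value) for name, value in params.items()}
    for name, start in parsed.items():
        if name.endswith("_start_dt"):
            end_name = name[: -len("_start_dt")] + "_end_dt"
            if end_name not in parsed:
                raise KeyError(f"missing parameter: {end_name}")
            if start > parsed[end_name]:
                raise ValueError(f"{name} is after {end_name}")
    return parsed


# Values copied from the parameter table above
params = {
    "med_dx_start_dt": "2017-01-01",
    "med_dx_end_dt": "2018-12-31",
    "proc_rx_start_dt": "2017-01-01",
    "proc_rx_end_dt": "2018-12-31",
}
parsed_params = validate_date_params(params)
```

Running a check like this right after reading the workbook surfaces typos in the parameter sheet before any Snowflake query is issued.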


# Reference
Uploads the reference tables used to identify the cohort to Snowflake

**Input**  
  * `in/coh/coh_basic.xlsx[ref]`

**Output**  
* `coh_basic_ref`

## coh_bool_ref_dx1
Diagnosis code group 1

In [3]:
#Upload reference table from excel to snowflake and review snowflake output
df = pd.read_excel('../../in/coh/coh_bool.xlsx', sheet_name='DX1', skiprows=4, dtype=str)

#Strip white space, make referable columns uppercase, and remove non-alphanumeric characters from code
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
df[['code_type','code']] =  \
   df[['code_type','code']].apply(lambda x: x.str.upper() if x.dtype == "object" else x)
#regex=True is explicit because newer pandas versions treat the pattern as a literal string by default
df.code = df.code.str.replace(r'\W+', '', regex=True).astype('str')

#Upload to snowflake
snow.drop_table("coh_bool_ref_dx1")
snow.upload_dataframe(df,"coh_bool_ref_dx1")
snow.select("SELECT * FROM coh_bool_ref_dx1")
del df

DROP TABLE IF EXISTS ref_db.semi_custom.coh_bool_ref_dx1;
Table ref_db.semi_custom.coh_bool_ref_dx1 dropped! (╯°□°）╯︵ ┻━┻
Upload into ref_db.semi_custom.coh_bool_ref_dx1 successful! ┬──┬◡ﾉ(°-°ﾉ)


Unnamed: 0,code_type,code,description,source
0,ICD10,G3289,Other specified degenerative disorders of nerv...,Ankit Market Def
1,ICD10,G3281,Cerebellar ataxia in diseases classified elsew...,Ankit Market Def
2,ICD10,G328,Other specified degenerative disorders of nerv...,Ankit Market Def
3,ICD10,G320,Subacute combined degeneration of spinal cord ...,Ankit Market Def
4,ICD10,G32,Other degenerative disorders of nervous system...,Ankit Market Def
5,ICD10,G319,Degenerative disease of nervous system unspeci...,Ankit Market Def
6,ICD10,G3189,Other specified degenerative diseases of nervo...,Ankit Market Def
7,ICD10,G3185,Corticobasal degeneration,Ankit Market Def
8,ICD10,G3184,Mild cognitive impairment so stated,Ankit Market Def
9,ICD10,G3183,Dementia with Lewy bodies,Ankit Market Def
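
The cleaning step matters because the claims tables are matched on the bare code string, so `G32.89` in the workbook must become `G3289` before upload. The sketch below reproduces the same normalisation with the standard-library `re` module; `normalize_code` is a hypothetical name, and the inputs are made-up variants of codes shown above. It is useful as a reference because whether pandas' `str.replace` treats the pattern as a regular expression depends on the pandas version and the `regex` argument.

```python
import re


def normalize_code(raw):
    """Strip whitespace, uppercase, and drop non-alphanumeric characters.

    Note: \\W keeps underscores, which do not occur in ICD or NDC codes.
    """
    return re.sub(r"\W+", "", raw.strip().upper())


# Illustrative inputs (dotted ICD-10 codes and a hyphenated NDC)
examples = ["g32.89", " G31.84 ", "00904-6408-61"]
normalized = [normalize_code(code) for code in examples]
```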


## coh_bool_ref_dx2
Diagnosis Code Group 2

In [4]:
#Upload reference table from excel to snowflake and review snowflake output
df = pd.read_excel('../../in/coh/coh_bool.xlsx', sheet_name='DX2', skiprows=4, dtype=str)

#Strip white space, make referable columns uppercase, and remove non-alphanumeric characters from code
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
df[['code_type','code']] =  \
   df[['code_type','code']].apply(lambda x: x.str.upper() if x.dtype == "object" else x)
#regex=True is explicit because newer pandas versions treat the pattern as a literal string by default
df.code = df.code.str.replace(r'\W+', '', regex=True).astype('str')

#Upload to snowflake
snow.drop_table("coh_bool_ref_dx2")
snow.upload_dataframe(df,"coh_bool_ref_dx2")
snow.select("SELECT * FROM coh_bool_ref_dx2")
del df

DROP TABLE IF EXISTS ref_db.semi_custom.coh_bool_ref_dx2;
Table ref_db.semi_custom.coh_bool_ref_dx2 dropped! (╯°□°）╯︵ ┻━┻
Upload into ref_db.semi_custom.coh_bool_ref_dx2 successful! ┬──┬◡ﾉ(°-°ﾉ)


Unnamed: 0,code_type,code,description,source
0,ICD9,4118,,Xarelto File
1,ICD9,4108,,Xarelto File
2,ICD9,4102,,Xarelto File
3,ICD9,410,,Xarelto File
4,ICD9,4106,,Xarelto File
5,ICD9,4107,,Xarelto File
6,ICD9,41010,Acute myocardial infarction of other anterior ...,Xarelto File
7,ICD9,41012,Acute myocardial infarction of other anterior ...,Xarelto File
8,ICD9,41062,"True posterior wall infarction, subsequent epi...",Xarelto File
9,ICD9,41080,Acute myocardial infarction of other specified...,Xarelto File


## coh_bool_ref_proc_rx
Procedure and rx grouped codes

In [5]:
#Upload reference table from excel to snowflake and review snowflake output
df = pd.read_excel('../../in/coh/coh_bool.xlsx', sheet_name='proc_rx', skiprows=4, dtype=str)

#Strip white space, make referable columns uppercase, and remove non-alphanumeric characters from code
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
df[['code_type','code']] =  \
   df[['code_type','code']].apply(lambda x: x.str.upper() if x.dtype == "object" else x)
#regex=True is explicit because newer pandas versions treat the pattern as a literal string by default
df.code = df.code.str.replace(r'\W+', '', regex=True).astype('str')

#Upload to snowflake
snow.drop_table("coh_bool_ref_proc_rx")
snow.upload_dataframe(df,"coh_bool_ref_proc_rx")
snow.select("SELECT * FROM coh_bool_ref_proc_rx")
del df

DROP TABLE IF EXISTS ref_db.semi_custom.coh_bool_ref_proc_rx;
Table ref_db.semi_custom.coh_bool_ref_proc_rx dropped! (╯°□°）╯︵ ┻━┻
Upload into ref_db.semi_custom.coh_bool_ref_proc_rx successful! ┬──┬◡ﾉ(°-°ﾉ)


Unnamed: 0,category,code_type,code,description,source
0,Drug1,NDC,75839042501,Memantine Hydrochloride,Ankit Market Def
1,Drug1,NDC,71335058203,Donepezil Hydrochloride,Ankit Market Def
2,Drug1,NDC,71335058202,Donepezil Hydrochloride,Ankit Market Def
3,Drug1,NDC,71335058201,Donepezil Hydrochloride,Ankit Market Def
4,Drug1,NDC,71335041603,Donepezil Hydrochloride,Ankit Market Def
5,Drug1,NDC,71335041602,Donepezil Hydrochloride,Ankit Market Def
6,Drug1,NDC,71335041601,Donepezil Hydrochloride,Ankit Market Def
7,Drug1,NDC,69452010930,Donepezil Hydrochloride,Ankit Market Def
8,Drug1,NDC,69452010919,Donepezil Hydrochloride,Ankit Market Def
9,Drug1,NDC,69452010913,Donepezil Hydrochloride,Ankit Market Def


# Diagnosis Extract

## coh_bool_dx1
Extract the subset from raven diagnosis 

**Parameters**
  * `med_dx_start_dt`
  * `med_dx_end_dt`  
  
**Input**
  * `coh_basic_ref`
  * `rwd_db.rwd.raven_external_claims_submits_diagnosis`
  
**Output**  
* `coh_basic_raven_diagnosis`


In [6]:
%%read_sql

--CREATE TRANSIENT TABLE of all claims with the codes of interest
DROP TABLE IF EXISTS coh_bool_dx1;
CREATE TRANSIENT TABLE coh_bool_dx1 as
      SELECT dx.patient_id,
             dx.claim_id,
             dx.code_type, 
             dx.diagnosis,
             dx.diagnosis_sequence,
             dx.year_of_service
        --FROM rwd_db.rwd.raven_claims_submits_diagnosis dx
        FROM rwd_db.rwd.raven_external_claims_submits_diagnosis dx
       WHERE year_of_service BETWEEN '{med_dx_start_dt}' AND '{med_dx_end_dt}'
             AND diagnosis IN (SELECT code
                                 FROM coh_bool_ref_dx1
                                WHERE code_type IN ('ICD9','ICD10'));
                                
--Add flag to identify if in final cohort
ALTER TABLE coh_bool_dx1
        ADD cohort_flag BOOLEAN default False;

Query started at 10:09:49 AM Eastern Daylight Time; Query executed in 0.04 m
Query started at 10:09:51 AM Eastern Daylight Time; Query executed in 0.22 m
Query started at 10:10:04 AM Eastern Daylight Time; Query executed in 0.03 m

Unnamed: 0,status
0,Statement executed successfully.


In [7]:
%%read_sql
--Check the raw counts of the initial data pull
SELECT Count(*) AS row_cnt,
       Count(Distinct patient_id) AS patient_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct diagnosis) AS diagnosis_cnt
  FROM coh_bool_dx1

Query started at 10:10:06 AM Eastern Daylight Time; Query executed in 0.13 m

Unnamed: 0,row_cnt,patient_cnt,claim_cnt,diagnosis_cnt
0,14632483,2418894,13210674,34


In [8]:
%%read_sql
--Review counts by year and code_type
SELECT year(year_of_service) AS yr,
       code_type,
       Count(*) AS cnt
  FROM coh_bool_dx1
 GROUP BY yr, code_type
 ORDER BY yr, code_type

Query started at 10:10:14 AM Eastern Daylight Time; Query executed in 0.12 m

Unnamed: 0,yr,code_type,cnt
0,2017,ICD10,7221465
1,2017,ICD9,527
2,2017,,113455
3,2018,ICD10,7245910
4,2018,ICD9,105
5,2018,,51021


In [9]:
%%read_sql
--Review the distribution of the diagnosis counts against the table
SELECT ref.code_type,
       ref.code,
       ref.description,
       Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt
  FROM coh_bool_ref_dx1 ref 
       LEFT JOIN coh_bool_dx1 dx
              ON ref.code = dx.diagnosis
 GROUP BY ref.code_type, ref.code, ref.description
 ORDER BY Count(*) DESC

Query started at 10:10:21 AM Eastern Daylight Time; Query executed in 0.09 m

Unnamed: 0,code_type,code,description,row_cnt,claim_cnt,patient_cnt
0,ICD10,G309,Alzheimer's disease unspecified,7067051,6325912,955202
1,ICD10,G3184,Mild cognitive impairment so stated,2263966,2168307,657752
2,ICD10,G301,Alzheimer's disease with late onset,1937059,1789603,348505
3,ICD10,G319,Degenerative disease of nervous system unspeci...,794590,764291,412483
4,ICD10,G311,Senile degeneration of brain not elsewhere cla...,618771,529148,93710
5,ICD10,G3183,Dementia with Lewy bodies,570914,524831,81139
6,ICD10,G300,Alzheimer's disease with early onset,438753,405572,93476
7,ICD10,G308,Other Alzheimer's disease,427365,396732,93544
8,ICD10,G3109,Other frontotemporal dementia,209410,192508,36366
9,ICD10,G3189,Other specified degenerative diseases of nervo...,104492,100675,53928


## coh_bool_dx2

In [10]:
%%read_sql

--CREATE TRANSIENT TABLE of all claims with the codes of interest
DROP TABLE IF EXISTS coh_bool_dx2;
CREATE TRANSIENT TABLE coh_bool_dx2 as
      SELECT dx.patient_id,
             dx.claim_id,
             dx.code_type, 
             dx.diagnosis,
             dx.diagnosis_sequence,
             dx.year_of_service
        --FROM rwd_db.rwd.raven_claims_submits_diagnosis dx
        FROM rwd_db.rwd.raven_external_claims_submits_diagnosis dx
       WHERE year_of_service BETWEEN '{med_dx_start_dt}' AND '{med_dx_end_dt}'
             AND diagnosis IN (SELECT code
                                 FROM coh_bool_ref_dx2
                                WHERE code_type IN ('ICD9','ICD10'));
                                
--Add flag to identify if in final cohort
ALTER TABLE coh_bool_dx2
        ADD cohort_flag BOOLEAN default False;

Query started at 10:10:27 AM Eastern Daylight Time; Query executed in 0.04 m
Query started at 10:10:29 AM Eastern Daylight Time; Query executed in 0.13 m
Query started at 10:10:37 AM Eastern Daylight Time; Query executed in 0.03 m

Unnamed: 0,status
0,Statement executed successfully.


In [11]:
%%read_sql
--Check the raw counts of the initial data pull
SELECT Count(*) AS row_cnt,
       Count(Distinct patient_id) AS patient_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct diagnosis) AS diagnosis_cnt
  FROM coh_bool_dx2

Query started at 10:10:39 AM Eastern Daylight Time; Query executed in 0.06 m

Unnamed: 0,row_cnt,patient_cnt,claim_cnt,diagnosis_cnt
0,9593192,2154443,8699980,57


In [12]:
%%read_sql
--Review counts by year and code_type
SELECT year(year_of_service) AS yr,
       code_type,
       Count(*) AS cnt
  FROM coh_bool_dx2
 GROUP BY yr, code_type
 ORDER BY yr, code_type

Query started at 10:10:43 AM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,yr,code_type,cnt
0,2017,ICD10,4843941
1,2017,ICD9,387
2,2017,,43335
3,2018,ICD10,4687526
4,2018,ICD9,120
5,2018,,17883


In [13]:
%%read_sql
--Review the distribution of the diagnosis counts against the table
SELECT ref.code_type,
       ref.code,
       ref.description,
       Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt
  FROM coh_bool_ref_dx2 ref 
       LEFT JOIN coh_bool_dx2 dx
              ON ref.code = dx.diagnosis
 GROUP BY ref.code_type, ref.code, ref.description
 ORDER BY Count(*) DESC

Query started at 10:10:45 AM Eastern Daylight Time; Query executed in 0.08 m

Unnamed: 0,code_type,code,description,row_cnt,claim_cnt,patient_cnt
0,ICD10,I214,Non-ST elevation (NSTEMI) myocardial infarction,5131039,4738329,1067090
1,ICD10,I213,ST elevation (STEMI) myocardial infarction of ...,1046792,993873,338513
2,ICD10,I200,Unstable angina,961981,925948,407283
3,ICD10,I248,Other forms of acute ischemic heart disease,520245,508152,236648
4,ICD9,I249,"Acute ischemic heart disease, unspecified",437300,417940,187753
5,ICD10,I249,Acute ischemic heart disease unspecified,437300,417940,187753
6,ICD10,I2119,ST elevation (STEMI) myocardial infarction inv...,349338,315702,119550
7,ICD10,I2109,ST elevation (STEMI) myocardial infarction inv...,302256,276528,102362
8,ICD10,I2102,ST elevation (STEMI) myocardial infarction inv...,175556,163238,45027
9,ICD10,I2111,ST elevation (STEMI) myocardial infarction inv...,153384,142937,44129


## coh_bool_dx_pop
Determine the overlap population: patients with at least one claim in both diagnosis code group 1 and diagnosis code group 2

In [14]:
%%read_sql
DROP TABLE IF EXISTS coh_bool_dx_pop;
CREATE TRANSIENT TABLE coh_bool_dx_pop AS
    SELECT dx1.patient_id
      FROM coh_bool_dx1 dx1
           JOIN coh_bool_dx2 dx2
             ON dx1.patient_id = dx2.patient_id
     GROUP BY dx1.patient_id;

Query started at 10:10:50 AM Eastern Daylight Time; Query executed in 0.03 m
Query started at 10:10:51 AM Eastern Daylight Time; Query executed in 0.08 m

Unnamed: 0,status
0,Table COH_BOOL_DX_POP successfully created.


In [15]:
%%read_sql
--Review counts
SELECT Count(*) AS row_cnt,
       Count(Distinct patient_id) AS patient_cnt
  FROM coh_bool_dx_pop

Query started at 10:10:57 AM Eastern Daylight Time; Query executed in 0.08 m

Unnamed: 0,row_cnt,patient_cnt
0,133753,133753
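
The join followed by `GROUP BY patient_id` is a set intersection of the two patient lists, which is why `row_cnt` equals `patient_cnt`. As a small illustration on invented data (with `sqlite3` standing in for Snowflake), the join form and an explicit `INTERSECT` return the same patients:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dx1 (patient_id INTEGER, claim_id INTEGER);
    CREATE TABLE dx2 (patient_id INTEGER, claim_id INTEGER);
    INSERT INTO dx1 VALUES (1, 10), (1, 11), (2, 12);
    INSERT INTO dx2 VALUES (1, 20), (3, 21);
""")

# Same shape as the notebook query: join, then collapse to one row per patient
join_pop = conn.execute("""
    SELECT dx1.patient_id
      FROM dx1
           JOIN dx2
             ON dx1.patient_id = dx2.patient_id
     GROUP BY dx1.patient_id
     ORDER BY dx1.patient_id
""").fetchall()

# Equivalent set formulation
intersect_pop = conn.execute("""
    SELECT patient_id FROM dx1
    INTERSECT
    SELECT patient_id FROM dx2
     ORDER BY patient_id
""").fetchall()
```

Only patient 1 appears in both toy tables, so both queries return a single row.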


# Proc & Rx

## coh_bool_proc_raw
Raw extract of the Raven procedure claims

In [16]:
%%read_sql
--Identify subset of procedure claims of interest for procedure & rx codes of interest
DROP TABLE IF EXISTS coh_bool_proc_raw;
CREATE TRANSIENT TABLE coh_bool_proc_raw as
      SELECT patient_id,
             claim_id,
             procedure_type,
             procedure,
             year_of_service,
             ndc
        --FROM rwd_db.rwd.raven_claims_submits_procedure
        FROM rwd_db.rwd.raven_external_claims_submits_procedure
       WHERE year_of_service BETWEEN '{proc_rx_start_dt}' AND '{proc_rx_end_dt}'
             AND patient_id IN (SELECT patient_id
                                  FROM coh_bool_dx_pop)
             AND (procedure IN (SELECT code
                                  FROM coh_bool_ref_proc_rx)
                 OR ndc IN (SELECT code
                              FROM coh_bool_ref_proc_rx));
                              
--Add flag to identify if in final cohort
ALTER TABLE coh_bool_proc_raw
        ADD cohort_flag BOOLEAN default False;

Query started at 10:11:01 AM Eastern Daylight Time; Query executed in 0.03 m
Query started at 10:11:03 AM Eastern Daylight Time; Query executed in 2.05 m
Query started at 10:13:06 AM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,status
0,Statement executed successfully.


In [17]:
%%read_sql
--Check the raw counts from the procedure data pull
SELECT Count(*)                     AS row_cnt, 
       Count(DISTINCT patient_id)   AS patient_cnt, 
       Count(DISTINCT claim_id)     AS claim_cnt, 
       Count(DISTINCT procedure)    AS procedure_cnt, 
       Count(DISTINCT ndc)          AS ndc_cnt 
  FROM coh_bool_proc_raw; 

Query started at 10:13:09 AM Eastern Daylight Time; Query executed in 0.12 m

Unnamed: 0,row_cnt,patient_cnt,claim_cnt,procedure_cnt,ndc_cnt
0,5753,1490,2793,7,186


In [18]:
%%read_sql
--Review distribution of NDC codes on the procedure claims
SELECT ref.category,
       ref.code_type,
       ref.code,
       ref.description,
       Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt
  FROM coh_bool_ref_proc_rx ref 
       LEFT JOIN coh_bool_proc_raw proc
              ON ref.code = proc.ndc
 WHERE ref.code_type = 'NDC'
 GROUP BY  ref.category, ref.code_type, ref.code, ref.description
 ORDER BY Count(*) DESC;

Query started at 10:13:16 AM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,category,code_type,code,description,row_cnt,claim_cnt,patient_cnt
0,Drug,NDC,00904640861,Donepezil Hydrochloride,412,224,157
1,Drug,NDC,00904640961,Donepezil Hydrochloride,344,200,133
2,Drug,NDC,00904650561,Memantine Hydrochloride,269,119,71
3,Drug,NDC,43547027611,Donepezil Hydrochloride,216,159,48
4,Drug,NDC,13668010290,Donepezil Hydrochloride,191,72,36
5,Drug,NDC,00904650661,Memantine Hydrochloride,158,82,62
6,Drug,NDC,00904647861,Donepezil Hydrochloride,158,79,57
7,Drug,NDC,13668010390,Donepezil Hydrochloride,137,61,44
8,Drug,NDC,60687018457,Memantine Hydrochloride,131,65,30
9,Drug,NDC,60687029211,Donepezil Hydrochloride,116,62,47


In [19]:
%%read_sql
--Review the distribution of the procedure code counts against the reference table
SELECT ref.category,
       ref.code_type,
       ref.code,
       ref.description,
       Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt
  FROM coh_bool_ref_proc_rx ref 
       LEFT JOIN coh_bool_proc_raw proc
              ON ref.code = proc.procedure
 WHERE ref.code_type <> 'NDC'
 GROUP BY  ref.category, ref.code_type, ref.code, ref.description
 ORDER BY Count(*) DESC;


Query started at 10:13:18 AM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,category,code_type,code,description,row_cnt,claim_cnt,patient_cnt
0,Eval,CPT,99483,Cognition focused evaluation,401,382,259


## coh_bool_pharmacy_raw
Identify patients from Raven Pharmacy using the codes of interest

In [20]:
%%read_sql
--Create pharmacy table with codes of interest
DROP TABLE IF EXISTS coh_bool_pharmacy_raw;
CREATE TRANSIENT TABLE coh_bool_pharmacy_raw AS
    SELECT patient_id,
           claim_id,
           prescriber_id            AS npi, 
           product_or_service_id    AS ndc, 
           date_of_service
      --FROM rwd_db.rwd.raven_pharmacy
      FROM rwd_db.rwd.raven_external_pharmacy
     WHERE date_of_service BETWEEN '{proc_rx_start_dt}' AND '{proc_rx_end_dt}'
           AND patient_id IN (SELECT patient_id
                                FROM coh_bool_dx_pop)
           AND product_or_service_id IN (SELECT code
                                           FROM coh_bool_ref_proc_rx
                                          WHERE code_type = 'NDC');

--Add flag to identify if in final cohort
ALTER TABLE coh_bool_pharmacy_raw
        ADD cohort_flag BOOLEAN default False;
        
--Add flag to identify if healthbase marked the npi as a provider
ALTER TABLE coh_bool_pharmacy_raw
        ADD hb_provider_flag BOOLEAN default False;        

Query started at 10:13:20 AM Eastern Daylight Time; Query executed in 0.04 m
Query started at 10:13:22 AM Eastern Daylight Time; Query executed in 2.45 m
Query started at 10:15:49 AM Eastern Daylight Time; Query executed in 0.03 m

Unnamed: 0,status
0,Statement executed successfully.


In [68]:
%%read_sql
DESC TABLE coh_bool_pharmacy_raw;

Query started at 01:41:18 PM Eastern Daylight Time; Query executed in 0.03 m

Unnamed: 0,name,type,kind,null?,default,primary key,unique key,check,expression,comment
0,PATIENT_ID,"NUMBER(38,0)",COLUMN,Y,,N,N,,,
1,CLAIM_ID,"NUMBER(18,0)",COLUMN,Y,,N,N,,,
2,NPI,VARCHAR(15),COLUMN,Y,,N,N,,,
3,NDC,VARCHAR(19),COLUMN,Y,,N,N,,,
4,DATE_OF_SERVICE,DATE,COLUMN,Y,,N,N,,,
5,COHORT_FLAG,BOOLEAN,COLUMN,Y,False,N,N,,,
6,HB_PROVIDER_FLAG,BOOLEAN,COLUMN,Y,False,N,N,,,


In [71]:
%%read_sql
--Update healthbase flag
BEGIN;
UPDATE coh_bool_pharmacy_raw
   SET hb_provider_flag = TRUE
 WHERE npi IN (SELECT Cast(npi AS VARCHAR(15))
                 FROM rwd_db.rwd.healthbase_practitioner);
COMMIT;

--Review Counts
SELECT hb_provider_flag,
       Count(*) AS cnt
  FROM coh_bool_pharmacy_raw
 GROUP BY hb_provider_flag

Query started at 01:43:36 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:43:38 PM Eastern Daylight Time; Query executed in 0.08 m
Query started at 01:43:43 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:43:45 PM Eastern Daylight Time; Query executed in 0.05 m

Unnamed: 0,hb_provider_flag,cnt
0,True,207865
1,False,1871


In [21]:
%%read_sql
SELECT Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt,
       Count(Distinct ndc) AS ndc_cnt
  FROM coh_bool_pharmacy_raw

Query started at 10:15:51 AM Eastern Daylight Time; Query executed in 0.13 m

Unnamed: 0,row_cnt,claim_cnt,patient_cnt,ndc_cnt
0,209736,207808,15825,325


In [22]:
%%read_sql
--Review the distribution of the pharmacy NDC counts against the reference table
SELECT ref.category,
       ref.code_type,
       ref.code,
       ref.description,
       Count(*) AS row_cnt,
       Count(*) / (SELECT Count(*)
                     FROM coh_bool_pharmacy_raw) AS row_pct,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt
  FROM coh_bool_ref_proc_rx ref 
       LEFT JOIN coh_bool_pharmacy_raw rx
              ON ref.code = rx.ndc
 WHERE ref.code_type = 'NDC'
 GROUP BY  ref.category, ref.code_type, ref.code, ref.description
 ORDER BY Count(*) DESC;

Query started at 10:15:59 AM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,category,code_type,code,description,row_cnt,row_pct,claim_cnt,patient_cnt
0,Drug,NDC,43547027609,Donepezil Hydrochloride,18863,0.089937,18748,2637
1,Drug,NDC,43547027611,Donepezil Hydrochloride,13543,0.064572,13435,1955
2,Drug,NDC,43547027509,Donepezil Hydrochloride,11080,0.052828,11016,2143
3,Drug,NDC,00456342833,Memantine Hydrochloride,10561,0.050354,10526,1085
4,Drug,NDC,13668010310,Donepezil Hydrochloride,9493,0.045262,9402,1510
5,Drug,NDC,29300017216,Memantine Hydrochloride,9374,0.044694,9267,1432
6,Drug,NDC,43547027603,Donepezil Hydrochloride,8148,0.038849,8076,1348
7,Drug,NDC,43547027503,Donepezil Hydrochloride,6390,0.030467,6327,1358
8,Drug,NDC,00456122830,Donepezil Hydrochloride/Memantine Hydrochloride,6379,0.030414,6307,612
9,Drug,NDC,47335032286,Memantine Hydrochloride,5994,0.028579,5958,922


# Final populations
Create the summary statistics for all of the populations:

* coh_bool_dx1
* coh_bool_dx2
* coh_bool_proc_raw
* coh_bool_pharmacy_raw

Need the overlap of everything and counts of:
* claims
* patients
* unique HCPs
* EMR overlap


## coh_bool_pt
Create the patient population

In [78]:
%%read_sql
DROP TABLE IF EXISTS coh_bool_pt;
CREATE TRANSIENT TABLE coh_bool_pt AS 
    SELECT dx1.patient_id
      FROM coh_bool_dx1 dx1
           JOIN coh_bool_dx2 dx2
             ON dx1.patient_id = dx2.patient_id
     WHERE dx1.patient_id IN (SELECT patient_id
                                FROM coh_bool_proc_raw)
           OR dx1.patient_id IN (SELECT patient_id
                                   FROM coh_bool_pharmacy_raw)
   GROUP BY dx1.patient_id;

--Add column for EHR overlap           
ALTER TABLE coh_bool_pt
        ADD ehr_overlap BOOLEAN DEFAULT False;

--Update EHR overlap column
BEGIN;
UPDATE coh_bool_pt coh
   SET coh.ehr_overlap = True
  FROM rwd_db.rwd.raven_patient_demographics demo
 WHERE coh.patient_id = demo.patient_id
       AND demo.source_of_patient IN ('CLAIMS AND EHR','EHR');
COMMIT;

Query started at 01:46:18 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:46:20 PM Eastern Daylight Time; Query executed in 0.14 m
Query started at 01:46:29 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:46:31 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:46:33 PM Eastern Daylight Time; Query executed in 0.53 m
Query started at 01:47:05 PM Eastern Daylight Time; Query executed in 0.03 m

Unnamed: 0,status
0,Statement executed successfully.


In [79]:
%%read_sql
--Review EHR Overlap
SELECT ehr_overlap,
       Count(*) as cnt,
       --Denominator is the patient table itself (not coh_bool_pt_yr, which has one row per patient-year)
       Count(*) / (SELECT Count(*)
                     FROM coh_bool_pt) AS pct
  FROM coh_bool_pt
 GROUP BY ehr_overlap;

Query started at 01:47:07 PM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,ehr_overlap,cnt,pct
0,False,7386,0.438494
1,True,9458,0.561506
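
A quick sanity check for a breakdown like this is that the shares add to 1 when the denominator is the row count of the same table being grouped; a total below 1 suggests the denominator comes from a larger table. The sketch below shows the pattern on invented data with `sqlite3` standing in for Snowflake (the `* 1.0` is only there because SQLite would otherwise do integer division).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pt (patient_id INTEGER, ehr_overlap INTEGER);
    INSERT INTO pt VALUES (1, 1), (2, 1), (3, 1), (4, 0);
""")

shares = conn.execute("""
    SELECT ehr_overlap,
           Count(*) AS cnt,
           Count(*) * 1.0 / (SELECT Count(*) FROM pt) AS pct
      FROM pt
     GROUP BY ehr_overlap
     ORDER BY ehr_overlap
""").fetchall()

# Shares over the grouped table's own row count should total 1
total_pct = sum(pct for _, _, pct in shares)
```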


## coh_bool_pt_yr
Identify all of the unique years where a patient's data may have appeared

In [80]:
%%read_sql

--Create all unique years that are possible from data
CREATE OR REPLACE TEMP TABLE tmp_unique_yrs AS
    SELECT year(year_of_service) AS yr
      FROM coh_bool_dx1
     UNION
    SELECT year(year_of_service) AS yr
      FROM coh_bool_dx2
     UNION
    SELECT year(year_of_service) AS yr
      FROM coh_bool_proc_raw 
     UNION
    SELECT year(date_of_service) AS yr
      FROM coh_bool_pharmacy_raw;

--Create final table
DROP TABLE IF EXISTS coh_bool_pt_yr;
CREATE TRANSIENT TABLE coh_bool_pt_yr AS
    SELECT pt.patient_id,
           yr.yr
      FROM coh_bool_pt pt
           --Every patient is paired with every year, so the cross join is stated explicitly
           CROSS JOIN tmp_unique_yrs yr;
           


Query started at 01:47:09 PM Eastern Daylight Time; Query executed in 0.06 m
Query started at 01:47:13 PM Eastern Daylight Time; Query executed in 0.04 m
Query started at 01:47:15 PM Eastern Daylight Time; Query executed in 0.05 m

Unnamed: 0,status
0,Table COH_BOOL_PT_YR successfully created.
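
The patient-year table is a Cartesian product: each cohort patient is paired with each distinct year, so its row count is the number of patients times the number of years. A toy illustration with invented rows, using `sqlite3` in place of Snowflake:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pt (patient_id INTEGER);
    CREATE TABLE yrs (yr INTEGER);
    INSERT INTO pt VALUES (1), (2), (3);
    INSERT INTO yrs VALUES (2017), (2018);
""")

# One row per (patient, year) combination
pt_yr = conn.execute("""
    SELECT pt.patient_id, yrs.yr
      FROM pt
           CROSS JOIN yrs
     ORDER BY pt.patient_id, yrs.yr
""").fetchall()
```

This scaffold makes it possible to report a zero for a patient in a year with no claims, which is presumably why every combination is materialised.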


## cohort_flag Update
Update the raw data extracts with a flag to identify claims for patients in the final cohort

In [81]:
%%read_sql

--Diagnosis Code Group 1
BEGIN;
UPDATE coh_bool_dx1 dx1
   SET dx1.cohort_flag = True
 WHERE dx1.patient_id IN (SELECT patient_id
                            FROM coh_bool_pt);
COMMIT;

--Diagnosis Code Group2
BEGIN;
UPDATE coh_bool_dx2 dx2
   SET dx2.cohort_flag = True
 WHERE dx2.patient_id IN (SELECT patient_id
                            FROM coh_bool_pt);
COMMIT;

--Procedures
BEGIN;
UPDATE coh_bool_proc_raw proc
   SET proc.cohort_flag = True
 WHERE proc.patient_id IN (SELECT patient_id
                             FROM coh_bool_pt);
COMMIT;

--Pharmacy
BEGIN;
UPDATE coh_bool_pharmacy_raw rx
   SET rx.cohort_flag = True
 WHERE rx.patient_id IN (SELECT patient_id
                           FROM coh_bool_pt);
COMMIT;

Query started at 01:47:18 PM Eastern Daylight Time; Query executed in 0.04 m
Query started at 01:47:20 PM Eastern Daylight Time; Query executed in 0.13 m
Query started at 01:47:28 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:47:30 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:47:32 PM Eastern Daylight Time; Query executed in 0.10 m
Query started at 01:47:38 PM Eastern Daylight Time; Query executed in 0.04 m
Query started at 01:47:40 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:47:42 PM Eastern Daylight Time; Query executed in 0.10 m
Query started at 01:47:48 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:47:50 PM Eastern Daylight Time; Query executed in 0.05 m
Query started at 01:47:53 PM Eastern Daylight Time; Query executed in 0.08 m
Query started at 01:47:58 PM Eastern Daylight Time; Query executed in 0.03 m

Unnamed: 0,status
0,Statement executed successfully.


## coh_bool_provider_raw
Get the providers for the claims

In [82]:
%%read_sql
DROP TABLE IF EXISTS coh_bool_claim;
CREATE TRANSIENT TABLE coh_bool_claim AS
    SELECT patient_id, 
           claim_id, 
           1 AS dx1,
           0 AS dx2,
           0 AS proc
      FROM coh_bool_dx1
     UNION 
    SELECT patient_id, 
           claim_id,
           0 AS dx1,
           1 AS dx2,
           0 AS proc
      FROM coh_bool_dx2
     UNION 
    SELECT patient_id, 
           claim_id,
           0 AS dx1,
           0 AS dx2,
           1 AS proc
      FROM coh_bool_proc_raw;

BEGIN;
DELETE FROM coh_bool_claim 
WHERE Not(patient_id in (SELECT patient_id
                            FROM coh_bool_pt));
COMMIT;

Query started at 01:48:00 PM Eastern Daylight Time; Query executed in 0.04 m
Query started at 01:48:02 PM Eastern Daylight Time; Query executed in 0.11 m
Query started at 01:48:09 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:48:11 PM Eastern Daylight Time; Query executed in 0.11 m
Query started at 01:48:17 PM Eastern Daylight Time; Query executed in 0.03 m

Unnamed: 0,status
0,Statement executed successfully.
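
Note how `UNION` behaves with the constant flag columns: it removes exact duplicate rows, but a claim that appears in both diagnosis groups still yields two rows because the flags differ. If one row per claim is wanted, the flags can be collapsed with `Max`. The sketch below shows both on invented data with `sqlite3` standing in for Snowflake; the per-claim aggregation is an optional variation, not something the notebook does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dx1 (patient_id INTEGER, claim_id INTEGER);
    CREATE TABLE dx2 (patient_id INTEGER, claim_id INTEGER);
    INSERT INTO dx1 VALUES (1, 100), (1, 100), (1, 101);  -- claim 100 has two dx1 rows
    INSERT INTO dx2 VALUES (1, 100);                       -- claim 100 is also in dx2
""")

# Same pattern as coh_bool_claim: duplicates within a group collapse,
# but claim 100 keeps one row per group it appears in
union_rows = conn.execute("""
    SELECT patient_id, claim_id, 1 AS dx1, 0 AS dx2 FROM dx1
     UNION
    SELECT patient_id, claim_id, 0 AS dx1, 1 AS dx2 FROM dx2
     ORDER BY claim_id, dx1
""").fetchall()

# Optional: one row per claim with the flags combined
per_claim = conn.execute("""
    SELECT patient_id, claim_id, Max(dx1) AS dx1, Max(dx2) AS dx2
      FROM (SELECT patient_id, claim_id, 1 AS dx1, 0 AS dx2 FROM dx1
             UNION
            SELECT patient_id, claim_id, 0 AS dx1, 1 AS dx2 FROM dx2)
     GROUP BY patient_id, claim_id
     ORDER BY claim_id
""").fetchall()
```

Because the downstream provider query filters with `claim_id IN (SELECT claim_id ...)`, the repeated claim rows do not change which provider rows are selected.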


In [83]:
%%read_sql
SELECT Count(Distinct patient_id) pt_cnt,
       Count(*) AS row_cnt
  FROM coh_bool_pt

Query started at 01:48:19 PM Eastern Daylight Time; Query executed in 0.03 m

Unnamed: 0,pt_cnt,row_cnt
0,16844,16844


In [84]:
%%read_sql                                   
DROP TABLE IF EXISTS coh_bool_provider_raw;
CREATE TRANSIENT TABLE coh_bool_provider_raw AS 
  SELECT patient_id, 
         claim_id, 
         year_of_service, 
         provider_type, 
         provider_npi
    --FROM rwd_db.rwd.raven_claims_submits_provider
    FROM rwd_db.rwd.raven_external_claims_submits_provider
   WHERE year_of_service BETWEEN Least('{med_dx_start_dt}','{proc_rx_start_dt}') 
                                 AND Greatest('{med_dx_end_dt}','{proc_rx_end_dt}')
         AND claim_id IN (SELECT claim_id 
                            FROM coh_bool_claim)
         AND entity_type_code = 1; 

--Add flag to identify if healthbase marked the npi as a provider
ALTER TABLE coh_bool_provider_raw
        ADD hb_provider_flag BOOLEAN default False;  

Query started at 01:48:21 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:48:23 PM Eastern Daylight Time; Query executed in 1.63 m
Query started at 01:50:01 PM Eastern Daylight Time; Query executed in 0.05 m

Unnamed: 0,status
0,Statement executed successfully.


In [85]:
%%read_sql
--Update healthbase flag
BEGIN;
UPDATE coh_bool_provider_raw
   SET hb_provider_flag = TRUE
 WHERE provider_npi IN (SELECT Cast(npi AS VARCHAR(15))
                          FROM rwd_db.rwd.healthbase_practitioner);
COMMIT;

--Review Counts
SELECT hb_provider_flag,
       Count(*) AS cnt
  FROM coh_bool_provider_raw
 GROUP BY hb_provider_flag

Query started at 01:50:04 PM Eastern Daylight Time; Query executed in 0.04 m
Query started at 01:50:06 PM Eastern Daylight Time; Query executed in 0.15 m
Query started at 01:50:15 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:50:17 PM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,hb_provider_flag,cnt
0,False,188
1,True,249218


In [86]:
%%read_sql
SELECT Count(*) AS row_cnt,
       Count(Distinct patient_id) AS pt_cnt,
       Count(Distinct claim_id) AS claim_cnt
  FROM coh_bool_provider_raw;

Query started at 01:50:19 PM Eastern Daylight Time; Query executed in 0.05 m

Unnamed: 0,row_cnt,pt_cnt,claim_cnt
0,249406,16823,173312


# Yearly Counts
Calculate the yearly counts

## Summarize Data
Summarize data at the right level

### DX1

In [87]:
%%read_sql
--Diagnosis Code group 1 counts
CREATE OR REPLACE TEMP TABLE tmp_dx1_cnt AS
    SELECT year(year_of_service) AS yr,
           Count(Distinct patient_id) AS dx1_pt_cnt,
           Count(Distinct patient_id, year_of_service) AS dx1_claim_cnt
      FROM coh_bool_dx1
     WHERE cohort_flag = TRUE
     GROUP BY yr;

--Summary Counts
SELECT *
  FROM tmp_dx1_cnt
 ORDER BY yr;

Query started at 01:50:22 PM Eastern Daylight Time; Query executed in 0.05 m
Query started at 01:50:25 PM Eastern Daylight Time; Query executed in 0.03 m

Unnamed: 0,yr,dx1_pt_cnt,dx1_claim_cnt
0,2017,11090,40342
1,2018,11376,43302
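
The claim count in the query above is not a count of `claim_id` values: `Count(Distinct patient_id, year_of_service)` counts distinct patient and service-date pairs, so two claims for one patient on the same date collapse into one. The pandas sketch below illustrates that behavior on a few hypothetical rows (the data and the frame name `dx1` are made up for the example).

```python
import pandas as pd

# Hypothetical rows standing in for coh_bool_dx1 with cohort_flag = TRUE
dx1 = pd.DataFrame({
    "patient_id": ["a", "a", "a", "b", "b"],
    "year_of_service": pd.to_datetime(
        ["2017-01-05", "2017-01-05", "2017-03-10", "2017-02-01", "2018-04-20"]
    ),
})
dx1["yr"] = dx1["year_of_service"].dt.year

# Distinct patients per year
pt_cnt = dx1.groupby("yr")["patient_id"].nunique()

# Distinct (patient, service date) pairs per year, used as the claim count
claim_cnt = (
    dx1.drop_duplicates(["patient_id", "year_of_service"])
       .groupby("yr")
       .size()
)

summary = pd.DataFrame({"dx1_pt_cnt": pt_cnt, "dx1_claim_cnt": claim_cnt})
print(summary)
```

Patient `a` has two rows on 2017-01-05, and they contribute a single claim to the 2017 total.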


### DX2

In [88]:
%%read_sql
--Diagnosis Code group 2 counts
CREATE OR REPLACE TEMP TABLE tmp_dx2_cnt AS
    SELECT year(year_of_service) AS yr,
           Count(Distinct patient_id) AS dx2_pt_cnt, 
           Count(Distinct patient_id, year_of_service) AS dx2_claim_cnt
      FROM coh_bool_dx2
     WHERE cohort_flag = True
     GROUP BY yr;
     
--Summary Counts
SELECT *
  FROM tmp_dx2_cnt
 ORDER BY yr;    

Query started at 01:50:27 PM Eastern Daylight Time; Query executed in 0.08 m
Query started at 01:50:32 PM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,yr,dx2_pt_cnt,dx2_claim_cnt
0,2017,9432,24434
1,2018,8973,24187


### Proc

In [89]:
%%read_sql
--Procedure counts (cpt, icd10, hcpcs)
CREATE OR REPLACE TEMP TABLE tmp_proc_cnt AS
    SELECT year(proc.year_of_service) AS yr,
           Count(Distinct proc.patient_id) AS proc_pt_cnt,
           Count(Distinct proc.patient_id, proc.year_of_service) AS proc_claim_cnt
      FROM coh_bool_proc_raw proc
           JOIN coh_bool_ref_proc_rx ref
             ON ref.code = proc.procedure
     WHERE ref.code_type <> 'NDC'
           AND proc.cohort_flag = True
     GROUP BY yr;

--Summary Counts
SELECT *
  FROM tmp_proc_cnt
 ORDER BY yr; 

Query started at 01:50:34 PM Eastern Daylight Time; Query executed in 0.06 m
Query started at 01:50:37 PM Eastern Daylight Time; Query executed in 0.03 m

Unnamed: 0,yr,proc_pt_cnt,proc_claim_cnt
0,2018,259,299


### Med NDC

In [90]:
%%read_sql
--ndc counts from procedure claims
CREATE OR REPLACE TEMP TABLE tmp_proc_ndc_cnt AS
    SELECT year(proc.year_of_service) AS yr,
           Count(Distinct proc.patient_id) AS proc_ndc_pt_cnt,
           Count(Distinct proc.patient_id, proc.year_of_service) AS proc_ndc_claim_cnt
      FROM coh_bool_proc_raw proc
           JOIN coh_bool_ref_proc_rx ref
             ON proc.ndc = ref.code
     WHERE ref.code_type = 'NDC'    
           AND proc.cohort_flag = True
    GROUP BY  yr;
    
--Summary Counts
SELECT *
  FROM tmp_proc_ndc_cnt
 ORDER BY yr; 

Query started at 01:50:39 PM Eastern Daylight Time; Query executed in 0.05 m
Query started at 01:50:42 PM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,yr,proc_ndc_pt_cnt,proc_ndc_claim_cnt
0,2017,612,1667
1,2018,717,1973


### Pharmacy NDC Patients & Claims

In [91]:
%%read_sql
--pharmacy counts
CREATE OR REPLACE TEMP TABLE tmp_phar_pt_cnt AS
    SELECT year(date_of_service) AS yr,
           Count(Distinct patient_id) AS phar_ndc_pt_cnt,
           Count(Distinct patient_id, date_of_service) AS phar_ndc_claim_cnt
      FROM coh_bool_pharmacy_raw rx
           JOIN coh_bool_ref_proc_rx ref
             ON rx.ndc = ref.code
     WHERE ref.code_type = 'NDC'
           AND rx.cohort_flag = True
     GROUP BY yr;

--Summary Counts
SELECT *
  FROM tmp_phar_pt_cnt
 ORDER BY yr; 

Query started at 01:50:44 PM Eastern Daylight Time; Query executed in 0.06 m
Query started at 01:50:47 PM Eastern Daylight Time; Query executed in 0.03 m

Unnamed: 0,yr,phar_ndc_pt_cnt,phar_ndc_claim_cnt
0,2017,12201,83651
1,2018,10308,64347


### Pharmacy NPI Counts

In [92]:
%%read_sql
--pharmacy provider (NPI) counts
CREATE OR REPLACE TEMP TABLE tmp_phar_npi_cnt AS
    SELECT year(date_of_service) AS yr,
           Count(Distinct rx.npi) AS phar_npi_cnt
      FROM coh_bool_pharmacy_raw rx
           JOIN coh_bool_ref_proc_rx ref
             ON rx.ndc = ref.code
     WHERE ref.code_type = 'NDC'
           AND rx.cohort_flag = True
     GROUP BY  yr;

--Summary Counts
SELECT *
  FROM tmp_phar_npi_cnt
 ORDER BY yr; 

Query started at 01:50:49 PM Eastern Daylight Time; Query executed in 0.06 m
Query started at 01:50:53 PM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,yr,phar_npi_cnt
0,2017,14217
1,2018,12090


### Medical Providers

In [93]:
%%read_sql
--Provider Count from medical claims
CREATE OR REPLACE TEMP TABLE tmp_med_npi_cnt AS
SELECT Year(year_of_service) AS yr,
       Count(Distinct provider_npi) med_npi_cnt
  FROM coh_bool_provider_raw
 GROUP BY yr;
 
--Review stats
SELECT *
  FROM tmp_med_npi_cnt

Query started at 01:50:55 PM Eastern Daylight Time; Query executed in 0.05 m
Query started at 01:50:58 PM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,yr,med_npi_cnt
0,2017,33141
1,2018,32589


### All Provider Count

In [94]:
%%read_sql       
  
--Count of providers from medical and pharmacy claims
CREATE OR REPLACE TEMP TABLE tmp_all_provider_cnt AS
    SELECT year(svc_dt) AS yr,
           Count(Distinct npi) AS total_npi_cnt
     FROM (SELECT provider_npi AS npi,
                  year_of_service AS svc_dt
             FROM coh_bool_provider_raw
            UNION 
           SELECT npi AS npi,
                  date_of_service AS svc_dt
             FROM coh_bool_pharmacy_raw)
     GROUP BY yr;
 
--Review Counts
SELECT *
  FROM tmp_all_provider_cnt;

Query started at 01:51:00 PM Eastern Daylight Time; Query executed in 0.05 m
Query started at 01:51:04 PM Eastern Daylight Time; Query executed in 0.05 m

Unnamed: 0,yr,total_npi_cnt
0,2018,39807
1,2017,41693
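
The total is computed over the union of medical and pharmacy NPIs, so a provider who appears in both sources is counted once per year. That is consistent with the 2017 total (41693) being lower than the sum of the separate medical and pharmacy counts (33141 + 14217). The pandas sketch below shows the same deduplication on hypothetical data. The frames `med` and `phar` are illustrative stand-ins for `coh_bool_provider_raw` and `coh_bool_pharmacy_raw`.

```python
import pandas as pd

# Hypothetical stand-ins for the medical and pharmacy provider rows
med = pd.DataFrame({
    "npi": ["1", "2", "3"],
    "svc_dt": pd.to_datetime(["2017-01-01", "2017-06-01", "2018-01-01"]),
})
phar = pd.DataFrame({
    "npi": ["2", "4"],
    "svc_dt": pd.to_datetime(["2017-02-01", "2018-03-01"]),
})

# Stack both sources, then count distinct NPIs per year
both = pd.concat([med, phar], ignore_index=True)
both["yr"] = both["svc_dt"].dt.year
total_npi_cnt = both.groupby("yr")["npi"].nunique()
print(total_npi_cnt)
```

NPI `2` appears in both sources in 2017, so that year has two distinct providers rather than three.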


## Merge
Merge the per-metric counts into summary tables.


### Overall Counts
Counts across all years

In [95]:
%%read_sql all_time_df
--Get the high level counts across all years
CREATE OR REPLACE TEMP TABLE coh_bool_stats_alltime AS
    SELECT  --All Years
            'All time'                                AS yr,

            --Patient Count
            (SELECT Count(DISTINCT patient_id) 
               FROM coh_bool_pt)                       AS pt_cnt, 

            --Medical Claim Counts Total (DX + Proc)
            (SELECT Count(DISTINCT patient_id, claim_id) 
               FROM coh_bool_claim 
              WHERE patient_id IN (SELECT patient_id 
                                    FROM coh_bool_pt)) AS med_claim_cnt, 

            --Pharmacy Claim Counts Total                        
            (SELECT Count(DISTINCT patient_id, claim_id) 
               FROM coh_bool_pharmacy_raw 
              WHERE patient_id IN (SELECT patient_id 
                                    FROM coh_bool_pt)) AS pharmacy_claim_cnt,

           --EHR Overlap Counts
           (SELECT Count(DISTINCT patient_id) 
              FROM coh_bool_pt 
             WHERE ehr_overlap = true)                 AS ehr_overlap,

           --Procedure Overall Counts
          (SELECT Count(Distinct proc.patient_id)   
             FROM coh_bool_proc_raw proc
                  JOIN coh_bool_ref_proc_rx ref
                    ON ref.code = proc.procedure
            WHERE ref.code_type <> 'NDC'
                  AND proc.cohort_flag = True)          AS proc_pt_cnt,

         --Procedure Claim Counts
         (SELECT Count(Distinct proc.patient_id, proc.year_of_service)
            FROM coh_bool_proc_raw proc
                 JOIN coh_bool_ref_proc_rx ref
                   ON ref.code = proc.procedure
           WHERE ref.code_type <> 'NDC'
                 AND proc.cohort_flag = True)          AS proc_ndc_cnt,

        --Procedure NDC Patient Counts
        (SELECT Count(Distinct proc.patient_id)        
          FROM coh_bool_proc_raw proc
               JOIN coh_bool_ref_proc_rx ref
                 ON proc.ndc = ref.code
         WHERE ref.code_type = 'NDC'    
               AND proc.cohort_flag = True) AS proc_ndc_pt_cnt,

         --Procedure NDC Claim Counts
        (SELECT Count(Distinct proc.patient_id, proc.year_of_service)
          FROM coh_bool_proc_raw proc
               JOIN coh_bool_ref_proc_rx ref
                 ON proc.ndc = ref.code
         WHERE ref.code_type = 'NDC'    
               AND proc.cohort_flag = True) AS proc_ndc_claim_cnt,
        
        --Pharmacy Patient Count
        (SELECT Count(Distinct patient_id)      
           FROM coh_bool_pharmacy_raw rx
          WHERE rx.cohort_flag = True) AS phar_ndc_pt_cnt,
        
        --Pharmacy Claim Count
        (SELECT Count(Distinct patient_id, date_of_service) 
           FROM coh_bool_pharmacy_raw
          WHERE cohort_flag = True) AS phar_ndc_claim_cnt,
        
        --Provider Count from medical claims
        (SELECT Count(Distinct provider_npi) 
           FROM coh_bool_provider_raw) AS med_claim_provider_cnt,
        
        --Provider Count from pharmacy claims
        (SELECT Count(Distinct npi)
           FROM coh_bool_pharmacy_raw
          WHERE cohort_flag = True) AS pharmacy_claim_provider_cnt,
          
        --Count of providers from medical and pharmacy claims
        (SELECT Count(Distinct npi)
        FROM (SELECT provider_npi AS npi
                FROM coh_bool_provider_raw
               UNION 
              SELECT npi
                FROM coh_bool_pharmacy_raw)) as provider_total_cnt;

SELECT *
  FROM coh_bool_stats_alltime;


Query started at 01:51:06 PM Eastern Daylight Time; Query executed in 0.12 m
Query started at 01:51:14 PM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,yr,pt_cnt,med_claim_cnt,pharmacy_claim_cnt,ehr_overlap,proc_pt_cnt,proc_ndc_cnt,proc_ndc_pt_cnt,proc_ndc_claim_cnt,phar_ndc_pt_cnt,phar_ndc_claim_cnt,med_claim_provider_cnt,pharmacy_claim_provider_cnt,provider_total_cnt
0,All time,16844,187564,207808,9458,259,299,1234,3640,15825,147998,54447,19785,64856


### Yearly Counts
Join the per-year temporary tables into a single yearly summary table, `coh_bool_stats_yr`.

In [96]:
%%read_sql
--Final Summary of yearly data
DROP TABLE IF EXISTS coh_bool_stats_yr;
CREATE TRANSIENT TABLE coh_bool_stats_yr AS
    SELECT DISTINCT pt.yr,
                    dx1.dx1_pt_cnt,
                    dx1.dx1_claim_cnt,
                    dx2.dx2_pt_cnt,
                    dx2.dx2_claim_cnt,
                    proc.proc_pt_cnt,
                    proc.proc_claim_cnt,
                    procndc.proc_ndc_pt_cnt,
                    procndc.proc_ndc_claim_cnt,
                    phar.phar_ndc_pt_cnt,
                    phar.phar_ndc_claim_cnt,
                    pharnpi.phar_npi_cnt,
                    mednpi.med_npi_cnt,
                    allnpi.total_npi_cnt
      FROM coh_bool_pt_yr pt
           LEFT JOIN tmp_dx1_cnt dx1
                  ON pt.yr = dx1.yr
           LEFT JOIN tmp_dx2_cnt dx2
                  ON pt.yr = dx2.yr
           LEFT JOIN tmp_proc_cnt proc
                  ON pt.yr = proc.yr
           LEFT JOIN tmp_proc_ndc_cnt procndc
                  ON pt.yr = procndc.yr
           LEFT JOIN tmp_phar_pt_cnt phar
                  ON pt.yr = phar.yr
           LEFT JOIN tmp_phar_npi_cnt pharnpi
                  ON pt.yr = pharnpi.yr
           LEFT JOIN tmp_med_npi_cnt mednpi
                  ON pt.yr = mednpi.yr
           LEFT JOIN tmp_all_provider_cnt allnpi
                  ON pt.yr = allnpi.yr

Query started at 01:51:16 PM Eastern Daylight Time; Query executed in 0.04 m
Query started at 01:51:18 PM Eastern Daylight Time; Query executed in 0.07 m

Unnamed: 0,status
0,Table COH_BOOL_STATS_YR successfully created.


In [97]:
%%read_sql
SELECT *
  FROM coh_bool_stats_yr

Query started at 01:51:23 PM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,yr,dx1_pt_cnt,dx1_claim_cnt,dx2_pt_cnt,dx2_claim_cnt,proc_pt_cnt,proc_claim_cnt,proc_ndc_pt_cnt,proc_ndc_claim_cnt,phar_ndc_pt_cnt,phar_ndc_claim_cnt,phar_npi_cnt,med_npi_cnt,total_npi_cnt
0,2018,11376,43302,8973,24187,259.0,299.0,717,1973,10308,64347,12090,32589,39807
1,2017,11090,40342,9432,24434,,,612,1667,12201,83651,14217,33141,41693
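
The empty procedure cells for 2017 come from the `LEFT JOIN`: `tmp_proc_cnt` has no 2017 row, so the join keeps the year and leaves those columns null. The pandas sketch below reproduces the effect with hypothetical frames and counts, and shows why the affected column also shows up as floating point (`259.0`) once it is loaded into a DataFrame.

```python
import pandas as pd

# Hypothetical per-year tables; proc_cnt has no row for 2017
years = pd.DataFrame({"yr": [2017, 2018]})
dx1_cnt = pd.DataFrame({"yr": [2017, 2018], "dx1_pt_cnt": [10, 12]})
proc_cnt = pd.DataFrame({"yr": [2018], "proc_pt_cnt": [3]})

# Left joins keep every year, filling missing metrics with NaN
stats_yr = (
    years.merge(dx1_cnt, on="yr", how="left")
         .merge(proc_cnt, on="yr", how="left")
)
print(stats_yr.dtypes)  # proc_pt_cnt becomes float64 because of the NaN

# Replace the missing counts with zero, as the export step does
filled = stats_yr.fillna(0)
print(filled)
```

Using an inner join instead would drop 2017 from the summary entirely, so the left join is the safer choice for a report that must list every year.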


## Output to Excel
Export the yearly and all-time counts to an Excel workbook.

In [98]:
#Pull down annual stats
df_year = snow.select("SELECT * FROM coh_bool_stats_yr")
df_year = df_year.fillna(0)
df_year.columns = ['Year',
                   'Diagnosis Group 1 Patient Count','Diagnosis Group 1 Claim Count',
                   'Diagnosis Group 2 Patient Count','Diagnosis Group 2 Claim Count',
                   'Procedure Patient Count','Procedure Claim Count',
                   'Procedure NDC Patient Count','Procedure NDC Claim Count',
                   'Pharmacy Patient Count','Pharmacy Claim Count',
                   'Pharmacy Provider Count', 'Medical Provider Count', 'Total Provider Count']

In [99]:
#Transpose to make easier to read
df_year_t = df_year.transpose()
df_year_t.reset_index(inplace=True)
df_year_t

Unnamed: 0,index,0,1
0,Year,2017.0,2018.0
1,Diagnosis Group 1 Patient Count,11090.0,11376.0
2,Diagnosis Group 1 Claim Count,40342.0,43302.0
3,Diagnosis Group 2 Patient Count,9432.0,8973.0
4,Diagnosis Group 2 Claim Count,24434.0,24187.0
5,Procedure Patient Count,0.0,259.0
6,Procedure Claim Count,0.0,299.0
7,Procedure NDC Patient Count,612.0,717.0
8,Procedure NDC Claim Count,1667.0,1973.0
9,Pharmacy Patient Count,12201.0,10308.0
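
The `.0` suffixes in the transposed output (for example `2017.0`) most likely arise because the procedure columns are floating point after the null fill, and transposing a frame that mixes integer and float columns upcasts everything to float. A minimal sketch, assuming all counts are whole numbers, casts to integer before transposing. The frame `df_year_demo` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical yearly frame: one integer column, one float column (post fillna)
df_year_demo = pd.DataFrame({
    "Year": [2017, 2018],
    "Procedure Patient Count": [0.0, 259.0],
})

# Transposing mixed int/float columns upcasts every value to float
as_float = df_year_demo.transpose()
print(as_float)

# Casting to integer first keeps whole-number counts readable
as_int = df_year_demo.astype("int64").transpose().reset_index()
print(as_int)
```

The cast is only safe after the nulls have been filled, since `astype("int64")` raises on NaN values.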


In [100]:
#Pull All time numbers
df_all = snow.select("SELECT * FROM coh_bool_stats_alltime")
df_all = df_all.fillna(0)
df_all.columns = ['Year','Patient Count',
                  'Medical Claim Counts','Pharmacy Claim Counts','EHR Overlap',
                  'Procedure Patient Count','Procedure Claim Count',
                  'Procedure NDC Patient Count','Procedure NDC Claim Count',
                  'Pharmacy Patient Count','Pharmacy Claim Count',
                  'Medical Provider Count','Pharmacy Provider Count', 'Total Provider Count']

In [101]:
#Transpose all time numbers
df_all_t = df_all.transpose()
df_all_t.reset_index(inplace=True)
df_all_t.columns = ['Category','Count']
df_all_t

Unnamed: 0,Category,Count
0,Year,All time
1,Patient Count,16844
2,Medical Claim Counts,187564
3,Pharmacy Claim Counts,207808
4,EHR Overlap,9458
5,Procedure Patient Count,259
6,Procedure Claim Count,299
7,Procedure NDC Patient Count,1234
8,Procedure NDC Claim Count,3640
9,Pharmacy Patient Count,15825


In [102]:
#Export to excel spreadsheet
make_xlsx([df_all_t,df_year_t],
          '../../out/coh/cohort_bool_stats.xlsx',
          workbook_title = 'Pre-Sales Scoping Semi-Custom Output',
          sheet_names = ['Overall Counts','Yearly Counts'],
          sheet_titles = ['Overall Counts','Yearly Counts'],
          sheet_descriptions = ['All Time Counts','Yearly Counts'])