# Patient Basic
Create a patient cohort of all patients who have at least 1 claim with a code in `ref_cohort` between the dates of interest.

**Script**
* [scripts/coh/coh_basic.ipynb](./scripts/coh/coh_basic.ipynb)
 
**Parameters**
* `in/coh/coh_basic.xlsx[param]`
* `in/coh/coh_basic.xlsx[ref]`

**Input**
* None

**Output**
* `coh_basic_ref`
* `coh_basic_raven_diagnosis`
* `coh_basic_raven_pharmacy`
* `coh_basic_raven_procedure`
* `coh_pt`
* `coh_claim`

**Review**
* [scripts/coh/coh_basic.html](./scripts/coh/coh_basic.html)

In [1]:
#Import libraries for this notebook
import pandas as pd  
from drg_connect import Snowflake
import numpy as np
import pickle
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

#Load connection variables to connect_dict
with open('../../out/conn/connect_dict.pickle', 'rb') as handle:
    connect_dict = pickle.load(handle)

#Create engine to connect to Snowflake
snow = Snowflake(role=connect_dict['role'],
                 warehouse=connect_dict['warehouse'],
                 database=connect_dict['database'],
                 schema=connect_dict['schema'])

#Finish engine setup
engine = snow.engine
%load_ext sql_magic
%config SQL.conn_name = 'engine'  #Set the sql_magic connection engine
%config SQL.output_result = True  #Enable output to std out
%config SQL.notify_result = False #disable browser notifications

# Parameters
Python dictionary of parameters for the cohort basic procedure

**Input**  
* `in/coh/coh_basic.xlsx[param]`

**Output**
* One Python variable per parameter, named after the parameter and holding its value


In [2]:
#Create system variables from excel into script and review values in dictionary
input_df = pd.read_excel('../../in/coh/coh_basic.xlsx', sheet_name='param', skiprows=4, dtype=str)
var_dict = dict(zip(input_df.parameter, input_df.value))
for key,val in var_dict.items(): exec(key + '=val')

#Check inputs
pd.DataFrame.from_dict(var_dict, orient='index')

Unnamed: 0,0
med_dx_start_dt,2015-01-01
med_dx_end_dt,2018-12-31
med_proc_start_dt,2017-01-01
med_proc_end_dt,2017-12-31
phar_rx_start_dt,2017-01-01
phar_rx_end_dt,2017-12-31
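
Since each date parameter is later substituted into a `BETWEEN` clause, it can help to confirm that every start date precedes its end date before any SQL runs. The sketch below is a minimal example, assuming every parameter follows the `<prefix>_start_dt` / `<prefix>_end_dt` naming shown in the output above and holds an ISO `YYYY-MM-DD` string. `check_date_ranges` and `example_params` are hypothetical names, not part of the notebook.

```python
from datetime import date

def check_date_ranges(params):
    """Return {prefix: (start, end)} after confirming each start <= end.

    Hypothetical helper: assumes keys are named <prefix>_start_dt / <prefix>_end_dt.
    """
    ranges = {}
    for key, value in params.items():
        if not key.endswith("_start_dt"):
            continue
        prefix = key[: -len("_start_dt")]
        end_key = f"{prefix}_end_dt"
        if end_key not in params:
            raise KeyError(f"{key} has no matching {end_key}")
        start = date.fromisoformat(value)
        end = date.fromisoformat(params[end_key])
        if start > end:
            raise ValueError(f"{key} ({start}) is after {end_key} ({end})")
        ranges[prefix] = (start, end)
    return ranges

# Values copied from the parameter review output above
example_params = {
    "med_dx_start_dt": "2015-01-01",
    "med_dx_end_dt": "2018-12-31",
    "med_proc_start_dt": "2017-01-01",
    "med_proc_end_dt": "2017-12-31",
    "phar_rx_start_dt": "2017-01-01",
    "phar_rx_end_dt": "2017-12-31",
}
ranges = check_date_ranges(example_params)
print(sorted(ranges))
```

In the notebook itself the same check could be run on `var_dict` right after it is built.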


# Reference
Upload the reference table used to identify the cohort to Snowflake

**Input**  
  * `in/coh/coh_basic.xlsx[ref]`

**Output**  
* `coh_basic_ref`

In [3]:
#Upload reference table from excel to snowflake and review snowflake output
df = pd.read_excel('../../in/coh/coh_basic.xlsx', sheet_name='ref', skiprows=4, dtype=str)

#Strip white space, make referable columns uppercase, and remove non-alphanumeric characters from code
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
df[['data_asset','code_type','code']] =  \
   df[['data_asset','code_type','code']].apply(lambda x: x.str.upper() if x.dtype == "object" else x)
#regex=True is required in pandas 2.0+, where str.replace treats the pattern literally by default
df['code'] = df['code'].str.replace(r'\W+', '', regex=True).astype('str')

#Upload to snowflake
snow.drop_table("coh_basic_ref")
snow.upload_dataframe(df,"coh_basic_ref")
snow.select("SELECT * FROM coh_basic_ref")

DROP TABLE IF EXISTS ref_db.semi_custom.coh_basic_ref;
Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
Table ref_db.semi_custom.coh_basic_ref dropped! (╯°□°）╯︵ ┻━┻
Upload into ref_db.semi_custom.coh_basic_ref successful! ┬──┬◡ﾉ(°-°ﾉ)


Unnamed: 0,data_asset,code_type,code,description,source
0,MEDICAL,ICD9,34700,Narcolepsy w/o cataplexy,DRG Treatment algorithms from Tamara Blutstein
1,MEDICAL,ICD9,34701,Narcolepsy w/cataplexy,DRG Treatment algorithms from Tamara Blutstein
2,MEDICAL,ICD9,3471,Narcolepsy in conditions classifed elsewhere,DRG Treatment algorithms from Tamara Blutstein
3,MEDICAL,ICD9,34711,Narcolepsy w/cataplexy in conditions classifed...,DRG Treatment algorithms from Tamara Blutstein
4,MEDICAL,ICD10,G474,Narcolepsy and cataplexy,DRG Treatment algorithms from Tamara Blutstein
5,MEDICAL,ICD10,G4741,Narcolepsy,DRG Treatment algorithms from Tamara Blutstein
6,MEDICAL,ICD10,G47411,Narcolepsy w/cataplexy,DRG Treatment algorithms from Tamara Blutstein
7,MEDICAL,ICD10,G47419,Narcolepsy w/o cataplexy,DRG Treatment algorithms from Tamara Blutstein
8,MEDICAL,ICD10,G4742,Narcolepsy in conditions classifed elsewhere,DRG Treatment algorithms from Tamara Blutstein
9,MEDICAL,ICD10,G47421,Narcolepsy in conditions classifed elsewhere w...,DRG Treatment algorithms from Tamara Blutstein
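
The cleanup step matters because the extracts match `code` against claim columns by exact equality, so a code entered as `G47.411` in the spreadsheet would not match `G47411`. The sketch below reproduces the same three steps (strip, uppercase, drop non-word characters) with the standard library on a few made-up inputs. `normalize_code` is a hypothetical helper, and the sample strings are illustrative rather than taken from the workbook.

```python
import re

def normalize_code(raw):
    """Strip whitespace, uppercase, and drop non-word characters.

    Note: the regex class \\W keeps underscores, so "A_1" stays "A_1".
    """
    return re.sub(r"\W+", "", raw.strip().upper())

# Illustrative inputs only (not read from coh_basic.xlsx)
samples = ["347.00", " g47.411 ", "G47-419"]
cleaned = [normalize_code(s) for s in samples]
print(cleaned)
```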


# Extract 
Extract raw data using the reference table and parameters for this module

## Diagnosis
Extract the subset from raven diagnosis 

**Parameters**
  * `med_dx_start_dt`
  * `med_dx_end_dt`  
  
**Input**
  * `coh_basic_ref`
  * `rwd_db.rwd.raven_external_claims_submits_diagnosis`
  
**Output**  
* `coh_basic_raven_diagnosis`


In [4]:
%%read_sql

--CREATE TRANSIENT TABLE of all claims with the codes of interest
DROP TABLE IF EXISTS coh_basic_raven_diagnosis;
CREATE TRANSIENT TABLE coh_basic_raven_diagnosis as
      SELECT dx.patient_id,
             dx.claim_id,
             dx.code_type, 
             dx.diagnosis,
             dx.diagnosis_sequence,
             dx.year_of_service
        --FROM rwd_db.rwd.raven_claims_submits_diagnosis dx
        FROM rwd_db.rwd.raven_external_claims_submits_diagnosis dx
       WHERE year_of_service BETWEEN '{med_dx_start_dt}' AND '{med_dx_end_dt}'
             AND diagnosis IN (SELECT code
                                 FROM coh_basic_ref
                                WHERE data_asset = 'MEDICAL'
                                      AND code_type IN ('ICD9','ICD10'));

Query started at 01:40:22 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:40:24 PM Eastern Daylight Time; Query executed in 0.23 m

Unnamed: 0,status
0,Table COH_BASIC_RAVEN_DIAGNOSIS successfully c...
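
The `'{med_dx_start_dt}'` and `'{med_dx_end_dt}'` placeholders in the query are filled from the Python variables created in the Parameters section. The sketch below imitates that substitution with plain `str.format`, on the assumption that the SQL magic behaves like ordinary string formatting; it is not the extension's actual implementation, and the shortened query is for illustration only.

```python
# Shortened, illustrative query using the same placeholder names as the notebook
query_template = (
    "SELECT Count(*) FROM coh_basic_raven_diagnosis "
    "WHERE year_of_service BETWEEN '{med_dx_start_dt}' AND '{med_dx_end_dt}'"
)

# Values copied from the parameter review output
params = {"med_dx_start_dt": "2015-01-01", "med_dx_end_dt": "2018-12-31"}

rendered = query_template.format(**params)
print(rendered)
```

Because this is plain text substitution rather than bound parameters, the values should come only from a trusted source such as the project's own parameter workbook.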


In [5]:
%%read_sql
--Check the raw counts of the initial data pull
SELECT Count(*) AS row_cnt,
       Count(Distinct patient_id) AS patient_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct diagnosis) AS diagnosis_cnt
  FROM coh_basic_raven_diagnosis

Query started at 01:40:38 PM Eastern Daylight Time; Query executed in 0.20 m

Unnamed: 0,row_cnt,patient_cnt,claim_cnt,diagnosis_cnt
0,188256,68205,181823,7


In [6]:
%%read_sql
--Review counts by year and code_type
SELECT year(year_of_service) AS yr,
       code_type,
       Count(*) AS cnt
  FROM coh_basic_raven_diagnosis
 GROUP BY yr, code_type
 ORDER BY yr, code_type

Query started at 01:40:50 PM Eastern Daylight Time; Query executed in 0.05 m

Unnamed: 0,yr,code_type,cnt
0,2017,ICD10,182605
1,2017,ICD9,4
2,2017,,5647


In [7]:
%%read_sql
--Review the distribution of the diagnosis counts against the table
SELECT ref.code_type,
       ref.code,
       ref.description,
       Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt
  FROM coh_basic_ref ref 
       LEFT JOIN coh_basic_raven_diagnosis dx
              ON ref.code = dx.diagnosis
 WHERE ref.data_asset = 'MEDICAL'
       AND ref.code_type IN ('ICD9','ICD10')
 GROUP BY ref.code_type, ref.code, ref.description
 ORDER BY Count(*) DESC

Query started at 01:40:53 PM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,code_type,code,description,row_cnt,claim_cnt,patient_cnt
0,ICD10,G47419,Narcolepsy w/o cataplexy,141572,137631,54603
1,ICD10,G47411,Narcolepsy w/cataplexy,42610,40957,15750
2,ICD10,G4729,Narcolepsy in conditions classifed elsewhere w...,2392,2338,1286
3,ICD10,G47421,Narcolepsy in conditions classifed elsewhere w...,1615,1585,666
4,ICD10,G4741,Narcolepsy,60,59,36
5,ICD10,G474,Narcolepsy and cataplexy,4,4,3
6,ICD9,34700,Narcolepsy w/o cataplexy,3,3,3
7,ICD9,34711,Narcolepsy w/cataplexy in conditions classifed...,1,0,0
8,ICD9,3471,Narcolepsy in conditions classifed elsewhere,1,0,0
9,ICD9,34701,Narcolepsy w/cataplexy,1,0,0
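
In the output above, reference codes with a `claim_cnt` and `patient_cnt` of 0 still show a `row_cnt` of 1. This follows from how `Count(*)` works over a `LEFT JOIN`: an unmatched reference row is kept once with NULLs on the right side, and `Count(*)` counts that row, whereas counting a right-side column skips it. The sketch below demonstrates the effect with an in-memory SQLite database; the table names and rows are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ref (code TEXT);
    CREATE TABLE dx (claim_id TEXT, diagnosis TEXT);
    INSERT INTO ref VALUES ('G47419'), ('34701');       -- 34701 has no claims
    INSERT INTO dx VALUES ('c1', 'G47419'), ('c2', 'G47419');
""")

rows = con.execute("""
    SELECT ref.code,
           Count(*)           AS row_cnt,          -- counts the NULL-padded row too
           Count(dx.claim_id) AS matched_row_cnt   -- ignores NULLs
      FROM ref
           LEFT JOIN dx ON ref.code = dx.diagnosis
     GROUP BY ref.code
     ORDER BY ref.code
""").fetchall()
print(rows)  # [('34701', 1, 0), ('G47419', 2, 2)]
```

When reading the review tables, a `row_cnt` of 1 alongside a `claim_cnt` of 0 therefore means the code was not found in the extract.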


## Procedures
Identify the subset of Raven procedure claims that carry a procedure or NDC code of interest between the dates of interest.

**Parameters**  
  * `med_proc_start_dt`
  * `med_proc_end_dt`
  * `med_rx_start_dt`
  * `med_rx_end_dt`
  
**Input**
  * `coh_basic_ref`
  * `rwd_db.rwd.raven_external_claims_submits_procedure`

**Output**  
* `coh_basic_raven_procedure`

In [8]:
%%read_sql
--Identify subset of procedure claims of interest for procedure & rx codes of interest
DROP TABLE IF EXISTS coh_basic_raven_procedure;
CREATE TRANSIENT TABLE coh_basic_raven_procedure as
      SELECT patient_id,
             claim_id,
             procedure_type,
             procedure,
             year_of_service,
             ndc
        --FROM rwd_db.rwd.raven_claims_submits_procedure
        FROM rwd_db.rwd.raven_external_claims_submits_procedure
       WHERE year_of_service BETWEEN '{med_proc_start_dt}' AND '{med_proc_end_dt}'
             AND (procedure IN (SELECT code
                                 FROM coh_basic_ref
                                WHERE data_asset='MEDICAL'
                                      AND code_type IN ('CPT','JCODE','HCPCS','ICD10PROC'))
                 OR ndc IN (SELECT code
                              FROM coh_basic_ref
                             WHERE data_asset='MEDICAL'
                                   AND code_type='NDC'));

Query started at 01:40:55 PM Eastern Daylight Time; Query executed in 0.04 m
Query started at 01:40:58 PM Eastern Daylight Time; Query executed in 1.62 m

Unnamed: 0,status
0,Table COH_BASIC_RAVEN_PROCEDURE successfully c...


In [9]:
%%read_sql
--Check the raw counts from the procedure data pull
SELECT Count(*)                     AS row_cnt, 
       Count(DISTINCT patient_id)   AS patient_cnt, 
       Count(DISTINCT claim_id) AS claim_cnt, 
       Count(DISTINCT procedure)    AS procedure_cnt, 
       Count(DISTINCT ndc)          AS ndc_cnt 
  FROM coh_basic_raven_procedure; 

Query started at 01:42:35 PM Eastern Daylight Time; Query executed in 0.13 m

Unnamed: 0,row_cnt,patient_cnt,claim_cnt,procedure_cnt,ndc_cnt
0,444586,108389,281726,50,70


In [10]:
%%read_sql
--Review the distribution of the procedure counts against the reference table
SELECT ref.code_type,
       ref.code,
       ref.description,
       Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt
  FROM coh_basic_ref ref 
       LEFT JOIN coh_basic_raven_procedure proc
              ON ref.code = proc.procedure
 WHERE ref.data_asset = 'MEDICAL'
       AND ref.code_type IN ('CPT','JCODE','HCPCS','ICD10PROC')
 GROUP BY ref.code_type, ref.code, ref.description
 ORDER BY Count(*) DESC;

Query started at 01:42:43 PM Eastern Daylight Time; Query executed in 0.06 m

Unnamed: 0,code_type,code,description,row_cnt,claim_cnt,patient_cnt
0,CPT,63650,Percutaneous implantation of neurostimulator e...,81208,55651,27082
1,CPT,95972,Electronic analysis of implanted neurostimulat...,48374,47983,24781
2,CPT,63685,Insertion or replacement of spinal neurostimul...,44056,43011,24146
3,CPT,95970,Electronic analysis of implanted neurostimulat...,35665,35064,16284
4,HCPCS,C1778,"Lead, neurostimulator (implantable)",27233,25869,17214
5,CPT,95978,Electronic analysis of implanted neurostimulat...,24703,24399,8332
6,CPT,64999,"Unlisted procedure, nervous system",22290,21109,12011
7,CPT,95971,Electronic analysis of implanted neurostimulat...,21135,20885,10962
8,CPT,95974,Electronic analysis of implanted neurostimulat...,20413,20153,7048
9,HCPCS,C1767,"Generator, neurostimulator (implantable), non-...",19376,18712,13626


## Pharmacy
Identify patients from Raven Pharmacy using the codes of interest

**Parameters**
* `phar_rx_start_dt`
* `phar_rx_end_dt`

**Input**
  * `coh_basic_ref`
  * `rwd_db.rwd.raven_external_pharmacy` (external table)

**Output**  
* `coh_basic_raven_pharmacy`

In [11]:
%%read_sql
--Create pharmacy table with codes of interest
DROP TABLE IF EXISTS coh_basic_raven_pharmacy;
CREATE TRANSIENT TABLE coh_basic_raven_pharmacy AS
    SELECT patient_id,
           claim_id,
           prescriber_id            AS npi, 
           product_or_service_id    AS ndc, 
           date_of_service
     -- FROM rwd_db.rwd.raven_pharmacy
     FROM rwd_db.rwd.raven_external_pharmacy
     WHERE date_of_service BETWEEN '{phar_rx_start_dt}' AND '{phar_rx_end_dt}' 
           AND product_or_service_id IN (SELECT code
                                           FROM coh_basic_ref
                                          WHERE data_asset='PHARMACY'
                                                AND code_type='NDC'); 

Query started at 01:42:47 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:42:49 PM Eastern Daylight Time; Query executed in 0.32 m

Unnamed: 0,status
0,Table COH_BASIC_RAVEN_PHARMACY successfully cr...


In [12]:
%%read_sql
SELECT Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt,
       Count(Distinct ndc) AS ndc_cnt
  FROM coh_basic_raven_pharmacy

Query started at 01:43:08 PM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,row_cnt,claim_cnt,patient_cnt,ndc_cnt
0,511,510,137,1


In [13]:
%%read_sql
SELECT ndc,
       Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt,
       Count(Distinct ndc) AS ndc_cnt
  FROM coh_basic_raven_pharmacy
 GROUP BY ndc
 ORDER BY row_cnt DESC;

Query started at 01:43:11 PM Eastern Daylight Time; Query executed in 0.04 m

Unnamed: 0,ndc,row_cnt,claim_cnt,patient_cnt,ndc_cnt
0,68727010001,511,510,137,1


# Create cohorts
Create a few key tables for the cohorts

## Patients
Create a cohort of all the unique patient_ids 

**Input**
* `coh_basic_raven_diagnosis`
* `coh_basic_raven_procedure`
* `coh_basic_raven_pharmacy`

**Output**  
* `coh_pt`

In [14]:
%%read_sql
--Create final cohort of unique patients from the extracts
DROP TABLE IF EXISTS coh_pt;
CREATE TRANSIENT TABLE coh_pt AS 
  SELECT patient_id, 
         Max(src_med_dx)   AS src_med_dx, 
         Max(src_med_proc) AS src_med_proc,
         Max(src_med_rx)   AS src_med_rx,
         Max(src_phar_rx)  AS src_phar_rx 
    FROM (--Medical Diagnosis Claims
          SELECT patient_id, 
                 1 AS src_med_dx, 
                 0 AS src_med_proc, 
                 0 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_diagnosis 
          UNION 
          --Medical Procedural claims using procedure codes
          SELECT patient_id, 
                 0 AS src_med_dx, 
                 1 AS src_med_proc, 
                 0 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_procedure
           WHERE procedure IN (SELECT code
                                 FROM coh_basic_ref
                                WHERE data_asset='MEDICAL'
                                      AND code_type IN ('CPT','JCODE','HCPCS','ICD10PROC'))
          UNION 
          --Medical Procedural claims using ndc codes
          SELECT patient_id, 
                 0 AS src_med_dx, 
                 0 AS src_med_proc, 
                 1 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_procedure
           WHERE ndc IN (SELECT code
                           FROM coh_basic_ref
                          WHERE data_asset='MEDICAL'
                               AND code_type='NDC')
          UNION 
         --Pharmacy claims
          SELECT patient_id, 
                 0 AS src_med_dx, 
                 0 AS src_med_proc,
                 0 AS src_med_rx,
                 1 AS src_phar_rx 
            FROM coh_basic_raven_pharmacy) 
   WHERE patient_id IS NOT NULL 
   GROUP BY patient_id; 

Query started at 01:43:13 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:43:14 PM Eastern Daylight Time; Query executed in 0.11 m

Unnamed: 0,status
0,Table COH_PT successfully created.


In [15]:
%%read_sql
--Review counts and confirm they are unique
SELECT Count(*)                   AS row_cnt, 
       Count(DISTINCT patient_id) AS patient_cnt, 
       Sum(src_med_dx)            AS src_med_dx_sum, 
       Sum(src_med_proc)          AS src_med_proc_sum, 
       Sum(src_med_rx)            AS src_med_rx_sum,
       Sum(src_phar_rx)           AS src_phar_rx_sum
  FROM coh_pt

Query started at 01:43:21 PM Eastern Daylight Time; Query executed in 0.11 m

Unnamed: 0,row_cnt,patient_cnt,src_med_dx_sum,src_med_proc_sum,src_med_rx_sum,src_phar_rx_sum
0,176421,176421,68205,108388,1,137
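
The `coh_pt` query stacks one row per patient and source with a 0/1 flag for each source, then collapses to one row per patient by taking the `Max` of each flag. The sketch below mirrors that logic in plain Python on made-up patient IDs, to make the resulting shape easier to see; `build_patient_flags` is a hypothetical helper and not part of the notebook.

```python
from collections import defaultdict

SOURCES = ("src_med_dx", "src_med_proc", "src_med_rx", "src_phar_rx")

def build_patient_flags(extracts):
    """Collapse per-source patient lists into one flag row per patient.

    Setting a flag to 1 whenever the patient appears in a source is
    equivalent to Max() over the stacked 0/1 columns in the SQL.
    """
    flags = defaultdict(lambda: dict.fromkeys(SOURCES, 0))
    for source, patient_ids in extracts.items():
        for pid in patient_ids:
            if pid is None:  # mirrors WHERE patient_id IS NOT NULL
                continue
            flags[pid][source] = 1
    return dict(flags)

# Made-up IDs for illustration only
extracts = {
    "src_med_dx": ["p1", "p2", "p2"],
    "src_med_proc": ["p2", "p3"],
    "src_med_rx": [],
    "src_phar_rx": ["p3", None],
}
coh_pt_example = build_patient_flags(extracts)
print(coh_pt_example["p2"])
```

Because one patient can carry several flags at once, the per-source sums in the review output can add up to more than `row_cnt`.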


## Claims
Identify all unique claim IDs captured by the extracts above, for later use

**Input**
* `coh_basic_raven_diagnosis`
* `coh_basic_raven_procedure`
* `coh_basic_raven_pharmacy`

**Output**  
* `coh_claim`

In [16]:
%%read_sql
--Create final table of unique claims from the extracts
DROP TABLE IF EXISTS coh_claim;
CREATE TRANSIENT TABLE coh_claim AS 
  SELECT patient_id, 
         claim_id,
         Max(src_med_dx)   AS src_med_dx, 
         Max(src_med_proc) AS src_med_proc,
         Max(src_med_rx)   AS src_med_rx,
         Max(src_phar_rx)  AS src_phar_rx 
    FROM (--Medical Diagnosis Claims
          SELECT patient_id, 
                 claim_id,
                 1 AS src_med_dx, 
                 0 AS src_med_proc, 
                 0 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_diagnosis 
          UNION 
          --Medical Procedural claims using procedure codes
          SELECT patient_id, 
                 claim_id,
                 0 AS src_med_dx, 
                 1 AS src_med_proc, 
                 0 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_procedure
           WHERE procedure IN (SELECT code
                                 FROM coh_basic_ref
                                WHERE data_asset='MEDICAL'
                                      AND code_type IN ('CPT','JCODE','HCPCS','ICD10PROC'))
          UNION 
          --Medical Procedural claims using ndc codes
          SELECT patient_id, 
                 claim_id,
                 0 AS src_med_dx, 
                 0 AS src_med_proc, 
                 1 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_procedure
           WHERE ndc IN (SELECT code
                           FROM coh_basic_ref
                          WHERE data_asset='MEDICAL'
                               AND code_type='NDC')
          UNION 
         --Pharmacy claims
          SELECT patient_id, 
                 claim_id,
                 0 AS src_med_dx, 
                 0 AS src_med_proc,
                 0 AS src_med_rx,
                 1 AS src_phar_rx 
            FROM coh_basic_raven_pharmacy) 
   WHERE patient_id IS NOT NULL
         AND claim_id IS NOT NULL
   GROUP BY patient_id, claim_id; 

Query started at 01:43:27 PM Eastern Daylight Time; Query executed in 0.03 m
Query started at 01:43:29 PM Eastern Daylight Time; Query executed in 0.12 m

Unnamed: 0,status
0,Table COH_CLAIM successfully created.


In [17]:
%%read_sql
--Review counts and confirm they are unique
SELECT Count(*)                     AS row_cnt, 
       Count(DISTINCT patient_id)   AS patient_cnt, 
       Count(Distinct claim_id) AS claim_cnt,
       Sum(src_med_dx)              AS src_med_dx_sum, 
       Sum(src_med_proc)            AS src_med_proc_sum, 
       Sum(src_med_rx)              AS src_med_rx_sum,
       Sum(src_phar_rx)             AS src_phar_rx_sum
  FROM coh_claim

Query started at 01:43:36 PM Eastern Daylight Time; Query executed in 0.11 m

Unnamed: 0,row_cnt,patient_cnt,claim_cnt,src_med_dx_sum,src_med_proc_sum,src_med_rx_sum,src_phar_rx_sum
0,458082,176421,458082,180213,277567,1,387
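
The review queries confirm uniqueness by comparing `row_cnt` with a distinct count of the intended key: the table has one row per key only when the two numbers match. The sketch below applies that comparison to summary rows copied from the outputs in this notebook; `is_unique_key` is a hypothetical helper name.

```python
def is_unique_key(summary, key_cnt_col):
    """True when the row count equals the distinct count of the key column."""
    return summary["row_cnt"] == summary[key_cnt_col]

# Summary rows copied from the review outputs in this notebook
coh_pt_summary = {"row_cnt": 176421, "patient_cnt": 176421}
coh_claim_summary = {"row_cnt": 458082, "patient_cnt": 176421, "claim_cnt": 458082}
pharmacy_summary = {"row_cnt": 511, "claim_cnt": 510, "patient_cnt": 137}

print(is_unique_key(coh_pt_summary, "patient_cnt"))   # True: one row per patient
print(is_unique_key(coh_claim_summary, "claim_cnt"))  # True: one row per claim
print(is_unique_key(pharmacy_summary, "claim_cnt"))   # False: a claim_id repeats
```

The pharmacy extract is a raw pull, so a repeated `claim_id` there is not necessarily a problem; the check matters most for the final `coh_pt` and `coh_claim` tables.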
