# Patient Basic
Create a patient cohort of all patients who have at least one claim with a code in `ref_cohort` between the dates of interest.

**Script**
* [scripts/coh/coh_basic.ipynb](./scripts/coh/coh_basic.ipynb)
 
**Parameters**
* `in/coh/coh_basic.xlsx[param]`
* `in/coh/coh_basic.xlsx[ref]`

**Input**
* None

**Output**
* `coh_basic_ref`
* `coh_basic_raven_diagnosis`
* `coh_basic_raven_pharmacy`
* `coh_basic_raven_procedure`
* `coh_pt`
* `coh_claim`

**Review**
* [scripts/coh/coh_basic.html](./scripts/coh/coh_basic.html)

In [None]:
#Import libraries for this notebook
import pandas as pd  
from drg_connect import Snowflake
import numpy as np
import pickle
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

#Load connection variables to connect_dict
with open('../../out/conn/connect_dict.pickle', 'rb') as handle:
    connect_dict = pickle.load(handle)

#Create engine to connect to Snowflake
snow = Snowflake(role=connect_dict['role'],
                 warehouse=connect_dict['warehouse'],
                 database=connect_dict['database'],
                 schema=connect_dict['schema'])

#Finish engine setup
engine = snow.engine
%load_ext sql_magic
%config SQL.conn_name = 'engine'  #Set the sql_magic connection engine
%config SQL.output_result = True  #Enable output to std out
%config SQL.notify_result = False #disable browser notifications
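
Before opening the connection, it can help to confirm that the pickled dictionary carries every key the `Snowflake(...)` call reads. The following is a minimal sketch, assuming only the four keys used above. The helper name `missing_connection_keys` and the sample values are hypothetical and not part of this project.

```python
# Keys read from connect_dict by the Snowflake(...) call in the setup cell
REQUIRED_KEYS = ("role", "warehouse", "database", "schema")


def missing_connection_keys(connect_dict):
    """Return the required keys that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not connect_dict.get(key)]


# Illustrative values only; the real dictionary comes from connect_dict.pickle
example_dict = {"role": "analyst", "warehouse": "wh_small", "database": "rwd_db"}
missing = missing_connection_keys(example_dict)
print(missing)
```

An empty list means all four keys are present and non-empty, so the connection call has what it needs.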

# Parameters
Python dictionary of parameters for the cohort basic procedure

**Input**  
* `in/coh/coh_basic.xlsx[param]`

**Output**
* One Python variable per parameter, named after the parameter and holding its value


In [None]:
#Read the parameters from Excel into a dictionary (file name follows the Parameters list above)
input_df = pd.read_excel('../../in/coh/coh_basic.xlsx', sheet_name='param', skiprows=4, dtype=str)
var_dict = dict(zip(input_df.parameter, input_df.value))
#Expose each parameter as a notebook-level variable of the same name
globals().update(var_dict)

#Check inputs
pd.DataFrame.from_dict(var_dict, orient='index')
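
Because every parameter is read as a string, a quick check of the date pairs can catch typos before any query runs. This is a minimal sketch, assuming the dates are stored as `YYYY-MM-DD` strings (the source does not state the format). The helper `check_date_params` and the sample values are hypothetical.

```python
from datetime import date

# Start/end parameter pairs named in this notebook's Parameters sections
DATE_PAIRS = [
    ("med_dx_start_dt", "med_dx_end_dt"),
    ("med_proc_start_dt", "med_proc_end_dt"),
    ("med_rx_start_dt", "med_rx_end_dt"),
    ("phar_rx_start_dt", "phar_rx_end_dt"),
]


def check_date_params(var_dict):
    """Return a list of problems found in the date parameters (hypothetical helper)."""
    problems = []
    for start_key, end_key in DATE_PAIRS:
        try:
            # Assumption: dates are stored as YYYY-MM-DD strings
            start = date.fromisoformat(var_dict[start_key])
            end = date.fromisoformat(var_dict[end_key])
        except KeyError as err:
            problems.append(f"missing parameter: {err.args[0]}")
            continue
        except (TypeError, ValueError):
            problems.append(f"unparseable date in {start_key}/{end_key}")
            continue
        if start > end:
            problems.append(f"{start_key} is after {end_key}")
    return problems


# Illustrative values only, chosen to trigger each kind of problem
example_params = {
    "med_dx_start_dt": "2018-01-01", "med_dx_end_dt": "2020-12-31",
    "med_proc_start_dt": "2021-01-01", "med_proc_end_dt": "2020-12-31",
    "med_rx_start_dt": "2018-01-01",
    "phar_rx_start_dt": "2018/01/01", "phar_rx_end_dt": "2020-12-31",
}
problems = check_date_params(example_params)
print(problems)
```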

# Extract 
Extract raw data using the reference table and parameters for this module

## Diagnosis
Extract the subset of Raven diagnosis claims that carry the codes of interest between the dates.

**Parameters**
  * `med_dx_start_dt`
  * `med_dx_end_dt`  
  
**Input**
  * `coh_basic_ref`
  * `rwd_db.rwd.raven_external_claims_submits_diagnosis`
  
**Output**  
* `coh_basic_raven_diagnosis`


In [None]:
%%read_sql

--CREATE TRANSIENT TABLE of all claims with the codes of interest
DROP TABLE IF EXISTS coh_basic_raven_diagnosis;
CREATE TRANSIENT TABLE coh_basic_raven_diagnosis as
      SELECT patient_id,
             claim_id,
             code_type, 
             diagnosis,
             diagnosis_sequence,
             year_of_service
        FROM {submits_diagnosis};
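
The section header lists `coh_basic_ref` and the two date parameters as inputs, so the extract is presumably restricted by code and by date. The sketch below shows one way such a filter could look, run against an in-memory SQLite database rather than Snowflake. The `code` column of `coh_basic_ref`, the use of `year_of_service` as the date to filter on, and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layouts: the ref table's `code` column is an assumption
    CREATE TABLE coh_basic_ref (code TEXT);
    CREATE TABLE submits_diagnosis (
        patient_id TEXT, claim_id TEXT, diagnosis TEXT, year_of_service TEXT
    );
    INSERT INTO coh_basic_ref VALUES ('E119');
    INSERT INTO submits_diagnosis VALUES
        ('p1', 'c1', 'E119', '2019-06-01'),  -- code and date both match
        ('p2', 'c2', 'E119', '2015-03-01'),  -- code matches, date too early
        ('p3', 'c3', 'I10',  '2019-07-01');  -- date matches, code not in ref
""")

# Illustrative parameter values
med_dx_start_dt, med_dx_end_dt = "2018-01-01", "2020-12-31"

# Keep only claims whose diagnosis is in the reference table and whose
# service date falls inside the parameter window
rows = conn.execute(
    """
    SELECT d.patient_id, d.claim_id
      FROM submits_diagnosis d
      JOIN coh_basic_ref r ON r.code = d.diagnosis
     WHERE d.year_of_service BETWEEN ? AND ?
    """,
    (med_dx_start_dt, med_dx_end_dt),
).fetchall()
print(rows)
```

Only the claim that satisfies both conditions survives, which is the behaviour the section description asks for.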

In [None]:
%%read_sql
--Check the raw counts of the initial data pull
SELECT Count(*) AS row_cnt,
       Count(Distinct patient_id) AS patient_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct diagnosis) AS diagnosis_cnt
  FROM coh_basic_raven_diagnosis

In [None]:
%%read_sql
--Review counts by year and code_type
SELECT year(year_of_service) AS yr,
       code_type,
       Count(*) AS cnt
  FROM coh_basic_raven_diagnosis
 GROUP BY yr, code_type
 ORDER BY yr, code_type

## Procedures
Identify the subset of Raven procedure claims that have the procedure or rx codes of interest between the dates.

**Parameters**  
  * `med_proc_start_dt`
  * `med_proc_end_dt`
  * `med_rx_start_dt`
  * `med_rx_end_dt`
  
**Input**
  * `coh_basic_ref`
  * `rwd_db.rwd.raven_external_claims_submits_procedure`

**Output**  
* `coh_basic_raven_procedure`

In [None]:
%%read_sql
--Identify subset of procedure claims of interest for procedure & rx codes of interest
DROP TABLE IF EXISTS coh_basic_raven_procedure;
CREATE TRANSIENT TABLE coh_basic_raven_procedure as
      SELECT patient_id,
             claim_id,
             procedure_type,
             procedure,
             year_of_service,
             ndc
        FROM {submits_procedure};
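
This section lists two date windows, one for procedure codes (`med_proc_*`) and one for rx codes (`med_rx_*`). One plausible reading is that a row qualifies when its procedure code matches inside the procedure window, or its NDC matches inside the rx window. The sketch below illustrates that reading on an in-memory SQLite database. The `code` column of `coh_basic_ref`, the choice of `year_of_service` as the filter date, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layouts and sample rows
    CREATE TABLE coh_basic_ref (code TEXT);
    CREATE TABLE submits_procedure (
        patient_id TEXT, claim_id TEXT, procedure TEXT, ndc TEXT, year_of_service TEXT
    );
    INSERT INTO coh_basic_ref VALUES ('99213'), ('00002143380');
    INSERT INTO submits_procedure VALUES
        ('p1', 'c1', '99213', NULL, '2019-02-01'),        -- procedure code, in proc window
        ('p2', 'c2', NULL, '00002143380', '2019-02-01'),  -- ndc, outside rx window
        ('p3', 'c3', NULL, '00002143380', '2020-05-01'),  -- ndc, in rx window
        ('p4', 'c4', '99999', NULL, '2019-05-01');        -- code not in ref
""")

# Illustrative parameter values: the two windows deliberately differ
params = {
    "proc_start": "2019-01-01", "proc_end": "2019-12-31",
    "rx_start": "2020-01-01", "rx_end": "2020-12-31",
}

# A row qualifies through either code type, each with its own date window
patients = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.patient_id
          FROM submits_procedure p
         WHERE (p.procedure IN (SELECT code FROM coh_basic_ref)
                AND p.year_of_service BETWEEN :proc_start AND :proc_end)
            OR (p.ndc IN (SELECT code FROM coh_basic_ref)
                AND p.year_of_service BETWEEN :rx_start AND :rx_end)
         ORDER BY p.patient_id
        """,
        params,
    )
]
print(patients)
```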

In [None]:
%%read_sql
--Check the raw counts from the procedure data pull
SELECT Count(*)                   AS row_cnt,
       Count(DISTINCT patient_id) AS patient_cnt,
       Count(DISTINCT claim_id)   AS claim_cnt,
       Count(DISTINCT procedure)  AS procedure_cnt,
       Count(DISTINCT ndc)        AS ndc_cnt
  FROM coh_basic_raven_procedure;

## Pharmacy
Identify patients from Raven Pharmacy using the codes of interest

**Parameters**
* `phar_rx_start_dt`
* `phar_rx_end_dt`

**Input**
  * `coh_basic_ref`
  * `rwd_db.rwd.raven_external_pharmacy`

**Output**  
* `coh_basic_raven_pharmacy`

In [None]:
%%read_sql
--Create pharmacy table with codes of interest
DROP TABLE IF EXISTS coh_basic_raven_pharmacy;
CREATE TRANSIENT TABLE coh_basic_raven_pharmacy AS
    SELECT patient_id,
           claim_id,
           prescriber_id            AS npi, 
           product_or_service_id    AS ndc, 
           date_of_service
      FROM {raven_pharmacy}; 

In [None]:
%%read_sql
--Check the raw counts from the pharmacy data pull
SELECT Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt,
       Count(Distinct ndc) AS ndc_cnt
  FROM coh_basic_raven_pharmacy

In [None]:
%%read_sql
--Review counts by ndc, most frequent first
SELECT ndc,
       Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt
  FROM coh_basic_raven_pharmacy
 GROUP BY ndc
 ORDER BY row_cnt DESC;

# Create cohorts
Create the patient-level and claim-level cohort tables from the three extracts.

## Patients
Create a cohort of all the unique patient_ids 

**Input**
* `coh_basic_raven_diagnosis`
* `coh_basic_raven_procedure`
* `coh_basic_raven_pharmacy`

**Output**  
* `coh_pt`

In [None]:
%%read_sql
--Create final cohort of unique patients from the extracts
DROP TABLE IF EXISTS coh_pt;
CREATE TRANSIENT TABLE coh_pt AS 
  SELECT patient_id, 
         Max(src_med_dx)   AS src_med_dx, 
         Max(src_med_proc) AS src_med_proc,
         Max(src_med_rx)   AS src_med_rx,
         Max(src_phar_rx)  AS src_phar_rx 
    FROM (--Medical Diagnosis Claims
          SELECT patient_id, 
                 1 AS src_med_dx, 
                 0 AS src_med_proc, 
                 0 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_diagnosis 
          UNION 
          --Medical Procedural claims using procedure codes
          SELECT patient_id, 
                 0 AS src_med_dx, 
                 1 AS src_med_proc, 
                 0 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_procedure
          UNION 
          --Medical Procedural claims using ndc codes
          SELECT patient_id, 
                 0 AS src_med_dx, 
                 0 AS src_med_proc, 
                 1 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_procedure
          UNION 
          --Pharmacy claims
          SELECT patient_id, 
                 0 AS src_med_dx, 
                 0 AS src_med_proc,
                 0 AS src_med_rx,
                 1 AS src_phar_rx 
            FROM coh_basic_raven_pharmacy) 
   WHERE patient_id IS NOT NULL 
   GROUP BY patient_id; 
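
The query above relies on a common pattern: each source contributes rows with a single flag set to 1, and `Max(...)` per patient collapses them into one row showing every source the patient appeared in. The sketch below reproduces the pattern with two of the sources on an in-memory SQLite database. The sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Illustrative rows: p1 has two diagnosis claims, p2 appears in both sources
    CREATE TABLE coh_basic_raven_diagnosis (patient_id TEXT);
    CREATE TABLE coh_basic_raven_pharmacy (patient_id TEXT);
    INSERT INTO coh_basic_raven_diagnosis VALUES ('p1'), ('p1'), ('p2'), (NULL);
    INSERT INTO coh_basic_raven_pharmacy VALUES ('p2'), ('p3');
""")

# UNION stacks the sources (removing exact duplicates), then MAX per patient
# turns the stacked 0/1 rows into one row of source flags
rows = conn.execute("""
    SELECT patient_id,
           MAX(src_med_dx)  AS src_med_dx,
           MAX(src_phar_rx) AS src_phar_rx
      FROM (SELECT patient_id, 1 AS src_med_dx, 0 AS src_phar_rx
              FROM coh_basic_raven_diagnosis
            UNION
            SELECT patient_id, 0 AS src_med_dx, 1 AS src_phar_rx
              FROM coh_basic_raven_pharmacy)
     WHERE patient_id IS NOT NULL
     GROUP BY patient_id
     ORDER BY patient_id
""").fetchall()
print(rows)
```

Each patient appears once, and the row with a NULL `patient_id` is dropped by the `WHERE` clause, matching the structure of `coh_pt`.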

In [None]:
%%read_sql
--Review counts and confirm they are unique
SELECT Count(*)                   AS row_cnt, 
       Count(DISTINCT patient_id) AS patient_cnt, 
       Sum(src_med_dx)            AS src_med_dx_sum, 
       Sum(src_med_proc)          AS src_med_proc_sum, 
       Sum(src_med_rx)            AS src_med_rx_sum,
       Sum(src_phar_rx)           AS src_phar_rx_sum
  FROM coh_pt

## Claims
Identify all unique claim IDs found in the extracts, for later use.

**Input**
* `coh_basic_raven_diagnosis`
* `coh_basic_raven_procedure`
* `coh_basic_raven_pharmacy`

**Output**  
* `coh_claim`

In [None]:
%%read_sql
--Create final cohort of unique claims from the extracts
DROP TABLE IF EXISTS coh_claim;
CREATE TRANSIENT TABLE coh_claim AS 
  SELECT patient_id, 
         claim_id,
         Max(src_med_dx)   AS src_med_dx, 
         Max(src_med_proc) AS src_med_proc,
         Max(src_med_rx)   AS src_med_rx,
         Max(src_phar_rx)  AS src_phar_rx 
    FROM (--Medical Diagnosis Claims
          SELECT patient_id, 
                 claim_id,
                 1 AS src_med_dx, 
                 0 AS src_med_proc, 
                 0 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_diagnosis 
          UNION 
          --Medical Procedural claims using procedure codes
          SELECT patient_id, 
                 claim_id,
                 0 AS src_med_dx, 
                 1 AS src_med_proc, 
                 0 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_procedure
          UNION 
          --Medical Procedural claims using ndc codes
          SELECT patient_id, 
                 claim_id,
                 0 AS src_med_dx, 
                 0 AS src_med_proc, 
                 1 AS src_med_rx,
                 0 AS src_phar_rx 
            FROM coh_basic_raven_procedure
          UNION 
          --Pharmacy claims
          SELECT patient_id, 
                 claim_id,
                 0 AS src_med_dx, 
                 0 AS src_med_proc,
                 0 AS src_med_rx,
                 1 AS src_phar_rx 
            FROM coh_basic_raven_pharmacy) 
   WHERE patient_id IS NOT NULL
         AND claim_id IS NOT NULL
   GROUP BY patient_id, claim_id; 

In [None]:
%%read_sql
--Review counts and confirm they are unique
SELECT Count(*)                     AS row_cnt, 
       Count(DISTINCT patient_id)   AS patient_cnt, 
       Count(DISTINCT claim_id)     AS claim_cnt,
       Sum(src_med_dx)              AS src_med_dx_sum, 
       Sum(src_med_proc)            AS src_med_proc_sum, 
       Sum(src_med_rx)              AS src_med_rx_sum,
       Sum(src_phar_rx)             AS src_phar_rx_sum
  FROM coh_claim
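
Since `coh_claim` is grouped by the pair (`patient_id`, `claim_id`), `claim_cnt` can be smaller than `row_cnt` whenever one `claim_id` appears under more than one patient. A direct check on the key pair is therefore a more precise uniqueness test. The sketch below uses an in-memory SQLite database. The helper `duplicate_keys` and the sample rows are hypothetical.

```python
import sqlite3


def duplicate_keys(conn, table, key_cols):
    """Return key combinations that occur more than once in `table` (hypothetical helper)."""
    cols = ", ".join(key_cols)
    # Table and column names are interpolated directly, so pass trusted names only
    query = (
        f"SELECT {cols}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Illustrative rows: claim c1 is shared by two patients, and (p2, c1) is duplicated
    CREATE TABLE coh_claim (patient_id TEXT, claim_id TEXT);
    INSERT INTO coh_claim VALUES ('p1', 'c1'), ('p1', 'c2'), ('p2', 'c1'), ('p2', 'c1');
""")

dupes = duplicate_keys(conn, "coh_claim", ["patient_id", "claim_id"])
print(dupes)
```

An empty result means every (`patient_id`, `claim_id`) pair is unique, which is what the `GROUP BY` in the create statement should guarantee.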