# Raven Diagnosis
Extract all Raven diagnosis records for the patients in `coh_pt` with `year_of_service` between the `diagnosis_start_dt` and `diagnosis_end_dt` parameters

**Script**
* [scripts/de/raven_diagnosis.ipynb](./scripts/de/raven_diagnosis.ipynb)

**Prior Script(s)**
* [scripts/coh/coh_basic.ipynb](./scripts/coh/coh_basic.ipynb)

**Parameters**
* `in/de/raven_extract.xlsx[raven_extract]`

**Input**
* `coh_pt`
* `rwd_db.rwd.raven_external_claims_submits_diagnosis`

**Output**  
* `de_raven_diagnosis`

**Review**
* [scripts/de/raven_diagnosis.html](./scripts/de/raven_diagnosis.html)

In [1]:
#Import libraries for this notebook
import pandas as pd  
from drg_connect import Snowflake
import numpy as np
import pickle
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

#Load connection variables to connect_dict
with open('../../out/conn/connect_dict.pickle', 'rb') as handle:
    connect_dict = pickle.load(handle)

#Create engine to connect to Snowflake
snow = Snowflake(role=connect_dict['role'],
                 warehouse=connect_dict['warehouse'],
                 database=connect_dict['database'],
                 schema=connect_dict['schema'])

#Finish engine setup
engine = snow.engine
%load_ext sql_magic
%config SQL.conn_name = 'engine'  #Set the sql_magic connection engine
%config SQL.output_result = True  #Enable output to std out
%config SQL.notify_result = False #disable browser notifications


# Parameters
Create Python variables from the parameters in the Excel sheet

 **Input**  
* `in/de/raven_extract.xlsx[raven_extract]`

**Output**
* Python variables named after each parameter, each holding that parameter's value

In [2]:
#Load the parameters from Excel and expose each one as a Python variable
input_df = pd.read_excel('../../in/de/raven_extract.xlsx', sheet_name='raven_extract', skiprows=4, dtype=str)
var_dict = dict(zip(input_df.parameter, input_df.value))
globals().update(var_dict)  #Same assignments as before, without exec

#Check inputs
pd.DataFrame.from_dict(var_dict, orient='index')

| | 0 |
|---|---|
| diagnosis_start_dt | 2015-01-01 |
| diagnosis_end_dt | 2018-12-31 |
| procedure_start_dt | 2015-01-01 |
| procedure_end_dt | 2018-12-31 |
| pharmacy_start_dt | 2015-01-01 |
| pharmacy_end_dt | 2018-12-31 |
| payer_start_dt | 2015-01-01 |
| payer_end_dt | 2018-12-31 |
| provider_start_dt | 2015-01-01 |
| provider_end_dt | 2018-12-31 |
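
Because the parameter names become Python variables and the values are substituted into SQL, it can help to validate them first. The following is a minimal sketch, not part of the original notebook: `validate_params` is a hypothetical helper, and the small dictionary stands in for `var_dict`. It assumes every value is a date in `YYYY-MM-DD` form and that each `*_start_dt` should not fall after its matching `*_end_dt`.

```python
from datetime import date


def validate_params(var_dict: dict) -> dict:
    """Return var_dict unchanged after basic checks (hypothetical helper)."""
    for key, val in var_dict.items():
        # Names become Python variables, so they must be valid identifiers
        if not str(key).isidentifier():
            raise ValueError(f"Invalid parameter name: {key!r}")
        # Assumption: values are ISO dates; fromisoformat raises ValueError otherwise
        date.fromisoformat(val)
    for key, val in var_dict.items():
        # Assumption: every *_start_dt is paired with a *_end_dt
        if key.endswith("_start_dt"):
            end_key = key[: -len("_start_dt")] + "_end_dt"
            # ISO date strings sort chronologically, so string comparison is enough
            if end_key in var_dict and val > var_dict[end_key]:
                raise ValueError(f"{key} is after {end_key}")
    return var_dict


# Stand-in for the dictionary built from the parameter sheet
example_params = {
    "diagnosis_start_dt": "2015-01-01",
    "diagnosis_end_dt": "2018-12-31",
}
checked = validate_params(example_params)
```

Running a check like this before the variables are created means a malformed sheet fails early, rather than inside a later SQL cell.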


# Extract Data
Extract the subset of Raven diagnosis records for the patients in `coh_pt` within the specified date range

**Parameters**
  * `diagnosis_start_dt`
  * `diagnosis_end_dt`  
  
**Input**
  * `coh_pt`
  * `rwd_db.rwd.raven_external_claims_submits_diagnosis` **OR** `PCB Extract`
  
**Output**  
* `de_raven_diagnosis`

In [None]:
%%read_sql
--Create raven diagnosis table
DROP TABLE IF EXISTS de_raven_diagnosis; 
CREATE TRANSIENT TABLE de_raven_diagnosis AS
      SELECT patient_id,
             claim_id,
             code_type, 
             diagnosis,
             diagnosis_sequence,
             year_of_service
        --FROM rwd_db.rwd.raven_claims_submits_diagnosis
        FROM rwd_db.rwd.raven_external_claims_submits_diagnosis
       WHERE year_of_service BETWEEN '{diagnosis_start_dt}' AND '{diagnosis_end_dt}'
             AND patient_id IN (SELECT patient_id
                                  FROM coh_pt);
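
The `{diagnosis_start_dt}` and `{diagnosis_end_dt}` placeholders in the cell above appear to be filled from the Python variables created in the Parameters step. To preview the filter before running it, the substitution can be approximated with plain `str.format`. This is a sketch under that assumption, not a description of how `sql_magic` works internally; `query_template` is an illustrative name and only the `WHERE` clause is reproduced.

```python
# Values mirror the parameter sheet shown in the Parameters section
var_dict = {
    "diagnosis_start_dt": "2015-01-01",
    "diagnosis_end_dt": "2018-12-31",
    "procedure_start_dt": "2015-01-01",
}

# Only the WHERE clause of the extract, with the same placeholders
query_template = (
    "WHERE year_of_service BETWEEN '{diagnosis_start_dt}' "
    "AND '{diagnosis_end_dt}'"
)

# str.format ignores unused keys, so the whole dictionary can be passed
rendered = query_template.format(**var_dict)
print(rendered)
```

Printing the rendered text makes it easy to confirm that the intended dates, and not empty or swapped values, reach the query.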

In [None]:
%%read_sql
--Review counts as a sanity check
SELECT COUNT(*) AS row_cnt,
       COUNT(DISTINCT claim_id) AS claim_cnt,
       COUNT(DISTINCT patient_id) AS patient_cnt
  FROM de_raven_diagnosis;
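
The counts from this query can also be checked against each other and against the cohort. A minimal sketch, assuming the three values have been read into Python and that the number of distinct patients in `coh_pt` is known: `check_counts` and `cohort_cnt` are hypothetical names, and the numbers are illustrative only, not output from the real table.

```python
def check_counts(row_cnt: int, claim_cnt: int, patient_cnt: int, cohort_cnt: int) -> list:
    """Return a list of problems found in the count summary (hypothetical helper)."""
    problems = []
    if row_cnt == 0:
        problems.append("extract is empty")
    # The extract is filtered to coh_pt, so it cannot hold more patients than the cohort
    if patient_cnt > cohort_cnt:
        problems.append("more patients than in coh_pt")
    # Assumption: each claim belongs to one patient; a failure here may point to NULL claim_id values
    if patient_cnt > claim_cnt:
        problems.append("more distinct patients than claims")
    return problems


# Illustrative numbers only
issues = check_counts(row_cnt=1200, claim_cnt=400, patient_cnt=150, cohort_cnt=200)
print(issues or "counts look consistent")
```

A gap between `patient_cnt` and the cohort size is not an error in itself; it only shows how many cohort patients have no diagnosis records in the date range.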