# Raven Header
Extract all Raven header records for the patients in `coh_pt` whose `year_of_service` falls between the start and end date parameters

**Script**
* [scripts/de/raven_header.ipynb](./scripts/de/raven_header.ipynb)

**Prior Script(s)**
* [scripts/coh/coh_basic.ipynb](./scripts/coh/coh_basic.ipynb)

**Parameters**
* `in/de/raven_extract.xlsx[raven_extract]`

**Input**
* `coh_pt`
* `rwd_db.rwd.raven_external_claims_submits_header`

**Output**  
* `de_raven_header`

**Review**
* [scripts/de/raven_header.html](./scripts/de/raven_header.html)

In [None]:
#Import libraries for this notebook
import pandas as pd  
from drg_connect import Snowflake
import numpy as np
import pickle
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

#Load connection variables to connect_dict
with open('../../out/conn/connect_dict.pickle', 'rb') as handle:
    connect_dict = pickle.load(handle)

#Create engine to connect to Snowflake
snow = Snowflake(role=connect_dict['role'],
                 warehouse=connect_dict['warehouse'],
                 database=connect_dict['database'],
                 schema=connect_dict['schema'])

#Finish engine setup
engine = snow.engine
%load_ext sql_magic
%config SQL.conn_name = 'engine'  #Set the sql_magic connection engine
%config SQL.output_result = True  #Enable output to std out
%config SQL.notify_result = False #Disable browser notifications


# Parameters
Create python variables of the parameters

**Input**  
* `in/de/raven_extract.xlsx[raven_extract]`

**Output**
* One Python variable per parameter, named after the parameter and holding its value

In [None]:
#Read the parameter sheet from Excel and keep the name/value pairs in a dictionary for review
input_df = pd.read_excel('../../in/de/raven_extract.xlsx', sheet_name='raven_extract', skiprows=4, dtype=str)
var_dict = dict(zip(input_df.parameter, input_df.value))
#Expose each parameter as a notebook-level variable without running exec on spreadsheet content
globals().update(var_dict)

#Check inputs
pd.DataFrame.from_dict(var_dict, orient='index')
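
The extract step further down relies on `header_start_dt` and `header_end_dt` being present in the parameter sheet. A minimal sketch of a pre-flight check is shown here; `check_params`, `REQUIRED_PARAMS`, and the sample values are hypothetical and not part of the original notebook.

```python
# Hypothetical stand-in for the dictionary built from raven_extract.xlsx
var_dict = {"header_start_dt": "2018", "header_end_dt": "2020"}

# Parameters the extract query refers to
REQUIRED_PARAMS = ("header_start_dt", "header_end_dt")


def check_params(params, required=REQUIRED_PARAMS):
    """Return the required parameters, raising if any is missing or blank."""
    missing = [
        name for name in required
        # An empty Excel cell read with dtype=str arrives as NaN, printed as "nan"
        if name not in params or str(params[name]).strip() in ("", "nan")
    ]
    if missing:
        raise ValueError("Missing parameter value(s): " + ", ".join(missing))
    return {name: params[name] for name in required}


checked = check_params(var_dict)
```

Failing here, before any SQL runs, is cheaper than discovering an empty date range after `de_raven_header` has been dropped and recreated.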

# Extract Data
Extract the subset of Raven header records for the patients of interest within the specified date range

**Parameters**
  * `header_start_dt`
  * `header_end_dt`  
  
**Input**
  * `coh_pt`
  * `rwd_db.rwd.raven_external_claims_submits_header`
  
**Output**  
* `de_raven_header`

In [None]:
%%read_sql
--Create raven header table
DROP TABLE IF EXISTS de_raven_header;
CREATE TRANSIENT TABLE de_raven_header AS 
  SELECT claim_id, 
         patient_id, 
         received_date, 
         claim_type_code, 
         statement_from, 
         statement_to, 
         min_service_from, 
         max_service_to, 
         drg_code, 
         type_bill, 
         admission_date, 
         admit_type_code, 
         admit_src_code, 
         discharge_hour, 
         discharge_status, 
         year_of_service 
    --FROM rwd_db.rwd.raven_claims_submits_header
    FROM rwd_db.rwd.raven_external_claims_submits_header
   WHERE year_of_service BETWEEN '{header_start_dt}' AND '{header_end_dt}' 
         AND patient_id IN (SELECT patient_id 
                              FROM coh_pt); 
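
The `{header_start_dt}` and `{header_end_dt}` placeholders in the query above are filled from the Python variables created in the Parameters step. To preview the filter before running it, the sketch below uses plain `str.format` as a stand-in for that substitution; it is not `sql_magic`'s own mechanism, and the parameter values are hypothetical.

```python
# Hypothetical parameter values; the real ones come from raven_extract.xlsx
params = {"header_start_dt": "2018", "header_end_dt": "2020"}

# Same filter as in the extract query, with the placeholders left in
where_template = (
    "WHERE year_of_service BETWEEN '{header_start_dt}' AND '{header_end_dt}'"
)

# Fill the placeholders to see the text that would be sent to the database
rendered = where_template.format(**params)
print(rendered)
```

Because the values are quoted, they are compared as strings, so the format used in the parameter sheet needs to match how `year_of_service` is stored.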

In [None]:
%%read_sql
--Review counts as a sanity check
SELECT Count(*) AS row_cnt,
       Count(Distinct claim_id) AS claim_cnt,
       Count(Distinct patient_id) AS patient_cnt
  FROM de_raven_header;
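
One way to read the three counts: if `row_cnt` exceeds `claim_cnt`, some `claim_id` appears on more than one row. The sketch below reproduces the same counts in plain Python on a few made-up rows; the data and the `duplicate_rows` name are illustrative only.

```python
# Hypothetical (claim_id, patient_id) rows standing in for de_raven_header
rows = [
    ("c1", "p1"),
    ("c2", "p1"),
    ("c2", "p1"),  # same claim_id appearing twice
    ("c3", "p2"),
]

# Same three measures as the SQL review query
row_cnt = len(rows)
claim_cnt = len({claim_id for claim_id, _ in rows})
patient_cnt = len({patient_id for _, patient_id in rows})

# Rows beyond one per claim_id; zero means claim_id is unique in the table
duplicate_rows = row_cnt - claim_cnt
print(row_cnt, claim_cnt, patient_cnt, duplicate_rows)
```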