# Provider Basic
Identify all providers that appear on claims for the patients extracted by `coh_basic.ipynb`.

**Script**
  * [scripts/coh/coh_basic_prov.ipynb](./scripts/coh/coh_basic_prov.ipynb)

**Prior Scripts**
  * [scripts/coh/coh_basic.ipynb](./scripts/coh/coh_basic.ipynb)

**Parameters**
  * `in/coh/coh_basic.xlsx[param]`
  
**Input**
  * coh_claim
  * coh_pt
  * rwd_db.rwd.raven_external_claims_submits_provider
  
**Output**
  * coh_npi
  * coh_npi_pt_link

**Review**
  * [scripts/coh/coh_basic_prov.html](./scripts/coh/coh_basic_prov.html)
  
**Suggestion**
  * Create a table with provider type per physician possibly linked to patients
  

In [1]:
#Import libraries for this notebook
import pandas as pd  
from drg_connect import Snowflake
import numpy as np
import pickle
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

#Load connection variables to connect_dict
with open('../../out/conn/connect_dict.pickle', 'rb') as handle:
    connect_dict = pickle.load(handle)

#Create engine to connect to Snowflake
snow = Snowflake(role=connect_dict['role'],
                 warehouse=connect_dict['warehouse'],
                 database=connect_dict['database'],
                 schema=connect_dict['schema'])

#Finish engine setup
engine = snow.engine
%load_ext sql_magic
%config SQL.conn_name = 'engine'  #Set the sql_magic connection engine
%config SQL.output_result = True  #Enable output to std out
%config SQL.notify_result = False #disable browser notifications

# Parameters
Python dictionary of parameters for the cohort basic procedure

**Input**  
* `in/coh/coh_basic.xlsx[param]`

**Output**
* Python variables named after parameters with the value

In [None]:
#Create notebook variables from the Excel parameter sheet and review the values in the dictionary
input_df = pd.read_excel('../../in/coh/coh_basic.xlsx', sheet_name='param', skiprows=4, dtype=str)
var_dict = dict(zip(input_df.parameter, input_df.value))
globals().update(var_dict)  #Expose each parameter as a notebook variable without using exec

#Check inputs
pd.DataFrame.from_dict(var_dict, orient='index')
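
Because the parameter names come straight from a spreadsheet, it can help to validate them before they become notebook variables. The following is a minimal sketch, not the notebook's own method: `load_params` and the sample rows are hypothetical, and it assumes every parameter name should be a plain Python identifier.

```python
import keyword

def load_params(rows, namespace):
    """Copy (parameter, value) pairs into namespace after validating the names."""
    loaded = {}
    for name, value in rows:
        name = str(name).strip()
        # Reject anything that could not be used as a plain variable name.
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"invalid parameter name: {name!r}")
        loaded[name] = value
    # Only touch the namespace once every name has passed validation.
    namespace.update(loaded)
    return loaded

# Hypothetical rows standing in for the parameter sheet.
sample_rows = [("med_dx_start_dt", "2018-01-01"), ("med_dx_end_dt", "2019-12-31")]
ns = {}
params = load_params(sample_rows, ns)
```

In the notebook this could be called as `load_params(zip(input_df.parameter, input_df.value), globals())`, which keeps the same variables available to the SQL cells.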

# Extract
Extract the raw data needed to perform this analysis

## Medical
Identify the subset of raven providers for Medical claims of interest

**Parameters**  
  * `med_dx_start_dt`
  * `med_dx_end_dt`
  * `med_proc_start_dt`
  * `med_proc_end_dt`

**Input**
  * `coh_claim`
  * `rwd_db.rwd.raven_external_claims_submits_provider`

**Output**  
* `coh_raven_provider`


In [None]:
%%read_sql
--Extract the provider information for the medical claims of interest
DROP TABLE IF EXISTS coh_raven_provider;
CREATE TRANSIENT TABLE coh_raven_provider AS 
  SELECT claim_id, 
         patient_id, 
         provider_npi, 
         provider_type,
         year_of_service
    --FROM rwd_db.rwd.raven_claims_submits_provider
    FROM rwd_db.rwd.raven_external_claims_submits_provider
   WHERE claim_id IN (SELECT claim_id 
                            FROM coh_claim
                           WHERE src_med_dx = 1
                                 OR src_med_proc = 1
                                 OR src_med_rx = 1) 
         AND year_of_service BETWEEN Least('{med_dx_start_dt}','{med_proc_start_dt}') 
                                     AND Greatest('{med_dx_end_dt}','{med_proc_end_dt}')
   GROUP BY claim_id, patient_id, provider_npi, provider_type, year_of_service; 
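
The `Least`/`Greatest` pair in the query above widens the filter to the union of the diagnosis and procedure windows. A minimal Python sketch of the same idea follows. The dates are hypothetical, `combined_window` is an illustrative helper, and the sketch assumes the parameters are ISO-8601 date strings filled into `{...}` placeholders by ordinary string formatting. Whether `year_of_service` compares sensibly against full dates depends on that column's type, which is not stated here.

```python
def combined_window(dx_start, dx_end, proc_start, proc_end):
    """Return the earliest start and latest end across both windows."""
    # ISO-8601 date strings sort lexicographically in date order,
    # so min/max on the strings mirrors Least/Greatest on the dates.
    return min(dx_start, proc_start), max(dx_end, proc_end)

# Hypothetical parameter values.
window_start, window_end = combined_window(
    "2018-01-01", "2019-12-31",   # diagnosis window
    "2017-06-01", "2019-06-30",   # procedure window
)

# Placeholder substitution as plain string formatting.
template = "year_of_service BETWEEN '{start}' AND '{end}'"
clause = template.format(start=window_start, end=window_end)
print(clause)  # year_of_service BETWEEN '2017-06-01' AND '2019-12-31'
```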

In [None]:
%%read_sql
--Review basic counts
SELECT Count(*)                        AS row_cnt, 
       Count(DISTINCT claim_id)    AS claim_cnt, 
       Count(DISTINCT patient_id)      AS patient_cnt, 
       Count(DISTINCT provider_npi)    AS npi_cnt, 
       Count(DISTINCT provider_type)   AS prov_type_cnt, 
       Count(DISTINCT year_of_service) AS year_of_service_cnt 
  FROM coh_raven_provider;

# Create cohorts
Create the cohorts for this project

## providers
Identify all unique providers who performed services for this population and flag which data sources each one came from


**Input**
* `coh_pt`
* `coh_basic_raven_pharmacy`
* `coh_raven_provider`

**Output**  
* `coh_npi`

In [None]:
%%read_sql
--Identify unique providers across all claims and flag the data sources each one came from
DROP TABLE IF EXISTS coh_npi;
CREATE TRANSIENT TABLE coh_npi AS
    SELECT npi,
           Max(src_med_dx)   AS src_med_dx,
           Max(src_med_proc) AS src_med_proc,
           Max(src_med_rx)   AS src_med_rx,
           Max(src_phar_rx)  AS src_phar_rx
      FROM (SELECT prov.provider_npi AS npi,
                   pt.src_med_dx, 
                   pt.src_med_proc, 
                   pt.src_med_rx,
                   pt.src_phar_rx
              FROM coh_raven_provider prov
                   JOIN coh_pt pt
                     ON pt.patient_id = prov.patient_id
             WHERE prov.provider_npi IS NOT NULL
            UNION
            SELECT npi,
                   0 AS src_med_dx, 
                   0 AS src_med_proc, 
                   0 AS src_med_rx,
                   1 AS src_phar_rx
              FROM coh_basic_raven_pharmacy
             WHERE npi IS NOT NULL)
     GROUP BY npi;
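
The `UNION` plus `Max(...)` pattern above collapses many source rows into one row per NPI, with each flag set to 1 if any contributing row had it set. A minimal Python sketch of that aggregation follows. The sample rows are hypothetical, `max_flags_by_npi` is an illustrative helper, and the sketch assumes the flags are non-null 0/1 values.

```python
from collections import defaultdict

FLAGS = ("src_med_dx", "src_med_proc", "src_med_rx", "src_phar_rx")

def max_flags_by_npi(rows):
    """Collapse rows to one entry per NPI, keeping the maximum of each flag."""
    out = defaultdict(lambda: dict.fromkeys(FLAGS, 0))
    for row in rows:
        if row["npi"] is None:          # mirrors the IS NOT NULL filter
            continue
        agg = out[row["npi"]]
        for flag in FLAGS:
            agg[flag] = max(agg[flag], row[flag])
    return dict(out)

# Hypothetical medical rows (flags inherited from the patient) and pharmacy rows.
medical = [
    {"npi": "111", "src_med_dx": 1, "src_med_proc": 0, "src_med_rx": 0, "src_phar_rx": 0},
    {"npi": None,  "src_med_dx": 1, "src_med_proc": 1, "src_med_rx": 0, "src_phar_rx": 0},
]
pharmacy = [
    {"npi": "111", "src_med_dx": 0, "src_med_proc": 0, "src_med_rx": 0, "src_phar_rx": 1},
    {"npi": "222", "src_med_dx": 0, "src_med_proc": 0, "src_med_rx": 0, "src_phar_rx": 1},
]

npi_flags = max_flags_by_npi(medical + pharmacy)
```

Here NPI `111` ends up flagged for both `src_med_dx` and `src_phar_rx`, while `222` is flagged for `src_phar_rx` only.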

In [None]:
%%read_sql
--Review counts to confirm they look right
SELECT Count(*) AS row_cnt,
       Count(DISTINCT npi) AS npi_cnt,
       Sum(src_med_dx)     AS src_med_dx, 
       Sum(src_med_proc)   AS src_med_proc, 
       Sum(src_med_rx)     AS src_med_rx,
       Sum(src_phar_rx)    AS src_phar_rx
  FROM coh_npi;

## NPI patient link
Identify the link between NPIs and patients

**Input**
  * `coh_pt`
  * `coh_basic_raven_pharmacy`
  * `coh_raven_provider`

**Output**  
* `coh_npi_pt_link`

In [None]:
%%read_sql
--Identify unique NPI-patient pairs across all claims and flag the data sources each pair came from
DROP TABLE IF EXISTS coh_npi_pt_link;
CREATE TRANSIENT TABLE coh_npi_pt_link AS
    SELECT npi,
           patient_id,
           Max(src_med_dx)   AS src_med_dx,
           Max(src_med_proc) AS src_med_proc,
           Max(src_med_rx)   AS src_med_rx,
           Max(src_phar_rx)  AS src_phar_rx
      FROM (SELECT prov.provider_npi AS npi,
                   prov.patient_id,
                   pt.src_med_dx, 
                   pt.src_med_proc, 
                   pt.src_med_rx,
                   pt.src_phar_rx
              FROM coh_raven_provider prov
                   JOIN coh_pt pt
                     ON pt.patient_id = prov.patient_id
             WHERE prov.provider_npi IS NOT NULL
            UNION
            SELECT npi,
                   patient_id,
                   0 AS src_med_dx, 
                   0 AS src_med_proc, 
                   0 AS src_med_rx,
                   1 AS src_phar_rx
              FROM coh_basic_raven_pharmacy
             WHERE npi IS NOT NULL)
     GROUP BY npi, patient_id;

In [None]:
%%read_sql
--Review counts to confirm they look right
SELECT Count(*)                   AS row_cnt, 
       Count(DISTINCT npi)        AS npi_cnt, 
       Count(DISTINCT patient_id) AS pt_cnt, 
       Sum(src_med_dx)            AS src_med_dx, 
       Sum(src_med_proc)          AS src_med_proc, 
       Sum(src_med_rx)            AS src_med_rx, 
       Sum(src_phar_rx)           AS src_phar_rx 
  FROM coh_npi_pt_link; 
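
Since `coh_npi_pt_link` is grouped by `npi, patient_id` and is built from the same sources and filters as `coh_npi`, one would expect no duplicate pairs and the same set of NPIs in both tables. A minimal sketch of those two checks follows. `check_link_counts` and the sample rows are hypothetical, and the rows are assumed to have been fetched into Python as dictionaries.

```python
def check_link_counts(link_rows, npi_rows):
    """Return a list of problems found when comparing the two outputs."""
    pairs = [(r["npi"], r["patient_id"]) for r in link_rows]
    problems = []
    # Grouping by (npi, patient_id) should leave each pair exactly once.
    if len(pairs) != len(set(pairs)):
        problems.append("duplicate (npi, patient_id) pairs")
    # Both tables should cover the same providers.
    link_npis = {npi for npi, _ in pairs}
    table_npis = {r["npi"] for r in npi_rows}
    if link_npis != table_npis:
        problems.append("npi sets differ between coh_npi_pt_link and coh_npi")
    return problems

# Hypothetical rows.
link = [{"npi": "111", "patient_id": "a"}, {"npi": "111", "patient_id": "b"},
        {"npi": "222", "patient_id": "a"}]
npis = [{"npi": "111"}, {"npi": "222"}]

ok = check_link_counts(link, npis)
bad = check_link_counts(link + [{"npi": "333", "patient_id": "c"}], npis)
```

An empty list means both expectations hold. In the second call the extra NPI `333` is reported as a mismatch.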

# Drop Tables
Drop intermediate tables that are no longer needed

In [None]:
%%read_sql
DROP TABLE IF EXISTS COH_RAVEN_PROVIDER;