# Procedures
Tidy tables of the patients' custom procedure groupings identified from all extracted medical claims. All tables start with `cld_proc`.

**Script**
* [scripts/cld/procedures.ipynb](./scripts/cld/procedures.ipynb)

**Prior Script(s)**
* [scripts/de/raven_procedure.ipynb](./scripts/de/raven_procedure.ipynb)

**Parameters**
* `in/cld/procedures.xlsx[param]`
* `in/cld/procedures.xlsx[ref]`: `cld_proc_ref`

**Input**
* `coh_pt`
* `de_raven_procedure`
  
**Output**  
* `cld_proc`

**Review**
* [scripts/cld/proc.html](./scripts/cld/proc.html)

In [None]:
#Import libraries for this notebook
import pandas as pd  
from drg_connect import Snowflake
import numpy as np
import pickle
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

#Load connection variables to connect_dict
with open('../../out/conn/connect_dict.pickle', 'rb') as handle:
    connect_dict = pickle.load(handle)

#Create engine to connect to snowflake
snow = Snowflake(role=connect_dict['role'],
                 warehouse=connect_dict['warehouse'],
                 database=connect_dict['database'],
                 schema=connect_dict['schema'])

#Finish engine setup
engine = snow.engine
%load_ext sql_magic
%config SQL.conn_name = 'engine'  #Set the sql_magic connection engine
%config SQL.output_result = True  #Enable output to std out
%config SQL.notify_result = False #disable browser notifications


# Reference
Upload the reference table used to identify the custom procedure groupings.

**Input**  
  * `in/cld/procedures.xlsx[ref]`

**Output**  
* `cld_proc_ref`
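
The cleaning applied to the reference sheet before upload can be tried on a toy frame first. This is a minimal sketch, assuming every column was read as a string (as with `dtype=str`); the helper name `clean_ref` and the sample rows are hypothetical and are not part of the real reference file. It passes `regex=True` explicitly because recent pandas versions treat `str.replace` patterns as literal text by default.

```python
import pandas as pd

# Hypothetical toy rows standing in for the `ref` sheet (not real reference data)
ref = pd.DataFrame({
    "code_type": [" cpt ", "icd10pcs"],
    "code": ["99.213 ", " 0dt-j4zz"],
})

def clean_ref(df):
    """Strip whitespace, uppercase the join columns, drop non-word characters from codes."""
    out = df.copy()
    # Strip surrounding whitespace from every (string) column
    for col in out.columns:
        out[col] = out[col].str.strip()
    # Uppercase the columns used for matching
    for col in ["code_type", "code"]:
        out[col] = out[col].str.upper()
    # Remove punctuation such as dots and dashes so codes match the claims format
    out["code"] = out["code"].str.replace(r"\W+", "", regex=True)
    return out

cleaned = clean_ref(ref)
print(cleaned)
```

Checking a few rows this way before calling `snow.upload_dataframe` makes it easier to spot codes that would silently fail to join.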

In [None]:
#Upload reference table from excel to snowflake and review snowflake output
df = pd.read_excel('../../in/cld/procedures.xlsx', sheet_name='ref', skiprows=4, dtype=str)

#Strip white space and make referable columns uppercase
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
df[['code_type','code']] =  \
    df[['code_type','code']].apply(lambda x: x.str.upper() if x.dtype == "object" else x)
#Remove non-word characters (dots, dashes) from codes; regex=True is required in pandas >= 2.0
df.code = df.code.str.replace(r'\W+', '', regex=True).astype('str')

#Upload to snowflake
snow.drop_table("cld_proc_ref")
snow.upload_dataframe(df,"cld_proc_ref")
snow.select("SELECT * FROM cld_proc_ref")
del df

# Procedure Grouping
Identify the patients' custom procedure groupings based on the inputs.

**Parameters**
  * NONE
  
**Input**
  * `coh_pt`
  * `de_raven_procedure`
  * `cld_proc_ref`
  
**Output**  
* `cld_proc`
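
The grouping logic is an inner join of the cohort to its procedure claims, then to the reference codes, followed by de-duplication. The sketch below mirrors that in pandas on invented toy data; the rows and values are hypothetical, and only the table and column names come from this notebook.

```python
import pandas as pd

# Hypothetical toy inputs; patient 3 is outside the cohort and should be dropped
coh_pt = pd.DataFrame({"patient_id": [1, 2]})
de_raven_procedure = pd.DataFrame({
    "patient_id": [1, 1, 3],
    "claim_id": ["a", "a", "b"],
    "year_of_service": [2020, 2020, 2021],
    "procedure": ["99213", "99213", "99213"],
})
cld_proc_ref = pd.DataFrame({
    "cat1": ["Visit"], "cat2": ["Office"], "cat3": ["Established"],
    "code_type": ["CPT"], "code": ["99213"],
    "description": ["toy description"], "source": ["toy"],
})

cld_proc = (
    coh_pt.merge(de_raven_procedure, on="patient_id")                # cohort patients only
          .merge(cld_proc_ref, left_on="procedure", right_on="code") # match on cleaned code
          .drop(columns="procedure")
          .drop_duplicates()                                         # same effect as the GROUP BY
          .reset_index(drop=True)
)
print(cld_proc)
```

Note that the match uses the code alone, so two code systems that share a code value would both match; whether that can occur depends on the reference file.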

In [None]:
%%read_sql
--Create custom procedure grouping table
DROP TABLE IF EXISTS cld_proc; 
CREATE TRANSIENT TABLE cld_proc AS
      SELECT coh.patient_id,
             proc.claim_id,
             proc.year_of_service,
             ref.cat1,
             ref.cat2, 
             ref.cat3,
             ref.code_type,
             ref.code,
             ref.description,
             ref.source
        FROM coh_pt coh
             JOIN de_raven_procedure proc
               ON coh.patient_id = proc.patient_id
             JOIN cld_proc_ref ref
               ON ref.code = proc.procedure
       GROUP BY coh.patient_id, proc.claim_id, proc.year_of_service, ref.cat1, ref.cat2, ref.cat3, 
                ref.code_type, ref.code, ref.description, ref.source

In [None]:
%%read_sql
--Review counts as a sanity check
SELECT Count(*) AS row_cnt,
       Count(Distinct patient_id) AS pt_cnt,
       Count(Distinct cat1) AS cat1_cnt,
       Count(Distinct cat2) AS cat2_cnt,
       Count(Distinct cat3) AS cat3_cnt
  FROM cld_proc;

In [None]:
%%read_sql
--Patient counts and share of cohort by cat1
SELECT cat1,
       Count(Distinct patient_id) AS pt_cnt,
       Count(Distinct patient_id) / (SELECT Count(*)
                                      FROM coh_pt) AS pct 
  FROM cld_proc
 GROUP BY cat1
 ORDER BY pt_cnt desc;

In [None]:
%%read_sql
--Patient counts and share of cohort by cat1 and cat2
SELECT cat1,
       cat2,
       Count(Distinct patient_id) AS pt_cnt,
       Count(Distinct patient_id) / (SELECT Count(*)
                                      FROM coh_pt) AS pct 
  FROM cld_proc
 GROUP BY cat1, cat2
 ORDER BY pt_cnt desc;

In [None]:
%%read_sql
--Patient counts and share of cohort by cat1, cat2 and cat3
SELECT cat1,
       cat2,
       cat3,
       Count(Distinct patient_id) AS pt_cnt,
       Count(Distinct patient_id) / (SELECT Count(*)
                                      FROM coh_pt) AS pct 
  FROM cld_proc
 GROUP BY cat1, cat2, cat3
 ORDER BY pt_cnt desc;

In [None]:
%%read_sql
--Patient counts and share of cohort by individual code
SELECT cat1,
       cat2,
       cat3,
       code_type,
       code,
       description,
       Count(Distinct patient_id) AS pt_cnt,
       Count(Distinct patient_id) / (SELECT Count(*)
                                      FROM coh_pt) AS pct 
  FROM cld_proc
 GROUP BY cat1, cat2, cat3, code_type, code, description
 ORDER BY pt_cnt desc;
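
The review queries above all follow one pattern: count distinct patients per grouping level and divide by the total cohort size. As a rough sketch, the same calculation in pandas could look like the following; the helper `patient_distribution` and the toy frames are hypothetical and only illustrate the arithmetic.

```python
import pandas as pd

# Hypothetical toy data: a cohort of four patients, three of whom have a grouped procedure
coh_pt = pd.DataFrame({"patient_id": [1, 2, 3, 4]})
cld_proc = pd.DataFrame({
    "patient_id": [1, 1, 2, 3],
    "cat1": ["A", "A", "A", "B"],
})

def patient_distribution(proc, cohort, by):
    """Distinct patients per group and their share of the whole cohort."""
    out = (
        proc.groupby(by, dropna=False)["patient_id"]  # keep null groups, as SQL GROUP BY does
            .nunique()
            .rename("pt_cnt")
            .reset_index()
            .sort_values("pt_cnt", ascending=False)
            .reset_index(drop=True)
    )
    # The denominator is the full cohort, not only patients with a procedure
    out["pct"] = out["pt_cnt"] / len(cohort)
    return out

dist = patient_distribution(cld_proc, coh_pt, ["cat1"])
print(dist)
```

Because the denominator is the whole cohort, the `pct` values across groups need not sum to 1: a patient can fall in several groups or in none.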