# Geography Medical
Geography of each unique patient, derived from the medical claims and using the claim closest to an index date

**Script**
* [scripts/pld/geography_medical.ipynb](./scripts/pld/geography_medical.ipynb)

**Prior Script(s)**
* [scripts/de/raven_patient.ipynb](./scripts/de/raven_patient.ipynb)

**Parameters**
* `in/pld/geography.xlsx[param]`

**Input**
* `coh_pt`
* `ref_db.analytics.zip3_geo`
* `de_raven_patient`

**Output**  
* `pld_geo_medical`

**Review**
* [scripts/pld/geography_medical.html](./scripts/pld/geography_medical.html)

```python
# Import libraries for this notebook
import pandas as pd
from drg_connect import Snowflake
import numpy as np
import pickle
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Load connection variables into connect_dict
with open('../../out/conn/connect_dict.pickle', 'rb') as handle:
    connect_dict = pickle.load(handle)

# Create the engine used to connect to Snowflake
snow = Snowflake(role=connect_dict['role'],
                 warehouse=connect_dict['warehouse'],
                 database=connect_dict['database'],
                 schema=connect_dict['schema'])

# Finish engine setup
engine = snow.engine
%load_ext sql_magic
%config SQL.conn_name = 'engine'  # Set the sql_magic connection engine
%config SQL.output_result = True  # Enable output to stdout
%config SQL.notify_result = False # Disable browser notifications
```




# Parameters
Import the index dates used to determine where each patient lives

**Input**
* `in/pld/geography.xlsx[param]`

**Output**
* Python variables named after each parameter, holding its value

```python
# Create Python variables from the Excel parameter sheet and review their values
input_df = pd.read_excel('../../in/pld/geography.xlsx', sheet_name='param', skiprows=4, dtype=str)
var_dict = dict(zip(input_df.parameter, input_df.value))
globals().update(var_dict)  # defines one variable per parameter, e.g. med_index_dt

# Check inputs
pd.DataFrame.from_dict(var_dict, orient='index')
```

| parameter | value |
|---|---|
| med_index_dt | 2018-01-01 |
| phar_index_dt | 2018-01-01 |
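
The parameter cell turns each `parameter`/`value` row of the sheet into a Python variable. A minimal sketch of the same mapping, assuming the sheet has the `parameter` and `value` columns read above; the in-memory frame stands in for the Excel file, and the date-parsing check is a hypothetical addition, not part of the original notebook:

```python
from datetime import date

import pandas as pd

# Stand-in for pd.read_excel('../../in/pld/geography.xlsx', sheet_name='param', skiprows=4, dtype=str)
input_df = pd.DataFrame(
    {"parameter": ["med_index_dt", "phar_index_dt"],
     "value": ["2018-01-01", "2018-01-01"]},
    dtype=str,
)

# Map each parameter name to its string value, as the notebook does
var_dict = dict(zip(input_df.parameter, input_df.value))

# Hypothetical check: every *_index_dt parameter should parse as an ISO date
index_dates = {
    key: date.fromisoformat(val)
    for key, val in var_dict.items()
    if key.endswith("_index_dt")
}

print(index_dates)
```

Parsing the values up front surfaces a malformed date before it is substituted into the SQL as `'{med_index_dt}'`.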


# Geography
Identify each patient's zip3 from the medical claim closest to the medical index date (`med_index_dt`)

```sql
%%read_sql
-- Number of days between the index date and each patient's closest claim
-- (only claims whose zip matches ref_db.analytics.zip3_geo are considered)
CREATE OR REPLACE TEMP TABLE tmp_zip_delta_min AS
    SELECT pt.patient_id,
           Min(abs(Datediff(d,'{med_index_dt}',pt.year_of_service))) AS delta
      FROM de_raven_patient pt
           JOIN ref_db.analytics.zip3_geo zip
             ON Trim(pt.member_adr_zip) = Trim(zip.zip3)
     GROUP BY pt.patient_id;
```

Query started at 11:21:20 AM Eastern Daylight Time; Query executed in 0.08 m

| status |
|---|
| Table TMP_ZIP_DELTA_MIN successfully created. |


```sql
%%read_sql
-- Date and zip3 of the claim(s) closest to the index date;
-- ties are broken by taking the minimum date and the minimum zip3 independently
CREATE OR REPLACE TEMP TABLE tmp_zip3 AS
    SELECT pt.patient_id,
           Min(pt.year_of_service) AS year_of_service,
           Min(geo.zip3) AS zip3
      FROM tmp_zip_delta_min zipmin
           JOIN de_raven_patient pt
             ON zipmin.patient_id = pt.patient_id
                AND abs(Datediff(d,'{med_index_dt}',pt.year_of_service)) = zipmin.delta
           JOIN ref_db.analytics.zip3_geo geo
             ON Trim(geo.zip3) = Trim(pt.member_adr_zip)
     GROUP BY pt.patient_id;
```

Query started at 11:21:24 AM Eastern Daylight Time; Query executed in 0.11 m

| status |
|---|
| Table TMP_ZIP3 successfully created. |
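
The two temp tables select, per patient, the claim(s) at the smallest absolute distance from `med_index_dt`, then collapse ties with `MIN`. A minimal pandas sketch of the same logic on hypothetical claim rows; it assumes `year_of_service` holds a date and omits the join to `ref_db.analytics.zip3_geo` and the `Trim` calls. Because `MIN(year_of_service)` and `MIN(zip3)` are computed independently, a patient with two equally close claims can receive the date of one claim and the zip3 of the other, as patient 1 shows:

```python
import pandas as pd

# Hypothetical claim-level rows standing in for de_raven_patient
claims = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "year_of_service": pd.to_datetime(
        ["2017-12-20", "2018-01-11", "2017-12-22", "2018-03-01", "2017-11-01"]),
    "member_adr_zip": ["100", "021", "606", "945", "945"],
})
med_index_dt = pd.Timestamp("2018-01-01")

# Absolute distance in days between each claim and the index date
claims["delta"] = (claims["year_of_service"] - med_index_dt).dt.days.abs()

# Step 1 (tmp_zip_delta_min): smallest distance per patient
delta_min = claims.groupby("patient_id", as_index=False)["delta"].min()

# Step 2 (tmp_zip3): keep claims at that distance, then take MIN date and MIN zip3 separately
closest = claims.merge(delta_min, on=["patient_id", "delta"])
tmp_zip3 = closest.groupby("patient_id", as_index=False).agg(
    year_of_service=("year_of_service", "min"),
    zip3=("member_adr_zip", "min"),
)
print(tmp_zip3)
```

Patient 1 has two claims 10 days from the index date (2017-12-22 in zip3 606 and 2018-01-11 in zip3 021), so the result pairs 2017-12-22 with 021.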


```sql
%%read_sql
-- Final zip3 and zip3-level geography for each cohort patient
DROP TABLE IF EXISTS pld_geo_medical;
CREATE TRANSIENT TABLE pld_geo_medical AS
    SELECT coh.patient_id,
           year_of_service,
           Datediff(d,year_of_service,'{med_index_dt}') AS index_delta,
           geo.fips,
           geo.county,
           geo.msa,
           geo.dma,
           geo.state,
           geo.state_abbr,
           geo.region
      FROM coh_pt coh
           JOIN tmp_zip3 zip3
             ON coh.patient_id = zip3.patient_id
           JOIN ref_db.analytics.zip3_geo geo
             ON geo.zip3 = zip3.zip3
```

Query started at 11:21:31 AM Eastern Daylight Time; Query executed in 0.02 m

Query started at 11:21:32 AM Eastern Daylight Time; Query executed in 0.08 m

| status |
|---|
| Table PLD_GEO_MEDICAL successfully created. |
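
`index_delta` is computed as `Datediff(d, year_of_service, '{med_index_dt}')`. In Snowflake, `DATEDIFF(day, start, end)` counts days from start to end, so a positive value means the selected claim falls before the index date and a negative value means it falls after. A minimal Python sketch of that sign convention, assuming `year_of_service` holds a date; the function name is hypothetical:

```python
from datetime import date


def index_delta(year_of_service: date, med_index_dt: date) -> int:
    """Days from the claim date to the index date, like Datediff(d, year_of_service, med_index_dt)."""
    return (med_index_dt - year_of_service).days


med_index_dt = date(2018, 1, 1)
before = index_delta(date(2017, 12, 22), med_index_dt)  # claim before index date -> positive
after = index_delta(date(2018, 1, 11), med_index_dt)    # claim after index date -> negative
print(before, after)
```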


# Summary
Summarize the output table for later review

```sql
%%read_sql
SELECT Count(*),
       Count(Distinct patient_id) AS pt_cnt,
       Count(Distinct year_of_service) AS dt_cnt,
       Count(Distinct fips) AS fips_cnt,
       Count(Distinct county) AS county_cnt,
       Count(Distinct msa) AS msa_cnt,
       Count(Distinct dma) AS dma_cnt,
       Count(Distinct state) AS state_cnt,
       Count(Distinct state_abbr) AS state_abbr_cnt,
       Count(Distinct region) AS region_cnt
  FROM pld_geo_medical
```

Query started at 11:21:37 AM Eastern Daylight Time; Query executed in 0.10 m

| COUNT(*) | pt_cnt | dt_cnt | fips_cnt | county_cnt | msa_cnt | dma_cnt | state_cnt | state_abbr_cnt | region_cnt |
|---|---|---|---|---|---|---|---|---|---|
| 2742663 | 2742663 | 1461 | 730 | 582 | 344 | 206 | 52 | 52 | 5 |
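
In the summary, `COUNT(*)` equals `pt_cnt` (2,742,663), i.e. `pld_geo_medical` has one row per patient. A minimal sketch of turning that observation into an explicit check on a pandas extract of the table; the function name and sample rows are hypothetical:

```python
import pandas as pd


def check_one_row_per_patient(df: pd.DataFrame) -> None:
    """Raise ValueError if any patient_id appears more than once (COUNT(*) != pt_cnt)."""
    row_cnt = len(df)
    pt_cnt = df["patient_id"].nunique()
    if row_cnt != pt_cnt:
        dupes = df.loc[df["patient_id"].duplicated(keep=False), "patient_id"].unique()
        raise ValueError(f"{row_cnt} rows but {pt_cnt} patients; duplicated ids: {list(dupes)}")


# Hypothetical extract of pld_geo_medical
sample = pd.DataFrame({"patient_id": [1, 2, 3], "state_abbr": ["NY", "MA", "IL"]})
check_one_row_per_patient(sample)  # passes silently
```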
