# Pre-Analysis



Version 1.0 \| First Created September 18, 2023 \| Updated November 2023

## Jupyter Notebook

This is a Jupyter Notebook document. For more details on using a Jupyter Notebook, see <https://docs.jupyter.org/en/latest/>.



# A Space-Time Accessibility Analysis of Pharmacy Care in Vermont

### Authors

- Sam Roubin\*, sroubin@middlebury.edu, @samroubin, https://orcid.org/0009-0005-5490-3744, Middlebury College
- Joseph Holler, jholler@middlebury.edu, @josephholler, https://orcid.org/0000-0002-2381-2699, Middlebury College

\* Corresponding author and creator



### Abstract

Pharmacy care is a fundamental aspect of primary healthcare and is gaining greater recognition in the healthcare landscape. Patient-pharmacist interactions have recently evolved beyond the conventional role of medication dispensing and have embraced a more patient-centric medication management role. Recent research has underscored the importance of pharmacy access in primary care, as patients tend to visit pharmacies at higher rates than primary care providers and pharmacies are particularly valuable for reaching rural populations.

Our research aims to measure the spatial variation in access to pharmacy care across the state of Vermont by adapting an established healthcare accessibility model based on the enhanced two-step floating catchment area (E2SFCA) method. Specifically, it seeks to identify areas of the state that have particularly limited access to pharmacies. This research is temporally explicit, as it will analyze variation in accessibility at specific times of the day and week. This temporal granularity shows how spatial accessibility varies at irregular times, improving our understanding of when pharmacy care may be particularly limited for certain populations. Previous studies have used the two-step floating catchment area (2SFCA) and E2SFCA methods to measure spatial accessibility of pharmacy care across geographic scales and regions (e.g. at both the state and national levels). However, no prior studies have done so in Vermont. The results of this study are of interest to both service providers and Vermont state public health officials and may have important implications for public health planning in the state.

This study adapts and extends the methodology used in Kang et al.'s (2020) "Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA" to measure spatial accessibility of pharmacy care across Vermont. 



### Study metadata - Answer below for final maps

- `Key words`: Pharmacy accessibility, spatial accessibility, pharmacy deserts, spatial analysis, pharmacy care, Vermont healthcare, rural healthcare, pharmacy services, Vermont public health, E2SFCA.
- `Subject`: Medicine and Health Sciences: Public Health: Health Services Research OR Social and Behavioral Sciences: Geography: Spatial Science
- `Date created`: 9/18/2023
- `Date modified`: 11/8/2023
- `Spatial Coverage`: The state of Vermont and areas within 10 miles of the Vermont border (MA, NY, NH). Canada is excluded.
- `Spatial Resolution`: Municipalities, which are roughly the same size as ZIP codes and are politically meaningful. The smallest spatial units of data will likely be hexagonal grid cells of around 5 km, as the smallest distance between ZIP code centroids was around 4 km. Additionally, census tract data may be used to include the Social Vulnerability Index (SVI) in our analysis.
- `Spatial Reference System`: EPSG 6589
- `Temporal Coverage`: The temporal extent is not explicitly determined. Pharmacies have been asked to provide their staffing levels over the period of roughly a month. The results from the study will theoretically be based on a single point in time in the Fall of 2023. This is a one-time measurement, and the study does not investigate change over time.
- `Temporal Resolution`: The temporal resolution of this study varies. It will range from a period of a couple of hours within a week to whole days or an entire work week. Spatial access will be assessed and compared across specific hours of the day, between weekdays and Saturdays, weekdays and Sundays, and other combinations of these temporal units.
- `Funding Name`: NA
- `Funding Title`: NA
- `Award info URI`: NA
- `Award number`: NA

## Study design

Our study is original research that adapts the method set forth by Kang et al. (2020) to measure spatial accessibility to COVID-19 healthcare resources. Specifically, we will adapt the supply variables and change our geographic area of interest but will largely follow the same methodology, with minor relevant changes for our analysis.

Holler (include OSF link) previously reproduced the Kang et al. (2020) study.

General research question: This study is largely a descriptive model that characterizes access to pharmacy care across the state. Importantly, the model will be run in a temporally explicit manner to describe how access varies during a given week. A primary driver of this study was that rural populations across Vermont experience increased difficulty in . We hypothesize that rural places have more significantly limited access to pharmacy care, especially outside of normal 9 am to 5 pm, Monday through Friday business hours.

Beyond this hypothesis, the study is largely descriptive, and several specific research questions arise from studying temporally explicit spatial accessibility of pharmacy care in Vermont. We are primarily interested in whether certain areas of the state have particularly limited access to pharmacy care. We are also interested in how access varies across certain times of the day, days of the week, or a combination of these. Are there gaps in spatial accessibility to pharmacy care in Vermont, especially at different times of the day or days of the week? This question can be answered with a temporally explicit model that creates a series of maps demonstrating varying accessibility.

The specific time intervals for which we will measure accessibility have not yet been determined. However, it is likely that we will run the model for weekdays, Saturdays, and Sundays generally, and then during time periods that are likely to pose accessibility issues to patients, such as early mornings (i.e. before 9 am), lunch hours, and evenings (i.e. after 6 pm). Running this model for explicit time periods increases the granularity of our analysis and may drive a more comprehensive understanding of the spatial accessibility to pharmacy care across Vermont. 
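One way to make the supply side temporally explicit is sketched below. It is a minimal illustration only, assuming a hypothetical schedule table with one row per pharmacy and day type. The column names (`day_type`, `open_hour`, `close_hour`, `staff`), the sample values, and the `supply_in_window` helper are assumptions made for this example, not the study's data or final method. Staffing is weighted by the share of the time window during which each pharmacy is open.

```python
import pandas as pd

# Hypothetical pharmacy schedule: one row per pharmacy and day type.
# Column names and values are illustrative, not the study's data.
schedule = pd.DataFrame({
    "pharmacy_id": [1, 1, 2, 2, 3],
    "day_type": ["weekday", "saturday", "weekday", "saturday", "weekday"],
    "open_hour": [8, 9, 9, 9, 7],
    "close_hour": [20, 17, 18, 13, 19],
    "staff": [4, 2, 3, 1, 5],  # staff on duty while open
})


def supply_in_window(schedule, day_type, start_hour, end_hour):
    """Return staff per pharmacy, weighted by the share of the window
    [start_hour, end_hour) during which the pharmacy is open."""
    day = schedule[schedule["day_type"] == day_type].copy()
    # Hours of overlap between opening hours and the window, floored at 0.
    overlap = (
        day["close_hour"].clip(upper=end_hour)
        - day["open_hour"].clip(lower=start_hour)
    ).clip(lower=0)
    day["supply"] = day["staff"] * overlap / (end_hour - start_hour)
    return day.set_index("pharmacy_id")["supply"]


# Weekday evening window, 6 pm to 9 pm.
evening = supply_in_window(schedule, "weekday", 18, 21)
print(evening)
```

Running the accessibility model once per window, with `supply` swapped in for each run, would give one map per time period. A pharmacy that is closed for the whole window contributes zero supply rather than being dropped, which keeps the set of locations constant across runs.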

The model is based on the Enhanced 2-Step Floating Catchment Area (E2SFCA) method, first developed by Luo and Qi (2009) to measure spatial accessibility to primary care physicians. Since then, this method has been widely adopted to measure spatial accessibility to a wide range of services, though it appears to be used primarily to study access to healthcare resources.
Background on the E2SFCA. Mention how we plan to adapt it. 
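As background, the two steps of the E2SFCA can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the implementation this study will use: the travel-time matrix, populations, and staffing values are made up, the function name `e2sfca` is hypothetical, and the zone breaks (10, 20, 30 minutes) and distance-decay weights are placeholder values that would need to be chosen and justified for Vermont. Step 1 computes a supply-to-demand ratio for each pharmacy using distance-weighted population within its catchment. Step 2 sums the distance-weighted ratios of all pharmacies reachable from each population unit.

```python
import numpy as np


def e2sfca(travel_time, population, supply,
           breaks=(10, 20, 30), weights=(1.0, 0.68, 0.22)):
    """Enhanced 2SFCA sketch.

    travel_time: (n_pop, n_supply) travel times in minutes.
    population:  (n_pop,) demand at each population unit.
    supply:      (n_supply,) capacity at each pharmacy (e.g. staff).
    breaks/weights: illustrative zone limits and decay weights.
    """
    travel_time = np.asarray(travel_time, dtype=float)
    population = np.asarray(population, dtype=float)
    supply = np.asarray(supply, dtype=float)

    # Look up the weight of each pair's travel-time zone; 0 beyond the last break.
    zone = np.searchsorted(breaks, travel_time, side="left")
    w = np.append(np.asarray(weights, dtype=float), 0.0)[zone]

    # Step 1: supply-to-demand ratio for each pharmacy.
    demand = w.T @ population
    ratio = np.divide(supply, demand, out=np.zeros_like(demand), where=demand > 0)

    # Step 2: sum weighted ratios reachable from each population unit.
    return w @ ratio


# Made-up example: 3 population units, 2 pharmacies.
travel_time = np.array([[5, 25], [15, 15], [40, 8]])
population = np.array([1000, 2000, 500])
supply = np.array([4, 2])

access = e2sfca(travel_time, population, supply)
print(access)
```

A useful sanity check on any implementation is that the population-weighted sum of the accessibility scores equals total supply whenever every pharmacy has at least some population in its catchment.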

Although the full scope of our analysis is not yet settled, we are also interested in the relationship between social vulnerability and pharmacy access, as well as the impact that independent versus chain pharmacies have on the accessibility landscape. Considering Vermont's aging population and the elderly's increased reliance on pharmaceuticals, we are interested in the relationship between the presence of elderly populations and pharmacy access, which is likely correlated with SVI because age is taken into account in this measure of vulnerability.

Specific statistical tests? Only to test hypothesis. 
- Simple t-test: is there a difference between urban and rural accessibility to pharmacy care, maybe weighted by population?
- Metropolitan, micropolitan, and rural: test whether there is a statistically significant difference between groups. We plan to compare access across these groups at various time extents, presuming that there will be more extreme differences at the extremes of business hours. These classifications are by town, which is another justification for using towns as the geographic unit for final maps.
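The urban versus rural comparison could look something like the sketch below. It computes an unweighted Welch's t statistic (a t-test that does not assume equal variances) with NumPy only. This is one possible choice, not a settled part of the design: the accessibility scores are made up, the `welch_t` helper is hypothetical, and population weighting is not handled here. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` would return the same statistic together with a p-value.

```python
import numpy as np


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard error of each group mean.
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, df


# Made-up accessibility scores per town (illustrative only).
urban = [0.0031, 0.0028, 0.0035, 0.0030, 0.0027]
rural = [0.0012, 0.0019, 0.0008, 0.0015, 0.0011, 0.0017]

t_stat, dof = welch_t(urban, rural)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Repeating the same comparison for each time window would show whether the gap between groups widens outside regular business hours, although running many such tests raises the multiple-testing concern noted under threats to validity.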



## Materials and procedure

## Computational environment

Similar to Kang et al. (2020), this study was run using CyberGIS-Jupyter. 

Please refer to `00-Python-environment-setup.ipynb` for details.



In [None]:
# Import modules, define directories
import numpy as np
import pandas as pd
import geopandas as gpd
import networkx as nx
import osmnx as ox
import re
from shapely.geometry import Point, LineString, Polygon
import matplotlib.pyplot as plt
from tqdm import tqdm
import multiprocessing as mp
import folium
import itertools
import os
import time
import warnings
import IPython
from IPython.display import display, clear_output

warnings.filterwarnings("ignore")
print('\n'.join(f'{m.__name__}=={m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))



### Data and variables

Three main datasets will be used for this study, only one of which includes primary data: 1) a retail pharmacy dataset, including the locations, hours of operation, and staffing levels of each retail pharmacy; 2) a residential dataset; and 3) a road network dataset. The list of active pharmacies was sourced from the Vermont Office of Professional Regulation (OPR) and the Homeland Infrastructure Foundation-Level Data (HIFLD) provided by the Department of Homeland Security (DHS). A list of all pharmacies within the study area was compiled by aggregating these data sources and checking them against Google Maps and OpenStreetMap. By conducting surveys, we verified the publicly available data regarding operational hours and gathered information on staffing levels. The residential dataset is sourced from the United States Census Bureau and contains information on the populations living in given census tracts and ZIP codes (?), as well as the demographics of those tracts. The road network dataset is pulled from OpenStreetMap (OSM) using the OSMnx package in Python.

The sole primary data source for the study is the data we collected on pharmacies; the residential dataset and the road network data are secondary data sources.
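The aggregation of the two pharmacy lists can be illustrated with a small pandas sketch. Everything in it is hypothetical: the rows, the column names, and the choice to match on a normalized address string are assumptions made for the example, and real records would likely need more careful address cleaning or geocoding before matching. The `indicator=True` option of `DataFrame.merge` adds a `_merge` column that flags records found in only one source, which are the ones to check manually against Google Maps and OpenStreetMap.

```python
import pandas as pd

# Hypothetical extracts from the two sources; values are illustrative only.
opr = pd.DataFrame({
    "name": ["Main St Pharmacy", "Lakeside Drug", "Valley Rx"],
    "address": ["12 Main St, Bristol", "4 Shore Rd, Newport", "80 River Rd, Barre"],
})
hifld = pd.DataFrame({
    "name": ["MAIN ST PHARMACY", "Valley Rx ", "Border Pharmacy"],
    "address": ["12 main st, bristol", "80 River Rd, Barre", "5 Elm St, Lebanon"],
})


def add_key(df):
    """Add a normalized address column to use as a rough match key."""
    out = df.copy()
    out["key"] = out["address"].str.strip().str.lower()
    return out


# Outer join keeps every record; `_merge` records which source(s) it came from.
merged = add_key(opr).merge(
    add_key(hifld),
    on="key",
    how="outer",
    suffixes=("_opr", "_hifld"),
    indicator=True,
)

# Records present in only one source need manual verification.
needs_review = merged[merged["_merge"] != "both"]
print(merged[["key", "_merge"]])
```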


#### Pharmacy Dataset


**Standard Metadata**

- `Abstract`: Brief description of the data source
- `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.
- `Spatial Resolution`: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or distance of a raster GRID size
- `Spatial Reference System`: Specify the geographic or projected coordinate system for the study
- `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.
- `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations
- `Lineage`: Describe and/or cite data sources and/or methodological steps planned to create this data source.
  - sampling scheme, including spatial sampling
  - target sample size and method for determining sample size
  - stopping criteria for data collection and sampling (e.g. sample size, time elapsed)
  - de-identification / anonymization
  - experimental manipulation
- `Distribution`: Describe who will make the data available and how?
- `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights*
- `Data Quality`: State any planned quality assessment
- `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
  - `Label`: variable name as used in the data or code
  - `Alias`: intuitive natural language name
  - `Definition`: Short description or definition of the variable. Include measurement units in description.
  - `Type`: data type, e.g. character string, integer, real
  - `Accuracy`: e.g. uncertainty of measurements
  - `Domain`: Expected range of Maximum and Minimum of numerical data, or codes or categories of nominal data, or reference to a standard codebook
  - `Missing Data Value(s)`: Values used to represent missing data and frequency of missing data observations
  - `Missing Data Frequency`: Frequency of missing data observations: not yet known for data to be collected

| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| variable1 | ... | ... | ... | ... | ... | ... | ... |
| variable2 | ... | ... | ... | ... | ... | ... | ... |



#### Residential Dataset

Think about attribute table for the pharmacy data.

**Standard Metadata**

- `Abstract`: Brief description of the data source
- `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.
- `Spatial Resolution`: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or distance of a raster GRID size
- `Spatial Reference System`: Specify the geographic or projected coordinate system for the study
- `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.
- `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations
- `Lineage`: Describe and/or cite data sources and/or methodological steps used to create this data source
- `Distribution`: Describe how the data is distributed, including any persistent identifier (e.g. DOI) or URL for data access
- `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights*
- `Data Quality`: State result of quality assessment or state "Quality unknown"
- `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
  - `Label`: variable name as used in the data or code
  - `Alias`: intuitive natural language name
  - `Definition`: Short description or definition of the variable. Include measurement units in description.
  - `Type`: data type, e.g. character string, integer, real
  - `Accuracy`: e.g. uncertainty of measurements
  - `Domain`: Range (Maximum and Minimum) of numerical data, or codes or categories of nominal data, or reference to a standard codebook
  - `Missing Data Value(s)`: Values used to represent missing data and frequency of missing data observations
  - `Missing Data Frequency`: Frequency of missing data observations

| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| variable1 | ... | ... | ... | ... | ... | ... | ... |
| variable2 | ... | ... | ... | ... | ... | ... | ... |

#### OpenStreetMap Road Network 

**Standard Metadata**

- `Abstract`: Brief description of the data source
- `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.
- `Spatial Resolution`: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or distance of a raster GRID size
- `Spatial Reference System`: Specify the geographic or projected coordinate system for the study
- `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.
- `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations
- `Lineage`: Describe and/or cite data sources and/or methodological steps used to create this data source
- `Distribution`: Describe how the data is distributed, including any persistent identifier (e.g. DOI) or URL for data access
- `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights*
- `Data Quality`: State result of quality assessment or state "Quality unknown"
- `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
  - `Label`: variable name as used in the data or code
  - `Alias`: intuitive natural language name
  - `Definition`: Short description or definition of the variable. Include measurement units in description.
  - `Type`: data type, e.g. character string, integer, real
  - `Accuracy`: e.g. uncertainty of measurements
  - `Domain`: Range (Maximum and Minimum) of numerical data, or codes or categories of nominal data, or reference to a standard codebook
  - `Missing Data Value(s)`: Values used to represent missing data and frequency of missing data observations
  - `Missing Data Frequency`: Frequency of missing data observations

| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| variable1 | ... | ... | ... | ... | ... | ... | ... |
| variable2 | ... | ... | ... | ... | ... | ... | ... |



#### Secondary data source2 name

... same form as above...



### Prior observations  

Prior experience with the study area, prior data collection, or prior observation of the data can compromise the validity of a study, e.g. through p-hacking.
Therefore, disclose any prior experience or observations at the time of study pre-registration here, with example text below:

At the time of this study pre-registration, the authors had _____ prior knowledge of the geography of the study region with regards to the ____ phenomena to be studied.
This study is related to ____ prior studies by the authors

For each primary data source, declare the extent to which authors had already engaged with the data:

- [ ] no data collection has started
- [ ] pilot test data has been collected
- [ ] data collection is in progress and data has not been observed
- [ ] data collection is in progress and __% of data has been observed
- [ ] data collection is complete and data has been observed. Explain how authors have already manipulated / explored the data.

For each secondary source, declare the extent to which authors had already engaged with the data:

- [ ] data is not available yet
- [ ] data is available, but only metadata has been observed
- [ ] metadata and descriptive statistics have been observed
- [ ] metadata and a pilot test subset or sample of the full dataset have been observed
- [ ] the full dataset has been observed. Explain how authors have already manipulated / explored the data.

If pilot test data has been collected or acquired, describe how the researchers observed and analyzed the pilot test, and the extent to which the pilot test influenced the research design.



### Bias and threats to validity

Boundary effects, modifiable areal unit problem --> units for SVI don't necessarily align with the areas served by the pharmacies

Boundary effects solved by using road networks and population data 

Given the research design and primary data to be collected and/or secondary data to be used, discuss common threats to validity and the approach to mitigating those threats, with an emphasis on geographic threats to validity.

These include:
  - uneven primary data collection due to geographic inaccessibility or other constraints
  - multiple hypothesis testing
  - edge or boundary effects
  - the modifiable areal unit problem
  - nonstationarity
  - spatial dependence or autocorrelation
  - temporal dependence or autocorrelation
  - spatial scale dependency
  - spatial anisotropies
  - confusion of spatial and a-spatial causation
  - ecological fallacy
  - uncertainty e.g. from spatial disaggregation, anonymization, differential privacy

Estimating the number of pharmacists and pharmacy technicians based on budgeted hours. Potentially missing pharmacy locations outside of Vermont.


### Data transformations

Describe all data transformations planned to prepare data sources for analysis.
This section should explain with the fullest detail possible how to transform data from the **raw** state at the time of acquisition or observation, to the pre-processed **derived** state ready for the main analysis.
Including steps to check and mitigate sources of **bias** and **threats to validity**.
The method may anticipate **contingencies**, e.g. tests for normality and alternative decisions to make based on the results of the test.
More specifically, all the **geographic** and **variable** transformations required to prepare input data as described in the data and variables section above to match the study's spatio-temporal characteristics as described in the study metadata and study design sections.
Visual workflow diagrams may help communicate the methodology in this section.

Examples of **geographic** transformations include coordinate system transformations, aggregation, disaggregation, spatial interpolation, distance calculations, zonal statistics, etc.

Examples of **variable** transformations include standardization, normalization, constructed variables, imputation, classification, etc.

Be sure to include any steps planned to **exclude** observations with *missing* or *outlier* data, to **group** observations by *attribute* or *geographic* criteria, or to **impute** missing data or apply spatial or temporal **interpolation**.



### Analysis

Describe the methods of analysis that will directly test the hypotheses or provide results to answer the research questions.
This section should explicitly define any spatial / statistical *models* and their *parameters*, including *grouping* criteria, *weighting* criteria, and *significance thresholds*.
Also explain any follow-up analyses or validations.

SVI correlations, independent versus chain pharmacies. 
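As a rough illustration of the SVI comparison, the sketch below computes a Spearman rank correlation between SVI and accessibility as the Pearson correlation of the ranked values, using pandas only. The town-level values and column names are made up, and the choice of a rank correlation is an assumption for this example rather than a decision recorded in this plan.

```python
import pandas as pd

# Made-up town-level values (illustrative only).
towns = pd.DataFrame({
    "svi": [0.10, 0.35, 0.50, 0.72, 0.90],
    "access": [0.0030, 0.0026, 0.0021, 0.0024, 0.0009],
})

# Spearman's rho equals the Pearson correlation of the ranks.
rho = towns["svi"].rank().corr(towns["access"].rank())
print(f"Spearman rho = {rho:.2f}")
```

A negative value would indicate that more vulnerable towns tend to have lower accessibility scores. Because SVI is reported by census tract while accessibility may be mapped by town or hexagon, the two variables would first need to be brought to a common spatial unit.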

## Results

Describe how results are to be presented.



## Discussion

Describe how the results are to be interpreted *vis a vis* each hypothesis or research question.



## Integrity Statement

Include an integrity statement - The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.
If a prior registration *does* exist, explain the rationale for revising the registration here.



# Acknowledgements

- `Funding Name`: name of funding for the project
- `Funding Title`: title of project grant
- `Award info URI`: web address for award information
- `Award number`: award number

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)

## References