# Pre-Analysis



Version 1.0 \| First Created September 18, 2023 \| Updated November 2023

## Jupyter Notebook

This is a Jupyter Notebook document. For more details on using a Jupyter Notebook, see <https://docs.jupyter.org/en/latest/>.



# A Spatio-temporal Accessibility Analysis of Pharmacy Care in Vermont

### Authors

- Sam Roubin\*, sroubin@middlebury.edu, @samroubin, https://orcid.org/0009-0005-5490-3744, Middlebury College
- Joseph Holler, jholler@middlebury.edu, @josephholler, https://orcid.org/0000-0002-2381-2699, Middlebury College

\* Corresponding author and creator



### Abstract

Pharmacy care is a fundamental aspect of primary healthcare and is gaining greater recognition in the healthcare landscape. Recent research has underscored the importance of pharmacy access in primary care, as pharmacies are visited at higher rates than primary care providers and are particularly valuable for reaching rural populations. We aim to measure spatiotemporal variation in access to pharmacy care across the state of Vermont using the enhanced 2-step floating catchment area (E2SFCA) method. Specifically, we seek to identify areas that have particularly limited access to pharmacies. This research is temporally explicit, as it will analyze variation in accessibility at specific times of the day and days of the week. Such temporally granular data show how spatial accessibility varies at irregular times, improving our understanding of when pharmacy care is particularly limited for certain populations. Results will be depicted as a series of spatial accessibility maps, covering diverse temporal extents to capture variations in pharmacy working hours. Previous studies have used the two-step floating catchment area (2SFCA) and the E2SFCA method to measure spatial accessibility of pharmacy care across geographic scales and regions. To our knowledge, this is the first temporally explicit study of pharmacy care, and the first E2SFCA study of pharmacy accessibility in Vermont. The results of this study are of interest to service providers and Vermont state public health officials and may have important implications for public health planning in the state.

This study adapts and extends the methodology used in Kang et al.'s (2020) "Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA" to measure spatial accessibility of pharmacy care across Vermont. 



### Study metadata 

- `Key words`: Pharmacy, spatial accessibility, pharmacy deserts, Vermont, E2SFCA.
- `Subject`: Medicine and Health Sciences: Public Health: Health Services Research OR Social and Behavioral Sciences: Geography: Spatial Science
- `Date created`: 9/18/2023
- `Date modified`: 12/14/2023
- `Spatial Coverage`: The state of Vermont and areas within 10 miles of the Vermont border (MA, NY, NH). Canada is excluded.
- `Spatial Resolution`: Vermont municipalities. These units are roughly the same size as ZIP codes, but they represent politically meaningful geographic units.
- `Spatial Reference System`: EPSG 6589
- `Temporal Coverage`: Data was collected in October, November, and December of 2023. The temporal extent is not explicitly determined. Pharmacies have been asked to provide their staffing levels over the period of roughly a month. The results from the study will be theoretically based on a single point in time in the fall of 2023. This is a one-time measurement, and the study does not investigate change over time.
- `Temporal Resolution`: The temporal resolution of this study varies. Final maps will represent periods ranging from a couple of hours to whole days or an entire work week. Spatial access will be assessed and compared across specific hours of the day, between weekdays and Saturdays, weekdays and Sundays, and other combinations of these temporal units.

## Study design

Our study represents original research employing the Enhanced 2-Step Floating Catchment Area (E2SFCA) method to assess spatial and temporal dimensions of pharmacy accessibility across Vermont. Holler et al. (2022) previously [reproduced Kang et al.'s (2020) study](https://osf.io/n92v3/) on spatial accessibility of COVID-19 healthcare resources in Illinois, USA. Our research extends the foundational code from this reproduction study, making relevant changes for our analysis. While retaining the core methodology, we make geographic adjustments, such as modifying the geographic area of interest and spatial resolution of accessibility measurements. We also introduce temporal considerations for a more nuanced analysis of pharmacy access.

This study is an observational study that employs a descriptive model, based upon the E2SFCA method, first developed by Luo and Qi (2009) to measure spatial accessibility to primary care physicians. Since then, this method has been widely adopted to measure the spatial accessibility to a wide range of services; still, it appears to primarily be used to address access to healthcare resources.  
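The two steps of the E2SFCA method can be illustrated with a minimal sketch. It assumes three travel-time zones with illustrative decay weights; the zone edges, the weights, and all names and sample values below are placeholders for illustration, not the parameters or data of this study.

```python
# Illustrative E2SFCA sketch. Zone edges and weights are assumptions,
# not the parameters chosen for this study.
ZONES = [(10, 1.00), (20, 0.68), (30, 0.22)]  # (max travel minutes, weight)


def zone_weight(minutes):
    """Return the decay weight for a travel time, or 0 outside the catchment."""
    for max_minutes, weight in ZONES:
        if minutes <= max_minutes:
            return weight
    return 0.0


def e2sfca(supply, population, travel_minutes):
    """Compute an accessibility score for each population unit.

    supply: {pharmacy_id: staff count}
    population: {town_id: residents}
    travel_minutes: {(town_id, pharmacy_id): drive time in minutes}
    """
    # Step 1: supply-to-demand ratio for each pharmacy, with demand
    # weighted by the travel-time zone of each town.
    ratios = {}
    for j, s_j in supply.items():
        weighted_pop = sum(
            population[i] * zone_weight(travel_minutes[(i, j)])
            for i in population
            if (i, j) in travel_minutes
        )
        ratios[j] = s_j / weighted_pop if weighted_pop > 0 else 0.0

    # Step 2: for each town, sum the zone-weighted ratios of reachable pharmacies.
    return {
        i: sum(
            ratios[j] * zone_weight(travel_minutes[(i, j)])
            for j in supply
            if (i, j) in travel_minutes
        )
        for i in population
    }


# Hypothetical toy inputs.
supply = {"pharm_a": 2, "pharm_b": 1}
population = {"town_1": 1000, "town_2": 500}
travel_minutes = {
    ("town_1", "pharm_a"): 5,
    ("town_2", "pharm_a"): 25,
    ("town_2", "pharm_b"): 8,
}
access = e2sfca(supply, population, travel_minutes)
```

Because the same weights are applied in both steps, the population-weighted sum of the scores equals the total supply, which is a convenient sanity check on an implementation.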

The overarching objective of this study is to understand how access to pharmacy care varies spatiotemporally across Vermont. This investigation is primarily driven by the hypothesis that rural populations across Vermont experience increased difficulty in accessing pharmacy services, particularly beyond the conventional 9 am to 5 pm business hours on weekdays. We will conduct multiple runs of a spatial accessibility model at varying times of day and days of the week (temporally explicitly) in order to describe how spatial access varies throughout a given week. Generally, results will be depicted in a series of spatial accessibility maps. Three primary research questions and hypotheses arise from studying temporally explicit spatial accessibility of pharmacy care in Vermont.

1. Geographical Variation: Do certain areas of the state exhibit particularly limited access to pharmacy care?
    - H1: Rural towns have relatively limited access to pharmacy care when compared to micropolitan and metropolitan towns.
    - H1 Null: There is no significant difference in access to pharmacy care among Vermont towns classified as metropolitan, micropolitan, and rural.


2. Temporal Fluctuations: How does the accessibility to pharmacies fluctuate across different temporal segments of the day, various days of  the week, or combinations of these temporal dimensions? 
    - H2: The accessibility to all pharmacies varies across temporal segments, including different times of the day, days of the week, and combinations of these temporal dimensions, with more limited access outside of typical 9 am to 5 pm weekday business hours.
    - H2 Null: There is no significant difference in access between these temporal segments. 


3. Spatiotemporal: Presuming that we have observed differences in pharmacy accessibility in rural areas, are spatial differences in access exacerbated outside of conventional business hours? 
    - H3: Spatial differences in access will be exacerbated outside of the conventional weekday business hours, with larger differences arising at more extreme hours.
    
It is important to note that the classifications of metropolitan, micropolitan, and rural are based on town (county subdivision) geographical units, providing justification to use towns as the geographic units for the final maps in our study. Additionally, these units are politically meaningful, which also informed this decision.  

Our hypotheses are based on the primary care accessibility literature, which has demonstrated urban-rural differences in access. It has also been shown that rural residents are less likely than their metropolitan counterparts to have usual source of care providers available during nights and weekends (Kirby & Yabroff, 2020). To assess our hypothesis regarding diminished spatial accessibility to pharmacy services for rural populations, we may use the ANOVA test (potentially weighted by population) to compare accessibility across metropolitan, micropolitan, and rural towns. This statistical test is commonly used to analyze the difference between the means of more than two groups. We plan to use choropleth maps to illustrate the spatial variation in pharmacy access. Testing for statistically significant difference of means will also be used to determine whether accessibility varies temporally. However, it is decidedly more complex to run statistical tests that encapsulate the complexity of spatiotemporal variations in access to pharmacy care. As such, we may present our results in a more descriptive manner by comparing maps of accessibility over varying time extents for different geographic regions, or by summarizing accessibility measures in interesting categories of the state for these various time extents.
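As a rough illustration of the difference-of-means test described above, the sketch below computes an unweighted one-way ANOVA F statistic using only the standard library. The function name and the accessibility scores are hypothetical, and the population weighting mentioned above is not included; a p-value would additionally require the F distribution from a statistics package.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical accessibility scores for three town classes.
rural = [1.0, 2.0, 3.0]
micropolitan = [2.0, 3.0, 4.0]
metropolitan = [5.0, 6.0, 7.0]
f_stat, df_between, df_within = one_way_anova_f([rural, micropolitan, metropolitan])
```

A large F statistic relative to the F distribution with `df_between` and `df_within` degrees of freedom would argue against the H1 null.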

The specific time intervals for which we will measure accessibility have not yet been determined. However, it is likely that we will run the model for weekdays, Saturdays, and Sundays generally, and then during time periods that are likely to pose accessibility issues to patients, such as early mornings (i.e. before 9 am), lunch hours, and evenings (i.e. after 6 pm). This temporal granularity aims to establish a more detailed and comprehensive understanding of the spatial and temporal accessibility landscape of pharmacy care across the state of Vermont.
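One way to make the model temporally explicit is to recompute pharmacy supply for each moment of interest before running the accessibility model. The sketch below is a minimal illustration under stated assumptions: opening and lunch hours are stored as `(start, end)` tuples of `datetime.time`, supply is taken to be pharmacists plus technicians, and the `name` field and sample records are hypothetical. The `week_*` field names follow the pharmacy dataset variables.

```python
from datetime import time


def is_open(at, hours, lunch=None):
    """True if `at` falls inside opening hours and outside any lunch closure."""
    if hours is None:  # closed all day
        return False
    opens, closes = hours
    if not (opens <= at < closes):
        return False
    if lunch is not None:
        lunch_start, lunch_end = lunch
        if lunch_start <= at < lunch_end:
            return False
    return True


def supply_at(pharmacies, day, at):
    """Map pharmacy name to staff count at a moment; closed pharmacies get 0."""
    supply = {}
    for p in pharmacies:
        open_now = is_open(at, p.get(f"{day}_hours"), p.get(f"{day}_lunch"))
        # Assumption: supply is the sum of pharmacists and technicians.
        staff = p[f"{day}_pharm"] + p[f"{day}_tech"]
        supply[p["name"]] = staff if open_now else 0
    return supply


# Hypothetical records for illustration only.
pharmacies = [
    {"name": "Pharmacy A", "week_hours": (time(9), time(18)),
     "week_lunch": (time(13), time(13, 30)), "week_pharm": 2, "week_tech": 3},
    {"name": "Pharmacy B", "week_hours": (time(8), time(21)),
     "week_lunch": None, "week_pharm": 1, "week_tech": 1},
]
morning = supply_at(pharmacies, "week", time(10, 0))
lunch_time = supply_at(pharmacies, "week", time(13, 15))
evening = supply_at(pharmacies, "week", time(19, 0))
```

Passing `"sat"` or `"sun"` as `day` would select the corresponding weekend fields in the same way.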

Though there is some uncertainty in all that our analysis will entail, we are additionally interested in the relationship between social vulnerability and pharmacy access, as well as the impact that independent versus chain pharmacies have on the accessibility landscape. Considering Vermont's aging population and the elderly's increased reliance on pharmaceuticals, we are curious about the relationship between the presence of elderly populations and pharmacy access. To demonstrate these results, we envision creating scatterplots that compare percent elderly population in a town to the town's spatial accessibility measure and running linear regression tests weighted by population. It may alternatively be possible to categorize the percentage of elderly in three categories (i.e. high, medium, low) and test for significant difference of means of spatial accessibility between these groups. By investigating these relationships, our study aims to explore some factors associated with pharmacy accessibility in Vermont, offering insights with potential additional implications for public health planning and resource allocation.
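The population-weighted regression mentioned above could take the following form. This is a minimal sketch of a weighted least-squares line fit in closed form; the function name and the sample values are hypothetical, and it returns only the slope and intercept, not significance tests.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical towns: percent elderly, accessibility score, total population.
pct_elderly = [10.0, 20.0, 30.0]
accessibility = [4.0, 3.0, 2.0]
town_pop = [100, 2000, 500]
slope, intercept = weighted_linear_fit(pct_elderly, accessibility, town_pop)
```

Weighting by town population keeps very small towns from dominating the fitted relationship.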

## Materials and procedure

### Computational environment

Similar to Kang et al. (2020), this study was run using CyberGIS-Jupyter. This study will use an updated software environment from the reproduction study, using Python (3.11.4) instead of Python (3.7.6) Jupyter Notebooks in the CyberGISX environment. It consists of the software packages listed below. The research was completed by accessing the CyberGISX environment from the macOS operating system. Running Jupyter Notebooks in a cyberinfrastructure environment is recommended to gain the advantages of parallel computing.


```python
# Import modules, define directories
import numpy as np
import pandas as pd
import geopandas as gpd
import networkx as nx
import osmnx as ox
import re
from shapely.geometry import Point, LineString, Polygon
import matplotlib.pyplot as plt
from tqdm import tqdm
import multiprocessing as mp
import folium
import itertools
import os
import time
import warnings
import IPython
from IPython.display import display, clear_output

warnings.filterwarnings("ignore")
# Print the name and version of every imported module that reports one
print('\n'.join(f'{m.__name__}=={m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
```

```
numpy==1.22.0
pandas==1.3.5
geopandas==0.10.2
networkx==2.6.3
osmnx==1.1.2
re==2.2.1
folium==0.12.1.post1
IPython==8.3.0
```


### Data and variables

Three main datasets will be used for this study, only one of which includes primary data: 1) a retail pharmacy dataset, including the locations, hours of operations, and staffing levels of each retail pharmacy, 2) a population dataset, and 3) a road network dataset. The list of active pharmacies was sourced from the Vermont Office of Professional Regulation (OPR) and the Homeland Infrastructure Foundation-Level Data (HIFLD) provided by the Department of Homeland Security (DHS). A list of all pharmacies within the study area was compiled by aggregating these data sources and checking them against Google Maps and OpenStreetMap (OSM). By conducting surveys, we verified the publicly available data regarding operational hours and gathered information on staffing levels. The population dataset is sourced from the United States Census Bureau's American Community Survey and contains information on the populations living in Vermont towns, as well as the demographics of those towns. The road network dataset is pulled from OSM using the OSMnx package in Python.

The sole primary data source for the study is the data we collected on pharmacies, while the secondary data sources are the population dataset and the road network data.


#### Pharmacy Dataset


**Standard Metadata**

- `Abstract`: This dataset is primary data that was collected through surveying all VT pharmacies and pharmacies within 10 miles of the VT border about their staffing levels and hours of operations. It contains information on each pharmacy's location, hours of operations on weekdays, Saturdays, and Sundays, and the number of pharmacists and pharmacy technicians that typically work on a given day.
- `Spatial Coverage`: Vermont and areas within ten miles of the Vermont state border, encompassing parts of New York, Massachusetts, and New Hampshire. Canada was excluded.
- `Spatial Resolution`: This data specifies the coordinates of the pharmacy locations of interest. Resolution is not relevant.
- `Spatial Reference System`: EPSG 6589 in order to match the results maps.
- `Temporal Coverage`: Data was collected over the course of a couple of months, but it theoretically represents an average week in the fall or early winter of 2023.
- `Temporal Resolution`: A week. Differential data was collected for weekdays, Saturdays, and Sundays of a typical week.
- `Lineage`: Initial pharmacy list was downloaded from the Vermont Office of Professional Regulation (https://secure.professionals.vermont.gov/prweb/PRServletCustom/app/NGLPGuestUser/V9csDxL3sXkkjMC_FR2HrA*/!STANDARD) and modified to exclude no longer active pharmacies and non-retail pharmacies. The pharmacy locations were checked against Google Maps and a national pharmacy dataset from the Department of Homeland Security (DHS) to ensure all pharmacies in the state were included. Pharmacy locations outside of Vermont but within the study area were included through querying pharmacies from OSM in QGIS, Google Maps searches, and the DHS pharmacy dataset. Newly permanently closed pharmacies were omitted from the data. All collected data on hours of operations and staffing were manually inputted by the authors.
- `Distribution`: The majority of this dataset will be made public and downloadable on a GitHub repository. The staffing data will not be made public due to the proprietary nature of this information; however, the data may be available upon request. 
- `Constraints`: We have agreed not to publicly release staffing levels of the pharmacy locations due to the proprietary nature of this data for some of the larger pharmacy chains.
- `Data Quality`: 
- `Variables`: 

| Label | Definition | Type |
| :--: | :--: | :--: |
| x coordinate | GPS coordinates. Lon | Decimal |
| y coordinate | GPS coordinates. Lat | Decimal |
| address | pharmacy street address | Text |
| week_hours | hours of operations on weekdays | Datetime |
| week_lunch | lunch break hours on weekdays | Datetime |
| sat_hours | hours of operations on Saturdays | Datetime |
| sat_lunch | lunch break hours on Saturdays | Datetime |
| sun_hours | hours of operations on Sundays | Datetime |
| sun_lunch | lunch break hours on Sundays | Datetime |
| week_pharm | Typical # pharmacists on weekdays | Integer |
| week_tech | Typical # pharm. techs on weekdays | Integer |
| sat_pharm | Typical # pharmacists on Saturdays | Integer |
| sat_tech | Typical # pharm. techs on Saturdays | Integer |
| sun_pharm | Typical # pharmacists on Sundays | Integer |
| sun_tech | Typical # pharm. techs on Sundays | Integer |


  - `Accuracy`: We are confident in the accuracy of the pharmacy list in Vermont. Our confidence in the out-of-state retail pharmacy locations is slightly diminished since these pharmacy locations are not based on a recent dataset from the respective states. We have confidence in the quality of the data on pharmacy staffing and hours of operations since we collected the data through surveying all pharmacies.  
  - `Domain`: The expected range of all of the pharmacy operational hours is from roughly 8 am to 9 pm. The expected range of pharmacists and technicians is from one to ten for weekdays, Saturdays, and Sundays.
  - `Missing Data Value(s)`: For any missing data on the number of pharmacists and pharmacy technicians at a given pharmacy location, the number of pharmacists and technicians will either be estimated based on similar pharmacies' data or uniformly given a value of one pharmacist and one technician. For missing data on hours of operations from surveying, the posted hours online will be used. 
  - `Missing Data Frequency`: Not yet known for data to be collected. We expect there to be some pharmacy locations from which we are unable to collect data. 





#### Population Dataset

**Standard Metadata**

- `Abstract`: Secondary data pulled from the US Census Bureau's American Community Survey (ACS), containing information on demographics of county subdivisions (towns). 
- `Spatial Coverage`: Vermont. 
- `Spatial Resolution`: County subdivisions (i.e. towns) 
- `Spatial Reference System`: NA
- `Temporal Coverage`: 2018-2022 ACS (5-year estimates)
- `Temporal Resolution`: NA. 
- `Lineage`: This data is downloaded from Social Explorer ACS 2018-2022. Only total population and age variables will be selected, as these are the only demographic data necessary for our study. The data on age is reported as cohorts. All cohorts aged 65 and older will be combined to create a percent value for the elderly population in each town, as elderly is commonly defined as 65 years or older.
- `Distribution`: This data is downloadable from the US Census Bureau ACS page (https://data.census.gov/table?g=040XX00US50$0600000) or from Social Explorer Tables (https://www.socialexplorer.com/tables/ACS2022_5yr/R13548224)
- `Constraints`: US Census Bureau data is publicly accessible. Social Explorer requires a license but is not necessary; it solely serves to streamline the data cleaning process, as only variables of interest can be selected.
- `Data Quality`: Census provides data on margins of error. 
- `Variables`: 

| Label |  Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
| :--: |  :--: | :--: | :--: | :--: | :--: | :--: |
| total_pop |  Total population in a town | Integer | Defined by Census | 0 - ~45,000 | NA | NA |
| percent_elderly |  Percent elderly residents in a town | Decimal | Defined by Census |  0% - ~50%  | NA | NA |
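The cohort aggregation described in the `Lineage` entry can be sketched as follows. The cohort column names are hypothetical stand-ins for the ACS age variables, and the sample town is invented for illustration.

```python
# Hypothetical column names for the ACS age cohorts aged 65 and older.
ELDERLY_COHORTS = ["age_65_74", "age_75_84", "age_85_plus"]


def percent_elderly(row):
    """Percent of a town's residents aged 65 or older, or None if unpopulated."""
    total = row["total_pop"]
    if total == 0:
        return None
    return 100 * sum(row[c] for c in ELDERLY_COHORTS) / total


# Hypothetical town record.
town = {"total_pop": 2000, "age_65_74": 200, "age_75_84": 100, "age_85_plus": 50}
town_percent_elderly = percent_elderly(town)
```

Returning `None` for unpopulated towns avoids a division by zero, since the expected domain of `total_pop` starts at 0.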

#### OpenStreetMap Road Network 

**Standard Metadata**

- `Abstract`: Combines OpenStreetMap (OSM) data with network analysis algorithms from the Python package NetworkX. Contains information on road networks and speed limits, which are used to calculate travel times.
- `Spatial Coverage`: Road network data will be pulled for the study area: Vermont and a 10-mile buffer surrounding the VT border (excluding Canada).
- `Spatial Resolution`: NA
- `Spatial Reference System`: NA
- `Temporal Coverage`: This data is pulled at the time of our analysis. How often the OSM speed limit and road data are updated has not yet been determined.
- `Temporal Resolution`: This data represents a single point in time. 
- `Lineage`: The road network is pulled from OpenStreetMap. The road segments within this network are then categorized by their unique speed limit value. A network setting function is subsequently used to clean the road network to work better with drive-time analysis. This function calculates the speed limits using the OSMnx data and populates any missing speed limit values with averages of other segments of the same road type. Then, it calculates travel times based on the speed limits and road distances. 
- `Distribution`: The data will be accessible based on the public code that will be released, which pulls in the necessary data for our study area. 
- `Constraints`: The data is publicly accessible; however, OSM should be acknowledged as the source of this data, and the code to pull this data should also be accessible.
- `Data Quality`: Quality unknown
- `Variables`: Exact variables unnecessary at this time. Code will be provided and publicly accessible to retrieve the road network data used in our study.
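The speed-limit filling and travel-time calculation described in the `Lineage` entry can be illustrated on a plain list of road segments. This is a simplified sketch rather than the network setting function itself: the field names (`highway`, `length_m`, `speed_kph`) and the sample segments are assumptions for illustration, and a road type with no known speeds at all is not handled.

```python
from collections import defaultdict


def add_travel_times(edges):
    """Fill missing speeds with the road-type mean, then add travel_time_s."""
    # Collect the known speeds for each road type.
    speeds_by_type = defaultdict(list)
    for e in edges:
        if e.get("speed_kph") is not None:
            speeds_by_type[e["highway"]].append(e["speed_kph"])
    mean_speed = {t: sum(v) / len(v) for t, v in speeds_by_type.items()}

    result = []
    for e in edges:
        speed = e.get("speed_kph")
        if speed is None:
            # Raises KeyError if no segment of this road type has a known speed.
            speed = mean_speed[e["highway"]]
        meters_per_second = speed * 1000 / 3600
        result.append({**e, "speed_kph": speed,
                       "travel_time_s": e["length_m"] / meters_per_second})
    return result


# Hypothetical road segments.
edges = [
    {"highway": "residential", "length_m": 1000, "speed_kph": 40},
    {"highway": "residential", "length_m": 1000, "speed_kph": 50},
    {"highway": "residential", "length_m": 500, "speed_kph": None},
]
timed_edges = add_travel_times(edges)
```

The resulting travel times, in seconds, are the edge weights that a drive-time catchment calculation would use.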


### Prior observations  

At the time of this study pre-registration, we have some prior knowledge of the geography of the study region with regard to pharmacy accessibility. Since the study includes primary data, we have been collecting the data on pharmacy staffing and hours of operations of pharmacy locations within the study area. Given the data collection methodology, which mostly includes surveying individual pharmacies over the phone, we are aware of the data we have collected. However, we have not observed or analyzed this data in full and have paid little attention to the patterns in the data. Although we do not specifically ask pharmacy staff how they feel about their staffing levels and patient accessibility, some survey respondents have expressed their concerns about staffing difficulty across the state. Speaking to pharmacists on the phone has brought to light certain sentiments about the pharmacy landscape in Vermont; however, no specific distribution of these sentiments (i.e. rural vs urban) has been observed or analyzed. No data manipulation has occurred.

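This study is related in its methodology to Holler et al. (2022), which reproduced Kang et al.'s (2020) assessment of spatial accessibility of COVID-19 healthcare resources in Illinois, but we have not conducted any prior studies regarding pharmacy accessibility or healthcare accessibility in Vermont.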

Pharmacy Dataset (Primary): 
- Data collection is in progress and all of the data has been observed due to the nature of surveying and recording data manually. No data manipulation has occurred.

Population Dataset (Secondary): 
- Data is available, but only metadata has been observed.

Road Network Dataset (Secondary):
- Data has not been observed. The data will be pulled in during the analysis using the OSMnx package in Python.



### Bias and threats to validity

Boundary effects and the modifiable areal unit problem: units for SVI don't necessarily align with the areas served by the pharmacies.

Boundary effects are addressed by using road networks and population data.
Other threats include estimating the number of pharmacists and pharmacy technicians based on budgeted hours and potentially missing pharmacy locations outside of Vermont.

It is difficult to encapsulate the complex dynamics of pharmacy staffing in our data collection and models. Staff frequently overlap for only portions of the day (i.e. lunch hours), number of pharmacists and

Given the research design and primary data to be collected and/or secondary data to be used, discuss common threats to validity and the approach to mitigating those threats, with an emphasis on geographic threats to validity.

These include:
  - uneven primary data collection due to geographic inaccessibility or other constraints
  - multiple hypothesis testing
  - edge or boundary effects
  - the modifiable areal unit problem
  - nonstationarity
  - spatial dependence or autocorrelation
  - temporal dependence or autocorrelation
  - spatial scale dependency
  - spatial anisotropies
  - confusion of spatial and a-spatial causation
  - ecological fallacy
  - uncertainty e.g. from spatial disaggregation, anonymization, differential privacy





### Data transformations

Describe all data transformations planned to prepare data sources for analysis.
This section should explain with the fullest detail possible how to transform data from the **raw** state at the time of acquisition or observation, to the pre-processed **derived** state ready for the main analysis.
Including steps to check and mitigate sources of **bias** and **threats to validity**.
The method may anticipate **contingencies**, e.g. tests for normality and alternative decisions to make based on the results of the test.
More specifically, all the **geographic** and **variable** transformations required to prepare input data as described in the data and variables section above to match the study's spatio-temporal characteristics as described in the study metadata and study design sections.
Visual workflow diagrams may help communicate the methodology in this section.

Examples of **geographic** transformations include coordinate system transformations, aggregation, disaggregation, spatial interpolation, distance calculations, zonal statistics, etc.

Examples of **variable** transformations include standardization, normalization, constructed variables, imputation, classification, etc.

Be sure to include any steps planned to **exclude** observations with *missing* or *outlier* data, to **group** observations by *attribute* or *geographic* criteria, or to **impute** missing data or apply spatial or temporal **interpolation**.



### Analysis

Describe the methods of analysis that will directly test the hypotheses or provide results to answer the research questions.
This section should explicitly define any spatial / statistical *models* and their *parameters*, including *grouping* criteria, *weighting* criteria, and *significance thresholds*.
Also explain any follow-up analyses or validations.

SVI correlations, independent versus chain pharmacies. 

## Results

Results will be presented in a series of maps depicting varying spatial accessibility to pharmacy care by Vermont town. Results will also be summarized using graphs that compare spatial accessibility by variables such as temporal segments, rurality, or percent elderly.



List of anticipated maps and graphs in result section. 
- Graph with time vs spatial accessibility measure. Done in



## Discussion

Describe how the results are to be interpreted *vis a vis* each hypothesis or research question.



## Integrity Statement

The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.




# Acknowledgements

- `Funding Name`: NA
- `Funding Title`: NA
- `Award info URI`: NA
- `Award number`: NA

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)

## References

Kang et al. 2020. Rapidly Measuring Spatial Accessibility of COVID-19 Healthcare Resources: A Case Study of Illinois, USA. *International Journal of Health Geographics* 19:36. DOI:[10.1186/s12942-020-00229-x](https://doi.org/10.1186/s12942-020-00229-x).

Luo, W., & Qi, Y. (2009). An enhanced two-step floating catchment area (E2SFCA) method for measuring spatial accessibility to primary care physicians. *Health & Place*, 15(4), 1100-1107.