# Analytic Plan - Reproducible Notebooks

To examine the relationship between air quality and high blood pressure prevalence, a Geographically Weighted Regression (GWR) was conducted, with high blood pressure prevalence as the target variable.

![Screenshot%202026-01-15%20175853.png](attachment:Screenshot%202026-01-15%20175853.png)

(GWR is a spatial regression approach that accounts for geographic variability, producing area-specific regression results and allowing for the creation of a localized air quality health risk landscape.)

## Feature Selection 
<font color='grey'>*Feature selection for the high blood pressure GWR model used correlation analysis for an initial reduction of candidate variables, followed by Ordinary Least Squares (OLS) regression to evaluate the strength of the explanatory variables and reduce the set further.*</font> 


[Correlation Analyses](gwr/correlation_hbp.ipynb) This notebook examines the correlations between high blood pressure, other county health indicators, and air quality. These correlations were used to evaluate the strength of candidate explanatory variables and to aid in the initial reduction of variables. 
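
The correlation screen described above can be sketched roughly as follows. This is only an illustration, not the notebook's code: the data are synthetic, the target array `high_bp` and the helper `correlations_with_target` are hypothetical names, and only the four predictor column names are taken from this plan.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # synthetic "counties"; the real analysis would use the county table

# Synthetic stand-ins for predictor columns named in this plan.
data = {
    "AIRQUALTY_MEAN": rng.normal(8.0, 2.0, n),
    "FoodInsecurity": rng.normal(13.0, 3.0, n),
    "GHLTH_AdjPrev": rng.normal(18.0, 4.0, n),
    "SLEEP_AdjPrev": rng.normal(35.0, 4.0, n),
}

# Hypothetical target, constructed so that it depends on every predictor.
high_bp = (
    20.0
    + 0.5 * data["AIRQUALTY_MEAN"]
    + 0.3 * data["FoodInsecurity"]
    + 0.4 * data["GHLTH_AdjPrev"]
    + 0.2 * data["SLEEP_AdjPrev"]
    + rng.normal(0.0, 1.0, n)
)


def correlations_with_target(columns, target):
    """Pearson r between each column and the target, strongest first."""
    r = {
        name: float(np.corrcoef(values, target)[0, 1])
        for name, values in columns.items()
    }
    return dict(sorted(r.items(), key=lambda kv: abs(kv[1]), reverse=True))


ranked = correlations_with_target(data, high_bp)
for name, r in ranked.items():
    print(f"{name}: r = {r:+.2f}")
```

Ranking by absolute correlation is one simple way to shortlist variables before fitting a regression; the actual cut-off used in the notebook is not stated here.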


[Ordinary Least Squares (OLS) Regression](gwr/OLSRegression_hbp.ipynb)
In this notebook, ordinary least squares regression is used to evaluate the strength of explanatory variables and to aid in the reduction of variables. The OLS regression is based on the correlation results from the Correlation Analyses notebook listed above. The following independent (predictor) variables provided the highest explanatory strength for high blood pressure prevalence:
<br>
<br>1. PM2.5 Air Quality (`AIRQUALTY_MEAN`)
<br>2. Food Insecurity (`FoodInsecurity`) 
<br>3. General Health (`GHLTH_AdjPrev`)
<br>4. Sleep (`SLEEP_AdjPrev`)
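
As a rough illustration of how OLS can be used to judge explanatory strength, the sketch below fits a four-predictor model on synthetic data and reports coefficients, t-values, and (adjusted) R². The function `fit_ols`, the synthetic coefficients, and the data are assumptions made for this example; the notebook may use a different library and different diagnostics.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + X @ b by least squares.

    Returns coefficients (intercept first), t-values, R^2, and adjusted R^2.
    """
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - (k + 1)                           # residual degrees of freedom
    sigma2 = resid @ resid / dof                # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)       # coefficient covariance
    t_values = beta / np.sqrt(np.diag(cov))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof
    return beta, t_values, r2, adj_r2


# Synthetic, standardized predictors standing in for the four variables above.
names = ["AIRQUALTY_MEAN", "FoodInsecurity", "GHLTH_AdjPrev", "SLEEP_AdjPrev"]
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 4))
true_beta = np.array([0.5, 0.3, 0.4, 0.2])      # made-up effects for the demo
y = 30.0 + X @ true_beta + rng.normal(0.0, 0.5, n)

beta, t_values, r2, adj_r2 = fit_ols(X, y)
for name, b, t in zip(names, beta[1:], t_values[1:]):
    print(f"{name}: coef = {b:+.3f}, t = {t:+.1f}")
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```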

## Spatial Patterns 
<font color='grey'>*Spatial patterns were assessed using spatial autocorrelation. A series of spatial autocorrelation analyses was conducted on the chosen predictor variables; the results help inform the Geographically Weighted Regression and the development of the model.*</font> 

All four selected variables demonstrated significant spatial autocorrelation (p < 0.05) and were included in the GWR model as independent predictor variables. The analysis for each variable is in the following notebooks:

1. [SpatialAutocorrelation_airquality.ipynb](gwr/SpatialAutocorrelation_airquality.ipynb) In this notebook, spatial autocorrelation analysis is conducted to understand spatial clustering patterns of the key predictor variable **`AIRQUALTY_MEAN`, defined as the yearly average density of fine particulate matter in micrograms per cubic meter (PM2.5).**
<br>

2. [SpatialAutocorrelation_foodinsecurity.ipynb](gwr/SpatialAutocorrelation_foodinsecurity.ipynb) In this notebook, spatial autocorrelation analysis is conducted to understand spatial clustering patterns of the high blood pressure predictor variable **`FoodInsecurity`, defined as the percentage of the population who lack adequate access to food.**
<br>

3. [SpatialAutocorrelation_generalhealth.ipynb](gwr/SpatialAutocorrelation_generalhealth.ipynb) In this notebook, spatial autocorrelation analysis is conducted to understand spatial clustering patterns of the high blood pressure predictor variable **`GHLTH_AdjPrev`, defined as the model-based estimate for age-adjusted prevalence of fair or poor health among adults aged >=18 years.**
<br>

4. [SpatialAutocorrelation_sleep.ipynb](gwr/SpatialAutocorrelation_sleep.ipynb) In this notebook, spatial autocorrelation analysis is conducted to understand spatial clustering patterns of the high blood pressure predictor variable **`SLEEP_AdjPrev`, defined as the model-based estimate for age-adjusted prevalence of sleeping less than 7 hours among adults aged >=18 years.**
![Screenshot%202026-01-15%20175321.png](attachment:Screenshot%202026-01-15%20175321.png)
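
For readers unfamiliar with spatial autocorrelation tests, the sketch below computes a global Moran's I with a permutation-based pseudo p-value. It is a from-scratch illustration under stated assumptions: a synthetic 10 x 10 grid replaces the county polygons, rook contiguity replaces whatever neighbor definition the notebooks use, and all function names are hypothetical.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Row-standardized rook-contiguity weights for a regular grid."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:                  # neighbor below
                j = (r + 1) * ncols + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < ncols:                  # neighbor to the right
                j = r * ncols + c + 1
                W[i, j] = W[j, i] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def morans_i(values, W):
    """Global Moran's I for values under spatial weights W."""
    z = values - values.mean()
    return (len(values) / W.sum()) * (z @ W @ z) / (z @ z)


def permutation_p_value(values, W, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, W)
    sims = np.array(
        [morans_i(rng.permutation(values), W) for _ in range(n_perm)]
    )
    p = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p


# Synthetic surface with a smooth gradient, so nearby cells are similar.
nrows, ncols = 10, 10
rng = np.random.default_rng(3)
rows, cols = np.divmod(np.arange(nrows * ncols), ncols)
values = rows + cols + rng.normal(0.0, 0.5, nrows * ncols)

W = rook_weights(nrows, ncols)
observed_i, p_value = permutation_p_value(values, W)
print(f"Moran's I = {observed_i:.3f}, pseudo p-value = {p_value:.3f}")
```

A pseudo p-value below 0.05 would correspond to the significance threshold quoted above.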

<br>
<br>

## Spatial Model: GWR
<font color='grey'>*Geographically weighted regression (GWR) is a model that accounts for spatial variability and allows non-stationary parameter estimates to be computed. By employing GWR, it is possible to examine the spatial relationship between air quality and the target health outcome, high blood pressure, and thereby determine the air quality health risk of county areas in the contiguous US.*</font> 


[GWR_HighBP.ipynb](gwr/GWR_HighBP.ipynb) 
GWR for high blood pressure prevalence. Independent variables tested: "air quality", "food insecurity", "general health", and "sleep". The GWR adjusted R² model fit was approximately 0.92.

The GWR model explained **approximately 92% of the variance in high blood pressure prevalence**, with air quality showing a **significant contribution**, indicating that air quality helps explain variability in high blood pressure prevalence even when accounting for other relevant county health factors.
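
To make the idea of location-specific (non-stationary) estimates concrete, here is a minimal from-scratch sketch of GWR: one weighted least-squares fit per location, with weights from a Gaussian distance kernel. It is not the notebook's implementation. The function name, the fixed bandwidth of 0.15, the single predictor, and the synthetic data are all assumptions; in practice the bandwidth is usually selected by a criterion such as AICc or cross-validation.

```python
import numpy as np


def gwr_local_coefficients(coords, X, y, bandwidth):
    """Fit one kernel-weighted least-squares regression per location.

    Returns an (n, k + 1) array: local intercept and slopes for each point.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])             # intercept + predictors
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)      # Gaussian kernel weights
        Aw = A * w[:, None]
        # Solve (A' W A) beta = A' W y for the local coefficients.
        betas[i] = np.linalg.solve(A.T @ Aw, Aw.T @ y)
    return betas


# Synthetic example: the predictor's effect changes sign from west to east.
rng = np.random.default_rng(1)
n = 400
coords = rng.uniform(0.0, 1.0, size=(n, 2))
x1 = rng.normal(size=n)
true_slope = -1.0 + 2.0 * coords[:, 0]
y = 5.0 + true_slope * x1 + rng.normal(0.0, 0.1, n)

betas = gwr_local_coefficients(coords, x1[:, None], y, bandwidth=0.15)
local_slope = betas[:, 1]
west = local_slope[coords[:, 0] < 0.25].mean()
east = local_slope[coords[:, 0] > 0.75].mean()
print(f"mean local slope, west: {west:+.2f}; east: {east:+.2f}")
```

A single global OLS slope would average these opposing effects together, which is the motivation for fitting local models.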

Air quality parameter estimates are displayed in the map below. These values show the **magnitude and direction** of the relationship between air quality (PM2.5) and high blood pressure prevalence (%).

- **Blue** indicates counties where a **significant positive relationship** exists, meaning that as the air quality measure (PM2.5) increases, high blood pressure prevalence also increases.
- **Green or yellow** indicates a **negative relationship**: as the air quality measure (PM2.5) increases, high blood pressure prevalence decreases.

![Screenshot%202026-01-15%20181224.png](attachment:Screenshot%202026-01-15%20181224.png)
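
One possible way to turn local parameter estimates into map categories like those above is to compare each estimate with its standard error. The sketch below is an assumption-laden illustration: the function name, the example numbers, and the |t| >= 1.96 cut-off are all hypothetical, and the notebook may determine significance differently (for example, with a correction for multiple testing).

```python
import numpy as np


def classify_local_effects(estimates, std_errors, t_crit=1.96):
    """Label each local coefficient by sign and significance (assumed cut-off)."""
    t = estimates / std_errors
    labels = np.full(estimates.shape, "not significant", dtype=object)
    labels[t >= t_crit] = "significant positive"
    labels[t <= -t_crit] = "significant negative"
    return labels


# Made-up local air quality coefficients and standard errors for four counties.
estimates = np.array([0.42, -0.31, 0.05, -0.02])
std_errors = np.array([0.10, 0.08, 0.09, 0.01])

labels = classify_local_effects(estimates, std_errors)
for est, label in zip(estimates, labels):
    print(f"{est:+.2f}: {label}")
```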