# CVS Health Community Access Analysis - Part 3: Geographic Analysis

This notebook creates geographic visualizations (maps) to show spatial patterns in clinic distribution, vulnerability, and health needs across the United States.


In [None]:
# import libraries for geographic visualization
import pandas as pd
import numpy as np
import plotly.express as px
import requests
import json

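# load data (assumes you've run notebooks 06a and 06b)
# if running independently, load and prepare the county-level DataFrame `df` first;
# every map below expects `df` to have a 'fips' column plus the metric being plotted
print("ready for geographic analysis")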


## Load Geographic Data

We need county boundary data to create maps. We'll download the U.S. county GeoJSON file that Plotly publishes in its datasets repository; each county boundary in it is identified by its FIPS code.
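
Plotly matches each row to a boundary by comparing the `locations` value with the `id` of a GeoJSON feature, and in this file the ids are five-character, zero-padded FIPS strings such as `"01001"`. If the FIPS column was read as integers, leading zeros are lost and those counties silently drop off the map. A minimal sketch of padding and checking the codes, assuming a hypothetical helper `normalize_fips` and made-up sample values (neither is part of the original notebook):

```python
def normalize_fips(value):
    """Return a county FIPS code as a 5-character, zero-padded string."""
    # int(float(...)) tolerates inputs such as 1001, "1001", or 1001.0
    return str(int(float(value))).zfill(5)

# toy stand-ins for the real GeoJSON and FIPS column (illustrative only)
sample_geojson = {
    "features": [
        {"id": "01001", "properties": {"NAME": "Autauga"}},
        {"id": "06037", "properties": {"NAME": "Los Angeles"}},
    ]
}
raw_fips = [1001, "6037", 99999]

# collect the ids Plotly will match against, then look for codes without a boundary
boundary_ids = {feature["id"] for feature in sample_geojson["features"]}
clean_fips = [normalize_fips(code) for code in raw_fips]
unmatched = [code for code in clean_fips if code not in boundary_ids]

print(clean_fips)
print("codes with no boundary:", unmatched)
```

In the notebook itself, the same check could be run against `counties['features']` and `df['fips']` before drawing the first map.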


In [None]:
# download county boundary data from plotly's dataset repository
# this geojson file contains the geographic boundaries for all US counties
url = "https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json"
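response = requests.get(url, timeout=30)
response.raise_for_status()  # stop with a clear error if the download failed
counties = response.json()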

print(f"loaded county boundaries for {len(counties['features'])} counties")


## Map 1: Distribution of Clinic Counts

This map shows how many CVS clinics each county has. Darker reds indicate more clinics in that county.


In [None]:
# create choropleth map showing clinic count by county
# choropleth maps color-code geographic regions based on data values
fig = px.choropleth(
    df,
    geojson=counties,
    locations='fips',  # 5-character FIPS strings, matched to the GeoJSON feature ids
    color='clinic_count',  # variable to color by
    color_continuous_scale="Reds",  # color scheme (red = more clinics)
    scope="usa",  # focus on United States
    labels={'clinic_count':'CVS Clinic Count'},
    title="CVS Clinic Distribution Across U.S. Counties"
)

# adjust map settings for better display
fig.update_geos(fitbounds="locations", visible=False)
fig.show()


### what this map shows:

- most counties appear light/white, indicating zero clinics
- darker red areas show counties with multiple clinics
- clinics are highly concentrated in specific regions (likely urban/metropolitan areas)
- large swaths of the country have no CVS clinic coverage
- the pattern suggests geographic clustering rather than even distribution
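
The visual impressions above can be backed with two simple numbers: the share of counties with no clinic, and the share of all clinics held by the single best-served county. The sketch below uses a small made-up table in place of the notebook's `df`; the values and placeholder FIPS codes are illustrative only.

```python
import pandas as pd

# toy data standing in for the notebook's df (values are illustrative)
toy = pd.DataFrame({
    "fips": ["00001", "00002", "00003", "00004", "00005"],
    "clinic_count": [0, 0, 0, 2, 8],
})

# share of counties with no clinic at all
zero_share = (toy["clinic_count"] == 0).mean()

# share of all clinics located in the county with the most clinics
top_share = toy["clinic_count"].max() / toy["clinic_count"].sum()

print(f"counties with zero clinics: {zero_share:.0%}")
print(f"clinics in the top county: {top_share:.0%}")
```

Running the same two lines on `df['clinic_count']` would show how strong the concentration in the real data is.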


## Map 2: Social Vulnerability Index (SVI)

This map shows social vulnerability across counties. On the Viridis scale used here, brighter yellow colors indicate higher vulnerability (more challenges) and dark purple indicates lower vulnerability.


In [None]:
# create map showing social vulnerability index
# SVI measures how vulnerable a community is to disasters and other stressors
fig = px.choropleth(
    df,
    geojson=counties,
    locations='fips',
    color='svi_overall',  # overall social vulnerability score
    color_continuous_scale="Viridis",  # dark purple (low) to yellow (high)
    scope="usa",
    labels={'svi_overall':'SVI (Overall Vulnerability)'},
    title="Social Vulnerability (SVI) Across U.S. Counties"
)

fig.update_geos(fitbounds="locations", visible=False)
fig.show()


### what this map shows:

- vulnerability patterns vary across regions
- some areas show consistently high vulnerability (brighter yellow colors)
- comparing this to the clinic map can reveal whether vulnerable areas lack clinics
- rural areas and certain regions show higher vulnerability
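
One way to compare vulnerability with clinic coverage without eyeballing two maps is to bin counties into SVI quartiles and summarize clinic counts per bin. This is a sketch on made-up values, not the notebook's method; the column names mirror `svi_overall` and `clinic_count`, and the quartile labels are assumptions.

```python
import pandas as pd

# toy data standing in for the notebook's df (values are illustrative)
toy = pd.DataFrame({
    "svi_overall": [0.05, 0.15, 0.30, 0.45, 0.55, 0.70, 0.85, 0.95],
    "clinic_count": [4, 6, 2, 1, 1, 0, 0, 0],
})

# split counties into four equally sized vulnerability groups
toy["svi_quartile"] = pd.qcut(
    toy["svi_overall"], q=4, labels=["Q1 (lowest)", "Q2", "Q3", "Q4 (highest)"]
)

# average clinic count and share of clinic-free counties in each group
summary = toy.groupby("svi_quartile", observed=True)["clinic_count"].agg(
    mean_clinics="mean",
    share_zero=lambda s: (s == 0).mean(),
)
print(summary)
```

If the most vulnerable quartile has the lowest `mean_clinics` and the highest `share_zero` in the real data, that would support the reading that vulnerable areas lack clinics.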


## Map 3: Health Burden Score

This map shows health burden across counties. On the Plasma scale used here, brighter yellow colors indicate worse health outcomes (higher burden).


In [None]:
# create map showing health burden scores
# health burden combines stroke, physical inactivity, disability, and social isolation
fig = px.choropleth(
    df,
    geojson=counties,
    locations='fips',
    color='health_burden_score',  # health burden metric
    color_continuous_scale="Plasma",  # dark purple (low) to yellow (high)
    scope="usa",
    labels={'health_burden_score': 'Health Burden Score'},
    title="Health Burden Across U.S. Counties"
)

fig.update_geos(fitbounds="locations", visible=False)
fig.show()


### what this map shows:

- health burden varies significantly across regions
- certain areas show consistently high health burden (brighter yellow colors)
- comparing this to clinic distribution can show whether high-need areas have access
- patterns may correlate with rural/urban divides or regional health trends
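
A single number can complement the visual comparison between health burden and clinic counts. The sketch below computes a rank (Spearman-style) correlation by ranking both columns and taking the ordinary Pearson correlation of the ranks. This statistic is not used in the original notebook, and the data are made up.

```python
import pandas as pd

# toy data standing in for the notebook's df (values are illustrative)
toy = pd.DataFrame({
    "health_burden_score": [10, 20, 30, 40, 50],
    "clinic_count": [5, 3, 0, 1, 0],
})

# rank both columns (ties get the average rank), then correlate the ranks
burden_rank = toy["health_burden_score"].rank()
clinic_rank = toy["clinic_count"].rank()
rank_corr = burden_rank.corr(clinic_rank)

print(f"rank correlation between burden and clinic count: {rank_corr:.2f}")
```

A clearly negative value would mean that higher-burden counties tend to have fewer clinics; a value near zero would mean the two maps are largely unrelated.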


## Map 4: Access Gap Map

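This map shows the gap between health need and clinic availability. Both inputs are scaled to 0-1 (need by min-max scaling of the health burden score, availability relative to the county with the most clinics), so the gap score ranges from -1 to 1. Positive values (red) mean high need but low access. Negative values (blue) mean adequate access relative to need.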
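
A worked example on three made-up counties shows how the score behaves. The helper `min_max` is a hypothetical convenience, not part of the notebook; it also guards against the case where every county has the same burden score, which would otherwise divide by zero.

```python
import pandas as pd

def min_max(series):
    """Scale a Series to the 0-1 range; return zeros if all values are equal."""
    span = series.max() - series.min()
    if span == 0:
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span

# toy counties (illustrative values only)
toy = pd.DataFrame({
    "county": ["A", "B", "C"],
    "health_burden_score": [2.0, 6.0, 10.0],
    "clinic_count": [4, 0, 1],
})

toy["health_need"] = min_max(toy["health_burden_score"])
toy["clinic_availability"] = toy["clinic_count"] / toy["clinic_count"].max()
toy["gap_score"] = toy["health_need"] - toy["clinic_availability"]

print(toy[["county", "health_need", "clinic_availability", "gap_score"]])
```

County A has the lowest burden and the most clinics, so its gap is -1.0; county C has the highest burden and only one clinic, so its gap is 0.75.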


In [None]:
# create normalized health need score (0 to 1 scale)
# this makes it comparable across counties
df['health_need'] = (
    (df['health_burden_score'] - df['health_burden_score'].min()) /
    (df['health_burden_score'].max() - df['health_burden_score'].min())
)

# create normalized clinic availability score (0 to 1 scale)
# counties with more clinics get higher scores
if df['clinic_count'].max() > 0:
    df['clinic_availability'] = df['clinic_count'] / df['clinic_count'].max()
else:
    df['clinic_availability'] = 0

# calculate gap score: positive = need exceeds access, negative = access exceeds need
df['gap_score'] = df['health_need'] - df['clinic_availability']

print(f"gap score range: {df['gap_score'].min():.2f} to {df['gap_score'].max():.2f}")
print(f"counties with positive gap (need > access): {(df['gap_score'] > 0).sum()}")
print(f"counties with negative gap (access > need): {(df['gap_score'] < 0).sum()}")


In [None]:
# create gap map using diverging color scheme
# red = high need, low access (underserved)
# blue = low need, high access (well-served)
fig = px.choropleth(
    df,
    geojson=counties,
    locations='fips',
    color='gap_score',
    color_continuous_scale="RdBu_r",  # reversed red-blue scale: positive gaps (underserved) are red
    range_color=(-1, 1),  # set color scale range
    scope="usa",
    labels={'gap_score': 'Need-Access Gap Score'},
    title="Access Gap Map: Counties Where Health Needs Exceed CVS Clinic Availability"
)

fig.update_geos(fitbounds="locations", visible=False)
fig.show()


### what this map shows:

- red areas: counties with high health need but low clinic access (underserved)
- blue areas: counties with adequate or excess clinic access relative to need
- white/light areas: counties where need and access are balanced
- the reddest areas are the candidate priority targets for expansion
- the pattern reveals geographic clusters of underserved communities
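
Geographic clustering can also be summarized in a table by counting underserved counties per state. In this sketch the cutoff `threshold = 0.5` is a hypothetical choice rather than a value from the notebook, and the rows are made up; `state_full` and `gap_score` mirror the notebook's column names.

```python
import pandas as pd

# toy data standing in for the notebook's df (values are illustrative)
toy = pd.DataFrame({
    "state_full": ["State X", "State X", "State X", "State Y", "State Y"],
    "gap_score": [0.8, 0.7, 0.1, 0.6, -0.4],
})

threshold = 0.5  # hypothetical cutoff for calling a county underserved

# keep counties above the cutoff, then count them per state
underserved = toy[toy["gap_score"] > threshold]
by_state = underserved.groupby("state_full").size().sort_values(ascending=False)

print(by_state)
```

States that dominate this count in the real data would correspond to the red clusters visible on the map.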


In [None]:
# identify worst underserved counties by gap score
# these are the highest priority for expansion
worst_underserved = df.sort_values('gap_score', ascending=False).head(20)[
    ['county_full', 'state_full', 'gap_score', 'health_need', 'clinic_availability']
]

print("top 20 counties with largest access gaps:")
print("(highest need, lowest access)")
worst_underserved


### what this means:

These counties show the strongest mismatch between health needs and clinic access: very high health burden scores but zero or very few CVS clinics. They are the highest-priority expansion targets, where CVS could make the biggest impact on community health access.
