Examining the Impacts of Measurement Error in Quantifying Health Disparities: A Case Study on Diabetes and the Food Environment
This repository contains R code to reproduce results from the
manuscript by Hung, Green, and Lotspeich (2025+).
Our implementation of the health disparities methods in the paper rely
on the following R packages, which can be installed and loaded as
follows.
# Run once: install.packages(c("dplyr", "ggplot2", "scales", "epitools", "rineq"))
library(dplyr) ## for data wrangling
library(ggplot2) ## for concentration curves
library(scales) ## for % labels on concentration curves
library(epitools) ## for rate ratios
library(rineq) ## for concentration indicesWith permission from the Institutional Review Board at Wake Forest University School of Medicine, a de-identified version of the study dataset have been made available with the paper as Supplementary Material. Once downloaded, these data can be loaded as follows.
# Read in the data from CSV
ehr_food = read.csv("https://raw.githubusercontent.com/sarahlotspeich/ME_Disparities_HSORM/refs/heads/main/me_disparities_analysis_2020.csv")
## Inspect data summary
ehr_food |>
summary()## ID DIABETES MAP_PROXIMITY STRAIGHT_PROXIMITY
## Min. : 1.0 Min. :0.00000 Min. :2.880e-05 Min. :2.880e-05
## 1st Qu.:231.8 1st Qu.:0.00000 1st Qu.:9.503e-01 1st Qu.:6.074e-01
## Median :462.5 Median :0.00000 Median :1.701e+00 Median :1.140e+00
## Mean :462.5 Mean :0.06818 Mean :2.512e+00 Mean :1.738e+00
## 3rd Qu.:693.2 3rd Qu.:0.00000 3rd Qu.:3.572e+00 3rd Qu.:2.291e+00
## Max. :924.0 Max. :1.00000 Max. :1.547e+01 Max. :1.019e+01
## MAP_PROXIMITY_BINNED STRAIGHT_PROXIMITY_BINNED EST_FOOD_INSECURITY
## Length:924 Length:924 Min. : 5.30
## Class :character Class :character 1st Qu.:11.90
## Mode :character Mode :character Median :16.10
## Mean :16.08
## 3rd Qu.:20.90
## Max. :29.20
## LB_FOOD_INSECURITY UB_FOOD_INSECURITY EST_FOOD_INSECURITY_BINNED
## Min. : 4.60 Min. : 6.00 Length:924
## 1st Qu.:10.60 1st Qu.:13.20 Class :character
## Median :14.50 Median :18.00 Mode :character
## Mean :14.46 Mean :17.76
## 3rd Qu.:19.00 3rd Qu.:22.80
## Max. :26.70 Max. :31.60
## LB_FOOD_INSECURITY_BINNED UB_FOOD_INSECURITY_BINNED
## Length:924 Length:924
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## Define levels for binned factors
prox_levels = c("More than 5 Miles", "Between 4-5 Miles", "Between 3-4 Miles",
"Between 2-3 Miles", "Between 1-2 Miles", "Up to 1 Mile")
insec_levels = c("More than 20%", "Between 17.5-20%", "Between 15-17.5%",
"Between 12.5-15%", "Between 10-12.5%", "Up to 10%")
### Order from most disadvantaged --> least disadvantaged
ehr_food = ehr_food |>
mutate(MAP_PROXIMITY_BINNED = factor(x = MAP_PROXIMITY_BINNED,
levels = prox_levels),
STRAIGHT_PROXIMITY_BINNED = factor(x = STRAIGHT_PROXIMITY_BINNED,
levels = prox_levels),
EST_FOOD_INSECURITY_BINNED = factor(x = EST_FOOD_INSECURITY_BINNED,
levels = insec_levels),
LB_FOOD_INSECURITY_BINNED = factor(x = LB_FOOD_INSECURITY_BINNED,
levels = insec_levels),
UB_FOOD_INSECURITY_BINNED = factor(x = UB_FOOD_INSECURITY_BINNED,
levels = insec_levels))The dataset contains the following columns.
ID: a deidentified patient identifierDIABETES: a binary indicator of being diagnosed with type-2 diabetes (=0if no,=1if yes)MAP_PROXIMITY: unbinned map-based proximity to healthy foods (in miles)STRAIGHT_PROXIMITY: unbinned straight-line proximity to healthy foods (in miles)MAP_PROXIMITY_BINNED: binned map-based proximity to healthy foods (categorical)STRAIGHT_PROXIMITY_BINNED: binned straight-line proximity to healthy foods (categorical)EST_FOOD_INSECURITY: unbinned survey estimate of neighborhood rate of food insecurity (percent, between0and100)LB_FOOD_INSECURITY: unbinned lower bound on neighborhood rate of food insecurity, taken from the 95% confidence interval (percent, between0and100)UB_FOOD_INSECURITY: unbinned upper bound on neighborhood rate of food insecurity, taken from the 95% confidence interval (percent, between0and100)EST_FOOD_INSECURITY_BINNED: binned survey estimate of neighborhood rate of food insecurity (categorical)LB_FOOD_INSECURITY_BINNED: binned lower bound on neighborhood rate of food insecurity, taken from the 95% confidence interval (categorical)UB_FOOD_INSECURITY_BINNED: binned upper bound on neighborhood rate of food insecurity, taken from the 95% confidence interval (categorical)
Each row represents data for one patient.
The logbin_rr function, defined below, calculates the rate ratio (RR)
and its 95% confidence interval from a log-binomial regression model.
# Write a function that calculates RR (95% CI) from a log-binomial model
### formula: Model formula, like Y ~ X
### data: Dataset where variables in the model formula can be found
logbin_rr = function(formula, data) {
fit = glm(formula = formula,
data = data,
family = binomial(link = "log"))
to_return = suppressMessages(
c(exp(coefficients(fit)[2]),
exp(confint(object = fit))[2, ])
)
names(to_return) = c("RR", "LB", "UB")
return(to_return)
}The mod_rii function, defined below, calculates the relative index of
inequality (RII) and its 95% confidence interval using the Poisson
regression approach by Moreno-Betancur et
al. (2015).
# Write a function that calculates RII (95% CI) from a Poisson model
## Arguments:
### health_var: name of health outcome
### group_by_var: name of the socioeconomic exposure to group by (ordered from most- to least-disadvantaged)
### data: dataset where health_var and group_by_var can be found
mod_rii = function(health_var, group_by_var, data) {
prev_by_group = data |>
group_by({{group_by_var}}) |>
summarize(sum_health = sum({{health_var}}),
num = n()) |>
ungroup() |>
arrange({{group_by_var}}) |>
mutate(est_prev = sum_health / num,
prop_num = num / sum(num),
cum_prop_num = cumsum(prop_num))
cdf = data.frame(
sum_health = 0,
num = 0,
est_prev = 0,
prop_num = 0,
cum_prop_num = 0) |>
bind_rows(prev_by_group) |>
mutate(rank = (1 - cum_prop_num) + 1 / 2 * prop_num) |>
slice(-1)
rii = glm(formula = sum_health ~ rank,
offset = log(num),
family = "poisson",
data = cdf)
to_return = suppressMessages(exp(c(rii$coefficients[2], confint(rii)[2, ])))
names(to_return) = c("RII", "LB", "UB")
return(to_return)
}The make_cc_dat function, defined below, constructs the ranked
dataset, which can be used to build a concentration curve.
# Write a function that create the dataset for a concentration curve
## Arguments:
### health_var: name of health outcome
### group_by_var: name of the socioeconomic exposure to group by (ordered from most- to least-disadvantaged)
### data: dataset where health_var and group_by_var can be found
make_cc_dat = function(health_var, group_by_var, data) {
data |>
group_by({{group_by_var}}) |>
summarize(num = n(),
sum_health = sum({{health_var}})) |>
arrange({{group_by_var}}) |>
mutate(cumsum_num = cumsum(num),
cumprop_num = cumsum_num / sum(num),
cumsum_health = cumsum(sum_health),
cumprop_health = cumsum_health / sum(sum_health)) |>
bind_rows(
data.frame(num = 0,
sum_health = 0,
cumsum_num = 0,
cumprop_num = 0,
cumsum_health = 0,
cumprop_health = 0)
) |>
mutate(CI = cumprop_num * lead(x = cumprop_health, n = 1, default = 0) -
cumprop_health * lead(x = cumprop_num, n = 1, default = 0))
}The calc_grouped_ci function, defined below, calculates the
concentration index with its 95% confidence interval from grouped data.
# Write a function that calculate the concentration index (C) from grouped data
## Arguments:
### health_var: name of health outcome
### group_by_var: name of the socioeconomic exposure to group by (ordered from most- to least-disadvantaged)
### data: dataset where health_var and group_by_var can be found
calc_grouped_ci = function(group_by_var, health_var, data) {
# Save useful constants
n = nrow(data) ## sample size
# Build summary dataset by group
cc_dat = data |>
group_by({{group_by_var}}) |>
summarize(num = dplyr::n(),
prop = num / n,
cases = sum({{health_var}}),
mean_health = mean({{health_var}}),
var_health = var({{health_var}})) |>
arrange({{group_by_var}}) |>
mutate(cumsum_num = cumsum(num),
cumprop_num = cumsum_num / sum(num),
cumsum_cases = cumsum(cases),
cumprop_cases = cumsum_cases / sum(cases)) |>
mutate(CI = cumprop_num * lead(x = cumprop_cases, n = 1, default = 0) -
cumprop_cases * lead(x = cumprop_num, n = 1, default = 0),
R = lag(x = cumprop_num, n = 1, default = 0) + 1 / 2 * prop)
# Compute concentration index
C = sum(cc_dat$CI)
# Compute variance for C
Ybar = data |>
pull({{health_var}}) |>
mean()
q = c(0, 1 / Ybar * cumsum(cc_dat$mean_health * cc_dat$prop)) ## augment with q0 = 0
a = (cc_dat$mean_health / Ybar) * (2 * cc_dat$R - 1 - C) + 2 - q[-length(q)] - q[-1]
varC = 1 / n * (sum(cc_dat$prop * a ^ 2) - (1 + C) ^ 2) +
1 / (n * Ybar ^ 2) * sum(cc_dat$prop * cc_dat$var_health * (2 * cc_dat$R - 1 - C) ^ 2)
# Return C with its 95% CI
to_return = c(C, C + c(-1.96, 1.96) * sqrt(varC))
names(to_return) = c("CI", "LB", "UB")
return(to_return)
}## Prevalence rate ratios based on straight-line proximity (binned)
tab_straight = with(ehr_food, table(STRAIGHT_PROXIMITY_BINNED, DIABETES))
rr_straight = data.frame(riskratio(tab_straight, rev = "rows")$measure)
rr_straight## estimate lower upper
## Up to 1 Mile 1.000000 NA NA
## Between 1-2 Miles 1.092593 0.6083895 1.962162
## Between 2-3 Miles 1.000692 0.4480019 2.235225
## Between 3-4 Miles 1.487140 0.6730990 3.285676
## Between 4-5 Miles 0.000000 0.0000000 NaN
## More than 5 Miles 1.443047 0.5806423 3.586346
## Prevalence rate ratios based on map-based proximity (binned)
tab_map = with(ehr_food, table(MAP_PROXIMITY_BINNED, DIABETES))
rr_map = data.frame(riskratio(tab_map, rev = "rows")$measure)
rr_map## estimate lower upper
## Up to 1 Mile 1.0000000 NA NA
## Between 1-2 Miles 1.3784461 0.7318271 2.596397
## Between 2-3 Miles 0.6613757 0.2459461 1.778510
## Between 3-4 Miles 1.0869565 0.4348702 2.716844
## Between 4-5 Miles 1.8779343 0.8299323 4.249307
## More than 5 Miles 0.9803922 0.4106589 2.340553
## Prevalence rate ratio based on straight-line proximity (unbinned)
logbin_rr(formula = DIABETES ~ STRAIGHT_PROXIMITY,
data = ehr_food)## RR LB UB
## 1.0012044 0.8569016 1.1427091
## Prevalence rate ratio based on map-based proximity (unbinned)
logbin_rr(formula = DIABETES ~ MAP_PROXIMITY,
data = ehr_food)## RR LB UB
## 1.0136741 0.9055229 1.1152475
## Relative index of inequality based on straight-line proximity (binned)
mod_rii(health_var = DIABETES,
group_by_var = STRAIGHT_PROXIMITY_BINNED,
data = ehr_food)## RII LB UB
## 1.1196108 0.4485444 2.7463805
## Relative index of inequality based on straight-line proximity (binned)
mod_rii(health_var = DIABETES,
group_by_var = MAP_PROXIMITY_BINNED,
data = ehr_food)## RII LB UB
## 1.0966057 0.4541682 2.6378360
## Define plot colors
cols = c("#ff99ff", "#8bdddb", "#787ff6", "#ffbd59", "#7dd5f6", "#ff914d")
## Make dataset for concentration curves (both distance types)
cc_dat = make_cc_dat(health_var = DIABETES,
group_by_var = STRAIGHT_PROXIMITY_BINNED,
data = ehr_food) |>
mutate(PROXIMITY = "Straight-Line (Binned)") |>
bind_rows(
make_cc_dat(health_var = DIABETES,
group_by_var = MAP_PROXIMITY_BINNED,
data = ehr_food) |>
mutate(PROXIMITY = "Map-Based (Binned)")
) |>
mutate(PROXIMITY = factor(x = PROXIMITY,
levels = c("Straight-Line (Binned)", "Map-Based (Binned)")))
## Use ggplot() to build concentration curves (both distance types)
cc_dat|>
ggplot(aes(x = cumprop_num, y = cumprop_health, color = PROXIMITY)) +
geom_abline(slope = 1, intercept = 0, linetype = 2, color = "black") +
geom_line(linewidth = 1.2, alpha = 1) +
coord_equal() +
theme_minimal(base_size = 14) +
labs(x = "Cumulative Proportion of Patients\n(Ranked by Proximity)",
y = "Cumulative Proportion of Diabetes Cases") +
theme(legend.text = element_text(color = "black"),
legend.title = element_text(color = "black", face = "bold"),
axis.title = element_text(color = "black", face = "bold"),
legend.position = "top") +
scale_x_continuous(labels = percent_format()) +
scale_y_continuous(labels = percent_format()) +
scale_color_manual(values = cols, name = "Distance")## Create negative proximity to rank from farthest (most disadvantaged) --> nearest (least disadvantaged)
ehr_food = ehr_food |>
dplyr::mutate(NEG_STRAIGHT_PROXIMITY = - STRAIGHT_PROXIMITY,
NEG_MAP_PROXIMITY = - MAP_PROXIMITY)
## Make dataset for concentration curves (both distance types)
cc_dat = make_cc_dat(health_var = DIABETES,
group_by_var = NEG_STRAIGHT_PROXIMITY,
data = ehr_food) |>
mutate(PROXIMITY = "Straight-Line") |>
bind_rows(
make_cc_dat(health_var = DIABETES,
group_by_var = NEG_MAP_PROXIMITY,
data = ehr_food) |>
mutate(PROXIMITY = "Map-Based")
) |>
mutate(PROXIMITY = factor(x = PROXIMITY,
levels = c("Straight-Line", "Map-Based")))
## Use ggplot() to build concentration curves (both distance types)
cc_dat |>
ggplot(aes(x = cumprop_num, y = cumprop_health, color = PROXIMITY)) +
geom_abline(slope = 1, intercept = 0, linetype = 2, color = "black") +
geom_line(linewidth = 1.2, alpha = 1) +
coord_equal() +
theme_minimal(base_size = 14) +
labs(x = "Cumulative Proportion of Patients\n(Ranked by Proximity)",
y = "Cumulative Proportion of Diabetes Cases") +
theme(axis.title = element_text(color = "black", face = "bold"),
legend.title = element_text(color = "black", face = "bold"),
legend.position = "top") +
scale_x_continuous(labels = percent_format()) +
scale_y_continuous(labels = percent_format()) +
scale_color_manual(values = cols, name = "Distance")## Concentration index based on straight-line proximity (binned)
calc_grouped_ci(group_by_var = STRAIGHT_PROXIMITY_BINNED,
health_var = DIABETES,
data = ehr_food)## CI LB UB
## -0.0168522 -0.1470394 0.1133350
## Concentration index based on map-based proximity (binned)
calc_grouped_ci(group_by_var = MAP_PROXIMITY_BINNED,
health_var = DIABETES,
data = ehr_food)## CI LB UB
## -0.01461898 -0.14701374 0.11777578
## Concentration index based on straight-line proximity (unbinned)
ci_straight = ci(ineqvar = ehr_food$NEG_STRAIGHT_PROXIMITY, ### Food environment exposure
outcome = ehr_food$DIABETES, ### Health outcome
type = "CIw", ### Wagstaff's correction for binary health outcome
robust_se = TRUE) ### Use sandwich standard errors for confidence interval
summary(ci_straight)## Call:
## [1] "ci(ineqvar = ehr_food$NEG_STRAIGHT_PROXIMITY, outcome = ehr_food$DIABETES, "
## [2] " type = \"CIw\", robust_se = TRUE)"
##
## Type of Concentration Index:
## CIw
##
## Health Concentration Index:
## -0.003078738
##
## Variance:
## 0.005611874
##
## 95% Confidence Interval:
## -0.1499044 0.1437469
## Concentration index based on map-based proximity (unbinned)
ci_map = ci(ineqvar = ehr_food$NEG_MAP_PROXIMITY, ### Food environment exposure
outcome = ehr_food$DIABETES, ### Health outcome
type = "CIw", ### Wagstaff's correction for binary health outcome
robust_se = TRUE) ### Use sandwich standard errors for confidence interval
summary(ci_map)## Call:
## [1] "ci(ineqvar = ehr_food$NEG_MAP_PROXIMITY, outcome = ehr_food$DIABETES, "
## [2] " type = \"CIw\", robust_se = TRUE)"
##
## Type of Concentration Index:
## CIw
##
## Health Concentration Index:
## -0.02026068
##
## Variance:
## 0.00553609
##
## 95% Confidence Interval:
## -0.1660916 0.1255703
## Prevalence rate ratios based on survey estimate food insecurity (binned)
tab_est = with(ehr_food, table(EST_FOOD_INSECURITY_BINNED, DIABETES))
rr_est = data.frame(riskratio(tab_est, rev = "rows")$measure)
rr_est## estimate lower upper
## Up to 10% 1.0000000 NA NA
## Between 10-12.5% 0.3641975 0.0834310 1.589815
## Between 12.5-15% 0.9600592 0.4354942 2.116477
## Between 15-17.5% 0.8750000 0.3548332 2.157704
## Between 17.5-20% 1.2178899 0.5307070 2.794868
## More than 20% 1.2018519 0.6104945 2.366029
## Prevalence rate ratios based on lower bound food insecurity (binned)
tab_lb = with(ehr_food, table(LB_FOOD_INSECURITY_BINNED, DIABETES))
rr_lb = data.frame(riskratio(tab_lb, rev = "rows")$measure)
rr_lb## estimate lower upper
## Up to 10% 1.0000000 NA NA
## Between 10-12.5% 0.4071730 0.1339359 1.237829
## Between 12.5-15% 1.3864943 0.6675526 2.879723
## Between 15-17.5% 0.8935185 0.2087473 3.824602
## Between 17.5-20% 1.0767085 0.5219766 2.220983
## More than 20% 1.8158602 0.8687047 3.795707
## Prevalence rate ratios based on upper bound food insecurity (binned)
tab_ub = with(ehr_food, table(UB_FOOD_INSECURITY_BINNED, DIABETES))
rr_ub = data.frame(riskratio(tab_ub)$measure)
rr_ub## estimate lower upper
## More than 20% 1.0000000 NA NA
## Between 17.5-20% 1.0444444 0.49484619 2.204451
## Between 15-17.5% 0.8837607 0.45442442 1.718730
## Between 12.5-15% 0.2433657 0.05913981 1.001472
## Between 10-12.5% 0.5632959 0.20365731 1.558020
## Up to 10% 0.9641026 0.45584225 2.039069
## Prevalence rate ratio based on survey estimate food insecurity (unbinned)
logbin_rr(formula = DIABETES ~ EST_FOOD_INSECURITY,
data = ehr_food)## RR LB UB
## 1.0325328 0.9857904 1.0819680
## Prevalence rate ratio based on lower bound food insecurity (unbinned)
logbin_rr(formula = DIABETES ~ LB_FOOD_INSECURITY,
data = ehr_food)## RR LB UB
## 1.0356549 0.9852063 1.0891817
## Prevalence rate ratio based on upper bound food insecurity (unbinned)
logbin_rr(formula = DIABETES ~ UB_FOOD_INSECURITY,
data = ehr_food)## RR LB UB
## 1.0302874 0.9867722 1.0761693
## Relative index of inequality based on survey estimate food insecurity (binned)
mod_rii(health_var = DIABETES,
group_by_var = EST_FOOD_INSECURITY_BINNED,
data = ehr_food)## RII LB UB
## 1.6530293 0.6871525 4.0523816
## Relative index of inequality based on lower bound food insecurity (binned)
mod_rii(health_var = DIABETES,
group_by_var = LB_FOOD_INSECURITY_BINNED,
data = ehr_food)## RII LB UB
## 2.2605885 0.9371285 5.5732896
## Relative index of inequality based on upper bound food insecurity (binned)
mod_rii(health_var = DIABETES,
group_by_var = UB_FOOD_INSECURITY_BINNED,
data = ehr_food)## RII LB UB
## 1.8074586 0.7338411 4.6124180
## Make dataset for concentration curves (all values)
cc_dat = make_cc_dat(group_by_var = LB_FOOD_INSECURITY_BINNED,
health_var = DIABETES,
data = ehr_food) |>
mutate(FOOD_INSECURITY = "Lower Bound (Binned)") |>
bind_rows(
make_cc_dat(group_by_var = EST_FOOD_INSECURITY_BINNED,
health_var = DIABETES,
data = ehr_food) |>
mutate(FOOD_INSECURITY = "Survey Estimate (Binned)")
) |>
bind_rows(
make_cc_dat(group_by_var = UB_FOOD_INSECURITY_BINNED,
health_var = DIABETES,
data = ehr_food) |>
mutate(FOOD_INSECURITY = "Upper Bound (Binned)")
) |>
mutate(FOOD_INSECURITY = factor(x = FOOD_INSECURITY,
levels = c("Lower Bound (Binned)",
"Survey Estimate (Binned)",
"Upper Bound (Binned)")))
## Use ggplot() to build concentration curves (both distance types)
cc_dat |>
ggplot(aes(x = cumprop_num, y = cumprop_health, color = FOOD_INSECURITY)) +
geom_abline(slope = 1, intercept = 0, linetype = 2, color = "black") +
geom_line(linewidth = 1.2, alpha = 1) +
coord_equal() +
theme_minimal(base_size = 14) +
labs(x = "Cumulative Proportion of Patients\n(Ranked by Food Insecurity)",
y = "Cumulative Proportion of Diabetes Cases") +
theme(axis.title = element_text(color = "black", face = "bold"),
legend.title = element_text(color = "black", face = "bold"),
legend.position = "top") +
scale_x_continuous(labels = percent_format()) +
scale_y_continuous(labels = percent_format()) +
scale_color_manual(values = rev(cols[-length(cols)]), name = "Distance")## Create negative neighborhood food insecurity to rank from highest (most disadvantaged) --> lowest (least disadvantaged)
ehr_food = ehr_food |>
dplyr::mutate(NEG_EST_FOOD_INSECURITY = - EST_FOOD_INSECURITY,
NEG_LB_FOOD_INSECURITY = - LB_FOOD_INSECURITY,
NEG_UB_FOOD_INSECURITY = - UB_FOOD_INSECURITY)
## Make dataset for concentration curves (all values)
cc_dat = make_cc_dat(group_by_var = NEG_LB_FOOD_INSECURITY,
health_var = DIABETES,
data = ehr_food) |>
mutate(FOOD_INSECURITY = "Lower Bound") |>
bind_rows(
make_cc_dat(group_by_var = NEG_EST_FOOD_INSECURITY,
health_var = DIABETES,
data = ehr_food) |>
mutate(FOOD_INSECURITY = "Survey Estimate")
) |>
bind_rows(
make_cc_dat(group_by_var = NEG_UB_FOOD_INSECURITY,
health_var = DIABETES,
data = ehr_food) |>
mutate(FOOD_INSECURITY = "Upper Bound")
) |>
mutate(FOOD_INSECURITY = factor(x = FOOD_INSECURITY,
levels = c("Lower Bound",
"Survey Estimate",
"Upper Bound")))
## Use ggplot() to build concentration curves (both distance types)
cc_dat |>
ggplot(aes(x = cumprop_num, y = cumprop_health, color = FOOD_INSECURITY)) +
geom_abline(slope = 1, intercept = 0, linetype = 2, color = "black") +
geom_line(linewidth = 1.2, alpha = 1) +
coord_equal() +
theme_minimal(base_size = 14) +
labs(x = "Cumulative Proportion of Patients\n(Ranked by Food Insecurity)",
y = "Cumulative Proportion of Diabetes Cases") +
theme(axis.title = element_text(color = "black", face = "bold"),
legend.title = element_text(color = "black", face = "bold"),
legend.position = "top") +
scale_x_continuous(labels = percent_format()) +
scale_y_continuous(labels = percent_format()) +
scale_color_manual(values = rev(cols[-length(cols)]), name = "Distance")## Concentration index based on survey estimate food insecurity (binned)
calc_grouped_ci(group_by_var = EST_FOOD_INSECURITY_BINNED,
health_var = DIABETES,
data = ehr_food)## CI LB UB
## -0.07950251 -0.21598215 0.05697713
## Concentration index based on lower bound food insecurity (binned)
calc_grouped_ci(group_by_var = LB_FOOD_INSECURITY_BINNED,
health_var = DIABETES,
data = ehr_food)## CI LB UB
## -0.128908129 -0.267642306 0.009826048
## Concentration index based on upper bound food insecurity (binned)
calc_grouped_ci(group_by_var = UB_FOOD_INSECURITY_BINNED,
health_var = DIABETES,
data = ehr_food)## CI LB UB
## -0.08888202 -0.22069648 0.04293245
## Concentration index based on survey estimate food insecurity (unbinned)
ci_est = ci(ineqvar = ehr_food$NEG_EST_FOOD_INSECURITY, ### Food environment exposure
outcome = ehr_food$DIABETES, ### Health outcome
type = "CIw", ### Wagstaff's correction for binary health outcome
robust_se = TRUE) ### Use sandwich standard errors for confidence interval
summary(ci_est)## Call:
## [1] "ci(ineqvar = ehr_food$NEG_EST_FOOD_INSECURITY, outcome = ehr_food$DIABETES, "
## [2] " type = \"CIw\", robust_se = TRUE)"
##
## Type of Concentration Index:
## CIw
##
## Health Concentration Index:
## -0.1603709
##
## Variance:
## 0.006078574
##
## 95% Confidence Interval:
## -0.3131799 -0.007561918
## Concentration index based on lower bound food insecurity (unbinned)
ci_lb = ci(ineqvar = ehr_food$NEG_LB_FOOD_INSECURITY, ### Food environment exposure
outcome = ehr_food$DIABETES, ### Health outcome
type = "CIw", ### Wagstaff's correction for binary health outcome
robust_se = TRUE) ### Use sandwich standard errors for confidence interval
summary(ci_lb)## Call:
## [1] "ci(ineqvar = ehr_food$NEG_LB_FOOD_INSECURITY, outcome = ehr_food$DIABETES, "
## [2] " type = \"CIw\", robust_se = TRUE)"
##
## Type of Concentration Index:
## CIw
##
## Health Concentration Index:
## -0.1602603
##
## Variance:
## 0.006087245
##
## 95% Confidence Interval:
## -0.3131783 -0.007342358
## Concentration index based on upper bound food insecurity (unbinned)
ci_ub = ci(ineqvar = ehr_food$NEG_UB_FOOD_INSECURITY, ### Food environment exposure
outcome = ehr_food$DIABETES, ### Health outcome
type = "CIw", ### Wagstaff's correction for binary health outcome
robust_se = TRUE) ### Use sandwich standard errors for confidence interval
summary(ci_ub)## Call:
## [1] "ci(ineqvar = ehr_food$NEG_UB_FOOD_INSECURITY, outcome = ehr_food$DIABETES, "
## [2] " type = \"CIw\", robust_se = TRUE)"
##
## Type of Concentration Index:
## CIw
##
## Health Concentration Index:
## -0.1601128
##
## Variance:
## 0.006077284
##
## 95% Confidence Interval:
## -0.3129056 -0.007320041



