Weighted tables? #13

Open
tylcole opened this Issue Jan 19, 2019 · 10 comments

tylcole commented Jan 19, 2019

Would it be difficult to incorporate weighting into this package? This would be incredibly helpful for both administrative datasets and propensity score matching reporting. Thanks!

Owner

ewenharrison commented Jan 19, 2019

There are lots of different glm() options that could be added: Poisson, weights and propensity scores, as you rightly say. Rather than make things more complicated, ff_merge() allows any table to be created. Here are some examples of going from a simple logistic regression model to a weighted model.

Standard approach:

library(finalfit)

explanatory = c("age.factor", "age", "sex.factor", "nodes", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
	finalfit(dependent, explanatory)

Build each column separately approach:

colon_s %>% 
	summary_factorlist(dependent, explanatory, fit_id=TRUE) %>% 
	ff_merge(
		glmuni(colon_s, dependent, explanatory) %>% 
			fit2df(estimate_suffix = " (univariable)")
	) %>% 
	ff_merge(
		glmmulti(colon_s, dependent, explanatory) %>%
			fit2df(estimate_suffix = " (multivariable)")
	) %>% 
	dplyr::select(-fit_id, -index) %>% 
	dependent_label(colon_s, dependent)

Incorporate more complex model:

colon_s %>% 
	summary_factorlist(dependent, explanatory, fit_id=TRUE) %>% 
	ff_merge(
		glmuni(colon_s, dependent, explanatory) %>% 
			fit2df(estimate_suffix = " (univariable)")
	) %>% 
	ff_merge(
		# weights = NULL fits an unweighted model; supply a numeric vector to weight it
		glm(ff_formula(dependent, explanatory), data=colon_s, 
				family = "binomial", weights = NULL) %>% 
			fit2df(estimate_suffix = " (multivariable)")
	) %>% 
	dplyr::select(-fit_id, -index) %>% 
	dependent_label(colon_s, dependent)

Let me know what you think.

Author

tylcole commented Jan 24, 2019

That's great for multivariate models, thanks!

What do you think about weighted univariate stats? Would that require using different functions within the glmuni() function?
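
A weighted univariable analysis can be pictured as one weighted glm() per explanatory variable. The sketch below uses only base R and simulated data; the data frame `d`, its columns and the integer weights are made up for illustration, and this is not a description of how glmuni() is implemented.

```r
# Simulated data (assumption: stands in for a real dataset)
set.seed(42)
n = 150
d = data.frame(age = rnorm(n, 60, 10), nodes = rpois(n, 3))
d$died = rbinom(n, 1, plogis(-3 + 0.04 * d$age + 0.1 * d$nodes))
d$wgts = sample(1:4, n, replace = TRUE)  # made-up integer case weights

dependent = "died"
explanatory = c("age", "nodes")

# Fit one weighted logistic regression per explanatory variable
uni_fits = lapply(explanatory, function(v) {
  glm(reformulate(v, response = dependent), data = d,
      family = binomial, weights = wgts)
})
names(uni_fits) = explanatory

# Odds ratio for each explanatory variable, taken from its own model
uni_or = sapply(uni_fits, function(fit) exp(coef(fit)[[2]]))
uni_or
```

The same weights vector is reused in every fit, so no different modelling function is needed per variable, only a way to pass the weights through.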

ewenharrison commented Mar 25, 2019

lmuni(), lmmulti(), glmuni() and glmmulti() now support weights.

library(finalfit)

explanatory = c("age.factor", "age", "sex.factor", "nodes", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
# Random weights for illustration only; use real weights in practice
wgts = runif(nrow(colon_s), 0, 1)

colon_s %>% 
	summary_factorlist(dependent, explanatory, fit_id=TRUE) %>% 
	ff_merge(
		glmuni(colon_s, dependent, explanatory, weights = wgts) %>% 
			fit2df(estimate_suffix = " (univariable)")
	) %>% 
	ff_merge(
		glm(ff_formula(dependent, explanatory), data=colon_s, 
				family = "binomial", weights = wgts) %>% 
			fit2df(estimate_suffix = " (multivariable)")
	) %>% 
	dplyr::select(-fit_id, -index) %>% 
	dependent_label(colon_s, dependent)
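
It may help to be explicit about what the weights argument of glm() means. For a binomial model, base R treats the weights as case (frequency) weights, so a fit with integer weights gives the same coefficients as a fit on data where each row is repeated that many times. The sketch below checks this on simulated data (all names are made up for illustration). Sampling or propensity weights generally need design-based standard errors instead, which plain glm() does not provide.

```r
# Simulated data with integer case weights (assumption: illustration only)
set.seed(1)
n = 200
d = data.frame(x = rnorm(n))
d$y = rbinom(n, 1, plogis(-0.5 + 0.8 * d$x))
d$w = sample(1:3, n, replace = TRUE)

# Weighted fit on the original rows
fit_w = glm(y ~ x, data = d, family = "binomial", weights = w)

# Unweighted fit on data with each row repeated w times
d_long = d[rep(seq_len(n), times = d$w), ]
fit_long = glm(y ~ x, data = d_long, family = "binomial")

# The coefficients agree up to numerical tolerance
all.equal(coef(fit_w), coef(fit_long), tolerance = 1e-4)
```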

tylcole commented Mar 25, 2019

ewenharrison commented Mar 25, 2019

Thanks.
Could you provide an example of what you suggest?

tylcole commented Apr 1, 2019

ewenharrison added a commit that referenced this issue Apr 3, 2019

ewenharrison commented Apr 3, 2019

Hi,

I don't know too much about this area. I've added these functions and would appreciate it if you could let me know whether this is what you were thinking of. Also, are the examples correct?

Many thanks for your help with this.

Ewen

library(finalfit)
library(dplyr)
library(survey)

# Examples from survey::svyglm() help page

data(api)

# Label data frame
apistrat = apistrat %>% 
  mutate(
    api00 = ff_label(api00, "API in 2000 (api00)"),
    ell = ff_label(ell, "English language learners (percent)(ell)"),
    meals = ff_label(meals, "Meals eligible (percent)(meals)"),
    mobility = ff_label(mobility, "First year at the school (percent)(mobility)"),
    sch.wide = ff_label(sch.wide, "School-wide target met (sch.wide)")
    )
		
# Linear example
dependent = "api00"
explanatory = c("ell", "meals", "mobility")

# Stratified design
dstrat = svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

# Univariable fit
fit_uni = dstrat %>%
    svyglmuni(dependent, explanatory) %>%
    fit2df(estimate_suffix = " (univariable)")

# Multivariable fit
fit_multi = dstrat %>%
    svyglmmulti(dependent, explanatory) %>%
    fit2df(estimate_suffix = " (multivariable)")
	
# Pipe together
apistrat %>%
    summary_factorlist(dependent, explanatory, fit_id = TRUE) %>%
    ff_merge(fit_uni) %>% 
    ff_merge(fit_multi) %>% 
    select(-fit_id, -index) %>%
    dependent_label(apistrat, dependent)

#               Dependent: API in 2000 (api00)             Mean (sd)    Coefficient (univariable)  Coefficient (multivariable)
#     English language learners (percent)(ell)  [0,84] 652.8 (121.0) -3.73 (-4.35--3.11, p<0.001)  -0.48 (-1.25-0.29, p=0.222)
#              Meals eligible (percent)(meals) [0,100] 652.8 (121.0) -3.38 (-3.71--3.05, p<0.001) -3.14 (-3.70--2.59, p<0.001)
# First year at the school (percent)(mobility)  [1,99] 652.8 (121.0)  -1.43 (-3.30-0.44, p=0.137)   0.23 (-0.54-1.00, p=0.567)

# Binomial example
## Note: the model family needs to be specified, and exponentiation requested if desired
dependent = "sch.wide"
explanatory = c("ell", "meals", "mobility")

# Univariable fit
fit_uni = dstrat %>%
    svyglmuni(dependent, explanatory, family = "quasibinomial") %>%
    fit2df(exp = TRUE, estimate_name = "OR", estimate_suffix = " (univariable)")

# Multivariable fit
fit_multi = dstrat %>%
    svyglmmulti(dependent, explanatory, family = "quasibinomial") %>%
    fit2df(exp = TRUE, estimate_name = "OR", estimate_suffix = " (multivariable)")

# Pipe together
apistrat %>%
    summary_factorlist(dependent, explanatory, fit_id = TRUE) %>%
    ff_merge(fit_uni) %>% 
    ff_merge(fit_multi) %>% 
    select(-fit_id, -index) %>%
    dependent_label(apistrat, dependent)

# Dependent: School-wide target met (sch.wide)                    No         Yes          OR (univariable)        OR (multivariable)
#     English language learners (percent)(ell) Mean (SD) 22.5 (19.3) 20.5 (20.0) 1.00 (0.98-1.01, p=0.715) 1.00 (0.97-1.02, p=0.851)
#              Meals eligible (percent)(meals) Mean (SD) 46.0 (29.1) 44.7 (29.0) 1.00 (0.99-1.01, p=0.968) 1.00 (0.98-1.01, p=0.732)
# First year at the school (percent)(mobility) Mean (SD)  13.9 (8.6) 17.2 (13.0) 1.06 (1.00-1.12, p=0.049) 1.06 (1.00-1.13, p=0.058)

ewenharrison reopened this Apr 3, 2019

tylcole commented Apr 4, 2019

That's fantastic, I should have time in the next week to go through this and confirm. I will also test on my own data. I'll respond as soon as I do. Thank you for this great update!

ewenharrison commented Apr 4, 2019

Thanks. One issue is that the summary statistics from summary_factorlist() are not weighted. They cannot be weighted using this function. So to be useful it might be necessary to incorporate svymean() or similar.
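
As a rough sketch of what the weighted summary column could look like for the linear example, survey::svymean() and survey::svyvar() can be applied to the same stratified design. The object names (`wt_mean`, `wt_sd`, `weighted_summary`) and the "mean (SD)" formatting are assumptions for illustration, not part of finalfit.

```r
library(survey)

# Same data and stratified design as the svyglm() examples
data(api)
dstrat = svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc)

vars = c("ell", "meals", "mobility")

# Design-weighted means of all three variables in one call
wt_mean = coef(svymean(~ell + meals + mobility, dstrat))

# Design-weighted SD: square root of the weighted variance, one variable at a time
wt_sd = sapply(vars, function(v) {
  sqrt(coef(svyvar(as.formula(paste0("~", v)), dstrat))[[1]])
})

# Format as "mean (SD)" strings, ready to sit next to the model columns
weighted_summary = setNames(sprintf("%.1f (%.1f)", wt_mean[vars], wt_sd), vars)
weighted_summary
```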

It would be useful to know how you would actually use this: what your final table would look like, what variations would be common in practice, and so on. Best wishes.

tylcole commented Apr 18, 2019

Ewen, I just took a look and that looks correct, though as you mention, a key feature that would be really nice is weighted means and SDs in the table using svymean(). The final table would be just like the one you are demonstrating, except with weighted means/SDs.
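
For a table split by a categorical dependent variable, one possible way to get weighted means and SDs per group is to subset the survey design by each level and summarise within it. This is only a sketch under that assumption: `weighted_mean_sd()` is a hypothetical helper written for this example, not a finalfit or survey function.

```r
library(survey)

data(api)
dstrat = svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc)

# Hypothetical helper: weighted "mean (SD)" string for one variable in a design
weighted_mean_sd = function(design, variable) {
  f = as.formula(paste0("~", variable))
  m = coef(svymean(f, design))[[1]]
  s = sqrt(coef(svyvar(f, design))[[1]])
  sprintf("%.1f (%.1f)", m, s)
}

explanatory = c("ell", "meals", "mobility")

# One sub-design per level of the dependent variable
design_no = subset(dstrat, sch.wide == "No")
design_yes = subset(dstrat, sch.wide == "Yes")

# Weighted summary columns, one row per explanatory variable
weighted_summary = data.frame(
  label = explanatory,
  No = sapply(explanatory, weighted_mean_sd, design = design_no),
  Yes = sapply(explanatory, weighted_mean_sd, design = design_yes),
  row.names = NULL
)
weighted_summary
```

The resulting columns could replace the unweighted Mean (SD) columns before merging in the svyglmuni() and svyglmmulti() results.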

This is a basic example of what those tables would look like in a paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5624560/pdf/cureus-0009-00000001536.pdf
