Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weighted tables? #13

Open
tylcole opened this issue Jan 19, 2019 · 27 comments
Open

Weighted tables? #13

tylcole opened this issue Jan 19, 2019 · 27 comments

Comments

@tylcole
Copy link

@tylcole tylcole commented Jan 19, 2019

Would it be difficult to incorporating weighting into this package? This would be incredibly helpful for both administrative datasets as well as propensity score matching reporting. Thanks!

@ewenharrison
Copy link
Owner

@ewenharrison ewenharrison commented Jan 19, 2019

There are lots of different glm() options that could be added. Poisson/weights/propensity score as you rightly say. Rather than make things more complicated, ff_merge() allows any table to be created. Here are some examples of going from a simple logistic regression model, to a weighted model.

Standard approach:

explanatory = c("age.factor", "age", "sex.factor", "nodes", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
	finalfit(dependent, explanatory)

Build each column separately approach:

colon_s %>% 
	summary_factorlist(dependent, explanatory, fit_id=TRUE) %>% 
	ff_merge(
		glmuni(colon_s, dependent, explanatory) %>% 
			fit2df(estimate_suffix = " (univariable)")
	) %>% 
	ff_merge(
		glmmulti(colon_s, dependent, explanatory) %>%
			fit2df(estimate_suffix = " (multivariable)")
	) %>% 
	dplyr::select(-fit_id, -index) %>% 
	dependent_label(colon_s, dependent)

Incorporate more complex model:

colon_s %>% 
	summary_factorlist(dependent, explanatory, fit_id=TRUE) %>% 
	ff_merge(
		glmuni(colon_s, dependent, explanatory) %>% 
			fit2df(estimate_suffix = " (univariable)")
	) %>% 
	ff_merge(
		glm(ff_formula(dependent, explanatory), data=colon_s, 
				family = "binomial", weights = NULL) %>% 
			fit2df(estimate_suffix = " (multivariable)")
	) %>% 
	dplyr::select(-fit_id, -index) %>% 
	dependent_label(colon_s, dependent)

Let me know what you think.

@tylcole
Copy link
Author

@tylcole tylcole commented Jan 24, 2019

That's great for multivariate models, thanks!

What do you think about weighted univariate stats? Would that require using different functions within the glmuni() function?

@ewenharrison
Copy link
Owner

@ewenharrison ewenharrison commented Mar 25, 2019

lmuni(), lmmulti(), glmuni() and glmmulti() now supported weights.

explanatory = c("age.factor", "age", "sex.factor", "nodes", "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
wgts = runif(dim(colon_s)[1], 0, 1)

library(finalfit)
colon_s %>% 
	summary_factorlist(dependent, explanatory, fit_id=TRUE) %>% 
	ff_merge(
		glmuni(colon_s, dependent, explanatory, weights = wgts) %>% 
			fit2df(estimate_suffix = " (univariable)")
	) %>% 
	ff_merge(
		glm(ff_formula(dependent, explanatory), data=colon_s, 
				family = "binomial", weights = wgts) %>% 
			fit2df(estimate_suffix = " (multivariable)")
	) %>% 
	dplyr::select(-fit_id, -index) %>% 
	dependent_label(colon_s, dependent)
@tylcole
Copy link
Author

@tylcole tylcole commented Mar 25, 2019

@ewenharrison
Copy link
Owner

@ewenharrison ewenharrison commented Mar 25, 2019

Thanks.
Could you provide an example of what you suggest.

@tylcole
Copy link
Author

@tylcole tylcole commented Apr 1, 2019

ewenharrison added a commit that referenced this issue Apr 3, 2019
@ewenharrison
Copy link
Owner

@ewenharrison ewenharrison commented Apr 3, 2019

Hi,

I don't know too much about this area. I've added these functions and would appreciate if you could let me know if this is what you were thinking of. Also, are the examples correct.

Many thanks for your help with this.

Ewen

library(dplyr)
library(survey)

# Examples from survey::svyglm() help page

data(api)

# Label data frame
apistrat = apistrat %>% 
  mutate(
    api00 = ff_label(api00, "API in 2000 (api00)"),
    ell = ff_label(ell, "English language learners (percent)(ell)"),
    meals = ff_label(meals, "Meals eligible (percent)(meals)"),
    mobility = ff_label(mobility, "First year at the school (percent)(mobility)"),
    sch.wide = ff_label(sch.wide, "School-wide target met (sch.wide)")
    )
		
# Linear example
dependent = "api00"
explanatory = c("ell", "meals", "mobility")

# Stratified design
dstrat = svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

# Univariable fit
fit_uni = dstrat %>%
    svyglmuni(dependent, explanatory) %>%
    fit2df(estimate_suffix = " (univariable)")

# Multivariable fit
fit_multi = dstrat %>%
    svyglmmulti(dependent, explanatory) %>%
    fit2df(estimate_suffix = " (multivariable)")
	
# Pipe together
apistrat %>%
    summary_factorlist(dependent, explanatory, fit_id = TRUE) %>%
    ff_merge(fit_uni) %>% 
    ff_merge(fit_multi) %>% 
    select(-fit_id, -index) %>%
    dependent_label(apistrat, dependent)

#               Dependent: API in 2000 (api00)             Mean (sd)    Coefficient (univariable)  Coefficient (multivariable)
#     English language learners (percent)(ell)  [0,84] 652.8 (121.0) -3.73 (-4.35--3.11, p<0.001)  -0.48 (-1.25-0.29, p=0.222)
#              Meals eligible (percent)(meals) [0,100] 652.8 (121.0) -3.38 (-3.71--3.05, p<0.001) -3.14 (-3.70--2.59, p<0.001)
# First year at the school (percent)(mobility)  [1,99] 652.8 (121.0)  -1.43 (-3.30-0.44, p=0.137)   0.23 (-0.54-1.00, p=0.567)

# Binomial example
## Note model family needs specified and exponentiation if desired
dependent = "sch.wide"
explanatory = c("ell", "meals", "mobility")

# Univariable fit
fit_uni = dstrat %>%
    svyglmuni(dependent, explanatory, family = "quasibinomial") %>%
    fit2df(exp = TRUE, estimate_name = "OR", estimate_suffix = " (univariable)")

# Multivariable fit
fit_multi = dstrat %>%
    svyglmmulti(dependent, explanatory, family = "quasibinomial") %>%
    fit2df(exp = TRUE, estimate_name = "OR", estimate_suffix = " (multivariable)")

# Pipe together
apistrat %>%
    summary_factorlist(dependent, explanatory, fit_id = TRUE) %>%
    ff_merge(fit_uni) %>% 
    ff_merge(fit_multi) %>% 
    select(-fit_id, -index) %>%
    dependent_label(apistrat, dependent)

# Dependent: School-wide target met (sch.wide)                    No         Yes          OR (univariable)        OR (multivariable)
#     English language learners (percent)(ell) Mean (SD) 22.5 (19.3) 20.5 (20.0) 1.00 (0.98-1.01, p=0.715) 1.00 (0.97-1.02, p=0.851)
#              Meals eligible (percent)(meals) Mean (SD) 46.0 (29.1) 44.7 (29.0) 1.00 (0.99-1.01, p=0.968) 1.00 (0.98-1.01, p=0.732)
# First year at the school (percent)(mobility) Mean (SD)  13.9 (8.6) 17.2 (13.0) 1.06 (1.00-1.12, p=0.049) 1.06 (1.00-1.13, p=0.058)
@ewenharrison ewenharrison reopened this Apr 3, 2019
@tylcole
Copy link
Author

@tylcole tylcole commented Apr 4, 2019

That's fantastic, I should have time in the next week to go through this and confirm. I will also test on my own data. I'll respond as soon as I do. Thank you for this great update!

@ewenharrison
Copy link
Owner

@ewenharrison ewenharrison commented Apr 4, 2019

Thanks. One issue is that the summary statistics from summary_factorlist() are not weighted. They cannot be weighted using this function. So to be useful it might be necessary to incorporate svymean() or similar.

What would be useful is to know how you would actually use this. What your final table would actually look like. What variations would be common in practice etc. Best wishes.

@tylcole
Copy link
Author

@tylcole tylcole commented Apr 18, 2019

Ewen I just took a look and that looks correct, though like you mention a key feature that would be really nice is the weighted means and SD in the table using svymean(). The final table would be just like you are demonstrating, except weighted mean/SD.

This is a basic example of what those tables would look like in a paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5624560/pdf/cureus-0009-00000001536.pdf

@larmarange
Copy link

@larmarange larmarange commented Oct 24, 2019

Ideally, summary_factorlist() should test if .data is a svydesign object. In such case, all stats could be computed using survey functions. i.e. svytable, svymean, svychisq, etc.

The desired table is the same as with unweighted data. It's just that the computation of them should used survey functions.

@ewenharrison
Copy link
Owner

@ewenharrison ewenharrison commented Oct 24, 2019

Thanks @larmarange
Is it worth a new function? Would people use it?

@larmarange
Copy link

@larmarange larmarange commented Oct 25, 2019

Dear @ewenharrison thank you very much for your feedback. You have done a fantastic job on finalfit.

I just spent 3 days teaching R to master students and teachers in population studies. They were very impressed by summary_fatorlist(), finalfit() and or_plot() as they are very easy to use.

In population studies, most of them are working with national surveys. Such surveys have complex sampling and/or post-calibration. Almost all of them required to use the survey package.

When I presented multivariate analysis using survey, they were very frustrated not to be able to use finalfit.

I would say it's really worth to extend summary_facorlist() to be able to deal with a survey design object instead of a data.frame and to use therefore survey functions (such as svytable(), svymean(), svychisq()...) to generate the table.

Similarly, if summary_factorlist() is adaped to manage survey.design class (it's simple to test if an object inherits that class to load a specific function), then it should be quite easy to adapt finalfit() and or_fit() as well.

By the way, just for you to know, survey also provide a svycoxph() function.

@ewenharrison
Copy link
Owner

@ewenharrison ewenharrison commented Oct 25, 2019

Ok, well if you could point me to a suitable dataset and provide 2 or 3 example analyses using survey, together with the table output you would like to to see then I can have a look.
Many of the examples in library(survey) help seem very complex and difficult to generalise.

@larmarange
Copy link

@larmarange larmarange commented Oct 26, 2019

Please find below an exemple using a dataset provided by questionr package.

library(questionr)
data(hdv2003)

library(dplyr)
#> 
#> Attachement du package : 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
hdv2003$etud <- recode_factor(
  hdv2003$nivetud,
  "N'a jamais fait d'etudes" = "Primary",
  "A arrete ses etudes, avant la derniere annee d'etudes primaires" = "Primary",
  "Derniere annee d'etudes primaires" = "Primary",
  "1er cycle" = "Secondary",
  "2eme cycle" = "Secondary",
  "Enseignement technique ou professionnel court" = "Professionnal",
  "Enseignement technique ou professionnel long" = "Professionnal",
  "Enseignement superieur y compris technique superieur" = "High"
)

library(survey)
#> Le chargement a nécessité le package : grid
#> Le chargement a nécessité le package : Matrix
#> Le chargement a nécessité le package : survival
#> 
#> Attachement du package : 'survey'
#> The following object is masked from 'package:graphics':
#> 
#>     dotchart
dw <- svydesign(ids = ~ 1, weights = ~ poids, data = hdv2003)

# Note: summary.formula not compatible with svydesign objects

# dependent variable
freq(svytable(~ sport, dw))
#>           n    % val%
#> Non 6714760 60.7 60.7
#> Oui 4356466 39.3 39.3

# binary explanatory
tab <- svytable(~ sexe + sport, dw)
tab
#>        sport
#> sexe        Non     Oui
#>   Homme 2850210 2299173
#>   Femme 3864550 2057293
rprop(tab)
#>           sport
#> sexe       Non   Oui   Total
#>   Homme     55.4  44.6 100.0
#>   Femme     65.3  34.7 100.0
#>   Ensemble  60.7  39.3 100.0
svychisq(~ sexe + sport, dw)
#> 
#>  Pearson's X^2: Rao & Scott adjustment
#> 
#> data:  svychisq(~sexe + sport, dw)
#> F = 12.444, ndf = 1, ddf = 1999, p-value = 0.0004289

# explanotory with more modalities
tab <- svytable(~ etud + sport, dw)
tab
#>                sport
#> etud                  Non       Oui
#>   Primary       1978619.6  234477.9
#>   Secondary     1428124.8  671021.2
#>   Professionnal 2050834.3 1315064.1
#>   High          1071009.1 1504802.3
rprop(tab)
#>                sport
#> etud            Non   Oui   Total
#>   Primary        89.4  10.6 100.0
#>   Secondary      68.0  32.0 100.0
#>   Professionnal  60.9  39.1 100.0
#>   High           41.6  58.4 100.0
#>   Ensemble       63.7  36.3 100.0
svychisq(~ etud + sport, dw)
#> 
#>  Pearson's X^2: Rao & Scott adjustment
#> 
#> data:  svychisq(~etud + sport, dw)
#> F = 45.593, ndf = 2.9831, ddf = 5963.2106, p-value < 2.2e-16

# quantitative explanatory
svyby(~ heures.tv, ~ sport, dw, svymean, na.rm = TRUE) # mean
#>     sport heures.tv         se
#> Non   Non  2.417582 0.06689688
#> Oui   Oui  1.785640 0.06573058
svyttest(heures.tv ~ sport, dw)
#> 
#>  Design-based t-test
#> 
#> data:  heures.tv ~ sport
#> t = -6.7382, df = 1993, p-value = 2.092e-11
#> alternative hypothesis: true difference in mean is not equal to 0
#> 95 percent confidence interval:
#>  -0.8157581 -0.4481261
#> sample estimates:
#> difference in mean 
#>         -0.6319421

svyby(~ heures.tv, ~ sport, dw, svyquantile, quantiles = 1:3/4, ci = TRUE, na.rm = TRUE) # quartiles
#> Warning in vcov.svyquantile(X[[i]], ...): Only diagonal of vcov() available

#> Warning in vcov.svyquantile(X[[i]], ...): Only diagonal of vcov() available
#>     sport 0.25 0.5 0.75   se.0.25    se.0.5   se.0.75
#> Non   Non    1   2    3 0.0000000 0.0000000 0.2551067
#> Oui   Oui    1   2    3 0.1020427 0.1371243 0.2551067
svyranktest(heures.tv ~ sport, dw)
#> 
#>  Design-based KruskalWallis test
#> 
#> data:  heures.tv ~ sport
#> t = -6.1587, df = 1993, p-value = 8.846e-10
#> alternative hypothesis: true difference in mean rank score is not equal to 0
#> sample estimates:
#> difference in mean rank score 
#>                   -0.09905157

# binary regression
reg <- svyglm(sport ~ sexe + etud + heures.tv, family = quasibinomial(), design = dw)
questionr::odds.ratio(reg)
#>                         OR    2.5 %  97.5 %         p    
#> (Intercept)        0.18564  0.11762  0.2930 6.902e-13 ***
#> sexeFemme          0.79913  0.61522  1.0380  0.093059 .  
#> etudSecondary      3.73818  2.32314  6.0151 6.261e-08 ***
#> etudProfessionnal  4.94649  3.18799  7.6750 1.398e-12 ***
#> etudHigh          10.16897  6.43759 16.0632 < 2.2e-16 ***
#> heures.tv          0.88979  0.81820  0.9677  0.006427 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
broom::tidy(reg, exponentiate = TRUE, conf.int = TRUE)
#> # A tibble: 6 x 7
#>   term             estimate std.error statistic  p.value conf.low conf.high
#>   <chr>               <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)         0.186    0.233      -7.23 6.90e-13    0.118     0.293
#> 2 sexeFemme           0.799    0.133      -1.68 9.31e- 2    0.615     1.04 
#> 3 etudSecondary       3.74     0.243       5.43 6.26e- 8    2.32      6.02 
#> 4 etudProfessionn~    4.95     0.224       7.13 1.40e-12    3.19      7.67 
#> 5 etudHigh           10.2      0.233       9.94 9.76e-23    6.44     16.1  
#> 6 heures.tv           0.890    0.0428     -2.73 6.43e- 3    0.818     0.968
@larmarange
Copy link

@larmarange larmarange commented Oct 26, 2019

In terms of expected output, the idea would be to be able to do the following :

dep <- "sport"
expl <- c("sexe", "etud", "heures.tv")

library(finalfit)
dw %>% summary_factorlist(dep, expl, p = TRUE) 
dw %>% finalfit(dep, expl)

The table will be exactly the same as with a classic data.frame, except that all stats would have been computed using survey functions, taking into account sample weights and sample design.

Best regards

@larmarange
Copy link

@larmarange larmarange commented Oct 26, 2019

Just for information, if you prefer working with dplyr syntax, there is a package called srvyr extening dplyr verbs to survey objects.

https://cran.r-project.org/web/packages/srvyr/vignettes/srvyr-vs-survey.html

With this package you can do things like:

dw <- as_survey_design(dw) # to add tbl_svy class to dw
dw %>% group_by(sport) %>% summarise(mean = survey_mean(heures.tv, na.rm = TRUE))

Here, the result will be a tibble.

@ewenharrison
Copy link
Owner

@ewenharrison ewenharrison commented Oct 28, 2019

Thanks Joseph for the very useful example.

This probably gets us quite far. There are summary_factorlist() features not yet included, most notably, labels. But that would be an easy addition.

Would be tidied up and split into component functions prior to being deployed.

Interested in thoughts.

library(questionr)
data(hdv2003)

library(dplyr)
hdv2003$etud <- recode_factor(
  hdv2003$nivetud,
  "N'a jamais fait d'etudes" = "Primary",
  "A arrete ses etudes, avant la derniere annee d'etudes primaires" = "Primary",
  "Derniere annee d'etudes primaires" = "Primary",
  "1er cycle" = "Secondary",
  "2eme cycle" = "Secondary",
  "Enseignement technique ou professionnel court" = "Professionnal",
  "Enseignement technique ou professionnel long" = "Professionnal",
  "Enseignement superieur y compris technique superieur" = "High"
)

library(survey)
dw <- svydesign(ids = ~ 1, weights = ~ poids, data = hdv2003)

summary_survey <- function(.data, dependent, explanatory, cont = "mean", cont_range = FALSE, 
                           p = FALSE, column = FALSE, digits = c(1, 1, 3, 1), fit_id = FALSE){
  # Define variable type
  explanatory_type = .data$variables %>% 
    select(explanatory) %>% 
    purrr::map(is.numeric)
  
  # Hypothesis test
  if(p){
    p_tests = explanatory %>% 
      purrr::map2(explanatory_type,
                  ~if(!.y){
                    survey::svychisq(as.formula(paste0("~", ., "+", dependent)), .data)$p.value %>% 
                      p_tidy(digits[3])
                  } else if (.y & cont == "mean"){
                    survey::svyttest(as.formula(paste0(.x, "~", dependent)), .data)$p.value %>% 
                      p_tidy(digits[3])
                  } else if (.y & cont == "median"){
                    survey::svyranktest(as.formula(paste0(.x, "~", dependent)), .data)$p.value %>% 
                      p_tidy(digits[3])
                  }
      )
  }
  
  # Output table
  explanatory %>% 
    purrr::map2(explanatory_type,
                ~ if(!.y){
                  survey::svytable(as.formula(paste0("~", .x, "+", dependent)), .data) %>% 
                    as.data.frame(stringsAsFactors = FALSE) %>% 
                    { if(column) {
                      dplyr::group_by(., !! sym(dependent)) %>% 
                        dplyr::mutate(
                          total = sum(Freq),
                          prop = 100 * Freq / total
                        )
                    } else { 
                      dplyr::group_by_at(., 1) %>% 
                        dplyr::mutate(
                          total = sum(Freq),
                          prop = 100 * Freq / total
                        )
                    }
                    } %>% 
                    dplyr::mutate(
                      value = paste0(Freq %>% round_tidy(digits[4]), " (", 
                                     prop %>% round_tidy(digits[4]), ")")
                    ) %>%
                    dplyr::select(-total, -prop, -Freq) %>% 
                    tidyr::pivot_wider(names_from = !! dependent, values_from = value) %>% 
                    dplyr::mutate(
                      label = names(.)[1]
                    ) %>% 
                    dplyr::rename(levels = 1)
                } else {
                  { if(cont == "mean") {
                    survey::svyby(as.formula(paste0("~", .x)), as.formula(paste0("~", dependent)), .data, svymean, na.rm = TRUE) %>%
                      dplyr::mutate(
                        value = paste0(!! sym(.x) %>% round_tidy(digits[1]), " (", 
                                       se %>%  round_tidy(digits[2]), ")") 
                      ) %>% 
                      dplyr::select(-c(.x, se)) %>% 
                      tidyr::pivot_wider(names_from = !! dependent, values_from = value) %>% 
                      dplyr::mutate(
                        label = .x,
                        levels = "Mean (SE)"
                      )
                  } else if(cont == "median"){
                    survey::svyby(as.formula(paste0("~", .x)), as.formula(paste0("~", dependent)), .data, 
                                  svyquantile, quantiles = 1:3/4, ci = TRUE, na.rm = TRUE) %>% 
                      dplyr::rename(Q1 = 2,
                                    Q2 = 3,
                                    Q3 = 4) %>% 
                      dplyr::mutate(
                        IQR = Q3 - Q1) %>% 
                      { if(cont_range){
                        dplyr::mutate(., 
                                      value = paste0(Q2 %>% round_tidy(digits[1]), " (", 
                                                     Q1 %>%  round_tidy(digits[2]), " to ",
                                                     Q3 %>% round_tidy(digits[2]), ")")
                        )                        
                      } else {
                        dplyr::mutate(.,
                                      value = paste0(Q2 %>% round_tidy(digits[1]), " (", 
                                                     IQR %>%  round_tidy(digits[2]), ")")
                        )
                      }} %>% 
                      dplyr::select(-c(2:8)) %>%                      
                      tidyr::pivot_wider(names_from = !! dependent, values_from = value) %>% 
                      dplyr::mutate(
                        label = .x,
                        levels = "Median (IQR)"
                      )
                  }
                  }
                }) %>% 
    
    # Add hypothesis test
    { if(p){
      purrr::map2_df(., p_tests,
                     ~ mutate(.x,
                              p = .y)
      )} else {
        dplyr::bind_rows(.)
      }} %>%
    dplyr::select(label, levels, dplyr::everything()) %>% 
    as.data.frame() %>%
    { if(fit_id){
      levels_id = .$levels
      drop = levels_id %in% c("Mean (SE)", "Median (IQR)")
      levels_id[drop] = ""
      mutate(., 
             fit_id = paste0(label, levels_id),
             index = 1:dim(.)[1])
    } else {
      .
    }} %>% 
    rm_duplicate_labels()
}


library(finalfit)
dependent = "sport"

# Choose different options to test
explanatory = c("sexe")
explanatory = c("heures.tv")
explanatory = c("sexe", "heures.tv", "etud")

dw %>% 
  summary_survey(dependent, explanatory, cont = "mean", cont_range = FALSE, p = FALSE, column = FALSE, 
                 digits = c(1, 1, 3, 0))


ff_merge(
  summary_survey(dw, dependent, explanatory, fit_id = TRUE),
  svyglmmulti(dw, dependent, explanatory, family = "quasibinomial") %>%
    fit2df(exp = TRUE, estimate_name = "OR (multivariable)"),
  last_merge = TRUE
)
@larmarange
Copy link

@larmarange larmarange commented Oct 29, 2019

Thanks a lot. That's great!!!!

Just a small comment: for digits argument, it would be nice to consider a 5th value correspond to the number of digits for the number of observations (for example, to be able to use 0 for the number of observations while keeping 1 for percentage).

Otherwise, everything seems to behave as expected.

Thanks again

@tylcole
Copy link
Author

@tylcole tylcole commented Nov 13, 2019

@nobu-nsk
Copy link

@nobu-nsk nobu-nsk commented Nov 24, 2019

Thank you very much for this excellent development.
I also needed exactly this functionality.

I noticed that the currently Mean (SE) is used instead of (SD).
Definitely, more often we need SD for reporting together with means.
Unfortunately, svymean package does not provide SD but it can be derived easily from svyvar:

svyvar(~x,design)[1] %>% sqrt()
or
svyby(~x, ~dependent, design, svyvar, na.rm = TRUE)[2] %>% sqrt()

I would be great to have Mean (SD), which is also ensuring consistency with summary_factorlist().

Nobu Nishikiori

@tylcole
Copy link
Author

@tylcole tylcole commented Nov 24, 2019

@ewenharrison
Copy link
Owner

@ewenharrison ewenharrison commented Nov 24, 2019

Thanks both.
Just to confirm, there is not a library(survey) function that will give both the mean and the sd in the same function?
I have re-written the original summary_factorlist() to remove some old dependencies and incorporate requested features.
We'll get that and this out soon. Thanks again. Ewen

@tylcole
Copy link
Author

@tylcole tylcole commented Nov 24, 2019

@nobu-nsk
Copy link

@nobu-nsk nobu-nsk commented Nov 26, 2019

I believe so.

Just to confirm, there is not a library(survey) function that will give
both the mean and the sd in the same function?
Thank you very much for incorporating the feature.
Nobu

@e-ibrahim
Copy link

@e-ibrahim e-ibrahim commented Jan 30, 2020

Hi @ewenharrison

Please, I am a newbie to R and I am trying to adapt your "summary_survey(dw, dependent, explanatory, fit_id = TRUE)" solution to my work with below parameters:

surveydes=svydesign(id=~psu, strata=~strata, weights=~pweight, data=data)
dependent="knowledge"
explanatory=c("agegroup", "married", "children")

#knowledge (factor var: knowledgeable vs. not knowledgeable)
#agegroup (factor var: 15-24 vs. 25-34, 35-49)
#married (factor var: never married vs. ever married)
#children (integer: 0, 1, 2, ...,19

surveydes %>%
summary_survey(dependent, explanatory, cont = "mean", cont_range = FALSE, p = FALSE, column = FALSE,
digits = c(1, 1, 3, 0))

Each time I do run my code, it returns:
"Error in select(., explanatory) : unused argument (explanatory)"

I copied and pasted your "summary_survey" source codes in my R environment. But, I am certain there is something I have either misapplied or am not doing right.
Could you please advise?

Thank you,
Elhakim

@ewenharrison
Copy link
Owner

@ewenharrison ewenharrison commented Feb 7, 2020

Hi, not sure why it is not working for you. Do you have a reproducible example?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants