Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Population variance and ability to suppress computation of standard errors #48

Merged
merged 4 commits into from Apr 11, 2019
Merged

Conversation

tzoltak
Copy link
Contributor

@tzoltak tzoltak commented Mar 14, 2019

Hi!

  • I added two functions: survey_var and survey_sd allowing computation of population variance and standard deviation (with svyvar underneath). I also added some basic tests of this functions to test_survey_statistics.r.
  • I patched all the survey_ functions, so it is possible to suppress computation of (any kind of) standard errors (resolving Calculating contingency tables without showing SE #45 ). And I also added drop=FALSE to one row in summarise.grouped_svy so it is able to carry on with such results.
  • I made survey_median don't add "_q50" suffix to names of all the variables it generates.
  • I made some assertion checks and error messages among survey_ functions a little more consistent.

Hope it will be somehow helpful.

@gergness
Copy link
Owner

Wow, awesome, thank you so much! I'll take a look in the next week or so!

@codecov-io
Copy link

codecov-io commented Mar 14, 2019

Codecov Report

Merging #48 into master will increase coverage by 1.72%.
The diff coverage is 90.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #48      +/-   ##
==========================================
+ Coverage   79.44%   81.17%   +1.72%     
==========================================
  Files          17       17              
  Lines         905      956      +51     
==========================================
+ Hits          719      776      +57     
+ Misses        186      180       -6
Impacted Files Coverage Δ
R/summarise.r 88.57% <100%> (ø) ⬆️
R/survey_statistics_helpers.R 83.59% <100%> (+1.97%) ⬆️
R/survey_statistics.r 90.37% <90%> (+3.91%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update caf5055...0d125b4. Read the comment docs.

@tzoltak
Copy link
Contributor Author

tzoltak commented Mar 14, 2019

I hope to implement joins (apart of full join and right join) in the future - they're reasonable if only there is no one-to-many relations between survey object data and data that is to be joined. So this is something that has to be checked before doing actual join and if such relations are found, it will throw an error. With inner, semi and anti joins subsetting of the survey design information must be also implemented but it won't be hard to do with all the functions that are already in the package.

Copy link
Owner

@gergness gergness left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not quite one week, sorry about that!

This is great, but a few minor comments. Do you have time to fix them? I really appreciate it.

R/survey_statistics.r Outdated Show resolved Hide resolved
R/survey_statistics.r Outdated Show resolved Hide resolved
R/survey_statistics.r Outdated Show resolved Hide resolved
tests/testthat/test_as_survey_twophase.r Outdated Show resolved Hide resolved
R/survey_statistics.r Outdated Show resolved Hide resolved
@tzoltak
Copy link
Contributor Author

tzoltak commented Mar 26, 2019

I can fix this issues tomorrow evening.

@tzoltak
Copy link
Contributor Author

tzoltak commented Mar 28, 2019

I didn't manage yesterday - I started to rewrite tests regarding survey_quantile() and survey_median a little (I checked how to get SE from objects returned by svy_quantile) and also I want to add tests to cover calls to functions in survey_statistics.r that result in errors thrown on assertion stage (missing x in case of non-grouped objects and factor/character x). However I'll be working on this today and perhaps tomorrow (if some other things don't let me spend enough time on this today).

Regarding CI for survey_var I thought about and changed my mind: if srvyr is a kind of front-end for survey, perhaps we shouldn't try to be wiser than survey is. If it is possible to obtain CI for variance in survey (in fact it is) and this CI is estimated using t (or normal with df=Inf) distribution - let it be. We may write a warning in description within help page of survey_var that such CI can be a very poor one (especially when analyzing subgroups) but nothing more.
What do you think about this?

@gergness
Copy link
Owner

No worries, thanks for the update!

Yeah that makes a lot of sense to me, I’ve generally tried to defer the statistical reasoning to the survey package, as it makes support easier. A warning in the documentation seems like a reasonable compromise to me.

@gergness gergness merged commit 192e06a into gergness:master Apr 11, 2019
@gergness
Copy link
Owner

Ugh, sorry that took me so long, it's been a month of sickness in my house.

Thank you so so much for your work!

@tzoltak
Copy link
Contributor Author

tzoltak commented Apr 11, 2019

Great! :)
Perhaps you should only tidy up version number, because in my pull request it was a rather provisional one.

@gergness
Copy link
Owner

Yep, will do a full CRAN release soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants