Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Power documentation and Example #4091

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

mikekaminsky
Copy link
Contributor

@mikekaminsky mikekaminsky commented Oct 29, 2017

Improvements to power documentation. Comments and suggestions welcome!

Working on suggestions noted in #4085

Future Work:

  • Examples for plot_power() function
  • Potentially remove references to user-facing power() function and point everyone to the solve_power() versions
  • Figure out how to document the relationship between the classes and the helper functions

@mikekaminsky
Copy link
Contributor Author

mikekaminsky commented Oct 29, 2017

I could use some help figuring out how the documentation system is supposed to work. A few issues:

  • I'm getting an error in my use of smp.prop_ind_solve_power in the document power.rst and I'm not sure why
  • Is there way to use make html on only one file for faster iteration?
  • How do you control what shows up in the documentation pages (e.g., how do I make the signature of the function show up as referenced in this comment: DOC: Confusion in power functions and documentation #4085 (comment))?

@coveralls
Copy link

coveralls commented Oct 29, 2017

Coverage Status

Coverage increased (+0.0002%) to 81.883% when pulling be986f1 on mikekaminsky:power_documentation into 0c3f6e9 on statsmodels:master.

@coveralls
Copy link

coveralls commented Oct 29, 2017

Coverage Status

Coverage increased (+0.0002%) to 82.863% when pulling 5b72ff8 on mikekaminsky:power_documentation into 40e4f56 on statsmodels:master.

@codecov-io
Copy link

codecov-io commented Oct 29, 2017

Codecov Report

Merging #4091 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4091      +/-   ##
==========================================
+ Coverage   80.26%   80.26%   +<.01%     
==========================================
  Files         564      564              
  Lines       85555    85556       +1     
  Branches     9675     9675              
==========================================
+ Hits        68669    68670       +1     
- Misses      14656    14659       +3     
+ Partials     2230     2227       -3
Impacted Files Coverage Δ
statsmodels/stats/api.py 100% <ø> (ø) ⬆️
statsmodels/stats/power.py 80.16% <100%> (+0.08%) ⬆️
statsmodels/iolib/summary.py 54.16% <0%> (ø) ⬆️
statsmodels/tsa/statespace/mlemodel.py 85.61% <0%> (ø) ⬆️
statsmodels/stats/libqsturng/make_tbls.py 0% <0%> (ø) ⬆️
statsmodels/stats/descriptivestats.py 24.13% <0%> (ø) ⬆️
statsmodels/sandbox/regression/treewalkerclass.py 0% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 40e4f56...5b72ff8. Read the comment docs.

@josef-pkt
Copy link
Member

josef-pkt commented Oct 30, 2017

Is there way to use make html on only one file for faster iteration?

sphinx only rebuild the parts that have been changed, so it should be faster on repeated runs. AFAIR, I haven't done it in a while

How do you control what shows up in the documentation pages

That is done automatically by sphinx, I don't know if we can somehow force the missing signature.

I don't see a prop_ind_solve_power function, I need to check. I guess this should be using the Normal Power function.

@mikekaminsky
Copy link
Contributor Author

@josef-pkt I added the prop_ind_solve_power function here

What do you think of the rest of the content?

@josef-pkt
Copy link
Member

I will try to go over it later today.

We will most likely have to rename prop_ind_solve_power to prop_ztest_solve_power, because we need a different version also #1197

Copy link
Member

@josef-pkt josef-pkt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this got lost,

I just found two pending review comments, but it looks like I didn't finish with the review.
I'm postponing this until after 0.9rc


Similar to the difference in proportions, we can use the :code:`tt_ind_solve_power` function to analyze the power of an experiment where we'll use a t-test (e.g., if we want to compare the difference in means between two groups in our study).

Imagine we're working on a study to compare the spending habits between two groups of students where the outcome of interest is the average amount spent on books.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"we are" I prefer not to use the shortened forms in written text


Imagine we're working on a study to compare the spending habits between two groups of students where the outcome of interest is the average amount spent on books.

In this example, let's assume that we have are conducting a study where we know what our sample-size will be in advance, and we want to calculate the smallest effect-size we'll be able to detect with 80% power.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"we have are conducting" ?

@mikekaminsky
Copy link
Contributor Author

I also forgot about this. I'll take another pass over the weekend.

@josef-pkt
Copy link
Member

I think this would fit now better in a examples/notebook, which makes it also easier for users to run the example interactively.

Those notebooks are a mixed bag, some with extensive explanations and some just with code examples.

@mikekaminsky
Copy link
Contributor Author

mikekaminsky commented Apr 27, 2018 via email

@mikekaminsky
Copy link
Contributor Author

@josef-pkt I moved what I had into a notebook in examples/.

I still think we could improve the docs for this functionality (e.g., for tt_ind_solve_power) which is the first statsmodels result when you search python power analysis .

In particular, something on that page that points either to the example notebook or a more info on the suite of power tests would be helpful for end users.

LMK if I can help with that or if you just want to leave it.

@josef-pkt
Copy link
Member

@mikekaminsky
The docstring can have an Examples section, which is very useful for users to quickly see how to use it.
The main problem is that it is a lot of work to add those to many functions and methods.

In our old sourceforge doc pages it was possible to directly link to the notebooks. I don't think it's currently possible in the new github site, but those links should (eventually) be available again.
(I didn't really pay a lot of attention to how the current doc setup works.)

@mikekaminsky mikekaminsky changed the title WIP Power documentation Power documentation and Example Apr 30, 2018
@mikekaminsky
Copy link
Contributor Author

@josef-pkt works for me. I think this is an improvement that's good to go out as-is unless you have additional comments / suggestions.

@josef-pkt josef-pkt modified the milestones: 0.10, 0.9 Apr 30, 2018
@josef-pkt
Copy link
Member

I put milestone 0.9 back.
I think we can still add it to 0.9, except for the prop_ind_solve_power function. We could define it in the notebook itself.

I'm not sure yet whether prop_ind_solve_power doesn't need to have different options, i.e. add a method argument or use different names for different version of normal based prop_ind tests
The current version is not what most users expect coming from other packages, unless they come from the effect size references.

@mikekaminsky
Copy link
Contributor Author

I think I was trying to both follow your patterns and show how to generate what you get in R

R> power.prop.test(p1=0.10, p2=0.1*(1+.05), power=.80)

     Two-sample comparison of proportions power calculation 

              n = 57763
             p1 = 0.1
             p2 = 0.105
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

I can make the interface the same as the R one so that you just give two proportions (and the ES gets calculated under the hood) -- LMK what you think would be most useful to users

@bashtage bashtage self-assigned this May 2, 2019
@bashtage
Copy link
Member

  • The notebook is still incomplete, and is missing the second section
  • @josef-pkt comment about ading the alias needs to be addressed.

@bashtage bashtage removed their assignment May 10, 2019
@mikekaminsky
Copy link
Contributor Author

@bashtage I don't really have time to pick this back up right now. Happy to simply close the PR.

@EmanuelGoncalves
Copy link

Thanks for this great functionality.

I found the documentation of the class and methods to be a bit confusing to me particularly about the number of observations, i.e. if it was about the whole set or for each group.

Meanwhile I managed to clarify this through a similar implementation in R.

As I couldn't find a reply to this question I created a stackoverflow Q&A: https://stackoverflow.com/questions/68365921/statsmodels-power-analysis-number-of-observations/68366180#68366180

Hope this is of any use.

Thanks,

@josef-pkt
Copy link
Member

@EmanuelGoncalves
saying it's nobs "per group" is a bit tricky because group sizes are not the same if ratio is not 1 in two sample case.

So, the general pattern is that nobs1, i.e. the nobs of the first sample, is the main sample size variable, the sample size of the second sample, nobs2 and total nobs, then depend on the ratio.

@EmanuelGoncalves
Copy link

That makes sense, thank you.

But isn't it still ambiguous in the plot_power function of the TTestIndPower class, since the argument is nobs?

@josef-pkt
Copy link
Member

The name nobs is ambiguous or "wrong", but it's inherited and we would need to add a method to subclasses to adjust the signature of the method.

I guess we should do this, but when I'm writing new things, I'm more focused on getting the code to work instead of thinking about every part of the api. That's were feedback becomes helpful.

A bit similar: some functions work for both one and two sample cases, and we cannot have nobs and nobs1 at the same time. even though that would be appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
0.10
Awaiting triage
0.9
Awaiting triage
Development

Successfully merging this pull request may close these issues.

None yet

6 participants