ENH/REF/SUMM: enhance, refactor power classes #8652

josef-pkt · 2023-02-06T15:39:16Z

triggered by #8646 and #8651
see also #8159 for power classes without effect size
related issues: ...

The semi-generic power classes were initially written based on effect sizes and on the packages GPower and R pwr.
Design decisions followed those packages, and with the generic structure it is often not "obvious" how to use the classes.

Target is to make them more directly usable and extend them to new cases, with possibly test specific power classes.

This should be based more on NCSS/PASS and the new Stata pss than on the previous packages.
The more recent power functions, especially for cases where var/std differs between null and alternative as in rates and proportions, were heavily based on the NCSS/PASS docs.

Additionally, I want

power classes or/and effect sizes based on test statistics, tstat, fstat, ..., with normalized noncentrality, nc / nobs.
- main difference is e.g. using std of test statistic instead of std of population as in Cohen's d family.
...

not clear yet

options not yet included, e.g. unequal var in t_test
more general: power computations that are specification robust,
e.g. robust cov_type, excess dispersion in poisson, kurtosis in variance hypothesis tests
...

specific todos

review existing power classes for hidden assumptions especially what special cases they are designed for
- FTestPower, see comments in BUG: FTestPower always Fails to converge on a solution, wrong names, unclear usage #8646
- TTestIndPower: assumes equal var, and cohen's d effect size
maybe distinguish more clearly between keywords we can solve for and keywords that define the setting or hypothesis test. For example, we will likely need a method argument if we make classes for recently added power functions like those for rates and proportions.
power classes for recently added power functions, rates, proportions, variances, ...
power classes for TOST and other hypothesis tests that are not point hypothesis
...
...

I guess (not checked again): The basic power classes for one sample TTestPower can be used for generic case if std in effect size is the std of the (unstandardized) test statistic.
Why is there currently no NormalPower class? We only have NormalIndPower with same equal var assumption as TTestIndPower.
update
NormalIndPower can be used for one sample test if ratio=0

            ``ratio`` can be set to zero in order to get the power for a
            one sample test.

It wouldn't cost much to add a specific NormalPower class.
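
A minimal sketch of how ratio=0 reduces the two-sample computation to the one-sample case. It uses only the standard library and a plain normal approximation; the function name normal_power_sketch is made up for illustration and this is not meant as the statsmodels implementation.

```python
from math import sqrt
from statistics import NormalDist

_norm = NormalDist()


def normal_power_sketch(effect_size, nobs1, alpha=0.05, ratio=1.0, ddof=0):
    """Two-sided power of a z-test (illustrative sketch, hypothetical name)."""
    if ratio > 0:
        nobs2 = nobs1 * ratio
        # effective sample size for the difference of two independent means
        nobs = 1.0 / (1.0 / (nobs1 - ddof) + 1.0 / (nobs2 - ddof))
    else:
        # ratio=0: there is no second sample, so this is the one-sample case
        nobs = nobs1 - ddof
    crit = _norm.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(nobs)
    # probability of rejecting in the upper tail plus in the lower tail
    return _norm.cdf(shift - crit) + _norm.cdf(-shift - crit)


# one sample of 50 has the same effective nobs as two samples of 100 each
p_one = normal_power_sketch(0.4, 50, ratio=0)
p_two = normal_power_sketch(0.4, 100, ratio=1)
```

A dedicated NormalPower class would only need the else branch.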
Aside:
NormalIndPower has an option in the __init__ self.ddof = ddof instead of as method keyword.
It's the only power class with an __init__ method


josef-pkt · 2023-02-06T19:57:01Z

some overview references

https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/effect-size-power/faqhow-is-effect-size-used-in-power-analysis/
note, in the last part the square for f is missing
noncentrality coefficient lambda = N*f = 60*.369^2 = 60*.136 = 8.17
should be
noncentrality coefficient lambda = N*f^2 = 60*.369^2 = 60*.136 = 8.17
also they use f^2 for regression and f for oneway anova in effect size definition
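
Quick check of the corrected arithmetic, using only the numbers quoted above (N = 60, f = 0.369):

```python
nobs = 60
f = 0.369

# noncentrality uses the squared effect size f^2, not f
lam = nobs * f**2
print(round(f**2, 3), round(lam, 2))  # 0.136 8.17
```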

GPower manual has much better coverage of cases than when I wrote power classes
https://www.psychologie.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Psychologie/AAP/gpower/GPowerManual.pdf

related Stata esize for various effect size estimates, also estat esize after regression anova
https://www.stata.com/manuals/resize.pdf
effect size overview page https://www.stata.com/features/overview/effect-sizes/

partially related meta-analysis effect size https://www.stata.com/manuals/metametaesize.pdf

NCSS/PASS several doc chapters on regression including linear, poisson and binomial, and GEE
https://www.ncss.com/software/pass/pass-documentation/#Regression
example Logit, power has different std under null and alternative, sample size formula on p. 3
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_the_Odds_Ratio_in_Logistic_Regression_with_One_Binary_X_and_Other_Xs-Wald_Test.pdf
However, for a normally distributed exog, the sample size formula does not have different std under null and alternative
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_the_Odds_Ratio_in_Logistic_Regression_with_One_Normal_X_and_Other_Xs-Wald_Test.pdf

https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Multiple_Regression.pdf
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Poisson_Regression.pdf

another interesting one WMW rankorder statistic, stochastically larger as in brunner-munzel
https://www.ncss.com/wp-content/themes/ncss/pdf/Procedures/PASS/Tests_for_Two_Ordered_Categorical_Variables-Non-Proportional_Odds.pdf
I never looked at this AFAIR.
references in their docs
Machin, D., Campbell, M., Tan, S.B., and Tan, S.H. 2018. Sample Size Tables for Clinical Studies, 4th Edition.
John Wiley & Sons. Hoboken, NJ.
Zhao, Y.D., Rahardja, D. Qu, Y. 2008. 'Sample size calculation for the Wilcoxon-Mann-Whitney test adjusting for
ties.' Statistics in Medicine, 27, 462-468.

in R
http://users.stat.umn.edu/~helwig/notes/espa-Notes.pdf
70 pages of slides, good overview, first part is effect size measure, multiple regression starting on p. 63,
He also has rcode and slides on other topics http://users.stat.umn.edu/~helwig/teaching.html
(e.g. inference for multivariate means might be interesting.)

partially related: effect size in meta-analysis https://bookdown.org/MathiasHarrer/Doing_Meta_Analysis_in_R/es-calc.html
low in formulas, references to text book
Lipsey, Mark W, and David B Wilson. 2001. Practical Meta-Analysis. SAGE.

slides for GPower https://med.und.edu/research/daccota/_files/pdfs/berdc_resource_pdfs/sample_size_gpower_module.pdf
mostly point-and-click instructions
overview tables of hypothesis tests on p. 10 and 11

josef-pkt · 2023-02-07T18:11:58Z

(random thought)

t-test with unequal variance (Welch, HC):
I think we need var_ratio as fixed keyword argument in power function.
It's possible to use the effect size with unequal variance, but if nobs-ratio changes, then this effect size will also change.
Similar to current computation of std correction for nobs1, nobs2 inside TTestIndPower class.
nobs = 1./ (1. / nobs1 + 1. / nobs2)
but in the unequal case, we will need to include var_ratio.
The optimal nobs_ratio should then depend on the var_ratio.
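
Rough sketch of what that could look like. Assumptions that are mine and not settled in this thread: var_ratio is defined as var2 / var1, the effect size is standardized by the std of sample 1, and degrees of freedom corrections are ignored. The function names are hypothetical.

```python
def effective_nobs(nobs1, nobs2, var_ratio=1.0):
    """Effective sample size when var2 = var_ratio * var1 (assumed definition).

    var(mean1 - mean2) = var1 * (1 / nobs1 + var_ratio / nobs2)
    """
    return 1.0 / (1.0 / nobs1 + var_ratio / nobs2)


def best_split(nobs_total, var_ratio):
    """Brute-force search for the split of nobs_total with largest effective nobs."""
    nobs1 = max(
        range(1, nobs_total),
        key=lambda n1: effective_nobs(n1, nobs_total - n1, var_ratio),
    )
    return nobs1, nobs_total - nobs1


# equal variances: balanced design is best
split_equal = best_split(300, var_ratio=1.0)
# var2 = 4 * var1: the noisier sample should get twice as many observations
split_unequal = best_split(300, var_ratio=4.0)
```

In this simplified setting the best nobs ratio nobs2 / nobs1 equals sqrt(var_ratio), so for a fixed total the optimal allocation moves with var_ratio.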

josef-pkt · 2023-02-10T19:12:04Z

semi-random idea

use keyword value "required" in solve_power, e.g. base rate or proportion for poisson, binomial

PoissonPower().solve_power(nobs=None, ..., base_rate="required", ...)
alternative would be to set it in __init__
PoissonPower(base_rate=None).solve_power(nobs=None, ...,)

The first is more flexible if we want to loop over a required keyword, and power method itself will be vectorized in the required keyword in many cases, but not in string keywords like method and compare.

Name of class would more likely be Poisson2indepPower, PoissonRegressionPower... for different sampling cases or models.
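
Toy sketch of the keyword handling for the first option. Nothing here exists in statsmodels; find_solve_target and REQUIRED are made-up names, and the actual power computation is left out.

```python
REQUIRED = "required"  # hypothetical sentinel for keywords the user must provide


def find_solve_target(**kwds):
    """Return the single keyword that is None, i.e. the one to solve for.

    Keywords still set to the "required" sentinel are rejected, because they
    define the setting and cannot be solved for.
    """
    # isinstance check avoids elementwise comparison if a value is an array
    missing = [k for k, v in kwds.items() if isinstance(v, str) and v == REQUIRED]
    if missing:
        raise ValueError("value needed for: " + ", ".join(missing))
    targets = [k for k, v in kwds.items() if v is None]
    if len(targets) != 1:
        raise ValueError("exactly one keyword must be None to solve for it")
    return targets[0]


# base_rate is provided, nobs is the quantity to solve for
target = find_solve_target(nobs=None, power=0.8, alpha=0.05, base_rate=0.5)
```

Leaving base_rate at its "required" default would raise a ValueError instead of silently using a default rate.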

josef-pkt · 2023-02-10T20:06:20Z

not clear to me or I don't remember

How are effect size and power defined for alternative="smaller" ?
Does solve_power look for a negative effect size?

e.g. hypothesis tests with inherent heteroscedasticity:

How does it work if we use "ratio" or "diff" as (raw) effect size for Binomial and Poisson power?
Minimum detectable effect can be greater or smaller than 1 or 0, resp., but only one side will have power > alpha with a one-sided alternative.
With two-sided alternative, there will be a difference between "negative" and "positive" effect when std and var depend on the mean values (GLM type).
If we don't add a keyword option for which side we should compute, then we would have to compute both sides in some cases.

In cases other than solving for minimum detectable effect, there should not be a problem because users specify all rates or proportions under the alternative.
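
To illustrate the asymmetry with a two-sided alternative, a sketch for a one-sample Poisson rate with a Wald-type z-test, where the std under the null and under the alternative both depend on the rate. This is a plain normal approximation written for this comment (hypothetical function name), not one of the existing statsmodels power functions.

```python
from math import sqrt
from statistics import NormalDist

_norm = NormalDist()


def poisson_rate_power(rate_null, rate_alt, nobs, alpha=0.05):
    """Two-sided power with different std under null and alternative."""
    std_null = sqrt(rate_null / nobs)  # std of the mean rate under H0
    std_alt = sqrt(rate_alt / nobs)    # std of the mean rate under H1
    diff = rate_alt - rate_null
    crit = _norm.inv_cdf(1 - alpha / 2)
    # rejection region is defined with std_null, probabilities use std_alt
    upper = 1 - _norm.cdf((crit * std_null - diff) / std_alt)
    lower = _norm.cdf((-crit * std_null - diff) / std_alt)
    return upper + lower


# same absolute difference 0.3 in both directions around the null rate 1.0
power_up = poisson_rate_power(1.0, 1.3, nobs=50)
power_down = poisson_rate_power(1.0, 0.7, nobs=50)
```

The two powers differ even though the absolute diff is the same, so a minimum detectable diff solved from a two-sided test depends on which side is searched.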

AFAICS, TestTTPowerTwoSx for two sample t-test does not include unit tests for alternative="smaller" (two-sided and larger are included)
What happens if effect size is negative?

However, docstring for effect size of TTestIndPower says "effect_size has to be positive."
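
For reference, how the sign of the effect size interacts with the one-sided alternatives in the simplest normal approximation. This is only a sketch of the textbook formula with a hypothetical function name; it does not answer what the existing classes currently do.

```python
from math import sqrt
from statistics import NormalDist

_norm = NormalDist()


def one_sided_power(effect_size, nobs, alpha=0.05, alternative="larger"):
    """One-sided z-test power, normal approximation (illustrative only)."""
    crit = _norm.inv_cdf(1 - alpha)
    shift = effect_size * sqrt(nobs)
    if alternative == "larger":
        return _norm.cdf(shift - crit)
    if alternative == "smaller":
        return _norm.cdf(-shift - crit)
    raise ValueError("alternative must be 'larger' or 'smaller'")


# a negative effect with "smaller" mirrors a positive effect with "larger"
p_small_neg = one_sided_power(-0.3, 50, alternative="smaller")
p_large_pos = one_sided_power(0.3, 50, alternative="larger")
# a positive effect with "smaller" is on the wrong side: power below alpha
p_small_pos = one_sided_power(0.3, 50, alternative="smaller")
```

So in this approximation a solver for alternative="smaller" would have to search over negative effect sizes to reach power above alpha.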

Also, currently we don't have a null_value (for a margin under the null) in the power classes. In t-test it can be subsumed under the effect size (AFAIU, AFAIR), but not for full poisson, binomial tests. (inferiority, superiority)

Also, no power classes yet for any TOST, AFAICS. (need to check what interface the R PowerTOST package is using, I never looked at it AFAIR)

josef-pkt added type-enh comp-stats labels Feb 6, 2023

josef-pkt added this to the 0.15 milestone Feb 6, 2023

This was referenced Feb 24, 2023

SUMM: roadmap for 0.15 josef #8217

Open

ENH/SUMM (roadmap) Power and sample size computation #8705

Open
