Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: FTestPower always Fails to converge on a solution, wrong names, unclear usage #8646

Closed
sam-s opened this issue Feb 2, 2023 · 10 comments · Fixed by #8656
Closed

BUG: FTestPower always Fails to converge on a solution, wrong names, unclear usage #8646

sam-s opened this issue Feb 2, 2023 · 10 comments · Fixed by #8656

Comments

@sam-s
Copy link

sam-s commented Feb 2, 2023

edit (josef)
FTestPower mixes up df_num, df_denom names of keywords, meaning is reversed.
Code itself works correctly but not obvious how to use it.

Describe the bug

I am trying to do a power analysis for regression - compute the number of necessary observations.

No matter what parameters I supply, I get

python3.10/site-packages/statsmodels/stats/power.py:415: ConvergenceWarning: 
Failed to converge on a solution.

and the return value is array([3.]).

  1. Why does it return an array?
  2. Why does it always return the same value?

Code Sample, a copy-pastable example if possible

import statsmodels.api as sm
analysis = sm.stats.FTestPower()
analysis.solve_power(nobs=None, effect_size=1, alpha=0.1, power=0.9, df_num=1)

I assume that the effect size is f^2=r^2/(1-r^2) (Cohen's f^2).
This is a simple regression, so df=1.

Expected Output

I expect a single number returned - at least for some inputs, no warnings.

Output of import statsmodels.api as sm; sm.show_versions()

INSTALLED VERSIONS

Python: 3.10.6.final.0
OS: Linux 6.0.6-76060006-generic #202210290932166906205022.04~d94609a SMP PREEMPT_DYNAMIC Mon N x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8

statsmodels

Installed: 0.13.5 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/statsmodels)

Required Dependencies

cython: Not installed
numpy: 1.24.1 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/numpy)
scipy: 1.10.0 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/scipy)
pandas: 1.5.3 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/pandas)
dateutil: 2.8.2 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/dateutil)
patsy: 0.5.3 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/patsy)

Optional Dependencies

matplotlib: 3.6.3 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/matplotlib)
backend: module://matplotlib_inline.backend_inline
cvxopt: Not installed
joblib: 1.2.0 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/joblib)

Developer Tools

IPython: 8.9.0 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/IPython)
jinja2: 3.1.2 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/jinja2)
sphinx: Not installed
pygments: 2.14.0 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/pygments)
pytest: 7.2.1 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/pytest)
virtualenv: 20.17.1 (/home/sds/.virtualenvs/myself/lib/python3.10/site-packages/virtualenv)

@sam-s
Copy link
Author

sam-s commented Feb 2, 2023

Actually, I do get some convergence:

analysis.solve_power(nobs=None, effect_size=10, alpha=0.1, power=0.9, df_num=1)
3.942830017311245

but here the effect is crazy high - r=95% and the return value is crazy low - ~4.

@josef-pkt
Copy link
Member

josef-pkt commented Feb 3, 2023

effectsize for FTestAnovaPower is Cohen's f (sqrt of f2)
https://stats.stackexchange.com/questions/544027/different-results-for-anova-power-calculation-by-r-and-statsmodels/544144

The effect size for FTestPower is also Cohen's f, (according to the unit test)
see also #8403 (comment)

I don't remember the details for the F-test effect sizes and how to compute Cohen's f.
(open issue to collect effect size measures #7674)

@josef-pkt
Copy link
Member

doc problem
The docstring explanation for effectsize does not specify which effect size is used. Sentence is generic and is based on a "d" effect size.

e.g. https://www.statsmodels.org/dev/generated/statsmodels.stats.power.FTestPower.solve_power.html

effect_size[float]
standardized effect size, mean divided by the standard deviation. effect size has to be positive.

@josef-pkt
Copy link
Member

I found a stackoverflow question that uses r_squared from OLS

this uses effect_size=np.sqrt(R2/(1-R2)),
Based on comment it's the same issue f versus f_squared
https://stackoverflow.com/questions/64420111/how-can-i-get-the-ftest-solve-power-methods-in-statsmodel-to-compute-power-for

@josef-pkt
Copy link
Member

@sam-s
Copy link
Author

sam-s commented Feb 3, 2023

  1. Thank you for the clarification, I need to take the root of my f^2
  2. I agree that the doc needs to be fixed.
  3. There are still the issues of (A) failure to converge for reasonable values of effect size and (B) returning an array on failure to converge.

@sam-s
Copy link
Author

sam-s commented Feb 3, 2023

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3328081/ equ. (1) f**2 = R2 / (1 - R2) and similar for partial R2

also f**2 = eta2 / (1 - eta2) https://www.ibm.com/support/pages/effect-size-relationship-between-partial-eta-squared-cohens-f-and-cohens-d https://www.statsmodels.org/dev/generated/statsmodels.stats.oneway.convert_effectsize_fsqu.html

but power classes uses f = sqrt(f2)

Right, so Cohen's f=0.314 for correlation r=0.3 and it should be a valid effect size for solve_power, but

sm.stats.FTestPower().solve_power(nobs=None, effect_size=0.3, alpha=0.1, power=0.9, df_num=1)

warns Failed to converge on a solution and returns array([3.]).

@josef-pkt
Copy link
Member

just guessing, trying out an example.
AFAIR, the stackexchange question mentions switching df_num and df_denom

The following seems to be like what the unit test does. nobs is not used.

n = FTestPower().solve_power(effect_size=0.3, alpha=0.1, power=0.9, df_denom=1)
n
94.53640395938483

FTestPower().power(effect_size=0.3, alpha=0.1, df_denom=1, df_num=n)
0.9000000060405937

I need to check later why this is. No time for this right now.

@josef-pkt
Copy link
Member

josef-pkt commented Feb 4, 2023

(I had made a comment here, but I guess I closed the tab without sending it)

The stackoverflow mentioned reversing df_num and df_denom.

This turns out to be a naming bug. I opened #8651.

For backwards compatibility I will change only the docstring for now.

using df_denom=1 and solving for df_num = nobs - df_resid, I get

n = FTestPower().solve_power(effect_size=0.3, alpha=0.1, power=0.9, df_denom=1)
n
94.53640395938483

FTestPower().power(effect_size=0.3, alpha=0.1, df_denom=1, df_num=n)
0.9000000060405937
> pwr.f2.test(u = 1, v = 94.53640395938483, f2 = 0.3^2, sig.level = 0.1, power = NULL)

     Multiple regression power calculation 

              u = 1
              v = 94.53640395938483
             f2 = 0.09
      sig.level = 0.1
          power = 0.8999998023819036

> pwr.f2.test(u = 1, v = NULL, f2 = 0.3^2, sig.level = 0.1, power = 0.9)

     Multiple regression power calculation 

              u = 1
              v = 94.53645363507292
             f2 = 0.09
      sig.level = 0.1
          power = 0.9


note FTestPower has similar signature as pwr.f2.test with two differences

  • effect size is f and not f_squared
  • naming bug df_num and df_denom are reversed

@josef-pkt josef-pkt changed the title FTestPower always Fails to converge on a solution BUG: FTestPower always Fails to converge on a solution, wrong names, unclear usage Feb 7, 2023
@josef-pkt
Copy link
Member

PR #8656 added new class FTestPowerF2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants