
ENH: power_proportions_2indep with value different of zero and different compare #8251

Open
josef-pkt opened this issue Apr 26, 2022 · 3 comments


josef-pkt commented Apr 26, 2022

power_proportions_2indep does not support a value different from zero (the equality null).
It also has no compare option, which only becomes relevant once a non-zero null is supported.

see the comments around #8049 (comment)

I partially reread Farrington and Manning (1990).
The main task is to add the constrained parameter estimates used in the score test. Using these as the default extends the pooled estimate under the null and is backwards compatible.

Farrington and Manning also have the option of using the Wald variance, which I guess is based on the proportions under the alternative.

It looks like this follows the same pattern as we have for Poisson and negative binomial in PR #8166.
F/M also have an estimate under fixed margins, but they mention that the corresponding sample size or power estimates are not very good.
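The constrained estimate under H0: p1 - p2 = value can be obtained numerically. A minimal sketch (the function name and the generic numerical optimization are mine; Farrington and Manning give a closed-form cubic solution for this problem):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def constrained_props_diff(count1, nobs1, count2, nobs2, value=0.0):
    """Restricted MLE of (p1, p2) under H0: p1 - p2 = value (numerical sketch)."""
    eps = 1e-12

    def neg_loglike(p2):
        p1 = p2 + value
        # binomial log-likelihood of both samples with p1 tied to p2
        return -(count1 * np.log(p1) + (nobs1 - count1) * np.log1p(-p1)
                 + count2 * np.log(p2) + (nobs2 - count2) * np.log1p(-p2))

    # p2 must keep both p2 and p1 = p2 + value inside (0, 1)
    lo = max(eps, -value + eps)
    hi = min(1 - eps, 1 - value - eps)
    res = minimize_scalar(neg_loglike, bounds=(lo, hi), method="bounded")
    return res.x + value, res.x
```

With value=0 this reproduces the pooled estimate (count1 + count2) / (nobs1 + nobs2), which is the backwards-compatible default mentioned above.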

Aside: proportion has very few references.
I can no longer find what I used to implement the power function.
The generic part appears in many references, but I don't know what the specific binomial reference is (using the pooled estimate instead of the (Wald) proportions under the alternative).

I guess the power formula for the equality test is also old and common,
e.g. Lachin (1981), "Introduction to sample size determination and power analysis for clinical trials",
eq. (12), though he uses the fraction of total nobs instead of nobs_ratio, and his sample size formula is for the total nobs instead of nobs1.
(I had read it in the past, so it was one of the references, but not directly the one I implemented.)
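The generic normal-approximation power for the one-sided equality test, with the pooled variance under the null, can be sketched as follows (my sketch of the textbook formula; function name is mine and this is not necessarily the exact statsmodels implementation):

```python
import numpy as np
from scipy import stats

def power_2prop_equality(p1, p2, nobs1, nobs_ratio=1.0, alpha=0.05):
    """One-sided power of the pooled z-test for H0: p1 = p2 (normal approx.)."""
    nobs2 = nobs1 * nobs_ratio
    p_pooled = (nobs1 * p1 + nobs2 * p2) / (nobs1 + nobs2)
    # nobs1-normalized variance of p1hat - p2hat under null and alternative
    var_null = p_pooled * (1 - p_pooled) * (1 + 1 / nobs_ratio)
    var_alt = p1 * (1 - p1) + p2 * (1 - p2) / nobs_ratio
    z_crit = stats.norm.ppf(1 - alpha)
    z = (abs(p1 - p2) * np.sqrt(nobs1) - z_crit * np.sqrt(var_null)) / np.sqrt(var_alt)
    return stats.norm.cdf(z)
```

At p1 == p2 the formula returns exactly alpha, as it should for a size-alpha test.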

related
#6721 refactoring, reorganizing proportions

josef-pkt commented Feb 24, 2023

bump for 0.15

need to go through the NCSS/PASS docs again, e.g.
Non-Zero_Null_Tests_for_the_Difference_Between_Two_Proportions.pdf
has examples of power for the Farrington-Manning score test
and the Chow, Shao, and Wang unpooled (i.e. Wald) test,

and similarly for the ratio of proportions:
Non-Unity_Null_Tests_for_the_Ratio_of_Two_Proportions.pdf

see also #8676

check, implementation idea

Can we reuse the test function with nobs1=1, nobs2=nobs_ratio, count1=prop1, count2=prop2 to get the nobs1-normalized null variance?
I think that should work, and we would not need to duplicate the code or separate it out into helper functions.

josef-pkt commented Feb 25, 2023

On using base nobs1=1 in the test:
small sample corrections make the variance nonlinear in scale, i.e. the variance at the base nobs does not directly scale up.
Consequently, we cannot use the one-tailed sample size formula, which assumes a variance independent of nobs, to compute the required sample size.
The power computation will have to take into account that the nobs-normalized variance still depends on nobs.
In larger samples the small sample corrections are small, and ignoring them will still give a good approximation.

For diff, this happens with method agresti-caffo and with score when correction=True.

e.g. comparing the nobs-normalized variance for nobs1 = 1 versus 100 versus 1000:

```python
import statsmodels.stats.proportion as smp
from statsmodels.stats.proportion import method_names_proportions_2indep

# p1, p2, ratio are defined earlier (values not shown in this comment)
n1 = 100
n2 = n1 * ratio

for meth in method_names_proportions_2indep["test"]["diff"]:
    print(meth)
    # base case nobs1=1: the "counts" are the proportions themselves
    res1 = smp.test_proportions_2indep(p1, 1., p2, ratio, method=meth, correction=False)
    res100 = smp.test_proportions_2indep(n1 * p1, n1, n2 * p2, n2, method=meth, correction=False)
    k = 10
    res1000 = smp.test_proportions_2indep(k * n1 * p1, k * n1, k * n2 * p2, k * n2, method=meth, correction=False)
    # the nobs1-normalized variances agree only if the variance scales linearly
    print(res1.variance, res100.variance * n1, res1000.variance * k * n1)
```

```
wald
0.04869999999999999 0.04869999999999999 0.04869999999999999
score
0.04875 0.04875 0.048749999999999995
agresti-caffo
0.14995185185185184 0.06492600885029137 0.050399000803974504
```

A simple solution would be a two-step or iterative approach, using one of the other sample size formulas as the starting value for nobs1.
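The iterative idea can be sketched as root finding in nobs1. Here power_example is an illustrative large-sample power curve and both function names are mine, not the statsmodels API; the point is only that the solver works even when the nobs-normalized variance depends on nobs:

```python
import numpy as np
from scipy import optimize, stats

def power_example(nobs1, p1=0.3, p2=0.5, ratio=1.0, alpha=0.05):
    # illustrative one-sided normal-approximation power for H0: p1 = p2
    pbar = (p1 + ratio * p2) / (1 + ratio)
    var0 = pbar * (1 - pbar) * (1 + 1 / ratio)    # pooled, under the null
    var1 = p1 * (1 - p1) + p2 * (1 - p2) / ratio  # under the alternative
    z = abs(p1 - p2) * np.sqrt(nobs1) - stats.norm.ppf(1 - alpha) * np.sqrt(var0)
    return stats.norm.cdf(z / np.sqrt(var1))

def samplesize_from_power(power_func, target_power=0.80, bracket=(2.0, 1e6)):
    # solve power_func(nobs1) = target_power numerically; no closed-form
    # sample size formula is needed, so small-sample corrections are allowed
    return optimize.brentq(lambda n: power_func(n) - target_power, *bracket)

nobs1_req = samplesize_from_power(power_example)
```

A fixed-point iteration starting from the closed-form sample size would also work; the bracketed root finder is just the most robust one-liner.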

josef-pkt commented

Here's the full version, all compare options and methods, score only without correction.

The variance for the odds-ratio methods logit-adjusted and logit-smoothed and the ratio method log-adjusted looks strange for nobs1=1: it is very small compared to the variance at larger nobs1.

```python
import statsmodels.stats.proportion as smp
from statsmodels.stats.proportion import method_names_proportions_2indep

# p1, p2, ratio are defined earlier (values not shown in this comment)
n1 = 100
n2 = n1 * ratio

for compare in ["diff", "ratio", "odds-ratio"]:
    print("\n", compare)
    for meth in method_names_proportions_2indep["test"][compare]:
        print(meth)
        res1 = smp.test_proportions_2indep(p1, 1., p2, ratio, compare=compare, method=meth, correction=False)
        res100 = smp.test_proportions_2indep(n1 * p1, n1, n2 * p2, n2, compare=compare, method=meth, correction=False)
        k = 10
        res1000 = smp.test_proportions_2indep(k * n1 * p1, k * n1, k * n2 * p2, k * n2, compare=compare, method=meth, correction=False)
        print(res1.variance, res100.variance * n1, res1000.variance * k * n1)
```

```
 diff
wald
0.04869999999999999 0.04869999999999999 0.04869999999999999
score
0.04875 0.04875 0.048749999999999995
agresti-caffo
0.14995185185185184 0.06492600885029137 0.050399000803974504

 ratio
log
81.33333333333334 81.33333333333333 81.33333333333334
score
0.04875 0.04875 0.048749999999999995
log-adjusted
2.476536042573778 66.5813788201848 79.56837255102957

 odds-ratio
logit
85.38466933165019 85.38466933165019 85.38466933165019
score
82.05128205128204 82.05128205128204 82.05128205128204
logit-adjusted
5.165817160426324 70.61229802346554 83.61765756585088
logit-smoothed
41.429688172927776 84.47159197748753 85.29244331587292
```
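If the logit-adjusted method adds 0.5 to each of the four cell counts before computing the usual sum-of-reciprocal-cells variance of the log odds ratio (an assumption about the implementation), the small nobs1=1 values have a simple mechanical explanation: the additive 0.5 swamps the fractional cell counts. A sketch with illustrative p1=0.3, p2=0.5, ratio=1 (values are mine, not the ones used above):

```python
def var_logit_adjusted(count1, nobs1, count2, nobs2, add=0.5):
    # variance of the log odds-ratio with `add` added to every cell
    # (assumed form of the adjustment; a sketch, not the statsmodels code)
    cells = [count1 + add, nobs1 - count1 + add,
             count2 + add, nobs2 - count2 + add]
    return sum(1.0 / c for c in cells)

v_base = var_logit_adjusted(0.3, 1, 0.5, 1)          # nobs1 = 1, fractional cells
v_n100 = 100 * var_logit_adjusted(30, 100, 50, 100)  # nobs1-normalized at n = 100
# v_base comes out much smaller than v_n100, matching the pattern in the
# table: at nobs1=1 the +0.5 more than doubles every cell count
```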
