Return Confidence Interval for nonparametric Mann Whitney U Test #225

kschuerholt · 2022-01-21T12:02:32Z

The t-test returns amongst other useful values the confidence interval on the difference between the means.
A CI on the difference of medians would be super useful to have for nonparametric tests like the MWU, so that not everybody has to comb through literature to figure out how to compute CIs for nonparametric tests.

I'm not entirely sure whether the generic case would require bootstrapping, or whether closed-form solutions exist and are robust enough.
One method to compute CIs for nonparametric tests is given, e.g., in "Calculating confidence intervals for some non-parametric analyses", Michael J Campbell and Martin J Gardner, British Medical Journal, 1988.

raphaelvallat#225 Implemented CI from 'Calculating confidence intervals for some non-parametric analyses', Campbell and Gardner 1988. CI Style is adapted from ttest. The same publication offers a solution for wilcoxon, which is not yet implemented but could be added fairly easily.

raphaelvallat · 2022-01-22T02:00:12Z

Hi @kschuerholt,

Thank you for opening the issue and submitting a PR. I'll dive into the latter in the next few days.

This is related to #153.

Thanks,
Raphael

kschuerholt · 2022-01-22T09:09:55Z

Hi @raphaelvallat

Thanks for your great work, glad to be able to contribute in a small way.

#153 seems to be the same feature request for wilcoxon. The paper I used for the PR also gives CIs for wilcoxon, and the computation is very similar to the one for mwu. It does look different from the CI computation in R, but I'm no statistician.

Best,
Konstantin

raphaelvallat · 2022-01-25T22:25:21Z

Thank you @kschuerholt! Looking at the documentation of the wilcox.test R function, it seems that they are using the method from the following reference: Myles Hollander and Douglas A. Wolfe (1973). Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 27--33 (one-sample), 68--75 (two-sample). The documentation states:

Optionally (if argument conf.int is true), a nonparametric confidence interval and an estimator for the pseudomedian (one-sample case) or for the difference of the location parameters x-y is computed. (The pseudomedian of a distribution F is the median of the distribution of (u+v)/2, where u and v are independent, each with distribution F. If F is symmetric, then the pseudomedian and median coincide. See Hollander & Wolfe (1973), page 34.) Note that in the two-sample case the estimator for the difference in location parameters does not estimate the difference in medians (a common misconception) but rather the median of the difference between a sample from x and a sample from y.
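To illustrate the last point of the quoted documentation, here is a minimal sketch of the two-sample location estimator, computed as the median of all pairwise differences between x and y. The function name `hodges_lehmann_shift` and the toy data are assumptions made for illustration; this is not code from R or from pingouin.

```python
import numpy as np


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences x_i - y_j (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Build the full n1 x n2 table of differences, then flatten it.
    diffs = np.subtract.outer(x, y).ravel()
    return float(np.median(diffs))


# Toy data chosen so that the two quantities disagree.
x = [1.0, 2.0, 10.0]
y = [0.0, 4.0, 5.0]

print(hodges_lehmann_shift(x, y))    # 1.0
print(np.median(x) - np.median(y))   # -2.0
```

On this toy data the median of the pairwise differences is positive while the difference of the medians is negative, which is why the two should not be reported interchangeably.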

That said, the paper that you have used for the MWU test is more recent than the paper they refer to, and I think it would make sense to use the formula they provide to implement CI for the wilcoxon test as well. Is this something you would have time and bandwidth to implement?

A few other comments on the PR:

The CI should be rounded and not displayed in full float precision, e.g. [-0.39, -0.09] instead of [-0.39290395101879694, -0.09400270319896187]. This should normally be done automatically by the _postprocess_dataframe function, which should round the CI95% column to two decimals.
Do you know of any other implementations (R, Matlab, SPSS) of this CI method? If so, it would be great to add the CI to the unit testing of the MWU function, i.e. comparing our results against another statistical software.
Could you make sure that the code follows the contributing guidelines? The code should be flake8-compatible. For instance, there must be white spaces between arithmetic operators here:

k = int(round(ct1*ct2/2 - (N * (ct1*ct2*(ct1+ct2+1)/12)**0.5)))
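For readers following along, the computation around that line can be sketched as below, with flake8-style spacing. This is a sketch under assumptions, not the code from the PR: it assumes the large-sample approach in which K is derived from a normal quantile (the `N` in the line above) and the interval runs from the K-th smallest to the K-th largest of the n1 x n2 pairwise differences. The function name `mwu_shift_ci`, the `confidence` argument, and the use of `round` for K (mirroring the line above) are illustrative choices.

```python
import numpy as np
from scipy.stats import norm


def mwu_shift_ci(x, y, confidence=0.95):
    """Rank-based CI for the shift between x and y (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size

    # Two-sided normal quantile, e.g. about 1.96 for confidence=0.95.
    z = norm.ppf(1 - (1 - confidence) / 2)

    # Rank K of the order statistic that bounds the interval.
    k = int(round(n1 * n2 / 2 - z * (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5))
    k = max(k, 1)  # assumption: fall back to the extremes for tiny samples

    # All pairwise differences, sorted ascending.
    diffs = np.sort(np.subtract.outer(x, y).ravel())

    # K-th smallest and K-th largest difference (K is 1-based).
    return float(diffs[k - 1]), float(diffs[-k])


rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=30)
b = rng.normal(loc=1.0, scale=1.0, size=30)
low, high = mwu_shift_ci(a, b)
print(round(low, 2), round(high, 2))
```

The final `round(..., 2)` only affects the printed values; rounding inside the returned dataframe is a separate concern.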

Thank you so much for your help on this,
Raphael

kschuerholt · 2022-01-26T17:16:46Z

Hi @raphaelvallat

I can't promise an ETA, but I can implement the corresponding CI method for wilcoxon in the coming days or weeks.

I checked the source again. The paper I cited earlier is basically a user's reference; it's only cited 3 times. They in turn appear to take the CI computation method from Conover WJ. Practical non-parametric statistics. New York: Wiley, 1980. That appears to be a more reputable source with more than 20000 citations, but I couldn't get hold of a copy yet. I'll see what I can do on that front. I'm not familiar with the related work, so I can't make a call on which is the better method to use.

Regarding the other comments:

At least locally, _postprocess_dataframe does give me the raw floats. I see similar behavior for ttest, e.g. with confidence=0.98. I'm not sure where you'd like to address that.
I had a look, but as far as I could see, Matlab doesn't compute a CI, SPSS computes the CI on the p value, and R uses, as you mentioned above, another method... :/ Maybe the literature holds examples that can be used for unit testing.
Sure thing, sorry about that. I'll fix it in a new commit.

Cheers,
Konstantin

raphaelvallat · 2022-01-28T02:56:04Z

Hi @kschuerholt,

Thank you! That would be great if you could have a look at the wilcoxon CI, but no pressure at all. I am already very thankful for your contribution.

I was thinking that since there does not seem to be a single gold-standard method, we could also simply report bootstrapped confidence intervals, using either scipy.stats.bootstrap or pingouin's own pg.compute_bootci function. However, this would drastically increase computation time, so if we go this route we would need to allow users to disable the CI (e.g. by setting n_boot=0). Do you prefer the analytical or the bootstrap method?
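A minimal sketch of the bootstrap option, assuming scipy.stats.bootstrap with the percentile method and the median of pairwise differences as the statistic. The wrapper `bootstrap_shift_ci`, its `n_boot` and `seed` arguments, and the convention of returning None when n_boot=0 are hypothetical and only illustrate the opt-out idea mentioned above; they are not pingouin's API.

```python
import numpy as np
from scipy.stats import bootstrap


def hl_shift(x, y):
    """Median of all pairwise differences between two 1-D samples."""
    return np.median(np.subtract.outer(x, y))


def bootstrap_shift_ci(x, y, n_boot=2000, confidence=0.95, seed=None):
    """Percentile bootstrap CI for the shift; None if n_boot == 0."""
    if n_boot == 0:
        # Assumed convention: n_boot=0 disables the CI entirely.
        return None
    res = bootstrap(
        (np.asarray(x, dtype=float), np.asarray(y, dtype=float)),
        hl_shift,
        n_resamples=n_boot,
        vectorized=False,      # hl_shift takes plain 1-D samples
        confidence_level=confidence,
        method="percentile",
        random_state=seed,
    )
    ci = res.confidence_interval
    return float(ci.low), float(ci.high)


rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=20)
b = rng.normal(loc=1.0, scale=1.0, size=20)
print(bootstrap_shift_ci(a, b, n_boot=500, seed=1))
print(bootstrap_shift_ci(a, b, n_boot=0))  # None
```

Each resample recomputes all n1 x n2 differences, so the cost grows with both n_boot and the sample sizes, which is the computation-time concern raised above.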

Also, please don't worry about the decimal rounding for now. I'll do a deep dive to fix this once the PR is ready.

Thanks,
Raphael

kschuerholt linked a pull request Jan 21, 2022 that will close this issue

Add confidence interval for MWU #226

Open

raphaelvallat added the feature request 🚧 New feature or request label Jan 22, 2022

raphaelvallat linked a pull request Jan 22, 2022 that will close this issue

Add confidence interval for MWU #226

Open

raphaelvallat mentioned this issue Feb 20, 2022

Roadmap for release 0.5.2 #242

Closed

18 tasks

raphaelvallat mentioned this issue Jun 18, 2022

Roadmap for release 0.6.0 #279

Open

9 tasks
