Updates CLES to be order sensitive #94
Codecov Report
@@           Coverage Diff           @@
##           master      #94   +/-   ##
=======================================
  Coverage    99.54%   99.54%
=======================================
  Files           17       17
  Lines         3096     3096
  Branches       500      500
=======================================
  Hits          3082     3082
  Misses           5        5
  Partials         9        9
=======================================
Continue to review full report at Codecov.
Apologies, I didn't notice the contribution guidelines. Would you like me to change the tests and documentation, or leave it be? The code changes I've made are trivial anyway. I hope the issue description was clear, at least.
Hi @odgaard! Thanks for opening the PR. I like the idea of making CLES tail-sensitive a lot. Following your PR, I read this blog post explaining how to calculate CLES (in R), and it used a nice approach to handle ties, which would translate in Python to something like:
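The code snippet that followed was lost in extraction; a minimal sketch of the tie-handling approach described (count each tied pair as half a "win"), with the function name `cles_with_ties` being a placeholder, might look like:

```python
import numpy as np

def cles_with_ties(x, y):
    # Pairwise differences between every element of x and every element of y.
    diff = np.subtract.outer(np.asarray(x), np.asarray(y))
    # Strict wins count 1, ties count 0.5, losses count 0.
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Example: one tied pair (2 == 2) out of 3 * 2 = 6 pairs.
print(cles_with_ties([1, 2, 3], [2, 4]))  # (1 + 0.5) / 6 = 0.25
```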
If we are to modify the pingouin.wilcoxon function, it would also make sense to update the pingouin.convert_effsize function which also returns CLES, but does not accept a "tail" argument. In that case, I would report either
What do you think? For the PR, it would be great if you could:
If that's too much, I can also take care of steps 2-4. Thanks again,
That seems like a nice way to handle the ties. I don't have a background in statistics, but I tried to research the topic a bit more and found that the Vargha-Delaney A measure seems to be the exact same adjustment to CLES as the one you suggested (see Eq. 2 in the linked paper for the definition of the measure of stochastic superiority). In this case I agree that changing the CLES implementation to the Vargha-Delaney A measure is probably very useful. However, in that case the documentation, function name, and df header might need to change as well, which could be an unfortunate breaking change.

I might be misunderstanding the definition of CLES, but I feel the code proposed in this PR would be the most representative of the original definition of CLES, while the Vargha-Delaney A measure is a very useful extension of CLES. So an alternative way to solve this could be to add the Vargha-Delaney A measure in addition to this PR's proposed change to CLES. The CLES documentation should probably then include a warning about the limitations of CLES with respect to ties. Which solution would you prefer?

Edit: A compromise might be to change CLES the way you propose, to the Vargha-Delaney A measure, and keep the API untouched, yet add a note in the documentation specifying that the CLES implementation returns the A measure instead of the original CLES measure.
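One property worth noting about the A measure discussed above is that, unlike a strictly-greater CLES, it always satisfies A(x, y) + A(y, x) = 1 even when ties are present. A hedged sketch (the name `vd_a` and the random test data are illustrative, not from the PR):

```python
import numpy as np

def vd_a(x, y):
    # Vargha-Delaney A: strict wins count 1, ties count 0.5.
    diff = np.subtract.outer(np.asarray(x), np.asarray(y))
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

rng = np.random.default_rng(0)
x = rng.integers(0, 5, 8)  # small discrete range, so ties are likely
y = rng.integers(0, 5, 6)

# The two orderings are exact complements, ties or not.
assert abs(vd_a(x, y) + vd_a(y, x) - 1.0) < 1e-12
```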
Hi @odgaard! Thanks for your extensive reply and research! I'm curious what other Pingouin users have to say. In my opinion, we should replace the calculation method with the one described in the Vargha and Delaney paper, leaving the API intact but updating the documentation. The reasons I'd prefer using this "new" method are that 1) ties are frequent, especially when working with ordinal data (e.g. Likert scales), and it seems important to adjust for them if we can, and 2) the Vargha and Delaney paper has been cited >700 times, so it seems like a well-known method. Let me know what you think,
No problem, I'm just glad to be able to help :) I've tried to set up pytest on my local machine, but I can't seem to configure it properly. So I would suggest that it's more convenient for you to make the changes to the code, since you know the codebase and documentation. Thank you again for this very useful library!
Hi, first of all thank you so much for this library, it's been very useful!
I noticed when I was performing pairwise comparisons with one-sided ("greater") Wilcoxon tests that the CLES didn't change even though the input order changed.
I believe the definition of CLES is strictly greater than, which causes discrete test sets like x and y with overlapping observations (x[-2] == y[-2]) to produce the behavior above. So I believe the correct CLES for result_1 would be 0.375 and for result_2 would be 0.5833. Thank you again for your work :)
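The order sensitivity described in this issue can be reproduced with a strictly-greater CLES on any discrete data containing a tied pair. A minimal sketch (the function name `cles_strict` and the sample data are hypothetical, not the reporter's actual inputs):

```python
import numpy as np

def cles_strict(x, y):
    # Strictly-greater definition: P(X > Y), ties count as zero.
    diff = np.subtract.outer(np.asarray(x), np.asarray(y))
    return float(np.mean(diff > 0))

x = [1, 2, 3]  # one tied pair with y (2 == 2)
y = [2, 4]

print(cles_strict(x, y))  # 1/6 ~= 0.1667
print(cles_strict(y, x))  # 4/6 ~= 0.6667
```

Because the tied pair counts for neither side, the two orderings sum to 5/6 rather than 1, so swapping the inputs must change the result.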