Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert gh-5938, restore ks_2samp #6545

Merged
merged 3 commits into from
Sep 13, 2016
Merged

Conversation

ev-br
Copy link
Member

@ev-br ev-br commented Sep 4, 2016

As discussed in gh-6435,

  1. Cythonize stats.ks_2samp for a ~33% gain in speed. #5938 changed ks_2samp for scipy 0.18.0,
  2. the change was supposed to be a moderate speed up only, but it changed the behavior in case of ties in the data
  3. both "old" (0.17.0) and "new" (0.18.0) behavior is somewhat ad hoc, and the "new" is not clearly better than the "old"
  4. figuring out what is the obviously correct behavior is not trivial and nobody seems to have time/energy for it.

Therefore, this PR simply reverts gh-5938 and restores the implementation of stats.ks_2samp from scipy 0.17.0 so that we keep backwards compatibility until someone steps up for 4. above.

Points to consider:

  • scipy.stats.ks_2samp returns different values on different computers #6435 has an example which can be used for testing the result against scipy 0.17.0 and older. ISTM a test like this is going to be confusing in the long run, but I'd be fine adding one if there are strong opinions.
  • So far I kept the cython implementation of ks_2samp in _stats.pyx which is there but unused. Do we want to keep it in the source code, or we're fine with it surviving in the git history only?

This reverts commit 2f1720a.

Conflicts:

	doc/release/0.18.0-notes.rst
	scipy/stats/_stats.pyx
	scipy/stats/setup.py
	scipy/stats/stats.py
@ev-br ev-br added scipy.stats maintenance Items related to regular maintenance tasks backport-candidate This fix should be ported by a maintainer to previous SciPy versions. labels Sep 4, 2016
@ev-br ev-br added this to the 0.18.1 milestone Sep 4, 2016
@rgommers
Copy link
Member

ISTM a test like this is going to be confusing in the long run, but I'd be fine adding one if there are strong opinions.

Agreed, if it's not clearly correct, then it probably doesn't have too much value.

@rgommers
Copy link
Member

So far I kept the cython implementation of ks_2samp in _stats.pyx which is there but unused. Do we want to keep it in the source code, or we're fine with it surviving in the git history only?

I suggest to remove it and open a new issue that explains the history and what has to be done to get the faster version of ks_2samp back. Keeping dead code around is just going to waste someone's time later on, figuring out why it's there.

Since it's no longer used. It was added in scipygh-5938 for scipy 0.18.0
to get some speedup for ks_2samp, but then the addition was reverted
in scipygh-6545, following the discussion in scipygh-6435: it gives different
answers on different machines, it changes one ad hoc statistic to
a different ad hoc statistic, and neither of them are clearly "correct".
@ev-br
Copy link
Member Author

ev-br commented Sep 12, 2016

OK, dead code is gone.

@rgommers rgommers merged commit 2526df7 into scipy:master Sep 13, 2016
@rgommers
Copy link
Member

In it goes then. Thanks @ev-br

@ev-br ev-br deleted the revert_ks_2samp branch September 13, 2016 11:07
ev-br added a commit to ev-br/scipy that referenced this pull request Sep 13, 2016
Since it's no longer used. It was added in scipygh-5938 for scipy 0.18.0
to get some speedup for ks_2samp, but then the addition was reverted
in scipygh-6545, following the discussion in scipygh-6435: it gives different
answers on different machines, it changes one ad hoc statistic to
a different ad hoc statistic, and neither of them are clearly "correct".
@ev-br
Copy link
Member Author

ev-br commented Sep 14, 2016

backported in gh-6557

@ev-br ev-br removed the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Sep 14, 2016
@anntzer anntzer mentioned this pull request Nov 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Items related to regular maintenance tasks scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants