Kendall tau implementation uses Python mergesort #5533
Comments
Is this a bottleneck for you? If so, shouldn't be hard to move |
numpy has a few sorts including mergesort I think, but notice that the kendalltau mergesort is instrumented to track the number of exchanges. This exchange count is an ingredient in the tau calculation. Although it wouldn't make sense to switch to an existing non-exchange-counting mergesort implementation, the instrumented mergesort could indeed be implemented in Cython for more speed. |
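To illustrate why the exchange count matters, here is a minimal pure-Python sketch of an exchange-counting mergesort and of how the count could feed into tau. It is not the scipy implementation: the function names `count_exchanges` and `kendall_tau_no_ties` are hypothetical, and the tau formula used assumes there are no ties in either input. In that case, after ordering the pairs by x, the number of exchanges needed to sort y equals the number of discordant pairs D, and tau = 1 - 4D / (n(n - 1)).

```python
def count_exchanges(values):
    """Merge sort that returns (sorted_list, exchange_count).

    exchange_count is the number of pairs that are out of order in the
    input, i.e. the number of adjacent swaps a bubble sort would need.
    """
    n = len(values)
    if n < 2:
        return list(values), 0
    mid = n // 2
    left, left_count = count_exchanges(values[:mid])
    right, right_count = count_exchanges(values[mid:])

    merged = []
    count = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            # right[j] moves ahead of every element still waiting in left,
            # so it accounts for len(left) - i exchanges at once.
            count += len(left) - i
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count


def kendall_tau_no_ties(x, y):
    """Kendall tau for inputs without ties (assumption of this sketch)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    # Order the pairs by x, then count the exchanges needed to sort y.
    order = sorted(range(n), key=lambda k: x[k])
    y_by_x = [y[k] for k in order]
    _, discordant = count_exchanges(y_by_x)
    return 1.0 - 4.0 * discordant / (n * (n - 1))
```

A stock sort returns only the sorted array, so the count would have to be recovered some other way. That is why a straight swap to an existing mergesort does not work here, while a literal port of the instrumented version to Cython would.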
@argriffing Yes, I remember the algorithm was quite tricky. I wouldn't change anything beyond a literal translation into Cython. |
It's not a serious bottleneck, the statistic took long enough that I
|
If you noticed it was slow, then it can be sped up. Let's leave this issue open and see if someone feels like moving the thing to Cython without changing the algorithm. Should be a lot faster then. |
I opened a pull-request which removes the exchange counting mergesort entirely: #5548 |
When |
@sturlamolden One of the reasons not to go with the Cython version was that I was the only stats maintainer, and without any experience with Cython I didn't want to take on the extra maintenance work. Plus, this was supposed to be a computationally efficient implementation. Now there are several maintainers, and they have enough experience with Cython to implement or maintain whatever works best. (Aside: I'm currently in a corner with power and sample size calculations where I just want to get things to work, without expending any effort on high performance in large samples. But in that case, we can at least resort to fast asymptotic results for large sample data.) |
(Another aside: |
I suggested some improvements for #5548 |
For the record, here are the Cython versions I wrote in 2009. They cannot be used as they stand (e.g. using C int instead of intp_t), but we can look at them for comparison with the current implementations: |
I just noticed that the implementation of scipy.stats.kendalltau implements its own mergesort. I suspect that any sorting algorithm, especially one that performs recursive function calls, would be orders of magnitude faster using a C implementation.