Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we make Data.Sequence.sort incremental and/or faster? #143

Open
treeowl opened this issue Mar 14, 2015 · 5 comments
Open

Can we make Data.Sequence.sort incremental and/or faster? #143

treeowl opened this issue Mar 14, 2015 · 5 comments

Comments

@treeowl
Copy link
Contributor

treeowl commented Mar 14, 2015

I admit that this idea is most likely insane, but there are a few vague ideas running around in my head and I figured I might as well get them out in the open in case they inspire someone.

  1. Getting the first few and the last few elements can be done in O(n), and that's enough to build the first level of a finger tree.
  2. Frederickson's heap selection algorithm theoretically lets us find a "block" of k smallest/largest elements in a binary heap, represented as an array, in O(k) time. This will only help, I believe, if we can then get the next 2k or 3k or so elements quickly, which may well be impossible.
  3. Median-of-medians finds the median of the medians of groups of 5 elements each, producing an element in the middle 40%. Can we use such approximate medians to partition a sequence portion repeatedly, forming a 2-3 tree? It's possible, of course, to get exact medians, but the constant factors for that are very bad.
@treeowl
Copy link
Contributor Author

treeowl commented Mar 14, 2015

OK, so I thought about this a bit more, and it's definitely possible to do it (probably with terrible constant factors) using a variant of quicksort with median-of-medians partitioning. Specifically, partition out result digits using the appropriate (very lopsided at the top) order statistics, then build 2-3 trees using s quot 3 and 2*s quot 3 order statistics. This should give incremental sorting (I'm not clear on the exact time to reach an arbitrary element, but it should be at least as good as O(n log (min {i+2, n-i+1}))) that completes in linearithmic time. Of course, doing it very slowly is pretty much useless, so the question is whether it's possible to speed it up. I like the idea of trying to find a way to use approximate order statistics instead of exact ones, to the extent possible, but I don't know if it's possible to make it work.

@treeowl
Copy link
Contributor Author

treeowl commented Mar 16, 2015

Optimal Incremental Sorting by Rodrigo Paredes and Gonzalo Navarro appears to be relevant. Their algorithm (based on QuickSelect) works from just one end, whereas we'd need to work from both, but there may be ideas we can use.

@sjakobi
Copy link
Member

sjakobi commented Jul 15, 2020

I'm hardly familiar with this topic, but I think https://github.com/treeowl/sort-traversable is related.

@treeowl
Copy link
Contributor Author

treeowl commented Jul 15, 2020

I'm hardly familiar with this topic, but I think https://github.com/treeowl/sort-traversable is related.

It's not, or at least not obviously so.

@treeowl
Copy link
Contributor Author

treeowl commented Jul 15, 2020

A few years ago, I played around with something like this using the machinery behind zipWith. It wasn't exactly a speed demon, but it also was not really optimized. I have no idea what I did with that code. The painful aspect is figuring out how to manage random numbers. This might prove to be an idea that's better implemented in another package, but I'd rather not close this ticket until such a thing exists.

@sjakobi sjakobi added the Seq label Jul 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants