Can we make Data.Sequence.sort incremental and/or faster? #143

treeowl · 2015-03-14T19:41:00Z

I admit that this idea is most likely insane, but there are a few vague ideas running around in my head and I figured I might as well get them out in the open in case they inspire someone.

Getting the first few and the last few elements can be done in O(n), and that's enough to build the first level of a finger tree.
Frederickson's heap selection algorithm theoretically lets us find a "block" of k smallest/largest elements in a binary heap, represented as an array, in O(k) time. This will only help, I believe, if we can then get the next 2k or 3k or so elements quickly, which may well be impossible.
Median-of-medians finds the median of the medians of groups of 5 elements each, producing an element in the middle 40%. Can we use such approximate medians to partition a sequence portion repeatedly, forming a 2-3 tree? It's possible, of course, to get exact medians, but the constant factors for that are very bad.

treeowl · 2015-03-14T20:30:19Z

OK, so I thought about this a bit more, and it's definitely possible to do it (probably with terrible constant factors) using a variant of quicksort with median-of-medians partitioning. Specifically, partition out result digits using the appropriate (very lopsided at the top) order statistics, then build 2-3 trees using s quot 3 and 2*s quot 3 order statistics. This should give incremental sorting (I'm not clear on the exact time to reach an arbitrary element, but it should be at least as good as O(n log (min {i+2, n-i+1}))) that completes in linearithmic time. Of course, doing it very slowly is pretty much useless, so the question is whether it's possible to speed it up. I like the idea of trying to find a way to use approximate order statistics instead of exact ones, to the extent possible, but I don't know if it's possible to make it work.

treeowl · 2015-03-16T16:48:23Z

Optimal Incremental Sorting by Rodrigo Paredes and Gonzalo Navarro appears to be relevant. Their algorithm (based on QuickSelect) works from just one end, whereas we'd need to work from both, but there may be ideas we can use.

sjakobi · 2020-07-15T02:11:46Z

I'm hardly familiar with this topic, but I think https://github.com/treeowl/sort-traversable is related.

treeowl · 2020-07-15T02:44:51Z

I'm hardly familiar with this topic, but I think https://github.com/treeowl/sort-traversable is related.

It's not, or at least not obviously so.

treeowl · 2020-07-15T02:49:31Z

A few years ago, I played around with something like this using the machinery behind zipWith. It wasn't exactly a speed demon, but it also was not really optimized. I have no idea what I did with that code. The painful aspect is figuring out how to manage random numbers. This might prove to be an idea that's better implemented in another package, but I'd rather not close this ticket until such a thing exists.

m-renaud added the performance label Dec 26, 2017

sjakobi added the Seq label Jul 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can we make Data.Sequence.sort incremental and/or faster? #143

Can we make Data.Sequence.sort incremental and/or faster? #143

treeowl commented Mar 14, 2015

treeowl commented Mar 14, 2015

treeowl commented Mar 16, 2015

sjakobi commented Jul 15, 2020

treeowl commented Jul 15, 2020

treeowl commented Jul 15, 2020

Can we make Data.Sequence.sort incremental and/or faster? #143

Can we make Data.Sequence.sort incremental and/or faster? #143

Comments

treeowl commented Mar 14, 2015

treeowl commented Mar 14, 2015

treeowl commented Mar 16, 2015

sjakobi commented Jul 15, 2020

treeowl commented Jul 15, 2020

treeowl commented Jul 15, 2020