
sort should have an algorithm with 'best' complexity for nearly sorted data #12186

Closed
jondo opened this issue Oct 16, 2018 · 19 comments

@jondo

jondo commented Oct 16, 2018

I have got a large amount of data that is nearly sorted.
As e.g. nicely shown in these animations, a simple insertion sort could be the fastest in this case.

So I suggest offering this as an additional algorithm for the 'kind' argument.

(I also found a corresponding SO question.)
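For context, a minimal sketch of the existing `kind` argument that this suggestion would extend (the perturbation here is just an illustrative example of "nearly sorted" data):

```python
import numpy as np

# Nearly sorted data: a sorted array with one small unsorted region.
a = np.arange(100_000, dtype=np.int64)
a[10:20] = a[10:20][::-1]

# The kinds NumPy offered at the time; the proposal is to add an
# insertion-sort-style kind tuned for nearly sorted input.
for kind in ("quicksort", "mergesort", "heapsort"):
    out = np.sort(a, kind=kind)
    assert np.all(out[:-1] <= out[1:])
```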

@charris
Member

charris commented Oct 16, 2018

Note that there are already insertion sort implementations in npysort; they are used for "small" sorts. It should not be too hard to take that code and turn it into an official sorting method.

@liwt31
Contributor

liwt31 commented Oct 24, 2018

Can I take this on? This is a somewhat big enhancement; tasks include:

  • Implement an insertion sort in c.src
  • Update npy_sort.h and maybe use a template to generate the file
  • Add a new field in PyArray_ArrFunc and update arraytypes.c
  • Update ndarraytypes.h and conversion_utils.c for the new method.
  • Add tests and modify document

EDIT: it's not necessary to modify PyArray_ArrFunc directly; some tweaks around the field PyArray_SortFunc *sort[NPY_NSORTS] should be enough.

@mattip
Member

mattip commented Oct 24, 2018

Why do you need to touch PyArray_ArrFunc? Isn't it enough to define a new [NPY_SORTKIND] and increment NPY_SORTS?

If you feel the PR is very large, you can open a pull request with a partial implementation, mark it WIP, and add a checklist so we know when/what to review.

@charris
Member

charris commented Oct 24, 2018

This may be a bit much for a first project, since it involves a lot of code. You are, of course, welcome to give it a try.

@liwt31
Contributor

liwt31 commented Oct 24, 2018

Why do you need to touch PyArray_ArrFunc? Isn't it enough to define a new [NPY_SORTKIND] and increment NPY_SORTS?

Sorry, that was an inaccurate expression. This is what I meant originally.

This may a bit much for a first project involving a lot of code.

I admit solving the issue is quite challenging, so thank you all for your (forthcoming) kind advice.

@juliantaylor
Contributor

I'd start with some benchmarks first. Prepare some datasets and just replace the quicksort with an insertion sort locally for the benchmarks.
Adding a sorting algorithm breaks the ABI, so we should know some real numbers on the benefits before doing so.

@charris
Member

charris commented Oct 25, 2018

Quicksort falls back on insertion sort when the partition size is small, so all you need to do to test the timing is to change the cutoff to a bigger number in numpy/core/src/npysort/quicksort.c.src.

#define SMALL_QUICKSORT 15 /* change this to bigger number */

Quicksort will then be insertion sort for arrays smaller than that. The problem will be putting together relevant test data and deciding what "almost sorted" means. Note that writing insertion sort comes down to copying the quicksort code and deleting large chunks of it :)
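For reference, a pure-Python sketch of the insertion sort under discussion (the C version in npysort has the same structure; this illustrates why it is near-linear on nearly sorted input):

```python
def insertion_sort(a):
    """In-place insertion sort. The total number of element moves is
    O(n + k), where k is the number of inversions, so it runs in
    near-linear time when the input is already almost sorted."""
    for i in range(1, len(a)):
        v = a[i]
        j = i - 1
        # Shift larger elements right until v's slot is found.
        while j >= 0 and a[j] > v:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = v
    return a

assert insertion_sort([3, 1, 2, 2]) == [1, 2, 2, 3]
```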

@charris
Member

charris commented Oct 25, 2018

It might be possible to also speed up mergesort for the almost-sorted case. Which, now that I think of it, is what Timsort does. Maybe we should finally implement that; it seems safer than a straight insertion sort, and we have discussed adding it before.

@charris
Member

charris commented Oct 25, 2018

If Timsort is no worse than mergesort, we could simply replace mergesort under the covers, just like we did with quicksort, which is now introsort.

@liwt31
Contributor

liwt31 commented Oct 26, 2018

I found a lib to do some quick benchmarks; here are some results (array size 100000, averaged over 10 runs, -O3 optimization).

Random generated array

Running tests
stdlib qsort time:                 15712.00 us per iteration
quick sort time:                    7235.20 us per iteration
selection sort time:             16801795.10 us per iteration
bubble sort time:                17992590.90 us per iteration
merge sort time:                    8862.10 us per iteration
binary insertion sort time:      1465175.50 us per iteration
heap sort time:                    10233.90 us per iteration
shell sort time:                   11486.70 us per iteration
tim sort time:                      9248.40 us per iteration
in-place merge sort time:           8205.80 us per iteration
grail sort time:                   10529.20 us per iteration
sqrt sort time:                     9159.90 us per iteration
rec stable sort sort time:         34215.50 us per iteration
grail sort dyn buffer sort time:    9527.50 us per iteration

It makes sense that quick sort is the fastest. Tim sort is slightly worse than merge sort.

Completely sorted array

Running tests
stdlib qsort time:                  5419.30 us per iteration
quick sort time:                    1924.90 us per iteration
selection sort time:             3391344.20 us per iteration
bubble sort time:                     96.00 us per iteration
merge sort time:                    2016.40 us per iteration
binary insertion sort time:           67.00 us per iteration
heap sort time:                     8213.60 us per iteration
shell sort time:                    1480.10 us per iteration
tim sort time:                        53.90 us per iteration
in-place merge sort time:            154.90 us per iteration
grail sort time:                    2208.90 us per iteration
sqrt sort time:                     1689.30 us per iteration
rec stable sort sort time:          2118.00 us per iteration
grail sort dyn buffer sort time:    1749.30 us per iteration

Insertion sort is good, but the fastest is tim sort. Very impressive.

Almost sorted array with 0.05% permutation (each element is away from its sorted position by, on average, 0.05% of the total size of the array, i.e. 50 elements)

Running tests
stdlib qsort time:                  9476.00 us per iteration
quick sort time:                    5275.50 us per iteration
selection sort time:             3496961.80 us per iteration
bubble sort time:                  21221.50 us per iteration
merge sort time:                    5251.80 us per iteration
binary insertion sort time:         5690.50 us per iteration
heap sort time:                     9306.70 us per iteration
shell sort time:                    4523.20 us per iteration
tim sort time:                      5101.90 us per iteration
in-place merge sort time:           4103.30 us per iteration
grail sort time:                    5359.30 us per iteration
sqrt sort time:                     4110.00 us per iteration
rec stable sort sort time:         10509.20 us per iteration
grail sort dyn buffer sort time:    4799.10 us per iteration

Tim sort is not performing well with this kind of permutation, and insertion sort performs basically the same as tim sort. In-place merge sort becomes the fastest.

Almost sorted array with 5% permutation (each element is away from its sorted position by, on average, 5% of the total size of the array, i.e. 5000 elements)

Running tests
stdlib qsort time:                 13776.50 us per iteration
quick sort time:                    7988.50 us per iteration
selection sort time:             4810939.30 us per iteration
bubble sort time:                1780895.10 us per iteration
merge sort time:                    7918.50 us per iteration
binary insertion sort time:       100704.50 us per iteration
heap sort time:                     8718.30 us per iteration
shell sort time:                    8775.20 us per iteration
tim sort time:                      7933.30 us per iteration
in-place merge sort time:           6256.20 us per iteration
grail sort time:                    8166.30 us per iteration
sqrt sort time:                     6699.00 us per iteration
rec stable sort sort time:         25762.10 us per iteration
grail sort dyn buffer sort time:    7437.60 us per iteration

Merge sort (in-place or not) is better than tim sort, and insertion sort becomes useless.

Conclusion

These experiments suggest that when the array is truly "nearly sorted", tim sort is the best choice, and when the array is only "somewhat nearly sorted", in-place merge sort is the best choice. I'd consider the latter to be more common in real cases, so if we decide to replace merge sort, we'd better replace it with in-place merge sort. And if necessary, we can add tim sort as another option for users who really know that their arrays are almost sorted.
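(The benchmarks above use a C library; a rough way to run a similar comparison from the Python side, with a hypothetical displacement scheme for "almost sorted" data, might look like this:)

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
base = np.sort(rng.integers(0, n, n))

# "Almost sorted": each element is displaced from its sorted position
# by roughly +/-50 slots, mimicking the 0.05% permutation above.
noise = rng.integers(-50, 51, n)
almost = base[np.argsort(np.arange(n) + noise, kind="stable")]

for kind in ("quicksort", "mergesort", "heapsort"):
    t = timeit.timeit(lambda: np.sort(almost, kind=kind), number=10)
    print(f"{kind:10s} {t / 10 * 1e3:8.2f} ms per iteration")
```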

@liwt31
Contributor

liwt31 commented Oct 26, 2018

While trying to improve these experiments with more kinds of permutation, it occurred to me that maybe what we should do is implement as many sorting algorithms as possible and leave the benchmarking, and the decision of which to use, to users. Some algorithms, such as tim sort, would of course have higher priority on the todo list.

@charris
Member

charris commented Oct 26, 2018

We want as few methods as will serve; otherwise a lot of methods just sit around unused. The original three were basically "in-place, quick, not guaranteed", "out of place, stable, guaranteed", and "in-place, guaranteed". With introsort replacing quicksort, we could probably drop heapsort at some point.

We cannot replace mergesort with in-place mergesort because the latter is not stable, and we need a stable sort. So I think it comes down to timsort, which seems generally as good as mergesort, or trying some compiler tricks to improve mergesort, maybe NPY_LIKELY in some if statements.
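(A small illustration of why stability matters here: when equal keys occur, a stable sort must keep them in their original order, which is what makes sorting by multiple keys in succession work. A sketch using NumPy's stable mergesort kind:)

```python
import numpy as np

# Sorting records by one key: entries with equal keys must keep
# their original relative order. NumPy guarantees this for
# kind='mergesort'.
keys = np.array([2, 1, 2, 1, 2])
idx = np.argsort(keys, kind="mergesort")  # stable

# The 1s (original indices 1, 3) and the 2s (0, 2, 4) each appear
# in their original index order.
assert list(idx) == [1, 3, 0, 2, 4]
```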

@charris
Member

charris commented Oct 26, 2018

There is recent ongoing work on stable sorts, see https://github.com/BonzaiThePenguin/WikiSort for an example.

@charris
Member

charris commented Oct 26, 2018

I think timsort also relies on runs, so it might work better on data where you shuffle (or even reverse) a randomly selected portion of sorted data, rather than displacing every element a small distance, so that the result has a lot of long runs in it.
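(A sketch of the kind of run-heavy test data described above: sorted data with a few randomly placed shuffled patches, leaving long sorted runs between them. The patch sizes and counts here are arbitrary illustrative choices:)

```python
import numpy as np

rng = np.random.default_rng(42)

def runs_data(n, n_patches=5, patch=100):
    """Sorted array with a few shuffled patches; long runs survive."""
    a = np.arange(n)
    for _ in range(n_patches):
        start = int(rng.integers(0, n - patch))
        rng.shuffle(a[start:start + patch])  # shuffles the view in place
    return a

def count_runs(a):
    """Number of maximal non-decreasing runs."""
    return int(np.sum(np.diff(a) < 0)) + 1

a = runs_data(10_000)
# Far fewer runs than fully random data (which would have ~n/2 runs).
assert count_runs(a) < 1000
```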

@jondo
Author

jondo commented Oct 26, 2018

In the use case that made me file this issue, the data has got long runs and only a few small unsorted areas.

@charris
Member

charris commented Oct 26, 2018

Useful google: measures of sortedness. I think the measure we might want is the "insertion index", n - length_of_longest_run.

@jondo The case of runs might be improved with a simple NPY_LIKELY or some such, which should allow parallel execution of the likely case in mergesort. @juliantaylor Thoughts?
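(A sketch of that measure, assuming "run" means a maximal non-decreasing run; the value is 0 for sorted input and grows as the array gets less sorted:)

```python
import numpy as np

def insertion_index(a):
    """n - length of the longest non-decreasing run, one of the
    'measures of sortedness' mentioned above."""
    a = np.asarray(a)
    if a.size == 0:
        return 0
    # Positions where a run is broken (a[i] > a[i+1]).
    desc = np.flatnonzero(np.diff(a) < 0)
    # Run boundaries, including the two ends of the array.
    bounds = np.concatenate(([-1], desc, [a.size - 1]))
    longest = int(np.max(np.diff(bounds)))
    return int(a.size) - longest

assert insertion_index([1, 2, 3, 4]) == 0   # fully sorted
assert insertion_index([4, 3, 2, 1]) == 3   # every run has length 1
```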

@charris
Member

charris commented Oct 26, 2018

Well, NPY_LIKELY sped things up by 0.5%, so not much help :(

@liwt31
Contributor

liwt31 commented Oct 27, 2018

I'd prefer tim sort. It is well studied, benchmarked, and tested in practice. After it's implemented, we can do some benchmarks to decide whether it should replace the old merge sort.

@shoyer
Member

shoyer commented May 10, 2019

Timsort has been merged!
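(If I remember correctly, the merged timsort is reachable via kind='stable', which dispatches to the best stable algorithm for the dtype; for floats that is timsort. A minimal usage sketch:)

```python
import numpy as np

a = np.array([3.0, 1.0, 2.0, 2.0])
out = np.sort(a, kind="stable")  # stable sort; timsort for float dtypes
assert np.array_equal(out, np.array([1.0, 2.0, 2.0, 3.0]))
```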

@shoyer shoyer closed this as completed May 10, 2019