Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Improve GAP's sort functions #609

Merged
merged 3 commits into from
Feb 25, 2016

Conversation

ChrisJefferson
Copy link
Contributor

Before looking at this patch, look at these soothing benchmarks. These each show a list, and how long it takes to show to sort it in master, and in this branch.

List([1..10000000], x -> Random([1..10000000])) : 4467 -> 1324
List([1..10000000], x -> Random([1..2]));; : 736 -> 200
List([1..10000000], x -> -x) : 895 -> 175
Concatenation([1..10000000], [50]) : 639 -> 383
l := [1..100000000] * 0;; : 6245 -> 1412

This patch aims to do 2 things:

  1. Using some reasonably horrid C macros, remove the fact we have 8 cut+pasted copy of our sort routine.

The 8 sorts are all combinations of:

  • Sorting a plist and any list
  • Sorting with a predicate, or without
  • Sort and SortParallel
  1. Replace the shell sort (which has served us well) with a more modern search, in this case 'pdqsort' which is a variant of introsort (which is a variant of quicksort), designed to make use of sorted sub-sequences fairly cheaply.

The way this works is we define a small language of macro functions which must be implemented to instantiate our sort.

The main question is, do people consider the macros worth the reduction in code duplication, or does anyone have a suggestion on how to handle this differently, before I apply final spit+polish? (The answer "turn the GAP kernel in C++" seems like too huge a step right now).

@markuspf
Copy link
Member

This looks really good.

For some reason I thought that the previous sort was stable, but having a quick look around I can't seem to confirm this (ShellSort isn't stable).

I do actually think that we should be looking at least at templates from C++. Sure we can hack up most things using macros, but templates would make some things more clean (permutations, transformations, I am looking at you).

@ChrisJefferson
Copy link
Contributor Author

Now I've written this, this would be an ideal time to add StableSort and StableSortParallel, if users wantwant them.

@hulpke
Copy link
Contributor

hulpke commented Feb 10, 2016

I seem to remember vaguely that in some code in the past I did slightly dodgy things with `Sort' (using an order that is not really total), relying on the fact that the sort was stable. Thus I want a StableSort.

@ChrisJefferson
Copy link
Contributor Author

@hulpke : Do you mean you want a stable sort adding, or you want GAP's sort to remain stable? Because currently, Sort is not stable. For example (gap 4.7.8) (I made this array by random search. Notice how the second members of the pairs are in order, then after sorting [1,10] comes before[1,3].

 ┌───────┐   GAP, Version 4.7.8 of 09-Jun-2015 (free software, GPL)
 │  GAP  │   http://www.gap-system.org
 └───────┘   Architecture: x86_64-apple-darwin15.0.0-gcc-default64
gap> l := [ [ 1, 1 ], [ 2, 2 ], [ 1, 3 ], [ 3, 4 ], [ 1, 5 ], [ 4, 6 ], [ 4, 7 ],
>   [ 3, 8 ], [ 5, 9 ], [ 1, 10 ], [ 4, 11 ], [ 5, 12 ], [ 3, 13 ], [ 1, 14 ],
>   [ 4, 15 ], [ 2, 16 ], [ 1, 17 ], [ 4, 18 ], [ 5, 19 ], [ 2, 20 ] ];;
gap> Sort(l, function(x,y) return x[1] < y[1]; end);
gap> l;
[ [ 1, 1 ], [ 1, 10 ], [ 1, 3 ], [ 1, 5 ], [ 1, 14 ], [ 1, 17 ], [ 2, 16 ], [ 2, 20 ], [ 2, 2 ],
  [ 3, 4 ], [ 3, 13 ], [ 3, 8 ], [ 4, 7 ],  [ 4, 11 ], [ 4, 6 ], [ 4, 15 ], [ 4, 18 ], [ 5, 9 ],
  [ 5, 19 ], [ 5, 12 ] ]

@hulpke
Copy link
Contributor

hulpke commented Feb 10, 2016

@ChrisJefferson Apparently then I've been wrong before -- I erroneously thought shell sort was stable and that had been the reason for using it.

So my request would be to also have a stable sort.

@fingolfin
Copy link
Member

Having a StableSort (and StableSortBy, and StableSortParallel) would indeed be useful.

As to the implementation: I can live with the macro tricks (I did something similar, though much simpler, for the pc collector code, after all). I'd clearly prefer to rewrite this using C++ templates, but as you say, this might be a bit too much right now.

Hmm, though, is it really? Couldn't we just compile individual files with C++, making sure to only invoke extern "C" functions from the them? Perhaps we should ask on the list if there are serious objections to this? I think the template code would be much easier to understand, even for people who don't know much or any C++, at least compared to the macro trickery employed here.

But don't get me wrong: since you already wrote the code, and it seems to work, I don't mind it being added for now, and rewritten (if desired) later on.

@fingolfin
Copy link
Member

BTW, I think this is a super cool change you are making -- all my review comments are meant as constructive criticism, as always, I hope they get across that way. If anything sounds unreasonable, please let me know

@fingolfin
Copy link
Member

After looking at some of the commits, I think you might want to consider squashing several of them.

SORT_ASS_TEMP_TO_LIST( k, w );
k -= h;
if ( h+(start-1) < k ) {
SORT_ASS_LIST_TO_TEMP( w, k-h );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indention (tabs?)

@ChrisJefferson
Copy link
Contributor Author

As a general thing, I'll squish commits, and probably just run through something like clang-format (as it's a new file).

The C++, I think it's worth asking. I expect it to be rare nowadays for people to have a C compiler and not a C++ one (at least, it will be trivial for them to install one), and we can stick with C++03, which everyone with a C++ compiler will have.

@hulpke
Copy link
Contributor

hulpke commented Feb 10, 2016

More a question than a remark: Can one compile some files with C and some with C++ and have a guarantee that everything still fits together?

@ChrisJefferson
Copy link
Contributor Author

@hulpke : The short answer is yes. There is already C++ in kernel code in the json, profiling and semigroups packages (maybe just dev versions of semigroups), distributed with GAP. The only practical issue is teaching the configure script to go find a C++ compiler.

The (only slightly) longer answer is that you have to wrap the prototypes of C++ functions you want to call from C with extern "C" { ... }, which ensures that the name mangling is set to the C standard (and you can't use any C++ types in the arguments/etc).

@ChrisJefferson : there could be issues with libGAP properly wrapping up C++, although not impossible to resolve, of course. As long as ./configure is generated by autoconf, finding C++ compiler is trivial...

@frankluebeck
Copy link
Member

@ChrisJefferson: This is nice!

I have made some more tests than those you mention in the first post, in particular with much shorter lists. There were only a few cases where the Sort from this change is not faster or even slightly slower than the old one, but many cases where it is much faster.

I remember that years ago we have made some comparisons of various sorting algorithms. The experience was that some algorithms were faster than the shell sort for certain types of input but (sometimes much) slower on other types (long vs. short lists, almost sorted vs reversely sorted vs pretty random, many different vs few different entries, ...). So, at that time our conclusion was a confirmation of the comment in src/listfunc.c that shell sort is a very good general purpose choice.

The pdqsort strategy is more complicated than the previous shell sort but seems to be an even better choice for general purpose sorting.

@frankluebeck
Copy link
Member

I have needed a stable sort on a few occasions and used the following generic workaround:
Instead of

Sort(l);

I used

tmp := List([1..Length(l)], i-> [l[i], i]);;
Sort(tmp);
l := List(tmp, a-> a[1]);;

@dimpase
Copy link
Member

dimpase commented Feb 11, 2016

instead of replying to a comment, I edited it, sorry @ChrisJefferson :-)

@ChrisJefferson ChrisJefferson force-pushed the sortimprove branch 2 times, most recently from 3760dbb to 1cdf7ed Compare February 11, 2016 13:31
@ChrisJefferson
Copy link
Contributor Author

Now featuring StableSort, StableSortBy and StableSortParallel, just missing documentation.

These methods are (not unexpectedly) slower than Sort by varying amounts, but still (usually) faster than @frankluebeck 's mapping to an array of pairs.

To get the speed higher, one could implement one of the modern stable sorts which is similar in design to pdqsort, such as timsort (used in Python, among other places). I leave such an implementation to anyone who wants to volunteer :)

@fingolfin
Copy link
Member

Such an implementation can be done later, no need to delay things for it ;-)

There are test failures now -- interstingly, in the new small groups code I added. Perhaps this is a case of an unstable sort result differing between old and new sort implementation, leading to different choice being made in the algorithm? Hmmm

@ChrisJefferson
Copy link
Contributor Author

In some quick experiments, the old sort is "more stable" in some cases, but nothing anyone could reliably count on.

@rbehrends
Copy link
Contributor

Once concern that I have is that the choice of pdqsort may not be the best for GAP (on the other hand, mergesort is now available through StableSort()). Quicksort is fast when it comes to sorting integers (due its fast inner loop), but when comparisons are expensive and/or refer to data with poor locality (something we can expect in the general case, especially when a comparison function is written in GAP), then it (and worse, heapsort, which pdqsort/introsort fall back on to avoid the worst case) use a high number of comparisons compared to mergesort (though, obviously, mergesort can't do in-place sorting).

Some quick benchmarking seems to indicate that the fallback to insertion sort is counterproductive for mergesort; mergesort is faster without that, and can be further optimized by not merging already sorted ranges (I did not test a further optimization with handcoding the cases for ranges of length 1 or 2). For (short) integers, pdqsort performs better, while mergesort beats out pdqsort for strings, especially when those are partially sorted already.

@ChrisJefferson
Copy link
Contributor Author

My mergesort should certainly be improved (python's timsort could be a good basis). If we can make a stable sort that's good enough for the general sort, that would avoid the need for two sorts.

@ChrisJefferson
Copy link
Contributor Author

@fingolfin (or others) I would be interested to know if the change in StructureDescription appears to be a bug (in which case I will try to track it down), or it is just an alternative, valid, output (I don't understand StructureDescription well enough myself to be sure).

@hulpke
Copy link
Contributor

hulpke commented Feb 18, 2016

@ChrisJefferson As far as I understand StructrureDescription some decisions (on which of multiple decompositions to take) are based on selecting from a sorted list. Also the newly obtained structure descriptions seem to be as valid as the expected ones, so I'm inclined to consider this as a harmless side-effect.
(I don't think it is a good idea to have StructureDescription in tests and check for explicit strings returned.)

@hungaborhorvath
Copy link
Contributor

@ChrisJefferson I have checked and all three problematic StructureDescription results are completely valid, do not worry about them. One could argue which output of StructureDescription is the best one, but that is for another place and time and should not affect (and certainly should not delay) the merge of this PR.

(Not to mention that StructureDescription is being rewritten, and a lot of outputs will change in exchange for faster output. BTW, the code that uses the sort and caused the difference is itself a bit outdated at the moment and will in fact be replaced in a future PR.)

I suggest to circumvent the test by replacing the old values in lib/grpnames.g by the new values for the three groups for which the difference occurs. I do not mind to do it in another PR after this PR is merged, if that is the preference.

I think it is a good idea to have StructureDescription used in tests, it already helped to reveal one of my mistakes in my previous code. And when the change in the output is harmless (as in this case), we can simply adapt accordingly.

@markuspf
Copy link
Member

(Not to mention that StructureDescription is being rewritten, and a lot of

Is this documented/discussed somewhere?

@hungaborhorvath
Copy link
Contributor

Offtopic:
@markuspf Actually, no. I discussed my ideas with @Stefan-Kohl as he is the maintainer of grpnames.g*, and some discussion has been started at e.g. #160. There are already some PRs along making it more efficient, e.g. #379, #561, #563.

What would be the proper forum to open a discussion about this? To open an issue?

@markuspf
Copy link
Member

On Thu, Feb 18, 2016 at 12:13:35PM -0800, hungaborhorvath wrote:

Offtopic:
@markuspf Actually, no. I discussed my ideas with @Stefan-Kohl as he is the
maintainer of grpnames.g*, and some discussion has been started at e.g. #160.
There are already some PRs along making it more efficient, e.g. #379, #561, #
563.

What would be the proper forum to open a discussion about this? To open an
issue?

If you are planning to make major changes to the GAP system the best way to go
about probably is to send an email to gap@gap-system.org (the open development
mailing list) and make people aware of your plans.

We hope that this way you will get early feedback on your project and (hopefully)
don't end up in a situation where you invested a lot of time into a project
and you're then turned down because people don't agree with what you have
been doing.

If you have code, the best thing to do is open a pull-request.

I don't think that discussions about design decisions are well-placed in the
issue tracker.

@markuspf
Copy link
Member

Shall we update the tests then and go ahead with a merge?

@stevelinton
Copy link
Contributor

@markuspf yes

@markuspf
Copy link
Member

@ChrisJefferson this PR produces a lot of warnings about variables being set but not used. I am investigating, but assme this is due to the generecity of the macros?.

@markuspf
Copy link
Member

Another thing that this turns up is of course the slight horror of reading the code to be able to understand/debug it. But since it took me only 15 minutes I hope that's not too much of a factor for most.

@markuspf markuspf merged commit 1d34563 into gap-system:master Feb 25, 2016
@ChrisJefferson ChrisJefferson deleted the sortimprove branch February 26, 2016 08:29
@ChrisJefferson
Copy link
Contributor Author

I'm not seeing any warnings, did you fix them, or is my computer just not producing them?

@markuspf
Copy link
Member

On Fri, Feb 26, 2016 at 12:29:26AM -0800, Christopher Jefferson wrote:

I'm not seeing any warnings, did you fix them, or is my computer just not
producing them?

You can see them in the travis test. I didn't fix them yet.

I get the follwoing warnings (compiling with gcc 5.3.1 [DragonFly]
Release/2015-12-04)

../../src/sortbase.h: In function 'SORT_PARA_LISTCompIndices':
../../src/sortbase.h:113:21: warning: variable 'us' set but not used
[-Wunused-but-set-variable]
   SORT_CREATE_LOCAL(u);
                     ^
../../src/listfunc.c:807:48: note: in definition of macro 'SORT_CREATE_LOCAL'
 #define SORT_CREATE_LOCAL(name) Obj name ; Obj name##s ;
                                                ^
../../src/sortbase.h:112:21: warning: variable 'ts' set but not used
[-Wunused-but-set-variable]
   SORT_CREATE_LOCAL(t);
                     ^
../../src/listfunc.c:807:48: note: in definition of macro 'SORT_CREATE_LOCAL'
 #define SORT_CREATE_LOCAL(name) Obj name ; Obj name##s ;
                                                ^
../../src/sortbase.h: In function 'SORT_PARA_LISTPartition':
../../src/sortbase.h:161:25: warning: variable 'listcpys' set but not used
[-Wunused-but-set-variable]
       SORT_CREATE_LOCAL(listcpy);
                         ^
../../src/listfunc.c:807:48: note: in definition of macro 'SORT_CREATE_LOCAL'
 #define SORT_CREATE_LOCAL(name) Obj name ; Obj name##s ;
                                                ^
../../src/sortbase.h:170:25: warning: variable 'listcpys' set but not used
[-Wunused-but-set-variable]
       SORT_CREATE_LOCAL(listcpy);
                         ^
../../src/listfunc.c:807:48: note: in definition of macro 'SORT_CREATE_LOCAL'
 #define SORT_CREATE_LOCAL(name) Obj name ; Obj name##s ;
                                                ^
../../src/sortbase.h:152:21: warning: variable 'pivots' set but not used
[-Wunused-but-set-variable]
   SORT_CREATE_LOCAL(pivot);
                     ^
../../src/listfunc.c:807:48: note: in definition of macro 'SORT_CREATE_LOCAL'
 #define SORT_CREATE_LOCAL(name) Obj name ; Obj name##s ;
                                                ^
../../src/sortbase.h: In function 'SortParaDensePlistCompIndices':
../../src/sortbase.h:113:21: warning: variable 'us' set but not used
[-Wunused-but-set-variable]
   SORT_CREATE_LOCAL(u);
                     ^
../../src/listfunc.c:828:48: note: in definition of macro 'SORT_CREATE_LOCAL'
 #define SORT_CREATE_LOCAL(name) Obj name ; Obj name##s ;

(There are more of them, but I think you see the point).

@ChrisJefferson
Copy link
Contributor Author

Just for confirmation, these have been fixed.

@olexandr-konovalov
Copy link
Member

@ChrisJefferson - thanks, just to confirm that nightly tests with -Werror option are now back to normal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants