Upgraded sort code to use QuickSort instead of ShellSort. #988

rmitton · 2023-06-23T19:09:11Z

Jobs in FASTBuild are sorted using ShellSort, which has execution time of O(n^2). When 'n' is large, which can happen on builds containing thousands of jobs, the time taken to perform the sorting quickly becomes notable, sometimes taking over 30 seconds to complete.

Data can be sorted quickly using an O(n log n) algorithm such as QuickSort (C. A. R. Hoare, 1961). QuickSort uses a divide-and-conquer strategy to recursively split the problem into smaller tasks. This implementation uses an explicit stack to avoid function-call overhead and bound stack depth in case of adverse inputs (degrading to a slower sort under the rare case of overflow), chooses median-of-three as its partition function to maintain recursion balance, and handles repeated equal values well.

Other options such as deferring to the libc qsort or STL std::sort were not considered due to the pre-existing coding style of the FASTBuild codebase. Switching algorithms to a distributive sort such as Radix Sort was discounted to maintain API compatibility.

Jobs in FASTBuild are sorted using ShellSort, which has execution time of O(n^2). When 'n' is large, which can happen on builds containing thousands of jobs, the time taken to perform the sorting quickly becomes notable, sometimes taking over 30 seconds to complete. Data can be sorted quickly using an O(n log n) algorithm such as QuickSort (C. A. R. Hoare, 1961). QuickSort uses a divide-and-conquer strategy to recursively split the problem into smaller tasks. This implementation uses an explicit stack to avoid function-call overhead and bound stack depth in case of adverse inputs (degrading to a slower sort under the rare case of overflow), chooses median-of-three as its partition function to maintain recursion balance, and handles repeated equal values well. Other options such as deferring to the libc qsort or STL std::sort were not considered due to the pre-existing coding style of the FASTBuild codebase. Switching algorithms to a distributive sort such as Radix Sort was discounted to maintain API compatibility.

missmah · 2023-06-24T00:55:52Z

Code/Core/Containers/Sort.h

@@ -30,37 +30,80 @@ class AscendingCompareDeref
    }
 };

-// ShellSort


Wondering why ShellSort is removed here - presumably there are no current uses for it, but it could potentially find uses in the future? Or even be used in a unit test of the new quicksort implementation?

I'd be ok with deleting it if we determine it should never be used. I'd also be ok with moving it into the tests to have a simple way of verifying the sort/sorts we chose to keep (though it may be of limitted use since the sorts we're using are unstable and may not produce the same results for lists with "equal" items)

ffulin · 2023-06-24T01:28:29Z

Changing the default sort algorithm, particularly to a hybrid one might make sense. That approach is the standard on most stl implementations for good reason. I certainly want to avoid pathological edge cases and have the default Sort() be broadly performant.

Background on current implementation

To add some background, the existing sort algorithm was chosen over quicksort originally for a few reasons:

purported better or equivalent worst-case performance (n log n vs n^2, or n^2 vs n^2 depending on the source) (at the expense of worse average time)
better space complexity
implementation simplicity

Naturally, big-O complexity notation is not the only factor in actual real world performance. The gap sequences, cost of comparisons and cost of exchanges or copies are all factors, so performance was profiled and I've not seen any performance issues even on large codebases with hundreds of projects and dozens of build configurations.

Questions about your use case

I'm curious to learn a bit more about exactly which situation you saw this being slow and what order of magnitude the number of jobs was.

For example, the QueueJobs function that pushes the main thread queue into the job system, combining it with the existing queue has 2 sorts in it. One to sort the new jobs and a second on to sort the merged lists. Do you see slow sorts in the first or second sort? If it was the second one for example, we could do better than re-sorting the list by performing a merge of the already two sorted lists. If it was the first sort, it might be good to limit the size of that queue, and flush it periodically (which would have some other benefits to jobs processing)

Just to be clear, those changes may be complimentary to changing the sort algorithm.

Tests
Various tests appear to be failing on Windows, hitting some asserts and then ultimately resulting in the application being terminated with a "fail fast" exception (likely indicating some sort of buffer overflow). It's strange that the failures occur on Windows only.

The Array class does implement some Sort() tests but perhaps the coverage is not adequate. (It's also possible the failure is not in the sort itself but perhaps some existing issue has been uncovered).

Thoughts about Moves
Looking at both the existing sort implementation and the proposed new one, I think there is an opportunity to improve sort performance by utilizing the Move helper (FASTBuild's std::move equivalent). Many performance critical sorts in FASTBuild use SortDeref to avoid deep copies so wouldn't benefit, but some sorts are using Sort() and would benefit from using move semantics.

ffulin · 2023-07-04T02:17:57Z

I spent some time constructing some build graphs with very large amounts of jobs and manipulating them so as to cause large amounts of re-sorting when combining the list of new jobs with existing jobs. I managed to craft some extreme cases that caused 10+ seconds worth of sorting when adding new job batches into the queue.

In my scenario, the majority of the cost was spent inside the re-sorting of the combined lists, so I went ahead and replaced that with a merge (979f605) as I mentioned might be nice to do in my previous comment.

In my testing it all but eliminated the 10+ second stall of my forced bad case and was an nice improvement in all other real-world scenarios i measured.

I'd be interested to hear if this change improves the situation you were seeing.

I still think the sort itself is still worth considering for further optimization and these changes are complimentary.

I did also experiment with earlier flushing of job batches as I mentioned before, and while that yielded some interesting results due to how it impacts job scheduling, it improved some cases while making others worse. There could be some heuristic to apply here that might still make this useful, but for now this isn't an obvious win.

rmitton · 2023-09-12T15:58:47Z

Thank you for your thoughts regarding the performance of the FASTBuild ShellSort() code, and the brief discussion of the merits of big-O complexity, gap sequences, and the cost of exchanges and copies.

Not being a domain expert in sorting, I felt it best to defer to the opinion of someone who is. I therefore took the time to print out and mail a copy of this pull request to Professor Donald E. Knuth at Stanford University, author of The Art Of Computer Programming, who despite being 85 years old took the time to study the code in question and write back with a careful analysis which I have retyped the relevant excerpt from here:

The ShellSort code fragment that you sent me is perhaps the worst example of coding that I've ever seen. It is remarkable that you could discipline yourself to avoid judgmental words, etc. when submitting your replacement code.

The programmer has totally screwed up Donald Shell's idea: His/her routine starts with increment=3 (line 39). Then it does a hairy computation in lines 53--64 to set increment=1 for the second pass, then increment=0. (Those ten lines of code could be replaced by the single line "increment=increment/2" and achieve exactly the same effect .... the else clause on lines 61--64 is never executed). But those lines are executed only twice anyway; they aren't the problem [...] The problem of course is that this sort routine begins with increment 3; nobody with the slightest understanding of shellsort would do that. The starting increment should be somewhere about n/3, and there are better ways to adjust it after that. A decent shellsort runs faster than order n^2. But this one is indecent.

Regarding our use case, the work at our company involves sometimes as many as quarter of a million objects to consider; as the code in question is an algorithm for sorting numbers into ascending order, our use case is simply that our 'n' is much bigger than your 'n'.

I hope this is useful.

larsx2 · 2023-09-12T20:46:31Z

@rmitton this is one of the most petty and unnecessarily rude responses I have seen in the OSS space.

How childish do you have to be for acting this way against someone who in no way showed ill-intent in any of their arguments?

While you are entitled to judge their opinions as wrong, the way you attacked the maintainer just shows the world how weak as a person you are.

MinisculeGirraffe · 2023-09-13T03:41:48Z

Alexa. Write me a petty response please.

jan-vandenberg · 2023-09-13T11:51:29Z

Could this actually be a response by Donald Knuth? I doubt it.

It doesn't read like him (i.e. I don't think he would call it 'Donald Shell's idea'). Also you don't have to print and mail Knuth. Yes, he doesn't use e-mail, but his staff does. I know this, because I myself have interacted this way with Knuth before.

Overall there is no other 'proof' that this is actually a response by Knuth.

rmitton · 2023-09-15T08:37:46Z

I would like to take this time to thank the members of the open-source community for their kindness, warmth, and general goodwill. It's been nice to see folks take such a keen, spirited interest in the history of computer science and the practical details of sorting algorithm implementation.

As there seems little interest in pursuing this pull request any further however, I have elected to close it and move on with other things.

I wish the fine, daring algorithm designers over at FASTBuild the best of luck in future with their bold independent research.

ffulin · 2023-09-15T09:12:55Z

@rmitton I'm not sure where you got the impression that I didn't want to fix this issue or that I wasn't open to merging your changes.

My initial response to you was:

Changing the default sort algorithm, particularly to a hybrid one might make sense. That approach is the standard on most stl implementations for good reason. I certainly want to avoid pathological edge cases and have the default Sort() be broadly performant.

I wanted to understand your use-case better to make sure any fix was holistic and that we could take advantage of other work I was aware of. Further to that, I suggested additional complimentary changes.

I requested additional information from you to help drive this work.

You did not respond.

I mentioned your PR was failing the tests, possibly due to stack corruption.

You did not respond.

While waiting for your response, I went ahead and generated some test cases to emulate my best guess at the issue you were seeing. I eliminated the need for some of the sort calls as I suggested and saw significant uplift in my tests.

I released a new version and asked if this improved your use-case at all, with the expectation that we'd continue talking about the core change you proposed.

You did not respond.

Finally, you responded, but not to any of my queries. Instead, you mentioned that apart from a sub-optimal algorithm being used there was likely also an implementation error. You did this in a rather unpleasant way, but honestly I was kind of amused at the hyperbole - it's hard to believe a function with a silly error in it and an inelegant expression is the worst code any programmer has ever seen.

I would still like to fix this issue for you and for others.

I plan to:

fix error in the existing Sort() function as a likely cheap improvement
look to replace it with a hybrid sort algorithm as you originally proposed

If you'd still like to help, I'd welcome you to either reopen this PR and perhaps help investigate the crash it causes, or perhaps raise a new PR if that's not possible (or if you prefer).

I am still curious if the changes in the latest version have improved your use-case at all, ideally on top of your changes. I would expect them to still be complimentary.

If you are no longer interested in helping, I'll create a new issue to track this problem and continue fixing it.

MinisculeGirraffe · 2023-09-15T12:03:40Z

I would like to take this time to thank the members of the open-source community for their kindness, warmth, and general goodwill. It's been nice to see folks take such a keen, spirited interest in the history of computer science and the practical details of sorting algorithm implementation.

You didn't like that people called out on your bad behavior, so you're throwing a tantrum. Like a petty child.

ffulin · 2023-09-16T06:50:17Z

I've fixed the gap sequence calculation in the Sort function, replacing the erroneous fixed starting gap of 3 (which was intended to be n / 3) with Ciura's [2001] empirically derived sequence (0edfdbc)

Sorts that previously took tens of seconds now take tens of milliseconds.

I additionally added move semantic support as mentioned previously, though the gains there are not relevant for the job queuing case and most other uses in FASTBuild avoid the need by performing indirect sorts. Nonetheless, some other infrequent sorts are also improved by this change.

Previous changes to avoid sorts entirely by performing merges remain in-tact.

There may still be modest gains to be had by a change in algorithm, but given the gains and relatively low cost for large sets of data, that may no longer be necessary.

This fix will be in the next release (v1.12)

@rmitton If you are able to confirm if this resolves the performance problem you reported, that would be appreciated. If it doesn't, we can continue to investigate an algorithm switch as you proposed.

rmitton · 2023-09-18T19:39:55Z

Dear @ffulin,

Thank you for expressing continued interest in the performance analysis of sorting algorithms. As I previously mentioned, my focus has shifted to other matters. However, given the heightened interest in the shellsort gap sequence from so many unexpected sources, I thought it'd be beneficial to test the code in detail for those keen on the subject.

To that end, I've taken the various sorting algorithm implementations discussed in this thread and created a test harness (available here). This program evaluates the original ShellSort, the improved ShellSort, the provided QuickSort, and the two built-in C++ library sorts. It times sorting random numbers, with the quantity of 'n' varying from one to a million, and graphs the results in a .SVG file:

The tests were conducted on a Core i9 @ 3.5GHz using Microsoft Visual Studio 2022 with the /O3 optimizer flag. (the graph's Y-axis unfortunately required a mixed linear/logarithmic scale to accommodate them all together)

From the results, it's evident that the improved gap sequence offers clear notable enhancements as 'n' increases, and competes favourably with other production sorting implementations. I understand that refining the gap sequence for shellsort is an ongoing area of research in computer science, and I commend the developers for their continued and noble efforts.

Thank you to everyone for the constructive feedback. Given the continued importance of open-source projects like FASTBuild in our industry, those willing to work together to improve such software will no doubt find many future opportunities for tackling large-scale engineering challenges through the art of applied computer science.

ffulin · 2024-02-04T03:12:31Z

v1.12 has now been released. It includes a fix for the core sorting function bug at the core of this thread, as well as the other improvements discussed (elimination of a sort entirely during core graph traversal and better leveraging of move semantics in some sorts)

missmah reviewed Jun 24, 2023

View reviewed changes

ffulin added the Question label Jun 24, 2023

ffulin added the Need More Info (or Blocked) label Aug 6, 2023

rmitton closed this Sep 15, 2023

rmitton deleted the dev branch September 15, 2023 08:38

ffulin added Bug and removed Need More Info (or Blocked) Question labels Sep 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Upgraded sort code to use QuickSort instead of ShellSort. #988

Upgraded sort code to use QuickSort instead of ShellSort. #988

rmitton commented Jun 23, 2023

missmah Jun 24, 2023

ffulin Jun 24, 2023

ffulin commented Jun 24, 2023 •

edited

ffulin commented Jul 4, 2023

rmitton commented Sep 12, 2023

larsx2 commented Sep 12, 2023

MinisculeGirraffe commented Sep 13, 2023

jan-vandenberg commented Sep 13, 2023

rmitton commented Sep 15, 2023

ffulin commented Sep 15, 2023 •

edited

MinisculeGirraffe commented Sep 15, 2023

ffulin commented Sep 16, 2023

rmitton commented Sep 18, 2023

ffulin commented Feb 4, 2024

Upgraded sort code to use QuickSort instead of ShellSort. #988

Upgraded sort code to use QuickSort instead of ShellSort. #988

Conversation

rmitton commented Jun 23, 2023

missmah Jun 24, 2023

Choose a reason for hiding this comment

ffulin Jun 24, 2023

Choose a reason for hiding this comment

ffulin commented Jun 24, 2023 • edited

ffulin commented Jul 4, 2023

rmitton commented Sep 12, 2023

larsx2 commented Sep 12, 2023

MinisculeGirraffe commented Sep 13, 2023

jan-vandenberg commented Sep 13, 2023

rmitton commented Sep 15, 2023

ffulin commented Sep 15, 2023 • edited

MinisculeGirraffe commented Sep 15, 2023

ffulin commented Sep 16, 2023

rmitton commented Sep 18, 2023

ffulin commented Feb 4, 2024

ffulin commented Jun 24, 2023 •

edited

ffulin commented Sep 15, 2023 •

edited