Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/exp/slices: unexpected pdqsort behavior #52518

Closed
zhangyunhao116 opened this issue Apr 24, 2022 · 7 comments
Closed

x/exp/slices: unexpected pdqsort behavior #52518

zhangyunhao116 opened this issue Apr 24, 2022 · 7 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@zhangyunhao116
Copy link
Contributor

Open this issue to discuss pdqsort behavior. cc @randall77 , please take a look, thanks!

  • Why in CL 371574 we need to use {{GreaterOrEqual}} instead of !{{Less}}

After we used go run gen_sort_variants.go -generic to generate code for package slices, we found that the slices test will fail in float64 slice(slices/sort_test.go:TestSortFloat64Slice).

The reason for this error is that in pdqsort we use if a > 0 && !{{Less "data" "a-1" "pivot"}} to detect if we selected the same pivot twice for partition, which is fine in most cases. But in slices/zsortordered.go(generated codes), this line will be if a > 0 && !(data[a-1]<data[pivot]), when data[pivot] or data[a-1] is NaN, !(data[a-1]<data[pivot]) is always true, resulting in incorrect algorithm behavior(mistakenly believe that data[a-1] equals data[pivot]).

The fix is PatchSet18 at CL 371574, we can see the diff in https://go-review.googlesource.com/c/go/+/371574/17..18. In this patch, we use {{GreaterOrEqual}} instead of !{{Less}}, for other files, {{GreaterOrEqual}} is !{{Less}}(keep the same as before), for slices/zsortordered.go, the generated code will be if a > 0 && (data[a-1]>=data[pivot]), which can solve the problem. This change should only work for zsortordered.go:pdqsort, but it still changes StableSort's code due to global replacement, so we need to fix this in the new CL.

The reason for these problems is that we use if a > 0 && !{{Less "data" "a-1" "pivot"}} in pdqsort to determine whether there are duplicate pivot selections (data [a-1] must be less than or equal to pivot if exists). If this mechanism is removed, we can use !Less instead of {{GreaterOrEqual}} in all cases. I'm still not sure if there is a better way to implement this mechanism.

BTW, if we use different maxInsertion in sort and slices as discussed earlier, it will also lead to inconsistent algorithm behavior, does this mean that we can't use different maxInsertion in different versions?

@ulikunitz
Copy link
Contributor

Maybe following remark is helpful: The original pdqsort assumes strict weak ordering as std::sort in C++. NaNs violate this assumption. The developer of pdqsort, Orson Peters, explains it here: https://youtu.be/jz-PBiWwNjc?t=826.

@zhangyunhao116
Copy link
Contributor Author

@ulikunitz Thanks!

For sort package, we can use func (x Float64Slice) Less(i, j int) bool { return x[i] < x[j] || (isNaN(x[i]) && !isNaN(x[j])) } to keep this assumption. So it will be fine.

For the custom comparison function(both sort and slices), it's undefined behavior.

For ordered types(in slices package), we can't use the same mechanism as sort, because the generic does not support template metaprogramming or any other form of compile-time programming.

It seems that disable this optimization can keep our previous compatibility? (some users rely on this behavior, for example, using custom functions to sort float64 slices), but it also will reduce the performance when the slice contains many duplicate elements.

@randall77
Copy link
Contributor

It's already the case (before pdqsort) that both sort.Slice and exp/slices.Sort fail to sort a slice of float64s that has NaNs in it.
https://go.dev/play/p/o6cPn_DCyBu
I think the failure with pdqsort is that we can now have adjacent non-NaN values that are sorted incorrectly. I don't think that was possible before (just guessing, I don't have a proof of that). But it was always true that if there are NaNs, the result is not sorted in that we could have a[i] < a[j] for some i,j with i>j (just not i=j+1).

So I think we can leave everything as !Less, and just fix the test in exp/slices to not require correct sorting if there are NaNs. It turns out that the test there just happens to work, but I think that test happens to test just adjacent elements, not all pairs. It seems the test data was copied from stdlib sort, but the test there uses the ordering function that has a NaN special case.

It would certainly be worth adding a comment to exp/slices.Sort that no guarantees are made if sorting NaNs, with a pointer to the stdlib sort.Float64s and friends which can handle NaNs.

@ianlancetaylor
Copy link
Contributor

CC @eliben

@zhangyunhao116
Copy link
Contributor Author

I think the failure with pdqsort is that we can now have adjacent non-NaN values that are sorted incorrectly. I don't think that was possible before (just guessing, I don't have a proof of that)

Yes! The pdqsort now have adjacent non-NaN values that are sorted incorrectly when sorting a slice that contains NaNs. Added a warning in both exp/slices.Sort and exp/slices.SortFunc.

Agree the idea that we can just leave everything as !Less, and fix the test in exp/slices. Related CLs are updated, please take a look.

@gopherbot
Copy link

Change https://go.dev/cl/402534 mentions this issue: slices: add package-level documentation for less function

@cagedmantis cagedmantis changed the title slices: pdqsort behavior x/exp/slices: unexpected pdqsort behavior Apr 27, 2022
@gopherbot gopherbot added this to the Unreleased milestone Apr 27, 2022
@cagedmantis cagedmantis added the NeedsFix The path to resolution is known, but the work has not been done. label Apr 27, 2022
@cagedmantis cagedmantis modified the milestones: Unreleased, Backlog Apr 27, 2022
@eliben
Copy link
Member

eliben commented Apr 28, 2022

Thanks for handling this, @zhangyunhao116 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

7 participants