Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use version-sorting for all sorting #115046

Merged
merged 6 commits into from Jan 11, 2024
Merged

Use version-sorting for all sorting #115046

merged 6 commits into from Jan 11, 2024

Conversation

joshtriplett
Copy link
Member

Add a description of a version-sorting algorithm. (This algorithm does
not precisely match strverscmp; it's intentionally simpler in its
handling of leading zeroes, and produces a result easier for humans to
easily understand and do by hand.)

Change all references to sorting to use version-sorting.

Change all references to "ASCIIbetically" to instead say "sort
non-lowercase before lowercase".

Add a description of a version-sorting algorithm. (This algorithm does
not precisely match `strverscmp`; it's intentionally simpler in its
handling of leading zeroes, and produces a result easier for humans to
easily understand and do by hand.)

Change all references to sorting to use version-sorting.

Change all references to "ASCIIbetically" to instead say "sort
non-lowercase before lowercase".
@joshtriplett joshtriplett added I-style-nominated The issue / PR has been nominated for discussion during a style team meeting. T-style Relevant to the style team, which will review and decide on the PR/issue. labels Aug 21, 2023
@rustbot
Copy link
Collaborator

rustbot commented Aug 21, 2023

r? @compiler-errors

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 21, 2023
@rustbot
Copy link
Collaborator

rustbot commented Aug 21, 2023

Some changes occurred in src/doc/style-guide

cc @rust-lang/style

src/doc/style-guide/src/README.md Outdated Show resolved Hide resolved
src/doc/style-guide/src/README.md Outdated Show resolved Hide resolved
src/doc/style-guide/src/README.md Outdated Show resolved Hide resolved
leading zeroes, longer sequences of digits are larger numbers, and
equal-length sequences of digits can be sorted lexicographically.)
- If the numbers have the same numeric value, the one with more leading zeroes
comes first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a deep reason for this? I think I would have expected the one with more leading zeroes to come later, similar to how with usual lexicographic ordering the longer string (in that case with more trailing characters) comes later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roughly, consistency with numeric sorting. Suppose you have 000, 001, 002, ..., 010, 011, ..., 100, 101. We sort those in that order, not for any reason involving the leading zeroes, but because of their numeric value. But the net effect is also that numbers with three leading zeroes come first, then two leading zeroes, then one leading zero, then no leading zeroes. So it seemed consistent, to me, to put "more leading zeroes" first in this case too.

Copy link
Member

@calebcartwright calebcartwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I follow (roughly), with a couple inline questions for clarity.

I don't have any strong opinions one way or another about the resultant sort order, but I do think it would be worth expanding on the set of examples (if not in the guide itself, then at least as for discussion with t-style members to help everyone visualize the differences with current state)

Comment on lines 110 to 112
For the purposes of the Rust style, to compare two strings for version-sorting:

- Compare the strings by (Unicode) character as normal, finding the index of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I presume correctly that the usage of the term "strings" here does not bound the following prescriptions just to literal strings? I.e. we want the same algorithm to also apply in all other sorting contexts (e.g. idents in imports)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the intention of "strings" in "compare two strings for version-sorting" here is in the sense of it being a string to the tool parsing it, not that it's a literal string in the code being parsed.

Suggestions for clearer, more unambiguous wording welcome.

src/doc/style-guide/src/README.md Outdated Show resolved Hide resolved
@compiler-errors
Copy link
Member

@joshtriplett: Would you like to start an FCP on this, or do you want to have synchronous discussion on this? If you want sync discussion, please re-nominate this, otherwise just kick off the FCP 😺.

@compiler-errors compiler-errors removed the I-style-nominated The issue / PR has been nominated for discussion during a style team meeting. label Sep 27, 2023
@compiler-errors
Copy link
Member

cc rust-lang/style-team#143 as well

@joshtriplett
Copy link
Member Author

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Oct 31, 2023

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Oct 31, 2023
@calebcartwright
Copy link
Member

@rfcbot concern require all checkboxes

(we require all members to provide their check due to small team size)

@calebcartwright
Copy link
Member

@rfcbot resolve require all checkboxes

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Nov 29, 2023
@rfcbot
Copy link

rfcbot commented Nov 29, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Nov 29, 2023
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Dec 9, 2023
@rfcbot rfcbot added the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Dec 12, 2023
@yaahc yaahc added the I-style-nominated The issue / PR has been nominated for discussion during a style team meeting. label Dec 12, 2023
@calebcartwright
Copy link
Member

For example consider "p00b" vs "p0a". Debian's algorithm does: "p" vs "p", same; "00" vs "0" number, same; "b" vs "g"

Should that last bit be "b" vs "a" or have I gotten confused somewhere 🤔

@ijackson
Copy link
Contributor

Should that last bit be "b" vs "a" or have I gotten confused somewhere 🤔

Oops, yes, sorry. The keys are ... err ... like right next to each other? Edited to fix it, thanks.

@joshtriplett
Copy link
Member Author

We discussed this at length in today's @rust-lang/style meeting.

For context on my original writeup of this algorithm, I have no attachment to being anywhere close to the strverscmp algorithm (and I do think strverscmp itself has some historical issues and bugs). I started out wanting to solve the problem of sorting x8 < x16 < x32 < x64 < x128 in that order, and similar, and knew version-sorting would get there, so I looked at strverscmp. Then I found its quirks (from the manpage) as well as historical bugs (notably, musl's version history has some notes on glibc implementation issues that musl reworked to be bug-compatible with), so I reworked this algorithm description to make more sense because we have no need to be bug-compatible with strverscmp.

What we discussed and agreed on in the style meeting was that we shouldn't be path-dependent and end up with "strverscmp with some tweaks/fixes". We should decide on the properties we want, come up with an algorithm that provides those properties, and not worry about how closely that aligns with strverscmp or any other version-sorting algorithm.

We're also mindful of the fact that this is mostly a sort for Rust identifiers (and possibly future Rust style needs like Cargo.toml but not yet), not a general sorting algorithm, and that sorting algorithms are context dependent. We want to prioritize making the mental model simple, but we're more flexible about the implementation being somewhat more complicated.

We do need to cover more corner cases in the document (it included some, but not all the ones raised here), by listing a series of identifiers and how they sort. (We don't think it's critical to document every difference between this and other version-sorting algorithms, but we can give a high-level difference like "This differs from other version-sorting algorithms, notably in the treatment of underscores and leading zeroes.".)

Properties we want to achieve, as agreed in the style meeting:

  • We want to sort numeric values without regard for leading zeroes, so x0a < x00b < x0c.
  • We have to provide a total order, so we have to have a well-defined order between any non-equal strings.
  • If identifiers are identical all the way to the end, other than the number of leading zeroes in one of its numeric components (e.g. x005a09b and x5a009b), we should sort them based on the first such difference.
    • (We didn't try to figure out in the meeting whether we had consensus on which direction to sort in that case. Speaking for myself only, I have a mild preference for putting more leading zeroes first, but if there's a concrete use case that argues the other direction I'd prioritize the concrete use case.)
  • We agreed that _ makes sense to special-case, since it acts like a space between words. (We noted that for a fully general sort algorithm, we'd need to specify things like how _ and sort relative to each other, but for characters allowed in Rust identifiers, we don't.)

I'm going to revise this proposal to support these properties.

@joshtriplett
Copy link
Member Author

@ijackson wrote:

It is more valuable for the algorithm to be simple than for it to do something precisely optimal with mixed length digit strings. So IMO we should have the following order: "a0" "a00" "p0a" "p00a" "p0b" "p00b"

As far as I can tell, we're going to need an explicit check at the end of identical-other-than-leading-zeroes strings based on remembering the first leading-zero difference. We shouldn't just do "if they're identical other than leading zeroes, compare lexically", because AFAICT that would produce this order: x0 x00 x000 x001 x01 x1 x002 x02 x2. (NUL sorts lexically before 0, 0 sorts lexically before 1.)

I think we should do one of the following two orders, for consistency:

  • x000 x00 x0 x001 x01 x1 x002 x02 x2
  • x0 x00 x000 x1 x01 x001 x2 x02 x002

Of the two, I would favor the former over the latter, because other than the case of numeric substrings that are all zero but have a different number of zeroes (which seems far more unlikely), it's consistent with lexical ordering ('0' < '1'), so it's more likely to be what people expect.

@ijackson
Copy link
Contributor

ijackson commented Dec 14, 2023

We want to sort numeric values without regard for leading zeroes, so x0a < x00b < x0c.

Is that because it's like hex? The problem with this is that it requires a fallback rescan, which makes the algorithm more difficult to explain. I don't think we actually care very much about the precise ordering of strings which differ only in number of leading zeroes in numerical components.

So I think we could have an algorithm which doesn't have a fallback rescan, and just has a rule which treats numerical pieces with the same numerical vlaue but different lengths, as unequal right away, rather than treating them as equal now and looking at them again later if the rest of the string is equal too.

But, the way it looks like hex got me thinking. Can we make it so that identifiers containing hex values come out in the right order? Identifiers with hex in aren't rare. Ideally we don't want an algorithm which guesses the base, so it falls out of the rest of the algorithm, but I'm not sure that's possible.

Ie for hex I think we want: x0 x12 xa12 xa1f and x000 x012 xa12 xa1f and 0x0 0x12 0xa12.

This is hard because xa12 vs xa1f looks like xa, 12 vs xa, 1, f. I think this shows that we can't do hex sensibly without base-guessing?

If we had a very simple base-guessing heuristic perhaps people woudl be able to structure their identifiers to trigger it. "Previous character was x" maybe.

@ijackson
Copy link
Contributor

If we had a very simple base-guessing heuristic perhaps people woudl be able to structure their identifiers to trigger it. "Previous character was x" maybe.

hexagon < hexadecimal because x triggers "hex" so it's he, x!, 0xa, gon < he, x!, 0xadec, imal. lol. So err not that then. Maybe only _x? I think this wants some experimentation/brainstorming. Sorry to think of this at this late stage in the process...

@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Dec 14, 2023
@joshtriplett
Copy link
Member Author

We want to sort numeric values without regard for leading zeroes, so x0a < x00b < x0c.

Is that because it's like hex?

No, this is nothing to do with hex; that the examples I gave started with an x was just the convention of picking x for an unknown variable. :)

I don't think we actually care very much about the precise ordering of strings which differ only in number of leading zeroes in numerical components.

The proposal I was making was "when comparing numeric components, compare by numeric value and ignore the number of leading zeroes". We don't have to do a second pass, though; if they're equal, we go on to the next group, but we remember if one or the other had less leading zeroes, and if we get to the end and the two strings happened to be equal in every group, we base the return value on that first difference.

Can we make it so that identifiers containing hex values come out in the right order? Identifiers with hex in aren't rare.

That's a reasonable thought. As you noted, I don't think there's any way to make hex sort numerically without attempting to detect hexadecimal. And I think we're not going to be able to do that reliably. My current inclination would be to not attempt to handle hexadecimal numbers at all.

Treat numeric chunks with equal value but differing numbers of leading
zeroes as equal, unless we get to the end of the entire string in which
case we use "more leading zeroes in the earliest differing chunk" as a
tiebreaker.

Treat `_` as a word separator, sorting it before anything other than
space.

Give more examples.
@joshtriplett
Copy link
Member Author

@yaahc I've addressed the concern; could you please review?

@yaahc
Copy link
Member

yaahc commented Dec 23, 2023

@rfcbot resolve discuss edge-cases raised by ijackson

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Dec 23, 2023
@rfcbot
Copy link

rfcbot commented Dec 23, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Dec 23, 2023
@ijackson
Copy link
Contributor

ijackson commented Jan 8, 2024

LGTM too now, thanks for the updates :-)

@joshtriplett joshtriplett removed the I-style-nominated The issue / PR has been nominated for discussion during a style team meeting. label Jan 10, 2024
@compiler-errors
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Jan 10, 2024

📌 Commit 2e931b5 has been approved by compiler-errors

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 10, 2024
@compiler-errors
Copy link
Member

@bors rollup

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 11, 2024
…iaskrgr

Rollup of 11 pull requests

Successful merges:

 - rust-lang#115046 (Use version-sorting for all sorting)
 - rust-lang#118915 (Add some comments, add `can_define_opaque_ty` check to `try_normalize_ty_recur`)
 - rust-lang#119006 (Fix is_global special address handling)
 - rust-lang#119637 (Pass LLVM error message back to pass wrapper.)
 - rust-lang#119715 (Exhaustiveness: abort on type error)
 - rust-lang#119763 (Cleanup things in and around `Diagnostic`)
 - rust-lang#119788 (change function name in comments)
 - rust-lang#119790 (Fix all_trait* methods to return all traits available in StableMIR)
 - rust-lang#119803 (Silence some follow-up errors [1/x])
 - rust-lang#119804 (Stabilize mutex_unpoison feature)
 - rust-lang#119832 (Meta: Add project const traits to triagebot config)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 3cc4e02 into rust-lang:master Jan 11, 2024
11 checks passed
@rustbot rustbot added this to the 1.77.0 milestone Jan 11, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jan 11, 2024
Rollup merge of rust-lang#115046 - joshtriplett:master, r=compiler-errors

Use version-sorting for all sorting

Add a description of a version-sorting algorithm. (This algorithm does
not precisely match `strverscmp`; it's intentionally simpler in its
handling of leading zeroes, and produces a result easier for humans to
easily understand and do by hand.)

Change all references to sorting to use version-sorting.

Change all references to "ASCIIbetically" to instead say "sort
non-lowercase before lowercase".
wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request Mar 29, 2024
Pkgsrc changes:
 * Adapt checksums and patches.

Upstream chnages:

Version 1.77.0 (2024-03-21)
==========================

- [Reveal opaque types within the defining body for exhaustiveness checking.]
  (rust-lang/rust#116821)
- [Stabilize C-string literals.]
  (rust-lang/rust#117472)
- [Stabilize THIR unsafeck.]
  (rust-lang/rust#117673)
- [Add lint `static_mut_refs` to warn on references to mutable statics.]
  (rust-lang/rust#117556)
- [Support async recursive calls (as long as they have indirection).]
  (rust-lang/rust#117703)
- [Undeprecate lint `unstable_features` and make use of it in the compiler.]
  (rust-lang/rust#118639)
- [Make inductive cycles in coherence ambiguous always.]
  (rust-lang/rust#118649)
- [Get rid of type-driven traversal in const-eval interning]
  (rust-lang/rust#119044),
  only as a [future compatiblity lint]
  (rust-lang/rust#122204) for now.
- [Deny braced macro invocations in let-else.]
  (rust-lang/rust#119062)

Compiler
--------

- [Include lint `soft_unstable` in future breakage reports.]
  (rust-lang/rust#116274)
- [Make `i128` and `u128` 16-byte aligned on x86-based targets.]
  (rust-lang/rust#116672)
- [Use `--verbose` in diagnostic output.]
  (rust-lang/rust#119129)
- [Improve spacing between printed tokens.]
  (rust-lang/rust#120227)
- [Merge the `unused_tuple_struct_fields` lint into `dead_code`.]
  (rust-lang/rust#118297)
- [Error on incorrect implied bounds in well-formedness check]
  (rust-lang/rust#118553),
  with a temporary exception for Bevy.
- [Fix coverage instrumentation/reports for non-ASCII source code.]
  (rust-lang/rust#119033)
- [Fix `fn`/`const` items implied bounds and well-formedness check.]
  (rust-lang/rust#120019)
- [Promote `riscv32{im|imafc}-unknown-none-elf` targets to tier 2.]
  (rust-lang/rust#118704)
- Add several new tier 3 targets:
  - [`aarch64-unknown-illumos`]
    (rust-lang/rust#112936)
  - [`hexagon-unknown-none-elf`]
    (rust-lang/rust#117601)
  - [`riscv32imafc-esp-espidf`]
    (rust-lang/rust#119738)
  - [`riscv32im-risc0-zkvm-elf`]
    (rust-lang/rust#117958)

Refer to Rust's [platform support page][platform-support-doc]
for more information on Rust's tiered platform support.

Libraries
---------

- [Implement `From<&[T; N]>` for `Cow<[T]>`.]
  (rust-lang/rust#113489)
- [Remove special-case handling of `vec.split_off
  (0)`.](rust-lang/rust#119917)

Stabilized APIs
---------------

- [`array::each_ref`]
  (https://doc.rust-lang.org/stable/std/primitive.array.html#method.each_ref)
- [`array::each_mut`]
  (https://doc.rust-lang.org/stable/std/primitive.array.html#method.each_mut)
- [`core::net`]
  (https://doc.rust-lang.org/stable/core/net/index.html)
- [`f32::round_ties_even`]
  (https://doc.rust-lang.org/stable/std/primitive.f32.html#method.round_ties_even)
- [`f64::round_ties_even`]
  (https://doc.rust-lang.org/stable/std/primitive.f64.html#method.round_ties_even)
- [`mem::offset_of!`]
  (https://doc.rust-lang.org/stable/std/mem/macro.offset_of.html)
- [`slice::first_chunk`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.first_chunk)
- [`slice::first_chunk_mut`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.first_chunk_mut)
- [`slice::split_first_chunk`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_first_chunk)
- [`slice::split_first_chunk_mut`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_first_chunk_mut)
- [`slice::last_chunk`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.last_chunk)
- [`slice::last_chunk_mut`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.last_chunk_mut)
- [`slice::split_last_chunk`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_last_chunk)
- [`slice::split_last_chunk_mut`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_last_chunk_mut)
- [`slice::chunk_by`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunk_by)
- [`slice::chunk_by_mut`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunk_by_mut)
- [`Bound::map`]
  (https://doc.rust-lang.org/stable/std/ops/enum.Bound.html#method.map)
- [`File::create_new`]
  (https://doc.rust-lang.org/stable/std/fs/struct.File.html#method.create_new)
- [`Mutex::clear_poison`]
  (https://doc.rust-lang.org/stable/std/sync/struct.Mutex.html#method.clear_poison)
- [`RwLock::clear_poison`]
  (https://doc.rust-lang.org/stable/std/sync/struct.RwLock.html#method.clear_poison)

Cargo
-----

- [Extend the build directive syntax with `cargo::`.]
  (rust-lang/cargo#12201)
- [Stabilize metadata `id` format as `PackageIDSpec`.]
  (rust-lang/cargo#12914)
- [Pull out as `cargo-util-schemas` as a crate.]
  (rust-lang/cargo#13178)
- [Strip all debuginfo when debuginfo is not requested.]
  (rust-lang/cargo#13257)
- [Inherit jobserver from env for all kinds of runners.]
  (rust-lang/cargo#12776)
- [Deprecate rustc plugin support in cargo.]
  (rust-lang/cargo#13248)

Rustdoc
-----

- [Allows links in markdown headings.]
  (rust-lang/rust#117662)
- [Search for tuples and unit by type with `()`.]
  (rust-lang/rust#118194)
- [Clean up the source sidebar's hide button.]
  (rust-lang/rust#119066)
- [Prevent JS injection from `localStorage`.]
  (rust-lang/rust#120250)

Misc
----

- [Recommend version-sorting for all sorting in style guide.]
  (rust-lang/rust#115046)

Internal Changes
----------------

These changes do not affect any public interfaces of Rust, but they represent
significant improvements to the performance or internals of rustc and related
tools.

- [Add more weirdness to `weird-exprs.rs`.]
  (rust-lang/rust#119028)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. finished-final-comment-period The final comment period is finished for this PR / Issue. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-style Relevant to the style team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants