Use version-sorting for all sorting #115046

Merged
merged 6 commits into from Jan 11, 2024
Changes from 1 commit
36 changes: 36 additions & 0 deletions src/doc/style-guide/src/README.md
@@ -99,6 +99,42 @@ fn bar() {}
fn baz() {}
```

### Sorting

In various cases, the default Rust style specifies to sort things. If not
otherwise specified, such sorting should be "version sorting", which ensures
that (for instance) `x8` comes before `x16` even though the character `1` comes
before the character `8`. (Apart from this handling of numbers, version-sorting
is lexicographical.)

For the purposes of the Rust style, to compare two strings for version-sorting:

- Compare the strings by (Unicode) character as normal, finding the index of
the first differing character. (If the two strings do not have the same
length, this may be the end of the shorter string.)

Member: Do I presume correctly that the usage of the term "strings" here does not bound the following prescriptions just to literal strings? I.e. we want the same algorithm to also apply in all other sorting contexts (e.g. idents in imports)

Member Author: Right, the intention of "strings" in "compare two strings for version-sorting" here is in the sense of it being a string to the tool parsing it, not that it's a literal string in the code being parsed. Suggestions for clearer, more unambiguous wording welcome.

- For both strings, determine the sequence of ASCII digits containing either
that character or the character before. (If either string doesn't have such a
sequence of ASCII digits, fall back to comparing the strings as normal.)
- Compare the numeric values of the numbers specified by the sequences of
digits. (Note that an implementation of this algorithm can easily check this
without accumulating copies of the digits or converting to a number: after
skipping leading zeroes, longer sequences of digits are larger numbers, and
equal-length sequences can be compared lexicographically.)
- If the numbers have the same numeric value, the one with more leading zeroes
comes first.

Member: Is there a deep reason for this? I think I would have expected the one with more leading zeroes to come later, similar to how with usual lexicographic ordering the longer string (in that case with more trailing characters) comes later.

Member Author: Roughly, consistency with numeric sorting. Suppose you have 000, 001, 002, ..., 010, 011, ..., 100, 101. We sort those in that order, not for any reason involving the leading zeroes, but because of their numeric value. But the net effect is also that numbers with three leading zeroes come first, then two leading zeroes, then one leading zero, then no leading zeroes. So it seemed consistent, to me, to put "more leading zeroes" first in this case too.

Note that there exist various algorithms called "version sorting", which differ
most commonly in their handling of numbers with leading zeroes. This algorithm
does not purport to precisely match the behavior of any particular other
algorithm, only to produce a simple and satisfying result for Rust formatting.
(In particular, this algorithm aims to produce a satisfying result for a set of
symbols that have the same number of leading zeroes, and an acceptable and
easily understandable result for a set of symbols that has varying numbers of
leading zeroes.)

As an example, version-sorting will sort the following symbols in the order
given: `x000`, `x00`, `x0`, `x01`, `x1`, `x09`, `x9`, `x010`, `x10`.
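The comparison steps above can be sketched in Rust as a comparator usable with `slice::sort_by`. This is a minimal, non-authoritative sketch that follows the bullet points literally and treats strings as sequences of `char`s; the names `version_cmp`, `digit_run`, `leading_zeroes`, and `cmp_digits` are hypothetical and not taken from `rustfmt`. For the case the steps leave open (two identical digit runs, as in `x1a` vs `x1b`), it assumes a fall back to ordinary comparison.

```rust
use std::cmp::Ordering;

/// Bounds `(start, end)` of the run of ASCII digits that contains index `i`
/// or index `i - 1`, if either position holds a digit.
fn digit_run(s: &[char], i: usize) -> Option<(usize, usize)> {
    let pos = if s.get(i).map_or(false, |c| c.is_ascii_digit()) {
        i
    } else if i > 0 && s[i - 1].is_ascii_digit() {
        i - 1
    } else {
        return None;
    };
    let mut start = pos;
    while start > 0 && s[start - 1].is_ascii_digit() {
        start -= 1;
    }
    let mut end = pos + 1;
    while end < s.len() && s[end].is_ascii_digit() {
        end += 1;
    }
    Some((start, end))
}

/// Number of leading `'0'` characters in a digit run.
fn leading_zeroes(run: &[char]) -> usize {
    run.iter().take_while(|&&c| c == '0').count()
}

/// Compare two digit runs by numeric value without converting them to
/// integers; on a tie, the run with more leading zeroes sorts first.
fn cmp_digits(a: &[char], b: &[char]) -> Ordering {
    let (za, zb) = (leading_zeroes(a), leading_zeroes(b));
    let (na, nb) = (&a[za..], &b[zb..]);
    na.len()
        .cmp(&nb.len()) // once zeroes are skipped, longer means larger
        .then_with(|| na.cmp(nb)) // same length: lexicographic equals numeric
        .then_with(|| zb.cmp(&za)) // same value: more leading zeroes first
}

/// Version-sort comparison of two strings, following the steps above.
pub fn version_cmp(a: &str, b: &str) -> Ordering {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // Index of the first differing character; may be the end of the shorter string.
    let i = a.iter().zip(&b).take_while(|(x, y)| x == y).count();
    if let (Some((sa, ea)), Some((sb, eb))) = (digit_run(&a, i), digit_run(&b, i)) {
        let ord = cmp_digits(&a[sa..ea], &b[sb..eb]);
        if ord != Ordering::Equal {
            return ord;
        }
    }
    // No digit run in one of the strings, or identical runs: compare normally.
    a.cmp(&b)
}
```

Sorting a shuffled copy of the example symbols with `v.sort_by(|a, b| version_cmp(a, b))` yields the order given above, and `x8`, `x16`, `x32`, `x64`, `x128` come out in that order rather than in plain character order.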

### [Module-level items](items.md)

### [Statements](statements.md)
6 changes: 3 additions & 3 deletions src/doc/style-guide/src/cargo.md
@@ -8,11 +8,11 @@ Put a blank line between the last key-value pair in a section and the header of
the next section. Do not place a blank line between section headers and the
key-value pairs in that section, or between key-value pairs in a section.

-Sort key names alphabetically within each section, with the exception of the
+Version-sort key names within each section, with the exception of the
`[package]` section. Put the `[package]` section at the top of the file; put
the `name` and `version` keys in that order at the top of that section,
-followed by the remaining keys other than `description` in alphabetical order,
-followed by the `description` at the end of that section.
+followed by the remaining keys other than `description` in order, followed by
+the `description` at the end of that section.
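The `[package]` ordering described above amounts to a two-level sort key: a fixed rank for `name`, `version`, and `description`, then the general key order for everything else. A minimal sketch, assuming keys arrive as plain strings and the section name is passed without brackets; `sort_cargo_keys` and `package_rank` are hypothetical names. The comparator `key_cmp` is a parameter standing in for version-sorting. For key names without ASCII digits, such as `name`, `edition`, or `license`, version-sorting reduces to ordinary character comparison, so plain `str` comparison gives the same result there.

```rust
use std::cmp::Ordering;

/// Position class of a key inside `[package]`: `name` first, `version`
/// second, `description` last, every other key in between.
fn package_rank(key: &str) -> u8 {
    match key {
        "name" => 0,
        "version" => 1,
        "description" => 3,
        _ => 2,
    }
}

/// Sort the keys of one Cargo.toml section in place. `key_cmp` stands in
/// for version-sorting and orders every key not pinned by `package_rank`.
pub fn sort_cargo_keys(
    section: &str,
    keys: &mut [&str],
    key_cmp: &dyn Fn(&str, &str) -> Ordering,
) {
    if section == "package" {
        keys.sort_by(|a, b| {
            package_rank(a)
                .cmp(&package_rank(b))
                .then_with(|| key_cmp(*a, *b))
        });
    } else {
        // Every other section: just version-sort the key names.
        keys.sort_by(|a, b| key_cmp(*a, *b));
    }
}
```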

Don't use quotes around any standard key names; use bare keys. Only use quoted
keys for non-standard keys whose names require them, and avoid introducing such
2 changes: 2 additions & 0 deletions src/doc/style-guide/src/editions.md
@@ -37,6 +37,8 @@ history of the style guide. Notable changes in the Rust 2024 style edition
include:

- Miscellaneous `rustfmt` bugfixes.
- Use version-sort (sort `x8`, `x16`, `x32`, `x64`, `x128` in that order).
- Change "ASCIIbetical" sort to Unicode-aware "non-lowercase before lowercase".

## Rust 2015/2018/2021 style edition

23 changes: 15 additions & 8 deletions src/doc/style-guide/src/items.md
@@ -9,8 +9,8 @@ an item appears at module level or within another item.
alphabetically.

`use` statements, and module *declarations* (`mod foo;`, not `mod { ... }`)
-must come before other items. Put imports before module declarations. Sort each
-alphabetically, except that `self` and `super` must come before any other
+must come before other items. Put imports before module declarations.
+Version-sort each, except that `self` and `super` must come before any other
names.
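The rule above for leading `use` statements and module declarations can be sketched as a comparator over a simplified item model. The `Item` enum, `lead_rank`, and `order_items` are hypothetical names. The sketch assumes each `use` is already reduced to its path and each module declaration to its name, places `self` before `super` (the text only requires both to precede other names), and does not model the `#[macro_use]` exception. `name_cmp` stands in for version-sorting.

```rust
use std::cmp::Ordering;

/// Simplified leading item: a `use` path or a `mod` declaration's name.
#[derive(Debug, PartialEq)]
pub enum Item<'a> {
    Use(&'a str),
    Mod(&'a str),
}

/// Paths starting with `self` or `super` sort before all other names.
fn lead_rank(path: &str) -> u8 {
    match path.split("::").next() {
        Some("self") => 0,
        Some("super") => 1,
        _ => 2,
    }
}

/// Imports before module declarations; within each kind, `self`/`super`
/// paths first, then the rest ordered by `name_cmp`.
pub fn order_items(items: &mut [Item<'_>], name_cmp: &dyn Fn(&str, &str) -> Ordering) {
    items.sort_by(|a, b| match (a, b) {
        (Item::Use(_), Item::Mod(_)) => Ordering::Less,
        (Item::Mod(_), Item::Use(_)) => Ordering::Greater,
        (Item::Use(x), Item::Use(y)) | (Item::Mod(x), Item::Mod(y)) => lead_rank(x)
            .cmp(&lead_rank(y))
            .then_with(|| name_cmp(*x, *y)),
    });
}
```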

Don't automatically move module declarations annotated with `#[macro_use]`,
@@ -441,8 +441,10 @@ foo::{
A *group* of imports is a set of imports on the same or sequential lines. One or
more blank lines or other items (e.g., a function) separate groups of imports.

-Within a group of imports, imports must be sorted ASCIIbetically (uppercase
-before lowercase). Groups of imports must not be merged or re-ordered.
+Within a group of imports, imports must be version-sorted, except that
+non-lowercase characters (characters that can start an `UpperCamelCase`
+identifier) must be sorted before lowercase characters. Groups of imports must
+not be merged or re-ordered.
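A sketch of the group rule above: each run of consecutive import lines is sorted on its own, and blank lines stay where they are, so groups are never merged or moved. Several assumptions are not taken from the guide. Groups are separated only by blank lines here (other items are not modeled). `!char::is_lowercase` approximates "non-lowercase". The non-lowercase-before-lowercase rule is applied character by character. The digit handling of version-sorting is left out for brevity. `char_key`, `import_cmp`, and `sort_import_groups` are hypothetical names.

```rust
use std::cmp::Ordering;

/// Hypothetical character order: non-lowercase characters (approximated
/// as `!char::is_lowercase`) before lowercase ones, then by code point.
fn char_key(c: char) -> (bool, char) {
    (c.is_lowercase(), c)
}

/// Compare two import lines character by character using `char_key`.
/// The digit handling of version-sorting is deliberately omitted here.
pub fn import_cmp(a: &str, b: &str) -> Ordering {
    a.chars().map(char_key).cmp(b.chars().map(char_key))
}

/// Sort each group of import lines independently. A blank line ends a
/// group and is kept in place, so groups are never merged or re-ordered.
pub fn sort_import_groups(lines: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    let mut group: Vec<&str> = Vec::new();
    for &line in lines {
        if line.trim().is_empty() {
            group.sort_by(|a, b| import_cmp(a, b));
            out.extend(group.drain(..).map(String::from));
            out.push(line.to_string());
        } else {
            group.push(line);
        }
    }
    // Flush the final group.
    group.sort_by(|a, b| import_cmp(a, b));
    out.extend(group.drain(..).map(String::from));
    out
}
```

Under this character order a non-ASCII uppercase letter such as `É` sorts before `a`, whereas a purely byte-based ("ASCIIbetical") comparison would put it after every ASCII letter.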

E.g., input:

@@ -469,10 +471,15 @@ re-ordering.

### Ordering list import

-Names in a list import must be sorted ASCIIbetically, but with `self` and
-`super` first, and groups and glob imports last. This applies recursively. For
-example, `a::*` comes before `b::a` but `a::b` comes before `a::*`. E.g.,
-`use foo::bar::{a, b::c, b::d, b::d::{x, y, z}, b::{self, r, s}};`.
+Names in a list import must be version-sorted, except that:
+- `self` and `super` always come first if present,
+- non-lowercase characters (characters that can start an `UpperCamelCase`
+identifier) must be sorted before lowercase characters, and
+- groups and glob imports always come last if present.
+
+This applies recursively. For example, `a::*` comes before `b::a` but `a::b`
+comes before `a::*`. E.g., `use foo::bar::{a, b::c, b::d, b::d::{x, y, z},
+b::{self, r, s}};`.
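The recursive rule above can be sketched by splitting each list entry into `::` segments (ignoring `::` inside braces) and comparing segment by segment, ranking `self` and `super` first and globs and groups last. This is a minimal sketch. `segments`, `rank`, and `list_import_cmp` are hypothetical names, and `name_cmp` stands in for version-sorting with the non-lowercase rule. The relative order of `self` vs `super` and of a glob vs a group is an assumption, since the text only says both come first or last.

```rust
use std::cmp::Ordering;

/// Split a path on top-level `::`, leaving `{...}` groups intact.
fn segments(path: &str) -> Vec<&str> {
    let bytes = path.as_bytes();
    let mut out = Vec::new();
    let mut depth = 0usize;
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'{' => depth += 1,
            b'}' => depth = depth.saturating_sub(1),
            b':' if depth == 0 && bytes.get(i + 1) == Some(&b':') => {
                out.push(&path[start..i]);
                start = i + 2;
                i += 1; // skip the second ':'
            }
            _ => {}
        }
        i += 1;
    }
    out.push(&path[start..]);
    out
}

/// Rank of one segment: `self`, `super`, plain names, globs, then groups.
fn rank(seg: &str) -> u8 {
    match seg {
        "self" => 0,
        "super" => 1,
        "*" => 3,
        s if s.starts_with('{') => 4,
        _ => 2,
    }
}

/// Compare two entries of a list import segment by segment.
pub fn list_import_cmp(a: &str, b: &str, name_cmp: &dyn Fn(&str, &str) -> Ordering) -> Ordering {
    let (sa, sb) = (segments(a), segments(b));
    for (x, y) in sa.iter().zip(&sb) {
        let ord = rank(x).cmp(&rank(y)).then_with(|| name_cmp(*x, *y));
        if ord != Ordering::Equal {
            return ord;
        }
    }
    // One path is a prefix of the other: the shorter one comes first.
    sa.len().cmp(&sb.len())
}
```

With plain `str` comparison as `name_cmp`, sorting the entries `a`, `b::c`, `b::d`, `b::d::{x, y, z}`, and `b::{self, r, s}` in any initial order reproduces the order of the `use foo::bar` example above.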

### Normalisation
