Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use NonZeroU32 instead of u32 within Symbol. #56662

Closed
wants to merge 1 commit into from

Conversation

nnethercote
Copy link
Contributor

This shrinks Option<Symbol> from 8 bytes to 4 bytes, which shrinks
Token from 24 bytes to 16 bytes. This reduces instruction counts by up
to 1% across a range of benchmarks.

The commit introduces a new type, SymbolVec, to encapsulate the
1-indexing now required to inter-operate with the non-zero indices.

This shrinks `Option<Symbol>` from 8 bytes to 4 bytes, which shrinks
`Token` from 24 bytes to 16 bytes. This reduces instruction counts by up
to 1% across a range of benchmarks.

The commit introduces a new type, `SymbolVec`, to encapsulate the
1-indexing now required to inter-operate with the non-zero indices.
@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 10, 2018
@nnethercote
Copy link
Contributor Author

Here is the first part of the Cachegrind diff on one benchmark. It shows lots of little improvements all over the parser, as you'd expect.

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
-47,660,021 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                    file:function
--------------------------------------------------------------------------------
-11,088,522 (23.27%)  /home/njn/moz/rustN/src/libsyntax/tokenstream.rs:<syntax::tokenstream::TokenStream as core::clone::Clone>::clone
  9,362,836 (-19.6%)  /home/njn/moz/rustN/src/libsyntax_pos/symbol.rs:syntax_pos::hygiene::SyntaxContext::as_u32
 -9,362,836 (19.65%)  /home/njn/moz/rustN/src/libsyntax_pos/symbol.rs:syntax_pos::symbol::Symbol::as_u32
 -7,706,886 (16.17%)  /home/njn/moz/rustN/src/libsyntax/ext/tt/macro_rules.rs:<syntax::ext::tt::macro_rules::MacroRulesMacroExpander as syntax::ext::base::TTMacroExpander>::expand
 -6,599,254 (13.85%)  /home/njn/moz/rustN/src/libsyntax/parse/parser.rs:syntax::parse::parser::TokenCursor::next
 -5,627,859 (11.81%)  /home/njn/moz/rustN/src/libcore/option.rs:syntax::parse::parser::TokenCursor::next
 -4,994,685 (10.48%)  /home/njn/moz/rustN/src/libsyntax/parse/parser.rs:syntax::parse::parser::Parser::next_tok
 -4,670,190 ( 9.80%)  /home/njn/moz/rustN/src/libsyntax/parse/parser.rs:syntax::parse::parser::Parser::new
 -3,918,583 ( 8.22%)  /home/njn/moz/rustN/src/libsyntax/tokenstream.rs:syntax::parse::parser::TokenCursor::next
 -3,040,491 ( 6.38%)  /home/njn/moz/rustN/src/libsyntax/ext/tt/macro_parser.rs:syntax::ext::tt::macro_parser::parse
  2,848,455 (-5.98%)  /home/njn/moz/rustN/src/libstd/collections/hash/map.rs:<T as serialize::serialize::Encodable>::encode
  2,848,455 (-5.98%)  /home/njn/moz/rustN/src/libstd/collections/hash/table.rs:<T as serialize::serialize::Encodable>::encode
 -2,485,687 ( 5.22%)  /home/njn/moz/rustN/src/libcore/option.rs:<syntax::tokenstream::TokenStream as core::clone::Clone>::clone
 -2,185,073 ( 4.58%)  /build/glibc-OTsEL5/glibc-2.27/elf/dl-lookup.c:do_lookup_x
  1,984,346 (-4.16%)  /home/njn/moz/rustN/src/libsyntax/parse/token.rs:syntax::parse::parser::TokenCursor::next
 -1,802,517 ( 3.78%)  /home/njn/moz/rustN/src/libcore/ptr.rs:syntax::parse::parser::TokenCursor::next
  1,779,398 (-3.73%)  /home/njn/moz/rustN/src/libsyntax_pos/symbol.rs:<syntax::tokenstream::TokenStream as core::clone::Clone>::clone
  1,709,073 (-3.59%)  /home/njn/moz/rustN/src/libcore/num/mod.rs:<T as serialize::serialize::Encodable>::encode
 -1,580,218 ( 3.32%)  /home/njn/moz/rustN/src/libsyntax/ext/tt/macro_parser.rs:syntax::ext::tt::macro_parser::token_name_eq
  1,580,031 (-3.32%)  /home/njn/moz/rustN/src/libsyntax/parse/token.rs:syntax::ext::tt::macro_parser::token_name_eq
 -1,558,222 ( 3.27%)  /home/njn/moz/rustN/src/libsyntax/ext/tt/quoted.rs:syntax::ext::tt::macro_parser::parse
  1,548,572 (-3.25%)  /home/njn/moz/rustN/src/libsyntax/tokenstream.rs:<syntax::ext::tt::macro_rules::MacroRulesMacroExpander as syntax::ext::base::TTMacroExpander>::expand
 -1,535,496 ( 3.22%)  /home/njn/moz/rustN/src/libcore/slice/mod.rs:<syntax::ext::tt::macro_rules::MacroRulesMacroExpander as syntax::ext::base::TTMacroExpander>::expand
 -1,533,944 ( 3.22%)  /home/njn/moz/rustN/src/libcore/option.rs:syntax::ext::tt::macro_parser::parse

@Mark-Simulacrum
Copy link
Member

@bors r+

Seems like a good optimization.

@bors
Copy link
Contributor

bors commented Dec 10, 2018

📌 Commit cf682e0 has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 10, 2018
@leonardo-m
Copy link

NonZeroU32 allows to denote one value (zero) as special, so with compiler help Option<> uses it to denote a missing value. But then you have had to implement the 1-indexing. I guess the CPU doesn't pay much those 1-subtractions caused by SymbolVec::get, but an alternative solution is to use a different value, like u32::MAX to denote the missing value for Option<>. This still restricts the u32 range by 1, but doesn't require the subtractions and the 1-indexing code.

In the D standard library ( https://dlang.org/phobos/std_typecons.html#Nullable ) there is a second version of Nullable that allows that:
struct Nullable(T, T nullValue);
Rust Integer generics should help create something like this.

/// Symbols (which are 1-indexed) index into this (which is 0-indexed
/// internally). The methods handle the index conversions.
#[derive(Default)]
pub struct SymbolVec(Vec<&'static str>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also make Symbol a newtype_index, which would prevent such index arithmetic hacks and even make Option<Option<Symbol>> and deeper nestings take up the same amount of bytes.

That would require no changes to the list of symbols, too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does newtype_index avoid the space for the tag when used within an Option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It reserves (u32::max_value() - 255)..=u32::max_value() via compiler-internal trickery:

#[rustc_layout_scalar_valid_range_end($max)]

@oli-obk
Copy link
Contributor

oli-obk commented Dec 10, 2018

@bors r- we should discuss the way this optimization is implemented

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 10, 2018
@petrochenkov petrochenkov self-assigned this Dec 10, 2018
@petrochenkov
Copy link
Contributor

I tried this previously (#28657) and found the difference to be lost in noise, but I didn't have proper benchmarks.

@petrochenkov
Copy link
Contributor

If the goal is to shrink Token specifically, then Option<Symbol> can be replaced with Symbol with keywords::Invalid.name() as None.
There are multiple places do that already.

@petrochenkov
Copy link
Contributor

Reserving a slot in the end of integer range instead of 0 (#56662 (comment)) seems like the best solution.
Perhaps then we could do the opposite of #56662 (comment) and avoid keywords::Invalid in some places.

@alexcrichton
Copy link
Member

r? @oli-obk

@petrochenkov petrochenkov self-assigned this Dec 10, 2018
@nnethercote
Copy link
Contributor Author

nnethercote commented Dec 10, 2018

If the goal is to shrink Token specifically, then Option<Symbol> can be replaced with Symbol with keywords::Invalid.name() as None.

I wrote a patch doing that a while ago. It makes the code much uglier, and spreads complexity to every piece of code that touches Lit. I strongly prefer this PR's approach.

@nnethercote
Copy link
Contributor Author

I opened a new PR (#56699) for the newtype_index! approach.

@nnethercote nnethercote deleted the NonZero-Symbol branch December 12, 2018 04:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants