Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0 mangling should avoid backrefs when they're not shorter than their target. #87511

Open
eddyb opened this issue Jul 27, 2021 · 2 comments
Open
Labels
A-linkage Area: linking into static, shared libraries and binaries C-enhancement Category: An issue proposing an enhancement or a PR with one. I-heavy Issue: Problems and improvements with respect to binary size of generated code.

Comments

@eddyb
Copy link
Member

eddyb commented Jul 27, 2021

While working on #61486 / #87194 I came across backrefs in byte array values (with repeating elements), which I hadn't thought about (in terms of implications on optimizing the length of a symbol), but which is otherwise entirely expected. In order to demonstrate an edge case, I encoded some...longer...path::<{ &[0, 0, 0, 0] }> to get KRAh0_B12_B12_B12_E, which is parsed as follows:

K       // const generic
 R      //  &
  A     //   [
   h0_  //    0u8
   B12_ //    backref to "h0_" / `0u8`
   B12_ //    backref to "h0_" / `0u8`
   B12_ //    backref to "h0_" / `0u8`
  E     //   ]

Because this is a long enough symbol, the backrefs end up being (at least) 4 bytes each (to encode positions greater than 62), whereas an integer leaf constant with a 0..=15 value gets encoded as 3 bytes, so the backrefs are each adding an extra byte (and making more pointless work for the demangler).

We could avoid this by tracking the range, not just the start, of an encoded path/type/const, and one of:

  • skip caching the backref position when encoding that position would require more bytes than the inline size
  • record the range and make print_backref decide between actually printing the backref, and duplicating the contents of that range, inline, depending on which would be shorter

However, this isn't an urgent concern IMO because of the combination of these aspects:

  • the encoder gets to choose whether to use a backref or to repeat the encoding inline, we don't have to update decoders if we change our backref generation heuristic
  • most backrefs are either 3 or 4 bytes (for positions up to 62 or 62² = 3844, respectively)
  • we already special-case one-character leaves, to never record themselves for backrefs, so only less-trivial path/type/const syntax, which will produce 2 bytes or more, is relevant
  • paths have to include a crate root leaf (at least ~16 bytes) or a backref (at least 3 bytes), so the shortest possible paths are 5 bytes (e.g. inherent impl on a builtin type, using a backref for the impl parent module)
    • this further implies that only types/consts that don't reference paths can be 2-3 bytes (e.g. &str is Re, [u8] is Sh, fn() is FEu, etc.)

It might be interesting to measure the size difference across some larger projects (including rustc and Cargo), by making this change (even inefficiently), after v0 becomes the default, but I don't expect a difference bigger than a handful of bytes per symbol, if that.

cc @michaelwoerister

@michaelwoerister
Copy link
Member

Yes, it would make sense to me to include a rule in the RFC that says "only emit a backref if it is actually shorter than the thing it references". But I also don't think it is urgent for the reasons you list.

@eddyb
Copy link
Member Author

eddyb commented Aug 11, 2021

Yes, it would make sense to me to include a rule in the RFC that says "only emit a backref if it is actually shorter than the thing it references".

I wouldn't put it in the RFC, backrefs aren't required AFAIK, they're "just" an optimization over the direct encoding.

(we already skip backrefs of e.g. anything using bound lifetimes of an outer scope, even if there may be a way to correctly cache them, just because of the implementation cost vs small reward)

@wesleywiser wesleywiser added A-linkage Area: linking into static, shared libraries and binaries I-heavy Issue: Problems and improvements with respect to binary size of generated code. C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Aug 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-linkage Area: linking into static, shared libraries and binaries C-enhancement Category: An issue proposing an enhancement or a PR with one. I-heavy Issue: Problems and improvements with respect to binary size of generated code.
Projects
None yet
Development

No branches or pull requests

3 participants