v0 mangling should avoid backrefs when they're not shorter than their target. #87511

eddyb · 2021-07-27T13:24:09Z

While working on #61486 / #87194 I came across backrefs in byte array values (with repeating elements), which I hadn't thought about (in terms of implications on optimizing the length of a symbol), but which is otherwise entirely expected. In order to demonstrate an edge case, I encoded some...longer...path::<{ &[0, 0, 0, 0] }> to get KRAh0_B12_B12_B12_E, which is parsed as follows:

K       // const generic
 R      //  &
  A     //   [
   h0_  //    0u8
   B12_ //    backref to "h0_" / `0u8`
   B12_ //    backref to "h0_" / `0u8`
   B12_ //    backref to "h0_" / `0u8`
  E     //   ]

Because this is a long enough symbol, the backrefs end up being (at least) 4 bytes each (to encode positions greater than 62), whereas an integer leaf constant with a 0..=15 value gets encoded as 3 bytes, so the backrefs are each adding an extra byte (and making more pointless work for the demangler).

We could avoid this by tracking the range, not just the start, of an encoded path/type/const, and one of:

skip caching the backref position when encoding that position would require more bytes than the inline size
record the range and make print_backref decide between actually printing the backref, and duplicating the contents of that range, inline, depending on which would be shorter

However, this isn't an urgent concern IMO because of the combination of these aspects:

the encoder gets to choose whether to use a backref or to repeat the encoding inline, we don't have to update decoders if we change our backref generation heuristic
most backrefs are either 3 or 4 bytes (for positions up to 62 or 62² = 3844, respectively)
we already special-case one-character leaves, to never record themselves for backrefs, so only less-trivial path/type/const syntax, which will produce 2 bytes or more, is relevant
paths have to include a crate root leaf (at least ~16 bytes) or a backref (at least 3 bytes), so the shortest possible paths are 5 bytes (e.g. inherent impl on a builtin type, using a backref for the impl parent module)
- this further implies that only types/consts that don't reference paths can be 2-3 bytes (e.g. &str is Re, [u8] is Sh, fn() is FEu, etc.)

It might be interesting to measure the size difference across some larger projects (including rustc and Cargo), by making this change (even inefficiently), after v0 becomes the default, but I don't expect a difference bigger than a handful of bytes per symbol, if that.

cc @michaelwoerister

The text was updated successfully, but these errors were encountered:

michaelwoerister · 2021-08-09T14:55:01Z

Yes, it would make sense to me to include a rule in the RFC that says "only emit a backref if it is actually shorter than the thing it references". But I also don't think it is urgent for the reasons you list.

eddyb · 2021-08-11T10:00:05Z

Yes, it would make sense to me to include a rule in the RFC that says "only emit a backref if it is actually shorter than the thing it references".

I wouldn't put it in the RFC, backrefs aren't required AFAIK, they're "just" an optimization over the direct encoding.

(we already skip backrefs of e.g. anything using bound lifetimes of an outer scope, even if there may be a way to correctly cache them, just because of the implementation cost vs small reward)

wesleywiser added A-linkage Area: linking into static, shared libraries and binaries I-heavy Issue: Problems and improvements with respect to binary size of generated code. C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Aug 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0 mangling should avoid backrefs when they're not shorter than their target. #87511

v0 mangling should avoid backrefs when they're not shorter than their target. #87511

eddyb commented Jul 27, 2021

michaelwoerister commented Aug 9, 2021

eddyb commented Aug 11, 2021

v0 mangling should avoid backrefs when they're not shorter than their target. #87511

v0 mangling should avoid backrefs when they're not shorter than their target. #87511

Comments

eddyb commented Jul 27, 2021

michaelwoerister commented Aug 9, 2021

eddyb commented Aug 11, 2021