Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add feature gate to permit measuring memory usage of dyn upcasting #112355

Closed
nikomatsakis opened this issue Jun 6, 2023 · 13 comments
Closed

add feature gate to permit measuring memory usage of dyn upcasting #112355

nikomatsakis opened this issue Jun 6, 2023 · 13 comments
Assignees
Labels
F-trait_upcasting `#![feature(trait_upcasting)]` T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jun 6, 2023

@WaffleLapkin is preparing a PR to permit measuring memory usage of dyn upcasting to help gather data for concerns raised in #65991. The lang team is generally in favor of gathering data but we don't want to wait infinitely long. Opening this issue to track this work.

Instructions on how to measure trait upcasting effects on your codebase

You need a recent nightly compiler — 2023-06-14 or newer.

If you are using cargo, run cargo clean && RUSTFLAGS="-Zprint-vtable-sizes" cargo check. Note that cargo clean is required — without it caching will silence the output. If you are using rustc directly pass -Zprint-vtable-sizes to it.

The output will contain lines like these:

print-vtable-sizes { "crate_name": "num_traits", "trait_name": "RefNum", "entries": "23", "entries_ignoring_upcasting": "13", "entries_for_upcasting": "10", "upcasting_cost_percent": "76.92307692307693" }
print-vtable-sizes { "crate_name": "num_traits", "trait_name": "NumAssignOps", "entries": "12", "entries_ignoring_upcasting": "8", "entries_for_upcasting": "4", "upcasting_cost_percent": "50" }
print-vtable-sizes { "crate_name": "num_traits", "trait_name": "NumOps", "entries": "12", "entries_ignoring_upcasting": "8", "entries_for_upcasting": "4", "upcasting_cost_percent": "50" }
print-vtable-sizes { "crate_name": "num_traits", "trait_name": "cast::ToPrimitive", "entries": "17", "entries_ignoring_upcasting": "17", "entries_for_upcasting": "0", "upcasting_cost_percent": "0" }
print-vtable-sizes { "crate_name": "num_traits", "trait_name": "ops::inv::Inv", "entries": "4", "entries_ignoring_upcasting": "4", "entries_for_upcasting": "0", "upcasting_cost_percent": "0" }
print-vtable-sizes { "crate_name": "num_traits", "trait_name": "ops::mul_add::MulAdd", "entries": "4", "entries_ignoring_upcasting": "4", "entries_for_upcasting": "0", "upcasting_cost_percent": "0" }
print-vtable-sizes { "crate_name": "num_traits", "trait_name": "ops::mul_add::MulAddAssign", "entries": "4", "entries_ignoring_upcasting": "4", "entries_for_upcasting": "0", "upcasting_cost_percent": "0" }
print-vtable-sizes { "crate_name": "num_traits", "trait_name": "pow::Pow", "entries": "4", "entries_ignoring_upcasting": "4", "entries_for_upcasting": "0", "upcasting_cost_percent": "0" }

print-vtable-sizes followed by a json describing vtable sizes. Json will contain the following fields:

  • crate_name — name of the crate which defines the traits
  • trait_name — path to the trait
  • entries — number of entries in a vtable with the current algorithm (i.e. with upcasting)
  • entries_ignoring_upcasting — number of entries in a vtable, as-if we did not have trait upcasting
  • entries_for_upcasting — number of entries in a vtable needed solely for upcasting (i.e. entries - entries_ignoring_upcasting).
  • upcasting_cost_percent — cost of having upcasting in % relative to the number of entries without upcasting (i.e. entries_for_upcasting / entries_ignoring_upcasting * 100%).

Lines are sorted by upcasting_cost_percent, so that the biggest % change is first.

You can collect and analyze these stats however your want.

Some important notes:

  1. This includes private traits
  2. This includes traits that are not meant to be used as dyn and/or vtables of which are never actually instantiated
  3. For traits with generics/associated types this can produce slightly misleading stats (only making the change bigger than it actually is)
  4. This does not include traits which are not object safe
@nikomatsakis
Copy link
Contributor Author

@WaffleLapkin can you leave a comment with status update?

@nikomatsakis nikomatsakis added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Jun 6, 2023
@WaffleLapkin WaffleLapkin self-assigned this Jun 6, 2023
@WaffleLapkin WaffleLapkin added the F-trait_upcasting `#![feature(trait_upcasting)]` label Jun 6, 2023
@WaffleLapkin
Copy link
Member

I have a WIP branch here, it needs a little bit of cleanup (like gating the stat print under -Z option) but overall works. I planned to finish cleanup today, but couldn't because of {personal reasons I don't think are worth disclosing here}. I'll try to finish the cleanup asap, hopefully tomorrow :)
I'm also not currently to run the experiment myself, because my laptop is too weak and dev desktops don't have any isolation like podman, but hopefully this will be sorted out soon....

@WaffleLapkin
Copy link
Member

I've opened a PR: #112400 (sadly, while cleaning it up I've found a few ICEs I've introduced, which I'm not immediately sure how to fix...)

@WaffleLapkin
Copy link
Member

The implementation PR was merged and with the latest nightly you can try this out (I've added instructions to the issue description).

Note, however, that I'm still not able to run a big experiment on top-N crater, or anything like that.

@saethlin
Copy link
Member

saethlin commented Jul 20, 2023

Hi, Waffle asked if I could help it out so I'm giving this experiment a shot. I maintain a crater-like tool that makes a number of different design decisions (it desperately needs a real name, it's linked above). I'll upload a tarball in a day or so I think with output from the latest version of all published crates per the instructions above.

@Urgau
Copy link
Member

Urgau commented Jul 20, 2023

I just ran the option with some well known crates and created a script to summarises those results.

I may edit this post with more results in the future; also fell free to use my script.

All the summaries have been done with rustc 1.73.0-nightly (399b06823 2023-07-20), which includes #113856 (that PR reduced the cost by ~1/2% for affected traits).


Summary for axum-all-features, axum, bevy, helix, ruffle

Upcasting cost: MEAN ± STDEV (MIN, MAX)

axum-all-features (with dependencies)

Total entries: 3226 (for 708 traits total)
Total entries for upcasting: 55 (~1.70% of all entries)
Affected traits: 19 (~2.68%)

Upcasting cost (for affected traits): ~24.3% ± 18.4 (min: 7.69, max: 76.9)
Upcasting cost (globally): ~0.652% ± 4.91 (min: 0.0, max: 76.9)

axum (with dependencies)

Total entries: 1617 (for 303 traits total)
Total entries for upcasting: 14 (~0.87% of all entries)
Affected traits: 10 (~3.30%)

Upcasting cost (for affected traits): ~16.1% ± 4.51 (min: 8.33, max: 20.0)
Upcasting cost (globally): ~0.531% ± 2.98 (min: 0.0, max: 20.0)

bevy (with dependencies)

Total entries: 7592 (for 1108 traits total)
Total entries for upcasting: 135 (~1.78% of all entries)
Affected traits: 45 (~4.06%)

Upcasting cost (for affected traits): ~21.2% ± 18.3 (min: 1.42, max: 80.0)
Upcasting cost (globally): ~0.863% ± 5.56 (min: 0.0, max: 80.0)

helix (with dependencies)

Total entries: 3706 (for 662 traits total)
Total entries for upcasting: 63 (~1.70% of all entries)
Affected traits: 22 (~3.32%)

Upcasting cost (for affected traits): ~29.5% ± 17.4 (min: 7.14, max: 76.9)
Upcasting cost (globally): ~0.98% ± 6.14 (min: 0.0, max: 76.9)

ruffle (with dependencies)

Total entries: 8291 (for 1234 traits total)
Total entries for upcasting: 214 (~2.58% of all entries)
Affected traits: 73 (~5.92%)

Upcasting cost (for affected traits): ~29.9% ± 18.2 (min: 1.42, max: 80.0)
Upcasting cost (globally): ~1.77% ± 8.31 (min: 0.0, max: 80.0)

External

medium-sized commercial game (with dependencies) #112355 (comment)

Total entries: 5486 (for 824 traits total)
Total entries for upcasting: 88 (~1.60% of all entries)
Affected traits: 32 (~3.88%)

Upcasting cost (for affected traits): ~21.2% ± 19.5 (min: 1.42, max: 80.0)
Upcasting cost (globally): ~0.822% ± 5.57 (min: 0.0, max: 80.0)

@saethlin
Copy link
Member

The file is too big for GitHub, so here's a Google Drive link: https://drive.google.com/file/d/1nURLpnrRODv8Essd268nhhqM_APFwrrX/view

tar xf raw.tar.xz should extract about 120,000 directories with the name of each crate, and in each should be a file with the version of the crate that was built. The total extracted size is about 2.4 GB.

@LPGhatguy
Copy link

Howdy! We're working on a medium-sized commercial game in Rust. I ran the test on our codebase using rustc 1.71.0 (8ede3aae2 2023-07-12) and sorted the results by upcasting_cost_percent:

jq -s -c 'sort_by(.upcasting_cost_percent|tonumber) | reverse[]' output.json > sorted.json

Here's our results: https://gist.github.com/LPGhatguy/1a9624f0a30711f301460a777696bd5f

The biggest concern I would have is the impact on wgpu_core with ~80% upcasting cost. I haven't been following this feature or issue, so I'm not sure what impact that has on performance.

@nbdd0121
Copy link
Contributor

I am not sure that this really captures the major cost of trait upcasting?

For example, if I have

trait A {
    fn foo(&self);
}

trait B {
    fn bar(&self);
}

trait C: A + B {}

and just uses dyn C, then without upcasting, vtable of C is 5 entries, and there's no need for vtable of A and B. With upcasting, the cost of C - 6 entries, and then vtable of B - 3 entries, which is 9 entries in total. The cost here is 80%, not 20% as the flag produces. The flag is underestimating the cost here.

It probably makes more sense to add a feature flag that disable trait upcasting computation at all and then just compare the number of vtables generated in the object files?

@tmandry
Copy link
Member

tmandry commented Aug 8, 2023

It probably makes more sense to add a feature flag that disable trait upcasting computation at all and then just compare the number of vtables generated in the object files?

I completely agree – the best way to measure would be to disable the feature and measure the end-to-end cost in the binary.

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Aug 27, 2023
Add an (perma-)unstable option to disable vtable vptr

This flag is intended for evaluation of trait upcasting space cost for embedded use cases.

Compared to the approach in rust-lang#112355, this option provides a way to evaluate end-to-end cost of trait upcasting. Rationale: rust-lang#112355 (comment)

## How this flag should be used (after merge)

Build your project with and without `-Zno-trait-vptr` flag. If you are using cargo, set `RUSTFLAGS="-Zno-trait-vptr"` in the environment variable. You probably also want to use `-Zbuild-std` or the binary built may be broken. Save both binaries somewhere.

### Evaluate the space cost

The option has a direct and indirect impact on vtable space usage. Directly, it gets rid of the trait vptr entry needed to store a pointer to a vtable of a supertrait. (IMO) this is a small saving usually. The larger saving usually comes with the indirect saving by eliminating the vtable of the supertrait (and its parent).

Both impacts only affects vtables (notably the number of functions monomorphized should , however where vtable reside can depend on your relocation model. If the relocation model is static, then vtable is rodata (usually stored in Flash/ROM together with text in embedded scenario). If the binary is relocatable, however, the vtable will live in `.data` (more specifically, `.data.rel.ro`), and this will need to reside in RAM (which may be a more scarce resource in some cases), together with dynamic relocation info living in readonly segment.

For evaluation, you should run `size` on both binaries, with and without the flag. `size` would output three columns, `text`, `data`, `bss` and the sum `dec` (and it's hex version). As explained above, both `text` and `data` may change. `bss` shouldn't usually change. It'll be useful to see:
* Percentage change in text + data (indicating required flash/ROM size)
* Percentage change in data + bss (indicating required RAM size)
@nbdd0121
Copy link
Contributor

nbdd0121 commented Aug 31, 2023

#114974 is merged, and now we have a way to measure end-to-end cost. There's a how-to-use instruction in the PR, and I'll reproduce it here:

How to use -Zno-trait-vptr flag

(You need the latest nightly)

Build your project with and without -Zno-trait-vptr flag. If you are using cargo, set RUSTFLAGS="-Zno-trait-vptr" in the environment variable.

NOTE: A binary produced may be broken if some crates are built with -Zno-trait-vptr and some aren't. If you want to produce a working binary, you probably also want to use -Zbuild-std. If the cost uses dyn trait upcasting with this flag enabled, it may be UB. This flag is for evaluation only and please do not use this flag to produce production binary.

Evaluate the space cost

The option has a direct and indirect impact on vtable space usage. Directly, it gets rid of the trait vptr entry needed to store a pointer to a vtable of a supertrait. (IMO) this is a small saving usually. The larger saving usually comes with the indirect saving by eliminating the vtable of the supertrait (and its parent).

Both impacts only affects vtables (notably the number of functions monomorphized should not change, however where vtable reside can depend on your relocation model. If the relocation model is static, then vtable is rodata (usually stored in Flash/ROM together with text in embedded scenario). If the binary is relocatable, however, the vtable will live in .data (more specifically, .data.rel.ro), and this will need to reside in RAM (which may be a more scarce resource in embedded cases), together with dynamic relocation info living in readonly segment.

For evaluation, you should run size on both binaries, with and without the flag. size would output three columns, text, data, bss and the sum dec (and it's hex version). As explained above, both text and data may change. bss shouldn't usually change. It'll be useful to see:

  • Percentage change in text and data alone;
  • Percentage change in text + data (indicating required flash/ROM size in embedded, or readonly and shareable pages in hosted environment)
  • Percentage change in data + bss (indicating required RAM size in embedded, or process-private pages in hosted environment)

@nbdd0121
Copy link
Contributor

Here's a few projects that I evaluated:

https://github.com/tock/tock (embedded):

  text    data     bss     dec     hex filename
160604      60   19888  180552   2c148 upcast.elf
  text    data     bss     dec     hex filename
160380      60   19888  180328   2c068 noupcast.elf

224 bytes increase in .text, 0.14% overhead. No .data overhead.

I evaluated a few smaller firmware/embedded projects, and I don't see size difference at all, since there are no dyn usage or dyn usage does not involve traits with multiple supertraits.

A personal WIP driver project:

   text	data     bss     dec     hex filename
 814169   12804   	0  826973   c9e5d upcast
   text	data     bss     dec     hex filename
 813233   12804   	0  826037   c9ab5 noupcast

936 bytes increase in .text, 0.12% overhead. 0.11% overall overhead.

https://github.com/nbdd0121/r2vm (hosted, size doesn't really matter, investigate out of curiosity):

   text    data     bss     dec     hex filename
3445057  238856    1584 3685497  383c79 upcast
   text    data     bss     dec     hex filename
3440257  236456    1584 3678297  382059 noupcast

4800 bytes increase in .text, 0.14% overhead. .data 2400 bytes, 1.01% overhead. 0.2% overall overhead for text + data combined.


With the data available, I think my conclusion is that the size overhead is <0.2% in all cases where I think size really matters a lot, so IMO the overhead is acceptable (when weighting against the language/compiler complexity to make the upcasting feature opt-in rather than always enabled).

My only concern now is about libcore/liballoc accidentally introduce upcastable traits and force the overhead to everyone, but that concern was already mitigated by the multiple_supertrait_upcastable lint that's enabled in libcore/liballoc.

I would encourage other people interested in binary size to give it a shot with the new flag and re-run the evaluation using the end-to-end approach and report numbers. @Urgau @LPGhatguy who posted some evaluation here and @oxalica who raised the initial vtable size concern.

@nikomatsakis
Copy link
Contributor Author

In rust-lang triage meeting today, we decided to close this issue because we felt we had gathered the data we needed. Thanks all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-trait_upcasting `#![feature(trait_upcasting)]` T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants