-
-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for rendering a summary of objects held by the graph #14638
Conversation
d7c4abe
to
6bcf942
Compare
4eeeb65
to
bf4dc1a
Compare
[ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels] # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust]
[ci skip-build-wheels]
bf4dc1a
to
f5c7655
Compare
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
…tsbuild#14638) As described in pantsbuild#12662, some use cases have surprising memory usage. To enable tracking those cases down, this change adds a `--memory-summary` option which summarizes the deep sizes of live objects in the `Graph`. On the Python side, the deep size is calculated using a very basic deduping walk of `gc.get_referents` (after having investigated [pympler](https://pypi.org/project/Pympler/), [guppy3](https://pypi.org/project/guppy3/), [objsize](https://pypi.org/project/objsize/)). On the Rust side, the `deepsize` crate is used, with sizes derived for all types reachable from `NodeKey` and `NodeOutput`. Example output: ``` Memory summary: 64 1 pants.backend.docker.subsystems.dockerfile_parser.DockerfileParser 64 1 pants.backend.docker.subsystems.dockerfile_parser.ParserSetup 64 1 pants.backend.java.dependency_inference.java_parser_launcher.JavaParserCompiledClassfiles 64 1 pants.backend.java.dependency_inference.symbol_mapper.FirstPartyJavaTargetsMappingRequest <snip> 1588620 957 (native) pants.engine.internals.graph.hydrate_sources 2317920 2195 (native) pants.backend.python.dependency_inference.module_mapper.map_module_to_address 2774760 2434 (native) pants.engine.internals.graph.determine_explicitly_provided_dependencies 4446900 1458 (native) pants.engine.internals.graph.resolve_dependencies ``` [ci skip-build-wheels]
…tsbuild#14638) As described in pantsbuild#12662, some use cases have surprising memory usage. To enable tracking those cases down, this change adds a `--memory-summary` option which summarizes the deep sizes of live objects in the `Graph`. On the Python side, the deep size is calculated using a very basic deduping walk of `gc.get_referents` (after having investigated [pympler](https://pypi.org/project/Pympler/), [guppy3](https://pypi.org/project/guppy3/), [objsize](https://pypi.org/project/objsize/)). On the Rust side, the `deepsize` crate is used, with sizes derived for all types reachable from `NodeKey` and `NodeOutput`. Example output: ``` Memory summary: 64 1 pants.backend.docker.subsystems.dockerfile_parser.DockerfileParser 64 1 pants.backend.docker.subsystems.dockerfile_parser.ParserSetup 64 1 pants.backend.java.dependency_inference.java_parser_launcher.JavaParserCompiledClassfiles 64 1 pants.backend.java.dependency_inference.symbol_mapper.FirstPartyJavaTargetsMappingRequest <snip> 1588620 957 (native) pants.engine.internals.graph.hydrate_sources 2317920 2195 (native) pants.backend.python.dependency_inference.module_mapper.map_module_to_address 2774760 2434 (native) pants.engine.internals.graph.determine_explicitly_provided_dependencies 4446900 1458 (native) pants.engine.internals.graph.resolve_dependencies ``` [ci skip-build-wheels]
…rrypick of #14638) (#14644) As described in #12662, some use cases have surprising memory usage. To enable tracking those cases down, this change adds a `--memory-summary` option which summarizes the deep sizes of live objects in the `Graph`. On the Python side, the deep size is calculated using a very basic deduping walk of `gc.get_referents` (after having investigated [pympler](https://pypi.org/project/Pympler/), [guppy3](https://pypi.org/project/guppy3/), [objsize](https://pypi.org/project/objsize/)). On the Rust side, the `deepsize` crate is used, with sizes derived for all types reachable from `NodeKey` and `NodeOutput`. Example output: ``` Memory summary: 64 1 pants.backend.docker.subsystems.dockerfile_parser.DockerfileParser 64 1 pants.backend.docker.subsystems.dockerfile_parser.ParserSetup 64 1 pants.backend.java.dependency_inference.java_parser_launcher.JavaParserCompiledClassfiles 64 1 pants.backend.java.dependency_inference.symbol_mapper.FirstPartyJavaTargetsMappingRequest <snip> 1588620 957 (native) pants.engine.internals.graph.hydrate_sources 2317920 2195 (native) pants.backend.python.dependency_inference.module_mapper.map_module_to_address 2774760 2434 (native) pants.engine.internals.graph.determine_explicitly_provided_dependencies 4446900 1458 (native) pants.engine.internals.graph.resolve_dependencies ``` [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat. Lmk if you'd like me to do these followup tickets :)
(size, count, f"(native) {name}") for name, (count, size) in rust_sizes.items() | ||
) | ||
|
||
print("Memory summary:", file=sys.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to include what each column's title is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed... either here or in the help.
memory_summary = BoolOption( | ||
"--memory-summary", | ||
default=False, | ||
help=("Report a summary of memory usage at the end of each run."), | ||
advanced=True, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm this naturally fits a lot better in [stats]
. It looks like we don't pass this option through to the engine, so perhaps it can in fact move?
Should this be marked daemon=True
also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, fine with moving.
Shouldn't need daemon=True
: the value is reported per-run, and doesn't require adjustment to the scheduler.
std::mem::size_of_val(&k) | ||
+ k.deep_size_of_children(&mut deep_context) | ||
+ std::mem::size_of_val(&v) | ||
+ v.deep_size_of_children(&mut deep_context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts if we should be worried about overflow? Maybe use checked_add
, and if it fails then log at ERROR level and ask people to write a bug report?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need to worry about it: overflowing usize
would require 18 exabytes of memory.
The `--stats-memory-summary` added in #14638/#14652 was [reporting surprisingly large sizes](#12662 (comment)) for native `NodeKey` structs -- even when excluding the actual Python values that they held. Investigation showed that both the `Task` and `Entry` structs were contributing significantly to the size of the `Task` struct. The [`internment` crate](https://crates.io/crates/internment) used here (and in #14654) is an alternative to giving these values integer IDs. They become pointers to a unique, shared (technically: leaked) copy of the value. They are consequently 1) much smaller, 2) much faster to compare. The `top`-reported memory usage of `./pants dependencies --transitive ::`: * `313M` before (summary [before.txt](https://github.com/pantsbuild/pants/files/8175461/before.txt)) * `220M` after (summary [after.txt](https://github.com/pantsbuild/pants/files/8175462/after.txt)) [ci skip-build-wheels]
…uild#14683) The `--stats-memory-summary` added in pantsbuild#14638/pantsbuild#14652 was [reporting surprisingly large sizes](pantsbuild#12662 (comment)) for native `NodeKey` structs -- even when excluding the actual Python values that they held. Investigation showed that both the `Task` and `Entry` structs were contributing significantly to the size of the `Task` struct. The [`internment` crate](https://crates.io/crates/internment) used here (and in pantsbuild#14654) is an alternative to giving these values integer IDs. They become pointers to a unique, shared (technically: leaked) copy of the value. They are consequently 1) much smaller, 2) much faster to compare. The `top`-reported memory usage of `./pants dependencies --transitive ::`: * `313M` before (summary [before.txt](https://github.com/pantsbuild/pants/files/8175461/before.txt)) * `220M` after (summary [after.txt](https://github.com/pantsbuild/pants/files/8175462/after.txt)) [ci skip-build-wheels]
…pick of #14683) (#14689) The `--stats-memory-summary` added in #14638/#14652 was [reporting surprisingly large sizes](#12662 (comment)) for native `NodeKey` structs -- even when excluding the actual Python values that they held. Investigation showed that both the `Task` and `Entry` structs were contributing significantly to the size of the `Task` struct. The [`internment` crate](https://crates.io/crates/internment) used here (and in #14654) is an alternative to giving these values integer IDs. They become pointers to a unique, shared (technically: leaked) copy of the value. They are consequently 1) much smaller, 2) much faster to compare. The `top`-reported memory usage of `./pants dependencies --transitive ::`: * `313M` before (summary [before.txt](https://github.com/pantsbuild/pants/files/8175461/before.txt)) * `220M` after (summary [after.txt](https://github.com/pantsbuild/pants/files/8175462/after.txt)) [ci skip-build-wheels]
As described in #12662, some use cases have surprising memory usage. To enable tracking those cases down, this change adds a
--memory-summary
option which summarizes the deep sizes of live objects in theGraph
.On the Python side, the deep size is calculated using a very basic deduping walk of
gc.get_referents
(after having investigated pympler, guppy3, objsize). On the Rust side, thedeepsize
crate is used, with sizes derived for all types reachable fromNodeKey
andNodeOutput
.Example output: