Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for rendering a summary of objects held by the graph #14638

Merged
merged 4 commits into from
Feb 27, 2022

Conversation

stuhood
Copy link
Member

@stuhood stuhood commented Feb 26, 2022

As described in #12662, some use cases have surprising memory usage. To enable tracking those cases down, this change adds a --memory-summary option which summarizes the deep sizes of live objects in the Graph.

On the Python side, the deep size is calculated using a very basic deduping walk of gc.get_referents (after having investigated pympler, guppy3, objsize). On the Rust side, the deepsize crate is used, with sizes derived for all types reachable from NodeKey and NodeOutput.

Example output:

Memory summary:
64		1		pants.backend.docker.subsystems.dockerfile_parser.DockerfileParser
64		1		pants.backend.docker.subsystems.dockerfile_parser.ParserSetup
64		1		pants.backend.java.dependency_inference.java_parser_launcher.JavaParserCompiledClassfiles
64		1		pants.backend.java.dependency_inference.symbol_mapper.FirstPartyJavaTargetsMappingRequest
<snip>
1588620		957		(native) pants.engine.internals.graph.hydrate_sources
2317920		2195		(native) pants.backend.python.dependency_inference.module_mapper.map_module_to_address
2774760		2434		(native) pants.engine.internals.graph.determine_explicitly_provided_dependencies
4446900		1458		(native) pants.engine.internals.graph.resolve_dependencies

@stuhood stuhood mentioned this pull request Feb 26, 2022
@stuhood stuhood force-pushed the stuhood/memory-summary branch 3 times, most recently from 4eeeb65 to bf4dc1a Compare February 27, 2022 00:35
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]
[ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@stuhood stuhood merged commit 7acf1cb into pantsbuild:main Feb 27, 2022
@stuhood stuhood deleted the stuhood/memory-summary branch February 27, 2022 03:43
stuhood added a commit to stuhood/pants that referenced this pull request Feb 27, 2022
…tsbuild#14638)

As described in pantsbuild#12662, some use cases have surprising memory usage. To enable tracking those cases down, this change adds a `--memory-summary` option which summarizes the deep sizes of live objects in the `Graph`.

On the Python side, the deep size is calculated using a very basic deduping walk of `gc.get_referents` (after having investigated [pympler](https://pypi.org/project/Pympler/), [guppy3](https://pypi.org/project/guppy3/), [objsize](https://pypi.org/project/objsize/)). On the Rust side, the `deepsize` crate is used, with sizes derived for all types reachable from `NodeKey` and `NodeOutput`.

Example output:
```
Memory summary:
64		1		pants.backend.docker.subsystems.dockerfile_parser.DockerfileParser
64		1		pants.backend.docker.subsystems.dockerfile_parser.ParserSetup
64		1		pants.backend.java.dependency_inference.java_parser_launcher.JavaParserCompiledClassfiles
64		1		pants.backend.java.dependency_inference.symbol_mapper.FirstPartyJavaTargetsMappingRequest
<snip>
1588620		957		(native) pants.engine.internals.graph.hydrate_sources
2317920		2195		(native) pants.backend.python.dependency_inference.module_mapper.map_module_to_address
2774760		2434		(native) pants.engine.internals.graph.determine_explicitly_provided_dependencies
4446900		1458		(native) pants.engine.internals.graph.resolve_dependencies
```

[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this pull request Feb 27, 2022
…tsbuild#14638)

As described in pantsbuild#12662, some use cases have surprising memory usage. To enable tracking those cases down, this change adds a `--memory-summary` option which summarizes the deep sizes of live objects in the `Graph`.

On the Python side, the deep size is calculated using a very basic deduping walk of `gc.get_referents` (after having investigated [pympler](https://pypi.org/project/Pympler/), [guppy3](https://pypi.org/project/guppy3/), [objsize](https://pypi.org/project/objsize/)). On the Rust side, the `deepsize` crate is used, with sizes derived for all types reachable from `NodeKey` and `NodeOutput`.

Example output:
```
Memory summary:
64		1		pants.backend.docker.subsystems.dockerfile_parser.DockerfileParser
64		1		pants.backend.docker.subsystems.dockerfile_parser.ParserSetup
64		1		pants.backend.java.dependency_inference.java_parser_launcher.JavaParserCompiledClassfiles
64		1		pants.backend.java.dependency_inference.symbol_mapper.FirstPartyJavaTargetsMappingRequest
<snip>
1588620		957		(native) pants.engine.internals.graph.hydrate_sources
2317920		2195		(native) pants.backend.python.dependency_inference.module_mapper.map_module_to_address
2774760		2434		(native) pants.engine.internals.graph.determine_explicitly_provided_dependencies
4446900		1458		(native) pants.engine.internals.graph.resolve_dependencies
```

[ci skip-build-wheels]
stuhood added a commit that referenced this pull request Feb 27, 2022
…rrypick of #14638) (#14644)

As described in #12662, some use cases have surprising memory usage. To enable tracking those cases down, this change adds a `--memory-summary` option which summarizes the deep sizes of live objects in the `Graph`.

On the Python side, the deep size is calculated using a very basic deduping walk of `gc.get_referents` (after having investigated [pympler](https://pypi.org/project/Pympler/), [guppy3](https://pypi.org/project/guppy3/), [objsize](https://pypi.org/project/objsize/)). On the Rust side, the `deepsize` crate is used, with sizes derived for all types reachable from `NodeKey` and `NodeOutput`.

Example output:
```
Memory summary:
64		1		pants.backend.docker.subsystems.dockerfile_parser.DockerfileParser
64		1		pants.backend.docker.subsystems.dockerfile_parser.ParserSetup
64		1		pants.backend.java.dependency_inference.java_parser_launcher.JavaParserCompiledClassfiles
64		1		pants.backend.java.dependency_inference.symbol_mapper.FirstPartyJavaTargetsMappingRequest
<snip>
1588620		957		(native) pants.engine.internals.graph.hydrate_sources
2317920		2195		(native) pants.backend.python.dependency_inference.module_mapper.map_module_to_address
2774760		2434		(native) pants.engine.internals.graph.determine_explicitly_provided_dependencies
4446900		1458		(native) pants.engine.internals.graph.resolve_dependencies
```

[ci skip-build-wheels]
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat. Lmk if you'd like me to do these followup tickets :)

(size, count, f"(native) {name}") for name, (count, size) in rust_sizes.items()
)

print("Memory summary:", file=sys.stderr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to include what each column's title is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed... either here or in the help.

Comment on lines +1455 to +1460
memory_summary = BoolOption(
"--memory-summary",
default=False,
help=("Report a summary of memory usage at the end of each run."),
advanced=True,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm this naturally fits a lot better in [stats]. It looks like we don't pass this option through to the engine, so perhaps it can in fact move?

Should this be marked daemon=True also?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, fine with moving.

Shouldn't need daemon=True: the value is reported per-run, and doesn't require adjustment to the scheduler.

Comment on lines +163 to +166
std::mem::size_of_val(&k)
+ k.deep_size_of_children(&mut deep_context)
+ std::mem::size_of_val(&v)
+ v.deep_size_of_children(&mut deep_context)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts if we should be worried about overflow? Maybe use checked_add, and if it fails then log at ERROR level and ask people to write a bug report?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need to worry about it: overflowing usize would require 18 exabytes of memory.

stuhood added a commit that referenced this pull request Mar 3, 2022
The `--stats-memory-summary` added in #14638/#14652 was [reporting surprisingly large sizes](#12662 (comment))  for native `NodeKey` structs -- even when excluding the actual Python values that they held. Investigation showed that both the `Task` and `Entry` structs were contributing significantly to the size of the `Task` struct.

The [`internment` crate](https://crates.io/crates/internment) used here (and in #14654) is an alternative to giving these values integer IDs. They become pointers to a unique, shared (technically: leaked) copy of the value. They are consequently 1) much smaller, 2) much faster to compare.

The `top`-reported memory usage of `./pants dependencies --transitive ::`:
* `313M` before (summary [before.txt](https://github.com/pantsbuild/pants/files/8175461/before.txt))
* `220M` after (summary [after.txt](https://github.com/pantsbuild/pants/files/8175462/after.txt))

[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this pull request Mar 3, 2022
…uild#14683)

The `--stats-memory-summary` added in pantsbuild#14638/pantsbuild#14652 was [reporting surprisingly large sizes](pantsbuild#12662 (comment))  for native `NodeKey` structs -- even when excluding the actual Python values that they held. Investigation showed that both the `Task` and `Entry` structs were contributing significantly to the size of the `Task` struct.

The [`internment` crate](https://crates.io/crates/internment) used here (and in pantsbuild#14654) is an alternative to giving these values integer IDs. They become pointers to a unique, shared (technically: leaked) copy of the value. They are consequently 1) much smaller, 2) much faster to compare.

The `top`-reported memory usage of `./pants dependencies --transitive ::`:
* `313M` before (summary [before.txt](https://github.com/pantsbuild/pants/files/8175461/before.txt))
* `220M` after (summary [after.txt](https://github.com/pantsbuild/pants/files/8175462/after.txt))

[ci skip-build-wheels]
stuhood added a commit that referenced this pull request Mar 3, 2022
…pick of #14683) (#14689)

The `--stats-memory-summary` added in #14638/#14652 was [reporting surprisingly large sizes](#12662 (comment))  for native `NodeKey` structs -- even when excluding the actual Python values that they held. Investigation showed that both the `Task` and `Entry` structs were contributing significantly to the size of the `Task` struct.

The [`internment` crate](https://crates.io/crates/internment) used here (and in #14654) is an alternative to giving these values integer IDs. They become pointers to a unique, shared (technically: leaked) copy of the value. They are consequently 1) much smaller, 2) much faster to compare.

The `top`-reported memory usage of `./pants dependencies --transitive ::`:
* `313M` before (summary [before.txt](https://github.com/pantsbuild/pants/files/8175461/before.txt))
* `220M` after (summary [after.txt](https://github.com/pantsbuild/pants/files/8175462/after.txt))

[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants