DCPerf (Feedsim): Add graph storage/loading optimization and eliminate per-thread graph building #201
Closed
marziehlenjaniMeta wants to merge 1 commit into facebookresearch:v2-beta from marziehlenjaniMeta:export-D80288337-to-v2-beta
Contributor
This pull request was exported from Phabricator. Differential Revision: D80288337
9168660 to bdd15a2
marziehlenjaniMeta added a commit to marziehlenjaniMeta/DCPerf that referenced this pull request Aug 25, 2025
…te per-thread graph building (facebookresearch#201)

Summary:
This diff enhances the DCPerf Feedsim benchmark by adding graph storage and loading optimization capabilities. It also eliminates redundant graph building across multiple thread runs, improving performance and scalability.

**Key changes:**

1. **Shell script enhancements** (`run-feedsim-multi.sh`, `run.sh`):
   * Added `-S` flag to store generated graphs to a file for reuse across instances
   * Added `-L` flag to load pre-generated graphs from a file instead of rebuilding per thread
   * Enhanced help documentation to explain the new optimization options
   * Updated command line parsing to handle the new flags and pass them through to the underlying executables
2. **Command line options** (`LeafNodeRankCmdline.ggo`):
   * Added `store_graph` option to enable saving generated graphs to a specified file
   * Added `load_graph` option to enable loading graphs from a specified file instead of generating new ones

**Performance optimizations:**

* **Eliminates per-thread graph building overhead**: Instead of each parallel instance building its own graph, one instance can build and store the graph while others load the pre-built version
* Reduces benchmark initialization time
* Enables consistent testing across runs by using identical graph structures
* Improves reproducibility of benchmark results
* Optimizes memory and CPU usage by avoiding redundant graph generation across parallel threads

Rollback Plan:

Differential Revision: D80288337
marziehlenjaniMeta added a commit to marziehlenjaniMeta/DCPerf that referenced this pull request Aug 28, 2025
bdd15a2 to 4f872f3
…ng optimization and eliminate per-thread graph building (facebookresearch#201)

Summary:
This diff enhances the DCPerf Feedsim benchmark by adding graph storage and loading optimization capabilities, eliminating redundant graph building across multiple thread runs, and replacing fixed sleep time with checking for server readiness.

**Key changes:**

1. **Shell script enhancements** (`run-feedsim-multi.sh`, `run.sh`):
   * Added `-S` flag to store generated graphs to a file for reuse across instances
   * Added `-L` flag to load pre-generated graphs from a file instead of rebuilding per thread
   * Added `-I` flag to enable instrumenting graph generation
   * Enhanced help documentation to explain the new optimization options
   * Updated command line parsing to handle the new flags and pass them through to the underlying executables
2. **Command line options** (`LeafNodeRankCmdline.ggo`):
   * Added `store_graph` option to enable saving generated graphs to a specified file
   * Added `load_graph` option to enable loading graphs from a specified file instead of generating new ones
   * Added `instrument_graph` option to enable measuring the time for graph generation
3. **Performance optimizations:**
   * **Eliminates per-thread graph building overhead**: Instead of each parallel instance building its own graph, one instance can build and store the graph while others load the pre-built version. This also optimizes memory and CPU usage by avoiding redundant graph generation across parallel threads
   * Reduces benchmark initialization time by replacing the fixed sleep time with checking for server readiness

Differential Revision: D80288337
4f872f3 to f74fbc8
facebook-github-bot pushed a commit that referenced this pull request Aug 30, 2025
…ng optimization and eliminate per-thread graph building (#201)

Summary:
Pull Request resolved: #201

This diff enhances the DCPerf Feedsim benchmark by adding graph storage and loading optimization capabilities, eliminating redundant graph building across multiple thread runs, and replacing fixed sleep time with checking for server readiness.

**Key changes:**

1. **Shell script enhancements** (`run-feedsim-multi.sh`, `run.sh`):
   * Added `-S` flag to store generated graphs to a file for reuse across instances
   * Added `-L` flag to load pre-generated graphs from a file instead of rebuilding per thread
   * Added `-I` flag to enable instrumenting graph generation
   * Enhanced help documentation to explain the new optimization options
   * Updated command line parsing to handle the new flags and pass them through to the underlying executables
2. **Command line options** (`LeafNodeRankCmdline.ggo`):
   * Added `store_graph` option to enable saving generated graphs to a specified file
   * Added `load_graph` option to enable loading graphs from a specified file instead of generating new ones
   * Added `instrument_graph` option to enable measuring the time for graph generation
3. **Performance optimizations:**
   * **Eliminates per-thread graph building overhead**: Instead of each parallel instance building its own graph, one instance can build and store the graph while others load the pre-built version. This also optimizes memory and CPU usage by avoiding redundant graph generation across parallel threads
   * Reduces benchmark initialization time by replacing the fixed sleep time with checking for server readiness

Reviewed By: excelle08

Differential Revision: D80288337

fbshipit-source-id: 9b1fc935d3c3106e44dd8ef3238b78f953e1e58a
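The commit message mentions replacing a fixed sleep with a server readiness check but does not show how readiness is detected. As a rough illustration of the general idea only, the sketch below polls a probe command until it succeeds or a timeout expires. The `wait_for_ready` helper, the marker file, and the timeout values are all hypothetical and are not taken from `run.sh` or `run-feedsim-multi.sh`.

```shell
#!/bin/sh
# Sketch: poll a readiness probe instead of sleeping for a fixed time.
# wait_for_ready and the marker-file probe are hypothetical stand-ins,
# not DCPerf's actual readiness logic.
set -eu

# wait_for_ready TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, or 1 once about
# TIMEOUT_SECONDS one-second retries have failed.
wait_for_ready() {
  timeout_s="$1"
  shift
  waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

READY_DIR=$(mktemp -d)
READY_MARKER="$READY_DIR/server.ready"

# Stand-in "server": signals readiness after roughly one second.
( sleep 1; : > "$READY_MARKER" ) &

if wait_for_ready 10 test -f "$READY_MARKER"; then
  READY_STATUS="ready"
else
  READY_STATUS="timeout"
fi
echo "server status: $READY_STATUS"
wait
```

Compared with a fixed sleep, a loop like this returns as soon as the probe passes and fails explicitly when the server never comes up, which is where the reduced initialization time would come from. In a real setup the probe would more likely be a port check or a health request than a marker file.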
Labels
CLA Signed
This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
fb-exported
Summary:
This diff enhances the DCPerf Feedsim benchmark by adding graph storage and loading optimization capabilities. It also eliminates redundant graph building across multiple thread runs, improving performance and scalability.
Key changes:
1. Shell script enhancements (`run-feedsim-multi.sh`, `run.sh`):
   * Added `-S` flag to store generated graphs to a file for reuse across instances
   * Added `-L` flag to load pre-generated graphs from a file instead of rebuilding per thread
2. Command line options (`LeafNodeRankCmdline.ggo`):
   * Added `store_graph` option to enable saving generated graphs to a specified file
   * Added `load_graph` option to enable loading graphs from a specified file instead of generating new ones

Performance optimizations:
This optimization is particularly impactful for ranking workloads in multi-threaded scenarios where graph generation was previously a significant bottleneck, with each thread duplicating expensive graph building work.
Rollback Plan:
Differential Revision: D80288337
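
To make the store-once, load-many flow described in the summary concrete, here is a minimal shell sketch of the pattern. It does not call the real Feedsim scripts: `build_graph`, `run_instance`, the file names, and the three-instance loop are hypothetical stand-ins, and the comments only mark which step is analogous to running with `-S` or `-L`.

```shell
#!/bin/sh
# Sketch: build an expensive artifact once, then let every parallel
# instance load it instead of rebuilding. All function and file names
# are hypothetical stand-ins, not DCPerf's actual code.
set -eu

WORKDIR=$(mktemp -d)
BUILD_LOG="$WORKDIR/builds.log"
GRAPH_FILE="$WORKDIR/graph.bin"

# Stand-in for graph generation; logs each build so builds can be counted.
build_graph() {
  graph_out="$1"
  echo "build" >> "$BUILD_LOG"
  printf 'node-a\nnode-b\nnode-c\n' > "${graph_out}.tmp"
  # Rename into place so a reader never sees a partially written file.
  mv "${graph_out}.tmp" "$graph_out"
}

# Stand-in for one benchmark instance that loads the stored graph.
run_instance() {
  instance_id="$1"
  graph_in="$2"
  nodes=$(wc -l < "$graph_in")
  echo "instance ${instance_id} loaded $((nodes)) nodes"
}

# One build up front (analogous to a single run that stores the graph, -S).
build_graph "$GRAPH_FILE"

# Parallel instances reuse the stored file (analogous to loading with -L).
for i in 1 2 3; do
  run_instance "$i" "$GRAPH_FILE" > "$WORKDIR/instance_$i.out" &
done
wait

BUILD_COUNT=$(wc -l < "$BUILD_LOG")
echo "graph builds: $((BUILD_COUNT))"
```

The point of the sketch is that the number of builds stays at one no matter how many instances run, and every instance sees the same graph, which is what makes runs comparable.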