@marziehlenjaniMeta

Summary:
This diff enhances the DCPerf Feedsim benchmark by adding graph storage and loading optimization capabilities. It also eliminates redundant graph building across multiple thread runs, improving performance and scalability.

Key changes:

  1. Shell script enhancements (run-feedsim-multi.sh, run.sh):

    • Added -S flag to store generated graphs to a file for reuse across instances
    • Added -L flag to load pre-generated graphs from a file instead of rebuilding per thread
    • Enhanced help documentation to explain the new optimization options
    • Updated command line parsing to handle the new flags and pass them through to the underlying executables
  2. Command line options (LeafNodeRankCmdline.ggo):

    • Added store_graph option to enable saving generated graphs to a specified file
    • Added load_graph option to enable loading graphs from a specified file instead of generating new ones
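
As a rough illustration of the flag pass-through described above, a wrapper could translate the short flags into long options for the executable. This is a minimal sketch, not the code from this diff: the function name `build_graph_args` is hypothetical, and the long-option spellings `--store_graph=` and `--load_graph=` are assumed from the `.ggo` option names.

```shell
# Hypothetical sketch: map the wrapper's -S/-L flags to long options.
# The --store_graph/--load_graph spellings are assumed from the .ggo names.
build_graph_args() {
    local opt store_file load_file
    store_file=""
    load_file=""
    OPTIND=1  # reset so the function can be called more than once
    while getopts "S:L:" opt; do
        case "$opt" in
            S) store_file="$OPTARG" ;;
            L) load_file="$OPTARG" ;;
            *) echo "usage: [-S store_file] [-L load_file]" >&2; return 1 ;;
        esac
    done
    # Emit one long option per line so paths containing spaces stay intact.
    if [ -n "$store_file" ]; then
        printf '%s\n' "--store_graph=$store_file"
    fi
    if [ -n "$load_file" ]; then
        printf '%s\n' "--load_graph=$load_file"
    fi
    return 0
}
```

For example, `build_graph_args -S /tmp/graph.bin` prints `--store_graph=/tmp/graph.bin`, which the wrapper could then append to the executable's argument list.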

Performance optimizations:

  • Eliminates per-thread graph building overhead: Instead of each parallel instance building its own graph, one instance can build and store the graph while others load the pre-built version
  • Reduces benchmark initialization time for multi-instance runs, since the graph is built once rather than once per instance
  • Enables consistent testing across runs by using identical graph structures
  • Improves reproducibility of benchmark results
  • Optimizes memory and CPU usage by avoiding redundant graph generation across parallel threads

This optimization is particularly impactful for ranking workloads in multi-threaded scenarios where graph generation was previously a significant bottleneck, with each thread duplicating expensive graph building work.
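
The build-once, load-many pattern can be sketched as a dry run that only prints the command each instance would receive. The helper names and the graph file path are hypothetical, and the assumption that `-S` and `-L` each take a file path argument is inferred from the summary rather than confirmed by it.

```shell
# Hypothetical helper: instance 0 builds and stores the graph,
# every other instance loads the stored copy.
# Assumes -S and -L each take a file path argument.
graph_flag_for_instance() {
    local idx="$1" graph_file="$2"
    if [ "$idx" -eq 0 ]; then
        printf '%s\n' "-S $graph_file"
    else
        printf '%s\n' "-L $graph_file"
    fi
}

# Dry run: print the command line for each instance without executing it.
plan_instances() {
    local count="$1" graph_file="$2" i=0
    while [ "$i" -lt "$count" ]; do
        echo "./run.sh $(graph_flag_for_instance "$i" "$graph_file")"
        i=$((i + 1))
    done
}
```

Calling `plan_instances 3 /tmp/graph.bin` prints one `-S` line followed by two `-L` lines. In a real run the storing instance would presumably need to finish writing the file before any loading instance starts.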

Rollback Plan:

Differential Revision: D80288337

@meta-cla meta-cla bot added the CLA Signed label Aug 14, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80288337

@marziehlenjaniMeta marziehlenjaniMeta force-pushed the export-D80288337-to-v2-beta branch from 9168660 to bdd15a2 Compare August 25, 2025 20:19
marziehlenjaniMeta added a commit to marziehlenjaniMeta/DCPerf that referenced this pull request Aug 25, 2025
…te per-thread graph building (facebookresearch#201)

Summary:

This diff enhances the DCPerf Feedsim benchmark by adding graph storage and loading optimization capabilities. It also eliminates redundant graph building across multiple thread runs, improving performance and scalability.

**Key changes:**

1.  **Shell script enhancements** (`run-feedsim-multi.sh`, `run.sh`):

    *   Added `-S` flag to store generated graphs to a file for reuse across instances
    *   Added `-L` flag to load pre-generated graphs from a file instead of rebuilding per thread
    *   Enhanced help documentation to explain the new optimization options
    *   Updated command line parsing to handle the new flags and pass them through to the underlying executables
2.  **Command line options** (`LeafNodeRankCmdline.ggo`):

    *   Added `store_graph` option to enable saving generated graphs to a specified file
    *   Added `load_graph` option to enable loading graphs from a specified file instead of generating new ones

**Performance optimizations:**

*   **Eliminates per-thread graph building overhead**: Instead of each parallel instance building its own graph, one instance can build and store the graph while others load the pre-built version
*   Reduces benchmark initialization time 
*   Enables consistent testing across runs by using identical graph structures
*   Improves reproducibility of benchmark results
*   Optimizes memory and CPU usage by avoiding redundant graph generation across parallel threads

Rollback Plan:

Differential Revision: D80288337

marziehlenjaniMeta added a commit to marziehlenjaniMeta/DCPerf that referenced this pull request Aug 28, 2025
…te per-thread graph building (facebookresearch#201)

@marziehlenjaniMeta marziehlenjaniMeta force-pushed the export-D80288337-to-v2-beta branch from bdd15a2 to 4f872f3 Compare August 28, 2025 21:53

…ng optimization and eliminate per-thread graph building (facebookresearch#201)

Summary:

This diff enhances the DCPerf Feedsim benchmark by adding graph storage and loading optimization capabilities, eliminating redundant graph building across multiple thread runs, and replacing fixed sleep time with checking for server readiness. 

**Key changes:**

1.  **Shell script enhancements** (`run-feedsim-multi.sh`, `run.sh`):

    *   Added `-S` flag to store generated graphs to a file for reuse across instances
    *   Added `-L` flag to load pre-generated graphs from a file instead of rebuilding per thread
    *   Added `-I` flag to enable instrumenting graph generation
    *   Enhanced help documentation to explain the new optimization options
    *   Updated command line parsing to handle the new flags and pass them through to the underlying executables
2.  **Command line options** (`LeafNodeRankCmdline.ggo`):

    *   Added `store_graph` option to enable saving generated graphs to a specified file
    *   Added `load_graph` option to enable loading graphs from a specified file instead of generating new ones
    *   Added `instrument_graph` option to enable measuring the time for graph generation

**Performance optimizations:**

*   **Eliminates per-thread graph building overhead**: Instead of each parallel instance building its own graph, one instance can build and store the graph while others load the pre-built version. This also optimizes memory and CPU usage by avoiding redundant graph generation across parallel threads.
*   **Reduces benchmark initialization time**: Replaces the fixed sleep with a server readiness check.
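
Replacing a fixed sleep with a readiness check generally means polling a probe until it succeeds or a deadline passes. The sketch below shows that general pattern only. The function name `wait_until_ready` is hypothetical, and the probe command is a placeholder, since the summary does not say how the scripts detect that the server is ready.

```shell
# Hypothetical sketch: poll a readiness probe instead of sleeping a fixed time.
# Usage: wait_until_ready <timeout_seconds> <probe command...>
wait_until_ready() {
    local timeout="$1" waited=0
    shift
    # Re-run the probe once per second until it exits with status 0.
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "not ready after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A caller might write `wait_until_ready 60 test -e /tmp/server.ready`, where the marker file is purely illustrative. The function returns as soon as the probe passes, so a fast start no longer pays the full fixed delay, and a slow start fails with an error instead of proceeding against a server that is not up.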

Differential Revision: D80288337
@marziehlenjaniMeta marziehlenjaniMeta force-pushed the export-D80288337-to-v2-beta branch from 4f872f3 to f74fbc8 Compare August 29, 2025 18:46

facebook-github-bot pushed a commit that referenced this pull request Aug 30, 2025
…ng optimization and eliminate per-thread graph building (#201)

Summary:
Pull Request resolved: #201

Reviewed By: excelle08

Differential Revision: D80288337

fbshipit-source-id: 9b1fc935d3c3106e44dd8ef3238b78f953e1e58a
@YifanYuan3 YifanYuan3 closed this Sep 24, 2025
