
Parallel map step for DistributedDataAnalyzer map-reduce #5291

Merged

Conversation

@bm-synth (Contributor) commented Mar 17, 2024

  • Adds multi-process CPU parallelism to the DistributedDataAnalyzer map operation (degree of parallelism set with the num_workers parameter). Uses a SharedMemory / Manager queue per metric, written to concurrently by the worker processes.
  • Much faster write_buffer_to_file in the DistributedDataAnalyzer reduce operation, achieved by copying the output tensor to CPU and detaching it from the autograd graph.
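The queue-per-metric pattern in the first bullet can be sketched as follows. This is a minimal illustration, not the actual DeepSpeed implementation: the metric name ("seqlen") and helper names are hypothetical, and a real analyzer would register one queue per configured metric.

```python
from multiprocessing import Manager, Process

def analyze_shard(shard, queues):
    # Each worker computes metrics for its shard of (index, sample) pairs
    # and pushes results onto the shared queue for that metric.
    for idx, sample in shard:
        queues["seqlen"].put((idx, len(sample)))

def parallel_map(dataset, num_workers=2):
    manager = Manager()
    queues = {"seqlen": manager.Queue()}  # one shared queue per metric
    indexed = list(enumerate(dataset))
    # Round-robin shard assignment across workers.
    shards = [indexed[i::num_workers] for i in range(num_workers)]
    workers = [Process(target=analyze_shard, args=(shard, queues))
               for shard in shards]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Drain each queue; sort by sample index, since arrival order
    # across processes is nondeterministic.
    results = {}
    for name, q in queues.items():
        items = []
        while not q.empty():
            items.append(q.get())
        results[name] = sorted(items)
    return results
```

Because each metric has its own queue, workers never contend across metrics, and the parent only merges per-metric results after all workers have joined.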

@bm-synth bm-synth changed the title Parallel run in distributed data analyzer Parallel map step for DistributedDataAnalyzer map-reduce Mar 17, 2024
@bm-synth bm-synth marked this pull request as ready for review March 17, 2024 15:19
@bm-synth bm-synth mentioned this pull request Mar 18, 2024
@bm-synth (Contributor, Author)
@loadams @conglongli what's holding up this PR?

@conglongli conglongli self-assigned this Apr 18, 2024
@conglongli conglongli added this pull request to the merge queue Apr 18, 2024
Merged via the queue into microsoft:master with commit 64defe6 Apr 18, 2024
12 checks passed
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024
…#5291)

- adds multi CPU-processing to the `DistributedDataAnalyzer` map
operation (parallelism set with parameter `num_workers`). Works with a
`SharedMemory` / `Manager's` queue per metric, written concurrently by
processes.
- much faster `write_buffer_to_file` in `DistributedDataAnalyzer` reduce
operation by copying to cpu and "detaching" output tensor.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
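The reduce-side speedup in the second bullet amounts to detaching the output tensor from the autograd graph and copying it to host memory once, then serializing the whole buffer in a single call rather than paying per-element device syncs. A minimal PyTorch sketch (the function name and signature here are illustrative, not the DeepSpeed API):

```python
import numpy as np
import torch

def write_buffer_to_file(tensor, path):
    # detach() drops the autograd graph reference; cpu() performs one
    # device-to-host copy. The resulting numpy view is then written to
    # disk in bulk with tofile() instead of element by element.
    buf = tensor.detach().cpu().numpy()
    buf.tofile(path)
```

Without detach(), calling .numpy() on a tensor that requires grad raises an error, and iterating a device tensor element-wise would force a sync per access; the one-shot copy avoids both.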
umchand pushed a commit to umchand/DeepSpeed that referenced this pull request May 20, 2024
Labels: none · Projects: none · Linked issues: none · 3 participants