scatter_object_list API for c10d #43930

Closed
wants to merge 9 commits into from

Commits on Sep 1, 2020

  1. scatter_object_list API for c10d

    Closes #23232. As part of addressing #23232, this PR adds `scatter_object_list`, an API that scatters arbitrary picklable objects from the src rank to every rank in the group.
    
    The implementation follows the approach of #42189. Each rank receives its object as the first element of `scatter_object_output_list`, and the src rank is expected to provide an input list, `scatter_object_input_list`, which contains the objects to scatter.
    
    Note that this API requires 1 broadcast and 2 scatters. The broadcast communicates the maximum object size to be scattered, which only the src rank knows. The two scatters then communicate the objects themselves and the true size of each object.
    
    Note that the API is designed to match the tensor-based collectives, except that it does not support async_op. For now, it is a blocking call. If we see demand for async_op, we will have to make more progress on merging work/future to support it.
    
    It only works for Gloo because NCCL doesn't support scatter.
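
The "1 broadcast and 2 scatters" flow described above can be illustrated with a minimal single-process sketch. This is not the PR's implementation: it uses no `torch.distributed` calls, and the function name `simulate_scatter_object_list`, the zero-byte padding, and the list-based stand-ins for collectives are all assumptions made for illustration. It only shows why the maximum size, the padded payloads, and the true sizes each need to be communicated.

```python
import pickle


def simulate_scatter_object_list(scatter_object_input_list):
    """Hypothetical single-process simulation of the three-step protocol.

    Returns one output list per simulated rank; as in the PR description,
    each rank's result is stored as the first element of its output list.
    """
    world_size = len(scatter_object_input_list)

    # On the src rank: pickle every object and record its true byte size.
    payloads = [pickle.dumps(obj) for obj in scatter_object_input_list]
    true_sizes = [len(p) for p in payloads]

    # Step 1 (broadcast): every rank learns the maximum size, so it can
    # allocate a receive buffer that is large enough for any object.
    max_size = max(true_sizes)
    max_size_on_rank = [max_size] * world_size

    # Step 2 (scatter): payloads are padded to a uniform length, because a
    # tensor-style scatter sends equally sized chunks to each rank.
    padded = [p + b"\x00" * (max_size - len(p)) for p in payloads]

    # Step 3 (scatter): each rank also receives the true size of its object,
    # so it knows how much of the padded buffer to unpickle.
    outputs = []
    for rank in range(world_size):
        buf = padded[rank]
        assert len(buf) == max_size_on_rank[rank]
        scatter_object_output_list = [None]
        scatter_object_output_list[0] = pickle.loads(buf[: true_sizes[rank]])
        outputs.append(scatter_object_output_list)
    return outputs


if __name__ == "__main__":
    for rank, out in enumerate(simulate_scatter_object_list(["a", {"k": [1, 2]}, 3.5])):
        print(rank, out[0])
```

In a real process group, each of the three steps would be a collective call across ranks rather than a local list operation.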
    
    Differential Revision: [D23430686](https://our.internmc.facebook.com/intern/diff/D23430686/)
    
    **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23430686/)!
    
    [ghstack-poisoned]
    rohan-varma committed Sep 1, 2020
    c46dec2
  2. Update on "scatter_object_list API for c10d"

    rohan-varma committed Sep 1, 2020
    12c66ea
  3. Update on "scatter_object_list API for c10d"

    rohan-varma committed Sep 1, 2020
    81e504f

Commits on Sep 2, 2020

  1. Update on "scatter_object_list API for c10d"

    rohan-varma committed Sep 2, 2020
    fde8a9c
  2. Update on "scatter_object_list API for c10d"

    rohan-varma committed Sep 2, 2020
    88ec2b7

Commits on Sep 4, 2020

  1. Update on "scatter_object_list API for c10d"

    rohan-varma committed Sep 4, 2020
    d91b3ae

Commits on Dec 2, 2020

  1. Update on "scatter_object_list API for c10d"

    rohan-varma committed Dec 2, 2020
    5bde216

Commits on Dec 3, 2020

  1. Update on "scatter_object_list API for c10d"

    rohan-varma committed Dec 3, 2020
    137931f

Commits on Dec 4, 2020

  1. Update on "scatter_object_list API for c10d"

    rohan-varma committed Dec 4, 2020
    370522c