Allow controlling PG backend and options via init_device_mesh #159371
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159371. Note: links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 2 New Failures, 1 Unrelated Failure as of commit 9927ca7 with merge base 908c5cc. NEW FAILURES: the following jobs have failed. UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
lgtm! though it would be nice to also test that the options get passed correctly. (or maybe this is covered by existing tests?)
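For reference, a rough sketch of what such a check could look like — purely illustrative, not the PR's actual test. It assumes the per-dim override argument in the shape it takes later in this thread (`pg_override`), and it only verifies the backend selection; verifying the options themselves would require the backend to expose them:

    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh

    # Hypothetical check: create a mesh with a per-dim backend override and
    # confirm the overridden dim actually uses the "fake" backend.
    mesh = init_device_mesh(
        "cpu", (2, 2), mesh_dim_names=("dp", "tp"),
        pg_override={"tp": "fake"},  # argument name as used later in this thread
    )
    assert dist.get_backend(mesh.get_group("tp")) == "fake"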
Thanks for the fix; I've left some comments. Stamping to unblock.
    # Inherit from the parent group if no options are specified for the group.
    if dim in _mesh_resources.mesh_dim_group_options:
        if pg_backend_and_options[dim] != (None, None):
            raise RuntimeError(
hmm, maybe we can let `pg_backend_and_options` override what the mesh has set already? No?
I didn't want to worry right now about which of the two gets priority if they're both specified, so I decided to throw in such cases. Do you have an argument for why the explicit argument should take precedence over the global setting?
Well, I think the argument is that if the user passes in an option explicitly, that means they want to override the existing one?
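To make the trade-off concrete, a minimal sketch of the two policies being debated, reusing the names from the snippet above (the exact shape of `_mesh_resources.mesh_dim_group_options` is assumed here, and none of this is the PR's actual code):

    # Policy in the PR: refuse if both the global override and the explicit
    # argument specify something for this dim.
    if dim in _mesh_resources.mesh_dim_group_options and pg_backend_and_options[dim] != (None, None):
        raise RuntimeError(f"Conflicting group options for mesh dim {dim}")

    # Alternative raised in review: the explicit argument wins, and the global
    # override is only used as a fallback.
    backend, options = pg_backend_and_options[dim]
    if (backend, options) == (None, None):
        backend, options = _mesh_resources.mesh_dim_group_options.get(dim, (None, None))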
torch/distributed/device_mesh.py
Outdated
if mesh_dim_name in pg_backend_and_options:
if mesh_dim_idx in pg_backend_and_options:
nit: you might want to consolidate these two into one if?
Either we do (like now):

    if a:
        if b:
            raise
        do()

or (as you suggest):

    if a and b:
        raise
    if a:
        do()

I prefer the first one since it doesn't repeat `a` twice.
torch/distributed/device_mesh.py
Outdated
    mesh_shape: tuple[int, ...],
    *,
    mesh_dim_names: Optional[tuple[str, ...]] = None,
    pg_backend_and_options: Optional[
can you kindly add a comment at L1040 to say that the key here can be either a dim or a dim name?
torch/distributed/device_mesh.py
Outdated
if pg_backend_and_options is not None:
    pg_backend_and_options_tuple = tuple(
        pg_backend_and_options.pop(i, (None, None))
        for i in range(len(mesh_shape))
    )
    if pg_backend_and_options:
        raise RuntimeError(
            f"Found invalid keys in pg_backend_and_options: got {list(pg_backend_and_options.keys())}, "
            f"expected integers in range [0, {len(mesh_shape)}) or one of {mesh_dim_names or []}"
        )
else:
    pg_backend_and_options_tuple = None
any chance we can move this logic into the previous for-loop? Looks like we are looping the mesh twice?
The previous for-loop is only entered if `mesh_dim_names` is not None. So we could merge this into a single loop, but that loop would then have to do checks on `mesh_dim_names` that would also be redundant (in a different way).
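For concreteness, a hedged sketch of what the merged single-loop version under discussion might look like (not the PR's code; it assumes both dim indices and dim names can appear as keys):

    # Normalize name and index keys while walking the mesh dims once,
    # instead of in two separate passes.
    def _normalize_overrides(mesh_shape, mesh_dim_names, pg_backend_and_options):
        if pg_backend_and_options is None:
            return None
        per_dim = []
        for idx in range(len(mesh_shape)):
            name = mesh_dim_names[idx] if mesh_dim_names is not None else None
            if name is not None and name in pg_backend_and_options:
                per_dim.append(pg_backend_and_options.pop(name))
            else:
                per_dim.append(pg_backend_and_options.pop(idx, (None, None)))
        if pg_backend_and_options:
            raise RuntimeError(
                f"Found invalid keys in pg_backend_and_options: {list(pg_backend_and_options.keys())}"
            )
        return tuple(per_dim)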
Thanks for adding the feature! I have some suggestions on what the UX should look like; requesting changes so we can align on the UX.
My opinion is that we should separate the backend selection from the pg options instead of blending them together; this would make the argument more concise. Here is the proposed UX:
    # Switching between backends is controlled by the device_type argument;
    # this is aligned with how process group backend selection works.
    init_device_mesh("cuda:nccl", ...)

    # Backend options can be passed in by something like:
    init_device_mesh("cuda:nccl", mesh_shape, mesh_dim_names=("dp", "tp"), mesh_dim_options={"dp": nccl_pg_options})
Basically, `mesh_dim_options` should be of type `Dict[str, C10dBackend.Options]`. I think we can support only `str` as the key, but adding `int` as well also sounds good.
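As a hedged illustration of the "cuda:nccl" form (the helper below and its behavior are assumptions, not the actual implementation):

    from typing import Optional

    # Illustrative only: split a "device:backend" string into its two parts.
    # "cuda:nccl" -> ("cuda", "nccl"); plain "cuda" -> ("cuda", None).
    def _split_device_and_backend(device_type: str) -> tuple[str, Optional[str]]:
        device, _, backend = device_type.partition(":")
        return device, backend or None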
torch/distributed/device_mesh.py
Outdated
    mesh_shape: tuple[int, ...],
    *,
    mesh_dim_names: Optional[tuple[str, ...]] = None,
    pg_backend_and_options: Optional[
hmmm I would prefer to:
- separate the pg backend and pg options; these should be separately controlled IMO
- name this as `mesh_dim_options` instead.
I find it useless to prepend `mesh_dim_`: this is the function to create a device mesh, so it's quite implicit! On the other hand, I want to explicitly mention `pg_` or `group_` or something to clarify that these are not the options of the mesh dim itself, but of the underlying ProcessGroup.
> I find it useless to prepend mesh_dim_: this is the function to create a device mesh, it's quite implicit!

Sure, that makes sense! I was originally thinking to align it with `mesh_dim_names`, but I agree it's redundant. Shall we simply call it `options`? Maybe we should change `mesh_dim_names` to `dim_names` or simply `names` later.
> On the other hand, I want to explicitly mention pg_ or group_ or something to clarify it's not the options of the mesh dim itself, but of the underlying ProcessGroup.

I wonder what you mean by "the options of the mesh dim itself"? I thought nothing can be configured (or has options) on a mesh dim except that its underlying communicator has options, so it would be quite explicit to the user? I am OK with `group_options`, but if there's no confusion about `options` I think we should just go with `options`?
I'm not sure I agree with the comments on UX.
What I can propose is that we allow the values of the dict to also be simple …
I think the problem with blending the concepts of backend and pg option is that many times the user actually only wants to specify the backend, or only wants to specify the options; there's very rarely a case where the user wants to specify them together. So IMO we should try to make the common-case experience better, i.e. what if the user just passes pg options and doesn't want to specify the backend? When specifying options for a multi-dim mesh, the backend needs to be specified again and again.
I wonder how much more verbose it makes things if one wants to give both a backend type and options? My proposal is to not make a separate …
In the current way it would be:
I think it would become even simpler if the user wants to specify pg options for multiple dimensions of the device mesh, i.e. the proposed UX would be:
In the current way it would be:
It seems to me that repeating the "nccl" backend and using a nested dict looks cumbersome. There should not be an out-of-sync issue, as the backend is parsed from device_type and we can easily guard it.
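Since the collapsed snippets didn't survive in this view, here is a hedged reconstruction of the comparison being discussed, using the argument shapes mentioned earlier in the thread; `dp_opts`/`tp_opts` are placeholder options objects and everything here is illustrative:

    from torch.distributed.device_mesh import init_device_mesh

    dp_opts = tp_opts = None  # placeholders for e.g. ProcessGroupNCCL.Options()

    # Proposed UX: backend comes from device_type, only options are passed per dim.
    init_device_mesh(
        "cuda:nccl", (2, 2), mesh_dim_names=("dp", "tp"),
        mesh_dim_options={"dp": dp_opts, "tp": tp_opts},
    )

    # UX in this PR at the time: backend and options blended per dim.
    init_device_mesh(
        "cuda", (2, 2), mesh_dim_names=("dp", "tp"),
        pg_backend_and_options={"dp": ("nccl", dp_opts), "tp": ("nccl", tp_opts)},
    )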
Here's a clarifying question: how important is it to support a different backend type per dimension? If so, then this version is limiting.
@wconstab I am not sure how exactly torchft wants to change the backend for each dimension, and it's not very clear to me that having a different backend on each dimension even makes sense: the way to create a subgroup is to inherit the backend of the world group (and to rely on the backend's way of splitting the group, i.e. ncclSplit), and many operations of device mesh today (…
I think we'd better not assume the world group is materialized. It may be important to have it for some jobs, but it is also too expensive to create a world group in larger-scale jobs. So I prefer the direction of decoupling the top-level world from the first layer of dims, and in this case I think it can make sense to allow separate dims to use different backends. But I also agree the semantics of split should have a defined meaning corresponding to a pg split.
Yeah sure, I am not assuming the world group is materialized, and I think it's probably not a good idea to even materialize the world pg at all for a multi-dim mesh. But even though we don't materialize the world pg, we should avoid undefined behaviors as much as possible (i.e., what does …

If someone really wants to have a separate backend for certain ranks/dims, IMO they can either manage it as a separate process group by hand, or manage a separate DeviceMesh created directly by the …
I'm going to make a call in order to close this discussion, which is to proceed as follows: I'll rename the argument to …
struct Options : Backend::Options {
    explicit Options() : Backend::Options("fake") {}

    int fake_option = 0;
};
Needed to define options for the fake PG in order to test the options-only override
I guess this is fine. Maybe later we can replace it with a real option.
def init_pg(self, eager_init) -> None:
    if "nccl" in self.backend and torch.cuda.device_count() < self.world_size:
def init_pg(self, eager_init, backend: Optional[str] = None) -> None:
Had to modify this (and below in this file) in order to allow using the "fake" backend as the root PG
def _create_fake_pg(prefix_store, rank, world_size, timeout):
def _create_fake_pg(common_opts, backend_opts):
Switched to using the "extended API" as otherwise the options wouldn't get passed to the FakePG constructor
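For context, a hedged sketch of what an extended-API registration looks like. `register_backend(..., extended_api=True)` is the real hook; the attribute names on `common_opts` and the constructor call are assumptions for illustration, not the PR's exact code:

    import torch.distributed as dist
    from torch._C._distributed_c10d import FakeProcessGroup

    def _create_fake_pg(common_opts, backend_opts):
        # With extended_api=True the creator receives the common distributed
        # options (store, rank, world size, ...) plus the backend-specific
        # Options object, so backend_opts can carry e.g. fake_option.
        return FakeProcessGroup(common_opts.group_rank, common_opts.group_size)

    dist.Backend.register_backend("fake", _create_fake_pg, extended_api=True, devices=["cpu"])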
I am wondering if this will cause any BC breakage for some tests?
I've added support for the pg override to …
test/distributed/test_device_mesh.py
Outdated
    self.device_type,
    (2, 2, 2),
    mesh_dim_names=("dp", "tp", "cp"),
    pg_override={0: "fake", 2: ("fake", opts)},
So what if a user creates a 4D or 5D device mesh, and they want to use their custom backend (e.g. even the fake backend tested here), but they don't need to customize the options for the device mesh dimensions? This is pretty common for non-CUDA backend vendors; with the current API the user would need to do something like this:
mesh = init_device_mesh(
    self.device_type,
    (2, 2, 2, 2, 2),
    mesh_dim_names=("pp", "dp", "tp", "cp", "ep"),
    pg_override={0: "fake", 1: "fake", 2: "fake", 3: "fake", 4: "fake"}
)
As a user, I feel this is still very cumbersome, but if I don't do this, some of the mesh dimensions would just silently use the "default" backend for the current device type.
@lw Can we start exposing this argument as a private API (both in …
I don't understand if this is a use-case you're hitting in practice or if you're imagining corner cases where the API would be awkward to use? The new API addresses what you asked earlier (one can provide just the options if one so chooses), but now you're evoking a case where one does want to change the backend? In order to pragmatically address your example use-case, I think the user should just use …

Note also that no API in PyTorch is truly private. No matter whether there's a leading underscore, if that's the only way to achieve what people need, they will use it in their downstream project and PyTorch won't be able to remove it or break it. Case in point: the …

Do you have a concrete proposal that addresses your concerns? If not, I'll probably merge this PR as-is.
@lw To be clear, my original comment is about both: "user actually only want to specify the backend, or only want to specify the options". The new API only addresses one of those asks, so I am not sure why you think I'm evoking a new case; I am just discussing the existing ones. The case where the user only specifies the backend is pretty common, e.g. for HCCL users, or any users who implemented a custom process group and registered it as a backend (we have one, so it's something we would hit with the current API).
A private API is mostly for developers to have the freedom to adapt/change the API depending on new use cases and feedback before making it stable; otherwise every BC-breaking change needs to go through three release cycles. It's also a signal to users that PyTorch can remove/break the API, so use it at your own risk. If we didn't have private APIs then we wouldn't need prototype features; everything could be public and developed/used in prod, and I don't think PyTorch wants to be that way.
I already gave my suggestion in my previous comments: basically, we can embed the backend as a substring of the device_type, i.e. …
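In code form, a tiny hedged sketch of that suggestion, reusing the fake-backend example from above (the device string and the shape are illustrative only):

    from torch.distributed.device_mesh import init_device_mesh

    # Illustrative only: backend chosen once via the device_type string
    # (per the "cuda:nccl" proposal above), no per-dim repetition needed.
    mesh = init_device_mesh(
        "cpu:fake",
        (2, 2, 2, 2, 2),
        mesh_dim_names=("pp", "dp", "tp", "cp", "ep"),
    )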
Thanks everyone!
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few of them are: Check mergeability of ghstack PR / ghstack-mergeability-check. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f "Failures due to CI SEV #159825"
Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use … Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: #159371 Approved by: https://github.com/wconstab, https://github.com/fduwjj, https://github.com/wanchaol
We allow passing in PG options via #159371, and we did a clean-up of the Meta-internal usage of `_set_mesh_dim_group_options`. Since this is a private API, we don't have any BC guarantee, so we want to remove it directly so that people use the new behavior from now on. Also, since we now allow passing a pg in both the DeviceMesh constructor and the flatten API, we also want to get rid of the global pg option override variable. Pull Request resolved: #164750 Approved by: https://github.com/lw, https://github.com/fegin
Stack from ghstack (oldest at bottom):
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta