@clumsy commented Apr 20, 2023

I was using PyTorch-Ignite metrics that are all-gathered on both the NCCL and Gloo backends.
In one case the tensor (buffer) was empty. NCCL tolerated this (hopefully as a no-op), but Gloo crashed with SIGFPE (signal 8). I think the cause is the modulo by dataSize in Gloo's current all-gather code paths, which becomes a division by zero when the buffer is empty.
I believe an empty buffer can simply be a no-op for Gloo, especially since contextSize=1 (a single rank) is already a no-op.
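
To illustrate the failure mode and the proposed guard, here is a minimal single-process sketch in Python. It is not Gloo's implementation: `all_gather`, `unguarded_index`, and the list-of-lists input are hypothetical stand-ins, and it assumes every rank contributes a buffer of the same length. In Python the modulo by zero raises `ZeroDivisionError`; in C++ integer modulo by zero is undefined behavior and typically surfaces as SIGFPE on x86 Linux.

```python
from typing import List


def unguarded_index(pos: int, data_size: int) -> int:
    # Hypothetical stand-in for an index computed with a modulo by the
    # per-rank element count. Fails when data_size == 0.
    return pos % data_size


def all_gather(inputs: List[List[int]]) -> List[int]:
    """Simulated all-gather: concatenate equally sized per-rank buffers.

    Assumption: inputs[r] is the buffer of rank r, and all buffers have
    the same length.
    """
    context_size = len(inputs)
    data_size = len(inputs[0])

    # Guard: with a single rank or an empty buffer there is nothing to
    # exchange, so return early and never reach the modulo below.
    if context_size == 1 or data_size == 0:
        return list(inputs[0])

    out = [0] * (context_size * data_size)
    for pos in range(len(out)):
        src_rank = pos // data_size
        out[pos] = inputs[src_rank][unguarded_index(pos, data_size)]
    return out
```

With the early return in place, `all_gather([[], []])` returns an empty list instead of hitting the modulo, which mirrors the existing single-rank shortcut.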

@facebook-github-bot

@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

@malfet merged this pull request in d96897b.
