Check PyTorch supports TCP_TLS gloo transport #779
LGTM, btw, is pytorch/pytorch#58996 going into 1.9?
It seems that it's already included
Does the wheel building script also have the USE_GLOO_WITH_OPENSSL flag?
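If it doesn't, here is a minimal sketch of how a wheel build script could pin the flag. It is hypothetical (not taken from pytorch/builder) and assumes the PyTorch build picks up `USE_*` variables from the environment:

```shell
#!/usr/bin/env bash
# Hypothetical wheel-build fragment, not from pytorch/builder.
# Default USE_GLOO_WITH_OPENSSL to 1 unless the caller already chose a value,
# and export it so any build process started from this script inherits it.
export USE_GLOO_WITH_OPENSSL="${USE_GLOO_WITH_OPENSSL:-1}"

if [[ "${USE_GLOO_WITH_OPENSSL}" != "1" ]]; then
  echo "warning: building without Gloo TCP_TLS support (USE_GLOO_WITH_OPENSSL=${USE_GLOO_WITH_OPENSSL})" >&2
fi

echo "USE_GLOO_WITH_OPENSSL=${USE_GLOO_WITH_OPENSSL}"
# The actual build command would follow here, e.g. python setup.py bdist_wheel
```

Using `${VAR:-1}` keeps an explicit `USE_GLOO_WITH_OPENSSL=0` from the caller intact while making TLS support the default.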
Needed for pytorch/builder#779
Co-authored-by: Your Name <driazati@users.noreply.github.com>
I think, to be safe, it'd be good to open a test PR upstream against pytorch/pytorch to verify this does not introduce a regression in the nightlies.
Testing PR can be found here: pytorch/pytorch#59306
Looks like, in spite of the environment variable being present for the conda build, the test still fails:
###############################################################################
# Check PyTorch supports TCP_TLS gloo transport
###############################################################################

if [[ "$(uname)" == 'Linux' ]]; then
  # grep's exit status (0 = marker found on the probe's stdout) becomes RESULT
  GLOO_DEVICE_TRANSPORT=TCP_TLS MASTER_ADDR=localhost MASTER_PORT=63945 python -c "import torch; import torch.distributed as dist; print(torch.__version__); dist.init_process_group('gloo', rank=0, world_size=1)" | grep "unsupported gloo device" &> /dev/null
  RESULT=$?
  if [ $RESULT -eq 0 ]; then
    echo "PyTorch doesn't support TCP_TLS transport, please set USE_GLOO_WITH_OPENSSL=1"
    exit 1
  fi
fi
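Side note on how the probe works: in a pipeline, `$?` is the exit status of the last command, so `RESULT` is 0 only when `grep` matched. Since `|` forwards stdout only, it's worth confirming which stream the "unsupported gloo device" text is printed on; this thread doesn't show that. The sketch below uses a hypothetical `fake_probe` function (not PyTorch) that writes the marker to stderr, the way Python prints an uncaught exception's traceback, to show what `2>&1` changes:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the `python -c "... init_process_group ..."` probe:
# a version line on stdout, an error containing the marker on stderr, then a
# non-zero exit, roughly mimicking an uncaught exception.
fake_probe() {
  echo "1.9.0"
  echo "RuntimeError: unsupported gloo device" >&2
  return 1
}

# Only stdout goes through the pipe (stderr is discarded here just to keep
# the demo quiet), so grep never sees the marker and exits with 1.
fake_probe 2>/dev/null | grep "unsupported gloo device" &> /dev/null
STDOUT_ONLY=$?

# With 2>&1, stderr is merged into the pipe and grep finds the marker (exit 0).
fake_probe 2>&1 | grep "unsupported gloo device" &> /dev/null
MERGED=$?

echo "stdout only: $STDOUT_ONLY, merged: $MERGED"
```

This prints `stdout only: 1, merged: 0`. If the real message turns out to go to stderr, the probe would need `2>&1` before the pipe for the check to ever trigger.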
###############################################################################
# Check PyTorch supports TCP_TLS gloo transport
###############################################################################
function built_without_gloo_tls() {
  # Succeeds when grep finds the marker in the probe's stdout
  GLOO_DEVICE_TRANSPORT=TCP_TLS MASTER_ADDR=localhost MASTER_PORT=63945 python -c "import torch; import torch.distributed as dist; print(torch.__version__); dist.init_process_group('gloo', rank=0, world_size=1)" | grep "unsupported gloo device" &> /dev/null
}
if [[ "$(uname)" == 'Linux' ]]; then
  if built_without_gloo_tls; then
    echo "PyTorch doesn't support TCP_TLS transport, please set USE_GLOO_WITH_OPENSSL=1"
    exit 1
  fi
fi
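One more reason the function form is safer, assuming the surrounding script runs under `set -e` (not visible in this diff): a bare pipeline whose `grep` finds nothing aborts the script right there. That is exactly the case of a build that does support TCP_TLS, so `RESULT=$?` is never reached. Bash doesn't apply `set -e` to a command used as an `if` condition, so the suggested version just takes the else branch. A minimal sketch, with `echo "1.9.0"` as a hypothetical stand-in for the Python probe on a TLS-enabled build; each variant runs in a fresh `bash -c` so the abort stays contained:

```shell
#!/usr/bin/env bash
# Variant 1: bare pipeline followed by RESULT=$?, under set -e.
# grep finds nothing (exit 1), so set -e stops the child shell before RESULT=$?.
v1=$(bash -c '
  set -e
  echo "1.9.0" | grep "unsupported gloo device" &> /dev/null
  RESULT=$?
  echo "reached RESULT=$RESULT"
') || true

# Variant 2: the same pipeline inside a function used as an `if` condition.
# set -e is not applied to the condition, so a non-match selects the else branch.
v2=$(bash -c '
  set -e
  built_without_gloo_tls() {
    echo "1.9.0" | grep "unsupported gloo device" &> /dev/null
  }
  if built_without_gloo_tls; then
    echo "TCP_TLS unsupported"
  else
    echo "TCP_TLS supported"
  fi
')

echo "v1: '${v1}'"
echo "v2: '${v2}'"
```

`v1` ends up empty because the first variant exits at the pipeline, while `v2` is `TCP_TLS supported`.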
I think this should work, but I wrote this suggestion in the GitHub UI, so it'd be good to double-check that it works locally as well.
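To double-check the suggestion locally without needing two PyTorch builds, one option is to shadow `python` with a shell function (bash resolves functions before executables on `PATH`). The stub and the `FAKE_NO_TLS` variable are hypothetical: the stub only echoes the marker when `FAKE_NO_TLS=1`, and `exit 1` is replaced by printing `fail` so the demo keeps running:

```shell
#!/usr/bin/env bash
# Hypothetical stub: shadows the real python binary in this shell only.
# It ignores its arguments and prints the marker when FAKE_NO_TLS=1,
# imitating a build without OpenSSL support.
python() {
  echo "1.9.0"
  if [[ "${FAKE_NO_TLS:-0}" == "1" ]]; then
    echo "unsupported gloo device"
  fi
}

# Copied from the suggestion.
function built_without_gloo_tls() {
  GLOO_DEVICE_TRANSPORT=TCP_TLS MASTER_ADDR=localhost MASTER_PORT=63945 python -c "import torch; import torch.distributed as dist; print(torch.__version__); dist.init_process_group('gloo', rank=0, world_size=1)" | grep "unsupported gloo device" &> /dev/null
}

# Same branching as the suggestion, printing instead of exiting.
check() {
  if built_without_gloo_tls; then
    echo "fail"
  else
    echo "pass"
  fi
}

FAKE_NO_TLS=0
with_tls=$(check)
FAKE_NO_TLS=1
without_tls=$(check)
echo "with TLS: $with_tls, without TLS: $without_tls"
```

This prints `with TLS: pass, without TLS: fail`; it exercises only the shell control flow, not PyTorch itself.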
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Looks like tests are passing in the tester PR for
Depends on pytorch/pytorch#58996