TCPStore constructor arguments mismatch unexpected behavior #49052

Closed
H-Huang opened this issue Dec 8, 2020 · 0 comments
Labels: high priority, oncall: distributed, triage review, triaged

H-Huang commented Dec 8, 2020

🐛 Bug

TCPStore constructor arguments are likely not properly type checked. The documentation examples also need to be updated to pass 5 arguments instead of 4.

To Reproduce

Steps to reproduce the behavior:

```
import torch.distributed as dist
from datetime import timedelta

# Documentation example: only 4 positional arguments, world_size is missing
dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
```

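This is the example given in the documentation. It succeeds, but it shouldn't: the constructor takes `host_name`, `port`, `world_size`, `is_master`, and `timeout`, so `True` is taken as `world_size` and the `timedelta` is passed as `is_master`.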
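To see how the 4-argument call lines up with the 5-parameter signature, the snippet below binds the same arguments positionally against a hypothetical pure-Python stand-in (`tcpstore_signature`) that copies only the parameter list shown in the expected error message. It does not touch `torch` or open any sockets. It is a sketch of positional binding, not of how pybind11 dispatches the real constructor.

```python
import inspect
from datetime import timedelta


# Hypothetical stand-in: mirrors only the parameter list of
# torch._C._distributed_c10d.TCPStore; it does no networking.
def tcpstore_signature(host_name, port, world_size, is_master,
                       timeout=timedelta(seconds=300)):
    pass


# Bind the 4-argument documentation call positionally.
bound = inspect.signature(tcpstore_signature).bind(
    "127.0.0.1", 0, True, timedelta(seconds=30))
bound.apply_defaults()

print(bound.arguments["world_size"])  # True: the intended is_master value
print(bound.arguments["is_master"])   # 0:00:30: the intended timeout value
print(bound.arguments["timeout"])     # 0:05:00: the default timeout is used

# A non-zero timedelta is truthy, so any implicit bool conversion yields True.
print(bool(timedelta(seconds=30)))    # True
```

Every argument shifts one slot to the left, and the intended timeout silently becomes the `is_master` flag while the real timeout falls back to its default.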

Expected behavior

We expect an error like:

```
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. torch._C._distributed_c10d.TCPStore(host_name: str, port: int, world_size: int, is_master: bool, timeout: datetime.timedelta = datetime.timedelta(seconds=300))
```

Environment

  • PyTorch Version (e.g., 1.0): master branch
  • OS (e.g., Linux): Ubuntu 18.04.3 LTS (x86_64)
  • Python version: 3.8

cc: @osalpekar

cc @ezyang @gchanan @zou3519 @bdhirsh @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd

zhangguanheng66 added the oncall: distributed and triaged labels Dec 9, 2020
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this issue Jan 6, 2021
Summary:
Fixes pytorch#49052

The TCPStore example with 4 arguments was working because the datetime value was being implicitly converted to a bool. Modified the pybind definition and updated documentation.
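The effect of stricter checking can be illustrated in plain Python. The sketch below uses a hypothetical validator, `check_tcpstore_args`, that refuses implicit conversions for each parameter of the signature in the error message. It is not the actual pybind definition change from the pull request, only an illustration of the behavior that change is meant to produce: the 4-argument call is rejected, while a correct 5-argument call is accepted.

```python
from datetime import timedelta


# Hypothetical validator: NOT the real pybind11 fix, just a pure-Python
# illustration of rejecting implicitly converted constructor arguments.
def check_tcpstore_args(host_name, port, world_size, is_master,
                        timeout=timedelta(seconds=300)):
    expected = {
        "host_name": (host_name, str),
        "port": (port, int),
        "world_size": (world_size, int),
        "is_master": (is_master, bool),
        "timeout": (timeout, timedelta),
    }
    for name, (value, typ) in expected.items():
        # bool is a subclass of int in Python, so reject it for int params.
        if typ is int and isinstance(value, bool):
            raise TypeError(f"{name} must be int, got bool")
        if not isinstance(value, typ):
            raise TypeError(
                f"{name} must be {typ.__name__}, got {type(value).__name__}")
    return True


# The 4-argument documentation call is now rejected...
try:
    check_tcpstore_args("127.0.0.1", 0, True, timedelta(seconds=30))
except TypeError as err:
    print(err)  # world_size must be int, got bool

# ...while the 5-argument form (world_size=1, is_master=True) passes.
print(check_tcpstore_args("127.0.0.1", 0, 1, True, timedelta(seconds=30)))
```

Against the real API, the corresponding corrected call would pass `world_size` explicitly as the third argument, e.g. `dist.TCPStore("127.0.0.1", port, 1, True, timedelta(seconds=30))`, where the port value is left to the caller.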

Pull Request resolved: pytorch#49685

Test Plan:
```
import torch.distributed as dist
from datetime import timedelta

dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
```

Now fails with
```
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. torch._C._distributed_c10d.TCPStore(host_name: str, port: int, world_size: int, is_master: bool, timeout: datetime.timedelta = datetime.timedelta(seconds=300))

Invoked with: '127.0.0.1', 0, True, datetime.timedelta(seconds=30)
```

Reviewed By: mrshenli, ngimel

Differential Revision: D25668021

Pulled By: H-Huang

fbshipit-source-id: ce40b8648d0a414f0255666fbc680f1a66fae090