Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TCPStore type coercion #49685

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
35 changes: 24 additions & 11 deletions torch/csrc/distributed/c10d/init.cpp
Expand Up @@ -383,7 +383,8 @@ value with the new supplied ``value``.

Example::
>>> import torch.distributed as dist
>>> store = dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
>>> from datetime import timedelta
>>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30))
>>> store.set("first_key", "first_value")
>>> # Should return "first_value"
>>> store.get("first_key")
Expand Down Expand Up @@ -411,7 +412,8 @@ when initializing the store, before throwing an exception.

Example::
>>> import torch.distributed as dist
>>> store = dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
>>> from datetime import timedelta
>>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30))
>>> store.set("first_key", "first_value")
>>> # Should return "first_value"
>>> store.get("first_key")
Expand All @@ -434,8 +436,9 @@ in an exception.

Example::
>>> import torch.distributed as dist
>>> from datetime import timedelta
>>> # Using TCPStore as an example, other store types can also be used
>>> store = dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
>>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30))
>>> store.add("first_key", 1)
>>> store.add("first_key", 6)
>>> # Should return 7
Expand All @@ -461,8 +464,9 @@ Deletes the key-value pair associated with ``key`` from the store. Returns

Example::
>>> import torch.distributed as dist
>>> from datetime import timedelta
>>> # Using TCPStore as an example, HashStore can also be used
>>> store = dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
>>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30))
>>> store.set("first_key")
>>> # This should return true
>>> store.delete_key("first_key")
Expand All @@ -487,8 +491,9 @@ the workers using the store.

Example::
>>> import torch.distributed as dist
>>> from datetime import timedelta
>>> # Using TCPStore as an example, other store types can also be used
>>> store = dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
>>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30))
>>> store.set("first_key", "first_value")
>>> # This should return 2
>>> store.num_keys()
Expand All @@ -506,8 +511,9 @@ Sets the store's default timeout. This timeout is used during initialization and

Example::
>>> import torch.distributed as dist
>>> from datetime import timedelta
>>> # Using TCPStore as an example, other store types can also be used
>>> store = dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
>>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30))
>>> store.set_timeout(timedelta(seconds=10))
>>> # This will throw an exception after 10 seconds
>>> store.wait(["bad_key"])
Expand All @@ -528,8 +534,9 @@ will throw an exception.

Example::
>>> import torch.distributed as dist
>>> from datetime import timedelta
>>> # Using TCPStore as an example, other store types can also be used
>>> store = dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
>>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30))
>>> # This will throw an exception after 30 seconds
>>> store.wait(["bad_key"])
)")
Expand All @@ -551,8 +558,9 @@ if the keys have not been set by the supplied ``timeout``.

Example::
>>> import torch.distributed as dist
>>> from datetime import timedelta
>>> # Using TCPStore as an example, other store types can also be used
>>> store = dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
>>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30))
>>> # This will throw an exception after 10 seconds
>>> store.wait(["bad_key"], timedelta(seconds=10))
)");
Expand Down Expand Up @@ -617,8 +625,11 @@ pair, :meth:`~torch.distributed.store.get` to retrieve a key-value pair, etc.

Example::
>>> import torch.distributed as dist
>>> server_store = dist.TCPStore("127.0.0.1", 0, True, timedelta(seconds=30))
>>> client_store = dist.TCPStore("127.0.0.1", 0, False)
>>> from datetime import timedelta
>>> # Run on process 1 (server)
>>> server_store = dist.TCPStore("127.0.0.1", 1234, 2, True, timedelta(seconds=30))
>>> # Run on process 2 (client)
>>> client_store = dist.TCPStore("127.0.0.1", 1234, 2, False)
>>> # Use any of the store methods from either the client or server after initialization
>>> server_store.set("first_key", "first_value")
>>> client_store.get("first_key")
Expand All @@ -633,7 +644,9 @@ Example::
py::arg("host_name"),
py::arg("port"),
py::arg("world_size"),
py::arg("is_master"),
// using noconvert() requires this argument to be True or False
// prevents accidental implicit conversion to bool
py::arg("is_master").noconvert(),
H-Huang marked this conversation as resolved.
Show resolved Hide resolved
py::arg("timeout") =
std::chrono::milliseconds(::c10d::Store::kDefaultTimeout));

Expand Down