Add dist_context/RPC for distributed training #7671
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7671 +/- ##
==========================================
- Coverage 91.95% 91.57% -0.39%
==========================================
Files 453 455 +2
Lines 25556 25657 +101
==========================================
- Hits 23500 23495 -5
- Misses 2056 2162 +106
... and 22 files with indirect coverage changes
Thanks! Left some initial comments. The biggest one has to do with the use of global state here; we may get a cleaner implementation if we remove some of this state and consolidate it under classes that can be set as instance variables on the feature and graph store.
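As a rough illustration of that suggestion, the call registry could live on the store instance instead of in module-level globals. A minimal sketch, assuming hypothetical class and method names that are not taken from this PR:

# Hypothetical sketch of the suggested refactor (illustrative names only):
# the RPC call registry becomes an instance attribute of the store rather
# than a module-level global.
class DistFeatureStore:
    def __init__(self):
        self._rpc_call_pool = {}  # call_id -> callee object
        self._next_call_id = 0

    def register_rpc_callee(self, callee) -> int:
        # Hand out a fresh id and remember the callee on this instance.
        call_id = self._next_call_id
        self._next_call_id += 1
        self._rpc_call_pool[call_id] = callee
        return call_id

    def rpc_async_entry(self, call_id, *args, **kwargs):
        # Mirrors `_rpc_remote_async_call` below, but reads instance state.
        return self._rpc_call_pool[call_id].rpc_async_call(*args, **kwargs)

The point of the sketch is only that the same call_id-to-callee mapping can be owned by the store object, so no module-level state needs to be shared.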
torch_geometric/distributed/rpc.py (Outdated)
def _rpc_remote_async_call(call_id, *args, **kwargs):
    r"""Entry point for RPC requests."""
    return _rpc_call_pool.get(call_id).rpc_async_call(*args, **kwargs)


def _rpc_remote_sync_all(call_id, *args, **kwargs):
    r"""Entry point for RPC requests."""
    return _rpc_call_pool.get(call_id).rpc_sync_call(*args, **kwargs)
Why do we need a global pool here? Can't we just pass the sync or async function and its args directly to rpc_request_async and rpc_request_sync below?
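For context, the alternative this comment points at could be sketched roughly as follows, passing the target function straight to torch.distributed.rpc. The rpc_request_async/rpc_request_sync wrapper names are taken from the comment and assumed here, not copied from this PR:

# Hypothetical sketch of the alternative raised above: ship the function
# itself (plus its args) to the remote worker instead of dispatching
# through a global pool keyed by call_id.
import torch.distributed.rpc as rpc

def rpc_request_async(target_worker: str, func, args=(), kwargs=None):
    # `func` must be importable/picklable on the remote side; rpc_async
    # returns a Future that resolves to the remote return value.
    return rpc.rpc_async(target_worker, func, args=args, kwargs=kwargs or {})

def rpc_request_sync(target_worker: str, func, args=(), kwargs=None):
    # Blocking variant: rpc_sync waits for and returns the remote result.
    return rpc.rpc_sync(target_worker, func, args=args, kwargs=kwargs or {})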
We use a single pool here to distinguish the different call functions across the multiprocessing workers, which keeps the code easier to follow.
Mostly looks good. Left a few additional comments on top of the ones from before; once we address these, we should be good to merge this and handle more clean-up later.
A bit hard to review this without proper context. Not sure if it is possible, but it would be great to add tests for some of these.
This code is part of the overall distributed training support for PyG. The basic RPC functionality here will be used by the feature lookup step that follows node sampling. Please let us know if you have any comments, thanks.
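As a rough usage sketch of that feature lookup on top of the entry point in the diff above; the worker name, helper name, and call_id handling are assumptions for illustration, not code from this PR:

# Hypothetical usage sketch, not code from this PR: after sampling, a
# worker fetches features for the sampled node IDs from a remote callee
# registered under `call_id` (registration itself is not shown here).
import torch
import torch.distributed.rpc as rpc

# Assumes the module layout from the diff above.
from torch_geometric.distributed.rpc import _rpc_remote_async_call

def request_remote_features(target_worker: str, call_id: int,
                            node_ids: torch.Tensor) -> torch.Tensor:
    # Dispatch through the pooled entry point shown in the diff above;
    # the Future resolves to whatever the registered callee's
    # `rpc_async_call` returns (here, the features for `node_ids`).
    fut = rpc.rpc_async(target_worker, _rpc_remote_async_call,
                        args=(call_id, node_ids))
    return fut.wait()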