New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rpc] special case tensor type check when getting RRef #33582
Conversation
[ghstack-poisoned]
Differential Revision: [D20009837](https://our.internmc.facebook.com/intern/diff/D20009837) [ghstack-poisoned]
💊 CircleCI build failures summary and remediationsAs of commit 44f99e4: None of the build failures appear to be your fault.
Detailed failure analysisOne may explore the probable reasons each build failed interactively on the Dr. CI website. ❄️ 1 failure recognized as flakyThe following build failures have been detected as flaky and may not be your fault: pytorch_linux_xenial_cuda10_1_cudnn7_py3_gcc7_test (1/1)Step: "Test" (full log | pattern match details) ❄️
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are real test failures, please see inline comments as well
TORCH_INTERNAL_ASSERT(ownerRRef->type()->isSubtypeOf(TensorType::get())); | ||
} else { | ||
TORCH_INTERNAL_ASSERT(ownerRRef->type() == type); | ||
} | ||
return ownerRRef; | ||
} else { | ||
return createUserRRef(ownerId, rrefId, forkId, type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about createUserRRef case when type is not exactly matched
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc createUserRRef
will not try to find if there's an existing UserRRef in the RRefContext, so there would be no type match problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean whether there will be an issue if one user created user rref with subtypeOf(TensorType::get()), then the user shared this user rref to another user, and another user will create user rref here with a TensorType::get(). iiuc, the owner rref will have subtypeof(TensorType::get()).
So some user rref will have slightly different type based on above, I'm wondering whether this will be an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plain TensorType is always a subtype of specialized SubTensorType, the only reason why this is failing is that we have a Type equal assertion here, I didn't see this in other places. For the case that you described, the forked UserRRef
will holding the plain TensorType, which is subtype compatible with the SubTensorType, which should be safe. In fact, we can only get into your described case when we fist run the ScriptFunction locally, then call rpc.remote
on this ScriptFunction again remotely. But when we run the ScriptFunction in remote, we shouldn't preserve the Specialized SubTensorType information (because that's the information we get from the local run). So I think the fix here should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for explaining! Sounds good, would you please also add this to comment?
Differential Revision: [D20009837](https://our.internmc.facebook.com/intern/diff/D20009837) [ghstack-poisoned]
Differential Revision: [D20009837](https://our.internmc.facebook.com/intern/diff/D20009837) [ghstack-poisoned]
Summary: Pull Request resolved: #33582 Test Plan: Imported from OSS Differential Revision: D20009837 Pulled By: wanchaol fbshipit-source-id: 7e9ab87d4dddb822c7575891a2b620eff83bfa00
Summary: Pull Request resolved: pytorch#33582 Test Plan: Imported from OSS Differential Revision: D20009837 Pulled By: wanchaol fbshipit-source-id: 7e9ab87d4dddb822c7575891a2b620eff83bfa00
Stack from ghstack:
Differential Revision: D20009837