New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make python remote exception to rethrow when using remote reference to itself #29930
Conversation
…o itself Right now, python call remote exception rethrown is coupled with deserializtiaon. For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown. This diff is to move remote exception rethrown out of deserialization, and exception can be handled for owner ref getValue() close #29924 Differential Revision: [D18541916](https://our.internmc.facebook.com/intern/diff/D18541916/) [ghstack-poisoned]
…o itself Right now, python call remote exception rethrown is coupled with deserializtiaon. For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown. This diff is to move remote exception rethrown out of deserialization, and exception can be handled for owner ref getValue() close #29924 Differential Revision: [D18541916](https://our.internmc.facebook.com/intern/diff/D18541916/) ghstack-source-id: 94037095 Pull Request resolved: #29930
Looks like we have a duplicate PR: #29929? |
torch/csrc/distributed/rpc/rref.cpp
Outdated
template <typename T> | ||
const T& OwnerRRef<T>::getValue() const { | ||
template <> | ||
const IValue& OwnerRRef<IValue>::getValue() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have to separate IValue and python objects like this? Shouldn't the same code work in python/torchscript mode since this is C++ code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because python object and IValue exception handing is different per previous request.
For IValue exception handling, it catches C++ exception and returns a message with message::Exception, then exception will be rethrown based on message::Exception type.
For pyobject exception handing, it did not throw a C++ exception, but catch it in python side and wrap it into a RemoteException to caller. Caller will check this returned result is RemoteException or not and then rethrown. This is based on previous request that we need to return python call stack trace to users. C++ exception can only return a message, no stack trace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing! Shall we add tests for both to_here
and local_value
for both local (creating an RRef
locally) and remote RRef?
torch/csrc/distributed/rpc/rref.cpp
Outdated
// TODO: use callback to make this non-blocking | ||
std::unique_lock<std::mutex> lock(mutex_); | ||
valueCV_.wait(lock, [this] { return value_.has_value(); }); | ||
PythonRpcHandler::getInstance().handleException(value_.value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this inconsistent with what we do in PYTHON_CALL
? In PYTHON_CALL
the exception is wrapped into payload in generatePythonUDFResult
, and then the exception message is sent to the caller and thrown there in toPyObjInternal
. But here, the exception is thrown on the callee (owner) and relies on RequestCallback::operator()
to catch it and convert that into an exception message? Is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both PYTHON_CALL and PYTHON_FETCH_CALL re-throw python exception during deserialization.
Here, callee and caller are in the same process, did not go through deserialization, so skipped python exception re-thrown
@mrshenli and @pritamdamania it is my bad to make it confusing, let me move python exception rethrown to PyRRef::localValue() where only caller can call it. |
…reference to itself" Right now, python call remote exception rethrown is coupled with deserializtiaon. For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown. This diff is to move remote exception rethrown out of deserialization, and exception can be handled for owner ref getValue() close #29924 Differential Revision: [D18541916](https://our.internmc.facebook.com/intern/diff/D18541916/) [ghstack-poisoned]
…o itself Pull Request resolved: #29930 Right now, python call remote exception rethrown is coupled with deserializtiaon. For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown. This diff is to move remote exception rethrown out of deserialization, and exception can be handled for owner ref getValue() close #29924 ghstack-source-id: 94130312 Differential Revision: [D18541916](https://our.internmc.facebook.com/intern/diff/D18541916/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Please make sure all tests pass before landing.
@zhaojuanmao please update the PR/diff description accordingly. |
…reference to itself" Right now, python call remote exception rethrown is coupled with deserializtiaon. For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown. This diff is to move remote exception rethrown out of deserialization, and exception can be handled for owner ref getValue() close #29924 Differential Revision: [D18541916](https://our.internmc.facebook.com/intern/diff/D18541916/) [ghstack-poisoned]
…o itself Pull Request resolved: #29930 Right now, python call remote exception rethrown is coupled with deserializtiaon. For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown. This diff is to move remote exception rethrown out of deserialization, and exception can be handled for ownerRef.localValue() or ownerRef.to_here() close #29924 Differential Revision: [D18541916](https://our.internmc.facebook.com/intern/diff/D18541916/) ghstack-source-id: 94148099
checked these tests actually finished, but CI showed they are pending |
@pytorchbot retest please |
…reference to itself" Right now, python call remote exception rethrown is coupled with deserializtiaon. For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown. This diff is to move remote exception rethrown out of deserialization, and exception can be handled for owner ref getValue() close #29924 Differential Revision: [D18541916](https://our.internmc.facebook.com/intern/diff/D18541916/) [ghstack-poisoned]
…o itself Pull Request resolved: #29930 Right now, python call remote exception rethrown is coupled with deserializtiaon. For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown. This diff is to move remote exception rethrown out of deserialization, and exception can be handled for ownerRef.localValue() or ownerRef.to_here() close #29924 ghstack-source-id: 94188833 Differential Revision: [D18541916](https://our.internmc.facebook.com/intern/diff/D18541916/)
@@ -119,6 +119,9 @@ def _run_function(binary_data, tensor_table): | |||
result = RemoteException(except_str) | |||
return result | |||
|
|||
def _handle_exception(result): | |||
if isinstance(result, RemoteException): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question/nit, do we lost the original Exception type while wrapping it as RemoteException
? If we are able to retrieve the original type, rethrowing it as the original type will be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zzzwen all exception type, msg and stack trace are wrapped into remoteException and users can read these info in client.
some tests stuck again, rerun workflow did not trigger to retest it somehow. for safety, I'm rebasing it and try one more round of tests again. |
…reference to itself" Right now, python call remote exception rethrown is coupled with deserializtiaon. For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown. This diff is to move remote exception rethrown out of deserialization, and exception can be handled for owner ref getValue() close #29924 Differential Revision: [D18541916](https://our.internmc.facebook.com/intern/diff/D18541916/) [ghstack-poisoned]
…o itself Pull Request resolved: #29930 Right now, python call remote exception rethrown is coupled with deserializtiaon. For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown. This diff is to move remote exception rethrown out of deserialization, and exception can be handled for ownerRef.localValue() or ownerRef.to_here() close #29924 ghstack-source-id: 94210894 Differential Revision: [D18541916](https://our.internmc.facebook.com/intern/diff/D18541916/)
This pull request has been merged in b410d86. |
Stack from ghstack:
Right now, python call remote exception rethrown is coupled with deserializtiaon.
For owner ref, the setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself, and call ownerRef.to_here(), python call remote exception will not be rethrown.
This diff is to move remote exception rethrown out of deserialization, and exception can be handled for owner ref getValue()
close #29924
Differential Revision: D18541916