-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Support composite of lowered sub modules of the same backend #59921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit 3e9657c (more details on the Dr. CI page and at hud.pytorch.org/pr/59921):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
@iseeyuan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
// Generate LoweredModule. | ||
Module loweredModule( | ||
"torch.jit." + backend_name + "LoweredModule", | ||
"torch.jit." + backend_name + "_" + module_name + "_LoweredModule", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So are we sure that two instances of the same module type will not get lowered separately? What happens if that is the case? We will run into the same issue, no? I think we should probably use some unique id as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we also discussed this back then (the need some unique ID).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the qualified name of a module is unique. @suo and @SplitInfinity , could you confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think thats the case. You can instantiate one module type with many instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what @kimishpatel means is that you can have a parent module with 2 submodules of the same type.
So the submodule will indeed have a unique name, but then it will show up twice in the parent module.
If that has a problem then we need to mangle another unique id; otherwise, then what you did is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me quickly confirm "a parent module with 2 submodules of the same type" case and see if they have the same qualified name. Thanks @raziel and @kimishpatel !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After diving deeper, there are actually two issues:
- The issue for delegation that this PR is to resolve. Basically, a new Module is constructed here, with a class name as the first argument of the constructor. This class name should reflect the original class name. After this fix, different classes lowered to the same backend would also have different class names.
- General issue of bytecode serialization. For the same original class name but different instances (@kimishpatel and @raziel raised here). It is handled in TorchScript serialization, by mangling the names in a TypeNameUniquer. However, it's not mangled when serializing bytecode. As a result, there is discrepancy between the names in TS and in bytecode. This discrepancy appears only once at the first serialization. When it's loaded again the mangled name would be taken from bytecode. T93782563 is created to follow up this issue.
So for issue #1 we need to resolve it anyway I think this PR is still valid for that. The unique id for an instance is a separate issue and can be addressed in a centralized place in bytecode serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
"For the same original class name but different instances. It is handled in TorchScript serialization, by mangling the names in a TypeNameUniquer. However, it's not mangled when serializing bytecode."
So this is a general issue in TS that potentially affects any custom class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is a general issue in TS that potentially affects any custom class?
I think so. Not only for custom classes but for all class types. TS has handled it using type_name_uniquer_
in ScriptModuleSerializer
. I think bytecode needs to reflect the same pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we need to add some unique id to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
For what you said the remaining issue (at the instance level) should be solved somewhere else in the code, and this fix (at the type/class level) still makes sense.
… same backend" For a certain backend, the lowered models has a fixed name as `"torch.jit." + backend_name + "_LoweredModule"`. There is an issue of composite situations, where two different submodules are both lowered to the same backend. The submodule names are identical. It causes a bug in bytecode serialization, where the module names are not mangled (a follow up PR could be put to mangle the names to make them unique). As a result, the `__setstate__` method is only serialized for one submodule because of the name conflict. When loading other modules, the corresponding `__setstate__` cannot be found and run. The sub module is loaded as an ordinary nn module with properties in a dictionary, causing crash with error message, ""Expected GenericDict but got Tuple". In this PR it's fixed by adding the submodule's original (unique) qualified name to the lowered module name. It's also good for human understanding and debugging purposes. Test: Added unit test of `BackendTest.TestComposite` CI Differential Revision: [D29091143](https://our.internmc.facebook.com/intern/diff/D29091143) [ghstack-poisoned]
@iseeyuan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
… same backend" For a certain backend, the lowered models has a fixed name as `"torch.jit." + backend_name + "_LoweredModule"`. There is an issue of composite situations, where two different submodules are both lowered to the same backend. The submodule names are identical. It causes a bug in bytecode serialization, where the module names are not mangled (a follow up PR could be put to mangle the names to make them unique). As a result, the `__setstate__` method is only serialized for one submodule because of the name conflict. When loading other modules, the corresponding `__setstate__` cannot be found and run. The sub module is loaded as an ordinary nn module with properties in a dictionary, causing crash with error message, ""Expected GenericDict but got Tuple". In this PR it's fixed by adding the submodule's original (unique) qualified name to the lowered module name. It's also good for human understanding and debugging purposes. Test: Added unit test of `BackendTest.TestComposite` CI Differential Revision: [D29091143](https://our.internmc.facebook.com/intern/diff/D29091143) [ghstack-poisoned]
@iseeyuan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
… same backend" For a certain backend, the lowered models has a fixed name as `"torch.jit." + backend_name + "_LoweredModule"`. There is an issue of composite situations, where two different submodules are both lowered to the same backend. The submodule names are identical. It causes a bug in bytecode serialization, where the module names are not mangled (a follow up PR could be put to mangle the names to make them unique). As a result, the `__setstate__` method is only serialized for one submodule because of the name conflict. When loading other modules, the corresponding `__setstate__` cannot be found and run. The sub module is loaded as an ordinary nn module with properties in a dictionary, causing crash with error message, ""Expected GenericDict but got Tuple". In this PR it's fixed by adding the submodule's original (unique) qualified name to the lowered module name. It's also good for human understanding and debugging purposes. Test: Added unit test of `BackendTest.TestComposite` CI Differential Revision: [D29091143](https://our.internmc.facebook.com/intern/diff/D29091143) [ghstack-poisoned]
@iseeyuan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
… same backend" For a certain backend, the lowered models has a fixed name as `"torch.jit." + backend_name + "_LoweredModule"`. There is an issue of composite situations, where two different submodules are both lowered to the same backend. The submodule names are identical. It causes a bug in bytecode serialization, where the module names are not mangled (a follow up PR could be put to mangle the names to make them unique). As a result, the `__setstate__` method is only serialized for one submodule because of the name conflict. When loading other modules, the corresponding `__setstate__` cannot be found and run. The sub module is loaded as an ordinary nn module with properties in a dictionary, causing crash with error message, ""Expected GenericDict but got Tuple". In this PR it's fixed by adding the submodule's original (unique) qualified name to the lowered module name. It's also good for human understanding and debugging purposes. Test: Added unit test of `BackendTest.TestComposite` CI Differential Revision: [D29091143](https://our.internmc.facebook.com/intern/diff/D29091143) [ghstack-poisoned]
… same backend" For a certain backend, the lowered models has a fixed name as `"torch.jit." + backend_name + "_LoweredModule"`. There is an issue of composite situations, where two different submodules are both lowered to the same backend. The submodule names are identical. It causes a bug in bytecode serialization, where the module names are not mangled (a follow up PR could be put to mangle the names to make them unique). As a result, the `__setstate__` method is only serialized for one submodule because of the name conflict. When loading other modules, the corresponding `__setstate__` cannot be found and run. The sub module is loaded as an ordinary nn module with properties in a dictionary, causing crash with error message, ""Expected GenericDict but got Tuple". In this PR it's fixed by adding the submodule's original (unique) qualified name to the lowered module name. It's also good for human understanding and debugging purposes. Test: Added unit test of `BackendTest.TestComposite` CI Differential Revision: [D29091143](https://our.internmc.facebook.com/intern/diff/D29091143) [ghstack-poisoned]
@iseeyuan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
…nd (pytorch#59921) Summary: Pull Request resolved: pytorch#59921 Test Plan: Imported from OSS Reviewed By: raziel Differential Revision: D29091143 Pulled By: iseeyuan fbshipit-source-id: 9ffcd18681917ece8ec73a34866c53701bdee1bc
Stack from ghstack:
For a certain backend, the lowered models has a fixed name as
"torch.jit." + backend_name + "_LoweredModule"
. There is an issue of composite situations, where two different submodules are both lowered to the same backend. The submodule names are identical.It causes a bug in bytecode serialization, where the module names are not mangled (a follow up PR could be put to mangle the names to make them unique). As a result, the
__setstate__
method is only serialized for one submodule because of the name conflict. When loading other modules, the corresponding__setstate__
cannot be found and run. The sub module is loaded as an ordinary nn module with properties in a dictionary, causing crash with error message, ""Expected GenericDict but got Tuple".In this PR it's fixed by adding the submodule's original (unique) qualified name to the lowered module name. It's also good for human understanding and debugging purposes.
Test:
Added unit test of
BackendTest.TestComposite
CI
Differential Revision: D29091143