Move device type init from BackendSelect to backend kernels #37402
Conversation
Previously, BackendSelect kernels did just-in-time device type initialization by calling `LegacyTypeDispatch.initForDispatchKey()` with a computed dispatch key. Here we move the initialization to the backend kernels themselves, where we can call the device-specific initializer directly. Putting this up to run tests on it, but a couple questions remain:

* why were only BackendSelect kernels doing this initialization? Not all factory ops appear there, nor are all the ops that do appear there factory ops. Currently we generate init code for exactly the BackendSelect ops, but the choice should be better motivated.
* the previous scheme maps HIP to its own legacy type dispatch entry, but the logic assumes it's exclusive with CUDA, and no ops appear to mention HIP explicitly, so the new logic doesn't expose a static entry point for it. Needs to be verified.
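Roughly, the shape of the change as a sketch (the kernel and helper names below are placeholders for illustration, not the actual generated code):

```cpp
// Sketch only: apart from initForDispatchKey()/initCUDA(), the names here
// are placeholders, not the real generated signatures.

// Before: the BackendSelect kernel computed a dispatch key from the
// TensorOptions and did just-in-time device init keyed on it.
Tensor empty_backend_select(IntArrayRef size, const TensorOptions& options) {
  DispatchKey key = dispatch_key_from_options(options);  // hypothetical helper
  globalLegacyTypeDispatch().initForDispatchKey(key);    // JIT device init
  return redispatch_to_backend(key, size, options);      // hypothetical helper
}

// After: each backend kernel calls its own device-specific initializer
// directly, so no dispatch key needs to be computed just for init.
Tensor empty_cuda(IntArrayRef size, const TensorOptions& options) {
  globalLegacyTypeDispatch().initCUDA();                 // direct device init
  return cuda_empty_impl(size, options);                 // hypothetical backend impl
}
```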
**Note on HIPification**: this PR introduces direct calls to device-specific initializers in generated code; in particular, `globalLegacyTypeDispatch().initCUDA()` is called in factory kernels defined in `CUDAType.cpp`. No changes have been made to the conversion defined in https://github.com/pytorch/pytorch/tree/master/torch/utils/hipify and run by [build_amd.py](https://github.com/pytorch/pytorch/blob/master/tools/amd_build/build_amd.py), so these calls remain in HIPified code. This isn't unusual - HIPified code contains unrenamed functions whose behavior has been repurposed. Relying on tests to verify that this call performs the correct initialization on HIPified builds.

Differential Revision: [D21282974](https://our.internmc.facebook.com/intern/diff/D21282974)
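To make the hipify point concrete, a sketch (illustrative only, not actual generated or HIPified code):

```cpp
// Illustrative sketch of hipify's effect on a generated factory kernel.
Tensor empty_cuda(IntArrayRef size, const TensorOptions& options) {
  // A CUDA runtime call here would be renamed textually by hipify,
  // e.g. cudaDeviceSynchronize() -> hipDeviceSynchronize().
  // The init call below is not a CUDA API name, so hipify leaves it as-is;
  // on a ROCm build this unrenamed call is what is relied on to perform the
  // HIP-side initialization (to be confirmed by tests, per the note above).
  globalLegacyTypeDispatch().initCUDA();
  return cuda_empty_impl(size, options);  // hypothetical backend impl
}
```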
```diff
@@ -20,26 +20,26 @@ namespace at {
 class CAFFE2_API LegacyTypeDispatch {
```
Probably an even further improvement (which is not for this PR) would be to eliminate the dynamic dispatch indirection, now that the initializations are being done from code that has direct access to it (i.e., is not from `torch_cpu`).
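For concreteness, one possible shape of that follow-up, purely as a hypothetical sketch (not part of this PR; the names below are invented):

```cpp
// Hypothetical follow-up sketch only; not part of this PR.
#include <mutex>

// A lazy initializer compiled into the same library as the CUDA kernels,
// callable directly from the generated code instead of going through the
// dynamic-dispatch indirection in LegacyTypeDispatch.
void lazy_init_cuda_backend() {
  static std::once_flag once;
  std::call_once(once, [] {
    // one-time CUDA type/state initialization would go here
  });
}

// The generated kernels would then call lazy_init_cuda_backend() in place of
// globalLegacyTypeDispatch().initCUDA().
```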
```python
def is_factory(option):
    # type: (FunctionOption) -> bool
    formals = option['formals_list']
    return find_formal_by_type('TensorOptions', formals) is not None and 'method' not in option['variants']
```
I guess these are all non-substantive refactors?
Yeah - sorry about the noisy diff. This just factors out the old RHS of `is_factory_method` (line 1224) so it can be used to gate the device init code as well. `find_formal_by_name` is just the old `find_formal`, moved up and out to accompany it.
…37402)

Summary:
Pull Request resolved: pytorch#37402

Previously, BackendSelect kernels did just-in-time device type initialization by calling `LegacyTypeDispatch.initForDispatchKey()` with a computed dispatch key. Here we move the initialization to the backend kernels themselves, where we can call the device-specific initializer directly. Putting this up to run tests on it, but a couple questions remain:

* why were only BackendSelect kernels doing this initialization? Not all factory ops appear there, nor are all the ops that do appear there factory ops. Currently we generate init code for exactly the BackendSelect ops, but the choice should be better motivated.
* the previous scheme maps HIP to its own legacy type dispatch entry, but the logic assumes it's exclusive with CUDA, and no ops appear to mention HIP explicitly, so the new logic doesn't expose a static entry point for it. Needs to be verified.

Test Plan: Imported from OSS

Differential Revision: D21282974

Pulled By: bhosmer

fbshipit-source-id: cd46eb788596948e0572a15fac0f8b43feca5d75