Move device type init from BackendSelect to backend kernels #37402


Closed
wants to merge 10 commits into from

Conversation


@bhosmer bhosmer commented Apr 28, 2020

Stack from ghstack:

Previously, BackendSelect kernels did just-in-time device type
initialization by calling LegacyTypeDispatch.initForDispatchKey()
with a computed dispatch key. Here we move the initialization to
the backend kernels themselves, where we can call the device-
specific initializer directly.
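As a rough sketch of the difference (the names `LegacyTypeDispatch`, `initForDispatchKey`, and `initCUDA` come from the PR; the Python structure, the toy key computation, and the kernel names `empty_backend_select` / `empty_cuda` are illustrative only, not PyTorch's actual generated code):

```python
class LegacyTypeDispatch:
    def __init__(self):
        self.initialized = set()

    # Old scheme: BackendSelect kernels computed a dispatch key and
    # routed through this generic, string/key-driven entry point.
    def initForDispatchKey(self, key):
        getattr(self, "init" + key)()

    # New scheme: backend kernels call these directly.
    def initCPU(self):
        self.initialized.add("CPU")

    def initCUDA(self):
        self.initialized.add("CUDA")

_dispatch = LegacyTypeDispatch()

def globalLegacyTypeDispatch():
    return _dispatch

def empty_backend_select(device):
    # Old: just-in-time init via a computed dispatch key.
    key = "CUDA" if device == "cuda" else "CPU"
    globalLegacyTypeDispatch().initForDispatchKey(key)
    # ...redispatch to the backend kernel...

def empty_cuda():
    # New: the backend kernel invokes its device-specific
    # initializer directly, with no key computation.
    globalLegacyTypeDispatch().initCUDA()
    # ...allocate the tensor...

empty_cuda()
assert "CUDA" in globalLegacyTypeDispatch().initialized
```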

**Note on HIPification**: this PR introduces direct calls to device-specific initializers in generated code; in particular, `globalLegacyTypeDispatch().initCUDA()` is called in factory kernels defined in `CUDAType.cpp`. No changes have been made to the conversion defined in https://github.com/pytorch/pytorch/tree/master/torch/utils/hipify and run by `build_amd.py`, so these calls remain in HIPified code. This isn't unusual: HIPified code contains unrenamed functions whose behavior has been repurposed. We rely on tests to verify that this call performs the correct initialization on HIPified builds.
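A toy illustration of that point (the two-entry rename table below is hypothetical and far smaller than hipify's real mapping; the aim is only to show that a call absent from the table, like `initCUDA()`, survives translation unchanged):

```python
# Hypothetical, drastically simplified rename table; real hipify
# applies many more mappings via regex-based source rewriting.
RENAMES = {"cudaMalloc": "hipMalloc", "cudaFree": "hipFree"}

def hipify(source):
    # Apply each rename; identifiers not in the table pass through.
    for cuda_name, hip_name in RENAMES.items():
        source = source.replace(cuda_name, hip_name)
    return source

src = "cudaMalloc(ptr); globalLegacyTypeDispatch().initCUDA();"
out = hipify(src)
# The allocation call is renamed, but initCUDA() is untouched --
# on a HIP build it is expected to perform the HIP initialization.
assert out == "hipMalloc(ptr); globalLegacyTypeDispatch().initCUDA();"
```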

Differential Revision: D21282974

Putting this up to run tests on it, but a couple questions remain:
* why were only BackendSelect kernels doing this initialization?
  Not all factory ops appear there, nor are all the ops that do
  appear there factory ops. Currently we generate init code for
  exactly the BackendSelect ops, but the choice should be better
  motivated.
* the previous scheme maps HIP to its own legacy type dispatch
  entry, but the logic assumes it's exclusive with CUDA, and no
  ops appear to mention HIP explicitly, so the new logic doesn't
  expose a static entry point for it. Needs to be verified.

bhosmer pushed a commit that referenced this pull request Apr 28, 2020
ghstack-source-id: 51d4dca
Pull Request resolved: #37402
dr-ci bot commented Apr 28, 2020

💊 Build failures summary and remediations

As of commit 4a27ab6 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



bhosmer pushed a commit that referenced this pull request Apr 28, 2020
ghstack-source-id: 9b8000a
Pull Request resolved: #37402
bhosmer pushed a commit that referenced this pull request Apr 29, 2020
ghstack-source-id: ec9f3d3
Pull Request resolved: #37402
bhosmer pushed a commit that referenced this pull request Apr 29, 2020
ghstack-source-id: e574477
Pull Request resolved: #37402
bhosmer pushed a commit that referenced this pull request May 1, 2020
ghstack-source-id: fa2bbeb
Pull Request resolved: #37402
@bhosmer bhosmer requested a review from smessmer May 1, 2020 23:05
bhosmer pushed a commit that referenced this pull request May 3, 2020
ghstack-source-id: 829336a
Pull Request resolved: #37402
@bhosmer bhosmer requested a review from ezyang May 4, 2020 16:21
bhosmer pushed a commit that referenced this pull request May 4, 2020
ghstack-source-id: e91e8f7
Pull Request resolved: #37402
@@ -20,26 +20,26 @@ namespace at {

class CAFFE2_API LegacyTypeDispatch {
Contributor

Probably an even further improvement (which is not for this PR) would be to eliminate the dynamic dispatch indirection, now that the initializations are being done from code that has direct access to it (i.e., is not from `torch_cpu`).

def is_factory(option):
# type: (FunctionOption) -> bool
formals = option['formals_list']
return find_formal_by_type('TensorOptions', formals) is not None and 'method' not in option['variants']
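To make the gating concrete, here is a rough usage sketch of `is_factory` (the option dicts and the inline `find_formal_by_type` below are simplified stand-ins for the real codegen structures, which carry many more fields):

```python
def find_formal_by_type(type_name, formals):
    # Simplified stand-in: formals are plain dicts here, not the
    # codegen's real formal records.
    for formal in formals:
        if formal.get('type') == type_name:
            return formal
    return None

def is_factory(option):
    # A factory op takes TensorOptions and has no method variant.
    formals = option['formals_list']
    return (find_formal_by_type('TensorOptions', formals) is not None
            and 'method' not in option['variants'])

# Hypothetical, pared-down option entries for illustration.
empty_op = {
    'formals_list': [{'name': 'size', 'type': 'IntArrayRef'},
                     {'name': 'options', 'type': 'TensorOptions'}],
    'variants': ['function'],
}
add_op = {
    'formals_list': [{'name': 'self', 'type': 'Tensor'},
                     {'name': 'other', 'type': 'Tensor'}],
    'variants': ['function', 'method'],
}

assert is_factory(empty_op)      # TensorOptions formal, no method variant
assert not is_factory(add_op)    # no TensorOptions formal
```

Ops classified this way would be the ones for which the device-init call is emitted into the backend kernel.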
Contributor


I guess these are all non-substantive refactors?

Author


Yeah, sorry about the noisy diff. This just factors out the old RHS of `is_factory_method` (line 1224) so it can be used to gate the device init code as well. `find_formal_by_name` is just the old `find_formal`, moved up and out to accompany it.

@facebook-github-bot
Contributor

@bhosmer merged this pull request in 209c6f9.

ShawnZhong pushed a commit to ShawnZhong/pytorch that referenced this pull request May 5, 2020
Test Plan: Imported from OSS

Differential Revision: D21282974

Pulled By: bhosmer

fbshipit-source-id: cd46eb788596948e0572a15fac0f8b43feca5d75
bharatr21 pushed a commit to bharatr21/pytorch that referenced this pull request May 5, 2020
@facebook-github-bot facebook-github-bot deleted the gh/bhosmer/27/head branch May 8, 2020 14:16