Structured kernels generate Meta registrations #48116
Conversation
If you port kernels to be structured, you get Meta kernels automatically generated for you. This is one payoff of structured kernels.

Code generation was mercifully simple, although at risk of "swiss cheese" syndrome: there are two new conditionals in the codegen that tweak behavior when generating for meta keys. It's not too bad right now, but there's a risk of things getting out of hand. One way to rationalize the logic here would be to transmit "TensorMeta-ness" inside the TensorOptions (so tensor_from_meta can deal with it); then the "Meta" kernel magic would literally just be generating empty out_impls to call after all the scaffolding is done. But I didn't do this because it seemed like it would be more annoying in the short term.

I also had to teach resize_ to work on meta tensors, since we use them to implement the out kernels.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
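To make the payoff concrete, here is a minimal sketch of what an auto-generated Meta kernel buys you, modeled on the test added in this PR. It assumes the op has been ported to structured (as `upsample_nearest1d` is here), and that the era's `torch.empty_meta` factory accepts sizes the way `torch.empty` does; later PyTorch spells this `torch.empty(..., device='meta')`.

```python
import torch

# Minimal sketch, modeled on this PR's test: pure shape inference, no compute.
# Assumes upsample_nearest1d is ported to structured (so a Meta kernel exists)
# and that torch.empty_meta accepts sizes like torch.empty does.
x = torch.empty_meta(2, 3, 8)   # carries only sizes/strides/dtype, no storage
z = torch.empty_meta(0)         # out= placeholder; the kernel resize_'s it

torch._C._nn.upsample_nearest1d(x, (16,), 2, out=z)

# Output metadata was computed, but no data was ever allocated or written.
print(z.size())  # torch.Size([2, 3, 16])
```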
💊 CI failures summary (Dr. CI, as of commit 69b0f4d): ✅ None of the CI failures appear to be your fault.
```cpp
inline Tensor tensor_from_meta(const TensorMeta& meta) {
  // TODO: eliminate indirection
  return at::empty_meta(meta.sizes, meta.options);
}
```
What's the downside of this indirection? I think the wrapper function name makes it a little clearer what the goal is in the codegen.
It's actually a different indirection: `at::empty_meta` makes a hop through the dispatcher, but that's unnecessary; we could hardcode `at::native::empty_meta` here. I haven't done it yet because I want to keep the implementation simple for now.
ahhh yep
```diff
@@ -1672,6 +1672,7 @@
   CPU: resize_
   CUDA: resize_cuda_
   QuantizedCPU: quantized_resize_cpu_
+  Meta: resize_meta_
```
This means that we can't make `resize_` structured, right? You have a check in `gen_structured` to assert that people don't explicitly include their own structured implementations.
Possible alternative: just call the meta function (`resize_meta_`) directly rather than through the dispatcher in our codegen. I think that just means we'll need to call `meta_tensor_from_meta` to convert it to a tensor directly in the codegen as well. Although you'd only want to do that when the dispatch key is `Meta`, which means emitting more different code for different kernels :/
Aside: I'm not clear on the use case for `out`-variant `meta` functions. If your goal is to find the meta-info of the output tensor, you can just examine the `out` argument directly, right?

I could imagine the goal being to take an existing model, stick the `Meta` dispatch key on all the input tensors, and run everything without having to make changes to the model. That's probably it.
> This means that we can't make resize_ structured, right?

This is correct. In fact, we can't make `resize_` structured anyway, because there is no out variant nor purely functional variant (nor would they make sense).

> Possible alternative: just call the meta function (resize_meta_) directly rather than through the dispatcher in our codegen.

We should do this anyway.

> I could imagine the goal being to take an existing model, stick the Meta dispatch key on all the input tensors, and run everything without having to make changes to the model. That's probably it.

Yep!
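A hedged sketch of that use case, in the later spelling (`device='meta'` and `Module.to('meta')` postdate this PR): shapes and dtypes propagate through a whole model with no parameter storage and no real compute, assuming every op the model hits has a Meta kernel.

```python
import torch

# Hedged sketch of the use case above, in the later device='meta' spelling
# (which postdates this PR). Assumes every op the model hits has a Meta kernel.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
).to('meta')                            # parameters become data-less meta tensors

x = torch.empty(32, 128, device='meta')
y = model(x)                            # metadata propagates; no FLOPs, no memory
print(y.shape)                          # torch.Size([32, 10])
```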
test/test_torch.py (outdated)
```python
# (not sure why passing None here doesn't work? How strange...)
z = torch.empty_meta(0)
torch._C._nn.upsample_nearest1d(x, (4 * 10 ** 10,), 2, out=z)
self.assertEqual(z.size(), (2 * 10 ** 10, 3, 4 * 10 ** 10))
```
maybe worth asserting that we didn't actually perform the computation? Also totally understand that the meta testing is subject to change.
I'm not... sure exactly how to test that haha. I'm sort of trying to implicitly test this by using ludicrously large sizes here; if you actually allocated these you'd probably OOM.
I was just thinking of running the meta op, running the actual op, then asserting that their `out` results aren't equal. (I'm not actually sure what the contents of the `empty_meta` tensor are; if it's uninitialized memory we could just zero it out first.)
A meta tensor should give a predictable error if you try to access its data, right? Can we just test by expecting that error here?

In master, `torch.empty_meta(10)[0]` gets you a RuntimeError:

RuntimeError: Could not run 'aten::as_strided' with arguments from the 'Meta' backend.

Expecting a RuntimeError on data access (not on `as_strided` specifically, just in general) seems like a reasonable proxy for "didn't compute".
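A minimal sketch of that proxy test, assuming the era's behavior quoted above (any data access on a meta tensor raises a RuntimeError; exactly which ops raise has changed in later PyTorch versions):

```python
import torch

# Minimal sketch of the proposed proxy test. Assumes this-era behavior,
# where any data access on a meta tensor raises a RuntimeError.
z = torch.empty_meta(10)
assert z.size() == (10,)      # metadata is available...

try:
    z[0]                      # ...but touching data should not be
except RuntimeError:
    pass                      # "didn't compute" proxy: data access raised
else:
    raise AssertionError("meta tensor unexpectedly allowed data access")
```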
The ASAN failure on this PR is legit, but it looks like it's just because I picked input sizes that are too big.
Summary: Pull Request resolved: pytorch#48116 (description above). Test Plan: Imported from OSS. Reviewed By: bhosmer, ailzhang. Differential Revision: D25056640. Pulled By: ezyang.
Summary: Pull Request resolved: #48731. Signed-off-by: Edward Z. Yang <ezyang@fb.com>. Test Plan: Imported from OSS. Reviewed By: bhosmer. Differential Revision: D25278034. Pulled By: ezyang.