Decomp for nn.functional.grid_sampler_2d #84350
✅ No Failures (0 Pending) as of commit 842de6c (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet.
This comment was automatically generated by Dr. CI.
x = grid[..., 0]
y = grid[..., 1]
Something to benchmark / think about: try doing grid[..., [0, 1]] to merge both calls. We could then see whether we can do everything in a vectorised fashion via broadcasting, at least for interpolation_mode == 0.
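A minimal sketch of what a vectorised bilinear path (interpolation_mode == 0, i.e. mode="bilinear") could look like, assuming zeros padding only. Instead of indexing x and y separately, it broadcasts a (W, H) size tensor against the last grid dimension so both coordinates are unnormalized in a single op. The function name bilinear_grid_sample is hypothetical, and this is not the decomposition merged in this PR; it is only checked against torch.nn.functional.grid_sample.

```python
import torch
import torch.nn.functional as F


def bilinear_grid_sample(inp: torch.Tensor, grid: torch.Tensor,
                         align_corners: bool = False) -> torch.Tensor:
    """Hypothetical bilinear grid_sampler_2d sketch (zeros padding only).

    inp: (N, C, H, W); grid: (N, H_out, W_out, 2) holding normalized (x, y) in [-1, 1].
    """
    N, C, H, W = inp.shape
    _, H_out, W_out, _ = grid.shape
    # The last grid dim is (x, y), so x scales with W and y with H.
    size = torch.tensor([W, H], dtype=grid.dtype, device=grid.device)
    # Unnormalize both coordinates at once via broadcasting.
    if align_corners:
        coords = (grid + 1) / 2 * (size - 1)
    else:
        coords = ((grid + 1) * size - 1) / 2
    corner = coords.floor()
    frac = coords - corner  # fractional offsets in [0, 1)
    ix0, iy0 = corner.long().unbind(-1)  # top-left corner indices, (N, H_out, W_out)
    fx, fy = frac.unbind(-1)

    flat = inp.reshape(N, C, H * W)
    out = inp.new_zeros(N, C, H_out, W_out)
    for dy in (0, 1):
        for dx in (0, 1):
            ix, iy = ix0 + dx, iy0 + dy
            # Bilinear weight of this corner.
            w = (fx if dx else 1 - fx) * (fy if dy else 1 - fy)
            # Zeros padding: out-of-bounds corners contribute nothing.
            valid = (ix >= 0) & (ix < W) & (iy >= 0) & (iy < H)
            w = w * valid.to(w.dtype)
            # Clamp so gather stays in bounds; masked corners have weight 0 anyway.
            idx = (iy.clamp(0, H - 1) * W + ix.clamp(0, W - 1)).reshape(N, 1, H_out * W_out)
            vals = flat.gather(2, idx.expand(N, C, H_out * W_out))
            out = out + vals.reshape(N, C, H_out, W_out) * w.unsqueeze(1)
    return out
```

Looping over the four corners keeps only one gathered (N, C, H_out, W_out) tensor alive at a time; stacking the corners along an extra dimension instead would trade memory for fewer ops.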
Note: I ended up finding a small bug, related to rounding, in an edge case of the CUDA implementation. This PR also fixes it.
@pytorchbot merge -g
@pytorchbot successfully started a merge job. Check the current status here.
Hey @fdrocha.
There is now a decomposition in PyTorch that seems to have better performance; see benchmarks at pytorch/pytorch#84350. Pull Request resolved: #1134
Summary: Pull Request resolved: #84350
Approved by: https://github.com/jansel, https://github.com/Lezcano
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/91a5f52f51de9d6aa305d184fe07fe15d20b82c9
Reviewed By: mehtanirav
Differential Revision: D39277804
fbshipit-source-id: dab01a97cea62949684a12ae7a785a295dcb1ff9
I ran some benchmarks comparing the performance of eager mode, torch.inductor using this decomposition, and torch.inductor using the previously existing lowering of this function:
Benchmarks
The decomposed version seems to be the fastest most of the time. In two rows of the table the lowering is 10% faster, but for larger sizes in particular the decomposition is 20% faster.
Here is the script used to run the benchmarks:
Script
And here is the code generated by torch inductor for decomp and lowering versions:
Generated code
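The benchmark script and generated code are not reproduced on this page. As a hedged illustration of how eager timings for this kind of comparison can be collected, here is a sketch using torch.utils.benchmark.Timer. The helper name time_grid_sample and the input sizes are hypothetical, and the torch.inductor variants compared above are not included; a compiled function could be timed the same way by passing it in through globals.

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark


def time_grid_sample(n, c, h, w, align_corners=False, device="cpu"):
    """Median runtime (seconds) of eager F.grid_sample for one hypothetical input size."""
    inp = torch.randn(n, c, h, w, device=device)
    # Output has the same spatial size as the input; grid values lie in [-1, 1).
    grid = torch.rand(n, h, w, 2, device=device) * 2 - 1
    timer = benchmark.Timer(
        stmt="F.grid_sample(inp, grid, mode='bilinear', padding_mode='zeros', align_corners=ac)",
        globals={"F": F, "inp": inp, "grid": grid, "ac": align_corners},
    )
    # blocked_autorange chooses the number of runs; Timer handles CUDA synchronization.
    return timer.blocked_autorange(min_run_time=0.2).median


for shape in [(1, 3, 32, 32), (8, 3, 128, 128)]:
    print(shape, f"{time_grid_sample(*shape) * 1e6:.1f} us")
```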