[vulkan] Add mean.dim op for vulkan #47312
[ghstack-poisoned]
layout(set = 0, binding = 2) uniform constBlock {
layout(set = 0, binding = 0, rgba16f) uniform PRECISION restrict writeonly image3D uOutput;
layout(set = 0, binding = 1) uniform PRECISION sampler3D uInput;
layout(set = 0, binding = 2) uniform Block {
Please mark this uniform block with PRECISION and restrict.
for (xi = 0; xi < W; ++xi) {
for (yi = 0; yi < H; ++yi) {
for (xi = 0; xi < uBlock.W; ++xi) {
for (yi = 0; yi < uBlock.H; ++yi) {
Iterate over y in the outer loop and x in the inner loop. The texture here is packed in an opaque format, so this might not apply, but if and when the memory is laid out linearly, that traversal has better locality of access.
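To illustrate the traversal order suggested above, here is a minimal CPU-side sketch in C++, not the shader from this PR. It assumes a single channel plane stored row-major, so element (y, x) sits at index y * W + x. The function name `mean_hw_row_major` and that layout are hypothetical, chosen only for illustration. Whether the same benefit carries over to an opaque texture format is exactly the open question raised in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: mean over the H and W dimensions of one
// channel plane. Assumes a linear row-major layout (index = y * W + x),
// which is NOT guaranteed for Vulkan textures.
float mean_hw_row_major(const std::vector<float>& plane,
                        std::size_t H, std::size_t W) {
  assert(H > 0 && W > 0 && plane.size() == H * W);
  float acc = 0.0f;
  for (std::size_t y = 0; y < H; ++y) {       // outer loop walks rows
    const float* row = plane.data() + y * W;  // start of row y
    for (std::size_t x = 0; x < W; ++x) {     // inner loop reads contiguous memory
      acc += row[x];
    }
  }
  return acc / static_cast<float>(H * W);
}
```

With y in the outer loop, consecutive inner iterations read adjacent addresses. Swapping the loops would stride by W elements on every step.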
Check adaptive_avg_pool shader here: https://github.com/pytorch/pytorch/pull/47261/files
int OW = uConstBlock.OW;
int OH = uConstBlock.OH;
vec4 r = vec4(1.0) / float(W) / float(H);
vec4 r = vec4(1.0) / float(uBlock.W) / float(uBlock.H);
Check adaptive_avg_pool at https://github.com/pytorch/pytorch/pull/47261/files for another implementation. Divisions are typically slower than multiplications.
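One way to read the point about divisions is to compute the reciprocal of the area once and multiply by it afterwards. The C++ sketch below shows that idea on the CPU. It is not the shader code from either PR, and `scale_by_inverse_area` is a hypothetical helper name. How much this matters on a given GPU is not established here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: turn accumulated sums into means by multiplying
// with a precomputed 1 / (H * W) instead of dividing each value twice.
void scale_by_inverse_area(std::vector<float>& sums, int H, int W) {
  assert(H > 0 && W > 0);
  // Single division, done once up front.
  const float inv_area =
      1.0f / (static_cast<float>(H) * static_cast<float>(W));
  for (float& s : sums) {
    s *= inv_area;  // one multiplication per element
  }
}
```

Multiplying by a reciprocal can differ from a true division in the last bit of the result, so a comparison against a reference output would normally use a tolerance.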
@@ -169,6 +169,42 @@ TEST(VulkanTest, mm) {
ASSERT_TRUE(check);
}

TEST(VulkanTest, mean) {
This should be VulkanAPITest.

const auto check = almostEqual(t_out, t_out_expected);
if (!check) {
//std::cout << "original:\n" << t_in << std::endl;
Remove this or uncomment it?
}
vec4 outValue = r * acc;

int test = (imageSize(uOutput).x*pos.x + pos.x);
Clean this up?
💊 CI failures summary and remediations
As of commit 9185eb4 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
pytorch_linux_bionic_py3_8_gcc9_coverage_build (1/1)
Step: "Build"
[ghstack-poisoned]
ghstack-source-id: 19a3669e01f721c1ae14b7ea02cd0a901c65ece3
Pull Request resolved: #47312
Stack from ghstack:
Differential Revision: D24713617