add channels last support for ReflectionPad on CPU #99608
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/99608. Note: links to docs will display an error until the docs builds have completed. ❌ 2 new failures as of commit 1eac7e6. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
…nPad on CPU" Fix #96738 This patch add channels last support for ReflectionPad2d/3d and ReplicationPad2d/3d on CPU backend. The following test cases will pass with this patch: ``` python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32 python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad2d_cpu_float32 ``` cc jgong5 XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
…nPad on CPU" Fix #96738 This patch add channels last support for ReflectionPad2d/3d and ReplicationPad2d/3d on CPU backend. The following test cases will pass with this patch: ``` python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32 python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad2d_cpu_float32 ``` This patch improves padding performance on CPU, which: * original kernel has nested paralleled loops, e.g. first on dim of **batch** and then on dim of **channels**, this is not optimal practice when N * C is small. This patch did dimension collapse on NC and adjacent spatial dims to maximize the parallelism scope. * original kernel is scalar logic. This patch did vectorization on dim of **width** on NCHW, did vectorization on **channels** on NHWC. The following benchmark result gathered on Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket. ### single core inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.281 ms; ; NHWC: 0.356 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 55.675 ms; ; NHWC: 86.821 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.265 ms; ; NHWC: 0.339 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 52.336 ms; ; NHWC: 82.935 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.049 ms; ; NHWC: 0.328 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.252 ms; ; NHWC: 16.806 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.048 ms; ; NHWC: 0.324 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.199 ms; ; NHWC: 16.717 ms ``` ### single socket inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.118 ms; ; NHWC: 0.142 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 4.023 ms; ; NHWC: 7.367 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.111 ms; ; NHWC: 0.135 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.885 ms; ; NHWC: 7.203 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.010 ms; ; NHWC: 0.027 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.149 ms; ; NHWC: 3.181 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.011 ms; ; NHWC: 0.029 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.148 ms; ; NHWC: 3.174 ms ``` Notes: * when C < vector length: on NCHW, the vectorization is done on **width** on NCHW when the output index is overlapped with the input index; on NHWC, it is scalar logic, so it will be slower than NCHW. * when C >= vector length: NCHW and NHWC perf are similar. cc jgong5 XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
…nPad on CPU" Fix #96738 This patch add channels last support for ReflectionPad2d/3d and ReplicationPad2d/3d on CPU backend. The following test cases will pass with this patch: ``` python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32 python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad2d_cpu_float32 ``` This patch improves padding performance on CPU, which: * original kernel has nested paralleled loops, e.g. first on dim of **batch** and then on dim of **channels**, this is not optimal practice when N * C is small. This patch did dimension collapse on NC and adjacent spatial dims to maximize the parallelism scope. * original kernel is scalar logic. This patch did vectorization on dim of **width** on NCHW, did vectorization on **channels** on NHWC. The following benchmark result gathered on Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket. ### single core inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.281 ms; ; NHWC: 0.356 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 55.675 ms; ; NHWC: 86.821 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.265 ms; ; NHWC: 0.339 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 52.336 ms; ; NHWC: 82.935 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.049 ms; ; NHWC: 0.328 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.252 ms; ; NHWC: 16.806 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.048 ms; ; NHWC: 0.324 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.199 ms; ; NHWC: 16.717 ms ``` ### single socket inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.118 ms; ; NHWC: 0.142 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 4.023 ms; ; NHWC: 7.367 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.111 ms; ; NHWC: 0.135 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.885 ms; ; NHWC: 7.203 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.010 ms; ; NHWC: 0.027 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.149 ms; ; NHWC: 3.181 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.011 ms; ; NHWC: 0.029 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.148 ms; ; NHWC: 3.174 ms ``` Notes: * when C < vector length: on NCHW, the vectorization is done on **width** on NCHW when the output index is overlapped with the input index; on NHWC, it is scalar logic, so it will be slower than NCHW. * when C >= vector length: NCHW and NHWC perf are similar. cc jgong5 XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
…nPad on CPU" Fix #96738 This patch add channels last support for ReflectionPad2d/3d and ReplicationPad2d/3d on CPU backend. The following test cases will pass with this patch: ``` python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32 python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad2d_cpu_float32 ``` This patch improves padding performance on CPU, which: * original kernel has nested paralleled loops, e.g. first on dim of **batch** and then on dim of **channels**, this is not optimal practice when N * C is small. This patch did dimension collapse on NC and adjacent spatial dims to maximize the parallelism scope. * original kernel is scalar logic. This patch did vectorization on dim of **width** on NCHW, did vectorization on **channels** on NHWC. The following benchmark result gathered on Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket. ### single core inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.281 ms; ; NHWC: 0.356 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 55.675 ms; ; NHWC: 86.821 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.265 ms; ; NHWC: 0.339 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 52.336 ms; ; NHWC: 82.935 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.049 ms; ; NHWC: 0.328 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.252 ms; ; NHWC: 16.806 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.048 ms; ; NHWC: 0.324 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.199 ms; ; NHWC: 16.717 ms ``` ### single socket inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.118 ms; ; NHWC: 0.142 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 4.023 ms; ; NHWC: 7.367 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.111 ms; ; NHWC: 0.135 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.885 ms; ; NHWC: 7.203 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.010 ms; ; NHWC: 0.027 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.149 ms; ; NHWC: 3.181 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.011 ms; ; NHWC: 0.029 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.148 ms; ; NHWC: 3.174 ms ``` Notes: * when C < vector length: on NCHW, the vectorization is done on **width** on NCHW when the output index is overlapped with the input index; on NHWC, it is scalar logic, so it will be slower than NCHW. * when C >= vector length: NCHW and NHWC perf are similar. cc jgong5 XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
…nPad on CPU" Fix #96738 This patch add channels last support for ReflectionPad2d/3d and ReplicationPad2d/3d on CPU backend. The following test cases will pass with this patch: ``` python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32 python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad2d_cpu_float32 ``` This patch improves padding performance on CPU, which: * original kernel has nested paralleled loops, e.g. first on dim of **batch** and then on dim of **channels**, this is not optimal practice when N * C is small. This patch did dimension collapse on NC and adjacent spatial dims to maximize the parallelism scope. * original kernel is scalar logic. This patch did vectorization on dim of **width** on NCHW, did vectorization on **channels** on NHWC. The following benchmark result gathered on Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket. ### single core inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.281 ms; ; NHWC: 0.356 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 55.675 ms; ; NHWC: 86.821 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.265 ms; ; NHWC: 0.339 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 52.336 ms; ; NHWC: 82.935 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.049 ms; ; NHWC: 0.328 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.252 ms; ; NHWC: 16.806 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.048 ms; ; NHWC: 0.324 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.199 ms; ; NHWC: 16.717 ms ``` ### single socket inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.118 ms; ; NHWC: 0.142 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 4.023 ms; ; NHWC: 7.367 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.111 ms; ; NHWC: 0.135 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.885 ms; ; NHWC: 7.203 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.010 ms; ; NHWC: 0.027 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.149 ms; ; NHWC: 3.181 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.011 ms; ; NHWC: 0.029 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.148 ms; ; NHWC: 3.174 ms ``` Notes: * when C < vector length: on NCHW, the vectorization is done on **width** on NCHW when the output index is overlapped with the input index; on NHWC, it is scalar logic, so it will be slower than NCHW. * when C >= vector length: NCHW and NHWC perf are similar. cc jgong5 XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
…nPad on CPU" Fix #96738 This patch add channels last support for ReflectionPad2d/3d and ReplicationPad2d/3d on CPU backend. The following test cases will pass with this patch: ``` python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32 python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad2d_cpu_float32 ``` This patch improves padding performance on CPU, which: * original kernel has nested paralleled loops, e.g. first on dim of **batch** and then on dim of **channels**, this is not optimal practice when N * C is small. This patch did dimension collapse on NC and adjacent spatial dims to maximize the parallelism scope. * original kernel is scalar logic. This patch did vectorization on dim of **width** on NCHW, did vectorization on **channels** on NHWC. The following benchmark result gathered on Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket. ### single core inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.281 ms; ; NHWC: 0.356 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 55.675 ms; ; NHWC: 86.821 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.265 ms; ; NHWC: 0.339 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 52.336 ms; ; NHWC: 82.935 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.049 ms; ; NHWC: 0.328 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.252 ms; ; NHWC: 16.806 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.048 ms; ; NHWC: 0.324 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.199 ms; ; NHWC: 16.717 ms ``` ### single socket inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.118 ms; ; NHWC: 0.142 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 4.023 ms; ; NHWC: 7.367 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.111 ms; ; NHWC: 0.135 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.885 ms; ; NHWC: 7.203 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.010 ms; ; NHWC: 0.027 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.149 ms; ; NHWC: 3.181 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.011 ms; ; NHWC: 0.029 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.148 ms; ; NHWC: 3.174 ms ``` Notes: * when C < vector length: on NCHW, the vectorization is done on **width** on NCHW when the output index is overlapped with the input index; on NHWC, it is scalar logic, so it will be slower than NCHW. * when C >= vector length: NCHW and NHWC perf are similar. cc jgong5 XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
…nPad on CPU" Fix #96738 This patch add channels last support for ReflectionPad2d/3d and ReplicationPad2d/3d on CPU backend. The following test cases will pass with this patch: ``` python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32 python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad2d_cpu_float32 ``` This patch improves padding performance on CPU, which: * original kernel has nested paralleled loops, e.g. first on dim of **batch** and then on dim of **channels**, this is not optimal practice when N * C is small. This patch did dimension collapse on NC and adjacent spatial dims to maximize the parallelism scope. * original kernel is scalar logic. This patch did vectorization on dim of **width** on NCHW, did vectorization on **channels** on NHWC. The following benchmark result gathered on Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket. ### single core inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.281 ms; ; NHWC: 0.356 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 55.675 ms; ; NHWC: 86.821 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.265 ms; ; NHWC: 0.339 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 52.336 ms; ; NHWC: 82.935 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.049 ms; ; NHWC: 0.328 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.252 ms; ; NHWC: 16.806 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.048 ms; ; NHWC: 0.324 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.199 ms; ; NHWC: 16.717 ms ``` ### single socket inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.118 ms; ; NHWC: 0.142 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 4.023 ms; ; NHWC: 7.367 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.111 ms; ; NHWC: 0.135 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.885 ms; ; NHWC: 7.203 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.010 ms; ; NHWC: 0.027 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.149 ms; ; NHWC: 3.181 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.011 ms; ; NHWC: 0.029 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.148 ms; ; NHWC: 3.174 ms ``` Notes: * when C < vector length: on NCHW, the vectorization is done on **width** on NCHW when the output index is overlapped with the input index; on NHWC, it is scalar logic, so it will be slower than NCHW. * when C >= vector length: NCHW and NHWC perf are similar. cc jgong5 XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
@cpuhrsch could you please help review this one?
@mingfeima - This diff is too large (+1,020 −1,714). Can this be split up? Are there some code moves in here that could be split out into separate PRs in this stack?
Please split this large set of changes (+1,020 −1,714) into a stack of smaller PRs
Sure, just coming back from holiday; I will split this one into smaller PRs.
…nPad on CPU" Fix #96738 This patch add channels last support for ReflectionPad2d/3d and ReplicationPad2d/3d on CPU backend. The following test cases will pass with this patch: ``` python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32 python test_modules.py TestModuleCPU.test_memory_format_nn_ReplicationPad2d_cpu_float32 ``` This patch improves padding performance on CPU, which: * original kernel has nested paralleled loops, e.g. first on dim of **batch** and then on dim of **channels**, this is not optimal practice when N * C is small. This patch did dimension collapse on NC and adjacent spatial dims to maximize the parallelism scope. * original kernel is scalar logic. This patch did vectorization on dim of **width** on NCHW, did vectorization on **channels** on NHWC. The following benchmark result gathered on Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket. ### single core inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.281 ms; ; NHWC: 0.356 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 55.675 ms; ; NHWC: 86.821 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.265 ms; ; NHWC: 0.339 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 52.336 ms; ; NHWC: 82.935 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.049 ms; ; NHWC: 0.328 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.252 ms; ; NHWC: 16.806 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.048 ms; ; NHWC: 0.324 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.199 ms; ; NHWC: 16.717 ms ``` ### single socket inference ``` (before) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.118 ms; ; NHWC: 0.142 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 4.023 ms; ; NHWC: 7.367 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.111 ms; ; NHWC: 0.135 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.885 ms; ; NHWC: 7.203 ms (after) ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.010 ms; ; NHWC: 0.027 ms ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.149 ms; ; NHWC: 3.181 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.011 ms; ; NHWC: 0.029 ms ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.148 ms; ; NHWC: 3.174 ms ``` Notes: * when C < vector length: on NCHW, the vectorization is done on **width** on NCHW when the output index is overlapped with the input index; on NHWC, it is scalar logic, so it will be slower than NCHW. * when C >= vector length: NCHW and NHWC perf are similar. cc jgong5 XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned]
|
@cpuhrsch could you please help review again? I have separated the original PR into two PRs: one for ReflectionPad and one for ReplicationPad.
```cpp
auto pad_r = padding[1];

// allow empty batch size but not other dimensions.
at::native::padding::check_valid_input<1>(input);
```
Thanks for making the updates @mingfeima! I think it looks much better. I do have to ask for another small refactor so I can more easily accept and defend this change.
While I agree that this unifying function is useful, I have to worry about whether we're maintaining the same error messages and error behavior across both code paths.
Can you split the refactor of these input sanitization functions into PRs below this stack?
I want to make sure we separate the task of "refactor existing error checking code for both CPU and CUDA" from "add an optimized kernel for a new feature" and from "add a new feature (channels last)". I think this PR currently does those three things at once.
Also, the error-checking functions should live in this file and not in the cpu subfolder, since they're not CPU-specific and we don't need to compile them multiple times for the various instruction sets.
Thank you!
OK, sure will have it done :)
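As context for the check being discussed: the diff comment above says empty batch sizes are allowed but other empty dimensions are not. A minimal sketch of how that behavior can be exercised from Python follows; the exact error text is an assumption and is precisely what the requested refactor aims to keep consistent between CPU and CUDA.

```python
import torch

pad = torch.nn.ReflectionPad2d((2, 2, 2, 2))

# An empty batch dimension is allowed: the output keeps N == 0.
out = pad(torch.randn(0, 3, 8, 8))
print(out.shape)  # expected: torch.Size([0, 3, 12, 12])

# A zero-sized non-batch dimension should be rejected with a RuntimeError.
try:
    pad(torch.randn(1, 0, 8, 8))
except RuntimeError as e:
    print("rejected:", e)
```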
Fix #96738

This patch adds channels-last support for ReflectionPad2d/3d on the CPU backend. The following test cases pass with this patch:

```
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32
python test_modules.py TestModuleCPU.test_memory_format_nn_ReflectionPad3d_cpu_float32
```

This patch also improves padding performance on CPU:

* The original kernel has nested parallel loops, e.g. first over the **batch** dim and then over the **channels** dim, which is suboptimal when N * C is small. This patch collapses NC and the adjacent spatial dims to maximize the parallelism scope.
* The original kernel is scalar logic. This patch vectorizes along **width** for NCHW and along **channels** for NHWC.

The following benchmark results were gathered on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket.

### single core inference

```
(before)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]), NCHW: 0.281 ms; NHWC: 0.356 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]), NCHW: 55.675 ms; NHWC: 86.821 ms

(after)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]), NCHW: 0.049 ms; NHWC: 0.328 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]), NCHW: 17.252 ms; NHWC: 16.806 ms
```

### single socket inference

```
(before)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]), NCHW: 0.118 ms; NHWC: 0.142 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]), NCHW: 4.023 ms; NHWC: 7.367 ms

(after)
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]), NCHW: 0.010 ms; NHWC: 0.027 ms
ReflectionPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]), NCHW: 3.149 ms; NHWC: 3.181 ms
```

Notes:

* When C < vector length: on NCHW, vectorization is done along **width** where the output index overlaps the input index; on NHWC, the kernel falls back to scalar logic, so it will be slower than NCHW.
* When C >= vector length: NCHW and NHWC performance are similar.

cc jgong5 XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
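To exercise the channels-last path described above from the Python side, a small check like the following can be used. It is a sketch using only public PyTorch APIs, confirming that a channels-last input yields a channels-last output with the same values as the contiguous path once the patch is in place.

```python
import torch

pad = torch.nn.ReflectionPad2d((2, 2, 2, 2))
x = torch.randn(128, 64, 56, 56)
x_cl = x.to(memory_format=torch.channels_last)

out = pad(x)        # NCHW (contiguous) path
out_cl = pad(x_cl)  # NHWC (channels-last) path added by this patch

print(out_cl.is_contiguous(memory_format=torch.channels_last))  # expected: True
print(torch.allclose(out, out_cl))                              # expected: True
```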
Replacement of #99608, breaking the old PR into smaller ones. This one handles the common error messages from both the CPU and CUDA devices, to simplify the code. Pull Request resolved: #102253. Approved by: https://github.com/cpuhrsch, https://github.com/albanD
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as `Stale`.
Stack from ghstack (oldest at bottom):
Fix #96738
This patch adds channels-last support for ReflectionPad2d/3d on the CPU backend. The following test cases pass with this patch: `TestModuleCPU.test_memory_format_nn_ReflectionPad2d_cpu_float32` and `TestModuleCPU.test_memory_format_nn_ReflectionPad3d_cpu_float32` in `test_modules.py`.
This patch improves padding performance on CPU:
* The original kernel has nested parallel loops, e.g. first over the **batch** dim and then over the **channels** dim, which is suboptimal when N * C is small. This patch collapses NC and the adjacent spatial dims to maximize the parallelism scope.
* The original kernel is scalar logic. This patch vectorizes along **width** for NCHW and along **channels** for NHWC.
The benchmark results shown above (single-core and single-socket inference) were gathered on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket.
Notes:
* When C < vector length: on NCHW, vectorization is done along **width** where the output index overlaps the input index; on NHWC, the kernel falls back to scalar logic, so it will be slower than NCHW.
* When C >= vector length: NCHW and NHWC performance are similar.
cc @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10