Mobile Backend: NHWC memory layout + XNNPACK integration. #32509
Conversation
@AshkanAliabadi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
💊 CircleCI build failures summary and remediations

As of commit 4b95293:

Detailed failure analysis: one may explore the probable reasons each build failed interactively on the Dr. CI website.

🕵️ 1 new failure recognized by patterns. The following build failures do not appear to be due to upstream breakage:
@AshkanAliabadi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There are a ton of unintentional submodule updates in this diff.
This diff is really long, and the description in your PR is not commensurate with its length. Please re-ping for review once there is a longer description of the changes.
Thank you for your reviews, Edward and Greg. I will address your comments, along with the non-threadpool-related comments from Jiakai on the previous PR, and upload an update.
Well, a few of those updates are actually necessary for this patch to work, such as PSIMD, cpu_info, and pthreadpool; these updates are required for XNNPACK to compile. The NNPACK update fixes a buffer overflow and can be a separate PR. I can leave the others out. The only reason I updated those is that they are used in NNPACK and family, and I wanted to make sure we are picking up any potential bugfix or performance improvement they might bring to the table.
- Removed all traces of the pthreadpool changes.
- Removed extra submodule updates.
- Removed c10 op registration. Will decide on how best to expose the operators in a follow-up patch.
- Addressed comments.
- Will update PR description shortly.
Please let me know if I missed anything, or if you have any further concerns.
In that case, if the updates are backwards compatible, it's generally a good idea to do them in a separate diff first, and then the main diff. Although in this case it looks like you got all the tests to work.
I'm adding @bwasti to this PR, because the caching scheme here is similar to things that bwasti observed were necessary in his sparse experiments in https://github.com/pytorch/sparse
ya, splitting out the submodule updates into a separate PR is probably a good idea. You never know what could break downstream, and having a minimally revertible piece is nice.
@AshkanAliabadi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Looks like all major comments have been addressed. I'm going to accept this so we can start testing it out in real apps and iterating on the frontend.
        groups,
        output_min,
        output_max),
    "xnnpack::convolution not available!");
I'm a bit worried that this error message doesn't actually tell the user what the problem was. Let's be sure to improve this if anyone gets confused.
My idea was that if this error prints the line and file, the user can investigate. Otherwise, if I want to make the error message super descriptive, which is also a possibility, I have to break that function into its constituent tests. Is that what you have in mind?
Yes. Don't need to do it right now, though.
Add a bit more context: "the xnnpack engine for conv2d doesn't support this combination of padding and strides".
And put a TODO to improve this message.
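A minimal sketch of that suggestion; the leading parameters of available() are assumed, since only the tail of the call appears in the diff above:

```cpp
// TODO: improve this message further by reporting exactly which
// parameter failed the availability test.
TORCH_CHECK(
    available(
        weight,
        bias,
        padding,
        stride,
        dilation,
        groups,
        output_min,
        output_max),
    "xnnpack::convolution not available! The xnnpack engine for conv2d "
    "does not support this combination of padding and strides.");
```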
you'll likely need to expose this function later to JIT or other parts to make sure that the rewriting pass is safe in how it handles the details. Or we could add a fallback path that, despite "prepacking", still just calls a regular conv if the params don't match. The general principle is: everything should run, but some things can run slowly. But it can be done in a separate diff.
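A minimal sketch of that fallback idea, assuming a hypothetical prepacked context type that retains the original convolution parameters:

```cpp
#include <ATen/ATen.h>

// Hypothetical fallback path: use the fast XNNPACK path when the input
// qualifies, otherwise fall back to the regular ATen convolution, so that
// everything runs even if some things run slowly.
at::Tensor conv2d_run_or_fallback(
    const Conv2dContext& context,  // hypothetical prepacked-op context
    const at::Tensor& input) {
  if (usable(input)) {
    return run(context, input);    // fast, prepacked XNNPACK path
  }
  return at::conv2d(               // slow but always-correct path
      input,
      context.weight,
      context.bias,
      context.stride,
      context.padding,
      context.dilation,
      context.groups);
}
```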
TORCH_CHECK(
    usable(input_nhwc),
    "xnnpack::convolution not usable!");
Same. If I hit this, I wouldn't know what the problem was.
@@ -0,0 +1,96 @@
#ifndef USE_XNNPACK
Maybe comment what this file is for.
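For instance, something along these lines at the top of the file; the wording is hypothetical and assumes the guard marks this file as the non-XNNPACK stub path:

```cpp
// Stub implementations of the xnnpack operators, compiled only when the
// build does not enable XNNPACK (USE_XNNPACK undefined). Each stub simply
// reports that the XNNPACK engine is unavailable.
#ifndef USE_XNNPACK
```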
The E2E bindings look pretty reasonable; I'm not sure what the current state of torchbind is, but that's the main question I'd like resolved before shipping this.
I was going to work on this following @jamesr66a's PR: https://github.com/pytorch/pytorch/pull/32938/files, taking a similar approach. Basically, a custom class (OpContext) registered with torchbind captures the op context. The setstate method registered with torchbind will create the OpContext. The current linear_prepack will also generate an OpContext as output, which is consumed by linear_run. The freezing API can then be used to get rid of linear_prepack. I have a small quip describing this approach, to which I will add you guys for further comments.
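A minimal sketch of that shape using the torchbind API; the OpContext fields here are hypothetical, and the serialized state is simplified to a pair of tensors:

```cpp
#include <torch/custom_class.h>

// Hypothetical op context that captures prepacked state for linear_run.
struct OpContext : torch::CustomClassHolder {
  at::Tensor packed_weight;
  at::Tensor bias;
  OpContext(at::Tensor w, at::Tensor b)
      : packed_weight(std::move(w)), bias(std::move(b)) {}
};

// Register the class with torchbind; def_pickle supplies the
// __getstate__/__setstate__ pair so deserialization recreates the OpContext.
static auto op_context = torch::class_<OpContext>("xnnpack", "OpContext")
    .def_pickle(
        // __getstate__: flatten the context into serializable values.
        [](const c10::intrusive_ptr<OpContext>& ctx)
            -> std::tuple<at::Tensor, at::Tensor> {
          return std::make_tuple(ctx->packed_weight, ctx->bias);
        },
        // __setstate__: rebuild the context when the model is loaded.
        [](std::tuple<at::Tensor, at::Tensor> state)
            -> c10::intrusive_ptr<OpContext> {
          return c10::make_intrusive<OpContext>(
              std::get<0>(state), std::get<1>(state));
        });
```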
@ezyang, note that linear_prepack and linear_run, or their updated _-prefixed versions, will eventually have to return not a Tensor but a custom class that is registered with torchbind. This means we will have to introduce the new class in native_functions.yaml and patch everything else to make it work with the build system. At the moment, the similar torchbind-based approach used for quantization does not have to deal with this, as the quantization ops are registered differently.
I suspect the better strategy is to do the registrations manually, in the same way quantization does them. But as long as there's an agreed-upon plan on record here, I don't mind if the patch goes in "as is".
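For comparison, the quantization-style manual registration is roughly this shape; a sketch that reuses the hypothetical OpContext above, with only the operator names taken from the discussion and everything else assumed:

```cpp
#include <torch/script.h>

// Hypothetical free functions implementing the two ops.
c10::intrusive_ptr<OpContext> linear_prepack(at::Tensor weight,
                                             at::Tensor bias) {
  return c10::make_intrusive<OpContext>(std::move(weight), std::move(bias));
}

at::Tensor linear_run(at::Tensor input,
                      c10::intrusive_ptr<OpContext> context) {
  // The real implementation would dispatch to XNNPACK here.
  return at::linear(input, context->packed_weight, context->bias);
}

// Manual registration, bypassing native_functions.yaml the way the
// quantized ops do.
static auto registry = torch::RegisterOperators()
    .op("_xnnpack::linear_prepack", &linear_prepack)
    .op("_xnnpack::linear_run", &linear_run);
```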
Looks good modulo a few renames in inline comments. Let's land this and then have follow-ups with frontend tests.
Also, do you plan, in some follow-up diffs, to re-route the regular non-prepacked ops to call XNNPACK? Or is the perf penalty too high for that?
        groups,
        output_min,
        output_max),
    "xnnpack::convolution not available!");
Add a bit more context: "the xnnpack engine for conv2d doesn't support this combination of padding and strides".
And put a TODO to improve this message.
        groups,
        output_min,
        output_max),
    "xnnpack::convolution not available!");
you'll likely need to expose this function later to JIT or other parts to make sure that the rewriting pass is safe in how it handles the details. Or we could add a fallback path that, despite "prepacking", still just calls a regular conv if the params don't match. The general principle is: everything should run, but some things can run slowly. But it can be done in a separate diff.
@AshkanAliabadi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@AshkanAliabadi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Sorry, this went under my radar. Yes, I plan to follow up with another diff that will reroute those operators; the performance penalty shouldn't be high. There is also another patch, left out of the original diff, whose purpose is to unify threading on mobile: XNNPACK uses an updated version of pthreadpool with a newer interface that our internal custom Caffe2 implementation is not written against. Marat has also improved pthreadpool's implementation itself to use spin locks for short waits, which now makes our custom Caffe2 implementation redundant, as the latter was based on the same premise. My benchmarks last half showed better performance with pthreadpool's updated implementation compared to our custom version, and on top of that, I think a unified threading solution makes for easier code maintenance too. We have run into linker issues complaining about duplicate symbols any time we wanted to add support for a new platform, including internal BUCK targets. So those are the two remaining pieces on the backend side of things. There's of course the JIT side as well, which Kimish is working on.
What is this referring to, Ashkan? Is this the same for ops such as Add?
Yaaay! I'm genuinely so excited. Like a child discovering candy for the first time.
@AshkanAliabadi merged this pull request in 941b424.
Summary: Pull Request resolved: #33722

In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.

XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow-up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMake variable, which defaults to ON, is enabled, but will not actually expose or enable this code path in any other way.

Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today. Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, it would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute one-time operations ahead of time and factor them out of the innermost forward() loop.

The more optimal solution, and the one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators while allowing one-time calculations to be factored out, much like quantized mobile models, or introducing new eager-mode modules that directly call into these implementations, whether through c10 or some other mechanism, likewise allowing op creation to be decoupled from op execution.

This PR does not include any of the front-end changes mentioned above. Neither does it include the mobile threadpool unification present in the original #30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move. Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.

Pull Request resolved: #32509

Test Plan: Build: CI. Functionality: Not exposed.

Reviewed By: dreiss

Differential Revision: D20069796

Pulled By: AshkanAliabadi

fbshipit-source-id: d46c1c91d4bea91979ea5bd46971ced5417d309c
Summary:

In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.

XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow-up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMake variable, which defaults to ON, is enabled, but will not actually expose or enable this code path in any other way.

Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding **native** implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today. Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, it would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute one-time operations ahead of time and factor them out of the innermost forward() loop.

The more optimal solution, and the one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators while allowing one-time calculations to be factored out, much like quantized mobile models, or introducing new eager-mode modules that directly call into these implementations, whether through c10 or some other mechanism, likewise allowing op creation to be decoupled from op execution.

This PR does not include any of the front-end changes mentioned above. Neither does it include the mobile threadpool unification present in the original #30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move. Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.

Pull Request resolved: #32509

Reviewed By: dreiss

Differential Revision: D19521853

Pulled By: AshkanAliabadi

fbshipit-source-id: 99a1fab31d0ece64961df074003bb852c36acaaa
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.
XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow-up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMake variable, which defaults to ON, is enabled, but will not actually expose or enable this code path in any other way.
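Concretely, the gating described here is the standard preprocessor pattern; a sketch with hypothetical names:

```cpp
#include <ATen/ATen.h>

// USE_XNNPACK is defined (or not) at build time. Since the symbol is not
// yet passed to the compiler, only the stub branch is reachable for now.
at::Tensor mobile_conv2d(const at::Tensor& input, const at::Tensor& weight) {
#ifdef USE_XNNPACK
  // Fast path: NHWC, XNNPACK-backed convolution (hypothetical call).
  return xnnpack::convolution2d(input, weight);
#else
  TORCH_CHECK(false, "mobile_conv2d: XNNPACK is not enabled in this build!");
#endif
}
```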
Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today.
Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, it would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute one-time operations ahead of time and factor them out of the innermost forward() loop.
The more optimal solution, and the one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators while allowing one-time calculations to be factored out, much like quantized mobile models, or introducing new eager-mode modules that directly call into these implementations, whether through c10 or some other mechanism, likewise allowing op creation to be decoupled from op execution.
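Either frontend would rely on the same two-stage split: a one-time prepack step hoisted out of the hot loop, and a lightweight run step. A sketch with hypothetical operator names:

```cpp
#include <ATen/ATen.h>
#include <vector>

// Hypothetical two-stage use of the new operators: prepack once, run many.
at::Tensor forward_all(
    const std::vector<at::Tensor>& batches,
    const at::Tensor& weight, const at::Tensor& bias,
    std::vector<int64_t> stride, std::vector<int64_t> padding,
    std::vector<int64_t> dilation, int64_t groups) {
  // One-time work, factored out of the forward() loop: validate the
  // parameters and repack the weights into XNNPACK's preferred layout.
  auto context = xnnpack::conv2d_prepack(
      weight, bias, stride, padding, dilation, groups);
  at::Tensor output;
  // Hot path: each call reuses the prepacked state and only pays for the
  // per-input work.
  for (const at::Tensor& input : batches) {
    output = xnnpack::conv2d_run(context, input);
  }
  return output;
}
```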
This PR does not include any of the front-end changes mentioned above. Neither does it include the mobile threadpool unification present in the original #30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.
Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.