Add static dispatch mode to reduce mobile code size#22335
li-roy wants to merge 8 commits into gh/li-roy/36/base from gh/li-roy/36/head
Conversation
Add static dispatch mode to reduce mobile code size gh-metadata: pytorch pytorch 22335 gh/li-roy/36/head
smessmer left a comment:
There are a few questions we should talk about before landing this, but generally it looks good.
# dispatch aten ops statically for mobile
if [[ ${BUILD_ENVIRONMENT} == *"android"* ]]; then
    NAMED_FLAG="export USE_STATIC_DISPATCH=1"
Let's give it a more scary name. I don't want people to think "static dispatch sounds nice" and use this flag without knowing that it restricts them to CPU only, prevents them from overriding kernels, and has a few other restrictions.
Actually, thinking about it, maybe we should combine this flag with the flag we're planning for selecting which ops should be registered? Have a flag
PYTORCH_WHITELIST_ATEN_OPS_FOR_MOBILE="aten::conv,aten::add"
or
PYTORCH_ONLY_BUILD_OPS_FOR_MOBILE="aten::conv,aten::add"
and whenever this flag is present, you use static dispatch?
Yeah I think what we do for this depends on what we decide to do for specifying the subset of ops. Can we move forward with a boolean flag for now and change it later if we need to? We can change the name if you prefer, or maybe just pass an existing flag. But as far as I know, there's not a single existing flag that does what we want, because we don't want this to be triggered for internal mobile builds. @ljk53 any thoughts?
I lean towards having separate flags for now. We might still need to decide how to specify the ops whitelist, e.g. what if we want to choose between ops having the same name? What if we want to use a config file instead of an encoded string? We can always make the "static dispatch" flag an internal flag and set it automatically later (IMO keeping it as a separate internal flag is easier to understand anyway).
BTW, you probably only need to modify .jenkins/pytorch/build.sh (see how BUILD_CAFFE2_MOBILE is set there).
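A rough sketch of how that could be wired up in build.sh, following the flag name from the diff above; the default BUILD_ENVIRONMENT value here is purely illustrative, and the exact placement in the real script may differ:

```shell
#!/bin/bash
set -e

# Illustrative value; in CI this comes from the job configuration.
BUILD_ENVIRONMENT="${BUILD_ENVIRONMENT:-pytorch-linux-xenial-py3-android}"

# dispatch aten ops statically for mobile (android) builds only,
# so desktop/server builds keep normal dynamic dispatch
if [[ "${BUILD_ENVIRONMENT}" == *"android"* ]]; then
  export USE_STATIC_DISPATCH=1
fi

echo "USE_STATIC_DISPATCH=${USE_STATIC_DISPATCH:-0}"
```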
Is the plan for PyTorch Mobile to like, completely ignore OpenGL, Metal, Vulkan, etc?
@ajtulloch when we get there, can we use a switch-case or hashmap-of-function-pointers approach? The first step is to get rid of the huge ATen vtable, which blocks the linker from stripping out unused code...
Static dispatch doesn't imply only CPU - the code below already generates the switch statements (which is good). We should add a configurable filter on dispatch ids (basically selecting which devices to compile), but it can be done separately.
c10/core/TensorImpl.cpp (Outdated)
#ifndef CAFFE2_FB_LIMITED_MOBILE_CAPABILITY
#ifdef USE_STATIC_DISPATCH
thread_local bool NonVariableTypeMode_enabled = true;
Why is variable type mode different depending on static dispatch?
For mobile, we never go through Variable code, but we do pass Variables through TH methods that expect a Tensor. Specifically, checked_tensor_unwrap does an is_variable() check, and it'll fail because NonVariableTypeMode is always off, even though we never go through Variable code. Because our constraints are inference-only and we never go through VariableType, I thought it would make sense to just set NonVariableTypeMode once and keep it.
@yf225 Do you have any thoughts on this?
After @yf225's Variable/Tensor unification work, do we still need to keep the is_variable() check?
Earlier (before Will landed his work), when I tried removing virtual methods on types & variables, I found it mostly worked fine without the is_variable() check; I only needed to keep a few Variable-overridden methods virtual.
It's not needed by mobile inference for now, but we are still discussing whether we'll need variable/autodiff for federated learning on mobile in the future.
It's a bit scary - I'm not saying we will add autograd to the mobile build, but coupling it with static dispatch like that is suspicious. Can't we just have a lightweight guard (at::AutoNonVariableTypeMode) in every method call?
Probably even better would be to just make sure we don't produce Variables at all - e.g. stub out factory functions if autograd is not compiled in (do we have a dedicated flag for turning off autograd?)
fi

# dispatch aten ops statically for mobile
if [[ ${BUILD_ENVIRONMENT} == *"android"* ]]; then
we should also filter for iOS here, @xta0 - what is the right way to do it?
I don't think we have iOS CI yet.
@pytorchbot retest this please
looks good, thanks
.device(${device})
.pinned_memory(${pin_memory});
auto result_ = torch::${name}(${args_with_tensor_options});
#ifdef USE_STATIC_DISPATCH
Is it to avoid creating variables? It might be better to put a NoGrad guard on. AFAIU, we want to get rid of the at:: factory functions eventually and always create Variables (cc @gchanan)
Yeah, it's to avoid creating variables. I don't think NoGrad works without additional changes; torch:: will always create a variable.
I plan to look into optional build for autograd/variable functionality on top of this PR and make changes if I find anything - so it's fine as long as the static dispatching part works.
TENSOR_METHOD_DEFINITION = CodeTemplate("""\
inline ${return_type} Tensor::${api_name}(${method_formals}) const {
#ifdef USE_STATIC_DISPATCH
${mobile_method_body}
nit: you might want to call this "static_dispatch_method_body" to be consistent with the macro name?
('BFloat16', 'BFloat16', 'BFloat16AccrealNotDefined', True),
]

mobile_backends = ['CPU', 'QuantizedCPU', 'SparseCPU']
Summary: Pull Request resolved: pytorch/pytorch#22335
Test Plan: Imported from OSS
Differential Revision: D16048264
Pulled By: li-roy
fbshipit-source-id: ad1e50951273962a51bac7c25c3d2e5a588a730e
This broke the named tensor master-only build. @zou3519 https://circleci.com/gh/pytorch/pytorch/2519587?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link/console
Stack from ghstack:
As we discussed, this will allow the linker to remove unused operators automatically.
Differential Revision: D16048264