Add static dispatch mode to reduce mobile code size #22335

Closed
li-roy wants to merge 8 commits into gh/li-roy/36/base from gh/li-roy/36/head

Conversation

@li-roy
Contributor

@li-roy li-roy commented Jun 28, 2019

Stack from ghstack:

As we discussed, this will allow the linker to remove unused operators automatically.

Differential Revision: D16048264
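The mechanism discussed in this PR can be sketched in a few lines. This is only an illustration with made-up kernel names, not PyTorch's actual generated code: with dynamic dispatch, a table of function pointers references every kernel, so the linker must keep them all; with static dispatch, each call site references its kernel directly, and anything unreferenced becomes dead code the linker can strip.

```cpp
// Hypothetical kernels; names are illustrative, not PyTorch's real codegen.
float add_cpu(float a, float b) { return a + b; }
float mul_cpu(float a, float b) { return a * b; }

#ifdef USE_STATIC_DISPATCH
// Static dispatch: the call site names the kernel directly. A kernel that is
// never referenced anywhere (e.g. mul_cpu) becomes dead code the linker can
// strip (with -ffunction-sections / --gc-sections or equivalent).
float add(float a, float b) { return add_cpu(a, b); }
#else
// Dynamic dispatch: a table of function pointers references every kernel, so
// the linker has to keep all of them even if only add is ever used.
float (*kDispatchTable[])(float, float) = {add_cpu, mul_cpu};
float add(float a, float b) { return kDispatchTable[0](a, b); }
#endif
```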

@pytorchbot pytorchbot added caffe2 module: android Related to Android support module: build Build system issues module: internals Related to internal abstractions in c10 and ATen module: operators labels Jun 28, 2019
@li-roy li-roy requested review from dzhulgakov, ljk53 and smessmer June 28, 2019 09:18
royboy added 2 commits June 28, 2019 03:46
Add static dispatch mode to reduce mobile code size

gh-metadata: pytorch pytorch 22335 gh/li-roy/36/head
Add static dispatch mode to reduce mobile code size

gh-metadata: pytorch pytorch 22335 gh/li-roy/36/head
@pytorchbot pytorchbot added the module: ci Related to continuous integration label Jun 28, 2019
Add static dispatch mode to reduce mobile code size

gh-metadata: pytorch pytorch 22335 gh/li-roy/36/head
Contributor

@smessmer smessmer left a comment

There are a few questions we should talk about before landing this, but generally it looks good.


# dispatch aten ops statically for mobile
if [[ ${BUILD_ENVIRONMENT} == *"android"* ]]; then
export USE_STATIC_DISPATCH=1
Contributor

Let's give it a more scary name. I don't want people to think "static dispatch sounds nice" and use this flag without knowing that it restricts them to CPU only, prevents them from overriding kernels, and has a few other restrictions.

Actually, thinking about it, maybe we should combine this flag with the flag we're planning for selecting which ops should be registered? Have a flag

PYTORCH_WHITELIST_ATEN_OPS_FOR_MOBILE="aten::conv,aten::add"

or

PYTORCH_ONLY_BUILD_OPS_FOR_MOBILE="aten::conv,aten::add"

and whenever this flag is present, you use static dispatch?

Contributor Author

Yeah I think what we do for this depends on what we decide to do for specifying the subset of ops. Can we move forward with a boolean flag for now and change it later if we need to? We can change the name if you prefer, or maybe just pass an existing flag. But as far as I know, there's not a single existing flag that does what we want, because we don't want this to be triggered for internal mobile builds. @ljk53 any thoughts?

Contributor

I lean towards having separate flags for now. We still need to decide how to specify the ops whitelist, e.g. what if we want to choose between ops that have the same name? What if we want to use a config file instead of an encoded string? We can always make the static dispatch flag an internal ("intern_") flag and set it automatically later (IMO keeping it as a separate internal flag is easier to understand anyway).

BTW, you probably only need to modify .jenkins/pytorch/build.sh (see how DBUILD_CAFFE2_MOBILE is set there).

Contributor

Is the plan for PyTorch Mobile to like, completely ignore OpenGL, Metal, Vulkan, etc?

Contributor

@ajtulloch when we get there, can we use a switch-case or hashmap-of-function-pointers approach? The first step is to get rid of the huge ATen vtable, which blocks the linker from stripping out unused code...
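A hashmap-of-function-pointers registry, as mentioned above, might look roughly like this. All names here are illustrative, not PyTorch's real registration API; the point is that only ops explicitly registered (and hence referenced) need to survive linking:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical op registry: op name -> kernel function.
using Kernel = std::function<float(float, float)>;

std::unordered_map<std::string, Kernel>& registry() {
  // Meyers singleton avoids static-initialization-order issues.
  static std::unordered_map<std::string, Kernel> r;
  return r;
}

void register_op(const std::string& name, Kernel k) {
  registry()[name] = std::move(k);
}

float call_op(const std::string& name, float a, float b) {
  auto it = registry().find(name);
  assert(it != registry().end() && "op not registered in this build");
  return it->second(a, b);
}
```

Unlike a vtable, nothing in this scheme forces every kernel to be referenced; a build that never registers an op carries no reference to it.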

Collaborator

Static dispatch doesn't imply only CPU - the code below already generates the switch statements (which is good). We should add a configurable filter on dispatch ids (basically selecting which devices to compile), but it can be done separately.
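The generated switch statements referred to here have roughly this shape (the backend names match the PR's mobile_backends list, but the enum, kernels, and signatures below are made up for illustration):

```cpp
#include <stdexcept>

// Hypothetical backend enum and kernels; illustrative only.
enum class Backend { CPU, QuantizedCPU, SparseCPU };

float add_cpu(float a, float b) { return a + b; }
float add_quantized_cpu(float a, float b) { return a + b; }  // placeholder

// Statically dispatched entry point: a plain switch instead of a vtable.
// Backends can be filtered at codegen time; anything not listed here is
// simply not compiled in, so its kernels are never referenced.
float add(Backend backend, float a, float b) {
  switch (backend) {
    case Backend::CPU:
      return add_cpu(a, b);
    case Backend::QuantizedCPU:
      return add_quantized_cpu(a, b);
    default:
      throw std::runtime_error("add: backend not supported in this build");
  }
}
```

Because the switch is resolved at compile time per call site, adding a configurable device filter is just a matter of which cases the codegen emits.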

#ifndef CAFFE2_FB_LIMITED_MOBILE_CAPABILITY

#ifdef USE_STATIC_DISPATCH
thread_local bool NonVariableTypeMode_enabled = true;
Contributor

Why is variable type mode different depending on static dispatch?

Contributor Author

@li-roy li-roy Jun 28, 2019

For mobile we never go through Variable code, but we do pass Variables into TH methods that expect plain Tensors. Specifically, checked_tensor_unwrap does an is_variable() check, which fails because NonVariableTypeMode is always off, even though we never go through Variable code. Since our constraints are inference-only and we never go through VariableType, I thought it would make sense to just set NonVariableTypeMode once and keep it on.

@yf225 Do you have any thoughts on this?

Contributor

After @yf225's Variable/Tensor unification work, do we still need to keep the is_variable() check?

Earlier (before Will landed his work), when I tried removing the virtual methods on types & Variable, I found it mostly worked fine without the is_variable() check; I only needed to keep a few Variable-overridden methods virtual:

ljk53@2f795a5
ljk53@cc8a178

It's not needed by mobile inference for now, but we are still discussing whether we need Variable/autodiff for federated learning on mobile in the future.

Collaborator

It's a bit scary - I'm not saying we will add autograd to the mobile build, but coupling it with static dispatch like that is suspicious. Can't we just have a lightweight guard (at::AutoNonVariableTypeMode) in every method call?

Probably even better would be to make sure we don't produce Variables at all - e.g. stub out factory functions if autograd is not compiled in (do we have a dedicated flag for turning off autograd?)
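The lightweight-guard alternative suggested here is an RAII object that flips the thread-local flag only for the duration of one call, rather than setting it globally. The real at::AutoNonVariableTypeMode lives in ATen; this is only a sketch of the shape of the idea, with the flag name taken from the diff above:

```cpp
// Thread-local mode flag, as in the diff above (default off).
thread_local bool NonVariableTypeMode_enabled = false;

// RAII guard: enables the mode on entry, restores the previous value on
// exit, so the global default can stay "off" even on mobile builds.
struct AutoNonVariableTypeMode {
  bool prev;
  AutoNonVariableTypeMode() : prev(NonVariableTypeMode_enabled) {
    NonVariableTypeMode_enabled = true;
  }
  ~AutoNonVariableTypeMode() { NonVariableTypeMode_enabled = prev; }
};

bool run_op() {
  AutoNonVariableTypeMode guard;  // mode is on only inside this scope
  return NonVariableTypeMode_enabled;
}
```

With this pattern, each generated method body takes the guard locally instead of the build hard-coding the mode to true.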

Add static dispatch mode to reduce mobile code size

gh-metadata: pytorch pytorch 22335 gh/li-roy/36/head
@pytorchbot pytorchbot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Aug 13, 2019
Add static dispatch mode to reduce mobile code size

gh-metadata: pytorch pytorch 22335 gh/li-roy/36/head
fi

# dispatch aten ops statically for mobile
if [[ ${BUILD_ENVIRONMENT} == *"android"* ]]; then
Collaborator

we should also filter for iOS here, @xta0 - what is the right way to do it?

Contributor Author

I think we don't have iOS CI yet.

@li-roy
Contributor Author

li-roy commented Aug 13, 2019

@pytorchbot retest this please

royboy added 2 commits August 13, 2019 15:08
Add static dispatch mode to reduce mobile code size

gh-metadata: pytorch pytorch 22335 gh/li-roy/36/head
Add static dispatch mode to reduce mobile code size

gh-metadata: pytorch pytorch 22335 gh/li-roy/36/head
@smessmer
Contributor

looks good, thanks

.device(${device})
.pinned_memory(${pin_memory});
auto result_ = torch::${name}(${args_with_tensor_options});
#ifdef USE_STATIC_DISPATCH
Collaborator

Is it to avoid creating Variables? It might be better to put a NoGrad guard on. AFAIU, we want to get rid of the at:: factory functions eventually and always create Variables (cc @gchanan)

Contributor Author

Yeah, it's to avoid creating Variables. I don't think NoGrad works without additional changes; torch:: will always create a Variable.

Contributor

I plan to look into optional build for autograd/variable functionality on top of this PR and make changes if I find anything - so it's fine as long as the static dispatching part works.

TENSOR_METHOD_DEFINITION = CodeTemplate("""\
inline ${return_type} Tensor::${api_name}(${method_formals}) const {
#ifdef USE_STATIC_DISPATCH
${mobile_method_body}
Contributor

nit: you might want to call this "static_dispatch_method_body" to be consistent with the macro name?

('BFloat16', 'BFloat16', 'BFloat16AccrealNotDefined', True),
]

mobile_backends = ['CPU', 'QuantizedCPU', 'SparseCPU']
Contributor

static_dispatch_backends?


@zou3519 zou3519 deleted the gh/li-roy/36/head branch August 20, 2019 19:21
@facebook-github-bot
Contributor

@li-roy merged this pull request in 6824c90.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Aug 20, 2019
Summary: Pull Request resolved: pytorch/pytorch#22335

Test Plan: Imported from OSS

Differential Revision: D16048264

Pulled By: li-roy

fbshipit-source-id: ad1e50951273962a51bac7c25c3d2e5a588a730e
@ezyang
Contributor

ezyang commented Aug 20, 2019

@li-roy li-roy restored the gh/li-roy/36/head branch August 20, 2019 23:53
@li-roy li-roy deleted the gh/li-roy/36/head branch August 20, 2019 23:54
@li-roy li-roy restored the gh/li-roy/36/head branch August 20, 2019 23:56
@li-roy li-roy deleted the gh/li-roy/36/head branch August 21, 2019 00:27