[pytorch] reintroduce static dispatch #51554

Conversation
The new static dispatch and c10 registration can work together. The codegen emits static dispatch code for the selected backends (if set) and falls back to regular dispatch for the rest. This way, it can be used to reduce the dispatcher's overhead for perf-sensitive use cases without compromising functionality. If the `static_dispatch_backends` flag is not set, the behavior is the same as before.

Added back the E2E mobile static dispatch CI for testing purposes.

This PR doesn't try to optimize mobile build size yet. We can introduce separate build flags to disable the fallback logic, with which the linker can strip out unused op-invocation code. Static dispatch for manually registered ops / custom ops / autograd kernels is not handled by this PR. We can work on these special cases progressively.

- Sample code (with static dispatch backend = CPU):
```
// aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
Tensor Tensor::add(const Tensor & other, Scalar alpha) const {
  DispatchKeySet _dk_set = c10::detail::multi_dispatch_key_set(other, const_cast<Tensor&>(*this));
  DispatchKey _dk = c10::impl::dispatchTypeId(_dk_set, DispatchKeySet::FULL);
  switch (_dk) {
    case DispatchKey::BackendSelect:
      // fallthrough
    case DispatchKey::CPU:
      return at::cpu::add(const_cast<Tensor&>(*this), other, alpha);
    default:
      // fallback to regular dispatch
      // TORCH_CHECK(false, "Unsupported static dispatch", _dk);
      break;
  }
  static auto op = c10::Dispatcher::singleton()
      .findSchemaOrThrow("aten::add", "Tensor")
      .typed<Tensor (const Tensor &, const Tensor &, Scalar)>();
  return op.call(const_cast<Tensor&>(*this), other, alpha);
}
```
- If the op has a BackendSelect kernel, it falls back to c10 dispatch:
```
// aten::arange(Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
Tensor arange(Scalar end, const TensorOptions & options) {
  DispatchKey _dk = options.computeDispatchKey();
  switch (_dk) {
    case DispatchKey::CPU:
      return at::math::arange(end, options);
    default:
      // fallback to regular dispatch
      // TORCH_CHECK(false, "Unsupported static dispatch", _dk);
      break;
  }
  static auto op = c10::Dispatcher::singleton()
      .findSchemaOrThrow("aten::arange", "")
      .typed<Tensor (Scalar, c10::optional<ScalarType>, c10::optional<Layout>, c10::optional<Device>, c10::optional<bool>)>();
  return op.call(end, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
```
- If the op only has a math kernel and there is no tensor argument / tensor option to infer the dispatch key from, then always dispatch to the math kernel (only if `static_dispatch_backends` is set):
```
// aten::_nnpack_available() -> bool
bool _nnpack_available() {
  return at::math::_nnpack_available();
  static auto op = c10::Dispatcher::singleton()
      .findSchemaOrThrow("aten::_nnpack_available", "")
      .typed<bool ()>();
  return op.call();
}
```
- If the op doesn't have a CPU backend, then nothing changes:
```
// aten::quantized_batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor mean, Tensor var, float eps, float output_scale, int output_zero_point) -> Tensor
Tensor quantized_batch_norm(const Tensor & input, const c10::optional<Tensor> & weight, const c10::optional<Tensor> & bias, const Tensor & mean, const Tensor & var, double eps, double output_scale, int64_t output_zero_point) {
  static auto op = c10::Dispatcher::singleton()
      .findSchemaOrThrow("aten::quantized_batch_norm", "")
      .typed<Tensor (const Tensor &, const c10::optional<Tensor> &, const c10::optional<Tensor> &, const Tensor &, const Tensor &, double, double, int64_t)>();
  return op.call(input, weight, bias, mean, var, eps, output_scale, output_zero_point);
}
```
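To make the switch-plus-fallback structure concrete, here is a minimal Python sketch of the generation rule described above. It is an illustration only, assuming a hypothetical `gen_static_dispatch` helper; the names and signature do not correspond to the actual tools/codegen API, they just mirror the shape of the generated samples.
```
# Hedged sketch (not the real tools/codegen code): emit a C++ body with static
# calls for the selected backends and the regular c10 dispatch as the fallback.
from typing import Dict, List, Optional

def gen_static_dispatch(op: str, sig: str, args: str,
                        dispatch: Dict[str, str],
                        static_backends: Optional[List[str]]) -> str:
    fallback = (
        f'  static auto op = c10::Dispatcher::singleton()\n'
        f'      .findSchemaOrThrow("aten::{op}", "")\n'
        f'      .typed<{sig}>();\n'
        f'  return op.call({args});'
    )
    if not static_backends:
        # static_dispatch_backends flag unset: behave exactly as before
        return fallback
    cases = [f'    case DispatchKey::{b}:\n      return at::{b.lower()}::{op}({args});'
             for b in static_backends if b in dispatch]
    if not cases:
        # op has no kernel for the selected backends: nothing changes
        return fallback
    switch = (
        "  DispatchKey _dk = /* from tensor args or TensorOptions */;\n"
        "  switch (_dk) {\n"
        + "\n".join(cases)
        + "\n    default:\n      break;  // fall back to regular dispatch\n  }\n"
    )
    return switch + fallback

print(gen_static_dispatch(
    "add", "Tensor (const Tensor &, const Tensor &, Scalar)",
    "self, other, alpha",
    {"CPU": "add", "CUDA": "add"},
    ["CPU"]))
```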
💊 CI failures summary and remediations, as of commit 8825104 (more details on the Dr. CI page):

Extra GitHub checks: 1 failed
tools/codegen/api/translate.py
Outdated
```
t = b.ctype
if isinstance(t, ConstRefCType) and isinstance(t.elem, OptionalCType) and \
        isinstance(t.elem.elem, BaseCType) and t.elem.elem.type == 'Tensor':
    ctx[ConstRefCType(BaseCType("Tensor", b.name))] = f'({b.name}.has_value() ? *{b.name} : at::Tensor())'
```
nit: Today, functionally it doesn't make a difference, but it would be better to put this translation rule inside `solve` itself, so that we are still uniformly doing backward inference. Because the rule here is very simple it can be done with either forward or backward inference, so it's mostly a uniformity thing.
(The trouble with forward inference is when you start stuffing the context with tons and tons of possible conversions "just because they might help"; backward lets you be a lot more directed about things. Though it's not that bad of an idea; see Datalog for example :)
You know what, I changed my mind: unpacking of optional to Tensor should be done as forward inference.
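To illustrate the forward-vs-backward distinction being discussed, here is a small self-contained sketch of the optional-unpacking rule in both styles. The `CType`, `forward_inference`, and `backward_inference` names are simplified stand-ins for illustration, not the real classes and functions in tools/codegen/api/translate.py.
```
# Simplified stand-ins, not the real translate.py machinery.
from dataclasses import dataclass

@dataclass(frozen=True)
class CType:
    cpp: str   # e.g. "const c10::optional<Tensor> &"
    name: str  # binding name, e.g. "weight"

def forward_inference(ctx: dict) -> None:
    """Eagerly add every expression derivable from what is already in scope."""
    for key, expr in list(ctx.items()):
        if key.cpp == "const c10::optional<Tensor> &":
            # optional<Tensor> -> Tensor is cheap, so add it to the context up front
            ctx[CType("const Tensor &", key.name)] = (
                f"({expr}.has_value() ? *{expr} : at::Tensor())"
            )

def backward_inference(goal: CType, ctx: dict) -> str:
    """Solve for a goal type on demand via known conversion rules."""
    if goal in ctx:
        return ctx[goal]
    if goal.cpp == "const Tensor &":
        src = CType("const c10::optional<Tensor> &", goal.name)
        if src in ctx:
            return f"({ctx[src]}.has_value() ? *{ctx[src]} : at::Tensor())"
    raise RuntimeError(f"cannot produce {goal}")

# Both directions yield the same expression for this simple rule:
ctx = {CType("const c10::optional<Tensor> &", "weight"): "weight"}
print(backward_inference(CType("const Tensor &", "weight"), ctx))
forward_inference(ctx)
print(ctx[CType("const Tensor &", "weight")])
```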
```
assert len(tensor_opts) == 1
# specialized fast pass
stmts.append(f"""\
DispatchKey _dk = {tensor_opts[0].name}.computeDispatchKey();
```
I hope that `c10::detail::multi_dispatch_key_set` is just as good as this ;)
```
for case_key in backends:
    for dispatch_key in (case_key, DispatchKey.DefaultBackend, DispatchKey.Math):
        # FIXME: how do I get dispatch table for function with structured_delegate? Is it correct to
        # always statically dispatch to the delegate?
```
When there's a structured delegate, the dispatch table is automatically generated based on the out variant (https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md#structured-keyword-proposal; there is no dispatch table for `upsample_nearest1d` because it delegates its dispatch to `upsample_nearest1d_out`). We're still on the hook for generating wrapper functions for all the variants.

I'm not sure if that helps you globally here, still reading.
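In codegen terms, that answer suggests the static-dispatch logic can look through the delegate: when a functional op carries a structured_delegate, consult the out variant's dispatch table to decide which backends it is effectively available on. The sketch below is a hypothetical helper with simplified field and kernel names, not the actual tools/codegen/gen.py logic.
```
# Hypothetical helper: derive a structured op's per-backend availability from
# the dispatch table of its out-variant delegate. Field names are simplified.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class NativeFunction:
    name: str
    dispatch: Dict[str, str] = field(default_factory=dict)  # backend -> kernel
    structured_delegate: Optional[str] = None                # e.g. "upsample_nearest1d.out"

def effective_dispatch(f: NativeFunction,
                       index: Dict[str, NativeFunction]) -> Dict[str, str]:
    """Return the backend->kernel table that static dispatch should consult."""
    if f.structured_delegate is not None:
        # No dispatch table of its own: availability comes from the out variant.
        return index[f.structured_delegate].dispatch
    return f.dispatch

out = NativeFunction("upsample_nearest1d.out",
                     dispatch={"CPU": "upsample_nearest1d_out_cpu",
                               "CUDA": "upsample_nearest1d_out_cuda"})
fn = NativeFunction("upsample_nearest1d",
                    structured_delegate="upsample_nearest1d.out")
index = {f.name: f for f in (out, fn)}
print(effective_dispatch(fn, index))  # the out variant's CPU/CUDA table
```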
Summary: Pull Request resolved: #51590

This PR backports a subset of Jiakai's changes from #51554 that adds support for at::cpu in non-structured kernels. The unusual bits:

- Need to add a new forward inference rule for doing conversions of const optional<Tensor>& to const Tensor&
- Need to give the wrapper functions a prefix so that the call to the wrapper is not ambiguous

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D26209871
Pulled By: ezyang
fbshipit-source-id: 8162686039675ab92a2af7a14f6b18941f8944df
Summary: Pull Request resolved: #51957

This is a simplified version of #51554. Compared to #51554, this version only supports statically dispatching to a specific backend. The benefit is that it skips the dispatch key computation logic and thus has less framework overhead. The downside is that if the input tensors do not match the specified backend it will throw an error instead of falling back to regular dispatch.

Sample code:
```
Tensor empty(IntArrayRef size, TensorOptions options, c10::optional<MemoryFormat> memory_format) {
  return at::cpu::empty(size, options, memory_format);
}

// aten::conj(Tensor(a) self) -> Tensor(a)
Tensor conj(const Tensor & self) {
  return at::math::conj(self);
}

// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_out(Tensor & out, const Tensor & self) {
  return at::cpu::conj_out(out, self);
}

// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_outf(const Tensor & self, Tensor & out) {
  return at::cpu::conj_out(out, self);
}

// aten::_conj(Tensor self) -> Tensor
Tensor _conj(const Tensor & self) {
  return at::defaultbackend::_conj(self);
}
```

For ops without the specific backend dispatch, it will throw an error:
```
// aten::_use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool
bool _use_cudnn_ctc_loss(const Tensor & log_probs, const Tensor & targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t blank) {
  TORCH_CHECK(false, "Static dispatch does not support _use_cudnn_ctc_loss for CPU.");
}
```

Differential Revision: D26337857
Test Plan: Imported from OSS
Reviewed By: bhosmer
Pulled By: ljk53
fbshipit-source-id: a8e95799115c349de3c09f04a26b01d21a679364
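The generation rule of this simplified version can be summarized as: prefer the selected backend's kernel, then DefaultBackend, then Math, and emit a hard TORCH_CHECK failure when none exists instead of falling back to the dispatcher. The helper below is a hedged sketch with a hypothetical name, not the code actually landed in tools/codegen.
```
# Hedged sketch of the #51957 rule: static call if a kernel exists for the
# chosen backend (or DefaultBackend/Math), otherwise a hard error, no fallback.
from typing import Dict

NAMESPACE = {"CPU": "at::cpu", "DefaultBackend": "at::defaultbackend", "Math": "at::math"}

def gen_body(op: str, args: str, dispatch: Dict[str, str], backend: str = "CPU") -> str:
    for key in (backend, "DefaultBackend", "Math"):
        if key in dispatch:
            return f"  return {NAMESPACE[key]}::{op}({args});"
    return (f'  TORCH_CHECK(false, "Static dispatch does not support '
            f'{op} for {backend}.");')

print(gen_body("conj", "self", {"Math": "conj"}))
print(gen_body("_use_cudnn_ctc_loss",
               "log_probs, targets, input_lengths, target_lengths, blank",
               {"CUDA": "_use_cudnn_ctc_loss"}))
```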
Stack from ghstack:

Differential Revision: D26197326