
Conversation

@ljk53 ljk53 commented Feb 2, 2021

Stack from ghstack:


The new static dispatch and c10 registration can work together.

It generates static dispatch code for the selected backends (if set) and
falls back to regular dispatch for the rest. This way, it can be used to
reduce the dispatcher's overhead for perf-sensitive use cases without
compromising functionality.

If the `static_dispatch_backends` flag is not set, the behavior is the
same as before.

Added back the E2E mobile static dispatch CI for testing purposes.

This PR doesn't try to optimize mobile build size yet. We can introduce
separate build flags to disable the fallback logic, so that the linker
can strip out unused op invocation code.

Static dispatch for manually registered ops / custom ops / autograd
kernels is not handled by this PR. We can work on these special cases
incrementally.

- Sample code (with static dispatch backend = CPU):
```
// aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
Tensor Tensor::add(const Tensor & other, Scalar alpha) const {
    DispatchKeySet _dk_set = c10::detail::multi_dispatch_key_set(other, const_cast<Tensor&>(*this));
    DispatchKey _dk = c10::impl::dispatchTypeId(_dk_set, DispatchKeySet::FULL);
    switch (_dk) {
    case DispatchKey::BackendSelect:
        // fallthrough
    case DispatchKey::CPU:
        return at::cpu::add(const_cast<Tensor&>(*this), other, alpha);
    default:
        // fallback to regular dispatch
        // TORCH_CHECK(false, "Unsupported static dispatch", _dk);
        break;
    }
    static auto op = c10::Dispatcher::singleton()
        .findSchemaOrThrow("aten::add", "Tensor")
        .typed<Tensor (const Tensor &, const Tensor &, Scalar)>();
    return op.call(const_cast<Tensor&>(*this), other, alpha);
}
```

- If the op has a BackendSelect kernel, then it falls back to c10 dispatch:
```
// aten::arange(Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
Tensor arange(Scalar end, const TensorOptions & options) {
    DispatchKey _dk = options.computeDispatchKey();

    switch (_dk) {
    case DispatchKey::CPU:
        return at::math::arange(end, options);
    default:
        // fallback to regular dispatch
        // TORCH_CHECK(false, "Unsupported static dispatch", _dk);
        break;
    }
    static auto op = c10::Dispatcher::singleton()
        .findSchemaOrThrow("aten::arange", "")
        .typed<Tensor (Scalar, c10::optional<ScalarType>, c10::optional<Layout>, c10::optional<Device>, c10::optional<bool>)>();
    return op.call(end, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt());
}
```

- If the op only has a math kernel and there is no tensor argument / tensor option
  to infer the dispatch key from, then it always dispatches to the math kernel (only
  when `static_dispatch_backends` is set).
```
// aten::_nnpack_available() -> bool
bool _nnpack_available() {
    return at::math::_nnpack_available();
    static auto op = c10::Dispatcher::singleton()
        .findSchemaOrThrow("aten::_nnpack_available", "")
        .typed<bool ()>();
    return op.call();
}
```

- If the op doesn't have a CPU backend, then nothing changes:
```
// aten::quantized_batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor mean, Tensor var, float eps, float output_scale, int output_zero_point) -> Tensor
Tensor quantized_batch_norm(const Tensor & input, const c10::optional<Tensor> & weight, const c10::optional<Tensor> & bias, const Tensor & mean, const Tensor & var, double eps, double output_scale, int64_t output_zero_point) {
    static auto op = c10::Dispatcher::singleton()
        .findSchemaOrThrow("aten::quantized_batch_norm", "")
        .typed<Tensor (const Tensor &, const c10::optional<Tensor> &, const c10::optional<Tensor> &, const Tensor &, const Tensor &, double, double, int64_t)>();
    return op.call(input, weight, bias, mean, var, eps, output_scale, output_zero_point);
}
```
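
To connect the samples above to the generation side, here is a minimal, self-contained sketch of the per-op decision; the kernel table and helper names (`KERNELS`, `wrapper_body`) are illustrative stand-ins, not the real tools/codegen data model:

```
# Sketch only: which wrapper body does the codegen emit for a given op?
# Ops without tensor arguments (e.g. _nnpack_available) skip the switch in the
# real samples above; that special case is glossed over here.
from typing import Dict, List

# Toy "dispatch table": op name -> dispatch keys that have kernels.
KERNELS: Dict[str, List[str]] = {
    'aten::add.Tensor': ['CPU', 'CUDA', 'Math'],
    'aten::_nnpack_available': ['Math'],
    'aten::quantized_batch_norm': ['QuantizedCPU'],
}

def wrapper_body(op: str, static_dispatch_backends: List[str]) -> str:
    if not static_dispatch_backends:
        # Flag unset: behavior unchanged, plain c10 dispatcher call.
        return f'{op}: c10 dispatcher call only'
    cases = []
    for backend in static_dispatch_backends:
        # Prefer the backend kernel, then DefaultBackend, then Math.
        for key in (backend, 'DefaultBackend', 'Math'):
            if key in KERNELS.get(op, []):
                cases.append(f'case DispatchKey::{backend}: return at::{key.lower()}::...;')
                break
    if not cases:
        # No selected backend can serve this op (e.g. quantized-only ops).
        return f'{op}: c10 dispatcher call only'
    switch = 'switch (_dk) { ' + ' '.join(cases) + ' default: break; }'
    return f'{op}: {switch} then fall back to the c10 dispatcher'

for op in KERNELS:
    print(wrapper_body(op, ['CPU']))
```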

Differential Revision: [D26197326](https://our.internmc.facebook.com/intern/diff/D26197326)
facebook-github-bot commented Feb 2, 2021

💊 CI failures summary and remediations

As of commit 8825104 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed



ljk53 added a commit that referenced this pull request Feb 2, 2021
The new static dispatch and c10 registration can work together.

ghstack-source-id: 28364e2
Pull Request resolved: #51554
t = b.ctype
if isinstance(t, ConstRefCType) and isinstance(t.elem, OptionalCType) and \
        isinstance(t.elem.elem, BaseCType) and t.elem.elem.type == 'Tensor':
    ctx[ConstRefCType(BaseCType("Tensor", b.name))] = f'({b.name}.has_value() ? *{b.name} : at::Tensor())'
@ezyang ezyang Feb 2, 2021


nit: Today, functionally it doesn't make a difference, but it would be better to put this translation rule inside solve itself, so that we are still uniformly doing backward inference. Because the rule here is very simple it can be done with forward and backward, so it's mostly a uniformity thing.


(The trouble with forward inference is when you start stuffing the context with tons and tons of possible conversions "just because they might help"; backward lets you be a lot more directed about things. Though it's not that bad of an idea; see Datalog for example :)


You know what I changed my mind, unpacking of optional to tensor should be done as forward inference.
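
For readers outside the codegen, a toy sketch of the two inference styles being discussed; the names and type strings are simplified stand-ins, not the real translate.py API:

```
# Illustrative only. Forward inference: pre-derive extra bindings from what the
# context already has. Backward inference: start from the goal type and search
# the context for something that satisfies it.
from typing import Dict

def forward_infer(ctx: Dict[str, str]) -> Dict[str, str]:
    derived = dict(ctx)
    for name, ctype in ctx.items():
        if ctype == 'const c10::optional<Tensor> &':
            # i.e. emit ({name}.has_value() ? *{name} : at::Tensor())
            derived[f'{name}_unwrapped'] = 'const Tensor &'
    return derived

def solve(goal_ctype: str, ctx: Dict[str, str]) -> str:
    # Goal-directed (backward) lookup.
    for name, ctype in ctx.items():
        if ctype == goal_ctype:
            return name
    raise RuntimeError(f'cannot satisfy goal: {goal_ctype}')

ctx = {'weight': 'const c10::optional<Tensor> &'}
print(solve('const Tensor &', forward_infer(ctx)))  # -> weight_unwrapped
```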

assert len(tensor_opts) == 1
# specialized fast pass
stmts.append(f"""\
DispatchKey _dk = {tensor_opts[0].name}.computeDispatchKey();

I hope that c10::detail::multi_dispatch_key_set is just as good as this ;)

for case_key in backends:
    for dispatch_key in (case_key, DispatchKey.DefaultBackend, DispatchKey.Math):
        # FIXME: how do I get dispatch table for function with structured_delegate? Is it correct to
        # always statically dispatch to the delegate?

When there's a structured delegate, the dispatch table is automatically generated based on the out variant (https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md#structured-keyword-proposal ; there is no dispatch table for upsample_nearest1d because it delegates its dispatch to upsample_nearest1d_out). We're still on the hook for generating wrapper functions for all the variants.

I'm not sure if that helps you globally here, still reading.
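
A toy sketch of the lookup being asked about in the FIXME: when an op has a structured_delegate, its kernels can be resolved through the out variant's dispatch table (the table contents below are illustrative, not the real registrations):

```
# Illustrative only: the functional variant has no dispatch table of its own;
# it delegates to the out variant, which carries the per-backend kernels.
from typing import Dict, Optional

DISPATCH_TABLES: Dict[str, Dict[str, str]] = {
    'aten::upsample_nearest1d.out': {'CPU': 'upsample_nearest1d_out_cpu',
                                     'CUDA': 'upsample_nearest1d_out_cuda'},
}
STRUCTURED_DELEGATE: Dict[str, str] = {
    'aten::upsample_nearest1d': 'aten::upsample_nearest1d.out',
}

def effective_dispatch_table(op: str) -> Optional[Dict[str, str]]:
    if op in DISPATCH_TABLES:
        return DISPATCH_TABLES[op]
    delegate = STRUCTURED_DELEGATE.get(op)
    return DISPATCH_TABLES.get(delegate) if delegate else None

# The functional op inherits the out variant's table:
print(effective_dispatch_table('aten::upsample_nearest1d'))
```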

ezyang added a commit that referenced this pull request Feb 2, 2021
This PR backports a subset of Jiakai's changes from
#51554 that adds support
for at::cpu in non-structured kernels.

The unusual bits:

- Need to add a new forward inference rule for doing conversions
  of const optional<Tensor>& to const Tensor&
- Need to give the wrapper functions a prefix so that the call to
  the wrapper is not ambiguous

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ezyang added a commit that referenced this pull request Feb 2, 2021
ghstack-source-id: 4dbeaf2
Pull Request resolved: #51590
ezyang added a commit that referenced this pull request Feb 3, 2021
Differential Revision: [D26209871](https://our.internmc.facebook.com/intern/diff/D26209871)

ezyang added a commit that referenced this pull request Feb 3, 2021
ghstack-source-id: d65a982
Pull Request resolved: #51590
facebook-github-bot pushed a commit that referenced this pull request Feb 4, 2021
Summary:
Pull Request resolved: #51590

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D26209871

Pulled By: ezyang

fbshipit-source-id: 8162686039675ab92a2af7a14f6b18941f8944df
ljk53 added a commit that referenced this pull request Feb 9, 2021
The new static dispatch and c10 registration can work together.

ghstack-source-id: c0bb78f
Pull Request resolved: #51554
@ljk53 ljk53 closed this Feb 9, 2021
ljk53 added a commit that referenced this pull request Feb 9, 2021
This is a simplified version of #51554.

Compared to #51554, this version only supports statically dispatching to
a specific backend. The benefit is that it skips the dispatch key
computation logic and thus has less framework overhead. The downside is that
if input tensors do not match the specified backend it will throw an error
instead of falling back to regular dispatch.

Sample code:
```
Tensor empty(IntArrayRef size, TensorOptions options, c10::optional<MemoryFormat> memory_format) {
    return at::cpu::empty(size, options, memory_format);
}

// aten::conj(Tensor(a) self) -> Tensor(a)
Tensor conj(const Tensor & self) {
    return at::math::conj(self);
}

// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_out(Tensor & out, const Tensor & self) {
    return at::cpu::conj_out(out, self);
}

// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_outf(const Tensor & self, Tensor & out) {
    return at::cpu::conj_out(out, self);
}

// aten::_conj(Tensor self) -> Tensor
Tensor _conj(const Tensor & self) {
    return at::defaultbackend::_conj(self);
}
```

For ops without a kernel registered for the specified backend, it still uses the c10 dispatch, e.g.:
```
// aten::_use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool
bool _use_cudnn_ctc_loss(const Tensor & log_probs, const Tensor & targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t blank) {
    static auto op = c10::Dispatcher::singleton()
        .findSchemaOrThrow("aten::_use_cudnn_ctc_loss", "")
        .typed<bool (const Tensor &, const Tensor &, IntArrayRef, IntArrayRef, int64_t)>();
    return op.call(log_probs, targets, input_lengths, target_lengths, blank);
}
```

ljk53 added a commit that referenced this pull request Feb 9, 2021
ljk53 added a commit that referenced this pull request Feb 9, 2021
ljk53 added a commit that referenced this pull request Feb 9, 2021
ljk53 added a commit that referenced this pull request Feb 9, 2021
ghstack-source-id: f1dfc2e
Pull Request resolved: #51957
ljk53 added a commit that referenced this pull request Feb 19, 2021
ljk53 added a commit that referenced this pull request Feb 19, 2021
Same as the previous version, except that for ops without a kernel registered for the specified backend it now throws an error instead of using the c10 dispatch:
```
// aten::_use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool
bool _use_cudnn_ctc_loss(const Tensor & log_probs, const Tensor & targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t blank) {
    TORCH_CHECK(false, "Static dispatch does not support _use_cudnn_ctc_loss for CPU.");
}
```
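
A minimal sketch of the per-op decision in this single-backend mode, mirroring the samples above; the kernel table and helper name are illustrative, not the real codegen:

```
# Sketch only: direct call when the selected backend (or DefaultBackend/Math)
# has a kernel; otherwise emit an error stub instead of a c10 fallback.
from typing import Dict, List

KERNELS: Dict[str, List[str]] = {
    'aten::conj': ['Math'],
    'aten::conj.out': ['CPU', 'CUDA'],
    'aten::_use_cudnn_ctc_loss': ['CUDA'],
}

def single_backend_body(op: str, backend: str) -> str:
    for key in (backend, 'DefaultBackend', 'Math'):
        if key in KERNELS.get(op, []):
            # No dispatch key computation at all: call the kernel namespace directly.
            return f'return at::{key.lower()}::...;'
    name = op.split('::')[1]
    return f'TORCH_CHECK(false, "Static dispatch does not support {name} for {backend}.");'

for op in KERNELS:
    print(op, '->', single_backend_body(op, 'CPU'))
```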

Differential Revision: [D26337857](https://our.internmc.facebook.com/intern/diff/D26337857)

ljk53 added a commit that referenced this pull request Feb 19, 2021
ghstack-source-id: c40e6b6
Pull Request resolved: #51957
facebook-github-bot pushed a commit that referenced this pull request Feb 19, 2021
Summary:
Pull Request resolved: #51957


Differential Revision: D26337857

Test Plan: Imported from OSS

Reviewed By: bhosmer

Pulled By: ljk53

fbshipit-source-id: a8e95799115c349de3c09f04a26b01d21a679364
iramazanli pushed a commit to iramazanli/pytorch that referenced this pull request Feb 23, 2021
@facebook-github-bot facebook-github-bot deleted the gh/ljk53/201/head branch March 11, 2021 15:14
aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021