Function request: support returning multiple values in CPU kernel #51108

Closed
RockingJavaBean opened this issue Jan 26, 2021 · 1 comment
Labels
function request A request for a new function or the addition of new arguments/modes to an existing function. module: reductions module: TensorIterator triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

RockingJavaBean (Contributor) commented Jan 26, 2021

🚀 Feature

Allow users to return multiple values in the CPU kernel.

Motivation

In the NumPy-like functionality rollup list #38349, there are some functions that require returning two tensors, such as divmod and frexp.

They are more complicated and less straightforward to implement than other NumPy-like functions, because the current CPU elementwise kernel helpers such as cpu_kernel, cpu_kernel_vec, and cpu_serial_kernel only support a single output tensor.

Adding a new kernel function in aten/src/ATen/native/cpu/Loops.h that supports multiple outputs could decrease the complexity of implementing such NumPy-like functions, and may help developers implement torch functions with multiple outputs more conveniently in the future.

Pitch

PR: #51097
Implement a new kernel function cpu_kernel_multiple_outputs. Instead of returning a single scalar value, the developer-supplied lambda returns its output values as a std::tuple.

Example code:

auto iter = at::TensorIteratorConfig()
  .add_output(out1)
  .add_output(out2)
  .add_input(in1)
  .add_input(in2)
  .build();
at::native::cpu_kernel_multiple_outputs(iter,
  [=](float a, float b) -> std::tuple<float, float> {
    float add = a + b;
    float mul = a * b;
    return std::tuple<float, float>(add, mul);
  }
);

The out1 tensor will be equal to torch.add(in1, in2), while out2 will be equal to torch.mul(in1, in2); the i-th element of the returned tuple is written to the i-th output registered on the iterator.
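As a further illustration, frexp from the rollup list could be written against the proposed API roughly as sketched below. This is a hypothetical sketch, not the PR code: the tensor names (mantissa, exponent, self), the int32 exponent dtype, and the check_all_same_dtype call are illustrative assumptions.

// Hypothetical sketch of a frexp kernel using the proposed API.
// Assumes a float `mantissa` output, an int32 `exponent` output, and a
// float `self` input; uses std::frexp from <cmath> and std::tuple from <tuple>.
auto frexp_iter = at::TensorIteratorConfig()
  .add_output(mantissa)
  .add_output(exponent)
  .add_input(self)
  .check_all_same_dtype(false)  // the two outputs have different dtypes
  .build();
at::native::cpu_kernel_multiple_outputs(frexp_iter,
  [](float a) -> std::tuple<float, int32_t> {
    int exp_i;
    float frac = std::frexp(a, &exp_i);
    return std::make_tuple(frac, static_cast<int32_t>(exp_i));
  }
);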

Alternatives

Instead of leveraging the CPU kernel helpers, developers have to use the more primitive TensorIterator for_each function.
This requires manually handling details such as data-type casting and offset calculations via strides, as in the sketch below.
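For comparison, the add/mul example above could be written against the raw for_each API roughly as sketched below. This is a rough sketch that assumes the one-dimensional for_each overload, float dtypes for all four tensors, and no dynamic casting; real code would also need to dispatch on dtype.

// Rough sketch of the for_each alternative for the add/mul example.
// Operands are ordered outputs first, then inputs, and strides are in bytes,
// so every pointer cast and offset has to be handled by hand.
auto iter = at::TensorIteratorConfig()
  .add_output(out1)
  .add_output(out2)
  .add_input(in1)
  .add_input(in2)
  .build();
iter.for_each([](char** data, const int64_t* strides, int64_t n) {
  for (int64_t i = 0; i < n; i++) {
    float a = *reinterpret_cast<float*>(data[2] + i * strides[2]);
    float b = *reinterpret_cast<float*>(data[3] + i * strides[3]);
    *reinterpret_cast<float*>(data[0] + i * strides[0]) = a + b;
    *reinterpret_cast<float*>(data[1] + i * strides[1]) = a * b;
  }
});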

Additional context

cc @mruberry @heitorschueroff

heitorschueroff added the function request, module: reductions, module: TensorIterator, and triaged labels on Jan 26, 2021
mruberry (Collaborator) commented Feb 1, 2021

Providing better support for GPU and CPU functions that return multiple values is a great idea. We should also make sure that support is consistent between the two.
