Function request: support returning multiple values in CPU kernel #51108
Labels
function request
A request for a new function or the addition of new arguments/modes to an existing function.
module: reductions
module: TensorIterator
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🚀 Feature
Allow users to return multiple values in the CPU kernel.
Motivation
In the NumPy-like functionality rollup list #38349, there are some functions that require returning two tensors, such as divmod and frexp.
They are more complicated and less straightforward to implement than other NumPy-like functions, because the current CPU elementwise kernels, such as `cpu_kernel`, `cpu_kernel_vec` and `cpu_serial_kernel`, only support one output tensor.
Adding a new kernel function in `aten/src/ATen/native/cpu/Loops.h` that supports multiple outputs could decrease the complexity of implementing such NumPy-like functions, and may help developers implement torch functions with multiple outputs more conveniently in the future.
Pitch
PR: #51097
Implement a new kernel function `cpu_kernel_multiple_outputs`. Instead of a scalar-type output, it requires the developer to return the output values using `std::tuple`.
Example code:
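A minimal standalone sketch of the idea, assuming plain `std::vector<float>` buffers instead of tensors. It is not the ATen implementation: `elementwise_two_outputs` and `add_and_mul` are hypothetical names introduced only for illustration, and the sketch mirrors just the shape of a lambda that returns a `std::tuple` with one value per output.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a multiple-output elementwise kernel.
// Applies `op` to each pair of inputs and scatters the returned tuple
// into the two output buffers. Not part of ATen.
template <typename T, typename Op>
void elementwise_two_outputs(const std::vector<T>& in1,
                             const std::vector<T>& in2,
                             std::vector<T>& out1,
                             std::vector<T>& out2,
                             Op op) {
  assert(in1.size() == in2.size());  // no broadcasting in this sketch
  out1.resize(in1.size());
  out2.resize(in1.size());
  for (std::size_t i = 0; i < in1.size(); ++i) {
    // `op` returns std::tuple<T, T>; std::tie unpacks it into both outputs.
    std::tie(out1[i], out2[i]) = op(in1[i], in2[i]);
  }
}

// Usage with the names from the issue: out1 receives the sums,
// out2 receives the products.
inline void add_and_mul(const std::vector<float>& in1,
                        const std::vector<float>& in2,
                        std::vector<float>& out1,
                        std::vector<float>& out2) {
  elementwise_two_outputs(
      in1, in2, out1, out2,
      [](float a, float b) -> std::tuple<float, float> {
        return std::make_tuple(a + b, a * b);
      });
}
```

In this sketch the caller never touches indices or offsets; the lambda only describes the per-element computation, which is the convenience the proposed kernel aims for.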
The `out1` tensor will equal `torch.add(in1, in2)`, while `out2` will equal `torch.mul(in1, in2)`.
Alternatives
Instead of leveraging the CPU kernel functions, developers have to use the more primitive `for_each` TensorIterator function. This requires developers to manually handle logic such as data type casting and offset calculation via strides.
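To illustrate the manual bookkeeping this involves, here is a standalone sketch of a byte-strided loop body. It uses only the C++ standard library and does not call ATen. The function name `add_mul_strided_loop` is hypothetical, and the sketch assumes that the two outputs come first in `data`, that strides are given in bytes, and that every operand is `float` (so no dtype casting is handled).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical loop body in the style of a byte-strided callback.
// Assumed layout: data[0], data[1] are outputs; data[2], data[3] are inputs.
// strides[k] is the byte step between consecutive elements of operand k.
inline void add_mul_strided_loop(char** data,
                                 const std::int64_t* strides,
                                 std::int64_t n) {
  assert(n >= 0);
  char* out1 = data[0];
  char* out2 = data[1];
  const char* in1 = data[2];
  const char* in2 = data[3];
  for (std::int64_t i = 0; i < n; ++i) {
    float a;
    float b;
    // Manual offset calculation: element i lives at base + i * stride.
    std::memcpy(&a, in1 + i * strides[2], sizeof(float));
    std::memcpy(&b, in2 + i * strides[3], sizeof(float));
    const float sum = a + b;
    const float prod = a * b;
    std::memcpy(out1 + i * strides[0], &sum, sizeof(float));
    std::memcpy(out2 + i * strides[1], &prod, sizeof(float));
  }
}
```

Every operand needs its own pointer and stride arithmetic, and supporting other dtypes would require further casting code. This is the boilerplate that a tuple-returning kernel helper would hide.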
Additional context
`gpu_kernel_multiple_outputs` through PR "Implement `gpu_kernel_multiple_outputs`" #37969.
cc @mruberry @heitorschueroff