
[Tracking] Spec clarifications #3651

Open
jcwchen opened this issue Aug 13, 2021 · 22 comments
Labels
contributions welcome, documentation (Issues related to ONNX documentation), enhancement (Request for new feature or operator), spec clarification (Clarification of the ONNX spec needed), tracking (Tracking issues)

Comments

@jcwchen (Member) commented Aug 13, 2021

Creating this issue to track work items for spec clarifications. There are already multiple GitHub issues about op specs being unclear or having errors in expected outputs. I will collect them in this issue. Feel free to add more work items here. Thanks!

@jcwchen added the documentation and enhancement labels on Aug 13, 2021
@jcwchen pinned this issue on Aug 13, 2021
@jcwchen (Member Author) commented Aug 13, 2021

Old issues which still look valid:

Pooling related

Conv/ConvTranspose related

RNN/LSTM related

Other operators

IR related

@AlexandreEichenberger (Contributor)

The Conv operation does not specify whether M (channels out) must be a multiple of the number of groups. TF Keras and PyTorch have this restriction. #3641
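A minimal sketch of why the restriction arises (the shapes here are hypothetical): each group maps C/group input channels to M/group output channels, so both C and M need to divide evenly by group.

```python
import numpy as np

# Hypothetical shapes for a grouped Conv: each group convolves C/group input
# channels and produces M/group output channels.
N, C, H, W = 1, 8, 16, 16     # input:  (N, C, H, W)
M, group = 6, 4               # weight: (M, C/group, kH, kW)

# A simple divisibility check mirroring the restriction TF Keras / PyTorch enforce.
if M % group != 0 or C % group != 0:
    raise ValueError(f"M={M} and C={C} must both be multiples of group={group}")
```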

@kraiskil

The difference between Add and Sum could be more explicit, and it would be nice to have that difference mentioned in both operators' documentation.
#339 says there were some differences in broadcasting support, but that no longer seems to be the case.
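For reference, a small numpy sketch of the practical difference as I understand it: Add is strictly binary, Sum is variadic, and both now use Numpy-style broadcasting.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([10.0, 20.0])   # broadcastable to a's shape
c = np.array(100.0)          # scalar, also broadcastable

add_result = a + b           # Add: exactly two inputs
sum_result = a + b + c       # Sum: any number of inputs (variadic)

print(add_result)
print(sum_result)
```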

@tungld commented Oct 5, 2021

Expected outputs for Round look incorrect: #3755.

This one has been resolved. (Edited by @jcwchen)

@gramalingam (Contributor)

The description of the Cast operator could be improved (especially for conversions involving bool or string).

@jcwchen (Member Author) commented Dec 3, 2021

[Spec issue] Rounding or truncation behavior for Cast operator is not specified in the spec #3876
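A small numpy illustration of why this matters: truncation and rounding give different integers for the same float input, so whichever behavior the spec picks is observable.

```python
import numpy as np

x = np.array([1.5, 2.5, -1.5, 2.7], dtype=np.float32)

truncated = x.astype(np.int32)         # C-style truncation toward zero
rounded = np.rint(x).astype(np.int32)  # round half-to-even, then convert

print(truncated)  # [ 1  2 -1  2]
print(rounded)    # [ 2  2 -2  3]
```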

@gramalingam (Contributor)

DequantizeLinear/QuantizeLinear: Spec should be more explicit about whether the inputs must be 2D or can be N-D, and explain the behavior in the N-D case better.
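A hedged sketch of what per-axis dequantization on an N-D input could look like; the helper name and the axis handling are my reading, not necessarily the spec's intent.

```python
import numpy as np

def dequantize_per_axis(x, scale, zero_point, axis=1):
    """Sketch: broadcast 1-D scale/zero_point along `axis` of an N-D tensor."""
    shape = [1] * x.ndim
    shape[axis] = -1
    s = scale.reshape(shape)
    z = zero_point.reshape(shape)
    return (x.astype(np.float32) - z.astype(np.float32)) * s

x = np.random.randint(0, 256, size=(2, 3, 4, 4), dtype=np.uint8)
scale = np.array([0.1, 0.2, 0.3], dtype=np.float32)
zero_point = np.array([128, 128, 128], dtype=np.uint8)
print(dequantize_per_axis(x, scale, zero_point, axis=1).shape)  # (2, 3, 4, 4)
```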

@gramalingam (Contributor)

QuantizeLinear: Spec should be clearer about the precision used for the different operators when using mixed-precision inputs (in computing x / y_scale + y_zero_point).
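To make the ambiguity concrete, here is one plausible reading (float32 intermediate, round half-to-even, then saturate); the spec could equally intend a different intermediate precision or rounding mode, which is exactly the open question.

```python
import numpy as np

def quantize_linear(x, y_scale, y_zero_point):
    # Assumption: divide in float32, round half-to-even, add the zero point,
    # then saturate to the zero-point's dtype (uint8 here).
    v = np.rint(x.astype(np.float32) / np.float32(y_scale)) + np.int32(y_zero_point)
    info = np.iinfo(y_zero_point.dtype)
    return np.clip(v, info.min, info.max).astype(y_zero_point.dtype)

x = np.array([-1.0, 0.0, 0.5, 254.0], dtype=np.float32)
print(quantize_linear(x, y_scale=1.0, y_zero_point=np.uint8(2)))  # [  1   2   2 255]
```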

@jcwchen added the tracking label on Jan 20, 2022
@volcacius

LSTM: when input_forget=True, the spec doesn't specify which set of weights and biases ends up being used. It should be clarified that it's implementation dependent (e.g. ONNX Runtime uses the input ones and discards the forget ones: https://github.com/microsoft/onnxruntime/blob/ef7b4dc05cae1084546ff7caba048a2f908ac1d8/onnxruntime/test/providers/cpu/rnn/LSTM.py#L208).

@garymm (Contributor) commented Feb 10, 2022

Prelu: The constraint "rank(slope) == rank(X) OR slope is unidirectionally broadcastable to X" makes sense, but the spec doesn't explicitly say that. That is, it seems to allow rank(slope) > rank(X), which doesn't seem like it makes sense (if it does, could someone please explain?).

https://github.com/onnx/onnx/blob/main/docs/Operators.md#Prelu
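For what it's worth, a numpy sketch of the case that does make sense (a lower-rank slope, unidirectionally broadcastable to X); the prelu helper here is my own illustration, not the spec's reference implementation.

```python
import numpy as np

def prelu(x, slope):
    # slope is expected to be unidirectionally broadcastable to x
    return np.where(x >= 0, x, slope * x)

x = np.random.randn(2, 3, 4, 5).astype(np.float32)
slope = np.full((3, 1, 1), 0.25, dtype=np.float32)  # per-channel slope, rank 3 < rank 4

print(prelu(x, slope).shape)  # (2, 3, 4, 5)
```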

@jcwchen (Member Author) commented Mar 25, 2022

I have the same question as #2303: "Do subgraph initializer names and input names shadow outer-scope names?" It would be better to clarify this in the IR spec.

@gramalingam (Contributor)

The specification of the "Pad" operator does not describe the intended behavior when there is an interaction between the use of negative pad-values (to truncate/slice) and the use of reflection (with a positive pad-value for the opposite side). Please see microsoft/onnxruntime#11828 for a detailed description.
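A numpy sketch of why the interaction is ambiguous: applying the negative pad (as a slice) before or after the reflection pad can produce different results, and the spec doesn't say which order is intended. numpy.pad itself rejects negative pads, so the slice is emulated explicitly here.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])
neg_begin, pos_end = 3, 4  # pads = [-3, 4] with mode='reflect'

# Interpretation 1: slice first, then reflect-pad the remainder.
out1 = np.pad(data[neg_begin:], (0, pos_end), mode="reflect")

# Interpretation 2: reflect-pad first, then slice off the beginning.
out2 = np.pad(data, (0, pos_end), mode="reflect")[neg_begin:]

print(out1)
print(out2)  # can differ from out1
```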

@gramalingam (Contributor)

The storage_order attribute of the MaxPool op seems out of place in the ONNX spec. Does the op need to be updated to eliminate/deprecate this attribute?

@AlexandreEichenberger (Contributor)

Minor point, but the naming of inputs matters in our onnx-mlir code base, and we had to write extra templates because A & B for MatMul, MatMulInteger, and QLinearMatMul are capitalized differently.

Use A & B:

Use a and b:

I know this may not matter to most, and it is not a big deal, but nevertheless a name discrepancy between related ops whose shape inference, for example, could otherwise be "commoned" creates minor impediments due to naming conventions.

If this can be fixed without triggering incompatibilities between before/after renaming, then I am for it. If it would trigger incompatibilities, then let us just remember to use names as consistently as possible for future ops.

Thanks

@gramalingam (Contributor)

> If this can be fixed without triggering incompatibilities between before/after renaming, then I am for it. If it would trigger incompatibilities, then let us just remember to use names as consistently as possible for future ops.

My own guess would be that the names should not matter. But, paradoxically, it looks like they do matter for onnx-mlir.

I am in favor of using uniform naming conventions. If anyone has any concerns about changing parameter names of ops (without changing their version-number), please let us know.

@AlexandreEichenberger (Contributor)

@gramalingam thanks for your answers

Found another case: Conv/ConvTranspose use X and W, but ConvInteger/QLinearConv use x and w.

@gramalingam (Contributor)

Another couple of issues relating to Reduce* ops:

(a) I assume that the Reduce* ops should support the special-case of zero-rank tensors (with a single element). It would be good if the spec explicitly clarifies this.

(b) The spec does not indicate what happens if the input is a tensor with zero elements: e.g., a tensor of shape (100, 0) with reduction along the axis with size 0 (axis 1 in above example).
(i) The ideal answer for ops like Sum is 0, and Prod is 1. (The identity element for the op.)
(ii) However, for other ops like Max or Min, it can be complicated. An ideal answer for Max() is minus infinity, but that is valid only for float-like types. For integral types, it could be min-int.

In any case, some clarification whether this is allowed would be useful.
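numpy's behavior for the easy cases, for comparison (Max/Min simply raise, which is why the identity-element question is the tricky part):

```python
import numpy as np

empty = np.zeros((100, 0), dtype=np.float32)

print(np.sum(empty, axis=1)[:3])   # zeros: identity of addition
print(np.prod(empty, axis=1)[:3])  # ones: identity of multiplication
try:
    np.max(empty, axis=1)
except ValueError as e:
    print("max over an empty axis raises:", e)
```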

@justinchuby (Contributor)

Max and Min can both take rank-0 tensors as inputs, but that was not clear from the spec.
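A tiny sketch of the rank-0 case with numpy semantics, which is what I would expect the ops to follow:

```python
import numpy as np

x = np.array(3.0, dtype=np.float32)  # rank-0 tensor
y = np.array(5.0, dtype=np.float32)  # rank-0 tensor

print(np.maximum(x, y))  # 5.0, also rank-0
print(np.minimum(x, y))  # 3.0, also rank-0
```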

@edgchen1 (Contributor)

Regarding the Pad op spec (onnx/docs/Operators.md, lines 17129 to 17142 at 0e9deba):

### <a name="Pad"></a><a name="pad">**Pad**</a>
Given a tensor containing the data to be padded (`data`), a tensor containing the number of start and end pad values for axis (`pads`), (optionally) a `mode`, and (optionally) `constant_value`,
a padded tensor (`output`) is generated.
The three supported `modes` are (similar to corresponding modes supported by `numpy.pad`):
1) `constant`(default) - pads with a given constant value as specified by `constant_value` (which defaults to 0, empty string, or False)
2) `reflect` - pads with the reflection of the vector mirrored on the first and last values of the vector along each axis
3) `edge` - pads with the edge values of array
4) `wrap` - wrap-around padding as if the data tensor forms a torus

  • The first sentence should probably also mention the optional axes input.
  • "The three supported modes are..." is stated, but then four are given (all four map onto numpy.pad modes; see the sketch below).
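All four listed modes correspond directly to numpy.pad modes, which makes the count mismatch easy to see:

```python
import numpy as np

data = np.array([1, 2, 3])
for mode in ("constant", "reflect", "edge", "wrap"):
    print(mode, np.pad(data, (2, 2), mode=mode))
```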

@edgchen1 (Contributor) commented Jun 13, 2023

Regarding the MeanVarianceNormalization op spec (onnx/docs/Operators.md, lines 15221 to 15226 at 0f53636):

#### Attributes
<dl>
<dt><tt>axes</tt> : list of ints (default is ['0', '2', '3'])</dt>
<dd>A list of integers, along which to reduce. The default is to caculate along axes [0,2,3] for calculating mean and variance along each channel. Two variables with the same C-coordinate are associated with the same mean and variance.</dd>
</dl>

  • How should the default axes value be interpreted for inputs with fewer than four dimensions? The "C" reference seems specific to NCHW. This op does not otherwise seem limited to 4D input, though (see the sketch after this list).
  • Are negative axis values supported?
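A rough numpy sketch of what the default axes=[0, 2, 3] computes for a 4D NCHW input, which is what makes the lower-rank question unclear; the normalization formula and the eps term here are my paraphrase for the sketch, not a verified copy of the spec.

```python
import numpy as np

def mvn(x, axes=(0, 2, 3), eps=1e-9):
    # Per-channel mean/variance for NCHW when axes=(0, 2, 3); with fewer than
    # 4 dims these default axes would be out of range, hence the question above.
    # eps is added only for numerical safety in this sketch.
    mean = np.mean(x, axis=axes, keepdims=True)
    var = np.var(x, axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(2, 3, 4, 4).astype(np.float32)
print(mvn(x).shape)  # (2, 3, 4, 4)
```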

@justinchuby (Contributor)
github-merge-queue bot pushed a commit that referenced this issue Sep 23, 2023
Clarify the behavior of the reduction-ops when reducing an empty set of
values, by updating the test-cases and documentation.

It is useful in various edge-cases. For example, ReduceProd should
return 1 for an empty tensor, and ReduceSum should return 0 for an empty
tensor.

(See #3651 (comment))

### Summary

ReduceSum ({}) = 0
ReduceProd ({}) = 1
ReduceMin ({}) = Max. value of datatype
ReduceMax ({}) = Min. value of datatype
ReduceLogSum ({}) = minus infinity, or undefined for datatypes without minus infinity
ReduceLogSumExp ({}) = minus infinity, or undefined for datatypes without minus infinity
ReduceMean ({}) = Undefined

---------

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: gramalingam <gramalingam@users.noreply.github.com>
@justinchuby unpinned this issue on Apr 10, 2024