
Clarify treatment of Scalars vs 1D tensors of size 1 #2428

Closed
askhade opened this issue Oct 31, 2019 · 10 comments

@askhade
Contributor

askhade commented Oct 31, 2019

The ONNX IR does not state whether a scalar should be treated the same as a 1D tensor of size 1, and the usage of these two is not consistent throughout the operator spec.

For example, when an input or attribute is defined as a scalar in the spec, I have seen 3 possible cases:

  1. some operators restrict these inputs/attributes to scalars in shape inference
  2. some operators don't check this in shape inference at all, and their unit tests use 1D tensors of size 1 instead of scalars (example: NMS)
  3. some operators allow the inputs to be either scalars or 1D tensors of size 1 (example: OneHot)

Which of the three is the correct behavior?
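To make the distinction concrete, here is a minimal numpy sketch (illustrative only, not tied to any particular operator):

```python
import numpy as np

scalar = np.array(3.5, dtype=np.float32)    # 0D tensor: shape (), rank 0
vector = np.array([3.5], dtype=np.float32)  # 1D tensor of size 1: shape (1,), rank 1

assert scalar.shape == ()
assert vector.shape == (1,)
assert scalar.ndim == 0 and vector.ndim == 1
```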

@askhade askhade added the operator Issues related to ONNX operators label Oct 31, 2019
@askhade askhade added the question Questions about ONNX label Jun 19, 2020
@chinhuang007
Contributor

chinhuang007 commented Jul 21, 2020

Any update on this? We currently follow the spec/doc strictly for OneHot, so we treat depth as a scalar only. If a 1D tensor of size 1 is also allowed, I think the spec should be updated to make that clear.

@gramalingam
Contributor

Ideally, a scalar should just be treated as a 0D tensor, and unit tests should just use 0D tensors for scalars. (Shape inference can check for it, but we need to remember that shape inference is not guaranteed to catch errors of this kind; sometimes it is not possible to statically figure out the rank of an input tensor.)

The difficult problem is figuring out to what extent existing users/models have taken a dependency on using 1D tensors for scalars. If there are such users/models, "fixing" this issue could be challenging.
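For reference, a hedged sketch of how the two shapes are declared with the onnx Python helpers (the value names here are made up for illustration):

```python
import numpy as np
from onnx import TensorProto, helper

# Shape [] declares a rank-0 ("scalar") tensor; shape [1] declares rank 1.
depth_0d = helper.make_tensor_value_info("depth", TensorProto.INT64, [])
depth_1d = helper.make_tensor_value_info("depth", TensorProto.INT64, [1])

# The corresponding numpy values a unit test would feed in:
scalar = np.array(5, dtype=np.int64)    # shape ()
vector = np.array([5], dtype=np.int64)  # shape (1,)
```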

@gramalingam
Contributor

I suggest that for any new operator or any operator update, we strictly specify and use 0D tensors for scalars. (Correcting existing operators for the sake of consistency may be tricky and has potential disadvantages. It may be better to do that as and when needed.)

@chinhuang007
Contributor

I also think "A scalar should be just treated as a 0D tensor. Unit tests should just use 0D tensors for scalars."

If those are the common guidelines for any new operator or operator update, I don't see a reason NOT to correct the existing inconsistency. Why should we have a double standard for old and new ops? As a standard spec, ONNX is supposed to provide a consistent specification for all operators.

Take OneHot as an example. The schema states that both forms are accepted:

// Input 'depth' must be a scalar or a single-element vector.

and the unit tests use a 1D tensor for depth as well. However, the doc describes depth as a scalar only:

"...dimension is specified by required scalar input 'depth'. The type of the output tensor is the same..."

and

"Scalar specifying the number of classes in one-hot tensor. This is also the size..."

If we don't want to change the implementation and unit tests, is it possible to just update the doc to include 1D tensors of size 1?
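For concreteness, a hedged sketch of the two calling conventions for OneHot (the tensor values are illustrative):

```python
import numpy as np
from onnx import helper

# OneHot takes indices, depth, and values; the open question is whether
# `depth` must be rank 0 or may also be rank 1 with a single element.
node = helper.make_node(
    "OneHot", inputs=["indices", "depth", "values"], outputs=["y"], axis=-1
)

indices = np.array([0, 2], dtype=np.int64)
values = np.array([0.0, 1.0], dtype=np.float32)  # [off_value, on_value]

depth_scalar = np.array(3, dtype=np.int64)    # 0D, what the doc describes
depth_vector = np.array([3], dtype=np.int64)  # 1D of size 1, what the tests use
```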

@gramalingam
Contributor

I agree that we should make the documentation, implementation, etc. self-consistent. The non-trivial question is when such a change requires bumping the version number of the operator. If we change the spec and implementation of OneHot to say that only 0D scalars are allowed, I think we would need to bump the version number of the op. What do you think? It turns out that bumping the version number of an op in this fashion also has a non-trivial cost, since exporters, converters, and backends have specific checks on version numbers.

@chinhuang007
Contributor

I would suggest the opposite. We keep the spec and implementation of OneHot to support both scalar and 1D tensor of size 1. Just update the documentation to reflect the schema and implementation. Does it require a version-number bump?

@gramalingam
Contributor

Yes, I had the same question. It is somewhat of a gray area. I personally think it is a reasonable compromise to update the documentation (in this case) without bumping the version number. It would be a good idea to get community agreement and document the general guidelines, just like in this PR: https://github.com/onnx/onnx/pull/2925/files ... what do you think?

But it is also useful to decide what we want in the long run, just like you commented

If those are the common guidelines for any new operator or operator update, I don't see a reason NOT to correct the existing inconsistency. Why should we have a double standard for old and new ops? As a standard spec, ONNX is supposed to provide a consistent specification for all operators.

Do we want to eventually align OneHot with other ops and restrict it to just scalars?

@chinhuang007
Contributor

I think the general guideline is that a doc update does not require bumping the version number, as long as there is no implementation (schema) change.

The second question should probably be answered by the operators SIG. I believe OneHot and NonMaxSuppression both have the loosely defined scalar. If the SIG decides to remove 1D tensors of size 1, we will need to change the schema and unit tests, and bump the version number.

@gramalingam
Contributor

I think it is more subtle/complex than that. A doc update that does not change the specification of the operator does not require bumping the version number. What we are discussing above is slightly tricky because the existing "specification" is inconsistent, and we need to fix the inconsistency; I don't know if we have any guidelines for such a case. In principle, this change would be problematic for any backend that assumes the input will be a 0D tensor, which is normally a reason for bumping the opset version number. However, given that the existing test cases include the 1D tensor case, we can argue that any valid backend should be able to handle the existing test cases, and so presumably can handle the 1D tensor case. So I agree with the suggestion that we don't need to bump the version number in this particular case.

@AlexandreEichenberger
Contributor

As I am implementing this feature in onnx-mlir, I would like to reiterate that when a value can arrive in different ways, it adds overhead. Sticking to a single approach (which appears to be the 0D tensor) is best, and normalizing both new and old ops is preferred.

If I compare ONNX with another standard I am familiar with (OpenMP), OpenMP has put a lot of emphasis on defining terms properly. The good thing about a definition of a term (e.g. scalar) is that you can scan the glossary of definitions once, and then you never have to think about it again. In ONNX, without such definitions, we have to read each individual operation's description very carefully to see whether there is an exception to the rule. That is burdensome to readers, as well as a burden on the spec writer, who has to precisely repeat the same definition for each operation.

Another example of this sort is the negative axis. I just spent 10 minutes trying to understand why the range of axis for OneHot is [-r-1, r] whereas for most other operators it is [-r, r-1]. If we had a clear definition for such intervals, all of the operation definitions would become simpler because they would all rely on the same unambiguous definition. Let me give it a try.

Bidirectional definition

A bidirectional-index(0, X) defines a value in the interval from 0 inclusively to X exclusively. A non-negative index (i>=0) points to the value in the interval counting up from 0; thus index i>=0 points to interval value i. A negative index (i<0) points to the value in the interval counting down from X; thus index i<0 points to interval value X+i. For example, index i=-1 points to interval value X-1, the last value included in the interval.

Valid indices in bidirectional-index(0, X) range from -X inclusively to X exclusively (i.e., -X to X-1 inclusive).
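A minimal sketch of this definition as code (the helper name is mine, not part of any ONNX API):

```python
def normalize_bidirectional_index(i: int, X: int) -> int:
    """Map a bidirectional-index(0, X) to its non-negative form."""
    # Valid inputs are -X <= i <= X-1; negative i counts down from X.
    if not -X <= i < X:
        raise ValueError(f"index {i} out of range for bidirectional-index(0, {X})")
    return i if i >= 0 else X + i
```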

Compress's axis definition

Axis along which to take slices; axis is given as a bidirectional-index(0, R) where R = rank(input). Axis 0 slices along the first dimension; axis R-1 or -1 slices along the last dimension.

OneHot axis definition

Axis along which the one-hot representation is added; axis is given as a bidirectional-index(0, R+1) where R = rank(indices). Axis 0 inserts the additional dimension before all of the indices dimensions; axis R or -1 inserts the additional dimension after all of the indices dimensions.
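With that one definition, both axis ranges fall out mechanically (again just a sketch, reusing the helper above):

```python
R = 3  # example rank

# Compress: axis is a bidirectional-index(0, R) -> valid range [-R, R-1]
assert normalize_bidirectional_index(-R, R) == 0
assert normalize_bidirectional_index(R - 1, R) == R - 1

# OneHot: axis is a bidirectional-index(0, R+1) -> valid range [-R-1, R]
assert normalize_bidirectional_index(-R - 1, R + 1) == 0
assert normalize_bidirectional_index(R, R + 1) == R
```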

TLDR

I would encourage ONNX to define a few terms that would tighten the spec while at the same time shortening it and reducing redundant info.

Labels: no-issue-activity, operator (Issues related to ONNX operators), question (Questions about ONNX)