
Clarify treatment of Scalars vs 1D tensors of size 1 #2428

Closed
askhade opened this issue Oct 31, 2019 · 10 comments

@askhade
Contributor

askhade commented Oct 31, 2019

The ONNX IR does not state whether a scalar should be treated the same as a 1D tensor of size 1, and the usage of these two is not consistent throughout the operator spec.

For example, when an input or attribute is defined as a scalar in the spec, I have seen 3 possible cases:

  1. some operators restrict these inputs/attributes to scalars in shape inference
  2. some operators don't check this in shape inference at all, and their unit tests use 1D tensors of size 1 instead of scalars (example: NMS)
  3. some operators allow the inputs to be either scalars or 1D tensors of size 1 (example: OneHot)

Which of the three is the correct behavior?
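To make the distinction concrete, here is a minimal numpy sketch (illustrative only, not tied to any particular operator):

```python
import numpy as np

scalar = np.array(3.5, dtype=np.float32)    # 0D tensor: shape (), rank 0
vector = np.array([3.5], dtype=np.float32)  # 1D tensor of size 1: shape (1,), rank 1

assert scalar.shape == ()
assert vector.shape == (1,)
assert scalar.ndim == 0 and vector.ndim == 1
```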

@askhade askhade added the operator Issues related to ONNX operators label Oct 31, 2019
@askhade askhade added the question Questions about ONNX label Jun 19, 2020
@chinhuang007
Contributor

chinhuang007 commented Jul 21, 2020

Any update on this? We currently follow the spec/doc strictly for OneHot, so we treat depth as a scalar only. If a 1D tensor of size 1 is also allowed, I think the spec should be updated to make that clear.

@gramalingam
Contributor

Ideally, a scalar should just be treated as a 0D tensor, and unit tests should just use 0D tensors for scalars. (Shape inference can check for it, but we need to remember that shape inference is not guaranteed to catch errors of this kind; sometimes it is not possible to statically figure out the rank of an input tensor.)

The difficult problem is figuring out to what extent existing users/models have taken a dependency on using 1D tensors for scalars. If there are such users/models, "fixing" this issue could be challenging.
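For reference, a hedged sketch of how the two shapes are declared with the onnx Python helpers (the value names here are made up for illustration):

```python
import numpy as np
from onnx import TensorProto, helper

# Shape [] declares a rank-0 ("scalar") tensor; shape [1] declares rank 1.
depth_0d = helper.make_tensor_value_info("depth", TensorProto.INT64, [])
depth_1d = helper.make_tensor_value_info("depth", TensorProto.INT64, [1])

# The corresponding numpy values a unit test would feed in:
scalar = np.array(5, dtype=np.int64)    # shape ()
vector = np.array([5], dtype=np.int64)  # shape (1,)
```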

@gramalingam
Contributor

I suggest that for any new operator or any operator update, we strictly specify and use 0D tensors for scalars. (Correcting existing operators for the sake of consistency may be tricky and has potential disadvantages. It may be better to do that as and when needed.)

@chinhuang007
Contributor

I also think "A scalar should be just treated as a 0D tensor. Unit tests should just use 0D tensors for scalars."

If those are the common guidelines for any new operator or operator update, I don't see a reason NOT to correct the existing inconsistency. Why should we have a double standard for old and new ops? As a standard spec, ONNX is supposed to provide a consistent specification for all operators.

Take OneHot as an example. The schema states that both forms are accepted:

// Input 'depth' must be a scalar or a single-element vector.

and the unit tests use a 1D tensor for depth as well. However, the doc describes depth as a scalar only:

"...dimension is specified by required scalar input 'depth'. The type of the output tensor is the same..."

and

"Scalar specifying the number of classes in one-hot tensor. This is also the size..."

If we don't want to change the implementation and unit tests, is it possible to just update the doc to include 1D tensors of size 1?
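For concreteness, a hedged sketch of the two calling conventions for OneHot (the tensor values are illustrative):

```python
import numpy as np
from onnx import helper

# OneHot takes indices, depth, and values; the open question is whether
# `depth` must be rank 0 or may also be rank 1 with a single element.
node = helper.make_node(
    "OneHot", inputs=["indices", "depth", "values"], outputs=["y"], axis=-1
)

indices = np.array([0, 2], dtype=np.int64)
values = np.array([0.0, 1.0], dtype=np.float32)  # [off_value, on_value]

depth_scalar = np.array(3, dtype=np.int64)    # 0D, what the doc describes
depth_vector = np.array([3], dtype=np.int64)  # 1D of size 1, what the tests use
```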

@gramalingam
Contributor

I agree that we should make the documentation, implementation, etc. self-consistent. The non-trivial question is when such a change requires bumping the version number of the operator. If we change the spec and implementation of OneHot to say that only 0D scalars are allowed, I think we would need to bump the version number of the op. What do you think? It turns out that bumping the version number of an op in this fashion also has a non-trivial cost, since exporters, converters, and backends have specific checks on version numbers.

@chinhuang007
Contributor

I would suggest the opposite. We keep the spec and implementation of OneHot to support both scalar and 1D tensor of size 1. Just update the documentation to reflect the schema and implementation. Does it require a version-number bump?

@gramalingam
Contributor

Yes, I had the same question. It is somewhat of a gray area. I personally think it is a reasonable compromise to update the documentation (in this case) without bumping the version number. It would be a good idea to get community agreement and document the general guidelines, just like in this PR: https://github.com/onnx/onnx/pull/2925/files ... what do you think?

But it is also useful to decide what we want in the long run, just like you commented

If those are the common guidelines for any new operator or operator update, I don't see a reason NOT to correct the existing inconsistency. Why should we have a double standard for old and new ops? As a standard spec, ONNX is supposed to provide a consistent specification for all operators.

Do we want to eventually align OneHot with other ops and restrict it to just scalars?

@chinhuang007
Contributor

I think the general guideline is that a doc update does not require bumping the version number, as long as there is no implementation (schema) change.

The second question should probably be answered by the operators SIG. I believe OneHot and NonMaxSuppression both have the loosely defined scalar. If the SIG decides to remove 1D tensors of size 1, we will need to change the schema and unit tests, and bump the version number.

@gramalingam
Contributor

I think it is more subtle/complex than that. A doc update that does not change the specification of the operator does not require bumping the version number. What we are discussing above is slightly tricky because the existing "specification" is inconsistent, and we need to fix the inconsistency; I don't know if we have any guidelines for such a case. In principle, this change would be problematic for any backend that assumes the input will be a 0D tensor, which is normally a reason for bumping the opset version number. However, given that the existing test cases include the 1D tensor case, we can argue that any valid backend should be able to handle the existing test cases, and so presumably can handle the 1D tensor case. So I agree with the suggestion that we don't need to bump the version number in this particular case.

@AlexandreEichenberger
Contributor

As I am implementing this feature in onnx-mlir, I would like to reiterate that when a value can arrive in different ways, it adds overhead. Sticking to a single approach (which appears to be the 0D tensor) is best, and normalizing both new and old ops is preferred.

If I compare ONNX with another standard I am familiar with (OpenMP), OpenMP has put a lot of emphasis on defining terms properly. The good thing about a definition of a term (e.g. scalar) is that you can scan the glossary of definitions once, and then you never have to think about it again. In ONNX, without such definitions, we have to read each individual operation's description very carefully to see whether there is an exception to the rule. That is burdensome to readers, as well as a burden on the spec writer, who has to precisely repeat the same definition for each operation.

Another example of this sort is the negative axis. I just spent 10 minutes trying to understand why the range of axis for OneHot is [-r-1, r] whereas for most other operators it is [-r, r-1]. If we had a clear definition for such intervals, all of the operation definitions would become simpler because they would all rely on the same unambiguous definition. Let me give it a try.

Bidirectional definition

A bidirectional-index(0, X) defines a value in the interval from 0 inclusively to X exclusively. A non-negative index (i>=0) points to the value in the interval counting up from 0; thus index i>=0 points to interval value i. A negative index (i<0) points to the value in the interval counting down from X; thus index i<0 points to interval value X+i. For example, index i=-1 points to interval value X-1, the last value included in the interval.

Valid indices in bidirectional-index(0, X) range from -X inclusively to X exclusively (i.e., -X to X-1 inclusive).
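A minimal sketch of this definition as code (the helper name is mine, not part of any ONNX API):

```python
def normalize_bidirectional_index(i: int, X: int) -> int:
    """Map a bidirectional-index(0, X) to its non-negative form."""
    # Valid inputs are -X <= i <= X-1; negative i counts down from X.
    if not -X <= i < X:
        raise ValueError(f"index {i} out of range for bidirectional-index(0, {X})")
    return i if i >= 0 else X + i
```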

Compress's axis definition

Axis along which to take slices; axis is given as a bidirectional-index(0, R) where R = rank(input). Axis 0 slices along the first dimension; axis R-1 or -1 slices along the last dimension.

OneHot axis definition

Axis along which the one-hot representation is added; axis is given as a bidirectional-index(0, R+1) where R = rank(indices). Axis 0 inserts the additional dimension before all of the indices dimensions; axis R or -1 inserts the additional dimension after all of the indices dimensions.
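With that one definition, both axis ranges fall out mechanically (again just a sketch, reusing the helper above):

```python
R = 3  # example rank

# Compress: axis is a bidirectional-index(0, R) -> valid range [-R, R-1]
assert normalize_bidirectional_index(-R, R) == 0
assert normalize_bidirectional_index(R - 1, R) == R - 1

# OneHot: axis is a bidirectional-index(0, R+1) -> valid range [-R-1, R]
assert normalize_bidirectional_index(-R - 1, R + 1) == 0
assert normalize_bidirectional_index(R, R + 1) == R
```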

TLDR

I would encourage ONNX to define a few terms that would tighten the spec while at the same time shortening it and reducing redundant info.

Labels: no-issue-activity, operator (Issues related to ONNX operators), question (Questions about ONNX)