Clarify treatment of Scalars vs 1D tensors of size 1 #2428
Any update on this? We currently follow the spec/doc strictly for OneHot, so we treat depth as a scalar only. If 1D tensors of size 1 are also allowed, I think the spec should be updated to make that clear.
Ideally, a scalar should be just treated as a 0D tensor. Unit tests should just use 0D tensors for scalars. (Shape inference can check for this, but we need to remember that shape inference is not guaranteed to catch errors of this kind; sometimes it is not possible to statically figure out the rank of an input tensor.) The difficult problem is figuring out to what extent existing users/models have taken a dependency on using 1D tensors for scalars. If there are such users/models, "fixing" this issue could be challenging.
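To make the distinction concrete (an illustrative sketch, not from the thread): in NumPy terms, a 0D tensor and a 1D tensor of size 1 both hold a single element but have different shapes and ranks, which is exactly what a strict backend would check.

```python
import numpy as np

scalar_0d = np.array(3, dtype=np.int64)    # 0D tensor: shape ()
vector_1d = np.array([3], dtype=np.int64)  # 1D tensor of size 1: shape (1,)

print(scalar_0d.shape, scalar_0d.ndim)  # () 0
print(vector_1d.shape, vector_1d.ndim)  # (1,) 1
```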
I suggest that for any new operator or any operator update, we strictly specify and use 0D tensors for scalars. (Correcting existing operators for the sake of consistency may be tricky and has potential disadvantages; it may be better to do that as and when needed.)
I also think "a scalar should be just treated as a 0D tensor; unit tests should just use 0D tensors for scalars." If that is the common guideline for any new operator or any operator update, I don't see the reason not to correct the existing inconsistency. Why do we have a double standard for old and new ops? As a standard spec, ONNX is supposed to provide a consistent specification for all operators. Take OneHot as an example: the spec states "Input 'depth' must be a scalar or a single-element vector" (see lines 2340, 2249, and 2290 in 21b2cda).
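As an illustration (a minimal sketch built with onnx.helper; the shapes, names, and depth value are made up), here is a OneHot model whose depth initializer is a true 0D scalar; the question in this thread is whether a shape-(1,) depth must also be accepted:

```python
import onnx
from onnx import TensorProto, helper

# OneHot(indices, depth, values): 'depth' is supplied as a 0D initializer.
node = helper.make_node("OneHot", ["indices", "depth", "values"], ["output"], axis=-1)
graph = helper.make_graph(
    [node],
    "onehot_scalar_depth",
    inputs=[
        helper.make_tensor_value_info("indices", TensorProto.INT64, [3]),
        helper.make_tensor_value_info("values", TensorProto.FLOAT, [2]),
    ],
    outputs=[helper.make_tensor_value_info("output", TensorProto.FLOAT, [3, 10])],
    initializer=[
        # dims=[] makes this a 0D tensor; dims=[1] would be the 1D-of-size-1 variant.
        helper.make_tensor("depth", TensorProto.INT64, [], [10]),
    ],
)
model = helper.make_model(graph)
onnx.checker.check_model(model)
```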
I agree that we should make the documentation, implementation, etc. self-consistent. The non-trivial question is when such a change requires bumping the version number of the operator. If we change the spec and implementation of OneHot to say that only 0D scalars are allowed, I think we would need to bump the version number of the op. What do you think? It turns out that bumping the version number of an op in this fashion also has a non-trivial cost, since exporters, converters, and backends have specific checks on version numbers.
I would suggest the opposite: keep the spec and implementation of OneHot supporting both a scalar and a 1D tensor of size 1, and just update the documentation to reflect the schema and implementation. Does that require a version-number bump?
Yes, I had the same question. It is somewhat of a gray area. I personally think it is a reasonable compromise to update the documentation (in this case) without bumping the version number. It would be a good idea to get community agreement and document the general guidelines, just like in this PR: https://github.com/onnx/onnx/pull/2925/files ... what do you think? But it is also useful to decide what we want in the long run, just like you commented: do we want to eventually align OneHot with other ops and restrict it to just scalars?
I think the general guideline is that a doc update does not require bumping the version number, as long as there is no implementation (schema) change. The second question should probably be answered by the Operators SIG. I believe OneHot and NonMaxSuppression have the loosely defined scalar. If the SIG decides to remove the 1D tensor of size 1, we will need to change the schema and unit tests, and bump the version number.
I think it is more subtle/complex than that. A doc update that does not change the specification of the operator does not require bumping the version number. What we are discussing above is slightly tricky because the existing "specification" is inconsistent, and we need to fix the inconsistency. I don't know if we have any guidelines for such a case. In principle, this change would be problematic for any backend that assumes the input will be a 0D tensor (which is the usual reason for bumping the opset version number). However, given that the existing test cases include the 1D-tensor case, we can argue that any valid backend should be able to handle the existing test cases, and so is presumably able to handle the 1D-tensor case. So I agree with the suggestion that we don't need to bump the version number in this particular case.
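In the meantime, a backend that wants to tolerate both encodings can normalize at the boundary. A minimal Python sketch, with a hypothetical helper name (not from any particular backend):

```python
import numpy as np

def read_scalar(tensor: np.ndarray) -> int:
    """Accept either a 0D tensor or a 1D tensor of size 1 as a scalar."""
    if tensor.ndim == 0:
        return tensor.item()
    if tensor.ndim == 1 and tensor.size == 1:
        return tensor.item()  # tolerate the legacy 1D-of-size-1 encoding
    raise ValueError(f"expected a scalar, got shape {tensor.shape}")

read_scalar(np.array(5))    # 5
read_scalar(np.array([5]))  # 5
```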
As I am implementing this feature in onnx-mlir, I would like to repeat that when we can get a value in different ways, it adds overhead. So sticking to a single approach (which appears to be the 0D tensor) is best, and normalizing new and old ops is preferred.

If I compare ONNX with another standard that I am familiar with (OpenMP), OpenMP has put a lot of emphasis on defining terms properly. The good thing about a definition of a term (e.g. scalar) is that you can scan the glossary of definitions once, and then you never have to think about it again. In ONNX, without such a definition, we have to read each individual operation's description very carefully to see whether or not there is an exception to the rule. That is burdensome to the readers, as well as a burden to the spec writers, as we have to precisely repeat the same definition for each operation.

Another example of this sort is the negative axis. I just spent 10 minutes trying to understand why the range of axis for OneHot is [-r-1, r] whereas for most other operators it is [-r, r-1]. If we had a clear definition for such intervals, all of the operations' definitions would become simpler, because they would all rely on the same unambiguous definition. Let me give it a try.

Bidirectional definition: A […] valid indices in […].

Compress's axis definition: "Axis along which to take slices; axis is given as a […]."

OneHot axis definition: "Axis along which one-hot representation is added; axis is given as a […]."

TL;DR: I would encourage ONNX to define a few terms that would tighten the definition of the specs while at the same time shortening it and reducing redundant info.
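To make the axis-range point concrete, here is a hedged sketch (function names are illustrative, not from ONNX or onnx-mlir) of the two normalization rules: OneHot inserts a new dimension, so there are r+1 valid insertion positions, giving the range [-r-1, r]; ops like Compress index an existing dimension, so there are only r valid positions, giving [-r, r-1].

```python
def normalize_existing_axis(axis: int, r: int) -> int:
    # Selecting an existing dimension: valid range is [-r, r-1].
    if not -r <= axis <= r - 1:
        raise ValueError(f"axis {axis} out of range [{-r}, {r - 1}]")
    return axis if axis >= 0 else axis + r

def normalize_inserted_axis(axis: int, r: int) -> int:
    # Inserting a new dimension (as OneHot does): valid range is [-r-1, r].
    if not -r - 1 <= axis <= r:
        raise ValueError(f"axis {axis} out of range [{-r - 1}, {r}]")
    return axis if axis >= 0 else axis + r + 1

assert normalize_existing_axis(-1, 3) == 2   # last existing dimension
assert normalize_inserted_axis(-1, 3) == 3   # insert after the last dimension
```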
The ONNX IR does not state that a scalar should be treated the same as a 1D tensor of size 1, and throughout the operator spec there is no consistency maintained between the usage of these two.
For example, when an input or attribute is defined as a scalar in the spec, I have seen three possible cases:
[…]
Which one of the three is the right behavior?