Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operator versioning - round 2 #186

Merged
merged 11 commits into from
Nov 8, 2017
Merged

Operator versioning - round 2 #186

merged 11 commits into from
Nov 8, 2017

Conversation

donbox
Copy link
Contributor

@donbox donbox commented Nov 3, 2017

After several days of rapid iteration, Edward and I are retracting PR175 and replacing it with this PR.

The first checkin contains only the changes to ONNX.in.proto so that Edward and I can divide and conquer.

He will be doing the implementation. I will be doing the updates to versioning.md to reflect the design.

I attempted to make the comments in the .proto file as descriptive as possible and appropriate. Real prose to follow.

Feel free to comment away on this PR.

@donbox donbox requested a review from ezyang November 3, 2017 12:49
@donbox
Copy link
Contributor Author

donbox commented Nov 3, 2017

I believe the design that Edward and I worked on is now fully specified in both onnx.in.proto and versioning.md.

Unleash the review hounds, ideally before Edward gets too far into implementation.

@ezyang
Copy link
Contributor

ezyang commented Nov 3, 2017

@donbox I'm going to move the proto stuff around a little, no substantive changes.

* Operators that are referenced by a given ONNX graph. The version of a given operator is referred to as the *operator version*. The operator version is represented by the `TBD` field.
* An ONNX ModelProto that represents a given graph - that is, the contents of a model. We refer to this version as the *model version* and it is represented by the `ModelProto.model_version` field.
* The abstract model for graphs and operators and the concrete format that represents them. These are always versioned atomically and are referred to as the *IR version.*
* Operator specifications that may be referenced by a given ONNX graph. We refer to this as the *operator version*.
Copy link
Contributor

@ezyang ezyang Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more unambiguous to say something like, "The set of operators that may be referenced by a given ONNX graph. We refer to this as the operator set version."

I'd actually like to expunge ALL references to "operator version" from the standard (replacing them with "operator set version"), I think this will help prevent common mistakes from our users. The basic problem is that "operator version" implies that an operator has a version associated with it. But this is not the model; rather, an operator is canonically defined by a particular operator set version that it was added in (slash, the operator set version when it broke compatibility with the old version.) So I really want to avoid people forgetting to rev the major version, and I think being more precise with the terminology here will help.

Copy link
Contributor Author

@donbox donbox Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After our call, it's clear to me that the prose and name around op_version is super confusing.

Per our chat, I'm working up a fix.

ISSUE: sort out the design and then document.
ONNX is defined such that the IR can evolve independently from the set of operators. In ONNX, operators represent both the signature and semantics of a given operation. Operators are abstract interfaces in that they do not imply a specific implementation; rather, they are simply the contract between a model author and the implementations that model may execute on.

A given operator is identified by a three-tuple: (domain, op_type, op_version), written domain.op_type:op_version in prose (e.g., com.acme.FastConv:3). Nodes in graphs always refer
Copy link
Contributor

@ezyang ezyang Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to propose some alternative nomenclature here, going further along the lines of "op version is confusing, because people will think ops version individually."

Let's think about what op_version actually means: it is the version of the opset in which this operator was added / had its last breaking change. So it is not an "operator version", it is an "operator set version". And what is it's relation with the operator; we've had the operator since that opset version. So a long and wordy, but clear (IMO) identifier is since_opset_version. This highlights the unusualness of the proposal: normally you don't identify operators by the version they were added, but this is exactly what we are doing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above. We definitely need a different name. Look for a checkin in 30 min.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about calling it domain_version?

Any change of semantics implies a new operator, which MAY share a domain and op_type with another operator. This includes adding new behavior triggered only by previously unspecified inputs or attributes - these changes in semantics also MUST have a distinct operator id.

ONNX uses operator sets to group together immutable operator specifications. An ONNX operator set
specifies both the domain of all operators it includes, as well as an opset version. The opset version is largely independent from the version field of the operators it includes. When the enventory of a given operator set changes either by addition or removal, its opset version MUST increase. Moreover,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/enventory/inventory/

And notice that with the "since_opset_version" change, one doesn't have to call out that the opset version is "independent" of the op versions, because, well, obviously some ops are going to carry over from older ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

ONNX specification. The union of the operator sets specified by a given model MUST have have have a compatible operator declaration for each node in the model's graph.


How nodes bind to operator declarations is strictly defined and are designed to increase model compatibilitey across ONNX implementations (appealing to the conservative clause of the robustnes principle).
Copy link
Contributor

@ezyang ezyang Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/compatibilitey/compatibility/
s/robustnes/robustness/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank goodness you are part of the mechanical turk of typo corrections.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/oroto/proto/
s/enventory/inventory/
s/have have have/have/
s/seriaized/serialized/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/therefor/therefore/

@gramalingam
Copy link
Contributor

Re. "As a rough approximation, the serialized model plays the role of an API's callee; the consumer of the seriaized model plays the role of the API's caller.". Wouldn't the producer of a serialized model play the role of the "caller" and the consumer (the runtime that executes) a model play the role of the "callee" ?

@gramalingam
Copy link
Contributor

A single model can reference and make use of multiple OperatorSets. Are there any constraints here? Can a model use both "com.acme.CNN:3" and "com.acme.RNN:5" mixing different versions of the same OperatorSet (domain)? Or, are such questions outside the scope of this? If an implementation claims to support some specific OperatorSets, does it mean that it should be able to handle models that mix operators from these different OperatorSets, or is that outside the scope of this standard?

@ezyang
Copy link
Contributor

ezyang commented Nov 3, 2017

Wouldn't the producer of a serialized model play the role of the "caller" and the consumer (the runtime that executes) a model play the role of the "callee" ?

Yeah, I agree with this.

A single model can reference and make use of multiple OperatorSets. Are there any constraints here? Can a model use both "com.acme.CNN:3" and "com.acme.RNN:5" mixing different versions of the same OperatorSet (domain)? Or, are such questions outside the scope of this?

You are NOT allowed to mix together multiple versions of an operator set. While technically nothing is stopping us from adding this to spec, it's bad behavior and we shan't encourage it.

One extension Don and I debated was whether or not to allow specifying a version on individual NodeProto. This would short circuit the normal resolution process and let you refer to a legacy operator. On balance, we decided not to add it, although it is a coherent extension.

If an implementation claims to support some specific OperatorSets, does it mean that it should be able to handle models that mix operators from these different OperatorSets, or is that outside the scope of this standard?

Yes, mixes of different operator sets (e.g., domains) should be handled.

@ezyang ezyang force-pushed the pr/operator-versioning branch 2 times, most recently from 6bb1e5a to 6740163 Compare November 3, 2017 21:47
@gramalingam
Copy link
Contributor

How do "experimental" operators fit within this scheme? Will they move into another OperatorSet, like "experimental.onnx"?

@ezyang
Copy link
Contributor

ezyang commented Nov 4, 2017

Will they move into another OperatorSet, like "experimental.onnx"?

The ONNX checker will move to only checking NodeProto in the default domain (domain is missing); and vendors can implement their own checkers for custom operator sets. We won't try to record vendor extensions here because, well, they're vendor extensions.

@ezyang
Copy link
Contributor

ezyang commented Nov 4, 2017

To see an example of operator versioning in implementation, see: onnx/onnx-caffe2#64

@ezyang
Copy link
Contributor

ezyang commented Nov 4, 2017

The checker probably needs some more in-depth checks for the new versioning.

@ezyang
Copy link
Contributor

ezyang commented Nov 4, 2017

PyTorch operator versioning here: pytorch/pytorch#3485

// OperatorSets are uniquely identified by a (domain, opset_version) pair.
message OperatorSetIdProto {
// The domain of the operator set being identified.
// The empty string ("") implies the operator ses that us defined as part of the ONNX specification.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ses -> sets?

@@ -53,13 +66,33 @@ As a general principle, implementations SHOULD be robust in the face of missing

By way of example, the `ModelProto.ir_version` MUST be present in every model. The ONNX checker (`onnx/checker.py`) will enforce these rules.

ISSUE: decide and document how we want to handle changes to the format that are forward compatible but will cause the generated code to introduce a breaking change (e.g., changing the data type of a field from int64 to int32, int64 to double or bytes to string). I would recommend that we never do it after we hit 1.0.0 - prior to that we should be fairly liberal with changes to size (e.g., `int32<>int16`, `float<>double`) and conservative with changes to value space (e.g., `integral<>floatingpoint`, `string<>int`, `scalar<>message`)
Because onnx.proto is expected to be consumed by multiple independent developers, changes to onnx.oroto SHOULD NOT break code that depends on generated language bindings (e.g., changing the type of an existing field).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirm my understanding please. So a type change of an existing field will introduce a IR major version change, am I right? (although it's data compatible in some cases). Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question. I think as written here, it would be a major version change. However, I am not convinced this is necessarily what we should do; I'd rather revisions to major versions indicating something like, "Hey! Consumer! You need to do something different!" Which wouldn't be the case here.


ISSUE: define type compatiblity rules either here or under model versioning - probably here

## Operator versioning

ISSUE: sort out the design and then document.
ONNX is defined such that the IR can evolve independently from the set of operators. In ONNX, operators represent both the signature and semantics of a given operation. Operators are abstract interfaces in that they do not imply a specific implementation; rather, they are simply the contract between a model author and the implementations that model may execute on.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"operators represent" -> "an operator represents"? :)

to operators by their three-part identifier.

Once an operator is published, all implementations of that operator's (domain, op_type, op_version) MUST adhere to the signature and semantics of the operator at the time of publication.
Any change of semantics implies a new operator, which MAY share a domain and op_type with another operator. This includes adding new behavior triggered only by previously unspecified inputs or attributes - these changes in semantics also MUST have a distinct operator id.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small suggestion please. we may mark the definition "operator id" above a little bit more explicitly, when we say the tuple of (domain, op_type, op_version) is the identifier (operator id).


ONNX uses operator sets to group together immutable operator specifications. An ONNX operator set
specifies both the domain of all operators it includes, as well as an opset version. The opset version is largely independent from the version field of the operators it includes. When the enventory of a given operator set changes either by addition or removal, its opset version MUST increase. Moreover,
the opset version MUST be no less than the highest operator version number in the set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirm my understanding please. So now, an operator set (operator specification) has a op_set_version, and each operator has a "since_op_set_version" to clarify when it's added into an operator set. Am I right? Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

the opset version MUST be no less than the highest operator version number in the set.

ONNX models declare which operator sets they require as a list of two part operator ids (domain, opset_version). The empty string ("") domain indicates the operators defined as part of the
ONNX specification. The union of the operator sets specified by a given model MUST have have have a compatible operator declaration for each node in the model's graph.
Copy link
Member

@linkerzhang linkerzhang Nov 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "have have have a compatible operator declaration..." for example is asking that "modol author should not use operators with same domain, same op_type, but different version in one model", am I right? if yes, we may list these policies explicitly here.
For the policies that we don't list here, I'm assuming that different runtime (importers) will have different way to handle (out of our control), make sense please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My last question was answered with "yes" by "Implementations of ONNX MAY elect
to introduce more sophisticated operator declaration/implementation binding modes to appeal to the liberal clause of the robustness principle. "

@prasanthpul prasanthpul mentioned this pull request Nov 7, 2017
## IR versioning

Changes to the file format or graph model semantics version atomically. Breaking changes to the format or semantics of the ONNX specification require an increment of the MAJOR version. Non-breaking format or semantic changes that introduce new functionality require an increment of the MINOR version. Non-breaking changes to the specification that simply clarify spec ambiguities require an increment of the PATCH version.
Changes to the file format or abstract graph semantics version atomically. Breaking changes to the format or semantics of the ONNX specification require an increment of the MAJOR version. Non-breaking format or semantic changes that introduce new functionality require an increment of the MINOR version. Non-breaking changes to the specification that simply clarify spec ambiguities require an increment of the PATCH version.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see a description of how profile (ONNX, ONNX-ML) versioning works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is one way to make it work:

  • ONNX ML has a separate IR version number
  • If a BC-breaking change is made in the ONNX ML fragment of the proto language, only the ONNX ML IR version number is bumped
  • If a BC-breaking change is made in the ONNX fragment, the ONNX IR and ONNX ML IR version numbers are bumped.

Thus, ONNX only clients don't see any IR version changes from ONNX ML, and ONNX ML clients see a coherent version number profile by looking at the ONNX ML number only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me since ONNX-ML is a superset of ONNX. Let's include this in the document.

ONNX is defined such that the IR can evolve independently from the set of operators. In ONNX, operators represent both the signature and semantics of a given operation. Operators are abstract interfaces in that they do not imply a specific implementation; rather, they are simply the contract between a model author and the implementations that model may execute on.

A given operator is identified by a three-tuple: (domain, op_type, op_version), written domain.op_type:op_version in prose (e.g., com.acme.FastConv:3). Nodes in graphs always refer
to operators by their three-part identifier.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the standard domains for ONNX and ONNX-ML ops?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ONNX, the standard domain is empty; the implicit domain corresponds to ONNX.

ONNX-ML domain needs to be decided; may I suggest ai.onnx.ml?

Copy link
Member

@dzhulgakov dzhulgakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice!

For experimental ops - we should probably move them to be "caffe2/pytorch" domain. Who's on the hook to do that?

Also, Don - are you planning to change C++ checker to verify the correctness of import specs? (in this PR or another one). You'd need this logic anyway somewhere and it'd be really great to have it in ONNX repo in a shared form.

Another follow up item might be having a "upgrader" as a part of ONNX. So that for ops from the main domain which are changed in predictable ways we can systematically rewrite their arguments. This component can also be shared by various backends.

ISSUE: sort out the design and then document.
ONNX is defined such that the IR can evolve independently from the set of operators. In ONNX, operators represent both the signature and semantics of a given operation. Operators are abstract interfaces in that they do not imply a specific implementation; rather, they are simply the contract between a model author and the implementations that model may execute on.

A given operator is identified by a three-tuple: (domain, op_type, op_version), written domain.op_type:op_version in prose (e.g., com.acme.FastConv:3). Nodes in graphs always refer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about calling it domain_version?

* The earliest operator set version which this operator was
* present in.
*/
OpSchema& SinceVersion(int n);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need "last changed in version"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method definitely needs some renaming (and doc-updating); it should be "LastUpdatedVersion" or something like that. The name here comes from when I was originally planning to have the ONNX repo continuously keep all information about operators, both old ones (who have since been updated in a BC-breaking way) and the up-to-date ones.

Don and I have discussed a bit about how exactly want to go about representing history change in defs. A more traditional model is that the state of the Git repository at any point in time is the definitive description of the operator state at this point in time. So, for example, if the current ONNX operator version is 34 and you want 12, you'll have to check out the ONNX repo as it was at that version to get the canonical description of that ONNX operator state. When the only description of operators is C++ code, this is not a great state of affairs; so an important further component of this story is a normative protobuf description of the operators at some version, made available for download in some way. The good thing about this approach is that it closely matches ordinary software development practice, and it doesn't require us to change very much our model for operator development; plus, you can't "edit" history.

@@ -257,6 +265,8 @@ class OpSchema {
int min_output_ = 0;
int max_output_ = std::numeric_limits<int>::max();
std::set<int> optional_inputs_;
// The default is a little goofy, since it is never what you want
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it -1 and you can enforce that it's set in Finalize() call. But then you'd need to codemod all operator defs

This checkin declared three new message types:

    - OperatorProto : a declaration of a specific op version of an operator
    - OperatorSetProto : a set of OperatorProtos that is identified by a domain and opset version
    - OperatorSetIdProto : the domain and opset version of an operator set

It also adds the following fields to existing message types:

    - ModelProto.import : a list of OperatorSets referenced by this ModelProto
    - NodeProto.domain/op_version: The domain and op_version of the operator being executed

edyang will follow up with backing implementaion
dbox will follow up with spec prose.
Don Box and others added 10 commits November 8, 2017 20:00
Added IR semver fields to OperatorSetProto (ModelProto changes in other PR).
Cleaned up some prose in the front matter and IR versioning sections.
…nd writing the obvious binding rule).

Changed OperatorProto.op_version to since_version and made definition clear.

Added Experimental|Stable to reflect versioning differences.  THis willl replace OpStatus from schema.h.

Still UNDONE is doing a word smithing pass in onnx.in.proto and rewriting the versioning.md operator spec to be more clear.
Also minor edits.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@ezyang
Copy link
Contributor

ezyang commented Nov 8, 2017

We definitely need more checker support; I'm going to go ahead and merge this for now but there are some follow up tasks to do (I'll open a task to track it.)

@ezyang ezyang merged commit a30709e into master Nov 8, 2017
@anderspapitto anderspapitto deleted the pr/operator-versioning branch February 22, 2018 20:17
@anderspapitto anderspapitto restored the pr/operator-versioning branch February 22, 2018 20:37
@prasanthpul prasanthpul deleted the pr/operator-versioning branch February 23, 2018 00:16
Ac2zoom pushed a commit to Ac2zoom/onnx that referenced this pull request Jun 21, 2018
* Initial checkin of op versioning round 2

This checkin declared three new message types:

    - OperatorProto : a declaration of a specific op version of an operator
    - OperatorSetProto : a set of OperatorProtos that is identified by a domain and opset version
    - OperatorSetIdProto : the domain and opset version of an operator set

It also adds the following fields to existing message types:

    - ModelProto.import : a list of OperatorSets referenced by this ModelProto
    - NodeProto.domain/op_version: The domain and op_version of the operator being executed

edyang will follow up with backing implementaion
dbox will follow up with spec prose.

* Fixed local merge issue

* Beefed up comments.
Added IR semver fields to OperatorSetProto (ModelProto changes in other PR).

* First pass at spec prose for operator versioning.
Cleaned up some prose in the front matter and IR versioning sections.

* Simplified Node->Operator binding (by removing nodeproto.op_version and writing the obvious binding rule).

Changed OperatorProto.op_version to since_version and made definition clear.

Added Experimental|Stable to reflect versioning differences.  THis willl replace OpStatus from schema.h.

Still UNDONE is doing a word smithing pass in onnx.in.proto and rewriting the versioning.md operator spec to be more clear.

* Split proto to separate file, with support.

Also minor edits.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* s/import/opset_import/ avoid reserved name

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add changelog entry.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Do a proper revision update to 0.0.3

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Cleaned up/clarified comments in proto file.

* Cleanup whitespace, regenerate proto files.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
pranavm-nvidia pushed a commit to pranavm-nvidia/onnx that referenced this pull request Jan 7, 2020
jcwchen pushed a commit to jcwchen/onnx that referenced this pull request Sep 23, 2020
* Initial checkin of op versioning round 2

This checkin declared three new message types:

    - OperatorProto : a declaration of a specific op version of an operator
    - OperatorSetProto : a set of OperatorProtos that is identified by a domain and opset version
    - OperatorSetIdProto : the domain and opset version of an operator set

It also adds the following fields to existing message types:

    - ModelProto.import : a list of OperatorSets referenced by this ModelProto
    - NodeProto.domain/op_version: The domain and op_version of the operator being executed

edyang will follow up with backing implementaion
dbox will follow up with spec prose.

* Fixed local merge issue

* Beefed up comments.
Added IR semver fields to OperatorSetProto (ModelProto changes in other PR).

* First pass at spec prose for operator versioning.
Cleaned up some prose in the front matter and IR versioning sections.

* Simplified Node->Operator binding (by removing nodeproto.op_version and writing the obvious binding rule).

Changed OperatorProto.op_version to since_version and made definition clear.

Added Experimental|Stable to reflect versioning differences.  THis willl replace OpStatus from schema.h.

Still UNDONE is doing a word smithing pass in onnx.in.proto and rewriting the versioning.md operator spec to be more clear.

* Split proto to separate file, with support.

Also minor edits.


* s/import/opset_import/ avoid reserved name


* Add changelog entry.


* Do a proper revision update to 0.0.3


* Cleaned up/clarified comments in proto file.

* Cleanup whitespace, regenerate proto files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants