
Added TensorDenotation and metadata_props for images #879

Merged: 25 commits, May 23, 2018

Conversation

@walrusmcd (Contributor) commented May 3, 2018

This proposal introduces Tensor Denotation

Tensor Denotation is an experimental attempt, along with Dimension Denotation, to give tensors a semantic description and to enable model authors to describe what the input and output tensors are.

The motivation for such a mechanism can be illustrated via a simple example. The neural network SqueezeNet takes an NCHW image input float[1,3,224,224] and produces an output float[1,1000,1,1]:
input_in_NCHW -> data_0 -> SqueezeNet() -> output_softmaxout_1
In order to run this model, the user needs a lot of information. In this case, the user needs to know:

  • the input is an image
  • the image is in the format of NCHW
  • the color channels are in the order of BGR
  • the pixel data is 8 bit
  • the pixel data is stored as values 0-255

This proposal consists of three key components that together provide all of this information: Tensor Denotation, Dimension Denotation, and model metadata, each of which is discussed in detail below.
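For a concrete picture, here is a minimal sketch (not part of the PR) of how a model consumer might read all three pieces with the onnx Python API; the file name is hypothetical, and the model is assumed to carry the proposed fields:

```python
import onnx

# Hypothetical file name; any model carrying the proposed fields would do.
model = onnx.load("squeezenet.onnx")

inp = model.graph.input[0]

# 1) Tensor/type denotation, e.g. "IMAGE"
print(inp.type.denotation)

# 2) Dimension denotations, e.g. ['DATA_BATCH', 'DATA_CHANNEL', 'DATA_FEATURE', 'DATA_FEATURE']
print([d.denotation for d in inp.type.tensor_type.shape.dim])

# 3) Model-level metadata_props, e.g. {'Image.BitmapPixelFormat': 'Bgr8', ...}
props = {p.key: p.value for p in model.metadata_props}
print(props.get("Image.BitmapPixelFormat"))
```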

@CLAassistant commented May 3, 2018

CLA assistant check
All committers have signed the CLA.

// An optional denotation can be used to denote the whole
// tensor with a standard semantic description as to what is
// stored inside
optional string denotation = 3;
Member:

Should we put this denotation in TypeProto instead of Tensor, so that other types (say Map, Sequence) will also be able to use it?

Contributor Author:

Yes, I had thought of that option as well. I think the real challenge here is that we have Tensor as our base, and all of these other things are subclasses of tensor. Map and Sequence are first-class citizens right now. I imagine after this experiment, we would promote Image/Audio/Text to first-class citizens as well. Thus my thought was to keep it on tensor to allow us to experiment with candidates to promote to first-class types.
Also, maps and sequences don't have the denotations we define in this first iteration. A map cannot be an image/audio/text. Nor can a sequence. I think if we find candidates for denotation on map & sequence (things that are common to all TypeProtos), we could add one then.
Thoughts?
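For reference, the denotation did end up on TypeProto in the merged version ("moved to type" in the commit list below); a minimal sketch of tagging an input as an image with the onnx Python helpers:

```python
from onnx import TensorProto, helper

# Declare an NCHW image input and mark the whole type as an IMAGE.
data_0 = helper.make_tensor_value_info("data_0", TensorProto.FLOAT, [1, 3, 224, 224])
data_0.type.denotation = "IMAGE"  # type-level denotation, per the merged design
```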


## Tensor Denotation

See the [tensor denotation documentation](TensorDenotation.md) for more details on denotation applied at the tensor level.
Member:

Merge all denotation documents into one file?

Contributor Author:

I thought about that, but then it becomes one larger doc. Preference / thoughts?

If you really like the idea, I can take the time to merge them and see how it looks. lmk.


This proposal consists of three key components to provide all of this information: Tensor Denotation, [Dimension Denotation](DimensionDenotation.md), and [model metadata](MetadataProps.md), each of which will be discussed in detail.

## Tensor Denotatio Definition
Member:

n is missing -> Denotation

Contributor Author:

Good catch. Fixed in the next iteration.


|Key|Value|Description|
|-----|----|-----------|
|`Image.BitmapPixelFormat`|__string__|Specifies the format of pixel data. Each enumeration value defines a channel ordering and bit depth. Possible values: <ul><li>`Gray8`: 1 channel image, the pixel data is 8 bpp grayscale.</li><li>`Rgb8`: 3 channel image, channel order is RGB, pixel data is 8bpp (No alpha)</li><li>`Bgr8`: 3 channel image, channel order is BGR, pixel data is 8bpp (No alpha)</li><li>`Rgba8`: 4 channel image, channel order is RGBA, pixel data is 8bpp (Straight alpha)</li><li>`Bgra8`: 4 channel image, channel order is BGRA, pixel data is 8bpp (Straight alpha)</li></ul>|
Member:

Can you call out that keys and values are case-insensitive?

Contributor Author:

Nice! Added to the next iteration.


## Dimension Denotation

Tensor denotation can be combined with the new experimental feature of Dimension Denotation. For example, if the Tensor Denotation is `IMAGE`, then that tensor SHOULD also have a [Dimension Denotation](DimensionDenotation.md) stating the NCHW channel layout (described as [`DATA_BATCH`, `DATA_CHANNEL`, `DATA_FEATURE`, `DATA_FEATURE`]).
Contributor:

NCHW format on a model level is not easily enforceable and thus brings about nightmares in terms of compliance checking. ONNX, as it stands today, only enforces node-level input format, which does not translate to any restrictions on model level format. C.f. my comment copied from the thread discussing Dimension Denotation:

  1. ONNX as a standard will keep as it is to support only NCHW.

    We need to be careful about what you mean. There are no operations supporting NHWC format in ONNX; however, that does not preclude models from supporting NHWC format input. In fact, you cannot preclude such a use case as long as you have a transpose in the operator set, because you can always transpose the input once and enter the NCHW world thereafter. Thus, unless you explicitly pattern match and prohibit these cases in the checker, there is no way to prevent a user from creating ONNX models that accept NHWC input but use NCHW operators. Thus it is my opinion that

    This means all ONNX models should be in NCHW format,

    does not hold true as an implication from your previous statement.

Contributor Author:

Agreed. With this we are not proposing enforcing or being strict about NCHW, but instead providing enough metadata and denotation for a model builder to let a model consumer know what to do. This PR is less about NCHW (dimension denotation covers that) and more about the other things the metadata describes (bitmap pixel format, etc.).

Thoughts? Would you propose changing anything in the docs or strings for this PR? Thanks!

Contributor:

Hi, sorry for the delay. My opinion is that SHOULD is too big a word in this scenario, for reasons I outlined earlier. My suggestion is to switch to may.

And why did you capitalize it?
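For context, here is a minimal sketch of what attaching the NCHW denotations mentioned above might look like, assuming the `denotation` field on `TensorShapeProto.Dimension` from the dimension denotation proposal:

```python
from onnx import TensorProto, helper

data_0 = helper.make_tensor_value_info("data_0", TensorProto.FLOAT, [1, 3, 224, 224])

# Tag each axis of the NCHW input with its dimension denotation.
nchw = ["DATA_BATCH", "DATA_CHANNEL", "DATA_FEATURE", "DATA_FEATURE"]
for dim, denotation in zip(data_0.type.tensor_type.shape.dim, nchw):
    dim.denotation = denotation
```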

be used during inferencing.

The goal of this proposal is to provide enough metadata that the model consumer can perform their own featurization prior to running the model and provide a compatible input,
or retrieve an output and know what it's format is.
Member:

it's --> its ?

Contributor Author:

Fixed in the next iteration.

@AppVeyorBot: Build onnx 0.3.3055 failed (commit 1b7498f427 by @walrusmcd)

@tjingrant (Contributor) commented May 12, 2018

@walrusmcd In addition to the wording comments, I would suggest you not try to draw a parallel between other types of denotations and dimension denotations. I will outline my reason:

Dimension denotation is a unique subject whose complexity may go beyond TensorDenotation or TypeDenotation. The primary goal of dimension denotation is not the storage of metadata, but rather the propagation and verification of the metadata, which is nowhere near the goal of the other types of denotations; you are wasting space trying to explain that the other metadata is not propagated and verified. Furthermore, you are distracting readers who need to realize the complexity and purpose of dimension denotation. What you are interested in is a very specific sub-feature of Dimension Denotation, so the best way to refer to Dimension Denotation is via a link to the original proposal.

Not to mention that dimension denotation was the result of lengthy discussion, debate, and deliberation (c.f. #443). Even the word denotation was carefully chosen to express the connotation of "literal meaning" or "meaning at its lowest level of abstraction", which is, by the way, an unnecessary word choice in this context and may make your proposal a bit less understandable to people whose mother tongue is not English. I would kindly ask that you please not move the original proposal around unless it is absolutely necessary.

@AppVeyorBot: Build onnx 0.3.3082 failed (commit e693a9c814 by @linkerzhang)

@AppVeyorBot: Build onnx 0.3.3094 failed (commit c9001e36b0 by @linkerzhang)

@AppVeyorBot: Build onnx 0.3.3095 failed (commit dace12a51c by @walrusmcd)

@AppVeyorBot: Build onnx 0.3.3096 failed (commit 87abe5394a by @walrusmcd)

@linkerzhang (Member) commented:

@bddppq @houseroad please help to review and share your comments if any.


#### Motivation

The motivation for such a mechanism can be illustrated via a simple example. The neural network SqueezeNet takes an NCHW image input float[1,2,244,244] and produces an output float[1,1000,1,1]:
Member:

A typical SqueezeNet takes [1, 3, 224, 224] as input.

Contributor Author:

Good catch, I meant 3. Fixed.

# Metadata

In addition to the core metadata recommendations listed in the [extensibility documentation](IR.md#metadata), there is additional experimental metadata to help provide information for model inputs and outputs.
@houseroad (Member) commented May 16, 2018:

This doc is called metadata props, so we should not introduce the additional experimental metadata here. Let's move the introduction of the experimental metadata to the metadata section in the IR doc.

We can extract the useful pieces in this doc and merge them into the metadata section in the IR doc.

At least, we should have a better name for this section...

Contributor Author:

The example should help to show that we have 3 separate pieces here, and how to tie them all together.

|Key|Value|Description|
|-----|----|-----------|
|`Image.BitmapPixelFormat`|__string__|Specifies the format of pixel data. Each enumeration value defines a channel ordering and bit depth. Possible values: <ul><li>`Gray8`: 1 channel image, the pixel data is 8 bpp grayscale.</li><li>`Rgb8`: 3 channel image, channel order is RGB, pixel data is 8bpp (No alpha)</li><li>`Bgr8`: 3 channel image, channel order is BGR, pixel data is 8bpp (No alpha)</li><li>`Rgba8`: 4 channel image, channel order is RGBA, pixel data is 8bpp (Straight alpha)</li><li>`Bgra8`: 4 channel image, channel order is BGRA, pixel data is 8bpp (Straight alpha)</li></ul>|
|`Image.ColorSpaceGamma`|__string__|Specifies the gamma color space used. Possible values:<ul><li>`Linear`: Linear color space, gamma == 1.0</li><li>`SRGB`: sRGB color space, gamma == 2.2</li></ul>|
|`Image.NominalPixelRange`|__string__|Specifies the range in which pixel values are stored. Possible values: <ul><li>`NominalRange_0_255`: [0...255] for 8bpp samples</li><li>`Normalized_0_1`: [0...1] pixel data is stored normalized</li><li>`Normalized_1_1`: [-1...1] pixel data is stored normalized</li><li>`NominalRange_16_235`: [16...235] for 8bpp samples</li></ul>|
@houseroad (Member) commented May 22, 2018:

In some cases, the per-channel means are also needed to preprocess the input.

Contributor Author:

How do you think that would look? Are you OK if we add this as a follow-up PR later? I think this follows the model of adding more and more metadata as we find it to be useful. Love it!
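As an illustration of why this matters to a consumer, here is a hedged sketch (not part of the PR) of pre-processing driven by the `Image.NominalPixelRange` value; the mapping follows the table above, and the function name is hypothetical:

```python
import numpy as np

def normalize_pixels(pixels: np.ndarray, nominal_range: str) -> np.ndarray:
    """Map 8 bpp pixel data into the range named by Image.NominalPixelRange."""
    x = pixels.astype(np.float32)
    if nominal_range == "NominalRange_0_255":
        return x                                   # stored as [0...255]
    if nominal_range == "Normalized_0_1":
        return x / 255.0                           # stored normalized to [0...1]
    if nominal_range == "Normalized_1_1":
        return x / 127.5 - 1.0                     # stored normalized to [-1...1]
    if nominal_range == "NominalRange_16_235":
        return 16.0 + x * (235.0 - 16.0) / 255.0   # studio swing [16...235]
    raise ValueError(f"unknown Image.NominalPixelRange: {nominal_range}")
```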

@houseroad (Member) left a comment:

Thanks for updating the PR. This version looks good to me. We may also want to add mean as metadata, because it is used in many cases during preprocessing.

* `Image.BitmapPixelFormat` = `Bgr8`
* `Image.ColorSpaceGamma` = `SRGB`
* `Image.NominalPixelRange` = `NominalRange_0_255`
* For that same ValueInfoProto, make sure to also use Dimension Denotations to denote NCHW

Comment:

Would you consider adding more metadata like "crop_size", "mean_value", or "scale"? (Just like the Caffe data layer's "transform_param".)

Contributor Author:

Yes. We considered this and did not go that route: the goal was not to do the transformation for you, but instead to provide the final values that the model expects. This allows people using the model to perform their own transformation outside the model.

Comment:

I mean that if the metadata contains the transformation information, the model user can know how to perform the transform that the model expects. For example, specify the mean values [123.68, 116.78, 103.94] for VGG net in the metadata in order to hint to the model user how to do the image pre-processing.
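Tying the thread together, here is a sketch of how a model author might write these properties with `onnx.helper.set_model_props`; the file paths and the `Image.ChannelMeans` key are hypothetical (the latter reflects the mean-value discussion above and is not a key this PR defines):

```python
import onnx
from onnx import helper

model = onnx.load("model.onnx")  # hypothetical path
helper.set_model_props(model, {
    "Image.BitmapPixelFormat": "Bgr8",
    "Image.ColorSpaceGamma": "SRGB",
    "Image.NominalPixelRange": "NominalRange_0_255",
    # Hypothetical per-channel means, per the discussion above; NOT a key this PR defines:
    "Image.ChannelMeans": "123.68;116.78;103.94",
})
onnx.save(model, "model_with_props.onnx")
```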

@AppVeyorBot: Build onnx 0.3.3523 failed (commit d7077107c2 by @walrusmcd)


Specifically, in our first proposal we define the following set of standard denotations:

0. `TENSOR` describes that a type holds an generic tensor using the standard TypeProto message.
Contributor:

nit: "a" not "an"

Contributor Author:

Fixed :)
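For completeness, a small sketch of the base `TENSOR` value mentioned above, under the same assumptions as the earlier snippets:

```python
from onnx import TensorProto, helper

# A plain tensor can carry the default/base denotation added in the final iteration.
out = helper.make_tensor_value_info("softmaxout_1", TensorProto.FLOAT, [1, 1000, 1, 1])
out.type.denotation = "TENSOR"
```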

@lupesko (Contributor) left a comment:

Approved, one tiny nit :)

@AppVeyorBot
Copy link

Build onnx 0.3.3537 failed (commit cfe8f6730c by @linkerzhang)

@linkerzhang merged commit 4898c9e into onnx:master May 23, 2018
onnxbot added a commit to pytorch/pytorch that referenced this pull request May 24, 2018
bddppq added a commit to pytorch/pytorch that referenced this pull request May 24, 2018
onnxbot added a commit to onnx/onnx-coreml that referenced this pull request May 24, 2018
orionr pushed a commit to pytorch/pytorch that referenced this pull request May 24, 2018
* Revert "[auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879) onnx/onnx@4898c9e"

This reverts commit 9c679da.

* Revert "Add BiasCHW fallback for GPU (#7738)"

This reverts commit 14ad2e7.

* Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)"

This reverts commit 2ebcf4b.
zdevito pushed a commit to zdevito/ATen that referenced this pull request May 24, 2018
* Revert "[auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879) onnx/onnx@4898c9e"

This reverts commit 9c679dab5fe7cac27bb8c783fd143276e6046ef1.

* Revert "Add BiasCHW fallback for GPU (#7738)"

This reverts commit 14ad2e74f108d13ec98abb078f6aa7f01aae0aad.

* Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)"

This reverts commit 2ebcf4bb37739733e76b754284cf8b2ffcba1c30.
@linkerzhang mentioned this pull request May 24, 2018
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018
* Revert "[auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879) onnx/onnx@4898c9e"

This reverts commit 9c679da.

* Revert "Add BiasCHW fallback for GPU (pytorch#7738)"

This reverts commit 14ad2e7.

* Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (pytorch#7566)"

This reverts commit 2ebcf4b.
Ac2zoom pushed a commit to Ac2zoom/onnx that referenced this pull request Jun 21, 2018
* Added TensorDenotation, markdown docs for metadata_props for images, and markdown docs for dimension denotation

* CR feedback in the docs.

* moved to type

* Updated the docs, kept TypeDenotation and DimesnionDenotation sperate (CR feedback)

* fixed the CI by running gen_proto.py

* PR feedback, trimmed the type denotation doc and added a sample

* markdown cleanup

* markdown cleanup

* markdown cleanup

* PR feedback.
- extra line in metadata.md
- added a default "TENSOR" value to typedenotation as a base option

* typo (a vs. an)
changed TENSOR from 0 to 1 - onnx field numbers must be positive integers

* typo - a vs an
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 13, 2018
* Revert "[auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879) onnx/onnx@4898c9e"

This reverts commit 9c679dab5fe7cac27bb8c783fd143276e6046ef1.

* Revert "Add BiasCHW fallback for GPU (#7738)"

This reverts commit 14ad2e74f108d13ec98abb078f6aa7f01aae0aad.

* Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)"

This reverts commit 2ebcf4bb37739733e76b754284cf8b2ffcba1c30.
jcwchen pushed a commit to jcwchen/onnx that referenced this pull request Sep 23, 2020