# Added TensorDenotation and metadata_props for images #879
Changes from 18 commits
@@ -0,0 +1,33 @@
# Metadata
In addition to the core metadata recommendations listed in the [extensibility documentation](IR.md#metadata), there is additional experimental metadata to help provide information for model inputs and outputs.

This metadata applies to all input and output tensors of a given category. The first such category we define is: `Image`.

## Motivation

The motivation of such a mechanism is to allow model authors to convey to model consumers enough information for them to consume the model.

In the case of images there are many options for providing valid image data. However, a model which consumes images was trained with a particular set of these options, which must be used during inferencing.

The goal of this proposal is to provide enough metadata that the model consumer can perform their own featurization prior to running the model and provide a compatible input, or retrieve an output and know what its format is.

## Image Category Definition

For every tensor in this model that uses [Type Denotation](TypeDenotation.md) to declare itself an `IMAGE`, you SHOULD provide metadata to assist the model consumer. Note that any metadata provided using this mechanism is global to ALL types with the accompanying denotation.

Keys and values are case insensitive.

Specifically, we define here the following set of image metadata:

|Key|Value|Description|
|-----|----|-----------|
|`Image.BitmapPixelFormat`|__string__|Specifies the format of the pixel data. Each enumeration value defines a channel ordering and bit depth. Possible values: <ul><li>`Gray8`: 1 channel image, the pixel data is 8 bpp grayscale.</li><li>`Rgb8`: 3 channel image, channel order is RGB, pixel data is 8 bpp (no alpha)</li><li>`Bgr8`: 3 channel image, channel order is BGR, pixel data is 8 bpp (no alpha)</li><li>`Rgba8`: 4 channel image, channel order is RGBA, pixel data is 8 bpp (straight alpha)</li><li>`Bgra8`: 4 channel image, channel order is BGRA, pixel data is 8 bpp (straight alpha)</li></ul>|
|`Image.ColorSpaceGamma`|__string__|Specifies the gamma color space used. Possible values: <ul><li>`Linear`: Linear color space, gamma == 1.0</li><li>`SRGB`: sRGB color space, gamma == 2.2</li></ul>|
|`Image.NominalPixelRange`|__string__|Specifies the range in which pixel values are stored. Possible values: <ul><li>`NominalRange_0_255`: [0...255] for 8 bpp samples</li><li>`Normalized_0_1`: [0...1] pixel data is stored normalized</li><li>`Normalized_1_1`: [-1...1] pixel data is stored normalized</li><li>`NominalRange_16_235`: [16...235] for 8 bpp samples</li></ul>|

> **Review comment** (on `Image.BitmapPixelFormat`): can you call out that keys and values are case insensitive?
>
> **Author reply:** nice! added to the next iteration.

> **Review comment** (on `Image.NominalPixelRange`): In some cases, means for each channel are also needed to preprocess the input.
>
> **Author reply:** How do you think that would look? Are you OK if we add this as a follow-up PR later? I think this follows the model of adding more and more metadata as we find it to be useful.
>
> **Reviewer:** love it!
@@ -0,0 +1,54 @@
# Type Denotation

Type Denotation is used to describe semantic information about what the inputs and outputs are. It is stored on the TypeProto message.

## Motivation

The motivation of such a mechanism can be illustrated via a simple example. The neural network SqueezeNet takes in an NCHW image input float[1,3,224,224] and produces an output float[1,1000,1,1]:

```
input_in_NCHW -> data_0 -> SqueezeNet() -> output_softmaxout_1
```

In order to run this model the user needs a lot of information. In this case the user needs to know:

> **Review comment:** I think instead of using type denotation, saving the following information in the metadata is more straightforward.
>
> **Author reply:** We experimented with that approach first, but it's more semantically correct to first denote that the type is an IMAGE. Only then do you know to go look at the metadata to see how the model requires its images. If you had multiple inputs into the model, you would need a type denotation to know which of those types is the image.
>
> **Reviewer:** +1, the need to support multiple inputs is a good call

* the input is an image
* the image is in the format of NCHW
* the color channels are in the order of BGR
* the pixel data is 8 bit
* the pixel data is normalized as values 0-255
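Those facts map directly to a featurization step the consumer performs outside the model. A minimal sketch using NumPy, assuming an HWC RGB uint8 source image (the helper name is illustrative):

```python
import numpy as np

def featurize(rgb_hwc_uint8):
    """Turn an HWC RGB uint8 image into the NCHW BGR float tensor
    described above, keeping pixel values in the nominal 0-255 range."""
    bgr = rgb_hwc_uint8[:, :, ::-1]      # RGB -> BGR: reverse the channel axis
    chw = np.transpose(bgr, (2, 0, 1))   # HWC -> CHW
    return chw[np.newaxis, ...].astype(np.float32)  # add batch dim -> NCHW

x = featurize(np.zeros((224, 224, 3), dtype=np.uint8))
```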
This proposal consists of three key components to provide all of this information:

* Type Denotation,
* [Dimension Denotation](DimensionDenotation.md),
* [Model Metadata](MetadataProps.md).

## Type Denotation Definition

To begin with, we define a set of semantic types that describe what models generally consume as inputs and produce as outputs.

Specifically, in our first proposal we define the following set of standard denotations:

0. `TENSOR` describes that a type holds a generic tensor using the standard TypeProto message.
1. `IMAGE` describes that a type holds an image. You can use dimension denotation to learn more about the layout of the image, and also the optional model metadata_props.
2. `AUDIO` describes that a type holds an audio clip.
3. `TEXT` describes that a type holds a block of text.

> **Review comment** (on `TENSOR`): nit: "a" not "an"
>
> **Author reply:** fixed :)

> **Review comment:** What about good old numerical tensors, for other tasks such as recommendations, forecasting, anomaly detection, etc.?
>
> **Author reply:** Makes sense! I added the 3 we had been using the most in our models (image/audio/text) and added explicit metadata for image. I assumed that there would be follow-up proposals and PRs as we add more "types" here. What are you thinking? Also, we do have the normal case where there is no denotation at all. In that case the tensor is also a good old numerical tensor; since denotation is optional, it still has all the fun stuff, like shape and type.

Model authors SHOULD add type denotation to inputs and outputs for the model as appropriate.

## An Example with input IMAGE

Let's use the same SqueezeNet example from above and show everything needed to properly annotate the model:

* First set the TypeProto.denotation = `IMAGE` for the ValueInfoProto `data_0`
  * Because it's an image, the model consumer now knows to go look for image metadata on the model
* Then include 3 metadata strings on ModelProto.metadata_props
  * `Image.BitmapPixelFormat` = `Bgr8`
  * `Image.ColorSpaceGamma` = `SRGB`
  * `Image.NominalPixelRange` = `NominalRange_0_255`
* For that same ValueInfoProto, make sure to also use Dimension Denotations to denote NCHW
  * TensorShapeProto.Dimension[0].denotation = `DATA_BATCH`
  * TensorShapeProto.Dimension[1].denotation = `DATA_CHANNEL`
  * TensorShapeProto.Dimension[2].denotation = `DATA_FEATURE`
  * TensorShapeProto.Dimension[3].denotation = `DATA_FEATURE`

> **Review comment:** Do you consider adding more metadata like "crop_size", "mean_value", or "scale"? (Just like the caffe data layer "transform_param")
>
> **Author reply:** Yes. We considered this and did not go that route, in that the goal was not to do the transformation for you, but instead to provide the final values that the model expects. This allows people using the model to perform their own transformation outside the model.
>
> **Reviewer:** I mean, if the metadata contains the transformation information, the model user can know how to perform the transform that the model expects. For example, specify the mean value [123.68, 116.78, 103.94] for VGG net in the metadata in order to hint the model user to do the image pre-processing.

Now there is enough information in the model to know everything about how to pass a correct image into the model.
@@ -476,6 +476,25 @@ message TypeProto {
```
    Map map_type = 5;
  }

  // An optional denotation can be used to denote the whole
  // type with a standard semantic description as to what is
  // stored inside
  optional string denotation = 6;
}

// A set of pre-defined constants to be used as values for
// the standard denotation field in Tensor for semantic
// description of the tensor.
message TypeDenotationConstProto {

  // A generic tensor
  optional string TENSOR = 0 [default = "TENSOR"];
  // An image is stored inside this tensor
  optional string IMAGE = 1 [default = "IMAGE"];
  // Audio is stored inside this tensor
  optional string AUDIO = 2 [default = "AUDIO"];
  // Text is stored inside this tensor
  optional string TEXT = 3 [default = "TEXT"];
}

// Operator Sets
```

> **Review comment** (on `TypeDenotationConstProto`): Please see my comment above about a generic tensor.
>
> **Author reply:** I was thinking we make denotation optional, and "generic" would mean that there is no denotation at all. Are you saying we should add a TENSOR to the const proto as the default/0 case, so that you could provide a tensor denotation and say it is just a "tensor"? Assuming that, what do you think? I'll add a GENERIC to this PR for completeness, but leave the denotation as optional. Let me know what you think and if you are proposing we make the denotation required. Thanks!
>
> **Reviewer:** Agreed.
> **Review comment:** This doc is called metadata props, so we should not introduce the additional experimental metadata here. Let's move the introduction of the experimental metadata to the metadata section in the IR doc. We can extract the useful pieces in this doc and merge them into the metadata section in IR.md. At least, we should have a better name for this section...
>
> **Author reply:** The example should help to show that we have 3 separate pieces here, and how to tie them all together.