Merge 5d9ffa1 into 330fd0f
walrusmcd committed May 14, 2018
2 parents 330fd0f + 5d9ffa1 commit c9001e3
Showing 8 changed files with 182 additions and 6 deletions.
67 changes: 62 additions & 5 deletions docs/DimensionDenotation.md → docs/Denotation.md
@@ -1,8 +1,64 @@
Using Denotation For Semantic Description
------------------------------------------

Denotation is an experiment to give semantic description to models. This enables model authors to describe the parts of their model that application developers need to know in order to consume it.

There are two types of denotation: [Type Denotation](Denotation.md#type-denotation) and [Dimension Denotation](Denotation.md#dimension-denotation).

### Type Denotation

Type Denotation is used to describe semantic information around what the inputs and outputs are. It is stored on the TypeProto message.

#### Motivation

The motivation of such a mechanism can be illustrated via a simple example. The neural network SqueezeNet takes an NCHW image input float[1,3,224,224] and produces an output float[1,1000,1,1]:

```
input_in_NCHW -> data_0 -> SqueezeNet() -> output_softmaxout_1
```

In order to run this model the user needs a lot of information. In this case the user needs to know:
* the input is an image
* the image is in the format of NCHW
* the color channels are in the order of bgr
* the pixel data is 8 bit
* the pixel data is stored in the nominal range 0-255
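
The checklist above can be sketched as a minimal preprocessing step in plain Python. This is purely illustrative: the function name and data layout are assumptions, not part of the ONNX spec.

```
# Hypothetical sketch: turning flat 8-bit BGR pixel data (HWC layout) into
# the NCHW float tensor an image model such as SqueezeNet expects.

def hwc_bgr_to_nchw(pixels, height, width):
    """pixels: flat list of 8-bit values in H x W x C order, C = (B, G, R)."""
    channels = 3
    # One plane per channel: out[c][h][w] = pixels[(h * width + w) * channels + c]
    planes = [
        [
            [float(pixels[(h * width + w) * channels + c]) for w in range(width)]
            for h in range(height)
        ]
        for c in range(channels)
    ]
    # Add the batch dimension N = 1: shape becomes [1, 3, H, W].
    return [planes]

# A 1x2 image: two BGR pixels (pure blue, then pure red).
tensor = hwc_bgr_to_nchw([255, 0, 0, 0, 0, 255], height=1, width=2)
print(len(tensor), len(tensor[0]), len(tensor[0][0]), len(tensor[0][0][0]))  # 1 3 1 2
```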

This proposal consists of three key components to provide all of this information: Type Denotation, [Dimension Denotation](Denotation.md#dimension-denotation), and [model metadata](MetadataProps.md), each of which is discussed in detail below.

#### Type Denotation Definition

To begin with, we define a set of semantic types that define what models generally consume as inputs and produce as outputs.

Specifically, in our first proposal we define the following set of standard denotations:

1. `IMAGE` describes that a type holds an image. Dimension denotation and the optional model metadata_props can describe the layout of the image in more detail.
2. `AUDIO` describes that a type holds an audio clip.
3. `TEXT` describes that a type holds a block of text.

Model authors SHOULD add type denotation to inputs and outputs for the model.

#### Denotation Propagation

Type Denotation is not propagated automatically. It describes the initial inputs and final outputs; as data flows through the graph, no inference is made as to whether the data still holds an image (for example). A model builder or conversion tool MAY propagate denotation manually in the model if it knows that subsequent types share the same semantic denotation.
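
A conversion tool's manual propagation might look like the following sketch. The op list and data structures here are illustrative assumptions, not an ONNX API.

```
# Hedged sketch: copy Type Denotation forward through ops known to preserve
# the semantic meaning of their input. The set of ops is an assumption.

SEMANTICS_PRESERVING_OPS = {"Relu", "BatchNormalization", "Identity"}

def propagate_type_denotation(nodes, denotations):
    """nodes: list of (op_type, input_name, output_name) tuples;
    denotations: {tensor_name: "IMAGE" | "AUDIO" | "TEXT"} (mutated in place)."""
    for op_type, inp, out in nodes:
        if op_type in SEMANTICS_PRESERVING_OPS and inp in denotations:
            denotations[out] = denotations[inp]
    return denotations

graph = [("Relu", "data_0", "relu_0"), ("AveragePool", "relu_0", "pool_0")]
d = propagate_type_denotation(graph, {"data_0": "IMAGE"})
print(d)  # relu_0 inherits IMAGE; pool_0 is left unannotated
```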

#### Denotation Verification

Denotation Verification is not enforced. It is simply a method for model authors to indicate to model consumers what they should pass in and what they should expect out. No error is reported if, for example, you do not actually pass in an image.

#### Combined With Dimension Denotation

Type denotation can be combined with the new experimental feature of [Dimension Denotation](Denotation.md#dimension-denotation). For example, if the Type Denotation is `IMAGE`, then that type SHOULD also have [Dimension Denotation](Denotation.md#dimension-denotation) stating the NCHW layout, described as [`DATA_BATCH`, `DATA_CHANNEL`, `DATA_FEATURE`, `DATA_FEATURE`].

#### Model metadata_props

A model author then uses model metadata to describe information about ALL of the inputs and outputs for the model. For example, `Image.BitmapPixelFormat`. See the [model metadata documentation](MetadataProps.md) for details.

### Dimension Denotation

Dimension Denotation is an experimental attempt to attach semantic descriptions, and thus types, to tensor axes, and subsequently to perform verification steps based on them.

#### Motivation

The motivation of such a mechanism can be illustrated via a simple example. In the linear neural network specification below, we assume a NCHW model input:

@@ -14,7 +70,7 @@ In this neural network, a user mistakenly constructed a neural network that tran

This proposal consists of three key components: Denotation Definition, Denotation Propagation and Denotation Verification, each of which will be discussed in detail.

#### Denotation Definition

To begin with, we define a set of denotation types for tensor dimensions. Such types are defined based on the following principles:
1. Be fine-grained enough to eliminate potential pitfalls. For instance, the example illustrated in the motivation section mandates that we distinguish between a channel dimension and a spatial feature dimension to ensure the correctness of execution of the AveragePool op.
@@ -31,7 +87,7 @@ Specifically, in our first proposal, we define the following set of standard den
6. `FILTER_OUT_CHANNEL` describes a filter out-channel dimension. This is the dimension that is identical (in size) to the channel dimension of the output image feature maps.
7. `FILTER_SPATIAL` describes a filter spatial dimension.

#### Denotation Propagation

Denotation Propagation happens when an operation permutes, destroys or creates dimensions with respect to its input tensor. In such scenarios, we implement customized, operation-specific functions to infer the output tensor dimension denotation from the input tensor dimension denotation. An example is the Transpose operation, where the pseudocode for output dimension denotation inference can be formulated as a function of the input dimension denotation:

@@ -40,6 +96,7 @@
```
for i, j in enumerate(perm):
    out_dim_denotation[i] = in_dim_denotation[j]
```
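
A runnable version of the pseudocode above, with an NCHW-to-NHWC example (illustrative only):

```
# Transpose denotation propagation: out[i] = in[perm[i]].

def propagate_transpose(in_dim_denotation, perm):
    out_dim_denotation = [None] * len(perm)
    for i, j in enumerate(perm):
        out_dim_denotation[i] = in_dim_denotation[j]
    return out_dim_denotation

# NCHW -> NHWC transpose uses perm = [0, 2, 3, 1]:
nchw = ["DATA_BATCH", "DATA_CHANNEL", "DATA_FEATURE", "DATA_FEATURE"]
print(propagate_transpose(nchw, [0, 2, 3, 1]))
# ['DATA_BATCH', 'DATA_FEATURE', 'DATA_FEATURE', 'DATA_CHANNEL']
```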

#### Denotation Verification

Denotation Verification happens when an operation expects its input to arrive in a particular format. An example is the AveragePool operation: in the 2D case, the input, if annotated with dimension denotation, should have the denotation [`DATA_BATCH`, `DATA_CHANNEL`, `DATA_FEATURE`, `DATA_FEATURE`]. If there is a mismatch between the expected and the actual dimension denotation, an error should be reported.
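
A minimal sketch of such a verification step for the 2D AveragePool case (hypothetical helper, not an official API):

```
# Verify that an annotated input matches the layout an op expects.

EXPECTED_POOL2D = ["DATA_BATCH", "DATA_CHANNEL", "DATA_FEATURE", "DATA_FEATURE"]

def verify_denotation(actual, expected=EXPECTED_POOL2D):
    """Raise if the actual dimension denotation differs from the expected one."""
    if actual != expected:
        raise ValueError(
            "dimension denotation mismatch: expected %s, got %s" % (expected, actual)
        )

verify_denotation(["DATA_BATCH", "DATA_CHANNEL", "DATA_FEATURE", "DATA_FEATURE"])  # ok
try:
    # An NHWC-annotated input handed to an op expecting NCHW:
    verify_denotation(["DATA_BATCH", "DATA_FEATURE", "DATA_FEATURE", "DATA_CHANNEL"])
except ValueError as e:
    print("rejected:", e)
```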

2 changes: 2 additions & 0 deletions docs/IR.md
@@ -357,6 +357,8 @@ The type system used for attributes is a superset of that used for inputs and

The ONNX specification is comprised of this document, which defines the semantics of the IR and the standard data types, and the following documents defining standard operator semantics and the IR syntax. The latter is specified as Protobuf v2 and v3 schema files.

See the [metadata category documentation](MetadataProps.md) for more details.

### Operators

[Neural Network Operators](Operators.md)
33 changes: 33 additions & 0 deletions docs/MetadataProps.md
@@ -0,0 +1,33 @@
# Metadata

In addition to the core metadata recommendations listed in the [extensibility documentation](IR.md#metadata) there is additional experimental metadata to help provide information for model inputs and outputs.

This metadata applies to all input and output tensors of a given category. The first such category we define is: `Image`.

## Motivation

The motivation of such a mechanism is to allow model authors to convey to model consumers enough information for them to consume the model.

In the case of images there are many options for providing valid image data. However, a model that consumes images was trained with a particular set of these options, which must be used during inferencing.

The goal of this proposal is to provide enough metadata that the model consumer can perform their own featurization prior to running the model and provide a compatible input, or retrieve an output and know its format.

## Image Category Definition

For every tensor in this model that uses [Type Denotation](Denotation.md#type-denotation) to declare itself an `IMAGE`, you can provide more information to the model consumer. Note that any metadata provided using this mechanism is global to ALL tensors with the accompanying denotation.

Keys and values are case insensitive.

Specifically, we define here the following set of image metadata:

|Key|Value|Description|
|-----|----|-----------|
|`Image.BitmapPixelFormat`|__string__|Specifies the format of pixel data. Each enumeration value defines a channel ordering and bit depth. Possible values: <ul><li>`Gray8`: 1 channel image, the pixel data is 8 bpp grayscale.</li><li>`Rgb8`: 3 channel image, channel order is RGB, pixel data is 8bpp (No alpha)</li><li>`Bgr8`: 3 channel image, channel order is BGR, pixel data is 8bpp (No alpha)</li><li>`Rgba8`: 4 channel image, channel order is RGBA, pixel data is 8bpp (Straight alpha)</li><li>`Bgra8`: 4 channel image, channel order is BGRA, pixel data is 8bpp (Straight alpha)</li></ul>|
|`Image.ColorSpaceGamma`|__string__|Specifies the gamma color space used. Possible values:<ul><li>`Linear`: Linear color space, gamma == 1.0</li><li>`SRGB`: sRGB color space, gamma == 2.2</li></ul>|
|`Image.NominalPixelRange`|__string__|Specifies the range that pixel values are stored. Possible values: <ul><li>`NominalRange_0_255`: [0...255] for 8bpp samples</li><li>`Normalized_0_1`: [0...1] pixel data is stored normalized</li><li>`Normalized_1_1`: [-1...1] pixel data is stored normalized</li><li>`NominalRange_16_235`: [16...235] for 8bpp samples</li></ul>|
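
Since keys and values are case insensitive, a consumer might normalize both before lookup, as in this sketch (metadata_props is modeled here as a plain list of string pairs; the helper is hypothetical):

```
# Case-insensitive lookup of an image metadata property.

def get_image_metadata(metadata_props, key):
    """metadata_props: list of (key, value) string pairs from the model."""
    wanted = key.lower()
    for k, v in metadata_props:
        if k.lower() == wanted:
            return v.lower()
    return None  # property not declared by the model author

props = [("Image.BitmapPixelFormat", "Bgr8"),
         ("Image.NominalPixelRange", "NominalRange_0_255")]
print(get_image_metadata(props, "image.bitmappixelformat"))  # bgr8
```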


17 changes: 17 additions & 0 deletions onnx/onnx-ml.proto
@@ -477,6 +477,23 @@ message TypeProto {
Map map_type = 5;

}

// An optional denotation can be used to denote the whole
// type with a standard semantic description as to what is
// stored inside
optional string denotation = 6;
}

// A set of pre-defined constants to be used as values for
// the standard denotation field in TypeProto for semantic
// description of the type.
message TypeDenotationConstProto {
// An image is stored inside this tensor
optional string IMAGE = 1 [default = "IMAGE"];
// Audio is stored inside this tensor
optional string AUDIO = 2 [default = "AUDIO"];
// Text is stored inside this tensor
optional string TEXT = 3 [default = "TEXT"];
}

// Operator Sets
17 changes: 17 additions & 0 deletions onnx/onnx-ml.proto3
@@ -477,6 +477,23 @@ message TypeProto {
Map map_type = 5;

}

// An optional denotation can be used to denote the whole
// type with a standard semantic description as to what is
// stored inside
string denotation = 6;
}

// A set of pre-defined constants to be used as values for
// the standard denotation field in TypeProto for semantic
// description of the type.
message TypeDenotationConstProto {
// An image is stored inside this tensor
string IMAGE = 1 [default = "IMAGE"];
// Audio is stored inside this tensor
string AUDIO = 2 [default = "AUDIO"];
// Text is stored inside this tensor
string TEXT = 3 [default = "TEXT"];
}

// Operator Sets
17 changes: 17 additions & 0 deletions onnx/onnx.in.proto
@@ -478,6 +478,23 @@ message TypeProto {

// #endif
}

// An optional denotation can be used to denote the whole
// type with a standard semantic description as to what is
// stored inside
optional string denotation = 6;
}

// A set of pre-defined constants to be used as values for
// the standard denotation field in TypeProto for semantic
// description of the type.
message TypeDenotationConstProto {
// An image is stored inside this type
optional string IMAGE = 1 [default = "IMAGE"];
// Audio is stored inside this type
optional string AUDIO = 2 [default = "AUDIO"];
// Text is stored inside this type
optional string TEXT = 3 [default = "TEXT"];
}

// Operator Sets
18 changes: 17 additions & 1 deletion onnx/onnx.proto
@@ -440,12 +440,28 @@ message TypeProto {
optional TensorShapeProto shape = 2;
}


oneof value {
// The type of a tensor.
Tensor tensor_type = 1;

}

// An optional denotation can be used to denote the whole
// type with a standard semantic description as to what is
// stored inside
optional string denotation = 6;
}

// A set of pre-defined constants to be used as values for
// the standard denotation field in TypeProto for semantic
// description of the type.
message TypeDenotationConstProto {
// An image is stored inside this type
optional string IMAGE = 1 [default = "IMAGE"];
// Audio is stored inside this type
optional string AUDIO = 2 [default = "AUDIO"];
// Text is stored inside this type
optional string TEXT = 3 [default = "TEXT"];
}

// Operator Sets
17 changes: 17 additions & 0 deletions onnx/onnx.proto3
@@ -446,6 +446,23 @@ message TypeProto {
Tensor tensor_type = 1;

}

// An optional denotation can be used to denote the whole
// type with a standard semantic description as to what is
// stored inside
string denotation = 6;
}

// A set of pre-defined constants to be used as values for
// the standard denotation field in TypeProto for semantic
// description of the type.
message TypeDenotationConstProto {
// An image is stored inside this tensor
string IMAGE = 1 [default = "IMAGE"];
// Audio is stored inside this tensor
string AUDIO = 2 [default = "AUDIO"];
// Text is stored inside this tensor
string TEXT = 3 [default = "TEXT"];
}

// Operator Sets
