Rel 1.7.103 verify (#2687)
* Fix Greater/LessOrEqual function definition (#2645)

* Fix Greater/LessOrEqual function definition

* Update test data

Co-authored-by: Ke Zhang <kezhan@microsoft.com>

* Suppress a warning in unsqueeze (#2637)

I keep getting this warning when building PyTorch:

```
In file included from
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/tensor/utils.h:6,
                 from
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/tensor/defs.cc:4:
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/tensor/defs.cc: In
lambda function:
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/tensor/defs.cc:1414:22:
warning: unnecessary parentheses in declaration of ‘i’
[-Wparentheses]
           for (size_t(i) = 0; i < axes.size(); ++i) {
                      ^
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/schema.h:959:12:
note: in definition of macro ‘ONNX_OPERATOR_SET_SCHEMA_EX’
     return impl.SetName(#name)
\
            ^~~~
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/tensor/defs.cc:1369:1:
note: in expansion of macro ‘ONNX_OPERATOR_SET_SCHEMA’
 ONNX_OPERATOR_SET_SCHEMA(
```

This commit should fix it and modernize the code a bit.

Co-authored-by: Ke Zhang <kezhan@microsoft.com>

* [Training] Add Adagrad optimizer operator (#1955)

* Adagrad draft

* MIMO

* Support multiple tensors to be optimized

* Address comments

* Move optimizers to a new place

Remove copied

Add momentum

Save

Remove momentum

Fix

Move constants to attributes

* Fix build

* Add shape test

Add two node tests

Update test coverage

* Fix shape inf

* Fix shape inf

* fix shape inf

* Format

* Add function type

* Merge lines

* Format

* Fix version number

* Update op version in model files

* Fix a test function and update related test files

* Update onnx/backend/test/case/node/adagrad.py

* Remove unused file

* sync docs

* Fix shape test

* sync doc

* sync with master

* Update onnx/defs/training/defs.cc

Co-Authored-By: Michał Karzyński <postrational@users.noreply.github.com>

* sync doc

* address comments

* address a minor comment

* Polish one line

Co-authored-by: Michał Karzyński <postrational@users.noreply.github.com>

* [Training] SG with Momentum Optimizer (#1959)

* SG with Momentum

* Register Op

Fix

Update other docs

* Add shape inference code and polish definition

* Update docs

* Add test cases and fix several bugs

* Remove accidentally added copy

* Alpha -> alpha & Beta -> beta

* Clarify an attribute

* Fix an attribute

* Fix bug

* Fix missing attributes

* sync doc

* Remove unused domain

* sync with master

Co-authored-by: Chin Huang <chhuang@us.ibm.com>

* Change type of label tensor to int32/int64 in SoftmaxCrossEntropyLoss spec. (#2667)

* Update Pow input types in Opset 12 (#2666)

* Update Pow input types in Opset 12

* gen doc and tests

* remove uints and 8 bit ints

* add tests

* remove uint int x tests

* Adding CI for ONNX Debug mode (Linux, OSX) (#2651)

* adding an OSX build and a Linux build, with and without ONNX_ML, for debug mode

* test debug mode with ONNX_ML=1

* Rename OPTIONAL to OPTIONAL_VALUE (#2682)

Co-authored-by: G. Ramalingam <grama@microsoft.com>

* Update Batchnorm test (#2674)

* Update Batchnorm test

* relax shape inference on scalar

* Remove unnecessary copies and std::move (#2684)

* Update sequence test case so input is not scalar and splits are specified (#2675)

* Update sequence test case so input is not scalar and splits are specified

* Add spaces to make the checker happy

* Use cmake GNUInstallDirs (#2661)

https://cmake.org/cmake/help/latest/module/GNUInstallDirs.html
This allows installing the libraries (and headers) in a location other than `lib` (Gentoo uses lib64 for 64-bit libs).
It also changes the .cmake files to avoid conflicts when building both 32-bit and 64-bit variants (avoids conflicting/overwritten files).

Co-authored-by: Ke Zhang <kezhan@microsoft.com>

* Add 'ignore_index' input in the spec for SoftmaxCrossEntropyLoss and NLLLoss. (#2680)

* Add 'ignore_index' input in the spec for SoftmaxCrossEntropyLoss and NLLLoss.

* Add tests.

* build break.

* build break.

* clean up.

* build break.

* Change ignore_index to attribute.

* Change ignore_index to attribute.

* PR feedback.

* PR feedback.

* Make ignore_index optional in NLLLoss.

* Build break.

* remove trailing spaces to fix build break.

* Build break.

* Update spec doc.

* Fix NLLLoss function definition to fix test: test_negative_log_likelihood_loss_input_shape_is_NCd1d2_with_weight_reduction_sum_ignore_index_expanded

* PR feedback.

* Fix test for softmax cross entropy loss to exclude ignore_index'ed weights from the sum of weights.

* Build break.

* Reduce binary size of libraries consuming ONNX (part 1/2) (#2643)

* Change the return type for the zipmap operator to match the description in the spec.

* Reduce binary size of libraries consuming ONNX (part 1/2)

* Fix build error

* Replace separate Get*Doc() functions with an easy macro for greater convenience

* Add one more macro for complicated operator documentation.

Co-authored-by: Ke Zhang <kezhan@microsoft.com>

* Update pybind (#2340) (#2688)

* Change version number for release verification

Co-authored-by: Takeshi Watanabe <take-cheeze@users.noreply.github.com>
Co-authored-by: Ke Zhang <kezhan@microsoft.com>
Co-authored-by: Hong Xu <hong@topbug.net>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: Michał Karzyński <postrational@users.noreply.github.com>
Co-authored-by: M. Zeeshan Siddiqui <mzs@microsoft.com>
Co-authored-by: Lara Haidar <haidar.lara@gmail.com>
Co-authored-by: Vinitra Swamy <vinitras@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: G. Ramalingam <grama@microsoft.com>
Co-authored-by: Changming Sun <me@sunchangming.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Gustavo Alvarez <462213+sl1pkn07@users.noreply.github.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
15 people committed Mar 31, 2020
1 parent 18d03bb commit 3193b02
Showing 189 changed files with 2,662 additions and 807 deletions.
13 changes: 13 additions & 0 deletions .travis.yml
@@ -12,6 +12,16 @@ matrix:
env: PYTHON_VERSION=python3 ONNX_ML=0
language: python
python: "3.6"
- os: linux
sudo: required
env: PYTHON_VERSION=python3 ONNX_ML=0 ONNX_DEBUG=1
language: python
python: "3.6"
- os: linux
sudo: required
env: PYTHON_VERSION=python3 ONNX_ML=1 ONNX_DEBUG=1
language: python
python: "3.6"
- os: osx
osx_image: xcode9.3
env: PYTHON_VERSION=python2 ONNX_ML=0
@@ -34,6 +44,9 @@ matrix:
- os: osx
osx_image: xcode9.3
env: PYTHON_VERSION=python3
- os: osx
osx_image: xcode9.3
env: PYTHON_VERSION=python3 ONNX_DEBUG=1
- os: linux
sudo: required
env: PYTHON_VERSION=python2 LITE=1
4 changes: 4 additions & 0 deletions .travis/install.sh
@@ -13,5 +13,9 @@ fi
export CMAKE_ARGS="${CMAKE_ARGS} -DONNXIFI_DUMMY_BACKEND=ON"
export ONNX_NAMESPACE=ONNX_NAMESPACE_FOO_BAR_FOR_CI

if [ "${ONNX_DEBUG}" == "1" ]; then
export DEBUG=1
fi

time python setup.py --quiet bdist_wheel --universal --dist-dir .
find . -maxdepth 1 -name "*.whl" -ls -exec pip install {} \;
12 changes: 7 additions & 5 deletions CMakeLists.txt
@@ -640,12 +640,14 @@ if(MSVC)
add_msvc_runtime_flag(onnxifi_dummy)
endif()

include(GNUInstallDirs)

install(DIRECTORY ${ONNX_ROOT}/onnx
DESTINATION include
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
FILES_MATCHING
PATTERN "*.h")
install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/onnx
DESTINATION include
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
FILES_MATCHING
PATTERN "*.h")

@@ -660,13 +662,13 @@ configure_file(
install(FILES
${PROJECT_BINARY_DIR}/ONNXConfigVersion.cmake
${PROJECT_BINARY_DIR}/ONNXConfig.cmake
DESTINATION share/cmake/ONNX
DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/ONNX
COMPONENT dev)
install(EXPORT ONNXTargets DESTINATION share/cmake/ONNX)
install(EXPORT ONNXTargets DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/ONNX")
install(TARGETS
onnx onnx_proto
onnxifi onnxifi_dummy onnxifi_loader
EXPORT ONNXTargets DESTINATION lib)
EXPORT ONNXTargets DESTINATION ${CMAKE_INSTALL_LIBDIR})

if(NOT ANDROID AND NOT IOS)
install(TARGETS onnxifi_wrapper
2 changes: 1 addition & 1 deletion VERSION_NUMBER
@@ -1 +1 @@
1.7.102
1.7.103
153 changes: 152 additions & 1 deletion docs/Changelog.md
@@ -14801,6 +14801,8 @@ This version of the operator has been available since version 12 of the default
#### Attributes

<dl>
<dt><tt>ignore_index</tt> : int</dt>
<dd>Specifies a target value that is ignored and does not contribute to the input gradient. It is an optional value and valid values are [0, C).</dd>
<dt><tt>reduction</tt> : string (default is mean)</dt>
<dd>Type of reduction to apply to loss: none, sum, mean (default). 'none': the output is the loss for each sample. 'sum': the output will be summed. 'mean': the sum of the output will be divided by the sum of applied weights.</dd>
</dl>
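
To make the interaction of the new `ignore_index` attribute with the weighted mean reduction concrete, here is a minimal numpy sketch (an illustration only, not the normative definition; the function name and signature are invented for this example):

```python
import numpy as np

# Sketch of an NLLLoss-style computation with scores of shape [N, C] and
# integer targets of shape [N]. Targets equal to ignore_index contribute
# neither to the loss nor to the sum of applied weights used by 'mean'.
def nll_loss(scores, target, weight=None, ignore_index=None, reduction="mean"):
    N, C = scores.shape
    weight = np.ones(C, dtype=scores.dtype) if weight is None else weight
    loss = np.zeros(N, dtype=scores.dtype)
    applied = np.zeros(N, dtype=scores.dtype)
    for n in range(N):
        if ignore_index is not None and target[n] == ignore_index:
            continue  # ignored sample: zero loss, zero applied weight
        applied[n] = weight[target[n]]
        loss[n] = -scores[n, target[n]] * applied[n]
    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    return loss.sum() / applied.sum()  # 'mean': divide by applied weights
```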
@@ -14832,6 +14834,42 @@ This version of the operator has been available since version 12 of the default
<dd>Constrain target to integer types</dd>
</dl>

### <a name="Pow-12"></a>**Pow-12**</a>

Pow takes input data (Tensor<T>) and an exponent Tensor, and
produces one output data (Tensor<T>) where the function `f(x) = x^exponent`
is applied to the data tensor elementwise.
This operator supports **multidirectional (i.e., Numpy-style) broadcasting**; for more details please check [the doc](Broadcasting.md).

#### Version

This version of the operator has been available since version 12 of the default ONNX operator set.

#### Inputs

<dl>
<dt><tt>X</tt> : T</dt>
<dd>First operand, base of the exponent.</dd>
<dt><tt>Y</tt> : T1</dt>
<dd>Second operand, power of the exponent.</dd>
</dl>

#### Outputs

<dl>
<dt><tt>Z</tt> : T</dt>
<dd>Output tensor (same size as X)</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T</tt> : tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double)</dt>
<dd>Constrain input X and output types to float/int tensors.</dd>
<dt><tt>T1</tt> : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double)</dt>
<dd>Constrain input Y types to float/int tensors.</dd>
</dl>
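
For illustration only (plain numpy semantics, not the ONNX runtime): the relaxed constraints above allow an integer exponent tensor alongside a floating-point base, with the usual multidirectional broadcasting.

```python
import numpy as np

# Base X has type T (float32); exponent Y has type T1 (int64); the output Z
# takes the type of X. Shapes (1, 3) and (2, 1) broadcast to (2, 3).
x = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)  # base, type T
y = np.array([[2], [3]], dtype=np.int64)           # exponent, type T1
z = np.power(x, y).astype(np.float32)              # [[1, 4, 9], [1, 8, 27]]
```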

### <a name="ReduceMax-12"></a>**ReduceMax-12**</a>

Computes the max of the input tensor's element along the provided axes. The resulted
@@ -14956,6 +14994,8 @@ This version of the operator has been available since version 12 of the default
#### Attributes

<dl>
<dt><tt>ignore_index</tt> : int</dt>
<dd>Specifies a target value that is ignored and does not contribute to the input gradient. It is an optional value and valid values are [0, C).</dd>
<dt><tt>reduction</tt> : string (default is mean)</dt>
<dd>Type of reduction to apply to loss: none, sum, mean (default). 'none': no reduction will be applied, 'sum': the output will be summed. 'mean': the sum of the output will be divided by the number of elements in the output.</dd>
</dl>
@@ -14965,7 +15005,7 @@ This version of the operator has been available since version 12 of the default
<dl>
<dt><tt>scores</tt> : T</dt>
<dd>The predicted outputs with shape [batch_size, class_size], or [batch_size, class_size, D1, D2 , ..., Dk], where K is the number of dimensions.</dd>
<dt><tt>labels</tt> : T</dt>
<dt><tt>labels</tt> : Tind</dt>
<dd>The ground truth output tensor, with shape [batch_size], or [batch_size, D1, D2, ..., Dk], where K is the number of dimensions.</dd>
<dt><tt>weights</tt> (optional) : T</dt>
<dd>A manual rescaling weight given to each class. If given, it has to be a 1D Tensor assigning weight to each of the classes. Otherwise, it is treated as if having all ones.</dd>
@@ -14985,6 +15025,8 @@ This version of the operator has been available since version 12 of the default
<dl>
<dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt>
<dd>Constrain input and output types to float tensors.</dd>
<dt><tt>Tind</tt> : tensor(int32), tensor(int64)</dt>
<dd>Constrain target to integer types</dd>
</dl>
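
Putting the two changes in this section together (integer `Tind` labels and the `ignore_index` attribute), a rough numpy sketch of a SoftmaxCrossEntropyLoss-style computation might look as follows; the helper name and exact conventions are assumptions for this illustration, with ignored entries excluded from the weight sum as in the test fix mentioned in the commit message:

```python
import numpy as np

# scores: [N, C] float tensor; labels: [N] int64 tensor (type Tind).
# Entries whose label equals ignore_index are excluded from the loss and
# from the sum of applied weights used for the 'mean' reduction.
def softmax_cross_entropy(scores, labels, weight=None,
                          ignore_index=None, reduction="mean"):
    N, C = scores.shape
    weight = np.ones(C, dtype=scores.dtype) if weight is None else weight
    log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    keep = np.ones(N, dtype=bool)
    if ignore_index is not None:
        keep = labels != ignore_index
    safe_labels = np.where(keep, labels, 0)     # avoid out-of-range indexing
    w = np.where(keep, weight[safe_labels], 0.0)
    loss = -w * log_prob[np.arange(N), safe_labels]
    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    return loss.sum() / w.sum()  # 'mean' over applied (non-ignored) weights
```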

### <a name="UnfoldToDepth-12"></a>**UnfoldToDepth-12**</a>
@@ -15424,3 +15466,112 @@ This version of the operator has been available since version 1 of the 'ai.onnx.
<dd>Allow inputs and outputs to be any kind of tensor.</dd>
</dl>

### <a name="ai.onnx.training.Momentum-1"></a>**ai.onnx.training.Momentum-1**</a>

Compute one iteration of stochastic gradient update with momentum.
This operator can conduct the optimization of multiple tensor variables.

Let's define the behavior of this operator. As you can imagine, SG with momentum requires
several parameters:

- The learning-rate "R".
- The update count "T". That is, the number of conducted training iterations. It should
be zero in the first training iteration.
- An L2-norm regularization coefficient "norm_coefficient".
- A decay coefficient of previous accumulated gradient (i.e., momentum) "alpha".
- The scaling coefficient of current gradient "beta".
- An attribute "mode" to choose either standard momentum or Nesterov's momentum.

For the sake of simplicity, assume that there is only one tensor (called "X") to be optimized.
Other necessary inputs are "X"'s gradient (called "G") and "X"'s momentum (called "V"). This
Momentum operator maps all these inputs to the new value of "X" (called "X_new") and its new
momentum (called "V_new").

This operator supports two different momentum algorithms. Set the attribute "mode" to
"nesterov" if Nesterov's momentum is desired. Otherwise, set the attribute "model" to
"standard" to use standard momentum. Computation details are described subsequently.

Let "+", "-", "*", and "/" are all element-wise operations with numpy-style broadcasting.

Pseudo code for SG with standard momentum:

// Add gradient of 0.5 * norm_coefficient * ||X||^2, where ||X||^2 is the sum of squared
// values of all elements in X.
G_regularized = norm_coefficient * X + G

// In the first training iteration, beta should always be 1.
beta_adjusted = T > 0 ? beta : 1

// Compute the current momentum based on previous momentum and the current gradient.
V_new = alpha * V + beta_adjusted * G_regularized

// Update X.
X_new = X - R * V_new

Pseudo code for SG with Nesterov's momentum:

// Add gradient of 0.5 * norm_coefficient * ||X||^2, where ||X||^2 is the sum of squared
// values of all elements in X.
G_regularized = norm_coefficient * X + G;

// In the first training iteration, beta should always be 1.
beta_adjusted = T > 0 ? beta : 1

// Compute the current momentum based on previous momentum and the current gradient.
V_new = alpha * V + beta_adjusted * G_regularized;

// Compute final update direction and then update X.
X_new = X - R * (G_regularized + alpha * V_new)

If one assigns this operator to optimize multiple inputs, for example "X_1" and "X_2", the same
pseudo code would be extended to handle all tensors jointly. More specifically, we can view "X" as a
concatenation of "X_1" and "X_2" (of course, their gradient and momentum tensors should
be concatenated too) and then our pseudo code becomes applicable.
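
The pseudo code above maps directly onto a few lines of numpy; the sketch below is illustrative only (single tensor, scalar hyperparameters), and the function name is invented for this example:

```python
import numpy as np

# One Momentum-1 update for a single tensor X, following the pseudo code
# above. mode is "standard" or "nesterov"; T is the update count.
def apply_momentum(R, T, X, G, V, norm_coefficient, alpha, beta,
                   mode="standard"):
    G_regularized = norm_coefficient * X + G           # L2 regularization term
    beta_adjusted = beta if T > 0 else 1.0              # first iteration uses 1
    V_new = alpha * V + beta_adjusted * G_regularized   # new momentum
    if mode == "nesterov":
        X_new = X - R * (G_regularized + alpha * V_new)
    else:  # standard momentum
        X_new = X - R * V_new
    return X_new, V_new
```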

#### Version

This version of the operator has been available since version 1 of the 'ai.onnx.training' operator set.

#### Attributes

<dl>
<dt><tt>alpha</tt> : float (required)</dt>
<dd>The decay factor of momentum. It should be a scalar.</dd>
<dt><tt>beta</tt> : float (required)</dt>
<dd>The coefficient of gradient in computing new momentum. It should be a scalar.</dd>
<dt><tt>mode</tt> : string (required)</dt>
<dd>Its value should be either "nesterov" or "standard". The value "nesterov" leads to the use of Nesterov's momentum, while "standard" invokes the stochastic gradient method using standard momentum.</dd>
<dt><tt>norm_coefficient</tt> : float (required)</dt>
<dd>Coefficient of 0.5 * norm_coefficient * ||X||^2.</dd>
</dl>

#### Inputs (3 - &#8734;)

<dl>
<dt><tt>R</tt> : T1</dt>
<dd>The learning rate.</dd>
<dt><tt>T</tt> : T2</dt>
<dd>Update count of "X". It should be a scalar.</dd>
<dt><tt>inputs</tt> (variadic, heterogeneous) : T3</dt>
<dd>It sequentially contains the current values of optimized tensors, then their gradient tensors, and finally their momentum tensors. For example, if two tensors "X_1" and "X_2" are optimized, the expected input list would be ["X_1", "X_2", gradient of "X_1", gradient of "X_2", momentum of "X_1", momentum of "X_2"].</dd>
</dl>

#### Outputs (1 - &#8734;)

<dl>
<dt><tt>outputs</tt> (variadic, heterogeneous) : T3</dt>
<dd>It sequentially contains the new values of optimized tensors and then the new values of their momentum tensors. For example, if two tensors "X_1" and "X_2" are optimized, the output list would be [new value of "X_1", new value of "X_2", new momentum of "X_1", new momentum of "X_2"].</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T1</tt> : tensor(float), tensor(double)</dt>
<dd>Constrain input types to float scalars.</dd>
<dt><tt>T2</tt> : tensor(int64)</dt>
<dd>Constrain input types to 64-bit integer scalars.</dd>
<dt><tt>T3</tt> : tensor(float), tensor(double)</dt>
<dd>Constrain input types to float tensors.</dd>
</dl>
