Rel 1.7.103 verify (#2687)
* Fix Greater/LessOrEqual function definition (#2645)

* Fix Greater/LessOrEqual function definition

* Update test data

Co-authored-by: Ke Zhang <kezhan@microsoft.com>

* Suppress a warning in unsqueeze (#2637)

I keep getting this warning when building PyTorch:

```
In file included from
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/tensor/utils.h:6,
                 from
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/tensor/defs.cc:4:
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/tensor/defs.cc: In
lambda function:
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/tensor/defs.cc:1414:22:
warning: unnecessary parentheses in declaration of ‘i’
[-Wparentheses]
           for (size_t(i) = 0; i < axes.size(); ++i) {
                      ^
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/schema.h:959:12:
note: in definition of macro ‘ONNX_OPERATOR_SET_SCHEMA_EX’
     return impl.SetName(#name)
\
            ^~~~
/home/hong/wsrc/pytorch/third_party/onnx/onnx/defs/tensor/defs.cc:1369:1:
note: in expansion of macro ‘ONNX_OPERATOR_SET_SCHEMA’
 ONNX_OPERATOR_SET_SCHEMA(
```

This commit should fix it and modernize the code a bit.

Co-authored-by: Ke Zhang <kezhan@microsoft.com>

* [Training] Add Adagrad optimizer operator (#1955)

* Adagrad draft

* MIMO

* Support multiple tensors to be optimized

* Address comments

* Move optimizers to a new place

Remove copied

Add momentum

Save

Remove momentum

Fix

Move constants to attributes

* Fix build

* Add shape test

Add two node tests

Update test coverage

* Fix shape inf

* Fix shape inf

* fix shape inf

* Format

* Add function type

* Merge lines

* Format

* Fix version number

* Update op version in model files

* Fix a test function and update related test files

* Update onnx/backend/test/case/node/adagrad.py

* Remove unused file

* sync docs

* Fix shape test

* sync doc

* sync with master

* Update onnx/defs/training/defs.cc

Co-Authored-By: Michał Karzyński <postrational@users.noreply.github.com>

* sync doc

* address comments

* address a minor comment

* Polish one line

Co-authored-by: Michał Karzyński <postrational@users.noreply.github.com>

* [Training] SG with Momentum Optimizer (#1959)

* SG with Momentum

* Register Op

Fix

Update other docs

* Add shape inference code and polish definition

* Update docs

* Add test cases and fix several bugs

* Remove accidentally added copy

* Alpha -> alpha & Beta -> beta

* Clarify an attribute

* Fix an attribute

* Fix bug

* Fix missing attributes

* sync doc

* Remove unused domain

* sync with master

Co-authored-by: Chin Huang <chhuang@us.ibm.com>

* Change type of label tensor to int32/int64 in SoftmaxCrossEntropyLoss spec. (#2667)

* Update Pow input types in Opset 12 (#2666)

* Update Pow input types in Opset 12

* gen doc and tests

* remove uints and 8 bit ints

* add tests

* remove uint int x tests

* Adding CI for ONNX Debug mode (Linux, OSX) (#2651)

* adding an OSX build and a Linux build, with and without ONNX_ML, for debug mode

* test debug mode with ONNX_ML=1

* Rename OPTIONAL to OPTIONAL_VALUE (#2682)

Co-authored-by: G. Ramalingam <grama@microsoft.com>

* Update Batchnorm test (#2674)

* Update Batchnorm test

* relax shape inference on scalar

* Remove unnecessary copies and std::move (#2684)

* Update sequence test case so input is not scalar and splits are specified (#2675)

* Update sequence test case so input is not scalar and splits are specified

* Add spaces to make the checker happy

* Use cmake GNUInstallDirs (#2661)

https://cmake.org/cmake/help/latest/module/GNUInstallDirs.html
This allows installing the libraries (and headers) in a location other than `lib` (Gentoo uses lib64 for 64-bit libs).
It also changes the .cmake files to avoid conflicts when building both 32-bit and 64-bit variants (avoids conflicting/overwritten files).

Co-authored-by: Ke Zhang <kezhan@microsoft.com>

* Add 'ignore_index' input in the spec for SoftmaxCrossEntropyLoss and NLLLoss. (#2680)

* Add 'ignore_index' input in the spec for SoftmaxCrossEntropyLoss and NLLLoss.

* Add tests.

* build break.

* build break.

* clean up.

* build break.

* Change ignore_index to attribute.

* Change ignore_index to attribute.

* PR feedback.

* PR feedback.

* Make ignore_index optional in NLLLoss.

* Build break.

* remove trailing spaces to fix build break.

* Build break.

* Update spec doc.

* Fix NLLLoss function definition to fix test: test_negative_log_likelihood_loss_input_shape_is_NCd1d2_with_weight_reduction_sum_ignore_index_expanded

* PR feedback.

* Fix test for softmax cross entropy loss to exclude ignore_index'ed weights from the sum of weights.

* Build break.

* Reduce binary size of libraries consuming ONNX (part 1/2) (#2643)

* Change the return type for the zipmap operator to match the description in the spec.

* Reduce binary size of libraries consuming ONNX (part 1/2)

* Fix build error

* Replace separate Get*Doc() functions with an easy macro for greater convenience

* Add one more macro for complicated operator documentation.

Co-authored-by: Ke Zhang <kezhan@microsoft.com>

* Update pybind (#2340) (#2688)

* Change version number for release verification

Co-authored-by: Takeshi Watanabe <take-cheeze@users.noreply.github.com>
Co-authored-by: Ke Zhang <kezhan@microsoft.com>
Co-authored-by: Hong Xu <hong@topbug.net>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: Michał Karzyński <postrational@users.noreply.github.com>
Co-authored-by: M. Zeeshan Siddiqui <mzs@microsoft.com>
Co-authored-by: Lara Haidar <haidar.lara@gmail.com>
Co-authored-by: Vinitra Swamy <vinitras@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: G. Ramalingam <grama@microsoft.com>
Co-authored-by: Changming Sun <me@sunchangming.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Gustavo Alvarez <462213+sl1pkn07@users.noreply.github.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
15 people committed Mar 31, 2020
1 parent 18d03bb commit 3193b02
Showing 189 changed files with 2,662 additions and 807 deletions.
13 changes: 13 additions & 0 deletions .travis.yml
@@ -12,6 +12,16 @@ matrix:
env: PYTHON_VERSION=python3 ONNX_ML=0
language: python
python: "3.6"
- os: linux
sudo: required
env: PYTHON_VERSION=python3 ONNX_ML=0 ONNX_DEBUG=1
language: python
python: "3.6"
- os: linux
sudo: required
env: PYTHON_VERSION=python3 ONNX_ML=1 ONNX_DEBUG=1
language: python
python: "3.6"
- os: osx
osx_image: xcode9.3
env: PYTHON_VERSION=python2 ONNX_ML=0
@@ -34,6 +44,9 @@ matrix:
- os: osx
osx_image: xcode9.3
env: PYTHON_VERSION=python3
- os: osx
osx_image: xcode9.3
env: PYTHON_VERSION=python3 ONNX_DEBUG=1
- os: linux
sudo: required
env: PYTHON_VERSION=python2 LITE=1
4 changes: 4 additions & 0 deletions .travis/install.sh
@@ -13,5 +13,9 @@ fi
export CMAKE_ARGS="${CMAKE_ARGS} -DONNXIFI_DUMMY_BACKEND=ON"
export ONNX_NAMESPACE=ONNX_NAMESPACE_FOO_BAR_FOR_CI

if [ "${ONNX_DEBUG}" == "1" ]; then
export DEBUG=1
fi

time python setup.py --quiet bdist_wheel --universal --dist-dir .
find . -maxdepth 1 -name "*.whl" -ls -exec pip install {} \;
12 changes: 7 additions & 5 deletions CMakeLists.txt
@@ -640,12 +640,14 @@ if(MSVC)
add_msvc_runtime_flag(onnxifi_dummy)
endif()

include(GNUInstallDirs)

install(DIRECTORY ${ONNX_ROOT}/onnx
DESTINATION include
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
FILES_MATCHING
PATTERN "*.h")
install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/onnx
DESTINATION include
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
FILES_MATCHING
PATTERN "*.h")

@@ -660,13 +662,13 @@ configure_file(
install(FILES
${PROJECT_BINARY_DIR}/ONNXConfigVersion.cmake
${PROJECT_BINARY_DIR}/ONNXConfig.cmake
DESTINATION share/cmake/ONNX
DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/ONNX
COMPONENT dev)
install(EXPORT ONNXTargets DESTINATION share/cmake/ONNX)
install(EXPORT ONNXTargets DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/ONNX")
install(TARGETS
onnx onnx_proto
onnxifi onnxifi_dummy onnxifi_loader
EXPORT ONNXTargets DESTINATION lib)
EXPORT ONNXTargets DESTINATION ${CMAKE_INSTALL_LIBDIR})

if(NOT ANDROID AND NOT IOS)
install(TARGETS onnxifi_wrapper
2 changes: 1 addition & 1 deletion VERSION_NUMBER
@@ -1 +1 @@
1.7.102
1.7.103
153 changes: 152 additions & 1 deletion docs/Changelog.md
@@ -14801,6 +14801,8 @@ This version of the operator has been available since version 12 of the default
#### Attributes

<dl>
<dt><tt>ignore_index</tt> : int</dt>
<dd>Specifies a target value that is ignored and does not contribute to the input gradient. It is an optional value and valid values are [0, C).</dd>
<dt><tt>reduction</tt> : string (default is mean)</dt>
<dd>Type of reduction to apply to loss: none, sum, mean (default). 'none': the output is the loss for each sample. 'sum': the output will be summed. 'mean': the sum of the output will be divided by the sum of applied weights.</dd>
</dl>
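
To make the interaction of the new `ignore_index` attribute with the weighted mean reduction concrete, here is a minimal numpy sketch (an illustration only, not the normative definition; the function name and signature are invented for this example):

```python
import numpy as np

# Sketch of an NLLLoss-style computation with scores of shape [N, C] and
# integer targets of shape [N]. Targets equal to ignore_index contribute
# neither to the loss nor to the sum of applied weights used by 'mean'.
def nll_loss(scores, target, weight=None, ignore_index=None, reduction="mean"):
    N, C = scores.shape
    weight = np.ones(C, dtype=scores.dtype) if weight is None else weight
    loss = np.zeros(N, dtype=scores.dtype)
    applied = np.zeros(N, dtype=scores.dtype)
    for n in range(N):
        if ignore_index is not None and target[n] == ignore_index:
            continue  # ignored sample: zero loss, zero applied weight
        applied[n] = weight[target[n]]
        loss[n] = -scores[n, target[n]] * applied[n]
    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    return loss.sum() / applied.sum()  # 'mean': divide by applied weights
```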
@@ -14832,6 +14834,42 @@ This version of the operator has been available since version 12 of the default
<dd>Constrain target to integer types</dd>
</dl>

### <a name="Pow-12"></a>**Pow-12**</a>

Pow takes input data (Tensor<T>) and an exponent Tensor, and
produces one output data (Tensor<T>) where the function `f(x) = x^exponent`
is applied to the data tensor elementwise.
This operator supports **multidirectional (i.e., Numpy-style) broadcasting**; for more details please check [the doc](Broadcasting.md).

#### Version

This version of the operator has been available since version 12 of the default ONNX operator set.

#### Inputs

<dl>
<dt><tt>X</tt> : T</dt>
<dd>First operand, base of the exponent.</dd>
<dt><tt>Y</tt> : T1</dt>
<dd>Second operand, power of the exponent.</dd>
</dl>

#### Outputs

<dl>
<dt><tt>Z</tt> : T</dt>
<dd>Output tensor (same size as X)</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T</tt> : tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double)</dt>
<dd>Constrain input X and output types to float/int tensors.</dd>
<dt><tt>T1</tt> : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double)</dt>
<dd>Constrain input Y types to float/int tensors.</dd>
</dl>
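
For illustration only (plain numpy semantics, not the ONNX runtime): the relaxed constraints above allow an integer exponent tensor alongside a floating-point base, with the usual multidirectional broadcasting.

```python
import numpy as np

# Base X has type T (float32); exponent Y has type T1 (int64); the output Z
# takes the type of X. Shapes (1, 3) and (2, 1) broadcast to (2, 3).
x = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)  # base, type T
y = np.array([[2], [3]], dtype=np.int64)           # exponent, type T1
z = np.power(x, y).astype(np.float32)              # [[1, 4, 9], [1, 8, 27]]
```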

### <a name="ReduceMax-12"></a>**ReduceMax-12**</a>

Computes the max of the input tensor's element along the provided axes. The resulted
@@ -14956,6 +14994,8 @@ This version of the operator has been available since version 12 of the default
#### Attributes

<dl>
<dt><tt>ignore_index</tt> : int</dt>
<dd>Specifies a target value that is ignored and does not contribute to the input gradient. It is an optional value and valid values are [0, C).</dd>
<dt><tt>reduction</tt> : string (default is mean)</dt>
<dd>Type of reduction to apply to loss: none, sum, mean (default). 'none': no reduction will be applied, 'sum': the output will be summed. 'mean': the sum of the output will be divided by the number of elements in the output.</dd>
</dl>
@@ -14965,7 +15005,7 @@ This version of the operator has been available since version 12 of the default
<dl>
<dt><tt>scores</tt> : T</dt>
<dd>The predicted outputs with shape [batch_size, class_size], or [batch_size, class_size, D1, D2 , ..., Dk], where K is the number of dimensions.</dd>
<dt><tt>labels</tt> : T</dt>
<dt><tt>labels</tt> : Tind</dt>
<dd>The ground truth output tensor, with shape [batch_size], or [batch_size, D1, D2, ..., Dk], where K is the number of dimensions.</dd>
<dt><tt>weights</tt> (optional) : T</dt>
<dd>A manual rescaling weight given to each class. If given, it has to be a 1D Tensor assigning weight to each of the classes. Otherwise, it is treated as if having all ones.</dd>
@@ -14985,6 +15025,8 @@ This version of the operator has been available since version 12 of the default
<dl>
<dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt>
<dd>Constrain input and output types to float tensors.</dd>
<dt><tt>Tind</tt> : tensor(int32), tensor(int64)</dt>
<dd>Constrain target to integer types</dd>
</dl>
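
Putting the two changes in this section together (integer `Tind` labels and the `ignore_index` attribute), a rough numpy sketch of a SoftmaxCrossEntropyLoss-style computation might look as follows; the helper name and exact conventions are assumptions for this illustration, with ignored entries excluded from the weight sum as in the test fix mentioned in the commit message:

```python
import numpy as np

# scores: [N, C] float tensor; labels: [N] int64 tensor (type Tind).
# Entries whose label equals ignore_index are excluded from the loss and
# from the sum of applied weights used for the 'mean' reduction.
def softmax_cross_entropy(scores, labels, weight=None,
                          ignore_index=None, reduction="mean"):
    N, C = scores.shape
    weight = np.ones(C, dtype=scores.dtype) if weight is None else weight
    log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    keep = np.ones(N, dtype=bool)
    if ignore_index is not None:
        keep = labels != ignore_index
    safe_labels = np.where(keep, labels, 0)     # avoid out-of-range indexing
    w = np.where(keep, weight[safe_labels], 0.0)
    loss = -w * log_prob[np.arange(N), safe_labels]
    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    return loss.sum() / w.sum()  # 'mean' over applied (non-ignored) weights
```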

### <a name="UnfoldToDepth-12"></a>**UnfoldToDepth-12**</a>
@@ -15424,3 +15466,112 @@ This version of the operator has been available since version 1 of the 'ai.onnx.
<dd>Allow inputs and outputs to be any kind of tensor.</dd>
</dl>

### <a name="ai.onnx.training.Momentum-1"></a>**ai.onnx.training.Momentum-1**</a>

Compute one iteration of stochastic gradient update with momentum.
This operator can conduct the optimization of multiple tensor variables.

Let's define the behavior of this operator. As you can imagine, SG with momentum requires
several parameters:

- The learning-rate "R".
- The update count "T". That is, the number of conducted training iterations. It should
be zero in the first training iteration.
- An L2-norm regularization coefficient "norm_coefficient".
- A decay coefficient of previous accumulated gradient (i.e., momentum) "alpha".
- The scaling coefficient of current gradient "beta".
- An attribute "mode" to choose either standard momentum or Nesterov's momentum.

For the sake of simplicity, assume that there is only one tensor (called "X") to be optimized.
Other necessary inputs are "X"'s gradient (called "G") and "X"'s momentum (called "V"). This
Momentum operator maps all these inputs to the new value of "X" (called "X_new") and its new
momentum (called "V_new").

This operator supports two different momentum algorithms. Set the attribute "mode" to
"nesterov" if Nesterov's momentum is desired. Otherwise, set the attribute "model" to
"standard" to use standard momentum. Computation details are described subsequently.

Let "+", "-", "*", and "/" are all element-wise operations with numpy-style broadcasting.

Pseudo code for SG with standard momentum:

// Add gradient of 0.5 * norm_coefficient * ||X||^2, where ||X||^2 is the sum of squared
// values of all elements in X.
G_regularized = norm_coefficient * X + G

// In the first training iteration, beta should always be 1.
beta_adjusted = T > 0 ? beta : 1

// Compute the current momentum based on previous momentum and the current gradient.
V_new = alpha * V + beta_adjusted * G_regularized

// Update X.
X_new = X - R * V_new

Pseudo code for SG with Nesterov's momentum:

// Add gradient of 0.5 * norm_coefficient * ||X||^2, where ||X||^2 is the sum of squared
// values of all elements in X.
G_regularized = norm_coefficient * X + G;

// In the first training iteration, beta should always be 1.
beta_adjusted = T > 0 ? beta : 1

// Compute the current momentum based on previous momentum and the current gradient.
V_new = alpha * V + beta_adjusted * G_regularized;

// Compute final update direction and then update X.
X_new = X - R * (G_regularized + alpha * V_new)

If one assigns this operator to optimize multiple inputs, for example "X_1" and "X_2", the same
pseudo code would be extended to handle all tensors jointly. More specifically, we can view "X" as a
concatenation of "X_1" and "X_2" (of course, their gradient and momentum tensors should
be concatenated too) and then our pseudo code becomes applicable.
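
The pseudo code above maps directly onto a few lines of numpy; the sketch below is illustrative only (single tensor, scalar hyperparameters), and the function name is invented for this example:

```python
import numpy as np

# One Momentum-1 update for a single tensor X, following the pseudo code
# above. mode is "standard" or "nesterov"; T is the update count.
def apply_momentum(R, T, X, G, V, norm_coefficient, alpha, beta,
                   mode="standard"):
    G_regularized = norm_coefficient * X + G           # L2 regularization term
    beta_adjusted = beta if T > 0 else 1.0              # first iteration uses 1
    V_new = alpha * V + beta_adjusted * G_regularized   # new momentum
    if mode == "nesterov":
        X_new = X - R * (G_regularized + alpha * V_new)
    else:  # standard momentum
        X_new = X - R * V_new
    return X_new, V_new
```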

#### Version

This version of the operator has been available since version 1 of the 'ai.onnx.training' operator set.

#### Attributes

<dl>
<dt><tt>alpha</tt> : float (required)</dt>
<dd>The decay factor of momentum. It should be a scalar.</dd>
<dt><tt>beta</tt> : float (required)</dt>
<dd>The coefficient of gradient in computing new momentum. It should be a scalar.</dd>
<dt><tt>mode</tt> : string (required)</dt>
<dd>Its value should be either "nesterov" or "standard". The value "nesterov" leads to the use of Nesterov's momentum, while "standard" invokes the stochastic gradient method using standard momentum.</dd>
<dt><tt>norm_coefficient</tt> : float (required)</dt>
<dd>Coefficient of 0.5 * norm_coefficient * ||X||^2.</dd>
</dl>

#### Inputs (3 - &#8734;)

<dl>
<dt><tt>R</tt> : T1</dt>
<dd>The learning rate.</dd>
<dt><tt>T</tt> : T2</dt>
<dd>Update count of "X". It should be a scalar.</dd>
<dt><tt>inputs</tt> (variadic, heterogeneous) : T3</dt>
<dd>It sequentially contains the current values of optimized tensors, then their gradient tensors, and finally their momentum tensors. For example, if two tensors "X_1" and "X_2" are optimized, the expected input list would be ["X_1", "X_2", gradient of "X_1", gradient of "X_2", momentum of "X_1", momentum of "X_2"].</dd>
</dl>

#### Outputs (1 - &#8734;)

<dl>
<dt><tt>outputs</tt> (variadic, heterogeneous) : T3</dt>
<dd>It sequentially contains the new values of optimized tensors and then the new values of their momentum tensors. For example, if two tensors "X_1" and "X_2" are optimized, the output list would be [new value of "X_1", new value of "X_2", new momentum of "X_1", new momentum of "X_2"].</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>T1</tt> : tensor(float), tensor(double)</dt>
<dd>Constrain input types to float scalars.</dd>
<dt><tt>T2</tt> : tensor(int64)</dt>
<dd>Constrain input types to 64-bit integer scalars.</dd>
<dt><tt>T3</tt> : tensor(float), tensor(double)</dt>
<dd>Constrain input types to float tensors.</dd>
</dl>
