Merge from master into ort_training by codemzs · Pull Request #3486 · microsoft/onnxruntime

codemzs · 2020-04-10T19:59:53Z

End-to-end test passed:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=122628&view=results

Performance test results are as expected:
FP16, seq=128, bs=66, throughput=185.885
FP16, seq=512, bs=10, throughput=35.7223
FP32, seq=128, bs=33, throughput=42.1087
FP32, seq=512, bs=5, throughput=9.19644

Convergence test:


Advance ONNX commit to pick up the latest ArgMax, ArgMin, ReduceMax/ReduceMin, and MaxPool. Declare new versions for CPU/CUDA. Implement infrastructure support for int8/uint8. Adjust GatherOp test for a new error. Adjust Scan9.BadShape test. Add exclusions for index-out-of-bounds checks. Rework result verification for SVDTransformer.


Add opt_level option for graph optimization level in bert perf test. Support BERT models that output each layer, where SkipLayerNormalization has more than 4 children. Check that weight and bias are 1D for layer norm fusion. Add a dummy class Gpt2OnnxModel for further changes to the GPT2 model.


* Add CPU implementation for FastGelu operator
* Update optimization script to fuse Gelu or FastGelu according to whether Erf or Tanh is used in the graph.
* Merge BiasGelu and FastGelu into one class
* Enable FastGelu fusion optimizer for CPU Execution Provider.
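Fusing Gelu versus FastGelu depends on which of the two usual GELU formulas the graph contains: exact GELU uses erf, while FastGelu uses the tanh approximation. A minimal sketch, assuming these standard textbook definitions; `gelu` and `fast_gelu` are hypothetical helper names, not ONNX Runtime functions.

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2))), the Erf-based graph pattern.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def fast_gelu(x: float) -> float:
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"{v:+.1f}  gelu={gelu(v):+.6f}  fast_gelu={fast_gelu(v):+.6f}")
```

The two curves agree closely, which is why a fusion pass can treat them as variants of one activation while still choosing the kernel that matches the pattern actually present.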

The commit 06fc950, which refactored the CPU Pool class, broke the ACL EP build. Also worked on commit a4fe60c, as it also affects the new class. Move the declaration of the new MaxPoolV8 CPU class into the header file. Implement MaxPool 8-11 in ACL EP. Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>


tianleiwu and others added 30 commits March 31, 2020 13:43

Add Benchmark of GPT2 CPU inference (#3351)

ecbacd7

* Add benchmark script and notebook for GPT2
* Update Reshape fusion for GPT2 model
* Add opt_level option for bert_model_optimization to disable onnxruntime by setting --opt_level 0
* Fix keras optimization

Updated tags for v1.2.0 release (#3386)

044c466

Updated the tags in the table to reflect the new images for Release v1.2

Fix a bug in FunctionImpl::FunctionImpl (#3376)

55fd283

1. Fix a bug in FunctionImpl::FunctionImpl. It set the wrong name for the new attribute.
2. Set error code to NOT_IMPLEMENTED if a function contains an op that is not implemented.

Fix ARM cross compilation (related to #3378, #3298) (#3385)

a61400d

Rework SVMClassifier to improve performance (#3363)

33d3239

* Rework SVMClassifier
  - use GEMM for initial scoring
  - minimize data allocations and copies
  - parallelize the second half of the scoring for larger batches
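Using GEMM for the initial scoring means computing every input-to-support-vector dot product in a batch as one matrix product, then applying the kernel elementwise. A minimal pure-Python sketch, assuming an RBF kernel; `matmul_transposed` and `rbf_kernel_matrix` are hypothetical helpers, not the SVMClassifier code, which would call an optimized GEMM instead of nested loops.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul_transposed(a: Matrix, b: Matrix) -> Matrix:
    # A @ B^T: the GEMM step, one dot product per (input, support vector) pair.
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]

def rbf_kernel_matrix(inputs: Matrix, support_vectors: Matrix, gamma: float) -> Matrix:
    # ||x - s||^2 = ||x||^2 + ||s||^2 - 2 x.s, so the cross term comes from one GEMM.
    dots = matmul_transposed(inputs, support_vectors)
    x_norms = [sum(v * v for v in x) for x in inputs]
    s_norms = [sum(v * v for v in s) for s in support_vectors]
    return [
        [math.exp(-gamma * (x_norms[i] + s_norms[j] - 2.0 * dots[i][j]))
         for j in range(len(support_vectors))]
        for i in range(len(inputs))
    ]

X = [[0.0, 1.0], [1.0, 1.0]]
SV = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
K = rbf_kernel_matrix(X, SV, gamma=0.5)
print(K)
```

Batching the dot products this way replaces many small per-row loops with one large matrix multiply, which is the part that benefits most from an optimized BLAS routine.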

fix some warnings in concurrency tests (#3395)

052c1fd

ERROR_NOT_SUPPORTED doesn't trigger a FAILED HRESULT. Need E_NOTIMPL (#3396)

77c7d09
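The distinction matters because the Windows `FAILED()` macro only tests whether the HRESULT, read as a signed 32-bit value, is negative. A minimal Python sketch that mirrors the `FAILED` and `HRESULT_FROM_WIN32` macros, using the documented values `ERROR_NOT_SUPPORTED = 50` and `E_NOTIMPL = 0x80004001`; the Python function names are illustrative only.

```python
ERROR_NOT_SUPPORTED = 50          # Win32 error code, not an HRESULT
E_NOTIMPL = 0x80004001            # HRESULT with the severity (sign) bit set
FACILITY_WIN32 = 7

def to_signed32(value: int) -> int:
    # Reinterpret an unsigned 32-bit pattern as a signed HRESULT (LONG).
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x80000000 else value

def failed(hr: int) -> bool:
    # Mirrors FAILED(hr): true only when the signed HRESULT is negative.
    return to_signed32(hr) < 0

def hresult_from_win32(code: int) -> int:
    # Mirrors HRESULT_FROM_WIN32: positive codes get the severity bit and FACILITY_WIN32.
    if to_signed32(code) <= 0:
        return code & 0xFFFFFFFF
    return (code & 0x0000FFFF) | (FACILITY_WIN32 << 16) | 0x80000000

print(failed(ERROR_NOT_SUPPORTED))                   # False: positive value
print(failed(E_NOTIMPL))                             # True
print(hex(hresult_from_win32(ERROR_NOT_SUPPORTED)))  # 0x80070032
```

Returning a raw Win32 code such as 50 therefore looks like success to callers that check `FAILED()`; a genuine failure HRESULT such as `E_NOTIMPL` (or the wrapped `0x80070032`) does not.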

Build options for enabling AVX/AVX2/AVX512 (#3373)

accffde

1. Add build options for enabling AVX/AVX2/AVX512.
2. Update Eigen to a newer version, because the current one doesn't work with VC and AVX512.

Fix python examples in documentation (#3379)

edec804

Add Ninja generator to build.py (#3331)

1c334ed

[WIP] Port image tests from WAI (#3365)

1671072

* Copy image tests from ADO
* wip
* Port tests to googletest
* Add FNS-Candy license
* Add missing collaterals
* Remove brand images
* Fix typos
* Use PrepareModelSessionBinding in MnistImageTest
* Fix typos

Allow zero in split op (#3389)

aefa466

Allow zero in split op (A change in onnx 1.7 without bumping up the op version)
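Allowing zero means an entry in the split sizes may produce an empty output, as long as all sizes still sum to the length of the split axis. A minimal 1-D sketch of that rule; `split_1d` is a hypothetical helper, not the ONNX Runtime Split kernel.

```python
from typing import List, Sequence

def split_1d(data: Sequence[int], sizes: Sequence[int]) -> List[List[int]]:
    # Sizes must be non-negative (zero is allowed) and cover the axis exactly.
    if any(s < 0 for s in sizes):
        raise ValueError("split sizes must be non-negative")
    if sum(sizes) != len(data):
        raise ValueError("split sizes must sum to the axis length")
    outputs, start = [], 0
    for size in sizes:
        outputs.append(list(data[start:start + size]))  # size 0 yields an empty output
        start += size
    return outputs

print(split_1d([1, 2, 3, 4], [2, 0, 2]))  # [[1, 2], [], [3, 4]]
```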

Disable model tests for Mac OS X builds

a5fea26

Allow a custom op with the same name to be registered for several providers. (#3400)

3568f8d

Enable upsample2x optimization for opset 11 Resize (#3388)

85131e7

* Enable use_nearest2x_optimization for opset 11 of Resize when possible
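With nearest-neighbor Resize and an exact 2x scale on the spatial axes, each output pixel at (y, x) copies input pixel (y // 2, x // 2), so the operation reduces to replicating every input pixel into a 2x2 block. A minimal single-channel sketch, assuming that index mapping; the exact conditions under which ONNX Runtime enables `use_nearest2x_optimization` (coordinate transformation mode, rounding mode, tensor rank) are not spelled out here, and `upsample_nearest_2x` is a hypothetical helper.

```python
from typing import List

def upsample_nearest_2x(image: List[List[float]]) -> List[List[float]]:
    # Each input pixel is replicated into a 2x2 block of the output.
    out = []
    for row in image:
        doubled = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(doubled)
        out.append(list(doubled))                     # repeat each row twice
    return out

print(upsample_nearest_2x([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

A fast path like this avoids computing a source coordinate and rounding it for every output element.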

Fix issue in construction of DummyArena. (#3416)

14f4c3e

Add #pragma once to providers.h to avoid a 'struct' redefinition error when including the header from multiple places

5835349

Use MlasConv for 1D convolutions (#3425)

d4d19a7

Use the existing 2D convolution code in MlasConv to also handle 1D convolutions.
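A 1-D convolution is a 2-D convolution whose input height and kernel height are both 1, so reshaping the operands lets one routine serve both cases. A minimal sketch of that reduction (single channel, stride 1, no padding, cross-correlation as in ONNX Conv); `conv2d_valid` and `conv1d_via_2d` are hypothetical helpers, not MLAS functions.

```python
from typing import List

def conv2d_valid(x: List[List[float]], w: List[List[float]]) -> List[List[float]]:
    # Naive 2-D cross-correlation with stride 1 and no padding.
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [sum(x[i + a][j + b] * w[a][b] for a in range(kh) for b in range(kw))
         for j in range(ow)]
        for i in range(oh)
    ]

def conv1d_via_2d(x: List[float], w: List[float]) -> List[float]:
    # Treat the 1-D signal and kernel as height-1 images, then take the single output row.
    return conv2d_valid([x], [w])[0]

print(conv1d_via_2d([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0]))  # [-2.0, -2.0]
```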

Update onnx submodule to 1.7.0 release candidate (#3405)

33006f4

Update onnx submodule to 1.7.0 release candidate. This isn't a release tag, but it will be released soon, in 1-2 weeks.

Fix race condition creating ConverterResourceStore (#3419)

517693a

Do not inline ExternOp's scalar tensor inputs (#3426)

d361121

An ExternOp's input needs buffers, so we cannot add a compute_inline schedule to it even if it's a scalar tensor. Instead, we need to schedule it as compute_root.

Disable strong inline (#3399)

0dcc603

To work around an MSVC bug. Without this change, VS2017 cannot build onnxruntime in Release or RelWithDebInfo mode.

change (#3431)

4ebad88

Fixed a typo (no functional change) (#3433)

7c69b17

s/initailizer/initializer/

Implement Min/Max/Clip(12) (#3410)

c8f5e6e

Implement Max/Min for opset 12. Add Clip(12) CPU implementation. Implement Clip(12) for CPU and CUDA; add tests.
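Since opset 11, Clip takes `min` and `max` as optional inputs rather than attributes, and each bound applies only when it is present. A minimal elementwise sketch of those semantics; `clip` is a hypothetical helper, and the additional element types covered by opset 12 are not modeled.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T", int, float)

def clip(values: Sequence[T], lo: Optional[T] = None, hi: Optional[T] = None) -> List[T]:
    # An absent bound acts like the type's lowest/highest value: no clamping on that side.
    out = []
    for v in values:
        if lo is not None and v < lo:
            v = lo
        if hi is not None and v > hi:
            v = hi
        out.append(v)
    return out

print(clip([-5, 0, 7, 12], lo=0, hi=10))  # [0, 0, 7, 10]
print(clip([-5, 0, 7, 12], hi=10))        # [-5, 0, 7, 10]
```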

Re-enable tests (#3437)

9e65298

Re-enable some tests that were recently fixed.

tracysh and others added 13 commits April 7, 2020 15:01

Fix output range for int8_t QuantizeLinear op (#3445)

de60a14
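ONNX QuantizeLinear computes `saturate(round(x / y_scale) + y_zero_point)` with round-half-to-even, and the saturation range depends on the output type: [0, 255] for uint8 and [-128, 127] for int8. A minimal sketch of that formula; the commit title does not describe the original range bug, and `quantize_linear` is a hypothetical helper.

```python
from typing import List, Sequence

RANGES = {"uint8": (0, 255), "int8": (-128, 127)}

def quantize_linear(xs: Sequence[float], scale: float, zero_point: int, dtype: str) -> List[int]:
    lo, hi = RANGES[dtype]
    out = []
    for x in xs:
        q = round(x / scale) + zero_point  # Python's round() rounds half to even
        out.append(max(lo, min(hi, q)))    # saturate to the output type's range
    return out

print(quantize_linear([-300.0, -1.0, 2.5, 300.0], scale=1.0, zero_point=0, dtype="int8"))
# [-128, -1, 2, 127]
print(quantize_linear([-300.0, -1.0, 2.5, 300.0], scale=1.0, zero_point=0, dtype="uint8"))
# [0, 0, 2, 255]
```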

Use IMMA for int8 matmul to leverage Turing Tensor Core (#3413)

4d71958

Use IMMA for int8 matmul to leverage Turing Tensor Cores. Format files under onnxruntime/core/providers/cuda.
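Int8 matrix multiplication multiplies 8-bit operands but accumulates into 32-bit integers; ONNX MatMulInteger defines the result as (A - a_zero_point) · (B - b_zero_point) with int32 output. A minimal reference sketch of those semantics, assuming scalar zero points; it says nothing about the IMMA kernel or its data layout, and `matmul_integer` is a hypothetical helper.

```python
from typing import List

def matmul_integer(a: List[List[int]], b: List[List[int]],
                   a_zero_point: int = 0, b_zero_point: int = 0) -> List[List[int]]:
    # 8-bit inputs; products are summed in a wide accumulator (int32 in ONNX).
    k = len(b)
    return [
        [sum((a[i][p] - a_zero_point) * (b[p][j] - b_zero_point) for p in range(k))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]

A = [[127, -128], [1, 2]]    # int8 values
B = [[127, 0], [-128, 1]]    # int8 values
print(matmul_integer(A, B))  # [[32513, -128], [-129, 2]]
```

The first entry already far exceeds the int8 range, which is why the accumulator must be wider than the inputs.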

Merge branch 'master' into ort_training

6ba7c99

Fix onnxruntime_unittests.cmake after merge.

8ea0e59

Fix dynamicslice.cc after merge.

eaa3f65

Rename ONNX OPTIONAL to OPTIONAL_VALUE.

84773c6

Get cuda_common.h from master.

0e4080f

Get onnxruntime/core/providers/cuda/tensor/slice.h from ort_training.

6bbc809

Get onnxruntime/contrib_ops/cuda/bert/fast_gelu.cc from ort_training.

c517608

Get onnxruntime/core/providers/cuda/cu from ort_training.

1b465ba

Get onnxruntime/core/providers/cuda/math/matmul_integer.cc from ort_training.

507d2bb

Remove FastGelu from activations.

bb2f427

Put dropout_default, dropout_random, celu back in the list of broken tests.

4b5f66a

kit1980 force-pushed the sedymche/merge_master_ort_training branch from 5b6e6a4 to 4b5f66a April 13, 2020 00:42

Sergii Dymchenko added 4 commits April 12, 2020 19:16

Add to list of failing backend tests from master.

571a6d5

Get cudnn_common.cc from master.

7b2fc19

Remove usage of DeviceProp (which is removed in ort_training) from cudnn_common.cc.

b670cdc

Put back SubmoduleCheckoutMode parameter into mac-ci.yml.

bf3df41

kit1980 changed the title from "WIP: Sedymche/merge master ort training" to "Merge master into ort_training" Apr 13, 2020

kit1980 changed the title from "Merge master into ort_training" to "Merge from master into ort_training" Apr 13, 2020

kit1980 marked this pull request as ready for review April 13, 2020 09:20

kit1980 requested a review from a team as a code owner April 13, 2020 09:20

kit1980 added the training label (issues related to ONNX Runtime training; typically submitted using template) Apr 13, 2020

kit1980 requested review from SherlockNoMad and edgchen1 April 13, 2020 09:50

SherlockNoMad approved these changes Apr 13, 2020


edgchen1 approved these changes Apr 13, 2020


codemzs merged commit 5d99f17 into ort_training Apr 13, 2020

codemzs deleted the sedymche/merge_master_ort_training branch April 13, 2020 17:55

kit1980 requested a review from xzhu1900 April 13, 2020 17:55


codemzs merged 50 commits into ort_training from sedymche/merge_master_ort_training

codemzs commented Apr 10, 2020 • edited by kit1980


18 participants
