Merge from master into ort_training#3486
Merged
codemzs merged 50 commits intoort_trainingfrom Apr 13, 2020
Merged
Conversation
* Add benchmark script and notebook for GPT2 * Update Reshape fusion for GPT2 model * Add opt_level option for bert_model_optimization to disable onnxruntime by setting --opt_level 0 * Fix keras optimization
Updated the tags in the table to reflect the new images for Release v1.2
Advance ONNX commit to pickup the latest ArgMax, ArgMin, ReduceMax/ReduceMin, MaxPool Declare new versions for CPU/CUDA. Implement infrastructure support for int8/uint8. Adust GatherOp test for a new error. Adjust Scan9.BadShape test. Add exclusions for index out of bounds checks. Rework result verification for SVDTransformer.
1. Fix a bug in FunctionImpl::FunctionImpl. It set wrong name for the new attribute. 2. Set error code to NOT_IMPLEMENTED if a function contains a not implemented op.
* Rework SVMClassifier - use GEMM for initial scoring - minimize data allocations and copies - parallelize the second half of the scoring for larger batches
1. Add build options for enabling AVX/AVX2/AVX512 2. Update eigen to a newer version, because the current one doesn't work with VC and AVX512.
* Copy image tests from ADO * wip * Port tests to googletest * Add FNS-Candy license * Add missing collaterals * Remove brand images * Fix typos * Use PrepareModelSessionBinding in MnistImageTest * Fix typos
Allow zero in split op (A change in onnx 1.7 without bumping up the op version)
* Enable use_nearest2x_optimization for opset 11 of Resize when possible
… when including the header from multiple places.
Use the existing 2D convolution code in MlasConv to also handle 1D convolutions.
Update onnx submodule to 1.7.0 release candidate. This isn't a release tag, but it will be released soon, in 1-2 weeks.
An ExternOp's input needs buffers, so we cannot add compute_inline schedule on it even if it's a scalar tensor. Instead, we need to schedule it as compute_root.
To bypass a MSVC bug. Without this change, people can't use VS2017 to build onnxruntime in Release or RelWithDebInfo mode.
s/initailizer/initializer/
Implement Max/Min for opset 12. Add CLip(12) CPU impl. Implement Clip(12) for CPU and CUDA add tests
Add opt_level option for graph optimization level in bert perf test. Support BERT models that output each layer, where SkipLayerNormalization has more than 4 children. Check weight and bias are 1D for layer norm fusion. Add a dummy class Gpt2OnnxModel for further changes of GPT2 model.
Re-enable some tests that was recently fixed.
* Add CPU implementation for FastGelu operator * Update optimization script to fuse Gelu or FastGelu according to Elf or Tanh is used in graph. * Merge BiasGelu and FastGelu into one class * Enable FastGelu Fusion optimizer for CPU Execution Provider.
The commit 06fc950 which refactored cpu Pool class broke ACL EP build. Also worked on the commit a4fe60c as it also affects the new class. Move the declaration of the new MaxPoolV8 cpu class in the header file. Implement MaxPool 8-11 in ACL EP. Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
Use IMMA for int8 matmul to leverage Turing Tensor Core Format files under onnxruntime/core/providers/cude
5b6e6a4 to
4b5f66a
Compare
SherlockNoMad
approved these changes
Apr 13, 2020
edgchen1
approved these changes
Apr 13, 2020
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
End-to-end test passed:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=122628&view=results
Performance test as expected:
FP16, seq=128, bs=66, throughput=185.885
FP16, seq=512, bs=10, throughput=35.7223
FP32, seq=128, bs=33, throughput=42.1087
FP32, seq=512, bs=5, throughput=9.19644
Convergence test:
