forked from microsoft/onnxruntime
Allow configuration template to disable some SIMD. #3
Open: jslap-ubi wants to merge 837 commits into `main` from `js/allow-disable-SIMD`
Conversation
jslap-ubi force-pushed the `js/allow-disable-SIMD` branch from `2f67b8f` to `46f8996` on August 1, 2024 19:49
jslap-ubi pushed commits that referenced this pull request on Aug 1, 2024:
### Description
Security fuzz testing with AddressSanitizer found several bugs.
Add a check of `node.InputDefs()[2]->Exists()` for the LayerNorm bias (follow-up to https://github.com/microsoft/onnxruntime/pull/21528/files#r1694026327). Format the file: break long lines to stay within the 120-character limit.
### Description
Add support for setting the external data path for model weight files. Additional fixes ensure this compiles off the latest v1.19 Onnxruntime.
### Motivation and Context
Separating the weights used for larger models (like Stable Diffusion) is the motivation for this change set.
---------
Co-authored-by: Jeff Daily <jeff.daily@amd.com> Co-authored-by: Artur Wojcik <artur.wojcik@amd.com> Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
### Description
WebNN only supports test mode, so the inputs and attributes related to training mode do not matter; use WebNN's identity op to implement the Dropout op directly.
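The change relies on the fact that ONNX Dropout in inference (test) mode is a pass-through. A minimal numpy sketch of that equivalence (illustrative only, not the WebNN EP code):
```python
import numpy as np

def dropout_inference(x: np.ndarray) -> np.ndarray:
    # In inference (test) mode, ONNX Dropout returns its input unchanged;
    # the ratio and training_mode inputs have no effect, which is why the
    # op can be lowered to WebNN's identity op.
    return x

x = np.random.rand(2, 3).astype(np.float32)
assert np.array_equal(dropout_inference(x), x)
```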
### Description
Several tests result in segfaults during the minimal CUDA build. Although test failures are expected due to the limitations of the minimal CUDA EP, failing gracefully would be much preferred.
### Motivation and Context
To reproduce:
1. Build ORT with:
```bash
./build.sh --build_shared_lib --use_full_protobuf --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ --tensorrt_home /TensorRT-10.0.1.6 --parallel --skip_tests --skip_submodule_sync --allow_running_as_root --use_tensorrt --cmake_extra_defines onnxruntime_CUDA_MINIMAL=1
```
2. Run `onnxruntime_test_all`
```bash
...
[----------] 1 test from AllocationPlannerTest
[ RUN      ] AllocationPlannerTest.ReusedInputCrossDifferentStreams
Segmentation fault (core dumped)
```
…microsoft#21536)
### Description
Refactor the framework directory structure for macOS packages.
### Motivation and Context
Apple started enforcing a specific [framework structure](https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPFrameworks/Concepts/FrameworkAnatomy.html) for macOS packages. We need to change how we package for macOS to follow the guidelines.
Fixes the following issue: [Malformed Framework](microsoft/onnxruntime-swift-package-manager#19)
Bump up version in main from 1.19.0 to 1.20.0 since the release branch has been cut.
### Description
Add the ability to test packaging without rebuilding every time. Add the ability to comment out some platforms/architectures without breaking the scripts that assemble the C/Obj-C packages. Update a couple of commands to preserve symlinks.
### Motivation and Context
Makes debugging packaging issues faster. Creates a correct package for mac-catalyst and doesn't require setting symlinks via a bash script.
…ft#21606) The mobile packages have been removed.
### Description
Update the script to use CMake 3.30 to unblock EP Perf.
…rosoft#21625)
### Description
Fix two typos in the MLAS AVX 4-bit GEMM implementation so that the correct VNNI functions are called under the VNNI condition.
### Motivation and Context
Needed for the 1.19.0 release.
Signed-off-by: liqunfu <liqun.fu@microsoft.com>
… transient connection exceptions. (microsoft#21612)
### Description
Improve the docker commands so that docker image layer caching works. This makes docker builds faster and more stable. So far, the A100 pool's system disk is too small to use the docker cache. We won't use the pipeline cache for docker images, and we remove some legacy code.
### Motivation and Context
There is often an exception like:
```
64.58 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
286.4 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
```
because the Onnxruntime pipelines have been sending too many requests to download Node.js during docker builds, which is the major reason pipelines are failing now.
In fact, docker image layer caching never worked. We can always see the scripts still running:
```
microsoft#9 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
microsoft#9 0.234 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
microsoft#9 0.235 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
microsoft#9 0.235 /tmp/scripts/install_centos.sh: line 1: !/bin/bash: No such file or directory
microsoft#9 0.235 ++ '[' '!' -f /etc/yum.repos.d/microsoft-prod.repo ']'
microsoft#9 0.236 +++ tr -dc 0-9.
microsoft#9 0.236 +++ cut -d . -f1
microsoft#9 0.238 ++ os_major_version=8
....
microsoft#9 60.41 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
microsoft#9 60.59 + return 0
...
```
This PR improves the docker commands so that image layer caching works. Thus, CI won't send so many redundant requests to download Node.js:
```
microsoft#9 [2/5] ADD scripts /tmp/scripts
microsoft#9 CACHED
microsoft#10 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
microsoft#10 CACHED
microsoft#11 [4/5] RUN adduser --uid 1000 onnxruntimedev
microsoft#11 CACHED
microsoft#12 [5/5] WORKDIR /home/onnxruntimedev
microsoft#12 CACHED
```
### Reference
https://docs.docker.com/build/drivers/
---------
Co-authored-by: Yi Zhang <your@email.com>
### Description
- Update pipelines to use QNN SDK 2.25 by default
- Update the ifdef condition to apply the workaround for the QNN LayerNorm validation bug to QNN SDK 2.25 (as well as 2.24)
### Motivation and Context
Use the latest QNN SDK.
Fix usability checker CoreML config file path. The files got renamed but one place was still referring to the old name.
### Description
Improve speed when combining `per-channel` data by using a single `np.concatenate` instead of multiple `np.concatenate` calls inside a for loop.
### Motivation and Context
Fixes issue microsoft#21562.
Signed-off-by: duansheng.liu <44742794+duanshengliu@users.noreply.github.com>
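A sketch of why the single call is faster (hypothetical shapes, not the actual quantizer code): repeated `np.concatenate` in a loop re-copies the accumulated result on every iteration, while one call copies each input exactly once.
```python
import numpy as np

# Hypothetical per-channel tensors; shapes are illustrative only.
channels = [np.random.rand(4, 16).astype(np.float32) for _ in range(128)]

# Slow: each iteration reallocates and copies everything gathered so far,
# giving O(n^2) copying overall.
acc = channels[0]
for c in channels[1:]:
    acc = np.concatenate((acc, c))

# Fast: a single np.concatenate copies every input exactly once.
combined = np.concatenate(channels)
assert np.array_equal(acc, combined)
```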
### Motivation and Context
To fix a Whisper test failure.
Fix a missed ORT_ENFORCE check which caused a heap buffer overflow due to out-of-bounds access.
### Description
Update to match microsoft#21627 and make the info for Split consistent. As a Split that doesn't split anything is a no-op, it doesn't seem meaningful to call that limitation out in the docs.
…21664)
### Description
Allow op tests to use the f16 type for inputs/outputs. This PR introduces "@petamoriken/float16" as a Float16Array polyfill but restricts it to the test runner only.
### Description
* Fix a migraphx build error caused by microsoft#21598: add a conditional compile for the code block that depends on ROCm >= 6.2. Note that the pipeline uses ROCm 6.0.
Unblock the orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed and orttraining-amd-gpu-ci-pipeline pipelines:
* Disable a model test in the Linux GPU training CI pipelines caused by microsoft#19470: sometimes, cudnn frontend throws an exception that the cudnn graph does not support a Conv node of the keras_lotus_resnet3D model on V100 GPUs. Note that the same test does not throw an exception in other GPU pipelines. The failure might be related to cuDNN 8.9 and the V100 GPUs used in the pipeline (Ampere GPUs and cuDNN 9.x do not have the issue). The actual fix requires fallback logic, which will take time to implement, so we temporarily disable the test in training pipelines.
* Force install torch for CUDA 11.8. (The docker image has torch 2.4.0 for CUDA 12.1 to build torch extensions, which is not compatible with CUDA 11.8.) Note that this is a temporary workaround. A more elegant fix is to make sure the right torch version is used in the docker build step, which might need updates to install_python_deps.sh and the corresponding requirements.txt.
* Skip test_gradient_correctness_conv1d since it causes a segfault. The root cause needs more investigation (maybe due to cudnn frontend as well).
* Skip test_aten_attention since it causes an assert failure. The root cause needs more investigation (maybe due to the torch version).
* Skip orttraining_ortmodule_distributed_tests.py since it fails because the compiler for the torch extension does not support C++17. One possible fix is to set the following compile argument inside setup.py of the fused_adam extension: extra_compile_args['cxx'] = ['-std=c++17']. However, due to the urgency of unblocking the pipelines, just disable the test for now.
* Skip test_softmax_bf16_large. For some reason, torch.cuda.is_bf16_supported() returns True on V100 with torch 2.3.1, so the test was run in CI, but V100 does not support bf16 natively.
* Fix a typo of "deterministic".
### Description
The xcframework now uses symlinks to have the correct structure according to Apple requirements. Symlinks are not supported by nuget on Windows. To work around that, we can store a zip of the xcframework in the nuget package.
### Motivation and Context
Fix the nuget packaging build break.
### Description
Fix a check of mask type introduced by me in a recent commit. Add tests.
### Description
This change allows matching an external data path like `a.data` to `./a.data`.
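A minimal sketch of the matching idea (not ORT's actual code): normalizing both paths makes the two spellings compare equal.
```python
import os

def same_external_data_path(recorded: str, requested: str) -> bool:
    # os.path.normpath collapses "./a.data" to "a.data", so the two
    # spellings of the same relative path compare equal.
    return os.path.normpath(recorded) == os.path.normpath(requested)

assert same_external_data_path("a.data", "./a.data")
```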
…efault (microsoft#19707)
### Description
Update the build script for WebGPU to enable model dump by default. Now, when using build_jsep.bat to build debug, model dump is enabled. Using [`optimizedModelFilePath`](https://onnxruntime.ai/docs/api/js/interfaces/InferenceSession.SessionOptions.html#optimizedModelFilePath) in the session options can dump the optimized model in the browser.
### Motivation and Context
Helps to debug/rule out problems that may be related to the model optimizer.
### Description
This change enhances the existing Pad fusion to fuse Pad even if a Cast operator is present between Pad and Conv/MaxPool/AveragePool. It keeps the Cast as it is. A sketch of the matching logic follows the diagram below.
```
/*
 * Before Fusion:
 *    Pad
 *     |
 *    Cast (Optional)
 *     |
 *    Conv/MaxPool/AveragePool
 *
 * After Fusion:
 *    Cast (Optional)
 *     |
 *    Conv/MaxPool/AveragePool
 */
```
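A sketch of that matching logic in Python pseudocode (the `consumer_of` helper is hypothetical; ORT's optimizer is C++): follow Pad's output, skip over at most one optional Cast, and fuse only if the consumer is one of the supported ops.
```python
def find_pad_fusion_target(pad_node, graph):
    # Hypothetical helper: returns the single consumer of an output edge.
    consumer = graph.consumer_of(pad_node.output[0])
    # Skip over at most one optional Cast; the Cast itself is kept.
    if consumer is not None and consumer.op_type == "Cast":
        consumer = graph.consumer_of(consumer.output[0])
    if consumer is not None and consumer.op_type in ("Conv", "MaxPool", "AveragePool"):
        return consumer
    return None
```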
### Description
Add a Gather that supports block-quantized input data.
### Motivation and Context
To support the Web inference scenario with quantized vocabulary embeddings.
### Description
Fix the wrong per-tensor quantized weight type for MatMul.
### Motivation and Context
Fix the related bug described in microsoft#21346.
…rosoft#21693)
### Description
When quantizing MatMul to DQ + MatMul using the 4-bit QDQ tool chain, the opsets of the domains were previously not changed. Now, when quantizing MatMul to DQ + MatMul in QDQ format, force upgrade the onnx domain to opset 21.
### Motivation and Context
In QDQ format, DQ with int4 and blocked quantization is used. This requires DQ with opset >= 21.
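A sketch of the opset bump (not the QDQ tool's actual code, but the same effect on a `ModelProto`):
```python
import onnx

def force_onnx_opset_21(model: onnx.ModelProto) -> None:
    # int4 / blocked DequantizeLinear requires the default onnx domain
    # at opset >= 21, so upgrade it if it is lower.
    for opset in model.opset_import:
        if opset.domain in ("", "ai.onnx") and opset.version < 21:
            opset.version = 21
```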
`supportsModel` is deprecated in TRT 10.1. Add `supportsModelV2` but still keep `supportsModel`, as we still need to support TRT 8.6, where `supportsModelV2` is not available.
### Description
See microsoft/onnxruntime-extensions#476 and actions/runner-images#7671.
### Current issue
- [ ] For the default Xcode 15.2 that comes with macOS-13, we need to update the boost container header boost/container_hash/hash.hpp version to pass the build.
- [x] For Xcode 14.2 the build passed, but `Run React Native Detox Android e2e Test` failed. Possibly a flaky test, microsoft#21969.
- [x] For Xcode 14.3.1 we encountered the following issue in `Build React Native Detox iOS e2e Tests`:
```
ld: file not found: /Applications/Xcode_14.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/arc/libarclite_iphonesimulator.a
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
Applying the following code at the end of ios/Podfile fixed the issue:
```
post_install do |installer|
  installer.generated_projects.each do |project|
    project.targets.each do |target|
      target.build_configurations.each do |config|
        config.build_settings['IPHONEOS_DEPLOYMENT_TARGET'] = '13.0'
      end
    end
  end
end
```
- [x] facebook/react-native#32483 Applying changes to ios/Podfile:
```
pre_install do |installer|
  # Custom pre-install script or commands
  puts "Running pre-install script..."
  # Recommended fix for facebook/react-native#32483
  # from facebook/react-native#32483 (comment)
  system("sed -i '' 's/typedef uint8_t clockid_t;//' \"${SRCROOT}/Pods/RCT-Folly/folly/portability/Time.h\"")
end
```
- [ ] Detox environment setup exceeded the timeout of 120000ms during the iOS e2e test.
### Dependent
- [x] microsoft#21159
---------
Co-authored-by: Changming Sun <chasun@microsoft.com>
### Description
Update XNNPACK to the latest version (Sep 4).
- Some op outputs changed; channel or stride params moved into the reshape functions, e.g. google/XNNPACK@96962a6
- The input params of XNNPACK's resize-related functions changed a lot
- KleidiAI is added as a dependency on ARM64
- The latest XNNPACK produces 2 static libs, microkernels-prod and xnnpack. Without microkernels-prod, it throws undefined-symbol errors.
- Add ORT_TARGET_PROCESSOR to get the real processor target in CMake
### Description
This PR refactors the `CPU` kernel for the `CumSum` operator. The new implementation strives to have as little indirection as possible.
### Motivation and Context
Currently the `CumSum` operator performs very poorly in the case of 1D tensors (it was slower than a Python loop). This is caused by the extensive use of the `SliceIterator`s. Here is a relevant snippet:
```python
import time
import ndonnx as ndx
import onnxruntime as ort
import numpy as np
import onnx

def test_cumsum(sz):
    a = ndx.array(shape=(sz,), dtype=ndx.int64)
    b = ndx.cumsum(a)
    model = ndx.build({'a': a}, {'b': b})
    onnx.save(model, "model.onnx")
    input = np.ones(sz, np.int64)
    start = time.time()
    result = ort.InferenceSession(model.SerializeToString()).run(None, {'a': input})
    end = time.time()
    return end - start

def test_cumsum_by_hand(sz):
    input = np.ones(sz, np.int64)
    start = time.time()
    answer = [0]
    for i in input:
        answer.append(answer[-1] + i)
    end = time.time()
    return end - start

print(test_cumsum(int(1e7)))
print(test_cumsum_by_hand(int(1e7)))
```
Before:
```console
0.9794480800628662
0.4518160820007324
```
After:
```console
0.02483987808227539
0.5496008396148682
```
The `model.onnx`:
<img width="214" alt="image" src="https://github.com/user-attachments/assets/a213d6ff-86c3-49b5-a493-ebfd97deaa41">
The flame graph:
![profile-3](https://github.com/user-attachments/assets/c7418a05-cb65-4d72-a76d-6a6b05b4ba4d)
### Description
Builds the arm64 Python 3.12 wheel for the QNN EP.
### Description
Add an ONNX export script for Segment Anything v2 (SAM2).
### Limitations
* Does not support video; only images are supported right now.
* The decoder does not support batch inference.
### Credits
The demo is based on the [SAM2 notebook](https://github.com/facebookresearch/segment-anything-2/blob/main/notebooks/image_predictor_example.ipynb), modified to run with ORT. The export of the decoder is inspired by https://github.com/vietanhdev/samexporter.
### Demo
Example output of the demo:
![sam2_demo](https://github.com/user-attachments/assets/9a9fa360-8c20-482e-9935-a7aba9cf15de)
### Motivation and Context
To support optimization of SAM2 image segmentation.
### Description
Fix a regression caused by microsoft#17361.
…MM (microsoft#21984)
### Description
The ONNXRuntime implementation of S8S8 was using the default C++ implementation; with this new ISA, all Int8 variants of QGemm can support the VNNI dot product and full AVX2 instructions. All signed/unsigned variants support VNNI instructions starting with LNL. Renamed structs and functions to better indicate support of all Int8 rather than U8X8.
### Motivation and Context
LNL hardware implements a new ISA, and this code enables that ISA in QGemm. Speed is improved for S8S8 to match the existing U8S8 code. S8U8 would also match that speed if ONNX formally accepted the data type.
Set up a pipeline to produce nightly Linux x64 wheels for onnxruntime-qnn. This can be used for offline context binary generation.
…iptor.shape (microsoft#22121)
The spec renames MLOperandDescriptor.dimensions to MLOperandDescriptor.shape. In order to support older Chromium versions, we will keep both in the WebNN EP for a while.
Fixed microsoft#22120
### Description
Add linker flags to support 16KB page sizes on Android. See https://source.android.com/docs/core/architecture/16kb-page-size/16kb#build-lib-16kb-alignment
### Motivation and Context
microsoft#21837
…ty is supported. (microsoft#22141)
### Description
Use the latest nuget.exe so that the `readme` property is supported.
### Motivation and Context
microsoft#22137
…osoft#21927)
### Description
Update the DML EP so that a `FusedMatMul` ORT graph node has the TransA/TransB attributes set instead of updating the strides.
### Description
Fix a random crash in QNN UTs with multi-threaded runs, like QnnHTPBackendTests.MultithreadHtpPowerCfgDefaultAndRunOption. Root cause: a last-minute code change (microsoft@b4e26bd) replaced `static std::mutex mutex;` with `OrtMutex mutex;` and missed the `static`.
…22157)
Followed the ROCm example below it, which isn't the naming convention we want to follow. Didn't fix ROCm because I'm not sure if there are consumers using its naming convention.
…oft#22140)
### Description
Decouple the implementations for different A types to improve readability and maintainability.
### Motivation and Context
As more types are added, the implementation can differ a lot between types. Besides, different hardware may require different implementations. This PR creates an abstraction boundary where different implementations can plug in easily.
…microsoft#22135)
### Description
Fixes the logic for getting the number of elements for the input and output spans in the `MinMaxMLFloat16` method. This was incorrectly using the full number of elements in the output rather than the number of elements in the current span, which worked fine with 1D inputs but breaks with 2D inputs. This meant that as the `BroadcastLooper` iterated over spans, `MinMaxMLFloat16` would start at a position further forward in the input and output and read and write beyond their ends, causing the asan error in microsoft#21558 and sometimes segfaults in larger examples.
### Motivation and Context
Fixes microsoft#21558. From further testing, this issue didn't only cause asan errors in tests but also causes segfaults with larger inputs.
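A numpy illustration of the span-length point (illustrative only, not the C++ kernel): when a broadcast Min/Max over a 2D input is processed row by row, each row must be sized by its own span length, not by the total element count of the output.
```python
import numpy as np

a = np.random.rand(4, 5).astype(np.float16)   # 2D input
b = np.random.rand(5).astype(np.float16)      # broadcast across rows

out = np.empty_like(a)
for i in range(a.shape[0]):
    # The span length here is a.shape[1] == 5, not out.size == 20;
    # using the total count is what drove the reads and writes out of bounds.
    out[i, :] = np.minimum(a[i, :], b)

assert np.array_equal(out, np.minimum(a, b))
```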
### Description
The optional `axes` input may exist with an empty name and be a nullptr. Update the CUDA implementation to handle this.
### Motivation and Context
microsoft#22035
)
### Description
Fix usage of C++ `std::chrono::operator<<` in mac builds for a wider range of Xcode versions/targets.
### Motivation and Context
microsoft#21033
### Motivation and Context
By default, CMAKE_SYSTEM_PROCESSOR is the same as CMAKE_HOST_SYSTEM_PROCESSOR: https://cmake.org/cmake/help/latest/variable/CMAKE_SYSTEM_PROCESSOR.html
KleidiAI uses CMAKE_SYSTEM_PROCESSOR to determine whether to include some arm64 ukernels: https://gitlab.arm.com/kleidi/kleidiai/-/blob/main/CMakeLists.txt#L134
We use Macs with Intel CPUs to cross-compile for ARM Macs in the iOS packaging pipeline, so we need to make CMAKE_SYSTEM_PROCESSOR match ORT_TARGET_PROCESSOR.
### Description
When K == 0, output an MxN matrix filled with the bias if present, or filled with zeros. This brings Gemm in line with MatMul behavior, especially when Gemm is used to fuse MatMul with Add.
### Motivation and Context
* Comply with the numpy spec of MatMul.
* Address the case when empty initializers are used for computation.
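A numpy check of the K == 0 behavior the change matches (illustrative shapes):
```python
import numpy as np

M, K, N = 3, 0, 4
A = np.zeros((M, K), dtype=np.float32)
B = np.zeros((K, N), dtype=np.float32)
bias = np.arange(N, dtype=np.float32)

# A matmul over an empty inner dimension yields an all-zero M x N result,
# and adding the bias broadcasts it across every row -- the same output
# Gemm now produces when K == 0.
C = A @ B + bias
assert C.shape == (M, N)
assert np.array_equal(C, np.tile(bias, (M, 1)))
```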
### Description
* Add MultiHeadAttention fusion for SAM2.
* Add LayerNormalization fusion for the NCHW format by inserting a Transpose from NCHW to NHWC before layer normalization, and another Transpose after layer norm to convert NHWC back to NCHW. Hopefully, those extra Transpose nodes will be removed when prefer_nhwc is enabled later.
* Add a condition that the input shall be 3D when fusing SkipLayerNorm.
* Update convert_to_onnx.py to add `--optimize` and `--use_gpu` options to output optimized onnx models for the CPU/CUDA EPs.
* Add an option `--dtype fp16|fp32` in convert_to_onnx.py to support converting the optimized model to float16.
* Update the demo to use the optimized onnx models.
### Motivation and Context
To support optimization of SAM2 for the CPU/CUDA EPs, as exported in microsoft#22119.
### Description
Add a benchmark script for Segment Anything v2. It depends on microsoft#22119 for onnx export and microsoft#22167 for SAM2 graph fusion.
### Motivation and Context
Benchmark SAM2 model performance.
…n logger from that session (microsoft#22170)
### Description
Fix an issue where QNN models shared from another session also use the session logger from that producer session, which causes confusion. Make the QNN model compute function use the session logger from the current session.
…osoft#22056)
### Description
Specify the paths of `ar`, `ld` and `libtool` when building the Apple framework.
### Motivation and Context
Sometimes non-system executables come before the system-provided ones. This PR intends to prevent that from happening.
jslap-ubi force-pushed the `js/allow-disable-SIMD` branch from `46f8996` to `d0aada7` on September 23, 2024 16:58