Allow configuration template to disable some SIMD. #3

Open. Wants to merge 837 commits into base: main.

Conversation

jslap-ubi (Owner)

Description

Motivation and Context

jslap-ubi pushed a commit that referenced this pull request Aug 1, 2024
### Description
Security fuzz test with address sanitizer found several bugs
tianleiwu and others added 28 commits August 2, 2024 15:45
Add a check of node.InputDefs()[2]->Exists() for Layernorm bias (Follow up https://github.com/microsoft/onnxruntime/pull/21528/files#r1694026327)

Format the file: break long lines to stay within the 120-char limit.
### Description
Changes to add setting the external data path for model weight files.
Additional fixes to ensure this compiles against the latest v1.19
Onnxruntime.


### Motivation and Context
Separating the weights used for larger models (like Stable Diffusion) is the
motivation for this change set.

---------

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>
Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
### Description
WebNN only supports test mode, so we do not care about the other inputs or
attributes related to training mode; use WebNN's identity op to implement the
Dropout op directly.
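As a minimal illustration of why this works (a sketch, not the EP code itself), Dropout in test/inference mode is just a pass-through of its input:

```python
import numpy as np

def dropout_inference(x: np.ndarray) -> np.ndarray:
    # In test/inference mode Dropout does nothing, which is why the WebNN
    # identity op is sufficient; a requested mask output would be all ones.
    return x

x = np.random.rand(2, 3).astype(np.float32)
assert np.array_equal(dropout_inference(x), x)
```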
### Description
Several tests result in segfaults during the minimal CUDA build.
Although test failures are expected due to the limitations of the minimal
CUDA EP, failing gracefully would be much preferred.



### Motivation and Context
To reproduce:
1. Build ORT with:
```bash
./build.sh --build_shared_lib --use_full_protobuf --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ --tensorrt_home /TensorRT-10.0.1.6 --parallel --skip_tests --skip_submodule_sync --allow_running_as_root --use_tensorrt --cmake_extra_defines onnxruntime_CUDA_MINIMAL=1
```
2. Run `onnxruntime_test_all`
```bash
...
[----------] 1 test from AllocationPlannerTest
[ RUN      ] AllocationPlannerTest.ReusedInputCrossDifferentStreams
Segmentation fault (core dumped)
```
…microsoft#21536)

### Description

Refactor framework directory structure for MacOS packages

### Motivation and Context
Apple started enforcing a specific [framework
structure](https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPFrameworks/Concepts/FrameworkAnatomy.html)
for macOS packages. We need to change how we package for macOS to follow
the guidelines.

Fixes the following issue: [Malformed
Framework](microsoft/onnxruntime-swift-package-manager#19)
Bump up version in main from 1.19.0 to 1.20.0 since the release branch
has been cut.
### Description
Add the ability to test packaging without rebuilding every time.
Add the ability to comment out some platforms/architectures without breaking
the scripts that assemble the C/Obj-C packages.
Update a couple of commands to preserve symlinks.


### Motivation and Context
Make debugging packaging issues faster.
Creates the correct package for mac-catalyst and doesn't require setting
symlinks via a bash script.
### Description
Update the script with cmake 3.30 to unblock EP Perf.


### Motivation and Context
…rosoft#21625)

### Description
Fix two typos in the MLAS AVX 4-bit GEMM implementation so that the correct
VNNI functions are called under the VNNI condition.



### Motivation and Context
needed for 1.19.0 release

Signed-off-by: liqunfu <liqun.fu@microsoft.com>
… transient connection exceptions. (microsoft#21612)

### Description
Improve the docker commands to make docker image layer caching work.
This can make docker builds faster and more stable.
So far, the A100 pool's system disk is too small to use the docker cache.
We won't use the pipeline cache for docker images, and we remove some legacy
code.

### Motivation and Context
We often see an exception like
```
64.58 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
286.4 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
```
because the Onnxruntime pipelines have been sending too many requests to
download Node.js during docker builds, which is the major reason pipelines
fail now.

In fact, docker image layer caching never worked.
We can always see that the scripts are still running:
```
microsoft#9 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
microsoft#9 0.234 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
microsoft#9 0.235 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
microsoft#9 0.235 /tmp/scripts/install_centos.sh: line 1: !/bin/bash: No such file or directory
microsoft#9 0.235 ++ '[' '!' -f /etc/yum.repos.d/microsoft-prod.repo ']'
microsoft#9 0.236 +++ tr -dc 0-9.
microsoft#9 0.236 +++ cut -d . -f1
microsoft#9 0.238 ++ os_major_version=8
....
microsoft#9 60.41 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
microsoft#9 60.59 + return 0
...
```

This PR improves the docker commands so that image layer caching works.
Thus, CI won't send so many redundant requests to download Node.js.
```
microsoft#9 [2/5] ADD scripts /tmp/scripts
microsoft#9 CACHED

microsoft#10 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
microsoft#10 CACHED

microsoft#11 [4/5] RUN adduser --uid 1000 onnxruntimedev
microsoft#11 CACHED

microsoft#12 [5/5] WORKDIR /home/onnxruntimedev
microsoft#12 CACHED
```

### Reference
https://docs.docker.com/build/drivers/

---------

Co-authored-by: Yi Zhang <your@email.com>
### Description
- Update pipelines to use QNN SDK 2.25 by default
- Update ifdef condition to apply workaround for QNN LayerNorm
validation bug to QNN SDK 2.25 (as well as 2.24)



### Motivation and Context
Use the latest QNN SDK
Fix usability checker CoreML config file path. The files got renamed but one place was still referring to the old name.
### Description
Improve the speed of combining `per-channel` data by using a single
`np.concatenate` instead of multiple `np.concatenate` calls within a for
loop.
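A rough sketch of the difference (hypothetical data, not the quantization tool's actual code):

```python
import numpy as np

# Hypothetical per-channel arrays collected during quantization.
channels = [np.random.rand(1024).astype(np.float32) for _ in range(256)]

# Slow: repeated concatenation inside a loop re-copies the result each time.
slow = np.array([], dtype=np.float32)
for c in channels:
    slow = np.concatenate((slow, c))

# Fast: one np.concatenate over the whole list copies each element only once.
fast = np.concatenate(channels)

assert np.array_equal(slow, fast)
```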

### Motivation and Context

Fix the issue microsoft#21562

Signed-off-by: duansheng.liu <44742794+duanshengliu@users.noreply.github.com>
### Description



### Motivation and Context
To fix a whisper test failure.
Fix a missed ORT_ENFORCE check which caused a heap buffer overflow because
of out-of-bounds access.
### Description
Update to match microsoft#21627 and make the info for Split consistent.

As a Split that doesn't split anything is a no-op, it doesn't seem
meaningful to call that limitation out in the docs.


### Motivation and Context
…21664)

### Description
Allow op tests to use the f16 type for inputs/outputs.

This PR introduces "@petamoriken/float16" as a Float16Array polyfill but
restricts it to be used only by the test runner.
### Description
* Fix a migraphx build error caused by
microsoft#21598:
Add a conditional compile guard on the code block that depends on ROCm >= 6.2.
Note that the pipeline uses ROCm 6.0.

Unblock orttraining-linux-gpu-ci-pipeline and
orttraining-ortmodule-distributed and orttraining-amd-gpu-ci-pipeline
pipelines:
* Disable a model test in the Linux GPU training CI pipelines due to
microsoft#19470:
Sometimes, the cuDNN frontend throws an exception that the cuDNN graph does
not support a Conv node of the keras_lotus_resnet3D model on a V100 GPU.
Note that the same test does not throw an exception in other GPU pipelines.
The failure might be related to the cuDNN 8.9 and V100 GPU used in the
pipeline (Ampere GPUs and cuDNN 9.x do not have the issue).
The actual fix requires fallback logic, which will take time to
implement, so we temporarily disable the test in training pipelines.
* Force-install torch for CUDA 11.8. (The docker image has torch 2.4.0 for
CUDA 12.1 to build the torch extension, which is not compatible with CUDA
11.8.) Note that this is a temporary workaround. A more elegant fix is to
make sure the right torch version is installed in the docker build step,
which might need updates to install_python_deps.sh and the corresponding
requirements.txt.
* Skip test_gradient_correctness_conv1d since it causes a segmentation fault.
The root cause needs more investigation (maybe due to the cuDNN frontend as
well).
* Skip test_aten_attention since it causes an assert failure. The root cause
needs more investigation (maybe due to the torch version).
* Skip orttraining_ortmodule_distributed_tests.py since it has an error that
the compiler for the torch extension does not support C++17. One possible
fix is to set the following compile argument inside setup.py of the
fused_adam extension: extra_compile_args['cxx'] = ['-std=c++17'].
However, due to the urgency of unblocking the pipelines, just disable
the test for now.
* Skip test_softmax_bf16_large. For some reason,
torch.cuda.is_bf16_supported() returns True on V100 with torch 2.3.1, so
the test was run in CI, but V100 does not support bf16 natively.
* Fix a typo of "deterministic".

### Motivation and Context
### Description
The xcframework now uses symlinks to have the correct structure
according to Apple requirements. Symlinks are not supported by nuget on
Windows.

In order to work around that we can store a zip of the xcframeworks in
the nuget package.

### Motivation and Context
Fix nuget packaging build break
### Description
Fix a check of mask type introduced by me in a recent commit. Add tests.
### Description



### Motivation and Context
### Description

This change allows matching an external data path like `a.data` to
`./a.data`.
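A tiny sketch of the intended matching behavior (illustration only, not the actual C++ implementation):

```python
import os

def same_external_data_path(requested: str, candidate: str) -> bool:
    # Normalize so that "a.data" and "./a.data" refer to the same file.
    return os.path.normpath(requested) == os.path.normpath(candidate)

assert same_external_data_path("a.data", "./a.data")
```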


…efault (microsoft#19707)

### Description
Update the build script for webgpu to enable model dump by default.

Now, when using build_jsep.bat to build debug, model dump is enabled.
Setting
[`optimizedModelFilePath`](https://onnxruntime.ai/docs/api/js/interfaces/InferenceSession.SessionOptions.html#optimizedModelFilePath)
in the session options dumps the optimized model in the browser.
### Motivation and Context
Helps to debug or rule out problems that may be related to the model optimizer.
### Description
This change enhances the existing Pad Fusion to fuse Pad even if a Cast
operator is present between Pad and Conv/MaxPool/AveragePool. It keeps
the Cast as it is.
<pre>
/*
 * Before Fusion:
 *     Pad
 *      |
 *    Cast (Optional)
 *      |
 *   Conv/MaxPool/AveragePool
 * 
 * After Fusion:
 *    Cast (Optional)
 *      |
 *   Conv/MaxPool/AveragePool
 */
</pre>


### Motivation and Context
### Description
Add a gather that supports block-quantized input data.


### Motivation and Context
To support the Web inference scenario with quantized vocabulary embeddings.
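A rough numpy sketch of the idea (hypothetical layout and names, not the actual contrib-op signature): gather the quantized rows and their per-block scales, then dequantize only what was gathered:

```python
import numpy as np

def gather_block_dequant(q_data, scales, indices, block_size=32):
    # q_data: (rows, cols) int8 quantized embedding table
    # scales: (rows, cols // block_size) per-block scales
    q_rows = q_data[indices]                          # gather quantized rows
    s_rows = np.repeat(scales[indices], block_size, axis=-1)
    return q_rows.astype(np.float32) * s_rows         # dequantize only gathered rows

rows, cols = 1000, 64
q = np.random.randint(-8, 8, size=(rows, cols), dtype=np.int8)
s = np.random.rand(rows, cols // 32).astype(np.float32)
out = gather_block_dequant(q, s, np.array([1, 42, 7]))
assert out.shape == (3, cols)
```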
### Description
Fix wrong per-tensor quantized weight type for matmul.


### Motivation and Context
Fix related bug as described in
microsoft#21346
…rosoft#21693)

### Description
When quantizing MatMul to DQ + MatMul using the 4-bit QDQ tool chain,
previously the opsets of the domains were not changed.
Now, when quantizing MatMul to DQ + MatMul in QDQ format, force-upgrade the
onnx domain to opset 21.

### Motivation and Context
In QDQ format, DQ with int4 and blocked quantization is used. This
requires DQ with opset >= 21.
When quantizing MatMul to DQ + MatMul, force-upgrade the onnx domain to opset
21.
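A minimal sketch of what such an opset upgrade looks like with the onnx Python API (the helper name is hypothetical):

```python
import onnx

def force_min_onnx_opset(model: onnx.ModelProto, min_version: int = 21) -> None:
    # DequantizeLinear with int4/blocked quantization needs opset >= 21,
    # so bump the default onnx domain if it is older.
    for opset in model.opset_import:
        if opset.domain in ("", "ai.onnx") and opset.version < min_version:
            opset.version = min_version
```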
chilo-ms and others added 29 commits September 17, 2024 09:52
`supportsModel` is deprecated in TRT 10.1.
Add `supportsModelV2`, but still keep `supportsModel` as we still need to
support TRT 8.6, where `supportsModelV2` is not
supported.
### Description
See microsoft/onnxruntime-extensions#476
and actions/runner-images#7671

### Motivation and Context

### Current issue
- [ ] For the default Xcode 15.2 that comes with macOS-13, we need to
update the boost container header boost/container_hash/hash.hpp version
to pass the build
- [x] For Xcode 14.2, the build passed but the `Run React Native Detox
Android e2e Test` failed.
Possibly a flaky test, microsoft#21969
- [x] For Xcode 14.3.1, we encountered the following issue in `Build React
Native Detox iOS e2e Tests`
```
ld: file not found: /Applications/Xcode_14.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/arc/libarclite_iphonesimulator.a
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
Applied the following code at the end of ios/Podfile, which fixed the
issue
```
post_install do |installer|
    installer.generated_projects.each do |project|
        project.targets.each do |target|
            target.build_configurations.each do |config|
                config.build_settings['IPHONEOS_DEPLOYMENT_TARGET'] = '13.0'
            end
        end
    end
end
```


- [x] facebook/react-native#32483

Applying changes to ios/Podfile
```
pre_install do |installer|
  # Custom pre-install script or commands
  puts "Running pre-install script..."

  # Recommended fix for facebook/react-native#32483
  # from facebook/react-native#32483 (comment)
  system("sed -i '' 's/typedef uint8_t clockid_t;//' \"${SRCROOT}/Pods/RCT-Folly/folly/portability/Time.h\"")
end
```

- [ ] Detox environment setup exceeded the timeout of 120000 ms during the
iOS e2e test


### Dependent

- [x] microsoft#21159

---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
### Description
Update XNNPack to the latest version (Sep 4).
- Some op outputs are changed; channel or stride params are moved into the
reshape func.
e.g.
google/XNNPACK@96962a6
- The input params of xnnpack's resize-related functions changed a lot.
- KleidiAI is added as a dependency on ARM64.
- The latest XNNPACK includes 2 static libs: microkernels-prod and
xnnpack.
Without microkernels-prod, it throws an exception of undefined symbols.
- Add ORT_TARGET_PROCESSOR to get the real processor target in CMake.
### Description
This PR refactors the `CPU` kernel for the `CumSum` operator. The new
implementation strives to have as little indirection as possible.


### Motivation and Context
Currently the `CumSum` operator performs very poorly in the case of 1D
tensors (it was slower than a Python loop). This is caused by the
extensive use of `SliceIterator`s.

Here is a relevant snippet:
```python
import time
import ndonnx as ndx
import onnxruntime as ort
import numpy as np
import onnx

def test_cumsum(sz):
    a = ndx.array(shape=(sz,), dtype=ndx.int64)
    b = ndx.cumsum(a)
    model = ndx.build({'a': a}, {'b': b})
    onnx.save(model, "model.onnx")

    input = np.ones(sz, np.int64)
    start = time.time()
    result = ort.InferenceSession(model.SerializeToString()).run(None, {'a': input})
    end = time.time()
    return end - start

def test_cumsum_by_hand(sz):
    input = np.ones(sz, np.int64)
    start = time.time()
    answer = [0]
    for i in input:
        answer.append(answer[-1] + i)
    end = time.time()
    return end - start

print(test_cumsum(int(1e7))) 
print(test_cumsum_by_hand(int(1e7))) 
```

Before
```console
0.9794480800628662
0.4518160820007324
```

After
```console
0.02483987808227539
0.5496008396148682
```

The `model.onnx`: 
<img width="214" alt="image"
src="https://github.com/user-attachments/assets/a213d6ff-86c3-49b5-a493-ebfd97deaa41">

The flame graph:

![profile-3](https://github.com/user-attachments/assets/c7418a05-cb65-4d72-a76d-6a6b05b4ba4d)
### Description
Builds arm64 python 3.12 wheel for QNN EP.


### Motivation and Context
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Add ONNX export script for segment anything v2 (SAM2).

### Limitations
* Does not support video. Only supports images right now.
* The decoder does not support batch inference.

### Credits
The demo is based on the [SAM2
notebook](https://github.com/facebookresearch/segment-anything-2/blob/main/notebooks/image_predictor_example.ipynb),
modified to run with ORT.

The export of the decoder is inspired by
https://github.com/vietanhdev/samexporter.

### Demo
Example output of demo:

![sam2_demo](https://github.com/user-attachments/assets/9a9fa360-8c20-482e-9935-a7aba9cf15de)

### Motivation and Context
To support optimization of SAM2 image segmentation.
### Description
Fix regression caused by microsoft#17361 



### Motivation and Context
…MM (microsoft#21984)

### Description
The ONNXRuntime implementation of S8S8 was using the default C++
implementation; with this new ISA, all variants of QGemm Int8 can
support the VNNI dot product and full AVX2 instructions.

All signed/unsigned variants support VNNI instructions starting with
LNL.
Renamed structs and functions to better indicate support of all Int8 vs
U8X8.


### Motivation and Context
LNL HW implements a new ISA, and this code enables that ISA in QGemm.
Speed is improved for S8S8 to match the existing U8S8 code. S8U8 would
also match that speed if ONNX formally accepted the data type.
Set up a pipeline to produce nightly Linux x64 wheels for onnxruntime-qnn.
These can be used for offline context binary generation.
…iptor.shape (microsoft#22121)

The spec renames MLOperandDescriptor.dimensions to
MLOperandDescriptor.shape. In order to support older Chromium versions,
we will keep both in the WebNN EP for a while.

Fixed microsoft#22120
### Description
Add linker flags to support 16KB page sizes on Android.

See
https://source.android.com/docs/core/architecture/16kb-page-size/16kb#build-lib-16kb-alignment

### Motivation and Context
microsoft#21837
…ty is supported. (microsoft#22141)

### Description
Use the latest nuget.exe so that the `readme` property is supported.

### Motivation and Context
microsoft#22137
…osoft#21927)

### Description
Update the DML EP so that the `FusedMatMul` ORT graph node has the TransA/B
attributes set instead of updating the strides.
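A numpy sketch of the `FusedMatMul` semantics being targeted (illustration only, not the DML EP code): the transposes are carried as node attributes rather than stride changes:

```python
import numpy as np

def fused_matmul(a, b, trans_a=False, trans_b=False, alpha=1.0):
    # TransA/TransB are attributes of the node; the input tensors keep
    # their original layout (no stride manipulation).
    if trans_a:
        a = np.swapaxes(a, -1, -2)
    if trans_b:
        b = np.swapaxes(b, -1, -2)
    return alpha * (a @ b)

a = np.random.rand(4, 3).astype(np.float32)
b = np.random.rand(4, 5).astype(np.float32)
assert fused_matmul(a, b, trans_a=True).shape == (3, 5)
```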



### Motivation and Context
### Description
Fix a random crash in QNN UTs with multi-threaded runs like
QnnHTPBackendTests.MultithreadHtpPowerCfgDefaultAndRunOption.

Root cause: a last-minute code change in
microsoft@b4e26bd
changed `static std::mutex mutex;` to `OrtMutex mutex;` and
missed `static`.
…22157)

Followed the ROCm example below it, which isn't the naming convention we
want to follow. Didn't fix ROCm because I'm not sure whether there are
consumers using its naming convention.
…oft#22140)

### Description
Decouple implementation for different A types to improve readability and
maintainability.

### Motivation and Context
As more types are added, the implementation can differ a lot between
types. Besides, different hardware may require different
implementations.
This PR creates an abstraction boundary where different implementations
can plug in easily.
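A hypothetical Python sketch of the kind of abstraction boundary described (the names and placeholder math are made up for illustration, not the actual kernel code):

```python
from abc import ABC, abstractmethod
import numpy as np

class QuantGemmKernel(ABC):
    """One implementation per activation (A) type / hardware target."""
    @abstractmethod
    def compute(self, a, b): ...

_KERNELS = {}

def register(a_dtype):
    def deco(cls):
        _KERNELS[np.dtype(a_dtype)] = cls()  # new A types plug in here
        return cls
    return deco

@register(np.float32)
class Fp32Kernel(QuantGemmKernel):
    def compute(self, a, b):
        return a @ b  # placeholder math for illustration

def quant_gemm(a, b):
    # Dispatch to the implementation registered for A's type.
    return _KERNELS[a.dtype].compute(a, b)
```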
…microsoft#22135)

### Description

Fixes the logic for getting the number of elements for the input and
output spans in the `MinMaxMLFloat16` method. This was incorrectly using
the full number of elements in the output rather than the number of
elements in the current span, which worked fine with 1D inputs but
breaks with 2D inputs.

This meant that as the `BroadcastLooper` iterated over spans,
`MinMaxMLFloat16` would start at a position further forward in the input
and output and read and write further beyond the end of the input and
output respectively, causing the asan error in microsoft#21558 and sometimes
segfaults in larger examples.

### Motivation and Context

Fixes microsoft#21558.

From further testing, this issue didn't only cause asan errors in tests
but also caused segfaults with larger-sized inputs.
### Description
The optional `axes` input may exist with an empty name and be a nullptr.

Update the CUDA implementation to handle this.

### Motivation and Context

microsoft#22035
)

### Description
Fix usage of C++ std::chrono::operator<< in Mac builds for a wider range
of Xcode versions/targets.

### Motivation and Context

microsoft#21033
### Description



### Motivation and Context
By default, CMAKE_SYSTEM_PROCESSOR is the same as CMAKE_HOST_SYSTEM_PROCESSOR:
https://cmake.org/cmake/help/latest/variable/CMAKE_SYSTEM_PROCESSOR.html
KleidiAI uses CMAKE_SYSTEM_PROCESSOR to determine whether to include
some arm64 ukernels:
https://gitlab.arm.com/kleidi/kleidiai/-/blob/main/CMakeLists.txt#L134
We use a Mac with an Intel CPU to cross-compile for Mac with ARM in the iOS
packaging pipeline, so we need to make CMAKE_SYSTEM_PROCESSOR the same as
ORT_TARGET_PROCESSOR.
### Description
When K == 0, output an MxN matrix filled with the bias if present, or filled
with zeros.
This brings it in line with MatMul behavior, especially when Gemm is used
to fuse MatMul with Add.
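A numpy sketch of the intended behavior (illustration, not the kernel code):

```python
import numpy as np

def gemm_k_zero(M, N, bias=None, dtype=np.float32):
    # A is (M, 0) and B is (0, N); their product contributes nothing,
    # so the output is the broadcast bias, or zeros when no bias is given.
    out = np.zeros((M, N), dtype=dtype)
    if bias is not None:
        out = out + bias
    return out

# Matches numpy: an (M, 0) @ (0, N) matmul yields an all-zero (M, N) matrix.
assert np.array_equal(np.zeros((3, 0)) @ np.zeros((0, 4)), gemm_k_zero(3, 4))
```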


### Motivation and Context
* Comply with the numpy spec of MatMul.
* Address the case when empty initializers are used for computation.
### Description
* Add MultiHeadAttention fusion for SAM2.
* Add LayerNormalization fusion for NCHW format by inserting a Transpose
from NCHW to NHWC before layer normalization, and another Transpose
after layer norm to convert NHWC back to NCHW (a sketch of this pattern
follows the list). Hopefully, those extra
Transpose nodes will be removed when prefer_nhwc is enabled later.
* Add a condition that the input shall be 3D when fusing SkipLayerNorm.
* Update convert_to_onnx.py to add `--optimize` and `--use_gpu` options
to output an optimized onnx model for CPU/CUDA EPs.
* Add an option `--dtype fp16|fp32` in convert_to_onnx.py to support
converting the optimized model to float16.
* Update the demo to use the optimized onnx models.
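A numpy sketch of the transpose-wrapped LayerNormalization pattern mentioned above (an illustration of the fused graph's math, not the fusion code):

```python
import numpy as np

def layernorm_nchw_via_nhwc(x, gamma, beta, eps=1e-5):
    # x is NCHW; move C last (NHWC) so LayerNorm normalizes over channels,
    # then transpose back to NCHW after normalization.
    x_nhwc = np.transpose(x, (0, 2, 3, 1))
    mean = x_nhwc.mean(axis=-1, keepdims=True)
    var = x_nhwc.var(axis=-1, keepdims=True)
    y = (x_nhwc - mean) / np.sqrt(var + eps) * gamma + beta
    return np.transpose(y, (0, 3, 1, 2))

x = np.random.rand(1, 8, 4, 4).astype(np.float32)
out = layernorm_nchw_via_nhwc(x, np.ones(8, np.float32), np.zeros(8, np.float32))
assert out.shape == x.shape
```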

### Motivation and Context
To support optimization of SAM2 (exported in
microsoft#22119) for CPU/CUDA EPs.
### Description
Add a benchmark script for Segment Anything v2 (SAM2).
It depends on microsoft#22119 for
onnx export, and microsoft#22167 for
sam2 graph fusion.

### Motivation and Context

Benchmark SAM2 model performance.
…n logger from that session (microsoft#22170)

### Description
Fix an issue where QNN models shared from another session also use the session logger from that producer session, which causes confusion. Make the QNN model compute function use the session logger from the current session.
…osoft#22056)

### Description

Specify the path of `ar`, `ld` and `libtool` when building the Apple
framework.


### Motivation and Context
Sometimes non-system executables come before the system-provided
ones. This PR intends to prevent that from happening.