Integrate ONNX 1.22.0 (opset 27) - issue #28752 (#28754)
… tradition
When an op's kernel set changes for the new opset (e.g. Range gaining fp16/bf16 at opset 27), version-split / bump that op's registration in EVERY EP that registers it (CPU and CUDA at minimum) in the SAME PR, so no EP silently lags behind CPU and the advertised opset boundaries stay consistent. Even an open-ended kernel that already binds the new opset (e.g. CUDA Range at SinceVersion(11)) should still be version-split for convention/clarity. Worked example cited: PR microsoft#28754 split Range [11,26]+27 in both CPU and CUDA (verified).
- §1 Group B: added the all-EP tradition callout + an EP checklist (grep each provider dir; split cpu/cuda/js/rocm macro registrations; assess dml/webgpu/coreml/nnapi/etc. per their own systems; bump coreml/nnapi/vsinpu/webnn GetMaxSupportedOpSet ceilings).
- §11: added a cross-EP consistency convention note distinguishing splitting (clarity) from binding-coverage (correctness).
- §6 checklist: Group B line now calls out version-splitting in every EP.
Agent-signed-off: Architect (bcad189c) [claude-opus-4.8 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-signed-off: Architect (bcad189c) [claude-opus-4.6 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
34b6359 to 06ced99
…2 (PR microsoft#28754 CI Issue B)
ONNX 1.22's onnx_core(OBJECT)/onnx_proto target split lists the generated ${ONNX_PROTO_SRCS} (.pb.cc) as compiled sources in BOTH targets (by design: onnx_core's objects feed onnx + onnx_cpp2py_export with hidden visibility, while onnx_proto is a standalone static lib with different defines). Xcode's new build system forbids two targets independently producing the same onnx-data.pb.cc. Extend the existing onnx_proto_gen custom target to also DEPEND on ${ONNX_PROTO_SRCS}, making it the single owner of the protoc generation step. Both libraries already depend on onnx_proto_gen, so they now consume the pre-generated sources (and still each compile their own object). This is CMake's documented add_custom_target-driver pattern for multi-target generated outputs (add_custom_command docs) and changes no defines/visibility, so it is safe for the normal Make/Ninja build.
- cmake/patches/onnx/onnx.patch: new CMakeLists.txt hunk (now 2 files / 4 hunks)
- cmake/vcpkg-ports/onnx/binskim.patch: byte-identical mirror (verified sha256)
- onnx-opset-bump-checklist SKILL.md §7: hunk-count annotation 3 -> 4
Verified on Linux (Ninja, Debug): patch applies clean (git apply + patch -p1); onnx + onnx_proto configure, proto generates, both targets compile onnx-ml.pb.cc, libonnx.a + libonnx_proto.a link. Xcode itself is CI-verified only (no macOS host) - iOS_CI_on_Mac + React-Native-iOS are the oracles.
Agent-signed-off: Architect (bcad189c) [claude-opus-4.6 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…icrosoft#28754 CI Issue B facet 2) ONNX 1.22 splits onnx into onnx_core(OBJECT)/onnx_proto; the aggregate `onnx` target is add_library(onnx $<TARGET_OBJECTS:onnx_core>) with no real sources. Xcode's generator won't archive a static lib whose only sources are $<TARGET_OBJECTS:...>, so no libonnx.a is emitted and ORT's iOS framework link fails. Guard the onnx target with if(CMAKE_GENERATOR STREQUAL "Xcode") to add a generated empty dummy source forcing the archive; else()-branch leaves Make/Ninja/MSVC byte-unchanged. Mirrored byte-identically into binskim.patch; skill §7 hunk count updated 4->5. Agent-signed-off: Architect (bcad189c) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
20b4010 to 85a11f2
…28754 CI Issue B facet 3) The onnx_core(OBJECT) aggregate consumed via $<TARGET_OBJECTS:onnx_core> gives the Xcode generator no build-order guarantee, so onnx's libtool archive step races ahead of onnx_core's compilation and fails with 'libtool: can't open file: onnx_core.../defs.o (No such file or directory)'. Under the Xcode guard, build libonnx.a from the generated empty source and link onnx_core via target_link_libraries(onnx PRIVATE onnx_core) instead of loose TARGET_OBJECTS: cmake records a proper target-level dependency (onnx_core compiles first) and still archives every onnx_core object into libonnx.a, so ORT's ar -x over $<TARGET_FILE:onnx> extracts the full symbol set. else()-branch leaves Make/Ninja/MSVC byte-unchanged. Mirrored byte-identically into binskim.patch; skill §7 annotation updated. Agent-signed-off: Architect (bcad189c) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
017edb3 to 85a11f2
… leniency)
ONNX 1.22 (rc2, cherry-pick microsoft#8051) tightened convTransposeShapeInference to fail_shape_inference when an output_shape/output_padding attribute size does not match the number of spatial dimensions. ORT historically also accepted a non-spec rank+2 (full N,C,H,W) output_shape form. This is Option A: instead of patching ONNX to restore the leniency (Option B, commit 031c777), conform ORT's own test models to the spec so the onnx.patch ConvTranspose hunks never land in main.
Changes:
- conv_transpose_op_test.cc: 10 output_shape attributes -> spatial-only (N,C prefix dropped; Y_shape/expected_vals unchanged). Keep B's InvalidKernelShape expected-message fix (ONNX rejects kernel_shape rank at Graph::Resolve). Add a new InferenceSession-based regression test (ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime) that feeds an unknown-rank input so the kept rank+2 kernel branch stays exercised.
- xnnpack_basic_test.cc: 1 output_shape attribute -> spatial-only.
- conv_transpose_attributes.h: keep the rank+2 toleration (runtime-reachable for dynamic-rank inputs) and document why it is retained; no behavior change.
- onnx.patch / binskim.patch: unchanged at the 3-hunk base (no ConvTranspose reverts) since this branch is built on the pre-B base.
Breaking change: models specifying a rank+2 output_shape AND a statically-known input rank now fail to load under ONNX 1.22 with 'Attribute output_shape has incorrect size'. Migration: use spatial-only output_shape. See PR microsoft#28754 description.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
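The migration named above (rank+2 output_shape to spatial-only) can be sketched as a small conversion. This is an illustration only: `ToSpatialOnlyOutputShape` is a hypothetical helper, not an ORT or ONNX function, and it assumes the input rank of X is statically known.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical migration helper (not part of ORT or ONNX): converts a
// ConvTranspose output_shape attribute to the spatial-only form.
// input_rank is the statically-known rank of X, e.g. 4 for N,C,H,W.
std::vector<int64_t> ToSpatialOnlyOutputShape(const std::vector<int64_t>& output_shape,
                                              std::size_t input_rank) {
  if (input_rank < 3) {
    throw std::invalid_argument("ConvTranspose input needs at least one spatial dimension");
  }
  const std::size_t spatial_rank = input_rank - 2;
  if (output_shape.size() == spatial_rank) {
    return output_shape;  // already spatial-only, nothing to do
  }
  if (output_shape.size() == spatial_rank + 2) {
    // Legacy rank+2 form: drop the leading N and C entries.
    return std::vector<int64_t>(output_shape.begin() + 2, output_shape.end());
  }
  throw std::invalid_argument("output_shape has incorrect size");
}
```

For a 4-D input, an attribute of {1, 3, 8, 8} would become {8, 8}, while an attribute that is already {8, 8} passes through unchanged.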
031c777 to 2f63b0a
01b9673 to e72e1de
Pin ONNX to rel-1.22.0 HEAD commit bc3be77bec2f628788796dff60819186bacf49df (VERSION_NUMBER 1.22.0rc1):
- cmake/deps.txt: commit-archive zip URL + SHA1 421e5a9afb6c41a54696e424e5b9a3796aab6821
- cmake/external/onnx: submodule -> bc3be77b
- cmake/vcpkg-ports/onnx/vcpkg.json: version-semver 1.22.0, port-version 0
- cmake/vcpkg-ports/onnx/portfile.cmake: REF commit form + tar.gz SHA512 e0c526f5...3ce467
Rebase cmake/patches/onnx/onnx.patch to ONNX 1.22 and mirror byte-identically into binskim.patch:
- Kept ONNX_MINIMAL_BUILD option (rebased context)
- Restructured the minimal-build source-selection hunk for ONNX 1.22's new onnx_core OBJECT library / add_subdirectory(onnx) layout
- Dropped the Utils.cmake protobuf_warnings hunk (already removed upstream in 1.22)
- Kept the GroupNormalization-18 .Deprecate() removal (still present in 1.22)
Agent-signed-off: Developer (8ac66e2a) [claude-opus-4.6 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ONNX 1.22.0rc1 is published on PyPI (verified via pip index versions onnx --pre), so all 7 requirements files use the published wheel pin onnx==1.22.0rc1 rather than the git source pin. Agent-signed-off: Developer (0ede529c) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…o 27 and register Range at opset 27
- optimizer_api.h: kMaxSupportedOpset 26 -> 27
- Range CPU kernel split into VERSIONED [11,26] + new non-versioned [27] reusing the existing kernel (Range-27 adds fp16/bf16 types and a stash_type attr; the existing common numeric types bind, fp16/bf16 is a deferred enhancement)
- cpu_execution_provider.cc: versioned the Range forward-declare and BuildKernelCreateInfo entries and added the opset-27 Range registration
Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…doc (ONNX 1.22.0rc1, issue microsoft#28752)
- onnx_backend_test_series_filters.jsonc: exclude deferred opset-27 ops CausalConvWithState and LinearAttention (whole families, base + _expanded), and the Range-27 float16/bfloat16 node tests (ORT CPU Range-27 reuses the existing kernel registration which supports only the common numeric types; fp16/bf16 is a tracked follow-up). float/int32 Range tests remain enabled.
- js/web/docs/webgl-operators.md: regenerated via npm run build:doc; now lists CausalConvWithState and LinearAttention as unsupported WebGL ops, reflecting the opset-27 schemas pulled in by the submodule bump.
Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ONNX 1.22 re-registers Range with an opset-27 schema (adds stash_type attr, widens type constraints; signature unchanged: 3 scalar inputs -> 1-D output).
- gather_fusion.cc: add 27 to GatherToSlice Range version list {1,11}->{1,11,27}; add opset-27 GatherToSliceFusion test.
- embed_layer_norm_fusion.cc: add 27 to the two Range path matchers (parent_path_3/4) so the embedding fusion still matches opset-27 models.
- coreml/nnapi/vsinpu/webnn base_op_builder.h: bump default GetMaxSupportedOpSet 25->27, matching the lockstep convention of prior ONNX-integration PRs (microsoft#26579, microsoft#25678, microsoft#24449) so opset-27 nodes are not spuriously rejected.
qdq_util.cc / layout_transformation_potentially_added_ops.h / kernel_type_str_resolver_utils.cc were checked and need no change (none reference the opset-27 ops Range/CausalConvWithState/LinearAttention).
Agent-signed-off: Developer (0ede529c) [claude-opus-4.6 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Completes the T6 opset-27 sweep that the prior commit (c6366eb) missed for these two files. ONNX 1.22 re-registers Range with an opset-27 schema (operator_sets.h line 1460); find_optimizer_opset_version_updates_required.py flagged: 'Newer opset found for kOnnxDomain.Range. Latest:27 Optimizer support ends at 11. File: gather_fusion.cc'
- gather_fusion.cc:309: GatherToSlice Range version list {1,11}->{1,11,27}. Range-27's signature is unchanged (3 scalar inputs -> 1-D output; only adds stash_type attr and wider type constraints), and the fusion depends only on that signature, so extending the accepted SinceVersion list is safe.
- graph_transform_test.cc: new OpSet-27 (int64) GatherToSliceFusion block mirroring the existing OpSet-12/OpSet-14 blocks, proving Range-27 -> Gather fuses to Slice.
Agent-signed-off: Developer (0ede529c) [claude-opus-4.6 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…0rc1) Update the CPU ai.onnx Range entry to reflect T2's split registration: a non-versioned opset-27 kernel (27+) plus the versioned [11, 26] kernel, matching the output of tools/python/gen_opkernel_doc.py against a fresh RelWithDebInfo build. The doc was updated surgically rather than via a full regen because this validation build is CPU-only (no CUDA/DML EPs available on this Linux host; DML is Windows-only). A full regen would have destructively dropped the CUDA and DmlExecutionProvider sections. The CPU-section delta was verified to be exactly this Range change by diffing the freshly generated CPU section against the committed doc. Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…WithState/LinearAttention/Range-fp16-bf16 exclusions (ONNX 1.22.0rc1, issue microsoft#28752) T7 validation showed all 62 opset-27 backend node tests (the complete set: base + _expanded, including fp16/bf16) pass on the CPU EP. These ops are ONNX function ops (and Range-27 carries a function body), so ORT expands them into primitive nodes at partition time and executes correctly despite no native kernel and the CPU Range-27 kernel registering only common numeric types. Verified via onnx_test_runner -e cpu (62/62 succeeded, output-compared) under ALLOW_RELEASED_ONNX_OPSET_ONLY=0. Removing the global filters restores backend-test coverage per design-review Minor-1. Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s work today via ONNX function-body expansion (ONNX 1.22.0rc1, issue microsoft#28752) The prior comment said float16/bfloat16 'support is a follow-up enhancement', which a maintainer could misread as fp16/bf16 Range being broken. Clarify that such models execute correctly today because Range-27 carries an ONNX function body that ORT expands at graph-partition time; the follow-up is an efficient native kernel, not a functional fix. Comment-only change; verified onnxruntime_providers recompiles and all 20 GatherToSliceFusion/EmbedLayerNormFusion optimizer tests pass. Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…27 filter comment (ONNX 1.22.0rc1, issue microsoft#28752) The opset-27 validation only exercised the CPU EP. That limitation was previously only implicit (via the 'if a non-CPU EP fails in CI' clause). Add an explicit NOTE so a future maintainer cannot miss that GPU/CUDA/DML EPs were not exercised in this validation env. Comment-only; JSONC still parses (302 entries unchanged). Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…message Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…PSET_ONLY legs Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…efault Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Re-pin the ONNX C++ source dependency to rel-1.22.0 HEAD (onnx/onnx@b124e0188a, VERSION_NUMBER 1.22.0rc2), which carries the upstream Xcode/iOS CMake fix onnx#8056 (cherry-picked via onnx#8066). With the fix upstream, cmake/patches/onnx/onnx.patch collapses back to the 3 ORT-only hunks (ONNX_MINIMAL_BUILD option + minimal-build source branch rebased onto rc2's post-8056 onnx-target layout, and the GroupNormalization-18 .Deprecate() removal); binskim.patch mirrors it byte-for-byte.
- cmake/deps.txt: onnx archive zip + SHA1 -> rc2
- cmake/external/onnx: submodule pointer -> b124e0188a
- cmake/vcpkg-ports/onnx/portfile.cmake: REF + tar.gz SHA512 -> rc2
- cmake/patches/onnx/onnx.patch + binskim.patch: regenerated for rc2
rc1..rc2 source delta (bc3be77be..b124e0188a, 3 commits): besides the Xcode/CMake restructure (onnx#8056) and the version bump, the range also carries runtime-touching but schema-NEUTRAL hardening that is compiled into ORT:
- onnx#8051 (via microsoft#8058): Conv/Pool/RoiPool/ConvTranspose shape-inference guards (reject <min-rank inputs, non-positive dilations/kernel/strides, negative pads) plus a behavior-identical auto_pad residual fix.
- onnx#8066 cherrypicks: onnx/checker.cc raw_data size validation for packed sub-byte tensors, and a Conv weight/input spatial-rank guard.
There are ZERO operator-schema/opset/type-constraint/attribute changes in the range, so these deltas only reject previously-malformed inputs and do not change results for any valid model.
The 7 CI requirements.txt files intentionally stay at onnx==1.22.0rc1: the rc2 wheel is not yet on PyPI, and the iOS/Xcode build consumes the GitHub source archive (deps.txt), not the wheel. Given the schema-neutral delta above, the rc1 wheel stays functionally compatible; bump to rc2 once ONNX publishes the wheel.
Validated: minimal-build extended gate passes (ONNX_MINIMAL_BUILD=ON); onnx.patch applies cleanly to the rc2 source; binskim.patch byte-identical.
Agent-signed-off: Developer (a478a765) [claude-opus-4.8 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ror gotcha in onnx-opset-bump-checklist The vcpkg asset cache runs with x-block-origin (no GitHub fallback), so a missing blob hard-fails every --use_vcpkg leg with a 404. Each archive bump (rc1->rc2->formal) is a new SHA512 = new un-mirrored blob; a green rc1 run doesn't mean rc2 is mirrored. Added a read-only curl probe and the bare-SHA512 blob-key detail. Agent-signed-off: Architect (f1afcb8a) [claude-opus-4.8 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…for vcpkg asset-mirror gotcha Extends the onnx-opset-bump-checklist mirroring section with: (h) the exact vcpkg log signature (404 + 'x-block-origin set' on a /artifacts/<sha512> URL, vcpkg legs fail while FetchContent legs pass); (i) ordered fix options (Terrapin self-seed Windows leg / az blob upload under bare-SHA512 name / EngSys ticket) with a verify-via-curl-200 step. References the architect rc2 mirror runbook artifact. Agent-signed-off: Architect (f1afcb8a) [claude-opus-4.8 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ONNX 1.22.0rc2 tightened convPoolShapeInference to reject Conv inputs
with rank < 3 at model load ("Input tensor must have at least 3
dimensions"). test_op_split.py declared a spec-invalid rank-2 input
[6,3] feeding Conv (weight also rank-2), which rc2 now rejects, breaking
test_quantize_split and test_quantize_split_s8s8 on 16 CI legs.
Conform the test model to the ONNX Conv spec (Option-A philosophy):
- input [6,3] -> NCHW [1,1,6,3]
- conv_weight [6,3] -> 4D [2,1,1,1] (M=2 output channels, 1x1 kernel)
- data_reader feed [6,3] -> [1,1,6,3]
Conv output is now [1,2,6,3] = 36 elements, unchanged downstream:
reshape -> [3,12], Split(axis=0, [1,1,1]) -> three [1,12] outputs.
Op-type-count assertions are unaffected (only ranks changed).
Validated against an rc2-linked onnxruntime build: full quantization
unittest suite (314 tests) passes; the two previously-failing split
tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
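The shape arithmetic in the commit message above can be checked with a short sketch. It assumes a simplified Conv (stride 1, dilation 1, no padding, group 1); `ConvOutputShape` and `NumElements` are hypothetical helpers written for this illustration, not ONNX or ORT functions, and the rank check only mimics the rc2 error text quoted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int64_t>;

// Simplified Conv output shape under the stated assumptions:
// stride 1, dilation 1, no padding, group 1.
Shape ConvOutputShape(const Shape& x, const Shape& w) {
  if (x.size() < 3) {
    // Mirrors the wording of the rc2 shape-inference rejection.
    throw std::invalid_argument("Input tensor must have at least 3 dimensions");
  }
  if (w.size() != x.size()) {
    throw std::invalid_argument("weight rank must match input rank");
  }
  Shape y{x[0], w[0]};  // batch N, output channels M
  for (std::size_t i = 2; i < x.size(); ++i) {
    y.push_back(x[i] - w[i] + 1);  // spatial dim minus kernel dim plus one
  }
  return y;
}

// Product of all dimensions.
int64_t NumElements(const Shape& s) {
  return std::accumulate(s.begin(), s.end(), int64_t{1}, std::multiplies<int64_t>());
}
```

With input {1, 1, 6, 3} and weight {2, 1, 1, 1} the sketch yields {1, 2, 6, 3}, i.e. 36 elements, which matches the 3 x 12 reshape target; the old rank-2 input {6, 3} is rejected.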
…n + atol override for fp16 causal_conv
Two ONNX-1.22 backend node-test failures were unmasked once the python quant
suite stopped aborting before the OnnxBackendNodeModelTest stage:
1. test_attention_4d_softcap_neginf_mask_expanded (+ _poison_expanded) ERROR
on the macOS-arm64-webgpu Release backend-conformance leg: the ONNX
function-EXPANDED Attention reference decomposition trips a SizeFromDimension
underflow (dim = SIZE_MAX) downstream of the bias Add (prime suspect
softmax.cc:177). The native FUSED Attention kernel
(test_attention_4d_softcap_neginf_mask[_poison], no _expanded) passes on
every arch, and CPU EP passes the expanded tests on x64 + Linux-arm64 and in
the ONNX ReferenceEvaluator -- so this is not user-facing; only the expanded
reference graph trips on that build.
Placed in the GLOBAL current_failing_tests list (matching the existing
expanded-attention webgpu-skip precedent at L38/40/42), so the 2 _expanded
REFERENCE decompositions are skipped on all configs including CPU. Global was
chosen over the current_failing_tests_WEBGPU section for a deterministic green
that is independent of the supports_device('WEBGPU') runtime condition. The
FUSED production Attention kernel stays fully covered on every arch; only the
2 expanded reference variants stop running. Tracked for a follow-up fix.
2. test_causal_conv_with_state_silu_fp16 (+ _expanded) is a single-ULP fp16
tolerance miss on arm64 (max rel diff 0.001174). Given an atol override of
5e-4 (mirroring test_attention_4d_fp16) rather than a skip, so coverage is
retained on all platforms.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ow (microsoft#28969) The vectorized broadcast path counted trailing "shared" dimensions with a loop that treated an exhausted operand's implicit size-1 dim as a match against a literal 1. For unequal-rank operands with leading unit dims (e.g. Add lhs=[1,1,6,6] + rhs=[6,6]), num_shared_dimension grew past the smaller operand's rank, so lhs/rhs SizeFromDimension(rank - num_shared) underflowed (size_t wrap to SIZE_MAX) and tripped ORT_ENFORCE in TensorShape::SizeFromDimension, failing every WebGPU binary op at the [...,1,d,e] + [d,e] corner. Extract the shared-trailing-dim math into a deviceless free helper CountSharedTrailingDimensions that breaks as soon as EITHER operand runs out of real dimensions, bounding the count by min(lhs rank, rhs rank). The shared-dim product (and thus the divisible-by-4 vectorize decision) is unchanged for all existing cases; only the previously-underflowing corner is corrected. Add a deviceless gtest on the helper and an end-to-end OpTester regression on DefaultWebGpuExecutionProvider (Add [1,1,6,6]+[6,6]) that fails pre-fix with the SIZE_MAX SizeFromDimension enforce and passes post-fix. Verified locally against lavapipe software Vulkan; the full elementwise/broadcast suite (62 tests) stays green. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
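The difference between the two counting strategies described above can be sketched as follows. Both functions are reconstructions from the commit text, not the ORT source: `LegacyCountShared` is an assumed model of the old inline loop, and `CountSharedTrailingDims` is a hypothetical stand-in whose signature may differ from the real `CountSharedTrailingDimensions` helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Dims = std::vector<int64_t>;

// Assumed model of the pre-fix loop: an exhausted operand is padded with an
// implicit 1, so a literal leading 1 on the other side still counts as shared.
std::size_t LegacyCountShared(const Dims& lhs, const Dims& rhs) {
  const std::size_t max_rank = std::max(lhs.size(), rhs.size());
  std::size_t shared = 0;
  for (std::size_t i = 0; i < max_rank; ++i) {
    const int64_t l = i < lhs.size() ? lhs[lhs.size() - 1 - i] : 1;
    const int64_t r = i < rhs.size() ? rhs[rhs.size() - 1 - i] : 1;
    if (l != r) break;
    ++shared;
  }
  return shared;
}

// Sketch of the fixed behavior: stop as soon as either operand runs out of
// real dimensions, so the count never exceeds min(lhs rank, rhs rank).
std::size_t CountSharedTrailingDims(const Dims& lhs, const Dims& rhs) {
  const std::size_t limit = std::min(lhs.size(), rhs.size());
  std::size_t shared = 0;
  while (shared < limit &&
         lhs[lhs.size() - 1 - shared] == rhs[rhs.size() - 1 - shared]) {
    ++shared;
  }
  return shared;
}
```

For lhs {1, 1, 6, 6} and rhs {6, 6} the legacy model counts 4 shared dimensions, so `rhs.size() - 4` wraps around as an unsigned value; the bounded version counts 2. For shapes without leading unit dimensions, such as {2, 3, 4} and {3, 4}, both return 2.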
fixed) Remove the two global skips for test_attention_4d_softcap_neginf_mask_expanded and its _poison_expanded variant. They were added only to dodge the WebGPU binary-elementwise broadcast SizeFromDimension underflow (microsoft#28969), which is now fixed in this branch by the CountSharedTrailingDimensions helper. The expanded function-reference Attention tests can run again on every config. The fp16 causal_conv atol override in onnx_backend_test_series_overrides.jsonc is an independent tolerance fix and is intentionally left in place. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pe, filter-vs-override, latent bugs)
Add three gotchas to .agents/skills/onnx-opset-bump-checklist/SKILL.md:
- (j) the Linux webgpu CI leg is build-only; ONNX backend node tests (OnnxBackendNodeModelTest) only execute on the macOS-arm64 webgpu leg, so a green Linux webgpu leg does not mean WebGPU actually ran them.
- (k) a filter-vs-override decision rubric: filters.jsonc SKIPs for a real EP bug (cite issue + removal condition), overrides.jsonc RELAXes ATOL for benign fp16/ULP diffs (prefer over a skip when the kernel is correct), but only after root-causing the diff as ~1 ULP; unexplained/large/growing diffs are bugs.
- (l) new upstream reference tests can expose latent EP bugs (e.g. microsoft#28969, a WebGPU broadcast underflow surfaced by ONNX 1.22 expanded-Attention tests).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…on Linux without a GPU Add .agents/skills/webgpu-local-testing/SKILL.md covering how to build and run ONNX Runtime WebGPU provider tests on Linux using a software Vulkan adapter (Mesa lavapipe): why software Vulkan suffices for host-side enforce/shape bugs and MatMul-free kernels, dnf install on Azure Linux, the --use_webgpu build flag, the onnxruntime_provider_test target with VK_ICD_FILENAMES, the lavapipe MatMul-family crash gotcha, and the fact that the Linux webgpu CI leg is build-only. Scope is called out explicitly: any MatMul-containing graph (including the expanded-Attention node tests that motivated microsoft#28969) cannot run on lavapipe and is validated only on macOS-arm64 Metal; microsoft#28969 itself was validated on lavapipe via a standalone Add-broadcast OpTester proxy, not the expanded-Attention node test. The lavapipe ICD path is noted as arch-specific (x86_64 vs aarch64). Cross-reference the new skill from ort-test (running WebGPU tests locally) and ort-build (--use_webgpu key flag). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ONNX 1.22.0 was released 2026-06-15 (tag v1.22.0, commit 2bb50465). The rc2->final delta touched only CI workflow yml + VERSION_NUMBER -- no operator schema, opset, or backend-testdata change -- so this is pure version plumbing.
- cmake/deps.txt: onnx archive -> refs/tags/v1.22.0.zip (SHA1 2b2cd58a...)
- cmake/external/onnx submodule -> 2bb50465112feca9003e1ed654d77f01ff1415ca
- cmake/vcpkg-ports/onnx/portfile.cmake: REF v1.22.0 + tar.gz SHA512 13fafff0...
- 7 CI requirements.txt: onnx==1.22.0rc1 -> onnx==1.22.0 (now on PyPI); the 3 transformers-model requirements stay frozen at onnx==1.18.0.
- onnx.patch / binskim.patch unchanged (source identical rc2<->final; still apply).
- filters.jsonc integration comment: 1.22.0rc1 -> 1.22.0.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
e72e1de to 0a25d14
The *CurrentOpset fusion regression tests build/load models stamped at the
current ONNX opset (27 in ONNX 1.22), which is still under development while
opset 26 is the last released version. Under ORT's default strict load-time
validation (ALLOW_RELEASED_ONNX_OPSET_ONLY unset or '1'), loading such a model
throws, so these tests failed on every strict CI leg.
Pass ModelOptions{allow_released_opsets_only=false, strict_shape_type_inference=false}
through the model-construction/load path of these tests (mirroring the existing
GatherToSliceFusion opset-27 precedent) so they RUN and PASS on every leg,
strict or not, preserving opset-27 fusion coverage with no masking.
- 9 TestGraphTransformer calls (Gelu/FastGelu/BiasGelu/MatMulAdd/DivMul/QuickGelu,
LayerNorm/SkipLayerNorm, GQA-Qwen): append the ModelOptions argument.
- AttentionFusionMobileClipMhaCurrentOpsetTest (TransformerTester): thread an
optional ModelOptions through TransformerTester; load the serialized model via
the istream Load overload so allow_released_opsets_only is honored (the byte/
proto Load overloads hardcode it). No product-code change.
- 3 EmbedLayerNorm tests: pass ModelOptions to Model::Load in LoadModelAtCurrentOpset.
- ReshapeFusionOpsetTest (ENABLE_TRAINING-only): its opset loop includes the
current opset; apply the same ModelOptions to both TestGraphTransformer calls so
it runs on strict training builds too. Training-gated (validated by analogy to
the non-training pattern; not compiled in the default build).
Validated: 13 non-training tests PASS under default(strict), =1, and =0; full
onnxruntime_test_all under strict passes 1820 tests with no 'Opset 27 under
development' throw.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add WHY comments + tracking issue refs (microsoft#28966, and microsoft#28969 on the WebGPU attention-fusion path) to the ModelOptions{allow_released_opsets_only=false} call sites in the *CurrentOpset fusion tests, so a future reader knows they can be removed once ONNX opset 27 ships. No test logic or ModelOptions args change. Extend the onnx-opset-bump-checklist skill with three hard-won gotchas from the 1.22.0 integration: (m) the vcpkg MS-internal asset mirror must be Terrapin-seeded with the new tag tarball or every --use_vcpkg leg 404s; (n) a FINAL onnx release can still ship a map-max opset > last released opset (1.22.0: 27 > 26), leaving it under-development; (o) prefer per-model ModelOptions{allow_released_opsets_only=false} over per-leg CI env flips or GTEST_SKIP. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review summary (review-team pass): ONNX 1.22.0 / opset 27
Structurally matches the merged 1.21 bump (#27601): same version-plumbing, same versioned-split pattern for new-opset ops, same …
Actionable
1. (Minor / consistency) WebGPU …
2. (Minor / open question) bf16 Range-27 is untested; description slightly overstates …
Optional polish (readability, non-blocking)
Praise
Generated by a 5-reviewer pass (readability, code, critical, deep-spec, integration); the two Minor items above were re-verified against the source before posting. |
tianleiwu
left a comment
Review summary — ONNX 1.22.0 / opset 27 integration
Careful, well-documented version bump. I independently re-verified the risky
plumbing and the operator/optimizer/test changes; no blocking issues, only
optional nitpicks. Verdict: approve-leaning (commenting, not gating).
Independent verification (re-checked at this head)
| Check | Result |
|---|---|
| `cmake/external/onnx` submodule pin | `2bb50465112feca9003e1ed654d77f01ff1415ca` = v1.22.0 tag commit ✅ |
| `cmake/deps.txt` SHA1 of `v1.22.0.zip` | `sha1sum` of fresh download matches `2b2cd58a…` ✅ |
| `portfile.cmake` SHA512 of `v1.22.0.tar.gz` | `sha512sum` of fresh download matches `13fafff0…` ✅ |
| vcpkg MS asset mirror (portfile SHA512) | `…/artifacts/13fafff0…` → HTTP 200 (mirrored; `--use_vcpkg` legs won't 404) ✅ |
| `onnx.patch` ↔ `binskim.patch` | byte-identical (sha1 `6a4e6ed8…`) ✅ |
| requirements pins | all 7 CI files at 1.22.0; transformers files correctly left at 1.18.0 ✅ |
The onnx.patch rebase is sound: ONNX_MINIMAL_BUILD was re-expressed against
1.22's add_subdirectory(onnx) layout via target_sources(onnx PRIVATE … data_type_utils.cc),
the GroupNormalization-18 .Deprecate() removal is kept, and the now-upstreamed
Utils.cmake protobuf-warnings hunk is correctly dropped.
Correctness highlights
- Range opset-27 split (CPU + CUDA): versioned `[11,26]` + new `27`, matching forward-declares and `BuildKernelCreateInfo`. fp16/bf16 correctly fall through to ONNX function-body expansion; `stash_type` is irrelevant to the native path.
- WebGPU broadcast fix (#28969): `CountSharedTrailingDimensions` now breaks once either operand is exhausted, bounding the shared run by `min(lhs_rank, rhs_rank)`. Every divergence from the old inline loop is exactly a previously-underflowing case (`SizeFromDimension(rank − num_shared)` `size_t` wrap), so the fix is strictly safer. Deviceless unit test + end-to-end Add-broadcast coverage is thorough, and the Dawn-free header avoids a webgpu-provider link dependency in the CPU test TU.
- ConvTranspose `output_shape` conformance (breaking): rank+2 form now rejected at `Graph::Resolve` for static rank (onnx#5400); the retained kernel branch is correctly documented as runtime-reachable only for dynamic-rank inputs, and the new `ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime` test exercises exactly that path.
- Backend-test filters: `^test_flexattention_(?!.*expanded)` is correct given ONNX's `_cpu`/`_cuda` method-name suffix: it excludes base `test_flexattention_cpu` while preserving the `_expanded_ver26` variants.
- Test infra: threading `ModelOptions` + mirroring `strict_shape_type_inference` onto the session and loading via `std::istream` (so `allow_released_opsets_only` is honored) is the right way to exercise under-development opset 27 on strict legs. Each relaxed call site is annotated with #28966.
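
To make the filter behavior concrete, here is a minimal Python sketch of how a negative lookahead separates the base test from the expanded variants. The pattern is the one quoted in the backend-test filter bullet. Apart from `test_flexattention_cpu`, the test names are illustrative assumptions, and matching with `re.search` is assumed rather than taken from the actual harness.

```python
import re

# Pattern quoted in the review. The lookahead rejects any name that contains
# "expanded" anywhere after the "test_flexattention_" prefix.
FLEXATTENTION_FILTER = re.compile(r"^test_flexattention_(?!.*expanded)")


def is_excluded(test_name: str) -> bool:
    """Return True when the filter pattern matches, i.e. the test is excluded."""
    return FLEXATTENTION_FILTER.search(test_name) is not None


# Hypothetical method names following the `_cpu` / `_cuda` suffix convention.
names = [
    "test_flexattention_cpu",
    "test_flexattention_cuda",
    "test_flexattention_expanded_ver26_cpu",
    "test_flexattention_causal_expanded_ver26_cuda",
]
excluded = [n for n in names if is_excluded(n)]
```

Under these assumptions only the two base names land in `excluded`; every name containing `expanded` is left to run.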
Minor / optional (non-blocking)
- Spurious `1` in Range version lists. `{1, 11, 27}` in `gather_fusion.cc` and `embed_layer_norm_fusion.cc` carries a leading `1`, but Range has no opset-1 schema, so `1` can never match a node's `SinceVersion`. Harmless and pre-existing, but `{11, 27}` would be marginally cleaner since these lines are already being edited.
- Deferred follow-ups confirmed, not missed: JS Range stays open-ended at `11` (still matches opset-27 nodes), DML/ROCm unsplit, multi-EP `OperatorKernels.md` regen, and a native fp16/bf16 Range-27 kernel are all listed as explicit follow-ups in the description.
The new onnx-opset-bump-checklist and webgpu-local-testing skills capture the
exact gotchas hit here — a good durable artifact for the next bump.
…exec-11)
Test-only / docs-only readability polish on top of the opset-27 ModelOptions
fusion-test fix. No behavior change.
1. Extract the magic boolean used at every *CurrentOpset fusion-test site.
Introduce `constexpr bool kAllowReleasedOpsetsOnly` (in the shared
graph_transform_test_builder.h, namespace onnxruntime::test) and use it as
the ModelOptions::allow_released_opsets_only first argument at all 14 call
sites across graph_transform_test.cc, graph_transform_test_layernorm.cc,
group_query_attention_pre_norm_fusion_test.cc, and the GatherToSlice
precedent. The constant mirrors the ctor argument name exactly so each site
reads ModelOptions{kAllowReleasedOpsetsOnly, ...} (false = do not restrict to
released opsets, i.e. load models stamped at the not-yet-released opset).
strict_shape_type_inference=false behavior unchanged.
2. binary_elementwise_broadcast_utils.h: add Doxygen @param/@return docs to
CountSharedTrailingDimensions and rename the local dimA/dimB -> lhs_dim/
rhs_dim for clarity. Stays inline + Dawn-free (only tensor_shape.h);
behavior unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address tianleiwu's review nit on PR microsoft#28754: the Range op was introduced at ONNX opset 11 (there is no opset-1 Range schema), so the leading `1` in the `{1, 11, 27}` version lists is dead and never matches. Trim it to `{11, 27}`, keeping 27 so opset-27 Range nodes still match. Sites: - onnxruntime/core/optimizer/gather_fusion.cc (Range->Gather->Slice matcher) - onnxruntime/core/optimizer/embed_layer_norm_fusion.cc (two Range path-matchers) No behavior change: opset-1 Range never existed, so removing it cannot drop any real match; 11 and 27 are preserved. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thanks for the thorough review, @tianleiwu, and for independently re-verifying the version pins, vcpkg mirror,

Good catch on the spurious leading

The change is behavior-identical (exact-set version matching, the

The other items you flagged (JS Range left open-ended at 11, DML/ROCm unsplit,

Could you take another look and formally approve when you have a chance? Thanks!
tianleiwu
left a comment
Re-review at db8d8bbc (delta since my prior review at 80674c19)
Re-reviewed the two commits added since my last pass. The bump stays clean — no blocking issues from my side.
Resolved
- Spurious leading `1` in the `Range` version lists (my prior nitpick): now `{11, 27}` in `gather_fusion.cc` and `embed_layer_norm_fusion.cc`. Verified behavior-identical: `IsSupportedOptypeVersionAndDomain` does exact-set matching and `Range` has no opset-1 schema, so the `1` was unreachable.
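
A minimal Python sketch of why trimming the list is behavior-identical, assuming exact-set matching on a node's `SinceVersion`. The helpers below are hypothetical stand-ins, not ORT's `IsSupportedOptypeVersionAndDomain`; the Range schema versions 11 and 27 are the ones named in this thread.

```python
# Opsets at which a Range schema exists, per the review: 11 and 27
# (there is no opset-1 Range schema).
RANGE_SCHEMA_VERSIONS = (11, 27)


def since_version(model_opset: int) -> int:
    """Latest Range schema version <= the model's opset (hypothetical helper)."""
    candidates = [v for v in RANGE_SCHEMA_VERSIONS if v <= model_opset]
    if not candidates:
        raise ValueError(f"Range is not defined at opset {model_opset}")
    return max(candidates)


def matches(model_opset: int, accepted_versions) -> bool:
    """Exact-set match: the node's SinceVersion must be listed verbatim."""
    return since_version(model_opset) in set(accepted_versions)


old_list = [1, 11, 27]
new_list = [11, 27]

# Compare both lists over every model opset at which Range can appear.
same = all(matches(o, old_list) == matches(o, new_list) for o in range(11, 28))
```

The same sketch also shows why `27` had to be added in the first place: with exact-set matching, an opset-27 node resolves to `SinceVersion` 27 and is not covered by a list that stops at 11.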
Independent re-verification at this head
- `Range`-27 native kernel + `stash_type`: confirmed against the pinned ONNX 1.22.0 source (`onnx/defs/generator/defs.cc`, `Range`/27 + `BuildFunctionBodyRange27`). Opset 27 only adds `float16`/`bfloat16` to `T` and a `stash_type` attribute that is consumed only when `T ∈ {float16, bfloat16}` ("Has no effect for other types"). The native CPU/CUDA kernels keep the 5-type constraint `{float, double, int16, int32, int64}`, for which the opset-27 function body is identical to opset 11, so the `[11,26]` + `27` split is behavior-preserving and fp16/bf16 correctly route through ONNX function expansion. The kernel comments are accurate.
- Broadcast util refactor (`binary_elementwise_broadcast_utils.h`): the doc + `lhs_dim`/`rhs_dim` rename does not change the loop's underflow-safe bound `min(lhs rank, rhs rank, output_rank-1)`. Still correct.
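
For readers who have not opened the header, a rough Python sketch of the bounded loop and of the wrap it avoids. This is an illustration under stated assumptions, not the code in `binary_elementwise_broadcast_utils.h`: the function name is a Python-style stand-in, the additional `output_rank-1` cap is omitted, and `size_t_sub` is a hypothetical helper that models 64-bit `size_t` subtraction.

```python
SIZE_MAX = 2**64 - 1  # models a 64-bit size_t


def count_shared_trailing_dimensions(lhs, rhs):
    """Count equal trailing dims, stopping once either shape is exhausted."""
    shared = 0
    while shared < min(len(lhs), len(rhs)):
        lhs_dim = lhs[len(lhs) - 1 - shared]
        rhs_dim = rhs[len(rhs) - 1 - shared]
        if lhs_dim != rhs_dim:
            break
        shared += 1
    return shared


def size_t_sub(a, b):
    """Unsigned subtraction with wraparound, as size_t arithmetic would do."""
    return (a - b) & SIZE_MAX


# With the bound in place, rank - num_shared can never go below zero.
rhs = (4,)
num_shared = count_shared_trailing_dimensions((2, 3, 4), rhs)
safe_offset = size_t_sub(len(rhs), num_shared)

# A count that overran the shorter operand would wrap instead.
wrapped_offset = size_t_sub(len(rhs), 2)
```

Because the count is capped at the shorter rank, `safe_offset` stays in range, while `wrapped_offset` shows the `SIZE_MAX` value an unbounded count would feed into a `SizeFromDimension`-style call.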
Confirming the two minor items from your review-team pass (non-blocking)
- WebGPU `Range` open-ended at `.SinceVersion(11)` (`webgpu/generator/range.cc:96`, not in this diff): I verified it is behavior-correct for its `T ∈ {float, int32, int64}` constraint. None of those types are touched by the opset-27 change and fp16/bf16 aren't registered there, so the `SinceVersion(11)` kernel that now also serves opset-27 nodes produces identical results. Purely a consistency/clarity gap; agree it belongs in the follow-up list alongside the JS/DML deferral (or split now). Not a blocker.
- bf16 `Range`-27 test coverage / description wording: agree. The backend filter excludes both the base and `_expanded` bf16 variants, so bf16 has no passing test (fp16 does, via function expansion). Softening the description's "fp16/bf16 … pass" or adding a C++ `OpTester` bf16 case would make it precise. Minor.
The named-constant + broadcast-doc polish in d00fd69f reads well.
Verdict: no blocking issues; the remaining items are documented, non-blocking follow-ups.
Integrate ONNX 1.22.0rc1 (opset 27)

Resolves #28752.

Pin: `onnx/onnx@bc3be77bec2f628788796dff60819186bacf49df` (`VERSION_NUMBER` `1.22.0rc1`).

ONNX 1.21.0 → 1.22.0rc1. Max ai.onnx opset 26 → 27. IR version unchanged (13 / 0x0D).

This is the RC validation phase of an incremental integration (same strategy as the ONNX 1.21 bump, #27601). The formal `v1.22.0` GitHub release is still a draft (no git tag yet), so re-pinning to the released tag is deferred to Phase 2 (see Follow-ups). Landing the RC now validates ONNX 1.22 against ORT before ONNX publishes the formal release.

Update: ONNX 1.22.0 FINAL re-pin + rebase onto `upstream/main` + closes #28969

ONNX published the formal `v1.22.0` GitHub release, so this PR is re-pinned rc2 → FINAL (`onnx/onnx@v1.22.0`), the Phase-2 step deferred in the rc1 description below. The branch was also rebased onto `upstream/main` to pick up the intervening optimizer/opset-26 work. The released tag tarball is a different asset hash than the RCs, so the vcpkg MS-internal asset mirror was re-seeded for the final tag (otherwise `--use_vcpkg` legs 404).

Also closes #28969 (WebGPU binary-elementwise broadcast `SIZE_MAX` underflow). ONNX 1.22's expanded-Attention reference tests exposed a latent WebGPU bug where a broadcast shape computed `dim - 1` on a zero/unit dimension and underflowed to `SIZE_MAX`; the fix is included here and the previously-skipped reference tests are re-enabled.

Opset-27 `*CurrentOpset` test handling. ONNX 1.22.0 FINAL ships `DomainToVersionRange` map-max 27 while the last released opset is 26, so opset 27 stays under development for the whole 1.22 cycle. Strict legs (the default, or `ALLOW_RELEASED_ONNX_OPSET_ONLY=1`) therefore throw "Opset 27 under development" at model load on every `*CurrentOpset` fusion test that builds at the max opset. These tests now load with per-model `ModelOptions{/*allow_released_opsets_only*/ false, /*strict_shape_type_inference*/ false}`, extending the existing `38f17243b` / GatherToSlice precedent to the rest of the `*CurrentOpset` suite. This is leg-agnostic (exercises opset 27 on every CI leg, not just the relaxed ones) and preserves opset coverage (vs. `GTEST_SKIP`). Each call site is annotated with a one-line WHY + tracking issue (#28966) so the relaxation can be removed once opset 27 is released.

Resolves #28752 (unchanged). Closes #28969.

Update: ONNX 1.22.0rc2 re-pin + ConvTranspose conforms to ONNX `output_shape` spec

Since the original rc1 description below, this PR was re-pinned rc1 → rc2 (`onnx/onnx@b124e0188a`, `VERSION_NUMBER 1.22.0rc2`) to pick up the upstream Xcode/iOS CMake fix (onnx#8056). rc2 also carries onnx#8051, which tightened `convTransposeShapeInference` to reject an `output_shape`/`output_padding` whose size does not match the number of spatial dimensions (per the ONNX spec clarification onnx#5400). ONNX Runtime now conforms to that spec instead of patching ONNX to preserve a non-standard form.

`output_shape` now follows the ONNX spec (spatial dimensions only). ORT previously also accepted a non-standard `rank + 2` form that included batch and channel, i.e. `(N, C, H, W)`. As of ONNX 1.22, a `rank + 2` `output_shape` on a ConvTranspose whose input has a statically-known rank is rejected at `Graph::Resolve` with "Attribute output_shape has incorrect size". Migration: specify `output_shape` with spatial dimensions only, e.g. `{1, 1, 1, 14}` → `{1, 14}` (batch and channel are always inferred from the input and weight, so results are identical; the kernel ignores `N, C`). Models whose ConvTranspose input has a dynamic/unknown rank are unaffected: ONNX skips the size check and ORT computes the same result (covered by the new `ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime` test).

Patch inventory (supersedes "2 files, 3 hunks" below). `cmake/patches/onnx/onnx.patch` (and its byte-identical `binskim.patch` mirror) carries only the `ONNX_MINIMAL_BUILD` option hunk and the GroupNormalization-18 `.Deprecate()` removal (no ConvTranspose hunks). rc2's strict shape-inference check is kept as-is; ORT's own test models were conformed to the spec. The upstream archive hash, `deps.txt`, `portfile.cmake`, `vcpkg.json`, and the submodule pin are unchanged.

Additional rc2 test conform. rc2 also tightened `convPoolShapeInference` to reject `Conv` inputs with rank < 3 ("Input tensor must have at least 3 dimensions"). The hand-authored model in `onnxruntime/test/python/quantization/test_op_split.py` declared a spec-invalid rank-2 `Conv` input/weight; it was conformed to a valid NCHW shape (`[6, 3]` → `[1, 1, 6, 3]`, weight → `[2, 1, 1, 1]`), keeping the quantized-Split graph and expected outputs identical. No ORT source change.

What changed (29 files)
Version plumbing

- `cmake/deps.txt`: onnx archive URL → rc1 commit zip + SHA1 `421e5a9afb6c41a54696e424e5b9a3796aab6821`.
- `cmake/external/onnx`: submodule → `bc3be77b`.
- `cmake/vcpkg-ports/onnx/portfile.cmake`: `REF` commit form + tar.gz SHA512 `e0c526f5…3ce467`.
- `cmake/vcpkg-ports/onnx/vcpkg.json`: `version-semver` `1.22.0`, `port-version` `0`.
- `cmake/patches/onnx/onnx.patch` + `cmake/vcpkg-ports/onnx/binskim.patch`: byte-identical rebase onto 1.22 (2 files, 3 hunks): kept the `ONNX_MINIMAL_BUILD` option (restructured for 1.22's new `onnx_core` OBJECT-lib / `add_subdirectory(onnx)` layout) and the GroupNormalization-18 `.Deprecate()` removal; dropped the `Utils.cmake` protobuf-warnings hunk (already merged upstream in 1.22).

Opset-27 op enablement (Range)

- `onnxruntime/core/providers/cpu/generator/range.cc`: split into versioned `[11, 26]` + a new unversioned `27` registration. The opset-27 kernel natively supports the existing common numeric types (float/double/int16/int32/int64). fp16 Range is covered via ONNX's Range-27 function body, which ORT expands into primitive ops at partition time. bf16 Range is deferred to that same function expansion: there is no native bf16 kernel, and its bf16 reference node test (`test_range_bfloat16_type_positive_delta`, base + `_expanded`) is not exercised by the Python/numpy ONNX backend series, whose harness cannot materialize bf16 (`Numpy_type 256`); a native fp16/bf16 kernel + `stash_type` handling is a follow-up (efficiency, not correctness).
- `onnxruntime/core/providers/cpu/cpu_execution_provider.cc`: versioned the Range forward-declare + `BuildKernelCreateInfo` entries and added the opset-27 registration.
- `[11, 26]` + opset-27 split as CPU (`onnxruntime/core/providers/cuda/generator/range.cc` + `cuda_execution_provider.cc`); GPU-verified locally: `onnx_test_runner -e cuda` 8/8 opset-27 Range node tests pass, native Range-27 placed on CUDAExecutionProvider (fp16/bf16 via function expansion).

Optimizer / EP opset ceilings

- `…/transpose_optimization/optimizer_api.h`: `kMaxSupportedOpset` 26 → 27.
- `coreml`/`nnapi`/`vsinpu`/`webnn` `base_op_builder.h`: `GetMaxSupportedOpSet()` 25 → 27 (upper guard only; per-op support checks still gate; these EPs gain no new kernels here).

Fusion updates

- `onnxruntime/core/optimizer/gather_fusion.cc`: GatherToSlice Range version list `{1,11}` → `{1,11,27}`.
- `onnxruntime/core/optimizer/embed_layer_norm_fusion.cc`: add `27` to the two Range path-matchers (`parent_path_3/4`) so embedding fusion still matches opset-27 models.
- `onnxruntime/test/optimizer/graph_transform_test.cc`: new opset-27 GatherToSliceFusion test.

Requirements (7 bumped)

- `requirements.txt` → `onnx==1.22.0rc1` (rc1 wheel is on PyPI). The 3 transformers pins remain frozen at `1.18.0` (unrelated to this bump; intentionally untouched).

Generated docs / test data

- `js/web/docs/webgl-operators.md`: regenerated.
- `docs/OperatorKernels.md`: surgical edit of the CPU EP and CUDA EP Range rows (`27+` + `[11, 26]` continuation each); see caveats.
- `onnxruntime/test/testdata/onnx_backend_test_series_filters.jsonc`: comment-only; documents why no opset-27 CPU exclusions are needed (all opset-27 node tests pass via function expansion).

Docs

- `.agents/skills/onnx-opset-bump-checklist/SKILL.md`: new reusable checklist skill distilled from this integration. Now also documents the "bump all execution providers together" tradition (CPU + CUDA + JS/DML assessment in one pass) so future opset bumps don't ship a partial EP set.

Validation (CPU EP + CUDA EP, Linux x64)

- `--minimal_build extended` build ✅ (validates the rebased `ONNX_MINIMAL_BUILD` patch hunk independently of the vcpkg mirror path)
- `onnxruntime_test_all` ✅: 1595 passed / 0 failed
- `onnx_test_runner -e cpu` on the ONNX 1.22 opset-27 node tests ✅: 62/62 pass via ONNX function-body expansion (run with `ALLOW_RELEASED_ONNX_OPSET_ONLY=0`), including CausalConvWithState, LinearAttention, and fp16/bf16 Range, despite no native kernels for them.
- `--use_cuda` clean in both Debug and RelWithDebInfo ✅; `onnx_test_runner -e cuda` on the opset-27 Range node tests ✅: 8/8 pass, with native Range-27 placed on CUDAExecutionProvider (no CPU fallback) and fp16/bf16 covered via function-body expansion.

Standing caveats (please read before reviewing)

- `OperatorKernels.md` updated surgically (CPU Range row only). A CPU-only full regen would destructively wipe the CUDA/DML/other-EP sections (the generator only emits rows for the EPs in the built module). A correct multi-EP regen needs a build per EP and is a follow-up.
- `ALLOW_RELEASED_ONNX_OPSET_ONLY=0` (ORT CI already sets this). The opset-27 schemas are always compiled in from the submodule regardless; this gate only affects model load-time acceptance, not schema availability.
- `GetMaxSupportedOpSet` jumped 25 → 27 (skips 26). This is an upper guard only; raising it merely lets opset-26/27 nodes reach the per-op support checks that still gate correctness. No regression; it also retroactively un-caps opset-26 for these EPs.
- `onnx_core` OBJECT-library split in "Remove glob calls from ONNX CMake code" (onnx/onnx#7733) reintroduced the Xcode breakage originally fixed by "Revert ONNX CMake changes" (onnx/onnx#7515) for "Build failure with Xcode generator" (onnx/onnx#7514). This is NOT caused by this opset bump. Tracked upstream at onnx/onnx#8053. Non-Xcode builds (Linux/Windows/Android/WASM) and all CPU/CUDA validation are unaffected. This resolves at the Phase 2 formal `v1.22.0` re-pin once ONNX ships the fix.

Follow-ups (explicitly NOT in this PR)

- `OperatorKernels.md` across all EPs.
- `[11, 26]` + `27` split (currently registered open-ended at `11`; mirror the CPU/CUDA versioned split).
- `REG_INFO` registration system; assess whether an opset-27 entry is needed).
- `range.cc` registers `Range` `.SinceVersion(11)` open-ended, so it already claims opset-27 Range; only the new bf16 type is unsupported and falls back via the `T` type-constraint (function expansion). Mirror the CPU/CUDA versioned `[11, 26]` + `27` split.
- `CausalConvWithState` and `LinearAttention` kernels, and a native fp16/bf16 + `stash_type` Range-27 kernel (replace today's function-expansion path with efficient kernels).
- `v1.22.0` re-pin: re-pin `deps.txt`/submodule/portfile/requirements to the released tag once ONNX publishes it (currently blocked on ONNX tagging the release); upload the tag tarball to the vcpkg mirror. This also restores the iOS/macOS Xcode framework build once the upstream onnx OBJECT-library Xcode regression (caveat 5) is resolved and re-pinned.
- `find_optimizer_opset_version_updates_required.py` (placeholder `ver` parsed as int) so it can be relied on for future bumps.