Integrate ONNX 1.22.0 (opset 27) — issue #28752 by titaiwangms · Pull Request #28754 · microsoft/onnxruntime

titaiwangms · 2026-06-02T22:04:36Z

Integrate ONNX 1.22.0rc1 (opset 27)

Resolves #28752.

Pin: onnx/onnx@bc3be77bec2f628788796dff60819186bacf49df (VERSION_NUMBER 1.22.0rc1).
ONNX 1.21.0 → 1.22.0rc1. Max ai.onnx opset 26 → 27. IR version unchanged (13 / 0x0D).

This is the RC validation phase of an incremental integration (same strategy as the ONNX 1.21 bump, #27601). The formal v1.22.0 GitHub release is still a draft (no git tag yet), so re-pinning to the released tag is deferred to Phase 2 (see Follow-ups). Landing the RC now validates ONNX 1.22 against ORT before ONNX publishes the formal release.

Update — ONNX 1.22.0 FINAL re-pin + rebase onto `upstream/main` + closes #28969

ONNX published the formal v1.22.0 GitHub release, so this PR is re-pinned rc2 → FINAL (onnx/onnx@v1.22.0), which is the Phase-2 step deferred in the rc1 description below. The branch was also rebased onto upstream/main to pick up the intervening optimizer/opset-26 work. The released tag tarball has a different asset hash from the RC tarballs, so the vcpkg MS-internal asset mirror was re-seeded for the final tag (otherwise the --use_vcpkg legs fail with a 404).

Also closes #28969 (WebGPU binary-elementwise broadcast SIZE_MAX underflow). ONNX 1.22's expanded-Attention reference tests exposed a latent WebGPU bug where a broadcast shape computed dim - 1 on a zero/unit dimension and underflowed to SIZE_MAX; the fix is included here and the previously-skipped reference tests are re-enabled.
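The wraparound described above can be modelled in a few lines. This is only an illustration of unsigned underflow, assuming a 64-bit size_t; `wrapping_sub` and `guarded_last_index` are hypothetical helper names and do not correspond to the WebGPU EP's actual broadcast code.

```python
SIZE_MAX = 2**64 - 1  # assumption: size_t is 64 bits wide


def wrapping_sub(a: int, b: int) -> int:
    """Subtract the way an unsigned 64-bit integer does: modulo 2**64."""
    return (a - b) & SIZE_MAX


def guarded_last_index(dim: int) -> int:
    """Hypothetical guard: never subtract from a zero-sized dimension."""
    return dim - 1 if dim > 0 else 0


# A zero dimension wraps all the way around to SIZE_MAX.
print(wrapping_sub(0, 1) == SIZE_MAX)  # True
# A unit dimension yields 0, which is safe only if callers expect it.
print(wrapping_sub(1, 1))  # 0
```

The guarded variant shows one possible shape of a fix; the real change may differ.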

Opset-27 *CurrentOpset test handling. ONNX 1.22.0 FINAL ships DomainToVersionRange map-max 27 while the last released opset is 26, so opset 27 stays under development for the whole 1.22 cycle. Strict legs (the default, or ALLOW_RELEASED_ONNX_OPSET_ONLY=1) therefore throw "Opset 27 under development" at model load on every *CurrentOpset fusion test that builds at the max opset. These tests now load with per-model ModelOptions{/*allow_released_opsets_only*/ false, /*strict_shape_type_inference*/ false}, extending the existing 38f17243b / GatherToSlice precedent to the rest of the *CurrentOpset suite. This is leg-agnostic (exercises opset 27 on every CI leg, not just the relaxed ones) and preserves opset coverage (vs. GTEST_SKIP). Each call site is annotated with a one-line WHY + tracking issue (#28966) so the relaxation can be removed once opset 27 is released.
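The gating behaviour described above can be sketched as a small model. The two opset numbers come from this description; `ModelOptionsSketch` and `check_opset` are hypothetical names, the default flag values are assumptions, and only the "under development" message text is taken from the description. None of this is ORT's load-time code.

```python
from dataclasses import dataclass

LAST_RELEASED_OPSET = 26  # last released ai.onnx opset per this description
MAP_MAX_OPSET = 27        # DomainToVersionRange map-max per this description


@dataclass(frozen=True)
class ModelOptionsSketch:
    """Hypothetical stand-in for the two per-model flags named above."""
    allow_released_opsets_only: bool = True
    strict_shape_type_inference: bool = False


def check_opset(opset: int, options: ModelOptionsSketch) -> None:
    """Raise if the opset is unknown, or unreleased while the strict flag is set."""
    if opset > MAP_MAX_OPSET:
        raise ValueError(f"Opset {opset} is not supported")
    if options.allow_released_opsets_only and opset > LAST_RELEASED_OPSET:
        raise ValueError(f"Opset {opset} under development")


strict = ModelOptionsSketch()
relaxed = ModelOptionsSketch(allow_released_opsets_only=False)

check_opset(26, strict)   # released opset: accepted on strict legs
check_opset(27, relaxed)  # opset 27: accepted only with the relaxed flag
```

Because the relaxation is carried by the per-model options rather than by the environment, the same test behaves identically on strict and relaxed legs.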

Resolves #28752 (unchanged). Closes #28969.

Update — ONNX 1.22.0rc2 re-pin + ConvTranspose conforms to ONNX `output_shape` spec

Since the original rc1 description below, this PR was re-pinned rc1 → rc2 (onnx/onnx@b124e0188a, VERSION_NUMBER 1.22.0rc2) to pick up the upstream Xcode/iOS CMake fix (onnx#8056). rc2 also carries onnx#8051, which tightened convTransposeShapeInference to reject an output_shape/output_padding whose size does not match the number of spatial dimensions (per the ONNX spec clarification onnx#5400). ONNX Runtime now conforms to that spec instead of patching ONNX to preserve a non-standard form.

⚠️ Breaking change — ConvTranspose output_shape now follows the ONNX spec (spatial dimensions only). ORT previously also accepted a non-standard rank + 2 form that included batch and channel, i.e. (N, C, H, W). As of ONNX 1.22, a rank + 2 output_shape on a ConvTranspose whose input has a statically-known rank is rejected at Graph::Resolve with "Attribute output_shape has incorrect size". Migration: specify output_shape with spatial dimensions only — e.g. {1, 1, 1, 14} → {1, 14} (batch and channel are always inferred from the input and weight, so results are identical; the kernel ignores N, C). Models whose ConvTranspose input has a dynamic/unknown rank are unaffected — ONNX skips the size check and ORT computes the same result (covered by the new ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime test).
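For models that still carry the rank + 2 form, the migration amounts to dropping the leading batch and channel entries. The helper below is a minimal sketch of that rewrite; `to_spatial_output_shape` is a hypothetical name, not an ORT or ONNX API, and it assumes the input rank is known.

```python
from typing import List, Sequence


def to_spatial_output_shape(output_shape: Sequence[int], input_rank: int) -> List[int]:
    """Return the spatial-only form of a ConvTranspose output_shape.

    input_rank is the rank of the input tensor X (4 for NCHW).
    """
    num_spatial = input_rank - 2  # everything except batch (N) and channel (C)
    if num_spatial < 1:
        raise ValueError("input must have at least one spatial dimension")
    if len(output_shape) == num_spatial:
        return list(output_shape)      # already spec-conformant
    if len(output_shape) == num_spatial + 2:
        return list(output_shape[2:])  # strip the leading N, C entries
    raise ValueError(
        f"output_shape has {len(output_shape)} entries; "
        f"expected {num_spatial} or {num_spatial + 2}"
    )


print(to_spatial_output_shape([1, 1, 1, 14], input_rank=4))  # [1, 14]
```

Applying the rewrite to an attribute that is already spatial-only is a no-op, so it is safe to run over a whole model zoo.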

Patch inventory — supersedes "2 files, 3 hunks" below. cmake/patches/onnx/onnx.patch (and its byte-identical binskim.patch mirror) carries only the ONNX_MINIMAL_BUILD option hunk and the GroupNormalization-18 .Deprecate() removal — no ConvTranspose hunks. rc2's strict shape-inference check is kept as-is; ORT's own test models were conformed to the spec. The upstream archive hash, deps.txt, portfile.cmake, vcpkg.json, and the submodule pin are unchanged.

Additional rc2 test conform. rc2 also tightened convPoolShapeInference to reject Conv inputs with rank < 3 ("Input tensor must have at least 3 dimensions"). The hand-authored model in onnxruntime/test/python/quantization/test_op_split.py declared a spec-invalid rank-2 Conv input/weight; it was conformed to a valid NCHW shape ([6, 3] → [1, 1, 6, 3], weight → [2, 1, 1, 1]), keeping the quantized-Split graph and expected outputs identical. No ORT source change.

This note should also seed the GitHub Release notes for the ONNX 1.22 / opset 27 milestone and the squash-commit message.

What changed (29 files)

Version plumbing

cmake/deps.txt — onnx archive URL → rc1 commit zip + SHA1 421e5a9afb6c41a54696e424e5b9a3796aab6821.
cmake/external/onnx — submodule → bc3be77b.
cmake/vcpkg-ports/onnx/portfile.cmake — REF commit form + tar.gz SHA512 e0c526f5…3ce467.
cmake/vcpkg-ports/onnx/vcpkg.json — version-semver 1.22.0, port-version 0.
cmake/patches/onnx/onnx.patch + cmake/vcpkg-ports/onnx/binskim.patch — byte-identical rebase onto 1.22 (2 files, 3 hunks): kept the ONNX_MINIMAL_BUILD option (restructured for 1.22's new onnx_core OBJECT-lib / add_subdirectory(onnx) layout) and the GroupNormalization-18 .Deprecate() removal; dropped the Utils.cmake protobuf-warnings hunk (already merged upstream in 1.22).
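The two pins above use different digests (SHA1 for the deps.txt zip, SHA512 for the portfile tar.gz). A minimal sketch for recomputing either one from a downloaded archive is shown below; `archive_digests` is a hypothetical helper and the file name in the usage comment is a placeholder, not a path from this PR.

```python
import hashlib
from typing import BinaryIO, Tuple


def archive_digests(stream: BinaryIO, chunk_size: int = 1 << 20) -> Tuple[str, str]:
    """Return (sha1_hex, sha512_hex) for a binary stream, read in chunks."""
    sha1 = hashlib.sha1()
    sha512 = hashlib.sha512()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        sha1.update(chunk)
        sha512.update(chunk)
    return sha1.hexdigest(), sha512.hexdigest()


# Usage (placeholder file name):
# with open("onnx-archive.zip", "rb") as f:
#     sha1_hex, sha512_hex = archive_digests(f)
```

Compare the SHA1 of the zip against deps.txt and the SHA512 of the tar.gz against portfile.cmake; the two archives are different files, so the digests are not interchangeable.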

Opset-27 op enablement (Range)

onnxruntime/core/providers/cpu/generator/range.cc — split into versioned [11, 26] + a new unversioned 27 registration. The opset-27 kernel natively supports the existing common numeric types (float/double/int16/int32/int64). fp16 Range is covered via ONNX's Range-27 function body, which ORT expands into primitive ops at partition time. bf16 Range is deferred to that same function expansion — there is no native bf16 kernel, and its bf16 reference node test (test_range_bfloat16_type_positive_delta, base + _expanded) is not exercised by the Python/numpy ONNX backend series, whose harness cannot materialize bf16 (Numpy_type 256); a native fp16/bf16 kernel + stash_type handling is a follow-up (efficiency, not correctness).
onnxruntime/core/providers/cpu/cpu_execution_provider.cc — versioned the Range forward-declare + BuildKernelCreateInfo entries and added the opset-27 registration.
CUDA Range — same versioned [11, 26] + opset-27 split as CPU (onnxruntime/core/providers/cuda/generator/range.cc + cuda_execution_provider.cc); GPU-verified locally: onnx_test_runner -e cuda 8/8 opset-27 Range node tests pass, native Range-27 placed on CUDAExecutionProvider (fp16/bf16 via function expansion).
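The versioned [11, 26] + 27 split can be pictured as a lookup over opset intervals. The sketch below is a deliberately simplified model: the registry layout, `find_kernel`, and the kernel labels are all hypothetical, and ORT's real kernel-matching rules have additional conditions that are not modelled here.

```python
from typing import Dict, List, Optional, Tuple

# op_type -> list of (start_opset, end_opset or None for open-ended, label)
Registry = Dict[str, List[Tuple[int, Optional[int], str]]]

REGISTRY: Registry = {
    "Range": [
        (11, 26, "Range[11, 26]"),  # versioned registration
        (27, None, "Range[27+]"),   # new unversioned registration
    ],
}


def find_kernel(op_type: str, since_version: int,
                registry: Registry = REGISTRY) -> Optional[str]:
    """Return the label of the first registration covering since_version."""
    for start, end, label in registry.get(op_type, []):
        if since_version >= start and (end is None or since_version <= end):
            return label
    return None


print(find_kernel("Range", 11))  # Range[11, 26]
print(find_kernel("Range", 27))  # Range[27+]
```

Splitting the registration makes the advertised boundary explicit: a Range node resolved at since-version 27 can only ever bind the opset-27 entry.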

Optimizer / EP opset ceilings

…/transpose_optimization/optimizer_api.h — kMaxSupportedOpset 26 → 27.
coreml/nnapi/vsinpu/webnn base_op_builder.h — GetMaxSupportedOpSet() 25 → 27 (upper guard only; per-op support checks still gate — these EPs gain no new kernels here).
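The "upper guard only" point can be illustrated with a two-stage check. This is a sketch under stated assumptions: the function names and the toy per-op table are hypothetical, and the real op builders apply further checks that are omitted here.

```python
from typing import Callable


def passes_opset_guard(since_version: int, max_supported_opset: int) -> bool:
    """Upper guard: nodes newer than the ceiling are rejected outright."""
    return since_version <= max_supported_opset


def node_is_supported(op_type: str, since_version: int, max_supported_opset: int,
                      per_op_check: Callable[[str, int], bool]) -> bool:
    """A node must pass the ceiling first, then the per-op support check."""
    if not passes_opset_guard(since_version, max_supported_opset):
        return False
    return per_op_check(op_type, since_version)


def toy_per_op_check(op_type: str, since_version: int) -> bool:
    # Hypothetical EP that implements only "OpA", at any version.
    return op_type == "OpA"


print(node_is_supported("OpA", 27, 25, toy_per_op_check))  # False: capped by the ceiling
print(node_is_supported("OpA", 27, 27, toy_per_op_check))  # True: reaches the per-op check
print(node_is_supported("OpB", 27, 27, toy_per_op_check))  # False: per-op check still gates
```

Raising the ceiling changes only the first stage, so an op the EP never implemented stays unsupported.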

Fusion updates

onnxruntime/core/optimizer/gather_fusion.cc — GatherToSlice Range version list {1,11} → {1,11,27}.
onnxruntime/core/optimizer/embed_layer_norm_fusion.cc — add 27 to the two Range path-matchers (parent_path_3/4) so embedding fusion still matches opset-27 models.
onnxruntime/test/optimizer/graph_transform_test.cc — new opset-27 GatherToSliceFusion test.
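Why the version lists need the extra entry can be shown with a toy matcher. The sketch assumes the fusions compare a node's resolved since-version against an explicit list by exact membership; `matches_version` and the two constants are hypothetical names used only for illustration.

```python
from typing import FrozenSet

RANGE_VERSIONS_BEFORE: FrozenSet[int] = frozenset({1, 11})
RANGE_VERSIONS_AFTER: FrozenSet[int] = frozenset({1, 11, 27})


def matches_version(since_version: int, accepted: FrozenSet[int]) -> bool:
    """Exact-membership check, as assumed for the fusion pattern matchers."""
    return since_version in accepted


# In an opset-27 model, Range resolves to the opset-27 schema.
range_since_version = 27
print(matches_version(range_since_version, RANGE_VERSIONS_BEFORE))  # False
print(matches_version(range_since_version, RANGE_VERSIONS_AFTER))   # True
```

Under that assumption a fusion with the old list silently stops firing on opset-27 models, which is a performance regression rather than a correctness failure, so it needs a dedicated test to be noticed.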

Requirements (7 bumped)

All 7 CI requirements.txt → onnx==1.22.0rc1 (rc1 wheel is on PyPI). The 3 transformers pins remain frozen at 1.18.0 (unrelated to this bump; intentionally untouched).

Generated docs / test data

js/web/docs/webgl-operators.md — regenerated.
docs/OperatorKernels.md — surgical edit: CPU EP and CUDA EP Range rows (27+ + [11, 26] continuation each); see caveats.
onnxruntime/test/testdata/onnx_backend_test_series_filters.jsonc — comment-only: documents why no opset-27 CPU exclusions are needed (all opset-27 node tests pass via function expansion).

Docs

.agents/skills/onnx-opset-bump-checklist/SKILL.md — new reusable checklist skill distilled from this integration. Now also documents the "bump all execution providers together" tradition (CPU + CUDA + JS/DML assessment in one pass) so future opset bumps don't ship a partial EP set.

Validation (CPU EP + CUDA EP, Linux x64)

Full build ✅
--minimal_build extended build ✅ (validates the rebased ONNX_MINIMAL_BUILD patch hunk independently of the vcpkg mirror path)
onnxruntime_test_all ✅ — 1595 passed / 0 failed
onnx_test_runner -e cpu on the ONNX 1.22 opset-27 node tests ✅ — 62/62 pass via ONNX function-body expansion (run with ALLOW_RELEASED_ONNX_OPSET_ONLY=0), including CausalConvWithState, LinearAttention, and fp16/bf16 Range — despite no native kernels for them.
CUDA EP (H100): built --use_cuda clean in both Debug and RelWithDebInfo ✅; onnx_test_runner -e cuda on the opset-27 Range node tests ✅ — 8/8 pass, with native Range-27 placed on CUDAExecutionProvider (no CPU fallback) and fp16/bf16 covered via function-body expansion.

Standing caveats (please read before reviewing)

CUDA EP now locally verified for Range; other GPU EPs/ops still CI-only. The CUDA EP was built and the opset-27 Range node tests run locally on an H100 (8/8 pass). DML and the remaining GPU EPs/ops were not exercised here. Function-body expansion is EP-agnostic, so other opset-27 models are expected to run on those EPs too, but broader GPU coverage remains a CI/follow-up item.
OperatorKernels.md updated surgically (CPU and CUDA Range rows only). A CPU-only full regen would destructively wipe the CUDA/DML/other-EP sections (the generator only emits rows for the EPs in the built module). A correct multi-EP regen needs a build per EP and is a follow-up.
Opset 27 is "under development" in ONNX's released-versions map. ORT's load-time validation rejects opset-27 models unless ALLOW_RELEASED_ONNX_OPSET_ONLY=0 (ORT CI already sets this). The opset-27 schemas are always compiled in from the submodule regardless — this gate only affects model load-time acceptance, not schema availability.
EP GetMaxSupportedOpSet jumped 25 → 27 (skips 26). This is an upper guard only; raising it merely lets opset-26/27 nodes reach the per-op support checks that still gate correctness. No regression — it also retroactively un-caps opset-26 for these EPs.
iOS/macOS Xcode framework build is currently broken by an upstream ONNX CMake regression: the onnx_core OBJECT-library split in onnx/onnx#7733 ("Remove glob calls from ONNX CMake code") reintroduced the Xcode breakage originally fixed by onnx/onnx#7515 ("Revert ONNX CMake changes") for onnx/onnx#7514 ("Build failure with Xcode generator"). This is NOT caused by this opset bump. Tracked upstream at onnx/onnx#8053. Non-Xcode builds (Linux/Windows/Android/WASM) and all CPU/CUDA validation are unaffected. This resolves at the Phase 2 formal v1.22.0 re-pin once ONNX ships the fix.

Follow-ups (explicitly NOT in this PR)

GPU/multi-EP coverage: run opset-27 CUDA/DML node tests; regenerate OperatorKernels.md across all EPs.
JS EP Range [11, 26] + 27 split (currently registered open-ended at 11; mirror the CPU/CUDA versioned split).
DML Range opset-27 assessment (DML uses its own REG_INFO registration system — assess whether an opset-27 entry is needed).
WebGPU EP Range opset-27 split — range.cc registers Range .SinceVersion(11) open-ended, so it already claims opset-27 Range; only the new bf16 type is unsupported and falls back via the T type-constraint (function expansion). Mirror the CPU/CUDA versioned [11, 26] + 27 split.
Native kernels: implement CPU (and EP) CausalConvWithState and LinearAttention kernels, and a native fp16/bf16 + stash_type Range-27 kernel (replace today's function-expansion path with efficient kernels).
Phase 2 — formal v1.22.0 re-pin: re-pin deps.txt/submodule/portfile/requirements to the released tag once ONNX publishes it (currently blocked on ONNX tagging the release); upload the tag tarball to the vcpkg mirror. This also restores the iOS/macOS Xcode framework build once the upstream onnx OBJECT-library Xcode regression (caveat 5) is resolved and re-pinned.
Tooling: fix the pre-existing crash in find_optimizer_opset_version_updates_required.py (placeholder ver parsed as int) so it can be relied on for future bumps.

… tradition
When an op's kernel set changes for the new opset (e.g. Range gaining fp16/bf16 at opset 27), version-split / bump that op's registration in EVERY EP that registers it (CPU and CUDA at minimum) in the SAME PR, so no EP silently lags behind CPU and the advertised opset boundaries stay consistent. Even an open-ended kernel that already binds the new opset (e.g. CUDA Range at SinceVersion(11)) should still be version-split for convention/clarity. Worked example cited: PR microsoft#28754 split Range [11,26]+27 in both CPU and CUDA (verified).
- §1 Group B: added the all-EP tradition callout + an EP checklist (grep each provider dir; split cpu/cuda/js/rocm macro registrations; assess dml/webgpu/coreml/nnapi/etc. per their own systems; bump coreml/nnapi/vsinpu/webnn GetMaxSupportedOpSet ceilings).
- §11: added a cross-EP consistency convention note distinguishing splitting (clarity) from binding-coverage (correctness).
- §6 checklist: Group B line now calls out version-splitting in every EP.
Agent-signed-off: Architect (bcad189c) [claude-opus-4.8 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-signed-off: Architect (bcad189c) [claude-opus-4.6 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…2 (PR microsoft#28754 CI Issue B)
ONNX 1.22's onnx_core(OBJECT)/onnx_proto target split lists the generated ${ONNX_PROTO_SRCS} (.pb.cc) as compiled sources in BOTH targets (by design: onnx_core's objects feed onnx + onnx_cpp2py_export with hidden visibility, while onnx_proto is a standalone static lib with different defines). Xcode's new build system forbids two targets independently producing the same onnx-data.pb.cc.
Extend the existing onnx_proto_gen custom target to also DEPEND on ${ONNX_PROTO_SRCS}, making it the single owner of the protoc generation step. Both libraries already depend on onnx_proto_gen, so they now consume the pre-generated sources (and still each compile their own object). This is CMake's documented add_custom_target-driver pattern for multi-target generated outputs (add_custom_command docs) and changes no defines/visibility, so it is safe for the normal Make/Ninja build.
- cmake/patches/onnx/onnx.patch: new CMakeLists.txt hunk (now 2 files / 4 hunks)
- cmake/vcpkg-ports/onnx/binskim.patch: byte-identical mirror (verified sha256)
- onnx-opset-bump-checklist SKILL.md §7: hunk-count annotation 3 -> 4
Verified on Linux (Ninja, Debug): patch applies clean (git apply + patch -p1); onnx + onnx_proto configure, proto generates, both targets compile onnx-ml.pb.cc, libonnx.a + libonnx_proto.a link. Xcode itself is CI-verified only (no macOS host); iOS_CI_on_Mac + React-Native-iOS are the oracles.
Agent-signed-off: Architect (bcad189c) [claude-opus-4.6 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…icrosoft#28754 CI Issue B facet 2) ONNX 1.22 splits onnx into onnx_core(OBJECT)/onnx_proto; the aggregate `onnx` target is add_library(onnx $<TARGET_OBJECTS:onnx_core>) with no real sources. Xcode's generator won't archive a static lib whose only sources are $<TARGET_OBJECTS:...>, so no libonnx.a is emitted and ORT's iOS framework link fails. Guard the onnx target with if(CMAKE_GENERATOR STREQUAL "Xcode") to add a generated empty dummy source forcing the archive; else()-branch leaves Make/Ninja/MSVC byte-unchanged. Mirrored byte-identically into binskim.patch; skill §7 hunk count updated 4->5. Agent-signed-off: Architect (bcad189c) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…28754 CI Issue B facet 3) The onnx_core(OBJECT) aggregate consumed via $<TARGET_OBJECTS:onnx_core> gives the Xcode generator no build-order guarantee, so onnx's libtool archive step races ahead of onnx_core's compilation and fails with 'libtool: can't open file: onnx_core.../defs.o (No such file or directory)'. Under the Xcode guard, build libonnx.a from the generated empty source and link onnx_core via target_link_libraries(onnx PRIVATE onnx_core) instead of loose TARGET_OBJECTS: cmake records a proper target-level dependency (onnx_core compiles first) and still archives every onnx_core object into libonnx.a, so ORT's ar -x over $<TARGET_FILE:onnx> extracts the full symbol set. else()-branch leaves Make/Ninja/MSVC byte-unchanged. Mirrored byte-identically into binskim.patch; skill §7 annotation updated. Agent-signed-off: Architect (bcad189c) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… leniency)
ONNX 1.22 (rc2, cherry-pick onnx#8051) tightened convTransposeShapeInference to fail_shape_inference when an output_shape/output_padding attribute size does not match the number of spatial dimensions. ORT historically also accepted a non-spec rank+2 (full N,C,H,W) output_shape form. This is Option A: instead of patching ONNX to restore the leniency (Option B, commit 031c777), conform ORT's own test models to the spec so the onnx.patch ConvTranspose hunks never land in main.
Changes:
- conv_transpose_op_test.cc: 10 output_shape attributes -> spatial-only (N,C prefix dropped; Y_shape/expected_vals unchanged). Keep B's InvalidKernelShape expected-message fix (ONNX rejects kernel_shape rank at Graph::Resolve). Add a new InferenceSession-based regression test (ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime) that feeds an unknown-rank input so the kept rank+2 kernel branch stays exercised.
- xnnpack_basic_test.cc: 1 output_shape attribute -> spatial-only.
- conv_transpose_attributes.h: keep the rank+2 toleration (runtime-reachable for dynamic-rank inputs) and document why it is retained; no behavior change.
- onnx.patch / binskim.patch: unchanged at the 3-hunk base (no ConvTranspose reverts) since this branch is built on the pre-B base.
Breaking change: models specifying a rank+2 output_shape AND a statically-known input rank now fail to load under ONNX 1.22 with 'Attribute output_shape has incorrect size'.
Migration: use spatial-only output_shape. See PR microsoft#28754 description.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Pin ONNX to rel-1.22.0 HEAD commit bc3be77bec2f628788796dff60819186bacf49df (VERSION_NUMBER 1.22.0rc1):
- cmake/deps.txt: commit-archive zip URL + SHA1 421e5a9afb6c41a54696e424e5b9a3796aab6821
- cmake/external/onnx: submodule -> bc3be77b
- cmake/vcpkg-ports/onnx/vcpkg.json: version-semver 1.22.0, port-version 0
- cmake/vcpkg-ports/onnx/portfile.cmake: REF commit form + tar.gz SHA512 e0c526f5...3ce467
Rebase cmake/patches/onnx/onnx.patch to ONNX 1.22 and mirror byte-identically into binskim.patch:
- Kept ONNX_MINIMAL_BUILD option (rebased context)
- Restructured the minimal-build source-selection hunk for ONNX 1.22's new onnx_core OBJECT library / add_subdirectory(onnx) layout
- Dropped the Utils.cmake protobuf_warnings hunk (already removed upstream in 1.22)
- Kept the GroupNormalization-18 .Deprecate() removal (still present in 1.22)
Agent-signed-off: Developer (8ac66e2a) [claude-opus-4.6 via copilot]
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ONNX 1.22.0rc1 is published on PyPI (verified via pip index versions onnx --pre), so all 7 requirements files use the published wheel pin onnx==1.22.0rc1 rather than the git source pin. Agent-signed-off: Developer (0ede529c) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…o 27 and register Range at opset 27 - optimizer_api.h: kMaxSupportedOpset 26 -> 27 - Range CPU kernel split into VERSIONED [11,26] + new non-versioned [27] reusing the existing kernel (Range-27 adds fp16/bf16 types and a stash_type attr; the existing common numeric types bind, fp16/bf16 is a deferred enhancement) - cpu_execution_provider.cc: versioned the Range forward-declare and BuildKernelCreateInfo entries and added the opset-27 Range registration Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…doc (ONNX 1.22.0rc1, issue microsoft#28752) - onnx_backend_test_series_filters.jsonc: exclude deferred opset-27 ops CausalConvWithState and LinearAttention (whole families, base + _expanded), and the Range-27 float16/bfloat16 node tests (ORT CPU Range-27 reuses the existing kernel registration which supports only the common numeric types; fp16/bf16 is a tracked follow-up). float/int32 Range tests remain enabled. - js/web/docs/webgl-operators.md: regenerated via npm run build:doc; now lists CausalConvWithState and LinearAttention as unsupported WebGL ops, reflecting the opset-27 schemas pulled in by the submodule bump. Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ONNX 1.22 re-registers Range with an opset-27 schema (adds stash_type attr, widens type constraints; signature unchanged: 3 scalar inputs -> 1-D output). - gather_fusion.cc: add 27 to GatherToSlice Range version list {1,11}->{1,11,27}; add opset-27 GatherToSliceFusion test. - embed_layer_norm_fusion.cc: add 27 to the two Range path matchers (parent_path_3/4) so the embedding fusion still matches opset-27 models. - coreml/nnapi/vsinpu/webnn base_op_builder.h: bump default GetMaxSupportedOpSet 25->27, matching the lockstep convention of prior ONNX-integration PRs (microsoft#26579, microsoft#25678, microsoft#24449) so opset-27 nodes are not spuriously rejected. qdq_util.cc / layout_transformation_potentially_added_ops.h / kernel_type_str_resolver_utils.cc were checked and need no change (none reference the opset-27 ops Range/CausalConvWithState/LinearAttention). Agent-signed-off: Developer (0ede529c) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Completes the T6 opset-27 sweep that the prior commit (c6366eb) missed for these two files. ONNX 1.22 re-registers Range with an opset-27 schema (operator_sets.h line 1460); find_optimizer_opset_version_updates_required.py flagged: 'Newer opset found for kOnnxDomain.Range. Latest:27 Optimizer support ends at 11. File: gather_fusion.cc' - gather_fusion.cc:309: GatherToSlice Range version list {1,11}->{1,11,27}. Range-27's signature is unchanged (3 scalar inputs -> 1-D output; only adds stash_type attr and wider type constraints), and the fusion depends only on that signature, so extending the accepted SinceVersion list is safe. - graph_transform_test.cc: new OpSet-27 (int64) GatherToSliceFusion block mirroring the existing OpSet-12/OpSet-14 blocks, proving Range-27 -> Gather fuses to Slice. Agent-signed-off: Developer (0ede529c) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…0rc1) Update the CPU ai.onnx Range entry to reflect T2's split registration: a non-versioned opset-27 kernel (27+) plus the versioned [11, 26] kernel, matching the output of tools/python/gen_opkernel_doc.py against a fresh RelWithDebInfo build. The doc was updated surgically rather than via a full regen because this validation build is CPU-only (no CUDA/DML EPs available on this Linux host; DML is Windows-only). A full regen would have destructively dropped the CUDA and DmlExecutionProvider sections. The CPU-section delta was verified to be exactly this Range change by diffing the freshly generated CPU section against the committed doc. Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…WithState/LinearAttention/Range-fp16-bf16 exclusions (ONNX 1.22.0rc1, issue microsoft#28752) T7 validation showed all 62 opset-27 backend node tests (the complete set: base + _expanded, including fp16/bf16) pass on the CPU EP. These ops are ONNX function ops (and Range-27 carries a function body), so ORT expands them into primitive nodes at partition time and executes correctly despite no native kernel and the CPU Range-27 kernel registering only common numeric types. Verified via onnx_test_runner -e cpu (62/62 succeeded, output-compared) under ALLOW_RELEASED_ONNX_OPSET_ONLY=0. Removing the global filters restores backend-test coverage per design-review Minor-1. Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…s work today via ONNX function-body expansion (ONNX 1.22.0rc1, issue microsoft#28752) The prior comment said float16/bfloat16 'support is a follow-up enhancement', which a maintainer could misread as fp16/bf16 Range being broken. Clarify that such models execute correctly today because Range-27 carries an ONNX function body that ORT expands at graph-partition time; the follow-up is an efficient native kernel, not a functional fix. Comment-only change; verified onnxruntime_providers recompiles and all 20 GatherToSliceFusion/EmbedLayerNormFusion optimizer tests pass. Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…27 filter comment (ONNX 1.22.0rc1, issue microsoft#28752) The opset-27 validation only exercised the CPU EP. That limitation was previously only implicit (via the 'if a non-CPU EP fails in CI' clause). Add an explicit NOTE so a future maintainer cannot miss that GPU/CUDA/DML EPs were not exercised in this validation env. Comment-only; JSONC still parses (302 entries unchanged). Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…message Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…PSET_ONLY legs Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…efault Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Re-pin the ONNX C++ source dependency to rel-1.22.0 HEAD (onnx/onnx@b124e0188a, VERSION_NUMBER 1.22.0rc2), which carries the upstream Xcode/iOS CMake fix onnx#8056 (cherry-picked via onnx#8066). With the fix upstream, cmake/patches/onnx/onnx.patch collapses back to the 3 ORT-only hunks (ONNX_MINIMAL_BUILD option + minimal-build source branch rebased onto rc2's post-8056 onnx-target layout, and the GroupNormalization-18 .Deprecate() removal); binskim.patch mirrors it byte-for-byte.
- cmake/deps.txt: onnx archive zip + SHA1 -> rc2
- cmake/external/onnx: submodule pointer -> b124e0188a
- cmake/vcpkg-ports/onnx/portfile.cmake: REF + tar.gz SHA512 -> rc2
- cmake/patches/onnx/onnx.patch + binskim.patch: regenerated for rc2
rc1..rc2 source delta (bc3be77be..b124e0188a, 3 commits): besides the Xcode/CMake restructure (onnx#8056) and the version bump, the range also carries runtime-touching but schema-NEUTRAL hardening that is compiled into ORT:
- onnx#8051 (via microsoft#8058): Conv/Pool/RoiPool/ConvTranspose shape-inference guards (reject <min-rank inputs, non-positive dilations/kernel/strides, negative pads) plus a behavior-identical auto_pad residual fix.
- onnx#8066 cherrypicks: onnx/checker.cc raw_data size validation for packed sub-byte tensors, and a Conv weight/input spatial-rank guard.
There are ZERO operator-schema/opset/type-constraint/attribute changes in the range, so these deltas only reject previously-malformed inputs and do not change results for any valid model.
The 7 CI requirements.txt files intentionally stay at onnx==1.22.0rc1: the rc2 wheel is not yet on PyPI, and the iOS/Xcode build consumes the GitHub source archive (deps.txt), not the wheel. Given the schema-neutral delta above, the rc1 wheel stays functionally compatible; bump to rc2 once ONNX publishes the wheel.
Validated: minimal-build extended gate passes (ONNX_MINIMAL_BUILD=ON); onnx.patch applies cleanly to the rc2 source; binskim.patch byte-identical.
Agent-signed-off: Developer (a478a765) [claude-opus-4.8 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ror gotcha in onnx-opset-bump-checklist The vcpkg asset cache runs with x-block-origin (no GitHub fallback), so a missing blob hard-fails every --use_vcpkg leg with a 404. Each archive bump (rc1->rc2->formal) is a new SHA512 = new un-mirrored blob; a green rc1 run doesn't mean rc2 is mirrored. Added a read-only curl probe and the bare-SHA512 blob-key detail. Agent-signed-off: Architect (f1afcb8a) [claude-opus-4.8 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…for vcpkg asset-mirror gotcha Extends the onnx-opset-bump-checklist mirroring section with: (h) the exact vcpkg log signature (404 + 'x-block-origin set' on a /artifacts/<sha512> URL, vcpkg legs fail while FetchContent legs pass); (i) ordered fix options (Terrapin self-seed Windows leg / az blob upload under bare-SHA512 name / EngSys ticket) with a verify-via-curl-200 step. References the architect rc2 mirror runbook artifact. Agent-signed-off: Architect (f1afcb8a) [claude-opus-4.8 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ONNX 1.22.0rc2 tightened convPoolShapeInference to reject Conv inputs with rank < 3 at model load ("Input tensor must have at least 3 dimensions"). test_op_split.py declared a spec-invalid rank-2 input [6,3] feeding Conv (weight also rank-2), which rc2 now rejects, breaking test_quantize_split and test_quantize_split_s8s8 on 16 CI legs.

Conform the test model to the ONNX Conv spec (Option-A philosophy):
- input [6,3] -> NCHW [1,1,6,3]
- conv_weight [6,3] -> 4D [2,1,1,1] (M=2 output channels, 1x1 kernel)
- data_reader feed [6,3] -> [1,1,6,3]

Conv output is now [1,2,6,3] = 36 elements, unchanged downstream: reshape -> [3,12], Split(axis=0, [1,1,1]) -> three [1,12] outputs. Op-type-count assertions are unaffected (only ranks changed).

Validated against an rc2-linked onnxruntime build: full quantization unittest suite (314 tests) passes; the two previously-failing split tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
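
The shape arithmetic in this commit message can be checked with a few lines of C++. This is a sketch under stated assumptions: `ElementCount` and `ConvOutputShape2D` are hypothetical helpers, and the Conv is assumed to use stride 1, no padding, dilation 1 and group 1 (the commit message does not state the test model's Conv attributes).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Total number of elements in a tensor of the given shape.
int64_t ElementCount(const std::vector<int64_t>& shape) {
  return std::accumulate(shape.begin(), shape.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

// Output shape of a 2-D Conv, assuming stride 1, no padding, dilation 1, group 1.
// input is [N, C, H, W]; weight is [M, C, kH, kW].
std::vector<int64_t> ConvOutputShape2D(const std::vector<int64_t>& input,
                                       const std::vector<int64_t>& weight) {
  assert(input.size() == 4 && weight.size() == 4);
  return {input[0], weight[0],
          input[2] - weight[2] + 1,
          input[3] - weight[3] + 1};
}
```

With input [1,1,6,3] and weight [2,1,1,1] this yields [1,2,6,3], whose element count equals that of the downstream [3,12] reshape target, which is why the rest of the test graph is unaffected.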

…n + atol override for fp16 causal_conv

Two ONNX-1.22 backend node-test failures were unmasked once the python quant suite stopped aborting before the OnnxBackendNodeModelTest stage:

1. test_attention_4d_softcap_neginf_mask_expanded (+ _poison_expanded) ERROR on the macOS-arm64-webgpu Release backend-conformance leg: the ONNX function-EXPANDED Attention reference decomposition trips a SizeFromDimension underflow (dim = SIZE_MAX) downstream of the bias Add (prime suspect softmax.cc:177). The native FUSED Attention kernel (test_attention_4d_softcap_neginf_mask[_poison], no _expanded) passes on every arch, and CPU EP passes the expanded tests on x64 + Linux-arm64 and in the ONNX ReferenceEvaluator -- so this is not user-facing; only the expanded reference graph trips on that build. Placed in the GLOBAL current_failing_tests list (matching the existing expanded-attention webgpu-skip precedent at L38/40/42), so the 2 _expanded REFERENCE decompositions are skipped on all configs including CPU. Global was chosen over the current_failing_tests_WEBGPU section for a deterministic green that is independent of the supports_device('WEBGPU') runtime condition. The FUSED production Attention kernel stays fully covered on every arch; only the 2 expanded reference variants stop running. Tracked for a follow-up fix.

2. test_causal_conv_with_state_silu_fp16 (+ _expanded) is a single-ULP fp16 tolerance miss on arm64 (max rel diff 0.001174). Given an atol override of 5e-4 (mirroring test_attention_4d_fp16) rather than a skip, so coverage is retained on all platforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ow (microsoft#28969)

The vectorized broadcast path counted trailing "shared" dimensions with a loop that treated an exhausted operand's implicit size-1 dim as a match against a literal 1. For unequal-rank operands with leading unit dims (e.g. Add lhs=[1,1,6,6] + rhs=[6,6]), num_shared_dimension grew past the smaller operand's rank, so lhs/rhs SizeFromDimension(rank - num_shared) underflowed (size_t wrap to SIZE_MAX) and tripped ORT_ENFORCE in TensorShape::SizeFromDimension, failing every WebGPU binary op at the [...,1,d,e] + [d,e] corner.

Extract the shared-trailing-dim math into a deviceless free helper CountSharedTrailingDimensions that breaks as soon as EITHER operand runs out of real dimensions, bounding the count by min(lhs rank, rhs rank). The shared-dim product (and thus the divisible-by-4 vectorize decision) is unchanged for all existing cases; only the previously-underflowing corner is corrected.

Add a deviceless gtest on the helper and an end-to-end OpTester regression on DefaultWebGpuExecutionProvider (Add [1,1,6,6]+[6,6]) that fails pre-fix with the SIZE_MAX SizeFromDimension enforce and passes post-fix. Verified locally against lavapipe software Vulkan; the full elementwise/broadcast suite (62 tests) stays green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
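
The bounded counting loop described in this commit can be sketched as follows. This is a simplified illustration, not the actual CountSharedTrailingDimensions helper: the function names, the signature, and the plain equality test for "shared" are assumptions, and any additional rules in the real helper are not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: counts trailing dimensions that are equal in both
// shapes, stopping as soon as EITHER shape runs out of real dimensions.
// The result is therefore bounded by min(lhs rank, rhs rank).
std::size_t CountSharedTrailingDimsSketch(const std::vector<int64_t>& lhs,
                                          const std::vector<int64_t>& rhs) {
  std::size_t shared = 0;
  while (shared < lhs.size() && shared < rhs.size()) {
    const int64_t lhs_dim = lhs[lhs.size() - 1 - shared];
    const int64_t rhs_dim = rhs[rhs.size() - 1 - shared];
    if (lhs_dim != rhs_dim) break;  // first mismatch ends the shared run
    ++shared;
  }
  return shared;
}

// Product of the leading dims [0, rank - shared). Because shared <= rank,
// the subtraction rank - shared can no longer wrap around.
int64_t LeadingSizeSketch(const std::vector<int64_t>& shape, std::size_t shared) {
  assert(shared <= shape.size());
  int64_t size = 1;
  for (std::size_t i = 0; i + shared < shape.size(); ++i) size *= shape[i];
  return size;
}
```

For lhs=[1,1,6,6] and rhs=[6,6] the loop stops after two dimensions because rhs is exhausted, so the count never exceeds the smaller rank and the leading-size computation stays in range for both operands.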

fixed)

Remove the two global skips for test_attention_4d_softcap_neginf_mask_expanded and its _poison_expanded variant. They were added only to dodge the WebGPU binary-elementwise broadcast SizeFromDimension underflow (microsoft#28969), which is now fixed in this branch by the CountSharedTrailingDimensions helper. The expanded function-reference Attention tests can run again on every config.

The fp16 causal_conv atol override in onnx_backend_test_series_overrides.jsonc is an independent tolerance fix and is intentionally left in place.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…pe, filter-vs-override, latent bugs)

Add three gotchas to .agents/skills/onnx-opset-bump-checklist/SKILL.md:
- (j) the Linux webgpu CI leg is build-only; ONNX backend node tests (OnnxBackendNodeModelTest) only execute on the macOS-arm64 webgpu leg, so a green Linux webgpu leg does not mean WebGPU actually ran them.
- (k) a filter-vs-override decision rubric: filters.jsonc SKIPs for a real EP bug (cite issue + removal condition), overrides.jsonc RELAXes ATOL for benign fp16/ULP diffs (prefer over a skip when the kernel is correct), but only after root-causing the diff as ~1 ULP; unexplained/large/growing diffs are bugs.
- (l) new upstream reference tests can expose latent EP bugs (e.g. microsoft#28969, a WebGPU broadcast underflow surfaced by ONNX 1.22 expanded-Attention tests).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…on Linux without a GPU

Add .agents/skills/webgpu-local-testing/SKILL.md covering how to build and run ONNX Runtime WebGPU provider tests on Linux using a software Vulkan adapter (Mesa lavapipe): why software Vulkan suffices for host-side enforce/shape bugs and MatMul-free kernels, dnf install on Azure Linux, the --use_webgpu build flag, the onnxruntime_provider_test target with VK_ICD_FILENAMES, the lavapipe MatMul-family crash gotcha, and the fact that the Linux webgpu CI leg is build-only.

Scope is called out explicitly: any MatMul-containing graph (including the expanded-Attention node tests that motivated microsoft#28969) cannot run on lavapipe and is validated only on macOS-arm64 Metal; microsoft#28969 itself was validated on lavapipe via a standalone Add-broadcast OpTester proxy, not the expanded-Attention node test. The lavapipe ICD path is noted as arch-specific (x86_64 vs aarch64).

Cross-reference the new skill from ort-test (running WebGPU tests locally) and ort-build (--use_webgpu key flag).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ONNX 1.22.0 was released 2026-06-15 (tag v1.22.0, commit 2bb50465). The rc2->final delta touched only CI workflow yml + VERSION_NUMBER -- no operator schema, opset, or backend-testdata change -- so this is pure version plumbing.

- cmake/deps.txt: onnx archive -> refs/tags/v1.22.0.zip (SHA1 2b2cd58a...)
- cmake/external/onnx submodule -> 2bb50465112feca9003e1ed654d77f01ff1415ca
- cmake/vcpkg-ports/onnx/portfile.cmake: REF v1.22.0 + tar.gz SHA512 13fafff0...
- 7 CI requirements.txt: onnx==1.22.0rc1 -> onnx==1.22.0 (now on PyPI); the 3 transformers-model requirements stay frozen at onnx==1.18.0.
- onnx.patch / binskim.patch unchanged (source identical rc2<->final; still apply).
- filters.jsonc integration comment: 1.22.0rc1 -> 1.22.0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The *CurrentOpset fusion regression tests build/load models stamped at the current ONNX opset (27 in ONNX 1.22), which is still under development while opset 26 is the last released version. Under ORT's default strict load-time validation (ALLOW_RELEASED_ONNX_OPSET_ONLY unset or '1'), loading such a model throws, so these tests failed on every strict CI leg.

Pass ModelOptions{allow_released_opsets_only=false, strict_shape_type_inference=false} through the model-construction/load path of these tests (mirroring the existing GatherToSliceFusion opset-27 precedent) so they RUN and PASS on every leg, strict or not, preserving opset-27 fusion coverage with no masking.

- 9 TestGraphTransformer calls (Gelu/FastGelu/BiasGelu/MatMulAdd/DivMul/QuickGelu, LayerNorm/SkipLayerNorm, GQA-Qwen): append the ModelOptions argument.
- AttentionFusionMobileClipMhaCurrentOpsetTest (TransformerTester): thread an optional ModelOptions through TransformerTester; load the serialized model via the istream Load overload so allow_released_opsets_only is honored (the byte/proto Load overloads hardcode it). No product-code change.
- 3 EmbedLayerNorm tests: pass ModelOptions to Model::Load in LoadModelAtCurrentOpset.
- ReshapeFusionOpsetTest (ENABLE_TRAINING-only): its opset loop includes the current opset; apply the same ModelOptions to both TestGraphTransformer calls so it runs on strict training builds too. Training-gated (validated by analogy to the non-training pattern; not compiled in the default build).

Validated: 13 non-training tests PASS under default(strict), =1, and =0; full onnxruntime_test_all under strict passes 1820 tests with no 'Opset 27 under development' throw.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
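
The load-time gate that this commit relaxes per model can be illustrated with a self-contained sketch. Everything here is a hypothetical stand-in rather than ORT code: `ModelOptionsSketch` only mirrors the two field names quoted in the commit message, `CheckOpsetSketch` is an invented function, the error text is shortened, and the ALLOW_RELEASED_ONNX_OPSET_ONLY environment variable is not modeled.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the two options named in the commit message.
struct ModelOptionsSketch {
  bool allow_released_opsets_only;
  bool strict_shape_type_inference;
};

// Per the PR text: with ONNX 1.22.0, opset 26 is the last released opset
// and opset 27 is still under development.
constexpr int kLastReleasedOpsetSketch = 26;

// Throws when a model stamped at an unreleased opset is loaded in strict mode.
void CheckOpsetSketch(int model_opset, const ModelOptionsSketch& options) {
  if (options.allow_released_opsets_only && model_opset > kLastReleasedOpsetSketch) {
    throw std::runtime_error("Opset " + std::to_string(model_opset) +
                             " under development");
  }
}
```

Under these assumptions, an opset-27 model fails only when the strict flag is set, which is why passing the relaxed options per model lets the same test run on both strict and relaxed legs without touching the CI environment.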

Add WHY comments + tracking issue refs (microsoft#28966, and microsoft#28969 on the WebGPU attention-fusion path) to the ModelOptions{allow_released_opsets_only=false} call sites in the *CurrentOpset fusion tests, so a future reader knows they can be removed once ONNX opset 27 ships. No test logic or ModelOptions args change.

Extend the onnx-opset-bump-checklist skill with three hard-won gotchas from the 1.22.0 integration:
- (m) the vcpkg MS-internal asset mirror must be Terrapin-seeded with the new tag tarball or every --use_vcpkg leg 404s;
- (n) a FINAL onnx release can still ship a map-max opset > last released opset (1.22.0: 27 > 26), leaving it under-development;
- (o) prefer per-model ModelOptions{allow_released_opsets_only=false} over per-leg CI env flips or GTEST_SKIP.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

titaiwangms · 2026-06-15T23:54:04Z

Review summary (review-team pass) — ONNX 1.22.0 / opset 27

Structurally matches the merged 1.21 bump (#27601): same version-plumbing, same versioned-split pattern for new-opset ops, same kMaxSupportedOpset bump. The EP GetMaxSupportedOpSet 25→27 jump is a legitimate catch-up (1.21 left these EPs at 25), not a regression. All spec claims were verified against the pinned 1.22 commit. No Critical/Major blockers.

Actionable

1. (Minor / consistency) WebGPU Range missed from the follow-up list
onnxruntime/core/providers/webgpu/generator/range.cc:96 registers Range open-ended at .SinceVersion(11), so WebGPU will claim opset-27 Range — the exact situation deferred for JS/DML, but WebGPU isn't in the follow-up list. Same risk class as the already-accepted JS deferral, so not a blocker. Suggest adding WebGPU alongside JS/DML in the follow-ups (or splitting it now).

2. (Minor / open question) bf16 Range-27 is untested; description slightly overstates
onnx_backend_test_series_filters.jsonc excludes both test_range_bfloat16_* base and _expanded variants (the Python harness can't materialize bf16), so bf16 Range-27 has no passing test. fp16 is covered (its node test isn't excluded and passes via function expansion). The description says "fp16/bf16 Range … pass", but bf16 is actually excluded. Low risk (same EP-agnostic function-expansion path as fp16). Suggest softening the wording, or adding a C++ OpTester bf16 Range-27 test that bypasses the numpy harness.

Optional polish (readability, non-blocking)

Extract the ModelOptions{/*allow_released_opsets_only*/ false, /*strict_shape_type_inference*/ false} literal (repeated ~10×) into a named kAllowUnreleasedOpset constant — the GatherToSlice site already shows the pattern.
binary_elementwise_broadcast_utils.h: move the header-design rationale out of the function docstring; rename dimA/dimB → lhs_dim/rhs_dim; the in-loop comment largely duplicates the block comment above.
Add @param model_options to the single-opset TestGraphTransformer doxygen overload (only the vector<int> overload was updated).

Praise

WebGPU broadcast underflow fix (#28969): helper extraction + deviceless unit test + explicit 1-is-not-shared case is a clean, structural fix.
ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime: correctly hand-builds an unknown-rank graph to reach the runtime-only branch and pins it to the spatial-only reference.
Version plumbing is impeccably synchronized across deps.txt / submodule / vcpkg / the 7 requirements.txt files (correctly avoiding the frozen transformers pins).

Generated by a 5-reviewer pass (readability, code, critical, deep-spec, integration); the two Minor items above were re-verified against the source before posting.

tianleiwu

Review summary — ONNX 1.22.0 / opset 27 integration

Careful, well-documented version bump. I independently re-verified the risky plumbing and the operator/optimizer/test changes; no blocking issues, only optional nitpicks. Verdict: approve-leaning (commenting, not gating).

Independent verification (re-checked at this head)

| Check | Result |
| --- | --- |
| `cmake/external/onnx` submodule pin | `2bb50465112feca9003e1ed654d77f01ff1415ca` = `v1.22.0` tag commit ✅ |
| `cmake/deps.txt` SHA1 of `v1.22.0.zip` | `sha1sum` of fresh download matches `2b2cd58a…` ✅ |
| `portfile.cmake` SHA512 of `v1.22.0.tar.gz` | `sha512sum` of fresh download matches `13fafff0…` ✅ |
| vcpkg MS asset mirror (portfile SHA512) | `…/artifacts/13fafff0…` → HTTP 200 (mirrored; `--use_vcpkg` legs won't 404) ✅ |
| `onnx.patch` ↔ `binskim.patch` | byte-identical (`sha1 6a4e6ed8…`) ✅ |
| requirements pins | all 7 CI files at `1.22.0`; transformers files correctly left at `1.18.0` ✅ |

The onnx.patch rebase is sound: ONNX_MINIMAL_BUILD was re-expressed against 1.22's add_subdirectory(onnx) layout via target_sources(onnx PRIVATE … data_type_utils.cc), the GroupNormalization-18 .Deprecate() removal is kept, and the now-upstreamed Utils.cmake protobuf-warnings hunk is correctly dropped.

Correctness highlights

Range opset-27 split (CPU + CUDA): versioned [11,26] + new 27, matching forward-declares and BuildKernelCreateInfo. fp16/bf16 correctly fall through to ONNX function-body expansion; stash_type is irrelevant to the native path.
WebGPU broadcast fix (#28969): CountSharedTrailingDimensions now breaks once either operand is exhausted, bounding the shared run by min(lhs_rank, rhs_rank). Every divergence from the old inline loop is exactly a previously-underflowing case (SizeFromDimension(rank − num_shared) size_t wrap), so the fix is strictly safer. Deviceless unit test + end-to-end Add-broadcast coverage is thorough, and the Dawn-free header avoids a webgpu-provider link dependency in the CPU test TU.
ConvTranspose output_shape conformance (breaking): rank+2 form now rejected at Graph::Resolve for static rank (onnx#5400); the retained kernel branch is correctly documented as runtime-reachable only for dynamic-rank inputs, and the new ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime test exercises exactly that path.
Backend-test filters: ^test_flexattention_(?!.*expanded) is correct given ONNX's _cpu/_cuda method-name suffix; it excludes base test_flexattention_cpu while preserving the _expanded_ver26 variants.
Test infra: threading ModelOptions + mirroring strict_shape_type_inference onto the session and loading via std::istream (so allow_released_opsets_only is honored) is the right way to exercise under-development opset 27 on strict legs. Each relaxed call site is annotated with #28966.
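
The negative-lookahead filter mentioned under "Backend-test filters" can be exercised in isolation. This is a sketch only: the backend harness applies its filters from Python, so C++ `std::regex` (ECMAScript grammar) is used here as a stand-in, `IsFilteredFlexAttentionTest` is a hypothetical name, and the expanded test name in the usage note is illustrative rather than taken from the ONNX test suite.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when the filter pattern would exclude the given test name.
// The lookahead (?!.*expanded) rejects any name that contains "expanded"
// anywhere after the "test_flexattention_" prefix.
bool IsFilteredFlexAttentionTest(const std::string& test_name) {
  static const std::regex pattern(R"(^test_flexattention_(?!.*expanded))");
  return std::regex_search(test_name, pattern);
}
```

Under this pattern `test_flexattention_cpu` is excluded, while an illustrative name such as `test_flexattention_expanded_ver26_cpu` is not, because the lookahead sees `expanded` after the prefix.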

Minor / optional (non-blocking)

Spurious 1 in Range version lists. {1, 11, 27} in gather_fusion.cc and embed_layer_norm_fusion.cc carries a leading 1, but Range has no opset-1 schema, so 1 can never match a node's SinceVersion. Harmless and pre-existing, but {11, 27} would be marginally cleaner since these lines are already being edited.
Deferred follow-ups confirmed, not missed: JS Range stays open-ended at 11 (still matches opset-27 nodes), DML/ROCm unsplit, multi-EP OperatorKernels.md regen, and a native fp16/bf16 Range-27 kernel are all listed as explicit follow-ups in the description.

The new onnx-opset-bump-checklist and webgpu-local-testing skills capture the exact gotchas hit here, a good durable artifact for the next bump.

…exec-11)

Test-only / docs-only readability polish on top of the opset-27 ModelOptions fusion-test fix. No behavior change.

1. Extract the magic boolean used at every *CurrentOpset fusion-test site. Introduce `constexpr bool kAllowReleasedOpsetsOnly` (in the shared graph_transform_test_builder.h, namespace onnxruntime::test) and use it as the ModelOptions::allow_released_opsets_only first argument at all 14 call sites across graph_transform_test.cc, graph_transform_test_layernorm.cc, group_query_attention_pre_norm_fusion_test.cc, and the GatherToSlice precedent. The constant mirrors the ctor argument name exactly so each site reads ModelOptions{kAllowReleasedOpsetsOnly, ...} (false = do not restrict to released opsets, i.e. load models stamped at the not-yet-released opset). strict_shape_type_inference=false behavior unchanged.

2. binary_elementwise_broadcast_utils.h: add Doxygen @param/@return docs to CountSharedTrailingDimensions and rename the local dimA/dimB -> lhs_dim/rhs_dim for clarity. Stays inline + Dawn-free (only tensor_shape.h); behavior unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Address tianleiwu's review nit on PR microsoft#28754: the Range op was introduced at ONNX opset 11 (there is no opset-1 Range schema), so the leading `1` in the `{1, 11, 27}` version lists is dead and never matches. Trim it to `{11, 27}`, keeping 27 so opset-27 Range nodes still match.

Sites:
- onnxruntime/core/optimizer/gather_fusion.cc (Range->Gather->Slice matcher)
- onnxruntime/core/optimizer/embed_layer_norm_fusion.cc (two Range path-matchers)

No behavior change: opset-1 Range never existed, so removing it cannot drop any real match; 11 and 27 are preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
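
Why trimming the list is behavior-identical can be shown with a small model of exact-set version matching. This is a hypothetical simplification, not ORT's matcher: `MatchesVersionSketch` is an invented name, and the only facts assumed are the ones stated above (Range has schemas at opsets 11 and 27 and none at opset 1).

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// Exact-set matching: a node matches only if its SinceVersion is literally
// one of the listed versions (no ranges, no "or later" semantics).
bool MatchesVersionSketch(int node_since_version,
                          std::initializer_list<int> versions) {
  return std::find(versions.begin(), versions.end(), node_since_version) !=
         versions.end();
}
```

Since a Range node can only carry SinceVersion 11 or 27, both `{1, 11, 27}` and `{11, 27}` give the same answer for every node that can actually occur; the lists differ only for SinceVersion 1, which no Range node has.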

titaiwangms · 2026-06-16T16:39:41Z

Thanks for the thorough review, @tianleiwu, and for independently re-verifying the version pins, vcpkg mirror, onnx.patch rebase, and the Range/WebGPU/ConvTranspose changes.

Good catch on the spurious leading 1 in the Range version lists. Since Range was introduced at ONNX opset 11, there's no opset-1 schema, so the 1 could never match a node's SinceVersion — it was dead. I've cleaned it up to {11, 27} (keeping 27 for the opset-27 Range nodes) at both sites:

onnxruntime/core/optimizer/gather_fusion.cc
onnxruntime/core/optimizer/embed_layer_norm_fusion.cc

The change is behavior-identical (exact-set version matching, the 1 was unreachable). Pushed as db8d8bbc24; CI is re-running on the new head.

The other items you flagged (JS Range left open-ended at 11, DML/ROCm unsplit, OperatorKernels.md regen, native fp16/bf16 Range-27 kernel) are intentional follow-ups noted in the PR description, not oversights. Let me know if you'd like any of those pulled into this PR instead.

Could you take another look and formally approve when you have a chance? Thanks!

tianleiwu

Re-review at `db8d8bbc` (delta since my prior review at `80674c19`)

Re-reviewed the two commits added since my last pass. The bump stays clean — no blocking issues from my side.

Resolved

Spurious leading 1 in the Range version lists (my prior nitpick): now {11, 27} in gather_fusion.cc and embed_layer_norm_fusion.cc. Verified behavior-identical — IsSupportedOptypeVersionAndDomain does exact-set matching and Range has no opset-1 schema, so the 1 was unreachable.

Independent re-verification at this head

Range-27 native kernel + stash_type: confirmed against the pinned ONNX 1.22.0 source (onnx/defs/generator/defs.cc, Range/27 + BuildFunctionBodyRange27). Opset 27 only adds float16/bfloat16 to T and a stash_type attribute that is consumed only when T ∈ {float16, bfloat16} ("Has no effect for other types"). The native CPU/CUDA kernels keep the 5-type constraint {float, double, int16, int32, int64}, for which the opset-27 function body is identical to opset 11 — so the [11,26] + 27 split is behavior-preserving and fp16/bf16 correctly route through ONNX function expansion. The kernel comments are accurate.
Broadcast util refactor (binary_elementwise_broadcast_utils.h): the doc + lhs_dim/rhs_dim rename does not change the loop's underflow-safe bound min(lhs rank, rhs rank, output_rank-1). Still correct.

Confirming the two minor items from your review-team pass (non-blocking)

WebGPU Range open-ended at .SinceVersion(11) (webgpu/generator/range.cc:96, not in this diff): I verified it is behavior-correct for its T ∈ {float, int32, int64} constraint — none of those types are touched by the opset-27 change and fp16/bf16 aren't registered there, so the SinceVersion(11) kernel that now also serves opset-27 nodes produces identical results. Purely a consistency/clarity gap; agree it belongs in the follow-up list alongside the JS/DML deferral (or split now). Not a blocker.
bf16 Range-27 test coverage / description wording: agree — the backend filter excludes both the base and _expanded bf16 variants, so bf16 has no passing test (fp16 does, via function expansion). Softening the description's "fp16/bf16 … pass" or adding a C++ OpTester bf16 case would make it precise. Minor.

The named-constant + broadcast-doc polish in d00fd69f reads well.

Verdict: no blocking issues; the remaining items are documented, non-blocking follow-ups.

titaiwangms force-pushed the integrate-onnx-1.22.0rc1 branch from 34b6359 to 06ced99 Compare June 2, 2026 23:34

titaiwangms force-pushed the integrate-onnx-1.22.0rc1 branch from 20b4010 to 85a11f2 Compare June 3, 2026 20:57

titaiwangms force-pushed the integrate-onnx-1.22.0rc1 branch from 017edb3 to 85a11f2 Compare June 3, 2026 22:31

titaiwangms mentioned this pull request Jun 5, 2026

[THROWAWAY/DO-NOT-MERGE] Validate onnx#8056 Xcode fix via ORT iOS CI #28815

Closed

titaiwangms force-pushed the integrate-onnx-1.22.0rc1 branch from 031c777 to 2f63b0a Compare June 9, 2026 20:18

titaiwangms mentioned this pull request Jun 10, 2026

Expanded Attention reference decomposition: SizeFromDimension SIZE_MAX underflow at Add node (surfaced by ONNX 1.22 integration) #28969

Closed

titaiwangms linked an issue Jun 10, 2026 that may be closed by this pull request

Expanded Attention reference decomposition: SizeFromDimension SIZE_MAX underflow at Add node (surfaced by ONNX 1.22 integration) #28969

Closed

titaiwangms force-pushed the integrate-onnx-1.22.0rc1 branch from 01b9673 to e72e1de Compare June 10, 2026 20:54

titaiwangms and others added 16 commits June 15, 2026 20:07

Add onnx-opset-bump-checklist skill doc

8b3fb0f

Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Bump CUDA Range kernel to opset 27 (mirror CPU)

1c4a280

Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Update FunctionTest cycle-rejection assertions for ONNX 1.22 checker …

db5b7cd

…message Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix opset-27 GatherToSliceFusion test on strict ALLOW_RELEASED_ONNX_O…

38f1724

…PSET_ONLY legs Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Filter unimplemented ONNX 1.22 FlexAttention + dtype-256 backend tests

08dfd41

Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

titaiwangms and others added 12 commits June 15, 2026 20:08

Pin released opset in symlink-data test to avoid ONNX 1.22 opset-27 d…

f799965

…efault Agent-signed-off: Developer (d307842f) [claude-opus-4.6 via copilot] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

titaiwangms force-pushed the integrate-onnx-1.22.0rc1 branch from e72e1de to 0a25d14 Compare June 15, 2026 20:22

titaiwangms changed the title ~~Integrate ONNX 1.22.0rc1 (opset 27) — issue #28752~~ Integrate ONNX 1.22.0 (opset 27) — issue #28752 Jun 15, 2026

titaiwangms and others added 2 commits June 15, 2026 22:06

titaiwangms requested review from gramalingam, justinchuby, tianleiwu and xadupre June 15, 2026 23:46

tianleiwu reviewed Jun 16, 2026

titaiwangms and others added 2 commits June 16, 2026 00:22

tianleiwu reviewed Jun 16, 2026

tianleiwu approved these changes Jun 16, 2026

titaiwangms merged commit 43fd961 into microsoft:main Jun 16, 2026
86 checks passed

titaiwangms deleted the integrate-onnx-1.22.0rc1 branch June 16, 2026 20:46

titaiwangms merged 32 commits into microsoft:main from titaiwangms:integrate-onnx-1.22.0rc1

