Fix two breaking CI failures: reshape-reshape rule test and logsumexp float16 tolerance #2907
Open
Conversation
…erance
Agent-Logs-Url: https://github.com/microsoft/onnxscript/sessions/cb0312cb-87e4-4b34-845d-32155f2f2eb0
Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
Copilot (AI) changed the title from "[WIP] Fix breaking CIs in ort-nightly and torch-nightly" to "Fix two breaking CI failures: reshape-reshape rule test and logsumexp float16 tolerance" on May 2, 2026.
Collaborator: @copilot create a PR on onnxruntime to fix the reshape bug
Contributor (Author): I'm unable to create PRs on external repositories; I can only push to the current repository. The repro is: input shape (0, 6, 2), reshaped to [3, 2, -1] and then to [0, 0, 3] with allowzero=1; ORT folds the two reshapes and returns shape (0, 6, 3) instead of the correct (0, 0, 3). See the onnxruntime commit referenced below for the full repro.
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff             @@
##               main    #2907   +/-   ##
=========================================
  Coverage     72.52%   72.52%
=========================================
  Files           258      258
  Lines         31480    31480
  Branches       2972     2972
=========================================
  Hits          22831    22831
  Misses         7640     7640
  Partials       1009     1009
justinchuby approved these changes on May 5, 2026.

justinchuby pushed a commit to microsoft/onnxruntime that referenced this pull request on May 5, 2026:
… dims (#28349)

### Description

`ReshapeFusion::FuseContiguousReshapes` collapses a chain of `Reshape` / `Squeeze` / `Unsqueeze` nodes into a single `Reshape` whose shape data is taken verbatim from the fully-inferred output shape of the last node in the chain. The new node is created without an `allowzero` attribute, so it defaults to `allowzero = 0`. When that inferred shape contains a literal `0` dim (legitimate when the original chain used `allowzero=1`, or when intermediate tensors had zero-sized dimensions), the fused `Reshape` misinterprets the `0` as "copy the corresponding dim from the input tensor"; but the input here is the original input of the *first* reshape in the chain, with unrelated dims. The result is a silently wrong output shape (and a benign-looking `MergeShapeInfo` warning at graph load).

### Repro (before the fix)

```python
import numpy as np, onnx, onnxruntime as ort, onnx.reference
from onnx import helper, TensorProto

X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [0, 6, 2])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [None, None, None])
s1 = helper.make_tensor("s1", TensorProto.INT64, [3], [3, 2, -1])
s2 = helper.make_tensor("s2", TensorProto.INT64, [3], [0, 0, 3])
n1 = helper.make_node("Reshape", ["X", "s1"], ["mid"])
n2 = helper.make_node("Reshape", ["mid", "s2"], ["Y"], allowzero=1)
m = helper.make_model(
    helper.make_graph([n1, n2], "g", [X], [Y], initializer=[s1, s2]),
    opset_imports=[helper.make_opsetid("", 18)],
)
inp = np.random.default_rng(7).random((0, 6, 2), dtype=np.float32)
print("REF:", onnx.reference.ReferenceEvaluator(m).run(None, {"X": inp})[0].shape)
print("ORT:", ort.InferenceSession(m.SerializeToString(), providers=["CPUExecutionProvider"]).run(None, {"X": inp})[0].shape)
```

Output on `main` (`40c9f85f69`):

```
REF: (0, 0, 3)
[W ... graph.cc:122 MergeShapeInfo] Error merging shape info for output. 'Y' source:{0,6,3} target:{0,0,3}. Falling back to lenient merge.
ORT: (0, 6, 3) ❌
```

### Fix

Setting `allowzero=1` on the fused node would also work but requires opset >= 14, which this transformer cannot assume (it accepts `Reshape` opset 5+). Bail out of fusion conservatively when `shape_value` contains any literal `0` dim.

### Test

Adds `ReshapeFusionContiguousReshapesWithZeroDim`, which builds the bug repro programmatically and asserts:

- the two reshapes are NOT collapsed
- the inferred output shape stays `(0, 0, 3)`

The existing happy-path test `ReshapeFusion_Contiguous_Reshape` (added in #22494) is unaffected: its inferred output shape `(2, 1, 64, 32)` contains no zero dims, so the new guard does not trigger.

### Provenance

`FuseContiguousReshapes` was introduced in #22494 (Feb 2025). The bug has been latent in `main` since then.

### Motivation and Context

Found while reviewing microsoft/onnxscript#2907: the rewriter rule under test there is semantically correct, but its numerical-equivalence check using ORT as the oracle fails because of this fusion bug.

Fixes #28348.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
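For readers unfamiliar with the attribute, here is an illustrative sketch (not part of either PR) of the two meanings of a literal `0` in Reshape's shape tensor, using the ONNX reference evaluator; the helper name `reshape_shape` is invented for this example.

```python
import numpy as np
from onnx import TensorProto, helper
from onnx.reference import ReferenceEvaluator

def reshape_shape(input_shape, target, allowzero):
    # Build a one-node Reshape model and return its output shape.
    X = helper.make_tensor_value_info("X", TensorProto.FLOAT, list(input_shape))
    Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [None] * len(target))
    s = helper.make_tensor("s", TensorProto.INT64, [len(target)], list(target))
    node = helper.make_node("Reshape", ["X", "s"], ["Y"], allowzero=allowzero)
    graph = helper.make_graph([node], "g", [X], [Y], initializer=[s])
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
    x = np.zeros(input_shape, dtype=np.float32)
    return ReferenceEvaluator(model).run(None, {"X": x})[0].shape

# allowzero=0 (the fused node's default): each 0 copies the matching input dim.
print(reshape_shape((2, 3, 4), (0, 0, 4), allowzero=0))  # (2, 3, 4)

# allowzero=1: 0 is a literal zero-sized dim; the element counts still match
# here because both sides hold zero elements.
print(reshape_shape((0, 6, 2), (0, 0, 3), allowzero=1))  # (0, 0, 3)
```

This is exactly why dropping `allowzero` during fusion is unsafe: the same `0` entry changes meaning depending on the attribute, and after fusion it is resolved against the dims of a different input tensor.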
Two unrelated CI failures:

- ORT-nightly incorrectly folds consecutive `Reshape` ops with `allowzero=1`, causing a shape mismatch between the original and rewritten models.
- torch-nightly exposes float16 precision gaps in `logsumexp` that exceed default tolerances.

Changes

- `_basic_rules_test.py` (`test_reshape_reshape_dynamic_rule`): switch the numerical comparison to `use_reference=True` (the ONNX reference evaluator) instead of ORT. ORT's graph optimizer incorrectly folds `Reshape(Reshape(x, s1), s2, allowzero=1)` when the intermediate has zero-sized dimensions, producing wrong output shapes (`(1, 0, 6, 3)` vs the correct `(1, 0, 0, 3)`). The rewrite rule itself is semantically correct; ORT is the broken oracle.
- `ops_test_data.py`: add `tolerance={torch.float16: (2e-2, 1e-4)}` to the `logsumexp` entry. Observed float16 drift in torch-nightly: absolute Δ ≈ 2.3e-5 (limit 1e-5), relative Δ ≈ 2.3e-3 (limit 1e-3). A minimal sketch of such a drift check follows this list.
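The following is an illustrative sketch, not code from the PR: it measures float16 `logsumexp` drift against a float64 reference, and assumes the `tolerance` tuple is ordered `(rtol, atol)` as in `torch.testing.assert_close`.

```python
import torch

# Hypothetical drift check: compare float16 logsumexp against a
# float64 reference computed on the same values.
x = torch.randn(8, 256, dtype=torch.float64)
ref = torch.logsumexp(x, dim=-1)
fp16 = torch.logsumexp(x.to(torch.float16), dim=-1).to(torch.float64)

abs_err = (fp16 - ref).abs().max().item()
rel_err = ((fp16 - ref).abs() / ref.abs().clamp_min(1e-12)).max().item()
print(f"max abs err {abs_err:.2e}, max rel err {rel_err:.2e}")

# The PR's new tolerance entry, assuming (rtol, atol) ordering:
torch.testing.assert_close(fp16, ref, rtol=2e-2, atol=1e-4)
```

Loosening only the float16 entry keeps the stricter default tolerances for float32 and float64, so the change is scoped to the dtype where the drift was observed.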