
Fix RemoveDuplicateCastTransformer incorrectly eliminating lossy Cast chains targeting bool #28102

Open

Copilot wants to merge 5 commits into main from copilot/fix-ort-enable-all-cast-issue

Conversation

Contributor

Copilot AI commented Apr 16, 2026

Description

RemoveDuplicateCastTransformer removes the first Cast when all of its consumers are also Cast nodes, but it did not check whether that first Cast is a lossy conversion feeding a bool destination. This caused Cast(float→int32) → Cast(int32→bool) to collapse into Cast(float→bool), changing truncation semantics:

float32(-0.1) → int32(0) → bool(false)   # correct: truncation zeros the value
float32(-0.1) → bool(true)               # wrong: non-zero maps to true

Fix: Block removal of the first Cast only when loss_precision_cast is true AND any kept child Cast targets bool. Bool conversion (non-zero → true, zero → false) is uniquely problematic because any lossy intermediate cast that maps non-zero values to zero changes the boolean result. For non-bool destinations, lossy intermediates may change specific boundary values but the overall behavior is acceptable.

This targeted constraint still allows optimization of lossy chains with non-bool destinations:

  • Cast(float→float16) → Cast(float16→int32) ⇒ Cast(float→int32) ✓ (optimized)
  • Cast(float→int64) → Cast(int64→int32) ⇒ Cast(float→int32) ✓ (optimized)
  • Cast(float→int32) → Cast(int32→bool) → blocked ✓ (preserved)

Changes:

  • onnxruntime/core/optimizer/insert_cast_transformer.cc — added any_child_casts_to_bool check using GetTypeGroup() + updated comments
  • onnxruntime/test/framework/insert_cast_transformer_test.cc — regression test CastFloatToIntToBoolNotFused + two tests verifying lossy chains with non-bool destinations are still optimized (LossyCastChainWithNonBoolDestIsOptimized, LossyCastFloatToInt64ToInt32IsOptimized)

Motivation and Context

The graph optimizer's cast deduplication assumed (high precision → low precision → lower precision) chains could always be collapsed. This is incorrect when the intermediate type introduces truncation that affects downstream bool semantics — specifically float→integer truncation before a bool cast, where small non-zero floats truncate to zero and then convert to false instead of true.

The fix is scoped to bool destinations because bool's zero/non-zero test is uniquely sensitive to truncation. Other destination types tolerate the small value differences from removing lossy intermediates (rounding differences, consistent saturation behavior).

@tianleiwu
Contributor

@copilot, please also consider cases like the ones below. Although the first Cast is a lossy conversion, each chain can still be optimized to Cast(float->int32).

Cast(float->float16)->Cast(float16->int32)
Cast(float->int64)->Cast(int64->int32)

Could you analyze all the type combinations to find rules that loosen the constraint?

Copilot AI changed the title from "[WIP] Fix ORT_ENABLE_ALL optimizer casting issue" to "Fix RemoveDuplicateCastTransformer incorrectly eliminating lossy Cast chains" on Apr 16, 2026
Copilot AI requested a review from tianleiwu April 16, 2026 22:26
Contributor Author

Copilot AI commented Apr 16, 2026

@copilot, please also consider cases like the ones below. Although the first Cast is a lossy conversion, each chain can still be optimized to Cast(float->int32).

Cast(float->float16)->Cast(float16->int32)
...

Refined the constraint in 7ebf3be. Now only blocks removal when loss_precision_cast && any_child_casts_to_bool.

Analysis: The bool destination is uniquely problematic because its conversion semantics (non-zero → true, zero → false) interact badly with any lossy intermediate cast that can map non-zero values to zero:

  • float(-0.1) → int32(0) → bool(false) vs float(-0.1) → bool(true) — WRONG
  • double(1e-46) → float(0) → bool(false) vs double(1e-46) → bool(true) — WRONG

For non-bool destinations, lossy intermediates may change specific boundary values but the overall behavior is acceptable (rounding differences, same saturation behavior):

  • Cast(float→float16) → Cast(float16→int32) ⇒ Cast(float→int32): allowed
  • Cast(float→int64) → Cast(int64→int32) ⇒ Cast(float→int32): allowed
  • Cast(float→int32) → Cast(int32→bool): blocked

Added tests for both the reviewer's examples (verifying they ARE optimized) and the original bug case (verifying it's blocked).

@tianleiwu
Contributor

Verified the refined constraint in 7ebf3be. Built and ran all 24 TransformerTest.* tests locally — all pass, including the three new regression tests.

Analysis of the three guard conditions and how they interact:

The RemoveDuplicateCastTransformer has three independent gates that can block removal of the first Cast:

  1. inconsistent_casts — blocks high→low→high bit-count patterns (e.g., float32→float16→float32). This is the pre-existing check.
  2. cross_ep — blocks fusion across different execution providers (added in #27291, "[WebGPU] Failed to find kernel for Cast(13) for WebGpuExecutionProvider").
  3. loss_precision_cast && any_child_casts_to_bool — blocks lossy first cast when any child targets bool (the new check in this PR).

Why Cast(float→float16)→Cast(float16→int32) is optimized:

This chain is not directly optimized by the RemoveDuplicateCastTransformer (the inconsistent_casts check blocks it because UnsafeCast(int32, float16) = true). Instead, the InsertCastTransformer first inserts a float16→float32 cast before Cast_2 (since Cast_2's float16 input needs CPU float32 handling), creating float→float16→float32→int32. Then RemoveDuplicateCastTransformer collapses the A→B→A pattern (float→float16→float32) and removes the dangling first Cast, leaving float→int32.

Why Cast(float→int64)→Cast(int64→int32) is optimized:

Here inconsistent_casts = false (because UnsafeCast(int32, int64) returns false — 32 > 64 is false). No float16 inputs, so InsertCast does nothing. The cast_nodes_to_keep path runs, and since any_child_casts_to_bool = false (int32 is not bool), the first Cast is removed.

Why Cast(float→int32)→Cast(int32→bool) is blocked:

inconsistent_casts = false, but loss_precision_cast = true (float→int) and any_child_casts_to_bool = true (bool destination). The guard !(loss_precision_cast && any_child_casts_to_bool) evaluates to false, correctly preventing float(-0.1)→int32(0)→bool(false) from collapsing to float(-0.1)→bool(true).

@tianleiwu tianleiwu marked this pull request as ready for review April 16, 2026 23:06
@tianleiwu tianleiwu requested a review from skottmckay April 16, 2026 23:19
Copilot AI changed the title from "Fix RemoveDuplicateCastTransformer incorrectly eliminating lossy Cast chains" to "Fix RemoveDuplicateCastTransformer incorrectly eliminating lossy Cast chains targeting bool" on Apr 16, 2026
Copilot AI requested a review from tianleiwu April 16, 2026 23:22
@skottmckay
Contributor

We can add this, but my concern is we're adding complexity for a contrived example. Has this issue ever affected a production model?

@tianleiwu
Contributor

concern is we're adding complexity for a contrived example. Has this issue ever affected a production model?

I searched Transformers modeling code and did not find a direct float->int->bool source pattern. I did find many production model families, especially DETR-style vision models, that do float mask interpolation followed by bool conversion. So float-to-bool mask conversion is common in real model code, but the specific intermediate integer Cast in #28089 is still only proven by the minimal repro unless we can show an exported ONNX graph containing that exact chain.

@skottmckay
Contributor

concern is we're adding complexity for a contrived example. Has this issue ever affected a production model?

I searched Transformers modeling code and did not find a direct float->int->bool source pattern. I did find many production model families, especially DETR-style vision models, that do float mask interpolation followed by bool conversion. So float-to-bool mask conversion is common in real model code, but the specific intermediate integer Cast in #28089 is still only proven by the minimal repro unless we can show an exported ONNX graph containing that exact chain.

I'll sign off and leave it up to you to decide if we need this. It's not too much new code so the binary size cost shouldn't be high, and I assume any future edits will be done by AI so the extra complexity shouldn't cost developer time to understand it.

@yuslepukhin
Member

Coverage gaps (nice-to-have, not blockers):

  • Fan-out case: float→int32 with two children, int32→bool and int32→int64. Should block removal since one child targets bool. Not tested.
  • double→float→bool: another lossy-to-bool pattern; would be a cheap additional test.
  • Non-lossy chain to bool: e.g., int32→int64→bool should be optimized (lossless upcast). Not tested directly.

Minor style nit

There are trailing whitespace characters on a few lines in the production code (lines ~442, ~475). These are cosmetic but may be flagged by lint CI:

// that can map non-zero values to zero, changing the semantics. (trailing spaces)
auto kept_dst_type = kept_node.OutputDefs()[0]->Type(); (trailing spaces)

@microsoft deleted a comment from azure-pipelines bot on Apr 17, 2026

Development

Successfully merging this pull request may close these issues.

ORT_ENABLE_ALL incorrectly eliminates Cast(float->int32)->Cast(int32->bool) and changes semantics
