Fix RemoveDuplicateCastTransformer incorrectly eliminating lossy Cast chains targeting bool #28102
… chains Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/b2fb07f8-4b7f-4c42-90af-06406e642903 Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
@copilot, please also consider cases like the one below. Although the first cast is a lossy conversion, the chain can still be optimized to Cast(float->int32): Cast(float->float16)->Cast(float16->int32). Could you analyze all combinations and find rules to loosen the constraint? |
…s bool Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/e270d8c9-47ab-4b29-bc13-d85613154d30 Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Refined the constraint in 7ebf3be. It now blocks removal only when loss_precision_cast is true AND any kept child Cast targets bool.
Analysis: The bool destination is uniquely problematic because its conversion semantics (non-zero → true, zero → false) interact badly with any lossy intermediate cast that can map non-zero values to zero.
For non-bool destinations, lossy intermediates may change specific boundary values but the overall behavior is acceptable (rounding differences, same saturation behavior).
Added tests for both the reviewer's examples (verifying they ARE optimized) and the original bug case (verifying it's blocked). |
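The analysis above can be illustrated with a minimal C++ sketch. It assumes plain C++ conversions are an adequate stand-in for ONNX Cast on in-range values; the function names are hypothetical and are not part of onnxruntime.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; these are not onnxruntime APIs.
// Assumption: inputs stay within the int32 range, where float -> integer
// conversion truncates toward zero.

// Direct conversion, standing in for Cast(float -> bool): non-zero is true.
inline bool CastFloatToBool(float x) { return x != 0.0f; }

// Two-step conversion, standing in for Cast(float -> int32) -> Cast(int32 -> bool).
// The intermediate truncation can turn a small non-zero value into zero.
inline bool CastFloatToInt32ToBool(float x) {
  const std::int32_t truncated = static_cast<std::int32_t>(x);
  return truncated != 0;
}

// Non-bool destination for comparison:
// Cast(float -> int64) -> Cast(int64 -> int32) versus Cast(float -> int32).
inline std::int32_t CastFloatToInt64ToInt32(float x) {
  return static_cast<std::int32_t>(static_cast<std::int64_t>(x));
}

inline std::int32_t CastFloatToInt32(float x) {
  return static_cast<std::int32_t>(x);
}
```

For an input such as 0.5f, the direct path yields true while the two-step path yields false, which is the behavior change the guard prevents. For in-range inputs, the two int32 paths agree, which is why collapsing that chain is considered safe.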
Verified the refined constraint in 7ebf3be. Built and ran all 24 Analysis of the three guard conditions and how they interact: The
Why This chain is not directly optimized by the Why Here Why
We can add this, but my concern is we're adding complexity for a contrived example. Has this issue ever affected a production model? |
I searched Transformers modeling code and did not find a direct float->int->bool source pattern. I did find many production model families, especially DETR-style vision models, that do float mask interpolation followed by bool conversion. So float-to-bool mask conversion is common in real model code, but the specific intermediate integer Cast in #28089 is still only proven by the minimal repro unless we can show an exported ONNX graph containing that exact chain. |
I'll sign off and leave it up to you to decide if we need this. It's not too much new code so the binary size cost shouldn't be high, and I assume any future edits will be done by AI so the extra complexity shouldn't cost developer time to understand it. |
Coverage gaps (nice-to-have, not blockers):
Fan-out case: float→int32 with two children: int32→bool and int32→int64. Should block removal since one child targets bool. Not tested.
Minor style nit
There are trailing whitespace characters on a few lines in the production code (lines ~442, ~475). These are cosmetic but may be flagged by lint CI: // that can map non-zero values to zero, changing the semantics. (trailing spaces) |
Description
`RemoveDuplicateCastTransformer` removes the first Cast when all consumers are also Cast nodes, but didn't check whether the first Cast is a lossy conversion whose result feeds a Cast to bool. This caused `Cast(float→int32) → Cast(int32→bool)` to collapse into `Cast(float→bool)`, changing truncation semantics.

Fix: Block removal of the first Cast only when `loss_precision_cast` is true AND any kept child Cast targets `bool`. Bool conversion (non-zero → true, zero → false) is uniquely problematic because any lossy intermediate cast that maps non-zero values to zero changes the boolean result. For non-bool destinations, lossy intermediates may change specific boundary values but the overall behavior is acceptable.

This targeted constraint still allows optimization of lossy chains with non-bool destinations:
- `Cast(float→float16) → Cast(float16→int32)` → `Cast(float→int32)` ✓ (optimized)
- `Cast(float→int64) → Cast(int64→int32)` → `Cast(float→int32)` ✓ (optimized)
- `Cast(float→int32) → Cast(int32→bool)` → blocked ✓ (preserved)

Changes:
- `onnxruntime/core/optimizer/insert_cast_transformer.cc`: added `any_child_casts_to_bool` check using `GetTypeGroup()` + updated comments
- `onnxruntime/test/framework/insert_cast_transformer_test.cc`: regression test `CastFloatToIntToBoolNotFused` + two tests verifying lossy chains with non-bool destinations are still optimized (`LossyCastChainWithNonBoolDestIsOptimized`, `LossyCastFloatToInt64ToInt32IsOptimized`)

Motivation and Context
The graph optimizer's cast deduplication assumed `(high precision → low precision → lower precision)` chains could always be collapsed. This is incorrect when the intermediate type introduces truncation that affects downstream bool semantics, specifically float→integer truncation before a bool cast, where small non-zero floats truncate to zero and then convert to `false` instead of `true`.

The fix is scoped to bool destinations because bool's zero/non-zero test is uniquely sensitive to truncation. Other destination types tolerate the small value differences from removing lossy intermediates (rounding differences, consistent saturation behavior).
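The guard rule described in this PR can be sketched in isolation as follows. This is a simplified illustration, not the code in `insert_cast_transformer.cc`: the `ElemType` enum and the `MustKeepFirstCast` function are hypothetical, and the real check is described as using `GetTypeGroup()` on graph nodes rather than a plain list of types.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for tensor element types; not the onnxruntime enum.
enum class ElemType { kBool, kInt32, kInt64, kFloat16, kFloat };

// Returns true when the first Cast in a chain should be kept.
// loss_precision_cast: whether the first Cast is a lossy conversion.
// kept_child_dst_types: destination types of the child Casts that remain.
// Assumption: removal is blocked only when both conditions hold.
inline bool MustKeepFirstCast(bool loss_precision_cast,
                              const std::vector<ElemType>& kept_child_dst_types) {
  const bool any_child_casts_to_bool =
      std::any_of(kept_child_dst_types.begin(), kept_child_dst_types.end(),
                  [](ElemType t) { return t == ElemType::kBool; });
  return loss_precision_cast && any_child_casts_to_bool;
}
```

Under this sketch, a lossy first Cast with a single int32 child is still removable, while a fan-out with one bool child and one int64 child keeps the first Cast, because a single bool consumer is enough to block removal.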