Add solve fuzzer and fix the soundness bugs it surfaced#9105
Add solve fuzzer and fix the soundness bugs it surfaced#9105alexreinking wants to merge 2 commits intomainfrom
Conversation
Adds a libfuzzer harness at test/fuzz/solve.cpp that exercises
solve_expression (semantic equivalence under random substitutions),
solve_for_inner_interval / solve_for_outer_interval (inside-inner
implies true; outside-outer implies false), and
and_condition_over_domain (weakened condition implies input) on
random expression trees up to depth 6 across Int(8..64), UInt(1..64),
Float(32), Float(64), with broadcasts and ramps enabled. The
random_expr_generator now skips bitwise ops on float types, which
would otherwise abort the fuzzer on construction.
Alongside the fuzzer, this fixes every soundness bug the fuzzer
surfaced while building it up to the wide type set:
Solve.cpp (src/Solve.cpp):
- visit_cmp rewrote `a + b @ c` to `a @ c - b` for every
comparison, but the subtraction wrap flips ordering (LT/LE/GT/GE)
under modular arithmetic. Restrict ordering rewrites to
no_overflow_int; EQ/NE stay (modular equivalence preserves
equality).
- visit_cmp's Mul rewrites `a*c @ b <=> a @ b/c && b%c == 0`
computed b/c and b%c for any shape of c, introducing div-by-zero
UB when c was a non-constant that evaluated to zero. The code
comment claimed it only fired on constant positive multipliers;
move the EQ/NE cases inside that guard and also require
no_overflow_int. Narrow signed types (where overflow is still UB)
go through the same gate.
- visit(Add)'s `a + a -> a * 2` used `Expr * int`, which aborts on
UInt(1) ("Integer constant 2 ... coerced to type uint1"). Switch
to Mul::make(a, make_const(a.type(), 2)); make_const truncates
modulo width, which is the correct modular value.
- visit(Add)/visit(Sub) `f(x)/a + g(x) -> (f(x) + g(x)*a)/a` only
holds under non-wrapping arithmetic. Gate on no_overflow_int.
- Canonicalize two inlined `a.type().is_int() && bits() >= 32`
predicates to no_overflow_int.
Bounds.cpp (src/Bounds.cpp):
- visit(Cast) took a blanket "no overflow possible" shortcut for
signed-int/at-least-int32 destinations regardless of the source,
which silently wrapped UInt(32)/UInt(64) -> Int(32) narrowings
and carried a wrapped-negative as "max". Restrict the shortcut
to non-unsigned sources.
- After the first restriction, the fuzzer also found that *bounded*
signed-int sources can exceed the destination and wrap (e.g.
int64 bounds [59, 4294967287] becoming int32 [59, -9], an
inverted interval that downstream treats as "always true").
Further restrict the shortcut to unbounded signed-int sources;
let bounded ones fall through to the existing
can_prove(cast(from, cast(to, x)) == x) fit check.
- Later, the fuzzer (with floats enabled) caught the same flavor
of bug for float sources: the comment claimed "float-to-int
casts truncate in place and preserve interval orientation", but
IntImm::make wraps by sign-extending low bits, and two float
endpoints can wrap to opposite sides of zero (inverting the
interval). Drop the float shortcut entirely; the bounded-a path
already does a sound to.can_represent(*fmin) fit check with a
fallback to bounds_of_type(to).
- visit(Mod) unconditionally claimed the result is in [0, ...],
which matches integer mod but not fmod (sign of the dividend;
NaN on zero divisor). For float types, return
bounds_of_type(t) (unbounded).
Simplify_Cast.cpp (src/Simplify_Cast.cpp):
- A nested-cast simplification stripped the inner cast when the
outer was narrower, on the grounds that the inner is always a
sign or zero extend. That assumption only holds when the inner
cast's source is itself int-or-uint; for a float source, the
inner is an fp-to-int conversion and stripping it changes the
value. Concretely, `int32(uint64(float64(-21)))` evaluates to 0
(fp-to-unsigned of negative is implementation-defined, often 0),
but the stripped `int32(float64(-21))` evaluates to -21. Add
the `cast->value.type().is_int_or_uint()` guard, mirroring the
sibling rule above it.
Each of the solver and bounds fixes has a companion regression test
in test/correctness/solve.cpp that, prior to the fix, reproduces the
specific failure mode (verified by reverting each fix individually).
With all fixes in place, the fuzzer runs clean for 250k+ iterations
across the full type set; the correctness suite (solve, bounds,
simplify, cast) still passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit dropped the float-source shortcut from visit(Cast) to fix the fuzzer-found unsoundness where constant float bounds outside int range wrapped via IntImm::make and inverted the resulting int interval. That was too broad: bilateral_grid (and similar pipelines) produce SYMBOLIC float bounds like select(r_sigma > 0, 0/r_sigma, 1/r_sigma) + 0.5 at bounds-inference time, which can't be proven to fit in int range but are load-bearing for downstream BoundsInference to resolve Func call extents -- without them, "histogram calls histogram in an unbounded way in dimension 2" and the build fails. Restore the float-source shortcut for bounded-a, but reject it only when an endpoint is a float *constant* we can prove exceeds the int range. Symbolic bounds pass through (matching Halide's convention that out-of-range float-to-int is the programmer's problem); constant bounds that would wrap do not. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| could_overflow = false; | ||
| } else if (from.is_float() && to.is_int() && to.bits() >= 32) { | ||
| // Float source to int32+: out-of-range float-to-signed-int | ||
| // casts are implementation-defined, so Halide's convention |
There was a problem hiding this comment.
This is slightly misleading. Float to signed int casts in Halide saturate for bits <= 16.
|
|
||
| // Mod is always positive | ||
| if (t.is_float()) { | ||
| // fmod's result takes the sign of the dividend (so it can be |
There was a problem hiding this comment.
Mod in Halide is supposed to be positive. For integers we have the Euclidean identity:
(a / b) * b + (a % b) == a
The division rounds up when b is negative, so the LHS of the mul is larger than the true answer, which means, when multiplied by the negative b, the LHS of the add is smaller than the true answer, which means mod must be positive.
I think the bug might be in lowering of floating point mod. It's currently lowered as:
a - floor(a / b) * b
but it should be
a - floor((a/b) * b)
There was a problem hiding this comment.
Thinking about this more, my rationale is this: Floating point mod should coincide with integer mod if the floats happen to be integers. This means if 7 % -2 == 1 in the integers 7.f % -2.f should be 1.f in floats.
There was a problem hiding this comment.
Floating point mod should coincide with integer mod if the floats happen to be integers.
I would be wary of deviating from what the hardware does (fmod) for this.
There was a problem hiding this comment.
We've already deviated for integer div. We should be consistent.
| // a float source makes `cast` an fp-to-int conversion, whose low | ||
| // bits are not the same as an fp-to-int conversion of a narrower | ||
| // type. For example, int32(uint64(float64(-21))) evaluates to 0 | ||
| // (undefined-behavior territory for negative float-to-unsigned) |
There was a problem hiding this comment.
I don't think casting a negative float to uint is UB in Halide. It's supposed to saturate.
There was a problem hiding this comment.
Though again, my understanding may not match the current lowering!
| // for the fit check below to prove, and the destination is | ||
| // wide enough that any concretization will fit. | ||
| // | ||
| // Unsigned sources are excluded unconditionally: unsigned |
There was a problem hiding this comment.
I don't think this is true either. A uint to int cast is assumed to not overflow.
There was a problem hiding this comment.
See #7814 for what happened when I tried to claim they wrap.
| // Several of these rewrites rearrange the comparison by adding | ||
| // or subtracting on both sides. That's only sound under an | ||
| // assumption of no integer overflow -- for types that may wrap | ||
| // (unsigned, or narrow signed whose overflow is UB and could |
There was a problem hiding this comment.
"narrow signed whose overflow is UB" Not entirely sure what narrow means here but our narrow signed ints don't have UB on overflow.
| // uint8 with c = 3, b = 7: the rewrite misses the | ||
| // solutions that arise from wrap). | ||
| // | ||
| // Don't use operator/ and operator % to sneak past the |
There was a problem hiding this comment.
I think the original comment was stale. Division and mod by zero is now fine (returns zero). For the rewrite I think this means:
a * 0 == b <=> a == b / 0
which is wrong, so it's just the comment that needs correcting.
I'm worried that the earlier comment says we would only use this for positive or negative constants, but there's no guard to check that. It might mean that we don't statically know at this point that it's a non-zero constant, but the usage site knows that for some other reason. That's sort of horrifying so I hope it isn't the case.
| @@ -644,16 +672,14 @@ class SolveExpression : public IRMutator { | |||
| } else if (is_ne) { | |||
| // f(x) * c != b -> f(x) != b/c || b%c != 0 | |||
There was a problem hiding this comment.
Oh I see the other rules were guarded by is_positive_const, just not this one, so it's not the horrifying situation. Let's check
f(x) * 0 == b <=> f(x) == b/0 || b%0 == 0
Simplifying both sides
b == 0 <=> true
Nope. The NE case is also busted.
There was a problem hiding this comment.
(meaning the code change here is correct)
| @@ -0,0 +1,404 @@ | |||
| #include "Halide.h" | |||
There was a problem hiding this comment.
The solver tests are currently in Solve.cpp at the bottom. It would be good to move all of them here, instead of just making a new collection.
| // Returns true if the expression, under the given substitution, contains a | ||
| // division or modulo whose divisor simplifies to zero. Solve is allowed to | ||
| // rearrange such expressions in ways that cause the simplifier to fold the UB | ||
| // inconsistently between the original and rewritten form -- so samples that |
| val = random_int_val(reg.fuzz, -32, 32); | ||
| } | ||
|
|
||
| // Skip samples that invoke div/mod-by-zero UB in the input: the |
| // extends the low bits and can invert the resulting | ||
| // interval, which is worse than just using bounds_of_type. | ||
| Interval s = a; | ||
| if (s.has_lower_bound()) s.min = simplify(s.min); |
There was a problem hiding this comment.
There is a method elsewhere in this file that simplifies whole intervals. It contains a critical difference to this: It only calls the simplifier once if max.same_as(min). This guarantees the two don't become two different but equal Expr objects, which would cause is_single_point to fail.
| double lo = -std::ldexp(1.0, to.bits() - 1); | ||
| double hi_exclusive = std::ldexp(1.0, to.bits() - 1); | ||
| bool const_oob = false; | ||
| if (s.has_lower_bound()) { |
There was a problem hiding this comment.
This path may break pipelines that rely on Bounds inference "knowing" that cast<int>(abs(unbounded_float)) is non-negative. I really think we need to just throw up our hands here if the cast is to an int32 and assume it's not going to overflow. People use all sorts of types cast to int to index luts, and they assume the bounds on the original are still true through the cast.
|
I think, unfortunately, like other fuzzers this one is going to have to carefully avoid getting itself into a situation where an Int(32) Expr overflows. |
| if (t.is_float()) { | ||
| // fmod's result takes the sign of the dividend (so it can be | ||
| // negative), its magnitude is bounded by |b| but can reach it, | ||
| // and fmod(x, 0) = NaN which is not in any real interval. Don't |
There was a problem hiding this comment.
I think we're assuming no-nans here. Otherwise it would be a hypothetical "strict_mod" intrinsic.
| // eliminated. The inner cast is either a sign extend | ||
| // or a zero extend, and the outer cast truncates the extended bits | ||
| // or a zero extend, and the outer cast truncates the extended bits. | ||
| // The requirement that cast->value is itself int-or-uint is crucial: |
There was a problem hiding this comment.
Comments should not spend this much space arguing why a hypothetical alternative version of this would be buggy. After the first sentence it's enough to just say that everything involved needs to be an int.
| Interval s = a; | ||
| if (s.has_lower_bound()) s.min = simplify(s.min); | ||
| if (s.has_upper_bound()) s.max = simplify(s.max); | ||
| double lo = -std::ldexp(1.0, to.bits() - 1); |
There was a problem hiding this comment.
Aside: I think we should consider allowing one-line if statements.
Summary
test/fuzz/solve.cppforsolve_expression,solve_for_{inner,outer}_interval, andand_condition_over_domain, running over random expression trees up to depth 6 acrossInt(8..64),UInt(1..64),Float(32),Float(64), with broadcasts and ramps enabled.Solve.cpp, four inBounds.cpp, and one inSimplify_Cast.cpp— each with a companion regression test intest/correctness/solve.cppthat reproduces the failure without the fix.Solve.cpp
visit_cmpa + b @ c -> a @ c - bflipped ordering under modular wrap; restrict tono_overflow_int, keep EQ/NE.visit_cmpMul rewrites introduced div-by-zero UB on non-constant multipliers; move EQ/NE cases under the positive-const guard and gate onno_overflow_int.visit(Add)a + a -> a * 2aborted onUInt(1); switch toMul::make(a, make_const(a.type(), 2)).visit(Add)/visit(Sub)f(x)/a + g(x) -> (f(x) + g(x)*a)/awrapped on narrow types; gate onno_overflow_int.is_int() && bits() >= 32tono_overflow_int.Bounds.cpp
visit(Cast)shortcut silently wrapped unsigned narrowings to signed (UInt(64) -> Int(32)carried a wrapped-negative as max); restrict to non-unsigned sources.int64 [59, 4294967287] -> int32 [59, -9]); further restrict the shortcut to unbounded signed-int sources and let bounded ones fall through to the existingcan_prove(cast(from, cast(to, x)) == x)fit check.to.can_represent(*fmin)fit check.visit(Mod)claimed[0, ...]even for floats;fmodtakes the sign of the dividend and is NaN on zero divisor, so returnbounds_of_type(t)for float types.Simplify_Cast.cpp
int32(uint64(float64(-21)))) stripping changes the value. Add theis_int_or_uint()guard.Test plan
ctest -R 'correctness_(solve|bounds|simplify)$'passestest/fuzz/fuzz_solve -runs=10000across multiple seeds with floats enabled runs cleanFixes #9101
🤖 Generated with Claude Code