solver doesn't realize that `z > x && z < y` is unsat with the fact `x == y` #62215

hallo-wereld · 2023-04-18T15:32:24Z

version: Clang trunk
args:

--analyze -Xclang -analyzer-stats -Xclang -analyzer-checker=core,debug.ExprInspection -Xanalyzer -analyzer-constraints=z3 --analyzer-output text

Clang Static Analyzer doesn't realize that z > x && z < y is unsatisfiable with the fact x == y. See it live: https://godbolt.org/z/Te3d9GPM6

The text was updated successfully, but these errors were encountered:

llvmbot · 2023-04-18T15:39:29Z

@llvm/issue-subscribers-clang-static-analyzer

EugeneZelenko · 2023-04-20T01:37:16Z

@hallo-wereld: Please elaborate.

hallo-wereld · 2023-04-20T02:30:14Z

@hallo-wereld: Please elaborate.

Sorry,

steakhal · 2023-08-26T14:05:35Z

Confirmed: https://godbolt.org/z/q6W8fhr67

int top(int x, int y, int z) {
    assert(x == y);
    assert(z > x);
    assert(z < y);
    // This should be dead code.
    clang_analyzer_warnIfReached();
}

We could probably get this work if we would accept some runtime cost.
Given that usually eqclasses are not large, it shouldn't impose large runtime overhead in general.
Here is what we could do:
If we have X RELOP Y,
For all X' eqclass of X, and Y' eqclass of Y: try to resolve X' RELOP Y'

0x21af · 2023-10-26T08:12:46Z

Confirmed: https://godbolt.org/z/q6W8fhr67
int top(int x, int y, int z) {
    assert(x == y);
    assert(z > x);
    assert(z < y);
    // This should be dead code.
    clang_analyzer_warnIfReached();
}
We could probably get this work if we would accept some runtime cost. Given that usually eqclasses are not large, it shouldn't impose large runtime overhead in general. Here is what we could do: If we have X RELOP Y, For all X' eqclass of X, and Y' eqclass of Y: try to resolve X' RELOP Y'

That idea does sound promising! Thanks :)

tianxinghe · 2023-10-30T14:36:42Z

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

steakhal · 2023-10-30T16:13:38Z

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

tianxinghe · 2023-10-30T16:24:46Z

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you！

steakhal · 2023-10-30T16:40:59Z

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you！

Let me explain.
CSA maps symbols to equalivalence classes (eqclasses), and then maps eqclasses to range sets as constraints.
The rationale is that if x is now known to be equal to y, everything that they were equal to should be all equal and share the constraints we learned. This is achieved by intersecting the range sets of the merged eqclasses.

Now, if we learn that x == y, then learn that z > x; and then try to evaluate the expression z < y one would expect that we recognize that y could be substituted by x (or any other member of its eqclass). And AFAIK we would successfully recognize the pattern z < x when simplifying the expression and eliminate that path [1]. However, since we don't do such substitution, the pattern won't match and the path won't be eliminated.
What I suggested was to not try to improve the simplification of relational operators, but rather brute-force try to use each member of the eqclass and try each combination. (yes, N*M combinations, where X RELOP Y, N := |eqclass(X)|, M := |eqclass(Y)|, arguing that usually, eqclasses are really small.

[1]: You can see that if you replace assert(z < y); by assert(z < x);, we would successfully eliminate the path, and the FP.
It is because we have seen z > x before; thus z < x can't be also true; and this is recognized by some pattern when the z < x is simplified.

tianxinghe · 2023-10-30T17:18:42Z

Thank you for such a detailed explanation. I have a fever and am seeing a doctor. I will read it carefully later. Thank you again!

…

---Original--- From: "Balazs ***@***.***> Date: Tue, Oct 31, 2023 00:41 AM To: ***@***.***>; Cc: "Tianxing ***@***.******@***.***>; Subject: Re: [llvm/llvm-project] solver doesn't realize that `z > x && z < y`is unsat with the fact `x == y` (Issue #62215) Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit? In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question. I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you！ Let me explain. CSA maps symbols to equalivalence classes (eqclasses), and then maps eqclasses to range sets as constraints. The rationale is that if x is now known to be equal to y, everything that they were equal to should be all equal and share the constraints we learned. Now, if we learn that x == y, then learn that z > x; and then try to evaluate the expression z < y one would expect that we recognize that y could be substituted by x (or any other member of its eqclass). And AFAIK we would successfully recognize the pattern z < x when simplifying the expression and eliminate that path [1]. However, since we don't do such substitution, the pattern won't match and the path won't be eliminated. What I suggested was to not try to improve the simplification of relational operators, but rather brute-force try to use each member of the eqclass and try each combination. (yes, N*M combinations, where X RELOP Y, N := |eqclass(X)|, M := |eqclass(Y)|`, arguing that usually, eqclasses are really small. [1]: You can see that if you replace assert(z < y); by assert(z < x);, we would successfully eliminate the path, and the FP. It is because we have seen z > x before; thus z < x can't be also true; and this is recognized by some pattern when the z < x is simplified. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: ***@***.***>

martong · 2023-11-03T17:46:38Z

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you！

Let me explain. CSA maps symbols to equalivalence classes (eqclasses), and then maps eqclasses to range sets as constraints. The rationale is that if x is now known to be equal to y, everything that they were equal to should be all equal and share the constraints we learned. This is achieved by intersecting the range sets of the merged eqclasses.

Now, if we learn that x == y, then learn that z > x; and then try to evaluate the expression z < y one would expect that we recognize that y could be substituted by x (or any other member of its eqclass). And AFAIK we would successfully recognize the pattern z < x when simplifying the expression and eliminate that path [1]. However, since we don't do such substitution, the pattern won't match and the path won't be eliminated. What I suggested was to not try to improve the simplification of relational operators, but rather brute-force try to use each member of the eqclass and try each combination. (yes, N*M combinations, where X RELOP Y, N := |eqclass(X)|, M := |eqclass(Y)|, arguing that usually, eqclasses are really small.

[1]: You can see that if you replace assert(z < y); by assert(z < x);, we would successfully eliminate the path, and the FP. It is because we have seen z > x before; thus z < x can't be also true; and this is recognized by some pattern when the z < x is simplified.

I think you can get this done within O(N) steps (N := |eqclass(X)|). When you assume z > x then in the constraint solver you assign the not-null range ([[-inf, -1], [1, inf]]) to z > x. Then when you assume z < y, you can build the symbol z < S for each element in the eq class of x. Query (infer) the ranges to all z < S and you should find different results which means there is a contradiction, you can return the infeasible (null) state.

…dictions The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`, then check the eqclasses of X and Y respectively and for `X' RELOP Y'` SymSymExprs and try to infer their ranges. If there is no contradiction with any of the equivalent alternatives, then intersecting all these RangeSets should never be empty - aka. there should be a value satisfying the constraints we have. It costs around `|eqclass(X)| + |eqclass(y)|`. The approach has its limitations, as demonstrated by `gh_62215_contradicting_nested_right_equivalent`, where we would need to apply the same logic, but on a sub-expression of a direct operand. Before the patch, line 90, 100, and 112 would be reachable; and become unreachable after this. Line 127 will remain still reachable, but keep in mind that when cross-checking with Z3 (aka. Z3 refutation), then all 4 reports would be eliminated. The idea comes from llvm#62215 (comment) Fixes llvm#62215

steakhal · 2023-11-04T14:17:08Z

@martong Thanks for the idea, when I read it it clicked and I could not resist experimenting with it today.
Luckily, after an afternoon, I have a patch for it.

github-actions bot added the new issue label Apr 18, 2023

EugeneZelenko added clang:static analyzer and removed new issue labels Apr 18, 2023

hallo-wereld closed this as completed Apr 20, 2023

hallo-wereld reopened this Apr 20, 2023

hallo-wereld mentioned this issue Apr 20, 2023

clang static analyzer make an imprecise assumption for x #62181

Closed

steakhal added the confirmed Verified by a second party label Aug 26, 2023

steakhal changed the title ~~clang static analyzer doesn't realize that z > x && z < y is unsatisfiable with the fact x == y~~ solver doesn't realize that z > x && z < y is unsat with the fact x == y Aug 26, 2023

steakhal linked a pull request Nov 4, 2023 that will close this issue

[analyzer][solver] On SymSym RelOps, check EQClass members for contradictions #71284

Open

0x21af mentioned this issue Nov 7, 2023

solver doesn't realize that z > x && z < y is unsat with the fact x == y Geoffrey1014/SA_Bugs#76

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

solver doesn't realize that `z > x && z < y` is unsat with the fact `x == y` #62215

solver doesn't realize that `z > x && z < y` is unsat with the fact `x == y` #62215

hallo-wereld commented Apr 18, 2023 •

edited

llvmbot commented Apr 18, 2023

EugeneZelenko commented Apr 20, 2023

hallo-wereld commented Apr 20, 2023

steakhal commented Aug 26, 2023

0x21af commented Oct 26, 2023 •

edited

tianxinghe commented Oct 30, 2023

steakhal commented Oct 30, 2023

tianxinghe commented Oct 30, 2023

steakhal commented Oct 30, 2023 •

edited

tianxinghe commented Oct 30, 2023 via email

martong commented Nov 3, 2023

steakhal commented Nov 4, 2023

solver doesn't realize that z > x && z < y is unsat with the fact x == y #62215

solver doesn't realize that z > x && z < y is unsat with the fact x == y #62215

Comments

hallo-wereld commented Apr 18, 2023 • edited

llvmbot commented Apr 18, 2023

EugeneZelenko commented Apr 20, 2023

hallo-wereld commented Apr 20, 2023

steakhal commented Aug 26, 2023

0x21af commented Oct 26, 2023 • edited

tianxinghe commented Oct 30, 2023

steakhal commented Oct 30, 2023

tianxinghe commented Oct 30, 2023

steakhal commented Oct 30, 2023 • edited

tianxinghe commented Oct 30, 2023 via email

martong commented Nov 3, 2023

steakhal commented Nov 4, 2023

solver doesn't realize that `z > x && z < y` is unsat with the fact `x == y` #62215

solver doesn't realize that `z > x && z < y` is unsat with the fact `x == y` #62215

hallo-wereld commented Apr 18, 2023 •

edited

0x21af commented Oct 26, 2023 •

edited

steakhal commented Oct 30, 2023 •

edited