Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

solver doesn't realize that z > x && z < y is unsat with the fact x == y #62215

Open
hallo-wereld opened this issue Apr 18, 2023 · 12 comments · May be fixed by #71284
Open

solver doesn't realize that z > x && z < y is unsat with the fact x == y #62215

hallo-wereld opened this issue Apr 18, 2023 · 12 comments · May be fixed by #71284
Labels
clang:static analyzer confirmed Verified by a second party

Comments

@hallo-wereld
Copy link

hallo-wereld commented Apr 18, 2023

version: Clang trunk
args:

--analyze -Xclang -analyzer-stats -Xclang -analyzer-checker=core,debug.ExprInspection -Xanalyzer -analyzer-constraints=z3 --analyzer-output text

Clang Static Analyzer doesn't realize that z > x && z < y is unsatisfiable with the fact x == y. See it live: https://godbolt.org/z/Te3d9GPM6

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 18, 2023

@llvm/issue-subscribers-clang-static-analyzer

@EugeneZelenko
Copy link
Contributor

@hallo-wereld: Please elaborate.

@hallo-wereld
Copy link
Author

@hallo-wereld: Please elaborate.

Sorry,

@steakhal
Copy link
Contributor

Confirmed: https://godbolt.org/z/q6W8fhr67

int top(int x, int y, int z) {
    assert(x == y);
    assert(z > x);
    assert(z < y);
    // This should be dead code.
    clang_analyzer_warnIfReached();
}

We could probably get this work if we would accept some runtime cost.
Given that usually eqclasses are not large, it shouldn't impose large runtime overhead in general.
Here is what we could do:
If we have X RELOP Y,
For all X' eqclass of X, and Y' eqclass of Y: try to resolve X' RELOP Y'

@steakhal steakhal added the confirmed Verified by a second party label Aug 26, 2023
@steakhal steakhal changed the title clang static analyzer doesn't realize that z > x && z < y is unsatisfiable with the fact x == y solver doesn't realize that z > x && z < y is unsat with the fact x == y Aug 26, 2023
@0x21af
Copy link

0x21af commented Oct 26, 2023

Confirmed: https://godbolt.org/z/q6W8fhr67

int top(int x, int y, int z) {
    assert(x == y);
    assert(z > x);
    assert(z < y);
    // This should be dead code.
    clang_analyzer_warnIfReached();
}

We could probably get this work if we would accept some runtime cost. Given that usually eqclasses are not large, it shouldn't impose large runtime overhead in general. Here is what we could do: If we have X RELOP Y, For all X' eqclass of X, and Y' eqclass of Y: try to resolve X' RELOP Y'

That idea does sound promising! Thanks :)

@tianxinghe
Copy link

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

@steakhal
Copy link
Contributor

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

@tianxinghe
Copy link

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you!

@steakhal
Copy link
Contributor

steakhal commented Oct 30, 2023

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you!

Let me explain.
CSA maps symbols to equalivalence classes (eqclasses), and then maps eqclasses to range sets as constraints.
The rationale is that if x is now known to be equal to y, everything that they were equal to should be all equal and share the constraints we learned. This is achieved by intersecting the range sets of the merged eqclasses.

Now, if we learn that x == y, then learn that z > x; and then try to evaluate the expression z < y one would expect that we recognize that y could be substituted by x (or any other member of its eqclass). And AFAIK we would successfully recognize the pattern z < x when simplifying the expression and eliminate that path [1]. However, since we don't do such substitution, the pattern won't match and the path won't be eliminated.
What I suggested was to not try to improve the simplification of relational operators, but rather brute-force try to use each member of the eqclass and try each combination. (yes, N*M combinations, where X RELOP Y, N := |eqclass(X)|, M := |eqclass(Y)|, arguing that usually, eqclasses are really small.

[1]: You can see that if you replace assert(z < y); by assert(z < x);, we would successfully eliminate the path, and the FP.
It is because we have seen z > x before; thus z < x can't be also true; and this is recognized by some pattern when the z < x is simplified.

@tianxinghe
Copy link

tianxinghe commented Oct 30, 2023 via email

@martong
Copy link
Collaborator

martong commented Nov 3, 2023

Is it possible that CSA did not know the values of x, y, and z, so it directly determined unknown during the visit?

In the example of the issue, x, y, z were all symbolic, because they are the parameters of the top-level function being analyzed (where we don't know anything about their values). Other than that, I'm not sure if I understood your question.

I feel that the issue may be caused by the CSA visiting comparison statements and returning unknowns? I'm not sure about this. I should test it. Thank you!

Let me explain. CSA maps symbols to equalivalence classes (eqclasses), and then maps eqclasses to range sets as constraints. The rationale is that if x is now known to be equal to y, everything that they were equal to should be all equal and share the constraints we learned. This is achieved by intersecting the range sets of the merged eqclasses.

Now, if we learn that x == y, then learn that z > x; and then try to evaluate the expression z < y one would expect that we recognize that y could be substituted by x (or any other member of its eqclass). And AFAIK we would successfully recognize the pattern z < x when simplifying the expression and eliminate that path [1]. However, since we don't do such substitution, the pattern won't match and the path won't be eliminated. What I suggested was to not try to improve the simplification of relational operators, but rather brute-force try to use each member of the eqclass and try each combination. (yes, N*M combinations, where X RELOP Y, N := |eqclass(X)|, M := |eqclass(Y)|, arguing that usually, eqclasses are really small.

[1]: You can see that if you replace assert(z < y); by assert(z < x);, we would successfully eliminate the path, and the FP. It is because we have seen z > x before; thus z < x can't be also true; and this is recognized by some pattern when the z < x is simplified.

I think you can get this done within O(N) steps (N := |eqclass(X)|). When you assume z > x then in the constraint solver you assign the not-null range ([[-inf, -1], [1, inf]]) to z > x. Then when you assume z < y, you can build the symbol z < S for each element in the eq class of x. Query (infer) the ranges to all z < S and you should find different results which means there is a contradiction, you can return the infeasible (null) state.

steakhal added a commit to steakhal/llvm-project that referenced this issue Nov 4, 2023
…dictions

The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`,
then check the eqclasses of X and Y respectively and for `X' RELOP Y'`
SymSymExprs and try to infer their ranges.
If there is no contradiction with any of the equivalent alternatives,
then intersecting all these RangeSets should never be empty - aka. there
should be a value satisfying the constraints we have.

It costs around `|eqclass(X)| + |eqclass(y)|`.

The approach has its limitations, as demonstrated by
`gh_62215_contradicting_nested_right_equivalent`, where we would need to
apply the same logic, but on a sub-expression of a direct operand.

Before the patch, line 90, 100, and 112 would be reachable; and become
unreachable after this. Line 127 will remain still reachable, but keep
in mind that when cross-checking with Z3 (aka. Z3 refutation), then all
4 reports would be eliminated.

The idea comes from
llvm#62215 (comment)

Fixes llvm#62215
@steakhal
Copy link
Contributor

steakhal commented Nov 4, 2023

@martong Thanks for the idea, when I read it it clicked and I could not resist experimenting with it today.
Luckily, after an afternoon, I have a patch for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer confirmed Verified by a second party
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants