-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[analyzer] Trust base to derived casts for dynamic types #69057
base: main
Are you sure you want to change the base?
Conversation
When doing a base to derived cast, and we should add a cast info recording that fact. This information will be then used for example inside `CXXInstanceCall::getRuntimeDefinition` (CallEvent.cpp), where for virtual calls, it will query the associated dynamic type for the `*this` region. Note that inside `ExprEngine::defaultEvalCall`, if the runtime definition is found but it might have other definitions, then we will split the path and have one where we inline the candidate and one other doing conservative call evaluation. This ensures that if we inlined the wrong definition, we still have a path where conservative evaluation happened. In addition to this, remember that once a function was conservatively evaluated on a path, then further calls to the same function will always result in conservative evaluation, see `ExprEngine::BifurcateCall` and the `DynamicDispatchBifurcationMap`. As a consequence of these rules, we can have execution paths where a function call might be resolved to different implemetations, depending on the sequence of events, like here: ```C++ void top(Base *p) { p->fun(); // Will split into 2 paths: // 1) Inlines the virtual `Base::fun()`. // 2) Conservative eval calls, and remembers to always conservatively eval this function. // Perform a base to derived cast, and just discard the result. (void)static_cast<Derived *>(p); p->fun(); // Will split path (1) into 2: // 2.1) Inlines the virtual `Derived::fun()`. // 2.2) Conservative eval calls, and remembers to always conservatively eval this function. // Now, we have exactly 3 paths at this point. } ``` Fixes llvm#62663
@llvm/pr-subscribers-clang-static-analyzer-1 @llvm/pr-subscribers-clang Author: Balazs Benics (steakhal) ChangesWhen doing a base to derived cast, and we should add a cast info recording that fact. Note that inside As a consequence of these rules, we can have execution paths where a function call might be resolved to different implemetations, depending on the sequence of events, like here: void top(Base *p) {
p->fun(); // Will split into 2 paths:
// 1) Inlines the virtual `Base::fun()`.
// 2) Conservative eval calls, and remembers to always conservatively eval this function.
// Perform a base to derived cast, and just discard the result.
(void)static_cast<Derived *>(p);
p->fun(); // Will split path (1) into 2:
// 2.1) Inlines the virtual `Derived::fun()`.
// 2.2) Conservative eval calls, and remembers to always conservatively eval this function.
// Now, we have exactly 3 paths at this point.
} Fixes #62663 Full diff: https://github.com/llvm/llvm-project/pull/69057.diff 3 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/DynamicTypePropagation.cpp b/clang/lib/StaticAnalyzer/Checkers/DynamicTypePropagation.cpp
index eda8302595ba4fb..6539f3c1e09e321 100644
--- a/clang/lib/StaticAnalyzer/Checkers/DynamicTypePropagation.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/DynamicTypePropagation.cpp
@@ -20,6 +20,7 @@
//
//===----------------------------------------------------------------------===//
+#include "clang/AST/OperationKinds.h"
#include "clang/AST/ParentMap.h"
#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Basic/Builtins.h"
@@ -392,10 +393,6 @@ void DynamicTypePropagation::checkPostCall(const CallEvent &Call,
}
}
-/// TODO: Handle explicit casts.
-/// Handle C++ casts.
-///
-/// Precondition: the cast is between ObjCObjectPointers.
ExplodedNode *DynamicTypePropagation::dynamicTypePropagationOnCasts(
const CastExpr *CE, ProgramStateRef &State, CheckerContext &C) const {
// We only track type info for regions.
@@ -403,8 +400,19 @@ ExplodedNode *DynamicTypePropagation::dynamicTypePropagationOnCasts(
if (!ToR)
return C.getPredecessor();
- if (isa<ExplicitCastExpr>(CE))
+ if (CE->getCastKind() == CK_BaseToDerived) {
+ bool CastSucceeds = true;
+ QualType FromTy = CE->getSubExpr()->getType();
+ QualType ToTy = CE->getType();
+ ToR = ToR->StripCasts(/*StripBaseAndDerivedCasts=*/true);
+ State = setDynamicTypeAndCastInfo(State, ToR, FromTy, ToTy, CastSucceeds);
+ return C.addTransition(State);
+ }
+
+ // TODO: Handle the rest of explicit casts, inluding the regular C++ casts.
+ if (isa<ExplicitCastExpr>(CE)) {
return C.getPredecessor();
+ }
if (const Type *NewTy = getBetterObjCType(CE, C)) {
State = setDynamicTypeInfo(State, ToR, QualType(NewTy, 0));
@@ -609,9 +617,13 @@ storeWhenMoreInformative(ProgramStateRef &State, SymbolRef Sym,
/// symbol and the destination type of the cast are unrelated, report an error.
void DynamicTypePropagation::checkPostStmt(const CastExpr *CE,
CheckerContext &C) const {
+ ProgramStateRef State = C.getState();
+ ExplodedNode *AfterTypeProp = dynamicTypePropagationOnCasts(CE, State, C);
+
if (CE->getCastKind() != CK_BitCast)
return;
+ ASTContext &ASTCtxt = C.getASTContext();
QualType OriginType = CE->getSubExpr()->getType();
QualType DestType = CE->getType();
@@ -621,11 +633,6 @@ void DynamicTypePropagation::checkPostStmt(const CastExpr *CE,
if (!OrigObjectPtrType || !DestObjectPtrType)
return;
- ProgramStateRef State = C.getState();
- ExplodedNode *AfterTypeProp = dynamicTypePropagationOnCasts(CE, State, C);
-
- ASTContext &ASTCtxt = C.getASTContext();
-
// This checker detects the subtyping relationships using the assignment
// rules. In order to be able to do this the kindofness must be stripped
// first. The checker treats every type as kindof type anyways: when the
diff --git a/clang/test/Analysis/cast-trust-base-to-derived.cpp b/clang/test/Analysis/cast-trust-base-to-derived.cpp
new file mode 100644
index 000000000000000..e717e5617867ca9
--- /dev/null
+++ b/clang/test/Analysis/cast-trust-base-to-derived.cpp
@@ -0,0 +1,105 @@
+// RUN: %clang_analyze_cc1 %s -verify \
+// RUN: -analyzer-checker=core \
+// RUN: -analyzer-checker=alpha.security.taint.TaintPropagation \
+// RUN: -analyzer-checker=debug.ExprInspection
+
+// See issue https://github.com/llvm/llvm-project/issues/62663
+
+template <typename T> void clang_analyzer_dump(T);
+void clang_analyzer_warnIfReached();
+void clang_analyzer_numTimesReached();
+void clang_analyzer_isTainted(int);
+
+extern int scanf(const char *format, ...);
+
+class ActionHandler {
+public:
+ virtual ~ActionHandler() = default;
+ virtual void onAction(int x, int &) {
+ clang_analyzer_dump(x + 1); // expected-warning {{101}}
+ }
+};
+
+class MyHandler final : public ActionHandler {
+public:
+ void onAction(int x, int &) override {
+ clang_analyzer_dump(x + 2); // expected-warning {{202}}
+ }
+};
+
+void trust_static_types(ActionHandler *p) {
+ // This variable will help to see if conservative call evaluation happened or not.
+ int invalidation_detector;
+
+ // At this point we don't know anything about the dynamic type of `*p`, thus the `onAction` call might be resolved to the default implementation, matching the static type.
+ invalidation_detector = 1000;
+ p->onAction(100, invalidation_detector);
+ clang_analyzer_dump(invalidation_detector);
+ // expected-warning@-1 {{1000}} on this path we trusted the type, and resolved the call to `ActionHandler::onAction(int, int&)`
+ // expected-warning@-2 {{conj}} on this path we conservatively evaluated the previous call
+
+ clang_analyzer_numTimesReached(); // expected-warning {{2}} we only have that 2 paths here
+ // 1) inlined `ActionHandler::onAction(int, int&)`
+ // 2) conservatively eval called
+
+ // Trust that the `*p` refers to an object with `MyHandler` static type (or to some other sub-class).
+ auto *q = static_cast<MyHandler *>(p);
+ (void)q; // Discard the result of the cast. We already learned the type `p` might refer to.
+
+ invalidation_detector = 2000;
+ p->onAction(200, invalidation_detector);
+ clang_analyzer_dump(invalidation_detector);
+ // expected-warning@-1 {{2000}} on this path we trusted the type, and resolved the call to `MyHandler::onAction(int, int&)`
+ // expected-warning@-2 {{conj}} on this path we conservatively evaluated the previous call
+
+ clang_analyzer_numTimesReached(); // expected-warning {{3}} we only have 3 paths here, not 4
+ // 1) inlined 2 different callees
+ // 2) inlined only 1st
+ // 3) non were inlined
+ // 4) inlined only the second: This can't happen because if we conservative called a specific function on a path, we will always evaluate it like that.
+ // See ExprEngine::BifurcateCall and DynamicDispatchBifurcationMap.
+}
+
+// -------
+
+class Base {
+public:
+ virtual void OnRecvCancel(int port) = 0;
+};
+class Handler final : public Base {
+public:
+ void OnRecvCancel(int port) override {
+ clang_analyzer_dump(100 + port); // expected-warning {{+ 150}}
+ clang_analyzer_isTainted(port); // expected-warning {{YES}}
+ }
+};
+class PParent {
+public:
+ bool OnMessageReceived();
+};
+class Actor {
+public:
+ explicit Actor(Base* aRequest) : m(aRequest) {}
+protected:
+ Base* m;
+};
+
+class Parent : public Actor, public PParent {
+public:
+ explicit Parent(Base* aRequest) : Actor(aRequest) {}
+ bool RecvCancel(int port) {
+ clang_analyzer_dump(200 + port); // expected-warning {{+ 200}}
+ clang_analyzer_isTainted(port); // expected-warning {{YES}}
+ Handler* foo = (Handler*)m;
+ foo->OnRecvCancel(port + 50);
+ return true;
+ }
+};
+
+bool PParent::OnMessageReceived() {
+ int port;
+ scanf("%i", &port);
+ clang_analyzer_isTainted(port); // expected-warning {{YES}}
+ Parent* foo = static_cast<Parent*>(this);
+ return foo->RecvCancel(port);
+}
diff --git a/clang/test/Analysis/osobject-retain-release.cpp b/clang/test/Analysis/osobject-retain-release.cpp
index 2ae5752f4402374..b4085813a72a1aa 100644
--- a/clang/test/Analysis/osobject-retain-release.cpp
+++ b/clang/test/Analysis/osobject-retain-release.cpp
@@ -492,11 +492,13 @@ void check_required_cast() {
void check_cast_behavior(OSObject *obj) {
OSArray *arr1 = OSDynamicCast(OSArray, obj);
- clang_analyzer_eval(arr1 == obj); // expected-warning{{TRUE}}
- // expected-note@-1{{TRUE}}
- // expected-note@-2{{Assuming 'arr1' is not equal to 'obj'}}
- // expected-warning@-3{{FALSE}}
- // expected-note@-4 {{FALSE}}
+ clang_analyzer_eval(arr1 == obj); // #check_cast_behavior_1
+ // expected-warning@#check_cast_behavior_1 {{TRUE}}
+ // expected-note@#check_cast_behavior_1 {{TRUE}}
+ // expected-note@#check_cast_behavior_1{{Assuming 'arr1' is equal to 'obj'}}
+ // expected-warning@#check_cast_behavior_1 {{FALSE}}
+ // expected-note@#check_cast_behavior_1 {{FALSE}}
+ // expected-note@#check_cast_behavior_1{{Assuming 'arr1' is not equal to 'obj'}}
OSArray *arr2 = OSRequiredCast(OSArray, obj);
clang_analyzer_eval(arr2 == obj); // expected-warning{{TRUE}}
// expected-note@-1{{TRUE}}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some nits inline, overall, it looks good to me.
This patch makes me think whether we actually respect final on methods/classes to avoid unnecessary bifurcation on virtual calls. We probably do not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing that I know about the dynamic type handling is that it's not too reliable; it would be great if you could improve it.
ExplodedNode *DynamicTypePropagation::dynamicTypePropagationOnCasts( | ||
const CastExpr *CE, ProgramStateRef &State, CheckerContext &C) const { | ||
// We only track type info for regions. | ||
const MemRegion *ToR = C.getSVal(CE).getAsRegion(); | ||
if (!ToR) | ||
return C.getPredecessor(); | ||
|
||
if (isa<ExplicitCastExpr>(CE)) | ||
if (CE->getCastKind() == CK_BaseToDerived) { | ||
bool CastSucceeds = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on a rough intuition I feel that it's suspicious that here the code blindly assumes that the cast succeeds instead of checking the previously collected knowledge and somehow reporting an error if the cast contradicts it.
However I don't understand the exact details, so this might be the correct heuristic depending on other factors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it would make sense.
It's not currently the behavior, and I think this TODO comment might relate to this subject.
// From CastValueChecker.cpp:
// TODO list:
// - It only allows one succesful cast between two types however in the wild
// the object could be casted to multiple types.
// - It needs to check the most likely type information from the dynamic type
// map to increase precision of dynamic casting.
I haven't looked at the details of the dynamic type tracking either, but it feels like my patch makes one baby step to the right direction. I added a test demonstrating the same issue using that checker to confirm that it also mishandles this case.
However, one important differentiating factor is that such mishandling could happen much more frequently after this patch; on the other hand, fixing this at a wider scope (including DynamicTypePropagation, CastValueChecker, and how they interact with call inlining) would be a considerable amount of work.
IDK. Maybe @haoNoQ has opinions, given he was probably there when both of these classes were developed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah looks like setDynamicTypeAndCastInfo()
is a very low-level primitive that blindly sets the state to whatever you think it should be. Judging by CastValueChecker's addCastTransition()
, the caller is supposed to fully figure out whether the cast succeeds, judging by the rest of the dynamic type map state, the static types in the AST and in regions, and the nature of the cast, and there's no reusable utility provided for that.
In our case this means that if we know the cast fails (eg., the dynamic type is already incompatible, or a similar cast has failed on the same object previously), we should probably sink the analysis. (Ideally we'd emit a warning about it.) I'm not sure how urgent such improvement is, but it's quite likely that lack of support for failed casts may leave the cast maps in an inconsistent state which would produce more weird issues later in the analysis.
clang_analyzer_eval(arr1 == obj); // #check_cast_behavior_1 | ||
// expected-warning@#check_cast_behavior_1 {{TRUE}} | ||
// expected-note@#check_cast_behavior_1 {{TRUE}} | ||
// expected-note@#check_cast_behavior_1{{Assuming 'arr1' is equal to 'obj'}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this looks like a regression. In this test, on the "cast succeeds" branch, arr1
should be known to be the same as obj
.
OSDynamicCast()
is a macro that implements custom RTTI in XNU. It is defined as:
13 #define OSDynamicCast(type, inst) \
14 ((type *) OSMetaClassBase::safeMetaCast(inst, type::metaClass))
where safeMetaCast()
is a function without visible definition that returns an OSMetaClassBase *
(the most base class in the OSObject
hierarchy).
Then the checker models safeMetaCast()
to split the state and either "pass through" the value or return 0. But the subsequent C-style cast is technically a base-to-derived class (because safeMetaCast()
doesn't return the actual type, but a base type). So it's likely that we've somehow regressed when we started actively modeling that cast.
We probably shouldn't have regressed though. Our desired behavior is the same as the old behavior: fully trust the cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked at the exploded-graph before and after the patch, and the values and the branches are all the same. There is only one difference: in the new graph, after the (OSArray *)
cast in init-expr of the decl, we have the type info bound: Dynamic Types
:
SymRegion{reg_$0<OSObject * obj>} -> OSArray (or a sub-class)
Later, when we load for the analyzer_eval, we can see that on the branch where the safeMetaCast
succeeds, arr1
and obj
refers to the same lvalue, thus already proven to be equal.
BTW if I change the test into this, it would pass on both the old and new versions:
void check_cast_behavior(OSObject *obj) {
OSArray *arr1 = OSDynamicCast(OSArray, obj);
if (!arr1) return;
// expected-note@-1 {{'arr1' is non-null}} expected-note@-1 {{Taking false branch}}
// expected-note@-2 {{'arr1' is non-null}} expected-note@-2 {{Taking false branch}}
clang_analyzer_eval(arr1 == obj); // expected-warning{{TRUE}} expected-note{{TRUE}}
OSArray *arr2 = OSRequiredCast(OSArray, obj);
clang_analyzer_eval(arr2 == obj); // expected-warning{{TRUE}} expected-note{{TRUE}}
}
I think this test would describe the behavior for the "cast-succeeds" branch, aka. the branch where arr1
is non-null. Consequently, I believe this PR would not regress this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping @haoNoQ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back to this comment to double check, and I can confirm what's happening.
On the baseline revision, the node for clang_analyzer_eval(arr2 == obj)
has two parents:
- Where
arr2
is known to be0 (loc)
, by an eager assumption byclang_analyzer_eval(arr1 == obj)
. - Where
arr2
is known to be non-null, by the same eager assumption. However, for this we already had a message asAssuming 'arr1' is not equal to 'obj'
(see the baseline test).
After this PR, the exploded graph would no longer have two parents for the given bug-report, so it can't couldn't chose the same parent node in the exploded graph when constructing the bug report. This means that on that path, it will see an eager assumption to true
, and craft a piece message accordingly.
This means, that the bug report construction could have chosen the other parent for constructing the bug report, and have the same behavior as demonstrated after the PR.
The next question could be, why are the following exploded nodes not merged after this PR?
On the baseline it looks like this, (notice the two parents):
Anyways, this debunks the theory that this PR changes this test in any meaningful/practical ways - as far as I can see it.
This sounds crazy, but I think I found a bug in this patchset. I applied it on top of the 17.0.2 tag, and then ran the whole analysis on mozilla-central. I got segfaults on about 4000 executions, all with the same stack trace:
I took the smallest file, minimized it, and came up with this reproduction, which, admittedly, seems crazy to me. (What does
And the command:
Again, this is these 4 patches, put atop 17.0.2 (6009708). You can even download the compiler here (in the Artifacts tab of the clang-17 job). This shows the patch additions. If I run it using 17.0.2 without the patches, it does not fail. |
I looked into Tom's bug report and I hit the following assertion in a debug build:
I was also able to greatly reduce the CPP testfile:
If I output the
What I don't understand is why these are not reference types. @steakhal do you know more here? |
Thanks for the replies.
Its because prior we |
Hm, testing these patches on the original testcase in #62663 (the one where we use statements 1B and 2B) - I don't think this patchset solves that scenario... |
I think I've just fixed the reported crash for this PR. For some reason a
I've looked into the case and I think now I see your problem. In case we have If I'm not mistaken, inheritance has "open-set" semantics, which means that in our current translation-unit we might not see all the derived classes, and could be that some derived classes are loaded only at runtime using shared libraries fox example. What it means for us is that, by seeing a For example, in an other translation unit they might define a If we would have seen the allocation (thus the dynamic type) of the object of which we call functions, it's fairly easy to inline the right definitions. However, when we need to guess what could be the dynamic type, its no longer that easy. In presence of static-casts, the developer at least hinted a type, so in that case we have some justification for trusting the type there; otherwise all bets are off. What could we do about inlining function definitions without knowing precisely the dynamic type of the object? I think it would still make sense to assume well architected code and if only one thing derives (implements) a Base class, then it's probably safe to assume that that's the definition we want to inline. However, what should we do if multiple (N) classes implement Base? Trying each N, and basically splitting the state to (N+1) ways is not going to scale. Unless N is of course really small, like 2 or 3 at most. To conclude, I don't think we could get the |
To illustrate the case of my previous argument, here are two examples: // base.h BEGIN:
class Base {
public:
virtual int fun() const = 0;
};
class Derived1 final : public Base {
public:
int fun() const override { return 1; }
};
// base.h END
Base *spawn(); // Defined in "secondary.cpp"
template <class T> void clang_analyzer_dump(T) {}
int main() {
Base *p = spawn();
int n = p->fun();
clang_analyzer_dump(n); // conj; and never "1"
int z = 100 / (n - 2);
(void)z;
} And here is the example with definition of // secondary.cpp
#include "base.h"
class Derived2 final : public Base {
public:
int fun() const override { return 2; }
};
Base *spawn() {
return new Derived2();
} |
That's kind of what I imagined - try them all. The Analyzer has a built-in timeout mechanism that will come into play. If you wanted to be fancy, there could be an obscure config option, with a default of N=3. If config value is not set, and N>3 it asserts, alerting the user to the situation. If config value is set, it limits to analyzing the first N. |
When doing a base to derived cast, and we should add a cast info recording that fact.
This information will be then used for example inside
CXXInstanceCall::getRuntimeDefinition
(CallEvent.cpp), where for virtual calls, it will query the associated dynamic type for the*this
region.Note that inside
ExprEngine::defaultEvalCall
, if the runtime definition is found but it might have other definitions, then we will split the path and have one where we inline the candidate and one other doing conservative call evaluation. This ensures that if we inlined the wrong definition, we still have a path where conservative evaluation happened. In addition to this, remember that once a function was conservatively evaluated on a path, then further calls to the same function will always result in conservative evaluation, seeExprEngine::BifurcateCall
and theDynamicDispatchBifurcationMap
.As a consequence of these rules, we can have execution paths where a function call might be resolved to different implemetations, depending on the sequence of events, like here:
Fixes #62663