Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[analyzer] Trust base to derived casts for dynamic types #69057

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

steakhal
Copy link
Contributor

When doing a base to derived cast, and we should add a cast info recording that fact.
This information will be then used for example inside CXXInstanceCall::getRuntimeDefinition (CallEvent.cpp), where for virtual calls, it will query the associated dynamic type for the *this region.

Note that inside ExprEngine::defaultEvalCall, if the runtime definition is found but it might have other definitions, then we will split the path and have one where we inline the candidate and one other doing conservative call evaluation. This ensures that if we inlined the wrong definition, we still have a path where conservative evaluation happened. In addition to this, remember that once a function was conservatively evaluated on a path, then further calls to the same function will always result in conservative evaluation, see ExprEngine::BifurcateCall and the DynamicDispatchBifurcationMap.

As a consequence of these rules, we can have execution paths where a function call might be resolved to different implemetations, depending on the sequence of events, like here:

void top(Base *p) {
  p->fun(); // Will split into 2 paths:
  // 1) Inlines the virtual `Base::fun()`.
  // 2) Conservative eval calls, and remembers to always conservatively eval this function.

  // Perform a base to derived cast, and just discard the result.
  (void)static_cast<Derived *>(p);

  p->fun(); // Will split path (1) into 2:
  // 2.1) Inlines the virtual `Derived::fun()`.
  // 2.2) Conservative eval calls, and remembers to always conservatively eval this function.

  // Now, we have exactly 3 paths at this point.
}

Fixes #62663

When doing a base to derived cast, and we should add a cast info
recording that fact.
This information will be then used for example inside
`CXXInstanceCall::getRuntimeDefinition` (CallEvent.cpp), where for
virtual calls, it will query the associated dynamic type for the `*this` region.

Note that inside `ExprEngine::defaultEvalCall`, if the runtime
definition is found but it might have other definitions, then we will
split the path and have one where we inline the candidate and one other
doing conservative call evaluation. This ensures that if we inlined the
wrong definition, we still have a path where conservative evaluation happened.
In addition to this, remember that once a function was conservatively
evaluated on a path, then further calls to the same function will always
result in conservative evaluation, see `ExprEngine::BifurcateCall` and
the `DynamicDispatchBifurcationMap`.

As a consequence of these rules, we can have execution paths where a
function call might be resolved to different implemetations, depending
on the sequence of events, like here:

```C++
void top(Base *p) {
  p->fun(); // Will split into 2 paths:
  // 1) Inlines the virtual `Base::fun()`.
  // 2) Conservative eval calls, and remembers to always conservatively eval this function.

  // Perform a base to derived cast, and just discard the result.
  (void)static_cast<Derived *>(p);

  p->fun(); // Will split path (1) into 2:
  // 2.1) Inlines the virtual `Derived::fun()`.
  // 2.2) Conservative eval calls, and remembers to always conservatively eval this function.

  // Now, we have exactly 3 paths at this point.
}
```

Fixes llvm#62663
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html labels Oct 14, 2023
@steakhal steakhal removed clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html labels Oct 14, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 14, 2023

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Balazs Benics (steakhal)

Changes

When doing a base to derived cast, and we should add a cast info recording that fact.
This information will be then used for example inside CXXInstanceCall::getRuntimeDefinition (CallEvent.cpp), where for virtual calls, it will query the associated dynamic type for the *this region.

Note that inside ExprEngine::defaultEvalCall, if the runtime definition is found but it might have other definitions, then we will split the path and have one where we inline the candidate and one other doing conservative call evaluation. This ensures that if we inlined the wrong definition, we still have a path where conservative evaluation happened. In addition to this, remember that once a function was conservatively evaluated on a path, then further calls to the same function will always result in conservative evaluation, see ExprEngine::BifurcateCall and the DynamicDispatchBifurcationMap.

As a consequence of these rules, we can have execution paths where a function call might be resolved to different implemetations, depending on the sequence of events, like here:

void top(Base *p) {
  p-&gt;fun(); // Will split into 2 paths:
  // 1) Inlines the virtual `Base::fun()`.
  // 2) Conservative eval calls, and remembers to always conservatively eval this function.

  // Perform a base to derived cast, and just discard the result.
  (void)static_cast&lt;Derived *&gt;(p);

  p-&gt;fun(); // Will split path (1) into 2:
  // 2.1) Inlines the virtual `Derived::fun()`.
  // 2.2) Conservative eval calls, and remembers to always conservatively eval this function.

  // Now, we have exactly 3 paths at this point.
}

Fixes #62663


Full diff: https://github.com/llvm/llvm-project/pull/69057.diff

3 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/DynamicTypePropagation.cpp (+17-10)
  • (added) clang/test/Analysis/cast-trust-base-to-derived.cpp (+105)
  • (modified) clang/test/Analysis/osobject-retain-release.cpp (+7-5)
diff --git a/clang/lib/StaticAnalyzer/Checkers/DynamicTypePropagation.cpp b/clang/lib/StaticAnalyzer/Checkers/DynamicTypePropagation.cpp
index eda8302595ba4fb..6539f3c1e09e321 100644
--- a/clang/lib/StaticAnalyzer/Checkers/DynamicTypePropagation.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/DynamicTypePropagation.cpp
@@ -20,6 +20,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "clang/AST/OperationKinds.h"
 #include "clang/AST/ParentMap.h"
 #include "clang/AST/RecursiveASTVisitor.h"
 #include "clang/Basic/Builtins.h"
@@ -392,10 +393,6 @@ void DynamicTypePropagation::checkPostCall(const CallEvent &Call,
   }
 }
 
-/// TODO: Handle explicit casts.
-///       Handle C++ casts.
-///
-/// Precondition: the cast is between ObjCObjectPointers.
 ExplodedNode *DynamicTypePropagation::dynamicTypePropagationOnCasts(
     const CastExpr *CE, ProgramStateRef &State, CheckerContext &C) const {
   // We only track type info for regions.
@@ -403,8 +400,19 @@ ExplodedNode *DynamicTypePropagation::dynamicTypePropagationOnCasts(
   if (!ToR)
     return C.getPredecessor();
 
-  if (isa<ExplicitCastExpr>(CE))
+  if (CE->getCastKind() == CK_BaseToDerived) {
+    bool CastSucceeds = true;
+    QualType FromTy = CE->getSubExpr()->getType();
+    QualType ToTy = CE->getType();
+    ToR = ToR->StripCasts(/*StripBaseAndDerivedCasts=*/true);
+    State = setDynamicTypeAndCastInfo(State, ToR, FromTy, ToTy, CastSucceeds);
+    return C.addTransition(State);
+  }
+
+  // TODO: Handle the rest of explicit casts, inluding the regular C++ casts.
+  if (isa<ExplicitCastExpr>(CE)) {
     return C.getPredecessor();
+  }
 
   if (const Type *NewTy = getBetterObjCType(CE, C)) {
     State = setDynamicTypeInfo(State, ToR, QualType(NewTy, 0));
@@ -609,9 +617,13 @@ storeWhenMoreInformative(ProgramStateRef &State, SymbolRef Sym,
 /// symbol and the destination type of the cast are unrelated, report an error.
 void DynamicTypePropagation::checkPostStmt(const CastExpr *CE,
                                            CheckerContext &C) const {
+  ProgramStateRef State = C.getState();
+  ExplodedNode *AfterTypeProp = dynamicTypePropagationOnCasts(CE, State, C);
+
   if (CE->getCastKind() != CK_BitCast)
     return;
 
+  ASTContext &ASTCtxt = C.getASTContext();
   QualType OriginType = CE->getSubExpr()->getType();
   QualType DestType = CE->getType();
 
@@ -621,11 +633,6 @@ void DynamicTypePropagation::checkPostStmt(const CastExpr *CE,
   if (!OrigObjectPtrType || !DestObjectPtrType)
     return;
 
-  ProgramStateRef State = C.getState();
-  ExplodedNode *AfterTypeProp = dynamicTypePropagationOnCasts(CE, State, C);
-
-  ASTContext &ASTCtxt = C.getASTContext();
-
   // This checker detects the subtyping relationships using the assignment
   // rules. In order to be able to do this the kindofness must be stripped
   // first. The checker treats every type as kindof type anyways: when the
diff --git a/clang/test/Analysis/cast-trust-base-to-derived.cpp b/clang/test/Analysis/cast-trust-base-to-derived.cpp
new file mode 100644
index 000000000000000..e717e5617867ca9
--- /dev/null
+++ b/clang/test/Analysis/cast-trust-base-to-derived.cpp
@@ -0,0 +1,105 @@
+// RUN: %clang_analyze_cc1 %s -verify \
+// RUN:   -analyzer-checker=core \
+// RUN:   -analyzer-checker=alpha.security.taint.TaintPropagation \
+// RUN:   -analyzer-checker=debug.ExprInspection
+
+// See issue https://github.com/llvm/llvm-project/issues/62663
+
+template <typename T> void clang_analyzer_dump(T);
+void clang_analyzer_warnIfReached();
+void clang_analyzer_numTimesReached();
+void clang_analyzer_isTainted(int);
+
+extern int scanf(const char *format, ...);
+
+class ActionHandler {
+public:
+  virtual ~ActionHandler() = default;
+  virtual void onAction(int x, int &) {
+    clang_analyzer_dump(x + 1); // expected-warning {{101}}
+  }
+};
+
+class MyHandler final : public ActionHandler {
+public:
+  void onAction(int x, int &) override {
+    clang_analyzer_dump(x + 2); // expected-warning {{202}}
+  }
+};
+
+void trust_static_types(ActionHandler *p) {
+  // This variable will help to see if conservative call evaluation happened or not.
+  int invalidation_detector;
+
+  // At this point we don't know anything about the dynamic type of `*p`, thus the `onAction` call might be resolved to the default implementation, matching the static type.
+  invalidation_detector = 1000;
+  p->onAction(100, invalidation_detector);
+  clang_analyzer_dump(invalidation_detector);
+  // expected-warning@-1 {{1000}} on this path we trusted the type, and resolved the call to `ActionHandler::onAction(int, int&)`
+  // expected-warning@-2 {{conj}} on this path we conservatively evaluated the previous call
+
+  clang_analyzer_numTimesReached(); // expected-warning {{2}} we only have that 2 paths here
+  // 1) inlined `ActionHandler::onAction(int, int&)`
+  // 2) conservatively eval called
+
+  // Trust that the `*p` refers to an object with `MyHandler` static type (or to some other sub-class).
+  auto *q = static_cast<MyHandler *>(p);
+  (void)q; // Discard the result of the cast. We already learned the type `p` might refer to.
+
+  invalidation_detector = 2000;
+  p->onAction(200, invalidation_detector);
+  clang_analyzer_dump(invalidation_detector);
+  // expected-warning@-1 {{2000}} on this path we trusted the type, and resolved the call to `MyHandler::onAction(int, int&)`
+  // expected-warning@-2 {{conj}} on this path we conservatively evaluated the previous call
+
+  clang_analyzer_numTimesReached(); // expected-warning {{3}} we only have 3 paths here, not 4
+  // 1) inlined 2 different callees
+  // 2) inlined only 1st
+  // 3) non were inlined
+  // 4) inlined only the second: This can't happen because if we conservative called a specific function on a path, we will always evaluate it like that.
+  //    See ExprEngine::BifurcateCall and DynamicDispatchBifurcationMap.
+}
+
+// -------
+
+class Base {
+public:
+  virtual void OnRecvCancel(int port) = 0;
+};
+class Handler final : public Base {
+public:
+  void OnRecvCancel(int port) override {
+    clang_analyzer_dump(100 + port); // expected-warning {{+ 150}}
+    clang_analyzer_isTainted(port);  // expected-warning {{YES}}
+  }
+};
+class PParent  {
+public:
+  bool OnMessageReceived();
+};
+class Actor {
+public:
+  explicit Actor(Base* aRequest) : m(aRequest) {}
+protected:
+  Base* m;
+};
+
+class Parent : public Actor, public PParent {
+public:
+  explicit Parent(Base* aRequest) : Actor(aRequest) {}
+  bool RecvCancel(int port) {
+    clang_analyzer_dump(200 + port); // expected-warning {{+ 200}}
+    clang_analyzer_isTainted(port);  // expected-warning {{YES}}
+    Handler* foo = (Handler*)m;
+    foo->OnRecvCancel(port + 50);
+    return true;
+  }
+};
+
+bool PParent::OnMessageReceived() {
+  int port;
+  scanf("%i", &port);
+  clang_analyzer_isTainted(port); // expected-warning {{YES}}
+  Parent* foo = static_cast<Parent*>(this);
+  return foo->RecvCancel(port);
+}
diff --git a/clang/test/Analysis/osobject-retain-release.cpp b/clang/test/Analysis/osobject-retain-release.cpp
index 2ae5752f4402374..b4085813a72a1aa 100644
--- a/clang/test/Analysis/osobject-retain-release.cpp
+++ b/clang/test/Analysis/osobject-retain-release.cpp
@@ -492,11 +492,13 @@ void check_required_cast() {
 
 void check_cast_behavior(OSObject *obj) {
   OSArray *arr1 = OSDynamicCast(OSArray, obj);
-  clang_analyzer_eval(arr1 == obj); // expected-warning{{TRUE}}
-                                    // expected-note@-1{{TRUE}}
-                                    // expected-note@-2{{Assuming 'arr1' is not equal to 'obj'}}
-                                    // expected-warning@-3{{FALSE}}
-                                    // expected-note@-4   {{FALSE}}
+  clang_analyzer_eval(arr1 == obj); // #check_cast_behavior_1
+  // expected-warning@#check_cast_behavior_1 {{TRUE}}
+  // expected-note@#check_cast_behavior_1    {{TRUE}}
+  // expected-note@#check_cast_behavior_1{{Assuming 'arr1' is equal to 'obj'}}
+  // expected-warning@#check_cast_behavior_1 {{FALSE}}
+  // expected-note@#check_cast_behavior_1    {{FALSE}}
+  // expected-note@#check_cast_behavior_1{{Assuming 'arr1' is not equal to 'obj'}}
   OSArray *arr2 = OSRequiredCast(OSArray, obj);
   clang_analyzer_eval(arr2 == obj); // expected-warning{{TRUE}}
                                     // expected-note@-1{{TRUE}}

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some nits inline, overall, it looks good to me.

@Xazax-hun
Copy link
Collaborator

This patch makes me think whether we actually respect final on methods/classes to avoid unnecessary bifurcation on virtual calls. We probably do not.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing that I know about the dynamic type handling is that it's not too reliable; it would be great if you could improve it.

clang/test/Analysis/cast-trust-base-to-derived.cpp Outdated Show resolved Hide resolved
ExplodedNode *DynamicTypePropagation::dynamicTypePropagationOnCasts(
const CastExpr *CE, ProgramStateRef &State, CheckerContext &C) const {
// We only track type info for regions.
const MemRegion *ToR = C.getSVal(CE).getAsRegion();
if (!ToR)
return C.getPredecessor();

if (isa<ExplicitCastExpr>(CE))
if (CE->getCastKind() == CK_BaseToDerived) {
bool CastSucceeds = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on a rough intuition I feel that it's suspicious that here the code blindly assumes that the cast succeeds instead of checking the previously collected knowledge and somehow reporting an error if the cast contradicts it.

However I don't understand the exact details, so this might be the correct heuristic depending on other factors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it would make sense.
It's not currently the behavior, and I think this TODO comment might relate to this subject.

// From CastValueChecker.cpp:
// TODO list:
// - It only allows one succesful cast between two types however in the wild
//   the object could be casted to multiple types.
// - It needs to check the most likely type information from the dynamic type
//   map to increase precision of dynamic casting.

I haven't looked at the details of the dynamic type tracking either, but it feels like my patch makes one baby step to the right direction. I added a test demonstrating the same issue using that checker to confirm that it also mishandles this case.

However, one important differentiating factor is that such mishandling could happen much more frequently after this patch; on the other hand, fixing this at a wider scope (including DynamicTypePropagation, CastValueChecker, and how they interact with call inlining) would be a considerable amount of work.
IDK. Maybe @haoNoQ has opinions, given he was probably there when both of these classes were developed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah looks like setDynamicTypeAndCastInfo() is a very low-level primitive that blindly sets the state to whatever you think it should be. Judging by CastValueChecker's addCastTransition(), the caller is supposed to fully figure out whether the cast succeeds, judging by the rest of the dynamic type map state, the static types in the AST and in regions, and the nature of the cast, and there's no reusable utility provided for that.

In our case this means that if we know the cast fails (eg., the dynamic type is already incompatible, or a similar cast has failed on the same object previously), we should probably sink the analysis. (Ideally we'd emit a warning about it.) I'm not sure how urgent such improvement is, but it's quite likely that lack of support for failed casts may leave the cast maps in an inconsistent state which would produce more weird issues later in the analysis.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html labels Oct 16, 2023
clang_analyzer_eval(arr1 == obj); // #check_cast_behavior_1
// expected-warning@#check_cast_behavior_1 {{TRUE}}
// expected-note@#check_cast_behavior_1 {{TRUE}}
// expected-note@#check_cast_behavior_1{{Assuming 'arr1' is equal to 'obj'}}
Copy link
Collaborator

@haoNoQ haoNoQ Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this looks like a regression. In this test, on the "cast succeeds" branch, arr1 should be known to be the same as obj.

OSDynamicCast() is a macro that implements custom RTTI in XNU. It is defined as:

   13 #define OSDynamicCast(type, inst)   \
   14     ((type *) OSMetaClassBase::safeMetaCast(inst, type::metaClass))

where safeMetaCast() is a function without visible definition that returns an OSMetaClassBase * (the most base class in the OSObject hierarchy).

Then the checker models safeMetaCast() to split the state and either "pass through" the value or return 0. But the subsequent C-style cast is technically a base-to-derived class (because safeMetaCast() doesn't return the actual type, but a base type). So it's likely that we've somehow regressed when we started actively modeling that cast.

We probably shouldn't have regressed though. Our desired behavior is the same as the old behavior: fully trust the cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at the exploded-graph before and after the patch, and the values and the branches are all the same. There is only one difference: in the new graph, after the (OSArray *) cast in init-expr of the decl, we have the type info bound: Dynamic Types:

SymRegion{reg_$0<OSObject * obj>} -> OSArray (or a sub-class)

Later, when we load for the analyzer_eval, we can see that on the branch where the safeMetaCast succeeds, arr1 and obj refers to the same lvalue, thus already proven to be equal.

image

BTW if I change the test into this, it would pass on both the old and new versions:

void check_cast_behavior(OSObject *obj) {
  OSArray *arr1 = OSDynamicCast(OSArray, obj);
  if (!arr1) return;
  // expected-note@-1 {{'arr1' is non-null}} expected-note@-1 {{Taking false branch}}
  // expected-note@-2 {{'arr1' is non-null}} expected-note@-2 {{Taking false branch}}
  clang_analyzer_eval(arr1 == obj); // expected-warning{{TRUE}} expected-note{{TRUE}}
  OSArray *arr2 = OSRequiredCast(OSArray, obj);
  clang_analyzer_eval(arr2 == obj); // expected-warning{{TRUE}} expected-note{{TRUE}}
}

I think this test would describe the behavior for the "cast-succeeds" branch, aka. the branch where arr1 is non-null. Consequently, I believe this PR would not regress this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping @haoNoQ

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back to this comment to double check, and I can confirm what's happening.
On the baseline revision, the node for clang_analyzer_eval(arr2 == obj) has two parents:

  1. Where arr2 is known to be 0 (loc), by an eager assumption by clang_analyzer_eval(arr1 == obj).
  2. Where arr2 is known to be non-null, by the same eager assumption. However, for this we already had a message as Assuming 'arr1' is not equal to 'obj' (see the baseline test).

After this PR, the exploded graph would no longer have two parents for the given bug-report, so it can't couldn't chose the same parent node in the exploded graph when constructing the bug report. This means that on that path, it will see an eager assumption to true, and craft a piece message accordingly.

This means, that the bug report construction could have chosen the other parent for constructing the bug report, and have the same behavior as demonstrated after the PR.

The next question could be, why are the following exploded nodes not merged after this PR?
image
image

On the baseline it looks like this, (notice the two parents):
image


Anyways, this debunks the theory that this PR changes this test in any meaningful/practical ways - as far as I can see it.

@tomrittervg
Copy link
Contributor

This sounds crazy, but I think I found a bug in this patchset. I applied it on top of the 17.0.2 tag, and then ran the whole analysis on mozilla-central. I got segfaults on about 4000 executions, all with the same stack trace:

1.	<eof> parser at end of file
2.	While analyzing stack: 
	#0 Calling mozilla::FailureLatch::SetFailureFrom(const FailureLatch &) at line /home/tom/Documents/moz/static-analysis/mozilla-unified/objdir/dist/include/mozilla/BaseProfileJSONWriter.h:151:5
	#1 Calling mozilla::baseprofiler::ChunkedJSONWriteFunc::ChangeFailureLatchAndForwardState(FailureLatch &) at line /home/tom/Documents/moz/static-analysis/mozilla-unified/objdir/dist/include/mozilla/BaseProfileJSONWriter.h:465:12
	#2 Calling mozilla::baseprofiler::SpliceableChunkedJSONWriter::ChangeFailureLatchAndForwardState(FailureLatch &) at line /home/tom/Documents/moz/static-analysis/mozilla-unified/objdir/dist/include/mozilla/BaseProfileJSONWriter.h:570:5
	#3 Calling mozilla::baseprofiler::UniqueJSONStrings::ChangeFailureLatchAndForwardState(FailureLatch &)
3.	/home/tom/Documents/moz/static-analysis/mozilla-unified/objdir/dist/include/mozilla/FailureLatch.h:65:36: Error evaluating statement
4.	/home/tom/Documents/moz/static-analysis/mozilla-unified/objdir/dist/include/mozilla/FailureLatch.h:65:36: Error evaluating statement
 #0 0x00007f9378f09cb8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libLLVM-17.so+0x2782cb8)
 #1 0x00007f9378f09813 llvm::sys::CleanupOnSignal(unsigned long) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libLLVM-17.so+0x2782813)
 #2 0x00007f9378ea11fe (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) CrashRecoveryContext.cpp:0:0
 #3 0x00007f9378ea13ae CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #4 0x00007f937626c520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #5 0x00007f937da6ed08 clang::ento::CXXInstanceCall::getRuntimeDefinition() const (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2f09d08)
 #6 0x00007f937da6f038 clang::ento::CXXMemberCall::getRuntimeDefinition() const (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2f0a038)
 #7 0x00007f937daa9796 clang::ento::ExprEngine::defaultEvalCall(clang::ento::NodeBuilder&, clang::ento::ExplodedNode*, clang::ento::CallEvent const&, clang::ento::EvalCallOptions const&) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2f44796)
 #8 0x00007f937da776ea clang::ento::CheckerManager::runCheckersForEvalCall(clang::ento::ExplodedNodeSet&, clang::ento::ExplodedNodeSet const&, clang::ento::CallEvent const&, clang::ento::ExprEngine&, clang::ento::EvalCallOptions const&) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2f126ea)
 #9 0x00007f937daa7c64 clang::ento::ExprEngine::evalCall(clang::ento::ExplodedNodeSet&, clang::ento::ExplodedNode*, clang::ento::CallEvent const&) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2f42c64)
#10 0x00007f937daa7a67 clang::ento::ExprEngine::VisitCallExpr(clang::CallExpr const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2f42a67)
#11 0x00007f937da8d503 clang::ento::ExprEngine::Visit(clang::Stmt const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2f28503)
#12 0x00007f937da8abec clang::ento::ExprEngine::ProcessStmt(clang::Stmt const*, clang::ento::ExplodedNode*) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2f25bec)
#13 0x00007f937da8a9bd clang::ento::ExprEngine::processCFGElement(clang::CFGElement, clang::ento::ExplodedNode*, unsigned int, clang::ento::NodeBuilderContext*) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2f259bd)
#14 0x00007f937da7bb7c clang::ento::CoreEngine::HandlePostStmt(clang::CFGBlock const*, unsigned int, clang::ento::ExplodedNode*) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2f16b7c)
#15 0x00007f937da7ae62 clang::ento::CoreEngine::ExecuteWorkList(clang::LocationContext const*, unsigned int, llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2f15e62)
#16 0x00007f937dcf3206 (anonymous namespace)::AnalysisConsumer::HandleCode(clang::Decl*, unsigned int, clang::ento::ExprEngine::InliningModes, llvm::DenseSet<clang::Decl const*, llvm::DenseMapInfo<clang::Decl const*, void>>*) AnalysisConsumer.cpp:0:0
#17 0x00007f937dce8e5d (anonymous namespace)::AnalysisConsumer::HandleTranslationUnit(clang::ASTContext&) AnalysisConsumer.cpp:0:0
#18 0x00007f937d8e19d1 clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2d7c9d1)
#19 0x00007f937c420920 clang::ParseAST(clang::Sema&, bool, bool) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x18bb920)
#20 0x00007f937c43431c clang::FrontendAction::Execute() (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x18cf31c)
#21 0x00007f937c433e51 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x18cee51)
#22 0x00007f937d902e8d clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x2d9de8d)
#23 0x0000560e18adb1fe cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/clang-17+0x121fe)
#24 0x0000560e18ad418e ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#25 0x00007f937d542299 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#26 0x00007f9378ea1197 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libLLVM-17.so+0x271a197)
#27 0x00007f937c42455e clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x18bf55e)
#28 0x00007f937c3edca4 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x1888ca4)
#29 0x00007f937c3ed398 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/../lib/libclang-cpp.so.17+0x1888398)
#30 0x0000560e18ad7da2 clang_main(int, char**, llvm::ToolContext const&) (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/clang-17+0xeda2)
#31 0x0000560e18ad4d9e main (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/clang-17+0xbd9e)
#32 0x00007f9376253d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#33 0x00007f9376253e40 call_init ./csu/../csu/libc-start.c:128:20
#34 0x00007f9376253e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#35 0x0000560e18ad95d3 _start (/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/clang-17+0x105d3)

I took the smallest file, minimized it, and came up with this reproduction, which, admittedly, seems crazy to me. (What does ObjCGenerics have to do with anything - and yet, it seems to be one of a certain combination that causes the crash...)

# 5 "/home/tom/Documents/moz/staticanalysis/mozillaunified/objdir/dist/include/mozilla/FailureLatch.h"
class a {
public:
  virtual char b();
};
class c {};
class C {
protected:
  c &d();
};
class e : public c, a {
public:
  void f() { b(); }
};
class g : C {
  void h() { i().f(); }
  e &i() { return static_cast<e &>(d()); }
};

And the command:

#!/bin/sh


"/home/tom/Documents/moz/static-analysis/clang-17-2023-11-22-patches-from-tc/bin/clang-17" \
"-cc1" \
"-triple" \
"x86_64-unknown-linux-gnu" \
"-analyze" \
"-w" \
"-analyzer-checker=osx.cocoa.ObjCGenerics" \
"-x" \
"c++" \
"ProfileJSONWriter-991aed.cpp"

# Several other checker tests did not cause the error, but osx.cocoa.ObjCGenerics did.
# Although on the unminified command, removing that one check did not resolve the issue.
# So maybe it's more like certain checks cause a traversal in a way that causes the
# crash, and this is one of them...?

Again, this is these 4 patches, put atop 17.0.2 (6009708). You can even download the compiler here (in the Artifacts tab of the clang-17 job). This shows the patch additions. If I run it using 17.0.2 without the patches, it does not fail.

@choller
Copy link

choller commented Dec 13, 2023

I looked into Tom's bug report and I hit the following assertion in a debug build:

clang: /srv/repos/llvm-project/clang/lib/StaticAnalyzer/Core/DynamicType.cpp:134: clang::ento::ProgramStateRef clang::ento::setDynamicTypeAndCastInfo(clang::ento::ProgramStateRef, const clang::ento::MemRegion*, clang::QualType, clang::QualType, bool): Assertion `(CastToTy->isAnyPointerType() || CastToTy->isReferenceType()) && "DynamicTypeInfo should always be a pointer."' failed.

I was also able to greatly reduce the CPP testfile:

class c {};
class e : public c {};
class g {
  c &d();
  e &i() { return static_cast<e &>(d()); }
};

If I output the QualType information in fromTy and toTy right before the assertion, I get this:

From:
ElaboratedType 0x55be9d7264b0 'c' sugar
`-RecordType 0x55be9d726260 'class c'
  `-CXXRecord 0x55be9d7261d0 'c'
To:
ElaboratedType 0x55be9d726940 'e' sugar
`-RecordType 0x55be9d726420 'class e'
  `-CXXRecord 0x55be9d726390 'e'

What I don't understand is why these are not reference types. @steakhal do you know more here?

@steakhal
Copy link
Contributor Author

steakhal commented Dec 13, 2023

Thanks for the replies.
I'll come back to this PR once I have some time; maybe during the holidays.
Both assertions directly relate to this PR for sure.

I looked into Tom's bug report and I hit the following assertion in a debug build:

clang: /srv/repos/llvm-project/clang/lib/StaticAnalyzer/Core/DynamicType.cpp:134: clang::ento::ProgramStateRef clang::ento::setDynamicTypeAndCastInfo(clang::ento::ProgramStateRef, const clang::ento::MemRegion*, clang::QualType, clang::QualType, bool): Assertion `(CastToTy->isAnyPointerType() || CastToTy->isReferenceType()) && "DynamicTypeInfo should always be a pointer."' failed.

[...]
What I don't understand is why these are not reference types. @steakhal do you know more here?

Its because prior we unbox them I believe, which here sort of means that we drop refs and pointers I think.

@tomrittervg
Copy link
Contributor

Hm, testing these patches on the original testcase in #62663 (the one where we use statements 1B and 2B) - I don't think this patchset solves that scenario...

@steakhal
Copy link
Contributor Author

Thanks for the replies. I'll come back to this PR once I have some time; maybe during the holidays. Both assertions directly relate to this PR for sure.

I think I've just fixed the reported crash for this PR. For some reason a CastExpr does not have a reference type (getType()), even though it casts lvalue expressions - in case casting references instead of pointers). I have no clue why that's the case, but the workaround seems to work now. Added a testcase demonstrating the fix.


Hm, testing these patches on the original testcase in #62663 (the one where we use statements 1B and 2B) - I don't think this patchset solves that scenario...

I've looked into the case and I think now I see your problem.

In case we have 2B instead of 2A, we no longer have a cast to hint us the dynamic type of the object; consequently we can't follow/devirtualize the call. So, we are basically forced to conservatively evaluate the call, as if the body is not present.
This is strictly speaking correct, but also not particularly useful.

If I'm not mistaken, inheritance has "open-set" semantics, which means that in our current translation-unit we might not see all the derived classes, and could be that some derived classes are loaded only at runtime using shared libraries fox example.

What it means for us is that, by seeing a Base*, which has a pure virtual function, and that only a single class inherits transitively from Base (and defines the virtual function); we can't be really sure that by calling that virtual function it will invoke that particular function.

For example, in an other translation unit they might define a NewDerived implementing Base slightly differently and smuggle a pointer into our context; breaking local reasoning. Which means that the call to foo->OnRecvCancel(port) could potentially invoke a completely different definition that we saw in our translation unit.

If we would have seen the allocation (thus the dynamic type) of the object of which we call functions, it's fairly easy to inline the right definitions. However, when we need to guess what could be the dynamic type, its no longer that easy.

In presence of static-casts, the developer at least hinted a type, so in that case we have some justification for trusting the type there; otherwise all bets are off.


What could we do about inlining function definitions without knowing precisely the dynamic type of the object?

I think it would still make sense to assume well architected code and if only one thing derives (implements) a Base class, then it's probably safe to assume that that's the definition we want to inline. However, what should we do if multiple (N) classes implement Base? Trying each N, and basically splitting the state to (N+1) ways is not going to scale. Unless N is of course really small, like 2 or 3 at most.

To conclude, I don't think we could get the 2B case working (by inlining the Handler::OnRecvCancel(int)); but I'll think about it.
Maybe you have some thoughts @Xazax-hun.

@steakhal
Copy link
Contributor Author

To illustrate the case of my previous argument, here are two examples:
https://godbolt.org/z/5vWadfPM9

// base.h BEGIN:
class Base {
public:
  virtual int fun() const = 0;
};

class Derived1 final : public Base {
public:
  int fun() const override { return 1; }
};
// base.h END

Base *spawn(); // Defined in "secondary.cpp"

template <class T> void clang_analyzer_dump(T) {}

int main() {
    Base *p = spawn();
    int n = p->fun();
    clang_analyzer_dump(n); // conj; and never "1"
    int z = 100 / (n - 2);
    (void)z;
}

And here is the example with definition of spawn inside a different translation unit, which would lead to a division by zero bug at the definition of z.
https://godbolt.org/z/eKMWvTPe6

// secondary.cpp
#include "base.h"
class Derived2 final : public Base {
public:
  int fun() const override { return 2; }
};

Base *spawn() {
    return new Derived2();
}

@tomrittervg
Copy link
Contributor

However, what should we do if multiple (N) classes implement Base? Trying each N, and basically splitting the state to (N+1) ways is not going to scale. Unless N is of course really small, like 2 or 3 at most.

That's kind of what I imagined - try them all. The Analyzer has a built-in timeout mechanism that will come into play.

If you wanted to be fancy, there could be an obscure config option, with a default of N=3. If config value is not set, and N>3 it asserts, alerting the user to the situation. If config value is set, it limits to analyzing the first N.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Taint not being applied across class heirarchies
7 participants