
[analyzer] Overly eager assumption combinations before/after invalidation #53338

Open
steakhal opened this issue Jan 21, 2022 · 8 comments

@steakhal
Contributor

Imagine this example: https://godbolt.org/z/reavrz7vq

#include <cassert>
struct Entry {};
extern Entry *entries;
extern int num_entries;
bool entry_compare(const Entry *p, const Entry *q);
void *memcpy(void * dest, const void * src, unsigned long n);

const Entry *find(const Entry *p) {
  assert(p && "should be non-null");
  assert(entries && "should be initialized");
  for (int i = 0; i < num_entries; ++i)
    if (entry_compare(&entries[i], p))
      return &entries[i];
  // expected-note@-1 {{Returning pointer, which participates in a condition later}}
  return nullptr;
}

void checked_insert(const Entry *src) {
  // checking invariant: 'src' should not be present in the 'entries' table.
  // expected-note@+2 {{Taking false branch}}
  // expected-note@+1 {{Assuming pointer value is null}}
  if (find(src))
    return; // error: item already present

  Entry *dst = &entries[num_entries++];
  // expected-note@-1 {{'dst' initialized to a null pointer value}}
  // But how can it be null?

  memcpy(dst, src, sizeof(Entry));
  // expected-warning@-1 {{Null pointer passed as 1st argument to memory copy function [unix.cstring.NullArg]}}
}

IMO this is a false positive, since we take the address of the lvalue entries[i], and the address of a valid object is never null.
Currently, within the analyzer, we model reference locations the same way as pointers.
I understand the motivation and can see the rationale behind it, but a false positive like this makes me question it.

I tried experimenting with constraining the address of an lvalue to be non-null, but it would break a lot of tests that explicitly expect "null references" to exist.


What should we do about this?
The standard feels fuzzy about this: ISO C11 6.5.3.2 Address and indirection operators:

The unary & operator yields the address of its operand. If the operand has type ‘‘type’’,
the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated
and the result is as if both were
omitted, except that the constraints on the operators still apply and the result is not an
lvalue. [...]

In C++, we could query this with the constexpr evaluator, like here: https://godbolt.org/z/5dx7rhjs3
From that we can conclude that &*(p + i) is a valid expression (i.e. has no UB) if p is nullptr and i is 0, but invalid if i is anything else, e.g. 1.

To wrap this up, I'm not sure how to go about this, but seeing examples like the first one I wish "null references" were not a thing.
WDYT? @haoNoQ @martong @ASDenysPetrov

@haoNoQ
Collaborator

haoNoQ commented Jan 21, 2022

This is actually a much simpler case. It has nothing to do with lvalues and references and such.

See the following note:

<source>:12:9: note: Value assigned to 'entries'
    if (entry_compare(&entries[i], p))

This function invalidates globals, regardless of whether &entries is passed in or not. In particular, the prior assert(entries) is no longer in effect.

Then later:

<source>:22:7: note: Assuming pointer value is null
  if (find(src))

... the value produced during invalidation is assumed to be null, which is perfectly reasonable.

So this is probably a false positive, but the reason is the loss of information during invalidation of globals, combined with our policy of eagerly making state splits at every if-statement. The modeling of each individual operation is perfectly fine.

You can confirm my observation by putting assert(entries) after entry_compare():

  for (int i = 0; i < num_entries; ++i)
    if (entry_compare(&entries[i], p)) {
      assert(entries);  // <===
      return &entries[i];
    }

and the warning disappears.

@haoNoQ
Collaborator

haoNoQ commented Jan 21, 2022

So I guess the question remains whether we should have warned earlier in this case, i.e. at operator [] inside entries[num_entries++] rather than at the dereference. IIRC, &*nullptr is benign most of the time. In fact, C programmers often used to simulate offsetof this way: &(((struct S *)nullptr)->field) is the offset of the field. So if we were to warn in such cases we'd be flagging a lot of code that works just fine.

In C++ it's also problematic to bind a null pointer to a reference; that's immediate UB even if the reference is never used, and we do have a warning for that (though I don't think it covers all the cases; I think it only covers returning a null reference from a function).

@haoNoQ
Collaborator

haoNoQ commented Jan 21, 2022

But yeah, regardless of how we treat operator [], we would have warned in this code due to invalidation; it's only a matter of where exactly we would warn.

(Sry for the noise!)

@steakhal
Contributor Author

Oh, thanks for the detailed explanation! If this is the case, then we should have some notes about invalidation. What do you think: could we emit a note when we encounter serious information loss, such as invalidation?

@haoNoQ
Collaborator

haoNoQ commented Jan 21, 2022

As of today my best plan to combat this issue is to outright suppress bug reports that assume unusual things happening during invalidation. In particular, I think the following sequence of events can be banned:

  1. Assuming X is true,
  2. X is indirectly invalidated,
  3. Assuming X is false.

I.e., specifically when invalidation causes seemingly contradictory choices to be made on two similar expressions, BR.markInvalid() seems appropriate. That is the case here as well: first entries is assumed to be non-null, then it's invalidated, then it's assumed to be null (and even actively participates in the report later!).

The alternative of explaining the very concept of invalidation to the users seems daunting to me. It's educational at best but we're still mostly making excuses. It sounds much more reasonable to actively search for a different path to the same bug that doesn't contain any surprises. Which is exactly what BR.markInvalid() is about.

@steakhal
Contributor Author

I like it. I guess we need a PoC implementation to experiment with. It seems like something worth investigating; we will see if we can put it on the roadmap. Thanks for the discussion. I'll close this ticket in a week, to let others join the discussion.

@steakhal
Contributor Author

@martong and I thought about the implementation possibilities, and the easiest we could come up with is this:
Expressions that play an important role in the bug report are marked interesting by the end of the visitation, so we have a collection of these Exprs. Such interesting expressions should not involve contradicting constraints, even if invalidation happens. Say that, due to invalidation, the Environment binding Expr1 -> sym1 gets removed, and at a later point the binding Expr1 -> sym2 gets introduced at a different program point; we could modify these steps to not only remove the binding but also eagerly bind a fresh symbol. By doing so, we could "remember" the connection between the two symbols sym1 and sym2. This wouldn't change the semantics of the invalidation process, since sym2 is unrelated to sym1, although it might have an impact on memory consumption.

We could later use the connection between sym1 and sym2 if it turns out that the expression Expr1 is interesting: we could add the sym1 == sym2 constraint to the constraint set being cross-checked by Z3 in the FalsePositiveRefutationBRVisitor.
This way, if there were contradicting assumptions across sym1 and sym2, we would suppress the report. Non-interesting expressions with contradicting assumptions would not interfere; in that case the bug report is likely unrelated to the contradiction, so we should still emit it.

WDYT @haoNoQ?

@haoNoQ
Collaborator

haoNoQ commented Jan 26, 2022

Let's say due to invalidation the Environment binding Expr1 -> sym1 gets removed

Invalidation never removes Environment bindings. Environment contains values of expressions that have already been computed. If you've computed that 2+2=4, it wouldn't suddenly become 5 just because some global variable changed value. Only future loads from that variable are affected.

and subsequently at a later point the Expr1 -> sym2 gets introduced by a different program point

It'll most likely be a different expression. I.e., in our motivating example

if (a) { ... } // taking true branch
foo(); // variable 'a' invalidated
if (a) { ... } // taking false branch

the first a and the second a aren't the same expression.

I suggest focusing on Store bindings instead. I.e., if a reg_$0<R> and a derived_$2<conj_$1, R> of the same region R have non-overlapping constraints, become suspicious at the earliest moment of time where the value of the region R is derived_$2<conj_$1, R> (i.e., the node where it stops being so while moving from bottom to top with a bug visitor - that'd be the moment of time when conj_$1 is introduced, which is hopefully an invalidation event).

@haoNoQ haoNoQ changed the title [analyzer] Can the address of an lvalue ever be null? [analyzer] Overly eager assumption combinations before/after invalidation Aug 25, 2022