New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile-time overflow checks #2274

Closed
chriseth opened this Issue May 16, 2017 · 6 comments

Comments

4 participants
@chriseth
Contributor

chriseth commented May 16, 2017

For developers, it is much more helpful to get notified about potential arithmetic overflows already at compilation time instead of only at runtime. Furthermore, such checks might even be easier to implement than runtime checks and they will not be a breaking change. It remains to be seen how many false positives we will have.

Tasks:

  • come up with a range tracking and overflow detection rules and check about the number of false positives with existing smart contracts
  • implement range tracking
  • implement overflow detection
@roadriverrail

This comment has been minimized.

Show comment
Hide comment
@roadriverrail

roadriverrail May 30, 2017

Contributor

Notes about this-- the warning issued for potential overflow takes a very conservative approach. Anything that could overflow should issue a warning. This is to encourage contract developers to use assert( ) and control flow structures (e.g. if { }) to create guard conditions.

Contributor

roadriverrail commented May 30, 2017

Notes about this-- the warning issued for potential overflow takes a very conservative approach. Anything that could overflow should issue a warning. This is to encourage contract developers to use assert( ) and control flow structures (e.g. if { }) to create guard conditions.

@chriseth chriseth added the Focus label May 31, 2017

@chriseth

This comment has been minimized.

Show comment
Hide comment
@chriseth

chriseth Jun 22, 2017

Contributor

I spent some thoughts on how conditions could be implemented. One way would be to use an SMT solver, but most of them are based on bitvectors, and because of that, I think we should get quite poor performance. There are "smallish" libraries like this one here: https://github.com/stp/stp

Another approach would be to do this "manually", and it could work in the following way:

As we also currently have it, there would be one "Value Range" object for each expression in the AST and also for each variable at a certain context (i.e. it is updated while we walk the AST). A more general term for "Value Range" could also just be a vector of "Constraints", where a constraint for x would be x < 5, x >= 7 or x & 0x100 = 0x100. Currently, a value range consists of two constraints: min <= x and x <= max. For simplicity, let us restrict to constraints of the form a <= x <= b where a and b are both constants known at compile time.

The algorithm to check for overflows in a given function f would have the following stages:

  1. assign default constraints to local variables in f and to state variables (and perhaps also literals):
  • A literal l with value a receives the constraints a <= l <= a.
  • A variable x of uintA type which is a state variable or a function argument receives the constraints 0 <= x <= 2**A - 1.
  • A variable x of intA type which is a state variable or a function argument receives the constraints -2**(A-1) <= x <= 2**(A-1) - 1.
  • All other local variables receive the constraints 0 <= x <= 0.
  1. walk the AST statement by statement (this gets more complicated with function calls and such, but let us keep this simple for now) to update the constraints on the variables and come up with constraints for the expressions in the following way:

2.1 Compute constraints for integer-typed expressions by starting from the atoms (variables and literals) and combining the constraints. Example: If we have a <= x <= b and c <= y <= d then a + c <= x + y <= b + d. At each of these steps, if the resulting expression has an integer type, the constraints are combined conjunctively with the default constraints of the type. If that results in different constraints, flag an overflow error.

2.2 For statements of the form require(E < F) (where E and F are expressions), if (E < F) and so on, we can compute additional constraints that are valid after that point. These constraints are extended starting from the expression in the condition and moved towards the atoms, in the hope of (A) further restricting the constraints on variable (B) finding tautological conditions or (C) finding contradicting conditions. Example:

If in 2.1 we computed a <= E <= b and c <= F <= d and we have the condition E < F, we can improve the constraints on E and F in the following way: a <= E <= min(b, d - 1) and max(c, a + 1) <= F <= d. If no improvement is possible, we can stop the process. Note that in code, this could be written as constraintE = constraintE & Constraint(constraintE.min, constraintF.max - 1).

This of course is applied recursively until we can improve the constraints on a variable. If in this process, we e.g. arrive at an expression of the form E + F, we computed a <= E <= b and c <= F <= d in 2.1 and were able to improve the constraints on E + F in 2.2 to g <= E + F <= h, then we can improve the constraints on E and F in the following way:

Since we know that E + F <= h, i.e. E <= h - F <= h - c.

Also, E + F >= g, i.e. E >= g - F >= g - d.

So we can add the constraint g - d <= E <= h - c to the set of constraints for E.

Here, we assume that the addition did not overflow in the first place. Because of that, we should probably use an overflow warning of the form "This arithmetic operation might overflow. Further analysis of the code will assume that it cannot overflow."

  1. If the control flow splits, all constraints on variables have to be copied. If the control flow joins again, the constraints from the two paths have to be combined disjunctively, i.e. we either use the less constraining constraint or turn the set of constraints into a more complex object that can also handle multiple cases.

  2. For statements of the form assert(X) where we cannot "prove" that X is a tautology, we should issue a warning.

Contributor

chriseth commented Jun 22, 2017

I spent some thoughts on how conditions could be implemented. One way would be to use an SMT solver, but most of them are based on bitvectors, and because of that, I think we should get quite poor performance. There are "smallish" libraries like this one here: https://github.com/stp/stp

Another approach would be to do this "manually", and it could work in the following way:

As we also currently have it, there would be one "Value Range" object for each expression in the AST and also for each variable at a certain context (i.e. it is updated while we walk the AST). A more general term for "Value Range" could also just be a vector of "Constraints", where a constraint for x would be x < 5, x >= 7 or x & 0x100 = 0x100. Currently, a value range consists of two constraints: min <= x and x <= max. For simplicity, let us restrict to constraints of the form a <= x <= b where a and b are both constants known at compile time.

The algorithm to check for overflows in a given function f would have the following stages:

  1. assign default constraints to local variables in f and to state variables (and perhaps also literals):
  • A literal l with value a receives the constraints a <= l <= a.
  • A variable x of uintA type which is a state variable or a function argument receives the constraints 0 <= x <= 2**A - 1.
  • A variable x of intA type which is a state variable or a function argument receives the constraints -2**(A-1) <= x <= 2**(A-1) - 1.
  • All other local variables receive the constraints 0 <= x <= 0.
  1. walk the AST statement by statement (this gets more complicated with function calls and such, but let us keep this simple for now) to update the constraints on the variables and come up with constraints for the expressions in the following way:

2.1 Compute constraints for integer-typed expressions by starting from the atoms (variables and literals) and combining the constraints. Example: If we have a <= x <= b and c <= y <= d then a + c <= x + y <= b + d. At each of these steps, if the resulting expression has an integer type, the constraints are combined conjunctively with the default constraints of the type. If that results in different constraints, flag an overflow error.

2.2 For statements of the form require(E < F) (where E and F are expressions), if (E < F) and so on, we can compute additional constraints that are valid after that point. These constraints are extended starting from the expression in the condition and moved towards the atoms, in the hope of (A) further restricting the constraints on variable (B) finding tautological conditions or (C) finding contradicting conditions. Example:

If in 2.1 we computed a <= E <= b and c <= F <= d and we have the condition E < F, we can improve the constraints on E and F in the following way: a <= E <= min(b, d - 1) and max(c, a + 1) <= F <= d. If no improvement is possible, we can stop the process. Note that in code, this could be written as constraintE = constraintE & Constraint(constraintE.min, constraintF.max - 1).

This of course is applied recursively until we can improve the constraints on a variable. If in this process, we e.g. arrive at an expression of the form E + F, we computed a <= E <= b and c <= F <= d in 2.1 and were able to improve the constraints on E + F in 2.2 to g <= E + F <= h, then we can improve the constraints on E and F in the following way:

Since we know that E + F <= h, i.e. E <= h - F <= h - c.

Also, E + F >= g, i.e. E >= g - F >= g - d.

So we can add the constraint g - d <= E <= h - c to the set of constraints for E.

Here, we assume that the addition did not overflow in the first place. Because of that, we should probably use an overflow warning of the form "This arithmetic operation might overflow. Further analysis of the code will assume that it cannot overflow."

  1. If the control flow splits, all constraints on variables have to be copied. If the control flow joins again, the constraints from the two paths have to be combined disjunctively, i.e. we either use the less constraining constraint or turn the set of constraints into a more complex object that can also handle multiple cases.

  2. For statements of the form assert(X) where we cannot "prove" that X is a tautology, we should issue a warning.

@chriseth

This comment has been minimized.

Show comment
Hide comment
@chriseth

chriseth Jun 26, 2017

Contributor

I spent some more thoughts on SMT solvers and one striking feature (apart from smaller number of false positives) that SMT solvers have but the approach outlined above does not have: SMT solvers can provide counter examples (If you call this functions with inputs 1 and 7, the addition here will overflow) - which might be extremely valuable.

It seems that one of the best SMT solvers - z3 - can both be easily integrated as an external dependency via CMake (and has an MIT license) and it also supports SMT-Lib - a general SMT solver input format.

Because of that, I would like to keep the conditions and requirements as general as possible. We can evaluate the footprint of an external dependency as this and also evaluate its performance.

If the performance is too poor, but the footprint acceptable, we can first only run our manual and crude condition checker and put something like "If you want to run additional checks, use --enable-smt" into the warning. If the user supplies that switch, the SMT solver is run on the conditions.

If bot the footprint and performance is bad, we can output the SMT-Lib code and keep z3 as a fully external component.

Contributor

chriseth commented Jun 26, 2017

I spent some more thoughts on SMT solvers and one striking feature (apart from smaller number of false positives) that SMT solvers have but the approach outlined above does not have: SMT solvers can provide counter examples (If you call this functions with inputs 1 and 7, the addition here will overflow) - which might be extremely valuable.

It seems that one of the best SMT solvers - z3 - can both be easily integrated as an external dependency via CMake (and has an MIT license) and it also supports SMT-Lib - a general SMT solver input format.

Because of that, I would like to keep the conditions and requirements as general as possible. We can evaluate the footprint of an external dependency as this and also evaluate its performance.

If the performance is too poor, but the footprint acceptable, we can first only run our manual and crude condition checker and put something like "If you want to run additional checks, use --enable-smt" into the warning. If the user supplies that switch, the SMT solver is run on the conditions.

If bot the footprint and performance is bad, we can output the SMT-Lib code and keep z3 as a fully external component.

@roadriverrail

This comment has been minimized.

Show comment
Hide comment
@roadriverrail

roadriverrail Jun 27, 2017

Contributor

The good news is that this is the approach I'm taking currently, with the exception that I'm storing ranges in a hashmap associated with the AST but not in AST nodes themselves. Otherwise, my WIP is an exemplar of everything up to 2.1, though a number of operators need to be implemented still.

The entire approach you've outlined is how I'm proceeding, but it's worth noting that (3) is where gets crazy in a hurry. Consider this situation:

if (x < 0 || y > 0 || z == 0 || aa != 0) {
...
} else {
...
}

Let's say that the known ranges are sufficiently wide that any of these clauses may or may not resolve to true. When we evaluate the if { } block for overflows, it only takes one of the four clauses evaluating as true to enter the block, so we must evaluate the block for x < 0, y > 0, z == 0, aa!=0, as well as all cases where 2, 3, and 4 of these are true. We're fortunate in the else { } block because, for it to execute, we know that all those conditions must be false, so it can be evaluated directly. But, you can see how boolean AND and OR are going to balloon out the number of distinct evaluations to run.

Contributor

roadriverrail commented Jun 27, 2017

The good news is that this is the approach I'm taking currently, with the exception that I'm storing ranges in a hashmap associated with the AST but not in AST nodes themselves. Otherwise, my WIP is an exemplar of everything up to 2.1, though a number of operators need to be implemented still.

The entire approach you've outlined is how I'm proceeding, but it's worth noting that (3) is where gets crazy in a hurry. Consider this situation:

if (x < 0 || y > 0 || z == 0 || aa != 0) {
...
} else {
...
}

Let's say that the known ranges are sufficiently wide that any of these clauses may or may not resolve to true. When we evaluate the if { } block for overflows, it only takes one of the four clauses evaluating as true to enter the block, so we must evaluate the block for x < 0, y > 0, z == 0, aa!=0, as well as all cases where 2, 3, and 4 of these are true. We're fortunate in the else { } block because, for it to execute, we know that all those conditions must be false, so it can be evaluated directly. But, you can see how boolean AND and OR are going to balloon out the number of distinct evaluations to run.

@leonardoalt

This comment has been minimized.

Show comment
Hide comment
@leonardoalt

leonardoalt Apr 18, 2018

Member

@chriseth Can we close this issue?

Member

leonardoalt commented Apr 18, 2018

@chriseth Can we close this issue?

@chriseth

This comment has been minimized.

Show comment
Hide comment
@chriseth

chriseth Apr 18, 2018

Contributor

Yes, I think so.

Contributor

chriseth commented Apr 18, 2018

Yes, I think so.

@axic axic removed the Focus label Jul 30, 2018

@axic axic added this to To Do in SMT Checker MVP via automation Jul 30, 2018

@leonardoalt leonardoalt closed this Oct 8, 2018

SMT Checker MVP automation moved this from To Do to Done Oct 8, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment