New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification about when touchedness is reverted during state clearance #716

Open
pirapira opened this Issue Sep 20, 2017 · 17 comments

Comments

Projects
None yet
3 participants
@pirapira
Member

pirapira commented Sep 20, 2017

Severity

  • not posing forking threat on the mainnet
  • causing failures in some conformance tests

Problem

The state-clearing EIPs 158 and 161 do not explicitly specify what happens to the "touched" accounts when OOG happens.

As a result, different implementations do different things. Although they can all follow the mainnet, some tests fail (for example, BlockchainTests/GeneralStateTests/stCreateTest/TransactionCollisionToEmpty_d0g1v0.json reveals a difference between geth and cpp-ethereum).

It is not desirable to keep the unspecified corner in the specification.

Possible solutions

Since no more empty accounts exist on the mainnet, the question here is how to specify what happened on the mainnet.

There are many ways to specify what happened on the mainnet.
a. address 3 is special, although otherwise touched accounts become untouched after OOGs
b. precompiled contracts are special, although otherwise touched accounts become untouched after OOGs
c. a certain transaction in a certain block is spceial, although otherwise touched accounts become untouched after OOGs
d. OOG in inner calls should keep touched accounts; OOG in the outermost call in a transaction untouches accounts.
e. OOG in the touching call and inner calls should keep touched accounts; OOG in a parent call context untouches accounts.

Proposal

The proposal here is to pick d. or e. because they do not involve magic numbers. Perhaps e. is cleaner because it does not talk about the magic "outermost call".

After seeing options, a. looks the simplest.

Steps to be taken

  • an experimental implementation of a. on cpp-ethereum is created will be tested on the mainet.
  • an EIP will be written
  • the EIP will be discussed in coredev meetings
  • test cases will follow the EIP
  • YP will follow the EIP
@holiman

This comment has been minimized.

Show comment
Hide comment
@holiman

holiman Sep 20, 2017

Contributor

Some more context

Background

At block 2675119, shortly after the planned “Spurious Dragon” hard fork, the state sweep to ‘touch’ empty account had just begun. The initial account sweeping went from 0x0 and up.
When a call to 0000000000000000000000000000000000000003 was made (tx), the CALL went Out-of-gas. The reason being that 3 is the address of the RipeMD160 precompile.
Incidentally, both RipeMD160 (0x3) and Identity (0x4) were ‘empty’ accounts created by the attacker, whereas ‘0x1’ and ‘0x2’ already contained ether and were not ‘empty’.
At this point, even though the CALL to RipeMD160 went OOG, the effects were not reverted, and both Geth and Parity deleted the account .
NB: The actual transaction did not go out of gas, only the (inner) CALL.

At block 2686351, the state sweep of ‘seeded’ accounts was in progress. During the execution of these invocations of the Sweeper.sol contract, the number of ‘touches’ to perform is varying. An invocation was made which attempted to perform 1320 CALLs within 600K gas. This was insufficient, and the transaction went out of gas. So while the touches had been performed via CALL, the outer context went OOG.

  • At this point, Parity reverted the effects of the call, thus not deleting the touched accounts
  • Geth failed to revert the effects, and deleted the touched accounts, causing consensus-issue.

As Geth implemented reversion of touch after OOG, the earlier case was discovered.

Possible solutions

I'd even split up d) into two options: either see it as topmost vs internal calls, or see it as call parent context vs call context. Consider the following scenario:

  1. Tx -> Call A -> Call B-> Call/Create "empty but existing account"
  2. Call B-> OOG
Contributor

holiman commented Sep 20, 2017

Some more context

Background

At block 2675119, shortly after the planned “Spurious Dragon” hard fork, the state sweep to ‘touch’ empty account had just begun. The initial account sweeping went from 0x0 and up.
When a call to 0000000000000000000000000000000000000003 was made (tx), the CALL went Out-of-gas. The reason being that 3 is the address of the RipeMD160 precompile.
Incidentally, both RipeMD160 (0x3) and Identity (0x4) were ‘empty’ accounts created by the attacker, whereas ‘0x1’ and ‘0x2’ already contained ether and were not ‘empty’.
At this point, even though the CALL to RipeMD160 went OOG, the effects were not reverted, and both Geth and Parity deleted the account .
NB: The actual transaction did not go out of gas, only the (inner) CALL.

At block 2686351, the state sweep of ‘seeded’ accounts was in progress. During the execution of these invocations of the Sweeper.sol contract, the number of ‘touches’ to perform is varying. An invocation was made which attempted to perform 1320 CALLs within 600K gas. This was insufficient, and the transaction went out of gas. So while the touches had been performed via CALL, the outer context went OOG.

  • At this point, Parity reverted the effects of the call, thus not deleting the touched accounts
  • Geth failed to revert the effects, and deleted the touched accounts, causing consensus-issue.

As Geth implemented reversion of touch after OOG, the earlier case was discovered.

Possible solutions

I'd even split up d) into two options: either see it as topmost vs internal calls, or see it as call parent context vs call context. Consider the following scenario:

  1. Tx -> Call A -> Call B-> Call/Create "empty but existing account"
  2. Call B-> OOG
@pirapira

This comment has been minimized.

Show comment
Hide comment
@pirapira

pirapira Sep 20, 2017

Member

@holiman I didn't get the further split options. Is it like

  1. Tx -> Call A -> Call B-> Call/Create "empty but existing account" -> OOG
  2. Tx -> Call B-> OOG

?

Member

pirapira commented Sep 20, 2017

@holiman I didn't get the further split options. Is it like

  1. Tx -> Call A -> Call B-> Call/Create "empty but existing account" -> OOG
  2. Tx -> Call B-> OOG

?

@holiman

This comment has been minimized.

Show comment
Hide comment
@holiman

holiman Sep 20, 2017

Contributor
Tx 
    Calls A
       which Calls B
            Which calls (or Creates into) an "empty but existing account". This is a touch. 
            And then, after the call/touch is done, goes OOG.
       but A has not gone OOG, but returns orderly

The question is if the distinction is between:

  1. Topmost calls versus inner calls
  2. Parent context versus child context. (Could be topmost vs inner, but also inner1 vs inner2)
Contributor

holiman commented Sep 20, 2017

Tx 
    Calls A
       which Calls B
            Which calls (or Creates into) an "empty but existing account". This is a touch. 
            And then, after the call/touch is done, goes OOG.
       but A has not gone OOG, but returns orderly

The question is if the distinction is between:

  1. Topmost calls versus inner calls
  2. Parent context versus child context. (Could be topmost vs inner, but also inner1 vs inner2)
@pirapira

This comment has been minimized.

Show comment
Hide comment
@pirapira

pirapira Sep 20, 2017

Member

Am I getting the words parent and child correct? In your example, Calls A and Calls B are both parent contexts. Calls B and calls (or Creates into) an "empty but existing account" are child contexts. The OOG happens in Call B, which is at the same time a parent context and a child context.

Member

pirapira commented Sep 20, 2017

Am I getting the words parent and child correct? In your example, Calls A and Calls B are both parent contexts. Calls B and calls (or Creates into) an "empty but existing account" are child contexts. The OOG happens in Call B, which is at the same time a parent context and a child context.

@holiman

This comment has been minimized.

Show comment
Hide comment
@holiman

holiman Sep 20, 2017

Contributor

Perhaps I should've examplified with A->B->....M->N . So the outer context runs fine, and so do all the child contexts, except for the last parent-contex (M). Or B could go OOG for that matter .

You wrote:

OOG in inner calls should keep touched accounts; OOG in the outermost call in a transaction untouches accounts.

An alternative could be
OOG in inner calls should keep touched accounts; OOG in a parent call context untouches accounts.

I'm not saying it should be, just that it's an option. Maybe. These things are so confusing. 😕

Edit: and yes, I think you understood my intended use of the words correctly above.

Contributor

holiman commented Sep 20, 2017

Perhaps I should've examplified with A->B->....M->N . So the outer context runs fine, and so do all the child contexts, except for the last parent-contex (M). Or B could go OOG for that matter .

You wrote:

OOG in inner calls should keep touched accounts; OOG in the outermost call in a transaction untouches accounts.

An alternative could be
OOG in inner calls should keep touched accounts; OOG in a parent call context untouches accounts.

I'm not saying it should be, just that it's an option. Maybe. These things are so confusing. 😕

Edit: and yes, I think you understood my intended use of the words correctly above.

@pirapira

This comment has been minimized.

Show comment
Hide comment
@pirapira

pirapira Sep 20, 2017

Member

I was trying to understand your alternative. If I can rephrase,

OOG during touching is still touching, but an OOG in a shallower execution frame untouches.

This one sounds less arbitrary because it doesn't rely on "the outermost level" (nothing else in EVM is aware of "the outermost level" I guess).

Member

pirapira commented Sep 20, 2017

I was trying to understand your alternative. If I can rephrase,

OOG during touching is still touching, but an OOG in a shallower execution frame untouches.

This one sounds less arbitrary because it doesn't rely on "the outermost level" (nothing else in EVM is aware of "the outermost level" I guess).

@pirapira

This comment has been minimized.

Show comment
Hide comment
@pirapira

pirapira Sep 20, 2017

Member

So edited the description of the issue.

Member

pirapira commented Sep 20, 2017

So edited the description of the issue.

@holiman

This comment has been minimized.

Show comment
Hide comment
@holiman

holiman Sep 21, 2017

Contributor

Also, a comment about option b. precompiled contracts are special, although otherwise touched accounts become untouched after OOGs.

This may be a bit hairy to implement, depending on where the 'fix' is applied; but potentially the journal would need to know what addresses are precompiles (which depends on blocknumber). It would be possible to state another rule:

  • Addresses < 0x4 are special. A 'touch' on <0x4 is never rolled back.

One added complication to bear in mind, is that with the new precompiles, we have new errors (point not on curve etc), which supposedly behaves like OOG. If we were to use a b-like option, perhaps we should avoid phrasing the rule in terms of OOG, if possible.

Contributor

holiman commented Sep 21, 2017

Also, a comment about option b. precompiled contracts are special, although otherwise touched accounts become untouched after OOGs.

This may be a bit hairy to implement, depending on where the 'fix' is applied; but potentially the journal would need to know what addresses are precompiles (which depends on blocknumber). It would be possible to state another rule:

  • Addresses < 0x4 are special. A 'touch' on <0x4 is never rolled back.

One added complication to bear in mind, is that with the new precompiles, we have new errors (point not on curve etc), which supposedly behaves like OOG. If we were to use a b-like option, perhaps we should avoid phrasing the rule in terms of OOG, if possible.

@holiman

This comment has been minimized.

Show comment
Hide comment
@holiman

holiman Sep 21, 2017

Contributor

Parity uses precompiles are special, option b.

Contributor

holiman commented Sep 21, 2017

Parity uses precompiles are special, option b.

@pirapira

This comment has been minimized.

Show comment
Hide comment
@pirapira

pirapira Sep 22, 2017

Member

@holiman figured cpp does a very complicated thing. It's so complicated that I don't want to describe it.

Member

pirapira commented Sep 22, 2017

@holiman figured cpp does a very complicated thing. It's so complicated that I don't want to describe it.

@pirapira

This comment has been minimized.

Show comment
Hide comment
@pirapira

pirapira Sep 26, 2017

Member

I implemented a. in cpp-ethereum and now trying to sync the mainnet. https://github.com/ethereum/cpp-ethereum/tree/geth-style-cleanup-rule

Member

pirapira commented Sep 26, 2017

I implemented a. in cpp-ethereum and now trying to sync the mainnet. https://github.com/ethereum/cpp-ethereum/tree/geth-style-cleanup-rule

@pirapira

This comment has been minimized.

Show comment
Hide comment
@pirapira

pirapira Oct 17, 2017

Member

On https://github.com/ethereum/cpp-ethereum/tree/experiment-geth-rule (revision 0a90a234579e1f111d0cf1b6df9def609193168d) I get

  �  15:31:52.844|eth
                                                                                
                                                                                
    Import Failure     InvalidReceiptsStateRoot                                 
                                                                                
                       Bad Block #02675119.8ca3da34��                      
                                                                                
  �                                                                             
�  �  15:31:52.844|eth  #4dc73a02�� : Unknown parent #8ca3da34��
  �  15:31:52.845|eth  ODD: Import queue contains block with unknown parent.
�  �  15:31:52.845|eth  #a773968e�� : Unknown parent #4dc73a02��
  �  15:31:52.846|eth  ODD: Import queue contains block with unknown parent.
�  �  15:31:52.846|eth  #749df97d�� : Unknown parent #a773968e��
  �  15:31:52.847|eth  ODD: Import queue contains block with unknown parent.
Member

pirapira commented Oct 17, 2017

On https://github.com/ethereum/cpp-ethereum/tree/experiment-geth-rule (revision 0a90a234579e1f111d0cf1b6df9def609193168d) I get

  �  15:31:52.844|eth
                                                                                
                                                                                
    Import Failure     InvalidReceiptsStateRoot                                 
                                                                                
                       Bad Block #02675119.8ca3da34��                      
                                                                                
  �                                                                             
�  �  15:31:52.844|eth  #4dc73a02�� : Unknown parent #8ca3da34��
  �  15:31:52.845|eth  ODD: Import queue contains block with unknown parent.
�  �  15:31:52.845|eth  #a773968e�� : Unknown parent #4dc73a02��
  �  15:31:52.846|eth  ODD: Import queue contains block with unknown parent.
�  �  15:31:52.846|eth  #749df97d�� : Unknown parent #a773968e��
  �  15:31:52.847|eth  ODD: Import queue contains block with unknown parent.
@pirapira

This comment has been minimized.

Show comment
Hide comment
@pirapira

pirapira Oct 17, 2017

Member

I can reproduce this. Interestingly, under gdb, the special case about address 3 is not hit.

Member

pirapira commented Oct 17, 2017

I can reproduce this. Interestingly, under gdb, the special case about address 3 is not hit.

@pirapira

This comment has been minimized.

Show comment
Hide comment
@pirapira

pirapira Oct 17, 2017

Member

Wow, cpp-ethereum doen't even touch it.

Member

pirapira commented Oct 17, 2017

Wow, cpp-ethereum doen't even touch it.

@pirapira

This comment has been minimized.

Show comment
Hide comment
@pirapira

pirapira Oct 17, 2017

Member

I modified https://github.com/ethereum/cpp-ethereum/tree/experiment-geth-rule and passed block 2675119.

  • check if the same version can pass block 2686351
  • restart the same version from the gnesis block up to the tip of the mainnet
Member

pirapira commented Oct 17, 2017

I modified https://github.com/ethereum/cpp-ethereum/tree/experiment-geth-rule and passed block 2675119.

  • check if the same version can pass block 2686351
  • restart the same version from the gnesis block up to the tip of the mainnet
@pirapira

This comment has been minimized.

Show comment
Hide comment
@pirapira

pirapira Jan 19, 2018

Member

I saw ldb errors and I couldn't sync further. Maybe the problem might have been fixed in cpp-ethereum. I'll rebase and try again.

Member

pirapira commented Jan 19, 2018

I saw ldb errors and I couldn't sync further. Maybe the problem might have been fixed in cpp-ethereum. I'll rebase and try again.

@germsvel

This comment has been minimized.

Show comment
Hide comment
@germsvel

germsvel Oct 15, 2018

@holiman @pirapira sorry to revisit this after so much time has passed. I am trying to implement some code for failed_tx_xcf416c53 common test which deals with this issue.

Was consensus every reached as to whether option a (address 3 is special) or b (all precompiled contracts are special) is preferred?

germsvel commented Oct 15, 2018

@holiman @pirapira sorry to revisit this after so much time has passed. I am trying to implement some code for failed_tx_xcf416c53 common test which deals with this issue.

Was consensus every reached as to whether option a (address 3 is special) or b (all precompiled contracts are special) is preferred?

germsvel added a commit to poanetwork/mana that referenced this issue Oct 16, 2018

Touch RIPEMD account on failure
This fixes GeneralStateTests/stSpecialTest/failed_tx_xcf416c53 in state
tests for SpuriousDragon and Byzantium, and it fixes it in blockchain
tests for SpuriousDragon. Strangely, it does not solve it for Byzantium.

Description
===========

During the empty account cleanup that ensued after the Spurious Dragon
hardfork, a consensus issue arose. Geth was not reverting removed
empty accounts (un-deleting them as it where) when a call was OOG.
Parity was reverting the removal.

During cleanup, it was discovered that there was one more idiosyncrasy:
both Geth and Parity had failed to revert the clean up of an empty
account when an inner call was out-of gas. The account in question was
account 3, the precompiled contract RIPEMD.

Since both Geth and Parity had removed the account and had not reverted
that, it was deemed that it should remain that way. So the
`failed_tx_xcf416c53` test is testing that very thing.

More details
============

If more details are needed, please see:

- The announcement of the consensus bug:
https://blog.ethereum.org/2016/11/25/security-alert-11242016-consensus-bug-geth-v1-4-19-v1-5-2/

In there you can see,

> Details: Geth was failing to revert empty account deletions when the
transaction causing the deletions of empty accounts ended with an
out-of-gas exception. An additional issue was found in Parity, where the
Parity client incorrectly failed to revert empty account deletions in a
more limited set of contexts involving out-of-gas calls to precompiled
contracts; the new Geth behavior matches Parity’s, and empty accounts
will cease to be a source of concern in general in about one week once
the state clearing process finishes.

- This helpful comment by Holiman in EIP 716:
ethereum/EIPs#716 (comment)

- Geth introducing fix:
ethereum/go-ethereum@12d654a#diff-2433aa143ee4772026454b8abd76b9ddR93

- Aleth implementation of fix:
https://github.com/ethereum/aleth/blob/8daa6e1c56e874e88e894d96f89f1ff8c9412a63/libethereum/Executive.cpp#L314

germsvel added a commit to poanetwork/mana that referenced this issue Oct 16, 2018

Touch RIPEMD account on failure (#509)
This fixes GeneralStateTests/stSpecialTest/failed_tx_xcf416c53 in state
tests for SpuriousDragon and Byzantium, and it fixes it in blockchain
tests for SpuriousDragon. Strangely, it does not solve it for Byzantium.

Description
===========

During the empty account cleanup that ensued after the Spurious Dragon
hardfork, a consensus issue arose. Geth was not reverting removed
empty accounts (un-deleting them as it where) when a call was OOG.
Parity was reverting the removal.

During cleanup, it was discovered that there was one more idiosyncrasy:
both Geth and Parity had failed to revert the clean up of an empty
account when an inner call was out-of gas. The account in question was
account 3, the precompiled contract RIPEMD.

Since both Geth and Parity had removed the account and had not reverted
that, it was deemed that it should remain that way. So the
`failed_tx_xcf416c53` test is testing that very thing.

What's our fix?
===========

Since we implicitly revert on a call failure by simply not returning a new sub 
state or exec env, we modify the call failure to return a new sub state with 
the account touched if it is the RIPEMD contract. It is an ugly fix for this, but 
I am unsure as to how else to handle it without changing the logic of how we 
revert state on a call failure.

More details
============

If more details are needed, please see:

- The announcement of the consensus bug:
https://blog.ethereum.org/2016/11/25/security-alert-11242016-consensus-bug-geth-v1-4-19-v1-5-2/

In there you can see,

> Details: Geth was failing to revert empty account deletions when the
transaction causing the deletions of empty accounts ended with an
out-of-gas exception. An additional issue was found in Parity, where the
Parity client incorrectly failed to revert empty account deletions in a
more limited set of contexts involving out-of-gas calls to precompiled
contracts; the new Geth behavior matches Parity’s, and empty accounts
will cease to be a source of concern in general in about one week once
the state clearing process finishes.

- This helpful comment by Holiman in EIP 716:
ethereum/EIPs#716 (comment)

- Geth introducing fix:
ethereum/go-ethereum@12d654a#diff-2433aa143ee4772026454b8abd76b9ddR93

- Aleth implementation of fix:
https://github.com/ethereum/aleth/blob/8daa6e1c56e874e88e894d96f89f1ff8c9412a63/libethereum/Executive.cpp#L314
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment