EIP 911 - Zero redundancy/overhead bytecode deployments (Discussion/Draft) #911

Closed
ivica7 opened this issue Mar 3, 2018 · 24 comments

ivica7 commented Mar 3, 2018

Preamble

EIP: 911
Title: Zero-redundancy and overhead-free bytecode deployments
Author: Ivica Aracic <aracic@gmail.com>
Category: EVM
Created: 2018-03-04
Updated: 2018-03-04

Short Description

The user should not have to pay gas for deploying code that already exists in the DB, because the node has no extra work in this case except for setting the correct codeHash in the account, which is covered by the CREATE fee.

Motivation

At mainnet block number 5200035, there are 5,212,554 active contract instances referring to 64,561 unique bytecodes. 10,552 (16.34%) of these unique bytecodes have been deployed more than once, resulting in a redundancy of 2,934,072,376 bytes. This is 93.05% of the total byte size of all deployed contracts.

If we only consider the price of 200 gas / byte, which is charged for storing the bytecode in the database, these redundancies cost a total of 586,814,475,200 gas. This is ~7.45% of all gas ever spent in Ethereum. The number looks even worse if we also consider the costs for tx input data size and for historical instantiations of contracts that no longer exist at the analysed block.

Stats are generated with https://github.com/ivica7/eth-chain-stats-collector

              EOAs: 24332129
Contract Instances: 5212554

  Unique Bytecodes
   with duplicates: 10552 / 64561 -> 16.344232586236274 %
      Bytes Wasted: 2934072376 / 3153359448 -> 93.04592211525136 %
        Gas Wasted: 586,814,475,200
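
The arithmetic behind these figures can be reproduced with a short Python sketch. The variable names are illustrative and are not taken from eth-chain-stats-collector; the input numbers are copied from the stats output above, and only the 200 gas / byte price mentioned in the text is applied.

```python
# Figures copied from the stats output above (mainnet block 5200035).
GAS_PER_CODE_BYTE = 200  # price for storing runtime bytecode, as stated in the text

unique_bytecodes = 64_561
duplicated_bytecodes = 10_552
total_code_bytes = 3_153_359_448
redundant_code_bytes = 2_934_072_376


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Redundant bytes were paid for at the normal per-byte price.
gas_wasted = redundant_code_bytes * GAS_PER_CODE_BYTE

print(f"duplicated bytecodes: {percent(duplicated_bytecodes, unique_bytecodes):.2f} %")
print(f"bytes wasted:         {percent(redundant_code_bytes, total_code_bytes):.2f} %")
print(f"gas wasted:           {gas_wasted:,}")
```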

Currently two patterns exist for partially solving this problem: (1) delegatecalls, and (2) extcodecopy/size.

Delegatecalls allow us to deploy the code once, but they introduce a code redundancy of their own in the form of the forwarder contracts. Further, they have an additional run-time overhead of ~1000 gas / call.

Extcodecopy/size will reduce transaction input size, but the deployment cost of 200 gas / byte still applies even if the code is copied from an existing instance.

Both patterns are helpful; however, they fail to provide an overhead-free solution for the code redundancy problem. Given that Ethereum is targeting a x100+ increase in transaction throughput (e.g. through PoS and sharding), we have to be careful with this problem, because it will grow proportionally.

For this reason, I am opening this EIP proposal to discuss how to tackle this problem. Below, I provide one possible solution by defining a CREATEFROMADDR opcode.

Possible Solution 1

Specification

Add new opcode CREATEFROMADDR(v, p, s)

  • v - value in wei sent to the new contract address
  • p - init bytecode memory address
  • s - size of the init bytecode

CREATEFROMADDR behaves like CREATE, but it interprets the return data from the init code as a contract address and sets the codeHash of this address as the codeHash of the new contract. Since no insert of the code into the node's database is required, the additional gas cost for the code insert can be omitted. The sender pays only a flat fee of 1400 gas instead of len(code)*200.
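
To make the fee difference concrete, here is a minimal Python sketch comparing the two charges. The function names are hypothetical; the two constants are the 200 gas / byte and 1400 gas figures from the paragraph above.

```python
CODE_DEPOSIT_GAS_PER_BYTE = 200  # current charge for storing runtime bytecode
CREATEFROMADDR_FLAT_GAS = 1400   # flat fee proposed in this draft


def code_deposit_gas(code_size: int) -> int:
    """Gas charged today for storing code_size bytes of runtime code."""
    return code_size * CODE_DEPOSIT_GAS_PER_BYTE


def createfromaddr_gas(code_size: int) -> int:
    """Gas charged under the proposal; it does not depend on the code size."""
    return CREATEFROMADDR_FLAT_GAS


for size in (7, 1_000, 10_000):
    print(size, code_deposit_gas(size), createfromaddr_gas(size))
```

With these numbers the two fees are equal at 7 bytes of runtime code (7 * 200 = 1400); for anything larger the flat fee is cheaper.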

TODO: for Constantinople also specify CREATEFROMADDR2(v, n, p, s) analogous to CREATE2.

Implementation (related to geth)

  • Add a flag to evm:Create to signal which mode is used for the return data.
  • Old create opcodes will set the flag to false.
  • New create opcodes will set it to true.
  • If the flag is true, evm:Create will look up the code hash of the address returned by the init code and set it as the codeHash of the new contract. The size*200 gas fee will be replaced by a flat fee of 1400 gas.
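
The flag-based flow in the list above can be sketched as a toy Python model. This is not geth code: ToyState, create and the from_addr parameter are hypothetical names, sha256 stands in for keccak256, and the behaviour for a source address without code is an assumption, since the draft does not specify it.

```python
from dataclasses import dataclass, field
from hashlib import sha256  # stand-in only; the EVM uses keccak256

CODE_DEPOSIT_GAS_PER_BYTE = 200
FLAT_CLONE_GAS = 1400


@dataclass
class ToyState:
    code_hash_of: dict = field(default_factory=dict)  # address -> code hash
    code_db: dict = field(default_factory=dict)       # code hash -> bytecode


def create(state: ToyState, new_addr: bytes, init_return: bytes,
           from_addr: bool) -> int:
    """Install code at new_addr and return the gas charged for it."""
    if from_addr:
        # New opcodes: the init code returned an address, not bytecode.
        source_hash = state.code_hash_of.get(init_return)
        if source_hash is None:
            # Assumption: the draft does not say what should happen here.
            raise ValueError("source address has no code")
        state.code_hash_of[new_addr] = source_hash
        return FLAT_CLONE_GAS
    # Old opcodes: the init code returned the runtime bytecode itself.
    code_hash = sha256(init_return).digest()
    state.code_db[code_hash] = init_return
    state.code_hash_of[new_addr] = code_hash
    return len(init_return) * CODE_DEPOSIT_GAS_PER_BYTE


state = ToyState()
runtime_code = bytes(100)  # 100 placeholder bytes
gas_full = create(state, b"A", runtime_code, from_addr=False)
gas_clone = create(state, b"B", b"A", from_addr=True)
print(gas_full, gas_clone, len(state.code_db))
```

In this model the clone shares the code hash of the source contract and adds no entry to the code database.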

Possible Solution 2

Change the CREATE2 specification to support the cloning of the codeHash as described above. This opcode is not yet in production, so we would not have to introduce new opcodes.

???

@ivica7 ivica7 changed the title EIP ? - Reduce gas costs for deploying runtime bytecode if it's already in the DB from an earlier deployment (Draft) EIP 911 - Reduce gas costs for deploying runtime bytecode if it's already in the DB from an earlier deployment (Draft) Mar 3, 2018
@Arachnid
Contributor

Arachnid commented Mar 4, 2018

How do you determine if the bytecode already exists? You cannot simply check the db, as results will differ if the contract that created it has self destructed, and the node was fast synced or uses pruning.

@ivica7
Author

ivica7 commented Mar 4, 2018

How do you determine if the bytecode already exists? You cannot simply check the db, as results will differ if the contract that created it has self destructed, and the node was fast synced or uses pruning.

Yes, you're right. I was thinking about limiting it to the state existing at the current block, but this will not be easy to implement either. I was hoping that with this simple change, we could avoid adding or changing an opcode. But it seems the most straightforward implementation would be to add a parameter to CREATE indicating whether the data is a contract whose code is to be copied or new bytecode.

@ivica7
Author

ivica7 commented Mar 4, 2018

@Arachnid what do you think about this alternative:

Add following new OpCodes:

  • createFromAddress(v, p, s) - behaves like CREATE, but it interprets the return data from the init code as a contract address. The codeHash of this address is set for the new contract -> flat gas fee for runtime code deployment.
  • createFromAddress2(v, n, p, s) -> the Constantinople-ready version

Implementation (geth):

  • Add a flag to evm:Create to signal which mode is used for the return data.
  • Old create opcodes will set the flag to false.
  • New create opcodes will set it to true.
  • If the flag is true, evm:Create will look up the code hash of the address returned by the init code and set it as the codeHash of the new contract. The size*200 gas fee will be replaced by a flat fee (e.g. 1400 (700+700), analogous to EXTCODESIZE/COPY).

This would be a minimal change in the implementation.
The ugly thing is that we're adding new opcodes. :-/

Btw. I am working on a script collecting stats from the trie about code duplication. With this script we can estimate how much gas we could have saved with this feature.

EDIT: moved this to the main comment.

@ivica7 ivica7 changed the title EIP 911 - Reduce gas costs for deploying runtime bytecode if it's already in the DB from an earlier deployment (Draft) EIP 911 - Reduce gas costs for deploying runtime bytecode by copying bytecodeHash from an existing contract (Draft) Mar 5, 2018
@Arachnid
Contributor

Arachnid commented Mar 5, 2018

That approach is definitely more technically feasible.

I'm not totally convinced this is necessary given how Byzantium enables cheap delegate contracts as an alternative, though.

@ivica7
Author

ivica7 commented Mar 5, 2018

@Arachnid yes, delegatecall helps to reduce code duplication; however, it adds infrastructure code, which also has its own deployment costs and redundancies, and it also adds to runtime costs because of the indirection.

Finally, to be able to decide if it's worth doing, we need concrete statistics about code instantiation on mainnet. As I said, I am trying to calculate some based on the current state of the blockchain. I will report on it as soon as I have the data.

Byzantium enables cheap delegate contracts

What did Byzantium add to make delegate contracts cheaper? Do you mean reducing transaction input size with EXTCODESIZE/COPY or am I missing something?

@Arachnid
Contributor

Arachnid commented Mar 5, 2018

@ivica7 Byzantium introduced RETURNDATASIZE and RETURNDATACOPY; before that, it wasn't possible to write a generic delegate contract because return data size could not be determined ahead of time.

@nateawelch
Contributor

@ivica7 Check this project: https://github.com/yarrumretep/clone-factory

It's a delegatecall proxy contract and deployer where the proxy contracts are 58-62 bytes. I had written up an EIP similar to this before I helped write clone-factory, but quickly realized that these proxy contracts were only 12k gas worth of deployed bytecode. In my opinion, that doesn't really make it worth introducing a new opcode for cloning a contract's code.

@ivica7
Author

ivica7 commented Mar 5, 2018

@flygoing thank you for commenting. I appreciate it very much.

The clone-factory is excellent work considering the circumstances. The code size of the proxy is small, but it's still a redundancy. Moreover, there are also runtime costs due to the overhead for DELEGATECALL + mem handling: ~1000 gas for every call.

At first glance it looks negligible, but if we think large scale, and Ethereum is on its way to it (sharding, PoS, more tx/s, etc.), every little bit will count! Just like every single cycle counts in a CPU.

Let us do this naive estimate:

  • all contracts are deployed via clone-factory
  • every transaction involves only one delegatecall on average

This would already mean that we're wasting around 1,000,000,000 gas / day, which is currently $2568/day, or ~$1,000,000/year. I believe the waste right now is even worse than that, and if you consider that Ethereum's future should be > x100 tx/s, the waste will scale proportionally to Ethereum's tx throughput.
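
The estimate above can be unpacked with a few lines of Python. The sketch only uses the three figures quoted in this comment (~1000 gas per delegatecall, 1,000,000,000 gas / day, $2568 / day); the transaction count and the dollar price per gas are derived from them and are not stated anywhere in the thread.

```python
GAS_PER_DELEGATECALL = 1_000        # overhead per call quoted above
WASTED_GAS_PER_DAY = 1_000_000_000  # daily estimate quoted above
WASTED_USD_PER_DAY = 2_568          # dollar figure quoted above

# Derived values; these are implications of the quoted figures, not source data.
implied_tx_per_day = WASTED_GAS_PER_DAY // GAS_PER_DELEGATECALL
implied_usd_per_gas = WASTED_USD_PER_DAY / WASTED_GAS_PER_DAY
wasted_usd_per_year = WASTED_USD_PER_DAY * 365

print(f"implied transactions per day: {implied_tx_per_day:,}")
print(f"implied USD per gas:          {implied_usd_per_gas:.9f}")
print(f"wasted USD per year:          {wasted_usd_per_year:,}")
```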

As I said: I am extracting stats from the mainnet right now in order to get better numbers on this. Only with these numbers can a concrete decision be made on whether it's worth doing or not.

Could you please send me the link to your EIP on this topic? I would like to read what the discussion was.

@tjayrush

tjayrush commented Mar 5, 2018

@ivica7 Is there any chance that you post the code you're using to 'extract stats from the mainnet'? I'd be very interested in what that looks like. Understand if you'd prefer not to.

@ivica7
Author

ivica7 commented Mar 6, 2018

@tjayrush the code is published at https://github.com/ivica7/eth-chain-stats-collector
More to come...

@ivica7 ivica7 changed the title EIP 911 - Reduce gas costs for deploying runtime bytecode by copying bytecodeHash from an existing contract (Draft) EIP 911 - Fully redundancy- and overhead-free bytecode deployments (Discussion/Draft) Mar 6, 2018
@ivica7 ivica7 changed the title EIP 911 - Fully redundancy- and overhead-free bytecode deployments (Discussion/Draft) EIP 911 - Zero redundancy/overhead bytecode deployments (Discussion/Draft) Mar 6, 2018
@nateawelch
Contributor

@Ivica seeing as the average tx gas is around 50k, the 1.2k added for clone calls is only ~2%. Obviously the delegatecall isn't necessary for all contracts, just for ones that are duplicate contracts. So you're looking at maybe 1%.

There are issues with creation with cheap code copy in that if a contract is SELFDESTRUCTED, they don't actually deserve the refund because the contract code can't actually be removed. So it's not a gas savings, it's just a shift of the expensiveness of the system from creating duplicated bytecode to everything else. Remember that you don't pay for transactions in gas, you pay for transactions in ETH. If you make transactions cheaper in gas but don't make the execution behind the scenes cheaper, then gas will just get slightly more expensive.

@ivica7
Author

ivica7 commented Mar 8, 2018

@flygoing the refund on SELFDESTRUCT is mainly deserved because the storage for the contract can be removed. Deleting/adding bytecode to the DB is a single insert/delete of a key/value-pair. Already now it's stored redundancy-free as codeHash->bytecode key/value-pair. I think this is not of significance in this case.

Counted at block number 5 200 035, there are 5 212 554 contract instances (see Stats from the Motivation-Section).
5 158 545 contract instances have code that has been deployed more than once.
Gas wasted since genesis -> 586 814 475 200

Ok, we can do it better nowadays, we have delegatecall and extcodecopy/size and libraries like https://github.com/yarrumretep/clone-factory.

Let's calculate what would have happened if people had applied clone-factory from the beginning, assuming it is only used for contracts that are deployed more than once (5 158 545 at block 5 200 035). Let's further assume each contract has received on average 3 TXs during its lifetime, hence 3x overhead for the delegatecall.

Overhead per contract instance -> (21000 + 32000 + 12000) for the forwarder creation + (3 x 1000) for 3 delegatecall = 68 000 gas

Wasted Gas with the clone-factory: 5 158 545 contract instances x 68 000 overhead for the clone-factory => 350 781 060 000 gas wasted. That is only ~40% less than without this pattern.

With EIP-911 -> 0 gas wasted.
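
A small Python sketch makes this first estimate easy to check. All inputs are the figures given in this comment and in the Motivation section; the variable names are illustrative.

```python
instances_with_duplicates = 5_158_545  # contract instances whose code is deployed more than once
baseline_wasted_gas = 586_814_475_200  # "Gas wasted since genesis" from the stats

# Overhead per instance as assumed in this comment:
# tx base cost + contract creation + forwarder bytecode + 3 delegatecalls.
overhead_per_instance = 21_000 + 32_000 + 12_000 + 3 * 1_000

clone_factory_wasted_gas = instances_with_duplicates * overhead_per_instance
reduction_percent = 100.0 * (1 - clone_factory_wasted_gas / baseline_wasted_gas)

print(f"overhead per instance:    {overhead_per_instance:,}")
print(f"wasted with clone-factory: {clone_factory_wasted_gas:,}")
print(f"reduction vs. baseline:    {reduction_percent:.1f} %")
```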

@nateawelch
Contributor

@Ivica AFAIK the refund on SELFDESTRUCT isn't because the storage trie of a contract can be deleted, it's because the reference of nonce+balance+codeHash+storageRoot can be deleted. Seeing as the storage can be zeroed out before SELFDESTRUCT, you can get the refund from SELFDESTRUCT even if you don't have any storage.

Also, SELFDESTRUCT doesn't mean any of that stuff can actually be deleted since it may be used in other locations, it just means the collection that creates the contracts root can be deleted (like I said above). If the code that hashes to contractHash of a SELFDESTRUCT'd contract exists in another contract (which it more than likely would be with your EIP), then it can't be deleted from the state trie and the SELFDESTRUCT refund becomes pointless to an extent since the codeHash->code in the db is largest part of a contract ignoring the storage trie.

In the end, since you're not actually saving any computation/storage for the nodes, you're not really saving any cost. You're just shifting the cost around to disincentivize SELFDESTRUCT, which realistically means a higher cost to the node operators and higher gas prices.

Wasted Gas with the clone-factory: 5 158 545 contract instances x 68 000 overhead for the clone-factory => 350 781 060 000 gas wasted. only ~40% less than without this pattern.

The clone factory is, of course, only really useful when (forwarderDeploymentCost+averageCallsPerForwarder*gasCostPerForward) < fullContractDeploymentCost. There are a ton of contracts that benefit greatly from this, so there is actually a net benefit from these since there is less work on the nodes to store less bytecode.
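
The inequality above can be written as a small Python check. The function and parameter names mirror the expression in the comment; the 5,000-byte contract in the example is a hypothetical size chosen for illustration, while the 12k forwarder cost, the ~1000 gas per forwarded call and the 200 gas / byte price are the figures used elsewhere in this thread.

```python
def clone_is_cheaper(forwarder_deployment_cost: int,
                     average_calls_per_forwarder: int,
                     gas_cost_per_forward: int,
                     full_contract_deployment_cost: int) -> bool:
    """True when a forwarder plus its call overhead costs less than a full deployment."""
    forwarder_total = (forwarder_deployment_cost
                       + average_calls_per_forwarder * gas_cost_per_forward)
    return forwarder_total < full_contract_deployment_cost


# Hypothetical contract with 5,000 bytes of runtime code at 200 gas / byte.
full_cost = 5_000 * 200

print(clone_is_cheaper(12_000, 3, 1_000, full_cost))
print(clone_is_cheaper(12_000, 987, 1_000, full_cost))
print(clone_is_cheaper(12_000, 988, 1_000, full_cost))
```

Under these assumed numbers the forwarder stops paying off once an instance receives 988 or more calls over its lifetime.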

@nateawelch
Contributor

nateawelch commented Mar 8, 2018

@ivica7 Also, on top of that, the cost of the clone-factory vs your opcode isn't actually 68k vs 0. The cost of the clone-factory is mainly the 21k for tx and 32k for contract creation, which are both costs that would still exist for your opcode. The actual cost difference would be more around ~15k (only 12k for the actual bytecode)

@nootropicat

nootropicat commented Mar 9, 2018

Imo it's only really an issue with gas costs being wrong. Deduplication can already happen on the database side, but there's no incentive to create identical contracts. In some cases a delegatecall proxy can be gas cheaper, but more expensive on the node side (with deduplication). As the GasToken project shows there's an urgent need for separate storage fees.
Not sure how to encourage code reuse while refunds exist.

@ivica7
Author

ivica7 commented Mar 9, 2018

@flygoing about the refund: yes you're right, removing the account from the state trie is of significance too.

SELFDESTRUCT refund becomes pointless to an extent since the codeHash->code in the db is largest part of a contract

I don't agree with this. It is a large part in terms of bytes (max ~24 KB), but it's a negligible part in terms of the data structure complexity (a single key/value entry in the database).

About the overhead calculation for the factory: yes, you're right. My fault! At first I counted the costs for the instantiation of the factory, but this is negligible, since it's only instantiated once per unique bytecode.

Overhead for deploying the factories: 350000 gas
Overhead per contract instance -> 15000 gas + (3 delegatecalls in average x 1000 gas) = 18 000 gas

Wasted Gas with the clone-factory: 350000 gas * 10 552 (repeating bytecodes) + 5 158 545 contract instances x 18 000 overhead for the clone-factory => 96 547 010 000 gas wasted. ~80% less than without this pattern
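
The revised estimate can be checked the same way. The sketch below only recombines the numbers stated in this comment and in the stats; the variable names are illustrative.

```python
factory_deployment_overhead = 350_000  # per unique repeating bytecode
repeating_bytecodes = 10_552
instances_with_duplicates = 5_158_545
baseline_wasted_gas = 586_814_475_200  # "Gas wasted since genesis" from the stats

# 15k per forwarder instance plus 3 delegatecalls at ~1000 gas each.
overhead_per_instance = 15_000 + 3 * 1_000

revised_wasted_gas = (factory_deployment_overhead * repeating_bytecodes
                      + instances_with_duplicates * overhead_per_instance)
reduction_percent = 100.0 * (1 - revised_wasted_gas / baseline_wasted_gas)

print(f"wasted with clone-factory: {revised_wasted_gas:,}")
print(f"reduction vs. baseline:    {reduction_percent:.1f} %")
```

The computed reduction comes out at roughly 83.5 %, which is consistent with the "~80% less" stated above.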

@nateawelch
Contributor

@ivica7 There is no data complexity if a contract has no storage, in which case you still get the SELFDESTRUCT refund. If the SELFDESTRUCT was mainly due to the deletion of storage, then the refund would be proportional to how much storage is being deleted.

Again, at the end of the day, you're not making anything cheaper for the node, so you're just shifting the cost of transactions from contract creation and distributing it to everything else. I'm of the opinion that it's better to keep the costs on contract creation and not on transactions themselves.

@ivica7
Author

ivica7 commented Mar 9, 2018

@flygoing there is no difference to the current situation! Right now, as-is, on SELFDESTRUCT, a node will only delete the entry for codeHash->bytecode if there are no other contract instances having the same codeHash. For EIP-911 there is no need for different behavior. There is also no computational overhead for the node, because if it is copying the codeHash from a living contract, it knows that there is already a codeHash->bytecode entry in its DB; otherwise the source contract wouldn't work. There is no need to look it up, no need to make an insert for it, and no need to charge the user the gas for storing this code.

That's the whole point of this EIP: the user should not have to pay gas for deploying code that already exists in the DB, because the node has no extra work in this case except for setting the correct codeHash in the account, which is covered by the CREATE fee.
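
The behaviour described in this comment can be illustrated with a toy Python model of a deduplicated code store. It models only the claim made here (one codeHash->bytecode entry shared by all instances, removed when the last instance is destroyed); it is not how any particular client is implemented. ToyCodeStore and its methods are hypothetical names, and sha256 stands in for keccak256.

```python
from collections import Counter
from hashlib import sha256  # stand-in only; the EVM uses keccak256


class ToyCodeStore:
    """Toy model: one bytecode entry per code hash, shared by all instances."""

    def __init__(self):
        self.code_by_hash = {}      # code hash -> bytecode
        self.instances = Counter()  # code hash -> number of live contracts

    def deploy(self, code: bytes) -> bytes:
        """Full deployment: insert the bytecode unless it is already stored."""
        code_hash = sha256(code).digest()
        self.code_by_hash.setdefault(code_hash, code)
        self.instances[code_hash] += 1
        return code_hash

    def clone(self, code_hash: bytes) -> None:
        """Clone from a living contract: no lookup of the bytes, no insert."""
        if code_hash not in self.code_by_hash:
            raise KeyError("source contract has no stored code")
        self.instances[code_hash] += 1

    def selfdestruct(self, code_hash: bytes) -> None:
        """Drop one instance; delete the bytecode only with the last one."""
        self.instances[code_hash] -= 1
        if self.instances[code_hash] == 0:
            del self.instances[code_hash]
            del self.code_by_hash[code_hash]


store = ToyCodeStore()
h = store.deploy(b"\x60\x00")
store.clone(h)
store.selfdestruct(h)
still_present = h in store.code_by_hash  # one instance is still alive
store.selfdestruct(h)
removed = h not in store.code_by_hash    # last instance is gone
print(still_present, removed)
```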

@adamskrodzki

Any updates on that issue?
I see it is still open. Does it mean it is still under consideration?

@ivica7
Author

ivica7 commented May 20, 2018

@adamskrodzki I didn't close it yet, because I still believe it's a good idea, however, AFAIK there are no active plans to consider adding this feature soon.

@luziusmeisser

luziusmeisser commented Dec 17, 2020

I would love to see this implemented. The proposed solution is vastly superior to the widespread “proxy” pattern from a software-engineering perspective:

  1. For programmers coming from object-oriented languages, it is very surprising to have to bear the cost of bytecode loading every time an object is instantiated. Once a class is loaded, using the “new” keyword is expected to have costs in the same range as other function calls.
  2. Proxy contracts add a layer of indirection, thereby obfuscating the true functionality of a contract.
  3. Proxy contracts encourage upgradability. Instead, immutability should be encouraged if we really want Ethereum to be a “trust machine”, as the Economist calls it. Immutable contracts can be relied upon. Updatable contracts not so much.

@luziusmeisser

Potentially related (this is about changing the bytecode storage layout on chain, but if we do a hard-fork for that, we could also think about deduplication):
https://ethereum-magicians.org/t/eip-2926-chunk-based-code-merkleization/4555/17

@github-actions

There has been no activity on this issue for two months. It will be closed in a week if no further activity occurs. If you would like to move this EIP forward, please respond to any outstanding feedback or add a comment indicating that you have addressed all required feedback and are ready for a review.

@github-actions github-actions bot added the stale label Mar 22, 2022
@github-actions

github-actions bot commented Apr 5, 2022

This issue was closed due to inactivity. If you are still pursuing it, feel free to reopen it and respond to any feedback or request a review in a comment.

@github-actions github-actions bot closed this as completed Apr 5, 2022
7 participants