Propose RETURNDATACOPY and RETURNDATASIZE. #211

Merged
merged 15 commits into from
Dec 1, 2017
68 changes: 68 additions & 0 deletions EIPS/returndatacopy.md
@@ -0,0 +1,68 @@
## Preamble

EIP: 211
Title: New opcodes: RETURNDATASIZE and RETURNDATACOPY
Author: Christian Reitwiessner <chris@ethereum.org>
Type: Standards Track
Category: Core
Status: Draft
Created: 2017-02-13
Requires:
Replaces: 5/8


## Simple Summary

A mechanism to allow returning arbitrary-length data inside the EVM has been requested for quite a while now. Existing proposals always had very intricate problems associated with charging gas. This proposal solves the same problem with a very simple gas charging mechanism and requires only minimal changes to the call opcodes. It works much like calldata handling does today: after a call, return data is kept inside a virtual buffer from which the caller can copy it (or parts thereof) into memory. At the next call, the buffer is overwritten. This mechanism is 100% backwards compatible.

## Abstract

Please see summary.

## Motivation

In some situations, it is vital for a function to be able to return data whose length cannot be anticipated before the call. In principle, this can be solved without alterations to the EVM, for example by splitting the call into two calls where the first is used to compute only the size. All of these mechanisms, though, are very expensive in at least some situations. A very useful example of such a worst-case situation is a generic forwarding contract: A contract that takes call data, potentially makes some checks and then forwards it as is to another contract. The return data should of course be transferred in a similar way to the original caller. Since the contract is generic and does not know about the contract it calls, there is no way to determine the size of the output without adapting the called contract accordingly or trying a logarithmic number of calls.

Compiler implementors are advised to reserve a zero-length area for return data if the size of the return data is unknown before the call and then use `RETURNDATACOPY` in conjunction with `RETURNDATASIZE` to actually retrieve the data.

Note that this proposal also makes the EIP that allows returning data in case of an intentional state reversion (EIP [206](https://github.com/ethereum/EIPs/pull/206)) much more useful. Since the size of the failure data might be larger than the regular return data (or even unknown), the failure data can be retrieved after the CALL opcode has signalled a failure, even if the regular output area is not large enough to hold the data.

## Specification

Member

Should this EIP specify the starting block somehow, or not?

Add two new opcodes and amend the semantics of any opcode that creates a new call frame (like `CALL`, `CREATE`, `DELEGATECALL`, ...), called call-like opcodes in the following. It is assumed that the EVM (more specifically, an EVM call frame) has a new internal buffer of variable size, called the return data buffer. This buffer is created empty for each new call frame. Upon executing any call-like opcode, the buffer is cleared (its size is set to zero). After executing a call-like opcode, the complete return data (or failure data, see EIP [206](https://github.com/ethereum/EIPs/pull/206)) of the call is stored in the return data buffer (of the caller), and its size is changed accordingly. As an exception, `CREATE` is considered to return the empty buffer in the success case and the failure data in the failure case.

Member

Exception for CREATE is also true for CREATE2, I guess


As an optimization, it is possible to share the return data buffer across call frames because only one will be non-empty at any time.
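
The buffer rules above can be illustrated with a minimal Python sketch. It is not taken from any client: `Frame` and `call_like` are hypothetical names, the callee's execution is reduced to an output byte string passed in by the caller of the helper, and only the return data buffer is modelled.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical model of one EVM call frame (return data buffer only)."""
    return_data: bytes = b""  # created empty for each new call frame


def call_like(caller: Frame, callee_output: bytes,
              is_create: bool = False, success: bool = True) -> bool:
    """Model the effect of a call-like opcode on the caller's buffer."""
    # Upon executing the opcode, the caller's buffer is cleared.
    caller.return_data = b""
    # (The callee would run here in its own, fresh Frame.)
    if is_create and success:
        # Exception in the text: a successful CREATE yields the empty buffer.
        caller.return_data = b""
    else:
        # Return data, or failure data in the failure case.
        caller.return_data = callee_output
    return success
```

Each call-like opcode overwrites whatever the previous one left behind, which is why a caller that needs the data has to copy it into memory before making the next call.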

`RETURNDATASIZE`: `0xd`
Member

Why this opcode number? It creates gap after SIGNEXTEND 0xb.

Member @axic, May 5, 2017

I think it should be after EXTCODECOPY as that block contains call related lookup. That is 0x3d and 0x3e.


Pushes the size of the return data buffer onto the stack.
Gas costs: 2 (same as `CALLDATASIZE`)

`RETURNDATACOPY`: `0xe`

This opcode has similar semantics to `CALLDATACOPY`, but instead of copying data from the call data, it copies data from the return data buffer. If the return data buffer is accessed beyond its size, it is considered to be filled with zeros.

ALTERNATIVE: If the return data buffer is accessed beyond its size, the access results in failure.

Member

I think it's time to make a decision on this. @Arachnid?

Member

For the principle of laziness, I prefer zero-filling because that's what CALLDATACOPY does now.

Contributor

I'm still a strong advocate of throwing; just because we made the mistake of failing silently earlier doesn't mean we should persist that into new opcodes.

Member

I have a test case about this https://github.com/ethereum/tests/pull/168/files#diff-eabb766f15e78eafc24bb31a1d66cefc and I'll follow what's chosen.

Member

At least I want to hear opinions, so I added an item in the coredev meeting agenda: https://github.com/ethereum/tests/pull/168/files#diff-eabb766f15e78eafc24bb31a1d66cefc

Member @pirapira, May 18, 2017

@Arachnid If it's a throw, I need a specification whether an out-of-bounds size-zero reading should throw (the offset being out-of-bounds might be already exception-worthy; or not when no reads are attempted). Sorry, the ALTERNATIVE is already clear about this. Zero-size reads do not lead to exceptions.

+1 for zero-filled for consistency

+1 consistency. mistakes are made worse by inconsistent, partial fixes.

Contributor

@arkpar @gavofyork Can either of you suggest a plausible situation in which the inconsistency would lead to worse consequences than throwing?

Contributor

A determination has been made in the core developer call on 6/2/2017 that we will go ahead and go with the throwing approach rather than filling with zeros. This decision was made amongst the clients participating in the call, including geth and parity clients who re-iterated their stances and came to the resolution of using throw and pyethereum client who did not have an opinion on the subject.


Gas costs: `3 + 3 * ceil(amount / 32)` (same as `CALLDATACOPY`)
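
As a rough illustration of the gas formula and of the two out-of-bounds behaviours discussed in this thread, here is a Python sketch. The names `returndatacopy_gas`, `returndatacopy` and `ReturnDataOutOfBounds` are hypothetical. The sketch only covers the formula as written and does not model memory expansion. For the throwing variant it assumes the reading given in the comments that zero-size reads never fail.

```python
class ReturnDataOutOfBounds(Exception):
    """Hypothetical error for the throwing (ALTERNATIVE) variant."""


def returndatacopy_gas(amount: int) -> int:
    """3 + 3 * ceil(amount / 32), using integer arithmetic."""
    return 3 + 3 * ((amount + 31) // 32)


def returndatacopy(buffer: bytes, offset: int, amount: int,
                   zero_fill: bool) -> bytes:
    """Return the bytes that would be written to memory."""
    end = offset + amount
    if amount > 0 and end > len(buffer) and not zero_fill:
        # ALTERNATIVE: reading beyond the buffer results in failure.
        raise ReturnDataOutOfBounds(f"read [{offset}, {end}) of {len(buffer)}")
    chunk = buffer[offset:end]
    # Zero-filling variant: pad as if the buffer continued with zeros.
    return chunk + b"\x00" * (amount - len(chunk))
```

With a two-byte buffer, a three-byte read at offset 1 either returns one real byte followed by two zero bytes or raises, depending on `zero_fill`.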

Member

I need something like this:

In a machine state, the return data of the previous call is maintained as follows. When a new machine state is launched, the return data of the previous call is defined to be the empty byte sequence. When the program counter reaches CALL, CREATE, CALLCODE, DELEGATECALL or STATICCALL, the return data of the previous call is reset to the empty byte sequence. When this instruction gives return data, the resultant data becomes the return data of the previous call.

In particular, it's currently impossible to tell whether CREATE counts as a previous call.

Contributor

What happens in the following scenario:

  • Call foo()
    • Call bar() -> returns 42
    • RETURNDATA is now 42
    • Error (e.g. oog or invalid jump)
  • What does RETURNDATA give now? Was it cleared when going up a level?

Contributor Author

@holiman I'm not sure I fully understand. In your example, the call to the foo contract signals a failure, correct? The RETURNDATA is always cleared when going up a level unless the call frame returns data using return or revert. In that case, it is set to that data.

Member

It's cleared at Call foo() and stays empty, regardless of what happens in other call stacks. It is not cleared when going up. The RETURNDATA in different machine states do not interfere with each other.

In this scenario, at least two machine states are involved. The machine state that calls foo() and the machine state that calls bar(). The machine state that calls foo() has RETURNDATA reset at Call foo().

In the Yellow Paper (9.4.1. "The Machine State"), a machine state is defined to be a tuple containing the program counter.
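
The scenario can be traced with a small Python sketch that keeps one buffer per machine state. This is only an illustration of the reading given in the replies above: `Frame` and `run_scenario` are hypothetical names, and the failure of `foo()` is assumed to produce no return or revert data.

```python
class Frame:
    """Hypothetical machine state holding only its return data buffer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.return_data = b""  # empty when the machine state is launched


def run_scenario() -> tuple[bytes, bytes]:
    outer = Frame("caller of foo")
    # The outer frame executes the call to foo(): its buffer is reset.
    outer.return_data = b""

    foo = Frame("foo")
    # foo() calls bar(): foo's buffer is reset, then set to bar's output.
    foo.return_data = b""
    foo.return_data = (42).to_bytes(32, "big")
    seen_inside_foo = foo.return_data

    # foo() then fails (e.g. out of gas) without return or revert data,
    # so the outer frame's buffer ends up empty.
    outer.return_data = b""
    return seen_inside_foo, outer.return_data
```

Inside `foo()` the buffer holds 42, while the frame that called `foo()` sees an empty buffer after the failure, whichever way one chooses to describe the clearing.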

Member @pirapira, Mar 22, 2017

@chriseth, have you changed your answer to my old question:

> Does "previous call" mean "previous call in the current transaction" or "previous call in the current message call/contract creation"?

> Previous call made from the current call frame, i.e. the EVM execution that shares the same memory with the current executing opcode - not sure if there is a proper name for that somewhere.

Now your description reads as if the RETURNDATA buffer belongs to the transaction, not to the machine state.

Contributor Author

@pirapira it is kind of a different viewpoint on the same thing. As mentioned in another comment, over all call stack frames, only one return data buffer has nonzero size at any point in time. Because of that, you can also think of a single return data buffer for the whole transaction. But I think that viewpoint (one buffer for the whole transaction) might just be useful for implementations. The specification is probably easier to understand when talking about one buffer per call stack frame.

Contributor

> Because of that, you can also think of a single return data buffer for the whole transaction

It was in that mode of thinking that my question about the clearing above came about. Ok!

Member

I'm interested because I want to know if I should change the formulation in YP ethereum/yellowpaper#264 (currently a new buffer is added to the machine state; adding a transaction-wide buffer is also doable).

Member

@chriseth OK. I'll try to follow your choice in the EIP text.

## Rationale

Other solutions that would allow returning dynamic data were considered, but they all had to deduct the gas from the call opcode and thus were both complicated to implement and specify ([5/8](https://github.com/ethereum/EIPs/issues/8)). Since this proposal is very similar to the way calldata is handled, it fits nicely into the concept. Furthermore, the eWASM architecture already handles return data in exactly the same way.

Note that the EVM implementation needs to keep the return data until the next call or the return from the current call. Since this resource was already paid for as part of the memory of the callee, it should not be a problem. Implementations may either choose to keep the full memory of the callee alive until the next call or copy only the return data to a special memory area.

Keeping the memory of the callee until the next call-like opcode does not increase the peak memory usage in the following sense: Any memory allocation in the caller's frame that happens after the return from the call can be moved before the call without a change in gas costs, but will add this allocation to the peak allocation.

The numeric values of the opcodes were allocated in the same nibble block that also contains `CALLDATASIZE` and `CALLDATACOPY`.

## Backwards Compatibility

This proposal introduces two new opcodes and stays fully backwards compatible apart from that.

## Test Cases

## Implementation

## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).