
Cross-call EVM memory management optimization #481

Open · chfast opened this issue Jun 21, 2022 · 10 comments
@chfast (Collaborator) commented Jun 21, 2022

Let's consider the memory "flow" for a caller running at depth d+0 calling a callee at depth d+1.

Present

This is the memory buffer workflow with the maximum number of copies. It is also the current practice in EVMC/evmone.

  1. The caller empties the return data buffer R0 (its lifetime ends here).
  2. The caller calls the callee.
  3. The callee allocates its EVM memory M1.
  4. The callee returns output O1 (a copy of a slice of M1).
  5. The callee deallocates M1.
  6. The caller copies O1 to R0.
  7. The caller deallocates O1.

Additionally, if the CALL output arguments are used, the caller must copy a part of R0 to the dedicated place in its memory. There is not much we can do about this copy.

The current EVMC may eliminate the M1→O1 copy or the O1→R0 copy. Both are shown in the sketch below.
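A minimal sketch of this workflow (illustrative names only, not the actual EVMC API), assuming std::vector as the backing storage:

#include <cstddef>
#include <cstdint>
#include <vector>

// Steps refer to the numbered list above.
std::vector<uint8_t> callee()
{
    std::vector<uint8_t> m1;                  // 3. callee allocates its EVM memory M1
    m1.resize(64);                            //    ... executes, expanding M1 as it goes ...
    size_t offset = 0, size = 32;             //    RETURN(offset, size)
    return {m1.begin() + offset,
            m1.begin() + offset + size};      // 4. O1: a copy of a slice of M1
}                                             // 5. M1 is deallocated here

void caller(std::vector<uint8_t>& r0)
{
    r0.clear();                               // 1. empty the return data buffer R0
    {
        auto o1 = callee();                   // 2. the call itself
        r0.assign(o1.begin(), o1.end());      // 6. copy O1 into R0
    }                                         // 7. O1 is deallocated here
}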

Future

What we want is:

struct memory_buffer
{
    uint8_t* ptr;            // start of the allocation
    size_t size;             // logical EVM memory size (what MSIZE reports)
    size_t capacity;         // allocated bytes; size <= capacity
    uint8_t* return_data;    // after a call: points into [ptr, ptr + size)
    size_t return_data_size;

    void expand(size_t new_size);  // grow size, reallocating only if capacity is exceeded
};

It can represent EVM memory and return data at the same time, and it can work like this:

  1. The caller "empties" return data buffer R0.
  2. The caller calls the callee and passes it R0.
  3. The callee uses R0 as its EVM memory, expanding its capacity if needed.
  4. The callee returns R0 with the reference to the return data in it.
  5. The caller uses R0 as return data buffer.

The same buffer R0 is passed to every call or create the caller makes. It can grow up to the limit imposed by the quadratic memory cost (see below).
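A minimal sketch of this flow, assuming a hypothetical callee_execute() that runs the callee's code against the buffer:

void callee_execute(memory_buffer& mem);  // hypothetical interpreter entry point

// Steps refer to the numbered list above.
void do_call(memory_buffer& r0)
{
    r0.size = 0;                  // 1. "empty" R0: the logical size resets,
    r0.return_data = nullptr;     //    the allocated capacity is kept
    r0.return_data_size = 0;
    callee_execute(r0);           // 2.-3. the callee uses r0 as its EVM memory,
                                  //       calling r0.expand() when it grows;
                                  // 4. on RETURN it sets r0.return_data to point
                                  //    into [r0.ptr, r0.ptr + r0.size)
    // 5. the caller reads the output via r0.return_data / r0.return_data_size
}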

Additionally, we can create an intrusive list of memory buffers to preserve more than a single buffer for "deeper" calls (a possible shape is sketched below).
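One possible shape of that list (an assumption, not part of the proposal): each buffer links to the buffer owned for the next call depth, so buffers allocated for deep calls survive and can be reused:

struct memory_buffer_node : memory_buffer
{
    memory_buffer_node* next = nullptr;  // buffer for depth d+1, if any
};

memory_buffer_node& buffer_for_next_depth(memory_buffer_node& current)
{
    if (current.next == nullptr)
        current.next = new memory_buffer_node{};  // allocated on first use...
    return *current.next;                         // ...then reused by later calls
}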

Memory size analysis

  • With a 30M gas limit you can allocate ~4 MB of memory at depth 0 (see the estimate below).
  • But for higher depths the 63/64 rule applies, so the memory size limit will be smaller.
  • Someone should model this, but keeping the chain of all buffers seems risky.
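For reference, a rough check of the ~4 MB figure from the Yellow Paper memory cost, where w is the memory size in 32-byte words:

C(w) = 3w + floor(w^2 / 512)

The quadratic term dominates, so spending the whole 30M gas on memory gives

w ≈ sqrt(30,000,000 × 512) ≈ 124,000 words ≈ 3.97 MB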

Notes

  1. The output from a successful CREATE does not go to the return data buffer. The caller must remember to "empty" the buffer received from the callee in this case.
  2. The callee cannot use the caller's memory spare capacity (shared buffer). The proof is trivial and therefore left for the reader.
  3. This should play nicely with precompiled contracts too. E.g. identity becomes an optional expand and a single copy.
@gcolvin commented Jun 25, 2022

Nice! I'm not clear whether you are suggesting a change to the protocol, or only an optimization of EVMC.

Someone should model this, but keeping the chain of all buffers seems risky.

What is the nature of the risk?

@gumb0 (Member) commented Jun 27, 2022

I'm also confused: if the same memory_buffer is passed to the callee, then the callee's memory may start as non-empty, so this is a change to the protocol, it seems?

@chfast (Collaborator, Author) commented Jun 27, 2022

I'm also confused: if the same memory_buffer is passed to the callee, then the callee's memory may start as non-empty, so this is a change to the protocol, it seems?

The buffer is passed to the callee as pre-allocated memory capacity, but its logical size remains 0 at the start.
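In other words (a sketch of the invariant, not a protocol change): capacity carries over, observable state does not, so e.g. MSIZE in the callee still reports 0:

void on_enter_callee(memory_buffer& buf)
{
    buf.size = 0;  // the callee's EVM memory starts logically empty;
                   // buf.ptr / buf.capacity are kept, so no new allocation
}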

@chfast (Collaborator, Author) commented Jun 27, 2022

Someone should model this, but keeping the chain of all buffers seems risky.

What is the nature of the risk?

The rough estimate is that you will have a chain of 1024 memory buffers (one per call depth level), and at each level someone can bump the buffer size to ~1 MB. In total that is ~1 GB of memory.

@rakita (Collaborator) commented Jul 6, 2022

This looks like a good optimization; sharing the same memory blob between calls seems natural.
A list of memory buffers is probably even better, at the cost of additional complexity, but it sounds plausible.

@gumb0 (Member) commented Jul 6, 2022

The rough estimate is that you will have a chain of 1024 memory buffers (one per call depth level), and at each level someone can bump the buffer size to ~1 MB. In total that is ~1 GB of memory.

Still confused. Is it one memory_buffer per call frame or one memory_buffer for the entire execution?

@chfast (Collaborator, Author) commented Jul 6, 2022

It is one memory_buffer per call frame, but it can be reused for the next call at the same depth. E.g. when A calls B1 and then A calls B2, A owns a single memory buffer for its callees and uses it for both calls (see the sketch below).
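A sketch of that reuse; call_B1 / call_B2 are hypothetical helpers that execute the callees:

void call_B1(memory_buffer& mem);  // hypothetical
void call_B2(memory_buffer& mem);  // hypothetical

void contract_A(memory_buffer& callee_buf)
{
    callee_buf.size = 0;
    call_B1(callee_buf);   // B1 uses callee_buf as its EVM memory
    // ... A consumes B1's return data from callee_buf ...
    callee_buf.size = 0;   // B1's return data lifetime ends here
    call_B2(callee_buf);   // B2 reuses the same, possibly already grown, buffer
}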

@rakita (Collaborator) commented Jul 8, 2022

I got a very different idea when reading this the first time. Now I see that this proposal is related to return data allocations: "It can represent EVM memory and return data at the same time."

The idea that I got is to share EVM memory between calls.

[BufferBlobs diagram]

In essence, manually implementing expansion of EVM memory across the whole call stack.

@chfast (Collaborator, Author) commented Jul 8, 2022

The idea that I got is to share EVM memory between calls.

You can make it work, but I don't think this will be efficient for EVM memory. However, you can use it for the EVM stack.
I mentioned this model in:

The callee cannot use the caller's memory spare capacity (shared buffer). The proof is trivial and therefore left for the reader.

The explanation: when you return from the C2 context to C1, C2's return data is still in C2's part of the memory buffer. C1 has to copy it, because otherwise there is a risk of the data being overwritten by C1's memory expansion. This is illustrated below.
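An illustration of the hazard (the layout is hypothetical):

// One shared buffer, C1 at depth d, C2 at depth d+1:
//
//   [ C1 memory ............. | C2 memory ....... [C2 return data] ]
//                             ^-- C1's memory ends here
//
// After C2 returns, its return data still lives in the C2 region.
// If C1 then expands its own memory (e.g. an MSTORE past its old end),
// the expansion grows into the C2 region and can overwrite the return
// data before C1 has copied it out with RETURNDATACOPY.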

@chfast (Collaborator, Author) commented Apr 4, 2023

@recmo created a good memory upper-bound analysis for a transaction: https://2π.com/22/eth-max-mem/

I think avoiding the returndata copy is not practical. Therefore, the model presented by @rakita should work well if we apply these improvements:

  • At the beginning of the transaction, compute the upper bound of memory usage and expand the memory buffer to this size. Later, for subcalls, it is guaranteed that enough memory is allocated, so you don't even have to check.
  • You can mix in stack space in the buffer as well. A raw estimate is 32 MB of additional space (1024 frames × 1024 stack items × 32 B). See the sketch after this list.
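A sketch of that preallocation; max_memory_bound() is a hypothetical helper implementing the linked upper-bound analysis:

size_t max_memory_bound(int64_t gas_limit);  // hypothetical

void begin_transaction(memory_buffer& buf, int64_t gas_limit)
{
    constexpr size_t stack_space = 1024 * 1024 * 32;         // 1024 frames x
                                                             // 1024 items x 32 B
    buf.expand(max_memory_bound(gas_limit) + stack_space);   // one upfront
    // allocation; subcalls never have to re-check or reallocate
}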
