Optimistic Relay #285

Closed

michaelneuder
Collaborator

@michaelneuder michaelneuder commented Feb 19, 2023

NOTE: This is the amended version of the optimistic relay from #279. We made minor changes to the database schema and how we determine if a winning block was invalid in getPayload. These changes are reflected in the description below.

NOTE 2: We also wrote https://github.com/michaelneuder/opt-relay-docs/blob/main/proposal.md, which aims to motivate the utility of the optimistic relay. That document accompanies this PR and answers the What? & Why? of the project, while this PR answers How?

📝 Summary

tl;dr: This PR implements an "optimistic" version of Flashbots' mev-boost-relay. The core idea is that we can improve the throughput and latency of block submissions by using asynchronous processing. Instead of validating each incoming block immediately, we return early and mark the block as eligible to win the auction before checking its validity. This allows builders to submit more blocks and, critically, to submit blocks later in the slot. Later blocks will have more MEV and thus a higher probability of winning the block auction. All blocks sent to the relay are still validated, and only a single slot at a time is processed optimistically. In practice, we expect that the first few milliseconds of slot n will be spent handling the remaining optimistic blocks from slot n-1, which may have been received very close to the slot boundary. Proposers still use the Proposer API to select the highest bid, but there is no longer a guarantee that this block is valid (because we may not have had time to asynchronously validate it). To submit blocks in optimistic mode, a builder must put up collateral that is greater than the value of the blocks they are submitting. If a proposer ends up signing an invalid block, collateral from the builder of that block will be used to refund the proposer for the missed slot.

📚 References

This joint work from Justin Drake, AlphaMonad, and me is a revision of michaelneuder#2. Many thanks to Chris Hager, Alex Stokes, Mateusz Morusiewicz, and Builder0x69 for feedback and support thus far!

⛱ Motivation and Context

The changes can be described through 3 sequences:

  1. Submitting optimistic blocks.
  2. Validating optimistic blocks.
  3. Proposing optimistic blocks.

1. Submitting optimistic blocks.

Screen Shot 2023-02-19 at 12 36 23 PM

  1. Block builders submit blocks to the Builder API endpoint of the relay.
  2. Based on the collateral and status of the builder:
    a. if the builder collateral is greater than the value of the block and the builder is optimistic, run the block simulation optimistically in a different goroutine.
    b. the optimistic block processor adds 1 to the optBlock waitgroup, which is used to synchronize all the optimistic block validations happening concurrently during the slot.
    c. otherwise if the builder is highPrio send the block to the highPrio queue of the prio-load-balancer.
    d. else send the block to the lowPrio queue of the prio-load-balancer.
  3. For non-optimistic blocks, wait for the validation result.
  4. Update the builder's current bid in redis.

Notice that for builders with sufficient collateral, we update the bid without validating the incoming block (though we queue it for async processing). This is where the improved performance can be achieved.


2. Validating optimistic blocks.

Screen Shot 2023-02-19 at 12 47 48 PM

  1. The optimistic block processor sends the block as low-prio to the prio-load-balancer for simulation.
  2. The block is simulated on the validating nodes, and the status is returned to the optimistic block processor.
  3. If the simulation failed, the builder is demoted and the details of the failure are written to the database.
  4. The optBlock waitgroup is decremented by one, indicating that this goroutine has completed its tasks.

This flow handles the simulation of all the blocks that were optimistically skipped. An invalid block here results in a demotion, but not necessarily a refund for the proposer because we don't yet know whether this block was the winning bid and thus signed + proposed.


3. Proposing optimistic blocks.

Screen Shot 2023-02-19 at 12 55 00 PM

  1. mev-boost calls getHeader on the Proposer API of the relay. This is part of the MEV-Boost block proposal flow as documented in https://docs.flashbots.net/flashbots-mev-boost/architecture-overview/block-proposal.
  2. mev-boost calls getPayload on the Proposer API of the relay. This triggers the publication of a SignedBeaconBlock.
  3. The optBlock waitgroup is waited on. This ensures that there are no more optimistic blocks to be simulated for that slot.
  4. The proposer API checks the database for a demotion that matches the header of the winning block. If it is present, then the simulation of the winning block must have failed, and thus a refund is necessary.
  5. If a demotion is found, the proposer API updates the demotion table with the refund justification, which is the SignedBeaconBlock and SignedValidatorRegistration.

This flow represents the process of checking if our optimistic block processing ever results in a validator submitting an invalid block. Since block builders will post collateral, this will be used to reimburse the validator in that case. Since refunds should be a relatively rare event, we plan on handling them manually.


Misc notes

We only allow optimistic processing of one slot at a time. We use a waitgroup to ensure that all blocks from the previous slot have been processed before we update the optimistic slot. This processing may bleed into the subsequent slot, but that is OK because we are shifting that processing time away from the end of the previous slot, which is where the timing is much more critical for winning bids.

At the beginning of each slot we also cache the status and collateral of each builder. We access this cache during the block submission flow to (1) avoid repeatedly reading the same status + collateral for a builder and (2) avoid race conditions where the status + collateral of a builder changes over the course of a slot due to either a demotion or an internal API call.

Since builders may use many different public keys to submit blocks, we allow all of those keys to be backed by a single collateral through the use of a "Collateral ID". This is aimed at simplifying the process of posting collateral, but if any public key using that Collateral ID submits an invalid block, all of the public keys sharing it are demoted.
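
The shared-demotion rule can be sketched like this, assuming builders are held in a map keyed by public key. The `builder` type and `demoteByCollateralID` function are illustrative only.

```go
package main

import "fmt"

// builder is a hypothetical record for one builder public key.
type builder struct {
	collateralID string
	isOptimistic bool
}

// demoteByCollateralID disables optimistic mode for every public key that
// shares the offender's collateral ID and returns how many keys were demoted.
func demoteByCollateralID(builders map[string]*builder, offender string) int {
	b, ok := builders[offender]
	if !ok {
		return 0
	}
	demoted := 0
	for _, other := range builders {
		if other.collateralID == b.collateralID && other.isOptimistic {
			other.isOptimistic = false
			demoted++
		}
	}
	return demoted
}

func main() {
	builders := map[string]*builder{
		"0xaaa": {collateralID: "shared", isOptimistic: true},
		"0xbbb": {collateralID: "shared", isOptimistic: true},
		"0xccc": {collateralID: "other", isOptimistic: true},
	}
	fmt.Println(demoteByCollateralID(builders, "0xaaa"), builders["0xccc"].isOptimistic)
}
```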

We introduce a database migration that adds is_optimistic, collateral, and builder_id as columns to the block builder table and a number of timing columns to the block builder submission table for profiling the performance of the relay. We also introduce one new table for the demotions. Additionally, we introduce an internal API on the path /internal/v1/builder/collateral/{pubkey} to provide a convenient way of updating the collateral of a builder in the DB.

We are currently running this version of the relay on Goerli testnet and have confirmed that the flows described above work e2e. Additionally, we added a number of unit tests to exercise the new logic. See https://github.com/michaelneuder/opt-relay-docs/blob/main/proposal.md#learnings-from-goerli for more details on what we learned running this relay on Goerli.


✅ I have run these commands

  • make lint
  • make test-race (this gave a few errors, but i think it is likely because we are doing a lot of asynchronous processing now, not quite sure how to deal with this)
  • go mod tidy
  • I have seen and agree to CONTRIBUTING.md

@michaelneuder michaelneuder mentioned this pull request Feb 19, 2023
4 tasks
@michaelneuder
Collaborator Author

Copying the FAQ from Justin in #279 (comment)

Optimistic relay FAQ by Justin Drake


Does an optimistic relay simulate blocks?

Yes, an optimistic relay simulates all blocks that could have been forwarded to a proposer. Simulation just happens away from the latency-critical path.

Can optimistic relaying lead to mass missed slots?

Assuming no relay bug, the worst case is one missed slot per collateralised builder (prior to reactivation). The beacon chain is designed to handle missed slots.

Are builders incentivised to produce invalid blocks?

The builder of an invalid winning block suffers a financial loss. Moreover, a single invalid block will disable optimistic relaying for the builder, yielding a latency disadvantage pending reactivation.

What recommendations do you have for optimistic relay operators?

We have several recommendations for optimistic relay operators:

  • alerts – set up automatic alerts (e.g. email or phone call) for demotions
  • refunds – promptly transfer (e.g. within 24 hours) the bid value plus beacon chain penalties and missed rewards to the proposer fee recipient
  • max collateral – cap the maximum collateral amount per builder (e.g. to 10 ETH) to keep a level playing field
  • investigation – manually investigate demotions and ask builders to fix block building bugs before reactivating optimistic relaying
  • cool-off – impose a post-demotion cool-off period (e.g. 24 hours) before reactivating optimistic relaying
  • penalty – consider a fixed penalty (e.g. 0.1 ETH) per demotion, especially for repeat demotions

@cryptogakusei

cryptogakusei commented Feb 20, 2023

In the step 1 of submitting optimistic blocks, how is the value of the block determined for checking that the builder collateral is greater than the value of the block? Is it just the bid from the builder?

@metachris
Collaborator

For the record -- this change is going to be discussed at the mev-boost community call on Thursday: https://collective.flashbots.net/t/toward-an-open-research-and-development-process-for-mev-boost/464/19?u=ralexstokes

@michaelneuder
Collaborator Author

> In the step 1 of submitting optimistic blocks, how is the value of the block determined for checking that the builder collateral is greater than the value of the block? Is it just the bid from the builder?

hey! thanks for the q. yes, it is just the builder bid size. we store the builder collateral in our database and at the beginning of each slot, we update an in-memory cache with the amount of collateral that the builder has (by reading from the db). for each block they submit in that slot, we parse their submission and check the relative size of that bid versus the collateral. if the collateral exceeds the value of the bid, then we simulate optimistically.

@potuz

potuz commented Feb 24, 2023

This proposal would make it hard, if not impossible, to eventually implement inclusion lists on Mev-Boost, am I right?

@michaelneuder
Collaborator Author

> This proposal would make it hard, if not impossible, to eventually implement inclusion lists on Mev-Boost, am I right?

i don't believe so. the full execution body is downloaded by the relay before the bid is made eligible, thus the relay could check the transactions of the body to ensure that the inclusion list transactions are present.
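
a minimal sketch of such a presence check, assuming transactions are compared by hash (the function name and the hash representation are hypothetical). it only checks that the inclusion list transactions appear in the body; it does not execute them, so it says nothing about their validity.

```go
package main

import "fmt"

// missingFromBlock returns the inclusion-list transaction hashes that do not
// appear in the block body. This is a presence check only: no transaction is
// executed, so validity is not established here.
func missingFromBlock(inclusionList, blockTxHashes []string) []string {
	inBlock := make(map[string]struct{}, len(blockTxHashes))
	for _, h := range blockTxHashes {
		inBlock[h] = struct{}{}
	}
	var missing []string
	for _, h := range inclusionList {
		if _, ok := inBlock[h]; !ok {
			missing = append(missing, h)
		}
	}
	return missing
}

func main() {
	inclusionList := []string{"0x01", "0x02"}
	body := []string{"0x02", "0x03"}
	fmt.Println(missingFromBlock(inclusionList, body))
}
```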

@potuz

potuz commented Feb 25, 2023

> This proposal would make it hard, if not impossible, to eventually implement inclusion lists on Mev-Boost, am I right?

> i don't believe so. the full execution body is downloaded by the relay before the bid is made eligible, thus the relay could check the transactions of the body to ensure that the inclusion list transactions are present.

This doesn't work because the relayer has no way of knowing if the transactions are valid without executing them. So a block builder that just wants to censor those transactions can simply bid higher and claim that it included them, making the proposer lose the chance to include them. Thus inclusion lists become useless with this approach.

@michaelneuder
Collaborator Author

> This proposal would make it hard, if not impossible, to eventually implement inclusion lists on Mev-Boost, am I right?

> i don't believe so. the full execution body is downloaded by the relay before the bid is made eligible, thus the relay could check the transactions of the body to ensure that the inclusion list transactions are present.

just to make sure we are on the same page, let me describe the issue i think you are raising. i am using flashbots/mev-boost#215 as my reference point for what the proposed crlists in mev-boost would look like.

in the current implementation (no optimistic relaying) the flow would be:

  1. the proposer announces the inclusion list. this is a set of transactions that must be included in a block for them to accept it. (this proposal is for their own slot, but it could be modified to be for the next slot too, which is sometimes referred to as forward inclusion lists).
  2. the builders submit blocks to the relay that conform to the inclusion list.
  3. the relay confirms that the inclusion list is satisfied and the block is valid.
  4. the block wins the auction and is published by the relay.

with optimistic relaying:

  1. the proposer announces the inclusion list.
  2. the builders submit blocks to the relay and the relay marks them as active before verification.
  3. the block wins the auction and is published by the relay.

in that case, yes. it does seem possible for the builder to submit invalid transactions to trick the relay into thinking that it conforms to the inclusion list without actually doing so. however, the relay would still simulate the block and determine that the transactions were invalid, which results in the builder being demoted (no longer eligible to be processed optimistically). this would also result in the builder collateral being used to refund the proposer who missed their slot, thus the builder is financially punished for this behavior.

overall, we acknowledge that a misbehaving builder can force a missed slot (see https://github.com/michaelneuder/opt-relay-docs/blob/main/proposal.md), but since manual intervention is required for them to return to good-standing and they are damaged reputationally and financially as a result, we expect this to be a rare event. especially in this case where the builder is essentially admitting (cryptographically) to the world that they are trying to censor, this will strongly damage the builder reputation and only delay transaction inclusion by a single slot.

does that answer your question? thanks for bringing this up! it was very useful to consider it and we are keen to make sure we are building towards a future-compatible relay (the goal of the optimistic relay is to fore-run enshrined PBS, so this type of question is exactly what we hope to be answering).

@metachris
Collaborator

How about exposing through the data API whether a bid/payload was optimistic?

@potuz

potuz commented Mar 4, 2023

> How about exposing through the data API whether a bid/payload was optimistic?

This would help, we can default to not accept those bids.

@michaelneuder
Collaborator Author

> How about exposing through the data API whether a bid/payload was optimistic?

ya! i think this is a nice idea. we are also thinking about adding a public API so that a builder can check their own pubkey status.

> This would help, we can default to not accept those bids.

can you expand on this? accept in what context? as a validator?

@potuz

potuz commented Mar 5, 2023

> can you expand on this? accept in what context? as a validator?

As I mentioned above, the ability for a builder to prevent inclusion of an inclusion list, even for a single block, I believe is a deal breaker as it shifts trust assumptions from validators (or relayers in the case of not full PBS) to builders, thus invalidating any statistical analysis on censored transactions on chain. We can't prevent relayers from not checking the validity of the full execution block, but if the relayer provides the header with the info that it was optimistic we can default to local execution in that case unless the user explicitly passes a flag with a big warning

@metachris
Collaborator

There's another MEV-Boost community call later this week (Thursday, 4pm UTC): https://collective.flashbots.net/t/mev-boost-community-call-1-9-mar-2023/1367

@potuz I hope you'll come, can discuss more there too.

@michaelneuder
Collaborator Author

> As I mentioned above, the ability for a builder to prevent inclusion of an inclusion list, even for a single block, I believe is a deal breaker as it shifts trust assumptions from validators (or relayers in the case of not full PBS) to builders, thus invalidating any statistical analysis on censored transactions on chain. We can't prevent relayers from not checking the validity of the full execution block, but if the relayer provides the header with the info that it was optimistic we can default to local execution in that case unless the user explicitly passes a flag with a big warning

would you be willing to hop on a call to discuss this further? i sent you a friend request on discord :-)

@michaelneuder
Collaborator Author

> How about exposing through the data API whether a bid/payload was optimistic?

yes, this is a great idea! we are already recording whether or not a submission is processed optimistically so this is super easy to expose via the data API: michaelneuder@61a4b28.

this also allows for checking if a payload is optimistic, it just requires two steps: (1) calling getPayload to get the block hash, (2) using the block hash to get the specific submission and checking if it was optimistic.

IMO adding the plumbing to get the bool through to the payload data API is not worth it because (1) the optimistic property is semantically part of the submission, not really the getPayload call, (2) it would require an extra db read either when we call getPayload or when the data API is called. but happy to get push back on this if there is strong demand for getting the simulation status directly in the payload data API. :-)

@michaelneuder michaelneuder force-pushed the mikeneuder-20230219-1 branch from 374f2ec to 4e3ca5e Compare March 13, 2023 01:59
dependabot bot and others added 3 commits March 17, 2023 13:49
…#300)

Bumps [github.com/ethereum/go-ethereum](https://github.com/ethereum/go-ethereum) from 1.11.2 to 1.11.3.
- [Release notes](https://github.com/ethereum/go-ethereum/releases)
- [Commits](ethereum/go-ethereum@v1.11.2...v1.11.3)

---
updated-dependencies:
- dependency-name: github.com/ethereum/go-ethereum
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
collecting two extra timestamps in the block lifecycle
@codecov-commenter

codecov-commenter commented Mar 22, 2023

Codecov Report

Attention: Patch coverage is 44.81409% with 282 lines in your changes missing coverage. Please review.

Project coverage is 28.75%. Comparing base (6640511) to head (1ecb2f2).
Report is 232 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| services/api/service.go | 59.16% | 89 Missing and 9 partials ⚠️ |
| database/mockdb.go | 0.00% | 96 Missing ⚠️ |
| beaconclient/mock_multi_beacon_client.go | 0.00% | 39 Missing ⚠️ |
| common/test_utils.go | 25.00% | 18 Missing ⚠️ |
| database/database.go | 85.22% | 9 Missing and 4 partials ⚠️ |
| services/api/blocksim_ratelimiter.go | 0.00% | 11 Missing ⚠️ |
| database/typesconv.go | 0.00% | 3 Missing ⚠️ |
| common/common.go | 0.00% | 2 Missing ⚠️ |
| common/types.go | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #285       +/-   ##
===========================================
+ Coverage   18.34%   28.75%   +10.40%     
===========================================
  Files          20       22        +2     
  Lines        3559     3944      +385     
===========================================
+ Hits          653     1134      +481     
+ Misses       2826     2666      -160     
- Partials       80      144       +64     
Flag Coverage Δ
unittests 28.75% <44.81%> (+10.40%) ⬆️


@metachris metachris requested review from jtraglia and Ruteri March 25, 2023 10:20
@metachris
Collaborator

We'll likely merge this PR soon.

@PizBernina

hi, do we have any update on this?
Is the idea still that only the USM relay will initially have it?

@michaelneuder michaelneuder mentioned this pull request May 1, 2023
4 tasks
@metachris
Collaborator

Closing this in favor of #380

@metachris metachris closed this May 2, 2023
7 participants