Add hexary trie roots for lists in ExecutionPayloadHeader #3078

Closed
wants to merge 17 commits into from


etan-status
Contributor

The ExecutionPayloadHeader currently contains a SSZ merkle root for transactions_root / withdrawals_root. While this is fine for the purpose of tracking the latest_execution_payload_header as part of BeaconState, it introduces challenges when trying to extend the Engine API to support light client based EL client implementations.

By tracking the RLP hash for transactions and withdrawals as well, it would become possible to introduce an engine_newPayloadHeader API that allows passing the full EL block header, unlocking LES use cases.

@etan-status
Contributor Author

Note that this change leads to ExecutionPayload having more than 16 struct members, so all the SSZ generalized indices for merkle proofs within ExecutionPayload double their numeric value (but the indices are then stable until 32 members, as long as new members are always added to the end).

Maybe cleaner to just do the switch beyond 16 now instead of with EIP4844 / something else that would promote actually using those generalized indices (so far, they are not used I think).
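To make the index shift concrete, here is a minimal sketch assuming the standard SSZ layout, where a container's fields sit at the leaves of a tree padded to the next power of two. `field_gindex` is a hypothetical helper, not a spec function, and the field position 13 is only an illustrative value.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (with a minimum of 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def field_gindex(num_fields: int, field_index: int) -> int:
    """Generalized index of a field, relative to its container's root."""
    if not 0 <= field_index < num_fields:
        raise ValueError("field_index out of range")
    return next_power_of_two(num_fields) + field_index


# Illustrative field position; not taken from the spec.
POSITION = 13
before = field_gindex(15, POSITION)  # up to 16 members: leaves start at 16
after = field_gindex(17, POSITION)   # 17 to 32 members: leaves start at 32
print(before, after)
```

The leaf offset moves from 16 to 32 once a 17th member is added and then stays put up to 32 members, which is why appending fields within that range keeps existing indices stable.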

Comment on lines 157 to 160
transactions_hash: Bytes32 # [New in Capella]
withdrawals: List[Withdrawal, MAX_WITHDRAWALS_PER_PAYLOAD] # [New in Capella]
withdrawals_hash: Bytes32 # [New in Capella]
Member

we'd also need to extend the engine API to include validation of these values

Contributor Author

Yes, engine API is being extended as part of Capella anyway, so that shouldn't be a problem.

Contributor Author

Matching engine API PR: ethereum/execution-apis#318

@ralexstokes
Member

how much worse is it to just extend the light client types so that they have the execution payload header and a Merkle proof showing its correctness?

in general we don't want to be modifying or extending core types if the cost is only a few bytes

@etan-status
Contributor Author

transactions_hash is an RLP hash, and does not use SSZ merkle proof, so that alternative would consist of:

  • Add RLP hexary proof for EL txRoot within EL blockHash, and extend LC types with it
  • Produce those hexary proofs inside CL light client server (but: CL currently doesn't know anything about RLP, it offloads any RLP duties via engine API to the EL)
  • Verify those hexary proofs inside CL light client (but: it currently doesn't know anything about RLP or hexary proofs)

Simply passing the existing ExecutionPayloadHeader to LC only provides the SSZ merkle transactions_root, which is useless to pass to the EL, because the EL does not know about SSZ and cannot verify the SSZ merkle transactions_root against the rest of the execution payload.

Hence the idea to have the EL expose the transactions RLP hash via engine API, so it is part of the ExecutionPayload and ExecutionPayloadHeader as well. This way, a new engine_newPayloadHeader endpoint could be added that does not include the transaction bodies (saving up to ~2 MB per block).

@ralexstokes
Member

ralexstokes commented Nov 4, 2022

maybe I've lost the use case and so missing some critical constraint but can we not just include a Merkle proof from sync committee's block_root to state_root to execution_payload_header?

you can also continue the proof chain down into the header to prove any individual transaction if desired

the CL light client could provide these verified headers to the EL for the use case where we just want to feed execution data to the EL

@etan-status
Contributor Author

The use case is driving LES via engine API. LES expects full EL block headers, not just the state root; this way, no further network queries are necessary on the LES side (see ethereum/execution-apis#318 (comment)). The mode of operation could be a CL light client that tracks the latest BeaconBlockHeader + ExecutionPayloadHeader, then feeds the ExecutionPayloadHeader into the EL LES light client, which then looks at fields such as logs_bloom to determine whether a block is relevant, e.g., for a locally monitored wallet.

But, currently, there is no easy access to the EL block header, because the txRoot RLP hash is missing. It can be derived from the full ExecutionPayload, but not from the ExecutionPayloadHeader (as that one only includes the SSZ merkle transactions_root but not the RLP hexary hash).

@ralexstokes
Member

yeah I see -- it would be nice to just be able to provide the hash and a proof that it is in the correct place inside the preimage of the block hash, simply because each and every thing we add to the state/block has a huge cost around implementation, security, testing and long-term maintenance

I wonder what it would look like to make a SNARK or even STARK of this root -> hash equivalence...

@etan-status
Contributor Author

Who would be the entity providing the RLP hash in that situation though?

If it's the CLs providing the LC data, it would mean converting the ExecutionPayload to the corresponding EL RLP structure internally (different types, different endianness), followed by creating the hexary proof for it (CLs don't necessarily have libraries for that), then bundling that into LightClientUpdate. Alternatively, the CL could ask the EL for the full block header using eth_getBlockByHash just to obtain the transactionsRoot, but to create the hexary proof I presume it would still be necessary to convert the ExecutionPayload into the RLP format.

The alternative proposal here exposes the existing RLP transactionsRoot information that the EL already keeps track of as part of the EL block header via engine_getPayload, and extends engine_newPayload with a check that the provided RLP transactionsRoot matches the computed value from the transactions list. The CL would not have to touch anything about the RLP, if engine API says it's VALID it's valid.

Note also the matching behaviour for eth_getBlockByNumber – it also exposes transactionsRoot next to the transactions list.

As for SNARK/STARK, it already has to do that equivalence today, to validate:

  • payload_header.transactions_root == hash_tree_root(payload.transactions) (SSZ merkle root)
  • payload.block_hash == rlp_root(<other fields>..., rlp_root(payload.transactions)) (RLP root hash)
    The RLP transactions hash needs to be computed from the transactions list, and then verified against the block hash.
    What the proposal here adds in complexity is just an equivalence check, that that intermediate computed rlp_root(payload.transactions) result matches the provided payload.transactions_hash.

The added value is essentially that the EL block header is constructable from the CL ExecutionPayloadHeader. Currently, it is not, as the CL header uses a different hashing format.
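The extra equivalence check described above can be sketched as follows. `PayloadSketch`, `transactions_hash_matches`, and `placeholder_root` are hypothetical names; in particular, `placeholder_root` is a SHA-256 stand-in used only so the example runs, not the hexary trie root that an EL would actually compute.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PayloadSketch:
    transactions: List[bytes]  # opaque encoded transactions
    transactions_hash: bytes   # root supplied alongside the list


def placeholder_root(items: List[bytes]) -> bytes:
    """Stand-in for the hexary trie root; NOT the real algorithm."""
    digest = hashlib.sha256()
    for item in items:
        digest.update(len(item).to_bytes(4, "big"))
        digest.update(item)
    return digest.digest()


def transactions_hash_matches(
    payload: PayloadSketch,
    compute_root: Callable[[List[bytes]], bytes],
) -> bool:
    """The added check: the recomputed root must equal the supplied one."""
    return compute_root(payload.transactions) == payload.transactions_hash


txs = [b"\x01tx-a", b"\x02tx-b"]
good = PayloadSketch(txs, placeholder_root(txs))
bad = PayloadSketch(txs, bytes(32))
print(transactions_hash_matches(good, placeholder_root))
print(transactions_hash_matches(bad, placeholder_root))
```

Since the root has to be computed anyway to validate the block hash, the only new work in this sketch is the final comparison.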

@etan-status
Contributor Author

Updated the matching engine API PR with the corresponding change, in case this PR is adopted:
ethereum/execution-apis#318

The validation of the RLP transactionsRoot and RLP withdrawalsRoot would be covered by the existing requirement:

  1. Client software MUST validate blockHash value as being equivalent to Keccak256(RLP(ExecutionBlockHeader)), where ExecutionBlockHeader is the execution layer block header (the former PoW block header structure). Fields of this object are set to the corresponding payload values and constant values according to the Block structure section of EIP-3675, extended with the corresponding section of EIP-4399. Client software MUST run this validation in all cases even if this branch or any other branches of the block tree are in an active sync process.

EIP-4895 shows that the corresponding RLP txs_root and RLP withdrawals_root are already known by the EL and are not an extra concept:

execution_payload_header_rlp = RLP([
  parent_hash,
  0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347, # ommers hash
  coinbase,
  state_root,
  txs_root,
  receipts_root,
  logs_bloom,
  0, # difficulty
  number,
  gas_limit,
  gas_used,
  timestamp,
  extradata,
  prev_randao,
  0x0000000000000000, # nonce
  base_fee_per_gas,
  withdrawals_root,
])
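For readers less familiar with RLP, a minimal encoder sketch follows, written from the general RLP rules rather than from any client's code. `rlp_encode` is a hypothetical helper covering only byte strings, non-negative integers, and lists; it shows how entries in the header list above, such as the zero difficulty or a 32-byte root, end up as bytes.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """Prefix for a string (offset 0x80) or list (offset 0xC0) payload."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode bytes, non-negative ints, or (nested) lists of those."""
    if isinstance(item, int):
        if item < 0:
            raise ValueError("RLP integers must be non-negative")
        # Integers are minimal big-endian byte strings; 0 becomes b"".
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, (bytes, bytearray)):
        data = bytes(item)
        if len(data) == 1 and data[0] < 0x80:
            return data  # a single low byte is its own encoding
        return _length_prefix(len(data), 0x80) + data
    if isinstance(item, (list, tuple)):
        payload = b"".join(rlp_encode(child) for child in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError(f"unsupported type: {type(item).__name__}")


print(rlp_encode(0).hex())           # the zero difficulty field
print(rlp_encode(bytes(32)).hex())   # a 32-byte root (all zeros here)
print(rlp_encode([b"cat", b"dog"]).hex())
```

The block hash is then Keccak-256 over the encoding of the whole list; Keccak is left out here because it is not part of the Python standard library.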

@g11tech
Contributor

g11tech commented Nov 14, 2022

+1 for having the transactions RLP hash + withdrawals RLP hash as part of the payload header, as LES could be totally driven by the light-client updates

The alternative is for the light client update provider to construct them, but this approach is more protocol-native

@etan-status
Contributor Author

Yep, having provider compute them would also be possible, but would require adding RLP hashing capabilities to the "light client update provider" (including CL full nodes), as well as into all light clients (even non-LES ones, otherwise they can't participate in libp2p gossip as they can't validate the gossip).

@hwwhww added the Capella label on Nov 16, 2022
@mkalinin
Collaborator

After a conversation with @etan-status in Discord, we identified two goals that this proposal aims at:

  1. Allow for LC to validate that ExecutionPayloadHeader data committed to a given blockHash
  2. Feed LES protocol with RLP transactionRoot value

As for (1), I agree with @ralexstokes that the LC protocol may provide a proof that the beacon block (state) root commits to a certain ExecutionPayloadHeader. The absence of blockHash verification shouldn't reduce the security level of the protocol, as full clients are responsible for running this check.

For (2), does the LES protocol require the transactionRoot value to follow the head of the chain? For instance, LES driven by the CL doesn't need to verify the Ethash seal anymore and could probably accept an ExecutionPayloadHeader based on the trust assumption between CL and EL. In other words, my main question is: how heavily does LES rely on the full header?

@etan-status
Contributor Author

re (2), @zsfelfoldi

@etan-status
Contributor Author

BTW, also needs Patricia Trie support to produce the additional hashes (in the BN).
At least, a simplified version that can deal with the sequential indices (insertion order can be optimized).

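def compute_trie_root_from_indexed_data(data):
    # Sketch assuming py-trie's HexaryTrie and pyrlp's encode / big_endian_int
    trie = HexaryTrie(db={})
    for i, obj in enumerate(data):
        trie.set(encode(i, big_endian_int), obj)
    return trie.root_hash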

execution_payload_header.withdrawals_root = \
    compute_trie_root_from_indexed_data(execution_payload.withdrawals)

(and, similar for transactions)
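One property relevant to insertion order: the trie keys are `rlp(index)`, and their byte order differs from numeric order. The sketch below assumes the usual RLP integer encoding; `rlp_index_key` is a hypothetical helper, and whether this is the optimization meant above is an assumption on my side.

```python
def rlp_index_key(index: int) -> bytes:
    """RLP encoding of a non-negative list index, as used for trie keys."""
    if index < 0:
        raise ValueError("index must be non-negative")
    raw = index.to_bytes((index.bit_length() + 7) // 8, "big")
    if len(raw) == 1 and raw[0] < 0x80:
        return raw  # 1..127 encode as a single byte
    # Short-string form; 0 has an empty payload and becomes 0x80.
    return bytes([0x80 + len(raw)]) + raw


# Sort a handful of indices by their encoded key bytes.
indices = [0, 1, 2, 127, 128, 129]
by_key = sorted(indices, key=rlp_index_key)
for i in by_key:
    print(i, rlp_index_key(i).hex())
```

Index 0 encodes as `0x80`, so in key order it comes after indices 1 to 127 and before 128; an implementation that inserts in key order would have to account for that.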

@etan-status
Contributor Author

etan-status commented Nov 22, 2022

Alternate design: trie roots NOT part of ExecutionPayloadHeader

LightClientHeader:
  beacon: BeaconBlockHeader            # Signed by sync committee
  execution: ExecutionPayloadHeader
  execution_branch: array[N, Root]     # Computed by CL serving data
  transactions_trie_root: Hash32       # Computed by CL serving data, not covered by merkle proof
  withdrawals_trie_root: Hash32        # Computed by CL serving data, not covered by merkle proof

CL light client would receive this object.

To validate:

  1. Check that beacon signature is valid (obtained separately)
  2. Check that execution is valid (using execution_branch SSZ merkle proof)
  3. Check that transactions_trie_root and withdrawals_trie_root are valid (by checking them against execution.block_hash)

The EL needs to compute transactions_trie_root and withdrawals_trie_root as part of engine_newPayload in any case. In the PR proposed design, the two hashes would be exposed in engine_getPayload and bundled as opaque data with the ExecutionPayload for storage in BeaconBlock and BeaconState. engine_newPayload would be extended with a check to ensure that those hashes match the rest of the payload data.

In the alternate design here, the transactions_trie_root and withdrawals_trie_root are NOT included as part of BeaconBlock and BeaconState. Instead, the light client data server (the CL full node) would compute them on demand.

This replicates some of the EL work as part of the CL and LC logic, but at least keeps BeaconBlock and BeaconState clean, for a different benefit/cost profile.

For the CL, added requirements:

  • Root hash computation of patriciaTrie(rlp(Index) => Data) for a data array.
  • RLP encoding for Withdrawals.

For the LC, added requirements:

  • RLP encoding for ExecutionBlockHeader.
  • Keccak hashing.

Production (CL)

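# Assumed imports (eth-hash, pyrlp, py-trie); the original comment does not list them
from eth_hash.auto import keccak
from rlp import encode
from rlp.sedes import Binary, List, big_endian_int
from trie import HexaryTrie


def compute_trie_root_from_indexed_data(data):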
    """
    Computes the root hash of `patriciaTrie(rlp(Index) => Data)` for a data array.
    """
    t = HexaryTrie(db={})
    for i, obj in enumerate(data):
        k = encode(i, big_endian_int)
        t.set(k, obj)
    return t.root_hash


def get_withdrawal_rlp(withdrawal):
    withdrawal_rlp = [
        # index
        (big_endian_int, withdrawal.index),
        # validator_index
        (big_endian_int, withdrawal.validator_index),
        # address
        (Binary(20, 20), withdrawal.address),
        # amount
        (big_endian_int, uint256(withdrawal.amount) * (10**9)),
    ]

    sedes = List([schema for schema, _ in withdrawal_rlp])
    values = [value for _, value in withdrawal_rlp]
    return encode(values, sedes)


def create_light_client_header(block):
    beacon = BeaconBlockHeader(
        slot=block.message.slot,
        proposer_index=block.message.proposer_index,
        parent_root=block.message.parent_root,
        state_root=block.message.state_root,
        body_root=hash_tree_root(block.message.body),
    )

    payload = block.message.body.execution_payload
    execution = ExecutionPayloadHeader(
        parent_hash=payload.parent_hash,
        fee_recipient=payload.fee_recipient,
        state_root=payload.state_root,
        receipts_root=payload.receipts_root,
        logs_bloom=payload.logs_bloom,
        prev_randao=payload.prev_randao,
        block_number=payload.block_number,
        gas_limit=payload.gas_limit,
        gas_used=payload.gas_used,
        timestamp=payload.timestamp,
        extra_data=payload.extra_data,
        base_fee_per_gas=payload.base_fee_per_gas,
        block_hash=payload.block_hash,
        transactions_root=hash_tree_root(payload.transactions),
        withdrawals_root=hash_tree_root(payload.withdrawals),
    )
    execution_branch = compute_merkle_proof_for_block_body(
        block.message.body, 
        EXECUTION_PAYLOAD_INDEX,
    )

    transactions_trie_root = compute_trie_root_from_indexed_data(payload.transactions)

    withdrawals_encoded = [get_withdrawal_rlp(withdrawal) for withdrawal in payload.withdrawals]
    withdrawals_trie_root = compute_trie_root_from_indexed_data(withdrawals_encoded)

    return LightClientHeader(
        beacon=beacon,
        execution=execution,
        execution_branch=execution_branch,
        transactions_trie_root=transactions_trie_root,
        withdrawals_trie_root=withdrawals_trie_root,
    )
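The `uint256` cast in `get_withdrawal_rlp` matters because the amount in Wei can exceed 64 bits even when the Gwei amount does not. A small sketch with plain Python integers, assuming a 32 ETH withdrawal as an illustrative amount; `gwei_to_wei` is a hypothetical helper.

```python
GWEI_TO_WEI = 10**9
UINT64_MAX = 2**64 - 1


def gwei_to_wei(amount_gwei: int) -> int:
    """Convert a Gwei amount (CL unit) to Wei (EL unit)."""
    return amount_gwei * GWEI_TO_WEI


amount_gwei = 32 * 10**9  # 32 ETH expressed in Gwei (illustrative)
amount_wei = gwei_to_wei(amount_gwei)

print(amount_gwei <= UINT64_MAX)  # the Gwei amount fits in a uint64
print(amount_wei <= UINT64_MAX)   # the Wei amount does not
print((amount_wei.bit_length() + 7) // 8)  # bytes needed as big-endian int
```

Multiplying in a 64-bit type would overflow here, so widening before the multiplication is the safe order of operations.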

The light client would then validate that the obtained transactions_trie_root and withdrawals_trie_root are consistent w.r.t. the rest of the obtained data, by checking that:

def is_light_client_header_valid(header):  # signature check out of scope
    # Validate that `header.execution` corresponds to `header.beacon`
    if not is_valid_merkle_branch(
        leaf=hash_tree_root(header.execution),
        branch=header.execution_branch,
        depth=floorlog2(EXECUTION_PAYLOAD_INDEX),
        index=get_subtree_index(EXECUTION_PAYLOAD_INDEX),
        root=header.beacon.body_root,
    ):
        return False

    # Validate that `header.transactions_trie_root` and `header.withdrawals_trie_root` 
    # correspond to `header.execution`
    payload_header = header.execution
    execution_payload_header_rlp = [
        # parent_hash
        (Binary(32, 32), payload_header.parent_hash),
        # ommers_hash
        (Binary(32, 32), bytes.fromhex("1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347")),
        # coinbase
        (Binary(20, 20), payload_header.fee_recipient),
        # state_root
        (Binary(32, 32), payload_header.state_root),
        # txs_root
        (Binary(32, 32), header.transactions_trie_root),
        # receipts_root
        (Binary(32, 32), payload_header.receipts_root),
        # logs_bloom
        (Binary(256, 256), payload_header.logs_bloom),
        # difficulty
        (big_endian_int, 0),
        # number
        (big_endian_int, payload_header.block_number),
        # gas_limit
        (big_endian_int, payload_header.gas_limit),
        # gas_used
        (big_endian_int, payload_header.gas_used),
        # timestamp
        (big_endian_int, payload_header.timestamp),
        # extradata
        (Binary(0, 32), payload_header.extra_data),
        # prev_randao
        (Binary(32, 32), payload_header.prev_randao),
        # nonce
        (Binary(8, 8), bytes.fromhex("0000000000000000")),
        # base_fee_per_gas
        (big_endian_int, payload_header.base_fee_per_gas),
        # withdrawals_root
        (Binary(32, 32), header.withdrawals_trie_root),
    ]

    sedes = List([schema for schema, _ in execution_payload_header_rlp])
    values = [value for _, value in execution_payload_header_rlp]
    encoded = encode(values, sedes)

    computed_block_hash = Hash32(keccak(encoded))
    return computed_block_hash == payload_header.block_hash
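The `is_valid_merkle_branch` call in the first half of the check can be exercised in isolation. The sketch below follows the shape of that helper as I understand it from the consensus specs, using SHA-256, and runs it on a toy 16-leaf tree. `build_layers` and `build_branch` are hypothetical helpers, and the generalized index 25 is an assumed example value rather than a quoted constant.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def floorlog2(x: int) -> int:
    return x.bit_length() - 1


def get_subtree_index(gindex: int) -> int:
    return gindex % (1 << floorlog2(gindex))


def is_valid_merkle_branch(leaf, branch, depth, index, root) -> bool:
    """Hash from the leaf up to the root, picking sides by the index bits."""
    value = leaf
    for i in range(depth):
        if (index >> i) & 1:
            value = sha256(branch[i] + value)  # current node is a right child
        else:
            value = sha256(value + branch[i])  # current node is a left child
    return value == root


def build_layers(leaves: List[bytes]) -> List[List[bytes]]:
    """All tree layers, leaves first, for a power-of-two number of leaves."""
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([sha256(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers


def build_branch(layers: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes from the leaf layer up to just below the root."""
    branch = []
    for layer in layers[:-1]:
        branch.append(layer[index ^ 1])
        index //= 2
    return branch


EXAMPLE_GINDEX = 25  # assumed example value: leaf 9 in a 16-leaf tree
leaves = [sha256(bytes([i])) for i in range(16)]
layers = build_layers(leaves)
root = layers[-1][0]
leaf_index = get_subtree_index(EXAMPLE_GINDEX)
branch = build_branch(layers, leaf_index)

print(is_valid_merkle_branch(
    leaves[leaf_index], branch, floorlog2(EXAMPLE_GINDEX), leaf_index, root))
```

Swapping in a different leaf or a tampered branch entry makes the check fail, which is what lets the light client trust `header.execution` once `header.beacon` is verified.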

See also: #3126 (file: helpers/execution_payload.py)

@etan-status
Contributor Author

Alternate design above could interfere with split block storage, in which case the CL is no longer able to reconstruct the hexary tries needed for transactions_trie_root / withdrawals_trie_root.

Overall, it would arguably be better to avoid conflicting serialization strategies across CL and EL. Especially for withdrawals, which are a completely new addition, I don't see arguments in favor of using RLP / hexary tries / Wei instead of Gwei, and have commented here: https://ethereum-magicians.org/t/eip-4895-beacon-chain-withdrawals-as-system-level-operations/8568/28

As for transactions, given their history, it may make sense to just include the hexary trie root into the ExecutionPayloadHeader as a one-off exception (the rest is already consistent / the CL stores transactions in RLP format as well).

Because of aforementioned split block storage aspects, it may also become prohibitive to require CL to serve proofs about tx / withdrawal inclusion, so the SSZ roots may be necessary in EL world anyway.

@etan-status changed the title from "Add RLP hashes for lists in ExecutionPayloadHeader" to "Add hexary trie roots for lists in ExecutionPayloadHeader" on Nov 28, 2022
@etan-status
Contributor Author

hexssz.pdf

@etan-status
Contributor Author

elroots.pdf

@etan-status
Contributor Author

Superseded by
ethereum/execution-apis#354
ethereum/EIPs#6325

Once EL block header is changed to use SSZ format for withdrawals_root and txs_root, this will automagically be consistent without any CL changes.
