-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing
2 changed files
with
321 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,165 @@ | ||
--- | ||
Check failure on line 1 in EIPS/eip-7495.md GitHub Actions / Run
Check failure on line 1 in EIPS/eip-7495.md GitHub Actions / Run
Check failure on line 1 in EIPS/eip-7495.md GitHub Actions / Run
Check failure on line 1 in EIPS/eip-7495.md GitHub Actions / Run
|
||
eip: 7495 | ||
title: SSZ PartialContainer | ||
description: New SSZ type to represent a flexible container with stable serialization and merkleization | ||
author: Etan Kissling (@etan-status) | ||
discussions-to: https://ethereum-magicians.org/t/eip-7495-ssz-partialcontainer/15476 | ||
status: Draft | ||
type: Standards Track | ||
category: Core | ||
created: 2023-08-18 | ||
--- | ||
|
||
## Abstract | ||
|
||
This EIP introduces a new [Simple Serialize (SSZ) type](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/simple-serialize.md) to represent `PartialContainer[T, N]` values. | ||
|
||
A `PartialContainer[T, N]` is an SSZ `Container` with stable serialization and merkleization even when individual fields become optional or new fields are introduced in the future. | ||
|
||
## Motivation | ||
|
||
Partial containers are currently not representable in SSZ. Adding support provides these benefits: | ||
|
||
1. **Stable signatures:** Signing roots derived from a `PartialContainer[T, N]` never change. In the context of Ethereum, this is useful for transaction signatures that are expected to remain valid even when future updates introduce additional transaction fields. Likewise, the overall transaction root remains stable and can be used as a perpetual transaction ID. | ||
|
||
2. **Stable merkle proofs:** Merkle proof verifiers that check specific fields of a `PartialContainer[T, N]` do not need continuous updating when future updates introduce additional fields. Common fields always merkleize at the same [generalized indices](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/merkle-proofs.md). | ||
|
||
3. **Compact serialization:** SSZ serialization is compact; inactive fields do not consume space. | ||
|
||
## Specification | ||
|
||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 and RFC 8174. | ||
|
||
### Type definition | ||
|
||
`PartialContainer[T, N]` defines a wrapper around an SSZ `Container` type indicated by `T`. `N` indicates the maximum number of fields to which `T` can grow in the future. `N` MUST be `> 0`. | ||
|
||
When wrapped in a `PartialContainer[T, N]`, `T` MAY define fields of type `Optional[E]`. Such fields can either represent a value of SSZ type `E`, or absence of a value (indicated by `None`). The [default value](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/simple-serialize.md#default-values) of an `Optional[E]` is `None`. | ||
|
||
For the purpose of serialization, `PartialContainer[T, N]` is always considered ["variable-size"](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/simple-serialize.md#variable-size-and-fixed-size) regardless of `T`. | ||
|
||
### Stability guarantees | ||
|
||
The serialization and merkleization of a `PartialContainer[T, N]` remains stable as long as: | ||
|
||
- The maximum capacity `N` does not change | ||
- The order of fields within `T` does not change | ||
- New fields are always added to the end of `T` | ||
- Required fields of `T` remain required `E`, or become an `Optional[E]` | ||
- Optional fields of `T` remain `Optional[E]`, or become a required `E` | ||
|
||
When an optional field becomes required, existing messages still have stable serialization and merkleization, but will be rejected on deserialization if not present. | ||
|
||
### Serialization | ||
|
||
Serialization of `PartialContainer[T, N]` is defined similarly to the [existing logic](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/simple-serialize.md#vectors-containers-lists) for `Container`. Notable changes are: | ||
|
||
- A [`Bitvector[N]`](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/simple-serialize.md#composite-types) is constructed, indicating active fields within the `PartialContainer[T, N]`. For required fields `E` and optional fields `Optional[E]` with a present value (not `None`), a `True` bit is included. For optional fields `Optional[E]` with a `None` value, a `False` bit is included. The `Bitvector[N]` is padded with `False` bits up through length `N` | ||
- Only active fields are serialized, i.e., fields with a corresponding `True` bit in the `Bitvector[N]` | ||
- The serialization of the `Bitvector[N]` is prepended to the serialized active fields | ||
- If variable-length fields are serialized, their offsets are relative to the start of serialized active fields, after the `Bitvector[N]` | ||
|
||
```python | ||
def is_active_field(element): | ||
return not is_optional(element) or element is not None | ||
|
||
# Determine active fields | ||
active_fields = Bitvector[N](([is_active_field(element) for element in value] + [False] * N)[:N]) | ||
partial_value = [element for element in value if is_active_field(element)] | ||
|
||
# Recursively serialize | ||
fixed_parts = [serialize(element) if not is_variable_size(element) else None for element in partial_value] | ||
variable_parts = [serialize(element) if is_variable_size(element) else b"" for element in partial_value] | ||
|
||
# Compute and check lengths | ||
fixed_lengths = [len(part) if part != None else BYTES_PER_LENGTH_OFFSET for part in fixed_parts] | ||
variable_lengths = [len(part) for part in variable_parts] | ||
assert sum(fixed_lengths + variable_lengths) < 2**(BYTES_PER_LENGTH_OFFSET * BITS_PER_BYTE) | ||
|
||
# Interleave offsets of variable-size parts with fixed-size parts | ||
variable_offsets = [serialize(uint32(sum(fixed_lengths + variable_lengths[:i]))) for i in range(len(partial_value))] | ||
fixed_parts = [part if part != None else variable_offsets[i] for i, part in enumerate(fixed_parts)] | ||
|
||
# Return the concatenation of the active fields `Bitvector` with the active | ||
# fixed-size parts (offsets interleaved) and the active variable-size parts | ||
return serialize(active_fields) + b"".join(fixed_parts + variable_parts) | ||
``` | ||
|
||
### Deserialization | ||
|
||
Deserialization of a `PartialContainer[T, N]` starts by deserializing a `Bitvector[N]`. That value MUST be validated against the definition of `T`: | ||
|
||
- For each required field within `T`, the corresponding bit in the `Bitvector[N]` MUST be `True` | ||
- For each optional field within `T`, the corresponding bit in the `Bitvector[N]` is not restricted | ||
- All extra bits in the `Bitvector[N]` that exceed the number of fields within `T` MUST be `False` | ||
|
||
The rest of the data is [deserialized](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/simple-serialize.md#deserialization) as `T`, consulting the `Bitvector[N]` to determine what optional fields are present. Absent fields are skipped during deserialization and assigned `None` values. | ||
|
||
### Merkleization | ||
|
||
To merkleize a `PartialContainer[T, N]`, a `Bitvector[N]` is constructed, indicating active fields within the `PartialContainer[T, N]`, using the same process as during serialization. | ||
|
||
```python | ||
class PartialContainer(Generic[T, N]): | ||
data: T | ||
active_fields: Bitvector[N] | ||
``` | ||
|
||
The [merkleization specification](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/simple-serialize.md#merkleization) is extended with the following helper functions: | ||
|
||
- `chunk_count(type)`: calculate the amount of leafs for merkleization of the type. | ||
- `PartialContainer[T, N]`: always `N`, regardless of the actual number of fields in `T` | ||
- `mix_in_object`: Given a Merkle root `root` and an auxiliary SSZ object `obj` return `hash(root + hash_tree_root(obj))`. | ||
|
||
Merkleization `hash_tree_root(value)` of an object `value` is then extended with: | ||
|
||
- `mix_in_object(merkleize(([hash_tree_root(element) if is_active_field(element) else Bytes32() for element in value.data] + [Bytes32()] * N)[:N]), value.active_fields)` if `value` is a `PartialContainer[T, N]`. | ||
|
||
## Rationale | ||
|
||
### What are the problems solved by `PartialContainer[T, N]`? | ||
|
||
Current SSZ types are only stable within one version of a specification, i.e., one fork of Ethereum. This is alright for messages pertaining to a specific fork, such as attestations or beacon blocks. However, it is a limitation for messages that are expected to remain valid across forks, such as transactions or receipts. In order to support evolving the features of such perpetually valid message types, a new SSZ scheme needs to be defined. | ||
|
||
To avoid restricting design space, the scheme has to support extension with new fields, obsolescence of old fields, and new combinations of existing fields. When such adjustments occur, old messages must still deserialize correctly and must retain their original Merkle root. | ||
|
||
### Why not `Union[T, U, V]`? | ||
|
||
Typically, the individual `Union` cases share some form of thematic overlap, sharing certain fields with each other. In a `Union`, shared fields are not necessarily merkleized at the same [generalized indices](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/merkle-proofs.md). Therefore, Merkle proof systems would have to be updated each time that a new flavor is introduced, even when the actual changes are not of interest to the particular system. | ||
|
||
Furthermore, SSZ Union types are currently not used in any final Ethereum specification and do not have a finalized design themselves. The `PartialContainer[T, N]` serializes very similar to current `Union[T, U, V]` proposals, with the difference being a `Bitvector[N]` as a prefix instead of a selector byte. This means that the serialized byte lengths are comparable. | ||
|
||
### Why not a `Container` full of `Optional[E]`? | ||
|
||
If `Optional[E]` is modeled as an SSZ type, each individual field introduces serialization and merkleization overhead. As an `Optional[E]` would be required to be ["variable-size"](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/simple-serialize.md#variable-size-and-fixed-size), lots of additional offset bytes would have to be used in the serialization. For merkleization, each individual `Optional[E]` would require mixing in a bit to indicate presence or absence of the value. | ||
|
||
Additionally, every time that the number of fields reaches a new power of 2, the Merkle roots break, as the number of chunks doubles. The `PartialContainer[T, N]` solves this by artificially extending the Merkle tree to `N` chunks regardless of the actual number of fields defined by the current version of `T`. Therefore, the number of fields is constant and the Merkle tree shape is stable. The overhead of the additional empty placeholder leaves only affects serialization of the `Bitvector[N]` (1 byte per 8 leaves); the number of required hashes during merkleization only grows logarithmically with the total number of leaves. | ||
|
||
## Backwards Compatibility | ||
|
||
`PartialContainer[T, N]` is a new SSZ type and does not conflict with other SSZ types currently in use. | ||
|
||
## Test Cases | ||
|
||
```python | ||
@dataclass | ||
class Foo: | ||
a: uint64 | ||
b: Optional[uint32] | ||
c: Optional[uint16] | ||
|
||
value = PartialContainer[Foo, 32](a=64, b=None, c=16) | ||
assert bytes(serialize(value)).hex() == "0500000040000000000000001000" | ||
``` | ||
|
||
## Reference Implementation | ||
|
||
See [EIP assets](../assets/eip-7495/partial_container.py), based on `protolambda/remerkleable`. | ||
|
||
## Security Considerations | ||
|
||
None | ||
|
||
## Copyright | ||
|
||
Copyright and related rights waived via [CC0](../LICENSE.md). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,156 @@ | ||
from dataclasses import fields, is_dataclass | ||
import io | ||
from typing import BinaryIO, Dict, Optional, TypeVar, Type, Union as PyUnion, get_args, get_origin | ||
from textwrap import indent | ||
from remerkleable.bitfields import Bitvector | ||
from remerkleable.complex import ComplexView, encode_offset | ||
from remerkleable.core import View, ViewHook, OFFSET_BYTE_LENGTH | ||
from remerkleable.tree import NavigationError, Node, PairNode, \ | ||
get_depth, subtree_fill_to_contents, zero_node | ||
|
||
T = TypeVar('T') | ||
N = TypeVar('N') | ||
|
||
PartialFields = Dict[str, tuple[Type[View], bool]] | ||
|
||
class PartialContainer(ComplexView): | ||
_field_indices: Dict[str, tuple[int, Type[View], bool]] | ||
__slots__ = '_field_indices' | ||
|
||
def __new__(cls, backing: Optional[Node] = None, hook: Optional[ViewHook] = None, **kwargs): | ||
if backing is not None: | ||
if len(kwargs) != 0: | ||
raise Exception("cannot have both a backing and elements to init fields") | ||
return super().__new__(cls, backing=backing, hook=hook, **kwargs) | ||
|
||
for fkey, (ftyp, fopt) in cls.fields().items(): | ||
if fkey not in kwargs: | ||
if not fopt: | ||
raise AttributeError(f"Field '{fkey}' is required in {cls.T}") | ||
kwargs[fkey] = None | ||
|
||
input_nodes = [] | ||
active_fields = Bitvector[cls.N]() | ||
for findex, (fkey, (ftyp, fopt)) in enumerate(cls.fields().items()): | ||
fnode: Node | ||
assert fkey in kwargs | ||
finput = kwargs.pop(fkey) | ||
if finput is None: | ||
fnode = zero_node(0) | ||
active_fields.set(findex, False) | ||
else: | ||
if isinstance(finput, View): | ||
fnode = finput.get_backing() | ||
else: | ||
fnode = ftyp.coerce_view(finput).get_backing() | ||
active_fields.set(findex, True) | ||
input_nodes.append(fnode) | ||
|
||
if len(kwargs) > 0: | ||
raise AttributeError(f'The field names [{"".join(kwargs.keys())}] are not defined in {cls.T}') | ||
|
||
backing = PairNode( | ||
left=subtree_fill_to_contents(input_nodes, get_depth(cls.N)), | ||
right=active_fields.get_backing()) | ||
return super().__new__(cls, backing=backing, hook=hook, **kwargs) | ||
|
||
def __init_subclass__(cls, *args, **kwargs): | ||
super().__init_subclass__(*args, **kwargs) | ||
cls._field_indices = { | ||
fkey: (i, ftyp, fopt) for i, (fkey, (ftyp, fopt)) in enumerate(cls.fields().items())} | ||
if len(cls._field_indices) == 0: | ||
raise Exception(f"PartialContainer {cls.__name__} must have at least one field!") | ||
|
||
def __class_getitem__(cls, params) -> Type["PartialContainer"]: | ||
t, n = params | ||
|
||
if not is_dataclass(t): | ||
raise Exception(f"partialcontainer doesn't wrap `@dataclass`: {t}") | ||
if n <= 0: | ||
raise Exception(f"invalid partialcontainer capacity: {n}") | ||
|
||
class PartialContainerView(PartialContainer): | ||
T = t | ||
N = n | ||
|
||
PartialContainerView.__name__ = PartialContainerView.type_repr() | ||
return PartialContainerView | ||
|
||
@classmethod | ||
def fields(cls) -> PartialFields: | ||
ret = {} | ||
for field in fields(cls.T): | ||
fkey = field.name | ||
fopt = get_origin(field.type) == PyUnion and type(None) in get_args(field.type) | ||
ftyp = get_args(field.type)[0] if fopt else field.type | ||
ret[fkey] = (ftyp, fopt) | ||
return ret | ||
|
||
def active_fields(self) -> Bitvector: | ||
active_fields_node = super().get_backing().get_right() | ||
return Bitvector[self.__class__.N].view_from_backing(active_fields_node) | ||
|
||
def __getattr__(self, item): | ||
if item[0] == '_': | ||
return super().__getattribute__(item) | ||
else: | ||
try: | ||
(findex, ftyp, fopt) = self.__class__._field_indices[item] | ||
except KeyError: | ||
raise AttributeError(f"unknown attribute {item}") | ||
|
||
if not self.active_fields().get(findex): | ||
assert fopt | ||
return None | ||
|
||
data = super().get_backing().get_left() | ||
fnode = data.getter(2**get_depth(self.__class__.N) + findex) | ||
return ftyp.view_from_backing(fnode) | ||
|
||
def _get_field_val_repr(self, fkey: str, ftyp: Type[View], fopt: bool) -> str: | ||
field_start = ' ' + fkey + ': ' + ( | ||
('Optional[' if fopt else '') + ftyp.__name__ + (']' if fopt else '') | ||
) + ' = ' | ||
try: | ||
field_repr = repr(getattr(self, fkey)) | ||
if '\n' in field_repr: # if multiline, indent it, but starting from the value. | ||
i = field_repr.index('\n') | ||
field_repr = field_repr[:i+1] + indent(field_repr[i+1:], ' ' * len(field_start)) | ||
return field_start + field_repr | ||
except NavigationError: | ||
return f"{field_start} *omitted from partial*" | ||
|
||
def __repr__(self): | ||
return f"{self.__class__.type_repr()}:\n" + '\n'.join( | ||
indent(self._get_field_val_repr(fkey, ftyp, fopt), ' ') | ||
for fkey, (ftyp, fopt) in self.__class__.fields().items()) | ||
|
||
@classmethod | ||
def type_repr(cls) -> str: | ||
return f"PartialContainer[{cls.T.__name__}, {cls.N}]" | ||
|
||
def serialize(self, stream: BinaryIO) -> int: | ||
active_fields = self.active_fields() | ||
num_prefix_bytes = active_fields.serialize(stream) | ||
|
||
num_data_bytes = sum( | ||
ftyp.type_byte_length() if ftyp.is_fixed_byte_length() else OFFSET_BYTE_LENGTH | ||
for findex, (_, (ftyp, _)) in enumerate(self.__class__.fields().items()) | ||
if active_fields.get(findex)) | ||
|
||
temp_dyn_stream = io.BytesIO() | ||
data = super().get_backing().get_left() | ||
for findex, (_, (ftyp, _)) in enumerate(self.__class__.fields().items()): | ||
if not active_fields.get(findex): | ||
continue | ||
fnode = data.getter(2**get_depth(self.__class__.N) + findex) | ||
v = ftyp.view_from_backing(fnode) | ||
if ftyp.is_fixed_byte_length(): | ||
v.serialize(stream) | ||
else: | ||
encode_offset(stream, num_data_bytes) | ||
num_data_bytes += v.serialize(temp_dyn_stream) # type: ignore | ||
temp_dyn_stream.seek(0) | ||
stream.write(temp_dyn_stream.read(num_data_bytes)) | ||
|
||
return num_prefix_bytes + num_data_bytes |