Encoding for on-chain structures #621

Open
anorth opened this issue Nov 5, 2019 · 3 comments

Comments

@anorth anorth commented Nov 5, 2019

The spec is very light on details about the serialization/encoding of on-chain structures. At present, CBOR is not mentioned, but a few structs carry a "// representation tuple" comment.

I believe the intention at present is for all structures to be CBOR-tuple encoded (i.e. a CBOR array with items corresponding to struct fields in their order of declaration). This is efficient but has some potential problems. I'm filing this issue so that we have them written down somewhere.
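To make "CBOR-tuple encoded" concrete, here is a minimal sketch of what that could look like. The `Message` struct and its field names are hypothetical (not taken from the spec), and the tiny encoder only covers unsigned integers, byte strings, text strings and arrays, following the major-type layout from RFC 8949.

```python
import struct
from dataclasses import dataclass, fields


def head(major, n):
    """CBOR initial byte plus length/value argument (RFC 8949, section 3)."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if n < 1 << (8 * struct.calcsize(fmt)):
            return bytes([(major << 5) | extra]) + struct.pack(fmt, n)
    raise ValueError("integer too large for this sketch")


def encode(v):
    """Encode a small subset of CBOR: uint, bytes, text, array."""
    if isinstance(v, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(v, int) and v >= 0:
        return head(0, v)
    if isinstance(v, bytes):
        return head(2, len(v)) + v
    if isinstance(v, str):
        raw = v.encode("utf-8")
        return head(3, len(raw)) + raw
    if isinstance(v, list):
        return head(4, len(v)) + b"".join(encode(x) for x in v)
    raise TypeError(f"unsupported type: {type(v).__name__}")


@dataclass
class Message:  # hypothetical on-chain struct, for illustration only
    to: bytes
    sender: bytes
    nonce: int
    value: int


def encode_tuple(obj):
    """Tuple representation: a CBOR array of field values in declaration order."""
    return encode([getattr(obj, f.name) for f in fields(obj)])


msg = Message(to=b"\x01\x02", sender=b"\x03\x04", nonce=7, value=1000)
encoded = encode_tuple(msg)
print(encoded.hex())
```

Note that no field names or tags appear in the output: the position in the array is the only thing tying a value to a field.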

@jbenet's most recent declaration is:

  • i'm OK with tuple encoding for testnet
  • we MAY ship mainnet w/ tuple encoding
  • we MAY have to change from tuple to int-keyed map for structs for mainnet
  • we will prioritize this along other changes that come out of security review during testnet
  • IF [int-keyed maps are already implemented] we can motivate realignment to that now.
  • IF NOT (gfc has string maps, but not int-keyed maps and those would be a lot of work), proceed w/ tuple. but keep it easy to change this
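
For a rough sense of the size trade-off between the three representations mentioned above, the sketch below encodes the same four values as a tuple, an int-keyed map and a string-keyed map. The field names and values are made up, the encoder is a minimal subset of CBOR written for this comparison, and map keys are emitted in insertion order rather than any canonical order.

```python
import struct


def head(major, n):
    """CBOR initial byte plus length/value argument (RFC 8949, section 3)."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if n < 1 << (8 * struct.calcsize(fmt)):
            return bytes([(major << 5) | extra]) + struct.pack(fmt, n)
    raise ValueError("integer too large for this sketch")


def encode(v):
    """Encode a small subset of CBOR: uint, bytes, text, array, map."""
    if isinstance(v, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(v, int) and v >= 0:
        return head(0, v)
    if isinstance(v, bytes):
        return head(2, len(v)) + v
    if isinstance(v, str):
        raw = v.encode("utf-8")
        return head(3, len(raw)) + raw
    if isinstance(v, list):
        return head(4, len(v)) + b"".join(encode(x) for x in v)
    if isinstance(v, dict):
        # Keys are written in insertion order; no canonical sorting here.
        body = b"".join(encode(k) + encode(x) for k, x in v.items())
        return head(5, len(v)) + body
    raise TypeError(f"unsupported type: {type(v).__name__}")


# Hypothetical struct contents: (field name, value) in declaration order.
struct_fields = [("to", b"\x01\x02"), ("from", b"\x03\x04"),
                 ("nonce", 7), ("value", 1000)]

as_tuple = encode([v for _, v in struct_fields])
as_int_map = encode({i: v for i, (_, v) in enumerate(struct_fields)})
as_str_map = encode({name: v for name, v in struct_fields})

sizes = {"tuple": len(as_tuple),
         "int-keyed map": len(as_int_map),
         "string-keyed map": len(as_str_map)}
print(sizes)
```

For this toy struct the tuple takes 11 bytes, the int-keyed map 15 and the string-keyed map 31, so int keys cost one extra byte per field here while string keys cost the full field name each time.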

@anorth anorth commented Nov 5, 2019

Problem: future-proofing.

Quoth @jbenet

in light of evolving protocols, security oriented protocols that serialize into non-self-describing formats take great care to ensure fields are appropriately tagged to ensure the right serialized field value is serialized/deserialized into the right in-memory field. protobuf, capnp, and more enforce this, and have for decades, for precisely protocol evolution and security. deserializing field A into field B is a class of bug trivially defeated and not worth exposing ourselves to.

  • this compounds as formats change and programs (which do not all update in lockstep) continue to read old and new versions of structures.
  • this is made especially worse in hash-linked data structures which cannot be upgraded by migrating data, but instead tend to force all programs in the future to read old structure versions. field tagging is key for secure schema evolution
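
The "field A into field B" failure is easy to reproduce. The sketch below is only an illustration: the `owner`/`nonce`/`balance` struct and the tag numbers are hypothetical, and plain Python lists and dicts stand in for the decoded CBOR array and int-keyed map. A v2 writer inserts a new field in the middle of the struct, and a v1 reader that has not been upgraded decodes the result.

```python
# Field order for the tuple representation, before and after the change.
V1_ORDER = ["owner", "balance"]
V2_ORDER = ["owner", "nonce", "balance"]  # v2 inserts "nonce" in the middle

# Tag assignments for the int-keyed representation. Existing tags never
# change; the new field simply takes the next unused tag.
V1_TAGS = {0: "owner", 1: "balance"}
V2_TAGS = {0: "owner", 1: "balance", 2: "nonce"}


def encode_tuple(record, order):
    """Positional encoding: values only, in declaration order."""
    return [record[name] for name in order]


def decode_tuple(items, order):
    """Positional decoding: the i-th item goes to the i-th known field."""
    return dict(zip(order, items))


def encode_tagged(record, tags):
    """Tagged encoding: every value travels with its field tag."""
    return {tag: record[name] for tag, name in tags.items()}


def decode_tagged(mapping, tags):
    """Tagged decoding: look fields up by tag and skip unknown tags."""
    return {tags[tag]: value for tag, value in mapping.items() if tag in tags}


record_v2 = {"owner": "alice", "nonce": 42, "balance": 1000}

# An old (v1) reader consuming data written by a new (v2) writer.
old_reader_tuple = decode_tuple(encode_tuple(record_v2, V2_ORDER), V1_ORDER)
old_reader_tagged = decode_tagged(encode_tagged(record_v2, V2_TAGS), V1_TAGS)

print(old_reader_tuple)   # balance silently receives the nonce
print(old_reader_tagged)  # balance is still the balance
```

With the tuple form the old reader ends up with a balance of 42 and no error is raised; with tags it reads the correct balance and ignores the field it does not know about.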

@anorth anorth commented Nov 5, 2019

Another annoyance I have just learned about is that tuple-encoding does not play nicely with graphsync. IPLD selectors operate over the encoded IPLD nodes, which in this case will be lists. So selectors for chain syncing need to be expressed with indices, rather than field names.

This is not the end of the world, of course. We can declare (or even reflect) a mapping of field name->index and use symbolic constants to construct queries.
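
A rough sketch of the reflected name->index mapping, assuming structs are declared as dataclasses. The `BlockHeader` and `Ticket` types and their fields are hypothetical, and the index path produced here is just a list of integers used to walk a decoded list, not an actual IPLD selector.

```python
from dataclasses import dataclass, fields


@dataclass
class Ticket:  # hypothetical nested struct
    vrf_proof: bytes


@dataclass
class BlockHeader:  # hypothetical on-chain struct
    miner: bytes
    ticket: Ticket
    height: int


def field_indices(cls):
    """Reflect the field name -> tuple index mapping from declaration order."""
    return {f.name: i for i, f in enumerate(fields(cls))}


def to_index_path(cls, names):
    """Translate a path of field names into a path of list indices."""
    path = []
    for name in names:
        by_name = {f.name: (i, f.type) for i, f in enumerate(fields(cls))}
        index, cls = by_name[name]  # descend into the field's declared type
        path.append(index)
    return path


def walk(node, path):
    """Follow an index path through nested lists (the decoded tuple form)."""
    for index in path:
        node = node[index]
    return node


# The decoded tuple form of a header: nested lists, no field names.
encoded_header = [b"miner-address", [b"proof-bytes"], 5]

path = to_index_path(BlockHeader, ["ticket", "vrf_proof"])
print(path, walk(encoded_header, path))
```

Queries can then be written against field names, with the indices derived from the struct declarations, so a reordering of fields only has to be made in one place.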


@anorth anorth commented Nov 5, 2019
