Pyspec SSZ; HTR caching of Vector. #1481

Closed
wants to merge 1 commit into from

protolambda
Collaborator

Although I would still like to see pyspec-ssz replaced with py-ssz, or moved out and refactored further, it looks like py-ssz is not quite ready, or maybe not the right choice. And moving it out and refactoring brings different problems: keeping it in sync, and readability / size (py-ssz pyrsistent looks great, but is also ~10x more code to go through as a reader).

The two big pain points for mainnet ssz tree-hashing are:

  • the large validator registry
  • the large vectors of data

In tests, however, the validator registry is not that big; it is really the vector data that is slow without caching: 8192 roots to merkleize together.

So this PR splits a vector into smaller vectors during hash-tree-root and caches the results. On a modification, it removes the affected cache entry. Caching is only active for large enough vectors with elements of an immutable type.
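The idea above can be sketched roughly as follows. This is not the code from this PR: `CachedVector`, `SUB_LEN` and `MIN_CACHED_LEN` are hypothetical names and values picked for illustration, and the sketch assumes a vector of 32-byte elements whose length is a power of two (as with `Vector[Bytes32, 8192]`).

```python
from hashlib import sha256

SUB_LEN = 64          # hypothetical sub-vector size, must be a power of two
MIN_CACHED_LEN = 256  # hypothetical threshold: smaller vectors are not cached


def hash_pair(a: bytes, b: bytes) -> bytes:
    return sha256(a + b).digest()


def merkleize(leaves):
    """Binary Merkle root of a power-of-two number of 32-byte leaves."""
    layer = list(leaves)
    assert len(layer) > 0 and len(layer) & (len(layer) - 1) == 0
    while len(layer) > 1:
        layer = [hash_pair(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


class CachedVector:
    """Illustrative vector of immutable 32-byte elements with sub-root caching."""

    def __init__(self, items):
        self.items = list(items)
        self.cache = {}  # sub-vector index -> cached sub-root

    def __setitem__(self, i, value):
        self.items[i] = value
        # Drop only the cache entry of the sub-vector that contains index i.
        self.cache.pop(i // SUB_LEN, None)

    def hash_tree_root(self) -> bytes:
        if len(self.items) < MIN_CACHED_LEN:
            return merkleize(self.items)  # too small to be worth caching
        sub_roots = []
        for c in range(len(self.items) // SUB_LEN):
            if c not in self.cache:
                start = c * SUB_LEN
                self.cache[c] = merkleize(self.items[start:start + SUB_LEN])
            sub_roots.append(self.cache[c])
        # The sub-roots are the nodes of one tree layer, so merkleizing them
        # gives the same root as merkleizing all elements directly.
        return merkleize(sub_roots)
```

After one element is overwritten, the next `hash_tree_root()` call rehashes only the affected sub-vector plus the small tree on top of the sub-roots, which is where the saving comes from when elements are modified in a rotation.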

A quick bench shows a ~80 times improvement for a Vector[Bytes32, 8192], when modifying elements in a rotation (like the historical vectors in the spec): https://gist.github.com/protolambda/4509db7f91d07b40a65ca3daf1e37685

I will write some tests and a bench of the BeaconState later.

Functionally this does not change SSZ or the spec. And although not too pretty, it helps to make mainnet test generation more bearable.

Note: this is based on the branch of the other SSZ PR, which I would like to merge first, and then update the base.

@protolambda protolambda added the scope:SSZ Simple Serialize label Nov 15, 2019
@protolambda
Collaborator Author

Update: I'm inclined to specialize the pyspec-ssz implementation for merkle proofs and caching by doing something like what I describe here and in this POC, which would make this temporary caching hack unnecessary.

@protolambda
Collaborator Author

Update: I'm experimenting with a new Python SSZ implementation built with binary trees as backings, thus caching every single hash in-place by default. See https://github.com/protolambda/pymerkles

So far it:

  • supports every SSZ type (except Union...)
  • binary tree backings (also partial trees!) work
  • initial phase of implementing deserialization (required for running tests)
  • compatible with spec, but needs more testing (see spec experiment file)
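To illustrate what "binary trees as backings" with in-place hash caching can look like, here is a minimal sketch. It is not the pymerkles API: `Node`, `leaf`, `build` and `set_leaf` are hypothetical names, and the sketch assumes a power-of-two number of 32-byte leaves.

```python
from hashlib import sha256


class Node:
    """Hypothetical tree node. A leaf has no children and a preset root."""
    __slots__ = ("left", "right", "_root")

    def __init__(self, left=None, right=None, root=None):
        self.left = left
        self.right = right
        self._root = root

    def merkle_root(self) -> bytes:
        # Each node computes its hash at most once and then keeps it.
        if self._root is None:
            self._root = sha256(
                self.left.merkle_root() + self.right.merkle_root()
            ).digest()
        return self._root


def leaf(value: bytes) -> Node:
    return Node(root=value)


def build(leaves) -> Node:
    """Build a complete tree over a power-of-two number of 32-byte leaves."""
    nodes = [leaf(v) for v in leaves]
    while len(nodes) > 1:
        nodes = [Node(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]


def set_leaf(node: Node, depth: int, index: int, value: bytes) -> Node:
    """Return a new tree with one leaf replaced, sharing untouched subtrees."""
    if depth == 0:
        return leaf(value)
    # The bit of index at position depth - 1 selects the right subtree.
    if (index >> (depth - 1)) & 1:
        return Node(node.left, set_leaf(node.right, depth - 1, index, value))
    return Node(set_leaf(node.left, depth - 1, index, value), node.right)
```

Because an update only creates new nodes along the path from the root to the changed leaf, all other subtrees keep their cached hashes, and recomputing the root after a change costs one hash per tree level rather than a full re-merkleization.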

When I complete the serialize/deserialize part, and when it passes the pyspec tests, we can swap out the pyspec implementation and avoid caching hacks like the one in this PR.

@protolambda
Collaborator Author

Closing this in favor of #1552 and future iterations of that.

@protolambda protolambda closed this Jan 2, 2020
@protolambda protolambda deleted the vector-caching branch February 9, 2020 00:01