Exploring Serialization via Protobuf and Others #150
Comments
cc: @rawfalafel |
I am getting more sold on protobufs, especially with how it leaves decoding up to each client.
Do you have examples of other projects using gRPC that do this via docker compose? Orchestration seems to be the only big question that arises from this proposal. |
My thoughts on orchestration: we build the containers for each service then set up something to manage those containers. I'm not too familiar with docker compose, but we need something that achieves the following:
Here's an example of how I envision this workflow:
# Build all of the service container images
./build
# Start up the test service infrastructure
./start
# Then run the tests against those services
./run_tests
These shell scripts (or whatever) above would read a config file to outline port mappings for the test. |
We discussed this on gitter but I'll recap here: Protobuf was originally evaluated and passed on as a serialization mechanism because it doesn't provide byte-perfect consistency. With protobuf, the same object can be encoded multiple ways, and different encodings can be deserialized into the same object. @prestonvanloon mentioned that this isn't an issue once a proposer commits to a Hate to be the nay-sayer, especially because I'd like to see a faster encoding scheme replace RLP as well, but I don't think we can use protobuf as is. |
Did you guys consider fleece? It seems to have the properties needed, while being much simpler than protobuf. https://github.com/couchbaselabs/fleece/blob/master/README.md |
@tfalencar No, we haven't, but after a quick 15-second scan of this project I found this:
To be a reasonable replacement for RLP, it should preferably work for all modern languages. With that said, nothing is out of the question for this. If you have ideas or would like to explore fleece and share your results, the community would be interested! |
It might be worth revisiting this now that sharding is breaking away from the main chain to the beacon chain: with a different consensus protocol, it is more feasible to switch over from RLP to protobuf. The likely case is to use protobuf to replace RLP with blob serialization |
I've been exploring this topic as well, with the thinking of using FlatBuffers over ProtocolBuffers. The main benefit (IMO) is that FlatBuffers allows for accessing the serialized data in a record without having to unpack it first. There are very large performance implications of this, of course. https://google.github.io/flatbuffers/ Thoughts @prestonvanloon? |
Another potential alternative: Cap'n Proto https://capnproto.org/ by the guy who implemented Protobuf at Google in the first place. It seems to fit:
|
Also, Cap'n Proto has tons of language support. Perhaps we can put together a small repo where we play around with these different schema-based serialization protocols across their different language implementations? |
@rawfalafel, thanks for raising the concern about byte-perfect consistency in protobuf. I've been exploring a similar problem recently. Do you still recall the concrete example of |
@prestonvanloon have you conducted the cross-language experiments? How's consistency? Thanks! |
Yep, take an encoding of a protobuf object and reorder the fields. They should |
BTW, we're exploring a new encoding described here: https://github.com/ethereum/beacon_chain/blob/master/ssz/README.md |
@rawfalafel, thanks for taking my question. I want to clarify if you mean
And
Will be marshaled into the same bytes? If my understanding is correct, I've some followup questions:
|
Protobuf allows fields to be encoded in any order to facilitate merging two messages.
Honest nodes should never encode in a different order. The problem though is when a malicious user purposefully encodes in the wrong order. In this scenario, the same message can have multiple encodings, and therefore multiple hashes, and break consensus. |
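To make the field-order concern concrete, here is a minimal sketch that hand-encodes two varint fields in the protobuf wire format, once in field-number order and once reordered. It does not use the protobuf library; the helper names (`encode_varint`, `encode_field`, `decode_message`) are illustrative, the message layout is made up, and SHA-256 stands in for whatever hash the protocol would actually use.

```python
import hashlib


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: tag (field_number << 3 | wire type 0), then value."""
    return encode_varint(field_number << 3) + encode_varint(value)


def decode_message(data: bytes) -> dict:
    """Decode a message made only of varint fields into {field_number: value}."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        if tag & 0x7 != 0:
            raise ValueError("only varint fields are handled in this sketch")
        value, pos = decode_varint(data, pos)
        fields[tag >> 3] = value
    return fields


# Hypothetical message with field 1 = 150 and field 2 = 7.
canonical = encode_field(1, 150) + encode_field(2, 7)
reordered = encode_field(2, 7) + encode_field(1, 150)

# Both byte strings decode to the same logical message...
assert decode_message(canonical) == decode_message(reordered)
# ...but the bytes differ, so their hashes differ too.
assert canonical != reordered
assert hashlib.sha256(canonical).digest() != hashlib.sha256(reordered).digest()
```

A consensus-safe scheme would need either a single canonical encoding that decoders enforce, or a hash computed over the decoded object rather than the received bytes.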
This seems to have been resolved as every team is going for simple serialize at the moment - thoughts on closing this @prestonvanloon? |
This issue exists to track progress on exploration of other serialization strategies for sharding and Ethereum. We'll likely want to move this into a new repository once work has been started.
Motivation
With RLP and other serialization mechanisms for Ethereum, it feels a bit like reinventing the wheel when there may be a better-supported open source library.
The main motivation for RLP:
The question we are trying to answer is whether protocol buffers or other mechanisms already solve this problem.
Challenges with Hashing in Different languages
See RLP design rationale for more context.
Google Protobuf
How to test consistency across all languages?
One option is to write a gRPC service definition and implement the test in each popular language. The test would be easy to extend to another language, provided that it implements the service.
gRPC server for each language
Example service definition:
The request proto carries an object resembling a block, and the service responds with the resulting hash. The test then compares this against the actual hash.
The test can and should be populated with real Ethereum blocks that have been mined and their associated hash. This provides solid evidence that these test cases are valid.
Why set up this infrastructure of gRPC services?
The main idea is that we can run these tests against each language with an agnostic client, in isolation.
Why gRPC?
Due to its low boilerplate, code generation, and structured payload.
List of official supported languages
List of 3rd party supported languages
There are probably many more languages...
How does the test client work?
The test client will act as a command line tool and most likely read from a series of config files.
We can imagine at least one config for the services to hit and another config for the test cases.
The client will send the test proto to each of the services listed, in parallel. At the end of test execution, the client will print and/or write a report of pass/fail for test cases.
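The fan-out and report step could be sketched roughly as follows. This is not the proposed client: each service is modelled as a plain Python callable standing in for a gRPC stub, and the names `run_tests`, `format_report`, `service_a`, and `service_b` are hypothetical. SHA-256 is again only a placeholder hash.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A "service" here is any callable mapping serialized bytes to a hex hash.
# In the real setup this would be a gRPC stub call (assumption).
HashFn = Callable[[bytes], str]
TestCase = Tuple[str, bytes, str]  # (case name, payload, expected hash)


def run_tests(services: Dict[str, HashFn],
              cases: List[TestCase]) -> Dict[str, Dict[str, bool]]:
    """Run every test case against every service, one worker per service."""

    def run_service(name: str) -> Tuple[str, Dict[str, bool]]:
        hash_fn = services[name]
        outcomes = {case_name: hash_fn(payload) == expected
                    for case_name, payload, expected in cases}
        return name, outcomes

    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        return dict(pool.map(run_service, services))


def format_report(results: Dict[str, Dict[str, bool]]) -> str:
    """Render one 'service case PASS/FAIL' line per result."""
    lines = []
    for service, outcomes in sorted(results.items()):
        for case_name, passed in sorted(outcomes.items()):
            lines.append(f"{service} {case_name} {'PASS' if passed else 'FAIL'}")
    return "\n".join(lines)


def reference_hash(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def buggy_hash(payload: bytes) -> str:
    # Deliberately wrong: hashes the reversed payload.
    return hashlib.sha256(payload[::-1]).hexdigest()


services = {"service_a": reference_hash, "service_b": buggy_hash}
cases = [("block_1", b"\x01\x02\x03", reference_hash(b"\x01\x02\x03"))]

results = run_tests(services, cases)
print(format_report(results))
```

Threads are used because the real calls would be network-bound; the services and test cases would come from the config files rather than being defined inline.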
Example output of the client:
Example services config:
Example test protos:
TODO: Real blocks with hashes in a proto-supported format.
What about service orchestration?
Maybe using docker compose?
It would be annoying to start many gRPC services locally without a single command.
What about benchmarks?
Benchmarks are important, but we already know RLP is not as good in terms of performance for serialization.
We can add language specific benchmarks after we answer the question: will this work at all?