
ssz: switch integer encoding to little endian #139

Merged
merged 3 commits into from
Jan 17, 2019

Conversation

arnetheduck
Contributor

the choice between little and big endian is arbitrary from a functional
point of view, but practically:

  • most commodity hardware these days is either little- or bi-endian
  • mechanical sympathy between encoding and hardware allows a wider range
    of tricks to be used when encoding and decoding data leading to better
    efficiency
  • we're developing a format that favors "decoding-free" access to data

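To make the two options concrete, here is a minimal Python sketch of the same 64-bit integer serialized both ways. The helper name `encode_uint` is hypothetical and is not taken from the spec; it only illustrates what "little endian" and "big endian" mean for a fixed-size integer.

```python
import sys

def encode_uint(value: int, size: int, byteorder: str) -> bytes:
    """Serialize an unsigned integer into exactly `size` bytes (hypothetical helper)."""
    return value.to_bytes(size, byteorder)

value = 0x0102030405060708
little = encode_uint(value, 8, "little")  # least significant byte first
big = encode_uint(value, 8, "big")        # most significant byte first

print(little.hex())   # 0807060504030201
print(big.hex())      # 0102030405060708
print(sys.byteorder)  # byte order of the machine running this script
```

When the encoding's byte order matches `sys.byteorder`, the serialized bytes are the same bytes the machine already holds in memory for that integer.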
@hwwhww hwwhww requested a review from vbuterin November 17, 2018 08:35
@hwwhww hwwhww added the general:RFC Request for Comments label Nov 30, 2018
@JustinDrake
Collaborator

Pinging @AlexeyAkhunov and @karalabe. Big or little endian?

@djrtwo
Contributor

djrtwo commented Dec 5, 2018

The main push-back I heard on this was that eth1.0 uses big-endian so don't introduce a different-endian encoding in eth2.0. The small gain in efficiency is not worth the potential confusion and overhead of having to remember which is which.

@arnetheduck
Contributor Author

Well, there's a fairly clean break here, considering SSZ vs RLP - it's also an unfortunate fact of life that you have to remember endianness whenever you deal with binary protocols in general.

The main performance difference will be that adjacent integers can either be bulk-copied or will have to be byte-flipped one-by-one. It affects both network serialization and hashing.
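A rough sketch of the bulk-copy versus byte-flip distinction, using Python's standard `array` module. The function name `serialize_uint64_list` is hypothetical, and this is an illustration of the idea rather than a measurement or an SSZ implementation.

```python
import sys
from array import array

def serialize_uint64_list(values, byteorder):
    """Serialize a list of uint64 values in the requested byte order."""
    buf = array("Q", values)      # stored in the host's native byte order
    assert buf.itemsize == 8      # "Q" is 8 bytes on common platforms
    if byteorder != sys.byteorder:
        buf.byteswap()            # extra pass: flip the bytes of every element
    return buf.tobytes()          # bulk copy of the underlying memory

values = [1, 2, 0xDEADBEEF]
little = serialize_uint64_list(values, "little")
big = serialize_uint64_list(values, "big")

print(little[:8].hex())  # 0100000000000000
print(big[:8].hex())     # 0000000000000001
```

On a little-endian host the `"little"` path skips the `byteswap()` pass entirely, while the `"big"` path has to touch every element before copying.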

@JustinDrake
Collaborator

The main performance difference will be that adjacent integers can either be bulk-copied or will have to be byte-flipped one-by-one.

Can this be quantified? What is the performance difference?

It affects both network serialization and hashing.

Would it negatively affect light clients of Ethereum 2.0 built in Ethereum 1.0 contracts?

@mkalinin
Collaborator

mkalinin commented Dec 7, 2018

It could be that the efficiency gained from this optimization is too small compared with the cost of other operations, for example calculating a hash of the validator registry.

Another thing is that all big number implementations in Java that I've seen use big-endian to encode/decode numbers to/from byte arrays. So does Milagro, even in its C implementation. What about other languages, btw? In our case this means reversing the signature bytes on every encode/decode, since the signature has type uint384. There is a workaround: use the byteN type for all big numbers starting from uint72. Otherwise, reversing 48 bytes could be much less efficient than reversing the byte order of several adjacent primitives, which I believe can be done with bitwise arithmetic.
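The trade-off described here can be sketched as follows. Both function names are hypothetical, and `raw` stands in for the 48 big-endian bytes a big number library might hand back; nothing here is taken from Milagro's actual API.

```python
SIG_SIZE = 48  # bytes in a uint384

def uint384_le_from_big_endian(raw: bytes) -> bytes:
    """Re-encode big-endian library output as a little-endian uint384.

    This costs a full 48-byte reversal on every encode and decode.
    """
    assert len(raw) == SIG_SIZE
    return raw[::-1]

def bytes48_passthrough(raw: bytes) -> bytes:
    """Treat the value as an opaque fixed-size byte string: no reversal."""
    assert len(raw) == SIG_SIZE
    return raw

raw = bytes(range(SIG_SIZE))        # placeholder for big-endian library output
number = int.from_bytes(raw, "big")

as_uint384 = uint384_le_from_big_endian(raw)
as_bytes48 = bytes48_passthrough(raw)
```

With the opaque byte-string type the serializer never interprets the value as an integer, so the library's big-endian output passes through unchanged.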

@arnetheduck
Contributor Author

Can this be quantified? What is the performance difference?

I'll see if I can pull up some numbers, but we're really not at that stage yet (it's a pretty low-level / final-touch optimization) - the idea itself is mainly taken from other modern serialization formats that state "direct access" as one of their design goals, for example flatbuffers.

all big number implementations

yep - though here the machine endianness no longer matters - there's no mechanical sympathy to consider because you can't directly use these numbers anyway, and at this point, it's kind of.. arbitrary.
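As an illustration of the "direct access" idea for machine-sized integers, the sketch below reads one field out of a serialized buffer without decoding the rest. The three-field record layout and the name `read_field` are assumptions made up for this example; they do not come from SSZ or flatbuffers.

```python
import struct
import sys

# Hypothetical fixed-size record: three uint64 fields laid out back to back,
# little-endian.
record = struct.pack("<QQQ", 10, 20, 30)

def read_field(buf: bytes, index: int) -> int:
    """Read the index-th uint64 at its fixed offset, leaving other fields untouched."""
    return struct.unpack_from("<Q", buf, index * 8)[0]

print(read_field(record, 1))  # 20

# On a little-endian host the buffer can also be viewed in place as native
# integers, with no conversion step at all.
if sys.byteorder == "little":
    view = memoryview(record).cast("Q")
    print(view[2])  # 30
```

The `memoryview.cast` path only gives correct values when the wire byte order matches the host's, which is the mechanical-sympathy argument in miniature.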

@arnetheduck
Contributor Author

anyway, if there's pushback, we can certainly drop this - it's a drop in the sea, as @mkalinin points out (or one of many paper cuts).

@mkalinin
Collaborator

mkalinin commented Dec 8, 2018

yep - though here the machine endianness no longer matters

Agree. We may use whichever endianness suits big numbers, case by case. For instance, BLS12-381#Serialization defines that Fq elements are encoded in big-endian form, and endianness could be explicitly defined in our spec for this particular value.

The main push-back I heard on this was that eth1.0 uses big-endian so don't introduce a different-endian encoding in eth2.0.

For me, this is not a strong argument, because eth2.0 has many differences wrt eth1.0, and that's even great.

I am not opposed to little endian. Indeed, it's better to have an optimization opportunity even if it doesn't seem too valuable at the moment. A possible solution for big numbers would be to represent them with the bytesN type in the spec.

@JustinDrake
Collaborator

one of many paper cuts

@arnetheduck Is there anything beyond your current 3 issues and 1 PR? I'm keen to address as many issues as possible before we declare the spec a release candidate, so now is a good time to flag things. 👍

@arnetheduck
Contributor Author

arnetheduck commented Dec 10, 2018

one of many paper cuts
Is there anything beyond

ah, I hope that it did not come across wrong - it was intended as a general comment and not to say that there are necessarily many in the spec as of now :)

I'll go over my notes and see what is still relevant after the latest refactorings (:+1: good work!), and post ASAP!

@arnetheduck
Contributor Author

It's worth noting WASM is little-endian also: https://github.com/WebAssembly/design/blob/master/Semantics.md#linear-memory-accesses

@sorpaas

sorpaas commented Jan 14, 2019

SSZ and RLP are already vastly different, so I think using big-endian because eth1.0 uses big-endian may not be a very strong argument.

Besides all the architectures using little-endian, for Parity there's also a really specific reason we would prefer little-endian -- our parity-codec format uses little-endian, and parity-codec and ssz are nearly identical in all the basic forms, except for the endianness! By using little-endian, we can unify those two formats.

@JustinDrake
Collaborator

Pros of little-endian:

  • Consistent with WASM
  • Consistent with commodity hardware
  • Consistent with parity-codec

Cons of little-endian:

  • Inconsistent with RLP (I agree this is a weak argument)
  • Inconsistent with big number implementations (can be worked around with byteN)

Little-endian feels on net positive :)

@JustinDrake
Collaborator

Consensus reached on the Eth2.0 call. Thanks @djrtwo 🎉

Labels
general:RFC Request for Comments scope:SSZ Simple Serialize
6 participants