-
Notifications
You must be signed in to change notification settings - Fork 23
Description
As we're working with crypto at Status, we regularly get bitten by the different types in Nim libraries to represent binary data.
Context:
Crypto and hashing libraries in the wild uses arrays or seq of uint8, char, byte, object, uint32, cuchar and also strings to represent the binary hashes. I have noted the following issues:
A. Output hash type is inconsistent:
- array[N, char]:
- array[N, uint8]:
- object-wrapped array[bits div 8, uint8] (number of bits is part of the type):
- keccak_tiny (Status)
- nimcrypto (Status)
- array[N, byte]:
- Eth-keys (Status)
- distinct array[N, uint32]:
- string
- ptr cuchar + len
- nimssl (bindings autogenerated on install so not visible on Github)
B. 0.18 introduction of $ for arrays broke libraries:
C. Input type is inconsistent:
- a string (binary buffer)
- a hex string or a binary buffer (any kind: string, arrays, sequences)
- keccak_tiny (Status) and
its dependency memRange package
- keccak_tiny (Status) and
- a hex string
- eth_keys (Status)
- ptr cuchar + len
- nimssl (bindings autogenerated on install so not visible on Github)
- ptr uint8 + len
- nimcrypto. Further overloads are coming.
D. $ Conversion of the hash type is inconsistent between string, hex or binary representation:
- nimSHA2
$on array[N, char] (N = 28, 32, 48, 64 only) returns a binary string- every other size returns the string representation of the numbers
- libsodium also returns a binary string and requires
bin2hexfor the hex representation. - keccak_tiny (Status) returns the hex representation
- md5 returns the hex representation
- nim-eth-keys (Status) uses the default
$for array of bytes (returns a string of uint8) and requiretoHexfor the hex representation - nimssl doesn't have
$(it's a pointer). It requirestoHexfor the hex representation.
Note that we are currently having this discussion internally, but since Araq mentionned his pain working with hashes following 0.18, I believe the whole Nim ecosystem would benefit.
Stakes
Nimbus (A client for Ethereum 2.0) will bring a lot of attention to Nim crypto ecosystem and it's in our best interest to adopt healthy common standards.
Beyond the internal representations, I think a common set of APIs is what matters
Questions and considerations for API:
1. "String" input
proc hash1234(s: string): Hash = ..., what should s be, a hex string or binary data?
2. Variable size input
Which one(s) among
proc hash1234(s: seq[uint8]): Hash = ...proc hash1234(s: seq[byte]): Hash = ...proc hash1234(s: seq[char]): Hash = ...proc hash1234(s: string): Hash = ...
3. $ behaviour
Which one among:
- An error
- The hex representation
- The uint8/byte numbers
- Convert to binary blob
4. Chaining
hash1234(hash1234(foo)) should be allowed without casting/conversion
5. Hex output
Which one among:
$toHex
Opinions
For consistency one type between byte/uint8/char should be used.
At the same time one type between seq[byte]/seq[uint8]/seq[char] and string should be used.
Note: Those are only my opinions and does not represent the opinion of all Status devs
Opinion 1: $ is very error prone right now. From an ergonomic point of view, it makes sense to convert binary blob/hashes to their hex representation.
Opinion 2: $ converting to the string repr of seq[uint8] can be avoided by using distinct types or wrapping Hash = array[N, byte|uint8] in an object.
Opinion 3: The internal representation should be byte.
Rationale:
- byte is for data or binary blob that are read/written
- uint8 is for numbers (add, multiply, divide, modulo ...)
- char is for text that can be printed.
A counterpoint can be made that byte (or char) hides the actual size of the type.
However Windows and POSIX require 1 byte = 8-bit. The only systems with 1 byte != uint8 are DSP chips (they use 1 byte = 12, 16, 24 bit), see here.
Opinion 4: Let's nail it! 🎆
Inviting: @jangko, @cheatfate, @Araq, @zah, @yglukhov to the discussion.