Skip to content

RFC: Standardize API of all crypto/hashes packages  #32

@mratsim

Description

@mratsim

As we're working with crypto at Status, we regularly get bitten by the different types in Nim libraries to represent binary data.

Context:

Crypto and hashing libraries in the wild uses arrays or seq of uint8, char, byte, object, uint32, cuchar and also strings to represent the binary hashes. I have noted the following issues:

A. Output hash type is inconsistent:

  • array[N, char]:
  • array[N, uint8]:
  • object-wrapped array[bits div 8, uint8] (number of bits is part of the type):
  • array[N, byte]:
  • distinct array[N, uint32]:
  • string
  • ptr cuchar + len
    • nimssl (bindings autogenerated on install so not visible on Github)

B. 0.18 introduction of $ for arrays broke libraries:

C. Input type is inconsistent:

  • a string (binary buffer)
  • a hex string or a binary buffer (any kind: string, arrays, sequences)
  • a hex string
  • ptr cuchar + len
    • nimssl (bindings autogenerated on install so not visible on Github)
  • ptr uint8 + len

D. $ Conversion of the hash type is inconsistent between string, hex or binary representation:

  • nimSHA2 $ on array[N, char] (N = 28, 32, 48, 64 only) returns a binary string
    • every other size returns the string representation of the numbers
  • libsodium also returns a binary string and requires bin2hex for the hex representation.
  • keccak_tiny (Status) returns the hex representation
  • md5 returns the hex representation
  • nim-eth-keys (Status) uses the default $ for array of bytes (returns a string of uint8) and require toHex for the hex representation
  • nimssl doesn't have $(it's a pointer). It requires toHex for the hex representation.

Note that we are currently having this discussion internally, but since Araq mentionned his pain working with hashes following 0.18, I believe the whole Nim ecosystem would benefit.

Stakes

Nimbus (A client for Ethereum 2.0) will bring a lot of attention to Nim crypto ecosystem and it's in our best interest to adopt healthy common standards.
Beyond the internal representations, I think a common set of APIs is what matters

Questions and considerations for API:

1. "String" input

proc hash1234(s: string): Hash = ..., what should s be, a hex string or binary data?

2. Variable size input

Which one(s) among

  • proc hash1234(s: seq[uint8]): Hash = ...
  • proc hash1234(s: seq[byte]): Hash = ...
  • proc hash1234(s: seq[char]): Hash = ...
  • proc hash1234(s: string): Hash = ...

3. $ behaviour

Which one among:

  • An error
  • The hex representation
  • The uint8/byte numbers
  • Convert to binary blob

4. Chaining

hash1234(hash1234(foo)) should be allowed without casting/conversion

5. Hex output

Which one among:

  • $
  • toHex

Opinions

For consistency one type between byte/uint8/char should be used.
At the same time one type between seq[byte]/seq[uint8]/seq[char] and string should be used.

Note: Those are only my opinions and does not represent the opinion of all Status devs

Opinion 1: $ is very error prone right now. From an ergonomic point of view, it makes sense to convert binary blob/hashes to their hex representation.

Opinion 2: $ converting to the string repr of seq[uint8] can be avoided by using distinct types or wrapping Hash = array[N, byte|uint8] in an object.

Opinion 3: The internal representation should be byte.

Rationale:

  • byte is for data or binary blob that are read/written
  • uint8 is for numbers (add, multiply, divide, modulo ...)
  • char is for text that can be printed.

A counterpoint can be made that byte (or char) hides the actual size of the type.
However Windows and POSIX require 1 byte = 8-bit. The only systems with 1 byte != uint8 are DSP chips (they use 1 byte = 12, 16, 24 bit), see here.

Opinion 4: Let's nail it! 🎆

Inviting: @jangko, @cheatfate, @Araq, @zah, @yglukhov to the discussion.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions