This RFC proposes an algorithm to normalise blobs of data such that the hashing algorithm (RFC0010) is predictable. This RFC replaces the item canonicalisation obsoleted by RFC0010.
Note that throughout this RFC blob and item are used interchangeably.
The current specification suggests how null values and empty strings should be normalised, but the wording is unclear and open to interpretation.
Blobs of data must have a normal form so that algorithms processing them (e.g. the hashing algorithm (RFC0010)) produce consistent output. Blobs of data are associative arrays of attribute-value pairs where values are Strings or Sets of Strings; null values have to be handled with care.
The following algorithm ensures all null and empty values are removed from the normalised representation:
- Let blob be the blob to normalise.
- Let result be an empty dictionary.
- Foreach (attr, value) pair in blob:
  - If value is null, continue.
  - If value is an empty String, continue.
  - If value is an empty Set, continue.
  - If value is a Set:
    - Let normSet be an empty Set.
    - Foreach el in value:
      - If el is null, continue.
      - If el is an empty String, continue.
      - Otherwise, normalise el and add the result to normSet.
    - If normSet is empty, continue.
    - Otherwise, set (attr, normSet) in result and continue.
  - Let normValue be null.
  - Normalise value and assign the result to normValue.
  - Set (attr, normValue) in result.
In summary, any value that is an empty String, an empty Set, or a Set containing only nulls and empty Strings is treated as a null value and removed from the normalised result. Nulls and empty Strings inside a Set that is otherwise non-empty are dropped from that Set.
Note that this normalisation allows parity between flexible formats like JSON and rigid formats like CSV. For example, an empty field in CSV, which is read as an empty string, is normalised as null.
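The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `normalise_blob` and `normalise_string` are hypothetical, a blob is assumed to be a `dict`, a Set is assumed to be a Python `set` or `frozenset`, and normalising a String is assumed to mean converting it to NFC.

```python
import unicodedata


def normalise_string(value):
    # Assumption: normalising a String means converting it to Unicode NFC.
    return unicodedata.normalize("NFC", value)


def normalise_blob(blob):
    """Return a copy of blob with null and empty values removed."""
    result = {}
    for attr, value in blob.items():
        # Null values and empty Strings are dropped.
        if value is None or value == "":
            continue
        if isinstance(value, (set, frozenset)):
            # Empty Sets are dropped.
            if not value:
                continue
            norm_set = set()
            for el in value:
                # Null and empty elements are dropped from the Set.
                if el is None or el == "":
                    continue
                norm_set.add(normalise_string(el))
            # A Set left with no elements is dropped as well.
            if not norm_set:
                continue
            result[attr] = norm_set
            continue
        result[attr] = normalise_string(value)
    return result


# Hypothetical example blob; attribute names are for illustration only.
example = {
    "name": "Foo",
    "end-date": "",
    "start-date": None,
    "tags": {"", "a"},
    "aliases": {"", None},
    "notes": set(),
}
normalised = normalise_blob(example)
```

In this sketch only `name` and `tags` survive, and `tags` keeps just its non-empty element.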
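To illustrate the parity claim, the sketch below reads the same record from CSV and from JSON and shows that both reduce to the same dictionary once nulls and empty strings are removed. The helper `drop_empty` is a hypothetical, simplified stand-in that handles flat String values only, and the field names are invented for the example.

```python
import csv
import io
import json


def drop_empty(record):
    # Simplified for flat String values: remove nulls and empty strings.
    return {k: v for k, v in record.items() if v is not None and v != ""}


# The same record, once with an empty CSV field and once with a JSON null.
csv_text = "id,name,end-date\n1,Foo,\n"
json_text = '{"id": "1", "name": "Foo", "end-date": null}'

csv_record = next(csv.DictReader(io.StringIO(csv_text)))
json_record = json.loads(json_text)

csv_normalised = drop_empty(csv_record)
json_normalised = drop_empty(json_record)
```

Before normalisation the two records differ (`""` versus `None` for `end-date`); afterwards neither contains the attribute.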
A string should be in the NFC form as defined by the Unicode standard: https://en.wikipedia.org/wiki/Unicode_equivalence
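As a small illustration of why NFC matters here, the sketch below uses Python's standard `unicodedata` module to show two visually identical strings that differ at the code point level until they are normalised. The helper name `to_nfc` is hypothetical.

```python
import unicodedata


def to_nfc(value):
    # Convert a string to Unicode Normalization Form C (NFC).
    return unicodedata.normalize("NFC", value)


composed = "caf\u00e9"      # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

same_before = composed == decomposed
same_after = to_nfc(composed) == to_nfc(decomposed)
```

Without this step, the two spellings would be treated as different values by any byte-level algorithm, such as hashing, applied to the blob.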
There are no known security implications arising from this proposal.