rfc: 16
start_date: 2018-08-30
decision_date: 2018-09-07
pr: openregisters/registers-rfc#33
status: approved

Blob normalisation

Summary

This RFC proposes an algorithm to normalise blobs of data so that the hashing algorithm (RFC0010) produces predictable output. It replaces the item canonicalisation obsoleted by RFC0010.

Note that throughout this RFC blob and item are used interchangeably.

Motivation

The current specification suggests how null values and empty strings should be treated during normalisation, but the wording is unclear and leaves the treatment open to interpretation.

Explanation

Blobs of data must have a normal form so that algorithms processing them (e.g. the hashing algorithm in RFC0010) produce consistent output. Blobs of data are associative arrays of attribute-value pairs whose values are Strings; null values have to be handled with care.

The following algorithm ensures that all null and empty values are removed from the normalised representation:

  1. Let blob be the blob to normalise.
  2. Let result be an empty dictionary.
  3. Foreach (attr, value) pair in blob:
    1. If value is null, continue.
    2. If value is an empty String, continue.
    3. If value is an empty Set, continue.
    4. If value is a Set:
      1. Let normSet be an empty Set.
      2. Foreach el in value:
        1. If el is null, continue.
        2. If el is an empty String, continue.
        3. Otherwise, normalise el and add the result to normSet.
      3. If normSet is empty, continue.
      4. Otherwise, set (attr, normSet) to result.
    5. Otherwise,
      1. Let normValue be null.
      2. Normalise value and set normValue.
      3. Set (attr, normValue) to result.

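In summary, any value that is null, an empty String, an empty Set, or a Set containing only nulls and empty Strings is treated as a null value and omitted from the normalised result. Null and empty String elements are likewise dropped from any Set that is kept.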

Note that this normalisation gives parity between flexible formats like JSON and rigid formats like CSV. For example, an empty field in CSV, which is read as an empty string, is normalised as null, just like an explicit null in JSON.
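
The algorithm above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names normalise_blob and normalise_string are hypothetical, the sketch assumes that "normalise" for a String means the NFC form described under String normalisation, and it assumes that a Set may arrive as a Python set, list or tuple (for example, a JSON array).

```python
import unicodedata


def normalise_string(value: str) -> str:
    # Assumption: String normalisation means Unicode NFC.
    return unicodedata.normalize("NFC", value)


def normalise_blob(blob: dict) -> dict:
    result = {}
    for attr, value in blob.items():
        # Steps 3.1 and 3.2: skip nulls and empty Strings.
        if value is None or value == "":
            continue
        # Assumption: lists and tuples are treated as Sets.
        if isinstance(value, (set, frozenset, list, tuple)):
            # Steps 3.3 and 3.4: drop null and empty elements, normalise the rest.
            norm_set = {
                normalise_string(el)
                for el in value
                if el is not None and el != ""
            }
            # An empty Set (before or after filtering) is skipped.
            if not norm_set:
                continue
            result[attr] = norm_set
        else:
            # Step 3.5: plain String value.
            result[attr] = normalise_string(value)
    return result


# Hypothetical blobs: one as it might come from JSON, one from CSV.
from_json = {"name": "Caf\u00e9", "end-date": None, "aliases": ["", None]}
from_csv = {"name": "Cafe\u0301", "end-date": "", "aliases": ""}

# Both normalise to the same blob, so they would hash identically.
assert normalise_blob(from_json) == normalise_blob(from_csv) == {"name": "Caf\u00e9"}
```

The two example blobs differ in how they express missing data (null versus empty string) and in how the accented character is encoded, yet they normalise to the same result.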

String normalisation

A string should be in Normalization Form C (NFC) as defined by the Unicode standard: https://en.wikipedia.org/wiki/Unicode_equivalence
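
As an illustration of why this matters for hashing, the sketch below uses Python's standard unicodedata module to show two strings that render identically but differ at the code point level until they are converted to NFC. The helper name to_nfc is hypothetical.

```python
import unicodedata


def to_nfc(value: str) -> str:
    # Convert a string to Unicode Normalization Form C.
    return unicodedata.normalize("NFC", value)


composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The raw strings differ, so hashing them directly would give different digests.
assert composed != decomposed

# After NFC normalisation they are identical.
assert to_nfc(composed) == to_nfc(decomposed) == composed
```

Normalising before hashing means that two blobs carrying the same text in different Unicode encodings are not treated as different items.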

Security considerations

There are no known security implications arising from this proposal.