mix's changes to slp_encoding

ssbc · Feb 3, 2020 · commit 5adc333 · 1 parent a40f3ab

Showing 2 changed files with 78 additions and 76 deletions.
diff --git a/README.md b/README.md
@@ -196,15 +196,15 @@ var Derive = MakeDeriver(feed_id, prev_msg_id)
 
 function MakeDeriver (feed_id, prev_msg_id) {
   return function (key, label, length) {
-    var data = [feed_id, prev_msg_id, label]
-    return HKDF.Expand(key, encode(data), length)
+    var info = [feed_id, prev_msg_id, label]
+    return HKDF.Expand(key, encode(info), length)
   }
 }
 ```
 
 and further:
 - `feed_id` and `prev_msg_id` are encoded in standard binary format (TODO)
-- `encode` is a shallow lenth-prefixed (SLP) encoding of an ordered list
+- `encode` is a [shallow length-prefixed (SLP) encoding](./slp_encoding.md) of an ordered list
 - `HKDF.Expand` is the expand step of HKDF (RFC 5869), an HMAC-based function designed to generate pseudorandom buffers of a given length.
   - HKDF-Expand uses `sha256` for hashing, a hash-length of 32 bytes, and the final Derived-Secret length is also 32 bytes.
   - example Node.js implementation: [futoin-hkdf](https://www.npmjs.com/package/futoin-hkdf#hkdfexpandhash-hash_len-prk-length-info-%E2%87%92-buffer)

diff --git a/info_encoding.md b/slp_encoding.md
rename from info_encoding.md
rename to slp_encoding.md
@@ -1,32 +1,14 @@
 # Encoding of the `info` Field in Derivations
 
-The key derivation should include all information of the context of the derivation.  This includes information that is valid only for this particular message, as well as the intended use of the key.
+The key derivation should include all information of the context of the derivation. This includes information that is valid only for this particular message, as well as the intended use of the key.
 
 Once we have agreed on a suitable selection of values to include, we must also encode them so that the encoding cannot collide with the honest encoding of a different set of values. An adversary can of course perform a different encoding, but this does not matter: we assume they cannot trick the user into performing off-standard derivations or using an off-standard encoding for their derivations. Implementations must make sure this assumption holds.
 
 ## Notation and Definitions
 
 In this document, we write lists as comma-separated values enclosed in parentheses, so a list of the values 1, 2 and 3 is written as `(1, 2, 3)`. Binary concatenation is written `||`.
 
-For the purpose of this document, common byte order is defined as little endian. In pseudocode, the type `uint16` has the method `encode`, that returns a string containing the common byte order encoding of the number:
-```
-method encode() of uint16 {
-	var out [2]byte
-
-	out[0] = byte(self & 0xff)
-	out[1] = byte(self >> 8)
-
-	return out
-}
-```
-
-For convenience, the function `encodeU16` takes any positive integer as input, converts it to `uint16` and returns the result of `encode`:
-```
-function encodeU16(i int) {
-	assert i >= 0
-	return uint16(i).encode()
-}
-```
+For the purpose of this document, **common byte order is defined as little-endian**. See the [Code section][LE-code] for details.
 
 ## Requirements
 
@@ -41,9 +23,9 @@ At this point, we do not require to be able to encode nested data. However, we w
 
 ## Format
 
-The proposed format is called _shallow length prefixed (SLP)_ encoding. Shallow, because it does not specify nesting. Length-prefixed, because each element is prefixed with its length.
+The format is called _shallow length prefixed (SLP)_ encoding. Shallow, because it does not specify nesting. Length-prefixed, because each element is prefixed with its length.
 
-Specifically, the encoding is the element-wise concatenation of the the length of the elements and the element. The length is a uint16 encoded using common byte order. The pseudocode of the algorithm can be found in section [Code].
+Specifically, the encoding is the concatenation, for each element in order, of the element's length followed by the element itself. The length is a uint16 encoded in common byte order, so a single element can be at most 65535 bytes long. The pseudocode of the algorithm can be found in the [Code section][SLP-code].
 
 For example, the list
 ```
@@ -58,45 +40,53 @@ encodeU16(len(a)) || a || encodeU16(len(b)) || b || encodeU16(len(c)) || c
 
 Ordered key-value datasets can be encoded by first constructing and then encoding a list from the dataset.  To construct the list, start with an empty list, iterate over the dataset, and then for each key-value pair in the dataset first add the key and then the value to the list.
 
-### Example
+### Example 1 - encode keys and values
 
-The list representation of the dataset
-
-```json
+To derive the read key from the message key, we perform a derivation with the info argument of HKDF.Expand set to the encoding of the following key-value list, displayed here to look a bit like a JSON object:
+```js
 {
-	"key1": "value1",
-	"key2": "value2"
+	"purpose": "box2",
+	"type": "read key",
+	"feed": "@feedID", // this is a placeholder
+	"prev": "%msgID"   // this is a placeholder
 }
 ```
-is given by
+
+This would be list-encoded by alternating between keys and values:
 ```
-("key1", "value1", "key2", "value2")
+("purpose", "box2", "type", "read key", "feed", "@feedID", "prev", "%msgID")
 ```
 
-## Nested Data
+Here the "schema" is the set of expected keys and values, together with the order in which they are encoded.
 
-Later applications may require nested data. Here, we propose two ways to achieve this feature in a backwards-compatible fashion. Again, we stress that due to the lack of schemalessness, new derivations can assign whatever meaning to the contents of the buffers. This does not limit the usefulness or security of the simpler, shallow encoding.
-
-### Path-Style Keys
-
-When using the lists to describe key-value sets, the keys can be chosen to represent paths. This requires defining a path delimiter (e.g. '.', '/' or '\0'), or using length-prefixed lists to separate path segments. Then, if two paths begin with the same sequence of path segments, they are in the same compound data structure. The compound data structure itself is named by the shared prefix.
-
-### _Recursive Length Prefix (RLP)_ Encoding
+The encoding of this list is
+```
+     p u r p o s e       b o x 2       t y p e       r e a d   k e y       f e e d       @ f e e d I D       p r e v       % m s g I D
+0700 707572706f7365 0400 626f7832 0400 74797065 0800 72656164206b6579 0400 66656564 0700 40666565644944 0400 70726576 0600 256d73674944
+^^^^                ^^^^          ^^^^          ^^^^                  ^^^^          ^^^^                ^^^^          ^^^^
+length              length        length        length                length        length              length        length
+```
 
-RLP is similar to SLP, except that it allows list elements to themselves be lists. If that is the case, the element is first RLP-encoded itself, before the length is taken and it is concatenated.
+### Example 2 - encode values (key implied by order)
 
-Example: The list
+Alternatively, we could drop the keys and use only the values, so that each key is implied by its position. Using the same data as in the previous example, we get this list:
 ```
-(a, (b, c))
+("box2", "read key", "@feedID", "%msgID")
 ```
-becomes:
+That list encodes to
 ```
-encodeU16(len(a)) || a || encodeU16(4 + len(b) + len(c)) || encodeU16(b) || b || encodeU16(c) || c
+     b o x 2       r e a d   k e y       @ f e e d I D       % m s g I D
+0400 626f7832 0800 72656164206b6579 0700 40666565644944 0600 256d73674944
+^^^^          ^^^^                  ^^^^                ^^^^
+length        length                length              length
 ```
+Note that this is more in line with the idea of using schemas. For example, the schema for the box2 read key dictates that what follows is first a feed ID and then a message ID. Also, in practice the feed and message IDs would be their binary encodings rather than these string placeholders.
+
 
-<a name="section-code"> </a>
 ## Code
 
+### SLP Encode
+
 ```
 function encode(list List) Buffer {
 	var out Buffer
@@ -111,7 +101,7 @@ function encode(list List) Buffer {
 }
 ```
 
-### Go
+#### Go
 
 ```go
 import "encoding/binary"
@@ -136,11 +126,11 @@ func Encode(list [][]byte) []byte {
 }
 ```
 
-### JavaScript
+#### JavaScript
 
 ```js
 function binEncodeUInt16(target, number) {
-    target.writeUInt16BE(number, 0)
+  target.writeUInt16LE(number, 0)
 }
 
 function encode (list) {
@@ -160,40 +150,52 @@ where
 - `list` is an Array of Buffers
 - `encode` returns a Buffer
 
-## Examples
 
-To derive the read key from the message key, we perform a derivation with the info argument of HKDF.Expand set to the encoding of the following key-value list, displayed here to look a bit like a JSON object:
-```js
-{
-	"purpose": "box2",
-	"type": "read key",
-	"feed": "@feedID", // this is a placeholder
-	"prev": "%msgID"   // this is a placeholder
-}
-```
-This would be list-encoded by alternating between keys and values:
+### Little Endian Encode
+
+In pseudocode, the type `uint16` has the method `encode`, which returns a two-byte array containing the common byte order encoding of the number:
 ```
-("purpose", "box2", "type", "read key", "feed", "@feedID", "prev", "%msgID")
+method encode() of uint16 {
+	var out [2]byte
+
+	out[0] = byte(self & 0xff)
+	out[1] = byte(self >> 8)
+
+	return out
+}
 ```
-The encoding of this list is
+
+For convenience, the function `encodeU16` takes any non-negative integer that fits in 16 bits, converts it to `uint16` and returns the result of `encode`:
 ```
-     p u r p o s e       b o x 2       t y p e       r e a d   k e y       f e e d       @ f e e d I D       p r e v       @ m s g I D
-0700 707572706f7365 0400 626f7832 0400 74797065 0800 72656164206b6579 0400 66646463 0700 40666565644944 0400 70726576 0600 406d73674944
-^^^^                ^^^^          ^^^^          ^^^^                  ^^^^          ^^^^                ^^^^          ^^^^
-length              length        length        length                length        length              length        length
+function encodeU16(i int) {
+	assert 0 <= i && i <= 0xffff
+	return uint16(i).encode()
+}
 ```
 
-Alternatively, we could also ditch the keys and just use the values. Using the same data from the previous example, we would get this list:
+---
+
+## Nested Data
+
+Later applications may require nested data. Here, we propose two ways to achieve this in a backwards-compatible fashion. Again, we stress that because each derivation defines its own schema, new derivations can assign whatever meaning they need to the contents of the buffers. This does not limit the usefulness or security of the simpler, shallow encoding.
+
+### Path-Style Keys
+
+When using the lists to describe key-value sets, the keys can be chosen to represent paths. This requires defining a path delimiter (e.g. '.', '/' or '\0'), or using length-prefixed lists to separate path segments. Then, if two paths begin with the same sequence of path segments, they are in the same compound data structure. The compound data structure itself is named by the shared prefix.
+
+### _Recursive Length Prefix (RLP)_ Encoding
+
+RLP is similar to SLP, except that it allows list elements to be lists themselves. If an element is a list, it is first RLP-encoded itself; then the length of that encoding is taken, and the length and the encoding are concatenated as usual.
+
+Example: The list
 ```
-("box2", "read key", "@feedID", "%msgID")
+(a, (b, c))
 ```
-That list encodes to
+becomes:
 ```
-     b o x 2       r e a d   k e y       @ f e e d I D       @ m s g I D
-0400 626f7832 0800 72656164206b6579 0700 40666565644944 0600 406d73674944
-^^^^          ^^^^                  ^^^^                ^^^^
-length        length                length              length
+encodeU16(len(a)) || a || encodeU16(4 + len(b) + len(c)) || encodeU16(len(b)) || b || encodeU16(len(c)) || c
 ```
-Note that this is more in line with the idea of using schemas. For example, the schema for the box2 read key dictates that what follows is first a feed ID, and then a message ID. Also, in reality the feed and message IDs would be the binary encoding of the ID.
 
-[Code]: #section-code
+
+[SLP-code]: #slp-encode
+[LE-code]: #little-endian-encode