mix's changes to slp_encoding
mixmix committed Feb 3, 2020
1 parent a40f3ab commit 5adc333
Showing 2 changed files with 78 additions and 76 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -196,15 +196,15 @@ var Derive = MakeDeriver(feed_id, prev_msg_id)

function MakeDeriver (feed_id, prev_msg_id) {
return function (key, label, length) {
var data = [feed_id, prev_msg_id, label]
return HKDF.Expand(key, encode(data), length)
var info = [feed_id, prev_msg_id, label]
return HKDF.Expand(key, encode(info), length)
}
}
```

and further:
- `feed_id` and `prev_msg_id` are encoded in standard binary format (TODO)
- `encode` is a shallow length-prefixed (SLP) encoding of an ordered list
- `encode` is a [shallow length-prefixed (SLP) encoding](./slp_encoding.md) of an ordered list
- `HKDF.Expand` is an HMAC-based function which is specifically designed to generate pseudorandom buffers of a given length.
- HKDF-Expand uses `sha256` for hashing, a hash-length of 32 bytes, and the final Derived-Secret length is also 32 bytes.
- example of a node.js implementation : [futoin-hkdf](https://www.npmjs.com/package/futoin-hkdf#hkdfexpandhash-hash_len-prk-length-info-%E2%87%92-buffer)
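
The derivation described above can be sketched in Node.js as follows. This is a minimal illustration, not the reference implementation: `hkdfExpand` is a hand-written HKDF-Expand (RFC 5869) on top of the built-in `crypto` module, the ids are ASCII placeholders because the binary id format is still marked TODO, and the label `'read_key'` is a hypothetical value chosen for the example.

```js
const crypto = require('crypto')

// SLP-encode an array of Buffers: 2-byte little-endian length, then the bytes.
function encode (list) {
  const parts = []
  for (const item of list) {
    const len = Buffer.alloc(2)
    len.writeUInt16LE(item.length, 0) // throws if the element exceeds 65535 bytes
    parts.push(len, item)
  }
  return Buffer.concat(parts)
}

// HKDF-Expand with HMAC-SHA256 (hash length 32 bytes), following RFC 5869.
function hkdfExpand (prk, info, length) {
  const blocks = []
  let prev = Buffer.alloc(0)
  for (let i = 1; blocks.length * 32 < length; i++) {
    prev = crypto.createHmac('sha256', prk)
      .update(Buffer.concat([prev, info, Buffer.from([i])]))
      .digest()
    blocks.push(prev)
  }
  return Buffer.concat(blocks).subarray(0, length)
}

function MakeDeriver (feedId, prevMsgId) {
  return function (key, label, length) {
    const info = [feedId, prevMsgId, Buffer.from(label, 'utf8')]
    return hkdfExpand(key, encode(info), length)
  }
}

// Placeholder ids (assumption): real ids would use the binary format.
const Derive = MakeDeriver(Buffer.from('@feedID'), Buffer.from('%msgID'))
const msgKey = crypto.randomBytes(32)
const readKey = Derive(msgKey, 'read_key', 32)
console.log(readKey.length)
```

Because the label is part of `info`, two derivations from the same key with different labels yield unrelated outputs.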
148 changes: 75 additions & 73 deletions info_encoding.md → slp_encoding.md
@@ -1,32 +1,14 @@
# Encoding of the `info` Field in Derivations

The key derivation should include all information about the context of the derivation. This includes information that is valid only for this particular message, as well as the intended use of the key.
The key derivation should include all information about the context of the derivation. This includes information that is valid only for this particular message, as well as the intended use of the key.

Once we have agreed on a suitable selection of values to include, we must also encode them in such a way that the result cannot collide with the honest encoding of a different set of values. An adversary can of course use a different encoding, but this does not matter: we assume they cannot trick the user into performing non-standard derivations or using a non-standard encoding. Implementations must make sure this assumption holds.
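
To make the collision concern concrete, here is a small sketch comparing plain concatenation with the length-prefixed encoding described in the Format section. The helper names `concat` and `encode` are chosen for this illustration only.

```js
// Naive encoding: plain concatenation, so element boundaries are lost.
function concat (list) {
  return Buffer.concat(list)
}

// Length-prefixed encoding: 2-byte little-endian length before each element.
function encode (list) {
  const parts = []
  for (const item of list) {
    const len = Buffer.alloc(2)
    len.writeUInt16LE(item.length, 0)
    parts.push(len, item)
  }
  return Buffer.concat(parts)
}

const x = [Buffer.from('ab'), Buffer.from('c')]
const y = [Buffer.from('a'), Buffer.from('bc')]

console.log(concat(x).equals(concat(y))) // true: both are "abc", a collision
console.log(encode(x).equals(encode(y))) // false: the prefixes keep the lists apart
```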

## Notation and Definitions

In this document, we write lists as comma-separated values, enclosed in parentheses. Therefore a list of values 1, 2 and 3 would be written as `(1, 2, 3)`. Binary concatenation is `||`.

For the purpose of this document, common byte order is defined as little endian. In pseudocode, the type `uint16` has the method `encode`, which returns the two bytes of the number in common byte order:
```
method encode() of uint16 {
var out [2]byte
out[0] = byte(self & 0xff)
out[1] = byte(self >> 8)
return out
}
```

For convenience, the function `encodeU16` takes any non-negative integer as input, converts it to `uint16` and returns the result of `encode`:
```
function encodeU16(i int) {
assert i >= 0
return uint16(i).encode()
}
```
For the purpose of this document, **common byte order is defined as little endian**. (see [Code section][LE-code] for more detail)

## Requirements

@@ -41,9 +23,9 @@ At this point, we do not require to be able to encode nested data. However, we w

## Format

The proposed format is called _shallow length prefixed (SLP)_ encoding. Shallow, because it does not specify nesting. Length-prefixed, because each element is prefixed with its length.
The format is called _shallow length prefixed (SLP)_ encoding. Shallow, because it does not specify nesting. Length-prefixed, because each element is prefixed with its length.

Specifically, the encoding is the concatenation, element by element, of each element's length followed by the element itself. The length is a uint16 encoded using common byte order. The pseudocode of the algorithm can be found in section [Code].
Specifically, the encoding is the concatenation, element by element, of each element's length followed by the element itself. The length is a uint16 encoded using common byte order. The pseudocode of the algorithm can be found in the [Code section][SLP-code].

For example, the list
```
@@ -58,45 +40,53 @@ encodeU16(len(a)) || a || encodeU16(len(b)) || b || encodeU16(len(c)) || c

Ordered key-value datasets can be encoded by first constructing and then encoding a list from the dataset. To construct the list, start with an empty list, iterate over the dataset, and then for each key-value pair in the dataset first add the key and then the value to the list.

### Example
### Example 1 - encode keys and values

The list representation of the dataset

```json
To derive the read key from the message key, we perform a derivation with the info argument of HKDF.Expand set to the encoding of the following key-value list, displayed here to look a bit like a JSON object:
```js
{
"key1": "value1",
"key2": "value2"
"purpose": "box2",
"type": "read key",
"feed": "@feedID", // this is a placeholder
"prev": "%msgID" // this is a placeholder
}
```
is given by

This would be list-encoded by alternating between keys and values:
```
("key1", "value1", "key2", "value2")
("purpose", "box2", "type", "read key", "feed", "@feedID", "prev", "%msgID")
```

## Nested Data
Here the "schema" is the expected keys and their values, along with the order in which they're encoded.
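
The key-value to list step can be sketched in JavaScript as below. The helper name `toList` is hypothetical, and the dataset is passed as an array of pairs (an assumption of this sketch) so that the order is explicit rather than depending on object key order.

```js
// Turn ordered [key, value] pairs into the flat list (k1, v1, k2, v2, ...).
function toList (pairs) {
  const list = []
  for (const [key, value] of pairs) {
    list.push(Buffer.from(key, 'utf8'), Buffer.from(value, 'utf8'))
  }
  return list
}

const pairs = [
  ['purpose', 'box2'],
  ['type', 'read key'],
  ['feed', '@feedID'], // placeholder
  ['prev', '%msgID'] // placeholder
]

const list = toList(pairs)
console.log(list.map(b => b.toString('utf8')))
```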

Later applications may require nested data. Here, we propose two ways to achieve this feature in a backwards-compatible fashion. Again, we stress that because the encoding is not schemaless, new derivations can assign whatever meaning they need to the contents of the buffers. This does not limit the usefulness or security of the simpler, shallow encoding.

### Path-Style Keys

When using the lists to describe key-value sets, the keys can be chosen to represent paths. This requires defining a path delimiter (e.g. '.', '/' or '\0'), or using length-prefixed lists to separate path segments. Then, if two paths begin with the same sequence of path segments, they are in the same compound data structure. The compound data structure itself is named by the shared prefix.

### _Recursive Length Prefix (RLP)_ Encoding
The encoding of this list is
```
     p u r p o s e       b o x 2       t y p e       r e a d   k e y       f e e d       @ f e e d I D       p r e v       % m s g I D
0700 707572706f7365 0400 626f7832 0400 74797065 0800 72656164206b6579 0400 66656564 0700 40666565644944 0400 70726576 0600 256d73674944
^^^^                ^^^^          ^^^^          ^^^^                  ^^^^          ^^^^                ^^^^          ^^^^
length              length        length        length                length        length              length        length
```
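
A hex dump like this is easy to get wrong by hand, so it can be worth regenerating it from code. The sketch below assumes the text fields are ASCII strings and uses a small `encode` helper written for this example.

```js
// SLP-encode an array of Buffers: 2-byte little-endian length, then the bytes.
function encode (list) {
  const parts = []
  for (const item of list) {
    const len = Buffer.alloc(2)
    len.writeUInt16LE(item.length, 0)
    parts.push(len, item)
  }
  return Buffer.concat(parts)
}

const list = [
  'purpose', 'box2', 'type', 'read key',
  'feed', '@feedID', 'prev', '%msgID'
].map(s => Buffer.from(s, 'utf8'))

const encoded = encode(list)
const hex = encoded.toString('hex')
console.log(hex)
console.log(encoded.length) // 8 length prefixes of 2 bytes + 44 bytes of content
```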

RLP is similar to SLP, except that it allows list elements to themselves be lists. If that is the case, the element is first RLP-encoded itself, before the length is taken and it is concatenated.
### Example 2 - encode values (key implied by order)

Example: The list
Alternatively, we could also ditch the keys and just use the values. Using the same data from the previous example, we would get this list:
```
(a, (b, c))
("box2", "read key", "@feedID", "%msgID")
```
becomes:
That list encodes to
```
encodeU16(len(a)) || a || encodeU16(4 + len(b) + len(c)) || encodeU16(len(b)) || b || encodeU16(len(c)) || c
     b o x 2       r e a d   k e y       @ f e e d I D       % m s g I D
0400 626f7832 0800 72656164206b6579 0700 40666565644944 0600 256d73674944
^^^^          ^^^^                  ^^^^                ^^^^
length        length                length              length
```
Note that this is more in line with the idea of using schemas. For example, the schema for the box2 read key dictates that what follows is first a feed ID, and then a message ID. Also, in reality the feed and message IDs would be the binary encoding of the ID.


<a name="section-code"> </a>
## Code

### SLP Encode

```
function encode(list List) Buffer {
var out Buffer
@@ -111,7 +101,7 @@ function encode(list List) Buffer {
}
```

### Go
#### Go

```go
import "encoding/binary"
@@ -136,11 +126,11 @@ func Encode(list [][]byte) []byte {
}
```

### JavaScript
#### JavaScript

```js
function binEncodeUInt16(target, number) {
target.writeUInt16BE(number, 0)
target.writeUInt16LE(number, 0)
}

function encode (list) {
@@ -160,40 +150,52 @@ where
- `list` is an Array of Buffers
- `encode` returns a Buffer

## Examples

To derive the read key from the message key, we perform a derivation with the info argument of HKDF.Expand set to the encoding of the following key-value list, displayed here to look a bit like a JSON object:
```js
{
"purpose": "box2",
"type": "read key",
"feed": "@feedID", // this is a placeholder
"prev": "%msgID" // this is a placeholder
}
```
This would be list-encoded by alternating between keys and values:
### Little Endian Encode

In pseudocode, the type `uint16` has the method `encode`, which returns the two bytes of the number in common byte order:
```
("purpose", "box2", "type", "read key", "feed", "@feedID", "prev", "%msgID")
method encode() of uint16 {
var out [2]byte
out[0] = byte(self & 0xff)
out[1] = byte(self >> 8)
return out
}
```
The encoding of this list is

For convenience, the function `encodeU16` takes any non-negative integer as input, converts it to `uint16` and returns the result of `encode`:
```
     p u r p o s e       b o x 2       t y p e       r e a d   k e y       f e e d       @ f e e d I D       p r e v       % m s g I D
0700 707572706f7365 0400 626f7832 0400 74797065 0800 72656164206b6579 0400 66656564 0700 40666565644944 0400 70726576 0600 256d73674944
^^^^                ^^^^          ^^^^          ^^^^                  ^^^^          ^^^^                ^^^^          ^^^^
length              length        length        length                length        length              length        length
function encodeU16(i int) {
assert i >= 0
return uint16(i).encode()
}
```
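
In JavaScript the same helper might look like the sketch below. The upper-bound check is an addition of this example (the pseudocode only asserts `i >= 0`); it guards against lengths that would silently wrap when truncated to 16 bits.

```js
// Encode a non-negative integer below 2^16 as two bytes, least significant first.
function encodeU16 (i) {
  if (!Number.isInteger(i) || i < 0 || i > 0xffff) {
    throw new RangeError('encodeU16 expects an integer in [0, 65535]')
  }
  return Buffer.from([i & 0xff, i >> 8])
}

console.log(encodeU16(7).toString('hex')) // 0700
console.log(encodeU16(258).toString('hex')) // 0201, since 258 = 0x0102
```

Node's built-in `Buffer.prototype.writeUInt16LE` produces the same two bytes and also rejects out-of-range values.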

Alternatively, we could also ditch the keys and just use the values. Using the same data from the previous example, we would get this list:
---

## Nested Data

Later applications may require nested data. Here, we propose two ways to achieve this feature in a backwards-compatible fashion. Again, we stress that because the encoding is not schemaless, new derivations can assign whatever meaning they need to the contents of the buffers. This does not limit the usefulness or security of the simpler, shallow encoding.

### Path-Style Keys

When using the lists to describe key-value sets, the keys can be chosen to represent paths. This requires defining a path delimiter (e.g. '.', '/' or '\0'), or using length-prefixed lists to separate path segments. Then, if two paths begin with the same sequence of path segments, they are in the same compound data structure. The compound data structure itself is named by the shared prefix.
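
As an illustration of path-style keys, the sketch below flattens a nested object into ordered key-value pairs, joining path segments with `'.'`. The function name `toPathPairs` and the sample data are hypothetical, and the sketch assumes that no key contains the delimiter.

```js
// Flatten a nested object into ordered [path, value] pairs.
// Path segments are joined with '.', so keys must not contain '.' themselves.
function toPathPairs (obj, prefix = '') {
  const pairs = []
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? prefix + '.' + key : key
    if (typeof value === 'object' && value !== null) {
      pairs.push(...toPathPairs(value, path)) // descend into the compound value
    } else {
      pairs.push([path, String(value)])
    }
  }
  return pairs
}

const nested = {
  purpose: 'box2',
  ids: { feed: '@feedID', prev: '%msgID' } // placeholders
}

const pathPairs = toPathPairs(nested)
console.log(pathPairs)
```

Here `ids.feed` and `ids.prev` share the prefix `ids`, which names the compound structure they belong to. The resulting pairs can then be turned into a flat list and SLP-encoded as usual.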

### _Recursive Length Prefix (RLP)_ Encoding

RLP is similar to SLP, except that it allows list elements to themselves be lists. If that is the case, the element is first RLP-encoded itself, before the length is taken and it is concatenated.

Example: The list
```
("box2", "read key", "@feedID", "%msgID")
(a, (b, c))
```
That list encodes to
becomes:
```
     b o x 2       r e a d   k e y       @ f e e d I D       % m s g I D
0400 626f7832 0800 72656164206b6579 0700 40666565644944 0600 256d73674944
^^^^          ^^^^                  ^^^^                ^^^^
length        length                length              length
encodeU16(len(a)) || a || encodeU16(4 + len(b) + len(c)) || encodeU16(len(b)) || b || encodeU16(len(c)) || c
```
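
A recursive encoder along these lines could be sketched as follows. This is an illustration only: the name `encodeRLP` is hypothetical, elements are assumed to be either Buffers or arrays, and the length prefix is the same 2-byte little-endian value used by SLP.

```js
function encodeU16 (n) {
  const out = Buffer.alloc(2)
  out.writeUInt16LE(n, 0)
  return out
}

// Elements are either Buffers or (nested) arrays of elements.
function encodeRLP (list) {
  const parts = []
  for (const item of list) {
    // A nested list is encoded first; its encoding is then treated as one element.
    const body = Array.isArray(item) ? encodeRLP(item) : item
    parts.push(encodeU16(body.length), body)
  }
  return Buffer.concat(parts)
}

const a = Buffer.from('a')
const b = Buffer.from('b')
const c = Buffer.from('c')

const nestedHex = encodeRLP([a, [b, c]]).toString('hex')
console.log(nestedHex) // 0100610600010062010063
```

For a flat list this produces the same bytes as SLP. Note that in this sketch a nested list is indistinguishable from a byte string holding the same bytes, so the schema still has to say which elements are lists.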
Note that this is more in line with the idea of using schemas. For example, the schema for the box2 read key dictates that what follows is first a feed ID, and then a message ID. Also, in reality the feed and message IDs would be the binary encoding of the ID.

[Code]: #section-code

[SLP-code]: #slp-encode
[LE-code]: #little-endian-encode
