Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Info Encoding Draft Spec #6

Merged
merged 4 commits into from
Feb 3, 2020
Merged

Info Encoding Draft Spec #6

merged 4 commits into from
Feb 3, 2020

Conversation

keks
Copy link
Contributor

@keks keks commented Jan 28, 2020

mix: context here is we're looking for a way to encode infos ... what's the encode function?

function MakeDeriver(encode, feedId, prevMsgId) {
   return func(key, info, len) {
       data = copy(info)
       data.feedId = feedId
       data.prevMsgId = prevMsgId
       return HKDF.Expand(key, encode(data), len)
   }
}

Derive = MakeDeriver(standardEncode, feedId, prevMsgId)

I first sketched

  • the "flat" length-prefixed encoding (Shallow Length Prefix (SLP) encoding)
  • nested length-prefixed encoding without type hints (is this a buffer or a list) (Recursive Length Prefix (RLP) encoding)
  • nested length-prefixed encoding with type hints

I realized that since everyone who does the derivation knows which derivations they are supposed to do, there is a sort of (up until now) implicit schema.

After thinking about it for a while I could not come up with a reason why whoever performs the derivations would not have such a schema. For our base case, the schema is defined in the core box2 spec. For extensions, it will be in the respective extension's spec.

Therefore, I realized that "type confusion" is not possible, so I scratched the version with type hints.

Since the nested/recursive encoding is just a special case of the flat one, where the buffer contents has a special meaning (i.e. "is of a special type"), I realized that this is safe to use but currently we don't need it.

So I propose using the "flat" SLP for now, and use RLP is extensions where we really need it. After all, it only changes the contents of the buffers in the list, not the encoding of the outer list itself.

Open questions:

  • In the Requirements section there is a clunky sentence that needs to be rephrased a bit.
  • I just guessed big endian. I am not aware of any real pro/con for either. Any ideas/opinions?
  • I only now see that there are some rendering issues, e.g. the <a> tag for the code heading didn't work. I'll fix that later.
  • I'd be happy if you could review this! I'm sure it's still pretty rough.

@mixmix
Copy link
Member

mixmix commented Jan 30, 2020

I modified this to add context at the start of your message

info_encoding.md Outdated Show resolved Hide resolved
info_encoding.md Outdated Show resolved Hide resolved
info_encoding.md Outdated Show resolved Hide resolved
info_encoding.md Outdated Show resolved Hide resolved
info_encoding.md Outdated Show resolved Hide resolved
@mixmix
Copy link
Member

mixmix commented Jan 30, 2020

@keks this read fairly clearly to me:

✔️ I'm confident I'd know how to implement this
❌ I don't have the cryptography context to fully review this
❓ I think it could benefit from an example with actual infos

e.g.

to encode the infos for a secret the schema might be:

( feed_id, previous_msg_id, label )

where each of these are buffers, the "label" being a "utf8" encoded string in the form of a Buffer

- removed newlines for prettier rendering
- better wording around double negation
- clarified where custom applications specify their own schemas
- removed all references to endiannes except in Notations and
  Definitions section, and in code
- don't use l as a variable because it often looks like 1 or I
- add section on how to use lists for key-value lists
- add examples and more explanations around features they use
@keks
Copy link
Contributor Author

keks commented Jan 30, 2020

Hey @mixmix, thanks for the review! There are two items I didn't want to resolve without checking back with you, would be cool if you could confirm. Also I added a lot of new stuff, could you review that as well?

info_encoding.md Outdated
assert i >= 0
return uint16(i).encode()
}
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pseudocode confuses me, because it's too far from languages I'm familiar with. (i.e. I might not be the best to review this).
Are you just defining UInt16LE (unsigned integer, 16 bit, little-endian)?
Isn't this fairly common?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you say method () and function ()

Copy link
Member

@mixmix mixmix Jan 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be peculiar, but I personally prefer a "pictionary style" approach to introducing new ideas.

  1. start fuzzy, big picture

"we store things as a concatenated list, so order matters"

  1. add more detail

"we prefix each thing we concatenate with it's length!"

  1. get real specific

the encoding of that length is UInt16LE , here's exactly what that means, and here's what it all looks like in details when you encode this example string with this example "schema" (the definition of what's in the list and the order)

This is just a style-comment. It's a style I find useful for teaching and introducing new ideas (I was a teacher for quite a few years). We don't have to change anything, this is pretty clear. This plus test-vectors will lock it all in

info_encoding.md Outdated
This is enough for most simple operations, and can be extended to
key-value stores.
We require the format to support lists of buffers. This is enough for most simple operations, and can be extended to key-value stores.
A property of many structured data encodings is schemalessness. To get a simpler encoding, we do not require this property. Therefore, derivation steps need to define a schema for the encoding of the info field. For the core derivations, this is given. Custom applications that are defined in extensions need to provide this as well.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clear enough.
I think let's just be specific about the core examples ... i.e. "see DeriveSecret"

info_encoding.md Outdated
is supposed to be used for.
If they are not, the adversary got them to execute their code already,
which means that particular user is in trouble anyways (REPHRASE).
Note that schemalessness would not help in this context, because all users who derive a key agree, by following the specification, on what that key is supposed to be used for up front, so even in a schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incomplete sentence

info_encoding.md Outdated Show resolved Hide resolved
info_encoding.md Outdated Show resolved Hide resolved
info_encoding.md Outdated Show resolved Hide resolved
info_encoding.md Outdated Show resolved Hide resolved
info_encoding.md Outdated Show resolved Hide resolved
info_encoding.md Outdated Show resolved Hide resolved
info_encoding.md Outdated Show resolved Hide resolved
info_encoding.md Outdated Show resolved Hide resolved
@mixmix
Copy link
Member

mixmix commented Jan 31, 2020

Summary of review

@keks this is looking a lot clearer to me.
I'll put a 🔥 by the things which I think are really important to change before merge.

I think once we've got this encoding we just need to add a clear spec about how we use it in DeriveSecret and we're ready to go!


Really appreciate the detail you've put into this Keks.
Great work (ノ´ヮ´)ノ*:・゚✧

@mixmix mixmix merged commit 5ce16d2 into first_pass Feb 3, 2020
@mixmix
Copy link
Member

mixmix commented Feb 3, 2020

@keks I made several small changes I'd recommended, and have linked this in to README.md.

You can see the changes I made in the commit history above. Hope that's all good :)

@mixmix mixmix deleted the info_encoding branch February 3, 2020 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants