Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document missing parts of E2E #1284

Merged
merged 27 commits into from
Aug 20, 2018
Merged

Document missing parts of E2E #1284

merged 27 commits into from
Aug 20, 2018

Conversation

Zil0
Copy link
Contributor

@Zil0 Zil0 commented Jun 7, 2018

This is a step towards fixing #501, by merging what has already be written long ago by @richvdh in https://github.com/matrix-org/matrix-doc/tree/drafts/e2e.
Since this branch has long diverged, I had to resort to copy-pasting...

Opening this because I think I will need feedback as I go. Please do not merge until I remove the WIP tag. I think it will need squashing at some point anyway.

Signed-off-by: Valentin Deniaud <valentin.deniaud@inpt.fr>

@Zil0
Copy link
Contributor Author

Zil0 commented Jun 11, 2018

Some specific questions:

  • 9973445 is this correct? if yes, is it clear?
  • e83a955 does this need more detail? Is it the last thing that was missing from the olm section?

@Zil0 Zil0 changed the title [WIP][RFC] Document missing parts of E2E Document missing parts of E2E Jul 18, 2018
@@ -1,9 +1,3 @@
---
allOf:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is not a good solution according to the comment below? But I can't really see why.

}

The type and content of the plaintext message event are given in the payload.

We include the room ID in the payload, because otherwise the homeserver would
be able to change the room a message was sent in.

.. TODO: claimed_keys

Clients must confirm that the ``sender_key`` belongs to the user that sent the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to change the beginning of the sentence, but I don't think it is very clear since it feels weird to say "clients must do something they maybe can't".
How about starting the sentence with "When a device is verified, clients must confirm..."?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to deal with this would be to separate key verification from verifying that it matches result /keys/query. So you could say something like "Clients must confirm that the sender_key matches the keys returned by /keys/query for the given user. This can be done by ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

Clients must confirm that the sender_key and the ed25519 field value under the keys property matches the keys returned by /keys/query for the given user, and verify the signature of the payload. Without this check, a client cannot be sure that the sender device owns the private part of the ed25519 key it claims to have in the Olm payload.
This is crucial when the ed25519 key corresponds to a verified device.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change "and verify" to "and must also verify" since the first sentence is a bit long, and the "must also" will link it with the "Clients must" at the beginning of the sentence. But other than that, it sounds good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

present.
ciphertext:
type:
- object
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need swagger/yaml advice here. This results in the Type column saying object or string, when I want it to say CiphertextInfo or string, documenting the CiphertextInfo object as I would do with

type: object
title: CiphertextInfo
properties:
  ciphertext: ...

How can I do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could try putting title at the same level as type:

type:
  - object
  - string
title: CiphertextInfo

Otherwise I think you'll also need to specify the properties.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zil0, this is not quite about Swagger, rather about JSON Schema which Swagger uses to define object schemas. You need oneOf: https://spacetelescope.github.io/understanding-json-schema/reference/combining.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both for your input.

@turt2live when doing this, CiphertextInfo is simply ignored.

@KitsuneRal it fails with Error reading property None.ciphertext: 'type' in scripts/templating/matrix_templates/units.py:280 when I do:

      ciphertext:
        description: |-
          Normally required. The encrypted content of the event.
        oneOf:
          - type: string
          - type: object
            title: CiphertextInfo
            properties:
              sender_key:
                type: string

type seems like a required key. If I put object as a wild guess, oneOf is ignored.

For the record the first thing I tried fails with mapping values are not allowed here which is thrown by yaml:

      ciphertext:
        description: |-
          Normally required. The encrypted content of the event.
        type:
          - string
          - object
            title: CiphertextInfo
            properties:
              sender_key:
                type: string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was the same as yours except that you can't miss strings and mapping pairs, so it should have been type: object instead of just object. Given that it's type inside type I'm unsure of that works either. As for the case with oneOf, you may try to wrap it in another type - I believe chances should be better then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this! Sadly it doesn't build master for me (I cloned your fork).
local variable 'prop_title' referenced before assignment, and other errors after adding back the else:. I don't feel like debugging this right now :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for throwing dysfunctional code at you. I've fixed this and a couple of other errors, gendoc.py should work fine now (just pull the latest changes from the same branch of mine).

There's a formal problem with this, however: turned out that Swagger/OpenAPI 2 only uses allOf from JSON Schema but not oneOf. OpenAPI 3 fixes this, adding oneOf, anyOf etc. to the list of borrowings from JSON Schema. So basically the options are:

  1. Assume and document an "extension" to OpenAPI 2 (similar to Clarification on third party fields #1414)
  2. Wait until we switch to OpenAPI 3 (watch [WIP] Upgrade to OpenAPI/Swagger 3.0 #1446) - this will require all the scripts to be upgraded anyway to support its new features.
  3. Stay with the limited [ object, string ] syntax (it's not the first such case, by the way - I see something similar in client-server/definitions/push_rule.yaml).

Given that it's a pure formality, I'd probably recommend option 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It works great. Option 1 seems good, will you open a separate PR? I'll rebase this one on your branch for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will, and I guess it's a good idea to rebase your branch on mine (now that mine is rebased on master).

type: string
description: |-
The encryption algorithm to be used to encrypt messages sent in this
room. For example, ``m.megolm.v1.aes-sha2``.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a very general question that applies to other parts too: what is the rule of thumb here for saying "For example"?
The way I see it, there is no other supported encryption protocol. If a client implements one, it would be a spec extension, hence not covered by the spec. Another way of putting this is, if a client implements, say rot13 as an encryption algorithm to be used in rooms, it would conform to the spec, while having a non-working end-to-end encryption behavior in the Matrix world; isn't that a problem?
As a result of those thoughts I would have put at least "Should be" instead of "For example", had it not be written before.
Note that the guide clearly states that only Megolm is supported as a value here https://matrix.org/docs/guides/e2e_implementation.html#id32

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is usually done by using the enum property, which is just an array of allowed values. You can see an example of it at https://github.com/matrix-org/matrix-doc/blob/master/api/client-server/receipts.yaml#L52, or you can find other examples by just grepping for "enum".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, totally what I was looking for, thanks.

@Zil0
Copy link
Contributor Author

Zil0 commented Jul 18, 2018

I'm mostly done with this, modulo eventual omissions and wording issues. With #1420, I think the last missing bit is key sharing, and then #501 can hopefully be closed?

Please answer the questions I left as review comments (even the ones marked Outdated by GitHub).

Fix #589, #590 (see here), #1200, #1171, #1157. Partially fix #1212, as it does not include the S2S part.

@uhoreg uhoreg self-assigned this Jul 18, 2018
@Zil0 Zil0 force-pushed the e2e_doc branch 2 times, most recently from 1c51cc8 to 465fab5 Compare July 29, 2018 15:36
type: integer
description: The Olm message type.
description: |-
Normally required. The encrypted content of the event. Either directly
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "normally"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question.The spec now says "Required. Normally required."

Let's get rid of "Normally required."

"ciphertext": {
"7qZcfnBmbEGzxxaWfBjElJuvn7BZx+lSz/SvFrDF/z8": {
"type": 0,
"body": "AwogGJJzMhf/S3GQFXAOrCZ3iKyGU5ZScVtjI0KypTYrWG0SIKkPAW8wAPNo9PAo915ex5TEpervpLtTmqbS4K9YgEorGiBLOXb2Sxb8vzJkZYBf7x1jVfIWL6LnCb4NGFMaRsyLBSKgBgMKIEfYtTaCQrwqh/grPMxKFFalrBLMlgez5S7W4mDYDARIEAEi8AV09qcmXzqJboIEUt8Z3O/mH4FhAO+wAYYcpe/Pz/1DQDkh8yhucyk8cxT1lJIIBIS22sRxmC6Rc98HH2g2dzHzszpM20/x68T4OurJ57DZXj5Ts/T3MQIYSAL8wkk/nfnnwrytWSSaAZle0cVqw4uOHq9FcjoNx47oug+lXhpCQX3vC50SygbsTy7LalO/qSiAo272FPOwKJ0NQVB/VHOnf1+jL6fl+v4skiJG1t/flFuCyWFyTSKIZRhY3jwIhXE09Tgtg1TFPYlgbGZXaKJBh3wQaiZ0PgS/Elc7qSkwrS67lQhvGytJKDR0uCxlcK4IXgHv0UTMkruZ/Ce5n8B/X7DT1SHHECF25SjuufyCM08IrcuhCvwukI6K8u0ppRJQGyaDAlQJpwn2SgQXuuYN8Czpj710babJAeYsK7QX8/xZrTWt+0navY/azOr7nScIkBBP7frPxWKVP+rJwwA886+0dmuJ9bGID8AwF4BhMzPvQhaN/ZR7ZCRggFJsl+yfhg7E4QsCAFrxmCMxmzm0RA0PmyPiyg280XCFnsi8iUJck3ipyJFemyiw2OdWuphwO96R/XudAoU4jdZ6JFD+F/RvnO7Uf9ftSG0MNw4dr16uabBrxIkOUZvViHJzyJGTNeHXTq9wyhwjDAV1hG7fSx7sqgM5pavbGb2c1rSe3v1K5toUzLV8xH/suqtFS3Ozea/njHBRoARkdwfblSNkydKLFC75b8e338qpjYntcyD5ZA+9rA4r3oJZLl1Qtd2RKoNhjqu4f70W/oSDM7a1FMUAh0Y9NZXQCaDsy/H8sTlaDJH6ypx3CPYJvZdbS7zE7nRsO6/3C3N3OulwQ/b0JtS5dpo7yELY/WMCFlPeuwwUA5n0lghstm+9uCeGqjV38FMe+uiEppKU2PCWtBszRwIts6+JU9nQf5VK7w75BH5QP5nXsgm/CzJuyjH7Vzncg0UrYC4P/YZpAxAZiu7CW8dUa0w4vkhaCRvHwvNWxKeSXnVnAuLH"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't want to keep those super-long strings, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for examples, keep it short.

Copy link
Member

@uhoreg uhoreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good so far, as far as I can tell. My suggestions are mostly minor wording changes.

session_id:
type: string
description: |-
The ID of the session used to encrypt the message. Only present with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe say "Required with Megolm" rather than "Only present with Megolm"?

algorithm:
type: string
description: |-
The encryption algorithm the key in this event is to be used with.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add enum: ["m.megolm.v1.aes-sha2"]


description: |-
This event type is used to exchange keys for end-to-end encryption. Typically
it is encrypted as an ``m.room.encrypted`` event.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also mention that they're typically sent as to-device events.

description: The room where the key is used.
session_id:
type: string
description: The ID of the session holding the key.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The ID of the session that the key is for."?

Messaging algorithm names use the extensible naming scheme used throughout this
specification. Algorithm names that start with ``m.`` are reserved for
algorithms defined by this specification. Implementations wanting to experiment
with new algorithms are encouraged to pick algorithm names that start with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say "Implementations wanting to experiment with new algorithms must be uniquely globally namespaced following Java's package naming conventions." (with the wording copied from https://matrix.org/docs/spec/#events)

description: The Olm message type.
description: |-
Normally required. The encrypted content of the event. Either directly
the encrypted payload in the case of a Megolm event, or a map from the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe replace the last sentence with a pointer to the "Messaging Algorithms" section, because I think it needs more than what fits in the description. Perhaps "For more details, see ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appended rather than replaced, since this last part was used to explain the Type column.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"directly" doesn't really make sense here. You could say "Either the encrypted payload itself, in the case of...", or just get rid of "directly".

.. Note::

When Bob and Alice share a room, with Bob tracking Alice's devices, she may leave
the room and then add a new device. Bob expects to be notified of this change, but
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that "Bob expects to be notified" is accurate or necessary. I'd just say "Bob will not be notified of thes change as he doesn't share a room anymore with Alice."

room again, Bob has an out-of-date list of Alice's devices. In order to address
this issue, Bob's homeserver will add Alice's user ID to the ``changed`` property of
the ``device_lists`` field, thus Bob will update his list of Alice's devices as part
of its normal processing. Note that Bob can also be notified when he stops sharing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "its" in "its normal processing" should probably be changed to "his" as it refers to Bob.

========= ========= =============================================

.. NOTE::

A user is added to ``changed`` everytime they join a room where the client is.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"every time" should be two words, and maybe add ", and vice versa" to the end of this sentence. I don't really understand this paragraph, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see. It's a bit unclear the way it's worded right now, but it's a bit tricky to word correctly. Maybe something like: "For optimal performance, Alice should be added to changed in Bob's sync only when she adds a new device, or when Alice and Bob now share a room but didn't share any room previously. However, for the sake of simpler logic, a server may add Alice to changed when Alice and Bob share a new room, even if they previously already shared a room."

@@ -266,9 +508,18 @@ device_lists DeviceLists Optional. Information on e2e device updates. Note:
Parameter Type Description
========= ========= =============================================
changed [string] List of users who have updated their device identity keys
since the previous sync response.
since the previous sync response, or who have just joined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bob will also be added to Alice's the changed set if Alice joins a new room with Bob, not just if Bob joins a new room with Alice. So I'd change it to: "List of users who have updated their device identity keys, or who now share an encrypted room with the client since the previous sync response."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, don't think I ever noticed this :/


* Curve25519 for the initial key agreement.
* HKDF-SHA-256 for ratchet key derivation.
* Curve25519 for the DH ratchet.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if "DH ratchet" is the right term.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://signal.org/docs/specifications/doubleratchet/#diffie-hellman-ratchet does describe the root key ratchet as a "Diffie-Hellman ratchet", so I'd say it's a reasonable term.

On the other hand maybe it would be clearer to say "Curve25519 for the root key ratchet" / "HMAC-SHA-256 for the chain key ratchet".

<shrug> I don't suppose it really matters. The point is simply to list the encryption primitives so that if someone finds a vulnerability in SHA-256 tomorrow you can see what's affected.

@Zil0
Copy link
Contributor Author

Zil0 commented Aug 16, 2018

Applied your suggestions and cleaned-up commit history.

@richvdh richvdh self-requested a review August 16, 2018 12:25
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks awesome! A couple of niggles it would be great to fix if you get a minute

type: integer
description: The Olm message type.
description: |-
Normally required. The encrypted content of the event. Either directly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question.The spec now says "Required. Normally required."

Let's get rid of "Normally required."

description: The Olm message type.
description: |-
Normally required. The encrypted content of the event. Either directly
the encrypted payload in the case of a Megolm event, or a map from the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"directly" doesn't really make sense here. You could say "Either the encrypted payload itself, in the case of...", or just get rid of "directly".

Normally required. The encrypted content of the event. Either directly
the encrypted payload in the case of a Megolm event, or a map from the
recipient Curve25519 identity key to ciphertext information in the case
of an Olm event. For more details, see `Messaging Algorithm Names`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want "Messaging Algorithms" rather than "Messaging Algorithm Names".

"origin_server_ts": 1476648761524,
"sender": "@example:localhost",
"type": "m.room.encrypted",
"unsigned": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of the other event examples don't include "unsigned", and I think that's a good pattern to follow, because it will only appear when you use certain APIs to retrieve the event.

Bob). She will not send sensitive material to that device, and cannot trust
messages apparently received from it.

* Alice may choose to skip the device verification process. She is not be able
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"She is not able"


* Curve25519 for the initial key agreement.
* HKDF-SHA-256 for ratchet key derivation.
* Curve25519 for the DH ratchet.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://signal.org/docs/specifications/doubleratchet/#diffie-hellman-ratchet does describe the root key ratchet as a "Diffie-Hellman ratchet", so I'd say it's a reasonable term.

On the other hand maybe it would be clearer to say "Curve25519 for the root key ratchet" / "HMAC-SHA-256 for the chain key ratchet".

<shrug> I don't suppose it really matters. The point is simply to list the encryption primitives so that if someone finds a vulnerability in SHA-256 tomorrow you can see what's affected.


The type and content of the plaintext message event are given in the payload.

We include the room ID in the payload, because otherwise the homeserver would
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only actually true if we were sending in-room events; if it's a to-device event it's omitted. We should probably say that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're never sending Olm in-room event, I think it would just be confusing. I'll move the sentence to the Megolm subsection where it should have been in the first place.

A user is added to ``changed`` every time they join a room where the client is,
and vice versa. Instead, for optimal performance, ``changed`` could be populated
only when a user starts sharing a room with the client on the condition they
didn't before. This may be hard to garantee, hence this behavior is not enforced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guarantee.

And "behaviour", since we prefer British English

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change this according to what @uhoreg has written here #1284 (comment).

@Zil0
Copy link
Contributor Author

Zil0 commented Aug 18, 2018

Rebased on master and made an attempt at newsfragment.

Apparently newsfragments as specified in Towncrier's doc use issue numbers but we are using PR numbers, so issue_format in https://github.com/matrix-org/matrix-doc/blob/master/changelogs/client_server/pyproject.toml should probably be changed to /pull/.

@turt2live
Copy link
Member

Looks like your feature fragment got an rst extension.

A bit of background on why /issues/ was chosen: in the unlikely/rare case we need to reference something by issue then we don't need to go spelunking in the depths of the build process. Github does support an automatic redirect, which is being (ab)used here.

@richvdh richvdh merged commit fe3f3b6 into matrix-org:master Aug 20, 2018
August 2018 r0 automation moved this from Reviewer approved to Done (this list will be incomplete) Aug 20, 2018
@richvdh
Copy link
Member

richvdh commented Aug 20, 2018

\o/

RiotTranslateBot pushed a commit to RiotTranslateBot/matrix-doc that referenced this pull request Aug 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
August 2018 r0
  
Done (this list will be incomplete)
Development

Successfully merging this pull request may close these issues.

None yet

5 participants