Add end-to-end encryption API #13820

Merged
merged 1 commit into from
Jun 2, 2020

@Gargron (Member) commented May 21, 2020

Fix #1093

A set of APIs required for the double ratchet encryption algorithm, specifically the Olm implementation developed by Matrix -- but it should be roughly the same as libsignal. An additional layer on top of it is so-called message franking, which allows encrypted messages to be reported to content moderators without compromising keys or message contents ahead of time while also preventing fake reports.

Development of E2EE capabilities into the web UI is not in scope of this PR.

REST API overview

To support Olm, the following APIs are required:

  • Uploading keys for a device (current app)
  • Querying available devices of people you want to establish a session with
  • Claiming a pre-key (one-time-key) for each device you want to establish a session with
  • Sending encrypted messages directly to specific devices of other people

| Method | Description |
|---|---|
| POST /api/v1/crypto/keys/upload | Register the current app as a device by submitting device with attributes device_id (securely generated random string or number), name (human-readable description), fingerprint_key (public Ed25519 key) and identity_key (public Curve25519 key), as well as an array of one_time_keys, each having the attributes key_id, key (Curve25519 key) and signature (the key signed with the device's Ed25519 key) |
| POST /api/v1/crypto/keys/query | Fetch devices for accounts specified by id (array supported). Returns an array of results, each result having the account's id and a devices attribute. Each device has device_id, name, fingerprint_key and identity_key attributes |
| POST /api/v1/crypto/keys/claim | Fetch one-time keys (also known as pre-keys) for each given device with attributes account_id and device_id (array supported). Returns an array of results, each result having the attributes account_id, device_id, key_id, key and signature. You should verify the signature with the expected device's Ed25519 (fingerprint) key |
| POST /api/v1/crypto/delivery | Send an encrypted message directly to each given device with attributes account_id, device_id, type, body and hmac. The type is 0 when it's a pre-key message (used to establish a new session) and 1 otherwise. For hmac, see the section on message franking below |
| GET /api/v1/crypto/encrypted_messages | Fetch encrypted messages addressed to the current app (device). Returns an array of results, each having the attributes id, account_id, device_id, type, body, digest, and message_franking. Supports pagination with pagination headers |
| POST /api/v1/crypto/encrypted_messages/clear | Remove stored encrypted messages for the current app (device) whose id is older than or equal to up_to_id. You should do this whenever you're done processing messages client-side |

All of the above methods require the new crypto OAuth scope.
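
As a rough illustration of the upload call described above, here is a minimal Python sketch that builds (but does not send) the request. The attribute names come from the table; the JSON nesting under a `device` object, the instance URL, the token and the key strings are placeholders assumed for the example, not confirmed details of the implementation.

```python
import json
import secrets
import urllib.request

BASE_URL = "https://mastodon.example"      # hypothetical instance
ACCESS_TOKEN = "token-with-crypto-scope"   # placeholder OAuth token


def build_key_upload_request(name, fingerprint_key, identity_key, one_time_keys):
    """Build (but do not send) a POST /api/v1/crypto/keys/upload request."""
    payload = {
        "device": {
            # Securely generated random identifier for this app install.
            "device_id": str(secrets.randbelow(2**31)),
            "name": name,
            "fingerprint_key": fingerprint_key,  # public Ed25519 key
            "identity_key": identity_key,        # public Curve25519 key
        },
        # Each entry carries key_id, key (Curve25519) and signature.
        "one_time_keys": one_time_keys,
    }
    return urllib.request.Request(
        BASE_URL + "/api/v1/crypto/keys/upload",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + ACCESS_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_key_upload_request(
    name="Example client",
    fingerprint_key="<base64 Ed25519 public key>",
    identity_key="<base64 Curve25519 public key>",
    one_time_keys=[
        {"key_id": "AAAF", "key": "<base64 key>", "signature": "<base64 signature>"}
    ],
)
# Sending it would be: urllib.request.urlopen(request)
```

The real keys and signatures would come from the Olm library on the client; they are left as placeholders here.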

Additionally, the streaming API now delivers encrypted_message events right in the main user stream. Note that you only receive messages addressed to the connected app (device)!
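
A client that polls instead of streaming would presumably fetch pending messages, process them, and then clear them with up_to_id. The sketch below shows that loop with the HTTP calls injected as plain callables, so the function names and the assumption that ids are numeric strings are illustrative only.

```python
def drain_encrypted_messages(fetch_page, handle_message, clear_up_to):
    """Process pending messages once, then ask the server to drop them.

    fetch_page()      -> list of message dicts (GET .../encrypted_messages)
    handle_message(m) -> decrypts and stores one message client-side
    clear_up_to(id)   -> POST .../encrypted_messages/clear with up_to_id
    """
    messages = fetch_page()
    if not messages:
        return None
    for message in messages:
        handle_message(message)
    # Assumption: ids are numeric strings, so compare them as integers.
    newest_id = max((m["id"] for m in messages), key=int)
    # Clear only after every message in the batch has been handled.
    clear_up_to(newest_id)
    return newest_id


# Usage with in-memory stand-ins for the three API calls.
inbox = [{"id": "9", "body": "..."}, {"id": "10", "body": "..."}]
seen, cleared = [], []
result = drain_encrypted_messages(lambda: inbox, seen.append, cleared.append)
```

Clearing after processing rather than before means a crash mid-batch leaves the messages on the server for a retry.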

Message franking

The sending client generates a new HMAC key and includes it in the to-be-encrypted message. It then computes an HMAC-SHA256 value over the to-be-encrypted message with that key and sends it along with the encrypted message. The server, when forwarding the encrypted message to the recipient, composes a metadata summary for the message that includes the HMAC-SHA256 value, and then signs it using its own key. This metadata summary is forwarded to the recipient along with the encrypted message itself, after which the server discards it.

Upon reception of the encrypted message, the receiving client verifies the decrypted contents match the HMAC-SHA256 value from the metadata summary using the HMAC key provided in the decrypted contents. If they don't match, the message is discarded.

Should the receiving client desire to report the encrypted message and reveal its contents to the content moderators, the metadata summary is sent along with the report. The server can then verify its own signature on it and trust that the revealed contents are authentic.
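
The three steps above can be sketched in Python using only the standard library. This is a simplified model under stated assumptions: encryption is left out entirely (Olm would handle it), the message format and function names are invented for the example, and the server's "signature" is modelled as an HMAC under a server-only secret, which is a stand-in rather than what the PR necessarily uses.

```python
import hashlib
import hmac
import json
import secrets

SERVER_SECRET = secrets.token_bytes(32)  # known only to the server


def sender_frank(text):
    """Sender: embed a fresh HMAC key in the plaintext and compute its HMAC."""
    franking_key = secrets.token_bytes(32)
    plaintext = json.dumps({"text": text, "franking_key": franking_key.hex()}).encode()
    digest = hmac.new(franking_key, plaintext, hashlib.sha256).hexdigest()
    return plaintext, digest  # plaintext gets encrypted; digest is sent alongside


def server_frank(sender_id, recipient_id, digest):
    """Server: bind the digest to metadata and authenticate the summary."""
    summary = json.dumps(
        {"from": sender_id, "to": recipient_id, "digest": digest}, sort_keys=True
    )
    tag = hmac.new(SERVER_SECRET, summary.encode(), hashlib.sha256).hexdigest()
    return {"summary": summary, "tag": tag}


def recipient_check(plaintext, digest):
    """Recipient: recompute the HMAC with the key found inside the plaintext."""
    franking_key = bytes.fromhex(json.loads(plaintext)["franking_key"])
    expected = hmac.new(franking_key, plaintext, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, digest)


def server_verify_report(plaintext, franking):
    """Server: accept a report only if its own tag and the HMAC both check out."""
    expected_tag = hmac.new(
        SERVER_SECRET, franking["summary"].encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected_tag, franking["tag"]):
        return False
    return recipient_check(plaintext, json.loads(franking["summary"])["digest"])


plaintext, digest = sender_frank("hello")
franking = server_frank("alice", "bob", digest)
ok_receive = recipient_check(plaintext, digest)
ok_report = server_verify_report(plaintext, franking)
# A report whose revealed contents were altered must be rejected.
forged = server_verify_report(plaintext.replace(b"hello", b"hullo"), franking)
```

Note that the server never needs to store the message or the summary: the tag lets it recognise its own summary when a report comes back.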

Federation

⚠️ Requires design of new JSON-LD vocabulary.

An example of an actor's devices collection, linked to through a devices property:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "toot": "http://joinmastodon.org/ns#",
      "Device": "toot:Device",
      "Ed25519Signature": "toot:Ed25519Signature",
      "Ed25519Key": "toot:Ed25519Key",
      "Curve25519Key": "toot:Curve25519Key",
      "EncryptedMessage": "toot:EncryptedMessage",
      "publicKeyBase64": "toot:publicKeyBase64",
      "deviceId": "toot:deviceId",
      "claim": {
        "@type": "@id",
        "@id": "toot:claim"
      },
      "fingerprintKey": {
        "@type": "@id",
        "@id": "toot:fingerprintKey"
      },
      "identityKey": {
        "@type": "@id",
        "@id": "toot:identityKey"
      },
      "devices": {
        "@type": "@id",
        "@id": "toot:devices"
      },
      "messageFranking": "toot:messageFranking",
      "messageType": "toot:messageType",
      "cipherText": "toot:cipherText"
    }
  ],
  "id": "http://localhost:3000/users/admin/collections/devices",
  "type": "Collection",
  "totalItems": 1,
  "items": [
    {
      "deviceId": "11119",
      "type": "Device",
      "name": "React",
      "claim": "http://localhost:3000/users/admin/claim?id=11119",
      "fingerprintKey": {
        "type": "Ed25519Key",
        "publicKeyBase64": "8KvyHwKt3vMlwSx0HeVZL3juW+plYG0yhoWq/c7GDVc"
      },
      "identityKey": {
        "type": "Curve25519Key",
        "publicKeyBase64": "oBz4RVztr3HOjk1LrcuynJwPJO6Bnya1vCJG3KPU13Q"
      }
    }
  ]
}

An example of a one-time key returned when POSTing to the claim endpoint of a device:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      // ...
    }
  ],
  "keyId": "AAAF",
  "type": "Curve25519Key",
  "publicKeyBase64": "...",
  "signature": {
    "type": "Ed25519Signature",
    "signatureValue": "..."
  }
}

An example of an encrypted message sent to an actor's inbox:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "toot": "http://joinmastodon.org/ns#",
      "Device": "toot:Device",
      "Ed25519Signature": "toot:Ed25519Signature",
      "Ed25519Key": "toot:Ed25519Key",
      "Curve25519Key": "toot:Curve25519Key",
      "EncryptedMessage": "toot:EncryptedMessage",
      "publicKeyBase64": "toot:publicKeyBase64",
      "deviceId": "toot:deviceId",
      "claim": {
        "@type": "@id",
        "@id": "toot:claim"
      },
      "fingerprintKey": {
        "@type": "@id",
        "@id": "toot:fingerprintKey"
      },
      "identityKey": {
        "@type": "@id",
        "@id": "toot:identityKey"
      },
      "devices": {
        "@type": "@id",
        "@id": "toot:devices"
      },
      "messageFranking": "toot:messageFranking",
      "messageType": "toot:messageType",
      "cipherText": "toot:cipherText"
    }
  ],
  "id": "http://localhost:3000/f41c9aae-92fc-48bf-8d87-0606197f340a",
  "type": "Create",
  "actor": "http://localhost:3000/users/admin",
  "published": "2020-05-28T20:59:09Z",
  "to": "http://localhost:3000/users/velda_moriette0",
  "cc": null,
  "object": {
    "type": "EncryptedMessage",
    "messageType": 0,
    "cipherText": "AwogNcm0y1HR42K07iyNs37uTlQ7jrX3n7/nOyWDakQ6kGwSIHmijEqm5hI1wN4mDIkeUIj4LnkKJPqAjZ3hr4VWAt0UGiCgHPhFXO2vcc6OTUuty7KcnA8k7oGfJrW8Ikbco9TXdCI/AwogBOPnpEQySq0xC12nSzjobyQkdczyrbGPvZbaHrIxB0QQACIQf0yOM54nfuBcbWWW0nEA1IemtsNf0wpz",
    "digest": {
      "type": "Digest",
      "digestAlgorithm": "http://www.w3.org/2000/09/xmldsig#hmac-sha256",
      "digestValue": "5f6ad31acd64995483d75c7cf1e008c411d94bf8514a9fd4ef90adcf9d4953d9"
    },
    "messageFranking": "...",
    "attributedTo": {
      "type": "Device",
      "deviceId": "11119"
    },
    "to": {
      "type": "Device",
      "deviceId": "11876"
    }
  }
}

@Gargron added the api, work in progress and activitypub labels May 21, 2020
@Gargron force-pushed the feature-e2ee branch 2 times, most recently from 061bc78 to 1849f4b May 22, 2020 14:50
@ClearlyClaire
Contributor

Some notes, more on the API design and general concern than the implementation itself since I know you're far from done

Crypto protocol

I think something like Signal's protocol is the way to go, and reusing libsignal or libolm does make sense, so this PR definitely goes in the right direction.

Looking at Matrix's protocol, there is one thing I am a bit worried about: it seems very easy for an attacker to prevent any new session to an offline client (and severely hinder new sessions to an online client) by exhausting the pool of OTKs. From what I have read, Signal avoids this by using a backup PreKey to be used once the pool is exhausted. While that seems safe from a crypto standpoint to me, it may need special care and it may make sense to investigate how Signal is doing it exactly. Anyway, that “backup PreKey” solution seems easy to add a posteriori without breaking compatibility with older clients.
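
To make the “backup PreKey” idea concrete, here is a small Python sketch of a server-side key pool that hands out one-time keys until they run out and then falls back to a reusable key. The class, its method and the key strings are hypothetical; this is not how Signal or this PR implements it, only an illustration of the behaviour being discussed.

```python
from collections import deque


class PreKeyPool:
    """Server-side store of one device's claimable keys (illustrative only)."""

    def __init__(self, one_time_keys, fallback_key=None):
        self._one_time_keys = deque(one_time_keys)
        self._fallback_key = fallback_key

    def claim(self):
        """Hand out a one-time key, or the reusable fallback once the pool is empty."""
        if self._one_time_keys:
            return {"key": self._one_time_keys.popleft(), "fallback": False}
        if self._fallback_key is not None:
            return {"key": self._fallback_key, "fallback": True}
        return None  # without a fallback, no new session can be started


pool = PreKeyPool(["otk-1", "otk-2"], fallback_key="backup-key")
claims = [pool.claim() for _ in range(4)]
```

Without the fallback, the third claim would return nothing, which is the exhaustion scenario described above.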

According to the paper you linked, it seems that message franking would not be strictly needed to achieve the reporting functionality:

This means that in Facebook messenger the underlying encryption already suffices as a single-opening-secure committing AEAD scheme. Moreover, due to ratcheting [14, 27, 49] Signal never reuses a symmetric key. Thus Facebook could have avoided the dedicated HMAC commitment. Admittedly they may be uncomfortable — for reason of psychological acceptability — with an architecture that sends decryption keys to Facebook despite the fact that this represents no harm to future or past communications

However, I don't think libolm or other implementations provide facilities to exploit this. Furthermore the protocol you described is easy to understand, to implement and to compose with any E2EE protocol, so I'm much more confident in having clients and servers perform those extra steps than having clients implement things deeper in the crypto stack without messing up. So I'm definitely for that implementation of message franking.

Performance and federation concerns

One thing I am worried about is performance, given the increased number of cross-instance requests, some of which need to be synchronous. There are three things that could each cause a big increase in requests across the network here:

Starting an encrypted session

Starting an encrypted session between two devices (which would realistically happen a few times per device pair) will require synchronously claiming a one-time PreKey from the instance hosting the recipient.

Federation-wise, I see two strategies for that:

  1. have the sender claim the key from their instance, relaying the request
    Doing so would enable the receiver's server to rate-limit or block by requester account (since the request could be signed).
  2. have the sender claim the key from the receiver's instance directly
    This would cut on a costly cross-instance request (which would be especially costly to Mastodon's synchronous handling of requests), but would prevent the receiver's instance from rate-limiting or blocking by requester account, unless we find a way to authenticate that request on the transport level, in which case the receiving instance will be able to learn about the IP address of the sender.
    It would also need the REST API to include a claiming endpoint for remote accounts.

Querying device list

A session needs to be established for basically every pair of devices, but the list of a user's devices can change at any point. There's probably a trade-off here between querying every user's device list before posting any message, and having caches on instances and clients.

The same question as in “Starting an encrypted session” applies here: should the client fetch the list directly, or go through its own instance?

Delivering messages

Instead of having an instance deliver the message once to every instance that has recipients as is currently done, it will need to send the message once per device of each recipient, which may be way more. But there's not really any way around that.
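
The fan-out can be sketched as follows: one delivery entry per recipient device, each with its own ciphertext because every device pair has its own session. The field names match the delivery endpoint in the PR description; the function, the callback signature and the sample values are assumptions made for the example.

```python
def build_deliveries(recipients, encrypt_for_device):
    """Return one delivery entry per recipient device.

    recipients: {account_id: [device_id, ...]}
    encrypt_for_device(account_id, device_id) -> (type, body, hmac)
    """
    deliveries = []
    for account_id, device_ids in recipients.items():
        for device_id in device_ids:
            message_type, body, hmac_value = encrypt_for_device(account_id, device_id)
            deliveries.append({
                "account_id": account_id,
                "device_id": device_id,
                "type": message_type,
                "body": body,
                "hmac": hmac_value,
            })
    return deliveries


# Two recipients with three devices in total yield three deliveries.
recipients = {"1": ["11119"], "2": ["11876", "11877"]}


def fake_encrypt(account_id, device_id):
    # Stand-in for a per-session Olm encryption call.
    return 0, "ciphertext-for-" + device_id, "hmac-value"


deliveries = build_deliveries(recipients, fake_encrypt)
```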

Media encryption

The current API proposal does not seem to include anything specific to encrypting media. There is probably some thought to be given here, especially since inlining media in the encrypted message would be expensive, and linking to an encrypted media with a symmetric key may not present the same binding qualities for the media as message franking does for the message's content (e.g. https://www.sjoerdlangkemper.nl/2019/11/20/message-franking/ to see an example of how one can mess that up)

Message archiving

One thing that has been completely left out by your proposal—and that's understandable, it is orthogonal to the crypto protocol itself—is message archiving across devices. Indeed, most users expect to be able to see even previous communication from newly-installed devices, which is not possible with E2EE.

Different protocols/implementations have chosen to provide this functionality either through server-side archiving or client-to-client communication of keys and messages.

This should be given some thought as well so that we have a somewhat secure scheme to do that without every client reimplementing its own thing.

Phasing out current DMs

It's not about the crypto protocol at all, but once E2EE is available in Mastodon, current DMs should be discouraged, but I don't think they should ever be completely removed (even though I see the appeal of having less filtering to do and possibly mess up when deciding which toots to show to whom, though). Indeed, clients may take a lot of time to implement E2EE properly, and some users may not use Mastodon in a way that makes E2EE possible to them (e.g., only using it from private browsing sessions). Also, direct messages are just, like, the most simple use of ActivityPub, and I don't think we should break compatibility on such a fundamental level.

Needless to say, we shouldn't drop current messages without warning and without giving the user a chance to back them up.

Instead, E2EE should be promoted, and it should be made perfectly clear that DMs are not encrypted.

I don't know what would be best and less confusing, but I have a few ideas:

  • put unencrypted DMs back into timelines like they used to be
  • change the DM warning to make it more scary or whatnot
  • have the “conversation view” only for E2EE encrypted messages, with a first-use banner telling what they are, and some indication that the communications are encrypted

@lambadalambda

Great start at implementing an e2ee infrastructure. Olm seems like a good choice. I'm indifferent to the message franking, especially if it uses ld-sigs, which we don't use at all, but it seems that it could be made optional, at least on federation.

A few comments on your post, thibg

Starting an encrypted session between two devices (which would realistically happen a few times per device pair) will require synchronously claiming a one-time PreKey from the instance hosting the recipient.

I wonder if having a synchronous REST endpoint for this makes sense, when the underlying action (asking another server for a key) is asynchronous. I think it might be more sensible to make this asynchronous in the client API as well: for example, you'd request a key and could then either poll for it or get a notification when it's ready.

One thing that has been completely left out by your proposal—and that's understandable, it is orthogonal to the crypto protocol itself—is message archiving across devices. Indeed, most users expect to be able to see even previous communication from newly-installed devices, which is not possible with E2EE.

Megolm implements this, but it's not a double ratchet system, just a single ratchet. You can give other devices your known history by sending them the ratchet from some point in time, the device can then read the messages from that point in time on.
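
A toy illustration of why sharing the ratchet from some point only reveals messages from that point on: a one-way hash ratchet in Python. This is a deliberately simplified single SHA-256 chain, not Megolm's actual ratchet construction, and the function names and labels are made up for the example.

```python
import hashlib


def advance(state):
    """Step the ratchet forward; SHA-256 is one-way, so it cannot be stepped back."""
    return hashlib.sha256(b"ratchet" + state).digest()


def message_key(state):
    """Derive the key for the message at the ratchet's current position."""
    return hashlib.sha256(b"message" + state).digest()


def keys_from(state, count):
    """Message keys for `count` consecutive positions starting at `state`."""
    keys = []
    for _ in range(count):
        keys.append(message_key(state))
        state = advance(state)
    return keys


initial = hashlib.sha256(b"demo seed").digest()
all_keys = keys_from(initial, 5)

# Share the ratchet as of position 2 with a newly added device.
shared = advance(advance(initial))
new_device_keys = keys_from(shared, 3)
```

The new device derives the same keys as the original one for positions 2 to 4, but has no way to compute the keys for positions 0 and 1.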

The current API proposal does not seem to include anything specific to encrypting media.

I'd also be interested in having some way to do this. Maybe just encrypting the media with the same key as the message before upload would be fine.

Instead of having an instance deliver the message once to every instance that has recipients as is currently done, it will need to send the message once per device of each recipient, which may be way more. But there's not really any way around that.

This really depends on how you implement encrypted message sending UX. When I talked with gargron about this, I mentioned that Telegram has e2ee chats that are always just device-to-device, greatly simplifying key management and message spread. So you can have a chat between your desktop and your friend's phone, but if you want a chat between your phone and their phone, that's a second chat.

Similarly, LINE (and I think whatsapp as well) only allow one device at a time for any kind of communication. Things like the LINE or whatsapp desktop app 'tunnel' through your phone. This is a bit awkward, but again, it makes key management much easier.

As a sidenote, we are currently implementing a new activity type specifically for chat messages (https://git.pleroma.social/pleroma/pleroma/-/merge_requests/2429), that defines more explicit rules for addressing than our current 'Note's do, so many of the weird things that can happen with DMs are not possible anymore. One key restriction is that you can only have one recipient in the to field and no other addressing (group chat is still possible, please see the PR).

With a setup like that, implementing a Telegram-style device-to-device chat becomes really easy, by addressing not the user directly, but a device (which would need to become an AP object and part of a collection owned by the user). Especially for a first iteration, a system like this seems to me to be easier to implement and also to understand (for the end user), so at least I'd love to work on that first and then, in a second step, see if there's not a better mechanism for group chats, like Megolm (or the similar system Signal has, forgot the name).

I'm currently thinking about how to shape the AP objects for all of this. @Gargron you said that you had some thought about this already, right? Can you post them here?

@ClearlyClaire
Contributor

Great start at implementing an e2ee infrastructure. Olm seems like a good choice. I'm indifferent to the message franking, especially if it uses ld-sigs, which we don't use at all, but it seems that it could be made optional, at least on federation.

I am very uncomfortable with message franking being optional. It being always available means the client can just reject messages with missing/broken franking and does not have to handle other cases.

I don't see why message franking would use LD-sigs, it really doesn't have to. Let me get a few things straight regarding message franking:

  • the metadata summary can be in any arbitrary format, and can even be encrypted
  • the signature on that metadata + HMAC does not have to be verifiable by anyone else than the server which made it, it is purely for the server's consumption, to verify that it indeed saw the reported message corresponding to the signed metadata, without having to actually store it
  • the only thing a client (receiver) must be able to do is check the HMAC
  • in a federated setting, i'd believe there would be two sets of signed metadata summaries: one for the sending instance and one for the receiving one, the end client would forward both with the HMAC key and the cleartext, the instance would retrieve its own signed metadata and forward the other
  • all in all, it doesn't offer a solid way to break deniability to a third-party if you don't trust the server (and the receiver obviously has to report the message in the first place)

However, I guess it could be made optional, in which case users could choose to accept messages they will be unable to report, but I don't think it's a worthy thing to pursue. (And, if my understanding of the paper is correct, that wouldn't actually bring you more deniability in the case of Olm: reporting messages to the platform could still be possible, just in a more complicated and error-prone way.)

A few comments on your post, thibg

Starting an encrypted session between two devices (which would realistically happen a few times per device pair) will require synchronously claiming a one-time PreKey from the instance hosting the recipient.

I wonder if having a synchronous REST endpoint for this makes sense, when the underlying action (asking another server for a key) is asynchronous. I think it might be more sensible to make this asynchronous in the client API as well: for example, you'd request a key and could then either poll for it or get a notification when it's ready.

Would the underlying action be asynchronous? In Signal, the thing is synchronous since there's only one logical authority to ask. In XMPP it's asynchronous as much as any query/reply is in this protocol. In Matrix I think it's a single HTTP query that returns you the result.

If Mastodon were to handle such queries as normal AP payloads to inboxes, it may take seconds or even minutes for the key to be sent back, depending on the load, which would lead to terrible user experience.

Instead of having an instance deliver the message once to every instance that has recipients as is currently done, it will need to send the message once per device of each recipient, which may be way more. But there's not really any way around that.

This really depends on how you implement encrypted message sending UX. When I talked with gargron about this, I mentioned that Telegram has e2ee chats that are always just device-to-device, greatly simplifying key management and message spread. So you can have a chat between your desktop and your friend's phone, but if you want a chat between your phone and their phone, that's a second chat.

Indeed, that does somewhat simplify key management and message spread, but I'm not sure this leads to better user experience. People would still have to review which devices they send messages to, and a chat session being device-to-device while I use multiple devices is one reason I simply don't use secure chats in Telegram.

As a sidenote, we are currently implementing a new activity type specifically for chat messages (https://git.pleroma.social/pleroma/pleroma/-/merge_requests/2429), that defines more explicit rules for addressing than our current 'Note's do, so many of the weird things that can happen with DMs are not possible anymore. One key restriction is that you can only have one recipient in the to field and no other addressing (group chat is still possible, please see the PR).

I have seen that, though I haven't followed very closely. I'm not sure why having a single actor in the to field would be an improvement. Having groups as actors creates an indirection, makes E2EE more complicated, and means “There's always only one Chat between two actors” does not translate well to conversations with more than two people (although, those would be named chats I guess, so that may be fine).

Otherwise, I can see the point in having dedicated API endpoints and structures, especially if we start doing E2EE.

@trwnh
Member

trwnh commented May 26, 2020

re: phasing out current dms:

I don't know what would be best and less confusing, but I have a few ideas:

  • put unencrypted DMs back into timelines like they used to be
  • change the DM warning to make it more scary or whatnot
  • have the “conversation view” only for E2EE encrypted messages

all of these would be good things to do IMO -- i think it was a mistake to treat direct statuses as "messages" at all in the first place. see also #12337 #3819 #3819 (comment) -- and also various UX issues with the current "Conversations" UI being constructed from statuses instead of messages, #10900 #10675 #9992 #9194

i would go so far as to say that even if e2ee doesn't get merged, the changes above should still happen. i'm not sure if an entirely new type is needed a la that pleroma merge request, but at minimum something like #9300 or w3c/activitypub#196 should be added to indicate that the object is a message and not a broadcast.

re: devices vs users:

Telegram has e2ee chats that are always just device-to-device, greatly simplifying key management and message spread. So you can have a chat between your desktop and your friend's phone, but if you want a chat between your phone and their phone, that's a second chat.

this is actually the correct way to do e2ee from a UX perspective, because pretending you are messaging a user is actually lying. e2ee is fundamentally different in that it is device-to-device, and you cannot avoid having to do device management. better to expose it entirely, e.g. "my friend has 2 phone numbers" as opposed to trying to hide it e.g. "i want to text my friend and i don't care about phone numbers". you are not messaging your friend. you are messaging the phone number, or rather, the device attached to it. the only way around that is to attach the identifier to some sort of bouncer, as in irc or google voice. in that case, the "endpoint" becomes some server, and if you're doing it on both ends, then you might as well just selfhost and use TLS.

@ClearlyClaire
Contributor

re: devices vs users:

Telegram has e2ee chats that are always just device-to-device, greatly simplifying key management and message spread. So you can have a chat between your desktop and your friend's phone, but if you want a chat between your phone and their phone, that's a second chat.

this is actually the correct way to do e2ee from a UX perspective, because pretending you are messaging a user is actually lying. e2ee is fundamentally different in that it is device-to-device, and you cannot avoid having to do device management. better to expose it entirely, e.g. "my friend has 2 phone numbers" as opposed to trying to hide it e.g. "i want to text my friend and i don't care about phone numbers". you are not messaging your friend. you are messaging the phone number, or rather, the device attached to it. the only way around that is to attach the identifier to some sort of bouncer, as in irc or google voice. in that case, the "endpoint" becomes some server, and if you're doing it on both ends, then you might as well just selfhost and use TLS.

Signal, OMEMO, etc. will just allow listing the devices and accepting/rejecting them manually… no need for a bouncer…

@trwnh
Member

trwnh commented May 26, 2020

@ThibG i meant that accepting/rejecting devices is just masking the fact that you're sending to multiple devices instead of to a single person. or in other words, there are multiple "ends" being masked by a single endpoint. a bouncer accomplishes this masking, by terminating the encryption at a single endpoint that can be accessed by the user from multiple devices. without it, you are stuck delivering to multiple devices if you want multi-device availability.

@lambadalambda

Just a quick comment, will reply more after sleeping:

If Mastodon were to handle such queries as normal AP payloads to inboxes, it may take seconds or even minutes for the key to be sent back, depending on the load, which would lead to terrible user experience.

  1. A server could request some amount of prekeys from the server of the other party, to cache them locally for these cases.
  2. If exchanging a prekey would take minutes, exchanging a single message would also take minutes.

@ClearlyClaire
Contributor

The server fetching a few PreKeys from the other party is an interesting idea i haven't thought of. It sounds like this could pull too many unused PreKeys though.

You are right for the message exchange possibly taking minutes, however, in that case, the message is sent as far as the sender is concerned, while it has to wait on the key to even send the first message. A mobile client can go offline immediately in the first case, in the second case it must wait for the key to get received (or wait until it's back up to send the message).

@lambadalambda

The server fetching a few PreKeys from the other party is an interesting idea i haven't thought of. It sounds like this could pull too many unused PreKeys though.

Are prekeys that expensive to generate? can't we give every server that requests it like 5 keys?

mobile client can go offline immediately in the first case, in the second case it must wait for the key to get received (or wait until it's back up to send the message).

true, but I don't think preventing this rather rare problem (which is more of an inconvenience) is worth implementing a system where all servers have to implement a rest api for remote users. If this really is that important, i'd rather have support for a backup prekey, which would also solve this problem. You could even give out a different backup prekey to every server.

However, I guess it could be made optional, in which case users could choose to accept messages they will be unable to report, but I don't think it's a worthy thing to pursue.

I do think it should be optional and maybe display a 'this user's messages can't be reported' or something. The franking mechanism is meant to break deniability in certain circumstances, and I don't think that should be given up on in general.

People would still have to review which devices they send messages to, and a chat session being device-to-device while I use multiple devices is one reason I simply don't use secure chats in Telegram.

Also very true, but I think that using multiple devices for encrypted chat is absolutely a 'power user' type of usage, because you need to understand a lot of the underlying mechanisms to understand what is happening and when something is secure or not. Most users just click away any warning, and those users would be in bigger trouble in a multi-device chat scenario.

I guess the question here is what the main target of the e2ee on the fediverse is supposed to be. If it's power users who understand the security implications and why certain features (message history, for example) are not there, then multi-device chats make a lot of sense. If it's more the user who wants to have an occasional way to send secure chat messages that the admin can't read or accidentally leak, then I think something super simple and easy to understand like explicit device-to-device chats make most sense. I'd rather see the latter focus, because I think that the first one is already served by XMPP and Matrix. As far as I know, all successful proprietary chats have either chosen to ignore e2ee (discord), to restrict to one device overall (LINE, Whatsapp) or to restrict e2ee to one-to-one (Telegram).

Of course, having both on the fediverse is absolutely possible, and the encryption primitives probably don't have to change much for either case.

Thank you for reading my blogpost

@ClearlyClaire
Contributor

The server fetching a few PreKeys from the other party is an interesting idea i haven't thought of. It sounds like this could pull too many unused PreKeys though.

Are prekeys that expensive to generate? can't we give every server that requests it like 5 keys?

I'm not sure how expensive they are to generate, but what I'm worried about is more, how many keys you need to track. In Signal or Matrix, keys are expected to be used right after they get claimed, so keeping only a low pool of keys on the client-side makes sense. If we start claiming keys well in advance, we have a potentially unbounded number of keys that might get used in the more or less long-term future.

mobile client can go offline immediately in the first case, in the second case it must wait for the key to get received (or wait until it's back up to send the message).

true, but I don't think preventing this rather rare problem (which is more of an inconvenience) is worth implementing a system where all servers have to implement a rest api for remote users. If this really is that important, i'd rather have support for a backup prekey, which would also solve this problem.

You'd still have to query the backup prekey (which changes somewhat often too, otherwise all the benefit of using multiple short-lived PreKeys is lost).

You could even give out a different backup prekey to every server.

Not completely true, you'd need to have enough PreKeys generated by the client for that. Anyway, I'm afraid this all makes PreKey management much more complicated on the client. The goal of the client is to throw away keys as soon as possible, so if it has to track multiple PreKeys that can be used multiple times and on which distribution it doesn't have a lot of control, that is an issue.

However I guess it could be made optional, in which case users could choose to accept messages they will be unable to report, but I don't think it's a worthy thing to pursue.

I do think it should be optional and maybe display a 'this user's messages can't be reported' or something. The franking mechanism is meant to break deniability in certain circumstances, and I don't think that should be given up on in general.

Yes, it is meant for the client to break deniability of the sender to someone that witnessed the encrypted communication. But the paper shows that it is possible anyway with Signal/Facebook/Matrix's scheme (well, it wouldn't work as well with Signal because Signal itself has much less knowledge of the metadata). So having the franking in place doesn't actually lessen the deniability for the protocol we are talking about using, it just makes it easier to implement without messing up (the alternative being to disclose the input to the key derivation functions for that message, that gives out exactly the same properties in that specific case, but I don't think current implementations have support for it, and I'd be afraid of actual implementations messing up and disclosing some more long-lived secret or the server not checking all things properly).

People would still have to review which devices they send messages to, and a chat session being device-to-device while I use multiple devices is one reason I simply don't use secure chats in Telegram.

Also very true, but I think that using multiple devices for encrypted chat is absolutely a 'power user' type of usage, because you need to understand a lot of the underlying mechanisms to understand what is happening and when something is secure or not. Most users just click away any warning, and those users would be in bigger trouble in a multi-device chat scenario.

I think users clicking warnings away would also readily accept a session from a new device at any moment… but unless the attacker successfully MITMs the whole communication (as opposed to, for instance, getting one end's credentials), this would lead to two conversations with two different people, so that would probably quickly look suspicious. So I guess you've got a point.

But on the other hand, a lot of those users would also have, say, a phone and a laptop/tablet, and would not understand this annoying restriction…

[…] I'd rather see the latter focus, because I think that the first one is already served by XMPP and Matrix.

True, that is already served by XMPP and Matrix, but jumping through protocols and identities is something that we're trying to avoid.

As far as I know, all successful proprietary chats have either chosen to ignore e2ee (discord), to restrict to one device overall (LINE, Whatsapp) or to restrict e2ee to one-to-one (Telegram).

I'd say that Signal is pretty successful too (although maybe not as much as the ones you listed). I'm not sure I'd even list Telegram there tbh since I've pretty much never seen it used with encryption, partly because of the one-device-to-one-device restriction.

Of course, having both on the fediverse is absolutely possible, and the encryption primitives probably don't have to change much for either case.

That is true, but that reminds me of one hurdle of XMPP, before E2EE was even a thing. Being built with the idea that one could address a specific device (resource) or let the server route to the most appropriate resource(s) meant a lot of headache with multi-device until people decided multi-device should be the default and introduced Message Carbons. I'd rather we avoided this kind of issue.

@Gargron Gargron force-pushed the feature-e2ee branch 3 times, most recently from 1d8dfdc to 5d6d838 Compare May 28, 2020 21:02
@lambadalambda

But on the other hand, a lot of those users would also have, say, a phone and a laptop/tablet, and would not understand this annoying restriction…

They understand this restriction with other services, which don't even allow a 'second device'. I like that it would make the trade-off explicit: Want a secure channel? Okay, you get no server history and no multi-device. Want all that? Use unencrypted chat. I think it makes the trade-off easier to understand for people who aren't crypto experts.

There's also still the question of what to do with group chats. Just using Olm seems to be not the best idea, as you'd have to encrypt the message anew for every recipient's device. Using Megolm seems sensible, but has completely different security properties from Olm. Matrix uses Megolm for both 1-on-1 and group chats, because 1-on-1 chats are user-to-user, not device-to-device, and it makes it possible to keep encrypted history on the server. I think it's worth thinking about, although I feel like Matrix has gone too far with the 'convenience' part of e2ee and makes a lot of compromises to make e2ee feel just like a normal conversation. We'll see.
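
To make the fan-out cost concrete, here is a rough back-of-the-envelope sketch. It only counts encryption operations and does no real cryptography. The helper names and the device counts are made up for illustration, and it assumes the simplest possible Megolm usage (one group session, never rotated, shared once per recipient device over Olm).

```ruby
# Hypothetical helpers: they count encryption operations only.

# Plain Olm: every message is encrypted separately for every recipient device.
def olm_encryptions(devices_per_member, messages)
  devices_per_member.sum * messages
end

# Megolm (simplified): one Olm message per device to share the group session
# key, then a single Megolm ciphertext per message.
def megolm_encryptions(devices_per_member, messages)
  devices_per_member.sum + messages
end

devices = [2, 1, 3] # recipient devices per group member (made-up numbers)
puts olm_encryptions(devices, 100)    # prints 600
puts megolm_encryptions(devices, 100) # prints 106
```

The gap grows with group size and message count, which is the efficiency argument for Megolm. The count says nothing about the differing security properties mentioned above.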

@lambadalambda

One more comment

Being built with the idea that one could address a specific device (resource) or let the server route to the most appropriate resource(s) meant a lot of headache with multi-device until people decided multi-device should be the default and introduced Message Carbons.

You'll always have to deal with this for e2ee messages, at least with Olm. Megolm would help a bit, but each device still has to exchange at least one Olm message.

@ClearlyClaire
Contributor

You'll always have to deal with this for e2ee messages, at least with Olm. Megolm would help a bit, but each device still has to exchange at least one Olm message.

Oh yes, I wasn't talking about how that'd change the internals, just that different software (in XMPP's case, mostly server-side) had different expectations, and that created confusing and annoying situations in the past, something I'd like to avoid here.

@lambadalambda

Absolutely agree, learning how to avoid confusion from all the confusing systems we used in the past (and now) seems like a top priority

@Gargron Gargron force-pushed the feature-e2ee branch 4 times, most recently from 8156f2b to e5acd1c Compare May 29, 2020 21:57
Contributor

@ClearlyClaire ClearlyClaire left a comment


The state of the code as I reviewed it includes code for uploading keys, claiming PreKeys, sending messages, and handles message franking. The message claiming involves synchronous S2S with a new endpoint, which is overall ok with me, but doesn't address the concerns I had listed earlier.

The code also does not provide anything for attachments, and while it handles adding message franking, it does not provide a reporting facility yet. That's ok, that can come later.

The S2S endpoint for claiming PreKeys is defined on a per-device basis, which I think may be wasteful, as all the other infrastructure is account-based anyway.

I have several concerns with how Message Franking is implemented, I have described them in inline comments, but I will outline them here:

  • there's only one franking value for a message that involves multiple servers, and the way it is currently handled means it can only be reported to the sending server, not the receiving one
  • the message franking value is signed with a keypair whose public key is, well, public. This means the recipient of a message can disclose it to anyone who trusts their server, without their server's involvement in the process. I think this is an unnecessary weakening of the deniability of the communication. We do not need the receiver to be able to report the message to anyone other than their own server and the sender's server.
  • the franking value being a JSON-LD object may make its processing overly complicated, while it could just be an arbitrary blob of data

app/services/keys/claim_service.rb (resolved)
app/services/deliver_to_device_service.rb (outdated, resolved)
app/lib/activitypub/activity/create.rb (outdated, resolved)
app/lib/activitypub/activity/create.rb (outdated, resolved)
app/controllers/api/v1/crypto/keys/claims_controller.rb (outdated, resolved)
app/services/keys/claim_service.rb (outdated, resolved)
spec/lib/activitypub/activity/create_spec.rb (resolved)
@ClearlyClaire
Contributor

I've had an interesting discussion about message franking and how it may weaken deniability. The paper we discussed claims that message franking isn't necessary to report messages with Facebook's protocol (so pretty much ours) because disclosing the input to the key derivation function is enough. While I haven't read any proof of that and I'm no cryptographer, I think that is true. However a well-behaving client should discard the key material as soon as possible, so reporting under that scheme means keeping the key material for longer, which is an issue for the deniability of past conversations if the receiving device gets breached at a later point.

Keeping the HMAC key on the receiving end has the same drawback, that's why I think we should encourage clients to delete them after a while (and possibly, implement a mode where reporting is not possible, by discarding that material immediately).
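
For readers unfamiliar with the mechanism, here is a minimal sketch of the franking idea, loosely following the general shape of the published Facebook design rather than the code in this PR. Every name below (`frank`, `server_tag`, `valid_report?`, the `context` string) is hypothetical. The point is that a report can only be verified while the receiver still holds the per-message HMAC key.

```ruby
require 'openssl'
require 'securerandom'

# Sender: commit to the plaintext with a fresh per-message franking key.
# The key travels inside the encrypted payload; the commitment travels
# alongside the ciphertext, where the server can see it.
def frank(plaintext)
  franking_key = SecureRandom.random_bytes(32)
  commitment = OpenSSL::HMAC.digest('SHA256', franking_key, plaintext)
  [franking_key, commitment]
end

# Server: bind the commitment to metadata it can observe (assumed here to be
# a simple string such as sender and recipient), using a secret server key.
def server_tag(server_key, commitment, context)
  OpenSSL::HMAC.digest('SHA256', server_key, commitment + context.b)
end

# Constant-time byte comparison, so verification does not leak via timing.
def same_bytes?(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Report check: the reporter discloses plaintext and franking key; the server
# recomputes both values and compares.
def valid_report?(server_key, plaintext, franking_key, commitment, tag, context)
  expected = OpenSSL::HMAC.digest('SHA256', franking_key, plaintext)
  same_bytes?(commitment, expected) &&
    same_bytes?(tag, server_tag(server_key, commitment, context))
end

server_key = SecureRandom.random_bytes(32)
key, commitment = frank('hello')
tag = server_tag(server_key, commitment, 'alice->bob')
puts valid_report?(server_key, 'hello', key, commitment, tag, 'alice->bob')  # prints true
puts valid_report?(server_key, 'forged', key, commitment, tag, 'alice->bob') # prints false
```

Once the receiving client deletes `franking_key`, it can no longer produce a report that passes this check, which is the trade-off described above.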

On the server end, we probably don't need to make the franking value verifiable forever either, we could sign it with rotating private keys, e.g., rotate the key every week, and discard any key older than, say, a month or two.
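
A minimal sketch of that rotation idea, under stated assumptions: it uses a weekly HMAC secret instead of an asymmetric signing key, keeps keys in memory, and the class name `RotatingFrankingKeys` and the eight-week retention are invented for illustration. A real server would persist the keys and might well sign differently.

```ruby
require 'openssl'
require 'securerandom'

# Hypothetical key ring: one secret per calendar week; keys older than
# max_age_weeks are discarded, after which old franking values stop verifying.
class RotatingFrankingKeys
  WEEK = 7 * 24 * 60 * 60

  def initialize(max_age_weeks: 8)
    @max_age_weeks = max_age_weeks
    @keys = {} # week number => secret key
  end

  # Returns the week number (needed later to find the key) and the tag.
  def sign(value, now: Time.now)
    week = week_of(now)
    prune(week)
    @keys[week] ||= SecureRandom.random_bytes(32)
    [week, OpenSSL::HMAC.hexdigest('SHA256', @keys[week], value)]
  end

  def verify(value, week, tag, now: Time.now)
    prune(week_of(now))
    key = @keys[week]
    return false if key.nil? # key already rotated out

    same_bytes?(OpenSSL::HMAC.hexdigest('SHA256', key, value), tag)
  end

  private

  def week_of(time)
    time.to_i / WEEK
  end

  def prune(current_week)
    @keys.delete_if { |week, _| current_week - week > @max_age_weeks }
  end

  # Constant-time comparison of two strings.
  def same_bytes?(a, b)
    return false unless a.bytesize == b.bytesize

    a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  end
end
```

With this shape, a franking value signed in week N verifies until week N + 8 and is rejected afterwards, so the server cannot confirm old reports even if asked to.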

@Gargron Gargron merged commit 5d8398c into master Jun 2, 2020
@Gargron Gargron deleted the feature-e2ee branch June 2, 2020 17:25
@lambadalambda

As always, the best way to get people to comment is to hit that merge button :p

Some more input from me:

  1. I wonder if the keys ("fingerprintKey"...) should not only have a type, but also some designation of what kind of algorithm they are meant to be used in.

  2. Same for the encrypted message itself, it would be nice if it had something like "encryptionMethod: 'mastodon:olm:v1'" or something. Especially if megolm should be added for group chats later.

  3. "messageType": 0: I think this is needlessly unclear, why not "preKey"?

  4. The server now signs its franking values with a private key, which means the receiver cannot use it to prove to a third-party that they got the encrypted message. That's a big improvement over the previous version of the PR!

100% agreed

  1. Mismatched to field on the Create and the EncryptedMessage: I think this is very confusing, although I can see why it seems appropriate here. Still, I'd rather see the actual devices addressed instead of the user, as the actual message can also only be read by the specific device.

  2. This is the only thing that looks like a real 'bug' to me

"to": {
  "type": "Device",
  "deviceId": "11876"
}

The deviceId should be a proper activitypub id, not just a number that only makes sense in context with actor. I think these should be expanded to user.ap_id + '/device/' + device.id so they are unique. As they are now, they can't really be addressed.
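
As an illustration of that suggestion only (not what the PR does), a globally unique device identifier could be derived from the actor's ActivityPub id. The helper name `device_uri` is hypothetical, and the `/device/` path segment is taken from the proposal above.

```ruby
require 'uri'

# Hypothetical helper: builds a dereferenceable-looking device id from the
# actor id and the local device id, as proposed in the comment above.
def device_uri(actor_id, device_id)
  "#{actor_id.chomp('/')}/device/#{URI.encode_www_form_component(device_id.to_s)}"
end

puts device_uri('http://localhost:3000/users/admin', '11876')
# prints http://localhost:3000/users/admin/device/11876
```

Two devices with the same local number on different accounts or servers then get distinct ids, which is what makes them addressable.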

  1. "claim": "http://localhost:3000/users/admin/claim?id=11119",: I already talked about it before, but I think this is the wrong way to go about it. It doesn't seem very ActivityPub'y to me. The keys should be exchanged by a request and an answer. This message passing seems like the proper way to model this to me, the current call into a remote server seems too much like a REST or RPC call.

Otherwise this looks very workable. It's still missing a good plan for how to deal with the frontend side of things, but this should be a good foundation.

@SoniEx2

SoniEx2 commented Jun 4, 2020

so what happens when DMs are removed? do I become unable to use DMs for image hosting?

@Gargron
Member Author

Gargron commented Jun 4, 2020

so what happens when DMs are removed? do I become unable to use DMs for image hosting?

I think it won't be possible to fully remove old "DMs" because it's like the default ActivityPub mode, it has to be supported. We'll just make them look more like normal posts like they used to at the beginning, and positively encourage users to use E2EE instead for DMs.

Image hosting is a good question. I don't think we have a consensus about this yet... There could be a special upload endpoint for symmetrically encrypted files, with secrets shared over E2EE, that expire after a certain period like 10 or 20 days
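
A minimal sketch of what the client side of such an upload could look like, assuming AES-256-GCM and a 10-day expiry. No such endpoint exists in this PR; the function names and the hash layout are hypothetical. The `upload` hash stands for what would go to the server, and the `secret` hash for what would be shared over the E2EE channel.

```ruby
require 'openssl'

EXPIRY_SECONDS = 10 * 24 * 60 * 60 # assumed 10-day lifetime

# Encrypts the file with a fresh random key. Only the ciphertext and expiry
# would be uploaded; key, IV and auth tag would travel inside an E2EE message.
def encrypt_attachment(data, now: Time.now)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.encrypt
  key = cipher.random_key
  iv = cipher.random_iv
  cipher.auth_data = ''
  ciphertext = cipher.update(data) + cipher.final
  upload = { ciphertext: ciphertext, expires_at: now + EXPIRY_SECONDS }
  secret = { key: key, iv: iv, auth_tag: cipher.auth_tag }
  [upload, secret]
end

# Decrypts a downloaded attachment. The expiry check here is only a stand-in
# for the server deleting the file; tampering makes `final` raise.
def decrypt_attachment(upload, secret, now: Time.now)
  raise 'attachment expired' if now >= upload[:expires_at]

  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.decrypt
  cipher.key = secret[:key]
  cipher.iv = secret[:iv]
  cipher.auth_tag = secret[:auth_tag]
  cipher.auth_data = ''
  cipher.update(upload[:ciphertext]) + cipher.final
end

upload, secret = encrypt_attachment('cat picture bytes')
puts decrypt_attachment(upload, secret) # prints cat picture bytes
```

Because the server only ever sees ciphertext, the attachment URL would be useless without the secret, so it would not serve as a general file-hosting link.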

@lambadalambda

I think the question was "can i keep uploading images in DMs so i can copy the link to the attachment around as a quick file hosting hack"

@trymeouteh

Looking forward to E2EE messaging on Mastodon, will it be possible to have E2EE messages with 3 or more people (groups)?

@trymeouteh

Will this E2EE messaging encrypt all data including metadata such as timestamp, likes, etc?

@supernovae

Is this available yet? or expected anytime soon? This will be a critical feature with the recent exodus of another network to give the community a sense of safety/security.

@ekennedy80

Is this available yet? or expected anytime soon? This will be a critical feature with the recent exodus of another network to give the community a sense of safety/security.

This is probably the most compelling feature to be added to Mastodon! People are drooling for privacy and the ability to DM others without admins' prying eyes.

@ClearlyClaire
Contributor

We know this is an expected feature, and there are still plans to bring it to the mobile apps, but we do not have anything to announce yet. This is a complex feature that will take time to implement and cannot be rushed.

@samuk

samuk commented Nov 28, 2022

I was idly wondering if https://reticulum.network/ might be interesting here

markqvist/Reticulum#155 (reply in thread)

I think it would require a re-think/ re-implementation of how Mastodon handles identities, which would be non-trivial. Potentially interesting to have a user-owned identity rather than an instance-admin owned one IMHO.

shouo1987 pushed a commit to CrossGate-Pawoo/mastodon that referenced this pull request Dec 7, 2022
@neet neet mentioned this pull request Dec 31, 2022
@digitalbuddha

Hi folks, I maintain Firefly, an in-progress Android Mastodon client. Currently, I am working on bringing end-to-end encryption using the Signal protocol to DMs that originate and end in my app. Up until today I did not realize that there was any support for encrypted DMs on the server, but I have found this wonderful PR.

I am fairly feature complete on the client side, let me give you a rundown of my current flow
The wrapper to most operations looks like:

interface Crypt {
    suspend fun onLogin()
    fun generateRemoteDeviceKeys(): PublicKeys
    suspend fun storeRemoteDeviceKeys(remoteDeviceKeys: String)
    suspend fun encryptFor(message: String, accountId: String): CiphertextMessage
    suspend fun decryptFrom(message: CiphertextMessage, accountId: String): String
    suspend fun sendRemoteKeysTo(inReplyTo: String, mentions: String): Status
}

on login with a new account
create a signalprotocol address with the account id and device id stubbed out
generateIdentityKeyPair
generateDeviceKeyBundle
store local identity with name
store prekeys
store signedprekey

On app start/restart
createLocalSignalProtocolStore which has access to above data
create a crypt with ability to encrypt/decrypt

To trust someone
retrieve local signal protocol address
retrieve local identity
if no prekeys unused generate 10 and store
retrieve one prekey and mark it as used
retrieve signed prekey
Create RemoteDeviceKeys json and send to other party (300 ish characters)

On other phone
when receiving a RemoteDeviceKey json payload
store remote device key in DB
retrieve local signal protocol address
retrieve local identity
if no prekeys unused generate 10 and store
retrieve one prekey and mark it as used
retrieve signed prekey
Create RemoteDeviceKeys and send to first party

To send message to someone that is trusted
retrieve receivers address
make sure you have the keys for them
encrypt for receiver
send message
save unencrypted to db

To receive message from someone that is trusted
use crypt to decrypt message
show to user
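
The prekey bookkeeping in the "To trust someone" steps above ("if no prekeys unused generate 10 and store", "retrieve one prekey and mark it as used") can be sketched as follows. This is an illustrative in-memory model only, written in Ruby rather than the client's Kotlin. `PreKeyPool` is a hypothetical name, and the random hex strings are placeholders for real Curve25519 public keys.

```ruby
require 'securerandom'

# Hypothetical in-memory prekey store mirroring the described flow.
class PreKeyPool
  BATCH_SIZE = 10

  def initialize
    @keys = {} # key_id => { key: String, used: Boolean }
    @next_id = 1
  end

  # Hands out one unused prekey and marks it as used, generating a new
  # batch first if none are left.
  def take
    replenish if unused.empty?
    key_id, entry = unused.first
    entry[:used] = true
    { key_id: key_id, key: entry[:key] }
  end

  def unused_count
    unused.size
  end

  private

  def unused
    @keys.reject { |_, entry| entry[:used] }
  end

  def replenish
    BATCH_SIZE.times do
      # Placeholder material: a real client would generate a Curve25519 pair.
      @keys[@next_id] = { key: SecureRandom.hex(32), used: false }
      @next_id += 1
    end
  end
end

pool = PreKeyPool.new
first = pool.take
puts first[:key_id]    # prints 1
puts pool.unused_count # prints 9
```

A real store would also delete used keys once the corresponding session is established, in line with the earlier point that clients should throw keys away as soon as possible.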

My one current hurdle is doing the key exchange. For now I am sending a DM between users that starts with some special characters and contains the prekey/public signature and public keys. Ideally I would like to do the exchange server side.

Is the current functionality live/usable? The only part I would need (as of now) is ability to publish and retrieve pre keys. Thank you and I hope we can work together, I'd love to be the first client that implements e2ee and am early enough in development that I would be happy to change anything in my flow to match what you are doing.

Thank you and have a nice day

@tcitworld
Contributor

@OrvilleRed Using the Signal protocol doesn't mean using Signal.

@koyuawsmbrtn
Contributor

This plus it will either end up with Matrix or ActivityPub anyway

@digitalbuddha

Correct, I am using libsignal, which I believe has the same structure for key exchange as Olm. Mostly just wondering if the key exchange piece is usable by a third-party app

@koyuawsmbrtn
Contributor

I don't recommend using those APIs just yet, as no implementation of a full chat system built on them has been seen so far

@OrvilleRed

@OrvilleRed Using the Signal protocol doesn't mean using Signal.

Thank you & my apologies for my own confusion. Removed comment to avoid confusing others

@erlend-sh

@GatoOscuro

The future of Mastodon looks amazing.

Labels
activitypub Protocol-related changes, federation api REST API, Streaming API, Web Push API
Development

Successfully merging this pull request may close these issues.

OTR for direct messages