Add end-to-end encryption API #13820
Some notes, more on the API design and general concerns than the implementation itself, since I know you're far from done.
Crypto protocol
I think something like Signal's protocol is the way to go, and reusing libsignal or libolm does make sense, so this PR definitely goes in the right direction. Looking at Matrix's protocol, there is one thing I am a bit worried about: it seems very easy for an attacker to prevent any new session with an offline client (and severely hinder new sessions with an online client) by exhausting the pool of OTKs. From what I have read, Signal avoids this by using a backup PreKey once the pool is exhausted. While that seems safe from a crypto standpoint to me, it may need special care, and it may make sense to investigate exactly how Signal does it. Anyway, that “backup PreKey” solution seems easy to add a posteriori without breaking compatibility with older clients.
According to the paper you linked, it seems that message franking would not be strictly needed to achieve the reporting functionality:
However, I don't think libolm or other implementations provide facilities to exploit this. Furthermore, the protocol you described is easy to understand, to implement and to compose with any E2EE protocol, so I'm much more confident in having clients and servers perform those extra steps than in having clients implement things deeper in the crypto stack without messing up. So I'm definitely for that implementation of message franking.
Performance and federation concerns
One thing I am worried about is performance, given the increased number of cross-instance requests, some of which need to be synchronous. There are three things that could significantly increase the number of requests across the network here:
Starting an encrypted session
Starting an encrypted session between two devices (which would realistically happen a few times per device pair) will require synchronously claiming a one-time PreKey from the instance hosting the recipient. Federation-wise, I see two strategies for that:
Querying device list
A session needs to be established for basically every pair of devices, but the list of a user's devices can change at any point. There's probably a trade-off here between querying every user's device list before posting any message, and having caches on instances and clients. The same question as in “Starting an encrypted session” applies: should the client fetch the list directly or go through its own instance?
Delivering messages
Instead of having an instance deliver the message once to every instance that has recipients, as is currently done, it will need to send the message once per device of each recipient, which may be far more. But there's not really any way around that.
Media encryption
The current API proposal does not seem to include anything specific to encrypting media. This probably needs some thought, especially since inlining media in the encrypted message would be expensive, and linking to encrypted media with a symmetric key may not provide the same binding qualities for the media as message franking does for the message's content (see e.g. https://www.sjoerdlangkemper.nl/2019/11/20/message-franking/ for an example of how one can mess that up).
Message archiving
One thing that has been completely left out of your proposal (understandably, since it is orthogonal to the crypto protocol itself) is message archiving across devices. Indeed, most users expect to be able to see previous communication even from newly-installed devices, which is not possible with E2EE. Different protocols/implementations have chosen to provide this functionality either through server-side archiving or client-to-client communication of keys and messages. This should be given some thought as well, so that we have a somewhat secure scheme to do that without every client reimplementing its own thing.
Phasing out current DMs
It's not about the crypto protocol at all, but once E2EE is available in Mastodon, current DMs should be discouraged. I don't think they should ever be completely removed, though (even though I see the appeal of having less filtering to do, and possibly mess up, when deciding which toots to show to whom). Indeed, clients may take a lot of time to implement E2EE properly, and some users may not use Mastodon in a way that makes E2EE possible for them (e.g., only using it from private browsing sessions). Also, direct messages are just, like, the simplest use of ActivityPub, and I don't think we should break compatibility on such a fundamental level. Needless to say, we shouldn't drop current messages without warning and without giving users a chance to back them up. Instead, E2EE should be promoted, and it should be made perfectly clear that DMs are not encrypted. I don't know what would be best and least confusing, but I have a few ideas:
|
Great start at implementing an e2ee infrastructure. Olm seems like a good choice. I'm indifferent to the message franking, especially if it uses ld-sigs, which we don't use at all, but it seems that it could be made optional, at least on federation. A few comments on your post, thibg
I wonder if having a synchronous REST endpoint for this makes sense, when the underlying action (asking another server for a key) is asynchronous. I think it might be more sensible to make this asynchronous in the client API as well; for example, you'd request a key and could then either poll for it or get a notification when it's ready.
Megolm implements this, but it's not a double ratchet system, just a single ratchet. You can give other devices your known history by sending them the ratchet from some point in time, the device can then read the messages from that point in time on.
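To illustrate the "share the ratchet from some point in time" idea above, here is a minimal sketch of a one-way hash chain. It is deliberately simplified and is not Megolm's actual construction; the names `HashRatchet`, `advance`, `message_key` and `share_from` are hypothetical and exist only for this example. The point it shows is that a device handed the chain state at position N can derive every later key but none of the earlier ones.

```python
import hashlib


def advance(state: bytes) -> bytes:
    """Step the chain forward one position; SHA-256 makes this one-way."""
    return hashlib.sha256(b"advance" + state).digest()


def message_key(state: bytes) -> bytes:
    """Derive the key for the message at the current chain position."""
    return hashlib.sha256(b"message" + state).digest()


class HashRatchet:
    """A single forward-only ratchet (illustrative, not the Megolm ratchet)."""

    def __init__(self, state: bytes, index: int = 0):
        self.state = state
        self.index = index

    def _state_at(self, index: int) -> bytes:
        # The chain only runs forward, so earlier positions are unreachable.
        if index < self.index:
            raise ValueError("cannot derive keys from before the shared point")
        state = self.state
        for _ in range(index - self.index):
            state = advance(state)
        return state

    def key_for(self, index: int) -> bytes:
        """Return the message key for the given position."""
        return message_key(self._state_at(index))

    def share_from(self, index: int) -> "HashRatchet":
        """Export the ratchet at `index` so another device can read from there on."""
        return HashRatchet(self._state_at(index), index)
```

A device that receives `sender.share_from(5)` can compute `key_for(5)`, `key_for(6)` and so on, while `key_for(4)` raises `ValueError`.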
I'd also be interested in having some way to do this. Maybe just encrypting the media with the same key as the message before upload would be fine.
This really depends on how you implement the UX for sending encrypted messages. When I talked with gargron about this, I mentioned that Telegram has e2ee chats that are always just device-to-device, greatly simplifying key management and message spread. So you can have a chat between your desktop and your friend's phone, but if you want a chat between your phone and their phone, that's a second chat. Similarly, LINE (and I think WhatsApp as well) only allows one device at a time for any kind of communication. Things like the LINE or WhatsApp desktop app 'tunnel' through your phone. This is a bit awkward, but again, it makes key management much easier. As a sidenote, we are currently implementing a new activity type specifically for chat messages (https://git.pleroma.social/pleroma/pleroma/-/merge_requests/2429) that defines more explicit rules for addressing than our current 'Note's do, so many of the weird things that can happen with DMs are not possible anymore. One key restriction is that you can only have one recipient in the With a setup like that, implementing a Telegram-style device-to-device chat becomes really easy, by addressing not the user directly, but a device (which would need to become an AP object and part of a collection owned by the user). Especially for a first iteration, a system like this seems to me to be easier to implement and also to understand (for the end user), so at least I'd love to work on that first and then, in a second step, see if there's not a better mechanism for group chats, like Megolm (or the similar system Signal has, forgot the name). I'm currently thinking about how to shape the AP objects for all of this. @Gargron you said that you had some thoughts about this already, right? Can you post them here? |
I am very uncomfortable with message franking being optional. It being always available means the client can just reject messages with missing/broken franking and does not have to handle other cases. I don't see why message franking would use LD-sigs, it really doesn't have to. Let me get a few things straight regarding message franking:
However, I guess it could be made optional, in which case users could choose to accept messages they will be unable to report, but I don't think it's worth pursuing. (And, if my understanding of the paper is correct, that wouldn't actually bring you more deniability in the case of Olm: reporting messages to the platform could still be possible, just in a more complicated and error-prone way.)
Would the underlying action be asynchronous? In Signal, the thing is synchronous since there's only one logical authority to ask. In XMPP it's asynchronous as much as any query/reply is in this protocol. In Matrix I think it's a single HTTP query that returns you the result. If Mastodon were to handle such queries as normal AP payloads to inboxes, it may take seconds or even minutes for the key to be sent back, depending on the load, which would lead to terrible user experience.
Indeed, that does somewhat simplify key management and message spread, but I'm not sure this leads to better user experience. People would still have to review which devices they send messages to, and a chat session being device-to-device while I use multiple devices is one reason I simply don't use secure chats in Telegram.
I have seen that, though I haven't followed very closely. I'm not sure why having a single actor in the Otherwise, I can see the point in having dedicated API endpoints and structures, especially if we start doing E2EE. |
re: phasing out current dms:
all of these would be good things to do IMO -- i think it was a mistake to treat direct statuses as "messages" at all in the first place. see also #12337 #3819 #3819 (comment) -- and also various UX issues with the current "Conversations" UI being constructed from statuses instead of messages, #10900 #10675 #9992 #9194 i would go so far as to say that even if e2ee doesn't get merged, the changes above should still happen. i'm not sure if an entirely new type is needed a la that pleroma merge request, but at minimum something like #9300 or w3c/activitypub#196 should be added to indicate that the object is a message and not a broadcast. re: devices vs users:
this is actually the correct way to do e2ee from a UX perspective, because pretending you are messaging a user is actually lying. e2ee is fundamentally different in that it is device-to-device, and you cannot avoid having to do device management. better to expose it entirely, e.g. "my friend has 2 phone numbers" as opposed to trying to hide it e.g. "i want to text my friend and i don't care about phone numbers". you are not messaging your friend. you are messaging the phone number, or rather, the device attached to it. the only way around that is to attach the identifier to some sort of bouncer, as in irc or google voice. in that case, the "endpoint" becomes some server, and if you're doing it on both ends, then you might as well just selfhost and use TLS. |
Signal, OMEMO, etc. will just allow listing the devices and accepting/rejecting them manually… no need for a bouncer… |
@ThibG i meant that accepting/rejecting devices is just masking the fact that you're sending to multiple devices instead of to a single person. or in other words, there are multiple "ends" being masked by a single endpoint. a bouncer accomplishes this masking, by terminating the encryption at a single endpoint that can be accessed by the user from multiple devices. without it, you are stuck delivering to multiple devices if you want multi-device availability. |
Just a quick comment, will reply more after sleeping:
|
The server fetching a few PreKeys from the other party is an interesting idea i haven't thought of. It sounds like this could pull too many unused PreKeys though. You are right for the message exchange possibly taking minutes, however, in that case, the message is sent as far as the sender is concerned, while it has to wait on the key to even send the first message. A mobile client can go offline immediately in the first case, in the second case it must wait for the key to get received (or wait until it's back up to send the message). |
Are prekeys that expensive to generate? can't we give every server that requests it like 5 keys?
true, but I don't think preventing this rather rare problem (which is more of an inconvenience) is worth implementing a system where all servers have to implement a rest api for remote users. If this really is that important, i'd rather have support for a backup prekey, which would also solve this problem. You could even give out a different backup prekey to every server.
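The backup-prekey idea discussed here can be sketched as a small pool that hands out each one-time key exactly once and only falls back to a reusable key when the pool is empty. This is a rough illustration under assumptions, not how Signal, libolm or this PR implement it; `PreKeyPool`, `claim` and `replace_fallback` are hypothetical names, and the keys are opaque placeholder strings.

```python
from collections import deque


class PreKeyPool:
    """Server-side pool of one-time keys with a reusable fallback (hypothetical)."""

    def __init__(self, one_time_keys, fallback_key):
        self._one_time = deque(one_time_keys)
        self._fallback = fallback_key

    def claim(self):
        """Return (key, is_one_time).

        One-time keys are removed as they are handed out, so each is used for
        at most one session. When the pool is exhausted, the fallback key is
        returned instead of refusing the claim, so an attacker draining the
        pool cannot block new sessions.
        """
        if self._one_time:
            return self._one_time.popleft(), True
        return self._fallback, False

    def replace_fallback(self, new_key):
        """Swap in a fresh fallback key; rotating it regularly limits reuse."""
        self._fallback = new_key

    def remaining(self):
        """Number of unclaimed one-time keys, e.g. to prompt a client to upload more."""
        return len(self._one_time)
```

As noted in the replies, the fallback key still has to be rotated and tracked by the client, so this simplifies availability but not key management.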
I do think it should be optional and maybe display a 'this user's messages can't be reported' or something. The franking mechanism is meant to break deniability in certain circumstances, and I don't think that should be given up on in general.
Also very true, but I think that using multiple devices for encrypted chat is absolutely a 'power user' type of usage, because you need to understand a lot of the underlying mechanisms to understand what is happening and when something is secure or not. Most users just click away any warning, and those users would be in bigger trouble in a multi-device chat scenario. I guess the question here is what the main target of the e2ee on the fediverse is supposed to be. If it's power users who understand the security implications and why certain features (message history, for example) are not there, then multi-device chats make a lot of sense. If it's more the user who wants to have an occasional way to send secure chat messages that the admin can't read or accidentally leak, then I think something super simple and easy to understand like explicit device-to-device chats make most sense. I'd rather see the latter focus, because I think that the first one is already served by XMPP and Matrix. As far as I know, all successful proprietary chats have either chosen to ignore e2ee (discord), to restrict to one device overall (LINE, Whatsapp) or to restrict e2ee to one-to-one (Telegram). Of course, having both on the fediverse is absolutely possible, and the encryption primitives probably don't have to change much for either case. Thank you for reading my blogpost |
I'm not sure how expensive they are to generate, but what I'm worried about is more, how many keys you need to track. In Signal or Matrix, keys are expected to be used right after they get claimed, so keeping only a low pool of keys on the client-side makes sense. If we start claiming keys well in advance, we have a potentially unbounded number of keys that might get used in the more or less long-term future.
You'd still have to query the backup prekey (which changes somewhat often too, otherwise all the benefit of using multiple short-lived PreKeys is lost).
Not completely true, you'd need to have enough PreKeys generated by the client for that. Anyway, I'm afraid this all makes PreKey management much more complicated on the client. The goal of the client is to throw away keys as soon as possible, so if it has to track multiple PreKeys that can be used multiple times and on which distribution it doesn't have a lot of control, that is an issue.
Yes, it is meant for the client to break deniability of the sender to someone that witnessed the encrypted communication. But the paper shows that it is possible anyway with Signal/Facebook/Matrix's scheme (well, it wouldn't work as well with Signal because Signal itself has much less knowledge of the metadata). So having the franking in place doesn't actually lessen the deniability for the protocol we are talking about using, it just makes it easier to implement without messing up (the alternative being to disclose the input to the key derivation functions for that message, that gives out exactly the same properties in that specific case, but I don't think current implementations have support for it, and I'd be afraid of actual implementations messing up and disclosing some more long-lived secret or the server not checking all things properly).
I think users clicking warnings away would also readily accept a session from a new device at any moment… but unless the attacker successfully MITM the whole communication—as opposed to, for instance, getting one end's credentials—this would lead to two conversations with two different people, so that would probably quickly be suspicious. So I guess you got a point. But on the other hand, a lot of those users would also have, say, a phone and a laptop/tablet, and would not understand this annoying restriction…
True, that is already served by XMPP and Matrix, but jumping through protocols and identities is something that we're trying to avoid.
I'd say that Signal is pretty successful too (although maybe not as much as the ones you listed). I'm not sure I'd even list Telegram there tbh, since I've pretty much never seen it used with encryption, partly because of the one-device-to-one-device restriction.
That is true, but that reminds me of one hurdle of XMPP, before E2EE was even a thing. Being built with the idea that one could address a specific device (resource) or let the server route to the most appropriate resource(s) meant a lot of headaches with multi-device until people decided multi-device should be the default and introduced Message Carbons. I'd rather we avoided this kind of issue. |
They understand this restriction with other services, who don't even allow a 'second device'. I like that it would make the trade-off explicit: Want a secure channel? Okay, you get no server history and no multi-device. Want all that? Use unencrypted chat. I think it makes the trade-off easier to understand for non crypto experts. There's also still the question of what to do with group chats. Just using Olm seems to be not the best idea, as you'd have to encrypt the message anew for every recipient's device. Using Megolm seems sensible, but has completely different security properties from Olm. Matrix uses Megolm for both 1-on-1 and group chats, because 1-on-1 chats are user-to-user, not device-to-device, and it makes it possible to keep encrypted history on the server. I think it's worth thinking about, although I feel like Matrix has gone too far with the 'convenience' part of e2ee and makes a lot of compromises to make e2ee feel just like a normal conversation. We'll see. |
One more comment
You'll always have to deal with this for e2ee messages, at least with Olm. Megolm would help a bit, but each device still has to exchange at least one Olm message. |
Oh yes, I wasn't going about how that'd change the internals, just that different software (in XMPP's case, mostly server-side) had different expectations, and that created confusing and annoying situations in the past, something I'd like to avoid here. |
Absolutely agree, learning how to avoid confusion from all the confusing systems we used in the past (and now) seems like a top priority |
The state of the code as I reviewed it includes code for uploading keys, claiming PreKeys, sending messages, and handles message franking. The message claiming involves synchronous S2S with a new endpoint, which is overall ok with me, but doesn't address the concerns I had listed earlier.
The code also does not provide anything for attachments, and while it handles adding message franking, it does not provide a reporting facility yet. That's ok, that can come later.
The S2S endpoint for claiming PreKeys is defined on a per-device basis, which I think may be wasteful, as all the other infrastructure is account-based anyway.
I have several concerns with how Message Franking is implemented, I have described them in inline comments, but I will outline them here:
- there's only one franking value for a message that involves multiple servers, and the way it is currently handled means it can only be reported to the sending server, not the receiving one
- the message franking value is signed with a keypair whose public key is, well, public. This means the recipient of a message can disclose it to anyone who trusts their server, without their server's involvement in the process. I think this is an unnecessary weakening of the deniability of the communication. We do not need the receiver to be able to report the message to anyone other than their own server and the sender's server.
- the franking value being a JSON-LD object may make its processing overly complicated, while it could just be an arbitrary blob of data
I've had an interesting discussion about message franking and how it may weaken deniability. The paper we discussed claims that the message franking isn't necessary to report messages with Facebook's protocol (so pretty much ours) because disclosing the input to the key derivation function is enough. While I haven't read any proof of that and I'm no cryptographer, I think that is true. However a well-behaving client should discard the key material as soon as possible, so reporting under that scheme means keeping the key material for longer. Which is an issue for the deniability of past conversations if the receiving device gets breached at a later point. Keeping the HMAC key on the receiving end has the same drawback, that's why I think we should encourage clients to delete them after a while (and possibly, implement a mode where reporting is not possible, by discarding that material immediately). On the server end, we probably don't need to make the franking value verifiable forever either, we could sign it with rotating private keys, e.g., rotate the key every week, and discard any key older than, say, a month or two. |
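The rotation scheme suggested above (sign with a key that changes weekly, and discard keys older than a month or two) could look roughly like the following. It is a sketch under assumptions: `RotatingKeyRing` and its methods are hypothetical, the keys are random byte strings rather than real signing keys, and the eight-week retention is just one reading of "a month or two".

```python
import secrets
from datetime import datetime, timedelta, timezone

ROTATE_EVERY = timedelta(weeks=1)   # assumption: "rotate the key every week"
KEEP_FOR = timedelta(weeks=8)       # assumption: "a month or two"


class RotatingKeyRing:
    """Holds the current franking key plus recent ones still valid for reports."""

    def __init__(self, now: datetime):
        self._keys = []  # list of (created_at, key_bytes), newest last
        self._rotate(now)

    def _rotate(self, now: datetime) -> None:
        self._keys.append((now, secrets.token_bytes(32)))

    def current_key(self, now: datetime) -> bytes:
        """Return the key to sign with, rotating and pruning as needed."""
        # Drop keys past the retention window; reports signed with them
        # can no longer be verified, which is the intended effect.
        self._keys = [(t, k) for t, k in self._keys if now - t <= KEEP_FOR]
        if not self._keys or now - self._keys[-1][0] >= ROTATE_EVERY:
            self._rotate(now)
        return self._keys[-1][1]

    def verification_keys(self, now: datetime) -> list:
        """Return every key still accepted when checking a reported message."""
        return [k for t, k in self._keys if now - t <= KEEP_FOR]
```

Taking `now` as a parameter rather than reading the clock keeps the sketch easy to test; a real server would more likely run the pruning as a scheduled job.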
As always, the best way to get people to comment is to hit that merge button :p Some more input from me:
100% agreed
"to": {
"type": "Device",
"deviceId": "11876"
}
The deviceId should be a proper ActivityPub id, not just a number that only makes sense in context with the actor. I think these should be expanded to
Otherwise this looks very workable. It's still missing a good plan for how to deal with the frontend side of things, but this should be a good foundation. |
so what happens when DMs are removed? do I become unable to use DMs for image hosting? |
I think it won't be possible to fully remove old "DMs" because it's like the default ActivityPub mode, it has to be supported. We'll just make them look more like normal posts like they used to at the beginning, and positively encourage users to use E2EE instead for DMs. Image hosting is a good question. I don't think we have a consensus about this yet... There could be a special upload endpoint for symmetrically encrypted files, with secrets shared over E2EE, that expire after a certain period like 10 or 20 days |
I think the question was "can i keep uploading images in DMs so i can copy the link to the attachment around as a quick file hosting hack" |
Looking forward to E2EE messaging on Mastodon. Will it be possible to have E2EE messages with 3 or more people (groups)? |
Will this E2EE messaging encrypt all data including metadata such as timestamp, likes, etc? |
Is this available yet? or expected anytime soon? This will be a critical feature with the recent exodus of another network to give the community a sense of safety/security. |
This is probably the most compelling feature to be added to Mastodon! People are drooling for privacy and the ability to DM others without admins' prying eyes. |
We know this is an expected feature, and there are still plans to bring it to the mobile apps, but we do not have anything to announce yet. This is a complex feature that will take time to implement and cannot be rushed. |
I was idly wondering if https://reticulum.network/ might be interesting here markqvist/Reticulum#155 (reply in thread) I think it would require a re-think/ re-implementation of how Mastodon handles identities, which would be non-trivial. Potentially interesting to have a user-owned identity rather than an instance-admin owned one IMHO. |
Hi folks, I maintain Firefly, an in-progress Android Mastodon client. Currently, I am working on bringing end-to-end encryption using the Signal protocol to DMs that originate and end in my app. Up until today I did not realize that there was any support for encrypted DMs on the server, but I have found this wonderful PR. I am fairly feature complete on the client side; let me give you a rundown of my current flow:
interface Crypt {
    suspend fun onLogin()
    fun generateRemoteDeviceKeys(): PublicKeys
    suspend fun storeRemoteDeviceKeys(remoteDeviceKeys: String)
    suspend fun encryptFor(message: String, accountId: String): CiphertextMessage
    suspend fun decryptFrom(message: CiphertextMessage, accountId: String): String
    suspend fun sendRemoteKeysTo(inReplyTo: String, mentions: String): Status
}
On login with a new account
On app start/restart
To trust someone
On other phone
To send a message to someone that is trusted
To receive a message from someone that is trusted
My one current hurdle is doing the key exchange. For now I am sending a DM between users that starts with some special characters and contains the prekey/public signature and public keys. Ideally I would like to do the exchange server side. Is the current functionality live/usable? The only part I would need (as of now) is the ability to publish and retrieve prekeys. Thank you, and I hope we can work together; I'd love to be the first client that implements e2ee, and I am early enough in development that I would be happy to change anything in my flow to match what you are doing. Thank you and have a nice day |
@OrvilleRed Using the Signal protocol doesn't mean using Signal. |
This plus it will either end up with Matrix or ActivityPub anyway |
Correct, I am using libsignal, which I believe has the same structure for key exchange as Olm. Mostly just wondering if the key exchange piece is usable by a third-party app |
I don't recommend using those APIs just yet, as no implementation of a full chat system has been seen so far |
Thank you & my apologies for my own confusion. Removed comment to avoid confusing others |
There’s an emerging internet standard that solves for exactly this: https://www.ietf.org/blog/mls-secure-and-usable-end-to-end-encryption/ |
The future of Mastodon looks amazing.
Fix #1093
A set of APIs required for the double ratchet encryption algorithm, specifically the Olm implementation developed by Matrix -- but it should be roughly the same as libsignal. An additional layer on top of it is so-called message franking, which allows encrypted messages to be reported to content moderators without compromising keys or message contents ahead of time while also preventing fake reports.
Development of E2EE capabilities into the web UI is not in scope of this PR.
REST API overview
To support Olm, the following APIs are required:
- POST /api/v1/crypto/keys/upload: device with attributes device_id (securely random generated string or number), name (human-readable description), fingerprint_key (public Ed25519 key) and identity_key (public Curve25519 key), as well as an array of one_time_keys, each having the attributes key_id, key (Curve25519 key) and signature (the key signed with the device's Ed25519 key)
- POST /api/v1/crypto/keys/query: id (array supported). Returns an array of results, each result having the account's id and a devices attribute. Each device has device_id, name, fingerprint_key and identity_key attributes
- POST /api/v1/crypto/keys/claim: device with attributes account_id and device_id (array supported). Returns an array of results; each result has the attributes account_id, device_id, key_id, key and signature. You should verify the signature with the expected device's Ed25519 (fingerprint) key
- POST /api/v1/crypto/delivery: device with attributes account_id, device_id, type, body and hmac. The type is 0 when it's a pre-key message (used to establish a new session) and 1 otherwise. For hmac, see below about message franking
- GET /api/v1/crypto/encrypted_messages: id, account_id, device_id, type, body, digest, and message_franking; supports pagination with pagination headers
- POST /api/v1/crypto/encrypted_messages/clear: up_to_id. You should do this whenever you're done processing messages client-side
All of the above methods require the new crypto OAuth scope.
Additionally, the streaming API now gives you encrypted_message events right in the main user stream; however, you only receive messages addressed to the connected app (device)!
Message franking
The sending client generates a new HMAC key and includes it in the to-be-encrypted message. It then generates a HMAC-SHA256 value from the to-be-encrypted message and sends it along with the encrypted message. The server, when forwarding the encrypted message to the recipient, composes a metadata summary for the message that includes the HMAC-SHA256 value, and then signs it using its own key. This metadata summary is forwarded along with the encrypted message itself to the recipient and discarded.
Upon reception of the encrypted message, the receiving client verifies the decrypted contents match the HMAC-SHA256 value from the metadata summary using the HMAC key provided in the decrypted contents. If they don't match, the message is discarded.
Should the receiving client desire to report the encrypted message and reveal its contents to the content moderators, the metadata summary is sent along with the report. The server can then verify its own signature on it and trust that the revealed contents are authentic.
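The three steps above can be sketched end to end with the standard library. This is an illustration under stated assumptions, not the PR's implementation: all function names are hypothetical, the metadata fields (`from`, `to`) are made up, the franking key is passed alongside the plaintext rather than embedded in it, and the server "signature" is modelled as an HMAC under a server-only secret so that no third-party crypto library is needed.

```python
import hashlib
import hmac
import json
import secrets
from typing import Tuple

# Assumption: stands in for the server's signing key. Only the server needs to
# check its own signature on a report, so a secret-key MAC is enough here.
SERVER_KEY = secrets.token_bytes(32)


def client_frank(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Sender: make a fresh HMAC key and the HMAC-SHA256 of the plaintext.

    The key travels inside the encrypted message; the server only sees the tag.
    """
    franking_key = secrets.token_bytes(32)
    tag = hmac.new(franking_key, plaintext, hashlib.sha256).digest()
    return franking_key, tag


def server_summarize(tag: bytes, sender: str, recipient: str) -> dict:
    """Server: bind the tag to message metadata and sign the summary."""
    summary = {"hmac": tag.hex(), "from": sender, "to": recipient}
    encoded = json.dumps(summary, sort_keys=True).encode()
    signature = hmac.new(SERVER_KEY, encoded, hashlib.sha256).hexdigest()
    return {"summary": summary, "signature": signature}


def client_verify(plaintext: bytes, franking_key: bytes, signed: dict) -> bool:
    """Recipient: check the decrypted contents against the forwarded tag."""
    expected = hmac.new(franking_key, plaintext, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["summary"]["hmac"])


def server_check_report(plaintext: bytes, franking_key: bytes, signed: dict) -> bool:
    """Server: accept a report only if its own signature and the tag both hold."""
    encoded = json.dumps(signed["summary"], sort_keys=True).encode()
    expected_sig = hmac.new(SERVER_KEY, encoded, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_sig, signed["signature"]):
        return False
    return client_verify(plaintext, franking_key, signed)
```

A recipient would discard any message for which `client_verify` returns `False`, and a report would carry the plaintext, the franking key and the signed summary for `server_check_report`.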
Federation
An example of an actor's devices collection, linked to through a devices property:
An example of a one-time key returned when POSTing to the claim endpoint of a device:
An example of an encrypted message sent to an actor's inbox: