Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIP-104: Double Ratchet (End-to-End Encrypted) DMs #1206

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
205 changes: 205 additions & 0 deletions 104.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,205 @@
# NIP-104

## Double Ratchet E2EE Direct Messages

This NIP defines an encrypted direct messaging scheme that provides double-ratchet E2EE (end-to-end encryption) with forward secrecy & post-compromise (backward) secrecy and allows users to access messages from multiple synced devices.

- **[Live demo](https://drdm-demo.vercel.app)**
- [Demo app code](https://github.com/erskingardner/drdm-demo)
- [Demo app video with explainer](https://share.cleanshot.com/nMKk6cn0)

## Context

Currently, one-to-one direct messages (DMs) in Nostr happen via the scheme defined in NIP-04. This NIP is not recommended because, while it encrypts the content of the message, it leaks significant amounts of metadata about the parties involved in the conversation.

With the addition of NIP-44, we have an updated encryption scheme that improves some (but not all) of the metadata leakage and improves the obfuscation of message content but this NIP stops short of defining a new `kind` number or scheme for doing direct messages using this encryption scheme.

There have been a few separate proposals for new ways to do DMs that do not leak metadata. The most accepted (and recently merged) one is NIP-17 which combines NIP-44 encryption with NIP-59 gift-wrapping to hide the actual direct message inside another set of events to ensure that it's impossible to see who is talking to who and when messages passed between the users. This solves the metadata leakage problem and does allow some degree of deniability/repudiation but doesn't solve forward/backward secrecy. That is to say, if a user's private key (or the calculated conversation key used to encrypt messages) is compromised, the attacker will have full access to all past and future DMs sent between those users.

## Why is this important?

Without proper E2EE, Nostr cannot be used as the protocol for secure direct messaging clients. While clients like Signal do a fantastic job with E2EE, they still rely on centralized servers and as a result can be shut down by a powerful (i.e. state-level) actor. The goal of Nostr is not only to protect against centralized entities censoring you and your communications, but also protect against the ability of a state-level actor to stop these sorts of services from existing in the first place. By replacing centralized servers with decentralized relays, we make it nearly impossible for a centralized actor to completely stop communications between individual users.

## How does it work?

This scheme relies on incrementing (ratcheting) three sets of keys (using 2 independent ratchet mechanisms) on each message such that the two client applications can decrypt messages even if they arrive out of order or get lost and can still function if either party is offline.

This scheme is an adaptation of the Signal protocol, which itself is based on the Off the Record protocol. It uses a slightly simplified version of the [X3DH](https://www.signal.org/docs/specifications/x3dh/) key agreement protocol and the [Signal Double Ratchet algorithm](https://www.signal.org/docs/specifications/doubleratchet).

Let's look at an overview of the steps at the most abstract level. This initial discussion skips over many of the low-level cryptographic details. Go read the two articles linked above and check out the working demo app available on [Github](https://github.com/erskingardner/drdm-demo) for more detail.

1. Alice wants to have a private conversation with Bob. Her client sends an encrypted "conversation request" to Bob. To encrypt this first event, Alice's client creates a shared root key from a combination of Alice and Bob's Nostr keys, an ephemeral Nostr key pair, and a signed pre-key, which is a replaceable event found on Bob's Nostr relays. Because of the way the Diffie-Hellman exchange works, this process also authenticates Alice's identity to Bob. This conversation request event is [NIP-59 gift-wrapped](https://github.com/nostr-protocol/nips/blob/master/59.md) in order to hide as much metadata as possible. This conversation request event (`kind: 443`) is put inside the gift-wrap Seal event. This is a version of the X3DH key agreement protocol.
2. Alice has now given Bob enough information that, when he comes online, his client will see the conversation request and can authenticate that it is indeed from Alice (or, at least, someone who controls Alice's private key). It will then show Bob the conversation request. If he decides to accept this conversation request his client will then, using the shared root key he will also calculate via X3DH, fetch new message events (`kind: 444`), derive the keys needed to decrypt those messages, and decrypt the messages that Alice has sent. Importantly, Alice DOES NOT have to wait for Bob to accept this conversation request before she can start sending messages to him. Her conversation request event plus the tags included in her message events give Bob all the information needed (e.g. the ephemeral public key used to ratchet the shared root key + additional data) in order to derive the keys needed to decrypt her messages and continue the ratchet process.
erskingardner marked this conversation as resolved.
Show resolved Hide resolved
3. The ratchet process continues in line with the [Signal Double Ratchet](https://www.signal.org/docs/specifications/doubleratchet) algorithm. It's important to note that the keys involved here are all symmetrical derived keys that are deleted as the conversation progresses. This includes a mechanism that allows for processing messages that arrive out of order or don't arrive at all.

## Details

### X3DH Key Exchange

For Alice to start a conversation with Bob, we need to have 4 sets of keys.

1. Alice's Nostr keypair (npub/nsec)
1. Alice's Ephemeral keypair (one created just for this initial conversation request)
1. Bob's Nostr keypair (npub/nsec)
1. Bob's Prekey keypair (Published in a `kind:10443` event)

By using a combination of these keys we create a shared secret key that also authenticates that Alice is who she says she is. The prekey (which should be rotated regularly) and ephemeral key (one-time use) provides some degree of replay attck protection.

The shared secret key that is output from this combination of Diffie-Hellman calculations becomes the initial root key (described below) from which all other keys are derived.

### KDF Chains

The double ratchet algorithm used here depends on KDF (key derivation function) chains. If you're not familiar with what those are, the Signal docs have a great [description of KDF chains](https://www.signal.org/docs/specifications/doubleratchet/#kdf-chains), you should read it before moving on.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a tldr?


NB: Both the Diffie-Hellman ratchet and the sending/receiving chain ratchet use symmetric encryption, not asymmetric encryption. This is for very good reasons but is different from how most encryption so far on Nostr has been done (via asymmetric encryption).

### Three Chains

This algorithm depends on keeping 3 KDF chains in sync between the two parties. Each chain's latest output key is used to derive other values.

1. Root Chain Key (RK)
2. Sending Chain Key (CKs)
3. Receiving Chain Key (CKr)

#### Root Chain Key (Diffie-Hellman Ratchet)

The initial key for the root chain is calculated based on the X3DH key exchange described above. From that point onwards, each time the DH Ratchet is turned, a new RK and a new CKs or CKr is output. This ratchet provides post-compromise security. In other words, if an attacker gets steals one participant's CKs and CKr, they can calculate all future message keys and encrypt/decrypt future messages. Ratching the root chain breaks this by creating a new RK and a new key for each sending participant's sending and receiving chains. This ratchet is turned twice each time the active participant changes. For example, if Alice sends 1 message, Bob then sends 3, and Alice sends 1 more, each participant will have performed the DH ratchet 6 times. Twice for each change in sender so that we can calculate a new CKs and CKr.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define RK, CKs, CKr


#### Sending Chain & Receiving Chain

Each participant maintains one sending and one receiving chain. Alice's sending chain is Bob's receiving chain and vice versa. This is how the double ratchet algorithm can maintain message order in an asychronous setting. When either of these chains are ratcheted the output is a new CK and a message key (MK), which is used to encrypt the content of a message. In this way, an attacker that compromises a MK can only read a single message sent between the parties.

### NIP-44 Encryption using Message Keys in `kind:444` Events

The `content` field of a `kind:444` event is the ciphertext of the message being sent. This content is encrypted using the same process as outlined in [NIP-44](https://github.com/nostr-protocol/nips/blob/master/44.md) with one major difference. Instead of calculating a conversation key using the participants Nostr identity keys as the inputs, we MUST use the message key (MK) that is the output of the participants respective sending and receiving chains. This MK changes with each message sent or received, thus providing forward secrecy.

Using [nostr-tools](https://github.com/nbd-wtf/nostr-tools/blob/master/nip44.ts) NIP-44 v2 encrypt method, you'd do the following:

```js
import nip44 from "nostr-tools";
import { expand as hkdf_expand } from "@noble/hashes/hkdf";
import { sha256 } from "@noble/hashes/sha256";

// Use a KDF to output two 32-byte keys

// prevChainKey is the previous KDF chain key for the sending chain
const expanded = hkdf_expand(sha256, prevChainKey, "", 64);

// chainKey is the new chainkey for the sending chain
const newChainKey = expanded.subarray(0, 32);

// messageKey is the key that we'll use to encrypt/decrypt the message content
const messageKey = expanded.subarray(32);

const ciphertext = nip44.v2.encrypt(message_plaintext, messageKey);
```

Decrypting follows the same process to derive keys but then calls `decrypt`.

### Group Conversations (Out of scope)

A variation of this scheme will enable private group conversations with a few important caveats/changes to the security model. **This is beyond the scope of this NIP.**

### Syncing Conversations on Multiple Devices (Out of scope)

This scheme allows for users to register and concurrently use multiple devices to receive/send messages. The protocol ensures that each device uses separate keys and syncs independently. Read more about this concept in Signal's [Sesame protocol docs](https://www.signal.org/docs/specifications/sesame/). Some adaptation for using relays instead of central servers will be required and consistency expectations won't be the same (e.g. we'd use encrypted replaceable events to store some user/device state). **This is beyond the scope of this NIP.**

## New Event Kinds

This NIP introduces three new event kinds. First, we'll look at the two that are related to the conversation itself. These event kinds MUST always be gift-wrapped as per [NIP-59](https://github.com/nostr-protocol/nips/blob/master/59.md) and use [NIP-44](https://github.com/nostr-protocol/nips/blob/master/44.md) encryption for the [Seal (kind 13)](https://github.com/nostr-protocol/nips/blob/master/59.md#2-the-seal-event-kind) and [Gift Wrap (kind 1059)](https://github.com/nostr-protocol/nips/blob/master/59.md#3-gift-wrap-event-kind) events.

This combination might seem like overkill given we're further encrypting the actual content of the messages between the users using a double-ratchet symmetrical encryption scheme but this combination significantly decreases the amount of metadata published and improves deniability/repudiation of the conversation.

### Conversation Request Event

This event is sent when one user wants to start a conversation with another user. For example, if Alice wants to talk to Bob, her first message will be a conversation request event. This message allows both parties to establish a shared root key via a Diffie-Hellman key exchange and allows Alice to derive the first key in the chain to encrypt a message. An encrypted message MUST be included in the conversation request event to ensure that the first ratchet step is performed. The message can be a default message like, "Hi, I would like to connect." if no message is provided by the user.

```json
{
"kind": 443,
"created_at": <unix timestamp in seconds>,
"pubkey": <pubkey of the sender>,
"content": <ciphertext of initial message symmetrically encyrpted using first sending chain/message key>,
"tags": [
["p", <pubkey of the recipient>],
["prekey", <pubkey of the recipient prekey used for DH calculation>],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the tag to recipient_prekey to make it more clear?

["ephemeral", <pubkey of the ephemeral key used for DH calculation>]
],
"sig": "" // No signature because these are only sent inside gift wraps.
}
```

### Double Ratchet Direct Message (DRDM) Event

The Double Ratchet Direct Message (DRDM) event is the main event kind that carries the conversation content. This event carries enough information for recievers to properly double ratchet their chains and decrypt the message content with the right symmetrical keys.

```json
{
"kind": 444,
"created_at": <unix timestamp in seconds>,
"pubkey": <pubkey of the sender>,
"content": <ciphertext of the message symmetrically encrypted with chain/message key>,
"tags": [
["p", <pubkey of the recipient>],
["dh_sending", <current DH sending pubkey>],
["current_index", <number of current message in the current message chain>],
["previous_length", <length of previous message chain>]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When sharing the key between clients/devices, do we also need to share the current rachet state (current_index, previous_length)? Or should a new client start from 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This DOES NOT work out of the box on multiple clients or devices. Each client/device combo should be thought of as a different "inbox" that will have it's own initial setup and will maintain it's own set of chains/keys for each conversation with another person. At a very high level, say I have Primal on iOS and Amethyst on Android and you have just Amethyst on Android. You're client would have to know that it's having the same conversation with two device-clients but using different sets of keys/chains. It would encrypt each message you send for each recipient and then publish those GW events separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to how this sort of scheme works for group conversations as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staab @paulmillr high level description of multiple devices question here... ☝️

you can also read more about how signal does it here: https://www.signal.org/docs/specifications/sesame/

I want to read up a bit more before committing to any path here. In an ideal world we have multi-device conversations, it's significantly more complex for clients. From the beginning I viewed this more as a solution that a dedicated secure messaging client would be using, and less as the normal standard for DMs in social clients. That said, nothing stopping those clients from implementing it, just a question of making the UI good enough.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the case, then I would add a:

Clients MUST not re-use/re-import keys from other clients or devices. The Key is not supposed to leave the app/device it was created from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I need to make this whole device/client limitation more clear in the NIP.

],
"sig": "" // No signature because these are only sent inside gift wraps.
}
```

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clients MUST check if the pubkey of kind 443 and 444 is the same as the Seal's, kind 13, pubkey. Otherwise, impersonators can impersonate. :)

#### Required Tags

- `"p"`: References the recipient
- `"dh_sending"`: The ephemeral pubkey used in the current Diffie-Hellman ratchet.
- `"current_index"`: The index of the current message in the current message chain. For handling out of order messages.
- `"previous_length"`: The length of the previous message chain. For handling out of order messages.

Clients can use any other standard tags they like, for example, `"a"` or `"e"` tags to reference previous messages.

### Prekey Event

The Prekey event is a simple replaceable event where a user publishes their signed prekey that is required to initiate the Diffie-Hellman key exchange to create a conversation request event. Including a signature on this event proves that the holder of the user's main nostr key (which signs the `kind: 10443` event) also has control of the private key associated with the prekey.

In most cases, clients that implement this NIP will manage the creation and rotation of this key on a regular basis. It's recommended that clients do so interactively with user consent in order to avoid overwriting prekeys created by other clients. It is also recommended that clients rotate this key on a regular basis.

```json
{
"kind": 10443,
"created_at": <unix timestamp in seconds>,
"pubkey": <pubkey>,
"content": <hex encoded pubkey of the prekey>,
"tags": [
["prekey_sig", <signature generated from hex encoded pubkey of the prekey>]
],
"sig": <signed normally with the user's main nostr pubkey>
}
```

The `"prekey_sig"` tag value is a Schnorr signature of the sha256 hashed value of the prekey's pubkey, signed with the prekey's private key.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clients MUST verify the prekey_sig before accepting this event.

This allows anyone out there to copy the prekey_sig string and resign it under their own main nostr keys. Shouldn't the signature be a hash of (nostr.pubkey || prekey.pubkey) to avoid the shenanigans of just reusing somebody else's key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my demo I'm verifying the signature, but I'll make it clear that it's a MUST.

Since the prekey has the signature of the user's nostr identity key in the sig field, and the signed value in the prekey_sig tag value, I think you're covered there from imposters? But maybe not? Maybe you can elaborate a little on what you're thinking here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. As it is today, one can just copy any other user's prekey tag and it will be valid. They won't have the private key so maybe they can't do anything with it, but there might be a social attack somewhere. The event doesn't prove the prekey is linked with the main key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. that's a good point. I was thinking validation of the event + validation of the prekey signature would be enough but you're right, they aren't technically linked and it's easy to link them.

I don't think it's strictly necessary given how we do the multiple pairs of DH calculations in the initial key exchange (which does require private key signatures from both the prekey and the identity key) to derive the initial root key but it certainly wouldn't hurt.


```js
const privKey = schnorr.utils.randomPrivateKey();
const pubKey = bytesToHex(schnorr.getPublicKey(privKey));

const prekeySig = bytesToHex(
schnorr.sign(bytesToHex(sha256(utf8Encoder.encode(pubKey)), privKey))
);

const prekeyTag = ["prekey_sig", prekeySig];
```

## Known trade-offs & open questions

1. Messages are decrypted and stored on each device/client pairing. This means that there is no way to load and decrypt historic messages directly from relays on a different client or device. However, you can create backups of your decrypted conversations that could be loaded into another device (this is beyond the scope of this NIP).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do the other protocols (Signal and OMEMO) address this issue? Do they rely on at least one client to be online with the full history to relay it to the new client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in most cases that I've seen there is no historic conversation syncing via the protocol. With some of them there is a way to download/backup chat history and then load that into a new client (Signal does this) but it's not syncing conversations between devices ever.

One thought I had about this was that if Blossom becomes something then it would be a way that we could sync this chat data but I think that's still too early to tell...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://conversations.im/ suggests "If you are installing Conversations on a new device or catching up after being offline for a while, Conversations will use XEP-0313: Message Archive Management to fetch the message history from your server.". Do they mean that it is a new device with the same private key loaded from an old device?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signal does not address the issue. Signal can't download historical messages and doesn't have sync (besides a very trivial one, phone-to-desktop-web that relies on phone-to-web pairing and LAN webservers)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signal does not address the issue. Signal can't download historical messages and doesn't have sync (besides a very trivial one, phone-to-desktop-web that relies on phone-to-web pairing and LAN webservers)

That's partially true. They don't handle syncing but they do have a mechanism to back up the history and then load that file into a new device (done via bluetooth) https://support.signal.org/hc/en-us/articles/360007059752-Backup-and-Restore-Messages

@AndySchroder No idea. I think we have to approach the syncing question from first principles based on how Nostr works instead of just trying to copy whatever XMPP is doing.

2. What are potential attack vectors and results? Do we leak a lot more metadata somehow? Can we publish some set of keys over time (incl. private) to increase plausible deniability?
3. How do we stop large amounts of spam conversation request events? Since they arrive via ephemeral keys we have to decrypt the outer layer of the GW, at a minimum. We can then limit from that point based on WoT?
4. Should we also have an inbox relays event kind that allows users to specify which relays they want used? That definitely makes it more obvious that these GW events are DMs though.

## Resources

- [Early demo video of demo app](https://share.cleanshot.com/nMKk6cn0)
- [Explain it like I'm 5 video on Double Ratchets](https://youtu.be/7uEeE3TUqmU)
- [Signal docs](https://www.signal.org/docs/)
- [OTR blog posts - in 4 parts](https://robertheaton.com/otr1)