Copyright (c) 2013 Michael Mol firstname.lastname@example.org
(insert BSD 3-clause license here)
msgpass: A message format targeted at secure(ish) asynchronous emails.
Email is inherently insecure. Tools such as GPG and S?MIME are nice, but they don't secure certain pieces of metadata. Namely, they don't protect the source address and the subject. Further, the transport process of email results in routing information added to the message, baking additional information subject to analysis.
With the existence of malicious nation-states and other entities which perform wholesale capture and analysis of network traffic, it becomes critical for purposes of individual freedom to maintain the ability to communicate despite these entities, and it becomes further critical for these communications to be as private as possible.
Finally, with the recent shuttering of privacy-minded hosted email services, and with the revelation of fundamental flaws in the privacy and security of email, individuals are left with a glaring hole in the suite of standard communication habits; while it's easy to use protocols such as TLS and OTR to establish secure synchronous communication, there remain no easy and common means to have a more critical service: asynchronous communication.
msgpass is intended to help fill this gap.
msgpass tries to tackle the security and privacy aspects in a few basic ways.
It uses existing PGP/GPG tools as a means to both protect message content and to identify source and destination identities.
It does not expose source identity information in the clear; only the destination identity is shown.
In order to improve on delivery guarantees, it allows for multiple return paths to be specified in the encrypted portion of message headers.
In order to resist attacks based on known (or guessed, such as is used in the CRIME and BEAST attacks against SSL/TLS) cyphertext, nonces are specified at the beginning of the message body, and the message version number is kept in the clear.
In order to ease uptake, the system is explicitly intended to be able to work with existing content distribution systems such as pastebins and NNTP servers.
All fields are packed on 8-bit boundaries, with no padding.
Except where noted, all fields are in network byte order.
The body immediately follows the header.
There are multiple version numbers in the envelope, each corresponding to a depth; in version 1, the Outer version number applies to the header, while the Inner version number apples to the message body.
Several types are reused frequently, such as Blobs and Counts. Blobs are themselves compound types, but are used so frequently that I feel they warrant up-front explanation.
First, the Count data type. The purpose of this type is, simply, to provide a count of values. In the message format, this is often used as a prelude to an array of like objects.
A Count type is always unsigned, and will have the specified number of bits where it's referenced elsewhere in the spec.
Next, the Blob type. The Blob type is intended for data packing. It consists of a Count value followed by as many octets as are specified by that Count. Where the Blob type is referenced, the number of bits for the corresponding Count field will also be specified.
Finally, the String type. Strings are Blob types where the content of the blob is specified as using UTF-8 as its encoding.
|Magic Number||32 bits||The magic number useful for identifying this file type from others.|
|Version Number (Outer)||16 bits||Version of the message outer format. This spec describes version 1.|
|Recipient GPG Key ID||32 bits||The ID of the public key of the recipient.|
Everything following this header is encrypted using the public key associated with the Recipient GPG Key ID.
|Nonce||64 bits||High-entropy random data|
|Message ID||128 bits||Unique identifier for message|
|Version Number (Inner)||32 bits||Version of the message inner format. This spec describes version 1.|
|Sender Public GPG Key||Variable. Blob with 16-bit count||GPG Public Key of sender. See String type.|
|Return Paths||Variable. Blob with 32-bit count.||Depositories for replies to this message|
|Payload||Variable. Blob with 64-bit count.||Content of message body|
|Signature||Variable||GPG signature, using sender's key, of entire body of message|
|Count||8 bits||How many return paths are listed. See Count type.|
|Path 1||Variable. Blob with 32-bit count.||Return Path|
|Path 2||Variable. Blob with 32-bit count.||Return Path|
|Path N...||Variable. Blob with 32-bit count.||Return Path|
TODO: Specify return paths more clearly. Include sample path types.
|Part count||8 bits||How many payload parts to look for|
|Part 1||Variable. Blob with 64-bit count.||Payload part.|
|Part 2||Variable. Blob with 64-bit count.||Payload part.|
|Part N...||Variable. Blob with 64-bit count./td>||Further payload parts as necessary|
|Payload Part ID||16 bits||ID of payload part|
|Payload Part Name||Variable. String with 16-bit count.||Name of Payload|
|Payload Part MIME Type||Variable. String with 8-bit count.||MIME type of payload body|
|Payload Part Body||Variable. Blob with 64-bit count.||Body of the Payload Part|
In order to identify the data on the wire or on disk, a magic number is used. This is fairly common practice. The magic number for this data is yet to be determined.
This field is not encrypted.
Version Number (Outer)
In order to allow for changes in the format on the wire, a version number is necessary. Because the version number defines how to parse the message, it must come early in the byte stream. Because the version number is unlikely to change often, I don't think it likely to reveal much identifying information. Further, because the version number isn't likely to change much, it represents a piece of known data, and thus, were it included near the beginning of the encrypted portion of the stream, would improve the efficacy of known-cyphertext attacks against the message body.
In consideration all of these factors, the version number is kept in the clear.
However, becaue it is neither encrypted nor signed, it can be spoofed. Since this may in the future be used to confuse a message parser, this version number only describes the layout of the outer, unencrypted portion of the envelope.
This field is not encrypted.
Recipient GPG Key ID
The public key ID of the intended recipient. This is the only intended mechanism by which the message's destination should be known. Knowledge of the recipient's GPG key should inform the sender of the appropriate cyphers to use, etc. The message body is encrypted using the public key identified by this key ID. Only the message recipient should be able to decode the message.
This field is not encrypted. All subsequent fields are encrypted as a monolithic block.
Random, high-entropy data. This data has no semantic meaning. Its sole purpose is to help scramble the state of the cypher engine, and to provide uniqueness to multiple copies of the same message sent to a given recipient using varying return paths. As such, if a message is placed in multiple depositories, the nonce should be unique for each placement.
Value uniquely identifying the message. This value should be able to be compared with the same field from another message in order to detect duplicate messages without any further decryption of the message body.
If the same message is placed in multiple depositories, they MUST have common message IDs, regardless of whether or not they have distinct nonces. There is no other defined way to detect identical messages.
Version Number (Inner)
Identifies the format of the message body.
Sender Public GPG Key
This is the sender's public key. Or, at least, the public key used to sign the message. The recipient may, of course, use whatever means they wish to verify the key; without out some out-of-band means to verify the key, there's no way for the recipient to validate the authenticity of the message short of only distributing their own public key to specific parties.
In the event the recipient wishes to reply, this field lists recommended depositories which the sender may check for messages. Examples of depositories might include:
- File servers
A return path might also indicate a different public key which should be used encrypt messages sent via that path. As a consequence, it might also function as a CC/BCC target list for replies, resulting in a default of a "reply-to- all" semantic. It might also serve as a mechanism by which the original sender may seek to encode further information which might trigger decisions upon receipt of reply.
Before this spec is finalized, a proper syntax should be defined.
The message content. Consists of multiple Payload Parts.
Each Payload Part consists of an ID, a Payload Part Name, a MIME type for the Payload Part Body, and the Payload Part Body itself, which contains the actual data.
Payload Part ID
Identifies the Payload Part, so that other Payload Parts which may wish to transclude it may do so. An example of this principle is seen in HTML emails with inline-attached images.
Payload Parts with the same ID, but different MIME types, should be considered different means of representing the same content. In this way, plaintext, HTML and audio forms of the same message might be simultaneously offered.
Payload Part Name
A human-readable description of the Payload Part. If Payload Parts are broken out into separate files, this would be considered the part's filename.
Payload Part MIME Type
A hint as to how the content of the Payload Part Body should be used.
Payload Part Body
The raw meat of a Payload Part.
GPG signature of entire encrypted body of message, using the sender's private GPG key.
(This was originally posted to Google+ at: https://plus.google.com/108080062547354628132/posts/HnjDEkL6thr and contained the first off-the-cuff version of this spec. It's included here for background purposes.)
A very basic, terribly inefficient alternative to email:
- Compose a message to your intended recipient.
- Prefix this message with set of recommended return paths. (A list of pastebins you monitor, for example)
- Prefix this message again the source address. (That address being your public key and key ID.)
- Encrypt the message with their public key.
- Prefix a destination (their public key ID) to the encrypted form of the message.
- Post to a pastebin (or more than one) you expect the desired recipient might check.
In this way, the routing data leak is limited. The return path is not leaked. The message source is not revealed in an encrypted form, and the layer 3 source can be obscured using tools like Tor. And if the pastebin you use is wholly within Tor, even better.
It's not perfect, though. There's an obvious need for indexed searches, and servers providing this functionality provide an obvious vulnerability. And anything that lets you follow packets through the Tor network (say, for example, that a powerful hostile entity spins up a network of a million or so Tor nodes, controlling the majority of nodes in the network, and performs realtime analysis on them) also obviously leaks layer 3 identity information.
And, of course, there's no safe way to do aggregate anti-spam analysis without a few thousand honeypot spam targets with spam/ham classification and subscription-based heuristic data distributed on a push basis...but spammers could easily learn to be more selective which keys they target for spamming based on how long the key has been in the system, and avoiding known honeypot keys and key-age ranges with high honeypot densities.
But I haven't seen anyone else come up with or discuss a mechanism for secure online asynchronous message passing to replace email.