Symmetric Encryption for Swarm Content
It is a natural requirement for many use cases to store private information in Swarm that is accessible only to authorized parties. Since any information stored in Swarm may end up being shared with other nodes, the only way to prevent unauthorized parties from accessing it is encryption. In this case, authorized parties must be in possession of the corresponding decryption key, while unauthorized parties must not.
The objective of this document is to extend Swarm with an encryption suite that would allow the development of decentralized applications that need to store and manipulate large amounts of private data the same way as they are currently able to store and manipulate public data. The only difference between accessing private and public data will be the presence of a decryption/encryption key and the computational overhead related to encryption.
Encryption is to be implemented on top of the DPA layer, with the underlying infrastructure for storing, locating and retrieving chunks unchanged. In particular, this implies that ciphertext chunks fit in the same 4096 + 8 bytes as plaintext chunks. However, in order not to disclose the length of the plaintext, all ciphertext chunks must be padded to full size and the length must also be encrypted.
The computational and storage costs of various operations, such as storing and retrieving full or partial plaintext as well as retrieving or changing parts of raw binary data, manifests and file collections must remain the same as their unencrypted counterparts except for a constant or linear overhead (same big O).
Security-wise, even with chosen plaintexts (including adaptively chosen ones), an attacker must not be able to distinguish ciphertext chunks from random data (independent, uniformly distributed random bits), nor from ciphertext chunks resulting from encrypting other data.
The API should be as close to the unencrypted API as possible. In particular, the only change is that the Swarm hash h_p of the plaintext is replaced by the pair (h_c, k) of the Swarm hash of the ciphertext and the symmetric decryption key.
Plaintext chunks consist of an 8-byte length field encoding the size of the binary blob accessible through a Merkle structure for which this chunk is the root, followed by the chunk payload, which is at most 4096 bytes. Before encrypting, each chunk payload is padded to exactly 4096 bytes, as its actual length can be deduced from the length field as follows:

    payloadLength := length
    while payloadLength > 4096
        payloadLength := payloadLength + 4095
        payloadLength := payloadLength / 4096
        payloadLength := payloadLength * refSize

Here, length is the content of the length field and refSize is the sum of the sizes of the referencing hash value and the decryption key, which is currently 64 bytes, as we use 256-bit hashes and 256-bit keys.
This procedure can be used to remove the padding after decryption before returning the plaintext chunk. To frustrate keyspace search, the padding must be random. This way, the only distinguishing feature of a plaintext chunk is length being much smaller than 2^64, but for any ciphertext, there will be a large number of keys (well over 2^192) resulting in such plaintexts.
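The length rule and the random padding described above can be sketched in Python as follows. This is an illustrative sketch, not the Swarm implementation: the function and constant names are hypothetical, and `secrets.token_bytes` is assumed to be an acceptable source of random padding.

```python
import secrets

CHUNK_PAYLOAD = 4096  # maximum payload bytes per chunk
REF_SIZE = 64         # 32-byte hash plus 32-byte key per reference


def payload_length(length: int) -> int:
    """Meaningful payload bytes in a chunk whose length field holds `length`."""
    n = length
    while n > CHUNK_PAYLOAD:
        # Move one level up the tree: one reference per (partly) filled chunk below.
        n = (n + CHUNK_PAYLOAD - 1) // CHUNK_PAYLOAD * REF_SIZE
    return n


def pad_payload(payload: bytes) -> bytes:
    """Fill the payload up to exactly 4096 bytes with random bytes."""
    if len(payload) > CHUNK_PAYLOAD:
        raise ValueError("payload exceeds chunk size")
    return payload + secrets.token_bytes(CHUNK_PAYLOAD - len(payload))


def unpad_payload(length: int, padded: bytes) -> bytes:
    """Drop the padding, using only the value of the length field."""
    return padded[:payload_length(length)]
```

For example, a length field of 4097 describes a root chunk holding two references, so its payload is 128 bytes long and the remaining 3968 bytes are padding.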
Chunks are encrypted and decrypted using a stream cipher seeded with the corresponding symmetric key. In order not to increase the attack surface by introducing additional cryptographic primitives, the stream cipher of choice can be SHAKE256 as defined in FIPS-202 as it relies on the security of the same Keccak sponge function as used in Swarm hash. Another attractive alternative is to use SHA3 in CTR mode (i.e. hashing together the key with a counter), which is considerably slower than SHAKE256, but has the desirable property of being easier (and cheaper) to implement in EVM, lending itself to use in smart contracts constraining the plaintext of encrypted Swarm content.
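Both keystream options can be sketched with Python's standard `hashlib`. The details here are assumptions made for illustration: the key is absorbed directly as the SHAKE256 input, the counter is encoded as 4 big-endian bytes appended to the key, and `hashlib.sha3_256` is the FIPS-202 variant, which differs in padding from the Keccak-256 exposed by the EVM. All function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 8 + 4096  # length field plus padded payload


def shake256_keystream(key: bytes, n: int) -> bytes:
    """Keystream taken from SHAKE256 seeded with the key."""
    return hashlib.shake_256(key).digest(n)


def sha3_ctr_keystream(key: bytes, n: int) -> bytes:
    """Keystream built by hashing the key together with a block counter."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha3_256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:n])


def encrypt_chunk(key: bytes, chunk: bytes, keystream=shake256_keystream) -> bytes:
    """XOR a full-size chunk with the keystream derived from the key."""
    if len(chunk) != CHUNK_SIZE:
        raise ValueError("chunk must be padded to full size before encryption")
    stream = keystream(key, CHUNK_SIZE)
    return bytes(a ^ b for a, b in zip(chunk, stream))


# XOR with the same keystream is its own inverse.
decrypt_chunk = encrypt_chunk
```

Because the length field is part of the 4104 bytes being XORed, it is encrypted along with the payload and the ciphertext has the same size as the plaintext chunk.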
It is important to emphasize that encrypted Swarm chunks are not different from plaintext chunks, and therefore there is no change whatsoever on the P2P protocol level. The proposed encryption scheme is end-to-end, meaning that encryption and decryption are done on endpoints or protocol gateways.
DPA Put and Get
At the DPA layer, the encrypted version of Put has the same argument (the plaintext chunk) as its unencrypted counterpart. It generates a random encryption key, pads and encrypts the plaintext chunk with it and submits the ciphertext to the unencrypted Put, returning both the Swarm hash returned by it and the encryption key. In order to guarantee the uniqueness of encryption keys as well as to ease the load on the OS's entropy pool, it is recommended (but not required) to generate the key as the MAC of the plaintext using a (semi-)permanent random key stored in memory.
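The recommended key derivation might look like the sketch below. The text does not name a MAC, so HMAC with SHA3-256 is an assumption here, as are the names `_SESSION_SECRET` and `derive_chunk_key`.

```python
import hashlib
import hmac
import secrets

# Semi-permanent secret held only in memory; drawn once from the OS entropy pool.
_SESSION_SECRET = secrets.token_bytes(32)


def derive_chunk_key(plaintext_chunk: bytes, secret: bytes = _SESSION_SECRET) -> bytes:
    """Derive a 256-bit chunk key as a MAC of the plaintext under the secret."""
    return hmac.new(secret, plaintext_chunk, hashlib.sha3_256).digest()
```

The key depends on both the in-memory secret and the chunk content, so no fresh entropy is drawn per chunk.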
The encrypted version of Get takes a reference (consisting of a ciphertext hash and a decryption key) as its argument instead of just the hash in the unencrypted version. It calls the unencrypted Get with the ciphertext hash, retrieves the ciphertext chunk from the DPA and decrypts it using the supplied decryption key, returning the plaintext chunk.
Higher protocol layers
The APIs of the higher protocol layers do not change except for accepting 512-bit references (consisting of a Swarm hash and a decryption key) in place of the 256-bit Swarm hash in the unencrypted version.
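As a sketch of how a higher layer might tell the two reference sizes apart, the hypothetical helper below splits a hex-encoded reference into its hash and optional key. Hex encoding and the function name `parse_reference` are assumptions, not part of the proposal.

```python
HASH_SIZE = 32  # 256-bit Swarm hash
KEY_SIZE = 32   # 256-bit decryption key


def parse_reference(ref_hex: str):
    """Return (hash, key); key is None for an unencrypted 256-bit reference."""
    raw = bytes.fromhex(ref_hex)
    if len(raw) == HASH_SIZE:
        return raw, None
    if len(raw) == HASH_SIZE + KEY_SIZE:
        return raw[:HASH_SIZE], raw[HASH_SIZE:]
    raise ValueError("reference must be 32 or 64 bytes long")
```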
There will be two alternative ways of passing the encryption-decryption key to command-line utilities:
- Directly in a command-line argument (unsafe, but useful for testing)
- By passing the path to a file containing the key in a command-line argument, in which case the file's access control will take care of security.
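The two ways of passing the key could be exposed as mutually exclusive options, as in the sketch below. The option names `--key` and `--key-file`, the program name, and the hex encoding of the key are all hypothetical; the actual utilities may choose differently.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="swarm-example")
    group = parser.add_mutually_exclusive_group(required=True)
    # Unsafe: the key ends up in shell history and process listings.
    group.add_argument("--key", help="hex-encoded key (for testing only)")
    # Safer: protection is delegated to the file's access control.
    group.add_argument("--key-file", type=Path, help="path to a file holding the hex-encoded key")
    return parser


def load_key(args: argparse.Namespace) -> bytes:
    """Read the 256-bit key from whichever option was supplied."""
    text = args.key if args.key is not None else args.key_file.read_text()
    key = bytes.fromhex(text.strip())
    if len(key) != 32:
        raise ValueError("key must be 256 bits long")
    return key
```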