
Support for file encryption (e.g. non-trusted servers) #109

Natanji opened this Issue Apr 4, 2014 · 137 comments



Natanji commented Apr 4, 2014

So I have had a look at BitTorrent Sync, syncthing, and alternatives, and one thing I have always wondered about is the possibility of syncing not only between resources I own and trust, but also to external resources/servers which I do NOT trust with my data, at least up to a certain extent.

One way to do this is using ecryptfs or EncFS, but that has obvious downsides: it is not an interoperable solution (it only works on Linux), the files are stored in encrypted form on disk even when the resource is trusted and this is unnecessary (for instance because the file system is already encrypted), etc.

What I propose is somehow configuring nodes which are only sent the files in an encrypted format, with all file contents (and potentially file/directory names as well; or even permissions) being encrypted. This way, if I want to store my private files on a fast server in a datacenter to access them from anywhere, I could do this with syncthing without essentially giving up ownership of those files. I could also prevent that particular sync node from being allowed/able to make any changes to the files without me noticing.

I realize that this requires a LOT of additional effort, but it would be a killer feature that seems to not be available in any other "private cloud" solution so far. What are your thoughts on this feature?

EDIT: BitTorrent sync mentions a feature like this in their API docs: "Encryption secret
API users can generate folder secrets with encrypted peer support. Encryption secrets are read-only. They make Sync data encrypted on the receiver’s side. Recipients can sync files, but they can’t see file content, and they can’t modify the files. Encryption secrets come in handy if you need to sync to an untrusted location." (from



jewel commented Apr 4, 2014

This would be amazing. I tried to spec out what this might look like in this clearskies extension, but it adds so much complexity that I've tabled plans for it for now.

Like you say, if only the file contents are synchronized to the "untrusted" peers, that would be a lot simpler to implement (i.e. the metadata never hits the untrusted peer in any form). I hadn't thought of that.



Natanji commented Apr 4, 2014

It seems like you even thought of a zero-knowledge proof to show that the server is legitimate/actually stores the files (did I understand that correctly?). Not bad.

CTR mode sounds like an extremely bad choice to me, as do other stream-cipher modes such as GCM. Yes, it is seekable and that is useful, but XORing two encrypted snapshots against each other will show an adversary exactly what changed between the plaintexts of those two versions. CBC is a much better choice: when seeking, you may need two blocks of ciphertext to decrypt the first block of plaintext, but that is usually negligible because you will read more than one block anyway, and the more you decrypt the smaller the relative overhead.
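A minimal sketch of the keystream-reuse problem described above, with plain Python XOR standing in for a real CTR cipher (the file contents and names here are illustrative):

```python
# Sketch (not Syncthing code): why re-encrypting a modified file with the
# same CTR keystream leaks exactly which bytes changed.
import os

def xor_keystream(data: bytes, keystream: bytes) -> bytes:
    # CTR mode reduces to XOR with a key/nonce-derived keystream.
    return bytes(d ^ k for d, k in zip(data, keystream))

keystream = os.urandom(32)              # same key + same counter start
v1 = b"account balance: 00100 dollars!!"
v2 = b"account balance: 99999 dollars!!"

c1 = xor_keystream(v1, keystream)
c2 = xor_keystream(v2, keystream)

# An observer XORs the two ciphertexts: the keystream cancels out and
# the result is the XOR of the two plaintexts -- zero wherever the file
# did not change, non-zero exactly at the modified bytes.
diff = xor_keystream(c1, c2)
changed = [i for i, b in enumerate(diff) if b != 0]
print(changed)  # positions of the edited bytes, recovered without the key
```

With CBC (or a fresh nonce per version), the two ciphertexts would be unrelated and `diff` would reveal nothing.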

I don't really understand why encrypting everything - including metadata - should somehow be "easier" or simpler to implement. Maybe I'm misunderstanding you? What do you mean?



jewel commented Apr 4, 2014

I think you might have misunderstood, I was trying to say that it'd be simpler to implement if the metadata isn't synced.

Thanks for the feedback on CTR mode, I wasn't aware that seeking was possible with CBC mode.




calmh commented Apr 5, 2014

I could see how this would be useful. As you say, it would require some work because it's currently not part of the design at all - an additional layer of encryption would be needed. There would obviously be some trade-offs between privacy and efficiency, i.e. if a block changes in the middle of a large file, do we resynchronize just that block and leak that fact, or re-encrypt the entire file, etc.

@calmh calmh added the far-future label Apr 5, 2014




calmh commented Apr 5, 2014

Also slightly related to #62, which is similar functionality minus the encryption (i.e. for when we trust the node with the data, just not with modifying it).



NickPyz commented Apr 7, 2014

This idea is particularly useful for people who would use syncthing to set up a network of devices and require one of them to be available 24 hours a day. Let's say all the devices are trusted except for the always-on device, which is a third-party VPS server.

In this case, it would be desirable to build some additional properties into syncthing so that the VPS node has the following characteristics:

READ ONLY (can't change any data)
ENCRYPTED (so the VPS personnel can't see the data).

No doubt this adds complexity and performance hits to support the encryption, especially if this project eventually extends to devices that don't support hardware-based encryption, such as most current smartphones.



kylemanna commented Apr 13, 2014

Tahoe-LAFS has this feature, and it would be awesome to have a more usable implementation (I find the Tahoe-LAFS WebAPI very painful and difficult to use).

It has the notion of "storage nodes" that hold chunks of distributed encrypted data. In the default configuration, a file is erasure-coded into 10 chunks on different storage nodes, any 3 of which can restore it.
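The erasure-coding idea can be illustrated at toy scale with a 2-of-3 XOR parity scheme. This is not Tahoe-LAFS's actual Reed-Solomon coding, just the smallest possible example of the principle that any k of n shares reconstruct the data:

```python
# Toy 2-of-3 erasure code: two data shares plus an XOR parity share.
# Any two of the three shares are enough to reconstruct everything --
# the same principle as Tahoe-LAFS's default 3-of-10, at minimal scale.
def make_shares(a: bytes, b: bytes):
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]

def recover(share_x: bytes, share_y: bytes) -> bytes:
    # XOR of a data share and the parity share restores the missing
    # data share (since parity = a XOR b).
    return bytes(x ^ y for x, y in zip(share_x, share_y))

a, b = b"first 4 KiB chunk", b"second 4KiB chunk"
shares = make_shares(a, b)
# Lose share 0 (one storage node is down); recover it from the rest:
print(recover(shares[1], shares[2]) == a)
```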

It would be nice if syncthing could support the distributed fractional concept as well, but that sounds like a topic for another issue. It may be out of scope too, hopefully not :)



Natanji commented Apr 16, 2014

Tahoe-LAFS sounds pretty much exactly like what we want - what an incredible find, I hadn't heard of it. Thanks, @kylemanna :)

The way I see it, syncthing already has the functionality to keep directories in sync and perform upload/download operations between the nodes whenever something changes. So the feature we want might not be that far out of reach: whenever a file changes, then we have to call the API of tahoe-lafs and upload/download the file.

I agree that we should start with a configuration where files are simply replicated completely on all foreign nodes. Fractional distribution can be added later if this setup turns out to work well.

The solution would also work on both Windows and Linux, which is a huge plus! And we don't have to do any crypto storage of our own, which would most probably turn out to be a failure anyway, I presume. :)

Sooo... anyone see a problem with this approach yet, from a design perspective? @calmh, do you think syncthing's design is compatible with tahoe-lafs?




calmh commented Apr 16, 2014

Sure. However if Tahoe-lafs is a "cloud storage system", then perhaps that is all you need and syncthing doesn't add much to the equation if you already have that up and running?



kylemanna commented Apr 16, 2014

I played with Tahoe-lafs for a while and it doesn't really do what I want. The major deal breaker for me was that the storage nodes don't work behind NAT. Everything I could find suggested that I needed to do port forwarding and tunneling of some sort. I'd imagine that a significant portion of the user base for syncthing is behind a NAT.



mcg commented May 12, 2014

These days, without some form of encrypted/untrusted node support, Syncthing is probably going to be unusable for some portion of users. One of the reasons I chose BT Sync over other solutions was its support for this.



elimisteve commented May 17, 2014

This feature would be great because it'd allow me to replace the rather bulky and Mono dependency-laden SparkleShare with syncthing, which is much easier to set up :-).

SparkleShare has been working well for me, though.



EvdH0 commented May 18, 2014

This would indeed be a great feature!



menelic commented May 19, 2014

This would indeed be a great feature, especially if it could be defined on a folder and/or file level as in BTSync. I'd argue adding this feature is part of the "BTSync-replacement" goal. This would add complexity for sure, but it would be great to have one Syncthing interface from which I can manage my synchronised shares with people who are supposed to have access as well as with locations which are not supposed to either have access or be able to see the files. As a VPS user, this would be great for me - and surely for a lot of others as well.




bigbear2nd commented May 28, 2014

For me, this feature is the only thing keeping me from switching from BTSync to Syncthing.

How the encryption works in BTSync is described here in detail:

The use case for me: I can store data at a friend's home or on my family members' PCs without having to worry that they can access my data. Additionally, I can store data for them and not be able to access it myself.

The more people who have my data, the faster my download/upload and the spreading of data is. Additionally, the safer my data is.

Syncing data, for me, is not only about having it available; it has become data safekeeping as well.




jedie commented May 28, 2014

Can closed source projects ever offer security? Keyword: verifiability...

And should I really sync important files to an untrusted location?

Just my 2¢...



Natanji commented May 28, 2014

Syncthing is open source. That's the point. That's why I don't want the closed-source BTSync, but this functionality in an open source project such as syncthing.

Syncing important files to untrusted locations is usually not a problem when they are encrypted+signed. Or where do you see the problem?





bigbear2nd commented May 29, 2014

Quote: Can closed source projects ever offer security? Keyword: verifiability...

That's why I want to change.

Quote: Syncing important files to untrusted locations is usually not a problem when they are encrypted+signed.

I totally agree with that.
But for me, I would say that my family members' and friends' computers are kind of "half-trusted" locations.



nadalle commented Jul 20, 2014

One issue with the clearskies proposal is that it only addresses encryption of file data, not metadata about the file like name length, file size, etc.

If you really don't trust the remote storage, this is not sufficient -- it's often possible to tell what's in a directory tree just by looking at the file sizes, for example. Encrypting file systems try to mitigate this somewhat by padding files up and so forth, but dealing with the remote security issues may be rather hard.

At minimum, you probably want to think about randomizing the file locations in the tree and padding the files. Better would be to break them up into blocks and scatter them around in fixed size storage chunks that the remote end doesn't know anything about.



nadalle commented Jul 21, 2014

To elaborate a little bit, you can't entirely eliminate the data leakage if you're storing in a completely untrusted location. For example, at an absolute minimum, someone who can watch your traffic can tell how much data you change (and thus need to sync) every hour/minute/etc.

But systems that just encrypt the file data (and hopefully the names) leak a lot more. For example, say I just stored the new Weird Al album in my share. Even encrypted, with sizes rounded up to 16-byte boundaries, the directory contains files of these sizes:

    By track     By size
 1.  7151712     5497664
 2.  9123472     5822608
 3.  5822608     7032608
 4.  5497664     7151712
 5.  9032544     7159040
 6.  8931184     7856016
 7.  9947920     8931184
 8. 10858000     9032544
 9.  7159040     9123472
10.  7856016     9947920
11.  7032608    10858000
12. 21923472    21923472

Probably no other set of files will show this pattern. So it's pretty easy for an adversary with a database of these things (they exist) to tell that I have a Weird Al album there.

You might assume that the sort order of the files will be scrambled, but of course the tool probably uploaded them in order (so they can get it from the CTIME). Even if it didn't, the file sizes are nearly as good in sorted order. You might try to store the files in random locations in the directory tree (better), but that has the same CTIME problem.
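A quick sketch of the argument, using the track sizes from the table above: rounding to a 16-byte cipher-block boundary leaves the size fingerprint fully intact, while coarse padding buckets (4 MiB here, an arbitrary illustrative choice) at least blur it:

```python
# Sketch: block-cipher padding (16-byte boundary) barely disturbs a size
# fingerprint, while coarse padding buckets make many files look alike.
track_sizes = [7151712, 9123472, 5822608, 5497664, 9032544, 8931184,
               9947920, 10858000, 7159040, 7856016, 7032608, 21923472]

def pad_to(n: int, boundary: int) -> int:
    return -(-n // boundary) * boundary   # round up to a multiple

cbc_padded = sorted(pad_to(s, 16) for s in track_sizes)
coarse     = sorted(pad_to(s, 4 * 1024 * 1024) for s in track_sizes)

# After 16-byte padding the 12 sizes are still 12 distinct values --
# as good as a fingerprint. After 4 MiB padding they collapse into a
# few buckets shared with countless other file sets.
print(len(set(cbc_padded)), len(set(coarse)))
```

The trade-off, of course, is wasted storage and bandwidth, which is why the block-scatter approach below is more attractive.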

If you really want to have much hope of a secure system here, you really want to avoid storing the data in files entirely. One simple way to think of this is to break all the data you want to sync into 4k blocks, and then have the untrusted side store a database of SHA256 hash -> encrypted 4k block. You do updates by sending new blocks, and then giving the remote store a manifest of which blocks are still needed (the data about file names and the map of blocks to files is itself stored in encrypted 4k blocks hidden in the data). The layout of the database is now mostly irrelevant, since the protocol just talks in terms of hashes and manifests.
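A rough sketch of such a block store, with all names hypothetical and a placeholder `encrypt` callback standing in for a real cipher:

```python
# Sketch (hypothetical API): the "SHA256 hash -> encrypted 4 KiB block"
# store described above. The untrusted side only ever sees opaque keys
# and opaque blobs; file names and the block-to-file map live inside
# encrypted blocks themselves.
import hashlib

BLOCK = 4096

class UntrustedStore:
    """What the remote runs: a dumb hash -> blob database."""
    def __init__(self):
        self.db = {}
    def put(self, key: bytes, blob: bytes):
        self.db[key] = blob
    def get(self, key: bytes) -> bytes:
        return self.db[key]
    def gc(self, manifest: set):
        # Drop blocks no longer referenced by any manifest.
        self.db = {k: v for k, v in self.db.items() if k in manifest}

def sync_file(data: bytes, encrypt, store: UntrustedStore) -> list:
    """Trusted side: split, encrypt, upload; return the manifest."""
    manifest = []
    for i in range(0, len(data), BLOCK):
        blob = encrypt(data[i:i + BLOCK].ljust(BLOCK, b"\0"))  # pad to 4 KiB
        key = hashlib.sha256(blob).digest()
        store.put(key, blob)
        manifest.append(key)
    return manifest
```

Updates then reduce to "put these new blocks, here is the new manifest, garbage-collect the rest"; the remote layout reveals only block count and churn.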

You'll note that this is starting to look a lot like a filesystem in its own right. I think something like this is probably needed to have a reasonable level of security.



Natanji commented Jul 24, 2014

Well, the question certainly is what counts as "reasonable". There are file systems like EncFS and ecryptfs which expose the same problems you mention here, but are still widely used - especially for cloud storage. If syncthing can do this just as well as those state-of-the-art systems, that is a big leap forward!

Security is never absolute, but relative to a use case. Leaking the alphabetical order can easily be circumvented by shuffling the order in which files are uploaded - that is a good idea. Leaking file sizes can lead to some exposure, but for most use cases leaking your musical preferences will not be the end of the world. Files with private data in them, however, would still benefit a hundred percent from having just their names and contents encrypted, as in EncFS or ecryptfs.

Don't get me wrong: it is important to think about these issues. But we don't have to come up with a perfect solution that exposes absolutely nothing under no circumstances ever. If a perfect solution fulfilling 100% of the use cases means so much work, then it should be fine to opt for a much less complicated option that just fits 95% of use cases - at least for now, until a better option is available.

Perfectionism is the greatest enemy of open source progress. ;) As long as you inform your users of the security implications, e.g. what does not get protected and what does, it's completely legitimate.

@calmh calmh added enhancement and removed far-future labels Jul 26, 2014



Phyks commented Aug 3, 2014


I'm really interested in Syncthing, but client-side encryption is a major feature for me, as I want to sync my files to my dedicated server (which hosts several other services), and thus I don't want to risk having any sensitive files unencrypted on such a server.

I read this issue and saw that this feature is in progress. But do you know of any working setups usable as of today? For example, using an EncFS or ecryptfs container which could be automatically mounted and unmounted before/after each synchronization, or something similar? (Just for basic file-content encryption, while waiting for a better solution implemented directly in syncthing.)

Thanks!



Finkregh commented Aug 10, 2014

I'd offer 40 EUR if I could replace tahoe-lafs with syncthing... 👍



elimisteve commented Aug 10, 2014

I'd put money in, too.

What don't you like about Tahoe-LAFS specifically?



Finkregh commented Aug 10, 2014

Well, I like tahoe-lafs very much concept-wise, but putting in files and the whole setup is a pain :/

I need something that I can just install on my mother's PC ;)



Finkregh commented Mar 2, 2015

@cydron FUSE:

  • It makes things more complicated/layered. The beauty of syncthing at the moment is that I can use my normal filesystem, run one binary (which even updates itself, how cool is that) and I am about done. Mangling all files through another layer (FUSE) adds complexity which I suppose is not that easy to hide.
  • It won't run on Windows as far as I know (I might be wrong, and there may be some easy-to-use FUSE implementation that runs on Windows ;) )
  • (If I want something complex which I can only read/write via ftp/fuse/... I use Tahoe-LAFS; I love their concept, though.)


generalmanager commented Mar 2, 2015


Well if it's not at the protocol layer, then I don't even see how you could make this secure, as your shady VPS provider could grab your syncthing private key and start asking for unencrypted data from other peers.

That's why it would be easier to not send anybody plaintext ever.

Which isn't too difficult if we use symmetric encryption for the data, and do the following:

1. Store only encrypted files on trusted AND untrusted nodes, give the key only to trusted nodes and present the decrypted files via something like fuse. If we don't want fuse, the trusted devices have to store the encrypted AND the decrypted blocks, doubling the storage requirements.

If we only want to store plaintext on trusted nodes, it becomes more complex:

2. Never send unencrypted blocks over the wire to anybody. Trusted peers receive the key to decrypt the encrypted blocks via a secure channel once. (This could be PGP-encrypted email, SCP, someone walking from machine to machine with a thumbdrive, etc., depending on the threat model.) Trusted peers only store plaintext, but they have to store the mapping between encrypted blocks and plaintext blocks, which means we need to store twice as many hashes. The trusted nodes also have to decrypt/encrypt all incoming/outgoing blocks on the fly, which makes this more resource intensive.

The details of the implementation are certainly going to be interesting. Since we should use something like AES-GCM, the question arises: how can we make sure that the nonce is the same on all machines AND never used twice? This could be done by using a pseudo-random number with 96 bits of entropy for each block, because it is highly unlikely that the same number will be generated twice. The nonce is then sent unencrypted along with the ciphertext; that's fine, because the nonce doesn't need to be secret. But it means that while a trusted node can decrypt a block, compare its hash to its list of hashes of decrypted blocks, and thus throw the block away if it already exists, the block still has to be sent over the wire. It also means that untrusted nodes have to store files multiple times if they are added separately on out-of-sync nodes.
This obviously sucks.
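To make the dedup problem concrete, here is a toy sketch of option 2. A hashlib-derived keystream stands in for AES-GCM so the example is self-contained (and, unlike GCM, provides no authentication); a real implementation would use an AEAD cipher:

```python
# Sketch: why a fresh random nonce per block defeats ciphertext-level
# deduplication on untrusted nodes.
import hashlib, os

def encrypt_block(key: bytes, plaintext: bytes, nonce: bytes) -> bytes:
    # Toy counter-mode keystream from SHA-256 -- a stand-in for AES-GCM.
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body   # the nonce travels in the clear, as described above

key = os.urandom(32)
block = b"the very same 128 KiB block" * 4

# Two trusted nodes encrypt the same block independently:
c1 = encrypt_block(key, block, os.urandom(12))
c2 = encrypt_block(key, block, os.urandom(12))

# Different nonces -> different ciphertexts -> the untrusted node
# cannot tell the blocks are duplicates and must store both.
print(c1 == c2)  # almost surely False: independent random nonces
```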

But this could be solved with deterministic encryption (where the same input always produces the same output for a given key). If the same plaintext always produces the same ciphertext, the untrusted nodes can compare the hashes of ciphertext blocks, so they don't store files multiple times if those were added on different trusted machines while offline. And trusted hosts can compare a list of hashes of encrypted blocks to their own list, which means they don't waste traffic on files they already have. (Note: I used the term "deterministic encryption" a bit misleadingly here. AES, for example, is deterministic, but is made non-deterministic by using different IVs/nonces.)

Intuitively one could think about something like this:

3. Nearly everything is the same as in 2., but instead of a completely random nonce we use (the first 96 bits of) the hash of the unencrypted block (plus a shared secret, to protect against file-confirmation and similar attacks) as the nonce.
This way the ciphertext is always the same for identical plaintext blocks, but it leads us into the barren lands of under-researched crypto and doesn't sound like a good idea:

If anybody has more information on how this can be done securely, I'd be happy to hear about it.

So we are left with two options: asymmetric deterministic encryption and convergent encryption.
Let's take a look at asymmetric deterministic encryption first:

4. If we instead choose to use deterministic asymmetric encryption (like RSA without padding), we would have to create one shared key pair. This would itself have to be distributed to all trusted devices (just as the key in 2. has to be).

However, there don't seem to be any widely used encryption schemes using deterministic encryption. Apparently some interesting work was presented at the Crypto 2007 and Crypto 2008 conferences, but I haven't really looked into this yet.

Also I am not sure if there are any modern encryption schemes with elliptic curves which would allow for fast asymmetric encryption with small keys.

Any information on those topics would be appreciated.

5. In convergent encryption, files are encrypted with their own hash (more precisely: with their hash plus a static shared salt).
This gives us the very interesting property that it is possible to determine whether two encrypted blocks are identical without decrypting them first. This way untrusted nodes don't need to store some files multiple times, and trusted nodes can check whether an encrypted block exists without having to transfer and decrypt it first.
That's what Tahoe-LAFS and MaidSafe use, and it seems to be one of the best ways to do this kind of thing.
A good writeup can be found here:
Now we've won, right? Not quite, because how do the trusted nodes get the keys (hashes) to decrypt files?
We will send the hashes of the unencrypted blocks along with the encrypted blocks; the hashes are themselves encrypted with AES-GCM and a random nonce, and the nonce is then appended to the encrypted hashes.
Convergent encryption has the problem that hashes of public files, or of files the attacker knows, can be precomputed and the ciphertexts compared. This means an attacker could prove that you stored a forbidden book/pirated movie/mp3 in your encrypted files. This and another, more nuanced attack can be prevented by adding a static salt to the cleartext of a file before creating the hash with which the file will be encrypted.
More details:
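A toy sketch of the salted convergent scheme, again with a hashlib keystream standing in for a real cipher (this is the principle only, not how Tahoe-LAFS actually implements it):

```python
# Sketch of salted convergent encryption: the key for each block is
# H(salt || plaintext), so identical plaintexts yield identical
# ciphertexts under the same salt, and the salt blunts
# confirmation-of-file attacks across clusters.
import hashlib

def convergent_encrypt(salt: bytes, plaintext: bytes) -> tuple:
    key = hashlib.sha256(salt + plaintext).digest()  # deterministic key
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return key, ciphertext   # the key is sent separately, itself encrypted

salt = b"cluster-wide shared salt"
k1, c1 = convergent_encrypt(salt, b"identical block contents")
k2, c2 = convergent_encrypt(salt, b"identical block contents")

# Same plaintext + same salt -> same ciphertext: untrusted nodes can
# dedup by comparing ciphertext hashes, without ever seeing plaintext.
print(c1 == c2)  # True
# A different salt (e.g. another cluster) yields unrelated ciphertext,
# so precomputed ciphertexts of known public files don't match.
_, c3 = convergent_encrypt(b"other salt", b"identical block contents")
print(c1 == c3)  # False
```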

Also I don't see how de/encrypting at the block level via fuse suddenly solves the "know how data changes over time" problem.

It doesn't, I just wanted to point out that we have to be very careful which ciphers we use, because most are not resilient enough for a threat model where the adversary can see the ciphertext change over time. And if no node can be tricked into sending the cleartext, because they all just work with the ciphertext, which is stored on disc, that's a big bonus. At least if we don't want to store the ciphertext AND the cleartext on all trusted devices, effectively doubling the storage needs.

FUSE is certainly not the solution to all problems, but it's one of the easier ways to allow users on trusted nodes transparent access to the encrypted data with low implementation effort from syncthing, no storage overhead and only computing overhead when the files are accessed or changed. And it supports nearly all platforms.

It's possible to use FUSE on windows, as was previously discussed in this thread. But a second tool called Dokan/DokanX has to be installed.

Could you take a look at this? If I made a mistake, it should be corrected as quickly as possible. Thanks!



bobobo1618 commented Mar 2, 2015

You're really worrying me with the focus on FUSE.

it's one of the easier ways to allow users on trusted nodes transparent access to the encrypted data

For syncthing developers, maybe - but try getting your grandma, or even a mildly technologically proficient friend, to set up EncFS on Windows with Dokan, or try grabbing a few of your home files while you're at work or school and don't have admin access to your machine.

no storage overhead and only computing overhead when the files are accessed or changed

On platforms with few computing resources, like the Raspberry Pi or Android devices, storage is often more available than computing power.

And it supports nearly all platforms.

Show me a FUSE filesystem running on:

  • BSD
  • Android
  • iOS

BSD might not be a big deal for you, but I doubt I'm the only one running this on a NAS. The latter two are pretty important to most users.

I don't think the tradeoffs here are nearly worth it. Dropping support for platforms, devices and use cases isn't a good decision for usability and adoption of the project. The current easy 'run a file' way Syncthing works is great and I think it should stay that way.




AudriusButkevicius commented Mar 2, 2015

@generalmanager I understand very little about crypto and I am not the most clever man on the planet to start with, but my initial ideas were as follows.
It does require a bit of understanding how syncthing works under the hood I guess, but you seem to be "on the ball".

Most of it echoes what you already said, but probably in more implementational terms.
Verify that you agree with what I said, and fill in any blanks if you have any answers.

First let's start with the fact that we have a constraint:
We are using a fixed block size (128 KiB), hence for N bytes of input our ciphertext has to be N or more bytes but not more than 128 KiB, which sort of corners us into using AES-ECB (Google might lie, and there might be something better?).

My basic ideas which might not be secure, but should however make getting plaintext data harder:

Plan A (allows reusing blocks, leaks info about two identical blocks/files):

  1. Have a shared secret [1] .. (which as you say is shared via a postal pigeon or whatever)
  2. Encrypt sensitive metadata [2] in indexes with AES-whatever perhaps using first N bits of the first block hash as the IV (this one is not constrained by length limits as much)
  3. Sign Index/IndexUpdate messages using the shared secret (file versions changing will act as a random nonce) preventing replay attacks by an attacker advertising some random old signed index to the cluster (newest version wins) causing DoS.
  4. In the Index/IndexUpdate messages, leave block hashes unencrypted. So plaintext hashes are on the encrypted machine. They will allow you to distinguish two identical 128 KiB blocks among any files, regardless of how they are stored on disk. (As well as potentially enabling a known-plaintext attack, given you have a plaintext block with the same hash? Some random license file in some git checkout, for example...)
  5. As other peers start asking us for data for some random block (by hash) we just encrypt the block with the shared secret (constant IV or block hash as the IV?) and send that across. Nodes which have the shared secret decrypt and store plaintext, nodes which don't have the shared secret, store the ciphertext and trust that the given plaintext hash matches what's there.
  6. As we ask information from an encrypted node, the node does block lookup by plaintext hash, and provides us with ciphertext, which we can verify if it matches the plaintext hash once decrypted.


Pros:
  • Reduced bandwidth on encrypted nodes due to block reuse.
  • Potentially, eventually, reduced storage on encrypted nodes (makes me shake when I think how hard it would be to implement this).

Cons:
  • Fairly weak?

Plan B (prevents reusing blocks, does not leak info about two identical blocks, 1-3 same as before):

  1. Have a shared secret [1] .. (which as you say is shared via a postal pigeon or whatever)
  2. Encrypt sensitive metadata [2] in indexes with AES-whatever perhaps using first N bits of the first block hash as the IV.
  3. Sign Index/IndexUpdate messages using the shared secret (file versions changing will act as a random nonce) preventing replay attacks by an attacker advertising some random old signed index to the cluster (newest version wins) causing DoS.
  4. Compute ciphertext hashes during the scanning process; store ciphertext hashes along with plaintext hashes.
  5. Advertise Index/IndexUpdate with ciphertext block hashes. Given we use [3] or something else that prevents you from identifying two identical blocks, you will not be able to reuse anything.
  6. As other peers start asking us for data, we use [3] to encrypt it; the receiving device verifies that the received ciphertext hash matches the advertised hash, decrypts, hashes the plaintext, stores both ciphertext and plaintext hashes, and writes the data to a file.
  7. Encrypted devices work exactly like plaintext devices work now.


Pros:
  • More secure, as blocks cannot be identified as exactly the same.

Cons:
  • No block reuse, potentially increased bandwidth.

Plan C (a big overhaul):

  1. Move away from fixed block size throughout the whole protocol.
  2. Use RSA + AES for [1]
  3. ???
  4. Profit

[1] Ideally I'd like a read-only secret and a read-write secret, but such a block cipher does not seem to exist. Plus, the constraint we have means we can probably only use ECB without a major protocol rewrite and going to plan C.
[2] Mainly file/dir names, flattening out the folder hierarchy. You will still know that a file is a file and not a symlink, what permissions it has, etc.
[3] We could use the shared key + HASH(unencrypted filename + block index in the file/block hash) as the IV for encrypting the block.
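A sketch of how footnote [3] and Plan B's scanning step might fit together. All names are hypothetical (not Syncthing internals), and a hashlib keystream again stands in for AES:

```python
# Sketch of [3]: derive a deterministic per-block IV from the shared
# key, file name, and block index, then advertise ciphertext hashes
# (Plan B steps 4-5). hashlib stands in for a real block cipher.
import hashlib

BLOCK_SIZE = 128 * 1024

def derive_iv(shared_key: bytes, filename: str, block_index: int) -> bytes:
    material = shared_key + filename.encode() + block_index.to_bytes(4, "big")
    return hashlib.sha256(material).digest()[:16]

def encrypt_block(shared_key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(shared_key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))

def scan_file(shared_key: bytes, filename: str, data: bytes) -> list:
    """Plan B step 4: record plaintext AND ciphertext hashes per block."""
    index = []
    for i, off in enumerate(range(0, len(data), BLOCK_SIZE)):
        block = data[off:off + BLOCK_SIZE]
        iv = derive_iv(shared_key, filename, i)
        ct = encrypt_block(shared_key, iv, block)
        index.append({
            "plain_hash":  hashlib.sha256(block).hexdigest(),
            "cipher_hash": hashlib.sha256(ct).hexdigest(),  # advertised
        })
    return index
```

Because the IV mixes in the file name and block index, the same block in two different files gets different ciphertext hashes, matching Plan B's goal of not revealing identical blocks at the cost of losing block reuse.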



JIVS commented Mar 6, 2015

Just wondering: is the possibility of using dedicated server-side software, i.e. a version of syncthing specific to servers, completely out of the question?

By "insecure server" I assume you don't mean a server where you lack the privileges to install and run software and can only store files, but simply a server that might be accessed by others without your consent.



bitshark commented Mar 6, 2015

Great discussion here... really interesting food for thought...


As we've discussed, any 'crypto' is only as good as the weakest link. For example, regardless of any implementation (FUSE, kernel drivers, userspace encryption, whatever you choose)... If the 'master' host (for example a Windows machine) is infested with Malware/Keyloggers etc, then the whole point is moot because even a perfect implementation is compromised.

The same premise holds if our hardware is compromised/backdoored. There is a reason DARPA started X-raying ASICs and FPGAs sourced from Asia: to ensure there were no hardware backdoors.

"Almost all FPGAs are now made at foundries outside the United States, about 80 percent of them in Taiwan. Defense contractors have no good way of guaranteeing that these economical chips haven't been tampered with. "

How are we to be sure that the BIOS chips we use (or TPMs, etc.) are not backdoored? We can't. We can either accept or reject the premise, but that's about all we can do.

And so any security will only be as good as its assumptions.

Then we have the ironic fact that most crypto is never broken based on the cipher, but rather on implementation goof-ups, or more insidiously -- side-channel attacks.

I think the best example of this is the 'padding oracle' attack, which is theoretically possible against any block cipher operating in CBC mode. In practice this resulted in the Lucky13 attack against SSL/TLS implementations that were thought to have been fixed.

"Any person can invent a security system so clever that she or he can't think of how to break it."
-Schneier's law

Even the best we can build is only as strong as the weakest link.

There is also the balance between ease-of-use / ease-of-installation and security, as well as the tension between 'scope creep' / reinventing the wheel and the fact that a P2P block exchange may need its own solution to be optimal.

encFS Results

As for encFS, I've got my own issues with it... It's not suitable for cloud storage beyond a 'single snapshot' model. The main issues, I think, are that (1) any 'fix' of encFS would necessarily break all backward compatibility with previous versions, and (2) it's more than 10 years old now, so it would take significant work to bring it up to the state of the art.


I think what you have outlined is a fair approach, in terms of only approaching the problem of securing data 'in transit'. This lets the user manage their own solution in terms of local crypto, whether encFS, dm-crypt, Truecrypt, or otherwise. We do run into the problem that perhaps these 'user determined' crypto solutions are not ideal for a P2P network which exchanges small block-like chunks.

But if the user implements the same crypto globally, then Syncthing does not really care what it's transporting, which is a nice abstraction that simplifies life for anyone contributing code.

I agree that the particular enhancement detailed in this GitHub issue (baking in crypto for storage on untrusted nodes) opens up a whole can of worms and is a pretty good example of 'scope creep'. So I think we're in agreement that this probably isn't feasible given the constraints, certainly not before nailing down existing issues and closing them out.

Beyond that, as I'm sure you are aware, selective encryption on a per-node basis opens up a new set of problems regarding the synchronization of encrypted vs unencrypted blocks... Not to mention problems of key management, access permissions, key revocation, various attacks involving chosen ciphertexts, block vs stream ciphers, selection of appropriate IV generation, operating modes, and so forth.

On and on... Ideally, my feeling at the moment is either (1) let the user handle their own crypto, or (2) take a long-term view and really implement a novel solution based on proven cutting-edge techniques in cryptography -- specifically those relating to cloud storage, authenticated encryption, and so forth.

In the latter case, it would be ideal to have an 'off-the-shelf' solution that could be dropped in... The only reason I think to DIY is if there were a clever way around some of the limitations... Certainly the current state-of-the-art in crypto literature is not focusing on creating secure P2P applications.


Thanks for clarifying my points -- after our conversations , I think you have done a good job of summarizing my thoughts on the issue, which have changed somewhat as I have done additional research.

"IF they get fixed upstream, this could be a good way to go, because EncFS has the best shot at beeing usable a cross-platform solution for encrypted storage."

This was my original thinking, but as I've delved deeper, I simply don't think EncFS is feasible as a solution UNLESS it's released as a "2.0" version -- which completely dispenses with legacy code... I do like the 'filesystem' level encryption which is convenient, but I think gains in convenience are a tradeoff with security.

Here is an example of one of the problems I just ran into today using EncFS on Linux... I tried using EncFS coupled with the google-drive-ocamlfuse module for mounting Google Drive as a 'shared' network drive... Now I had no problems with the encFS FUSE driver , but it's more of an integration headache from end-to-end... Note that this commentary is independent of Syncthing, and is simply looking at using encFS locally to mirror content to an encrypted folder on Google Drive (with google drive mounted in linux as a sort of 'network share' showing up as a local dir).

The idea here is to drop large amounts of unencrypted files into a mountpoint, have them transparently encrypted and uploaded to Google Drive without wasting tons of local storage making copies of everything.

Some major problems right out of the gate... The Google Drive FUSE driver is a pain to compile since it's written in OCaml. Even after I got it working, I was astonished to find there is not really decent built-in support for caching or buffering. (This highlights a major advantage to something like Syncthing , which breaks such transport into manageable blocks).

Anyway, I found that if I copy a large (4 GB) file to my encFS directory (linked to the Google Drive remote network mount), the 'cp' command simply 'hangs' while encFS and the Google Drive OCaml driver (1) encrypt the file and metadata and write to the Google Drive network mount point, and (2) remain subject to network-level failures (wireless AP disconnects, etc.). It's slow as heck, but it works as long as there is no transport failure.

But a wireless AP disconnect during copying in this setup will easily corrupt or ruin large gigabyte-sized files.

Given these issues on a native client on Linux -- there's a whole set of problems of the optimum between a 100% local mirror of content (which requires extensive use of disk space locally) vs a 100% 'network attached drive with minimal caching (which means that network failures of large files 'in transit' can cause data loss and corruption).

This is sort of a microcosm of similar issues which apply to Syncthing, albeit to a lesser extent since it's not 'all or nothing' in the transport sense. But I do have concerns with the lack of multithreaded / asynchronous performance of both encFS and the Google Drive oCaml driver. The performance of encFS-FUSE was quite poor in terms of handling 'below the surface' transport failures.

In plain English: if I were to disconnect and reconnect to the wireless access point while in the middle of copying data to the encFS plaintext dir, which is then writing to a 'network attached' Google Drive, the network connection loss would corrupt the file being copied.


Regarding encFS vs BoxCryptor -- After release of BoxCryptor Classic (which is hypothetically compatible with encFS) , BoxCryptor ditched all compatibility with encFS for the 2.x series of BoxCryptor. So new versions of BoxCryptor are completely incompatible with encFS.

"Just for good measure I wanted to point at, which is an (not completely finished) GO wrapper for libsodium"

NaCl and the related wrappers are awesome. Good point. If I personally ever had to implement crypto that was 'roll your own', I'd absolutely say 100% the way to go is with NaCl or related libraries and wrappers. They are a fantastic compromise between using the bloated (but tested) OpenSSL libraries and rolling a custom solution (which could easily introduce major vulnerabilities, which NaCl could help prevent).

EncFS Attacks

"As the article by Thomas Ptacek, which we both referred to, as well as the EncFS audit make very clear there are plenty attacks left, especially if an attacker has knowledge about how the encrypted data changes over time."

This specifically is a huge concern for me, and is why I don't think encFS is feasible for any sort of remote storage which 'incrementally' changes on any regular basis. encFS might be fine for a 'one time' backup, or twice a year backup, but anything that gives a remote node insight into filesystem changes over time is going to be a major problem in ANY modern cryptosystem...

It's far worse with 10 year old technology like encFS.

So I agree there -- to me the limitation (more than FUSE, driver compatibility, trust models, or portability) is actually the fact that encFS is not suitable for cloud backup storage on 'untrusted' nodes. Not without some major updates anyway.

In fact, I think there are very few solutions suitable for this purpose besides more advanced technologies like convergent encryption, authenticated encryption, and so forth.

"Also I don't see how de/encrypting at the block level via fuse suddenly solves the "know how data changes over time" problem." -Audrius

You're right -- it doesn't. This is a major problem that keeps cropping up as I think about how to set up a good P2P network backup / sync system.

There's the critical issue that's become apparent with additional research -- how ciphertext data changes over time may allow an adversary to break the entire system. I'm not sure any current cryptosystem is equipped to become resistant to these sort of attacks (where you provide a potential attacker with N-snapshots of a ciphertext filesystem as it changes over time, where N could be quite large).

I don't know what the solution is for this issue. Even using a P2P sync tool with a Truecrypt file container could be problematic. I suppose it's a matter of balancing the threat vs the countermeasures. Usability vs Security, etc.


I understand the objection regarding the complexity of setting up Dokan on Windows -- any bundled solution needs to work out of the box. You would probably be okay if there was an installer that 'worked' regardless of the underlying solution, right?

I think what we are discussing here is not so much FUSE vs not-FUSE... it's more questions as to (1) should we bother with this aspect of baking in filesystem encryption (probably not, hah), (2) are there good already-written solutions available off-the-shelf?, (3) what is the state-of-the-art in terms of P2P encryption, (4) benefits vs drawbacks of various solutions, etc.

The whole FUSE vs non-FUSE debate is really not the core of the issue, because most solutions developed in FUSE could be ported out of FUSE with enough time and effort. FUSE is simply conducive to rapid prototyping and testing, as it's just an abstraction layer on top of the common system calls for file/folder interaction.


Your post on deterministic asymmetric encryption vs convergent encryption is a really good overview. I'll check out some of your links and get back to you.

"This gives us the very interesting property, that it is possible to determine if two encrypted blocks are identical without decrypting them first."

This is a huge benefit for the idea of a distributed p2p system, where we are exchanging data on the 'block' level of arbitrary size (say 1k to 1024k). Ideally, if you and I are on the same network, and we both have a copy of the same movie (perhaps encrypted with different IVs or what-have-you) -- can we mutually share the blocks to accelerate synchronization?

And if that's possible, what do we lose by doing so?

"That's what Tahoe-LAFS and Maidsafe use and seems to be one of the best ways to do this kind of thing."

I agree, based on my somewhat limited knowledge of convergent encryption. But this allows 'watermarking' attacks, no? I.e., some omniscient entity can prove you and I possess a copy of 'TransformersTheMovie.avi' or whatever, even if they cannot decrypt the movie from any of our encrypted shares?

Certainly an open-source solution vulnerable to watermarking is preferable to a binary-only client vulnerable to watermarking (ie. Syncthing vs. BitSync)... Syncthing would be far superior in this case, since at least we can be sure it's not backdoor'ed.

"This means an attacker could proof that you stored a forbidden book/pirated movie/mp3 in your encrypted files"

Okay, that's what I thought. I wish there was a way around this. Perhaps there is? For example, utilizing a 'keyed hash function' to calculate the block hashes for a file we're sharing, where the key for the hash function is derived from a shared secret or the result of key agreement?

I guess anything involves trade-offs.


I think what you have proposed in the latest message is on the right track.

My suggestion is that any implementation uses an 'authenticated encryption' construct, where encryption and a MAC are combined into a basic primitive. The new ChaCha20-Poly1305 TLS standards have this 'baked in' -- whether at the transport level or otherwise.

As you've suggested, AES-GCM as a block cipher mode is another example of authenticated encryption, though not my personal favorite... It is off-patent and already included in OpenSSL, which is nice.

Personally, for block ciphers, I like OCB mode, but unfortunately this is on-patent. It's free to use for open-source non-commercial purposes though.

But regardless, I think we have numerous opposing forces here...

(1) Utilization of 'baked in' crypto vs 'Let the user run TrueCrypt'

(2) Level of Effort and Scope-Creep vs. Broad Spectrum of Applications

(3) Prevention of watermarking attacks etc vs Block P2P inter-operability

(4) FUSE type drivers vs. 'Works out of the box'

(5) Low-level (block/loopback device) vs. High-level (VFS or file/folder encryption)

Great discussion, best ideas I've seen in a long time. Don't want to get too sidetracked from any short term goals, but I think the last few pages of comments really get to the core of the issues regarding client-side and remote-side encryption.



bitshark commented Mar 6, 2015

Also, I do like the idea of a non-hardwired blocksize , but I agree it'd be a major overhaul.

Maybe a solution is to do key agreement on a separate shared secret K_dht -- call it a 'session DHT key' or something that's agreed upon using some decent DH primitive.

For any given file, the file's hash is Fh = HMAC(K_dht, filedata). The block hash for the block at index idx in that file is HMAC(K_dht, Fh + idx + blockdata).

Then the only people who can derive the DHT keys are those with the shared secret.

Something like that just as a first idea, anyway.

Perhaps combining HMACs, authenticated encryption, and the idea of 'tweakable ciphers' (like XEX mode, the basis for XTS) can allow a balance between block-level sharing, untrusted storage endpoints, and resistance to watermarking.



bitshark commented Mar 6, 2015

Okay, so there's a way around the 'watermarking' problem of convergent encryption.

Convergent Encryption (standard / vulnerable):
Fkey = SHA1(Plaintext)
Ciphertext = AES(Fkey, Plaintext)
Locator_Hash = SHA1(Ciphertext)

Convergent Encryption (keyed / resistant):
Fkey = HMAC_SHA1(Skey, Plaintext)
Ciphertext = AES(Fkey, Plaintext)
Locator_Hash = SHA1(Ciphertext)

In the latter example, only those who know Skey can conduct 'proof-of-file' and related attacks, thus Skey is shared among all nodes in a cluster which are sharing files.

Using the latter example with AES-CTR mode, and a public per-file random IV, then we actually have completely random access to file blocks.


The general way such algorithms work is as follows:

The object to be encrypted is validated to ensure it is suitable for this type of encryption. This generally means, at a minimum, that the file is sufficiently long. (There is no point in encrypting, say, 3 bytes this way. Someone could trivially encrypt every 3-byte combination to create a reversing table.)

Some kind of hash of the decrypted data is created. Usually a specialized function just for this purpose is used, not a generic one like SHA-1. (For example, HMAC-SHA1 can be used with a specially-selected HMAC key not used for any other purpose.)

This hash is called the 'key'. The data is encrypted with the key (using any symmetric encryption function such as AES-CBC).

The encrypted data is then hashed (a standard hash function can be used for this purpose). This hash is called the 'locator'.

The client sends the locator to the server to store the data. If the server already has the data, it can increment the reference count if desired. If the server does not, the client uploads it. The client need not send the key to the server. (The server can validate the locator without knowing the key simply by checking the hash of the encrypted data.)

A client who needs access to this data stores the key and the locator. They send the locator to the server so the server can lookup the data for them, then they decrypt it with the key. This function is 100% deterministic, so any clients encrypting the same data will generate the same key, locator, and encrypted data.




AudriusButkevicius commented Mar 6, 2015

I assume skey is a shared secret?
Why can't you just use skey in the second example instead of a hash of skey + plaintext to the aes function?



bitshark commented Mar 6, 2015

(A) Yes, skey is a shared secret .
(B) I'm not 100% clear on why you can't just use skey in example two... so I don't know. But I think that the idea is that we want there to be a certain amount of 'determinism' to the encryption so that we don't waste bandwidth and space re-sharing the same file over and over.

In other words, if you and I both have a copy of 'Transformers2.avi', and we have the same shared secret, then if I request to upload the movie to you... the P2P network (aka you) will say don't worry about it, we already have a full copy of that file matching this hash ("locator"). BUT the network can only know that the Transformers2.avi file already exists on your computer by comparing the hash of a deterministically generated ciphertext.

File_Locator = SHA1(Transformers2.avi.aes)

I think that when skey is a shared secret across all users sharing files in a 'cluster' -- i.e. in the latter example above -- skey just acts as a 'tweak' to prevent "confirmation that you or I have a copy of Transformers2.avi"-type attacks, unless the attacker knows skey.

The idea is basically that you can request a file from the server (or a peer) by asking for its file locator (the hash of the mostly-deterministically encrypted ciphertext), and see whether the file is already stored or not. This principle can apply to fixed or variable length 'chunks' as well, supposedly.

The two examples above are functionally identical; the latter is just more secure, and probably more appropriate for this discussion, where we are not sharing 'with the world' on BitTorrent but rather on small private P2P networks.

The main point of convergent encryption is that it allows 'de-duplication' -- meaning that if we've already stored one copy of a file, then the server is smart enough not to store a second copy of the file -- but rather it stores just a reference to it in some sort of metadata / mapping table.

I know this particular scheme (convergent encryption) is utilized in Tahoe LAFS and Bitcasa... It may be utilized in BitSync as well, though on that I'm unclear.

I'm still trying to understand it fully, so my apologies if my explanation is not very good.

Check out the pdf linked below at the end -- it actually discusses convergent encryption in the context of fixed and variable length 'chunks' of a plaintext file.

Here are two other good links:

and this...

"Both models of our secure deduplication strategy rely on a number of basic security techniques. First, we utilize convergent encryption [10] to enable encryption while still allowing deduplication on common chunks. Convergent encryption uses a function of the hash of the plaintext of a chunk as the encryption key: any client encrypting a given chunk will use the same key to do so, so identical plaintext values will encrypt to identical ciphertext values, regardless of who encrypts them. While this technique does leak knowledge that a particular ciphertext, and thus plaintext, already exists, an adversary with no knowledge of the plaintext cannot deduce the key from the encrypted chunk."

Here are two resources that helped me so far:

(1) Secure Data Deduplication

(2) Source code for Convergent Encryption in Python (includes both example 1 and example 2)




AudriusButkevicius commented Mar 6, 2015

So you can't verify that someone has something unless you have skey, because you need that to generate the locator to verify if someone has a given file.

If you managed to get a locator from someone for the given file you want to verify against, the fact that the content was encrypted using a hash over the plaintext is meaningless, as you got the locator already.

If you have skey, all bets are off, and there is no encryption left at that point.

The only sensible reason I can come up with, is given you have the plaintext and the ciphertext of some other file, it's not possible to recover skey, because the ciphertext is encrypted with HMAC(skey, plaintext) rather than skey, so the only thing you can recover is HMAC(skey, plaintext) which is not good enough to decrypt a different file.



AliceBlitterCopper commented Mar 9, 2015

If I needed to "sell" this feature to non-tech users, I'd say:
"Now you can set up a family-cloud or friends-cloud in which some of your private data can be stored redundantly: perfect, for example, for the backup of your photos, which you don't want to give away but still need a backup for. Store it at your friends' place."
I really would like to use it this way: instead of, for example, storing physical hard drives at several locations.




Zillode commented Mar 9, 2015



bademux commented Mar 10, 2015

@Zillode let's call it "mirroring".
For me it is all about mirroring some data on different devices without setting up encfs/luks.



AliceBlitterCopper commented Mar 10, 2015

@Zillode @bademux Yes, that's a very good point. Let's not suggest too high an SLA ;).

"... allows you to setup an 'attic' you can share with your family and friends, where your boxes are sealed and can only be opened by you if needed. It's not a replacement for a data-bunker or data-safe to store your most precious data. Sometimes an attic is what you need, though"



eyeson commented Mar 18, 2015

Going to throw my vote in here for this, having recently moved over to ST from Bittorrent Sync due to their whole faff, this is a feature I am missing



benguild commented Mar 23, 2015

@eyeson I think a lot of us are in that same boat. We'd like to switch from BTS to ST and can't because it's missing this feature.



djtm commented May 14, 2015

I'm thinking right now it would be cool to have something generic for this task. Sort of a reverse ecryptfs. Instead of an encrypted version on disk and a virtual unencrypted folder like ecryptfs you'd have the files unencrypted on disk and there is a folder which shows them only in encrypted form. Then you could use that folder and sync the encrypted version to other nodes with syncthing. I guess it should not be too hard to implement based on ecryptfs, either, as all the pieces are already there, they just need to be plugged in a different order...



djtm commented May 14, 2015

And someone already had the idea:
Oh and look, it's already implemented in encfs as encfs --reverse:

from the encfs man page:

Normally EncFS provides a plaintext view of data on demand. Normally it stores enciphered data and displays plaintext data. With --reverse it takes as source plaintext data and produces enciphered data on-demand. This can be useful for creating remote encrypted backups, where you do not wish to keep the local files unencrypted.
For example, the following would create an encrypted view in /tmp/crypt-view.
encfs --reverse /home/me /tmp/crypt-view

Of course you could also simply encrypt everything with ecryptfs and then sync the encrypted version, which is probably even safer. But even though encfs is not cloud-safe, it is arguably a lot safer than btsync.



bademux commented May 18, 2015

@djtm ecryptfs is a slow and buggy piece of sht. 2 or 3 years ago I tried the same trick with encfs --reverse for dropbox-like services and just lost my time: the resulting "frankenstein" was too unstable and slow.
Wild (and incompatible-with-the-rest-of-the-world) guess:
maybe it is worth a try to use the brand new ext4 encryption support in Linux 4.1?



djtm commented May 18, 2015

@bademux I'm using the ecryptfs shipping with Ubuntu for my home directory. I never had any issues of any kind. The only thing is that it's a bit difficult to mount an ecryptfs directory from a bootable Linux distribution. I'm not worried as much about speed as about security. The whole point of encryption is that it's reliable. encfs currently allows various attacks which are especially problematic within the cloud. However, I believe it might be the better option to fix encfs, which has undergone security reviews by cryptography experts, than to invent a new encryption scheme here, which will most likely end up being either a ton of work or mostly snake oil. As much as I'd love to see this feature implemented...

ext4 encryption will certainly not help, as the encrypted files will be invisible to syncthing.



MyPod commented May 18, 2015

As per the previous comments on this bug/enhancement request, @djtm, ecryptfs/encfs isn't good enough for something that is sent through wires you don't own and can't control, as the changes within the structure of your database/filesystem can reveal information. Keep in mind also that the solution has to be OS-independent and, in particular, work with Windows (OS X could be easier to work with, from what I know and understand), as well as being something (relatively) simple and easy to do that doesn't go beyond the scope of Syncthing.




calmh commented May 19, 2015

The requirement is clear; further discussion about the possible workarounds and their limitations fits better in the forum.

@syncthing syncthing locked and limited conversation to collaborators May 19, 2015

@calmh calmh removed this from the v1.0-maybe milestone Nov 17, 2015

@calmh calmh modified the milestone: Unplanned Jan 1, 2016




lkwg82 commented Feb 4, 2018

see PR #4331

@calmh calmh removed this from the Unplanned (Contributions Welcome) milestone Feb 11, 2018
