Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Encrypted folders #4331

Closed
wants to merge 20 commits into from

Conversation

Projects
None yet
9 participants
@AudriusButkevicius
Copy link
Member

commented Aug 27, 2017

Go through the TODO items in the file.

Issues that need to be addressed:

  1. Encrypted names break ignores. Workaround would be to make ignoring an integral part of the filesystem
  2. Encrypted names breaks versioning. Could be handled by having special decryption/encryption login that recognizes known patterns
  3. Handle temp name encryption.
  4. Versioners that support custom paths currently break things. Because they do filepath.Join(fs.URI(), "blah").
  5. Errors reported for encrypted names will be sad, probably need a fs.ReadableName()
  6. Name encryption is insecure fairly.
  7. Need to work out where to store nonces.
// the hashes provided, and returns a hash -> slice of offsets within reader
// map, that produces the same weak hash.
func Find(ir io.Reader, hashesToFind []uint32, size int) (map[uint32][]int64, error) {
func FindOffests(ir io.Reader, hashesToFind []uint32, size int) (map[uint32][]int64, error) {

This comment has been minimized.

Copy link
@AudriusButkevicius

AudriusButkevicius Aug 27, 2017

Author Member

FindOffests -> FindOffsets

@calmh

This comment has been minimized.

Copy link
Member

commented Aug 31, 2017

Do you want me to start looking at this?

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2017

Its not finished, but yes, it would be useful if you told me its nonsense before I wasted more time.

@calmh

This comment has been minimized.

Copy link
Member

commented Aug 31, 2017

Is there a three sentence elevator pitch for how this is even supposed to work and what it protects, or should I expect that to be obvious?

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2017

I think the relevant parts of the code are small enough to explain themselves. I guess commenting on the TODO sections helping solve the issue would be nice.

There is some magic around nonce storage etc, so I am all ears how to make it better.

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2017

The basic idea is that we encrypt at the block level, so we get a random nonce which we store somewhere, encrypt BlockSize-nonceSize-aeadOverhead worth of data to get Nonce+ciphertext which is exactly BlockSize. We know the nonce size on the other end, which we remove from the beginning of the block.
W also optionally encrypt names by encrypting each path segment individually also using a random nonce.

We encrypt blocks using GCM, names using CFB, GCM has a fixed x+16 overhead and no padding (x is nonce size, default is 12), CFB has a 16 byte overhead plus padding to 16 bytes. Initially I planned to use just plain AES with no ivs or nonces which is a bit insecure, but if we loose the nonce file, the nonces for data blocks would change (we'd generate new ones) yet mtime and size wouldn't change so we'd end up with peers who had this file went away issues. This would still happen if you don't encrypt filenames and loose the nonce file. Perhaps we should offer different levels of name encryption, as in: not encrypt, encrypt potentially creating an attack surface by using plain aes with minimal overhead to file names, encrypt securely with random nonces adding a huge overhead to names.

@calmh

This comment has been minimized.

Copy link
Member

commented Aug 31, 2017

So this nonce file business sounds like nonsense to me. I'm also not really clear on how it works. For example, does the following work as expected?

  • Device A: share a folder F with device B, in encrypted mode. (A has the cleartext; B gets the encrypted data)
  • Device B: stores a lot of encrypted data.
  • Device C: adds the same folder as A, using the same password. It should download and decrypt all data from C to get back the cleartext.

For that to work I expect that B needs to know and store all the nonces, so it can give them to C. Maybe you already do this. Ideally, a file produced at C and sent should be identical at B compared to the same file created and sent at A: same contents, same nonces, etc.

The nonces need to be unique per encrypted piece of data but not unpredictable, afaik. I'd do something like

  • nonce for file metadata: the hash of the entire metadata blob
  • nonce for file block X: the hash of the block xor the nonce for file metadata

The encrypted device still needs to store all the nonces, but an unencrypted device doesn't need to store them, and they will be the same for the same file on two different devices...

I'd also go with the encrypted filename being the hash of the metadata blob from above, with a couple of slashes added at strategic places to not get a completely flat directory...

One difference between doing it at the layer you're doing and the protocol layer where I was envisioning it is that you have to take less than 128 KiB of data at the source and add the necessary block level data to add up to 128 KiB of encrypted data. With protocol support you would instead take 128 KiB of source data, add whatever data necessary, and have the encrypted device store 129 KiB or something per block, knowing that this is the case.

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2017

As it stands now nonces are added to the data, so filename segments are prefixed with nonces, blocks are prefixed with nonces. You need to store nonces locally in order for you to pick the same nonce in rescan etc to produce same cipher text, which means you either need to store them as you generate them, or have a deterministic way of computing them which having read a number of interwebz comments says it's bad to have them predictable. We could for example say have a rand.Reader seeded by hash of the full filename, which means we wouldn't need to store them.

@Ferroin

This comment has been minimized.

Copy link

commented Aug 31, 2017

A nonce can't be derived from the plaintext if you want it to add any security at all. The point of having a nonce is to ensure that encrypting the same plaintext will not result in the same ciphertext, thus closing one of the simplest methods for an attacker to brute-force a key if they have the ability to encrypt arbitrary plaintext.

I don't quite understand why you would need to be certain that the same file created on A and C would have the same nonce for each block. Yes, it would allow some performance optimization in that C would not have to download the file from A and you would avoid the possibility of a bogus conflict file, but if you're talking about security, performance needs to come second to security, otherwise there's no point in discussing security. For most people who would be using this, I think the possibility of lower performance in some circumstances is probably an acceptable trade-off, especially considering the fact that it's not likely to be a common case to have a copy of a given file on both A and C get scanned at the same time.

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Aug 31, 2017

The nonce is still random for a given offset in a given file, it's just that it's reused once it's set, otherwise we are talking about files constantly change without having been changed on the unencrypted filesystem, which makes an unworkable solution as we scan something, advertise hash of cipher text, and when client asks for for the data, the data does not match the hash.

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Sep 1, 2017

@calmh have you actually checked the code?

@calmh

This comment has been minimized.

Copy link
Member

commented Sep 1, 2017

@Ferroin

This comment has been minimized.

Copy link

commented Sep 1, 2017

This may depend a bit on the implementation. Generally speaking Syncthing is happiest if a given block from the same version of the same file looks the same no matter who you ask for it. That is, I should get the same data for the same block regardless of whether I ask device A or device B. If both of those devices are encrypted I think they should have the same data. That means the encrypted data should not be dependent on who encrypted it, in my mind.

But the nonce is getting stored, reused (which means it's cryptographic salt, not a nonce, but that's a pointless argument right now), and copied with the data (at least, that's what seems to be the case, I can't tell 100% for certain as my Go skills are somewhat lackluster). That means that whichever device first sends data for a block should by definition dictate what the nonce is for that block, so the only case which is particularly problematic is if A and C aren't talking to each other and both happen to get a file with the same name and contents at the same time, and both rescan at roughly the same time, and I would think that that case is likely to be rare enough that it should probably just generate a conflict file and not worry beyond that.

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Sep 1, 2017

That should be fine, as it's essentially a conflict.

@calmh

This comment has been minimized.

Copy link
Member

commented Sep 1, 2017

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Sep 1, 2017

Its not a problem at all, hashes and everything from syncthings perspective is for ciphertext. Decryption happens as a side effect during write in an encrypted filesystem.

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Sep 6, 2017

@calmh can you please have a look at this and the general idea? I don't want to waste more time if we believe this is fundamentally wrong.

@calmh

This comment has been minimized.

Copy link
Member

commented Sep 6, 2017

I will. Sorry for dragging my feet

@calmh

This comment has been minimized.

Copy link
Member

commented Sep 20, 2017

OK, I've actually read through this now and it does not look like complete nonsense to me. :) Is it supposed to work, roughly? I'll do some playing around with it an then go in for an actual first review (there were some points of oddness, but I want to make sure I actually understand it better first).

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Sep 20, 2017

So the basic idea worked at some point before, it probably doesn't now given it has moved on a bit. As long as it's not complete nonsense, then I guess I'll continue poking at it.

This does not protect against file renames (files being swapped/moved in an encrypted folder), and it does not protect against block swaps within a file, as it stands now. The idea was that we could reuse our optimizations, but given the blocks are of cipher text, it's unlikely that any optimizations we have would work in general.

@AudriusButkevicius AudriusButkevicius force-pushed the AudriusButkevicius:crypto branch from 16db779 to 26f9875 Sep 24, 2017

@AudriusButkevicius AudriusButkevicius force-pushed the AudriusButkevicius:crypto branch from 8fbd251 to b4f9c1f Dec 3, 2017

@derDieDasJojo

This comment has been minimized.

Copy link

commented Dec 4, 2017

This will be awesome as soon as its Working. I am very excited to See it

@lkwg82

This comment has been minimized.

Copy link
Member

commented Dec 28, 2017

Need to work out where to store nonces.

Is that done or is that to be done? I read through the comments and am not sure.

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Dec 28, 2017

Its worked out. Getting the crypto right and ignores and other stuff is the harder part.

AudriusButkevicius added some commits Jan 1, 2018

@lkwg82

This comment has been minimized.

Copy link
Member

commented Jan 31, 2018

I have two questions:

  1. Are there any thoughts on key exchange between nodes?
  2. Are there any ideas regarding UX?
@generalmanager

This comment has been minimized.

Copy link

commented Feb 4, 2018

Is there a proper write-up of the whole concept somewhere? And which of the features discussed in #109 are actually implemented?

Please specifically reread what @bitshark wrote there and at those two posts by myself:
#109 (comment)
#109 (comment)

  • Will you send plaintext to trusted nodes and ciphertext to untrusted? Or do you always send ciphertext and only the trusted nodes get the key(s)?

  • Do you intend to enable deduplication between trusted and untrusted nodes?

Initially I planned to use just plain AES with no ivs or nonces which is a bit insecure

It's not a bit insecure it's a fucking nightmare and has so many problems it's worse than no encryption at all, because people trust us to get this right. And we'd be better off not offering that functionality than doing something horribly insecure like this.

  • Why are you using CFB for file names?

I'm honestly pretty alarmed by how nonchalantly this whole issue is approached and that there isn't even a real in-depth discussion on the goals and through which precise implementation those are to be achived.
Untrusted continued access and the possibility to manipulate it like in a cloud- or on untrusted servers is one of the most far-reaching and dangerous threat models in existance. And it's incredibly difficult to defend against.

Unfortunately I'm really short on time at the moment, so couldn't look at it in detail, but I'm pretty sure there are some major problems with such a naive approach. I really don't want to piss on anyones parade, but this is treading into dangerous territory.

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Feb 4, 2018

So I think this is dead in the water as it stands, as I could not find a way to do this purely at the filesystem level and still support things like ignores etc.

Things like block reuse, efficient renames et all are impossible to implement without opening up XOR attacks, block reorder attacks, file swap attacks etc.

@generalmanager

This comment has been minimized.

Copy link

commented Feb 4, 2018

I see, it's unfortunate that your efforts haven't been fruitful in this regard. Block reuse could be done with deterministic encryption, which has it's problems, but should be ok in our scenario as we can use a key only known to the group. And proof-of-ownership is already possible when one of the trusted nodes is compromised, because Syncthing syncs stuff, so no degression there. I think if we merged all the metadata in a separate AEAD-encrypted virtual file, we could achive efficient renames too.

But I have to admit, I haven't really put my mind to this lately, so I'll have to think about it in detail again.

Have you looked at TAHOE-lafs? They have a rather similar scenario and are thought to be doing pretty okay security wise. I'll restart the work on an in-depth concept when I'll finally get some free time to slack off. But that'll be in 2-3 months at the earliest :-/

@Darkvater

This comment has been minimized.

Copy link

commented Aug 23, 2018

Hi, question here: where are we with the implementation? From the outstanding TODO it seems only #7 (where to store nonces) is really a blocker. I think we can live without the other limitations for now, document it, and call it a beta. Encryption, even unversioned-only and with garbled names is a lot better than no encryption; especially given this has been going for over four years. Just my 2ct.

Is there any way I can help perhaps?

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Aug 23, 2018

No, this is dead in the water.

@Darkvater

This comment has been minimized.

Copy link

commented Aug 23, 2018

Is there anything else going on or are we back to square one?

@AudriusButkevicius

This comment has been minimized.

Copy link
Member Author

commented Aug 23, 2018

Square zero

@user9209

This comment has been minimized.

Copy link

commented Mar 16, 2019

I would do it on an other way:

see: #5601

Here you are independent on any internal encryption schemes or key management schemes.

You can use

  • openssl symmetric
  • openssl assymmetric
  • gpg symmetric
  • gpg assymmetric
  • ...

Keys can be stored

  • in the share (on ignore list)
  • in the home dir / system global
  • inside the app like gpg
  • ...
@anatoli26

This comment has been minimized.

Copy link

commented Jul 14, 2019

For a secure cloud storage encryption consider ideas from https://github.com/cryfs/cryfs. Probably @smessmer could give some tips on how to solve features like dedup and renames (or probably they should be dropped completely when encrypting).

Also maybe both tools could be merged somehow?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.