Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue H. Plaintext File Kept on Server when Whistleblower Does Not Finish Submitting Tip #828

Closed
fpietrosanti opened this issue Mar 3, 2014 · 24 comments

Comments

@fpietrosanti
Copy link
Contributor

When a whistleblower uploads a file, it is written to the hard drive in plain text. When the
whistleblower submits the Tip, the file is encrypted with the receiver's public key then the
originally-uploaded temporary file is removed. If the whistleblower uploads a file, but never follows through with submitting the tip, the file they uploaded will remain on disk indefinitely.

@fpietrosanti fpietrosanti added this to the LeastAuthorityPentest milestone Mar 3, 2014
@evilaliv3
Copy link
Member

@defuse @zooko :

since version 2.54.3 all temporary file and so the also requests bigger than 100k are saved on the disk encrpyted with AES with a 128bit key generated by globaleaks at startup and stored in /dev/shm. (this key survive globaleaks restarts but obviously not to machine reboots).

datails on this are in https://github.com/globaleaks/GLBackend/blob/master/globaleaks/security.py#L47 where the GLSecureTemporaryFile is implemented in order to be used instead of python tempfile.py.

in addition we have fixed various bug that prevented the cleaning sched to delete unffinishced submissions.

does this solution comply to your recomendations?

@nathan-at-least
Copy link

@evilaliv3 We will be looking at this remediation, starting this week.

@defuse
Copy link

defuse commented Mar 17, 2014

I have not had a chance to review the encryption code properly, but I have some
comments:

  • Different constants: For the counter length, different constants are used.
    For example, sometimes it's the integer literal 64, sometimes it's
    GLSetting.AES_nonce_size, and sometimes it's GLSetting.AES_counter_size.
    If these are all the same thing, they should be unified into one variable that
    gets used consistently.

  • MD5: Why is MD5 used to generate the nonce? The random IV should be
    generated directly, not hashed with MD5.

  • /dev/shm: I believe /dev/shm can be swapped to disk. So it's not a good
    place to keep keys. Keys should be stored in memory that is mlocked into RAM.

  • Use Random Key for Each File: It looks like the same key is being used to
    encrypt each file. Use a new random key for every file. Tahoe-LAFS has code
    for doing this:

    https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/util/fileutil.py?annotate=blame&rev=ff64a0fef5879d3651bc3db6ca0522d96b217d45#L66

@defuse
Copy link

defuse commented Mar 17, 2014

Oops, forgot to add this point:

  • Use a CSPRNG to generate the nonce: Currently the nonce is not generated with a cryptographically secure random number generator. It should be.

@evilaliv3
Copy link
Member

hi @defuse, i answer your 4 question in order:

  1. you are right on this and i will do a review and possibly i will use the same var.
  2. the nounce is hashed in order that it would be possible to save it as part of some filname (i do not see any problem due to the properties of an md5). do you see any problem?
  3. i understand your point. we are using /dev/shm in order to make it possible to restart globaleaks without loosing keys. do you see a problem to keep this? any suggestion?
  4. this sounds reasonable, we avoided it for simplicity but this rightly make the key to change during time.
  5. i think you are talking about this (https://github.com/globaleaks/GLBackend/blob/master/globaleaks/security.py#L67). this nonce is used to generate the counter number. as far as i know the counter does not need to be random this is why we avoid to use a real random number to not have any lock in case of empty random pool. what do you think?

@defuse
Copy link

defuse commented Mar 17, 2014

  1. Ok, great!
  2. Yes, there is a problem. The md5 returns hex, and only 8 hex digits are taken from it as the nonce. This means the nonce is really only 32 bits, not 64 bits, and is much more likely to be reused (which is fatal for CTR mode). You should generate 8 random bytes, use those as the nonce, and hex encode them for the filename.
  3. I'm not sure how to mlock a page in Python. The best solution is probably to make sure the server is encrypting its swap file. I'll ask the others here.
  4. Ok.
  5. The nonce doesn't need to be random, but it does need to be unique. That means if you are generating it randomly, you have to use a secure random number generator (weak random number generators can sometimes return the same numbers, creating a duplicate nonce). You don't need to use "true" random numbers, just a CSPRNG, like os.urandom().

@evilaliv3
Copy link
Member

ok @defuse we agreed internally on 1) 2) 4) 5).

related to 3) we think that for the mitigation (short term) we will keep it due to the simplicity of the solution and we are going to open a ticket for research and development of a specific solution/improval.

do you think that addressing 1) 2) 4) 5) described by you the ticket would be addressed?

@fpietrosanti
Copy link
Contributor Author

@defuse We added anyway #843 as it's a good idea to force/automate swap encryption :-)

@defuse
Copy link

defuse commented Mar 26, 2014

Yes, I think the ticket can be closed when 1) 2) 4) 5) are fixed. I think some more review of the encryption would be helpful, though, so I'll try to do that once they are fixed.

@evilaliv3
Copy link
Member

@defuse here comes the fix:

  1. i've renamed also "nonce" to "counter_prefix" in order to make the usage of the variable more clear.
  2. we are using now os.urandom to generate it as suggested avoiding to hex it in it's usage.
  3. now we generate a different key for each file and we delete the key on the same conditions that lead to remove the temporary encrypted file (in cases of success/failure)

can this thicket be closed?

@fpietrosanti / @vecna the usage of different keys in /dev/shm open the possibility for ddos based on multiple attachments and ram exaustion. i don't think this critical for the moment as probably other more powerful attacks than this can be achieved but probably it's good to open a new ticket managing at least to track the situation on te statistic system. what do you think?

@fpietrosanti
Copy link
Contributor Author

@evilaliv3 Like for the many features available in Flood-Resiliency-Protection https://github.com/globaleaks/globaleaks/issues?labels=Flood-Resiliency-Protection&milestone=&page=1&state=open we can have a new specific limitations on the "amount of new non-finalized file-upload / time" . This would still limit the amount of "files" that can be uploaded (that generate a new file). If this works, please open a ticket with tag "Flood-Resiliency-Protection" .

Be sure to update the documentation in the "Application Security Design" detailing how this security feature works.

@evilaliv3
Copy link
Member

yes this works, but a monitoring guard is anyway needed.
just wait for @defuse review of the fix of this ticket and than i will open
the news issues.

@defuse
Copy link

defuse commented Apr 9, 2014

@evilaliv3 Some comments:

  1. There is still a magic number. In the first call to AES.new there is 64. In the second call there is GLSetting.AES_counter_size. The first 64 should be changed to `GLSetting.AES_counter_size.
  2. Issue 2) above is fixed.
  3. Good, I see it is using a new key for each file -- Issue 4) above is fixed
  4. Issue 5) above is fixed.
  5. Is there any reason Random.new().read() is used for the key and os.urandom() is used for the nonce? It would be fine to use one or the other for both of them.
  6. The class should enforce the rule that write() MUST NEVER be called after read() is called. There is an assert that checks for that, but it's commented out right now. This is mandatory because if write() is called after read(), the new data will be encrypted using the same keystream as before, which will make it trivial to decrypt (xoring the two ciphertexts gives the xor of the two plaintexts).
  7. Unit tests for this class would be a good idea, especially to test the previous point (6).

Once issue (6) above is fixed, then this ticket can be closed.

@evilaliv3
Copy link
Member

thank you an other time. here comes all the fixes suggested included the addition the unit test for 6) 7)

@defuse
Copy link

defuse commented Apr 9, 2014

@evilaliv3 Two things I missed:

  1. It's better to explicitly raise an exception instead of using assert() since assert might be turned off accidentally or on purpose.
  2. There is another restriction that needs to be added: GLSecureFile should override write() so that it is not possible to write to a file that has already been written to. The reason for this is the same: If different data is written twice with the same key and nonce, decrypting it becomes easy.

For completeness' sake, I should also mention the following points. I don't think they are issues under the threat model and probably don't have to be fixed (but doing so would be good):

  • If the attacker can modify contents of the RAM disk, they can execute arbitrary code (because of the pickle.load().
  • If the attacker can change the filename, they can cause the file to be decrypted with the wrong key (a key that another file was encrypted with), which will tell them the keystream used to encrypt the other file. Fixing this would not be easy, you'd have to authenticate the entire ciphertext with an HMAC, which would mean you'd only be able to read and write the entire file all at once.

To be clear: The above two points are not problems, since to perform the attacks the attacker would already need access to the GlobaLeaks Node. I'm just noting them for completeness and future reference.

@evilaliv3
Copy link
Member

@defuse

i think you are misunderstanding the use of the write(). we are using a stream cipher and doing crypto in streaming during upload (in order to no be capped by RAM limits). the reason why we have prevented to use write after read is not for security, but just because it is not possible at all. so with the assertion we wanted to be sure that no code flow will follow this path.

so correct me if i'm wrong but it's fine to keep the assertion just because we can eventually disable it after have verified that it does never verify that condition.

@defuse
Copy link

defuse commented Apr 9, 2014

@evilaliv3 You are correct. As long as no other code is writing twice, everything is good and you don't even need the assertion.

But from a security design standpoint, I like things that never let the user to do dangerous things under any conditions. It makes it harder to make mistakes, especially if someone re-uses your classes in another application and doesn't really understand the problem with writing twice. It also makes auditing easier and faster, since the auditor doesn't have to check if there are multiple writes, they can just see immediately that the class doesn't allow it.

@evilaliv3
Copy link
Member

@defuse: but multiple writes are in place here obviously. if the user is uploading "helloworld" byte-by-byte we will do:

write("h"), write("e"), write("l"), that results in the cipher block chain that encrypt the stream.

the problem is that if i want to encrypt a temporary file and decrypt it with the same object i can mix write and read; when O start to read I can't do a later write on the same object but not because it's insecure but because it won't work and with the assertion I want simply to spot if this unexpected code flow happens

@defuse
Copy link

defuse commented Apr 9, 2014

@evilaliv3 Sorry, you understand correctly: by 'multiple writes' I meant multiple writes to the same location in the file, like reading then writing, or like opening the file with GLSecureFile then writing. Doing either of those things will break the encryption. Multiple writes like write("h"), write("e"), write("l")... is safe of course.

Right now I think it's possible to open the file with GLSecureFile and write a second time. I don't think any code is actually doing it, but even so it should not be allowed (there should be an assert to catch this, just in case).

Edit: fixed a grammar error

@evilaliv3
Copy link
Member

ok :) got it.

thank you for the precision.

@evilaliv3
Copy link
Member

this can be considered closes as no plaintext is kept on the server in default configuration where temporary AES encryption is in place and files are encrypted in streaming to PGP from RAM.
plaintext files are now saved only if the admin explicilty enables this possibbility based on the initiative requirements.

@evilaliv3
Copy link
Member

in addition all the codes has been ported from PyCrypto to cryptography.

@darius
Copy link

darius commented Apr 17, 2014

At https://github.com/globaleaks/GLBackend/blob/devel/globaleaks/settings.py#L274 is a misleading comment: the AES_counter_nonce is not in hex (it's used at https://github.com/globaleaks/GLBackend/blob/devel/globaleaks/security.py#L72) so the length should not be doubled. @defuse points out that the value is nevertheless correct now because the cryptography module says "Must be the same number of bytes as the block_size of the cipher with a given key."

I recommend fixing the comment or at least deleting it.

@evilaliv3
Copy link
Member

allright, i've removed the outdated comment: https://github.com/globaleaks/GLBackend/commit/42bb926ad8f6834432f222ebbe36bc58afa56fc3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants