Thoughts on Key Entropy #101

Closed
Logan007 opened this issue Apr 30, 2019 · 14 comments

Comments

@Logan007 (Collaborator) commented Apr 30, 2019

Let's assume that an input key has a certain "variety". For example, if we restricted ourselves to the use of 64 ASCII characters (at 6 bit each), an input key of 4 characters would have a variety of 24 bit.

As of now, the input key gets stretched or shrunk to 512 bits in the key hash (SHA512). Hashing the key does not change the overall variety; the variety just gets distributed over the hash. But there is a good reason to hash it anyway.

So, following the example given above, every bit in the hash (from which the key is taken) would carry a variety of 24 / 512 ≈ 0.047 bit. Our aim would be that every bit of the hash that gets used in the key carries a variety of at least 1 (to make sure the full key space gets used).

Thus, the first conclusion is that we would want to keep the hash as short as possible but as long as necessary to make best use of the input key's variety. But how long should the hash be?

The major security-defining parameter for AES encryption is its key. In case of 128 bit keys, we need 128 bit. Furthermore, we use a different, additional 128 bit key for IV encryption which essentially contributes to security. This adds up to 256 bit. Luckily, there is SHA256 which delivers exactly a 256 bit value that could be split into the two keys needed. The extra bits needed for an implicit IV padding before encryption or use (see #72) could be generated by hashing the hash again; this is feasible in this case of a short input key. But we need to be aware that those extra bits do not come with extra variety.

Assuming that the input key carries a variety of only 6 bit per byte, every input key with a length equal to or below 256 / 6 ≈ 43 should be hashed using SHA256.

For longer input keys, we would want to switch to 192 bit keys to make use of the full key space. SHA384 delivers 192 + 128 + 64 bit for the payload key, the IV encryption key, and a possible IV padding, respectively. Following the presumed expectation that longer input keys yield higher security, the additional 64 bit for IV padding are derived directly from the (hashed) key and thus carry their own variety. With 6 bit per byte, every input key of a length between 44 and (384 / 6 =) 64 should be treated this way.

For even longer input keys, we finally would need to switch to SHA512 using the well-known scheme of 256 bit payload key, 128 bit IV key, and 64 bit IV padding. We even have 64 extra bit! Following the above-mentioned scheme, this would be the way to choose for input keys with a length of 65 characters or more. If a user explicitly wants to go for 256 bit payload encryption, an "optimal" input key length would be around (512 / 6 ≈) 85 characters.
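
The switch points above follow directly from dividing each hash width by the assumed 6 bit of variety per character. A quick back-of-the-envelope check (plain illustrative Python, not project code):

```python
BITS_PER_CHAR = 6  # assumed variety per typed character (~64 usable ASCII characters)

# Longest input key whose variety still "fills" each hash width
limits = {name: bits / BITS_PER_CHAR
          for name, bits in [("SHA256", 256), ("SHA384", 384), ("SHA512", 512)]}

for name, chars in limits.items():
    print(f"{name}: up to about {chars:.1f} characters")
```

This reproduces the boundaries used below: roughly 43, exactly 64, and roughly 85 characters.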

To summarize, the scheme for key setup could be implemented using a case statement as follows:

| input key length | hash to use | IV padding from |
|---|---|---|
| ...43 | SHA256 | SHA256²(input key) |
| 44...64 | SHA384 | last 64 bit of hash |
| 65... | SHA512 | last or penultimate 64 bit of hash |

Performance impact should be low as key setup is a one-shot at program start.

Why shouldn't we just always use SHA512 and switch the payload key size only at the limits mentioned above? Because we would not make use of the full key space: the first 2 ^ 128 input keys would probably not (even roughly) be mapped onto the 2 ^ 128 possible values of the first quarter of a SHA512 value (as the remaining 384 bit do not stay constant).
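
This coverage argument can be checked empirically at a toy scale. The sketch below (illustrative Python, not project code) feeds all 2^16 possible 16-bit inputs through SHA256 and keeps only the first 16 bits of each digest. A hash behaving like a random function hits only about 1 - 1/e ≈ 63% of the truncated output space, leaving roughly a third of the possible "keys" unused:

```python
import hashlib

N = 16  # toy stand-in for the 128-bit case, small enough to enumerate fully
seen = set()
for i in range(2 ** N):
    digest = hashlib.sha256(i.to_bytes(2, "big")).digest()
    seen.add(digest[:2])  # keep only the first 16 bits of the hash value

coverage = len(seen) / 2 ** N
print(f"distinct truncated values: {len(seen)} of {2 ** N} ({coverage:.1%})")
```

The fraction comes out near 63%, illustrating why feeding 2^128 inputs through SHA512 and truncating would not cover all 2^128 values of the first quarter.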

Why do I emphasize 6 bit per character? Well, when using typed input keys a.k.a. passwords, we usually have 2 x 26 letters + 10 digits + a few extra characters, which comes to roughly 64 = 2 ^ 6. Even the uuencode command line tool uses only 64 characters.

A nice way to generate a shiny random input key with 85 characters is pwgen -s -y 85 1 or any pipe from /dev/random to uuencode such as this one found on stack exchange.

Confusing? I will be happy to discuss any questions.

@emanuele-f (Contributor) commented May 4, 2019

So 2^variety should be the number of possible configurations for the bits of the hash. I see that extending the hash size does not change the number of configurations available, but I do not understand why reducing the hash size to possibly match the variety is more secure than using a wider hash size. In practice, my question is: why is helloworld -> sha256 -> aes256 more secure than helloworld -> sha512 -> aes512?

Maybe we should also consider the use of specific password-based key derivation functions like PBKDF2, bcrypt or scrypt in place of SHA, to increase the time needed to brute-force.

@Logan007 (Collaborator, Author) commented May 5, 2019

> why reducing the hash size to possibly match the variety is more secure than using a wider hash size

This is more about the point that we are aiming to use the full keyspace than about the security of a single given password.

First of all, we want to make use of all 128, 192 and 256 bit keys, comprising 2 ^ 128 (which already is a lot) + 2 ^ 192 + 2 ^ 256 keys instead of just 2 ^ 256. And then, we want to use every single one of those three keyspaces to its limits before switching to the next one.

For example, in case of the 128 bit keys that we cut from the 256 bit hash, we would want the opportunity to choose among all 2 ^ 128 possible keys for the AES encryption. Leaving blanks in the keyspace could be considered wasteful (non-)use of precious keyspace, even allowing brute-force attackers to know that they do not need to check all possible keys, thus easing their business.

How, i.e. at what length of the input key, can we make sure that we probably have seen all 2 ^ 128 combinations in the first half of the hash?

Assuming SHA256 is close to a perfect random function, it is extremely unlikely that we will have seen all 2 ^ 128 possible combinations in the first half of the hash value after having fed in 2 ^ 128 input combinations - just because the hash distributes the input variety over the entire length of its output. That is why we need to get to a variety per bit of 1 in the hash value (see above) before switching to the next higher key size.

Another way to come to the same conclusion is starting from the point that the two different 128 bit keys for payload and IV encryption, respectively, together could be considered a whole of 256 bit key material.

> Maybe we should also consider the use of specific password-based key derivation functions like PBKDF2, bcrypt or scrypt in place of SHA ...

In my view, the use of time- or resource-consuming key derivation functions makes sense only

  • if attackers might gain access to some key or password hash, e.g. stored for verification purposes, or
  • if passwords of a definitely limited range are used.

Neither is the case so far (no hashes stored, no limits to passwords), so I do not see any need for PBKDF2 & Co.

> ... to increase time needed to bruteforce.

The question to ask is: brute force against what? As there is no hash permanently stored or transmitted, the only target could be intercepted network traffic, whose (already derived) keys would better be attacked directly. Only in case of a limited password space (e.g. a 4 digit PIN, which is not the case here) from which keys were derived would it be beneficial for an attacker to pre-calculate the possible derived keys - and thus for us to resort to a resource-consuming key derivation function.

@emanuele-f emanuele-f added this to the v.2.6-stable milestone May 26, 2019
@emanuele-f (Contributor)

@Logan007 can you summarize the changes to be applied in order to improve n2n's security based on the entropy concepts above?

@Logan007 (Collaborator, Author) commented May 26, 2019

| input key length [bytes] | AES key size [bit] | hash function to use | payload key (AES-CBC) | IV key | IV padding (64 bit) |
|---|---|---|---|---|---|
| ...43 | 128 | SHA256 | first 128 bit of hash | next (= last) 128 bit of hash | any (maybe first or last) 64 bit of SHA256(SHA256(input key)) |
| 44...64 | 192 | SHA384 | first 192 bit of hash | next 128 bit of hash | next (= last) 64 bit of hash |
| 65... | 256 | SHA512 | first 256 bit of hash | next 128 bit of hash | next (= penultimate) or last 64 bit of hash |

Those changes would need to be reflected in aes_best_keysize and setup_aes_key as well as in the IV padding handling. The IV still needs to be encrypted using the IV key.

Idea for coding: In the first case, concatenate SHA256(input key) and SHA256²(input key) so that afterwards the same scheme can always be used (in all cases): first part [key size] = key, next part [128 bit] = IV key, next part [64 bit] = IV padding.
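
That uniform slicing idea could look roughly like this (illustrative Python using hashlib; the actual n2n code is C, and the function name here is made up):

```python
import hashlib

def setup_key_material(input_key: bytes):
    """Sketch of the proposed schedule: returns (payload key, IV key, IV padding).

    Boundaries and slicing follow the table above; this is an illustration,
    not the actual n2n implementation.
    """
    n = len(input_key)
    if n <= 43:                                    # SHA256 case, AES-128
        h = hashlib.sha256(input_key).digest()
        material = h + hashlib.sha256(h).digest()  # append SHA256² for the IV padding
        key_len = 16
    elif n <= 64:                                  # SHA384 case, AES-192
        material = hashlib.sha384(input_key).digest()
        key_len = 24
    else:                                          # SHA512 case, AES-256
        material = hashlib.sha512(input_key).digest()
        key_len = 32
    # uniform slicing in all cases: key | 128-bit IV key | 64-bit IV padding
    payload_key = material[:key_len]
    iv_key = material[key_len:key_len + 16]
    iv_padding = material[key_len + 16:key_len + 24]
    return payload_key, iv_key, iv_padding
```

Note how the SHA256 case automatically draws its IV padding from the appended second hash, while the other two cases take it directly from the primary hash value.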

Yet another idea: aes_best_keysize could be integrated in setup_aes_key to make it a one-stop shop.

@emanuele-f (Contributor)

Thank you. Can you provide a pull request?

@Logan007 (Collaborator, Author) commented May 29, 2019

I will try. The coding part should be done in a few days. But it will need some time as I need to convince my local git to unbind from the last pull request. Any hints on that? Will a fresh clone be sufficient? I am not really handy with git...

Would you also accept a patch?

Any plans for when to release v2.6?

@Logan007 (Collaborator, Author) commented May 30, 2019

While working on the code, I found that this scheme implies that TRANSOP_AES_IV_SEED_SIZE may not be chosen below 8, as otherwise there wouldn't be enough IV padding material from the hash, especially in the "middle" case which uses SHA384.

I am not sure if anybody would ever want to change TRANSOP_AES_IV_SEED_SIZE, but to keep it flexible, I will fill up the rest (from bit 385 to bit 512) with 128 bit of another appended hash value in case those bits are needed. This does not change any of the other outlined thoughts, but just allows some flexibility. Any objections?
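
The fill-up idea amounts to appending further hash rounds whenever the base hash runs out of bytes. A rough Python illustration of that idea (the function name is made up; the real change would live in the C key-setup code):

```python
import hashlib

def extend_key_material(material: bytes, needed: int) -> bytes:
    """Append hashes of the running material until `needed` bytes are available.

    Illustrative only: the appended bytes carry no additional input variety,
    they merely provide deterministic filler (e.g. for a large IV seed).
    """
    while len(material) < needed:
        material += hashlib.sha512(material).digest()
    return material[:needed]
```

For example, the 48 bytes of a SHA384 value could be extended to 64 bytes this way, with the original 48 bytes left untouched at the front.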

Alternatively, we could just make a note in the source code that TRANSOP_AES_IV_SEED_SIZE may not be chosen below 8, which I would consider the worse way to go.

Also, I would need to change some data types to arrays to make use of that flexibility. Any idea how to elegantly output a uint8_t array with traceEvent?

n2n now can be compiled with TRANSOP_AES_IV_SEED_SIZE ranging between 0 and 16, which influences security on the one hand (16 = highest security) and packet length on the other hand (0 = shortest packet length). I left 8 as a well-balanced default.

@Logan007 (Collaborator, Author)

Please find attached a patch.

@Logan007 Logan007 mentioned this issue May 30, 2019
@emanuele-f (Contributor)

Thank you. If possible, please provide a pull request so that the code attribution will be yours

@Logan007 (Collaborator, Author) commented May 31, 2019

#118 #128

@Creling commented Jun 10, 2019

Well......

if we pursue using the full key space, then the key space will be extended from 2^512 to (2^256 + 2^384 + 2^512). But does this make sense?

2^512 is about 10^77 times the size of 2^256. If we calculate how much the key space is expanded: [image of the calculation omitted]

BTW, if we use SHA256 for ...43 (key.size() < 43), brute-force attackers can attack SHA256 only, to find whether someone is using a key in the ...43 range, and ignore SHA512 altogether, which will bring additional risks.

@Logan007 (Collaborator, Author) commented Jun 10, 2019

> if we pursue using full key-space,

The usable (full) AES keyspace is (2 ^ 128 + 2 ^ 192 + 2 ^ 256).

> then the key space will be extended from (2^512) [...]

There is no claim that 2 ^ 512 would be the usable AES keyspace.

> [...] to (2^256 + 2^384 + 2^512).

SHA does not extend the key's entropy. It is just a "mixer" between the input key and the AES key to ensure a more unpredictable use of the AES keyspace.

> But does this make sense?

It is because of this mixing nature that it is important to use the right-sized tool for a certain input key size. If the low entropy of a short input key gets distributed over too many bits, i.e. a too long hash value of which - even worse - we only used the first part for the AES key, we would waste key entropy and also not make use of the full AES key space.

> BTW, if we use SHA256 for ...43 (key.size() < 43), brute-force attackers can only hit SHA256 to find if someone using ...43 and ignore SHA512 at all, which will bring additional risks.

As far as I can see, no attacker will know from the encrypted messages what kind of AES key has been used (128, 192, or 256 bit). Consequently, they will not know from what kind of hash it was derived. What did I miss?

I am not sure what you mean by "hitting SHA256". An attacker would always have to brute-force the keys - derived or not derived from a hash of any size. The hash value does not get stored, parts of it get used as AES key. Could you please elaborate on that point and explain a little bit more in detail any related risks?

However, it would make more sense for an attacker willing to brute-force to attack the encrypted messages directly, i.e. skipping the hash and trying to find the correct key - which always is a threat to every encryption scheme.

P. S. This is no new functionality; the AES key size was chosen according to input key size before - switching was at input key lengths of 24 and 32, respectively. This here is more about the exact sizes at which to switch from one AES key size to the next, taking two things into account: the characteristics of mixing hash functions as well as ASCII input keys.

If we transfer the calculation you did to the example of AES key sizes of 128, 192, and 256, we will find that of course 2 ^ 256 is much bigger than 2 ^ 128 and 2 ^ 192; therefore, (2 ^ 256 + 2 ^ 192 + 2 ^ 128) seems to be quite close to 2 ^ 256 (quotient around 1). But actually, 2 ^ 128 and 2 ^ 192 already are quite big numbers. For a lot of people, these key sizes are even sufficient - especially with a view to performance, as 192 and 256 bit keys require more AES rounds and cause a corresponding performance loss.
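
The quotient argument is easy to verify with exact integer arithmetic (plain illustrative Python):

```python
# Usable AES keyspace across all three key sizes
total = 2 ** 128 + 2 ** 192 + 2 ** 256

# Relative to 2^256 alone, the smaller keyspaces vanish in the quotient:
# total / 2^256 = 1 + 2^-64 + 2^-128, which rounds to 1.0 in double precision.
print(total / 2 ** 256)      # prints 1.0

# ... and yet 2^128 by itself already exceeds 10^38 keys.
print(2 ** 128 > 10 ** 38)   # prints True
```

So a quotient near 1 does not mean the smaller keyspaces are negligible in absolute terms.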

@emanuele-f (Contributor) commented Jun 18, 2019

The main discussion here is that using SHA256 is more secure than using SHA512 for common key lengths (< 20 characters, I would say). @Creling do you have other concerns about this?

Since 99% of n2n users will use keys shorter than 44 characters, with the new algorithm AES-128 can be assumed to always be in use. However, we may remove the dynamic selection of encryption based on key length and instead add an additional parameter to specify the AES encryption to use, if this is a concern.

I've tested pull request #128 and it seems ok for me (apart from the small fixes requested).

@emanuele-f (Contributor)

No feedback received, PR integrated.
