
Properly initialize the AES IV #72

Closed
emanuele-f opened this issue Mar 1, 2019 · 36 comments

Comments

@emanuele-f
Contributor

As explained in #70, the IV used in AES encryption is currently zeroed. This weakens security and must be fixed.

@Logan007
Collaborator

Logan007 commented Mar 3, 2019

There is a solution as proposed in the patch provided for #70 and also in #76.

@emanuele-f
Contributor Author

To fix this issue in the current n2n AES implementation, I would replace the 32 bit nonce of AES (which is set but never checked anyway) with a random 64 bit clear-text number. The remaining 64 bits of the IV can be set to 0 for now. I think this simple fix can improve security, as it will prevent two identical plaintexts from producing the same ciphertext with high probability (a collision will only occur with probability 1/2^64). This should be effective as the IV is not required to be kept secret (this is what I read about it online). Do you think this could be a good compromise for the current AES version?

@Logan007
Collaborator

Logan007 commented Mar 31, 2019

Basically, you are right that the IV could usually be transmitted unencrypted. The underlying assumption then is that an attacker does not know anything about the plaintext.

However, in this case, there is a problem. The data transmitted is an ethernet frame. An ethernet frame always starts with a constant preamble and a start frame delimiter, so the first 8 bytes are always the same. In every packet. These eight bytes are followed by destination and source MAC at 6 bytes each. The MACs are well known as they are also transmitted outside the encrypted portion, in the transformation-independent envelope of n2n packets. Thus, the plaintext of the first data block is extremely predictable if not "well known" to any attacker.

Let us assume that we transmit the IV unencrypted. In cipher block chaining, the IV gets XORed to the first block of ethernet plaintext before it gets encrypted. With an openly transmitted IV, an attacker could find out the exact data that went into the block cipher by just XORing the IV with the assumed ethernet plaintext.

One may think that this is no big deal as this data is known anyway. But knowing a plaintext that went into encryption and the cipher that comes out of it allows for known-plaintext attacks, which help an attacker to find out the key. And if we randomly deliver more variants of the plaintext (with every new IV) together with their ciphers, the known-plaintext attack gets easier with every packet sent.

That would not happen if the IV were encrypted for transmission.

The problem here is the high predictability of the first ethernet data block, not the IV or CBC mode. From what I have read (section B.2, p. 69), a random IV should have an entropy of at least 95 bits. Given that not-so-random first block of ethernet data the IV gets XORed to, I would definitely go for a full block size of 128 bit.
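
To make the CBC relation concrete, here is a small sketch (not project code) of what an eavesdropper can compute when both the IV and the first plaintext block are known:

```c
#include <stdint.h>

/* In CBC mode the first block cipher input is IV XOR P0. If the IV travels
 * in clear text and P0 (the predictable start of the ethernet frame) is
 * guessed correctly, the attacker knows the exact input that produced the
 * observed first cipher block, i.e. a known plaintext/ciphertext pair. */
static void first_block_cipher_input(uint8_t input[16],
                                     const uint8_t iv[16],
                                     const uint8_t p0[16])
{
    for (int i = 0; i < 16; i++)
        input[i] = iv[i] ^ p0[i];
}
```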

As brought up in the pull request discussion #76, we might need to consider users who run the version 1 encryption scheme and cannot update, say, router firmware as quickly as desktop users can replace their binaries. Do we really want to break compatibility? So, I would think of implementing a version 2 with slight additions to version 1 - and of course the full package in version 3 ;).

@Logan007
Collaborator

Logan007 commented Mar 31, 2019

A few additional thoughts on the topic:

  • If we want to transmit a sub-block-sized (part of an) IV, e.g. 64 or 96 bit, and have to encrypt it as outlined above, we need a format-preserving encryption that maintains data length. The only scheme I would trust so far is FF1 as approved by NIST (should also work with AES), which unfortunately is not implemented in openssl yet. DES with its 64 bit block size is not a considerable option. The only way I see to make it work without implementing FF1 would be to add another encryption step using the non-data key K2 (see below).
  • With that well known first data block, I would refrain from setting any IV bits to identifiable values, be it 0 or any other hard-coded pattern. Instead, it would be better to use some secret (maybe derived from the key hash) to fill up to the full IV size. Over time, a key change would then contribute to higher IV entropy, too.

If we found that 64 bit of minimum entropy are enough for IV (even though not recommended) and wanted to save 8 bytes per packet, a possible approach could look like this:

  1. Generate 64 bit random IV part 1 and call it r
  2. Put r to the outgoing packet being assembled.
  3. Use some 64 bit part of the key hash, or another hash of the key, or a hash of the hash of the key (...) to fill up r to a full 128 bit IV. Note that this needs to be repeatable on any other client, so it is best made key dependent.
  4. Just to avoid constant parts of the full IV (even though the key-dependent part of the IV remains secret, it still is constant under the same key), reproducibly scramble it with a single-block AES encryption using K2.
  5. Use the output from step 4 as IV.

The receiver then would do the following:

  1. Get r from the packet
  2. Follow steps 3 to 5 from encryption scheme.

Do we really need that encryption at step 4? Yes. Otherwise we have well known parts of the IV again, which is bad (see the post above). Note: instead of transmitting an encrypted IV (or parts of it) as explained in the post above, we would here encrypt the IV before use, i.e. before it is fed into the de- or encryption respectively. But that's okay, too - as long as the key remains secret, of course.

Why do we have to use K2? If we applied an encryption twice under the same key, that would be too close to an encryption followed by a decryption and therefore considered extremely weak. As the IV gets digested (XORed with a highly predictable first ethernet block) in the first data block encryption using K1, we don't want to create a possible target for differential cryptanalysis.
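
A minimal sketch of the sender side of this scheme, assuming OpenSSL's low-level AES API; the names (derive_iv, iv_seed, key_hash_tail, k2_ecb_key) are illustrative, not actual n2n code:

```c
#include <stdint.h>
#include <string.h>
#include <openssl/aes.h>
#include <openssl/rand.h>

/* k2_ecb_key: AES_KEY set up once from K2 with AES_set_encrypt_key(). */
static void derive_iv(uint8_t iv[AES_BLOCK_SIZE],
                      uint8_t iv_seed[8],              /* r, sent in clear */
                      const uint8_t key_hash_tail[8],  /* secret, key-derived */
                      const AES_KEY *k2_ecb_key)
{
    uint8_t block[AES_BLOCK_SIZE];

    RAND_bytes(iv_seed, 8);              /* step 1: 64 bit random r        */
    memcpy(block, iv_seed, 8);           /* step 3: r ++ key-derived bits  */
    memcpy(block + 8, key_hash_tail, 8);
    AES_encrypt(block, iv, k2_ecb_key);  /* step 4: scramble with K2       */
    /* step 5: iv is used as the CBC IV; only iv_seed goes on the wire     */
}
```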

Should that be part of an enhanced AES V1 encryption scheme? We could make it the new V2 - if we want to preserve compatibility.

@emanuele-f
Contributor Author

At least with this version of AES (v1), it's very important to reduce the performance impact of the encryption process while still providing a good level of security. I understand the problem you have outlined with my proposal above, so here is how I would improve it.

We could transmit a 64 bit random value in cleartext, R1, and then both the sender and the receiver can derive the actual IV by computing hash(R1 + secret_key). Since the attacker does not know secret_key, they cannot know the IV. This method would only add one hash() operation per packet (todo: check the performance penalty), so it should be quite feasible. Do you think this is an improvement over the current 0 IV? Do you see major drawbacks or lighter ways to implement this?
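
A minimal sketch of that derivation using OpenSSL's SHA256; truncating the digest to the 128 bit block size and the names used here are assumptions, not the actual implementation:

```c
#include <stdint.h>
#include <string.h>
#include <openssl/sha.h>

/* IV = first 16 bytes of SHA256(R1 || secret_key); R1 travels in clear. */
static void derive_iv_hash(uint8_t iv[16],
                           const uint8_t r1[8],
                           const uint8_t *secret_key, size_t key_len)
{
    uint8_t buf[8 + 64];
    uint8_t digest[SHA256_DIGEST_LENGTH];

    memcpy(buf, r1, 8);
    memcpy(buf + 8, secret_key, key_len);   /* key_len assumed <= 64 here */
    SHA256(buf, 8 + key_len, digest);
    memcpy(iv, digest, 16);                 /* truncate to AES block size */
}
```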

@Logan007
Collaborator

Logan007 commented Apr 9, 2019

That is an interesting idea!

In general, the current hashes and the software aes implementation are in the same league performance-wise. Here you will find a performance comparison chart that was created before AES-NI hardware was built into desktop processors. On the somewhat older processor used there, AES-ECB (128) compares to SHA256 at 109 vs. 111 MiB/s, quite on par.

If I understand your proposal correctly, it would require one hash per packet on each side. The other proposal from 9 days ago requires one aes block encryption step per packet, as the hash of the key is generated only once.

I would prefer the aes solution because it is faster on at least some platforms, namely modern hardware-accelerated desktop computers after 2010, and otherwise equal.

Securitywise, I still will have to think about it...

@emanuele-f
Contributor Author

Thank you for the feedback. To summarize the proposed solutions:

  1. Prepend a random full aes block (128 bits) to each packet. IV does not need to be sent. Pro: very simple, can decrypt the payload as a whole so it's possibly optimized. Cons: additional packet overhead
  2. Prepend a random 64bits value r to each packet. IV can be calculated as encrypt(r + 6bytes from hash(secret_key)). Pro: uses possibly optimized aes encryption, only 2 bytes overhead. Cons: cost of encrypt
  3. Prepend a random 64bits value R1 to each packet. IV can be calculated as hash(R1 + secret_key). Pro: only 2 bytes overhead, Cons: cost of hashing

I will do some benchmarks for the solutions and post the results

@Logan007
Collaborator

Logan007 commented Apr 9, 2019

  1. Additional con: Weak security if not processed (hash, encryption) any further.
  2. Shouldn't it read r + 8 bytes to form a 128 bit IV? You mentioned 2 additional bytes, I am not sure to what value exactly you are comparing? Another con: Only 64 bit of entropy for IV.
  3. Should use some (part of the only once calculated) hash of the secret key, e.g. hash (R1 + hash(secret_key)). Another con: Only 64 bit of entropy for IV.

Thanks for preparing current benchmarks to compare aes and sha. I am curious.

Securitywise, 2. and 3. also seem to be on par.

@emanuele-f
Contributor Author

emanuele-f commented Apr 14, 2019

I've implemented the tests in my branch: https://github.com/emanuele-f/n2n/tree/init_iv

Results on my i3 pc:

encryption + decryption:

Run [0. aes_iv_zero] for 3s (512 bytes):          201625 packets      67.2 Kpps     34.4 MB/s
Run [1. aes_iv_full_block] for 3s (512 bytes):    151415 packets      50.5 Kpps     25.8 MB/s
Run [2. iv_init_encrypt] for 3s (512 bytes):      158923 packets      53.0 Kpps     27.1 MB/s
Run [3. iv_init_hash] for 3s (512 bytes):         168386 packets      56.1 Kpps     28.7 MB/s

encryption only:

Run [0. aes_iv_zero] for 3s (512 bytes):          469719 packets     156.6 Kpps     80.2 MB/s
Run [1. aes_iv_full_block] for 3s (512 bytes):    272128 packets      90.7 Kpps     46.4 MB/s
Run [2. iv_init_encrypt] for 3s (512 bytes):      329808 packets     109.9 Kpps     56.3 MB/s
Run [3. iv_init_hash] for 3s (512 bytes):         337057 packets     112.4 Kpps     57.5 MB/s

@emanuele-f
Contributor Author

emanuele-f commented Apr 14, 2019

I've also added a new test which uses xor instead of sha256; here is the result (encryption + decryption):

Run [4. iv_init_xor] for 3s (512 bytes):          177139 packets      59.0 Kpps     30.2 MB/s

It is important to note that the RAND_bytes function is the source of most of the slow-down. Replacing it with two calls to rand() (because on windows RAND_MAX is 32 bits in size) I get:

Run [2. iv_init_encrypt_rand] for 3s (512 bytes):    185883 packets      62.0 Kpps     31.7 MB/s
Run [3. iv_init_hash_rand] for 3s (512 bytes):       191254 packets      63.8 Kpps     32.6 MB/s
Run [4. iv_init_xor_rand] for 3s (512 bytes):        201447 packets      67.1 Kpps     34.4 MB/s

In conclusion, I would use the 4th method, xor based, to generate the IV from the hash of the encryption key and the random 64 bit value (using rand(), we don't need unpredictability), as the performance impact is almost 0. The IV used is unknown to the attacker (since they do not know the key material used in the xor) and we get the property of generating different ciphertexts from the same plaintext with high probability.
It is important to note that we could avoid adding the 64 bit random value at all if we don't care about this property and still use part of the hash of the encryption key as IV (this would still be a fixed IV though) to improve on the current situation with an IV of 0, but I would rather pay the 64 bit overhead and get this property.

@Logan007
Collaborator

Wow, you have put a lot of effort in that!

I am not sure if I correctly understood the 4th method. If we XOR a 64 bit IV with the key, some part of the resulting 128 bit IV will always be the same and thus some bits of the first crypto-input block will always remain the same (as ethernet data is constant).

XORing the IV, and therefore the input to the cipher, with a constant value (in this case: the key) does not make a difference in terms of differential cryptanalysis. Differential means that the plaintexts differ in a well known way (actually, the difference is applied/measured by XORing) and the way the resulting ciphers differ is analyzed. So a constant (even if secret) XOR does not make a difference if the IV is transmitted in plain text and not processed any further. That way, we would deliberately deliver a lot of blocks with known XOR-wise differences to an attacker. I would not recommend it.

I would go with the hash (IV + appended part of the key hash (K2?) - do not use the key directly) or the encryption step. These methods are also advantageous as they deliver an IV that is fully stretched to block size.

@emanuele-f
Contributor Author

I xor the hash of the key (a part of the hash) with the random 64 bit number, see emanuele-f@45fd49a#diff-1444ba9e46de97ec079a106ed01d0431R176. The xor is applied to a full 128 bit value, so no bits will stay the same unless, of course, the corresponding xor bit is 0. In practice I use xor as a faster hash function; maybe we can use a hash function faster than sha256?

@Logan007
Collaborator

I see. XOR is not a good one-way function, especially not if we XOR with repeatedly constant (even if secret) or well known parts: using plainly transmitted 64 bits XORed with the key hash (which remains constant under the same key) as an IV which then gets XORed to the well known first ethernet block does not help at all against differential cryptanalysis.

The currently suggested implementation of IV already has a low entropy of 64 bit. It should not be weakened any further. To be honest, I would not support such a solution, not even if advertised as the-weak-but-performing alternative.

I think this is where security becomes more important than performance. And SHA256 does not seem to be that much slower than XOR.

@emanuele-f
Contributor Author

emanuele-f commented Apr 15, 2019

What about md5? It should be 3x faster than SHA256, so it might be a good compromise. I agree that we should not weaken this too much for performance reasons, but I also think that differential analysis based on assumptions about packet contents may not be so trivial. I mean, you can speculate that a packet's content is X, but ARP packets are not that common in a continuous stream of packets. You may filter for them based on packet size, but this remains speculation and that, together with the unknown and changing IV, may be enough for security. Anyway, this is just my speculation on such cryptanalysis (which is a black box for me right now), so I appreciate your critical feedback on this.

I want AES to become the standard encryption in n2n for embedded devices, so it's vital to avoid performance reduction where not strictly needed.

@Logan007
Collaborator

Logan007 commented Apr 15, 2019

MD5 is considered weak in terms of collision resistance, which is not important here - we need the scrambling one-way property. So yes, I think we could give MD5 a try: MD5(64 bit random ++ 128 bit part of key hash)?

As far as I can see, every ethernet packet has the same predictable beginning consisting of preamble and well known disclosed MACs - not just the ARP ones.
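
A minimal sketch of that MD5 variant (names are illustrative; the 16 byte MD5 digest conveniently matches the AES block size):

```c
#include <stdint.h>
#include <string.h>
#include <openssl/md5.h>

/* IV = MD5(r ++ key_hash_part), where r (64 bit) is sent in clear and
 * key_hash_part (128 bit) is derived once from the key and kept secret. */
static void derive_iv_md5(uint8_t iv[MD5_DIGEST_LENGTH],   /* 16 bytes */
                          const uint8_t r[8],
                          const uint8_t key_hash_part[16])
{
    uint8_t buf[8 + 16];

    memcpy(buf, r, 8);
    memcpy(buf + 8, key_hash_part, 16);
    MD5(buf, sizeof(buf), iv);   /* digest length equals AES block size */
}
```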

@emanuele-f
Contributor Author

Yes, I see. By the way, here are the results on i3 (without rand_bytes):

Run [0. aes_iv_zero] for 3s (512 bytes):          197889 packets      66.0 Kpps     33.8 MB/s
Run [1. aes_iv_full_block] for 3s (512 bytes):    195688 packets      65.2 Kpps     33.4 MB/s
Run [2. iv_init_encrypt] for 3s (512 bytes):      186943 packets      62.3 Kpps     31.9 MB/s
Run [3. iv_init_hash] for 3s (512 bytes):         190751 packets      63.6 Kpps     32.6 MB/s
Run [4. iv_init_xor] for 3s (512 bytes):          201114 packets      67.0 Kpps     34.3 MB/s
Run [5. iv_init_md5] for 3s (512 bytes):          195369 packets      65.1 Kpps     33.3 MB/s

whereas on a singlecore raspberry 800Mhz (without rand_bytes):

Run [0. aes_iv_zero] for 3s (512 bytes):           36618 packets      12.2 Kpps      6.2 MB/s
Run [1. aes_iv_full_block] for 3s (512 bytes):     35954 packets      12.0 Kpps      6.1 MB/s
Run [2. iv_init_encrypt] for 3s (512 bytes):       32833 packets      10.9 Kpps      5.6 MB/s
Run [3. iv_init_hash] for 3s (512 bytes):          30502 packets      10.2 Kpps      5.2 MB/s
Run [4. iv_init_xor] for 3s (512 bytes):           36073 packets      12.0 Kpps      6.2 MB/s
Run [5. iv_init_md5] for 3s (512 bytes):           32561 packets      10.9 Kpps      5.6 MB/s

It is interesting to note how different the results are on such a raspberry and how hardware acceleration becomes more visible when cpu power is limited.

whereas on a 4 core raspberry 900Mhz (without rand_bytes):

Run [0. aes_iv_zero] for 3s (512 bytes):           49195 packets      16.4 Kpps      8.4 MB/s
Run [1. aes_iv_full_block] for 3s (512 bytes):     48623 packets      16.2 Kpps      8.3 MB/s
Run [2. iv_init_encrypt] for 3s (512 bytes):       45020 packets      15.0 Kpps      7.7 MB/s
Run [3. iv_init_hash] for 3s (512 bytes):          44128 packets      14.7 Kpps      7.5 MB/s
Run [4. iv_init_xor] for 3s (512 bytes):           49154 packets      16.4 Kpps      8.4 MB/s
Run [5. iv_init_md5] for 3s (512 bytes):           46263 packets      15.4 Kpps      7.9 MB/s

@Logan007
Collaborator

Interesting finding. Maybe we should go with aes on the IV as the fastest of the more secure IV handlings?

@emanuele-f
Contributor Author

emanuele-f commented Apr 15, 2019

md5 usually is slightly more performant than aes; in the embedded environment they perform practically the same. I will also add tests with sha1 for completeness. Do you know if there is a 128 bit version of aes in openssl? Right now I use the aes256. I'm actually using aes128.

@Logan007
Collaborator

Logan007 commented Apr 15, 2019

Just set up a 128 bit key using AES_set_encrypt_key and/or AES_set_decrypt_key (providing the key size parameter) for the AES_encrypt or AES_decrypt functions respectively.
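
For reference, a minimal sketch of that key setup with OpenSSL's low-level API; the wrapper name is illustrative and error handling is reduced to a return code:

```c
#include <openssl/aes.h>

/* Set up a 128-bit AES key for both directions; the second argument to
 * AES_set_*_key is the key length in bits. */
static int setup_aes128(const unsigned char key_material[16],
                        AES_KEY *enc_key, AES_KEY *dec_key)
{
    if (AES_set_encrypt_key(key_material, 128, enc_key) != 0)
        return -1;
    if (AES_set_decrypt_key(key_material, 128, dec_key) != 0)
        return -1;
    return 0;
}
```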

Please correct me if I'm wrong, but the numbers seem to be in favor of aes, even on arm which we might find from time to time in embedded systems.

@emanuele-f
Contributor Author

I've removed an unneeded initialization in AES in emanuele-f@839d1fb#diff-1444ba9e46de97ec079a106ed01d0431R167; now the performance of the AES ECB (iv_init_encrypt) is better than MD5 for embedded. Here are the updated results:

on i3 (without rand_bytes):

Run [0. aes_iv_zero] for 3s (512 bytes):          196207 packets      65.4 Kpps     33.5 MB/s
Run [1. aes_iv_full_block] for 3s (512 bytes):    195771 packets      65.3 Kpps     33.4 MB/s
Run [2. iv_init_encrypt] for 3s (512 bytes):      187578 packets      62.5 Kpps     32.0 MB/s
Run [3. iv_init_hash] for 3s (512 bytes):         187527 packets      62.5 Kpps     32.0 MB/s
Run [4. iv_init_xor] for 3s (512 bytes):          201137 packets      67.0 Kpps     34.3 MB/s
Run [5. iv_init_md5] for 3s (512 bytes):          196508 packets      65.5 Kpps     33.5 MB/s
Run [6. iv_init_sha1] for 3s (512 bytes):         196457 packets      65.5 Kpps     33.5 MB/s

on a singlecore raspberry 800Mhz (without rand_bytes):

Run [0. aes_iv_zero] for 3s (512 bytes):           36843 packets      12.3 Kpps      6.3 MB/s
Run [1. aes_iv_full_block] for 3s (512 bytes):     36221 packets      12.1 Kpps      6.2 MB/s
Run [2. iv_init_encrypt] for 3s (512 bytes):       34928 packets      11.6 Kpps      6.0 MB/s
Run [3. iv_init_hash] for 3s (512 bytes):          30850 packets      10.3 Kpps      5.3 MB/s
Run [4. iv_init_xor] for 3s (512 bytes):           35850 packets      11.9 Kpps      6.1 MB/s
Run [5. iv_init_md5] for 3s (512 bytes):           32782 packets      10.9 Kpps      5.6 MB/s
Run [6. iv_init_sha1] for 3s (512 bytes):          32736 packets      10.9 Kpps      5.6 MB/s

on a 4 core raspberry 900Mhz (without rand_bytes):

Run [0. aes_iv_zero] for 3s (512 bytes):           49312 packets      16.4 Kpps      8.4 MB/s
Run [1. aes_iv_full_block] for 3s (512 bytes):     48803 packets      16.3 Kpps      8.3 MB/s
Run [2. iv_init_encrypt] for 3s (512 bytes):       47725 packets      15.9 Kpps      8.1 MB/s
Run [3. iv_init_hash] for 3s (512 bytes):          44332 packets      14.8 Kpps      7.6 MB/s
Run [4. iv_init_xor] for 3s (512 bytes):           49379 packets      16.5 Kpps      8.4 MB/s
Run [5. iv_init_md5] for 3s (512 bytes):           46853 packets      15.6 Kpps      8.0 MB/s
Run [6. iv_init_sha1] for 3s (512 bytes):          45614 packets      15.2 Kpps      7.8 MB/s

So let's go with aes128

@Logan007
Collaborator

The key initializations (AES_set_encrypt_key and ...decrypt_key) could be "factored out" to the aes_key_setup. Making the keys accessible via the sa structure, the key setup would only need to be performed once - and not for every packet. That might save some more cycles.
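
A rough sketch of that idea; the structure and field names below are illustrative, not the actual transform_aes.c layout:

```c
#include <stdint.h>
#include <string.h>
#include <openssl/aes.h>

/* Hypothetical per-SA key material, set up once per key, not per packet. */
typedef struct sa_aes_keys {
    AES_KEY enc_key;      /* payload CBC encryption key (K1) */
    AES_KEY dec_key;      /* payload CBC decryption key (K1) */
    AES_KEY iv_ecb_key;   /* single-block IV scrambling key (K2) */
    uint8_t iv_pad[8];    /* key-hash-derived filler for the IV seed */
} sa_aes_keys_t;

/* Called once from the key setup path; the per-packet code then only
 * references the precomputed AES_KEY structures. */
static void sa_aes_keys_init(sa_aes_keys_t *sa,
                             const uint8_t *k1, int k1_bits,
                             const uint8_t k2[16],
                             const uint8_t iv_pad[8])
{
    AES_set_encrypt_key(k1, k1_bits, &sa->enc_key);
    AES_set_decrypt_key(k1, k1_bits, &sa->dec_key);
    AES_set_encrypt_key(k2, 128, &sa->iv_ecb_key);
    memcpy(sa->iv_pad, iv_pad, 8);
}
```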

@Logan007
Collaborator

Logan007 commented Apr 15, 2019

How do we want to fill up the 64 bit to full block size? We could use part of the key hash to fill up - that way, the key contributes to entropy... with every change.

Do you want me to implement that into the pull request for version 1? I am not sure how the pull request compares to the current dev branch meanwhile. Does the pull request still have to be compared against the dev branch code as it was when the pull request was first created?

As I still see the need for a full security feature list (version 2 with higher IV entropy, replay protection, message authentication, ...), I would really like to keep the opportunity to switch version using the "-A2" option. Besides, the unrolled code looks much more readable.

Hopefully, the (nights of) upcoming weekend will allow some time for coding.

@emanuele-f
Contributor Author

I'm making the changes to implement this right now. We will have a 512 bit hash of the key (a rough sketch follows below):

  • (up to) 256 bits can be used for the primary encryption key
  • 128 bits can be used as the secondary encryption key for the AES ECB
  • The remaining 128 bits can be used, as you suggest, to fill up to the full block size. Maybe adding the 64 bit value to both 64 bit halves of this part of the hash is better than concatenating them to the value.

After the changes the pull request will need to be adapted, but I would wait for that until we discuss the remaining points.
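
A minimal sketch of that split using OpenSSL's SHA512; the offsets and names are illustrative assumptions, not necessarily the committed layout:

```c
#include <stdint.h>
#include <string.h>
#include <openssl/sha.h>

/* Split SHA512(PSK) into the three parts discussed above. */
static void split_key_hash(const uint8_t *psk, size_t psk_len,
                           uint8_t k1[32],      /* up to 256 bit payload key  */
                           uint8_t k2[16],      /* 128 bit AES ECB key for IV */
                           uint8_t iv_pad[16])  /* 128 bit block-size filler  */
{
    uint8_t digest[SHA512_DIGEST_LENGTH];       /* 64 bytes */

    SHA512(psk, psk_len, digest);
    memcpy(k1, digest, 32);
    memcpy(k2, digest + 32, 16);
    memcpy(iv_pad, digest + 48, 16);
}
```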

@Logan007
Collaborator

Logan007 commented Apr 15, 2019

Concatenation (++) of 64 bits of the hash would be good enough in this case. We could always use the last 64 bits of the key hash.

As long as we use a completely different key for the AES ECB one block encryption step, we could even fill up using the primary key or parts of the secondary key even though it would be considered bad practice. There is no need for addition (+) or XOR, it would not really make a difference here.

emanuele-f added a commit that referenced this issue Apr 16, 2019
This implements the changes discussed in #68 and #72.
This breaks compatibility with the previous AES implementation.

This also fixes two problems reported by valgrind:

==4887== Invalid write of size 2
==4887==    at 0x483E9DB: memmove (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4887==    by 0x10E37F: setup_aes_key (transform_aes.c:378)
==4887==    by 0x10E451: add_aes_key (transform_aes.c:401)
==4887==    by 0x10ED10: transop_aes_setup_psk (transform_aes.c:580)
==4887==    by 0x10A547: main (benchmark.c:92)
==4887==  Address 0x4d574a0 is 0 bytes after a block of size 16 alloc'd
==4887==    at 0x4839B65: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4887==    by 0x10E337: setup_aes_key (transform_aes.c:374)
==4887==    by 0x10E451: add_aes_key (transform_aes.c:401)
==4887==    by 0x10ED10: transop_aes_setup_psk (transform_aes.c:580)
==4887==    by 0x10A547: main (benchmark.c:92)

==13057== Use of uninitialised value of size 8
==13057==    at 0x49023B3: ??? (in /usr/lib/libcrypto.so.1.1)
==13057==    by 0x490346A: AES_cbc_encrypt (in /usr/lib/libcrypto.so.1.1)
==13057==    by 0x11270A: transop_encode_aes (transform_aes.c:230)
==13057==    by 0x10F5CD: send_packet2net (edge_utils.c:1224)
==13057==    by 0x10F813: readFromTAPSocket (edge_utils.c:1278)
==13057==    by 0x1106A8: run_edge_loop (edge_utils.c:1596)
==13057==    by 0x10B9F7: main (edge.c:701)
@emanuele-f
Contributor Author

Implemented. Please check out the commit above and review possible issues. Thank you.

@Logan007
Collaborator

Will do tonight.

@Logan007
Collaborator

Logan007 commented Apr 16, 2019

It has been running stably for a while now. Starting to dig into the code and trying to verify the iv generation performed in set_aes_cbc_iv, I was not able to print the generated ivec; I always get unreasonable values. Probably some mind-bending pointer issue in my trace debug lines - but both give the same strange values:

set_aes_cbc_iv(sa, sa->dec_ivec, iv_seed);
traceEvent( TRACE_DEBUG, "decode_aes using iv %016lx", in_len, sa->sa_id,(__int128*)(sa->dec_ivec) );
traceEvent( TRACE_DEBUG, "decode_aes using iv %016lx", in_len, sa->sa_id, sa->dec_ivec );

If this all turns out to be working, the nonce code can be removed. Also, I would not store the generated iv in the sa structure, as the iv is only needed for the en/decryption of the current packet - the iv does not need to survive the end of that function, so it might be much better to use a local variable to hold it.

During the weekend, I will take a closer look at the key setup; more feedback then.

@emanuele-f
Contributor Author

Thank you for your time. You can use the hexdump function to print the IV (n2n/n2n.c, line 246 in d689691: void hexdump(const uint8_t * buf, size_t len)). Yes, the nonce can be removed. I'd also like to ask whether anyone uses the keyfile-based aes; if not, I would drop the SA too. I'll move the IV out of the sa struct in the meantime.

@Logan007
Collaborator

Logan007 commented Apr 17, 2019

Just a quick note: we will need the host-keyed sa-structures later on, probably for DH key generation (the state of generation and transmission as well as the key need to be kept track of on a per-host basis) and some other per-host values such as "last packet time stamp seen" or similar.

Edit: Just for clarification, I am talking about the sa structures in the code (shall stay), not the sa indicator in the packet (could be dropped).

@Logan007
Collaborator

Logan007 commented Apr 20, 2019

The IV seems to work pretty well, I can't see any reason not to use it in version 1.

Another point came to my mind while reviewing the key setup. I am sure I have mentioned it somewhere else before. It is about the input key entropy and the chosen key size. The function aes_best_keysize uses the input key size as a parameter to determine the actual key size for payload encryption.

As a lot of input keys might be provided in the form of ASCII characters (who would really use backslashed binary codes?) with a usual variety of 64 up to 82 values per character, their per-character entropy is correspondingly lower than a full byte. So maybe it would be good to introduce a factor between 3 and 4 to make sure key entropy is high enough and the full key space is used.

For example, input key size could be divided by 4 before aes_best_keysize is applied. That way, 4 input key characters would be required for one key byte. That of course would need a PSK with a minimum of 128 characters to guarantee a 256 bit aes payload encryption.

If we already break compatibility, this is maybe a good opportunity to take care of this thought.
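
A minimal sketch of that rule; the function name and the 16/24/32 byte switching points are illustrative assumptions, not the actual aes_best_keysize logic:

```c
#include <stddef.h>

/* Map an ASCII PSK length to an AES key size in bytes, assuming 4 input
 * characters per byte of effective key material (128 chars -> 32 bytes). */
static int aes_keysize_from_ascii_psk(size_t psk_len_chars)
{
    size_t effective_bytes = psk_len_chars / 4;

    if (effective_bytes >= 32) return 32;   /* AES-256 */
    if (effective_bytes >= 24) return 24;   /* AES-192 */
    return 16;                              /* AES-128 */
}
```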

@emanuele-f
Contributor Author

The key is now hashed with SHA512, so the entropy of the key fed into AES should be fine - am I missing something? What's the benefit of such a change? I see it as difficult for people to use a 128 character PSK unless they know it's required to trigger the use of AES256.

@Logan007
Collaborator

Logan007 commented Apr 22, 2019

Hashing does not change entropy; the SHA512 is for swirling the provided input key bits and thus making a more unpredictable and distributed use of the key space. But considering ASCII input, we do not make full use of the key space:

Let us assume we have a 16 character ASCII input key. That would be a 128 bit input key, resulting in a 128 bit key size for AES (a >= 24 character input would result in a 192 bit AES key). Also assuming the best case of a variety of 82 characters per input key byte, that would be 82^16 = 4.18E30 different keys, which is way below 256^16 = 3.4E38 (the number of fixed-size 128 bit a.k.a. 16 byte keys that use the full byte range, not just common ASCII characters). The entropy or "variety" of input keys is thus roughly 8.1E7 times lower than the full 128 bit AES key space. The 16th root of that ratio delivers a factor of 3.12 (= 256 / 82) of additional entropy that each character should carry. If we assume a 23 character ASCII input, the ratio of key spaces would be around 0.000003, which means more than the same entropy (equal entropy is reached at a factor of 1). The factor of 1 is reached between 20 and 21 input characters: 256^16 / 82^20 ≈ 1.8, while 82^21 already exceeds 256^16.

For the bigger key sizes, the factor also varies somewhat: for the 192 bit AES range, we see the factor varying between 3.12 (again, 256 / 82, for a 24 byte input key) and 0.89 (for a 31 character input key; below 1 means more than the same entropy). For the 256 bit key size range, the factor varies between 3.12 (again, 256 / 82, for a 32 byte input key) and any value close enough to zero with higher key lengths -- it reaches 1 (equal entropy) at around 40 or 41 input key characters.

That's why I would recommend using 3 to 4 input characters per internal AES key byte. That way, we definitely make better use of the key space. I could also imagine using a more sophisticated function that takes the varying factors into account to calculate the actual AES key size to use from any given input key size - not just constantly multiplying / dividing by 3 or 4. I might come up with something not too complicated. Give me some time.
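
For reference, a small program (not project code) that reproduces the per-character factors quoted above:

```c
/* Per-character entropy factor for an n character ASCII key (alphabet of
 * 82 symbols) measured against a b bit AES key space:
 *   factor = (2^b / 82^n)^(1/n)
 * Values above 1 mean each character would have to carry more entropy than
 * ASCII provides; 1 means the key space is just covered. */
#include <math.h>
#include <stdio.h>

static double entropy_factor(int aes_bits, int n_chars)
{
    return exp((aes_bits * log(2.0) - n_chars * log(82.0)) / n_chars);
}

int main(void)
{
    printf("AES-128, 16 chars: %.2f\n", entropy_factor(128, 16)); /* ~3.12 */
    printf("AES-128, 20 chars: %.2f\n", entropy_factor(128, 20)); /* ~1.03 */
    printf("AES-192, 31 chars: %.2f\n", entropy_factor(192, 31)); /* ~0.89 */
    printf("AES-256, 40 chars: %.2f\n", entropy_factor(256, 40)); /* ~1.03 */
    printf("AES-256, 41 chars: %.2f\n", entropy_factor(256, 41)); /* ~0.92 */
    return 0;
}
```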

By the way, the famous second sentence of the United States Declaration of Independence already has 210 characters - maybe my favorite passphrase 😉. As the PSK does not get entered by the user on the keyboard but is laid down in the keyfile or cli parameter / environment (probably in some bash script or batch file which also would make a good place to load modules and add some routing setup if required), I wouldn't see a big issue with PSK length.

[EDITed for having confused numbers]

[EDITed even more... the thoughts above are not wrong but not complete at all. While investigating I found that there is more to it. I will open a new issue as soon as I got it straight. This is not about the IV anyway.]

@emanuele-f
Contributor Author

Thank you for trying to explain this more simply! We should also evaluate the possibility of adding another parameter to specify the AES key size to use, but first I would like to better understand this entropy issue.

@Logan007
Collaborator

No parameter for AES size is required; it is just that we need to think through what entropy to assume for the provided keys, how their entropy gets split across the individual generated keys, and how we use that information to determine the optimal "switching" points for key sizes... I will get back to that sometime during the next days.

@emanuele-f
Contributor Author

I'm closing this issue as the initialization problem has been fixed. Please open a new issue about the entropy improvement

@Logan007
Collaborator

Shouldn't we keep it flexible, i.e. use a constant such as IV_LENGTH_BYTES (0 ... 16) to determine the IV length and take (16 - IV_LENGTH_BYTES) bytes from the key hash?
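
A minimal sketch of that parameterization; the constant value of 8 and the helper name are illustrative:

```c
#include <stdint.h>
#include <string.h>

#define IV_LENGTH_BYTES  8                       /* 0 ... 16; random part on the wire */
#define IV_KEYHASH_BYTES (16 - IV_LENGTH_BYTES)  /* filler taken from the key hash    */

/* Assemble the 16 byte pre-scramble IV block from the transmitted random
 * seed and the key-hash-derived filler. */
static void assemble_iv_block(uint8_t block[16],
                              const uint8_t iv_seed[IV_LENGTH_BYTES],
                              const uint8_t key_hash_pad[IV_KEYHASH_BYTES])
{
    memcpy(block, iv_seed, IV_LENGTH_BYTES);
    memcpy(block + IV_LENGTH_BYTES, key_hash_pad, IV_KEYHASH_BYTES);
}
```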
