8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes#28363
Conversation
…tubs cause AES-GCM encryption failure for certain payload sizes
….com and zlukas@google.com.
|
👋 Welcome back jiangli! A progress list of the required criteria for merging this PR into |
|
/contributor tholenst@google.com |
|
@jianglizhou This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 254 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
@jianglizhou Syntax:
User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address. |
|
@jianglizhou Syntax:
User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address. |
|
@jianglizhou The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
|
/contributor Thomas Holenstein tholenst@google.com |
|
@jianglizhou Syntax:
User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address. |
|
@jianglizhou Syntax:
User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address. |
|
/contributor add Thomas Holenstein tholenst@google.com |
|
@jianglizhou |
|
@jianglizhou |
Webrevs
|
|
|
||
| public class TestAesGcmIntrinsic { | ||
|
|
||
| static final SecureRandom SECURE_RANDOM = newDefaultSecureRandom(); |
There was a problem hiding this comment.
Drive-by comment: Java code should use 4x whitespace indentation.
shipilev
left a comment
There was a problem hiding this comment.
Good catch! Some stylistic comments for the product fix, and suggestions for the test.
|
|
||
| __ bind(MESG_BELOW_32_BLKS); | ||
| __ subl(len, 16 * 16); | ||
| __ cmpl(len, 256); |
There was a problem hiding this comment.
From the stylistic logic, this should be written as 16 * 16, to match the surrounding subl and addl.
There was a problem hiding this comment.
Thanks for the detailed review, @shipilev! Fixed.
| public static void main(String[] args) throws Exception { | ||
| TestAesGcmIntrinsic test = new TestAesGcmIntrinsic(); | ||
| long startTime = System.currentTimeMillis(); | ||
| while (System.currentTimeMillis() - startTime < 60 * 1000) { |
There was a problem hiding this comment.
I get that you want a stress test. But time-limiting puts the test into weird condition: it can have different number of iterations, depending on auxiliary load on the machine. These tests are running in parallel with lots of other tests, so it is not uncommon. Do you even need to repeat jitFunc() call multiple times? Looks like it traverses the interesting configurations in one go?
There was a problem hiding this comment.
I did some testing today. For 200 runs, removing the time-limited loop, there is 89 runs out of 200 fail. So I changed to use an iteration of three runs, all 200 runs fail without the fix.
| for (int messageSize = SPLIT_LEN; messageSize < SPLIT_LEN + 300; messageSize++) { | ||
| byte[] message = randBytes(messageSize); | ||
| try { | ||
| byte[] ciphertext = gcmEncrypt(key, message, aad); |
There was a problem hiding this comment.
I believe it makes sense to check that round-trip is successful, e.g. that decrypt(encrypt(message)) == message. Currently we implicitly rely on exceptions being thrown from the incorrectly executing code, which is IMO too weak -- in the boundary conditions like these, there might be bugs that do not manifest in visible exceptions, and just the encryption is subtly broken.
There was a problem hiding this comment.
That's a good idea. I added decrypt part and the check as suggested.
There was a problem hiding this comment.
With the changes, there were more common parts in the test. I moved common code into helper methods.
| import javax.crypto.spec.GCMParameterSpec; | ||
| import javax.crypto.spec.SecretKeySpec; | ||
|
|
||
| public class TestAesGcmIntrinsic { |
There was a problem hiding this comment.
This sounds like TestGCMSplitBound or some such; it is not a generic test for AES/GCM intrinsic.
There was a problem hiding this comment.
I renamed to TestAesGcmIntrinsic name, when converting the original test into the jtreg test. TestGCMSplitBound SGTM. Changed.
| throw new RuntimeException("ciphertext is null"); | ||
| } | ||
| } | ||
| for (int messageSize = SPLIT_LEN; messageSize < SPLIT_LEN + 300; messageSize++) { |
There was a problem hiding this comment.
[SPLIT_LEN - 300; SPLIT_LEN + 300] for completeness, perhaps?
|
|
||
| public class TestAesGcmIntrinsic { | ||
|
|
||
| static final SecureRandom SECURE_RANDOM = newDefaultSecureRandom(); |
There was a problem hiding this comment.
Do you really need a SecureRandom here? Random RANDOM = Utils.getRandomInstance(); gets you the pre-seeded random instance, which can be used to repeatably reproduce failures.
There was a problem hiding this comment.
I kept the SecureRandom without changing. I think that could be more related to what the original reproducible.
- Rename test to TestGCMSplitBound.java - Change test range to [SPLIT_LEN - 300; SPLIT_LEN + 300].
- Replace time-bound loop with an iteration of three runs. - Add encrypt part and check to make sure the encrypted message is the same as the original.
| byte[] nonce = randBytes(IV_SIZE_IN_BYTES); | ||
| System.arraycopy(ciphertext, 0, nonce, 0, IV_SIZE_IN_BYTES); | ||
| Cipher cipher = getCipher(key, aad, nonce); | ||
| return cipher.doFinal(ciphertext, IV_SIZE_IN_BYTES, ciphertext.length - IV_SIZE_IN_BYTES); |
There was a problem hiding this comment.
Indenting is still 2-space here.
There was a problem hiding this comment.
Indeed. Fixed, thanks.
test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java
Outdated
Show resolved
Hide resolved
| AlgorithmParameterSpec params = | ||
| new GCMParameterSpec(8 * TAG_SIZE_IN_BYTES, nonce, 0, nonce.length); | ||
| Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding"); | ||
| cipher.init(Cipher.ENCRYPT_MODE, keySpec, params); |
There was a problem hiding this comment.
Er. This is used from gcmDecrypt? How does it work without Cipher.DECRYPT_MODE?
There was a problem hiding this comment.
Good catch. Interestingly the test passed for me on my local machine. Fixed to use Cipher.DECRYPT_MODE when doing gcmDecrypt.
Also an interesting new finding, with the decrypted message verification, I see there are 2 failures out of 200 runs with AVX512. I'm filing a new issue on the specifically, so it can be investigated.
There was a problem hiding this comment.
|
|
||
| /* | ||
| * @test | ||
| * @bug 8371864 |
There was a problem hiding this comment.
Does it make sense to just run the unit test on architectures with @requires vm.cpu.features ~= ".*avx512f.*" | vm.cpu.features ~= ".*avx2.*" annotation?
There was a problem hiding this comment.
Thanks for reviewing and testing!
Does it make sense to just run the unit test on architectures with @requires vm.cpu.features ~= ".avx512f." | vm.cpu.features ~= ".avx2." annotation?
Limiting the test execution on the relevant devices is a good idea. We can also check for os.simpleArch == "x64". We probably could check for ".avx512." instead ".avx512f." just to make sure we still get the proper test coverage in case there is any future/hidden bugs with populating cpu feature flags.
I just did a quick testing:
On my local machine, these related cpu feature flags are set: avx, avx2.
On a machine enabled with the aesgcm_avx512 intrinsic, these are the related cpu feature flags:
avx, avx2, avx512f, avx512dq, avx512cd, avx512bw, avx512vl, avx512_vpopcntdq, avx512_vpclmulqdq, avx512_vaes, avx512_vnni, avx512_vbmi2, avx512_vbmi, avx512_bitalg, avx512_ifma
There was a problem hiding this comment.
Added @requires.
| /* | ||
| * @test | ||
| * @bug 8371864 | ||
| * @run main/othervm/timeout=600 TestGCMSplitBound |
There was a problem hiding this comment.
60 was sufficient for my test runs.
There was a problem hiding this comment.
It's probably better to give larger timeout factor to prevent false failure when testing on slower machine.
test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java
Outdated
Show resolved
Hide resolved
test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java
Outdated
Show resolved
Hide resolved
test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java
Outdated
Show resolved
Hide resolved
test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java
Outdated
Show resolved
Hide resolved
test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java
Outdated
Show resolved
Hide resolved
test/jdk/com/sun/crypto/provider/Cipher/AES/TestGCMSplitBound.java
Outdated
Show resolved
Hide resolved
- Add @requires - Shorten long lines
| //process 8 16 byte blocks in initial_num_blocks. | ||
| //process 8 16 byte blocks at a time until all are done 'encrypt_by_8_new followed by ghash_last_8' | ||
| __ xorl(pos, pos); | ||
| __ cmpl(len, 128); |
There was a problem hiding this comment.
Was this part of the original problem? I was trying to trace where this is called with < 128 bytes and couldn't find the path.
There was a problem hiding this comment.
As I documented in JDK-8371864 description, there was also a bug in AVX2 version of the intrinsic, StubGenerator::aesgcm_avx2. Hence the bug title mentioned both AVX512 and AVX2 intrinsics stubs.
The failure can be reproduced if you run TestGCMSplitBound.java on a machine supports AVX2 but not AVX512 features. You would need to find a x64 machine that supports AVX2 but not AVX512 features. See StubGenerator::generate_aes_stubs() for how it decides which version of the stub is used.
On my local machine with AVX2 support, TestGCMSplitBound.java fails without the fix:
test result: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Failed for messageSize 100001
| encryptAndDecrypt(key, aad, message, messageSize); | ||
| } catch (Exception e) { | ||
| throw new RuntimeException("Failed for messageSize " + | ||
| Integer.toHexString(messageSize), e); |
There was a problem hiding this comment.
nit: + operator should be first and line indented >= 8 white-spaces.
There was a problem hiding this comment.
Aren't these nits something a tool should check and in the best case also fix automatically?
There was a problem hiding this comment.
nit: + operator should be first and line indented >= 8 white-spaces.
Aren't these nits something a tool should check and in the best case also fix automatically?
Changed to break before operators +.
AFAIK, we have mixed styles in existing JDK code with operator on the new line and operator at the end of previous line for breaking long lines. +1 on the suggestion to do auto-detection and auto-fix if we want to more strictly reinforce style.
smemery
left a comment
There was a problem hiding this comment.
@jianglizhou thank you for the AVX2 related output from the unit test pre-fix. From this I was able to trace the point of failure and see that your proposed changes are good for approval. Thank you for your work on these issues!
shipilev
left a comment
There was a problem hiding this comment.
Oh man, the pos/len modifications in current code are confusing. I scratched my head for quite a while trying to comprehend why does __ bind(MESG_BELOW_32_BLKS) split the pos += 16 and len -= 16? On a surface, that just looks like a bug.
But looks that way because we do initial_blocks_16_avx512 twice, do pos += 16 twice, but only do the len += 32 after the second call. Which does not help if we shortcut after the first call. In fact, I am not at all sure that checking len < 32 without modifying len beforehand does not break the assumptions downstream:
initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset);
__ addl(pos, 16 * 16);
__ cmpl(len, 32 * 16);
__ jcc(Assembler::below, MESG_BELOW_32_BLKS);
Really, in these kind of intrinsics, you want to make sure pos and len updates are tightly bound together, otherwise these kinds of mistakes would keep happening. You will lose on code density a bit, but would have more readable and robust code.
Shouldn't it be like this?
diff --git a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
index 1e728ffa279..a16e25b075d 100644
--- a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
+++ b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
@@ -3475,12 +3475,14 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist
initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset);
__ addl(pos, 16 * 16);
+ __ subl(len, 16 * 16);
+
__ cmpl(len, 32 * 16);
__ jcc(Assembler::below, MESG_BELOW_32_BLKS);
initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK, stack_offset + 16);
__ addl(pos, 16 * 16);
- __ subl(len, 32 * 16);
+ __ subl(len, 16 * 16);
__ cmpl(len, 32 * 16);
__ jcc(Assembler::below, NO_BIG_BLKS);
@@ -3491,24 +3493,27 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist
ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK,
true, true, false, false, false, ghashin_offset, aesout_offset, HashKey_32);
__ addl(pos, 16 * 16);
+ __ subl(len, 16 * 16);
ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK,
true, false, true, false, true, ghashin_offset + 16, aesout_offset + 16, HashKey_16);
__ evmovdquq(AAD_HASHx, ZTMP4, Assembler::AVX_512bit);
__ addl(pos, 16 * 16);
- __ subl(len, 32 * 16);
+ __ subl(len, 16 * 16);
__ jmp(ENCRYPT_BIG_BLKS_NO_HXOR);
__ bind(ENCRYPT_BIG_NBLKS);
ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK,
false, true, false, false, false, ghashin_offset, aesout_offset, HashKey_32);
__ addl(pos, 16 * 16);
+ __ subl(len, 16 * 16);
+
ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, SHUF_MASK,
false, false, true, true, true, ghashin_offset + 16, aesout_offset + 16, HashKey_16);
__ movdqu(AAD_HASHx, ZTMP4);
__ addl(pos, 16 * 16);
- __ subl(len, 32 * 16);
+ __ subl(len, 16 * 16);
__ bind(NO_BIG_BLKS);
__ cmpl(len, 16 * 16);
@@ -3525,9 +3530,9 @@ void StubGenerator::aesgcm_avx512(Register in, Register len, Register ct, Regist
ghash16_avx512(false, true, false, false, true, in, pos, avx512_subkeyHtbl, AAD_HASHx, SHUF_MASK, stack_offset, 16 * 16, 0, HashKey_16);
__ addl(pos, 16 * 16);
+ __ subl(len, 16 * 16);
__ bind(MESG_BELOW_32_BLKS);
- __ subl(len, 16 * 16);
gcm_enc_dec_last_avx512(len, in, pos, AAD_HASHx, SHUF_MASK, avx512_subkeyHtbl, ghashin_offset, HashKey_16, true, true);
__ bind(GHASH_DONE);
The combination of handling for the fall through from I had also missed the fall through case in my initial proposed fix (with comp/jcc) until @sviswa7 pointed it out and suggested the current fix. The fix for
Improving readability is a good idea, hand-rolled assembly however is mostly motivated by performance. While |
@smemery Thanks for carefully testing the changes! |
|
/integrate |
|
@jianglizhou This pull request has not yet been marked as ready for integration. |
|
/integrate |
|
Going to push as commit 6cb1c8f.
Your commit was automatically rebased without conflicts. |
|
@jianglizhou Pushed as commit 6cb1c8f. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size.
Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case!
Progress
Issue
Reviewers
Reviewers without OpenJDK IDs
Contributors
<tholenst@google.com><zlukas@google.com>Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28363/head:pull/28363$ git checkout pull/28363Update a local copy of the PR:
$ git checkout pull/28363$ git pull https://git.openjdk.org/jdk.git pull/28363/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28363View PR using the GUI difftool:
$ git pr show -t 28363Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28363.diff
Using Webrev
Link to Webrev Comment