
AES-GCM enabled with AVX512 vAES and vPCLMULQDQ. #17239

Closed

Conversation

amatyuko-intc
Contributor

The proposed patch provides a vectorized 'stitched' encrypt + GHASH implementation of AES-GCM, enabled with the AVX512 vAES and vPCLMULQDQ instructions (available starting with Intel's Ice Lake micro-architecture).

Performance details for representative Ice Lake server and client platforms are shown below.

Performance data:
OpenSSL Speed KBs/Sec
Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (1Core/1Thread)
Payload in Bytes       16          64        256         1024        8192      16384
AES-128-GCM
  Baseline      478708.27   1118296.96  2428092.52  3518199.4   4172355.99  4235762.07
  Patched       534613.95   2009345.55  3775588.15  5059517.64  8476794.88  8941541.79
  Speedup            1.12         1.80        1.55        1.44        2.03        2.11
 
AES-256-GCM
  Baseline      399237.27   961699.9    2136377.65  2979889.15  3554823.37  3617757.5
  Patched       475948.13   1720128.51  3462407.12  4696832.2   7532013.16  7924953.91
  Speedup            1.19        1.79         1.62        1.58        2.12        2.19
Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz (1Core/1Thread)
Payload in Bytes       16          64        256         1024        8192      16384
AES-128-GCM
  Baseline      259128.54   570756.43   1362554.16  1990654.57  2359128.88  2401671.58
  Patched       292139.47   1079320.95  2001974.63  2829007.46  4510318.59  4705314.41
  Speedup            1.13        1.89         1.47        1.42        1.91        1.96
AES-256-GCM
  Baseline      236000.34   550506.76   1234638.08  1716734.57  2011255.6   2028099.99
  Patched       247256.32   919731.34   1773270.43  2553239.55  3953115.14  4111227.29
  Speedup            1.05        1.67         1.44        1.49        1.97        2.03
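
For background, the two per-block operations that the 'stitched' implementation interleaves are the CTR-mode encryption and the GHASH update. A sketch of the standard GCM relations (general background, not text from this PR):

    % Per-block GCM encryption and authentication, where H = E_K(0^128) is the
    % hash key and X_0 is the initial GHASH accumulator:
    \[
        C_i = P_i \oplus E_K(\mathrm{ctr}_i), \qquad
        X_i = (X_{i-1} \oplus C_i) \cdot H \quad \text{in } GF(2^{128})
    \]
    % The vectorized code evaluates several counter blocks with vAES and several
    % GHASH multiplications with vPCLMULQDQ in the same loop iteration, which is
    % what "stitching" refers to here.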

@github-actions github-actions bot added the severity: fips change The pull request changes FIPS provider sources label Dec 8, 2021
@t8m t8m added branch: master Merge to master branch triaged: feature The issue/pr requests/adds a feature approval: otc review pending This pull request needs review by an OTC member labels Dec 8, 2021
@mattcaswell mattcaswell left a comment (Member)

Comments below. I have not reviewed the assembler. I will accept external review of this code.

@amatyuko-intc
Contributor Author

Thanks, Matt, for the quick review! I have fixed the style issues you mentioned, plus the ones pointed out by check-format.pl.

@mattcaswell
Member

LGTM.

@paulidale paulidale left a comment (Contributor)

LGTM, although I've not inspected the assembly.

@paulidale paulidale added approval: review pending This pull request needs review by a committer and removed approval: otc review pending This pull request needs review by an OTC member labels Dec 10, 2021
@dtzimmerman

I've lined up one ASM-focused reviewer to start in January, and hope to have another lined up soon.

@mdcornu mdcornu left a comment

Will look into this more in the coming days.

# ; It should be called before restoring the XMM registers
# ; for Windows (XMM6-XMM15).
# ;
sub clear_scratch_zmms_asm {

Clearing the ZMM scratch registers is not needed; just vzeroupper should be used before returning.
This function can probably be removed.

Contributor Author

This function is currently not called: the $CLEAR_SCRATCH_REGISTERS flag is set to zero, so only vzeroupper is used.
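
For reference, a minimal sketch (in the same perlasm style as the file) of the exit path this implies; the code below is illustrative only and not taken from the PR:

    # ;; Hypothetical epilogue sketch: only the upper vector state is
    # ;; cleared via vzeroupper; scratch ZMM/GP registers are not wiped.
    $code .= <<___;
        vzeroupper              # ; zero the upper bits of the vector registers
        ret
___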

}

# Clears all scratch GP registers
sub clear_scratch_gps_asm {

Same here, scratch registers probably do not need to be cleared.

Contributor Author

Same here for the GPR register cleanup.

.quad 0x0000000000000001, 0x0000000000000000

.align 16
TWO:

TWO and TWOf are not used, so they can be removed.

Contributor Author

Removed these, and double-checked and removed other unused data too. Thanks.

.quad 0x0000000000000000, 0x0400000000000000

.align 64
ddq_addbe_5678:

ddq_addbe_5678 and ddq_addbe_888 are not used, so they can be removed.

Contributor Author

Removed.

.quad 0x0000000000000000, 0x0400000000000000

.align 64
ddq_addbe_8888:

ddq_addbe_8888 is not used, so it can be removed.

Contributor Author

Removed.

my $ZTMP5 = $_[7];
my $ZTMP6 = $_[8];
my $ZTMP7 = $_[9];
my $HKEYS_RANGE = $_[10]; # ; "first16", "mid16", "last16", "all", "first32", "last32"

last16 is not passed anywhere, so it can be removed (including lines 506, 571-581 and part of 583)

Contributor Author

Although this code path is currently not used in the code generation, I would leave it in for completeness. It does not affect the resulting assembly. Do you have any objections?


$code .= <<___;
sub \$`(16*16)`,$T2
je .L_CALC_AAD_done_${rndsuffix}

Any reason for this alignment and others in 1588, 1591, 1594...?

Contributor Author

No, these are just code-formatting issues. I'll fix the formatting for all jump instructions.

my $NUM_BLOCKS = $_[6]; # [in] can only be 1, 2, 3, 4, 5, ..., 15 or 16 (not 0)
my $CTR = $_[7]; # [in/out] current counter value
my $ENC_DEC = $_[8]; # [in] cipher direction (ENC/DEC)
my $INSTANCE_TYPE = $_[9]; # [in] multi_call or single_call

multi_call is always used, so I suggest simplifying this and removing INSTANCE_TYPE here, as multi_call is always passed. Several macros have this argument, and it can be removed from them as well.

Contributor Author

You're right, only the "multi_call" option is used across all macros in this file. The upstream code (from the IPsec-MB library) that this contribution is based on supports both (multi_call and single_call), so I would leave this as is to allow easier syncing with upstream if needed. It does not affect the current code generation. Do you think that is valid reasoning?

Contributor Author

Decided to remove the single_call code path from the generator so as not to overload it.


my $rndsuffix = &random_string();

if ($INSTANCE_TYPE eq "single_call") {

This whole block can be removed, as "single_call" is never set.

Contributor Author

Please see answer about multi_call / single_call above.

Contributor Author

Removed.

crypto/modes/asm/aes-gcm-avx512.pl (outdated; resolved)
Comment on lines +10 to +13
# This implementation is based on the AES-GCM code (AVX512VAES + VPCLMULQDQ)
# from Intel(R) Multi-Buffer Crypto for IPsec Library v1.1
# (https://github.com/intel/intel-ipsec-mb).
# Original author is Tomasz Kantecki <tomasz.kantecki@intel.com>.

It would be very useful to reference the two papers (Vinodh Gopal et al.) cited in the ipsec-mb code here, since they contain a lot of useful background detail. I believe it is OK to include references like this in the OpenSSL code; as I remember, there are examples like this elsewhere.

Contributor Author

Sure, will do.

}

# ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
# ;; GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)

The (128,127,126,121,0) notation for the reduction polynomial looks a little unusual. Perhaps something like "x^128 + x^127 + x^126 + x^121 + 1" is more conventional

Contributor Author

Ok
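
For context, the two notations describe the same reduction; a short sketch of the standard GCM math (background, not text from the PR):

    % GHASH is defined over GF(2^128) with the irreducible polynomial
    %   g(x) = x^128 + x^7 + x^2 + x + 1.
    % The implementation works on bit-reflected operands, so the reduction it
    % actually performs corresponds to the reversed polynomial, which is what
    % the exponent list (128, 127, 126, 121, 0) denotes:
    \[
        g_{rev}(x) = x^{128} + x^{127} + x^{126} + x^{121} + 1
    \]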

Comment on lines 1228 to 1229
my $T4 = $_[5]; #; [clobbered] xmm/ymm/zmm
my $T5 = $_[6]; #; [clobbered] xmm/ymm/zmm

looks like $T4 and $T5 are unused in the GHASH_MUL macro - perhaps remove

Contributor Author

Removed.

Comment on lines +1233 to +1255
vpclmulqdq \$0x11,$HK,$GH,$T1 # ; $T1 = a1*b1
vpclmulqdq \$0x00,$HK,$GH,$T2 # ; $T2 = a0*b0
vpclmulqdq \$0x01,$HK,$GH,$T3 # ; $T3 = a1*b0
vpclmulqdq \$0x10,$HK,$GH,$GH # ; $GH = a0*b1

Here it is a textbook multiplication with 4 vpclmulqdq instructions - has the Karatsuba method been considered, to reduce it to 3 vpclmulqdq (along with the additional ALU instructions and additional hash-key precomputation required)?
I see this approach discussed in the paper "Vinodh Gopal et al., Optimized Galois-Counter-Mode Implementation on Intel Architecture Processors, August 2010". Potentially there have been architectural improvements since the paper was written, meaning that removing a vpclmulqdq at the cost of the extra ALU work for Karatsuba no longer gives a benefit on current architectures.

Contributor Author

Yes, the Karatsuba method was considered too, but as you correctly point out, with the consistent uarch improvements the simple schoolbook multiplication scales better.
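
For illustration, a rough sketch of the Karatsuba variant discussed above; this is not the code in the PR, and the name $HK_XOR (a hypothetical precomputed value holding, in its low 64 bits, the XOR of the hash key's high and low 64-bit halves) does not exist in the file:

    # ;; Karatsuba: 3 carry-less multiplies instead of 4, at the cost of extra
    # ;; shuffle/XOR work and one extra per-key precomputation ($HK_XOR).
    $code .= <<___;
        vpclmulqdq      \$0x11,$HK,$GH,$T1       # ; T1 = a1*b1
        vpclmulqdq      \$0x00,$HK,$GH,$T2       # ; T2 = a0*b0
        vpshufd         \$0x4e,$GH,$T3           # ; swap 64-bit halves of GH
        vpxorq          $GH,$T3,$T3              # ; T3.lo = a1 xor a0
        vpclmulqdq      \$0x00,$HK_XOR,$T3,$T3   # ; T3 = (a1^a0)*(b1^b0)
        vpxorq          $T1,$T3,$T3              # ; xor out the high product...
        vpxorq          $T2,$T3,$T3              # ; ...and the low: middle term
___
    # ;; the usual middle-term split and reduction then follow unchanged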

Comment on lines 1300 to 1310
&GHASH_MUL($T5, $HK, $T1, $T3, $T4, $T6, $T2);
$code .= <<___;
vmovdqu64 $T5,@{[HashKeyByIdx(3,$GCM128_CTX)]}
vinserti64x2 \$1,$T5,$ZT7,$ZT7

# ;; calculate HashKey^4<<1 mod poly
___
&GHASH_MUL($T5, $HK, $T1, $T3, $T4, $T6, $T2);
$code .= <<___;
vmovdqu64 $T5,@{[HashKeyByIdx(4,$GCM128_CTX)]}
vinserti64x2 \$0,$T5,$ZT7,$ZT7

At the start of this block, both HK^1 and HK^2 are known - couldn’t HK^3 and HK^4 be calculated now in a single call to GHASH_MUL using ymm registers, instead of 2 calls to GHASH_MUL using xmm registers (with some extra instructions to create a ymm register containing HK^1 and HK^2), saving 1 call to GHASH_MUL?

Contributor Author

Updated, thanks.

my $GPR2 = $_[4]; # [clobbered] GP register
my $GPR3 = $_[5]; # [clobbered] GP register
my $MASKREG = $_[6]; # [clobbered] mask register
my $AAD_HASH = $_[7]; # [out] XMM for AAD_HASH value (xmm14)

should there be an assumption about the exact xmm register used here? i.e. any xmm could potentially be used for $AAD_HASH by a caller of the GCM_UPDATE_AAD subroutine? (not just xmm14)

Contributor Author

Agree, removed.

Comment on lines 3617 to 3624
my $ZTMP15 = $_[27]; # [clobbered] ZMM register
my $ZTMP16 = $_[28]; # [clobbered] ZMM register
my $ZTMP17 = $_[29]; # [clobbered] ZMM register
my $ZTMP18 = $_[30]; # [clobbered] ZMM register
my $ZTMP19 = $_[31]; # [clobbered] ZMM register
my $ZTMP20 = $_[32]; # [clobbered] ZMM register
my $ZTMP21 = $_[33]; # [clobbered] ZMM register
my $ZTMP22 = $_[34]; # [clobbered] ZMM register

looks like $ZTMP15 to $ZTMP22 are unused in the subroutine, remove?

Contributor Author

Agree, removed.

my $GPR2 = $_[5]; # [clobbered] GP register
my $GPR3 = $_[6]; # [clobbered] GP register
my $MASKREG = $_[7]; # [clobbered] mask register
my $CUR_COUNT = $_[8]; # [out] XMM with current counter (xmm2)

should xmm2 be mentioned here? (i.e. it could be any xmm register used by subroutine caller)

Contributor Author

Agree, removed.

Comment on lines 4995 to 4996
TWOf:
.quad 0x0000000000000000, 0x0200000000000000

TWOf looks to be unused, perhaps remove?

Contributor Author

Yep, and also removed other unused data entries.

Comment on lines 5099 to 4951
mask_out_top_block:
.quad 0xffffffffffffffff, 0xffffffffffffffff
.quad 0xffffffffffffffff, 0xffffffffffffffff
.quad 0xffffffffffffffff, 0xffffffffffffffff
.quad 0x0000000000000000, 0x0000000000000000
___

mask_out_top_block looks to be unused, perhaps remove?

Contributor Author

Yep!

# ;; - it is assumed that data read from $INPTR is already shuffled and
# ;; $INPTR address is 64 byte aligned
# ;; - there is an option to pass ready blocks through ZMM registers too.
# ;; 4 extra parameters need to passed in such case and 21st argument can be empty

Could add "($_[20])" or "($ZTMP9)" after "21st argument" to make it clearer which arg can be empty.

Contributor Author

Added, thanks for noticing.

}

# ;; ===========================================================================
# ;; schoolbook multiply of 16 blocks (8 x 16 bytes)

16 x 16 bytes?

Contributor Author

Yep, updated.

vshufi64x2 \$0x00,$ZT7,$ZT7,$ZT5 # ;; broadcast HashKey^8 across all ZT5
___

# ;; calculate HashKey^9<<1 mod poly, HashKey^10<<1 mod poly, ... HashKey^48<<1 mod poly

... HashKey^16<<1 mod poly
Since hkeys 17 .. 48 are computed somewhere else

Contributor Author

You're correct, fixed.
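
For context, the reason powers of the hash key are precomputed at all is the standard GHASH aggregation identity (general background, not PR text):

    % Folding n blocks at once, starting from accumulator X_0:
    \[
        X_n = (X_0 \oplus C_1) \cdot H^{n} \oplus C_2 \cdot H^{n-1}
              \oplus \cdots \oplus C_n \cdot H
    \]
    % so with HashKey^1 .. HashKey^16 (or ^48) on hand, 16 (or 48) blocks can be
    % multiplied independently and the partial products XORed together before a
    % single reduction.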

jb .L_AAD_blocks_13_${rndsuffix}
je .L_AAD_blocks_14_${rndsuffix}
cmp \$15,@{[DWORD($T2)]}
je .L_AAD_blocks_15_${rndsuffix}

indentation of jump instructions is inconsistent throughout the module

Contributor Author

Fixed.

if ($do_reduction != 0) {

# ;; GH1H holds reduced hash value
# ;; - normally do "vmovdqa64 XWORD($HASH_IN_OUT), XWORD($GH1H)"

Could be converted to AT&T syntax for consistency, here and in other places with ASM code in the comments.

Contributor Author

Yep, thanks.
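
For example (assuming the quoted comment is written in Intel "dst, src" order), the AT&T-order form of that line would simply swap the operands:

    # ;; - normally do "vmovdqa64 XWORD($GH1H), XWORD($HASH_IN_OUT)"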

.align 32
.Laes_gcm_encrypt_${keylen}_avx512:
___
&GCM_ENC_DEC("$arg1", "$arg2", "$arg3", "$arg4", "$arg5", "$arg6", "ENC", "multi_call");

Looks like all macros are called with "multi_call". Is there a reason to keep all the "single_call" code in the macros?


# ;; =================================================
# ;; Return GHASH value through $GH1H
}

empty if statement should be removed

Contributor Author

Thanks, fixed all style comments below.

# $code .= <<___;
# add $PLAIN_CIPH_LEN,`$CTX_OFFSET_InLen`($GCM128_CTX)
# ___
# }

Can this commented out code be removed?


$code .= <<___;
cmp \$`(32 * 16)`,$LENGTH
jb .L_message_below_32_blocks_${rndsuffix}

Indentation is off here and in lots more places below in this function.

$code .= <<___;
# ;; Check aes_keys != NULL
test $arg1,$arg1
jz .Labort_setiv

indentation off for jz here and below

- Removed unused data
- Removed unused code branch in perl generator (related to single_call scenario)
- Indentation fixes
- Added references to papers used in the work
etc

@pablodelara pablodelara left a comment

Small comment from me; apart from that, the rest looks good. One more thing: will this third commit be squashed into the first commit after all comments are addressed?

@@ -497,8 +519,7 @@ sub precompute_hkeys_on_stack {
my $ZTMP4 = $_[6];
my $ZTMP5 = $_[7];
my $ZTMP6 = $_[8];
my $ZTMP7 = $_[9];
my $HKEYS_RANGE = $_[10]; # ; "first16", "mid16", "last16", "all", "first32", "last32"
my $HKEYS_RANGE = $_[9]; # ; "first16", "mid16", "last16", "all", "first32", "last32"

I think last16 is not passed anywhere, so it can be removed.

Contributor

We generally squash commits when they are merged but it is better if the author does this beforehand.

Contributor Author

I planned to squash the commits once the review is done. I think separate commits make it easier to track changes during review, especially in such large files.

Thanks for clarifying! So if you agree with my comment about removing last16, I'm done with the review. Once this is settled, I will approve. Thanks for the work!

Contributor Author

Thanks @pablodelara for the review. I removed the "last16" part.

precompute_hkeys_on_stack() routine.

@pablodelara pablodelara left a comment

Looks good to me! Thanks for the work, Andrey!

@tj-odwyer tj-odwyer left a comment

Hi Andrey,
I've checked all the resolutions to my comments and have no outstanding issues, so I'm happy to approve.
It is great work, and I have learned a lot during the review. Thanks.
TJ

@mattcaswell
Member

@mdcornu - are you satisfied with the resolutions to your comments applied to this PR?

@mdcornu mdcornu left a comment

Looks OK to me.

@paulidale
Contributor

My approval stands. Thanks for the assistance everyone.

@t8m t8m added approval: done This pull request has the required number of approvals and removed approval: review pending This pull request needs review by a committer labels Feb 9, 2022
@openssl-machine openssl-machine removed the approval: done This pull request has the required number of approvals label Feb 10, 2022
@openssl-machine
Collaborator

This pull request is ready to merge

@openssl-machine openssl-machine added the approval: ready to merge The 24 hour grace period has passed, ready to merge label Feb 10, 2022
openssl-machine pushed a commit that referenced this pull request Feb 10, 2022
Vectorized 'stitched' encrypt + ghash implementation of AES-GCM enabled
with AVX512 vAES and vPCLMULQDQ instructions (available starting Intel's
IceLake micro-architecture).

The performance details for representative IceLake Server and Client
platforms are shown below

Performance data:
OpenSSL Speed KBs/Sec
Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (1Core/1Thread)
Payload in Bytes       16          64        256         1024        8192      16384
AES-128-GCM
  Baseline      478708.27   1118296.96  2428092.52  3518199.4   4172355.99  4235762.07
  Patched       534613.95   2009345.55  3775588.15  5059517.64  8476794.88  8941541.79
  Speedup            1.12         1.80        1.55        1.44        2.03        2.11

AES-256-GCM
  Baseline      399237.27   961699.9    2136377.65  2979889.15  3554823.37  3617757.5
  Patched       475948.13   1720128.51  3462407.12  4696832.2   7532013.16  7924953.91
  Speedup            1.19        1.79         1.62        1.58        2.12        2.19
Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz (1Core/1Thread)
Payload in Bytes       16          64        256         1024        8192      16384
AES-128-GCM
  Baseline      259128.54   570756.43   1362554.16  1990654.57  2359128.88  2401671.58
  Patched       292139.47   1079320.95  2001974.63  2829007.46  4510318.59  4705314.41
  Speedup            1.13        1.89         1.47        1.42        1.91        1.96
AES-256-GCM
  Baseline      236000.34   550506.76   1234638.08  1716734.57  2011255.6   2028099.99
  Patched       247256.32   919731.34   1773270.43  2553239.55  3953115.14  4111227.29
  Speedup            1.05        1.67         1.44        1.49        1.97        2.03

Reviewed-by: TJ O'Dwyer, Marcel Cornu, Pablo de Lara
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #17239)
@t8m
Member

t8m commented Feb 10, 2022

Merged after squashing the commits and expanding the commit message. Thank you all for your contribution and reviews.

@t8m t8m closed this Feb 10, 2022
t8m pushed a commit to t8m/openssl that referenced this pull request Nov 4, 2022
(cherry picked from commit 63b996e)
t8m pushed a commit to t8m/openssl that referenced this pull request Nov 4, 2022
(cherry picked from commit 63b996e)
t8m pushed a commit to t8m/openssl that referenced this pull request Nov 9, 2022
(cherry picked from commit 63b996e)
openssl-machine pushed a commit that referenced this pull request Nov 11, 2022
(cherry picked from commit 63b996e)
Labels
approval: ready to merge (The 24 hour grace period has passed, ready to merge)
branch: master (Merge to master branch)
severity: fips change (The pull request changes FIPS provider sources)
triaged: feature (The issue/pr requests/adds a feature)

9 participants