"fusion" AES-GCM engine #310

merged 64 commits into from
Jun 17, 2020

@kazuho kazuho commented May 12, 2020

Fusion is our own AES-GCM crypto, optimized for encrypting short blocks.


  • Perform the entire AEAD operation at once. AAD processing and the AES calculation needed for generating the GCM tag run in parallel with payload protection. Traditional AES-GCM implementations handle AAD processing, payload protection, and GCM finalization separately, focusing on the efficiency of payload processing. In other words, with traditional implementations some execution ports remain idle while AAD or GCM finalization is being processed, whereas fusion tries to keep a pipelined, superscalar CPU busy throughout the entire operation.
  • Reduce GHASH once. Traditional AES-GCM implementations are designed to handle arbitrarily sized AEAD blocks, and therefore perform reduction periodically (e.g., once per 6-8 blocks).
  • Calculate the QUIC header protection vector in parallel. For maximum throughput, the AES calculation of multiple blocks has to be done in parallel (both fusion and OpenSSL process 6 blocks at a time). This means some CPU resources are likely wasted when fewer than 6 blocks are processed at once. Fusion checks whether such spare capacity exists, and uses it to calculate the header protection vector in parallel.
  • Written in C instead of assembly. Modern x86-64 CPUs have deep reorder buffers; there's less need to arrange the instructions precisely than there used to be.
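To illustrate the "reduce once" idea, here is a minimal sketch that uses a toy field, GF(2^8) with the AES polynomial, instead of the GF(2^128) field that GHASH really uses. It is not fusion's code; the function names (`clmul8`, `reduce8`, `hash_reduce_each`, `hash_reduce_once`, `precompute_powers`) are hypothetical. It only shows that, because reduction is linear over XOR, unreduced carry-less products can be accumulated and reduced a single time, given precomputed powers of the hash key.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy field: GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11b).
 * Assumption for illustration only: real GHASH works in GF(2^128). */

/* Carry-less multiply of two 8-bit values. The result has up to 15 bits
 * and is NOT reduced. */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; ++i)
        if ((b >> i) & 1)
            r ^= (uint16_t)((uint16_t)a << i);
    return r;
}

/* Reduce a value of up to 15 bits modulo 0x11b. */
static uint8_t reduce8(uint16_t v)
{
    for (int bit = 14; bit >= 8; --bit)
        if ((v >> bit) & 1)
            v ^= (uint16_t)(0x11b << (bit - 8));
    return (uint8_t)v;
}

/* Fill hpow[k] with h^(k+1) for k = 0 .. n-1. */
static void precompute_powers(uint8_t h, uint8_t *hpow, size_t n)
{
    uint8_t p = h;
    for (size_t k = 0; k < n; ++k) {
        hpow[k] = p;
        p = reduce8(clmul8(p, h));
    }
}

/* Horner form: one reduction per input block. */
static uint8_t hash_reduce_each(const uint8_t *x, size_t n, uint8_t h)
{
    uint8_t y = 0;
    for (size_t i = 0; i < n; ++i)
        y = reduce8(clmul8((uint8_t)(y ^ x[i]), h));
    return y;
}

/* Deferred form: XOR the unreduced products together, reduce once.
 * Block i is multiplied by h^(n-i), which is hpow[n-1-i]. */
static uint8_t hash_reduce_once(const uint8_t *x, size_t n, const uint8_t *hpow)
{
    uint16_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc ^= clmul8(x[i], hpow[n - 1 - i]);
    return reduce8(acc);
}
```

Both functions return the same value for any input. The deferred form needs one precomputed power of the key per block, so the precomputed table grows with the longest message it has to handle.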

AEAD throughput

OpenSSL suffers a ~30% performance penalty when encrypting 1440-byte AEAD blocks compared to encrypting 16KB blocks. With fusion, that overhead is reduced to ~10%. Fusion also succeeds in hiding the crypto overhead of header protection.
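As a rough illustration of why header protection can ride along for free, the sketch below counts how many slots of the last 6-block batch would sit unused for a given payload size. It assumes, for illustration only, that one record needs one AES block per 16 bytes of payload plus one block for the GCM tag mask; fusion's actual scheduling may count differently. The names `aead_aes_blocks` and `spare_slots_in_last_batch` are hypothetical.

```c
#include <stddef.h>

enum {
    AES_BLOCK_SIZE = 16, /* bytes per AES block */
    AES_BATCH = 6        /* blocks processed in parallel, per the description above */
};

/* AES block computations for one AEAD record under the stated assumption:
 * one counter-mode block per 16 bytes of payload (rounded up), plus one
 * block that masks the GCM tag. */
static size_t aead_aes_blocks(size_t payload_len)
{
    return (payload_len + AES_BLOCK_SIZE - 1) / AES_BLOCK_SIZE + 1;
}

/* Number of unused slots in the final batch of AES_BATCH blocks. */
static size_t spare_slots_in_last_batch(size_t payload_len)
{
    size_t rem = aead_aes_blocks(payload_len) % AES_BATCH;
    return rem == 0 ? 0 : AES_BATCH - rem;
}
```

Under these assumptions a 1440-byte payload needs 91 AES blocks, which leaves 5 spare slots in the last batch, more than enough for the single AES block that header protection requires.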

To reproduce the results, fusionbench.c can be used to measure the performance of fusion. The openssl speed command can be used to measure OpenSSL AEAD performance, but note that the undocumented -aead option has to be used to include the overhead of pre-processing and post-processing (e.g., IV setup, AAD processing, GCM finalization). "openssl 1.1.1g (aead)" shows the numbers with the -aead option set, while "openssl 1.1.1g (aesgcm-core)" shows the numbers without it.


  • add support for AES-256
  • auto-expand the pre-computed data
  • more tests, running OpenSSL in parallel

@kazuho kazuho marked this pull request as ready for review June 15, 2020 04:16
@kazuho kazuho merged commit b833001 into master Jun 17, 2020
fengjixuchui added a commit to fengjixuchui/picotls that referenced this pull request Jun 17, 2020
Merge pull request h2o#310 from h2o/kazuho/fusion
kazuho added a commit that referenced this pull request Feb 10, 2023
… detect supplied `key` being NULL, but how is that intended to work?

3 participants