Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Kyber PQ-Safe KEM #2872

Merged
merged 6 commits into from
Feb 10, 2022
Merged

Conversation

boricm
Copy link
Contributor

@boricm boricm commented Dec 22, 2021

Adding support for Kyber (post-quantum-safe KEM) as specified in Round 3 of the NIST post-quantum project. Specification and link to NIST submission package: https://pq-crystals.org/kyber/resources.shtml

Contributed by:
@reneme @hrantzsch @omarama @boricm

Edit (hrantzsch):

This addresses #2500

There are a few things we would still like to address:

  • cherry-pick fixes and adjustments, using Shake_128_Cipher as XOF and fixes to be compatible with changes on master
  • split 90s mode and normal mode into different modules to make better use of our build system
  • fix docstring format
  • cleanup comments and TODOs
  • copyright headers
  • various minor code-style things (e.g. remove #pragma once)
  • a test that uses Kyber through the PK_En/Decryptor_* interface
  • test uncovered paths where reasonable
  • apply astyle (or clang-format?)
  • ASN.1 encoding according to IETF draft

Furthermore, there are some things that are likely future work:

  • port the AVX optimization paths from the reference implementation
  • look into a generic XOF-interface for Botan
  • benchmark/profile the algorithm's performance, especially regarding the XOF-StreamCipher in sample_rej_uniform

@randombit
Copy link
Owner

This is great to see! I'll get a review in ASAP.

Until then, a topic for discussion -- do we really need the 90s mode? It seems like it adds a relatively large amount of complexity. Will anyone use it?

@mouse07410
Copy link
Contributor

I don't know.

-90s is faster, but my benchmarks of the "normal" code showed performance sufficient for my needs.

Still, if it's not too big a headache, why not support both...

@hrantzsch
Copy link
Collaborator

do we really need the 90s mode?

Hm, good question. I can't say that we need it specifically, but I think it's a great use case for Botan's module system. If someone does have a configuration where the more modern primitives are not available, we can support that very well. The differences between the modes are also pretty local so it shouldn't be too difficult to separate the shared code parts from the specific parts.

Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An initial review. Many comments here and there will be more later but overall a good direction.

Another question wrt 90s mode. Currently the code makes a distinction between 90s and modern at key generation time. Would it make more sense to have this be an attribute of the KEM? That is, KyberMode is just 512/768/1024, which encapsulates the algorithmic params like N, eta1, etc. And then there is some additional abstraction (KyberSymmetricPrimitives or something like that) that implements the hashes, PRFs, etc, and the KEM instantiates one implementation or another (90s/modern) based on the param strings which are currently unused.

I think this design would make it a little simpler to actually realize the promise of supporting Kyber in limited configurations as it would cleanly split out the AES/SHA-2 vs SHA_3/SHAKE code into two implementations of KyberSymmetricPrimitives and these could be implemented in submodules which in turn depend on the relevant primitives.

One concern here is that maybe the Kyber designers do not recommend mixing modes for any particular key. It's been a long time since I read the Kyber specs and I don't recall what if anything they have to say about this.

src/lib/pubkey/kyber/kyber.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/kyber/kyber.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/kyber/kyber.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/kyber/kyber.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/kyber/kyber.h Outdated Show resolved Hide resolved
src/lib/pubkey/kyber/kyber.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/kyber/kyber.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/kyber/kyber.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/kyber/kyber.cpp Outdated Show resolved Hide resolved
src/lib/pubkey/kyber/kyber.cpp Outdated Show resolved Hide resolved
src/tests/test_kyber.cpp Outdated Show resolved Hide resolved
src/tests/test_kyber.cpp Outdated Show resolved Hide resolved
src/tests/test_kyber.cpp Outdated Show resolved Hide resolved
src/tests/test_kyber.cpp Outdated Show resolved Hide resolved
src/tests/test_kyber.cpp Outdated Show resolved Hide resolved
@randombit
Copy link
Owner

Answering my own question

Another question wrt 90s mode. Currently the code makes a distinction between 90s and modern at key generation time. Would it make more sense to have this be an attribute of the KEM?

After reviewing the Kyber spec I remember why this is not possible, the A matrix is derived via the PRF and so every key is either modern or 90s, forever.

For this reason I think we should probably have two distinct OIDs, so the keys can be distinguished.

Or possibly 6 OIDs? One per size + 90s vs modern.

@mouse07410
Copy link
Contributor

IMHO, offhand, I'd prefer 6 OIDs.

…post-quantum project. Specification an link to NIST submission package: https://pq-crystals.org/kyber/resources.shtml

Co-authored-by: Manuel Glaser <Manuel.Glaser@rohde-schwarz.com>
Co-authored-by: René Meusel <rene.meusel@nexenio.com>
Co-authored-by: Hannes Rantzsch <hannes.rantzsch@nexenio.com>
@reneme
Copy link
Collaborator

reneme commented Jan 10, 2022

Regarding OIDs: We'll switch to the ones mentioned in this IETF draft standard. Also, this document specifies an ASN.1 encoding for Kyber keys. Probably we should support those as well.


#if !defined(BOTAN_HAS_KYBER_90S) && !defined(BOTAN_HAS_KYBER)
static_assert(false, "botan module 'kyber_common' is useful only when enabling modules 'kyber', 'kyber_90s' or both");
#endif
Copy link
Collaborator

@reneme reneme Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We opted to introduce three modules kyber (modern), kyber_90s and kyber_common. The latter contains the actual algorithm and supporting math code. The former two contain adapters to the required symmetric algorithms for XOF, KDF and more.

Hence, users enabling kyber_common exclusively at ./configure.py would end up with a dysfunctional kyber implementation. They'd need at least one of the symmetric algorithm adapters in the other mentioned modules. This static_assert is an attempt to warn them of their mistake at build time.

To the best of our knowledge, Botan's module system doesn't have a notion of an "internal module" for such use cases, right? Is that potentially something we'd want to introduce?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replying to @mouse07410 inline, to keep the context:

IMHO, it makes no sense to allow kyber_common for explicit configuration.
I'd have only kyber and kyber_90s as "configurable", each of which would force inclusion of kyber_common. Adding it explicitly without one of the other two makes no sense, disabling it when at least one of the others is enabled doesn't make sense either.

I agree with you! Though, given the inner workings of Botan's module system, we didn't figure out how to do that without introducing the kyber_common module. I.e. this solution is a technical crutch for the time being. Is there a better way, @randombit?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ironically there used to be a good way of doing this, a module could have an A or B dependency, which would work perfectly in this situation. It looks like I removed this in the otherwise unrelated commit ba8a26f - the functionality hadn't been used in some time at that point, and in that PR I was making other changes to dependency resolution to handle conditional dependencies based on OS features and apparently decided to remove | at the same time.

This current setup is certainly going to be confusing to end users. I'll see how hard it would be to add alternatives back, probably not bad. Then we would have kyber which would depend on kyber_modern | kyber_90s.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, if it's not too complicated to get alternatives back in. Would the module kyber default to the modern mode then? Or would users have to explicity enable kyber,kyber_modern?

Note that currently the kyber module is the modern mode and kyber_90s the 90s mode. So end users would need to explicitly only enable kyber_common to get a broken build. So naively using kyber would be fine. We hoped this would avoid confusion at least a bit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an alternative, we might take a look at creating "internal modules" (i.e. by adding some new keyword in info.txt) so that the current kyber_common module could be "required" by kyber and kyber_90s but wouldn't show up in the user-facing ./configure.py --list-modules.

@mouse07410
Copy link
Contributor

IMHO, it makes no sense to allow kyber_common for explicit configuration.
I'd have only kyber and kyber_90s as "configurable", each of which would force inclusion of kyber_common. Adding it explicitly without one of the other two makes no sense, disabling it when at least one of the others is enabled doesn't make sense either.

@codecov-commenter
Copy link

codecov-commenter commented Jan 11, 2022

Codecov Report

Merging #2872 (a5e3fdd) into master (e81aa17) will increase coverage by 0.08%.
The diff coverage is 98.71%.

❗ Current head a5e3fdd differs from pull request most recent head ded2a96. Consider uploading reports for the commit ded2a96 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2872      +/-   ##
==========================================
+ Coverage   92.31%   92.40%   +0.08%     
==========================================
  Files         567      569       +2     
  Lines       62846    63633     +787     
  Branches     6146     6228      +82     
==========================================
+ Hits        58019    58802     +783     
- Misses       4795     4799       +4     
  Partials       32       32              
Impacted Files Coverage Δ
src/tests/test_kyber.cpp 97.52% <97.52%> (ø)
src/lib/pubkey/kyber/kyber_common/kyber.cpp 98.92% <98.92%> (ø)
src/lib/asn1/oid_maps.cpp 100.00% <100.00%> (ø)
src/tests/test_rng.cpp 81.89% <0.00%> (-0.24%) ⬇️
src/tests/test_tls.cpp 92.46% <0.00%> (ø)
src/lib/tls/tls_text_policy.cpp 85.71% <0.00%> (ø)
src/lib/block/cast128/cast128.cpp 100.00% <0.00%> (ø)
src/lib/rng/system_rng/system_rng.cpp 75.47% <0.00%> (ø)
src/lib/pubkey/mce/gf2m_rootfind_dcmp.cpp 99.26% <0.00%> (+<0.01%) ⬆️
src/lib/math/bigint/big_ops2.cpp 95.03% <0.00%> (+0.03%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e81aa17...ded2a96. Read the comment docs.

@reneme
Copy link
Collaborator

reneme commented Jan 11, 2022

Regarding OIDs: We'll switch to the ones mentioned in this IETF draft standard. Also, this document specifies an ASN.1 encoding for Kyber keys. Probably we should support those as well.

@randombit Regarding ASN.1 encoding: Do you think it makes sense to support the reference implementation's encoding (simple byte concatenation of parameters) as well? If yes, what's the preferred way of selecting which encoding should be read/emitted?

Further: Should there be constructors for Kyber_PublicKey/Kyber_PrivateKey that take raw algorithm-specific parameters? This might come in handy for users that want to interface with implementations that provide some other unsupported encoding. Or could this be left as future improvement?

Edit: The (preliminary) ASN.1 spec defines a partial and full encoding of the private key. Same questions arise for that.

@omarama @boricm Do you have some specific requirements here?

@mouse07410
Copy link
Contributor

I have some reservations about ASN.1 encoding from that document. Let me double-check and get back here.

@boricm
Copy link
Contributor Author

boricm commented Jan 12, 2022

Regarding OIDs: We'll switch to the ones mentioned in this IETF draft standard. Also, this document specifies an ASN.1 encoding for Kyber keys. Probably we should support those as well.

@randombit Regarding ASN.1 encoding: Do you think it makes sense to support the reference implementation's encoding (simple byte concatenation of parameters) as well? If yes, what's the preferred way of selecting which encoding should be read/emitted?

Further: Should there be constructors for Kyber_PublicKey/Kyber_PrivateKey that take raw algorithm-specific parameters? This might come in handy for users that want to interface with implementations that provide some other unsupported encoding. Or could this be left as future improvement?

Edit: The (preliminary) ASN.1 spec defines a partial and full encoding of the private key. Same questions arise for that.

@omarama @boricm Do you have some specific requirements here?

Thanks for finding this document, @reneme! Back when we thought about the API and IODs, this document was not yet published. That's the only reason why we didn't use the encoding and OIDs specified there.
@omarama and me would like to support both - the reference implementations encoding as well as the ASN.1 encoding from this document.

@randombit
Copy link
Owner

Certainly would want to support any IETF-standardized ASN.1 encoding (even if suboptimal). No strong opinion on supporting the reference implementation encoding.

@randombit
Copy link
Owner

Re the ASN.1 I would much rather they have just said "everything is a single OCTET STRING which encapsulates whatever the reference implementation uses", rather than reflecting the algorithm specific details in the ASN.1 for no real reason. I'll send a comment, once I figure out what WG this is going through...

@reneme
Copy link
Collaborator

reneme commented Jan 13, 2022

So we decided to add a rudimentary support for the drafted ASN.1 format. Basically we support to read/write the "full" version of the encoding and depend on the (optional) embedding of the public key in the private key encoding.

While playing with the implementation, a few additional problems showed up. Leaving them here, in case you want to comment on the draft:

"full" and "partial" encodings aren't compatible

The full-encoding does not contain the seed required for the "partial" encoding. Hence, a private key decoded from "full" (or the reference implementation's "raw" encoding) can never be encoded as partial again.

"partial" does not contain the Z parameter

From our understanding, this parameter can be re-rolled by an RNG for every session. Though, this means that private keys decoded from "partial" and re-encoded to "full" will not be byte-by-byte compatible.

Encoded private key without embedded public key

The embedded public key is optional in the ASN.1 encodings of the private key. From our understanding, Kyber does not allow to recover the public key from the private key. Hence, if no public key was embedded, Botan cannot fulfil its polymorphy-based API: I.e. the resulting private key object would not be able to act as a public key.

reneme and others added 2 commits January 13, 2022 17:42
Co-Authored-By: Hannes Rantzsch <hannes.rantzsch@nexenio.com>
* Shake_128_Cipher as XOF
* Split Kyber "modern" and "90s" modes into botan modules
* copyright headers
* OIDs for different kyber modes
* Support ASN.1 Full encoding

Co-authored-by: Hannes Rantzsch <hannes.rantzsch@nexenio.com>
@hrantzsch hrantzsch changed the title WIP: Add support for Kyber PQ-Safe KEM Add support for Kyber PQ-Safe KEM Jan 13, 2022
Co-Authored-By: Hannes Rantzsch <hannes.rantzsch@nexenio.com>
@hrantzsch
Copy link
Collaborator

We're done with all the improvements and refactorings we had in mind and discussed above. Ready for another round of review :)

@boricm
Copy link
Contributor Author

boricm commented Jan 14, 2022

Hi @hrantzsch and @reneme!
@omarama and me noticed that you added a random = line to the test vectors. Where did you get this random from? Did you capture the output of the kyber-rng (which was only used for the KAT-tests)?
We like the approach with the unmodified test vectors, as you can simply add new vectors if the NIST provides them. What are your thoughts on this?

@reneme
Copy link
Collaborator

reneme commented Jan 16, 2022

Where did you get this random from? Did you capture the output of the kyber-rng (which was only used for the KAT-tests)?

Yes, exactly. It's the random string produced by the RNG that used to be implemented in the KAT tests. We opted for this "static" approach instead of shipping an entire RNG implementation. See also @randombit's comment here.

We like the approach with the unmodified test vectors, as you can simply add new vectors if the NIST provides them. What are your thoughts on this?

If NIST provides more test vectors, we can do the same approach and hard-code the random outputs, no? Its a bit of extra work up-front but saves the additional homebrew RNG in the code base.

@mouse07410
Copy link
Contributor

With the misguided API that NIST enforced, I think keeping a homebrew NIST-specified (P)RNG in the code is the only way to deal with KAT. 😟

src/lib/utils/compiler.h Outdated Show resolved Hide resolved
Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One side channel issue that should be resolved before merge, but besides that this is looking really good.

I have not had time to work on the modules issue where we would like have the Kyber module require that at least one implementation be available at build time. I don't think that should hold up this merging.

Since the Kyber specification is still somewhat subject to change I would be against backporting this to the 2.x branch at least until NIST announces something (which I have heard should be by end of March).

src/lib/pubkey/kyber/kyber_common/kyber.cpp Outdated Show resolved Hide resolved
Co-authored-by: René Meusel <rene.meusel@nexenio.com>
@randombit randombit merged commit 61bde0a into randombit:master Feb 10, 2022
@randombit
Copy link
Owner

Merged! @boricm @hrantzsch @reneme @omarama thanks for all the work put into this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants