Add support for Kyber PQ-Safe KEM #2872

boricm · 2021-12-22T14:26:25Z

Adding support for Kyber (post-quantum-safe KEM) as specified in Round 3 of the NIST post-quantum project. Specification and link to NIST submission package: https://pq-crystals.org/kyber/resources.shtml

Contributed by:
@reneme @hrantzsch @omarama @boricm

Edit (hrantzsch):

This addresses #2500

There are a few things we would still like to address:

Furthermore, there are some things that are likely future work:

port the AVX optimization paths from the reference implementation
look into a generic XOF-interface for Botan
benchmark/profile the algorithm's performance, especially regarding the XOF-StreamCipher in sample_rej_uniform

randombit · 2021-12-22T21:18:24Z

This is great to see! I'll get a review in ASAP.

Until then, a topic for discussion -- do we really need the 90s mode? It seems like it adds a relatively large amount of complexity. Will anyone use it?

mouse07410 · 2021-12-22T21:50:05Z

I don't know.

-90s is faster, but my benchmarks of the "normal" code showed performance sufficient for my needs.

Still, if it's not too big a headache, why not support both...

hrantzsch · 2021-12-23T07:56:49Z

do we really need the 90s mode?

Hm, good question. I can't say that we need it specifically, but I think it's a great use case for Botan's module system. If someone does have a configuration where the more modern primitives are not available, we can support that very well. The differences between the modes are also pretty local so it shouldn't be too difficult to separate the shared code parts from the specific parts.

randombit

An initial review. Many comments here and there will be more later but overall a good direction.

Another question wrt 90s mode. Currently the code makes a distinction between 90s and modern at key generation time. Would it make more sense to have this be an attribute of the KEM? That is, KyberMode is just 512/768/1024, which encapsulates the algorithmic params like N, eta1, etc. And then there is some additional abstraction (KyberSymmetricPrimitives or something like that) that implements the hashes, PRFs, etc, and the KEM instantiates one implementation or another (90s/modern) based on the param strings which are currently unused.

I think this design would make it a little simpler to actually realize the promise of supporting Kyber in limited configurations as it would cleanly split out the AES/SHA-2 vs SHA_3/SHAKE code into two implementations of KyberSymmetricPrimitives and these could be implemented in submodules which in turn depend on the relevant primitives.

One concern here is that maybe the Kyber designers do not recommend mixing modes for any particular key. It's been a long time since I read the Kyber specs and I don't recall what if anything they have to say about this.

src/lib/pubkey/kyber/kyber.cpp

src/lib/pubkey/kyber/kyber.h

src/lib/pubkey/kyber/kyber.cpp

src/tests/test_kyber.cpp

randombit · 2021-12-31T18:39:01Z

Answering my own question

Another question wrt 90s mode. Currently the code makes a distinction between 90s and modern at key generation time. Would it make more sense to have this be an attribute of the KEM?

After reviewing the Kyber spec I remember why this is not possible, the A matrix is derived via the PRF and so every key is either modern or 90s, forever.

For this reason I think we should probably have two distinct OIDs, so the keys can be distinguished.

Or possibly 6 OIDs? One per size + 90s vs modern.

mouse07410 · 2021-12-31T20:20:18Z

IMHO, offhand, I'd prefer 6 OIDs.

…post-quantum project. Specification an link to NIST submission package: https://pq-crystals.org/kyber/resources.shtml Co-authored-by: Manuel Glaser <Manuel.Glaser@rohde-schwarz.com> Co-authored-by: René Meusel <rene.meusel@nexenio.com> Co-authored-by: Hannes Rantzsch <hannes.rantzsch@nexenio.com>

reneme · 2022-01-10T16:26:34Z

Regarding OIDs: We'll switch to the ones mentioned in this IETF draft standard. Also, this document specifies an ASN.1 encoding for Kyber keys. Probably we should support those as well.

src/lib/pubkey/kyber/kyber_common/kyber.cpp

reneme · 2022-01-11T11:11:13Z

src/lib/pubkey/kyber/kyber_common/kyber.h

+
+#if !defined(BOTAN_HAS_KYBER_90S) && !defined(BOTAN_HAS_KYBER)
+   static_assert(false, "botan module 'kyber_common' is useful only when enabling modules 'kyber', 'kyber_90s' or both");
+#endif


We opted to introduce three modules kyber (modern), kyber_90s and kyber_common. The latter contains the actual algorithm and supporting math code. The former two contain adapters to the required symmetric algorithms for XOF, KDF and more.

Hence, users enabling kyber_common exclusively at ./configure.py would end up with a dysfunctional kyber implementation. They'd need at least one of the symmetric algorithm adapters in the other mentioned modules. This static_assert is an attempt to warn them of their mistake at build time.

To the best of our knowledge, Botan's module system doesn't have a notion of an "internal module" for such use cases, right? Is that potentially something we'd want to introduce?

Replying to @mouse07410 inline, to keep the context:

IMHO, it makes no sense to allow kyber_common for explicit configuration.
I'd have only kyber and kyber_90s as "configurable", each of which would force inclusion of kyber_common. Adding it explicitly without one of the other two makes no sense, disabling it when at least one of the others is enabled doesn't make sense either.

I agree with you! Though, given the inner workings of Botan's module system, we didn't figure out how to do that without introducing the kyber_common module. I.e. this solution is a technical crutch for the time being. Is there a better way, @randombit?

Ironically there used to be a good way of doing this, a module could have an A or B dependency, which would work perfectly in this situation. It looks like I removed this in the otherwise unrelated commit ba8a26f - the functionality hadn't been used in some time at that point, and in that PR I was making other changes to dependency resolution to handle conditional dependencies based on OS features and apparently decided to remove | at the same time.

This current setup is certainly going to be confusing to end users. I'll see how hard it would be to add alternatives back, probably not bad. Then we would have kyber which would depend on kyber_modern | kyber_90s.

Sounds good, if it's not too complicated to get alternatives back in. Would the module kyber default to the modern mode then? Or would users have to explicity enable kyber,kyber_modern?

Note that currently the kyber module is the modern mode and kyber_90s the 90s mode. So end users would need to explicitly only enable kyber_common to get a broken build. So naively using kyber would be fine. We hoped this would avoid confusion at least a bit.

As an alternative, we might take a look at creating "internal modules" (i.e. by adding some new keyword in info.txt) so that the current kyber_common module could be "required" by kyber and kyber_90s but wouldn't show up in the user-facing ./configure.py --list-modules.

mouse07410 · 2022-01-11T11:20:14Z

IMHO, it makes no sense to allow kyber_common for explicit configuration.
I'd have only kyber and kyber_90s as "configurable", each of which would force inclusion of kyber_common. Adding it explicitly without one of the other two makes no sense, disabling it when at least one of the others is enabled doesn't make sense either.

codecov-commenter · 2022-01-11T11:26:43Z

Codecov Report

Merging #2872 (a5e3fdd) into master (e81aa17) will increase coverage by 0.08%.
The diff coverage is 98.71%.

❗ Current head a5e3fdd differs from pull request most recent head ded2a96. Consider uploading reports for the commit ded2a96 to get more accurate results

@@            Coverage Diff             @@
##           master    #2872      +/-   ##
==========================================
+ Coverage   92.31%   92.40%   +0.08%     
==========================================
  Files         567      569       +2     
  Lines       62846    63633     +787     
  Branches     6146     6228      +82     
==========================================
+ Hits        58019    58802     +783     
- Misses       4795     4799       +4     
  Partials       32       32

Impacted Files	Coverage Δ
src/tests/test_kyber.cpp	`97.52% <97.52%> (ø)`
src/lib/pubkey/kyber/kyber_common/kyber.cpp	`98.92% <98.92%> (ø)`
src/lib/asn1/oid_maps.cpp	`100.00% <100.00%> (ø)`
src/tests/test_rng.cpp	`81.89% <0.00%> (-0.24%)`	⬇️
src/tests/test_tls.cpp	`92.46% <0.00%> (ø)`
src/lib/tls/tls_text_policy.cpp	`85.71% <0.00%> (ø)`
src/lib/block/cast128/cast128.cpp	`100.00% <0.00%> (ø)`
src/lib/rng/system_rng/system_rng.cpp	`75.47% <0.00%> (ø)`
src/lib/pubkey/mce/gf2m_rootfind_dcmp.cpp	`99.26% <0.00%> (+<0.01%)`	⬆️
src/lib/math/bigint/big_ops2.cpp	`95.03% <0.00%> (+0.03%)`	⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e81aa17...ded2a96. Read the comment docs.

reneme · 2022-01-11T16:53:02Z

Regarding OIDs: We'll switch to the ones mentioned in this IETF draft standard. Also, this document specifies an ASN.1 encoding for Kyber keys. Probably we should support those as well.

@randombit Regarding ASN.1 encoding: Do you think it makes sense to support the reference implementation's encoding (simple byte concatenation of parameters) as well? If yes, what's the preferred way of selecting which encoding should be read/emitted?

Further: Should there be constructors for Kyber_PublicKey/Kyber_PrivateKey that take raw algorithm-specific parameters? This might come in handy for users that want to interface with implementations that provide some other unsupported encoding. Or could this be left as future improvement?

Edit: The (preliminary) ASN.1 spec defines a partial and full encoding of the private key. Same questions arise for that.

@omarama @boricm Do you have some specific requirements here?

mouse07410 · 2022-01-12T02:23:22Z

I have some reservations about ASN.1 encoding from that document. Let me double-check and get back here.

boricm · 2022-01-12T11:25:27Z

Regarding OIDs: We'll switch to the ones mentioned in this IETF draft standard. Also, this document specifies an ASN.1 encoding for Kyber keys. Probably we should support those as well.

@randombit Regarding ASN.1 encoding: Do you think it makes sense to support the reference implementation's encoding (simple byte concatenation of parameters) as well? If yes, what's the preferred way of selecting which encoding should be read/emitted?

Further: Should there be constructors for Kyber_PublicKey/Kyber_PrivateKey that take raw algorithm-specific parameters? This might come in handy for users that want to interface with implementations that provide some other unsupported encoding. Or could this be left as future improvement?

Edit: The (preliminary) ASN.1 spec defines a partial and full encoding of the private key. Same questions arise for that.

@omarama @boricm Do you have some specific requirements here?

Thanks for finding this document, @reneme! Back when we thought about the API and IODs, this document was not yet published. That's the only reason why we didn't use the encoding and OIDs specified there.
@omarama and me would like to support both - the reference implementations encoding as well as the ASN.1 encoding from this document.

randombit · 2022-01-12T13:56:50Z

Certainly would want to support any IETF-standardized ASN.1 encoding (even if suboptimal). No strong opinion on supporting the reference implementation encoding.

randombit · 2022-01-12T13:59:30Z

Re the ASN.1 I would much rather they have just said "everything is a single OCTET STRING which encapsulates whatever the reference implementation uses", rather than reflecting the algorithm specific details in the ASN.1 for no real reason. I'll send a comment, once I figure out what WG this is going through...

reneme · 2022-01-13T15:55:21Z

So we decided to add a rudimentary support for the drafted ASN.1 format. Basically we support to read/write the "full" version of the encoding and depend on the (optional) embedding of the public key in the private key encoding.

While playing with the implementation, a few additional problems showed up. Leaving them here, in case you want to comment on the draft:

"full" and "partial" encodings aren't compatible

The full-encoding does not contain the seed required for the "partial" encoding. Hence, a private key decoded from "full" (or the reference implementation's "raw" encoding) can never be encoded as partial again.

"partial" does not contain the Z parameter

From our understanding, this parameter can be re-rolled by an RNG for every session. Though, this means that private keys decoded from "partial" and re-encoded to "full" will not be byte-by-byte compatible.

Encoded private key without embedded public key

The embedded public key is optional in the ASN.1 encodings of the private key. From our understanding, Kyber does not allow to recover the public key from the private key. Hence, if no public key was embedded, Botan cannot fulfil its polymorphy-based API: I.e. the resulting private key object would not be able to act as a public key.

Co-Authored-By: Hannes Rantzsch <hannes.rantzsch@nexenio.com>

* Shake_128_Cipher as XOF * Split Kyber "modern" and "90s" modes into botan modules * copyright headers * OIDs for different kyber modes * Support ASN.1 Full encoding Co-authored-by: Hannes Rantzsch <hannes.rantzsch@nexenio.com>

Co-Authored-By: Hannes Rantzsch <hannes.rantzsch@nexenio.com>

hrantzsch · 2022-01-13T17:06:20Z

We're done with all the improvements and refactorings we had in mind and discussed above. Ready for another round of review :)

boricm · 2022-01-14T14:06:49Z

Hi @hrantzsch and @reneme!
@omarama and me noticed that you added a random = line to the test vectors. Where did you get this random from? Did you capture the output of the kyber-rng (which was only used for the KAT-tests)?
We like the approach with the unmodified test vectors, as you can simply add new vectors if the NIST provides them. What are your thoughts on this?

reneme · 2022-01-16T11:35:40Z

Where did you get this random from? Did you capture the output of the kyber-rng (which was only used for the KAT-tests)?

Yes, exactly. It's the random string produced by the RNG that used to be implemented in the KAT tests. We opted for this "static" approach instead of shipping an entire RNG implementation. See also @randombit's comment here.

We like the approach with the unmodified test vectors, as you can simply add new vectors if the NIST provides them. What are your thoughts on this?

If NIST provides more test vectors, we can do the same approach and hard-code the random outputs, no? Its a bit of extra work up-front but saves the additional homebrew RNG in the code base.

mouse07410 · 2022-01-27T10:20:53Z

With the misguided API that NIST enforced, I think keeping a homebrew NIST-specified (P)RNG in the code is the only way to deal with KAT. 😟

src/lib/utils/compiler.h

randombit

One side channel issue that should be resolved before merge, but besides that this is looking really good.

I have not had time to work on the modules issue where we would like have the Kyber module require that at least one implementation be available at build time. I don't think that should hold up this merging.

Since the Kyber specification is still somewhat subject to change I would be against backporting this to the 2.x branch at least until NIST announces something (which I have heard should be by end of March).

src/lib/pubkey/kyber/kyber_common/kyber.cpp

Co-authored-by: René Meusel <rene.meusel@nexenio.com>

randombit · 2022-02-10T12:27:36Z

Merged! @boricm @hrantzsch @reneme @omarama thanks for all the work put into this.

hrantzsch mentioned this pull request Dec 22, 2021

WIP: Implement Kyber on v2 Release neXenio/kbls-botan#2

Closed

28 tasks