-
-
Notifications
You must be signed in to change notification settings - Fork 11k
design: Fixed size large numbers #28522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Call-out to those who expressed an early interest: |
|
ARGH! I really, really dislike our forced mixed header style. |
|
So the idea is to create a new type and deprecated the old type? |
|
The idea is to create a new type, and replace our internal uses. Deprecation is not part of this effort. |
|
We could eventually deprecate all the math functions using the BIGNUM type, but not the type itself or some of the functions that we do not plan to replace with OSSL_FN counterparts. I mean in some more distant future we could do it as well but it is not pressing and intended to be done as part of this effort. |
|
I've added this note in the design document to clarify the current intent: Initial implementation is predicted to be internal. However, the intent is |
fdce6e9 to
197aaee
Compare
rsith71
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions I have that will help me understand things better.
paulidale
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of discussion about constant size which is great. I'm not seeing mention of making things constant time as well. Constant time is a lot more than just constant size.
I also feel that we should be aiming for a rewrite rather than trying to reuse BN code. By this I mean that the low level calls that take a BN_LONG pointer and a size are okay but things that deal with BN aren't.
Nonetheless, this is a great step forward.
|
|
||
| struct ossl_fn_st { | ||
| /* Flag: alloced with OSSL_FN_new() or OSSL_FN_secure_new() */ | ||
| unsigned int is_dynamically_allocated : 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think it was possible to have a statically allocated struct with a flexible array at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's perfectly possible to cast an address to such a struct.
unsigned char *memory = 0xdeadbeef + 1; /* +1 will get us proper alignment */
struct ossl_fn_st *x = (struct ossl_fn_st *)memory;If you wonder when I'd ever do that, have a look at the OSSL_FN_CTX idea further down, or think about how an arena memory allocator usually works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, this requires that whatever memory address that we cast this way is in our control. That should go without saying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pointing away from using a flexible array and having a pointer to the limbs. That has some advantages:
- the FN_CTX doesn't need to concern itself with structure sizes
- no need for offsetof
- sizeof() returns the propery correct length
- interchange with BN becomes easier (just point at the limbs, possibly adjusting sizes)
- no casts required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because OSSL_FN is made to have a given size when allocated, it's rather pointless to make the array of limbs a separate allocation. This design's intent is that the basic OSSL_FN allocation will be like this (modulo checks):
size_t ossl_fn_sizeof(size_t bits)
{
size_t bytes = (bits + 7) / 8;
size_t limbbytes = sizeof(((OSSL_FN *)NULL)->d[0]); /* yes, this is legitimate */
size_t limbs = (bytes + limbbytes - 1) / limbbytes;
return sizeof(OSSL_FN) + limbs * limbbytes;
}
OSSL_FN *OSSL_FN_new(size_t bits)
{
OSSL_FN *fn = OPENSSL_zalloc(ossl_fn_sizeof(bits));
fn->is_dynamically_allocated = 1;
return fn;
}It is, of course, possible to have BN_ULONG *d in struct ossl_fn_ctx, and just add fn->d = (BN_ULONG *)(fn + 1); in OSSL_FN_new()... but this may encourage allocating the struct and the BN_ULONG array separately, and I'd rather avoid that.
Going back to FN_CTX, not sure what you expect of it... I'm making it an arena allocation construct quite specifically because it allows very easy free of all temporary material => just free the arena and be done with it. I don't think that repurposing BN_CTX as it currently is would be much of a win.
| large an arena should be to accommodate the needs of a large tree of | ||
| function calls using it. A possible solution is to allocate a very large | ||
| arena (could 32kB be considered enough?), but it may require some | ||
| investigation to find out what's reasonable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could instrument the current BN_CTX code to record actual usage and use that as a guide for FN_CTX sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to be cautious and allow dynamic CTXs with a size hint at the beginning. Who knows when an extra intermediate needs to be introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather go the pre-determined size route, i.e. the one allocating an OSSL_FN_CTX must allocate enough space. Better a little too big than the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, we could also instrument OSSL_FN_CTX to determine optimal sizes. Should I include that in this design?
As far as I've followed, many of them, and especially the assembler routines, have already been worked through to offer constant time calculations. This is the assumption we've worked from so far, and is the main reason why this design is at a more zoomed out level and focuses on the constant size aspect. |
|
Thinking about all the review comments I've seen so far, I realise that this design document could be more precise with regards to its intended scope. To be fixed in an upcoming fixup commit |
| used in a fairly limited fashion. Furthermore, the `BN_CTX` internals allow | ||
| for a maximum of 16 `BIGNUM`s. A corresponding arena could the reasonably | ||
| have the size 16 \* *size of largest fixed number* plus a little extra for | ||
| bookkeeping purposes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if we run out of space in the arena what happens? Do we just fail, or do we allocate an additional chunk of memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO: we shouldn't fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of a situation where the need for additional allocation would leak some secret information that the calculation itself wouldn't have leaked before...
so, yes, I don't it should fail if we run out of pre-allocated values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that for a known mathematical formula being computed, it should be possible to determine the total size of OSSL_FN_CTX space that's going to be used for that formula.
Of course, an OSSL_FN_CTX could be defined as including an array OSSL_FN numbers[16]; (just like the BN pool that BN_CTX uses), and let the allocation or reallocation ('cause the size of one already allocated might not fit its next use) of a number slot be done dynamically with OSSL_FN_CTX_get().
But, quite frankly, if this is what we're doing, we might as well just repurpose BN_CTX, 'cause the bookkeeping is likely to become just as complicated...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(should we repurpose BN_CTX, we could make it a bit less complicated... a quick look at the code for that tells me that there's room for improvement, and I think that can be done without breaking anything)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a new level of complexity beyond a pre-determined size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the CTX as we allocate it could be informed about the size of the modulus that will be used for the calculations...
Wouldn't it have to have pretty intimate knowledge of the set of calculations it's going to be thrown into for this to work? That makes it pretty specialised
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will need a lot of knowledge about what's going to happen. Nothing that it couldn't know since it's being allocated by the provider that's doing the work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so I have no such plans. The code that allocates an OSSL_FN_CTX will simply have to know, or allocate a plausibly large enough arena.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it should be possible to add some instrumentation to figure out a good enough size. I did mention that as a comment to your instrumentation comment, @paulidale, here
| pay attention to possible `struct` padding. Among other methods, one chosen | ||
| here is to precede the flexible array member with a member whose atomic type | ||
| is assumed to be large enough that no padding is needed after it, such as | ||
| `size_t` or a pointer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, the natural assumption is that struct members are naturally aligned, with some added padding fields, if needed. I wonder, however, if there's use for getting the alignment up to 16 bytes (for AVX-256) or 64-bytes (because cache lines).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we determine the need for such alignments in a build independent way? I'm thinking for example that OpenSSL is packaged in a number of fairly generic binary packages, so whatever we choose to do will have to work on any machine where those binaries are run.
If aligning for specific cache lines, for example, requires configuring for a specific machine, then the more generic binaries will not always be able to align for those.
Of course, the possibility of advanced code gymnastics to get alignment exactly right dynamically can be done... but yeah, I wonder how far it's worth to take that sort of thing.
|
I would like to look at this design from both constant-time and performance perspective. The design @levitte proposed is fine and matches traditional OpenSSL design from the universal approach. Shouldn't we, however, instead of the universal approach use specialized types limited by maximum size specific for particular algorithms? Looks like AWSlabs use this approach (https://github.com/awslabs/s2n-bignum/blob/main/include/s2n-bignum.h) and it brings great performance gain, AFAIK |
I see that idea as an orthogonal approach. I see the Fixed num approach that is being proposed here as a replacement for where we currently use BIGNUM in existing algorithms (RSA, DSA, DH etc). Trying to retro-fit specialised types into that existing code would be a much larger project IMO. I don't object to having specialised types for new code (or eventually using them for the existing algorithms). But I see that on a whole different timeline. |
To add to that, the existing cryptosystems using BIGNUMs are not Quantum-Safe. That by itself will likely make them obsolete in about 10 years. It would not be a good spending of resources to try to perfect their implementation with a huge effort only to completely abandon them a few years later. Please note we are NOT proposing to use the FIXNUM implementation for implementation of new PQ algorithms. This and the need to keep existing BIGNUM based interfaces to work (regardless of performance) are the main driving decisions of this design. |
|
I don't particularly like the fact that close compatibility with BIGNUM is a goal... Consider the situation with a multiply function: it needs three operands: two inputs and one output. For side-channel free operation the output must be of pre-allocated size: len(x) + len(y); calling that function with any other output is an invalid operation. Even more complicated for a multiply modulo (a combined multiplication and reduction modulo); what to do if one of the operands is larger than the modulus? For a general purpose numerical library it needs to be defined; for cryptograhic usage that situation basically should never happen (and is an indication that something terribly wrong has happened way before that). |
Then we need to drop all the get0 getters returning BIGNUMs in 4.0. |
that stuff is basically always used to import or export numbers to some formats that leak the bit size of the individual RSA parameters (PKCS#1 or XML), so keeping them side-channel free is not possible anyway if there are counter examples, I'd love to hear them, but I think it's better to just mark the API as not side-channel free and ignore that part... |
Or wrap the underlying implementation using code that does something rather than some pointer twiddles. |
slontis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As part of the design, would it be possible to step back up a level away from the implementation and discuss
- What problems are we trying to solve?
- What are the possible options that were considered?. What are the pros and cons of different options?
If this work is targeted for 4.0, could we do breaking changes to BIGNUM without the need of wrappers for example?
I imagine for some (public) operations that we don't want fixed size.
Also should there even be a negative flag (most operations are modulus based)?
No, you cannot do that without severally changing the semantics of the get0 getters. |
couldn't you fix it by having "shadow" copies of the parameters in the struct, that are not used for calculation but for input-output only? |
that's not covered by the Goals section?
good point, it would be good to include that in the document
Do you have some examples in mind? |
yes, but for BIGNUM API, AFAIU it, the output size would just got re-allocated transparently. While this is a small change, it has a lot of impact on the calling code, which goes back to my worry about BIGNUM and OSSL_FN being "mostly" compatible. Yes, using the same limb construction is a good idea, and I don't see any issues with that, it's anything more that could be problematic...
well, while it could be part of the CTX used for the calculations, even if we consider just the RSA cryptosystem, sometimes we operate with numbers that are the size of the modulus, sometimes twice the modulus, sometimes the size of individual primes, sometimes twice as big as a prime, and sometimes as small as single limb (the public exponent). Yes, it's not a lot of different sizes, but it's not one specific size for all operations, even with a particular set of parameters (be it RSA, ECDSA, DH, ECDH...)
only if you include specific implementation into it. Some algorithms can be implemented in CPU efficient manner, then they need more memory, or memory efficient manner, then they will be slower.... so it's an amount that may change in the future |
|
said @tomato42:
Yeah, I hear you. Thing is, I'm viewing it all as BIGNUM being quite, how we say, external, in so far that as soon as the OSSL_FN API, all further calls from therein are entirely in the OSSL_FN "bubble" if you will, where there are no BIGNUMs in sight. So any size fitting should happen before the dive into the OSSL_FN bubble, and in the end, all user-facing BN function should simply wrap OSSL_FN. I'm quite confident this is achievable. Dunno if that helped you out of what it is you fear, though.
So hmmm, it sounds you're making this much more complex than I anticipated, or believe is necessary. With this simple pseudo-code: I fully expect that in an OSSL_FN world, the sizes of But, as you mention, there are intermediary numbers that may have other sizes. I fully expect those to be handled inside
Well... yes and no, I think? Doesn't my line of thought above, where a caller only needs to know what the sizes of the input operands and the resulting output should be, make it more independent of specific implementations? After all, for a well known formula like x = yz mod n, an implementation might want to calculate yz in an intermediary Am I making sense? All that being said, though,
Yes, and adaptations will have to be made |
well, yes. but what I mean, is that that if your goal is to start plugging in OSSL_FN functions and types from bottom up, then yes, it's not an issue
actually no, the intermediate OSSL_FN allocations will need to be the size of about like two to four times len(n) to keep the intermediate steps of the "square and multiply always" or Montgommery ladder... but yes, we can have the overall |
For all intent and purposes, what you said there aligns with my thoughts behind this design, at least the way I hear you.
Yeah, ok, so what I hear is a known (maximum) size for an intermediary or several. Sounds acceptable, and exactly what I was looking for. Thank you, this has highlighted a few areas that I need to clarify |
|
I've pushed a fixup that I hope answers many of the questions asked so far, as well as to clarify the scope of this design. |
68d38b1 to
2de120f
Compare
|
2de120f looks good to me |
t8m
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @levitte. Just two minor comments.
beldmit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@openssl/committers please do not merge. This should go into a new feature/ossl_fn feature branch which needs to be created first. |
For the longest time, we have mitigated security issues related to large numbers (BIGNUM) and constant time in a piece-meal fashion, without really looking at the problem from a zoomed out, holistic perspective. An interesting aspect in this problem is that large numbers can vary in size, and that depending on their combined sizes, the time to perform mathematical calculations with them vary equally much, and may thereby unintentionally leak information on those numbers. To mitigate that sort of timing issue, we introduce fixed size numbers, which are designed to have payload sizes that are pre-determined, usually by the crypto system that uses them. This means that even a very small number (let's take 1 as a ridiculous example) would have the same size payload as a much larger number, and calculations using them would perform across all payload bits of all input numbers combined. These fixed size numbers primarly differ from BIGNUMs in that once they have been allocated to a certain size, that size will not change throughout its lifetime.
2bf542f to
37dfbed
Compare
|
I squashed all the commits together into one (all but the first were fixup commits anyway) |
|
24 hours has passed since 'approval: done' was set, but as this PR has been updated in that time the label 'approval: ready to merge' is not being automatically set. Please review the updates and set the label manually. |
|
Merged to the feature branch. Thank you. |
For the longest time, we have mitigated security issues related to large numbers (BIGNUM) and constant time in a piece-meal fashion, without really looking at the problem from a zoomed out, holistic perspective. An interesting aspect in this problem is that large numbers can vary in size, and that depending on their combined sizes, the time to perform mathematical calculations with them vary equally much, and may thereby unintentionally leak information on those numbers. To mitigate that sort of timing issue, we introduce fixed size numbers, which are designed to have payload sizes that are pre-determined, usually by the crypto system that uses them. This means that even a very small number (let's take 1 as a ridiculous example) would have the same size payload as a much larger number, and calculations using them would perform across all payload bits of all input numbers combined. These fixed size numbers primarly differ from BIGNUMs in that once they have been allocated to a certain size, that size will not change throughout its lifetime. Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #28522)
For the longest time, we have mitigated security issues related to large numbers (BIGNUM) and constant time in a piece-meal fashion, without really looking at the problem from a zoomed out, holistic perspective. An interesting aspect in this problem is that large numbers can vary in size, and that depending on their combined sizes, the time to perform mathematical calculations with them vary equally much, and may thereby unintentionally leak information on those numbers. To mitigate that sort of timing issue, we introduce fixed size numbers, which are designed to have payload sizes that are pre-determined, usually by the crypto system that uses them. This means that even a very small number (let's take 1 as a ridiculous example) would have the same size payload as a much larger number, and calculations using them would perform across all payload bits of all input numbers combined. These fixed size numbers primarly differ from BIGNUMs in that once they have been allocated to a certain size, that size will not change throughout its lifetime. Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl#28522)
For the longest time, we have mitigated security issues related to large numbers (BIGNUM) and constant time in a piece-meal fashion, without really looking at the problem from a zoomed out, holistic perspective. An interesting aspect in this problem is that large numbers can vary in size, and that depending on their combined sizes, the time to perform mathematical calculations with them vary equally much, and may thereby unintentionally leak information on those numbers. To mitigate that sort of timing issue, we introduce fixed size numbers, which are designed to have payload sizes that are pre-determined, usually by the crypto system that uses them. This means that even a very small number (let's take 1 as a ridiculous example) would have the same size payload as a much larger number, and calculations using them would perform across all payload bits of all input numbers combined. These fixed size numbers primarly differ from BIGNUMs in that once they have been allocated to a certain size, that size will not change throughout its lifetime. Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl#28522)
For the longest time, we have mitigated security issues related to large numbers (BIGNUM) and constant time in a piece-meal fashion, without really looking at the problem from a zoomed out, holistic perspective. An interesting aspect in this problem is that large numbers can vary in size, and that depending on their combined sizes, the time to perform mathematical calculations with them vary equally much, and may thereby unintentionally leak information on those numbers. To mitigate that sort of timing issue, we introduce fixed size numbers, which are designed to have payload sizes that are pre-determined, usually by the crypto system that uses them. This means that even a very small number (let's take 1 as a ridiculous example) would have the same size payload as a much larger number, and calculations using them would perform across all payload bits of all input numbers combined. These fixed size numbers primarly differ from BIGNUMs in that once they have been allocated to a certain size, that size will not change throughout its lifetime. Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from openssl#28522)
For the longest time, we have mitigated security issues related to large
numbers (
BIGNUM) and constant time in a piece-meal fashion, without reallylooking at the problem from a zoomed out, holistic perspective.
An interesting aspect in this problem is that large numbers can vary in
size, and that depending on their combined sizes, the time to perform
mathematical calculations with them vary equally much, and may thereby
unintentionally leak information on those numbers.
To mitigate that sort of timing issue, we introduce fixed size numbers,
which are designed to have payload sizes that are pre-determined, usually by
the crypto system that uses them. This means that even a very small number
(let's take 1 as a ridiculous example) would have the same size payload as a
much larger number, and calculations using them would perform across all
payload bits of all input numbers combined.
These fixed size numbers primarly differ from BIGNUMs in that once they have
been allocated to a certain size, that size will not change throughout its
lifetime.