New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement equal_const_time/2 as nif #2749
Conversation
- add C implementation for equal_const_binary/2 (delegates to CRYPTO_memcmp) - modify crypto.erl to delegate to nif implementation when arguments are binaries, otherwise go with the existing implementation for lists. - add minimal tests.
if (!enif_inspect_binary(env, argv[1], &s2)) | ||
goto bad_arg; | ||
|
||
if (s1.size != s2.size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where the erlang list implementation and the nif implementation differ currently (i.e., the length of the strings isn't checked explicitly in the erlang implementation and probably for good reason), but the behavior remains the same for both.
equal_const_time() -> | ||
[{doc, "Test equal_const_time"}]. | ||
equal_const_time(Config) when is_list(Config) -> | ||
true = crypto:equal_const_time(<<"">>, <<"">>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do better tests surely, but will wait for approval.
Example erlang implementation provided should that be preferred over the NIF or along side it on
The implementation is based off the implementation in Plug.Crypto by @whatyouhide |
Thank you @starbelly for working on this! FWIW, I think there is an argument to make this EDIT: however there is an argument that the NIF should delegate to the OpenSSL implementation of said functionality. If that's the case, then it obviously belongs in crypto. :) |
I understands the benefits of a dependency-free NIF, but such code would have to be carefully maintained. OpenSSL and other libraries that implement such a function often implement this function in assembly, to avoid getting bitten by compiler optimisations, and fallback implementations in C have to resort to features (such as EDIT: response written before the edit of the previous comment :) |
@josevalim @voltone To be clear I'd like to have both 😁 I don't see any reason why there can't be so long as the documentation is clear. |
Co-authored-by: José Valim <jose.valim@gmail.com>
First of all, thanks for engaging in speeding up the "constant"-time check, a contribution is valuable and wanted! However, the CRYPTO_memcmp is not documented in other OpenSSL versions than 1.1.1, not even the 3.0 that is soon in beta state. LibreSSL documents it. The function is available in all versions we have to support, but relying on a not documented function is not good. The CRYPTO_memcmp is a very simple implementation looping over the two strings OR-ing the comparison - no fancy stuff and I can't see any assembler support, but that might of course come later. Having two functions in two modules both with the same functionality is something I dislike and can't see the benefit of, so I would prefer having it in just one module. If you don't have a cryptolib like OpenSSL you don't have the crypto erlang module either. |
In theory, yes. In practice, you have to be wary of compiler optimisations that short-circuit the comparison loop, removing the constant-time property and therefore the justification for having such a function in the first place. That's why some crypto libraries implement this in assembly: not for additional speed, but to more reliably rule out such compiler optimisations. Please have a look at the OpenSSL discussion @starbelly mentioned. That should convince you this is a non-trivial function to get right and to maintain. The fact that CRYPTO_memcmp is not public is indeed a problem, I don't really know how to get past that... |
@HansN Let me offer two options per what you stated :
With either option we would delegate to If one of the options listed above is desired I will gladly revise this PR or open a new one. |
@starbelly If you go down the pure erlang path, it is best to not rely on a length check being equal, to ensure the comparison is constant time (bounded based on the input size):
|
Especially since it would not be too hard to add a test that checks if the comparison is constant time e.g. if two extreme cases are compared (equal vs. first byte different on a long binary) |
Arguably those may not actually be constant time since you'll still bail out early if the first list is shorter: by gradually increasing the length of the second string, you can guess the size of the first one; by reducing the length of the first one you can guess the size of the second one as well. Your algorithm needs to do both types of comparisons, something like the following (for binaries): verify_in_constant_time(X, Y) ->
verify_in_constant_time(X, Y, 0).
verify_in_constant_time(<<X, RestX/binary>>, <<Y, RestY/binary>>, Result) ->
verify_in_constant_time(RestX, RestY, (X bxor Y) bor Result);
verify_in_constant_time(<<X, RestX/binary>>, <<>>, Result) ->
verify_in_constant_time(RestX, <<>>, (X bxor -1) bor Result);
verify_in_constant_time(<<>>, <<Y, RestY/binary>>, Result) ->
verify_in_constant_time(<<>>, RestY, (Y bxor -1) bor Result);
verify_in_constant_time(<<>>, <<>>, Result) ->
Result == 0. Regardless, the type of operations vary (comparing immediate values like This is not a huge issue because size is generally not considered a problem since these comparison functions should generally be used on hashed on encrypted values that are expected to have a fixed size; the size being known does not leak significant amounts of information in such cases. The one weird case where you end up caring about length being information is on more oddball cases like doing comparisons of cleartext passwords where the differences between the submitted and expected string suddenly becomes significant. Those should usually not be used either though. |
Yeah, I originally had it written without a length check for that exact reason, but consulting with some, people tended to prefer checking length with the argument being the bad actor would most likely known the length of the input. IMO, we should not check the length though. EDIT: I'd add that a bad actor knowing the length of an input your checking against is not always the case. EDIT: The alternative is to check the length AFTER you cycle through the input your checking against to avoid blunder. |
@ferd The first list is always the external data input. With the external data input controlling the constant time it isn't possible to look at the timing results to determine the correct size for further brute-forcing. It makes sense that it shouldn't be a huge issue in practice if the data is always using the same hash, but if other hashes are used then you would care more. With data likely needing to change the hash algorithm gradually as it ages, it is good to avoid leaking the length. |
@HansN I have pushed up an implementation of |
|
||
size1 = binary_size(bin1); | ||
|
||
if (size1 != binary_size(bin2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has been discussion around whether to check the length or not. I have also suggested checking the length after the loop. The rationale behind checking the length at that point is a) we avoid leaking length information b). it acts as a safe guard in case a user has mixed up the argument order.
|
||
T2 = erlang:convert_time_unit(erlang:monotonic_time(), native, microsecond), | ||
|
||
%% Given a 10 megabyte binary we can say with a fair amount of certainty that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test and comment might need some more thought. On an intel i9 it takes between 12 to 14 milliseconds to complete the op, so I think we can say even if you have a processor that is twice as fast it will still take 6 to 8. I would think that a bif call would never take over 1ms, but we could increase the comparison to => 3
if we are concerned about that. Note that I tested the sad path by putting a break after the first iteration in the loop. Mostly saw < 1ms, a few times saw 1ms.
goto bad_arg; | ||
|
||
ERTS_GET_BINARY_BYTES(bin1, bytes1, bitoffs1, bitsize1); | ||
ERTS_GET_BINARY_BYTES(bin2, bytes2, bitoffs2, bitsize2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This macro will do a realloc of the binary if it is an un-aligned sub binary, which means that it will take time depending on the size of the binary. Come to think about it, so will enif_inspect_binary.
Unaligned subbinaries is what Bin
is if you do this:
<<_:1, Bin:1024/binary, _/bitstring>> = large_binary().
Not sure what do want to do when you get such a binary, either badarg or do some other trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@garazdawi is the concern around performance or just general oddities that could happen as a result of unaligned subbinaries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concern is that the realloc will take time proportional to the size of the binary. So if you want to have a constant time compare, you should think about this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Let me do some spelunking.
@starbelly wrote:
I think it is hard to motivate a new BIF (alt 1) for this very specialized purpose. I personally is happy with:
|
@HansN wrote:
Do you mean not doing a BIF for binary (i.e., erl_bif_binary.c) ? I didn't see any NIFs for binary... Can you clarify?
Doing some tests in erlang showed that odds of deviation was much greater than compared to the C implementation, in particularly with unaligned sub binaries (differences >= 10 microseconds). With C, even with unaligned sub-binaries, the results were more consistent and of course faster (less chance of deviation, but generally only by a microsecond when it happens). Thus, C is the best way to go IMHO.
At this point I definitely agree and I think this should pull request should be renamed, etc. and what happens with the undocumented equal_const_time/2 function should happen in another PR or just handled by OTP team. It's only used by ssh (lib/ssh/src/ssh_auth.erl and lib/ssh/src/ssh_transport.erl) to note. Would you prefer a new PR or this one to be renamed? I think a new PR might be cleaner since the branch name would reflect the intent and can close but reference this one in the PR for the history. |
I agree that a new PR is preferred. |
Closing in favor of #2778 |
This PR implements equal_const_time/2 as a nif via delegation to CRYPTO_memcmp and intends to expose this function as part of the public api via documentation. If this PR is approved I will gladly document the function as part of the public API.
Discussions around this PR were had with @josevalim , @voltone , and @peerst of the erlef security working group.
One particular topic that I feel is worth discussing and possibly adjusting this PR for is around having either a pure erlang implementation of a constant time comparisson function on
binary
assecure_compare/2
or as BIF on binary of the same name and arity. The idea for having this in addition tocrypto:equal_const_time/2
is prompted by the case where one does not have crypto (openssl) available. This option may even be more advantageous with BeamAsm on the horizon, such that the nif implementation is not desired at all. However, no benchmarks were done using BeasmAsm against the NIF or BIF implementations.I have implemented the BIF but did not include it in this PR initially but am happy to do so either in this PR or a subsequent PR. Likewise, if a BIF is undesirable I'm also happy to add an erlang implementation on
binary
either as part of this PR or a subsequent PR.It should be noted that the BIF implementation was found to be twice as fast as the NIF implementation at a cursory glance, however, there is concern around maintaining the implementation, specifically around keeping up with compilers not respecting the
volatile
qualifier (see openssl/openssl#102 for more details around this subject).Benchmarks for the NIF implementation
Benchmarks were performed with erlperf and as with all benchmarks take the ones below with a grain of salt, but they give a good idea of the improvements.For brevity I did not include all my runs which measured function calls with more samples and in parallel (i.e., new nif impl vs old erlang impl). I'm happy to provide these if desired, but the quick benchmarks below show the same findings.
Findings for binaries
The performance improvement was found to be significant.
Original erl implementation:
New NIF implementation:
Findings for strings
A performance decrease was observed for small strings, only at 64 byte strings was performance roughly the same. As such
equal_const_time/2 as part of this PR will only delegate to the nif for binaries.
Original erl implementation:
New NIF Implementation