Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement equal_const_time/2 as nif #2749

Closed
wants to merge 3 commits into from

Conversation

starbelly
Copy link
Contributor

@starbelly starbelly commented Sep 15, 2020

This PR implements equal_const_time/2 as a nif via delegation to CRYPTO_memcmp and intends to expose this function as part of the public api via documentation. If this PR is approved I will gladly document the function as part of the public API.

Discussions around this PR were had with @josevalim , @voltone , and @peerst of the erlef security working group.

One particular topic that I feel is worth discussing and possibly adjusting this PR for is around having either a pure erlang implementation of a constant time comparisson function on binary as secure_compare/2 or as BIF on binary of the same name and arity. The idea for having this in addition to crypto:equal_const_time/2 is prompted by the case where one does not have crypto (openssl) available. This option may even be more advantageous with BeamAsm on the horizon, such that the nif implementation is not desired at all. However, no benchmarks were done using BeasmAsm against the NIF or BIF implementations.

I have implemented the BIF but did not include it in this PR initially but am happy to do so either in this PR or a subsequent PR. Likewise, if a BIF is undesirable I'm also happy to add an erlang implementation on binary either as part of this PR or a subsequent PR.

It should be noted that the BIF implementation was found to be twice as fast as the NIF implementation at a cursory glance, however, there is concern around maintaining the implementation, specifically around keeping up with compilers not respecting thevolatile qualifier (see openssl/openssl#102 for more details around this subject).

Benchmarks for the NIF implementation

Benchmarks were performed with erlperf and as with all benchmarks take the ones below with a grain of salt, but they give a good idea of the improvements.For brevity I did not include all my runs which measured function calls with more samples and in parallel (i.e., new nif impl vs old erlang impl). I'm happy to provide these if desired, but the quick benchmarks below show the same findings.

Findings for binaries

The performance improvement was found to be significant.

Original erl implementation:

$ erlperf 'crypto:old_equal_const_time(<<"onexxxxx">>, <<"twoxxxxx">>).'
Code                                                               ||        QPS     Rel
crypto:old_equal_const_time(<<"onexxxxx">>, <<"twoxxxxx">>).        1    1522 Ki    100%

New NIF implementation:

$erlperf 'crypto:equal_const_time(<<"onexxxxx">>, <<"twoxxxxx">>).'
Code                                                             ||        QPS     Rel
crypto:equal_const_time(<<"onexxxxx">>, <<"twoxxxxx">>).          1    5909 Ki    100%

Findings for strings

A performance decrease was observed for small strings, only at 64 byte strings was performance roughly the same. As such
equal_const_time/2 as part of this PR will only delegate to the nif for binaries.

Original erl implementation:

$ erlperf 'crypto:old_equal_const_time("onexxxxx", "twoxxxxx").'
Code                                                         ||        QPS     Rel
crypto:old_equal_const_time("onexxxxx", "twoxxxxx").          1    5267 Ki    100%

New NIF Implementation

$ erlperf 'crypto:equal_const_time("onexxxxx", "twoxxxxx").'
Code                                                     ||        QPS     Rel
crypto:equal_const_time("onexxxxx", "twoxxxxx").          1    2644 Ki    100%

- add C implementation for equal_const_binary/2 (delegates to CRYPTO_memcmp)

- modify crypto.erl to delegate to nif implementation when arguments
   are binaries, otherwise go with the existing implementation for lists.

- add minimal tests.
if (!enif_inspect_binary(env, argv[1], &s2))
goto bad_arg;

if (s1.size != s2.size)
Copy link
Contributor Author

@starbelly starbelly Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the erlang list implementation and the nif implementation differ currently (i.e., the length of the strings isn't checked explicitly in the erlang implementation and probably for good reason), but the behavior remains the same for both.

equal_const_time() ->
[{doc, "Test equal_const_time"}].
equal_const_time(Config) when is_list(Config) ->
true = crypto:equal_const_time(<<"">>, <<"">>),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do better tests surely, but will wait for approval.

@starbelly
Copy link
Contributor Author

starbelly commented Sep 15, 2020

Example erlang implementation provided should that be preferred over the NIF or along side it on binary as secure_compare/2 :

secure_compare(L, R) when byte_size(L) == byte_size(R) ->
   secure_compare(L, R, 0);
secure_compare(_L, _R) -> false.
    
secure_compare(<<X1/integer,L1/binary>>, <<Y1/integer,R1/binary>>, Acc) ->
    Xorred = X1 bxor Y1,
    secure_compare(L1, R1, Acc bor Xorred);
secure_compare(<<>>, <<>>, Acc) ->
    Acc =:= 0.

The implementation is based off the implementation in Plug.Crypto by @whatyouhide

@josevalim
Copy link
Contributor

josevalim commented Sep 15, 2020

Thank you @starbelly for working on this! FWIW, I think there is an argument to make this binary:constant_compare/2 even if the implementation is a NIF. There are other native functions in the binary module and making this available without a need to add a dep on crypto can be a nice positive.

EDIT: however there is an argument that the NIF should delegate to the OpenSSL implementation of said functionality. If that's the case, then it obviously belongs in crypto. :)

@voltone
Copy link
Contributor

voltone commented Sep 15, 2020

FWIW, I think there is an argument to make this binary:constant_compare/2 even if the implementation is a NIF.

I understands the benefits of a dependency-free NIF, but such code would have to be carefully maintained. OpenSSL and other libraries that implement such a function often implement this function in assembly, to avoid getting bitten by compiler optimisations, and fallback implementations in C have to resort to features (such as volatile variables) that are not consistently implemented across compilers. There is also an argument to be made to delegate this to proven implementations, is what I'm saying.

EDIT: response written before the edit of the previous comment :)

@HansN HansN self-assigned this Sep 15, 2020
@HansN HansN added the team:PS Assigned to OTP team PS label Sep 15, 2020
@starbelly
Copy link
Contributor Author

@josevalim @voltone To be clear I'd like to have both 😁 I don't see any reason why there can't be so long as the documentation is clear.

Co-authored-by: José Valim <jose.valim@gmail.com>
@HansN
Copy link
Contributor

HansN commented Sep 16, 2020

First of all, thanks for engaging in speeding up the "constant"-time check, a contribution is valuable and wanted!

However, the CRYPTO_memcmp is not documented in other OpenSSL versions than 1.1.1, not even the 3.0 that is soon in beta state. LibreSSL documents it. The function is available in all versions we have to support, but relying on a not documented function is not good. The CRYPTO_memcmp is a very simple implementation looping over the two strings OR-ing the comparison - no fancy stuff and I can't see any assembler support, but that might of course come later.

Having two functions in two modules both with the same functionality is something I dislike and can't see the benefit of, so I would prefer having it in just one module. If you don't have a cryptolib like OpenSSL you don't have the crypto erlang module either.

@voltone
Copy link
Contributor

voltone commented Sep 16, 2020

The CRYPTO_memcmp is a very simple implementation looping over the two strings OR-ing the comparison - no fancy stuff and I can't see any assembler support, but that might of course come later.

In theory, yes. In practice, you have to be wary of compiler optimisations that short-circuit the comparison loop, removing the constant-time property and therefore the justification for having such a function in the first place. That's why some crypto libraries implement this in assembly: not for additional speed, but to more reliably rule out such compiler optimisations.

Please have a look at the OpenSSL discussion @starbelly mentioned. That should convince you this is a non-trivial function to get right and to maintain.

The fact that CRYPTO_memcmp is not public is indeed a problem, I don't really know how to get past that...

@starbelly
Copy link
Contributor Author

@HansN Let me offer two options per what you stated :

  1. Implement this function as a BIF (binary:secure_compare/2) Yes, there can be pitfalls with compilers per the OpenSSL discussion, but I also think these are old issues (AFAIK most volatile issues fixed in all major compilers around 2008). New compiler bugs (optimizations) could come up, but so could a lot of things. Thus, I think it's manageable, and aside from the volatile issue, as you stated it is a trivial function.

  2. Implement this function as pure erlang (binary:secure_compare/2)

With either option we would delegate to binary:secure_compare/2 within equal_const_time/2. Whether or not to even document equal_const_time/2 with the addition of binary:secure_compare/2 I'm not sure, I suppose that depends on how common and valuable the case for strings with such a function is.

If one of the options listed above is desired I will gladly revise this PR or open a new one.

@okeuday
Copy link
Contributor

okeuday commented Sep 16, 2020

@starbelly If you go down the pure erlang path, it is best to not rely on a length check being equal, to ensure the comparison is constant time (bounded based on the input size):

https://github.com/CloudI/CloudI/blob/c2ff9b3fc236d2c1ca639eb3cb3280a0dc240fab/src/lib/cloudi_core/src/cloudi_string.erl#L283-L293

compare_constant(Test, [_ | _] = Correct) ->
    compare_constant(Test, Correct, 0) =:= 0.

compare_constant([], [], Bits) ->
    Bits;
compare_constant([], [_ | _], _) ->
    1;
compare_constant([C | Test], [] = Correct, Bits) ->
    compare_constant(Test, Correct, Bits bor (C bxor -1));
compare_constant([C1 | Test], [C2 | Correct], Bits) ->
    compare_constant(Test, Correct, Bits bor (C1 bxor C2)).

@peerst
Copy link
Contributor

peerst commented Sep 16, 2020

  1. Implement this function as a BIF (binary:secure_compare/2) Yes, there can be pitfalls with compilers per the OpenSSL discussion, but I also think these are old issues (AFAIK most volatile issues fixed in all major compilers around 2008). New compiler bugs (optimizations) could come up, but so could a lot of things. Thus, I think it's manageable, and aside from the volatile issue, as you stated it is a trivial function.

Especially since it would not be too hard to add a test that checks if the comparison is constant time e.g. if two extreme cases are compared (equal vs. first byte different on a long binary)

@ferd
Copy link
Contributor

ferd commented Sep 16, 2020

@okeuday

If you go down the pure erlang path, it is best to not rely on a length check being equal, to ensure the comparison is constant time (bounded based on the input size):

Arguably those may not actually be constant time since you'll still bail out early if the first list is shorter: by gradually increasing the length of the second string, you can guess the size of the first one; by reducing the length of the first one you can guess the size of the second one as well. Your algorithm needs to do both types of comparisons, something like the following (for binaries):

verify_in_constant_time(X, Y) ->
    verify_in_constant_time(X, Y, 0).

verify_in_constant_time(<<X, RestX/binary>>, <<Y, RestY/binary>>, Result) ->
    verify_in_constant_time(RestX, RestY, (X bxor Y) bor Result);
verify_in_constant_time(<<X, RestX/binary>>, <<>>, Result) ->
    verify_in_constant_time(RestX, <<>>, (X bxor -1) bor Result);
verify_in_constant_time(<<>>, <<Y, RestY/binary>>, Result) ->
    verify_in_constant_time(<<>>, RestY, (Y bxor -1) bor Result);
verify_in_constant_time(<<>>, <<>>, Result) ->
    Result == 0.

Regardless, the type of operations vary (comparing immediate values like -1 instead of accessing them through a list iteration) and someone with knowledge of the algorithm should in theory be able to access it, and you'd need to demonstrate that recent compiler optimizations won't interfere with that scheme.

This is not a huge issue because size is generally not considered a problem since these comparison functions should generally be used on hashed on encrypted values that are expected to have a fixed size; the size being known does not leak significant amounts of information in such cases.

The one weird case where you end up caring about length being information is on more oddball cases like doing comparisons of cleartext passwords where the differences between the submitted and expected string suddenly becomes significant. Those should usually not be used either though.

@starbelly
Copy link
Contributor Author

starbelly commented Sep 16, 2020

@starbelly If you go down the pure erlang path, it is best to not rely on a length check being equal, to ensure the comparison is constant time (bounded based on the input size):

Yeah, I originally had it written without a length check for that exact reason, but consulting with some, people tended to prefer checking length with the argument being the bad actor would most likely known the length of the input. IMO, we should not check the length though.

EDIT: I'd add that a bad actor knowing the length of an input your checking against is not always the case.

EDIT: @okeuday @ferd 😁

EDIT: The alternative is to check the length AFTER you cycle through the input your checking against to avoid blunder.

@okeuday
Copy link
Contributor

okeuday commented Sep 16, 2020

@ferd The first list is always the external data input. With the external data input controlling the constant time it isn't possible to look at the timing results to determine the correct size for further brute-forcing.

It makes sense that it shouldn't be a huge issue in practice if the data is always using the same hash, but if other hashes are used then you would care more. With data likely needing to change the hash algorithm gradually as it ages, it is good to avoid leaking the length.

@starbelly
Copy link
Contributor Author

@HansN I have pushed up an implementation of binary:secure_compare/2 as bif as an alternative. If accepted, I would document this function and possibly delegate to this bif from crypto in the case of binaries for equal_const_time/2. I hope this finds you will.


size1 = binary_size(bin1);

if (size1 != binary_size(bin2))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There has been discussion around whether to check the length or not. I have also suggested checking the length after the loop. The rationale behind checking the length at that point is a) we avoid leaking length information b). it acts as a safe guard in case a user has mixed up the argument order.


T2 = erlang:convert_time_unit(erlang:monotonic_time(), native, microsecond),

%% Given a 10 megabyte binary we can say with a fair amount of certainty that
Copy link
Contributor Author

@starbelly starbelly Sep 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test and comment might need some more thought. On an intel i9 it takes between 12 to 14 milliseconds to complete the op, so I think we can say even if you have a processor that is twice as fast it will still take 6 to 8. I would think that a bif call would never take over 1ms, but we could increase the comparison to => 3 if we are concerned about that. Note that I tested the sad path by putting a break after the first iteration in the loop. Mostly saw < 1ms, a few times saw 1ms.

goto bad_arg;

ERTS_GET_BINARY_BYTES(bin1, bytes1, bitoffs1, bitsize1);
ERTS_GET_BINARY_BYTES(bin2, bytes2, bitoffs2, bitsize2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This macro will do a realloc of the binary if it is an un-aligned sub binary, which means that it will take time depending on the size of the binary. Come to think about it, so will enif_inspect_binary.

Unaligned subbinaries is what Bin is if you do this:

<<_:1, Bin:1024/binary, _/bitstring>> = large_binary().

Not sure what do want to do when you get such a binary, either badarg or do some other trick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@garazdawi is the concern around performance or just general oddities that could happen as a result of unaligned subbinaries?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concern is that the realloc will take time proportional to the size of the binary. So if you want to have a constant time compare, you should think about this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Let me do some spelunking.

@HansN
Copy link
Contributor

HansN commented Sep 24, 2020

@starbelly wrote:

@HansN Let me offer two options per what you stated :

1. Implement this function as a BIF (`binary:secure_compare/2`) Yes, there can be pitfalls with compilers per the OpenSSL discussion, but I also think these are old issues (AFAIK most volatile issues fixed in all major compilers around 2008). New compiler bugs (optimizations) could come up, but so could a lot of things. Thus, I think it's manageable, and aside from the volatile issue, as you stated it is a trivial function.

2. Implement this function as pure erlang (`binary:secure_compare/2`)

With either option we would delegate to binary:secure_compare/2 within equal_const_time/2. Whether or not to even document equal_const_time/2 with the addition of binary:secure_compare/2 I'm not sure, I suppose that depends on how common and valuable the case for strings with such a function is.

If one of the options listed above is desired I will gladly revise this PR or open a new one.

I think it is hard to motivate a new BIF (alt 1) for this very specialized purpose. I personally is happy with:

  • a new function in binary and not in crypto since there seem to be a need for it even if no cryptolib (and as a consequence no crypto application) is available
  • either in Erlnag or in C is good for me
  • I see no need for a function crypto:equal_const_time/2 delegating to the new one except for compatibility reasons. So we probably need one for people outside OTP that uses it (if any). It could be implemented, deprecated in OTP-24.0, with removal in OTP-25.0

@starbelly
Copy link
Contributor Author

@HansN wrote:

I think it is hard to motivate a new BIF (alt 1) for this very specialized purpose. I personally is happy with:

Do you mean not doing a BIF for binary (i.e., erl_bif_binary.c) ? I didn't see any NIFs for binary... Can you clarify?

* a new function in binary and not in crypto since there seem to be a need for it even if no cryptolib (and as a consequence no crypto application) is available

* either in Erlnag or in C is good for me

Doing some tests in erlang showed that odds of deviation was much greater than compared to the C implementation, in particularly with unaligned sub binaries (differences >= 10 microseconds). With C, even with unaligned sub-binaries, the results were more consistent and of course faster (less chance of deviation, but generally only by a microsecond when it happens). Thus, C is the best way to go IMHO.

* I see no need for a function  crypto:equal_const_time/2 delegating to the new one except for compatibility reasons.  So we probably need one for people outside OTP that uses it (if any).  It could be implemented, deprecated in OTP-24.0, with removal in OTP-25.0

At this point I definitely agree and I think this should pull request should be renamed, etc. and what happens with the undocumented equal_const_time/2 function should happen in another PR or just handled by OTP team. It's only used by ssh (lib/ssh/src/ssh_auth.erl and lib/ssh/src/ssh_transport.erl) to note.

Would you prefer a new PR or this one to be renamed? I think a new PR might be cleaner since the branch name would reflect the intent and can close but reference this one in the PR for the history.

@HansN
Copy link
Contributor

HansN commented Sep 28, 2020

@starbelly

Would you prefer a new PR or this one to be renamed? I think a new PR might be cleaner since the branch name would reflect the intent and can close but reference this one in the PR for the history.

I agree that a new PR is preferred.

@starbelly
Copy link
Contributor Author

Closing in favor of #2778

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:PS Assigned to OTP team PS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants