Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math/bits: guarantee Add, Sub, Mul, RotateLeft, ReverseBytes to be constant-time operations #31267

Open
TuomLarsen opened this issue Apr 5, 2019 · 37 comments

Comments

Projects
None yet
@TuomLarsen
Copy link

commented Apr 5, 2019

I'm a heavy non-crypto user of bits.* and I'm afraid changes like #31229 will effect my performance. One way of keeping crypto people happy and users like me would be to have separate functions for both non-constant time and constant time addition, multiplication, ...

In the issue above I was reminded that the change affects only the fallbacks. I'm afraid promising that the fallbacks are constant time will later lead to constant time native implementations which could very well be slower than their non-constant time alternatives. For example, platforms which don't support carry flag are RISC-V and WebAssembly and it could happened that "a branch works out better" [1].

[1] https://gmplib.org/manual/Assembly-Carry-Propagation.html

@gopherbot gopherbot added this to the Proposal milestone Apr 5, 2019

@gopherbot gopherbot added the Proposal label Apr 5, 2019

@TuomLarsen TuomLarsen changed the title proposal: bits.ContantTimeAdd, bits.ContantTimeMul proposal: bits.ConstantTimeAdd, bits.ConstantTimeMul Apr 5, 2019

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

RISC-V can generate the carry in constant time using a SLTU instruction.
Web assembly has a similar instruction. I don't know whether it guarantees constant time or not.

@smasher164

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

The change for #31229 results in a neglible (if any) impact on performance. Also keep in mind that the software fallback would rarely be executed. Here is the output for benchstat:

name             old time/op  new time/op  delta
Add-4            1.44ns ± 0%  1.63ns ± 0%   ~     (p=1.000 n=1+1)
Add32-4          1.08ns ± 0%  1.05ns ± 0%   ~     (p=1.000 n=1+1)
Add64-4          1.46ns ± 0%  1.43ns ± 0%   ~     (p=1.000 n=1+1)
Add64multiple-4  0.73ns ± 0%  0.99ns ± 0%   ~     (p=1.000 n=1+1)
Sub-4            1.42ns ± 0%  1.42ns ± 0%   ~     (all equal)
Sub32-4          1.08ns ± 0%  1.47ns ± 0%   ~     (p=1.000 n=1+1)
Sub64-4          1.42ns ± 0%  1.47ns ± 0%   ~     (p=1.000 n=1+1)
Sub64multiple-4  0.72ns ± 0%  0.72ns ± 0%   ~     (p=1.000 n=1+1)
@bmkessler

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

@smasher164 FYI, that benchmark looks to be testing the intrinsic path and not the pure Go fallback. When you test the pure Go version, you should run multiple counts to compare significant changes.

If carries are unpredictable (adding random numbers), then the constant time (branchless) code will likely be faster than the prior implementation because the carry branches will be poorly predicted.

@smasher164

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

name             old time/op  new time/op  delta
Add-4            1.16ns ±11%  1.51ns ± 5%  +30.52%  (p=0.008 n=5+5)
Add32-4          1.08ns ± 0%  1.03ns ± 1%   -4.86%  (p=0.029 n=4+4)
Add64-4          1.09ns ± 1%  1.95ns ± 3%  +79.23%  (p=0.008 n=5+5)
Add64multiple-4  4.03ns ± 1%  4.55ns ±11%  +13.07%  (p=0.008 n=5+5)
Sub-4            1.08ns ± 1%  1.50ns ± 0%  +38.17%  (p=0.016 n=5+4)
Sub32-4          1.09ns ± 2%  1.53ns ±10%  +40.26%  (p=0.008 n=5+5)
Sub64-4          1.10ns ± 1%  1.47ns ± 1%  +33.39%  (p=0.008 n=5+5)
Sub64multiple-4  4.30ns ± 2%  4.08ns ± 4%   -5.07%  (p=0.032 n=5+5)

It seems that Add64 is the most significantly impacted benchmark, while the others have a ~30% slowdown.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 8, 2019

I prefer making the default secure, and if necessary offer a VariableTimeAdd which communicates the risk. This also has the advantage of letting us cross that bridge when and if we come to it. (That is, if it ever turns out a native implementation would actually be faster if not constant time.)

@TuomLarsen

This comment has been minimized.

Copy link
Author

commented Apr 8, 2019

If Add64 is implemented as ADC on platforms which have such instruction, there is nothing "variable" in it. I think "constant" communicates better that it is always a constant time operation, no matter if ADC or a fallback is used.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 8, 2019

For someone doing a code review, it's reasonable to expect bits.Add to be constant time, and it requires prior knowledge to know that no, you should use bits.ConstantTimeAdd instead. Making bits.Add the constant time one solves this problem. No one will be upset to find out that in fact VariableTimeAdd is constant time sometimes, the other way around is not true.

When there are a safe and an unsafe API, the unsafe one must be the qualified one.

@TuomLarsen

This comment has been minimized.

Copy link
Author

commented Apr 8, 2019

It is only true for crypto, constant time has no use for every other domain. By that perspective it's reasonable to expect that bits.Add adds two numbers, constant time or not.

@smasher164

This comment has been minimized.

Copy link
Contributor

commented Apr 8, 2019

The choice of preserving semantics and timing by default should depend on the chance that the fallback is executed. For example, the FMA intrinsic proposed in #25819 has a decent chance of hitting the fallback on x86, arm, mips, and mips64. Additionally, the cost of using the FMA software fallback is orders of magnitude slower than a simple multiply and add. On the other hand, the chance of the bits.Add fallback being executed is much lower, given the add-with-carry instructions on most architectures. Additionally, the slowdown introduced by a constant time fallback is in the worst-case a factor of two.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 9, 2019

I find it appropriate that the FMA functions live in math, while Add, Sub and Mul live in math/bits.

That allows us to build the expectation that math/bits is constant time unless the name is screaming otherwise, which is not true of math, or math/big.

I am firmly convinced that even if safety is only relevant to one use case, the default should be safe, and an unsafe version can be added with a clear name. So if anyone thinks it's already time to add VariableTimeAdd, VariableTimeSub and VariableTimeMul, we can make this proposal about that. Otherwise, I think we should close this for now.

@jimmyfrasche

This comment has been minimized.

Copy link
Member

commented Apr 9, 2019

@FiloSottile operations in math/bits are not documented as being constant time so I did not have that expectation, though it makes sense now that you say it. Should it be mentioned in the package documentation?

@TuomLarsen

This comment has been minimized.

Copy link
Author

commented Apr 9, 2019

@smasher164 The chance of hitting a fallback might change over time, as hardware advances. I'm thinking about FMA the other way around: the compiler should do with a*b+c whatever it pleases but when explicitly asking for FMA it should do just one rounding. Meaning Add should do whatever is needed to make it fast, and ConstantTimeAdd would guarantee what it says.

@FiloSottile Obviously we don't agree on this one but I find you want to close it way too early, given how controversial the issue is (based on number of up- and down-votes). I would also find interesting what other core devs think about this.

Regarding of whether to call something "constant time" or "variable time". I think the most common use case should take precedence in naming to the special case. For example, a "hash function" is usually called just that as evidenced on Wikipedia, whereas to signify the importance of other properties, one usually calls it "cryptographic hash function". Meaning Add should do whatever compiler pleases to make it performant and only "crypto" Add should guarantee constant time operation.

So far we only discussed Add and Mul but there is also Div. At least on z86 DIVQ instruction has variable latency so I would assume it always needs to call a fallback, in order to preserve constant time performance.

One of the goals of #21835 @robpike stated is to "make math/bits support its operations more efficiently through the compiler". This will make it less performant, not more. In the same thread, @josharian implemented PCG using math/bits and there is no need for constant time Add as the generator is not intended for crypto in the first place.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 9, 2019

The chance of hitting a fallback might change over time, as hardware advances.

That chance is only going to go down. Once the intrinsic is implemented, it isn't going to be reverted (unless an instruction is removed from an instruction set, which never happens).

I would also find interesting what other core devs think about this.

I see this as pretty much a non-issue. Practically speaking, no one is using fallbacks (and if they are, it isn't hard to contribute intrinsics). Processors today need to provide constant-time implementations of these operations for crypto libraries anyway, we might as well use them. And if you look at, say, the assembly code for math/big, all that assembly uses those constant-time implementations - there isn't a faster non-constant-time version that math/big feels like it needs to use.

Sure, theoretically, a new faster non-constant-time multiply might show up in x86 next year, or some new popular processor architecture might show up that only has non-constant-time ops. But both of those seem to me like really unlikely events that we shouldn't design our systems around.

So far we only discussed Add and Mul but there is also Div. At least on z86 DIVQ instruction has variable latency so I would assume it always needs to call a fallback, in order to preserve constant time performance.

I agree that div is a special case, and we probably don't want to (can't?) guarantee constant time.

@TuomLarsen

This comment has been minimized.

Copy link
Author

commented Apr 9, 2019

Or, let the constant time operations live in crypto. There is already hash/fnv and there is crypto/sha1, math/rand and crypto/rand. And of course crypto/subtle, with ConstantTimeCompare, ConstantTimeCopy, ConstantTimeLessOrEq, ... There could also very well be math/bits and crypto/bits as they similarly serve different purposes. Or they could share the space in subtle with other ContantTime* functions.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Apr 10, 2019

"Security" is not a good enough reason to penalize real Go code with an 80% slowdown
(as in the Add64 in the CL).

That said, the intention from the start was that you'd get intrinsics in any real Go code.
Any time the fallback is too slow the answer is certainly "get intrinsics added on that architecture."
Slowing down the fallback by 80% to match the constant-time semantics of the intrinsics
is OK. But should we define that the intrinsics are constant-time?

The open question then is whether defining these to be constant time
would fundamentally slow down any intrinsic implementation (perhaps one
yet to be written). It doesn't seem like it should.
Can someone confirm that intrinsics would not be affected at all
(even on other systems yet to be implemented) if we change the
definition of math/bits to be constant-time for these operations?

Thanks.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 10, 2019

I can confirm that all of our architectures have instructions that can be used for constant time add and sub (with the caveat that it's hard to tell whether wasm ops are really constant time). Those are the same instructions as would be used for cases that didn't need constant time (in math/big, say).

x86 intrinsics do constant time mul. I don't know for sure about other architectures, but my suspicion is they do. Again, these are the same instructions as would be used in non-constant-time situations.
(If mul wasn't constant time - what would crypto libraries use?)

The slowdown is a lot less than 80%. That number might be old, before some optimizations went in. Disabling intrinsics and comparing CL 170758 before and after, I get much more equivocal results:

name             old time/op  new time/op  delta
Add-8            1.04ns ± 1%  1.30ns ± 1%  +25.79%         (p=0.000 n=9+10)
Add32-8          1.04ns ± 0%  0.79ns ± 1%  -24.44%          (p=0.000 n=7+9)
Add64-8          1.04ns ± 1%  1.31ns ± 0%  +25.48%         (p=0.000 n=10+8)
Add64multiple-8  4.52ns ± 2%  2.88ns ± 1%  -36.24%         (p=0.000 n=10+8)
Sub-8            1.04ns ± 2%  1.30ns ± 1%  +24.23%        (p=0.000 n=10+10)
Sub32-8          1.04ns ± 0%  1.04ns ± 0%     ~     (all samples are equal)
Sub64-8          1.05ns ± 0%  1.04ns ± 0%   -0.95%          (p=0.006 n=7+9)
Sub64multiple-8  4.29ns ± 3%  2.85ns ± 0%  -33.50%          (p=0.000 n=9+8)
@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 10, 2019

Those benchmarks also favor the branching implementations, because the branch is perfectly predictable. In real world uses, the branch is perfectly random.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

"Security" is not a good enough reason to penalize real Go code with an 80% slowdown (as in the Add64 in the CL).

@rsc Again, I am not arguing not to provide the fast option. My argument is that if we need to have two intrinsics, one constant time and one variable time, the default one should be the safe one, and the fast and unsafe one should be the qualified one, not the other way around.

If you have a "slow" Add in a hot loop, the profiler will tell you, and you can go replace it with a faster unsafe version if appropriate. If you have a variable time Add in a crypto implementation, nothing will tell you.

Anyway, this sounds moot because it looks like there are not in fact two classes of intrinsics.

@crvv

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

I think the point here is not just about Add, Mul or Div.
I guess no one is worry about the speed of the constant-time Add or Mul. And Div is not the only one that the constant-time algorithm is not trivial.

The question is whether math/bits should be guaranteed to be constant-time.

There are some options.

  1. All functions in math/bits are constant-time. The downside is that some functions like Div will be very slow. This is the point of the issue.
  2. Some functions in math/bits are constant-time and some are not. It will be documented.
  3. No function in math/bits is guaranteed to be constant-time.
  4. Add two div functions, Div and ConstantTimeDiv. And there is only one Add.
  5. Add two div functions, VariableTimeDiv and Div. And there is only one Add.
  6. Add two functions for all operations in math/bits. There will be Add and ConstantTimeAdd.
  7. Add two functions for all operations in math/bits. There will be VariableTimeAdd and Add.

Obviously, 1 is not OK.
2 and 4 are not safe.
For 5, almost everyone will use bits.VariableTimeDiv, which is kind of weird. And I need to remember that there is bits.VariableTimeDiv but there isn't bits.VaraibelTimeAdd.
I think 6 and 7 are worse than two packages, maybe math/bits and crypto/bits.

So I think 3 is the acceptable option. Maybe a new package crypto/bits will be added later.

@crvv

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

If you have a variable time Add in a crypto implementation, nothing will tell you.

There are some discussions about how to test whether the code is constant-time.
#20654 (comment)

@TuomLarsen

This comment has been minimized.

Copy link
Author

commented Apr 11, 2019

@rsc I double-checked with Agner Fog's "Instruction tables" and can confirm that there is no constant-time Div64 instruction in any of the recent microarchitectures by Intel or AMD, nor in ARM Cortex A57/A72 according to their "Software Optimization Guide", which includes the following:

Integer divides are performed using a iterative algorithm and block any subsequent divide operations until complete. Early termination is possible, depending upon the data values.

@robpike

This comment has been minimized.

Copy link
Contributor

commented Apr 13, 2019

One option would be to have a separate constant-time package.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 16, 2019

I feel like we are solving a problem that does not exist yet. It looks like all current architectures have fast, constant-time instructions for implementing intrinsics. That should make the fallbacks nearly irrelevant, so it shouldn't matter their performance.

I would be more than happy with a package comment like the following

// All operations except Div are implemented in constant time.
// Future additions might not be constant time.

This avoids locking ourselves into an answer for now, so we can make a decision later with more information if something changes, instead of speculating. And in the meantime no cryptography implementor expects division to be constant time anyway.

@TuomLarsen

This comment has been minimized.

Copy link
Author

commented Apr 16, 2019

One either guarantees that a function or package is constant time, or one does not. "Future additions might" means that backward compatibility could be broken, which of course cannot be allowed. The very issue is about "locking ourselves into an answer" and a much better solution is to let crypto functions live in crypto package.

@crvv

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

Actually, this constant-time algorithm in math/bits was discussed and rejected before.
#18616 (comment)

The reason to reject is simple.
math/bits is for performance, but generally, the constant-time algorithm is not the fast algorithm.
Besides Div, the constant-time algorithm for LeadingZeros is not trivial. I don't know whether it is faster or slower than the current one. But it is also an issue.

@crvv

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

As for constant-time cryptography code, I think it is much important to have the tool to test whether the code is constant-time.
With that tool, anyone can use any library to write the code and verify that the code is constant-time.
Without that tool, a constant-time math/bits isn't very helpful because it is still very difficult to verify the code for the entire encryption is constant-time.

@crvv

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

And this issue is an example which #17373 is trying to solve.

the Go 1 compatibility makes designing good abstraction pretty hard, and
because the nature of this problem, we can't design the API in a package in
a subrepo (e.g. x/simd) like context or http2 and later move it into std
when it's mature.

Because user-defined assembly functions can't be intrinsicified, the code must be placed in stdlib.
Then it is controversial because it becomes a compatibility issue.

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 18, 2019

As far as I know, making a tool to ensure constant time code is an unsolved problem in cryptography engineering, at least in the general case. The attempts I know about are also dynamic, not static, so not really an option for cross-platform development.

Also, the Go 1 Compatibility Promise applies to functions, not packages, so "Future additions might not be constant time." is not at all a violation, as long as the functions that exist stay constant time.

@crvv

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

I agree that it is not a violation.
But I don't think it is a good idea to have "Future additions might not be constant time" in the document.

And compatibility promise means the functions must be constant-time forever.
If someone comes up with a faster LeadingZeros algorithm, the problem becomes
Is that acceptable to have a new faster VariableTimeLeadingZeros?

@rsc

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2019

It doesn't sound like we should say "All but these" are constant time. Enumerate the ones we want to be constant-time, document each one as such, and then don't worry about having to say "Future ones may not fit the blanket rule." It doesn't seem like LeadingZeros64 is particularly easy to do in constant time, for example (absent hardware support).

It would be fine to just start with Add, Sub, Mul (certainly not Div). Filippo, are there others that are important?

@rsc rsc changed the title proposal: bits.ConstantTimeAdd, bits.ConstantTimeMul proposal: math/bits: document Add, Sub, Mul as constant-time operations Apr 24, 2019

@FiloSottile

This comment has been minimized.

Copy link
Member

commented Apr 24, 2019

Sounds good to me. (Including sized variants, of course.)

We definitely use RotateLeft, too, and I can see a use for ReverseBytes. Their fallbacks are already constant time.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Apr 25, 2019

OK, then let's start with Add, Sub, Mul, RotateLeft, ReverseBytes for now (including sized variants).
Accepted.

@rsc rsc changed the title proposal: math/bits: document Add, Sub, Mul as constant-time operations math/bits: document Add, Sub, Mul, RotateLeft, ReverseBytes as constant-time operations Apr 25, 2019

@rsc rsc changed the title math/bits: document Add, Sub, Mul, RotateLeft, ReverseBytes as constant-time operations math/bits: guarantee Add, Sub, Mul, RotateLeft, ReverseBytes to be constant-time operations Apr 25, 2019

@rsc rsc added NeedsFix and removed Documentation labels Apr 25, 2019

@rsc rsc modified the milestones: Proposal, Go1.14 Apr 25, 2019

@rsc

This comment has been minimized.

Copy link
Contributor

commented Apr 25, 2019

I removed the Documentation label because there is code work to do too in the fallbacks.
I milestoned it Go 1.14, but if it goes into Go 1.13 that's fine too.

@smasher164

This comment has been minimized.

Copy link
Contributor

commented May 19, 2019

It seems that the ReverseBytes and RotateLeft fallbacks are already constant time. Should the change then modify Add and Sub, while documenting Add, Sub, Mul, RotateLeft, and ReverseBytes to be constant time?

@gopherbot

This comment has been minimized.

Copy link

commented May 20, 2019

Change https://golang.org/cl/170758 mentions this issue: math/bits: make Add and Sub fallbacks constant time

@gopherbot gopherbot closed this in 5ca44dc May 20, 2019

@FiloSottile

This comment has been minimized.

Copy link
Member

commented May 20, 2019

Ah, right, reopening to document the property. I'll send a CL tomorrow.

@gopherbot

This comment has been minimized.

Copy link

commented May 20, 2019

Change https://golang.org/cl/178177 mentions this issue: math/bits: document that Add, Sub, Mul, RotateLeft, ReverseBytes are constant time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.