rewrite CBC unpad without data-dependent branches, mitigates lucky 13 (s... #49

Open
wants to merge 1 commit into base: master

Conversation

4 participants
@hannesm
Member

hannesm commented Apr 10, 2014

…(still needed: hmac computation with constant time in both success and failure mode)

rewrite CBC unpad without data-dependent branches, mitigates lucky 13…
… (still needed: hmac computation with constant time in both success and failure mode)
@vouillon

vouillon commented Apr 10, 2014

Maybe I'm missing something, but as far as I can see, there are still data-dependent branches in this code.

Performing the same computation on both branches of a match is not sufficient, as one may still be able to observe branch mispredictions.

Also, pattern matching is optimized:

```ocaml
match x, y with
| true , true  -> e1
| true , false -> e2
| false, _     -> e3
```

is compiled into something like:

```ocaml
if x then (if y then e1 else e2) else e3
```

so the execution time is not the same whether x is true or not.

On lines 288/290, the call to Cstruct.sub is not constant time.

Finally, on lines 292-294, an allocation is performed in case of success, but not in case of failure.
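The branch-free style being discussed can be sketched roughly as follows. This is a hypothetical illustration, not the code from this PR: every helper name is made up, the shift masks assume OCaml's 63-bit native ints (a 64-bit platform), and a real implementation would work on `Cstruct.t` rather than `Bytes.t`.

```ocaml
(* Constant-time byte inequality: 0 if a = b, 0xff otherwise. *)
let ct_neq (a : int) (b : int) : int =
  let x = a lxor b in
  (* Fold any set bit down into bit 0, then spread it to a full byte mask. *)
  let x = x lor (x lsr 4) in
  let x = x lor (x lsr 2) in
  let x = x lor (x lsr 1) in
  (- (x land 1)) land 0xff

(* Constant-time "i >= bound" mask over small non-negative ints:
   0xff if i >= bound, 0 otherwise (assumes 63-bit native ints). *)
let ct_ge (i : int) (bound : int) : int =
  lnot ((i - bound) asr 62) land 0xff

(* Hypothetical padding check: the last pad+1 bytes of [buf] must all
   equal [pad]. Every byte is touched regardless of the padding value,
   and validity is aggregated into [bad] with pure bitwise operations. *)
let check_pad (buf : Bytes.t) : bool =
  let len = Bytes.length buf in
  let pad = Char.code (Bytes.get buf (len - 1)) in
  let bad = ref (ct_ge pad len) in          (* padding longer than buffer? *)
  for i = 0 to len - 1 do
    let b = Char.code (Bytes.get buf i) in
    (* Only positions >= len - pad - 1 must equal [pad]. *)
    let in_pad = ct_ge i (len - pad - 1) in
    bad := !bad lor (in_pad land ct_neq b pad)
  done;
  !bad = 0
```

Only the final comparison against 0 branches, and it does so on an already-aggregated result, so the amount of work done does not depend on where (or whether) the padding is malformed.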

@avsm

Contributor

avsm commented Apr 10, 2014

I'm wondering if this could be solved via Lwt instead of a fine-grained solution like the one in this diff. Could we wrap a block of code in an Lwt thread that guarantees it takes a constant amount of time, for example by dropping into the scheduler and sleeping on a timer wheel? It's a heavyweight approach, but at least responses could only be observed at the resolution of the timer wheel.

-anil
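The scheduler-based idea could look something like this minimal sketch, assuming Lwt and Lwt_unix are available. `with_min_duration` is a hypothetical helper, not an existing API: it runs a thread and withholds the answer until at least `d` seconds have elapsed.

```ocaml
open Lwt.Infix

(* Run [f ()] and delay the result so the whole call takes at least
   [d] seconds; a network observer then sees response times padded up
   to the chosen floor. *)
let with_min_duration (d : float) (f : unit -> 'a Lwt.t) : 'a Lwt.t =
  let t0 = Unix.gettimeofday () in
  f () >>= fun result ->
  let elapsed = Unix.gettimeofday () -. t0 in
  (if elapsed < d then Lwt_unix.sleep (d -. elapsed) else Lwt.return_unit)
  >|= fun () -> result
```

A record-decryption path could then be wrapped as `with_min_duration 0.005 (fun () -> decrypt record)` (the constant and `decrypt` are illustrative). Note that this only blurs what a network observer can measure; it does nothing against local side channels.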


@pqwy

Contributor

pqwy commented Apr 11, 2014

The Lwt approach might be a little heavy, but my inclination goes in that general direction: instead of making sure all code paths are the same length (sometimes at increasing cost to code simplicity, and often just guessing about what really happens beneath us), I'm trying to figure out how to introduce revertible random workloads into the pipeline.

I was greatly inspired by the (standard) RSA timing mitigation. It exploits the fact that RSA is homomorphic with respect to multiplication: fold a random nonce into the ciphertext, decrypt the whole masked lot, and then extract the actual result. This way, the otherwise very timeable RSA becomes much more difficult to steal secrets from with a stopwatch.
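The blinding trick can be sketched as follows, using Zarith (`Z.powm`, `Z.invert` and `Z.gcd` are real Zarith functions). The tiny textbook key in the usage note and the small range of the blinding factor are purely illustrative; real code would draw a full-size random `r`.

```ocaml
(* RSA decryption with blinding: instead of computing c^d mod n directly,
   decrypt (c * r^e) mod n for a random r coprime to n, then strip r from
   the result. The timing of the exponentiation is thereby decorrelated
   from the attacker-supplied ciphertext c. *)
let blind_decrypt ~n ~e ~d c =
  let rec pick () =
    let r = Z.of_int (1 + Random.int 1000) in
    if Z.equal (Z.gcd r n) Z.one then r else pick ()
  in
  let r = pick () in
  let c' = Z.rem (Z.mul c (Z.powm r e n)) n in  (* blind: c' = c * r^e mod n *)
  let m' = Z.powm c' d n in                     (* the timeable operation   *)
  Z.rem (Z.mul m' (Z.invert r n)) n             (* unblind: m' * r^-1 mod n *)
```

With the classic toy key n = 3233, e = 17, d = 2753, `blind_decrypt` recovers the plaintext because (c · r^e)^d ≡ m · r (mod n), and multiplying by r⁻¹ cancels the mask.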

I don't have too many concrete ideas yet, but this is the direction my thinking goes. This kind of mitigation is much more global, and thus more modular: halfway to the Lwt level, but still within time bounds related to the algorithm itself rather than an external scheduler.

#foodforthought

@hannesm

Member

hannesm commented Apr 13, 2014

@pqwy yes, blinding is well known for asymmetric crypto (but then, https://twitter.com/agl__/status/453471651779182592 might be of interest as well)

strategy here should be: define the threat model (do we care about local attackers, or only those on the network?); implement an attacker; measure the time consumed for possible inputs; show results.

relying on an external scheduler will mitigate against a network attacker, but certainly not against a local one (due to other side channels, such as power consumption).

@pqwy

Contributor

pqwy commented Apr 13, 2014

@hannesm Exactly. Threat model.

In my mind, the threat model for the first cut of the code is strictly a network attacker. Cache eviction, power measurements and other local side-channel subtleties are, at least in my view, something to reconsider only after having a full stack capable of working on the network.

This is because some of the primary targets for the code are small ARM boxes running a dedicated process, or cloud VMs that can host multiple tenants; these can't know who is sharing the CPU with whom, and can migrate transparently, which largely voids local-attack assumptions.

This is also because subtler side channels are a complex topic in themselves, possibly larger than the TLS stack itself. (If someone can measure CPU power, they can probably spray the RAM chips with liquid helium as well, and dump the contents before the capacitors lose the information. That threat model then includes potential full memory compromise. Now what? That's one. The other is that writing code that bets on the exact generated instructions departs from the greatest benefits OCaml can confer; we could do it in INTERCAL then, and it would read about the same.)

Doing differential timings (contingent on varying secrets) is something I'm dying to try out. But I firmly believe that the ground functionality should be in place before going there. Otherwise focus disperses, and we end up randomly chasing loose ends without making core progress.

So, as for the threat model and mitigations, I would really, really love it if we first did a TLS that worked correctly on the logic level (threat model: Eve), proceeded to more serious timing mitigations only after that (with Mallory living on the network), and considered local (or physical) side channels firmly as a third step (the CPU is Mallory? Or is she the power cord now?).

I'd like to get mundane stuff done first. My fear is that otherwise we'll never finish.

@pqwy

Contributor

pqwy commented Apr 13, 2014

@vouillon Are you sure Cstruct.sub is not constant time? Are you thinking about something subtle in the ordering of its three checks (cstruct.ml:73)? The core operation is zero-copy, after all.

@avsm

Contributor

avsm commented Apr 13, 2014


I completely agree. It also gives us the opportunity to put in negative tests for timing attacks (with the naive/simple OCaml code), and then introduce the mitigations afterwards. The Lem/Coq verification will almost certainly be easier on the naive code to begin with, followed by the more complex timing-resistant version.

-anil

@pqwy

Contributor

pqwy commented Apr 13, 2014

In essence, I believe it is just sound engineering to go step by step so as not to get lost along the way. And even the core logic is a mouthful. (Cf. the recent major exploits, all of which fall into the first domain, that of solid code.)

@hannesm

Member

hannesm commented Apr 13, 2014

I agree with the raised points and that we should focus first on a robust and useful stack. I'd leave this PR open to remind us of this discussion once we are there.

@hannesm

Member

hannesm commented Jun 13, 2015

recent papers:

@hannesm

Member

hannesm commented Dec 1, 2015

and here's another paper on that topic, on Amazon's s2n library: https://eprint.iacr.org/2015/1129

@hannesm hannesm referenced this pull request Dec 4, 2015

Closed

implementation pitfalls from rfc5246, d.4 #31

14 of 14 tasks complete