
Conversation

@thomaswue
Member

Spectre CVE-2017-5753 and CVE-2017-5715 mitigation. Performs min(index, length-1) before every array read so that the read is safe independently of the bounds check (which the CPU's speculative execution might skip).
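A minimal sketch of the idea (illustrative names, not the actual Graal code): the index fed to the memory access is clamped into bounds by arithmetic, so even if the CPU speculates past the bounds-check branch, the read cannot leave the array.

```java
// Illustrative sketch only -- class and method names are made up, not Graal's.
final class ClampedRead {
    static int readClamped(int[] array, int index) {
        if (index < 0 || index >= array.length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        // Clamp again, independently of the branch above: the branch can be
        // mispredicted and speculated past, but this arithmetic always runs
        // before the load and keeps the address inside the array.
        int safeIndex = Math.min(index, array.length - 1);
        return array[safeIndex];
    }
}
```

In a real compiler the clamp itself has to be branchless, otherwise the branch predictor could speculate past it as well.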

@chrisseaton
Contributor

Do we only need this for untrusted code? If you're running Ruby or Python you already have access to everything native via the FFI.

@jcdavis
Contributor

jcdavis commented Jan 7, 2018

Doesn't & (length-1) only work if the array length is a power of 2?

@thomaswue
Member Author

This is mainly protecting basic memory safety for Java code (and by extension Truffle language code). Even for Ruby or Python, we do not want to assume that all modules are trusted. This blog post points out why with some colorful language: https://hackernoon.com/im-harvesting-credit-card-numbers-and-passwords-from-your-site-here-s-how-9a8cb347c5b5

Also, we can run native extensions safely with SafeSulong.

The patch in its current form has significant performance implications for array-heavy code. So we will put it behind a flag for now.

@thomaswue
Member Author

@jcdavis This patch performs a real min(index, length-1) via (index + ((length-1-index) & ((length-1-index) >> 31))) and not just (index & (length - 1)).
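The formula can be checked with a small sketch (a hedged illustration; it assumes index and length are non-negative, so the subtraction does not overflow):

```java
final class BranchlessMin {
    // Branchless min(index, length - 1), following the formula above.
    // d = length-1-index is negative exactly when index > length-1;
    // (d >> 31) is then an all-ones mask, so d is added back, yielding length-1.
    static int clampIndex(int index, int length) {
        int d = length - 1 - index;
        return index + (d & (d >> 31));
    }
}
```

For index <= length-1 the mask is zero and the index passes through unchanged, so no branch is needed in either case.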

@jcdavis
Contributor

jcdavis commented Jan 7, 2018

Ahh, makes sense, thanks

@mikehearn
Contributor

Hmm. Is it necessary to do this independently of the normal JVM bounds checks? I still need to re-review the Project Zero blog post in detail but my understanding was that it's sufficient to block speculation past an array bounds check e.g. with a fence and that it's OK to optimise these bounds checks as normal.

@thomaswue
Member Author

Do you think the fence is more performant? Also, are you sure that using it solves the issue with side effects on the cache? Using an instruction to block speculative execution after bounds checks (and also type checks) would be sufficient, I believe.

@mikehearn
Contributor

I wouldn't want to make definite statements about performance without profiling, as there is a lot of conflicting information and things seem to vary by CPU variant. But the Spectre paper, as well as the Intel and ARM papers, recommend fencing. From Intel's paper:

The software mitigation that Intel recommends is to insert a barrier to stop speculation in appropriate places. In particular, the use of an LFENCE instruction is recommended for this purpose. Serializing instructions, as well as the LFENCE instruction, will stop younger instructions from executing, even speculatively, before older instructions have retired, but LFENCE is a better performance solution than other serializing instructions. An LFENCE instruction inserted after a bounds check will prevent younger operations from executing before the bound check retires. Note that the insertion of LFENCE must be done judiciously; if it is used too liberally, performance may be significantly compromised.

It is possible to create a set of static analysis rules in order to help find locations in software where a speculation barrier might be needed. Intel’s analysis of the Linux kernel for example has only found a handful of places where LFENCE insertion is required, resulting in minimal performance impact. As with all static analysis tools, there are likely to be false positives in the results and human inspection is recommended.

The reference to static analysis showing that the fence is rarely needed is presumably about the need for a double-load pattern. CVE-2017-5753 (a.k.a. Variant 1, a.k.a. Spectre v1) requires not just a load from a bounds-checked array but also that the result of that load be used to calculate the address of another load. If that double-load pattern isn't found in the target address space then it doesn't seem to be exploitable. So to protect type-system based security in the presence of JIT compilers, I guess the matching can be made more advanced and the fence deployed only where it's needed. Intel at least seems to think there should be little perf impact.
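For concreteness, the double-load shape looks roughly like this (an illustrative sketch with made-up names; in Java the out-of-bounds load itself only happens transiently, under misspeculation, and would throw architecturally):

```java
final class GadgetShape {
    static final byte[] checkedArray = new byte[16];
    static final int[] probeTable = new int[256 * 64];

    static int gadget(int untrustedIndex) {
        if (untrustedIndex < checkedArray.length) {     // branch may be speculated past
            int value = checkedArray[untrustedIndex];   // first load: transiently out of bounds
            return probeTable[(value & 0xff) * 64];     // second load: its address depends on the
        }                                               // value, so its cache footprint leaks the byte
        return 0;
    }
}
```

A bounds-checked load without the dependent second load gives the attacker no channel to read the speculatively loaded value back, which is presumably why the static analysis finds so few sites needing a fence.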

ARM suggests a combination of two approaches, one is the use of cmov.

The practical software mitigation for the scenario where the value being leaked is determined by less privileged software is to ensure that the address derived from the secret (that is the address that will be used to load value2 in the example in page 2) is only indicative of the secret (the data in value) when the access that derived the secret was one that would be executed non-speculatively. This can be achieved on most Arm implementations by using a conditional selection or conditional move instruction based on the condition that is used to determine the outcome of the branch.

In the implementations where this does not work, a new barrier, [defined below] can be used (this instruction is a NOP on implementations where the conditional select/conditional move can be used). The combination of both a conditional select/conditional move and the new barrier are therefore sufficient to address this problem on ALL Arm implementations. The details of the new barrier are described later in this section.

The ARM paper helpfully provides before/after assembly code as well.
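In Java-level terms, the conditional-select idea looks roughly like this (illustrative only; whether the JIT actually emits csel/cmov rather than a branch for the ternary is not guaranteed, and the sketch assumes a non-empty array):

```java
final class CondSelect {
    static int selectLoad(int[] array, int index) {
        boolean inBounds = index >= 0 && index < array.length;
        // The index used by the load is chosen via a data dependency on the
        // same condition as the bounds check, so a mispredicted branch alone
        // cannot steer the load out of bounds.
        int safe = inBounds ? index : 0;
        int value = array[safe];
        return inBounds ? value : -1;
    }
}
```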

What about CVE-2017-5715, branch target injection, a.k.a. Variant 2, a.k.a. Spectre v2? That's where they recommend the retpolines. Well, ARM says there's no mitigation on their end, but I suspect the hackjob being proposed for Intel might work there too ... in general there seems to be more analysis required here. The recommended form of the retpoline seems to vary slightly between compilers and companies right now. It might be worth focusing on Variant 1 and letting things settle around Variant 2 for a week or so.

Don't get me wrong. Your approach may well work. I haven't thought about it much. It's just different to what the CPU companies are recommending and given that all these attacks rely on undocumented or internal details of the chips I'd be wary of diverging from their advice here.

@thomaswue
Member Author

We will look into this in more detail and measure the exact performance impact. My intuition is that the fence instruction is under normal circumstances far more expensive than the index calculation (Intel CPUs are very good at executing these very primitive arithmetic instructions). It also seems non-trivial to determine whether the double-load pattern (and all its alternative forms) is present. The browser JS engines (JSC, V8) mask the access address to the heap size (which is in my opinion not sufficient). One additional security benefit of index masking is that security bugs related to incorrect bounds-check elimination are also prevented.

Jan Stola pointed out that max(index, 0) is also necessary; I will add that.
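Combining both bounds gives a clamp into [0, length-1]; a minimal sketch (illustrative names, assuming length >= 1):

```java
final class FullClamp {
    // Clamp the index into [0, length-1] so that both negative and too-large
    // speculative indices stay inside the array.
    static int clamp(int index, int length) {
        return Math.max(0, Math.min(index, length - 1));
    }
}
```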

Igor Veresov pointed out that a similar out-of-bounds situation can be generated with a type cast and field access. So a fence on type casts might be necessary. Also, virtual calls can have a similar effect to type casts.

My current understanding of CVE-2017-5715 is that the attacker needs to locate a specific machine code gadget in the victim's address space. I think this should not be possible from within a managed environment, which prevents the managed language code from being the attacker.

@mikehearn
Contributor

Fair enough then. I saw today WebKit's blog post on what they're doing, which is indeed not barrier based. I'll step out of this and let you guys get on with figuring out the best performing approach.

For 5715 I'm not sure why managed code can't do this. If the target program is running a known binary, then the author of the managed code can simply search for gadgets ahead of time and hard-code the known offsets. It then turns into a problem of defeating ASLR, which has various known weaknesses. Which specific operation is it that can't be done?

@thomaswue
Member Author

Regarding 5715: You are correct; once you have the address of the gadget, the attack is possible without major issues. One minor aspect is that indirect jumps and calls are not very common (switches are often converted into branch trees, and virtual calls are often inlined). We can test the performance impact of retpolines.

@thomaswue
Member Author

Added 955b019 for general LFENCE defense when executing critical code.

@thomaswue thomaswue closed this Mar 11, 2018
@mikehearn
Contributor

Is there a summary anywhere of the final thinking on this? I note that in the end, the array masking wasn't merged and LFENCE was indeed used, but it's hard to know what sort of performance impact this has or why the array masking approach was abandoned (which does indeed seem quite nice).

The mitigation that was added seems to lfence every basic block, not just after bounds checks or vcalls. I assume it's done that way so one can say "we are sure this works, and yes it's slow, but c'est la vie", versus trickier approaches that are faster but might have gaps?
