Mask index to be always lower than array length independent of bounds check. #279
Conversation
Do we only need this for untrusted code? If you're running Ruby or Python you already have access to everything native via the FFI.
Doesn't
This is mainly about protecting basic memory safety for Java code (and by extension Truffle language code). Even for Ruby or Python, we do not want to assume that all modules are trusted. This blog post points out why, with some colorful language: https://hackernoon.com/im-harvesting-credit-card-numbers-and-passwords-from-your-site-here-s-how-9a8cb347c5b5

Also, we can run native extensions safely with SafeSulong. The patch in its current form has significant performance implications for array-heavy code, so we will put it behind a flag for now.
@jcdavis This patch performs a real min(index, length - 1) via index + ((length - 1 - index) & ((length - 1 - index) >> 31)), not just (index & (length - 1)).
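For illustration, the branchless clamp described above can be sketched in Java (class and method names here are illustrative, not taken from the patch):

```java
// Branchless min(index, length - 1): there is no conditional branch,
// so there is nothing for the CPU to mispredict and speculate past.
class IndexClamp {
    static int clampUpper(int index, int length) {
        int d = length - 1 - index;
        // d >> 31 is 0 when d >= 0 (index already below length) and all
        // ones when d < 0, so (d & (d >> 31)) is either 0 or d.
        return index + (d & (d >> 31));
    }
}
```

In-range indices pass through unchanged, while an out-of-range index collapses to length - 1, so even a speculatively executed read stays inside the array.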
Ahh, makes sense, thanks
Hmm. Is it necessary to do this independently of the normal JVM bounds checks? I still need to re-review the Project Zero blog post in detail, but my understanding was that it's sufficient to block speculation past an array bounds check, e.g. with a fence, and that it's OK to optimise these bounds checks as normal.
Do you think the fence is more performant? Also, are you sure that using it solves the issue of side effects on the cache? I believe using an instruction to block speculative execution after bounds checks (and also type checks) would be sufficient.
I wouldn't want to make definite statements about performance without profiling, as there is a lot of conflicting information and things seem to vary by CPU variant. But the Spectre paper, as well as the Intel and ARM papers, recommend fencing. From Intel's paper:

The reference to static analysis showing that the fence is rarely needed is presumably about the need for a double-load pattern. CVE-2017-5753 (a.k.a. variant 1, a.k.a. Spectre/1) requires not just a load from a bounds-checked array, but also for the result of that load to be used to calculate the address of another load. If that double-load pattern isn't found in the target address space, then it doesn't seem to be exploitable. So to protect type-system based security in the presence of JIT compilers, I guess the matching can be made more advanced and the fence deployed only when it's needed. Intel at least seems to think there should be little perf impact. ARM suggests a combination of two approaches, one of which is the use of cmov.

The ARM paper helpfully provides before/after assembly code as well. What about CVE-2017-5715, branch target injection, a.k.a. variant 2, a.k.a. Spectre/2? That's where they recommend the retpolines. Well, ARM says there's no mitigation on their end, but I suspect the hackjob being proposed for Intel might work there too ... but in general there seems to be more analysis required here. The recommended form of the retpoline seems to vary slightly between compilers and companies right now. It might be worth focusing on variant 1 and leaving things to settle around variant 2 for a week or so. Don't get me wrong: your approach may well work. I haven't thought about it much. It's just different from what the CPU companies are recommending, and given that all these attacks rely on undocumented or internal details of the chips, I'd be wary of diverging from their advice here.
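For reference, the variant-1 "double load" pattern discussed above can be sketched as follows (the array and class names are illustrative, not taken from any real codebase):

```java
// Sketch of the Spectre variant-1 gadget: a bounds-checked load whose
// result feeds the address of a second load. Under misspeculation the
// first load can read out of bounds, and the second load leaves a cache
// footprint that an attacker can recover by timing probe accesses.
class SpectreV1Sketch {
    static byte[] array1 = new byte[16];
    static byte[] probe = new byte[256 * 64];

    static byte victim(int index) {
        if (index < array1.length) {        // the check the CPU may speculate past
            int value = array1[index] & 0xFF;
            return probe[value * 64];       // address depends on the loaded value
        }
        return 0;
    }
}
```

If no such pattern exists in the target address space, variant 1 does not appear to be exploitable there, which is what makes the static-analysis argument plausible.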
We will look into this in more detail and measure the exact performance impact. My intuition is that the fence instruction is, under normal circumstances, far more expensive than the index calculation (Intel is very good at executing these very primitive assembly instructions). It also seems non-trivial to determine whether the double-load pattern (and all its alternative forms) is present. The browser JS engines (JSC, V8) use masking of the access address to the heap size, which is in my opinion not sufficient. One additional security benefit of the index masking is that security bugs related to incorrect bounds check elimination are also prevented.

Jan Stola pointed out that max(index, 0) is also necessary; I will add that. Igor Veresov pointed out that a similar out-of-bounds situation can be created with a type cast and field access, so a fence on type casts might be necessary. Also, virtual calls can have a similar effect to type casts.

My current understanding of CVE-2017-5715 is that the attacker needs to locate a specific machine code gadget in the victim's address space. I think this should not be possible from a managed environment, which prevents the managed language code from being an attacker.
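Combining Jan Stola's point with the existing mask, a full branchless clamp into [0, length - 1] could be sketched like this (again, the names are illustrative, not from the patch):

```java
// Clamp index into [0, length - 1] without branches:
// first max(index, 0), then min(result, length - 1).
class FullClamp {
    static int clamp(int index, int length) {
        // index >> 31 is all ones for a negative index; masking with its
        // complement zeroes negative indices, i.e. max(index, 0).
        int nonNegative = index & ~(index >> 31);
        int d = length - 1 - nonNegative;
        // As before, (d & (d >> 31)) is 0 when in range and d otherwise.
        return nonNegative + (d & (d >> 31));
    }
}
```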
Fair enough then. I saw today WebKit's blog post on what they're doing, which is indeed not barrier based. I'll step out of this and let you guys get on with figuring out the best performing approach. For 5715, I'm not sure why managed code can't do this. If the target program is running a known binary, then the author of the managed code can simply search for gadgets ahead of time and hard-code the known offsets. It then turns into a problem of defeating ASLR, which has various known weaknesses. Which is the specific operation that can't be done?
Regarding 5715: You are correct, once you have the address of the gadget, the attack is possible without major issues. One minor aspect is that indirect jumps and calls are not very common (switches are often converted into branch trees, and virtual calls are often inlined). We can test the performance impact of retpolines.
Added 955b019 for general LFENCE defense when executing critical code.
Is there a summary anywhere of the final thinking on this? I note that in the end the array masking wasn't merged and LFENCE was indeed used, but it's hard to know what sort of performance impact this has, or why the array masking approach was abandoned (it does seem quite nice). The mitigation that was added seems to lfence every basic block, not just after bounds checks or vcalls. I assume it's done that way so we can say "we are sure this works; yes, it's slow, but c'est la vie", versus trickier approaches that are faster but might have gaps?
Spectre CVE-2017-5753 and CVE-2017-5715 mitigation. Performs min(index, length-1) before every array read such that the read is safe independent of the bounds check (which the speculative execution of the CPU might skip).
