[arm64] fixes around icache flushing #3549

Merged

merged 3 commits into mono:master on Sep 12, 2016


@lewurm
Member
lewurm commented Sep 9, 2016

This fixes https://bugzilla.xamarin.com/show_bug.cgi?id=39859

The problem in emit_thunk and the workarounds for the Cortex A53 errata are unrelated to the bug above, but they don't hurt either.

The basic problem we see is that the Exynos 8890 SoC (shipped, for example, in the world edition of the Samsung Galaxy S7) is a big.LITTLE architecture with two different CPU models, four cores of each. GCC's __clear_cache [1], the built-in we use to flush the instruction cache, correctly reads the cache line sizes for the data and instruction caches from the CPU and then caches that result. However, the two CPU models on the Exynos 8890 have different instruction cache line sizes (128 bytes vs. 64 bytes): if we happen to initialize the cached value with the larger one, we only flush every other cache line when the code runs on the CPU model with the smaller line size.
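For reference, this is roughly how those line sizes are derived from the CTR_EL0 register (a sketch following the architectural encoding, not the exact GCC code; the helper names are mine):

```c
#include <stdint.h>

/* CTR_EL0 encodes log2(words) of the smallest instruction cache line in
 * bits [3:0] (IminLine) and of the smallest data cache line in bits
 * [19:16] (DminLine); a word is 4 bytes. */
static inline uint64_t
read_ctr_el0 (void)
{
	uint64_t ctr;
	__asm__ __volatile__ ("mrs %0, ctr_el0" : "=r" (ctr));
	return ctr;
}

static inline uint32_t
icache_line_size (void)
{
	/* 128 bytes on one Exynos 8890 CPU model, 64 bytes on the other */
	return 4 << (read_ctr_el0 () & 0xf);
}

static inline uint32_t
dcache_line_size (void)
{
	return 4 << ((read_ctr_el0 () >> 16) & 0xf);
}
```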

Neither V8 [2] nor LLVM [3] caches the cache line sizes; they read them from the register on every flush. We use a slightly different solution, sketched below: we also read the value every time, but additionally cache a global minimum across CPUs, in case the kernel schedules us onto a different CPU during the flushing loop.

[1] https://android.googlesource.com/toolchain/gcc/+/master/gcc-4.9/libgcc/config/aarch64/sync-cache.c#33
[2] v8/v8@fec99c6
[3] https://github.com/llvm-mirror/compiler-rt/blob/ff75f2a0260b1940436a483413091c5770427c04/lib/builtins/clear_cache.c#L146
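As a hedged sketch (illustrative names, not the actual patch), the idea looks roughly like this: re-read CTR_EL0 on every flush, fold the observed sizes into a cached minimum, and run the flush loops with that minimum:

```c
#include <stdint.h>

static volatile uint32_t min_icache_lsize = UINT32_MAX;
static volatile uint32_t min_dcache_lsize = UINT32_MAX;

static void
flush_icache (uint8_t *start, uint8_t *end)
{
	uint64_t ctr;
	uintptr_t addr;

	/* Read the line sizes of the CPU we are currently running on. */
	__asm__ __volatile__ ("mrs %0, ctr_el0" : "=r" (ctr));
	uint32_t isize = 4 << (ctr & 0xf);
	uint32_t dsize = 4 << ((ctr >> 16) & 0xf);

	/* Fold them into the global minimum. The racy read-modify-write is
	 * tolerable here because the values only ever shrink. */
	if (isize < min_icache_lsize)
		min_icache_lsize = isize;
	if (dsize < min_dcache_lsize)
		min_dcache_lsize = dsize;
	isize = min_icache_lsize;
	dsize = min_dcache_lsize;

	/* Clean+invalidate the data cache to the point of coherency
	 * (`dc civac` rather than `dc cvau`, per the A53 workaround). */
	for (addr = (uintptr_t) start & ~((uintptr_t) dsize - 1); addr < (uintptr_t) end; addr += dsize)
		__asm__ __volatile__ ("dc civac, %0" : : "r" (addr) : "memory");
	__asm__ __volatile__ ("dsb ish" : : : "memory");

	/* Invalidate the instruction cache lines. */
	for (addr = (uintptr_t) start & ~((uintptr_t) isize - 1); addr < (uintptr_t) end; addr += isize)
		__asm__ __volatile__ ("ic ivau, %0" : : "r" (addr) : "memory");
	__asm__ __volatile__ ("dsb ish" : : : "memory");
	__asm__ __volatile__ ("isb" : : : "memory");
}
```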

@kumpera
Member
kumpera commented Sep 9, 2016

LGTM

lewurm added some commits Sep 9, 2016
@lewurm lewurm [arm64] manually flush icache 802cf1d
@lewurm lewurm [arm64] use `dc civac` instead of `dc cvau` (suggested workaround for some Cortex A53 bugs, inspired by V8: v8/v8@fec99c6) 44520cd
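(For context, not part of the commit itself: `dc cvau` cleans a data cache line by VA only to the point of unification, while `dc civac` cleans and invalidates it to the point of coherency. The workaround is to use the stronger operation; in inline assembly, with `addr` a hypothetical line-aligned address:)

```c
__asm__ __volatile__ ("dc cvau, %0"  : : "r" (addr) : "memory"); /* clean to PoU: hit by the A53 errata */
__asm__ __volatile__ ("dc civac, %0" : : "r" (addr) : "memory"); /* clean+invalidate to PoC: the workaround */
```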
@lewurm lewurm referenced this pull request in dolphin-emu/dolphin Sep 9, 2016
Merged

arm64: fixes around icache flushing #4204

@hrydgard
hrydgard commented Sep 9, 2016 edited

Has this issue been reported to the GCC project? Caching the cache line size is indeed wrong on this architecture.

@kumpera
Member
kumpera commented Sep 10, 2016

@hrydgard we're rolling out the fix today. We'll report the bug to all parties we know to be affected.

@jahmai
jahmai commented Sep 10, 2016

We've had SIGILL crashes on the following Samsung devices: SM-G920I, SM-G920F, SM-G925F, SM-G930F, SM-G935F. Should this fix apply to all of them?

We also get a lot of other seemingly random errors on these devices, including ArgumentNull and NullReference exceptions in places where they should be impossible, but always very close to application start. Should this fix resolve those as well?

@lewurm
Member
lewurm commented Sep 10, 2016 edited

This patch was tested on a SM-G930F (Exynos 8890), a Nexus 5X (Snapdragon 808) and a SM-G925P (Exynos 7420). On the latter I could never reproduce a SIGILL, but I used it to develop debugging patches since I had it available locally (as opposed to the SM-G930F, to which I could only send APKs via Xamarin Test Cloud). I didn't see a SIGILL on the Nexus 5X either; I just used it to verify that this patch also works on non-Samsung devices.

So, this definitely fixes the problem on SM-G930F and SM-G935F as they use the same SoC (Exynos 8890).

It might fix the issues you see on the SM-G920I, SM-G920F and SM-G925F (all of them use the Exynos 7420). I didn't try very hard to figure out the cache line sizes on that SoC, and I couldn't find them on the web. If the problem persists for you with this patch, we need to figure out a way to reproduce it more reliably on that hardware. Please open another bug in that case.

Regarding ArgumentNullException and NullReferenceException: again, maybe :-) I never saw those with my test case. If you continue to see them, please open another bug.

@jahmai
jahmai commented Sep 10, 2016

πŸ‘

@hrydgard hrydgard added a commit to hrydgard/ppsspp that referenced this pull request Sep 10, 2016
@hrydgard hrydgard Port over the Exynos cacheline size fix from Dolphin. Thanks to lewurm of the mono project for the discovery and original fix.

See dolphin-emu/dolphin#4204 and mono/mono#3549
03279e1
@lewurm
Member
lewurm commented Sep 12, 2016

@monojenkins merge

@monojenkins monojenkins merged commit 4e441ec into mono:master Sep 12, 2016

8 of 10 checks passed

Linux x64 FullAOT Build finished.
OS X i386 Build finished.
Linux AArch64 Build finished.
Linux ARMv5 soft float Build finished.
Linux ARMv7 hard float Build finished.
Linux i386 Build finished.
Linux x64 Build finished.
OS X x64 Build finished.
Windows i386 Build finished.
Windows x64 Build finished.
@davidlt
davidlt commented Sep 12, 2016

I think the kernel folks are working on a fix: [PATCH v3 0/9] arm64: Work around for mismatched cache line size (http://lists.infradead.org/pipermail/linux-arm-kernel/2016-September/453859.html)

@zeux
zeux commented Sep 14, 2016

Maybe I'm missing something, but is there still an issue with the thread being preempted mid-flush and moved to a different core for its next time slice? The code that takes the minimum cache line size will sometimes work around this, but that's not guaranteed on the first run (or even ever, if the thread always starts on a big core).

@lewurm
Member
lewurm commented Sep 14, 2016 edited

@zeux potentially. It's very unlikely, but it could still happen. The question then is what happens when the kernel migrates a process from one CPU to another. I don't know the details, but my guess is that a cache flush happens, so we're probably fine. A better solution would be to iterate over all cores on startup and determine the global minimum cache line size (that's not easy either, though). The best long-term fix is in the kernel. See also the discussion by the GCC folks: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77571

Our fix is not ideal, but it's the best we could do for now.
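For illustration, a sketch of what that startup probing could look like on Linux (hypothetical helper, not what this patch does; a real version would also restore the original affinity mask and deal with CPU hotplug):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <unistd.h>

static uint32_t
probe_min_icache_lsize (void)
{
	long ncpus = sysconf (_SC_NPROCESSORS_ONLN);
	uint32_t min_lsize = UINT32_MAX;

	for (long cpu = 0; cpu < ncpus; cpu++) {
		cpu_set_t set;
		CPU_ZERO (&set);
		CPU_SET (cpu, &set);
		/* Pin ourselves to this core so CTR_EL0 reflects it. */
		if (sched_setaffinity (0, sizeof (set), &set) != 0)
			continue; /* core offline or restricted */

		uint64_t ctr;
		__asm__ __volatile__ ("mrs %0, ctr_el0" : "=r" (ctr));
		uint32_t lsize = 4 << (ctr & 0xf);
		if (lsize < min_lsize)
			min_lsize = lsize;
	}
	return min_lsize;
}
```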
