Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pcre2 makes software crash when GDS mitigation is forced for older CPUs #399

Open
bugreporter234 opened this issue Mar 19, 2024 · 21 comments

Comments

@bugreporter234
Copy link

When the Gather Data Sampling mitigation is forced on Linux for older CPUs (i.e. skylake), AVX instructions are disabled because there is no microcode update available.

lscpu:

…
Vulnerabilities:          
Gather data sampling:   Mitigation; AVX disabled, no microcode
…

This leads to software crashes when pcre2jit is enabled. For example qpwgraph crashes when a file dialog is opened and a pcre2jit function is called.

Backtrace:
qpwgraph_gdb_backtrace.txt

Keepassxc (password manager) doesn't start and reports 'illegal hardware instruction'. If pcre2 is compiled without jit it starts normally. Same for carla (audio rack/patchbay software).

@zherczeg
Copy link
Collaborator

AVX2 is not used unless CPUID says they are available.

@bugreporter234
Copy link
Author

Well, normally AVX is available for skylake, but when the GDS mitigation is forced, it's not. So this is kind of a special case - a skylake CPU without AVX instructions. Do you think this special case can be handled somehow?

@zherczeg
Copy link
Collaborator

What cpuid says on that system?

https://github.com/zherczeg/sljit/blob/master/sljit_src/sljitNativeX86_common.c#L512

Are these bits set? If cpu says it supports AVX, the software believes it.

We could add a compile time option to disable it, but you need to pass it manually.

@bugreporter234
Copy link
Author

bugreporter234 commented Mar 20, 2024

cpuid_tool (from libcpuid) says avx and avx2 are available, but lscpu | grep avx produces no output. So cpuid is wrong in this case.

cpuid_tool_report.txt
lscpu.txt

A compile time flag would work for me, but I don't think it's a good solution, because not everybody with an affected CPU wants to recompile pcre2 and if Linux distros deliver pcre2 with jit enabled without the flag, the software would still crash for them.

@zherczeg
Copy link
Collaborator

What is a fairly easy and good solution then? It is surprising the cpu lies about the feature.

@loqs
Copy link

loqs commented Mar 20, 2024

It is surprising the cpu lies about the feature.

In addition to the CPU reporting it supports AVX support the OS must also support it. You can check for the support by querying bit 1 and 2 of the XCR0 register using the xgetbv instruction as documented in the Intel® 64 and IA-32 Architectures Software Developer’s Manual.

Edit:
See example code in:
https://www.intel.com/content/dam/develop/external/us/en/documents/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family.pdf

The following diff might be closer to your use case https://gitlab.gnome.org/GNOME/babl/-/issues/96#note_2056377

@zherczeg
Copy link
Collaborator

This is something new for me, although simple enough to use it. How can I test if this instruction is available?

Is there a description about its feature bits? I only found this:
https://en.wikipedia.org/wiki/Control_register#XCR0_and_XSS

This documentation tells that xgetbv has a non-deterministic behaviour:
https://google.github.io/security-research/pocs/cpus/xgetbv/

@loqs
Copy link

loqs commented Mar 21, 2024

See the Intel SDM Volume 1 Chapter 13 Managing State Using The XSAVE Feature Set (includes XGETBV and XCR0) and
Volume 1 Chapter 14.3 Dectection OF Intel® AVX Instructions.

@zherczeg
Copy link
Collaborator

I have no time to read this much info. I only need the following info:

  • How can I detect if the instruction is available?
  • Is it supported by non-intel x86 cpus?
  • The bitlist of XCR0 (I have already found it)

@loqs
Copy link

loqs commented Mar 21, 2024

* How can I detect if the instruction is available?

CPUID.(EAX=01H, ECX=0H):ECX[bit 27] indicates OSXSAVE support it also indirectly implies the processor supports XSAVE, XRSTOR, XGETBV, and XCR0.

* Is it supported by non-intel x86 cpus?

Yes as the interface is standardized.

Edit:
So something like:

diff --git a/src/sljit/sljitNativeX86_common.c b/src/sljit/sljitNativeX86_common.c
index c2c0421..c15299b 100644
--- a/src/sljit/sljitNativeX86_common.c
+++ b/src/sljit/sljitNativeX86_common.c
@@ -392,6 +392,7 @@ static const sljit_u8 freg_lmap[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 2] = {
 #define CPU_FEATURE_CMOV		0x020
 #define CPU_FEATURE_AVX			0x040
 #define CPU_FEATURE_AVX2		0x080
+#define CPU_FEATURE_OSXSAVE		0x010
 
 static sljit_u32 cpu_feature_list = 0;
 
@@ -499,6 +500,23 @@ __msan_unpoison(info, 4 * sizeof(sljit_u32));
 
 }
 
+/*
+0    FPU
+1    SSE    XMM registers
+2    AVX    YMM registers
+*/
+static sljit_u32 get_xcr0(void)
+{
+	sljit_u32 xcr0;
+	__asm__ __volatile__(".byte 0x0f, 0x01, 0xd0" : "=a" (xcr0) : "c" (0) : "%edx");
+	return xcr0;
+}
+
+static sljit_u32 get_os_support_avx(void)
+{
+	return ((get_xcr0() & 6) == 6);
+}
+
 static void get_cpu_features(void)
 {
 	sljit_u32 feature_list = CPU_FEATURE_DETECTED;
@@ -526,6 +544,8 @@ static void get_cpu_features(void)
 
 		if (info[2] & 0x80000)
 			feature_list |= CPU_FEATURE_SSE41;
+		if (info[2] & 0x8000000)
+			feature_list |= CPU_FEATURE_OSXSAVE;
 		if (info[2] & 0x10000000)
 			feature_list |= CPU_FEATURE_AVX;
 #if (defined SLJIT_DETECT_SSE2 && SLJIT_DETECT_SSE2)
@@ -543,6 +563,9 @@ static void get_cpu_features(void)
 	if (info[2] & 0x20)
 		feature_list |= CPU_FEATURE_LZCNT;
 
+	if ((feature_list & CPU_FEATURE_AVX) && !((feature_list & CPU_FEATURE_OSXSAVE) || get_os_support_avx()))
+		feature_list  = feature_list &= ~(CPU_FEATURE_AVX | CPU_FEATURE_AVX2);
+
 	cpu_feature_list = feature_list;
 }

@bugreporter234
Copy link
Author

Edit: See example code in: https://www.intel.com/content/dam/develop/external/us/en/documents/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family.pdf

I tried out the example code, removing all the features in line 8, leaving only _FEATURE_AVX2 and it outputs:

This CPU does not support all ISA extensions introduced in Haswell

So this should work.

@zherczeg
Copy link
Collaborator

Ok I will do this in the sljit project in due course. It will be transferred into pcre2 before the next release.

@bugreporter234
Copy link
Author

I applied logs' patch to 10.43 and it compiled successfully, but when I run make -j1 check the checks still fail with 'illegal instruction'. I'm compiling like Arch Linux does in the 10.43 PKGBUILD.

test-suite.log

@bugreporter234
Copy link
Author

I made a pcre2 package for Arch Linux with the patch applied (commenting out the checks) and installed it and now qpwgraph is crashing from the start. Without the patch it was crashing when I clicked 'Open' or 'Save as' (before a file dialog is opened). I also recompiled qpwgraph, just to be sure. Here's the backtrace of qpwgraph with the patch applied to pcre2:

qpwgraph_gdb_backtrace_patch_applied.txt

@zherczeg
Copy link
Collaborator

It looks like it is not in the jit-ed code, but during compilation: pcre2_compile_16. Could you disassemble the instructions around it:
disassemble $pc-32,$pc+32

Maybe it is the XGETBV instruction.

@bugreporter234
Copy link
Author

bugreporter234 commented Mar 21, 2024

Thread 1 "qpwgraph" received signal SIGILL, Illegal instruction.
0x00007ffff4d8dfb0 in pcre2_compile_16 (pattern=0x5555558bac60, patlen=<optimized out>, options=524288, errorptr=0x55555588c2e0, erroroffset=0x7fffffffd988, 
    ccontext=0x7ffff4e11c00 <_pcre2_default_compile_context_16>) at src/pcre2_compile.c:10267
10267	cb.start_pattern = pattern;
(gdb) disassemble $pc-32,$pc+32
Dump of assembler code from 0x7ffff4d8df90 to 0x7ffff4d8dfd0:
   0x00007ffff4d8df90 <pcre2_compile_16+304>:	xor    %cl,(%rdi)
   0x00007ffff4d8df92 <pcre2_compile_16+306>:	(bad)
   0x00007ffff4d8df93 <pcre2_compile_16+307>:	xor    $0x4a000005,%eax
   0x00007ffff4d8df98 <pcre2_compile_16+312>:	lea    (%rbx,%rsi,2),%edx
   0x00007ffff4d8df9b <pcre2_compile_16+315>:	mov    %rdx,-0x4688(%rbp)
   0x00007ffff4d8dfa2 <pcre2_compile_16+322>:	mov    -0x4690(%rbp),%rsi
   0x00007ffff4d8dfa9 <pcre2_compile_16+329>:	lea    0x5c190(%rip),%rdx        # 0x7ffff4dea140 <_pcre2_default_tables_16>
=> 0x00007ffff4d8dfb0 <pcre2_compile_16+336>:	vmovq  %rbx,%xmm3
   0x00007ffff4d8dfb5 <pcre2_compile_16+341>:	mov    -0x46a0(%rbp),%edi
   0x00007ffff4d8dfbb <pcre2_compile_16+347>:	movq   $0x0,-0x44e8(%rbp)
   0x00007ffff4d8dfc6 <pcre2_compile_16+358>:	mov    0x28(%rsi),%rcx
   0x00007ffff4d8dfca <pcre2_compile_16+362>:	vmovq  %rsi,%xmm2
   0x00007ffff4d8dfcf <pcre2_compile_16+367>:	mov    %edi,-0x44cc(%rbp)
End of assembler dump.

@bugreporter234
Copy link
Author

The following diff might be closer to your use case https://gitlab.gnome.org/GNOME/babl/-/issues/96#note_2056377

I tested this babl patch and it fixes the babl and gimp crash, which is unrelated to pcre2, but the logic there seems to solve the problem.

@zherczeg
Copy link
Collaborator

the name vmovq is a VEX prefixed version of movq, so your compiler automatically generates AVX2 instructions. You need to set the instruction set (or the cpu type) before compiling.

@bugreporter234
Copy link
Author

I built the packages on Arch Linux in a chroot with makechrootpkg -uncCr $chrootdir -- -rCc. This basically runs makepkg -rCc in a chroot and makepkg has a config file /etc/makepkg.conf, where environment variables like CFLAGS with compiler and linker options are set. I didn't change these options in the chroot, because I thought it would be good to see if it works with the default options on Arch. If I understand correctly, then it shouldn't use the VEX prefixed instructions like vmovq, when the compiler options -march=x86-64 -mtune=generic are set. Is this correct?

On the host I found that the /etc/makepkg.conf had -march=native instead of -march=x86-64 -mtune=generic set. Could it be that I accidentally built the package on the host with -march=native and this was the cause for the crash on start?

Strangely, I couldn't reproduce it today for qpwgraph. Even when I build both packages with the -march=native config, it doesn't crash on start anymore and we're back to the crash when I click on 'Open' or 'Save as' behaviour (with the patch applied).

Backtrace and disassemble of crash after clicking 'Open' with the patch applied:

qpwgraph_gdb_backtrace_disass_patch_applied.txt

Build process of the pcre2 package with the patch applied:

pcre2-build.txt

@zherczeg
Copy link
Collaborator

Added to sljit zherczeg/sljit@56dbde0

@bugreporter234
Copy link
Author

Looks good! No more crashes. Tested sljit first and without the new patch I get 'illegal instruction' from the ./bin/sljit_test; with the new patch applied I get a PASS. And I made a pcre2 package for Arch Linux with the new patch and it fixes all the crashes. Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants