Support older core2 processors #115485
Force-enabling SSE4.2 causes signal 4 (Illegal instruction) on older processors. Enabling only SSE4.1 retains similar performance benefits while maintaining compatibility.

Ideally, multiple code paths would be compiled into the same binary, and the best one selected at runtime. That way, newer CPU features (e.g. SSE4.2, AVX-512) can be used on modern CPUs while keeping compatibility with systems that do not support them.

Closes godotengine#115001, closes godotengine#113617.
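For illustration, here is a minimal sketch of the runtime selection described above. It is not code from this PR or from Godot: the `dot_*` functions, `select_dot` and `DotFn` are hypothetical names, and it assumes GCC or Clang on x86-64, since it relies on the `target` function attribute and the `__builtin_cpu_supports` builtin. The SSE4.1 path is written with intrinsics and compiled with SSE4.1 enabled for that one function only, so the rest of the binary keeps its lower baseline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

// Baseline path: plain scalar code, safe on any x86-64 CPU.
static int32_t dot_scalar(const int32_t *a, const int32_t *b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

// SSE4.1 path: only this function is compiled with SSE4.1 enabled.
// _mm_mullo_epi32 is an SSE4.1 instruction, so this must never run
// on a CPU that lacks it.
__attribute__((target("sse4.1")))
static int32_t dot_sse41(const int32_t *a, const int32_t *b, size_t n) {
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + i));
        acc = _mm_add_epi32(acc, _mm_mullo_epi32(va, vb));
    }
    // Sum the four 32-bit lanes, then handle the leftover elements.
    alignas(16) int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i *>(lanes), acc);
    int32_t sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

using DotFn = int32_t (*)(const int32_t *, const int32_t *, size_t);

// Ask the CPU once which path it can run.
static DotFn select_dot() {
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") ? dot_sse41 : dot_scalar;
}

// Public entry point: the choice is made on first call and then reused.
int32_t dot(const int32_t *a, const int32_t *b, size_t n) {
    assert(a != nullptr && b != nullptr);
    static const DotFn impl = select_dot();
    return impl(a, b, n);
}
```

Callers only ever see `dot`, so the same binary runs on a CPU without SSE4.1 and still takes the faster path where it is available. The cost is one indirect call per invocation, which is why this pattern is usually applied to hot loops identified with a profiler rather than everywhere.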
Did you test this on an actual Core 2 CPU? As far as I know, downgrading to SSE4.1 won't help. We would need to downgrade all the way to SSE2 (which matches Godot 4.4 and prior behavior). See also godotengine/godot-proposals#13644. The 32-bit binaries still target SSE2, so they will keep working on Core 2 CPUs for the foreseeable future.
This can't be done for autovectorization, only for hand-written intrinsics.
Yes, I own a Q9550 (which I used to build and run Godot on), just like the OP of 13644. I also own a Q6600, which supports up to SSE3. Intel ARK used to have a nice listing of what each Core 2 processor supports, but a couple of years ago they decided to delete all the pages covering Core 2 processors. This information can still be found on Wikipedia: apparently all Core 2 processors on the 45nm node support SSE4.1.
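With the ARK pages gone, one way to confirm what a given machine supports is to ask the CPU directly. The following is a small sketch, assuming GCC or Clang on x86; `highest_sse_level` is a hypothetical helper name, not something from this PR.

```cpp
#include <string>

// Returns the highest SSE level the running CPU reports, checked from
// newest to oldest. Uses the GCC/Clang __builtin_cpu_supports builtin.
std::string highest_sse_level() {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.2")) return "SSE4.2";
    if (__builtin_cpu_supports("sse4.1")) return "SSE4.1";
    if (__builtin_cpu_supports("ssse3")) return "SSSE3";
    if (__builtin_cpu_supports("sse3")) return "SSE3";
    if (__builtin_cpu_supports("sse2")) return "SSE2";
    return "below SSE2";
}
```

Printing the result on each test machine shows directly whether an SSE4.1 or SSE4.2 baseline would run there.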
One could compile the same file multiple times (one for SSE2, one for SSE3, one for SSE4, etc). Extract specific symbols that received performance improvements (a profiler will tell you) using |
I can probably rewrite the PR to add a build flag, as proposed in 13644.
I would also recommend that, as it would keep the SSE4.2 baseline as the default while letting users and distros compile custom builds for older CPUs. If you can do it, please do; it would be an ideal way of handling baselines. For example, the Azahar 3DS emulator has an option to disable the SSE4.2 baseline and allow custom instruction-set flags in CFLAGS and the like, so you can disable SSE4.2 and still use the highest instruction set that the non-SSE4.2 CPU supports. Thank you for not letting this topic go cold; I was going to bring it up within a few days.
I made a PR and added a build flag ( |