New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Update node's simdutf to 3.2.9 to fix AVX-related crashes #40446
Conversation
💖 Thanks for opening this pull request! 💖 We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should update your pull request title to start with a semantic prefix. Examples of commit messages with semantic prefixes:
Things that will help get your PR across the finish line:
We get a lot of pull requests on this repo, so please be patient and we will get back to you as soon as we can. |
Feel free to add a test case for your test suite, run |
I don't have a clue who the correct stakeholder would be, or how to find out who it might be, but looking at other commits that added patches in the I don't have any test cases to add, and I won't be running your test suite on my machine (just compiling this to test if it actually solves the problem for me took already 12 hours), so I've taken the liberty of removing the corresponding checkbox from the checklist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're definitely interested in addressing this, but we can't take a diff like this as our own maintenance burden. This is something that should be PR'd to Node.js itself in the v18.x
line, as if it's critical enough i would imagine they'd be open to it.
Additionally - you should be using build-tools
. It enables access to our build cache and reduces build times significantly.
This issue actually has been fixed in node's |
@hardfalcon what specifically is the fix? We would potentially be open to taking a smaller patch but we cannot update the entire version here. |
This commit: simdutf/simdutf@55b107f Note that although the commit message suggest it only fixes detection of AVX-512, it also fixes the faulty detection routine for AVX/AVX2. On Linux systems with a Skylake (or newer) CPU, where the "gather data sampling" vulnerability has not been fixed through a microcode update (for of their CPUs, Intel have released microcode updates that fix the GDS vulnerability, but for certain Skylake models, they have decided to not release such an update), the Linux kernel disables AVX and AVX2 support if either specifically GDS mitigations are enabled, or if mitigations for spectre/meltdown type vulnerabilities in general are enabled. Unfortunately, buggy software that uses incorrect code to detect if AVX/AVX2 support is available only checks if the CPU reports that it supports AVX/AVX2, but does not check if the operating system does also allow it to use AVX/AVX2, which then leads to an illegal instruction, which results in the whole program/process crashing. A similar issue affects nodejs (and hence electron) directly through a subtle bug in the base64 implementation it uses, but I'm currently trying to get the fix backported into nodejs (help would be appreciated btw). |
@hardfalcon in that case, if you'd like to move this forward, we'd be open to taking back the patch you linked: simdutf/simdutf@55b107f to target the specific crash. |
cc @lemire |
@hardfalcon do you still intend to work on this? |
Updating simdutf to the latest version (4.0.3 or better) should be safe. |
@lemire see #40446 (comment) - it's not about whether it's safe, it is that we do not want to take 6k lines of update as a patch to a release branch. We're happy to take the specific fix though. |
I've tried for days, and I'll probably still try some more, but I really wouldn't mind if somebody else took care of this (and of the base64 AVX fix I've mentioned). Quite frankly, the tooling the Electron project (and the Chromium project, from which Electron seems to have inherited most of this madness) is using, and the development processes derived from that tooling, are orders of magnitude worse than I could have imagined (and that is really an accomplishment, because I didn't exactly have a positive opinion of those two projects or the JS ecosystem to begin with). I've spent literally hours just trying to wrap my head around Even just running Part of this abysmal development experience is caused by outdated or incorrent documentation, but the root cause appears to be the ridiculous development tools and processes the Chromium and Electron projects are using. |
@hardfalcon I've gone and made an isolated PR here #40536 that includes just the requisite patch to In this case for making this change the commands I ran on a fresh linux host (to replicate your environment) were:
|
@MarshallOfSound: Thanks for taking care of that patch/fix. I haven't chosen to use Electron, but I'm faced with the reality of more and more software being based on it, and having to deal with the fallout of that development. I consider both Chromium and Electron prime examples of technical debt precisely because of the unmanageable complexities that you mention. Also, thanks a lot for your concise example. IMO, something like that should be part of Electron's official documentation on patches. By the way, one specific issue that is missing even in your example, and that took me literally hours to figure out, was electron/build-tools#526. |
4d6cb21
to
7fd1538
Compare
In case you want to merge this: I that I haven't tested this yet, I have only tested the complete update to simdutf 3.2.9 until now. |
@codebytere I understand. What I am saying is that we preserve backward compatibility. |
simdutf < 3.2.9 contains a flawed detection routine to decide whether AVX and/or AVX2 can be used. In most cases, it works good enough, but in some corner cases (combination of specific CPU model, microcode version, operating system and OS configuration), that flawed detection results in simdutf trying to use AVX/AVX2 although the OS doesn't allow its use (for example as a mitigation measure against the "gather data sampling" vulnerability), which then crashes the application with an illegal instruction error. This fix is only needed for node < 18.17.0, because later versions use a simdutf version > 3.2.9. Cherry-picked and backported from simdutf/simdutf@55b107f Co-authored-by: Daniel Lemire <daniel@lemire.me> Co-authored-by: easyaspi314 <easyaspi314@users.noreply.github.com> Signed-off-by: Pascal Ernster <git@hardfalcon.net>
7fd1538
to
5686262
Compare
What should I do about trope's It's not really a backport of any commit from the |
|
@jkleinsc: Sorry, but I think there's a misunderstanding. There are two distinct AVX-releated bugs in Electron, one in Node's |
Congrats on merging your first pull request! 🎉🎉🎉 |
No Release Notes |
Description of Change
Node 18.16.1 ships with simdutf 3.2.2, which contains a faulty detection routine for AVX/AVX2/AVX512 support, leading to illegal instructions and thus crashes on some x86_64 Linux machines that have mitigations for the "gather data sampling" vulnerability enabled.
Fixes #40441
Checklist
Release Notes
Notes: none