Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Update node's simdutf to 3.2.9 to fix AVX-related crashes #40446

Merged
merged 2 commits into from Dec 8, 2023

Conversation

hardfalcon
Copy link

@hardfalcon hardfalcon commented Nov 3, 2023

Description of Change

Node 18.16.1 ships with simdutf 3.2.2, which contains a faulty detection routine for AVX/AVX2/AVX512 support, leading to illegal instructions and thus crashes on some x86_64 Linux machines that have mitigations for the "gather data sampling" vulnerability enabled.

Fixes #40441

Checklist

  • PR description included and stakeholders cc'd

Release Notes

Notes: none

@hardfalcon hardfalcon requested a review from a team as a code owner November 3, 2023 22:11
Copy link

welcome bot commented Nov 3, 2023

💖 Thanks for opening this pull request! 💖

We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should update your pull request title to start with a semantic prefix.

Examples of commit messages with semantic prefixes:

  • fix: don't overwrite prevent_default if default wasn't prevented
  • feat: add app.isPackaged() method
  • docs: app.isDefaultProtocolClient is now available on Linux

Things that will help get your PR across the finish line:

  • Follow the JavaScript, C++, and Python coding style.
  • Run npm run lint locally to catch formatting errors earlier.
  • Document any user-facing changes you've made following the documentation styleguide.
  • Include tests when adding/changing behavior.
  • Include screenshots and animated GIFs whenever possible.

We get a lot of pull requests on this repo, so please be patient and we will get back to you as soon as we can.

@electron-cation electron-cation bot added the new-pr 🌱 PR opened in the last 24 hours label Nov 3, 2023
@hardfalcon
Copy link
Author

Feel free to add a test case for your test suite, run npm test on it, add or remove check boxes from the Checklist section, modify the commit message, or do whatever else you feel necessary to merge this.

@hardfalcon
Copy link
Author

hardfalcon commented Nov 3, 2023

I don't have a clue who the correct stakeholder would be, or how to find out who it might be, but looking at other commits that added patches in the patches/node/ directory of the 26-x-y branch, I think @jkleinsc could be the right person.

I don't have any test cases to add, and I won't be running your test suite on my machine (just compiling this to test if it actually solves the problem for me took already 12 hours), so I've taken the liberty of removing the corresponding checkbox from the checklist.

@codebytere codebytere self-requested a review November 6, 2023 03:41
Copy link
Member

@codebytere codebytere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're definitely interested in addressing this, but we can't take a diff like this as our own maintenance burden. This is something that should be PR'd to Node.js itself in the v18.x line, as if it's critical enough i would imagine they'd be open to it.

Additionally - you should be using build-tools. It enables access to our build cache and reduces build times significantly.

@hardfalcon
Copy link
Author

hardfalcon commented Nov 6, 2023

This issue actually has been fixed in node's v18.x branch four months ago:
https://github.com/nodejs/node/commits/v18.x/deps/simdutf

@codebytere
Copy link
Member

@hardfalcon what specifically is the fix? We would potentially be open to taking a smaller patch but we cannot update the entire version here.

@hardfalcon
Copy link
Author

hardfalcon commented Nov 6, 2023

This commit: simdutf/simdutf@55b107f

Note that although the commit message suggest it only fixes detection of AVX-512, it also fixes the faulty detection routine for AVX/AVX2.

On Linux systems with a Skylake (or newer) CPU, where the "gather data sampling" vulnerability has not been fixed through a microcode update (for of their CPUs, Intel have released microcode updates that fix the GDS vulnerability, but for certain Skylake models, they have decided to not release such an update), the Linux kernel disables AVX and AVX2 support if either specifically GDS mitigations are enabled, or if mitigations for spectre/meltdown type vulnerabilities in general are enabled. Unfortunately, buggy software that uses incorrect code to detect if AVX/AVX2 support is available only checks if the CPU reports that it supports AVX/AVX2, but does not check if the operating system does also allow it to use AVX/AVX2, which then leads to an illegal instruction, which results in the whole program/process crashing.

A similar issue affects nodejs (and hence electron) directly through a subtle bug in the base64 implementation it uses, but I'm currently trying to get the fix backported into nodejs (help would be appreciated btw).

@codebytere
Copy link
Member

@hardfalcon in that case, if you'd like to move this forward, we'd be open to taking back the patch you linked: simdutf/simdutf@55b107f to target the specific crash.

@electron-cation electron-cation bot removed the new-pr 🌱 PR opened in the last 24 hours label Nov 10, 2023
@anonrig
Copy link

anonrig commented Nov 11, 2023

cc @lemire

@codebytere
Copy link
Member

@hardfalcon do you still intend to work on this?

@lemire
Copy link

lemire commented Nov 15, 2023

Updating simdutf to the latest version (4.0.3 or better) should be safe.

@codebytere
Copy link
Member

codebytere commented Nov 15, 2023

@lemire see #40446 (comment) - it's not about whether it's safe, it is that we do not want to take 6k lines of update as a patch to a release branch. We're happy to take the specific fix though.

@hardfalcon
Copy link
Author

@hardfalcon do you still intend to work on this?

I've tried for days, and I'll probably still try some more, but I really wouldn't mind if somebody else took care of this (and of the base64 AVX fix I've mentioned).

Quite frankly, the tooling the Electron project (and the Chromium project, from which Electron seems to have inherited most of this madness) is using, and the development processes derived from that tooling, are orders of magnitude worse than I could have imagined (and that is really an accomplishment, because I didn't exactly have a positive opinion of those two projects or the JS ecosystem to begin with). I've spent literally hours just trying to wrap my head around electron/build-tools and what is required to allow me to apply 1-2 trivial patches and compile and test the result, and I'm pretty close to just giving up.

Even just running npm i @electron/build-tools, adding the result to the PATH, then running e init and e sync for the first time ended up downloading and using about 49 GB on my machine, downloading unfathomable amounts of useless stuff like precompiled PE binaries (I'm running this on a Linux machine, why does it download Windows binaries?!), and producing warnings like Your current version of Yarn is out of date. The latest version is "1.22.19", while you're on "1.15.2". (I don't have yarn installed at all, and I executed those commands on a fresh/empty user account, so that outdated yarn version can only come from Electron's build-tools).

Part of this abysmal development experience is caused by outdated or incorrent documentation, but the root cause appears to be the ridiculous development tools and processes the Chromium and Electron projects are using.

@MarshallOfSound
Copy link
Member

@hardfalcon I've gone and made an isolated PR here #40536 that includes just the requisite patch to base64. Although the build system may be slightly arcane in some places please consider how you interact with the project, building a project as large as Chromium + Electron + Node is a mammoth task and the build tooling will reflect that.

In this case for making this change the commands I ran on a fresh linux host (to replicate your environment) were:

e init
e sync -vv
cd third_party/electron_node
# Applied the patch to deps/base64/base64
git commit
e patches node # Exports the current state of node patches
cd ../../electron
git add patches # Changes to node patches are here
git commit

@hardfalcon
Copy link
Author

hardfalcon commented Nov 15, 2023

@MarshallOfSound: Thanks for taking care of that patch/fix. I haven't chosen to use Electron, but I'm faced with the reality of more and more software being based on it, and having to deal with the fallout of that development. I consider both Chromium and Electron prime examples of technical debt precisely because of the unmanageable complexities that you mention.

Also, thanks a lot for your concise example. IMO, something like that should be part of Electron's official documentation on patches. By the way, one specific issue that is missing even in your example, and that took me literally hours to figure out, was electron/build-tools#526.

@hardfalcon
Copy link
Author

In case you want to merge this: I that I haven't tested this yet, I have only tested the complete update to simdutf 3.2.9 until now.

@lemire
Copy link

lemire commented Nov 16, 2023

@codebytere I understand. What I am saying is that we preserve backward compatibility.

simdutf < 3.2.9 contains a flawed detection routine to decide whether
AVX and/or AVX2 can be used. In most cases, it works good enough, but
in some corner cases (combination of specific CPU model, microcode
version, operating system and OS configuration), that flawed detection
results in simdutf trying to use AVX/AVX2 although the OS doesn't allow
its use (for example as a mitigation measure against the "gather data
sampling" vulnerability), which then crashes the application with an
illegal instruction error.

This fix is only needed for node < 18.17.0, because later versions use
a simdutf version > 3.2.9.

Cherry-picked and backported from simdutf/simdutf@55b107f

Co-authored-by: Daniel Lemire <daniel@lemire.me>
Co-authored-by: easyaspi314 <easyaspi314@users.noreply.github.com>
Signed-off-by: Pascal Ernster <git@hardfalcon.net>
@hardfalcon
Copy link
Author

What should I do about trope's Invalid Backport message?

It's not really a backport of any commit from the main branch because it only uses one specific commit from a simdutf version that is several versions ahead ((3.2.9 vs 3.2.2), whilst the corresponding commit in the main branches took a whole node version update, which in turn incurred an update from simdutf version 3.2.2 to 3.2.12.

@jkleinsc
Copy link
Contributor

@jkleinsc jkleinsc closed this Nov 27, 2023
@hardfalcon
Copy link
Author

hardfalcon commented Nov 27, 2023

@jkleinsc: Sorry, but I think there's a misunderstanding. There are two distinct AVX-releated bugs in Electron, one in Node's simdutf dependency, and another one in Node's base64 dependency. Despite the erronous commit message of #40540, that PR only contains the fix for the AVX bug in base64, but not the fix for the AVX bug in simdutf. You can easily see this if you simply compare the actual contents/changes of #40446 and #40540.

@codebytere codebytere added semver/patch backwards-compatible bug fixes backport-check-skip Skip trop's backport validity checking 26-x-y labels Nov 29, 2023
@jkleinsc jkleinsc merged commit eb21738 into electron:26-x-y Dec 8, 2023
13 checks passed
Copy link

welcome bot commented Dec 8, 2023

Congrats on merging your first pull request! 🎉🎉🎉

Copy link

release-clerk bot commented Dec 8, 2023

No Release Notes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
26-x-y backport-check-skip Skip trop's backport validity checking semver/patch backwards-compatible bug fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants