ansible: use gcc 4.9 on CentOS 6 #809
Heads-up that we're dealing with a glibc bug on CentOS 6 for 8.x builds on the Linux distributions: nodesource/distributions#513. Waiting for Chris to shave that yak so we can understand more.
Here is what I know thus far: If you compile 8.3.0 on CentOS 6 with […]. If you compile 8.3.0 on CentOS 6 with […].
You see the same bad behavior if you use the compilers from […]. The error appears to be due to this bug, which wasn't actually fixed until a newer version of […]. This is problematic as the compiler requirement recently moved to "4.9.x", basically. So the options seem to be: […]
Happy to provide more info if I have it, just let me know. |
In other words, continue working around compiler bugs due to a glibc bug that was fixed 4.5 years ago. I'd say we should just raise the minimum supported glibc version to 2.17 (which came out on 25 Dec 2012). Enterprises still using CentOS 6 would need to upgrade to CentOS 7 or get a newer version of glibc. cc @nodejs/build |
Just speaking from experience, that won't happen. If we do that, we should understand that we are dropping support for CentOS 6 / RHEL 6. I'm not saying that to imply it's the wrong course of action, but we shouldn't kid ourselves about the consequences. |
AIUI successfully upgrading glibc is basically impossible (or at least not something you'd ever contemplate for a production system). |
It's possible to install a newer version in parallel: https://unix.stackexchange.com/a/299665/164374 |
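For completeness, the approach in that answer amounts to building a second glibc under its own prefix and invoking its dynamic loader directly; a hedged sketch (the version number and paths are illustrative, and this is not a recommendation for production systems):

```shell
# Build glibc 2.17 into an isolated prefix (illustrative version/paths).
curl -LO https://ftp.gnu.org/gnu/glibc/glibc-2.17.tar.gz
tar xf glibc-2.17.tar.gz
mkdir glibc-2.17/build && cd glibc-2.17/build
../configure --prefix=/opt/glibc-2.17
make -j"$(nproc)" && sudo make install

# Run a binary against the parallel glibc, leaving the system one untouched.
/opt/glibc-2.17/lib/ld-2.17.so --library-path /opt/glibc-2.17/lib:/lib64 ./node --version
```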
@seishun It's possible yes, but that seems like a really bad thing for the Node Foundation to suggest for end users for several reasons.
There's a bit of a pickle here already of course, because 8.3.0 is already out in the wild, and was built with a compiler that doesn't meet the current compiler requirements. But that's probably a small issue in the grand scheme of things. I still think that if we're not willing to use the older compiler, and there's no way to code around this, then it's time we simply say CentOS 6 / RHEL 6 is too old and it's not supported anymore. That day is going to come eventually, and it's going to come before those distros reach EOL, so if we're at that point so be it. |
The Node project shouldn't suggest anything. It should just state the glibc and compiler requirements and let the users (or repo maintainers) figure out how to satisfy them.
Java is maintained by Oracle, so they can follow whatever policy they like. But Node is mostly maintained by unpaid volunteers and having to work around old compiler bugs will turn people away. Requiring a glibc version that's less than 5 years old is not an unreasonable demand. |
Hey, like I said, I'm not in any way trying to suggest that saying "EL6 is too old, sorry" is the wrong answer here. Nor is it my decision to make. All I was trying to point out was that if that's what's happening, I think the project should be clear about it. |
I'd like to restate the working assumption that "a feature-frozen system is a feature-frozen system"; that is, if a user decided to freeze their OS for stability's sake, it is fairly safe to assume they have also frozen their […]
Oh man @refack my life would be so much easier if ^^^ was true. ;-) |
Well then you are our gatekeeper, personally I haven't seen many "minor version upgrade broke my system" bugs in the core repo. So thank you, and sorry 🤗 |
It's hard to overstate how poorly this will go down: "Enterprises still using CentOS 6 would need to upgrade to CentOS 7". Folks are still using EL5 in production out there. In very large companies, these kinds of moves are very complicated and therefore move very slowly. We're a little too comfortable catering to nimble startups and early-adopter types that are more used to working on the cutting edge, or close to it. As you go out further than that we hit real world constraints where these things are really painful. Does clang offer us a get-out-of-jail here perchance? |
There might be a simple workaround. Can someone try this patch?

```diff
diff --git a/deps/v8/src/trap-handler/handler-shared.cc b/deps/v8/src/trap-handler/handler-shared.cc
index 7b399f5eea..e4f0136be7 100644
--- a/deps/v8/src/trap-handler/handler-shared.cc
+++ b/deps/v8/src/trap-handler/handler-shared.cc
@@ -23,7 +23,9 @@ namespace v8 {
 namespace internal {
 namespace trap_handler {
-THREAD_LOCAL bool g_thread_in_wasm_code = false;
+// Note: sizeof(g_thread_in_wasm_code) must be > 1 to work around
+// https://sourceware.org/bugzilla/show_bug.cgi?id=14898
+THREAD_LOCAL int g_thread_in_wasm_code = false;
 size_t gNumCodeObjects = 0;
 CodeProtectionInfoListEntry* gCodeObjects = nullptr;
diff --git a/deps/v8/src/trap-handler/trap-handler.h b/deps/v8/src/trap-handler/trap-handler.h
index 5494c5fdb3..ed9459918b 100644
--- a/deps/v8/src/trap-handler/trap-handler.h
+++ b/deps/v8/src/trap-handler/trap-handler.h
@@ -65,7 +65,7 @@ inline bool UseTrapHandler() {
   return FLAG_wasm_trap_handler && V8_TRAP_HANDLER_SUPPORTED;
 }
-extern THREAD_LOCAL bool g_thread_in_wasm_code;
+extern THREAD_LOCAL int g_thread_in_wasm_code;
 inline bool IsThreadInWasm() { return g_thread_in_wasm_code; }
```
Unlikely. It's a glibc bug; gcc is not to blame.
@bnoordhuis Builds and runs fine with the patch. @rvagg Without undermining your point about large and slow companies, comparing "working on the cutting edge" with "using an EOL distro that was superseded 6 years ago" is a bit of a false dichotomy. |
Thanks. I'll check if upstream is amenable to this change. |
https://chromium-review.googlesource.com/c/612351 - let's wait and see. |
I'm curious why this isn't an issue with gcc 4.8.2 though. |
It's an issue for me with GCC 4.8.2 on RHEL6 |
Confirming (also) that 8.3.0 builds and runs on EL6 using that patch with the compilers from […].
As seems often the case, a great big thanks to @bnoordhuis for saving the day. A question (again I suppose for you Ben) since I'm not very familiar with how quickly things move through V8: is it reasonable to assume that they'll include this patch and that it will make it to the mainstream V8 that gets shipped with Node Soon (tm)? I ask because on the RPM packages side, it's fairly easy for me to use this patch and put in logic like "if el6 use the patch else don't". But if this is going to land in mainline next week or something, it's probably not worth it to do that, rebuild everything, and then back that out when it's no longer needed. Thanks again! |
V8 CLs can land very fast, if they are "amenable" to the actual change. |
Okay, thanks Ben. If that's the case I'll do a one-off rebuild just for EL6 RPM packages and this should be settled.
Also updates glibc version to 2.17. This means the minimum supported distros are RHEL/CentOS 7 and Ubuntu 14.04.

| Distro | Kernel | glibc |
| --- | --- | --- |
| Current | 2.6.32 | 2.12 |
| RHEL6 | 2.6.32 | 2.12 |
| New | 3.10.0 | 2.17 |
| RHEL7 | 3.10.0 | 2.17 |
| Ubuntu 14.04 | 3.13 | 2.19 |
| Ubuntu 16.04 | 4.4 | 2.23 |
| Latest | 4.12.5 | 2.26 |

Refs: nodejs/build#809
glibc before 2.17 has a bug that makes it impossible to execute binaries that have single-byte thread-local variables:

```
% node --version
node: error while loading shared libraries: cannot allocate memory in static TLS block
```

Work around that by making the one instance in the V8 code base an int.

See: https://sourceware.org/bugzilla/show_bug.cgi?id=14898
See: nodesource/distributions#513
See: nodejs/build#809

Change-Id: Iefd8009100cd93e26cf8dc5dc03f2d622b423385
Reviewed-on: https://chromium-review.googlesource.com/612351
Commit-Queue: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-by: Eric Holk <eholk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#47400}
@seishun since ATM we don't have a CentOS 6 x32 test machine and only one release machine, we probably don't want this to run on x32 machines, just in case. To sum up the discussion, we need these bot setups:
@refack Why do we need 4.8 on 64-bit? |
Testing & Building node v6 |
Makes sense. Does that mean the repo should have separate ansible scripts for 4.8 and >=4.9? |
(edited because I realized there is no 32-bit CentOS 6 test machine) On the other hand, we'll already have gcc 4.8 on CentOS 5 if we want to make sure node 6.x builds with gcc 4.8. Is there really a need to use gcc 4.8 on all other test machines for node 6.x? |
@sgallagher hi, I just saw you commenting on the c-ares thread and was wondering if you have any input on this? |
@refack Sorry, I've scanned the messages, but I'm not sure what question you want me to answer. Could you summarize, please? |
@sgallagher we want to upgrade to GCC 4.9.4; we couldn't find an RPM that will cross-compile on 64-bit CentOS 6 for a 32-bit target.
I'm sure @sgallagher would agree with me here... As an upstream senior developer for glibc, and the glibc team lead for Red Hat Enterprise Linux, I strongly suggest that issues like this be taken upstream for your distribution. If you are supporting CentOS 6, take it upstream to RHEL6 and file a bug. That way my team can evaluate the direct impact of the problem, perhaps even providing a workaround (if one exists). Your bug always adds to the data we have in our tracker, and that helps us evaluate risk vs. reward for a fix in RHEL6 (in production phase 3, accepting only urgent bug fixes).

The upstream glibc bug 14898 was a trivial fix of a singular internal constant (changed to -1) to allow a single-byte TLS variable to exist. It is a fix with limited scope and little risk, and it would fix this use case. The question to really answer is: is this the problem you are seeing? If it is, then a trivial workaround is to increase the size of the object beyond 1 byte; that safely changes the size of the allocated object so it does not clash with the use of the internal constant (which is also 1). So you have a workable solution to keep running on RHEL6, IMO.

Lastly, I do not think this is a compiler issue, but it might be; perhaps the compiler was able to optimize away a TLS variable that was never used, leaving you with just one variable of size 1 byte. It's hard for me to evaluate this in this ticket (there is a lot of noise here). Again, filing a ticket upstream would help us get to the bottom of this particular use case.
@refack I would suggest you take a look at the […]. You can use both the epel-6-x86_64 and epel-6-i386 configs to build for 64-bit and 32-bit, respectively. Feel free to reach out to me privately if you need help setting it up.
@codonell thanks for the reply. The TLS issue was indeed a blocker, but as you state it was resolved. IMHO what we are facing now is more of a configuration/ABI-compatibility issue. That is we are now stuck at the link stage:
P.S. @codonell @sgallagher: are there any obvious caveats in building on CentOS/RHEL6 with a non-"canonical" GCC toolchain (i.e. 4.9.4)?
@refack Do you mean a custom-made toolchain or one of those from Red Hat’s Developer Toolset releases? Those at least are fully supported by Red Hat for production use (with a Red Hat subscription, of course).
AFAIK, since node is an OSS project there's a tendency to use FOSS, so till now we used […]
@refack Your question about building with a non-canonical toolchain is a very good one. In the case of devtoolset, the tooling is designed specifically to layer a new compiler (and parts of the runtime) on top of system runtimes. If you build a new toolchain from whole cloth, you are in a very different position, a position where you become responsible for bundling (a) the compiler runtime, (b) the language runtimes and (c) any new dependencies therein. Therefore using devtoolset (developer toolset) vs. using a freshly built gcc (which may need a new binutils, and whose generated code may or may not need a new glibc) are very different questions. I would suggest staying with devtoolset as much as you can since the results are predictable and well studied. |
@refack ping... what's the status of this? |
summary:
Ran CI with a commit that fails on gcc < 4.9. Looks good: https://ci.nodejs.org/job/node-test-commit-linux/14092/nodes=centos6-64/ Just curious, how did you solve the problem of testing and releasing on Node versions < 9.x? Also, while I appreciate your work here, upgrading gcc on select machines doesn't help much if it's not going to be upgraded on all machines used for CI on master. Which is why I'm going to wait until #797 (comment) is resolved before continuing with other machines that require upgrading. |
I tested the binary output by […]
👍 I'll use https://ci.nodejs.org/job/node-test-commit/14000/ as the basis for an upgrade task list |
Currently CentOS 6 machines use gcc 4.8 while Node.js requires "gcc and g++ 4.9.4 or newer". See #762.

- `devtoolset-3` has gcc 4.9.1, still older than gcc 4.9.4, so too old.
- `devtoolset-5` has "gcc (GCC) 5.3.1 20160406".
- `devtoolset-6` has "gcc (GCC) 6.3.1 20170216", good enough.

The CERN repo doesn't have `devtoolset` newer than 2, so I replaced that with Software Collections.
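For reference, getting a devtoolset compiler from Software Collections onto a CentOS 6 box looks roughly like this (package names are the SCLo ones; treat this as a provisioning sketch rather than the repo's actual ansible logic):

```shell
# Enable the Software Collections repository, then install devtoolset-6.
sudo yum install -y centos-release-scl
sudo yum install -y devtoolset-6-gcc devtoolset-6-gcc-c++

# Run a single command (or a whole build) with the newer toolchain active,
# without replacing the system gcc.
scl enable devtoolset-6 'gcc --version'
```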