rocketchat crashes with segmentation fault since nodejs 8.10 #19274

ghost · 2018-03-10T15:12:04Z

Version: 8.10
Platform: Gentoo Linux
Subsystem: v8 engine

Hello Devs.

Since Node.js 8.10, Rocket.Chat (https://rocket.chat/) crashes with a segmentation fault. I compiled Node.js with debug enabled and got this:

rocketchat@pages /opt/rocketchat $ gdb --args /usr/bin/node main.js 
GNU gdb (Gentoo 8.1 p1) 8.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/node...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/node main.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff5b65700 (LWP 11199)]
[New Thread 0x7ffff5364700 (LWP 11200)]
[New Thread 0x7ffff4b63700 (LWP 11201)]
[New Thread 0x7ffff4362700 (LWP 11202)]
[New Thread 0x7ffff3956700 (LWP 11203)]
[Thread 0x7ffff3956700 (LWP 11203) exited]
[New Thread 0x7ffff3956700 (LWP 11204)]
[New Thread 0x7ffff3054700 (LWP 11205)]
[New Thread 0x7ffff2853700 (LWP 11206)]
[New Thread 0x7ffff2052700 (LWP 11207)]


#
# Fatal error in ../deps/v8/src/parsing/parser.cc, line 542
# Debug check failed: ThreadId::Current().Equals( outer_scope_info->GetIsolate()->thread_id()).
#

Thread 1 "node" received signal SIGILL, Illegal instruction.
0x000055555653e669 in v8::base::OS::Abort() ()
(gdb)

And:

(gdb) backtrace
#0  0x000055555653e669 in v8::base::OS::Abort() ()
#1  0x000055555653a8c8 in V8_Fatal(char const*, int, char const*, ...) ()
#2  0x000055555611a861 in v8::internal::Parser::DeserializeScopeChain(v8::internal::ParseInfo*, v8::internal::MaybeHandle<v8::internal::ScopeInfo>)
    ()
#3  0x000055555613f7bd in v8::internal::Parser::ParseFunction(v8::internal::Isolate*, v8::internal::ParseInfo*, v8::internal::Handle<v8::internal::SharedFunctionInfo>) ()
#4  0x0000555556145979 in v8::internal::parsing::ParseFunction(v8::internal::ParseInfo*, v8::internal::Handle<v8::internal::SharedFunctionInfo>, v8::internal::Isolate*) ()
#5  0x0000555555dcf664 in v8::internal::Compiler::Compile(v8::internal::Handle<v8::internal::SharedFunctionInfo>, v8::internal::Compiler::ClearExceptionFlag) ()
#6  0x0000555555dd0e8e in v8::internal::Compiler::Compile(v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Compiler::ClearExceptionFlag) ()
#7  0x0000555556212a6b in v8::internal::Runtime_CompileLazy(int, v8::internal::Object**, v8::internal::Isolate*) ()
#8  0x00001baeb5d843c4 in ?? ()
#9  0x00001baeb5ed1595 in ?? ()
#10 0x00001baeb5d84301 in ?? ()
#11 0x00007ffff3155340 in ?? ()
#12 0x0000000000000006 in ?? ()
#13 0x00007ffff31553a0 in ?? ()
#14 0x00001baeb5d88217 in ?? ()
#15 0x000039a4c98f2ae9 in ?? ()
#16 0x00000b054ee022d1 in ?? ()
#17 0x000039a4c98f2ae9 in ?? ()
#18 0x0000000000000000 in ?? ()
(gdb)

Any idea how to fix this?

Here is the rocketchat issue: RocketChat/Rocket.Chat#10060

thanks and cheers

targos · 2018-03-10T17:02:08Z

@nodejs/v8

hashseed · 2018-03-10T18:14:42Z

I'd check whether this is fixed with newer V8 and bisect for the fix.

targos · 2018-03-10T18:16:16Z

I could reproduce it. Then I wanted to run it again with node master, but there is a dependency that uses a V8 API that doesn't exist anymore.

hashseed · 2018-03-10T18:22:22Z

I commented on the Rocket Chat issue. From the debug build output it looks like API abuse to me.

hashseed · 2018-03-10T18:57:39Z

I was wrong. This rather looks like the parser is running on a thread it is not supposed to be on.

ghost · 2018-03-11T08:25:51Z

If you have a patch at some point, I'm happy to test.

benjamn · 2018-03-12T14:17:28Z

I've seen the thread ID error when running a debug build of Node and using https://github.com/laverdet/node-fibers (which I believe RocketChat uses, since it's a Meteor app?):

# Fatal error in ../deps/v8/src/parsing/parser.cc, line 542
# Debug check failed: ThreadId::Current().Equals( outer_scope_info->GetIsolate()->thread_id()).

Specifically, the fibers library implements coroutines, which operate in a way that looks like cooperative multithreading from the perspective of V8 (even though fibers does not actually spawn threads), so any parts of Node that are not prepared to deal with multiple (cooperating, non-overlapping) threads can get confused.

V8 itself is designed to tolerate multiple threads as long as you're careful to lock and unlock the Isolate object, so the fibers library isn't doing anything forbidden or dangerous as far as V8 is concerned, though Node is a little less thread-aware than V8 in some areas.

While the ThreadId::Current().Equals(...) assertion failure is interesting, I don't think it's the root of the problem, since the segmentation fault still happens with non-debug builds of Node, where this check is omitted. I can provide a patched version of the fibers library that does not fail this assertion, if that's helpful. I have seen segmentation faults with Node 8.10.0 running locally with this patched version of fibers, though, so I think something else must be going wrong.
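
To make the shape of that debug check concrete, here is a minimal sketch in Python. It is a toy model, not V8 or fibers code: `ToyIsolate`, `debug_check` and `check_on_new_thread` are hypothetical names invented for illustration, and a real OS thread is used only because it is the simplest way to obtain a different thread ID (fibers, as noted above, does not actually spawn threads).

```python
import threading


class ToyIsolate:
    """Hypothetical stand-in for an isolate: remembers the thread that created it."""

    def __init__(self):
        self.thread_id = threading.get_ident()


def debug_check(isolate):
    """Mirrors the shape of ThreadId::Current().Equals(isolate->thread_id())."""
    return threading.get_ident() == isolate.thread_id


def check_on_new_thread(isolate):
    """Run the same check from a different thread and report the result."""
    result = []
    worker = threading.Thread(target=lambda: result.append(debug_check(isolate)))
    worker.start()
    worker.join()
    return result[0]


if __name__ == "__main__":
    iso = ToyIsolate()
    print(debug_check(iso))          # True: same thread that created the isolate
    print(check_on_new_thread(iso))  # False: a different thread ID is observed
```

In this model the check fails whenever the code runs under a thread ID other than the one recorded by the isolate, which is all the debug assertion tests; it says nothing about whether the access was otherwise safe.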

bnoordhuis · 2018-03-12T16:12:19Z

node-fibers mucks around with thread-local storage to trick V8 into believing it's running on multiple threads. Some recent change probably broke that hack but that's ultimately a node-fibers issue.

hashseed · 2018-03-12T16:28:17Z

umm... I'm shocked.

benjamn · 2018-03-12T16:50:20Z

Just to be clear: we have not pinned this down to fibers yet, though that seems possible, and of course those of us who rely on fibers are willing to put in work to keep them compatible with Node and V8. Let’s wait and see what the real problem is before assuming the solution will be unwelcome to anyone?

benjamn · 2018-03-12T16:51:56Z

As soon as we know this is a fibers issue for sure, I definitely support moving the discussion over to that repo!

hashseed · 2018-03-12T16:52:47Z

Definitely support finding out the root cause. I'm just expressing my surprise that node-fibers would manipulate TLS.

benjamn · 2018-03-12T17:02:38Z

In principle, fibers could be implemented by actually spawning new threads, and then there would be nothing weird about how it works. I agree there’s some trickery involved in getting V8 to think there are multiple threads involved, without incurring the full overhead of spawning threads, and if those tricks are the source of this problem, then of course that’s a fibers problem. Those tricks have worked fairly well for several years, so I’m confident we can find a solution that involves fewer tricks (including just spawning real threads, if necessary).

hashseed · 2018-03-12T17:05:26Z

Maybe someone can bisect the changes between 8.9 and 8.10 to see what caused the issue to surface?

benjamn · 2018-03-12T17:21:35Z

@hashseed Yep! We (@abernix and I) will report back when we’ve bisected it down (we have a reliable repro).

MylesBorins · 2018-03-12T18:21:50Z

@benjamn has anyone reported similar problems on 9.x? It seems odd that this is just showing up now

benjamn · 2018-03-12T18:27:33Z

@MylesBorins If this is a meteor-specific problem (fibers-related or otherwise), then it’s relevant to note that each version of meteor ships with a specific (stable) node version, and 8.9.4 is the most recent version we’ve shipped (in meteor 1.6.1). You could run an unreleased build of meteor with Node 9, but that’s uncommon.

MylesBorins · 2018-03-12T23:46:17Z

@benjamn with that in mind we should try and figure out how to get some of the meteor stuff into CITGM so we can find these problems early 😄

abernix · 2018-03-16T16:06:24Z

@hashseed It took a little bit of dancing around and shuffling commits to avoid some of the surrounding un-buildable commits, but this bisects down to bede7a3 (deps: update V8 to 6.2.414.46).

Is there a documented process somewhere as to how this larger V8 update commit is formed? Is it essentially a diff between the current state of deps/v8 and that exact revision of V8? (Any advice on how to proceed within that commit would be greatly appreciated, but that seems like my next step.)

hashseed · 2018-03-16T16:16:54Z

@abernix thanks for bisecting!

It seems that the commit updated V8 from 6.1.534.51 to 6.2.414.46. Bisecting across patch levels probably will cause issues, so I think it's best to look between 6.1.534 and 6.2.414.46.

In a V8 check out I get:

git log 6.1.534..6.2.414.46 --oneline | wc -l
1043

So bisecting about 10 steps in V8 should give you the culprit. To do so, check out V8 and update Node.js like this on every bisect step:

v8/tools/release/update_node.py $V8_DIR $NODE_DIR

Then build and try to repro.
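
As a rough check on the "about 10 steps" estimate, the sketch below runs a plain binary search over 1043 candidate commits and counts how many builds it needs. It assumes a strictly linear history and a perfectly reliable repro, which real `git bisect` runs do not guarantee; `find_first_bad` is a hypothetical helper, not part of git or the V8 tooling.

```python
def find_first_bad(num_commits, is_bad):
    """Binary-search for the first bad commit.

    Commits are indexed 0..num_commits-1 from oldest to newest. The newest
    commit is assumed bad and the parent of commit 0 is assumed good.
    Returns (index of first bad commit, number of builds tested).
    """
    lo, hi = 0, num_commits - 1  # the first bad commit lies in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one build-and-repro round
        if is_bad(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, tests


if __name__ == "__main__":
    # Try every possible culprit position and collect the step counts.
    counts = {
        find_first_bad(1043, lambda i, c=culprit: i >= c)[1]
        for culprit in range(1043)
    }
    print(sorted(counts))  # [10, 11]
```

Under these assumptions every culprit position is found in 10 or 11 build-and-test rounds.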

benjamn · 2018-03-19T17:31:56Z

🎉 Good news! 🐰🎩🔮😌

After @abernix bisected Node between v8.9.4 and v8.10.0 to confirm that the V8 6.2 update introduced the problem, and I bisected V8 between 6.1.534 (the version used in Node 8.9.4) and 6.2.414 (the version used in Node 8.10.0) to discover that the problem began happening between 6.2.145 and 6.2.146, we found a single change among the 42 commits in that range that pretty conclusively causes the segmentation faults.

I say "conclusively" because I can rebuild Node 8.10.0 with just this one commit reverted, and the problem disappears in every case where we were previously able to reproduce it.

Here's the final output of git bisect log after running git bisect start 6.2.146~1 6.2.145~1:

# bad: [28f25699ab2395324a425c3cb07ade53c79de322] [parser] Various cleanup for async function parsing
# good: [fa53a0dae76b186d99259bf775934a6623162796] [wasm] Fix API prototype chains
git bisect start '6.2.146~1' '6.2.145~1'
# good: [9735d7f1095efa3be8a289937d70d93bab7b24ad] [wasm] Fix link error messages to be more indicative of the actual error
git bisect good 9735d7f1095efa3be8a289937d70d93bab7b24ad
# good: [448a1d4bb513613d8c39d2e4eafbb2642602e651] [ic] Drop Array constructor support from CallIC.
git bisect good 448a1d4bb513613d8c39d2e4eafbb2642602e651
# bad: [ea0e1e21ecc13884302e0c77edad67659f2e68b4] Fixing failure on GC stress.
git bisect bad ea0e1e21ecc13884302e0c77edad67659f2e68b4
# good: [e91b96922efbcdc72db3684df8d065d560bfa900] [Compiler] Move construction of CompilationInfo into GenerateUnoptimizedCode
git bisect good e91b96922efbcdc72db3684df8d065d560bfa900
# good: [fd87a3c4236ed5bef4252818e40a38f020cdf671] [wasm] Remove redundant parameter
git bisect good fd87a3c4236ed5bef4252818e40a38f020cdf671
# first bad commit: [ea0e1e21ecc13884302e0c77edad67659f2e68b4] Fixing failure on GC stress.

In other words, the offending commit appears to be v8/v8@ea0e1e2, which (as @abernix points out), was originally intended to fix a problem introduced by v8/v8@e15f554, though that commit was later reverted by v8/v8@a193fde, and has not successfully re-landed since then.

Since the problem that v8/v8@ea0e1e2 fixed is no longer present, arguably v8/v8@ea0e1e2 itself could be (should be?) reverted in V8 and then cherry-picked into Node, even if we don't know exactly why it caused this particular problem. See below for a plausible theory, but I'll be the first to admit it's probably not entirely correct. Nevertheless, I think we can reach the same conclusion without getting wrapped up in a debate about the theory.

Recommendation

We believe that v8/v8@ea0e1e2 should be reverted, and the revert commit should be cherry-picked into Node's copy of V8, and then hopefully released in Node v8.10.1.

Though the use of fibers in Meteor certainly does exacerbate this problem (see below for some ideas about why this makes sense), we do not know of any changes that could be made to https://github.com/laverdet/node-fibers or Meteor that would fix this problem.

If this recommendation seems reasonable, what's the best way to proceed? Specifically, should I submit a V8 PR/CL based on 6.2.414.42, or the current version (6.7.106, much more recent)?

Detailed theory

Content warning: leaps of logic, speculation, hand-waving 👋

Taking a closer look at the commit in question, I think there are a few observations we can make:

commit ea0e1e21ecc13884302e0c77edad67659f2e68b4
Author: Juliana Franco <jupvfranco@google.com>
Date:   Fri Aug 4 10:45:33 2017 +0200

    Fixing failure on GC stress.
    
    This bug was introduced by the CL
    https://chromium-review.googlesource.com/c/586707
    
    With these changes we make sure that the object being deoptimized
    does not point to code objects that have been already collected.
    The CL https://chromium-review.googlesource.com/c/596027 did not
    fix this problem because we were only invalidating embedded objects
    reachable from the stack, however it is possible that there are some
    dangling references in objects not on the stack. Thus we consider
    all the optimized code objects that are marked for deoptimization.
    
    Bug: v8:751825
    Change-Id: I3a6410c2bf556fa254c54a25e1f49d7356b9e51d
    Reviewed-on: https://chromium-review.googlesource.com/601967
    Commit-Queue: Juliana Patricia Vicente Franco <jupvfranco@google.com>
    Reviewed-by: Jaroslav Sevcik <jarin@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#47163}

diff --git a/src/deoptimizer.cc b/src/deoptimizer.cc
index 2f5657e82e..3e90127db3 100644
--- a/src/deoptimizer.cc
+++ b/src/deoptimizer.cc
@@ -301,71 +301,70 @@ void Deoptimizer::DeoptimizeMarkedCodeForContext(Context* context) {
         safe_to_deopt_topmost_optimized_code = safe_if_deopt_triggered;
       }
     }
   }
 #endif
 
   // Move marked code from the optimized code list to the deoptimized
   // code list.
   // Walk over all optimized code objects in this native context.
   Code* prev = NULL;
   Object* element = context->OptimizedCodeListHead();
   while (!element->IsUndefined(isolate)) {
     Code* code = Code::cast(element);
     CHECK_EQ(code->kind(), Code::OPTIMIZED_FUNCTION);
     Object* next = code->next_code_link();
 
     if (code->marked_for_deoptimization()) {
+      // Make sure that this object does not point to any garbage.
+      code->InvalidateEmbeddedObjects();
       if (prev != NULL) {
         // Skip this code in the optimized code list.
         prev->set_next_code_link(next);
       } else {
         // There was no previous node, the next node is the new head.
         context->SetOptimizedCodeListHead(next);
       }
 
       // Move the code to the _deoptimized_ code list.
       code->set_next_code_link(context->DeoptimizedCodeListHead());
       context->SetDeoptimizedCodeListHead(code);
     } else {
       // Not marked; preserve this element.
       prev = code;
     }
     element = next;
   }
 
   // Finds the with activations of codes marked for deoptimization, search for
   // the trampoline to the deoptimizer call respective to each code, and use it
   // to replace the current pc on the stack.
   for (StackFrameIterator it(isolate, isolate->thread_local_top()); !it.done();
        it.Advance()) {
     if (it.frame()->type() == StackFrame::OPTIMIZED) {
       Code* code = it.frame()->LookupCode();
       if (code->kind() == Code::OPTIMIZED_FUNCTION &&
           code->marked_for_deoptimization()) {
         // Obtain the trampoline to the deoptimizer call.
         SafepointEntry safepoint = code->GetSafepointEntry(it.frame()->pc());
         int trampoline_pc = safepoint.trampoline_pc();
         DCHECK_IMPLIES(code == topmost_optimized_code,
                        safe_to_deopt_topmost_optimized_code);
         // Replace the current pc on the stack with the trampoline.
         it.frame()->set_pc(code->instruction_start() + trampoline_pc);
-
-        // Make sure that this object does not point to any garbage.
-        code->InvalidateEmbeddedObjects();
       }
     }
   }
 }

Because this change is so simple, it seems reasonable to conclude that code->InvalidateEmbeddedObjects() must be getting called more often than before.
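
A toy model may help show why the call is reached more often. The sketch below is not V8 code: `ToyCode`, `invalidated_before` and `invalidated_after` are hypothetical names, and the model assumes only what the diff shows, namely that the call moved from the loop over the current stack's frames to the loop over the context's optimized code list.

```python
from dataclasses import dataclass


@dataclass
class ToyCode:
    """Hypothetical stand-in for an optimized code object."""
    name: str
    marked_for_deoptimization: bool


def invalidated_before(code_list, current_stack):
    """Old placement: invalidate only marked code found on the current stack."""
    return {c.name for c in current_stack if c.marked_for_deoptimization}


def invalidated_after(code_list, current_stack):
    """New placement: invalidate every marked code object in the context's list."""
    return {c.name for c in code_list if c.marked_for_deoptimization}


if __name__ == "__main__":
    a = ToyCode("A", True)    # marked, and on the current stack
    b = ToyCode("B", True)    # marked, but not on the current stack
    c = ToyCode("C", False)   # not marked at all
    code_list, current_stack = [a, b, c], [a]
    print(sorted(invalidated_before(code_list, current_stack)))  # ['A']
    print(sorted(invalidated_after(code_list, current_stack)))   # ['A', 'B']
```

In this model the new placement always invalidates at least as many objects as the old one, because every marked code object on the current stack is also in the context's list.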

What could go wrong if code->InvalidateEmbeddedObjects() is called at an inappropriate time? Judging just from the name, I would guess that more embedded objects could be invalidated. By itself, that doesn't sound like a problem, as long as the invalidation logic is sound, so we need to consider why v8/v8@ea0e1e2 might lead to unsound invalidation.

As @abernix suggested in this comment, the original call to code->InvalidateEmbeddedObjects() was protected by the scoped StackFrameIterator it(isolate, isolate->thread_local_top()) object, which means (among other things) that the body of the loop would not be executed when isolate->InContext() returns false. Reasoning: each StackFrameIterator has an AsyncStackTrace member, and AsyncStackTrace::capture doesn't capture any frames if isolate->InContext() is false.

More generally, moving code->InvalidateEmbeddedObjects() out of the stack frame loop relaxes any assumptions that were previously enforced by the StackFrameIterator. In particular, since isolate->context() returns a thread-local pointer, my tentative theory is that isolate->InContext() is supposed to return false while execution is happening on a different thread, which previously prevented the invalidation from happening in those cases.

This theory is relevant to our previous fibers discussion, but not limited to just fibers. We believe any multithreaded program could benefit from reverting v8/v8@ea0e1e2.

How does this become a problem for Node?

In the reproduction we used for bisecting, almost every core dump stack trace included this line:

  if (env_->tick_callback_function()->Call(process, 0, nullptr).IsEmpty()) {

This Call ultimately calls into JavaScript through generated assembly code, to invoke the process._tickCallback function to process any nextTick callbacks.

The process tended to crash while calling into native code here, which is consistent with a theory that the process._tickCallback function object (or some other object involved in calling process._tickCallback) had been mistakenly invalidated/collected by the GC.

Performance implications? Unlikely!

Ignoring issues of correctness, the new code is doing more work than it previously did, because code->InvalidateEmbeddedObjects() is called more frequently, which could be related to the performance regression flagged in webpack/webpack#6767 and #19444.

That's pure speculation on my part (even more so than other parts of this comment)… but it couldn't hurt to do less invalidation work, right?

Thanks

Many thanks to @hashseed for his bisection workflow recommendation, @abernix for working with me to narrow the problem down, Ninja for speeding up Node builds considerably, and my partner for putting up with a weekend of me babysitting this epic bisection.

hashseed · 2018-03-19T18:39:29Z

I'm not familiar with the code, but the suggestion sounds reasonable.

I have a theory.

Deoptimizer::DeoptimizeMarkedCodeForContext deoptimizes optimized code that we previously marked. This roughly involves wiping references from code to objects so that they can be garbage collected (Code::InvalidateEmbeddedObjects), and overwriting the return address to jump to the deopt sequence.

Prior to v8/v8@ea0e1e2, we would only invalidate code on the current stack. After it, we would invalidate all code from the same context.

In cases where V8 is embedded with v8::Locker and the same isolate runs on several threads and has more than one stack, this means that we could now also be invalidating code from other threads. However, we still patch the return addresses to the deopt sequence only for code on the current thread.

So to reproduce this bug, what needs to happen is:

1. Have optimized code A and optimized code B.
2. Run A on thread 1 and B on thread 2.
3. Cause both to deoptimize.
4. From thread 1, get V8 to call Deoptimizer::DeoptimizeMarkedCodeForContext.
5. This will invalidate object references in both A and B, but only correctly patch the stack in thread 1.
6. Switch over to thread 2 and we crash because we load undefined in B where we do not expect it.
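
The steps above can be sketched as a small simulation. This is a toy model under stated assumptions, not V8's implementation: `Code`, `Frame`, `deoptimize_marked_code`, `resume` and `scenario` are hypothetical names, each "thread" is just a list of frames, and "crash" stands for resuming code whose embedded references were wiped without its return address being patched.

```python
from dataclasses import dataclass


@dataclass
class Code:
    name: str
    marked: bool = False          # marked for deoptimization
    embedded_valid: bool = True   # False once embedded references are wiped


@dataclass
class Frame:
    code: Code
    returns_to_deopt: bool = False  # True once the return address is patched


def deoptimize_marked_code(context_code, current_stack, invalidate_all):
    """invalidate_all=False models the old behaviour, True the new one."""
    if invalidate_all:
        # New: wipe references in all marked code of the context.
        for code in context_code:
            if code.marked:
                code.embedded_valid = False
    # Both versions: patch return addresses on the current thread's stack only.
    for frame in current_stack:
        if frame.code.marked:
            frame.returns_to_deopt = True
            if not invalidate_all:
                # Old: wipe references only for code found on this stack.
                frame.code.embedded_valid = False


def resume(frame):
    """Outcome when a thread continues executing this frame."""
    if frame.returns_to_deopt:
        return "deopt"
    return "ok" if frame.code.embedded_valid else "crash"


def scenario(invalidate_all):
    a, b = Code("A", marked=True), Code("B", marked=True)
    stack1, stack2 = [Frame(a)], [Frame(b)]  # A runs on thread 1, B on thread 2
    deoptimize_marked_code([a, b], stack1, invalidate_all)  # called from thread 1
    return resume(stack1[0]), resume(stack2[0])


if __name__ == "__main__":
    print(scenario(False))  # ('deopt', 'ok')
    print(scenario(True))   # ('deopt', 'crash')
```

The "ok" result for thread 2 in the old behaviour is a simplification of the model: B stays marked but untouched there, and the sketch does not model when or how it would later be deoptimized.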

So we should definitely revert that change. Thanks for the bisect!

@bmeurer does that sound correct to you?

Did not remove ActivationsFinder from `src/runtime/runtime-compiler.cc`
as in the original commit as the Class is still being used prior to
f0acede landing

Original Commit Message:

    Deoptimization and multithreading.

    When using Lockers and Unlockers it is possible to create a
    scenario where multiple threads point to the same optimized
    code object. When that happens, if one of the threads triggers
    deoptimization, then the stack replacement needs to happen in
    the stacks of all threads.
    With this CL, the deoptimizer visits all threads to do so.
    The CL also adds three tests where V8 used to crash due to this
    issue.

    Bug: v8:6563
    Change-Id: I74e9af472d4833aa8d13e579df45133791f6a503
    Reviewed-on: https://chromium-review.googlesource.com/670783
    Reviewed-by: Jaroslav Sevcik <jarin@chromium.org>
    Commit-Queue: Juliana Patricia Vicente Franco <jupvfranco@google.com>
    Cr-Commit-Position: refs/heads/master@{#48060}

PR-URL: #19477
Fixes: #19274
Refs: v8/v8@596d55a
Refs: v8/v8@f0acede
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Yang Guo <yangguo@chromium.org>
Reviewed-By: Franziska Hinkelmann <franziska.hinkelmann@gmail.com>
Reviewed-By: Benedikt Meurer <benedikt.meurer@gmail.com>

ghost · 2018-04-05T06:52:38Z

I just tried d46fafc with 8.11.1. It works so far. One question though: there is no file called test-locker.cc. Is that an additional file somewhere?

thanks and cheers

abernix · 2018-04-05T07:29:49Z

@himBeere Are you referring to the (plural) test-lockers.cc, which is located in deps/v8/test/cctest/test-lockers.cc? (It should be there!)

ghost · 2018-04-05T07:39:16Z

The patch is there, yes. But which file should I patch? It's not included in the source as far as I can see.

pages /usr/src # cd nodejs/
pages /usr/src/nodejs # wget https://nodejs.org/dist/v8.11.1/node-v8.11.1.tar.gz
--2018-04-05 09:34:38--  https://nodejs.org/dist/v8.11.1/node-v8.11.1.tar.gz
Resolving nodejs.org... 104.20.22.46, 104.20.23.46, 2400:cb00:2048:1::6814:162e, ...
Connecting to nodejs.org|104.20.22.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31030225 (30M) [application/gzip]
Saving to: ‘node-v8.11.1.tar.gz’

node-v8.11.1.tar.gz                    100%[=========================================================================>]  29.59M  8.54MB/s    in 3.9s    

2018-04-05 09:34:42 (7.53 MB/s) - ‘node-v8.11.1.tar.gz’ saved [31030225/31030225]

pages /usr/src/nodejs # tar xvzf node-v8.11.1.tar.gz 
node-v8.11.1/
node-v8.11.1/.nycrc
node-v8.11.1/.remarkrc
node-v8.11.1/android-configure
...

...
node-v8.11.1/benchmark/arrays/zero-float.js
node-v8.11.1/benchmark/arrays/zero-int.js
pages /usr/src/nodejs # find . -name test-lockers.cc
pages /usr/src/nodejs #

thanks and cheers

MylesBorins · 2018-04-05T07:50:38Z

In the node repo run the following command

curl -L https://github.com/nodejs/node/commit/d46fafc8c990899b4890dee2d6d8079c1308051f.patch | git am -3

MylesBorins · 2018-04-05T07:52:36Z

Also worth noting that this patch has not yet gone out in an 8.x release. 8.11.1 was a patch release due to infra-related issues. Another version of 8.x will be coming in just over 2 weeks.

ghost · 2018-04-05T07:59:19Z

@MylesBorins good to know. I understand the meteor guys are trying to fix the issue themselves for now.

abernix · 2018-04-05T08:28:52Z

@himBeere Correct, we (as in, Meteor, myself and @benjamn included) are trying to mitigate the situation which arose after 8.11.1 was announced. Of course, since it's a security update, many of our users are updating to it in the name of security (makes sense!). Unfortunately, it includes the breaking change outlined in this issue which causes segmentation faults for many of our users.

We're currently deploying a fix to our own hosting platform, Galaxy, by using our own custom build of Node 8.11.1 with the d46fafc commit applied, but that leaves non-Galaxy users needing to make the decision (and often unbeknownst to them, until they run into either problem) of whether to stay on 8.11.0 or tolerate the segmentation faults.

We're anxiously awaiting the 8.11.2(?) which should solve this!

rodrigok · 2018-04-09T20:15:25Z

News about the 8.11.2 release date?

MylesBorins · 2018-04-09T20:20:17Z

@rodrigok we are aiming to get an r.c. out tomorrow with a release date of the 24th

/cc @gibfahn

bazineta · 2018-04-09T21:15:25Z

At least anecdotally, the meteor build seems to have resolved ongoing segmentation faults in our pm2 God processes; 'anecdotally' and 'seems' because proving a negative is difficult, but we've been solid since applying the patch, and we've got quite the collection of core dumps from Mars prior to it.

ghost · 2018-05-16T06:43:20Z

Fix seems to be in 8.11.2. Is that right?

benjamn · 2018-05-16T13:49:03Z

Yes, I think this can be closed!

rodrigok · 2018-05-16T14:40:54Z

We didn't release an official version using 8.11.2 yet, we will do it soon 😄

apapirovski · 2018-06-25T05:07:40Z

Sounds like this can be closed. Feel free to reopen if I'm incorrect.

ghost mentioned this issue Mar 10, 2018

rocketchat crash with nodejs-8.10.0 RocketChat/Rocket.Chat#10060

Closed

targos added the v8 engine label Mar 10, 2018

piscisaureus added a commit to propelml/propel that referenced this issue Mar 11, 2018

ci: don't use node 8.10, which is broken

8250320

nodejs/node#19274

vladholubiev mentioned this issue Mar 12, 2018

Update Node to 8.10.0 and npm to 5.7.1. meteor/meteor#9725

Closed

ghost mentioned this issue Mar 13, 2018

Crash on startup after upgrade to 0.62.2 RocketChat/Rocket.Chat#10114

Closed

micw mentioned this issue Mar 13, 2018

0.62.2 broken (crashes on startup) RocketChat/Docker.Official.Image#40

Closed

This was referenced Mar 19, 2018

Webpack 4 hot rebuild time twice as slow webpack/webpack#6767

Closed

Performance regression between 8.9 and 8.10 #19444

Closed

benjamn mentioned this issue Mar 29, 2018

Update Node to version 8.11.1 (with v8 patch). meteor/meteor#9783

Closed

benjamn mentioned this issue Mar 31, 2018

Release 1.6.1.1 meteor/meteor#9789

Merged

ghost mentioned this issue Apr 4, 2018

Rocket.Chat vs. NodeJS 8.11.1 (or rather > 8.9.4): Random SEGV (segmentation violation) RocketChat/Rocket.Chat#10331

Closed

benjamn mentioned this issue Apr 4, 2018

Use custom Meteor 1.6.1.1 build of Node 8.11.1. meteor/galaxy-images#2

Merged

benjamn mentioned this issue Apr 6, 2018

Release 1.6.2 meteor/meteor#9559

Closed

sebakerckhof mentioned this issue Apr 9, 2018

[1.6.1.1] Meteor crashes in production multiple times per day meteor/meteor#9804

Closed

hwillson mentioned this issue Apr 9, 2018

Using local file npm package makes app crash when deploying meteor/meteor#9791

Closed

aboire mentioned this issue Apr 16, 2018

deps: V8: backport 596d55a from upstream aboire/node#1

Merged

antobinary mentioned this issue Apr 17, 2018

Segmentation fault (core dumped) HTML5-client bigbluebutton/bigbluebutton#5353

Closed

bazineta mentioned this issue May 2, 2018

Segfault, v8::internal::TransitionsAccessor::Initialize() #20470

Closed

artch mentioned this issue May 16, 2018

Node 8.11.2 fixed version released laverdet/isolated-vm#66

Closed

apapirovski closed this as completed Jun 25, 2018

iamfasal mentioned this issue Oct 29, 2021

RC 4.0.0 node upgrade fails with segfault error RocketChat/Rocket.Chat#23346

Closed
