
node aborts and core dumps as soon as it starts #43064

Closed
bluesmoon opened this issue May 12, 2022 · 21 comments

Comments

@bluesmoon
Contributor

Version

v18.1.0

Platform

Linux 4d7a71cb7acd 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 16:57:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

Start node from the command line with no parameters.

How often does it reproduce? Is there a required condition?

Running inside a Docker container. It happens every time when the container is on a CentOS host, but never on a macOS host.

What is the expected behavior?

node should start

What do you see instead?

```
$ node
node[9]: ../src/node_platform.cc:61:std::unique_ptr<long unsigned int> node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start(): Assertion `(0) == (uv_thread_create(t.get(), start_thread, this))' failed.
 1: 0xb57f90 node::Abort() [node]
 2: 0xb5800e  [node]
 3: 0xbc915e  [node]
 4: 0xbc9230 node::NodePlatform::NodePlatform(int, v8::TracingController*, v8::PageAllocator*) [node]
 5: 0xb1b3d1 node::InitializeOncePerProcess(int, char**, node::InitializationSettingsFlags, node::ProcessFlags::Flags) [node]
 6: 0xb1bc89 node::Start(int, char**) [node]
 7: 0x7f2ca389fd90  [/lib/x86_64-linux-gnu/libc.so.6]
 8: 0x7f2ca389fe40 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
 9: 0xa93f0e _start [node]
Aborted (core dumped)
```

Additional information

No response

@bnoordhuis
Member

bnoordhuis commented May 12, 2022

Looks like the RLIMIT_NPROC resource limit (ulimit -u) is set too low.

edit: or there's a seccomp filter active that blocks system calls node needs, like clone2/clone3.
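Both are quick to check from inside the affected image (a diagnostic sketch; the node:18 tag is just an example):

```sh
# Run a shell in the same image and inspect the two suspects.
docker run --rm node:18 bash -c '
  ulimit -u                       # RLIMIT_NPROC: max user processes/threads
  grep Seccomp /proc/self/status  # 2 = a seccomp filter is active
'
```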

@bluesmoon
Contributor Author

It's set to 1048576. Digging further, this seems to be limited to an Ubuntu 22.04 Docker container running on a CentOS host. It works with other versions of Ubuntu and other host OSes.

@bluesmoon
Contributor Author

I believe it's the same issue as documented here: https://askubuntu.com/questions/1405417/20-04-vs-22-04-inside-docker-with-a-16-04-host-thread-start-failures but I've been unable to find any solutions documented.

@bluesmoon
Contributor Author

OK, I've isolated the problem to the Docker version running on CentOS. After moving to docker-ce, the problem goes away.

Closing this ticket now.
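For anyone else on CentOS 7, the move to docker-ce looks roughly like this (a sketch, assuming Docker's official CentOS repository; package names per Docker's install docs):

```sh
# Replace the distribution's docker packages with Docker CE from Docker's repo.
sudo yum remove -y docker docker-common docker-client
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker
```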

@Dectom

Dectom commented Apr 11, 2023

As a note, I have just experienced this issue. Any server I run with Docker version 20.10.7, build f0df350 has this issue; the ones with Docker version 20.10.6, build 370c289 do not.

I have just adjusted my image to use Ubuntu 20.04 instead of 22.04.

A Docker update might resolve this; however, that's too disruptive an action, and I can't update all servers to a new Docker version at this time.

@bluesmoon
Contributor Author

@Dectom it took about 5 minutes to go through the Docker version upgrade. In any case, the problem is not with Docker or with Node; it's with CentOS.

@jaspertandy

jaspertandy commented Jun 14, 2023

Just in case anyone finds this useful: I've just had almost the identical issue on a GitLab runner whose host OS is Debian Bullseye, where the target image for a docker run command was node:18-bookworm-slim (it was actually node:18-slim, but the underlying OS changed once Bookworm was released). Pinning to node:18-bullseye-slim fixed the issue. We'll upgrade our runner and try again soon, but as a short-term fix this thread was very helpful in reaching this conclusion.

@Dectom

Dectom commented Jun 14, 2023

> @Dectom it took about 5 minutes to go through the Docker version upgrade. In any case, the problem is not with Docker or with Node; it's with CentOS.

The systems we had this issue on were running Rocky Linux, which, whilst similar to CentOS, is different.

Also, for us a Docker version change isn't really a plausible option; the version increase is what causes the problem, as any node on the later Docker version has the problem versus the earlier one.

We reverted to Ubuntu 20.04 instead of 22.04, as we can't upgrade Docker across the affected nodes to see which later version resolves the problem.

I have run into this same issue, with the same Docker version working and not working across other OSes: we ran Debian 10 and Ubuntu host nodes and experienced the same problem.

@Michiel-s

> Just in case anyone finds this useful: I've just had almost the identical issue on a GitLab runner whose host OS is Debian Bullseye, where the target image for a docker run command was node:18-bookworm-slim (it was actually node:18-slim, but the underlying OS changed once Bookworm was released). Pinning to node:18-bullseye-slim fixed the issue. We'll upgrade our runner and try again soon, but as a short-term fix this thread was very helpful in reaching this conclusion.

Thank you @jaspertandy, you saved my day. I ran into the same issue and couldn't find the beginning of a solution.

Making it explicit to use the image node:18-bullseye instead of node:18 (which defaults to node:18-bookworm) worked for me.
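If it helps, the change amounts to a one-line edit in the Dockerfile (a sketch; everything else in the Dockerfile stays as it is):

```dockerfile
# node:18 now resolves to a Bookworm base, which crashes on hosts with an
# older container runtime:
# FROM node:18

# Pinning to the Bullseye variant keeps working on those hosts:
FROM node:18-bullseye
```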

@TimSlechten

> Just in case anyone finds this useful: I've just had almost the identical issue on a GitLab runner whose host OS is Debian Bullseye, where the target image for a docker run command was node:18-bookworm-slim (it was actually node:18-slim, but the underlying OS changed once Bookworm was released). Pinning to node:18-bullseye-slim fixed the issue.

Same here for a host running Buster. We were using the node:lts image; changing this to node:lts-bullseye fixed the issue for us.

benel added a commit to Hypertopic/AAAforREST that referenced this issue Jun 29, 2023
kevinlul added a commit to DawnbrandBots/bastion-bot that referenced this issue Jul 8, 2023
…u 20.04 host

nodejs/node#43064
```
node[1]: ../src/node_platform.cc:68:std::unique_ptr<long unsigned int> node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start(): Assertion `(0) == (uv_thread_create(t.get(), start_thread, this))' failed.
 1: 0xb7a940 node::Abort() [node]
 2: 0xb7a9be  [node]
 3: 0xbe98be  [node]
 4: 0xbe99a1 node::NodePlatform::NodePlatform(int, v8::TracingController*, v8::PageAllocator*) [node]
 5: 0xb38f5b node::InitializeOncePerProcess(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, node::ProcessFlags::Flags) [node]
 6: 0xb395ab node::Start(int, char**) [node]
 7: 0x7fbedd7d318a  [/lib/x86_64-linux-gnu/libc.so.6]
 8: 0x7fbedd7d3245 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
 9: 0xabbdee _start [node]
```
kevinlul added a commit to DawnbrandBots/emcee-tournament-bot that referenced this issue Jul 9, 2023
…u 18.04 host

nodejs/node#43064 DawnbrandBots/bastion-bot@9c2c06793e3dd9b5628ff61a44e03309c43f6a361~
@boutell

boutell commented Jul 10, 2023

I just hit this exact same problem, on a CentOS host, but with Podman rather than Docker. And I solved it in the exact same way (using node:VERSION-bullseye rather than node:VERSION).

It's old podman though (as the official podman for CentOS 7 is quite old).

I don't know if this sheds any new light on the issue. It sounds like I probably won't have this issue when I switch to hosting podman on Debian itself. ✌️

@MSiteDev

For me it was fixed after changing the image from node:18 to node:18-alpine.

baptisteArno pushed a commit to baptisteArno/typebot.io that referenced this issue Aug 11, 2023
Changed Dockerfile image due to node-18 compatibility issues with Docker
`version 20.10.7, build f0df350`

Ref
nodejs/node#43064
benel added a commit to Hypertopic/Cassandre that referenced this issue Sep 20, 2023
@bkatiemills

Same story here, but when trying to run node:20.8.0-based containers on Travis. Using node:20.8.0-alpine3.17 resolved it.

@edrock200

Not a fix by any means, but putting the container in privileged mode will get past the error as well.

@rassie

rassie commented Oct 17, 2023

For anyone interested, a quick workaround for a RHEL7/CentOS7 system is disabling seccomp instead of upgrading to docker-ce. For docker-compose.yml it looks like this:

```yaml
    security_opt:
      - seccomp:unconfined
```

Yes, it's insecure in the usual sense of the word, but the failure has to do with clone3, which basically means people still running RHEL7 don't have another choice (apart from updating to unsupported docker-ce).
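For context (my understanding of why this works; the thread doesn't spell it out): the default seccomp profile of older Docker versions returns EPERM for syscalls it doesn't recognize, including clone3, and the newer glibc in Ubuntu 22.04/Debian Bookworm images treats EPERM as a hard error instead of falling back to clone, so thread creation fails. In a full compose file the override sits under the affected service (a sketch; the service name and image are just examples):

```yaml
services:
  app:                        # hypothetical service name
    image: node:18-bookworm-slim
    security_opt:
      - seccomp:unconfined    # drop the seccomp profile that blocks clone3
```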

@fabianobonomini

Same here, using Ubuntu 18.04 LTS for the guest OS and the node:18.18.1 image, with Docker v19.03.12 and Docker Compose version 1.24.

@Masterxilo

Same issue here, node 20 immediately exited with

```
node[1]: ../src/node_platform.cc:68:std::unique_ptr<long unsigned int> node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start(): Assertion `(0) == (uv_thread_create(t.get(), start_thread, this))' failed.
 1: 0xc99970 node::Abort() [node]
 2: 0xc999ee  [node]
 3: 0xd19899 node::WorkerThreadsTaskRunner::WorkerThreadsTaskRunner(int) [node]
 4: 0xd199bc node::NodePlatform::NodePlatform(int, v8::TracingController*, v8::PageAllocator*) [node]
 5: 0xc53c23  [node]
 6: 0xc545b4 node::Start(int, char**) [node]
 7: 0x7f03bd024d90  [/lib/x86_64-linux-gnu/libc.so.6]
 8: 0x7f03bd024e40 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
 9: 0xbb0c7e _start [node]
```

within a container on two hosts running

```
viradmin@virologyngs01:~$ docker --version
Docker version 19.03.13, build 4484c46d9d
viradmin@virologyngs01:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal
```

and

```
viradmin@instance-1:~$ docker --version
Docker version 20.10.8, build 3967b7d
viradmin@instance-1:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal
```

when trying to run an ubuntu:22.04@sha256:2b7412e6465c3c7fc5bb21d3e6f1917c167358449fecac8176c6e496e5c1f05f based container with node 20 installed.

Updating Docker on these hosts to the current latest, Docker version 24.0.7, build afdd53b, fixed the issue.
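On Ubuntu hosts like these, the upgrade is only a couple of commands once Docker's official apt repository is configured (a sketch, assuming that repository is already set up):

```sh
# Upgrade the engine, CLI, and containerd to the latest packaged versions.
sudo apt-get update
sudo apt-get install --only-upgrade docker-ce docker-ce-cli containerd.io
docker --version   # should now report 24.x or later
```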

parmentf added a commit to Inist-CNRS/web-services that referenced this issue Dec 6, 2023
Because node on bookworm and docker 20.10 does not work

See
nodejs/node#43064 (comment)
@icemagno

icemagno commented Mar 1, 2024

Use --security-opt seccomp=unconfined in the docker run command.

BTW: I HATE NodeJS!
Every day there's a new problem. Updating is a nightmare. Package version control is hell. Taking over another person's code makes me want to die.
I think this was developed by Lucifer himself.

This ticket was opened in 2022... and here we still are.

Every time I have to face Node is a day I will cry until the night comes.

Not a single day has it been as simple as mvn clean package.

Sorry for this. I really needed to spit it out.

@LeoK80

LeoK80 commented Apr 26, 2024

I've been running into this error deploying my image onto a k8s cluster. It ran fine locally (macOS) on Colima.
Thanks to some hints in this issue thread, I've also been able to get past it!

Not working situation:

  • Nodejs v21.7.3
  • Image based on node-lts (Debian Bookworm, with a RUN apt -y update && apt -y upgrade directive in the Dockerfile)
  • k8s worker OS at RHEL8 (4.18.0-513.11.1.el8_9.x86_64)
  • k8s worker with Container Runtime containerd v1.5.5

Working situation:

  • Nodejs v21.7.3
  • Image based on node-lts (Debian Bookworm, with a RUN apt -y update && apt -y upgrade directive in the Dockerfile)
  • k8s worker with node affinity set to one with OS at RHEL8 (4.18.0-513.18.1.el8_9.x86_64)
  • k8s worker with node affinity set to one with Container Runtime containerd v1.6.28

Though the OS is slightly different, it seems that the container runtime makes the difference.

Also successes of running node 20 apps are reported with (as reported by colleagues):

  • Nodejs 20.11.0
  • Image based on Debian 10.3
  • k8s worker OS RHEL 7.9 (3.10.0-1160.114.2.el7.x86_64)
  • k8s Container Runtime docker 1.13.1

It seems that there is some incompatibility between newer versions of Node.js and some in-between versions of container runtimes. An OS upgrade/downgrade will change the package versions used by the OS (whatever is available for that OS version in the distribution's repository) and can accidentally fix it, though to me it seems that upgrading the container runtime is most likely the way forward. Upgrade manually to something newer if the package manager doesn't pull down anything newer.
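For anyone needing the stopgap of pinning pods to workers with the newer runtime, node affinity can express it (a sketch only; the label key and value are hypothetical, since the actual labels aren't given above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: containerd-version   # hypothetical node label
                operator: In
                values: ["1.6"]
  containers:
    - name: app
      image: node:lts
```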

@ailuoamang

You can try docker run ...... --privileged; you will thank me.

@boutell

boutell commented Jun 20, 2024

> You can try docker run ...... --privileged; you will thank me.

This can make sense in certain use cases, but if your goal is to run code safely by running it in a container, you should never do it:

"Use the --privileged flag with caution. A container with --privileged is not a securely sandboxed process. Containers in this mode can get a root shell on the host and take control over the system."
