Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux prebuild 9.1.0 not compatible with musl libc #2266

Open
robertsLando opened this issue Jun 11, 2021 · 47 comments
Open

Linux prebuild 9.1.0 not compatible with musl libc #2266

robertsLando opened this issue Jun 11, 2021 · 47 comments
Labels
linux Workaround Solution found but more elegant solution desired

Comments

@robertsLando
Copy link

Ref: zwave-js/zwave-js-ui#1324

@GazHank
Copy link
Contributor

GazHank commented Jun 13, 2021

Hi @robertsLando, based on the linked issue it sounds like you have narrowed the issue down to glibc vs musl libc. Do you know which specific feature of change is triggering the problem?

@robertsLando
Copy link
Author

@marcus-j-davies told me that, maybe he have more info about this

@marcus-j-davies
Copy link

marcus-j-davies commented Jun 14, 2021

Hello,

So, a project of mine was using 9.0.7 of serialport, and with that - everything woking.
I bumped to serialport 9.1.0, and naturally pushed out an update.

I then had a user, who had issues trying to update my package (his environment was opensuse leap 15.2).
he had no issues, up until this point.

Looking at logs, there is a strong reference to glibc 2.28.

555 error code 1
556 error path /root/.node-red/node_modules/@serialport/bindings
557 error command failed
558 error command sh -c prebuild-install --tag-prefix @serialport/bindings@ || node-gyp rebuild
559 error make: Entering directory '/root/.node-red/node_modules/@serialport/bindings/build'
559 error   CXX(target) Release/obj.target/bindings/src/serialport.o
559 error make: Leaving directory '/root/.node-red/node_modules/@serialport/bindings/build'
560 error prebuild-install WARN install /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /root/.node-red/node_modules/@serialport/bindings/build/Release/bindings.node)
560 error gyp info it worked if it ends with ok

This stopped serialport from compiling - and naturally failed entirely.
He was able to update my package, by downgrading serialport in package.js.

I wonder if there is a redirect for glibc -> musl libc and musl libc may have some missing symbols present in glibc 2.28?
My users environment was using glibc 2.26, so I can only guess something from 2.28 is now required.

Sadly, I am not a pro here - but hopefully, this helps.

I attached the log in question.

2021-05-31T10_12_34_850Z-debug.log

@GazHank
Copy link
Contributor

GazHank commented Jun 14, 2021

Thanks @marcus-j-davies, at the moment I can't see anything that changed in the actual code base which would trigger this, so I'll need to do a little digging to see if I can recreate the issue. I hope this isnt just a dependency or build change in the background which has just crept in between 9.0.7 and 9.1.0.

I have to admit that I've been using a full local rebuild so that I can target the newer node versions, so I'll need to see if I can setup a suitable environment to test those older builds

@marcus-j-davies
Copy link

No problem.

This is what (little) I know.

SP 9.0.7 + glibc 2.26 : OK
SP 9.1.0 + glibc 2.26 : Build error (GLIBC_2.28' not found)
SP 9.1.0 + glibc 2.28 : OK
SP 9.1.0 + musl libc : segfault - As per @robertsLando report.

@robertsLando
Copy link
Author

In my case the error isn't on compilation but when the connection with the tty is setup, in fact based on user logs you can see the log Connecting to /dev/ttyACM0 right before the segmentation fault error.

Dunno if the two things are related but there is for sure something weird here

@robertsLando
Copy link
Author

@syphernl if you could give us some more informations about your os/env maybe this could help debugging the segmentation fault error with zwavejs2mqtt

@reconbot
Copy link
Member

reconbot commented Jun 14, 2021

Are you able to rebuild on the effected OS and see if if it compiles? The precompiled binaries are always going to be glibc. Error messages during compilation might be illuminating.

@robertsLando
Copy link
Author

robertsLando commented Jun 14, 2021

I think you could use this dockerfile for your tests

FROM node:15.11.0-alpine

RUN apk --no-cache add \
    coreutils \
    linux-headers \
    alpine-sdk \	
    python

WORKDIR /usr/app

RUN npm install serialport@9.1.0

CMD "sh"

Once builded just run it and do the required tests with serialport

@GazHank
Copy link
Contributor

GazHank commented Jun 14, 2021

So based on the discussion it sounds like the breakage between 9.0.7 and 9.1.0 is due to updates in the github build environment, and as such if we were to ask github to (p)rebuild 9.0.7 again now it would also face similar issues.

Given the usecase and range of supported environments it looks like a build from source would be needed per the instructions for Alpine Linux

@robertsLando are you able to change your build process to npm install serialport --build-from-source rather than relying on the prebuild binaries?

ps - this isn't really important, and I may be configuring docker incorrectly, but I needed to add udev to the list of packages, as I was just hitting udevadm errors without it

@GazHank
Copy link
Contributor

GazHank commented Jun 14, 2021

@marcus-j-davies would you be able to confirm if --build-from-source works with your musl libc use case?

@marcus-j-davies
Copy link

marcus-j-davies commented Jun 14, 2021

I don't have the means presently,
I think there are 2 areas of interest here.

The segfault with musl libc - which was reported by users of 1 project (owned by @robertsLando)
and the sudden requirement for glibc 2.28 reported by my users (a different project)

I saw discussions on slack with issues regarding serialport (since 9.1.0)
and as I happened to also have some issues for my users - since I also bumped to 9.1.0, I got in on the conversation.

I had logs, showing its asking for glibc 2.28 - and thought it may have had something to do with it (it could well be), hence my invitation to this issue.

For what its worth, I did try an npm install with --build-from-source but its fails on building @serialport/bindings.
on a fresh install of opensuse leap 15.2 not so long ago, something about binding.gyp not being able to be read.

Build From Source
image

Standard install (9.0.7, was not failing)
image

@GazHank
Copy link
Contributor

GazHank commented Jun 14, 2021

Thanks @marcus-j-davies super useful info. I'll try to find some time to build a fresh opensuse leap 15.2 and see what I can find. It has been a long time since I've used suse, so it will be a nice chance to catch up on what has changed in the last 20 years ;-)

@marcus-j-davies
Copy link

marcus-j-davies commented Jun 14, 2021

I dont actually use it myself.
but this was the environment that got reported to me. (that same environment, installs 9.0.7 just fine)

Please don't act on my behalf - this issue was raised by @robertsLando, so please prioritise support his way 😄

EDIT:
All of a sudden I can no longer install 9.07 - as I now get the glibc 2.28 not found message, where the reported case didn't, and installed with glibc 2.26 with no problems - I think thats nothing more than semver behaviour though.

Which makes me think the issue is with the bindings package 🤷

EDIT2:
@GazHank
Yup - installing the bindings package of (9.0.7) manually, allows me to install SP 9.07 again,
if not already installed, it will install the latest version of bindings (9.1.0).

Which I think where the issue is with all of this. and the enforced glibc version

@GazHank
Copy link
Contributor

GazHank commented Jun 14, 2021

@marcus-j-davies if you are getting changing results, you might want to check the exact version of bindings that you are using (yarn why etc). If you include a dependency on serialport 9.0.0 then it creates a dependency on the sub packages as follows:

"dependencies": { "@serialport/binding-mock": "^9.0.0", "@serialport/bindings": "^9.0.0", "@serialport/parser-byte-length": "^9.0.0", "@serialport/parser-cctalk": "^9.0.0", "@serialport/parser-delimiter": "^9.0.0", "@serialport/parser-readline": "^9.0.0", "@serialport/parser-ready": "^9.0.0", "@serialport/parser-regex": "^9.0.0", "@serialport/stream": "^9.0.0",

As such it will probably install a later version of bindings than the overall serialport package itself. To correct this you may need to add resolutions to specify the particular version of binding or other sub-package that you need.

@reconbot would it be prudent for us to change the package.json for the serialport package to be exact? That is to say serialport 9.0.0 would have a dependency on "@serialport/bindings": "9.0.0" instead of "@serialport/bindings": "^9.0.0"

In this way if someone wants to lock to serialport 9.0.7 (or whatever) they will always get the exact version of the bindings and other sub packages rather than risk they they get updated to the latest versions and have to specify resolutions?

@robertsLando
Copy link
Author

robertsLando commented Jun 15, 2021

In this way if someone wants to lock to serialport 9.0.7 (or whatever) they will always get the exact version of the bindings and other sub packages rather than risk they they get updated to the latest versions and have to specify resolutions?

I agree with this, in fact I had to use resolutions to fix the problem: https://github.com/zwave-js/zwavejs2mqtt/blob/master/package.json#L91-L93

I'm 99% sure the problem is related to serialport/bindings@9.1.0 but I don't have so much experience here to tell you what actually is wrong. I just know this doesn't happen on all arch/os but just with few users, the one that reported the bug was using alpine 3.13

@marcus-j-davies
Copy link

Agree with @robertsLando .

serialport/bindings@9.1.0 seems to have moved goal posts so to speak.

I investigated more, and installed bindings manually:

I could install serialport/bindings@9.0.7 with no complain about glibc
I could not install serialport/bindings@9.0.8 wants gllibc 2.28
I could not install serialport/bindings@9.1.0 wants gllibc 2.28

Seems as if this problem started with 9.0.8

What ever is happening with bindings - I'm of the opinion its causing the segfault with Alpine.
and the inability to install into an OS not using glibc 2.28

@marcus-j-davies
Copy link

Have taken the liberty of recording my VM to show the failure is present from bindings 9.0.8.
which again - I think is very much why Alpine is having issues.

Video shows installing bindings 9.0.7, then 9.0.8

OpenSUSE Leap

@GazHank
Copy link
Contributor

GazHank commented Jun 15, 2021

Thanks @marcus-j-davies

I think this is down to changes to the Linux env that was used to perform the prebuilds. Since 9.0.7 was buillt 4 months ago (22 Feb), and 9.0.8 and 9.1.0 were built against the same newest environment. The info about those environment changes can be found at https://github.com/actions/virtual-environments/commits/ubuntu20/20210517.1/images/linux/Ubuntu2004-README.md

Unfortunately at the moment I think we probably need to consider the prebuilds as Ubuntu LTS binaries, and there is simply a chance that the prebuilt binary will work on other distros. In effect it is a bit of a fluke that the binary for 9.0.7 works, since the build env at that time was close enough to the alpine env, so it worked. As such build-from-source seems necessary.

The issue you raised earlier with the build-from-source looks like it could be a problem with the install or config of node-gyp. I assume the build from source fails for you irrespective of the serialport version?

The readme for node-gyp might help address the build from source issue. When I've built a very bare bones Linux distro in the past I've needed to add python 3, make and a C/C++ compiler in order to get node-gyp to work, and making sure node-gyp is configured to use python 3 rather than 2. I've seen others hitting issues with node-gyp and needing to replace or upgrade the version bundled with the OS (or available from the OS package manager) as it's out of date.

@robertsLando
Copy link
Author

Disclaimer: I'm not an expert in this field but...

I helped to maintain a project a while ago and I remember we had a similar problem while producing binaries, seems that using Ubuntu distro is not good as this means the compiled binary could not work on other linux OS as the toolchain is different.

The solution has been: https://github.com/vercel/pkg-fetch/blob/main/.github/workflows/build-linux.yml

Throwing here my 2 cent hope it will help

@reconbot
Copy link
Member

@GazHank

@reconbot would it be prudent for us to change the package.json for the serialport package to be exact? That is to say serialport 9.0.0 would have a dependency on "@serialport/bindings": "9.0.0" instead of "@serialport/bindings": "^9.0.0"

Seems reasonable, we need to see how lerna supports it.

@GazHank
Copy link
Contributor

GazHank commented Jun 15, 2021

Thanks @robertsLando this sounds like a good way to avoid background changes to the GitHub environments affecting portability of the binary without much notice.

Did this create much of an additional overhead for your other project to maintain and patch?

My main concern with proposing this as a change/enhancement to the build process is that it could (if not properly supported) create a greater burden on the project and a worse experience for the users. Especially when there are security vulnerabilities identified in the library's or build tools; at the moment such issues are taken care of by GitHub.

@reconbot
Copy link
Member

reconbot commented Jun 15, 2021

I've never known to have an issue with the build env impact the binaries security. But I've had years of brittle build infra. I'm sure we can downgrade some libraries or recommend rebuilding for alpine in our docs until it's fixed upstream. https://serialport.io/docs/guide-installation#alpine-linux

I'm not aware of many other distros using a musl, but if it's going to occur more often a musl build could be in our future. (It's not a fluke that Ubuntu builds work on other os's if they use the same standard c library) Currently the build tools don't support it we'd have to push that upstream.

(PS I have a prototype of prebuildify which doesn't solve this problem but could make this business a lot easier to debug. I haven't figured out how to get it into lerna)

@robertsLando
Copy link
Author

this sounds like a good way to avoid background changes to the GitHub environments affecting portability of the binary without much notice.

Yep, I think that when we speak about binaries the build environment should be agnostic, that's the best solution to prevent this kinf of problems

Did this create much of an additional overhead for your other project to maintain and patch?

For sure using default github runners is much easier but once the environment is correctly setted up I don't think it's so hard to maintain. Also I'm not sure that would be the final solution but this issues sounds very familiar to the one we had there (that workflow is used to build patched nodejs binaries).

@robertsLando
Copy link
Author

BTW I try to tag the author of that workflow, he is a master in that field but I know he is so busy so I dunno if he will answer.

@jesec would be much appreciated to have your opinion here :)

@marcus-j-davies
Copy link

Hi,

I may be missing something, but wasn't 9.2.0 meant to alter the versioning to be exact?

package.json reads (I have installed serialport 9.2.0)

@serialport/bindings": "^9.2.0",

Everything else is fixed.

@reconbot
Copy link
Member

If you read the linked issue it didn't work. We just landed another pr that should. It will go out with the next version.

@marcus-j-davies
Copy link

Got it!

Didn't read down far enough.

@robertsLando
Copy link
Author

robertsLando commented Jun 25, 2021

Just to know, can I safewly remove the resolutions from my package.json now after upgrading to serialport 9.2.0?

@marcus-j-davies
Copy link

marcus-j-davies commented Jun 25, 2021

@robertsLando

From my understanding, I think 9.2.0+ will still have issues with alpine (correct me if I am wrong guys), as I think its more to do with the updated build ev.

The attempt to fix version dependancies in releases didn't work with 9.2.0,
but if that does end up being fixed with 9.2.+ - I think it cant be fixed retrospectively.

@robertsLando
Copy link
Author

robertsLando commented Jun 25, 2021

Nonetheless, I noticed this project uses Github Actions. Github Actions environment has experimental toolchain (incl. libstdc++), which breaks binary compatibility assumptions of a platform

Yeah that's the point of the issue, I think that the only way to fix that is to replicate what you did in pkg-fetch to build nodejs binaries. Thanks for your answer @jesec hope authors of this package can find a solution to this as serialport is a very widely used package

@robertsLando robertsLando changed the title segfault after updating from 9.0.7 -> 9.1.0 [BUG] SEGFAULT after updating from 9.0.7 -> 9.1.0 Jun 25, 2021
@GazHank GazHank added the linux label Jul 5, 2021
@GazHank GazHank added the Workaround Solution found but more elegant solution desired label Jul 5, 2021
@joaorbdg
Copy link

joaorbdg commented Aug 9, 2021

If you are on Ubuntu and are experiencing this issue, make sure that you have the build-essential package installed on your system.

sudo apt install build-essential -y

@robertsLando
Copy link
Author

@GazHank @reconbot any updates on this?

@marcus-j-davies
Copy link

marcus-j-davies commented Aug 10, 2021

@robertsLando,

To my knowledge, no.
However, one way I have 'resolved' it.

is adding a preinstall to package.json

 "dependencies": {
    "serialport": "9.2.0",
    "zwave-js": "8.0.8",
    "winston": "3.3.3"
  },
  "scripts": {
    "preinstall": "npm install @serialport/bindings@9.2.0 --build-from-source  && npm install serialport@9.2.0 --build-from-source"
  },

This means, the packages are built for the environment thats currently in use, instead of depending on the pre-binaries that are incompatible.

It's important to do bindings first - as I believe, thats the problematic package.
one draw back however, is needing to have the necessary toolchain installed, thats where my readme comes in 😄

this work around, seems to have fixed a problem with openSUSE I was having, where the pre-binary was built using a newer version of glibc

not sure if this could work for you as well.

@robertsLando
Copy link
Author

Thanks for the tips @marcus-j-davies , unfortunally I think I will stick to 9.0.X version until this is fixed, building binaries would slow down all my workflows a lot :(

@GazHank
Copy link
Contributor

GazHank commented Aug 10, 2021

WRT a long term fix, there is no real progress from me I'm afraid. I've spent a bit of time looking into the docker option and think it is a good choice; but given the existing workaround (build from source), and limited availability on my part I havent spent a lot of time on this yet.

At the moment in the v limited time I have, I've been trying to see if we can switch from NAN to NAPI, as this would mean we have just one binary per OS architecture, and gives us a number of other potential enhancements. As such it would make it far easier to support lots of new environments from Alpine/musl to including ARM prebuilds, since the number of binaries to build and support would drop drastically.

If you would like to draft a PR with updated github actions to prebuild on alpine via docker or similar then I'd be keen to help if I can

@GazHank
Copy link
Contributor

GazHank commented Aug 10, 2021

@robertsLando do you know if a prebuild made on Ubuntu1804 would work for your issue?

If simply switching the build from the latest LTS of Ubuntu to 1804 would address this (in the short term) then that would be a super quick change and would avoid having lots of Linux builds (prior to NAPI)

@GazHank
Copy link
Contributor

GazHank commented Aug 10, 2021

If switching to an older LTS will work for this issue then let's increase compatability by doing so. The only downside is that we will need to remember to update the Git hub action when ubuntu 1804 is finally out of support.

The longer term fix should still be addressed via #2292

@GazHank GazHank changed the title [BUG] SEGFAULT after updating from 9.0.7 -> 9.1.0 Linux prebuild 9.1.0 not compatible with musl libc Aug 10, 2021
@robertsLando
Copy link
Author

Sorry for the late reply @GazHank I was on vacation. I think a downgrade to ubuntu 18.04 could fix some issues but it's not the utilmate solution to the problem. The only way is #2292

@GazHank
Copy link
Contributor

GazHank commented Aug 23, 2021

No problem @robertsLando, thanks for the reply and I hope you had a nice vacation :-D

Based on your feedback, I'll hold #2307 for now, and try to find some time to work on #2292 instead

@robertsLando
Copy link
Author

Thank you @GazHank :)

@robertsLando
Copy link
Author

@GazHank sorry to ping you again, just to know is there any progress about this? I see many proojects are stuck to 9.0.7 because of this

@GazHank
Copy link
Contributor

GazHank commented Oct 7, 2021

No problem @robertsLando, I understand your keen interest to see this progress. At the moment the real progress on this is the work on the migration to N-APIwhich will be a stepping stone to enable this.

I think the most natural sequence for enabling this will be:
1 migrate to N-API
2 migrate to prebuildify
3 Add containerised cross build

I'm afraid that for me I think I need to prioritise the N-API migration in my limited free time, due to both it's value for this feature, and the other users who need the switch from Nan to N-API (especially electron users on all platforms)

There might be an option to add a musl build once we are on prebuildify without needing to use containers, but containerisation seems to be the best option for the future.

Points 2 and 3 could be worked on independently if someone is able to devote time to these; it would be super valuable, and greatly appreciated.

@robertsLando
Copy link
Author

robertsLando commented Oct 8, 2021

I agree with the priorities here, I could give a try to containerised cross build if you want based on https://github.com/vercel/pkg-fetch/blob/main/.github/workflows/build-linux.yml?rgh-link-date=2021-06-15T12%3A52%3A40Z

Hope to find some time for it soon

@stale
Copy link

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week no further activity occurs. Feel free continue the discussion or ask for a never-stale label to keep it open. If this is a support issue, consider sharing it on stack overflow to get more eyes on your problem.

@stale stale bot added the stale-issue label Apr 16, 2022
@robertsLando
Copy link
Author

No stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linux Workaround Solution found but more elegant solution desired
Development

Successfully merging a pull request may close this issue.

6 participants