Beta v1.4.0-beta.1 crashes on Pi Zero #919

Closed
ebaauw opened this issue Dec 31, 2021 · 28 comments
Labels: beta (This is in some form related to the current beta release), bug

Comments

@ebaauw
Contributor

ebaauw commented Dec 31, 2021

Analysis

Homebridge beta v1.4.0-beta.1 crashes on startup on a Raspberry Pi Zero. Not sure if related, but npm invokes node-gyp when installing the beta.

See the Homebridge log below. Attached the npm log: 2021-12-31T07_15_18_070Z-debug-0.log

Expected Behavior

No crash, obviously.

Steps To Reproduce

Standard Homebridge installation as per the Wiki on the Pi Zero. Install Homebridge beta through the UI.

Logs

[Dec 31 06:14:40] [31/12/2021, 06:14:40] [HB Supervisor] Restarting Homebridge...
[Dec 31 06:14:40] [31/12/2021, 06:14:40] [HB Supervisor] Starting Homebridge with extra flags: -T -D
[Dec 31 06:14:40] [31/12/2021, 06:14:40] [HB Supervisor] Starting Homebridge with custom env: {"NODE_OPTIONS":"--trace-warnings"}
[Dec 31 06:14:40] [31/12/2021, 06:14:40] [HB Supervisor] Started Homebridge v1.4.0-beta.1 with PID: 32243
[Dec 31 06:14:56] #
[Dec 31 06:14:56] # Fatal error in , line 0
[Dec 31 06:14:56] # Liftoff bailout should not happen. Cause: Armv6 not supported
[Dec 31 06:14:56] #
[Dec 31 06:14:56] #
[Dec 31 06:14:56] #
[Dec 31 06:14:56] #FailureMessage Object: 0xb49faf50
[Dec 31 06:14:57] [31/12/2021, 06:14:57] [HB Supervisor] Homebridge Process Ended. Code: null, Signal: SIGTRAP

Configuration

Crash happens before config.json is read.

Environment

  • OS: Raspbian GNU/Linux 11 (bullseye)
  • Software: 1.4.0-beta.1
  • Node: v16.13.1
  • npm: 8.3.0

Process Supervisor

hb-service

Additional Context

Starting Homebridge manually:

$ homebridge -D


#
# Fatal error in , line 0
# Liftoff bailout should not happen. Cause: Armv6 not supported

#
#
#
#FailureMessage Object: 0xb61fdf50
Trace/breakpoint trap (core dumped)

Running node with --no-expose-wasm:

$ node --no-expose-wasm $(which homebridge)
[31/12/2021, 19:29:20] config.json (/home/pi/.homebridge/config.json) not found.
[31/12/2021, 19:29:21] ---
[31/12/2021, 19:29:35] Plugin /usr/local/lib/node_modules/homebridge-lib package.json does not contain the keyword 'homebridge-plugin'.
[31/12/2021, 19:29:37] Loaded plugin: homebridge-config-ui-x@4.41.5
[31/12/2021, 19:29:37] Registering platform 'homebridge-config-ui-x.config'
[31/12/2021, 19:29:37] ---
Setup Payload:
X-HM://0023ISYWYRK87
Scan this code with your HomeKit app on your iOS device to pair with Homebridge:
[...]                
Or enter this code with your HomeKit app on your iOS device to pair with Homebridge:
                       
    ┌────────────┐     
    │ 031-45-154 │     
    └────────────┘     
                       
[31/12/2021, 19:29:41] Homebridge v1.4.0-beta.1 (HAP v0.10.0-beta.4) (Homebridge) is running on port 36941.
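Running with `--no-expose-wasm` works because V8 then does not install the `WebAssembly` global at all, so code that probes for WebAssembly never hands anything to the Liftoff compiler. A minimal sketch to see the flag's effect from Node itself; `wasmGlobalType` is a hypothetical helper, and only Node's built-in `child_process` module is assumed:

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: start a fresh Node process with the given extra flags
// and report what `typeof WebAssembly` evaluates to inside it.
function wasmGlobalType(flags = []) {
  const result = spawnSync(
    process.execPath,
    [...flags, '-e', 'process.stdout.write(typeof WebAssembly)'],
    { encoding: 'utf8' },
  );
  return result.stdout;
}

console.log('default:', wasmGlobalType());
console.log('--no-expose-wasm:', wasmGlobalType(['--no-expose-wasm']));
```

Without the flag the global is an object; with it the global is absent, which is why the crash disappears in the run above.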
@ebaauw ebaauw added the bug and beta labels Dec 31, 2021
@ebaauw ebaauw changed the title Beta v1.4.0-beta.1 crashed on Pi Zero Beta v1.4.0-beta.1 crashes on Pi Zero Dec 31, 2021
@Supereg
Member

Supereg commented Jan 3, 2022

Introduced via #918

@Supereg
Member

Supereg commented Jan 3, 2022

Was someone able to verify that this is indeed caused by the abstract-socket dependency? If so, we could just go down the path of maintaining a fork and removing that dependency for now. As I understood it, it is optional and really just used to be a bit more efficient(?).

CC: @adriancable, @oznu, @donavanbecker

@adriancable
Contributor

adriancable commented Jan 3, 2022

@Supereg - unfortunately I don't have an ARMv6 platform to test on. But what I have verified is this: as an experiment I deleted package-lock.json (otherwise npm ignores --no-optional), then ran npm i --no-optional and confirmed that abstract-socket is not installed and node-gyp does not run. So if abstract-socket is indeed the cause, your suggestion of forking dbus-native and removing that dependency completely would work.

Having a little look through dbus-native, it seems that abstract-socket is only used for the unit tests, hence it's safe to skip it in production.
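Whether an optional native dependency such as `abstract-socket` actually ended up installed (and therefore whether node-gyp had to build it) can also be checked from Node without loading the module. A minimal sketch; `isResolvable` is a hypothetical helper, and the directory argument is assumed to be wherever HAP-NodeJS's package.json lives:

```javascript
// Hypothetical helper: true if `name` can be resolved starting from `fromDir`.
// require.resolve only locates the file; it does not execute the module.
function isResolvable(name, fromDir = process.cwd()) {
  try {
    require.resolve(name, { paths: [fromDir] });
    return true;
  } catch (err) {
    if (err.code === 'MODULE_NOT_FOUND') return false;
    throw err; // anything else is unexpected, so surface it
  }
}

const dir = process.argv[2] || process.cwd();
console.log(`abstract-socket resolvable from ${dir}:`, isResolvable('abstract-socket', dir));
```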

@ebaauw
Contributor Author

ebaauw commented Jan 3, 2022

Happy to do some testing, but I'm using Homebridge, and HAP-NodeJS is installed through that. I haven't found a package-lock.json in either the Homebridge or the HAP-NodeJS installation directory.

@oznu
Member

oznu commented Jan 3, 2022

> Having a little look through dbus-native, it seems that abstract-socket is only used for the unit tests, hence it's safe to skip it in production.

If this is the case, perhaps the upstream project would accept a PR to move abstract-socket to devDependencies? Unless for some reason these unit tests need to run on production systems?

@adriancable
Contributor

@oznu - I'd be happy to suggest it but first I think we need to be sure there is actually a problem with abstract-socket (on ARMv6). In that case, that's a strong case to present to the dbus-native maintainer. I'm not really sure we can confidently make that request right now.

@adriancable
Contributor

@ebaauw - if you haven't already seen it, look at this GitHub issue posted for homebridge: homebridge/homebridge#3005

Isn't that exactly the same as your issue? Only thing is, that issue predates the PR by a long time (nearly 2 months), which suggests to me that the PR (or abstract-socket) is not the cause.

@adriancable
Contributor

@ebaauw / @oznu - one other thing. The error @ebaauw is seeing is related to Liftoff, which is one of the WASM compilers in V8. So which dependencies in hap-nodejs use WASM? They are: jest, typedoc, and source-map. These are all devDependencies. So, @ebaauw, you might want to try npm i --production. You will need to remove the package-lock.json file first ... sorry, you will have to find it yourself since I don't know your system. It'll be wherever HAP-NodeJS's package.json is.

@ebaauw
Contributor Author

ebaauw commented Jan 4, 2022

> if you haven't already seen it, look at this GitHub issue posted for homebridge:

Yeah, saw that searching for no-wasm. Apparently that was related to “the nest plugin”. Not sure which one nor if that includes (dependencies with) C modules. See homebridge/homebridge#3002 (comment)

> Isn't that exactly the same as your issue?

Same message, different issue. Note that Homebridge v1.3.9 is running fine on my Pi Zero.

@adriancable
Contributor

adriancable commented Jan 4, 2022

@ebaauw - as coincidence would have it, I also happen to be the author of "the nest plugin" (if they are referring to homebridge-nest). I don't knowingly use dependencies with C modules. In your case I don't know for sure, but since your error message references liftoff - and the error goes away when wasm in node is disabled - my money would be on a dependency with WASM modules / the liftoff WASM compiler, not on C modules.

So the source of this is still a mystery but I don't think it's related to dbus-native or its dependencies. I can't find any WASM usage there but of course it's always possible I missed it.

@ebaauw
Contributor Author

ebaauw commented Jan 4, 2022

> if they are referring to homebridge-nest

I don't know, hence the quotation.

> since your error message references liftoff

Ah, so "Liftoff" is the name of the WASM compiler. I learn something new every day; I couldn't make head nor tail of the error message.

It is related to dbus-native, I'm afraid:

$ sudo npm -g i homebridge@beta

added 23 packages, changed 86 packages, and audited 110 packages in 5m

42 packages are looking for funding
  run `npm fund` for details

5 vulnerabilities (2 low, 3 moderate)

To address all issues (including breaking changes), run:
  npm audit fix --force

Run `npm audit` for details.
$ cd /usr/local/lib/node_modules/homebridge
$ node
Welcome to Node.js v16.13.1.
Type ".help" for more information.
> let hap = require('hap-nodejs')


#
# Fatal error in , line 0
# Liftoff bailout should not happen. Cause: Armv6 not supported

#
#
#
#FailureMessage Object: 0xb59fcf50
Trace/breakpoint trap (core dumped)
$ cd node_modules/hap-nodejs/
$ node
Welcome to Node.js v16.13.1.
Type ".help" for more information.
> let dbus = require('dbus-native')


#
# Fatal error in , line 0
# Liftoff bailout should not happen. Cause: Armv6 not supported

#
#
#
#FailureMessage Object: 0xb49faf50
Trace/breakpoint trap (core dumped)
$ 

I have the core file, in case anybody is interested. Here's the stack trace:

$ gdb /usr/local/bin/node core-node-sig5-user1000-group1000-pid24867-time1641294932
GNU gdb (Raspbian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/node...
[New LWP 24872]
[New LWP 24868]
[New LWP 24869]
[New LWP 24871]
[New LWP 24873]
[New LWP 24895]
[New LWP 24896]
[New LWP 24897]
[New LWP 24898]
[New LWP 24955]
[New LWP 24867]
[New LWP 24870]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Core was generated by `node'.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0  0x0165aad8 in v8::base::OS::Abort() ()
[Current thread is 1 (Thread 0xb49fc400 (LWP 24872))]
(gdb) bt
#0  0x0165aad8 in v8::base::OS::Abort() ()
#1  0x01651a3c in V8_Fatal(char const*, ...) ()
#2  0x00e45058 in v8::internal::wasm::(anonymous namespace)::LiftoffCompiler::unsupported(v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)1, v8::internal::wasm::(anonymous namespace)::LiftoffCompiler, (v8::internal::wasm::DecodingMode)0>*, v8::internal::wasm::LiftoffBailoutReason, char const*) [clone .part.648] ()
#3  0x00e7d348 in v8::internal::wasm::WasmFullDecoder<(v8::internal::wasm::Decoder::ValidateFlag)1, v8::internal::wasm::(anonymous namespace)::LiftoffCompiler, (v8::internal::wasm::DecodingMode)0>::Decode() ()
#4  0x00e7ea94 in v8::internal::wasm::ExecuteLiftoffCompilation(v8::internal::wasm::CompilationEnv*, v8::internal::wasm::FunctionBody const&, int, v8::internal::wasm::ForDebugging, v8::internal::wasm::LiftoffOptions const&) ()
#5  0x00ea4498 in v8::internal::wasm::WasmCompilationUnit::ExecuteFunctionCompilation(v8::internal::wasm::CompilationEnv*, v8::internal::wasm::WireBytesStorage const*, v8::internal::Counters*, v8::internal::wasm::WasmFeatures*) ()
#6  0x00ea49d4 in v8::internal::wasm::WasmCompilationUnit::ExecuteCompilation(v8::internal::wasm::CompilationEnv*, v8::internal::wasm::WireBytesStorage const*, v8::internal::Counters*, v8::internal::wasm::WasmFeatures*) ()
#7  0x00ed6a7c in v8::internal::wasm::(anonymous namespace)::ExecuteCompilationUnits(std::weak_ptr<v8::internal::wasm::NativeModule>, v8::internal::Counters*, v8::JobDelegate*, v8::internal::wasm::(anonymous namespace)::CompileBaselineOnly) ()
#8  0x00ed7a10 in v8::internal::wasm::(anonymous namespace)::BackgroundCompileJob::Run(v8::JobDelegate*) ()
#9  0x0136a300 in v8::platform::DefaultJobWorker::Run() ()
#10 0x006d3e44 in node::(anonymous namespace)::PlatformWorkerThread(void*) ()
#11 0xb6c7c300 in start_thread (arg=0xb49fc400) at pthread_create.c:477
#12 0xb6c00208 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

@Supereg
Member

Supereg commented Jan 4, 2022

Haven't looked at the core dump but the other tests were helpful, thanks @ebaauw.

Just pushed hap-nodejs@0.10.0-beta.5 and homebridge@1.4.0-beta.2, which use our dbus-native fork that removes the abstract-socket dependency. @ebaauw could you verify that the issue is indeed fixed?

@ebaauw
Contributor Author

ebaauw commented Jan 4, 2022

Unfortunately, it's not fixed.

$ sudo npm -g i homebridge@beta

added 1 package, removed 5 packages, changed 104 packages, and audited 106 packages in 3m

42 packages are looking for funding
  run `npm fund` for details

5 vulnerabilities (3 low, 2 moderate)

To address issues that do not require attention, run:
  npm audit fix

To address all issues (including breaking changes), run:
  npm audit fix --force

Run `npm audit` for details.
$ cd /usr/local/lib/node_modules/homebridge/node_modules/hap-nodejs
$ node
Welcome to Node.js v16.13.1.
Type ".help" for more information.
> dbus = require('dbus-native')
Uncaught Error: Cannot find module 'dbus-native'
Require stack:
- <repl>
    at Function.Module._resolveFilename (node:internal/modules/cjs/loader:933:15)
    at Function.Module._load (node:internal/modules/cjs/loader:778:27)
    at Module.require (node:internal/modules/cjs/loader:1005:19)
    at require (node:internal/modules/cjs/helpers:102:18) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [ '<repl>' ]
}
> dbus = require('@homebridge/dbus-native')


#
# Fatal error in , line 0
# Liftoff bailout should not happen. Cause: Armv6 not supported

#
#
#
#FailureMessage Object: 0xb61fdf50
Trace/breakpoint trap (core dumped)
$ 

@ebaauw
Contributor Author

ebaauw commented Jan 4, 2022

It's the long dependency in @homebridge/dbus-native that causes the crash. The other dependencies can be require()ed without issue.

@adriancable
Contributor

@ebaauw - yup, I was just about to guess the same thing, as long is the only thing that uses WASM.

@Supereg - can you fork long, and in src/long.js at the top remove this completely:

try {
  wasm = new WebAssembly.Instance(new WebAssembly.Module(new Uint8Array([
    0, 97, 115, 109, 1, 0, 0, 0, 1, 13, 2, 96, 0, 1, 127, 96, 4, 127, 127, 127, 127, 1, 127, 3, 7, 6, 0, 1, 1$
  ])), {}).exports;
} catch (e) {
  // no wasm support :(
}

It won't affect the functionality of the dependency, but I am 99% sure this will fix @ebaauw 's issue.

@ebaauw
Contributor Author

ebaauw commented Jan 4, 2022

Patching the file to comment out these lines makes it load just fine.

@Supereg
Member

Supereg commented Jan 4, 2022

> @ebaauw - yup, I was just about to guess the same thing, as long is the only thing that uses WASM.
>
> @Supereg - can you fork long, and in src/long.js at the top remove this completely:
>
> try {
>   wasm = new WebAssembly.Instance(new WebAssembly.Module(new Uint8Array([
>     0, 97, 115, 109, 1, 0, 0, 0, 1, 13, 2, 96, 0, 1, 127, 96, 4, 127, 127, 127, 127, 1, 127, 3, 7, 6, 0, 1, 1$
>   ])), {}).exports;
> } catch (e) {
>   // no wasm support :(
> }
>
> It won't affect the functionality of the dependency, but I am 99% sure this will fix @ebaauw 's issue.

Seems like their wasm check isn't really working. Might we just raise a PR to disable wasm on armv6? I think this would be the better solution instead of forking another level deeper.

@ebaauw could you manually modify the dependency sources to remove the wasm check, to verify that this is the issue?

Can we check if we are running armv6 from inside node? Disabling the Avahi responder for those platforms might also be a short-term workaround to unblock the beta.
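Why a check like the one in long.js cannot help here: its `try`/`catch` only sees errors that V8 reports to JavaScript, such as the `WebAssembly.CompileError` thrown for malformed bytes. The backtrace above instead shows the Liftoff bailout ending in `V8_Fatal` and `OS::Abort()` on a background compile thread, which kills the process before any handler can run. A small sketch of the part of such a check that does work; `tryCompile` is a hypothetical name:

```javascript
// A feature check in the style of long.js: build a module, fall back if that
// throws. This handles errors V8 throws into JavaScript, but it cannot catch
// a fatal error that aborts the whole process.
function tryCompile(bytes) {
  try {
    return { ok: true, module: new WebAssembly.Module(new Uint8Array(bytes)) };
  } catch (err) {
    return { ok: false, error: err };
  }
}

// Malformed bytes: V8 throws a WebAssembly.CompileError, which the catch handles.
const bad = tryCompile([1, 2, 3]);
console.log(bad.ok, bad.error instanceof WebAssembly.CompileError);

// The 8-byte header alone ("\0asm" plus version 1) is a valid, empty module.
// It is the same header long.js's byte array starts with.
const empty = tryCompile([0, 97, 115, 109, 1, 0, 0, 0]);
console.log(empty.ok);
```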

@ebaauw
Contributor Author

ebaauw commented Jan 4, 2022

Can confirm that Homebridge is now also working.

> Can we check if we are running armv6 from inside node?

os.cpus()
On the Zero:

> os.cpus()
[
  {
    model: 'ARMv6-compatible processor rev 7 (v6l)',
    speed: 1000,
    times: {
      user: 59082220,
      nice: 16460,
      sys: 30106420,
      idle: 331277560,
      irq: 0
    }
  }
]

On the Pi 4B:

> os.cpus()
[
  {
    model: 'ARMv7 Processor rev 3 (v7l)',
    speed: 1800,
    times: { user: 2331240, nice: 220, sys: 3279330, idle: 88322050, irq: 0 }
  },
  {
    model: 'ARMv7 Processor rev 3 (v7l)',
    speed: 1800,
    times: { user: 2448290, nice: 280, sys: 3308480, idle: 88409140, irq: 0 }
  },
  {
    model: 'ARMv7 Processor rev 3 (v7l)',
    speed: 1800,
    times: { user: 2461520, nice: 60, sys: 3242070, idle: 88453830, irq: 0 }
  },
  {
    model: 'ARMv7 Processor rev 3 (v7l)',
    speed: 1800,
    times: { user: 2345290, nice: 6140, sys: 3249470, idle: 88622690, irq: 0 }
  }
]

And on the Zero 2:

> os.cpus()
[
  {
    model: 'ARMv7 Processor rev 4 (v7l)',
    speed: 600,
    times: {
      user: 17346790,
      nice: 7610,
      sys: 6920090,
      idle: 1054929340,
      irq: 0
    }
  },
  {
    model: 'ARMv7 Processor rev 4 (v7l)',
    speed: 600,
    times: {
      user: 23631000,
      nice: 3240,
      sys: 6671930,
      idle: 1047877750,
      irq: 0
    }
  },
  {
    model: 'ARMv7 Processor rev 4 (v7l)',
    speed: 600,
    times: {
      user: 29179920,
      nice: 9240,
      sys: 7372430,
      idle: 1043860090,
      irq: 0
    }
  },
  {
    model: 'ARMv7 Processor rev 4 (v7l)',
    speed: 600,
    times: {
      user: 16367300,
      nice: 5190,
      sys: 4918790,
      idle: 1060089450,
      irq: 0
    }
  }
]
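Based on the output above, a check on the model string is enough to tell the Zero apart from the Pi 4B and the Zero 2. A minimal sketch; `isArmv6` is a hypothetical name, and it takes the CPU list as a parameter so it can be tried against the sample strings from this thread:

```javascript
const os = require('os');

// The Zero reports "ARMv6-compatible processor ... (v6l)"; the Pi 4B and
// Zero 2 report "ARMv7 Processor ... (v7l)". Only the model string is used.
function isArmv6(cpus = os.cpus()) {
  return cpus.some((cpu) => /ARMv6|\(v6l\)/.test(cpu.model));
}

console.log('ARMv6:', isArmv6());
```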

@ebaauw
Contributor Author

ebaauw commented Jan 4, 2022

Hm, the model doesn't make sense on the Pi 4B. It seems to match the CPU listed in /proc/cpuinfo, which is wrongly reported by the kernel, see https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#raspberry-pi-revision-codes. I have a parser for the revision code in homebridge-lib.
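Because the kernel's model string can be wrong (as with the Pi 4B), another option is to decode the `Revision` field of `/proc/cpuinfo` using the new-style revision-code layout from the Raspberry Pi documentation linked above: bit 23 flags a new-style code, bits 4 to 11 give the board type, and bits 12 to 15 identify the SoC, of which only the BCM2835 (Pi 1, Zero, Zero W) is ARMv6. This is an independent sketch, not the homebridge-lib parser; all function names are hypothetical, and old-style codes are simply assumed to be BCM2835 boards since they predate the Pi 2.

```javascript
const fs = require('fs');

// Processor field (bits 12-15) of a new-style revision code.
const PROCESSORS = ['BCM2835', 'BCM2836', 'BCM2837', 'BCM2711', 'BCM2712'];

// Decode a revision code such as 0xc03111 (Pi 4B) or 0x9000c1 (Zero W).
function decodeRevision(code) {
  if ((code & (1 << 23)) === 0) {
    // Old-style codes only exist for original Pi 1 boards (BCM2835).
    return { newStyle: false, processor: 'BCM2835', type: null };
  }
  const processorId = (code >> 12) & 0xf;
  return {
    newStyle: true,
    processor: PROCESSORS[processorId] || `unknown (${processorId})`,
    type: (code >> 4) & 0xff,
  };
}

// Extract the hexadecimal "Revision" value from /proc/cpuinfo text.
function revisionFromCpuinfo(text) {
  const match = /^Revision\s*:\s*([0-9a-f]+)\s*$/im.exec(text);
  return match ? parseInt(match[1], 16) : null;
}

// True only when the board's SoC is the ARMv6 BCM2835.
function isArmv6Board(cpuinfoText) {
  const code = revisionFromCpuinfo(cpuinfoText);
  return code !== null && decodeRevision(code).processor === 'BCM2835';
}

if (fs.existsSync('/proc/cpuinfo')) {
  console.log('ARMv6 board:', isArmv6Board(fs.readFileSync('/proc/cpuinfo', 'utf8')));
}
```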

@adriancable
Contributor

adriancable commented Jan 4, 2022

@ebaauw @Supereg @oznu - I have just filed an issue on the long GitHub repository. In the meantime we can do the CPU check ourselves and not require dbus-native if it fails. (We can also probably go back to the non-forked dbus-native, since the abstract-socket thing was a red herring.)

@ebaauw
Contributor Author

ebaauw commented Jan 4, 2022

> We can also probably go back to the non-forked dbus-native, since the abstract-socket thing was a red herring.

I can confirm: Homebridge is still working after re-installing 1.4.0-beta.1 and patching long.js.

However, I think we would still want to skip node-gyp: it takes forever on the Pi Zero, and it's almost impossible to upgrade Homebridge from the UI. Make sure to increase the swapfile size on the Pi before even thinking about trying.

@adriancable
Contributor

adriancable commented Jan 5, 2022

@ebaauw et al. - so it sounds like we have a plan.

Short term we use @homebridge/dbus-native (to get rid of abstract-socket) but only require it in HAP-NodeJS if !os.cpus().find(cpu => cpu.model.includes('ARMv6')). This gets the beta working on all platforms but you lose Avahi on ARMv6.

Alternatively: we also fork long and fix that, with the aim of getting rid of the fork once it's fixed upstream. That way we get to keep Avahi even on ARMv6, right now.

Longer term we aim to get long fixed properly (as per my issue post) so we can then update the dependency in @homebridge/dbus-native to remove the require check and then Avahi will work on ARMv6.
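The short-term plan amounts to guarding the `require` itself, since the crash happens as soon as the module is loaded (see the REPL sessions above). A minimal sketch under that assumption; `loadDbusClient` is hypothetical, and the injectable `cpus` and `requireFn` parameters exist only so the logic can be exercised off a Pi:

```javascript
const os = require('os');

// Heuristic from this thread: any CPU model string mentioning ARMv6.
function isArmv6(cpus = os.cpus()) {
  return cpus.some((cpu) => cpu.model.includes('ARMv6'));
}

// Skip the D-Bus client (and with it the WASM-using `long` dependency) on
// ARMv6, so the Avahi advertiser is unavailable there instead of crashing.
function loadDbusClient({ cpus = os.cpus(), requireFn = require } = {}) {
  if (isArmv6(cpus)) {
    return { dbus: null, reason: 'ARMv6: Avahi advertiser disabled' };
  }
  try {
    return { dbus: requireFn('@homebridge/dbus-native'), reason: null };
  } catch (err) {
    return { dbus: null, reason: `dbus client unavailable: ${err.message}` };
  }
}

const { dbus, reason } = loadDbusClient();
console.log(dbus ? 'D-Bus client loaded' : reason);
```

The `try`/`catch` around `requireFn` only covers the module being absent; it is the ARMv6 check that keeps the fatal Liftoff path from ever being reached.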

@Supereg
Member

Supereg commented Jan 5, 2022

@adriancable thanks for escalating the issue to long and node 🚀

@adriancable
Contributor

adriancable commented Jan 5, 2022

@Supereg - I've now escalated it to V8 itself: https://bugs.chromium.org/p/v8/issues/detail?id=12527

Of course since the chain of blame has now got so long, it also makes the expected time-to-resolution long as well. I am now thinking we should also do a @homebridge/long and use that in @homebridge/dbus-native, and then as things get fixed upstream we can move back to the unforked dependencies. What do you think?

Another reason for doing a @homebridge/long - it gives Homebridge plug-in developers the ability to use long without breaking ARMv6. Actually, I think my homebridge-nest plug-in uses long, so this may well be useful for me.

@Supereg Supereg added this to in progress in v0.10.0 Jan 5, 2022
@Supereg
Member

Supereg commented Jan 5, 2022

Published HAP-NodeJS 0.10.0-beta.6 and homebridge 1.4.0-beta.3 which use the forked version of long. The issue should be fixed for now.

@ebaauw Could you please verify.

EDIT: Not sure where my message went or if I forgot to send it, but "again": saw your issue on V8. Thanks for that 👍. I decided to work around the issue by creating yet another fork of long.js incorporating our heuristic for checking armv6. This way we can still run the Avahi advertiser on those devices. Further, I agree with @ebaauw that it might be sensible to still maintain our dbus-native fork to avoid installing the unnecessary abstract-socket dependency. Both forks should be configured so that we are notified when the upstream is updated.
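The long.js fork is described here only at this level (the upstream wasm probe plus "our heuristic for checking armv6"), so the following is a hypothetical sketch of that approach, not the fork's actual code. `loadWasmHelpers` and `WASM_BYTES` are made-up names, and the byte array is only the 8-byte WebAssembly header (an empty module), standing in for long.js's real module bytes:

```javascript
const os = require('os');

// Same heuristic as above: an ARMv6 CPU model string.
function isArmv6(cpus = os.cpus()) {
  return cpus.some((cpu) => cpu.model.includes('ARMv6'));
}

// Placeholder bytes: just "\0asm" plus version 1, i.e. an empty module.
const WASM_BYTES = new Uint8Array([0, 97, 115, 109, 1, 0, 0, 0]);

// Skip WebAssembly entirely on ARMv6 (where V8 aborts instead of throwing)
// or when the global is missing (e.g. --no-expose-wasm); keep the try/catch
// for ordinary compile or link errors everywhere else.
function loadWasmHelpers(cpus = os.cpus()) {
  if (typeof WebAssembly === 'undefined' || isArmv6(cpus)) return null;
  try {
    return new WebAssembly.Instance(new WebAssembly.Module(WASM_BYTES), {}).exports;
  } catch (e) {
    return null; // no usable wasm support: use the pure-JS code path
  }
}

const wasm = loadWasmHelpers();
console.log(wasm ? 'using WASM fast path' : 'using pure-JS fallback');
```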

@ebaauw
Contributor Author

ebaauw commented Jan 5, 2022

Looking good, also with Avahi advertiser selected; I see the advertisement in Discovery.

@Supereg
Member

Supereg commented Jan 6, 2022

I propose closing the issue for now. The issue is mitigated for hap-nodejs users.
The core issue itself is tracked in the long, Node.js and V8 projects, respectively.

@Supereg Supereg closed this as completed Jan 6, 2022
@Supereg Supereg moved this from in progress to done in v0.10.0 Jan 6, 2022
@adriancable
Contributor

adriancable commented Jan 7, 2022

@Supereg et al. - wow, the Google guys are responsive. They confirmed this is an issue and have already posted a CL to get this fixed in V8. Now I am pushing to get it fixed on Node's V8 fork so we don't need to wait for the upstream fix.

If you are interested, the fix for V8 is here: https://chromium-review.googlesource.com/c/v8/v8/+/3372915/

AntonioMeireles added a commit to AntonioMeireles/homebridge-vieramatic that referenced this issue Jan 21, 2022
- of note:
  - hap-nodejs 0.9.8 -> 0.10.0
  - homebridge 1.3.9 -> 1.4.0

 as they fix homebridge/homebridge#3005 /
 homebridge/HAP-NodeJS#919
 which hit us in #88

Signed-off-by: António Meireles <antonio.meireles@reformi.st>