
Ember Try scenario failing on Travis with a Segmentation fault #360

Closed · samselikoff opened this issue May 31, 2019 · 22 comments

@samselikoff

Hi! I'm investigating a Travis failure in one of my Ember Try scenarios and am looking for any guidance. I have no idea if this has anything to do with Ember Try but figured I'd start here.

Here's the failure. If you look back at the full build you'll see the same failure for all Versioned tests.

When I tried one of them locally by running

ember try:one ember-lts-2.18

it passed with no problem.

My next guess was that the segmentation fault had something to do with Travis' cache. I thought maybe it was due to all the PRs Dependabot was opening. I went back to Travis, deleted all caches, and re-ran master. No change; the fault still happened.

I then thought it might have been due to a code change on my end, so I went back to the last-passing build and re-ran it. I saw the same failures on the Versioned tests.


Any idea what could be going on? Is there possibly a memory leak that I'm not seeing locally but that is causing Travis to blow up?

Any help much appreciated!

@kategengler
Member

Can you try running with DEBUG=ember-try* {{scenario-command}} on Travis, so we can see where the segfault happens? It's at the end, after the results are printed; the only thing ember-try does at that point is cleanup.
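
For example, locally or via the Travis config on a branch (using the scenario mentioned earlier in the thread; the exact scenario name will vary per project):

DEBUG=ember-try* ember try:one ember-lts-2.18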

@rwjblue
Member

rwjblue commented May 31, 2019

FYI - @scalvert has been digging into this same issue (over in the ember-app-scheduler repo). Not yet sure what is going on, and we can't get it to repro locally yet.

@scalvert

Correct. I've got the same issue on both ember-app-scheduler and ember-lifeline. Specifically, ember-lifeline's HEAD of master was passing CI; after seeing the issues in ember-app-scheduler, and to test whether the problem was isolated to that repo, I restarted ember-lifeline's build job in Travis to see if it segfaulted too. It did.

Steps I've taken to try to isolate it (ember-app-scheduler/ember-app-scheduler#312):

  1. Downgraded node to 8.15 (ember-app-scheduler/ember-app-scheduler@1e0c5fb)
  2. Upgraded node to 10 (ember-app-scheduler/ember-app-scheduler@b86074f)
  3. Downgraded eslint-plugin-prettier (ember-app-scheduler/ember-app-scheduler@4614811)
  4. Downgraded ember-cli (ember-app-scheduler/ember-app-scheduler@5cdcd1b)
  5. Upgraded ember-try to latest (ember-app-scheduler/ember-app-scheduler@9be9968)
  6. Acquired debug access to ember-app-scheduler, triggered debug builds, SSHed into the box, and was able to reproduce the segfault. I've been working with Travis support to see if I can get access to the core dumps.

As mentioned, I've been engaged with Travis support to try to investigate.

@scalvert

> Can you try running with DEBUG=ember-try* {{scenario-command}} on Travis, so we can see where the segfault happens? It's at the end, after the results are printed; the only thing ember-try does at that point is cleanup.

I can try this when I'm SSHed into the box.

@kategengler
Member

It's also possible to try that by updating the Travis config on a branch.

Worth noting: the latest passing ember-cli-mirage build was on the latest ember-try.

@scalvert

Yep, upgrading ember-try to latest had no effect on the occurrence of segfaults.

@kategengler
Member

Any update on this?

@scalvert

Getting closer. A pesky weekend got in the way of further debugging efforts. I plan to focus on this today.

@scalvert

Here's the top of the stack from the core dump from ember-lifeline:

(llnode) v8 bt
 * thread #1: tid = 0, 0x00007ff13af81554 sharp.node`std::queue<std::string, std::deque<std::string, std::allocator<std::string> > >::~queue() + 532, name = 'ember', stop reason = signal SIGSEGV
    frame #0: 0x00007ff13af81554 sharp.node`std::queue<std::string, std::deque<std::string, std::allocator<std::string> > >::~queue() + 532
    frame #1: 0x00007ff13d1ea1a9 libc.so.6`??? + 217
    frame #2: 0x00007ff13d1ea1f5 libc.so.6`exit + 21
    frame #3: 0x00000000008ce31f node`node::Exit(v8::FunctionCallbackInfo<v8::Value> const&) + 111
    frame #4: 0x0000000000a98153 node`v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) + 403
    frame #5: 0x0000000000b0f37c node`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) + 332
    frame #6: 0x0000000000b0ffcf node`v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) + 175
    frame #7: 0x0000311f78f042fd <exit>
  * frame #8: 0x0000311f78fbedb6 process.exit(this=0x22321889cf9:<Object: process>, <Smi: 0>) at (external).js:140:26 fn=0x0000278746d26539
    frame #9: 0x0000311f78fbedb6 process.exit(this=0x22321889cf9:<Object: process>, <Smi: 0>) at /home/travis/build/ember-lifeline/ember-lifeline/node_modules/capture-exit/index.js:63:26 fn=0x0000021b2c7f18c9
    frame #10: 0x0000311f78fbedb6 tryToExit(this=0x39d479b822d1:<undefined>) at /home/travis/build/ember-lifeline/ember-lifeline/node_modules/exit/lib/exit.js:15:21 fn=0x00001949f5a51861
    frame #11: 0x0000311f78fbedb6 exit(this=0x39d479b822d1:<undefined>, <Smi: 0>, 0x39d479b822d1:<undefined>) at /home/travis/build/ember-lifeline/ember-lifeline/node_modules/exit/lib/exit.js:11:31 fn=0x0000359cd89084e9
    frame #12: 0x0000311f78f0535f <adaptor>
    frame #13: 0x0000311f78fbedb6 (anonymous)(this=0x39d479b822d1:<undefined>, <Smi: 0>) at /home/travis/build/ember-lifeline/ember-lifeline/node_modules/ember-cli/bin/ember:39:17 fn=0x0000391c89f1f009
    frame #14: 0x0000311f78fbedb6 tryCatcher(this=0x39d479b822d1:<undefined>) at /home/travis/build/ember-lifeline/ember-lifeline/node_modules/rsvp/dist/rsvp.js:322:22 fn=0x0000021b2c788481
    frame #15: 0x0000311f78f0535f <adaptor>
    frame #16: 0x0000311f78fbedb6 invokeCallback(this=0x39d479b822d1:<undefined>, <Smi: 1>, 0x391c89f1efc1:<Object: Promise>, 0x391c89f1f009:<function: (anonymous) at /home/travis/build/ember-lifeline/ember-lifeline/node_modules/ember-cli/bin/ember:39:17>, <Smi: 0>) at /home/travis/build/ember-lifeline/ember-lifeline/node_modules/rsvp/dist/rsvp.js:493:26 fn=0x0000021b2c788799
    frame #17: 0x0000311f78fbedb6 publish(this=0x39d479b822d1:<undefined>, 0x391c89f1eda9:<Object: Promise>) at /home/travis/build/ember-lifeline/ember-lifeline/node_modules/rsvp/dist/rsvp.js:463:19 fn=0x0000021b2c788751
    frame #18: 0x0000311f78fbedb6 flush(this=0x39d479b822d1:<undefined>) at /home/travis/build/ember-lifeline/ember-lifeline/node_modules/rsvp/dist/rsvp.js:2436:17 fn=0x0000021b2c788d01
    frame #19: 0x0000311f78fbedb6 _combinedTickCallback(this=0x39d479b822d1:<undefined>, 0x39d479b822d1:<undefined>, 0x21b2c788d01:<function: flush at /home/travis/build/ember-lifeline/ember-lifeline/node_modules/rsvp/dist/rsvp.js:2436:17>) at (external).js:130:33 fn=0x00002d1858398529
    frame #20: 0x0000311f78fbedb6 _tickCallback(this=0x22321889cf9:<Object: process>) at (external).js:152:25 fn=0x000002232188c821
    frame #21: 0x0000311f78f04239 <internal>
    frame #22: 0x0000311f78f04101 <entry>
    frame #23: 0x0000000000da7d6a node`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) + 266
    frame #24: 0x0000000000a7a793 node`v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) + 355
    frame #25: 0x0000000000a89211 node`v8::Function::Call(v8::Local<v8::Value>, int, v8::Local<v8::Value>*) + 65
    frame #26: 0x00000000008ce228 node`node::InternalCallbackScope::Close() + 456
    frame #27: 0x00000000008cfa26 node`node::InternalMakeCallback(node::Environment*, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) + 198
    frame #28: 0x00000000008a5b08 node`node::AsyncWrap::MakeCallback(v8::Local<v8::Function>, int, v8::Local<v8::Value>*) + 120
    frame #29: 0x00000000008fcec9 node`node::(anonymous namespace)::After(uv_fs_s*) + 329
    frame #30: 0x00000000009b53f5 node`uv__work_done(handle=0x0000000002186b50) + 165 at threadpool.c:313
    frame #31: 0x00000000009b989b node`uv__async_io(loop=0x0000000002186aa0, w=<unavailable>, events=<unavailable>) + 267 at async.c:118
    frame #32: 0x00000000009ca5c0 node`uv__io_poll(loop=0x0000000002186aa0, timeout=6578) + 752 at linux-core.c:375
    frame #33: 0x00000000009ba265 node`uv_run(loop=0x0000000002186aa0, mode=UV_RUN_DEFAULT) + 405 at core.c:370
    frame #34: 0x00000000008d6815 node`node::Start(uv_loop_s*, int, char const* const*, int, char const* const*) + 1205
    frame #35: 0x00000000008d5b70 node`node::Start(int, char**) + 352
    frame #36: 0x00007ff13d1cff45 libc.so.6`__libc_start_main + 245
    frame #37: 0x000000000089f301 node`_start + 41

The top of the stack is capture-exit, which captures and ultimately calls process.exit. There are also a number of RSVP calls directly before it. I'm going to inspect each frame to see if there's any more info.
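
For reference, a core dump like this can be loaded with llnode roughly as follows (a sketch, assuming the dump was saved as core and the matching node binary is available):

llnode node -c core
(llnode) v8 bt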

@kategengler
Member

Did you ever try running with DEBUG=ember-try*?

@scalvert

Yes, it didn't provide any useful information, unfortunately.

@kategengler
Member

I was mostly wondering: what was the last thing that happened before the segfault?

@scalvert

Well, the tests complete, and the process seems to 'hang' afterward.

I stood up an Ubuntu image in Azure to attempt to replicate it there, mainly because Travis' debug sessions have a timeout configured, which means the session will spontaneously end mid-debugging.

I was unable to reproduce the segfault on that server, though the process does hang for a significant amount of time after the tests complete successfully.

@kategengler
Member

I was wondering because, after all the tests run, there is a step where ember-try cleans up and reinstalls node_modules; it's possible the segfault is happening during that step.

@scalvert

Ah gotcha. @rwjblue @krisselden and I are chatting about it right now to see if we can determine the issue. It's now happening in @ember/test-helpers too :/

@scalvert

I tried running with DEBUG=ember-try*, and the cleanup phase appears to complete without issue. I can now reproduce the segfault on my Ubuntu machine in Azure. I have a core dump and am inspecting it again.

@scalvert

scalvert commented Jun 11, 2019

We've identified the issue. It stems from the ember-cli-favicon addon, which has a transitive dependency on the sharp node package, a library used for image processing.

azureuser@travis:~/ember-lifeline/ember-lifeline$ yarn why sharp
yarn why v1.16.0
[1/4] Why do we have the module "sharp"...?
[2/4] Initialising dependency graph...
[3/4] Finding dependency...
[4/4] Calculating file sizes...
=> Found "sharp@0.22.1"
info Reasons this module exists
   - "ember-cli-favicon#broccoli-favicon#favicons" depends on it
   - Hoisted from "ember-cli-favicon#broccoli-favicon#favicons#sharp"
info Disk size without dependencies: "31.6MB"
info Disk size with unique dependencies: "32.67MB"
info Disk size with transitive dependencies: "35.05MB"
info Number of shared dependencies: 46
Done in 2.34s.

It's the sharp library itself that is causing the segfault, as can be seen from the stack in llnode:

(llnode) v8 bt
 * thread #1: tid = 0, 0x00007ff13af81554 sharp.node`std::queue<std::string, std::deque<std::string, std::allocator<std::string> > >::~queue() + 532, name = 'ember', stop reason = signal SIGSEGV
    frame #0: 0x00007ff13af81554 sharp.node`std::queue<std::string, std::deque<std::string, std::allocator<std::string> > >::~queue() + 532
    frame #1: 0x00007ff13d1ea1a9 libc.so.6`??? + 217
    frame #2: 0x00007ff13d1ea1f5 libc.so.6`exit + 21
    frame #3: 0x00000000008ce31f node`node::Exit(v8::FunctionCallbackInfo<v8::Value> const&) + 111

In the favicons library, sharp was added in this commit: itgalaxy/favicons@928524c#diff-b9cfc7f2cdf78a7f4b91a753d10865a2. Since ember-try installs with --no-lockfile, we get upgraded to the version of broccoli-favicon that pulls in the version of favicons that includes sharp.

Workaround to unblock:

  • Use resolutions to pin favicons to v5.3.0:
"resolutions": {
  "favicons": "5.3.0"
}
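
Since ember-try installs with --no-lockfile, pinning via yarn.lock wouldn't help; the resolutions entry constrains resolution regardless of the lockfile. As a quick sanity check after adding it (a sketch; output will vary per project):

yarn install --no-lockfile
yarn why favicons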

We're trying to figure out the best place to report this issue.

@kategengler
Member

Wow! So many levels...

(I am continually amazed anything ever works)

samselikoff added a commit to miragejs/ember-cli-mirage that referenced this issue Jun 11, 2019
@stefanpenner
Contributor

stefanpenner commented Jun 12, 2019

My guess is that it's related to this queue: https://github.com/lovell/sharp/blob/aa9b328778ef00971e883365ebedd480799394a2/src/common.cc#L420 and likely an issue with libc.so on the Linux image Travis uses.

vipsWarnings is a statically allocated variable in the sharp namespace. If my memory of C++ is correct, static variables like this are destroyed in LIFO order when the main() function exits, which seems to be when the issue is occurring.
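
As a minimal C++ sketch of that destruction-order behaviour (hypothetical names, not sharp's actual code):

#include <cstdio>
#include <queue>
#include <string>

// Hypothetical stand-in for sharp's vipsWarnings: a namespace-scope static
// whose destructor runs during program teardown, after main() returns.
static std::queue<std::string> warnings;

struct LaterStatic {
  ~LaterStatic() {
    // Statics are destroyed in reverse (LIFO) order of construction, so this
    // destructor runs before ~warnings. If teardown code instead touched a
    // static that had already been destroyed, the result would be undefined
    // behaviour -- the kind of exit-time crash being hypothesized here.
    std::printf("LaterStatic destroyed\n");
  }
};

static LaterStatic later;  // constructed after `warnings`, destroyed before it

int main() {
  warnings.push("example warning");
  return 0;  // `later`, then `warnings`, are destroyed after main() returns
}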

I could be way off base; with a local reproduction it would likely not be too hard to figure out. If such a repro exists, I would recommend:

  1. Recompile and confirm it still crashes.
  2. Remove the warning-related code that interacts with the queue, and see if that fixes it.
  3. If that leads to something, narrow in.

@stefanpenner
Contributor

What version of glibc is on those Linux boxes?

@scalvert

(Ubuntu EGLIBC 2.19-0ubuntu6.15) 2.19

@rwjblue
Member

rwjblue commented Nov 2, 2020

Going to close for now, happy to reopen if folks think this is still an issue.
