Tests segfault on startup #5663

Closed
pmconne opened this issue Jun 21, 2023 · 16 comments · Fixed by #5661
Labels
bug Something isn't working

@pmconne
Member

pmconne commented Jun 21, 2023

Describe the bug
`rush cover` in CI jobs is producing segmentation faults on macOS and Linux.

To Reproduce
Steps to reproduce the behavior:

  1. Consult the CI job results for "Force node 18.16.0 for CI" (#5661).
  2. Observe multiple test suites producing segmentation faults on macOS and Linux.

Screenshots

`rush cover` summary (Linux)

==[ FAILURE: 7 operations ]====================================================

--[ FAILURE: @itwin/analytical-backend ]---------------------[ 5.70 seconds ]--

Segmentation fault (core dumped)

--[ FAILURE: @itwin/core-backend ]--------------------------[ 13.29 seconds ]--

Segmentation fault (core dumped)

--[ FAILURE: @itwin/linear-referencing-backend ]-------------[ 5.74 seconds ]--

Segmentation fault (core dumped)

--[ FAILURE: @itwin/physical-material-backend ]--------------[ 4.57 seconds ]--

Segmentation fault (core dumped)

--[ FAILURE: core-full-stack-tests ]------------------------[ 37.85 seconds ]--

WARNING: Tests attempted to load missing asset: "/locales/en-US/iModelJs.json"
WARNING: Tests attempted to load missing asset: "/locales/en-US/CoreTools.json"
WARNING: Tests attempted to load missing asset: "/locales/en-US/Editor.json"
WARNING: Tests attempted to load missing asset: "/locales/en/Editor.json"
WARNING: Tests attempted to load missing asset: "/locales/en-US/TestApp.json"
WARNING: Tests attempted to load missing asset: "/locales/en/TestApp.json"
Segmentation fault (core dumped)

--[ FAILURE: example-code-snippets ]-------------------------[ 4.01 seconds ]--

Segmentation fault (core dumped)

--[ FAILURE: presentation-full-stack-tests ]-----------------[ 5.19 seconds ]--

Invoking: npm run -s test 
Backend PID: 19201


  Default supplemental rules
[2023-06-21T01:35:05.890Z] Tests initialized
    Content modifiers
      bis.Element
        Related properties


Operations failed.

rush cover (4 minutes 51.9 seconds)
##[error]Bash exited with code '1'.
Finishing: rush cover

`rush cover` summary (macOS)

==[ FAILURE: 8 operations ]====================================================

--[ FAILURE: @itwin/analytical-backend ]---------------------[ 3.68 seconds ]--

Invoking: nyc npm -s test 


  AnalyticalSchema

=============================== Coverage summary ===============================
Statements   : Unknown% ( 0/0 )
Branches     : Unknown% ( 0/0 )
Functions    : Unknown% ( 0/0 )
Lines        : Unknown% ( 0/0 )
================================================================================

--[ FAILURE: @itwin/core-backend ]---------------------------[ 8.20 seconds ]--

Invoking: nyc npm -s test 


  Category

=============================== Coverage summary ===============================
Statements   : Unknown% ( 0/0 )
Branches     : Unknown% ( 0/0 )
Functions    : Unknown% ( 0/0 )
Lines        : Unknown% ( 0/0 )
================================================================================

--[ FAILURE: @itwin/linear-referencing-backend ]-------------[ 2.67 seconds ]--

Invoking: nyc npm -s test 


  LinearReferencing Domain

=============================== Coverage summary ===============================
Statements   : Unknown% ( 0/0 )
Branches     : Unknown% ( 0/0 )
Functions    : Unknown% ( 0/0 )
Lines        : Unknown% ( 0/0 )
================================================================================

--[ FAILURE: @itwin/physical-material-backend ]--------------[ 2.78 seconds ]--

Invoking: nyc npm -s test 


  PhysicalMaterialSchema

=============================== Coverage summary ===============================
Statements   : Unknown% ( 0/0 )
Branches     : Unknown% ( 0/0 )
Functions    : Unknown% ( 0/0 )
Lines        : Unknown% ( 0/0 )
================================================================================

--[ FAILURE: core-full-stack-tests ]------------------------[ 20.89 seconds ]--

WARNING: Tests attempted to load missing asset: "/locales/en-US/iModelJs.json"
WARNING: Tests attempted to load missing asset: "/locales/en-US/CoreTools.json"
sh: line 1:   299 Segmentation fault: 11  npm run -s test:chrome

--[ FAILURE: example-code-app ]------------------------------[ 4.88 seconds ]--

sh: line 1:   710 Segmentation fault: 11  mocha --no-config

rush cover (3 minutes 45.5 seconds)
##[error]Bash exited with code '1'.
Finishing: rush cover

Additional context

Failing build pipeline.
First observed in #5660. Occurred on all 3 runs of the pipeline.
#5661 produces similar results, with no code changes vs master.
Each test suite crashes without completing a single test. Only test suites that use @itwin/core-backend are affected.
The most recent addon included upgrades of several third-party libraries. These failures were not observed at the time the new addon was integrated.
I have not been able to reproduce the problem by running `rush cover` on Ubuntu 22.04.

@pmconne pmconne added the bug Something isn't working label Jun 21, 2023
@pmconne pmconne added this to the iTwin.js 4.0 milestone Jun 21, 2023
@pmconne
Member Author

pmconne commented Jun 21, 2023

@chuckkir @nick4598 any theories?

@pmconne
Member Author

pmconne commented Jun 21, 2023

Not observed in nightly job for 4.1.0-dev.40.
Observed in nightly job for 4.1.0-dev.41.
There were no commits to master in between those two.

@pmconne
Member Author

pmconne commented Jun 21, 2023

I just observed the seg fault running core-backend tests on Ubuntu on a branch newly-created from master.
I'll see if I can repro with a locally-built (debuggable) addon.

@nick4598
Contributor

nick4598 commented Jun 21, 2023

dev.41 is using Node 18.16.1 and dev.40 is using 18.16.0; I'm not sure whether that would be enough to cause the issue. 18.16.1 was also released within the last 24 hours, I believe.

Chuck also ran into segfaults when updating c-ares, and 18.16.1 appears to have included some c-ares vulnerability fixes. The fix in Chuck's case was to hide the symbols from the global space (similar to what Affan did to fix our segfaulting OpenSSL), because Node was stepping on the symbols from our version of c-ares in libsrc and causing segfaults. My guess is that this is the cause.

We haven't produced a new addon with the fix, but it is already in master. https://github.com/iTwin/imodel-native/pull/297/files#diff-88ba601cab81905cdeda950a5c1189da911b5b755ac49e752ab75f19c5031eab:~:text=ifdef%20__unix,%25endif

We could possibly pin our node dependency down to 18.16.0 and get around this until we have an addon out.
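
For anyone who wants to enforce that pin in a pipeline step, here is a minimal sketch of a version guard. It is only an illustration: the function names and the idea of failing the step are assumptions, not something taken from the iTwin.js build scripts. The only value that comes from this thread is the pinned version, 18.16.0.

```typescript
// Hypothetical CI guard. Only the version number below comes from this thread;
// the helper names are made up for illustration.
const PINNED_NODE_VERSION = "18.16.0";

/** Parses "18.16.1" or "v18.16.1" into [major, minor, patch]; throws on malformed input. */
function parseVersion(version: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(version.trim());
  if (match === null)
    throw new Error(`Unrecognized version string: ${version}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

/** Returns true only when `actual` is exactly the pinned version. */
function isPinnedNodeVersion(actual: string, pinned: string = PINNED_NODE_VERSION): boolean {
  const a = parseVersion(actual);
  const p = parseVersion(pinned);
  return a[0] === p[0] && a[1] === p[1] && a[2] === p[2];
}
```

In a Node script this could be called as `isPinnedNodeVersion(process.versions.node)`, exiting with a non-zero code when it returns false. It is a stopgap until an addon with the symbol fix is published, not a fix in itself.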

@pmconne
Member Author

pmconne commented Jun 21, 2023

> Maybe a stretch, but dev.41 is using node 18.16.1 and dev.40 is using 18.16.0

I repro'ed with 18.16.0. Only that once though. I suppose it's possible the crash is sporadic but more likely to occur with 18.16.1? I'll update to that.

@chuckkir

It could still be the same problem with a different library; the symbols are weak objects, which means that the linker will choose one definition, and its choice can change if one of the libraries changes. This article explains the symbols pretty well, although in the context of a different problem.

https://developers.redhat.com/articles/2021/10/27/compiler-option-hidden-visibility-and-weak-symbol-walk-bar#disabling_runtime_type_information
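
To make the symbol-clash idea a bit more concrete, here is a toy model of the lookup behaviour being described. It is a deliberately simplified sketch: real dynamic linkers are far more involved, weak versus strong binding is not modelled, and the module contents are hypothetical. Which copy of each c-ares symbol was actually chosen in the failing runs is not established in this thread.

```typescript
// Toy model only. It illustrates "first exported definition wins" versus
// "hidden definition binds locally"; it is not how any real linker is implemented.
type Visibility = "default" | "hidden";

interface LoadedModule {
  name: string;
  // Symbols the module defines, with their visibility.
  symbols: Map<string, Visibility>;
}

/**
 * Resolves `symbol` as referenced from `caller`.
 * A hidden definition inside the caller binds locally. Otherwise the first module
 * in load order that exports the symbol wins, even if the caller has its own copy.
 */
function resolveSymbol(loadOrder: LoadedModule[], caller: LoadedModule, symbol: string): string | undefined {
  if (caller.symbols.get(symbol) === "hidden")
    return caller.name;
  for (const mod of loadOrder) {
    if (mod.symbols.get(symbol) === "default")
      return mod.name;
  }
  return undefined;
}

// Hypothetical scenario: both the host process and the addon define ares_timeout.
const host: LoadedModule = {
  name: "node",
  symbols: new Map<string, Visibility>([["ares_timeout", "default"]]),
};
const addonExported: LoadedModule = {
  name: "imodeljs.node",
  symbols: new Map<string, Visibility>([["ares_timeout", "default"]]),
};
const addonHidden: LoadedModule = {
  name: "imodeljs.node",
  symbols: new Map<string, Visibility>([["ares_timeout", "hidden"]]),
};
```

In this model, `resolveSymbol([host, addonExported], addonExported, "ares_timeout")` returns `"node"`, while the same call with `addonHidden` returns `"imodeljs.node"`. That is the intuition behind hiding the addon's symbols: the addon always binds to its own copy, regardless of what the host process exports.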

@pmconne
Member Author

pmconne commented Jun 21, 2023

Tests passed on macOS after @nick4598 forced them to use 18.16.0. Linux still running - no seg faults yet.
I repro'ed immediately locally after switching to 18.16.1.
Debugging...

@pmconne
Member Author

pmconne commented Jun 21, 2023

imodeljs.node!ares_timeout (Unknown Source:0)
imodeljs.node!Curl_resolver_getsock (Unknown Source:0)
imodeljs.node![Unknown/Just-In-Time compiled code] (Unknown Source:0)
imodeljs.node!curl_multi_wait (Unknown Source:0)
imodeljs.node![Unknown/Just-In-Time compiled code] (Unknown Source:0)
imodeljs.node!BentleyM0200::BeSQLite::CloudContainer::PollManifest() (Unknown Source:0)
imodeljs.node![Unknown/Just-In-Time compiled code] (Unknown Source:0)
v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) (Unknown Source:0)
v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) (Unknown Source:0)
v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) (Unknown Source:0)
Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit (Unknown Source:0)
Builtins_InterpreterEntryTrampoline (Unknown Source:0)
[Unknown/Just-In-Time compiled code] (Unknown Source:0)
Builtins_JSConstructStubGeneric (Unknown Source:0)

@nick4598
Contributor

Yep, that's the same call stack we got when debugging Chuck's branch, which also updated c-ares. The fix is in master on imodel-native, but not in an addon yet.

@pmconne
Member Author

pmconne commented Jun 21, 2023

Build to publish new addon keeps hanging on Linux. It stops producing output while running the following parts (same parts both times):

['ECPresentation:UnitTests-NonPublished', 'ECDb:RunGtest', 'iModelPlatform:UnitTests-NonPublished', 'iModelPlatform:BuildIModelEvolutionTests', 'Visualization:UnitTests']

I repro'ed locally on Ubuntu (freezes my shell). I failed to note the list of parts that were running when it hung. I rebuilt single-threaded (bb -N1 b) and again using 4 threads - both builds succeeded.

@chuckkir

The build is taking the Linux boxes offline. I keep rerunning the one build and watch another machine go down. I feel like possibly the tests use a lot more memory than they did previously? What seems to be happening is that the box stops contacting the server so it is offline'd. My suspicion is that it is memory starved. I'm going to look at more logs to see if I can learn anything additional.

@nick4598
Contributor

Reopening issue until we have a node addon for 3.x and 4.x which resolves the segfault. Fix is in both main branch and release/3.x branch already.

@nick4598 nick4598 reopened this Jun 23, 2023
@nick4598
Contributor

> The build is taking the Linux boxes offline. I keep rerunning the one build and watch another machine go down. I feel like possibly the tests use a lot more memory than they did previously? What seems to be happening is that the box stops contacting the server so it is offline'd. My suspicion is that it is memory starved. I'm going to look at more logs to see if I can learn anything additional.

Were you able to get any information from the logs? I noticed there are a few new 'Prepare' tests added shortly before we attempted the new addon. Maybe those tests aren't at fault but were just enough to push our memory usage too high and make it more likely that the Linux boxes would crash?
iTwin/imodel-native@5b9169c

@chuckkir

Nothing new. The logs say that the connection was lost. The boxes don't crash, but they stop talking to the server so they get listed as "offline".

@tm-zub

tm-zub commented Sep 15, 2023

@nick4598 you mentioned that the fix was already in the branches and that you reopened this issue so it could be closed once the fix was verified in 4.x. Can you check and mark this issue accordingly?

@nick4598
Contributor

The 3.x fix is in itwinjs-core versions 3.7.11 and greater. Fix is also in all versions of itwinjs-core 4.1.x, and master as well.
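
As a rough sketch, the version ranges above could be checked like this. The function and type names are hypothetical, and anything the statement above does not cover (4.0.x, pre-release builds such as 4.1.0-dev.41, other major versions) is reported as "unknown" rather than guessed.

```typescript
// Hypothetical helper encoding only the ranges stated in this thread:
// 3.x releases from 3.7.11 onward and 4.1.x releases contain the fix.
type FixStatus = "fixed" | "lacks-fix" | "unknown";

function segfaultFixStatus(version: string): FixStatus {
  const match = /^(\d+)\.(\d+)\.(\d+)(-.+)?$/.exec(version.trim());
  if (match === null)
    throw new Error(`Unrecognized version string: ${version}`);

  // Pre-release builds (e.g. nightly "-dev.N" versions) are not covered by the statement.
  if (match[4] !== undefined)
    return "unknown";

  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  if (major === 3) {
    const atLeast3_7_11 = minor > 7 || (minor === 7 && patch >= 11);
    return atLeast3_7_11 ? "fixed" : "lacks-fix";
  }
  if (major === 4 && minor === 1)
    return "fixed";

  // 4.0.x and everything else: not stated in this thread.
  return "unknown";
}
```

For example, `segfaultFixStatus("3.7.10")` returns `"lacks-fix"` and `segfaultFixStatus("4.1.3")` returns `"fixed"`.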
