Hang monitor stack trace doesn't make sense #22604

Closed
jdm opened this issue Jan 3, 2019 · 7 comments

@jdm jdm commented Jan 3, 2019

I received these backtraces while loading https://newyorktimes.com:

Stack trace
stack backtrace:
   0:        0x110b8cfae - __ZN9backtrace9backtrace5trace17h1031ecbd57e70b1aE
   1:        0x110b8bf0c - __ZN72_$LT$backtrace..capture..Backtrace$u20$as$u20$core..default..Default$GT$7default17hd2735b19f50df4baE
   2:        0x10ea3c560 - __ZN5servo21install_crash_handler7handler17h5983254e49d3dc15E
   3:     0x7fffae9e2b39 - __sigtramp
   4:        0x110b89ae8 - __ZN112_$LT$background_hang_monitor..sampler_mac..MacOsSampler$u20$as$u20$background_hang_monitor..sampler..Sampler$GT$25suspend_and_sample_thread17h9efe9ca9cfb5f555E
   5:        0x110b83368 - __ZN23background_hang_monitor23background_hang_monitor27BackgroundHangMonitorWorker3run17h047341667e28d833E
   6:        0x110b80397 - __ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h0fbad0ac2f51a49bE
   7:        0x110b77f37 - __ZN3std9panicking3try7do_call17ha9b1c2b3d6b80215E.llvm.13158428631819150874
   8:        0x11160303e - ___rust_maybe_catch_panic
   9:        0x110b8a32b - __ZN50_$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$8call_box17h1cdf58f3bd83d145E
  10:        0x1116025eb - __ZN3std3sys4unix6thread6Thread3new12thread_start17ha7a33ee7c2fd74e3E
  11:     0x7fffae9ec93a - __pthread_body
  12:     0x7fffae9ec886 - __pthread_start
Stack trace
stack backtrace:
   0:        0x110b8cfae - __ZN9backtrace9backtrace5trace17h1031ecbd57e70b1aE
   1:        0x110b8bf0c - __ZN72_$LT$backtrace..capture..Backtrace$u20$as$u20$core..default..Default$GT$7default17hd2735b19f50df4baE
   2:        0x10ea3c560 - __ZN5servo21install_crash_handler7handler17h5983254e49d3dc15E
   3:     0x7fffae9e2b39 - __sigtramp
   4:        0x10ea3c5d4 - __ZN5servo21install_crash_handler7handler17h5983254e49d3dc15E
   5:     0x7fffae9e2b39 - __sigtramp
   6:        0x110b89ae8 - __ZN112_$LT$background_hang_monitor..sampler_mac..MacOsSampler$u20$as$u20$background_hang_monitor..sampler..Sampler$GT$25suspend_and_sample_thread17h9efe9ca9cfb5f555E
   7:        0x110b83368 - __ZN23background_hang_monitor23background_hang_monitor27BackgroundHangMonitorWorker3run17h047341667e28d833E
   8:        0x110b80397 - __ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h0fbad0ac2f51a49bE
   9:        0x110b77f37 - __ZN3std9panicking3try7do_call17ha9b1c2b3d6b80215E.llvm.13158428631819150874
  10:        0x11160303e - ___rust_maybe_catch_panic
  11:        0x110b8a32b - __ZN50_$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$8call_box17h1cdf58f3bd83d145E
  12:        0x1116025eb - __ZN3std3sys4unix6thread6Thread3new12thread_start17ha7a33ee7c2fd74e3E
  13:     0x7fffae9ec93a - __pthread_body
  14:     0x7fffae9ec886 - __pthread_start

They seem to show backtraces from the hang monitor thread, rather than the hanging thread.

@jdm jdm commented Jan 3, 2019

@jdm jdm commented Jan 3, 2019

On the other hand, that may actually be the result of the signal handler that tries to show backtraces of segfaults and other surprising failures:

println!("Stack trace{}\n{:?}", name, Backtrace::new());
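That would explain the confusing output: a backtrace captured inside a handler describes whichever thread runs the capture, which here is the thread that received the signal. A minimal sketch of that behaviour, assuming `std::backtrace::Backtrace` as a stand-in for the `backtrace` crate's `Backtrace::new()` used above; `capture_on_thread` and the thread name are hypothetical, and no real signal handler is installed:

```rust
use std::backtrace::Backtrace;
use std::thread;

/// Capture a backtrace on a freshly spawned, named thread and return the
/// name of the capturing thread together with the rendered trace.
/// (Hypothetical helper, not part of Servo.)
fn capture_on_thread(name: &str) -> (String, String) {
    let handle = thread::Builder::new()
        .name(name.to_string())
        .spawn(|| {
            // force_capture ignores RUST_BACKTRACE and always attempts a capture.
            let trace = Backtrace::force_capture();
            let who = thread::current().name().unwrap_or("<unnamed>").to_string();
            (who, trace.to_string())
        })
        .expect("failed to spawn thread");
    handle.join().expect("thread panicked")
}

fn main() {
    // The trace lists frames of the thread that called force_capture,
    // regardless of which thread we were actually interested in.
    let (who, trace) = capture_on_thread("hang-monitor");
    println!("Stack trace for thread {:?}\n{}", who, trace);
}
```

So if the hang monitor thread itself faults while sampling, the handler prints the hang monitor's frames rather than those of the thread being sampled.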

@jdm jdm commented Jan 3, 2019

Yep, this is a crash:

* thread #29, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001024ce268 servo`_$LT$background_hang_monitor..sampler_mac..MacOsSampler$u20$as$u20$background_hang_monitor..sampler..Sampler$GT$::suspend_and_sample_thread::hb329496638279f07 + 392
servo`_$LT$background_hang_monitor..sampler_mac..MacOsSampler$u20$as$u20$background_hang_monitor..sampler..Sampler$GT$::suspend_and_sample_thread::hb329496638279f07:
->  0x1024ce268 <+392>: movq   (%r14), %rcx
    0x1024ce26b <+395>: movq   0x10(%r14), %rdx
    0x1024ce26f <+399>: movq   0x8(%r14), %rsi
    0x1024ce273 <+403>: movq   %rsi, -0x4058(%rbp,%rax,8)
Target 0: (servo) stopped.
@jdm jdm added the I-crash label Jan 3, 2019
@gterzian gterzian self-assigned this Jan 4, 2019
@gterzian gterzian added the C-assigned label Jan 4, 2019

@gterzian gterzian commented Jan 5, 2019

HangProfile backtrace:
                         - __ZN2js3jit13MaybeEnterJitEP9JSContextRNS_8RunStateE
                         - __ZN6script3dom8bindings5utils12generic_call17he3d4586dd7542919E
                         - _CallJitMethodOp
                         - __ZN6script3dom8bindings7codegen8Bindings14ElementBinding14ElementBinding21getBoundingClientRect17hf4a63bb14ec6fd0aE
                         - ___rust_maybe_catch_panic
                         - __ZN3std9panicking3try7do_call17h8837fb5a3a806446E.llvm.14731053362047269330
                         - __ZN138_$LT$script..dom..element..Element$u20$as$u20$script..dom..bindings..codegen..Bindings..ElementBinding..ElementBinding..ElementMethods$GT$21GetBoundingClientRect17h251199f422c35b7bE
                         - __ZN6script3dom4node4Node28bounding_content_box_or_zero17hd9fd77e5feca8b9dE
                         - __ZN6script3dom6window6Window6reflow17h14e0ce770caf642eE
                         - __ZN6script3dom6window6Window12force_reflow17h79f45d1e1d5f404bE
                         - __ZN54_$LT$crossbeam_channel..channel..Receiver$LT$T$GT$$GT$4recv17hd5fc7a74e53110eaE
                         - __ZN59_$LT$crossbeam_channel..flavors..list..Channel$LT$T$GT$$GT$4recv17h84da2d1a95ba86f2E
                         - _swtch_pri

I'm getting a few permanent hangs (which occur when the window is waiting for the result from reflow, I think)

join_port.recv().unwrap()

and the sampling doesn't seem to crash in all cases.
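For diagnosing this kind of stall, a bounded wait can make it visible instead of blocking forever. A rough sketch only, assuming `std::sync::mpsc` in place of `crossbeam_channel` (which the trace above shows is the real channel); `wait_for_reply` and the stand-in worker are hypothetical and are not how the script thread is actually structured:

```rust
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Wait for a reply from a worker, giving up after `timeout`.
/// `worker_replies` controls whether the stand-in worker ever answers.
fn wait_for_reply(worker_replies: bool, timeout: Duration) -> Result<u32, RecvTimeoutError> {
    let (join_chan, join_port) = channel::<u32>();
    let worker = thread::spawn(move || {
        if worker_replies {
            join_chan.send(42).unwrap();
        } else {
            // Simulate a stuck worker: keep the sender alive without sending.
            thread::sleep(timeout * 4);
        }
    });
    // Unlike recv(), this returns an error once the deadline passes.
    let result = join_port.recv_timeout(timeout);
    worker.join().unwrap();
    result
}

fn main() {
    println!("{:?}", wait_for_reply(true, Duration::from_secs(2)));
    println!("{:?}", wait_for_reply(false, Duration::from_millis(50)));
}
```

A plain `recv()` in the second case would block until the sender is dropped, which is what a permanent hang looks like from the outside.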

This is with a build from ./mach build --release --with-frame-pointer

@jdm do you get this crash each time you visit the page? What options did you pass to build and run?

@jdm jdm commented Jan 5, 2019

Mmm, I wasn't using a build with the frame pointer enabled. I wonder if that's causing the crash?

@gterzian gterzian commented Jan 5, 2019

I tried a ./mach clean, followed by ./mach build --release, and surprisingly it still produces a stack trace in the hang profile, so it appears that the option is not necessary...

I still can't seem to reproduce the same crash, although I did get a different one on one occasion:

Stack trace for thread "ScriptThread PipelineId { namespace_id: PipelineNamespaceId(1), index: PipelineIndex(1) }"
stack backtrace:
   0:        0x10cb2049e - __ZN9backtrace9backtrace5trace17hcbb011583cfd3d3aE
   1:        0x10cb211ac - __ZN72_$LT$backtrace..capture..Backtrace$u20$as$u20$core..default..Default$GT$7default17h29fd18b15ff4d63eE
   2:        0x10cb2122d - __ZN9backtrace7capture9Backtrace3new17h2273fa824674c020E
   3:        0x10a6b1c36 - __ZN5servo21install_crash_handler7handler17h50b5823b9628a2b9E
   4:     0x7fff60d25b3c - __sigtramp
   5:        0x10cc41a41 - __ZL9InterpretP9JSContextRN2js8RunStateE
   6:        0x10cc38642 - __ZN2js9RunScriptEP9JSContextRNS_8RunStateE
   7:        0x10cc478ad - __ZN2js23InternalCallOrConstructEP9JSContextRKN2JS8CallArgsENS_14MaybeConstructE
   8:        0x10cc47a75 - __ZN2js4CallEP9JSContextN2JS6HandleINS2_5ValueEEES5_RKNS_13AnyInvokeArgsENS2_13MutableHandleIS4_EE
   9:        0x10cf8abf2 - __Z20JS_CallFunctionValueP9JSContextN2JS6HandleIP8JSObjectEENS2_INS1_5ValueEEERKNS1_16HandleValueArrayENS1_13MutableHandleIS6_EE
  10:        0x10b24bdb0 - __ZN6script3dom8bindings7codegen8Bindings15FunctionBinding8Function4Call17hecc89d61e3fd727cE
  11:        0x10b24b7bf - __ZN6script3dom8bindings7codegen8Bindings15FunctionBinding8Function5Call_17h2b16f1ae023c01abE
  12:        0x10b5e4d4a - __ZN6script6timers13OneshotTimers10fire_timer17h05099978f42da957E
  13:        0x10b3e9579 - __ZN6script13script_thread12ScriptThread11handle_msgs28_$u7b$$u7b$closure$u7d$$u7d$17h8cbffe432f557b22E
  14:        0x10b3e5d32 - __ZN6script13script_thread12ScriptThread11handle_msgs17h30f3b3fb96e66468E.llvm.7049152249821569285
  15:        0x10b453157 - __ZN14profile_traits3mem12ProfilerChan25run_with_memory_reporting17h74de36cb27d72731E
  16:        0x10b54b6c2 - __ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17hb4a2f2cc3c32547cE
  17:        0x10abdb5fd - __ZN3std9panicking3try7do_call17h7fe842d0bce5757fE.llvm.14731053362047269330
  18:        0x10d5bc72e - ___rust_maybe_catch_panic
  19:        0x10b82f876 - __ZN50_$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$8call_box17h7a2952f5b5ab6bf4E
  20:        0x10d5938bb - __ZN3std3sys4unix6thread6Thread3new12thread_start17hba53abd905cebc29E
  21:     0x7fff60d2e304 - __pthread_body
  22:     0x7fff60d3126e - __pthread_start

@gterzian gterzian commented Jan 6, 2019

Stack trace
stack backtrace:
   0:        0x107e3149e - __ZN9backtrace9backtrace5trace17hcbb011583cfd3d3aE
   1:        0x107e321ac - __ZN72_$LT$backtrace..capture..Backtrace$u20$as$u20$core..default..Default$GT$7default17h29fd18b15ff4d63eE
   2:        0x107e3222d - __ZN9backtrace7capture9Backtrace3new17h2273fa824674c020E
   3:        0x1059c2c36 - __ZN5servo21install_crash_handler7handler17h50b5823b9628a2b9E
   4:     0x7fff61bdeb3c - __sigtramp
   5:        0x107e31068 - __ZN112_$LT$background_hang_monitor..sampler_mac..MacOsSampler$u20$as$u20$background_hang_monitor..sampler..Sampler$GT$25suspend_and_sample_thread17h2ee3dffc671cf299E
   6:        0x107e27644 - __ZN23background_hang_monitor23background_hang_monitor27BackgroundHangMonitorWorker3run17h856a7fe80951e77fE
   7:        0x107e2da87 - __ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h3d2eca4a57b1543cE
   8:        0x107e25287 - __ZN3std9panicking3try7do_call17h8a654a6e8d14c8c0E.llvm.17721706085543353003
   9:        0x1088cd72e - ___rust_maybe_catch_panic
  10:        0x107e30d9b - __ZN50_$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$8call_box17h7ffaf2d30131fc8cE
  11:        0x1088a48bb - __ZN3std3sys4unix6thread6Thread3new12thread_start17hba53abd905cebc29E
  12:     0x7fff61be7304 - __pthread_body
  13:     0x7fff61bea26e - __pthread_start

Ok, I reproduced the crash. It happened with a build without the --with-frame-pointer option, yet prior to the crash I also got a few valid hang reports. That would seem to show the crash isn't related to the frame pointer option, since in that case it should crash on each sampling, right?

@gterzian gterzian mentioned this issue Jan 6, 2019
bors-servo added a commit that referenced this issue Jan 29, 2019
Fix frame-pointer stackwalking

This seems to fix the problem; it's a check that is also done at https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/mozglue/misc/StackWalk.cpp#904
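The patch itself isn't shown in this thread, so the following is only a sketch of the kind of sanity check a frame-pointer walker might apply before following a saved pointer. It runs over a simulated stack (a slice of words) rather than real memory; `walk_frames`, `demo_stack`, the two-word frame layout, and the specific checks are all assumptions for illustration, not Servo's or Gecko's actual code:

```rust
const WORD: usize = std::mem::size_of::<usize>();

/// Walk a chain of saved frame pointers inside a simulated stack.
/// `stack[0]` is assumed to live at the word-aligned address `base`, and each
/// frame record is two words: [saved frame pointer, return address].
fn walk_frames(stack: &[usize], base: usize, start_fp: usize, max_frames: usize) -> Vec<usize> {
    let stack_end = base + stack.len() * WORD;
    let mut return_addrs = Vec::new();
    let mut fp = start_fp;

    while return_addrs.len() < max_frames {
        // Validate the pointer before reading through it: it must be
        // word-aligned and the whole record must lie inside the stack.
        let in_bounds = fp >= base
            && fp.checked_add(2 * WORD).map_or(false, |end| end <= stack_end);
        if fp % WORD != 0 || !in_bounds {
            break;
        }
        let idx = (fp - base) / WORD;
        let next_fp = stack[idx];
        return_addrs.push(stack[idx + 1]);
        // The stack grows down, so a caller's frame must sit strictly higher.
        if next_fp <= fp {
            break;
        }
        fp = next_fp;
    }
    return_addrs
}

/// Three chained frames at words 0, 3 and 6 (hypothetical test data).
fn demo_stack(base: usize) -> Vec<usize> {
    vec![
        base + 3 * WORD, 0xAAA, 0, // frame at word 0, caller at word 3
        base + 6 * WORD, 0xBBB, 0, // frame at word 3, caller at word 6
        0, 0xCCC,                  // outermost frame: saved pointer is 0
    ]
}

fn main() {
    let base = 0x1000;
    let stack = demo_stack(base);
    println!("{:x?}", walk_frames(&stack, base, base, 16));
    // A garbage starting pointer is rejected instead of being dereferenced.
    println!("{:x?}", walk_frames(&stack, base, 0xdead_beef, 16));
}
```

Without checks like these, a register that happens to hold a non-pointer value when the thread is suspended gets dereferenced, which would be consistent with a crash that only shows up on some samples.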

---
- [ ] `./mach build -d` does not report any errors
- [ ] `./mach test-tidy` does not report any errors
- [ ] These changes fix #22604 (GitHub issue number if applicable)
- [ ] There are tests for these changes OR
- [ ] These changes do not require tests because ___

---
This change is [Reviewable](https://reviewable.io/reviews/servo/servo/22637)