Fix frame-pointer stackwalking #22637

Merged
merged 1 commit into from Jan 29, 2019

Collaborator

gterzian commented Jan 6, 2019

This seems to fix the problem. It's a check that is also done in Gecko at https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/mozglue/misc/StackWalk.cpp#904


  • ./mach build -d does not report any errors
  • ./mach test-tidy does not report any errors
  • These changes fix #22604 (GitHub issue number if applicable)
  • There are tests for these changes OR
  • These changes do not require tests because ___


Collaborator Author

gterzian commented Jan 6, 2019

@jdm r?

In the end I could reproduce the crash fairly reliably (go to https://newyorktimes.com, wait a bit, click on a link, then wait some more). It seemed to be caused by the pointer dereference at https://github.com/servo/servo/pull/22637/files#diff-71911b06e513640e808c4586db0ee8bcR109. Adding this additional check seems to prevent it, which makes sense since Gecko does the same check.

Gecko seems to do one more check that is still missing here: the (uintptr_t(next) & 3) at https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/mozglue/misc/StackWalk.cpp#906. I'm not quite sure how to translate that to Rust...

@@ -112,6 +112,9 @@ unsafe fn frame_pointer_stack_walk(regs: Registers) -> NativeStack {
if let Err(()) = native_stack.process_register(*pc, *stack) {
break;
}
if next <= current {

Collaborator Author

gterzian commented Jan 7, 2019

I got the crash again after all, so this requires further investigation...

Member

jdm commented Jan 7, 2019

It should be possible to cast a pointer to usize and perform the same & 3 != 0 check.
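A minimal sketch of that suggestion, not the code from this PR: the helper name `is_misaligned` and the sample addresses are made up for illustration. It casts a raw pointer to `usize` and tests the two lowest address bits, which is what Gecko's `uintptr_t(next) & 3` does.

```rust
/// Returns true when `ptr` is not 4-byte aligned, i.e. when either of
/// its two lowest address bits is set.
/// (Hypothetical helper name, used only for this illustration.)
fn is_misaligned(ptr: *const u8) -> bool {
    // In Rust, `&` binds more tightly than `!=`, so this parses as
    // `((ptr as usize) & 3) != 0` without extra parentheses.
    ptr as usize & 3 != 0
}

fn main() {
    // Made-up addresses; the pointers are never dereferenced.
    let aligned = 0x1000usize as *const u8;
    let misaligned = 0x1002usize as *const u8;

    assert!(!is_misaligned(aligned));
    assert!(is_misaligned(misaligned));
}
```

Note that the precedence differs from C, where `x & 3 != 0` would parse as `x & (3 != 0)`.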

@gterzian gterzian force-pushed the gterzian:fix_mac_sampler branch from 5351e2f to 25d5631 Jan 9, 2019

Collaborator Author

gterzian commented Jan 9, 2019

Ok so I think I finally solved the issue.

The problem seemed to be that, during a stack walk, the frame pointer would "suddenly" jump to a very high address, I'm guessing beyond the beginning of the stack (higher than the beginning). When cast to usize, the value would be either usize::max_value() or very close to it.

For example, at one step of the walk the frame pointer would look like:

  • 0x700008c5aaa8 (as usize 123145449482920)

and then at the next step it would look like:

  • 0xfffb00011c6bed90 (as usize 18445336703597800848)

While libc::pthread_get_stackaddr_np gives us the "end" (low value) of the stack, I haven't found an equivalent for the "beginning" (high value). So I've resorted to the check if (usize::max_value() / current as usize) <= 1, which catches the problem.
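To make the heuristic concrete, here is a small sketch that applies it to the two frame-pointer values quoted above. The function name `looks_out_of_range` and the zero guard are assumptions added for the illustration, and the literals assume a 64-bit target. Integer division by the address yields 1 or less only when the address is at least half of `usize::MAX`, so the test amounts to "is the top bit set".

```rust
/// The heuristic from the comment: true when `usize::MAX / addr <= 1`,
/// which holds exactly when `addr` lies in the upper half of the
/// address space. (Hypothetical helper name.)
fn looks_out_of_range(addr: usize) -> bool {
    // The zero guard is an addition for this sketch: dividing by a null
    // frame pointer would otherwise panic.
    addr != 0 && usize::MAX / addr <= 1
}

fn main() {
    // Values quoted in the comment above (64-bit target assumed).
    let plausible: usize = 0x7000_08c5_aaa8;
    let bogus: usize = 0xfffb_0001_1c6b_ed90;

    assert!(!looks_out_of_range(plausible));
    assert!(looks_out_of_range(bogus));
}
```

`usize::MAX` is the current spelling of `usize::max_value()`; both name the same value.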

Collaborator Author

gterzian commented Jan 24, 2019

@jdm thoughts on this?

I didn't add the & 3 != 0 check because it didn't prevent this particular crash, while the fix described above seemed to.

It seems that our frame-pointer stack walk differs a bit from the one in Gecko (for example, I'm perplexed that Gecko seems to check for next > aStackEnd while we need to do the opposite check, current < stackaddr). Still, I was able to reproduce the crash you encountered reliably, and the above fix made it go away...

Member

jdm left a comment

Let's go ahead and add the & 3 check as well; Gecko's implementation is well-tested, and it would be frustrating to encounter a non-pointer value later that causes a crash in the wild.

// Reached the end of the stack.
break;
}
if (usize::max_value() / current as usize) <= 1 {

jdm Jan 25, 2019

Member

We can call pthread_get_stacksize_np to determine the other end of the stack. That seems like it will be more reliable than this heuristic.

gterzian Jan 26, 2019

Author Collaborator

I replaced this with if current as usize >= stackaddr.add(stacksize * 8) as usize; the thinking behind * 8 is that this is a 64-bit Mac... Does that make sense? It does catch the problem, as the output of the print statement below shows:
println!("Current {:?} stacksize: {:?} stackaddr {:?} test {:?} fp: {:?}", current as usize, stacksize as usize, stackaddr as usize, stackaddr.add(stacksize * 4) as usize, regs.frame_ptr as usize);

Current 123145415937664 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415937904 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415937952 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415938080 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415938160 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415938448 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415938560 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415939536 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415940096 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415940288 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415940384 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415940432 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415940496 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415940576 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415940688 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 123145415940760 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
Current 18446181128325699968 stacksize: 2097152 stackaddr 123145407516672 test 123145415905280 fp: 123145415937664
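The arithmetic behind that check can be replayed on the logged numbers. This is only a sketch: addresses are modelled as plain `u64` values, the function name `past_upper_bound` is made up, and it assumes `stackaddr` points to a byte-sized type so that `add(n)` advances by `n` bytes. That assumption matches the logged `test` column, which equals `stackaddr + stacksize * 4`. Note that the `println!` computes `test` with `* 4` while the check quoted in the comment uses `* 8`; the sketch follows the check.

```rust
/// Byte-address version of
/// `current as usize >= stackaddr.add(stacksize * 8) as usize`.
/// (Hypothetical helper; u64 stands in for pointer-sized values.)
fn past_upper_bound(current: u64, stackaddr: u64, stacksize: u64) -> bool {
    // Saturating arithmetic avoids overflow panics for extreme inputs.
    current >= stackaddr.saturating_add(stacksize.saturating_mul(8))
}

fn main() {
    // Values copied from the log output above.
    let stackaddr = 123_145_407_516_672;
    let stacksize = 2_097_152;

    // First and last plausible `Current` values are inside the bound.
    assert!(!past_upper_bound(123_145_415_937_664, stackaddr, stacksize));
    assert!(!past_upper_bound(123_145_415_940_760, stackaddr, stacksize));

    // The final logged value is far outside the range and is caught.
    assert!(past_upper_bound(18_446_181_128_325_699_968, stackaddr, stacksize));

    // The logged `test` column corresponds to a multiplier of 4.
    assert_eq!(stackaddr + stacksize * 4, 123_145_415_905_280);
}
```

With the `* 4` bound from the `test` column, the very first logged `Current` value (123145415937664) would already be above the limit, so the two multipliers do not behave the same on this data.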

Member

jdm commented Jan 25, 2019

Sorry about the delay; for some reason I thought this was still in progress.

@jdm jdm assigned jdm and unassigned emilio Jan 25, 2019

@gterzian gterzian force-pushed the gterzian:fix_mac_sampler branch from 25d5631 to da367fb Jan 26, 2019

Collaborator Author

gterzian commented Jan 26, 2019

@jdm I've added the check, and used pthread_get_stacksize_np for the other check (which checks for a stack overflow, I think).
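How the checks fit together can be sketched with a simulated walk. This is not Servo's `frame_pointer_stack_walk`: memory is modelled as a `HashMap` from a frame address to the saved previous frame pointer, and `walk_frames`, `stack_limit`, and the sample addresses are all made up for the illustration. The order of the checks is also an assumption.

```rust
use std::collections::HashMap;

/// Follows a chain of saved frame pointers in simulated memory and
/// returns the frames visited before a sanity check ends the walk.
fn walk_frames(memory: &HashMap<u64, u64>, start: u64, stack_limit: u64) -> Vec<u64> {
    let mut frames = Vec::new();
    let mut current = start;
    loop {
        // Upper-bound check: stop once the frame pointer leaves the stack.
        if current >= stack_limit {
            break;
        }
        frames.push(current);
        // "Dereference" the frame pointer to find the caller's frame.
        let next = match memory.get(&current) {
            Some(&next) => next,
            None => break,
        };
        // The stack grows down, so a caller's frame must sit at a higher
        // address, and a misaligned value cannot be a frame pointer.
        if next <= current || next & 3 != 0 {
            break;
        }
        current = next;
    }
    frames
}

fn main() {
    let mut memory = HashMap::new();
    memory.insert(0x1000, 0x1040);
    memory.insert(0x1040, 0x10a0);
    // A bogus saved frame pointer like the one seen in the crash.
    memory.insert(0x10a0, 0xfffb_0001_1c6b_ed90);

    let frames = walk_frames(&memory, 0x1000, 0x2000);
    // The walk stops before touching the bogus address.
    assert_eq!(frames, vec![0x1000, 0x1040, 0x10a0]);
}
```

In this simulation the bogus value is aligned and higher than the current frame, so only the upper-bound check rejects it, which is consistent with the earlier observation that the `& 3` check alone did not prevent this crash.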

@gterzian gterzian force-pushed the gterzian:fix_mac_sampler branch 2 times, most recently from b40f863 to dadf943 Jan 26, 2019

@@ -112,6 +121,9 @@ unsafe fn frame_pointer_stack_walk(regs: Registers) -> NativeStack {
if let Err(()) = native_stack.process_register(*pc, *stack) {
break;
}
if (next <= current) && (next as usize & 3 != 0) {

jdm Jan 28, 2019

Member

This should be ||.

gterzian Jan 29, 2019

Author Collaborator

Thanks. During development I actually had those as two separate if statements, with print statements to see which one would hit, and when I collapsed them into one at the end I of course made a mistake!
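The difference between the two operators is easy to see on sample values. The sketch below uses made-up function names and addresses; it compares the condition as first pushed (`&&`) with the reviewed version (`||`).

```rust
/// The condition as first pushed: both tests must fail to stop the walk.
fn should_stop_and(next: usize, current: usize) -> bool {
    (next <= current) && (next & 3 != 0)
}

/// The condition after review: either test alone stops the walk.
fn should_stop_or(next: usize, current: usize) -> bool {
    (next <= current) || (next & 3 != 0)
}

fn main() {
    let current = 0x1000;

    // Higher address but misaligned: only the `||` version rejects it.
    assert!(!should_stop_and(0x1002, current));
    assert!(should_stop_or(0x1002, current));

    // Aligned but not above `current`: again only `||` rejects it.
    assert!(!should_stop_and(0x0ff0, current));
    assert!(should_stop_or(0x0ff0, current));

    // A well-formed next frame is accepted by the `||` version.
    assert!(!should_stop_or(0x1040, current));
}
```

With `&&`, the walk would only stop when a value is both too low and misaligned, so each check taken alone would never fire.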

Member

jdm commented Jan 28, 2019

@bors-servo delegate+
Otherwise it looks fine!

Contributor

bors-servo commented Jan 28, 2019

✌️ @gterzian can now approve this pull request

@gterzian gterzian force-pushed the gterzian:fix_mac_sampler branch from dadf943 to dccccef Jan 29, 2019

Collaborator Author

gterzian commented Jan 29, 2019

@bors-servo r=jdm

Contributor

bors-servo commented Jan 29, 2019

📌 Commit dccccef has been approved by jdm

Contributor

bors-servo commented Jan 29, 2019

⌛️ Testing commit dccccef with merge 2fd9420...

bors-servo added a commit that referenced this pull request Jan 29, 2019

Auto merge of #22637 - gterzian:fix_mac_sampler, r=jdm
Fix frame-pointer stackwalking

<!-- Please describe your changes on the following line: -->

This seems to fix the problem, it's a check that is also done at https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/mozglue/misc/StackWalk.cpp#904

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [ ] `./mach build -d` does not report any errors
- [ ] `./mach test-tidy` does not report any errors
- [ ] These changes fix #22604 (GitHub issue number if applicable)

<!-- Either: -->
- [ ] There are tests for these changes OR
- [ ] These changes do not require tests because ___

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/22637)
<!-- Reviewable:end -->

Contributor

bors-servo commented Jan 29, 2019

💔 Test failed - linux-rel-wpt

Collaborator

CYBAI commented Jan 29, 2019

  ▶ CRASH [expected OK] /html/semantics/embedded-content/the-iframe-element/iframe_sandbox_navigate_ancestor-1.html
  │ 
  │ VMware, Inc.
  │ softpipe
  │ 3.3 (Core Profile) Mesa 18.3.0-devel
  │ Failed to receive a response from live font cache (thread LayoutThread PipelineId { namespace_id: PipelineNamespaceId(2), index: PipelineIndex(3) }, at components/gfx/font_cache_thread.rs:522)
  │ stack backtrace:
  │    0:     0x7fee67bc620d - backtrace::backtrace::trace::h1cd733210e7d0ee7
  │    1:     0x7fee67bc5062 - <backtrace::capture::Backtrace as core::default::Default>::default::h5b232415cbbfce88
  │    2:     0x7fee65644b3f - servo::main::{{closure}}::hd5e20857bd27bc9d
  │    3:     0x7fee688a661d - rust_panic_with_hook
  │                         at src/libstd/panicking.rs:482
  │    4:     0x7fee66f68b94 - std::panicking::begin_panic::h21f612861f96a666
  │    5:     0x7fee66f9003c - <gfx::font_cache_thread::FontCacheThread as gfx::font_context::FontSource>::get_font_instance::ha6973ecc87f1c7eb
  │    6:     0x7fee66ad15cb - <gfx::font_context::FontContext<S>>::font::hd8ecc87862973361
  │    7:     0x7fee66b378dc - <core::iter::adapters::FilterMap<I, F> as core::iter::traits::iterator::Iterator>::try_fold::{{closure}}::h0cec8089fb511385
  │    8:     0x7fee66b3b7cf - gfx::font::FontGroup::first::h3b25f082eba5fcf5
  │    9:     0x7fee66aa9693 - layout::text::font_metrics_for_style::haf2c6a49bac29bbe
  │   10:     0x7fee66aa46ba - <std::thread::local::LocalKey<T>>::with::h42e1172744a6ccac
  │   11:     0x7fee66b177b8 - layout::fragment::Fragment::content_inline_metrics::inline_metrics_of_text::h562382c4cd18c767
  │   12:     0x7fee66b176dc - layout::fragment::Fragment::content_inline_metrics::ha502e5782be8b83e
  │   13:     0x7fee66b17aeb - layout::fragment::Fragment::aligned_inline_metrics::h1def908cb5b9fad3
  │   14:     0x7fee66af9652 - <layout::inline::InlineFlow as layout::flow::Flow>::assign_block_size::hc324649038bdbe9b
  │   15:     0x7fee66adc3ad - layout::sequential::reflow::doit::h197e843dc2aba690
  │   16:     0x7fee66adc3ad - layout::sequential::reflow::doit::h197e843dc2aba690
  │   17:     0x7fee66adc3ad - layout::sequential::reflow::doit::h197e843dc2aba690
  │   18:     0x7fee66adc26a - layout::sequential::reflow::h5b7fd751e648f24a
  │   19:     0x7fee659f60d3 - layout_thread::LayoutThread::solve_constraints::h1c66c241981a73b9
  │   20:     0x7fee65b0f135 - profile_traits::time::profile::h6092dfa42f6eb1d2
  │   21:     0x7fee659fa98e - layout_thread::LayoutThread::perform_post_style_recalc_layout_passes::heceb825ff80aed2e
  │   22:     0x7fee659f83a9 - layout_thread::LayoutThread::handle_reflow::h62c3d1cc58f61fc3
  │   23:     0x7fee65b0fa83 - profile_traits::time::profile::hffc430e0c064183e
  │   24:     0x7fee659f0d5e - layout_thread::LayoutThread::handle_request_helper::h1c3936b59135d1a7
  │   25:     0x7fee659ef601 - layout_thread::LayoutThread::start::hb7a45a6f10a3eb02
  │   26:     0x7fee65a47486 - profile_traits::mem::ProfilerChan::run_with_memory_reporting::hcaa5a25d8445fc05
  │   27:     0x7fee65a94e57 - std::sys_common::backtrace::__rust_begin_short_backtrace::h5b45bb2f5353e077
  │   28:     0x7fee65a95415 - _ZN3std9panicking3try7do_call17h7597226340421ee0E.llvm.11828671310939098513
  │   29:     0x7fee688b1119 - __rust_maybe_catch_panic
  │                         at src/libpanic_unwind/lib.rs:92
  │   30:     0x7fee65b182c7 - <F as alloc::boxed::FnBox<A>>::call_box::hb5ade8794d26455d
  │   31:     0x7fee688b04cd - call_once<(),()>
  │                         at /rustc/da6ab956e1002517803ecd38b904504a1223274b/src/liballoc/boxed.rs:744
  │                          - start_thread
  │                         at src/libstd/sys_common/thread.rs:14
  │                          - thread_start
  │                         at src/libstd/sys/unix/thread.rs:81
  │   32:     0x7fee63006183 - start_thread
  │   33:     0x7fee618cd03c - clone
  │   34:                0x0 - <unknown>
  │ [2019-01-29T15:20:14Z ERROR servo] Failed to receive a response from live font cache
  │ Pipeline failed in hard-fail mode.  Crashing!
  └ thread panicked while processing panic. aborting.

Collaborator Author

gterzian commented Jan 29, 2019

@bors-servo

Contributor

bors-servo commented Jan 29, 2019

⌛️ Testing commit dccccef with merge 62ff032...

bors-servo added a commit that referenced this pull request Jan 29, 2019

Auto merge of #22637 - gterzian:fix_mac_sampler, r=jdm
Fix frame-pointer stackwalking


Contributor

bors-servo commented Jan 29, 2019

@bors-servo bors-servo merged commit dccccef into servo:master Jan 29, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
homu Test successful

@gterzian gterzian deleted the gterzian:fix_mac_sampler branch Jan 30, 2019
