Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race in parallel selector matching #1319

Closed
pradeep90 opened this issue Nov 29, 2013 · 25 comments
Closed

Fix race in parallel selector matching #1319

pradeep90 opened this issue Nov 29, 2013 · 25 comments

Comments

@pradeep90
Copy link
Contributor

@pradeep90 pradeep90 commented Nov 29, 2013

Parallel selector matching was disabled in #1315 due to races.

Mainly, the page at http://en.wikipedia.org/wiki/Yellow_River gives the following error for me

task '<unnamed>' failed at 'This case makes no sense.', /home/spradeep/Code/servo/src/components/main/layout/inline.rs:270
task '<unnamed>' failed at 'receiving on closed channel', /home/spradeep/Code/servo/src/compiler/rust/src/libstd/rt/comm.rs:492
task '<unnamed>' failed at 'killed by linked failure', /home/spradeep/Code/servo/src/compiler/rust/src/libstd/rt/kill.rs:639
failed in non-task context at 'task '<unnamed>' failed at 'killed by linked failure', /home/spradeep/Code/servo/src/compiler/rust/src/libstd/rt/kill.rs:639
assertion failed: self.live_allocs.is_null()', /home/spradeep/Code/servo/src/compiler/rust/src/libstd/rt/local_heap.rs:133
Illegal instruction (core dumped)

Since the original author is not working on this project anymore, I'm trying to locate the commit which introduced the race condition.

Note that at the time when @ILyoan added parallel selector matching, Wikipedia pages weren't displaying properly in Servo.

Last commit where the Yellow River page seems to work: june0cho@ee5ff3f (the general Wikipedia issues were fixed by this time)

So, I'm guessing that in some commit between the above commit and a76f761, the code causing the race condition was introduced.

Are there any suggestions for how to approach this?

@june0cho
Copy link
Contributor

@june0cho june0cho commented Nov 29, 2013

I also tested Yellow River page with other computer at ee5ff3f and a76f761 for checking and got the same result: no crash at ee5ff3f but similar crash to above at a76f761.

@pradeep90
Copy link
Contributor Author

@pradeep90 pradeep90 commented Nov 30, 2013

It seems to crash after 155befe (Rewrite flow construction to be incrementalizable and parallelizable.)
It works for the commit before that one.

I reduced the Yellow River wikipedia page to the following and am still able to reproduce the error mentioned above.

<!DOCTYPE html>
<html>
  <head>
    <style type="text/css">
      #baz_id{
      float:right;
      }
    </style>
    <script src="real-mini-load_004.php">
    </script>
  </head>
  <body>
    <span id="baz_id">
      <span>foo</span>
      <span>bar</span>
    </span>
  </body>
</html>

The script file is https://gist.github.com/pradeep90/7715923

@pcwalton When you said you got a race, did you mean that you got the error above ("This case makes no sense")? Or was it something else?

@pradeep90
Copy link
Contributor Author

@pradeep90 pradeep90 commented Nov 30, 2013

Update:
I've narrowed down the JS script to two lines.
The following HTML causes Servo to crash with the "This case makes no sense" error.
The appendChild line seems to cause the crash.

<!DOCTYPE html>
<html>
  <head>
    <style type="text/css">
      #baz_id{
      float:right;
      }
    </style>
    <script type="text/javascript">
      var divChild = document.createElement("div");
      var safeFragment = document.createDocumentFragment();
      var fragmentDiv = safeFragment.appendChild(divChild);
    </script>
  </head>
  <body>
    <span id="baz_id">
      <span>foo</span>
      <span>bar</span>
    </span>
  </body>
</html>
@jdm
Copy link
Member

@jdm jdm commented Nov 30, 2013

Yes, I would expect that to be fixed by #1252. @metajack, ping?

@jdm
Copy link
Member

@jdm jdm commented Dec 3, 2013

Does this still reproduce on master?

@pradeep90
Copy link
Contributor Author

@pradeep90 pradeep90 commented Dec 3, 2013

Even after #1252, I get the same error with the above HTML.

@jdm Do you think this is some problem with the parallel selector matching code? Or is it something related to parallel layout that will be fixed soon?
If it is the first, I will try to debug parallel selector matching further.

@jdm
Copy link
Member

@jdm jdm commented Dec 3, 2013

I didn't notice that the fragmentDiv is never appended to the main document. It makes sense that #1252 would not have any effect here, now that I see that.

@pradeep90
Copy link
Contributor Author

@pradeep90 pradeep90 commented Dec 9, 2013

Just noticed that, even with parallel selector matching disabled, Servo gives the This case makes no sense error for the above HTML.

Can anybody else confirm this (by using the master branch code)?

@pcwalton When you said you got a race, did you mean that you got the error above ("This case makes no sense")? Or was it something else?

@jdm
Copy link
Member

@jdm jdm commented Dec 9, 2013

I can confirm on master. What an interesting testcase!

@jdm
Copy link
Member

@jdm jdm commented Dec 9, 2013

There are identical symptoms if the script content is simply

var safeFragment = document.createDocumentFragment();
safeFragment.textContent = "hi";

I don't know what to make of this.

@jdm
Copy link
Member

@jdm jdm commented Dec 9, 2013

Looking at the output of RUST_LOG=servo::layout, the difference between two runs in which one appends the DocumentFragment to a div and the other does the reverse is that the second reflow request in the crashing case is a MatchSelectorsDocumentDamage one, while the other is ReflowDocumentDamage (and also counts as a resize). Both runs construct the same flow tree, but the reflowing one measures some boxes and creates float contexts with non-zero sizes, while the selector matching one has zero sizes everywhere.

@jdm
Copy link
Member

@jdm jdm commented Dec 9, 2013

Ok, I think the initial resize event is racing with the initial document modification events. Window::content_changed calls Page::reflow_all without setting a particular type of damage, so if no resize event comes in we default to matching selectors.

@jdm
Copy link
Member

@jdm jdm commented Dec 9, 2013

It seems like we might be propagating damage incorrectly, since when a resize is not performed we don't end up calling bubble_widths on the floated elements.

@pradeep90
Copy link
Contributor Author

@pradeep90 pradeep90 commented Dec 11, 2013

With the latest master code, I don't get the above error anymore on the mini HTML above. It works.

However, for the original Yellow River HTML, I get this error and backtrace
task '<unnamed>' failed at 'assertion failed: right_i > 0 && right_i < boxes.len()', /home/spradeep/Code/servo/src/components/main/layout/text.rs:61

#0  0x00007ffff0f80180 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007ffff2cf45ea in rust_begin_unwind (token=839147) at /home/spradeep/Code/servo/src/compiler/rust/src/rt/rust_builtin.cpp:493
#2  0x00007ffff778032d in rt::task::Unwinder::begin_unwind::h221287b8688eddadaqaE::v0.9$x2dpre () from /home/spradeep/Code/servo/master-build/x86_64-unknown-linux-gnu/src/compiler/rust/x86_64-unknown-linux-gnu/\
stage2/lib/rustc/x86_64-unknown-linux-gnu/lib/libstd-6c65cf4b443341b1-0.9-pre.so
#3  0x00007ffff7760427 in rt::task::begin_unwind_reason::h29488551c3cf6ec9aB::v0.9$x2dpre () from /home/spradeep/Code/servo/master-build/x86_64-unknown-linux-gnu/src/compiler/rust/x86_64-unknown-linux-gnu/stage\
2/lib/rustc/x86_64-unknown-linux-gnu/lib/libstd-6c65cf4b443341b1-0.9-pre.so
#4  0x00007ffff76d0b85 in sys::FailWithCause$__extensions__::fail_with::hd96679812a86c367dnal::v0.9$x2dpre () from /home/spradeep/Code/servo/master-build/x86_64-unknown-linux-gnu/src/compiler/rust/x86_64-unknow\
n-linux-gnu/stage2/lib/rustc/x86_64-unknown-linux-gnu/lib/libstd-6c65cf4b443341b1-0.9-pre.so
#5  0x00000000004c92e8 in can_coalesce_text_nodes () at <std-macros>:40
#6  scan_for_runs () at /home/spradeep/Code/servo/src/components/main/layout/text.rs:43
#7  flush_inline_boxes_to_flow_if_necessary () at /home/spradeep/Code/servo/src/components/main/layout/construct.rs:244
#8  servo::layout::construct::FlowConstructor::flush_inline_boxes_to_flow_if_necessary () at /home/spradeep/Code/servo/src/components/main/layout/construct.rs:257
#9  0x00000000004ca354 in servo::layout::construct::FlowConstructor::build_children_of_block_flow () at /home/spradeep/Code/servo/src/components/main/layout/construct.rs:338
#10 0x00000000004def3c in dom::node::AbstractNode::traverse_postorder_mut::h49d3b9f55a8fb20pla4::v0.1 () at /home/spradeep/Code/servo/src/components/main/layout/construct.rs:359
#11 0x00000000004ddc85 in dom::node::AbstractNode::traverse_postorder_mut::h49d3b9f55a8fb20pla4::v0.1 () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:348
#12 0x00000000004ddc85 in dom::node::AbstractNode::traverse_postorder_mut::h49d3b9f55a8fb20pla4::v0.1 () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:348
#13 0x00000000004ddc85 in dom::node::AbstractNode::traverse_postorder_mut::h49d3b9f55a8fb20pla4::v0.1 () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:348
#14 0x00000000004ddc85 in dom::node::AbstractNode::traverse_postorder_mut::h49d3b9f55a8fb20pla4::v0.1 () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:348
#15 0x00000000004ddc85 in dom::node::AbstractNode::traverse_postorder_mut::h49d3b9f55a8fb20pla4::v0.1 () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:348
#16 0x00000000004ddc85 in dom::node::AbstractNode::traverse_postorder_mut::h49d3b9f55a8fb20pla4::v0.1 () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:348
#17 0x00000000004ddc85 in dom::node::AbstractNode::traverse_postorder_mut::h49d3b9f55a8fb20pla4::v0.1 () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:348
#18 0x00000000004dd71c in servo::layout::layout_task::LayoutTask::construct_flow_tree () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:348
#19 0x00000000004d9ca9 in fn26440 () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:443
#20 handle_reflow () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:441
#21 fn26209 () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:443
#22 fn26209 () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:279
#23 0x00000000004428f3 in time::profile::h5c7f194e062f96baj::v0.1 () at /home/spradeep/Code/servo/src/components/main/compositing/run.rs:353
#24 0x00000000004d169a in handle_request () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:279
#25 start () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:252
#26 fn26118 () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:217
#27 0x00000000004cf00e in task::TaskBuilder::spawn_with::anon::expr_fn::Fiaoaz () at /home/spradeep/Code/servo/src/components/main/layout/layout_task.rs:22
#28 0x00007ffff7721ef5 in task::spawn::spawn_raw::anon::expr_fn::aE () from /home/spradeep/Code/servo/master-build/x86_64-unknown-linux-gnu/src/compiler/rust/x86_64-unknown-linux-gnu/stage2/lib/rustc/x86_64-unk\
nown-linux-gnu/lib/libstd-6c65cf4b443341b1-0.9-pre.so
#29 0x00007ffff777ff4e in rt::task::__extensions__::build_start_wrapper::anon::anon::expr_fn::aq () from /home/spradeep/Code/servo/master-build/x86_64-unknown-linux-gnu/src/compiler/rust/x86_64-unknown-linux-gn\
u/stage2/lib/rustc/x86_64-unknown-linux-gnu/lib/libstd-6c65cf4b443341b1-0.9-pre.so
#30 0x00007ffff777dfd3 in rt::task::__extensions__::run::anon::expr_fn::aq () from /home/spradeep/Code/servo/master-build/x86_64-unknown-linux-gnu/src/compiler/rust/x86_64-unknown-linux-gnu/stage2/lib/rustc/x86\
_64-unknown-linux-gnu/lib/libstd-6c65cf4b443341b1-0.9-pre.so
#31 0x00007ffff2cf4573 in rust_try (f=<optimized out>, fptr=<optimized out>, env=<optimized out>) at /home/spradeep/Code/servo/src/compiler/rust/src/rt/rust_builtin.cpp:483
#32 0x00007ffff777dcf7 in rt::task::Task::run::h199ab8d6eb2269804van::v0.9$x2dpre () from /home/spradeep/Code/servo/master-build/x86_64-unknown-linux-gnu/src/compiler/rust/x86_64-unknown-linux-gnu/stage2/lib/ru\
stc/x86_64-unknown-linux-gnu/lib/libstd-6c65cf4b443341b1-0.9-pre.so
#33 0x00007ffff777fafa in rt::task::__extensions__::build_start_wrapper::anon::expr_fn::ah () from /home/spradeep/Code/servo/master-build/x86_64-unknown-linux-gnu/src/compiler/rust/x86_64-unknown-linux-gnu/stag\
e2/lib/rustc/x86_64-unknown-linux-gnu/lib/libstd-6c65cf4b443341b1-0.9-pre.so
@jdm
Copy link
Member

@jdm jdm commented Dec 11, 2013

We're also seeing that same failure on the mac builders when running test_collections.html sometimes.

@jdm
Copy link
Member

@jdm jdm commented Dec 11, 2013

Here's a reduced version. Every part of this is required:

<a><img/></a> <div>a</div>
@pradeep90
Copy link
Contributor Author

@pradeep90 pradeep90 commented Dec 12, 2013

Yellow River now displays correctly with master.
When I tried enabling parallel selector matching, I keep running into what seems like the FcPatternDestroy segfault issue (#668), which has not been resolved. This happens when I scroll down three or four times.

Can anybody else try the Yellow River page with https://github.com/mozilla/servo/blob/master/src/components/main/css/matching.rs line 50 having

    let num_tasks = rt::default_sched_threads() * 2;

instead of

    let num_tasks = 1;
@jdm
Copy link
Member

@jdm jdm commented Dec 12, 2013

I see a bunch of strange crashes with that change (including #1386 regularly), but no FcPatternDestroy.

@pradeep90
Copy link
Contributor Author

@pradeep90 pradeep90 commented Dec 12, 2013

@jdm Hmm... I see ElementLike::get_attr being used many times in the matches_simple_selector code.

So, I guess the parallel access of @mut Attr is causing races when parallel selector matching is enabled, but not when it is just sequential code.
Is that right?

@jdm
Copy link
Member

@jdm jdm commented Dec 12, 2013

I'm not sure, since I imagine the tasks would be looking at different elements.

@pradeep90
Copy link
Contributor Author

@pradeep90 pradeep90 commented Dec 12, 2013

@jdm
On a Linux machine with Nouveau driver (which doesn't get the FcPatternDestroy error), I get the following backtrace with parallel selector matching enabled:

You've met with a terrible fate, haven't you?
*** Error in `./servo': double free or corruption (fasttop): 0x00007f6cd917e0f0 ***


======= Backtrace: =========
fatal runtime error: /lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7f6d0c6f1a46]
assertion failed: self.live_allocations > 0
/home/pradeep/servo/master-build/x86_64-unknown-linux-gnu/src/components/script/libscript-45c6dadca23b963b-0.1.so(+0xaeb5a)[0x7f6d0d16db5a]

/home/pradeep/servo/master-build/x86_64-unknown-linux-gnu/src/components/script/libscript-45c6dadca23b963b-0.1.so(+0x17b049)[0x7f6d0d23a049]
/home/pradeep/servo/master-build/x86_64-unknown-linux-gnu/src/components/script/libscript-45c6dadca23b963b-0.1.so(_ZN3dom7element7Element13get_attribute21hcdd08ef46e988a74BmaI4v0.1E+0xc4)[0x7f6d0d238ea4]
Redirecting call to abort() to mozalloc_abort

/home/pradeep/servo/master-build/x86_64-unknown-linux-gnu/src/compiler/rust/x86_64-unknown-linux-gnu/stage2/lib/rustc/x86_64-unknown-linux-gnu/lib/libstd-6c65cf4b443341b1-0.9-pre.so(+0xddef5)[0x7f6d11687ef5]
Segmentation fault (core dumped)

The self.live_allocations > 0 assertion is in local_heap.rs in std::rt
Also, the backtrace shows Element13get_attribute
I guess this would point to more than one task having access to a managed box

@metajack
Copy link
Contributor

@metajack metajack commented Dec 13, 2013

In IRC we discussed using cast::forget in get_attribute's call to map() to see if what's causing this is the Attribute going out of scope, racing on the refcount, and causing an early free.

That's not a real fix however. We'll need to change @mut Attr here to JSManaged when we merge the JSManaged branch. I think @jdm plans to do that soon.

@pradeep90
Copy link
Contributor Author

@pradeep90 pradeep90 commented Dec 16, 2013

I'm not able to reproduce the crash anymore (even without any of the above forget changes).

pradeep90 added a commit to pradeep90/servo that referenced this issue Dec 18, 2013
The problems that were caused because of Yellow River seem to be fixed.

(See servo#1319).
@pradeep90
Copy link
Contributor Author

@pradeep90 pradeep90 commented Dec 18, 2013

I guess the crash I saw above might have been because of some older build.

bors-servo pushed a commit that referenced this issue Dec 18, 2013
The problems that were caused because of Yellow River seem to be fixed.

(See #1319).
@yichoi
Copy link
Contributor

@yichoi yichoi commented Dec 18, 2013

#1429 landed so close it

@yichoi yichoi closed this Dec 18, 2013
therealglazou added a commit to therealglazou/servo that referenced this issue Feb 20, 2014
The problems that were caused because of Yellow River seem to be fixed.

(See servo#1319).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.