Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a bloom filter to CSS selector matching. #3212

Merged
merged 1 commit into from Sep 16, 2014

Conversation

@cgaebel
Copy link
Contributor

cgaebel commented Sep 5, 2014

Every other browser engine uses a bloom filter to quickly reject Descendant
selectors. This adds that feature to servo, and it even works (mostly!) in
parallel!

@highfive
Copy link

highfive commented Sep 5, 2014

warning Warning warning

  • These commits modify unsafe code. Please review it carefully!
@hoppipolla-critic-bot
Copy link

hoppipolla-critic-bot commented Sep 5, 2014

Critic review: https://critic.hoppipolla.co.uk/r/2506

This is an external review system which you may optionally use for the code review of your pull request.

In order to help critic track your changes, please do not make in-place history rewrites (e.g. via git rebase -i or git commit --amend) when updating this pull request.

@pcwalton
Copy link
Contributor

pcwalton commented Sep 5, 2014

Can you give some numbers on sample pages (say, Wikipedia) which shows how many bloom filter invalidations we have in practice?

use std::mem;
use style::Stylist;
use style::{Stylist,SimpleSelector};

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

nit: space here and alphabetize

@@ -638,8 +648,22 @@ impl LayoutTask {
}
self.screen_size = current_screen_size;

// TODO(cgaebel): Instead of counting the max dom selectors, maybe just
// use a 1 MB bloom filter? Or use d-left hasing to allow one that grows

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

nit: "hasing" -> "hashing"

let (max_dom_node_selectors, descendant_simple_selectors) =
profile(time::LayoutMaxSelectorMatchesCategory,
self.time_profiler_chan.clone(),
// TODO(cagebel): Parallelize this traversal.

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

typo: your name :)

@@ -213,6 +214,93 @@ impl<'a> ParallelPreorderFlowTraversal for AssignISizesTraversal<'a> {

impl<'a> ParallelPostorderFlowTraversal for AssignBSizesAndStoreOverflowTraversal<'a> {}

/// A pair of the bloom filter used for css selector matching, and the node to
/// which it applies. This is used to efficiently do `Desendant` selector

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

typo: Descendant

/// As we walk down the DOM tree a task-local bloom filter is built of all the
/// CSS `SimpleSelector`s which are part of a `Descendant` compound selector
/// (i.e. paired with a `Descendant` combinator, in the `next` field of a
/// `CompoundSelector`.

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

Huh. Is this how other browser engines do it? I would have naively thought that you would just keep track of all classes/tag names/IDs.

This comment has been minimized.

@cgaebel

cgaebel Sep 5, 2014

Author Contributor

Ok. I changed it to do that instead. That seems a lot smarter than what I was doing! It saves a pass, and is a lot simpler. I'll update the comment in a minute.

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

Here's what Gecko uses (it's an nsIAtom, which is the type used by class names/tag names/IDs): http://mxr.mozilla.org/mozilla-central/source/layout/style/nsRuleProcessorData.h#72

Looks like Gecko uses element tags and classes: http://mxr.mozilla.org/mozilla-central/source/layout/style/nsCSSRuleProcessor.cpp#3636

This comment has been minimized.

@cgaebel

cgaebel Sep 5, 2014

Author Contributor

It looks like gecko is also keeping an stack of atoms for "what's above me", in addition to the bloom filter. This wouldn't be too crazy to also implement.

(Some(p), None) => new_bloom(Some(p)),
// Found cached bloom filter.
(Some(p), Some((bf, old_node))) => {
// Heey, the cached parent is our parent! We can reused the bloom

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

typo: "Hey", "reuse"

/// Quickly figures out whether or not the compound selector is worth doing more
/// work on. If the simple selectors don't match, or there's a child selector
/// that does not appear in the bloom parent bloom filter, we can exit early.
fn can_fast_reject<E: TElement, N: TNode<E>>(

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

Does your Rust have where clauses? If so, you can write this as fn can_fast_reject<E,N>(...) where E: TElement, N: TNode<E>, which is easier to indent.

This comment has been minimized.

@cgaebel

cgaebel Sep 5, 2014

Author Contributor

I don't think it does.

// Key Stretching
// ==============
//
// Siphash is expensive. Instead of running it `NUMBER_OF_HASHES`, which would

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

You don't need a cryptographically strong hash at all here. Just use xxhash or something.

This comment has been minimized.

@cgaebel

cgaebel Sep 5, 2014

Author Contributor

But then I can't use #deriving(Hash) :(

This comment has been minimized.

@brendanzab

brendanzab Sep 5, 2014

Member

You can implement a custom Hasher. Then Hash would still work.

This comment has been minimized.

@cgaebel

cgaebel Sep 5, 2014

Author Contributor

I have switched to FNV hashing, but now this PR is blocked on: servo/stringcache#11.

EDIT: Unblocked now.

/// The return value of this function is extremely sketchy.
/// The 'static lifetime on SimpleSelector is so that lifetime parameters on
/// `SharedLayoutContext` can be avoided.
pub fn max_selector_matches<E: TElement, N: TNode<E>>(&self, node: &N)

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

Can use a where clause here maybe as well.


/// For the css selection bloom filter, we need to get an estimate of the
/// matching descendant simple selector in the tree. This function will
/// return a set of them.

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

Do other browser engines have to do this?

This comment has been minimized.

@cgaebel

cgaebel Sep 5, 2014

Author Contributor

I heard on IRC that chrome just uses a 4 kb 8-bit count bloom filter no matter what. I think that's what leads to the picture here from this blog.

This comment has been minimized.

@pcwalton

pcwalton Sep 5, 2014

Contributor

I'd rather do what Chrome/Blink does than add a whole new pass. New passes are expensive.

This comment has been minimized.

@cgaebel

cgaebel Sep 5, 2014

Author Contributor

Done. No extra pass.

@jdm
Copy link
Member

jdm commented Sep 5, 2014

/home/travis/build/servo/servo/src/components/layout/wrapper.rs:234:5: 238:6 error: method `first_child` is not a member of trait `TNode`
/home/travis/build/servo/servo/src/components/layout/wrapper.rs:234 fn first_child(&self) -> Option<LayoutNode<'ln>> {
/home/travis/build/servo/servo/src/components/layout/wrapper.rs:235 unsafe {
/home/travis/build/servo/servo/src/components/layout/wrapper.rs:236 self.node.first_child_ref().map(|node| self.new_with_this_lifetime(&node))
/home/travis/build/servo/servo/src/components/layout/wrapper.rs:237 }
/home/travis/build/servo/servo/src/components/layout/wrapper.rs:238 }
error: aborting due to previous error
let mut ret = false;

for shash in stretch(&mut to_rng(hash)) {
ret |= self.definitely_excludes_shash(shash);

This comment has been minimized.

@huonw

huonw Sep 5, 2014

Contributor

Isn't this doing more work than necessary, since it doesn't shortcircuit. Could this whole function be stretch(&mut to_rng(hash)).any(|shash| self.definitely_excludes_shash(shash))?

This comment has been minimized.

@cgaebel

cgaebel Sep 5, 2014

Author Contributor

I did that at first, but it was slower. Apparently branches are expensive.

This comment has been minimized.

@brendanzab

brendanzab Sep 5, 2014

Member

Maybe a comment would be good to clarify.


/// A bloom filter can tell you if an element /may/ be in it. It cannot be
/// certain. But, assuming correct usage, this query will have a low false
// positive rate.

This comment has been minimized.

@huonw

huonw Sep 5, 2014

Contributor

Needs another /

@cgaebel
Copy link
Contributor Author

cgaebel commented Sep 5, 2014

For this wikipedia page, the bloom filter success stats are:

  invalidations: 2129  // times the bloom filter was thrown away
  hits:          1496  // times the bloom filter was reused
  rejections:    25971 // times the bloom filter managed to quickly reject a css selector
@cgaebel
Copy link
Contributor Author

cgaebel commented Sep 5, 2014

I believe I've addressed all the code review comments. Does everyone want to take another look? :)

@metajack
Copy link
Contributor

metajack commented Sep 5, 2014

Timings?

@jdm
Copy link
Member

jdm commented Sep 7, 2014

This PR consistently has trouble with the unit-doc job on travis, and we end up having problems building string-cache with errors like error: macro undefined: 'static_atom_map!'.

@cgaebel
Copy link
Contributor Author

cgaebel commented Sep 8, 2014

@jdm: How do I run the unit-doc job on my local machine?

@pcwalton
Copy link
Contributor

pcwalton commented Sep 8, 2014

A few things I'd like to see any or all of:

  • What is the rejection rate as a fraction of total ancestor/child selectors? Of total selectors?
  • How long does it take to style Wikipedia before and after this change?
  • Does your hit rate go up when you decrease the number of threads (-y)? Currently we overschedule a little bit, but that may well not be the right thing anymore if it decreases the hit rate.
  • As a sanity check, if you hack the code to use only one work stealing deque, does your hit rate reach 100%?
@cgaebel cgaebel force-pushed the cgaebel:style-resolution-bloom-filter branch from 130fcb7 to 7dffa22 Sep 11, 2014
@metajack
Copy link
Contributor

metajack commented Sep 12, 2014

atom.rs:14:23: 14:27 error: mismatched types: expected `&mut collections::hash::sip::SipState` but found `&mut __S` (expected struct collections::hash::sip::SipState but found type parameter)

atom.rs:14 #[deriving(Clone, Eq, Hash, PartialEq)]

^~~~

note: in expansion of #[deriving]

atom.rs:14:1: 14:40 note: expansion site

error: aborting due to previous error
@cgaebel cgaebel force-pushed the cgaebel:style-resolution-bloom-filter branch from 7dffa22 to 614aba2 Sep 12, 2014
@cgaebel
Copy link
Contributor Author

cgaebel commented Sep 12, 2014

Ok I have some more numbers to report now! All this data is for when I just opened http://cnn.com.

+-----------------------------------------------+--------+
| Total css selector matchings attempted        | 143190 |
| Total descendant selector matchings attempted | 132454 |
| Total bloom filter rejections                 | 126844 |
+-----------------------------------------------+--------+

Total time from startup to close:
   - 18.5s with bloom filter
   - 69s without bloom filter

If you want to play with the profiling data yourself, you can get it here. Run 1 is without the bloom filter, Run 2 is with.

This still isn't ready to merge because I haven't fixed the build on all platforms. I'll work on having that done by the end of today.

@metajack
Copy link
Contributor

metajack commented Sep 12, 2014

That is a nice looking table.

@pcwalton
Copy link
Contributor

pcwalton commented Sep 12, 2014

Not to diminish the awesomeness of this work, but I think the relative benefit of the bloom filter will be reduced a lot when we start using atoms to compare classes (PR incoming soon).

@metajack
Copy link
Contributor

metajack commented Sep 12, 2014

That might change the absolute performance gain, but won't change the hit rate. Let's rerun numbers with your atom patch.

@pcwalton
Copy link
Contributor

pcwalton commented Sep 12, 2014

Right.

@cgaebel
Copy link
Contributor Author

cgaebel commented Sep 12, 2014

This is the number of bloom filter invalidations as a function of the number of threads used for layout. Getting data for 1 thread isn't fair, because it uses a different codepath.

+--------------+---------------+
| Thread Count | Invalidations |
+--------------+---------------+
|       2      |       11      |
|       4      |      200      |
|       8      |      450      |
|      16      |      750      |
|      32      |     1100      |
|      64      |     1600      |
|     128      |     2350      |
+--------------+---------------+

Given that the filter allows us to quickly reject 100k selectors, unless the average depth of a selector is greater than 50 (for 128 threads), this seems like a win.

@cgaebel cgaebel force-pushed the cgaebel:style-resolution-bloom-filter branch 4 times, most recently from b4f1f61 to cad8329 Sep 12, 2014
@cgaebel
Copy link
Contributor Author

cgaebel commented Sep 13, 2014

Whaaaat the build is passing!? Cool. Can I get an r+?

@@ -278,16 +279,31 @@ pub enum StyleSharingResult<'ln> {
}

pub trait MatchMethods {
/// Inserts and removes the matching `Child` selectors from a bloom filter.
/// This is used to speed up CSS selector matching to remove unnecessary
/// tree climbs for `Child` queries.

This comment has been minimized.

@pcwalton

pcwalton Sep 15, 2014

Contributor

Don't you mean descendant selectors?

This comment has been minimized.

@pcwalton

pcwalton Sep 15, 2014

Contributor

Also, you aren't putting selectors in the bloom filter: you're putting local names, namespaces, IDs, and classes.

///
/// Since a work-stealing queue is used for styling, sometimes, the bloom filter
/// will no longer be the for the parent of the node we're currently on. When
/// this happens, the task local bloom filter will be throw away and rebuilt.

This comment has been minimized.

@pcwalton

pcwalton Sep 15, 2014

Contributor

typo: "throw" → "thrown"

///
/// If one does not exist, a new one will be made for you. If it is out of date,
/// it will be thrown out and a new one will be made for you.
fn take_task_local_bloom_filter<'ln>(

This comment has been minimized.

@pcwalton

pcwalton Sep 15, 2014

Contributor

Do you need the explicit lifetime here?

}
}

fn put_task_local_bloom_filter<'ln>(bf: BloomFilter, unsafe_node: &UnsafeLayoutNode) {

This comment has been minimized.

@pcwalton

pcwalton Sep 15, 2014

Contributor

Seems like this explicit lifetime is unused as well.

@pcwalton
Copy link
Contributor

pcwalton commented Sep 15, 2014

r+ with nits addressed. We may want to move away from FNV in the future but this is fine for now.

@cgaebel cgaebel force-pushed the cgaebel:style-resolution-bloom-filter branch from cad8329 to acd83ff Sep 15, 2014
metajack added a commit that referenced this pull request Sep 16, 2014
Added a bloom filter to CSS selector matching.
@metajack metajack merged commit ad02534 into servo:master Sep 16, 2014
1 check passed
1 check passed
continuous-integration/travis-ci The Travis CI build passed
Details
@cgaebel cgaebel deleted the cgaebel:style-resolution-bloom-filter branch Sep 16, 2014
@Manishearth
Copy link
Member

Manishearth commented Sep 16, 2014

@cgaebel btw, this works:

Thread Count Invalidations
2 11
4 200
8 450
16 750
32 1100
64 1600
128 2350

(See the tables section)

:)

@cgaebel
Copy link
Contributor Author

cgaebel commented Sep 16, 2014

Whoa. Very slick. Gotta change up my ascii table style, now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

9 participants
You can’t perform that action at this time.