Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Rule Hash for Selector Matching. #1167

Closed
wants to merge 3 commits into from

Conversation

@pradeep90
Copy link
Contributor

pradeep90 commented Nov 1, 2013

  • Using Rule Hash, we can implement the parallel optimizations for CSS Selector Matching from Fast and parallel web page layout (2010)
  • Add index fields in Rule for parent StyleRule and parent Stylesheet to break ties while cascading.
    Sorting by (specificity, stylesheet index, rule index) removes the need to use stable sorts.
+ Add index fields in Rule for parent StyleRule and parent Stylesheet to break
  ties while cascading.
  Sorting by (specificity, stylesheet index, rule index) removes the need to use stable sorts.
@highfive
Copy link

highfive commented Nov 1, 2013

Thanks for the pull request, and welcome! The Servo team is excited to review your changes, and you should hear from @jdm (or someone else) soon.

@jdm
Copy link
Member

jdm commented Nov 1, 2013

Wow, this looks very cool. Tagging @SimonSapin to dig into it!

@bzbarsky
Copy link
Contributor

bzbarsky commented Nov 1, 2013

So the rule/sheet index fields need to get updated when the CSSOM is modified (sheets are added/removed, rules are added/removed, etc), right?

@pradeep90
Copy link
Contributor Author

pradeep90 commented Nov 4, 2013

@bzbarsky
Yes, the indexes would need to be updated
Just to clarify, I don't think removal of stylesheets or rules is implemented yet. Please correct me if I'm wrong.

  • The only way to add a stylesheet to Stylist is using the add_stylesheet method - I increment the stylesheet_index field then.
  • The way it works as of now is that a sheet is sent to Stylist once and all its rules are stored in Stylist.
    That is where I set the rule indexes.

The sheet itself is not being stored in Stylist. So, I don't see how changes in the stylesheet (addition / removal of Rules) are being communicated to Stylist.
Plus, there doesn't seem to be any function to remove a stylesheet as of now.

So, when these modification operations are supported, we can update the indexes accordingly. The indexes only need to be monotonically increasing - they don't have to be continuous.

@bzbarsky
Copy link
Contributor

bzbarsky commented Nov 4, 2013

So inserting a <style> element not at the end of the page is not supported right now?

@sanxiyn
Copy link
Contributor

sanxiyn commented Nov 4, 2013

@bzbarsky No, it isn't, so this is not a regression... Of course we should keep that case in mind.

Support for adding/removing style element is issue #976.

@bzbarsky
Copy link
Contributor

bzbarsky commented Nov 4, 2013

Right; my real question is how this approach scales to having to reindex. Is the data structure that needs to be reindexed a persistent one, or recreated on CSSOM modifications?

@pradeep90
Copy link
Contributor Author

pradeep90 commented Nov 5, 2013

@bzbarsky The data structure that needs to be reindexed (SelectorMap) is persistent. It doesn't have to be recreated.
It is stored in Stylist which is stored in LayoutTask.

When a new rule is inserted, we will have to update the indexes of the rules which come after it.

I think it is a trade-off between having fast selector matching and being able to handle dynamic style quickly.
With Rule Hash, we can implement other optimizations from the previously mentioned paper on top of it.

If we could get some representative numbers regarding the frequency and extent of dynamic style modifications, then we could make a better decision.

So, please advise as to the best way to move forward in this situation.


Also, there is a penalty even in the approach currently used in master branch: Having a stable-sorted list of all rules. This will also have trouble scaling to having to reindex.

Example: Stylesheet foo having rules rule1, rule2, rule3 in decreasing order of
specificity.

  • We would have the sorted rules stored in Stylist as [rule3, rule2, rule1]
  • Now we want to dynamically add a rule "rule22" at index 2 in the sheet with specificity as that of rule3.
  • The sorted list should become [rule22, rule3, rule2, rule1] because rule22 comes before rule3
  • However, because we insert rules by appending them at the end and then doing a stable sort, we would end up with [rule3, rule22, rule2, rule1]

We would need to refer to the original list and do extra work to calculate the correct position in the sorted list.
In other words, even in the simple sorted list of Rules approach, inserting a new rule would not be cheap.

@bzbarsky
Copy link
Contributor

bzbarsky commented Nov 5, 2013

For what it's worth, Gecko completely recreates its RuleHash when you insert rules or sheets (lazily, of course). So chances are, the reindexing operation on insertions should be OK, as long as it's no worse than O(N) in number of sheets/rules.

@pradeep90
Copy link
Contributor Author

pradeep90 commented Nov 6, 2013

@bzbarsky Yeah, it will be O(N) in number of rules. So I can implement that when we have dynamic style working.

What other changes should I make as of now?

@bzbarsky
Copy link
Contributor

bzbarsky commented Nov 6, 2013

Some general notes that shouldn't necessarily block this from landing:

  1. In quirks mode id and class matches need to be case-insensitive, which means case-insensitive hash lookups here.
  2. Long-term we want to store the class pre-split instead of splitting every time.
  3. Is the tim_sort call on every get_all_matching_rules() fast enough? In Gecko's case, we use presorted lists that we then merge, and the merging logic has definitely showed up in profiles; I would expect a sort to show up too...
  4. I don't believe the localName handling in get_element_name correctly deals with matching HTML elements in HTML documents case-insensitively (or more precisely ASCII-lowercased) while matching everything else case-sensitively.
Both the insertion key for rules having a LocalNameSelector and the lookup-key sent in
get_all_matching_rules should be ASCII lower case.
@pradeep90
Copy link
Contributor Author

pradeep90 commented Nov 6, 2013

@bzbarsky

  1. Quirks mode Hash lookups - Even in matches_simple_selector they have // TODO: case-sensitivity depends on the document type and quirks mode. So, I'll wait for the implementation there.

  2. class splitting - Element should ideally have a Set of classes instead of a string. But that's part of the DOM code.

  3. sort - If you're saying that there are enough matched rules on average to make even merge costly, then sort would be costly too. I will implement pre-sorting soon in a different pull request.
    The problem is that every time we call add_stylesheet, we would need to sort the rules for ALL keys in the HashMap.

    Options for pre-sorting rules:

    • We could sort only the rules for key "foo" at the time of adding a rule for ID "foo"
    • We could sort all rules once and for all after all sheets have been added (we would need to know when this happens)
    • We could have a Linked List of Rules as values in the HashMap and insert each Rule in O(M) where M is the number of rules for that key.
  4. Element case-insensitivity - Added commit above

@bzbarsky
Copy link
Contributor

bzbarsky commented Nov 6, 2013

you're saying that there are enough matched rules on average to make even merge costly,

It's not clear whether it's enough matched rules or what. Just have the profile data, not the breakdown of what the lists look like. But I suspect that the "universal rule" bucket is pretty big on some sites.

Element case-insensitivity - Added commit above

Now it's wrong for SVG elements.

What you probably want is to add both the lowercased and the non-lowercased (if they're different) versions of the tag name to the hashtable and then make sure during matching itself to use the right one. You should not be doing any lowercasing of element localNames.

@pradeep90
Copy link
Contributor Author

pradeep90 commented Nov 7, 2013

@bzbarsky I don't think support for SVG elements is implemented yet.
If you see matches_simple_selector(), it says

TODO: case-sensitivity depends on the document type

for LocalNameSelector.

Currently, to handle LocalNameSelector, we do

element.get_local_name().eq_ignore_ascii_case(name.as_slice())

in matches_simple_selector

So, to implement the current functionality and not regress, I need to handle element names and element selectors case-insensitively in SelectorMap.
Hope that was clear. Please let me know if I'm wrong :)

@SimonSapin
Copy link
Member

SimonSapin commented Nov 7, 2013

I just saw this, sorry for the delays in replying.

We do not implement CSSOM at all at the moment. Doing so will requires non-trivial refactoring, since the current code does not keep Stylesheet objects around after parsing. However adding a <style> element should work, using the same code path as any <style> element.

We also do not have any notion of quirks mode / standards mode, HTML / XML elements / documents, or namespaces.

I wonder if we should do these before adding many optimizations, as they (in particular dynamic stylesheet changes) may affect the rest in non-trivial ways.

@pradeep90
Copy link
Contributor Author

pradeep90 commented Nov 8, 2013

@SimonSapin

We do not implement CSSOM at all at the moment.
We also do not have any notion of quirks mode / standards mode, HTML / XML elements / documents, or namespaces

Yeah :) That's what I was telling him

I wonder if we should do these before adding many optimizations, as they (in particular dynamic stylesheet changes) may affect the rest in non-trivial ways.

Ok. I guess it would probably be good to implement the dynamic stylesheet changes before going in for parallelization and other further optimizations, because then it might be non-trivial to modify.

As far as Rule Hash goes, I think it is a standard sequential optimization and we would want to support it anyway, so we should probably do dynamic stylesheet changes after this.

Your thoughts?

@yichoi
Copy link
Contributor

yichoi commented Nov 8, 2013

+1. After landing this, we can continue to sequential optimization and prepare further optimization for parallelization

@ryanhc
Copy link
Contributor

ryanhc commented Nov 8, 2013

Rule hash is a standard optimization technique that all major web browser engines such as Webkit implement, and I also think that we should have similar technique. Given the incompleteness of Servo, and CSS selector matching still being developed, how about we integrate rule hash first, and then incrementally develop/optimize it as the above mentioned features such as quirks mode, namespaces, dynamic updates, etc are developed rather than waiting for and/or developing all the required features?

@bzbarsky
Copy link
Contributor

bzbarsky commented Nov 8, 2013

The point is, the devil is in the details. The exact way you actually set up the hashtable and how you use it really depends on how you implements those other features.

There's no problem with landing this, as long as all the various buggy parts are clearly marked, with issues tracking them getting fixed filed.

@pcwalton
Copy link
Contributor

pcwalton commented Nov 13, 2013

Discussed in person and I'm fine with landing this with a few allocation optimizations.

Also, use stack-allocated vector for fixed-length vector instead of allocating
on the heap.
@pradeep90 pradeep90 closed this Nov 14, 2013
ChrisParis pushed a commit to ChrisParis/servo that referenced this pull request Sep 7, 2014
Cleanup the querySelector{,All} tests; r=MikeSmith+jgraham
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

9 participants
You can’t perform that action at this time.