Inefficiency in applying stylesheet #529

bbshelper · 2023-09-27T14:21:26Z

If an epub's stylesheets contain a lot of class selectors (which is not rare) then current code can be terribly slow. Basically if there are m class selectors and n nodes, more than mn calls to LVCssSelectorRule::check() are made. Some optimization ideas:

Hash class selectors (maybe also ID selectors) in the stylesheet. Given current class attr, we can skip most class selectors. Can be constructed during LVStyleSheet.push()?
Cache various getAttributeValue() results in LVStyleSheet::apply(), so we don't repeatedly call that on the same node against different rules.
ldomNode trades speed for space. On x64 there are 4 bytes padding that can be utilized, such as storing node ID. May be irrelevant if previous 2 items are implemented.

I test epubs with LoadDocument() and Render(). I've noticed that styles are applied in both calls. Why so?

The text was updated successfully, but these errors were encountered:

poire-z · 2023-09-29T02:18:33Z

I test epubs with LoadDocument() and Render(). I've noticed that styles are applied in both calls. Why so?

When you open a document and there is no cache, the XML is parsed and the DOM built. While the DOM is built, there is an optimisation: we apply styles as we create the nodes (as CSS styles depends only on parents and previous siblings, and not on the not-yet-met children or next siblings).
Then, in the Render() that follows this LoadDocument(), we skip applying styles, so this Render() is quicker than the Render() that will happen later.
This Render() then only initNodeRendMethods(), which depends on children (so it can't be done while in the LoadDocument() phase.

Later, when you rerender the document (change font size, style tweaks...), the DOM is there and is reused. But the nodes styles may no longer be valid, and need to be recomputed.
So, in this phase, there is no LoadDocument, but only Render(): this Render() clears styles and need to recompute them, by walking the DOM. And then the initNodeRendMethods() is done as in the first Render().

So, styles are always computed only once in each of these 2 operations.

Basically if there are m class selectors and n nodes, more than mn calls to LVCssSelectorRule::check() are made.

And possibly some more other calls if selectors have parent/siblings rules. We must be sure they will work for them too, so we can't just cache stuff for the currently stylesheet.apply(current_node), or we need to have 2 paths for when stuff is cached for the current node and for when stuff is not for parent nodes.

Hash class selectors (maybe also ID selectors) in the stylesheet. Given current class attr, we can skip most class selectors. Can be constructed during LVStyleSheet.push()?

Not sure I understand what you have in mind in that italic part.

Cache various getAttributeValue() results in LVStyleSheet::apply(), so we don't repeatedly call that on the same node against different rules.

Also not sure how you see that.
(We have cache_node_checked_property() and get_cached_node_checked_property(), dunno if you're thinking about extending them - or something else completely?)

ldomNode trades speed for space. On x64 there are 4 bytes padding that can be utilized, such as storing node ID. May be irrelevant if previous 2 items are implemented.

So, not a solution if we want to support 32bits arm like our Kobo & co.

Hash class selectors (maybe also ID selectors) in the stylesheet. Given current class attr, we can skip most class selectors. Can be constructed during LVStyleSheet.push()?

Dunno if that's what you had in mind, but when I read that earlier, I thought about lUInt32 lString32/8::getHash(). So, if we can trust it, we could convert a classname to a lUInt32, so a lChar32 - and we could store 3 long litteral class names as a lString32 of 3 codepoints/lChar32. If you do the same for the classname in the selector, it's just about finding one codepoint in such hashed-classnames-lString32.
We can't just store these hashed-classnames-lString32 into the "class" attributes, as we need it litteral for attr checks like div[class~=chap] - but we could at DOM build time add an attribute of our own Hclass= that we could use for checking.
A bit ugly (we'd need to hide it in writeNodeEx()/ViewHtml), and it will need space for a new attribute and their values.

bbshelper mentioned this issue Sep 29, 2023

Optimize LVCssSelectorRule::check() #528

Merged

bbshelper mentioned this issue Oct 28, 2023

Improve css class matching #545

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inefficiency in applying stylesheet #529

Inefficiency in applying stylesheet #529

bbshelper commented Sep 27, 2023

poire-z commented Sep 29, 2023

Inefficiency in applying stylesheet #529

Inefficiency in applying stylesheet #529

Comments

bbshelper commented Sep 27, 2023

poire-z commented Sep 29, 2023