Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficiency in applying stylesheet #529

Open
bbshelper opened this issue Sep 27, 2023 · 1 comment
Open

Inefficiency in applying stylesheet #529

bbshelper opened this issue Sep 27, 2023 · 1 comment

Comments

@bbshelper
Copy link
Contributor

If an epub's stylesheets contain a lot of class selectors (which is not rare) then current code can be terribly slow. Basically if there are m class selectors and n nodes, more than mn calls to LVCssSelectorRule::check() are made. Some optimization ideas:

  1. Hash class selectors (maybe also ID selectors) in the stylesheet. Given current class attr, we can skip most class selectors. Can be constructed during LVStyleSheet.push()?
  2. Cache various getAttributeValue() results in LVStyleSheet::apply(), so we don't repeatedly call that on the same node against different rules.
  3. ldomNode trades speed for space. On x64 there are 4 bytes padding that can be utilized, such as storing node ID. May be irrelevant if previous 2 items are implemented.

I test epubs with LoadDocument() and Render(). I've noticed that styles are applied in both calls. Why so?

@poire-z
Copy link
Contributor

poire-z commented Sep 29, 2023

I test epubs with LoadDocument() and Render(). I've noticed that styles are applied in both calls. Why so?

When you open a document and there is no cache, the XML is parsed and the DOM built. While the DOM is built, there is an optimisation: we apply styles as we create the nodes (as CSS styles depends only on parents and previous siblings, and not on the not-yet-met children or next siblings).
Then, in the Render() that follows this LoadDocument(), we skip applying styles, so this Render() is quicker than the Render() that will happen later.
This Render() then only initNodeRendMethods(), which depends on children (so it can't be done while in the LoadDocument() phase.

Later, when you rerender the document (change font size, style tweaks...), the DOM is there and is reused. But the nodes styles may no longer be valid, and need to be recomputed.
So, in this phase, there is no LoadDocument, but only Render(): this Render() clears styles and need to recompute them, by walking the DOM. And then the initNodeRendMethods() is done as in the first Render().

So, styles are always computed only once in each of these 2 operations.

Basically if there are m class selectors and n nodes, more than mn calls to LVCssSelectorRule::check() are made.

And possibly some more other calls if selectors have parent/siblings rules. We must be sure they will work for them too, so we can't just cache stuff for the currently stylesheet.apply(current_node), or we need to have 2 paths for when stuff is cached for the current node and for when stuff is not for parent nodes.

Hash class selectors (maybe also ID selectors) in the stylesheet. Given current class attr, we can skip most class selectors. Can be constructed during LVStyleSheet.push()?

Not sure I understand what you have in mind in that italic part.

Cache various getAttributeValue() results in LVStyleSheet::apply(), so we don't repeatedly call that on the same node against different rules.

Also not sure how you see that.
(We have cache_node_checked_property() and get_cached_node_checked_property(), dunno if you're thinking about extending them - or something else completely?)

ldomNode trades speed for space. On x64 there are 4 bytes padding that can be utilized, such as storing node ID. May be irrelevant if previous 2 items are implemented.

So, not a solution if we want to support 32bits arm like our Kobo & co.

Hash class selectors (maybe also ID selectors) in the stylesheet. Given current class attr, we can skip most class selectors. Can be constructed during LVStyleSheet.push()?

Dunno if that's what you had in mind, but when I read that earlier, I thought about lUInt32 lString32/8::getHash(). So, if we can trust it, we could convert a classname to a lUInt32, so a lChar32 - and we could store 3 long litteral class names as a lString32 of 3 codepoints/lChar32. If you do the same for the classname in the selector, it's just about finding one codepoint in such hashed-classnames-lString32.
We can't just store these hashed-classnames-lString32 into the "class" attributes, as we need it litteral for attr checks like div[class~=chap] - but we could at DOM build time add an attribute of our own Hclass= that we could use for checking.
A bit ugly (we'd need to hide it in writeNodeEx()/ViewHtml), and it will need space for a new attribute and their values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants