Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow drawing images inverted + Some CSS fixes #276

Merged
merged 4 commits into from
Mar 29, 2019

Conversation

poire-z
Copy link
Contributor

@poire-z poire-z commented Mar 29, 2019

DrawBuf: allow drawing images inverted

Will be requested when in night mode: inverted images will then get inverted back by night mode and displayed normally. See koreader/koreader#2571 (comment)

ldomXPointer::getRect(): adds adjusted=false parameter

Commit 12c8e4e from #263 (lvtinydom: Adjust ldomXPointer::getRectEx(), used to get the rect of a char, to include its glyph's left and right overflows) is nice for getting nice rects to highlight, but should generally not be done for other things - so have it done only when needed. Will avoid having "not a coherent xpointer" as seen in koreader/koreader#4840 (comment).

CSS: fix parsing of "! important" with spaces

That's allowed, and used in the testsuite:
image

CSS: fix non-deterministic combinators

E F (descendant combinator) and E ~ F (general sibling combinator) had a limited implementation, that could miss some matches.
This implements the correct logic: check the multiple possible paths, possibly all, until a match is found.
This may be expensive on books with lots of such selectors, and may increase the loading and re-rendering times (some quick test on a book with an absurdly lot of these shows a +20% load time).
It fixes the css3-modsel-86, css3-modsel-88 (and others css3-modsel-8xx) from #176, see #176 (comment).

Will be requested when in night mode: inverted images will
then get inverted back by night mode and displayed normally.
By default, don't adjust returned Rect for glyphs left
or right side bearing, as this could cause have some side
effects (ex: when checking for link coherency).
Only use it in getSegmentRects() when we really want
finely adjusted rects for highlighting.
E F (descendant combinator) and E ~ F (general sibling combinator)
had a limited implementation, that could miss some matches.
This implements the correct logic: check the multiple possible
paths, possibly all, until a match is found.
(This may be expensive on books with lots of such selectors,
and may increase the loading and re-rendering times.)
Copy link
Member

@Frenzie Frenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@poire-z
Copy link
Contributor Author

poire-z commented Jul 22, 2019

A quick note related to the book mentionned at https://www.mobileread.com/forums/showthread.php?t=321683 (investigated there by @NiLuJe :).

Loading time takes 27s on my emulator (far less powerful than @NiLuJe 's desktop :) - but only 14s if I disable embedded styles.
There are indeed a lot of "non-deterministic combinators" in its stylesheet, so it's victim of the expensive cost of checking such rules.

(My reference book for timing is also a Bible (french TOB, with a quite simple stylesheet) which loads in 17s on the emulator and 80s on my Kobo - so this book should indeed load in 2-3 minutes on a device - no idea why it would not...)

There's another strange thing with this book: when you change some rendering parameters (font size, margins), it triggers a Styles have changed in such a way that fully reloading the document may be needed for a correct rendering. popup - which I've never seen happening before when ... styles have no reason to have changed :|

@NiLuJe
Copy link
Member

NiLuJe commented Jul 22, 2019

Note that I was also cheating a bit by using a release build, and I think I remember you mentioning those timings were from debug builds ;). And it was also run from a tmpfs, so, very very fast I/O, too ;).

That said, it does indeed fall squarely on the 2 min mark on my H2O (unplugged, which might be a significant factor on the H2O with its wonky CPUFreq driver)).

I assume that that popup showing up means that closing/re-opening the book will indeed trigger a re-render?

@Frenzie
Copy link
Member

Frenzie commented Jul 22, 2019

It took under 6 seconds on my debug build, so it'd definitely take a minute on my H2O.

@poire-z
Copy link
Contributor Author

poire-z commented Jul 22, 2019

I assume that that popup showing up means that closing/re-opening the book will indeed trigger a re-render?

Yes (*): if you don't answer OK to the popup, it will happen on next load (in spite of the cache, considered invalid).

*: but more than a re-render: a re-loading, re parsing the EPUB's HTML to build a new DOM.
When a single CSS' display/white-space/float has changed, new internal elements like autoBoxing or floatBox may need to be added/removed to/from the DOM - so it's noticed by a dedicated hash:

// We also compute _nodeDisplayStyleHash from each node style->display. It
// may not change as often as _nodeStyleHash, but if it does, it means
// some nodes switched between 'block' and 'inline', and that some autoBoxing
// that may have been added should no more be in the DOM for a correct
// rendering: in that case, the user will have to reload the document, and
// we should invalidate the cache so a new correct DOM is build on load.
_nodeDisplayStyleHash = 0;

_nodeDisplayStyleHash = _nodeDisplayStyleHash * 31 + style.get()->display;
// Also account in this hash if this node is "white_space: pre"
// If white_space change from/to "pre" to/from any other value,
// the document will need to be reloaded so that the HTML text parts
// are parsed according the the PRE/not-PRE rules
if (style.get()->white_space == css_ws_pre) _nodeDisplayStyleHash += 29;
// Also account for style->float_, as it should create/remove new floatBox
// elements wrapping floats when toggling BLOCK_RENDERING_G(ENHANCED)
if (style.get()->float_ > css_f_none) _nodeDisplayStyleHash += 123;

and when this hash changes, you get that popup.

When only other CSS properties have changed (e.g. by style tweaks), the DOM is still valid, so no "loading" needed - just a re-rendering, which works on the current and cached DOM, so it just drops all nodes' styles and recompute them, and re-layout boxes and text.

Note that on a first load/no cache, a good amount of the styles work is done in the "loading" phase, so the "rendering" phase is quicker than the next re-renderings.

Initial loading with no cache: 17s (loading/parsing HTML + computing styles) + 4s (rendering boxes and text)

07/22/19-21:04:44 DEBUG CreDocument: loading document...
07/22/19-21:05:01 DEBUG CreDocument: loading done.
07/22/19-21:05:01 DEBUG CreDocument: rendering document...
07/22/19-21:05:05 DEBUG CreDocument: rendering done.

Changing font size: 13s (computing styles + rendering boxes and text)

07/22/19-21:07:24 DEBUG CreDocument: set font size 18
07/22/19-21:07:37 DEBUG show widget: table: 0x40975940

@NiLuJe
Copy link
Member

NiLuJe commented Jul 22, 2019

Gotcha! Thanks for the detailed walk-through ;).

@poire-z
Copy link
Contributor Author

poire-z commented Jul 22, 2019

OK, the popup was triggered because of docclass117.css having:
th:last-child, td:last-child {display: none; }

Issue can be triggered with this simple document:

<?xml version="1.0" encoding="utf-8" ?>
<html>
<head>
<style>
th:last-child, td:last-child {display: none; }
</style>
</head>
<body>
<div>Start of text.</div>
<table>
<tr>
<th>abc</th>
<th>def</th>
</tr>
<tr>
<td>abc</td>
<td>def</td>
</tr>
</table>
<div>End of text.</div>
</body>
</html>

On initial loading, no table is displayed (wrong).
After changing font size, the table is shown with only the "abc" column (good).

@poire-z
Copy link
Contributor Author

poire-z commented Jul 22, 2019

On initial loading, no table is displayed (wrong).

:(
Related somehow to what I just explained...
When the HTML is parsed and the DOM is being built, and styles computed in that phase (cause we know enough: the node class, all the parents), when meeting the nodes as they are parsed, the current node is... always the last child :| (which might be the only thing we don't know yet...)
When re-rendering and working on a fully available DOM, we get them right, and only the last child is the last child and get that style.

Dunno if that's solvable (except by just not computing styles in the loading phase... and delay it to the rendering phase, which I had occasionally though about to solve some bugs, but always ended up not having to, which I've kinda been proud of :)

@poire-z
Copy link
Contributor Author

poire-z commented Jul 22, 2019

Or just set a flag somewhere when one of these is met, so we just consider the styles made on loading invalid, and immediately go on with a re-rendering (like we may already do in some cases when there are embedded fonts). Books with these will load slower, but those without will be fine.

:last-child
:last-of-type
:nth-last-child()
:nth-last-of-type()
:only-child
:only-of-type

@Frenzie
Copy link
Member

Frenzie commented Jul 23, 2019

The flag idea sounds nifty. :-)

@poire-z
Copy link
Contributor Author

poire-z commented Jul 23, 2019

I quickly crafted a fix, but it still needs a bit of work, and as I'm off soon, it will have to wait a few days.

Regarding the italian version given on the forum, it was still loading after a few minutes, parsing and checking CSS rules each time I gdb ctrl-z it.
You with more powerful computers than mine can check if it finally ends up loading please? :)
(No time to check if the styles are really more complex than those of the english version.)

@Frenzie
Copy link
Member

Frenzie commented Jul 23, 2019

It loads, but it took about 15 minutes. I don't know what's so different about that Italian one… or maybe "only" about 7-8 + another 7-8 for the new Italian hyphenation?

https://www.mobileread.com/forums/showthread.php?p=3870310#post3870310

07/23/19-11:01:22 DEBUG CreDocument: set view dimen {
    ["h"] = 1872,
    ["w"] = 1404
}
07/23/19-11:01:22 DEBUG CreDocument: set status line property 1
07/23/19-11:01:22 DEBUG made tweak css:
[END]
07/23/19-11:01:22 DEBUG CreDocument: set style sheet: ./data/epub.css and appended content (0 bytes)
07/23/19-11:01:22 DEBUG CreDocument: set embedded style sheet 1
07/23/19-11:01:22 DEBUG CreDocument: set block rendering flags 0x7fffffff
07/23/19-11:01:22 DEBUG CreDocument: set render dpi 96
07/23/19-11:01:22 DEBUG CreDocument: set page margins 22 32 22 65
07/23/19-11:01:22 DEBUG CreDocument: set floating punctuation 0
07/23/19-11:01:22 DEBUG CreDocument: set txt preformatted 1
07/23/19-11:01:22 DEBUG CreDocument: set smooth scaling false
07/23/19-11:01:22 DEBUG CreDocument: set nightmode images true
07/23/19-11:01:22 DEBUG CreDocument: set font face Noto Serif
07/23/19-11:01:22 DEBUG CreDocument: set header font Noto Sans
07/23/19-11:01:22 DEBUG CreDocument: set font size 47
07/23/19-11:01:22 DEBUG CreDocument: toggle font bolder 0
07/23/19-11:01:22 DEBUG CreDocument: set font hinting mode 2
07/23/19-11:01:22 DEBUG CreDocument: set font kerning mode 1
07/23/19-11:01:22 DEBUG CreDocument: set space condensing 75
07/23/19-11:01:22 DEBUG CreDocument: set interline space 100
07/23/19-11:01:22 DEBUG CreDocument: set gamma index 15
07/23/19-11:01:22 DEBUG Hyphenation: no algo set
07/23/19-11:01:22 DEBUG CreDocument: set hyphenation left hyphen min 2
07/23/19-11:01:22 DEBUG CreDocument: set hyphenation right hyphen min 2
07/23/19-11:01:22 DEBUG CreDocument: set hyphenation trust soft hyphens 0
07/23/19-11:01:22 DEBUG Hyphenation: keeping current crengine algo: English_US.pattern
07/23/19-11:01:22 DEBUG CreDocument: requesting DOM version: 20180528
07/23/19-11:01:22 DEBUG CreDocument: set visible page count 1
07/23/19-11:01:22 DEBUG CreDocument: loading document...
07/23/19-11:08:33 DEBUG CreDocument: loading done.
07/23/19-11:08:33 DEBUG Hyphenation: updating for doc language it : English_US.pattern => Italian.pattern
07/23/19-11:08:33 DEBUG CreDocument: set hyphenation dictionary Italian.pattern
07/23/19-11:08:33 DEBUG CreDocument: set hyphenation left hyphen min 2
07/23/19-11:08:33 DEBUG CreDocument: set hyphenation right hyphen min 2
07/23/19-11:08:33 DEBUG CreDocument: set hyphenation trust soft hyphens 0
07/23/19-11:08:33 DEBUG CreDocument: rendering document...
CRE: document loaded, but styles re-init needed (possible epub with embedded fonts)
07/23/19-11:16:19 DEBUG CreDocument: rendering done.

@poire-z
Copy link
Contributor Author

poire-z commented Aug 2, 2019

  • another 7-8 for the new Italian hyphenation?

I don't think the hyphenation is the cause, rather some embedded font specified in CSS (even if there is no font file in the EPUB...):
CRE: document loaded, but styles re-init needed (possible epub with embedded fonts)
(We'll trigger that same thing too when we meet a :last-child & co.)

The italian one has the same number of xhtml files (~4000) as the english one, but the CSS in the italian one are horrible: 3 huge files, that must be included by each HTML files (the english one has more, but smaller CSS files, and each xhtml file only includes 2 or 3 small ones).
And each italian CSS have thousands of ugly ancestor selectors... which might be duplicated in another CSS... So, probably x100 more selectors in the italian one than in the english one - so x100 more time needed to check them...
So, I'm just not going to investigate more :)

May be there are some tools to clean CSS files and keep only the selectors that are matched, or merge them. That book could benefit from some cleanup :| May be just remove 1 or 2 of the biggest CSS Files as a first try...

The answer as to why other readers open it is that may be they don't render the whole book, or their styles check&apply code is better than crengine's one.

@Frenzie: I'd like a qcachegrind output on the english one if you can provide it - just to see if something jumps out, although I know the ancestor selectors are doomed to be slow, and there's not much obvious to be done, except discarding them (as it was done before I fixed them :| ).

@Frenzie
Copy link
Member

Frenzie commented Aug 2, 2019

May be there are some tools to clean CSS files and keep only the selectors that are matched, or merge them. That book could benefit from some cleanup :| May be just remove 1 or 2 of the biggest CSS Files as a first try...

There are some tools that can help clean up styles a little, e.g. see here. I suppose they should work well enough on an extracted EPUB, or maybe Sigil even integrates something like it.

cachegrind output on the english one

callgrind.out.28633.zip

@poire-z
Copy link
Contributor Author

poire-z commented Aug 2, 2019

Thanks.
A quick look shows that lString16:lowercase() is expensive (alloc/free) and called many times, taking half of the time (the half left side of this):
image
It should be called only for checking these:

/* Attributes that make some elements float or clear */
br[clear=all i], br[clear=both i] { clear: both; }
br[clear=left i] { clear: left; }
br[clear=right i] { clear: right; }
img[align=left i] { float: left; }
img[align=right i] { float: right; }
table[align=left i] { float: left; }
table[align=right i] { float: right; }

But if I delete these, load & render time doesn't change (30s with or without)...
You've been using epub.css ? Can you try removing these declarations and see if you still have that lowercase stuff called that much?

Otherwise, nothing much strange.

@Frenzie
Copy link
Member

Frenzie commented Aug 2, 2019

But the self value of lowercase is a mere 0.01. It looks to me like the unexpected heavy hitter is TrimDoubleSpaces()?

Screenshot_2019-08-02_09-56-51

@poire-z
Copy link
Contributor Author

poire-z commented Aug 2, 2019

But the self value of lowercase is a mere 0.01

Well, I don't see the same things as you do :) I use some portable Qcachegrind. You don't see that big block on the left I pasted above?

It looks to me like the unexpected heavy hitter is TrimDoubleSpaces()?

I don't see that either :) Anyway, it's in LVCssDeclaration:parse(), which shouldn't be hit much.

@Frenzie
Copy link
Member

Frenzie commented Aug 2, 2019

You don't see that big block on the left I pasted above?

Sure, if I enable the callee map. :-)

Screenshot_2019-08-02_11-01-27

@poire-z
Copy link
Contributor Author

poire-z commented Aug 3, 2019

Just some quick timing notes on the emulator about that italian one:

I made 2 of the 3 huge CSS empty, keeping only the smaller one (80Kb).
The book loads in 140s.
If I just have ancestor rules discarded in the crengine code, it loads in 90s.
If I revert 621dee0, it loads in 144s (so, it's strangely not that much the fixed and expensive ancestor checking that is the cause of the slowness).
If I just disable embedded stylesheets via the bottom menu, it loads in 13s (the same if I keep the 3 huge CSS files, so it's obviously not the parsing of these files).

No real clue about how to speed that up, but it's probably in lvstsheet.cpp that stuff might be optimized.

(According to https://www.mobileread.com/forums/showthread.php?p=3864541#post3864541, many reading apps may not support the expensive CSS combinators as much as we do.)

@Frenzie
Copy link
Member

Frenzie commented Aug 3, 2019

but they looked perfect with PocketBook and koreader

Nice work @poire-z ;-)

No real clue about how to speed that up, but it's probably in lvstsheet.cpp that stuff might be optimized.

Maybe there's some stuff that could be cached that isn't? I dunno, I think the only real way to open ridiculously large documents fast is to do only one XHTML file at a time (cheating, so to speak). I imagine facilitating such behavior to be part of the reason behind the various individual chapter files in the first place.

Also I just finally looked at that CSS and HOLY @@#$ what is that thing?! Those styles are just ridiculously complex. O_o I assume a preprocessor was involved but still…

Also they contain some doozies:

div.east_right{width:40% !important;clear:both;float:right;margin-top:0em;margin-bottom:1em;display:inline-block}

inline-block doesn't combine with float:right. Stuff like this makes a good testcase for how we handle crappy CSS. ;-)

@poire-z
Copy link
Contributor Author

poire-z commented Jan 11, 2020

Reminding @Frenzie we did investigate that https://www.mobileread.com/forums/showthread.php?p=3940093#post3940093 above around #276 (comment)

If I remember correctly, the 2 of the 3 CSS files were really similar, so I'm sure removing one of these 2 from the EPUB zip file will improve loading time, and might not change much the look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants