Hyphenation: speed up book loading by setting hyph algo earlier #4142

Merged
merged 1 commit into from Aug 7, 2018

@poire-z
Contributor

poire-z commented Aug 7, 2018

The hyphenation algo setup was the only one using onPreRenderDocument, because it needed access to the document language to set the hyphenation algorithm accordingly.

Setting the hyph algo before loading the document may save crengine from redoing some expensive work at render time: the hyph algo is accounted for in the nodeStyleHash, so if it differs at render time from what it was at load time ("English US" by default), the hashes mismatch and crengine does a full re-init of the node styles.
We now re-set it on pre-render (only then, after loading, do we know the document language) only if it's really needed: when no algo is saved in the book settings, no default algo is set, and the book has a language defined.
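
The decision described above can be sketched roughly as follows. This is only an illustrative Python model, not the actual KOReader Lua code: the function names, the `LANG_TO_ALGO` table and the `may_override` flag are all hypothetical, and the precedence (book setting, then default, then fallback) is an assumption inferred from the log messages in this PR.

```python
# Hypothetical language -> pattern mapping, for illustration only.
LANG_TO_ALGO = {"en": "English_US.pattern", "fr": "French.pattern"}
# crengine's built-in algo when nothing else is set (per the logs in this PR).
CRENGINE_DEFAULT = "English_US.pattern"


def algo_before_load(book_algo, default_algo, fallback_algo):
    """Pick the algo to set before loading.

    Returns (algo, may_override): may_override is True only when neither
    the book settings nor a user default fixed the algo, i.e. the document
    language is allowed to replace it later.
    """
    if book_algo:
        return book_algo, False
    if default_algo:
        return default_algo, False
    if fallback_algo:
        return fallback_algo, True
    return CRENGINE_DEFAULT, True


def algo_on_pre_render(current, may_override, doc_language):
    """Return (algo, changed) once the document language is known."""
    if not may_override or not doc_language:
        return current, False  # not overriding with doc language
    wanted = LANG_TO_ALGO.get(doc_language)
    if wanted is None or wanted == current:
        return current, False  # current algo is already right
    return wanted, True  # this is the case that costs a full re-render


algo, may_override = algo_before_load(None, None, "French.pattern")
print(algo_on_pre_render(algo, may_override, "en"))
```

Only the last case (a fallback algo that does not match the document language) changes the algo after loading, which is the one situation where the style hash mismatch is still expected.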

Before, the hyphenation settings were always applied after loading, and only then. It worked like this (with the logs this PR adds):

08/07/18-18:10:10 DEBUG CreDocument: set gamma index 15
08/07/18-18:10:10 DEBUG Hyphenation: no algo set
08/07/18-18:10:10 DEBUG CreDocument: set hyphenation left hyphen min 2
08/07/18-18:10:10 DEBUG CreDocument: set hyphenation right hyphen min 2
08/07/18-18:10:10 DEBUG CreDocument: set hyphenation trust soft hyphens 0
08/07/18-18:10:10 DEBUG Hyphenation: keeping current crengine algo: English_US.pattern
08/07/18-18:10:10 DEBUG CreDocument: requesting DOM version: 20180528
08/07/18-18:10:10 DEBUG CreDocument: loading document...
08/07/18-18:10:11 DEBUG CreDocument: loading done.
08/07/18-18:10:11 DEBUG Hyphenation: updating for doc language fr : English_US.pattern => French.pattern
08/07/18-18:10:11 DEBUG CreDocument: set hyphenation dictionary French.pattern
08/07/18-18:10:11 DEBUG CreDocument: set hyphenation left hyphen min 2
08/07/18-18:10:11 DEBUG CreDocument: set hyphenation right hyphen min 1
08/07/18-18:10:11 DEBUG CreDocument: set hyphenation trust soft hyphens 0
08/07/18-18:10:11 DEBUG CreDocument: set visible page count 1
08/07/18-18:10:11 DEBUG CreDocument: rendering document...
checkRenderContext: Style hash doesn't match b3a9c9c6!=65a8043a
  /\ this makes crengine actually reset all the node styles it has built when loading
DOCUMENT 1 rendering context is changed - full render required...
08/07/18-18:10:12 DEBUG CreDocument: rendering done.
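
Why the log above ends in a full render can be illustrated with a toy model. This is not how crengine actually computes its style hash; the `style_hash` and `needs_full_reinit` helpers and the settings keys are hypothetical, and SHA-1 is just a convenient stand-in. The only point is that the hyphenation settings are part of what gets hashed, so changing them between load and render makes the two hashes differ.

```python
import hashlib


def style_hash(settings):
    """Toy stand-in for a style hash: digest of all style-affecting settings."""
    payload = "|".join(f"{key}={settings[key]}" for key in sorted(settings))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]


def needs_full_reinit(load_settings, render_settings):
    """True when the settings used at load time no longer match at render time."""
    return style_hash(load_settings) != style_hash(render_settings)


# Old behaviour: loaded with the crengine default, then switched before rendering.
at_load = {"hyph_algo": "English_US.pattern", "left_min": 2, "right_min": 2}
at_render = {"hyph_algo": "French.pattern", "left_min": 2, "right_min": 1}
print(needs_full_reinit(at_load, at_render))

# New behaviour: the algo is already right before loading, so nothing changes.
print(needs_full_reinit(at_render, dict(at_render)))
```

In this model, setting the algo before loading keeps both hashes identical, which corresponds to the logs without the "Style hash doesn't match" line.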

Now, we can get this:

08/07/18-18:13:43 DEBUG CreDocument: set gamma index 15
08/07/18-18:13:43 DEBUG Hyphenation: using fallback  French.pattern , might be overriden by doc language
08/07/18-18:13:43 DEBUG CreDocument: set hyphenation dictionary French.pattern
08/07/18-18:13:43 DEBUG CreDocument: set hyphenation left hyphen min 2
08/07/18-18:13:43 DEBUG CreDocument: set hyphenation right hyphen min 1
08/07/18-18:13:43 DEBUG CreDocument: set hyphenation trust soft hyphens 0
08/07/18-18:13:43 DEBUG CreDocument: requesting DOM version: 20180528
08/07/18-18:13:43 DEBUG CreDocument: loading document...
08/07/18-18:13:44 DEBUG CreDocument: loading done.
08/07/18-18:13:44 DEBUG Hyphenation: current French.pattern is right for doc language: fr
08/07/18-18:13:44 DEBUG CreDocument: set visible page count 1
08/07/18-18:13:44 DEBUG CreDocument: rendering document...
08/07/18-18:13:44 DEBUG CreDocument: rendering done.

or this:

08/07/18-18:14:40 DEBUG CreDocument: set gamma index 15
08/07/18-18:14:40 DEBUG Hyphenation: using default  French.pattern
08/07/18-18:14:40 DEBUG CreDocument: set hyphenation dictionary French.pattern
08/07/18-18:14:40 DEBUG CreDocument: set hyphenation left hyphen min 2
08/07/18-18:14:40 DEBUG CreDocument: set hyphenation right hyphen min 1
08/07/18-18:14:40 DEBUG CreDocument: set hyphenation trust soft hyphens 0
08/07/18-18:14:40 DEBUG CreDocument: requesting DOM version: 20180528
08/07/18-18:14:40 DEBUG CreDocument: loading document...
08/07/18-18:14:40 DEBUG CreDocument: loading done.
08/07/18-18:14:40 DEBUG Hyphenation: not overriding French.pattern with doc language: fr
08/07/18-18:14:40 DEBUG CreDocument: set visible page count 1
08/07/18-18:14:40 DEBUG CreDocument: rendering document...
08/07/18-18:14:40 DEBUG CreDocument: rendering done.

or this:

08/07/18-18:15:10 DEBUG CreDocument: set gamma index 15
08/07/18-18:15:10 DEBUG Hyphenation: using default  French.pattern
08/07/18-18:15:10 DEBUG CreDocument: set hyphenation dictionary French.pattern
08/07/18-18:15:10 DEBUG CreDocument: set hyphenation left hyphen min 2
08/07/18-18:15:10 DEBUG CreDocument: set hyphenation right hyphen min 1
08/07/18-18:15:10 DEBUG CreDocument: set hyphenation trust soft hyphens 0
08/07/18-18:15:10 DEBUG CreDocument: requesting DOM version: 20180528
08/07/18-18:15:10 DEBUG CreDocument: loading document...
08/07/18-18:15:10 DEBUG CreDocument: loading done.
08/07/18-18:15:10 DEBUG Hyphenation: not overriding French.pattern with doc language: en
08/07/18-18:15:10 DEBUG CreDocument: set visible page count 1
08/07/18-18:15:10 DEBUG CreDocument: rendering document...
08/07/18-18:15:10 DEBUG CreDocument: rendering done.

and only when necessary this full re-rendering:

08/07/18-18:16:32 DEBUG CreDocument: set gamma index 15
08/07/18-18:16:32 DEBUG Hyphenation: using fallback  French.pattern , might be overriden by doc language
08/07/18-18:16:32 DEBUG CreDocument: set hyphenation dictionary French.pattern
08/07/18-18:16:32 DEBUG CreDocument: set hyphenation left hyphen min 2
08/07/18-18:16:32 DEBUG CreDocument: set hyphenation right hyphen min 1
08/07/18-18:16:32 DEBUG CreDocument: set hyphenation trust soft hyphens 0
08/07/18-18:16:32 DEBUG CreDocument: requesting DOM version: 20180528
08/07/18-18:16:32 DEBUG CreDocument: loading document...
08/07/18-18:16:32 DEBUG CreDocument: loading done.
08/07/18-18:16:32 DEBUG Hyphenation: updating for doc language en : French.pattern => English_US.pattern
08/07/18-18:16:32 DEBUG CreDocument: set hyphenation dictionary English_US.pattern
08/07/18-18:16:32 DEBUG CreDocument: set hyphenation left hyphen min 2
08/07/18-18:16:32 DEBUG CreDocument: set hyphenation right hyphen min 2
08/07/18-18:16:32 DEBUG CreDocument: set hyphenation trust soft hyphens 0
08/07/18-18:16:32 DEBUG CreDocument: set visible page count 1
08/07/18-18:16:32 DEBUG CreDocument: rendering document...
checkRenderContext: Style hash doesn't match 5bc7055d!=a9c8cae9
DOCUMENT 1 rendering context is changed - full render required...
08/07/18-18:16:32 DEBUG CreDocument: rendering done.

My big sample book takes 13s to load and 11s to render on the emulator (it gets another "Stylesheet hash doesn't match", which I have yet to fix, so I can't use it to show the difference). There, this recursive node style init accounts for 4-5 seconds of the 13s, and also for 4-5 seconds of the 11s. So, that's potentially a 25% speed-up in loading time in some conditions for us non-English_US readers :)

@Frenzie Frenzie merged commit c120688 into koreader:master Aug 7, 2018

1 check passed

ci/circleci Your tests passed on CircleCI!

@poire-z poire-z deleted the poire-z:optim_hyphenation_vs_cre branch Aug 7, 2018

@robert00s

Contributor

robert00s commented Aug 7, 2018

Thanks! :)

@poire-z

Contributor

poire-z commented Aug 7, 2018

Well, this speedup is noticeable mostly on the first opening of big books. On subsequent openings, if a cache is re-used, the late hyphenation setup wouldn't cause any problem, and the already fast loading won't get any faster.
So, it needs a big book to be noticeable, but as big books make a cache... we won't notice it often :)
But well, I benchmark on first opening, so that suits me.

@robert00s

Contributor

robert00s commented Aug 7, 2018

Every speedup is worthy of praise :)
