Fix writeNodeEx(), add getBalancedHTML() #569

Merged (2 commits) on Jun 16, 2024

poire-z (Contributor) commented Jun 15, 2024

writeNodeEx(): fix handling of multiline attribute values

Multiline attribute values are now written on a single line in the "extra stream" output, which gives the position of nodes in the main stream and is used for handling long-press in the HTML Viewer.
See koreader/koreader#12004 (comment).
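
For illustration only, here is a minimal Lua sketch of the general idea of flattening an attribute value onto one line. It is not the crengine change itself (that lives in the C++ writeNodeEx()); the helper name flattenAttrValue and the choice of replacing line breaks with a single space are assumptions made for this example.

```lua
-- Hypothetical helper, not part of crengine or KOReader.
-- Collapses line breaks inside an attribute value so that the
-- serialized attribute occupies a single line.
local function flattenAttrValue(value)
    -- Replace each run of CR/LF characters with one space (assumed policy).
    -- gsub returns two values; keep only the resulting string.
    local flat = value:gsub("[\r\n]+", " ")
    return flat
end

-- Example: a value spanning two lines.
local multiline = "width: 10px;\nheight: 20px;"
local single_line = flattenAttrValue(multiline)
```

With line-oriented output such as the "extra stream", keeping each value on one line means a line break inside an attribute cannot be mistaken for a record boundary.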

Add getBalancedHTML() helper

This helper function uses our (nearly) conforming HTML parser (#370), which handles unbalanced HTML and builds a proper DOM. It returns the serialized DOM, which is therefore balanced.
It is currently used by neither crengine nor the frontend, but it is available for use in HTML dict funcs, where giving balanced HTML to MuPDF can give better rendering.

I added it to see whether it helped with koreader/koreader-base#1586 (comment), and used it like this:

return function(html)
    html = "<html><body>"..html.."</body></html>"
    html = cre.getBalancedHTML(html, 0x50)
    return html
end

but it didn't really help: the HTML was bad, and the "balanced" results, although they looked more proper, didn't render any better.
Anyway, let's keep this small helper available; it may help with experimenting and testing.



poire-z merged commit e2c62ef into koreader:master on Jun 16, 2024
1 check passed
poire-z deleted the misc_202406 branch on June 16, 2024 07:34

Frenzie (Member) commented Jun 16, 2024

> but it didn't really help: the HTML was bad, and the "balanced" results, although looking more proper, didn't really give anything better.

MuPDF also claims to have an HTML 5 parser since 1.18 btw, so it should do something very similar itself.

Frenzie pushed a commit to koreader/koreader that referenced this pull request Jun 16, 2024
Includes:
- Russian hyphenation: revert "allow hyphens after не" koreader/crengine#568
- Serbian hyphenation: combine patterns for Cyrillic and Latin scripts koreader/crengine#566
- writeNodeEx(): fix handling of multilines attribute values koreader/crengine#569
  See #12004 (comment).
- Add getBalancedHTML() helper

Also includes:
- kobo: add missing blitbuffer library koreader/koreader-base#1823