New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML dict: enable color rendering and per-dict user fix html func #3585

Merged
merged 1 commit into from Jan 9, 2018

Conversation

Projects
None yet
2 participants
@poire-z
Contributor

poire-z commented Jan 9, 2018

  • Enable Mupdf color rendering in HtmlBoxWidget.
  • Fix possibility of crash in HtmlBoxWidget:free() (I don't think a crash can happen with the current master code, but I have some work in progress that uses more free() to prevent memleaks and I had some crashes there, so let's fix it now).
  • Allow for user's hooks for HTML dict to tweak/fix the bad HTML of some dicts, by allowing users to put a LUA function in some file named as the .ifo with a .lua extension (#3573 (comment))

With a painfully crafted PetitRobert2007_1.1.lua (updated 20180115 ), I managed to turn this:
robert2007bof
(the HTML returned is really really bad, full of errors and inconsistencies)

into:
robert2007html

Too bad I won't be using it, as it turns out this PetitRobert2007 does not return all definitions (ie, you get 2. Critique, the 2nd definition for word critique, that suggest to go see 1. Critique, but you never get it in the results...) edit: just realized the PetitRobert2003 returns the 1. Critique but not any 2. Critique ... so it's either a bug when these 2 dicts were made, or something in sdcv...
edit: ok, it's a sdcv bug already reported in Dushistov/sdcv#30

(The collect/serialize to fix HTML tags balance I used in this file seems OK, but there's many chances that it won't work as is and need individual tweaks - so I don't think we can and should make use of it as an alternative to htmltidy by default for all HTML output that Mupdf woudln't like).

(@Frenzie : you can release this morning's builds without this with no problem.)

HTML dict: enable color rendering and per-dict user fix html func
Also fix possibility of crash in HtmlBoxWidget:free()
@Frenzie

This comment has been minimized.

Member

Frenzie commented Jan 9, 2018

I'm not entirely sure that I can just yet. :-P

On my H2O I found I wasn't able to scroll, even though it's working on the emulator.

@poire-z

This comment has been minimized.

Contributor

poire-z commented Jan 9, 2018

Confirmed: scrolling with swipe does not work in scrollhtmlwidget (I scroll with tap mostly). It does not scroll for me either in the emulator.
Small typo (a natural/visual half-conscious corretion I guess). Easy to fix:

     if Device:isTouchDevice() then
         self.ges_events = {
-            SwipeScrollText = {
+            ScrollText = {
                 GestureRange:new{

Should I add it to this PR ? or you prefer an individual one?

@Frenzie

This comment has been minimized.

Member

Frenzie commented Jan 9, 2018

I suppose a separate quickie is better for history.

@Frenzie Frenzie merged commit 8c897a0 into koreader:master Jan 9, 2018

1 check passed

ci/circleci Your tests passed on CircleCI!
Details
@Frenzie

This comment has been minimized.

Member

Frenzie commented Jan 9, 2018

Another crash that I figure should theoretically be avoided is the error throwing, i.e., sticking it behind an if dbg.is_on then and doing something a little more user-friendly otherwise.

if not ok then
error(self.document)
end

But that's more of a hypothetical thing. Since I'm not completely sure what the best friendly thing to do is, other than not closing the program[1] I won't throw in a quickie for that one and I won't let it hold up anything tomorrow. ;-)

[1] Retry yet again with known good HTML saying something about an error? Close the widget and show some kind of InfoMessage instead? Leave it white with an InfoMessage? Some other option?

@poire-z poire-z deleted the poire-z:htmldict_color_fixfunc branch Jan 9, 2018

@poire-z

This comment has been minimized.

Contributor

poire-z commented Jan 9, 2018

I don't really know.
But I assume that with the html to plain text conversion just done before, there's no real reason for it to fail (except for other things like not enough memory, etc...), so we should be fine.

@poire-z

This comment has been minimized.

Contributor

poire-z commented Jan 29, 2018

Sharing the html fix funcs I made for the 3 HTML french dicts I have (to be put alongside the .ifo and renamed as it with a .lua extension instead of .ifo):
PetitRobert2007_1.1.lua.txt
PetitRobert2003_v4.lua.txt
XMLittre.lua.txt

@Frenzie

This comment has been minimized.

Member

Frenzie commented Jan 29, 2018

We should look into a sensible way of including the XML Littré fix. (What's the problem with something that explicitly has XML in the name? Some custom XML DTD rather than HTML?)

@poire-z

This comment has been minimized.

Contributor

poire-z commented Jan 29, 2018

Yes, some XML which is not fully HTML. For example, in my .lua:

    html = html:gsub('<big>', '<span style="font-weight: bold; font-size: 1.1em">')
    html = html:gsub(' foreground="', ' style="color: ')

(+ some other cosmetics changes for my taste, like replacing colors with smaller italic fonts, etc...)
(But I'm not not sure I didnt explicitely changed sametypesequence=x to sametypesequence=h - so it may by default displays as text.)
(Or may be it was bad HTML, and the conversion to simple HTML is OK enough to have it looks as text like it did before - and I must have by chance see it was indeed some HTML, and tried to make something better out of it.)

@Frenzie

This comment has been minimized.

Member

Frenzie commented Jan 29, 2018

Hm, how old is that dictionary? It's being continuously corrected, although it looks like it remains an XML version of ye olde Littré, see here.

Is there a script somewhere that automates the conversion from the XML Littré source to stardict? There's https://bitbucket.org/Mytskine/xmlittre-web but I'm not seeing any XSLT in that repo.

https://www.littre.org/faq

Anyway, just something that one could consider as a lunch or breakfast project. :-P

@poire-z

This comment has been minimized.

Contributor

poire-z commented Jan 29, 2018

Oh, didn't know that. Files in mine are dated from 2006 :)

@Frenzie

This comment has been minimized.

Member

Frenzie commented Feb 11, 2018

Those curious could investigate whether https://github.com/leafo/web_sanitize could be of any use. Although its purpose is to sanitize (i.e., from scripts and styles), it seems to handle unclosed inner tags. It's a fair bit smaller than libtidy. :-P

It depends on LPeg, but we already include that for lua-Spore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment