Entity 'nbsp' not defined #578

Closed
jazz-it opened this issue Aug 30, 2020 · 3 comments
Labels
bug Something isn't working

Comments

jazz-it commented Aug 30, 2020

Describe the bug
When I open any azw3 file, Foliate shows the error message "Entity 'nbsp' not defined" and displays nothing beyond a crippled first page of the document.

To Reproduce
Steps to reproduce the behavior:

  1. Open azw3 file
  2. See the error message.

Expected behavior
There should be no error message. I guess it's due to UTF-8 formatting issues?

Screenshots

Version:

  • Foliate version: 2.4.2
  • OS/Distribution and version: 5.4.60-2-MANJARO
  • Desktop environment: KDE Plasma 5.19.4
  • Installation method: pamac (Official Repository)

Additional context
Every azw3 file I tried had this issue, so I assume it may affect all azw3 files.

jazz-it added the bug label Aug 30, 2020

frnco commented Sep 5, 2020

This also affects ePubs. Foliate also reports a ton of tag-mismatch errors (for <br>, <br >, <img >, etc.), and in both cases it shows no content after that point, so I suspect the problem is in how the HTML specification is implemented.
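
Both symptoms can be reproduced with any strict XML parser, independent of Foliate. The following is a minimal sketch using Python's standard-library xml.etree.ElementTree. It only illustrates the class of error and is not the code path Foliate or webkit2gtk actually uses; the helper name xml_error is made up for this example.

```python
import xml.etree.ElementTree as ET


def xml_error(markup):
    """Return the parser's error message, or None if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# &nbsp; is an HTML entity; XML itself predefines only &amp; &lt; &gt; &quot; &apos;
nbsp_error = xml_error("<p>a&nbsp;b</p>")

# An unclosed <br> is valid HTML5 but not well-formed XML
br_error = xml_error("<p>a<br>b</p>")

# A numeric character reference and a self-closing tag parse cleanly
ok = xml_error("<p>a&#160;b<br/>c</p>")

print(nbsp_error)
print(br_error)
print(ok)
```

If the book's pages are handed to the renderer as XML rather than HTML, this is the kind of failure one would expect, which would fit both the 'nbsp' message and the tag-mismatch reports.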

After checking the runtime dependencies and searching a bit, it seems that, unless Foliate modifies, encodes or converts the HTML string, there's a pretty good chance that webkit2gtk is the one causing the problem. One bug over there that caught my attention, and that came up a lot in my search, is Content-Type being unset, incomplete (i.e. lacking a charset) or wrong. Apparently it breaks on unclosed tags (so HTML5 doesn't work) and on either special characters (& or ;) or HTML entities.

If there's any code dealing with some of those, especially the <!DOCTYPE>, <html> and <head> tags, plus HTML entities or the characters used for them (& and ;), it's probably worth looking into. After that, I'd check how webkit2gtk and perhaps GJS deal with HTML5 and HTML entities.
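
As a rough idea of what such entity handling could look like, here is a minimal sketch that rewrites named HTML entities as numeric character references, which an XML parser accepts without a DTD. It is an assumption-laden illustration, not something Foliate is known to do: the function name numeric_entities is hypothetical, and the regex is naive (it would also rewrite matches inside CDATA sections or scripts).

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser understands without a DTD
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def numeric_entities(markup):
    """Rewrite named HTML entities (e.g. &nbsp;) as numeric references (e.g. &#160;)."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML built-ins and unknown names untouched
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)


fixed = numeric_entities("<p>a&nbsp;b &amp; c&eacute;</p>")
print(fixed)  # <p>a&#160;b &amp; c&#233;</p>
```

Numeric references are used instead of literal characters so the result does not depend on the charset the page is served with.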

I may be able to look a bit more into it, so I'd appreciate any tips on which files have code that may relate to the problem and the easiest way to setup a dev environment to debug.

@johnfactotum
Owner

Can you open the file with upstream Epub.js? See the wiki for instructions.

To test .azw3 files with Epub.js, convert them to EPUB first with KindleUnpack (which is what Foliate uses internally).

@johnfactotum
Owner

Closing in favor of #699, as it's really the same issue.

johnfactotum closed this as not planned Sep 30, 2022