validation errors in content.opf #9493

Closed
gmccullo opened this issue Feb 22, 2024 · 10 comments

@gmccullo

Just upgraded Pandoc and I'm getting a new validation error. This does not happen with Pandoc 2.19.2.

pandoc version:

pandoc.exe 3.1.12.1
Features: +server +lua
Scripting engine: Lua 5.4

Converting markdown to epub.

markdown:

# Header 1

Foobar.

Command line:

    C:\scratch\pandoc bug> pandoc test.md -o test.epub --epub-cover-image "cover.jpg" --metadata title="test book"

When I then run epubcheck on the result:

C:\scratch\pandoc bug> java -jar C:\Users\gary\OneDrive\creative\Stories\Cygnus\compile\epubcheck\epubcheck.jar ./test.epub
Validating using EPUB version 3.3 rules.
ERROR(RSC-005): ./test.epub/EPUB/content.opf(8,50): Error while parsing file: attribute "content" not allowed here; expected attribute "dir", "id", "refines", "scheme" or "xml:lang"
ERROR(RSC-005): ./test.epub/EPUB/content.opf(8,50): Error while parsing file: character content of element "meta" invalid; must be a string with length at least 1 (actual length was 0)
ERROR(OPF-027): ./test.epub/EPUB/content.opf(8,50): Undefined property: "cover".

Check finished with errors
Messages: 0 fatals / 3 errors / 0 warnings / 0 infos

Here's what content.opf looks like:

<?xml version="1.0" encoding="UTF-8"?>
<package version="3.0" xmlns="http://www.idpf.org/2007/opf" xml:lang="en-US" unique-identifier="epub-id-1" prefix="ibooks: http://vocabulary.itunes.apple.com/rdf/ibooks/vocabulary-extensions-1.0/">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier id="epub-id-1">urn:uuid:4020d3b9-d202-4f0b-9cf8-26c79d6bcec5</dc:identifier>
    <dc:title id="epub-title-1">test book</dc:title>
    <dc:date id="epub-date">2024-02-22T14:07:46Z</dc:date>
    <dc:language>en-US</dc:language>
    <meta property="cover" content="cover_jpg" />
    <meta property="dcterms:modified">2024-02-22T14:07:46Z</meta>
    <meta property="schema:accessMode">textual</meta>
    <meta property="schema:accessModeSufficient">textual</meta>
    <meta property="schema:accessibilityFeature">alternativeText</meta>
    <meta property="schema:accessibilityFeature">readingOrder</meta>
    <meta property="schema:accessibilityFeature">structuralNavigation</meta>
    <meta property="schema:accessibilityFeature">tableOfContents</meta>
    <meta property="schema:accessibilityHazard">none</meta>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml" />
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav" />
    <item id="stylesheet1" href="styles/stylesheet1.css" media-type="text/css" />
    <item id="cover_xhtml" href="text/cover.xhtml" media-type="application/xhtml+xml" properties="svg" />
    <item id="title_page_xhtml" href="text/title_page.xhtml" media-type="application/xhtml+xml" />
    <item id="ch001_xhtml" href="text/ch001.xhtml" media-type="application/xhtml+xml" />
    <item properties="cover-image" id="cover_jpg" href="media/cover.jpg" media-type="image/jpeg" />
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover_xhtml" />
    <itemref idref="title_page_xhtml" linear="yes" />
    <itemref idref="ch001_xhtml" />
  </spine>
  <guide>
    <reference type="toc" title="test book" href="nav.xhtml" />
    <reference type="cover" title="Cover" href="text/cover.xhtml" />
  </guide>
</package>
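A rough way to see why epubcheck rejects line 8 (the `<meta property="cover" content="cover_jpg" />` element) is to apply the three reported rules directly. This is an illustrative sketch, not epubcheck's actual logic: it uses Python's standard `xml.etree.ElementTree`, `check_property_metas` and `DEFAULT_META_PROPERTIES` are hypothetical names, and the property list is my reading of the EPUB 3 meta properties vocabulary, so check it against the spec version you target.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Unprefixed meta properties from the EPUB 3 meta properties vocabulary,
# as I read the spec (an assumption; verify against the version you target).
DEFAULT_META_PROPERTIES = {
    "alternate-script", "authority", "belongs-to-collection", "collection-type",
    "display-seq", "file-as", "group-position", "identifier-type", "role",
    "source-of", "term", "title-type",
}


def check_property_metas(opf_text: str) -> list[str]:
    """Report property-style <meta> problems similar to the three epubcheck
    errors above. A rough sketch, not a replacement for epubcheck."""
    root = ET.fromstring(opf_text)
    problems = []
    for meta in root.iter(f"{{{OPF_NS}}}meta"):
        prop = meta.get("property")
        if prop is None:
            continue  # OPF 2-style <meta name=... content=...> is not checked here
        if "content" in meta.attrib:
            problems.append(f'{prop}: "content" attribute not allowed')
        if not (meta.text or "").strip():
            problems.append(f"{prop}: empty element content")
        # Prefixed properties (e.g. dcterms:modified) are assumed to be fine.
        if ":" not in prop and prop not in DEFAULT_META_PROPERTIES:
            problems.append(f"{prop}: undefined property")
    return problems


sample = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <meta property="cover" content="cover_jpg" />
    <meta property="dcterms:modified">2024-02-22T14:07:46Z</meta>
  </metadata>
</package>"""

for problem in check_property_metas(sample):
    print(problem)
```

All three problems come from the single `cover` element; the `dcterms:modified` meta passes because it has text content and a prefixed property.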

pandoc bug.zip

gmccullo added the bug label Feb 22, 2024
@gmccullo
Author

This doesn't affect the problem, but note that I forgot to specify the output format as epub3 in my demo.

Anyway, playing around with it some more, I see that removing the offending line

<meta property="cover" content="cover_jpg" />

from content.opf resolves the validation issue.
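To apply that workaround without unzipping and rezipping by hand, here is a sketch using Python's standard `zipfile` and `re`. Assumptions: the package document is at `EPUB/content.opf` (as in the error messages above), and the offending element sits on its own line in exactly the shape shown. `drop_cover_meta` is a hypothetical helper. Keep in mind that, as reported later in this thread, removing the element cleared validation but Google Play then did not show the cover thumbnail.

```python
import re
import zipfile

# A whole line holding a self-closing <meta property="cover" .../>, the shape
# shown in the content.opf above. Deliberately narrow: it is not an XML parser.
COVER_META_LINE = re.compile(
    r'^[ \t]*<meta property="cover"[^>]*/>[ \t]*\r?\n', re.MULTILINE
)


def drop_cover_meta(src: str, dst: str, opf_path: str = "EPUB/content.opf") -> None:
    """Copy the EPUB at src to dst, deleting the property="cover" meta line
    from the package document at opf_path."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():  # original order, so "mimetype" stays first
            data = zin.read(info.filename)
            if info.filename == opf_path:
                text = data.decode("utf-8")
                data = COVER_META_LINE.sub("", text).encode("utf-8")
            # Fresh ZipInfo keeping name, timestamp and compression method,
            # so "mimetype" remains stored (uncompressed).
            out_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            out_info.compress_type = info.compress_type
            zout.writestr(out_info, data)


# Usage: drop_cover_meta("test.epub", "test-fixed.epub")
```

Copying entries with their original compression method matters because the EPUB container format expects `mimetype` to be the first entry and stored uncompressed.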

If I'm reading it right, the EPUB 3 specification says the correct way to specify the cover is to give a manifest item the "cover-image" property, which pandoc is already doing:

<item properties="cover-image" id="cover_jpg" href="media/cover.jpg" media-type="image/jpeg" />
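Finding the cover the EPUB 3 way means looking for the manifest item whose `properties` attribute contains the `cover-image` token. A minimal sketch with Python's `xml.etree.ElementTree`; `epub3_cover_href` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from typing import Optional

OPF = "{http://www.idpf.org/2007/opf}"


def epub3_cover_href(opf_text: str) -> Optional[str]:
    """Return the href of the manifest item marked with the EPUB 3
    "cover-image" property, or None if no item carries it."""
    root = ET.fromstring(opf_text)
    for item in root.iter(f"{OPF}item"):
        # "properties" is a space-separated token list, so test membership
        # rather than comparing the whole attribute value.
        if "cover-image" in item.get("properties", "").split():
            return item.get("href")
    return None


manifest = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav" />
    <item properties="cover-image" id="cover_jpg" href="media/cover.jpg" media-type="image/jpeg" />
  </manifest>
</package>"""

print(epub3_cover_href(manifest))  # media/cover.jpg
```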

jgm closed this as completed in 0a5ecec Feb 23, 2024
jgm added a commit that referenced this issue Feb 23, 2024
Use `--epub-cover-image` to catch issues that only arise with that.
See #9493.
@jgm
Owner

jgm commented Feb 23, 2024

thanks for reporting!

@gmccullo
Author

I just noticed something else about this. With the validation error in place, Google Play would not accept the EPUB upload. If I delete that "cover" meta element from content.opf, validation is clean and Play imports the book, but the thumbnail doesn't show the cover. This suggests that something isn't quite right about how the cover art is configured in Pandoc 3.x-generated EPUBs compared with the Pandoc 2.x version. (Or deleting that element has some unexpected side effect. Or Google Play has an issue.)

@jgm
Owner

jgm commented Feb 26, 2024

I don't know. We're doing everything mentioned in the resource you linked. But if you find something we should be doing that we aren't, let us know.

@gmccullo
Author

gmccullo commented Feb 26, 2024

I think the problem is that Google Play wants this line in the manifest:

    <item id="cover" href="media/cover.jpg" media-type="image/jpeg"/>

instead of this:

    <item properties="cover-image" id="cover_jpg" href="media/cover.jpg" media-type="image/jpeg"/>

When I make this change it validates and the thumbnail works.

@gmccullo
Author

OK, so I've been looking at this in more detail and here's what I see. I'm attaching a zip with two versions of content.opf: the original version that Pandoc creates ("fails.xml"), which doesn't work, and my corrected version ("works.xml"), which validates and also has a working cover image in Google Play Books.

content.opf.zip

@jgm
Owner

jgm commented Feb 26, 2024

Is the only change the one you note above (cover-image -> cover)?
That's strange, because the EPUBv3 documentation clearly says to use "cover-image":
https://www.w3.org/TR/epub-33/#example-item-properties-cover-image
I'd hate to do something that goes against the spec.

@gmccullo
Author

I think there are two changes, the two that have come up in this discussion, but if you diff the XML files in the zip you'll see for sure.
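One way to do that diff, sketched with Python's standard `difflib` (`git diff --no-index fails.xml works.xml` does the same from a shell). `opf_diff` is a hypothetical helper; the file names are the ones from the attached content.opf.zip.

```python
import difflib
from pathlib import Path


def opf_diff(old_path: str, new_path: str) -> str:
    """Return a unified diff between two package documents."""
    old = Path(old_path).read_text(encoding="utf-8").splitlines()
    new = Path(new_path).read_text(encoding="utf-8").splitlines()
    # lineterm="" because splitlines() already dropped the line endings.
    diff = difflib.unified_diff(old, new, fromfile=old_path, tofile=new_path, lineterm="")
    return "\n".join(diff)


# Usage: print(opf_diff("fails.xml", "works.xml"))
```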

I find that epub3 spec pretty hard to understand.

And I suspect Google shouldn't need the cover in the manifest (it looks like a holdover from EPUB 2), but it seems it does.

@jgm
Owner

jgm commented Feb 27, 2024

> I find that epub3 spec pretty hard to understand.

On this issue it seems pretty clear. I can't find anywhere that says to use "cover" instead of "cover-image" in this context. I'm not going to make pandoc do something that goes against the explicit language of the spec -- it seems to me that this is a Google Play issue. iBooks, for example, has no trouble showing the cover image for the epubs we produce.

@gmccullo
Author

gmccullo commented Feb 27, 2024

A couple of points:

I don't think it violates the spec: the EPUB still validates. I think it's a backwards-compatibility consideration.

Also bear in mind that Pandoc 2.x does not have this problem: its EPUB 3 files both validate and the covers work.

You might consider having a switch for this.
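One shape such a switch could take, sketched under assumptions. As I understand the EPUB 3 spec, an OPF 2-style `<meta name="cover" content="..."/>` (with `name`, not `property`) is still permitted as a legacy element, so a writer could emit it alongside the EPUB 3 `cover-image` item rather than replacing it. `cover_markup` and the `epub2_compat` flag are hypothetical, not pandoc options, and this thread does not establish whether Google Play honors the legacy meta.

```python
from xml.sax.saxutils import quoteattr


def cover_markup(cover_id: str, href: str, media_type: str,
                 epub2_compat: bool = True) -> tuple[list[str], list[str]]:
    """Return (metadata_lines, manifest_lines) that declare a cover image.

    The manifest item always carries the EPUB 3 "cover-image" property.
    When the hypothetical epub2_compat switch is on, a legacy OPF 2-style
    meta (name/content attributes, no property attribute) points at the
    same manifest id.
    """
    metadata = []
    if epub2_compat:
        metadata.append(f'<meta name="cover" content={quoteattr(cover_id)} />')
    manifest = [
        f'<item properties="cover-image" id={quoteattr(cover_id)} '
        f'href={quoteattr(href)} media-type={quoteattr(media_type)} />'
    ]
    return metadata, manifest


meta_lines, item_lines = cover_markup("cover_jpg", "media/cover.jpg", "image/jpeg")
print(meta_lines[0])  # <meta name="cover" content="cover_jpg" />
```

Keeping the `cover-image` item unconditional means the spec-mandated EPUB 3 declaration never depends on the compatibility flag.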

Finally, this is worth researching more thoroughly than I have. I was just poking around to find a workaround for the validation issue and the cover image issue.
