JSON Schema - metadata.source? #14

Open
danielweck opened this issue Dec 27, 2018 · 16 comments

@danielweck
Member

metadata.source

Currently parsed from EPUB (OPF XML):

metadata.source = OPF /package/metadata/dc:source/text()

@danielweck
Member Author

@danielweck danielweck changed the title metadata.source? JSON Schema - metadata.source? Dec 27, 2018
@danielweck
Member Author

Related issue: #8

@HadrienGardeur
Collaborator

I have mixed feelings about this one. While we declare it in the JSON-LD context document, we don't really talk about it elsewhere.

It's currently mapped to http://schema.org/isBasedOn but given the rather "flexible" nature of dc:source (and DublinCore in general), it's very likely that we won't get a URL out of it.
I would rather remove it and provide support as part of our extensibility model by referencing the DublinCore element directly instead.

@HadrienGardeur
Collaborator

My advice would be to map this to schema.org and use the inherent extensibility of the model in various implementations.

This could be covered in the parsing doc in the future cc @JayPanoz

@danielweck
Member Author

Same with dc:rights

metadata.rights = OPF /package/metadata/dc:rights/text()?

@HadrienGardeur
Collaborator

I see two potential ways of dealing with this:

  • as I said above, this could be mapped to a schema.org term, but in this case we need to document this in the parsing doc
  • the alternative is to simply extend dc:source into http://purl.org/dc/elements/1.1/source and store this through the extensibility of our model

I would only follow the first approach for metadata that we consider important enough, everything else should simply use our extensibility.

@mickael-menu-mantano

mickael-menu-mantano commented May 31, 2019

I'm trying to add the remaining <meta> properties into otherMetadata. But I don't know what to do with those namespace prefixes (eg. dc in dc:source). Should we remove it from the key? Or expand it like Hadrien mentioned (http://purl.org/dc/elements/1.1/source)?

I didn't find any way to get the namespace URL for a given prefix in our Swift XML lib (or access the xmlns:* attributes for that matter), so I'm not sure I can expand them for unknown prefixes or custom ones (eg. xmlns:dc-alias="http://purl.org/dc/elements/1.1/"). It might be something that can be fixed in the dependency if the feature is really needed, because it's built on top of libxml2.

Edit: This is wrong, see #14 (comment)

@HadrienGardeur
Collaborator

Expanding them to a full URL is probably the right way to handle this for now.

@mickael-menu-mantano

Okay, I think I found something in libxml2 that I could expose to Swift (node->nsDef) to get the xmlns:x attributes.

@mickael-menu-mantano

mickael-menu-mantano commented Jun 4, 2019

I have three related questions:

  1. How do we handle multiple values (eg. several <meta> with same property or name)? The entries in the otherMetadata could be either String or [String] if multiple values are found.

  2. What do we do with refines properties?

  3. What about those kind of tags: <dc:rights>Public Domain</dc:rights>? Do we look for any tag with dc namespace in metadata and use the tag as the key?
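For question 1, the String or [String] option could be sketched as below. This is a minimal illustration only: `OtherMetadataValue` and `collect` are hypothetical names, not part of r2-streamer-swift, and the input is assumed to be a flat list of already-expanded keys paired with their text values.

```swift
// Hypothetical value type: a single string, or a list when a key repeats.
enum OtherMetadataValue: Equatable {
    case string(String)
    case strings([String])
}

// Groups values by key, preserving document order within each key.
func collect(_ entries: [(key: String, value: String)]) -> [String: OtherMetadataValue] {
    var grouped: [String: [String]] = [:]
    for entry in entries {
        grouped[entry.key, default: []].append(entry.value)
    }
    // A key seen once stays a plain string; a repeated key becomes an array.
    return grouped.mapValues { (values: [String]) -> OtherMetadataValue in
        values.count == 1 ? .string(values[0]) : .strings(values)
    }
}
```

The cost of this shape is that consumers have to handle both cases for every key, since the type depends on how many times the author repeated the property.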

@mickael-menu-mantano

mickael-menu-mantano commented Jun 4, 2019

I just found out that the prefixes used in meta[@property] are not XML namespace prefixes, but actually declared in package[@prefix] with a list of default prefixes (http://www.idpf.org/epub/301/spec/epub-publications.html#sec-metadata-reserved-vocabs). Great news, I won't have to fiddle with an external dependency.

Reading Systems must resolve all reserved prefixes used in Package Documents using their pre-defined URIs. Reserved prefixes should not be overridden in the prefix attribute, but Reading Systems must use such local overrides when encountered.
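A rough sketch of that resolution, to make the rule concrete. The function names are hypothetical, and `reservedPrefixes` only lists a few of the reserved prefixes for illustration (the spec has the full table). Returning `nil` for an unknown or missing prefix is an assumption of this sketch: it leaves the fallback to the caller.

```swift
// Illustrative subset of the reserved prefixes; not the complete list.
let reservedPrefixes: [String: String] = [
    "dcterms": "http://purl.org/dc/terms/",
    "rendition": "http://www.idpf.org/vocab/rendition/#",
    "a11y": "http://www.idpf.org/epub/vocab/package/a11y/#"
]

// Parses the value of package[@prefix], e.g. "foo: http://a/# bar: http://b/#",
// into a prefix-to-URI map. Tokens are separated by any whitespace.
func parsePrefixes(_ attribute: String) -> [String: String] {
    let tokens = attribute.split(whereSeparator: { $0.isWhitespace }).map(String.init)
    var result: [String: String] = [:]
    var index = 0
    while index + 1 < tokens.count {
        let name = tokens[index]
        if name.hasSuffix(":") {
            result[String(name.dropLast())] = tokens[index + 1]
            index += 2
        } else {
            index += 1  // Skip a malformed token instead of failing.
        }
    }
    return result
}

// Expands "prefix:name" to a full URL. Local declarations override reserved
// prefixes. Returns nil when there is no prefix or the prefix is unknown.
func resolve(_ property: String, declared: [String: String]) -> String? {
    let parts = property.split(separator: ":", maxSplits: 1).map(String.init)
    guard parts.count == 2 else { return nil }
    let vocabularies = reservedPrefixes.merging(declared) { _, local in local }
    guard let base = vocabularies[parts[0]] else { return nil }
    return base + parts[1]
}
```

With this, `rend-alias:orientation` and `rendition:orientation` expand to the same URL when `rend-alias` is declared for the rendition vocabulary, so lookups can be done on the expanded form rather than on the author's prefix.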

@JayPanoz
Contributor

JayPanoz commented Jun 4, 2019

Ah yeah, just to reiterate that you should feel free to complete the parsing doc.

For starters, it’s not complete by any means, so having at least a documented reference will help discuss it and fine-tune handling.

Then there is metadata I’m honestly not familiar with, and don’t know how to handle – I legit don’t know what authors expect for some metadata for instance, but having a written reference makes it (more) easily sharable/reviewable by others.

Finally I’m confident others are more knowledgeable than me when it comes to some metadata I’ve never used as an author.

@mickael-menu-mantano

mickael-menu-mantano commented Jun 4, 2019

For reference, I came up with this solution in Swift:
readium/r2-streamer-swift@037de4d?diff=unified#diff-b8124a64cd2aa700aa444e5f9a7d7232

This generates the otherMetadata JSON but is also used to access metadata from their name and associated vocabulary. This is safer than directly querying for properties like rendition:layout in case the author uses a different prefix for the rendition vocabulary.

(@HadrienGardeur My three questions are still relevant though: #14 (comment))

To author the additional metadata, I'm using this list of "known" properties to ignore, maybe it should be something documented and shared between platforms (or better, if we share a unit test file that covers those cases):

// List of properties that should not be added to `otherMetadata` because they
// are already consumed by the RWPM model.
private let rwpmProperties: [OPFVocabulary: [String]] = [
    .defaultMetadata: ["cover"],
    .dc: ["contributor", "creator", "publisher"],
    .dcterms: ["contributor", "creator", "modified", "publisher"],
    .media: ["duration"],
    .rendition: ["flow", "layout", "orientation", "spread"]
]
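A possible way to apply such a list when filtering, as a self-contained sketch. `Vocabulary` here is a stand-in for the real `OPFVocabulary` type, whose definition is not shown in this thread, and `belongsInOtherMetadata` is a hypothetical helper name.

```swift
// Stand-in for OPFVocabulary; `other` covers custom vocabularies by base URL.
enum Vocabulary: Hashable {
    case defaultMetadata, dc, dcterms, media, rendition
    case other(String)
}

// Same content as the list above, using sets for constant-time lookup.
let consumedByRWPM: [Vocabulary: Set<String>] = [
    .defaultMetadata: ["cover"],
    .dc: ["contributor", "creator", "publisher"],
    .dcterms: ["contributor", "creator", "modified", "publisher"],
    .media: ["duration"],
    .rendition: ["flow", "layout", "orientation", "spread"]
]

// True when the property is not already consumed by the RWPM model
// and should therefore be copied to otherMetadata.
func belongsInOtherMetadata(_ property: String, in vocabulary: Vocabulary) -> Bool {
    guard let consumed = consumedByRWPM[vocabulary] else { return true }
    return !consumed.contains(property)
}
```

Keying on the vocabulary rather than on the prefix string means the check still works when the author declares an alias for a known vocabulary.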

Finally, here's an example of JSON produced for publication.metadata:

"metadata": {
    "http://www.idpf.org/epub/vocab/package/a11y/#certifiedBy": "EDRLab",
    "http://purl.org/dc/elements/1.1/source": ["Feedbooks", "Web", "Internet"],
    "http://purl.org/dc/elements/1.1/rights": "Public Domain",
    "http://idpf.org/epub/vocab/package/#type": "article",
    "http://my.url/#customProperty": "Custom property",
    "rendition": {
        "spread": "both",
        "overflow": "scrolled",
        "orientation": "landscape",
        "layout": "fixed"
    }
}

from

<package prefix="
  rend-alias: http://www.idpf.org/vocab/rendition/#
  myPrefix: http://my.url/#">
  <metadata>
    <dc:source>Feedbooks</dc:source> 
    <meta property="dc:source">Web</meta> 
    <meta name="dc:source" content="Internet"/> 
    <dc:rights>Public Domain</dc:rights> 
    <meta property="rendition:layout">pre-paginated</meta>
    <meta property="rend-alias:orientation">landscape</meta>
    <meta property="rendition:flow">scrolled-doc</meta>
    <meta property="rendition:spread">both</meta>
    <meta property="og:type">article</meta>
    <meta property="a11y:certifiedBy">EDRLab</meta>
    <meta property="myPrefix:customProperty">Custom property</meta>
  </metadata>
</package>

@danielweck
Member Author

EPUB "reserved" prefixes: in practice they are probably never overridden (why would content creators want to take that risk) ... but they can! (even though they are "reserved")

Reserved prefixes SHOULD NOT be overridden in the prefix attribute, but Reading Systems MUST use such local overrides when encountered.
As changes to the reserved prefixes and updates to Reading Systems are not always going happen in synchrony, Reading Systems MUST NOT fail when encountering unrecognized prefixes (i.e., not reserved and not declared using the prefix attribute).

https://w3c.github.io/publ-epub-revision/epub32/spec/epub-packages.html#sec-metadata-reserved-prefixes

...so strictly-speaking, Mickael's approach makes sense :)

@HadrienGardeur
Collaborator

How do we handle multiple values (eg. several <meta> with same property or name)? The entries in the otherMetadata could be either String or [String] if multiple values are found.

I think they could be either:

  • String
  • [String]
  • Object
  • or [Object]

What do we do with refines properties?

This is the use case where we could use an object:

"http://my.url/#customProperty": {
  "@value": "Main value",
  "http://my.url/#customPropertyUsedInRefine": "Refine value"
}
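Building that object shape could look like the sketch below. The `metadataValue` helper is hypothetical, and it assumes refinements have already been expanded to full URLs and grouped under the property they refine.

```swift
// Returns a plain string when there is nothing refining the property,
// otherwise an object holding the main value under "@value" plus one
// entry per refinement.
func metadataValue(main: String, refinements: [String: String]) -> Any {
    guard !refinements.isEmpty else { return main }
    var object: [String: Any] = ["@value": main]
    for (key, value) in refinements {
        object[key] = value
    }
    return object
}
```

The result is either a `String` or a `[String: Any]`, so it can be placed directly in a dictionary that is later serialized to JSON.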

What about those kind of tags: <dc:rights>Public Domain</dc:rights>? Do we look for any tag with dc namespace in metadata and use the tag as the key?

You've covered this properly in your examples IMO.

@danielweck
Member Author

Related issue: #66
