Syntax for the Media Overlay Document #22

Closed
HadrienGardeur opened this issue Dec 15, 2016 · 14 comments

Comments

@HadrienGardeur

In addition to providing the content of an OPF in a different serialization, the streamer will also be responsible for parsing SMIL files and providing them in a simplified JSON document.

What are the key principles and information that need to be preserved in SMIL? What can we simplify and streamline?

cc @danielweck

@HadrienGardeur
Author

HadrienGardeur commented Dec 19, 2016

Looking at the EPUB 3 specification, here's a list of things for which I'm not entirely sure of their usefulness:

  • id for seq, par, text and audio
  • epub:textref for seq and par
  • do we need specific attributes for begin & end for the audio segment? Can't we simply use media fragment instead?
  • do we need sub-second precision?
  • are epub:type attributes really added to SMIL documents rather than to the HTML resources?

Update
During our call on Dec 21st, @danielweck provided answers to these questions:

  • no need for id
  • epub:textref is mostly useful because it's associated with an epub:type
  • media fragments could be a good alternative
  • sub-second precision could be implemented through media fragments' support for NPT (Normal Play Time); whether browsers implement it, for example in <audio> and <video>, needs to be tested
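
To illustrate the media fragment idea, here is a rough sketch of how a begin/end pair with sub-second precision could be folded into a single audio reference. AudioRef and seconds are hypothetical helper names, not part of any proposal in this thread, and the sketch assumes the plain NPT seconds form of the temporal fragment (#t=begin,end).

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// AudioRef appends a Media Fragments temporal fragment (#t=begin,end)
// to an audio resource. Hypothetical helper, for illustration only.
func AudioRef(src string, begin, end time.Duration) string {
	return src + "#t=" + seconds(begin) + "," + seconds(end)
}

// seconds renders a duration as NPT seconds, keeping fractional digits
// only when the value actually has them.
func seconds(d time.Duration) string {
	return strconv.FormatFloat(d.Seconds(), 'f', -1, 64)
}

func main() {
	fmt.Println(AudioRef("chapter1.mp3", 0, 20*time.Second))
	// chapter1.mp3#t=0,20
	fmt.Println(AudioRef("chapter1.mp3", 20*time.Second, 23500*time.Millisecond))
	// chapter1.mp3#t=20,23.5
}
```

This only shows that the syntax can carry fractional seconds; it says nothing about whether a given browser honours them during playback.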

@HadrienGardeur
Author

HadrienGardeur commented Dec 20, 2016

The more I think about it, the less the difference between seq and par truly makes sense. I think it could be boiled down to a single object that can have:

  • a text reference
  • an audio reference
  • children
  • roles

Here are a few examples.

  1. Simple sequence
{
  "text": "chapter1.html#start",
  "role": ["seq", "chapter"],
  "children":[]
}
  2. Simple parallel
{
  "role": ["par"],
  "text": "chapter1.html#sentence1", 
  "audio": "chapter1.mp3#t=0,20"
}
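
For what it's worth, reading the audio value of the parallel example back into a file name and two offsets is straightforward. The sketch below is only an assumption about how a consumer might do it: ParseAudioRef is a hypothetical name, and it handles nothing but the plain "#t=begin,end" form in seconds (no npt: prefix, no hh:mm:ss notation, no open-ended ranges).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseAudioRef splits "file#t=begin,end" into the resource and its
// begin/end offsets in seconds. Hypothetical helper, illustration only.
func ParseAudioRef(ref string) (src string, begin, end float64, err error) {
	src, frag, found := strings.Cut(ref, "#t=")
	if !found {
		return "", 0, 0, fmt.Errorf("no temporal fragment in %q", ref)
	}
	b, e, found := strings.Cut(frag, ",")
	if !found {
		return "", 0, 0, fmt.Errorf("expected begin,end in %q", ref)
	}
	if begin, err = strconv.ParseFloat(b, 64); err != nil {
		return "", 0, 0, err
	}
	if end, err = strconv.ParseFloat(e, 64); err != nil {
		return "", 0, 0, err
	}
	return src, begin, end, nil
}

func main() {
	src, begin, end, err := ParseAudioRef("chapter1.mp3#t=0,20")
	if err != nil {
		panic(err)
	}
	fmt.Println(src, begin, end) // chapter1.mp3 0 20
}
```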

Are you allowed to mix seq and par at the top level of a SMIL document? If so, we can create a single model for a media overlay node. In Go, this would look like this:

type MediaOverlayNode struct {
	Text       string                  `json:"text,omitempty"`
	Audio      string                  `json:"audio,omitempty"`
	Role       []string                `json:"role,omitempty"`
	Children   []MediaOverlayNode      `json:"children,omitempty"`
}
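
As a quick sanity check of the struct above, here is a minimal sketch that serializes a chapter containing one parallel node with encoding/json. Encode is a hypothetical wrapper added for the example; the field values are taken from the examples in this comment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MediaOverlayNode repeats the struct proposed above.
type MediaOverlayNode struct {
	Text     string             `json:"text,omitempty"`
	Audio    string             `json:"audio,omitempty"`
	Role     []string           `json:"role,omitempty"`
	Children []MediaOverlayNode `json:"children,omitempty"`
}

// Encode returns the compact JSON form of a node. Hypothetical helper.
func Encode(n MediaOverlayNode) (string, error) {
	out, err := json.Marshal(n)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	chapter := MediaOverlayNode{
		Text: "chapter1.html#start",
		Role: []string{"seq", "chapter"},
		Children: []MediaOverlayNode{
			{
				Text:  "chapter1.html#sentence1",
				Audio: "chapter1.mp3#t=0,20",
				Role:  []string{"par"},
			},
		},
	}
	out, err := Encode(chapter)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

One consequence of omitempty worth noting: a node with no children serializes without a children key at all, so an empty "children": [] as written in the first example would not survive encoding.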

@danielweck
Member

@HadrienGardeur
Author

Updated both comments above, with replies from @danielweck and new examples based on our latest call.

@llemeurfr
Contributor

After looking back at the SMIL profile in EPUB 3, I can propose an alternative syntax, which tries to be both readable and short.

A Sequence object contains a "sequence" property (instead of children) -> people 'see' the concept.

{
  "text": "chapter1.html#start",
  "role": ["chapter"],
  "sequence":[]
}

A Parallel object does not need a role (it is optional); the fact that it has no sequence property is enough to identify it (it was an initial idea of Hadrien).

{
  "text": "chapter1.html#sentence1", 
  "audio": "chapter1.mp3#t=0,20"
}

@HadrienGardeur
Author

We're running in circles here... What you're proposing @llemeurfr is exactly what I had in my proposal yesterday.

I edited the comment above because @danielweck was unhappy about the lack of direct reference to seq and par.

The only difference between your proposal and mine is that I use children while you rely on a similar element named sequence.
Since we already have a role and my understanding of SMIL is that we can have more than just seq and par, I dislike the idea of repeating the role twice, and would much rather use a generic element.

I'm happy to move back to my initial proposal, but I'd like to get a consensus first on the need to explicitly state that each object is a seq or a par.

@rkwright
Member

rkwright commented Dec 23, 2016

I like "sequence" instead of "children", simply because it is more obvious. And I, personally, would be happier if the "par" object had a type/role of "par" (or "parallel"), again to be obvious. But I will say that this reminds me of the early days of XML and, before that, PostScript (yes, I'm that old) when we argued about these sorts of issues. Really, is anyone besides John Warnock or Tim Bray going to LOOK at this code? :-)

@HadrienGardeur
Author

I don't like the idea of using a role for what's essentially a generic element for listing child elements because SMIL isn't limited to par and seq. If another role is introduced in EPUB (let's call it new) we could end up with a very weird syntax if we use sequence:

{
  "role": ["new"],
  "text": "chapter1.html#start",
  "sequence": []
}

Is this a seq or a new element?

Anyway, you're somewhat right @rkwright that we're indeed debating some very small details here. So far, no one has challenged the proposed model based on those 4 elements, so at least that's something we all agree on.

It seems that how we name these elements, and the requirement for a role are the two main things we're still debating.

@llemeurfr
Contributor

I agree with Ric that it can be seen as a minor naming issue. I'm only thinking about the future of this work, that is, a proposal for an evolution of EPUB.

The main question is: do we want to mimic the SMIL model?
In such a case the par object should contain a JSON array (children), like the seq object, because their content model is identical in the SMIL schema (https://www.w3.org/TR/2005/REC-SMIL2-20050107/smil-SCHEMA.html#timing).
And apart from seq and par, SMIL has 1 other element defined at this level, excl (meaning mutually exclusive) (https://www.w3.org/TR/REC-smil/smil-timing.html#Timing-ExclSyntax), with the same flexible content model as seq and par.

My feeling is that we should not mimic the SMIL model and its flexibility, but rather stick with a simplified model, tightly attached to our need. Therefore no new or excl element would appear in the model, and some implicit object type is acceptable.

If you all prefer an explicit object type, I'd prefer a type or name property over a value in the role array (which has another meaning).

@HadrienGardeur
Author

In such a case the par object should contain a JSON array (children), like the seq object, because their content model is identical in the SMIL schema (https://www.w3.org/TR/2005/REC-SMIL2-20050107/smil-SCHEMA.html#timing).

Why would we have to include a JSON array because of a schema? Sorry, I'm not buying that argument.

And apart from seq and par, SMIL has 1 other element defined at this level, excl (meaning mutually exclusive) (https://www.w3.org/TR/REC-smil/smil-timing.html#Timing-ExclSyntax), with the same flexible content model as seq and par.

Which could be handled using role and children.

My feeling is that we should not mimic the SMIL model and its flexibility, but rather stick with a simplified model, tightly attached to our need.

That's also my personal preference, which is why my first proposal did not include an explicit object type at all:

{
  "text": "chapter1.html#start",
  "role": ["chapter"],
  "children":[]
}
{
  "text": "chapter1.html#sentence1", 
  "audio": "chapter1.mp3#t=0,20"
}

I still dislike using sequence instead of children for the reasons listed above, mostly because this element will contain both sequences and parallels, even in a simplified model.

@danielweck
Member

So, after a few days in hiatus, I am revisiting this technical discussion.

I am leaning towards extreme-lightweight-JSON as originally proposed by Hadrien. That is to say: minimal syntactical overhead (fewer bytes in the generated HTTP responses, fast parsing / discovery of content semantics), at the cost of some additional mental gymnastics (less human-friendly) to infer the par vs. seq type of timing containers.
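
To make the "mental gymnastics" concrete, here is a minimal sketch of how a consumer might infer the timing container from structure alone. The Kind method and its rule are my own assumptions, not an agreed part of the syntax: a node with children is treated as a seq, a childless node with an audio reference as a par, and anything else falls back to seq.

```go
package main

import "fmt"

// MediaOverlayNode repeats the struct proposed earlier in this thread.
type MediaOverlayNode struct {
	Text     string             `json:"text,omitempty"`
	Audio    string             `json:"audio,omitempty"`
	Role     []string           `json:"role,omitempty"`
	Children []MediaOverlayNode `json:"children,omitempty"`
}

// Kind infers "seq" or "par" from the shape of the node.
// Assumed rule: children imply seq; a childless node with audio is a par;
// an empty container is reported as seq.
func (n MediaOverlayNode) Kind() string {
	if len(n.Children) == 0 && n.Audio != "" {
		return "par"
	}
	return "seq"
}

func main() {
	par := MediaOverlayNode{
		Text:  "chapter1.html#sentence1",
		Audio: "chapter1.mp3#t=0,20",
	}
	seq := MediaOverlayNode{
		Text:     "chapter1.html#start",
		Role:     []string{"chapter"},
		Children: []MediaOverlayNode{par},
	}
	fmt.Println(par.Kind(), seq.Kind()) // par seq
}
```

The inference costs one branch per node, which is the trade-off described above: fewer bytes on the wire in exchange for an implicit rule that every processor has to share.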

This approach would probably not scale well in a hypothetical future when the feature set of "Media Overlays" (or whatever this ends up being called) is extended to support more SMIL functionality (which is huge, relative to DAISY talking books, let alone compared with EPUB3 MO), because of how "compressed" / less expressive the proposed JSON syntax is.

But, as our current focus is to propose a lean, efficient serialisation format for a simple timing syntax, moreover primarily designed for internal use, I am comfortable with the minimal parent-children "node"-like structure (and a few relevant fields to express crucial semantics, etc.). Should the model need to evolve significantly, we could add versioning hints to differentiate syntax and processor implementations.

To be honest, there have been several attempts to define a declarative timing syntax for the web ("timesheets", much like CSS stylesheets), but these essentially failed because of limited use cases, and because of the greater flexibility afforded by other animation / timing mechanisms (SVG has its own brew of SMIL, there's CSS animation, and of course there's programmatic timing using JavaScript).

So, I am not too concerned about a future evolution that would render our internal format backward-incompatible. As for BFF (browser-friendly flavour of EPUB), I don't think Readium-2's SMIL support should aim to be a blueprint for an interoperable authoring + interchange format. I believe we should just ensure, on a best-effort basis and given the alignment with web syntax such as Media Fragments, a reasonable level of lossless roundtrip-ability. Our primary priority is to guarantee no data loss when converting from EPUB2-3.1 to Readium's internal JSON format.
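
One cheap way to keep an eye on roundtrip-ability, at least on the JSON side, is to decode a document into the node struct and encode it again. The sketch below assumes the MediaOverlayNode struct proposed earlier in this thread; RoundTrip is a hypothetical name. It does not cover the SMIL to JSON conversion itself, only whether the struct silently drops anything it is given.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// MediaOverlayNode repeats the struct proposed earlier in this thread.
type MediaOverlayNode struct {
	Text     string             `json:"text,omitempty"`
	Audio    string             `json:"audio,omitempty"`
	Role     []string           `json:"role,omitempty"`
	Children []MediaOverlayNode `json:"children,omitempty"`
}

// RoundTrip decodes a media overlay document and encodes it again.
// Unknown fields are rejected so that data the struct cannot hold
// surfaces as an error instead of being dropped silently.
func RoundTrip(in string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(in))
	dec.DisallowUnknownFields()
	var node MediaOverlayNode
	if err := dec.Decode(&node); err != nil {
		return "", err
	}
	out, err := json.Marshal(node)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	in := `{"text":"chapter1.html#start","role":["chapter"],"children":[{"text":"chapter1.html#sentence1","audio":"chapter1.mp3#t=0,20"}]}`
	out, err := RoundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(out == in) // true
}
```

The byte-for-byte comparison only holds for compact input whose keys follow the struct's field order; a real check would compare decoded values instead.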

Sorry for this long blurb :)
Thoughts welcome. This is by no means a definitive statement, just where I stand at this point of the discussion.

@HadrienGardeur
Author

Thanks @danielweck, I can now draft something since we have consensus on a minimal syntax.

@HadrienGardeur
Author

Here's a reference to the first draft: https://github.com/readium/readium-2/blob/master/media-overlay/syntax.md

@HadrienGardeur
Author

We've covered the syntax issue and it's already implemented in Go, so I'm closing this issue.
