Syntax for the Media Overlay Document #22

HadrienGardeur · 2016-12-15T15:17:00Z

In addition to providing the content of an OPF in a different serialization, the streamer will also be responsible for parsing SMIL files and providing them in a simplified JSON document.

What are the key principles and information that need to be preserved in SMIL? What can we simplify and streamline?

cc @danielweck

HadrienGardeur · 2016-12-19T17:36:35Z

Looking at the EPUB 3 specification, here's a list of things for which I'm not entirely sure of their usefulness:

id for seq, par, text and audio
epub:textref for seq and par
do we need specific attributes for begin & end for the audio segment? Can't we simply use media fragments instead?
do we need sub-second precision?
is epub:type really added in SMIL documents instead of the HTML resources?

Update
During our call on Dec 21st, @danielweck provided answers to these questions:

no need for id
epub:textref is mostly useful because it's associated with an epub:type
media fragments could be a good alternative
sub-second precision could be implemented through media fragments' support for NPT (Normal Play Time); whether browsers implement it, for example in <audio> and <video>, needs to be tested

HadrienGardeur · 2016-12-20T12:24:30Z

The more I think about it, the less the difference between seq and par truly makes sense. I think it could be boiled down to a single object that can have:

a text reference
an audio reference
children
roles

Here are a few examples.

Simple sequence

{
  "text": "chapter1.html#start",
  "role": ["seq", "chapter"],
  "children":[]
}

Simple parallel

{
  "role": ["par"],
  "text": "chapter1.html#sentence1", 
  "audio": "chapter1.mp3#t=0,20"
}

Are you allowed to mix seq and par at the top level of a SMIL document? If that's the case, we can create a single model for a media overlay node. In Go, this would look like this:

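// MediaOverlayNode is a single node of the media overlay tree.
type MediaOverlayNode struct {
	Text     string             `json:"text,omitempty"`     // reference to the text, e.g. "chapter1.html#sentence1"
	Audio    string             `json:"audio,omitempty"`    // reference to the audio clip, e.g. "chapter1.mp3#t=0,20"
	Role     []string           `json:"role,omitempty"`     // roles such as "seq", "par" or "chapter"
	Children []MediaOverlayNode `json:"children,omitempty"` // nested nodes
}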

danielweck · 2016-12-21T17:12:09Z

Time / clock syntax (begin/end offsets):
https://www.w3.org/TR/media-frags/#naming-time
https://www.w3.org/TR/SMIL3/smil-timing.html#q22
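
For reference, a rough sketch of how SMIL clock values could be converted to the seconds-based t= syntax of Media Fragments. It handles full (hh:mm:ss), partial (mm:ss) and timecount values with the h, min, s and ms metrics. The names clockToSeconds and mediaFragment are hypothetical, and the validation is deliberately loose, so treat this as an illustration rather than a conforming parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// clockToSeconds converts a SMIL clock value such as "0:00:20.500", "02:30",
// "1.5min" or "20s" to a number of seconds.
func clockToSeconds(v string) (float64, error) {
	v = strings.TrimSpace(v)
	if strings.Contains(v, ":") {
		// Full (hh:mm:ss) or partial (mm:ss) clock value.
		parts := strings.Split(v, ":")
		if len(parts) > 3 {
			return 0, fmt.Errorf("invalid clock value %q", v)
		}
		total := 0.0
		for _, p := range parts {
			n, err := strconv.ParseFloat(p, 64)
			if err != nil || n < 0 {
				return 0, fmt.Errorf("invalid clock value %q", v)
			}
			total = total*60 + n
		}
		return total, nil
	}
	// Timecount value with an optional metric. "ms" is tested before "s"
	// because both end with the same letter.
	num, mul, div := v, 1.0, 1.0
	switch {
	case strings.HasSuffix(v, "ms"):
		num, div = strings.TrimSuffix(v, "ms"), 1000
	case strings.HasSuffix(v, "min"):
		num, mul = strings.TrimSuffix(v, "min"), 60
	case strings.HasSuffix(v, "h"):
		num, mul = strings.TrimSuffix(v, "h"), 3600
	case strings.HasSuffix(v, "s"):
		num = strings.TrimSuffix(v, "s")
	}
	n, err := strconv.ParseFloat(num, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid clock value %q", v)
	}
	return n * mul / div, nil
}

// mediaFragment appends a temporal media fragment to an audio reference,
// given begin and end offsets expressed as SMIL clock values.
func mediaFragment(src, begin, end string) (string, error) {
	b, err := clockToSeconds(begin)
	if err != nil {
		return "", err
	}
	e, err := clockToSeconds(end)
	if err != nil {
		return "", err
	}
	format := func(s float64) string { return strconv.FormatFloat(s, 'f', -1, 64) }
	return src + "#t=" + format(b) + "," + format(e), nil
}

func main() {
	ref, err := mediaFragment("chapter1.mp3", "0s", "0:00:20.500")
	if err != nil {
		panic(err)
	}
	fmt.Println(ref)
}
```

Fractional seconds pass through unchanged, so sub-second precision only depends on what the player does with the fragment.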

HadrienGardeur · 2016-12-22T16:20:42Z

Updated both comments above, with replies from @danielweck and new examples based on our latest call.

llemeurfr · 2016-12-22T18:20:05Z

After looking back at the SMIL profile in EPUB 3, I can propose an alternative syntax, which tries to be both readable and short.

A Sequence object contains a "sequence" property (instead of children) -> people 'see' the concept.

{
  "text": "chapter1.html#start",
  "role": ["chapter"],
  "sequence":[]
}

A Parallel object does not need a role (it is optional); the fact that it has no sequence property is enough to identify it (it was an initial idea of Hadrien).

{
  "text": "chapter1.html#sentence1", 
  "audio": "chapter1.mp3#t=0,20"
}

HadrienGardeur · 2016-12-22T19:15:01Z

We're running in circles here... What you're proposing @llemeurfr is exactly what I had in my proposal yesterday.

I edited the comment above because @danielweck was unhappy about the lack of direct reference to seq and par.

The only difference between your proposal and mine is that I use children while you rely on a similar element named sequence.
Since we already have a role and my understanding of SMIL is that we can have more than just seq and par, I dislike the idea of repeating the role twice, and would much rather use a generic element.

I'm happy to move back to my initial proposal, but I'd like to get a consensus first on the need to explicitly state that each object is a seq or a par.

rkwright · 2016-12-23T19:34:47Z

I like "sequence" instead of "children", simply because it is more obvious. And I, personally, would be happier if the "par" object had a type/role of "par" (or "parallel"), again to be obvious. But I will say that this reminds me of the early days of XML and, before that, PostScript (yes, I'm that old) when we argued about these sorts of issues. Really, is anyone besides John Warnock or Tim Bray going to LOOK at this code? :-)

HadrienGardeur · 2016-12-27T11:21:08Z

I don't like the idea of using a role for what's essentially a generic element for listing child elements because SMIL isn't limited to par and seq. If another role is introduced in EPUB (let's call it new) we could end up with a very weird syntax if we use sequence:

{
  "role": ["new"],
  "text": "chapter1.html#start",
  "sequence": []
}

Is this a seq or a new element?

Anyway, you're somewhat right @rkwright that we're indeed debating some very small details here. So far, no one has challenged the proposed model based on those 4 elements, so at least that's something we all agree on.

It seems that how we name these elements, and the requirement for a role are the two main things we're still debating.

llemeurfr · 2016-12-27T12:31:48Z

I agree with Ric that it can be seen as a minor naming issue. I'm only thinking about the future of this work, i.e. a proposal for an evolution of EPUB.

The main question is: do we want to mimic the SMIL model?
In such a case the par object should contain a JSON array (children), like the seq object, because their content model is identical in the SMIL schema (https://www.w3.org/TR/2005/REC-SMIL2-20050107/smil-SCHEMA.html#timing).
And apart from seq and par, SMIL has 1 other element defined at this level, excl (meaning mutually exclusive) (https://www.w3.org/TR/REC-smil/smil-timing.html#Timing-ExclSyntax), with the same flexible content model as seq and par.

My feeling is that we should not mimic the SMIL model and its flexibility, but rather stick with a simplified model, tightly attached to our need. Therefore no new or excl element would appear in the model, and some implicit object type is acceptable.

If you all prefer an explicit object type, I'd prefer a type or name property over a value in the role array (which has another meaning).

HadrienGardeur · 2016-12-27T13:56:13Z

In such a case the par object should contain a JSON array (children), like the seq object, because their content model is identical in the SMIL schema (https://www.w3.org/TR/2005/REC-SMIL2-20050107/smil-SCHEMA.html#timing).

Why would we have to include a JSON array because of a schema? Sorry, I'm not buying that argument.

And apart from seq and par, SMIL has 1 other element defined at this level, excl (meaning mutually exclusive) (https://www.w3.org/TR/REC-smil/smil-timing.html#Timing-ExclSyntax), with the same flexible content model as seq and par.

Which could be handled using role and children.

My feeling is that we should not mimic the SMIL model and its flexibility, but rather stick with a simplified model, tightly attached to our need.

That's also my personal preference, which is why my first proposal did not include an explicit object type at all:

{
  "text": "chapter1.html#start",
  "role": ["chapter"],
  "children":[]
}

{
  "text": "chapter1.html#sentence1", 
  "audio": "chapter1.mp3#t=0,20"
}

I still dislike using sequence instead of children for the reasons listed above, mostly because this element will contain both sequences and parallels, even in a simplified model.

danielweck · 2016-12-28T11:46:37Z

So, after a few days in hiatus, I am revisiting this technical discussion.

I am leaning towards extreme-lightweight-JSON as originally proposed by Hadrien. That is to say: minimal syntactical overhead (fewer bytes in the generated HTTP responses, fast parsing / discovery of content semantics), at the cost of some additional mental gymnastics (less human-friendly) to infer the par vs. seq type of timing containers.
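
To give an idea of what that inference could look like on the consumer side, here is a small sketch. It assumes the rule suggested earlier in this thread: a node is a seq as soon as it carries a children property, and a par otherwise. IsSeq and parseNode are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MediaOverlayNode is the minimal node model discussed in this thread.
type MediaOverlayNode struct {
	Text     string             `json:"text,omitempty"`
	Audio    string             `json:"audio,omitempty"`
	Role     []string           `json:"role,omitempty"`
	Children []MediaOverlayNode `json:"children,omitempty"`
}

// IsSeq reports whether the node should be treated as a sequence.
// Assumption: the presence of a "children" array, even an empty one, marks a
// seq. encoding/json decodes "[]" into a non-nil empty slice, while a
// missing property leaves the field nil.
func (n MediaOverlayNode) IsSeq() bool {
	return n.Children != nil
}

// parseNode decodes a single node from its JSON form.
func parseNode(doc string) (MediaOverlayNode, error) {
	var n MediaOverlayNode
	err := json.Unmarshal([]byte(doc), &n)
	return n, err
}

func main() {
	docs := []string{
		`{"text": "chapter1.html#start", "role": ["chapter"], "children": []}`,
		`{"text": "chapter1.html#sentence1", "audio": "chapter1.mp3#t=0,20"}`,
	}
	for _, doc := range docs {
		n, err := parseNode(doc)
		if err != nil {
			panic(err)
		}
		fmt.Println(n.Text, "seq:", n.IsSeq())
	}
}
```

The check is cheap, but it relies on the producer always emitting children for sequences, including empty ones. A serializer that omits empty arrays would turn an empty seq into something that reads back as a par.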

This approach would probably not scale well in a hypothetical future when the feature set of "Media Overlays" (or whatever this ends up being called) is extended to support more SMIL functionality (which is huge, relative to DAISY talking books, let alone compared with EPUB3 MO), because of how "compressed" / less expressive the proposed JSON syntax is.

But, as our current focus is to propose a lean, efficient serialisation format for a simple timing syntax, moreover primarily designed for internal use, I am comfortable with the minimal parent-children "node"-like structure (and a few relevant fields to express crucial semantics, etc.). Should the model need to evolve significantly, we could add versioning hints to differentiate syntax and processor implementations.

To be honest, there have been several attempts to define a declarative timing syntax for the web ("timesheets", much like CSS stylesheets), but these essentially failed because of limited use-cases, and because of the greater flexibility afforded by other animation / timing mechanisms (SVG has its own brew of SMIL, there's CSS animation, and of course there's programmatic timing using JavaScript).

So, I am not too concerned about a future evolution that would render our internal format backward-incompatible. As for BFF (browser-friendly flavour of EPUB), I don't think Readium-2's SMIL support should aim to be a blueprint for an interoperable authoring + interchange format. I believe we should just ensure, on a best-effort basis and given the alignment with web syntax such as Media Fragments, a reasonable level of lossless roundtrip-ability. Our primary priority is to guarantee no data loss when converting from EPUB2-3.1 to Readium's internal JSON format.
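
As a very rough illustration of that conversion, here is a sketch that maps an EPUB 3 SMIL document to the minimal node model using encoding/xml. This is not the Readium implementation. The names element, convert, smilToNodes and sampleSMIL are hypothetical, and clip offsets are assumed to be plain second counts such as "20s" to keep the example short.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"strings"
)

const epubNS = "http://www.idpf.org/2007/ops"

// sampleSMIL is a made-up document used only for this example.
const sampleSMIL = `<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">
  <body>
    <seq epub:textref="chapter1.html#start" epub:type="chapter">
      <par>
        <text src="chapter1.html#sentence1"/>
        <audio src="chapter1.mp3" clipBegin="0s" clipEnd="20s"/>
      </par>
    </seq>
  </body>
</smil>`

// MediaOverlayNode is the minimal node model discussed in this thread.
type MediaOverlayNode struct {
	Text     string             `json:"text,omitempty"`
	Audio    string             `json:"audio,omitempty"`
	Role     []string           `json:"role,omitempty"`
	Children []MediaOverlayNode `json:"children,omitempty"`
}

// element is a generic XML element, which keeps seq and par children in
// document order.
type element struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"`
	Children []element  `xml:",any"`
}

// attr returns the value of the attribute with the given namespace and name.
func (e element) attr(space, local string) string {
	for _, a := range e.Attrs {
		if a.Name.Space == space && a.Name.Local == local {
			return a.Value
		}
	}
	return ""
}

// convert maps a seq or par element to a node; ok is false for anything else.
func convert(e element) (n MediaOverlayNode, ok bool) {
	n.Role = strings.Fields(e.attr(epubNS, "type"))
	switch e.XMLName.Local {
	case "seq":
		n.Text = e.attr(epubNS, "textref")
		for _, c := range e.Children {
			if child, ok := convert(c); ok {
				n.Children = append(n.Children, child)
			}
		}
	case "par":
		for _, c := range e.Children {
			switch c.XMLName.Local {
			case "text":
				n.Text = c.attr("", "src")
			case "audio":
				// Assumption: offsets are plain second counts like "20s".
				frag := strings.TrimSuffix(c.attr("", "clipBegin"), "s")
				if end := strings.TrimSuffix(c.attr("", "clipEnd"), "s"); end != "" {
					frag += "," + end
				}
				n.Audio = c.attr("", "src")
				if frag != "" {
					n.Audio += "#t=" + frag
				}
			}
		}
	default:
		return n, false
	}
	return n, true
}

// smilToNodes parses a SMIL document and converts the children of <body>.
func smilToNodes(doc string) ([]MediaOverlayNode, error) {
	var root element
	if err := xml.Unmarshal([]byte(doc), &root); err != nil {
		return nil, err
	}
	var nodes []MediaOverlayNode
	for _, top := range root.Children {
		if top.XMLName.Local != "body" {
			continue
		}
		for _, c := range top.Children {
			if n, ok := convert(c); ok {
				nodes = append(nodes, n)
			}
		}
	}
	return nodes, nil
}

func main() {
	nodes, err := smilToNodes(sampleSMIL)
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(nodes)
	fmt.Println(string(out))
}
```

Note that this sketch drops id attributes and anything outside body, in line with the answers given earlier in the thread, so it is lossy by design for those parts.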

Sorry for this long blurb :)
Thoughts welcome. This is by no means a definitive statement, just where I stand at this point of the discussion.

HadrienGardeur · 2016-12-29T00:11:16Z

Thanks @danielweck, I can now draft something since we have consensus on a minimal syntax.

HadrienGardeur · 2017-01-02T19:05:01Z

Here's a reference to the first draft: https://github.com/readium/readium-2/blob/master/media-overlay/syntax.md

HadrienGardeur · 2017-03-14T17:20:55Z

We've covered the syntax issue and it's already implemented in Go, I'm closing this issue.

HadrienGardeur mentioned this issue Dec 15, 2016

Referencing Media Overlays in the Publication Manifest #23

Closed

HadrienGardeur added Streamer Discussion labels Mar 13, 2017

HadrienGardeur closed this as completed Mar 14, 2017
