Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more Global elements explanations #281

Merged
merged 12 commits into from Sep 24, 2019
Merged

Add more Global elements explanations #281

merged 12 commits into from Sep 24, 2019

Conversation

robUx4
Copy link
Contributor

@robUx4 robUx4 commented Aug 25, 2019

Fixes #279

@robUx4 robUx4 added enhancement clarifications labels Aug 25, 2019
Copy link
Contributor

@dericed dericed left a comment

some minor comments but this looks okay to me

specification.markdown Outdated Show resolved Hide resolved
@@ -773,7 +779,11 @@ description: The version of the DocTypeExtension. Different DocTypeExtensionVers

## Global Elements

EBML defines these Global Elements which MAY be stored within any Master Element of an EBML Document as defined by their Element Path.
EBML allows some special Elements to be found within more than one parent in an EBML Document. These Elements are called Global Elements. There are 2 Global Elements that can be found in any EBML Document: the CRC-32 Element and the Void Element. An EBML Schema MAY add other Global Elements to the format it defines.
Copy link
Contributor

@dericed dericed Aug 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deleted sentence here makes it clearer that a Global Element MUST have a parent Master Element, ie a Global Element can not be at the root level. Like:

<EBMLHeader/>
<Void/>
<Segment/>

should not be allowed.

Copy link
Contributor Author

@robUx4 robUx4 Aug 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be allowed and that's how the Void path reads (it took me a while to figure it out back).

Void can be used to void some elements, including a Segment. Unlike CRC-32 it doesn't have any relation to a parent nor does it need to.

libebml was originally designed to handle such case, it should be able to handle Void without a parent.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favor of allowing Void at the top level, it may be used in some corner cases and does not hurt.
Actually CRC-32 definition could be slightly changed for also being at the top level, from first element after EBML header and up to end of file, so I prefer the new wording.

Copy link
Contributor

@dericed dericed Aug 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CRC-32 Element documents the remaining data of the Parent Element of the CRC-32 Element. If at top level, there is no end of what the CRC-32 documents. For instance if you have 2 mkv like:

<EBML/>
<CRC-32/>
<Segment/>

then concatenate them as an EBML Stream, then ?_?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then concatenate them as an EBML Stream, then ?_?

You get the point. Up to end of file or next EBML element.

But before debating about CRC-32, let's agree first on Void element.

Copy link
Contributor

@dericed dericed Aug 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I analyzed over 100,000 mkv files in archive.org and never see a Void at top-level. I feel like these scenarios are ugly:

Why allow this? It allows a file where the header is exabytes into the file.

<Void/>
<EBML/>
<Segment/>

When Void is a child of a Master Element, then you have the advantage of using SeekHead to navigate the file efficiently, but with this you must parse to thousands of various points in the file to find where the Segment is. Why allow such a mess?

<EBML/>
<Void/>
<thousands of additional voids/>
<Segment/>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why allow this? It allows a file where the header is exabytes into the file.

EBML element must be first IIRC, so still no Void element first.

Why allow such a mess?

In Matroska Segment must be unique. But in case it is used elsewhere, a top element could be several times, and "voiding" one can permit to avoid to rewrite the whole file while permitting to remove nicely an top element.

SeekHead

Is from Matroska, not EBML. I get that the goal is Matroska, but I am not convinced that we should force that in EBML itself. Maybe forbidding Void at top level in Matroska specs?

Copy link
Contributor

@dericed dericed Aug 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that mkvinfo refers to a root level void as Unknown Element. I had a mkv file with a gigabyte of empty voids between the header and the segment. It works in ffmpeg but takes a big delay.

Copy link
Contributor Author

@robUx4 robUx4 Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I analyzed over 100,000 mkv files in archive.org and never see a Void at top-level.

Where you looking for mkv files where the Segment was Void and thus the file was likely empty/unusable ? I seems logical that archive.org would not have such a file. That doesn't mean anything to how Void should be interpreted.

Copy link
Contributor Author

@robUx4 robUx4 Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @JeromeMartinez here. It wouldn't make much sense for Matroska (we can discuss this further). But for EBML formats it could make sense.

For example if you have a format which is a big flat table of elements like this:

<EBML/>
<XEntry/>
<XEntry/>
<YEntry/>
<XEntry/>
<YEntry/>
<ZEntry/>
<XEntry/>
...

You could want to erase one of the entries without having to rewrite the whole file.

@dericed
Copy link
Contributor

@dericed dericed commented Aug 27, 2019

Thoughts from @mbunkus?

@dericed
Copy link
Contributor

@dericed dericed commented Sep 2, 2019

Hi @robUx4, @mcr, I reviewed the document in consideration of removing this requirement in this PR:

EBML defines these Global Elements which MAY be stored within any Master Element of an EBML Document as defined by their Element Path.

I added 98edd95 to update a few definitions to acknowledge that in some cases Global Elements MAY be permitted at the Root Level.

I suggest that in doing this, we also considering restricting what Global Elements may be at the Root Level to only the Void Element.

@dericed dericed self-requested a review Sep 2, 2019
dericed
dericed approved these changes Sep 2, 2019
Copy link
Contributor

@dericed dericed left a comment

LGTM with my added commit at 98edd95.

@@ -34,7 +34,7 @@ This document defines specific terms in order to define the format and applicati

`EBML Schema`: A standardized definition for the structure of an `EBML Document Type`.

`EBML Document`: A datastream comprised of only two components, an `EBML Header` and an `EBML Body`.
`EBML Document`: A datastream comprised of only two components, an `EBML Header` and an `EBML Body`, plus any optional `Global Elements` that MAY be permitted at the Root Level (for example, the `Void Element`).
Copy link
Contributor Author

@robUx4 robUx4 Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO Global Elements are part of the EBML Body.

We may have a loophole though, as I'm not sure they are allowed in the EBML Header. Void and CRC-32 seem like they could both be used there.

Maybe there's a distinction between the Global Elements defined in the EBML spec and the Global Elements defined in the format (let's say Matroska). The latter seem tied to the Body (defined by the Doctype in the Header). It seems they should be counted as inside the Body. The former I'm not sure. We may add to the Void/CRC definitions that they can be found in the EBML Header and the EBML Body.

Copy link
Contributor Author

@robUx4 robUx4 Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with the one word change proposed below it think we don't need this line at all. The Global Elements at the root are part of the EBML Body.

Then we need extra text saying that Void and CRC-32 can be found inside the EBML Header, but not at the root.

Copy link
Contributor Author

@robUx4 robUx4 Sep 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed text change is:

EBML allows some special Elements to be found within more than one parent in an EBML Document or optionally at the Root Level of an EBML Body.

In the end we don't need extra text because the EBML Header is part of the EBML Document, so Void/CRC-32 can be in a parent element of the EBML Header. They can only be found at the root in the Body (but then CRC-32 requires a parent so it cannot).

@@ -240,7 +240,7 @@ The contents of a Binary Element should not be interpreted by the EBML Reader.

# EBML Document

An EBML Document is comprised of only two components, an EBML Header and an EBML Body. An EBML Document MUST start with an EBML Header that declares significant characteristics of the entire EBML Body. An EBML Document consists of EBML Elements and MUST NOT contain any data that is not part of an EBML Element.
An EBML Document is comprised of only two components, an EBML Header and an EBML Body, plus any optional `Global Elements` that MAY be permitted at the Root Level (for example, the `Void Element`). An EBML Document MUST start with an EBML Header that declares significant characteristics of the entire EBML Body. An EBML Document consists of EBML Elements and MUST NOT contain any data that is not part of an EBML Element.
Copy link
Contributor Author

@robUx4 robUx4 Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same remark as above

@@ -779,7 +779,7 @@ description: The version of the DocTypeExtension. Different DocTypeExtensionVers

## Global Elements

EBML allows some special Elements to be found within more than one parent in an EBML Document. These Elements are called Global Elements. There are 2 Global Elements that can be found in any EBML Document: the CRC-32 Element and the Void Element. An EBML Schema MAY add other Global Elements to the format it defines.
EBML allows some special Elements to be found within more than one parent in an EBML Document or optionally at the Root Level of an EBML Document. These Elements are called Global Elements. There are 2 Global Elements that can be found in any EBML Document: the CRC-32 Element and the Void Element. An EBML Schema MAY add other Global Elements to the format it defines.
Copy link
Contributor Author

@robUx4 robUx4 Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to be more precise with the EBML Header and the EBML Body. For example Void MUST NOT be the first element of the EBML Header. In fact it MUST NOT be found at the root of the EBML Header. If it's at the root (ie after the <EBML/>) it should be considered as part of the EBML Body. And CRC-32 cannot be at the root (I disagree with @JeromeMartinez on that). So this text should be:

or optionally at the Root Level of an EBML Body

instead of

or optionally at the Root Level of an EBML Document

…Header

thus, they are considered part of the EBML Body
@robUx4
Copy link
Contributor Author

@robUx4 robUx4 commented Sep 22, 2019

I think a9685c5 is enough to allow Void at the root of the EBML Document but not in the EBML Header.

(It can be reverted if we don't agree on this.)

specification.markdown Outdated Show resolved Hide resolved
specification.markdown Outdated Show resolved Hide resolved
@dericed dericed merged commit 6715772 into master Sep 24, 2019
@dericed dericed deleted the global-elements branch Dec 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarifications enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants