New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more Global elements explanations #281
Conversation
- where they are found - the 2 default ones - their possible occurrences
specification.markdown
Outdated
| @@ -773,7 +779,11 @@ description: The version of the DocTypeExtension. Different DocTypeExtensionVers | |||
|
|
|||
| ## Global Elements | |||
|
|
|||
| EBML defines these Global Elements which MAY be stored within any Master Element of an EBML Document as defined by their Element Path. | |||
| EBML allows some special Elements to be found within more than one parent in an EBML Document. These Elements are called Global Elements. There are 2 Global Elements that can be found in any EBML Document: the CRC-32 Element and the Void Element. An EBML Schema MAY add other Global Elements to the format it defines. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deleted sentence here makes it clearer that a Global Element MUST have a parent Master Element, ie a Global Element can not be at the root level. Like:
<EBMLHeader/>
<Void/>
<Segment/>should not be allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be allowed and that's how the Void path reads (it took me a while to figure it out back).
Void can be used to void some elements, including a Segment. Unlike CRC-32 it doesn't have any relation to a parent nor does it need to.
libebml was originally designed to handle such case, it should be able to handle Void without a parent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favor of allowing Void at the top level, it may be used in some corner cases and does not hurt.
Actually CRC-32 definition could be slightly changed for also being at the top level, from first element after EBML header and up to end of file, so I prefer the new wording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CRC-32 Element documents the remaining data of the Parent Element of the CRC-32 Element. If at top level, there is no end of what the CRC-32 documents. For instance if you have 2 mkv like:
<EBML/>
<CRC-32/>
<Segment/>then concatenate them as an EBML Stream, then ?_?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then concatenate them as an EBML Stream, then ?_?
You get the point. Up to end of file or next EBML element.
But before debating about CRC-32, let's agree first on Void element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I analyzed over 100,000 mkv files in archive.org and never see a Void at top-level. I feel like these scenarios are ugly:
Why allow this? It allows a file where the header is exabytes into the file.
<Void/>
<EBML/>
<Segment/>When Void is a child of a Master Element, then you have the advantage of using SeekHead to navigate the file efficiently, but with this you must parse to thousands of various points in the file to find where the Segment is. Why allow such a mess?
<EBML/>
<Void/>
<thousands of additional voids/>
<Segment/>There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why allow this? It allows a file where the header is exabytes into the file.
EBML element must be first IIRC, so still no Void element first.
Why allow such a mess?
In Matroska Segment must be unique. But in case it is used elsewhere, a top element could be several times, and "voiding" one can permit to avoid to rewrite the whole file while permitting to remove nicely an top element.
SeekHead
Is from Matroska, not EBML. I get that the goal is Matroska, but I am not convinced that we should force that in EBML itself. Maybe forbidding Void at top level in Matroska specs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting that mkvinfo refers to a root level void as Unknown Element. I had a mkv file with a gigabyte of empty voids between the header and the segment. It works in ffmpeg but takes a big delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I analyzed over 100,000 mkv files in archive.org and never see a Void at top-level.
Where you looking for mkv files where the Segment was Void and thus the file was likely empty/unusable ? I seems logical that archive.org would not have such a file. That doesn't mean anything to how Void should be interpreted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @JeromeMartinez here. It wouldn't make much sense for Matroska (we can discuss this further). But for EBML formats it could make sense.
For example if you have a format which is a big flat table of elements like this:
<EBML/>
<XEntry/>
<XEntry/>
<YEntry/>
<XEntry/>
<YEntry/>
<ZEntry/>
<XEntry/>
...
You could want to erase one of the entries without having to rewrite the whole file.
|
Thoughts from @mbunkus? |
…l Elements at Root Level
|
Hi @robUx4, @mcr, I reviewed the document in consideration of removing this requirement in this PR:
I added 98edd95 to update a few definitions to acknowledge that in some cases Global Elements MAY be permitted at the Root Level. I suggest that in doing this, we also considering restricting what Global Elements may be at the Root Level to only the Void Element. |
specification.markdown
Outdated
| @@ -34,7 +34,7 @@ This document defines specific terms in order to define the format and applicati | |||
|
|
|||
| `EBML Schema`: A standardized definition for the structure of an `EBML Document Type`. | |||
|
|
|||
| `EBML Document`: A datastream comprised of only two components, an `EBML Header` and an `EBML Body`. | |||
| `EBML Document`: A datastream comprised of only two components, an `EBML Header` and an `EBML Body`, plus any optional `Global Elements` that MAY be permitted at the Root Level (for example, the `Void Element`). | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO Global Elements are part of the EBML Body.
We may have a loophole though, as I'm not sure they are allowed in the EBML Header. Void and CRC-32 seem like they could both be used there.
Maybe there's a distinction between the Global Elements defined in the EBML spec and the Global Elements defined in the format (let's say Matroska). The latter seem tied to the Body (defined by the Doctype in the Header). It seems they should be counted as inside the Body. The former I'm not sure. We may add to the Void/CRC definitions that they can be found in the EBML Header and the EBML Body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So with the one word change proposed below it think we don't need this line at all. The Global Elements at the root are part of the EBML Body.
Then we need extra text saying that Void and CRC-32 can be found inside the EBML Header, but not at the root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed text change is:
EBML allows some special Elements to be found within more than one parent in an EBML Document or optionally at the Root Level of an EBML Body.
In the end we don't need extra text because the EBML Header is part of the EBML Document, so Void/CRC-32 can be in a parent element of the EBML Header. They can only be found at the root in the Body (but then CRC-32 requires a parent so it cannot).
specification.markdown
Outdated
| @@ -240,7 +240,7 @@ The contents of a Binary Element should not be interpreted by the EBML Reader. | |||
|
|
|||
| # EBML Document | |||
|
|
|||
| An EBML Document is comprised of only two components, an EBML Header and an EBML Body. An EBML Document MUST start with an EBML Header that declares significant characteristics of the entire EBML Body. An EBML Document consists of EBML Elements and MUST NOT contain any data that is not part of an EBML Element. | |||
| An EBML Document is comprised of only two components, an EBML Header and an EBML Body, plus any optional `Global Elements` that MAY be permitted at the Root Level (for example, the `Void Element`). An EBML Document MUST start with an EBML Header that declares significant characteristics of the entire EBML Body. An EBML Document consists of EBML Elements and MUST NOT contain any data that is not part of an EBML Element. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same remark as above
specification.markdown
Outdated
| @@ -779,7 +779,7 @@ description: The version of the DocTypeExtension. Different DocTypeExtensionVers | |||
|
|
|||
| ## Global Elements | |||
|
|
|||
| EBML allows some special Elements to be found within more than one parent in an EBML Document. These Elements are called Global Elements. There are 2 Global Elements that can be found in any EBML Document: the CRC-32 Element and the Void Element. An EBML Schema MAY add other Global Elements to the format it defines. | |||
| EBML allows some special Elements to be found within more than one parent in an EBML Document or optionally at the Root Level of an EBML Document. These Elements are called Global Elements. There are 2 Global Elements that can be found in any EBML Document: the CRC-32 Element and the Void Element. An EBML Schema MAY add other Global Elements to the format it defines. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to be more precise with the EBML Header and the EBML Body. For example Void MUST NOT be the first element of the EBML Header. In fact it MUST NOT be found at the root of the EBML Header. If it's at the root (ie after the <EBML/>) it should be considered as part of the EBML Body. And CRC-32 cannot be at the root (I disagree with @JeromeMartinez on that). So this text should be:
or optionally at the Root Level of an EBML Body
instead of
or optionally at the Root Level of an EBML Document
…Header thus, they are considered part of the EBML Body
|
I think a9685c5 is enough to allow Void at the root of the EBML Document but not in the EBML Header. (It can be reverted if we don't agree on this.) |
Fixes #279