Formalize the meaning of mesh versioning #41

Closed
io7m opened this Issue Feb 23, 2017 · 6 comments

Owner

io7m commented Feb 23, 2017

Right now, the binary format is actually (accidentally) defined such that every future change will require a major version number increment. Work out what changes need to be made to the format to effectively use semantic versioning and to allow an appropriate degree of forward and backwards compatibility for parser implementations.

@io7m io7m self-assigned this Feb 23, 2017

@io7m io7m added this to the 0.10.0 milestone Feb 23, 2017

@io7m io7m removed this from the 0.10.0 milestone Apr 24, 2017

@io7m io7m referenced this issue Apr 27, 2017

Closed

Redesign mesh API #71

Owner

io7m commented Apr 29, 2017

There's some inconsistency in the header too:

[record SMFBV1Header
  ([field schema                   SMFBV1SchemaID]
   [field vertex_count             [integer unsigned 64]]
   [field triangle_count           [integer unsigned 64]]
   [field triangle_index_size_bits [integer unsigned 32]]
   [field coordinate_system        SMFBV1CoordinateSystems]
   [padding-octets 2]
   [padding-octets 4]
   [field meta_count               [integer unsigned 32]]
   [field attribute_count          [integer unsigned 64]])]

The vertex, triangle, and attribute counts are 64-bit, but the metadata count is 32-bit. I think the attribute count should be 32-bit as well. Additionally, the [padding-octets 4] could be eliminated.

Owner

io7m commented Apr 29, 2017

I think the rule for versioning has to be that a parser of major version m has to be able to parse files of version m.n for any n. The question then becomes how to define the format such that it's actually possible to make changes that can be forwards and backwards compatible.
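
A minimal sketch of that rule in Java. The names `VersionRule` and `FormatVersion` are hypothetical and are not taken from the smfj API; the only point illustrated is that acceptance depends on the major version alone.

```java
// Hypothetical sketch; VersionRule and FormatVersion are illustrative names,
// not types from com.io7m.smfj.
final class VersionRule {
  private VersionRule() { }

  /** A major.minor format version. */
  static final class FormatVersion {
    final int major;
    final int minor;

    FormatVersion(final int major, final int minor) {
      if (major < 0 || minor < 0) {
        throw new IllegalArgumentException("Version components must be non-negative");
      }
      this.major = major;
      this.minor = minor;
    }
  }

  // A parser for major version m accepts m.n for any n, including minor
  // versions that are newer than the parser itself.
  static boolean canParse(final int parserMajor, final FormatVersion file) {
    return file.major == parserMajor;
  }
}
```

Under this rule, the minor version never causes a rejection; it only tells the parser which features the file is allowed to use.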

Firstly, the format is designed to be streamed. The header is a fixed structure followed by a list of attribute definitions, followed by data for the attributes, followed by triangles, followed by metadata. Speaking in terms of keeping major version compatibility, if a new type of data needs to be delivered then it needs to be appended to the end of the file. Parsers need to be able to cope with extra data that they do not recognize at the end of the file. If a new type of data is to be delivered, then the header needs extra information appended to it that states that the new data will be there. Therefore, parsers need to be able to cope with extra data that they do not recognize at the end of the header.

For the text encoding, implementing the above may be fairly easy as the header and data sections are explicitly delimited. The text parser can simply ignore any header command that it doesn't understand and can ignore any commands it receives beyond the final metadata element. Some care should be taken to avoid accepting commands that shouldn't exist given the format's version. For example, if the parser accepts version 1.3 of the format, and the parsed file declares that it is version 1.2 but nevertheless contains 1.3 commands, the parser should ignore/reject those commands (with an appropriate diagnostic message).
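
The version check in that example could look roughly like the following. This is a sketch under assumptions: the command names and the table of "introduced in" minor versions are invented for illustration and do not come from the specification.

```java
import java.util.Map;

// Hypothetical sketch; CommandGate and the command names are illustrative.
final class CommandGate {
  private CommandGate() { }

  enum Decision { ACCEPT, IGNORE_UNKNOWN, REJECT_TOO_NEW }

  // Assumed table for major version 1: command name -> minor version
  // in which the command first appeared.
  private static final Map<String, Integer> INTRODUCED_IN = Map.of(
    "cmd-old", 0,
    "cmd-new", 3);

  // Decide what to do with a command, given the minor version the file declares.
  static Decision decide(final String command, final int declaredMinor) {
    final Integer introduced = INTRODUCED_IN.get(command);
    if (introduced == null) {
      // Not known to this parser at all: ignore it and log a warning.
      return Decision.IGNORE_UNKNOWN;
    }
    if (introduced > declaredMinor) {
      // Known, but the file claims a version that predates the command.
      return Decision.REJECT_TOO_NEW;
    }
    return Decision.ACCEPT;
  }
}
```

With this table, a file declaring 1.2 that contains `cmd-new` gets `REJECT_TOO_NEW`, while the same command in a file declaring 1.3 is accepted.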

For the binary encoding, things are trickier because the size of the header is implicit and the data sections aren't explicitly delimited. The parser simply works out ahead of time how large each data section will be (except for metadata, which is variable length, although the number of metadata items is known ahead of time). The parser needs to know where the attribute definitions start and where the attribute data starts...

Owner

io7m commented Apr 30, 2017

What happens if a new format version adds a new type? Implementations won't know how to calculate the size of attribute data.
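
To make the problem concrete, here is a simplified sketch. It assumes (this is not how the format defines attributes) that each attribute type name maps to a fixed number of octets per vertex; the type names are made up. A parser that meets a type missing from its table cannot compute the size of that attribute's data, and so cannot find where the next data section starts.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch; AttributeSizes and the type names are illustrative.
final class AttributeSizes {
  private AttributeSizes() { }

  // Assumed table of known types: type name -> octets per vertex.
  private static final Map<String, Integer> OCTETS_PER_VERTEX = Map.of(
    "float32x3", 12,
    "uint16x2", 4);

  // Size of one attribute's data in octets, or empty if the type is unknown
  // and the size therefore cannot be computed.
  static OptionalLong dataSize(final String type, final long vertexCount) {
    final Integer octets = OCTETS_PER_VERTEX.get(type);
    if (octets == null) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(Math.multiplyExact(vertexCount, (long) octets));
  }
}
```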

Owner

io7m commented Jun 9, 2017

For the text encoding, assuming that a parser that supports major version m is receiving a file of version m.n for any n:

  1. For header parsing, ignore and log any unknown header commands as warnings.
  2. For body parsing, ignore and log any unknown body section commands as warnings, and suppress the printing of any warnings until the next recognized body section command.
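
One possible reading of rule 2, sketched in Java: warn once for the first unrecognized command, then stay quiet until a recognized command is seen again. The class name, the message text, and the representation of the body as a list of command names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; BodyWarnings is an illustrative name.
final class BodyWarnings {
  private BodyWarnings() { }

  // Returns the warnings a parser would print for a sequence of body commands.
  static List<String> warningsFor(final List<String> commands, final Set<String> known) {
    final List<String> warnings = new ArrayList<>();
    boolean suppressing = false;
    for (final String command : commands) {
      if (known.contains(command)) {
        // A recognized command ends the suppression window.
        suppressing = false;
      } else if (!suppressing) {
        // First unrecognized command in a run: warn once, then go quiet.
        warnings.add("Unrecognized command: " + command);
        suppressing = true;
      }
    }
    return warnings;
  }
}
```

The effect is that the data lines belonging to an unrecognized section do not each produce their own warning.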

For the binary encoding, switch to a chunked format. A chunk starts with an eight-octet magic number and an eight-octet size value for the chunk. Unrecognized chunks can simply be skipped (and logged as a warning).
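
A rough sketch of how a reader might walk such chunks. Several details here are assumptions and not stated above: big-endian byte order, the magic number coming before the size, and the size counting only the payload octets that follow the 16-octet chunk header.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; ChunkWalker is an illustrative name.
final class ChunkWalker {
  private ChunkWalker() { }

  // Returns the magic numbers of the recognized chunks, in file order.
  // Unrecognized chunks are skipped with a warning.
  static List<Long> recognizedChunks(final ByteBuffer data, final Set<Long> knownMagic) {
    final List<Long> found = new ArrayList<>();
    while (data.remaining() >= 16) {
      final long magic = data.getLong(); // eight-octet magic number
      final long size = data.getLong();  // eight-octet payload size (assumed)
      if (size < 0 || size > data.remaining()) {
        throw new IllegalArgumentException(
          "Chunk size out of range: " + Long.toUnsignedString(size));
      }
      if (knownMagic.contains(magic)) {
        found.add(magic); // a real parser would decode the payload here
      } else {
        System.err.println(
          "warning: skipping unrecognized chunk 0x" + Long.toHexString(magic));
      }
      // Known or not, the size says where the next chunk starts.
      data.position(data.position() + (int) size);
    }
    return found;
  }
}
```

Because every chunk carries its own size, the reader never needs to understand a chunk's contents to get past it.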

Owner

io7m commented Jun 9, 2017

For the text encoding, sections must be ended with an explicit end. This drastically simplifies parsing as a whole.
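
As an illustration of why this helps, a parser can skip a section it does not recognize without understanding its contents. The sketch below assumes that the terminator is a line containing only `end` and that sections do not nest; neither assumption is confirmed here.

```java
import java.util.List;

// Hypothetical sketch; SectionSkipper is an illustrative name.
final class SectionSkipper {
  private SectionSkipper() { }

  // Given that lines.get(start) opened an unrecognized section, returns the
  // index of the first line after that section's terminating "end".
  static int skipSection(final List<String> lines, final int start) {
    for (int i = start + 1; i < lines.size(); ++i) {
      if (lines.get(i).trim().equals("end")) {
        return i + 1;
      }
    }
    throw new IllegalArgumentException(
      "Section starting at line " + (start + 1) + " has no terminating end");
  }
}
```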

@io7m io7m added this to the 0.11.0 milestone Jun 10, 2017

io7m added a commit that referenced this issue Jun 11, 2017

Start redesigning formats for better versioning
This redesigns the text format such that:

Assuming that a parser that supports major version m is receiving a
file of version m.n for any n:

  1. For header parsing, ignore and log any unknown header commands
     as warnings.
  2. For body parsing, ignore and log any unknown body section commands
     as warnings, and suppress the printing of any warnings until the
     next recognized body section command.

Sections must be ended with an explicit end. This drastically
simplifies parsing as a whole, and allows skipping entire sections
when those sections are unrecognized.

Affects #41

Owner

io7m commented Jun 11, 2017

Additionally, the declaration order of attribute data in binary files could/should be eliminated.

#46

io7m added a commit that referenced this issue Jun 14, 2017

Continue redesigning formats for better versioning
This redesigns the binary format to use a sectioned or chunked format
similar to PNG. Sections begin with a size and a magic number. This
ensures that unrecognized data can be skipped for forwards and
backwards compatibility. Sections are 16-byte aligned: 32 or 64 byte
alignment is too wasteful.

This also changes schema and metadata identifiers to use reverse-DNS
64-character strings.

Affects #41
Fix #46
Fix #45
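
For reference, the arithmetic behind 16-byte section alignment can be sketched as follows. The class and method names are hypothetical, and whether the padding is counted in the section's size field is not specified in the commit message.

```java
// Hypothetical sketch; SectionAlignment is an illustrative name.
final class SectionAlignment {
  private SectionAlignment() { }

  static final long ALIGNMENT = 16L;

  // Number of padding octets needed to bring size up to a multiple of 16.
  static long paddingFor(final long size) {
    if (size < 0) {
      throw new IllegalArgumentException("Size must be non-negative");
    }
    final long remainder = size % ALIGNMENT;
    return remainder == 0 ? 0 : ALIGNMENT - remainder;
  }

  // The size rounded up to the next multiple of 16.
  static long alignedSize(final long size) {
    return Math.addExact(size, paddingFor(size));
  }
}
```

The worst case wastes 15 octets per section, compared with 31 or 63 for 32 or 64 byte alignment.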

io7m added a commit that referenced this issue Jun 14, 2017

Change the syntax of metadata commands
This treats metadata commands in the text encoding as separate
sections, matching the form they take in the binary encoding.
It also fixes a padding bug in the serializer for metadata in the
binary encoding.

Affects #41

@io7m io7m closed this in 4aadbab Jun 14, 2017

io7m added a commit that referenced this issue Jun 14, 2017

Rename vertices command and start specifying
This provides a more complete specification of the new text format,
and adjusts the name of the vertices command to allow for better
forward compatibility.

Affects #41

io7m added a commit that referenced this issue Jun 15, 2017

Merge branch 'release/0.11.0'
Release: com.io7m.smfj 0.11.0
Code new: Implement version probing. (tickets: #42)
Code new: Add convenient SMFFilterCommandChecks class for developing filters.
Code change: Move SMFSchemaValidator into the API package as it defines the semantics of validation.
Code change: Make parsers use URIs instead of Paths for diagnostic messages. (tickets: #43)
Code fix: Fail in the correct way when failing to parse files during mesh processing. (tickets: #44)
Code change: Complete redesign of formats for forward and backwards compatibility. (tickets: #46, #45, #41)
Code change: Rename the "formats" command to "list-formats".

io7m added a commit that referenced this issue Jun 15, 2017

Merge tag 'com.io7m.smf-0.11.0' into develop
Release: com.io7m.smfj 0.11.0

Code new: Implement version probing. (tickets: #42)
Code new: Add convenient SMFFilterCommandChecks class for developing filters.
Code change: Move SMFSchemaValidator into the API package as it defines the semantics of validation.
Code change: Make parsers use URIs instead of Paths for diagnostic messages. (tickets: #43)
Code fix: Fail in the correct way when failing to parse files during mesh processing. (tickets: #44)
Code change: Complete redesign of formats for forward and backwards compatibility. (tickets: #46, #45, #41)
Code change: Rename the "formats" command to "list-formats".