
WARC 1.1: Introduce record-id BNF grammar rule for consistency with examples #24

Closed. Wants to merge 1 commit into base: gh-pages from

4 participants
ato (Member) commented Sep 17, 2015

In the examples and in all popular implementations, URIs in the WARC-Target-URI and WARC-Profile fields are not surrounded by "<" and ">" characters. This change makes the grammar consistent with practice by removing "<" and ">" from the basic `uri` rule and introducing a new `record-id` rule for the fields WARC-Record-ID, WARC-Concurrent-To, WARC-Refers-To, WARC-Warcinfo-ID and WARC-Segment-Origin-ID.

Fixes #23
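To illustrate the split (this is not part of the PR), here is a minimal Python sketch of how a writer might serialize URI-valued fields under the proposed grammar. It assumes the new `record-id` rule is `"<" uri ">"`, which the description implies but does not spell out. `format_uri_field` and `RECORD_ID_FIELDS` are hypothetical names, and field names are matched case-sensitively for simplicity.

```python
# Hypothetical helper: serializes URI-valued WARC header fields under the
# grammar proposed in this PR. Field names are matched case-sensitively
# for simplicity.
RECORD_ID_FIELDS = frozenset({
    "WARC-Record-ID",
    "WARC-Concurrent-To",
    "WARC-Refers-To",
    "WARC-Warcinfo-ID",
    "WARC-Segment-Origin-ID",
})


def format_uri_field(name: str, uri: str) -> str:
    """Return a header line (without the trailing CRLF) for a URI-valued field."""
    if name in RECORD_ID_FIELDS:
        # Assumed record-id form: "<" uri ">"
        return f"{name}: <{uri}>"
    # Plain uri rule (e.g. WARC-Target-URI, WARC-Profile): no angle brackets
    return f"{name}: {uri}"


print(format_uri_field("WARC-Record-ID", "urn:uuid:d7ae5c10-e6b3-4d27-967d-34780c58ba39"))
# WARC-Record-ID: <urn:uuid:d7ae5c10-e6b3-4d27-967d-34780c58ba39>
print(format_uri_field("WARC-Target-URI", "http://example.com/"))
# WARC-Target-URI: http://example.com/
```

The UUID and `example.com` values are illustrative only.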

Add record-id BNF grammar rule for consistency with examples
kris-sigur (Member) commented Sep 17, 2015

Makes sense to me.

I wonder if it is appropriate to include some kind of "errata" as well to address how this was mishandled in the previous standard?

anjackson (Member) commented Sep 17, 2015

I added a Document History section with this kind of thing in mind, but maybe a dedicated Errata bit would be better?

https://github.com/iipc/warc-specifications/blob/gh-pages/specifications/warc-format/warc-1.1/index.md#document-history

ato (Member) commented Sep 18, 2015

My experience in this area is very limited, but in most of the standards I have read, the errata are published as a separate document associated with the version containing the error, e.g. #25.

The revisions I've seen note changes that raise compatibility concerns either in a "Changes since 1.0" section or inline where the relevant item is discussed. For example:

In version 1.0 of the WARC standard, the uri grammar rule was defined incorrectly with respect to the examples in the specification and to common implementations. For compatibility, implementations may choose to accept, but should never emit, URIs surrounded by '<' and '>' in the WARC-Target-URI and WARC-Profile fields.
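A minimal Python sketch of the lenient-reader, strict-writer behavior suggested in that note (illustrative only; `read_uri_field`, `write_uri_field` and `LENIENT_URI_FIELDS` are hypothetical names, not part of any WARC library). Legacy brackets are stripped only for WARC-Target-URI and WARC-Profile; values of the record-id fields are returned unchanged.

```python
# Hypothetical reader/writer pair: tolerate WARC 1.0-style "<...>" values
# on input for the plain uri fields, never produce them on output.
LENIENT_URI_FIELDS = frozenset({"WARC-Target-URI", "WARC-Profile"})


def read_uri_field(name: str, raw_value: str) -> str:
    """Parse a field value, stripping legacy angle brackets where tolerated."""
    value = raw_value.strip()
    if (name in LENIENT_URI_FIELDS
            and len(value) >= 2
            and value.startswith("<")
            and value.endswith(">")):
        # Accept the WARC 1.0 form, but normalize to the bare URI.
        return value[1:-1]
    # Bare URIs, and all other fields, pass through unchanged.
    return value


def write_uri_field(name: str, uri: str) -> str:
    """Emit only the bare form; brackets are never written for these fields."""
    if name not in LENIENT_URI_FIELDS:
        raise ValueError(f"{name} is not a plain uri field")
    return f"{name}: {uri}"
```

Reading `" <http://example.com/> "` for WARC-Target-URI yields `http://example.com/`, and writing that back produces `WARC-Target-URI: http://example.com/`, so a WARC 1.0-style value is normalized on a round trip.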

@anjackson, should I add a document history entry to this pull request? I'd be happy to do so. I wasn't sure whether it would cause problems when merging, or whether the date should be today's date or the date of merging.

ato changed the title from "Introduce record-id BNF grammar rule for consistency with examples" to "WARC 1.1: Introduce record-id BNF grammar rule for consistency with examples" Sep 18, 2015

anjackson modified the milestone: The WARC Format 1.1, Oct 20, 2015

saraaubry commented Nov 17, 2015

The following changes have been integrated in the revised ISO draft during the ISO working group meeting on November 16-17, 2015:

In section 4 (File and record model), change the definition of uri and add a note:
uri = <'URI' per RFC3986>

NOTE: In the WARC 1.0 standard (ISO 28500:2009), uri was defined as "<" <'URI' per RFC3986> ">". This rule has been changed in response to requests from implementers.

saraaubry commented Dec 7, 2017

Included in WARC 1.1
