BNF grammar for WARC-Target-URI and WARC-Profile is inconsistent with examples #23

Closed
ato opened this Issue Sep 17, 2015 · 4 comments

ato commented Sep 17, 2015

Sections 5.2 and 5.16 define the grammar for these fields as:

WARC-Record-ID = "WARC-Record-ID" ":" uri
WARC-Profile   = "WARC-Profile" ":" uri
WARC-Target-URI = "WARC-Target-URI" ":" uri

Section 4 defines uri as:

uri            = "<" <'URI' per RFC3986> ">"

However, none of the examples in sections 12 and 13 include the "<" and ">" characters on the WARC-Target-URI and WARC-Profile fields. All other uri fields (WARC-Record-ID, WARC-Refers-To, etc.) in the examples do include "<" and ">".

WARC/1.0
WARC-Type: revisit
WARC-Target-URI: http://www.archive.org/images/logoc.jpg
WARC-Date: 2007-03-06T00:43:35Z
WARC-Profile: http://netpreserve.org/warc/1.0/server-not-modified
WARC-Record-ID: <urn:uuid:16da6da0-bcdc-49c3-927e-57494593bbbb>
WARC-Refers-To: <urn:uuid:92283950-ef2f-4d72-b224-f54c6ec90bb0>
Content-Type: message/http
Content-Length: 226

Many (all?) implementations have adopted the form shown in the examples rather than strictly following the grammar.
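
For readers who need to handle both forms, here is a minimal sketch of a tolerant reader that accepts a URI-valued field with or without the surrounding "<" and ">". The function names (`parse_header_line`, `strip_angle_brackets`, `parse_uri_field`) are hypothetical and not taken from any existing WARC library; only the field names and values come from the example record above.

```python
import re

# Hypothetical helper: matches "Name: value" on a single header line.
_HEADER_LINE = re.compile(r"^([^:\s]+):[ \t]*(.*)$")


def parse_header_line(line):
    """Split one 'Name: value' header line into (name, value)."""
    match = _HEADER_LINE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a header line: {line!r}")
    return match.group(1), match.group(2).strip()


def strip_angle_brackets(value):
    """Return the bare URI whether or not it is wrapped in '<' and '>'."""
    if len(value) >= 2 and value.startswith("<") and value.endswith(">"):
        return value[1:-1]
    return value


def parse_uri_field(line):
    """Parse a URI-valued header line and normalize away the brackets."""
    name, value = parse_header_line(line)
    return name, strip_angle_brackets(value)
```

With this approach, `WARC-Target-URI: http://www.archive.org/images/logoc.jpg` and a hypothetical grammar-conformant `WARC-Target-URI: <http://www.archive.org/images/logoc.jpg>` would both yield the same bare URI.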

kris-sigur commented Sep 17, 2015

Yes, current tools follow the examples as far as I can tell. The < > form is only used when dealing with UUIDs.

I suspect most tools would choke on something that is formatted as per the BNF.

ato added a commit to ato/warc-specifications that referenced this issue Sep 17, 2015

Add record-id BNF grammar rule for consistency with examples
In the examples and in all popular implementations, URIs in the
WARC-Target-URI and WARC-Profile fields are not surrounded by
"<" and ">" characters.  This change makes the grammar consistent
with practice by removing "<" and ">" from the basic `uri` rule and
introducing a new `record-id` rule for the fields WARC-Record-ID,
WARC-Concurrent-To, WARC-Refers-To, WARC-Warcinfo-ID and
WARC-Segment-Origin-ID.

Fixes iipc#23
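
The split described in this commit message could look roughly like the following on the writing side. This is only a sketch: `RECORD_ID_FIELDS` and `format_uri_field` are hypothetical names, the set contains exactly the fields listed in the commit message, and field names are assumed to be spelled exactly as shown.

```python
# Fields the commit message assigns to the new record-id rule.
RECORD_ID_FIELDS = {
    "WARC-Record-ID",
    "WARC-Concurrent-To",
    "WARC-Refers-To",
    "WARC-Warcinfo-ID",
    "WARC-Segment-Origin-ID",
}


def format_uri_field(name, uri):
    """Serialize a URI-valued field, bracketing only record-id fields."""
    # Accept input that is already bracketed so the output is never
    # double-wrapped.
    if len(uri) >= 2 and uri.startswith("<") and uri.endswith(">"):
        uri = uri[1:-1]
    if name in RECORD_ID_FIELDS:
        return f"{name}: <{uri}>"
    return f"{name}: {uri}"
```

Applied to the record in the issue description, this would reproduce the header lines as they appear in the examples: bare URIs for WARC-Target-URI and WARC-Profile, bracketed ones for WARC-Record-ID and WARC-Refers-To.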

ato added a commit to ato/warc-specifications that referenced this issue Sep 18, 2015

nclarkekb commented Oct 26, 2015

I always thought the point was that target-uri and profile were most likely URLs that could be browsed to, and the rest were more likely UUIDs and hence surrounded by "<" / ">".
Both being URIs was more of a technicality.

saraaubry commented Nov 17, 2015

The following changes have been integrated in the revised ISO draft during the ISO working group meeting on November 16-17, 2015:

In section 4 (file and record model), change the definition of uri and add a note:
uri = <'URI' per RFC3986>

NOTE: in WARC 1.0 standard (ISO 28500:2009), uri was defined as "<" <'URI' per RFC3986> ">". This rule has been changed to meet requests from implementers.
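
To make the difference between the two definitions of `uri` concrete, here is a rough sketch that checks a value against each rule. The names are hypothetical, and the pattern is only a loose stand-in for "URI per RFC3986" (a scheme, a colon, then no whitespace or angle brackets); it is not a full RFC 3986 validator.

```python
import re

# Loose approximation of a URI: scheme, ":", then anything that is not
# whitespace or an angle bracket. Assumption for illustration only.
_BARE_URI = r"[A-Za-z][A-Za-z0-9+.\-]*:[^\s<>]*"

# WARC 1.0 rule quoted in the note: uri = "<" <'URI' per RFC3986> ">"
WARC_1_0_URI = re.compile(rf"<{_BARE_URI}>")
# Revised rule: uri = <'URI' per RFC3986>
REVISED_URI = re.compile(_BARE_URI)


def matches_warc_1_0_uri(value):
    """True if the value fits the bracketed WARC 1.0 uri rule."""
    return WARC_1_0_URI.fullmatch(value) is not None


def matches_revised_uri(value):
    """True if the value fits the revised, unbracketed uri rule."""
    return REVISED_URI.fullmatch(value) is not None
```

Under this approximation, the WARC-Target-URI value from the examples matches only the revised rule, while a bracketed value such as `<urn:uuid:16da6da0-bcdc-49c3-927e-57494593bbbb>` matches only the WARC 1.0 rule.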

saraaubry commented Dec 7, 2017

Included in WARC 1.1
