diff --git a/Chapters/4.Syntax.md b/Chapters/4.Syntax.md
index 2fc1d5b..98240aa 100644
--- a/Chapters/4.Syntax.md
+++ b/Chapters/4.Syntax.md
@@ -39,8 +39,9 @@ the following grammar:
  ::= "visit" "=" ;
  ::= "anchor" "=" ;
  ::= "path" "=" ;
- ::= "lines" "=" ["-" ] ;
- ::= + ;
+ ::= "lines" "=" | "bytes" "=" ;
+ ::= ["-" ] ;
+ ::= + ;
  ::= (* RFC 3987 IRI *)
  ::= (* RFC 3987 absolute path *)
 ```
diff --git a/Chapters/6_Qualified_identifiers.md b/Chapters/6_Qualified_identifiers.md
index 2592903..d74f741 100644
--- a/Chapters/6_Qualified_identifiers.md
+++ b/Chapters/6_Qualified_identifiers.md
@@ -18,6 +18,7 @@ The following *context qualifiers* are available:
 A "line" in the context of a file content refers to a sequence of characters that ends with a line break. This line can contain text, code, or any other form of data. In this specification, the line break is the ASCII LF character.
 
 The "lines" qualifier allows to designate a line range inside a content. The range can be a single line number, or a pair of line numbers separated by the ASCII `-` character.
+Line numbers start from 1, and the range is inclusive, i.e. the fragment includes both the lines numbered as start and end of the range.
 
 For example, [`swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;lines=9-15`](https://archive.softwareheritage.org/swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;lines=9-15)
 designates the function `generate_intput_stream` that is found at lines 9 to 15 of the *content* with core SWHID `swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b`.
@@ -25,6 +26,22 @@ designates the function `generate_intput_stream` that is found at lines 9 to 15
 Notice that the notion of "line number" is not always meaningful: the content may be a binary file,
 or a file that uses non standard line termination character(s).
 
+### 6.1.2 Bytes qualifier
+
+To overcome the limitations of the lines qualifier, the bytes qualifier
+designates a byte range inside a content. The range can be a single byte number, or a pair of byte numbers separated by `-`.
+Byte numbers start from 0, and the range is inclusive, i.e. the fragment includes both the bytes numbered as start and end of the range.
+If the range is a single byte number, it designates the byte at that specific position.
+
+For example, `swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;bytes=154-315`
+designates the same function `generate_intput_stream` as in the example above, but
+does not rely on any convention about line numbers.
+
+### 6.1.3 Bytes and line qualifiers are mutually exclusive
+
+The `bytes` and `lines` qualifiers are mutually exclusive: a valid SWHID MUST NOT contain both qualifiers.
+A conformant implementation MAY accept a SWHID that violates this constraint by ignoring the `lines` qualifier when the `bytes` qualifier is present.
+
 ## 6.2 Context qualifiers
 
 ### 6.2.1 Origin qualifier
@@ -75,7 +92,7 @@ its full state had the SWHID core identifier `swh:1:snp:d7f1b9eb7ccb596c2622c478
 We recommend to equip identifiers meant to be shared with as many
 qualifiers as possible. While qualifiers may be listed in any order, it
 is good practice to present them in the following order:
-`origin`, `visit`, `anchor`, `path` and `lines`. Redundant information
+`origin`, `visit`, `anchor`, `path`, `lines` or `bytes`. Redundant information
 should be omitted: for example, if the *visit* is present, and the
 *path* is relative to the snapshot indicated there, then the *anchor*
 qualifier is superfluous; similarly, if the *path* is empty, it may be