Description
Hi! I've recently been working on a Rust implementation for this, and found a few corner cases where the syntax from the README didn't match up with the examples:
The most straightforward one is probably duplicate-despite-quotes.hrx, part of example/invalid/duplicates.hrx: it looks like quotes are no longer special-cased (the syntax says nothing about them, #1)?
The next one I ran into was duplicate-files.hrx from that same file: according to the spec, that archive should (a) be valid and (b) contain "<======> file\n" as file contents.
I think so because of the following: contents is defined as "any sequence of characters that does not include U+000A LINE FEED followed immediately by boundary", and file is defined as boundary " "+ path newline body? (so the body is optional).
Now, given a buffer containing
<======> file
A      BCD  EF
<======> file
We can see that the span from A to B matches boundary, C matches the spaces, the span from D to E matches path, and F matches newline. What is left is to match the optional body against the remaining input, which consists of the following:
<======> file
Note how this chunk doesn't start with U+000A LINE FEED, despite the line starting with boundary, so it never contains a LINE FEED followed immediately by boundary. This means that the file contents continue until EOF.
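To make that reading concrete, here is a minimal Rust sketch of how contents could be delimited if one goes by the quoted definition alone. The function name literal_contents is made up for illustration, and it assumes the caller has already consumed the boundary " "+ path newline header; it is not taken from any existing HRX implementation.

```rust
/// Hypothetical helper: returns the `contents` span under a literal reading
/// of the grammar, i.e. the longest prefix of `rest` that does not include
/// a LINE FEED followed immediately by `boundary`.
///
/// Assumption: `rest` starts right after the newline that ends the
/// `boundary " "+ path newline` header.
fn literal_contents<'a>(rest: &'a str, boundary: &str) -> &'a str {
    // The only terminator named by the quoted definition is LF + boundary.
    let terminator = format!("\n{}", boundary);
    match rest.find(terminator.as_str()) {
        // Stop just before the LF that precedes the next boundary.
        Some(end) => &rest[..end],
        // No LF + boundary anywhere: contents run to the end of the input.
        None => rest,
    }
}
```

For the buffer above, passing the text after the first header line together with the boundary <======> returns the whole remainder, "<======> file\n", because that remainder contains no LINE FEED that is followed by the boundary.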
The third mismatch affects example/empty-file.hrx. Using the same symbols as before, we get (after the first comment)
<===> file1
A   BCD   EF
<===>
So is this one.
<===> file2
The contents of file1 therefore run until the first LF+boundary sequence, which is at the line declaring file2. My parser returns {file1: {cmt: "This file is empty.", ctnt: "<===>\nSo is this one."}, file2: {cmt: null, ctnt: ""}}, which I feel is correct, going solely by the syntax?
My hunch is that these weren't noticed earlier because of splitting parsers (e.g. in hrx.js and hrx.py), which probably handle these examples as expected.
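For contrast, here is a rough Rust sketch of what I mean by a splitting parser. It is purely hypothetical (I have not copied it from hrx.js or hrx.py, and the name split_entries is mine): it treats every line that starts with the boundary as the start of a new entry, regardless of what the grammar says about the preceding LINE FEED.

```rust
/// Hypothetical line-splitting reader, NOT the actual hrx.js or hrx.py code.
/// Every line that starts with `boundary` opens a new entry; the lines up to
/// the next such line become that entry's text.
/// Returns (header line, text) pairs. Lines before the first boundary are
/// ignored in this sketch.
fn split_entries(input: &str, boundary: &str) -> Vec<(String, String)> {
    let mut entries: Vec<(String, Vec<&str>)> = Vec::new();
    for line in input.lines() {
        if line.starts_with(boundary) {
            // A boundary at the start of any line begins a new entry.
            entries.push((line.to_string(), Vec::new()));
        } else if let Some((_, body)) = entries.last_mut() {
            // Any other line belongs to the most recent entry.
            body.push(line);
        }
    }
    entries
        .into_iter()
        .map(|(header, body)| (header, body.join("\n")))
        .collect()
}
```

With this approach the duplicate-files.hrx buffer yields two entries with the same header line, so it looks like a duplicate. The empty-file.hrx input (as I reconstruct it from the snippet and parser output above) yields four entries, with file1 empty and "So is this one." attached to a separate comment entry, which is what the examples expect but not what the literal grammar gives.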
I'd be more than happy to submit a PR addressing these issues, if deemed valid :)