Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Examples vs. syntax spec inconsistencies #2

Closed
nabijaczleweli opened this issue Feb 27, 2019 · 2 comments
Closed

Examples vs. syntax spec inconsistencies #2

nabijaczleweli opened this issue Feb 27, 2019 · 2 comments

Comments

@nabijaczleweli
Copy link

Hi! I've recently been working on a Rust implementation for this, and found a few corner cases where the syntax from the README didn't match up with the examples:

The straightforwardest one is probably the duplicate-despite-quotes.hrx part of example/invalid/duplicates.hrx – it looks like (syntax says naught, #1) quotes are no longer specialcased?

The next one I ran into was duplicate-files.hrx from that same file – that archive should, according to the spec, (a) be valid and (b) contain <======> file\n.
I think so due to the following: contents is defined as "any sequence of characters that does not include U+000A LINE FEED followed immediately by boundary", and file as boundary " "+ path newline body?.
Now, given a buffer containing

<======> file
A      BCD  EF
<======> file

We can see, that the AB span matches boundary, C – the spaces, DE – path, and F – newline. What is left? To match the optional body, which consists of the following:

<======> file

Note, how this chunk doesn't start with U+000A LINE FEED, despite the line starting with boundary. This means, that the file contents continue until EOF.

The third mismatched example plagues example/empty-file.hrx. Assuming the same symbols as before, we get (after the first comment)

<===> file1
A   BCD   EF
<===>
So is this one.
<===> file2

thereby hitting the first LF+boundary sequence on the line declaring file2 (my parser returns {file1: {cmt: "This file is empty.", ctnt: "<===>\nSo is this one."}, file2: { cmt: null, ctnt: "" }}, which I feel is correct, going solely by the syntax?).

My hunch as to why these weren't noticed earlier is due to the usage of splitting parsers (e.g. in hrx.js and hrx.py), which probably handle these examples as expected.

I'd be more than happy to submit a PR addressing these issues, if deemed valid :)

@rebeccajae
Copy link
Contributor

I just want to be on the same page here with those other two parsers, just so I can catch any significant spec-violating they're doing. I tried to make them to be a bit more tolerant of mistakes.

So if I'm understanding correctly, this tends to show up on empty files. They're looking for a LF then a boundary to end the file, but there's no LF, there's just a boundary.

body is optional, however determining whether body is file contents or a continuation of the archive (in the case of empty files)

I feel that the spec should cover something such that a content block shouldn't begin with a boundary of the current size the parser is processing, which is probably why my parsers didn't catch it.

@nex3
Copy link
Collaborator

nex3 commented Feb 27, 2019

Good catches! They all seem right, and @rebeccajae is correct that the spec should forbid a contents production that starts with a boundary. I'll put out a fix.

@nex3 nex3 closed this as completed in #3 Feb 27, 2019
nabijaczleweli added a commit to nabijaczleweli/hrx.rs that referenced this issue Mar 2, 2019
nabijaczleweli added a commit to nabijaczleweli/hrx.rs that referenced this issue Mar 2, 2019
…th source material. Uncomment empty-file example

Ref: google/hrx#2
Ref: google/hrx#3
Ref: #1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants