File format specification #17

Draft
wants to merge 1 commit into
base: master
from

2 participants
@scy
Owner

commented Jul 9, 2019

Starting to work on #8. WIP.

@scy scy added this to the 1.0 milestone Jul 9, 2019

@scy scy self-assigned this Jul 9, 2019

Since whitespace is removed from the end of each line (see below), Windows-style line endings (`\r\n` or `CRLF` or `U+000D` followed by `U+000A`) are implicitly supported as well.
Mac OS 9 style line endings (`CR` only) are not, though.

It is a good habit to end the file with a line terminator, but this is not required.
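As a rough illustration of the excerpt above, here is a minimal Python sketch of splitting on `\n` and then stripping trailing whitespace. The function name `split_lines` is hypothetical, and it assumes that "whitespace" includes `\r`, which the quoted excerpt implies but does not define.

```python
from typing import List


def split_lines(data: str) -> List[str]:
    """Split on LF only, then strip trailing whitespace from each line."""
    lines = data.split("\n")
    # A trailing "\n" leaves one empty string at the end; drop it so that
    # terminated and unterminated files yield the same list of lines.
    if lines and lines[-1] == "":
        lines.pop()
    # rstrip() removes trailing whitespace, including the "\r" of a CRLF pair.
    return [line.rstrip() for line in lines]
```

With this approach `"a\r\nb\r\n"` yields the lines `a` and `b`, while a CR-only file such as `"a\rb"` stays a single line, because the `\r` is not at the end of a line.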

@Zegnat

Zegnat Jul 12, 2019

If a file does not need to end with a line terminator, it means that lines (as defined by the previous paragraph) do not always end in a \n. The last line of the file will end at EOF instead.

Thinking about parsers (and writers alike) having to be simple, is there any reason not to put a strict requirement on the ending \n of a file? It would definitely make appending new lines a whole lot easier.

@scy

scy Jul 12, 2019

Author Owner

> The last line of the file will end at EOF instead.

Not "will" but "can".

> is there any reason not to put a strict requirement on the ending \n of a file?

Yes: some editors don't do it by default. While I agree that it's a good Unix convention to always end a file with a newline, I'm not sure I want to force this requirement on users who may not have a Unix background at all, or who simply forgot to configure their editor accordingly.

I would say that it doesn't complicate a parser significantly to accept a non-newline-terminated line at the end of the file. In fact, I'm pretty sure most higher-level languages do not require a terminating newline in their built-in "read file line by line" implementations.
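For Python at least, this can be checked with a short sketch. The helper name `read_lines` is made up for illustration, and `io.StringIO` stands in for an opened file.

```python
import io


def read_lines(stream):
    """Return the lines of a text stream with trailing whitespace removed."""
    # Iterating a text stream yields each line including its "\n"; the final
    # line is yielded as well, even when it has no terminator.
    return [line.rstrip() for line in stream]


terminated = read_lines(io.StringIO("first\nsecond\n"))
unterminated = read_lines(io.StringIO("first\nsecond"))
```

Both calls produce the same two lines, so the parser needs no special case for a missing final newline.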

But I agree that it makes a writer more complicated, because before appending a line, it has to check whether the file ends in a newline and, if not, append one. It's not horrendously complicated, though: open the file for reading and writing (instead of just for writing), seek to length minus one, read the byte, and if it's not \n, write a \n before appending the new line.
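The steps described above could look roughly like this in Python. This is only a sketch: the function name `append_line` and the UTF-8 encoding are assumptions, and the file is expected to exist already.

```python
import os


def append_line(path: str, line: str) -> None:
    """Append `line` to the file at `path`, adding a missing final newline first."""
    # "r+b" opens an existing file for reading and writing without truncating
    # it; it raises FileNotFoundError if the file does not exist.
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        if size > 0:
            # Look at the last byte of the file.
            f.seek(size - 1)
            if f.read(1) != b"\n":
                f.seek(0, os.SEEK_END)
                f.write(b"\n")
        f.seek(0, os.SEEK_END)
        # Assumption: lines are stored as UTF-8.
        f.write(line.encode("utf-8") + b"\n")
```

An empty file is handled by skipping the check, since there is no last byte to inspect.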

Also, this will only be a problem if the file has been edited with both a tool (or editor) that doesn't enforce the final newline and a tool that thinks there should be one. In other words, if you always use a writer tool that ends each line with \n, you won't have any problems either.

TL;DR: I don't think that saving ~3 lines of code in a writer justifies eating the last line for people with poorly configured editors.

I'd nevertheless add a warning to my Python implementation that will be printed if the last line doesn't end in \n and suggest in the specification that other implementations should do the same. Is that a compromise you could live with?
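Such a warning might be sketched as follows. This is not the actual implementation, which is not shown in this thread; the function name `check_final_newline` and the message text are hypothetical.

```python
import warnings


def check_final_newline(data: bytes) -> bool:
    """Return True if `data` is empty or ends in LF; otherwise warn and return False."""
    if data and not data.endswith(b"\n"):
        # The file is still parsed; the warning only nudges the user.
        warnings.warn("file does not end with a line terminator")
        return False
    return True
```

The parser would keep accepting the file either way, so the last line is never dropped.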

@Zegnat

Zegnat Jul 12, 2019

I think I am fine with all outcomes. Mostly nitpicking at spec language here as I dislike ambiguities.

I guess what I would do is rewrite the definition of a line to make it clear that it can end with either \n or EOF, and then (if you want to go with RFC 2119 terminology) note that a file SHOULD end with a \n (a recommendation, rather than a MUST obligation).
