Normalizing line endings in ixml inputs #192

Closed
ndw opened this issue Jul 25, 2023 · 3 comments · Fixed by #242

ndw commented Jul 25, 2023

In a comment on issue #176, @johnwcowan writes:

I take a radical (and perhaps radically annoying) perspective, namely that the five line-end sequences #D, #A, #D#A, #85, #2028 never mean different things in text (binary data is another matter) and that the ixml "input hopper" should be responsible for blindly normalizing them all to #A. Consequently, rules with #D, #85, or #2028 in them would never match anything (and it is reasonable to provide a warning if one is seen). This would be performed just after encoding conversion and before matching.

This amounts to a claim that different line endings never have any differential semantics. I believe this to be true of all text, not just XML, and assuming it takes the burden of dealing with junk off both the rule author and the consumer of the XML output by an ixml processor, just as it is not necessary to write extra rules to deal with input encodings (which would be equivalent to treating all input as binary).

In short, we should stop distinguishing once and for all between ASR/KSR 33 (CRLF) and ASR 37 (LF) Teletypes. Their day is done.
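To make the proposal concrete, here is a minimal sketch of the normalization step described above, in Python. The names `normalize_line_endings` and `read_input` are hypothetical and are not taken from the ixml specification or from any implementation; the sketch only assumes that normalization happens after the bytes have been decoded and before any matching.

```python
import re

# The five line-end sequences named in the comment: #D#A, #D, #A, #85, #2028.
# The two-character #D#A alternative comes first so that it is replaced
# as a unit rather than as two separate line ends.
_LINE_END = re.compile("\r\n|[\r\n\u0085\u2028]")


def normalize_line_endings(text: str) -> str:
    """Replace every recognized line-end sequence with a single #A."""
    return _LINE_END.sub("\n", text)


def read_input(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical 'input hopper': decode first, then normalize."""
    return normalize_line_endings(raw.decode(encoding))
```

With this in place, a grammar would only ever see #A, which is why a rule mentioning #D, #85, or #2028 could never match.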

The CG discussed this briefly today and decided it warrants further discussion.

Among the possible options are:

  1. Do nothing
  2. Leave line endings unchanged and suggest that implementations should provide an option to normalize line endings
  3. Normalize line endings by default (and suggest that implementations provide a way to disable this functionality)
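As a rough illustration of how the three options differ from a caller's point of view, the sketch below models each one as a function. The function names and the `normalize` parameter are hypothetical and chosen only for this example; the only difference between options 2 and 3 is the default value of the switch.

```python
import re

# Same five line-end sequences as in the quoted proposal.
LINE_END = re.compile("\r\n|[\r\n\u0085\u2028]")


def prepare_input(text: str, normalize: bool) -> str:
    """Return the text the parser would see."""
    return LINE_END.sub("\n", text) if normalize else text


def option_1(text: str) -> str:
    # Do nothing: input is passed through and there is no switch.
    return prepare_input(text, normalize=False)


def option_2(text: str, normalize: bool = False) -> str:
    # Unchanged by default; the user may opt in to normalization.
    return prepare_input(text, normalize)


def option_3(text: str, normalize: bool = True) -> str:
    # Normalized by default; the user may opt out.
    return prepare_input(text, normalize)
```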

ndw commented Oct 23, 2023

Per an action I took to write up a proposal: https://lists.w3.org/Archives/Public/public-ixml/2023Oct/0021.html


spemberton commented Oct 24, 2023 via email

@johnwcowan

The advantage of #A is that it has no other semantics (unless you are talking to a Model 33 TTY) and it is already used in plain text by most systems, the exceptions being the Internet and protocol transactions such as HTTP, FTP, SMTP, Gemini, etc.

In any case, this is not a "fourth option"; it is orthogonal to the choice between 2 and 3 (but not 1). As should be obvious by now, I support option 3 with #A.
