Skip to content

Conversation

MatmaRex
Copy link
Contributor

It may be counterintuitive, but using PHP_EOL is usually wrong.

Just find-and-replace all occurrences. This resolves 80 test failures on Windows (out of 410).

Windows defines PHP_EOL to "\r\n" (also known as CRLF), while all other platforms define it to "\n" (a.k.a. LF). Therefore this won't break any code that works on any currently supported platform.

Some of this code may need to handle both CRLF and LF. I will look into it on a case-by-case basis as I work through the failing tests, or if I see issues in practice in an editor.


Why is using PHP_EOL is usually wrong?

When reading input

When reading input and trying to split it in lines, you can't assume the line endings based on the OS. Windows users often work with files using LF line endings, because that's usually how the source code of cross-platform projects is stored, and every text editor on Windows supports them, even the built-in Windows Notepad (since 2018: https://devblogs.microsoft.com/commandline/extended-eol-in-notepad/). Similarly, one could have files with CRLF line endings on Linux, if they were originally created by someone on Windows.

Git on Windows by default enables the core.autocrlf setting to replace LF in source code with CRLF when checking out and back when checking in, but these days it's often recommended to change it, e.g. https://jvns.ca/blog/2024/02/16/popular-git-config-options/#and-more (it used to be a good default around the time of Windows XP). If a tool replaces CRLF with LF, that's not problematic regardless of this setting (lone LF is accepted when checking in), but if it replaces LF with CRLF, that will cause line-ending diffs if that setting is disabled.

By the way, Language Server Protocol requires all of CRLF, lone LF, and lone CR to be treated as line breaks, regardless of platform: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments

When writing output

When writing output consisting of multiple lines, LF line endings will usually be converted to CRLF if needed (e.g. when writing to console, or to a file opened in text mode). If they aren't converted when writing to a file, that's not a problem, because (as explained above) every text editor on Windows supports LF line endings too, and it's handled well by Git.

Using different line endings internally often causes test failures that only occur on Windows, unless you also define multiple variants of expected output.

It may be counterintuitive, but using PHP_EOL is usually wrong.

Just find-and-replace all occurrences. This resolves 80 test failures
on Windows (out of 410).

Windows defines PHP_EOL to "\r\n" (also known as CRLF), while all
other platforms define it to "\n" (a.k.a. LF). Therefore this won't
break any code that works on any currently supported platform.

Some of this code may need to handle both CRLF and LF. I will look
into it on a case-by-case basis as I work through the failing tests,
or if I see issues in practice in an editor.

----

Why is using PHP_EOL is usually wrong?
======================================

When reading input
------------------
When reading input and trying to split it in lines, you can't assume
the line endings based on the OS. Windows users often work with files
using LF line endings, because that's usually how the source code of
cross-platform projects is stored, and every text editor on Windows
supports them, even the built-in Windows Notepad (since 2018:
https://devblogs.microsoft.com/commandline/extended-eol-in-notepad/).
Similarly, one could have files with CRLF line endings on Linux, if
they were originally created by someone on Windows.

Git on Windows by default enables the `core.autocrlf` setting to
replace LF in source code with CRLF when checking out and back when
checking in, but these days it's often recommended to change it, e.g.
https://jvns.ca/blog/2024/02/16/popular-git-config-options/#and-more
(it used to be a good default around the time of Windows XP).
If a tool replaces CRLF with LF, that's not problematic regardless of
this setting (lone LF is accepted when checking in), but if it
replaces LF with CRLF, that will cause line-ending diffs if that
setting is disabled.

By the way, Language Server Protocol requires all of CRLF, lone LF,
and lone CR to be treated as line breaks, regardless of platform:
https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments

When writing output
--------------------
When writing output consisting of multiple lines, LF line endings
will usually be converted to CRLF if needed (e.g. when writing to
console, or to a file opened in text mode). If they aren't converted
when writing to a file, that's not a problem, because (as explained
above) every text editor on Windows supports LF line endings too,
and it's handled well by Git.

Using different line endings internally often causes test failures
that only occur on Windows, unless you also define multiple variants
of expected output.
@dantleech dantleech merged commit c428fcf into phpactor:master Feb 28, 2024
@dantleech
Copy link
Collaborator

thanks for this, some effort has been taken to account for \r|\n|\r\n but there are for sure more places that don't.

@MatmaRex MatmaRex deleted the windows-eol branch February 29, 2024 09:32
MatmaRex added a commit to MatmaRex/test-utils that referenced this pull request Mar 1, 2024
dantleech pushed a commit to phpactor/test-utils that referenced this pull request Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants