os: output CR LF for \n on Windows #28822

Open
jftuga opened this Issue Nov 16, 2018 · 47 comments

@jftuga

jftuga commented Nov 16, 2018

What version of Go are you using (go version)?

$ go version
go version go1.11.1 windows/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\john\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\john\go
set GOPROXY=
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\john\AppData\Local\Temp\go-build222202374=/tmp/go-build -gno-record-gcc-switches

What did you do?

package main

import "fmt"

func main() {
	fmt.Println("test")
}

What did you expect to see?

C:\>test.exe | od -tx1
0000000 74 65 73 74 0d 0a
(notice the Carriage-Return character = 0x0d)

According to https://en.wikipedia.org/wiki/Newline#Representation

A newline under Windows is represented by CR LF, which is 0x0d 0x0a

Please consider adding 0x0d to fmt.Println, Printf, etc.

What did you see instead?

C:\>test.exe | od -tx1
0000000 74 65 73 74 0a
(Carriage Return is missing, but Linefeed is there)
@ianlancetaylor

Contributor

ianlancetaylor commented Nov 16, 2018

People seem to have managed so far without this change on Windows. Changing this now would almost certainly break people.

@ianlancetaylor ianlancetaylor changed the title from The \n in fmt.Println and Printf does not output CR LF under Windows to os: output CR LF for \n on Windows Nov 16, 2018

@jftuga

jftuga commented Nov 16, 2018

Can this be changed for Go 2.0?

The longer you wait to make this change, the worse it will be.

@davecheney

Contributor

davecheney commented Nov 16, 2018

The longer you wait to make this change, the worse it will be.

It's been nine years. I don't think the world is going to end.

@as

Contributor

as commented Nov 16, 2018

And what will "\r\n" do?

@jftuga

jftuga commented Nov 16, 2018

It will be useful when redirecting a command line program's output to a file.

For example:

C:\> dir > files.txt
C:\> myGoPgm.exe > output.txt

The dir command as well as any other command-line program that has ever run under Windows will use \r\n for new lines. However, myGoPgm that uses fmt.Println will only output \n for newlines. Since the expected behavior for Windows command-line programs is \r\n, it would be nice if Go programs did the same thing under Windows as well, but still use just \n for Linux and MacOS.

See also: https://en.wikipedia.org/wiki/Newline#Representation

@ianlancetaylor

Contributor

ianlancetaylor commented Nov 16, 2018

The problem is that sometimes people really do want just \n. It would be a disaster to always turn \n into \r\n; that would break all code that writes a binary file. So it would be necessary to do something like what C does on Windows, and add a O_BINARY flag to pass to os.OpenFile. Since this means that code has to change regardless, much better to require people who care to add an io.Writer that does the transformation for them, such as https://godoc.org/github.com/andybalholm/crlf.
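A minimal sketch of such an opt-in writer follows. It is not the code of the linked package; `crlfWriter` and `toCRLF` are hypothetical names used only for illustration. The translation is deliberately naive, so input that already contains \r\n comes out as \r\r\n.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// crlfWriter is a hypothetical wrapper (not part of the standard library)
// that rewrites every "\n" as "\r\n" before passing data to w.
type crlfWriter struct {
	w io.Writer
}

// Write translates p and forwards it. It reports len(p) on success so that
// callers such as fmt.Fprintln do not see a short write.
func (c crlfWriter) Write(p []byte) (int, error) {
	out := bytes.ReplaceAll(p, []byte("\n"), []byte("\r\n"))
	if _, err := c.w.Write(out); err != nil {
		return 0, err
	}
	return len(p), nil
}

// toCRLF runs s through a crlfWriter and returns what was written.
func toCRLF(s string) string {
	var buf bytes.Buffer
	fmt.Fprint(crlfWriter{&buf}, s)
	return buf.String()
}

func main() {
	// Opt in explicitly: only this writer translates newlines.
	out := crlfWriter{os.Stdout}
	fmt.Fprintln(out, "test") // writes 74 65 73 74 0d 0a
	fmt.Printf("%q\n", toCRLF("a\nb\n"))
}
```

Because the wrapper is applied by the caller, code that writes binary data through a plain os.Stdout or os.File is unaffected.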

@mattn

Member

mattn commented Nov 16, 2018

This is my individual opinion. I have ported many UNIX applications to Windows so far. In many cases the tests failed because of the difference in line endings, and people were frustrated after porting to Windows too. After I met Go, where the line ending is "\n", I feel this problem has become rarer, and I noticed that the portability of Go is very good. Using Go, we were able to port UNIX applications to Windows with just small changes. Recently, Notepad on Windows 10 gained support for opening files that have UNIX line endings. I feel that the time will come for Windows to abandon "\r\n", in the same way that the Mac abandoned "\r".

https://blogs.msdn.microsoft.com/commandline/2018/05/08/extended-eol-in-notepad/

I use bufio.Scanner when reading the output of a command whose line ending is not known.
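As a small illustration of that approach: the default split function of bufio.Scanner, bufio.ScanLines, removes the trailing "\n" and one optional "\r" before it, so LF and CRLF input produce the same lines. `readLines` is a hypothetical helper name, and the sketch skips the `sc.Err()` check that real code should perform.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readLines splits s into lines with bufio.Scanner. The default split
// function, bufio.ScanLines, drops the trailing "\n" and one optional "\r"
// before it, so LF and CRLF input yield the same lines.
func readLines(s string) []string {
	var lines []string
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines
}

func main() {
	fmt.Printf("%q\n", readLines("one\ntwo\n"))     // LF input
	fmt.Printf("%q\n", readLines("one\r\ntwo\r\n")) // CRLF input, same lines
}
```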

@as

Contributor

as commented Nov 16, 2018

@jftuga I am not asking you what \r\n is. Rather, I am asking you what you expect fmt.Fprintln(fd, "\r\n\r\n\n\r\r\n") to do on a Windows box.

Side note: Windows programs write \r\n, but most of them do not break when given \n on the consumer side. Having used windows cmd.exe for longer than I can tolerate, my personal list of programs that break on \n is very short: the interpreter of batch files, for example, is such a program.

Notepad, as mentioned above, was one application that did not work. However, wordpad (write.exe) does. Native command line utilities work too.

@jftuga

jftuga commented Nov 16, 2018

Ideally, Fprintf("\n") would output \r\n on Windows with the exception being if the \n is immediately preceded by a \r then the complete output would be \r\n and not \r\r\n

But I am really not sure about this. Originally, I was only thinking about Println and Printf and not Fprintln
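The rule described here (add \r before \n unless one is already present) can be sketched on a plain string. `addCR` is a hypothetical helper, not an existing or proposed Go API; the second call applies it to the string from the earlier question.

```go
package main

import (
	"fmt"
	"strings"
)

// addCR is a hypothetical helper implementing the rule described above:
// every "\n" gains a preceding "\r" unless one is already there.
func addCR(s string) string {
	var b strings.Builder
	prev := byte(0)
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c == '\n' && prev != '\r' {
			b.WriteByte('\r')
		}
		b.WriteByte(c)
		prev = c
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", addCR("test\n"))             // "test\r\n"
	fmt.Printf("%q\n", addCR("\r\n\r\n\n\r\r\n")) // "\r\n\r\n\r\n\r\r\n"
}
```

A writer applying this rule would also have to remember the last byte of the previous Write call, since a \r and its \n can arrive in separate writes.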

@forskning

forskning commented Nov 16, 2018

https://nurmi-labs.blogspot.com/2016/11/git.html

Git-2.10.0-32-bit.exe

includes dos2unix.exe and unix2dos.exe

https://nurmi-labs.blogspot.com/2015/11/bcc55.html

MSYS-1.0.11.exe

contains the awk scripts u2d and d2u

https://nurmi-labs.blogspot.com/2015/10/awk.html

link to UNIX awk

https://nurmi-labs.blogspot.com/2017/05/9pvfs.html

links to precompiled watcom-1.9 and -2.0 awk versions

whether the d2u or u2d (above) scripts

work using the UNIX and/or the watcom-1.9 and -2.0 awk versions

I am uninformed on

@ALTree ALTree added this to the Go1.13 milestone Nov 16, 2018

@jftuga

jftuga commented Nov 16, 2018

Can we simply consider the case of fmt.Println for now? If you are using fmt.Fprintf then you have explicit control. Would it be possible for just fmt.Println to use CRLF under Windows?

@forskning

forskning commented Nov 16, 2018

I am used to dealing with BOM, CRLF, executable scripts, and file permissions, after having used different Windows subsystems, win32, Interix, and WSL,

@ianlancetaylor

Contributor

ianlancetaylor commented Nov 16, 2018

@jftuga Your suggestion is unfortunately not as simple as it sounds. fmt.Println(...) is exactly the same as fmt.Fprintln(os.Stdout, ...). os.Stdout is a global variable that programs can and do change. So we can't attribute special behavior to fmt.Println; we can only attribute special behavior to the default value of os.Stdout. That leads us straight back to O_BINARY.

If we adopted #13473 (for Go 2) then it would be at least feasible to wrap the default values of os.Stdout and os.Stderr with an io.Writer that turned \n into \r\n. I do not know whether that would be a good idea in the long run.
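The point that fmt.Println simply writes to whatever os.Stdout currently holds can be seen in a short sketch. `captureStdout` and `printTest` are hypothetical helpers for illustration, and the example assumes Go 1.16 or later for io.ReadAll.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// captureStdout temporarily points os.Stdout at a pipe, runs f, and returns
// whatever f printed. It is a hypothetical helper for illustration only.
func captureStdout(f func()) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	saved := os.Stdout
	os.Stdout = w // fmt.Println now writes to the pipe
	f()
	os.Stdout = saved
	w.Close()
	data, err := io.ReadAll(r)
	r.Close()
	return string(data), err
}

// printTest prints through fmt.Println, which uses the current os.Stdout.
func printTest() { fmt.Println("test") }

func main() {
	got, err := captureStdout(printTest)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("captured %q\n", got) // captured "test\n"
}
```

Since any program can reassign os.Stdout this way, newline translation could only be tied to the default file behind os.Stdout, not to fmt.Println itself.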

@forskning

forskning commented Nov 16, 2018

From Emacs' GUI frontend one can change the end-of-line style on an open file via the second button from the left on the modeline.

Vim is now 27 years old (according to Bram Moolenaar).

Help poor children in Uganda!

I presume the GUI frontend, gvim.exe, likewise as the Linux version, includes in the menu's Edit dropdown, an entry for File Settings, wherein one navigates to File Format, which opens a popup from where one can 'Select format for writing the file', i.e., Unix, Dos, and Mac, respectively LF, CRLF, and CR.

https://nurmi-labs.blogspot.com/2015/10/regexp.html

9pm051031.zip

BOMs if present will show up on the first line of flat files running 'cat filename' using 9term.exe, and can be edited out using sam.exe, using the backspace key.

See also: https://github.com/nurmi-labs/9pm

@jftuga

jftuga commented Nov 16, 2018

@ianlancetaylor

If we adopted #13473 (for Go 2) then it would be at least feasible to wrap the default values of os.Stdout and os.Stderr with an io.Writer that turned \n into \r\n. I do not know whether that would be a good idea in the long run.

I like this idea for Go 2. What is the best way to determine if this is a good long-term idea or not?

@as

Contributor

as commented Nov 16, 2018

What programs (other than notepad and the batch interpreter) have trouble with pure line feeds on the consumer side? I feel like this is a step back for Go on Windows. It would break many Go ports of unix tools that have deterministic output across operating systems.

@ianlancetaylor

Contributor

ianlancetaylor commented Nov 16, 2018

To be honest, I think it's unlikely that we would make this change in Go 2. The argument in favor would have to demonstrate workflows that do not work today but would start working if we made the change. It's not enough to show that Go programs do not work like Windows programs; we need to know what is actually broken by that fact.

@mvdan

Member

mvdan commented Nov 16, 2018

Here's an argument against modifying the behavior of os.Stdout to handle newlines; what if a program supports writing binary data to os.Stdout? I can think of a handful of programs that can print raw data to a file or to standard output. I imagine those would silently break, if the binary data happens to contain a newline.
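A rough sketch of how that silent breakage could look, assuming a hypothetical `translate` step that stands in for a newline-translating os.Stdout: the integer 10 encodes to a byte that happens to equal \n, so the translated stream no longer decodes to the same value.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// translate mimics a text-mode stdout: every 0x0a byte becomes 0x0d 0x0a.
// It is a stand-in for illustration, not real os.Stdout behavior.
func translate(p []byte) []byte {
	return bytes.ReplaceAll(p, []byte{'\n'}, []byte{'\r', '\n'})
}

// encode10 returns the 4-byte little-endian encoding of the integer 10.
func encode10() []byte {
	buf := make([]byte, 4)
	binary.LittleEndian.PutUint32(buf, 10)
	return buf
}

func main() {
	raw := encode10()
	out := translate(raw)
	fmt.Printf("% x\n", raw) // 0a 00 00 00
	fmt.Printf("% x\n", out) // 0d 0a 00 00 00
	// Decoding the first four translated bytes no longer yields 10.
	fmt.Println(binary.LittleEndian.Uint32(out[:4])) // 2573
}
```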

@forskning

forskning commented Nov 17, 2018

https://github.com/golang/protobuf
Go support for Google's protocol buffers

Would the above be relevant in regards to the previous comment?

@as

Contributor

as commented Nov 17, 2018

encoding/gob moreso

@davecheney

Contributor

davecheney commented Nov 17, 2018

@alexbrainman

Member

alexbrainman commented Nov 18, 2018

If anyone cares, I support @ianlancetaylor

People seem to have managed so far without this change on Windows. Changing this now would almost certainly break people.

and @mvdan argument

what if a program supports writing binary data to os.Stdout

We need to see the problem with what happens now, before we consider changing it.

We should not just follow what https://en.wikipedia.org/wiki/Newline#Representation says.

I also looked to see what gcc does to

printf("a\n");

(it outputs "a\r\n")

and to

printf("a\r\n");

(it outputs "a\r\r\n").

Still that does not sway me towards doing the same in Go.

Alex

@forskning

forskning commented Nov 18, 2018

Another argument against changing the code might be maintaining consistency across Windows environment subsystems, win32/win64 and WSL.

While the cygnus port (a posix layer above win32/win64) isn't a supported platform for golang, cygwin occasionally shows up in golang issues and/or threads, however I would point out that, IIRC, I.L.T. wrote in summer of 2012 that "...cygwin binaries are considerably slower than Windows binaries".

Google Code Archive
https://code.google.com/archive/p/golang-on-cygwin/source/default/source

I'd also like to see this issue closed.

@creker

creker commented Nov 18, 2018

From my experience working with systems with different line endings, any solution that tries to be clever and produce different line endings depending on some logic creates more problems than it solves. Every one of them has different logic, and every piece of code that accepts line endings has different logic. In the end you almost always end up with some glue code that ties incompatible systems together.

I vote we leave Go as is. Being opinionated is good in cases like that. At least we have a consistent ecosystem that works the same everywhere. If some ancient piece of code breaks because of that, it's better to fix it or write some glue code. More reasons to write your programs properly from the beginning.

@kardianos

Contributor

kardianos commented Nov 18, 2018

As a long time Go, Windows, and Linux user, I would view this as disastrous. I do not like programs that try to be smart about line endings. This is especially disastrous when you try to have some type of reproducible output.

This type of logic inevitably must try to distinguish between "text" and "binary" modes. I remember old FTP programs where if you downloaded a program or image in text mode it would corrupt it due to this feature.

I believe this issue should be closed as won't fix.

@robaho

robaho commented Nov 19, 2018

You may want to review how Java has handled this from its inception. Just define a platform constant LineSeparator; then you can create TextReader/Writer types that perform bidirectional encoding as needed. The developer can wrap the standard os.Stdout as needed. This seems to be the only easy way to avoid breaking backward compatibility.
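A rough sketch of the platform-constant half of this idea, under the assumption that the separator is derived from the GOOS value. `lineSeparatorFor` is a hypothetical name; Go's standard library defines no such constant or function.

```go
package main

import (
	"fmt"
	"runtime"
)

// lineSeparatorFor returns the conventional line ending for a GOOS value.
// Hypothetical name: Go's standard library defines no such constant.
func lineSeparatorFor(goos string) string {
	if goos == "windows" {
		return "\r\n"
	}
	return "\n"
}

func main() {
	sep := lineSeparatorFor(runtime.GOOS)
	// The caller chooses the platform ending explicitly instead of "\n".
	fmt.Printf("test%s", sep)
}
```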

@forskning

forskning commented Nov 20, 2018

That would be predating D. Cutler's work on the Windows NT kernel; thereafter in MS promotional literature had been emphasised the layering of environment subsystems on top of the Windows kernel.

@forskning

forskning commented Nov 21, 2018

Possibly if the below is applicable some PowerShell content on a wiki page?

I'm not familiar with the utilisation of powershell.exe, but presuming it outputs a default end-of-line style + BOM for unicode, then code could be written so as to result in other than such presumed defaults.

Separately, as R.E. mentioned Java, maybe the Go community would be interested in J. Garcia's jPowerShell (a GitHub repository).

@forskning

forskning commented Nov 23, 2018

from powershell.exe running test.exe without the pipe and redirecting to file created a UTF-16 file

About Redirection

possibly the od from the pipe is from MSYS2 which I did not have installed on a Windows 10 machine

@forskning

forskning commented Nov 23, 2018

copying the output file to a Windows XP machine with Git-2.10.0-32-bit.exe installed on it

maybe the result of the 'file' utility isn't precise, 0d 0a indicates it's CRLF

% file ps_out
ps_out: Little-endian UTF-16 Unicode text, with CR line terminators
% cat ps_out | od -tx1
0000000 ff fe 74 00 65 00 73 00 74 00 0d 00 0a 00
0000016
%

@robaho

robaho commented Nov 23, 2018

@forskning not sure what you are trying to say...

@kardianos kardianos closed this Nov 23, 2018

@mvdan

Member

mvdan commented Nov 23, 2018

I'll assume that @kardianos closed this by mistake.

@mvdan mvdan reopened this Nov 23, 2018

@forskning

forskning commented Nov 23, 2018

Consider exploring on a Wiki page using PowerShell to output alternate end-of-line style plus BOM and minus BOM, for the win32/win64 environment subsystem, if that really isn't satisfactory, then revisit the current issue.

@jftuga

jftuga commented Nov 23, 2018

OP here. I just want to thank everyone for considering this change. I know it would/could be a big change. Would it be possible to make this a user-defined setting within the fmt package? The setting could be enabled by a function that changes the output of \n. By default, the newline behavior is the same. However, if you were to call this function, then \n would output \r\n on Windows platforms when one of the fmt.Print functions is invoked. What do you think?

fmt.useNativeNewline(true)

@forskning

forskning commented Nov 24, 2018

the following comment addressed a question, which has remained unanswered

#28822 (comment)

@jftuga

jftuga commented Nov 24, 2018

@forskning: I will work on a list of programs and let you know my results tomorrow.

@mattn

Member

mattn commented Nov 24, 2018

I would prefer adding a Writer implementation that writes CR LF whenever it is given LF, instead of adding fmt.useNativeNewline.

ex: https://github.com/mattn/go-textwriter

@forskning

forskning commented Nov 24, 2018

I've sent an enquiry to someone from the 'PowerShell Team' (a GitHub organization) to ask if they would be inclined to chip in.

@ianlancetaylor

Contributor

ianlancetaylor commented Nov 24, 2018

We are definitely not going to add a setting to the fmt package. Sorry.

@jftuga

jftuga commented Nov 25, 2018

I created a text file that contained just LF and compared it to a file that had CRLF endings.

I opened the LF file. MS Word 2016 and Excel read the file OK, but when saving it both programs converted LF to CRLF. I don't see a problem with this; it is just an observation. VS Code, Notepad++, Sublime Text, and Visual Studio 2017 had no problems opening or saving the file (as expected). My version of the Windows 10 Notepad did not work properly, but as previously mentioned the very latest version of Windows 10 fixes this.

Batch files, using cmd.exe, definitely have a problem with just LF endings. I would see this error when reading the LF file, but not the CRLF file: The syntax of the command is incorrect.

I did some limited testing with PowerShell and it seemed OK with just LF, but I don't know enough about PS to make a definitive statement one way or the other. I only tested Get-Content and [System.IO.File]::ReadLines()

In summary, as long as you are not using cmd.exe, LF seems to be OK, and there may or may not be issues with PowerShell. In my case, I do use cmd and noticed this problem.

@forskning

forskning commented Nov 25, 2018

What about mintty.exe?

@forskning

forskning commented Nov 25, 2018

runemacs.exe

M-x eshell

outputs CRLF

% cat eshell_out | od -tx1
0000000 74 65 73 74 0d 0a
0000006
%

@mattn

Member

mattn commented Nov 25, 2018

I want to know about the original problem. Why is LF output not good for you? And how many cases are there where CRLF should take priority over LF? IMHO, the cases that need CRLF are far fewer than those that need LF. At least for Go, fmt.Println must work the same in all environments, since the same code should cross-compile everywhere. If CRLF is needed, the programmer should use CRLF intentionally, I think.

@forskning

forskning commented Nov 25, 2018

Looks like the solution is to learn Elisp or PowerShell coding.

I wouldn't mind if this issue stayed open a few weeks to see if any PowerShell users added comments.

@forskning

forskning commented Nov 27, 2018

#28822 (comment)

If anyone cares, the bug in the 'file' command has been fixed.

file/file@6d90cbf
Avoid over-trimming UCS16 text, and ending up losing the last character.
committed Nov 27, 2018

@jftuga

jftuga commented Dec 6, 2018

This more concisely describes potential issues than what I previously posted:

https://en.wikipedia.org/wiki/Newline#Issues_with_different_newline_formats

@forskning

forskning commented Dec 6, 2018
