Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: buffer not flushed correctly #25631

Closed
theute opened this issue May 29, 2018 · 15 comments
Closed

os: buffer not flushed correctly #25631

theute opened this issue May 29, 2018 · 15 comments

Comments

@theute
Copy link

@theute theute commented May 29, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

I ran the code on multiple versions:
It works on v1.7.5 and v1.8.7
It fails on v1.9, v1.9.4, v1.9.6 and v1.10.2

Does this issue reproduce with the latest release?

Yes (1.10.2)

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/theute/.go"
GORACE=""
GOROOT="/home/theute/Tools/Go/latest"
GOTOOLDIR="/home/theute/Tools/Go/latest/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build923971487=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

I'm writing to the USB device of my linux machine /dev/usb/lp0 to a printer. (I tried on my laptop and a raspberry)
The code is https://play.golang.org/p/7J8srsKSzSf

What did you expect to see?

It should have printed up to "test-11"

What did you see instead?

It stopped printing at "test-8" with no error.
If I compile with v1.7.5 or v1.8.7 then it prints everything

Note: If I add a 5s sleep at the end of the print method then it prints everything. But if I call the print method followed by a call to Sleep it will not help.

@ianlancetaylor ianlancetaylor changed the title Buffer not flushed correctly os: buffer not flushed correctly May 29, 2018
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 29, 2018

You aren't checking the error return of the Close method. Please do so, although I don't know that it will make any difference.

Do you see the same behavior from an identical C program?

@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone May 29, 2018
@bradfitz

This comment has been minimized.

Copy link
Contributor

@bradfitz bradfitz commented May 30, 2018

Please post strace -f output too.

@theute

This comment has been minimized.

Copy link
Author

@theute theute commented May 31, 2018

So here is the code with the error check on Close (there is no error showing up) https://play.golang.org/p/9vEmEpSuXK_b
Here is the "strace -f" when it stops printing at "test-8":
https://pastebin.com/89mjEEvQ

I didn't try in C as I wouldn't remember how to do that ;)

@theute

This comment has been minimized.

Copy link
Author

@theute theute commented May 31, 2018

BTW by writing straight to the file it works (os.File.WriteString()), it's just through the buffer that it eats the end even though the code is executed and I see the iobuf.Buffered() growing as expected.
It sounds like the flushing is killed prematurely.

@kostix

This comment has been minimized.

Copy link

@kostix kostix commented May 31, 2018

May it be due to this change introduced in 1.9 IIUC?

I mean, from the strace output, it looks like the syscall on line 212

epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc42004fd6c <unfinished ...>

is operating on the FD 3, which is the same FD, write(2) output the data into;
so it looks like that FD gone through the poller somehow.

@theute

This comment has been minimized.

Copy link
Author

@theute theute commented May 31, 2018

The behaviour definitely changed from 1.9 so it's very possible

@opennota

This comment has been minimized.

Copy link

@opennota opennota commented May 31, 2018

[pid 26602] write(3, "\33@test-1\ntest-2\ntest-3\ntest-4\nte"..., 81 <unfinished ...>
[pid 26602] <... write resumed> ) = 81

Looks like all the 81 bytes were written.

@bradfitz

This comment has been minimized.

Copy link
Contributor

@bradfitz bradfitz commented May 31, 2018

@ianlancetaylor, thoughts on how the runtime poller for files would affect this?

@kostix

This comment has been minimized.

Copy link

@kostix kostix commented May 31, 2018

I wonder how the strace output looks under Go < 1.9.

May it be that there it would not be written as a single chunk?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 31, 2018

I agree with @opennota : the strace output clearly shows that all 81 bytes were written to the file, and then the file is closed without error.

The effect of the change to use the runtime poller is that the file is put into nonblocking mode which shows up in the strace as fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK|O_LARGEFILE). Since this is an unusual file (/dev/usb/lp0) that could be related. However, that is not consistent with the fact that calling the WriteString method reportedly does work. The fact that the code uses bufio.NewWriter should be irrelevant, and, according to the strace, it is irrelevant. The bytes are definitely being written to the file descriptor, and the write system call is successfully writing all the bytes.

If it weren't for that discrepancy, I would guess that there is a bug in the kernel driver when the descriptor is in non-blocking mode: I would guess that it is incorrectly reporting that all the bytes were written, when only some were. It is also possible that in nonblocking mode the kernel queues the bytes for writing, returns before they are sent to the hardware, and then aborts the queued bytes when the descriptor is closed. However that too is not consistent with different behavior when using bufio.NewWriter.

The simple workaround may be to call the Fd method to put the descriptor back into blocking mode. But I don't know what the real problem is.

@kostix

This comment has been minimized.

Copy link

@kostix kostix commented Jun 1, 2018

@ianlancetaylor, do I understand you correctly that by saying

bug in the kernel driver when the descriptor is in non-blocking mode: <…> it is incorrectly reporting that all the bytes were written, when only some were.

you mean that it had to return EAGAIN or EWOULDBLOCK so that the Go runtime would put that FD to the poller and wait on it?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 1, 2018

I should make clear that I am only speculating, I haven't looked at the code.

And thinking about it a bit more, my speculation can't be right, since all the bytes are written out if the program waits a few seconds. So maybe the problem is that the close system call is aborting any queued bytes, when it should instead wait until they are written. But again I am only speculating. I don't know what is happening. But I do still suspect that there is some bug in the kernel driver.

@kostix

This comment has been minimized.

Copy link

@kostix kostix commented Jun 1, 2018

So, may be the answer is to make sure that the FD is put back into the blocking mode
(via fcntl(2) on Linux) right before calling close(2)?

The reasoning is as follows:
the open(2) manual page for Linux says:

O_NONBLOCK or O_NDELAY

When possible, the file is opened in nonblocking mode.
Neither the open() nor any subsequent operations on the file
descriptor
which is returned will cause the calling process to wait.

(Emphasis mine.) And this suggests close(2) is also non-blocking (though I admit
its manual page does not mention how it supposed to behave in this case).

The close(2) manual of POSIX
mentions O_NONBLOCK only in relation with the STREAMS subsystem and sockets;
its open(2) manual says that the call itself should not block and the behaviour of all
the other operations is "device-specific".


A quick test would be to copy the implementation of SetNonblock() from there and call it right before invoking the CloseFunc on the FD here.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 1, 2018

It seems to me that even if close is non-blocking, it should not abort writing any queued up data to the descriptor. The bytes should continue to flow out even after the descriptor is closed. For example, that is how network sockets work (as adjustable using SO_LINGER).

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 27, 2018

Since the strace shows that all the bytes are written to the kernel, I'm very skeptical that there is anything we can do in Go to fix this. I'm going to close this issue. Please comment if you disagree.

@golang golang locked and limited conversation to collaborators Jun 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.