Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: broken sendfile on SmartOS/Solaris? large files garbled when sent over network #13892

Closed
doublerebel opened this issue Jan 9, 2016 · 17 comments

Comments

Projects
None yet
6 participants
@doublerebel
Copy link

commented Jan 9, 2016

What version of Go are you using (go version)? go version go1.5.1 solaris/amd64
What operating system and processor architecture are you using? SunOS 5.11 joyent_20141030T081701Z i86pc i386 i86pc Solaris, SunOS 5.11 joyent_20151002T091442Z i86pc i386 i86pc Solaris
What did you do? Retrieved a >250K file over http from a golang server
What did you expect to see? The correct contents of the file
What did you see instead? A file with the correct size, but the first ~25% of the file repeated to fill the file.

If I access the file through localhost, the file is served correctly. If I send it over any network interface, the bug appears. Different lengths of the file are repeated to fill the space in each case. All smaller files come through without error.

I attempted to step through using godebug but ran into mailgun/godebug#12, since the issue is clearly somewhere in the standard library. So I have setup a temporary fileserver to demonstrate the issue. text.text.2 contains the beginning of Sherlock Holmes from Project Gutenberg (thanks Norvig).

Control: python -m SimpleHTTPServer
http://165.225.174.156:8000/
http://165.225.174.156:8000/test.txt.2
Serves correct contents of file.

Bug: gohttp (itang/gohttp)
http://165.225.174.156:8080/
http://165.225.174.156:8080/test.txt.2
Serves garbled file, every time.

Localhost: gohttp with a barebones nginx proxy from localhost
http://165.225.174.156/
http://165.225.174.156/test.txt.2
Serves correct contents of file.

I discovered this bug because we just recently have hashicorp/consul building on SmartOS (hashicorp/consul#159), and the Consul UI contains a ~750K minified JS file. Consul uses gorilla/mux and http.FileServer, but I am able to reproduce with the much simpler case of gohttp.

The test suite for the http response methods doesn't have tests from a remote machine, or use large files, so this bug can slip through in most cases unnoticed. Please let me know if I can provide more details. We really need a way to step through and debug the standard libs!

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jan 9, 2016

I suspect the sendfile implementation for Solaris is broken, then.

Can you modify src/net/sendfile_solaris.go and add a return statement on the first line of func sendfile?

func sendFile(c *netFD, r io.Reader) (written int64, err error, handled bool) {
    return

And see if that fixes it? (it would revert to Read+Write)

/cc @4ad

@bradfitz bradfitz added this to the Go1.6Maybe milestone Jan 9, 2016

@bradfitz bradfitz added the OS-Solaris label Jan 9, 2016

@bradfitz bradfitz changed the title net/http: large files are garbled when sent to a network interface on SmartOS/Solaris net: broken sendfile on SmartOS/Solaris? large files garbled when sent over network Jan 9, 2016

@doublerebel

This comment has been minimized.

Copy link
Author

commented Jan 9, 2016

@bradfitz You nailed it. I added a log line and the return statement, and I can now see the file being served correctly. Really appreciate the fast diagnosis.

@4ad

This comment has been minimized.

Copy link
Member

commented Jan 9, 2016

Could you please try with the latest SmartOS image? Thanks.

@doublerebel

This comment has been minimized.

Copy link
Author

commented Jan 9, 2016

@4ad sure thing, tested both versions on the latest base-64 image: SunOS 5.11 joyent_20151002T091442Z i86pc i386 i86pc Solaris

The current version fails as on the earlier SmartOS image, and the patched version suggested by @bradfitz works.

@4ad

This comment has been minimized.

Copy link
Member

commented Jan 10, 2016

Just for confirmation, you tested on a recent SmartOS image, a.i. a recent kernel, not an older SmartOS image with a recent base-64 image?

@4ad 4ad self-assigned this Jan 10, 2016

@doublerebel

This comment has been minimized.

Copy link
Author

commented Jan 10, 2016

@4ad the tests were all run on instances in the Joyent Public Cloud, base-64 is the name of their default basic 64-bit SmartOS image. joyent_20151002T091442Z is SmartOS Base Image v15.3.0. I can donate an instance for testing, if that helps.

EDIT: FWIW, I just rebuilt Consul with net/sendfile patched, and it now too works as expected.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jan 11, 2016

Perhaps we should just disable sendfile on Solaris for Go 1.6 as a safe option. We can re-enable in Go 1.7 once we understand and fix the problem.

@davecheney

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2016

I think that is a good idea

On Tue, 12 Jan 2016, 05:57 Brad Fitzpatrick notifications@github.com
wrote:

Perhaps we should just disable sendfile on Solaris for Go 1.6 as a safe
option. We can re-enable in Go 1.7 once we understand and fix the problem.


Reply to this email directly or view it on GitHub
#13892 (comment).

gopherbot pushed a commit that referenced this issue Jan 11, 2016

net: disable sendfile on Solaris for now
There are reports of corruption. Let's disable it for now (for Go 1.6,
especially) until we can investigate and fix properly.

Update #13892

Change-Id: I557275e5142fe616e8a4f89c00ffafb830eb3b78
Reviewed-on: https://go-review.googlesource.com/18540
Reviewed-by: Dave Cheney <dave@cheney.net>

@bradfitz bradfitz modified the milestones: Go1.7, Go1.6Maybe Jan 13, 2016

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jan 13, 2016

Moved milestone to Go1.7. In Go 1.6, sendfile will just be disabled for Solaris and it will fall back to the path that works.

@bradfitz

This comment has been minimized.

Copy link
Member

commented Mar 31, 2016

Any news here?

@4ad

This comment has been minimized.

Copy link
Member

commented Mar 31, 2016

@binarycrusader

This comment has been minimized.

Copy link
Contributor

commented Apr 1, 2016

I'll look into this.

@binarycrusader

This comment has been minimized.

Copy link
Contributor

commented Apr 1, 2016

Have reproduced on Solaris development build using decompressed Mark.Twain-Tom.Sawyer.txt.bz2 from src/compress/bzip2/testdata.

@binarycrusader

This comment has been minimized.

Copy link
Contributor

commented Apr 1, 2016

It consistently fails after the first 262,144 characters (256KiB), where it starts to the repeat the input all over again.

@binarycrusader

This comment has been minimized.

Copy link
Contributor

commented Apr 8, 2016

So the root cause of this issue is that the implementation of sendfile() for Solaris is not correct. In particular, for sizes beyond the size of the socket buffer, the same data will be sent (data_len / socket_buffer_size) + (data_len % socket_buffer_size) times.

This is only vaguely referenced in the Solaris sendfile(3c) man page. The FreeBSD sendfile(2) man page however makes this quite clear:

When using a socket marked for non-blocking I/O, sendfile() may send
fewer bytes than requested.  In this case, the number of bytes success-
fully written is returned in *sbytes (if specified), and the error    EAGAIN
is returned.

As a result, I believe the implementations for the following platforms may not actually be correct:

  • dragonfly
  • freebsd
  • solaris

I am currently testing to determine whether any other platforms need a similar fix.

@binarycrusader

This comment has been minimized.

Copy link
Contributor

commented Apr 8, 2016

So my new test works on Solaris, but did not fail as expected on DragonFly/FreeBSD, despite the fact that they clearly do not account for the EAGAIN scenario. I would welcome any insight others might have to as to why it did not fail. The only guess I have is that the file is not big enough to trigger the failure on FreeBSD.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 9, 2016

CL https://golang.org/cl/21769 mentions this issue.

@gopherbot gopherbot closed this in 98080a6 Apr 12, 2016

@golang golang locked and limited conversation to collaborators Apr 13, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.