net/http: request body transfer optimization issues (ReadFrom/sendfile) #30377

Closed
vancluever opened this issue Feb 24, 2019 · 11 comments

@vancluever
Contributor

commented Feb 24, 2019

What version of Go are you using (go version)?

$ go version
go version go1.11.5 linux/amd64

Does this issue reproduce with the latest release?

Yes (1.11.5), haven't tried the 1.12 RC, but the implementations of persistConnWriter and transferWriter.writeBody do not appear to have changed, which is where the issue lies.

What operating system and processor architecture are you using (go env)?

$ go env GOOS GOARCH
linux
amd64

What did you do?

Upload a file using PUT in a http.Request, by setting the file as Body:

fh, err := os.Open(fileName)
// ....
req, err := http.NewRequest("PUT", destURL, fh)

(Note that our implementations at HashiCorp usually wrap in go-retryablehttp, which is interface-compatible with the above. This is what the benchmarks below use. The above example is to illustrate the repro using the stdlib.)

This test utility was tested on live systems sending uploads to live services.

What did you expect to see?

An upload speed matching something more representative of the speed of the links the systems were connected to. fasthttp, which was also tested, gave upload speeds of about 50-70 MB/sec on live systems depending on the location of the client machine.

What did you see instead?

Upload speeds ranging from about 8MB/sec (lower latency systems) to as low as sub-1MB/sec (higher-latency systems).

Additional info/root cause

Investigation into the internals of the net/http transport and its behavior compared to fasthttp revealed that the path fasthttp currently takes allowed it to use sendfile to transfer the data instead of write calls. This was discovered by using various tracing tools (strace/bpftrace) during testing.

Further investigation into the net/http transport revealed that its writer is currently wrapped in persistConnWriter, which does not implement io.ReaderFrom. That interface is what an io.Copy to the transport's connection (i.e., TCPConn) needs in order to ultimately fast-path to sendfile. See here. Further to that, the request body is wrapped in a transferBodyReader, and possibly an ioutil.nopCloser, obscuring the reader to the point where TCPConn cannot discern whether or not the underlying reader is an *os.File.

Adding a trivial implementation of ReadFrom to persistConnWriter allows for the pathing to sendfile. It also significantly speeds up uploads when the reader is not an *os.File, or when the OS does not have a sendfile implementation (example: darwin), due to the use of genericReadFrom. Replacing transferBodyReader with a helper that is not an io.Reader, and adding logic within writeBody to unwrap the underlying reader from the nopCloser if that wrapping exists, ensures that the proper reader makes its way to TCPConn.sendFile.

Example on a local machine using a simple net/http server reading the body into an ioutil.Discard, uploading a 1GB file of random data:

$ ./simpleput-write garbage.dat http://127.0.0.1:8080/
2019/02/24 09:39:16 [DEBUG] PUT http://127.0.0.1:8080/
2019/02/24 09:39:18 upload complete; duration: 1.564957672s; size: 1048576000 bytes; rate: 638.994918 MB/sec

$ ./simpleput-sendfile garbage.dat http://127.0.0.1:8080/
2019/02/24 09:39:39 [DEBUG] PUT http://127.0.0.1:8080/
2019/02/24 09:39:39 upload complete; duration: 244.542093ms; size: 1048576000 bytes; rate: 4089.275542 MB/sec

vancluever added a commit to vancluever/go that referenced this issue Feb 24, 2019

net/http: Optimize request body writes
net.TCPConn has the ability to send data out using system calls such as
sendfile when the source data comes from an *os.File. However, the way
that I/O has been laid out in the transport means that the File is
actually wrapped behind two outer io.Readers, and as such the TCP stack
cannot properly type-assert the reader, ensuring that it falls back to
genericReadFrom. Or it would, if persistConnWriter implemented
io.ReaderFrom, which is missing as well.

As such, this commit does the following:

* Removes transferBodyReader and moves its functionality to a new
doBodyCopy helper. This is not an io.Reader implementation, but no
functionality is lost this way, and it allows us to unwrap one layer
from the body.

* The second layer of the body is unwrapped if the original writer
was wrapped with ioutil.NopCloser, which is what NewRequest wraps the
body in if it's not a ReadCloser on its own. The unwrap operation
passes through the existing body if there's no nopCloser.

* Finally, io.ReaderFrom is implemented for persistConnWriter in the
higher-level transport, which was preventing ReadFrom in net.TCPConn
from being used in the first place. This has the additional benefit of
facilitating the fallback to genericReadFrom if there's still no
*os.File, which by itself was giving significant performance gains over
the io.Writer implementation.

Fixes golang#30377.

@gopherbot

commented Feb 24, 2019

Change https://golang.org/cl/163599 mentions this issue: net/http: optimize request body writes

@ALTree ALTree changed the title from "net/http: Request body transfer optimization issues (ReadFrom/sendfile)" to "net/http: request body transfer optimization issues (ReadFrom/sendfile)" Feb 25, 2019

@ALTree ALTree added this to the Go1.13 milestone Feb 25, 2019

@bcmills

Member

commented Feb 25, 2019

CC @bradfitz @rsc for net/http.

vancluever added a commit to vancluever/go that referenced this issue Feb 25, 2019

net/http: implement ReaderFrom in persistConnWriter
This implements ReaderFrom in persistConnWriter, allowing request
body writes to take advantage of it when possible, example: usually
direct file access.

Updates golang#30377.

vancluever added a commit to vancluever/go that referenced this issue Feb 25, 2019

net/http: let Transport request body writes use sendfile
net.TCPConn has the ability to send data out using system calls such as
sendfile when the source data comes from an *os.File. However, the way
that I/O has been laid out in the transport means that the File is
actually wrapped behind two outer io.Readers, and as such the TCP stack
cannot properly type-assert the reader, ensuring that it falls back to
genericReadFrom. Or it would, if persistConnWriter implemented
io.ReaderFrom, which is missing as well.

As such, this commit does the following:

* Removes transferBodyReader and moves its functionality to a new
doBodyCopy helper. This is not an io.Reader implementation, but no
functionality is lost this way, and it allows us to unwrap one layer
from the body.

* The second layer of the body is unwrapped if the original writer
was wrapped with ioutil.NopCloser, which is what NewRequest wraps the
body in if it's not a ReadCloser on its own. The unwrap operation
passes through the existing body if there's no nopCloser.

* Finally, io.ReaderFrom is implemented for persistConnWriter in the
higher-level transport, which was preventing ReadFrom in net.TCPConn
from being used in the first place. This has the additional benefit of
facilitating the fallback to genericReadFrom if there's still no
*os.File, which by itself was giving significant performance gains over
the io.Writer implementation.

Fixes golang#30377.

vancluever added a commit to vancluever/go that referenced this issue Feb 25, 2019

net/http: optimize Transport request body writes
net.TCPConn has the ability to send data out using system calls such as
sendfile when the source data comes from an *os.File. However, the way
that I/O has been laid out in the transport means that the File is
actually wrapped behind two outer io.Readers, and as such the TCP stack
cannot properly type-assert the reader, ensuring that it falls back to
genericReadFrom. Or it would, if persistConnWriter implemented
io.ReaderFrom, which is missing as well.

As such, this commit does the following:

* Removes transferBodyReader and moves its functionality to a new
doBodyCopy helper. This is not an io.Reader implementation, but no
functionality is lost this way, and it allows us to unwrap one layer
from the body.

* The second layer of the body is unwrapped if the original writer
was wrapped with ioutil.NopCloser, which is what NewRequest wraps the
body in if it's not a ReadCloser on its own. The unwrap operation
passes through the existing body if there's no nopCloser.

* Finally, io.ReaderFrom is implemented for persistConnWriter in the
higher-level transport, which was preventing ReadFrom in net.TCPConn
from being used in the first place. This has the additional benefit of
facilitating the fallback to genericReadFrom if there's still no
*os.File, which by itself was giving significant performance gains over
the io.Writer implementation.

Fixes golang#30377.
@gopherbot

commented Feb 25, 2019

Change https://golang.org/cl/163737 mentions this issue: net/http: implement ReaderFrom in persistConnWriter

vancluever added a commit to vancluever/go that referenced this issue Feb 25, 2019

net/http: optimize Transport request body writes
net.TCPConn has the ability to send data out using system calls such as
sendfile when the source data comes from an *os.File. However, the way
that I/O has been laid out in the transport means that the File is
actually wrapped behind two outer io.Readers, and as such the TCP stack
cannot properly type-assert the reader, ensuring that it falls back to
genericReadFrom.

This commit does the following:

* Removes transferBodyReader and moves its functionality to a new
doBodyCopy helper. This is not an io.Reader implementation, but no
functionality is lost this way, and it allows us to unwrap one layer
from the body.

* The second layer of the body is unwrapped if the original writer
was wrapped with ioutil.NopCloser, which is what NewRequest wraps the
body in if it's not a ReadCloser on its own. The unwrap operation
passes through the existing body if there's no nopCloser.

Note that this depends on change ID
Ic88d4ac254f665223536fcba4d551fc32ae105b6 to properly function, as
the lack of ReaderFrom implementation otherwise means that this
functionality is essentially walled off.

Updates golang#30377.

@vancluever

Contributor Author

commented Feb 25, 2019

The performance issues may actually be largely related to the HTTP2 stack.

Over the weekend when I was testing this, I was not accounting for TLS (as can be seen from the tests above). This morning, while trying to see if the changes I made helped the real-world problems we've been seeing, I saw no change, and then realized that none of these performance gains really apply to TLS, since sendfile et al. are not used in these situations.

On debugging again, I discovered that net/http has actually been using the HTTP2 transport. The curious part is that our logs say these requests are ultimately going over HTTP/1.1. I'm still investigating on our end, but disabling HTTP2 with GODEBUG=http2client=0 and running the upload test again made any speed discrepancy between net/http and fasthttp pretty much disappear in some scenarios (it could very well be possible that some higher-latency scenarios may still be exhibiting some major speed discrepancies in comparison).

This may or may not be a major priority for us in light of this, but it would be neat to see what in the HTTP2 stack might be causing this, and I'd like to investigate. Some advice on whether or not I should open a new issue for the HTTP2 package once the cause is discovered would be good, though.

vancluever added a commit to vancluever/go that referenced this issue Feb 27, 2019

new/http: add request file upload benchmarks
This adds a couple of benchmarks to test file uploads using PUT
requests.

It's designed to complement change IDs
I631a73cea75371dfbb418c9cd487c4aa35e73fcd and
Ic88d4ac254f665223536fcba4d551fc32ae105b6, allowing an easy
comparison of performance before and after these changes are applied.

Updates golang#30377.

vancluever added a commit to vancluever/go that referenced this issue Feb 27, 2019

net/http: implement ReaderFrom in persistConnWriter
This implements ReaderFrom in persistConnWriter, allowing request
body writes to take advantage of it when possible, example: usually
direct file access.

Updates golang#30377.

vancluever added a commit to vancluever/go that referenced this issue Feb 27, 2019

net/http: optimize Transport request body writes
net.TCPConn has the ability to send data out using system calls such as
sendfile when the source data comes from an *os.File. However, the way
that I/O has been laid out in the transport means that the File is
actually wrapped behind two outer io.Readers, and as such the TCP stack
cannot properly type-assert the reader, ensuring that it falls back to
genericReadFrom.

This commit does the following:

* Removes transferBodyReader and moves its functionality to a new
doBodyCopy helper. This is not an io.Reader implementation, but no
functionality is lost this way, and it allows us to unwrap one layer
from the body.

* The second layer of the body is unwrapped if the original writer
was wrapped with ioutil.NopCloser, which is what NewRequest wraps the
body in if it's not a ReadCloser on its own. The unwrap operation
passes through the existing body if there's no nopCloser.

Note that this depends on change ID
Ic88d4ac254f665223536fcba4d551fc32ae105b6 to properly function, as
the lack of ReaderFrom implementation otherwise means that this
functionality is essentially walled off.

Updates golang#30377.

@gopherbot

commented Feb 27, 2019

Change https://golang.org/cl/163862 mentions this issue: new/http: add request file upload benchmarks

@vancluever

Contributor Author

commented Feb 27, 2019

An update on this: I've been working on benchmarks (see referenced PR/CL). It does appear that when using a TLS test server, requests are properly sent through the HTTP1 transport, so that particular problem seems to be some sort of implementation issue between what happens when using httptest.NewTLSServer and InsecureSkipVerify, and whatever is going on with our load balancers and a default TLS configuration.

Running the referenced benchmarks, I am definitely seeing performance improvements on both unencrypted and TLS connections. An interesting oddity is the memory jump on unencrypted connections on darwin/amd64. I'm wondering if this is telling of some sort of buffer issue in the old setup that was ultimately bottlenecking the stream at that specific point.

linux/amd64:

benchmark                              old ns/op     new ns/op     delta
BenchmarkRequestWriteFileNoTLS-4       14751184      2425482       -83.56%
BenchmarkRequestWriteFileWithTLS-4     17610158      7241909       -58.88%

benchmark                              old MB/s     new MB/s     speedup
BenchmarkRequestWriteFileNoTLS-4       694.18       4221.84      6.08x
BenchmarkRequestWriteFileWithTLS-4     581.48       1413.99      2.43x

benchmark                              old allocs     new allocs     delta
BenchmarkRequestWriteFileNoTLS-4       67             69             +2.99%
BenchmarkRequestWriteFileWithTLS-4     2568           696            -72.90%

benchmark                              old bytes     new bytes     delta
BenchmarkRequestWriteFileNoTLS-4       5644          5097          -9.69%
BenchmarkRequestWriteFileWithTLS-4     86747         59854         -31.00%

darwin/amd64:

benchmark                              old ns/op     new ns/op     delta
BenchmarkRequestWriteFileNoTLS-8       29557060      5213768       -82.36%
BenchmarkRequestWriteFileWithTLS-8     30266374      9908891       -67.26%

benchmark                              old MB/s     new MB/s     speedup
BenchmarkRequestWriteFileNoTLS-8       346.45       1964.03      5.67x
BenchmarkRequestWriteFileWithTLS-8     338.33       1033.42      3.05x

benchmark                              old allocs     new allocs     delta
BenchmarkRequestWriteFileNoTLS-8       69             68             -1.45%
BenchmarkRequestWriteFileWithTLS-8     2461           712            -71.07%

benchmark                              old bytes     new bytes     delta
BenchmarkRequestWriteFileNoTLS-8       6696          38568         +475.99%
BenchmarkRequestWriteFileWithTLS-8     85893         60870         -29.13%

vancluever added a commit to vancluever/go that referenced this issue Feb 27, 2019

new/http: add request file upload benchmarks
This adds a couple of benchmarks to test file uploads using PUT
requests.

It's designed to complement changes https://golang.org/cl/163599 and
https://golang.org/cl/163737, allowing an easy comparison of
performance before and after these changes are applied.

Updates golang#30377.

vancluever added a commit to vancluever/go that referenced this issue Feb 27, 2019

net/http: add request file upload benchmarks
This adds a couple of benchmarks to test file uploads using PUT
requests.

It's designed to complement changes https://golang.org/cl/163599 and
https://golang.org/cl/163737, allowing an easy comparison of
performance before and after these changes are applied.

Updates golang#30377.

vancluever added a commit to vancluever/go that referenced this issue Feb 27, 2019

net/http: optimize Transport request body writes
net.TCPConn has the ability to send data out using system calls such as
sendfile when the source data comes from an *os.File. However, the way
that I/O has been laid out in the transport means that the File is
actually wrapped behind two outer io.Readers, and as such the TCP stack
cannot properly type-assert the reader, ensuring that it falls back to
genericReadFrom.

This commit does the following:

* Removes transferBodyReader and moves its functionality to a new
doBodyCopy helper. This is not an io.Reader implementation, but no
functionality is lost this way, and it allows us to unwrap one layer
from the body.

* The second layer of the body is unwrapped if the original writer
was wrapped with ioutil.NopCloser, which is what NewRequest wraps the
body in if it's not a ReadCloser on its own. The unwrap operation
passes through the existing body if there's no nopCloser.

Note that this depends on change https://golang.org/cl/163599 to
properly function, as the lack of ReaderFrom implementation otherwise
means that this functionality is essentially walled off.

Updates golang#30377.

vancluever added a commit to vancluever/go that referenced this issue Feb 27, 2019

net/http: optimize Transport request body writes
net.TCPConn has the ability to send data out using system calls such as
sendfile when the source data comes from an *os.File. However, the way
that I/O has been laid out in the transport means that the File is
actually wrapped behind two outer io.Readers, and as such the TCP stack
cannot properly type-assert the reader, ensuring that it falls back to
genericReadFrom.

This commit does the following:

* Removes transferBodyReader and moves its functionality to a new
doBodyCopy helper. This is not an io.Reader implementation, but no
functionality is lost this way, and it allows us to unwrap one layer
from the body.

* The second layer of the body is unwrapped if the original writer
was wrapped with ioutil.NopCloser, which is what NewRequest wraps the
body in if it's not a ReadCloser on its own. The unwrap operation
passes through the existing body if there's no nopCloser.

Note that this depends on change https://golang.org/cl/163737 to
properly function, as the lack of ReaderFrom implementation otherwise
means that this functionality is essentially walled off.

Updates golang#30377.

vancluever added a commit to vancluever/go that referenced this issue Mar 6, 2019

net/http: add request file upload benchmarks
This adds benchmarks to test file uploads using PUT requests.

It's designed to complement changes https://golang.org/cl/163599 and
https://golang.org/cl/163737, allowing an easy comparison of
performance before and after these changes are applied.

Updates golang#30377.

Co-authored-by: Emmanuel Odeke <emm.odeke@gmail.com>

@vancluever

Contributor Author

commented Mar 6, 2019

@odeke-em contributed improved benchmarks in https://golang.org/cl/163862 and that CL is being updated. Thanks again Emmanuel!

I'm posting the new results below. I'm not too sure what's up with my MacBook right now: the same odd skew in the pre-update results for non-TLS vs. TLS tests (where non-TLS is actually slower than with TLS) is happening on my old benchmarks now too. 🤷‍♂️

Anyway:

linux/amd64:

benchmark                               old ns/op     new ns/op     delta
BenchmarkFileAndServer_1KB/NoTLS-4      52700         53060         +0.68%
BenchmarkFileAndServer_1KB/TLS-4        61513         61315         -0.32%
BenchmarkFileAndServer_16MB/NoTLS-4     25221976      3651966       -85.52%
BenchmarkFileAndServer_16MB/TLS-4       30608611      10865631      -64.50%
BenchmarkFileAndServer_64MB/NoTLS-4     97564941      15347145      -84.27%
BenchmarkFileAndServer_64MB/TLS-4       114426054     43649242      -61.85%

benchmark                               old MB/s     new MB/s     speedup
BenchmarkFileAndServer_1KB/NoTLS-4      19.43        19.30        0.99x
BenchmarkFileAndServer_1KB/TLS-4        16.65        16.70        1.00x
BenchmarkFileAndServer_16MB/NoTLS-4     665.18       4594.02      6.91x
BenchmarkFileAndServer_16MB/TLS-4       548.12       1544.06      2.82x
BenchmarkFileAndServer_64MB/NoTLS-4     687.84       4372.73      6.36x
BenchmarkFileAndServer_64MB/TLS-4       586.48       1537.46      2.62x

benchmark                               old allocs     new allocs     delta
BenchmarkFileAndServer_1KB/NoTLS-4      66             70             +6.06%
BenchmarkFileAndServer_1KB/TLS-4        70             71             +1.43%
BenchmarkFileAndServer_16MB/NoTLS-4     69             70             +1.45%
BenchmarkFileAndServer_16MB/TLS-4       4175           1102           -73.60%
BenchmarkFileAndServer_64MB/NoTLS-4     73             71             -2.74%
BenchmarkFileAndServer_64MB/TLS-4       16517          4193           -74.61%

benchmark                               old bytes     new bytes     delta
BenchmarkFileAndServer_1KB/NoTLS-4      4940          5027          +1.76%
BenchmarkFileAndServer_1KB/TLS-4        5073          6108          +20.40%
BenchmarkFileAndServer_16MB/NoTLS-4     6110          5099          -16.55%
BenchmarkFileAndServer_16MB/TLS-4       139738        74312         -46.82%
BenchmarkFileAndServer_64MB/NoTLS-4     7898          5643          -28.55%
BenchmarkFileAndServer_64MB/TLS-4       546156        181074        -66.85%

darwin/amd64:

benchmark                               old ns/op     new ns/op     delta
BenchmarkFileAndServer_1KB/NoTLS-8      89838         86693         -3.50%
BenchmarkFileAndServer_1KB/TLS-8        96090         91924         -4.34%
BenchmarkFileAndServer_16MB/NoTLS-8     54961694      8443407       -84.64%
BenchmarkFileAndServer_16MB/TLS-8       49016790      15563171      -68.25%
BenchmarkFileAndServer_64MB/NoTLS-8     223579299     34027450      -84.78%
BenchmarkFileAndServer_64MB/TLS-8       195088780     68123274      -65.08%

benchmark                               old MB/s     new MB/s     speedup
BenchmarkFileAndServer_1KB/NoTLS-8      11.40        11.81        1.04x
BenchmarkFileAndServer_1KB/TLS-8        10.66        11.14        1.05x
BenchmarkFileAndServer_16MB/NoTLS-8     305.25       1987.02      6.51x
BenchmarkFileAndServer_16MB/TLS-8       342.27       1078.01      3.15x
BenchmarkFileAndServer_64MB/NoTLS-8     300.16       1972.20      6.57x
BenchmarkFileAndServer_64MB/TLS-8       343.99       985.11       2.86x

benchmark                               old allocs     new allocs     delta
BenchmarkFileAndServer_1KB/NoTLS-8      66             68             +3.03%
BenchmarkFileAndServer_1KB/TLS-8        70             71             +1.43%
BenchmarkFileAndServer_16MB/NoTLS-8     72             69             -4.17%
BenchmarkFileAndServer_16MB/TLS-8       3987           1123           -71.83%
BenchmarkFileAndServer_64MB/NoTLS-8     82             71             -13.41%
BenchmarkFileAndServer_64MB/TLS-8       15815          4311           -72.74%

benchmark                               old bytes     new bytes     delta
BenchmarkFileAndServer_1KB/NoTLS-8      4959          6026          +21.52%
BenchmarkFileAndServer_1KB/TLS-8        5093          6141          +20.58%
BenchmarkFileAndServer_16MB/NoTLS-8     7697          38631         +401.90%
BenchmarkFileAndServer_16MB/TLS-8       136975        75484         -44.89%
BenchmarkFileAndServer_64MB/NoTLS-8     12198         39128         +220.77%
BenchmarkFileAndServer_64MB/TLS-8       526316        190617        -63.78%

gopherbot pushed a commit that referenced this issue Mar 6, 2019

net/http: add request file upload benchmarks
This adds benchmarks to test file uploads using PUT requests.

It's designed to complement changes https://golang.org/cl/163599 and
https://golang.org/cl/163737, allowing an easy comparison of
performance before and after these changes are applied.

Updates #30377.

Co-authored-by: Emmanuel Odeke <emm.odeke@gmail.com>

Change-Id: Ib8e692c61e1f7957d88c7101669d4f7fb8110c65
GitHub-Last-Rev: 242622b
GitHub-Pull-Request: #30424
Reviewed-on: https://go-review.googlesource.com/c/go/+/163862
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>

vancluever added a commit to vancluever/go that referenced this issue Mar 7, 2019

net/http: let Transport request body writes use sendfile
net.TCPConn has the ability to send data out using system calls such as
sendfile when the source data comes from an *os.File. However, the way
that I/O has been laid out in the transport means that the File is
actually wrapped behind two outer io.Readers, and as such the TCP stack
cannot properly type-assert the reader, ensuring that it falls back to
genericReadFrom.

This commit does the following:

* Removes transferBodyReader and moves its functionality to a new
doBodyCopy helper. This is not an io.Reader implementation, but no
functionality is lost this way, and it allows us to unwrap one layer
from the body.

* The second layer of the body is unwrapped if the original reader
was wrapped with ioutil.NopCloser, which is what NewRequest wraps the
body in if it's not a ReadCloser on its own. The unwrap operation
passes through the existing body if there's no nopCloser.

Note that this depends on change https://golang.org/cl/163737 to
properly function, as the lack of ReaderFrom implementation otherwise
means that this functionality is essentially walled off.

Updates golang#30377.

vancluever added a commit to vancluever/go that referenced this issue Mar 7, 2019

net/http: unfurl persistConnWriter's underlying writer
Make persistConnWriter implement io.ReaderFrom, via an io.Copy on the
underlying net.Conn. This in turn enables it to use OS level
optimizations such as sendfile.

This has been observed giving performance gains even in the absence
of ReaderFrom, more than likely due to the difference in io's default
buffer (32 KB) versus bufio's (4 KB).

Updates golang#30377.

vancluever added a commit to vancluever/go that referenced this issue Mar 7, 2019

net/http: add request file upload benchmarks
This adds benchmarks to test file uploads using PUT requests.

It's designed to complement changes https://golang.org/cl/163599 and
https://golang.org/cl/163737, allowing an easy comparison of
performance before and after these changes are applied.

Updates golang#30377.

Co-authored-by: Emmanuel Odeke <emm.odeke@gmail.com>

gopherbot pushed a commit that referenced this issue Mar 7, 2019

net/http: unfurl persistConnWriter's underlying writer
Make persistConnWriter implement io.ReaderFrom, via an io.Copy on the
underlying net.Conn. This in turn enables it to use OS level
optimizations such as sendfile.

This has been observed giving performance gains even in the absence
of ReaderFrom, more than likely due to the difference in io's default
buffer (32 KB) versus bufio's (4 KB).

Speedups on linux/amd64:
benchmark                               old MB/s     new MB/s     speedup
BenchmarkFileAndServer_16MB/NoTLS-4     662.96       2703.74      4.08x
BenchmarkFileAndServer_16MB/TLS-4       552.76       1420.72      2.57x

Speedups on darwin/amd64:
benchmark                               old MB/s     new MB/s     speedup
BenchmarkFileAndServer_16MB/NoTLS-8     357.58       1972.86      5.52x
BenchmarkFileAndServer_16MB/TLS-8       346.20       1067.41      3.08x

Updates #30377.

Change-Id: Ic88d4ac254f665223536fcba4d551fc32ae105b6
GitHub-Last-Rev: a6f67cd
GitHub-Pull-Request: #30390
Reviewed-on: https://go-review.googlesource.com/c/go/+/163737
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
@mvdan

Member

commented Mar 7, 2019

@vancluever, please remember to use https://godoc.org/golang.org/x/perf/cmd/benchstat when comparing benchmark results; it reports p-values, which we can trust more than single runs. You'll also need to take multiple benchmark measurements before and after, for example via go test -count=5 ...

vancluever added a commit to vancluever/go that referenced this issue Mar 7, 2019

net/http: let Transport request body writes use sendfile
net.TCPConn has the ability to send data out using system calls such as
sendfile when the source data comes from an *os.File. However, the way
that I/O has been laid out in the transport means that the File is
actually wrapped behind two outer io.Readers, and as such the TCP stack
cannot properly type-assert the reader, ensuring that it falls back to
genericReadFrom.

This commit does the following:

* Removes transferBodyReader and moves its functionality to a new
doBodyCopy helper. This is not an io.Reader implementation, but no
functionality is lost this way, and it allows us to unwrap one layer
from the body.

* The second layer of the body is unwrapped if the original reader
was wrapped with ioutil.NopCloser, which is what NewRequest wraps the
body in if it's not a ReadCloser on its own. The unwrap operation
passes through the existing body if there's no nopCloser.

Note that this depends on change https://golang.org/cl/163737 to
properly function, as the lack of ReaderFrom implementation otherwise
means that this functionality is essentially walled off.

Benchmarks between this commit and https://golang.org/cl/163737
(where ReaderFrom was implemented for persistConnWriter):

linux/amd64:
name                        old time/op    new time/op    delta
FileAndServer_1KB/NoTLS-4     53.7µs ± 0%    53.5µs ± 0%   -0.31%  (p=0.000 n=10+10)
FileAndServer_1KB/TLS-4       61.0µs ± 0%    61.1µs ± 0%     ~     (p=0.165 n=10+10)
FileAndServer_16MB/NoTLS-4    6.30ms ± 6%    3.81ms ± 5%  -39.48%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-4      13.5ms ± 1%    13.4ms ± 1%     ~     (p=0.631 n=10+10)
FileAndServer_64MB/NoTLS-4    27.3ms ± 1%    16.4ms ± 3%  -40.05%  (p=0.000 n=9+10)
FileAndServer_64MB/TLS-4      53.9ms ± 2%    53.4ms ± 1%   -0.93%  (p=0.006 n=10+9)

name                        old speed      new speed      delta
FileAndServer_1KB/NoTLS-4   19.1MB/s ± 0%  19.1MB/s ± 0%   +0.32%  (p=0.000 n=10+10)
FileAndServer_1KB/TLS-4     16.8MB/s ± 0%  16.8MB/s ± 0%     ~     (p=0.302 n=10+10)
FileAndServer_16MB/NoTLS-4  2.67GB/s ± 6%  4.41GB/s ± 5%  +65.32%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-4    1.25GB/s ± 1%  1.25GB/s ± 1%     ~     (p=0.631 n=10+10)
FileAndServer_64MB/NoTLS-4  2.45GB/s ± 1%  4.09GB/s ± 3%  +66.84%  (p=0.000 n=9+10)
FileAndServer_64MB/TLS-4    1.25GB/s ± 2%  1.26GB/s ± 1%   +0.93%  (p=0.006 n=10+9)

name                        old alloc/op   new alloc/op   delta
FileAndServer_1KB/NoTLS-4     6.01kB ± 0%    5.03kB ± 0%  -16.31%  (p=0.000 n=8+10)
FileAndServer_1KB/TLS-4       6.11kB ± 0%    6.11kB ± 0%     ~     (p=0.127 n=10+10)
FileAndServer_16MB/NoTLS-4    38.4kB ± 0%     5.1kB ± 0%  -86.71%  (p=0.000 n=8+8)
FileAndServer_16MB/TLS-4      74.1kB ± 0%    74.1kB ± 0%     ~     (p=0.781 n=10+10)
FileAndServer_64MB/NoTLS-4    38.9kB ± 0%     5.6kB ± 1%  -85.58%  (p=0.000 n=8+10)
FileAndServer_64MB/TLS-4       184kB ± 1%     184kB ± 0%     ~     (p=0.489 n=9+9)

name                        old allocs/op  new allocs/op  delta
FileAndServer_1KB/NoTLS-4       69.0 ± 0%      70.0 ± 0%   +1.45%  (p=0.000 n=10+10)
FileAndServer_1KB/TLS-4         71.0 ± 0%      71.0 ± 0%     ~     (all equal)
FileAndServer_16MB/NoTLS-4      70.0 ± 0%      70.0 ± 0%     ~     (all equal)
FileAndServer_16MB/TLS-4       1.11k ± 0%     1.11k ± 0%     ~     (p=0.189 n=10+10)
FileAndServer_64MB/NoTLS-4      71.0 ± 0%      71.0 ± 0%     ~     (all equal)
FileAndServer_64MB/TLS-4       4.23k ± 0%     4.24k ± 0%   +0.10%  (p=0.002 n=9+8)

darwin/amd64:
name                        old time/op    new time/op     delta
FileAndServer_1KB/NoTLS-8     92.0µs ± 4%     91.6µs ± 1%    ~     (p=0.601 n=10+7)
FileAndServer_1KB/TLS-8        103µs ± 5%       97µs ± 1%  -6.20%  (p=0.000 n=10+9)
FileAndServer_16MB/NoTLS-8    10.1ms ± 6%     10.2ms ± 4%    ~     (p=0.315 n=10+8)
FileAndServer_16MB/TLS-8      18.3ms ± 2%     17.2ms ± 2%  -6.34%  (p=0.000 n=9+10)
FileAndServer_64MB/NoTLS-8    40.1ms ± 7%     37.7ms ± 3%  -6.05%  (p=0.000 n=9+9)
FileAndServer_64MB/TLS-8      70.2ms ± 4%     68.1ms ± 3%  -3.01%  (p=0.001 n=10+10)

name                        old speed      new speed       delta
FileAndServer_1KB/NoTLS-8   11.1MB/s ± 4%   11.2MB/s ± 1%    ~     (p=0.617 n=10+7)
FileAndServer_1KB/TLS-8     9.92MB/s ± 5%  10.57MB/s ± 1%  +6.53%  (p=0.000 n=10+9)
FileAndServer_16MB/NoTLS-8  1.67GB/s ± 5%   1.65GB/s ± 3%    ~     (p=0.315 n=10+8)
FileAndServer_16MB/TLS-8     910MB/s ± 5%    977MB/s ± 2%  +7.40%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-8  1.67GB/s ± 6%   1.78GB/s ± 3%  +6.36%  (p=0.000 n=9+9)
FileAndServer_64MB/TLS-8     956MB/s ± 3%    985MB/s ± 3%  +3.09%  (p=0.001 n=10+10)

name                        old alloc/op   new alloc/op    delta
FileAndServer_1KB/NoTLS-8     6.03kB ± 0%     6.03kB ± 0%  -0.04%  (p=0.021 n=8+10)
FileAndServer_1KB/TLS-8       6.15kB ± 0%     6.14kB ± 0%  -0.18%  (p=0.000 n=10+9)
FileAndServer_16MB/NoTLS-8    38.8kB ± 1%     38.9kB ± 1%    ~     (p=0.529 n=10+10)
FileAndServer_16MB/TLS-8      75.5kB ± 1%     75.5kB ± 0%    ~     (p=0.481 n=10+10)
FileAndServer_64MB/NoTLS-8    40.3kB ± 1%     39.3kB ± 1%  -2.47%  (p=0.000 n=9+10)
FileAndServer_64MB/TLS-8       189kB ± 1%      189kB ± 1%    ~     (p=0.684 n=10+10)

name                        old allocs/op  new allocs/op   delta
FileAndServer_1KB/NoTLS-8       68.0 ± 0%       68.0 ± 0%    ~     (all equal)
FileAndServer_1KB/TLS-8         71.0 ± 0%       71.0 ± 0%    ~     (all equal)
FileAndServer_16MB/NoTLS-8      69.5 ± 1%       69.6 ± 1%    ~     (p=1.000 n=10+10)
FileAndServer_16MB/TLS-8       1.13k ± 0%      1.14k ± 0%    ~     (p=0.336 n=9+10)
FileAndServer_64MB/NoTLS-8      73.6 ± 1%       71.0 ± 0%  -3.47%  (p=0.000 n=9+9)
FileAndServer_64MB/TLS-8       4.34k ± 0%      4.34k ± 0%    ~     (p=0.323 n=10+10)

Benchmarks between this commit and https://golang.org/cl/163862
(where benchmarks were added):

linux/amd64:
name                        old time/op    new time/op    delta
FileAndServer_1KB/NoTLS-4     53.3µs ± 0%    53.4µs ± 0%    +0.21%  (p=0.028 n=10+9)
FileAndServer_1KB/TLS-4       61.2µs ± 0%    60.8µs ± 0%    -0.73%  (p=0.000 n=10+9)
FileAndServer_16MB/NoTLS-4    25.4ms ± 5%     3.8ms ± 5%   -84.99%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-4      33.3ms ± 2%    13.5ms ± 3%   -59.35%  (p=0.000 n=9+10)
FileAndServer_64MB/NoTLS-4     105ms ± 2%      16ms ± 1%   -84.38%  (p=0.000 n=9+9)
FileAndServer_64MB/TLS-4       128ms ± 2%      54ms ± 2%   -58.18%  (p=0.000 n=10+10)

name                        old speed      new speed      delta
FileAndServer_1KB/NoTLS-4   19.2MB/s ± 0%  19.2MB/s ± 0%    -0.21%  (p=0.018 n=10+9)
FileAndServer_1KB/TLS-4     16.7MB/s ± 0%  16.8MB/s ± 0%    +0.72%  (p=0.000 n=10+9)
FileAndServer_16MB/NoTLS-4   660MB/s ± 5%  4400MB/s ± 6%  +566.83%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-4     504MB/s ± 2%  1241MB/s ± 2%  +146.03%  (p=0.000 n=9+10)
FileAndServer_64MB/NoTLS-4   640MB/s ± 2%  4089MB/s ± 2%  +538.71%  (p=0.000 n=9+10)
FileAndServer_64MB/TLS-4     524MB/s ± 2%  1254MB/s ± 2%  +139.15%  (p=0.000 n=10+10)

name                        old alloc/op   new alloc/op   delta
FileAndServer_1KB/NoTLS-4     4.94kB ± 0%    5.03kB ± 0%    +1.82%  (p=0.000 n=10+10)
FileAndServer_1KB/TLS-4       5.07kB ± 0%    6.11kB ± 0%   +20.43%  (p=0.000 n=8+10)
FileAndServer_16MB/NoTLS-4    5.93kB ± 8%    5.10kB ± 0%   -14.06%  (p=0.000 n=10+8)
FileAndServer_16MB/TLS-4       141kB ± 1%      74kB ± 0%   -47.53%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-4    9.66kB ±22%    5.59kB ± 1%   -42.09%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-4       551kB ± 0%     184kB ± 0%   -66.59%  (p=0.000 n=9+9)

name                        old allocs/op  new allocs/op  delta
FileAndServer_1KB/NoTLS-4       66.0 ± 0%      70.0 ± 0%    +6.06%  (p=0.000 n=10+10)
FileAndServer_1KB/TLS-4         70.0 ± 0%      71.0 ± 0%    +1.43%  (p=0.000 n=10+10)
FileAndServer_16MB/NoTLS-4      67.6 ± 1%      70.0 ± 0%    +3.55%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-4       4.19k ± 0%     1.11k ± 0%   -73.59%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-4      76.3 ± 5%      71.0 ± 0%    -6.95%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-4       16.6k ± 0%      4.2k ± 0%   -74.44%  (p=0.000 n=10+8)

darwin/amd64:
name                        old time/op    new time/op    delta
FileAndServer_1KB/NoTLS-8     91.5µs ± 2%    96.8µs ± 8%    +5.76%  (p=0.010 n=10+9)
FileAndServer_1KB/TLS-8       98.9µs ± 2%    98.9µs ± 2%      ~     (p=0.968 n=10+9)
FileAndServer_16MB/NoTLS-8    80.3ms ±20%     9.5ms ± 2%   -88.19%  (p=0.000 n=9+9)
FileAndServer_16MB/TLS-8      54.3ms ±12%    17.6ms ± 4%   -67.52%  (p=0.000 n=9+10)
FileAndServer_64MB/NoTLS-8     406ms ±19%      39ms ± 3%   -90.45%  (p=0.000 n=10+9)
FileAndServer_64MB/TLS-8       205ms ± 6%      71ms ± 3%   -65.27%  (p=0.000 n=8+10)

name                        old speed      new speed      delta
FileAndServer_1KB/NoTLS-8   11.2MB/s ± 2%  10.6MB/s ± 9%    -5.30%  (p=0.010 n=10+9)
FileAndServer_1KB/TLS-8     10.4MB/s ± 2%  10.4MB/s ± 2%      ~     (p=0.889 n=10+9)
FileAndServer_16MB/NoTLS-8   211MB/s ±17%  1770MB/s ± 2%  +739.26%  (p=0.000 n=9+9)
FileAndServer_16MB/TLS-8     302MB/s ±26%   952MB/s ± 4%  +214.93%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-8   167MB/s ±17%  1729MB/s ± 3%  +938.05%  (p=0.000 n=10+9)
FileAndServer_64MB/TLS-8     316MB/s ±15%   944MB/s ± 3%  +198.59%  (p=0.000 n=10+10)

name                        old alloc/op   new alloc/op   delta
FileAndServer_1KB/NoTLS-8     4.96kB ± 0%    6.03kB ± 0%   +21.48%  (p=0.000 n=10+10)
FileAndServer_1KB/TLS-8       5.10kB ± 0%    6.15kB ± 0%   +20.59%  (p=0.000 n=10+10)
FileAndServer_16MB/NoTLS-8    7.70kB ± 1%   38.64kB ± 0%  +401.51%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-8       137kB ± 2%      76kB ± 0%   -44.73%  (p=0.000 n=10+9)
FileAndServer_64MB/NoTLS-8    24.1kB ±49%    40.0kB ± 2%   +66.24%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-8       530kB ± 1%     189kB ± 1%   -64.42%  (p=0.000 n=10+10)

name                        old allocs/op  new allocs/op  delta
FileAndServer_1KB/NoTLS-8       66.0 ± 0%      68.0 ± 0%    +3.03%  (p=0.000 n=10+10)
FileAndServer_1KB/TLS-8         70.0 ± 0%      71.0 ± 0%    +1.43%  (p=0.000 n=10+10)
FileAndServer_16MB/NoTLS-8      71.6 ± 1%      69.0 ± 0%    -3.63%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-8       3.96k ± 2%     1.13k ± 0%   -71.34%  (p=0.000 n=10+9)
FileAndServer_64MB/NoTLS-8       105 ±22%        73 ± 3%   -30.61%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-8       15.7k ± 0%      4.3k ± 0%   -72.40%  (p=0.000 n=10+10)

Updates golang#30377.

gopherbot pushed a commit that referenced this issue Mar 7, 2019

net/http: let Transport request body writes use sendfile
net.TCPConn has the ability to send data out using system calls such as
sendfile when the source data comes from an *os.File. However, the way
that I/O has been laid out in the transport means that the File is
actually wrapped behind two outer io.Readers, and as such the TCP stack
cannot properly type-assert the reader, ensuring that it falls back to
genericReadFrom.

This commit does the following:

* Removes transferBodyReader and moves its functionality to a new
doBodyCopy helper. This is not an io.Reader implementation, but no
functionality is lost this way, and it allows us to unwrap one layer
from the body.

* The second layer of the body is unwrapped if the original reader
was wrapped with ioutil.NopCloser, which is what NewRequest wraps the
body in if it's not a ReadCloser on its own. The unwrap operation
passes through the existing body if there's no nopCloser.

Note that this depends on change https://golang.org/cl/163737 to
properly function, as the lack of ReaderFrom implementation otherwise
means that this functionality is essentially walled off.

Benchmarks between this commit and https://golang.org/cl/163862,
incorporating https://golang.org/cl/163737:

linux/amd64:
name                        old time/op    new time/op    delta
FileAndServer_1KB/NoTLS-4     53.2µs ± 0%    53.3µs ± 0%      ~     (p=0.075 n=10+9)
FileAndServer_1KB/TLS-4       61.2µs ± 0%    60.7µs ± 0%    -0.77%  (p=0.000 n=10+9)
FileAndServer_16MB/NoTLS-4    25.3ms ± 5%     3.8ms ± 6%   -84.95%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-4      33.2ms ± 2%    13.4ms ± 2%   -59.57%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-4     106ms ± 4%      16ms ± 2%   -84.45%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-4       129ms ± 1%      54ms ± 3%   -58.32%  (p=0.000 n=8+10)

name                        old speed      new speed      delta
FileAndServer_1KB/NoTLS-4   19.2MB/s ± 0%  19.2MB/s ± 0%      ~     (p=0.095 n=10+9)
FileAndServer_1KB/TLS-4     16.7MB/s ± 0%  16.9MB/s ± 0%    +0.78%  (p=0.000 n=10+9)
FileAndServer_16MB/NoTLS-4   664MB/s ± 5%  4415MB/s ± 6%  +565.27%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-4     505MB/s ± 2%  1250MB/s ± 2%  +147.32%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-4   636MB/s ± 4%  4090MB/s ± 2%  +542.81%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-4     522MB/s ± 1%  1251MB/s ± 3%  +139.95%  (p=0.000 n=8+10)

darwin/amd64:
name                        old time/op    new time/op     delta
FileAndServer_1KB/NoTLS-8     93.0µs ± 5%     96.6µs ±11%      ~     (p=0.190 n=10+10)
FileAndServer_1KB/TLS-8        105µs ± 7%      100µs ± 5%    -5.14%  (p=0.002 n=10+9)
FileAndServer_16MB/NoTLS-8    87.5ms ±19%     10.0ms ± 6%   -88.57%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-8      52.7ms ±11%     17.4ms ± 5%   -66.92%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-8     363ms ±54%       39ms ± 7%   -89.24%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-8       209ms ±13%       73ms ± 5%   -65.37%  (p=0.000 n=9+10)

name                        old speed      new speed       delta
FileAndServer_1KB/NoTLS-8   11.0MB/s ± 5%   10.6MB/s ±10%      ~     (p=0.184 n=10+10)
FileAndServer_1KB/TLS-8     9.75MB/s ± 7%  10.27MB/s ± 5%    +5.26%  (p=0.003 n=10+9)
FileAndServer_16MB/NoTLS-8   194MB/s ±16%   1680MB/s ± 6%  +767.83%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-8     319MB/s ±10%    963MB/s ± 4%  +201.36%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-8   180MB/s ±31%   1719MB/s ± 7%  +853.61%  (p=0.000 n=9+10)
FileAndServer_64MB/TLS-8     321MB/s ±12%    926MB/s ± 5%  +188.24%  (p=0.000 n=9+10)

Updates #30377.

Change-Id: I631a73cea75371dfbb418c9cd487c4aa35e73fcd
GitHub-Last-Rev: 4a77dd1
GitHub-Pull-Request: #30378
Reviewed-on: https://go-review.googlesource.com/c/go/+/163599
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
@odeke-em

Member

commented Mar 7, 2019

Yes @mvdan, I also paged him about using -count=10; thanks for mentioning it too.

Alright, all the respective CLs have been submitted. @vancluever, I think we can now close this issue. Thank you very much for working on this!

@vancluever

Contributor Author

commented Mar 7, 2019

@mvdan, as @odeke-em mentioned, updated stats are posted to https://golang.org/cl/163599 in summary form. I've also posted another run below that includes memory stats.

linux/amd64:

name                        old time/op    new time/op    delta
FileAndServer_1KB/NoTLS-4     53.3µs ± 0%    53.4µs ± 0%      ~     (p=0.105 n=10+10)
FileAndServer_1KB/TLS-4       61.2µs ± 0%    61.0µs ± 0%    -0.34%  (p=0.000 n=10+10)
FileAndServer_16MB/NoTLS-4    25.1ms ± 1%     3.8ms ± 5%   -84.92%  (p=0.000 n=8+10)
FileAndServer_16MB/TLS-4      33.5ms ± 4%    13.4ms ± 1%   -60.20%  (p=0.000 n=10+9)
FileAndServer_64MB/NoTLS-4     108ms ± 7%      16ms ± 1%   -84.81%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-4       130ms ± 3%      54ms ± 1%   -58.63%  (p=0.000 n=10+10)

name                        old speed      new speed      delta
FileAndServer_1KB/NoTLS-4   19.2MB/s ± 0%  19.2MB/s ± 0%      ~     (p=0.127 n=10+10)
FileAndServer_1KB/TLS-4     16.7MB/s ± 0%  16.8MB/s ± 0%    +0.34%  (p=0.000 n=10+10)
FileAndServer_16MB/NoTLS-4   669MB/s ± 1%  4442MB/s ± 5%  +564.20%  (p=0.000 n=8+10)
FileAndServer_16MB/TLS-4     500MB/s ± 4%  1257MB/s ± 1%  +151.15%  (p=0.000 n=10+9)
FileAndServer_64MB/NoTLS-4   624MB/s ± 6%  4103MB/s ± 1%  +557.60%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-4     518MB/s ± 3%  1252MB/s ± 1%  +141.72%  (p=0.000 n=10+10)

name                        old alloc/op   new alloc/op   delta
FileAndServer_1KB/NoTLS-4     4.94kB ± 0%    5.03kB ± 0%    +1.80%  (p=0.000 n=10+10)
FileAndServer_1KB/TLS-4       5.07kB ± 0%    6.11kB ± 0%   +20.44%  (p=0.000 n=10+10)
FileAndServer_16MB/NoTLS-4    5.91kB ± 7%    5.10kB ± 0%   -13.76%  (p=0.000 n=10+9)
FileAndServer_16MB/TLS-4       141kB ± 2%      74kB ± 0%   -47.60%  (p=0.000 n=10+9)
FileAndServer_64MB/NoTLS-4    9.73kB ±21%    5.55kB ± 0%   -42.92%  (p=0.000 n=10+8)
FileAndServer_64MB/TLS-4       551kB ± 0%     184kB ± 0%   -66.54%  (p=0.000 n=10+8)

name                        old allocs/op  new allocs/op  delta
FileAndServer_1KB/NoTLS-4       66.0 ± 0%      70.0 ± 0%    +6.06%  (p=0.000 n=10+10)
FileAndServer_1KB/TLS-4         70.0 ± 0%      71.0 ± 0%    +1.43%  (p=0.000 n=10+10)
FileAndServer_16MB/NoTLS-4      67.7 ± 2%      70.0 ± 0%    +3.40%  (p=0.000 n=10+10)
FileAndServer_16MB/TLS-4       4.19k ± 0%     1.11k ± 0%   -73.59%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-4      76.1 ± 5%      71.0 ± 0%    -6.70%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-4       16.6k ± 0%      4.2k ± 0%   -74.45%  (p=0.000 n=10+8)

darwin/amd64:

name                        old time/op    new time/op    delta
FileAndServer_1KB/NoTLS-8     94.0µs ± 4%    92.2µs ± 0%      ~     (p=0.173 n=10+8)
FileAndServer_1KB/TLS-8        102µs ± 1%      98µs ± 1%    -4.27%  (p=0.000 n=9+8)
FileAndServer_16MB/NoTLS-8    88.7ms ±24%     9.5ms ± 1%   -89.33%  (p=0.000 n=9+9)
FileAndServer_16MB/TLS-8      55.9ms ±16%    17.3ms ± 2%   -69.06%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-8     368ms ±14%      38ms ± 2%   -89.66%  (p=0.000 n=9+9)
FileAndServer_64MB/TLS-8       205ms ± 5%      71ms ± 5%   -65.49%  (p=0.000 n=8+9)

name                        old speed      new speed      delta
FileAndServer_1KB/NoTLS-8   10.9MB/s ± 3%  11.1MB/s ± 0%      ~     (p=0.163 n=10+8)
FileAndServer_1KB/TLS-8     10.0MB/s ± 1%  10.5MB/s ± 1%    +4.46%  (p=0.000 n=9+8)
FileAndServer_16MB/NoTLS-8   193MB/s ±23%  1772MB/s ± 1%  +820.31%  (p=0.000 n=9+9)
FileAndServer_16MB/TLS-8     303MB/s ±15%   969MB/s ± 2%  +219.39%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-8   183MB/s ±13%  1763MB/s ± 2%  +861.46%  (p=0.000 n=9+9)
FileAndServer_64MB/TLS-8     327MB/s ± 6%   948MB/s ± 5%  +189.73%  (p=0.000 n=8+9)

name                        old alloc/op   new alloc/op   delta
FileAndServer_1KB/NoTLS-8     4.96kB ± 0%    6.03kB ± 0%   +21.48%  (p=0.000 n=10+10)
FileAndServer_1KB/TLS-8       5.10kB ± 0%    6.15kB ± 0%   +20.46%  (p=0.000 n=10+10)
FileAndServer_16MB/NoTLS-8    7.75kB ± 5%   38.60kB ± 0%  +398.24%  (p=0.000 n=9+10)
FileAndServer_16MB/TLS-8       136kB ± 2%      76kB ± 1%   -44.55%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-8    19.3kB ±66%    39.7kB ± 2%  +105.74%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-8       527kB ± 2%     188kB ± 1%   -64.31%  (p=0.000 n=10+10)

name                        old allocs/op  new allocs/op  delta
FileAndServer_1KB/NoTLS-8       66.0 ± 0%      68.0 ± 0%    +3.03%  (p=0.000 n=10+10)
FileAndServer_1KB/TLS-8         70.0 ± 0%      71.0 ± 0%    +1.43%  (p=0.000 n=10+10)
FileAndServer_16MB/NoTLS-8      72.2 ± 2%      69.0 ± 0%    -4.43%  (p=0.000 n=10+9)
FileAndServer_16MB/TLS-8       3.95k ± 1%     1.14k ± 0%   -71.25%  (p=0.000 n=10+10)
FileAndServer_64MB/NoTLS-8      95.6 ±24%      71.8 ± 2%   -24.90%  (p=0.000 n=10+10)
FileAndServer_64MB/TLS-8       15.6k ± 1%      4.3k ± 1%   -72.29%  (p=0.000 n=10+9)

Thanks again @odeke-em and everyone else for the help in all of this!

@vancluever

Contributor Author

commented Mar 7, 2019

PS: It doesn't look like #30424 was closed correctly after the benchmarks were merged. I can close it myself, unless it's still needed for some reason or automation will reap it later.

@odeke-em odeke-em closed this Mar 7, 2019
