New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: investigate and fix uncaught allocations and regressions #25482

Open
odeke-em opened this Issue May 21, 2018 · 9 comments

Comments

Projects
None yet
5 participants
@odeke-em
Member

odeke-em commented May 21, 2018

I just made a fix for https://golang.org/issue/25383 with https://go-review.googlesource.com/c/go/+/113996 and that CL just puts a bandaid on the issue which was a regression that was manually noticed by @Quasilyte.

The real issue as raised by @bradfitz is that there have been a bunch of allocations and regressions that have crept into net/http code over the past couple of months/years, that are more concerning.

This issue is to track the mentioned the problem raised. I am currently too swamped to comprehensively do investigative and performance work but if anyone would like to take on this issue or would like to work on this for Go1.11 or perhaps during Go1.12 or so, I would be very happy to help out whether pairing or as a "bounty", please feel free to reach out to me.

@mvdan

This comment has been minimized.

Member

mvdan commented May 21, 2018

Are there any considerable regressions from Go 1.10 to 1.11? If not, it seems safest and easiest to do this when the tree reopens.

@bradfitz

This comment has been minimized.

Member

bradfitz commented May 21, 2018

First, somebody should benchmark the mux for Go 1.10 and tip. If it's really bad, we can investigate and do a minimal perf fix for Go 1.11. Otherwise this is all Go 1.12 material.

I will assume it's Go 1.12 for now, but somebody should still benchmark it soon.

@bradfitz bradfitz added this to the Go1.12 milestone May 21, 2018

@odeke-em

This comment has been minimized.

Member

odeke-em commented May 21, 2018

@mvdan great question! So, so far what prompted this issue was #25383 which was manually noticed. Roger that, perhaps when the tree opens.

Perhaps we could even make tooling that runs benchmarks as part of the Trybots and reports when there have been huge fluctuations say |8%| I have worked on some internal tool that does that, but I'll need to get the time to package it

@bradfitz

This comment has been minimized.

Member

bradfitz commented May 21, 2018

Perhaps we could even make tooling that runs benchmarks as part of the Trybots and reports when there have been huge fluctuations say |8%| I have worked on some internal tool that does that, but I'll need to get the time to package it

That is not a small project.

It's been done twice and abandoned twice.

First was #8930 (comment) and second (current) is #20552.

@odeke-em

This comment has been minimized.

Member

odeke-em commented May 22, 2018

Roger that, thanks for the cautionary tale @bradfitz!

Alright, so for some results, I ran all the net/http benchmarks on both Go1.10 and tip devel 87a18c6 with the results at [1]

My machine's information

Darwin Emmanuels-MacBook-Pro-2.local 15.6.0 Darwin Kernel Version 15.6.0: Tue Jan  9 20:12:05 PST 2018; root:xnu-3248.73.5~1/RELEASE_X86_64 x86_64 i386

Synopsis

The ones that sorely stick out:

  • Time/op
name old time/op new time/op delta samples
ServeMux-4 78.9µs ± 2% 91.0µs ± 1% +15.37% (p=0.029 n=4+4)

and the above matches the mux regression report, but with an addition ~5% regression on tip, so in general: ~15% regression since Go1.10, ~10% regression on tip since before 2fd1b52, at least on my machine

  • Allocs/op
name old allocs/op new allocs/op delta samples
CookieString-4 3.00 ± 0% 5.00 ± 0% +66.67% (p=0.029 n=4+4)

All results

[1] with a full gist of results at https://gist.github.com/odeke-em/3a6c95c4aa74d15619ab95728337e615.

$ benchstat benchmarks-go1.10.txt benchmarks-go-devel-87a18c6.txt
name                               old time/op    new time/op    delta
Client-4                              130µs ± 5%     125µs ± 8%     ~             (p=0.343 n=4+4)
ClientServer-4                        104µs ± 7%     107µs ± 2%     ~             (p=0.343 n=4+4)
ClientServerParallel4-4             1.26ms ±254%   0.42ms ±100%     ~             (p=1.000 n=4+4)
ClientServerParallel64-4              147µs ±69%     197µs ±34%     ~             (p=0.686 n=4+4)
ClientServerParallelTLS4-4            168µs ±30%     157µs ±22%     ~             (p=0.886 n=4+4)
ClientServerParallelTLS64-4          58.9µs ±32%   62.4µs ±122%     ~             (p=1.000 n=4+4)
CloseNotifier-4                       220µs ±15%     209µs ± 5%     ~             (p=0.486 n=4+4)
CookieString-4                        951ns ± 1%     982ns ± 1%   +3.31%          (p=0.029 n=4+4)
HeaderWriteSubset-4                   558ns ± 1%     528ns ± 2%   -5.29%          (p=0.029 n=4+4)
ReadCookies-4                        3.61µs ± 1%    3.69µs ± 1%   +2.29%          (p=0.029 n=4+4)
ReadRequestApachebench-4             1.63µs ± 1%    1.66µs ± 1%   +1.36%          (p=0.029 n=4+4)
ReadRequestChrome-4                  2.95µs ± 1%    2.92µs ± 1%     ~             (p=0.343 n=4+4)
ReadRequestCurl-4                    1.62µs ± 0%    1.65µs ± 1%   +1.84%          (p=0.029 n=4+4)
ReadRequestSiege-4                   2.11µs ± 1%    2.13µs ± 0%     ~             (p=0.343 n=4+4)
ReadRequestWrk-4                     1.14µs ± 1%    1.17µs ± 1%   +2.26%          (p=0.029 n=4+4)
ReadSetCookies-4                     3.24µs ± 1%    3.29µs ± 1%     ~             (p=0.143 n=4+4)
ResponseStatusLine-4                 35.1ns ± 2%    37.9ns ± 6%     ~             (p=0.086 n=4+4)
ServeMux-4                           78.9µs ± 2%    91.0µs ± 1%  +15.37%          (p=0.029 n=4+4)
Server-4                              131µs ± 3%     131µs ± 0%     ~             (p=0.343 n=4+4)
ServerFakeConnNoKeepAlive-4          18.9µs ± 1%    18.8µs ± 0%     ~             (p=0.486 n=4+4)
ServerFakeConnWithKeepAlive-4        10.8µs ± 5%    10.3µs ±12%     ~             (p=0.200 n=4+4)
ServerFakeConnWithKeepAliveLite-4    6.87µs ± 2%    6.83µs ± 1%     ~             (p=0.800 n=4+4)
ServerHandlerNoHeader-4              6.47µs ± 2%    6.60µs ± 2%     ~             (p=0.057 n=4+4)
ServerHandlerNoLen-4                 7.75µs ± 4%    7.58µs ± 1%     ~             (p=0.057 n=4+4)
ServerHandlerNoType-4                7.85µs ± 1%    7.82µs ± 1%     ~             (p=0.200 n=4+4)
ServerHandlerTypeLen-4               8.35µs ± 2%    8.17µs ± 2%     ~             (p=0.086 n=4+4)
ServerHijack-4                       20.3µs ± 0%    20.5µs ± 1%   +1.21%          (p=0.029 n=4+4)

name                               old alloc/op   new alloc/op   delta
Client-4                             3.49kB ± 0%    3.49kB ± 0%     ~             (p=1.000 n=4+4)
ClientServer-4                       4.98kB ± 0%    4.98kB ± 0%     ~             (p=0.429 n=4+4)
ClientServerParallel4-4              7.92kB ± 7%    7.93kB ± 2%     ~             (p=0.486 n=4+4)
ClientServerParallel64-4             4.66kB ±12%    4.51kB ± 3%     ~             (p=0.343 n=4+4)
ClientServerParallelTLS4-4           18.3kB ±28%    20.4kB ±21%     ~             (p=0.343 n=4+4)
ClientServerParallelTLS64-4          4.98kB ±11%    4.78kB ± 6%     ~             (p=0.686 n=4+4)
CloseNotifier-4                      3.32kB ± 0%    3.31kB ± 0%     ~             (p=0.229 n=4+4)
CookieString-4                         384B ± 0%      360B ± 0%   -6.25%          (p=0.029 n=4+4)
HeaderWriteSubset-4                  0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
ReadCookies-4                        2.01kB ± 0%    2.01kB ± 0%     ~     (all samples are equal)
ReadRequestApachebench-4               883B ± 0%      883B ± 0%     ~     (all samples are equal)
ReadRequestChrome-4                  1.46kB ± 0%    1.46kB ± 0%     ~     (all samples are equal)
ReadRequestCurl-4                      880B ± 0%      880B ± 0%     ~     (all samples are equal)
ReadRequestSiege-4                     965B ± 0%      965B ± 0%     ~     (all samples are equal)
ReadRequestWrk-4                       832B ± 0%      832B ± 0%     ~     (all samples are equal)
ReadSetCookies-4                       960B ± 0%      960B ± 0%     ~     (all samples are equal)
ResponseStatusLine-4                 0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
ServeMux-4                           19.9kB ± 0%    19.9kB ± 0%     ~     (all samples are equal)
Server-4                             2.24kB ± 0%    2.24kB ± 0%     ~             (p=0.286 n=4+4)
ServerFakeConnNoKeepAlive-4          4.64kB ± 0%    4.59kB ± 0%   -1.09%          (p=0.029 n=4+4)
ServerFakeConnWithKeepAlive-4        2.44kB ± 0%    2.44kB ± 0%     ~     (all samples are equal)
ServerFakeConnWithKeepAliveLite-4    1.30kB ± 0%    1.30kB ± 0%     ~     (all samples are equal)
ServerHandlerNoHeader-4              1.30kB ± 0%    1.30kB ± 0%     ~     (all samples are equal)
ServerHandlerNoLen-4                 2.08kB ± 0%    2.08kB ± 0%     ~     (all samples are equal)
ServerHandlerNoType-4                2.08kB ± 0%    2.08kB ± 0%     ~     (all samples are equal)
ServerHandlerTypeLen-4               2.12kB ± 0%    2.12kB ± 0%     ~     (all samples are equal)
ServerHijack-4                       16.1kB ± 0%    16.0kB ± 0%   -0.30%          (p=0.029 n=4+4)

name                               old allocs/op  new allocs/op  delta
Client-4                               49.0 ± 0%      49.0 ± 0%     ~     (all samples are equal)
ClientServer-4                         65.0 ± 0%      65.0 ± 0%     ~     (all samples are equal)
ClientServerParallel4-4                79.5 ± 8%      76.5 ± 2%     ~             (p=0.114 n=4+4)
ClientServerParallel64-4               64.5 ± 5%      63.2 ± 1%     ~             (p=0.229 n=4+4)
ClientServerParallelTLS4-4              196 ±26%       211 ±19%     ~             (p=0.686 n=4+4)
ClientServerParallelTLS64-4            69.2 ± 7%      66.8 ± 4%     ~             (p=0.543 n=4+4)
CloseNotifier-4                        54.0 ± 0%      50.0 ± 0%   -7.41%          (p=0.029 n=4+4)
CookieString-4                         3.00 ± 0%      5.00 ± 0%  +66.67%          (p=0.029 n=4+4)
HeaderWriteSubset-4                   0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
ReadCookies-4                          16.0 ± 0%      16.0 ± 0%     ~     (all samples are equal)
ReadRequestApachebench-4               9.00 ± 0%      9.00 ± 0%     ~     (all samples are equal)
ReadRequestChrome-4                    14.0 ± 0%      14.0 ± 0%     ~     (all samples are equal)
ReadRequestCurl-4                      9.00 ± 0%      9.00 ± 0%     ~     (all samples are equal)
ReadRequestSiege-4                     11.0 ± 0%      11.0 ± 0%     ~     (all samples are equal)
ReadRequestWrk-4                       7.00 ± 0%      7.00 ± 0%     ~     (all samples are equal)
ReadSetCookies-4                       17.0 ± 0%      17.0 ± 0%     ~     (all samples are equal)
ResponseStatusLine-4                  0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
ServeMux-4                              540 ± 0%       540 ± 0%     ~     (all samples are equal)
Server-4                               20.0 ± 0%      20.0 ± 0%     ~     (all samples are equal)
ServerFakeConnNoKeepAlive-4            48.0 ± 0%      49.0 ± 0%   +2.08%          (p=0.029 n=4+4)
ServerFakeConnWithKeepAlive-4          22.0 ± 0%      22.0 ± 0%     ~     (all samples are equal)
ServerFakeConnWithKeepAliveLite-4      12.0 ± 0%      12.0 ± 0%     ~     (all samples are equal)
ServerHandlerNoHeader-4                12.0 ± 0%      12.0 ± 0%     ~     (all samples are equal)
ServerHandlerNoLen-4                   17.0 ± 0%      17.0 ± 0%     ~     (all samples are equal)
ServerHandlerNoType-4                  18.0 ± 0%      18.0 ± 0%     ~     (all samples are equal)
ServerHandlerTypeLen-4                 20.0 ± 0%      20.0 ± 0%     ~     (all samples are equal)
ServerHijack-4                         46.0 ± 0%      47.0 ± 0%   +2.17%          (p=0.029 n=4+4)

name                               old speed      new speed      delta
ReadRequestApachebench-4           50.2MB/s ± 1%  49.5MB/s ± 1%   -1.33%          (p=0.029 n=4+4)
ReadRequestChrome-4                 207MB/s ± 1%   209MB/s ± 1%     ~             (p=0.343 n=4+4)
ReadRequestCurl-4                  48.1MB/s ± 0%  47.2MB/s ± 1%   -1.81%          (p=0.029 n=4+4)
ReadRequestSiege-4                 71.6MB/s ± 1%  71.0MB/s ± 0%     ~             (p=0.343 n=4+4)
ReadRequestWrk-4                   35.0MB/s ± 1%  34.3MB/s ± 1%   -2.22%          (p=0.029 n=4+4)
@odeke-em

This comment has been minimized.

Member

odeke-em commented Jun 5, 2018

Alright, I've also just discovered another regression from CookieString where we switched to using strings.Builder in commit 48db2c0#diff-44567335da7512324c74b50e5fcb3def aka CL https://go-review.googlesource.com/102479

The results are wild for CookieString with both ns/op and allocs/op increasing but more radically allocs/op increasing by 40%

Before

BenchmarkCookieString-4   	 1000000	      1055 ns/op	     384 B/op	       3 allocs/op

After

BenchmarkCookieString-4   	 1000000	      1086 ns/op	     360 B/op	       5 allocs/op

While I don't think Cookie.String is used a lot; for the niche cases of servers that log cookies per request, this is a huge incurred cost, thus I propose that for Go1.11 we should roll back the CookieString change. It worked well before and am not sure that the change to use strings.Builder provides any better readability improvements either. Paging @bradfitz and @ianlancetaylor as per the reported CL

@Quasilyte

This comment has been minimized.

Contributor

Quasilyte commented Jun 5, 2018

@odeke-em, one such regression was fixed by choosing right strings.Builder preallocation size.
See CL113235.

@odeke-em

This comment has been minimized.

Member

odeke-em commented Jun 5, 2018

@Quasilyte nice, but that required some specialized skills and scrutiny #LiamNeeson
However, I am not sure that using preallocation in net/http.Cookie.String is that intuitive and can be easily reasoned about.

@gopherbot

This comment has been minimized.

gopherbot commented Jun 5, 2018

Change https://golang.org/cl/113996 mentions this issue: net/http: fix a few regressions since Go1.10

gopherbot pushed a commit that referenced this issue Jun 21, 2018

net/http: avoid deferred unlock in ServeMux.shouldRedirect
CL 96575 introduced concurrency protection for
ServeMux.shouldRedirect with a read lock and deferred unlock.
However, the change produced a noticeable regression.
Instead add the suffix "RLocked" to the function name to
declare that we should hold the read lock as a pre-requisite
before calling it, hence avoiding the defer altogether.

Benchmarks:
name                  old time/op    new time/op    delta
ServeMux-8              63.3µs ± 0%    54.6µs ± 0%  -13.74%  (p=0.000 n=9+9)
ServeMux_SkipServe-8    41.4µs ± 2%    32.7µs ± 1%  -21.05%  (p=0.000 n=10+10)

name                  old alloc/op   new alloc/op   delta
ServeMux-8              17.3kB ± 0%    17.3kB ± 0%     ~     (all equal)
ServeMux_SkipServe-8     0.00B          0.00B          ~     (all equal)

name                  old allocs/op  new allocs/op  delta
ServeMux-8                 360 ± 0%       360 ± 0%     ~     (all equal)
ServeMux_SkipServe-8      0.00           0.00          ~     (all equal)

Updates #25383
Updates #25482

Change-Id: I2ffa4eafe165faa961ce23bd29b5653a89facbc2
Reviewed-on: https://go-review.googlesource.com/113996
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment