
x/net/http2: client can't fully utilize network resource #37373

Open
ochanism opened this issue Feb 22, 2020 · 10 comments

@ochanism ochanism commented Feb 22, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.6 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/jayden/Library/Caches/go-build"
GOENV="/Users/jayden/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/jayden/Workspace/GoProject"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/jayden/Workspace/GoProject/src/golang.org/x/net/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/j1/94761d9d49j6gz5p55375sbr0000gn/T/go-build875938892=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I ran an HTTP/2 WebDAV server and wrote a test client with golang.org/x/net/http2.
The client uploaded a 1 MB file in an infinite loop from a few hundred goroutines.
The server's MaxConcurrentStreams was 100 and the client's StrictMaxConcurrentStreams was false.
I measured the throughput between the server and the client on AWS.
The network bandwidth between the server and the client was 150 Gbps at the infrastructure level.
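For reference, a minimal sketch of the kind of load generator described above; the server URL, goroutine count, and TLS settings are placeholders, not the actual test code:

```go
package main

import (
	"bytes"
	"crypto/tls"
	"io"
	"io/ioutil"
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	payload := bytes.Repeat([]byte("x"), 1<<20) // 1 MB request body

	tr := &http2.Transport{
		// false (the default): dial extra TCP connections instead of queueing
		// requests once a connection hits the server's
		// SETTINGS_MAX_CONCURRENT_STREAMS limit.
		StrictMaxConcurrentStreams: false,
		TLSClientConfig:            &tls.Config{InsecureSkipVerify: true}, // test-only
	}
	client := &http.Client{Transport: tr}

	for i := 0; i < 400; i++ { // a few hundred uploader goroutines
		go func() {
			for {
				req, err := http.NewRequest("PUT",
					"https://webdav.example.internal/upload/file.bin",
					bytes.NewReader(payload))
				if err != nil {
					log.Fatal(err)
				}
				resp, err := client.Do(req)
				if err != nil {
					log.Print(err)
					continue
				}
				io.Copy(ioutil.Discard, resp.Body)
				resp.Body.Close()
			}
		}()
	}
	select {} // run until interrupted
}
```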

What did you expect to see?

I expected that the throughput would be around 150 Gbps.

What did you see instead?

The measured throughput was 90 Gbps.

Proposal

I figured out what caused this. I modified the http2 connection pool slightly, as shown in the link below.
(ochanism/net@b2cc93d)
With that change, I could achieve 150 Gbps throughput.
Achieving high throughput requires multiple TCP connections,
but the current http2 connection pool adds a TCP connection only when no existing connection is available (every connection has reached MaxConcurrentStreams).
I reduced the MaxConcurrentStreams value on the server side, but the number of TCP connections didn't increase as much as I expected.

So I added a MinConcurrentConns field to the http2 Transport.
MinConcurrentConns is the minimum number of TCP connections; ClientConnPool tries to maintain at least that many connections. If it is 0, ClientConnPool creates a TCP connection only when needed.
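A hypothetical usage sketch of the proposed field; MinConcurrentConns does not exist in upstream golang.org/x/net/http2 and only compiles against the patched fork linked above:

```go
package client

import (
	"net/http"

	"golang.org/x/net/http2"
)

// newClient builds a client against the patched fork (ochanism/net).
// MinConcurrentConns is the field proposed in this issue, not upstream API.
func newClient() *http.Client {
	return &http.Client{
		Transport: &http2.Transport{
			StrictMaxConcurrentStreams: false,
			MinConcurrentConns:         8, // pool tries to keep at least 8 TCP connections open
		},
	}
}
```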

Please review my proposal.

@fraenkel fraenkel commented Feb 22, 2020

Curious if you just tried to walk backwards through the connections?

@ochanism ochanism commented Feb 22, 2020

> Curious if you just tried to walk backwards through the connections?

@fraenkel
Could you elaborate on your question?

@fraenkel fraenkel commented Feb 22, 2020

@ochanism Your fix was to randomly start somewhere in the list. The current code walks forward. Since connections are added to the end, in most cases, the next connection with availability will be the last one. Obviously it all depends on which streams in which connections are closed.
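For illustration only (this is not the actual x/net/http2 pool code), the two selection strategies under discussion might look roughly like this:

```go
package pool

import (
	"math/rand"

	"golang.org/x/net/http2"
)

// pickForward mimics the current behaviour: walk the list from the front and
// take the first connection that can accept a new stream.
func pickForward(conns []*http2.ClientConn) *http2.ClientConn {
	for _, cc := range conns {
		if cc.CanTakeNewRequest() {
			return cc
		}
	}
	return nil // caller dials a new connection
}

// pickRandomStart mimics the patched behaviour: start at a random offset so
// streams are spread across the pool instead of piling onto one connection.
func pickRandomStart(conns []*http2.ClientConn, rng *rand.Rand) *http2.ClientConn {
	if len(conns) == 0 {
		return nil
	}
	start := rng.Intn(len(conns))
	for i := range conns {
		if cc := conns[(start+i)%len(conns)]; cc.CanTakeNewRequest() {
			return cc
		}
	}
	return nil
}
```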

Ultimately, the bigger issue is that there is no way to configure your new option from net/http. If this is only for use from net/http2 directly, it's less of an issue.

It's also related to #34511.

@ochanism ochanism commented Feb 22, 2020

@fraenkel
The Client in net/http doesn't have the issue I raised, since the number of TCP connections kept in its pool can be adjusted via MaxIdleConns. However, there is no way to control the number of TCP connections in net/http2, so I added MinConcurrentConns to control it.
Moreover, walking the connections backwards cannot resolve this issue. In that case almost every request would select the last connection, and the remaining connections in the pool would expire due to the idle timeout. So I made the client randomly select a connection within the MinConcurrentConns window to prevent that expiration.
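For comparison, a sketch of the pool-size knobs net/http exposes for HTTP/1.x (the values are arbitrary examples); x/net/http2's Transport has no equivalent field today:

```go
package pool

import (
	"net/http"
	"time"
)

// newHTTP1Client shows the HTTP/1.x connection-pool knobs on net/http.Transport.
func newHTTP1Client() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			MaxIdleConns:        100,              // total idle connections kept across all hosts
			MaxIdleConnsPerHost: 10,               // idle connections kept per host
			IdleConnTimeout:     90 * time.Second, // idle connections expire after this
		},
	}
}
```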

@fraenkel fraenkel commented Feb 22, 2020

Your selection will have similar issues in practice. Random selection won't optimize network traffic across multiple connections. The underlying locking is a bit loose and writes currently lock reads, which means that depending on how loaded a connection is (number of streams) and how much data is being sent back, the network may appear to be utilized while you are still dealing with lots of contention.

Also, creating a new rand for each request provides no guarantees about how the selection is distributed across the connections.
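For illustration, a sketch of the difference being pointed out, assuming time-based seeding (the thread does not confirm how the patch seeds its rand; names are illustrative):

```go
package pool

import (
	"math/rand"
	"sync"
	"time"
)

// pickPerRequest seeds a fresh rand.Rand per call; calls that land on the
// same timestamp get the same seed and therefore the same "random" pick.
func pickPerRequest(n int) int {
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	return r.Intn(n)
}

// A single shared source gives one proper pseudo-random sequence; rand.Rand
// is not safe for concurrent use, so it is guarded by a mutex.
var (
	mu  sync.Mutex
	rng = rand.New(rand.NewSource(time.Now().UnixNano()))
)

func pickShared(n int) int {
	mu.Lock()
	defer mu.Unlock()
	return rng.Intn(n)
}
```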

@ochanism ochanism commented Feb 22, 2020

@fraenkel
Yes, you're right. All real-world random algorithms have the same problem. The goal of my proposal is not to utilize network resources optimally. The main issue with the current http2 client is that it uses a small number of TCP connections even under heavy traffic (large payloads rather than a high request rate). I could make it optimal, but I don't want to make it complex. I think the random algorithm performs well for generic use cases; of course, as you mentioned, there could be issues in some edge cases.

@cagedmantis cagedmantis added this to the Backlog milestone Feb 27, 2020
@ochanism ochanism commented Mar 29, 2020

@cagedmantis @bradfitz @tombergan
Please give me some feedback.
I've been waiting for your review for a month.

@bradfitz bradfitz commented Mar 30, 2020

It sounds like a knob we wouldn't want to expose to users.

How would you document how to tune it?

I'd rather just see it be automatically fast instead.

@ochanism ochanism commented Mar 31, 2020

> It sounds like a knob we wouldn't want to expose to users.
>
> How would you document how to tune it?
>
> I'd rather just see it be automatically fast instead.

@bradfitz
I agree. It would be very nice if the value were adjusted automatically according to the current traffic pattern, but as you know, estimating and understanding traffic patterns is not an easy problem; many variables are involved.

I think the MinConcurrentConns field plays a similar role to the MaxIdleConns field in the HTTP transport (not exactly the same, but similar).
Normally, most users won't need to care about the MinConcurrentConns value; when MinConcurrentConns = 0 it works as before. But some users will face the same low-throughput problem I did, and they will need to find an appropriate MinConcurrentConns value for their traffic pattern.

I've been running a server with this patched library in production, and there was a huge gain in throughput. With the vanilla net/http library, an HTTP client uses only one or two TCP connections even under heavy traffic. When I set StrictMaxConcurrentStreams to false, I expected the number of TCP connections to increase under burst traffic, but it didn't work as I expected.

What do you think about this? Do you have any idea?
