Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the k8s client hits some concurrent connections limit and it takes very long time to send all updates to the discovery manager #4528

Closed
krasi-georgiev opened this Issue Aug 22, 2018 · 6 comments

Comments

Projects
None yet
4 participants
@krasi-georgiev
Copy link
Member

krasi-georgiev commented Aug 22, 2018

There was an update in the golang/net/http2 package that causes the k8s to hit some connection limit and NOT takes very long time to send full targerts update to the discovery manager.

Might be related to #4518
Found while troubleshooting #4124

the http2 update in Prometheus happened in 0f37c02

The http2 commit that causes this behaviour golang/net@1c05540

To replicate:

minikube start
kubectl apply -f https://gist.githubusercontent.com/krasi-georgiev/337a0d5a5cf728bff87c102e825425af/raw/12e6df08752f82c53182e344ed3dbc879b886c68/sd-load.yaml
  • add some debugging info in discovery/manager.go
                        m.updateGroup(poolKey, tgs)
			// m.recentlyUpdatedMtx.Lock()
			// m.recentlyUpdated = true
			// m.recentlyUpdatedMtx.Unlock()

			var total int
			for _, tt := range m.allGroups() {
				for _, tg := range tt {
					if tg.Source == "endpoints/default/sd-load" {
						total += len(tg.Targets)
					}
				}
			}
			fmt.Println(total)
  • run Prometheus with the attached config(adjust the api_server IP and cert file paths for all jobs.).
go run cmd/prometheus/main.go --config.file=../../.local/sd-load/config.yaml
  • Scale up and down the deployment and the total number of targets should be 10x current replicas which doesn't happen most of the time.
kubectl scale deployment sd-load --replicas=10
kubectl scale deployment sd-load --replicas=1

config.zip

@cofyc any pointers on where to look in our k8s implementation to increase or disable this limit?

@cofyc

This comment has been minimized.

Copy link
Contributor

cofyc commented Aug 23, 2018

I'm looking into this.

@brancz

This comment has been minimized.

Copy link
Member

brancz commented Aug 23, 2018

@sttts is it possible to set the max number of concurrent streams through client-go?

@krasi-georgiev

This comment has been minimized.

Copy link
Member Author

krasi-georgiev commented Aug 24, 2018

I have narrowed it down to the changes in the canTakeNewRequestLocked logic.
https://github.com/golang/net/blob/1c05540f6879653db88113bc4a2b70aec4bd491f/http2/transport.go#L621

my feeling is that the ClientConn reports that it can handle more connections, but when the caller actually calls the RoundTrip it fails somewhere.

@krasi-georgiev krasi-georgiev changed the title the k8s client hits the maxConcurrentStreams so it fails to send all updates to the discovery manager the k8s client hist some concurrent connections limit so it fails to send all updates to the discovery manager Aug 24, 2018

@krasi-georgiev krasi-georgiev changed the title the k8s client hist some concurrent connections limit so it fails to send all updates to the discovery manager the k8s client hits some concurrent connections limit so it fails to send all updates to the discovery manager Aug 24, 2018

@krasi-georgiev krasi-georgiev changed the title the k8s client hits some concurrent connections limit so it fails to send all updates to the discovery manager the k8s client hits some concurrent connections limit and it takes very long time to send all updates to the discovery manager Aug 27, 2018

@krasi-georgiev

This comment has been minimized.

Copy link
Member Author

krasi-georgiev commented Aug 27, 2018

the limitation is from the server. It all works as expected. when I start minikube with

minikube start --extra-config=apiserver.http2-max-streams-per-connection=10000

So I guess this change in the http2 client package changes the workflow so that it hits different execution branch.

@simonpasquier

This comment has been minimized.

Copy link
Member

simonpasquier commented Nov 19, 2018

FTR the root issue is the Go HTTP2 library and tracked at golang/go#27044. It might be addressed in go 1.12.

@simonpasquier

This comment has been minimized.

Copy link
Member

simonpasquier commented Apr 19, 2019

Closing as the root issue golang/go#27044 has been closed by https://go-review.googlesource.com/c/net/+/151857/.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.