
Endpoints object in kubemark can only have a single backend! #59823

Closed
shyamjvs opened this issue Feb 13, 2018 · 8 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

Comments

@shyamjvs
Member

Recently, profile collection was added to our scalability tests. While running the kubemark-500 presubmit, @wojtek-t noticed from the memory allocation profile that an unusually large amount of allocations (~10GB) was being made by k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/errors.aggregate.Error() during the load test. See - https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-kubemark-500-gce/11932/artifacts/profiles/ApiserverMemoryProfile_load.pdf

After he added some logging, I took a look at the apiserver logs, and there is a huge number of such errors:

E0213 15:42:17.454988       7 errors.go:207] AAA (Endpoints load-medium-15-svc) subsets[0].addresses[0].nodeName: Forbidden: Cannot change NodeName for 2.3.4.5 to hollow-node-xnsk2
E0213 15:42:17.482183       7 errors.go:207] AAA (Endpoints load-medium-15-svc) [subsets[0].addresses[0].nodeName: Forbidden: Cannot change NodeName for 2.3.4.5 to hollow-node-xnsk2, subsets[0].addresses[1].nodeName: Forbidden: Cannot change NodeName for 2.3.4.5 to hollow-node-xbnqv]
E0213 15:42:17.549520       7 errors.go:207] AAA (Endpoints load-medium-15-svc) [subsets[0].addresses[0].nodeName: Forbidden: Cannot change NodeName for 2.3.4.5 to hollow-node-xnsk2, subsets[0].addresses[1].nodeName: Forbidden: Cannot change NodeName for 2.3.4.5 to hollow-node-xbnqv, subsets[0].addresses[3].nodeName: Forbidden: Cannot change NodeName for 2.3.4.5 to hollow-node-tx2x9]
E0213 15:42:17.552956       7 errors.go:207] AAA (Endpoints load-medium-15-svc) [subsets[0].addresses[0].nodeName: Forbidden: Cannot change NodeName for 2.3.4.5 to hollow-node-ztq89, subsets[0].addresses[1].nodeName: Forbidden: Cannot change NodeName for 2.3.4.5 to hollow-node-xnsk2, subsets[0].addresses[2].nodeName: Forbidden: Cannot change NodeName for 2.3.4.5 to hollow-node-xbnqv, subsets[0].addresses[4].nodeName: Forbidden: Cannot change NodeName for 2.3.4.5 to hollow-node-tx2x9]

On digging a bit, I observed a few things:

  • These errors seem to be arising from PUT endpoints calls made by the endpoints controller
  • The first PUT call for every endpoints object succeeds (i.e. 200), and all the subsequent PUTs for it fail with 422
  • Because of the above, we never go above a single backend in the endpoints object (I verified this from a live kubemark cluster)
  • These error messages grow linearly as more and more backends are added to the endpoints object. As a result, we're seeing cumulative O(N^2) allocations for the error messages, where N is the number of backends in the service.
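The quadratic blow-up in the last bullet can be sketched with a toy model in Go (this is not the actual apimachinery aggregate code, just an illustration of the growth): if the k-th failed PUT aggregates one "Cannot change NodeName" clause per conflicting address, the cumulative bytes allocated across N attempts grow as O(N^2).

```go
package main

import (
	"fmt"
	"strings"
)

// messageFor builds an aggregate error message for an endpoints object that
// already has k conflicting addresses, mimicking the shape of the logged
// errors above (one "Cannot change NodeName" clause per address).
func messageFor(k int) string {
	clauses := make([]string, k)
	for i := 0; i < k; i++ {
		clauses[i] = fmt.Sprintf(
			"subsets[0].addresses[%d].nodeName: Forbidden: Cannot change NodeName for 2.3.4.5 to hollow-node-%d",
			i, i)
	}
	return "[" + strings.Join(clauses, ", ") + "]"
}

// totalBytes sums the message sizes over n successive failed PUTs: each
// message is O(k) long, so the sum is O(n^2).
func totalBytes(n int) int {
	total := 0
	for k := 1; k <= n; k++ {
		total += len(messageFor(k))
	}
	return total
}

func main() {
	// Doubling n roughly quadruples the cumulative allocation.
	fmt.Println(totalBytes(100), totalBytes(200))
}
```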

@kubernetes/sig-scalability-misc @wojtek-t

@k8s-ci-robot k8s-ci-robot added the sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. label Feb 13, 2018
@shyamjvs
Member Author

/assign

I'll take a further look into this.

@shyamjvs shyamjvs added the kind/bug Categorizes issue or PR as related to a bug. label Feb 13, 2018
@shyamjvs
Member Author

I guess I found the reason. There is a validation step in the apiserver for endpoints objects, which checks that an IP entry in the endpoints object cannot have its nodeName overwritten. And this is precisely what is violated in kubemark (i.e. we're trying to overwrite the nodeName for an IP).

This is because in kubemark, we're always using the same constant IP address for all our fake pods (which is 2.3.4.5), since the fake docker client sets a constant value for it (see - https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/libdocker/fake_client.go#L600). This is causing those clashes.
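The clash can be seen with a simplified sketch of the validation rule (this is not the actual apiserver validation code; the type and function below are pared-down stand-ins): on an endpoints update, an address IP that already exists may not switch to a different non-empty NodeName, so every backend after the first is rejected when all pods share 2.3.4.5.

```go
package main

import "fmt"

// EndpointAddress is a pared-down stand-in for the real v1.EndpointAddress.
type EndpointAddress struct {
	IP       string
	NodeName string
}

// validateNodeNameUnchanged is a simplified sketch of the rule being hit:
// if an address IP from the old object reappears in the new object with a
// different non-empty NodeName, the update is forbidden.
func validateNodeNameUnchanged(oldAddrs, newAddrs []EndpointAddress) []error {
	nodeByIP := map[string]string{}
	for _, a := range oldAddrs {
		nodeByIP[a.IP] = a.NodeName
	}
	var errs []error
	for i, a := range newAddrs {
		if prev, ok := nodeByIP[a.IP]; ok && prev != "" && prev != a.NodeName {
			errs = append(errs, fmt.Errorf(
				"subsets[0].addresses[%d].nodeName: Forbidden: Cannot change NodeName for %s to %s",
				i, a.IP, a.NodeName))
		}
	}
	return errs
}

func main() {
	// Every kubemark pod reports the same fake IP, so adding a second
	// backend looks like a NodeName change for 2.3.4.5 and is rejected.
	oldAddrs := []EndpointAddress{{IP: "2.3.4.5", NodeName: "hollow-node-xnsk2"}}
	newAddrs := []EndpointAddress{
		{IP: "2.3.4.5", NodeName: "hollow-node-xnsk2"},
		{IP: "2.3.4.5", NodeName: "hollow-node-xbnqv"},
	}
	for _, err := range validateNodeNameUnchanged(oldAddrs, newAddrs) {
		fmt.Println(err)
	}
}
```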

@shyamjvs
Member Author

Note that this means all this while we haven't actually been populating endpoints objects in kubemark. And this now explains why we've been seeing such a difference in PUT endpoints call latencies across kubemark and real clusters in our performance benchmark:

W0213 18:24:49.028] load      PUT     endpoints                            namespace  Perc99      AvgL/R=5.60   AvgL(ms)=56.73   AvgR(ms)=10.13

Ref https://storage.googleapis.com/kubernetes-jenkins/logs/ci-perf-tests-kubemark-100-benchmark/3237/build-log.txt

@shyamjvs
Member Author

shyamjvs commented Feb 13, 2018

IMO the correct solution for this is to fix our docker-client mock to actually assign different IPs to different containers. And this can be done in at least a couple of ways:

  • assign a random IP to every container (the probability of collision should be very low with 32 bits of randomness and not too many backends within a single service). Even if there are a few collisions, we should be able to live with it IMO
  • assign a proper CIDR to each hollow-node (this may require enabling the cidr-allocator for kubemark) and use that within the fake docker-client to choose IPs (probably we can just assign them serially)

I'll take up the implementation of this fix.

cc @kubernetes/sig-node-bugs

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Feb 13, 2018
@shyamjvs
Member Author

Sent out the above PR implementing the first approach, as I think it is sufficient and the second one would be overkill.

@wojtek-t
Member

@shyamjvs - great debugging!

Unfortunately your fix doesn't seem to help.

@shyamjvs
Member Author

shyamjvs commented Feb 14, 2018 via email

@shyamjvs
Member Author

Seems like it did help a bit, but not completely - #59832 (comment)
Will look further.
