
Core-dns pods crashing in large-cluster performance tests #68613

Open
shyamjvs opened this Issue Sep 13, 2018 · 91 comments

@shyamjvs
Member

shyamjvs commented Sep 13, 2018

Some time ago I observed in our 5k-node scalability tests that core-dns pods were crashing (we use n1-standard-1 nodes there). This was after we switched the default from kube-dns to core-dns in #67569. We need to investigate this before we cut the release.

/kind bug
/sig scalability
/sig network
/priority critical-urgent
/milestone v1.12

cc @wojtek-t @krzysied

Member

shyamjvs commented Sep 13, 2018

From our load test where we create ~8k services with a total of ~75k backends:

$ kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-779ffd89bd-25hk5   1/1     Running   20         23h
coredns-779ffd89bd-2b2kv   1/1     Running   26         23h
coredns-779ffd89bd-2f877   1/1     Running   21         23h
coredns-779ffd89bd-2k89h   1/1     Running   25         23h
coredns-779ffd89bd-2lctr   1/1     Running   22         23h
coredns-779ffd89bd-2ldqj   1/1     Running   30         23h
coredns-779ffd89bd-2mqhl   1/1     Running   27         23h
coredns-779ffd89bd-2rwk4   1/1     Running   22         23h
coredns-779ffd89bd-2x9gb   1/1     Running   26         23h
coredns-779ffd89bd-2xbtk   1/1     Running   19         23h
...
Member

shyamjvs commented Sep 13, 2018

Looking at one such pod, it seems to have been OOMKilled:

Name:               coredns-779ffd89bd-wzhpr
Namespace:          kube-system
...
    State:          Running
      Started:      Wed, 12 Sep 2018 16:30:24 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
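
As a quick cross-check that the restarts are OOM kills across all the dns pods (illustrative command; it just prints each pod's last termination reason):

$ kubectl -n kube-system get pods -l k8s-app=kube-dns \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
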
Member

shyamjvs commented Sep 13, 2018

From the yaml, it seems like core-dns has the same resources set as what kube-dns used to have earlier:

resources:
  limits:
    memory: 170Mi
  requests:
    cpu: 100m
    memory: 70Mi

Member

shyamjvs commented Sep 13, 2018

@fturib A couple of questions:

  • Do we have any evidence that core-dns has a greater resource footprint than kube-dns?
  • Also, I remember we were discussing scale-testing core-dns a while ago for GA. Could you confirm what scale it was tested at?

Member

wojtek-t commented Sep 13, 2018

@thockin @bowei @kubernetes/sig-network-bugs - @shyamjvs and @krzysied are working on confirming that this is the actual cause of the regression; but assuming it is, given where we currently are in the release cycle, I think we should revert the switch to core-dns as the default (if that is indeed the reason - unless we know how to fix it fast).

Member

shyamjvs commented Sep 13, 2018

IMHO we should revert this now and bring it back once we fix (and test) the OOM-kill issue.

Member

wojtek-t commented Sep 13, 2018

We should first confirm with 100% certainty that it's that.

Member

shyamjvs commented Sep 13, 2018

> We should first confirm with 100% certainty that it's that.

We have evidence that the load test passed in this run right before the switch to core-dns was made. But I agree it's probably worth confirming once again (@krzysied is running the test currently).

Contributor

chrisohaver commented Sep 13, 2018

> Do we have any evidence that core-dns has a greater resource footprint than kube-dns?

Our early scale tests (5K nodes, 10K services, max QPS load) showed that they were essentially the same (coredns slightly lower), although that was done on coredns version 0.9.9; coredns 1.2.2 is the version being tested now.

CoreDNS is less memory efficient with cache than dnsmasq (which kube-dns uses). But, IIRC, these e2e scale tests do not generate any QPS load, so that should not be a factor: without any queries, the cache would not grow in size. The QPS load would have to be extremely high for the cache to grow large enough to cause a problem during the test. Furthermore, the latest version of coredns has a smaller default cache size, such that even when full it should fit within the deployment's default limit.

CoreDNS does deploy two instances by default, whereas Kube-DNS only deploys one. So together the coredns pods would of course take 2X the memory, but individually the expectation is that they should be about the same (assuming no significant cache usage).

Do you have any statistics collected on the memory growth?

There are also these scale service tests in sig-network added last release, which appear to be passing OK. These run the maximum number of services allowed per cluster, and query a subset of the service names to verify function. They are only run on a few nodes, but coredns doesn't store node information, so a large number of nodes should not affect the memory footprint.
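
For reference, whether and how the cache plugin is configured in these clusters can be read from the add-on's ConfigMap - a sketch, assuming the default ConfigMap name used by the kube-up add-on (it may differ in other deployments):

$ kubectl -n kube-system get configmap coredns -o yaml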

Contributor

chrisohaver commented Sep 13, 2018

> From our load test where we create ~8k services with a total of ~75k backends:

@shyamjvs, what tests are "our load tests"? And by "backend" do you mean a service endpoint? Can you provide more context? Is this the e2e test, or a private test your group performs using the perf-test tool? I see many coredns replicas running for 24 hours, so presumably this is not the e2e tests.

Contributor

chrisohaver commented Sep 13, 2018

Ah, OK - thanks! I didn't know those tests ran for extended periods of time. I thought they just spun up a number of services, then spun them down. How long do these tests run? I see 23 hours in the output you provided. Or is the cluster not rebuilt for each test, with the same coredns instances processing 10K services going down/up across multiple tests? If that's the case, I could see a possibility of a memory leak there. We have not tested repeated spin-up/teardown of a huge number of services.

I recall we did simulate a 10% "churn" per hour of services in our local scale/load tests, which we felt was realistic. Services shouldn't churn very much in a real deployment, so we chose to spin up/down about 10% of the services per hour (pods would churn a lot, but coredns/kube-dns do not monitor pods).

Member

shyamjvs commented Sep 13, 2018

To clarify a few things:

  • The load test I mentioned above typically takes about 8-10 hrs when running on a 5000-node cluster.
  • It creates a bunch of services and then RCs which fall under those services. Then it scales those RCs up/down randomly and finally deletes the RCs and services.
  • In this case you see 23 hrs because we were running that load test manually and didn't delete the cluster for a few hours after the test finished. Anyway, the crashes start to happen as the test proceeds.
  • We're not doing any repeated spin-up or teardown here. It's just a one-off test.
  • The test doesn't create any sustained churn of services (except the initial creation and the final deletion of those). It's more of a churn of the endpoints (as RCs are created, scaled, etc.).

Contributor

chrisohaver commented Sep 13, 2018

Thanks, that clarifies things for me. CoreDNS does monitor endpoints, so there may be a leak there in CoreDNS. Although if there is, I think it would be a leak in the k8s api go client/cache we use (coredns vendors 8.0.0 of client-go). CoreDNS doesn't store dns records; it constructs them on demand from the api client cache when a query is received. (Am I correct in saying there is no QPS load in this test?)

Contributor

chrisohaver commented Sep 13, 2018

Looks like kube-dns client-go is still at v3 ... last updated about a year ago.

Member

shyamjvs commented Sep 13, 2018

> CoreDNS does monitor endpoints

A few questions about this:

  • Does that mean it keeps the endpoints objects in its memory (as that can be a sizable amount)?
  • Was kube-dns also monitoring endpoints?
  • Out of curiosity - why does DNS have to know about endpoints? (isn't just the service IP enough?)

> Am I correct in saying there is no QPS load in this test?

No QPS that we are creating in the test (there may still be some default control-plane lookups - I'm not sure though).

Contributor

chrisohaver commented Sep 13, 2018

  • Endpoints API objects are kept in memory, in an API cache (part of the client-go lib) - a rough sizing check is sketched below.
  • Kube-dns AFAIK does more or less the same. I'll look at the kube-dns code again; it's been a long time since I looked.
  • DNS needs to know endpoints to be able to serve records for headless services (services without a cluster IP). There is no way to monitor changes only for headless services (that I know of) - so all endpoints are monitored, even the ones for cluster IP services.
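
As a rough, illustrative way to gauge how much endpoints data the informer cache has to hold in a given cluster (the serialized JSON size is not the in-memory size, but it scales with it):

$ kubectl get endpoints --all-namespaces -o json | wc -c
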
Member

wojtek-t commented Sep 13, 2018

@chrisohaver - from what you wrote above, it seems that CoreDNS is processing (and keeping) k8s objects pretty much the same way that kube-dns does.

But for some reason (assuming that @shyamjvs checked this correctly and both are requesting the same amount of resources), something must be significantly different, because we weren't observing any extensive crashing of kube-dns (up until it was replaced with CoreDNS).

Member

wojtek-t commented Sep 13, 2018

And the test is exactly the same as it was - we didn't change that around that time.

Member

shyamjvs commented Sep 13, 2018

In general, it seems to me like this e2e you're running to scale-test dns is insufficient, as it only creates empty services (without any endpoints). IMO that test should be changed to also create some sizeable endpoints objects for those services (e.g. manually, as sketched below, or by actually creating pods behind those services - which ends up looking a lot like this load test) to reflect more realistic memory usage.
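
As a sketch of the "create endpoints manually" idea (the names, namespace and addresses below are made up for illustration), the test could apply objects like:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: svc-0              # must match an existing Service name
  namespace: dns-scale-test
subsets:
- addresses:
  - ip: 10.0.0.1
  - ip: 10.0.0.2
  ports:
  - port: 80
EOF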

Contributor

chrisohaver commented Sep 13, 2018

@wojtek-t, well - it's possible that while kube-dns subscribes to events for endpoints, it doesn't cache them if they are for cluster IP services. Or that it doesn't cache endpoint API objects in their entirety. I'm looking at the code today.

@shyamjvs, yes, the other test covers services only. This was known/discussed at the time. The test is not a complete test - it was a first step. Adding endpoints objects (with no pods, because it's not a scale cluster) would be the logical next step.

Contributor

chrisohaver commented Sep 13, 2018

Do the load tests create any headless services? Or are they all cluster IP services?

Contributor

chrisohaver commented Sep 13, 2018

> we need to see what metrics we can start gathering during the test

Ah, yes - ironically this isn't always so simple in kubernetes.
@wojtek-t, is any resource metric monitoring exposed/available on these test systems?

Member

wojtek-t commented Sep 14, 2018

> Ah, yes - ironically this isn't always so simple in kubernetes.
> @wojtek-t, is any resource metric monitoring exposed/available on these test systems?

Unfortunately not really.
We have custom stuff that e.g. monitors etcd db size or resource consumption of master components (like etcd, apiserver, etc.):
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/metrics_util.go#L286
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/resource_usage_gatherer.go

Gathering resource usage of all dns pods seems doable, but it's not a one-liner change - it would require non-negligible work (which I don't have time to do, though I'm happy to help with reviews if needed).
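
In the meantime, a manual spot-check is possible if heapster / metrics-server is serving pod metrics in the test cluster (illustrative):

$ kubectl -n kube-system top pod -l k8s-app=kube-dns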

Member

shyamjvs commented Sep 14, 2018

So we have the revert merged and the CI tests are running against it currently.
Now for the remainder of the plan, we (sig-scalability) are happy to discuss / help with scale-testing any potential fixes to core-dns.

Contributor

chrisohaver commented Sep 14, 2018

Thanks, a comment in resource_usage_gatherer.go says that all pods in kube-system are tracked if a podList is not provided. Since dns pods reside in the kube-system namespace, does this mean they are tracked? Or are the scale tests limiting tracking to a specific podList?

Contributor

chrisohaver commented Sep 14, 2018

@shyamjvs, @wojtek-t As a sanity check, what does /etc/resolv.conf contain on the test nodes (assuming kubelet uses the default /etc/resolv.conf, and not an alternate file)? A cause of OOM kills we have seen in the field is due to an upstream lookup loop (re: systemd-resolved). Unlikely this is the case here, but if you can check it, I can exclude that possibility. Thanks!
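
A sketch of that check (run on a test node, however you normally get a shell there); the tell-tale sign is the node's resolv.conf pointing at a local stub resolver, which the dns pods then inherit as their upstream and end up forwarding to themselves:

$ cat /etc/resolv.conf
# a loop would look like e.g. "nameserver 127.0.0.53" (systemd-resolved's stub)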

Member

wojtek-t commented Sep 14, 2018

> Thanks, a comment in resource_usage_gatherer.go says that all pods in kube-system are tracked if a podList is not provided. Since dns pods reside in the kube-system namespace, does this mean they are tracked? Or are the scale tests limiting tracking to a specific podList?

I can't remember off the top of my head.
You can take a look at how we initialize it (it should be somewhere in test/e2e/framework or something like that).
Or maybe @shyamjvs remembers?

Member

wojtek-t commented Sep 17, 2018

Just to report the test results of the last run.
The test itself failed:
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/213

However, the main problem we were observing with CoreDNS (crashlooping dns pods overloading the apiserver) is gone.
There is a different problem there, though; I filed #68735 for it.

Contributor

guineveresaenger commented Sep 17, 2018

@wojtek-t @shyamjvs should this issue be closed then?

Member

shyamjvs commented Sep 17, 2018

No, the issue with core-dns is still there. We just fixed the release by reverting it from being the default. So I suggest removing the release-blocker labels and keeping this open for the remainder of the bug.

Contributor

guineveresaenger commented Sep 17, 2018

@shyamjvs gotcha. I'll move it out of the milestone then.

/milestone clear
/milestone v1.13

/remove-priority critical-urgent
/priority important-soon

Member

neolit123 commented Sep 17, 2018

@chrisohaver @rajansandeep
if core-dns doesn't end up being the default in 1.12, could you please make sure the docs are up to date?
(kubeadm is independent from this).

Contributor

chrisohaver commented Sep 18, 2018

@wojtek-t, @shyamjvs Is there a way in the framework to enable collection of profiling data (i.e. pprof)? I initially thought no, but I noticed e2e/framework/profile_gatherer.go.

Contributor

chrisohaver commented Sep 18, 2018

After a quick look at profile_gatherer.go, it looks like it only allows collection via SSH.

Member

wojtek-t commented Sep 18, 2018

> After a quick look at profile_gatherer.go, it looks like it only allows collection via SSH.

Yes - it's only via SSH. And I'm not sure it currently supports more than the master VM.

Contributor

chrisohaver commented Sep 18, 2018

Looks like it would not be terribly complicated to add http support.
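
Until then, a manual path should also work - a sketch, assuming the pprof plugin is enabled in the CoreDNS Corefile (it serves on localhost:6053 inside the pod by default; the pod name is a placeholder):

$ kubectl -n kube-system port-forward <coredns-pod> 6053:6053 &
$ curl -s http://localhost:6053/debug/pprof/heap > heap.out
$ go tool pprof heap.out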

Contributor

fturib commented Sep 19, 2018

About the OOM and the memory limit for the pod: we were able to make some measurements using presubmit scale tests on PR #68683.

We could estimate memory usage for 0, 500, and 2k nodes. The 5k-node point needs to be run in a CI test (which means merging the code and waiting for the scale-perf test to be triggered).

CoreDNS

  • 0 nodes: 18Mi (measured)
  • 500 nodes: 38Mi (measured)
  • 2000 nodes: 93Mi (measured)
  • 5000 nodes: 201Mi (expected by extrapolation) - needs to be verified with a real test

Kube-dns

  • 0 nodes: 38Mi (measured)
  • 500 nodes: 43Mi (measured)
  • 2000 nodes: ?? - extrapolation expects 82Mi - test to be run
  • 5000 nodes: ?? - extrapolation expects 148Mi - which is in line with the current limit of 170Mi (and we should verify it with a test, once the pod dns resource monitoring is merged)

For CoreDNS, the computation shows that we need to bump up the memory limit of CoreDNS to 220Mi (expected 201Mi + ~20Mi of buffer).
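
As a rough cross-check of that extrapolation: between the 500- and 2000-node points memory grows by ~55Mi over 1500 nodes (~0.037Mi per node), so 5000 nodes ≈ 93Mi + 3000 × 0.037Mi ≈ 203Mi, consistent with the ~201Mi figure above. To try the proposed limit on a test cluster, something like the following should work (illustrative; on clusters where the addon manager reconciles the manifest, the change may be reverted unless the add-on itself is edited):

$ kubectl -n kube-system set resources deployment coredns --limits=memory=220Mi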

Contributor

tpepper commented Sep 19, 2018

We should make a release note for Known Issues, perhaps something like:

> Feature #566 enabling CoreDNS as the default for kube-up deployments was dropped from the release due to a scalability resource consumption issue observed. The root cause in CoreDNS remains unclear (i.e., it was expected to be faster and to have lower resource consumption), but the CoreDNS folks are investigating. If a cluster operator is considering using CoreDNS at scale, it may be necessary to give more consideration to CoreDNS pod resource limits and experimentally measure resource usage versus cluster resource availability.

Contributor

fturib commented Sep 20, 2018

I would propose pointing out the problem more specifically so users can make a better decision:

> Feature #566 enabling CoreDNS as the default for kube-up deployments was dropped from the release due to a scalability memory resource consumption issue observed. The root cause in CoreDNS remains unclear, but the CoreDNS folks are investigating. If a cluster operator is considering using CoreDNS on a cluster greater than 1000 nodes, it may be necessary to give more consideration to CoreDNS pod memory resource limits and experimentally measure that memory usage versus cluster resource availability.

Member

neolit123 commented Sep 20, 2018

+1 for adding reasons in the release notes.
