Heapster scalability testing #5880
cc @dchen1107
I made some changes to heapster to make it scale to 100 nodes without any load.
@vishh Any plans for this? Is there someone available to take this on right now?
@saad-ali: Will you be able to look at heapster scalability?
Abhi (@ArtfulCoder) and I will work on this together. Ideally we'd like to measure the following metrics:
These internal metrics require implementing a Heapster instrumentation infrastructure that doesn't exist yet, so we'll treat them as lower priority and likely a post-v1.0 task. Instead, we'll focus on getting a baseline of the following basic process metrics for Heapster:
These are available today because Heapster is run in a container. For a first stab at this, we plan on doing the following:
Thanks for picking this up!
@saad-ali The plan SGTM. Thanks!
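As a concrete illustration of the plan above: because Heapster runs in a container, its basic process metrics can be read off the node. A minimal sketch of fetching them from a node's cAdvisor endpoint might look like the following; the port (4194), the `/api/v1.3/docker/<name>` path, the response shape, and the `heapster` container name are assumptions made for illustration, not details taken from this thread.

```go
// Minimal sketch: fetch raw container stats from a node's cAdvisor endpoint
// and print the latest memory usage. The endpoint layout, the response
// shape, and the container name are assumptions, not taken from the issue.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// Only the fields we care about; cAdvisor returns much more.
type containerInfo struct {
	Name  string `json:"name"`
	Stats []struct {
		Timestamp string `json:"timestamp"`
		Memory    struct {
			Usage uint64 `json:"usage"`
		} `json:"memory"`
	} `json:"stats"`
}

func main() {
	// Assumed node address, cAdvisor port, and container name.
	url := "http://10.240.0.2:4194/api/v1.3/docker/heapster"

	resp, err := http.Get(url)
	if err != nil {
		log.Fatalf("querying cAdvisor: %v", err)
	}
	defer resp.Body.Close()

	// Assumes the endpoint returns a map keyed by container name.
	var containers map[string]containerInfo
	if err := json.NewDecoder(resp.Body).Decode(&containers); err != nil {
		log.Fatalf("decoding response: %v", err)
	}

	for _, c := range containers {
		if len(c.Stats) == 0 {
			continue
		}
		latest := c.Stats[len(c.Stats)-1]
		fmt.Printf("%s: memory usage %d bytes at %s\n", c.Name, latest.Memory.Usage, latest.Timestamp)
	}
}
```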
Abhi and I set up a GCE cluster with 4 nodes yesterday. We scheduled 275 pods (1 container each) on the cluster. Within an hour, Heapster stopped sending data to GCM because we hit quota limits:
By morning, the error had switched to:
To bypass this, I will try to get the quota increased; in the meantime, I'll set up a script to scrape docker stats off the machine directly.
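A minimal sketch of the kind of script mentioned above: it periodically snapshots `docker stats` on the node and appends the output to a log file for later analysis. The sampling interval, the log path, and the assumption that `docker stats --no-stream` with no container arguments reports all running containers are illustrative choices, not details from the thread.

```go
// Minimal sketch: periodically run `docker stats --no-stream` and append the
// output, with a timestamp, to a log file for later analysis.
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
	"time"
)

func main() {
	// Assumed output path; adjust for the node being sampled.
	f, err := os.OpenFile("/tmp/docker-stats.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatalf("opening log file: %v", err)
	}
	defer f.Close()

	for {
		// --no-stream takes a single snapshot instead of streaming updates.
		out, err := exec.Command("docker", "stats", "--no-stream").CombinedOutput()
		if err != nil {
			log.Printf("docker stats failed: %v", err)
		} else {
			fmt.Fprintf(f, "--- %s ---\n%s\n", time.Now().Format(time.RFC3339), out)
		}
		time.Sleep(30 * time.Second) // sampling interval; adjust as needed
	}
}
```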
The quota issue is expected.
We are filing a request to ask for more quota. Is there an easy way to get that quota, and do we have an estimate of how large the quota should be for a given cluster size (number of nodes, number of pods, etc.)? cc @roberthbailey @a-robinson too.
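For a rough sense of how such an estimate could be derived, here is a back-of-the-envelope sketch. The metrics-per-pod count and the export resolution are assumed values used purely for illustration; only the node and pods-per-node figures come from the v1.0 goals cited in this issue.

```go
// Back-of-the-envelope sketch for sizing the GCM write quota. The
// metrics-per-pod and resolution numbers are assumptions, not figures
// from the issue.
package main

import "fmt"

func main() {
	nodes := 100        // target cluster size from the v1.0 goal
	podsPerNode := 50   // upper end of the 30-50 pods per node goal
	metricsPerPod := 10 // assumed number of exported series per pod
	resolutionSec := 60 // assumed export interval to GCM, in seconds

	series := nodes * podsPerNode * metricsPerPod
	writesPerMinute := series * 60 / resolutionSec
	fmt.Printf("~%d series, ~%d metric writes per minute\n", series, writesPerMinute)
}
```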
@dchen1107 Yes, once the extra containers are removed, Heapster memory usage drops back down, as does InfluxDB memory usage (though not nearly back to the original levels).
These appear to be the numbers Google is using to hit their 100-node 1.0 goal, per perf testing done under kubernetes/kubernetes#5880. This looks like 12x less data, and I've been finding InfluxDB unresponsive somewhere between 10 and 20 nodes, so maybe this is all the breathing room we need.
These appear to be the numbers Google is using to hit their 100-node 1.0 goal, per perf testing done under kubernetes/kubernetes#5880. The defaults are a 10s poll interval and 5s resolution, so this should back off load by about an order of magnitude. TODO: drop the verbose flag once finished debugging.
These appear to be the numbers Google is using to hit their 100-node 1.0 goal, per perf testing done under kubernetes/kubernetes#5880. The defaults are a 10s poll interval and 5s resolution, so this should back off load by about an order of magnitude. We're using `avoidColumns=true` to force Heapster to avoid additional columns and instead append all metadata to the series names. It makes the series names ugly and hard to aggregate on the Grafana side, but it wildly reduces CPU load. I guess that's why the InfluxDB docs recommend more series with fewer points over fewer series with more points. Grafana's kraken dashboard has been updated to use the new series.
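To make the `avoidColumns=true` trade-off described above concrete, here is a small sketch contrasting the two layouts: metadata kept as separate columns versus metadata folded into the series name. The naming scheme and field names are made up for illustration and are not Heapster's actual series format.

```go
// Illustrative sketch of the trade-off: with avoidColumns-style naming,
// metadata that would otherwise be stored as columns is folded into the
// series name, yielding more series with fewer points each.
package main

import (
	"fmt"
	"strings"
)

// sample holds one metric value plus its metadata.
type sample struct {
	metric    string
	namespace string
	pod       string
	container string
	value     float64
}

// seriesName folds the metadata into one long series name.
func seriesName(s sample) string {
	return strings.Join([]string{s.metric, s.namespace, s.pod, s.container}, ".")
}

func main() {
	s := sample{"cpu/usage", "default", "nginx-1", "nginx", 0.42}

	// Column style: one shared series, metadata stored per point.
	fmt.Printf("columns:      series=%q tags={ns:%s pod:%s container:%s} value=%v\n",
		s.metric, s.namespace, s.pod, s.container, s.value)

	// avoidColumns style: metadata baked into the series name.
	fmt.Printf("avoidColumns: series=%q value=%v\n", seriesName(s), s.value)
}
```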
@saad-ali did you ever push this further in total pod count? We're seeing failures after 12k...
This is pretty old. Check out http://blog.kubernetes.io/2016/07/kubernetes-updates-to-performance-and-scalability-in-1.3.html and https://github.com/kubernetes/community/blob/master/sig-scalability/README.md; SIG Scalability should be able to give you the current information and plans, and address any issues you are having with the published numbers.
I found out last week that Heapster is being deprecated in favor of a metrics server and other components.
Heapster must be tested to ensure that it meets our v1.0 scalability goals: 100-node clusters (#3876) running 30-50 pods per node (#4188). A soak test might also be very helpful.
Some of the interesting signals to track include:
Heapster needs to expose some metrics to aid in scalability testing.
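As one possible shape for that, here is a minimal sketch of a Go service exposing internal counters over HTTP with the standard library's `expvar` package. The metric names, the fake collection loop, and the port are assumptions made for illustration; this is not Heapster's actual instrumentation.

```go
// Minimal sketch of one way a Go service could expose internal metrics over
// HTTP using only the standard library's expvar package.
package main

import (
	"expvar"
	"log"
	"net/http"
	"time"
)

var (
	scrapeCount      = expvar.NewInt("scrape_count")
	lastScrapeMicros = expvar.NewInt("last_scrape_duration_us")
)

// scrapeOnce stands in for one round of metric collection and records
// how long it took.
func scrapeOnce() {
	start := time.Now()
	time.Sleep(50 * time.Millisecond) // placeholder for real collection work
	scrapeCount.Add(1)
	lastScrapeMicros.Set(time.Since(start).Microseconds())
}

func main() {
	go func() {
		for {
			scrapeOnce()
			time.Sleep(10 * time.Second)
		}
	}()
	// Importing expvar registers a JSON handler at /debug/vars on the default mux.
	log.Fatal(http.ListenAndServe(":8082", nil))
}
```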
cc @vmarmol