Suspicious breakdown in pod startup time in scalability tests #71028
Comments
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/remove-lifecycle rotten
/lifecycle frozen
I've already observed a number of cases in scalability tests where the pod startup-time breakdown looks suspicious.
As an example, in this run:
https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/255/build-log.txt
from looking into the 10% worst times, we see that:

- The times for "schedule-to-watch" and "e2e latency" for different pods are exactly the same, which suggests something is wrong.
- In some cases the "watch part" is also suspiciously long: the time from when kubelet reports the pod status as Running to when the test actually observes it via watch, even though watch latencies for e.g. the scheduler or controller-manager are low. This may or may not be related to some starvation at the test level.
@kubernetes/sig-scalability-bugs @mborsz