
Flaky test: [sig-cluster-lifecycle] Upgrade [Feature:Upgrade] cluster upgrade should maintain a functioning cluster [Feature:ClusterUpgrade] #70151

Closed
mortent opened this Issue Oct 23, 2018 · 12 comments

@mortent
Member

mortent commented Oct 23, 2018

Testgrid: https://k8s-testgrid.appspot.com/sig-release-master-upgrade#gce-new-master-upgrade-cluster

Examples: https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-new-master-upgrade-cluster/2570
https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-new-master-upgrade-cluster/2560

The error messages differ, but they all seem to be related to problems connecting to the master. The test has failed three times in the last week.

/sig cluster-lifecycle
/kind flake
/priority important-soon

@AishSundar

Contributor

AishSundar commented Oct 24, 2018

/milestone v1.13

@k8s-ci-robot added this to the v1.13 milestone Oct 24, 2018

@AishSundar

Contributor

AishSundar commented Oct 25, 2018

@jberkus do we want to route this to anyone specific in cluster-lifecycle for deflaking?

@jberkus


jberkus commented Oct 25, 2018

@timothysc, can you get someone to look into this?

@neolit123

Member

neolit123 commented Oct 26, 2018

@justinsb

Member

justinsb commented Oct 26, 2018

Aha - I looked at https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-new-master-upgrade-cluster/2570 and tracked it down: that is #70305.

It's a bug where the apiserver does not start in time to meet its healthcheck.

I'll pull down some of the others and check whether they have the same root cause.
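
For context, the healthcheck in question is essentially an HTTP GET against the apiserver's /healthz endpoint with a short deadline; if the apiserver takes longer than the probe window to come up, the kubelet restarts it. A rough Go sketch of that kind of check (the timeout and TLS handling here are illustrative, not the actual probe configuration from the manifest; the IP is just the one from the failure logs above):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// checkHealthz probes an apiserver /healthz endpoint once with a short
// per-request deadline, roughly the way a liveness check would.
func checkHealthz(baseURL string) error {
	client := &http.Client{
		Timeout: 10 * time.Second, // illustrative, not the real probe timeout
		Transport: &http.Transport{
			// Upgrade test clusters typically front the master with a
			// self-signed cert, so skip verification in this sketch.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Get(baseURL + "/healthz")
	if err != nil {
		// e.g. "connect: connection refused" while the apiserver is (re)starting.
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("healthz returned %d", resp.StatusCode)
	}
	return nil
}

func main() {
	// Master IP taken from the failure logs above, purely as an example.
	if err := checkHealthz("https://104.198.171.150"); err != nil {
		fmt.Println("healthcheck failed:", err)
	}
}
```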

@justinsb

Member

justinsb commented Oct 26, 2018

Good news: it is actually rare for the apiserver to start on time (it looks like it was restarted in 72 of 92 runs), and the failures only happen when we get unlucky and query it during a restart.
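
One way to make a caller tolerant of that restart window is to poll instead of issuing a single request, for example with wait.PollImmediate from k8s.io/apimachinery. The sketch below only illustrates the idea and is not the change made in the e2e suite; getWithRetry, its parameters, and the URL are made up for the example:

```go
package main

import (
	"fmt"
	"net/http"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// getWithRetry retries a GET for up to `timeout`, treating dial errors such
// as "connection refused" as transient so that a brief apiserver restart
// does not immediately fail the caller.
// (TLS setup against a self-signed master cert is omitted for brevity.)
func getWithRetry(url string, timeout time.Duration) (*http.Response, error) {
	var resp *http.Response
	err := wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		r, getErr := http.Get(url)
		if getErr != nil {
			// Most likely the apiserver is mid-restart; try again on the next tick.
			return false, nil
		}
		resp = r
		return true, nil
	})
	return resp, err
}

func main() {
	resp, err := getWithRetry("https://127.0.0.1:6443/healthz", 2*time.Minute)
	if err != nil {
		fmt.Println("apiserver did not come back within the window:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```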

@mariantalla

Contributor

mariantalla commented Nov 5, 2018

Hey @justinsb @roberthbailey - this is now failing again on sig-release-master-upgrade/gce-new-master-upgrade-master. Any updates/blockers?

cc @jberkus @AishSundar

@AishSundar

Contributor

AishSundar commented Nov 5, 2018

FWIW, the failures we are seeing in the last couple of runs are slightly different from before:

```
Get https://104.198.171.150/apis/apps/v1/namespaces/e2e-tests-sig-apps-replicaset-upgrade-5m25x/replicasets/rs: dial tcp 104.198.171.150:443: connect: connection refused
not to have occurred
```

https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-new-master-upgrade-master/1865

@fejta-bot


fejta-bot commented Feb 3, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@fejta-bot


fejta-bot commented Mar 5, 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@fejta-bot


fejta-bot commented Apr 4, 2019

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

Contributor

k8s-ci-robot commented Apr 4, 2019

@fejta-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
> Reopen the issue with /reopen.
> Mark the issue as fresh with /remove-lifecycle rotten.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
