Automated cherry pick of #51039 #51224 #51230 #52057 #52200

enisoc · 2017-09-08T21:01:36Z

Cherry pick of #51039 #51224 #51230 #52057 on release-1.7.

These are flakiness fixes that have helped on master.

#51039: StatefulSet: Deflake e2e "Saturate" phase.
#51224: StatefulSet: Deflake e2e "restart" phase.
#51230: StatefulSet: Deflake e2e kubectl exec commands.
#52057: StatefulSet: Deflake e2e RunHostCmd.

ref #48031

The "Saturate" phase of StatefulSet e2e tests verifies orderly startup by controlling when each Pod is allowed to report Ready. If a Pod unexpectedly goes down during the test, the replacement Pod created by the controller will forget whether it was already allowed to report Ready. After this change, the signal that allows each Pod to report Ready is persisted in the Pod's PVC. Thus, the replacement Pod will remember that it was already told to proceed to a Ready state.

The test used to scale the StatefulSet down to 0, wait for ListPods to return 0 matching Pods, and then scale the StatefulSet back up. This was prone to a race in which StatefulSet was told to scale back up before it had observed its own deletion of the last Pod, as evidenced by logs showing the creation of Pod ss-1 prior to the creation of the replacement Pod ss-0. We now wait for the controller to observe all deletions before scaling it back up. This should fix flakes of the form:

```
Too many pods scheduled, expected 1 got 2
```

We seem to get a lot of flakes due to "connection refused" while running `kubectl exec`. I can't find any reason this would be caused by the test flow, so I'm adding retries to see if that helps.

wojtek-t · 2017-09-11T06:42:27Z

Thanks @enisoc . LGTM.

/lgtm

fejta-bot · 2017-09-11T11:45:17Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

enisoc · 2017-09-11T16:57:49Z

/retest

enisoc · 2017-09-11T17:37:26Z

@wojtek-t It seems like federation e2e on release-1.7 has not passed in a while:

https://k8s-testgrid.appspot.com/release-1.7-all#gce-federation-release-1-7&width=5

wojtek-t · 2017-09-11T18:01:21Z

@kubernetes/sig-federation-bugs - can you please take a look? ^^

fejta-bot · 2017-09-12T00:42:20Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

wojtek-t · 2017-09-12T06:29:15Z

Since federation presubmit is broken, but this cherrypick is touching only test files and is supposed to deflake those, I'm kicking tests and will merge it manually.

/retest

wojtek-t · 2017-09-12T11:38:10Z

/retest


enisoc · 2017-09-12T17:21:34Z

@wojtek-t Could you re-LGTM? I just pulled in a few lines from #52352 due to a flake seen on this PR.

wojtek-t · 2017-09-13T06:31:44Z

/lgtm

k8s-github-robot · 2017-09-13T06:32:10Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enisoc, wojtek-t

Associated issue: 51039

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

~~test/OWNERS~~ [enisoc,wojtek-t]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

k8s-github-robot · 2017-09-13T06:32:20Z

/test all [submit-queue is verifying that this PR is safe to merge]

wojtek-t · 2017-09-13T07:03:24Z

/test pull-kubernetes-e2e-kops-aws

k8s-ci-robot · 2017-09-13T07:47:18Z

@enisoc: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-e2e-kops-aws | `9716e80` | link | `/test pull-kubernetes-e2e-kops-aws` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

wojtek-t · 2017-09-13T07:52:37Z

Strange - kops claims "no test failures".

I'm merging it manually to reduce flakiness.

enisoc added 3 commits September 8, 2017 14:01

StatefulSet: Deflake e2e kubectl exec commands.

ac0f75c

We seem to get a lot of flakes due to "connection refused" while running `kubectl exec`. I can't find any reason this would be caused by the test flow, so I'm adding retries to see if that helps.

k8s-ci-robot added the size/M and cncf-cla: yes labels Sep 8, 2017

k8s-github-robot assigned enj and sttts Sep 8, 2017

enisoc added this to the v1.7 milestone Sep 8, 2017

enisoc assigned wojtek-t and unassigned sttts and enj Sep 8, 2017

k8s-ci-robot added the lgtm label Sep 11, 2017

wojtek-t added the cherry-pick-approved label and removed the do-not-merge/cherry-pick-not-approved and cherrypick-candidate labels Sep 11, 2017

k8s-ci-robot added the sig/federation and kind/bug labels Sep 11, 2017

StatefulSet: Deflake e2e RunHostCmd.

9716e80

The initial retry up to 20s was giving up too soon. I'm seeing this test flake because the Node rebooted and it takes ~2min to recover. Now StatefulSet RunHostCmd calls will use the same 5min timeout as with other Pod state checks.

enisoc force-pushed the automated-cherry-pick-of-#51039-#51224-#51230-#52057-upstream-release-1.7 branch from 952b3ac to 9716e80 Compare September 12, 2017 17:09

k8s-github-robot removed the lgtm label Sep 12, 2017

k8s-ci-robot added the lgtm label Sep 13, 2017

wojtek-t merged commit b0f7214 into kubernetes:release-1.7 Sep 13, 2017
