Fixing federation secret controller unit test flakiness #36463

Merged 1 commit on Nov 9, 2016

Conversation

nikhiljindal
Contributor

@nikhiljindal nikhiljindal commented Nov 8, 2016

Fixes #36422

Adding a wait for the secret to be updated in the store to fix flakiness.
Before this change, the test failed roughly once every 3 to 5 runs. With it, I have had 30 consecutive local runs without a failure.
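
A minimal sketch of the "wait for the store to be updated" idea, for readers who want to see its shape. This is not the code from the PR: `fakeStore`, `waitForStoreUpdate`, and `errTimeout` are hypothetical names, and the store is a plain in-memory map rather than the federated informer's target store.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// fakeStore is a hypothetical stand-in for the informer's target store.
type fakeStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newFakeStore() *fakeStore { return &fakeStore{data: map[string]string{}} }

// Set records value under key.
func (s *fakeStore) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// Get returns the value stored under key, if any.
func (s *fakeStore) Get(key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	return v, ok
}

var errTimeout = errors.New("timed out waiting for store update")

// waitForStoreUpdate polls the store until key holds want, or returns
// errTimeout once timeout has elapsed.
func waitForStoreUpdate(s *fakeStore, key, want string, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if got, ok := s.Get(key); ok && got == want {
			return nil
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	store := newFakeStore()
	// Simulate the informer delivering the update asynchronously.
	go func() {
		time.Sleep(20 * time.Millisecond)
		store.Set("ns/secret1", "v2")
	}()
	err := waitForStoreUpdate(store, "ns/secret1", "v2", 5*time.Millisecond, 2*time.Second)
	fmt.Println("store updated:", err == nil)
}
```

The test would call such a helper after each write and only then proceed to the next step, so later assertions never race with a cache that has not caught up yet.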

cc @kubernetes/sig-cluster-federation @mwielgus



@k8s-github-robot added the size/S and release-note-label-needed labels Nov 8, 2016
@nikhiljindal added the release-note-none label and removed the release-note-label-needed label Nov 8, 2016
@nikhiljindal
Contributor Author

Adding the P1 label to match the label on the issue.

@nikhiljindal added the priority/important-soon label and removed the cla: yes label Nov 8, 2016
err = WaitForSecretStoreUpdate(
	secretController.secretFederatedInformer.GetTargetStore(),
	cluster1.Name, types.NamespacedName{Namespace: secret1.Namespace, Name: secret1.Name}.String(),
	updatedSecret, wait.ForeverTestTimeout)
Contributor

I guess we could stop this earlier than Forever.

Contributor Author

I would prefer to keep this as is. I don't want to introduce more flakiness by not waiting long enough :)

Member

Looks like var ForeverTestTimeout = time.Second * 30, so it's not really forever :)
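
To illustrate why a generous upper bound is cheap in the passing case, here is a small sketch. `pollUntil` is a hypothetical helper, not an API from the Kubernetes tree; the only value taken from the thread is the 30-second bound. The wait returns as soon as the condition holds, so the full timeout is only paid when the test is going to fail anyway.

```go
package main

import (
	"fmt"
	"time"
)

// pollUntil calls cond every interval until it returns true or timeout
// elapses. It reports whether cond succeeded and how long the wait took.
func pollUntil(cond func() bool, interval, timeout time.Duration) (bool, time.Duration) {
	start := time.Now()
	for {
		if cond() {
			return true, time.Since(start)
		}
		if time.Since(start) >= timeout {
			return false, time.Since(start)
		}
		time.Sleep(interval)
	}
}

func main() {
	// The condition becomes true about 30ms from now.
	ready := time.Now().Add(30 * time.Millisecond)
	cond := func() bool { return time.Now().After(ready) }

	// A 30s upper bound, mirroring the value quoted in the comment above.
	ok, elapsed := pollUntil(cond, 5*time.Millisecond, 30*time.Second)
	fmt.Println("condition met:", ok, "finished well under the bound:", elapsed < time.Second)
}
```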

@dims
Member

dims commented Nov 8, 2016

LGTM if all the tests pass 👍

@nikhiljindal nikhiljindal added this to the v1.5 milestone Nov 8, 2016
@k8s-ci-robot
Contributor

Jenkins unit/integration failed for commit 8ca1b30. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@nikhiljindal
Contributor Author

@k8s-bot unit test this
namespace controller failed, secret controller passed :)

@k8s-ci-robot
Contributor

Jenkins GKE smoke e2e failed for commit 8ca1b30. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@dims
Member

dims commented Nov 8, 2016

LOL win some lose some :)

@nikhiljindal
Contributor Author

nikhiljindal commented Nov 8, 2016

#36351 is tracking the namespace controller failures.
Looking at the test grid, it seems to be failing once every 10-15 runs: https://k8s-testgrid.appspot.com/k8s#test-go

@mwielgus and I are debugging it more

@k8s-ci-robot
Contributor

Jenkins GCE e2e failed for commit 8ca1b30. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@nikhiljindal
Contributor Author

The GCE cluster failed to come up, with the error "Failed to setup provider config: Error building GCE/GKE provider: googleapi: Error 500: Internal Error, internalError". Retrying, as it's unrelated.

@k8s-bot cvm gce e2e test this

@nikhiljindal
Contributor Author

Adding lgtm as per the comment above. Unit tests passed.

@nikhiljindal added the lgtm and retest-not-required labels and removed the retest-not-required label Nov 8, 2016
@mwielgus
Contributor

mwielgus commented Nov 8, 2016

LGTM to make the tests greener. I will try to find a more generic fix, since other controllers may be affected as well.

@mwielgus
Contributor

mwielgus commented Nov 9, 2016

I was unable to reproduce the issue on my laptop (without the fix).

for i in `seq 100`; do godep go test --race; done
PASS
ok      k8s.io/kubernetes/federation/pkg/federation-controller/secret   2.812s
PASS
ok      k8s.io/kubernetes/federation/pkg/federation-controller/secret   2.837s
PASS
ok      k8s.io/kubernetes/federation/pkg/federation-controller/secret   2.759s
PASS
ok      k8s.io/kubernetes/federation/pkg/federation-controller/secret   2.905s
PASS
ok      k8s.io/kubernetes/federation/pkg/federation-controller/secret   2.805s
PASS
ok      k8s.io/kubernetes/federation/pkg/federation-controller/secret   2.782s
[...]

However, I agree that there can be a problem here. It seems that we sometimes start the modify before the add has fully completed, so the expected modify turns out to be another add. I will add a similar fix to the other controllers tomorrow, as they are equally affected.
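
The suspected race can be modeled in a few lines. This is a deliberately simplified sketch, not the controller's actual reconcile logic: `clusterCache` and `decideOp` are hypothetical names, and the assumption is that the controller chooses between add and update based on whether its cache already knows the object.

```go
package main

import "fmt"

// clusterCache is a hypothetical model of what the controller believes
// exists in a member cluster (for example, its informer cache).
type clusterCache map[string]string

// decideOp models the reconcile decision: create the object if the cache
// has no record of it, otherwise update it.
func decideOp(cache clusterCache, key string) string {
	if _, found := cache[key]; found {
		return "update"
	}
	return "add"
}

func main() {
	cache := clusterCache{}
	key := "ns/secret1"

	// The secret is created in the federation and the controller reconciles.
	fmt.Println("first reconcile:", decideOp(cache, key)) // add

	// Racy path: the test modifies the secret before the cache has observed
	// the first add, so the controller issues another add.
	fmt.Println("modify before cache sync:", decideOp(cache, key)) // add

	// Fixed path: the test waits until the cache reflects the add.
	cache[key] = "v1"
	fmt.Println("modify after cache sync:", decideOp(cache, key)) // update
}
```

Under this model, a test that expects an update after its modify step passes only if the cache has caught up first, which is what waiting on the store is meant to guarantee.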

@nikhiljindal
Contributor Author

I was running it with make test KUBE_GOFLAGS="-race" WHAT='federation/pkg/federation-controller/secret/', which should be equivalent. I could reproduce the failure in roughly 1 of every 4 or 5 runs.

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue

Labels
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
release-note-none: Denotes a PR that doesn't merit a release note.
size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

TestSecretController {secret}
7 participants