[Federation] Add a worker queue to the generic sync controller. #44987

Merged
merged 1 commit into from
May 10, 2017

Conversation

perotinus
Contributor

This is in preparation for converting the ReplicaSet controller to be a generic sync controller.

This doesn't include support for multiple workers yet: it's not immediately obvious how to support the command-line flags for ReplicaSet (or, more generally, how TypeAdapters should support external configuration via whatever flag mechanism we're using).

cc @marun

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 27, 2017
@k8s-ci-robot
Contributor

Hi @perotinus. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 27, 2017
@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Apr 27, 2017
@marun
Contributor

marun commented May 1, 2017

@k8s-bot ok to test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 1, 2017
@nikhiljindal
Contributor

/approve

LGTM. Will add lgtm label once you verify that e2e tests pass.

The unit test failure seems to be for deleting a federated service. cc @shashidharatd I hope we didn't introduce any flakiness with the recent service controller changes.

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 2, 2017
@shashidharatd

@nikhiljindal, yeah, it is a flake in deleting a federated service. It has been showing up more frequently in recent builds. I am surprised by this flakiness and don't know what causes it, or why it was not showing up before the service controller refactoring; the delete-finalizers part is almost untouched by the refactoring.
Also, I am unable to reproduce this failed test when I run it locally. I'd appreciate any pointers for debugging further to find the root cause. Thanks.

k8s-github-robot pushed a commit that referenced this pull request May 3, 2017
Automatic merge from submit-queue

Add wait for federated service deletion

Fixes the flaky kubectl tests #44987 (comment), #45264

Service deletion is not instantaneous in federation.

The fix is the same as #42674.
We need the fix for services now because the federation service controller was recently fixed and now runs successfully.

cc @shashidharatd
@shashidharatd

@k8s-bot unit test this

@shashidharatd

The flakiness in the unit test for federated service deletion is fixed by #45265. Now there is another flake in pkg/kubelet/kuberuntime, for which I have raised issue #45281.

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 3, 2017
@perotinus
Contributor Author

@nikhiljindal @shashidharatd @marun Can someone PTAL and run the Federation e2e tests? This has been updated to have the worker method do redelivery, which makes the reconcile method a bit simpler.

@shashidharatd

@k8s-bot verify test this
@k8s-bot pull-kubernetes-federation-e2e-gce test this

kind, namespacedName, err)
s.deliver(namespacedName, 0, false)
Contributor

statusError will deliver this as s.deliver(namespacedName, 0, true) which is a change in behavior. Is that intended?

Contributor Author

That is correct: I'm inclined to believe that returning true before was a typo, since this is an error condition and diverging like this would have warranted at least a comment as to why.

@marun
Contributor

marun commented May 5, 2017

I think this PR is not controversial and, modulo avoiding returning errors from reconcile, it should merge before #45374.


Reviewed 2 of 2 files at r1.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


federation/pkg/federation-controller/sync/controller.go, line 242 at r1 (raw file):

type reconciliationStatus int

const (

(No action required) Would it make sense to centralize status rather than defining for each remaining controller?


federation/pkg/federation-controller/sync/controller.go, line 258 at r1 (raw file):

		item := obj.(*util.DelayingDelivererItem)
		namespacedName := item.Value.(*types.NamespacedName)
		status, err := s.reconcile(*namespacedName)

I think that @shashidharatd's recent work with the services PR suggests having the reconcile handle errors internally to ensure traceability.


federation/pkg/federation-controller/sync/controller.go, line 274 at r1 (raw file):

			s.deliver(*namespacedName, s.clusterAvailableDelay, false)
		default:
			glog.Errorf("Unhandled reconciliation status: %s", status)

Is this just defensive? Seems like something better handled via testing.


Comments from Reviewable

@perotinus
Contributor Author

Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


federation/pkg/federation-controller/sync/controller.go, line 242 at r1 (raw file):

Previously, marun (Maru Newby) wrote…

(No action required) Would it make sense to centralize status rather than defining for each remaining controller?

Absolutely! Not sure where it should live, though: it seems like it could live in the sync package since it will eventually, ideally, be an implementation detail of the sync controller and not used elsewhere.


federation/pkg/federation-controller/sync/controller.go, line 258 at r1 (raw file):

Previously, marun (Maru Newby) wrote…

I think that @shashidharatd's recent work with the services PR suggests having the reconcile handle errors internally to ensure traceability.

That seems reasonable.


federation/pkg/federation-controller/sync/controller.go, line 274 at r1 (raw file):

Previously, marun (Maru Newby) wrote…

Is this just defensive? Seems like something better handled via testing.

Agreed. I copied this from the reconcile loop in the replicaset controller without thinking too much about it. There doesn't seem to be harm in leaving it, though it scares me a bit to think how this codepath would be triggered–it would require casting–so I think I'll remove it.
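
As a small illustration of the point about casting, the sketch below shows that with an int-backed status type an unhandled value can only come from an explicit conversion, and that a test iterating over the declared constants can cover the switch instead of a defensive `default`. The constant `statusAllOK`, the `String` method, and `handled` are assumptions added for this example. The `String` method also sidesteps a separate pitfall: formatting an int-typed value with `%s` and no `String` method prints a `%!s(...)` placeholder rather than a readable name.

```go
package main

import "fmt"

type reconciliationStatus int

const (
	statusAllOK reconciliationStatus = iota // assumed name
	statusNeedsRecheck
	statusError
	statusNotSynced
)

// String makes the status safe to print with %s or %v.
func (s reconciliationStatus) String() string {
	switch s {
	case statusAllOK:
		return "ALL_OK"
	case statusNeedsRecheck:
		return "NEEDS_RECHECK"
	case statusError:
		return "ERROR"
	case statusNotSynced:
		return "NOT_SYNCED"
	}
	// Convert to int first so Sprintf does not call String recursively.
	return fmt.Sprintf("reconciliationStatus(%d)", int(s))
}

// handled reports whether the worker's switch has a case for s.
func handled(s reconciliationStatus) bool {
	switch s {
	case statusAllOK, statusNeedsRecheck, statusError, statusNotSynced:
		return true
	}
	return false
}

func main() {
	// Every declared constant is handled.
	for s := statusAllOK; s <= statusNotSynced; s++ {
		fmt.Println(s, handled(s))
	}
	// Only an explicit conversion can produce an unhandled value.
	bogus := reconciliationStatus(42)
	fmt.Println(bogus, handled(bogus))
}
```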



@marun
Contributor

marun commented May 10, 2017

/lgtm


Reviewed 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


federation/pkg/federation-controller/sync/controller.go, line 242 at r1 (raw file):

Previously, perotinus (Jonathan MacMillan) wrote…

Absolutely! Not sure where it should live, though: it seems like it could live in the sync package since it will eventually, ideally, be an implementation detail of the sync controller and not used elsewhere.

Makes sense to me.



@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 10, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marun, nikhiljindal, perotinus

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 45453, 45307, 44987)

@k8s-github-robot k8s-github-robot merged commit f7dcf7d into kubernetes:master May 10, 2017
case statusNeedsRecheck:
s.deliver(*namespacedName, s.reviewDelay, false)
case statusNotSynced:
s.deliver(*namespacedName, s.reviewDelay, false)
Contributor

I didn't notice until I went to rebase on this, but shouldn't this be s.clusterAvailableDelay?

@perotinus perotinus deleted the syncworker branch June 23, 2017 18:46
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.