
kubectl: Use v1.5-compatible ownership logic when listing dependents. #43239

Merged
3 commits merged on Mar 17, 2017

Conversation

enisoc
Member

@enisoc enisoc commented Mar 16, 2017

What this PR does / why we need it:

This restores compatibility between kubectl 1.6 and clusters running Kubernetes 1.5.x. It introduces transitional ownership logic in which the client considers ControllerRef when it exists, but does not require it to exist.

If we were to ignore ControllerRef altogether (pre-1.6 client behavior), we would introduce a new failure mode in v1.6 because controllers that used to get stuck due to selector overlap will now make progress. For example, that means when reaping ReplicaSets of an overlapping Deployment, we would risk deleting ReplicaSets belonging to a different Deployment that we aren't about to delete.

This transitional logic avoids such surprises in 1.6 clusters, and does no worse than kubectl 1.5 did against 1.5 clusters. To prevent the same problem when kubectl 1.5 is used against 1.6 clusters, we can cherry-pick this change.
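
In concrete terms, the transitional rule might look like the following sketch (a hypothetical helper for illustration, not the PR's code; it assumes the `GetControllerOf` accessor and `types.UID` from apimachinery):

```go
package kubectlcompat // hypothetical package, for illustration only

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// keepDependent reports whether an object that matched the controller's
// selector should be treated as owned, under the transitional rules:
// an object with no ControllerRef is kept, because pre-1.6 servers never
// set one, while an object explicitly controlled by someone else is skipped.
func keepDependent(controllerUID types.UID, dependent metav1.Object) bool {
	ref := metav1.GetControllerOf(dependent)
	if ref == nil {
		// No ControllerRef: fall back to pure selector matching, as in v1.5.
		return true
	}
	// A ControllerRef exists: respect it and keep only our own dependents.
	return ref.UID == controllerUID
}
```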

Which issue this PR fixes:

Fixes #43159

Special notes for your reviewer:

Release note:

This effectively reverts the client-side changes in cec3899. We have to maintain the old behavior on the client side to support version skew when talking to old servers that set the annotation. However, the new server-side behavior is still to NOT set the annotation.

In particular, we should not assume ControllerRefs are necessarily set. However, we can still use ControllerRefs that do exist to avoid interfering with controllers that do use them.
@enisoc enisoc added the release-blocker and release-note-none labels Mar 16, 2017
@enisoc enisoc added this to the v1.6 milestone Mar 16, 2017
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes label Mar 16, 2017
@k8s-github-robot k8s-github-robot added the size/L label Mar 16, 2017
@enisoc
Member Author

enisoc commented Mar 16, 2017

cc @kubernetes/sig-apps-pr-reviews


@enisoc enisoc added the kind/bug label Mar 16, 2017
```diff
@@ -355,7 +356,16 @@ func (reaper *StatefulSetReaper) Stop(namespace, name string, timeout time.Durat
 	}
 
 	errList := []error{}
-	for _, pod := range podList.Items {
+	for i := range podList.Items {
```
Contributor

You should be able to switch StatefulSetHasDesiredReplicas to use ObservedGeneration and use that instead of deleting the pods manually here. If you do that, separate PR.

Member Author

Since this is client code, I think we can't do that until 1.7. If we use it against a 1.5 cluster, StatefulSet ObservedGeneration will never catch up.
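
(For illustration, a hedged sketch of the ObservedGeneration-based check under discussion, assuming the apps/v1beta1 StatefulSet types where Status.ObservedGeneration is a pointer; against a 1.5 cluster this predicate would never become true, since the 1.5 controller does not set that field:)

```go
package kubectlcompat // hypothetical, for illustration only

import apps "k8s.io/api/apps/v1beta1" // assumed import path

// statefulSetHasDesiredReplicas is a sketch of the suggested check, not the
// merged code: the StatefulSet is "done" once the controller has observed the
// latest spec generation and the observed replica count matches the spec.
func statefulSetHasDesiredReplicas(ss *apps.StatefulSet) bool {
	return ss.Status.ObservedGeneration != nil &&
		*ss.Status.ObservedGeneration >= ss.Generation &&
		ss.Spec.Replicas != nil &&
		ss.Status.Replicas == *ss.Spec.Replicas
}
```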

Contributor

True

Member Author

Filed #43325.

```diff
@@ -67,6 +67,10 @@ const (
 	RollbackTemplateUnchanged = "DeploymentRollbackTemplateUnchanged"
 	// RollbackDone is the done rollback event reason
 	RollbackDone = "DeploymentRollback"
+	// OverlapAnnotation marks deployments with overlapping selector with other deployments
+	// TODO: Delete this annotation when we no longer need to support a client
```
Contributor

In 1.7, we should probably check deployments with this annotation and clean it up, then remove it from our codebase in 1.8.

Member Author

Cleanup is being done in this PR, so it will begin in 1.6. Filed #43322 for later removal.

```diff
@@ -2441,6 +2441,10 @@ func (dd *DeploymentDescriber) Describe(namespace, name string, describerSetting
 		}
 		w.Write(LEVEL_0, "NewReplicaSet:\t%s\n", printReplicaSetsByLabels(newRSs))
 	}
+	overlapWith := d.Annotations[deploymentutil.OverlapAnnotation]
+	if len(overlapWith) > 0 {
+		w.Write(LEVEL_0, "!!!WARNING!!! This deployment has overlapping label selector with deployment %q and won't behave as expected. Please fix it before continuing.\n", overlapWith)
```
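
(For reference: given a deployment whose OverlapAnnotation names a deployment "foo" (a hypothetical name), `kubectl describe` output would then include a line like:

    !!!WARNING!!! This deployment has overlapping label selector with deployment "foo" and won't behave as expected. Please fix it before continuing.
)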
Contributor

False negative for 1.6 clusters but I am ok with that. Or we can add back the logic for cleaning up this annotation in the deployment controller.

Contributor

We can do it atomically in

```go
updateConditions := deploymentutil.SetDeploymentRevision(deployment, newRevision)
// If no other Progressing condition has been recorded and we need to estimate the progress
// of this deployment then it is likely that old users started caring about progress. In that
// case we need to take into account the first time we noticed their new replica set.
cond := deploymentutil.GetDeploymentCondition(deployment.Status, extensions.DeploymentProgressing)
if deployment.Spec.ProgressDeadlineSeconds != nil && cond == nil {
	msg := fmt.Sprintf("Found new replica set %q", rsCopy.Name)
	condition := deploymentutil.NewDeploymentCondition(extensions.DeploymentProgressing, v1.ConditionTrue, deploymentutil.FoundNewRSReason, msg)
	deploymentutil.SetDeploymentCondition(&deployment.Status, *condition)
	updateConditions = true
}
if updateConditions {
	if deployment, err = dc.client.Extensions().Deployments(deployment.Namespace).UpdateStatus(deployment); err != nil {
```
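
(A minimal sketch of what the atomic version could look like; this is illustrative, not the code that was merged:)

```go
// Sketch: clear the overlap annotation as part of the same status update,
// so the cleanup and any condition change land in one UpdateStatus call.
if _, ok := deployment.Annotations[deploymentutil.OverlapAnnotation]; ok {
	delete(deployment.Annotations, deploymentutil.OverlapAnnotation)
	updateConditions = true
}
```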

Contributor

Meaning if a deployment with the annotation exists, simply remove the annotation.

Member Author

Why not remove it in syncDeployment(), where we used to call handleOverlap()? It seems easier to prove that it will do the same thing as the old code did when the selectors no longer overlap.

This ensures old clients will not assume the Deployment is blocked.
@0xmichalis
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Mar 17, 2017
@ethernetdan
Contributor

@pwittrock @janetkuo PTAL at this release-blocking issue

@ethernetdan
Contributor

/cc @kubernetes/sig-cli-pr-reviews

```diff
@@ -581,6 +581,20 @@ func (dc *DeploymentController) syncDeployment(key string) error {
 		return nil
 	}
 
+	// This is the point at which we used to add/remove the overlap annotation.
```
Member

Add to this comment a date at which this code can be removed + file an issue and target a milestone.

Member Author

Filed #43322. Added comment in another commit, which I'll send as a separate PR unless this one requires fixup before merging.

```go
// TODO: Right now we list replica sets by their labels. We should list them by selector, i.e. the replica set's selector
// should be a superset of the deployment's selector, see https://github.com/kubernetes/kubernetes/issues/19830.
// ListReplicaSetsInternalV15 is ListReplicaSetsV15 for internalextensions.
// TODO: Remove the duplicate when call sites are updated to ListReplicaSetsV15.
```
Member

Add when this should be deleted

Member Author

Filed #43323. Added comment in another commit, which I'll send as a separate PR unless this one requires fixup before merging.

```go
// ListPodsV15 is a compatibility function that emulates the behavior
// from v1.5.x (list matching objects by selector) except that it leaves out
// objects that are explicitly marked as being controlled by something else.
func ListPodsV15(deployment *extensions.Deployment, rsList []*extensions.ReplicaSet, getPodList podListFunc) (*v1.PodList, error) {
```
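
(To make the transitional rule concrete, a hedged sketch of how such a function might be implemented; the real body in the PR may differ, the exact `podListFunc` signature is an assumption, and the imports (metav1, types, extensions, v1) are assumed from the surrounding file:)

```go
func listPodsV15Sketch(d *extensions.Deployment, rsList []*extensions.ReplicaSet, getPodList podListFunc) (*v1.PodList, error) {
	selector, err := metav1.LabelSelectorAsSelector(d.Spec.Selector)
	if err != nil {
		return nil, err
	}
	podList, err := getPodList(d.Namespace, metav1.ListOptions{LabelSelector: selector.String()})
	if err != nil {
		return nil, err
	}
	// Index the deployment's ReplicaSets by UID so their pods are recognized as ours.
	ours := map[types.UID]bool{}
	for _, rs := range rsList {
		ours[rs.UID] = true
	}
	filtered := podList.Items[:0]
	for i := range podList.Items {
		ref := metav1.GetControllerOf(&podList.Items[i])
		// v1.5 behavior: keep pods with no ControllerRef. Transitional rule:
		// also keep pods controlled by one of our ReplicaSets, but skip pods
		// explicitly controlled by something else.
		if ref == nil || ours[ref.UID] {
			filtered = append(filtered, podList.Items[i])
		}
	}
	podList.Items = filtered
	return podList, nil
}
```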
Member

Let's add an issue to make sure there is a version-skew test that will catch reaping issues in the future.

Member Author

Filed #43324.

```go
// the controller is blocked due to selector overlap and has no dependents.
if _, ok := d.Annotations[util.OverlapAnnotation]; ok {
	delete(d.Annotations, util.OverlapAnnotation)
	d, err = dc.client.ExtensionsV1beta1().Deployments(d.Namespace).UpdateStatus(d)
```
Member

Update instead of UpdateStatus to remove that annotation?

Contributor

We allow annotations to be updated via the status endpoint, and the fact that the deployment is an overlapping one is more a piece of status info.

Member Author

The previous code that worked with OverlapAnnotation used UpdateStatus(). I wanted to restore the clearOverlapAnnotation() behavior as exactly as possible to minimize risk.

@pwittrock
Member

Fine to do the comments in a follow-up so that the tests are not run a second time before merge.

@janetkuo
Member

/approve
/lgtm

@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enisoc, janetkuo, kargakis

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved label Mar 17, 2017
```diff
@@ -65,7 +65,7 @@ func (h *DeploymentHistoryViewer) ViewHistory(namespace, name string, revision i
 	if err != nil {
 		return "", fmt.Errorf("failed to retrieve deployment %s: %v", name, err)
 	}
-	_, allOldRSs, newRS, err := deploymentutil.GetAllReplicaSets(deployment, versionedClient)
+	_, allOldRSs, newRS, err := deploymentutil.GetAllReplicaSetsV15(deployment, versionedClient)
```
Member
@janetkuo Mar 17, 2017

Maybe we should check if the deployment has the overlapping annotation (not in this PR, not release-blocking). Filed #43321

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit f37cffc into kubernetes:master Mar 17, 2017
@shiywang
Contributor

shiywang commented Mar 21, 2017

@Kargakis when I was working on #42570, I removed the podList parameter of FindOldReplicaSets, but after rebasing onto this PR it seems we still need it. Do you think we should keep the old `FindOldReplicaSets` function?

@enisoc
Member Author

enisoc commented Mar 21, 2017

@shiywang I think it is still ok for you to remove podList from FindOldReplicaSets. In this PR, we had to retain some old/transitional logic because the new logic would not work against an old cluster.

In your case, it seems like the new logic you're introducing should work the same in old and new clusters.

@shiywang
Contributor

shiywang commented Mar 22, 2017

@enisoc hi, thanks for your comments. Here is my PR #43508, could you help take a look? I'm not sure it's right, since this is my first Deployment PR.

@enisoc enisoc deleted the kubectl-controller-ref branch March 27, 2017 18:37