New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-24691: remove all managed fields used by old manager #2114
OCPBUGS-24691: remove all managed fields used by old manager #2114
Conversation
1dcded8
to
958759b
Compare
/retest |
7 similar comments
/retest |
/retest |
/retest |
/retest |
/retest |
/retest |
/retest |
/jira cherrypick OCPBUGS-22293 |
@jluhrsen: Jira Issue OCPBUGS-22293 has been cloned as Jira Issue OCPBUGS-24036. Will retitle bug to link to clone. WARNING: Failed to update the target version for the clone. Please update the target version manually. Full error below: Full error message.
customfield_12323140 - Field 'customfield_12323140' cannot be set. It is not on the appropriate screen, or unknown.: request failed. Please analyze the request body for more details. Status code: 400:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@jluhrsen: This pull request references Jira Issue OCPBUGS-24036, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/jira refresh |
@jluhrsen: This pull request references Jira Issue OCPBUGS-24036, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
For other reviewersThe bug attached to this PR failed when apply was performed by CNO and resulted in error Code block:
That func above is ONLY called for poststart/prestop hooks when theyre not nil:
and old field manager:
Looks like api server bug "merging" the fields when using managed fields and multiple owners. Guessing, when the new manager drops field So how do we move forward? Reviewing now. |
pkg/apply/fieldmanager.go
Outdated
@@ -47,11 +48,45 @@ func mergeManager(ctx context.Context, clusterClient client.ClusterClient, us *u | |||
if err != nil { | |||
return fmt.Errorf("failed to patch (type %s) for object %s %s: %v", patchType, objGVR.String(), us.GetName(), err) | |||
} | |||
// remove the old manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont understand why this line and the reset below is needed.
The previous lines should remove the old manager.
Can you paste here what the managed fields look like at this point following the patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right. the changes you did to deprecate the old fieldmanager were enough and the extra I added
was overkill. However, your changes' code path wasn't executing because it was being called after the APPLY
and the APPLY is where the upgrade failed and got stuck. Here is a PR that adds a GET before the APPLY and
if it sees an old manager, it will then execute your code path. I tested with it, and it works like we want.
I can replicate this with simple pods but also on OCP. I can confirm that the old field manager has invalid field manager configuration and that is why the apply cannot be applied. Its clear we need to remove the old field managers configuration before attempting to apply. The root of the problem is the old field manager owns the To replicate on any OCP ver without having to go through many upgrade jumps
Confirm no CNO pods are running.
Add in a
This will perform a clientside update to the artifact.
Do a server side apply using the existing field manager. We do this so the existing field manager also owns the
Edit the file and change the managed field section the field manager "another-field-manager" for the `postStart`` from:
to
This mimics what we see in the MG for the old field manager. It doesnt own the Clientside apply this update using current field manager.
Confirm that manager
|
958759b
to
9cf875c
Compare
// consider removing in OCP 4.18 when we know field manager 'cluster-network-operator' no longer possibly | ||
// exists in any object from all upgrade paths | ||
if isDepFieldManagerCleanupNeeded(subcontroller) { | ||
// Retrieve the current state of the resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ideally id put in a comment why we needed to do this and what OCP version it can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't that what's up above in lines 128-130? although correct me if I'm wrong, but I think this can be removed in 4.16 right? 4.15 will have these changes from GA onward and any cluster that eventually upgrades to 4.15 would have this resolved, so 4.16 and above wouldn't need it.
But, does it hurt? I suppose in the future another field manager name change could occur?
in the case that we have a deprecated field manager that we need to remove, if the Patch() were to fail that code path would not execute. To move it before the Patch() we also have to do one Get() to know if the removal is neccessary. this was uncovered while debugging this upgrade problem: JIRA: https://issues.redhat.com/browse/OCPBUGS-22293 Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com> deal w/ deprecated field manager before apply Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
9cf875c
to
711f8ff
Compare
I've updated this PR with one small change to move the I will /retest these network-migration jobs a few times as a sanity check. otherwise, I think this is good now. |
/retest |
/test e2e-aws-sdn-network-reverse-migration |
} else { | ||
klog.Infof("Depreciated field manager %s for object %q %s %s", depreciatedFieldManager, | ||
gvk.String(), obj.GetNamespace(), obj.GetName()) | ||
us, err := clusterClient.Dynamic().Resource(rm.Resource).Namespace(namespace).Get(ctx, name, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was think you could do all this once and when its successful, never do it again for the lifetime of the operator, this would reduce the api calls. Up to you though. Not hard req for me.
/test e2e-aws-sdn-network-migration-rollback |
@jluhrsen: The
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test e2e-aws-sdn-network-migration-rollback |
/retest-required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea how the Get was causing that much delay...but it seems to have fixed the problem somehow...
/hold cancel
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jluhrsen, martinkennelly, trozet The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@jluhrsen: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest-required |
95e2d18
into
openshift:master
@jluhrsen: Jira Issue OCPBUGS-24036: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-24036 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[ART PR BUILD NOTIFIER] This PR has been included in build cluster-network-operator-container-v4.16.0-202312160833.p0.g95e2d18.assembly.stream for distgit cluster-network-operator. |
/retitle OCPBUGS-24691: remove all managed fields used by old manager |
@jluhrsen: Jira Issue OCPBUGS-24691: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-24691 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-4.15 |
@sdodson: new pull request created: #2167 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
the old manager "cluster-network-operator" was changed to "cluster-network-operator/operconfig" in 4.11 when server-side-apply was migrated to from client-side-apply. However, the old operator still had it's own managed fields and some interaction between those and the change to remove preStop hooks in ovnkube-master's daemonset containers was causing upgrades to stick in the network operator.
this will just remove the old manager (and it's managed fields) entirely. it's done before the update to apply the object instead of after.
JIRA: https://issues.redhat.com/browse/OCPBUGS-22293
deal w/ deprecated field manager before apply