Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

finishes the conversion away from createOrReplace #20933

Merged
merged 1 commit into from Jun 23, 2023

Conversation

shawkins
Copy link
Contributor

@shawkins shawkins commented Jun 12, 2023

This is a broader change given the implications of serverSideApply vs createOrReplace - mostly the concern of only applying the just the managed state, which is not based upon an existing resource.

The changes include:

  • the introduction of an onUpdate filter MetadataAwareOnUpdateFilter to remove vacuous updates Repeated server side apply creates additional revisions kubernetes/kubernetes#118519 was opened specifically for Secrets, but I saw the same behavior on Ingress. For safety/consistency it was added to all informer event sources, but that may not be necessary. It would be better to prevent such modifications in the first place, but it's straight-forward to catch it after the fact. The eventual usage of dependent resources should fully address this possibility
  • There was some tweaking of the env logic needed because it could add multiple entries with the same name (one in KeycloakDeployment, and again in KeycloakDistConfigurator) and that would fail validation via a PATCH.
  • the removal of any paths that modifed an existing item and provided that via getReconciledResource - instead we just want to provide the golden managed state and let server side apply figure the rest out.
  • The follow on to the above point is that we should be able to use the resources from the informer / operator caches, rather than looking them up on each reconciliation, as there's no chance they are being modified inappropriately. This was not done as part of this pr to keep the changes a little more focused.
  • The next follow on is that means some of what we're asserting as durability no longer applies (there should be test failures related to this). If a field is not managed, such as Service spec.sessionAffinity, then something else modifying that field means it won't get reverted by the operator. A possible pro of this is that it allows for a broader, but likely unsupported, method for users to customize any of the dependent resources, rather than needing something like the spec.unsupported - however this gets messy when we have to delete our resources (see adds an instance label to support multiple instances #20906) - if we simply throw away the existing state, then users will be required to watch the resource and reapply any of their changes. The main con of this is that there's a greater chance the user could do something that's unsupported... @vmuzikar @abstractj
  • I had some initial test failures related to the watched secrets logic, rather than working through that as is, I modified this to not use a separate store and explicit rolling update. Instead we'll annotate the Keycloak statefulset in such a way as to trigger the rolling restart when needed and to provide a simple tracking of the secrets it references. If this looks like too much for this pr, I can go back and try to separate that off as it's own change. Something similar may also be possible for the realm import logic to avoid the need to explicitly call for a rolling restart. Removes the watched secret store by directly using the statefulsets #21161

Closes #20850

@Pepo48
Copy link
Contributor

Pepo48 commented Jun 16, 2023

@shawkins the proposed changes LGTM. Thank you very much, also for the detailed description, well done.
I still see a couple of failed CI stages, which seem to be related. Could you take a look at them?

@abstractj jFTR, this type of changes were approved even by Vašek, we just need to be careful to not to do major changes in the operator's logic in order to avoid any breaking changes just before the release. I don't see such change in this PR (which was basically my only concern). Therefore, I would give it the green light once all the checks pass.

Thank you both.

@shawkins
Copy link
Contributor Author

I still see a couple of failed CI stages, which seem to be related. Could you take a look at them?

The failures relate to the bullet point "The next follow on is that means some of what we're asserting as durability no longer applies". A consequence of the intended usage of server side apply is that we won't automatically revert changes for non-managed fields. The tests assume the traditional PUT behavior, which will revert everything. If we can accept the new behavior, I'll change the tests. If not, then we'll need a different method to apply our changes - this also relates as well to #13453 as the JOSDK behavior is moving toward server side apply.

@Pepo48
Copy link
Contributor

Pepo48 commented Jun 16, 2023

I still see a couple of failed CI stages, which seem to be related. Could you take a look at them?

The failures relate to the bullet point "The next follow on is that means some of what we're asserting as durability no longer applies". A consequence of the intended usage of server side apply is that we won't automatically revert changes for non-managed fields. The tests assume the traditional PUT behavior, which will revert everything. If we can accept the new behavior, I'll change the tests. If not, then we'll need a different method to apply our changes - this also relates as well to #13453 as the JOSDK behavior is moving toward server side apply.

You're right, my bad, I misunderstood the state of the tests initially. As I already stated, I don't consider it as a breaking change and I think that we can stick with the new behaviour (which means adjusting the tests). But it would be great if someone else could confirm it.

@shawkins
Copy link
Contributor Author

shawkins commented Jun 16, 2023

I'll go ahead and update the test expectations and to simplify things let me pull out the secret store removal. I've talked with @csviri about how this will look with the josdk and since the mechanics will need to change again we can differ that until later.

@shawkins
Copy link
Contributor Author

I'll go ahead and update the test expectations and to simplify things let me pull out the secret store removal.

This has been completed.

I have a better understanding of server side apply's ability to figure out intent. It is not quite what I expected to be. In places where you have an either or, such as a Ingress port name or number - which is represented by separate fields instead of IntOrString, you must try to remove the unwanted field in the apply. At least for services, it does not understand port merge keys kubernetes/kubernetes#118725 so the logic will fallback a strategic merge patch (to aid with that the service ports are given an explicit name). However there are likely other outside modifications which would leave both server side apply and stategic merge unable to modify the resource - with the current state of the pr that will show up as an error on the keycloak cr.

@shawkins shawkins marked this pull request as draft June 19, 2023 17:54
@shawkins
Copy link
Contributor Author

Marking as a draft. Looking more at the direction of the JOSDK they are basing the SSA logic off of fully specifying the state off of the current version of the object. This pr instead took the approach of just asserting the managed state. I need to evaluate things more before making this change.

@csviri
Copy link

csviri commented Jun 19, 2023

@shawkins we can take a look together, in the 'next' branch of josdk is the upcoming release ( in few days ) that contains the version with ssa, where we update just the desired state with ssa regardless of current state on server ( well if non matched )

@shawkins
Copy link
Contributor Author

@shawkins we can take a look together, in the 'next' branch of josdk is the upcoming release ( in few days ) that contains the version with ssa, where we update just the desired state with ssa regardless of current state on server ( well if non matched )

I opened operator-framework/java-operator-sdk#1951 and referenced the upstream kubernetes issue that is giving me pause about SSA.

For this pr I'd essentially just revert the manipluation of the existing state to align with the JOSDK. However there are potential issues with SSA vs the current replace handling - regardless of whether the existing state is used.

@shawkins
Copy link
Contributor Author

shawkins commented Jun 21, 2023

For this pr I'd essentially just revert the manipluation of the existing state to align with the JOSDK. However there are potential issues with SSA vs the current replace handling - regardless of whether the existing state is used.

After discussing with @abstractj we're willing to take the approach that if a user modifies a managed resource in such a way as the operator can't manage it, then they need to resolve that. This has been added as an error message.

The main benefit of this approach is that users will have far greater lattitude in updating the operator managed resources. This may help with issues such as #15395 and possibly #14504. #20107 could be approached this way or via the unsupported template, but also needs the keycloak javaopts tweaked.

To recap - the main drawback is that there are quite a number of changes one could make to our managed resources that would then cause our server side apply attempts to fail validation - ideally those are changes a user wouldn't actually do. The other important implication to us as developers is that you have to keep in mind that you're not asserting the same state as a PUT / replace. This applies to the changes we make in the operator over time as they will act upon existing resources. It will be important to have tests that start with the old state will help ensure things are modifed appropriately. The major considerations:

  • you can't assume that default values will be set (if the value is already populated in the existing state either by our logic or someone else's change it will stay that value).
  • you can't remove much that is unmanaged via a server side apply (only if it's a simple field that accepts an empty string) - if for whatever reason a port, a spread constraint, or whatever needs removed that will have to be done as either an update or json patch otherwise you'll end up in the state mentioned above where we need to user to intervene and remove their changes or delete the resource.

After typing all this out and updating the pr, I'm still on the fence about this change. Given that the operator sdk is adding an option to not user server side apply in future versions, we won't be required to move in this direction to use dependent resources. So if others are not strongly in favor of the flexiblity here, then I'm fine with just refining this to not use server side apply. If we do move forward I'll add some doc notes.

cc @csviri

@shawkins shawkins marked this pull request as ready for review June 21, 2023 18:42
Copy link
Contributor

@abstractj abstractj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shawkins due to the recent changes in the codebase, it will be necessary to solve those conflicts.

But I agree with your proposal, control the changes over managed resources is outside the scope of the Operator's and there are other mechanisms in place for people to prevent it from happening.

cc @Pepo48

however this is a broader change given the implications of
serverSideApply vs createOrReplace - mostly the concern of only applying
the managed state not based upon an existing resource

Closes keycloak#20850
@shawkins
Copy link
Contributor Author

The conflict has been resolved. Also the test logic was changed to a create-or-replace like operation to completely ensure it behaves as it did originially.

https://kubernetes.io/blog/2022/10/20/advanced-server-side-apply/#reconstructive-controllers describes how the operator behaves after this change - the proplem of eliminating unnecessary applies should eventually be addressed by the josdk

@abstractj abstractj merged commit 6a92669 into keycloak:main Jun 23, 2023
74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Complete the removal of deprecated fabric8 logic from the operator
4 participants