
Allow providers to react when Clusters are paused and indicate in the Cluster's status when those actions are finished #8473

Closed
Tracked by #3525
nojnhuh opened this issue Apr 4, 2023 · 17 comments · Fixed by #8690
Labels: area/api, area/clusterctl, kind/feature, triage/accepted

Comments

@nojnhuh
Contributor

nojnhuh commented Apr 4, 2023

What would you like to be added (User Story)?

As an infrastructure provider developer, I would like to allow my provider to perform and synchronize tasks in reaction to a Cluster being paused instead of assuming that pausing a Cluster happens instantaneously.

Detailed Description

In CAPZ, we are planning to adopt Azure Service Operator (ASO) to manage individual Azure resources as Kubernetes objects, in place of calling the Azure SDK directly from within CAPZ. Since Azure resources are reconciled out-of-band from Cluster API resources in this scenario, CAPZ cannot immediately block ASO's reconciliation loop when a Cluster is spec.paused.

ASO reads a separate annotation on each resource it controls to optionally block reconciliation. In response to a Cluster being paused, CAPZ intends to set this annotation on each relevant ASO resource so no reconciliation occurs when a Cluster is paused.

Since tooling like clusterctl move assumes pausing is an instantaneous operation, there is a risk that it takes actions while the Cluster is not yet fully paused, because nothing waits for an indication that the Cluster is in fact paused.

An example of how I envision this might work for CAPZ with clusterctl move (a rough sketch of the provider side of steps 4-6 follows the list):

  1. clusterctl move starts
  2. Clusters' spec.paused set to true
  3. clusterctl move waits for each Cluster to have status.paused=true
  4. CAPZ's AzureCluster controller detects its owning Cluster is paused
  5. CAPZ annotates all the associated ASO resources to pause reconciliation
  6. CAPZ updates the owning Cluster with status.paused=true
  7. clusterctl move continues as it does today
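
A minimal sketch of what steps 4-6 could look like on the provider side, assuming a hypothetical Paused condition standing in for status.paused=true and a placeholder annotation key for ASO's skip-reconcile mechanism (none of these names are settled API):

```go
package capzpause // illustrative sketch, not CAPZ code

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// asoPauseAnnotation stands in for whatever annotation ASO reads to skip
// reconciliation; the real key and value are ASO's to define.
const asoPauseAnnotation = "example.com/reconcile-policy"

// pauseASOResources sketches steps 4-6: the provider notices the owning
// Cluster is paused, pauses the ASO objects it manages, and then reports
// back via a hypothetical "Paused" condition (standing in for the
// status.paused=true from step 6) so tooling can stop waiting.
func pauseASOResources(ctx context.Context, c client.Client, cluster *clusterv1.Cluster, asoObjs []client.Object) error {
	if !cluster.Spec.Paused {
		return nil
	}
	for _, obj := range asoObjs {
		ann := obj.GetAnnotations()
		if ann == nil {
			ann = map[string]string{}
		}
		ann[asoPauseAnnotation] = "skip"
		obj.SetAnnotations(ann)
		if err := c.Update(ctx, obj); err != nil {
			return err
		}
	}
	conditions.MarkTrue(cluster, "Paused")
	return c.Status().Update(ctx, cluster)
}
```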

Anything else you would like to add?

cc @dtzar @CecileRobertMichon

Label(s) to be applied

/kind feature
/area api
/area clusterctl

@k8s-ci-robot added the kind/feature, area/api, area/clusterctl, and needs-triage labels on Apr 4, 2023
@CecileRobertMichon
Contributor

cc @fabriziopandini

@fabriziopandini
Copy link
Member

/triage accepted

As of today clusterctl move has a preflight check that tries to detect whether provisioning is completed from proxy information:

https://github.com/fabriziopandini/cluster-api/blob/d9226f22de28ce8b0e7c97d1d33f7c15eb37f514/cmd/clusterctl/client/cluster/mover.go#L196-L202

Should this, or an improved version of this, be enough for this use case?

If not, we should start thinking about introducing an optional mechanism of synchronization between clusterctl move and providers, or, even better, revamp the discussion about plugins in clusterctl, which could provide more powerful extension points.

Also looping in @yastij and @srm09, because I assume a similar issue has been dealt with in CAPV.
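
For context, the spirit of that preflight check is roughly the following; this is an illustrative sketch against the v1beta1 Cluster API types, not the actual mover.go code:

```go
package moveprecheck // illustrative only; the real check lives in cmd/clusterctl/client/cluster/mover.go

import (
	"fmt"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"
)

// checkProvisioningCompleted conveys the spirit of the existing preflight:
// refuse to move while the Cluster is still provisioning.
func checkProvisioningCompleted(cluster *clusterv1.Cluster) error {
	if !cluster.Status.InfrastructureReady {
		return fmt.Errorf("cannot start the move operation while Cluster %s/%s is still provisioning the infrastructure", cluster.Namespace, cluster.Name)
	}
	if !conditions.IsTrue(cluster, clusterv1.ControlPlaneInitializedCondition) {
		return fmt.Errorf("cannot start the move operation while the control plane of Cluster %s/%s is not yet initialized", cluster.Namespace, cluster.Name)
	}
	return nil
}
```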

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Apr 5, 2023
@nojnhuh
Contributor Author

nojnhuh commented Apr 5, 2023

Thanks @fabriziopandini, I think something like that might work, but I see a couple of gaps for our use case.

I noticed that check happens before clusterctl move pauses the cluster, and the pause was how I was thinking CAPZ would detect that a move is about to happen. Is there any other signal CAPZ could pick up between clusterctl move being invoked and that check being performed?

Then, if CAPZ can detect a move is happening, I think there would be a race between clusterctl's check that the cluster is provisioned and CAPZ marking a provisioned cluster as unprovisioned in reaction to the move. If CAPZ doesn't react fast enough, clusterctl might go ahead and move the cluster even if it was just about to be marked as unprovisioned. It seems to me that clusterctl would need to wait explicitly for the cluster to be paused to avoid that issue.

@fabriziopandini
Member

might be related to this #8322

@CecileRobertMichon
Contributor

Adding what I said during office hours here:

Would a finalizer work for this use case? We could add a specific finalizer as part of the CAPI move contract (with an exported const as part of CAPI utils) that providers can add on resources to indicate "I need to do some cleanup on this resource before you can move it". Then, CAPI move could treat it as best effort (a rough sketch follows after this list):

  • remove all finalizers BUT that specific known move finalizer
  • attempt to delete
  • if after n seconds the resource is still not deleted, remove the finalizer and force delete

Unless we're looking to block create as well, then it might be more tricky...
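
A rough sketch of that best-effort flow, assuming a hypothetical well-known move finalizer exported from CAPI utils and a caller-chosen grace period (both placeholders):

```go
package moveattempt // rough sketch; finalizer name and timeout handling are placeholders

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// moveFinalizer is a hypothetical well-known finalizer providers could add to
// say "let me clean up before this object is deleted as part of move".
const moveFinalizer = "move.cluster.x-k8s.io/cleanup"

// deleteForMove removes every finalizer except the well-known move finalizer,
// deletes the object, waits up to grace for the provider's cleanup, and then
// force-removes the move finalizer as a last resort.
func deleteForMove(ctx context.Context, c client.Client, obj client.Object, grace time.Duration) error {
	if controllerutil.ContainsFinalizer(obj, moveFinalizer) {
		obj.SetFinalizers([]string{moveFinalizer})
	} else {
		obj.SetFinalizers(nil)
	}
	if err := c.Update(ctx, obj); err != nil {
		return err
	}
	if err := c.Delete(ctx, obj); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	deadline := time.Now().Add(grace)
	for time.Now().Before(deadline) {
		if err := c.Get(ctx, client.ObjectKeyFromObject(obj), obj); apierrors.IsNotFound(err) {
			return nil // the provider finished its cleanup in time
		}
		time.Sleep(time.Second)
	}
	// Best effort expired: force-remove the finalizer so the move can proceed.
	if err := c.Get(ctx, client.ObjectKeyFromObject(obj), obj); err != nil {
		if apierrors.IsNotFound(err) {
			return nil
		}
		return err
	}
	controllerutil.RemoveFinalizer(obj, moveFinalizer)
	return c.Update(ctx, obj)
}
```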

@nojnhuh
Contributor Author

nojnhuh commented Apr 5, 2023

Unless we're looking to block create as well, then it might be more tricky...

I think ideally we would block create as well, to ensure we never have two ASO instances reconciling the same resource. In practice, I think the risk is fairly low, since the time between clusterctl move's create and delete is usually only a few seconds and no changes are made to the resources during that time that would result in two different definitions being reconciled.

In general, pausing the ASO resources is something CAPZ should do whenever a Cluster is paused, not just during clusterctl move. Exposing whether the Cluster is paused from the status seems like it would enable any kind of tooling to be able to synchronize pause operations in a more consistent way.

@fabriziopandini
Member

fabriziopandini commented Apr 14, 2023

(Edited May 1, 2023: s/finalizer/annotation)

@CecileRobertMichon @nojnhuh What about a slight variation of the idea in #8473 (comment), which hopefully can address the potential issue of the delay between create and delete discussed in #8473 (comment)?

Given that the current move workflow is: preflights, pause old, create new, delete old, unpause new,
the workflow could be modified to:

  • preflights
  • pause old --> signal that the move operation starts
  • check for the well-known annotation to go away on all the objects in scope --> ack from all the components that move can start; if not, fail (see the sketch below)
  • create new
  • delete old
  • unpause new

I'm using an annotation instead of a finalizer because this needs to block not only deletion but also creation (it blocks the move as a whole).

We can consider defining this as annotations on single objects or on CRDs (thus defining this setting once for all the objects of that kind), like we are already doing for force-move and other things; eventually, we can make it even more sophisticated by allowing folks to override how long clusterctl move should wait, using a similar mechanism.

Caveat: the CRD must be discoverable by clusterctl.
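
A minimal sketch of the extra wait step, assuming a hypothetical well-known annotation name and a simple polling loop (both placeholders, not an agreed-upon API):

```go
package movewait // sketch; the annotation name and polling interval are placeholders

import (
	"context"
	"fmt"
	"time"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// holdMoveAnnotation is a placeholder for the proposed well-known annotation a
// controller keeps on an object while it still has work to do before move.
const holdMoveAnnotation = "example.cluster.x-k8s.io/hold-move"

// waitForMoveAck returns once none of the objects in scope carry the
// annotation anymore, or fails when the timeout expires.
func waitForMoveAck(ctx context.Context, c client.Client, objs []client.Object, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		pending := 0
		for _, obj := range objs {
			if err := c.Get(ctx, client.ObjectKeyFromObject(obj), obj); err != nil {
				return err
			}
			if _, ok := obj.GetAnnotations()[holdMoveAnnotation]; ok {
				pending++
			}
		}
		if pending == 0 {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%d object(s) still blocking move after %s", pending, timeout)
		}
		time.Sleep(time.Second)
	}
}
```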

@nojnhuh
Contributor Author

nojnhuh commented Apr 14, 2023

check for the well-known finalizer to go away on all the objects in scope

Would this finalizer be added by a controller? If so, then it seems like that would be susceptible to a race condition where the check that the finalizer is gone might happen before the controller has a chance to add it in the first place. And if the finalizer is added by clusterctl, that solves the overall problem for clusterctl move but any other tooling relying on a cluster being paused would need to build its own unique mechanism.

Overall I still think some explicit indication of whether or not the cluster is paused in its status is the most straightforward way to solve this problem. If it were its own condition, then it could still be opt-in without any API changes, where determining if a cluster is paused might look something like:

```go
// Treat a missing Paused condition as "pause is instantaneous" so this stays opt-in.
if cond := conditions.Get(cluster, "Paused"); cond != nil {
    return cond.Status == "True"
}
return cluster.Spec.Paused
```

@sbueringer
Member

sbueringer commented Apr 22, 2023

Using a finalizer to block move sounds wrong to me. As far as I can tell finalizers have clear semantics in Kubernetes and our use case doesn't match.

Just wanted to add: I think this problem is not specific to infra providers. Because of the way the controller-runtime cache works, the Cluster CR is probably the only one where the pause takes effect instantly.

I think there are similar race conditions after a Velero restore + unpause, just for unpause instead of pause.
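
For context, provider reconcilers today typically gate on pause with something like the snippet below (conceptually, from inside a Reconcile function, using the existing util/annotations helper); note that nothing here reports back that reconciliation has actually stopped, which is the gap being discussed:

```go
// Inside a provider's Reconcile function; annotations is
// sigs.k8s.io/cluster-api/util/annotations and ctrl is
// sigs.k8s.io/controller-runtime. The check relies on a possibly stale
// cached Cluster and simply skips work without recording that fact anywhere.
if annotations.IsPaused(cluster, azureCluster) {
	log.Info("AzureCluster or linked Cluster is paused, skipping reconciliation")
	return ctrl.Result{}, nil
}
```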

@nojnhuh
Contributor Author

nojnhuh commented Apr 26, 2023

@fabriziopandini @CecileRobertMichon @sbueringer Thoughts on introducing a Paused status condition to communicate this? I have a very hacked-together POC showing how this might work: main...nojnhuh:cluster-api:wait-pause

@fabriziopandini
Member

Would this finalizer be added by a controller? If so, then it seems like that would be susceptible to a race condition where the check that the finalizer is gone might happen before the controller has a chance to add it in the first place.

It is up to the controllers to set this annotation as soon as possible in the object's lifecycle, as we do for finalizers, ownerRefs, core labels, and other annotations.
This IMO is good enough because no infrastructure could be created by a controller before the annotation is added.

Using a finalizer to block move sounds wrong to me. As far as I can tell finalizers have clear semantics in Kubernetes and our use case doesn't match.

I expressed similar concerns in my proposal above and suggested using an annotation instead; I have edited the comment to make this more explicit.

@fabriziopandini @CecileRobertMichon @sbueringer Thoughts on introducing a Paused status condition to communicate this? I have a very hacked-together POC showing how this might work: main...nojnhuh:cluster-api:wait-pause

My main concern with this solution is that it will introduce a change that prevents new versions of clusterctl from performing move when working with older versions of CAPI/providers (versions without the condition), which is a use case that currently works.

@nojnhuh
Contributor Author

nojnhuh commented May 1, 2023

@fabriziopandini What if, instead of checking whether the annotation is present, clusterctl waits for the annotation either to not be defined (for backward compatibility) or to have a particular value? That way, controllers can also use the annotation to indicate that a cluster is not paused, which @sbueringer brought up would be useful in other places as well. That also seems to make the potential race condition a bit more airtight, since controllers can indicate the cluster's paused status proactively instead of only after a move is invoked.
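
A minimal sketch of that backward-compatible check, with a placeholder annotation key (clusterv1 is the Cluster API v1beta1 package):

```go
// pausedAnnotation is a placeholder name; controllers would keep its value in
// sync with whether their resources are actually paused.
const pausedAnnotation = "example.cluster.x-k8s.io/paused-state"

// clusterIsPaused treats a missing annotation as "pause is instantaneous" for
// backward compatibility, and otherwise trusts the value reported by controllers.
func clusterIsPaused(cluster *clusterv1.Cluster) bool {
	if v, ok := cluster.GetAnnotations()[pausedAnnotation]; ok {
		return v == "true"
	}
	return cluster.Spec.Paused
}
```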

Overall I think a status condition could work similarly, where, when no Paused condition is defined, spec.paused is assumed to take effect instantaneously. Would that still break compatibility if the condition is assumed to not always be set? A status condition might also make it a bit clearer that only controllers are expected to modify that field, whereas the function of a new annotation might be more easily confused with spec.paused. And the message and reason fields on conditions might be useful to provide more context, like why a cluster is not yet (un)paused, or to show progress toward reaching the desired state.

@nojnhuh
Contributor Author

nojnhuh commented May 9, 2023

@fabriziopandini Do you still think an annotation is the best path forward at the moment?

@fabriziopandini
Member

@nojnhuh I'm getting a little bit confused by the direction this discussion is taking

In the issue description, we are discussing the problem of propagating pause to ASO resources and how to make clusterctl move wait for everything to be actually paused.

@CecileRobertMichon made a proposal to introduce a finalizer on the CAPZ resources responsible for the corresponding ASO objects; since then, I have assumed that having this finalizer at the level of each CAPZ resource could make things easier, because each resource can simply say "I'm actually paused" or "I'm still in the process of pausing", and there is no need to surface a global "everything is paused" on the Cluster object, which requires a lot of coordination across controllers.

Then, during the discussion, the finalizer became an annotation, but as far as I understood, the idea was still to support this annotation potentially on any resource in the scope of the move operation; the annotation was optional, allowing each resource/controller to opt in to controlling the move operation.

Now, reading your last comment and looking at the linked PR, it seems that we are shifting to a different approach, because we are now discussing only one annotation at the cluster level; not only that, we are also trying to make this "everything is paused" a generic thing, not strictly related to clusterctl move.

If I got all this right, my feedback about this shift of perspective is the following:

  • Intuitively, it seems to me that computing a global "everything is paused" at the cluster level implies a lot of complexity, and before committing to this path, I would like to hear opinions from different provider teams; this probably requires some discussion during office hours
  • If the majority of the providers prefer to handle "I'm actually paused" at the single resource level, we should revert back to a solution that could apply to any resource as the one discussed above.
  • If there is consensus across providers that computing a global "everything is paused" is not an issue, we can consider a solution that works at the cluster level only; in this case, my personal opinion is to make this explicit via a change to the contract, which might be implemented as an additional field or something else, plus a mechanism to propagate the info from the infrastructure cluster to the Cluster.

I hope this helps in making progress; happy to jump on a meeting to discuss this in person if that helps.

@nojnhuh
Contributor Author

nojnhuh commented May 30, 2023

@fabriziopandini I think that's a good recap of my understanding as well.

I agree that annotating each individual resource would be less disruptive overall. I could see that being more difficult for use cases other than clusterctl move, though, if they similarly involve building a graph of all resources to check whether each is paused. Even though that logic is documented, the implementation in clusterctl move seems non-trivial enough that having each client reimplement it for itself seems to be asking a lot IMO, given providers will have this knowledge already. That may also leak providers' implementation details that users shouldn't have to worry about and may instead start depending on.

I also still think it's acceptable to have a way to express "everything is paused" on the Cluster object, since spec.paused on the same object already means "pause everything", so that would be more symmetrical (if that's worth anything) than if each individual resource were annotated with "I am paused." One thought I had since opening the PR was to make it clearer that only *infrastructure*, rather than everything, is paused, since that's all an infrastructure provider can reasonably control.

Overall I have no issue with a clusterctl move-specific solution here, but this problem seems generic enough that a similarly generic solution seems worth pursuing. I'll add this to the office hours agenda for this week to hopefully get some more input.

@nojnhuh
Contributor Author

nojnhuh commented May 31, 2023

Now that I've thought about this more, I think I'm coming around to the annotating-individual-resources approach.

I think I had it stuck in my head that the ASO resources would need to be annotated this way, but maybe that's not the case if we can instead selectively annotate only the CAPZ infra resources, which would block the move for them and for all the resources under their respective ownership hierarchies (which would include the ASO resources). Maybe that's already what you had in mind. At that point, it seems like the only real difference from the annotating-the-Cluster approach is whether or not to propagate the annotation from the InfraCluster to the Cluster.
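
On the CAPZ side, that could look roughly like the sketch below, where the annotation key is a placeholder for whatever well-known name clusterctl would honor and c is a controller-runtime client:

```go
// blockMoveAnnotation is a placeholder for whatever well-known annotation
// clusterctl would honor before moving an object and its ownership hierarchy.
const blockMoveAnnotation = "example.clusterctl.cluster.x-k8s.io/block-move"

// setBlockMove keeps the annotation on the AzureCluster until the ASO
// resources underneath it have actually been paused, then removes it.
func setBlockMove(ctx context.Context, c client.Client, azureCluster client.Object, asoPaused bool) error {
	ann := azureCluster.GetAnnotations()
	if ann == nil {
		ann = map[string]string{}
	}
	if asoPaused {
		delete(ann, blockMoveAnnotation) // safe to move: ASO is no longer reconciling
	} else {
		ann[blockMoveAnnotation] = "true" // still pausing: hold the move
	}
	azureCluster.SetAnnotations(ann)
	return c.Update(ctx, azureCluster)
}
```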

I need to take another look at clusterctl move to make sure that's possible, but if it is I think that would unblock our use-case in the short term. Then something more complex could probably be built mostly on top of that if needed later.

@fabriziopandini Does an approach like that align with what you had in mind?

@nojnhuh
Contributor Author

nojnhuh commented Jun 5, 2023

/assign
