Waiting For a Plan to Finish #1461

Merged
merged 4 commits into from
Apr 14, 2020

Conversation

kensipe
Member

@kensipe kensipe commented Apr 10, 2020

@anthonydahanne, a community member, made the first PR to introduce --wait to install.

As outlined in issue #1418, there is a desire to move this code into a reusable space (not the CLI), with a request to add it to kudoClient. This makes it possible to use kudo as a library, in particular from terraform for https://github.com/kudobuilder/terraform-provider-kudo. WaitForInstance in kudo.go now provides this ability.

After modifying kudo install to use this wait, it made sense to add a wait-timeout for client control. This is super important as it is very unclear how long a plan will take to "finish".

The challenge from a user perspective at that point is what if the timeout expired and you want to wait again... or what if you didn't wait but now you want to. It just made sense to add kudo plan wait with a wait-time as well. The kudo plan wait works for any plan... it will wait for whatever the active plan is to finish.

To use this new feature:

# precondition
go run cmd/kubectl-kudo/main.go install mysql

# then 
go run cmd/kubectl-kudo/main.go plan wait --instance mysql-instance 

New feature in plan submenu:

 go run cmd/kubectl-kudo/main.go plan --help
The plan command has subcommands to view all available plans.

Usage:
  kubectl-kudo plan [command]

Available Commands:
  history     Lists history to a specific operator-version of an instance.
  status      Shows the status of all plans to an particular instance.
  trigger     Triggers a specific plan on a particular instance.
  wait        Waits on a plan to finish for a particular instance.

help for plan wait

 go run cmd/kubectl-kudo/main.go plan wait --help
Waits on a plan to finish for a particular instance.

Usage:
  kubectl-kudo plan wait [flags]

Examples:
  # Wait on the current plan status to finish
  kubectl kudo plan wait --instance=<instanceName>

Fixes #1418

Tagging @anthonydahanne in case he wanted to see this work

Signed-off-by: Ken Sipe <kensipe@gmail.com>

working version

Signed-off-by: Ken Sipe <kensipe@gmail.com>
…ackage

Signed-off-by: Ken Sipe <kensipe@gmail.com>
Signed-off-by: Ken Sipe <kensipe@gmail.com>
@kensipe kensipe added the release/highlight This PR is a highlight for the next release label Apr 10, 2020
@kensipe kensipe requested a review from runyontr April 10, 2020 19:12
@kensipe kensipe added this to the 0.13.0 milestone Apr 10, 2020
@kensipe
Member Author

kensipe commented Apr 11, 2020

I received some feedback questioning the newly added cli command kudo plan wait. If you modify an instance and immediately "wait" on that change... it is possible that the manager didn't see the change yet and the wait will complete before the plan is invoked... It was deemed by this feedback as not worth adding in. A couple of thoughts around this for a debate:

  1. plan status has the same issue: if you run plan status, it could show the "status" from before the plan started... repeated calls to plan status are necessary
  2. I still see value in an admin installing an operator while a dev (or anyone who wants to use it) wants to know when the deploy is done... so the installer doesn't want to wait, but another user does.

IMO it is worth having... and worth documenting this situation for awareness... what do other reviewers, teammates and community think?

@alenkacz
Contributor

@kensipe This is solvable with the admission webhook, as that already marks the Instance as being in a state where it's apparent that a plan will be run, and it does not have to wait for the controller to see it. Without the webhook, using the generation and/or version of the instance could potentially be a solution, but the wait would then have to be something like kudo plan trigger --wait for that to work (you need the version/generation before and after execution).
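
To make the generation idea concrete, here is a rough sketch of the before/after comparison. Everything in it is an assumption for illustration: instanceSnapshot and changeFinished are hypothetical, and ObservedGeneration follows the common Kubernetes status convention rather than anything confirmed about the KUDO Instance type.

```go
package main

import "fmt"

// instanceSnapshot is a hypothetical, simplified view of an Instance.
type instanceSnapshot struct {
	Generation         int64 // metadata.generation, bumped by the API server on spec changes
	ObservedGeneration int64 // assumption: last generation the controller has acted on
	PlanComplete       bool  // assumption: whether the active plan reports completion
}

// changeFinished reports whether the change made after `before` was taken
// has been both seen by the controller and completed.
func changeFinished(before, now instanceSnapshot) bool {
	if now.Generation <= before.Generation {
		return false // the change has not been persisted yet
	}
	if now.ObservedGeneration < now.Generation {
		return false // the controller has not seen the change yet
	}
	return now.PlanComplete
}

func main() {
	before := instanceSnapshot{Generation: 3, ObservedGeneration: 3, PlanComplete: true}
	stale := before // nothing changed yet: the old plan still reads as complete
	seen := instanceSnapshot{Generation: 4, ObservedGeneration: 4, PlanComplete: true}

	fmt.Println(changeFinished(before, stale))
	fmt.Println(changeFinished(before, seen))
}
```

The point of the sketch is that the "before" snapshot only exists if the same command both makes the change and waits, which is why this approach fits something like plan trigger --wait rather than a standalone wait.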

Signed-off-by: Ken Sipe <kensipe@gmail.com>
@kensipe
Member Author

kensipe commented Apr 13, 2020

We currently have a solution when a plan is triggered, or on an update or upgrade... the issue mentioned is the inability to have this information if a plan is triggered and another user (or the same user later in time) wants to wait... at that time we don't have knowledge of the previous instance.

@kensipe
Member Author

kensipe commented Apr 13, 2020

just updated with a new wait on plan status... super cool IMO.

The plan status keeps refreshing in place on the terminal until the plan is complete (if --wait is used). I added an elapsed time because it can look like the screen is locked when it is not. The last update on the status is the last change to the plan by kudo manager.

The plan status without wait works the same. With --wait, it will loop until: 1) the user exits the process with Ctrl+C or 2) the plan completes.

go run cmd/kubectl-kudo/main.go plan status --instance mysql-instance --wait
Plan(s) for "mysql-instance" in namespace "default":
.
└── mysql-instance (Operator-Version: "mysql-0.2.0" Active-Plan: "deploy")
    ├── Plan backup (serial strategy) [NOT ACTIVE]
    │   └── Phase backup (serial strategy) [NOT ACTIVE]
    │       ├── Step pv [NOT ACTIVE]
    │       ├── Step backup [NOT ACTIVE]
    │       └── Step cleanup [NOT ACTIVE]
    ├── Plan deploy (serial strategy) [COMPLETE], last updated 2020-04-13 08:56:40
    │   └── Phase deploy (serial strategy) [COMPLETE]
    │       ├── Step deploy [COMPLETE]
    │       ├── Step init [COMPLETE]
    │       └── Step cleanup [COMPLETE]
    └── Plan restore (serial strategy) [NOT ACTIVE]
        └── Phase restore (serial strategy) [NOT ACTIVE]
            ├── Step restore [NOT ACTIVE]
            └── Step cleanup [NOT ACTIVE]

elapsed time 6.086202815s✔

This is harder to see in a static image... so I created a video to show it off. (this is the feature I always wanted!)

https://drive.google.com/file/d/1xp-Eax1XtmAKp9eEtpaUnqDcFU8U5-Dj/view?usp=sharing

return fmt.Errorf("OperatorVersion %s from instance %s/%s does not exist", instance.Spec.OperatorVersion.Name, ns, options.Instance)
}
// for loop breaks if Wait==false, or when active plan completes (or when user exits process)
for {
Member

A lot of this logic is really complex and there are some great Go constructs for doing this nicely. I'd prefer we do this with channels and select, which were designed for this and can be a lot clearer. Check this out:

https://www.sohamkamani.com/golang/2018-06-17-golang-using-context-cancellation/

Put the logic into a goroutine that takes a channel, and then select over that and a time.After(options.WaitTime * time.Second). The goroutine can use a for loop with time.Sleep (and then breaks), but there's also time.Ticker depending on what you want to do.

@kensipe kensipe merged commit bf5c090 into master Apr 14, 2020
@kensipe kensipe added this to Done in KUDO Global via automation Apr 14, 2020
@kensipe kensipe deleted the ken/instance-wait branch April 14, 2020 02:01
@zen-dog
Contributor

zen-dog commented Apr 14, 2020

The challenge from a user perspective at that point is what if the timeout expired and you want to wait again

I'm not sure how important this use case is. The initial issue #966 is the 80/20 use case: wait for the instance update (or triggered plan) to finish. Doing that with two commands is inconvenient but more importantly racy (as @alenkacz mentioned above). While "the plan status has the same issue in that if you do a plan status, it could be the "status" prior to the plan" statement is correct in itself, a plan status command is a snapshot in time: it can happen before, during or after and has no expectations to reflect a previously started plan. This is clearly not the case when --waiting for an Instance update.

Tech debt introduced by the original #966 PR is still present and I suspect that users will run into raciness issues when using this command after updating instances and triggering plans. Additionally, it is debatable whether the plan wait use case is worth the added complexity.

@kensipe
Member Author

kensipe commented Apr 15, 2020

Now that we have plan status --wait I'm not sure I see the value in plan wait and since it is questionable perhaps we should remove it.

There is a "raciness" from a cli perspective regarding plan status --wait and plan wait which is understandable and should be documented but doesn't diminish the value of the feature IMO. From the CLI perspective, there isn't a way to be aware that a change has been requested but it hasn't been identified by the controller yet.

There is NO tech debt on #966 however... the imagined race condition does not exist with one small exception... if there is an uninstall and a rapid re-install with the same instance name... it is possible that the stale state of the previous instance isn't cleaned up yet.

It is also important to note... that unlike a manager race condition... there is no ill side-effect of a racy condition. There is the potential of a confused user and poor UX. For users that know what they are doing it is still a worthy feature.

Labels
release/highlight This PR is a highlight for the next release
Projects
KUDO Global
  
Done
Development

Successfully merging this pull request may close these issues.

Install --wait
4 participants