Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instance controller refactoring #850

Merged
merged 34 commits into from
Sep 30, 2019
Merged

Instance controller refactoring #850

merged 34 commits into from
Sep 30, 2019

Conversation

alenkacz
Copy link
Contributor

@alenkacz alenkacz commented Sep 20, 2019

What this PR does / why we need it:
This is an implementation of controller based on KEP-18.

Some highlights:

Fixes #422

@@ -232,7 +245,7 @@ func prepareKubeResources(plan *activePlan, meta *executionMetadata, renderer ku
}

for _, phase := range plan.Spec.Phases {
phaseState, _ := getPhaseFromStatus(phase.Name, plan.State)
phaseState, _ := getPhaseFromStatus(phase.Name, plan.PlanStatus)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check error or remove it from the return signature of getPhaseFromStatus

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike the ignored error as well at first glance... however the error for an invalid name and we are getting the name for the phases. It is hard to see that this could error unless the plans change from underneath...

this is a private function that is use in only this use case... we should remove the error return from the function. I would also be inclined to from the prefix get -> phaseFromStatus. all of this is minor.

}
}
return nil
}

func prettyPrint(i interface{}) string {
s, _ := json.MarshalIndent(i, "", " ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could fail for custom marshalers, make sure if this errors to set some reasonable default at least ("{}"?) (so you can preserve your return signature)

}

return newState, nil
}

func executeStep(step v1alpha1.Step, state *v1alpha1.StepStatus, resources []runtime.Object, c client.Client) error {
if isInProgress(state.State) {
state.State = v1alpha1.PhaseStateInProgress
if isInProgress(state.Status) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment: I'm wondering if instead of doing all of this inline setting of status fields everywhere, we could create a set of transformer functions that take a state and the next step, executes it, and returns the state of the PE at that point. Could make everything a lot more fluent and set the stage for various transformations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, that was my original intentioon behing this executePlan refactoring but at that point I did not do it because it's hard to "compute the next step" - you need to figure out if it's update or create (that's easy) and then what to update (so compute the diff) which is harder. I think moving in this direction would make the test really testable though. Are you ok with me pursuing that direction in the next phase? as it's rather more a testability improvement in the execution engine than the Instance controller thing...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with pursuing this in the next phase and agree 👍

@alenkacz alenkacz changed the title WIP: Instance controller refactoring Instance controller refactoring Sep 25, 2019
Copy link
Member

@kensipe kensipe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking to have a number of passes for review.
This pass is looking at plan-execution changes:

Beyond the suggestion provided... there is a label for PlanExecutionAnnotation
https://github.com/kudobuilder/kudo/blob/master/pkg/util/kudo/labels.go#L14
It isn't clear how it is used now... it is used in instance/apply_convention.go
which is an applyConventionsToTemplates. I don't think this is needed any longer (meaning the PEAnnonation)

labelSelector = fmt.Sprintf("instance=%s", options.Instance)
} else {
fmt.Printf("History of plan-executions for instance \"%s\" in namespace \"%s\" to operator-version \"%s\":\n", options.Instance, options.Namespace, args[0])
fmt.Printf("History of plan-executions for instance \"%s\" in namespace \"%s\" for operator-version \"%s\":\n", options.Instance, options.Namespace, args[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fmt.Printf("History of plan-executions for instance \"%s\" in namespace \"%s\" for operator-version \"%s\":\n", options.Instance, options.Namespace, args[0])
fmt.Printf("History of plan executions for instance \"%s\" in namespace \"%s\" for operator-version \"%s\":\n", options.Instance, options.Namespace, args[0])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at plan history cmd more... I'm not sure any of this makes sense. you mentioned in our walk thru that this is slated for the next release... which I completely support. I wonder if we should just remove these commands or stub them out. they will not be functional.

the following:

	planExecutionsGVR := schema.GroupVersionResource{
		Group:    "kudo.dev",
		Version:  "v1alpha1",
		Resource: "planexecutions",
	}

is not functional

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Ken said! ^^

func(a handler.MapObject) []reconcile.Request {
requests := make([]reconcile.Request, 0)
// We want to query and queue up operators Instances
instances := &kudov1alpha1.InstanceList{}
// we are listing all instances here, which could come with some performance penalty
// a possible optimization is to introduce filtering based on operatorversion (or operator)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm less concerned with the optimization concern (which we should do... we should put a todo: here)
I'm more concerned that when the watch condition is triggered for 1 instance we perform this operation on ALL instances. Why do we not work solely with the instance associated with the trigger?

We want to handle edge and level... we need to handle level in a different way IMO... one that isn't driven by events in the system.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok... after further review... I see that we loop thru all instances... but we only operator on the one defined by "a". my concern isn't valid... I still suggest the //TODO: but it isn't a big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nothing new in this PR, this was already in before. I don't like TODOs in code, I am happy to file an issue though :) #860

for i2, p := range v.Phases {
planStatus.Phases[i2].Status = ExecutionPending
for i3 := range p.Steps {
i.Status.PlanStatus[n1].Phases[i2].Steps[i3].Status = ExecutionPending
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is unfortunate that i is instance here... and we have i2 and i3 as indexes. I would encourage expanding i into instance in this case. and use i, j,k as indexes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. it's a nit, but still: using n1, i2, i3 for indexes/keys is confusing. short var names are good until they aren't ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I rename this one I would have to rename all of them in the whole file :/ and I kind of like to use i for it, feels go-ish... I renamed the index to planIndex let me know if that helps

Copy link
Member

@gerred gerred Sep 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alenkacz i've also seen Idx to go (heh heh) shorter in a lot of Go, so you could have:

planIdx, phaseIdx, stepIdx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a rule/situation where we can favor clarity over conciseness, because as it stands it would be really easy to refactor this, not thinking about indexes, and create a runtime out of bounds error. making those index names look like that embeds the semantic right in the access pattern.


// getInstance retrieves the instance by namespaced name
// returns nil, nil when instance is not found (not found is not considered an error)
func (r *Reconciler) getInstance(request ctrl.Request) (instance *kudov1alpha1.Instance, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like this should take a ctx context.Context as an argument. This function is called once from Reconcile and each step creates it's own context (which works of course) but the point of context is to carry the context throughout the execution.

same with getOperatorVersion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additionally, would it be helpful to create our own Context struct embedding conext.Context here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, to counter this, including myself, we should question ourselves every time we think we need a context somewhere because context typically ends up being very viral in your codebase. I don't think it's an antipattern but should be very carefully considered as a whole before we do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, let's talk about this... right now we don't really use the context at all, we always use context.TODO() ... until we start doing something meaningful with it I don't see much advantage of having it as dependency as it's only bloating the number of parameters that method has... Having it as dependency and then always passing in context.TODO() for now that feels a bit meaningless

// after have an active plan (which may be complete)
return instance.Status.ActivePlan.Name == ""
if len(missingRequiredParameters) != 0 {
return nil, &executionError{err: fmt.Errorf("parameters are missing when evaluating template: %s", strings.Join(missingRequiredParameters, ",")), fatal: true, eventName: kudo.String("Missing parameter")}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should consider creating a Kubernetes event here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we actually do! :) the execution error contains name of the event and that is then emitted "automatically" in handleError so the error type already holds the event information

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do. this executionError will bubble up and land in handleError method where an event will be generated.

@gerred
Copy link
Member

gerred commented Sep 26, 2019

👍 to @kensipe's comments, and I need another pass after taking some more time to think about it all. This is a great PR. I have a lot of little things I want to think about in the full context before I suggest anything that actually blocks this PR from landing. In general, with maybe disabling the Plan History command for the moment (not having tried it or delved into @kensipe's comments there), I'm open to landing this and continuing to increment on a lot of those nits and see how it evolves with Tasker.

Copy link
Member

@kensipe kensipe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a number of recommendations / nits... but nothing that should stop this.

nice work

ov = &kudov1alpha1.OperatorVersion{}
err = c.Get(ctx,
err = r.Get(context.TODO(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like it should be passed in

if err != nil {
return nil, err
}
type executionError struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

executionError is confusing to me... many things could be an execution error... I'm not sure when to use it and how it is different from InstanceError. The places it is used it looks like a PlanExecutionError. perhaps it is fine... but I had to hunt for it's use to understand it which means like some challenge in readability.

and... although we collapse planexec into instance... I could still see having a planexecution package which would allow for planexec.NewError etc. or plan... like plan.NewError

Copy link
Member

@kensipe kensipe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after more thought... I really don't like the broken CLI commands for plan. While I dislike adding more to a large PR... we can't merge this and then release... we will need to do something regarding plan commands.

@kensipe kensipe self-requested a review September 27, 2019 02:08
@kensipe
Copy link
Member

kensipe commented Sep 27, 2019

I'm ok with merge with plan cmd fixes... which could be removing them... or commenting them out.. or responding with a "coming soon message"

alenkacz and others added 2 commits September 27, 2019 08:45
@gerred
Copy link
Member

gerred commented Sep 27, 2019

I'm ok with merge with plan cmd fixes... which could be removing them... or commenting them out.. or responding with a "coming soon message"

what if we just don't actually register them with the command handler so we can iterate on it and it's tests, then introduce it?

Copy link
Contributor

@zen-dog zen-dog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! First batch of nits/comments out, more to come

P.S. don't forget to publish the events ;)

// v v
//+------+------+ +-------+--------+
//| Fatal error | | Complete |
//+-------------+ +----------------+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 👏 👏

// EnsurePlanStatusInitialized initializes plan status for all plans this instance supports
// it does not trigger run of any plan
// it either initializes everything for a fresh instance without any status or tries to adjust status after OV was updated
func (i *Instance) EnsurePlanStatusInitialized(ov *OperatorVersion) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about it offline, so it's merely a reminder: we need to unit tests this method. It's not very wild but merging Json manually deserves a few tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as mentioned offline, we might wanna use apimachinery patch module to do a 2/3 way merge automatically.

const PhaseStatePending PhaseState = "PENDING"
// isUpgradePlan returns true if this could be an upgrade plan - this is just an approximation because deploy plan can be used for both
func isUpgradePlan(planName string) bool {
return planName == "deploy" || planName == "upgrade"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we should put these magic strings into a const() at least.

for i2, p := range v.Phases {
planStatus.Phases[i2].Status = ExecutionPending
for i3 := range p.Steps {
i.Status.PlanStatus[n1].Phases[i2].Steps[i3].Status = ExecutionPending
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. it's a nit, but still: using n1, i2, i3 for indexes/keys is confusing. short var names are good until they aren't ;)


// PhaseState captures the state of the rollout.
type PhaseState string
// TODO in the future when we again support manual plan execution, snapshot should be saved only manually executed plans
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on why this should be the case?


// new instance, need to run deploy plan
if i.NoPlanEverExecuted() {
return kudo.String("deploy"), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

magic strings ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's your point here? :DD

okay okay, I'll do better :)


// did the instance change so that we need to run deploy/upgrade/update plan?
instanceSnapshot, err := i.getSnapshotedSpec()
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getSnapshotedSpec returns nil, nil when no annotations or no snapshotAnnotation was found but you only check the err != nil here (which could lead to a npe below)

pkg/apis/kudo/v1alpha1/instance_types.go Show resolved Hide resolved
pkg/apis/kudo/v1alpha1/instance_types.go Outdated Show resolved Hide resolved
pkg/controller/instance/instance_controller.go Outdated Show resolved Hide resolved
Copy link
Contributor

@zen-dog zen-dog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part 2 done. I left a few more nits but nothing that should prevent this PR from landing! 🚢 Great work!

// TODO in the future when we again support manual plan execution, snapshot should be saved only manually executed plans
err := i.SaveSnapshot()
if err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we wrap it into an InstanceError to make it visible on the event bus?

// Automatically generate RBAC rules to allow the Controller to read and write Deployments
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=kudo.dev,resources=instances,verbs=get;list;watch;create;update;patch;delete
func (r *Reconciler) Reconcile(request ctrl.Request) (ctrl.Result, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one good looking Reconcile method! 👏

if err != nil {
log.Printf("InstanceController: Error creating PlanExecution")
return err
fmt.Printf("Pre-error instance %v", instance)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: remove this line - logging will happen in r. handleError

Suggested change
fmt.Printf("Pre-error instance %v", instance)

// after have an active plan (which may be complete)
return instance.Status.ActivePlan.Name == ""
if len(missingRequiredParameters) != 0 {
return nil, &executionError{err: fmt.Errorf("parameters are missing when evaluating template: %s", strings.Join(missingRequiredParameters, ",")), fatal: true, eventName: kudo.String("Missing parameter")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do. this executionError will bubble up and land in handleError method where an event will be generated.

labelSelector = fmt.Sprintf("instance=%s", options.Instance)
} else {
fmt.Printf("History of plan-executions for instance \"%s\" in namespace \"%s\" to operator-version \"%s\":\n", options.Instance, options.Namespace, args[0])
fmt.Printf("History of plan-executions for instance \"%s\" in namespace \"%s\" for operator-version \"%s\":\n", options.Instance, options.Namespace, args[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Ken said! ^^


if plan.Status.State == kudov1alpha1.PhaseStateComplete {
if obj.Status.AggregatedStatus.Status == kudov1alpha1.ExecutionComplete {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt?

Suggested change
if obj.Status.AggregatedStatus.Status == kudov1alpha1.ExecutionComplete {
if isFinished(obj.Status.AggregatedStatus.Status){

Copy link
Member

@kensipe kensipe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@alenkacz alenkacz merged commit 37dd630 into master Sep 30, 2019
@alenkacz alenkacz deleted the av/instance-controller-v2 branch October 30, 2019 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong assumptions and anti-patterns in InstanceController
4 participants