Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify control flow in step generator #1843

Merged
merged 2 commits into from
Sep 5, 2018

Conversation

swgillespie
Copy link
Contributor

This commit introduces a new type, 'controlFlow', which indicates to
callers whether or not it should continue the operation that it was
doing before it called a particular function. The variants of
'controlFlow', 'Bail' and 'Continue', indicate to the caller what it
should do. Components can signal 'Bail' when a diagnostic is logged and
the plan should no longer continue.

@swgillespie
Copy link
Contributor Author

This PR is a small part of #1694 that tries to introduces some coding conventions that, if we like them, can bring to the rest of the engine. In order to realize the full benefit of this we'll probably need to do a lot more plumbing, in particular:

  1. Anywhere that needs to log a diagnostic needs to be passed a diag.Sink, preferably as an argument and not hanging off Plan like it is today
  2. Anywhere that can log a diagnostic needs to return (controlFlow, error) so that callers can determine whether or not to proceed.

After this commit, we can say that the step generator returning a non-nil error is a bug in Pulumi. We can't quite realize the full benefit of this (bubbling all the way to the CLI top-level and presenting an error requesting a bug report) but we can tackle that in subsequent PRs.

Here's a before and after comparison of this PR applied to a check failure:

Before:

screen shot 2018-08-29 at 3 02 05 pm

After:

screen shot 2018-08-29 at 3 01 34 pm

cc @lukehoban

@joeduffy
Copy link
Member

joeduffy commented Aug 29, 2018

Nice cleanup. Since you are tidying things up generally, do you think we will end up in an end state where it simply says

error: pulumi-nodejs:dynamic:Resource ... property state is 42

without the Diagnostics: global: global stuff (which I find just adds noise), and the terminating error: an error occurred while advancing the preview?

@swgillespie
Copy link
Contributor Author

swgillespie commented Aug 29, 2018

@joeduffy Yeah, that's the eventual goal. The hard part there is that we're using the error error: an error occurred while advancing the preview to signal to the CLI layer that we need to exit with a non-zero exit code. Getting rid of that is a little tricky, because we need to communicate to the CLI layer somehow that the deployment failed, but that it failed due to the program having an error and not due to a bug in Pulumi.

There are a few ways to go about this, but I think the one that makes the most sense is to have the owner of the diagnostic sink tell the CLI layer to exit with a non-zero exit code if there have been any errors logged to the sink. This is still a little tricky, because today the engine owns the diagnostic sink, but it conceivably could be moved to the backend layer.

@swgillespie
Copy link
Contributor Author

To be clear, I expect that the above goal will take a few PRs to achieve - there's a nontrivial amount of plumbing that needs to get done before we're in a place where we can get away with using the diagnostic sink to determine our exit status.

@joeduffy
Copy link
Member

I expect that the above goal will take a few PRs to achieve

👍 totally get it. Please just allow me to register enthusiasm for the expected end state 😄 🎉

@@ -36,9 +36,6 @@ type planExecutor struct {
stepExec *stepExecutor // step executor owned by this plan
}

// Utility for convenient logging.
var log = logging.V(4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this doesn't work anymore - logging.V(4) gets the logging level at init, and we don't use the normal glog initialization mechanism (i.e. flags) anymore and set the log level after module constructors have run.

pe.reportError(pe.plan.generateEventURN(event.Event), eventErr)
}

logging.V(4).Infof("planExecutor.Execute(...): canceling context")
cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was an issue before. However, i think it would probably always be good to explain the implications of these sorts of calls as htey are essentially non local flow control as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

if err != nil {
return err
return flow, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is both interesting and hard to reason about. We got an error from a deeper path, but we're also passing along flow control dcisions from a deeper level to a higher one. it's not clear to me that that's necessarily the right thing to do.

@swgillespie
Copy link
Contributor Author

Based on @CyrusNajmabadi's in-person and online feedback I've reworked the programming idioms here a bit.

controlFlow now is an enum with three variants:

  • Bail - Indicates to the caller that the callee issued a diagnostic and does not think that the plan should continue.
  • Continue - Indicates to the caller that the callee thinks the plan should continue. The callee may still have issued diagnostics, but the plan can still proceed despite it.
  • Undefined - Indicates to the caller that the callee doesn't know what to do because it or something it calls transitively has hit an internal-to-Pulumi error (i.e. a bug). It's up to the caller to decide what to do.

A function returning a controlFlow returns two or three things:

  1. First, if the function returns data (i.e. []Step, as is the case with this PR), the first return value is the data.
  2. The recommended control flow, one of Bail, Continue, or Undefined.
  3. An error.

The returned data is not the default value, controlFlow must be Continue and error must be nil. Callers interpret this as "please continue, this operation was successful."

If controlFlow is Bail, the data must be the default value and error must be nil. Callers interpret this as "please do not continue, this operation was not successful but we have gracefully issued appropriate diagnostics."

If controlFlow is Undefined, the data must be the default value and error must not be nil. Callers interpret this as "figure out what to do next, this operation failed because of an internal error."

// error that ultimately bubbles up to the top level.
if eventErr == nil {
eventErr = errors.New("step generation failed")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this definitely seems like a reasonable way to 'snip' off the virality of this change at certain points. However, in general, it would be nice if there was some type/pattern we could use here so you can go back later and move these forward. i.e. have some sort of helper function (which simply makes an error) but now can be documetned and can be something we look for and remove later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like context.TODO() is for contexts? That would make sense here for sure

// used in tandem with an `error` return value. A function that returns a (controlFlow, error) means that
// the called function can produce diagnostics and *gracefully fail*, while also being capable of failing ungracefully.
// It is a *bug in Pulumi* whenever a function of this type returns a non-nil error, and it is expected that
// callers of these functions propegate the returned errors upwards so that higher-level components can print
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

propegate

} else if sg.issueCheckErrors(new, urn, failures) {
return nil, errors.New("One or more resource validation errors occurred; refusing to proceed")
return nil, Bail, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I missed this one earlier. Sorry, I thought this made it into the first commit.

@@ -204,7 +238,7 @@ func (sg *stepGenerator) GenerateSteps(event RegisterResourceEvent) ([]Step, err

// If the resource isn't valid, don't proceed any further.
if invalid {
return nil, errors.New("One or more resource validation errors occurred; refusing to proceed")
return nil, Bail, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected?

The Result type can be used to signal the failure of a computation due
to both internal and non-internal reasons. If a computation failed due
to an internal error, the Result type carries that error with it and
provides it when the 'Error' method on a Result is called. If a
computation failed gracefully, but wished to bail instead of continue a
doomed plan, the 'Error' method provides a value of null.
@swgillespie
Copy link
Contributor Author

The most recent commit introduces a Result type to encapsulate the error and control flow types in the original PR. The contents of this PR came from a couple brainstorms with Pat and Cyrus, so I'd appreciate eyes on the design if either of you have the time today.

The original before/after screenshots in this PR still apply here, so the end result of this PR is still the same.

@swgillespie
Copy link
Contributor Author

One thing that came up in our brainstorming is that it would be useful if Error could be called on a nil receiver, so that callers don't ever need to check res != nil before turning it into an error with res.Error. Cyrus suggested we move Result to an interface so that we're not passing pointers around everywhere, which I did in the most recent commit, but we lose the ability to invoke methods on null Results.

// adapted to use Results. Their use is intended to be temporary until Results
// are plumbed throughout the Pulumi codebase.
func TODO() error {
return errors.New("bailng due to error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the comment you just explained to me about how this will not be in the users face all the time in the interim period?

cancel()
return false, eventErr
return false, result.TODO()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is reasonable API design, but part of me wants to add a string (literal) argument to result.TODO that forces users to write a comment as to why this TODO is necessary. It could be an actual code comment here, todo.

Copy link
Contributor

@CyrusNajmabadi CyrusNajmabadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But would like @pgavlin to look at as well. IN the meantime, i would just start making more dependent PRs taht thread this through more of the stack.

pkg/resource/deploy/plan_executor.go Outdated Show resolved Hide resolved
} else if analyzer == nil {
return nil, errors.Errorf("analyzer '%v' could not be loaded from your $PATH", a)
return nil, result.Errorf("analyzer '%v' could not be loaded from your $PATH", a)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this instead report a diagnostic and then Bail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely should - I tried to avoid changing too many things in this PR so I could keep the size down. I plan on tackling a lot of these in follow-up PRs. Is that OK with you or would you prefer I tackle them here?

// At the highest level, when a function wishes to return only an `error`, the
// `Error` member function can be used to turn a nullable `Result` into an
// `error`.
type Result interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If resultImpl is the only expected implementor of this interface, then this is very non-idiomatic. Instead, we should just use a struct with methods on its pointer type:

type Result struct {
    err error
}

func (r *Result) Error() error {
    return r.err
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CyrusNajmabadi does this sound OK?


// Bail produces a Result that represents a computation that failed to complete
// successfully but is not a bug in Pulumi.
func Bail() Result {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the term "bail", but I don't have a better suggestion offhand. Perhaps Terminate or Done or something similar? I could certainly live with "bail" if nothing better arises.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of anything better than Bail - Terminate and Done are both already used by contexts so I think it would be confusing to overload those terms again in the codebase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abort?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even Stop?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I know that this is bikeshedding, but given that this is sort of an odd pattern I do want it to be as obvious and readable as possible)

@swgillespie
Copy link
Contributor Author

@pgavlin Addressed your feedback - I think this is ready to merge if there aren't any objections. I'm going to start pushing result.Result through more parts of the engine once this PR merges.

@swgillespie
Copy link
Contributor Author

I'm going to go ahead and merge this so I can get started on follow-up work. Let me know if there are any changes I should make.

@swgillespie swgillespie merged commit ca58b81 into master Sep 5, 2018
@pulumi-bot pulumi-bot deleted the swgillespie/error-handling branch September 5, 2018 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants