-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error messages output by the CLI #1011
Conversation
// In general, our resource state is only really unknown if the server | ||
// had an internal error, in which case it will serve one of `codes.Internal`, | ||
// `codes.DataLoss`, or `codes.Unknown` to us. | ||
func interpretRPCError(err error) (resource.Status, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bleeds in over to #219 a little bit and needs more work in order to reach its potential.
The idea here is that providers can send gRPC error codes alongside their error payloads to inform the engine of the nature of their failure. Today, we are sending the error code Unknown
for all errors, so the engine has no idea if we can actually recover from the error we just received.
After this PR lands providers can start sending error codes like e.g. FailedPrecondition
if a resource fails Terraform validation and we won't see the angry "catastrophic error" message for run-of-the-mill failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if there's a better name for this method. It's sort of a bummer that it's not clear from the name this is going to give you information about the resource's status and that the error returned is not an error from this method but rather a translation of the error argument.
Maybe resourceStateAndErrorFromRPCError
Or maybe just resourceStateAndError
? I don't know. Your call.
Re: the after:
Perhaps something like:
|
If we use
I looked at
Sure! I'm not really sure what to do with "failure was catastrophic" - should we just omit it completely? We'll emit it for pretty much every error as it is today... |
pkg/resource/deploy/plan_apply.go
Outdated
return status, errors.New("Preview failed due to step application failure") | ||
} | ||
|
||
return status, errors.New("Deployment failed due to step application failure") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure if we call this Deployment
elsewhere in the engine, for messages that may end up in the user's face. The user facing verb is arguably Update
.
It's not immediately clear we are gaining any value from that error. In my suggested text above, that whole part of the error was replaced with just |
A bit of a shame to have to duplicate, but I can see an argument for being consistent about always printing an error message as the final output when we fail. Perhaps just keep it really simple then:
|
So unfortunately it's not nice today to present sub-errrors in an "indented" way like in Luke's example. This is because the errors get wrapped on the provider side and serialized out via RPC so, by the time we see it in the engine, it's just a big string. We could in theory parse the error message, but I'm wary of hardcoding provider implementation details (error message format) into the engine in such a way. I think that, going forward, we'd get a tremendous benefit from re-working our RPC boundaries a little bit to emit structured errors that the engine can inspect. gRPC lets us throw arbitrary "Details" objects onto error payloads, so we can pretty trivially include an error chain in gRPC error messages. We also have the opportunity to attach arbitrary metadata to error messages (#992). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, let's try to land this morning.
pkg/engine/deploy.go
Outdated
b.WriteString(colors.Reset) | ||
b.WriteString("\n") | ||
// | ||
// [pulumi/pulumi#219] Since all errors are "unrecoverable" today, the below error message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would delete this and think about how to add back something useful as part of #219.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay!
pkg/engine/plan.go
Outdated
@@ -195,7 +195,7 @@ func printPlan(result *planResult) (ResourceChanges, error) { | |||
actions := newPreviewActions(result.Options) | |||
_, _, _, err := result.Walk(actions, true) | |||
if err != nil { | |||
return nil, errors.Wrapf(err, "An error occurred while advancing the preview") | |||
return nil, errors.New("An error occurred while advancing the preview") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, should this be lower case like our other error messages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
pkg/resource/deploy/plan_apply.go
Outdated
// a more specific error message. | ||
if err != nil { | ||
if preview { | ||
return status, errors.New("Preview failed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, lower case (also below one).
// and a message. Based on the error code given to us, we can understand | ||
// the state of our system and if our resource status is truly unknown. | ||
// | ||
// In general, our resource state is only really unknown if the server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the propagated error message differentiate on the status code? I'm just thinking that a bug report from a customer that contains minimal data might be easier to diagnose if it said something like [rpc internal error]
or somesuch. Obviously, not for the errors that we expect to arise from plugins returning errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. No provider actually returns Internal
today, but if it did we could go even further and ask that providers return a stack trace of a panic that produced the internal error. (This is pretty trivial with gRPC, we just don't do it.)
Since everything returns Unknown
today I'm inclined to leave this code as it is and, when we start plumbing gRPC error codes throughout our system, address it then. Is that OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM.
} | ||
|
||
glog.V(8).Infof("rpc error kind `%s` is well-understood and recoverable", rpcStatus.Code()) | ||
return resource.StatusOK, errors.New(rpcStatus.Message()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the original intent of this was related to resource tainting. StatusUnknown
was actually meant to convey that, because a resource provider returned an error, we cannot guarantee the state of the resources. Or so the thinking went. We should figure this out as part of #219, but AFAIK nothing actually uses this today so changing its meaning shouldn't affect anything. Just wanted you to know the history.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, and I figured as much.
By the way, I really liked Luke's example from earlier:
I gather we can't get there just yet because we need to communicate more error information across the RPC boundary. However, please do file a follow up work item to track the idea. This would be a lovely place to get to. |
@joeduffy will do. I think that being a little more disciplined about our RPC endpoints allow a lot of very cool scenarios like indented error messages just like this. I've been thinking a lot about it this week and I don't think it's a ton of work. |
Can't we then just fix this inside the provider? Do the newlines an indentation there? |
We can, but that doesn't feel like the right fix to me. It feels like a layering violation to expect a resource provider to spit out error messages that are nicely-formatted for a console. (I'm happy to do it if we really want this to be nice for M11, though.) In general I'm a big believer in structured logging over stringy logging and to me there are a lot of engineering and hygiene benefits to having provider errors be structured. |
Is anybody opposed to merging this now? I think I've addressed all feedback and I'll log issues for follow-up work items. |
Go for it - big improvement already here - looking forward to even more! |
Thanks! I'll hold off merging until @ellismg 's done to keep the merge conflict pain to a minimum. |
This fixes a couple known issues with the way that we present errors from the Pulumi CLI: 1. Any errors from RPC endpoints were bubbling up as they were to the top-level, which was unfortunate because they contained RPC-specific noise that we don't want to present to the user. This commit unwraps errors from resource providers. 2. The "catastrophic error" message often got printed twice 3. Fatal errors are often printed twice, because our CLI top-level prints out the fatal error that it receives before exiting. A lot of the time this error has already been printed. 4. Errors were prefixed by PU####.
…e error message as the final error
9aa697e
to
13aa063
Compare
thanks for the reviews! |
Partially addresses #606.
This fixes a couple known issues with the way that we present errors
from the Pulumi CLI:
Here's a before and after for a doomed deployment:
before:
after:
The things to note:
rpc error: code = Unknown desc =
is gone, getting stripped away at the provider plugin layerPU2000
is goneDeployment failed due to step application failure
and not the verbose error that we've already reported