
CLI drops (some?) errors on the floor #1762

Closed · joeduffy opened this issue Aug 12, 2018 · 9 comments
Labels: area/cli (UX of using the CLI: args, output, logs), kind/bug (Some behavior is incorrect or out of spec), p0 (Bugs severe enough to interrupt existing work)
Milestone: 0.16

@joeduffy (Member)

I have a resource that is failing its Check, and the CLI simply fails with:

λ pulumi up
Previewing update of stack 'hello-aks'
Previewing changes:

     Type                       Name                    Plan       Info
 +   pulumi:pulumi:Stack        azure-ts-aks-hello-aks  create
 +   └─ pulumi:providers:azure  default                 create
error: an error occurred while advancing the preview

The logs clearly show the failure coming from Check, which never gets displayed:

I0812 09:43:41.744353   11969 rpc.go:68] Marshaling property for RPC[Provider[azure, 0xc42025aff0].Check(urn:pulumi:hello-aks::azure-ts-aks::azure:core/resourceGroup:ResourceGroup::azure-ts-aks-hello-aks).news]: location={East US}
I0812 09:43:41.745081   11969 provider_plugin.go:244] Provider[azure, 0xc42025aff0].Check(urn:pulumi:hello-aks::azure-ts-aks::azure:core/resourceGroup:ResourceGroup::azure-ts-aks-hello-aks) failed: err=name 'azure-ts-aks-hello-aks' plus 8 random chars is longer than maximum length 24
I0812 09:43:41.745091   11969 plan_executor.go:209] PlanExecutor.handleSingleEvent(...): received step event error: name 'azure-ts-aks-hello-aks' plus 8 random chars is longer than maximum length 24
I0812 09:43:41.745118   11969 plan_executor.go:115] PlanExecutor.Execute(...): waiting for incoming events
I0812 09:43:41.745129   11969 plan_executor.go:143] PlanExecutor.Execute(...): context canceled
I0812 09:43:41.745130   11969 step_executor.go:245] StepExecutor worker(7): worker exiting due to cancellation
I0812 09:43:41.745130   11969 plan_executor.go:81] PlanExecutor.Execute(...): cancel goroutine exiting
I0812 09:43:41.745141   11969 step_executor.go:245] StepExecutor worker(1): worker exiting due to cancellation
I0812 09:43:41.745150   11969 plan_executor.go:149] PlanExecutor.Execute(...): exited event loop, waiting for completion
I0812 09:43:41.746223   11969 step_executor.go:245] StepExecutor worker(-1): StepExecutor.waitForCompletion(): waiting for worker threads to exit
I0812 09:43:41.745152   11969 step_executor.go:245] StepExecutor worker(3): worker exiting due to cancellation
I0812 09:43:41.745161   11969 step_executor.go:245] StepExecutor worker(8): worker exiting due to cancellation
I0812 09:43:41.745161   11969 step_executor.go:245] StepExecutor worker(9): worker exiting due to cancellation
I0812 09:43:41.745156   11969 step_executor.go:245] StepExecutor worker(5): worker exiting due to cancellation
I0812 09:43:41.745149   11969 step_executor.go:245] StepExecutor worker(4): worker exiting due to cancellation
I0812 09:43:41.745167   11969 step_executor.go:245] StepExecutor worker(2): worker exiting due to cancellation
I0812 09:43:41.745175   11969 step_executor.go:245] StepExecutor worker(6): worker exiting due to cancellation
I0812 09:43:41.745200   11969 step_executor.go:245] StepExecutor worker(0): worker exiting due to cancellation
I0812 09:43:41.746407   11969 step_executor.go:245] StepExecutor worker(-1): StepExecutor.waitForCompletion(): worker threads all exited
I0812 09:43:41.746413   11969 plan_executor.go:151] PlanExecutor.Execute(...): step executor has completed
I0812 09:43:41.746419   11969 plan_executor.go:161] PlanExecutor.Execute(...): observed that the plan errored
I0812 09:43:41.749407   11969 langruntime_plugin.go:153] langhost[nodejs].Run(pwd=/Users/joeduffy/dev/code/src/github.com/pulumi/examples/azure-ts-aks,program=.,...,dryrun=true) failed: err=transport is closing
 +   └─ pulumi:providers:azure  default                 create
I0812 09:43:41.763428   11969 ignore.go:44] Explicitly ignoring and discarding error: 1 error occurred:

* operation not permitted
I0812 09:43:41.763452   11969 source_eval.go:160] EvalSourceIterator ended with an error: transport is closing
I0812 09:43:41.763471   11969 plan_executor.go:99] PlanExecutor.Execute(...): incoming events goroutine exiting
I0812 09:43:41.764196   11969 sink.go:154] defaultSink::Error(error: an error occurred while advancing the preview

This is on 0.15.0-rc2 bits:

λ pulumi version
v0.15.0-rc2
λ grep '"version":' node_modules/@pulumi/pulumi/package.json
  "version": "0.15.0-rc2"
λ grep '"version":' node_modules/@pulumi/azure/package.json
  "version": "0.15.0-rc2"
@joeduffy joeduffy added kind/bug Some behavior is incorrect or out of spec area/cli UX of using the CLI (args, output, logs) labels Aug 12, 2018
@joeduffy joeduffy added this to the 0.16 milestone Aug 12, 2018
@joeduffy (Member Author)

Bizarrely, I now can't even update or destroy this stack; I get 500s from the service each time I try. I'm not sure whether it's related to the above failure.

Here is the log:

Updating stack 'hello-aks'
I0812 09:55:25.852103   12364 api.go:155] Making Pulumi API call: https://api.pulumi.com/api/stacks/joeduffy/hello-aks/update
I0812 09:55:25.852120   12364 api.go:157] Pulumi API call details (https://api.pulumi.com/api/stacks/joeduffy/hello-aks/update): headers=map[Content-Type:[application/json] User-Agent:[pulumi-cli/1 (v0.15.0-rc2; darwin)] Authorization:[token <SNIP>]]; body={"name":"azure-ts-aks","runtime":"nodejs","main":"","description":"Create an Azure Kubernetes Service (AKS) cluster and deploy a container to it.","options":{"analyzers":[],"color":"raw","dryRun":false,"parallel":10,"showConfig":false,"showReplacementSteps":false,"showNames":false,"summary":false,"debug":false},"config":{"azure-ts-aks:config:clientId":{"string":"clientId","secret":false},"azure-ts-aks:config:clientSecret":{"string":"AA<SNIP>Kg=","secret":true},"azure-ts-aks:config:clientsecret":{"string":"AAA<SNIP>mFiU=","secret":true},"azure-ts-aks:config:sshPublicKey":{"string":"ssh-rsa AAAAB3NzaC1yc2EAA<SNIP>vrCKiS14H74UsSricKFc+CG0S/jZJzX3Y2nu+CZIW0GpqkT9JdlkO32gDXKujxYaF joeduffy@joedu-wallawalla.local","secret":false},"azure:config:environment":{"string":"public","secret":false}},"metadata":{"message":"","environment":{"git.author":"joeduffy","git.author.email":"joe@pulumi.com","git.committer":"joeduffy","git.committer.email":"joe@pulumi.com","git.dirty":"true","git.head":"0b057e1ee855220e8d8a00833407d971de9e4192","github.login":"pulumi","github.repo":"examples"}}}
I0812 09:55:25.937413   12364 api.go:170] Pulumi API call response code (https://api.pulumi.com/api/stacks/joeduffy/hello-aks/update): 500 Internal Server Error
I0812 09:55:25.937645   12364 sink.go:154] defaultSink::Error(error: [500] Internal Server Error)
error: [500] Internal Server Error

/cc @chrsmith

@joeduffy (Member Author)

Got hit by this again just now:

λ pulumi up
Previewing update of stack 'hi-aks'
Previewing changes:

     Type                       Name                 Plan          Info
 *   pulumi:pulumi:Stack        azure-ts-aks-hi-aks  no change
 ~   └─ pulumi:providers:azure  default              update        changes: ~ version
error: an error occurred while advancing the preview

Inspecting the logs reveals that my Azure token has expired:

I0812 10:33:41.195156   58419 provider_plugin.go:197] Provider[azure, 0xc420e705a0].Configure() failed: err=No valid (unexpired) Azure CLI Auth Tokens found. Please run `az login`.
I0812 10:33:41.195175   58419 plan_executor.go:209] PlanExecutor.handleSingleEvent(...): received step event error: No valid (unexpired) Azure CLI Auth Tokens found. Please run `az login`.
I0812 10:33:41.195200   58419 plan_executor.go:115] PlanExecutor.Execute(...): waiting for incoming events
I0812 10:33:41.195207   58419 plan_executor.go:143] PlanExecutor.Execute(...): context canceled
I0812 10:33:41.195212   58419 plan_executor.go:149] PlanExecutor.Execute(...): exited event loop, waiting for completion
I0812 10:33:41.195218   58419 step_executor.go:245] StepExecutor worker(-1): StepExecutor.waitForCompletion(): waiting for worker threads to exit
I0812 10:33:41.195225   58419 step_executor.go:245] StepExecutor worker(5): worker exiting due to cancellation
I0812 10:33:41.195228   58419 step_executor.go:245] StepExecutor worker(8): worker exiting due to cancellation
I0812 10:33:41.195234   58419 plan_executor.go:81] PlanExecutor.Execute(...): cancel goroutine exiting
I0812 10:33:41.195237   58419 step_executor.go:245] StepExecutor worker(7): worker exiting due to cancellation
I0812 10:33:41.195242   58419 step_executor.go:245] StepExecutor worker(9): worker exiting due to cancellation
I0812 10:33:41.195241   58419 step_executor.go:245] StepExecutor worker(1): worker exiting due to cancellation
I0812 10:33:41.195248   58419 step_executor.go:245] StepExecutor worker(4): worker exiting due to cancellation
I0812 10:33:41.195244   58419 step_executor.go:245] StepExecutor worker(0): worker exiting due to cancellation
I0812 10:33:41.195255   58419 step_executor.go:245] StepExecutor worker(2): worker exiting due to cancellation
I0812 10:33:41.195256   58419 step_executor.go:245] StepExecutor worker(3): worker exiting due to cancellation
I0812 10:33:41.195266   58419 step_executor.go:245] StepExecutor worker(6): worker exiting due to cancellation
I0812 10:33:41.195324   58419 step_executor.go:245] StepExecutor worker(-1): StepExecutor.waitForCompletion(): worker threads all exited
I0812 10:33:41.195330   58419 plan_executor.go:151] PlanExecutor.Execute(...): step executor has completed
I0812 10:33:41.195335   58419 plan_executor.go:161] PlanExecutor.Execute(...): observed that the plan errored
I0812 10:33:41.201494   58419 langruntime_plugin.go:153] langhost[nodejs].Run(pwd=/Users/joeduffy/dev/code/src/github.com/pulumi/examples/azure-ts-aks,program=.,...,dryrun=true) failed: err=transport is closing
I0812 10:33:41.201536   58419 ignore.go:44] Explicitly ignoring and discarding error: rpc error: code = Canceled desc = grpc: the client connection is closing
I0812 10:33:41.213520   58419 ignore.go:44] Explicitly ignoring and discarding error: 1 error occurred:

* operation not permitted
I0812 10:33:41.213576   58419 source_eval.go:160] EvalSourceIterator ended with an error: transport is closing
 ~   └─ pulumi:providers:azure  default              update        changes: ~ version
I0812 10:33:41.214373   58419 sink.go:154] defaultSink::Error(error: an error occurred while advancing the preview

@joeduffy (Member Author)

I've hit this in a handful more cases today. This definitely needs a fix before 0.15 is ready to go.

I0812 16:25:22.672234   71528 provider_plugin.go:197] Provider[kubernetes, 0xc420d76be0].Configure() failed: err=Unable to read kubectl config: invalid configuration: no con
I0812 16:25:22.686537   71528 plan_executor.go:209] PlanExecutor.handleSingleEvent(...): received step event error: Unable to read kubectl config: invalid configuration: no configuration has been provided

@lukehoban lukehoban added the p0 Bugs severe enough to interrupt existing work label Aug 13, 2018
@lukehoban (Member)

FWIW, I've also hit this on every error (I think everything so far originates from the resource provider).

lukehoban added a commit to pulumi/pulumi-kubernetes that referenced this issue Aug 13, 2018
This test is failing in Travis, and is not debuggable due to pulumi/pulumi#1762. Disabling for now.

Tracking reenabling this with #130.
@swgillespie (Contributor)

There are potentially two different issues here. The first is that the provider registry's implementation of Check returns an error when Configure or CheckConfig fails, instead of populating the failures field of CheckResponse. This leads us to drop those errors (see also #1712, since it is basically impossible today to reason about who logs what error).
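
For concreteness, here is a minimal sketch of the shape that fix could take, with hypothetical stand-in types (registry, CheckResponse, and CheckFailure here are illustrative, not the engine's actual signatures): a Configure/CheckConfig failure gets surfaced through the failures field, which the display layer already knows how to render against the resource, rather than returned as a bare error the caller may drop.

package sketch

import "fmt"

// Illustrative stand-ins for the engine's types; only the error-routing
// discipline is the point here.
type CheckFailure struct {
	Property string // offending property, if known
	Reason   string // human-readable explanation
}

type CheckResponse struct {
	Inputs   map[string]interface{}
	Failures []CheckFailure
}

type registry struct {
	configureErr error // error remembered from an earlier Configure/CheckConfig
}

// Check surfaces a prior configuration failure as a structured check failure
// in the response, instead of returning a bare error the step generator might
// silently discard.
func (r *registry) Check(news map[string]interface{}) (CheckResponse, error) {
	if r.configureErr != nil {
		return CheckResponse{
			Failures: []CheckFailure{{
				Reason: fmt.Sprintf("provider configuration failed: %v", r.configureErr),
			}},
		}, nil
	}
	return CheckResponse{Inputs: news}, nil
}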

I'm not sure what's going on with Joe's original Azure example. Looks like it's probably a similar issue.

Either way, a band-aid to make this much better until we tackle #1712 in M17 shouldn't be too bad here.

@swgillespie (Contributor)

It seems weird to me that the azure implementation of Check returns a legitimate error in Joe's example rather than a non-error response with the failures field set. That seems to be what's confusing the step generator here.

@lukehoban (Member) commented Aug 13, 2018

I have a feeling this isn't just about Check. I saw this for several failures coming from the provider, which I'm fairly certain were errors from Update or Create operations as well.

I also hit this for both Azure and Kubernetes this afternoon.

@swgillespie (Contributor)

Here's the state of our error reporting:

  1. Failures in Check are (supposed to be) reported to the CLI by the step generator. The step generator may return errors that, today, may or may not have already been logged. As this bug shows, the plan executor doesn't log the errors coming out of the step generator, so they can be dropped.
  2. Failures in step applications are logged by post-step events. This hasn't been changed recently, so I would be surprised if there were issues here.
  3. Errors coming from step execution other than the application of the steps themselves (e.g. pre and post-step callbacks returned errors, we registered the same URN twice) are logged by the step executor.

The de facto contract that the step generator has with the plan executor is that it has already done all error logging that needs to be done before returning an error. We should strive to centralize our error reporting logic in one place, either in the step generator or in the plan executor. In practice this would mean that the step generator always logs every single error that it sees, or that the plan executor logs every error that comes out of step generation.
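
A minimal sketch of the second option, again with hypothetical names (planExecutor, stepGenerator, and the log call are illustrative, not the engine's real API): the plan executor logs every error that comes out of step generation exactly once before propagating it, so nothing can fall on the floor regardless of what the step generator did.

package sketch

import "log"

// Hypothetical stand-ins; only the logging discipline matters here.
type step struct{ op string }

type stepGenerator interface {
	GenerateSteps(ev interface{}) ([]step, error)
}

type stepExecutor interface {
	Execute(steps []step) error
}

type planExecutor struct {
	gen  stepGenerator
	exec stepExecutor
}

// handleSingleEvent centralizes error reporting: any error from step
// generation is logged here, exactly once, before being returned, so the
// step generator no longer carries an implicit "I already logged it"
// contract with its caller.
func (pe *planExecutor) handleSingleEvent(ev interface{}) error {
	steps, err := pe.gen.GenerateSteps(ev)
	if err != nil {
		log.Printf("error: step generation failed: %v", err)
		return err
	}
	return pe.exec.Execute(steps)
}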

@chrsmith (Contributor)

While we do need to focus on the CLI error-reporting issue, the reason the service fails the call is that it looks like we are being sent a duplicate metadata key with the update. We'll probably still want to fail the request with a 400, but we can provide a better error message from the service using the expanded ErrorResponse type.
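
A sketch of what that service-side validation might look like (hypothetical, not the actual service code). The update payload in the first comment carries both azure-ts-aks:config:clientSecret and azure-ts-aks:config:clientsecret, so assuming the collision is a case-insensitive duplicate of that sort, a check that answers with a 400 and a concrete message would replace the opaque 500:

package sketch

import (
	"fmt"
	"net/http"
	"strings"
)

// validateConfigKeys rejects payloads whose keys collide when compared
// case-insensitively, writing a 400 with a descriptive message instead of
// letting the collision surface later as a 500.
func validateConfigKeys(w http.ResponseWriter, cfg map[string]string) bool {
	seen := make(map[string]string) // lower-cased key -> original spelling
	for key := range cfg {
		lower := strings.ToLower(key)
		if prev, ok := seen[lower]; ok {
			msg := fmt.Sprintf("duplicate config key: %q conflicts with %q", key, prev)
			http.Error(w, msg, http.StatusBadRequest)
			return false
		}
		seen[lower] = key
	}
	return true
}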

joeduffy pushed a commit to pulumi/pulumi-kubernetes that referenced this issue Aug 13, 2018
This test is failing in Travis, and is not debuggable due to pulumi/pulumi#1762. Disabling for now.

Tracking reenabling this with #130.
swgillespie added a commit that referenced this issue Aug 15, 2018
Similar to #1762, fixes #1775. The language
host can fail without issuing any diagnostics and it is very unclear
what happens if the engine does not log the error.
swgillespie added a commit that referenced this issue Aug 20, 2018
* Log errors coming from the language host

Similar to #1762, fixes #1775. The language
host can fail without issuing any diagnostics and it is very unclear
what happens if the engine does not log the error.

* CR feedback