
Gracefully handle unreachable k8s cluster #946

Merged: 8 commits into master from lblackstone/provider-no-cluster on Jan 16, 2020

Conversation

@lblackstone (Member) commented on Jan 13, 2020:

Proposed changes

Previously, the provider erroneously expected that the default provider pointed to a functioning Kubernetes cluster. This led to unexpected failures in cases where this wasn't true, such as the user manually setting the kubeconfig value for the stack to an invalid value. This change explicitly checks for a valid configuration, and falls back to a degraded state if this check fails. This still allows invoke logic to run during previews without requiring an active k8s cluster.
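Conceptually, the check works something like the following minimal Go sketch. The `kubeProvider` fields and helper names here are illustrative assumptions, not the provider's actual code: a kubeconfig that fails to parse is a hard error, while an unreachable cluster only flips the provider into a degraded mode.

```go
package provider

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

// Illustrative fields, not the provider's actual layout.
type kubeProvider struct {
	clusterUnreachable bool   // degraded mode: config parsed, but no live cluster
	unreachableReason  string // original connection error, replayed later
}

// configure validates the kubeconfig and probes cluster connectivity.
func (k *kubeProvider) configure(kubeconfig string) error {
	// A kubeconfig that fails to parse is still a hard error.
	apiConfig, err := clientcmd.Load([]byte(kubeconfig))
	if err != nil {
		return fmt.Errorf("invalid kubeconfig: %w", err)
	}
	restConfig, err := clientcmd.NewDefaultClientConfig(
		*apiConfig, &clientcmd.ConfigOverrides{}).ClientConfig()
	if err != nil {
		return fmt.Errorf("invalid kubeconfig: %w", err)
	}

	// Probe connectivity; on failure, degrade instead of aborting so that
	// previews and invokes can still run without a live cluster.
	client, err := discovery.NewDiscoveryClientForConfig(restConfig)
	if err == nil {
		_, err = client.ServerVersion()
	}
	if err != nil {
		k.clusterUnreachable = true
		k.unreachableReason = err.Error()
	}
	return nil
}
```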

Related issues (optional)

Fixes #950

@pgavlin (Member) commented on Jan 13, 2020:

Here's the approach I was starting in on until the holiday break: https://gist.github.com/pgavlin/35c55c320d9904d04aa4674ff3c9a615

I stalled out on the error classification bit. FWIW, I would rather use this sort of dynamic approach than the static approach; I would expect us to fail as we do today if the kubeconfig is invalid.

@lblackstone (Member, Author) replied:

> I would rather use this sort of dynamic approach than the static approach

If I understand correctly, you want:

  1. If the kubeconfig is known (not computed) AND is invalid (fails to unmarshal), then return an error
  2. If the kubeconfig is known AND valid, check connectivity during Configure and return a specific error for an unreachable cluster

Is that right?

@lblackstone (Member, Author) added:

To follow up on that, I think my current approach on Diff isn't quite right. If the cluster is not reachable (e.g., in the computed provider case), the diff should be equivalent to a cluster without the resources created.
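A short sketch of that suggestion, extending the hypothetical `kubeProvider` from the earlier sketch (names and return shape are placeholders): when the cluster can't be reached, Diff behaves as if the resource had never been created.

```go
// Sketch only: with an unreachable cluster there is no live object to
// compare against, so the plan is equivalent to creating from scratch.
func (k *kubeProvider) diff(olds, news map[string]interface{}) (bool, error) {
	if k.clusterUnreachable {
		// Behave as if the resource does not exist on the cluster:
		// every new input is part of the planned change.
		return len(news) > 0, nil
	}
	// ... normal path: dry-run diff against the live cluster object ...
	return false, nil
}
```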

@pgavlin (Member) commented on Jan 13, 2020:

> If the kubeconfig is known (not computed) AND is invalid (fails to unmarshal), then return an error

Yes.

> If the kubeconfig is known AND valid, check connectivity during Configure and return a specific error for an unreachable cluster

Essentially, yes. As long as Read still operates in the case of an unreachable cluster, I think the other operations (Check, Diff, Create, Update, Delete) can return the original error that indicated that the cluster is unreachable. I think Read should return a nil state and error, and warn that the resource is not found because the cluster could not be reached. This allows refresh to operate on unreachable clusters (e.g. clusters that have been deleted).
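A hedged sketch of those semantics, again on the hypothetical `kubeProvider` from the earlier sketch (assuming the standard `fmt` and `log` packages; method names and signatures are illustrative):

```go
// Read: treat an unreachable cluster as "resource not found", with a
// warning, so `pulumi refresh` can clean up state for deleted clusters.
func (k *kubeProvider) read(id string) (map[string]interface{}, error) {
	if k.clusterUnreachable {
		log.Printf("warning: cluster unreachable; treating %q as not found", id)
		return nil, nil // nil state, nil error => resource no longer exists
	}
	// ... normal path: fetch the live object from the API server ...
	return nil, nil
}

// Create (and Check, Diff, Update, Delete alike) replay the original
// connection error rather than attempting to contact the cluster.
func (k *kubeProvider) create(inputs map[string]interface{}) (string, error) {
	if k.clusterUnreachable {
		return "", fmt.Errorf("configured Kubernetes cluster is unreachable: %s",
			k.unreachableReason)
	}
	// ... normal path: create the object and return its ID ...
	return "", nil
}
```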

@lblackstone force-pushed the lblackstone/provider-no-cluster branch 2 times, most recently from 512a8fe to 0667766 on January 14, 2020 21:17
@lblackstone changed the title from "Return error from provider for unreachable k8s cluster" to "Gracefully handle unreachable k8s cluster" on Jan 15, 2020
@lblackstone (Member, Author) commented:

@pgavlin This should be ready for review (RFR) now. I tested these changes locally with the invoke changes, and it seems to be working.

Review threads on pkg/provider/provider.go (resolved; one marked outdated)
```go
{
    Dir:           "step2",
    Additive:      false,
    ExpectFailure: true,
},
```
A reviewer (Member) commented:
Is this test really testing anything? I assume it passed previously as well - just with a different error?

Is it worth having a test that doesn't fail after these changes?

@lblackstone (Member, Author) replied:

Yeah, good point. I think what I really want to test is that a preview succeeds even when a cluster isn't reachable, but I don't think our test framework currently handles that case (preview only, no update).

@lblackstone (Member, Author) followed up:

Justin informed me that this is possible using RunCommand("pulumi", "preview"), so I'll update the test accordingly.
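For reference, a hedged sketch of that kind of preview-only check using the integration test framework's RunCommand helper. The test directory name is a placeholder, and the exact RunCommand signature may differ across pulumi versions:

```go
package tests

import (
	"path/filepath"
	"testing"

	"github.com/pulumi/pulumi/pkg/testing/integration"
)

func TestPreviewUnreachableCluster(t *testing.T) {
	dir, err := filepath.Abs("unreachable-cluster") // placeholder test program
	if err != nil {
		t.Fatal(err)
	}
	opts := &integration.ProgramTestOptions{}

	// Run `pulumi preview` directly; it should succeed even though the
	// configured cluster is unreachable (an update would fail instead).
	if err := integration.RunCommand(t, "pulumi-preview",
		[]string{"pulumi", "preview"}, dir, opts); err != nil {
		t.Fatalf("preview against unreachable cluster failed: %v", err)
	}
}
```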

@lblackstone (Member, Author) followed up:

This turned out to be more difficult than I expected, so I'm opening an issue to update the test framework and will follow up on this later.

Commit message:
Previously, the provider erroneously expected that the default provider pointed to a functioning Kubernetes cluster. This led to unexpected failures in cases where this wasn't true, such as the user manually setting the kubeconfig value for the stack to an invalid value. This change explicitly checks for a valid configuration, and returns with a descriptive error if this check fails.
@lblackstone force-pushed the lblackstone/provider-no-cluster branch from 166be7a to cffa7a6 on January 16, 2020 21:05
@lblackstone lblackstone merged commit 56ef622 into master Jan 16, 2020
@pulumi-bot pulumi-bot deleted the lblackstone/provider-no-cluster branch January 16, 2020 21:41
lblackstone added a commit that referenced this pull request Jan 16, 2020
Reintroduce the reverted changes (#941) from #925 and #934 with a few additional fixes related to the changes in #946.

The major changes include the following:

- Use a runtime invoke to call a common decodeYaml method in the provider rather than using YAML libraries specific to each language.
- Use the namespace parameter of helm.v2.Chart as a default, and set it on known namespace-scoped resources.
lblackstone added a commit that referenced this pull request Jan 21, 2020
…ces (#952)

Reintroduce the reverted changes (#941) from #925 and #934 with a few additional fixes related to the changes in #946.

The major changes include the following:

- Use a runtime invoke to call a common decodeYaml method in the provider rather than using YAML libraries specific to each language.
- Use the namespace parameter of helm.v2.Chart as a default, and set it on known namespace-scoped resources.
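A hedged sketch of the shared decoder idea from these commit messages: the provider exposes a single YAML-splitting routine behind a runtime invoke, so every language SDK calls the same logic instead of bundling its own YAML parser. The helper below is illustrative, not the provider's exact code:

```go
package provider

import (
	"bytes"
	"io"

	yamlutil "k8s.io/apimachinery/pkg/util/yaml"
)

// decodeYaml splits a multi-document YAML string into untyped objects.
// All language SDKs share this logic via one runtime invoke, rather than
// each depending on a language-specific YAML library.
func decodeYaml(text string) ([]map[string]interface{}, error) {
	var objs []map[string]interface{}
	dec := yamlutil.NewYAMLOrJSONDecoder(bytes.NewReader([]byte(text)), 4096)
	for {
		var obj map[string]interface{}
		if err := dec.Decode(&obj); err == io.EOF {
			break
		} else if err != nil {
			return nil, err
		}
		if obj != nil { // skip empty documents between `---` separators
			objs = append(objs, obj)
		}
	}
	return objs, nil
}
```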
Successfully merging this pull request may close these issues:

- refresh and up fail when Kubernetes providers are deleted