
Gracefully handle unreachable k8s cluster #946

Merged: 8 commits into master from lblackstone/provider-no-cluster on Jan 16, 2020

Conversation

@lblackstone (Member) commented on Jan 13, 2020:

Proposed changes

Previously, the provider erroneously expected that the default provider pointed to a functioning Kubernetes cluster. This led to unexpected failures in cases where this wasn't true, such as the user manually setting the kubeconfig value for the stack to an invalid value. This change explicitly checks for a valid configuration, and falls back to a degraded state if this check fails. This still allows invoke logic to run during previews without requiring an active k8s cluster.
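Conceptually, the check works something like the following minimal Go sketch. The `kubeProvider` fields and helper names here are illustrative assumptions, not the provider's actual code: a kubeconfig that fails to parse is a hard error, while an unreachable cluster only flips the provider into a degraded mode.

```go
package provider

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

// Illustrative fields, not the provider's actual layout.
type kubeProvider struct {
	clusterUnreachable bool   // degraded mode: config parsed, but no live cluster
	unreachableReason  string // original connection error, replayed later
}

// configure validates the kubeconfig and probes cluster connectivity.
func (k *kubeProvider) configure(kubeconfig string) error {
	// A kubeconfig that fails to parse is still a hard error.
	apiConfig, err := clientcmd.Load([]byte(kubeconfig))
	if err != nil {
		return fmt.Errorf("invalid kubeconfig: %w", err)
	}
	restConfig, err := clientcmd.NewDefaultClientConfig(
		*apiConfig, &clientcmd.ConfigOverrides{}).ClientConfig()
	if err != nil {
		return fmt.Errorf("invalid kubeconfig: %w", err)
	}

	// Probe connectivity; on failure, degrade instead of aborting so that
	// previews and invokes can still run without a live cluster.
	client, err := discovery.NewDiscoveryClientForConfig(restConfig)
	if err == nil {
		_, err = client.ServerVersion()
	}
	if err != nil {
		k.clusterUnreachable = true
		k.unreachableReason = err.Error()
	}
	return nil
}
```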

Related issues (optional)

Fixes #950

@pgavlin (Member) commented on Jan 13, 2020:

Here's the approach I was starting in on until the holiday break: https://gist.github.com/pgavlin/35c55c320d9904d04aa4674ff3c9a615

I stalled out on the error classification bit. FWIW, I would rather use this sort of dynamic approach than the static approach; I would expect us to fail as we do today if the kubeconfig is invalid.

@lblackstone (Member, Author) replied:

> I would rather use this sort of dynamic approach than the static approach

If I understand correctly, you want:

  1. If the kubeconfig is known (not computed) AND is invalid (fails to unmarshal), then return an error
  2. If the kubeconfig is known AND valid, check connectivity during Configure and return a specific error for an unreachable cluster

Is that right?

@lblackstone (Member, Author) added:

To follow up on that, I think my current approach on Diff isn't quite right. If the cluster is not reachable (e.g., in the computed provider case), the diff should be equivalent to a cluster without the resources created.
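A short sketch of that suggestion, extending the hypothetical `kubeProvider` from the earlier sketch (names and return shape are placeholders): when the cluster can't be reached, Diff behaves as if the resource had never been created.

```go
// Sketch only: with an unreachable cluster there is no live object to
// compare against, so the plan is equivalent to creating from scratch.
func (k *kubeProvider) diff(olds, news map[string]interface{}) (bool, error) {
	if k.clusterUnreachable {
		// Behave as if the resource does not exist on the cluster:
		// every new input is part of the planned change.
		return len(news) > 0, nil
	}
	// ... normal path: dry-run diff against the live cluster object ...
	return false, nil
}
```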

@pgavlin (Member) commented on Jan 13, 2020:

> If the kubeconfig is known (not computed) AND is invalid (fails to unmarshal), then return an error

Yes.

> If the kubeconfig is known AND valid, check connectivity during Configure and return a specific error for an unreachable cluster

Essentially, yes. As long as Read still operates in the case of an unreachable cluster, I think the other operations (Check, Diff, Create, Update, Delete) can return the original error that indicated that the cluster is unreachable. I think Read should return a nil state and error, and warn that the resource is not found because the cluster could not be reached. This allows refresh to operate on unreachable clusters (e.g. clusters that have been deleted).
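A hedged sketch of those semantics, again on the hypothetical `kubeProvider` from the earlier sketch (assuming the standard `fmt` and `log` packages; method names and signatures are illustrative):

```go
// Read: treat an unreachable cluster as "resource not found", with a
// warning, so `pulumi refresh` can clean up state for deleted clusters.
func (k *kubeProvider) read(id string) (map[string]interface{}, error) {
	if k.clusterUnreachable {
		log.Printf("warning: cluster unreachable; treating %q as not found", id)
		return nil, nil // nil state, nil error => resource no longer exists
	}
	// ... normal path: fetch the live object from the API server ...
	return nil, nil
}

// Create (and Check, Diff, Update, Delete alike) replay the original
// connection error rather than attempting to contact the cluster.
func (k *kubeProvider) create(inputs map[string]interface{}) (string, error) {
	if k.clusterUnreachable {
		return "", fmt.Errorf("configured Kubernetes cluster is unreachable: %s",
			k.unreachableReason)
	}
	// ... normal path: create the object and return its ID ...
	return "", nil
}
```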

@lblackstone force-pushed the lblackstone/provider-no-cluster branch 2 times, most recently from 512a8fe to 0667766 on January 14, 2020 21:17
@lblackstone changed the title from "Return error from provider for unreachable k8s cluster" to "Gracefully handle unreachable k8s cluster" on Jan 15, 2020
@lblackstone (Member, Author) commented:

@pgavlin This should be ready for review (RFR) now. I tested these changes locally with the invoke changes, and it seems to be working.

Review threads on pkg/provider/provider.go (resolved; one marked outdated)
```go
{
    Dir:           "step2",
    Additive:      false,
    ExpectFailure: true,
},
```
A reviewer (Member) commented:
Is this test really testing anything? I assume it passed previously as well - just with a different error?

Is it worth having a test that doesn't fail after these changes?

@lblackstone (Member, Author) replied:

Yeah, good point. I think what I really want to test is that a preview succeeds even when a cluster isn't reachable, but I don't think our test framework currently handles that case (preview only, no update).

@lblackstone (Member, Author) followed up:

Justin informed me that this is possible using RunCommand("pulumi", "preview"), so I'll update the test accordingly.
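For reference, a hedged sketch of that kind of preview-only check using the integration test framework's RunCommand helper. The test directory name is a placeholder, and the exact RunCommand signature may differ across pulumi versions:

```go
package tests

import (
	"path/filepath"
	"testing"

	"github.com/pulumi/pulumi/pkg/testing/integration"
)

func TestPreviewUnreachableCluster(t *testing.T) {
	dir, err := filepath.Abs("unreachable-cluster") // placeholder test program
	if err != nil {
		t.Fatal(err)
	}
	opts := &integration.ProgramTestOptions{}

	// Run `pulumi preview` directly; it should succeed even though the
	// configured cluster is unreachable (an update would fail instead).
	if err := integration.RunCommand(t, "pulumi-preview",
		[]string{"pulumi", "preview"}, dir, opts); err != nil {
		t.Fatalf("preview against unreachable cluster failed: %v", err)
	}
}
```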

@lblackstone (Member, Author) followed up:

This turned out to be more difficult than I expected, so I'm opening an issue to update the test framework and will follow up on this later.

Commit message:
Previously, the provider erroneously expected that the default provider pointed to a functioning Kubernetes cluster. This led to unexpected failures in cases where this wasn't true, such as the user manually setting the kubeconfig value for the stack to an invalid value. This change explicitly checks for a valid configuration, and returns with a descriptive error if this check fails.
@lblackstone force-pushed the lblackstone/provider-no-cluster branch from 166be7a to cffa7a6 on January 16, 2020 21:05
@lblackstone lblackstone merged commit 56ef622 into master Jan 16, 2020
@pulumi-bot pulumi-bot deleted the lblackstone/provider-no-cluster branch January 16, 2020 21:41
lblackstone added a commit that referenced this pull request Jan 16, 2020
Reintroduce the reverted changes (#941) from #925 and #934 with a few additional fixes related to the changes in #946.

The major changes include the following:

- Use a runtime invoke to call a common decodeYaml method in the provider rather than using YAML libraries specific to each language.
- Use the namespace parameter of helm.v2.Chart as a default, and set it on known namespace-scoped resources.
lblackstone added a commit that referenced this pull request Jan 21, 2020
…ces (#952)

Reintroduce the reverted changes (#941) from #925 and #934 with a few additional fixes related to the changes in #946.

The major changes include the following:

- Use a runtime invoke to call a common decodeYaml method in the provider rather than using YAML libraries specific to each language.
- Use the namespace parameter of helm.v2.Chart as a default, and set it on known namespace-scoped resources.
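A hedged sketch of the shared decoder idea from these commit messages: the provider exposes a single YAML-splitting routine behind a runtime invoke, so every language SDK calls the same logic instead of bundling its own YAML parser. The helper below is illustrative, not the provider's exact code:

```go
package provider

import (
	"bytes"
	"io"

	yamlutil "k8s.io/apimachinery/pkg/util/yaml"
)

// decodeYaml splits a multi-document YAML string into untyped objects.
// All language SDKs share this logic via one runtime invoke, rather than
// each depending on a language-specific YAML library.
func decodeYaml(text string) ([]map[string]interface{}, error) {
	var objs []map[string]interface{}
	dec := yamlutil.NewYAMLOrJSONDecoder(bytes.NewReader([]byte(text)), 4096)
	for {
		var obj map[string]interface{}
		if err := dec.Decode(&obj); err == io.EOF {
			break
		} else if err != nil {
			return nil, err
		}
		if obj != nil { // skip empty documents between `---` separators
			objs = append(objs, obj)
		}
	}
	return objs, nil
}
```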
Successfully merging this pull request may close these issues:

- refresh and up fail when Kubernetes providers are deleted