Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vsphere creds checking #1186

Merged

Conversation

joelddiaz
Copy link
Contributor

Fix the immediate problem where passing bad credentials for a vSphere installation means that deleting the ClusterDeployment doesn't get deleted (b/c the deprovision fails (b/c of the bad creds)).

Test the creds, and set a condition so that we can avoid ever even creating a ClusterProvision when we detect bad credentials. This means deprovision works b/c the provision was never attempted. Right now we only check vSphere credentials, but more platforms can (and should) be added.

Add test cases to validate this.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 22, 2020
@joelddiaz
Copy link
Contributor Author

/assign @dgoodwin
/cc @abutcher

Copy link
Contributor

@dgoodwin dgoodwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.

Given what's going on don't worry about these changes, I or someone on the team will get them.

if authWorking {
status = corev1.ConditionFalse
reason = platformAuthSuccessReason
message = "Platform credentails passed authentication check"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

credentails/credentials

authCondition := controllerutils.FindClusterDeploymentCondition(cd.Status.Conditions, hivev1.AuthenticationFailureClusterDeploymentCondition)
if authCondition != nil && authCondition.Status == corev1.ConditionTrue {
cdLog.Info("Skipping provision while platform credentials authentication is failing.")
// Periodically retry???
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be really nice to just requeue every 5 min and be done with it, but there's no reason to think that would help. It probably needs to be done right, watch secrets for changes to a platform secret, requeue the CD if we spot one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now return an error when credentials are not valid.

pkg/controller/utils/credentials.go Show resolved Hide resolved
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 24, 2020
@@ -313,6 +313,9 @@ const (

// DeprovisionLaunchErrorCondition is set when a cluster deprovision fails to launch.
DeprovisionLaunchErrorCondition ClusterDeploymentConditionType = "DeprovisionLaunchError"

// AuthenticationFailureCondition is true when platform credentials cannot be used because of authentication failure
AuthenticationFailureClusterDeploymentCondition ClusterDeploymentConditionType = "AuthenticationFailure"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there's any opportunity for better synergy with #1109? I can't see any immediately.

@@ -57,6 +57,9 @@ const (
defaultRequeueTime = 10 * time.Second
maxProvisions = 3

platformAuthFailureResason = "PlatformAuthError"
platformAuthSuccessReason = "PlatformAuthWorking"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest PlatformAuthSuccess instead of Working.

message = "Platform credentails passed authentication check"
} else {
status = corev1.ConditionTrue
reason = platformAuthFailureResason
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resason.

@@ -271,6 +268,10 @@ type ReconcileClusterDeployment struct {
// for the remote cluster's API server
remoteClusterAPIClientBuilder func(cd *hivev1.ClusterDeployment) remoteclient.Builder

// validateCredentialsForClusterDeployment is what this controller will call to validate
// that the platform creds are good (used for testing)
validateCredentialsForClusterDeployment func(client.Client, *hivev1.ClusterDeployment, log.FieldLogger) (bool, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda wish this could be a separate controller, but obviously we don't want to process with launching an install here until it's been run. I know we typically wouldn't see a condition false unless it had previously been true, but would it make sense in this case? Don't launch unless we see an AuthenticationFailure condition set to False? It would be nice to encapsulate the responsibility for setting this across cloud providers into it's own controller.

@dgoodwin
Copy link
Contributor

Also please add this new condition to the array in provision_underway_collector.go so it's reported in the alerts if this happens.

Joel Diaz and others added 2 commits December 1, 2020 10:56
Fix the immediate problem where passing bad credentials for a vSphere installation means that deleting the ClusterDeployment doesn't get deleted (b/c the deprovision fails (b/c of the bad creds)).

Test the creds, and set a condition so that we can avoid ever even creating a ClusterProvision when we detect bad credentials. This means deprovision works b/c the provision was never attempted. Right now we only check vSphere credentials, but more platforms can (and should) be added.

Add test cases to validate this.
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 2, 2020
@abutcher
Copy link
Member

abutcher commented Dec 2, 2020

/test e2e

@dgoodwin
Copy link
Contributor

dgoodwin commented Dec 3, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 3, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, joelddiaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abutcher
Copy link
Member

abutcher commented Dec 3, 2020

/test e2e

1 similar comment
@abutcher
Copy link
Member

abutcher commented Dec 3, 2020

/test e2e

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants