pkg/asset/installconfig/aws: public DNS validation #5189

jhixson74
Member

Verify that no DNS records for the cluster already exist in the public DNS
zone before installing. This helps prevent installing over an existing cluster.
The goal is to avoid leaking public DNS records, and the check should also make
the race condition less likely.

https://issues.redhat.com/browse/CORS-1195
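
To make the idea of the precheck concrete, here is a minimal Go sketch. It is not the code from this PR: `recordSet`, `findClusterRecords`, and `validatePublicRecordsAbsent` are hypothetical names, the zone is an in-memory slice rather than a Route 53 query, and the matching rule (any record at or under the cluster domain counts as a collision) is an assumption. The real logic lives in `pkg/asset/installconfig/aws/validation.go` and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// recordSet is a hypothetical, simplified stand-in for a DNS record set
// listed from the public hosted zone.
type recordSet struct {
	Name string // fully qualified name, e.g. "api.mycluster.example.com."
	Type string // record type, e.g. "A"
}

// normalize lower-cases a DNS name and drops the trailing dot so that
// "API.Example.com." and "api.example.com" compare equal.
func normalize(name string) string {
	return strings.TrimSuffix(strings.ToLower(name), ".")
}

// findClusterRecords returns the records whose names equal the cluster domain
// or sit underneath it. Matching is done on a label boundary.
func findClusterRecords(zone []recordSet, clusterDomain string) []recordSet {
	domain := normalize(clusterDomain)
	var found []recordSet
	for _, r := range zone {
		name := normalize(r.Name)
		if name == domain || strings.HasSuffix(name, "."+domain) {
			found = append(found, r)
		}
	}
	return found
}

// validatePublicRecordsAbsent fails fast when the public zone already holds
// records for the cluster domain (assumed rule, see the lead-in).
func validatePublicRecordsAbsent(zone []recordSet, clusterDomain string) error {
	found := findClusterRecords(zone, clusterDomain)
	if len(found) == 0 {
		return nil
	}
	names := make([]string, 0, len(found))
	for _, r := range found {
		names = append(names, r.Name+" ("+r.Type+")")
	}
	return fmt.Errorf("public zone already contains records for %s: %s",
		clusterDomain, strings.Join(names, ", "))
}

func main() {
	zone := []recordSet{
		{Name: "api.old.example.com.", Type: "A"},
		{Name: "www.example.com.", Type: "CNAME"},
	}
	fmt.Println(validatePublicRecordsAbsent(zone, "old.example.com"))
	fmt.Println(validatePublicRecordsAbsent(zone, "new.example.com"))
}
```

A check like this only narrows the window for collisions. It runs before any record is created, so another install can still claim the same name afterwards.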

@wking
Member

wking commented Sep 12, 2021

Previously, an existing public A record would cause an install failure when Terraform tried to create the public A record and failed on the collision. With this PR's precheck, you'll fail a bit faster. But the overall result will be similar. If you want to avoid the leak entirely, you'll need to reverse ae9cbaf (#1508), and start claiming the private record before you claim the public record. That way, the deletion logic will find the private record and know it can safely remove the public record. It will be a bit racy, although your new precheck will limit the exposure. You'll be vulnerable to:

  1. Install A's precheck looks for the public record, confirms it doesn't exist.
  2. Install B creates the public record.
  3. Install A creates the private record.
  4. Install A tries to create the public record, but fails because B won the race.
  5. Attempting to remove cluster A breaks cluster B by removing its public record (which A thought it owned, because of the private record from step 3).
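
The five steps above can be replayed in a small, deterministic Go simulation. This is a toy model under stated assumptions, not installer code: the `zones` type, its methods, and `runRace` are hypothetical, the public zone is a map that remembers which install created each record (real DNS records carry no such owner, which is why the destroy logic has to infer it), and `destroy` models the proposed "private record implies ownership of the public record" rule.

```go
package main

import "fmt"

// zones is a toy model of the shared public zone and the per-install
// private records.
type zones struct {
	public  map[string]string // record name -> install that created it
	private map[string]bool   // "<install>/<record name>" -> present
}

func newZones() *zones {
	return &zones{public: map[string]string{}, private: map[string]bool{}}
}

// publicAbsent is the precheck: true if no public record has this name.
func (z *zones) publicAbsent(name string) bool {
	_, exists := z.public[name]
	return !exists
}

// createPublic fails on collision instead of overwriting.
func (z *zones) createPublic(install, name string) error {
	if owner, exists := z.public[name]; exists {
		return fmt.Errorf("public record %q already exists (created by %s)", name, owner)
	}
	z.public[name] = install
	return nil
}

func (z *zones) createPrivate(install, name string) {
	z.private[install+"/"+name] = true
}

// destroy assumes that an install holding the private record also owns the
// public record of the same name, and removes both.
func (z *zones) destroy(install, name string) {
	key := install + "/" + name
	if z.private[key] {
		delete(z.private, key)
		delete(z.public, name) // may actually belong to another install
	}
}

// runRace replays the interleaving and reports whether B's public record
// survives, plus the error A hit when creating its public record.
func runRace() (bSurvives bool, errA error) {
	const name = "api.demo.example.com"
	z := newZones()
	if !z.publicAbsent(name) { // step 1: A's precheck passes
		return false, fmt.Errorf("precheck: %q already exists", name)
	}
	if err := z.createPublic("B", name); err != nil { // step 2: B wins the race
		return false, err
	}
	z.createPrivate("A", name)       // step 3: A claims the private record
	errA = z.createPublic("A", name) // step 4: A collides with B
	z.destroy("A", name)             // step 5: A's cleanup removes B's record
	_, bSurvives = z.public[name]
	return bSurvives, errA
}

func main() {
	survives, err := runRace()
	fmt.Println("install A error:", err)
	fmt.Println("B's public record survives A's destroy:", survives)
}
```

In this model, install A fails at step 4 as expected, yet its cleanup still deletes the record that B created, which is the breakage described in step 5.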

It's possible you could move the precheck into a Terraform data request or some such, to reduce the racy gap between step 1 and step 4. But even with your Go-side precheck, the gap may be small enough that we can say "probably won't happen too often, and we can manually restore B's public A record if it does" and decide that this risk is less annoying than the current A record leak. I personally don't have much of an opinion on the annoyance level of leaking public records on deletion vs. removing public records that really belong to one cluster when deleting a separate cluster. Parallel installs using the same name seem like something that will probably never happen for production clusters.

@jhixson74 jhixson74 force-pushed the master_aws_public_dns_record_leak branch from b66afef to 31a8f2c Compare October 19, 2021 21:28
@jhixson74
Member Author

/test e2e-aws

Contributor

@staebler staebler left a comment

I am good with this change, on its merits. This aligns with the small window of vulnerability that we have with BYO hosted zone. And both those vulnerabilities seem acceptably rare and shoot-yourself-in-the-foot enough to me.

@jhixson74 jhixson74 force-pushed the master_aws_public_dns_record_leak branch 2 times, most recently from 33c911f to 8f8ccfb Compare October 28, 2021 00:11
@jhixson74
Copy link
Member Author

/test e2e-aws

@jhixson74 jhixson74 force-pushed the master_aws_public_dns_record_leak branch from 8f8ccfb to e2a8d07 Compare December 14, 2021 23:20
@jhixson74 jhixson74 force-pushed the master_aws_public_dns_record_leak branch from e2a8d07 to b88b077 Compare December 14, 2021 23:25
Contributor

@staebler staebler left a comment

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 15, 2021
@openshift-ci
Contributor

openshift-ci bot commented Dec 15, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: staebler


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 15, 2021
@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.


@openshift-ci
Contributor

openshift-ci bot commented Dec 15, 2021

@jhixson74: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-openstack-provider-network | b66afef | link | | /test e2e-openstack-provider-network |
| ci/prow/okd-e2e-aws-upgrade | b88b077 | link | false | /test okd-e2e-aws-upgrade |
| ci/prow/e2e-aws-single-node | b88b077 | link | false | /test e2e-aws-single-node |

@openshift-merge-robot openshift-merge-robot merged commit de6fa77 into openshift:master Dec 15, 2021