
CORS-2062: Customer configured DNS for cloud platforms AWS, Azure and GCP #1468

Closed
wants to merge 2 commits

Conversation

@sadasu (Contributor) commented Aug 31, 2023

Enhancement proposal for CORS-1874.

Two enhancement proposals preceding this work:
#1276
#1400

openshift-ci bot (Contributor) commented Aug 31, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from sadasu. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@barbacbd left a comment

https://github.com/openshift/enhancements/pull/1400/files#r1279672504 - Is the installer required to clean up the public DNS records, or, since we cannot clean up private records, should it just leave both?

barbacbd added a commit to barbacbd/api that referenced this pull request Sep 28, 2023
The work relates to the enhancement openshift/enhancements#1468. The AWS, Azure, and GCP platform status structs are updated to include custom DNS options. The internal and external load balancer IP addresses, as well as the types of DNS records, make up the base of the data.
@2uasimojo (Member) commented Sep 29, 2023

Please link CORS-1874

barbacbd added a commit to barbacbd/api that referenced this pull request Sep 29, 2023
even after cluster installation completes.

If the user successfully configures their external DNS service with api,
api-int and *.apps services, then they could optionally delete the in-cluster
CoreDNS pod.
Contributor

Delete implies that this CoreDNS pod is unmanaged; how will the pod be evolved over time, e.g. updated as upgrades happen?

Contributor Author

The CoreDNS pod would be managed by the MCO. Currently there is no way of knowing whether the LB addresses have changed since the Installer first created them, so the LB DNS addresses within the CoreDNS pod are also not expected to change.
A new field is being added to the platformSpec for AWS, Azure and GCP which indicates whether the customDNS solution has been enabled. The CoreDNS pod would be created by the MCO only when the feature is Enabled and the ConfigMap containing the LB config is present. When either of these conditions becomes False, the CoreDNS pod could be deleted. These are manual steps.
We are not recommending that the customer delete the CoreDNS pod, but pointing out that if the customer's DNS solution is configured correctly, then the cluster could function without the self-hosted CoreDNS.
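
A minimal sketch of that gate, with hypothetical names (illustration only, not the actual MCO rendering code):

```go
// renderCoreDNS reports whether the in-cluster CoreDNS static pod should be
// rendered: only when the custom DNS feature is enabled in the platform spec
// and the installer-generated LB ConfigMap is present. Hypothetical helper.
func renderCoreDNS(customDNSEnabled, lbConfigMapPresent bool) bool {
	return customDNSEnabled && lbConfigMapPresent
}
```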

Contributor

Why not have the MCO remove the CoreDNS pod if the conditions it needs to be configured change? E.g. if they decide to disable the feature, the MCO could recognise that desire and remove the CoreDNS pod, right?

I think what you're saying here makes sense, it wasn't clear to me what's lifecycling the pod, so we should make sure the context of MCO lifecycling the pod is clear and then maybe clarify the workflow for the user to disable or remove the CoreDNS as well? WDYT?

Contributor Author

Yes, I think we should make sure the user-provisioned DNS is functioning before we remove the CoreDNS pod. So, if this capability is disabled day-2, the responsibility to check for a functioning DNS alternative would fall on which component? Or do we assume that the customer knows what they are doing and not have any checks?
As you can tell, I have some unanswered questions in this area. I think if we decide to allow that in the future, we should be able to add something to the Spec to control that behavior.

Contributor

If the customer disables the feature, and we remove the pod without any checks, what will break? And how hard would it be for them to then recover the cluster?

Contributor Author

> If the customer disables the feature, and we remove the pod without any checks, what will break? And how hard would it be for them to then recover the cluster?

IMO, disabling the feature would mean that the customer would want to start using the cloud default DNS and discontinue using their external DNS and the in-cluster DNS. When the feature is disabled, the customer could remove/delete entries from their external DNS. MCO could delete the CoreDNS pod. How do we configure the cloud default DNS then? We could:

  1. Ask the customer to manually configure the cloud DNS using LB values gathered from the Infra CR or the cloud CLI (for AWS, e.g., by upserting the api/api-int records through the Route 53 API, as sketched below).
  2. The Installer is currently responsible for configuring the cloud DNS for API and API-Int. If the cloud DNS has to be configured with these values day-2, the MCO (or another appropriate component) has to take on this task.

My current understanding is that disabling this feature day-2 would be an exception. Providing just the manual option seems sufficient at this time. @JoelSpeed, @zaneb
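
As a rough illustration of option 1 on AWS, here is a hedged sketch assuming the customer scripts it with the aws-sdk-go-v2 Route 53 client; the zone ID, record name and LB DNS name are placeholders to be filled in from the Infra CR / lbConfigforDNS values:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/route53"
	"github.com/aws/aws-sdk-go-v2/service/route53/types"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := route53.NewFromConfig(cfg)

	// Placeholder values: in practice these come from the customer's hosted
	// zone and the LB DNS names recorded by the installer.
	zoneID := "Z0000000000000000000"
	apiRecord := "api.mycluster.example.com"
	lbDNSName := "abc-123.elb.us-east-1.amazonaws.com"

	// Upsert a CNAME for the API record pointing at the load balancer.
	_, err = client.ChangeResourceRecordSets(ctx, &route53.ChangeResourceRecordSetsInput{
		HostedZoneId: aws.String(zoneID),
		ChangeBatch: &types.ChangeBatch{
			Changes: []types.Change{{
				Action: types.ChangeActionUpsert,
				ResourceRecordSet: &types.ResourceRecordSet{
					Name: aws.String(apiRecord),
					Type: types.RRTypeCname,
					TTL:  aws.Int64(60),
					ResourceRecords: []types.ResourceRecord{
						{Value: aws.String(lbDNSName)},
					},
				},
			}},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```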

Comment on lines +144 to +147
4. After the Installer uses the cloud-specific terraform providers to create
the LBs for API and API-Int, it will add the LB DNS Names of these LBs to a [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/). This ConfigMap is
only created when the custom DNS feature is enabled. This ConfigMap gets
appended to the Ignition file created by the Installer. Let us call this
the `lbConfigforDNS` ConfigMap.
Contributor

Why a configmap and not on the status of the infrastructure object? Doesn't the installer already populate the status of the infrastructure object?

Contributor Author

We had explored that option in #1276 but ran into some implementation issues within the Installer.

The Installer generates all its manifests and adds them to the bootstrap ignition, which is written to an S3 bucket (in the case of AWS) before terraform is started to create and configure cloud infrastructure. So, the Infrastructure manifest has already been written to the bootstrap ignition before the LBs are created via terraform. We found it easier to create a new configmap, append it to the bootstrap ignition and re-write it to the S3 bucket than to update the Infrastructure CR that has already been written to the bootstrap ignition file.

Secondly, we don't expect the customer to interact with the configmap at all, nor do we expect the LB information within it to change (no operator is monitoring the LB values today). It is meant to be a simple mechanism to pass the LB DNS information from the Installer to the MCO.

Fwiw, we haven't completely given up on finding a way to update a manifest already written to the bootstrap ignition. Also, another operator (say MCO) could potentially read this configmap and update the Infrastructure CR.
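
As a rough illustration of that last idea (purely hypothetical; no component has committed to doing this), an operator could read the ConfigMap and reflect its values into the Infrastructure status along these lines:

```go
package dnssync

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"

	configclient "github.com/openshift/client-go/config/clientset/versioned"
)

// syncLBConfigToInfra sketches how an in-cluster component could read the
// installer-generated ConfigMap and reflect its values into the Infrastructure
// CR. The ConfigMap name/namespace come from the example in this enhancement;
// the target status fields are placeholders, since the API shape is still
// being decided in openshift/api#1606.
func syncLBConfigToInfra(ctx context.Context, cfg *rest.Config) error {
	kube, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	cm, err := kube.CoreV1().ConfigMaps("openshift-aws-infra").Get(ctx, "LBConfigforDNS", metav1.GetOptions{})
	if err != nil {
		return err
	}

	oc, err := configclient.NewForConfig(cfg)
	if err != nil {
		return err
	}
	infra, err := oc.ConfigV1().Infrastructures().Get(ctx, "cluster", metav1.GetOptions{})
	if err != nil {
		return err
	}

	// Hypothetical: the real implementation would copy these values into
	// (not-yet-defined) Infrastructure status fields and then call
	// oc.ConfigV1().Infrastructures().UpdateStatus(ctx, infra, metav1.UpdateOptions{}).
	fmt.Printf("ConfigMap %s/%s: internal=%q external=%q (Infrastructure %q)\n",
		cm.Namespace, cm.Name,
		cm.Data["internal-api-lb-dns-name"], cm.Data["external-api-lb-dns-name"],
		infra.Name)
	return nil
}
```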

Contributor

From an API perspective, I would much rather see the installer update the status of the infra object (I appreciate the challenges you've outlined) than use a configmap. ConfigMaps have no validation and aren't real APIs. Having something in-cluster change that value based on the configmap value creates a confused-deputy-style problem.

It's not a blocker per se, but it would help me sleep at night if we could update the infrastructure status directly, or some other in-cluster API, rather than a configmap.

Contributor Author

The current Installer architecture makes it very hard to update the bootstrap ignition once it is generated. The bootstrap ignition is generated before we know the LB IPs, and regenerating it is not possible either (an Installer limitation; regeneration would also lose any user edits made to the bootstrap ignition in the meantime). We did recognize that updating the Infrastructure CR is the best option, but we are going with our second-best option because appending to the bootstrap ignition, rather than updating a manifest that was already written to it, is currently our only viable path.

With the updates happening to the Installer code to remove the dependency on terraform, we hope to influence the design to make things like this easier to accomplish within the Installer.

Contributor Author

@patrickdillon is opposed to updating the Infrastructure CR within the Installer due to the amount of surgery needed on the bootstrap ignition file. I am not sure the Installer changes needed to remove terraform make things better; even if they do, that won't be available until a later release.
If the Infrastructure CR can be updated with the data in the ConfigMap by a component other than the Installer, I am open to that. I already explored the MCO as an option early on and that doesn't work either. Do any other options seem viable?

Member

Why wrap it in a ConfigMap and add it to the manifests directory at all then? It could be any old JSON file.

@sadasu (Contributor Author) commented Nov 14, 2023

> Why wrap it in a ConfigMap and add it to the manifests directory at all then? It could be any old JSON file.

@zaneb we wanted to treat it like an asset generated by the Installer which it is.

@sadasu (Contributor Author) commented Nov 14, 2023

Going back to @JoelSpeed 's original question. I believe everyone is caught up on the reasons for moving away from the Infra CR although we started there originally :-)

Yes, the Installer populates the Status of the Infra CR while generating the Infra manifest.

Contributor Author

Another option would be for the MCO to read the configMap and update the Infrastructure CR with these values. This was initially thought not to be possible because the MCO did not own the Infrastructure resource, but recent discussions seem promising.

Member

> we wanted to treat it like an asset generated by the Installer

Oh, like you wanted the user to be able to edit it as a manifest? That makes sense if that is actually a requirement. Is it though? IIUC there's no additional detail that users can add beyond what they already provide in the install-config, so really they can only mess it up.

If that's not a requirement, there are heaps of Assets that get added to the bootstrap ignition without being manifests as such - most of the ones in this list.

Comment on lines 202 to 210
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: LBConfigforDNS
  namespace: openshift-aws-infra
data:
  internal-api-lb-dns-name: "abc-123"
  external-api-lb-dns-name: "xyz-456"
```
Contributor

I'm not understanding why this middle man is needed here, can you expand?

successful cluster install, to configure their own DNS solution.

```go
type AWSPlatformStatus struct {
	// ... (field details are being reviewed in openshift/api#1606)
}
```
Contributor

Is there an API PR for this? We can do the in-depth API review there, general structure looks ok here I think

Contributor Author

This one openshift/api#1606. @barbacbd is working on it.

Comment on lines 294 to 259
3. Add a field within the `PlatformSpec` for AWS, Azure and GCP to indicate if
custom DNS is enabled. `PlatformSpec` is within the `Spec` field of the
Infrastructure CR. Here is the update for platform AWS.
Contributor

Is this spec or status? Can it be changed after a cluster has been bootstrapped?

Contributor Author

Good point. I added it to the Spec field because this is configuration that is provided by the user. But, as you point out, this cannot be changed after the cluster has been bootstrapped. If the Status field is a better place for it, please let me know.

Contributor

In general, if it cannot be changed on day 2, it should be in status. If it can be changed day 2, then we tend to have a spec field that is reflected into the status once the controllers that observe the configuration have had a chance to observe and update themselves based on the new input.

I think for this case, status only is sufficient
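
For illustration only (the actual shape is being settled in openshift/api#1606), a status-only field along the lines discussed here might look like the following; all names are placeholders, not the final API:

```go
// DNSType is a hypothetical discriminator for which DNS solution serves the
// cluster's core records (api, api-int, *.apps).
type DNSType string

const (
	// PlatformDefaultDNS: the cloud platform's default DNS service is used.
	PlatformDefaultDNS DNSType = "PlatformDefault"
	// ClusterHostedDNS: the customer-configured / in-cluster DNS is used.
	ClusterHostedDNS DNSType = "ClusterHosted"
)

// AWSPlatformStatus sketches the addition discussed above: a status-only field
// recording the DNS solution chosen at install time, not changeable on day 2.
type AWSPlatformStatus struct {
	// ...existing fields elided...

	// DNSType indicates which DNS solution serves api, api-int and *.apps.
	DNSType DNSType `json:"dnsType,omitempty"`
}
```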

@sadasu sadasu force-pushed the custom-dns branch 2 times, most recently from 0a637ed to 3da89b0 Compare October 27, 2023 14:30
@sadasu sadasu changed the title Customer configured DNS for cloud platforms AWS, Azure and GCP CORS-1874: Customer configured DNS for cloud platforms AWS, Azure and GCP Oct 27, 2023
openshift-ci-robot commented Oct 27, 2023

@sadasu: This pull request references CORS-1874 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.15." or "openshift-4.15.", but it targets "openshift-4.14" instead.

In response to this:

Enhancement proposal for CORS-1874.

Two enhancement proposals preceding this work:
#1276
#1400

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 27, 2023
@sadasu (Contributor Author) commented Oct 27, 2023

/jira refresh

openshift-ci-robot commented Oct 27, 2023

@sadasu: This pull request references CORS-1874 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.15." or "openshift-4.15.", but it targets "openshift-4.14" instead.

In response to this:

/jira refresh


@sadasu sadasu changed the title CORS-1874: Customer configured DNS for cloud platforms AWS, Azure and GCP CORS-2062: Customer configured DNS for cloud platforms AWS, Azure and GCP Oct 27, 2023
openshift-ci-robot commented Oct 27, 2023

@sadasu: This pull request references CORS-2062 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

Enhancement proposal for CORS-1874.

Two enhancement proposals preceding this work:
#1276
#1400


@sadasu (Contributor Author) commented Oct 27, 2023

/jira refresh

openshift-ci-robot commented Oct 27, 2023

@sadasu: This pull request references CORS-2062 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

/jira refresh


@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 12, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 19, 2024
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

openshift-ci bot commented Jan 27, 2024

@openshift-bot: Closed this PR.


@sadasu (Contributor Author) commented Feb 1, 2024

/reopen

@openshift-ci openshift-ci bot reopened this Feb 1, 2024
openshift-ci-robot commented Feb 1, 2024

@sadasu: This pull request references CORS-2062 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

Enhancement proposal for CORS-1874.

Two enhancement proposals preceding this work:
#1276
#1400


openshift-ci bot commented Feb 1, 2024

@sadasu: Reopened this PR.


openshift-ci bot commented Feb 1, 2024

@sadasu: all tests passed!


@sadasu (Contributor Author) commented Feb 2, 2024

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 2, 2024
@dhellmann (Contributor)

#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 13, 2024
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 20, 2024
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Mar 28, 2024
openshift-ci bot commented Mar 28, 2024

@openshift-bot: Closed this PR.


@dhellmann (Contributor)

(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, CORS-1874, has status "In Progress". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.

5 similar comments

Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.