
CORS-2062: Allow Customers to BYO DNS #1400

Closed · wants to merge 1 commit

Conversation

sadasu
Contributor

@sadasu sadasu commented May 16, 2023

Enhancement proposal for https://issues.redhat.com/browse/CORS-1874

API changes: openshift/api#1397

@sadasu sadasu changed the title Allow Customers to BYO DNS CORS-1874: Allow Customers to BYO DNS May 16, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 16, 2023
@openshift-ci-robot

openshift-ci-robot commented May 16, 2023

@sadasu: This pull request references CORS-1874 which is a valid jira issue.

In response to this:

Enhancement proposal for https://issues.redhat.com/browse/CORS-1874

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from jcantrill and tremes May 16, 2023 14:54
@sadasu sadasu changed the title CORS-1874: Allow Customers to BYO DNS CORS-2062: Allow Customers to BYO DNS May 16, 2023
@openshift-ci-robot

openshift-ci-robot commented May 16, 2023

@sadasu: This pull request references CORS-2062 which is a valid jira issue.

In response to this:

Enhancement proposal for https://issues.redhat.com/browse/CORS-1874

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sadasu sadasu force-pushed the byo-dns branch 3 times, most recently from ef442d3 to d4b7709 Compare May 18, 2023 14:22
@openshift-ci
Contributor

openshift-ci bot commented Jun 26, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from sadasu. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines 152 to 157
8. When the LoadBalancer Service successfully creates the Ingress LB in the
underlying cloud, it updates the `Hostname` (in AWS) or `IP` (Azure and GCP)
fields within the [LoadBalancerIngress struct](https://github.com/kubernetes/api/blob/cdff1d4efea5d7ddc52c4085f82748c5f3e5cc8e/core/v1/types.go#L4554). These fields represent the hostname
or the IP address of the Ingress LB.
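For context, the fields referenced in this excerpt live on the Service's status and can be read with the upstream corev1 types; a minimal Go sketch (the hostname and IP values below are placeholders):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// ingressEndpoints collects whatever the cloud provider published on the
// LoadBalancer Service: hostnames (AWS) and/or IP addresses (Azure, GCP).
func ingressEndpoints(svc *corev1.Service) (hostnames, ips []string) {
	for _, ing := range svc.Status.LoadBalancer.Ingress {
		if ing.Hostname != "" {
			hostnames = append(hostnames, ing.Hostname)
		}
		if ing.IP != "" {
			ips = append(ips, ing.IP)
		}
	}
	return hostnames, ips
}

func main() {
	svc := &corev1.Service{}
	svc.Status.LoadBalancer.Ingress = []corev1.LoadBalancerIngress{
		{Hostname: "abc123.elb.us-east-1.amazonaws.com"}, // AWS-style entry
		{IP: "203.0.113.10"},                             // Azure/GCP-style entry
	}
	hostnames, ips := ingressEndpoints(svc)
	fmt.Println(hostnames, ips)
}
```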
Contributor

Perhaps this should be combined with bullet point 10?

Comment on lines 162 to 164
10. runtimecfg monitors changes to the `IP` associated with the Ingress LB. When
the `Hostname` is set, it needs to be resolved to its IP addresses before they
can be added to the Corefile associated with the CoreDNS pod.
Contributor

This still seems like a pain point. We have talked about this in the past, but to rehash: it seems that coredns will only allow us to create A records with IP addresses.

Is it realistic (how painful) for runtimecfg to resolve the LB DNS name to the ip addresses? Do we know whether those underlying IP addresses can change?

Contributor Author

There are 2 ways we can solve this:

  1. runtimecfg watches for changes to the IP address that Ingress LB Hostname resolves to. When there is a change, it updates the CoreDNS Corefile with the new IP
  2. Add CNAME records in CoreDNS for our DNS entries.

I am not pursuing option 2 at this time because for Azure & GCP, the IP address of the LB is available, not its Hostname. By choosing option 1, we can have a common implementation for all 3 platforms.
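To make option 1 concrete, here is a minimal sketch of such a watch loop, assuming a hypothetical poll interval and hostname rather than runtimecfg's actual implementation:

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strings"
	"time"
)

// resolveLB returns the current set of IPs behind the LB hostname, normalised
// so two consecutive lookups can be compared for changes.
func resolveLB(hostname string) (string, error) {
	ips, err := net.LookupHost(hostname)
	if err != nil {
		return "", err
	}
	sort.Strings(ips)
	return strings.Join(ips, ","), nil
}

func main() {
	const lbHostname = "abc123.elb.us-east-1.amazonaws.com" // placeholder
	var last string
	for range time.Tick(30 * time.Second) { // placeholder interval
		cur, err := resolveLB(lbHostname)
		if err != nil {
			fmt.Println("lookup failed:", err)
			continue
		}
		if cur != last {
			last = cur
			// Here the new IPs would be written into the CoreDNS Corefile /
			// zone data and CoreDNS reloaded.
			fmt.Println("Ingress LB IPs changed:", cur)
		}
	}
}
```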

Member

@2uasimojo 2uasimojo left a comment

Hive 👀 here.

IIUC hive will not be impacted or need to add support for this as long as managed DNS is not used for such clusters because the main touchpoint is the install-config. Is that right?


<snip>
```
Setting `userConfiguredDNS` to `Enabled` and not specifying `apiLBName` and


You mean userConfiguredDNSLB.userDNS rather than userConfiguredDNS here, right?

Contributor Author

@sadasu sadasu Jul 10, 2023

Correct. Fixed it now

@sadasu sadasu force-pushed the byo-dns branch 3 times, most recently from 6140434 to dd34233 Compare July 17, 2023 15:34
@jcantrill
Contributor

/unassign

@sadasu sadasu force-pushed the byo-dns branch 2 times, most recently from 75215f8 to 876d631 Compare July 18, 2023 17:39
@openshift-ci-robot

openshift-ci-robot commented Jul 18, 2023

@sadasu: This pull request references CORS-2062 which is a valid jira issue.

In response to this:

Enhancement proposal for https://issues.redhat.com/browse/CORS-1874

API changes: openshift/api#1397

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Comment on lines +72 to +76
We will not use the [Kubernetes External DNS provider](https://github.com/openshift/external-dns)
and/or the [OpenShift External DNS Operator](https://github.com/openshift/external-dns-operator)
to program dns entries across varied DNS providers. The External DNS Operator
relies on being able to program the customer's DNS solution. The customer DNS
solutions in consideration here do not allow OpenShift to configure them.
Contributor

The customer DNS solutions in consideration here do not allow OpenShift to configure them.

Is this the primary issue with the External DNS Operator, and is this really the case? I have not heard that users don't want OpenShift to program DNS, but rather that our IPI installs are tied to a single platform while users' DNS is frequently hosted on a separate platform.

I don't think we want to intentionally exclude the external dns operator. I think we want to include it as a part of the solution as much as possible (but that might not be very much).

Contributor Author

I don't think this solution needs to worry about the set of DNS providers the External DNS operator can configure. If the DNS solution the customer wants to use is supported by the External DNS operator and they are OK with OpenShift configuring it, they can still use it.
This solution is useful for customers that want to use a DNS solution that is currently not supported by the External DNS operator and/or who don't want OpenShift to configure their DNS. @makentenza can add details about the customers we are targeting with this solution.

Contributor

If we're able to use the External DNS Operator it could provide some substantial advantages. Consider if we moved API load balancers to be provisioned/managed by services created on bootstrap and ran the external dns operator on the bootstrap node. This would allow installer/cluster-provisioned LBs (customers don't need to BYO) and DNS records for the subset of DNS providers supported by the external DNS operator. We would need to determine some form of support for the DNS providers not included in the external DNS operator, but that could be either of the approaches we have already discussed (byo LBs or post-configured DNS).

Abhinav did a write up of SLB managed API load balancers, which might provide some more context about the approach or challenges. I wonder if using the external DNS operator would mitigate some (or most) of the issues he mentions.

Contributor

I thought the original requirements for the feature said that OpenShift is not given credentials or access to manage the external DNS server at all, based on infrastructure security requirements in the settings where this feature is needed.

Contributor

Looking at the feature in Jira, it says:

While cloud-based DNS services provide convenient hostname management, there's a number of regulatory and operational constraints customers face prohibiting the use of those DNS hosting services on public cloud providers.

So, yeah, maybe that does knock out the option of using the external DNS operator.

Contributor Author

Please refer to https://issues.redhat.com/browse/OCPSTRAT-261 which states:

Out of Scope
The Installer to manage (create/update/delete) the required DNS records to deploy OpenShift on these cloud providers or integrate with those DNS services in any way

Member

Solving the use case where there is no programmatic access to DNS is not optional. So we cannot require the external DNS operator.

I guess that for users who are using the external DNS operator anyway, it would be nice if they didn't also have to manually create their own load balancer. But that's not an explicit goal of this work.

Potentially we could have an API design that would leave room for this if it ever actually comes up - e.g. if we had a DNSProvider field where the options are (some variant of) UserManaged vs. OpenShiftManaged then we could add an ExternalDNSOperatorManaged option if needed in the future. (Please don't use any of these names 🙂)

Member

I agree we can't require programmable DNS.

Suggested change
We will not use the [Kubernetes External DNS provider](https://github.com/openshift/external-dns)
and/or the [OpenShift External DNS Operator](https://github.com/openshift/external-dns-operator)
to program dns entries across varied DNS providers. The External DNS Operator
relies on being able to program the customer's DNS solution. The customer DNS
solutions in consideration here do not allow OpenShift to configure them.
We will not use the [Kubernetes External DNS provider](https://github.com/openshift/external-dns)
and/or the [OpenShift External DNS Operator](https://github.com/openshift/external-dns-operator)
to program dns entries across varied DNS providers. Because programmable DNS is precluded in some
environments due to technical limitations or organizational policy we cannot require programmable DNS.

@sadasu sadasu force-pushed the byo-dns branch 2 times, most recently from da4d1f4 to 4a08899 Compare July 29, 2023 19:08
API-Int. The customer would also be responsible for removing entries from
their custom DNS.

#### Variation [optional]
Contributor Author

@sadasu sadasu Jul 29, 2023

@patrickdillon @zaneb, @Miciah and @bennerv , I have updated this section to include information about spec.loadBalancerIP and the annotations that have been added by AWS, Azure and GCP that we could potentially use towards the design goal of pre-configuring the DNS entry for *.apps before cluster installation.

Could you please take a look and provide comments? If these annotations seem like a better option than running the CoreDNS pod, I will update the main "Proposal" to reflect that.

@cblecker
Member

@sdodson Thanks for the ping.

After my first pass at this, I have some concerns from a managed services perspective:

  • The current proposed solution requires a number of cloud resources to be pre-provisioned and effectively unmanaged by the cluster itself. It's not just DNS, but it's the cloud load balancers and networking bits that I'm quite concerned about.
  • Use of this feature also appears to inhibit the use of other features. If the cloud load balancers cannot be updated by the cluster, then I expect things like using ControlPlaneMachineSets to replace control plane nodes wouldn't function, in addition to the already noted complexities with Ingress (ref: CORS-2062: Allow Customers to BYO DNS #1400 (comment))
  • The solution as described here would not be compatible with the ARO architecture. While I understand the initial proposal is just for AWS, it seems we are designing ourselves into a corner here where we know this solution won't work. In particular, if the load balancers are user created that interferes with the use of managed resource groups.

It seems a preferable solution here would be one where the cluster can provision and update cloud load balancer objects, and either run a local DNS pod, configure host files, or use some other method in order to allow the cluster to still internally resolve its required API DNS entries while not relying on external DNS functioning. This would allow for cloud DNS configuration as a post-install operation, and is more in line with what ARO does today. Is this an option we can consider more fully? Happy to discuss further.

@patrickdillon
Contributor

Thanks for the feedback @cblecker!

in order to allow the cluster to still internally resolve its required API DNS entries while not relying on external DNS functioning.

One of the obstacles here is that the installer, in addition to the cluster, needs to resolve DNS for the API. @cblecker in managed services does the installer have the same DNS view as the cluster? I don't know much about Private Link or how it's used in managed services, but perhaps it solves this problem?

@sadasu
Contributor Author

sadasu commented Jul 31, 2023

@cblecker thanks for taking a look.

@sdodson Thanks for the ping.

After my first pass at this, I have some concerns from a managed services perspective:

  • The current proposed solution requires a number of cloud resources to be pre-provisioned and effectively unmanaged by the cluster itself. It's not just DNS, but it's the cloud load balancers and networking bits that I'm quite concerned about.

While considering this aspect of the design, I was told that there is a move towards BYO infrastructure so this is not really a concern.

  • Use of this feature also appears to inhibit the use of other features. If the cloud load balancers cannot be updated by the cluster, then I expect things like using ControlPlaneMachineSets to replace control plane nodes wouldn't function, in addition to the already noted complexities with Ingress (ref: CORS-2062: Allow Customers to BYO DNS #1400 (comment))

I don't see why the cluster cannot update the LBs. The users are required to create them but the installer would configure them and the same LB reference would be passed into Machine objects.

  • The solution as described here would not be compatible with the ARO architecture. While I understand the initial proposal is just for AWS, it seems we are designing ourselves into a corner here where we know this solution won't work. In particular, if the load balancers are user created that interferes with the use of managed resource groups.

We don't want to design a solution that won't work for ARO. The platform specific configuration bits will be added one platform at a time but we want a cohesive design across AWS, Azure and GCP. So, could you please point out which specific section of the design is not OK for ARO or is it the design as a whole?

It seems a preferable solution here would be one where the cluster can provision and update cloud load balancer objects, and either run a local DNS pod, configure host files, or use some other method in order to allow the cluster to still internally resolve its required API DNS entries while not relying on external DNS functioning. This would allow for cloud DNS configuration as a post-install operation, and is more in line with what ARO does today. Is this an option we can consider more fully? Happy to discuss further.

The previous iteration of the design for Custom DNS in #1276 has many aspects of what you are asking for here. That design met with an untimely death and so is not in its final polished state. Feel free to take a look.

@sadasu
Contributor Author

sadasu commented Jul 31, 2023

@cblecker could you also PTAL at the "Variations" section? It has some new information about annotations and spec.loadBalancerIP which is currently being used by ARO.

Contributor

@Miciah Miciah left a comment

I like the idea of having the user pre-allocate an IP address rather than a whole load balancer. The platform-specific APIs are annoying, but we can work with them. I'm curious about the particulars of how the user would create or reserve the custom IP address on Azure or GCP.

Comment on lines 236 to 237
the install-config. The Installer will make it available to the ingress operator
via the Ingress Controller CRD as in the above case.
Contributor

Rather than modifying the IngressController API, it might make sense to add a field to the cluster ingress config, or maybe it makes the most sense to have both: IngressController.spec.endpointPublishingStrategy.loadBalancer.providerParameters.aws.networkLoadBalancer.eipAllocations and Ingress.spec.loadBalancer.platform.aws.networkLoadBalancer.eipAllocations, using the new ingress config field to set a default value for the "default" IngressController's new field.

(My first thought was to add an IngressIPs field to AWSPlatformStatus à la BareMetalPlatformStatus, OpenStackPlatformStatus, etc., but it makes sense to put the new config under the ingress config as it really is specifically ingress-related config and the ingress config already has the spec.aws union, which could be extended with NLB-specific config.)

utilize this feature for their Ingress LB pre-creation. Again, the concern here
is the deprecation comment in the documentation.

2. [`service.beta.kubernetes.io/azure-load-balancer-ipv4` and `service.beta.kubernetes.io/azure-load-balancer-ipv6`](https://learn.microsoft.com/en-us/azure/aks/load-balancer-standard#specify-the-load-balancer-ip-address)
Contributor

I wouldn't rely on the AKS documentation as there could be AKS-only annotations, but the annotation is defined in cloud-provider-azure (which you pointed me to: https://github.com/kubernetes-sigs/cloud-provider-azure/blob/0cfeb043809bdf796233f47dbffb8608d3a716e4/pkg/consts/consts.go#L205), so it should be available for use by OpenShift.

Comment on lines 251 to 268
2. [`service.beta.kubernetes.io/azure-load-balancer-ipv4` and `service.beta.kubernetes.io/azure-load-balancer-ipv6`](https://learn.microsoft.com/en-us/azure/aks/load-balancer-standard#specify-the-load-balancer-ip-address)

Setting this annotation means that the Azure LB would not have to be created
by the user before cluster installation. The custom DNS solution can be
configured before cluster installation such that *.apps resolves to these
IPv4 (and IPv6 for dual stack) addresses.
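For illustration only, a Service carrying this annotation could be built like the sketch below (expressed with the upstream Go types; the namespace, name and address are placeholders, not the proposed implementation):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// The annotation asks cloud-provider-azure to use a pre-reserved address
	// instead of allocating a new one; 203.0.113.20 is a placeholder.
	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "router-default",    // placeholder name
			Namespace: "openshift-ingress", // placeholder namespace
			Annotations: map[string]string{
				"service.beta.kubernetes.io/azure-load-balancer-ipv4": "203.0.113.20",
			},
		},
		Spec: corev1.ServiceSpec{
			Type: corev1.ServiceTypeLoadBalancer,
		},
	}
	fmt.Println(svc.Annotations)
}
```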
Contributor

The Azure documentation says, "All custom IP addresses must be created and managed by the user." It doesn't say exactly how to create a custom IP address though. Is it as easy as creating an EIP on AWS?

Comment on lines +264 to +280
2. [`kubernetes.io/ingress.global-static-ip-name`](https://cloud.google.com/kubernetes-engine/docs/concepts/ingress-xlb#static_ip_addresses_for_https_load_balancers)

This annotation was added to [`Ingress-gce`](https://github.com/kubernetes/ingress-gce).
It seems similar to the 2nd option discussed for AWS and Azure where a
pre-determined static IP is set aside for the external Ingress LB.
Contributor

I don't see the kubernetes.io/ingress.global-static-ip-name annotation defined in https://github.com/openshift/kubernetes/blob/release-4.14/staging/src/k8s.io/legacy-cloud-providers/gce/gce_annotations.go, so I expect it isn't available for use by OpenShift.

Comment on lines +270 to +285
Given the different approaches taken by different platforms, it seems that if
we want a semantically consistent way to handle the pre-configuration of
DNS for *.apps, we should use the platform-specific annotation that allows a
pre-defined static IP to be specified for the Ingress LB.
Contributor

Agreed, internally we would use the platform-specific mechanism, and for the customer, it makes sense to me to define a consistent API that could be used on each of these platforms that has some mechanism, for example on the IngressController API:

  • spec.endpointPublishingStrategy.loadBalancer.providerParameters.aws.networkLoadBalancer.eipAllocations, which would get translated into the service.beta.kubernetes.io/aws-load-balancer-eip-allocations annotation.
  • spec.endpointPublishingStrategy.loadBalancer.providerParameters.azure.loadBalancerIPAddress, which would get translated into the service.beta.kubernetes.io/azure-load-balancer-ipv4 or service.beta.kubernetes.io/azure-load-balancer-ipv6 annotation.
  • spec.endpointPublishingStrategy.loadBalancer.providerParameters.gcp.loadBalancerIPAddress, which would get translated into the Service spec.loadBalancerIP field.

Or possibly similar on the cluster ingress config API.
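To make the shape of that suggestion concrete, here is a hypothetical Go sketch of the provider-specific fields; the type and field names are illustrative only and may not match what openshift/api ultimately defines:

```go
// Package sketch illustrates the suggested providerParameters additions.
package sketch

// AWSNetworkLoadBalancerParameters would carry pre-allocated EIPs, translated
// into the service.beta.kubernetes.io/aws-load-balancer-eip-allocations annotation.
type AWSNetworkLoadBalancerParameters struct {
	EIPAllocations []string `json:"eipAllocations,omitempty"`
}

// AzureLoadBalancerParameters would carry a pre-reserved address, translated
// into service.beta.kubernetes.io/azure-load-balancer-ipv4 (or -ipv6).
type AzureLoadBalancerParameters struct {
	LoadBalancerIPAddress string `json:"loadBalancerIPAddress,omitempty"`
}

// GCPLoadBalancerParameters would carry a pre-reserved address, translated
// into the LoadBalancer Service's spec.loadBalancerIP field.
type GCPLoadBalancerParameters struct {
	LoadBalancerIPAddress string `json:"loadBalancerIPAddress,omitempty"`
}
```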

Contributor Author

Yes, that is the idea and putting it all together like this makes it very clear. Adding it back to the document.

@zaneb
Member

zaneb commented Aug 2, 2023

One thing that I think might help this discussion would be to have a diagram that shows the dependencies that the various pieces of infrastructure have on one another. e.g. (disclaimer: probably inaccurate):

(Two Mermaid flowcharts were rendered here; see the source blocks below.)

I think this would help in discussing the different options where we need to break the graph edges for the user to provide input, given that the DNS often resides in the middle of the graph.
If the graphs are different for different clouds, that would be useful too so that we can see if we are coming up with a solution that won't carry over well to e.g. Azure.

Source for the charts above (see https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-diagrams#creating-mermaid-diagrams):

```mermaid
flowchart BT;
  awslb[AWS LoadBalancer]
  awseip[AWS ElasticIP]
  dns[AWS Route53]
  awssubnet[AWS Subnet]
  awslb-->awseip
  awslb-->awssubnet
  awseip-->awssubnet
  dns-->awslb
  API-->dns
  style dns stroke:#f55,stroke-width:3px
```
```mermaid
flowchart BT;
  awslb[AWS LoadBalancer]
  awseip[AWS ElasticIP]
  dns[AWS Route53]
  awssubnet[AWS Subnet]
  awslb-->awseip
  awslb-->awssubnet
  awseip-->awssubnet
  dns-->awslb
  awslb-->Service
  Service-->Ingress
  style dns stroke:#f55,stroke-width:3px
```


## Implementation History

Major milestones in the life cycle of a proposal should be tracked in `Implementation
Member

What is the current status of this work?

I'm getting the impression that the API part is implemented but the Ingress part is not, and that's the part we are continuing to discuss?

const (
// DNSUserAndClusterProvided indicates that the user configures DNS for API and API-Int.
// The cluster handles some of its in-cluster DNS needs without user intervention.
DNSUserAndClusterProvided DNSProviderType = "UserAndClusterProvided"
Member

I'd be inclined to make the distinction between the needs of the cluster and the needs of an external user in the key name rather than in the value. e.g. publicDNSProvider: UserManaged/publicDNSProvider: OpenShiftDefaultClusterManaged instead of provider: UserAndClusterProvided

configured with the LB DNS Names of the LBs associated with each of these
services. Hence the LBs have to be created before they can be used to configure
the DNS solution. It is also important to note that these LB DNS Names cannot
be known before the LBs are created.
Member

IIUC in the case of Ingress the reason here is that the LB name is based on the UID of the Service CR that manages it, at least in cloud-provider-aws.
What's the constraint for the API?

When it is indicated to the OpenShift installer via an input parameter that the
custom DNS solution is in use, OpenShift does not configure the cloud DNS
solution [Route53 for AWS](https://aws.amazon.com/route53/) for any of its
services. In the case of API and API-Int services, the customer is responsible
Member

One of the Goals states:

"Continue using the default cloud LB service for API, API-Int and Ingress."

I read that as meaning that the installer would continue to create the load balancers, so maybe there's a mismatch here. Is the goal out of date or am I reading wrong?

LoadBalancer service to create and configure the LBs for the *.apps service.
OpenShift will start an in-cluster DNS pod to provide DNS resolution for *.apps
during cluster installation. After the cluster is installed, the user will have
to update their custom DNS solution with an entry for the *.apps service.
Member

Further enquiry suggests this is not assisted-specific.

Ingress is typically one of the very last things to come up, because it runs on worker nodes. I'm not sure that anything depends on it, except for the check in the installer to see if the console is available - that definitely does.

Is it possible that the answer for Ingress could be as simple as disabling that check when installing with user-managed DNS?

enabled via install-config. Since API and API-Int resolution are prerequisites
for successful cluster installation, the Installer performs additional
validations to make sure they are configured correctly. The installer provides
a warning but does not stop installation when API or API-Int resolution fails.
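A rough sketch of the kind of warn-only resolution check described in this excerpt, using hypothetical record names; this is not the installer's actual validation code:

```go
package main

import (
	"fmt"
	"net"
)

// warnIfUnresolvable warns (but does not fail) when a user-managed DNS record
// does not resolve yet.
func warnIfUnresolvable(names ...string) {
	for _, name := range names {
		if _, err := net.LookupHost(name); err != nil {
			fmt.Printf("WARNING: %s does not resolve: %v\n", name, err)
		}
	}
}

func main() {
	// Placeholder cluster domain.
	warnIfUnresolvable("api.mycluster.example.com", "api-int.mycluster.example.com")
}
```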
Member

I guess it would be more accurate to say that we can validate anything we want from the bootstrap but since the installer doesn't have a way of checking it, it's only useful for debugging after the fact.

fields within the [LoadBalancerIngress struct](https://github.com/kubernetes/api/blob/cdff1d4efea5d7ddc52c4085f82748c5f3e5cc8e/core/v1/types.go#L4554). These fields represent the hostname
or the IP address of the Ingress LB.

9. OpenShift needs to provide an in-cluster DNS solution for *.apps resolution
Member

It has always been my impression that this is already a built-in part of k8s: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/ - this allows pods within the cluster to obtain the IP address (on the cluster network) for particular services.

IIUC the implementation for it in OpenShift is managed by the DNS Operator. You can see the CoreDNS pods running in the openshift-dns namespace.

The wildcard *.apps external DNS is only needed for stuff outside the cluster to get traffic to the correct Ingress point.

Have we tried installing a cluster without adding a static CoreDNS pod? What happens?

Member

That's true of services, but not for Ingresses. I did a quick check to see what happens if we delete the *.apps record in an AWS cluster. Both the authentication and console operators entered Available=False state.

authentication                             4.14.0-0.okd-scos-2023-08-07-185858   False       False         False      14s     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.sdodson.redacted.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.sdodson.redacted.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
console                                    4.14.0-0.okd-scos-2023-08-07-185858   False       False         False      17s     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.sdodson.redacted.openshift.com): Get "https://console-openshift-console.apps.sdodson.redacted.openshift.com": dial tcp: lookup console-openshift-console.apps.sdodson.redacted.openshift.com on 172.30.0.10:53: no such host

Contributor Author

+1 I should have added this earlier. This is some test code in the Installer: openshift/installer#7403. e2e tests show the auth, console and Ingress operators degrading.

@openshift-ci
Contributor

openshift-ci bot commented Aug 10, 2023

@sadasu: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 8, 2023
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 15, 2023
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Sep 23, 2023
@openshift-ci
Contributor

openshift-ci bot commented Sep 23, 2023

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dhellmann
Contributor

(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, CORS-1874, has status "In Progress". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.


@sadasu
Contributor Author

sadasu commented Oct 13, 2023

Closed in favor of #1468

@dhellmann dhellmann removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 13, 2023
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.