
Allow multiple A records for the same domain from different external-dns instances. #1441

Open
fore5fire opened this issue Feb 24, 2020 · 45 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@fore5fire

fore5fire commented Feb 24, 2020

What would you like to be added:
Allowing multiple A records to be set from different sources that are managed by different external-dns instances.

For some background, I'm trying to create A records from Services of type LoadBalancer in different clusters. Currently (v0.6.0), the only way to specify multiple IP addresses for a single DNS name seems to be to include them all as targets in a single DNSEndpoint, which is not an option when the services run in different clusters managed by different instances of external-dns. When I attempt it anyway, only one of the records is created, and the logs then report `level=info msg="All records are already up to date"` across all instances.

Why is this needed:
Allowing multiple A records per domain enables failover across clusters with minimal configuration, and is especially useful where inter-region load balancers aren't available, such as with DigitalOcean or on-prem. The IP addresses of the load balancers or ingresses are only available within their respective clusters, and cannot all be consolidated into a single DNSEndpoint resource without custom automation that would require resource-inspection permissions across clusters.
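
For reference, the single-DNSEndpoint workaround mentioned above would look roughly like this (a sketch; the name, domain and IPs are placeholders):

```yaml
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: multi-target-example        # placeholder name
spec:
  endpoints:
    - dnsName: app.example.com      # placeholder domain
      recordType: A
      recordTTL: 300
      targets:                      # every cluster's IP must be listed here,
        - 203.0.113.10              # which requires something that can see
        - 198.51.100.20             # all clusters' load balancer addresses
```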

@fore5fire fore5fire added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 24, 2020
@caviliar

caviliar commented Mar 6, 2020

I have the same use case, but currently cannot find a workaround to get this behaviour to work.

When running external-dns in debug mode, the second external-dns instance logs that it cannot add the A record because it is not the owner.

If the TXT record's value were keyed by each instance's txt-owner-id, then each external-dns instance could maintain the records associated with its own cluster, so that multiple external-dns instances from multiple clusters could all maintain records for x.foo.bar.
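
A rough sketch of how the zone might look with such per-owner TXT records (hypothetical, not current external-dns behaviour; names and IPs are placeholders):

```
x.foo.bar.            300 IN A   203.0.113.10   ; created by cluster-a's external-dns
x.foo.bar.            300 IN A   198.51.100.20  ; created by cluster-b's external-dns
cluster-a-x.foo.bar.  300 IN TXT "heritage=external-dns,external-dns/owner=cluster-a"
cluster-b-x.foo.bar.  300 IN TXT "heritage=external-dns,external-dns/owner=cluster-b"
```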

@jonpulsifer

I would also like to play with this at work, and at home.

This issue has been around for a while. @njuettner @Raffo, do you have any thoughts?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 14, 2020
@seanmalloy
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 14, 2020
@savitha-ping

Is this being considered? It is very useful to have in multi-region deployments (i.e. multiple Kubernetes clusters) when using service discovery protocols such as JGroups DNS_PING. Would appreciate adding this feature! :)

@kespinola

kespinola commented Nov 12, 2020

I am also interested in this feature, to help safely roll traffic over to a new cluster.

I would like external-dns to run in both the current and the incoming cluster and attach to their respective gateways (we are using Istio). The incoming and current clusters should contribute to the same record in Route 53 but assign independent weights. For example, start by responding to 10% of DNS queries with the IP of the incoming Istio ingress load balancer and the rest with the current load balancer's IP. This requires the DNS provider to support weighted entries, which Route 53 does, but I'm not sure about others.

I am happy to help make this contribution if it is desired by the maintainers. I'd also love to hear other methods for achieving the same incremental rollout of services from one cluster to another.

@rsaffi

rsaffi commented Dec 4, 2020

One more use-case for this right here! ✋
And same reason: safe rollout of the service on different clusters (different providers, even), so multiple external-dns instances.

@mamiu

mamiu commented Dec 10, 2020

We're looking for the same (multiple A records per domain) but for another purpose. We'd like to use round-robin DNS in our cluster. Ports 80 and 443 of every node are exposed to the public and can be used as an entry point for all routes (handled by ingress-nginx, as described here).

Or is this already a feature that can be enabled via configuration?

@povils

povils commented Feb 9, 2021

Same here: we have many short-lived clusters, and external-dns seems like it would be a good fit to automate these DNS records for our API gateways.

@CRASH-Tech
Contributor

Also need this feature

@buzzsurfr

I'm looking at contributing to this issue (since I'm also interested in it), but wanted to discuss the experience before working on it.

  • Would this need a feature flag or argument to enable? (e.g. wouldn't be a default)
  • Would we want some sort of permission model for determining which external-dns instance can share a service/record?

I'm specifically focusing on the aws-sd provider (but will also test with the txt registry). When I created a new service in cluster-0 called nginx, the Cloud Map service used this for the Description field:

heritage=external-dns,external-dns/owner=cluster-0,external-dns/resource=service/default/nginx

Would it make sense to have an annotation on the k8s Service resource specifying it as a "shared" resource? That way, if both k8s clusters agree that the resource is shared, they will use a different behavior model and not overwrite each other's records (Cloud Map Service Instances).
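
As a sketch of what I mean (the shared annotation below is hypothetical and does not exist today; the hostname is a placeholder):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    external-dns.alpha.kubernetes.io/hostname: nginx.example.com   # placeholder domain
    external-dns.alpha.kubernetes.io/shared: "true"                # hypothetical flag marking the record as shared
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
    - port: 80
```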

For each record (Service Instance), I was thinking of adding Custom attributes for heritage, owner, and resource, and each external-dns instance would be responsible for updating the records if it's the owner.

There are a few operational checks that would need to exist around the Cloud Map Service resource (e.g. not deleting the service if other external-dns instances still have records in it).

Any thoughts/opinions?

@sagor999
Contributor

@buzzsurfr it would be really cool if you could implement this feature!
Some thoughts:

  • I think this feature should be behind an argument, since it changes the behavior of external-dns quite a lot. That would also let us test the feature a bit more safely.
  • I think we should indicate to external-dns that a record should be marked as 'shared' (probably via an additional annotation?). That way, if the record already exists, a new record from a different cluster will not attempt to hijack it and start adding its own targets to it. So all external-dns instances across clusters would need to treat that record as shared.
  • As for permissions, I think the above would cover it. For example, if a non-shared record already exists and a new instance tries to add a shared record, it reports an error instead. If the record was already shared and a new instance has the record set to non-shared, it also reports an error.
  • I am not sure how you would resolve the external-dns/owner and external-dns/resource values in the TXT record, since AFAIR they are used inside external-dns to validate which resource owns a record. I guess if both sides are set to shared, those validations would just be skipped?

Looking forward to checking out the merge request, as I am curious how it will be implemented.

@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2021
@mamiu

mamiu commented Jul 27, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2021
@CRASH-Tech
Contributor

We could add the assigned IP to the TXT record; then external-dns would know which records are its own.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 29, 2021
@rifelpet

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 29, 2021
@Rock981119

Rock981119 commented Feb 11, 2022

I have the same requirement. Is this feature under development?

If I annotate a NodePort Service, external-dns can add multiple A records at the same time.
However, if Ingresses with the same domain name are published separately through different IngressClasses, only the A record from the first Ingress is updated.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 12, 2022
@jonpulsifer

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 1, 2022
@parkedwards

Is there any known workaround for this? I attempted @Eramirez33's approach, but I'm still getting this conflict even though both external-dns instances have unique owner IDs:

`Skipping endpoint...because owner id does not match...`

@gabrieloliveers

@Eramirez33 You saved my life, bro!!!

@jbg

jbg commented Mar 27, 2023

--txt-owner-id as suggested by @Eramirez33 works for the case of having different external-DNS instances managing different DNS names in the same zone. It doesn't work for having different external-DNS instances managing multiple A records for the same DNS name in the same zone, which was the original subject of this issue.
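
For clarity, that setup is just each instance running with its own owner ID, e.g. (a sketch; the values are placeholders and the provider is only an example):

```yaml
# external-dns container args in cluster A; cluster B would use its own owner ID
args:
  - --source=service
  - --source=ingress
  - --provider=aws
  - --txt-owner-id=cluster-a   # cluster B: --txt-owner-id=cluster-b
```

Each owner then refuses to modify names registered by the other, which is exactly why two instances cannot share one name.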

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 25, 2023
@nitrocode
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 27, 2023
@airhorns

I encountered another use case for this (I think): DNS round robin between two different ingress controllers. We're trying to switch our ingress controller to a new stack, and we'd like to slowly move traffic over from one to the other. We have two Ingresses in the same cluster with the same hostname but different ingress classes, and we were expecting external-dns to create two A records, one for each. It doesn't right now; it seems like the first one wins. But if we could set up two external-dns instances, we could work around this.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 19, 2024
@rifelpet

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 20, 2024
@vhurtevent

Hello, here is another use case: multiple clusters, each running external-dns to provide DNS records, plus Thanos for centralized metrics. Thanos can use DNS service discovery with a single DNS name that points at the Thanos sidecar in each cluster.

In our situation, only the first cluster was able to create the A record for its Thanos sidecar; the others complain about not being the record's owner.

As I understand it, external-dns can only work with a single source of truth, which is its own Kubernetes cluster; the TXT ownership records are only locks.

I am curious how you manage this limitation.
In our situation, since the clusters are spawned using Terraform, it would be possible to manage the Thanos A records with TF, but that lacks automatic updates when the service load balancers change. Maybe with Kyverno we could manage a centralized DNSEndpoint resource with multiple targets.

@mimmus

mimmus commented Mar 14, 2024

Interested in this feature to switch an Ingress from one cluster to another.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 12, 2024
@nitrocode
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 12, 2024
@pabclsn

pabclsn commented Jun 13, 2024

This feature would be really appreciated for DNS Load Balancing :)

@k0da
Contributor

k0da commented Jun 13, 2024

> Hello, here is another use case: multiple clusters, each running external-dns to provide DNS records, plus Thanos for centralized metrics. […] I am curious how you manage this limitation.

We solved a similar case this way (see the sketch below):

  • Each cluster creates its own A record for its ingress: thanos-sidecar-<clustername>.
  • We write the cluster name into a parameter store.
  • On the Thanos querier side we scrape the parameter store and update the SRV record, adding new targets and removing ones that no longer exist.
  • Then we use the SRV record for sidecar discovery.
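
Roughly, the resulting records look like this (a sketch; names, IPs and the port are placeholders):

```
; per-cluster A records, each managed by that cluster's external-dns
thanos-sidecar-cluster-a.example.com.  300 IN A   203.0.113.10
thanos-sidecar-cluster-b.example.com.  300 IN A   198.51.100.20

; shared SRV record maintained outside external-dns, rebuilt from the parameter store
_grpc._tcp.thanos-store.example.com.   300 IN SRV 0 0 10901 thanos-sidecar-cluster-a.example.com.
_grpc._tcp.thanos-store.example.com.   300 IN SRV 0 0 10901 thanos-sidecar-cluster-b.example.com.
```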

@brutog

brutog commented Jun 24, 2024

I would find this feature incredibly useful as well. Maybe I have 2 clusters and I want them load balanced.

Maybe I have an EKS and a GKE cluster, and I want traffic routed 70/30.

Maybe I have many clusters across many clouds and would like a single provider (e.g. R53) to do multi-cluster DNS for all of them using geolocation, so clients get the closest cluster.

@panditha

We are also looking for this feature to load balance traffic across multiple clusters.

@evandam

evandam commented Jul 15, 2024

I believe all of this can be done with proper annotations, depending on the DNS provider? For Route53 for example - https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/aws.md#routing-policies
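
For the weighted-rollout case discussed above, the Route 53 routing-policy annotations from that tutorial look roughly like this on each cluster's Service (a sketch; the name, domain and weights are placeholders, and each cluster needs its own set-identifier):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-gateway
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com    # placeholder domain
    external-dns.alpha.kubernetes.io/set-identifier: cluster-a    # unique per cluster
    external-dns.alpha.kubernetes.io/aws-weight: "90"             # e.g. cluster B uses "10"
spec:
  type: LoadBalancer
  selector:
    app: my-gateway
  ports:
    - port: 443
```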

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 13, 2024
@Xe

Xe commented Nov 6, 2024

Hi, this affects me too. I'd like to have the same domain (xeiaso.net) broadcast from multiple clusters through the same DNS name in the same DNS zone.

@evandam

> I believe all of this can be done with proper annotations

I have tried annotations; it has failed.

@Xe Xe mentioned this issue Nov 6, 2024
@Xe

Xe commented Nov 6, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 6, 2024