Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Always create AAAA alias records in route53 #3605

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

johngmyers
Copy link
Contributor

@johngmyers johngmyers commented May 14, 2023

Description

Always creates both A and AAAA alias records in Route53.

When a Route53 AAAA alias record points to an IPv4-only resource, AAAA DNS queries return zero records. This is the same behavior as when there is no AAAA alias record but is an A alias record. For this reason, there is no rational reason to not also create the AAAA alias record.

Adds logic for reconciling discrepancies between the A and AAAA alias records, including when only one of the two is present. In some situations, this reconciliation will take two iterations of the reconciliation algorithm to converge, as the Route53 provider does not store all information about the AAAA record with the discrepancy in the Endpoint. Encoding this information in the provider-specific annotations would result in a lot of complicated code for a rare edge case. An alternative would be to add an opaque provider-specific interface{} to Endpoint so that the Route53 provider could store the ResourceRecordSet directly.

This PR is an alternative to #2050.

Fixes #3707

Checklist

  • Unit tests updated
  • End user documentation updated

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 14, 2023
@szuecs
Copy link
Contributor

szuecs commented May 15, 2023

@johngmyers this will 1) change default to use dualstack and therefore 2) roughly double requests to AWS route53 with its "nice" API rate limits (see also #3402).
I would rather like to opt-in this feature at least via a new flag.

@johngmyers
Copy link
Contributor Author

One should not have to opt-in to expected, correct behavior. #3402 should address rate limiting, especially if it defaults to enabled.

At worst, this should be opt-out, though that will further increase complexity and the number of needed test cases.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 16, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 16, 2023
@johngmyers johngmyers mentioned this pull request May 16, 2023
2 tasks
@johngmyers
Copy link
Contributor Author

johngmyers commented May 16, 2023

Also, why does a mere factor of 1.5 deserve such complexity?

@szuecs
Copy link
Contributor

szuecs commented May 16, 2023

@johngmyers in general I agree with you!
I just wanted to raise awareness. Maybe you can also review the cache PR, I see that you did, but if you think it's ready please add your /lgtm, too.

@szuecs
Copy link
Contributor

szuecs commented May 16, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label May 16, 2023
@johngmyers johngmyers changed the title Always create AAAA alias records in route53 WIP Always create AAAA alias records in route53 May 16, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 16, 2023
@johngmyers
Copy link
Contributor Author

I'm going to experiment with implementing this using AdjustEndpoints. I'm concerned with the complexity of reconciling discrepancies between A and AAAA in the provider.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 16, 2023
@johngmyers johngmyers changed the title WIP Always create AAAA alias records in route53 Always create AAAA alias records in route53 May 16, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 16, 2023
@johngmyers
Copy link
Contributor Author

The AdjustEndpoints approach is much simpler, though with the disadvantage that the TXT registry creates ownership records for the additional AAAA alias records.

I don't think registries should be creating ownership records for endpoints that were added by AdjustEndpoints, though that's arguably an issue for a separate PR. I haven't yet figured out how to avoid creating such records without giving the txt registry knowledge of route53.

I'm also thinking of writing a dynamodb registry.

@johngmyers
Copy link
Contributor Author

To avoid creating ownership records for the AAAA alias records, we could:

  1. Define a label and have Provider.AdjustEndpoints() add said label to each additional Endpoint it creates.
  2. The presence of this label causes TXTRegistry.ApplyChanges() to not create any ownership records for such Endpoints.
  3. Define a new interface method on Provider or a new interface implemented by AWSProvider. Registry.Records() implementations call this method after adding all their Labels to the endpoints. The AWSProvider implementation of this method copies the Labels from the CNAME alias Endpoints to the corresponding AAAA alias Endpoints.

This wouldn't handle the case where the A record is deleted but the AAAA and TXT were not. To handle that case, the new interface method would have to take and endpointName, type, and possibly labels and return a set of additional endpointNames and types that the labels would apply to.

@szuecs
Copy link
Contributor

szuecs commented May 17, 2023

/ok-to-test

@johngmyers
Copy link
Contributor Author

@szuecs what is the next step to get this merged? I'm working on a DynamoDB registry in #3648, but I doubt that should be a prerequisite.

Would scheduling a meeting be helpful?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 10, 2023
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 28, 2023
@reltuk
Copy link

reltuk commented Jul 6, 2023

@johngmyers I've deployed 4dcce15 to a cluster on AWS and I'm seeing the following behavior:

When I add a new hostname to a service's external-dns.alpha.kubernetes.io/hostname annotation, it creates the following records:

_extdns.aaaa-hostname TXT
_extdns.cname-hostname TXT
_extdns.hostname TXT
hostname A
hostname AAAA

When I subsequently remove the same hostname from the service's annotation, it only deletes:

_extdns.cname-hostname TXT
_extdns.hostname TXT
hostname A

When I added the hostname back to the service, it created those same three records and left the AAAA-related records alone.

As I was testing the behavior of moving the hostname back and forth between two different services, I got the following panic during the reconciliation loop:

time="2023-07-06T22:28:27Z" level=info msg="Applying provider record filter for domains: [...]"
panic: runtime error: index out of range [3] with length 3

goroutine 1 [running]:
sigs.k8s.io/external-dns/provider/aws.(*AWSProvider).createUpdateChanges(0xc000df89c0?, {0xc000e964c0, 0x5, 0xc00007a400?}, {0xc00075edb0, 0x3, 0xc001574ca0?})
	/src/3p/external-dns/provider/aws/aws.go:469 +0x5c5
sigs.k8s.io/external-dns/provider/aws.(*AWSProvider).ApplyChanges(0xc000e93200?, {0x3c54410, 0xc000d06390}, 0xc000e8fd40)
	/src/3p/external-dns/provider/aws/aws.go:508 +0x86
sigs.k8s.io/external-dns/registry.(*TXTRegistry).ApplyChanges(0xc000609c30, {0x3c54410, 0xc000d06390}, 0xc000e8fce0)
	/src/3p/external-dns/registry/txt.go:284 +0xae5
sigs.k8s.io/external-dns/controller.(*Controller).RunOnce(0xc000de58c0, {0x3c54368, 0xc00013c230})
	/src/3p/external-dns/controller/controller.go:231 +0x61d
sigs.k8s.io/external-dns/controller.(*Controller).Run(0xc000de58c0?, {0x3c54368, 0xc00013c230})
	/src/3p/external-dns/controller/controller.go:313 +0xc5
main.main()
	/src/3p/external-dns/main.go:452 +0x4765

As far as I could tell, this may have been persisting until I deleted the 5 records associated with that external name in the zone. Not every move of the name seemed to cause this state. I feel like I saw other moves where the external name simply wasn't updated, and other moves where only the A record was updated.

The deployment itself has the following settings:

        - name: EXTERNAL_DNS_PROVIDER
          value: aws
        - name: EXTERNAL_DNS_DOMAIN_FILTER
          value: |-
            domain.com
            anotherdomain.com
            thirddomain.com
        - name: EXTERNAL_DNS_AWS_ZONE_TYPE
          value: public
        - name: EXTERNAL_DNS_TXT_OWNER_ID
          value: eks-cluster-1-external-dns
        - name: EXTERNAL_DNS_TXT_PREFIX
          value: _extdns.
        - name: EXTERNAL_DNS_REGISTRY
          value: txt
        - name: EXTERNAL_DNS_SOURCE
          value: |-
            service
            ingress

as well as a number of --zone-id-filters supplied as CLI arguments.

On the other hand, the initial deployment of this, when there were no existing matching records in the zone it was targeting and it created new records for a newly created Service went really well and seemed to work great.

@johngmyers
Copy link
Contributor Author

I suspect this is the issue that #3724 is attempting to fix.

@johngmyers
Copy link
Contributor Author

/hold for #3724

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 11, 2023
@johngmyers
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 10, 2023
@johngmyers
Copy link
Contributor Author

/assign @mloiseleur

@johngmyers
Copy link
Contributor Author

/test pull-external-dns-lint
/test pull-external-dns-unit-test

@johngmyers
Copy link
Contributor Author

/retest

@johngmyers
Copy link
Contributor Author

I think this is ready to go in.

@mloiseleur
Copy link
Contributor

I have tested it with a build of this PR that I published here.
I can confirm that:

  1. It should be documented in doc/tutorials/aws.md, since it works following this tutorial on Service with annotations
  2. When using --managed-record-types="A,CNAME", it did NOT update records at all (Same behavior between this PR and v0.13.5). So it does not seem to be a working opt-out.
  3. When changing the hostname, for instance from test2.example.com to test3.example.com, it did not delete the AAAA record:
time="2023-08-14T15:01:37Z" level=info msg="Applying provider record filter for domains: [example.com. .example.com.]"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: DELETE _extdns.cname-test2.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: DELETE _extdns.test2.example.com TXT "
time="2023-08-14T15:01:37Z" level=info msg="Desired change: DELETE test2.example.com A"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE _extdns.aaaa-test3.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE _extdns.cname-test3.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE _extdns.test3.example.com TXT"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE test3.example.com A"
time="2023-08-14T15:01:37Z" level=info msg="Desired change: CREATE test3.example.com AAAA"

@johngmyers johngmyers changed the title Always create AAAA alias records in route53 WIP Always create AAAA alias records in route53 Sep 15, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 15, 2023
@johngmyers
Copy link
Contributor Author

Blocked by #3910

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2023
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@irony
Copy link

irony commented Oct 29, 2023

Any news on this PR?

@paalkr
Copy link

paalkr commented Nov 13, 2023

Any status updates?

@paalkr
Copy link

paalkr commented Dec 7, 2023

I'm sorry to bump this one more time. It's a super important feature to us. Here in Norway, all public services must, by law, be announced with IPv6 support.

Anything we can contribute with to get this feature merged faster?

@mloiseleur
Copy link
Contributor

It is open source.
One can either wait for @johngmyers to finish this PR or create a new PR (based on this one), rebase and fix the remaining issues.

@schmitz-chris
Copy link

It is open source. One can either wait for @johngmyers to finish this PR or create a new PR (based on this one), rebase and fix the remaining issues.

not everyone can do this and fix the problem but still need the fix for their systems. So +1 from my site, I also need it.

@k8s-ci-robot
Copy link
Contributor

@johngmyers: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-external-dns-licensecheck 4dcce15 link true /test pull-external-dns-licensecheck

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS NLB in dual-stack does not create AAAA record
10 participants