designate: fix deletion of TXT records #1255
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: mcayland
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Welcome @mcayland!
I thought I had disabled it, sorry for confusing you. This has nothing to do with your PR.
I noticed you also updated the docs, but there's already a PR for that: #1254. Maybe have a look there first and comment if anything is missing; if not, I would suggest dropping the doc change from your PR and leaving the rest. I will have a look ASAP. Thanks again 👍.
Force-pushed from 0224647 to 6d3ab36
Yes indeed - looks like I missed the documentation change by just a few hours :) #1254 looks good enough to me, so I've just repushed the branch with the documentation changes dropped.
If I understand this correctly, this will add provider-specific properties to the TXT records. If so, won't we hit the TXT record length limit? We ran into this once on AWS, where the limit is 255 characters.
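For context on the length concern: a single TXT character-string is limited to 255 octets, because RFC 1035 prefixes each string with one length octet. A record may carry several strings, but whether a given provider or registry splits long values is not something this thread establishes. The Go sketch below uses a hypothetical `serializeLabels` helper that only approximates the registry's label encoding, and checks the result against that per-string limit.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// maxTXTStringLen is the DNS limit for one TXT character-string: a single
// length octet precedes the data, so each string holds at most 255 octets.
const maxTXTStringLen = 255

// serializeLabels is a hypothetical, simplified encoder for ownership labels.
// It only approximates a "heritage=external-dns,..." style payload; it is not
// the upstream external-dns implementation.
func serializeLabels(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output despite random map order
	parts := []string{"heritage=external-dns"}
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("external-dns/%s=%s", k, labels[k]))
	}
	return strings.Join(parts, ",")
}

// fitsInOneTXTString reports whether s fits in a single TXT character-string.
// len counts bytes, which is the unit the DNS limit is measured in.
func fitsInOneTXTString(s string) bool {
	return len(s) <= maxTXTStringLen
}

func main() {
	small := map[string]string{"owner": "default"}
	large := map[string]string{"owner": "default", "blob": strings.Repeat("x", 300)}
	fmt.Println(fitsInOneTXTString(serializeLabels(small))) // true
	fmt.Println(fitsInOneTXTString(serializeLabels(large))) // false
}
```

Anything that is only needed internally by the provider is better kept out of this payload entirely, which is the point of the reply that follows.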
Well, that's not the case here; or at least, testing this PR locally I get 2 records similar to the following generated in designate DNS:
foo.my.domain A 1.2.3.4
Maybe part of this is due to me misunderstanding the exact purpose of a Label: my understanding from reading the source is that Endpoint.Labels is a map of data that comes from the provider, and the txt registry implementation simply serialises the contents of the map into the TXT record. For this reason I looked at the code and noticed that the designate ids for the zone and recordsets are retrieved from designate directly based upon externalName; these are then used to populate the Endpoint. Given that these ids are only used internally, ProviderSpecific seemed the right way to store this information within the Endpoint without exposing it, which is why I implemented it that way. However, just to confuse things more, the designate provider apparently (ab)uses Endpoint.Labels to store the internal zone and recordset ids, even though they are not visible as part of the TXT record. I guess this is because ProviderSpecific didn't exist at the time the provider was written? In fact, I also see there is still a "designateOriginalRecords" Label, which should be switched to ProviderSpecific if this interpretation is correct. If you can clarify whether Label or ProviderSpecific is the correct mechanism for attaching the internal ids to Endpoints, that will help decide what the correct approach should be.
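A minimal sketch of the distinction being asked about, assuming simplified stand-in types (not the upstream `endpoint` package) and hypothetical property names for the designate zone and recordset ids: whatever sits in `Labels` is what a TXT registry serialises, while `ProviderSpecific` entries stay attached to the in-memory `Endpoint` for the provider's own use.

```go
package main

import "fmt"

// ProviderSpecificProperty and Endpoint are simplified, hypothetical mirrors
// of the external-dns types discussed here, not the upstream definitions.
type ProviderSpecificProperty struct {
	Name  string
	Value string
}

type Endpoint struct {
	DNSName          string
	Targets          []string
	Labels           map[string]string          // ends up in the TXT ownership record
	ProviderSpecific []ProviderSpecificProperty // stays inside the process
}

// providerSpecificValue looks up a provider-specific property by name.
func (e *Endpoint) providerSpecificValue(name string) (string, bool) {
	for _, p := range e.ProviderSpecific {
		if p.Name == name {
			return p.Value, true
		}
	}
	return "", false
}

// txtPayload models what a TXT registry would serialise: only the Labels.
func txtPayload(e *Endpoint) map[string]string {
	out := make(map[string]string, len(e.Labels))
	for k, v := range e.Labels {
		out[k] = v
	}
	return out
}

func main() {
	ep := &Endpoint{
		DNSName: "foo.my.domain",
		Targets: []string{"1.2.3.4"},
		Labels:  map[string]string{"owner": "default"},
		ProviderSpecific: []ProviderSpecificProperty{
			// Hypothetical property names for the designate zone/recordset ids.
			{Name: "designate-zone-id", Value: "zone-123"},
			{Name: "designate-recordset-id", Value: "rs-456"},
		},
	}
	fmt.Println(txtPayload(ep))                                     // map[owner:default]
	fmt.Println(ep.providerSpecificValue("designate-recordset-id")) // rs-456 true
}
```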
@mcayland your explanation is correct. After reviewing it again it seems fine, but we want to run it in a test cluster again to check that it also works with other providers. In the past we had issues when we enabled cache sync: it tried to recreate records again and again because of provider-specific properties.
Thanks @njuettner! In that case the one remaining designateOriginalRecords property should also be converted from a Label to a ProviderSpecific property. Let me do that first so that everything is consistent, and then repush.
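Converting the remaining `designateOriginalRecords` entry amounts to moving one key from `Labels` into `ProviderSpecific`. The helper `moveLabelToProviderSpecific`, the stand-in types, and the string value used for the key are all hypothetical; this is a sketch of the idea, not the code in the branch.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the external-dns endpoint types.
type ProviderSpecificProperty struct {
	Name  string
	Value string
}

type Endpoint struct {
	DNSName          string
	Labels           map[string]string
	ProviderSpecific []ProviderSpecificProperty
}

// moveLabelToProviderSpecific moves key from e.Labels into e.ProviderSpecific,
// overwriting an existing property of the same name. It reports whether the
// label was present.
func moveLabelToProviderSpecific(e *Endpoint, key string) bool {
	v, ok := e.Labels[key]
	if !ok {
		return false
	}
	delete(e.Labels, key)
	for i := range e.ProviderSpecific {
		if e.ProviderSpecific[i].Name == key {
			e.ProviderSpecific[i].Value = v
			return true
		}
	}
	e.ProviderSpecific = append(e.ProviderSpecific, ProviderSpecificProperty{Name: key, Value: v})
	return true
}

func main() {
	// "designate-original-records" is a hypothetical value for the
	// designateOriginalRecords key mentioned in the thread.
	ep := &Endpoint{
		DNSName: "foo.my.domain",
		Labels:  map[string]string{"designate-original-records": "1.2.3.4"},
	}
	fmt.Println(moveLabelToProviderSpecific(ep, "designate-original-records")) // true
	fmt.Println(len(ep.Labels), ep.ProviderSpecific)                          // 0 [{designate-original-records 1.2.3.4}]
}
```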
Force-pushed from 6d3ab36 to 0ae6375
@njuettner I've now pushed an updated version of this branch. I made a few minor tweaks from the previous version:
Let me know how the testing goes, and if something doesn't quite work I will try my best to help.
Force-pushed from 0ae6375 to 6e81ee7
@njuettner after sleeping on this, I've realised that adding a prefix to namespace the ProviderSpecific properties just masks the real problem, which is that ProviderSpecific properties are not preserved for TXT records (re)generated by the txt registry implementation of ApplyChanges(). Fortunately this can be fixed using a method similar to the one already used for Labels. Once that works, all that remains is to switch the relevant internal properties from Labels to ProviderSpecific properties, and then everything works as intended. I think this is a much better solution: all the provider has to do is attach ProviderSpecific properties to the original Records, which are then visible end-to-end all the way through to the provider's ApplyChanges() implementation. There is no need for the provider to have any preconceived ideas about overlapping property namespaces.
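One possible shape of the fix described here, as a hedged sketch: when the registry reads TXT records it remembers their `ProviderSpecific` properties, and when it regenerates a TXT record in ApplyChanges() (for example to delete it) it re-attaches them, so the provider still sees its internal ids. `txtRegistrySketch`, `rememberTXT`, `generateTXT` and the property name are hypothetical; the real txt registry and this PR may do it differently.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the external-dns endpoint types.
type ProviderSpecificProperty struct {
	Name  string
	Value string
}

type Endpoint struct {
	DNSName          string
	RecordType       string
	Targets          []string
	ProviderSpecific []ProviderSpecificProperty
}

// cloneProps copies a property slice so callers cannot alias registry state.
func cloneProps(ps []ProviderSpecificProperty) []ProviderSpecificProperty {
	if ps == nil {
		return nil
	}
	out := make([]ProviderSpecificProperty, len(ps))
	copy(out, ps)
	return out
}

// txtRegistrySketch is a hypothetical, heavily simplified TXT registry.
type txtRegistrySketch struct {
	prefix   string
	txtProps map[string][]ProviderSpecificProperty // keyed by TXT DNS name
}

// rememberTXT is called for each TXT record read from the provider, at the
// point where the registry would also parse its ownership labels.
func (r *txtRegistrySketch) rememberTXT(txt *Endpoint) {
	r.txtProps[txt.DNSName] = cloneProps(txt.ProviderSpecific)
}

// generateTXT rebuilds the ownership TXT record for owned and re-attaches the
// ProviderSpecific properties of the original TXT record, if any were seen.
func (r *txtRegistrySketch) generateTXT(owned *Endpoint, payload string) *Endpoint {
	name := r.prefix + owned.DNSName
	return &Endpoint{
		DNSName:          name,
		RecordType:       "TXT",
		Targets:          []string{payload},
		ProviderSpecific: cloneProps(r.txtProps[name]), // nil if never seen
	}
}

func main() {
	reg := &txtRegistrySketch{txtProps: map[string][]ProviderSpecificProperty{}}
	reg.rememberTXT(&Endpoint{
		DNSName:    "foo.my.domain",
		RecordType: "TXT",
		Targets:    []string{"heritage=external-dns"},
		ProviderSpecific: []ProviderSpecificProperty{
			{Name: "designate-recordset-id", Value: "txt-rs-789"}, // hypothetical
		},
	})
	txt := reg.generateTXT(&Endpoint{DNSName: "foo.my.domain", RecordType: "A"}, "heritage=external-dns")
	fmt.Println(txt.ProviderSpecific) // [{designate-recordset-id txt-rs-789}]
}
```

Keying the remembered properties by the TXT record's own name (rather than copying the A record's properties) matters for designate, since the TXT record lives in its own recordset with its own id.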
@njuettner have you had a chance to put this latest version through its paces on your test cluster? Or is there anything else you're waiting for from me?
@mcayland could you do a rebase again?
@njuettner I've tried a rebase locally, but something appears to be broken elsewhere in external-dns. The build completes fine and all regression tests pass, but in the logs I keep seeing repeated messages like this:
... and the DNS records seem to be ping-ponging out of existence. Any idea what is going on here?
@njuettner there seem to be two separate issues here. The first is the constant stream of error messages of the form "round_trippers.go:174] CancelRequest not implemented by *instrumented_http.Transport" and "streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)" in the log files, which I've traced down to this PR:
Unfortunately, bisecting this exactly is proving to be difficult, since PR review doesn't appear to require build fixes to be squashed into the original commit before merge :( My guess is that this may be something internal to the Azure SDK, but I don't know enough about it to diagnose it further.
@njuettner I managed to bisect the "ping-pong" creation and destruction of records down to this commit:
This is part of PR #1008 and it seems to change the way ProviderSpecific properties are handled, but the commit message does not give any detail as to why these changes are required or what they were trying to fix. @devkid can you clarify the intention of the ProviderSpecific changes here? (Edit: one additional point: the ping-pong occurs only when this PR is applied on top of the above commit, which makes me believe that something in this commit has subtly changed the ProviderSpecific semantics.)
Force-pushed from 6e81ee7 to 516dbbd
@njuettner I've just repushed the rebased branch for this PR. Although it builds and passes the regression tests, it gets stuck in a "ping-pong" cycle of creating and then destroying the records on alternate runs. For reference, the previous working branch is here: https://github.com/mcayland/external-dns/tree/designate/fix-delete-TXT-records.good
@mcayland The intention of the fix was: previously
@devkid Thanks for the info, I can see why that change is needed. I think the designate provider has some similar logic internally to try and track changes vs. updates/removals: https://github.com/kubernetes-sigs/external-dns/blob/master/provider/designate.go#L45 Since this PR also switches the per-Endpoint data from Labels to ProviderSpecific, I wonder whether these two things conflict with each other?
Could you describe in more detail what behavior you are seeing? Do you mean records are created in one iteration, then deleted in the next one, then created again, etc.?
@devkid Yes, that's exactly it. I've spent a few more hours this afternoon digging into this and I think I understand why it is happening:
vs.
Note that the desired state does not contain the ProviderSpecific information; my guess is that this is because the desired Endpoint is generated by the Endpoints() function in service.go. As discussed above with @njuettner, the ProviderSpecific entries for designate are ephemeral. Therefore, with your change to shouldUpdateProviderSpecific(), the current and desired states will never match, because the Endpoint built from the Service will never contain the ephemeral properties. That is why we see this continual loop of record creation and deletion.
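The mismatch can be reproduced with a simplified, hypothetical comparison (not the actual shouldUpdateProviderSpecific() from #1008): if the desired Endpoint carries no designate properties but the current one does, a strict comparison reports a difference on every sync. Filtering out provider-internal keys before comparing is shown only as one possible mitigation; `withoutKeys` and the set of ephemeral keys are assumptions.

```go
package main

import "fmt"

// Simplified, hypothetical stand-in for the external-dns property type.
type ProviderSpecificProperty struct {
	Name  string
	Value string
}

// providerSpecificDiffers is a strict comparison: the two sets must contain
// exactly the same names with the same values.
func providerSpecificDiffers(desired, current []ProviderSpecificProperty) bool {
	if len(desired) != len(current) {
		return true
	}
	cur := make(map[string]string, len(current))
	for _, p := range current {
		cur[p.Name] = p.Value
	}
	for _, p := range desired {
		if v, ok := cur[p.Name]; !ok || v != p.Value {
			return true
		}
	}
	return false
}

// withoutKeys drops properties whose names are in skip (e.g. provider-internal ids).
func withoutKeys(props []ProviderSpecificProperty, skip map[string]bool) []ProviderSpecificProperty {
	var out []ProviderSpecificProperty
	for _, p := range props {
		if !skip[p.Name] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Desired state built from the Service: no designate ids at all.
	var desired []ProviderSpecificProperty
	// Current state read back from designate: carries an ephemeral id.
	current := []ProviderSpecificProperty{{Name: "designate-recordset-id", Value: "rs-456"}}
	fmt.Println(providerSpecificDiffers(desired, current)) // true, on every sync

	ephemeral := map[string]bool{"designate-recordset-id": true}
	fmt.Println(providerSpecificDiffers(withoutKeys(desired, ephemeral), withoutKeys(current, ephemeral))) // false
}
```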
I've just updated the related issue at #1122 to keep this alive. @njuettner @devkid thank you for your comments so far, but do we have any answers to #1255 (comment) yet?
The fact that designate doesn't work with
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@mcayland: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Bumping this back up as it is still an active problem with the designate provider.
/remove-lifecycle rotten
@mcayland are you planning on finishing this pull request? Looks like the PR needs a rebase and there are two files with conflicts.
/kind bug
Sorry about the late reply, and thanks for all the pings. Due to COVID my contract working on OpenStack was extended, but it is now due to end within the next month, after which I will lose access to the cluster. I may get a few odd days towards the end of next month to look at this again, although any time and testing will be quite limited. What's the current state of upstream regarding
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@mcayland thank you for your PR from 2019. Alas, this issue is still a problem in 2022.
@stefanandres no updates from this end, I'm afraid. As documented in the original issue #1122 above, the work was done under contract for a client's OpenStack/k8s installation, which I was given permission to submit upstream. Unfortunately due to issues with the changes to I still get emails from people asking about this feature, so there is clearly still interest. If someone were willing to sponsor the work and provide access to a test OpenStack/k8s installation, I would certainly be interested in revisiting this with the aim of getting a fix merged upstream.
This patchset fixes the designate provider not deleting TXT records (see #1122) and, along the way, implements ProviderSpecificProperty support for the txt registry.