fix(hostedzone): Set NS record's TTL to 900 #231
Conversation
Looks good!
Maybe we could add a default value for the NSTTL somewhere? Not sure if it's needed, I'd just prefer not to have to read deep into DNS RFCs to figure out a sane value when using the create hosted zone function in the future.
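A minimal sketch of what such a default could look like. The names `DefaultNSTTLSeconds` and `nsTTLOrDefault` are hypothetical and not taken from the okctl code base; only the value of 900 seconds comes from this PR.

```go
package main

import "fmt"

// DefaultNSTTLSeconds is a hypothetical default: 900 seconds (15 minutes),
// the value this PR sets for the NS record.
const DefaultNSTTLSeconds int64 = 900

// nsTTLOrDefault returns ttl when the caller supplied a positive value,
// and falls back to DefaultNSTTLSeconds otherwise.
func nsTTLOrDefault(ttl int64) int64 {
	if ttl <= 0 {
		return DefaultNSTTLSeconds
	}
	return ttl
}

func main() {
	fmt.Println(nsTTLOrDefault(0))    // caller did not set a TTL
	fmt.Println(nsTTLOrDefault(3600)) // caller asked for one hour
}
```

With something like this, callers of the create hosted zone function could leave the TTL unset and still get a sane value.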
Looks good 👍
Codecov Report
@@            Coverage Diff             @@
##           master     #231      +/-   ##
==========================================
- Coverage   38.51%   38.26%   -0.25%
==========================================
  Files         102      102
  Lines        2942     2961      +19
==========================================
  Hits         1133     1133
- Misses       1809     1828      +19
SonarCloud Quality Gate passed: 0 bugs, no coverage information.
Description
Set TTL of NS record to 15 minutes when creating a hosted zone.
Motivation and Context
When a user deletes a cluster and creates it again with the same domain name, okctl create fails. This is due to DNS resolvers having cached a now outdated NS record for the domain. The step immediately after creating a hosted zone, create cognito identity pool, fails.
Because users frequently create and delete clusters when testing okctl for the first few times, they will run into this quite often.
The problem before this PR is something like this (not 100% sure here): after a cluster is deleted and created again, oslo.systems's NS record correctly points to the new NS record of mycluster.oslo.systems. However, the NS record of the old mycluster.oslo.systems had a TTL of two days, meaning DNS resolvers will still return the old NS record. So when okctl create (or anyone) attempts to use mycluster.oslo.systems for anything, DNS resolvers will ask the old name servers to do lookups, making any request to the new mycluster.oslo.systems fail. For instance, getting the A record of myapp.mycluster.oslo.systems will resolve to the IP of myapp in my previous cluster.
By setting the TTL to 900 seconds, the user can just wait 15 minutes, instead of two days, after deleting a cluster before creating it again.
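The arithmetic behind the two waiting times can be sketched as follows. This is an illustration only: `worstCaseWait` and `remainingWait` are hypothetical helpers, and the model assumes a resolver that honours the TTL exactly and cached the NS record just before the old hosted zone was deleted.

```go
package main

import (
	"fmt"
	"time"
)

// worstCaseWait returns how long a resolver may keep serving a cached record
// with the given TTL, assuming it cached the record right before deletion.
func worstCaseWait(ttlSeconds int64) time.Duration {
	return time.Duration(ttlSeconds) * time.Second
}

// remainingWait returns how much of the TTL is left once the cached record
// has reached the given age. It never returns a negative duration.
func remainingWait(ttlSeconds int64, age time.Duration) time.Duration {
	ttl := worstCaseWait(ttlSeconds)
	if age >= ttl {
		return 0
	}
	return ttl - age
}

func main() {
	oldTTL := int64(2 * 24 * 60 * 60) // two days, the TTL before this PR
	newTTL := int64(900)              // 15 minutes, the TTL set by this PR

	fmt.Println(worstCaseWait(oldTTL))                  // 48h0m0s
	fmt.Println(worstCaseWait(newTTL))                  // 15m0s
	fmt.Println(remainingWait(newTTL, 10*time.Minute)) // 5m0s
}
```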
Why 900 secs / 15 minutes? Because of this observation: https://blog.apnic.net/2019/11/12/stop-using-ridiculously-low-dns-ttls/
Our users can increase it to 1 or 2 hours later when they are done testing. 15 minutes seems like a good sweet spot between facilitating multiple runs of okctl and an inefficient, too low TTL.
How Has This Been Tested?
Manually.
Screenshots (if appropriate):
Types of changes
Checklist: