-
Notifications
You must be signed in to change notification settings - Fork 58
internal/external DNS should only contain records for active Nexus instances #9060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…est-nexus-handoff
…andoff' into dap/handoff-dns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ready for review, but I'm keeping it draft because it depends on #9059.
There was some discussion of using the reconfigurator-cli
simulated state around the active Nexus generation when computing the contents of internal/external DNS. This would be appropriate if we had reconfigurator-cli
commands that show DNS information for a blueprint based on the current simulated state (like blueprint-show
might). But we don't have such a command today. (blueprint-show
in reconfigurator-cli
doesn't show DNS information.) The only commands that compute DNS contents are load-example
(which is doing it for a freshly-created example system), blueprint-diff
(which is diff'ing historical blueprints and we can't assume the simulated state applies to them), and blueprint-diff-dns
(where you give it the specific DNS contents to diff against, so this too could be a historical blueprint, not a current one). So in the end, I chose to always use the blueprint's own generation. There's a big comment on blueprint_active_nexus_generation()
explaining why.
DNS zone: "control-plane.oxide.internal" (unchanged) | ||
unchanged names: 51 (records: 65) | ||
|
||
external DNS: | ||
* DNS zone: "oxide.example": | ||
* name: example-silo.sys (records: 3 -> 6) | ||
- A 192.0.2.2 | ||
- A 192.0.2.3 | ||
- A 192.0.2.4 | ||
+ A 192.0.2.2 | ||
+ A 192.0.2.7 | ||
+ A 192.0.2.3 | ||
+ A 192.0.2.6 | ||
+ A 192.0.2.4 | ||
+ A 192.0.2.5 | ||
unchanged names: 4 (records: 6) | ||
DNS zone: "oxide.example" (unchanged) | ||
unchanged names: 5 (records: 9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change reflects the bug fixed by this PR. Before, we were updating DNS to include both old and new zones as soon as new ones were created. Now, we don't change DNS at this point in the process.
* DNS zone: "control-plane.oxide.internal": | ||
- name: 0c71b3b2-6ceb-4e8f-b020-b08675e83038.host (records: 1) | ||
- AAAA fd00:1122:3344:101::22 | ||
- name: 3eeb8d49-eb1a-43f8-bb64-c2338421c2c6.host (records: 1) | ||
- AAAA fd00:1122:3344:103::22 | ||
- name: 466a9f29-62bf-4e63-924a-b9efdb86afec.host (records: 1) | ||
- AAAA fd00:1122:3344:102::22 | ||
+ name: 90dbd6f3-9bcb-4a62-ad85-d61f5b3a36ad.host (records: 1) | ||
+ AAAA fd00:1122:3344:103::2b | ||
+ name: 9ae90740-7fdb-4073-ae43-048f2fca3d69.host (records: 1) | ||
+ AAAA fd00:1122:3344:102::2c | ||
* name: _nexus._tcp (records: 3 -> 3) | ||
- SRV port 12221 0c71b3b2-6ceb-4e8f-b020-b08675e83038.host.control-plane.oxide.internal | ||
- SRV port 12221 3eeb8d49-eb1a-43f8-bb64-c2338421c2c6.host.control-plane.oxide.internal | ||
- SRV port 12221 466a9f29-62bf-4e63-924a-b9efdb86afec.host.control-plane.oxide.internal | ||
+ SRV port 12221 90dbd6f3-9bcb-4a62-ad85-d61f5b3a36ad.host.control-plane.oxide.internal | ||
+ SRV port 12221 9ae90740-7fdb-4073-ae43-048f2fca3d69.host.control-plane.oxide.internal | ||
+ SRV port 12221 d516d61b-fd96-46ad-a743-78eec814ee90.host.control-plane.oxide.internal | ||
+ name: d516d61b-fd96-46ad-a743-78eec814ee90.host (records: 1) | ||
+ AAAA fd00:1122:3344:101::2b | ||
unchanged names: 47 (records: 59) | ||
|
||
external DNS: | ||
DNS zone: "oxide.example" (unchanged) | ||
unchanged names: 5 (records: 12) | ||
* DNS zone: "oxide.example": | ||
* name: example-silo.sys (records: 3 -> 3) | ||
- A 192.0.2.2 | ||
- A 192.0.2.3 | ||
- A 192.0.2.4 | ||
+ A 192.0.2.7 | ||
+ A 192.0.2.6 | ||
+ A 192.0.2.5 | ||
unchanged names: 4 (records: 6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, this change reflects the bug fixed by this PR. Before, we had already updated DNS (wrongly) at this point in the process. This is the correct time to do it (we've just done the handoff).
dev-tools/reconfigurator-cli/tests/output/cmds-target-release-stdout
Outdated
Show resolved
Hide resolved
dev-tools/reconfigurator-cli/tests/output/cmds-target-release-stdout
Outdated
Show resolved
Hide resolved
I wound up backing out the change that showed the DNS changes across the handoff itself. That was using |
- A 192.0.2.5 | ||
+ A 192.0.2.7 | ||
+ A 192.0.2.6 | ||
+ A 192.0.2.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I didn't notice this before (and this is not relevant to this PR's changes), but it looks like our DNS diffing code could use some tweaks? This was previously saying we removed 192.0.2.{4,5,6,7}
and added 192.0.2.{5,6,7}
; shouldn't it have just reported that we were removing 4 and keeping 5,6,7?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be an improvement, yeah.
"Deploy DNS records", | ||
async move |cx| { | ||
let Some(nexus_id) = nexus_id else { | ||
return StepSkipped::new((), "not running as Nexus").into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this impact reconfigurator-exec-unsafe
in some way that needs documenting or warnings from it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does mean this step will no longer be done by that tool. That should be visible in the output (that this step was skipped). There are other steps it already doesn't do.
This is not ready yet.
target-release
test simulating an entire update #9059 rather than have the existing callers just use the current blueprint's nexus generation (as this PR currently does)explicit tests for blueprint_internal_dns_config/blueprint_external_dns_config for these cases(not doing this)reconfigurator-cli
tests (usingblueprint-diff-dns
)Fixes #8505.