
Conversation

jgallagher (Contributor)

This grew a bit bigger than I expected. The main goal is to fix #6261: now that R10 is out the door, we no longer need to tell each internal NTP zone the specific names of all boundary NTP zones. That grew into a few other cleanup operations:

  • I removed the ntp_servers, dns_servers, and domain fields from OmicronZoneType::InternalNtp. The only thing it needs now is internal DNS, which it can get the same way every other zone does. (I left address in place for consistency with other zones, but I was tempted to remove it too: see the discussion in "Sled-agent service creation ignores the provided addresses" #6651.)
  • In sled-agent, we no longer try to resolve internal DNS names via internal DNS itself when populating /etc/resolv.conf for zones that expect to access internal DNS; instead, we populate it with the fixed list of reserved INTERNAL_DNS_REDUNDANCY IP addresses (see the sketch after this list). This was immediately important in getting tests passing, but it is also probably the right thing to do; I added a comment summarizing the points @davepacheco made in a call earlier today.
  • Renamed MAX_INTERNAL_DNS_REDUNDANCY to RESERVED_INTERNAL_DNS_REDUNDANCY, and changed almost all of its users to use INTERNAL_DNS_REDUNDANCY instead. This split was confusing, and we had helper methods for constructing DNS clients that disagreed on which constant to use. RESERVED_INTERNAL_DNS_REDUNDANCY should make it clearer that this value is reserved for future work/growth, and INTERNAL_DNS_REDUNDANCY should be the value used in practice.
  • Fixed a bug in the reconfigurator planner where the external networking and DNS subnet allocators were considering all resources used by the parent blueprint as in use, even if the planner had decided to expunge the zones owning those resources. The fix for this is kinda gross; I'm open to better ideas but also I'd like to take a crack at a larger refactoring once some of the other big planner PRs have landed.
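As a rough illustration of the resolv.conf point above: because the reserved internal DNS addresses are fixed and well known, sled-agent can write a zone's nameserver list without doing any lookups first. This is only a sketch using standard-library types; the INTERNAL_DNS_REDUNDANCY value, the search domain, and the addresses in main are assumptions for illustration, not the actual sled-agent code.

use std::net::Ipv6Addr;

// Assumed value for illustration; the real constant lives in omicron.
const INTERNAL_DNS_REDUNDANCY: usize = 3;

// Build /etc/resolv.conf contents for a zone from the fixed, reserved
// internal DNS addresses, with no lookups through internal DNS required.
fn resolv_conf_for_zone(reserved_dns_addrs: &[Ipv6Addr]) -> String {
    let mut out = String::from("search control-plane.oxide.internal\n");
    for addr in reserved_dns_addrs.iter().take(INTERNAL_DNS_REDUNDANCY) {
        out.push_str(&format!("nameserver {addr}\n"));
    }
    out
}

fn main() {
    // Made-up example addresses standing in for the reserved DNS subnets.
    let dns: [Ipv6Addr; 3] = [
        "fd00:1122:3344:1::1".parse().unwrap(),
        "fd00:1122:3344:2::1".parse().unwrap(),
        "fd00:1122:3344:3::1".parse().unwrap(),
    ];
    print!("{}", resolv_conf_for_zone(&dns));
}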

I had tested an R10 -> this branch upgrade on london, but only about halfway through the above work. I want to retest both an R10 -> branch upgrade and a fresh RSS install to ensure there are no lingering expectations on the old internal NTP config path, but since I'd made it through a successful upgrade, I think this can go ahead and be opened for review.

@plotnick (Contributor) left a comment

This looks great! I think the lazy initialization of the network allocators is a great idea.

I tried merging main (including #6581) into this and seeing if I could get test_reuse_external_dns_ips_from_expunged_zones to pass with one fewer planning pass, but it didn't work on my first try. Probably user error, as it should now work; I can try again later this afternoon. I don't think that should block this; it would just be a nice side effect. I can open a follow-up PR with any such cleanups.

Comment on lines +27 to +35
-/// This determines the number of addresses which are reserved for internal DNS servers.
-pub const MAX_INTERNAL_DNS_REDUNDANCY: usize = 5;
+/// The potential number of internal DNS servers we hold reserved for future
+/// growth.
+///
+/// Any consumers interacting with "the number of internal DNS servers" (e.g.,
+/// to construct a DNS client) should operate in terms of
+/// [`INTERNAL_DNS_REDUNDANCY`]. This constant should only be used to reserve
+/// space where we could increase `INTERNAL_DNS_REDUNDANCY` up to at most this
+/// value.
+pub const RESERVED_INTERNAL_DNS_REDUNDANCY: usize = 5;
Contributor

These comments should go a long way in reducing confusion about which of these to use. Thanks for straightening that out!

* Semantic mismerge on the OnceCell stuff
* Addresses new uses of parent_blueprint that need to look at "current
  builder state" instead
@jgallagher (Contributor Author)

I tried merging main (including #6581) into this and seeing if I could get test_reuse_external_dns_ips_from_expunged_zones to pass with one fewer planning pass, but it didn't work on my first try. Probably user error, as it should now work; I can try again later this afternoon. I don't think that should block this; it would just be a nice side effect. I can open a follow-up PR with any such cleanups.

I think I got this working in 3096591. Some other tweaks were needed to get this branch and new main to align (in particular, the construction of available_external_dns_ips also needed to look at "the current builder state" instead of only looking at parent_blueprint).

@davepacheco (Collaborator) left a comment

Nice cleanup. I haven't yet looked at the blueprint builder parts.

Comment on lines -241 to -250
ntp_servers: Vec<String>,
dns_servers: Vec<IpAddr>,
domain: Option<String>,
Collaborator

I'm assuming this means that Sled Agent will no longer be sending this information as part of inventory, and that this is okay because we're still doing offline upgrades so Nexus will always be sync'd up?

Contributor Author

Correct. This is most obvious in the change to openapi/sled-agent.json. Dropping these fields affects both PUT /omicron-zones (in a backwards-compatible way: if we were to receive a put from an old Nexus, which we can't, we'd ignore the extra fields) and GET /omicron-zones (in a backwards-incompatible way: we won't return these fields, which an old Nexus would expect to receive).

Comment on lines -1289 to -1484

// Set the zone specific fields
inv_omicron_zone.ntp_ntp_servers = Some(ntp_servers.clone());
inv_omicron_zone.ntp_dns_servers = Some(
    dns_servers.iter().cloned().map(IpNetwork::from).collect(),
);
inv_omicron_zone.ntp_domain.clone_from(domain);
Collaborator

I was expecting a change to the db model types and database schema and then I thought "this is gonna be complicated" because we might have old inventories that have this information. But I guess maybe: no change is strictly necessary because these are NULL-able fields anyway, so they will just be NULL for all new inventories?

Should we still drop the columns? This technically loses data from old inventories where they were populated but I think we're saying this data is useless anyway.

Contributor Author

We cannot drop the columns; they were used (i.e., set to non-NULL values) by both internal NTP and boundary NTP; they are still used by boundary NTP. Good question though - I need to update some comments that refer to them as applying to both kinds.

Contributor Author

Updated the comments on these columns in dbinit.sql in 48ee15b

"address": {
"type": "string"
},
"dns_servers": {
Collaborator

To be clear: the change to this file is okay, even though it's an on-disk interface, because new schema describes a subset of the old schema? i.e., we will be able to continue reading files that had these properties?

Contributor Author

Correct on both counts. If OmicronZoneType had a #[serde(deny_unknown_fields)] we'd have a problem, but I believe we only attach that attribute to structs that are deserialized from things that should never be wrong (e.g., config files that are shipped in the zone alongside the binary).
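To make the compatibility argument concrete, here is a small standalone sketch of serde's default tolerance of unknown fields. These are not the actual omicron types; the field names and JSON values are made up to mirror the shape of an old on-disk record.

use serde::Deserialize;

// By default serde ignores unknown fields, so a ledger written by older
// software that still contains `ntp_servers`, `dns_servers`, and `domain`
// deserializes cleanly into the new, smaller type.
#[derive(Deserialize, Debug)]
struct InternalNtp {
    address: String,
}

fn main() {
    // Example payload shaped like an old on-disk record (values made up).
    let old_json = r#"{
        "address": "[fd00:1122:3344:101::10]:123",
        "ntp_servers": ["example.host.control-plane.oxide.internal"],
        "dns_servers": ["fd00:1122:3344:1::1"],
        "domain": null
    }"#;

    // Succeeds: the extra fields are silently dropped.
    let parsed: InternalNtp = serde_json::from_str(old_json).unwrap();
    println!("{parsed:?}");

    // If `#[serde(deny_unknown_fields)]` were attached to the struct, this
    // same input would instead fail with an "unknown field" error.
}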

"address": {
"type": "string"
},
"dns_servers": {
Collaborator

(same as all-zones-requests.json) To be clear: the change to this file is okay, even though it's an on-disk interface, because new schema describes a subset of the old schema? i.e., we will be able to continue reading files that had these properties?

Contributor Author

Yep!

Comment on lines 97 to 103
#[arg(
    short,
    long,
-   num_args = 1..,
+   num_args = 0..,
    value_parser = NonEmptyStringValueParser::default(),
)]
servers: Vec<String>,
Collaborator

If I'm understanding correctly:

  • for boundary servers, this argument is essentially required (there must be at least one value in the Vec)
  • for internal NTP servers, this argument is not used (and should not be provided)

Is that right? (Might be worth adding a comment. (edit: I found it in the SMF manifest. Maybe worth mentioning here?))

Contributor Author

That's mostly right, but it's slightly uglier than that. The SMF manifest for chrony-setup unconditionally passes the -s flag followed by any configured servers:

exec='/opt/oxide/zone-setup-cli/bin/zone-setup chrony-setup -b %{config/boundary} -p %{config/boundary_pool} -s %{config/server} -a %{config/allow}'

The change from 1 to 0 here allows that manifest to pass ... -s -a ...; i.e., the flag with no arguments to it. We should probably clean that up - I can file an issue? I assume that would involve replacing the exec command with a bash script a la our various method_script.sh helpers so we could do some checking on which flags to pass?
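For reference, a minimal clap sketch of the behavior being described; the struct and flags below are illustrative stand-ins rather than the real zone-setup CLI definition:

use clap::Parser;

// With `num_args = 0..`, the -s flag may appear with zero values, which is
// exactly what the SMF manifest produces for internal NTP zones.
#[derive(Parser, Debug)]
struct ChronySetupArgs {
    /// Upstream NTP servers (empty for internal NTP zones).
    #[arg(short, long, num_args = 0..)]
    servers: Vec<String>,

    /// Subnet allowed to use this NTP server.
    #[arg(short, long)]
    allow: String,
}

fn main() {
    // Internal NTP zone: `-s` is present but carries no values.
    let args = ChronySetupArgs::parse_from([
        "chrony-setup",
        "-s",
        "-a",
        "fd00:1122:3344:100::/56",
    ]);
    assert!(args.servers.is_empty());

    // With `num_args = 1..`, the same invocation would be rejected because
    // `-s` would require at least one value.
}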

Contributor Author

I ported over the comments (and added a check that clap was previously doing for us, oops) in 48ee15b. Still leaning toward filing an issue to make a cleanup pass over the way this binary accepts args from SMF, unless you think that's not worth the trouble.

Collaborator

Still leaning toward filing an issue to make a cleanup pass over the way this binary accepts args from SMF, unless you think that's not worth the trouble.

I've no opinion on that.

@davepacheco (Collaborator) left a comment

Removing fields is nice. Removing the ugly builder code that was populating them is even nicer!

pub fn add_external_dns_ip(&mut self, addr: IpAddr) {
-    self.external_networking.add_external_dns_ip(addr);
+    self.external_networking()
+        .expect("failed to initialize external networking allocator")
Collaborator

why is this expect okay?

Contributor Author

This method is #[cfg(test)] only.

Comment on lines +357 to +365
// Helper method to create a `BuilderExternalNetworking` allocator at the
// point of first use. This is to work around some timing issues between the
// planner and the builder: `BuilderExternalNetworking` wants to know what
// resources are in use by running zones, but the planner constructs the
// `BlueprintBuilder` before it expunges zones. We want to wait to look at
// what resources are in use until after that expungement happens; this
// implicitly does that by virtue of knowing that the planner won't try to
// allocate new external networking until after it's done expunging and has
// started to provision new zones.
Collaborator

This is all fine, but just to be clear: is this just trying to avoid an extra planning lap when expunging a zone before you can use its resources again? I feel like there's another issue here and I also feel like we talked about this a bunch -- sorry -- but I am not now remembering why else this is important.

@jgallagher (Contributor Author) Sep 25, 2024

In principle, this is just trying to avoid an extra planning lap, yes. In practice, the way the planner is currently written, it expects to be able to reuse resources within a single planning execution and fails if it cannot:

  1. It first performs any necessary zone expungements (based on sled or disk expungements from PlanningInput)
  2. It then adds any services as required by the policy, taking into account any services that are now underrepresented because of expungements that happened in step 1

Without these helpers to delay the construction of the allocators, we were seeing this:

  1. Allocators are constructed. They assume resource availability based solely on the parent blueprint.
  2. Planner performs expungement. Suppose we expunge an external DNS zone.
  3. Planner performs new service addition. It sees we're down an external DNS zone, so tries to add one. This fails, because the allocator was based on the parent blueprint, in which all the external DNS IPs were in use by in-service zones.
  4. We're now stuck; we can't get to a subsequent planning stage because we can't produce a child blueprint at all, given these (valid) inputs.

Maaaybe another way to tackle this would be to make the planner bail out if it performed any expungements? But that feels like a much bigger change.
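A rough sketch of the lazy-initialization shape being discussed (type and method names here are placeholders, not the real BlueprintBuilder internals, and it assumes the anyhow crate for errors): the allocator is only constructed on first use, so by the time the planner asks for it, expungements have already been applied to the builder's state.

use std::cell::OnceCell;

struct ExternalNetworkingAllocator {
    available_external_dns_ips: Vec<std::net::IpAddr>,
}

struct BlueprintBuilder {
    external_networking: OnceCell<ExternalNetworkingAllocator>,
    // ... other builder state ...
}

impl BlueprintBuilder {
    // Lazily construct the allocator the first time it's needed. Because the
    // planner only asks for external networking once it starts placing new
    // zones, this runs after expungement and sees the current builder state
    // rather than the parent blueprint.
    fn external_networking(
        &mut self,
    ) -> anyhow::Result<&mut ExternalNetworkingAllocator> {
        if self.external_networking.get().is_none() {
            let allocator = self.allocator_from_current_state()?;
            let _ = self.external_networking.set(allocator);
        }
        Ok(self.external_networking.get_mut().expect("initialized above"))
    }

    fn allocator_from_current_state(
        &self,
    ) -> anyhow::Result<ExternalNetworkingAllocator> {
        // Hypothetical: walk the builder's non-expunged zones and collect the
        // external DNS IPs that are still free.
        Ok(ExternalNetworkingAllocator { available_external_dns_ips: Vec::new() })
    }
}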

@davepacheco (Collaborator) Sep 25, 2024

Either way's fine with me but maybe we could add this context to the comment? A sentence or two is fine: "This is necessary because the planner assumes that it can expunge zones and replace them in the same pass."

Some(SqlU32::from(*gz_address_index));
}
BlueprintZoneType::InternalNtp(
blueprint_zone_type::InternalNtp {
Contributor

Removing fields from ZoneTypes rules! 🎉

@jgallagher (Contributor Author)

I haven't forgotten about this and still want to land it before R11, but london/madrid have been occupied all week and I'd strongly prefer to get a live test in before merging. Will do that as soon as I can.

@jgallagher (Contributor Author)

I grabbed london today to put this through its paces. I set london up on R10 to start.

From the internal NTP zone on sled 15:

The SMF config for chrony-setup shows both the new boundary NTP DNS pool name and the names of the specific boundary NTP zones, as expected:

      <property_group name='config' type='application'>
        <propval name='allow' type='astring' value='fd00:1122:3344:100::/56'/>
        <propval name='boundary' type='boolean' value='false'/>
        <propval name='boundary_pool' type='astring' value='boundary-ntp.control-plane.oxide.internal'/>
        <property name='server' type='astring'>
          <astring_list>
            <value_node value='1678ac0f-7c4e-4ee8-a3a7-bf2e2f9f68a0.host.control-plane.oxide.internal'/>
            <value_node value='d030daa3-d8b4-42ff-acf5-72703622e6a4.host.control-plane.oxide.internal'/>
          </astring_list>
        </property>
      </property_group>

and the chrony config file confirms:

root@oxz_ntp_7081ac04:~# tail -n 5 /etc/inet/chrony.conf
refresh 120

server 1678ac0f-7c4e-4ee8-a3a7-bf2e2f9f68a0.host.control-plane.oxide.internal iburst minpoll 0 maxpoll 4
server d030daa3-d8b4-42ff-acf5-72703622e6a4.host.control-plane.oxide.internal iburst minpoll 0 maxpoll 4
pool boundary-ntp.control-plane.oxide.internal iburst maxdelay 0.1 maxsources 16

chronyc sources shows the two resolved IP addrs:

root@oxz_ntp_7081ac04:~# chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* fd00:1122:3344:101::10        4   2   377     2    +16us[+9023ns] +/-   14ms
^+ fd00:1122:3344:102::f         4   3   377     7  -3286ns[  -15us] +/-   14ms

I then parked the rack and mupdated to this branch. Everything came back online. From the same internal NTP zone on sled 15, we see the explicit boundary server names gone from the SMF config:

      <property_group name='config' type='application'>
        <propval name='allow' type='astring' value='fd00:1122:3344:100::/56'/>
        <propval name='boundary' type='boolean' value='false'/>
        <propval name='boundary_pool' type='astring' value='boundary-ntp.control-plane.oxide.internal'/>
      </property_group>

and the chrony config file:

root@oxz_ntp_7081ac04:~# tail -n 3 /etc/inet/chrony.conf
refresh 120

pool boundary-ntp.control-plane.oxide.internal iburst maxdelay 0.1 maxsources 16

but chrony still found the two boundary NTP servers via DNS:

root@oxz_ntp_7081ac04:~# host boundary-ntp.control-plane.oxide.internal
boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:101::10
boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:102::f
boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:101::10
boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:102::f
boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:101::10
boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:102::f
root@oxz_ntp_7081ac04:~# chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^+ fd00:1122:3344:102::f         4   6   377    15  -1482us[-1482us] +/-   25ms
^* fd00:1122:3344:101::10        4   6   377    21  -1442us[-2669us] +/-   25ms

Finally, I clean slate'd london, left it on this branch, and confirmed we could go through RSS. After RSS completed, the internal NTP zones were identical to the post-mupdate bits above (no explicit names, successfully found boundary NTP zones via DNS).


Given that both the R10-to-this-branch mupdate and a fresh RSS install work, I'm gonna go ahead and land this so it makes it into R11.

@jgallagher merged commit f0b8048 into main Sep 30, 2024
18 checks passed
@jgallagher deleted the john/remove-old-ntp-config branch September 30, 2024 15:39
Successfully merging this pull request may close these issues.

Post release 10: Remove zone-specific server lines from internal NTP chrony config