Description
The underlay network provides a communication substrate for the software infrastructure that underpins customer instances and networks. The primary functions of this network are addressing, naming and routing. The purpose of this issue is to identify parts of the underlay network that will need to be directly managed by the control plane.
In this issue I'll refer to any component that communicates over the underlay as an endpoint. Examples of endpoints include Propolis instances, control plane daemons and OPTE. Some of these also have endpoints on overlay networks, e.g. customer network paths running from Propolis guests through OPTE and onto a Geneve-encapsulated overlay network. However, this issue focuses on the underlay.
As endpoints are created, deleted and potentially moved or migrated, the underlay network will need to be actively managed to support those endpoints.
Addressing
Any time a new endpoint comes onto the underlay it will need an address, and someone will need to set that address up. As described in RFD 63, each compute host gets an IPv6 /64 prefix assigned to it. When the system first starts up, the first address in this /64 will be automatically assigned to the host. So if a host is assigned the prefix fd00:1701:d:0101::/64, once the router running on the host peers with the rack-level router, the rack-level router will provide that prefix to the host and the host routing daemon will self-assign fd00:1701:d:0101::1/64 as the primary underlay address for that host. This address provides a basis for communications to get started and allows the control plane to bootstrap.
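As a minimal sketch of the self-assignment rule described above, the following derives the primary underlay address from a host's /64 prefix. The function name `primary_underlay_addr` is hypothetical and is not taken from any existing routing daemon; it only illustrates the "first address in the /64" arithmetic.

```rust
use std::net::Ipv6Addr;

/// Return the first address (`::1`) inside the /64 that contains `prefix`.
/// Any host bits already set in `prefix` are cleared first.
/// Hypothetical helper, for illustration only.
fn primary_underlay_addr(prefix: Ipv6Addr) -> Ipv6Addr {
    // Mask covering the low 64 bits (the interface identifier).
    const HOST_MASK: u128 = (1u128 << 64) - 1;
    let network = u128::from(prefix) & !HOST_MASK;
    Ipv6Addr::from(network | 1)
}
```

For the prefix fd00:1701:d:0101::/64 this yields fd00:1701:d:0101::1, matching the example above.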
When new endpoints are created that need their own underlay address, for example a Propolis instance, the control plane will need to allocate an address for that instance and assign it to the corresponding compute host. The endpoint being created also needs to know the address it's supposed to bind to.
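To make the allocation problem concrete, here is a rough sketch of a per-host allocator that hands out addresses inside one host's /64 and accepts them back when an endpoint is deleted. Everything here is an assumption for discussion: the type `HostAddrAllocator`, its methods, and the choice of a starting interface ID are hypothetical, and which system would own this state is exactly the open question in the tasks below.

```rust
use std::net::Ipv6Addr;

/// Hypothetical per-host allocator: hands out interface IDs within one /64.
struct HostAddrAllocator {
    network: u128,      // the /64 prefix with the low 64 bits cleared
    next: u64,          // next never-used interface ID
    recycled: Vec<u64>, // IDs released by deleted endpoints
}

const HOST_MASK: u128 = (1u128 << 64) - 1;

impl HostAddrAllocator {
    /// `first_id` is an arbitrary starting point chosen by the caller.
    fn new(prefix: Ipv6Addr, first_id: u64) -> Self {
        let network = u128::from(prefix) & !HOST_MASK;
        Self { network, next: first_id, recycled: Vec::new() }
    }

    /// Allocate an address, reusing a released ID when one is available.
    fn allocate(&mut self) -> Option<Ipv6Addr> {
        let id = match self.recycled.pop() {
            Some(id) => id,
            None => {
                let id = self.next;
                // Returns None once the ID space is exhausted.
                self.next = self.next.checked_add(1)?;
                id
            }
        };
        Some(Ipv6Addr::from(self.network | u128::from(id)))
    }

    /// Return an address to the pool. Returns false if the address is not
    /// in this host's /64, was never handed out, or is already released.
    fn release(&mut self, addr: Ipv6Addr) -> bool {
        let bits = u128::from(addr);
        let id = (bits & HOST_MASK) as u64;
        if bits & !HOST_MASK != self.network || id >= self.next || self.recycled.contains(&id) {
            return false;
        }
        self.recycled.push(id);
        true
    }
}
```

The endpoint being created would then be told the returned address so it knows what to bind to.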
Tasks
- Identify all underlay endpoints in an Oxide system.
- Determine how rack-level routers will get the rack-level IPv6 underlay /56 prefix to allocate and distribute /64 prefixes from.
- Determine what system is in charge of allocating endpoint addresses.
- Determine what system is in charge of managing endpoint addresses on compute hosts.
Naming
Addresses are based on a hierarchical representation of availability zone, then rack, then compute node. As such, they are bound to a specific compute host in a rack and do not move. Binding names to addresses allows us to decouple how we refer to an endpoint from where it actually lives. As new endpoints are created and destroyed and possibly moved, the control plane will need to assign the appropriate names to the addresses it creates for those endpoints.
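The hierarchy can be sketched in code as follows. The bit layout is an assumption read off the example address fd00:1701:d:0101::/64 used in this issue (availability zone in the second and third segments, rack in the high byte of the fourth segment, host in the low byte); RFD 63 is the authority on the real layout, and the function names are hypothetical.

```rust
use std::net::Ipv6Addr;

/// Build a host /64 prefix from (availability zone, rack, host).
/// Layout is assumed from the example prefix fd00:1701:d:0101::/64.
fn host_prefix(az: [u16; 2], rack: u8, host: u8) -> Ipv6Addr {
    let rack_host = (u16::from(rack) << 8) | u16::from(host);
    Ipv6Addr::new(0xfd00, az[0], az[1], rack_host, 0, 0, 0, 0)
}

/// Recover (rack, host) from any address inside a host /64.
fn rack_and_host(addr: Ipv6Addr) -> (u8, u8) {
    let seg = addr.segments()[3];
    ((seg >> 8) as u8, (seg & 0xff) as u8)
}
```

Because rack and host are baked into the address, an endpoint that moves to another host necessarily gets a new address, which is why a separate naming layer is needed.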
Of particular note here is when an endpoint moves, say for live migration of a Propolis instance. This will require creating a temporary name for the new instance while it's being set up and then moving the actual name to the new instance when the migration has completed.
Naming could be mechanically carried out in one of two ways. There could be an underlay DNS server, or the control plane could provide addressing information for destination endpoints directly to source endpoints.
Tasks
- Determine what system is in charge of allocating names and associating them to addresses.
- Determine how naming will mechanically be carried out.
- Determine how moving a named endpoint will work before, during and after migration.
Routing
This section is here as a placeholder. At the time of writing, I think underlay routing within a rack will be mostly or completely autonomous and will not need direct control plane management. Going between racks, specifically over an untrusted network, will require control plane management for authenticated peering, but I'm not ready for that conversation yet.
Tasks
TBD
Examples
Live Migration
The following example shows how live migration could work from an underlay network point of view.
Compute host 1 starts out with two VMs, from Alice and Bob. Compute host 2 starts out empty. Each VM has an underlay address for its Propolis instance. On compute host 1, this address is based on the availability zone 1701:d, rack 1 (the first 01 in the 4th segment of the address) and host 1 (the second 01 in the 4th segment of the address). In this example, VM instance addresses arbitrarily start at 100.
When Bob's VM is migrated and the control plane decides on compute host 2 as the new host, a new Propolis instance is stood up with the address fd00:1701:d:0102::100/64. No routing action needs to be taken, as the entire prefix fd00:1701:d:0102::/64 is already advertised into the broader network for compute host 2. A new name, such as bobs-vm-migrating, likely does need to be assigned so this new Propolis instance can be referred to by other underlay endpoints, in particular the source Propolis instance it's migrating from. When the migration is complete, the name bobs-vm can move to fd00:1701:d:0102::100/64, the address fd00:1701:d:0101::101/64 can be recycled and the name bobs-vm-migrating can be removed.
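The naming steps in this example can be sketched as a small in-memory name table. This is only an illustration under stated assumptions: `NameTable` and its methods are hypothetical, the `-migrating` suffix simply follows the bobs-vm-migrating name above, and nothing here decides between an underlay DNS server and control-plane-pushed mappings.

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;

/// Hypothetical name table mapping endpoint names to underlay addresses.
#[derive(Default)]
struct NameTable {
    names: HashMap<String, Ipv6Addr>,
}

impl NameTable {
    fn bind(&mut self, name: &str, addr: Ipv6Addr) {
        self.names.insert(name.to_string(), addr);
    }

    fn resolve(&self, name: &str) -> Option<Ipv6Addr> {
        self.names.get(name).copied()
    }

    /// Bind a temporary name to the migration target and return that name.
    /// The real name keeps pointing at the source during migration.
    fn begin_migration(&mut self, name: &str, target: Ipv6Addr) -> String {
        let temp = format!("{}-migrating", name);
        self.bind(&temp, target);
        temp
    }

    /// Move the real name to the target and drop the temporary name.
    /// Returns the old source address so the caller can recycle it, or
    /// None if no migration was in progress or the name had no prior binding.
    fn finish_migration(&mut self, name: &str) -> Option<Ipv6Addr> {
        let temp = format!("{}-migrating", name);
        let target = self.names.remove(&temp)?;
        self.names.insert(name.to_string(), target)
    }
}
```

In the example above, bobs-vm would resolve to fd00:1701:d:0101::101 until `finish_migration("bobs-vm")` is called, after which it resolves to fd00:1701:d:0102::100 and the returned source address can be recycled.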
