<!--
This work is licensed under a Creative Commons Attribution 3.0
Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode
-->

# DHCP-less

Discuss options and outline a proposal to enable DHCP-less deployments, leveraging existing
Ironic support for this functionality without dependencies on downstream customizations.

## Status

implementable

## Summary

Metal<sup>3</sup> provides an IPAM controller which can be used to enable
deployment with static IPs instead of DHCP; however, it is currently not
possible to use this functionality in a fully DHCP-less environment without
downstream customizations.

This proposal outlines the outstanding issues and potential solutions to
enable an improved DHCP-less solution for Metal<sup>3</sup> users.

## Motivation

Infrastructure management via Metal<sup>3</sup> in DHCP-less environments
is common, but today our upstream features only partially address this use-case.

Since there are several groups in the community who require this functionality,
it makes sense to collaborate and ensure we can use upstream components where
possible and only depend on downstream customizations where absolutely required.

### Goals

Provide a method to support DHCP-less deployments without any downstream
customizations (except perhaps a different IPA ramdisk image?).

Enable e2e integration of the CAPM3 IPAM components such that it's possible
to deploy in a DHCP-less environment using static network configuration
managed via Metal<sup>3</sup> resources.

### Non-Goals

Existing methods used to configure networking via downstream customizations (such
as a custom PreprovisioningImageController) are valid and will still sometimes
be required. This proposal doesn't aim to replace such methods - the approach here may be
complementary for those users wishing to combine CAPM3 IPAM features with
a PreprovisioningImageController.

This proposal will focus on the Metal<sup>3</sup> components only - there are
also OS dependencies and potential related areas of work in Ironic; these will
be mentioned in the Dependencies section but not covered in detail here.

This proposal will only consider the Metal<sup>3</sup> IPAM controller -
there are other options but none are currently integrated via CAPM3.

## Proposal

Implement a new CAPM3 controller to handle setting the BareMetalHost `preprovisioningNetworkDataName`
in an automated way via existing Metal<sup>3</sup> IPAM resources.

### User Stories

#### Static network configuration (no IPAM)

As a user I want to manage my network configuration statically as part of my
BareMetalHost inventory.

In this case the network configuration is provided via a Secret which is
either manually created or templated outside the scope of Metal<sup>3</sup>.

The BareMetalHost API already supports two interfaces for passing network configuration:

* `networkData` - this data is passed by Ironic to the deployed OS via a
  configuration drive partition. It is then typically read on first boot by
  a tool such as `cloud-init` which supports the OpenStack network data format.
* `preprovisioningNetworkDataName` - this data is designed to allow passing data
  during the preprovisioning phase, e.g. to configure networking for the IPA deploy
  ramdisk.

The `preprovisioningNetworkDataName` API was initially added to enable [image
building workflows](https://github.com/metal3-io/baremetal-operator/blob/main/docs/api.md#preprovisioningimage), and a [recent BMO change](https://github.com/metal3-io/baremetal-operator/pull/1380) landed to enable this flow without any custom PreprovisioningImage controller.
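
As an illustration of the static flow, a minimal sketch might look like the
following (the host and Secret names, the MAC/IP values, and the `networkData`
Secret key are assumptions for this example):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: host-0-preprov-netdata
type: Opaque
stringData:
  # OpenStack network data format, consumed on first boot of the IPA ramdisk
  networkData: |
    {
      "links": [
        {"id": "eth0", "type": "phy", "ethernet_mac_address": "00:11:22:33:44:55"}
      ],
      "networks": [
        {"id": "eth0", "type": "ipv4", "link": "eth0",
         "ip_address": "192.168.0.10", "netmask": "255.255.255.0"}
      ],
      "services": []
    }

---

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0
spec:
  online: true
  bootMACAddress: 00:11:22:33:44:55
  bmc:
    address: redfish://192.168.111.1/redfish/v1/Systems/1
    credentialsName: host-0-bmc-secret
  # Network configuration for the pre-provisioning (IPA) phase
  preprovisioningNetworkDataName: host-0-preprov-netdata
  # Network configuration for the deployed OS (Secret not shown here)
  networkData:
    name: host-0-netdata
```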

#### IPAM configuration

As a user I wish to make use of the Metal<sup>3</sup> IPAM solution in a
DHCP-less environment.

Metal<sup>3</sup> provides an [IPAM controller](https://github.com/metal3-io/ip-address-manager)
which can be used to allocate IPs used as part of the Metal3Machine lifecycle.
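
For reference, a more complete IPPool than the minimal examples below might
look like this (a sketch; the address range, prefix and gateway are placeholder
values):

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster
  namePrefix: pool-1
  # Range of addresses the controller may allocate from
  pools:
  - start: 192.168.0.10
    end: 192.168.0.50
  prefix: 24
  gateway: 192.168.0.1
```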

Some gaps exist which prevent realizing this flow in a fully DHCP-less environment,
so the main focus of this proposal will be how to solve this use-case.

##### IPAM Scenario 1 - common IPPool

An environment where a common configuration is desired for the pre-provisioning
phase and the provisioned BareMetalHost (e.g. a scenario where hosts are permanently
assigned to specific clusters).

##### IPAM Scenario 2 - decoupled preprovisioning/provisioning IPPool

An environment where a decoupled configuration is desired for the pre-provisioning
phase and the provisioned BareMetalHost (e.g. a BMaaS scenario where the end-user network
configuration differs from that used during the commissioning phase for inspection/cleaning).

## Design Details

`Metal3MachineTemplate` and `Metal3DataTemplate` are used to apply networkData to specific BareMetalHost resources,
but they are by design coupled to the CAPI Machine lifecycle.

This is a problem for the pre-provisioning use-case: at this point we're preparing the BareMetalHost for
use, so there is not yet any Machine.

To resolve this, below we outline a proposal to add two new resources with similar behavior for the pre-provisioning
phase: `Metal3PreprovisioningTemplate` and `Metal3PreprovisioningDataTemplate`.

### API overview

The current flow in the provisioning phase is as follows (only the most relevant fields are included for clarity):

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3DataTemplate
metadata:
  name: data-template
spec:
  clusterName: cluster
  networkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: pool-1

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: machine-template
spec:
  template:
    spec:
      dataTemplate:
        name: data-template
      hostSelector:
        matchLabels:
          cluster-role: control-plane

---

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: machine-deployment
spec:
  clusterName: cluster
  replicas: 1
  template:
    spec:
      clusterName: cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: Metal3MachineTemplate
        name: machine-template
```

In this flow, when a Metal3Machine is provisioned via the `MachineDeployment`, BareMetalHost resources labeled
`cluster-role: control-plane` will have `networkData` defined with an IP derived from the `pool-1` `IPPool`.

In CAPM3 an IPClaim is created to reserve an IP from the IPPool for each Machine, and an IPAddress resource
contains the data used for templating the `networkData`.
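
For example, the intermediate resources look roughly like this (a sketch; the
generated names are illustrative):

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPClaim
metadata:
  name: machine-0-pool-1
spec:
  # Reserves one address from the referenced IPPool
  pool:
    name: pool-1

---

apiVersion: ipam.metal3.io/v1alpha1
kind: IPAddress
metadata:
  name: pool-1-192-168-0-10
spec:
  pool:
    name: pool-1
  claim:
    name: machine-0-pool-1
  # Values consumed when templating the networkData Secret
  address: 192.168.0.10
  prefix: 24
  gateway: 192.168.0.1
```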

#### Preprovisioning - Common IPPool

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningDataTemplate
metadata:
  name: preprov-data-template
spec:
  preprovisioningNetworkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: pool-1

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningTemplate
metadata:
  name: preprov-template
spec:
  template:
    spec:
      dataTemplate:
        name: preprov-data-template
      hostSelector:
        matchLabels:
          pre-provisioning: foo
```

In this flow there is no `MachineDeployment`; BareMetalHost resources labeled to match the
`preprov-template` `hostSelector` will have `preprovisioningNetworkDataName` assigned using the same
process outlined above for `networkData`.
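
The expected outcome is that the controller reserves an IP via an IPClaim,
renders a network data Secret, and sets `preprovisioningNetworkDataName` on each
matching host, roughly as follows (the generated Secret name is an assumption):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0
  labels:
    pre-provisioning: foo
spec:
  # Set automatically by the new controller, referencing a Secret templated
  # from preprov-data-template using the IPAddress allocated from pool-1
  preprovisioningNetworkDataName: host-0-preprov-data-template
```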

There are a few things to consider:

To avoid the risk of multiple Metal3PreprovisioningTemplate resources matching the same BareMetalHost (which would be ambiguous),
a BMH must match *exactly one* Metal3PreprovisioningTemplate for the controller to take action; if more than one matches, the
host will be reflected as ignored via the Metal3PreprovisioningTemplate status.
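
For example, the status could report something like the following (a
hypothetical shape, to be finalized during implementation):

```yaml
status:
  # Hosts matched exclusively by this template, for which
  # preprovisioningNetworkDataName has been set
  matchedHosts:
  - host-0
  # Hosts also matched by another Metal3PreprovisioningTemplate,
  # ignored until the ambiguity is resolved
  ignoredHosts:
  - host-1
```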

The Secret referenced by `preprovisioningNetworkDataName` is used by default for `networkData` in the baremetal-operator,
so in this configuration it's not strictly necessary to specify networkData via a Metal3DataTemplate. However, since we'll
want to delete the IPClaim after preprovisioning in the decoupled flow below, it seems likely we'll want to behave consistently
and rely on the IP Reuse functionality whenever a consistent IP is required between the pre-provisioning and provisioning phases.

#### Preprovisioning - Decoupled IPPool

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster

---

apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: preprovisioning-pool

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningDataTemplate
metadata:
  name: preprov-data-template
spec:
  preprovisioningNetworkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: preprovisioning-pool

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningTemplate
metadata:
  name: preprov-template
spec:
  template:
    spec:
      dataTemplate:
        name: preprov-data-template
      hostSelector:
        matchLabels:
          pre-provisioning: foo

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3DataTemplate
metadata:
  name: data-template
spec:
  clusterName: cluster
  networkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: pool-1

---

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: machine-template
spec:
  template:
    spec:
      dataTemplate:
        name: data-template
      hostSelector:
        matchLabels:
          cluster-role: control-plane
```

In this flow we have `preprovisioning-pool`, which is not associated with any cluster; it is used to provide an IPAddress during
the pre-provisioning phase as described above. To reduce the required size of the pool, the IPClaim will be deleted after the
preprovisioning phase is completed, e.g. when the BMH resource becomes available.

In the provisioning phase another pool, associated with a cluster, is used to template `networkData` as in the existing process.

#### Assumptions and Open Questions

TODO

### Inspection on initial registration

On initial registration of a host, inspection is triggered immediately, but in a DHCP-less environment this process cannot complete without preprovisioning network configuration (because the IPA ramdisk cannot connect back to the Ironic API).

This can be resolved by creating the BareMetalHost resources with the existing [paused annotation](https://github.com/metal3-io/baremetal-operator/blob/main/docs/api.md#pausing-reconciliation) set to a pre-determined value (e.g. `metal3.io/preprovisioning`), which the new controller can remove after `preprovisioningNetworkDataName` has been set, allowing inspection to succeed.
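
For example (the annotation key is the existing BMO paused annotation; the
value shown is the proposed sentinel):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0
  annotations:
    # Created paused so inspection doesn't start before the network data is
    # in place; the new controller removes this annotation once it has set
    # preprovisioningNetworkDataName
    baremetalhost.metal3.io/paused: metal3.io/preprovisioning
```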

### Implementation Details/Notes/Constraints

#### IP Reuse

A related issue has previously been addressed via the [IP Reuse](https://github.com/metal3-io/cluster-api-provider-metal3/blob/main/docs/ip_reuse.md) functionality - this means we can couple IPClaims to the BareMetalHost resources, which will enable consistent IP allocations for pre-provisioning and subsequent provisioning operations (provided the same IPPool is used for both steps).

### Risks and Mitigations

- TODO

### Work Items

TODO

### Dependencies

#### Firstboot agent support

An agent in the IPA ramdisk image is required to consume the network data provided via the processes outlined above.

The Ironic DHCP-less documentation describes using glean (a minimal Python-based cloud-init alternative), but we don't
currently have any community-supported IPA ramdisk image containing this tool.

#### Potential config-drive conflict on redeployment

TODO

### Test Plan

TODO

### Upgrade / Downgrade Strategy

TODO

### Version Skew Strategy

N/A

## Drawbacks

TODO

## Alternatives


### Kanod

One possibility is to manage the lifecycle of `preprovisioningNetworkDataName` outside of
the Metal<sup>3</sup> core components - such an approach has been successfully demonstrated
in the [Kanod community](https://gitlab.com/Orange-OpenSource/kanod/) which is related to
the [Sylva](https://sylvaproject.org) project.

The design proposal here has been directly inspired by that work, but I think integrating
this functionality into CAPM3 has the following advantages:

* We can close a functional gap which potentially impacts many Metal<sup>3</sup> users, not only those involved with Kanod/Sylva
* Directly integrating into CAPM3 means we can use a common approach for `networkData` and `preprovisioningNetworkData`

## References

TODO
