Merged
62 commits
90e5d7b
refactor: fleetconfig-controller as an addon
arturshadnik Sep 25, 2025
ef5943b
chore: tidy todos, improve cleanup
arturshadnik Sep 25, 2025
c2a19c2
feat: add new pivot condition; update tests
arturshadnik Sep 25, 2025
da104f7
test: expand e2e coverage to validate full spoke cleanup
arturshadnik Sep 25, 2025
282d2f6
chore: rabbit
arturshadnik Sep 25, 2025
d863fed
chore: rabbit nits and DRY
arturshadnik Sep 26, 2025
aaedc87
feat: improve addon validation, add tests
arturshadnik Sep 26, 2025
0b0b000
chore: make reviewable
arturshadnik Sep 26, 2025
2ba3cbf
chore: logs
arturshadnik Sep 26, 2025
674fb05
fix: add fallback cleanup if addon agent has not come up
arturshadnik Sep 26, 2025
e511abd
fix: relax fallback condition
arturshadnik Sep 26, 2025
93e3be6
fix: ensure appliedManifestWork cleaned up
arturshadnik Sep 26, 2025
191ea9c
docs: add diagram, walkthru
arturshadnik Sep 26, 2025
75a80cf
chore: words
arturshadnik Sep 26, 2025
1d89831
docs: words
arturshadnik Sep 27, 2025
7f07b1b
docs: update diagrams
arturshadnik Sep 29, 2025
4008691
docs: clarify wording, actions
arturshadnik Sep 29, 2025
488f421
feat: add upgrade conditions
arturshadnik Sep 29, 2025
ea4da0c
docs: break out work/reg controllers into separate actors
arturshadnik Sep 29, 2025
3dd932e
fix: clustermanager nil check before upgrade
arturshadnik Sep 29, 2025
157d0cb
test: add new conditions to tests
arturshadnik Sep 29, 2025
7028df8
test: add upgrades to test; remove kconf secret during test
arturshadnik Sep 30, 2025
67a83c4
feat: add a 3rd instance type to enable fallback non-addon mode for EKS
arturshadnik Oct 1, 2025
c29cc44
fix: update all spoke manager references to be agent
arturshadnik Oct 1, 2025
2d6f6fb
fix: make reviewable
arturshadnik Oct 1, 2025
a8ea9f8
fix: only set finalizers once; validate addons
arturshadnik Oct 1, 2025
f7bc644
chore: always use base image for agent
arturshadnik Oct 2, 2025
ec08d1e
docs: update dev guide
arturshadnik Oct 2, 2025
bbdfc36
chore: guard against unset env vars
arturshadnik Oct 2, 2025
f2a234c
docs: typo
arturshadnik Oct 2, 2025
a5560da
chore: bump image version
arturshadnik Oct 2, 2025
a1480b7
docs: typo
arturshadnik Oct 2, 2025
97b6607
feat: manager controls namespace lifecycle
arturshadnik Oct 2, 2025
13ced7a
test: update test values
arturshadnik Oct 2, 2025
af51379
feat: conditional secret purge; update chart
arturshadnik Oct 2, 2025
0b7fc48
test: update test values
arturshadnik Oct 2, 2025
d58c64b
chore: logs
arturshadnik Oct 3, 2025
a15a364
chore: logs
arturshadnik Oct 3, 2025
4eca542
chore: default values
arturshadnik Oct 3, 2025
0691cdc
fix: conditional klusterlet purge
arturshadnik Oct 3, 2025
e30ce49
fix: nil check cleanupConfig
arturshadnik Oct 3, 2025
62ccf9a
chore: rabbit
arturshadnik Oct 3, 2025
19fb141
chore: make reviewable
arturshadnik Oct 3, 2025
ac12489
fix: explicit fcc agent exclusion when in unified mode
arturshadnik Oct 3, 2025
6c83eb6
chore: some review comments
arturshadnik Oct 3, 2025
df3ae69
chore: narrow scope of ns perms
arturshadnik Oct 3, 2025
8145181
chore: make addon mw checks more robust
arturshadnik Oct 3, 2025
a09c84d
fix: use update
arturshadnik Oct 3, 2025
c91baa7
fix: add nil checks before clustermanager access
arturshadnik Oct 3, 2025
90ca883
chore: make reviewable
arturshadnik Oct 3, 2025
d9a9dd3
chore: review changes
arturshadnik Oct 3, 2025
e1e2fa5
chore: docstring
arturshadnik Oct 3, 2025
65fac35
feat: use standalone watcher for agent cleanup
arturshadnik Oct 3, 2025
edcdbcd
feat: validate watch config and set timeouts on api calls
arturshadnik Oct 3, 2025
0a5c147
chore: recover from watch panics
arturshadnik Oct 3, 2025
d28fd65
chore: make reviewable
arturshadnik Oct 3, 2025
891565d
feat: redact sensitive data in logs
arturshadnik Oct 3, 2025
c0a2e14
feat: sanitize output
arturshadnik Oct 3, 2025
9714339
chore: revert to naive output redaction
arturshadnik Oct 3, 2025
92d2a33
chore: fail fast on unexpected MW
arturshadnik Oct 3, 2025
9d3898e
fix: handle hub-side cleanup properly
arturshadnik Oct 3, 2025
352b2b3
chore: make reviewable
arturshadnik Oct 3, 2025
87 changes: 75 additions & 12 deletions fleetconfig-controller/README.md
@@ -2,10 +2,16 @@

## 🌱 Project Overview

The `fleetconfig-controller` introduces a new `FleetConfig` custom resource to the OCM ecosystem. It reconciles `FleetConfig` resources to declaratively manage the lifecycle of Open Cluster Management (OCM) multi-clusters. The `fleetconfig-controller` will initialize an OCM hub and one or more spoke clusters; add, remove, and upgrade clustermanagers and klusterlets when their bundle versions change, manage their feature gates, and uninstall all OCM components properly whenever a `FleetConfig` is deleted.
The `fleetconfig-controller` introduces 2 new custom resources to the OCM ecosystem: `Hub` and `Spoke`. It reconciles `Hub` and `Spoke` resources to declaratively manage the lifecycle of Open Cluster Management (OCM) multi-clusters. The `fleetconfig-controller` will initialize an OCM hub and one or more spoke clusters; add, remove, and upgrade clustermanagers and klusterlets when their bundle versions change; manage their feature gates; and uninstall all OCM components properly whenever a `Hub` or `Spoke` is deleted.

The controller is a lightweight wrapper around [clusteradm](https://github.com/open-cluster-management-io/clusteradm). Anything you can accomplish imperatively via a series of `clusteradm` commands can now be accomplished declaratively using the `fleetconfig-controller`.

`fleetconfig-controller` supports 2 modes of operation:
- `addonMode: true` (recommended): After the initial join, a `fleetconfig-controller-agent` will be installed on the spoke cluster as an OCM addon. Once installed, the agent will manage all day 2 operations for the spoke cluster asynchronously. For more information about addon mode, see [2-phase-spoke-reconcile.md](./docs/2-phase-spoke-reconcile.md).
- `addonMode: false`: All management of all spokes is done from the hub cluster. No agent is installed on the spoke cluster. Currently, this is the only mode supported for EKS.

For the deprecated `v1alpha1` `FleetConfig` API, addon mode is not supported.

## 🔧 Installation

The controller is installed via Helm.
@@ -16,18 +22,18 @@ helm repo update ocm
helm install fleetconfig-controller ocm/fleetconfig-controller -n fleetconfig-system --create-namespace
```

By default the Helm chart will also produce a `FleetConfig` to orchestrate, however that behaviour can be disabled. Refer to the chart [README](./charts/fleetconfig-controller/README.md) for full documentation.
By default the Helm chart will also produce a `Hub` and 1 `Spoke` (`hub-as-spoke`) to orchestrate, however that behaviour can be disabled. Refer to the chart [README](./charts/fleetconfig-controller/README.md) for full documentation.

## 🏗️ Support Matrix

Support for orchestration of OCM multi-clusters varies based on the Kubernetes distribution and/or cloud provider.

| Kubernetes Distribution | Support Level |
|-------------------------|--------------------|
| Vanilla Kubernetes | ✅ Fully Supported |
| Amazon EKS | ✅ Fully Supported |
| Google GKE | ✅ Fully Supported |
| Azure AKS | 🚧 On Roadmap |
| Kubernetes Distribution | Support Level |
|-------------------------|---------------------------------------|
| Vanilla Kubernetes | ✅ Fully Supported |
| Amazon EKS | ✅ Fully Supported (addonMode: false) |
| Google GKE | ✅ Fully Supported |
| Azure AKS | 🚧 On Roadmap |

## 🏃🏼‍♂️ Quick Start

@@ -40,10 +46,10 @@ Support for orchestration of OCM multi-clusters varies based on the Kubernetes d

### Onboarding

To familiarize yourself with the `FleetConfig` API and the `fleetconfig-controller`, we recommend doing one or more of the following onboarding steps.
To familiarize yourself with the `Hub` and `Spoke` APIs and the `fleetconfig-controller`, we recommend doing one or more of the following onboarding steps.

1. Step through a [smoke test](./docs/smoketests.md)
1. Invoke the [end-to-end tests](./test/e2e/fleetconfig.go) and inspect the content of the kind clusters that the E2E suite automatically creates
1. Invoke the [end-to-end tests](./test/e2e/v1beta1_hub_spoke.go) and inspect the content of the kind clusters that the E2E suite automatically creates

```bash
SKIP_CLEANUP=true make test-e2e
@@ -53,6 +59,7 @@ To familiarize yourself with the `FleetConfig` API and the `fleetconfig-controll

The `fleetconfig-controller` repository is pre-wired for development using [DevSpace](https://www.devspace.sh/docs/getting-started/introduction).

### Single cluster (Hub and `hub-as-spoke` Spoke development)
```bash
# Create a dev kind cluster
kind create cluster \
@@ -64,18 +71,58 @@ export KUBECONFIG=~/Downloads/fleetconfig-dev.kubeconfig
# Initialize a devspace development container
devspace run-pipeline dev -n fleetconfig-system
```
See [Debugging](#debugging) for instructions on how to start the fleetconfig controller manager in debug mode.

### Two clusters (Hub and Spoke development)
```bash
# Create two dev kind clusters
kind create cluster \
--name fleetconfig-dev-hub \
--kubeconfig ~/Downloads/fleetconfig-dev-hub.kubeconfig
export KUBECONFIG=~/Downloads/fleetconfig-dev-hub.kubeconfig

kind create cluster \
--name fleetconfig-dev-spoke \
--kubeconfig ~/Downloads/fleetconfig-dev-spoke.kubeconfig

# Get the spoke kind cluster's internal kubeconfig
kind get kubeconfig --name fleetconfig-dev-spoke --internal > ~/Downloads/fleetconfig-dev-spoke-internal.kubeconfig

# Initialize a devspace development container. This will bootstrap in hub-as-spoke mode.
devspace run-pipeline dev --namespace fleetconfig-system --force-build
```
See [Debugging](#debugging) for instructions on how to start the fleetconfig controller manager in debug mode.

In a new terminal session, execute the following commands to create a Spoke resource and start the fleetconfig controller agent on the spoke cluster.

```bash
# Create a secret containing the spoke cluster kubeconfig
export KUBECONFIG=~/Downloads/fleetconfig-dev-hub.kubeconfig
kubectl --namespace fleetconfig-system create secret generic spoke-kubeconfig \
--from-file=value=<absolute/path/to/fleetconfig-dev-spoke-internal.kubeconfig>

# Create a minimal Spoke resource
kubectl apply -f hack/dev/spoke.yaml

# Once fleetconfig-controller-agent is created on the spoke cluster, start the debug session
export KUBECONFIG=~/Downloads/fleetconfig-dev-spoke.kubeconfig
devspace run-pipeline debug-spoke --namespace fleetconfig-system --force-build --profile v1alpha1
```
The `--profile v1alpha1` flag disables installing the default Hub and Spoke resources.

See [Debugging](#debugging) for instructions on how to start the fleetconfig controller agent in debug mode.

### Debugging

- Hit up arrow, then enter from within the dev container to start a headless delve session
- Use the following launch config to connect VSCode with the delve session running in the dev container:
- Use one of the following launch configs to connect VSCode with the delve session running in the dev container:

```json
{
"version": "0.2.0",
"configurations": [
{
"name": "DevSpace",
"name": "DevSpace - Hub",
"type": "go",
"request": "attach",
"mode": "remote",
@@ -89,6 +136,22 @@ devspace run-pipeline dev -n fleetconfig-system
],
"showLog": true,
// "trace": "verbose", // useful for debugging delve (breakpoints not working, etc.)
},
{
"name": "DevSpace - Spoke",
"type": "go",
"request": "attach",
"mode": "remote",
"port": 2345,
"host": "127.0.0.1",
"substitutePath": [
{
"from": "${workspaceFolder}/fleetconfig-controller",
"to": "/workspace",
}
],
"showLog": true,
// "trace": "verbose", // useful for debugging delve (breakpoints not working, etc.)
}
]
}
108 changes: 105 additions & 3 deletions fleetconfig-controller/api/v1beta1/constants.go
@@ -3,10 +3,13 @@ package v1beta1
import "k8s.io/apimachinery/pkg/labels"

const (
// HubCleanupFinalizer is the finalizer for Hub cleanup.
// HubCleanupPreflightFinalizer is the finalizer for cleanup preflight checks by the hub cluster's controller instance. Used to signal to the spoke's controller that unjoin can proceed.
HubCleanupPreflightFinalizer = "fleetconfig.open-cluster-management.io/hub-cleanup-preflight"

// HubCleanupFinalizer is the finalizer for cleanup by the hub cluster's controller instance.
HubCleanupFinalizer = "fleetconfig.open-cluster-management.io/hub-cleanup"

// SpokeCleanupFinalizer is the finalizer for Spoke cleanup.
// SpokeCleanupFinalizer is the finalizer for cleanup by the spoke cluster's controller instance.
SpokeCleanupFinalizer = "fleetconfig.open-cluster-management.io/spoke-cleanup"
)
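
The doc comment on `HubCleanupPreflightFinalizer` says it signals to the spoke's controller that unjoin can proceed, but this hunk does not show how. The sketch below is one possible reading, not the PR's implementation: it assumes the hub-side controller removes the preflight finalizer once its checks pass, and that the spoke-side agent treats its absence as the go-ahead. `unjoinMayProceed` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"slices"
)

// Finalizer names copied from api/v1beta1/constants.go.
const (
	HubCleanupPreflightFinalizer = "fleetconfig.open-cluster-management.io/hub-cleanup-preflight"
	HubCleanupFinalizer          = "fleetconfig.open-cluster-management.io/hub-cleanup"
	SpokeCleanupFinalizer        = "fleetconfig.open-cluster-management.io/spoke-cleanup"
)

// unjoinMayProceed is a hypothetical helper. ASSUMPTION: the hub signals
// "preflight done" by removing HubCleanupPreflightFinalizer from the Spoke,
// so the agent may unjoin once that finalizer is no longer present.
func unjoinMayProceed(finalizers []string) bool {
	return !slices.Contains(finalizers, HubCleanupPreflightFinalizer)
}

func main() {
	duringPreflight := []string{HubCleanupPreflightFinalizer, HubCleanupFinalizer, SpokeCleanupFinalizer}
	afterPreflight := []string{HubCleanupFinalizer, SpokeCleanupFinalizer}

	fmt.Println("during preflight:", unjoinMayProceed(duringPreflight))
	fmt.Println("after preflight:", unjoinMayProceed(afterPreflight))
}
```

Splitting the hub's work across two finalizers lets the hub keep `HubCleanupFinalizer` in place for its own teardown while still releasing the spoke early; whether the real controllers order things this way should be checked against the reconciler code.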

@@ -21,8 +24,17 @@ const (
// CleanupFailed means that a failure occurred during cleanup.
CleanupFailed = "CleanupFailed"

// SpokeJoined means that the spoke has successfully joined the Hub.
// SpokeJoined means that the Spoke has successfully joined the Hub.
SpokeJoined = "SpokeJoined"

// PivotComplete means that the spoke cluster has successfully started managing itself.
PivotComplete = "PivotComplete"

// KlusterletSynced means that Klusterlet's OCM bundle version and values are up to date.
KlusterletSynced = "KlusterletSynced"

// HubUpgradeFailed means that the ClusterManager version upgrade failed.
HubUpgradeFailed = "HubUpgradeFailed"
)

// Hub and Spoke condition reasons
@@ -65,6 +77,74 @@ const (
ManagedClusterTypeHubAsSpoke = "hub-as-spoke"
)

// Addon mode
const (
// InstanceTypeManager indicates that the controller is running in a Hub cluster and only handles day 1 Spoke operations.
InstanceTypeManager = "manager"

// InstanceTypeAgent indicates that the controller is running in a Spoke cluster and only handles day 2 Spoke operations.
InstanceTypeAgent = "agent"

// InstanceTypeUnified indicates that the controller is running in a Hub cluster and handles the entire lifecycle of Spoke resources.
InstanceTypeUnified = "unified"

// HubKubeconfigEnvVar is the environment variable containing the path to the mounted Hub kubeconfig.
HubKubeconfigEnvVar = "HUB_KUBECONFIG"

// DefaultHubKubeconfigPath is the path of the mounted kubeconfig when the controller is running in a Spoke cluster. Used if the environment variable is not set.
DefaultHubKubeconfigPath = "/managed/hub-kubeconfig/kubeconfig"

// SpokeNameEnvVar is the environment variable containing the name of the Spoke resource.
SpokeNameEnvVar = "CLUSTER_NAME"

// SpokeNamespaceEnvVar is the environment variable containing the namespace of the Spoke resource.
SpokeNamespaceEnvVar = "CLUSTER_NAMESPACE"

// HubNamespaceEnvVar is the environment variable containing the namespace of the Hub resource.
HubNamespaceEnvVar = "HUB_NAMESPACE"

// ControllerNamespaceEnvVar is the environment variable containing the namespace that the controller is deployed to.
ControllerNamespaceEnvVar = "CONTROLLER_NAMESPACE"

// ClusterRoleNameEnvVar is the environment variable containing the name of the ClusterRole for fleetconfig-controller-manager.
ClusterRoleNameEnvVar = "CLUSTER_ROLE_NAME"

// PurgeAgentNamespaceEnvVar is the environment variable used to signal to the agent whether or not it should garbage collect its install namespace.
PurgeAgentNamespaceEnvVar = "PURGE_AGENT_NAMESPACE"

// FCCAddOnName is the name of the fleetconfig-controller addon.
FCCAddOnName = "fleetconfig-controller-agent"

// DefaultFCCManagerRole is the default name of the fleetconfig-controller-manager ClusterRole.
DefaultFCCManagerRole = "fleetconfig-controller-manager-role"

// NamespaceOCM is the open-cluster-management namespace.
NamespaceOCM = "open-cluster-management"

// NamespaceOCMAgent is the namespace for the open-cluster-management agent.
NamespaceOCMAgent = "open-cluster-management-agent"

// NamespaceOCMAgentAddOn is the namespace for open-cluster-management agent addons.
NamespaceOCMAgentAddOn = "open-cluster-management-agent-addon"

// AgentCleanupWatcherName is the name of the watcher for cleaning up the spoke agent.
AgentCleanupWatcherName = "agent-cleanup-watcher"
)

// SupportedInstanceTypes are the valid cluster types that the controller can be installed in.
var SupportedInstanceTypes = []string{
InstanceTypeManager,
InstanceTypeAgent,
InstanceTypeUnified,
}
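
A slice like `SupportedInstanceTypes` is typically used to reject a bad instance type at startup. The sketch below shows one way to do that with the standard library. `validateInstanceType` is a hypothetical function name and the error wording is illustrative; the PR's own validation may differ.

```go
package main

import (
	"fmt"
	"slices"
)

// Values copied from api/v1beta1/constants.go.
const (
	InstanceTypeManager = "manager"
	InstanceTypeAgent   = "agent"
	InstanceTypeUnified = "unified"
)

var SupportedInstanceTypes = []string{
	InstanceTypeManager,
	InstanceTypeAgent,
	InstanceTypeUnified,
}

// validateInstanceType is a hypothetical helper that returns an error when t
// is not one of the supported instance types.
func validateInstanceType(t string) error {
	if !slices.Contains(SupportedInstanceTypes, t) {
		return fmt.Errorf("unsupported instance type %q, must be one of %v", t, SupportedInstanceTypes)
	}
	return nil
}

func main() {
	for _, t := range []string{"manager", "agent", "unified", "spoke"} {
		fmt.Printf("%s: %v\n", t, validateInstanceType(t))
	}
}
```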

// OCMSpokeNamespaces are the namespaces created on an OCM managed cluster.
var OCMSpokeNamespaces = []string{
NamespaceOCM,
NamespaceOCMAgent,
NamespaceOCMAgentAddOn,
}

// FleetConfig labels
const (
// LabelManagedClusterType is the label key for the managed cluster type.
@@ -112,3 +192,25 @@ var (
// ManagedBySelector is a label selector for filtering add-on resources managed by fleetconfig-controller.
ManagedBySelector = labels.SelectorFromSet(labels.Set(ManagedByLabels))
)

const (
// AddonArgoCD is the name of the built-in ArgoCD hub addon.
AddonArgoCD = "argocd"

// AddonGPF is the name of the built-in Governance Policy Framework hub addon.
AddonGPF = "governance-policy-framework"
)

// SupportedHubAddons are the built-in hub addons which clusteradm and fleetconfig-controller support.
var SupportedHubAddons = []string{
AddonArgoCD,
AddonGPF,
}

const (
// BundleVersionLatest is the latest OCM source version
BundleVersionLatest = "latest"

// BundleVersionDefault is the default OCM source version
BundleVersionDefault = "default"
)
45 changes: 39 additions & 6 deletions fleetconfig-controller/api/v1beta1/spoke_types.go
@@ -80,6 +80,30 @@ type SpokeSpec struct {
// +kubebuilder:default:=0
// +optional
LogVerbosity int `json:"logVerbosity,omitempty"`

// CleanupConfig is used to configure which resources should be automatically garbage collected during cleanup.
// +kubebuilder:default:={}
// +required
CleanupConfig CleanupConfig `json:"cleanupConfig"`
}

// CleanupConfig is the configuration for cleaning up resources during Spoke cleanup.
type CleanupConfig struct {
// If true, the agent will attempt to garbage collect its own namespace after the spoke cluster is unjoined.
// +kubebuilder:default:=false
// +optional
PurgeAgentNamespace bool `json:"purgeAgentNamespace,omitempty"`

// If set, the klusterlet operator will be purged and all open-cluster-management namespaces deleted
// when the klusterlet is unjoined from its Hub cluster.
// +kubebuilder:default:=true
// +optional
PurgeKlusterletOperator bool `json:"purgeKlusterletOperator,omitempty"`

// If set, the kubeconfig secret will be automatically deleted after the agent has taken over managing the Spoke.
// +kubebuilder:default:=false
// +optional
PurgeKubeconfigSecret bool `json:"purgeKubeconfigSecret,omitempty"`
}
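
All three `CleanupConfig` fields are plain `bool`s tagged `omitempty`, so a `false` value is dropped when a Go client marshals the struct. The sketch below only demonstrates that serialization behaviour on a local copy of the type, with the kubebuilder markers omitted. Whether the API server then re-applies `+kubebuilder:default:=true` to an omitted `purgeKlusterletOperator` depends on CRD defaulting and is not shown by this diff, so treat it as something to verify.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CleanupConfig is a local copy of the type added in spoke_types.go,
// without the kubebuilder markers.
type CleanupConfig struct {
	PurgeAgentNamespace     bool `json:"purgeAgentNamespace,omitempty"`
	PurgeKlusterletOperator bool `json:"purgeKlusterletOperator,omitempty"`
	PurgeKubeconfigSecret   bool `json:"purgeKubeconfigSecret,omitempty"`
}

// marshal returns the JSON encoding of c as a string.
func marshal(c CleanupConfig) string {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Every field is false, so every field is omitted.
	fmt.Println(marshal(CleanupConfig{}))
	// Only the fields set to true are emitted.
	fmt.Println(marshal(CleanupConfig{PurgeKlusterletOperator: true, PurgeKubeconfigSecret: true}))
}
```

If an explicit `false` has to survive a round trip, a common alternative is a `*bool` field; that is a general Go and Kubernetes API convention, not a change this PR makes.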

// HubRef is the information required to get a Hub resource.
@@ -98,6 +122,21 @@ func (s *Spoke) IsManagedBy(om metav1.ObjectMeta) bool {
return s.Spec.HubRef.Name == om.Name && s.Spec.HubRef.Namespace == om.Namespace
}

// IsHubAsSpoke returns true if the cluster is a hub-as-spoke, determined either by the name `hub-as-spoke` or by an InCluster kubeconfig.
func (s *Spoke) IsHubAsSpoke() bool {
return s.Name == ManagedClusterTypeHubAsSpoke || s.Spec.Kubeconfig.InCluster
}

// PivotComplete returns true if the spoke's agent has successfully started managing day 2 operations.
func (s *Spoke) PivotComplete() bool {
jc := s.GetCondition(SpokeJoined)
if jc == nil || jc.Status != metav1.ConditionTrue {
return false
}
pc := s.GetCondition(PivotComplete)
return pc != nil && pc.Status == metav1.ConditionTrue
}
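
`PivotComplete` requires two conditions to be `True`: `SpokeJoined` first, then `PivotComplete`. The sketch below reproduces that gating with stand-in types so it runs without the Kubernetes modules. `Condition`, `ConditionStatus`, `getCondition`, and `pivotComplete` are local stand-ins for `metav1.Condition` and the `Spoke` methods, not the real API.

```go
package main

import "fmt"

// ConditionStatus and Condition are simplified stand-ins for the metav1 types.
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

type Condition struct {
	Type   string
	Status ConditionStatus
}

// Condition types copied from api/v1beta1/constants.go.
const (
	SpokeJoined   = "SpokeJoined"
	PivotComplete = "PivotComplete"
)

// getCondition returns a pointer to the condition of type t, or nil if absent.
func getCondition(conds []Condition, t string) *Condition {
	for i := range conds {
		if conds[i].Type == t {
			return &conds[i]
		}
	}
	return nil
}

// pivotComplete mirrors the method in the diff: the spoke must have joined,
// and the PivotComplete condition must be present and True.
func pivotComplete(conds []Condition) bool {
	jc := getCondition(conds, SpokeJoined)
	if jc == nil || jc.Status != ConditionTrue {
		return false
	}
	pc := getCondition(conds, PivotComplete)
	return pc != nil && pc.Status == ConditionTrue
}

func main() {
	joinedOnly := []Condition{{Type: SpokeJoined, Status: ConditionTrue}}
	pivoted := []Condition{
		{Type: SpokeJoined, Status: ConditionTrue},
		{Type: PivotComplete, Status: ConditionTrue},
	}
	fmt.Println("joined only:", pivotComplete(joinedOnly))
	fmt.Println("joined and pivoted:", pivotComplete(pivoted))
}
```

Checking `SpokeJoined` first means a stale `PivotComplete=True` left over from an earlier join is ignored while the spoke is not currently joined.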

// Klusterlet is the configuration for a klusterlet.
type Klusterlet struct {
// Annotations to apply to the spoke cluster. If not present, the 'agent.open-cluster-management.io/' prefix is added to each key.
@@ -124,12 +163,6 @@ type Klusterlet struct {
// +optional
Mode string `json:"mode,omitempty"`

// If set, the klusterlet operator will be purged and all open-cluster-management namespaces deleted
// when the klusterlet is unjoined from its Hub cluster.
// +kubebuilder:default:=true
// +optional
PurgeOperator bool `json:"purgeOperator,omitempty"`

// If true, the installed klusterlet agent will start the cluster registration process by looking for the
// internal endpoint from the public cluster-info in the Hub cluster instead of using hubApiServer.
// +optional
16 changes: 16 additions & 0 deletions fleetconfig-controller/api/v1beta1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion fleetconfig-controller/build/Dockerfile.base
@@ -22,7 +22,7 @@ COPY go.sum go.sum
RUN go mod download

# Copy the go source
COPY cmd/main.go cmd/main.go
COPY cmd/ cmd/
COPY api/ api/
COPY internal/ internal/
COPY pkg/ pkg/
2 changes: 1 addition & 1 deletion fleetconfig-controller/build/Dockerfile.devspace
@@ -52,7 +52,7 @@ COPY go.sum go.sum
RUN go mod download

# Copy the go source
COPY cmd/main.go cmd/main.go
COPY cmd/ cmd/
COPY api/ api/
COPY internal/ internal/
COPY pkg/ pkg/