Skip to content

Conversation

@arturshadnik
Copy link
Member

@arturshadnik arturshadnik commented Nov 5, 2025

  • Sets finalizers later to prevent a scenario where they become stuck. The issue was that, if a join failed outright, the unjoin would fail also and prevent the finalizer being removed.
  • If hub init failed, cleanup no longer becomes blocked.
  • Simplify clusterissuer handling in devspace

Summary by CodeRabbit

  • Chores

    • Bumped fleetconfig-controller image tag to v0.1.4.
    • Removed automated deployment step that applied a cluster issuer during local/dev pipelines.
    • Updated chart README with the new default image tag.
  • Bug Fixes

    • Safer hub cleanup with additional validation before proceeding.
    • Clearer spoke reconciliation and deletion flow; improved finalizer handling and post-join finalizer addition.
    • Helm chart templates now inject standardized chart labels.
  • Tests

    • Test setup now requires CRD paths and uses a project-root webhook config path; updated finalizer expectation in tests.

Signed-off-by: Artur Shad Nik <arturshadnik@gmail.com>
Signed-off-by: Artur Shad Nik <arturshadnik@gmail.com>
Signed-off-by: Artur Shad Nik <arturshadnik@gmail.com>
@openshift-ci
Copy link

openshift-ci bot commented Nov 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arturshadnik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Nov 5, 2025
@coderabbitai
Copy link

coderabbitai bot commented Nov 5, 2025

Walkthrough

Bumps fleetconfig-controller image tag to v0.1.4, injects chart labels into the ClusterIssuer template, removes cluster-issuer apply steps from devspace pipelines, and refactors Hub/Spoke controllers, handler, and tests for finalizer, deletion, and test CRD path handling.

Changes

Cohort / File(s) Summary
Chart image tag
fleetconfig-controller/charts/fleetconfig-controller/README.md, fleetconfig-controller/charts/fleetconfig-controller/values.yaml
Updated image.tag from v0.1.3 to v0.1.4.
Helm templates
fleetconfig-controller/charts/fleetconfig-controller/templates/clusterissuer.yaml
Added labels under ClusterIssuer metadata via include "chart.labels".
Devspace pipeline
fleetconfig-controller/devspace.yaml
Removed kubectl apply -f ./hack/dev/cluster-issuer.yaml steps from dev and deploy-local pipelines.
Hub controller
fleetconfig-controller/internal/controller/v1beta1/hub_controller.go
In cleanHub, added an Operator client Get for ClusterManager named cluster-manager; if NotFound, log skip and remove HubCleanupFinalizer (no error); propagate other errors.
Spoke controller
fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go
Added DeletionTimestamp guard around instance-type switch; reorganized per-instance finalizer setup and requeue behavior; added panic on unknown instance types; consolidated flow to return early after pre-join handling.
Spoke handler
fleetconfig-controller/internal/controller/v1beta1/spoke_handler.go
After successful join, add SpokeCleanupFinalizer for non-Agent instances when absent and log info.
Tests / test harness
fleetconfig-controller/api/v1alpha1/webhook_suite_test.go, fleetconfig-controller/internal/webhook/v1beta1/webhook_suite_test.go, fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go
Set ErrorIfCRDPathMissing: true; adjusted webhook install Paths to use project-root config/webhook; updated a test expectation changing finalizer from SpokeCleanupFinalizer to HubCleanupFinalizer.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Attention areas:
    • internal/controller/v1beta1/spoke_controller.go — control-flow, finalizer semantics, requeue behavior and panic on unknown types.
    • internal/controller/v1beta1/hub_controller.go — ClusterManager Get logic and finalizer removal implications.
    • Test harness files (webhook_suite_test.go, webhook install paths) — CRD path enforcement may affect CI/test runs and require environment updates.

Possibly related PRs

Suggested labels

lgtm

Suggested reviewers

  • ahmad-ibra
  • TylerGillson

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title references the main changes: setting finalizers later in unified mode and unblocking hub deletion when initialization fails, which align with the PR's core objectives.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 703bbb0 and 74e6235.

📒 Files selected for processing (1)
  • fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 69
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go:125-133
Timestamp: 2025-09-25T23:18:41.573Z
Learning: In the fleetconfig-controller spoke deletion flow, SpokeCleanupFinalizer is always removed before HubCleanupFinalizer. This means that checking for the existence of HubCleanupFinalizer in the deletion logic is sufficient regardless of cluster type, as any SpokeCleanupFinalizer would have already been removed by the time the hub cleanup runs.
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 58
File: fleetconfig-controller/charts/fleetconfig-controller/README.md:155-155
Timestamp: 2025-08-27T21:58:32.141Z
Learning: In the open-cluster-management-io/lab repository, the fleetconfig-controller follows a workflow where chart version bumps (in README.md and values.yaml) are included in PRs before the corresponding Docker image exists. The Docker image is built and pushed automatically via GitHub release workflows after the PR is merged and tagged, making the referenced version available.
Learnt from: TylerGillson
Repo: open-cluster-management-io/lab PR: 51
File: fleetconfig-controller/charts/fleetconfig-controller/README.md:155-155
Timestamp: 2025-08-22T17:55:52.159Z
Learning: In the open-cluster-management-io/lab repository, chart versioning for fleetconfig-controller is handled automatically via GitHub release workflows, not through manual version bumps in Chart.yaml during regular PRs.
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/PROJECT:28-31
Timestamp: 2025-09-22T18:42:03.404Z
Learning: In the open-cluster-management-io/lab repository's fleetconfig-controller, the PROJECT file defines multiple API resources with different webhook configurations: FleetConfig v1alpha1 has defaulting: true (requiring MutatingWebhookConfiguration), while Hub and Spoke v1beta1 resources have defaulting: false. MutatingWebhookConfiguration resources in the manifests serve the v1alpha1 FleetConfig, not the v1beta1 Hub/Spoke resources.
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 69
File: fleetconfig-controller/charts/fleetconfig-controller/templates/ocm/fcc-addon/addon-template.yaml:110-112
Timestamp: 2025-09-25T23:31:11.630Z
Learning: The fleetconfig-controller-manager spoke agent requires create/update/patch/delete permissions on CustomResourceDefinitions because `clusteradm upgrade klusterlet` operations need create/update permissions and cleanup operations require delete permissions for proper lifecycle management.
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 52
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go:49-56
Timestamp: 2025-08-22T19:38:49.769Z
Learning: In the fleetconfig-controller project, the SpokeSpec and HubSpec structs in v1beta1 contain only optional fields (like Foo *string with omitempty tags), so creating these resources without populating the Spec field does not cause validation failures in tests.
📚 Learning: 2025-09-25T23:18:41.573Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 69
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go:125-133
Timestamp: 2025-09-25T23:18:41.573Z
Learning: In the fleetconfig-controller spoke deletion flow, SpokeCleanupFinalizer is always removed before HubCleanupFinalizer. This means that checking for the existence of HubCleanupFinalizer in the deletion logic is sufficient regardless of cluster type, as any SpokeCleanupFinalizer would have already been removed by the time the hub cleanup runs.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go
📚 Learning: 2025-08-22T19:38:49.769Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 52
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go:49-56
Timestamp: 2025-08-22T19:38:49.769Z
Learning: In the fleetconfig-controller project, the SpokeSpec and HubSpec structs in v1beta1 contain only optional fields (like Foo *string with omitempty tags), so creating these resources without populating the Spec field does not cause validation failures in tests.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go
📚 Learning: 2025-09-22T18:42:03.404Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/PROJECT:28-31
Timestamp: 2025-09-22T18:42:03.404Z
Learning: In the open-cluster-management-io/lab repository's fleetconfig-controller, the PROJECT file defines multiple API resources with different webhook configurations: FleetConfig v1alpha1 has defaulting: true (requiring MutatingWebhookConfiguration), while Hub and Spoke v1beta1 resources have defaulting: false. MutatingWebhookConfiguration resources in the manifests serve the v1alpha1 FleetConfig, not the v1beta1 Hub/Spoke resources.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go
📚 Learning: 2025-09-12T22:46:57.106Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go:328-331
Timestamp: 2025-09-12T22:46:57.106Z
Learning: In fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go, nil klusterletValues returned by mergeKlusterletValues is a valid state, not an error condition. The downstream prepareKlusterletValuesFile function properly handles nil values by returning early, making additional nil checks unnecessary and potentially harmful to the intended flow.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go
📚 Learning: 2025-09-12T22:46:57.106Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go:328-331
Timestamp: 2025-09-12T22:46:57.106Z
Learning: In fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go, nil klusterletValues returned by mergeKlusterletValues is a valid state, not an error condition. The downstream prepareKlusterletValuesFile function properly handles nil values by returning early with (nil, nil, nil), making additional nil checks unnecessary and potentially harmful to the intended flow.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go
📚 Learning: 2025-09-22T19:16:34.109Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/internal/webhook/v1beta1/validation.go:103-121
Timestamp: 2025-09-22T19:16:34.109Z
Learning: In the open-cluster-management-io/lab repository's fleetconfig-controller v1beta1 API, the Klusterlet field in SpokeSpec is defined as a struct value (Klusterlet Klusterlet), not a pointer (*Klusterlet), so direct field access like Klusterlet.Annotations is safe without nil checks. The Klusterlet struct does not contain a Source field.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go
📚 Learning: 2025-09-26T04:15:34.475Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 69
File: fleetconfig-controller/internal/webhook/v1beta1/validation_test.go:699-701
Timestamp: 2025-09-26T04:15:34.475Z
Learning: In fleetconfig-controller/internal/webhook/v1beta1/validation_test.go, the TestValidateAddonUniqueness test has flawed assertion logic at lines 763-769 that uses `if len(err.Error()) >= len(expectedMsg)` for partial matching, which causes false positives when actual error messages are longer than expected messages, allowing tests to pass with completely incorrect expected error messages.

Applied to files:

  • fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go
🧬 Code graph analysis (1)
fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go (1)
fleetconfig-controller/api/v1beta1/constants.go (1)
  • HubCleanupFinalizer (10-10)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: e2e (fleetconfig-controller) / e2e
  • GitHub Check: test (fleetconfig-controller) / Ensure PR is reviewable & run tests
  • GitHub Check: test (fleetconfig-controller) / Run Helm Chart Tests
  • GitHub Check: e2e (fleetconfig-controller) / e2e
  • GitHub Check: test (fleetconfig-controller) / Ensure PR is reviewable & run tests
🔇 Additional comments (1)
fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go (1)

119-120: LGTM! Test correctly updated to reflect API change.

The test expectation now correctly verifies that HubCleanupFinalizer is added to the Spoke resource after reconciliation, aligning with the API change from SpokeCleanupFinalizer to HubCleanupFinalizer. This change is consistent with the PR's objective to modify finalizer timing and cleanup behavior in unified mode.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Artur Shad Nik <arturshadnik@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
fleetconfig-controller/api/v1alpha1/webhook_suite_test.go (1)

81-90: Approve ErrorIfCRDPathMissing change; consider aligning webhook path with v1beta1.

Setting ErrorIfCRDPathMissing: true is good for failing fast on missing CRDs.

However, the webhook path at line 88 uses a relative path filepath.Join("..", "..", "config", "webhook"), while the v1beta1 test file (fleetconfig-controller/internal/webhook/v1beta1/webhook_suite_test.go:96) was updated to use an absolute path filepath.Join(root, "config", "webhook"). Since root is already available at line 77-78, consider using the same pattern for consistency and robustness.

Apply this diff to align with v1beta1:

 	WebhookInstallOptions: envtest.WebhookInstallOptions{
-		Paths: []string{filepath.Join("..", "..", "config", "webhook")},
+		Paths: []string{filepath.Join(root, "config", "webhook")},
 	},
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f52b69 and 703bbb0.

📒 Files selected for processing (2)
  • fleetconfig-controller/api/v1alpha1/webhook_suite_test.go (1 hunks)
  • fleetconfig-controller/internal/webhook/v1beta1/webhook_suite_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 69
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller.go:125-133
Timestamp: 2025-09-25T23:18:41.573Z
Learning: In the fleetconfig-controller spoke deletion flow, SpokeCleanupFinalizer is always removed before HubCleanupFinalizer. This means that checking for the existence of HubCleanupFinalizer in the deletion logic is sufficient regardless of cluster type, as any SpokeCleanupFinalizer would have already been removed by the time the hub cleanup runs.
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 58
File: fleetconfig-controller/charts/fleetconfig-controller/README.md:155-155
Timestamp: 2025-08-27T21:58:32.141Z
Learning: In the open-cluster-management-io/lab repository, the fleetconfig-controller follows a workflow where chart version bumps (in README.md and values.yaml) are included in PRs before the corresponding Docker image exists. The Docker image is built and pushed automatically via GitHub release workflows after the PR is merged and tagged, making the referenced version available.
Learnt from: TylerGillson
Repo: open-cluster-management-io/lab PR: 51
File: fleetconfig-controller/charts/fleetconfig-controller/README.md:155-155
Timestamp: 2025-08-22T17:55:52.159Z
Learning: In the open-cluster-management-io/lab repository, chart versioning for fleetconfig-controller is handled automatically via GitHub release workflows, not through manual version bumps in Chart.yaml during regular PRs.
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/PROJECT:28-31
Timestamp: 2025-09-22T18:42:03.404Z
Learning: In the open-cluster-management-io/lab repository's fleetconfig-controller, the PROJECT file defines multiple API resources with different webhook configurations: FleetConfig v1alpha1 has defaulting: true (requiring MutatingWebhookConfiguration), while Hub and Spoke v1beta1 resources have defaulting: false. MutatingWebhookConfiguration resources in the manifests serve the v1alpha1 FleetConfig, not the v1beta1 Hub/Spoke resources.
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 52
File: fleetconfig-controller/internal/controller/v1beta1/spoke_controller_test.go:49-56
Timestamp: 2025-08-22T19:38:49.769Z
Learning: In the fleetconfig-controller project, the SpokeSpec and HubSpec structs in v1beta1 contain only optional fields (like Foo *string with omitempty tags), so creating these resources without populating the Spec field does not cause validation failures in tests.
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 69
File: fleetconfig-controller/charts/fleetconfig-controller/templates/ocm/fcc-addon/addon-template.yaml:110-112
Timestamp: 2025-09-25T23:31:11.630Z
Learning: The fleetconfig-controller-manager spoke agent requires create/update/patch/delete permissions on CustomResourceDefinitions because `clusteradm upgrade klusterlet` operations need create/update permissions and cleanup operations require delete permissions for proper lifecycle management.
📚 Learning: 2025-09-26T04:15:34.475Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 69
File: fleetconfig-controller/internal/webhook/v1beta1/validation_test.go:699-701
Timestamp: 2025-09-26T04:15:34.475Z
Learning: In fleetconfig-controller/internal/webhook/v1beta1/validation_test.go, the TestValidateAddonUniqueness test has flawed assertion logic at lines 763-769 that uses `if len(err.Error()) >= len(expectedMsg)` for partial matching, which causes false positives when actual error messages are longer than expected messages, allowing tests to pass with completely incorrect expected error messages.

Applied to files:

  • fleetconfig-controller/internal/webhook/v1beta1/webhook_suite_test.go
  • fleetconfig-controller/api/v1alpha1/webhook_suite_test.go
📚 Learning: 2025-09-22T18:42:03.404Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 59
File: fleetconfig-controller/PROJECT:28-31
Timestamp: 2025-09-22T18:42:03.404Z
Learning: In the open-cluster-management-io/lab repository's fleetconfig-controller, the PROJECT file defines multiple API resources with different webhook configurations: FleetConfig v1alpha1 has defaulting: true (requiring MutatingWebhookConfiguration), while Hub and Spoke v1beta1 resources have defaulting: false. MutatingWebhookConfiguration resources in the manifests serve the v1alpha1 FleetConfig, not the v1beta1 Hub/Spoke resources.

Applied to files:

  • fleetconfig-controller/internal/webhook/v1beta1/webhook_suite_test.go
  • fleetconfig-controller/api/v1alpha1/webhook_suite_test.go
📚 Learning: 2025-09-26T04:13:12.146Z
Learnt from: arturshadnik
Repo: open-cluster-management-io/lab PR: 69
File: fleetconfig-controller/internal/webhook/v1beta1/validation_test.go:719-724
Timestamp: 2025-09-26T04:13:12.146Z
Learning: The TestValidateAddonUniqueness function in fleetconfig-controller/internal/webhook/v1beta1/validation_test.go has flawed assertion logic that uses `len(err.Error()) >= len(expectedMsg)` to check error messages, which doesn't actually validate message content and will pass for any error longer than the expected message.

Applied to files:

  • fleetconfig-controller/api/v1alpha1/webhook_suite_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: e2e (fleetconfig-controller) / e2e
  • GitHub Check: e2e (fleetconfig-controller) / e2e
  • GitHub Check: test (fleetconfig-controller) / Run Helm Chart Tests
  • GitHub Check: test (fleetconfig-controller) / Ensure PR is reviewable & run tests
  • GitHub Check: test (fleetconfig-controller) / Ensure PR is reviewable & run tests
🔇 Additional comments (1)
fleetconfig-controller/internal/webhook/v1beta1/webhook_suite_test.go (1)

88-98: LGTM! Test environment configuration improved.

Setting ErrorIfCRDPathMissing: true ensures tests fail fast when CRD paths are missing, and using an absolute path from root for webhook configuration is more robust than relative path navigation.

Signed-off-by: Artur Shad Nik <arturshadnik@gmail.com>
@ahmad-ibra
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Nov 5, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 3c4948d into open-cluster-management-io:main Nov 5, 2025
17 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants