Skip to content

MGMT-23509: Allow restoring soft deleted hosts#10185

Merged
openshift-merge-bot[bot] merged 1 commit intoopenshift:masterfrom
CrystalChun:restore-host-fix
Apr 23, 2026
Merged

MGMT-23509: Allow restoring soft deleted hosts#10185
openshift-merge-bot[bot] merged 1 commit intoopenshift:masterfrom
CrystalChun:restore-host-fix

Conversation

@CrystalChun
Copy link
Copy Markdown
Contributor

@CrystalChun CrystalChun commented Apr 21, 2026

A host that was soft-deleted and not removed from the DB
will prevent that same host from being restored due to
primary key conflict.

To fix this, we should delete the soft-deleted host before
we try to restore and recreate it.

This allows restoring hosts on the same hub that may have
just had a disaster scenario with the hosted cluster.

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested): Booted host with discovery ISO from infraenv, installed to hosted cluster, made a backup of the Agent and Infraenv CRs,
  • No tests needed

Manual test steps

  1. Create BMH
  2. Create HostedCluster
  3. Scaled up NodePool to install host to cluster
  4. Wait until Agent finishes installing into hosted cluster
  5. Make backup of Agent CR
  6. Annotate all Agent CR with agent.agent-install.openshift.io/skip-spoke-cleanup: true <-- prevents Agents (hosts) from being removed from the hosted cluster as nodes
  7. Delete Agent CR
  8. Recreate Agent CR
  9. Observe the Agent get restored

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Summary by CodeRabbit

  • Bug Fixes

    • Host recreation now removes soft-deleted entries only within the same infrastructure environment before creating a new active host.
    • Restore/conflict detection respects infra-environment scope to prevent cross-environment conflicts.
  • Tests

    • Added tests for composite-key host lookups, scoped deletion/recreation, conflict scenarios, and restoration after soft-delete.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 21, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@CrystalChun: This pull request explicitly references no jira issue.

Details

In response to this:

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested): Booted host with discovery ISO from infraenv, installed to hosted cluster, made a backup of the Agent and Infraenv CRs,
  • No tests needed

Manual test steps

  1. Create BMH
  2. Create HostedCluster
  3. Scaled up NodePool to install host to cluster
  4. Wait until Agent finishes installing into hosted cluster
  5. Make backup of Agent CR
  6. Annotate all Agent CR with agent.agent-install.openshift.io/skip-spoke-cleanup: true <-- prevents Agents (hosts) from being removed from the hosted cluster as nodes
  7. Delete Agent CR
  8. Recreate Agent CR
  9. Observe the Agent get restored

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 21, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 21, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Controller and inventory flows now use an InfraEnv-scoped composite key (InfraEnvID+HostID) for host lookups; inventory routines hard-delete matching soft-deleted host rows before inserting new hosts; tests added for composite lookups and DeleteSoftDeletedHost behavior.

Changes

Cohort / File(s) Summary
Inventory / DB
internal/bminventory/inventory.go, internal/bminventory/inventory_test.go, internal/common/db.go, internal/common/db_test.go
Inventory now calls DeleteSoftDeletedHost(...) (new helper) to hard-delete soft-deleted rows matching (infra_env_id, id) before inserting a new host; added tests for GetCommonHostInternal, create/replace semantics, and DeleteSoftDeletedHost.
Controller
internal/controller/controllers/agent_controller.go, internal/controller/controllers/agent_controller_test.go
Switched precondition lookup from ID-only to composite GetCommonHostInternal(infraEnvID, hostID); updated conflict logging/messages and tests to reflect composite-key behavior and soft-delete recreation path.
Host Manager
internal/host/host.go
Replaced inline unscoped delete with call to common.DeleteSoftDeletedHost(...) during Manager.RegisterHost path for handling prior soft-deleted records; adjusted error handling accordingly.

Sequence Diagram

sequenceDiagram
    participant Ctrl as Agent Controller
    participant Inv as Inventory
    participant DB as Database

    Ctrl->>Inv: GetCommonHostInternal(infraEnvID, hostID)
    Inv->>DB: SELECT host WHERE infra_env_id=X AND id=Y AND deleted_at IS NULL
    alt Active host exists
        DB-->>Inv: Host record
        Inv-->>Ctrl: Host exists (conflict)
        Ctrl->>Ctrl: Conflict handling / abort create
    else No active host
        DB-->>Inv: No result
        Inv-->>Ctrl: gorm.ErrRecordNotFound
        Ctrl->>Inv: CreateHostInKubeKeyNamespace(host)
        Inv->>DB: DELETE FROM hosts WHERE infra_env_id=X AND id=Y AND deleted_at IS NOT NULL  (Unscoped hard-delete)
        DB-->>Inv: Delete result
        Inv->>DB: INSERT new host row
        DB-->>Inv: Insert result
        Inv-->>Ctrl: New host created
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name Status Explanation
Title check ✅ Passed The title 'NO-ISSUE: Allow restoring soft deleted hosts' is concise, clear, and directly describes the main change—enabling host restoration for soft-deleted records.
Description check ✅ Passed The PR description includes a clear problem statement, the fix rationale, manual test steps, relevant checklist items, and environment impact. All required template sections are addressed.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names use stable, descriptive static strings without dynamic content like UUIDs, timestamps, or generated identifiers.
Test Structure And Quality ✅ Passed Ginkgo tests demonstrate solid quality with single responsibility, proper setup/cleanup in BeforeEach/AfterEach blocks, appropriate absence of timeouts for unit-level database tests, and full consistency with existing codebase patterns.
Microshift Test Compatibility ✅ Passed The added tests are unit tests in internal/ directories, not Ginkgo e2e tests. They test database and service logic with mocks, not Kubernetes/OpenShift APIs.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR adds unit tests in internal packages, not Ginkgo e2e tests. Unit tests do not execute against OpenShift clusters and are not subject to SNO topology constraints.
Topology-Aware Scheduling Compatibility ✅ Passed PR contains only backend Go code changes for database and controller logic related to soft-deleted host cleanup with no deployment manifests, operator code, or Kubernetes scheduling configuration modifications.
Ote Binary Stdout Contract ✅ Passed No stdout writes detected that would corrupt OTE binary JSON output. All code changes are confined to function bodies with no modifications to top-level initialization or suite setup.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Tests use database mocking and unit test patterns with no IPv4 assumptions or external connectivity requirements detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 21, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/controller/controllers/agent_controller_test.go (1)

6217-6247: Soft-delete test currently duplicates the generic not-found scenario.

Line [6233] and Line [6235] mirror the same setup already used in the prior "no existing host found" test, so this case does not uniquely validate soft-deleted-record handling and can give false confidence. Consider merging/renaming this test, or making it assert a distinct soft-delete-specific contract.

✏️ Minimal rename to avoid misleading coverage claim
-It("should handle soft-deleted host - CreateHostInKubeKeyNamespace will delete and recreate", func() {
+It("should create host when composite lookup is not found", func() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/controllers/agent_controller_test.go` around lines 6217 -
6247, The test "should handle soft-deleted host - CreateHostInKubeKeyNamespace
will delete and recreate" currently duplicates the generic not-found scenario by
having mockInstallerInternal.EXPECT().GetHostByKubeKey(...) and
GetHostByInfraEnvAndID(...) both return gorm.ErrRecordNotFound; instead make the
test exercise a true soft-delete path by returning a Host object marked as
deleted (or a sentinel error/value your code uses for soft-deleted records) from
GetHostByKubeKey (or GetHostByInfraEnvAndID as appropriate) and then assert that
CreateHostInKubeKeyNamespace is called to delete/recreate; update the mocks
around GetHostByKubeKey, GetHostByInfraEnvAndID and the expectation on
CreateHostInKubeKeyNamespace in the test so the inputs/returns reflect a
soft-deleted Host rather than ErrRecordNotFound (or alternatively rename the
test to reflect it’s identical to the not-found case if you choose not to change
behavior).
internal/bminventory/inventory_test.go (1)

22348-22354: Consider extracting repeated DB fixture setup into helpers.

InfraEnv/host creation patterns repeat across scenarios; small test helpers would reduce noise and make future scenario additions safer.

Also applies to: 22389-22395, 22420-22434, 22496-22503, 22532-22539, 22595-22609

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bminventory/inventory_test.go` around lines 22348 - 22354, Several
tests repeatedly create infra envs and hosts (e.g., constructing infraEnv :=
&common.InfraEnv{InfraEnv: models.InfraEnv{ID: &infraEnvID}} followed by
Expect(db.Create(infraEnv).Error).ToNot(HaveOccurred())) — extract that setup
into small test helper(s) such as createTestInfraEnv(db, infraEnvID) and
createTestHost(db, infraEnvID, hostID, opts) placed in the test package (e.g.,
internal/bminventory/test_helpers.go); update each scenario to call these
helpers instead of duplicating the infraEnv/host construction and db.Create
calls so the repeated patterns at infraEnv/host creation sites (references:
infraEnv, models.InfraEnv, db.Create) are consolidated and reused.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/bminventory/inventory.go`:
- Around line 5816-5825: The Unscoped delete currently removes any row matching
id/infra_env_id regardless of soft-delete status; update the deletion to only
hard-delete rows that are soft-deleted by adding a constraint on the soft-delete
column (e.g., include "deleted_at IS NOT NULL" or check the GORM DeletedAt
field) in the query used with b.db.Unscoped().Delete so that common.Host rows
are only permanently removed when host.ID and host.InfraEnvID match AND the row
is soft-deleted.

---

Nitpick comments:
In `@internal/bminventory/inventory_test.go`:
- Around line 22348-22354: Several tests repeatedly create infra envs and hosts
(e.g., constructing infraEnv := &common.InfraEnv{InfraEnv: models.InfraEnv{ID:
&infraEnvID}} followed by
Expect(db.Create(infraEnv).Error).ToNot(HaveOccurred())) — extract that setup
into small test helper(s) such as createTestInfraEnv(db, infraEnvID) and
createTestHost(db, infraEnvID, hostID, opts) placed in the test package (e.g.,
internal/bminventory/test_helpers.go); update each scenario to call these
helpers instead of duplicating the infraEnv/host construction and db.Create
calls so the repeated patterns at infraEnv/host creation sites (references:
infraEnv, models.InfraEnv, db.Create) are consolidated and reused.

In `@internal/controller/controllers/agent_controller_test.go`:
- Around line 6217-6247: The test "should handle soft-deleted host -
CreateHostInKubeKeyNamespace will delete and recreate" currently duplicates the
generic not-found scenario by having
mockInstallerInternal.EXPECT().GetHostByKubeKey(...) and
GetHostByInfraEnvAndID(...) both return gorm.ErrRecordNotFound; instead make the
test exercise a true soft-delete path by returning a Host object marked as
deleted (or a sentinel error/value your code uses for soft-deleted records) from
GetHostByKubeKey (or GetHostByInfraEnvAndID as appropriate) and then assert that
CreateHostInKubeKeyNamespace is called to delete/recreate; update the mocks
around GetHostByKubeKey, GetHostByInfraEnvAndID and the expectation on
CreateHostInKubeKeyNamespace in the test so the inputs/returns reflect a
soft-deleted Host rather than ErrRecordNotFound (or alternatively rename the
test to reflect it’s identical to the not-found case if you choose not to change
behavior).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 88eec7b0-5a93-465d-9738-2604e6d1bc62

📥 Commits

Reviewing files that changed from the base of the PR and between 8812016 and 0cc290a.

📒 Files selected for processing (5)
  • internal/bminventory/inventory.go
  • internal/bminventory/inventory_test.go
  • internal/bminventory/mock_installer_internal.go
  • internal/controller/controllers/agent_controller.go
  • internal/controller/controllers/agent_controller_test.go

Comment thread internal/bminventory/inventory.go
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/bminventory/inventory.go (1)

192-192: Reuse the existing composite-key lookup instead of adding a second one.

Line 192 adds another InstallerInternals method for the same (infraEnvID, hostID) lookup that GetCommonHostInternal(ctx, infraEnvId, hostId) already exposes and implements later in this file. Keeping both grows the interface and generated mocks for identical behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bminventory/inventory.go` at line 192, The interface adds a
duplicate method GetHostByInfraEnvAndID for the same (infraEnvID, hostID) lookup
already implemented by GetCommonHostInternal; remove the redundant
GetHostByInfraEnvAndID entry from InstallerInternals and update any callers to
use GetCommonHostInternal(ctx, infraEnvId, hostId) instead so the single
composite-key lookup is reused and mocks aren’t duplicated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@internal/bminventory/inventory.go`:
- Line 192: The interface adds a duplicate method GetHostByInfraEnvAndID for the
same (infraEnvID, hostID) lookup already implemented by GetCommonHostInternal;
remove the redundant GetHostByInfraEnvAndID entry from InstallerInternals and
update any callers to use GetCommonHostInternal(ctx, infraEnvId, hostId) instead
so the single composite-key lookup is reused and mocks aren’t duplicated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 41db0bd2-314b-4de5-82b5-b110f06834c7

📥 Commits

Reviewing files that changed from the base of the PR and between 0cc290a and dcd3df2.

📒 Files selected for processing (6)
  • internal/bminventory/inventory.go
  • internal/bminventory/inventory_test.go
  • internal/bminventory/mock_installer_internal.go
  • internal/controller/controllers/agent_controller.go
  • internal/controller/controllers/agent_controller_test.go
  • internal/host/host.go
✅ Files skipped from review due to trivial changes (1)
  • internal/host/host.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/bminventory/inventory_test.go
  • internal/bminventory/mock_installer_internal.go
  • internal/controller/controllers/agent_controller_test.go

@CrystalChun CrystalChun force-pushed the restore-host-fix branch 3 times, most recently from 216ac99 to 847e6b9 Compare April 21, 2026 19:39
@CrystalChun
Copy link
Copy Markdown
Contributor Author

/retitle NO-ISSUE: Allow restoring soft deleted hosts

@openshift-ci openshift-ci Bot changed the title NO-ISSUE: Restore hosts in the same hub NO-ISSUE: Allow restoring soft deleted hosts Apr 21, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/bminventory/inventory_test.go (1)

22648-22651: Assert soft-delete state explicitly for infraEnv2

Line 22648 currently verifies only unscoped existence. Add a scoped count assertion (Count == 0) to prove the record in infraEnv2 remained soft-deleted, not active.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bminventory/inventory_test.go` around lines 22648 - 22651, The test
currently only asserts Unscoped existence of the host in infraEnv2; add an
additional scoped count assertion to prove the record is soft-deleted (not
active). After the existing Unscoped check that uses
db.Unscoped().Model(&common.Host{}).Where("id = ? AND infra_env_id = ?",
hostID.String(), infraEnvID2.String()).Count(&count), add a second assertion
that uses the scoped DB (without Unscoped()) against the same
Model(&common.Host{}) and WHERE clause, count into a new/overwritten count
variable and Expect the count to equal 0 to show the record is soft-deleted.
Ensure you still keep the Unscoped Expect(count).To(Equal(int64(1))) assertion.
internal/controller/controllers/agent_controller_test.go (1)

6217-6247: Consider tightening the soft-delete test intent and assertions.

This case currently overlaps heavily with the generic “no existing host found” flow. A clearer name and a stricter CreateHostInKubeKeyNamespace assertion would better document what is being validated.

♻️ Suggested refinement
-It("should handle soft-deleted host - CreateHostInKubeKeyNamespace will delete and recreate", func() {
+It("should create host when no active host exists for the same (InfraEnvID, HostID)", func() {
@@
-	// CreateHostInKubeKeyNamespace will delete soft-deleted host and create new one
-	mockInstallerInternal.EXPECT().CreateHostInKubeKeyNamespace(gomock.Any(), infraEnvKey, gomock.Any()).Return(nil).Times(1)
+	// CreateHostInKubeKeyNamespace handles soft-deleted rows internally and creates a fresh host
+	mockInstallerInternal.EXPECT().
+		CreateHostInKubeKeyNamespace(gomock.Any(), infraEnvKey, gomock.AssignableToTypeOf(&models.Host{})).
+		DoAndReturn(func(_ context.Context, _ types.NamespacedName, h *models.Host) error {
+			Expect(h).ToNot(BeNil())
+			Expect(h.InfraEnvID).To(Equal(infraEnvId))
+			return nil
+		}).
+		Times(1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/controllers/agent_controller_test.go` around lines 6217 -
6247, The test name and assertions are too generic—rename the spec to indicate
"soft-deleted host" behavior and tighten the mock expectation for
CreateHostInKubeKeyNamespace: require the exact infraEnvKey (use
gomock.Eq(infraEnvKey)) and assert the host argument passed has the expected
kube namespace/name (agent.Name, testNamespace) and/or deleted-status semantics
(e.g., not reusing a soft-deleted record) so the test explicitly validates
deletion+recreation via CreateHostInKubeKeyNamespace; keep existing
GetHostByKubeKey and GetCommonHostInternal returns (gorm.ErrRecordNotFound) but
replace the loose gomock.Any() matcher for CreateHostInKubeKeyNamespace with
specific matchers that validate the infraEnvKey and host identity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/bminventory/inventory_test.go`:
- Around line 22339-22342: The AfterEach blocks call ctrl.Finish() but ctrl is
never initialized in the surrounding Describe scopes, causing possible nil or
double-finish panics; either initialize the mock controller in the corresponding
BeforeEach (create ctrl via gomock.NewController or equivalent and assign to the
shared ctrl variable) or remove the ctrl.Finish() calls from those DB-only
Describe blocks so only scopes that set up ctrl call Finish; update the
BeforeEach/AfterEach pairs that reference ctrl (the BeforeEach that should
create ctrl and the AfterEach calling ctrl.Finish()) consistently to avoid
nil/double finishes.

---

Nitpick comments:
In `@internal/bminventory/inventory_test.go`:
- Around line 22648-22651: The test currently only asserts Unscoped existence of
the host in infraEnv2; add an additional scoped count assertion to prove the
record is soft-deleted (not active). After the existing Unscoped check that uses
db.Unscoped().Model(&common.Host{}).Where("id = ? AND infra_env_id = ?",
hostID.String(), infraEnvID2.String()).Count(&count), add a second assertion
that uses the scoped DB (without Unscoped()) against the same
Model(&common.Host{}) and WHERE clause, count into a new/overwritten count
variable and Expect the count to equal 0 to show the record is soft-deleted.
Ensure you still keep the Unscoped Expect(count).To(Equal(int64(1))) assertion.

In `@internal/controller/controllers/agent_controller_test.go`:
- Around line 6217-6247: The test name and assertions are too generic—rename the
spec to indicate "soft-deleted host" behavior and tighten the mock expectation
for CreateHostInKubeKeyNamespace: require the exact infraEnvKey (use
gomock.Eq(infraEnvKey)) and assert the host argument passed has the expected
kube namespace/name (agent.Name, testNamespace) and/or deleted-status semantics
(e.g., not reusing a soft-deleted record) so the test explicitly validates
deletion+recreation via CreateHostInKubeKeyNamespace; keep existing
GetHostByKubeKey and GetCommonHostInternal returns (gorm.ErrRecordNotFound) but
replace the loose gomock.Any() matcher for CreateHostInKubeKeyNamespace with
specific matchers that validate the infraEnvKey and host identity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 98766177-73fe-40fb-83de-4f52612db9f4

📥 Commits

Reviewing files that changed from the base of the PR and between dcd3df2 and 847e6b9.

📒 Files selected for processing (5)
  • internal/bminventory/inventory.go
  • internal/bminventory/inventory_test.go
  • internal/controller/controllers/agent_controller.go
  • internal/controller/controllers/agent_controller_test.go
  • internal/host/host.go
✅ Files skipped from review due to trivial changes (1)
  • internal/host/host.go

Comment thread internal/bminventory/inventory_test.go
@CrystalChun CrystalChun marked this pull request as ready for review April 21, 2026 20:30
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 21, 2026
@openshift-ci openshift-ci Bot requested review from javipolo and romfreiman April 21, 2026 20:32
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/bminventory/inventory_test.go (1)

22347-22607: Consider extracting shared DB fixture helpers.

InfraEnv/Host creation logic is repeated across several specs; small helper builders would reduce noise and make future edge-case additions easier.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bminventory/inventory_test.go` around lines 22347 - 22607, Extract
reusable test helpers to create infra envs and hosts to remove duplication: add
functions used by these specs (e.g., createTestInfraEnv(db, infraEnvID) and
createTestHost(db, hostID, infraEnvID, status, kubeKeyNamespace string,
softDelete bool)) and replace repeated blocks that construct common.InfraEnv,
common.Host, calls to db.Create, db.Delete, and common.GetHostFromDB; update
tests that call GetCommonHostInternal and CreateHostInKubeKeyNamespace to use
those helpers so the composite-key soft-delete logic and verification steps
remain identical but centralized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@internal/bminventory/inventory_test.go`:
- Around line 22347-22607: Extract reusable test helpers to create infra envs
and hosts to remove duplication: add functions used by these specs (e.g.,
createTestInfraEnv(db, infraEnvID) and createTestHost(db, hostID, infraEnvID,
status, kubeKeyNamespace string, softDelete bool)) and replace repeated blocks
that construct common.InfraEnv, common.Host, calls to db.Create, db.Delete, and
common.GetHostFromDB; update tests that call GetCommonHostInternal and
CreateHostInKubeKeyNamespace to use those helpers so the composite-key
soft-delete logic and verification steps remain identical but centralized.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 6021b7b4-df64-4a5f-ba92-18d6c45575d7

📥 Commits

Reviewing files that changed from the base of the PR and between 847e6b9 and d5671ec.

📒 Files selected for processing (5)
  • internal/bminventory/inventory.go
  • internal/bminventory/inventory_test.go
  • internal/controller/controllers/agent_controller.go
  • internal/controller/controllers/agent_controller_test.go
  • internal/host/host.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/host/host.go
  • internal/controller/controllers/agent_controller_test.go

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 46.66667% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.35%. Comparing base (8812016) to head (371b554).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
internal/bminventory/inventory.go 0.00% 2 Missing and 1 partial ⚠️
internal/common/db.go 60.00% 1 Missing and 1 partial ⚠️
internal/host/host.go 0.00% 1 Missing and 1 partial ⚠️
...nternal/controller/controllers/agent_controller.go 80.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10185      +/-   ##
==========================================
+ Coverage   44.33%   44.35%   +0.02%     
==========================================
  Files         415      415              
  Lines       72770    72775       +5     
==========================================
+ Hits        32263    32282      +19     
+ Misses      37588    37575      -13     
+ Partials     2919     2918       -1     
Files with missing lines Coverage Δ
...nternal/controller/controllers/agent_controller.go 77.10% <80.00%> (+0.06%) ⬆️
internal/common/db.go 25.00% <60.00%> (+0.56%) ⬆️
internal/host/host.go 74.20% <0.00%> (+0.24%) ⬆️
internal/bminventory/inventory.go 72.09% <0.00%> (+0.13%) ⬆️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@CrystalChun
Copy link
Copy Markdown
Contributor Author

/retest

Comment thread internal/host/host.go
Comment thread internal/bminventory/inventory.go Outdated
// Delete any previous record of the host if it was soft deleted,
// no error will be returned if the host didn't exist.
// This follows the same pattern as RegisterHost.
if err := b.db.Unscoped().Where("deleted_at IS NOT NULL").Delete(&common.Host{}, "id = ? and infra_env_id = ?", host.ID.String(), host.InfraEnvID.String()).Error; err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query seems duplicated. Can we extract it to DB utils/common? Something like DeleteHost or some better naming conveying what this query is doing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! Thank you! Added a new function, wdyt?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice ❤️ looking simpler now :)

Comment thread internal/controller/controllers/agent_controller.go Outdated
Comment on lines +2235 to +2236
// A non-deleted host with this composite key already exists
// This shouldn't happen - GetHostByKubeKey should have found it earlier
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, from the stack call, we know that there is no host with this id + this kube namespace

But we never checked about the infraenv id. Im not familiar with the whole flow (that is probably the real problem) but it seems to me that it is a different thing from checking that a host exist with this id and infraenv id

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess the assumption was that GetHostByKubeKey should've found the host, but that's not necessarily the case. I'll remove the comment about This shouldn't happen..

Comment thread internal/controller/controllers/agent_controller.go Outdated
@openshift-ci openshift-ci Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 22, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
internal/controller/controllers/agent_controller.go (1)

2235-2236: ⚠️ Potential issue | 🟡 Minor

Drop the “This shouldn't happen” assumption in the comment.

That branch can be legitimate when kube-key lookup and composite-key lookup diverge, so this wording can mislead debugging.

✏️ Suggested comment-only fix
-		// A non-deleted host with this composite key already exists
-		// This shouldn't happen - GetHostByKubeKey should have found it earlier
+		// A non-deleted host with this composite key already exists.
+		// This can happen when kube-key lookup does not resolve this composite-key record.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/controllers/agent_controller.go` around lines 2235 -
2236, Update the comment that currently reads "This shouldn't happen -
GetHostByKubeKey should have found it earlier" to remove the "This shouldn't
happen" assumption and state neutrally that a non-deleted host may exist with
the composite key even if kube-key lookup did not return it; locate the comment
near the composite-key branch in agent_controller.go (context: GetHostByKubeKey)
and replace the wording with a factual note such as "A non-deleted host with
this composite key already exists; kube-key and composite-key lookups can
diverge, so both checks are valid."
🧹 Nitpick comments (2)
internal/common/db_test.go (1)

522-624: Consider extracting repeated count assertions into a small helper.

There’s repeated Unscoped/Model/Where/Count boilerplate that could be reduced to improve readability and maintenance.

♻️ Optional cleanup example
+assertHostCount := func(unscoped bool, hostID, infraEnvID strfmt.UUID, expected int64) {
+	var count int64
+	query := db.Model(&Host{})
+	if unscoped {
+		query = db.Unscoped().Model(&Host{})
+	}
+	Expect(query.Where("id = ? AND infra_env_id = ?", hostID.String(), infraEnvID.String()).Count(&count).Error).ToNot(HaveOccurred())
+	Expect(count).To(Equal(expected))
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/common/db_test.go` around lines 522 - 624, Extract the repeated
query+count logic into a helper (e.g., countHost(db *gorm.DB, id string, infraID
string, unscoped bool) (int64, error)) and replace all invocations of
db.(Unscoped()).Model(&Host{}).Where("id = ? AND infra_env_id = ?",
...).Count(&count) with calls to that helper; update tests that call
DeleteSoftDeletedHost and assertions to use the helper to fetch the count and
assert its value, keeping existing behavior for unscoped vs scoped queries and
preserving use sites around DeleteSoftDeletedHost, Host, and the test
expectations.
internal/bminventory/inventory_test.go (1)

22458-22464: Add explicit nil checks before dereferencing Status in assertions.

Direct *foundHost.Status dereference can produce less-actionable failures. Prefer asserting non-nil first for clearer spec failures.

Proposed test assertion tightening
-		Expect(*foundHost1.Status).To(Equal(models.HostStatusKnown))
+		Expect(foundHost1.Status).ToNot(BeNil())
+		Expect(*foundHost1.Status).To(Equal(models.HostStatusKnown))

-		Expect(*foundHost2.Status).To(Equal(models.HostStatusDiscovering))
+		Expect(foundHost2.Status).ToNot(BeNil())
+		Expect(*foundHost2.Status).To(Equal(models.HostStatusDiscovering))

-		Expect(*foundHost.Status).To(Equal(models.HostStatusDiscovering))
+		Expect(foundHost.Status).ToNot(BeNil())
+		Expect(*foundHost.Status).To(Equal(models.HostStatusDiscovering))

Also applies to: 22576-22576

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bminventory/inventory_test.go` around lines 22458 - 22464, Add
explicit nil checks before dereferencing host Status to produce clearer
failures: after retrieving hosts via bm.GetCommonHostInternal (e.g., foundHost1
and foundHost2) assert Expect(foundHost1.Status).ToNot(BeNil()) and
Expect(foundHost2.Status).ToNot(BeNil()) before each
Expect(*foundHostX.Status).To(Equal(...)); apply the same change to the other
occurrence(s) (e.g., the assertion near the other host at the 22576 location) so
all Status dereferences are guarded by non-nil assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@internal/controller/controllers/agent_controller.go`:
- Around line 2235-2236: Update the comment that currently reads "This shouldn't
happen - GetHostByKubeKey should have found it earlier" to remove the "This
shouldn't happen" assumption and state neutrally that a non-deleted host may
exist with the composite key even if kube-key lookup did not return it; locate
the comment near the composite-key branch in agent_controller.go (context:
GetHostByKubeKey) and replace the wording with a factual note such as "A
non-deleted host with this composite key already exists; kube-key and
composite-key lookups can diverge, so both checks are valid."

---

Nitpick comments:
In `@internal/bminventory/inventory_test.go`:
- Around line 22458-22464: Add explicit nil checks before dereferencing host
Status to produce clearer failures: after retrieving hosts via
bm.GetCommonHostInternal (e.g., foundHost1 and foundHost2) assert
Expect(foundHost1.Status).ToNot(BeNil()) and
Expect(foundHost2.Status).ToNot(BeNil()) before each
Expect(*foundHostX.Status).To(Equal(...)); apply the same change to the other
occurrence(s) (e.g., the assertion near the other host at the 22576 location) so
all Status dereferences are guarded by non-nil assertions.

In `@internal/common/db_test.go`:
- Around line 522-624: Extract the repeated query+count logic into a helper
(e.g., countHost(db *gorm.DB, id string, infraID string, unscoped bool) (int64,
error)) and replace all invocations of db.(Unscoped()).Model(&Host{}).Where("id
= ? AND infra_env_id = ?", ...).Count(&count) with calls to that helper; update
tests that call DeleteSoftDeletedHost and assertions to use the helper to fetch
the count and assert its value, keeping existing behavior for unscoped vs scoped
queries and preserving use sites around DeleteSoftDeletedHost, Host, and the
test expectations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 8bc2b8cb-6c21-444f-aae6-6bead430ee8e

📥 Commits

Reviewing files that changed from the base of the PR and between d5671ec and 778ae1c.

📒 Files selected for processing (7)
  • internal/bminventory/inventory.go
  • internal/bminventory/inventory_test.go
  • internal/common/db.go
  • internal/common/db_test.go
  • internal/controller/controllers/agent_controller.go
  • internal/controller/controllers/agent_controller_test.go
  • internal/host/host.go
✅ Files skipped from review due to trivial changes (1)
  • internal/host/host.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/controller/controllers/agent_controller_test.go

@CrystalChun CrystalChun force-pushed the restore-host-fix branch 2 times, most recently from 7c9849d to 77a4584 Compare April 22, 2026 17:07
A host that was soft-deleted and not removed from the DB
will prevent that same host from being restored due to
primary key conflict.

To fix this, we should delete the soft-deleted host before
we try to restore and recreate it.

This allows restoring hosts on the same hub that may have
just had a disaster scenario with the hosted cluster.
@rccrdpccl
Copy link
Copy Markdown
Contributor

/lgtm
/approve

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 23, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 23, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CrystalChun, rccrdpccl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [CrystalChun,rccrdpccl]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 23, 2026

@CrystalChun: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 71fe801 into openshift:master Apr 23, 2026
25 checks passed
@CrystalChun
Copy link
Copy Markdown
Contributor Author

CrystalChun commented Apr 23, 2026

/retitle MGMT-23509: Allow restoring soft deleted hosts

@openshift-ci openshift-ci Bot changed the title NO-ISSUE: Allow restoring soft deleted hosts MGMT-23509: Allow restoring soft deleted hosts Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants