Bug 1917484: Don't adopt after clean failure during deprovisioning by zaneb · Pull Request #121 · openshift/baremetal-operator

zaneb · 2021-01-27T03:48:59Z

During deprovisioning of a Host, if 'deleting' (i.e. deprovisioning) the
node succeeds (i.e. it doesn't go to the Error state) but the automated
cleaning that follows fails, the only way to recover is to return the
node to the manageable state.

Previously, once in the manageable state we would attempt adoption on
the node so that we could deprovision again. However, in the course of
'deleting' the node, the image information is cleared from it so it
cannot be adopted again. (Adoption continues to be the right thing to do
if the node has just been re-registered due to the Ironic database being
recreated, and in that case the image information is present since it
gets added during the initial registration.)

To work around this, don't attempt to adopt during the Deprovisioning
state if the node is manageable and the image data is not present.
Handle the manageable state in Deprovision() by declaring the
deprovisioning complete.
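The check described above can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: the `Node` type, its `HasImageInfo` field, and both function names are hypothetical stand-ins, and only the manageable-state branch discussed in this PR is modeled.

```go
package main

// ProvisionState is a simplified, hypothetical stand-in for an Ironic
// node provision state.
type ProvisionState string

const (
	StateManageable  ProvisionState = "manageable"
	StateActive      ProvisionState = "active"
	StateCleanFailed ProvisionState = "clean failed"
)

// Node is a hypothetical, reduced view of what the provisioner knows
// about the Ironic node.
type Node struct {
	State        ProvisionState
	HasImageInfo bool // false once 'deleting' has cleared the image data
}

// shouldAdoptWhileDeprovisioning reports whether adoption should be
// attempted for a host in the Deprovisioning state. A manageable node
// without image data has already been 'deleted', so it is not adopted.
// A manageable node that still has image data was presumably just
// re-registered, so adoption is still attempted.
func shouldAdoptWhileDeprovisioning(n Node) bool {
	return n.State == StateManageable && n.HasImageInfo
}

// deprovisionDone models only the branch added by this change: a node
// that is in the manageable state is treated as fully deprovisioned.
// Every other state is outside the scope of this sketch and is reported
// as not done.
func deprovisionDone(n Node) bool {
	return n.State == StateManageable
}
```

With these definitions, a node that was returned to manageable after a failed clean (no image data) skips adoption and is reported as deprovisioned, while a freshly re-registered node (image data present) is still adopted.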

A node in the manageable state cannot be re-provisioned without first
being cleaned - it must go through cleaning to reach the available state
before it can be provisioned. Provisioning already handles nodes in the
manageable state, as this is how they begin after the initial inspection
of the host before the first provisioning (which does the initial
cleaning).
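To illustrate why stopping at manageable is safe, here is a rough sketch of the path a node takes from manageable to provisioned. The type and function names are hypothetical, and the mapping is a simplification that assumes automated cleaning is enabled; `provide` and `active` are the Ironic provision-state targets that I believe correspond to these transitions.

```go
package main

// nodeState is a hypothetical stand-in for an Ironic provision state.
type nodeState string

const (
	stateManageable nodeState = "manageable"
	stateCleaning   nodeState = "cleaning"
	stateAvailable  nodeState = "available"
)

// nextProvisionTarget returns the provision-state target to request next
// when the goal is to provision the node. An empty string means there is
// nothing to request (wait for the current operation, or the state is not
// covered by this sketch).
func nextProvisionTarget(s nodeState) string {
	switch s {
	case stateManageable:
		// The node must be cleaned before it can be deployed;
		// "provide" moves it through cleaning towards available.
		return "provide"
	case stateCleaning:
		// Cleaning is in progress; nothing to request yet.
		return ""
	case stateAvailable:
		// Only an available node can be deployed.
		return "active"
	default:
		return ""
	}
}
```

A manageable node therefore never goes straight to deployment in this model: it always passes through cleaning first.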

Backport of metal3-io#772

(cherry picked from commit ba38688)

openshift-ci-robot · 2021-01-27T03:49:05Z

@zaneb: This pull request references Bugzilla bug 1917484, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.7.0) matches configured target release for branch (4.7.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1917484: Don't adopt after clean failure during deprovisioning

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot · 2021-01-27T03:49:19Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [zaneb]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2021-01-27T03:53:31Z

@zaneb: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/unit | `3d51922` | link | `/test unit` |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | `3d51922` | link | `/test e2e-metal-ipi-ovn-ipv6` |
| ci/prow/images | `3d51922` | link | `/test images` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

honza · 2021-02-01T11:47:39Z

This seems to depend on metal3-io@0e1acfe which is part of metal3-io#761

honza · 2021-02-01T14:46:17Z

Closing in favour of #122
/close

openshift-ci-robot · 2021-02-01T14:46:33Z

@honza: Closed this PR.

In response to this:

Closing in favour of #122
/close

openshift-ci-robot · 2021-02-01T14:46:36Z

@zaneb: This pull request references Bugzilla bug 1917484. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

Bug 1917484: Don't adopt after clean failure during deprovisioning

openshift-ci-robot added the bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. label Jan 27, 2021

openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jan 27, 2021

openshift-ci-robot requested review from sadasu and stbenjam January 27, 2021 03:49

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2021

honza mentioned this pull request Feb 1, 2021

Bug 1917484: Don't adopt after clean failure during deprovisioning #122

Merged

openshift-ci-robot closed this Feb 1, 2021

zaneb wants to merge 1 commit into openshift:master from zaneb:openshift-4.7/deprov-failure-adopt


3 participants
