Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BMH with good BMC credentials and registration error stuck deleting #1385

Closed
lentzi90 opened this issue Oct 10, 2023 · 3 comments
Closed

BMH with good BMC credentials and registration error stuck deleting #1385

lentzi90 opened this issue Oct 10, 2023 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue is ready to be actively worked on.

Comments

@lentzi90
Copy link
Member

What steps did you take and what happened:
Seen in #1382
Still need to verify the reproduction steps, but this is what I believe happened:

  1. A BMH (first) in available state was deleted without waiting for it to go away. Tthe important part is that it is still there, not that it was deleting. Just mentioning for completeness sake.)
  2. Another BMH (second) was created with identical BMC details (it is the same machine, just "re-creating" the BMH).
  3. second hits registration error since there is a node with identical MAC in Ironic already.
  4. second is marked for deletion (cleaning up after the failed test).
  5. first disappears eventually when the deletion is completed.
  6. second remains indefinitely stuck trying to power off before deletion.

I believe this is due to a simple logic fail here:

if r.NeedsRegistration() && !hsm.haveCreds {
// If the host is not registered as a node in Ironic and we
// lack the credentials to power it off, just continue to
// delete.
return skipToDelete()

The situation I described above have a BMH with NeedsRegistration=true (because registration failed due to conflict) and haveCreds=true. We get true && false so we do not skipToDelete. It is impossible to power off the BMH since it was not registered but this is what is attempted. I think the proper check would be NeedsRegistration || !haveCreds. In either of these cases we cannot power off the BMH and should just skip to deletion.

What did you expect to happen:

The BMH with registration error should eventually be deleted.

Anything else you would like to add:

I'm planning to open a PR with the suggested fix.

Environment:

  • Baremetal Operator version: main
  • Environment (metal3-dev-env or other): other (bmo e2e)

/kind bug

@metal3-io-bot metal3-io-bot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Oct 10, 2023
@dtantsur
Copy link
Member

/cc @honza
/triage accepted

@metal3-io-bot metal3-io-bot added triage/accepted Indicates an issue is ready to be actively worked on. and removed needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Oct 11, 2023
@zaneb
Copy link
Member

zaneb commented Oct 24, 2023

In theory this should be resolved by #1356.

@lentzi90
Copy link
Member Author

I agree, this was fixed (by making it impossible to reach the code I referenced in this particular case) in #1356

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants