Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing case of BuildRAIDCleanSteps #942

Merged
merged 1 commit into from
Aug 2, 2021

Conversation

Hellcatlk
Copy link
Member

@Hellcatlk Hellcatlk commented Jul 22, 2021

Signed-off-by: Zou Yu zouy.fnst@cn.fujitsu.com

This PR replaces #941. I closed PR941 to avoid affecting openshift PR165.
PR165 should only track #908, rather than includes PR941, since it is a bug fix.
Unfortunately, PR941 and PR908 was on the same branch, so I have to close PR941 and open this one, sorry about that.

This PR aims to address a case in BuildRAIDCleanSteps which I missed in PR908.

In summary, it add the handling of the case when hardwareRAIDVolumes is nil and the target node doesn't have a RAID controller. We shouldn't delete the RAID configuration on that node in that case, otherwise the ironic node will enter the clean failed state because of delete_configuration clean step failure.

@metal3-io-bot metal3-io-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 22, 2021
}

Copy link
Member Author

@Hellcatlk Hellcatlk Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andfasano
The missing here is that unlike software raid part, HardwareRAIDVolumes == nil and HardwareRAIDVolumes == [] should be treated differently.
This is because when a node does not have a RAID controller, we definitely do not want to do any raid related operations including delete_configuration which will cause clean failure.
So we need to make 'do nothing' can also be specified through the spec.raid field, not only 'cleanning', 'cleanning and configuring'.
This can be done by let HardwareRAIDVolumes == nil represent 'do nothing' and HardwareRAIDVolumes == [] represent 'cleaning', and actually this logic is already written in the doc, here is a missing in the code, I am sorry.

Therefore, our goal is:

  • If hardwareRAIDVolume = nil
    Keep actual RAID configuration
  • If hardwareRAIDVolume = []
    Clear actual RAID configuration
  • Other
    Apply RAID configuration to node

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional explaintion:

This is the missing case:
User apply a new bmh without raid field, and at the same time, the target node doesn't have RAID controller.

In above case, when entering BuildRAIDCleanSteps(), the values of target and actual are:

target = {
    hardwareRAIDVolume = nil
    softwareRAIDVolume = []
}

actual = nil

Because of above values of target and actual, BMO will append delete_configuration into clean steps, then due to there is no RAID controller, an error will happen, which leads to reentry the preparing status (infinity loop).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I guess it's This is because when a node does not have a RAID controllerbut supports RAID)

Thanks @Hellcatlk for the clarification and links (and also for having used a different branch), it's definitely clearer now. Could you please update the commit message to reflect this case (ie that hw RAID operations are skipped when a node does not have a RAID controller), since right now it still sound pretty generic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

@Hellcatlk
Copy link
Member Author

/assign @russellb

@Hellcatlk
Copy link
Member Author

/assign @dtantsur

@Hellcatlk
Copy link
Member Author

/test-integration

@Hellcatlk Hellcatlk force-pushed the hardware-raid branch 2 times, most recently from 5e10b66 to d246b06 Compare July 22, 2021 14:17
@andfasano
Copy link
Member

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2021
When `hardwareRAIDVolumes` is nil and the target node doesn't have a RAID
controller. We shouldn't delete the RAID configuration on that node in
that case, otherwise the `ironic` node will enter the clean failed state
because of `delete_configuration` clean step failure. So in this case, hardware
RAID operations will be skipped.

Signed-off-by: Zou Yu <zouy.fnst@cn.fujitsu.com>
@metal3-io-bot metal3-io-bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2021
@Hellcatlk
Copy link
Member Author

@andfasano Sorry, I just updated the commit message, causing lgtm to be removed.

@Hellcatlk
Copy link
Member Author

/test-integration

@andfasano
Copy link
Member

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2021
@Hellcatlk
Copy link
Member Author

@dtantsur PTAL.

@dtantsur
Copy link
Member

dtantsur commented Aug 2, 2021

/approve

@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dtantsur

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 2, 2021
@metal3-io-bot metal3-io-bot merged commit 848cf22 into metal3-io:master Aug 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants