-
Notifications
You must be signed in to change notification settings - Fork 241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix missing case of BuildRAIDCleanSteps #942
Conversation
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andfasano
The missing here is that unlike software raid part, HardwareRAIDVolumes == nil
and HardwareRAIDVolumes == []
should be treated differently.
This is because when a node does not have a RAID controller, we definitely do not want to do any raid related operations including delete_configuration
which will cause clean failure.
So we need to make 'do nothing' can also be specified through the spec.raid
field, not only 'cleanning', 'cleanning and configuring'.
This can be done by let HardwareRAIDVolumes == nil
represent 'do nothing' and HardwareRAIDVolumes == []
represent 'cleaning', and actually this logic is already written in the doc, here is a missing in the code, I am sorry.
Therefore, our goal is:
- If hardwareRAIDVolume = nil
Keep actual RAID configuration - If hardwareRAIDVolume = []
Clear actual RAID configuration - Other
Apply RAID configuration to node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional explaintion:
This is the missing case:
User apply a new bmh without raid
field, and at the same time, the target node doesn't have RAID controller.
In above case, when entering BuildRAIDCleanSteps(), the values of target and actual are:
target = {
hardwareRAIDVolume = nil
softwareRAIDVolume = []
}
actual = nil
Because of above values of target and actual, BMO will append delete_configuration
into clean steps, then due to there is no RAID controller, an error will happen, which leads to reentry the preparing status (infinity loop).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I guess it's This is because when a node does not have a RAID controller
but supports RAID
)
Thanks @Hellcatlk for the clarification and links (and also for having used a different branch), it's definitely clearer now. Could you please update the commit message to reflect this case (ie that hw RAID operations are skipped when a node does not have a RAID controller), since right now it still sound pretty generic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done.
/assign @russellb |
/assign @dtantsur |
/test-integration |
5e10b66
to
d246b06
Compare
/lgtm |
When `hardwareRAIDVolumes` is nil and the target node doesn't have a RAID controller. We shouldn't delete the RAID configuration on that node in that case, otherwise the `ironic` node will enter the clean failed state because of `delete_configuration` clean step failure. So in this case, hardware RAID operations will be skipped. Signed-off-by: Zou Yu <zouy.fnst@cn.fujitsu.com>
@andfasano Sorry, I just updated the commit message, causing |
/test-integration |
/lgtm |
@dtantsur PTAL. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dtantsur The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Zou Yu zouy.fnst@cn.fujitsu.com
This PR replaces #941. I closed PR941 to avoid affecting openshift PR165.
PR165 should only track #908, rather than includes PR941, since it is a bug fix.
Unfortunately, PR941 and PR908 was on the same branch, so I have to close PR941 and open this one, sorry about that.
This PR aims to address a case in BuildRAIDCleanSteps which I missed in PR908.
In summary, it add the handling of the case when hardwareRAIDVolumes is nil and the target node doesn't have a RAID controller. We shouldn't delete the RAID configuration on that node in that case, otherwise the
ironic
node will enter the clean failed state because ofdelete_configuration
clean step failure.