Default to use soft power off instead of hard power off #294

tiendc · 2019-08-29T12:34:19Z

#273
Signed-off-by: Dao Cong Tien <tiendc@vn.fujitsu.com>

nordixinfra · 2019-08-29T12:36:04Z

Can one of the admins verify this patch?

derekhiggins · 2019-08-29T13:02:02Z

Is this the call that is used for fencing? If so should it remain a hard power off?

pkg/provisioner/provisioner.go

dhellmann · 2019-08-30T20:50:19Z

pkg/provisioner/ironic/ironic.go

+		result, err = p.changePower(ironicNode, nodes.SoftPowerOff)
+		if err != nil {
+			// Soft power off is not supported by vendor driver, uses PowerOff()
+			if strings.HasPrefix(err.Error(), "driver does not support target power state") {


Is there some way for us to detect whether the driver supports soft power off before we get here and try to use it? Could we store a setting in the Status section of the host, so the user knows what to expect when they ask for the host to be powered off?

I've checked the Ironic API; currently there is no API for retrieving the supported power states. As an alternative, we could consider adding a function to bmcAccessDetail, say softPowerOffSupported().
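For illustration, the capability check suggested above could look roughly like the following. This is a minimal sketch, not the operator's actual code: the `accessDetails` interface, the `staticDetails` type, the driver names, and the `SoftPowerOffSupported` method are all hypothetical stand-ins for the bmcAccessDetail idea. Only the two target strings match Ironic's "soft power off" and "power off" power targets.

```go
package main

import "fmt"

// accessDetails is a hypothetical, trimmed-down stand-in for the
// bmcAccessDetail idea discussed above. It is not the real interface.
type accessDetails interface {
	Driver() string
	// SoftPowerOffSupported is the hypothetical capability check.
	SoftPowerOffSupported() bool
}

// staticDetails hard-codes the capability per driver (assumption: the
// answer is known up front and does not have to be probed at runtime).
type staticDetails struct {
	driver      string
	softSupport bool
}

func (d staticDetails) Driver() string              { return d.driver }
func (d staticDetails) SoftPowerOffSupported() bool { return d.softSupport }

// powerOffTarget picks the power target to request first.
func powerOffTarget(d accessDetails) string {
	if d.SoftPowerOffSupported() {
		return "soft power off"
	}
	return "power off"
}

func main() {
	hosts := []accessDetails{
		staticDetails{driver: "driver-a", softSupport: true},
		staticDetails{driver: "driver-b", softSupport: false},
	}
	for _, h := range hosts {
		fmt.Printf("%s: %s\n", h.Driver(), powerOffTarget(h))
	}
}
```

A static answer like this cannot cover the case raised later in the thread, where the driver supports soft power off but an agent is missing in the host OS, so a runtime fallback would still be needed.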

pkg/provisioner/ironic/ironic.go

dhellmann · 2019-08-30T20:55:07Z

> Is this the call that is used for fencing? If so should it remain a hard power off?

A soft power off is the default, but if that fails we still yank the power.

dhellmann · 2019-08-30T20:55:52Z

@tiendc thank you for working on this!

derekhiggins · 2019-09-02T10:25:57Z

> Is this the call that is used for fencing? If so should it remain a hard power off?

> A soft power off is the default, but if that fails we still yank the power.

ack, thanks

dhellmann · 2019-10-08T17:37:18Z

docs/api.md

+Value is one of the following:
+  * *<empty string>* -- Soft power off is not used on the node.
+  * *unsupported* -- Soft power off is not supported on the node.
+  * *triggered* -- Soft power off is triggered on the node but


Do we really need to track the soft power off status separately?

If we do, I think instead of reflecting it in a new status field, we should see if we can combine it with the provisioning status and make that a top level field on the status structure.

We would never have something in "provisioning" with a soft power off status of "triggered", for example, right?

dhellmann · 2019-10-08T17:41:23Z

pkg/provisioner/ironic/ironic.go

@@ -1325,6 +1336,19 @@ func (p *ironicProvisioner) PowerOn() (result provisioner.Result, err error) {
 func (p *ironicProvisioner) PowerOff() (result provisioner.Result, err error) {
 	p.log.Info("ensuring host is powered off")

+	// Tries soft power off first, if it fails, performs hard power off
+	result, err = p.softPowerOff()
+	if err != nil {


We only want to switch to the hard power off mode if we get the very specific 400 error. If we get a 409 we want to pause and try the soft power off again, for example.

I think we want to define a new error type so that we can convert the 400 error from line 1291 to our custom type, and then check for that type here instead of just checking against nil.

In my test environment I have a Fujitsu bare-metal server. It requires a specific agent to be installed in the OS to allow soft power off, so if the agent is not installed, any attempt to perform a soft power off will fail regardless of support in Ironic and in the Fujitsu driver for Ironic. If we retry the action on failure, I think we should limit the number of retries, say to 3. Do you have any suggestions?

The 409 error just means that ironic itself is too busy to handle our request (or more likely that it is already doing something with the host and cannot send multiple instructions). But your point about only retrying a few times makes a lot of sense, for other types of errors.

deploy/crds/metal3.io_baremetalhosts_crd.yaml

pkg/apis/metal3/v1alpha1/baremetalhost_types.go

tiendc · 2020-01-07T11:47:28Z

It seems I did something wrong with git, so the pull request contains a commit that is not mine. I will try to fix it.

zaneb

It would be great if we could eliminate having to store state in the Host CR. It looks to me like this should be theoretically possible.

pkg/apis/metal3/v1alpha1/baremetalhost_types.go

pkg/provisioner/ironic/ironic.go

pkg/provisioner/errors.go

pkg/provisioner/ironic/ironic.go

zhouhao3 · 2020-04-07T06:18:04Z

@zaneb @dhellmann
Hi, I will take over from @tiendc to continue this work, and I made some changes to the code based on your previous comments. PTAL.

zhouhao3 · 2020-04-20T03:36:51Z

@dhellmann PTAL

zhouhao3 · 2020-05-09T05:45:14Z

@zaneb @dhellmann PTAL

zhouhao3 · 2020-05-19T08:07:52Z

@zaneb @dhellmann PTAL

pkg/provisioner/ironic/ironic.go

Signed-off-by: Dao Cong Tien <tiendc@vn.fujitsu.com> Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com>

zhouhao3 · 2020-06-10T00:48:51Z

@zaneb @dhellmann @maelk Can someone help review this patch? Thanks a lot!

zaneb

/approve
/test-integration

zaneb · 2020-06-10T17:49:50Z

pkg/provisioner/ironic/ironic.go

+		}
+		// If the target state is unset while the last error is set,
+		// then the last execution of soft power off has failed.
+		if targetState == "" && ironicNode.LastError != "" {


There's a possibility that the node could already have an error set before we attempt to power it off. However, even in that worst-case scenario, all that happens is that we will go straight to a hard power off. So I think this is fine.
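The check under review, and the edge case described above, can be reproduced in isolation. The sketch below assumes a stripped-down `node` struct holding only the two fields involved. It is a stand-in for the Ironic node object, and `softPowerOffFailed` is a hypothetical helper name.

```go
package main

import "fmt"

// node is a stand-in holding only the two fields the check looks at.
type node struct {
	TargetPowerState string
	LastError        string
}

// softPowerOffFailed reports whether a soft power off appears to have failed:
// no power transition is pending, yet an error is recorded on the node.
func softPowerOffFailed(n node) bool {
	return n.TargetPowerState == "" && n.LastError != ""
}

func main() {
	cases := map[string]node{
		"transition in progress":    {TargetPowerState: "soft power off"},
		"soft power off failed":     {LastError: "soft power off timed out"},
		"stale error from earlier":  {LastError: "unrelated earlier failure"},
		"idle node with no error":   {},
	}
	for name, n := range cases {
		fmt.Printf("%s: failed=%v\n", name, softPowerOffFailed(n))
	}
}
```

The stale-error case is indistinguishable from a real failure here, which is the worst case noted above: the host simply goes straight to a hard power off.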

zhouhao3 · 2020-06-15T02:47:15Z

/test-integration

zhouhao3 · 2020-06-16T00:46:16Z

@zaneb Why has this PR been stuck in this state? Is this PR currently waiting for the result of test-integration (almost a week has passed)? Is there anything I can do to promote this PR?

zaneb · 2020-06-16T01:14:26Z

I'm not sure why the integration test didn't run. @maelk any idea?

dhellmann · 2020-06-16T15:14:21Z

/test-integration

dhellmann · 2020-06-16T15:15:14Z

> I'm not sure why the integration test didn't run. @maelk any idea?

Perhaps only org members can trigger the test job?

zaneb · 2020-06-16T15:52:46Z

> Perhaps only org members can trigger the test job?

I also tried unsuccessfully to trigger it last week, but on reflection that may have been before the regex was changed to allow it not to be the only line in the comment.

zhouhao3 · 2020-06-17T01:16:09Z

@zaneb @dhellmann test-integration has passed. Please continue to review, thanks.

zhouhao3 · 2020-06-23T05:28:47Z

@zaneb PTAL, thanks.

zhouhao3 · 2020-06-28T00:58:47Z

@zaneb @dhellmann @maelk This PR has now been approved and test-integration has passed. It still needs the lgtm label. Is there anything else I can do to advance this PR?

dhellmann

I tested this locally with some VMs and it works well.

/lgtm

metal3-io-bot · 2020-07-02T19:43:41Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhellmann, tiendc, zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [dhellmann,zaneb]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

The default reboot-interface behaviour is to attempt a soft power off, and if this fails, revert to a hard power off (PR openshift#294). For high availability use-cases we require the ability to immediately power-off a node. This PR attempts to address that requirement and is part of a wider solution requiring the CAPBM to set the annotation that we have detailed and implemented in this commit. The baseline provisioner API changes have been provided in an earlier commit. CAPBM PR: openshift/cluster-api-provider-baremetal#138 Also see: https://bugzilla.redhat.com/show_bug.cgi?id=1927678


dhellmann requested changes Aug 30, 2019

tiendc closed this Sep 26, 2019

tiendc reopened this Sep 26, 2019

dhellmann reviewed Oct 8, 2019

Xenwar mentioned this pull request Oct 9, 2019

Set Soft Power Off to be the default shutdown mode #306

Closed

metal3-io-bot added the size/XXL and size/L labels and removed the size/XXL label Dec 26, 2019

dhellmann reviewed Jan 2, 2020

tiendc closed this Jan 13, 2020

tiendc deleted the soft_power_off branch January 13, 2020 05:01

tiendc restored the soft_power_off branch January 13, 2020 05:03

tiendc reopened this Jan 13, 2020

zaneb requested changes Jan 13, 2020

zaneb reviewed Jan 23, 2020

zaneb reviewed May 21, 2020

Default to use soft power off instead of hard power off

e17fdd6

Signed-off-by: Dao Cong Tien <tiendc@vn.fujitsu.com> Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com>

zaneb reviewed Jun 10, 2020

metal3-io-bot added the approved label Jun 10, 2020

dhellmann approved these changes Jul 2, 2020

metal3-io-bot added the lgtm label Jul 2, 2020

metal3-io-bot merged commit da5b8a8 into metal3-io:master Jul 2, 2020

hardys mentioned this pull request Feb 11, 2021

Add explicit reboot mode options metal3-io/metal3-docs#164

Merged

rdoxenham mentioned this pull request Feb 15, 2021

Implement explicit reboot mode options #795

Merged

elfosardo pushed a commit to elfosardo/baremetal-operator that referenced this pull request Oct 16, 2023

Merge pull request metal3-io#294 from zaneb/openshift-4.13/tls-min-version [release-4.13] OCPBUGS-17229: Set minimum TLS version for webhook to 1.2

96e931b
