
Conversation

@honza (Member) commented Feb 1, 2021

zouy414 and others added 30 commits November 23, 2020 11:10
Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>
Since 1846513, startProvisioning() no
longer requests the state change that actually... starts provisioning,
so rename it to reflect that.
The first time we provision a Node, we must move it from the manageable
to available state. This involves automated cleaning, and it may fail.
If it does, we should attempt to recover.
Set console=ttyS0 so that the IPA console can
be logged for debugging.
Signed-off-by: seivanov <seivanov@mirantis.com>
Handle cleaning failures during provisioning
Add kubebuilder validation for ClockSpeed
Filter out status updates from the reconcile loop
It is up to the consumer, not the host, to decide when a host is ready
to have provisioning instructions added. The checks in the Available()
method will be inlined in cluster-api-provider-metal3.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
Use the API field directly instead of the convenience method.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
Move public methods of the host type to private functions in the
controller package.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
Bug 1907612: Update kubernetes deps to 1.20
Fix a typo in docs/dev-setup.md document
The new provisioner is mostly a no-op and prov.IsReady() must return true
Fix outdated namespace in docs/dev-setup document.
Allow BMC details to be omitted for Hosts in Unmanaged state
Currently we assume only an IPv4 *or* an IPv6 address exists in the
inspection data, but it's valid for both to exist in a dual-stack
scenario.

To avoid changing the BMH API, in this case we create an additional
nic entry in the list, so that each entry carries one of the IPs but
is otherwise identical.

Fixes: metal3-io#458
Handle dual-stack configuration in inspection data
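
A minimal sketch of the duplication approach this commit describes, assuming an illustrative NIC struct rather than the exact upstream BMH type:

```go
package provisioner

// NIC is an illustrative stand-in for the BMH hardware-details NIC entry,
// which carries a single IP address per entry.
type NIC struct {
	Name string
	MAC  string
	IP   string
}

// splitDualStackNIC emits one NIC entry per discovered IP address, so an
// interface that reports both an IPv4 and an IPv6 address becomes two
// list entries identical apart from the IP field. This keeps the BMH API
// (one IP per NIC entry) unchanged.
func splitDualStackNIC(name, mac string, ips []string) []NIC {
	nics := make([]NIC, 0, len(ips))
	for _, ip := range ips {
		nics = append(nics, NIC{Name: name, MAC: mac, IP: ip})
	}
	return nics
}
```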
remove convenience methods from BareMetalHost
When a call to the provisioner succeeds rather than returning an error
message, that's a good sign, and a reason not to have an error message
set in the Host object. But it doesn't guarantee success: if the
previous failure came at the end of some long-running operation (rather
than outright rejection at the beginning), it could yet fail in exactly
the same way.

Clearing the ErrorType as soon as we start an operation allows us to use
that field to determine whether to force the provisioner to retry in the
presence of an error. (It will only be set if the last thing we did was
record an error, therefore if we see it then we are at the beginning of
a new retry.)

One known issue with this is that if a request to change state in Ironic
results in a 409 Conflict response, this just sets the Dirty flag and is
indistinguishable from success when viewed outside the Ironic
provisioner. If such a conflict occurs, we will effectively skip a retry
attempt, increment the ErrorCount again, and sleep for even longer
before the next attempt.

To ensure that we actually do exponential backoff between retries, leave
the ErrorCount unchanged until the whole operation actually succeeds
(Dirty is no longer true). This is now decoupled from the ClearError()
method that clears the ErrorType and ErrorMessage.
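
A sketch of the backoff this enables, under the assumption that the requeue delay is derived from ErrorCount; the helper name and the cap are illustrative, not the exact upstream code:

```go
package controllers

import (
	"math"
	"time"
)

// maxBackoffCount caps the exponent so the delay stays bounded.
const maxBackoffCount = 10

// backoffFromErrorCount turns the number of consecutive failed attempts
// into a requeue delay of roughly 2^n seconds. Because ErrorCount is only
// incremented while an operation keeps failing, and only cleared once the
// whole operation succeeds (Dirty is no longer true), successive retries
// are spaced exponentially further apart.
func backoffFromErrorCount(errorCount int) time.Duration {
	if errorCount > maxBackoffCount {
		errorCount = maxBackoffCount
	}
	return time.Duration(math.Pow(2, float64(errorCount))) * time.Second
}
```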
As much as possible, do the clearing of ErrorCount in the host state
machine. The exception is the steady states where we only do power
management: the actionResult types currently lack enough detail for the
state machine to distinguish when the count should be cleared.

In power management states (Ready/Available, Provisioned, Externally
Provisioned), count the number of errors of any type since the power
state was last successfully changed.

Otherwise, the ErrorCount is cleared when an operation successfully
completes. Successful re-registration or adoption is never sufficient to
clear the error count, except in the registration state.
Once we see an error in the Node, it returns to the 'enroll' [sic] state
and we don't have a way of determining whether we have seen and saved
that error or not. Previously we always assumed we had not, and didn't
retry the validation unless the credentials had changed.

Add a force flag to indicate that this is a new attempt and should start
again.

Fixes metal3-io#739
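
A sketch of the decision this force flag enables; the function and parameter names are assumptions for illustration, not the exact upstream signature:

```go
package provisioner

// shouldRetryValidation decides whether to re-run management-access
// validation for a Node sitting in the 'enroll' state with an error
// recorded. Previously only changed credentials triggered a retry; the
// force flag marks the start of a fresh retry cycle, in which any error
// already saved on the Host is stale and validation should run again.
func shouldRetryValidation(credentialsChanged, force, errorAlreadySaved bool) bool {
	if credentialsChanged || force {
		return true
	}
	// Otherwise, only validate if we have not yet seen and saved this error.
	return !errorAlreadySaved
}
```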
If inspection fails at some point before it is actually started in
ironic-inspector, we were just repeatedly retrying it instead of setting
an error. Instead, set an error and retry only after a backoff.
Error is the state that the Node goes into if deleting (i.e.
deprovisioning) it fails. The only valid action from this state is to
try deleting again, so do that rather than attempt to go straight to
manageable.
If deprovisioning fails, we were just repeatedly retrying it instead of
setting an error. Instead, set an error and retry only after a backoff.

If we are deprovisioning because of a deletion, give up after 3 retries
(since this may indicate that the host simply doesn't exist any more)
and allow the Host to be deleted.
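
A sketch of that retry policy; the helper and constant names are illustrative, with the limit of 3 taken from the commit message:

```go
package controllers

// maxDeprovisionRetriesOnDelete: after this many failed deprovisioning
// attempts during deletion we stop blocking the delete, since the host
// may simply no longer exist.
const maxDeprovisionRetriesOnDelete = 3

// allowDeletionDespiteFailure reports whether a Host whose deprovisioning
// keeps failing should be released for deletion anyway.
func allowDeletionDespiteFailure(deleting bool, errorCount int) bool {
	return deleting && errorCount > maxDeprovisionRetriesOnDelete
}
```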
honza and others added 11 commits January 26, 2021 09:15
When registering a new host, we check if its bootMACAddress conflicts
with any existing Ironic nodes.  This procedure queries Ironic ports,
and if a port is found, we use the port's NodeUUID to find the
conflicting node.  However, we are using `ports.List()` which only
returns minimal resources by default; e.g. any returned ports will have
their NodeUUID property set to an empty string.

We now explicitly ask Ironic to include the `node_uuid` field.

We also adjust the mock test mechanism to account for this.
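
A sketch of the lookup using the gophercloud baremetal ports package; the helper name and error handling are simplified relative to the real code:

```go
package provisioner

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/baremetal/v1/ports"
)

// nodeUUIDForMAC finds the Ironic node that owns a port with the given
// MAC address. ports.List returns minimal records by default (NodeUUID
// would be the empty string), so node_uuid must be requested explicitly
// via the Fields option.
func nodeUUIDForMAC(client *gophercloud.ServiceClient, mac string) (string, error) {
	allPages, err := ports.List(client, ports.ListOpts{
		Address: mac,
		Fields:  []string{"uuid", "node_uuid"},
	}).AllPages()
	if err != nil {
		return "", err
	}
	allPorts, err := ports.ExtractPorts(allPages)
	if err != nil || len(allPorts) == 0 {
		return "", err
	}
	return allPorts[0].NodeUUID, nil
}
```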
Add missing newline in console log
Ironic: Don't adopt after clean failure during deprovisioning
Ask Ironic to include NodeUUID when querying ports
If we can't find a node by name, and have to resort to using MAC address
lookup, we can't accept a node that has a name.  In that scenario, we
have detected a MAC address collision, and human intervention is
required.  We signal this by returning an `operationFailed` provisioning
result with the offending MAC address in the error message.
Fail registration when boot MAC conflicts
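
A sketch of the collision check; the names are illustrative, and the real code surfaces this as an `operationFailed` provisioning result rather than a bare error:

```go
package provisioner

import "fmt"

// checkBootMACCollision is called after a lookup by name has failed and a
// lookup by MAC address has succeeded. If the node found via MAC already
// has a name, it cannot be ours: that is a MAC address collision requiring
// human intervention, so registration must fail with the offending MAC in
// the error message.
func checkBootMACCollision(bootMAC, foundNodeName string) error {
	if foundNodeName != "" {
		return fmt.Errorf("MAC address %s conflicts with existing node %s", bootMAC, foundNodeName)
	}
	return nil
}
```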
Limit the number of hosts simultaneously provisioned
Add "NET_RAW" to ironic-endpoint-keepalived
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: honza

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Feb 1, 2021
@stbenjam (Member) commented Feb 1, 2021

/test e2e-metal-ipi-virtualmedia

@stbenjam (Member) commented Feb 1, 2021

/test e2e-metal-ipi

@stbenjam (Member) commented Feb 1, 2021

/test e2e-metal-assisted

Not sure if that one's here, but we probably want to check it.

@openshift-ci-robot

@stbenjam: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test e2e-metal-ipi
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-metal-ipi-virtualmedia
  • /test images
  • /test lint
  • /test unit

Use /test all to run the following jobs:

  • pull-ci-openshift-baremetal-operator-master-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-baremetal-operator-master-images
  • pull-ci-openshift-baremetal-operator-master-lint
  • pull-ci-openshift-baremetal-operator-master-unit

In response to this:

/test e2e-metal-assisted

Not sure if that one's here, but we probably want to check it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@stbenjam (Member) commented Feb 1, 2021

/test e2e-metal-ipi-ovn-dualstack

@honza (Member, Author) commented Feb 9, 2021

openshift/cluster-baremetal-operator#100 is now merged so let's try a fresh build

/retest

@stbenjam (Member) commented Feb 9, 2021

/test e2e-meta-ipi

@openshift-ci-robot

@stbenjam: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test e2e-metal-ipi
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-metal-ipi-virtualmedia
  • /test images
  • /test lint
  • /test unit

Use /test all to run the following jobs:

  • pull-ci-openshift-baremetal-operator-master-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-baremetal-operator-master-images
  • pull-ci-openshift-baremetal-operator-master-lint
  • pull-ci-openshift-baremetal-operator-master-unit

In response to this:

/test e2e-meta-ipi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@stbenjam (Member) commented Feb 9, 2021

/test e2e-metal-ipi

@andfasano

/retest

1 similar comment
@stbenjam (Member)

/retest

@honza (Member, Author) commented Feb 11, 2021

#125

@openshift-ci bot commented Feb 11, 2021

@honza: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-metal-ipi-virtualmedia | 1cbb1cd | link | /test e2e-metal-ipi-virtualmedia |
| ci/prow/e2e-metal-ipi | 1cbb1cd | link | /test e2e-metal-ipi |
| ci/prow/e2e-metal-ipi-ovn-dualstack | 1cbb1cd | link | /test e2e-metal-ipi-ovn-dualstack |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 1cbb1cd | link | /test e2e-metal-ipi-ovn-ipv6 |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@honza (Member, Author) commented Feb 17, 2021

Superseded by #125

@honza closed this Feb 17, 2021