-
Notifications
You must be signed in to change notification settings - Fork 27
[preview] merge upstream 2021-02-01 #123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>
Since 1846513, startProvisioning() no longer requests the state change that actually... starts provisioning, so rename it to reflect that.
The first time we provision a Node, we must move it from the manageable to available state. This involves automated cleaning, and it may fail. If it does, we should attempt to recover.
Set console=ttyS0 so that the IPA console can be logged for debugging.
Signed-off-by: seivanov <seivanov@mirantis.com>
Handle cleaning failures during provisioning
Add kubebuilder validation for ClockSpeed
Filter out status updates from the reconcile loop
Add console=ttyS0 to kernel params
It is up to the consumer, not the host, to decide when a host is ready to have provisioning instructions added. The checks in the Available() method will be inlined in cluster-api-provider-metal3. Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
Use the API field directly instead of the convenience method. Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
Move public methods of the host type to private functions in the controller package. Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
Bug 1907612: Update kubernetes deps to 1.20
Fix a typo in docs/dev-setup.md document
Fix a typo in doc/dev-setup.md
The new provioner is mostly a no-op and prov.IsReady() must return true
Fix outdated namespace docs/dev-setup document.
Allow BMC details to be omitted for Hosts in Unmanaged state
Currently we assume only an ipv4 *or* an ipv6 IP exists in the inspection data, but it's valid for both to exist in a dual-stack scenario. To avoid changing the BMH API, in this case we create an additional nic entry in the list, which will contain each IP but otherwise the same data. Fixes: metal3-io#458
Handle dual-stack configuration in inspection data
Fix manager namespace in docs
remove convenience methods from BareMetalHost
When a call to the provisioner succeeds rather than returning an error message, that's a good sign, and a reason to not have an error message set in the Host object. But it doesn't guarantee success: if the previous failure came at the end of some long-running operation (rather than outright rejection at the beginning), it could yet fail in exactly the same way. Clearing the ErrorType as soon as we start an operation allows us to use that field to determine whether to force the provisioner to retry in the presence of an error. (It will only be set if the last thing we did was record an error, therefore if we see it then we are at the beginning of a new retry.) One known issue with this is that if a request to change state in Ironic results in a 409 Conflict response, this just sets the Dirty flag and is indistinguishable from success when viewed outside the Ironic provisioner. If such a conflict occurs, we will effectively skip a retry attempt, increment the ErrorCount again, and sleep for even longer before the next attempt. To ensure that we actually do exponential backoff between retries, leave the ErrorCount unchanged until the whole operation actually succeeds (Dirty is no longer true). This is now decoupled from the ClearError() method that clears the ErrorType and ErrorMessage. As much as possible, do the clearing of ErrorCount in the host state machine. The exception is the steady states where we only do power management - the actionResult types currently lack enough detail to distinguish when count should be cleared when viewed from the state machine. In power management states (Ready/Available, Provisioned, Externally Provisioned), count the number of errors of any type since the power state was last successfully changed. Otherwise, the ErrorCount is cleared when an operation succesfully completes. Successful re-registration or adoption is never sufficient to clear the error count, except in the registration state.
Once we see an error in the Node, it returns to the 'enroll' [sic] state and we don't have a way of determining whether we have seen and saved that error or not. Previously we always assumed we had not, and didn't retry the validation unless the credentials had changed. Add a force flag to indicate that this is a new attempt and should start again. Fixes metal3-io#739
If inspection fails at some point before it is actually started in ironic-inspector, we were just repeatedly retrying it instead of setting an error. Instead, set an error and retry only after a backoff.
Error is the state that the Node goes into if deleting (i.e. deprovisioning) it fails. The only valid action from this state is to try deleting again, so do that rather than attempt to go straight to manageable.
If deprovisioning fails, we were just repeatedly retrying it instead of setting an error. Instead, set an error and retry only after a backoff. If we are deprovisioning because of a deletion, give up after 3 retries (since this may indicate that the host simply doesn't exist any more) and allow the Host to be deleted.
When registering a new host, we check if its bootMACAddress conflicts with any existing Ironic nodes. This procedure queries Ironic ports, and if a port is found, we use the port's NodeUUID to find the conflicting node. However, we are using `ports.List()` which only returns minimal resources by default; e.g. any returned ports will have their NodeUUID property set to an empty string. We are now explicitly asking for Ironic to include the `node_uuid` field. We also adjust the mock test mechanism to account for this.
Add missing newline in console log
Ironic: Don't adopt after clean failure during deprovisioning
Ask Ironic to include NodeUUID when querying ports
If we can't find a node by name, and have to resort to using MAC address lookup, we can't accept a node that has a name. In that scenario, we have detected a MAC address collision, and human intervention is required. We signal this by returning an `operationFailed` provisioning result with the offending MAC address in the error message.
Fail registration when boot MAC conflicts
Limit the number of hosts simultaneously provisioned
Add "NET_RAW" to ironic-endpoint-keepalived
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: honza The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-metal-ipi-virtualmedia |
|
/test e2e-metal-ipi |
|
/test e2e-metal-assisted Not sure if that one's here, but we probably want to check it. |
|
@stbenjam: The specified target(s) for
Use
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/test e2e-metal-ipi-ovn-dualstack |
|
openshift/cluster-baremetal-operator#100 is now merged so let's try a fresh build /retest |
|
/test e2e-meta-ipi |
|
@stbenjam: The specified target(s) for
Use
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/test e2e-metal-ipi |
|
/retest |
1 similar comment
|
/retest |
|
@honza: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
Superseded by #125 |
Here is the commit without the vendor update.
cc @stbenjam