Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge upstream 2021-04-06 #142

Merged

Conversation

honza
Copy link
Member

@honza honza commented Apr 6, 2021

cc @hardys

namnx228 and others added 30 commits March 30, 2021 12:44
Always use the shortcut p.status in place of p.host.Status.Provisioning.
Store the ObjectMeta as a separate field, so that we can eventually stop
passing the whole BareMetalHost object to provisioners.
Store the provisioning ID as a separate field, so that we can eventually
stop passing the whole BareMetalHost object to provisioners.
The BootMACAddress is only used in registering the host, so pass it
explicitly to ValidateManagementAccess().
The HostConfigData is only one of several things we will want to pass in
the future, so encapsulate it inside a struct.
Don't rely on the BootMode in the BareMetalHost, but instead pass it
explicitly to those Provisioner methods that require it
(ValidateManagementAccess, InspectHardware, and Provision).

This also resolves an inconsistency where we were sometimes looking at
the BootMode in the Spec and sometimes the one in the Status. We now
look only at the one in the Status. To allow this to do something useful
during Registration, also copy the BootMode from the Spec to the Status
as part of entering Registration (previously this did not happen until
entering Inspecting).
Where we are accessing parts of the BareMetalHost structure in order to
log not-particularly-interesting data, stop it.
Pass the Image to be provisioned to Provision(), and any current image
(provisioned or provisioning) to ValidateManagementAccess().
In most cases it is obvious from the call we are making which state the
host is in. However, ValidateManagementAccess is called from
substantially every state, and Adopt is also called from multiple
states. In those cases, pass the current state explicitly.
When I wrote this I had no idea what I was talking about. Rewrite it to
more accurately reflect how Adopt() is actually used by the controller.
When creating a provisioner, pass only the data that may be necessary for any
call. Avoid passing the entire BareMetalHost object, so that
provisioners are forced to use only data that is passed explicitly to
each method.
Although the ironic provisioner uses the BootMACAddress only during
registration, conceptually it can be regarded as part of the identity of
the Host and therefore potentially of use to the provisioner in any call
that it handles. Therefore pass it to the provisioner factory in the
HostData so it is always available to the provisioner.
Fix the bug that the inspector reverse proxy still runs even when Ironic TLS is disabled
In the patch 2d02462 we were supposed
to report an error and wait for an exponentially increasing backoff time
before retrying. However, due to a missing early return, we are retrying
immediately with the result that the error gets cleared in fairly short
order.

Return early on failure as intended. Also, use the operationFailed()
convenience function instead of ad hoc updates to the return variables;
this case was likely missed due to the lack of a return.
Currently we only handle the case where the json fails to parse
e.g is invalid json, but we silently ignore any value which is
valid data but fails to unpack into the HardwareDetails object.

In this case we should avoid removing the annotation since it's
not been copied into the status.hardware, and return an error
so the user can detect that the json data wasn't in the expected
format.
Don't immediately retry on Inspect fail
Define the inspector kernel parameter to include vlan interfaces
and associated IP addresses in the introspection report.
This will set 'name' field of metaData as bmh name, otherwise by default ironic
name gets set as hostname by default and ~ in hostname seems to be problematic
for kubeadm during cloudinit nodeRegistration.
Handle hardwaredetails annotation value error
@honza
Copy link
Member Author

honza commented Apr 7, 2021

/test e2e-metal-ipi-ovn-ipv6

@honza honza force-pushed the merge-upstream-2021-04-06 branch from b321903 to 958c62d Compare April 7, 2021 17:23
@stbenjam
Copy link
Member

stbenjam commented Apr 7, 2021

/test e2e-metal-ipi
/test e2e-metal-ipi-virtualmedia
/test e2e-metal-ipi-ovn-ipv6

@hardys
Copy link

hardys commented Apr 8, 2021

It seems suspicious that all the tests have failed several times, but AFAICS from the logs the workers did get provisioned and there's no error output in the BMO logs e.g:

2021-04-08T00:27:36.294702915Z {"level":"info","ts":1617841656.2936947,"logger":"controllers.BareMetalHost","msg":"verified access to the BMC","baremetalhost":"openshift-machine-api/ostest-worker-0","provisioningState":"provisioned"}
2021-04-08T00:27:36.357499265Z {"level":"info","ts":1617841656.3574355,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/ostest-worker-0","provisioningState":"provisioned","requeue":true,"after":60}

@hardys
Copy link

hardys commented Apr 8, 2021

/retest

Signed-off-by: s3rj1k <evasive.gyron@gmail.com>
@stbenjam
Copy link
Member

stbenjam commented Apr 8, 2021

[sig-installer][Feature:baremetal] Baremetal platform should have baremetalhost resources [Suite:openshift/conformance/parallel] seems to be consistently failing, that is concerning and needs looking at

@hardys
Copy link

hardys commented Apr 8, 2021

That's odd - I see them in the must-gather, and the BMO says the workers were deployed 🤔

metal3-io-bot and others added 2 commits April 8, 2021 08:27
Use PROVISIONING_LIMIT also for deprovisioning.
In 269ff1e we stopped potentially
making use of the image from the Spec in the provisioner when there is
no image in the Status, except for the Provisioning state where we want
to register with the image we are provisioning even though it hasn't
completed yet.

We also need to make an exception for the ExternallyProvisioned state,
because in that state the image details are never copied to the Status.

This relies on the change in c0373bb,
which ensures that the image data can be added to the Node once the Host
reaches the ExternallyProvisioned state. Prior to this, the image data
was only added when the Node was created, so we would have had to pass
the image details starting with the first call to
ValidateManagementAccess() in the Registration state.
@zaneb
Copy link
Member

zaneb commented Apr 8, 2021

I think metal3-io#849 should fix this.

@honza
Copy link
Member Author

honza commented Apr 8, 2021

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 8, 2021
@zaneb
Copy link
Member

zaneb commented Apr 8, 2021

/test e2e-metal-ipi
/test e2e-metal-ipi-virtualmedia
/test e2e-metal-ipi-serial-ipv4

@zaneb
Copy link
Member

zaneb commented Apr 8, 2021

/test e2e-metal-ipi
/test e2e-metal-ipi-virtualmedia

@stbenjam
Copy link
Member

stbenjam commented Apr 8, 2021

Looks like upstream prow will be broken until at least tomorrow, we should probably just land this when CI is happy, and carry Zane's patch here.

@stbenjam
Copy link
Member

stbenjam commented Apr 9, 2021

/test e2e-metal-ipi
/test e2e-metal-ipi-virtualmedia

@zaneb
Copy link
Member

zaneb commented Apr 9, 2021

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 9, 2021
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: honza, zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zaneb
Copy link
Member

zaneb commented Apr 9, 2021

I agree with Stephen's suggestion. Carrying the patch until the upstream CI is stood up again won't hurt. CI here shows that it fixes the bug, so nothing is likely to change in the upstream PR and the subsequent merge will be trivial.
/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 9, 2021
@openshift-ci
Copy link

openshift-ci bot commented Apr 9, 2021

@honza: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-metal-ipi 5e88bc2 link /test e2e-metal-ipi

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 31b989a into openshift:master Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet