baremetal: Prevent race condition when adding HardwareDetails #3809

Merged
merged 1 commit into from Jul 23, 2020

Conversation

zaneb
Member

@zaneb zaneb commented Jun 25, 2020

We add HardwareDetails to a BareMetalHost resource by including them in
initial Status data that is attached to the Host as an annotation. This
annotation must be present before the BareMetalHost is first reconciled
(thus creating the Status subresource), otherwise it will be ignored.

It is not possible to add the annotation at the time the BareMetalHost
is created. Therefore, we were previously relying on it being added
before the baremetal-operator is up and running. In practice this works
because the baremetal-operator takes a long time to start up, but in
fact this is a race.

Prevent the race by setting the baremetalhost.metal3.io/paused
annotation on the CR at the time it is created, and removing it again
when the status annotation is added. This annotation prevents the
operator from attempting to reconcile the resource.
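
The sequence described above can be sketched in Go as follows. This is a minimal illustration only, not the installer's actual code: the `host` type and the helper functions are hypothetical, and the status annotation key is an assumption, since the description names only the `baremetalhost.metal3.io/paused` key.

```go
package main

import "fmt"

const (
	// Annotation key taken from the PR description.
	pausedAnnotation = "baremetalhost.metal3.io/paused"
	// Assumed key for the initial-status annotation; the PR text does not name it.
	statusAnnotation = "baremetalhost.metal3.io/status"
)

// host is a hypothetical stand-in for the metadata of a BareMetalHost CR.
type host struct {
	Name        string
	Annotations map[string]string
}

// newPausedHost creates the host with the paused annotation already set,
// so there is no window in which it could be reconciled without its status.
func newPausedHost(name string) *host {
	return &host{
		Name:        name,
		Annotations: map[string]string{pausedAnnotation: ""},
	}
}

// attachStatus adds the status annotation and removes the pause together.
func attachStatus(h *host, statusJSON string) {
	h.Annotations[statusAnnotation] = statusJSON
	delete(h.Annotations, pausedAnnotation)
}

// reconcilable mimics the check described above: a host carrying the
// paused annotation is skipped, whatever the annotation's value.
func reconcilable(h *host) bool {
	_, paused := h.Annotations[pausedAnnotation]
	return !paused
}

func main() {
	h := newPausedHost("master-0")
	fmt.Println(reconcilable(h)) // false: paused at creation

	attachStatus(h, `{"hardware":{}}`)
	fmt.Println(reconcilable(h)) // true: status present, pause removed
}
```

The point of the ordering is that the host is never both unpaused and missing its status annotation, regardless of how quickly the operator starts.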

@zaneb
Member Author

zaneb commented Jun 25, 2020

/assign @stbenjam

@stbenjam
Member

Nice. How concerned are we about actually hitting the race? Should we backport this to 4.5/4.4?

@zaneb
Member Author

zaneb commented Jun 25, 2020

How concerned are we about actually hitting the race?

You tell me ;)

@stbenjam
Member

/test e2e-metal-ipi

3 similar comments
@stbenjam
Member

/test e2e-metal-ipi

@stbenjam
Member

/test e2e-metal-ipi

@stbenjam
Member

/test e2e-metal-ipi

@dhellmann
Contributor

/retest

2 similar comments
@zaneb
Member Author

zaneb commented Jul 13, 2020

/retest

@zaneb
Member Author

zaneb commented Jul 15, 2020

/retest

@stbenjam
Member

/test e2e-metal-ipi

1 similar comment
@stbenjam
Member

/test e2e-metal-ipi

@stbenjam
Member

Are we sure this is working properly? The failure looks like the workers didn't come up.

/test e2e-metal-ipi

@@ -66,6 +66,9 @@ func Hosts(config *types.InstallConfig, machines []machineapi.Machine) (*HostSet
            ObjectMeta: metav1.ObjectMeta{
                Name:      host.Name,
                Namespace: "openshift-machine-api",
                // Pause reconciliation until we can annotate with the initial
                // status containing the HardwareDetails
                Annotations: map[string]string{"baremetalhost.metal3.io/paused": ""},
Contributor

This annotation should only be set on masters. Moving it down to the block starting on line 84 should accomplish that.
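
One way to read this suggestion is sketched below. It is an illustration under stated assumptions, not the change that was merged: `hostSpec`, its `IsMaster` flag, and `annotationsFor` are hypothetical names, and the real installer may identify control-plane hosts differently.

```go
package main

import "fmt"

// Annotation key taken from the diff under review.
const pausedAnnotation = "baremetalhost.metal3.io/paused"

// hostSpec is a hypothetical, simplified host description.
// IsMaster is an assumed flag marking control-plane hosts.
type hostSpec struct {
	Name     string
	IsMaster bool
}

// annotationsFor returns the initial annotations for a host.
// Only masters are paused; all other hosts get no paused annotation.
func annotationsFor(h hostSpec) map[string]string {
	annotations := map[string]string{}
	if h.IsMaster {
		annotations[pausedAnnotation] = ""
	}
	return annotations
}

func main() {
	for _, h := range []hostSpec{
		{Name: "master-0", IsMaster: true},
		{Name: "worker-0", IsMaster: false},
	} {
		_, paused := annotationsFor(h)[pausedAnnotation]
		fmt.Printf("%s paused=%v\n", h.Name, paused)
	}
}
```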

@stbenjam
Member

/label platform/baremetal
/approve

@openshift-ci-robot openshift-ci-robot added the platform/baremetal IPI bare metal hosts platform label Jul 22, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 22, 2020
@stbenjam
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Contributor

@zaneb: The following tests failed, say /retest to rerun all failed tests:

Test name             Commit   Details  Rerun command
ci/prow/e2e-crc       1f04061  link     /test e2e-crc
ci/prow/e2e-aws-fips  1f04061  link     /test e2e-aws-fips
ci/prow/e2e-libvirt   1f04061  link     /test e2e-libvirt

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 9832b74 into openshift:master Jul 23, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. platform/baremetal IPI bare metal hosts platform
6 participants