Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom deploy procedure support #156

Merged
merged 2 commits into from Jul 19, 2021
Merged

Custom deploy procedure support #156

merged 2 commits into from Jul 19, 2021

Conversation

kirankt
Copy link

@kirankt kirankt commented Jul 13, 2021

Add support for custom-deploy procedure. See:
metal3-io/baremetal-operator#884

@openshift-ci openshift-ci bot requested review from hardys and russellb July 13, 2021 02:55
@kirankt
Copy link
Author

kirankt commented Jul 13, 2021

/assign @dtantsur

@kirankt
Copy link
Author

kirankt commented Jul 13, 2021

/retest

1 similar comment
@kirankt
Copy link
Author

kirankt commented Jul 13, 2021

/retest

@dtantsur
Copy link
Member

I wonder if we need to update CAPM3 in a similar way...

@kirankt
Copy link
Author

kirankt commented Jul 13, 2021

/retest

2 similar comments
@kirankt
Copy link
Author

kirankt commented Jul 14, 2021

/retest

@kirankt
Copy link
Author

kirankt commented Jul 15, 2021

/retest

@kirankt
Copy link
Author

kirankt commented Jul 19, 2021

/retest

@hardys
Copy link

hardys commented Jul 19, 2021

/test e2e-metal-ipi-virtualmedia

@hardys
Copy link

hardys commented Jul 19, 2021

The virtualmedia job has never passed so I'm wondering if there's a real issue - in the latest run I noticed BMO was failing to start due to a port conflict, but that doesn't seem to be related to this change so retesting to confirm if that's repeatable

@openshift-ci
Copy link

openshift-ci bot commented Jul 19, 2021

@kirankt: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-metal-ipi-virtualmedia 1ea2b23 link /test e2e-metal-ipi-virtualmedia

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dtantsur
Copy link
Member

At least some tests have passed, so it's not necessarily a problem with this patch.

@@ -552,17 +554,23 @@ func (a *Actuator) provisionHost(ctx context.Context, host *bmh.BareMetalHost,
config *bmv1alpha1.BareMetalMachineProviderSpec) error {
originalHost := host.DeepCopy()

// We only want to update the image setting if the host does not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this is potentially breaking as it will cause re-deploys on upgrade. Maybe skip everything if host.Spec.Image != nil || host.Spec.CustomDeploy != nil?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was at best out of date and at worst nonsense. Kiran was right to delete it.
Hosts that are already provisioned/ing are not going to be selected anyway, so we will never get to this point.

// IsValid returns an error if the object is not valid, otherwise nil. The
// string representation of the error is suitable for human consumption.
func (s *BareMetalMachineProviderSpec) IsValid() error {
missing := []string{}
if s.Image.URL == "" {
if s.CustomDeploy.Method == "" && s.Image.URL == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

if s.CustomDeploy.Method == "" {
    return nil
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I think the whole IsValid function is now defunct. I thought about removing it since Image.URL/Checksum are now optional unlike before.

@kirankt
Copy link
Author

kirankt commented Jul 19, 2021

The virtualmedia job has never passed so I'm wondering if there's a real issue - in the latest run I noticed BMO was failing to start due to a port conflict, but that doesn't seem to be related to this change so retesting to confirm if that's repeatable

I don't think the failure is relate but I did notice that BMO was failing a few times in my test failing to bind to 9440 which is the liveness probe port for both BMO and CAPBM. Not sure why they are the same. My best guess is that when these two competing pods end up scheduled on the same control plane host, BMO fails because its the last to start.

https://github.com/openshift/baremetal-operator/blob/c9a3f263854478328765208f3e8109eafae8e5bd/main.go#L102

healthAddr := flag.String(

@zaneb
Copy link
Member

zaneb commented Jul 19, 2021

I did notice that BMO was failing a few times in my test failing to bind to 9440 which is the liveness probe port for both BMO and CAPBM. Not sure why they are the same.

That's only a problem if CAPBM is also running with --net=host. I can't think of a reason it should, but maybe it does?

Copy link
Member

@zaneb zaneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@@ -552,17 +554,23 @@ func (a *Actuator) provisionHost(ctx context.Context, host *bmh.BareMetalHost,
config *bmv1alpha1.BareMetalMachineProviderSpec) error {
originalHost := host.DeepCopy()

// We only want to update the image setting if the host does not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was at best out of date and at worst nonsense. Kiran was right to delete it.
Hosts that are already provisioned/ing are not going to be selected anyway, so we will never get to this point.

missing = append(missing, "Image.URL")
}
if s.Image.Checksum == "" {
if s.CustomDeploy.Method == "" && s.Image.Checksum == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably we should always check for a checksum when there is an image, not just if there is no custom deploy method. But for now there's no difference.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 19, 2021
@openshift-ci
Copy link

openshift-ci bot commented Jul 19, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kirankt, zaneb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 19, 2021
@openshift-merge-robot openshift-merge-robot merged commit 94eff91 into openshift:master Jul 19, 2021
@zaneb
Copy link
Member

zaneb commented Jul 20, 2021

I did notice that BMO was failing a few times in my test failing to bind to 9440 which is the liveness probe port for both BMO and CAPBM. Not sure why they are the same.

That's only a problem if CAPBM is also running with --net=host. I can't think of a reason it should, but maybe it does?

I checked and AFAICT it doesn't, but maybe something else does and uses the same default port.
Nobody, including BMO, owns this port, so at least 2 things are Doing It Wrong.

@kirankt
Copy link
Author

kirankt commented Jul 20, 2021

I did notice that BMO was failing a few times in my test failing to bind to 9440 which is the liveness probe port for both BMO and CAPBM. Not sure why they are the same.

That's only a problem if CAPBM is also running with --net=host. I can't think of a reason it should, but maybe it does?

I checked and AFAICT it doesn't, but maybe something else does and uses the same default port.
Nobody, including BMO, owns this port, so at least 2 things are Doing It Wrong.

You are right, but there are processes on master-0 that use ports 9440 and 9441 and AFAIK, only MAO and the its providers use these ports for healtz. I haven't been able to definitely locate the process/pod that runs on 9440, so I assumed it was CAPBM.. As long as CBO gets scheduled to the other control plane nodes, it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants