Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

baremetal: Always use image cache #5008

Merged
merged 1 commit into from Jul 2, 2021

Conversation

zaneb
Copy link
Member

@zaneb zaneb commented Jun 16, 2021

The metal3-image-cache daemonset is started unconditionally by the
cluster-baremetal-operator, serving on port 6181. However, we were only
pointing at this service in the MachineConfig if the provisioning
network was not present.

Given that the MachineConfig is fixed at install time while the
provisioning network can be added or removed by changing the
Provisioning resource, we should always use the cache.

This also means the cache will get exercised in CI, where previously it
was not.

The metal3-image-cache daemonset is started unconditionally by the
cluster-baremetal-operator, serving on port 6181. However, we were only
pointing at this service in the MachineConfig if the provisioning
network was not present.

Given that the MachineConfig is fixed at install time while the
provisioning network can be added or removed by changing the
Provisioning resource, we should always use the cache.

This also means the cache will get exercised in CI, where previously it
was not.
@openshift-ci openshift-ci bot requested review from markmc and stbenjam June 16, 2021 21:35
@zaneb
Copy link
Member Author

zaneb commented Jun 16, 2021

/cc @asalkeld @sadasu @stbenjam @hardys

@stbenjam
Copy link
Member

The changes make sense to me, but if the MachineConfig is fixed, then do we end up trying to use the provisioning IP after the network is disabled?

Maybe we always need to use the API VIP too, however that's less than ideal since if the user has a provisioning network we want all that traffic to be isolated there.

@sadasu
Copy link
Contributor

sadasu commented Jun 17, 2021

/retest

@zaneb
Copy link
Member Author

zaneb commented Jun 17, 2021

The changes make sense to me, but if the MachineConfig is fixed, then do we end up trying to use the provisioning IP after the network is disabled?

Excellent point. Yes :(

I do think this still makes sense if only to make sure we're always exercising the cache in CI.

Maybe we always need to use the API VIP too, however that's less than ideal since if the user has a provisioning network we want all that traffic to be isolated there.

In general the API VIP shouldn't even be routable from the provisioning network if it exists.

@sadasu
Copy link
Contributor

sadasu commented Jun 17, 2021

/retest

@sadasu
Copy link
Contributor

sadasu commented Jun 18, 2021

The changes make sense to me, but if the MachineConfig is fixed, then do we end up trying to use the provisioning IP after the network is disabled?

Excellent point. Yes :(

Should this be handled in a separate PR? Can we change the MachineConfig when ProvisioningNetwork is disabled on day-2?

@sadasu
Copy link
Contributor

sadasu commented Jun 18, 2021

/lgtm
Fix proposed to the problem stated in the PR looks good to me.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2021
@sadasu
Copy link
Contributor

sadasu commented Jun 21, 2021

/retest

1 similar comment
@sadasu
Copy link
Contributor

sadasu commented Jun 24, 2021

/retest

@stbenjam
Copy link
Member

The changes make sense to me, but if the MachineConfig is fixed, then do we end up trying to use the provisioning IP after the network is disabled?

Excellent point. Yes :(

Should this be handled in a separate PR? Can we change the MachineConfig when ProvisioningNetwork is disabled on day-2?

@zaneb Do you know the answer to this? ^^

@zaneb
Copy link
Member Author

zaneb commented Jun 24, 2021

That would need to be handled in CBO. We should probably look to address it as part of https://issues.redhat.com/browse/MPINSTALL-1 maybe.

@hardys
Copy link
Contributor

hardys commented Jul 2, 2021

The changes make sense to me, but if the MachineConfig is fixed, then do we end up trying to use the provisioning IP after the network is disabled?

I wonder if we should support going from enabled->disabled?

I've definitely heard requirements to deploy without any provisioning network, then enable it as a day-2 choice, but I've not heard any requirements to do the reverse flow (if you PXE booted all your hosts on day-1, you'll always need the provisioning network to support node rebuild/replacement right?)

That said I agree always using the cache makes sense, and we can deal with the potential network-disabled case in future if there turns out to be a requirement for that.

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 2, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hardys

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 2, 2021
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@zaneb
Copy link
Member Author

zaneb commented Jul 2, 2021

I've definitely heard requirements to deploy without any provisioning network, then enable it as a day-2 choice, but I've not heard any requirements to do the reverse flow (if you PXE booted all your hosts on day-1, you'll always need the provisioning network to support node rebuild/replacement right?)

I guess if you had a provisioning network but only used virtualmedia on it, then you might decide that was a mistake and want to disable to provisioning network later. You make a good point that supporting this should probably be nobody's top priority though 😆

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 2, 2021

@zaneb: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-crc fc99be5 link /test e2e-crc

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 976e3a6 into openshift:master Jul 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants