Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validating Webhook for BareMetalHost #855

Open
8 of 21 tasks
zaneb opened this issue Apr 13, 2021 · 21 comments
Open
8 of 21 tasks

Validating Webhook for BareMetalHost #855

zaneb opened this issue Apr 13, 2021 · 21 comments
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Indicates an issue is ready to be actively worked on.

Comments

@zaneb
Copy link
Member

zaneb commented Apr 13, 2021

There's a number of values or changes in a BMH Spec that we know are invalid, but that we don't prevent the user from commiting. A validating webhook would allow us to reject these values synchronously before the CR is written, thus relieving both us and the user from asynchronous error reporting.

This bug will serve as a place to record known invalid configs that should be checked by a webhook, until such time as it is implemented.

  • Hardware RAID + Software RAID requested simultaneously
  • Other invalid RAID config???
  • Resource name that is invalid in ironic (looks like a UUID or contains chars other than [A-Za-z0-9~._-]
  • Resource name that contains a ~ (used in ironic Node name to separate name + namespace)
  • Names that are invalid as part of a DNS hostname
  • Changing BMC Address once set
  • Changing Boot MAC once set
  • Invalid BMC Config
  • RAID requested when not supported by driver
  • Missing Boot MAC when required by driver
  • Status annotation added after Status subresource already exists (possible fix: 🌱 Bmh annotation validations #1120)
  • Status annotation containing invalid data (possible fix: 🌱 Bmh annotation validations #1120)
  • Hardware details annotation containing invalid data (possible fix: 🌱 Bmh annotation validations #1120)
  • Reboot annotation with invalid mode value (possible fix: 🌱 Bmh annotation validations #1120)
  • Inspect annotation with invalid value (possible fix: 🌱 Bmh annotation validations #1120)
  • Duplicate BMC addresses
  • Duplicate Boot MAC addresses
  • Externally Provisioned flag changed in a state where we can't give effect to it
  • Provisioning format live-iso used with non-virtualmedia driver
  • Hardware RAID physicalDisks specified without controller
  • Hardware RAID inconsistent numberOfPhysicalDisks and physicalDisks specified
@zaneb zaneb added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. labels Apr 13, 2021
@andfasano
Copy link
Member

  • Expected value on some annotations (ie inspect.metal3.io)?
  • It would be useful to capture all these checks in one point only, so that they could be shared by both the webhook and controller

@ardaguclu
Copy link
Contributor

@s3rj1k are you working on it?

@s3rj1k
Copy link
Member

s3rj1k commented Apr 16, 2021

@ardaguclu No, just watching issue for updates

@ardaguclu
Copy link
Contributor

/assign @ardaguclu

@zaneb zaneb removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 18, 2021
@hardys
Copy link
Member

hardys commented Sep 7, 2021

Changing Boot MAC once set

We had some discussion on this - it is valid to change this in some cases, for example if you deploy via virtualmedia it can potentially work with any MAC on the system, but if you later want to deploy via PXE (or as we're doing downstream, derive the provisioning-network NIC from this MAC) then it must be the correct nic.

There are reported cases e.g https://bugzilla.redhat.com/show_bug.cgi?id=2000081 of this being configured wrongly but deployment still succeeding, in which case it's likely we'd want to enable patching of the bootMACAddress to some other value (but we could still e.g validate that it's a valid mac for the system by comparing with the status.hardware data)

@zaneb
Copy link
Member Author

zaneb commented Sep 7, 2021

Changing Boot MAC once set

We had some discussion on this - it is valid to change this in some cases, for example if you deploy via virtualmedia it can potentially work with any MAC on the system, but if you later want to deploy via PXE (or as we're doing downstream, derive the provisioning-network NIC from this MAC) then it must be the correct nic.

Changing from virtualmedia->PXE means changing the driver, so it also means a new BMC address (which is never safe).

There are reported cases e.g https://bugzilla.redhat.com/show_bug.cgi?id=2000081 of this being configured wrongly but deployment still succeeding, in which case it's likely we'd want to enable patching of the bootMACAddress to some other value (but we could still e.g validate that it's a valid mac for the system by comparing with the status.hardware data)

Validating against the HardwareDetails is not a bad idea, but as it stands that would mean the user is effectively free to change it, since if they can change the boot MAC they can also change the HardwareDetails.

@hardys
Copy link
Member

hardys commented Sep 9, 2021

Changing from virtualmedia->PXE means changing the driver, so it also means a new BMC address (which is never safe).

Ok, but what if a non-integrated NIC gets replaced - should we force a user to redeploy, or to defer updating the BMH until some later date when redeploy is needed (and probably the NIC replacement is forgotten...)? IMO it's more logical to relax this validation and allow them to patch the bootMACAddress, but with some validation to prevent completely-wrong values?

Validating against the HardwareDetails is not a bad idea, but as it stands that would mean the user is effectively free to change it, since if they can change the boot MAC they can also change the HardwareDetails.

They could only change it with the inspect annotation right? In which case they have to explicitly disable inspection, so I think it's reasonable to say users get to keep the pieces if they do that, we can only validate against the HardwareDetails we have, so it's up to the user to provide correct data (or use the default path where we collect it)?

@zaneb
Copy link
Member Author

zaneb commented Sep 9, 2021

Forgot the HardwareDetails is in the Status, not the Spec. That's probably a reasonable trade-off then, although I'm not sure how it helps if the NIC is replaced.

The main thing we want to prevent is switching the MAC to point to some other host so that things get messed up in the ironic DB.

@hardys
Copy link
Member

hardys commented Sep 10, 2021

Forgot the HardwareDetails is in the Status, not the Spec. That's probably a reasonable trade-off then, although I'm not sure how it helps if the NIC is replaced.

Hmm, yeah that's true, they'd have to patch the status (either the hacky way, or via the inspect annotation)

The main thing we want to prevent is switching the MAC to point to some other host so that things get messed up in the ironic DB.

That would be enough to help with the downstream case mentioned, although clearly that's not directly relevant to BMO, and may have be rethought if we're not comfortable with the pattern of consuming bootMACAddress to identify the provisioning NIC.

@demonCoder95
Copy link
Member

/assign @demonCoder95

@demonCoder95
Copy link
Member

I'm going to take a stab at the last 2 validations related to RAID. @zaneb PR coming up soon.

@demonCoder95
Copy link
Member

I'm going to take a stab at the last 2 validations related to RAID. @zaneb PR coming up soon.

@zaneb PR up, as promised. #1070

@Rozzii
Copy link
Member

Rozzii commented Feb 3, 2022

/help
/triage accepted

@metal3-io-bot
Copy link
Contributor

@Rozzii:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help
/triage accepted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot metal3-io-bot added triage/accepted Indicates an issue is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Feb 3, 2022
@ardaguclu ardaguclu removed their assignment Mar 24, 2022
@metal3-io-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 22, 2022
@furkatgofurov7
Copy link
Member

/remove-lifecycle stale

@metal3-io-bot metal3-io-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 6, 2022
@Rozzii
Copy link
Member

Rozzii commented Jul 6, 2022

/lifecycle frozen

@metal3-io-bot metal3-io-bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jul 6, 2022
@demonCoder95
Copy link
Member

/unassign @demonCoder95

@Rozzii
Copy link
Member

Rozzii commented Nov 16, 2022

/remove-lifecycle frozen

@metal3-io-bot metal3-io-bot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Nov 16, 2022
@metal3-io-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 14, 2023
@zaneb
Copy link
Member Author

zaneb commented Feb 15, 2023

/lifecycle frozen
This is meant to serve as a tracking issue for people to co-ordinate on the various validations that could be added, so it will need to stay open for an extended period of time.

@metal3-io-bot metal3-io-bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. triage/accepted Indicates an issue is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

9 participants