New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Add validation webhook for AWSMachine #1218
✨ Add validation webhook for AWSMachine #1218
Conversation
api/v1alpha3/awsmachine_webhook.go
Outdated
Complete() | ||
} | ||
|
||
// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=vawsmachine.kb.io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably would be clearer. We'll have to remember to make this change for every webhook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ncdc we could add a quick script to validate the default naming isn't used for CI. Might be able to even use something as naive as: if git --no-pager grep kb.io; then echo "kb.io type prefix found"; exit 1; fi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a PR for renaming the validation webhook for AWSMachineTemplate here, and we can discuss more on that PR: #1219
api/v1alpha3/awsmachine_webhook.go
Outdated
Complete() | ||
} | ||
|
||
// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=vawsmachine.kb.io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=vawsmachine.kb.io | |
// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=validation.awsmachine.infrastructure.x-k8s.io |
ed244d3
to
3f6ef4e
Compare
api/v1alpha3/awsmachine_webhook.go
Outdated
func (r *AWSMachine) ValidateUpdate(old runtime.Object) error { | ||
oldAWSMachine := old.(*AWSMachine) | ||
if !reflect.DeepEqual(r.Spec, oldAWSMachine.Spec) { | ||
return errors.New("awsMachineSpec is immutable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return errors.New("awsMachineSpec is immutable") | |
return field.Forbidden(field.NewPath("spec"), "cannot be modified") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we'll need to allow updates to at least a subset of the spec (controller setting awsmachine.Spec.ProviderID
for example).
We could probably allow updates to the AdditionalTags
and AdditionalSecurityGroups
fields safely as well (since those should be updated on each reconcile).
It might also be helpful for people that are troubleshooting issues with initial instance creation to be able to modify fields like the AMI, ImageLookupOrg, InstanceType, IAMInstanceProfile, AvailabilityZone, Subnet, SSHKeyName, NetworkInterfaces, etc assuming the instance doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to allow all updates as long as we haven't created an ec2 instance yet? We might need to worry about race conditions such as:
- Simultaneously:
- Reconciler receives AWSMachine at ResourceVersion=1
- User quickly gets in some updates, now ResourceVersion=2
- Reconciler creates ec2 instance using data from ResourceVersion=1 (maybe it's slow today)
- Reconciler sees updated AWSMachine at ResourceVersion=2
- But it can't do anything at this point, so the data doesn't match what happened in ec2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ncdc as long as we don't necessarily rely on something like the providerID being set on the awsmachine.Spec, and fallback to doing a describe for the instance we should be able to avoid race conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@detiber I don't think that helps what I described above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ncdc our current implementation is also susceptible to this race condition you mentioned, right? so if we move the logic that's now deleted in this PR into the webhook validation code, it can also face this scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tahsinrahman you are correct. I guess this comes back to my question from https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/1218/files#r344289646: is it better to allow updates with the possibility of a race, or err on the side of caution and reject all updates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the side of disallowing all updates for now and see how it affects user experience. But lets come to an agreement on what to do! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm, seems like the general consensus is to disallow updates for now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have consensus!
api/v1alpha3/awsmachine_webhook.go
Outdated
Complete() | ||
} | ||
|
||
// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=validation.awsmachine.infrastructure.x-k8s.io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=validation.awsmachine.infrastructure.x-k8s.io | |
// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=validation.awsmachine.infrastructure.cluster.x-k8s.io |
api/v1alpha3/awsmachine_webhook.go
Outdated
func (r *AWSMachine) ValidateUpdate(old runtime.Object) error { | ||
oldAWSMachine := old.(*AWSMachine) | ||
if !reflect.DeepEqual(r.Spec, oldAWSMachine.Spec) { | ||
return errors.New("awsMachineSpec is immutable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we'll need to allow updates to at least a subset of the spec (controller setting awsmachine.Spec.ProviderID
for example).
We could probably allow updates to the AdditionalTags
and AdditionalSecurityGroups
fields safely as well (since those should be updated on each reconcile).
It might also be helpful for people that are troubleshooting issues with initial instance creation to be able to modify fields like the AMI, ImageLookupOrg, InstanceType, IAMInstanceProfile, AvailabilityZone, Subnet, SSHKeyName, NetworkInterfaces, etc assuming the instance doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be documenting anything on this PR? Such as what validations we are enforcing and how to set up the validation web hook? Or is everything opaque to the user and no documentation is required?
@chuckha the examples already set up cert-manager and webhooks. However, ultimately we need more comprehensive documentation around webhook setup, including how to use something other than cert-manager. I'd like to add that to kubernetes-sigs/cluster-api#1757. As far as documenting what validations we're enforcing, this will disallow all updates to AWSMachine.Spec... Although we have to address @detiber's comment:
And potentially:
I'm not sure where the best place is to document this. Any ideas? |
We should likely add a doc similar to https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/book/src/providers/v1alpha2-to-v1alpha3.md and add an item discussing the more strict validation there. I don't think we should have to inform users how to deploy the validation webhook, since it will be deployed as part of the published provider components yaml. |
I think we could have it in that doc for the current upgrade. More informational changes to be aware of. Perhaps changes to your existing YAML you may need to make. Then, I was thinking about future uses and where I'd go to find out validation rules. I'd probably look at either godoc or reference documentation (like the k8s reference docs). So godoc seems like a good place to put this from my perspective |
3f6ef4e
to
6c45d2d
Compare
Do I have to manually go through each fields in |
Oh and we'd presumably want to allow annotations and labels to change too. I think if you want an "easy" way, you could deep-copy both old & new, clear our the fields that we allow to be changed in both of the copies, and then compare with |
6c45d2d
to
dc50585
Compare
api/v1alpha3/awsmachine_webhook.go
Outdated
delete(newAWSMachineSpec, "additionalSecurityGroups") | ||
|
||
if !reflect.DeepEqual(oldAWSMachineSpec, newAWSMachineSpec) { | ||
return field.Forbidden(field.NewPath("spec"), "cannot be modified") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just learned about a better way to return validation errors - please see https://book.kubebuilder.io/cronjob-tutorial/webhook-implementation.html / xref kubernetes-sigs/cluster-api#1795 too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ncdc sorry for being late. I updated the pr, ptal :)
dc50585
to
ee472bf
Compare
api/v1alpha3/awsmachine_webhook.go
Outdated
func (r *AWSMachine) ValidateUpdate(old runtime.Object) error { | ||
newAWSMachine, err := runtime.DefaultUnstructuredConverter.ToUnstructured(r) | ||
if err != nil { | ||
return errors.Wrap(err, "failed to convert new AWSMachine to unstructured object") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return apierrors.NewInvalid()
here?
api/v1alpha3/awsmachine_webhook.go
Outdated
} | ||
oldAWSMachine, err := runtime.DefaultUnstructuredConverter.ToUnstructured(old) | ||
if err != nil { | ||
return errors.Wrap(err, "failed to convert old AWSMachine to unstructured object") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return apierrors.NewInvalid()
here?
tests := []struct { | ||
name string | ||
old *AWSMachine | ||
new *AWSMachine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I usually try to avoid using new
as a variable name, given that it's a go keyword. If you want to change the name, you could use something like updated
instead.
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: detiber, tahsinrahman The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@tahsinrahman if you'd like, we can merge as-is and I can open a small follow-up PR to address my remaining comments. Please let me know your preference. Thanks! |
oops, i somehow missed your review. let me update it in this pr update: @ncdc updated! |
ee472bf
to
af1e8a3
Compare
Thanks! |
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #