Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add validation webhook for AWSMachine #1218

Merged

Conversation

tahsinrahman
Copy link
Contributor

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 15, 2019
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 15, 2019
Complete()
}

// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=vawsmachine.kb.io
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we can override the odd vawsmachine.kb.io naming through this annotation... Could we name it validating.awsmachine.x-k8s.io here instead? @ncdc @vincepri thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably would be clearer. We'll have to remember to make this change for every webhook.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ncdc we could add a quick script to validate the default naming isn't used for CI. Might be able to even use something as naive as: if git --no-pager grep kb.io; then echo "kb.io type prefix found"; exit 1; fi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a PR for renaming the validation webhook for AWSMachineTemplate here, and we can discuss more on that PR: #1219

Complete()
}

// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=vawsmachine.kb.io
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=vawsmachine.kb.io
// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=validation.awsmachine.infrastructure.x-k8s.io

@tahsinrahman tahsinrahman force-pushed the validate-awsmachine branch 2 times, most recently from ed244d3 to 3f6ef4e Compare October 19, 2019 06:05
@ncdc ncdc self-assigned this Oct 21, 2019
@ncdc ncdc added this to the v0.5.0 milestone Oct 21, 2019
func (r *AWSMachine) ValidateUpdate(old runtime.Object) error {
oldAWSMachine := old.(*AWSMachine)
if !reflect.DeepEqual(r.Spec, oldAWSMachine.Spec) {
return errors.New("awsMachineSpec is immutable")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return errors.New("awsMachineSpec is immutable")
return field.Forbidden(field.NewPath("spec"), "cannot be modified")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we'll need to allow updates to at least a subset of the spec (controller setting awsmachine.Spec.ProviderID for example).

We could probably allow updates to the AdditionalTags and AdditionalSecurityGroups fields safely as well (since those should be updated on each reconcile).

It might also be helpful for people that are troubleshooting issues with initial instance creation to be able to modify fields like the AMI, ImageLookupOrg, InstanceType, IAMInstanceProfile, AvailabilityZone, Subnet, SSHKeyName, NetworkInterfaces, etc assuming the instance doesn't exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to allow all updates as long as we haven't created an ec2 instance yet? We might need to worry about race conditions such as:

  1. Simultaneously:
    1. Reconciler receives AWSMachine at ResourceVersion=1
    2. User quickly gets in some updates, now ResourceVersion=2
  2. Reconciler creates ec2 instance using data from ResourceVersion=1 (maybe it's slow today)
  3. Reconciler sees updated AWSMachine at ResourceVersion=2
    1. But it can't do anything at this point, so the data doesn't match what happened in ec2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ncdc as long as we don't necessarily rely on something like the providerID being set on the awsmachine.Spec, and fallback to doing a describe for the instance we should be able to avoid race conditions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@detiber I don't think that helps what I described above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ncdc our current implementation is also susceptible to this race condition you mentioned, right? so if we move the logic that's now deleted in this PR into the webhook validation code, it can also face this scenario?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tahsinrahman you are correct. I guess this comes back to my question from https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/1218/files#r344289646: is it better to allow updates with the possibility of a race, or err on the side of caution and reject all updates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the side of disallowing all updates for now and see how it affects user experience. But lets come to an agreement on what to do! :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm, seems like the general consensus is to disallow updates for now :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have consensus!

@ncdc
Copy link
Contributor

ncdc commented Nov 7, 2019

1 small comment. @detiber @vincepri how does this look to you?

Complete()
}

// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=validation.awsmachine.infrastructure.x-k8s.io
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=validation.awsmachine.infrastructure.x-k8s.io
// +kubebuilder:webhook:verbs=create;update,path=/validate-infrastructure-cluster-x-k8s-io-v1alpha3-awsmachine,mutating=false,failurePolicy=fail,groups=infrastructure.cluster.x-k8s.io,resources=awsmachines,versions=v1alpha3,name=validation.awsmachine.infrastructure.cluster.x-k8s.io

func (r *AWSMachine) ValidateUpdate(old runtime.Object) error {
oldAWSMachine := old.(*AWSMachine)
if !reflect.DeepEqual(r.Spec, oldAWSMachine.Spec) {
return errors.New("awsMachineSpec is immutable")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we'll need to allow updates to at least a subset of the spec (controller setting awsmachine.Spec.ProviderID for example).

We could probably allow updates to the AdditionalTags and AdditionalSecurityGroups fields safely as well (since those should be updated on each reconcile).

It might also be helpful for people that are troubleshooting issues with initial instance creation to be able to modify fields like the AMI, ImageLookupOrg, InstanceType, IAMInstanceProfile, AvailabilityZone, Subnet, SSHKeyName, NetworkInterfaces, etc assuming the instance doesn't exist.

Copy link
Contributor

@chuckha chuckha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be documenting anything on this PR? Such as what validations we are enforcing and how to set up the validation web hook? Or is everything opaque to the user and no documentation is required?

@ncdc
Copy link
Contributor

ncdc commented Nov 13, 2019

@chuckha the examples already set up cert-manager and webhooks. However, ultimately we need more comprehensive documentation around webhook setup, including how to use something other than cert-manager. I'd like to add that to kubernetes-sigs/cluster-api#1757.

As far as documenting what validations we're enforcing, this will disallow all updates to AWSMachine.Spec... Although we have to address @detiber's comment:

I suspect we'll need to allow updates to at least a subset of the spec (controller setting awsmachine.Spec.ProviderID for example).

And potentially:

We could probably allow updates to the AdditionalTags and AdditionalSecurityGroups fields safely as well (since those should be updated on each reconcile).

I'm not sure where the best place is to document this. Any ideas?

@detiber
Copy link
Member

detiber commented Nov 13, 2019

Should we be documenting anything on this PR? Such as what validations we are enforcing and how to set up the validation web hook? Or is everything opaque to the user and no documentation is required?

We should likely add a doc similar to https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/book/src/providers/v1alpha2-to-v1alpha3.md and add an item discussing the more strict validation there.

I don't think we should have to inform users how to deploy the validation webhook, since it will be deployed as part of the published provider components yaml.

@chuckha
Copy link
Contributor

chuckha commented Nov 13, 2019

Should we be documenting anything on this PR? Such as what validations we are enforcing and how to set up the validation web hook? Or is everything opaque to the user and no documentation is required?

We should likely add a doc similar to https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/book/src/providers/v1alpha2-to-v1alpha3.md and add an item discussing the more strict validation there.

I don't think we should have to inform users how to deploy the validation webhook, since it will be deployed as part of the published provider components yaml.

I think we could have it in that doc for the current upgrade. More informational changes to be aware of. Perhaps changes to your existing YAML you may need to make.

Then, I was thinking about future uses and where I'd go to find out validation rules. I'd probably look at either godoc or reference documentation (like the k8s reference docs). So godoc seems like a good place to put this from my perspective

@tahsinrahman
Copy link
Contributor Author

Do I have to manually go through each fields in awsMachine.spec one by one? or is there any helper functions that'll allow me to check only providerID, AdditionalTags and AdditionalSecurityGroups is changed?

@ncdc
Copy link
Contributor

ncdc commented Nov 14, 2019

Oh and we'd presumably want to allow annotations and labels to change too.

I think if you want an "easy" way, you could deep-copy both old & new, clear our the fields that we allow to be changed in both of the copies, and then compare with reflect.DeepEqual... maybe?

main.go Outdated Show resolved Hide resolved
delete(newAWSMachineSpec, "additionalSecurityGroups")

if !reflect.DeepEqual(oldAWSMachineSpec, newAWSMachineSpec) {
return field.Forbidden(field.NewPath("spec"), "cannot be modified")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just learned about a better way to return validation errors - please see https://book.kubebuilder.io/cronjob-tutorial/webhook-implementation.html / xref kubernetes-sigs/cluster-api#1795 too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ncdc sorry for being late. I updated the pr, ptal :)

func (r *AWSMachine) ValidateUpdate(old runtime.Object) error {
newAWSMachine, err := runtime.DefaultUnstructuredConverter.ToUnstructured(r)
if err != nil {
return errors.Wrap(err, "failed to convert new AWSMachine to unstructured object")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return apierrors.NewInvalid() here?

}
oldAWSMachine, err := runtime.DefaultUnstructuredConverter.ToUnstructured(old)
if err != nil {
return errors.Wrap(err, "failed to convert old AWSMachine to unstructured object")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return apierrors.NewInvalid() here?

tests := []struct {
name string
old *AWSMachine
new *AWSMachine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I usually try to avoid using new as a variable name, given that it's a go keyword. If you want to change the name, you could use something like updated instead.

@detiber
Copy link
Member

detiber commented Dec 2, 2019

/approve
leaving lgtm to @ncdc

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: detiber, tahsinrahman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2019
@ncdc
Copy link
Contributor

ncdc commented Dec 3, 2019

@tahsinrahman if you'd like, we can merge as-is and I can open a small follow-up PR to address my remaining comments. Please let me know your preference. Thanks!

@tahsinrahman
Copy link
Contributor Author

tahsinrahman commented Dec 3, 2019

@tahsinrahman if you'd like, we can merge as-is and I can open a small follow-up PR to address my remaining comments. Please let me know your preference. Thanks!

oops, i somehow missed your review. let me update it in this pr

update: @ncdc updated!

@ncdc
Copy link
Contributor

ncdc commented Dec 3, 2019

Thanks!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 3, 2019
@k8s-ci-robot k8s-ci-robot merged commit 06ec59c into kubernetes-sigs:master Dec 3, 2019
@tahsinrahman tahsinrahman deleted the validate-awsmachine branch December 3, 2019 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants