Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't call service Validate / Prepare hooks twice #95666

Closed
wants to merge 19 commits into from

Conversation

thockin
Copy link
Member

@thockin thockin commented Oct 17, 2020

Service has an "outer" and "inner" REST handler. They use the same
strategy, but the outer calls into the inner, which caused Prepare and
Validate hooks to be called twice.

This change makes the "outer" strategy do nothing in those cases. One
net result of this is that the validation code can get a bit stricter,
since it will NEVER see a service before allocation. This is left for
later.

This whole mess can be simplified and flattened to a single REST layer,
which we can explore AFTER dual-stack merges

For posterity:

The "normal" flow seems to be:

mutating webhooks
generic REST store Create {
    BeforeCreate {
        strategy.PrepareForCreate {
            dropDisabledFields
        }
        strategy.Validate
        strategy.Canonicalize
    }
    createValidation (validating webhooks)
    storage Create
    AfterCreate
}

Service (before this PR) does:

mutating webhooks
svc custom Create {
    BeforeCreate {
        strategy.PrepareForCreate {
            dropDisabledFields
        }
        strategy.Validate
        strategy.Canonicalize
    }
    Allocations
    inner (generic) Create {
        BeforeCreate {
            strategy.PrepareForCreate {
                dropDisabledFields
            }
            strategy.Validate
            strategy.Canonicalize
        }
        createValidation (validating webhooks)
        storage Create
        AfterCreate
    }
}

After this PR the flow is the same but the "outer" strategy is no-op:

mutating webhooks
svc custom Create {
    BeforeCreate {
        strategy.PrepareForCreate (noop)
        strategy.Validate (noop)
        strategy.Canonicalize (noop)
    }
    Allocations
    inner (generic) Create {
        BeforeCreate {
            strategy.PrepareForCreate {
                dropDisabledFields
            }
            strategy.Validate
            strategy.Canonicalize
        }
        createValidation (validating webhooks)
        storage Create
        AfterCreate
    }
}

/kind bug
/kind cleanup

NONE

@thockin thockin added kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Oct 17, 2020
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 17, 2020
@thockin
Copy link
Member Author

thockin commented Oct 17, 2020

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 17, 2020
@thockin thockin force-pushed the dont-call-svc-strategy-twice branch from b02d3d7 to d1d41b9 Compare October 17, 2020 01:49
@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Oct 17, 2020
@thockin thockin force-pushed the dont-call-svc-strategy-twice branch 2 times, most recently from 6b46cb7 to c738b46 Compare October 17, 2020 04:45
@@ -810,3 +825,61 @@ func collectServiceNodePorts(service *api.Service) []int {
}
return servicePorts
}

// nearlyNullStrategy does almost nothing, but makes the REST logic happy.
Copy link
Member

@aojea aojea Oct 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in deepwaters here, is mostly for learning purposes, so
in Create we call

	if err := rest.BeforeCreate(rs.strategy, ctx, obj); err != nil {

with this nearlyNullStrategy, and here it can call strategy.GenerateName()

if len(objectMeta.GetGenerateName()) > 0 && len(objectMeta.GetName()) == 0 {
objectMeta.SetName(strategy.GenerateName(objectMeta.GetGenerateName()))
}

but I don`t see that we implement it , is not needed?
or should we mock it too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GenerateName is implemented by names.SimpleNameGenerator which we embed in nearlyNullStrategy in NewREST() (pkg/registry/core/service/storage/rest.go). This was copied from the "real" strategy.

@aojea
Copy link
Member

aojea commented Oct 17, 2020

IIUIC with this change all allocators operations will be triggered without any validation,
can this be exploited by bad actors to, per example, exhaust nodeports? or is too paranoid?

@thockin
Copy link
Member Author

thockin commented Oct 17, 2020

@aojea it's a good question. I don't THINK so (not any more than before anyway)

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 17, 2020
@thockin
Copy link
Member Author

thockin commented Oct 17, 2020

I added more details on the flow to the PR description

@thockin
Copy link
Member Author

thockin commented Oct 17, 2020

/retest

@aojea
Copy link
Member

aojea commented Oct 17, 2020

I think that the tests that are failing are becuse this breaks current dual-stack, but that will be fixed later by Khal's PR ...
should this be incorporated in Khal's PR and merge at the same time?

EDIT

duh, I saw that you've already talked #91824 (review)

@thockin
Copy link
Member Author

thockin commented Oct 19, 2020

/retest

@thockin
Copy link
Member Author

thockin commented Oct 19, 2020

pkg/registry/core/service/storage/rest_test.go is all wonky in light of this. @khenidak and I agreed to get his PR merged and then untangle this.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 19, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@thockin
Copy link
Member Author

thockin commented Oct 24, 2020

As expected, when I rebase this on top of Kal's dual-stack, it breaks. Working some ideas

// we have previously validated for ip correctness and if family exist it will match ip family
// so the following is safe to do
isIPv6 := netutil.IsIPv6String(ip)
parsedIP := net.ParseIP(ip)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the top of my head, I' m afraid that a wrong service with one valid IP and an invalid one, allocates the valid and fails later.

ClusterIPs: ["192.168.0.2","INVALID"]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ic , we normalize and then parse, if something wrong parsing, return and validation catchs it later
👍

@thockin
Copy link
Member Author

thockin commented Oct 25, 2020

Thanks to @aojea for helping debug and adding a test.

@thockin thockin force-pushed the dont-call-svc-strategy-twice branch from beee1d1 to d880774 Compare October 25, 2020 01:35
@thockin
Copy link
Member Author

thockin commented Oct 25, 2020

the testing around this is weak, I will look at boosting it as I figure what failed.

@thockin
Copy link
Member Author

thockin commented Oct 25, 2020

To quote a master: "oof.. can't unsee"

func (nearlyNullStrategy) PrepareForUpdate(_ context.Context, _, _ runtime.Object) {
}

func (nearlyNullStrategy) Validate(_ context.Context, _ runtime.Object) field.ErrorList {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thockin the unit tests fail now because we no longer validate here

func (strategy svcStrategy) Validate(ctx context.Context, obj runtime.Object) field.ErrorList {
service := obj.(*api.Service)
allErrs := validation.ValidateServiceCreate(service)
allErrs = append(allErrs, validation.ValidateConditionalService(service, nil, strategy.ipFamilies)...)
return allErrs
}

should we modify the test or the strategy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The testing here is ALL OVER THE PLACE. Before this change it tested with strategy, but strategy has its own tests. Some of the test in here only test validation, which also has its own tests. There are some general-looking (i.e. follows patterns) tests for the inner REST but the failing test are the outer rest.

So the question at hand is - should the UNIT TEST for the "outer" REST test all the way down into the inner REST? It never has. I can add strategy back in here, but it feels sketchy to be testing across packages that way. Or I can remove the validation-centric tests and just make this pass without strategy, then come back as we clean up the REST logic overall, which is where I am leaning.

When we used to validate before allocation.  Now that we
don't, we have to handle some extra cases in the allocation path.

This also removes some REST tests that were just testing validation
transitively and adjusts some other tests to be properly dual-stack.
Testing on pkg/registry/core/service/storage needs an overhaul.
@thockin
Copy link
Member Author

thockin commented Oct 25, 2020

I expect it to pass tests now, even if I am unsatisfied with the testing overall...

@thockin thockin force-pushed the dont-call-svc-strategy-twice branch from d880774 to 544e763 Compare October 25, 2020 21:42
@aojea
Copy link
Member

aojea commented Oct 25, 2020

/retest
/test pull-kubernetes-e2e-azure-dualstack
/test pull-kubernetes-e2e-iptables-azure-dualstack
/test pull-kubernetes-e2e-kind-ipvs-dual-canary
/test pull-kubernetes-e2e-kind-dual-canary

@thockin
Copy link
Member Author

thockin commented Oct 26, 2020

/retest

@k8s-ci-robot
Copy link
Contributor

@thockin: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-azure-dualstack 544e763 link /test pull-kubernetes-e2e-azure-dualstack
pull-kubernetes-e2e-iptables-azure-dualstack 544e763 link /test pull-kubernetes-e2e-iptables-azure-dualstack

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@aojea
Copy link
Member

aojea commented Oct 26, 2020

I expect it to pass tests now, even if I am unsatisfied with the testing overall...

@thockin they pass now, I think that as next step we should identify the gaps or areas we find weak and open issues to track and start adding more coverage

@k8s-ci-robot
Copy link
Contributor

@thockin: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 27, 2020
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2021
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 24, 2021
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@thockin thockin deleted the dont-call-svc-strategy-twice branch August 2, 2021 04:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/ipvs area/kubectl area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants