
[server-side apply] Cap the number of managedFields entries for updates at 10 #81816

Merged
merged 2 commits into kubernetes:master from jennybuckley:apply-cap-managers on Oct 5, 2019

Conversation

@jennybuckley (Contributor) commented Aug 23, 2019

What type of PR is this?
/kind feature
/sig api-machinery
/wg apply
/priority important-soon
/cc @apelisse @lavalamp @kwiesmueller

What this PR does / why we need it:
Prevents the list of managers from growing too large.

Also fixes #82042.

Does this PR introduce a user-facing change?:

NONE
@jennybuckley (Contributor, author) commented Aug 23, 2019

/test pull-kubernetes-e2e-gce-alpha-features

@jennybuckley (Contributor, author) commented Aug 23, 2019

/retest

	}

	for i := 0; i < 20; i++ {
		time.Sleep(500 * time.Millisecond)

@apelisse (Member) commented Aug 23, 2019

That's guaranteed to take at least 10 seconds :-/

@apelisse (Member) commented Aug 23, 2019

Do you have to wait if you use a different updater anyway?

@jennybuckley (Contributor, author) commented Aug 26, 2019

The wait was to make sure that the oldest entry was removed, but since I'm not even checking that, I think it's unnecessary.


	// Expect 12 entries, because the cap for update entries is 10, and there are 2 apiVersions of the
	// endpoint being updated, so there should also be two bucket entries.
	if len(accessor.GetManagedFields()) > 12 {

@apelisse (Member) commented Aug 23, 2019

Should it be exactly 12?

t.Fatalf("Failed to get meta accessor: %v", err)
}

actual, err := json.MarshalIndent(object, "\t", "\t")

@apelisse (Member) commented Aug 23, 2019

Can you only marshal if the test fails? Not super important

	// endpoint being updated, so there should also be two bucket entries.
	if len(accessor.GetManagedFields()) > 12 {
		t.Fatalf("Object contains more managers than expected:\n%v", string(actual))
	}

@apelisse (Member) commented Aug 23, 2019

You could technically test that you get a conflict with the "manager-too-old-to-track" if you change any of the fields

@lavalamp (Member) commented Aug 27, 2019

That's maybe a bit overboard but you should test that the manager name (whatever it is) is present.
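
A lightweight version of that check might look like the following sketch, reusing accessor, actual, and t from the quoted test; the manager name is a placeholder, since the thread below is still debating what to call it:

	// Sketch only: after enough updates to exceed the cap, a merged bucket
	// entry should be present under whatever name is finally chosen.
	foundBucket := false
	for _, entry := range accessor.GetManagedFields() {
		if entry.Manager == "manager-too-old-to-track" { // placeholder name
			foundBucket = true
			break
		}
	}
	if !foundBucket {
		t.Fatalf("expected a merged bucket entry in managedFields:\n%v", string(actual))
	}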

@apelisse (Member) commented Aug 29, 2019

Again, I suspect we could do most of the testing in fieldmanager, which would probably be much more convenient.

@jennybuckley (Contributor, author) commented Sep 20, 2019

Is this test still necessary with the unit tests I added? Probably not.

@@ -282,3 +287,65 @@ func (f *FieldManager) stripFields(managed fieldpath.ManagedFields, manager stri

	return managed
}

// TODO: Determine a better value for this constant.
const maxUpdateManagers int = 10

@apelisse (Member) commented Aug 23, 2019

I suspect 10 is a lot? We don't really know yet, so let's keep it like this

@apelisse apelisse referenced this pull request Aug 26, 2019
@jennybuckley jennybuckley force-pushed the jennybuckley:apply-cap-managers branch from 9c89734 to e641b0e Aug 26, 2019
@jennybuckley (Contributor, author) commented Aug 26, 2019

/retest
Benchmarks to show nothing got much worse:

benchmark                       old ns/op     new ns/op     delta
BenchmarkApplyNewObject-12      1068579       1050232       -1.72%
BenchmarkUpdateNewObject-12     921652        946795        +2.73%
BenchmarkRepeatedUpdate-12      311568        311222        -0.11%

benchmark                       old allocs     new allocs     delta
BenchmarkApplyNewObject-12      4041           4041           +0.00%
BenchmarkUpdateNewObject-12     3572           3573           +0.03%
BenchmarkRepeatedUpdate-12      1094           1096           +0.18%

benchmark                       old bytes     new bytes     delta
BenchmarkApplyNewObject-12      298797        298767        -0.01%
BenchmarkUpdateNewObject-12     261614        261635        +0.01%
BenchmarkRepeatedUpdate-12      94529         94584         +0.06%
@jennybuckley (Contributor, author) commented Aug 26, 2019

/retest

@jennybuckley (Contributor, author) commented Aug 27, 2019

/retest
let's see if pull-kubernetes-conformance-kind-ipv6 is fixed

	// Create a new manager identifier for the versioned bucket entry.
	// The version for this manager comes from the version of the update being merged into the bucket.
	if manager, err = internal.BuildManagerIdentifier(&metav1.ManagedFieldsEntry{
		Manager: "manager-too-old-to-track",

@lavalamp (Member) commented Aug 27, 2019

This should maybe be a constant in some easy-to-spot place.

How about "ancient-changes"?

@apelisse (Member) commented Aug 27, 2019

"overflow-manager"?

@apelisse (Member) commented Aug 27, 2019

It's not really about time anymore as much as it is about number of managers

@lavalamp (Member) commented Aug 27, 2019

It's kind of still about time, since the ones we're removing are the oldest.

	if previous, ok := managed.Fields[manager]; ok {
		managed.Fields[manager] = fieldpath.NewVersionedSet(vs.Set().Union(previous.Set()), vs.APIVersion(), vs.Applied())
	} else {
		managed.Fields[manager] = vs

@lavalamp (Member) commented Aug 27, 2019

If you insert a manager, you need to remove one additional manager, no?

@apelisse (Member) commented Aug 27, 2019

That's why the test is looking for 12; I agree it's confusing.
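
For readers following along, the accounting behind the 12 is just the cap plus one bucket per apiVersion (illustrative constants, not project code):

	// The cap bounds only the plain update entries; each distinct apiVersion
	// among the evicted updaters adds one merged bucket entry on top.
	const maxUpdateManagers = 10 // cap on plain update entries
	const apiVersionsSeen = 2    // apiVersions of the endpoint in the test
	const expectedCeiling = maxUpdateManagers + apiVersionsSeen // 12, the bound the test asserts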

@lavalamp (Member) commented Aug 27, 2019

How does this keep the different api versions separate?

@jennybuckley (Contributor, author) commented Aug 27, 2019

vs.APIVersion() comes from the version of the entry we are merging into the bucket, and we used that apiVersion to create the bucket's manager identifier, so the versions will remain separated.
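
A rough sketch of how that per-version bucket key is built, following the hunk quoted above (the Operation value and the string conversion are assumptions, and the manager name is still being debated in this thread):

	// One bucket identifier per apiVersion, so entries merged from different
	// versions never collapse into the same key.
	bucketID, err := internal.BuildManagerIdentifier(&metav1.ManagedFieldsEntry{
		Manager:    "manager-too-old-to-track", // placeholder name
		Operation:  metav1.ManagedFieldsOperationUpdate,
		APIVersion: string(vs.APIVersion()), // version of the update being merged into the bucket
	})
	if err != nil {
		// handled by the surrounding function
	}
	_ = bucketID // entries from v1 and v1beta1 updaters land under two distinct keys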

@lavalamp (Member) commented Aug 28, 2019

It still seems that this code isn't idempotent: if you run 11 entries through it, the first time you get 11 entries out (one having been converted to oldUpdatesManagerName), and then the second time you get another one merged and you get 10 out.

@lavalamp (Member) commented Sep 19, 2019

I see, it is sorted by age and the added one is always oldest (is it?).

So it is probably idempotent (yes please test) but I think it would be even better if "cap at 10" actually resulted in always outputting no more than 10 items (if possible), and resulted in the same number no matter whether an aggregate entry was present or not. So I guess I'm actually claiming this is off-by-one :)

@jennybuckley (Contributor, author) commented Sep 20, 2019

Off by the number of versions, yes. I think it could be edited to always give entries <= max(apiVersions, 10)

@jennybuckley (Contributor, author) commented Sep 20, 2019

Would you expect the cap to apply to appliers too? E.g., if an object had three appliers, would you expect the number of updaters to be <= 7?

@apelisse (Member) commented Sep 20, 2019

It seems unlikely, but then what if you have more than 10 appliers? And what if you have 8 appliers, does it mean you have at most 2 non-appliers?

@jennybuckley (Contributor, author) commented Sep 20, 2019

> but then what if you have more than 10 appliers?

Probably the same thing as it would do if you had more than 10 apiVersions: just keep the number of updaters to a minimum.

> And what if you have 8 appliers, does it mean you have at most 2 non-appliers?

Yes

	if len(updaters) <= maxUpdateManagers {
		return managed, nil
	}

@lavalamp (Member) commented Aug 27, 2019

Add a comment: this code does a bunch of pointless work if there are more distinct apiVersions than maxUpdateManagers... At some point in the very distant future, this will cause someone a really confusing performance bug.

@lavalamp (Member) commented Aug 27, 2019

(If I understand it correctly, anyway)

@jennybuckley (Contributor, author) commented Aug 27, 2019

I think it will still cap the number of managers from updates; there will just be more buckets.

@jennybuckley (Contributor, author) commented Aug 27, 2019

Will the number of distinct apiVersions really grow like this over time though?

@lavalamp (Member) commented Aug 28, 2019

I think it will remove and recreate all entries from oldUpdatesManagerName...

@jennybuckley (Contributor, author) commented Aug 28, 2019

Removing and recreating those is basically free: once an entry is removed from the map, there's nothing to merge it with, so it just gets put back in the map.

func (f *FieldManager) capUpdateManagers(managed internal.Managed) (newManaged internal.Managed, err error) {
	// Gather all entries from updates
	updaters := []string{}
	for manager, fields := range managed.Fields {

@lavalamp (Member) commented Aug 28, 2019

maybe explicitly exclude updates by the oldUpdatesManagerName manager?
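
One hypothetical way to do that exclusion on top of the gathering loop quoted above; the strings.Contains test on the serialized identifier (and the strings import it needs) is an assumption for illustration, not the code this PR ends up with:

	// Skip entries that are already merged buckets so they are never counted
	// against the cap or re-merged into it.
	updaters := []string{}
	for manager, fields := range managed.Fields {
		if fields.Applied() {
			continue // apply entries are not subject to the update cap
		}
		if strings.Contains(manager, `"manager-too-old-to-track"`) { // assumed way to spot bucket identifiers
			continue
		}
		updaters = append(updaters, manager)
	}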

@lavalamp lavalamp referenced this pull request Aug 28, 2019
@apelisse (Member) commented Sep 11, 2019

I saw this PR while searching for your "split fieldmanager" PR. Can we do sort of the same thing for this? Build it as part of a separate wrapper class?

@apelisse (Member) commented Sep 11, 2019

> I saw this PR while searching for your "split fieldmanager" PR. Can we do sort of the same thing for this? Build it as part of a separate wrapper class?

We might have to parse the managers again. Maybe we should come up with an alternative solution for that (lazy deserialization of the manager fields with some sort of caching).
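
For context, the shape this eventually takes (visible in the capmanagers.go diff discussed further down) is a small wrapper around the inner field manager. A rough sketch of that decorator, with the Manager interface, the Update signature, and the field names treated as assumptions:

// Delegate to the wrapped field manager, then cap the update entries on the
// way out; capUpdateManagers is the method shown in the diff below.
type capManagersManager struct {
	fieldManager      Manager
	maxUpdateManagers int
}

func (f *capManagersManager) Update(liveObj, newObj runtime.Object, managed Managed, manager string) (runtime.Object, Managed, error) {
	obj, managed, err := f.fieldManager.Update(liveObj, newObj, managed, manager)
	if err != nil {
		return obj, managed, err
	}
	managed, err = f.capUpdateManagers(managed)
	return obj, managed, err
}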

@jennybuckley (Contributor, author) commented Sep 26, 2019

Rebased on master now that it merged

@apelisse (Member) commented Sep 26, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Sep 26, 2019
@jennybuckley (Contributor, author) commented Sep 27, 2019

/retest

@jennybuckley (Contributor, author) commented Sep 27, 2019

/retest

@jennybuckley jennybuckley force-pushed the jennybuckley:apply-cap-managers branch from 80e4f0d to 249c935 Sep 30, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm label Sep 30, 2019
@jennybuckley (Contributor, author) commented Sep 30, 2019

rebased

@apelisse (Member) commented Sep 30, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Sep 30, 2019
@jennybuckley (Contributor, author) commented Sep 30, 2019

/retest

@jennybuckley jennybuckley force-pushed the jennybuckley:apply-cap-managers branch from 249c935 to 61b19c7 Oct 3, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm label Oct 3, 2019
@jennybuckley (Contributor, author) commented Oct 3, 2019

rebased

@apelisse (Member) commented Oct 4, 2019

/retest

@lavalamp (Member) commented Oct 4, 2019

/lgtm
/approve

Tests look great now, thanks! Very clear, readable, exhaustive.

@k8s-ci-robot (Contributor) commented Oct 4, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: apelisse, jennybuckley, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jennybuckley (Contributor, author) commented Oct 4, 2019

/retest

@jennybuckley (Contributor, author) commented Oct 4, 2019

/retest
looks flaky

@fejta-bot commented Oct 5, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot commented Oct 5, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit a8e8e54 into kubernetes:master Oct 5, 2019
16 checks passed:

cla/linuxfoundation: jennybuckley authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-conformance-kind-ipv6: Job succeeded.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-kind: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-node-e2e-containerd: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
tide: In merge pool.
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 5, 2019

	// Merge the oldest updaters with versioned bucket managers until the number of updaters is under the cap
	versionToFirstManager := map[string]string{}
	for i, length := 0, len(updaters); i < len(updaters) && length > f.maxUpdateManagers; i++ {

@tedyu (Contributor) commented Oct 5, 2019

Should we add a log at the end of the loop with the reduction in length and the total time the loop takes?

diff --git a/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/fieldmanager/capmanagers.go b/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/fieldmanager/capmanagers.go
index 8c38461613..e0e6223476 100644
--- a/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/fieldmanager/capmanagers.go
+++ b/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/fieldmanager/capmanagers.go
@@ -24,6 +24,8 @@ import (
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/runtime"
        "k8s.io/apiserver/pkg/endpoints/handlers/fieldmanager/internal"
+       "k8s.io/klog"
        "sigs.k8s.io/structured-merge-diff/fieldpath"
+       "time"
 )

@@ -93,6 +94,8 @@ func (f *capManagersManager) capUpdateManagers(managed Managed) (newManaged Mana

        // Merge the oldest updaters with versioned bucket managers until the number of updaters is under the cap
        versionToFirstManager := map[string]string{}
+       originalLength := len(updaters)
+       startTime := time.Now()
        for i, length := 0, len(updaters); i < len(updaters) && length > f.maxUpdateManagers; i++ {
                manager := updaters[i]
                vs := managed.Fields()[manager]
@@ -130,6 +133,7 @@ func (f *capManagersManager) capUpdateManagers(managed Managed) (newManaged Mana
                        versionToFirstManager[version] = manager
                }
        }
+       klog.Infof("reduced the number of updaters from %v to %v in %v", originalLength, f.maxUpdateManagers, time.Since(startTime))

        return managed, nil
 }

@apelisse (Member) commented Oct 6, 2019

Why would we do that?

@tedyu (Contributor) commented Oct 6, 2019

To have some idea how long the merge has taken.
Currently maxUpdateManagers is at 10. This may evolve in the future.

@apelisse (Member) commented Oct 7, 2019

I think if we really cared about this value, then we would expose a metric for it rather than a log? @logicalhan
Also, this is supposed to be very fast; we could also write a benchmark? These all seem good, but not as important as the other things we're doing.
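
A benchmark along those lines might look like the sketch below; newTestFieldManager, getTestPod, and the exact Update signature are assumptions modeled on the setup behind the existing BenchmarkRepeatedUpdate numbers, not real helpers from this PR:

// Drive Update with a rotating manager name so the capping path runs on every
// iteration once the number of update entries passes the cap.
func BenchmarkCapUpdateManagers(b *testing.B) {
	f := newTestFieldManager(b) // assumed helper
	obj := getTestPod()         // assumed fixture
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		newObj, err := f.Update(obj, obj, fmt.Sprintf("updater-%d", i%20))
		if err != nil {
			b.Fatal(err)
		}
		obj = newObj
	}
}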

@jennybuckley (Contributor, author) commented Oct 7, 2019

I think if we wanted to expose a metric about managedFields update latency, it should probably be from start to finish of the entire managedFields update, from when it starts being decoded to when it is finally encoded back into the object.
