
Enable tracking field management for some objects before they are applied to #87044

Merged
merged 2 commits into kubernetes:master on Jan 12, 2020

Conversation

jennybuckley (Contributor) commented Jan 9, 2020

What type of PR is this?
/kind feature
/priority important-soon
/sig api-machinery
/wg apply
/area apiserver
/cc @lavalamp @apelisse @kwiesmueller

What this PR does / why we need it:
In #78738 we added a step to the fieldmanager that, to improve performance, skips tracking field management until an object has been applied. The main drawback of that change was that the field management history of objects created with update is less useful. This PR partially undoes #78738 by probabilistically deciding not to skip tracking field management for some objects. The probability can be gradually increased as Server Side Apply's performance impact becomes better understood, which should be safer than enabling field management for all objects at once.

The inconsistency between the two behaviors would be minimal: if an object without managedFields is later applied to, we already generate a single managedFields entry covering all of its existing fields, so the conflict set is the same as if we had been tracking managers since creation time. The only user-visible difference is that these types of conflicts would sometimes contain more useful information about the manager(s) of the fields that caused the conflict.

The eventual goal is to get the probability to 100% by the time we release 1.18 (if we don't reach that goal we can always revert this PR). Ramping it up gradually gives us better feedback on the feature's impact in production, without breaking the CI infrastructure too much in the event that Server Side Apply is not ready to be fully enabled.
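
For illustration only, a minimal sketch of the gating decision described above (hypothetical helper names; not the exact diff in this PR, which lives in the fieldmanager's skip-non-applied logic):

// Hypothetical sketch of the probabilistic gate described above.
package fieldtracking

import (
    "math/rand"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// shouldTrackManagedFields decides whether to start tracking field management
// for an object whose managedFields list is currently empty. Creates (no UID
// assigned yet) are tracked with the given probability; updates to objects
// that already exist without managedFields keep being skipped.
func shouldTrackManagedFields(accessor metav1.Object, probability float32) bool {
    if len(accessor.GetUID()) != 0 {
        // Pre-existing object that was never applied to: keep skipping.
        return false
    }
    // New object: enable tracking for roughly `probability` of creates.
    return rand.Float32() < probability
}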

Does this PR introduce a user-facing change?:

TODO

lavalamp (Member) commented Jan 10, 2020

The goal is to get this to 100% before 1.18, right?

@deads2k fyi

// If the operation is create, P(tracking managed fields) = f.probability
// If the operation is update, skip tracking managed fields, since we already know managed fields is empty.
if len(accessor.GetUID()) == 0 {
if f.probability <= rand.Float32() {

lavalamp (Member) Jan 10, 2020

I have a preference for computing a number by e.g. hashing namespace and name so that it'll be repeatable based on the object. (e.g., if we break an e2e test, let's try to always break the same one)
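
For illustration (hypothetical, not code from this PR): a deterministic variant could hash the namespace and name into [0, 1) and compare that against the threshold, so a given object always gets the same decision:

package fieldtracking

import (
    "hash/fnv"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// deterministicFraction maps an object's namespace/name to a stable value in
// [0, 1), so repeated runs make the same decision for the same object.
func deterministicFraction(accessor metav1.Object) float32 {
    h := fnv.New32a()
    h.Write([]byte(accessor.GetNamespace() + "/" + accessor.GetName()))
    return float32(h.Sum32()) / float32(1<<32)
}

// shouldTrackDeterministic enables tracking for roughly `probability` of
// objects, but repeatably for any given namespace/name.
func shouldTrackDeterministic(accessor metav1.Object, probability float32) bool {
    return deterministicFraction(accessor) < probability
}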

jennybuckley (Author, Contributor) Jan 10, 2020

That sounds reasonable to me, at least if we are planning to get this to 100% before 1.18.

Do you think we want this probability to be configurable by end users though?

lavalamp (Member) Jan 10, 2020

I don't think that's necessary, and we can always add that if it looks like we won't get to 100% by the release day.

liggitt (Member) Jan 10, 2020

if we break an e2e test, let's try to always break the same one

I'm pretty sure e2e tests generate a namespace with a randomized suffix, fyi

lavalamp (Member) Jan 10, 2020

That's true, and makes this less important. I still think it'd be better if it were deterministic :/

liggitt (Member) commented Jan 10, 2020

The goal is to get this to 100% before 1.18, right?

I really hope we don't ship an actual release with this at anything other than 100% or 0%... I would be extremely confused to encounter this as an end user sometimes. Was this requested by sig-scalability to gradually ramp up to 100% to avoid disrupting scale tests?

lavalamp (Member) commented Jan 10, 2020

I really hope we don't ship an actual release with this at anything other than 100% or 0%... I would be extremely confused to encounter this as an end user sometimes. Was this requested by sig-scalability to gradually ramp up to 100% to avoid disrupting scale tests?

Yeah, the purpose of this is to turn it on slowly for low impact on devs / tests.

I actually don't think it's too weird to partially enable a feature--if this were a cloud service and not an OSS project we'd definitely be turning this on a little bit at a time. But I think we will get it to 100% shortly and we can turn it totally off easily if not.

lavalamp (Member) commented Jan 10, 2020

/test pull-kubernetes-e2e-gce

/lgtm
/approve

...since we can't get determinism anyway...

k8s-ci-robot (Contributor) commented Jan 10, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jennybuckley, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot removed the lgtm label Jan 10, 2020

lavalamp (Member) commented Jan 10, 2020

/test pull-kubernetes-e2e-gce

jennybuckley (Author, Contributor) commented Jan 10, 2020

With apply running for 50% of objects, it looks like the performance impact is pretty small.

For example, namespaced pod POST latency percentiles from pull-kubernetes-e2e-gce-100-performance:

SSA on for 0% of objects       SSA on for 50% of objects
"Perc50": 27.293082,           "Perc50": 28.998357,
"Perc90": 49.127547,           "Perc90": 65.522041,
"Perc99": 96.556939,           "Perc99": 119.148148,

lavalamp (Member) commented Jan 10, 2020

OK, well then let's get this merged and then work on one that tries 100%... Hopefully the test will pass this time?

/lgtm

k8s-ci-robot added the lgtm label Jan 10, 2020

apelisse (Member) commented Jan 10, 2020

These results look great.

That test is REALLY flaky, so I wouldn't think this is SSA's fault.

jennybuckley (Author, Contributor) commented Jan 10, 2020

/retest

jennybuckley (Author, Contributor) commented Jan 10, 2020

/hold
To make sure the flakes are really flakes

jennybuckley (Author, Contributor) commented Jan 10, 2020

/unhold

fejta-bot commented Jan 11, 2020

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

14 similar comments

k8s-ci-robot merged commit 36e40fb into kubernetes:master Jan 12, 2020

15 of 16 checks passed:

tide: Not mergeable. Retesting: pull-kubernetes-e2e-gce
cla/linuxfoundation: jennybuckley authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-kind: Job succeeded.
pull-kubernetes-e2e-kind-ipv6: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-node-e2e-containerd: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.

k8s-ci-robot added this to the v1.18 milestone Jan 12, 2020

riking commented Jan 22, 2020

Please fix the release note @jennybuckley!

jennybuckley (Author, Contributor) commented Jan 22, 2020

@riking this was reverted, so there is no release note needed as of now

riking commented Jan 23, 2020

Can you switch it to "NONE", then? It showed up in the alpha stamp as "TODO".
