
Use raw bytes instead of nested map in metav1.Fields #80730

Merged: 5 commits into kubernetes:master, Aug 2, 2019

Conversation

jennybuckley

What type of PR is this?
/kind feature
/wg apply
/sig api-machinery
/priority important-soon
/cc @lavalamp

What this PR does / why we need it:
Addresses the scalability concerns in kubernetes/enhancements#1110 as a follow-up to #80497

This uses the functions defined in kubernetes-sigs/structured-merge-diff#90 to serialize and deserialize field sets into raw extensions, and removes metav1.Fields.

Will update with benchmarks

Does this PR introduce a user-facing change?:

TBD

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
kubernetes/enhancements#1110

@k8s-ci-robot added the release-note label Jul 29, 2019
@k8s-ci-robot added labels Jul 29, 2019: kind/feature, wg/apply, sig/api-machinery, priority/important-soon, size/XXL, cncf-cla: yes, area/apiserver, area/cloudprovider, area/code-generation, area/dependency, area/test, kind/api-change, sig/cloud-provider, sig/testing

// +k8s:deprecated=fields,protobuf=5

// FieldSet identifies a set of fields.
Member

We need to document this, either by doing so here or by directing people to smd (and document there).

@@ -1108,23 +1111,6 @@ const (
ManagedFieldsOperationUpdate ManagedFieldsOperationType = "Update"
)

// Fields stores a set of fields in a data structure like a Trie.
Member

...which implies we need to keep this around, either here, or move to smd.

},
},
},
Raw: []byte(`{"f:metadata":{"f:name":{},".":{}}}`),
Member

I'd love to see a golden data test that ensures no version incompatibilities ever sneak past?

(you can play with the benchmark code in SMD to get random fieldsets if that's helpful)

// +optional
Fields *Fields `json:"fields,omitempty" protobuf:"bytes,5,opt,name=fields,casttype=Fields"`
FieldSet *runtime.RawExtension `json:"fieldSet,omitempty" protobuf:"bytes,6,opt,name=fieldSet,casttype=k8s.io/apimachinery/pkg/runtime.RawExtension"`
Member

We talked about e.g. versioning the contents of this field, since we won't be permitted to break it like this again once we're beta. Any thoughts about how we might do that?

Is there a reason why you're not going with my trick of keeping the field name but changing the proto number?

Author

We talked about e.g. versioning the contents of this field, since we won't be permitted to break it like this again once we're beta. Any thoughts about how we might do that?

Versioning could be done per managed fields entry, but we might also want to consider versioning the entire list together.

Is there a reason why you're not going with my trick of keeping the field name but changing the proto number?

To me it seemed easier to reason about removing a field and adding a new field, than changing the proto number and not the field name.

@jennybuckley force-pushed the apply-raw branch 2 times, most recently from dd8d59b to 69e3978, July 29, 2019 23:59
@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@jennybuckley
Author

This also causes us to run into #55890 on every type

@lavalamp
Member

lavalamp commented Jul 30, 2019 via email

@jennybuckley
Author

@lavalamp Honestly, I'd prefer not to use RawExtension, because we aren't even using the Object part of it anyway, and RawExtension seems to be meant to be encodable and decodable by a Scheme.

@jennybuckley
Author

@lavalamp we could also just fix #55890

@lavalamp
Member

lavalamp commented Jul 30, 2019 via email

@jennybuckley
Author

/retest

@jennybuckley
Author

jennybuckley commented Aug 1, 2019

benchmarks:

benchmark old ns/op new ns/op delta
BenchmarkApplyNewObject-12 1670550 1650093 -1.22%
BenchmarkUpdateNewObject-12 1867805 1800949 -3.58%
BenchmarkRepeatedUpdate-12 839920 659418 -21.49%
BenchmarkSetToFields-12 18517 4322 -76.66%
BenchmarkFieldsToSet-12 12771 9425 -26.20%
BenchmarkYAMLToTyped-12 57630 58456 +1.43%
benchmark old allocs new allocs delta
BenchmarkApplyNewObject-12 7167 6820 -4.84%
BenchmarkUpdateNewObject-12 8396 7901 -5.90%
BenchmarkRepeatedUpdate-12 2887 2264 -21.58%
BenchmarkSetToFields-12 124 14 -88.71%
BenchmarkFieldsToSet-12 93 82 -11.83%
BenchmarkYAMLToTyped-12 152 152 +0.00%
benchmark old bytes new bytes delta
BenchmarkApplyNewObject-12 500674 481842 -3.76%
BenchmarkUpdateNewObject-12 560253 534794 -4.54%
BenchmarkRepeatedUpdate-12 221142 190649 -13.79%
BenchmarkSetToFields-12 8369 1016 -87.86%
BenchmarkFieldsToSet-12 4688 6417 +36.88%
BenchmarkYAMLToTyped-12 27408 27416 +0.03%

@lavalamp
Member

lavalamp commented Aug 1, 2019

Strange, I thought my PoC showed more improvement for the first two benchmarks, and this should be the same?

@lavalamp
Member

lavalamp commented Aug 1, 2019

/lgtm
/approve

I think a followup with a bit more in the way of golden data for catching future format breakage might be good.

@k8s-ci-robot added the lgtm label Aug 1, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jennybuckley, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Aug 1, 2019
@lavalamp
Member

lavalamp commented Aug 2, 2019

Oh I forgot for a second that this is an API change.

Hey @liggitt, @smarterclayton, fyi, we're making the API a little bit weirder as discussed at the SIG API Machinery meeting a while back.

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot merged commit 9743d7f into kubernetes:master Aug 2, 2019
@k8s-ci-robot added this to the v1.16 milestone Aug 2, 2019
return err
}
if len(p.Map) == 0 {
return json.Unmarshal(p.Raw, &m)
liggitt (Member) commented Aug 2, 2019

I'm having a hard time following this... some comments might help.

if I'm reading this correctly, we expect this to be the normal path (since we should not encounter persisted alpha data)

does that mean our preferred proto format (serialized to proto tag 2) is a byte array of json? I didn't realize that's where we ended up.

liggitt (Member) commented Aug 2, 2019

this also potentially forces another allocation of len(p.Raw), since Fields#UnmarshalJSON just appends non-null bytes to the Raw field (with f.Raw = append(f.Raw[0:0], b...)).

Would this work better?

m.Raw = nil
if !bytes.Equal(p.Raw, []byte("null")) {
  m.Raw = p.Raw
}
return

Author

And a follow up for this too

field.Fields.Map[k1].Map[k2] = metav1.Fields{}
}
}
field.Fields.Raw = []byte("{}")
liggitt (Member) commented Aug 2, 2019

it looks like this (or maybe the change in https://github.com/kubernetes/kubernetes/pull/80730/files#diff-5380be8c9f2e29b4ac1b57c2533e021b) produces no output in the test fixtures.

previously, we serialized this: "fields": {"18":{"19":null}}

now, "fields" is omitted entirely. that means if we break decoding this new format, our fixtures won't catch it.

Author

I'll make a follow up for this
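A golden-data check in the spirit of the fixture concern above would pin the exact bytes produced for a known field set, so any future change to the serialization fails a test instead of sneaking past. The field set and golden bytes below are made up for illustration, and the encoder is a stand-in, not the PR's code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// golden pins the expected serialization of a known field set.
// (Hypothetical fixture; a real test would live next to the API fixtures.)
var golden = `{"f:metadata":{".":{},"f:name":{}}}`

// encodeKnownFieldSet stands in for serializing a fixed smd field set;
// encoding/json sorts map keys, so the output is deterministic.
func encodeKnownFieldSet() (string, error) {
	set := map[string]interface{}{
		"f:metadata": map[string]interface{}{
			".":      map[string]interface{}{},
			"f:name": map[string]interface{}{},
		},
	}
	b, err := json.Marshal(set)
	return string(b), err
}

func main() {
	got, err := encodeKnownFieldSet()
	if err != nil || got != golden {
		panic("field-set serialization changed; update golden data deliberately")
	}
	fmt.Println("golden data matches")
}
```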

if err != nil {
return err
}
return json.Unmarshal(b, &m)
Member

same question here, should we just do m.Raw = b at this point?

Author

Sure, we can do that. I was originally thinking this would keep the two serializations in line, and if I make the change you suggested above then it would be the same as changing it here, just one less function call.

return errors.New("metav1.Fields: UnmarshalJSON on nil pointer")
}
if !bytes.Equal(b, []byte("null")) {
f.Raw = append(f.Raw[0:0], b...)
liggitt (Member) commented Aug 2, 2019

does this allow storing non-map values (1, true, [""], "", etc)? what about map values that aren't schema-compatible with the old map[string]Field structure? what ensures the raw JSON persisted in this field doesn't break old clients?

Author

Old clients as in clients written while it was an alpha API?

Member

yeah, we can't ever send data that one of those clients would fail to deserialize

Author

we could validate that it is just a map of maps, but would this require us to deserialize it on every update?

Member

possibly, but I expected format validation on update anyway...

Author

Actually this is already validated when we make sure it will convert into an smd fieldset

Member

and that happens on every write (create/update/all patch types/subresources/etc)?

Member

do we have negative tests ensuring malformed data submitted by clients is rejected?
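The map-of-maps validation discussed in this thread can be sketched as a recursive decode. This is a hypothetical helper, not code from the PR (the PR relies on conversion into an smd field set for this check); note that null leaves pass, which matches data the old format actually produced, such as {"18":{"19":null}}:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isNestedFieldsMap reports whether raw JSON has the recursive map-of-maps
// shape that the old schema-typed metav1.Fields guaranteed for free.
// Scalars, arrays, and strings at any level are rejected; null leaves
// unmarshal to a nil map and are accepted. (Illustrative helper only.)
func isNestedFieldsMap(raw []byte) bool {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return false
	}
	for _, v := range m {
		if !isNestedFieldsMap(v) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isNestedFieldsMap([]byte(`{"f:metadata":{"f:name":{},".":{}}}`))) // true
	fmt.Println(isNestedFieldsMap([]byte(`[1,true,""]`)))                         // false: not a map
	fmt.Println(isNestedFieldsMap([]byte(`{"f:x":1}`)))                           // false: scalar leaf
}
```

A negative test of the kind asked for above would submit bytes like the last two examples and assert the write is rejected.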

@jennybuckley
Author

Ran some scalability tests on this; these were the results. Going to compare these more and see whether the improvement is enough.

old:

testing/density
APIResponsiveness: WARNING Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 383.515ms, perc90: 752.228ms, perc99: 1.242136s Count:3501}; threshold: 1s
APIResponsiveness: WARNING Top latency metric: {Resource:configmaps Subresource: Verb:POST Scope:namespace Latency:perc50: 4.872ms, perc90: 260.909ms, perc99: 1.076742s Count:24}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 340.473ms, perc90: 736.154ms, perc99: 968.789ms Count:3500}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:DELETE Scope:namespace Latency:perc50: 247.69ms, perc90: 580.675ms, perc99: 966.046ms Count:7000}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:status Verb:PATCH Scope:namespace Latency:perc50: 106.528ms, perc90: 393.084ms, perc99: 878.509ms Count:17635}; threshold: 1s

testing/load
APIResponsiveness: Top latency metric: {Resource:nodes Subresource: Verb:LIST Scope:cluster Latency:perc50: 28.07ms, perc90: 93.536ms, perc99: 1.099665s Count:178}; threshold: 30s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 11.568ms, perc90: 383.024ms, perc99: 960.288ms Count:3251}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 14.699ms, perc90: 454.426ms, perc99: 944.308ms Count:3251}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:replicasets Subresource: Verb:DELETE Scope:namespace Latency:perc50: 2.904ms, perc90: 30.796ms, perc99: 758.386ms Count:328}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:status Verb:PATCH Scope:namespace Latency:perc50: 13.661ms, perc90: 141.517ms, perc99: 639.23ms Count:18043}; threshold: 1s

new:

testing/density
APIResponsiveness: Top latency metric: {Resource:nodes Subresource: Verb:LIST Scope:cluster Latency:perc50: 14.22ms, perc90: 108.716ms, perc99: 1.013075s Count:97}; threshold: 30s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 263.879ms, perc90: 556.115ms, perc99: 879.549ms Count:3500}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 256.942ms, perc90: 513.08ms, perc99: 742.577ms Count:3500}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:status Verb:PATCH Scope:namespace Latency:perc50: 49.876ms, perc90: 292.037ms, perc99: 583.91ms Count:18518}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:GET Scope:namespace Latency:perc50: 35.935ms, perc90: 226.091ms, perc99: 492.907ms Count:25344}; threshold: 1s

testing/load
APIResponsiveness: Top latency metric: {Resource:nodes Subresource: Verb:LIST Scope:cluster Latency:perc50: 13.842ms, perc90: 56.789ms, perc99: 345.596ms Count:179}; threshold: 30s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:DELETE Scope:namespace Latency:perc50: 11.827ms, perc90: 116.55ms, perc99: 339.434ms Count:6666}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 9.179ms, perc90: 100.26ms, perc99: 219.863ms Count:3333}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 11.573ms, perc90: 68.031ms, perc99: 205.966ms Count:3333}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:deployments Subresource: Verb:DELETE Scope:namespace Latency:perc50: 3.56ms, perc90: 18.001ms, perc99: 201.319ms Count:328}; threshold: 1s

@lavalamp
Member

lavalamp commented Aug 7, 2019 via email

@jennybuckley
Author

jennybuckley commented Aug 8, 2019

@lavalamp
I think we can be fairly confident that we will not persist invalid data in this field because of how we treat it on update:

// First try to decode the managed fields provided in the update,
// This is necessary to allow directly updating managed fields.
managed, err := internal.DecodeObjectManagedFields(newObj)
// If the managed field is empty or we failed to decode it,
// let's try the live object. This is to prevent clients who
// don't understand managedFields from deleting it accidentally.
if err != nil || len(managed) == 0 {
	managed, err = internal.DecodeObjectManagedFields(liveObj)
	if err != nil {
		return nil, fmt.Errorf("failed to decode managed fields: %v", err)
	}
}

@liggitt
Member

liggitt commented Aug 8, 2019

hmm... that means a client can send malformed data in that field and we'll silently replace it with the current data from the persisted object?

I only see Update and Apply paths in that file. What happens on Create when there isn't a liveObject?

@jennybuckley
Author

jennybuckley commented Aug 8, 2019

Create and patch both go through this path as well; the create path sends an empty object as the liveObject:

liveObj, err := scope.Creater.New(scope.Kind)
if err != nil {
	scope.err(fmt.Errorf("failed to create new object (Create for %v): %v", scope.Kind, err), w, req)
	return
}
obj, err = scope.FieldManager.Update(liveObj, obj, managerOrUserAgent(options.FieldManager, req.UserAgent()))

@jennybuckley
Author

It sounds reasonable to fail instead of replacing with the current data.

@liggitt
Member

liggitt commented Aug 8, 2019

One other thought: do we normalize the bytes on write (by actually round-tripping them through the smd struct)? If not, would a client that round-tripped the JSON and indented it, or output keys in unsorted order, cause different bytes to be written to etcd?

@liggitt
Member

liggitt commented Aug 8, 2019

Answering my own question: it looks like EncodeObjectManagedFields round-trips and overwrites with a normalized version, which is good. That would be good to test: get an object from the API, marshal the managed fields with different JSON indenting, send it back to the API (which should be a no-op), and verify the resourceVersion doesn't change.
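The normalization property being described can be sketched with a plain JSON round-trip: decoding into a generic map and re-encoding sorts keys and strips indentation, so differently formatted input yields identical bytes. The helper name is made up; this is a stand-in for what EncodeObjectManagedFields does, not its implementation:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// normalize round-trips raw field-set JSON through a generic map and
// re-encodes it; encoding/json sorts map keys and emits no insignificant
// whitespace, so indented or reordered input normalizes to the same bytes.
// (Hypothetical helper for illustration.)
func normalize(raw []byte) ([]byte, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	return json.Marshal(m)
}

func main() {
	indented := []byte("{\n  \"f:name\": {},\n  \"f:labels\": {}\n}")
	compact := []byte(`{"f:labels":{},"f:name":{}}`)
	a, _ := normalize(indented)
	b, _ := normalize(compact)
	fmt.Println(bytes.Equal(a, b)) // true: the reformatted write is a no-op
}
```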

@jennybuckley
Author

@liggitt Yes, that is currently true, and we should have a test for it. We had plans to sometimes skip the re-encoding, but it seems like we should be careful with that. A test would help make sure we never let user-edited bytes directly through.
