Use raw bytes instead of nested map in metav1.Fields #80730

Merged
merged 5 commits into kubernetes:master from jennybuckley:apply-raw on Aug 2, 2019

Conversation

@jennybuckley (Contributor) commented Jul 29, 2019

What type of PR is this?
/kind feature
/wg apply
/sig api-machinery
/priority important-soon
/cc @lavalamp

What this PR does / why we need it:
Addresses the scalability concerns in kubernetes/enhancements#1110 as a follow-up to #80497

This uses the functions defined in kubernetes-sigs/structured-merge-diff#90 to serialize and deserialize field sets into raw extensions, and removes metav1.Fields (a sketch of this round-trip appears at the end of this description).

Will update with benchmarks

Does this PR introduce a user-facing change?:

TBD

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
kubernetes/enhancements#1110
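
For context, a minimal sketch of the serialize/deserialize round-trip described above, assuming the Set.ToJSON and Set.FromJSON helpers added in kubernetes-sigs/structured-merge-diff#90 (the import path and exact signatures may differ from what finally merged):

package main

import (
	"bytes"
	"fmt"

	"sigs.k8s.io/structured-merge-diff/fieldpath"
)

func main() {
	// A small field set covering .metadata.name.
	set := fieldpath.NewSet(fieldpath.MakePathOrDie("metadata", "name"))

	// Serialize the set straight to raw JSON bytes; these bytes are what
	// now get persisted in the managedFields entry instead of a nested map.
	raw, err := set.ToJSON()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", raw) // something like {"f:metadata":{"f:name":{}}}

	// Deserialize the bytes back into a set.
	var decoded fieldpath.Set
	if err := decoded.FromJSON(bytes.NewReader(raw)); err != nil {
		panic(err)
	}
	fmt.Println(decoded.Equals(set)) // true
}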

jennybuckley added some commits Jul 29, 2019


// +k8s:deprecated=fields,protobuf=5

// FieldSet identifies a set of fields.

@lavalamp (Member) commented Jul 29, 2019

We need to document this, either by doing so here or by directing people to smd (and documenting it there).

@@ -1108,23 +1111,6 @@ const (
	ManagedFieldsOperationUpdate ManagedFieldsOperationType = "Update"
)

// Fields stores a set of fields in a data structure like a Trie.

@lavalamp (Member) commented Jul 29, 2019

...which implies we need to keep this around, either here or moved to smd.

			},
		},
	},
	Raw: []byte(`{"f:metadata":{"f:name":{},".":{}}}`),

@lavalamp (Member) commented Jul 29, 2019

I'd love to see a golden data test that ensures no version incompatibilities ever sneak past?

(you can play with the benchmark code in SMD to get random fieldsets if that's helpful)
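
One shape such a golden data test could take, sketched under the assumption that the FieldsToSet/SetToFields helpers from this PR's internal package are available and that a fixture produced by the current release is checked in at the hypothetical path testdata/fields.json:

package internal

import (
	"bytes"
	"io/ioutil"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// TestFieldsGolden decodes bytes written by an earlier release and
// re-encodes them, failing if the wire format ever drifts.
func TestFieldsGolden(t *testing.T) {
	golden, err := ioutil.ReadFile("testdata/fields.json") // hypothetical fixture
	if err != nil {
		t.Fatal(err)
	}
	set, err := FieldsToSet(metav1.Fields{Raw: golden})
	if err != nil {
		t.Fatalf("failed to decode golden fields: %v", err)
	}
	fields, err := SetToFields(set)
	if err != nil {
		t.Fatalf("failed to re-encode fields: %v", err)
	}
	if !bytes.Equal(fields.Raw, golden) {
		t.Errorf("serialization drifted:\ngot:  %s\nwant: %s", fields.Raw, golden)
	}
}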

	// +optional
	Fields *Fields `json:"fields,omitempty" protobuf:"bytes,5,opt,name=fields,casttype=Fields"`
	FieldSet *runtime.RawExtension `json:"fieldSet,omitempty" protobuf:"bytes,6,opt,name=fieldSet,casttype=k8s.io/apimachinery/pkg/runtime.RawExtension"`

@lavalamp (Member) commented Jul 29, 2019

We talked about e.g. versioning the contents of this field, since we won't be permitted to break it like this again once we're beta. Any thoughts about how we might do that?

Is there a reason why you're not going with my trick of keeping the field name but changing the proto number?

@jennybuckley (Author, Contributor) commented Jul 29, 2019

> We talked about e.g. versioning the contents of this field, since we won't be permitted to break it like this again once we're beta. Any thoughts about how we might do that?

Versioning could be done per managed fields entry, but we might also want to consider versioning the entire list together.

> Is there a reason why you're not going with my trick of keeping the field name but changing the proto number?

To me it seemed easier to reason about removing a field and adding a new one than about changing the proto number but not the field name.

@jennybuckley force-pushed the jennybuckley:apply-raw branch 2 times, most recently from dd8d59b to 69e3978 on Jul 29, 2019

@fejta-bot commented Jul 30, 2019

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@jennybuckley (Contributor, Author) commented Jul 30, 2019

This also causes us to run into #55890 on every type

@jennybuckley (Contributor, Author) commented Jul 30, 2019

@lavalamp honestly, I'd prefer not to use RawExtension, because we aren't even using the Object part of it anyway, and RawExtension seems meant to be encodable and decodable by a Scheme.

@jennybuckley (Contributor, Author) commented Jul 30, 2019

@lavalamp we could also just fix #55890

@jennybuckley force-pushed the jennybuckley:apply-raw branch from 9a8f209 to 87f8ba5 on Jul 31, 2019

@jennybuckley (Contributor, Author) commented Aug 1, 2019

/retest

@jennybuckley (Contributor, Author) commented Aug 1, 2019

Benchmarks:

benchmark                      old ns/op     new ns/op     delta
BenchmarkApplyNewObject-12     1670550       1650093       -1.22%
BenchmarkUpdateNewObject-12    1867805       1800949       -3.58%
BenchmarkRepeatedUpdate-12     839920        659418        -21.49%
BenchmarkSetToFields-12        18517         4322          -76.66%
BenchmarkFieldsToSet-12        12771         9425          -26.20%
BenchmarkYAMLToTyped-12        57630         58456         +1.43%

benchmark                      old allocs    new allocs    delta
BenchmarkApplyNewObject-12     7167          6820          -4.84%
BenchmarkUpdateNewObject-12    8396          7901          -5.90%
BenchmarkRepeatedUpdate-12     2887          2264          -21.58%
BenchmarkSetToFields-12        124           14            -88.71%
BenchmarkFieldsToSet-12        93            82            -11.83%
BenchmarkYAMLToTyped-12        152           152           +0.00%

benchmark                      old bytes     new bytes     delta
BenchmarkApplyNewObject-12     500674        481842        -3.76%
BenchmarkUpdateNewObject-12    560253        534794        -4.54%
BenchmarkRepeatedUpdate-12     221142        190649        -13.79%
BenchmarkSetToFields-12        8369          1016          -87.86%
BenchmarkFieldsToSet-12        4688          6417          +36.88%
BenchmarkYAMLToTyped-12        27408         27416         +0.03%
@lavalamp (Member) commented Aug 1, 2019

Strange, I thought my PoC showed more improvement for the first two benchmarks, and this should be the same?

@lavalamp (Member) commented Aug 1, 2019

/lgtm
/approve

I think a follow-up with a bit more in the way of golden data for catching future format breakage might be good.

@k8s-ci-robot (Contributor) commented Aug 1, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jennybuckley, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lavalamp (Member) commented Aug 2, 2019

Oh I forgot for a second that this is an API change.

Hey @liggitt, @smarterclayton, fyi, we're making the API a little bit weirder as discussed at the SIG API Machinery meeting a while back.

@fejta-bot commented Aug 2, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot merged commit 9743d7f into kubernetes:master on Aug 2, 2019

23 checks passed

cla/linuxfoundation: jennybuckley authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-conformance-image-test: Skipped.
pull-kubernetes-cross: Skipped.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-csi-serial: Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gce-iscsi: Skipped.
pull-kubernetes-e2e-gce-iscsi-serial: Skipped.
pull-kubernetes-e2e-gce-storage-slow: Skipped.
pull-kubernetes-godeps: Skipped.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-node-e2e-containerd: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
pull-publishing-bot-validate: Skipped.
tide: In merge pool.

@k8s-ci-robot added this to the v1.16 milestone on Aug 2, 2019

		return err
	}
	if len(p.Map) == 0 {
		return json.Unmarshal(p.Raw, &m)

@liggitt (Member) commented Aug 2, 2019

I'm having a hard time following this... some comments might help.

if I'm reading this correctly, we expect this to be the normal path (since we should not encounter persisted alpha data)

does that mean our preferred proto format (serialized to proto tag 2) is a byte array of JSON? I didn't realize that's where we ended up.

@liggitt (Member) commented Aug 2, 2019

this also potentially forces another allocation of len(p.Raw), since Fields#UnmarshalJSON just appends non-null bytes to the Raw field (with f.Raw = append(f.Raw[0:0], b...)).

Would this work better?

m.Raw = nil
if !bytes.Equal(p.Raw, []byte("null")) {
  m.Raw = p.Raw
}
return

@jennybuckley (Author, Contributor) commented Aug 5, 2019

And a follow-up for this too.

			field.Fields.Map[k1].Map[k2] = metav1.Fields{}
		}
	}
	field.Fields.Raw = []byte("{}")

@liggitt (Member) commented Aug 2, 2019

it looks like this (or maybe the change in https://github.com/kubernetes/kubernetes/pull/80730/files#diff-5380be8c9f2e29b4ac1b57c2533e021b) produces no output in the test fixtures.

previously, we serialized this: "fields": {"18":{"19":null}}

now, "fields" is omitted entirely. that means if we break decoding this new format, our fixtures won't catch it.

@jennybuckley (Author, Contributor) commented Aug 5, 2019

I'll make a follow-up for this.

	if err != nil {
		return err
	}
	return json.Unmarshal(b, &m)

@liggitt (Member) commented Aug 2, 2019

same question here, should we just do m.Raw = b at this point?

@jennybuckley (Author, Contributor) commented Aug 5, 2019

Sure, we can do that. I was originally thinking this would keep the two serializations in line, and if I make the change you suggested above then it would be the same as changing it here, just with one less function call.

		return errors.New("metav1.Fields: UnmarshalJSON on nil pointer")
	}
	if !bytes.Equal(b, []byte("null")) {
		f.Raw = append(f.Raw[0:0], b...)

@liggitt (Member) commented Aug 2, 2019

does this allow storing non-map values (1, true, [""], "", etc)? what about map values that aren't schema-compatible with the old map[string]Field structure? what ensures the raw JSON persisted in this field doesn't break old clients?

@jennybuckley (Author, Contributor) commented Aug 5, 2019

Old clients as in clients written while it was an alpha API?

@liggitt (Member) commented Aug 7, 2019

yeah, we can't ever send data that one of those clients would fail to deserialize

@jennybuckley (Author, Contributor) commented Aug 7, 2019

we could validate that it is just a map of maps, but would this require us to deserialize it on every update?

@liggitt (Member) commented Aug 7, 2019

possibly, but I expected format validation on update anyway...

@jennybuckley (Author, Contributor) commented Aug 8, 2019

Actually, this is already validated when we make sure it will convert into an smd fieldset.

@liggitt (Member) commented Aug 8, 2019

and that happens on every write (create/update/all patch types/subresources/etc)?

@liggitt (Member) commented Aug 8, 2019

do we have negative tests ensuring malformed data submitted by clients is rejected?
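
A sketch of what such a negative test could look like at the unit level, assuming the same internal package and FieldsToSet helper as in the golden-data sketch above, with malformed inputs taken from the examples earlier in this thread:

package internal

import (
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// TestMalformedFieldsRejected checks that values which are not a
// map-of-maps fail to decode into an smd field set.
func TestMalformedFieldsRejected(t *testing.T) {
	for _, raw := range []string{`1`, `true`, `[""]`, `""`, `{"f:metadata":1}`} {
		if _, err := FieldsToSet(metav1.Fields{Raw: []byte(raw)}); err == nil {
			t.Errorf("expected decode of %s to fail", raw)
		}
	}
}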

@jennybuckley (Contributor, Author) commented Aug 6, 2019

Ran some scalability tests on this; these were the results. Going to compare them more and see if the improvement is enough.

old:

testing/density
APIResponsiveness: WARNING Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 383.515ms, perc90: 752.228ms, perc99: 1.242136s Count:3501}; threshold: 1s
APIResponsiveness: WARNING Top latency metric: {Resource:configmaps Subresource: Verb:POST Scope:namespace Latency:perc50: 4.872ms, perc90: 260.909ms, perc99: 1.076742s Count:24}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 340.473ms, perc90: 736.154ms, perc99: 968.789ms Count:3500}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:DELETE Scope:namespace Latency:perc50: 247.69ms, perc90: 580.675ms, perc99: 966.046ms Count:7000}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:status Verb:PATCH Scope:namespace Latency:perc50: 106.528ms, perc90: 393.084ms, perc99: 878.509ms Count:17635}; threshold: 1s

testing/load
APIResponsiveness: Top latency metric: {Resource:nodes Subresource: Verb:LIST Scope:cluster Latency:perc50: 28.07ms, perc90: 93.536ms, perc99: 1.099665s Count:178}; threshold: 30s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 11.568ms, perc90: 383.024ms, perc99: 960.288ms Count:3251}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 14.699ms, perc90: 454.426ms, perc99: 944.308ms Count:3251}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:replicasets Subresource: Verb:DELETE Scope:namespace Latency:perc50: 2.904ms, perc90: 30.796ms, perc99: 758.386ms Count:328}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:status Verb:PATCH Scope:namespace Latency:perc50: 13.661ms, perc90: 141.517ms, perc99: 639.23ms Count:18043}; threshold: 1s

new:

testing/density
APIResponsiveness: Top latency metric: {Resource:nodes Subresource: Verb:LIST Scope:cluster Latency:perc50: 14.22ms, perc90: 108.716ms, perc99: 1.013075s Count:97}; threshold: 30s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 263.879ms, perc90: 556.115ms, perc99: 879.549ms Count:3500}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 256.942ms, perc90: 513.08ms, perc99: 742.577ms Count:3500}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:status Verb:PATCH Scope:namespace Latency:perc50: 49.876ms, perc90: 292.037ms, perc99: 583.91ms Count:18518}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:GET Scope:namespace Latency:perc50: 35.935ms, perc90: 226.091ms, perc99: 492.907ms Count:25344}; threshold: 1s

testing/load
APIResponsiveness: Top latency metric: {Resource:nodes Subresource: Verb:LIST Scope:cluster Latency:perc50: 13.842ms, perc90: 56.789ms, perc99: 345.596ms Count:179}; threshold: 30s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:DELETE Scope:namespace Latency:perc50: 11.827ms, perc90: 116.55ms, perc99: 339.434ms Count:6666}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 9.179ms, perc90: 100.26ms, perc99: 219.863ms Count:3333}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 11.573ms, perc90: 68.031ms, perc99: 205.966ms Count:3333}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:deployments Subresource: Verb:DELETE Scope:namespace Latency:perc50: 3.56ms, perc90: 18.001ms, perc99: 201.319ms Count:328}; threshold: 1s

@jennybuckley (Contributor, Author) commented Aug 8, 2019

@lavalamp
I think we can be fairly confident that we will not persist invalid data in this field because of how we treat it on update:

	// First try to decode the managed fields provided in the update,
	// This is necessary to allow directly updating managed fields.
	managed, err := internal.DecodeObjectManagedFields(newObj)
	// If the managed field is empty or we failed to decode it,
	// let's try the live object. This is to prevent clients who
	// don't understand managedFields from deleting it accidentally.
	if err != nil || len(managed) == 0 {
		managed, err = internal.DecodeObjectManagedFields(liveObj)
		if err != nil {
			return nil, fmt.Errorf("failed to decode managed fields: %v", err)
		}
	}

@liggitt (Member) commented Aug 8, 2019

hmm... that means a client can send malformed data in that field and we'll silently replace it with the current data from the persisted object?

I only see Update and Apply paths in that file. What happens on Create when there isn't a liveObject?

@jennybuckley (Contributor, Author) commented Aug 8, 2019

Create and patch both go through this path as well; the create path sends an empty object as "liveObject":

	liveObj, err := scope.Creater.New(scope.Kind)
	if err != nil {
		scope.err(fmt.Errorf("failed to create new object (Create for %v): %v", scope.Kind, err), w, req)
		return
	}
	obj, err = scope.FieldManager.Update(liveObj, obj, managerOrUserAgent(options.FieldManager, req.UserAgent()))

@jennybuckley (Contributor, Author) commented Aug 8, 2019

It sounds reasonable to fail instead of replacing with the current data.

@liggitt (Member) commented Aug 8, 2019

One other thought: do we normalize the bytes on write (by actually round-tripping them through the smd struct)? If not, would a client that round-tripped the JSON and indented it, or output keys in unsorted order, cause different bytes to be written to etcd?

@liggitt (Member) commented Aug 8, 2019

Answering my own question: it looks like EncodeObjectManagedFields round-trips and overwrites with a normalized version, which is good. That would be good to test: get the object from the API, marshal the managed fields with different JSON indenting, send it back to the API (which should be a no-op), and verify the resourceVersion doesn't change.
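
A sketch of that test, assuming a client-go typed client and a pre-existing ConfigMap named "test" (the client, namespace, and object names are illustrative):

import (
	"bytes"
	"encoding/json"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func testManagedFieldsNormalization(t *testing.T, client kubernetes.Interface, ns string) {
	// Fetch the object and remember its resourceVersion.
	obj, err := client.CoreV1().ConfigMaps(ns).Get("test", metav1.GetOptions{})
	if err != nil {
		t.Fatal(err)
	}
	rv := obj.ResourceVersion

	// Re-indent the managed fields bytes; the JSON stays semantically identical.
	for i := range obj.ManagedFields {
		if obj.ManagedFields[i].Fields == nil {
			continue
		}
		var buf bytes.Buffer
		if err := json.Indent(&buf, obj.ManagedFields[i].Fields.Raw, "", "  "); err != nil {
			t.Fatal(err)
		}
		obj.ManagedFields[i].Fields.Raw = buf.Bytes()
	}

	// Sending the reformatted object back should be a no-op on the server.
	updated, err := client.CoreV1().ConfigMaps(ns).Update(obj)
	if err != nil {
		t.Fatal(err)
	}
	if updated.ResourceVersion != rv {
		t.Errorf("expected a no-op update; resourceVersion changed from %s to %s", rv, updated.ResourceVersion)
	}
}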

@jennybuckley (Contributor, Author) commented Aug 8, 2019

@liggitt yes, that is currently true, and we should have a test for it. We had plans to sometimes skip the re-encoding, but it seems like we should be careful with that. A test would help make sure we never let user-edited bytes through directly.
