
Use raw bytes instead of nested map in metav1.Fields #80730

Merged: 5 commits into kubernetes:master, Aug 2, 2019

Conversation

jennybuckley

What type of PR is this?
/kind feature
/wg apply
/sig api-machinery
/priority important-soon
/cc @lavalamp

What this PR does / why we need it:
Addresses the scalability concerns in kubernetes/enhancements#1110 as a follow-up to #80497

This uses the functions defined in kubernetes-sigs/structured-merge-diff#90 to serialize and deserialize field sets into raw extensions, and removes metav1.Fields.

Will update with benchmarks

Does this PR introduce a user-facing change?:

TBD

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
kubernetes/enhancements#1110

@k8s-ci-robot added the release-note label Jul 29, 2019
@k8s-ci-robot added labels Jul 29, 2019: kind/feature, wg/apply, sig/api-machinery, priority/important-soon, size/XXL, cncf-cla: yes, area/apiserver, area/cloudprovider, area/code-generation, area/dependency, area/test, kind/api-change, sig/cloud-provider, sig/testing

// +k8s:deprecated=fields,protobuf=5

// FieldSet identifies a set of fields.
Member

We need to document this, either by doing so here or by directing people to smd (and document there).

@@ -1108,23 +1111,6 @@ const (
ManagedFieldsOperationUpdate ManagedFieldsOperationType = "Update"
)

// Fields stores a set of fields in a data structure like a Trie.
Member

...which implies we need to keep this around, either here, or move to smd.

},
},
},
Raw: []byte(`{"f:metadata":{"f:name":{},".":{}}}`),
Member

I'd love to see a golden data test that ensures no version incompatibilities ever sneak past?

(you can play with the benchmark code in SMD to get random fieldsets if that's helpful)

// +optional
Fields *Fields `json:"fields,omitempty" protobuf:"bytes,5,opt,name=fields,casttype=Fields"`
FieldSet *runtime.RawExtension `json:"fieldSet,omitempty" protobuf:"bytes,6,opt,name=fieldSet,casttype=k8s.io/apimachinery/pkg/runtime.RawExtension"`
Member

We talked about e.g. versioning the contents of this field, since we won't be permitted to break it like this again once we're beta. Any thoughts about how we might do that?

Is there a reason why you're not going with my trick of keeping the field name but changing the proto number?

Author

We talked about e.g. versioning the contents of this field, since we won't be permitted to break it like this again once we're beta. Any thoughts about how we might do that?

Versioning could be done per managed fields entry, but we might also want to consider versioning the entire list together.

Is there a reason why you're not going with my trick of keeping the field name but changing the proto number?

To me it seemed easier to reason about removing a field and adding a new field, than changing the proto number and not the field name.

@jennybuckley force-pushed the apply-raw branch 2 times, most recently from dd8d59b to 69e3978, July 29, 2019 23:59
@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@jennybuckley
Author

This also causes us to run into #55890 on every type

@lavalamp
Member

lavalamp commented Jul 30, 2019 via email

@jennybuckley
Author

@lavalamp Honestly, I'd prefer not to use RawExtension, because we aren't even using the Object part of it anyway, and RawExtension seems to be meant to be encodable and decodable by a Scheme.

@jennybuckley
Author

@lavalamp we could also just fix #55890

@lavalamp
Member

lavalamp commented Jul 30, 2019 via email

@jennybuckley
Author

/retest

@jennybuckley
Author

jennybuckley commented Aug 1, 2019

benchmarks:

benchmark old ns/op new ns/op delta
BenchmarkApplyNewObject-12 1670550 1650093 -1.22%
BenchmarkUpdateNewObject-12 1867805 1800949 -3.58%
BenchmarkRepeatedUpdate-12 839920 659418 -21.49%
BenchmarkSetToFields-12 18517 4322 -76.66%
BenchmarkFieldsToSet-12 12771 9425 -26.20%
BenchmarkYAMLToTyped-12 57630 58456 +1.43%
benchmark old allocs new allocs delta
BenchmarkApplyNewObject-12 7167 6820 -4.84%
BenchmarkUpdateNewObject-12 8396 7901 -5.90%
BenchmarkRepeatedUpdate-12 2887 2264 -21.58%
BenchmarkSetToFields-12 124 14 -88.71%
BenchmarkFieldsToSet-12 93 82 -11.83%
BenchmarkYAMLToTyped-12 152 152 +0.00%
benchmark old bytes new bytes delta
BenchmarkApplyNewObject-12 500674 481842 -3.76%
BenchmarkUpdateNewObject-12 560253 534794 -4.54%
BenchmarkRepeatedUpdate-12 221142 190649 -13.79%
BenchmarkSetToFields-12 8369 1016 -87.86%
BenchmarkFieldsToSet-12 4688 6417 +36.88%
BenchmarkYAMLToTyped-12 27408 27416 +0.03%

@lavalamp
Member

lavalamp commented Aug 1, 2019

Strange, I thought my PoC showed more improvement for the first two benchmarks, and this should be the same?

@lavalamp
Member

lavalamp commented Aug 1, 2019

/lgtm
/approve

I think a followup with a bit more in the way of golden data for catching future format breakage might be good.

@k8s-ci-robot added the lgtm label Aug 1, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jennybuckley, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Aug 1, 2019
@lavalamp
Member

lavalamp commented Aug 2, 2019

Oh I forgot for a second that this is an API change.

Hey @liggitt, @smarterclayton, fyi, we're making the API a little bit weirder as discussed at the SIG API Machinery meeting a while back.

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot merged commit 9743d7f into kubernetes:master Aug 2, 2019
@k8s-ci-robot added this to the v1.16 milestone Aug 2, 2019
return err
}
if len(p.Map) == 0 {
return json.Unmarshal(p.Raw, &m)
liggitt (Member) commented Aug 2, 2019

I'm having a hard time following this... some comments might help.

if I'm reading this correctly, we expect this to be the normal path (since we should not encounter persisted alpha data)

does that mean our preferred proto format (serialized to proto tag 2) is a byte array of json? I didn't realize that's where we ended up.

liggitt (Member) commented Aug 2, 2019

this also potentially forces another allocation of len(p.Raw), since Fields#UnmarshalJSON just appends non-null bytes to the Raw field (with f.Raw = append(f.Raw[0:0], b...)).

Would this work better?

m.Raw = nil
if !bytes.Equal(p.Raw, []byte("null")) {
  m.Raw = p.Raw
}
return

Author

And a follow up for this too

field.Fields.Map[k1].Map[k2] = metav1.Fields{}
}
}
field.Fields.Raw = []byte("{}")
liggitt (Member) commented Aug 2, 2019

it looks like this (or maybe the change in https://github.com/kubernetes/kubernetes/pull/80730/files#diff-5380be8c9f2e29b4ac1b57c2533e021b) produces no output in the test fixtures.

previously, we serialized this: "fields": {"18":{"19":null}}

now, "fields" is omitted entirely. that means if we break decoding this new format, our fixtures won't catch it.

Author

I'll make a follow up for this
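A golden-data check in the spirit of the fixture concern above would pin the exact bytes produced for a known field set, so any future change to the serialization fails a test instead of sneaking past. The field set and golden bytes below are made up for illustration, and the encoder is a stand-in, not the PR's code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// golden pins the expected serialization of a known field set.
// (Hypothetical fixture; a real test would live next to the API fixtures.)
var golden = `{"f:metadata":{".":{},"f:name":{}}}`

// encodeKnownFieldSet stands in for serializing a fixed smd field set;
// encoding/json sorts map keys, so the output is deterministic.
func encodeKnownFieldSet() (string, error) {
	set := map[string]interface{}{
		"f:metadata": map[string]interface{}{
			".":      map[string]interface{}{},
			"f:name": map[string]interface{}{},
		},
	}
	b, err := json.Marshal(set)
	return string(b), err
}

func main() {
	got, err := encodeKnownFieldSet()
	if err != nil || got != golden {
		panic("field-set serialization changed; update golden data deliberately")
	}
	fmt.Println("golden data matches")
}
```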

if err != nil {
return err
}
return json.Unmarshal(b, &m)
Member

same question here, should we just do m.Raw = b at this point?

Author

Sure, we can do that. I was originally thinking this would keep the two serializations in line, and if I make the change you suggested above then it would be the same as changing it here, just one less function call.

return errors.New("metav1.Fields: UnmarshalJSON on nil pointer")
}
if !bytes.Equal(b, []byte("null")) {
f.Raw = append(f.Raw[0:0], b...)
liggitt (Member) commented Aug 2, 2019

does this allow storing non-map values (1, true, [""], "", etc)? what about map values that aren't schema-compatible with the old map[string]Field structure? what ensures the raw JSON persisted in this field doesn't break old clients?

Author

Old clients as in clients written while it was an alpha API?

Member

yeah, we can't ever send data that one of those clients would fail to deserialize

Author

we could validate that it is just a map of maps, but would this require us to deserialize it on every update?

Member

possibly, but I expected format validation on update anyway...

Author

Actually this is already validated when we make sure it will convert into an smd fieldset

Member

and that happens on every write (create/update/all patch types/subresources/etc)?

Member

do we have negative tests ensuring malformed data submitted by clients is rejected?
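The map-of-maps validation discussed in this thread can be sketched as a recursive decode. This is a hypothetical helper, not code from the PR (the PR relies on conversion into an smd field set for this check); note that null leaves pass, which matches data the old format actually produced, such as {"18":{"19":null}}:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isNestedFieldsMap reports whether raw JSON has the recursive map-of-maps
// shape that the old schema-typed metav1.Fields guaranteed for free.
// Scalars, arrays, and strings at any level are rejected; null leaves
// unmarshal to a nil map and are accepted. (Illustrative helper only.)
func isNestedFieldsMap(raw []byte) bool {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return false
	}
	for _, v := range m {
		if !isNestedFieldsMap(v) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isNestedFieldsMap([]byte(`{"f:metadata":{"f:name":{},".":{}}}`))) // true
	fmt.Println(isNestedFieldsMap([]byte(`[1,true,""]`)))                         // false: not a map
	fmt.Println(isNestedFieldsMap([]byte(`{"f:x":1}`)))                           // false: scalar leaf
}
```

A negative test of the kind asked for above would submit bytes like the last two examples and assert the write is rejected.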

@jennybuckley
Author

Ran some scalability tests on this; these were the results. Going to compare these more and see whether the improvement is enough.

old:

testing/density
APIResponsiveness: WARNING Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 383.515ms, perc90: 752.228ms, perc99: 1.242136s Count:3501}; threshold: 1s
APIResponsiveness: WARNING Top latency metric: {Resource:configmaps Subresource: Verb:POST Scope:namespace Latency:perc50: 4.872ms, perc90: 260.909ms, perc99: 1.076742s Count:24}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 340.473ms, perc90: 736.154ms, perc99: 968.789ms Count:3500}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:DELETE Scope:namespace Latency:perc50: 247.69ms, perc90: 580.675ms, perc99: 966.046ms Count:7000}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:status Verb:PATCH Scope:namespace Latency:perc50: 106.528ms, perc90: 393.084ms, perc99: 878.509ms Count:17635}; threshold: 1s

testing/load
APIResponsiveness: Top latency metric: {Resource:nodes Subresource: Verb:LIST Scope:cluster Latency:perc50: 28.07ms, perc90: 93.536ms, perc99: 1.099665s Count:178}; threshold: 30s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 11.568ms, perc90: 383.024ms, perc99: 960.288ms Count:3251}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 14.699ms, perc90: 454.426ms, perc99: 944.308ms Count:3251}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:replicasets Subresource: Verb:DELETE Scope:namespace Latency:perc50: 2.904ms, perc90: 30.796ms, perc99: 758.386ms Count:328}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:status Verb:PATCH Scope:namespace Latency:perc50: 13.661ms, perc90: 141.517ms, perc99: 639.23ms Count:18043}; threshold: 1s

new:

testing/density
APIResponsiveness: Top latency metric: {Resource:nodes Subresource: Verb:LIST Scope:cluster Latency:perc50: 14.22ms, perc90: 108.716ms, perc99: 1.013075s Count:97}; threshold: 30s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 263.879ms, perc90: 556.115ms, perc99: 879.549ms Count:3500}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 256.942ms, perc90: 513.08ms, perc99: 742.577ms Count:3500}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:status Verb:PATCH Scope:namespace Latency:perc50: 49.876ms, perc90: 292.037ms, perc99: 583.91ms Count:18518}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:GET Scope:namespace Latency:perc50: 35.935ms, perc90: 226.091ms, perc99: 492.907ms Count:25344}; threshold: 1s

testing/load
APIResponsiveness: Top latency metric: {Resource:nodes Subresource: Verb:LIST Scope:cluster Latency:perc50: 13.842ms, perc90: 56.789ms, perc99: 345.596ms Count:179}; threshold: 30s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:DELETE Scope:namespace Latency:perc50: 11.827ms, perc90: 116.55ms, perc99: 339.434ms Count:6666}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource:binding Verb:POST Scope:namespace Latency:perc50: 9.179ms, perc90: 100.26ms, perc99: 219.863ms Count:3333}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:pods Subresource: Verb:POST Scope:namespace Latency:perc50: 11.573ms, perc90: 68.031ms, perc99: 205.966ms Count:3333}; threshold: 1s
APIResponsiveness: Top latency metric: {Resource:deployments Subresource: Verb:DELETE Scope:namespace Latency:perc50: 3.56ms, perc90: 18.001ms, perc99: 201.319ms Count:328}; threshold: 1s

@lavalamp
Member

lavalamp commented Aug 7, 2019 via email

@jennybuckley
Author

jennybuckley commented Aug 8, 2019

@lavalamp
I think we can be fairly confident that we will not persist invalid data in this field because of how we treat it on update:

// First try to decode the managed fields provided in the update,
// This is necessary to allow directly updating managed fields.
managed, err := internal.DecodeObjectManagedFields(newObj)
// If the managed field is empty or we failed to decode it,
// let's try the live object. This is to prevent clients who
// don't understand managedFields from deleting it accidentally.
if err != nil || len(managed) == 0 {
	managed, err = internal.DecodeObjectManagedFields(liveObj)
	if err != nil {
		return nil, fmt.Errorf("failed to decode managed fields: %v", err)
	}
}

@liggitt
Member

liggitt commented Aug 8, 2019

hmm... that means a client can send malformed data in that field and we'll silently replace it with the current data from the persisted object?

I only see Update and Apply paths in that file. What happens on Create when there isn't a liveObject?

@jennybuckley
Author

jennybuckley commented Aug 8, 2019

Create and patch both go through this path as well; the create path sends an empty object as the liveObject:

liveObj, err := scope.Creater.New(scope.Kind)
if err != nil {
	scope.err(fmt.Errorf("failed to create new object (Create for %v): %v", scope.Kind, err), w, req)
	return
}
obj, err = scope.FieldManager.Update(liveObj, obj, managerOrUserAgent(options.FieldManager, req.UserAgent()))

@jennybuckley
Author

It sounds reasonable to fail instead of replacing with the current data.

@liggitt
Member

liggitt commented Aug 8, 2019

One other thought: do we normalize the bytes on write (by actually round-tripping them through the smd struct)? If not, would a client that round-tripped the JSON and indented it, or output keys in unsorted order, cause different bytes to be written to etcd?

@liggitt
Member

liggitt commented Aug 8, 2019

Answering my own question: it looks like EncodeObjectManagedFields round-trips and overwrites with a normalized version, which is good. That would be good to test: get an object from the API, marshal the managed fields with different JSON indenting, send it back to the API (which should be a no-op), and verify the resourceVersion doesn't change.
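The normalization property being described can be sketched with a plain JSON round-trip: decoding into a generic map and re-encoding sorts keys and strips indentation, so differently formatted input yields identical bytes. The helper name is made up; this is a stand-in for what EncodeObjectManagedFields does, not its implementation:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// normalize round-trips raw field-set JSON through a generic map and
// re-encodes it; encoding/json sorts map keys and emits no insignificant
// whitespace, so indented or reordered input normalizes to the same bytes.
// (Hypothetical helper for illustration.)
func normalize(raw []byte) ([]byte, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	return json.Marshal(m)
}

func main() {
	indented := []byte("{\n  \"f:name\": {},\n  \"f:labels\": {}\n}")
	compact := []byte(`{"f:labels":{},"f:name":{}}`)
	a, _ := normalize(indented)
	b, _ := normalize(compact)
	fmt.Println(bytes.Equal(a, b)) // true: the reformatted write is a no-op
}
```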

@jennybuckley
Author

@liggitt Yes, that is currently true, and we should have a test for it. We had plans to sometimes skip the re-encoding, but it seems like we should be careful with that. A test would help make sure we never let user-edited bytes directly through.
