New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use raw bytes instead of nested map in metav1.Fields #80730
Conversation
|
||
// +k8s:deprecated=fields,protobuf=5 | ||
|
||
// FieldSet identifies a set of fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to document this, either by doing so here or by directing people to smd (and document there).
@@ -1108,23 +1111,6 @@ const ( | |||
ManagedFieldsOperationUpdate ManagedFieldsOperationType = "Update" | |||
) | |||
|
|||
// Fields stores a set of fields in a data structure like a Trie. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...which implies we need to keep this around, either here, or move to smd.
}, | ||
}, | ||
}, | ||
Raw: []byte(`{"f:metadata":{"f:name":{},".":{}}}`), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love to see a golden data test that ensures no version incompatibilities ever sneak past?
(you can play with the benchmark code in SMD to get random fieldsets if that's helpful)
// +optional | ||
Fields *Fields `json:"fields,omitempty" protobuf:"bytes,5,opt,name=fields,casttype=Fields"` | ||
FieldSet *runtime.RawExtension `json:"fieldSet,omitempty" protobuf:"bytes,6,opt,name=fieldSet,casttype=k8s.io/apimachinery/pkg/runtime.RawExtension"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about e.g. versioning the contents of this field, since we won't be permitted to break it like this again once we're beta. Any thoughts about how we might do that?
Is there a reason why you're not going with my trick of keeping the field name but changing the proto number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about e.g. versioning the contents of this field, since we won't be permitted to break it like this again once we're beta. Any thoughts about how we might do that?
Versioning could be done per managed fields entry, but we might also want to consider versioning the entire list together.
Is there a reason why you're not going with my trick of keeping the field name but changing the proto number?
To me it seemed easier to reason about removing a field and adding a new field, than changing the proto number and not the field name.
dd8d59b
to
69e3978
Compare
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project. |
This also causes us to run into #55890 on every type |
Ouch, to fix that I think we have to duplicate RawExtension and provide a
custom schema :/
It's possible, we do it for some types (e.g, intOrString).
Alternatively, maybe we could leave the existing type for the purpose of
generating documentation, but implement all of the custom marshalling
functions so that it's effectively raw extension under the hood? Is this
feasible?
…On Tue, Jul 30, 2019 at 12:37 PM Jennifer Buckley ***@***.***> wrote:
This also causes us to run into #55890
<#55890> on every type
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#80730?email_source=notifications&email_token=AAE6BFWVATZKTQ37MKX76M3QCCJ6JA5CNFSM4IHW7YBKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3FBXJY#issuecomment-516561831>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAE6BFSGBW7VKT66WXDZO63QCCJ6JANCNFSM4IHW7YBA>
.
|
@lavalamp I prefer not using RawExtension honestly, because we aren't even using the Object part of the RawExtension anyway, and it seems like RawExtension is meant to be encodable and decodable by a Scheme |
I agree that making a custom version of RawExtension is probably the way to
go; I didn't do it in my earlier PR because I was just trying to measure
things.
…On Tue, Jul 30, 2019 at 1:05 PM Jennifer Buckley ***@***.***> wrote:
@lavalamp <https://github.com/lavalamp> we could also just fix #55890
<#55890>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#80730?email_source=notifications&email_token=AAE6BFVGUPQK75F2CGCQW5DQCCNJJA5CNFSM4IHW7YBKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3FEX2Q#issuecomment-516574186>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAE6BFTJWBOEVZHDNKHP5XTQCCNJJANCNFSM4IHW7YBA>
.
|
/retest |
benchmarks:
|
Strange, I thought my PoC showed more improvement for the first two benchmarks, and this should be the same? |
/lgtm I think a followup with a bit more in the way of golden data for catching future format breakage might be good. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jennybuckley, lavalamp The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Oh I forgot for a second that this is an API change. Hey @liggitt, @smarterclayton, fyi, we're making the API a little bit weirder as discussed at the SIG API Machinery meeting a while back. |
/retest Review the full test history for this PR. Silence the bot with an |
return err | ||
} | ||
if len(p.Map) == 0 { | ||
return json.Unmarshal(p.Raw, &m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a hard time following this... some comments might help.
if I'm reading this correctly, we expect this to be the normal path (since we should not encounter persisted alpha data)
does that mean our preferred proto format (serialized to proto tag 2) is a byte array of json? I didn't realize that's where we ended up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also potentially forces another allocation of len(p.Raw), since Fields#UnmarshalJSON just appends non-null bytes to the Raw field (with f.Raw = append(f.Raw[0:0], b...)
).
Would this work better?
m.Raw = nil
if !bytes.Equal(p.Raw, []byte("null")) {
m.Raw = p.Raw
}
return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And a follow up for this too
field.Fields.Map[k1].Map[k2] = metav1.Fields{} | ||
} | ||
} | ||
field.Fields.Raw = []byte("{}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like this (or maybe the change in https://github.com/kubernetes/kubernetes/pull/80730/files#diff-5380be8c9f2e29b4ac1b57c2533e021b) produces no output in the test fixtures.
previously, we serialized this: "fields": {"18":{"19":null}}
now, "fields"
is omitted entirely. that means if we break decoding this new format, our fixtures won't catch it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make a follow up for this
if err != nil { | ||
return err | ||
} | ||
return json.Unmarshal(b, &m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question here, should we just do m.Raw = b
at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure we can do that. I was originally thinking this would keep the two serializations in line, and if i make the change you suggested above then it would be the same as changing it here, just one less function call
return errors.New("metav1.Fields: UnmarshalJSON on nil pointer") | ||
} | ||
if !bytes.Equal(b, []byte("null")) { | ||
f.Raw = append(f.Raw[0:0], b...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this allow storing non-map values (1
, true
, [""]
, ""
, etc)? what about map values that aren't schema-compatible with the old map[string]Field
structure? what ensures the raw JSON persisted in this field doesn't break old clients?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old clients as in, clients written while it was an alpha api?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, we can't ever send data that one of those clients would fail to deserialize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could validate that it is just a map of maps, but would this require us to deserialize it on every update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possibly, but I expected format validation on update anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is already validated when we make sure it will convert into an smd fieldset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and that happens on every write (create/update/all patch types/subresources/etc)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have negative tests ensuring malformed data submitted by clients is rejected?
Ran some scalability tests on this, these were the results. Going to compare these more and see if the improvement is enough
|
On update, we need to verify that the contents are valid OR that they
didn't change.
…On Wed, Aug 7, 2019 at 1:32 PM Jordan Liggitt ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/helpers.go
<#80730 (comment)>
:
> }
// UnmarshalJSON implements json.Unmarshaler
func (f *Fields) UnmarshalJSON(b []byte) error {
- return json.Unmarshal(b, &f.Map)
+ if f == nil {
+ return errors.New("metav1.Fields: UnmarshalJSON on nil pointer")
+ }
+ if !bytes.Equal(b, []byte("null")) {
+ f.Raw = append(f.Raw[0:0], b...)
possibly, but I expected format validation on update anyway...
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#80730?email_source=notifications&email_token=AAE6BFQ4C6HG56EKXBGB4ZTQDMWPTA5CNFSM4IHW7YBKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCA42ALY#discussion_r311749465>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAE6BFV3BRWMDDG5UWSW23LQDMWPTANCNFSM4IHW7YBA>
.
|
@lavalamp kubernetes/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/fieldmanager/fieldmanager.go Lines 94 to 106 in 229894c
|
hmm... that means a client can send malformed data in that field and we'll silently replace it with the current data from the persisted object? I only see Update and Apply paths in that file. What happens on Create when there isn't a liveObject? |
Create and patch both go through this path as well, the create path sends an empty object as "liveObject": kubernetes/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/create.go Lines 142 to 148 in 229894c
|
It sounds reasonable to fail instead of replace with the current data |
one other thought, do we normalize the bytes on write (by actually round-tripping them through the smd struct)? if not, would a client that roundtripped json and indented it or output keys in unsorted order cause different bytes to be written to etcd? |
answering my own question, looks like EncodeObjectManagedFields round-trips and overwrites with a normalized version, which is good. That would be good to test (get object from API, marshal managed fields with different json indenting, send to API (which should be a no-op), verify resourceVersion doesn't change) |
@liggitt yes that is currently true, and we should have a test for it. We had plans to sometimes skip the re-encoding but it seems like we should be careful with that. A test would help make sure we don't ever let user edited bytes directly through. |
What type of PR is this?
/kind feature
/wg apply
/sig api-machinery
/priority important-soon
/cc @lavalamp
What this PR does / why we need it:
Addresses the scalability concerns in kubernetes/enhancements#1110 as a follow-up to #80497
This uses the functions defined in kubernetes-sigs/structured-merge-diff#90 to serialize and deserialize field sets into raw extensions, and removes metav1.Fields
Will update with benchmarks
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
kubernetes/enhancements#1110