apiserver: Call .Decorator inside update func #107847
I didn't expect default-on-read code to be in a decorator. I found
@deads2k might recall something useful off the top of his head.
There are two types of back-filling Service does, IIRC...
For the first category, using the standard defaulting path makes them apply far more consistently (on read from etcd, on incoming requests, and after applying a patch from a mutating webhook). For the second category, plumbing config into the standard defaulting path wasn't really workable, so I think that pushed Service to use the storage strategy Decorator.
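A minimal sketch of why the first category benefits from the standard defaulting path: if defaulting runs once at decode time, bytes from etcd, from a request body, and from a webhook-patched body all receive identical treatment. `ServiceSpec`, `setDefaults`, and `decode` are hypothetical simplifications that only loosely mirror how the apiserver applies defaulting on decode; they are not the real codec.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServiceSpec is a hypothetical, heavily simplified stand-in for the real
// Service spec; only two fields with static defaults are modeled.
type ServiceSpec struct {
	Type            string `json:"type,omitempty"`
	SessionAffinity string `json:"sessionAffinity,omitempty"`
}

// setDefaults fills fields whose defaults need no cluster configuration,
// so it is safe to run on every decode.
func setDefaults(s *ServiceSpec) {
	if s.Type == "" {
		s.Type = "ClusterIP"
	}
	if s.SessionAffinity == "" {
		s.SessionAffinity = "None"
	}
}

// decode is the single entry point for bytes from any source (etcd, an
// incoming request body, or a body patched by a mutating webhook), so all
// three get identical defaulting.
func decode(data []byte) (ServiceSpec, error) {
	var s ServiceSpec
	if err := json.Unmarshal(data, &s); err != nil {
		return ServiceSpec{}, err
	}
	setDefaults(&s)
	return s, nil
}

func main() {
	inputs := []struct {
		source string
		raw    []byte
	}{
		{"etcd", []byte(`{}`)},
		{"request", []byte(`{"type":"NodePort"}`)},
		{"webhook patch", []byte(`{"sessionAffinity":"ClientIP"}`)},
	}
	for _, in := range inputs {
		s, err := decode(in.raw)
		if err != nil {
			fmt.Println(in.source, "error:", err)
			continue
		}
		fmt.Printf("%s: %+v\n", in.source, s)
	}
}
```

The second category does not fit this shape, because its defaults depend on cluster configuration that a pure decode-time function like `setDefaults` cannot see.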
We wanted to handle cases like a client that doesn't understand clusterIPs appearing to clear it, which would end up losing the second IP in a dual-stack service. We have a million test cases that permute this in all directions (though apparently not enough, given the bug at hand).
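The old-client case can be sketched as follows, assuming a simplified `Service` with only the singular and plural IP fields. `patchClusterIPsFromOld` is a hypothetical helper, not the actual registry code: when an update omits `ClusterIPs` (as a client that predates the field would) but keeps the same primary IP, the full list is carried forward from the stored object so the second dual-stack IP is not dropped.

```go
package main

import "fmt"

// Service is a hypothetical, minimal stand-in for the real type.
type Service struct {
	ClusterIP  string   // legacy singular field, understood by every client
	ClusterIPs []string // newer plural field; ClusterIPs[0] mirrors ClusterIP
}

// patchClusterIPsFromOld (hypothetical) carries ClusterIPs forward from the
// stored object when the update looks like an old client's round-trip.
func patchClusterIPsFromOld(oldSvc, newSvc *Service) {
	// Nothing to do if the client sent the plural field, or if there is
	// nothing stored to carry forward.
	if len(newSvc.ClusterIPs) > 0 || len(oldSvc.ClusterIPs) == 0 {
		return
	}
	// Only carry the list forward when the primary IP is unchanged (or was
	// omitted entirely); a different primary IP is a real change request.
	if newSvc.ClusterIP == "" || newSvc.ClusterIP == oldSvc.ClusterIPs[0] {
		newSvc.ClusterIP = oldSvc.ClusterIPs[0]
		newSvc.ClusterIPs = append([]string(nil), oldSvc.ClusterIPs...)
	}
}

func main() {
	old := &Service{ClusterIP: "10.0.0.10", ClusterIPs: []string{"10.0.0.10", "fd00::10"}}
	// An old client round-trips the object but drops the field it doesn't know.
	upd := &Service{ClusterIP: "10.0.0.10"}
	patchClusterIPsFromOld(old, upd)
	fmt.Println(upd.ClusterIPs) // [10.0.0.10 fd00::10]
}
```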
tl;dr:
diff --git a/pkg/registry/core/service/storage/storage.go b/pkg/registry/core/service/storage/storage.go
index d9d77ae6cf4..f470e39cf38 100644
--- a/pkg/registry/core/service/storage/storage.go
+++ b/pkg/registry/core/service/storage/storage.go
@@ -365,7 +365,10 @@ func (r *REST) beginCreate(ctx context.Context, obj runtime.Object, options *met
func (r *REST) beginUpdate(ctx context.Context, obj, oldObj runtime.Object, options *metav1.UpdateOptions) (genericregistry.FinishFunc, error) {
newSvc := obj.(*api.Service)
oldSvc := oldObj.(*api.Service)
+
+ // make sure the existing object has all fields we expect to be defaulted set
+ r.defaultOnRead(oldSvc)
// Fix up allocated values that the client may have not specified (for
// idempotence).
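The ordering in the diff above matters because the fix-up step trusts the old object's `ClusterIPs`. A minimal sketch under simplified assumptions: `defaultOnRead` here is a stand-in that only back-fills `ClusterIPs` from `ClusterIP`, and `fixupFromOld` and the `decorateOld` flag are hypothetical names used to toggle the one-line fix. For an object written before `ClusterIPs` existed, skipping the back-fill leaves the update with an empty list.

```go
package main

import "fmt"

// Service is a hypothetical, minimal stand-in; an object written by an older
// release has ClusterIP set but ClusterIPs empty.
type Service struct {
	ClusterIP  string
	ClusterIPs []string
}

// defaultOnRead back-fills ClusterIPs from the legacy singular field
// (a simplified stand-in for the helper called in the diff).
func defaultOnRead(svc *Service) {
	if len(svc.ClusterIPs) == 0 && svc.ClusterIP != "" {
		svc.ClusterIPs = []string{svc.ClusterIP}
	}
}

// fixupFromOld copies allocated values the client may not have specified,
// for idempotence. It assumes oldSvc.ClusterIPs is already populated.
func fixupFromOld(oldSvc, newSvc *Service) {
	if len(newSvc.ClusterIPs) == 0 && newSvc.ClusterIP == oldSvc.ClusterIP {
		newSvc.ClusterIPs = append([]string(nil), oldSvc.ClusterIPs...)
	}
}

// beginUpdate mimics the order of operations in the patched function;
// decorateOld toggles the added defaultOnRead(oldSvc) call.
func beginUpdate(oldSvc, newSvc *Service, decorateOld bool) {
	if decorateOld {
		defaultOnRead(oldSvc)
	}
	fixupFromOld(oldSvc, newSvc)
}

func main() {
	for _, fix := range []bool{false, true} {
		stored := &Service{ClusterIP: "10.0.0.1"} // pre-ClusterIPs object from etcd
		update := &Service{ClusterIP: "10.0.0.1"} // client resends the same spec
		beginUpdate(stored, update, fix)
		fmt.Printf("decorateOld=%v -> ClusterIPs=%v\n", fix, update.ClusterIPs)
	}
}
```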
Observation A:
Observation B:
Observation C:
Observation D:
Observation E:
Because of A2 (existing object is not decorated on update), E1 (new object defaults clusterIPs by copying from existing, rather than defaulting as This PR adds a

It seems arbitrary to decorate earlier internally to fix up this one internal use case for BeginUpdate, while leaving A1, A3, and A4 undecorated and BeginCreate / Admission / Validation handling undecorated objects. Concretely, that means admission webhooks can't rely on

It also seems incorrect to decorate objects in A1, A3, and A4, because that pushes decoration data into storage (directly in opposition to the "this is for setting unstored values" docs). But if we look at how Service is using

It also seems incorrect to decorate the existing and new objects inconsistently (to change A2 and not change A3). It doesn't happen to matter in the end for Service, because of the duplicate logic that sets the same fields in BeginCreate/BeginUpdate, but for a type to set

I think to fix the immediate issue, Service and PVC storage should apply their own defaultOnRead decoration to the old object at the top of beginUpdate.
Decorators were definitely not intended to be used before update (and adding this would break the semantics for anyone using a decorator). Basically, decorators are only for fields that cannot and should not be persisted, are not watchable, and should deliberately be changeable outside of storage (for whatever reason someone was using them). Any watchable field should definitely be set in a pre-update call, but we shouldn't change the meaning of a decorator. We definitely should update the docs to indicate what a decorator cannot be used for. I expected those resources to have their strategy set these fields before create and before update.
EDIT: one more decorator use case: synthetic fields that would be expensive to store at rest and can be calculated from fields stored at rest (these would be watchable, because the inputs are only in the object).
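The split described above (watchable, persisted fields set before the write; decorators reserved for synthetic values derived from stored fields) can be sketched with a hypothetical `Widget` type. `prepareForUpdate`, `decorate`, and `toStorage` are illustrative names, not the real strategy or registry interfaces; the point is that decoration runs only on read and is stripped before anything reaches storage.

```go
package main

import (
	"fmt"
	"strings"
)

// Widget is a hypothetical type. Name, Ports and Mode are persisted;
// Summary is synthetic: derived from persisted fields, never stored at rest.
type Widget struct {
	Name    string
	Ports   []int
	Mode    string // persisted; set before the write so watchers see it
	Summary string // synthetic; computed on read only
}

// prepareForUpdate sets persisted, watchable defaults before the write.
func prepareForUpdate(w *Widget) {
	if w.Mode == "" {
		w.Mode = "Default"
	}
}

// decorate fills the synthetic field on read. It depends only on stored
// fields, so re-running it always yields the same value.
func decorate(w *Widget) {
	parts := make([]string, len(w.Ports))
	for i, p := range w.Ports {
		parts[i] = fmt.Sprint(p)
	}
	w.Summary = w.Name + ":" + strings.Join(parts, ",")
}

// toStorage strips synthetic data so decoration never leaks into storage.
func toStorage(w Widget) Widget {
	w.Summary = ""
	return w
}

func main() {
	w := Widget{Name: "web", Ports: []int{80, 443}}
	prepareForUpdate(&w)
	stored := toStorage(w)
	read := stored
	decorate(&read)
	fmt.Printf("stored=%+v\nread=%+v\n", stored, read)
}
```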
No disagreement. I don't think clusterIPs can (because of old clients) and I
Plausible. I'll try it.
True. What I think Service really wants is a
I don't think it would apply on the Create() path - that's what the existing
WebHooks are an interesting one. There are 3 classes of fields: those the user
It's also worth noting that "Modernize" is pretty much what we need to "touch"
Anyway, I ACK that this is somewhat an abuse of Decorator(), but it was the
@bswartz, you did the PVC equivalent of this, but it's not clear to me whether you will hit the same bugs (since this centers on Update()).
28d6ad9 to d5f1b50
Pushed with an alternate fix, which seems to work, too.
/test pull-kubernetes-integration
This keeps failing with timeouts; it is probably not related.
I'll take a look at this.
@thockin here is a test with a reproducer. I've also verified that this fixes the problem; can you please incorporate it?
diff --git a/test/integration/service/upgrade_test.go b/test/integration/service/upgrade_test.go
new file mode 100644
index 00000000000..1df58674d2f
--- /dev/null
+++ b/test/integration/service/upgrade_test.go
@@ -0,0 +1,80 @@
+/*
+Copyright 2022 The Kubernetes Authors.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package service
+
+import (
+ "context"
+ "testing"
+
+ v1 "k8s.io/api/core/v1"
+ metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+ "k8s.io/apimachinery/pkg/runtime"
+ "k8s.io/client-go/kubernetes"
+ kubeapiservertesting "k8s.io/kubernetes/cmd/kube-apiserver/app/testing"
+ "k8s.io/kubernetes/pkg/api/legacyscheme"
+ "k8s.io/kubernetes/test/integration/framework"
+)
+
+func Test_UpgradeService(t *testing.T) {
+ etcdOptions := framework.SharedEtcd()
+ apiServerOptions := kubeapiservertesting.NewDefaultTestServerOptions()
+ s := kubeapiservertesting.StartTestServerOrDie(t, apiServerOptions, nil, etcdOptions)
+ defer s.TearDownFn()
+ serviceName := "test-old-service"
+ ns := "old-service-ns"
+
+ kubeclient, err := kubernetes.NewForConfig(s.ClientConfig)
+ if err != nil {
+ t.Fatalf("Unexpected error: %v", err)
+ }
+ if _, err := kubeclient.CoreV1().Namespaces().Create(context.TODO(), (&v1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: ns}}), metav1.CreateOptions{}); err != nil {
+ t.Fatal(err)
+ }
+
+ // Create a service and store it in etcd with missing fields representing an old version
+ svc := &v1.Service{
+ ObjectMeta: metav1.ObjectMeta{
+ Name: serviceName,
+ Namespace: ns,
+ },
+ Spec: v1.ServiceSpec{
+ ClusterIP: "10.0.0.1",
+ Ports: []v1.ServicePort{
+ {
+ Name: "test-port",
+ Port: 81,
+ },
+ },
+ },
+ }
+ svcJSON, err := runtime.Encode(legacyscheme.Codecs.LegacyCodec(v1.SchemeGroupVersion), svc)
+ if err != nil {
+ t.Fatalf("Failed creating service JSON: %v", err)
+ }
+ key := "/" + etcdOptions.Prefix + "/services/specs/" + ns + "/" + serviceName
+ if _, err := s.EtcdClient.Put(context.Background(), key, string(svcJSON)); err != nil {
+ t.Fatalf("Failed storing service in etcd: %v", err)
+ }
+ t.Logf("Service stored in etcd %v", string(svcJSON))
+
+ // Try to update the service
+ _, err = kubeclient.CoreV1().Services(ns).Update(context.TODO(), svc, metav1.UpdateOptions{})
+ if err != nil {
+ t.Error(err)
+ }
+
+}
@liggitt we should have one of these for APIs that add new fields; maybe generalize it like the ones that test the etcd storage data?
+1 for adding this test (I'd also recommend setting metadata.creationTimestamp and metadata.uid to match what all persisted objects have).
Since most types use standard defaulting, which is applied much more consistently and straightforwardly, I don't think they need to be tested to the same degree Service does.
d5f1b50 to 8d1dcde
Added the test, with UID and creationTimestamp set. Thanks, @aojea.
This is causing a bug when upgrading from older releases to 1.23 because of Service's maybe-too-clever default-on-read logic. Service depends on `Decorator()` to be called upon read, to back-populate old saved objects which do not have `.clusterIPs[]` set. This works on read, but the cache saves the pre-decorated type (as documented).
In 1.23, this code was refactored and it seems some edge-case handling was inadvertently removed (I have not confirmed exactly what happened).
Test by aojea
8d1dcde to e927ce8
LGTM, but it would be better if Jordan, Clayton, or Daniel approved.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: liggitt, thockin
LGTM; the cherry-pick to 1.23 is next.
LGTM, this looks much better. I think patch/apply go through this code path, so you've fixed them too, but it might be good to double-check.
It does, and I checked.
…847-upstream-release-1.23 Automated cherry pick of #107847: service REST: Call Decorator(old) on update path
/triage accepted
This is causing a bug when upgrading from older releases to 1.23 because
of Service's maybe-too-clever default-on-read logic.
Service depends on `.Decorator()` to be called upon read, to back-populate old
saved objects which do not have `.clusterIPs[]` set. This works on read, but
the cache saves the pre-decorated type.
In 1.23, this code was refactored and it seems some edge-case handling
was inadvertently removed (I have not confirmed exactly what happened).
The simplest fix is this one: it will catch anyone who follows this
pattern. Alternatively, we could patch this in the Service registry, but if
anyone else follows this default-on-read pattern (and I know there are
some) they will also be susceptible to this sort of bug.
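The failure mode described above can be sketched with a toy, map-backed store whose cache holds objects exactly as persisted. `Store`, `Object`, `backfill`, and the `DecorateOnUpdate` flag are hypothetical and only loosely modeled on the generic registry's `Decorator` hook: `Get` decorates a copy and looks correct, while the update path hands the undecorated cached object to the update function unless the decorator is also applied there, which is the approach this PR originally proposed.

```go
package main

import (
	"errors"
	"fmt"
)

// Object is a hypothetical stored object with a legacy and a newer field.
type Object struct {
	IP  string
	IPs []string
}

// Store is a toy stand-in for a generic registry store. Its cache holds
// objects as persisted, i.e. before decoration.
type Store struct {
	cache            map[string]Object
	Decorator        func(*Object)
	DecorateOnUpdate bool // toggles the approach proposed in this PR
}

// Get returns a decorated copy; the cached object is left untouched.
func (s *Store) Get(key string) (Object, error) {
	obj, ok := s.cache[key]
	if !ok {
		return Object{}, errors.New("not found")
	}
	if s.Decorator != nil {
		s.Decorator(&obj)
	}
	return obj, nil
}

// Update passes the existing object to tryUpdate and caches the result.
// Note: in this toy, a decorated result is written back, which is exactly
// the "decoration leaks into storage" concern raised in review.
func (s *Store) Update(key string, tryUpdate func(existing Object) Object) (Object, error) {
	existing, ok := s.cache[key]
	if !ok {
		return Object{}, errors.New("not found")
	}
	if s.DecorateOnUpdate && s.Decorator != nil {
		s.Decorator(&existing)
	}
	updated := tryUpdate(existing)
	s.cache[key] = updated
	return updated, nil
}

// backfill is a default-on-read decorator: populate IPs from IP.
func backfill(o *Object) {
	if len(o.IPs) == 0 && o.IP != "" {
		o.IPs = []string{o.IP}
	}
}

func main() {
	for _, fix := range []bool{false, true} {
		s := &Store{
			cache:            map[string]Object{"svc": {IP: "10.0.0.1"}},
			Decorator:        backfill,
			DecorateOnUpdate: fix,
		}
		got, _ := s.Get("svc")
		updated, _ := s.Update("svc", func(existing Object) Object {
			// Copy allocated values forward from the existing object.
			return Object{IP: existing.IP, IPs: existing.IPs}
		})
		fmt.Printf("fix=%v get=%v update=%v\n", fix, got.IPs, updated.IPs)
	}
}
```

The merged change takes the narrower route instead, decorating only the old object inside Service's own update path, which avoids changing `Decorator` semantics for every other type.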
/kind bug
/kind regression