[GC] controller manager gets error "unable to get REST mapping for kind" for ownerRefs to TPR and add-on APIs #39816

Closed
hongchaodeng opened this issue Jan 12, 2017 · 45 comments · Fixed by #40497
Labels
area/controller-manager priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@hongchaodeng
Contributor

hongchaodeng commented Jan 12, 2017

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):

TPR. GC. OwnerReference. REST mapping.


Is this a BUG REPORT or FEATURE REQUEST? (choose one):

BUG REPORT.

Kubernetes version (use kubectl version):

master. v1.5.1

Environment:

  • Cloud provider or hardware configuration:
    Repro in "hack/local-up-cluster.sh", but I have seen it in many different environments.
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:
The controller manager is unresponsive and overloaded with this error message:

E0112 18:26:11.690849   21374 garbagecollector.go:594] Error syncing item 
&garbagecollector.node{identity:garbagecollector.objectReference{OwnerReference:v1.OwnerReference{APIVersion:"etcd.coreos.com/v1beta1", Kind:"Cluster", Name:"example-etcd-cluster", UID:"89603932-d8f4-11e6-94f3-42010af00002", Controller:(*bool)(0xc422850af8)}, Namespace:"default"}, dependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, dependents:map[*garbagecollector.node]struct {}{(*garbagecollector.node)(0xc42075a6c0):struct {}{}, (*garbagecollector.node)(0xc421a0abd0):struct {}{}, (*garbagecollector.node)(0xc421c074d0):struct {}{}}, owners:[]v1.OwnerReference(nil)}: 
unable to get REST mapping for kind: Cluster, version: etcd.coreos.com/v1beta1

This error just keeps piling up without any backoff. It seems the REST mapper didn't recognize the kind.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Generic way:

  • Create a TPR.
  • Create a TPR object "A".
  • Create pods that have an ownerReference pointing to "A".
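
The generic repro above can be sketched as a single manifest. This is a hedged sketch for the 1.5-era ThirdPartyResource API: the stable.example.com group, the Example kind, and the pod are all hypothetical names, the ownerReference uid must be replaced with the UID the API server actually assigned to "A", and in practice the TPR must be registered before "A" and the pod are created:

```yaml
# Hypothetical TPR; name encodes kind "Example" in group "stable.example.com".
apiVersion: extensions/v1beta1
kind: ThirdPartyResource
metadata:
  name: example.stable.example.com
description: "An example third-party kind"
versions:
- name: v1
---
# TPR object "A".
apiVersion: stable.example.com/v1
kind: Example
metadata:
  name: a
---
# A pod whose ownerReference points to "A"; the GC cannot map the kind.
apiVersion: v1
kind: Pod
metadata:
  name: dependent-pod
  ownerReferences:
  - apiVersion: stable.example.com/v1
    kind: Example
    name: a
    uid: 00000000-0000-0000-0000-000000000000  # replace with A's real UID
spec:
  containers:
  - name: pause
    image: gcr.io/google_containers/pause:3.0
```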

Easy way:

Anything else we need to know:

@hongchaodeng hongchaodeng changed the title unable to get REST mapping for kind: EtcdCluster, version: coreos.com/v1 [GC] unable to get REST mapping for kind: EtcdCluster, version: coreos.com/v1 Jan 12, 2017
@hongchaodeng hongchaodeng changed the title [GC] unable to get REST mapping for kind: EtcdCluster, version: coreos.com/v1 [GC] unable to get REST mapping for TPR kind Jan 12, 2017
@zhouhaibing089
Contributor

well, the garbage collector still uses a static RESTMapper (registered.RESTMapper), and in this case, TPRs aren't recognized at all.

@xiang90
Contributor

xiang90 commented Jan 14, 2017

@zhouhaibing089

Not recognizing the kind is fine and expected, but the controller manager keeps printing the error message without backing off.

@caesarxuchao Can you confirm this is unintended? Or are we missing something here?

@hongchaodeng
Contributor Author

Right. I should have mentioned it: the controller manager is unresponsive and overloaded with the error message.

Updated in title and top comment.

@hongchaodeng hongchaodeng changed the title [GC] unable to get REST mapping for TPR kind [GC] controller manager unresponsive, overloaded with error "unable to get REST mapping for kind" Jan 15, 2017
@liggitt
Member

liggitt commented Jan 15, 2017

there are two things that need fixing here:

  1. the garbage collector should use a dynamic restmapper that picks up types the server knows about, not just compiled-in types
    a. this means, at minimum, discovering types at start-up
    b. it also means refreshing at some interval
  2. in cases where an ownerRef references a type that is not found by the garbage collector's restmapper, it should not panic or hotloop. I'm not really sure what the behavior should be, actually: should it treat owners it can't look up as if they were deleted, or ignore ownerRefs for unknown kinds?
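
Point 1 can be illustrated with a minimal sketch: a mapper that caches kind-to-resource mappings and refreshes them from a discovery function, so kinds registered after start-up are eventually picked up. All types and names below are illustrative stand-ins, not the real Kubernetes RESTMapper API:

```go
package main

import (
	"fmt"
	"sync"
)

// gvk identifies a kind the way an ownerReference does (apiVersion + kind).
type gvk struct{ APIVersion, Kind string }

// dynamicMapper caches kind-to-resource mappings and can refresh them
// from a discovery function, instead of relying only on compiled-in types.
type dynamicMapper struct {
	mu       sync.RWMutex
	mappings map[gvk]string // kind -> REST resource name
	discover func() map[gvk]string
}

func newDynamicMapper(discover func() map[gvk]string) *dynamicMapper {
	m := &dynamicMapper{discover: discover}
	m.Refresh() // discover types at start-up (point 1a)
	return m
}

// Refresh re-runs discovery; callers would invoke this on an interval
// (point 1b) so newly registered TPR kinds become resolvable.
func (m *dynamicMapper) Refresh() {
	fresh := m.discover()
	m.mu.Lock()
	m.mappings = fresh
	m.mu.Unlock()
}

// Lookup returns the mapping, or an error like the one in this issue
// when the kind is unknown.
func (m *dynamicMapper) Lookup(k gvk) (string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if r, ok := m.mappings[k]; ok {
		return r, nil
	}
	return "", fmt.Errorf("unable to get REST mapping for kind: %s, version: %s", k.Kind, k.APIVersion)
}

func main() {
	known := map[gvk]string{}
	m := newDynamicMapper(func() map[gvk]string { return known })

	etcd := gvk{"etcd.coreos.com/v1beta1", "Cluster"}
	if _, err := m.Lookup(etcd); err != nil {
		fmt.Println(err) // unknown until the TPR shows up in discovery
	}

	// The TPR is registered; the next refresh picks it up.
	known[etcd] = "clusters"
	m.Refresh()
	r, _ := m.Lookup(etcd)
	fmt.Println(r)
}
```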

@hongchaodeng
Contributor Author

@liggitt
If a user runs Kubernetes and sees messages like "ignore XXX ownerRef due to unknown kind", they can't really do anything with it. It would be great to recognize the kind, even if there is a delay.

@liggitt
Member

liggitt commented Jan 15, 2017

an ownerRef could have bad data, or it could reference a kind that is no longer part of the cluster. We need to define what happens in those cases.

when it is referencing a dynamic kind (either a kind contributed by an add-on server or by a thirdpartyresource), yes, it should re-discover on some interval to recognize the kind.

@andrewwebber

I have the same issue when launching an etcd cluster with the CoreOS etcd-operator. It causes the API server to run out of disk space as it logs infinitely.

I have to keep restarting the controller container to reclaim disk space.
From a security point of view this means a tenant of a cluster can bring the whole cluster down.

{"log":"I0118 09:51:08.806215       1 event.go:217] Event(api.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"10.10.2.102\", UID:\"882b86ac-dc0d-11e6-9ff0-001e4f520ea4\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RegisteredNode' Node 10.10.2.102 event: Registered Node 10.10.2.102 in NodeController\n","stream":"stderr","time":"2017-01-18T09:51:08.8087232Z"}
{"log":"I0118 09:51:08.806251       1 event.go:217] Event(api.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"10.10.3.103\", UID:\"db9e7d24-dc0d-11e6-9ff0-001e4f520ea4\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RegisteredNode' Node 10.10.3.103 event: Registered Node 10.10.3.103 in NodeController\n","stream":"stderr","time":"2017-01-18T09:51:08.808748906Z"}
{"log":"I0118 09:51:08.806268       1 event.go:217] Event(api.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"10.10.3.104\", UID:\"d856a908-dc0d-11e6-9ff0-001e4f520ea4\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RegisteredNode' Node 10.10.3.104 event: Registered Node 10.10.3.104 in NodeController\n","stream":"stderr","time":"2017-01-18T09:51:08.808759381Z"}
{"log":"I0118 09:51:08.806282       1 event.go:217] Event(api.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"10.10.3.202\", UID:\"d175e1b3-dc0d-11e6-9ff0-001e4f520ea4\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RegisteredNode' Node 10.10.3.202 event: Registered Node 10.10.3.202 in NodeController\n","stream":"stderr","time":"2017-01-18T09:51:08.808769116Z"}
{"log":"I0118 09:51:08.806296       1 event.go:217] Event(api.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"10.10.3.204\", UID:\"c8bca4db-dc0d-11e6-9ff0-001e4f520ea4\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RegisteredNode' Node 10.10.3.204 event: Registered Node 10.10.3.204 in NodeController\n","stream":"stderr","time":"2017-01-18T09:51:08.808778906Z"}
{"log":"I0118 09:51:08.806310       1 event.go:217] Event(api.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"10.10.3.205\", UID:\"d54de405-dc0d-11e6-9ff0-001e4f520ea4\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RegisteredNode' Node 10.10.3.205 event: Registered Node 10.10.3.205 in NodeController\n","stream":"stderr","time":"2017-01-18T09:51:08.808788798Z"}
{"log":"I0118 09:51:08.806324       1 event.go:217] Event(api.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"10.10.3.102\", UID:\"d18c2b99-dc0d-11e6-9ff0-001e4f520ea4\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RegisteredNode' Node 10.10.3.102 event: Registered Node 10.10.3.102 in NodeController\n","stream":"stderr","time":"2017-01-18T09:51:08.808798638Z"}
{"log":"I0118 09:51:08.806338       1 event.go:217] Event(api.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"10.10.3.101\", UID:\"db492f64-dc0d-11e6-9ff0-001e4f520ea4\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RegisteredNode' Node 10.10.3.101 event: Registered Node 10.10.3.101 in NodeController\n","stream":"stderr","time":"2017-01-18T09:51:08.808808629Z"}
{"log":"I0118 09:51:08.806352       1 event.go:217] Event(api.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"10.10.3.201\", UID:\"d50dc664-dc0d-11e6-9ff0-001e4f520ea4\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RegisteredNode' Node 10.10.3.201 event: Registered Node 10.10.3.201 in NodeController\n","stream":"stderr","time":"2017-01-18T09:51:08.80881979Z"}
{"log":"I0118 09:51:08.806368       1 event.go:217] Event(api.ObjectReference{Kind:\"Node\", Namespace:\"\", Name:\"10.10.3.203\", UID:\"0eea82fc-dc0e-11e6-9ff0-001e4f520ea4\", APIVersion:\"\", ResourceVersion:\"\", FieldPath:\"\"}): type: 'Normal' reason: 'RegisteredNode' Node 10.10.3.203 event: Registered Node 10.10.3.203 in NodeController\n","stream":"stderr","time":"2017-01-18T09:51:08.808830097Z"}
{"log":"I0118 09:51:18.471149       1 garbagecollector.go:780] Garbage Collector: All monitored resources synced. Proceeding to collect garbage\n","stream":"stderr","time":"2017-01-18T09:51:18.471400353Z"}
{"log":"E0118 09:58:30.196257       1 garbagecollector.go:593] Error syncing item \u0026garbagecollector.node{identity:garbagecollector.objectReference{OwnerReference:metatypes.OwnerReference{APIVersion:\"coreos.com/v1\", Kind:\"EtcdCluster\", UID:\"a9df64c0-dd64-11e6-a5cc-001e4f520ea4\", Name:\"etcd-client\", Controller:(*bool)(0xc421948230)}, Namespace:\"rhino-ci\"}, dependentsLock:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, dependents:map[*garbagecollector.node]struct {}{(*garbagecollector.node)(0xc422b50510):struct {}{}}, owners:[]metatypes.OwnerReference(nil)}: unable to get REST mapping for kind: EtcdCluster, version: coreos.com/v1\n","stream":"stderr","time":"2017-01-18T09:58:30.196391795Z"}
(the same "Error syncing item … unable to get REST mapping for kind: EtcdCluster" line repeats continuously)

@0xmichalis 0xmichalis added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/controller-manager labels Jan 18, 2017
@0xmichalis 0xmichalis added this to the v1.6 milestone Jan 18, 2017
@xiang90
Contributor

xiang90 commented Jan 18, 2017

@Kargakis Why is the target fix in 1.6 and not a patch release for 1.5?

@liggitt
Member

liggitt commented Jan 18, 2017

@Kargakis Why is the target fix in 1.6 and not a patch release for 1.5?

I would not anticipate the garbage collector actually enabling support for TPRs in 1.5, but the bug that causes it to become unresponsive when it sees an ownerRef of an unknown type should be fixed.

@0xmichalis 0xmichalis modified the milestones: v1.5, v1.6 Jan 18, 2017
@0xmichalis
Contributor

Changed the milestone to 1.5

@xiang90
Contributor

xiang90 commented Jan 18, 2017

@liggitt

I agree. I do not expect the GC to work with TPR either, but it should not "kill" the controller manager. That is the issue we should fix in 1.5. We can make GC work for TPR in 1.6.

@andrewwebber

For me this is currently a DoS attack.

@mikedanese
Member

mikedanese commented Jan 25, 2017

Repro instructions on master in GCE:

$ cat <<EOF | kubectl apply -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  namespace: default
  name: service-account-all
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: read-secrets-global
subjects:
  - kind: Group
    name: system:serviceaccounts
roleRef:
  kind: ClusterRole
  name: service-account-all
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: etcd-operator
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: etcd-operator
    spec:
      containers:
      - name: etcd-operator
        image: quay.io/coreos/etcd-operator
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
EOF
$ sleep 30 # wait for operator to create third party resource
$ cat <<EOF | kubectl create -f -
apiVersion: "coreos.com/v1"
kind: "EtcdCluster"
metadata:
  name: "etcd-cluster"
spec:
  size: 3
  version: "v3.1.0-alpha.1"
EOF

@lavalamp
Member

@mikedanese so you don't even need to have something with an owner ref pointing at the TPR to trigger this?

@liggitt
Member

liggitt commented Jan 25, 2017

I would guess an ownerRef pointing to any arbitrary unknown kind would trigger the same behavior

@mikedanese
Member

@lavalamp the etcd operator creates etcd pods with:

  ownerReferences:
  - apiVersion: coreos.com/v1
    controller: true
    kind: EtcdCluster
    name: etcd-cluster
    uid: 506aeb37-e346-11e6-a281-42010af00002

I suspect @liggitt is correct.

@lavalamp
Member

Ah, the operator triggers it. OK, I see.

@xiang90
Contributor

xiang90 commented Jan 25, 2017

The etcd operator does not rely on the k8s GC features (it has an internal GC), but it does write owner refs. When k8s GC works for TPR, we will remove our temporary internal GC. For now, the owner ref causes the controller manager to spam.

@calebamiles calebamiles modified the milestones: v1.6, v1.5 Feb 27, 2017
@caesarxuchao
Member

#38679 is close to getting in. It will fix the hot loops.
We still need a dynamic restmapper. I'll start a fix after #38679 lands.

ash2k added a commit to atlassian/smith that referenced this issue Mar 3, 2017
@mcluseau
Contributor

mcluseau commented Mar 3, 2017

FWIW, here's the patch I wrote to get back to a more reasonable behavior:

diff --git a/pkg/controller/garbagecollector/garbagecollector.go b/pkg/controller/garbagecollector/garbagecollector.go
index 77b8251ca5..4db4583915 100644
--- a/pkg/controller/garbagecollector/garbagecollector.go
+++ b/pkg/controller/garbagecollector/garbagecollector.go
@@ -591,8 +591,11 @@ func (gc *GarbageCollector) worker() {
        err := gc.processItem(timedItem.Object.(*node))
        if err != nil {
                utilruntime.HandleError(fmt.Errorf("Error syncing item %#v: %v", timedItem.Object, err))
-               // retry if garbage collection of an object failed.
-               gc.dirtyQueue.Add(timedItem)
+               go func() {
+                       time.Sleep(500 * time.Millisecond) // quick and dirty fix the hot loop
+                       // retry if garbage collection of an object failed.
+                       gc.dirtyQueue.Add(timedItem)
+               }()
                return
        }
        DirtyProcessingLatency.Observe(sinceInMicroseconds(gc.clock, timedItem.StartTime))

(then, rebuild your controller-manager or hyperkube)

@caesarxuchao
Member

caesarxuchao commented Mar 3, 2017

Thanks @MikaelCluseau. #38679 changed the GC to use the rate-limited queue, so the hot loop should have been fixed on the master branch. [edit] That said, we still need to make the restmapper dynamic; I'm working on a fix.

@mcluseau
Contributor

mcluseau commented Mar 3, 2017

My patch was done on the release-1.5 branch, in case anyone prefers to avoid upgrading to alpha.0 :)

@mcluseau
Contributor

mcluseau commented Mar 3, 2017

@caesarxuchao BTW, is there a backport to 1.5 planned? #38679 is flagged size/XXL, so if a smaller patch is needed, feel free to use this one, of course.

@caesarxuchao
Member

Update: I tried to reuse DeferredDiscoveryRESTMapper to dynamically translate TPR, but it didn't seem to work because of #42516. I'll find a workaround.

@caesarxuchao
Member

@MikaelCluseau what version are you on? @mikedanese said #40497 was landed in v1.5.3, so the problems should be masked. Is that true?

@mcluseau
Contributor

mcluseau commented Mar 7, 2017

@caesarxuchao I had to overwrite hyperkube in the docker image quay.io/coreos/hyperkube-v1.5.3_coreos.0

$ docker exec xxx /controller-manager --version
Kubernetes v1.5.4-beta.0.31+f41f18b8a68842-dirty

@mcluseau
Contributor

mcluseau commented Mar 7, 2017

Ahhh it has this commit 64029a2 which says:

Adjust global log limit to 1ms

So probably not a difference a human can detect.

@caesarxuchao
Member

I see. 1 ms is still too spammy.

@mcluseau
Contributor

mcluseau commented Mar 7, 2017

At least, it makes the log unusable without filtering ;) I have a general feeling that the Kubernetes log system could be improved, but I still need a better understanding of it. The general problem to solve is probably the repetition of log lines, something that's solved by events (or by the way kubectl shows them); so maybe I should rely more on events and, when I don't find the answer in the events, file an issue/PR.

k8s-github-robot pushed a commit that referenced this issue Mar 10, 2017
Automatic merge from submit-queue (batch tested with PRs 38805, 42362, 42862)

Let GC print specific message for RESTMapping failure

Make the error messages reported in #39816 more specific, and only print the message once.

I'll also update the garbage collector's doc to clearly state we don't support TPR yet.

We'll wait for the watchable discovery feature (@sttts are you going to work on that?) to land in 1.7, and then enable the garbage collector to handle TPR.

cc @hongchaodeng @MikaelCluseau @djMax
@liggitt
Member

liggitt commented Mar 14, 2017

@caesarxuchao the backoff removed the log flooding, right? If so, I think this can be moved to 1.7.

@liggitt
Member

liggitt commented Mar 14, 2017

c.f. #42615 (comment)

@liggitt liggitt modified the milestones: v1.7, v1.6 Mar 14, 2017
@liggitt liggitt changed the title [GC] controller manager unresponsive, overloaded with error "unable to get REST mapping for kind" [GC] controller manager gets error "unable to get REST mapping for kind" for TPR and add-on APIs Mar 14, 2017
@liggitt liggitt changed the title [GC] controller manager gets error "unable to get REST mapping for kind" for TPR and add-on APIs [GC] controller manager gets error "unable to get REST mapping for kind" for ownerRefs to TPR and add-on APIs Mar 14, 2017
@caesarxuchao
Member

Thanks @liggitt. You are right, this is not a 1.6 blocker. #42862 makes sure the log message is only printed once per un-resolvable ownerReference.
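
The "print once per un-resolvable ownerReference" behavior can be sketched as simple per-key deduplication (an illustrative sketch, not the actual #42862 implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// onceLogger suppresses repeats: each distinct key is logged only
// the first time it is seen, so an unresolvable ownerReference
// produces one line instead of flooding the log.
type onceLogger struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newOnceLogger() *onceLogger {
	return &onceLogger{seen: map[string]bool{}}
}

// Log prints msg only on the first call with a given key and
// reports whether it printed.
func (l *onceLogger) Log(key, msg string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.seen[key] {
		return false
	}
	l.seen[key] = true
	fmt.Println(msg)
	return true
}

func main() {
	l := newOnceLogger()
	key := "etcd.coreos.com/v1beta1/Cluster" // one key per ownerRef kind+version
	for i := 0; i < 3; i++ {
		l.Log(key, "unable to get REST mapping for kind: Cluster, version: etcd.coreos.com/v1beta1")
	}
}
```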

@caesarxuchao
Member

I think the original issue of "spamming error messages" was solved. I created #44507 to track the issue of "GC should support non-core API". Closing this issue.
