Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size #57073

chenchun · 2017-12-12T03:10:19Z

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:
Create a job with lots of envs. The job can't run normally. Controller keeps logging Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size.

[root@kubernetes-master yaml]# kubectl get jobs -o wide
NAME         DESIRED   SUCCESSFUL   AGE       CONTAINERS   IMAGES           SELECTOR
bigmetajob   100       0            12m       bigmetajob   chenchun/hello   controller-uid=d5a0ff6b-dee7-11e7-9ab9-080027242396

[root@kubernetes-master log]# tail -f kube-controller-manager.log 
E1212 02:58:19.428087   14503 streamwatcher.go:109] Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size

What you expected to happen:
The job should run normally or fail early on creating.
How to reproduce it (as minimally and precisely as possible):

[root@kubernetes-master yaml]# cat bigmetajob.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: bigmetajob
spec:
  completions: 100
  parallelism: 100
  template:
    metadata:
      name: bigmetajob 
    spec:
      restartPolicy: Never
      containers:
      - name: bigmetajob
        image: chenchun/hello 
        imagePullPolicy: Never
        command: ["date"]
        env:
        - name: longlonglonglonglonglonglonglonglonglonglonglonglonglonglonglongKey
          value: longlonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglongValue
        - name: longlonglonglonglonglonglonglonglonglonglonglonglonglonglonglongKey
          value: longlonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglongValue
        - name: longlonglonglonglonglonglonglonglonglonglonglonglonglonglonglongKey
          value: longlonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglonglongValue
        ...

[root@kubernetes-master yaml]# du -sh bigmetajob.yaml 
1.9M	bigmetajob.yaml

[root@kubernetes-master yaml]# kubectl create -f bigmetajob.yaml 
job "bigmetajob" created

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version):
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:

The text was updated successfully, but these errors were encountered:

chenchun · 2017-12-12T03:18:09Z

The error comes from the decoder which limits the max bytes up to 1MiB. I think we should limit max bytes of a object's meta on creating.

dims · 2017-12-13T02:37:24Z

/sig api-machinery

yliaog · 2017-12-18T21:40:14Z

/cc @yliaog

fejta-bot · 2018-03-18T21:53:53Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2018-04-17T22:10:55Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

nikhita · 2018-04-20T09:50:27Z

/remove-lifecycle rotten

yliaog · 2018-04-20T17:20:54Z

agreed, since the decoder has the 1MiB limit, the limit should be applied at the creation time. We don't want an object stored in the system that the system cannot decode.

fejta-bot · 2018-07-19T17:34:44Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

discordianfish · 2018-07-23T17:08:03Z

/remove-lifecycle stale
/lifecycle freeze

The nginx ingress controller runs into the same problem until it gives up and the config becomes stale.

fejta-bot · 2018-10-21T18:04:05Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

chenchun · 2018-10-22T02:00:48Z

/remove-lifecycle stale

chenchun · 2018-10-22T05:57:06Z

/remove-lifecycle stale

carsonoid · 2018-10-24T17:09:43Z

+1 This is a huge issue. It opens up a very easy dos attack from anyone that has create or update access to any objects in a cluster.

chenchun · 2018-10-30T03:51:21Z

Behaviour of kube-controller-manager changed after the original issue report of #57073 .
Here is the test against latest upstream.

First I created a 1.4M job just like the yaml described at the top comment with little differences of total numbers of envs. Kube-controller-manager watched the creation of this obj and its length is larger than the maximum limit 1Mib. So Kube-controller-manager keeped logging streamwatcher.go:109] Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size.
What supprises me is that Kube-controller-manager successfully created pods for this job and ran to complete. I added some logs in reflector.go and found that these pods were created via list func instead of watch. The repeated logs Unexpected watch close - watch lasted less than a second and no items received partially proved my finding.
From the code view, list func invokes transformResponse
to read http body through data, err := ioutil.ReadAll(resp.Body), while watch invokes StreamWatcher#receive to read http body which has a maximum bytes limit.

[root@master yaml]# du -sh bigmetajob.yaml
1.4M    bigmetajob.yaml
[root@master yaml]# kubectl create -f bigmetajob.yaml                                                                                                                                                       
job.batch/bigmetajob created

# job ran to complete
[root@master ~]# kubectl get pods -o wide --all-namespaces                                                                                                                                                  
NAMESPACE   NAME                     READY   STATUS      RESTARTS   AGE     IP             NODE     NOMINATED NODE                                                                                          
default     bigmetajob-g88fk         0/1     Completed   0          6m11s   172.16.66.98   master   <none>                                                                                                  
default     bigmetajob-h26hc         0/1     Completed   0          6m11s   172.16.22.33   node1    <none>                                                                                                  
default     bigmetajob-z45xl         0/1     Completed   0          6m11s   172.16.66.99   master   <none>  

# kube-controller-manager logs
E1030 10:03:30.313228    7707 streamwatcher.go:109] Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size
E1030 10:03:30.320118    7707 streamwatcher.go:109] Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size
W1030 10:03:30.320163    7707 reflector.go:270] k8s.io/client-go/informers/factory.go:132: watch of *v1.Job ended with: very short watch: k8s.io/client-go/informers/factory.go:132: Unexpected watch close - watch lasted less than a second and no items received
I1030 10:03:31.046311    7707 node_lifecycle_controller.go:929] Node master ReadyCondition updated. Updating timestamp.
I1030 10:03:31.320678    7707 reflector.go:169] Listing and watching *v1.Job from k8s.io/client-go/informers/factory.go:132
I1030 10:03:31.420416    7707 event.go:221] Event(v1.ObjectReference{Kind:"Job", Namespace:"default", Name:"bigmetajob", UID:"0d768b4f-dc2b-11e8-88a4-52540005f38a", APIVersion:"batch/v1", ResourceVersion:"103944", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: bigmetajob-dv88b
E1030 10:03:31.435014    7707 streamwatcher.go:109] Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size
E1030 10:03:31.454380    7707 streamwatcher.go:109] Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size
W1030 10:03:31.454549    7707 reflector.go:270] k8s.io/client-go/informers/factory.go:132: watch of *v1.Pod ended with: very short watch: k8s.io/client-go/informers/factory.go:132: Unexpected watch close - watch lasted less than a second and no items received
I1030 10:03:31.555012    7707 event.go:221] Event(v1.ObjectReference{Kind:"Job", Namespace:"default", Name:"bigmetajob", UID:"0d768b4f-dc2b-11e8-88a4-52540005f38a", APIVersion:"batch/v1", ResourceVersion:"103944", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: bigmetajob-ljdfj
I1030 10:03:31.557595    7707 event.go:221] Event(v1.ObjectReference{Kind:"Job", Namespace:"default", Name:"bigmetajob", UID:"0d768b4f-dc2b-11e8-88a4-52540005f38a", APIVersion:"batch/v1", ResourceVersion:"103944", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: bigmetajob-6r5w4
E1030 10:03:31.615086    7707 streamwatcher.go:109] Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size
W1030 10:03:31.615153    7707 reflector.go:270] k8s.io/client-go/informers/factory.go:132: watch of *v1.Job ended with: very short watch: k8s.io/client-go/informers/factory.go:132: Unexpected watch close - watch lasted less than a second and no items received
I1030 10:03:32.454955    7707 reflector.go:169] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:132
E1030 10:03:32.612084    7707 streamwatcher.go:109] Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size
W1030 10:03:32.612125    7707 reflector.go:270] k8s.io/client-go/informers/factory.go:132: watch of *v1.Pod ended with: very short watch: k8s.io/client-go/informers/factory.go:132: Unexpected watch close - watch lasted less than a second and no items received
I1030 10:03:32.617363    7707 reflector.go:169] Listing and watching *v1.Job from k8s.io/client-go/informers/factory.go:132
E1030 10:03:32.752217    7707 streamwatcher.go:109] Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size
W1030 10:03:32.752650    7707 reflector.go:270] k8s.io/client-go/informers/factory.go:132: watch of *v1.Job ended with: very short watch: k8s.io/client-go/informers/factory.go:132: Unexpected watch close - watch lasted less than a second and no items received

But when I tried to create a 2.0M job, kube-apiserver failed to create it because etcdserver has a limit of maximum size of http body.

[root@master yaml]# du -sh bbbjob.yaml
2.0M    bbbjob.yaml
[root@master yaml]# kubectl create -f bbbjob.yaml
Error from server: error when creating "bbbjob.yaml": etcdserver: request is too large

carsonoid · 2018-10-30T15:58:04Z

It's good to see that the controller partially works. The big issue here is consistency, I would rather have the initial API request be rejected then experience inconsistent performance. Anything that uses client-go for watches will experience similar issues that may go undetected for a long time. Thank you for making the PR to fix this at the api layer!

fejta-bot · 2019-01-28T16:47:42Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2019-02-27T17:06:09Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

carsonoid · 2019-02-27T17:09:34Z

/close

k8s-ci-robot · 2019-02-27T17:09:43Z

@carsonoid: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 12, 2017

k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 12, 2017

k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Dec 13, 2017

k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 13, 2017

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2018

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 17, 2018

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 20, 2018

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 19, 2018

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 23, 2018

discordianfish mentioned this issue Jul 23, 2018

ingress controller becomes stale after being unable to decode events from watch stream kubernetes/ingress-nginx#2837

Closed

cloudnautique mentioned this issue Aug 30, 2018

rancher 2.0.4 server memory leak rancher/rancher#14356

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 21, 2018

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 22, 2018

chenchun mentioned this issue Oct 30, 2018

Limit max bytes of the objects on creating at kube-apiserver instead of limiting on watching at kube-controller #70404

Closed

jpbetz mentioned this issue Dec 14, 2018

Increase limit for object size in streaming serializer #72053

Merged

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 28, 2019

freehan mentioned this issue Jan 29, 2019

Endpoints API Object could not support large number of endpoints #73324

Closed

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 27, 2019

k8s-ci-robot closed this as completed Feb 27, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size #57073

Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size #57073

chenchun commented Dec 12, 2017

chenchun commented Dec 12, 2017

dims commented Dec 13, 2017

yliaog commented Dec 18, 2017

fejta-bot commented Mar 18, 2018

fejta-bot commented Apr 17, 2018

nikhita commented Apr 20, 2018

yliaog commented Apr 20, 2018

fejta-bot commented Jul 19, 2018

discordianfish commented Jul 23, 2018

fejta-bot commented Oct 21, 2018

chenchun commented Oct 22, 2018

chenchun commented Oct 22, 2018

carsonoid commented Oct 24, 2018

chenchun commented Oct 30, 2018

carsonoid commented Oct 30, 2018

fejta-bot commented Jan 28, 2019

fejta-bot commented Feb 27, 2019

carsonoid commented Feb 27, 2019

k8s-ci-robot commented Feb 27, 2019

Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size #57073

Unable to decode an event from the watch stream: object to decode was longer than maximum allowed size #57073

Comments

chenchun commented Dec 12, 2017

chenchun commented Dec 12, 2017

dims commented Dec 13, 2017

yliaog commented Dec 18, 2017

fejta-bot commented Mar 18, 2018

fejta-bot commented Apr 17, 2018

nikhita commented Apr 20, 2018

yliaog commented Apr 20, 2018

fejta-bot commented Jul 19, 2018

discordianfish commented Jul 23, 2018

fejta-bot commented Oct 21, 2018

chenchun commented Oct 22, 2018

chenchun commented Oct 22, 2018

carsonoid commented Oct 24, 2018

chenchun commented Oct 30, 2018

carsonoid commented Oct 30, 2018

fejta-bot commented Jan 28, 2019

fejta-bot commented Feb 27, 2019

carsonoid commented Feb 27, 2019

k8s-ci-robot commented Feb 27, 2019