
Kubelet gets "Timeout: Too large resource version" error from the API server after network outage #91073


Closed
rkojedzinszky opened this issue May 13, 2020 · 29 comments · Fixed by openshift/insights-operator#186 or #115093
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@rkojedzinszky

What happened:
I disconnected a node from the network for a few minutes. After reconnecting it, I keep receiving error messages like the following from the kubelet on that node, even 15 minutes after the connection was restored:

May 13 19:44:54 k8s-node04 kubelet[598]: I0513 19:44:54.043005     598 trace.go:116] Trace[1747581308]: "Reflector ListAndWatch" name:object-"kube-system"/"default-token-h8dz9" (started: 2020-05-13 19:44:14.938918643 +0000 UTC m=+81978.107654790) (total time: 39.10398118s):
May 13 19:44:54 k8s-node04 kubelet[598]: Trace[1747581308]: [39.10398118s] [39.10398118s] END
May 13 19:44:54 k8s-node04 kubelet[598]: E0513 19:44:54.043090     598 reflector.go:178] object-"kube-system"/"default-token-h8dz9": Failed to list *v1.Secret: Timeout: Too large resource version: 159128021, current: 159127032
May 13 19:45:16 k8s-node04 kubelet[598]: I0513 19:45:16.944515     598 trace.go:116] Trace[527369896]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:135 (started: 2020-05-13 19:44:37.84920601 +0000 UTC m=+82001.017941865) (total time: 39.095209656s):
May 13 19:45:16 k8s-node04 kubelet[598]: Trace[527369896]: [39.095209656s] [39.095209656s] END
May 13 19:45:16 k8s-node04 kubelet[598]: E0513 19:45:16.944595     598 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: Timeout: Too large resource version: 159128061, current: 159127066
May 13 19:45:23 k8s-node04 kubelet[598]: I0513 19:45:23.959866     598 trace.go:116] Trace[243135295]: "Reflector ListAndWatch" name:k8s.io/kubernetes/pkg/kubelet/kubelet.go:517 (started: 2020-05-13 19:44:44.860565979 +0000 UTC m=+82008.029301834) (total time: 39.099201281s):
May 13 19:45:23 k8s-node04 kubelet[598]: Trace[243135295]: [39.099201281s] [39.099201281s] END
May 13 19:45:23 k8s-node04 kubelet[598]: E0513 19:45:23.959947     598 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:517: Failed to list *v1.Service: Timeout: Too large resource version: 159128031, current: 159127042
May 13 19:45:32 k8s-node04 kubelet[598]: I0513 19:45:32.752744     598 trace.go:116] Trace[1950236492]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:135 (started: 2020-05-13 19:44:53.65385557 +0000 UTC m=+82016.822591425) (total time: 39.098776276s):
May 13 19:45:32 k8s-node04 kubelet[598]: Trace[1950236492]: [39.098776276s] [39.098776276s] END
May 13 19:45:32 k8s-node04 kubelet[598]: E0513 19:45:32.752831     598 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSIDriver: Timeout: Too large resource version: 159128079, current: 159127090
May 13 19:45:35 k8s-node04 kubelet[598]: I0513 19:45:35.670924     598 trace.go:116] Trace[1207388769]: "Reflector ListAndWatch" name:object-"kube-system"/"kube-router-token-4px26" (started: 2020-05-13 19:44:56.566459557 +0000 UTC m=+82019.735195412) (total time: 39.104363817s):
May 13 19:45:35 k8s-node04 kubelet[598]: Trace[1207388769]: [39.104363817s] [39.104363817s] END
May 13 19:45:35 k8s-node04 kubelet[598]: E0513 19:45:35.671005     598 reflector.go:178] object-"kube-system"/"kube-router-token-4px26": Failed to list *v1.Secret: Timeout: Too large resource version: 159128021, current: 159127032
May 13 19:46:05 k8s-node04 kubelet[598]: I0513 19:46:05.472918     598 trace.go:116] Trace[308823067]: "Reflector ListAndWatch" name:object-"kube-system"/"default-token-h8dz9" (started: 2020-05-13 19:45:26.359131486 +0000 UTC m=+82049.527867341) (total time: 39.113684635s):
May 13 19:46:05 k8s-node04 kubelet[598]: Trace[308823067]: [39.113684635s] [39.113684635s] END
May 13 19:46:05 k8s-node04 kubelet[598]: E0513 19:46:05.473007     598 reflector.go:178] object-"kube-system"/"default-token-h8dz9": Failed to list *v1.Secret: Timeout: Too large resource version: 159128021, current: 159127032

What you expected to happen:
I expect the kubelet to reconnect to the API server after the network recovers, just as before, and such timeouts not to occur after recovery.

How to reproduce it (as minimally and precisely as possible):
Have a node connected to the cluster, disconnect it from the network for 3-4 minutes, then reconnect it and observe the kubelet's logs.

Anything else we need to know?:
I have strict TCP keepalive settings in place on the master and worker nodes, but this should not be the cause:

# sysctl -a|grep net.ipv4|grep tcp_keep
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 600

Restarting the kubelet resolves the issue; the error messages disappear.

Environment:

  • Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:48:36Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/arm"}
  • Cloud provider or hardware configuration:
    bare metal
  • OS (e.g: cat /etc/os-release):
# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel (e.g. uname -a):
# uname -a
Linux k8s-node04 5.4.40 #2 SMP Sun May 10 13:03:41 UTC 2020 aarch64 GNU/Linux
  • Install tools:
    kubeadm
  • Network plugin and version (if this is a network-related bug):
    kube-router
  • Others:
@rkojedzinszky rkojedzinszky added the kind/bug Categorizes issue or PR as related to a bug. label May 13, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 13, 2020
@rkojedzinszky
Author

rkojedzinszky commented May 13, 2020

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 13, 2020
@rkojedzinszky
Author

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label May 13, 2020
@rkojedzinszky
Author

It seems that after some more time the kubelet recovers; sometimes it takes one hour, sometimes two. Recovery is successful when I see logs like the following:

May 14 08:12:50 k8s-node04 kubelet[21312]: I0514 08:12:50.493844   21312 trace.go:116] Trace[2107010018]: "Reflector ListAndWatch" name:k8s.io/kubernetes/pkg/kubelet/kubelet.go:517 (started: 2020-05-14 08:12:14.272390782 +0000 UTC m=+3843
May 14 08:12:50 k8s-node04 kubelet[21312]: Trace[2107010018]: [36.221010158s] [36.221010158s] Objects listed
May 14 08:12:50 k8s-node04 kubelet[21312]: I0514 08:12:50.703616   21312 trace.go:116] Trace[2017894804]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:135 (started: 2020-05-14 08:12:30.637820871 +0000 UTC m=+38450.6
May 14 08:12:50 k8s-node04 kubelet[21312]: Trace[2017894804]: [20.065578776s] [20.065578776s] Objects listed
May 14 08:14:02 k8s-node04 kubelet[21312]: I0514 08:14:02.891254   21312 trace.go:116] Trace[941042897]: "Reflector ListAndWatch" name:object-"kube-system"/"default-token-h8dz9" (started: 2020-05-14 08:13:23.678619553 +0000 UTC m=+38503.7
May 14 08:14:02 k8s-node04 kubelet[21312]: Trace[941042897]: [39.212544499s] [39.212544499s] END
May 14 08:14:02 k8s-node04 kubelet[21312]: E0514 08:14:02.891341   21312 reflector.go:178] object-"kube-system"/"default-token-h8dz9": Failed to list *v1.Secret: Timeout: Too large resource version: 159128021, current: 159127032
May 14 08:15:18 k8s-node04 kubelet[21312]: I0514 08:15:18.227313   21312 trace.go:116] Trace[198067142]: "Reflector ListAndWatch" name:object-"kube-system"/"default-token-h8dz9" (started: 2020-05-14 08:14:39.10362647 +0000 UTC m=+38579.13
May 14 08:15:18 k8s-node04 kubelet[21312]: Trace[198067142]: [39.123595925s] [39.123595925s] END
May 14 08:15:18 k8s-node04 kubelet[21312]: E0514 08:15:18.227394   21312 reflector.go:178] object-"kube-system"/"default-token-h8dz9": Failed to list *v1.Secret: Timeout: Too large resource version: 159128021, current: 159127032
May 14 08:16:31 k8s-node04 kubelet[21312]: I0514 08:16:31.559426   21312 trace.go:116] Trace[527023983]: "Reflector ListAndWatch" name:object-"kube-system"/"default-token-h8dz9" (started: 2020-05-14 08:16:15.502345878 +0000 UTC m=+38675.5
May 14 08:16:31 k8s-node04 kubelet[21312]: Trace[527023983]: [16.056783907s] [16.056783907s] Objects listed

What is the reason that it takes so long for it to recover?

@liggitt
Member

liggitt commented May 14, 2020

The root cause is #82428, deduplicating against that issue

@liggitt liggitt closed this as completed May 14, 2020
@rkojedzinszky
Author

rkojedzinszky commented May 24, 2020

While debugging and tracing the code, I am not really sure this is the same issue as the one referenced above.
The error messages show that the client is requesting a higher resource version than is available at the server:
159128021 vs. the current 159127032, i.e. the requested version is greater by 989. I don't know where this resource version comes from.

I've made a very stupid patch on my kubelet/client-go:

diff --git a/staging/src/k8s.io/client-go/tools/cache/reflector.go b/staging/src/k8s.io/client-go/tools/cache/reflector.go
index 99a7b284b78..032a6713fc7 100644
--- a/staging/src/k8s.io/client-go/tools/cache/reflector.go
+++ b/staging/src/k8s.io/client-go/tools/cache/reflector.go
@@ -174,6 +174,7 @@ var internalPackages = []string{"client-go/tools/cache/"}
 func (r *Reflector) Run(stopCh <-chan struct{}) {
        klog.V(2).Infof("Starting reflector %s (%s) from %s", r.expectedTypeName, r.resyncPeriod, r.name)
        wait.BackoffUntil(func() {
+               r.setLastSyncResourceVersion("")
                if err := r.ListAndWatch(stopCh); err != nil {
                        utilruntime.HandleError(err)
                }

I am pretty sure this is not the best solution, but it forces a full list every time ListAndWatch is called.
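
For context, here is a simplified sketch of how the reflector picks the resourceVersion for a relist (my own simplification, not the actual client-go source, which also tracks whether the last version became unavailable); it shows why clearing the stored value avoids the error:

// Simplified illustration only -- not the real client-go code.
func relistResourceVersion(lastSyncRV string) string {
	if lastSyncRV == "" {
		// No remembered state: list with "0" ("any version you already have"),
		// which the apiserver's watch cache can always serve.
		return "0"
	}
	// Otherwise resume from the remembered version; if the serving apiserver's
	// watch cache is behind it, the list fails with
	// "Timeout: Too large resource version".
	return lastSyncRV
}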

@liggitt
Member

liggitt commented Jun 1, 2020

You're correct, sorry about that... I misread the timeout message.

The error it is getting is coming from the server, which means kubelet -> apiserver connectivity is fine.

This message comes from the watch cache, which means the watch cache in the API server is behind where the kubelet's informer was, which is pretty surprising.
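
(For illustration, this is roughly what that watch-cache path does -- a simplified sketch, not the real apiserver code: the cache waits a short, bounded time for its own resourceVersion to catch up to the requested one, and otherwise gives up with this timeout error.)

package watchcache // illustrative sketch only

import (
	"fmt"
	"time"
)

// waitUntilFresh is a rough stand-in for the apiserver watch cache's wait:
// it polls until the cache's resourceVersion reaches the requested one or a
// small time budget runs out. The real code uses a condition variable rather
// than polling.
func waitUntilFresh(currentRV func() uint64, requestedRV uint64, budget time.Duration) error {
	deadline := time.Now().Add(budget)
	for time.Now().Before(deadline) {
		if currentRV() >= requestedRV {
			return nil // cache has caught up; the list can be served
		}
		time.Sleep(50 * time.Millisecond)
	}
	return fmt.Errorf("Timeout: Too large resource version: %d, current: %d", requestedRV, currentRV())
}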

A few questions:

  • Is there a single apiserver or multi-apiserver cluster?
  • Is there a single etcd member or multiple etcd members?
  • Is there an API server running on the node that was detached/reattached?

@liggitt changed the title from "Kubelet fails to reconnect to Apiserver after network outage" to "Kubelet gets 'Timeout: Too large resource version' error from the API server after network outage" on Jun 1, 2020
@liggitt
Member

liggitt commented Jun 1, 2020

cc @jpbetz does this ring any bells?

@liggitt
Member

liggitt commented Jun 1, 2020

cc @wojtek-t for eyes on watch cache

@jpbetz
Contributor

jpbetz commented Jun 1, 2020

I don't think I've hit this problem specifically. Since this could happen due to an etcd partition or due to a partition between an apiserver and etcd, the answers to the questions asked in #91073 (comment) seem like the appropriate next things to figure out.

@wojtek-t
Member

wojtek-t commented Jun 2, 2020

Yeah, I agree with Joe. It seems like a network partition issue affecting the apiserver the node was connected to.

@rkojedzinszky
Author

I will be able to answer the questions in detail later, but in short: there are 3 masters set up with kubeadm, and HAProxy load-balances requests to the apiservers. The only thing needed to reproduce the issue is to disconnect the node from the network for a few minutes. The masters and etcd members are not touched, and the cluster otherwise seems healthy.

@rkojedzinszky
Author

Today it occurred on a different cluster. The setup is similar: stacked etcd on the master nodes, with HAProxy load-balancing traffic between them. I was just experimenting with the HAProxy servers and the HA setup, and after a few HAProxy restarts one of the nodes entered this state. So during an HAProxy restart all kubelets' connections to the apiservers are terminated, but rarely one of them ends up in this unhealthy state.

@rkojedzinszky
Author

rkojedzinszky commented Jun 2, 2020

You're correct, sorry about that... I misread the timeout message.

The error it is getting is coming from the server, which means kubelet -> apiserver connectivity is fine.

This message comes from the watch cache, which means the watch cache in the API server is behind where the kubelet's informer was, which is pretty surprising.

A few questions:

  • Is there a single apiserver or multi-apiserver cluster?

Multi apiserver cluster

  • Is there a single etcd member or multiple etcd members?

Stacked etcd members are present.

  • Is there an API server running on the node that was detached/reattached?

I think in my setup it is not relevant, as all the kubelets talk to the apiservers through the load balancer, which picks one of the available apiservers.

I am doing rolling restarts of the nodes, and there was at least one occurrence where even one of the master nodes got stuck in this state.

How can I help debugging/fixing this?

@rkojedzinszky
Author

How can we proceed with this issue?

@fvigotti

fvigotti commented Jun 16, 2020

I got the same problem in an old cluster (up for 3+ years) after updating to Kubernetes 1.18.3: one node's kubelet started logging these errors (~10 errors/minute, for various resources).

kubelet[19138]: E0616 08:27:38.857563   19200 reflector.go:178] object-"kube-system"/"default-token-0pxb1": Failed to list *v1.Secret: Timeout: Too large resource version: 305622098, current: 305621712
....
E0616 08:28:14.662897   19200 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:517: Failed to list *v1.Service: Timeout: Too large resource version: 305622098, current: 305621717

I've now just restarted the kubelet and it seems to be working again.

Update:
While testing, I tried connecting that node's kubelet to each master node individually to check whether there was some kind of network issue or partitioning. None of the master servers returned errors (the kubelet worked fine against all master nodes when bypassing the nginx proxy that manages the masters' HA).

@rkojedzinszky
Author

I think this problem is getting serious. I am not 100% sure, but today, during node restarts (including the masters), CoreDNS was migrated multiple times, and unfortunately the kube-proxy pods did not pick up the updated state, so DNS resolution was not working inside the cluster. After detecting this, a simple restart of all kube-proxy pods resolved the problem. There were no network issues during this, so I suspect we hit the same bug, this time in kube-proxy.

@rkojedzinszky
Author

I am trying to dig into this. I've written a small Go program that uses the same reflector the kubelet uses: it just starts a watch for CSIDrivers, and I patched client-go to print the last resourceVersion. Running the program multiple times (i.e. connecting to different masters) produces the following:

$ ./main 
setLastSyncResourceVersion= 179271356
^C
$ ./main 
setLastSyncResourceVersion= 179271308
^C
$ ./main 
setLastSyncResourceVersion= 179271231
^C
$ ./main 
setLastSyncResourceVersion= 179271356
^C
$ ./main 
setLastSyncResourceVersion= 179271308
^C
$ ./main 
setLastSyncResourceVersion= 179271231
^C
$ ./main 
setLastSyncResourceVersion= 179271356
^C
$ ./main 
setLastSyncResourceVersion= 179271308
^C

Right now I don't know exactly which URL is being fetched and which arguments are passed; I still have to figure that out. But it now seems that different masters return different resourceVersions. etcd does not report any problems or issues.
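
For reference, a minimal sketch of such a probe program (an approximation of what is described above, not its exact source; it assumes client-go v0.18.x and a kubeconfig at the default path). It lists CSIDrivers with resourceVersion=0, which is normally served from the contacted apiserver's watch cache, and prints the list's resourceVersion so differences between masters become visible:

package main

import (
	"context"
	"fmt"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// ResourceVersion "0" means "any version you already have", which is
	// typically answered from the apiserver's watch cache rather than by a
	// quorum read from etcd.
	list, err := clientset.StorageV1().CSIDrivers().List(context.TODO(), metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}
	fmt.Println("list resourceVersion:", list.ResourceVersion)
}

Pointing the kubeconfig at each master in turn (or running it repeatedly through the load balancer) should make any divergence in the returned resourceVersions obvious.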

@rkojedzinszky
Author

Any ideas, suggestions where to look further?

@rkojedzinszky
Author

Today I re-initialized 2 of my 3 master nodes to make sure the etcd replicas are not corrupted. Right after the nodes had rejoined, the same behavior could be observed: connecting to different masters resulted in different resourceVersions.

@rkojedzinszky
Author

Now it can be seen that querying different apiservers again returns different resourceVersions:

$ curl -k -sL -H 'Authorization: bearer x' 'https://192.168.8.60:16443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0'
{
  "kind": "CSIDriverList",
  "apiVersion": "storage.k8s.io/v1",
  "metadata": {
    "selfLink": "/apis/storage.k8s.io/v1/csidrivers",
    "resourceVersion": "180941315"
  },
  "items": []
}
$ curl -k -sL -H 'Authorization: bearer x' 'https://192.168.8.60:16443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0'
{
  "kind": "CSIDriverList",
  "apiVersion": "storage.k8s.io/v1",
  "metadata": {
    "selfLink": "/apis/storage.k8s.io/v1/csidrivers",
    "resourceVersion": "180939938"
  },
  "items": []
}
$ curl -k -sL -H 'Authorization: bearer x' 'https://192.168.8.60:16443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0'
{
  "kind": "CSIDriverList",
  "apiVersion": "storage.k8s.io/v1",
  "metadata": {
    "selfLink": "/apis/storage.k8s.io/v1/csidrivers",
    "resourceVersion": "179941052"
  },
  "items": []
}

192.168.8.60:16443 is an HAProxy instance load-balancing traffic across the 3 masters.

@rkojedzinszky
Author

@wojtek-t @jpbetz please let me know where to go from here. My cluster runs on Pi boards, which are perhaps not the fastest, but if this issue depends on slow hardware, it can pop up on fast hardware at any time as well. As I wrote, today I re-initialized 2 of my 3 masters to make sure no partitioning occurred. Please also have a look at my comments above. Or, if you have a clue that my setup is broken, please let me know!

@wojtek-t
Member

wojtek-t commented Jul 1, 2020

This has been fixed/mitigated at head; going to backport the fix to 1.18 now.
It seems 1.18 is the first version affected (the offending changes were reverted in 1.17, months ago).

@wojtek-t
Member

This has been mitigated in 1.18 and head.

The more proper fix is proposed in kubernetes/enhancements#1878

Closing this one.

mrueg added a commit to mrueg/kube-router that referenced this issue Sep 23, 2020
This vendors a later version of prometheus' golang client (0.8.0 ->
0.9.4) to allow `go mod tidy` to work properly.
It also updates the k8s libraries from 0.18.6 to 0.18.8 to avoid
hitting kubernetes/kubernetes#91073
murali-reddy pushed a commit to cloudnativelabs/kube-router that referenced this issue Sep 30, 2020
This vendors a later version of prometheus' golang client (0.8.0 ->
0.9.4) to allow `go mod tidy` to work properly.
It also updates the k8s libraries from 0.18.6 to 0.18.8 to avoid
hitting kubernetes/kubernetes#91073
@nerzhul

nerzhul commented Nov 9, 2020

Hello, as far as I can see, I have the same issue on version 1.19.3:

Nov 09 08:17:14 vp-09b7.vm.vptech.eu kube-apiserver[31992]: I1109 08:17:14.930916   31992 httplog.go:89] "HTTP" verb="GET" URI="/apis/storage.k8s.io/v1/csidrivers?resourceVersion=144195811" latency="3.0010497s" userAgent="kubelet/v1.19.2 (linux/amd64) kubernetes/f574309" srcIP="10.175.128.1:51466" resp=504 statusStack="\ngoroutine 113005721 [running]:\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).recordStatus(0xc01580c2a0, 0x1f8)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:237 +0xcf\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader(0xc01580c2a0, 0x1f8)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:216 +0x35\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*baseTimeoutWriter).WriteHeader(0xc03c194ae0, 0x1f8)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:228 +0xb2\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/metrics.(*ResponseWriterDelegator).WriteHeader(0xc01d1d74d0, 0x1f8)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go:503 +0x45\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.(*deferredResponseWriter).Write(0xc007e23cc0, 0xc0294264e0, 0xbf, 0xc5, 0x7f8af5abf498, 0xc0235f4e60, 0xa6)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:202 +0x1f7\nk8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/protobuf.(*Serializer).doEncode(0xc00041e100, 0x504ca80, 0xc0235f4e60, 0x503bac0, 0xc007e23cc0, 0x0, 0x4843d7f)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/protobuf/protobuf.go:210 +0x5e5\nk8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/protobuf.(*Serializer).Encode(0xc00041e100, 0x504ca80, 0xc0235f4e60, 0x503bac0, 0xc007e23cc0, 0x39c0571, 0x6)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/protobuf/protobuf.go:167 +0x147\nk8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/versioning.(*codec).doEncode(0xc0235f4f00, 0x504ca80, 0xc0235f4e60, 0x503bac0, 0xc007e23cc0, 0x0, 0x0)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/versioning/versioning.go:228 +0x396\nk8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/versioning.(*codec).Encode(0xc0235f4f00, 0x504ca80, 0xc0235f4e60, 0x503bac0, 0xc007e23cc0, 0xc00041e100, 0x5055340)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/runtime/serializer/versioning/versioning.go:184 
+0x170\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.SerializeObject(0x48bbf1f, 0x23, 0x7f8af67cf080, 0xc0235f4f00, 0x509ee00, 0xc012fddde0, 0xc02c395800, 0x1f8, 0x504ca80, 0xc0235f4e60)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:96 +0x12c\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.WriteObjectNegotiated(0x50a2040, 0xc00627ac60, 0x50a2380, 0x722e908, 0x485b9ab, 0xe, 0x4843d7f, 0x2, 0x509ee00, 0xc012fddde0, ...)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:251 +0x572\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.ErrorNegotiated(0x503b360, 0xc0235f4dc0, 0x50a2040, 0xc00627ac60, 0x485b9ab, 0xe, 0x4843d7f, 0x2, 0x509ee00, 0xc012fddde0, ...)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:270 +0x16f\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers.(*RequestScope).err(...)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/rest.go:89\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers.ListResource.func1(0x509ee00, 0xc012fddde0, 0xc02c395800)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/get.go:279 +0x1259\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints.restfulListResource.func1(0xc01d1d7290, 0xc01580c310)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/installer.go:1157 +0x91\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/metrics.InstrumentRouteFunc.func1(0xc01d1d7290, 0xc01580c310)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go:384 +0x282\nk8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).dispatch(0xc0001d4360, 0x7f8af539e878, 0xc012fdddb8, 0xc02c395800)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:288 +0xa84\nk8s.io/kubernetes/vendor/github.com/emicklei/go-restful.(*Container).Dispatch(...)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:199\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP(0x485b1e7, 0xe, 0xc0001d4360, 0xc0006d6e70, 0x7f8af539e878, 0xc012fdddb8, 0xc02c395800)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/handler.go:146 +0x539\nk8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/apiserver.(*proxyHandler).ServeHTTP(0xc00fc0e000, 0x7f8af539e878, 0xc012fdddb8, 
0xc02c395800)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/apiserver/handler_proxy.go:121 +0x183\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/mux.(*pathHandler).ServeHTTP(0xc0206f4140, 0x7f8af539e878, 0xc012fdddb8, 0xc02c395800)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:248 +0x3db\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).ServeHTTP(0xc00427e230, 0x7f8af539e878, 0xc012fdddb8, 0xc02c395800)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:234 +0x8c\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP(0x485e6c8, 0xf, 0xc002f12990, 0xc00427e230, 0x7f8af539e878, 0xc012fdddb8, 0xc02c395800)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/handler.go:154 +0x74d\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthorization.func1(0x7f8af539e878, 0xc012fdddb8, 0xc02c395800)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/authorization.go:64 +0x563\nnet/http.HandlerFunc.ServeHTTP(0xc00813c340, 0x7f8af539e878, 0xc012fdddb8, 0xc02c395800)\n\t/usr/local/go/src/net/http/server.go:2042 +0x44\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.WithMaxInFlightLimit.func2(0x7f8af539e878, 0xc012fdddb8, 0xc02c395800)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/maxinflight.go:175 +0x4cf\nnet/http.HandlerFunc.ServeHTTP(0xc00bd97470, 0x7f8af539e878, 0xc012fdddb8, 0xc02c395800)\n\t/usr/local/go/src/net/http/server.go:2042 +0x44\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithImpersonation.func1(0x7f8af539e878, 0xc012fdddb8, 0xc02c395800)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/impersonation.go:50 +0x2306\nnet/http.HandlerFunc.ServeHTTP(0xc00813c380, 0x7f8af539e878, 0xc012fdddb8, 0xc02c395800)\n\t/usr/local/go/src/net/http/server.go:2042 +0x44\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthentication.func1(0x7f8af539e878, 0xc012fdddb8, 0xc02c395700)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/filters/authentication.go:70 +0x672\nnet/http.HandlerFunc.ServeHTTP(0xc005ed4320, 0x7f8af539e878, 0xc012fdddb8, 0xc02c395700)\n\t/usr/local/go/src/net/http/server.go:2042 +0x44\nk8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1(0xc032a16b40, 0xc0069afd60, 0x50a5540, 0xc012fdddb8, 0xc02c395700)\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:113 +0xb8\ncreated by 
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP\n\t/workspace/anago-v1.19.3-rc.0.69+37babbd0e76c11/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:99 +0x1cc\n" addedInfo="\nlogging error output: \"k8s\\x00\\n\\f\\n\\x02v1\\x12\\x06Status\\x12\\xa6\\x01\\n\\x06\\n\\x00\\x12\\x00\\x1a\\x00\\x12\\aFailure\\x1aBTimeout: Too large resource version: 144195811, current: 142181666\\\"\\aTimeout*C\\n\\x00\\x12\\x00\\x1a\\x00\\\"7\\n\\x17ResourceVersionTooLarge\\x12\\x1aToo large resource version\\x1a\\x00(\\x012\\x000\\xf8\\x03\\x1a\\x00\\\"\\x00\"\n"

Our setup is 3 etcd members, 2 apiservers bound to those 3 etcd members, and an HAProxy in front of the apiservers.

@wojtek-t
Member

If that happens once - that can definitely happen (and that's fine). The bug was that the components were stuck with those errors.
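
(For anyone on an affected 1.18 patch release: the gist of the client-go mitigation is that this particular timeout is now recognized and the next relist drops the remembered resourceVersion, falling back to a consistent list from etcd instead of retrying the same version against a lagging watch cache forever. A rough sketch of such a check, with hypothetical names rather than the exact client-go change:)

package reflectorfix // illustrative sketch only

import (
	"strings"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// isTooLargeRVError reports whether a list error is the "Too large resource
// version" timeout discussed in this issue. This is a crude message-based
// check for illustration; the real fix also inspects the status causes.
func isTooLargeRVError(err error) bool {
	return apierrors.IsTimeout(err) && strings.Contains(err.Error(), "Too large resource version")
}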

@oldthreefeng

I get the same error on 1.18.0; restarting Docker and the kubelet makes it work again.

So, should I update from 1.18.0 to 1.18.6+?
