List Watch Failed because of "The resourceVersion for the provided list is too old" #6032

yunhjiang · 2022-05-06T17:30:05Z

We are using typha in our deployment. After running for some time, typha fails with the logs below, and then prints these messages about once per second. After that, felix no longer receives updates.

2022-05-03 00:19:06.899 [INFO][7] watchercache.go 188: Failed to perform list of current data during resync ListRoot="/calico/resources/v3/projectcalico.org/kubernetesendpointslices" error=The resourceVersion for the provided list is too old.
2022-05-03 00:19:07.899 [INFO][7] watchercache.go 175: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesendpointslices"

Expected Behavior

When the "resourceVersion too old" error occurs, the list/watch should recover automatically so that the calico agent keeps receiving updates.

Current Behavior

typha prints the error continuously and felix no longer receives updates.

Possible Solution

Bug fixing.

Steps to Reproduce (for bugs)

Start typha pointing to a busy k8s deployment which has a lot of object changes.

Context

We are deploying the felix with typha.

Your Environment

Calico version
v3.22.0 calico
v3.22 typha
Orchestrator version (e.g. kubernetes, mesos, rkt):
k8s 1.8.
Operating System and version:
Link to your project (optional):

fasaxc · 2022-05-09T08:30:14Z

Thanks @yunhjiang. You mentioned on Slack that you thought this commit was responsible: 8980383

Looks like the first list will work fine but, if we ever get into a state where the current revision is "stale", we'll get stuck.

yunhjiang · 2022-05-09T17:13:08Z

I checked the code again. It seems we can only get stuck on a stale current revision in a very small window.

I checked the following code in resyncAndCreateWatcher():

It only initiates a full list when currentWatchRevision is "0". So the first round of the loop should be fine, because we pass "0" to the wc.client.List() call, and the revision it returns is then used for the following wc.client.Watch() call.

In the normal case this is fine. However, if for some reason (maybe a burst of updates) that revision becomes too old between the List and Watch calls, we end up in a dead loop: the revision is no longer "0", so every retry uses the same stale revision.

So a simple solution is to reset the revision to "0" after https://github.com/projectcalico/calico/blob/master/libcalico-go/lib/backend/watchersyncer/watchercache.go#L239 , where performFullResync is set to true.

This should be harmless. I will cook a patch for it.
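The dead loop described above can be sketched with a small simulation. This is not the watchercache.go code: fakeServer, Resync and the integer revisions are hypothetical stand-ins, and the only assumption carried over from the thread is that a List with revision 0 always succeeds while a List with a compacted revision fails with "too old".

```go
package main

import (
	"errors"
	"fmt"
)

// errTooOld stands in for the API server's "resourceVersion too old" error.
var errTooOld = errors.New("The resourceVersion for the provided list is too old.")

// fakeServer is a hypothetical datastore that has compacted every revision
// below oldest.
type fakeServer struct {
	oldest, latest int
}

// List returns the latest revision. Revision 0 means "no constraint"; any
// other revision below the compaction floor is rejected.
func (s *fakeServer) List(rev int) (int, error) {
	if rev != 0 && rev < s.oldest {
		return 0, errTooOld
	}
	return s.latest, nil
}

// Resync retries List up to maxAttempts times, starting from startRev.
// With resetOnTooOld set, a "too old" error resets the revision to 0 so the
// next attempt is an unconstrained list (the fix proposed in this thread).
func Resync(s *fakeServer, startRev, maxAttempts int, resetOnTooOld bool) (rev, attempts int, err error) {
	rev = startRev
	for attempts = 1; attempts <= maxAttempts; attempts++ {
		var newRev int
		newRev, err = s.List(rev)
		if err == nil {
			return newRev, attempts, nil
		}
		if resetOnTooOld && errors.Is(err, errTooOld) {
			rev = 0
		}
	}
	return rev, maxAttempts, err
}

func main() {
	s := &fakeServer{oldest: 100, latest: 150}

	// Without the reset, every retry reuses the stale revision 42 and fails.
	_, n, err := Resync(s, 42, 5, false)
	fmt.Println("no reset:", n, "attempts, err =", err)

	// With the reset, the second attempt lists from 0 and succeeds.
	rev, n, err := Resync(s, 42, 5, true)
	fmt.Println("reset:", n, "attempts, rev =", rev, "err =", err)
}
```

In the simulation the retry count is capped only so the program terminates; the real loop has no such cap, which matches the once-per-second log lines in the report.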

yunhjiang · 2022-05-09T17:15:08Z

And I think this window is quite small, which is why we only hit the issue after typha had been running for some time.

Also, I'm a bit confused about performFullResync=true. performFullResync never seems to be set to false, so setting it to true again is meaningless, right?

caseydavenport · 2022-05-09T23:02:52Z

Agreed that the code looks a little weird in not clearing the resync flag.

I've put up a candidate fix here: #6045

It attempts to clear the resync flag on success, and it also clears the revision number when we get a "too old" error back from the server.
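The two behaviours described for the candidate fix can be written down as a small state update. This is only a sketch of the idea and not the code from #6045: cacheState, afterList and errRevisionTooOld are hypothetical names, and leaving the state unchanged on other errors is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errRevisionTooOld is a hypothetical sentinel for the server's "too old" error.
var errRevisionTooOld = errors.New("too old resource version")

// cacheState holds the two fields discussed in this thread. The field names
// follow the discussion; the struct itself is illustrative only.
type cacheState struct {
	currentWatchRevision string
	performFullResync    bool
}

// afterList updates the state once a List attempt has finished.
func (c *cacheState) afterList(listedRevision string, err error) {
	switch {
	case err == nil:
		// Success: remember the new revision and clear the resync flag.
		c.currentWatchRevision = listedRevision
		c.performFullResync = false
	case errors.Is(err, errRevisionTooOld):
		// Stale revision: go back to "0" so the next List is unconstrained.
		c.currentWatchRevision = "0"
		c.performFullResync = true
	default:
		// Assumption: other errors leave the state alone and are retried.
	}
}

func main() {
	c := &cacheState{currentWatchRevision: "226184", performFullResync: true}

	c.afterList("", errRevisionTooOld)
	fmt.Printf("after too-old error: %+v\n", *c)

	c.afterList("136011276", nil)
	fmt.Printf("after successful list: %+v\n", *c)
}
```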

ivanovpavel1983 · 2022-12-12T09:55:33Z

Hello. I have the same issue after upgrading from calico 3.16 to 3.23.3 (migrated from manifest to tigera-operator):

After that I see these info logs:

And in the k8s API server I see these errors:

Maybe I need to manually remove some resources?

daemonadmin · 2023-02-16T11:48:11Z

Hi all! Same issue:

Calico v 3.24.1, typha logs:

2023-02-15 07:46:21.197 [INFO][7] sync_server.go 756: Status update to send. client=XXXXX:37056 connID=0x1 newStatus=in-sync thread="kv-sender"
2023-02-15 08:03:21.718 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/host/" error=too old resource version: 226184 (136011276)
2023-02-15 08:03:22.211 [INFO][7] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/host/"
2023-02-15 08:59:46.439 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/host/" error=too old resource version: 233636 (136011276)
2023-02-15 08:59:46.929 [INFO][7] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/host/"
2023-02-15 09:58:50.411 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/host/" error=too old resource version: 3302222 (136011276)
2023-02-15 09:58:50.905 [INFO][7] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/host/"
2023-02-15 10:33:25.134 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/host/" error=too old resource version: 226184 (136011276)

@ivanovpavel1983 Did you follow these steps ("Upgrade from Calico versions prior to v3.23.0")?
https://artifacthub.io/packages/helm/projectcalico/tigera-operator

fasaxc added likelihood/high impact/high labels May 9, 2022

caseydavenport mentioned this issue May 9, 2022

Fix bug in handling of RV too old errors #6045

Merged


caseydavenport added the kind/bug label May 11, 2022

marvin-tigera closed this as completed in #6045 May 11, 2022

caseydavenport mentioned this issue May 11, 2022

Automated cherry pick of #6045: Fix bug in handling of RV too old errors #6079

Merged


kundan2707 mentioned this issue Jul 16, 2023

The resourceVersion for the provided list is too old kubernetes/kubernetes#102160

Open
