MySQL + Kine Conformance Checks Fail #10023

Open · VestigeJ opened this issue Apr 25, 2024 · 7 comments
Labels: kind/bug, priority/high
Assignees: brandond
Milestone: v1.30.1+k3s1

VestigeJ commented Apr 25, 2024

Environmental Info:
K3s Version:

Any current branch

Node(s) CPU architecture, OS, and Version:

Not applicable; the same bug is present regardless of the OS.

Cluster Configuration:

3 servers, 1 agent
1 external MySQL DB: mysql Ver 8.0.36 for Linux on x86_64 (MySQL Community Server - GPL)

Describe the bug:

Conformance checks fail only when MySQL is the external DB; the exact same setup with PostgreSQL passes conformance checks just fine.

Failed tests:
 [sig-api-machinery] Servers with support for API chunking should support continue listing from the last key if the original version has been compacted away, though the list is inconsistent [Slow] [Conformance]
 [sig-api-machinery] Servers with support for API chunking should return chunks of results for list calls [Conformance]

Steps To Reproduce:

  1. Install K3s using an external MySQL DB as the datastore-endpoint.
  2. Wait for the cluster to come up and be fully ready.
  3. Run Sonobuoy conformance checks against the cluster:
     sonobuoy run --mode=certified-conformance
     To expedite the tests, you can run with --e2e-focus=sig-api-machinery, or focus directly on the failed tests, e.g. --e2e-focus="Servers with support for API chunking should support continue listing from the last key if the original version has been compacted away, though the list is inconsistent"

Expected behavior:

All conformance checks pass.

Actual behavior:

It seems that various cascading failures can occur, specifically with the sig-api-machinery suite of tests.

Additional context / logs:

Plugin: e2e
Status: failed
Total: 7213
Passed: 351
Failed: 31
Skipped: 6831

Failed tests:
 [sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should be able to deny attaching pod [Conformance]
 [sig-node] KubeletManagedEtcHosts should test kubelet managed /etc/hosts file [LinuxOnly] [NodeConformance] [Conformance]
 [sig-network] Services should be able to switch session affinity for NodePort service [LinuxOnly] [Conformance]
 [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for CRD preserving unknown fields in an embedded object [Conformance]
 [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] updates the published spec when one version gets renamed [Conformance]
 [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for multiple CRDs of different groups [Conformance]
 [sig-cli] Kubectl client Kubectl version should check is all data is printed  [Conformance]
 [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for CRD preserving unknown fields at the schema root [Conformance]
 [sig-storage] EmptyDir volumes pod should support shared volumes between containers [Conformance]
 [sig-auth] ServiceAccounts should mount an API token into pods  [Conformance]
 [sig-network] DNS should support configurable pod DNS nameservers [Conformance]
 [sig-cli] Kubectl client Proxy server should support proxy with --port 0  [Conformance]
 [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for CRD with validation schema [Conformance]
 [sig-cli] Kubectl client Kubectl run pod should create a pod from an image when restart is Never  [Conformance]
 [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] removes definition from spec when one version gets changed to not be served [Conformance]
 [sig-cli] Kubectl client Kubectl diff should check if kubectl diff finds a difference for Deployments [Conformance]
 [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for multiple CRDs of same group but different versions [Conformance]
 [sig-node] Pods should support retrieving logs from the container over websockets [NodeConformance] [Conformance]
 [sig-cli] Kubectl client Kubectl patch should add annotations for pods in rc  [Conformance]
 [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for CRD without validation schema [Conformance]
 [sig-cli] Kubectl client Kubectl label should update the label on a resource  [Conformance]
 [sig-cli] Kubectl client Kubectl describe should check if kubectl describe prints relevant information for rc and pods  [Conformance]
 [sig-cli] Kubectl client Kubectl cluster-info should check if Kubernetes control plane services is included in cluster-info  [Conformance]
 [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for multiple CRDs of same group and version but different kinds [Conformance]
 [sig-cli] Kubectl client Update Demo should create and stop a replication controller  [Conformance]
 [sig-cli] Kubectl client Kubectl replace should update a single-container pod's image  [Conformance]
 [sig-node] Pods should support remote command execution over websockets [NodeConformance] [Conformance]
 [sig-cli] Kubectl client Kubectl server-side dry-run should check if kubectl can dry-run update Pods [Conformance]
 [sig-cli] Kubectl client Update Demo should scale a replication controller  [Conformance]
 [sig-cli] Kubectl client Guestbook application should create and stop a working application  [Conformance]
 [sig-cli] Kubectl logs logs should be able to retrieve and filter logs  [Conformance]

Plugin: systemd-logs
Status: passed
Total: 4
Passed: 4
Failed: 0
Skipped: 0

Run Details:
API Server version: v1.27.13-rc2+k3s1
Node health: 4/4 (100%)
Pods health: 15/16 (93%)
Details for failed pods:
sonobuoy/sonobuoy-e2e-job-73fdfa25f4cd4449 Ready:False: PodFailed:
Errors detected in files:
Errors:
251 podlogs/sonobuoy/sonobuoy-e2e-job-73fdfa25f4cd4449/logs/e2e.txt
 54 podlogs/kube-system/traefik-7d5c94d587-4c2mp/logs/traefik.txt
 11 podlogs/kube-system/metrics-server-c44988498-w2fm7/logs/metrics-server.txt
Warnings:
12 podlogs/sonobuoy/sonobuoy-e2e-job-73fdfa25f4cd4449/logs/e2e.txt
 5 podlogs/sonobuoy/sonobuoy/logs/kube-sonobuoy.txt
Plugin: e2e
Status: failed
Total: 7411
Passed: 390
Failed: 2
Skipped: 7019

Failed tests:
 [sig-api-machinery] Servers with support for API chunking should support continue listing from the last key if the original version has been compacted away, though the list is inconsistent [Slow] [Conformance]
 [sig-api-machinery] Servers with support for API chunking should return chunks of results for list calls [Conformance]

Plugin: systemd-logs
Status: passed
Total: 4
Passed: 4
Failed: 0
Skipped: 0

Run Details:
API Server version: v1.29.2+k3s1
Node health: 4/4 (100%)
Pods health: 15/16 (93%)
Details for failed pods:
sonobuoy/sonobuoy-e2e-job-21fff695d911479a Ready:False: :
Errors detected in files:
Errors:
34 podlogs/kube-system/traefik-f4564c4f4-grgbq/logs/traefik.txt
11 podlogs/kube-system/metrics-server-67c658944b-lk4tj/logs/metrics-server.txt
 4 podlogs/sonobuoy/sonobuoy-e2e-job-21fff695d911479a/logs/e2e.txt
 1 podlogs/kube-system/helm-install-traefik-crd-xw6kt/logs/helm.txt
 1 podlogs/kube-system/helm-install-traefik-wrvmv/logs/helm.txt
Warnings:
1 podlogs/kube-system/metrics-server-67c658944b-lk4tj/logs/metrics-server.txt
1 podlogs/sonobuoy/sonobuoy-e2e-job-21fff695d911479a/logs/e2e.txt
1 podlogs/sonobuoy/sonobuoy/logs/kube-sonobuoy.txt

$ k3s -v
k3s version v1.29.2+k3s1 (86f10213)
go version go1.21.7

$ kgn
NAME               STATUS   ROLES                  AGE    VERSION
ip-1-1-31-84       Ready    <none>                 119m   v1.29.2+k3s1
ip-1-1-23-159      Ready    control-plane,master   120m   v1.29.2+k3s1
ip-1-1-17-35       Ready    control-plane,master   119m   v1.29.2+k3s1
ip-1-1-19-12       Ready    control-plane,master   122m   v1.29.2+k3s1

These commands from the test suite pass when run manually:

STEP: Destroying namespace "kubectl-293" for this suite. @ 04/24/24 21:07:27.373
  << Timeline

  [FAILED] error running /usr/local/bin/kubectl --kubeconfig=/tmp/kubeconfig-504024898 --namespace=kubectl-293 exec httpd --pod-running-timeout=2m0s -- /bin/sh -c exit 0:
  Command stdout:

  stderr:
  error: error sending request: Post "https://10.43.0.1:443/api/v1/namespaces/kubectl-293/pods/httpd/exec?command=%2Fbin%2Fsh&command=-c&command=exit+0&container=httpd&stderr=true&stdout=true": dial tcp 10.43.0.1:443: connect: connection refused

  error:
  exit status 1
  In [It] at: k8s.io/kubernetes/test/e2e/kubectl/kubectl.go:538 @ 04/24/24 21:07:25.15

  There were additional failures detected.  To view them in detail run ginkgo -vv

Important note: MySQL itself does not seem bottlenecked or overburdened by the cluster in this setup.

sql> status
--------------
mysql  Ver 8.0.36 for Linux on x86_64 (MySQL Community Server - GPL)

Connection id:		1269
Current database:
Current user:		root@localhost
SSL:			Not in use
Current pager:		less
Using outfile:		''
Using delimiter:	;
Server version:		8.0.36 MySQL Community Server - GPL
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8mb4
Db     characterset:	utf8mb4
Client characterset:	utf8mb4
Conn.  characterset:	utf8mb4
UNIX socket:		/var/lib/mysql/mysql.sock
Binary data as:		Hexadecimal
Uptime:			42 min 38 sec

Threads: 14  Questions: 434404  Slow queries: 0  Opens: 421  Flush tables: 3  Open tables: 340  Queries per second avg: 169.821
--------------
brandond (Contributor) commented Apr 25, 2024

This appears to be related to watch requests and response pagination. I suspect the fixes from these recent kine PRs may need some additional tweaking to properly group requests in the way the apiserver expects.

brandond self-assigned this Apr 25, 2024
brandond added this to the v1.30.1+k3s1 milestone Apr 25, 2024
brandond (Contributor) commented

@VestigeJ do you know if this can be reproduced on a single-node instance of k3s, or does it require 3+1 to fail?

VestigeJ (Author) commented

I haven't tried it on a single host at all; we don't usually run conformance tests on anything other than a 3-server, 1-agent cluster.

brandond (Contributor) commented May 7, 2024

 << Timeline

 [FAILED] Expected
     <string>: template-0002
 to equal
     <string>: template-0001
 In [It] at: k8s.io/kubernetes/test/e2e/apimachinery/chunking.go:112 @ 05/07/24 00:14:39.521

It appears that the conformance tests now expect the datastore to list things sorted by key. Previously this wasn't required, and we closed out issues where people complained about it.

There are now tests that create a large number (400) of resources, list them in smaller (17) chunks, and check that the items come out sorted by name (the 7th item returned should be named template-0007, for example). Ref:
https://github.com/kubernetes/kubernetes/blob/v1.29.4/test/e2e/apimachinery/chunking.go#L112-L113

This fails because kine does not sort returned items by key, as etcd does. They are instead sorted by revision, as this is the native order returned by the SQL query.
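For illustration, here is a minimal sketch of the ordering difference at the SQL level. It assumes a simplified view of kine's backing table (a `kine` table with an auto-increment `id` revision column and a `name` key column); the real kine queries are considerably more involved, and the DSN is a placeholder:

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql"
)

// listKeys lists keys under a prefix. ORDER BY id returns rows in
// revision order (the behavior described above), while ORDER BY name
// returns them in key order, which is what etcd does and what the
// chunking conformance tests now assert.
func listKeys(db *sql.DB, prefix string, byKey bool) ([]string, error) {
	order := "id"
	if byKey {
		order = "name"
	}
	rows, err := db.Query(
		"SELECT name FROM kine WHERE name LIKE ? ORDER BY "+order+" ASC",
		prefix+"%")
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var keys []string
	for rows.Next() {
		var k string
		if err := rows.Scan(&k); err != nil {
			return nil, err
		}
		keys = append(keys, k)
	}
	return keys, rows.Err()
}

func main() {
	// Placeholder DSN; adjust user, socket, and database for your setup.
	db, err := sql.Open("mysql", "root@unix(/var/lib/mysql/mysql.sock)/kine")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	keys, err := listKeys(db, "/registry/podtemplates/", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(keys)
}
```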

brandond (Contributor) commented May 7, 2024

Timeline >>
 STEP: Creating a kubernetes client @ 05/06/24 23:55:51.153
 May  6 23:55:51.153: INFO: >>> kubeConfig: /tmp/kubeconfig-4089597224
 STEP: Building a namespace api object, basename chunking @ 05/06/24 23:55:51.154
 STEP: Waiting for a default service account to be provisioned in namespace @ 05/06/24 23:55:51.17
 STEP: Waiting for kube-root-ca.crt to be provisioned in namespace @ 05/06/24 23:55:51.172
 STEP: creating a large number of resources @ 05/06/24 23:55:51.174
 STEP: retrieving the first page @ 05/06/24 23:56:08.864
 May  6 23:56:08.935: INFO: Retrieved 40/40 results with rv 6307 and continue eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0
 STEP: retrieving the second page until the token expires @ 05/06/24 23:56:08.935
 May  6 23:56:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  6 23:56:48.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  6 23:57:08.943: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  6 23:57:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  6 23:57:48.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  6 23:58:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  6 23:58:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  6 23:58:48.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  6 23:59:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  6 23:59:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  6 23:59:48.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:00:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:00:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:00:48.943: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:01:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:01:28.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:01:48.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:02:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:02:28.943: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:02:48.943: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:03:08.942: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:03:28.943: INFO: Token eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6NjMwNywic3RhcnQiOiJ0ZW1wbGF0ZS0wMDMzXHUwMDAwIn0 has not expired yet
 May  7 00:03:48.940: INFO: got error The provided continue parameter is too old to display a consistent list result. You can start a new list without the continue parameter, or use the continue token in this response to retrieve the remainder of the results. Continuing with the provided token results in an inconsistent list - objects that were created, modified, or deleted between the time the first chunk was returned and now may show up in the list.
 May  7 00:03:48.940: INFO: Retrieved inconsistent continue eyJ2IjoibWV0YS5rOHMuaW8vdjEiLCJydiI6LTEsInN0YXJ0IjoidGVtcGxhdGUtMDAzM1x1MDAwMCJ9
 STEP: retrieving the second page again with the token received with the error message @ 05/07/24 00:03:48.94
 [FAILED] in [It] - k8s.io/kubernetes/test/e2e/apimachinery/chunking.go:202 @ 05/07/24 00:03:48.946

This error seems to stem from a similar problem, related to paginating with expired continue tokens. However, in this case the problem seems to be related to consistency with the remaining item count. This was supposed to have been fixed by k3s-io/kine#271, but perhaps there are corner cases remaining with pagination and key order. Ref:
https://github.com/kubernetes/kubernetes/blob/v1.29.4/test/e2e/apimachinery/chunking.go#L202
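For reference, a minimal client-go sketch of the pattern that test exercises (the kubeconfig path, namespace, and resource are placeholders): page through a list with a small limit, and when the continue token outlives the compact revision, recover the fresh "inconsistent" token from the returned Status, roughly as chunking.go does:

```go
package main

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/rancher/k3s/k3s.yaml")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	opts := metav1.ListOptions{Limit: 17}
	for {
		list, err := client.CoreV1().PodTemplates("default").List(context.TODO(), opts)
		if apierrors.IsResourceExpired(err) {
			// The compact revision has passed the token's revision. The
			// returned Status carries a fresh "inconsistent" continue
			// token that resumes from the same key at a newer revision.
			status, ok := err.(apierrors.APIStatus)
			if !ok {
				panic(err)
			}
			opts.Continue = status.Status().ListMeta.Continue
			continue
		}
		if err != nil {
			panic(err)
		}
		for _, item := range list.Items {
			// The conformance tests expect these in sorted key order.
			fmt.Println(item.Name)
		}
		if list.Continue == "" {
			break
		}
		opts.Continue = list.Continue
	}
}
```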

brandond (Contributor) commented May 7, 2024

  Timeline >>
 STEP: Creating a kubernetes client @ 05/06/24 23:54:45.579
 May  6 23:54:45.579: INFO: >>> kubeConfig: /tmp/kubeconfig-4089597224
 STEP: Building a namespace api object, basename watchlist @ 05/06/24 23:54:45.579
 STEP: Waiting for a default service account to be provisioned in namespace @ 05/06/24 23:54:45.604
 STEP: Waiting for kube-root-ca.crt to be provisioned in namespace @ 05/06/24 23:54:45.606
 STEP: Adding 5 secrets to watchlist-8617 namespace @ 05/06/24 23:54:45.608
 STEP: Starting the secret informer @ 05/06/24 23:54:45.641
 STEP: Waiting until the secret informer is fully synchronised @ 05/06/24 23:54:45.641
 W0506 23:54:45.642543      22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets)
 W0506 23:54:45.642578      22 reflector.go:539] runtime/asm_amd64.s:1650: failed to list *v1.Secret: unexpected list call
 E0506 23:54:45.642590      22 reflector.go:147] runtime/asm_amd64.s:1650: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected list call
 W0506 23:54:46.700065      22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets)
 W0506 23:54:46.700096      22 reflector.go:539] runtime/asm_amd64.s:1650: failed to list *v1.Secret: unexpected list call
 E0506 23:54:46.700108      22 reflector.go:147] runtime/asm_amd64.s:1650: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected list call
 W0506 23:54:49.055521      22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets)
 W0506 23:54:49.055553      22 reflector.go:539] runtime/asm_amd64.s:1650: failed to list *v1.Secret: unexpected list call
 E0506 23:54:49.055564      22 reflector.go:147] runtime/asm_amd64.s:1650: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected list call
 W0506 23:54:53.110684      22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets)
 W0506 23:54:53.110713      22 reflector.go:539] runtime/asm_amd64.s:1650: failed to list *v1.Secret: unexpected list call
 E0506 23:54:53.110726      22 reflector.go:147] runtime/asm_amd64.s:1650: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected list call
 W0506 23:55:03.003099      22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets)
 W0506 23:55:03.003129      22 reflector.go:539] runtime/asm_amd64.s:1650: failed to list *v1.Secret: unexpected list call
 E0506 23:55:03.003142      22 reflector.go:147] runtime/asm_amd64.s:1650: Failed to watch *v1.Secret: failed to list *v1.Secret: unexpected list call
 May  6 23:55:15.642: INFO: Unexpected error: Failed waiting for the secret informer in  namespace to be synced:
     <context.deadlineExceededError>:
     context deadline exceeded

         {}
 [FAILED] in [It] - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:80 @ 05/06/24 23:55:15.642

W0506 23:54:45.642543 22 reflector.go:337] The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = the server rejected our request due to an error in our request (get secrets)

An error is being bubbled up here, but I'm not sure if it's coming from the apiserver or kine. I think the code in this test is small enough that we could just run it directly, though. Ref:
https://github.com/kubernetes/kubernetes/blob/v1.29.4/test/e2e/apimachinery/watchlist.go#L56

EDIT: This FeatureGate is still Alpha and is disabled by default; I'm not sure why this test is being run by default. The server is sending back:

The ListOptions "" is invalid: sendInitialEvents: Forbidden: sendInitialEvents is forbidden for watch unless the WatchList feature gate is enabled

With the FeatureGate enabled, the test passes:

------------------------------
[sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] [sig-api-machinery, Serial, Feature:WatchList]
k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:39
  should be requested when ENABLE_CLIENT_GO_WATCH_LIST_ALPHA is set
  k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:41
  > Enter [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - set up framework | framework.go:200 @ 05/07/24 17:57:25.448
  STEP: Creating a kubernetes client - k8s.io/kubernetes/test/e2e/framework/framework.go:220 @ 05/07/24 17:57:25.449
  May  7 17:57:25.449: INFO: >>> kubeConfig: /home/brandond/.kube/k3s-server-1.yaml
  STEP: Building a namespace api object, basename watchlist - k8s.io/kubernetes/test/e2e/framework/framework.go:259 @ 05/07/24 17:57:25.45
  STEP: Waiting for a default service account to be provisioned in namespace - k8s.io/kubernetes/test/e2e/framework/framework.go:268 @ 05/07/24 17:57:25.459
  STEP: Waiting for kube-root-ca.crt to be provisioned in namespace - k8s.io/kubernetes/test/e2e/framework/framework.go:271 @ 05/07/24 17:57:25.461
  < Exit [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - set up framework | framework.go:200 @ 05/07/24 17:57:25.463 (15ms)
  > Enter [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/metrics/init/init.go:33 @ 05/07/24 17:57:25.463
  < Exit [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/metrics/init/init.go:33 @ 05/07/24 17:57:25.464 (0s)
  > Enter [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/node/init/init.go:33 @ 05/07/24 17:57:25.464
  < Exit [BeforeEach] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/node/init/init.go:33 @ 05/07/24 17:57:25.464 (0s)
  > Enter [It] should be requested when ENABLE_CLIENT_GO_WATCH_LIST_ALPHA is set - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:41 @ 05/07/24 17:57:25.464
  STEP: Adding 5 secrets to watchlist-6031 namespace - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:67 @ 05/07/24 17:57:25.464
  STEP: Starting the secret informer - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:73 @ 05/07/24 17:57:25.474
  STEP: Waiting until the secret informer is fully synchronised - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:76 @ 05/07/24 17:57:25.474
  STEP: Verifying if the secret informer was properly synchronised - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:82 @ 05/07/24 17:57:25.975
  STEP: Listing secrets directly from the server from watchlist-6031 namespace - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:96 @ 05/07/24 17:57:25.975
  STEP: Comparing secrets retrieved directly from the server with the ones that have been streamed to the secret informer - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:102 @ 05/07/24 17:57:25.978
  STEP: Modifying a secret and checking if the update was picked up by the secret informer - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:85 @ 05/07/24 17:57:25.979
  STEP: Listing secrets directly from the server from watchlist-6031 namespace - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:96 @ 05/07/24 17:57:25.984
  STEP: Comparing secrets retrieved directly from the server with the ones that have been streamed to the secret informer - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:102 @ 05/07/24 17:57:25.985
  < Exit [It] should be requested when ENABLE_CLIENT_GO_WATCH_LIST_ALPHA is set - k8s.io/kubernetes/test/e2e/apimachinery/watchlist.go:41 @ 05/07/24 17:57:25.987 (523ms)
  > Enter [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/node/init/init.go:34 @ 05/07/24 17:57:25.987
  May  7 17:57:25.987: INFO: Waiting up to 7m0s for all (but 0) nodes to be ready
  < Exit [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/node/init/init.go:34 @ 05/07/24 17:57:25.989 (2ms)
  > Enter [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/metrics/init/init.go:35 @ 05/07/24 17:57:25.989
  < Exit [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - k8s.io/kubernetes/test/e2e/framework/metrics/init/init.go:35 @ 05/07/24 17:57:25.989 (0s)
  > Enter [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - dump namespaces | framework.go:218 @ 05/07/24 17:57:25.989
  < Exit [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - dump namespaces | framework.go:218 @ 05/07/24 17:57:25.989 (0s)
  > Enter [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - tear down framework | framework.go:215 @ 05/07/24 17:57:25.989
  STEP: Destroying namespace "watchlist-6031" for this suite. - k8s.io/kubernetes/test/e2e/framework/framework.go:360 @ 05/07/24 17:57:25.989
  < Exit [DeferCleanup (Each)] [sig-api-machinery] API Streaming (aka. WatchList) [Serial] [Feature:WatchList] - tear down framework | framework.go:215 @ 05/07/24 17:57:25.992 (3ms)
• [0.544 seconds]
------------------------------
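For anyone reproducing this, here is a minimal sketch of the streaming watch-list request the informer issues when ENABLE_CLIENT_GO_WATCH_LIST_ALPHA is set (the kubeconfig path and namespace are placeholders). Without WatchList=true in the apiserver's feature gates, the Watch call fails with the sendInitialEvents error quoted above:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/rancher/k3s/k3s.yaml")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A streaming "watch list": the server replays the current state as
	// ADDED events, sends a bookmark, then keeps watching. Requires the
	// WatchList feature gate on the apiserver (on k3s, e.g.
	// --kube-apiserver-arg=feature-gates=WatchList=true).
	sendInitial := true
	opts := metav1.ListOptions{
		Watch:                true,
		SendInitialEvents:    &sendInitial,
		AllowWatchBookmarks:  true,
		ResourceVersionMatch: metav1.ResourceVersionMatchNotOlderThan,
	}
	w, err := client.CoreV1().Secrets("default").Watch(context.TODO(), opts)
	if err != nil {
		// Without the gate: "sendInitialEvents is forbidden for watch
		// unless the WatchList feature gate is enabled"
		panic(err)
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		fmt.Println(ev.Type)
	}
}
```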

brandond (Contributor) commented May 8, 2024

All of the failing tests should be addressed by the linked kine PR.

Note: any tests with Feature:x in the name require the named feature gate to be enabled on the apiserver; most of these are off by default, and the test failures are expected in that case.
