ui updates seem to be too slow #8835

Closed
rgl opened this issue Mar 25, 2024 · 16 comments · Fixed by #8900
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@rgl

rgl commented Mar 25, 2024

What happened?

Updating any resource takes too long (> 1s), which is substantially higher than the apparently equivalent kubectl command.

What did you expect to happen?

Expected to see deployments displayed in roughly the same amount of time as kubectl get deployments -A.

How can we reproduce it (as minimally and precisely as possible)?

Observe the time taken with kubectl, 0.068s:

$ time kubectl get deployments -A
NAMESPACE            NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
cert-manager         cert-manager                                          1/1     1            1           9h
cert-manager         cert-manager-cainjector                               1/1     1            1           9h
cert-manager         cert-manager-webhook                                  1/1     1            1           9h
kube-dashboard       kube-dashboard-kong                                   1/1     1            1           9h
kube-dashboard       kube-dashboard-kubernetes-dashboard-api               1/1     1            1           9h
kube-dashboard       kube-dashboard-kubernetes-dashboard-auth              1/1     1            1           9h
kube-dashboard       kube-dashboard-kubernetes-dashboard-metrics-scraper   1/1     1            1           9h
kube-dashboard       kube-dashboard-kubernetes-dashboard-web               1/1     1            1           9h
kube-external-dns    kube-external-dns                                     1/1     1            1           9h
kube-external-dns    pdns                                                  1/1     1            1           9h
kube-metallb         kube-metallb-controller                               1/1     1            1           9h
kube-system          coredns                                               1/1     1            1           9h
kube-traefik         kube-traefik                                          1/1     1            1           9h
local-path-storage   local-path-provisioner                                1/1     1            1           9h
pgadmin              pgadmin-pgadmin4                                      1/1     1            1           9h

real	0m0,068s
user	0m0,065s
sys	0m0,016s

Getting and displaying the entire YAML, 0.140s:

$ time kubectl get deployments -A -o yaml
...
real	0m0,140s
user	0m0,112s
sys	0m0,036s

Observe the time taken with the browser, 1.2s:

[screenshot of browser request timing (~1.2s)]

Anything else we need to know?

This was tested in a kind cluster with the Traefik ingress controller, sending data to Kong over HTTP (without TLS), and with all resource limits lifted (also note that modifying the API replica count does not seem to make much difference):

- name: Install kubernetes-dashboard
  kubernetes.core.helm:
    name: kube-dashboard
    chart_ref: kubernetes-dashboard/kubernetes-dashboard
    chart_version: '{{ kind_kubernetes_dashboard_chart_version }}'
    release_namespace: kube-dashboard
    create_namespace: true
    update_repo_cache: true
    values:
      kong:
        proxy:
          http:
            enabled: true
      app:
        settings:
          global:
            logsAutoRefreshTimeInterval: 0
            resourceAutoRefreshTimeInterval: 30
      api:
        scaling:
          replicas: 1
        containers:
          resources:
            requests:
              cpu: 0
              memory: 0
            limits:
              cpu: 0
              memory: 0
      auth:
        containers:
          resources:
            requests:
              cpu: 0
              memory: 0
            limits:
              cpu: 0
              memory: 0
      web:
        containers:
          resources:
            requests:
              cpu: 0
              memory: 0
            limits:
              cpu: 0
              memory: 0
      metricsScraper:
        containers:
          resources:
            requests:
              cpu: 0
              memory: 0
            limits:
              cpu: 0
              memory: 0

The entire Ansible playbook is at:

https://github.com/rgl/my-ubuntu-ansible-playbooks/tree/upgrade-kubernetes-dashboard

Have a look at the last commit in that branch to see just the kubernetes-dashboard changes.

What browsers are you seeing the problem on?

No response

Kubernetes Dashboard version

7.1.2

Kubernetes version

1.29.2

Dev environment

No response

@rgl rgl added the kind/bug Categorizes issue or PR as related to a bug. label Mar 25, 2024
@floreks
Member

floreks commented Mar 28, 2024

Could you try disabling metrics and checking if it improves anything? Pass the --metrics-provider=none arg to the API. That's the only thing I can think of that could be a bottleneck here.

@rgl
Author

rgl commented Mar 31, 2024

@floreks ah, that did the trick! Now it's pretty fast!

@floreks
Member

floreks commented Mar 31, 2024

I will have to investigate that at some point. There were no real changes to metrics gathering. Maybe there is an issue with metrics server responsiveness.

@rgl
Author

rgl commented Mar 31, 2024

Hmm, I do not have metrics-server installed in my kind cluster. Without metrics-server, is this expected to be slow?

If so, maybe the FAQ should make it more explicit?

The chart values.yaml comments seem to be more explicit. Maybe put that in the FAQ?

@sushain97

sushain97 commented Apr 7, 2024

👋 I'm experiencing a similar issue after upgrading from a much earlier version. I added:

api:
  containers:
    args:
      - --metrics-provider=none

and things are substantially better on most pages.

However, some pages still struggle to load quickly (especially the Workloads page), and I have fewer than 150 pods.

I'm running k3s with the built-in metrics-server.

Some request timings:

[screenshots of request timings]

Edit: eventually, things got super slow again after I clicked around a bunch. Then I restarted the API pod and things got snappy again...

@floreks
Member

floreks commented Apr 7, 2024

If you start clicking too much and spamming the API server with requests, throttling will kick in and significantly slow down your responses. Restarting the API server can 'reset' the throttling, and it will work faster. Normal use should be OK.
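
For context, the client-side throttling that shows up later in this thread comes from client-go's token-bucket rate limiter: when a rest.Config leaves QPS and Burst at zero, the created REST client falls back to defaults of roughly 5 requests per second with a burst of 10, so a burst of list and metrics calls quickly starts queueing. A small standalone Go illustration of that behaviour (not dashboard code):

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// Roughly the client-go defaults used when rest.Config.QPS/Burst are zero:
	// 5 requests per second with a burst of 10.
	limiter := flowcontrol.NewTokenBucketRateLimiter(5, 10)

	start := time.Now()
	for i := 0; i < 30; i++ {
		limiter.Accept() // blocks once the burst is exhausted
	}
	// 30 "requests" at 5 QPS with a burst of 10 take on the order of 4 seconds.
	fmt.Printf("30 requests took %s\n", time.Since(start))
}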

@sushain97

> If you start clicking too much and spamming the API server with requests, throttling will kick in and significantly slow down your responses. Restarting the API server can 'reset' the throttling, and it will work faster. Normal use should be OK.

Hmm, I'm still a bit surprised that I can cause throttling by human-scale clicking around. To be clear, I wasn't trying to stress the system, just view different panels in the UI :)

Here's how the requests from /#/workloads?namespace=_all look after ~6 hours of not accessing the dashboard at all:

[screenshot of request timings]

There aren't any timeouts but this is still really slow, right?

@floreks
Member

floreks commented Apr 8, 2024

That is definitely unexpected. What device are you using for your k3s installation?

@sushain97

sushain97 commented Apr 9, 2024

> That is definitely unexpected. What device are you using for your k3s installation?

4 cores of an AMD EPYC 7371.

Some quick benchmarks:

sushain@vesuvianite ~ ❯❯❯ hyperfine 'kubectl get pods -A'           18:20:19
Benchmark 1: kubectl get pods -A
  Time (mean ± σ):     220.6 ms ±   4.1 ms    [User: 207.0 ms, System: 71.1 ms]
  Range (min … max):   214.3 ms … 227.9 ms    13 runs

sushain@vesuvianite ~ ❯❯❯ hyperfine 'kubectl describe pods -A'      18:20:29
Benchmark 1: kubectl describe pods -A
  Time (mean ± σ):      1.231 s ±  0.031 s    [User: 0.533 s, System: 0.123 s]
  Range (min … max):    1.188 s …  1.294 s    10 runs

sushain@vesuvianite ~ ❯❯❯ hyperfine 'kubectl get deployments -A'    18:20:43
Benchmark 1: kubectl get deployments -A
  Time (mean ± σ):     177.5 ms ±   7.1 ms    [User: 175.0 ms, System: 58.1 ms]
  Range (min … max):   169.8 ms … 195.3 ms    16 runs

sushain@vesuvianite ~ ❯❯❯ hyperfine 'kubectl describe deployments -A'
Benchmark 1: kubectl describe deployments -A
  Time (mean ± σ):      1.021 s ±  0.032 s    [User: 0.426 s, System: 0.121 s]
  Range (min … max):    0.980 s …  1.097 s    10 runs

So I guess my timings in the UI aren't that much slower if it's calling the equivalent of kubectl describe...

@floreks
Member

floreks commented Apr 9, 2024

We also can't directly compare kubectl to the UI, as we have to make more calls than kubectl to get some extra information and apply additional logic such as server-side pagination, sorting, and filtering. It will always be slower.
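
To make the "additional logic" part concrete: the API layer has to hold the full resource list in memory and then filter, sort, and slice a page out of it before responding. A simplified, hypothetical sketch of that shape (not the dashboard's actual types or code):

package main

import (
	"fmt"
	"sort"
	"strings"
)

// deployment is a stand-in for the summary objects the API returns.
type deployment struct {
	Namespace string
	Name      string
}

// listPage filters by a name substring, sorts by namespace/name, and returns
// one page of results, roughly the shape of server-side pagination.
func listPage(items []deployment, nameFilter string, page, pageSize int) []deployment {
	filtered := make([]deployment, 0, len(items))
	for _, d := range items {
		if strings.Contains(d.Name, nameFilter) {
			filtered = append(filtered, d)
		}
	}
	sort.Slice(filtered, func(i, j int) bool {
		if filtered[i].Namespace != filtered[j].Namespace {
			return filtered[i].Namespace < filtered[j].Namespace
		}
		return filtered[i].Name < filtered[j].Name
	})
	start := page * pageSize
	if start >= len(filtered) {
		return nil
	}
	end := start + pageSize
	if end > len(filtered) {
		end = len(filtered)
	}
	return filtered[start:end]
}

func main() {
	items := []deployment{
		{"kube-system", "coredns"},
		{"cert-manager", "cert-manager"},
		{"cert-manager", "cert-manager-webhook"},
	}
	fmt.Println(listPage(items, "cert", 0, 10))
}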

@sushain97

sushain97 commented Apr 9, 2024

> We also can't directly compare kubectl to the UI, as we have to make more calls than kubectl to get some extra information and apply additional logic such as server-side pagination, sorting, and filtering. It will always be slower.

Yep, that makes sense. FWIW I jumped from docker.io/kubernetesui/dashboard-api:v1.0.0 to docker.io/kubernetesui/dashboard-api:1.4.1 so there might be a bunch of changes... maybe I'll try bisecting through the Helm chart versions at some point.

@floreks
Member

floreks commented Apr 9, 2024

@sushain97 I have been further debugging the performance issue and pinned it down exactly. Add the --sidecar-host arg to the dashboard API deployment. Example: --sidecar-host=kubernetes-dashboard-metrics-scraper.dashboard, where kubernetes-dashboard-metrics-scraper is the metrics-scraper service name and dashboard is the namespace where Dashboard is deployed.

I honestly have no idea what is causing the in-cluster service proxy to be so slow compared to accessing the metrics scraper directly with an HTTP client. I don't see anything that changed there recently.
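
For reference, the two access paths look roughly like this in Go (a sketch only; the service name, namespace, port 8000, and the illustrative path are assumptions based on common defaults, not the dashboard's actual code):

package main

import (
	"context"
	"fmt"
	"net/http"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	ctx := context.Background()

	// Default path: go through the Kubernetes API server's service proxy.
	// This uses the in-cluster rest client, so it is subject to that client's
	// client-side rate limiting.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	proxied, err := clientset.CoreV1().
		Services("kubernetes-dashboard").
		ProxyGet("http", "kubernetes-dashboard-metrics-scraper", "", "/api/v1/dashboard/", nil).
		DoRaw(ctx)
	fmt.Println(len(proxied), err)

	// --sidecar-host path: talk to the scraper service directly over cluster
	// DNS with a plain HTTP client, bypassing the API server proxy entirely.
	// Port 8000 and the path are assumptions for illustration.
	httpClient := &http.Client{Timeout: 5 * time.Second}
	resp, err := httpClient.Get("http://kubernetes-dashboard-metrics-scraper.kubernetes-dashboard:8000/api/v1/dashboard/")
	if err == nil {
		resp.Body.Close()
	}
	fmt.Println(err)
}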

@sushain97

Hm, it doesn't feel too different to me:

[screenshot of request timings]

Here's what I have:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  repo: https://kubernetes.github.io/dashboard/
  chart: kubernetes-dashboard
  targetNamespace: kubernetes-dashboard
  version: 7.2.0
  valuesContent: |-
    app:
      scheduling:
        nodeSelector:
            kubernetes.io/hostname: kube.local.skc.name
    # https://github.com/kubernetes/dashboard/issues/8835
    api:
      containers:
        args:
          - --metrics-provider=none
          - --sidecar-host=kubernetes-dashboard-metrics-scraper.kubernetes-dashboard
    kong:
      proxy:
        http:
          enabled: true

@bnabholz

I encountered a similar thing once I upgraded to the newer versions of kubernetes-dashboard (lots of requests timing out). The API server logs showed client-side throttling in effect:

2024/04/10 04:24:53 Getting list of namespaces
2024/04/10 04:24:54 Getting list of all jobs in the cluster
2024/04/10 04:24:55 Getting list of all pods in the cluster
I0410 04:24:56.623578       1 request.go:697] Waited for 1.199406392s due to client-side throttling, not priority and fairness, request: GET:https://10.152.183.1:443/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/kube-system/pod-list/kube-multus-ds-5cc2s,kubed-57f78db5b6-2hvch,external-dns-54fdb56c7-llcp8,kube-api-proxy-5769f97cdf-mkhgz,kube-proxy-m5l4b,kube-scheduler-swerver,kube-controller-manager-swerver,kube-apiserver-swerver,etcd-swerver,coredns-76f75df574-wt796,coredns-76f75df574-q99kd,openebs-lvm-controller-0,openebs-lvm-node-w5vhp,calico-node-sngft,smarter-device-manager-gs99r,metrics-server-85bc948865-b7xrv,calico-kube-controllers-9d77f677d-m84kv/metrics/cpu/usage_rate
2024/04/10 04:25:01 Getting pod metrics
2024/04/10 04:25:03 Getting list of namespaces
2024/04/10 04:25:04 Getting list of all pods in the cluster
I0410 04:25:06.823448       1 request.go:697] Waited for 2.783877882s due to client-side throttling, not priority and fairness, request: GET:https://10.152.183.1:443/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/kubernetes-dashboard/pod-list/kubernetes-dashboard-api-774cf68885-sqdbw,kubernetes-dashboard-web-5b8d87bf85-n2smh,kubernetes-dashboard-auth-6cf78cdd47-5qb2h,kubernetes-dashboard-kong-6cf54d7fcf-74ltv,kubernetes-dashboard-metrics-scraper-9758854f6-gpzlb,kubernetes-dashboard-proxy-5c7cd7d76c-dxdw9/metrics/memory/usage
2024/04/10 04:25:13 Getting list of namespaces
2024/04/10 04:25:14 Getting pod metrics
2024/04/10 04:25:14 Getting list of all pods in the cluster
I0410 04:25:16.824112       1 request.go:697] Waited for 1.992075126s due to client-side throttling, not priority and fairness, request: GET:https://10.152.183.1:443/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/kubernetes-dashboard/pod-list/kubernetes-dashboard-api-774cf68885-sqdbw,kubernetes-dashboard-web-5b8d87bf85-n2smh,kubernetes-dashboard-auth-6cf78cdd47-5qb2h,kubernetes-dashboard-kong-6cf54d7fcf-74ltv,kubernetes-dashboard-metrics-scraper-9758854f6-gpzlb,kubernetes-dashboard-proxy-5c7cd7d76c-dxdw9/metrics/cpu/usage_rate
2024/04/10 04:25:23 Getting list of namespaces
2024/04/10 04:25:24 Getting list of all pods in the cluster
I0410 04:25:27.023139       1 request.go:697] Waited for 5.387414496s due to client-side throttling, not priority and fairness, request: GET:https://10.152.183.1:443/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/guacamole/pod-list/postgres-77659c7c89-2n55q,guacamole-64b6c4c56-vxz9d,oauth2-proxy-64968f4c7f-df6cn/metrics/memory/usage
2024/04/10 04:25:28 Getting pod metrics
2024/04/10 04:25:33 Getting list of namespaces
2024/04/10 04:25:34 Getting list of all pods in the cluster
I0410 04:25:37.023852       1 request.go:697] Waited for 4.792209339s due to client-side throttling, not priority and fairness, request: GET:https://10.152.183.1:443/api/v1/namespaces/kubernetes-dashboard/services/kubernetes-dashboard-metrics-scraper/proxy/api/v1/dashboard/namespaces/kubevirt/pod-list/virt-handler-cgnwr,virt-api-54666f869-q8sg9,virt-controller-c67776ccb-949z4,virt-operator-67d55bb884-rwmjs,virtvnc-65986fb5d7-6ghmr,virt-operator-67d55bb884-nl2k8,kvpanel-d46bb99dd-7ss6d,virt-controller-c67776ccb-82pjg/metrics/memory/usage

Setting --metrics-provider=none does seem to help:

2024/04/10 04:28:26 Getting list of namespaces
2024/04/10 04:28:36 Getting list of namespaces
2024/04/10 04:28:46 Getting list of namespaces
2024/04/10 04:28:52 Getting list of all pods in the cluster
2024/04/10 04:28:53 Getting pod metrics
2024/04/10 04:28:56 Getting list of namespaces
2024/04/10 04:29:02 Getting list of all pods in the cluster
2024/04/10 04:29:03 Getting pod metrics
2024/04/10 04:29:06 Getting list of namespaces
2024/04/10 04:29:06 Getting list of all pods in the cluster
2024/04/10 04:29:06 Getting pod metrics
2024/04/10 04:29:09 Getting list of all deployments in the cluster
2024/04/10 04:29:12 Getting list of all pods in the cluster
2024/04/10 04:29:12 Getting pod metrics
2024/04/10 04:29:16 Getting list of namespaces
2024/04/10 04:29:22 Getting list of all pods in the cluster
2024/04/10 04:29:22 Getting pod metrics
2024/04/10 04:29:26 Getting list of namespaces

...but that wasn't the first thing that I tried because I wanted to keep metrics.

What I found was that if the in-cluster config is being used (the common case?), then rest.InClusterConfig() is being immediately returned and the default request limits at config.QPS = DefaultQPS aren't being applied. I think that buildBaseConfig needs to fetch its config from whatever source it can, but then also apply its default settings on top of that, specifically the queries per second limit.

Below is a compare of what I ended up using for my own use case, but I feel like I could clean it up as far as pointer usage goes; happy for any advice.

master...bnabholz:kubernetes-dashboard:fixes/qps
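
A minimal sketch of that idea, build the config from whichever source is available and then always apply the defaults on top; the DefaultQPS/DefaultBurst values below are placeholders, not the dashboard's real ones:

package client

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

const (
	DefaultQPS   = 100 // placeholder value
	DefaultBurst = 200 // placeholder value
)

func buildBaseConfig(kubeconfig string) (*rest.Config, error) {
	config, err := rest.InClusterConfig()
	if err != nil {
		// Fall back to a kubeconfig file when not running in-cluster.
		config, err = clientcmd.BuildConfigFromFlags("", kubeconfig)
		if err != nil {
			return nil, err
		}
	}

	// Apply the defaults regardless of where the config came from, so the
	// in-cluster path is not left with client-go's low default QPS/burst.
	config.QPS = DefaultQPS
	config.Burst = DefaultBurst
	return config, nil
}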

@floreks
Member

floreks commented Apr 10, 2024

> Hm, it doesn't feel too different to me: […] Here's what I have: […]

You could actually re-enable metrics with that sidecar host change. If that doesn't help, then it might be your machine. When I was testing locally on my kind cluster, response times went down from 1-3 seconds to 100 ms on average for every view with all namespaces selected.

@floreks
Member

floreks commented Apr 10, 2024

> I encountered a similar thing once I upgraded to the newer versions of kubernetes-dashboard (lots of requests timing out). […] I think that buildBaseConfig needs to fetch its config from whatever source it can, but then also apply its default settings on top of that, specifically the queries per second limit. […]

Yes, I have pinned it down to the in-cluster client too, but I actually ended up using a fake rate limiter, as e.g. the internal REST client derived from the client was also overriding some configuration for me. I will create a PR with a bunch of changes, including this fix, a bit later today.

Thanks for your help anyway!
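
A sketch of that fake rate limiter approach using client-go's flowcontrol package (not necessarily the exact change that landed in the linked PR):

package client

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/util/flowcontrol"
)

// withoutClientSideThrottling disables client-go's client-side rate limiting
// on the given config, so REST clients derived from it cannot reintroduce the
// default QPS/burst limits.
func withoutClientSideThrottling(config *rest.Config) *rest.Config {
	config.RateLimiter = flowcontrol.NewFakeAlwaysRateLimiter()
	return config
}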
