
High CPU usage (32 vCPUs) - looks due to targets discovery in K8s #8014

Closed
lorenzo-biava opened this issue Oct 6, 2020 · 69 comments

@lorenzo-biava

lorenzo-biava commented Oct 6, 2020

What did you do?

We're monitoring a Kubernetes cluster of about 400 nodes and 4500 pods with a single Prometheus instance with 32 vCPUs (almost fully utilized, while memory hovers between 40 and 50 Gi). The setup uses the Prometheus Operator and most of the targets come from ServiceMonitor definitions (that shouldn't be too relevant for the issue though).
There are about 130 target pools, a few of which each resolve to a few hundred pods to scrape (a handful can have a couple of thousand pods).
Judging by the CPU profiling graph, it looks like most of the CPU is used to update those target pools.
pprof.prometheus.samples.cpu.005.pb.gz

prof-prod1c-20201001

EDIT: We're experiencing the same issue in another cluster with far fewer total Pods (~1500) but far more target pools (~450).

What did you expect to see?

Not exactly sure what overall CPU usage to expect for such a load, but definitely not >60% of 32 vCPUs for target discovery alone.

In case this usage is expected (and provided it is indeed coming from target discovery), I would expect to be able to set a custom interval for target updates to tune this behavior, or some other way to reduce the CPU footprint.

What did you see instead? Under which circumstances?

32 vCPUs (almost fully utilized), >60% of which seems to be related to targets discovery.

I see about 80 such pools taking more than 5 seconds to get synced (varying between 4 and 8 seconds).
If my understanding is correct, the sync is executed every 5 seconds (ticker := time.NewTicker(5 * time.Second)).

Environment

  • System information:

Linux 4.15.0-1093-azure x86_64

  • Prometheus version:

prometheus, version 2.20.1 (branch: HEAD, revision: 983ebb4)
build user: root@7cbd4d1c15e0
build date: 20200805-17:26:58
go version: go1.14.6

  • Prometheus configuration file:
global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s
  external_labels:
    cluster: prod1c
    prometheus: monitoring/prometheus-operator-prometheus
    prometheus_replica: prometheus-prometheus-operator-prometheus-0
alerting:
  alert_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring
    scheme: http
    path_prefix: /
    timeout: 10s
    api_version: v1
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: prometheus-operator-alertmanager
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: web
      replacement: $1
      action: keep
rule_files:
- /etc/prometheus/rules/prometheus-prometheus-operator-prometheus-rulefiles-0/*.yaml
- /etc/prometheus/rules/prometheus-prometheus-operator-prometheus-rulefiles-1/*.yaml
scrape_configs:
- job_name: asraas-prod/ambassador-asraas-prod/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - asraas-prod
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_service]
    separator: ;
    regex: ambassador-admin
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: ambassador-admin
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: ambassador-admin
    action: replace
- job_name: asraas-prod/bofa-eng-usa-400-krypton/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - asraas-prod
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: krypton
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_release]
    separator: ;
    regex: bofa-eng-usa-400
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: kr-svc-http
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: kr-svc-http
    action: replace
- job_name: asraas-prod/bofa-eng-usa-400-krypton/1
  honor_timestamps: true
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - asraas-prod
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: krypton
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_release]
    separator: ;
    regex: bofa-eng-usa-400
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: kr-fluentd-metrics-port
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: kr-fluentd-metrics-port
    action: replace
[...]

PS: I can provide the full configuration if that's helpful, though it's quite a bit longer.

@lorenzo-biava
Author

Removing a few jobs that weren't scraping any targets (~20) did actually reduce the CPU load significantly (~3 vCPU)
image

@roidelapluie
Member

roidelapluie commented Oct 8, 2020

Can we have the startup logs (if possible loglevel debug)?

@lorenzo-biava
Author

Here's the log from the past few days, including startup (I'm seeing a lot of K8s API errors, which I believe are related to transient instability of that managed cluster).
prod1c-prom-logs.zip

This is a fresh start with debug level.
prod1c-prom-logs-debug.zip

@roidelapluie
Member

roidelapluie commented Oct 8, 2020

You are opening 28 different Kubernetes connections. You could try to reuse the exact same service discovery config and apply relabel_configs per job to select which targets to monitor.

cat prod1c-prom-logs-debug.txt |grep subs |grep kuber -c
28

@roidelapluie
Member

Thanks for the logs anyway, I will try to have a look at the code as well.

@lorenzo-biava
Author

@roidelapluie thanks for the suggestion; can you point me to an example of such a config? I then need to check whether that's possible via PromOp ServiceMonitors or something similar...
I know the overall config could be way more efficient, but we're offering the Prometheus environment to multiple teams, each of which should be able to deploy the configuration it needs.

@brian-brazil
Contributor

Basically if the bit under kubernetes_sd_configs: is identical then we'll only instantiate one k8s discovery.
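For illustration, a minimal sketch of what that could look like (job names, namespaces and label values here are hypothetical): both jobs share an identical kubernetes_sd_configs section with no namespace restriction, so only one Kubernetes discovery is instantiated, and each job then keeps only its own targets via relabel_configs.

scrape_configs:
- job_name: team-a/app-a
  kubernetes_sd_configs:
  - role: endpoints                 # identical SD section across jobs -> one shared discovery
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace]
    regex: team-a-prod
    action: keep
  - source_labels: [__meta_kubernetes_service_label_app]
    regex: app-a
    action: keep
- job_name: team-b/app-b
  kubernetes_sd_configs:
  - role: endpoints                 # same section again
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace]
    regex: team-b-prod
    action: keep
  - source_labels: [__meta_kubernetes_service_label_app]
    regex: app-b
    action: keep

The trade-off, as the rest of this thread discusses, is that every job now relabels the full cluster-wide target list instead of a per-namespace subset.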

@lorenzo-biava
Author

So basically I need to get rid of the namespace in the kubernetes_sd_configs and add a relabel_configs to filter the namespace.

I believe that's not (currently) possible with ServiceMonitors, not even with the more general matchExpressions (which is limited to the Service's labels:
https://github.com/prometheus-operator/prometheus-operator/blob/v0.42.1/pkg/prometheus/promcfg.go#L911).

I'll ask around and possibly reach out to PromOp to see if there are ways to do that. However, having the namespace filter in the kubernetes_sd_configs seemed more logical, and I was under the impression it would also be more efficient (not having to filter among all of the cluster's pods).
I still hope you can find some other optimizations in the discovery code or Prometheus-wide settings to leverage 😃

@brian-brazil
Contributor

What's more efficient would depend on how many namespaces there are and what proportion of them this Prometheus is interested in. I'd expect specifying a namespace to always be more efficient, though, so if this is the issue it smells like a performance problem in the k8s client code somewhere, as the namespace filtering isn't happening properly on the server side.

@djsly

djsly commented Oct 8, 2020

@brian-brazil we have

❯ k get ns --no-headers | wc -l
     112

and

❯ k get servicemonitor -A --no-headers  | awk '{ print $1}' | sort | uniq -c | sort -r
 118 asraas-stage
  87 asraas-qa
  54 gatekeeper-qa
  29 gatekeeper-dev
  24 mix-dev
  23 monitoring
  23 mix-stage
  23 mix-qa
  18 nluaas-stage
  16 gatekeeper2-dev
  16 asraas-dev
  15 nluaas-qa
   9 gatekeeper-stage
   8 fabric-stage
   8 fabric-qa
   8 fabric-dev
   8 fabric-ctp-dev
   7 fabric-ctp-qa
   6 ttsaas-dev
   6 nluaas-ctp-dev
   6 dlgaas-dev
   5 nluaas-dev
   4 ttsaas-stage
   4 svc-ps-dev
   4 nluaas-ctp-qa
   4 global-auth-stage
   4 global-auth-qa
   4 global-auth-dev
   4 global-auth-ctp-qa
   4 global-auth-ctp-dev
   4 dlgaas-qa
   3 ttsaas-qa
   3 mixidp-auth-dev
   3 media-manager-dev
   3 dlgaas-stage
   2 mixidp-auth-stage
   2 mixidp-auth-qa
   1 xarch-dev
   1 ingress-controllers
   1 cert-manager

@djsly

djsly commented Oct 24, 2020

Any insights here would be helpful. We are getting to 500 nodes with >2000 pods and right now we would need >32 CPUs for Prometheus.

@chancez

chancez commented Nov 7, 2020

What's more efficient would depend on how many namespaces there are and what proportion of them this Prometheus is interested in. I'd expect specifying a namespace to always be more efficient, though, so if this is the issue it smells like a performance problem in the k8s client code somewhere, as the namespace filtering isn't happening properly on the server side.

There is no namespace-level filtering on the server side that I'm aware of. There are two ways you can implement it: one 'watch all namespaces' watcher with client-side filtering, or one watch per namespace (with HTTP/2 that can still be a single connection if you reuse the client, I believe).
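In Prometheus configuration terms, those two options roughly correspond to the following shapes (namespace names are hypothetical):

# Option 1: one cluster-wide watch, with targets filtered later via relabel_configs
kubernetes_sd_configs:
- role: endpoints

# Option 2: one watch per listed namespace
kubernetes_sd_configs:
- role: endpoints
  namespaces:
    names:
    - team-a-prod
    - team-b-prod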

@lorenzo-biava
Author

lorenzo-biava commented Nov 12, 2020

So we managed to merge together a lot of ServiceMonitors. Even though they were all scraping the same namespace, we got an impressive reduction in CPU usage (to less than half; see below).

I think this indicates that the number of connections to Kubernetes (which should be one per namespace in this scenario, as suggested previously) might not be the primary contributor, whereas the sheer number of jobs that use K8s service discovery is.

Just to reiterate: is my assumption correct that the service discovery for each target pool runs every 5 seconds (see here)?

image

@brian-brazil
Contributor

Just to reiterate: is my assumption correct that the service discovery for each target pool runs every 5 seconds (see here)?

No, that's a throttle so we don't process updates from an SD more often than every 5s. Processing updates is considered to be cheap, but not that cheap.

@w4rgrum

w4rgrum commented Nov 24, 2020

We are currently facing the same issue on our Prometheus instances: we have ~30 kube SD jobs (pod role) that are not constrained by namespaces, there are a lot of pods running on the platform (~5K), and the CPU usage for those instances is abnormally high.
Having a look at the discovery page of the UI, we can see that most of the jobs keep fewer than 100 targets out of ~28K each.
To mitigate this we added selectors to some of the kube SD jobs, which helped reduce the CPU usage by half. However, this is not perfect, since when a selector matches nothing you see 0/28K in the discovery page, which is weird (I would have expected something like 0/0).
Our impression is that the fewer targets discovered as a result of the kube calls, the less CPU is consumed in the end, meaning the consumption might not be linked to the kube calls themselves but to what is done afterwards (relabeling and such?).
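For reference, adding selectors to a pod-role SD job looks roughly like this (the label and field values are hypothetical); such selectors are passed to the Kubernetes API as label/field selectors, so the matching happens before the targets reach Prometheus's relabelling:

kubernetes_sd_configs:
- role: pod
  selectors:
  - role: pod
    label: "app.kubernetes.io/part-of=my-platform"   # hypothetical label selector
    field: "status.phase=Running"                     # optional field selector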

NB: we also noticed that the rate on prometheus_target_sync_length_seconds_sum dropped after adding the selectors.
sum(rate(prometheus_target_sync_length_seconds_sum{<filters>}[2m]))
image

This is a sum but if you get the details for all the scrape jobs, the jobs with the new selector drop to ~0 and the other jobs see their rate dropping.
rate(prometheus_target_sync_length_seconds_sum{<filters>}[2m])
image

@brian-brazil
Contributor

Can you get a CPU profile?

@w4rgrum

w4rgrum commented Nov 24, 2020

CPU profile with the selectors
profile_with_selectors.zip
profile_with_selectors

CPU profile without selectors
profile.zip
profile

@brian-brazil
Contributor

Looks like it's that part of the code alright. It's not something we've ever really optimised, as it's not meant to run particularly often. k8s sending 28k targets for each of 30 scrape configs every 5 seconds wasn't exactly the use case we had in mind. There's probably some low-hanging fruit here.

@vitkovskii

vitkovskii commented Dec 2, 2020

@brian-brazil Hey there! What if we added a label filter to the k8s discovery plugin as a quick win? In our case discovery produces ~120 labels per target, but we use only 5 of them. Yes, there is a relabel config, but the main performance cost is allocating memory for all those labels (30,000 targets × 120 labels ≈ 3.6 million labels), sorting them, hashing them for deduplication, and then sending them to relabeling. This long and heavy work happens on each scrape pool sync. What do you think?

@brian-brazil
Contributor

That sounds like slowly reinventing relabelling, so would ultimately end up with the same performance costs. We have to allocate the labels one way or the other.

I'd suggest looking at ways to make what we have more efficient, rather than immediately jumping to adding yet more configuration for users. There's likely quite a bit of low-hanging fruit performance-wise, as the relevant code paths have never really been optimised.

@d-ulyanov

The option from @vitkovskii sounds really reasonable. Unfortunately, we're just wasting CPU and memory on useless work here, and I have no idea what to do next because the number of targets in our K8s cluster is growing really rapidly. It seems that Prometheus is becoming unscalable here. Let's find an engineering solution, colleagues.

@roidelapluie
Member

cc @simonpasquier @brancz

@vitkovskii

Actually, this hard work is done per service monitor, so if you have 30 of them you do 30x the useless work. Let's count: 120 labels is about 6 KB per target; with 30,000 targets and 30 service monitors, that's about 5.5 GB of RAM just to create the raw label sets from discovery.

@roidelapluie
Member

Do you have concrete examples of how we could avoid that? Are those e.g. underlying labels you could delete from your pods?

@brian-brazil
Contributor

Let's count: 120 labels is about 6 KB per target; with 30,000 targets and 30 service monitors, that's about 5.5 GB of RAM just to create the raw label sets from discovery.

That'd only be the case if none of them were dropped, which is unlikely. We should only be keeping dropped targets once, not 30x - if that's not the case it sounds like low-hanging fruit that could be tackled.

@lorenzo-biava
Author

Actually, this hard work is done per service monitor, so if you have 30 of them you do 30x the useless work.

It's definitely amplified by the number of ServiceMonitors. We replaced tens of SMs with a single one (for a particular application), while the overall pods/labels stayed the same (or even slightly increased), and CPU usage dropped to less than 50% of what it was before (see #8014 (comment)).

@brian-brazil
Contributor

and CPU usage dropped to less than 50% of what it was before

I'm confused here. #8014 (comment) is talking about RAM, and now you're talking about CPU. Which is the problem?

@lorenzo-biava
Author

For us it was definitely CPU (as per the title of the issue). Not sure about RAM, haven't seen any correlation with a change in memory usage yet.
PS: All the provided profiles are for CPU. Let me know if you also need a memory one.

@brian-brazil
Contributor

Let's just stick with CPU then. If it is also memory that's a separate thing which we can look at optimising.

My first thought on how to handle this generally, so it benefits all SDs/users, would be to see if we can avoid re-processing a target that hasn't changed when there's a refresh from SD, thus avoiding a large chunk of the relabelling processing.

@brian-brazil
Contributor

Yeah, looking at the code at https://github.com/prometheus/prometheus/blob/master/scrape/scrape.go#L417-L431, if we made droppedTargets a map we could check whether an identical target was dropped last time and fast-path that, bypassing all the relabelling.

@roidelapluie
Member

@brian-brazil Is it a strict rule that we should expose all original labels (to the UI/API) in a sorted way? Or can we break the ordering? My experiments show that the most expensive operation is sorting labels. But relabeling doesn't require labels to be sorted, so we could sort far fewer labels, only after relabeling. The drawback is that the original label set would be exposed unsorted.

We could sort them in the UI, if that is really making a big difference. But I think that relabeling might somehow have dependencies on order, alongside other sanity checks we do (like checking for duplicate labels).

@brian-brazil
Contributor

Is it a strict rule that we should expose all original labels (to the UI/API) in a sorted way?

Yes, that's part of our documented API that cannot be broken - though the sorting doesn't matter, it is a JSON map which has no ordering.

My experiments show that the most expensive operation is sorting labels.

For the data structure to work it needs to be sorted, and that includes for relabelling so it can find the labels. Are we sorting more than we need to? We should only need to do it once per SD target.

@vitkovskii

vitkovskii commented Dec 10, 2020

The first sort is here: https://github.com/prometheus/prometheus/blob/master/scrape/target.go#L368 - it's not actually necessary for relabeling. The second one happens after relabeling: https://github.com/prometheus/prometheus/blob/master/scrape/target.go#L425. The second one is OK because we then need to calculate hash() to deduplicate targets.

@vitkovskii

vitkovskii commented Dec 10, 2020

and that includes for relabelling so it can find the labels

Relabeling scans all labels and doesn't require them to be sorted.

@vitkovskii

Why does relabeling take a list of labels rather than a map? For every target, on every config iteration, it allocates a new list. This is the second bottleneck after sorting.

@brian-brazil
Contributor

The map is the old data structure; Prometheus 2.x introduced the list for performance reasons. The map should be considered legacy, and remaining uses removed where practical.

Why not do the sort up in SD? We only need it once, not once per scrape config, and it'd further prune where the old data structure is used.

Relabeling scans all labels and doesn't require them to be sorted.

It's using a library which expects them to be sorted, so that invariant should be maintained. Relabelling does require them to be sorted due to this.

@shaikatz

shaikatz commented Jan 24, 2021

@brian-brazil are there any plans to improve this area? It's still a pain for anybody who uses many service monitors in their cluster; the potential savings here are great.

I'll contribute my CPU profile here: 172 service monitors, 5 cores currently in use; we can see that at least half of them are being wasted in the same targetsFromGroup function:

image

@brian-brazil
Contributor

I think that's the first thing we should probably look at improving.

@brancz
Member

brancz commented Feb 15, 2021

@shaikatz do you mind uploading the whole profile either as a file to github or via https://share.polarsignals.com/ ? It would be great if we could explore the profile.

I also opened prometheus-operator/prometheus-operator#3840 as I think that is another angle from which this can be optimized on the prometheus-operator side.

@shaikatz

@brancz
Member

brancz commented Feb 15, 2021

Could you also share an allocs profile?

From that profile it looks to me like a lot of memory garbage is produced by discovery, which causes large sweep and GC CPU usage. It does all seem to add up to prometheus-operator/prometheus-operator#3840.

@shaikatz

shaikatz commented Feb 15, 2021

Is that the one you need? profile001.svg.zip
If not, can you provide the exact pprof command to get your required profile?

@brancz
Member

brancz commented Feb 17, 2021

Yeah it is, thank you! That seems to also point in the same direction.

@lorenzo-biava
Author

@brancz / @brian-brazil / @roidelapluie is there any update on this issue? Any options we can explore to patch this behavior?
The prom-operator PR seems to have been stuck since March...

@wulianhuo

Same problem here.
The easy solution for now is to merge ServiceMonitors: try using only one ServiceMonitor for all the services to be monitored, and the Prometheus reload operation will be more efficient.
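As a sketch of that approach (all names and labels here are hypothetical), a single ServiceMonitor can select many Services through a shared label instead of defining one ServiceMonitor per Service; the main constraint is that the selected Services need a common metrics port name:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: all-platform-services
  namespace: monitoring
spec:
  selector:
    matchLabels:
      monitoring: enabled          # shared label added to every Service that should be scraped
  namespaceSelector:
    matchNames:
    - asraas-prod                  # or "any: true" to cover all namespaces
  endpoints:
  - port: metrics                  # assumes the Services expose a common port name
    interval: 30s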

@d-ulyanov

As the MR was not accepted, we've decided to implement our own separate discovery service (we call it a "target balancer") and deliver targets to the Prometheus instances as files (file SD). Benefits: 1) reduced kube API load, 2) significantly reduced CPU usage on the Prometheus instances.
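For anyone considering the same route, a hedged sketch of the Prometheus side (the path is hypothetical): the external discovery daemon writes target files, and Prometheus consumes them via file_sd_configs, so all the Kubernetes watching and per-target label work happens once, outside Prometheus.

scrape_configs:
- job_name: external-k8s-targets
  file_sd_configs:
  - files:
    - /etc/prometheus/file_sd/*.json   # written and updated by the external discovery daemon
    refresh_interval: 1m               # fallback poll; file changes are normally picked up on write

Each file holds entries of the form {"targets": ["10.0.0.1:8080"], "labels": {"namespace": "asraas-prod"}}, so Prometheus only has to re-read a file when it changes.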

@m-yosefpor

m-yosefpor commented Sep 8, 2022

It seems we are also hitting this issue, but the weird thing is that we are hitting it with only 1 of our Prometheus servers!! (We have configured 2 replicas in the Prometheus Operator, no sharding.)

However, the CPU usage is mostly in scrape.run for us rather than scrape.reload, so I'm not sure it's the same problem.

prometheus-0: profile.pb.gz
image

prometheus-1: profile(1).pb.gz
image

You can see the difference in pprof between two instances. Also the difference in CPU usage of these instances:

image

More info:

$ oc get servicemonitor,podmonitor -A | wc -l
461
$ oc get po -A | wc -l
3646

@d-ulyanov

@m-yosefpor we finally moved all discovery logic out of Prometheus into a separate daemon and switched to simple file/HTTP discovery; it also allowed us to implement custom sharding logic.
Maybe it makes sense to open-source this tool.

@m-yosefpor

@m-yosefpor we finally moved all discovery logic out of Prometheus into a separate daemon and switched to simple file/HTTP discovery; it also allowed us to implement custom sharding logic. Maybe it makes sense to open-source this tool.

We would appreciate it if such a tool were open-sourced.

@iamyeka

iamyeka commented Feb 17, 2023

We still need a proper solution; for now we have set scrape.discovery-reload-interval to a larger duration as a temporary workaround.

@iamyeka

iamyeka commented Feb 19, 2023

It's using a library which expects them to be sorted, so that invariant should be maintained. Relabelling does require them to be sorted due to this.

I can't figure out why this could happen. What library is being used?

@bboreham
Member

bboreham commented Mar 9, 2023

The problems described at the top may be improved by #12048 and #12084.

@beorn7
Member

beorn7 commented May 21, 2024

Hello from the bug scrub.

We assume this problem has been improved by #12048 and #12084 indeed. If you still see excessive CPU usage that we might be able to fix, please follow up here (or file a new issue).

@beorn7 beorn7 closed this as completed May 21, 2024