High Azure Network API calls by azure_sd_config causing API throttling #8481

Closed
infa-kparida opened this issue Feb 12, 2021 · 4 comments

Comments

@infa-kparida

We are using azure_sd_config to discover VMs and VMSS. However, for around 400 VMs + VMSS, Prometheus is making 60k+ API calls, whereas Azure has a limit of 10k API calls. This is causing instability for other applications that make Azure API calls.

Can we optimize the Azure discovery code to reduce the number of calls?

System information:

  • We observed this issue in Prometheus 2.15.2 and then upgraded to 2.24.1, but we are still facing the same issue
  • Prometheus is running on Kubernetes 1.16
@roidelapluie
Member

Thank you. This is a recurring, known limitation with Azure, but can you share a bit of your Prometheus config so we can check whether the discovery configs are being used as expected?

@infa-kparida
Author

infa-kparida commented Feb 13, 2021

Hi @roidelapluie, here is my config:

 - job_name: prometheus
   static_configs:
   - targets:
     - localhost:9090
 - azure_sd_configs:
   - authentication_method: ManagedIdentity
     environment: AzurePublicCloud
     port: 1934
     refresh_interval: 300s
     subscription_id: xxxxx-xxxxx-xxxxx-xxxx-xxxx
   - authentication_method: ManagedIdentity
     environment: AzurePublicCloud
     port: 1935
     refresh_interval: 300s
     subscription_id: xxxxx-xxxxx-xxxxx-xxxx-xxxx
   - authentication_method: ManagedIdentity
     environment: AzurePublicCloud
     port: 1936
     refresh_interval: 300s
     subscription_id: xxxxx-xxxxx-xxxxx-xxxx-xxxx
   - authentication_method: ManagedIdentity
     environment: AzurePublicCloud
     port: 1937
     refresh_interval: 300s
     subscription_id: xxxxx-xxxxx-xxxxx-xxxx-xxxx
   job_name: haproxy_exporter
   metrics_path: /metrics
   relabel_configs:
   - action: keep
     regex: westus2
     separator: ;
     source_labels:
     - __meta_azure_machine_location
   - action: keep
     regex: .*-haproxy.*
     separator: ;
     source_labels:
     - __meta_azure_machine_tag_APPS
   - action: replace
     regex: (.*)
     replacement: $1
     separator: ;
     source_labels:
     - __meta_azure_machine_tag_BUSINESSUNIT
     target_label: bu 

@roidelapluie
Member

roidelapluie commented Feb 13, 2021

Thank you! Your configuration is correct, but there is a way to reduce the number of calls to the Azure API.

Prometheus can reuse an SD config across multiple jobs, so the same API calls serve all of them.

The condition for this is that the SD configs are exactly the same. In your case, you should make the azure_sd_configs entries identical.

That requires aligning your configurations and using relabel_configs to change the port. (Because the ports currently differ between the SD configs, Prometheus cannot reuse them, even though the rest is identical.)

In your case, you will also need to split your configuration into multiple jobs (I assume that is already the case and you have provided a partial config):

- job_name: prometheus
  static_configs:
  - targets:
    - localhost:9090
- job_name: job1
  azure_sd_configs:
  # Identical to the SD config in job2, so the discovery can be shared.
  - authentication_method: ManagedIdentity
    environment: AzurePublicCloud
    port: 80
    refresh_interval: 300s
    subscription_id: xxxxx-xxxxx-xxxxx-xxxx-xxxx
  relabel_configs:
  # Rewrite the scrape address to the port this job needs.
  - source_labels: [__meta_azure_machine_private_ip]
    regex: (.+)
    replacement: ${1}:9090
    target_label: __address__
- job_name: job2
  azure_sd_configs:
  - authentication_method: ManagedIdentity
    environment: AzurePublicCloud
    port: 80
    refresh_interval: 300s
    subscription_id: xxxxx-xxxxx-xxxxx-xxxx-xxxx
  relabel_configs:
  - source_labels: [__meta_azure_machine_private_ip]
    regex: (.+)
    replacement: ${1}:9100
    target_label: __address__

With your four SD configs collapsed into one shared config, that would divide the calls to the Azure API by 4.
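
Applied to the haproxy_exporter config posted earlier in this thread, the same idea might look roughly like the sketch below. This is only an illustration under some assumptions: the job names are hypothetical, the port value inside azure_sd_configs is an arbitrary shared placeholder (the actual scrape port is set by the __address__ relabel rule), and only two of the four ports are written out.

```yaml
scrape_configs:
  # Hypothetical job names: one job per exporter port.
  - job_name: haproxy_exporter_1934
    metrics_path: /metrics
    azure_sd_configs:
      # This entry is kept identical in every job so the discovery can be shared.
      - authentication_method: ManagedIdentity
        environment: AzurePublicCloud
        port: 1934
        refresh_interval: 300s
        subscription_id: xxxxx-xxxxx-xxxxx-xxxx-xxxx
    relabel_configs:
      - action: keep
        regex: westus2
        source_labels: [__meta_azure_machine_location]
      - action: keep
        regex: .*-haproxy.*
        source_labels: [__meta_azure_machine_tag_APPS]
      - source_labels: [__meta_azure_machine_tag_BUSINESSUNIT]
        target_label: bu
      # Set the port that this particular job scrapes.
      - source_labels: [__meta_azure_machine_private_ip]
        regex: (.+)
        replacement: ${1}:1934
        target_label: __address__
  - job_name: haproxy_exporter_1935
    metrics_path: /metrics
    azure_sd_configs:
      # Same entry as above, including the placeholder port.
      - authentication_method: ManagedIdentity
        environment: AzurePublicCloud
        port: 1934
        refresh_interval: 300s
        subscription_id: xxxxx-xxxxx-xxxxx-xxxx-xxxx
    relabel_configs:
      - action: keep
        regex: westus2
        source_labels: [__meta_azure_machine_location]
      - action: keep
        regex: .*-haproxy.*
        source_labels: [__meta_azure_machine_tag_APPS]
      - source_labels: [__meta_azure_machine_tag_BUSINESSUNIT]
        target_label: bu
      - source_labels: [__meta_azure_machine_private_ip]
        regex: (.+)
        replacement: ${1}:1935
        target_label: __address__
  # Jobs for ports 1936 and 1937 would follow the same pattern.
```

One thing to keep in mind with this layout: each job gets its own job label value, so dashboards or alerts that match on job="haproxy_exporter" would need adjusting.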

@infa-kparida
Author

infa-kparida commented Feb 15, 2021

Thanks @roidelapluie for the help. The above trick of reusing SD configs across multiple jobs drastically reduced the number of API calls. We also discovered that we had about 20 more similar jobs running and optimized those too. I think this is a feature that should be well documented in the official Prometheus documentation.

@prometheus prometheus locked as resolved and limited conversation to collaborators Nov 15, 2021