Monitoring stack doesn't respect manager_agent.prometheus config parameter #1992

karol-kokoszka · 2023-05-26T11:28:44Z

This issue is connected to scylla-cloud deployments.

Scylla Manager Agent lets end users change the default port on which the agent exposes metrics for Prometheus scraping:
https://github.com/scylladb/scylla-manager/blob/e25e51487cae81fc04eb67fa5b957249a4ac6801/dist/etc/scylla-manager-agent.yaml#L31-L35

Scylla Cloud uses this setting to override the default port, 5090, with a custom port, 56090.

The monitoring stack creates the targets that the Prometheus server scrapes for agent metrics, and it appears to always build the scrapeUrl with port 5090.
See:

{
  "discoveredLabels": {
    "__address__": "*****",
    "__meta_filepath": "/etc/scylla.d/prometheus/scylla_servers.yml",
    "__metrics_path__": "/metrics",
    "__scheme__": "http",
    "__scrape_interval__": "20s",
    "__scrape_timeout__": "15s",
    "cluster": "*****",
    "dc": "*****",
    "instance": "*****",
    "job": "manager_agent",
    "publicIp": "*****",
    "serverExternalId": "*****"
  },
  "labels": {
    "cluster": "*****",
    "dc": "*****",
    "instance": "*****",
    "job": "manager_agent",
    "publicIp": "*****",
    "serverExternalId": "*****",
    "serverId": "*****"
  },
  "scrapePool": "manager_agent",
  "scrapeUrl": "http://*****:5090/metrics",
  "globalUrl": "http://*****:5090/metrics",
  "lastError": "Get \"http://*****:5090/metrics\": context deadline exceeded",
  "lastScrape": "2023-05-25T13:22:41.865990115Z",
  "lastScrapeDuration": 15.000340933,
  "health": "down",
  "scrapeInterval": "20s",
  "scrapeTimeout": "15s"
}
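
To check whether other agents are affected, the same information can be pulled out of the Prometheus targets API response. The following is a minimal sketch, not part of the monitoring stack: it assumes the response has already been fetched from the standard `/api/v1/targets` endpoint and decoded into a dict, and the function name `down_agent_targets` is hypothetical.

```python
from urllib.parse import urlsplit


def down_agent_targets(targets_response, scrape_pool="manager_agent"):
    """Return (scrape_url, port, last_error) for every unhealthy target in the pool.

    `targets_response` is the decoded JSON body of Prometheus' /api/v1/targets.
    """
    active = targets_response.get("data", {}).get("activeTargets", [])
    unhealthy = []
    for target in active:
        # Skip targets from other scrape pools and targets that scrape fine.
        if target.get("scrapePool") != scrape_pool or target.get("health") == "up":
            continue
        url = target.get("scrapeUrl", "")
        unhealthy.append((url, urlsplit(url).port, target.get("lastError", "")))
    return unhealthy
```

If every entry returned for the `manager_agent` pool reports port 5090 while the agents listen on another port, the targets were generated with the default rather than the configured value.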

As a result, the metrics exposed by the manager agent are not collected.
The same gap is visible on the manager dashboard.

Please read the scylla-manager-agent.yaml file to determine the port on which the manager agent exposes its metrics.
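
A rough sketch of what that lookup could look like is shown below. It assumes the agent config sets the port through a top-level `prometheus` key holding a bind address such as `':56090'`, as in the config file linked above, and falls back to 5090 when the key is absent or commented out. The helper names are hypothetical, and a real implementation would more likely use a YAML parser than a regular expression.

```python
import re

DEFAULT_AGENT_PROMETHEUS_PORT = 5090

# Matches an uncommented top-level line such as:  prometheus: ':56090'
_PROMETHEUS_LINE = re.compile(
    r"""^prometheus:\s*['"]?([^'"#\s]*)['"]?""", re.MULTILINE
)


def agent_metrics_port(config_text, default=DEFAULT_AGENT_PROMETHEUS_PORT):
    """Extract the metrics port from the agent config text, or return the default."""
    match = _PROMETHEUS_LINE.search(config_text)
    if not match:
        return default
    # The value is a bind address ("host:port" or ":port"); keep the part after the last colon.
    port = match.group(1).rpartition(":")[2]
    return int(port) if port.isdigit() else default


def agent_scrape_target(host, config_text):
    """Build the host:port string to use as the Prometheus target for one agent."""
    return f"{host}:{agent_metrics_port(config_text)}"
```

For example, `agent_scrape_target("10.0.0.1", "prometheus: ':56090'\n")` yields `10.0.0.1:56090`, while a config in which the line is still commented out yields the default port.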


karol-kokoszka added the bug label May 26, 2023

amnonh added this to the Monitoring 4.5 milestone Jun 4, 2023

amnonh mentioned this issue Jun 7, 2023

Manager agents #2000

Merged

amnonh closed this as completed in #2000 Jun 7, 2023

karol-kokoszka mentioned this issue Jul 20, 2023

agent process inflated and cause oom-kill scylladb/scylla-manager#3450

Open
