Switching to Scylla prometheus #79

amnonh · 2016-11-02T10:28:08Z

This patch sets Prometheus to use the Scylla Prometheus API.

The changes are:

1. Modify the names from the collectd_exporter names to Prometheus names.
2. Use the node_exporter instead of collectd for node metrics.
3. Set the Prometheus server to scrape both Scylla and the node_exporter.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>


tzach · 2016-11-03T08:30:45Z

The compaction chart does not work.

Grafana uses seastar_compaction_manager_objects
Prometheus has seastar_compaction_manager{instance="some-scylla2",job="scylla",metric="compactions",shard="0",type="objects"}
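
The mismatch comes from the naming change: the collectd_exporter style packs the type into the metric name, while the Scylla Prometheus API exposes it as a label. A minimal Python sketch of that translation follows. The helper `selector` is hypothetical (it is not part of Scylla, Grafana, or Prometheus tooling), and only the label values quoted above are used.

```python
def selector(name, labels):
    """Build a PromQL instant-vector selector such as name{k="v",...}.

    Hypothetical helper, for illustration only.
    """
    if not labels:
        return name
    body = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{name}{{{body}}}"


# Old flat name still referenced by the Grafana panel.
old_name = "seastar_compaction_manager_objects"

# New style: the trailing "_objects" becomes a type label on a shared name.
new_expr = selector(
    "seastar_compaction_manager",
    {"metric": "compactions", "type": "objects"},
)
print(new_expr)
```

A panel that still queries `old_name` finds nothing, because no series with that name is exported any more.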

tzach · 2016-11-03T08:44:48Z

sum(irate(seastar_io_queue{type="total_operations", metric=~"streaming_reads.*"}[30s])) by (instance)

Should be

sum(irate(seastar_io_queue{type="total_operations", metric=~"streaming_read.*"}[30s])) by (instance)

reads --> read
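
PromQL regex matchers are fully anchored, so `streaming_reads.*` can only match label values that start with the plural form. The sketch below mimics that behaviour with Python's `re.fullmatch`. The label values are assumed for illustration and are not taken from a real Scylla instance; Prometheus uses RE2 rather than Python's `re`, but the two agree for patterns this simple.

```python
import re


def label_matches(pattern, value):
    """Mimic a PromQL =~ matcher, which must match the whole label value."""
    return re.fullmatch(pattern, value) is not None


# Assumed label values, for illustration only.
values = ["streaming_read", "streaming_read_requests", "query_read"]

old = [v for v in values if label_matches(r"streaming_reads.*", v)]
new = [v for v in values if label_matches(r"streaming_read.*", v)]

print(old)  # values matched by the plural pattern
print(new)  # values matched by the corrected pattern
```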

tgrabiec · 2016-11-03T09:06:35Z

After this change, the dashboards will not be compatible with 1.3 versions. What's our plan here? Should we keep the old dashboards, appending a "-1.3" suffix?

tzach · 2016-11-03T09:22:34Z

Tag 0.1 marks the last version that supports Scylla 1.3.x.
After that, it will be Scylla 1.4.
If we need an urgent fix to match 1.3, we will branch. I'm not sure that is required.
Does that make sense?

tzach · 2016-11-03T12:55:21Z

@amnonh the dead node expression
count(up{job=\"scylla\"})-count(seastar_memory{metric=\"free\",shard=\"0\",type=\"total_operations\"})
does not work as well as the old
count(up)-count(collectd_processes_ps_code{processes=\"scylla\"}>0)

The new one takes a long time to refresh (5m), while the old one was immediate.
The reason is that Prometheus assumes a missing metric still has its last seen value; it takes 5 minutes to recognize that the metric is missing. In the old version, the metric was always there and only its value changed.
In the new version, the metric is missing.
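
The delay described above can be sketched as a lookback window: a query at time `now` still sees a series if its newest sample is recent enough. The following Python simulation is only a model of that idea, not Prometheus code. The 300-second window is an assumption chosen to match the observed 5 minutes, and the node names and scrape times are made up.

```python
LOOKBACK_SECONDS = 300  # assumed window, matching the 5m delay reported above


def visible_series(samples, now, lookback=LOOKBACK_SECONDS):
    """Return the series that have at least one sample in [now - lookback, now]."""
    return sorted(
        name
        for name, timestamps in samples.items()
        if any(now - lookback <= ts <= now for ts in timestamps)
    )


# Hypothetical scrape history with a 15-second interval.
samples = {
    "node-a": list(range(0, 601, 15)),  # exported throughout
    "node-b": list(range(0, 91, 15)),   # last sample at t=90, then the metric vanishes
}

for now in (120, 300, 400):
    print(now, visible_series(samples, now))
```

In this model `node-b` is still counted at t=120 and t=300, long after its last sample, and only drops out once the window has passed. A metric that stays present and changes value, as in the old expression, does not have this delay.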

tgrabiec · 2016-11-03T13:32:23Z

@tzach I think shipping both new and old versions of the dashboards is easier for the users, because you can use the same infrastructure to monitor old and new clusters, e.g. during a rolling upgrade or when testing various Scylla versions. There is no need to switch between monitoring stacks.

tzach · 2016-11-03T14:31:49Z

@tgrabiec having two dashboard types at the same time just for the upgrade phase looks like overkill to me.
You can have a similar experience by running two monitoring stacks.

tzach · 2016-11-03T14:37:09Z

@amnonh the example at the end of prometheus/prometheus.yml is out of date; it uses the wrong port:

## two servers example: - targets: ["172.17.0.3:9103","172.17.0.2:9103"]
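
Since the server now scrapes both Scylla and the node_exporter, an updated example would need one target list per exporter. A rough Python sketch of building those lists for the two hosts above follows. Both port numbers are assumptions (the commonly used defaults for Scylla's Prometheus API and for node_exporter) and should be checked against the actual prometheus.yml in this PR.

```python
SCYLLA_PORT = 9180         # assumption: default port of Scylla's Prometheus API
NODE_EXPORTER_PORT = 9100  # assumption: default node_exporter port


def targets(hosts, port):
    """Format host addresses as Prometheus "host:port" target strings."""
    return [f"{host}:{port}" for host in hosts]


# Hosts taken from the two-server example quoted above.
hosts = ["172.17.0.3", "172.17.0.2"]

scylla_targets = targets(hosts, SCYLLA_PORT)
node_targets = targets(hosts, NODE_EXPORTER_PORT)

print(scylla_targets)
print(node_targets)
```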

tgrabiec · 2016-11-03T14:44:52Z

@tzach Why is it overkill if it makes using the monitoring stack easier? It's easier to switch between dashboards than it is to start multiple monitoring stacks. We won't have to provide 2 versions of appliances (AMI, docker), etc.

tzach · 2016-11-03T14:48:51Z

@tgrabiec I was not clear.
Your suggestion is easier for the user. I'm not sure it's justified to keep two of each dashboard just for this short migration phase.

I guess it's not a big effort either. Just copy the old dashboard under a new name.

tzach · 2016-11-07T09:47:30Z

Keeping two sets of targets in prometheus.yaml (old and new), each with a different port, is an unnecessary complication for the large majority of users.
I'm keeping it simple and merging the PR.

compaction manager expr missing type attribute

56be2c6

streaming_read instead of streaming_reads

ad829ae

tzach merged commit cb3abc5 into scylladb:master Nov 7, 2016

tzach mentioned this pull request Nov 10, 2016

Slow response to dead nodes #83

Closed

amnonh deleted the prometheus_names branch October 9, 2018 08:29
