Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switching to Scylla prometheus #79

Merged
merged 3 commits into from
Nov 7, 2016
Merged

Conversation

amnonh
Copy link
Collaborator

@amnonh amnonh commented Nov 2, 2016

This patch set the prometheus to use the Scylla Prometheus API.

The chanages are:

  1. Modify the names from the collectd_exporter names to prometheus
    names.
  2. Use the node_exporter instead of the collectd for node metrics.
  3. set the prometheus server to listen both to scylla and to the
    node_exporter.

Signed-off-by: Amnon Heiman amnon@scylladb.com

This patch set the prometheus to use the Scylla Prometheus API.

The chanages are:
1. Modify the names from the collectd_exporter names to prometheus
names.
2. Use the node_exporter instead of the collectd for node metrics.
3. set the prometheus server to listen both to scylla and to the
node_exporter.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
@tzach
Copy link
Contributor

tzach commented Nov 3, 2016

Compaction chart does not work

  • Grafana use seastar_compaction_manager_objects
  • Prometheus have seastar_compaction_manager{instance="some-scylla2",job="scylla",metric="compactions",shard="0",type="objects"}

@tzach
Copy link
Contributor

tzach commented Nov 3, 2016

sum(irate(seastar_io_queue{type="total_operations", metric=~"streaming_reads.*"}[30s])) by (instance)

Should be

sum(irate(seastar_io_queue{type="total_operations", metric=~"streaming_read.*"}[30s])) by (instance)

reads --> read

@tgrabiec
Copy link
Contributor

tgrabiec commented Nov 3, 2016

After this change the dashboards will not be compatible with 1.3 versions. What's our plan here? Should we keep the old dashboards, appending "-1.3" suffix?

@tzach
Copy link
Contributor

tzach commented Nov 3, 2016

tag 0.1 marks the last version to support for scylla 1.3.x
After that, it will be scylla 1.4
If we need an urgent fix to match 1.3 we will branch. Not sure it is required.
make sense?

@tzach
Copy link
Contributor

tzach commented Nov 3, 2016

@amnonh the dead node expression
count(up{job=\"scylla\"})-count(seastar_memory{metric=\"free\",shard=\"0\",type=\"total_operations\"})
does not as well as the old
count(up)-count(collectd_processes_ps_code{processes=\"scylla\"}>0

The first take a long time to refresh (5m), while the first was immediate.
The reason is Prometheus assume missing metric have the value of the last seen one. It takes 5 min for it to recognize it is missing. In the old version, the metric was always there, just the value change.
In the new version the metric is missing.

@tgrabiec
Copy link
Contributor

tgrabiec commented Nov 3, 2016

@tzach I think shipping both new and old versions of the dashboards is easier for the users because you can use the same infrastructure to monitor old and new cluster, e.g. during rolling upgrade, or when testing various Scylla versions - no need to switch between monitoring stacks.

@tzach
Copy link
Contributor

tzach commented Nov 3, 2016

@tgrabiec having two dashboards types at the same time just for the upgrade phase look like an overkill to me.
You can have similar experience by running two monitoring stacks

@tzach
Copy link
Contributor

tzach commented Nov 3, 2016

@amnonh the example at the end of prometheus/prometheus.yml is out of date, use wrong port

## two servers example: - targets: ["172.17.0.3:9103","172.17.0.2:9103"]

@tgrabiec
Copy link
Contributor

tgrabiec commented Nov 3, 2016

@tzach Why is it overkill if it makes using the monitoring stack easier? It's easier to switch between dashboards than it is to start multiple monitoring stacks. We won't have to provide 2 versions of appliances (AMI, docker), etc.

@tzach
Copy link
Contributor

tzach commented Nov 3, 2016

@tgrabiec I was not clear.
Your suggestion is easier for the user. I'm not sure its justify to keep two dashboards of each just for this short migration phase.

I guess its not a big effort either. Just copy the old dashboard under a new name.

@tzach
Copy link
Contributor

tzach commented Nov 7, 2016

Keeping two set of targets in prometheus.yaml (old and new) each with a different port is an unnecessary complication for the large majority users.
I'm keeping it simple and merging the PR.

@tzach tzach merged commit cb3abc5 into scylladb:master Nov 7, 2016
@tzach tzach mentioned this pull request Nov 10, 2016
@amnonh amnonh deleted the prometheus_names branch October 9, 2018 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants