Choose a local metrics aggregator #62

lfrancke · 2021-07-08T12:51:52Z

We plan on deploying a local metrics aggregator on each physical server.
Whatever we choose should support the OpenMetrics/Prometheus exposition format for ingest and its federation API for output/querying.
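To make the output requirement concrete, here is a minimal sketch of how a client might build a request against a Prometheus-style federation endpoint. The `/federate` path and the repeated `match[]` parameter follow the Prometheus federation convention. The host name and the `job="node"` selector are made-up placeholders, and whether a given candidate serves this endpoint identically still has to be checked as part of the evaluation.

```python
from urllib.parse import urlencode


def federate_url(base_url: str, matchers: list[str]) -> str:
    """Build a federation URL with one match[] parameter per series selector."""
    # Prometheus-style federation expects each selector as a separate match[] value.
    query = urlencode([("match[]", m) for m in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical aggregator address and selector, for illustration only.
url = federate_url("http://aggregator.local:9090/", ['{job="node"}'])
print(url)
```

The returned URL can then be scraped by an upstream Prometheus-compatible server or fetched with any HTTP client; the response would be in the text exposition format.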

Some candidates:

Thanos
Cortex
VictoriaMetrics

We have no knowledge about Thanos or Cortex but VictoriaMetrics seems to fit the bill.
We should at least take a look at all three options and do some research to see if there are any more available that we should consider.

This issue can be closed once we've made a decision.
The decision should probably be documented in an ADR in the documentation repository.

razvan · 2021-07-12T13:04:09Z

Thanos and Cortex

These systems are not drop-in replacements; rather, they build on top of Prometheus. Their intended usage is coupled with large, long-term storage in the cloud. They add high availability and horizontal scaling but also add a lot more complexity to the monitoring infrastructure.

Some learnings after reading the links below:

Prometheus:

time resolution rounded to millisecond

Thanos:

High complexity, lots of components
Thanos does query-time federation which is a pull mechanism, unlike Prometheus and Victoria.
Add-on for Prometheus (as a sidecar)
cloud storage

Victoria:

low resource usage
operational simplicity
time resolution rounded to second
metric precision is slightly sacrificed in favor of smaller storage requirements.
PromQL implementation of VictoriaMetrics differs slightly from the one in Prometheus. See links [3] and [4]
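To illustrate what the difference in time resolution noted above would mean in practice, here is a small sketch. It only models the claim from these notes (millisecond versus second resolution) and is not taken from either system's code. The function name, the sample timestamps, and the choice of truncation rather than nearest-value rounding are all assumptions for illustration.

```python
def truncate_ms(ts_ms: int, resolution_ms: int) -> int:
    """Truncate a millisecond timestamp down to a multiple of the given resolution."""
    return ts_ms - ts_ms % resolution_ms


# Three hypothetical samples, the first two only 400 ms apart.
samples_ms = [1626000000123, 1626000000523, 1626000001001]

at_millisecond = [truncate_ms(t, 1) for t in samples_ms]
at_second = [truncate_ms(t, 1000) for t in samples_ms]

# At millisecond resolution all three timestamps stay distinct;
# at second resolution the first two collapse onto the same timestamp.
print(len(set(at_millisecond)), len(set(at_second)))
```

For scrape intervals of several seconds this makes little difference, which is one reason the coarser resolution may be an acceptable trade-off for a local aggregator.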

Links:

[1] https://monitoring2.substack.com/p/big-prometheus
[2] https://news.ycombinator.com/item?id=21995942
[3] https://www.robustperception.io/evaluating-performance-and-correctness (Prometheus vs. Victoria)
[4] https://valyala.medium.com/evaluating-performance-and-correctness-victoriametrics-response-e27315627e87 (Victoria design decisions)
[5] https://logz.io/blog/devops/prometheus-architecture-at-scale/ (Thanos & Cortex features on top of Prometheus)

lfrancke · 2021-07-12T13:24:28Z

That sounds like Victoria ain't a bad choice.

There are two drawbacks for Victoria I can see:

Downsampling is not supported (VictoriaMetrics/VictoriaMetrics#36, "Downsampling data"), and once it is supported it will be Enterprise-only.
Retention policies by disk space don't exist (VictoriaMetrics/VictoriaMetrics#342, "add a limit for the disk storage size (retention size)").

razvan · 2021-07-20T15:35:38Z

We selected Prometheus as the default.

lfrancke added priority/high labels Jul 8, 2021

lfrancke mentioned this issue Jul 8, 2021

Metrics, Monitoring - Milestones 1 & 2 #61

Closed

16 tasks

razvan closed this as completed Jul 20, 2021
