Choose a local metrics aggregator #62

Closed
Tracked by #61
lfrancke opened this issue Jul 8, 2021 · 3 comments
Comments

@lfrancke
Member

lfrancke commented Jul 8, 2021

See #61

We plan on deploying a local metrics aggregator on each physical server.
Whatever we choose should support the OpenMetrics/Prometheus exposition format for ingest and its federation API for output/querying.
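To make the two requirements concrete, here is a minimal sketch, not a definitive client. It assumes the aggregator exposes a Prometheus-style `/federate` endpoint that takes one `match[]` parameter per series selector and returns samples in the text exposition format. The helper names, the address `http://localhost:9090` and the `{job="node"}` selector are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode


def build_federate_url(base_url: str, matchers: list[str]) -> str:
    """Return a /federate URL with one match[] parameter per series selector."""
    query = urlencode([("match[]", m) for m in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


def parse_sample(line: str) -> tuple[str, float]:
    """Split one simple exposition line into (series, value).

    Handles only 'name{labels} value' lines without a trailing timestamp;
    comment lines (# HELP, # TYPE) and blank lines are not supported.
    """
    series, value = line.rsplit(" ", 1)
    return series, float(value)


# Hypothetical aggregator address and selector, for illustration only.
url = build_federate_url("http://localhost:9090", ['{job="node"}'])
sample = parse_sample('node_load1{instance="a"} 0.5')
```

Any candidate that can serve such a URL and accept such lines on ingest would meet the interface requirement, whatever it does internally.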

Some candidates:

  • Thanos
  • Cortex
  • VictoriaMetrics

We have no knowledge about Thanos or Cortex but VictoriaMetrics seems to fit the bill.
We should at least take a look at all three options and do some research to see if there are any more available that we should consider.

This issue can be closed once we've made a decision.
The decision should probably be documented in an ADR in the documentation repository.

@razvan
Member

razvan commented Jul 12, 2021

Thanos and Cortex

These systems are not drop-in replacements; rather, they build on top of Prometheus. They are intended to be used together with large, long-term storage in the cloud. They add high availability and horizontal scaling, but also add a lot more complexity to the monitoring infrastructure.

Some learnings after reading the links below:

Prometheus:

  • time resolution rounded to millisecond

Thanos:

  • High complexity, lots of components
  • Thanos does query-time federation which is a pull mechanism, unlike Prometheus and Victoria.
  • Add-on for Prometheus (as a sidecar)
  • cloud storage

Victoria:

  • low resource usage
  • operational simplicity
  • time resolution rounded to second
  • metric precision is slightly sacrificed in favor of smaller storage requirements.
  • PromQL implementation of VictoriaMetrics differs slightly from the one in Prometheus. See links [3] and [4]

Links:

  1. https://monitoring2.substack.com/p/big-prometheus
  2. https://news.ycombinator.com/item?id=21995942
  3. https://www.robustperception.io/evaluating-performance-and-correctness (prometheus vs victoria)
  4. https://valyala.medium.com/evaluating-performance-and-correctness-victoriametrics-response-e27315627e87 (victoria design decisions)
  5. https://logz.io/blog/devops/prometheus-architecture-at-scale/ (thanos & cortex features on top of prometheus)

@lfrancke
Member Author

That sounds like Victoria ain't a bad choice.

There are two drawbacks for Victoria I can see:

  1. Downsampling is not supported (see VictoriaMetrics/VictoriaMetrics#36, "Downsampling data"), and once it is supported it will be Enterprise only
  2. Retention policies by space don't exist (see VictoriaMetrics/VictoriaMetrics#342, "add a limit for the disk storage size (retention size)")

@razvan
Member

razvan commented Jul 20, 2021

We selected Prometheus as default.

razvan closed this as completed Jul 20, 2021