[docs][proposal] Scaled Prometheus Pipeline #2118
Reorganize the doc a bit for me please:
- Problem statement: as-is
- Solution proposal: Thanos with a single Prometheus server (or possibly an HA pair) and object storage. Elaborate on the currently proposed architecture on the push and query sides: what you will deploy, what talks to what, and the data flow for time series from the edge. Describe what object storage is and what our options for it are for public-cloud and on-prem deployments.
- Implementation details: query-side
- Implementation details: push-side
Object storage will allow us to store only a few hours of metrics on the Prometheus server itself (potentially keeping everything in memory) and to export older metrics to object storage elsewhere. For example, on an AWS deployment, metrics would be stored in S3.
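To make the split between local and remote data concrete, here is a minimal Python sketch. It assumes Prometheus writes 2-hour TSDB blocks (its default minimum block duration) and that every completed block has already been uploaded to the bucket by the Thanos sidecar. `Block` and `split_by_retention` are hypothetical names for illustration, not Thanos APIs; in a real deployment the local window would be set with Prometheus's `--storage.tsdb.retention.time` flag.

```python
from dataclasses import dataclass

# Assumption: Prometheus cuts 2-hour TSDB blocks and timestamps are Unix
# milliseconds, as in Prometheus.
HOUR_MS = 60 * 60 * 1000
BLOCK_MS = 2 * HOUR_MS


@dataclass(frozen=True)
class Block:
    min_time: int  # inclusive start, ms
    max_time: int  # exclusive end, ms


def split_by_retention(blocks, now_ms, local_retention_ms):
    """Hypothetical helper: return (local, remote_only).

    `local` are blocks still inside the server's retention window; every
    other block is assumed to be uploaded already and to survive only in
    object storage (e.g. an S3 bucket).
    """
    cutoff = now_ms - local_retention_ms
    local = [b for b in blocks if b.max_time > cutoff]
    remote_only = [b for b in blocks if b.max_time <= cutoff]
    return local, remote_only


# One day of 2-hour blocks, with 6 hours kept on the Prometheus server.
blocks = [Block(i * BLOCK_MS, (i + 1) * BLOCK_MS) for i in range(12)]
local, remote_only = split_by_retention(
    blocks, now_ms=24 * HOUR_MS, local_retention_ms=6 * HOUR_MS
)
print(len(local), len(remote_only))  # 3 9
```

Only the newest three blocks stay on the server; the rest of the day is served from object storage.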
We will then deploy multiple Querier components behind a load balancer, each configured to query both the Prometheus server (through the Thanos sidecar) and object storage (through a Thanos Store Gateway). This moves the compute and I/O load of queries off the Prometheus server and onto the stateless Querier components, which can be scaled horizontally to handle increased query loads.
increased
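The Querier's read path can be sketched as a fan-out over time ranges. This is a simplified Python model, not Thanos code: it assumes, as in Thanos, that each backend (the sidecar in front of Prometheus, a store gateway in front of object storage) advertises the time range it can serve, and that the Querier only asks backends whose range overlaps the query. `StoreEndpoint` and `query_range` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class StoreEndpoint:
    """Hypothetical stand-in for a Thanos StoreAPI endpoint. `samples` holds
    a single series as {timestamp: value} to keep the sketch small."""
    name: str
    min_time: int
    max_time: int
    samples: dict = field(default_factory=dict)

    def select(self, start, end):
        return {t: v for t, v in self.samples.items() if start <= t <= end}


def query_range(stores, start, end):
    """Fan a range query out to every store whose advertised range overlaps
    [start, end] and merge the partial results. A timestamp returned by more
    than one store (e.g. a block that is still local and already uploaded)
    is kept once, with the first store's value."""
    merged = {}
    for store in stores:
        if store.max_time < start or store.min_time > end:
            continue  # this store cannot hold data for the requested window
        for t, v in store.select(start, end).items():
            merged.setdefault(t, v)
    return sorted(merged.items())


sidecar = StoreEndpoint("sidecar", 90, 120, {t: 1.0 for t in range(90, 121, 10)})
gateway = StoreEndpoint("store-gateway", 0, 100, {t: 1.0 for t in range(0, 101, 10)})
print([t for t, _ in query_range([sidecar, gateway], 50, 115)])
# [50, 60, 70, 80, 90, 100, 110]
```

Because `query_range` keeps no state between calls, any number of identical Querier replicas can run it behind a load balancer, which is the property the horizontal-scaling argument relies on. Real Thanos deduplication works on a replica label across HA Prometheus pairs rather than on raw timestamps, so treat the merge step as a simplification.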
With this setup, we only need to deploy the Thanos `sidecar`, multiple `Querier` components, and object storage (read back through a Thanos Store Gateway) to achieve faster queries.
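The component list can be sanity-checked as a graph of "who sends read requests to whom". The Python sketch below assumes historical data reaches the Queriers through a Thanos Store Gateway reading the bucket; the component names, `TOPOLOGY`, and both helpers are hypothetical. It flags any Querier that cannot reach both recent data on the Prometheus server and older data in object storage.

```python
from collections import deque

# Hypothetical query-path topology: each component lists what it sends read
# requests to. querier-1 is deliberately misconfigured with no route to
# historical data.
TOPOLOGY = {
    "load-balancer": ["querier-0", "querier-1"],
    "querier-0": ["sidecar", "store-gateway"],
    "querier-1": ["sidecar"],
    "sidecar": ["prometheus"],
    "store-gateway": ["object-storage"],
    "prometheus": [],
    "object-storage": [],
}


def reachable(graph, start):
    """Breadth-first search: every component reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def misconfigured_queriers(graph, queriers, sources=("prometheus", "object-storage")):
    """Queriers that cannot reach both recent (Prometheus) and historical
    (object storage) data."""
    return [q for q in queriers if not set(sources) <= reachable(graph, q)]


print(misconfigured_queriers(TOPOLOGY, TOPOLOGY["load-balancer"]))  # ['querier-1']
```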
### Cortex
Let's just keep this doc about Thanos, which seems like a far more appropriate solution to our problem space.
Signed-off-by: Scott8440 <scott8440@gmail.com>
|Step |Est. Time |
|--- |--- |
|Deploy Thanos locally and experiment with loads to validate query-time improvements |2 weeks |
You'll probably have to deploy this on AWS yourself; otherwise you won't be able to test what impact object storage has on query performance.
Summary
Design doc discussing how to scale Prometheus, with the end goal of improving query times and supporting increased capacity.
Test Plan
N/A