Proposal: expose `/api/v1/write` endpoint for remote_write storage API #4769
Comments
semyonslepov commented Oct 31, 2018
Have you tried the Prometheus federation mechanism for this purpose yet?
Federation doesn't fit, since it requires access from the top Prometheus instance to all the remote leaf Prometheus instances in different networks / datacenters (which may be behind NATs / firewalls with varying configs). This is harder to operate compared to the case where the leaf Prometheus instances write directly to the top Prometheus instance via the standard remote_write API.
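To make the connectivity constraint concrete, here is a minimal sketch of what federation would require on the top instance: a scrape job that reaches out to every leaf's `/federate` endpoint. The hostnames and the `match[]` selector are placeholders, not part of the original proposal.

```yaml
# Sketch of a federation scrape config on the top Prometheus instance.
# The top instance must be able to reach each leaf directly, which is the
# operational constraint discussed above. Hostnames are placeholders.
scrape_configs:
  - job_name: 'federate-leaves'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~".+"}'                  # selector is illustrative
    static_configs:
      - targets:
          - 'leaf-prometheus-dc1:9090'
          - 'leaf-prometheus-dc2:9090'
```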
semyonslepov commented Nov 1, 2018
I'm not sure anyone is going to allow direct writing to the Prometheus TSDB apart from the Prometheus server on the same machine itself, and it seems like quite a bad idea to me too.
There is another use case for the exposed remote write API: seamless integration of a push model via an external adapter service that can collect metrics in various popular formats, such as InfluxDB's line protocol, various Graphite formats, and formats based on message queues.
Hi, this is basically a request for adding push support to Prometheus, which has come up a lot in the past and which we've been very careful about so far.

First, a bit of background on why push support hasn't been added to Prometheus yet: Prometheus is primarily a pull-based monitoring system (which just happens to include a TSDB, but we don't see Prometheus as a TSDB first), and several of its core features assume that the server is in control of scraping metrics and attaching timestamps at its own configured pace. For example, recording and alerting rules are evaluated based on the server's notion of the current time, so the underlying metrics must arrive in lockstep with rule evaluations, which is best ensured with a pull model. Service discovery integration (with target metadata attachment etc.) and automatic target health monitoring are other features that rely on the pull model. So if we added a push endpoint and advertised it loudly, we would be worried that many users who don't know 100% what they are doing would shoot themselves in the foot, thinking that Prometheus supports "push" now.

However, I started a rough discussion doc a long while back about the pros and cons of adding push support to Prometheus: https://docs.google.com/document/d/1H47v7WfyKkSLMrR8_iku6u9VB73WrVzBHb2SB6dL9_g/edit#heading=h.2v27snv0lsur

The consensus at the last dev summit was to add an item to the Prometheus roadmap to only allow backfilling of entire time series (not appending individual samples like via remote write): https://prometheus.io/docs/introduction/roadmap/#backfill-time-series

And yeah, in the originally described case, federation is the usually adopted architecture, which does require making the individual Prometheus servers reachable from the central one.
I need to qualify that comment: federation is not recommended for transferring all data from one Prometheus server to another, as the source TSDB is not optimized for that, nor is it resource-efficient to do single humongous scrapes. It's meant more for transferring a select set of aggregated series, in the thousands rather than the millions. So unless the data that needs to be federated into the global Prometheus server is small-ish, there is indeed no great solution for this at the moment with Prometheus. Mind you, remote write is super inefficient resource-wise too, though.

Maybe you are rather looking for something like Thanos, which gives you a global view and long-term storage, while also being efficient at transferring large amounts of data (much more so than remote write or federation).
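For concreteness, "a select set of aggregated series" typically means a federation job whose `match[]` selector only pulls recording-rule outputs rather than raw data. A hedged sketch, assuming a `job:`-style recording-rule naming convention and a placeholder leaf address:

```yaml
# Sketch: federate only pre-aggregated series, not the full raw data set.
scrape_configs:
  - job_name: 'federate-aggregates'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"job:.*"}'   # only recording-rule outputs (assumed naming convention)
    static_configs:
      - targets:
          - 'leaf-prometheus:9090' # placeholder
```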
To sum up: if you use Prometheus to push its metrics via remote write to another Prometheus (not possible, but requested in this issue) or some LTS storage, this is still the pull model, because what matters is how the data were collected, right? (: So if I understand this right, the main blocker for adding this is avoiding abuse, i.e. pushing arbitrary samples from other, non-pull collectors? (which is a fair point) I'm asking because we are considering adding a remote write receiver endpoint to the Thanos system: improbable-eng/thanos#659, and we need to educate users on which one to choose: sidecar + query for fresh data, or remote write.
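For context, the leaf-side configuration for such a receiver would presumably just be a standard `remote_write` block pointed at it. A sketch under the assumption that the Thanos receiver from improbable-eng/thanos#659 exposes an HTTP remote-write endpoint; the hostname, port, and path below are placeholders, not a confirmed API:

```yaml
# Leaf Prometheus (sketch): push all scraped samples to a remote-write receiver.
remote_write:
  - url: "http://thanos-receive.example.com:19291/api/v1/receive"  # placeholder endpoint
```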
@bwplotka This depends on your perspective. From the receiving Prometheus's perspective, the collection happens as a push, so it is not a pull model from that angle. It cannot reliably do the things that Prometheus normally does with the data (attach target metadata from its SD perspective, compute rules in the confidence that timestamps will arrive in lockstep with its current time, know what data should be coming in, etc.). You are then effectively treating it mainly as a TSDB, not as a monitoring system first. For Thanos this might be fine, as Thanos aims mainly to be a long-term storage TSDB and not a monitoring system. For Prometheus, the opinion of the devs so far has been not to go in this direction.
Yup, totally agree.
Why? In my experience it works quite well for a Prometheus instance scraping 50K-100K samples/sec. Perhaps it needs to be optimized somehow for Prometheus instances scraping millions of samples per second?
@valyala On the sending side, remote write totally blows up memory usage (or at least it used to; I'm not sure about the current state of optimization), because normally sample ingestion is a very highly memory-tuned process that avoids allocations as much as possible, appends samples into the TSDB using internal TSDB series IDs when possible (rather than storing full label sets for every sample), etc. For remote write, every sample has to be re-encoded, buffered, and then sent over the wire in its fully expanded form, in protobuf format, in near real-time (instead of in larger compressed batches). In contrast, Thanos just ships completed (and very well-compressed) on-disk blocks, which requires much less memory and other resources.

I'm actually glad to hear that remote write still works well for you at 50K-100K samples/sec. I'd be curious to hear how much the same Prometheus server would use if you turned remote write off.
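For reference, the buffering and batching behavior described above is what `remote_write`'s `queue_config` block tunes on the sending side. A sketch with illustrative values (not recommendations) and a placeholder URL, showing the knobs that trade memory for throughput:

```yaml
# Sketch: sender-side remote_write queue tuning. Values are illustrative only.
remote_write:
  - url: "http://central-prometheus.example.com:9090/api/v1/write"  # placeholder URL
    queue_config:
      capacity: 10000             # pending samples buffered per shard
      max_shards: 30              # upper bound on parallel sending shards
      max_samples_per_send: 500   # batch size per remote-write request
      batch_send_deadline: 5s     # flush a partial batch after this long
```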
valyala commented Oct 22, 2018
Proposal
Sometimes it is useful to have a single big Prometheus instance with all the data from other Prometheus instances located in different networks / datacenters.
Solution
The most straightforward solution for this use case is to expose the `/api/v1/write` endpoint for the remote_write storage API on the big Prometheus instance, so other instances could write to it using the remote storage protocol.
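A sketch of what the leaf-side configuration would look like if the proposed endpoint were exposed; the hostname is a placeholder, and the big instance would still have to explicitly enable receiving, which is the subject of this proposal:

```yaml
# Leaf Prometheus (sketch): forward samples to the big instance via the
# proposed /api/v1/write receiver endpoint.
remote_write:
  - url: "http://big-prometheus.example.com:9090/api/v1/write"
```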