
Remote storage #10

Closed
juliusv opened this issue Jan 4, 2013 · 170 comments

Comments

@juliusv (Member) commented Jan 4, 2013

Prometheus needs to be able to interface with a remote and scalable data store for long-term storage/retrieval.

bernerdschaefer added a commit that referenced this issue Apr 9, 2014
This ensures that these files are properly included only in testing.

[Fixes #10]
@johann8384 commented Feb 5, 2015

Is there anyone planning to work on this? Is the work done in the opentsdb-integration branch still valid or has the rest of the code-base moved past that?

@beorn7 (Member) commented Feb 5, 2015

The opentsdb-integration branch is indeed completely outdated (still using the old storage backend etc.). Personally, I'm a great fan of the OpenTSDB integration, but where I work, there is not an urgent enough requirement to justify a high priority from my side...

@juliusv (Member, Author) commented Feb 5, 2015

To be clear, the outdated "opentsdb-integration" branch was only for the proof-of-concept read-back support (querying OpenTSDB through Prometheus).

Writing into OpenTSDB should be experimentally supported in master, but the last time we tried it was a year ago on a single-node OpenTSDB.

You initially asked on #10:

"I added the storage.remote.url command line flag, but as far as I can tell Prometheus doesn't attempt to store any metrics there."

A couple of questions:

  • did you enable the OpenTSDB option "tsd.core.auto_create_metrics"?
    Otherwise OpenTSDB won't auto-create metrics for you, as the option is
    false by default. See
    http://opentsdb.net/docs/build/html/user_guide/configuration.html
  • if you run Prometheus with -logtostderr, do you see any relevant log
    output? If there is an error sending samples to TSDB, it should be logged
    (glog.Warningf("error sending %d samples to TSDB: %s", len(s), err))
  • Prometheus also exports metrics itself about sending to OpenTSDB. On
    /metrics of your Prometheus server, you should find the counter metrics
    "prometheus_remote_storage_sent_errors_total" and
    "prometheus_remote_storage_sent_samples_total". What do these say?

Cheers,
Julius


@sammcj (Contributor) commented Feb 11, 2015

I cannot +1 this enough

@mwitkow (Contributor) commented Mar 5, 2015

Is InfluxDB on the cards in any way? :)

@beorn7 (Member) commented Mar 5, 2015

Radio Yerevan: "In principle yes." (Please forgive that Eastern European digression... ;)

@mwitkow (Contributor) commented Mar 5, 2015

:D That was slightly before my time ;)

@juliusv (Member, Author) commented Mar 5, 2015

See also: https://twitter.com/juliusvolz/status/569509228462931968

We're just waiting for InfluxDB 0.9.0, which has a new data model which
should be more compatible with Prometheus's.


@pires commented Apr 7, 2015

We're just waiting for InfluxDB 0.9.0, which has a new data model which
should be more compatible with Prometheus's.

Can I say awesome more than once? Awesome!

@fabxc (Member) commented Apr 7, 2015

Unfortunately, @juliusv ran some tests with 0.9 and InfluxDB consumed 14x more storage than Prometheus.

Previously the overhead was 11x, but Prometheus has reduced its storage size significantly since then, so in relative terms InfluxDB has apparently improved in that regard. Nonetheless, InfluxDB has not yet turned out to be the definitive answer for long-term storage.

@beorn7 (Member) commented Apr 7, 2015

At least experimental write support is in master as of today, so anybody can play with InfluxDB receiving Prometheus metrics. Quite possibly somebody will find the reason for the blow-up in storage space, and everything will be unicorns and rainbows in the end...

@pires commented Apr 7, 2015

@beorn7 that's great. TBH I'm not concerned about disk space; it's the cheapest resource in the cloud, after all. Not to mention, I'm expecting to hold data with a very small TTL, i.e. a few weeks.

@beorn7 (Member) commented Apr 7, 2015

@pires In that case, why not just run two identically configured Prometheis with a reasonably large disk?
A few weeks or months is usually fine as a retention time for Prometheus. (The default is 15d for a reason... :) The only problem is that if your disk breaks, your data is gone; but for that, you have the other server.

@fabxc (Member) commented Apr 7, 2015

@pires do you have a particular reason to hold the data in another database for that time? "A few weeks" does not seem to require a long-term storage solution. Prometheus's default retention time is 15 days - increasing that to 30 or even 60 days should not be a problem.

@pires commented Apr 7, 2015

@beorn7 @fabxc I am currently using a proprietary & very specific solution that writes monitoring metrics into InfluxDB. This can eventually be replaced with Prometheus.

Thing is, I have some tailored apps that read metrics from InfluxDB in order to reactively scale up/down, which would need to be rewritten to read from Prometheus instead. Also, I use continuous queries. Does Prometheus offer such a feature?

@brian-brazil (Member) commented Apr 7, 2015

http://prometheus.io/docs/querying/rules/#recording-rules are the equivalent of InfluxDB's continuous queries.
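For illustration, a recording rule in the rule-file syntax of that era looked roughly like this (the rule and metric names here are invented); Prometheus evaluates the expression continuously and stores the result as a new series, much like a continuous query:

```
# Hypothetical recording rule (pre-2.0 rule-file syntax):
job:http_requests:rate5m = sum(rate(http_requests_total[5m])) by (job)
```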

@dever860 commented Jul 1, 2015

+1

1 similar comment
@drawks commented Jul 31, 2015

👍

@fabxc fabxc removed this from the Small Scale Mission Critical Monitoring Use Cases milestone Sep 21, 2015
@blysik commented Oct 8, 2015

How does remote storage as currently implemented interact with PromDash or grafana?

I have a use case where I want to run Prometheus in a 'heroku-like' environment, where the instances could conceivably go away at any time.

Then I would configure a remote, traditional influxdb cluster to store data in.

Could this configuration function normally?

@matthiasr (Contributor) commented Oct 9, 2015

This depends on your definition of "normally", but mostly, no.

Remote storage, as it currently stands, is write-only; from Prometheus you would only get what it has locally.

To get at older data, you need to query OpenTSDB or InfluxDB directly, using their own interfaces and query languages. With PromDash you're out of luck in that regard; AFAIK Grafana knows all of them.

You could build your dashboards fully based on querying them and leave Prometheus to be a collection and rule evaluation engine, but you would miss out on its query language for ad hoc drilldowns over extended time spans.

@matthiasr (Contributor) commented Oct 9, 2015

Also note that both InfluxDB and OpenTSDB support are somewhat experimental, under-exercised on our side, and in flux.

@mattkanwisher (Contributor) commented Oct 21, 2015

We're kicking around the idea of a flat file exporter, so we can start storing long-term data now and then, once the bulk import issue (#535) is done, use that. Would you be open to a PR around this?

@juliusv (Member, Author) commented Oct 21, 2015

For #535 take a look at my way-outdated branch import-api, where I once added an import API as a proof of concept: https://github.com/prometheus/prometheus/commits/import-api. It's from March, so it doesn't apply to master anymore, but it shows that in principle adding such an API using the existing transfer formats would be trivial.

We just need to agree that we want this (it's a contentious issue, /cc @brian-brazil) and whether it should use the same sample transfer format as we use for scraping. The issue with this transfer format is that it's optimized for the many-series-one-sample (scrape) case, while with batch imports you often care more about importing all samples of a series at once, without having to repeat the metric name and labels for each sample (massive overhead). But maybe we don't care about efficiency in the (rare?) bulk import case, so the existing format could be fine.
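The overhead point can be illustrated with a toy comparison of the two encodings; both JSON shapes below are invented purely for illustration, not any format Prometheus actually uses:

```python
import json

# One series, 100 samples, 15s apart. Label set and timestamps are made up.
metric = {"__name__": "http_requests_total", "job": "api", "instance": "a:9090"}
samples = [(1400000000 + i * 15, float(i)) for i in range(100)]

# Scrape-style: the full label set is repeated for every sample.
scrape_style = json.dumps(
    [{"metric": metric, "timestamp": t, "value": v} for t, v in samples])

# Batch-style: labels appear once, followed by all samples of the series.
batch_style = json.dumps({"metric": metric, "samples": samples})

print(len(scrape_style), len(batch_style))
```

Running this shows the batch-style encoding is substantially smaller, which is the "massive overhead" being described.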

For the remote storage part, there was this discussion
https://groups.google.com/forum/#!searchin/prometheus-developers/json/prometheus-developers/QsjXwQDLHxI/Cw0YWmevAgAJ about decoupling the remote storage in some generic way, but some details haven't been resolved yet. The basic idea was that Prometheus could send all samples in some well-defined format (JSON, protobuf, or whatever) to a user-specified endpoint which could then do anything it wants with it (write it to a file, send it to another system, etc.).

So it might be ok to add a flat file exporter as a remote storage backend directly to Prometheus, or resolve that discussion above and use said well-defined transfer format and an external daemon.
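A decoupled endpoint of the kind described above could be sketched as follows. The payload shape, port, and output file name are all assumptions, since the well-defined format was still undecided at this point:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def samples_to_lines(payload):
    """Render decoded samples as one flat-file line each: name{labels} value ts."""
    lines = []
    for s in payload:
        metric = dict(s["metric"])
        name = metric.pop("__name__", "")
        labels = ",".join('%s="%s"' % kv for kv in sorted(metric.items()))
        lines.append("%s{%s} %s %d" % (name, labels, s["value"], s["timestamp"]))
    return lines

class WriteHandler(BaseHTTPRequestHandler):
    """Accept POSTed JSON samples and append them to a flat file."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        with open("samples.txt", "a") as f:
            for line in samples_to_lines(json.loads(body)):
                f.write(line + "\n")
        self.send_response(200)
        self.end_headers()

# To run: HTTPServer(("", 9201), WriteHandler).serve_forever()
```

Prometheus (or any producer) would then POST samples to the endpoint, and the daemon could just as easily forward them to another system instead of a file.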

@brian-brazil (Member) commented Oct 21, 2015

I think for flat file we'd be talking the external daemon, as it's not something we can ever read back from.

@mattkanwisher (Contributor) commented Oct 26, 2015

So the more I think about it, it would be nice to have this /import-api (a raw-data API), so we can have backup nodes mirroring the data from the primary Prometheus. Would there be appetite for a PR for this, plus the corresponding piece inside Prometheus to import the data, so you can essentially have read slaves?

@brian-brazil (Member) commented Oct 26, 2015

For that use case we generally recommend running multiple identical Prometheus servers. Remote storage is about long term data, not redundancy or scaling.

@mattkanwisher (Contributor) commented Oct 26, 2015

I think running multiple scrapers is not a good solution because the data won't match, and there is no way to backfill data. So we have an issue where I need to spin up some redundant nodes and now they are missing a month of data. If you had an API to import raw data, you could at least catch them up. The same interface could also be used for backups.

@brian-brazil (Member) commented Oct 26, 2015

So we have issue where I need to spin up some redundant nodes and now they are missing a month of data. If you have an api to raw import the data you could at least catch them up. Also the same interface could be used for backups

This is the use case for remote storage: you pull the older data from remote storage rather than depending on Prometheus being stateful. Similarly, in such a setup there's no need for backups, as Prometheus doesn't have any notable state.

@juliusv (Member, Author) commented Mar 10, 2017

@brian-brazil Oh yeah, I have multiple vector selector sets, but great point about different offsets!

@pilhuhn commented Mar 10, 2017

A simple static duration is not sufficient, as the remote storage may not be caught up that far yet or Prometheus may have retention going further back. I think this is something we'll have to figure out

I don't think Prometheus having retention going further back is really an issue here, as long as the remote can (already) provide the data. Worst case is with downsampling that you lose granularity.

@juliusv (Member, Author) commented Mar 10, 2017

@pilhuhn I meant it the other way around: if you have a Prometheus retention of 15d and you query only data older than 15d from the remote storage, it doesn't necessarily mean that Prometheus will already have all data younger than 15d (due to storage wipe or whatever).

Well, for a first iteration we're just going to query all time ranges from everywhere.

@juliusv juliusv mentioned this issue Mar 15, 2017
@juliusv (Member, Author) commented Mar 15, 2017

There's a WIP PR for the remote read integration here for anyone who would like to take a look early: #2499

@ghost commented Apr 15, 2017

I'm trying to use the remote_storage_adapter to send metrics from prometheus to opentsdb. But I'm getting these errors in the logs:

WARN[0065] cannot send value NaN to OpenTSDB, skipping sample &model.Sample{Metric:model.Metric{"instance":"localhost:9090", "job":"prometheus", "monitor":"codelab-monitor", "location":"archived", "quantile":"0.5", "__name__":"prometheus_local_storage_maintain_series_duration_seconds"}, Value:NaN, Timestamp:1492267735191}  source=client.go:78
WARN[0065] Error sending samples to remote storage       err=invalid character 'p' after top-level value num_samples=100 source=main.go:281 storage=opentsdb

I've also tried using influxdb instead of opentsdb, with similar results:

DEBU[0001] cannot send value NaN to InfluxDB, skipping sample &model.Sample{Metric:model.Metric{"job":"prometheus", "instance":"localhost:9090", "scrape_job":"ns1-web-pinger", "quantile":"0.99", "__name__":"prometheus_target_sync_length_seconds", "monitor":"codelab-monitor"}, Value:NaN, Timestamp:1492268550191}  source=client.go:76

Here's how I'm starting the remote_storage_adapter:

# this is just for influxdb, i make the appropriate changes if trying to use opentsdb
./remote_storage_adapter -influxdb-url=http://138.197.107.211:8086 -influxdb.database=prometheus -influxdb.retention-policy=autogen -log.level debug

Here's the Prometheus config:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

remote_write:
  url: "http://localhost:9201/write"

Is there something I'm misunderstanding about how to configure the remote_storage_adapter?

@juliusv (Member, Author) commented Apr 15, 2017

@tjboring Neither OpenTSDB nor InfluxDB support float64 NaN (not a number) values, so these samples are skipped when sending samples to them. We have mentioned this problem to InfluxDB, and if we're lucky, they will support NaN values sometime in the future, or maybe we can find another workaround.

OpenTSDB issue: OpenTSDB/opentsdb#183
InfluxDB issue: influxdata/influxdb#4089

I am not sure where the invalid character 'p' after top-level value error comes from though.
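The skipping behaviour described above can be sketched as follows; backends without NaN support get only the finite samples. The sample shape is illustrative, not the adapter's actual data structure:

```python
import math

def partition_samples(samples):
    """Keep samples with a real value; count the NaN ones that get skipped."""
    kept = [s for s in samples if not math.isnan(s["value"])]
    skipped = len(samples) - len(kept)
    return kept, skipped

samples = [{"value": 1.0}, {"value": float("nan")}, {"value": 0.5}]
kept, skipped = partition_samples(samples)
print(len(kept), skipped)
```

Each skipped sample corresponds to one of the warning lines in the logs above.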

@ghost commented Apr 15, 2017

@juliusv Thanks for the pointers to the opentsdb/influxdb issues. I was just seeing the error messages on the console and thought nothing was being written, not realizing those are just samples that are being skipped. I've since confirmed that samples are indeed making it to the remote storage db. :)

@mattbostock (Contributor) commented Apr 17, 2017

Now that remote read and write APIs are in place (albeit experimental), should this issue be closed in favour of raising more specific issues as they arise?

https://prometheus.io/docs/operating/configuration/#<remote_write>
https://prometheus.io/docs/operating/configuration/#<remote_read>
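For reference, mirroring the form used earlier in this thread, a minimal configuration exercising both APIs might look roughly like this (URLs are placeholders; check the linked docs for the exact syntax of your version):

```yaml
remote_write:
  url: "http://localhost:9201/write"
remote_read:
  url: "http://localhost:9201/read"
```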

@prasenforu commented Apr 21, 2017

Has anybody tried this with a container? Please paste a Dockerfile.

I am not able to find the "remote_storage_adapter" executable in the "prom/prometheus" Docker image, version 1.6:

/prometheus # find / -name remote_storage_adapter
/prometheus #

Please help.

@sorrowless commented Apr 21, 2017

@prasenforu I have built a docker image with remote_storage_adapter from current master code: gra2f/remote_storage_adapter, feel free to use it.

@juliusv I have a problem similar to @tjboring's:

time="2017-04-21T17:45:00Z" level=warning msg="cannot send value NaN to Graphite,skipping sample &model.Sample{Metric:model.Metric{"name":"prometheus_target_sync_length_seconds", "monitor":"codelab-monitor", "job":"prometheus", "instance":"localhost:9090", "scrape_job":"prometheus", "quantile":"0.9"}, Value:NaN, Timestamp:1492796695772}" source="client.go:90"

but I am using Graphite. Is it okay?

@ghost commented Apr 21, 2017

@sorrowless

Do you see other metrics in Graphite that you know came from Prometheus?

In my case I verified this by connecting to the Influxdb server I was using, and running a query. It gave me back metrics, which confirmed that Prometheus was indeed writing metrics; it's just that some were being skipped, per the log message.

@sorrowless commented Apr 21, 2017

@tjboring yes, I can see some of the metrics in Graphite, and what's stranger to me is that I cannot understand why some are there and some are not. For example, sy and us per CPU are stored into Graphite, but load average is not.

@prasenforu commented Apr 22, 2017

@sorrowless

I am not able to find the image; can you please share the URL?

Thanks in advance.

@sorrowless commented Apr 22, 2017

@prasenforu just run
$ docker pull gra2f/remote_storage_adapter
in your command line, that's all you need

@prasenforu commented Apr 22, 2017

@sorrowless

Thanks.

@juliusv (Member, Author) commented Apr 24, 2017

@mattbostock As you suggested, I'm closing this issue. We should open more specific remote-storage related issues in the future.

Further usage questions are best asked on our mailing lists or IRC (https://prometheus.io/community/).

@juliusv juliusv closed this Apr 24, 2017
@prasenforu commented Apr 27, 2017

@sorrowless

I was looking at the image and saw the remote_storage_adapter file in /usr/bin, but the rest of the Prometheus files and volumes are not there:

~ # find / -name remote_storage_adapter
/usr/bin/remote_storage_adapter
~ # find / -name prometheus.yml
~ # find / -name prometheus

Anyway, can you please send me the Dockerfile of "gra2f/remote_storage_adapter"?

@sorrowless commented Apr 30, 2017

@prasenforu
You do not need the main Prometheus executable to use the remote storage adapter; use the prom/prometheus image for that.
As for the Dockerfile: all it does is copy the prebuilt remote_storage_adapter into the image and run it, that's all.
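A minimal Dockerfile along those lines might look like this (the base image and binary path are assumptions, not the actual contents of gra2f/remote_storage_adapter):

```dockerfile
FROM alpine:3.5
COPY remote_storage_adapter /usr/bin/remote_storage_adapter
ENTRYPOINT ["/usr/bin/remote_storage_adapter"]
```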

@gdmelloatpoints commented Aug 16, 2017

If anyone wants to test it out (like I need to), I wrote a small docker-compose based setup to get this up and running locally - https://github.com/gdmello/prometheus-remote-storage.

simonpasquier referenced this issue in simonpasquier/prometheus Oct 12, 2017
Make documentation for absent() not, uhm, absent
cofyc added a commit to cofyc/prometheus that referenced this issue Jun 5, 2018
Revert "Share kubernetes informers in kubernetes discovery to improve performance."
bobmshannon pushed a commit to bobmshannon/prometheus that referenced this issue Nov 19, 2018
@lock (bot) commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019