
Add mechanism to perform bulk imports #535

Open
juliusv opened this Issue Feb 17, 2015 · 63 comments

@juliusv
Member

juliusv commented Feb 17, 2015

Currently the only way to bulk-import data is a hacky one involving client-side timestamps and scrapes with multiple samples per time series. We should offer an API for bulk import. This relies on #481.

EDIT: It probably won't be a web-based API in Prometheus, but a command-line tool.

@juliusv changed the title from "Add feature for bulk imports" to "Add API for bulk imports" on Feb 17, 2015

@RichiH

Member

RichiH commented Nov 17, 2015

Just as a random data point, I have several (I think valid) use cases for bulk imports.

@grypyrg


grypyrg commented Jan 19, 2016

+1

@foic


foic commented Feb 14, 2016

This would be very useful for me as well. I understand that this could blur the line between a true time-series event store and Prometheus's communicated focus on representing recent monitoring state.

It's beneficial in the case where Prometheus is the ingestion point for a fan-out system involving InfluxDB or federated rollup Prometheus nodes: it would let me simply keep pumping all data through the Prometheus entry points without needing two input paths when the data feed is delayed.

@brian-brazil

Member

brian-brazil commented Feb 14, 2016

@foic I can't think of a sane Prometheus setup where that'd be useful. If you want to pump data to InfluxDB, it'd be best to do it without involving Prometheus, as Prometheus adds no value in such a setup. Similarly, rollup depends on continuous pulling, and delayed data can't work with that.

I'd generally advise running duplicates and living with the odd gap. It's not worth the effort to try to get it perfect.

@foic


foic commented Feb 15, 2016

Thanks @brian-brazil - this is pretty much the expected response :-) Sounds like there is too much to change to make all the pieces (Alertmanager, rollup, etc.) work with historical data.

Should this feature request be closed then, if working with historical data is too difficult?


@brian-brazil

Member

brian-brazil commented Feb 15, 2016

@foic What you're requesting is different to what this feature request is mainly about.

@Baughn


Baughn commented Feb 15, 2016

There's some value to bulk import, even in a world where 'storage' isn't the intended purpose of Prometheus. For example...

Recently I've been working on a Prometheus configuration for a certain forum. Although some of the metrics are from PHP, most of the really useful ones are being exported by an nginx logtailer I wrote.

In order to iterate quickly on possible metrics before putting it in production (that hasn't happened yet), I added code to the logtailer that lets it read logs with a time offset, pausing between each record until it's "supposed" to happen. That's okay-ish, but it'd be much nicer if I could bulk import an entire day's worth of logs at once without actually waiting a day. Then I could look at the result, clear the DB, and try again.

There's the timestamp hack, but none of the client libraries support timestamps, and it's ugly anyway. I haven't tried to use it.

@jinxcat


jinxcat commented Feb 15, 2016

@Baughn what do you mean by "timestamp hack"?

I have a use for a bulk import endpoint as well: back-filling data whose normal flow was interrupted or unavailable.

Overall, I feel it might be somewhat outside the intended model of Prometheus, but there will always be people who need to diverge from the ideal setup or situation.

@brian-brazil

Member

brian-brazil commented Feb 15, 2016

back-filling data whose normal flow was interrupted or unavailable.

That's also not what this issue is about. This issue covers brand new data, with nothing newer than it in the database. It's also not backfilling data, which is when there's nothing older than it in the database.

We've never even discussed this variant.

@Baughn


Baughn commented Feb 15, 2016

@Baughn what do you mean by "timestamp hack"?

The /metrics format allows specifying a timestamp in addition to the values. None of the clients support this, and Prometheus doesn't support adding values that are any older than the newest one.

There's a list of caveats as long as your arm, starting with the impossibility of reliably doing this with multiple tasks exporting metrics, but in theory it should be possible to use timestamps to simulate fast-forwarding through historical data, which would cover my specific scenario.

I've never tried it, though.
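
For readers unfamiliar with the syntax Baughn refers to: in the text exposition format, a sample line may carry an optional timestamp (milliseconds since the Unix epoch) after the value. A minimal, hypothetical scrape output illustrating the hack (metric name, labels, and values made up here) might look like this:

    # TYPE nginx_http_requests_total counter
    nginx_http_requests_total{status="200"} 1027 1455494400000
    nginx_http_requests_total{status="200"} 1043 1455494460000

The hack is to advance these client-supplied timestamps (and, per the issue description, expose multiple samples per series) across scrapes so that samples land at historical times; as noted above, Prometheus rejects samples older than the newest one it already has for a series, so the data has to be replayed in order.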

@brian-brazil

Member

brian-brazil commented Oct 26, 2016

Per post-PromCon discussions, the consensus was to have an API that can take in one time series at a time.

@delgod


delgod commented Feb 10, 2017

@brian-brazil thanks!

Currently we need to ask the user to run sysstat every minute, then ask them to dump the sar results to a file and send it over, and then analyze the results manually or via the kSar tool.
If Prometheus implemented importing, it would be very, very useful!

@brian-brazil

Member

brian-brazil commented Feb 10, 2017

That's not something we will support. When we say bulk, we mean bulk.

I'd recommend you look at the node exporter for your needs; it'll produce better stats than sar.

@svetasmirnova


svetasmirnova commented Feb 11, 2017

+1

@ssouris


ssouris commented Feb 24, 2017

+1

@begakens


begakens commented Feb 27, 2017

This would be excellent for my use case. I assumed it was already possible by adding a custom timestamp as outlined on the 'Exposition Formats' page, but I've since realized it doesn't work as expected. I've had to move away from Prometheus for my current project because of this, but I would be very interested in returning to it in the future if this feature were implemented.

@radiophysicist


radiophysicist commented Mar 3, 2017

+1 for loading data based on server logs

@brian-brazil

Member

brian-brazil commented Mar 3, 2017

For logs look at mtail or the grok exporter. This is not suitable for logs.

@radiophysicist


radiophysicist commented Mar 4, 2017

I tried grok and gave up on it because it's impossible to use the actual timestamps from the log data.

@logemann


logemann commented Sep 11, 2018

@parserpro that's nearly the same use case I have. We are thinking about bundling Prometheus/Grafana into our docker-compose product stack. For sales and demo reasons, it's quite necessary to have demo data before any real data enters the system, which will never be the case on a sales rep's demo notebook.

@narciero


narciero commented Sep 19, 2018

What's a realistic ETA for this feature to make it into a release?

@codesome

Member

codesome commented Sep 19, 2018

@narciero
The PR prometheus/tsdb#370 needs to be merged for bulk import. But as it is not a small change to TSDB and has the potential to break things, it will take some time to verify, test, and iterate on possible improvements.

You can safely assume it will be at least 1-2 months (including the time needed to decide on the design of bulk import).

@narciero


narciero commented Sep 19, 2018

understood, thanks for the update!

@krasi-georgiev

Member

krasi-georgiev commented Nov 9, 2018

@codesome why do you need prometheus/tsdb#370 for the bulk import?
Quickly reading through the comments, this relates to bulk imports into a new Prometheus server without any data in it, so it will import data in order (ordered timestamps and nothing in the past), which should be possible even with the current tsdb package.

@codesome

Member

codesome commented Nov 9, 2018

If we are allowing bulk import, I think we need to support every case and not only empty storage. We need prometheus/tsdb#370 to allow importing any time range.

But yes, valid point. We can do bulk import even with the current tsdb package, but we need to implement that part in prometheus/prometheus. I would like to do it after the above-mentioned PR so that the import is seamless.

@guvenim


guvenim commented Dec 26, 2018

Hi All,

I am new to Prometheus. I have read @brian-brazil's article on Safari, and I thought this post might be a good place to ask my question.

I have some sensor data with timestamps and other features (location etc.), and I would like to insert this data into Prometheus using the Python API and then connect it to Grafana for visualization. It might be overkill, but since I already run Prometheus as a Docker container, I thought I could use it as a DB to store the data. Can I do that, or do you advise setting up another DB to store the data and connecting that to Grafana?

I saw @thypon's answer but, unfortunately, I don't know Go.

Sincerely

Guven

@krasi-georgiev

Member

krasi-georgiev commented Dec 26, 2018

We use GitHub for bug reports and feature requests, so I suggest you move this to our user mailing list.

If you haven't looked already, you might find your answer in the official docs and examples or by searching in the users or devs groups.
The #prometheus IRC channel is also a great place to mix with the community and ask questions (don't be afraid to answer a few while waiting).

@jomach


jomach commented Feb 6, 2019

Any news on this? We want to migrate data from OpenTSDB into Prometheus, and it would be nice to have a way to import the old data.

@krasi-georgiev

Member

krasi-georgiev commented Feb 6, 2019

We are actively working on prometheus/tsdb#370, and once it is implemented in Prometheus you could take blocks from another Prometheus server and just drop them in the data folder; it will all be handled when querying, and the blocks will be merged at the next automated compaction trigger.

No strict ETA, but it looks like we might be able to add this to 2.8, which should be out in about a month.

@calebtote


calebtote commented Feb 6, 2019

@krasi-georgiev Just to be clear on your comment, is the expectation that you have to import from another Prometheus instance (but we still couldn't script our own bulk imports with epoch:data from other tsdbs)?

@codesome

Member

codesome commented Feb 6, 2019

Yes, after prometheus/tsdb#370 is merged, I will be jumping directly into implementing bulk import.

@calebtote No, I think bulk import would support importing from non-Prometheus sources too. The design has not been decided yet.

@krasi-georgiev

Member

krasi-georgiev commented Feb 6, 2019

Ah yeah, I missed that part about OpenTSDB. I don't think we have done anything in this direction yet.

@jomach


jomach commented Feb 7, 2019

@codesome I'm with @calebtote here. In our use case we have metrics on the OpenTSDB server that we want to migrate to Prometheus without losing metrics.

@cesarcabral


cesarcabral commented Feb 7, 2019

Hi, I'm not sure whether what I'm trying to do is supported yet.
In Jenkins we use folders as teams/projects and subfolders as subprojects, and all the jobs live inside the subfolders.
I have a Python script which summarizes the build status (SUCCESS, FAILURE, UNSTABLE) for all the jobs belonging to a team (root folder). That is done, but now I also want to collect the timestamp of those metrics, because I want to be able to see the build status of each project over a selected period (yearly, monthly, weekly, daily).
Is that possible? I'm publishing all the metrics as gauge values.

This is an example of what I'm publishing for Prometheus:

    # HELP Jenkins_metrics_project_team01 Metrics by project
    # TYPE Jenkins_metrics_project_team01 gauge
    Jenkins_metrics_project_team01{status="success"} 13.0
    Jenkins_metrics_project_team01{status="failures"} 22.0
    Jenkins_metrics_project_team01{status="unstable"} 10.0
    # HELP Jenkins_metrics_project_team02 Metrics by project
    # TYPE Jenkins_metrics_project_team02 gauge
    Jenkins_metrics_project_team02{status="success"} 0.0
    Jenkins_metrics_project_team02{status="failures"} 0.0
    Jenkins_metrics_project_team02{status="unstable"} 0.0

@juliusv

Member Author

juliusv commented Feb 7, 2019

@cesarcabral Runs of batch jobs like that are usually tracked by encoding a Unix timestamp into the sample value (rather than the sample timestamp), see e.g. https://www.digitalocean.com/community/tutorials/how-to-query-prometheus-on-ubuntu-14-04-part-2#step-4-%E2%80%94-working-with-timestamp-metrics.
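
To make that pattern concrete (the metric name here is hypothetical, chosen to fit the Jenkins example above): the job exposes the completion time as the gauge's value rather than as a sample timestamp, for example:

    # HELP jenkins_team01_last_success_timestamp_seconds Unix time of the last successful build.
    # TYPE jenkins_team01_last_success_timestamp_seconds gauge
    jenkins_team01_last_success_timestamp_seconds 1549497600

A query such as time() - jenkins_team01_last_success_timestamp_seconds then gives the age of the last success in seconds, which can be graphed or alerted on without needing to backfill historical samples.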

@cesarcabral


cesarcabral commented Feb 15, 2019

Thank you Julius, nice articles by the way.

@valyala referenced this issue on Feb 28, 2019: Import/Export #6 (Open)

@roidelapluie

Contributor

roidelapluie commented Mar 3, 2019

Note: prometheus/tsdb#370 is merged, and there is #5292 to include it in Prometheus.

@codesome

Member

codesome commented Mar 3, 2019

And with this project in GSoC, we can expect to have a better user-facing package for easy imports from other monitoring stacks.

@MarkusTeufelberger


MarkusTeufelberger commented Apr 5, 2019

So after some student implements this feature this(?) summer, is it reasonable to expect something along the lines of an /api/v1/admin/tsdb/insert_samples API call that I can call with a mapping of values like {timestamp1: value1, timestamp2: value2, ...}, a series name, and a mapping of tags like {tag_name1: tag_value1, tag_name2: tag_value2, ...}?

@brian-brazil

Member

brian-brazil commented Apr 5, 2019

No, I'd expect this to be more of a command-line thing, as it's messing with blocks.

@MarkusTeufelberger


MarkusTeufelberger commented Apr 5, 2019

Ok, then no API call... but in general, the use case of "I have some timestamped values and want to insert them into Prometheus as a certain series with the following tags; I do something (API call, command-line call, ...) and then I have them available in Prometheus" is what this student should code? I'm asking because "Package for bulk imports" sounds to me like building yet another building block to add support for bulk imports, while still not enabling users to actually do bulk imports.

As a side note, I'd also like to point out that for 4 years the name of this issue has been "Add API for bulk imports"...

@juliusv changed the title from "Add API for bulk imports" to "Add mechanism to perform for bulk imports" on Apr 5, 2019

@juliusv

Member Author

juliusv commented Apr 5, 2019

@MarkusTeufelberger Good point, I renamed the issue to "Add mechanism to perform bulk imports".

@juliusv changed the title from "Add mechanism to perform for bulk imports" to "Add mechanism to perform bulk imports" on Apr 5, 2019

@krasi-georgiev

Member

krasi-georgiev commented Apr 5, 2019

I imagine the CLI will just create a new block, which you can then add to the data dir of the Prometheus server where you want to bulk import.
The data will be available after a Prometheus restart, and after the first compaction the overlapping blocks will be merged, removing the duplicated data.
