Add mechanism to perform bulk imports #535
Comments
juliusv
added
the
feature-request
label
Feb 17, 2015
juliusv
changed the title
Add feature for bulk imports
Add API for bulk imports
Feb 17, 2015
|
Just as a random data point, I have several (I think valid) use cases for bulk imports. |
beorn7
referenced this issue
Jan 14, 2016
Closed
Note the required ordering of lines with timestamps #288
grypyrg
commented
Jan 19, 2016
|
+1 |
foic
commented
Feb 14, 2016
|
This would be very useful for me as well. I understand that this could be used to blur the line between a true time-series event store and Prometheus' communicated focus as a representation of recent monitoring state. It's beneficial in the case where Prometheus is the ingestion point for a fan-out system involving influxdb or federated rollup Prometheus nodes: this would allow me to simply keep pumping all data through the Prometheus entry points without needing two input paths when the data feed is delayed. |
|
@foic I can't think of a sane Prometheus setup where that'd be useful. If you want to pump data to influxdb, it'd be best to do it without involving Prometheus, as Prometheus adds no value in such a setup. Similarly, rollup depends on continuous pulling, and delayed data can't work with that. I'd generally advise running duplicates and living with the odd gap. It's not worth the effort to try and get it perfect. |
foic
commented
Feb 15, 2016
|
Thanks @brian-brazil - this is pretty much an expected response :-) Sounds like there is Should this feature request be closed then if working with historical data On 15 February 2016 at 10:02, Brian Brazil notifications@github.com wrote:
|
|
@foic What you're requesting is different to what this feature request is mainly about. |
Baughn
commented
Feb 15, 2016
|
There's some value to bulk import, even in a world where 'storage' isn't the intended purpose of Prometheus. For example... Recently I've been working on a Prometheus configuration for a certain forum. Although some of the metrics are from PHP, most of the really useful ones are being exported by an nginx logtailer I wrote. In order to quickly iterate on possible metrics, prior to putting it in production--that hasn't happened yet--I added code to the logtailer that lets it read logs with a time-offset, pausing between each record until it's "supposed" to happen. That's okay-ish, but it'd be much nicer if I could bulk import an entire day's worth of logs at once without actually waiting a day. Then I could look at the result, clear the DB, and try again. There's the timestamp hack, but none of the client libraries support timestamps, and it's ugly anyway. I haven't tried to use it. |
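The time-offset replay described above might look roughly like the following. This is only a sketch of the idea, not the actual logtailer: the function name `replay_delays` and the `speedup` parameter are made up for illustration, and it assumes the log timestamps have already been parsed into `datetime` objects in oldest-first order.

```python
from datetime import datetime


def replay_delays(timestamps, speedup=1.0):
    """Yield how many seconds to pause before emitting each record.

    `timestamps` is an iterable of datetime objects, oldest first.
    `speedup` > 1 compresses the gaps (hypothetical knob, not from the thread).
    """
    previous = None
    for ts in timestamps:
        if previous is None:
            gap = 0.0  # the first record is emitted immediately
        else:
            # Never wait a negative amount if two records are out of order.
            gap = max(0.0, (ts - previous).total_seconds()) / speedup
        previous = ts
        yield gap


if __name__ == "__main__":
    records = [
        datetime(2016, 2, 15, 10, 0, 0),
        datetime(2016, 2, 15, 10, 0, 10),
        datetime(2016, 2, 15, 10, 0, 40),
    ]
    for delay in replay_delays(records, speedup=10):
        print(f"wait {delay:.1f}s before the next record")
```

A caller would `time.sleep(delay)` between records. Even with a large speedup this still replays in wall-clock time, which is exactly the limitation a real bulk import would remove.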
jinxcat
commented
Feb 15, 2016
|
@Baughn what do you mean by "timestamp hack"? I have use for a bulk import endpoint as well, and that's for back-filling data that was interrupted/unavailable on the normal time flow. Overall, I feel it might be somewhat on the border of what is the intended model of prometheus, but there will always be people with the need to diverge from the ideal setup or situation. |
That's also not what this issue is about. This issue covers brand new data, with nothing newer than it in the database. It's also not backfilling data, which is when there's nothing older than it in the database. We've never even discussed this variant. |
Baughn
commented
Feb 15, 2016
The /metrics format allows specifying a timestamp in addition to the values. None of the clients support this, and Prometheus doesn't support adding values that are any older than the newest one. There's a list of caveats as long as your arm, starting with the impossibility of reliably doing this with multiple tasks exporting metrics, but in theory it should be possible to use timestamps to simulate fast-forwarding through historical data, which would cover my specific scenario. I've never tried it, though. |
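For reference, a sample line with an explicit timestamp in the text exposition format could be rendered like this. It is a minimal sketch: `format_sample` and `render_series` are illustrative helper names, not part of any client library, label values are assumed to need no escaping, and the metric name is just an example. The one detail taken from the format itself is that the optional timestamp is in milliseconds since the epoch.

```python
def format_sample(name, labels, value, timestamp_s):
    """Render one sample line with an explicit timestamp."""
    # Assumes label values contain no quotes, backslashes or newlines.
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    selector = f"{name}{{{label_str}}}" if label_str else name
    # The text format takes the timestamp as integer milliseconds.
    return f"{selector} {value} {int(timestamp_s * 1000)}"


def render_series(name, labels, points):
    """Render (timestamp_s, value) pairs oldest first.

    Sorting matters because samples older than the newest one
    already stored for a series are not accepted.
    """
    lines = [format_sample(name, labels, value, ts) for ts, value in sorted(points)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    points = [(1455530460, 12.0), (1455530400, 7.0)]
    print(render_series("http_requests_total", {"code": "200"}, points), end="")
```

This only produces the text; whether a scrape containing several timestamped samples per series is ingested as hoped is exactly the untested part of the hack.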
fabxc
added
kind/enhancement
and removed
feature request
labels
Apr 28, 2016
|
Per post-Promcon discussions, the consensus was to have an API that can take in a time series at a time. |
delgod
commented
Feb 10, 2017
|
@brian-brazil thanks! Currently we need to ask the user to run sysstat every minute, then ask them to dump the sar results to a file and send it, and then analyze the results manually or with the kSar tool. |
|
That's not something we will support. When we said bulk we mean bulk. I'd recommend you look at the node exporter for your needs, it'll produce better stats than sar. |
svetasmirnova
commented
Feb 11, 2017
|
+1 |
ssouris
commented
Feb 24, 2017
|
+1 |
begakens
commented
Feb 27, 2017
|
This would be excellent for my use case. I assumed it was already possible by adding a custom time stamp as outlined in the 'Exposition Formats' page but I've since realized it doesn't work as expected. I've had to move away from Prometheus for my current project because of this but would be very interested in returning to use it in the future if this feature was implemented. |
radiophysicist
commented
Mar 3, 2017
•
|
+1 for loading data based on server logs |
|
For logs look at mtail or the grok exporter. This is not suitable for logs. |
radiophysicist
commented
Mar 4, 2017
|
I tried grok and gave up on it because it's impossible to use the actual timestamps from the log data. |
logemann
commented
Sep 11, 2018
|
@parserpro that's nearly the same use case I have. We are thinking about bundling Prometheus/Grafana into our docker-compose product stack. For sales and demo reasons, it's quite necessary to have demo data before any real data enters the system, which will never be the case on a sales rep's demo notebook. |
narciero
commented
Sep 19, 2018
|
What's a realistic ETA for this feature to make it into a release? |
|
@narciero You can safely assume that it will be at least 1-2 months (including the time for deciding on design of bulk import). |
narciero
commented
Sep 19, 2018
|
understood, thanks for the update! |
|
@codesome why do you need prometheus/tsdb#370 for the bulk import? |
|
If we are allowing bulk import, I think we need to support every case and not only empty storage. We need prometheus/tsdb#370 to allow import of any time range. But yes, valid point. We can do bulk import even with the current tsdb packages, but we need to implement that part in prometheus/prometheus. I would like to do it after the above-mentioned PR so that import is seamless. |
guvenim
commented
Dec 26, 2018
|
Hi All, I am new to Prometheus. I have read @brian-brazil's article on safari and I thought this post might be a good place to ask my question. I have some sensor data with timestamps and other features (location etc) and I would like to insert these data to Prometheus using Python API, then connect with Grafana to visualize. It might be overshooting, but since I already have Prometheus as a Docker container, I thought I can use it as a DB to store the data. Can I do it? or do you advise to set up another DB to store the data then connect with Grafana? I saw @thypon answer but, unfortunately, I don't know Go. Sincerely Guven |
|
We use github for bug reports and feature requests so I suggest you move this to our user mailing list. If you haven't looked already you might find your answer in the official docs and examples or by searching in the users or devs groups. |
jomach
commented
Feb 6, 2019
|
Any news on this? We want to migrate data from opentsdb into Prometheus, and it would be nice to have a way to import old data. |
|
We are actively working on prometheus/tsdb#370. Once it is implemented in Prometheus, you could take blocks from another Prometheus server and just drop them in the data folder; they will be handled when querying, and blocks will be merged at the next automated compaction trigger. No strict ETA, but it looks like we might be able to add this to 2.8, which should be in about a month. |
calebtote
commented
Feb 6, 2019
•
|
@krasi-georgiev Just to be clear on your comment, is the expectation that you have to import from another Prometheus instance (but we still couldn't script our own bulk imports with epoch:data from other tsdbs)? |
|
Yes, after prometheus/tsdb#370 is merged, I will be jumping directly into implementing bulk import. @calebtote No, I think bulk import would support importing from non-Prometheus source too. The design has not been decided yet. |
|
aaah yeah I missed that part about the opentsdb. Don't think we have done anything in this direction yet. |
jomach
commented
Feb 7, 2019
|
@codesome I'm with @calebtote here. In our use case we have metrics on the opentsdb server that we want to migrate to Prometheus without losing metrics. |
cesarcabral
commented
Feb 7, 2019
|
Hi I'm not sure if what I'm trying to do is supported or not yet. This is an example of what I'm publishing for Prometheus:
# HELP Jenkins_metrics_project_team01 Metrics by project
# TYPE Jenkins_metrics_project_team01 gauge
Jenkins_metrics_project_team01{status="success"} 13.0
# HELP Jenkins_metrics_project_team02 Metrics by project
# TYPE Jenkins_metrics_project_team02 gauge
Jenkins_metrics_project_team02{status="success"} 0.0 |
|
@cesarcabral Runs of batch jobs like that are usually tracked by encoding a Unix timestamp into the sample value (rather than the sample timestamp), see e.g. https://www.digitalocean.com/community/tutorials/how-to-query-prometheus-on-ubuntu-14-04-part-2#step-4-%E2%80%94-working-with-timestamp-metrics. |
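As a rough illustration of that pattern, a batch job could expose the time of its last successful run as the value of a gauge. The metric name `batch_job_last_success_timestamp_seconds`, the `project` label and the helper `last_success_metric` are assumptions chosen for this sketch, not names from the thread or the linked tutorial.

```python
import time


def last_success_metric(project, finished_at=None):
    """Render a gauge whose *value* is a Unix timestamp in seconds.

    No sample timestamp is attached, so Prometheus stamps the sample
    with the scrape time as usual.
    """
    ts = time.time() if finished_at is None else finished_at
    name = "batch_job_last_success_timestamp_seconds"  # hypothetical name
    return (
        f"# HELP {name} Unix time of the last successful run.\n"
        f"# TYPE {name} gauge\n"
        f'{name}{{project="{project}"}} {ts}\n'
    )


if __name__ == "__main__":
    print(last_success_metric("team01"), end="")
```

On the query side, an expression such as `time() - batch_job_last_success_timestamp_seconds` then gives the number of seconds since the last successful run.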
cesarcabral
commented
Feb 15, 2019
Thank you Julius, nice articles by the way. |
|
Note: prometheus/tsdb#370 is merged, and there is #5292 to include it in Prometheus. |
|
And with this project in GSoC, we can expect to have a better user-facing package for easy imports from other monitoring stacks. |
MarkusTeufelberger
commented
Apr 5, 2019
|
So after some student implements this feature this(?) summer, it is reasonable to expect something along the lines of a |
|
No, I'd expect this to be more a command line thing as it's messing with blocks. |
MarkusTeufelberger
commented
Apr 5, 2019
|
Ok, then no API call... but in general the use case of "I have some timestamped values and want to insert these into Prometheus at a certain series + with the following tags, I do something (API call, command line call...) and then I have them available in Prometheus" is what this student should code? I'm asking this because "Package for bulk imports" sounds to me like building yet another building block to add support for bulk imports, yet still not enabling users to do bulk imports. As a side note, I'd also like to point out that for 4 years the name of this issue is "Add API for bulk imports"... |
juliusv
changed the title
Add API for bulk imports
Add mechanism to perform for bulk imports
Apr 5, 2019
|
@MarkusTeufelberger Good point, I renamed the issue to "Add mechanism to perform bulk imports". |
juliusv
changed the title
Add mechanism to perform for bulk imports
Add mechanism to perform bulk imports
Apr 5, 2019
|
The goal is to allow for bulk imports, not to change Prometheus into a
push-based system. The implementation will likely not work with data in the
past few hours as they won't be on blocks yet.
|
|
I imagine the cli will just create a new block which you can just add in the data dir of the Prometheus server where you want to bulk import. |
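Since the tool was still undesigned at this point, the following only sketches the "add a block to the data dir" step under stated assumptions: a block is a directory containing a `meta.json` with `minTime` and `maxTime` fields, and `install_block` is a hypothetical helper, not a Prometheus command. How and when the server picks up the new block is left open here, as it is in the thread.

```python
import json
import shutil
from pathlib import Path


def install_block(block_dir, data_dir):
    """Copy one TSDB block directory into a Prometheus data directory.

    Returns the (minTime, maxTime) pair read from the block's meta.json.
    """
    block_dir, data_dir = Path(block_dir), Path(data_dir)
    meta_path = block_dir / "meta.json"
    if not meta_path.is_file():
        raise ValueError(f"{block_dir} does not look like a block: no meta.json")
    meta = json.loads(meta_path.read_text())

    target = data_dir / block_dir.name
    if target.exists():
        # Refuse to overwrite a block that is already present.
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(block_dir, target)
    return meta.get("minTime"), meta.get("maxTime")


if __name__ == "__main__":
    import sys

    # Usage (paths are placeholders): python install_block.py <block_dir> <data_dir>
    print(install_block(sys.argv[1], sys.argv[2]))
```

Copying while the server is compacting, or copying a block that overlaps existing data, are the cases the real tool would have to handle; this sketch does neither.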
|
I presume we'll trigger a reload explicitly, rather than wait for one.
On Fri 5 Apr 2019, 13:56 Krasi Georgiev wrote:
I imagine the cli will just dump a new block which you can just add in
the data dir of the Prometheus server where you want to bulk import.
The data will be available after a Prometheus restart, and after the first
compaction the overlapping blocks would be merged by removing the
duplicated data.
|
juliusv commented Feb 17, 2015
•
edited
Currently the only way to bulk-import data is a hacky one involving client-side timestamps and scrapes with multiple samples per time series. We should offer an API for bulk import. This relies on #481.
EDIT: It probably won't be a web-based API in Prometheus, but a command-line tool.