
[feature request] Insert new tags to existing values, like update #3904

Closed
mvadu opened this issue Aug 31, 2015 · 61 comments

@mvadu
Contributor

mvadu commented Aug 31, 2015

Can we have a query syntax which allows inserting/attaching a new set of tags (alongside the existing ones) to values/rows that are already part of a measurement?

My use case: I created a measurement from a PerfMon log, which already has a Host= tag. Now I want to categorize the data by application, so I want to add tags like "App1=, App2=", assuming I can have two apps hosted on the same server.

Then I want to be able to say Update <measurement name> add <tag name=value> where <some condition based on tags>
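For example (hypothetical syntax, not supported by InfluxQL today; the measurement and values below are invented purely for illustration):

Update perfmon add App1=true where Host=webserver01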

@beckettsean beckettsean added this to the Longer term milestone Aug 31, 2015
@airyland

airyland commented Sep 1, 2015

+1

@beckettsean
Contributor

This feature would require a fairly major architectural change to the database and is not something we will do this year, if ever. Closing the request for now.

@mvadu
Contributor Author

mvadu commented Sep 2, 2015

Please consider this during your next architecture review. I think this would be required in many enterprise situations, where you make a mistake while inserting values, and realize it later. Having an option to update would be a life saver!

@ivanscattergood

Hi,
I have two specific use cases for this functionality:

  1. Obfuscation of data for testing / demo purposes.
    As the data we deal with is market data, the tags may actually refer to a ticker, and given the other attributes of the data you can work out trade information. I am currently working on a data set which I have been asked to obfuscate for a demo. To achieve this I am loading the dataset into memory and writing it back with new tags; a rough sketch of this in-memory approach follows this list. (This would be simplified by the INSERT INTO functionality proposed in #211, "Should be able to force recalculation of continuous query for given time interval".)

  2. Change of the Tag value in the real world
    Imagine that one of the tags in your data referenced the Name of a department, if that department name changed it would be good to be able to change it historically, rather than having to have the two tag values running in tandem.
    In the world of Data Warehousing this is known as slowly changing dimensions:
    http://datawarehouse4u.info/SCD-Slowly-Changing-Dimensions.html
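
For use case 1, a minimal sketch of that load/rewrite/re-insert approach with the influxdb Python client might look like the following; the database, measurement and tag names (market_data, trades, ticker) are placeholders rather than the real schema:

from influxdb import InfluxDBClient

# Placeholder names: database 'market_data', measurement 'trades', tag 'ticker'
client = InfluxDBClient('localhost', database='market_data')
result = client.query('SELECT * FROM "trades" GROUP BY *')

aliases = {}   # real ticker -> opaque placeholder, stable within this run
points = []
for (measurement, tags), rows in result.items():
    fake = aliases.setdefault(tags['ticker'], 'TICKER_%03d' % len(aliases))
    for row in rows:
        points.append({
            'measurement': 'trades_demo',
            # keep all original tags, but swap the sensitive one for its alias
            'tags': dict(tags, ticker=fake),
            'time': row['time'],
            'fields': {k: v for k, v in row.items() if k != 'time' and v is not None},
        })

client.write_points(points, batch_size=5000)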

@rapport

rapport commented Jun 17, 2016

+1 For us, events are usually only recognised in the data following a reasoning step after capture, and we want to tag data with its event once it is recognised. I am considering using continuous queries to do some of this tagging after capture, but it seems inefficient and I am not sure it will work for all our use cases.

@srikara

srikara commented Jun 17, 2016

+1

@mrh666

mrh666 commented Jun 27, 2016

+1
This would be a surprisingly useful feature! Sometimes our developers send metrics that don't follow the namespace requirement, but do follow some pre-defined, hardcoded application name structure (the Storm metrics collector, for example), and those metrics have a lot of value for research.

@jarlwolfganger

+1, if you guys aren't going to add this any time soon, could you at least add workaround options to the current documentation, since there is nothing mentioning how to append tags to an existing measurement.

@ghost

ghost commented Jul 24, 2016

+1 this is definitely needed in many real-world solutions as architectural changes might need additional tags.

@hgomez-sonarsource

+1 to adding this feature, since as of today there is no option other than to dump a measurement as ASCII, hack the ASCII file, and send it back to a new measurement via the wire protocol.

@hgomez-sonarsource

For now, I updated my InfluxDB Fetcher tool (https://github.com/hgomez/influxdb) to transform fields into tags so I could reimport them in the proper format.

@jincejames

+1

3 similar comments
@szll

szll commented Oct 13, 2016

+1

@autumnw

autumnw commented Oct 24, 2016

+1

@andreaaizza

+1

@ryanmills

+1, please provide the ability to update existing tags

@geethanjalieswaran

+1

@mvadu
Contributor Author

mvadu commented Dec 13, 2016

@beckettsean Hi Sean, since this request was closed quite a few folks have expressed their interest in and support for this feature. Do you guys want to reconsider it?

@ryanmills

ryanmills commented Dec 13, 2016

Additional use cases:

  1. Adding a new tag to an existing schema -- it's crazy that you can't do this!

  2. You might find that you suddenly need to GROUP BY an existing field value, so you must then update the records to remove the value as a field and add it as a tag instead.

  3. Deletes are expensive, so you could instead update a tag to mark a record as active/disabled and include the status tag in queries (illustrated just below).
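
As an illustration of point 3, queries would then simply filter on the status tag; the measurement and tag names below are hypothetical:

> SELECT mean("value") FROM "sensor_data" WHERE "status" = 'active' AND time > now() - 7d GROUP BY time(1h)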

@hcsaustrup

+1

@hraftery

hraftery commented Dec 23, 2016

Issue #828 concerns renaming databases, measurements, fields and tags. Some of that functionality is provided by INTO. E.g.
SELECT "old_field_key" AS "new_field_key" INTO new_db..new_measurement FROM old_db..old_measurement GROUP BY "old_tag_key"
renames the database, measurement and field key, but cannot be extended to rename the tag key or add new tags.

Given those limitations, it may be better to dump data, convert to line protocol, go nuts with sed to change/add keys, and then POST back.

@hgomez-sonarsource's excellent InfluxDB Fetcher makes this quite quick and easy, but note:

  • It requires maven and java. Easy to install and use, but requires quite a few packages to be downloaded/built/tested/installed.
  • The invocation line is long and complex, but works just fine if you're careful.
  • The extra "data1,data2" arguments are for converting a tag to a field, so don't do anything if you don't have fields by those names. You can achieve the same in your sed script anyway.
  • You probably don't want the 'i' (integer) suffixes on your values because Influx numeric fields are floats by default, so make sure you strip them out in your sed script.

So you can do a simultaneous database rename, measurement rename, field rename, tag rename and tag addition with three chunky lines:

java -cp target/influxdb-fetcher-1.0.0-SNAPSHOT.jar com.github.hgomez.influxdb.InfluxDBFetcher http://127.0.0.1:8086 login password old_db "SELECT * from old_measurement GROUP BY *" > dump.wireproto
sed -i -e "s/old_measurement/new_measurement/;s/old_field_key/new_field_key/;s/old_tag_key/new_tag_key/;s/new_tag_key/additional_tag_key=additional_tag,new_tag_key/;s/i,/,/g;s/i / /" dump.wireproto 
curl -i -POST "http://127.0.0.1:8086/write?db=new_db" --data-binary @dump.wireproto

I used something very similar to convert 20000 points in a couple of seconds. YMMV.

@andyflury

+1

@hraftery

hraftery commented Jan 6, 2017

Just came across a possible improvement on my workaround posted above. Haven't investigated it, but influx_inspect export might be a replacement for InfluxDB Fetcher for this purpose?
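
Something like the following might work (untested sketch; the data/wal paths and the old/new names are placeholders, and the export is database-level only):

influx_inspect export -database old_db -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal -out /tmp/export.txt
# rename the measurement and squeeze in an extra tag, same idea as the sed line above
sed -i -e "s/^old_measurement,/new_measurement,added_tag=added_value,/" /tmp/export.txt
influx -import -path /tmp/export.txt

The exported file also carries DDL/DML header lines (database name, retention policy), so check those before re-importing.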

@einhirn

einhirn commented Jan 25, 2017

@hraftery Nice, but you can only use it at the database level, not the measurement level. The output will get very big, very soon, and I didn't manage to have it write to stdout (i.e. a pipe). Of course you can work around that with mkfifo etc., but it is still a lot of output...

Of course you can also extend the workaround by first "select"ing "into" a new db and so on 😁
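
For example (untested; "scratch" is a placeholder database created just for the export, and GROUP BY * keeps the existing tags intact):

> CREATE DATABASE scratch
> SELECT * INTO scratch.autogen.:MEASUREMENT FROM mydb.autogen./.*/ GROUP BY *

and then run influx_inspect export against the much smaller scratch database.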

@vikrammurugesan

+1 this would be a really useful feature

@mei-rune
Contributor

mei-rune commented Mar 28, 2017

+1, this would be a really useful feature when refactoring an app

@samhatchett

+1 : this could also be accomplished if SELECT INTO could alter or replace tag keys/values.

@JeffAbrahamson

+1

@meesern

meesern commented Aug 25, 2017

+1 to allow tagging measurements with post-processed tags.

@amoondra19

+1

@bkdonline

@beckettsean Can this be reconsidered at this point? It's been two years since the thread was closed, and it might be more feasible architecturally to consider this now!

This would be extremely useful for maintaining data over the medium to long term in an enterprise setting.

@gwijnja

gwijnja commented Nov 1, 2017

+1
Just realized that a CQ by default drops all tags while it downsamples data. The tags in the source table were always the same (network traffic for 1 host, 1 interface), so I never noticed the missing tags in the downsampled table. Now that I'm adding a second host, I realize that all existing downsampled entries need to have the tags added (the same for all entries) before I can add the new host. There's months of data in the table already...

Update:
Found a workaround here to add tags. If anyone has the same problem, then this may work for you as well:

  1. Drop the continuous query
> DROP CONTINUOUS QUERY cq_1m ON network
  2. Run the python script (I adjusted it a bit to my problem) to copy the data into an intermediate table, while adding some tags:
from influxdb import InfluxDBClient

client = InfluxDBClient('localhost', database='network')
db_data = client.query('select tx_bps, rx_bps from downsampled_traffic')
data_to_write = [
        {
            'measurement': 'intermediate',
            'tags': {'host': 'compass', 'if': 'eth0'},
            'time': d['time'],
            'fields': {'tx_bps': float(d['tx_bps']), 'rx_bps': float(d['rx_bps'])}
        }
        for d in db_data.get_points()
    ]
client.write_points(data_to_write)
  3. Drop the original measurement
> DROP MEASUREMENT downsampled_traffic
  4. Reload the intermediate measurement into the original measurement:
> SELECT * INTO downsampled_traffic FROM intermediate GROUP BY *
  5. Now the downsampled table contains the host and interface tags I needed:
> select * from downsampled_traffic limit 3
name: downsampled_traffic
time                host    if   rx_bps        tx_bps
----                ----    --   ------        ------
1504201620000000000 compass eth0 12962.4666667 13043
1504201740000000000 compass eth0 107997.333333 125236.466667
1504201800000000000 compass eth0 50329.7333333 51249.3333333
  6. Drop the intermediate measurement:
> DROP MEASUREMENT intermediate
  7. Recreate the continuous query, now including a GROUP BY for the host and interface (if) tags:
> CREATE CONTINUOUS QUERY cq_1m ON network BEGIN SELECT 8 * derivative(mean(rx), 1s) AS rx_bps, 8 * derivative(mean(tx), 1s) AS tx_bps INTO network.autogen.downsampled_traffic FROM network.autogen.traffic GROUP BY time(1m),host,if END

@gylu

gylu commented Nov 6, 2017

+9000
Having to loop through every single data point to insert or rename a tag is saddening and extremely resource-consuming.

@UVk

UVk commented Nov 17, 2017

+1

@rvolosatovs

Encountered this issue on a production system. We have around a million entries every day, and the current dataset has been accumulating for half a year.
I was going to set up some additional continuous queries, but it turned out that is not possible, as some of the values in measurements which I wanted to GROUP BY are stored as fields, and there is no way to group by "field" types and also no way to convert "field" types to "tag" types.

Work has been started on a simple utility to convert fields to tags in the wire protocol representation.

go get -u github.com/rvolosatovs/influx-taggify

Example usage:

influx_inspect export -database "$db" -datadir "$datadir" -waldir "$waldir" -out /tmp/influx-export
# Delete the measurements you don't need to convert using `sed`/`perl` (i.e. `perl -in -e 'print unless m/^unrelated_measurement.*/' /tmp/influx-export`)
influx-taggify -in /tmp/influx-export -out /tmp/influx-export-tagged fieldFoo fieldBar
# Drop the old database or edit generated file to change the name of the database
influx -import -path /tmp/influx-export-tagged

It worked for my use case, but your mileage may vary.
Try locally on non-critical setup first!
Feel free to try, report issues and contribute! :)

@YEMEAC

YEMEAC commented May 4, 2018

+1 @2018

@jheusser
Contributor

+1

5 similar comments
@javiergarciad

+1

@hor1z0nx

hor1z0nx commented Jun 5, 2018

+1

@longit644

+1

@kikohs

kikohs commented Jul 19, 2018

+1

@galindro

galindro commented Sep 5, 2018

+1

@hraftery

For what it's worth I've revisited this 2 years later and explored all alternative options for exporting/importing data. To save others the hours of frustration, it seems my workaround above is still the easiest way to achieve this and all manner of related export/import tasks.

Lesser alternatives:

  • The influx_inspect tool is complicated, requires sudo, relies on TSM files and dumps the whole db.
  • The HTTP API (e.g. via curl) is great, but it produces JSON that can't be read by influx!
  • The influx tool's -execute flag is promising, but even csv2influx.py can't read the CSV it generates. (Rough invocations of these last two are sketched below.)
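
For reference, the kind of invocations meant by those last two points (details may vary by version; mydb and my_measurement are placeholders):

# HTTP API: returns JSON, which influx -import cannot consume
curl -G 'http://localhost:8086/query?db=mydb' --data-urlencode 'q=SELECT * FROM "my_measurement" GROUP BY *'
# influx CLI: CSV output, but nothing standard converts it back to line protocol
influx -database mydb -format csv -execute 'SELECT * FROM "my_measurement" GROUP BY *' > dump.csv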

@samhatchett

We were trying to solve this problem for a customer - where data may have been inserted incorrectly and we then needed to go back and re-tag a series (or several series). We ended up creating a little node-based migration service. It's solving our problem for now, and our customer has a more friendly means of managing their data. May not be suitable for all cases. You can find it here.

@steverweber

steverweber commented Jan 28, 2019

+1 : this could also be accomplished if SELECT INTO could alter or replace tag keys/values.

this would be nice in a CONTINUOUS QUERY... because the data is being transformed there anyway, new tags would be the most logical addition.

example:

we have a few lab rooms and we use a CQ to find how busy each room is... The rooms are not tagged but are inferred from the host name.
It would be ideal if we could tag the room, like mc3006... but because of InfluxDB's current limitations we are using a new metric for each room.

SELECT sum("value") INTO "thinclient_active_mc3006" FROM "thinclient_active" WHERE ("host" =~ /3006/) GROUP BY time(1m, -5m)
SELECT sum("value") INTO "thinclient_active_mc3007" FROM "thinclient_active" WHERE ("host" =~ /3007/) GROUP BY time(1m, -5m)
SELECT sum("value") INTO "thinclient_active_mc3008" FROM "thinclient_active" WHERE ("host" =~ /3008/) GROUP BY time(1m, -5m)

something like the following would be a nice improvement!

SELECT sum("value"), room::tag=mc3008 INTO "thinclient_active_room" FROM "thinclient_active" WHERE ("host" =~ /3008/) GROUP BY time(1m, -5m), room

@steverweber

steverweber commented Jan 28, 2019

@hraftery please reconsider this feature request.

@hraftery

@steverweber I'm just a user - you probably want to direct your request to a contributor. But I wouldn't hold your breath. This issue was closed back in 2015. There are at least three workarounds described in this thread, but I agree, something along the lines of a SELECT INTO would be far better.

@newskooler

I sincerely wish that this would be reconsidered. It's a must-have :)

@kaszperro

Are there any plans to implement this feature in influxdb 2.0?

@kavinbright

+1

@amotl

amotl commented Feb 1, 2021

Hi there,

following up on the comment by @hraftery at #3904 (comment), everything about @hgomez' excellent InfluxDB Fetcher still holds true, but we just improved the convenience of installation.

You no longer have to build it yourself; instead you can conveniently download a ready-made influxdb-fetcher-1.0.2.jar to your workstation. The new setup instructions at [1] outline how to easily install a wrapper program which will download the .jar file automatically on its first invocation.

With kind regards,
Andreas.

[1] https://github.com/hgomez/influxdb#setup
