Missing data after using influx_tsm #5924

Closed
ignaskniz opened this issue Mar 7, 2016 · 3 comments

@ignaskniz

Hey,
Since the issue with some field values being set to 0 was fixed in 0.10.2, I am trying to use influx_tsm to migrate data from b1 shards to tsm1, but I am running into a different issue: it seems only one field per point gets migrated. I am running Ubuntu.

I reproduced it as follows: I installed InfluxDB 0.9.6.1, set the storage engine to "b1" in influxdb.conf, and inserted points with multiple fields:

> select * from test
name: test
----------
time                cpu memory  server
1457360018399629523 13  66      x
1457360027376224669 15  46      y
1457360041196291918 77  88      x

Here server is a tag, and cpu and memory are fields.
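For reference, points like these can be written from the influx CLI with line-protocol INSERT statements along these lines (not necessarily the exact statements I used):

```
> INSERT test,server=x cpu=13,memory=66
> INSERT test,server=y cpu=15,memory=46
> INSERT test,server=x cpu=77,memory=88
```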
I then installed InfluxDB 0.10.2 (dpkg -i influxdb_0.10.2-1_amd64.deb) and ran influx_tsm, with the following output:

sudo influx_tsm -backup ~/test_back /var/lib/influxdb/data

b1 and bz1 shard conversion.

Data directory is: /var/lib/influxdb/data
Backup directory is: /home/user/test_back
Databases specified: all
Database backups enabled: yes
Parallel mode enabled: no 1

Found 2 shards that will be converted.

Database   Retention  Path                                         Engine  Size
_internal  monitor    /var/lib/influxdb/data/_internal/monitor/1   b1      131072
servers    default    /var/lib/influxdb/data/servers/default/2     b1      65536

These shards will be converted. Proceed? y/N: y
Conversion starting....
Backing up 2 databases...
2016/03/07 14:20:35.578841 Backup of databse '_internal' started
2016/03/07 14:20:35.579562 Backing up file /var/lib/influxdb/data/_internal/monitor/1
2016/03/07 14:20:35.580313 Database _internal backed up (1.462204ms)
2016/03/07 14:20:35.582060 Backup of databse 'servers' started
2016/03/07 14:20:35.582425 Backing up file /var/lib/influxdb/data/servers/default/2
2016/03/07 14:20:35.584703 Database servers backed up (2.627228ms)
2016/03/07 14:20:35.584838 Starting conversion of shard: /var/lib/influxdb/data/_internal/monitor/1
2016/03/07 14:20:35.602970 Conversion of /var/lib/influxdb/data/_internal/monitor/1 successful (18.01142ms)
2016/03/07 14:20:35.603055 Starting conversion of shard: /var/lib/influxdb/data/servers/default/2
2016/03/07 14:20:35.604651 Conversion of /var/lib/influxdb/data/servers/default/2 successful (1.619587ms)

Summary statistics

Databases converted: 2
Shards converted: 2
TSM files created: 2
Points read: 105
Points written: 105
NaN filtered: 0
Inf filtered: 0
Points without fields filtered: 0
Disk usage pre-conversion (bytes): 196608
Disk usage post-conversion (bytes): 5392
Reduction factor: 97%
Bytes per TSM point: 51.35
Total conversion time: 26.098251ms

After the conversion I connected to the database and ran the same query again:

> select * from test
name: test
----------
time                cpu memory  server
1457360018399629523 13          x
1457360027376224669     46      y
1457360041196291918 77          x

As can be seen, two of the points lost the memory field, and one lost cpu.
Is there something I can do to avoid this happening?

@joelegasse
Contributor

Hi @ignaskniz, I can't seem to reproduce this locally following the instructions listed for upgrading your database engine, found here.

Reading your steps, it doesn't sound like you're starting up the new server binary and letting the write-ahead log (WAL) flush to the shards before running the conversion tool.

Can you confirm for me that you are starting up the new server before running the conversion tool? If you are, and are still seeing some fields being dropped, I may need some more detailed steps to be able to reproduce what you're seeing.
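For reference, here is a rough sketch of the sequence I would expect (exact service commands depend on your init system; the paths are taken from your report):

```sh
# stop the old 0.9.x server and install the new version
sudo service influxdb stop
sudo dpkg -i influxdb_0.10.2-1_amd64.deb

# start the new server so the WAL flushes to the b1 shards,
# give it time to finish, then stop it again
sudo service influxdb start
# ... wait for the WAL to flush ...
sudo service influxdb stop

# only then run the conversion tool
sudo influx_tsm -backup ~/test_back /var/lib/influxdb/data
```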

@vladlopes
Contributor

Hello @joelegasse, I experienced the same issue. I followed the steps you linked, and I am certain that the WAL had flushed everything.

Coincidentally, I was examining the converter code, trying to find out what could have happened, when I found this issue.

I think I have found the problem, but I don't have a solution yet.

I will do my best to explain below what I think the problem is.

On the b1 engine, each "record" in the bolt series bucket has the following structure:
k, v ==> timestamp, byte array with the values of ALL fields

b1/reader.go in the influx_tsm converter creates a cursor for each field and then iterates it, retrieving the value for that field. The problem is that on each cursor iteration the reader should read ALL fields, not only the cursor's field.

Example:

measurement A
fields: field1 [integer, offset 0], field2 [integer, offset 9]

Bolt bucket

timestamp1, [100000000200000000]
timestamp2, [300000000400000000]
timestamp3, [500000000600000000]
timestamp4, [700000000800000000]

The reader creates two cursors, one for each field, and on each iteration it advances the cursor, resulting in the following reads (modeled in the sketch after this list):

- Reads timestamp1, field 1 value
- Reads timestamp2, field 2 value
- Reads timestamp3, field 1 value
- Reads timestamp4, field 2 value

I am still analysing what the best solution would be in this case, but I thought I would comment here so someone with better knowledge of the converter can take a look at it.
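To make the failure mode concrete, here is a small self-contained Go sketch. It is not the actual converter code (the types and names are invented); it just models per-field cursors that share one iteration position over the bucket:

```go
package main

import "fmt"

// entry models one bolt k/v pair in a b1 series bucket: the key is the
// timestamp, and the value encodes ALL field values for that point.
type entry struct {
	ts     int64
	fields map[string]int64
}

// fieldCursor models a cursor that extracts a single field but shares
// its iteration position with the other cursors over the same bucket.
type fieldCursor struct {
	field string
	pos   *int // shared position: the source of the bug
	data  []entry
}

// next returns this cursor's field value at the current entry and then
// advances the shared position, so the other field's cursor never sees
// this entry.
func (c *fieldCursor) next() (ts, val int64, ok bool) {
	if *c.pos >= len(c.data) {
		return 0, 0, false
	}
	e := c.data[*c.pos]
	*c.pos++
	return e.ts, e.fields[c.field], true
}

func main() {
	bucket := []entry{
		{1, map[string]int64{"field1": 1, "field2": 2}},
		{2, map[string]int64{"field1": 3, "field2": 4}},
		{3, map[string]int64{"field1": 5, "field2": 6}},
		{4, map[string]int64{"field1": 7, "field2": 8}},
	}

	pos := 0
	c1 := &fieldCursor{field: "field1", pos: &pos, data: bucket}
	c2 := &fieldCursor{field: "field2", pos: &pos, data: bucket}

	// Alternating over the cursors, every entry yields only ONE of its
	// fields; the other field's value for that timestamp is lost.
	for {
		ts, v, ok := c1.next()
		if !ok {
			break
		}
		fmt.Printf("timestamp%d -> field1 = %d\n", ts, v)

		if ts, v, ok = c2.next(); ok {
			fmt.Printf("timestamp%d -> field2 = %d\n", ts, v)
		}
	}
	// Prints:
	// timestamp1 -> field1 = 1
	// timestamp2 -> field2 = 4
	// timestamp3 -> field1 = 5
	// timestamp4 -> field2 = 8
}
```

Whatever the fix looks like, it would have to decode ALL fields from each k/v entry before moving past it, instead of extracting only the cursor's field and then advancing.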

@joelegasse
Contributor

@vladlopes Thank you for the extra information. It turns out I had the wrong config selected and was actually testing with a bz1 shard. Sorry about that, @ignaskniz. I'll dig in and see what I can do to fix this.
