Missing data after using influx_tsm #5924

Closed
ignaskniz opened this issue Mar 7, 2016 · 3 comments

@ignaskniz

Hey,
Since the issue with some field values being set to 0 was fixed in 0.10.2, I am trying to use influx_tsm to migrate data from b1 shards to tsm1, but I am running into a different issue: it seems only one field per point gets migrated. I am running Ubuntu.

I reproduced it as follows: I installed InfluxDB 0.9.6.1, set the storage engine to "b1" in influxdb.conf, and inserted points with multiple fields:

> select * from test
name: test
----------
time                cpu memory  server
1457360018399629523 13  66      x
1457360027376224669 15  46      y
1457360041196291918 77  88      x

Here server is a tag, and cpu and memory are fields.
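For reference, points like these can be written from the influx CLI with line-protocol INSERT statements along these lines (not necessarily the exact statements I used):

```
> INSERT test,server=x cpu=13,memory=66
> INSERT test,server=y cpu=15,memory=46
> INSERT test,server=x cpu=77,memory=88
```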
I then installed InfluxDB 0.10.2 (dpkg -i influxdb_0.10.2-1_amd64.deb) and ran influx_tsm, with the following output:

sudo influx_tsm -backup ~/test_back /var/lib/influxdb/data

b1 and bz1 shard conversion.

Data directory is: /var/lib/influxdb/data
Backup directory is: /home/user/test_back
Databases specified: all
Database backups enabled: yes
Parallel mode enabled: no 1

Found 2 shards that will be converted.

Database   Retention  Path                                         Engine  Size
_internal  monitor    /var/lib/influxdb/data/_internal/monitor/1   b1      131072
servers    default    /var/lib/influxdb/data/servers/default/2     b1      65536

These shards will be converted. Proceed? y/N: y
Conversion starting....
Backing up 2 databases...
2016/03/07 14:20:35.578841 Backup of databse '_internal' started
2016/03/07 14:20:35.579562 Backing up file /var/lib/influxdb/data/_internal/monitor/1
2016/03/07 14:20:35.580313 Database _internal backed up (1.462204ms)
2016/03/07 14:20:35.582060 Backup of databse 'servers' started
2016/03/07 14:20:35.582425 Backing up file /var/lib/influxdb/data/servers/default/2
2016/03/07 14:20:35.584703 Database servers backed up (2.627228ms)
2016/03/07 14:20:35.584838 Starting conversion of shard: /var/lib/influxdb/data/_internal/monitor/1
2016/03/07 14:20:35.602970 Conversion of /var/lib/influxdb/data/_internal/monitor/1 successful (18.01142ms)
2016/03/07 14:20:35.603055 Starting conversion of shard: /var/lib/influxdb/data/servers/default/2
2016/03/07 14:20:35.604651 Conversion of /var/lib/influxdb/data/servers/default/2 successful (1.619587ms)

Summary statistics

Databases converted: 2
Shards converted: 2
TSM files created: 2
Points read: 105
Points written: 105
NaN filtered: 0
Inf filtered: 0
Points without fields filtered: 0
Disk usage pre-conversion (bytes): 196608
Disk usage post-conversion (bytes): 5392
Reduction factor: 97%
Bytes per TSM point: 51.35
Total conversion time: 26.098251ms

After the conversion I connected to the database and ran the same query again:

> select * from test
name: test
----------
time                cpu memory  server
1457360018399629523 13          x
1457360027376224669     46      y
1457360041196291918 77          x

As can be seen, two of the points lost the memory field, and one lost cpu.
Is there something I can do to avoid this happening?

@joelegasse
Contributor

Hi @ignaskniz, I can't seem to reproduce this locally following the instructions listed for upgrading your database engine, found here.

Reading your steps, it doesn't sound like you're starting up the new server binary and letting the write-ahead log (WAL) flush to the shards before running the conversion tool.

Can you confirm for me that you are starting up the new server before running the conversion tool? If you are, and are still seeing some fields being dropped, I may need some more detailed steps to be able to reproduce what you're seeing.
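For reference, here is a rough sketch of the sequence I would expect (exact service commands depend on your init system; the paths are taken from your report):

```sh
# stop the old 0.9.x server and install the new version
sudo service influxdb stop
sudo dpkg -i influxdb_0.10.2-1_amd64.deb

# start the new server so the WAL flushes to the b1 shards,
# give it time to finish, then stop it again
sudo service influxdb start
# ... wait for the WAL to flush ...
sudo service influxdb stop

# only then run the conversion tool
sudo influx_tsm -backup ~/test_back /var/lib/influxdb/data
```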

@vladlopes
Contributor

Hello @joelegasse, I experienced the same issue. I followed the steps you linked, and I am certain that the WAL had flushed everything.

Coincidentally, I was examining the converter code, trying to find out what could have happened, when I found this issue.

I think I have found the problem, but I don't have a solution yet.

I will do my best to explain below what I think the problem is.

On the b1 engine, each "record" in the bolt series bucket has the following structure:
k, v ==> timestamp, byte array with the values of ALL fields

b1/reader.go in the influx_tsm converter creates a cursor for each field and then iterates it, retrieving the value for that field. The problem is that on each cursor iteration the reader should read ALL fields, not only the cursor's field.

Example:

measurement A
fields: field1 [integer, offset 0], field2 [integer, offset 9]

Bolt bucket

timestamp1, [100000000200000000]
timestamp2, [300000000400000000]
timestamp3, [500000000600000000]
timestamp4, [700000000800000000]

The reader creates two cursors, one for each field, and on each iteration it advances the cursor, resulting in the following reads (modeled in the sketch after this list):

- Reads timestamp1, field 1 value
- Reads timestamp2, field 2 value
- Reads timestamp3, field 1 value
- Reads timestamp4, field 2 value

I am still analysing what the best solution would be in this case, but I thought I would comment here so someone with better knowledge of the converter can take a look at it.
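To make the failure mode concrete, here is a small self-contained Go sketch. It is not the actual converter code (the types and names are invented); it just models per-field cursors that share one iteration position over the bucket:

```go
package main

import "fmt"

// entry models one bolt k/v pair in a b1 series bucket: the key is the
// timestamp, and the value encodes ALL field values for that point.
type entry struct {
	ts     int64
	fields map[string]int64
}

// fieldCursor models a cursor that extracts a single field but shares
// its iteration position with the other cursors over the same bucket.
type fieldCursor struct {
	field string
	pos   *int // shared position: the source of the bug
	data  []entry
}

// next returns this cursor's field value at the current entry and then
// advances the shared position, so the other field's cursor never sees
// this entry.
func (c *fieldCursor) next() (ts, val int64, ok bool) {
	if *c.pos >= len(c.data) {
		return 0, 0, false
	}
	e := c.data[*c.pos]
	*c.pos++
	return e.ts, e.fields[c.field], true
}

func main() {
	bucket := []entry{
		{1, map[string]int64{"field1": 1, "field2": 2}},
		{2, map[string]int64{"field1": 3, "field2": 4}},
		{3, map[string]int64{"field1": 5, "field2": 6}},
		{4, map[string]int64{"field1": 7, "field2": 8}},
	}

	pos := 0
	c1 := &fieldCursor{field: "field1", pos: &pos, data: bucket}
	c2 := &fieldCursor{field: "field2", pos: &pos, data: bucket}

	// Alternating over the cursors, every entry yields only ONE of its
	// fields; the other field's value for that timestamp is lost.
	for {
		ts, v, ok := c1.next()
		if !ok {
			break
		}
		fmt.Printf("timestamp%d -> field1 = %d\n", ts, v)

		if ts, v, ok = c2.next(); ok {
			fmt.Printf("timestamp%d -> field2 = %d\n", ts, v)
		}
	}
	// Prints:
	// timestamp1 -> field1 = 1
	// timestamp2 -> field2 = 4
	// timestamp3 -> field1 = 5
	// timestamp4 -> field2 = 8
}
```

Whatever the fix looks like, it would have to decode ALL fields from each k/v entry before moving past it, instead of extracting only the cursor's field and then advancing.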

@joelegasse
Contributor

@vladlopes Thank you for the extra information. It turns out I had the wrong config selected and was actually testing with a bz1 shard. Sorry about that, @ignaskniz. I'll dig in and see what I can do to fix this.
