
Buffer overflow occurs when a large compressed file is imported to influx #8986

Closed

lets00 opened this issue Oct 19, 2017 · 7 comments

@lets00 (Contributor) commented Oct 19, 2017

Description

I needed to export my data using the influx_inspect program. I used this command to generate an output file in line protocol format:

influx_inspect export -database monasca -datadir monasca_influx/data/ -retention autogen -waldir monasca_influx/wal -out monasca.dmp -start 2017-10-08T00:00:00Z -end 2017-10-08T23:59:59Z -compress

The compressed file is 950 MB. When I try to import it using the influx command (influx -import -compressed -path=monasca.dmp), this error occurs:

ERROR: reading standard input: bufio.Scanner: token too long

Bug report

System info:
Ubuntu 16.04
Influx 1.3.5 and Influx 1.1.0
Database full
24GB RAM

Steps to reproduce:

  1. Export a database to a line protocol file in compressed mode (the output file must be larger than 600 MB)
  2. Try to import that file using the influx import command

Expected behavior:

The line protocol series are added to InfluxDB.

Actual behavior:

Nothing is added to InfluxDB.

Additional info:

I looked at the code (https://github.com/influxdata/influxdb/blob/master/importer/v8/importer.go#L95-L112). When the file is decompressed, the final size is probably larger than the available RAM on the system. Chunking the decompression into smaller parts of the file might solve the problem (e.g. read 1024 bytes of the file, decompress them, add them to the DB, and so on), as sketched below.
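
To illustrate the chunked idea (a minimal Go sketch, not the importer's actual code; processChunk is a hypothetical stand-in for the line-protocol handling):

    package main

    import (
        "compress/gzip"
        "io"
        "log"
        "os"
    )

    func main() {
        f, err := os.Open("monasca.dmp")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // compress/gzip decompresses as a stream, so the uncompressed
        // payload never has to fit in RAM all at once.
        gz, err := gzip.NewReader(f)
        if err != nil {
            log.Fatal(err)
        }
        defer gz.Close()

        buf := make([]byte, 1024) // 1024-byte chunks, as suggested above
        for {
            n, err := gz.Read(buf)
            if n > 0 {
                processChunk(buf[:n]) // hypothetical: parse and write points here
            }
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
        }
    }

    // processChunk is a hypothetical stand-in for the importer's
    // line-protocol handling.
    func processChunk(b []byte) { _ = b }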

@e-dard (Contributor) commented Oct 20, 2017

@lets00 this is probably our fault for the implementation we used to scan import files. I think you have one or more particularly long lines in your import file.

If you're on Ubuntu, can you run $ wc -L monasca.dmp and see how long the longest line is?

In order for the importer to accept this we'll need to fix the importer, or you'll need to remove the long lines from the input (which isn't particularly practical).

@lets00 (Contributor, Author) commented Oct 20, 2017

@e-dard Thank you for answering me. I ran the command; the output is below:

$ wc -L monasca.dmp
1678 monasca.dmp

The max line length in this file is 1678. The problem with removing the long lines is that, uncompressed, this file is extremely large (1 GB becomes something like 200 GB), and trying to edit all the long lines manually is impractical.

@e-dard (Contributor) commented Oct 20, 2017

@lets00 is that the line length of the compressed file? I should have been more specific: what would be interesting to know is the max length of a line in the uncompressed (line protocol) format. If the file is too big to handle, then don't worry about it.

The other thing I noticed from your initial comment is that you didn't use the -compressed flag when doing the import. You need to use -compressed if you're importing compressed data.

@lets00 (Contributor, Author) commented Oct 20, 2017

@e-dard I used -compressed when I tried to import (I updated the issue to add this information; I forgot to include it originally). The import command that I executed was:

$ influx -import -compressed -path=monasca.dmp

I will try to get this information (max line length) for the uncompressed file.

@lets00 (Contributor, Author) commented Oct 20, 2017

@e-dard More information about the uncompressed file: its max line length is 411511.

$ wc -L monasca_uncompressed.dmp
411511 monasca_uncompressed.dmp

The compressed file's max line length is 1678 (I did not specify that before).

$ wc -L monasca_compressed.dmp
1678 monasca_compressed.dmp

@e-dard (Contributor) commented Oct 20, 2017

@lets00 OK, that's as I suspected then. By default, bufio.Scanner can only handle lines <= 64K, and that line is way over that. We can fix this by using a different reader from the std library.
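
For reference, a minimal Go sketch of both the failure mode and the std-library alternative; the 64K default cap is the bufio.MaxScanTokenSize constant:

    package main

    import (
        "bufio"
        "fmt"
        "strings"
    )

    func main() {
        // One line just over the scanner's default cap (bufio.MaxScanTokenSize, 64 KiB).
        long := strings.Repeat("x", bufio.MaxScanTokenSize+1) + "\n"

        sc := bufio.NewScanner(strings.NewReader(long))
        for sc.Scan() {
        }
        fmt.Println(sc.Err()) // bufio.Scanner: token too long

        // bufio.Reader.ReadString accumulates across buffer refills,
        // so the same line reads back intact.
        r := bufio.NewReader(strings.NewReader(long))
        line, err := r.ReadString('\n')
        fmt.Println(len(line), err) // 65538 <nil>
    }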

e-dard added this to the 1.4.0 milestone Oct 20, 2017
lets00 added a commit to lets00/influxdb that referenced this issue Oct 23, 2017
@lets00 (Contributor, Author) commented Oct 24, 2017

@e-dard I wrote a solution for my problem, replacing the NewScanner call in importer/v8/importer.go with NewReader. I would appreciate it if you could review my code and verify whether this is a good solution. The idea is sketched below.
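
A minimal sketch of the pattern (not the exact patch); importLines and handleLine are hypothetical names standing in for the importer's read loop and batching logic:

    package main

    import (
        "bufio"
        "fmt"
        "io"
        "strings"
    )

    // importLines sketches the NewScanner -> NewReader replacement:
    // ReadString accumulates each line regardless of its length, so a
    // 400 KB line-protocol entry no longer fails.
    func importLines(r io.Reader, handleLine func(string)) error {
        reader := bufio.NewReader(r)
        for {
            line, err := reader.ReadString('\n')
            if len(line) > 0 {
                handleLine(strings.TrimRight(line, "\n"))
            }
            if err == io.EOF {
                return nil // last (possibly unterminated) line handled above
            }
            if err != nil {
                return err
            }
        }
    }

    func main() {
        input := "cpu,host=a value=1\n" + strings.Repeat("x", 400000) + "\n"
        _ = importLines(strings.NewReader(input), func(l string) {
            fmt.Println(len(l)) // 18, then 400000
        })
    }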
