
Buffer overflow occurs when a large compressed file is imported to influx #8986

Closed

lets00 opened this issue Oct 19, 2017 · 7 comments

@lets00 (Contributor) commented Oct 19, 2017

Description

I needed to export my data using the influx_inspect program. I used this command to generate an output file in line protocol format:

influx_inspect export -database monasca -datadir monasca_influx/data/ -retention autogen -waldir monasca_influx/wal -out monasca.dmp -start 2017-10-08T00:00:00Z -end 2017-10-08T23:59:59Z -compress

The compressed file is 950 MB. When I try to import it using the influx command (influx -import -compressed -path=monasca.dmp), this error occurs:

ERROR: reading standard input: bufio.Scanner: token too long

Bug report

System info:
Ubuntu 16.04
Influx 1.3.5 and Influx 1.1.0
Database full
24GB RAM

Steps to reproduce:

  1. Export a database to a line protocol file in compressed mode (the output file must be larger than 600 MB)
  2. Try to import that file using the influx import command

Expected behavior:

The line protocol series are added to InfluxDB.

Actual behavior:

Nothing is added to InfluxDB.

Additional info:

I looked at the code (https://github.com/influxdata/influxdb/blob/master/importer/v8/importer.go#L95-L112). When the file is decompressed, the final size is probably larger than the available RAM on the system. Chunking the decompression into smaller parts of the file might solve the problem (e.g. read 1024 bytes of the file, decompress them, add them to the DB, and so on), as sketched below.
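
To illustrate the chunked idea (a minimal Go sketch, not the importer's actual code; processChunk is a hypothetical stand-in for the line-protocol handling):

    package main

    import (
        "compress/gzip"
        "io"
        "log"
        "os"
    )

    func main() {
        f, err := os.Open("monasca.dmp")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // compress/gzip decompresses as a stream, so the uncompressed
        // payload never has to fit in RAM all at once.
        gz, err := gzip.NewReader(f)
        if err != nil {
            log.Fatal(err)
        }
        defer gz.Close()

        buf := make([]byte, 1024) // 1024-byte chunks, as suggested above
        for {
            n, err := gz.Read(buf)
            if n > 0 {
                processChunk(buf[:n]) // hypothetical: parse and write points here
            }
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
        }
    }

    // processChunk is a hypothetical stand-in for the importer's
    // line-protocol handling.
    func processChunk(b []byte) { _ = b }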

@e-dard (Contributor) commented Oct 20, 2017

@lets00 this is probably our fault for the implementation we used to scan import files. I think you have one or more particularly long lines in your import file.

If you're on Ubuntu, can you run $ wc -L monasca.dmp and see how long the longest line is?

In order for the importer to accept this we'll need to fix the importer, or you'll need to remove the long lines from the input (which isn't particularly practical).

@lets00 (Contributor, Author) commented Oct 20, 2017

@e-dard Thank you for answering me. I ran the command; the output is below:

$ wc -L monasca.dmp
1678 monasca.dmp

The max line length in this file is 1678. The problem with removing the long lines is that, uncompressed, this file is extremely large (1 GB becomes something like 200 GB), and trying to edit all the long lines manually is impractical.

@e-dard (Contributor) commented Oct 20, 2017

@lets00 is that the line length of the compressed file? I should have been more specific: what would be interesting to know is the max length of a line in the uncompressed (line protocol) format. If the file is too big to handle, then don't worry about it.

The other thing I noticed from your initial comment is that you didn't use the -compressed flag when doing the import. You need to use -compressed if you're importing compressed data.

@lets00 (Contributor, Author) commented Oct 20, 2017

@e-dard I used -compressed when I tried to import (I updated the issue to add this information; I forgot to include it originally). The import command that I executed was:

$ influx -import -compressed -path=monasca.dmp

I will try to get this information (max line length) for the uncompressed file.

@lets00 (Contributor, Author) commented Oct 20, 2017

@e-dard More information about the uncompressed file: its max line length is 411511.

$ wc -L monasca_uncompressed.dmp
411511 monasca_uncompressed.dmp

The compressed file's max line length is 1678 (I did not specify that before).

$ wc -L monasca_compressed.dmp
1678 monasca_compressed.dmp

@e-dard (Contributor) commented Oct 20, 2017

@lets00 OK, that's as I suspected then. By default, bufio.Scanner can only handle lines <= 64K, and that line is way over that. We can fix this by using a different reader from the std library.
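
For reference, a minimal Go sketch of both the failure mode and the std-library alternative; the 64K default cap is the bufio.MaxScanTokenSize constant:

    package main

    import (
        "bufio"
        "fmt"
        "strings"
    )

    func main() {
        // One line just over the scanner's default cap (bufio.MaxScanTokenSize, 64 KiB).
        long := strings.Repeat("x", bufio.MaxScanTokenSize+1) + "\n"

        sc := bufio.NewScanner(strings.NewReader(long))
        for sc.Scan() {
        }
        fmt.Println(sc.Err()) // bufio.Scanner: token too long

        // bufio.Reader.ReadString accumulates across buffer refills,
        // so the same line reads back intact.
        r := bufio.NewReader(strings.NewReader(long))
        line, err := r.ReadString('\n')
        fmt.Println(len(line), err) // 65538 <nil>
    }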

e-dard added this to the 1.4.0 milestone Oct 20, 2017
lets00 added a commit to lets00/influxdb that referenced this issue Oct 23, 2017
@lets00 (Contributor, Author) commented Oct 24, 2017

@e-dard I wrote a solution for my problem, replacing the NewScanner call in importer/v8/importer.go with NewReader. I would appreciate it if you could review my code and verify whether this is a good solution. The idea is sketched below.
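
A minimal sketch of the pattern (not the exact patch); importLines and handleLine are hypothetical names standing in for the importer's read loop and batching logic:

    package main

    import (
        "bufio"
        "fmt"
        "io"
        "strings"
    )

    // importLines sketches the NewScanner -> NewReader replacement:
    // ReadString accumulates each line regardless of its length, so a
    // 400 KB line-protocol entry no longer fails.
    func importLines(r io.Reader, handleLine func(string)) error {
        reader := bufio.NewReader(r)
        for {
            line, err := reader.ReadString('\n')
            if len(line) > 0 {
                handleLine(strings.TrimRight(line, "\n"))
            }
            if err == io.EOF {
                return nil // last (possibly unterminated) line handled above
            }
            if err != nil {
                return err
            }
        }
    }

    func main() {
        input := "cpu,host=a value=1\n" + strings.Repeat("x", 400000) + "\n"
        _ = importLines(strings.NewReader(input), func(l string) {
            fmt.Println(len(l)) // 18, then 400000
        })
    }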
