
Caused by: java.io.IOException: Maximum buffer size 8388608 is not enough to read data #52

Closed
manishlogan opened this issue Feb 5, 2021 · 4 comments

Comments

@manishlogan

Describe the bug
I am trying to read a file of approximately 340 MB. After reaching line 809, I get this error:
Caused by: java.io.IOException: Maximum buffer size 8388608 is not enough to read data

To Reproduce
Try reading a CSV file with 800k rows.

Code:

CsvReader reader = CsvReader.builder()
            .fieldSeparator('\t')
            .quoteCharacter('"')
            .commentStrategy(CommentStrategy.NONE)
            .skipEmptyRows(true)
            .errorOnDifferentFieldCount(true)
            .build(path, charset);

reader.forEach(System.out::println);

Additional context
java version "1.8.0_201"

@osiegmar
Owner

osiegmar commented Feb 5, 2021

Because FastCSV streams its input, the file size (in MB or in lines) doesn't matter. This error occurs when a single field contains more than 8 megabytes (the 8388608 limit in the error message), which happens if the field has a starting quote but no ending quote. So please inspect the file at the reported line number.
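
To narrow down where such a field might be, a rough pre-check can scan the file for fields that begin with the quote character but do not end with it. The following is only a heuristic sketch in plain Java, not part of FastCSV: the class and method names are hypothetical, it assumes tab-separated fields as in the report above, and it ignores legitimate cases such as quoted fields that contain a tab or a line break.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of FastCSV: flags lines that contain a field
// starting with the quote character but not ending with it.
final class UnclosedQuoteFinder {

    /** Returns true if any field of the line opens with a quote and does not close with one. */
    static boolean hasUnclosedQuote(String line, char separator, char quote) {
        // Split on the separator, keeping trailing empty fields (-1).
        String[] fields = line.split(java.util.regex.Pattern.quote(String.valueOf(separator)), -1);
        for (String field : fields) {
            if (field.isEmpty() || field.charAt(0) != quote) {
                continue; // unquoted field: nothing to check
            }
            boolean closed = field.length() > 1 && field.charAt(field.length() - 1) == quote;
            if (!closed) {
                return true;
            }
        }
        return false;
    }

    /** Returns the 1-based numbers of all lines flagged by hasUnclosedQuote. */
    static List<Integer> suspiciousLines(List<String> lines, char separator, char quote) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (hasUnclosedQuote(lines.get(i), separator, quote)) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<String> sample = Arrays.asList(
                "1\t{\"limit\":\"1\"}\tok",
                "2\t\"unterminated\tvalue",
                "3\t\"quoted\"\tvalue");
        System.out.println(suspiciousLines(sample, '\t', '"')); // prints [2]
    }
}
```

For a real file, the lines could be streamed with `java.nio.file.Files.lines(path, charset)` instead of being held in a list, so the check itself does not need much memory.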

@manishlogan
Author

Hi @osiegmar ,

Thanks for the quick reply.

I will verify the data again, but at first glance that doesn't seem to be the case. There are double quotes, but they all have matching closing double quotes. Examples below:
{"countrySet":"GB%2CIE","limit":"1"} {"countrySet":"GB%2CIE","limit":"1"} {"countrySet":"GB%2CIE","limit":"1"}

@osiegmar
Owner

osiegmar commented Feb 5, 2021

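So the values in your CSV file are JSON documents? JSON and CSV escape double quotes differently: JSON escapes a double quote inside a string with a backslash, whereas CSV requires the whole field to be wrapped in double quotes and every double quote inside it to be doubled.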

Your first field ({"countrySet":"GB%2CIE","limit":"1"}) would have to be
"{""countrySet"":""GB%2CIE"",""limit"":""1""}"
in the CSV file to be valid.

If the JSON itself contains a double-quote (e.g. {"countrySet":"double \" quote","limit":"1"}) it would have to be
"{""countrySet"":""double \"" quote"",""limit"":""1""}"
in the CSV file.
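
If the file is generated by your own code, the quoting rule above can be applied when writing each field. This is a minimal sketch in plain Java, assuming the double quote as quote character; `CsvQuoting` and `quoteField` are hypothetical names, not FastCSV API. A CSV writer library would normally take care of this for you.

```java
// Hypothetical helper, not part of FastCSV: wraps a value in double quotes
// and doubles every double quote inside it, as CSV quoting requires.
final class CsvQuoting {

    static String quoteField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // The two JSON values discussed above, written as Java string literals.
        String plain = "{\"countrySet\":\"GB%2CIE\",\"limit\":\"1\"}";
        String withEscapedQuote = "{\"countrySet\":\"double \\\" quote\",\"limit\":\"1\"}";

        System.out.println(quoteField(plain));
        System.out.println(quoteField(withEscapedQuote));
    }
}
```

Note that the JSON backslash is left untouched: only the double quote characters are doubled, so the JSON document is unchanged once the CSV reader has unquoted the field.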

@manishlogan
Author

Okay. Thanks for the detailed feedback. Really appreciate it.
