NegativeArraySizeException when reading data from binary table #131

Closed
vforchi opened this issue Jul 6, 2018 · 8 comments
Labels: enhancement (A new feature and/or an improved capability)

@vforchi
Contributor

vforchi commented Jul 6, 2018

I have a big binary table (70 million rows), and when I use the following method:
fits.getHDU(1).getColumn(0)

I get the following exception:

java.lang.NegativeArraySizeException
at nom.tam.util.ArrayFuncs.newInstance(ArrayFuncs.java:480)
at nom.tam.fits.BinaryTable$ColumnDesc.newInstance(BinaryTable.java:152)
at nom.tam.fits.BinaryTable.createTable(BinaryTable.java:1306)
at nom.tam.fits.BinaryTable.getData(BinaryTable.java:645)
at nom.tam.fits.BinaryTable.ensureData(BinaryTable.java:1360)
at nom.tam.fits.BinaryTable.getColumn(BinaryTable.java:633)
at nom.tam.fits.TableHDU.getColumn(TableHDU.java:275)

This is because ArrayFuncs.newInstance takes an int for the dimension, and the second column is a string with 30 characters, so the flattened char array for that column needs more elements than an int can represent.
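To make the failure mode concrete, here is a minimal sketch with an assumed row count (the exact count and the code path inside ArrayFuncs may differ): once rows × width for the text column exceeds Integer.MAX_VALUE, int arithmetic wraps the size to a negative value, which is what produces the NegativeArraySizeException.

```java
public class OverflowDemo {
    public static void main(String[] args) {
        long rows = 72_000_000L;  // hypothetical row count, roughly the scale reported above
        int width = 30;           // fixed-width text column: 30 characters per row

        long flatSize = rows * width;  // 2_160_000_000, above Integer.MAX_VALUE (2_147_483_647)
        int intSize = (int) flatSize;  // wraps to a negative int

        System.out.println(flatSize);  // 2160000000
        System.out.println(intSize);   // -2134967296

        // java.lang.reflect.Array.newInstance(char.class, intSize)
        // would throw java.lang.NegativeArraySizeException here.
    }
}
```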

Note that the column I am trying to read contains integers, so its size fits well within the int range: is there a way for me to read just one column without the library trying to read all the others?

@vforchi
Contributor Author

vforchi commented Dec 19, 2018

Is there any news on this? I have a workaround, but it is considerably slower and more verbose than the other approach.

@attipaci
Collaborator

I'd be happy to work with you on this if you are still interested, especially if you have a specific solution (maybe even a pull request) in mind... Let me know...

@attipaci attipaci added the enhancement A new feature and/or an improved capability label Sep 24, 2021
@vforchi
Contributor Author

vforchi commented Sep 25, 2021

If I remember the problem correctly, I'm afraid there is no simple solution:
ArrayFuncs.newInstance takes int because Array.newInstance takes int, so there is no way around that.
I think the problem was that char columns are mapped to char[] instead of String[], so there is a limitation that the number of rows times the size of a text column must be smaller than MAXINT.
This is, in my opinion, a pretty strong limitation, but removing it would probably require a lot of low-level changes.

@attipaci
Collaborator

I did take a peek last night, and getMemoryRow(int) does use Java arrays, as you mention, to locate the data -- so once the BinaryTable has been fully read, supporting larger tables would require a major rewrite of how we store and access column data. However, getFileRow() should readily handle larger values, since it repositions in the file to the element to read (which uses long)... I think there is a rationale to that: if you have more than 2^31 rows, you probably don't want to read them all at once anyway... So, perhaps, we can support more rows by:

  1. Changing getRow(int) to getRow(long)...
  2. For tables with more than 2^31 rows, we don't even attempt to read the table into RAM...
  3. Having getRow(long) throw an ArrayIndexOutOfBoundsException whenever the requested row is beyond the available rows (whether loaded in RAM or not)

What do you think?
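For concreteness, a rough sketch of what the long-indexed access described above could look like (names and shape are hypothetical, not the current nom-tam-fits API):

```java
// Hypothetical sketch of the proposal above; not an existing nom-tam-fits interface.
public interface LongRowAccess {

    /** Total number of rows, which may exceed Integer.MAX_VALUE. */
    long getNRows();

    /**
     * Returns one row, reading it directly from the file when the table is too
     * large to be held in memory as Java arrays.
     *
     * @throws ArrayIndexOutOfBoundsException if the requested row is outside
     *         the table, whether or not the data is loaded in RAM
     */
    Object[] getRow(long row) throws java.io.IOException;
}
```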

@vforchi
Contributor Author

vforchi commented Sep 25, 2021

I don't think this would solve my problem: getColumn calls ensureData.
Also, I'm not asking for more than 2^31 rows; my file had only 70 million. The problem is in how char columns are handled.

@attipaci
Collaborator

I see. Maybe we can think about a better way to store column data, other than primitive Java arrays. For example, I do think that, rather than using a raw char[] array, column data should have its own object type (e.g. a TableColumn class), which abstracts the underlying storage from the user API. Then we would have more freedom to tweak how that class stores data internally in a more optimal way... I think it's worth thinking about. Maybe for a future release (1.16 or later), since it would inevitably affect the API in more fundamental ways...
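As a strawman for the column abstraction idea above (all names are hypothetical; nothing like this exists in the library yet):

```java
// Strawman only: a column type that hides the backing storage from callers, so a
// fixed-width text column could be backed by String[] chunks, a memory-mapped
// buffer, or deferred file access without changing the user-facing API.
public interface TableColumn<T> {

    /** Number of entries in the column; not limited to the int range. */
    long size();

    /** The element type handed to callers, e.g. String.class for text columns. */
    Class<T> getElementType();

    /** Fetches a single entry by row index. */
    T get(long row);

    /** Replaces a single entry. */
    void set(long row, T value);
}
```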

@attipaci attipaci unpinned this issue Sep 26, 2021
@vforchi
Contributor Author

vforchi commented Sep 26, 2021

That is certainly a good approach, but it might require a bigger refactoring. Strictly speaking, I think we just need to replace the char[] with String[], but that might cause performance issues when reading the table.
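A minimal sketch of that idea, assuming the column's fixed-width ASCII bytes are handed to the reader contiguously (in the real file the rows interleave all columns, so a proper reader would have to skip the other fields); the helper below is made up for illustration:

```java
import java.io.DataInput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustration only: decode a fixed-width text column into one String per row,
// so no single array ever needs rows * width elements.
final class TextColumnReader {

    static String[] readTextColumn(DataInput in, int rows, int width) throws IOException {
        String[] column = new String[rows];
        byte[] buf = new byte[width];
        for (int i = 0; i < rows; i++) {
            in.readFully(buf);  // one fixed-width field per row
            column[i] = new String(buf, StandardCharsets.US_ASCII).trim();
        }
        return column;
    }
}
```

The per-row String allocation is where the extra reading cost would show up on tens of millions of rows.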

@attipaci
Collaborator

That might work too. But whatever we do, we'll be breaking backward compatibility for getColumn() to some degree. That means (a) we have to tread carefully, and (b) a definite bump in the version number, e.g. 1.16 at the earliest.

If we do that, I'd vote for the better, more comprehensive fix rather than the quick fix... If interested, you could go ahead and play with some ideas on branches of your fork. It would be nice if you tried some benchmarking too, to see how much performance is affected by the changes. It's unlikely that performance will change so much as to be a showstopper, but it would be good to know anyway...
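A quick-and-dirty timing sketch along those lines (illustrative only; a real comparison should use a harness like JMH and the library's actual read path):

```java
import java.nio.charset.StandardCharsets;

// Rough comparison of filling one flat char[] versus one String per row for a
// fixed-width text column; numbers are made up and only meant as a starting point.
public final class ColumnStorageBench {

    public static void main(String[] args) {
        final int rows = 2_000_000;
        final int width = 30;
        byte[] raw = "just thirty characters of text".getBytes(StandardCharsets.US_ASCII);

        long t0 = System.nanoTime();
        char[] flat = new char[rows * width];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < width; c++) {
                flat[r * width + c] = (char) raw[c];
            }
        }
        long t1 = System.nanoTime();

        String[] perRow = new String[rows];
        for (int r = 0; r < rows; r++) {
            perRow[r] = new String(raw, StandardCharsets.US_ASCII);
        }
        long t2 = System.nanoTime();

        // Touch the results so the loops are not trivially optimized away.
        System.out.println(flat[flat.length - 1] + " / " + perRow[rows - 1].length());
        System.out.printf("char[]   fill: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("String[] fill: %d ms%n", (t2 - t1) / 1_000_000);
    }
}
```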

@attipaci attipaci pinned this issue Sep 26, 2021
@attipaci attipaci unpinned this issue Oct 16, 2021
@attipaci attipaci closed this as not planned Mar 28, 2023
@attipaci attipaci added this to the 1.18.0 milestone Jul 5, 2023
@attipaci attipaci self-assigned this Jul 5, 2023