
Unable to read dataset larger than Integer.MAX_VALUE bytes #485

Open
Tracked by #4
jbellis opened this issue Jul 30, 2023 · 1 comment
Labels
bug Something isn't working

Comments


jbellis commented Jul 30, 2023

Exception: Failed to map data buffer for dataset '/train'
        at org.example.Texmex.lambda$main$3(Texmex.java:110)
        at java.base/java.lang.Thread.run(Thread.java:1623)
Caused by: io.jhdf.exceptions.HdfException: Failed to map data buffer for dataset '/train'
        at io.jhdf.dataset.ContiguousDatasetImpl.getDataBuffer(ContiguousDatasetImpl.java:44)
        at io.jhdf.dataset.DatasetBase.getData(DatasetBase.java:133)
        at org.example.Texmex.computeRecallFor(Texmex.java:70)
        at org.example.Texmex.lambda$main$3(Texmex.java:108)
        ... 1 more
Caused by: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
        at java.base/sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:1185)
        at io.jhdf.storage.HdfFileChannel.mapNoOffset(HdfFileChannel.java:74)
        at io.jhdf.storage.HdfFileChannel.map(HdfFileChannel.java:66)
        at io.jhdf.dataset.ContiguousDatasetImpl.getDataBuffer(ContiguousDatasetImpl.java:40)

The dataset in question is 3848008288 bytes (http://ann-benchmarks.com/deep-image-96-angular.hdf5).

jamesmudd (Owner) commented

Thanks for raising this and providing a sample file. This is currently a limitation with contiguous datasets; it would be possible to split the mapping up and read contiguous datasets more like chunked datasets are read. In theory this would also be a nice way to parallelise the reading and gain performance.
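
For illustration, here is a rough sketch of how a region larger than Integer.MAX_VALUE could be mapped as several segments rather than a single map() call. This is not jhdf code; mapLargeRegion and the segment handling are hypothetical.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class LargeMappingSketch {
	// Hypothetical helper: maps [offset, offset + size) as multiple read-only
	// segments, each no larger than Integer.MAX_VALUE bytes.
	static MappedByteBuffer[] mapLargeRegion(FileChannel channel, long offset, long size) throws IOException {
		int segmentCount = (int) ((size + Integer.MAX_VALUE - 1L) / Integer.MAX_VALUE);
		MappedByteBuffer[] segments = new MappedByteBuffer[segmentCount];
		long position = offset;
		long remaining = size;
		for (int i = 0; i < segmentCount; i++) {
			long segmentSize = Math.min(remaining, Integer.MAX_VALUE);
			segments[i] = channel.map(FileChannel.MapMode.READ_ONLY, position, segmentSize);
			position += segmentSize;
			remaining -= segmentSize;
		}
		return segments;
	}
}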

In the meantime you could try slice reading, via Dataset#getData(long[] sliceOffset, int[] sliceDimensions).

Some code like this seems to work (it is definitely not optimal; it takes about 30 seconds to read on my system):

import io.jhdf.HdfFile;
import io.jhdf.api.Dataset;

import java.lang.reflect.Array;
import java.nio.file.Paths;

public class ReadDataset {
	public static void main(String[] args) {
		try (HdfFile hdfFile = new HdfFile(Paths.get("/path/to/deep-image-96-angular.hdf5"))) {
			Dataset dataset = hdfFile.getDatasetByPath("/train");
			int[] dimensions = dataset.getDimensions();
			float[][] data = (float[][]) Array.newInstance(dataset.getJavaType(), dimensions);
			// Read one row per getData call so each slice stays well under Integer.MAX_VALUE bytes
			for (int i = 0; i < dimensions[0]; i++) {
				data[i] = ((float[][]) dataset.getData(new long[]{i, 0}, new int[]{1, dimensions[1]}))[0];
			}
			System.out.println("Finished read");
		}
	}
}
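
A slightly faster variation on the same idea would be to read several rows per getData call instead of one. This is an untested sketch; the batch size of 4096 rows is an arbitrary choice.

// Inside the try-with-resources block above, replacing the per-row loop.
int batch = 4096;
for (int i = 0; i < dimensions[0]; i += batch) {
	int rows = Math.min(batch, dimensions[0] - i);
	float[][] slice = (float[][]) dataset.getData(new long[]{i, 0}, new int[]{rows, dimensions[1]});
	System.arraycopy(slice, 0, data, i, rows);
}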
