Memory Leak when Slicing Dataset #1176
Hi @colehurwitz31!
I had about 70 GB of RAM (I was working on a remote server with plenty of space). When I did the slicing procedure, memory use quickly increased from 5 GB all the way up to 70 GB before I terminated the program.
What is the chunking on that dataset? Can you pull up the whole dataset successfully?
I can pull up the whole dataset successfully. I am not sure what the chunking is though. |
RE chunking see https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/index.html

```python
print(dataset['3BData/Raw'][0:1000:2])    # fails
print(dataset['3BData/Raw'][0:1000][::2]) # works?
```

My guess is something is wrong in Line 477 in 29605d2
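For an in-memory NumPy array the two expressions are equivalent, which is what makes the difference in behaviour so suspicious; a minimal sketch (the array size and dtype are illustrative, not taken from the original dataset):

```python
import numpy as np

# In-memory analogue of the two h5py reads above. On a NumPy array both
# forms select the same elements; on an h5py Dataset the first performs
# one strided HDF5 hyperslab read, while the second reads a contiguous
# block and strides it afterwards in memory.
a = np.arange(1000, dtype=np.float32)

strided_direct = a[0:1000:2]         # analogue of dataset[0:1000:2]
contig_then_stride = a[0:1000][::2]  # analogue of dataset[0:1000][::2]

assert np.array_equal(strided_direct, contig_then_stride)
assert strided_direct.shape == (500,)
```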
@colehurwitz31 Can you provide a script to generate a file that fails in this way (random data should be fine)? This is super weird and troubling...
I'm trying to imagine what could make one work and the other not. The only thing that comes to mind is that in the second case one reads a contiguous buffer and takes one element out of two, while in the first case one may be forced to allocate a destination buffer and copy element by element into it. If, to read each element, one has to read a big chunk of data instead of just that element, that could explain some huge memory usage.
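A rough back-of-the-envelope illustration of why per-element bookkeeping would blow up at this scale; the 64-byte per-block cost is a hypothetical figure chosen only to show the order of magnitude, not a measured HDF5 internal:

```python
# A stride-2 selection over the 100_000_000-element dataset describes
# 50_000_000 one-element blocks. If the library kept a per-block
# descriptor of, say, 64 bytes (an assumed figure), the selection
# metadata alone would run to gigabytes.
size = 100_000_000
blocks = size // 2
bytes_per_block = 64                      # hypothetical overhead per block
overhead_gb = blocks * bytes_per_block / 10**9

assert blocks == 50_000_000
assert abs(overhead_gb - 3.2) < 1e-9
```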
This reproduces the issue on my Windows machine. The code has to be run twice. Of course I would never do such things; it was just to reproduce the problem.
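The attached reproducer script did not survive this scrape. A minimal h5py sketch of the same pattern, inferred from the C program posted later in the thread (a compressed dataset stored as a single large chunk, then a strided read), might look like this; the file name is a placeholder and the size is deliberately reduced from the original 100_000_000 elements:

```python
import h5py
import numpy as np

SIZE = 1_000_000  # the original report used 100_000_000; reduced here

# Create a gzip-compressed dataset stored as one large chunk.
with h5py.File('dummy.h5', 'w') as f:
    f.create_dataset('data',
                     data=np.random.rand(SIZE).astype(np.float32),
                     chunks=(SIZE,),
                     compression='gzip', compression_opts=3)

# Strided read: selects every other element via one HDF5 hyperslab,
# the access pattern that triggered the runaway memory use.
with h5py.File('dummy.h5', 'r') as f:
    half = f['data'][0:SIZE:2]

assert half.shape == (SIZE // 2,)
```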
I have a story for this:
Not clear if the error is on the h5py side or the HDF5 side.
Thanks for making a script to reproduce the error @vasole and thanks @tacaswell for looking into it. Hopefully a good solution is found!
Apparently, the problem is in the HDF5 library; reported upstream.
That is good to know! Is there a place where I can follow updates for this library? Thanks for the help!
@colehurwitz31 I have sent a report to https://forum.hdfgroup.org but it's currently down so I can't get the link...
Hello, H5Pyers, looks like the forum is down and I reported it to our sysadmin. It is a little bit early here ;-) Just one comment: please make sure that you close all handles. It is a known issue with the HDF5 hyperslab selection code that internal data structures grow and are not released until the library is closed. Please send us a C example that reproduces the issue. Thank you!
@epourmal Good morning! :-)

```c
#include <stdio.h>
#include <stdlib.h>
#include "hdf5.h"

int main() {
    const hsize_t size = 100000000L;
    hid_t fapl, file, dataset, dcpl, memspace, dataspace;
    herr_t status;
    hsize_t start = 0;
    hsize_t stride = 2;
    hsize_t halfsize = size / 2;
    float *data;

    printf("Creating data file...");
    file = H5Fcreate("dummy.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    dataspace = H5Screate_simple(1, &size, NULL);
    dcpl = H5Pcreate(H5P_DATASET_CREATE);
    status = H5Pset_chunk(dcpl, 1, &size);
    status = H5Pset_deflate(dcpl, 3);
    dataset = H5Dcreate2(file, "data", H5T_INTEL_F32, dataspace,
                         H5P_DEFAULT, dcpl, H5P_DEFAULT);
    status = H5Sclose(dataspace);
    status = H5Pclose(dcpl);

    data = malloc(sizeof(float) * size);
    for (hsize_t i = 0; i < size; i++)
        data[i] = (float)rand() / (float)(RAND_MAX / 1.);
    memspace = H5Screate_simple(1, &size, NULL);
    dataspace = H5Dget_space(dataset);
    status = H5Dwrite(dataset, H5T_NATIVE_FLOAT, memspace, dataspace,
                      H5P_DEFAULT, data);
    status = H5Sclose(dataspace);
    status = H5Sclose(memspace);
    free(data);
    status = H5Dclose(dataset);
    status = H5Fclose(file);
    printf(" done.\n");

    printf("Loading data...");
    file = H5Fopen("dummy.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    dataset = H5Dopen2(file, "data", H5P_DEFAULT);
    data = malloc(sizeof(float) * halfsize);
    memspace = H5Screate_simple(1, &halfsize, NULL);
    dataspace = H5Dget_space(dataset);
    status = H5Sselect_hyperslab(dataspace, H5S_SELECT_SET, &start, &stride,
                                 &halfsize, NULL);
    status = H5Dread(dataset, H5T_NATIVE_FLOAT, memspace, dataspace,
                     H5P_DEFAULT, data);
    status = H5Sclose(dataspace);
    status = H5Sclose(memspace);
    free(data);
    status = H5Dclose(dataset);
    status = H5Fclose(file);
    printf(" done.\n");
}
```
Thank you! Forwarded to our Helpdesk. Which version of HDF5 are you using? Did you try the same program without compression? Which version of zlib is used? More detailed information will help. Elena
@epourmal Reproducible for me on Windows, HDF5 1.10.4.
Please track upstream report at |
I'm going to close this as it is an upstream bug. |
Interesting but doesn't make sense :-) We will investigate. |
To whom it may concern,
I recently ran into an issue where, when I tried to slice an HDF5 dataset (similar to how I would slice a NumPy array), my RAM kept filling up until I had to kill the program.
Here are the commands I entered in:
I basically tried to slice out every other element in the dataset, but the operation never completed and it filled my entire RAM. Here are the dataset details:
Here are the specifications I am using:
```
$ python -c 'import h5py; print(h5py.version.info)'
h5py          2.8.0
HDF5          1.10.2
Python        3.7.1 (default, Oct 23 2018, 19:19:42)
              [GCC 7.3.0]
sys.platform  linux
sys.maxsize   9223372036854775807
numpy         1.15.3
```
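The exact slicing commands and dataset details did not survive this scrape, but the workaround implied by the comparison earlier in the thread (read a contiguous block, then stride in NumPy) can be sketched as follows; the file name and toy data are placeholders standing in for the real 3BData/Raw dataset:

```python
import h5py
import numpy as np

# Placeholder file with a small dataset standing in for 3BData/Raw.
with h5py.File('example.h5', 'w') as f:
    f['3BData/Raw'] = np.arange(10_000, dtype=np.float32)

with h5py.File('example.h5', 'r') as f:
    dset = f['3BData/Raw']
    # Strided HDF5 selection (the form that triggered the blow-up):
    #   every_other = dset[0:10_000:2]
    # Workaround: contiguous read, then stride in memory with NumPy.
    every_other = dset[0:10_000][::2]

assert every_other.shape == (5_000,)
```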