
Memory Leak when Slicing Dataset #1176

Closed

colehurwitz opened this issue Feb 14, 2019 · 19 comments

Comments

@colehurwitz

To whom it may concern,

I recently ran into an issue where slicing an HDF5 dataset (the same way I would slice a numpy array) kept filling up my RAM until I had to kill the program.

Here are the commands I entered:

dataset = h5py.File(dataset_directory + recording_name)
print(dataset['3BData/Raw'][0:1000:2])

I basically tried to slice out every other element in the dataset, but the read never completed and it filled my entire RAM. Here are the dataset details:

<HDF5 dataset "Raw": shape (3224391680,), type "<u2">

Here are the specifications I am using:

python -c 'import h5py; print(h5py.version.info)'
h5py 2.8.0
HDF5 1.10.2
Python 3.7.1 (default, Oct 23 2018, 19:19:42)
[GCC 7.3.0]
sys.platform linux
sys.maxsize 9223372036854775807
numpy 1.15.3

@aparamon
Member

aparamon commented Feb 15, 2019

Hi @colehurwitz31!
How much RAM do you have on your instance?
Your slice (every other element in the dataset) needs ~3GB RAM.

@colehurwitz
Author

I had about 70 GB of RAM (I was working on a remote server with plenty of space). When I did the slicing procedure, memory usage quickly increased from 5 GB all the way up to 70 GB before I terminated the program.

@tacaswell
Member

What is the chunking on that dataset?

Can you pull up the whole dataset successfully?

@colehurwitz
Author

I can pull up the whole dataset successfully. I am not sure what the chunking is though.
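
For reference, h5py exposes the chunk layout directly on the dataset object. A minimal sketch, reusing the path variables from the original report:

import h5py

with h5py.File(dataset_directory + recording_name, "r") as f:
    dset = f["3BData/Raw"]
    print(dset.chunks)       # chunk shape, or None if the layout is contiguous
    print(dset.compression)  # e.g. "gzip", or None if uncompressed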

@tacaswell
Member

RE chunking see https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/index.html

print(dataset['3BData/Raw'][0:1000:2]) # fails
print(dataset['3BData/Raw'][0:1000][::2])  # works?

My guess is something is wrong in

def __getitem__(self, args):

@colehurwitz31 Can you provide a script to generate a file that fails in this way (random data should be fine).

This is super weird and troubling...

@vasole
Contributor

vasole commented Feb 17, 2019

I'm trying to imagine what could make one work and the other one not.

The only thing that comes to my mind is that in the second case one reads a contiguous buffer and takes one element out of two, while in the first case one may be forced to allocate a destination buffer and copy into it element by element. If reading each element forces the library to read a big chunk of data instead of just that one element, that could explain the huge memory usage.

@vasole
Contributor

vasole commented Feb 17, 2019

This reproduces the issue on my Windows machine. The code has to be run twice.

Of course I would never do such things; it was just to reproduce the problem.

import os
import h5py
import numpy

fname = "dummy.h5"
length = 100000000
if not os.path.exists(fname):
    # First run: generate a dataset stored as a single huge gzip'd chunk.
    data = numpy.random.random(length)
    h5 = h5py.File(fname, "w")
    dset = h5.create_dataset("data", shape=(length,),
                             dtype=numpy.float64,
                             chunks=(length,),
                             compression="gzip")
    dset[:] = data
    h5.close()
    data = None
    print("GENERATED")
else:
    # Second run: the contiguous read plus numpy striding works, but the
    # same strided selection at the dataset level blows up.
    h5 = h5py.File(fname, "r")
    print("READING SAFE")
    print(h5["/data"][0:length][::2])
    print("READING UNSAFE")
    print(h5["/data"][0:length:2])
    h5.close()

@tacaswell
Member

I have a story for this:

  • in the "read it all" case, we pull up the one chunk from the file, copy it to the waiting numpy buffer, and then use numpy striding to get back a view of every other element.
  • in the slice-at-dataset-level case we are constructing an hdf5 selector that is going to walk through and fill in just the data we need
    • we may not be constructing that selector in the optimal way?
    • because there is only one (very large) chunk, what may be happening is: we pull it up to get the first element, then because it does not fit in the chunk cache it gets mostly discarded; for the second element the chunk is not in the cache, so it gets pulled up again, the second element is read out, and because it again does not fit in the cache it gets mostly discarded, and so on. Eventually we end up with N copies of the single chunk in some sort of purgatory and OOM the machine.

Not clear if the error is on the h5py side or the hdf5 side.
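
If that chunk-cache story is right, two things may work around it on the h5py side: read the wanted range contiguously and stride in numpy, or enlarge the raw-data chunk cache when opening the file (the rdcc_nbytes argument is available in h5py 2.9+). A rough sketch of both, not a verified fix for the underlying bug:

import h5py

# Workaround 1: contiguous read, then numpy striding
# (the "[0:length][::2]" pattern that already works above).
with h5py.File("dummy.h5", "r") as f:
    every_other = f["/data"][0:1000][::2]

# Workaround 2 (h5py >= 2.9): a chunk cache large enough to hold the
# single huge chunk, so it is not fetched again for every element.
with h5py.File("dummy.h5", "r", rdcc_nbytes=1024**3) as f:
    every_other = f["/data"][0:1000:2]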

@colehurwitz
Author

Thanks for making a script to reproduce the error @vasole and thanks @tacaswell for looking into it. Hopefully a good solution is found!

@aparamon
Member

Apparently, the problem is in the HDF5 library; reported upstream.

@colehurwitz
Author

colehurwitz commented Feb 20, 2019

That is good to know! Is there a place where I can follow updates for this library? Thanks for the help!

@aparamon
Member

aparamon commented Feb 20, 2019

@colehurwitz31 I have sent a report to https://forum.hdfgroup.org but it's currently down so I can't get the link...
Hopefully it is back up soon, the report is read, and a JIRA issue is created by the HDF Group members.

@epourmal

Hello, H5Pyers,

Looks like the forum is down and I reported it to our sysadmin. It is a little bit early here ;-)

Just one comment: please make sure that you close all handles. It is a known issue with the HDF5 hyperslab selection code that internal data structures grow and are not released until the library is closed. Please send us a C example that reproduces the issue.

Thank you!
Elena
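
On the h5py side, a minimal way to follow that advice is to close the file handle explicitly (or use the file as a context manager) so that no HDF5 identifiers stay open between reads. A small sketch:

import h5py

f = h5py.File("dummy.h5", "r")
try:
    subset = f["/data"][0:1000:2]
finally:
    f.close()  # releases the file handle and any identifiers it owns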

@aparamon
Member

@epourmal Good morning! :-)
Here you go:

#include <stdio.h>
#include <stdlib.h>
#include "hdf5.h"

int main() {
   const hsize_t size = 100000000L;
   hid_t fapl, file, dataset, dcpl, memspace, dataspace;
   herr_t status;
   hsize_t start = 0;
   hsize_t stride = 2;
   hsize_t halfsize = size/2;
   float *data;

   printf("Creating data file...");
   file = H5Fcreate("dummy.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

   dataspace = H5Screate_simple(1, &size, NULL);
   dcpl = H5Pcreate(H5P_DATASET_CREATE);
   status = H5Pset_chunk(dcpl, 1, &size);
   status = H5Pset_deflate(dcpl, 3);

   dataset = H5Dcreate2(file, "data", H5T_INTEL_F32, dataspace,
                        H5P_DEFAULT, dcpl, H5P_DEFAULT);

   status = H5Sclose(dataspace);
   status = H5Pclose(dcpl);

   data = malloc(sizeof(float)*size);
   for(hsize_t i=0; i<size; i++)
     data[i] = (float)rand()/(float)RAND_MAX;
   memspace = H5Screate_simple(1, &size, NULL);
   dataspace = H5Dget_space(dataset);
   status = H5Dwrite(dataset, H5T_NATIVE_FLOAT, memspace, dataspace, H5P_DEFAULT, data);
   status = H5Sclose(dataspace);
   status = H5Sclose(memspace);
   free(data);

   status = H5Dclose(dataset);
   status = H5Fclose(file);
   printf(" done.\n");

   printf("Loading data...");
   file = H5Fopen("dummy.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
   dataset = H5Dopen2(file, "data", H5P_DEFAULT);

   data = malloc(sizeof(float)*halfsize);
   memspace = H5Screate_simple(1, &halfsize, NULL);
   dataspace = H5Dget_space(dataset);
   status = H5Sselect_hyperslab(dataspace, H5S_SELECT_SET, &start, &stride, &halfsize, NULL);
   status = H5Dread(dataset, H5T_NATIVE_FLOAT, memspace, dataspace, H5P_DEFAULT, data);
   status = H5Sclose(dataspace);
   status = H5Sclose(memspace);
   free(data);

   status = H5Dclose(dataset);
   status = H5Fclose(file);
   printf(" done.\n");
}
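
For anyone who wants to try the snippet, it should build with the h5cc compiler wrapper shipped with HDF5, e.g. h5cc repro.c -o repro (where repro.c is whatever name the file is saved under).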

@epourmal

Thank you! Forwarded to our Helpdesk.

Which version of HDF5 are you using? Did you try the same program without compression? Which version of zlib is used? More detailed information will help.

Elena

@aparamon
Member

aparamon commented Feb 20, 2019

@epourmal Reproducible for me on Windows, HDF5 1.10.4.
Commenting out just
status = H5Pset_deflate(dcpl, 3);
doesn't help; only removing
status = H5Pset_chunk(dcpl, 1, &size);
makes it run nicely.
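
As a practical aside (an assumption on my side, not something tested in this thread): if the data can be rewritten, using many modest chunks instead of one chunk spanning the whole dataset keeps compression while avoiding the pathological layout, e.g.:

import h5py
import numpy

length = 100000000
data = numpy.random.random(length)

with h5py.File("dummy_rechunked.h5", "w") as f:
    # Modest chunks (or chunks=True to let h5py pick a shape) instead of
    # a single chunk covering the entire dataset.
    f.create_dataset("data", data=data, chunks=(1000000,), compression="gzip")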

@aparamon
Member

Please track upstream report at
https://jira.hdfgroup.org/browse/HDFFV-10709

@tacaswell
Member

I'm going to close this as it is an upstream bug.

@epourmal

Interesting but doesn't make sense :-) We will investigate.
