
infinite loop when closing file #556

Closed
jreadey opened this issue Apr 2, 2015 · 19 comments

Comments

@jreadey
Contributor

jreadey commented Apr 2, 2015

I'm getting an "infinite loop closing library" error when using libver="latest".

See the example below. Without libver='latest' the file closes fine, but the whole script takes some time to run (~8 seconds on my system). With libver='latest' it is much faster (~0.2 seconds), but I get the infinite-loop message.

This is with h5py 2.4 and hdf5 lib 1.8.14.

import h5py
import time

ATTR_COUNT = 1000
f = h5py.File("attr1k.h5", "w", libver='latest')

create_start = time.time()
print "creating attributes", create_start
# create attributes
for i in range(ATTR_COUNT):
    name = 'a{:04d}'.format(i)
    f.attrs[name] = "this is attribute: " + str(i)

modify_start = time.time()
print "updating attributes", modify_start

# modify
for i in range(ATTR_COUNT):
    name = 'a{:04d}'.format(i)
    f.attrs[name] = "an updated attribute: " + str(i)

modify_end = time.time()
print "done!", modify_end
print "total time: ", (modify_end - create_start)

f.close()
@andrewcollette
Contributor

Could you post the error message? Btw, this may be an HDF5 problem... all the libver keyword does is set a flag on the file access property list.
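For reference, here is a rough low-level sketch of what that amounts to, written with h5py's h5p/h5f API (names taken from the low-level docs; this is an illustration, not h5py's actual internal code):

import h5py
from h5py import h5f, h5p

# Build a file access property list and set the version bounds on it,
# which is essentially what libver='latest' does at the high level.
fapl = h5p.create(h5p.FILE_ACCESS)
fapl.set_libver_bounds(h5f.LIBVER_LATEST, h5f.LIBVER_LATEST)

# Create the file with that property list and wrap the low-level id.
fid = h5f.create(b"attr1k.h5", h5f.ACC_TRUNC, fapl=fapl)
f = h5py.File(fid)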

@jreadey
Contributor Author

jreadey commented Apr 9, 2015

Here's the error message:

HDF5: infinite loop closing library
      D,G,A,S,T,F,FD,P,PL,FD,P,FD,P,E,E,SL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL

Strangely I don't see the error on my Mac, only on Linux.

The equivalent C program doesn't produce an error:


#include "hdf5.h"

#include <stdio.h>
#include <string.h>

#define ATTR_COUNT 1000

int main(void)
{
  hid_t file, fapl, dtype, dspace, attr;
  int i;
  char *wdata[1], name[80];

  wdata[0] = name;

  fapl = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);

  file = H5Fcreate("attr1k.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

  dtype = H5Tcopy(H5T_C_S1);
  H5Tset_size(dtype, H5T_VARIABLE);
  dspace = H5Screate(H5S_SCALAR);

  printf("creating attributes\n");
  for (i = 0; i < ATTR_COUNT; ++i)
    {
      sprintf(name, "a:%04d", i);
      attr = H5Acreate2(file, name, dtype, dspace, H5P_DEFAULT, H5P_DEFAULT);
      sprintf(name, "this is attribute: %d", i);
      H5Awrite(attr, dtype, &wdata);
      H5Aclose(attr);
    }

  printf("updating attributes\n");
  for (i = 0; i < ATTR_COUNT; ++i)
    {
      sprintf(name, "a:%04d", i);
      attr = H5Aopen(file, name, H5P_DEFAULT);
      sprintf(name, "an updated attribute: %d", i);
      H5Awrite(attr, dtype, &wdata);
      H5Aclose(attr);
    }

  /* release HDF5 objects before closing the file */
  H5Sclose(dspace);
  H5Tclose(dtype);
  H5Pclose(fapl);
  H5Fclose(file);

  return 0;
}

Is it possible that libver is exposing some issue on the h5py side?

@andrewcollette
Contributor

Maybe, but I can't think where. Once that value goes into the property list we don't touch it any more. I'll leave this issue open but I'm stumped.

@jreadey
Contributor Author

jreadey commented Apr 14, 2015

Is there some way that h5py can generate a trace of the HDF5 library calls? If we had that, the HDF5 library development team could take a look.

@andrewcollette
Contributor

Not that I know of. You could try running Python in verbose mode, though (python -v).

@greenc-FNAL

The example below is sufficient to reproduce the problem with h5py 2.6.0. The difference between triggering the problem and not (apart from the aforementioned libver='latest') is the number of attributes: 544 will trigger it, 543 will not. The threshold appears to correspond to the switch to "dense" attribute storage supported in HDF5 1.8 and above (see the scan sketch after the script below). The example was tested with HDF5 1.10.0-patch1.

import h5py
f = h5py.File("test.h5", "w", libver="latest")
dset = f.create_dataset("test", shape=(1000,))
dset[:] = 2.6
for counter in xrange(1,544):
    name = "{}".format(counter)
    rref = dset.regionref[5:40]
    dset.attrs[name] = rref
f.close()
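For anyone trying to pin down the threshold on their own machine, here is a small scan sketch (file name and count range are arbitrary choices, not part of the original report). Since the message is emitted while the library is torn down at interpreter exit, each count is run in a fresh child process and its stderr is checked:

import subprocess
import sys
import textwrap

# Child script: the minimal repro, parameterized by attribute count.
CHILD = textwrap.dedent("""
    import sys, h5py
    n = int(sys.argv[1])
    f = h5py.File("scan.h5", "w", libver="latest")
    for i in range(n):
        f.attrs["a{:04d}".format(i)] = "value " + str(i)
    f.close()
""")

for count in range(500, 601, 10):
    proc = subprocess.run([sys.executable, "-c", CHILD, str(count)],
                          capture_output=True, text=True)
    if "infinite loop closing library" in proc.stderr:
        print("message first seen at", count, "attributes")
        break
else:
    print("no message seen up to 600 attributes")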

@greenc-FNAL

Apparently the magic number shifted after a reboot, so it does not appear to be a hard threshold. In my post-reboot checks, 549 is now the number needed to trigger the problem on my OS X Yosemite MacBook Pro; YMMV.

@derobins

I see this failure with numbers as low as 50 and it appears that the failure is random but becomes more likely as the number of attributes increases.

@subiol

subiol commented Mar 10, 2017

I am seeing the same bug appear randomly, have not found a pattern. This is the error I get:

HDF5: infinite loop closing library
D,G,S,T,F,FD,P,PL,FD,P,FD,P,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E

Has anyone looked into this?

@tacaswell
Member

I wonder if this is fixed by the gc fencing in #903
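For anyone unfamiliar with the term, the general idea behind that fencing is to keep the cyclic garbage collector from running while HDF5 identifiers are being released, so teardown order stays deterministic. A rough illustration of the concept only, not the actual code in #903:

import gc
from contextlib import contextmanager

@contextmanager
def gc_fence():
    # Keep the cyclic GC from firing while HDF5 ids are being closed.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

# usage sketch:
# with gc_fence():
#     some_h5py_file.close()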

@tacaswell
Member

I also cannot reproduce this with py3.6, h5py 2.7, hdf5 1.8.18 (even with 100000 attributes).

@jreadey
Contributor Author

jreadey commented Jul 28, 2017

I get this error periodically when h5serv (https://github.com/HDFGroup/h5serv) shuts down. I've talked about this with the hdf5 library team, but we haven't been able to isolate the issue.

At least it doesn't seem to have any adverse effects.

@project-tuva

project-tuva commented Sep 26, 2018

Hi all, I've written a simple Python module to manage reading from and writing to HDF5 files, and I'm experiencing the same issue. I'm on Ubuntu 16.04.4 LTS with HDF5 1.10.2.

It seems to be more frequent when attributes are added to a dataset, or when the dataset was not already present before launching make.

Here is the repo: https://github.com/project-tuva/h5bug
There are two branches: master and buggy. You can run a minimal example by issuing make, after modifying the variables in the makefile according to your environment.

It seems that a solution was found for this bug:
IntelPython/sdc#2 (comment)

How can I get rid of this error?

Thanks in advance

@tacaswell
Member

I can reproduce this with h5py 2.9 and hdf5 1.10.4 (from Arch packages), but with a different error pattern. I do not think this happens when closing the file, but when tearing down the library. Using a slightly simpler script:

import h5py
import time

ATTR_COUNT = 600
f = h5py.File("attr1k.h5", "w", libver='latest')

create_start = time.time()
print("creating attributes", create_start)
# create attributes
for i in range(ATTR_COUNT):
    name = 'a{:04d}'.format(i)
    f.attrs[name] = "this is attribute: " + str(i)

f.close()
print('closed file?!')
print('about to tear things down')

Output:

creating attributes 1552357400.43879
closed file?!
about to tear things down
HDF5: infinite loop closing library
      L,T_top,P,P,Z,FD,E,SL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL


However, it does not fail every time, which suggests to me there is a race condition between de-allocating the guts of HDF5 and the Python side. With smaller numbers of attributes you still see this, just at lower rates (from eyeballing it), as reported above by @derobins and @chissg.

This may be fixed by / related to https://bitbucket.hdfgroup.org/projects/HDFFV/repos/hdf5/commits/f808c108ed0315f115a7c69cbd8ee95032a64b34, which looks like it got merged to the 1.10 branch in https://bitbucket.hdfgroup.org/projects/HDFFV/repos/hdf5/commits/489f6fb69711ef7f26f4c13ad863438779f654b8, which is post 1.10.4 and pre 1.10.5 (I think? I am a bit confused by the tagging scheme).

Unfortunately I'm out of bandwidth to build 1.10.5 locally to test this tonight.
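In the meantime, a possible workaround under the race-condition hypothesis above (a sketch, not a confirmed fix) is to make sure no h5py objects are left for interpreter shutdown: close the file, drop the references, and force a collection before the script ends:

import gc
import h5py

f = h5py.File("attr1k.h5", "w", libver="latest")
for i in range(600):
    f.attrs["a{:04d}".format(i)] = "this is attribute: " + str(i)

f.close()
del f          # drop the last Python reference to the File object
gc.collect()   # collect any lingering cycles holding HDF5 identifiers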

@rmvanhees

rmvanhees commented Mar 12, 2019 via email

@tacaswell
Member

That does suggest that it is related to the 'dense' attribute storage, but it's unfortunate that just upgrading won't fix it :(

@buaacarzp

Hey, in my experience this problem means your h5py file was not saved completely. You can try again, or use another model to train your data.

@epourmal

Please check your program with the latest HDF5 develop and 1_10 branches.

@tacaswell
Member

I cannot reproduce this with h5py 2.10 and hdf5 1.10.5. Closing as fixed upstream, thanks @epourmal!
