MPI tests fail at test_close_multiple_mpio_driver #1285

Closed

drew-parsons opened this issue Aug 17, 2019 · 5 comments · Fixed by #1289

@drew-parsons (Contributor)

When h5py is built with MPI support, run_tests() fails:

mpirun -n 2 python3 -c "import h5py.tests; h5py.tests.run_tests(verbosity=2)"
test_close (h5py.tests.old.test_file.TestClose)
Close file via .close method ... ok
test_close_multiple_default_driver (h5py.tests.old.test_file.TestClose) ... ok
test_close_multiple_mpio_driver (h5py.tests.old.test_file.TestClose)
MPIO driver and options ... ok
test_close (h5py.tests.old.test_file.TestClose)
Close file via .close method ... ok
test_close_multiple_default_driver (h5py.tests.old.test_file.TestClose) ... ok
test_close_multiple_mpio_driver (h5py.tests.old.test_file.TestClose)
MPIO driver and options ... mca_fbtl_posix_pwritev: error in writev:Bad file descriptor

The log shows that the first process closed the mpio file successfully, while the second process emitted "error in writev: Bad file descriptor".

I expect this is caused by the double f.close() in test_close_multiple_mpio_driver. Is there a reason for running f.close() twice? It doesn't seem to be doing the mpio test any good.

This was tested on Debian:

  • Linux amd64 4.19.37-6
  • python3-h5py 2.9.0-5
  • libhdf5-openmpi-103 1.10.4+repack-10
  • libopenmpi3 3.1.3-11

Summary of the h5py configuration

h5py 2.9.0
HDF5 1.10.4
Python 3.7.4 (default, Jul 11 2019, 10:43:21)
[GCC 8.3.0]
sys.platform linux
sys.maxsize 9223372036854775807
numpy 1.16.2

@drew-parsons (Contributor Author) commented Aug 17, 2019

The double f.close() is beside the point. The error comes from the use of mktemp(). It isn't parallelized, so each process passes a different filename to the h5py File. With the mpio driver, every process must open the same file, hence the error.

mpi4py gives a pattern to use at https://bitbucket.org/mpi4py/mpi4py/src/master/test/test_io.py
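
For illustration, a minimal sketch of that pattern (standalone, not part of the h5py test suite, and assuming h5py was built with MPI support): rank 0 chooses the temporary name and broadcasts it, so every process opens the same file with the mpio driver.

import tempfile
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD

# tempfile.mktemp() would return a different name on each rank;
# only rank 0 picks the name and then shares it with the others.
fname = tempfile.mktemp(suffix='.hdf5') if comm.Get_rank() == 0 else None
fname = comm.bcast(fname, root=0)

# Collective open: all ranks must pass the same filename.
with h5py.File(fname, 'w', driver='mpio', comm=comm) as f:
    f.create_group("test")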

Here is a patch. For good measure, it replaces tempfile.mktemp (which is deprecated and potentially unsafe) with tempfile.mkstemp.

--- a/h5py/tests/common.py
+++ b/h5py/tests/common.py
@@ -62,7 +62,9 @@
     def mktemp(self, suffix='.hdf5', prefix='', dir=None):
         if dir is None:
             dir = self.tempdir
-        return tempfile.mktemp(suffix, prefix, dir=self.tempdir)
+        fd,tempname = tempfile.mkstemp(suffix, prefix, dir=self.tempdir)
+        os.close(fd)
+        return tempname
 
     def setUp(self):
         self.f = h5py.File(self.mktemp(), 'w')
--- a/h5py/tests/old/test_file.py
+++ b/h5py/tests/old/test_file.py
@@ -512,8 +512,13 @@
         """ MPIO driver and options """
         from mpi4py import MPI
 
-        fname = self.mktemp()
-        f = File(fname, 'w', driver='mpio', comm=MPI.COMM_WORLD)
+        comm = MPI.COMM_WORLD
+        fname = None
+        if comm.Get_rank() == 0:
+            fname = self.mktemp()
+        fname = comm.bcast(fname, 0)
+
+        f = File(fname, 'w', driver='mpio', comm=comm)
         f.create_group("test")
         f.close()
         f.close()

@drew-parsons (Contributor Author)

Actually my patch to mktemp() may cause other problems, e.g.

======================================================================
ERROR: test_create_exclusive (h5py.tests.old.test_file.TestFileOpen)
Mode 'w-' opens file in exclusive mode
----------------------------------------------------------------------
Traceback (most recent call last):
  File "h5py/tests/old/test_file.py", line 84, in test_create_exclusive
    fid = File(fname, 'w-')
  File "h5py/_hl/files.py", line 394, in __init__
    swmr=swmr)
  File "h5py/_hl/files.py", line 174, in make_fid
    fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 105, in h5py.h5f.create
IOError: Unable to create file (unable to open file: name = '/tmp/h5py-test_c2ic4h/hZMwmH.hdf5',          errno = 17, error message = 'File exists', flags = 15, o_flags = c2)

So any update from mktemp to mkstemp needs to be handled carefully. Perhaps the temporary file should be deleted after closing the fd?
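
One way to read that suggestion (an untested sketch, not the patch that was merged): keep mkstemp for the safe creation, but remove the file again after closing the descriptor, so tests that open the returned name with 'w-' (exclusive) still see a non-existent file. This does reintroduce the small race window that mktemp already has.

import os
import tempfile

def mktemp(self, suffix='.hdf5', prefix='', dir=None):
    if dir is None:
        dir = self.tempdir
    # Create the file safely, then close the descriptor and unlink the file,
    # returning only the (currently unused) name.
    fd, tempname = tempfile.mkstemp(suffix, prefix, dir=dir)
    os.close(fd)
    os.unlink(tempname)
    return tempname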

It would be best to apply the MPI patch for test_file.py separately from the mkstemp change, i.e. just:

--- a/h5py/tests/old/test_file.py
+++ b/h5py/tests/old/test_file.py
@@ -512,8 +512,13 @@
         """ MPIO driver and options """
         from mpi4py import MPI
 
-        fname = self.mktemp()
-        f = File(fname, 'w', driver='mpio', comm=MPI.COMM_WORLD)
+        comm = MPI.COMM_WORLD
+        fname = None
+        if comm.Get_rank() == 0:
+            fname = self.mktemp()
+        fname = comm.bcast(fname, 0)
+
+        f = File(fname, 'w', driver='mpio', comm=comm)
         f.create_group("test")
         f.close()
         f.close()

@drew-parsons (Contributor Author)

There were a couple of other tests using mpio which also needed fixing. This patch seems to catch them all:

--- a/h5py/tests/old/test_file.py
+++ b/h5py/tests/old/test_file.py
@@ -251,8 +251,9 @@
         """ MPIO driver and options """
         from mpi4py import MPI
 
-        fname = self.mktemp()
-        with File(fname, 'w', driver='mpio', comm=MPI.COMM_WORLD) as f:
+        comm=MPI.COMM_WORLD
+        fname = self.mktemp_mpi(comm)
+        with File(fname, 'w', driver='mpio', comm=comm) as f:
             self.assertTrue(f)
             self.assertEqual(f.driver, 'mpio')
 
@@ -263,8 +264,9 @@
         """ Enable atomic mode for MPIO driver """
         from mpi4py import MPI
 
-        fname = self.mktemp()
-        with File(fname, 'w', driver='mpio', comm=MPI.COMM_WORLD) as f:
+        comm=MPI.COMM_WORLD
+        fname = self.mktemp_mpi(comm)
+        with File(fname, 'w', driver='mpio', comm=comm) as f:
             self.assertFalse(f.atomic)
             f.atomic = True
             self.assertTrue(f.atomic)
@@ -512,8 +514,9 @@
         """ MPIO driver and options """
         from mpi4py import MPI
 
-        fname = self.mktemp()
-        f = File(fname, 'w', driver='mpio', comm=MPI.COMM_WORLD)
+        comm=MPI.COMM_WORLD
+        fname = self.mktemp_mpi(comm)
+        f = File(fname, 'w', driver='mpio', comm=comm)
         f.create_group("test")
         f.close()
         f.close()
--- a/h5py/tests/common.py
+++ b/h5py/tests/common.py
@@ -64,6 +64,16 @@
             dir = self.tempdir
         return tempfile.mktemp(suffix, prefix, dir=self.tempdir)
 
+    def mktemp_mpi(self, comm=None, suffix='.hdf5', prefix='', dir=None):
+        if comm is None:
+            from mpi4py import MPI
+            comm = MPI.COMM_WORLD
+        fname = None
+        if comm.Get_rank() == 0:
+            fname = self.mktemp(suffix, prefix, dir)
+        fname = comm.bcast(fname, 0)
+        return fname
+
     def setUp(self):
         self.f = h5py.File(self.mktemp(), 'w')

@takluyver (Member)

Do you want to turn the patches into a pull request?

@drew-parsons (Contributor Author)

Can do.
