
parallel write to parallel file system #73

Closed
witzel opened this issue Nov 25, 2016 · 3 comments

witzel commented Nov 25, 2016

Hi,

I ran four tests on an 8^4 lattice on the summit machine at CU Boulder, and similar tests on pi0 at Fermilab. All jobs ran on 2 nodes, each with 1 MPI rank and 24 threads (summit) or 16 threads (pi0). The jobs differ in the type of file system used for writing the checkpoints (NFS or GPFS on summit; ZFS or Lustre at Fermilab) and in whether I split the T or the Z direction (1 or 2 I/O nodes).

summit

  MPI geometry   I/O nodes   file system   SLURM ID   write rate
  1.1.1.2        2           NFS           462        280 MB/s
  1.1.1.2        2           GPFS          461        0.05 MB/s
  1.1.2.1        1           NFS           460        131 MB/s
  1.1.2.1        1           GPFS          455        79 MB/s

pi0 (Fermilab)

  MPI geometry   I/O nodes   file system   PBS ID (last three)   write rate
  1.1.1.2        2           ZFS           628                   228 MB/s
  1.1.1.2        2           Lustre        635                   0.002 MB/s
  1.1.2.1        1           ZFS           626                   110 MB/s
  1.1.2.1        1           Lustre        627                   3-20 MB/s

Unfortunately, I didn't find performance values in the log files for the single-process I/O that writes the rng files;
the full log files are attached, however, and carry the SLURM ID / PBS ID in the filename. Do I need some special flag for parallel file systems, e.g. striping?
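On the striping question: I can't say whether Grid itself needs a flag here, but on Lustre the stripe layout is a property of the file or directory, set with the standard `lfs` tool rather than by the application. A minimal sketch (the path and the stripe count/size are illustrative, not taken from this report):

```shell
# Stripe new files in the checkpoint directory across 8 OSTs
# with a 4 MiB stripe size (values are illustrative; tune per system).
lfs setstripe -c 8 -S 4m /path/to/ckpoint_dir

# Or stripe across all available OSTs with -c -1.
lfs setstripe -c -1 /path/to/ckpoint_dir

# Inspect the layout actually applied to a written checkpoint.
lfs getstripe /path/to/ckpoint_dir/ckpoint.lat
```

New files inherit the directory's layout, so this should be applied before the job writes its checkpoints.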

The parallel read of the checkpoint at the beginning of the job seems OK in all cases, although in these tests not all jobs started from a checkpoint. On both machines Grid is compiled on NFS/ZFS.

Thank you,
Oliver

pi0.zip
summit.zip


paboyle commented Apr 26, 2017

Just a comment: I'm aware of this and it is on my todo list. Christoph is also reporting it.
The chunk size used on "read" needs to be much bigger. The plan is to expand it to an "x-strip" and hope that is big enough.
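The coalescing idea above can be sketched outside Grid: instead of issuing one tiny write per lattice site, gather a whole strip of sites into one contiguous buffer and issue a single large write. This is a minimal illustration of the technique, assuming made-up sizes and names; it is not Grid's implementation.

```python
import io

SITE_BYTES = 8       # hypothetical payload per lattice site
STRIP_SITES = 1024   # sites coalesced into one "x-strip"

def write_per_site(f, strip):
    """One small write per site: many I/O calls, poor on parallel file systems."""
    calls = 0
    for site in strip:
        f.write(site)
        calls += 1
    return calls

def write_per_strip(f, strip):
    """Coalesce the strip into one buffer and issue a single large write."""
    f.write(b"".join(strip))
    return 1

strip = [bytes([i % 256]) * SITE_BYTES for i in range(STRIP_SITES)]

a, b = io.BytesIO(), io.BytesIO()
calls_site = write_per_site(a, strip)
calls_strip = write_per_strip(b, strip)

assert a.getvalue() == b.getvalue()  # identical bytes land on "disk"
print(calls_site, calls_strip)       # 1024 write calls vs 1
```

The bytes written are identical either way; only the number and size of the I/O operations change, which is exactly what parallel file systems like GPFS and Lustre are sensitive to.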

aportelli added this to the 0.8.0 milestone May 5, 2017

paboyle commented Jun 1, 2017

Making progress on this now: getting 1 GB/s on the BNL KNL GPFS system
for parallel writes, and it accelerates both configuration (Lattice) and RNG I/O
pretty well. Switching to use MPI-2 I/O.

However, this is presently in a feature branch (feature/parallelio) and I haven't yet finished
the RNG state (I need to add in the serial RNG state).

See issue #111 for a bug/gotcha in Intel MPI running on GPFS.


paboyle commented Jun 19, 2017

Committed back to develop. Can close this now, as we have comprehensive SciDAC support, ILDG, and MPI-2 I/O implemented.

paboyle closed this as completed Jun 19, 2017