
parallel write to parallel file system #73

Closed
witzel opened this issue Nov 25, 2016 · 3 comments

witzel commented Nov 25, 2016

Hi,

I ran four tests on an 8^4 lattice on the summit machine at CU Boulder, and similar tests on pi0 at Fermilab. All jobs ran on 2 nodes, each with 1 MPI rank and 24 threads (summit) or 16 threads (pi0). The jobs differ in the type of file system used for writing the checkpoints (NFS or GPFS on summit; ZFS or Lustre at Fermilab) and in whether I split the T or the Z direction (1 or 2 I/O nodes).

summit

  MPI geometry   I/O nodes   file system   SLURM ID   write rate
  1.1.1.2        2           NFS           462        280 MB/s
  1.1.1.2        2           GPFS          461        0.05 MB/s
  1.1.2.1        1           NFS           460        131 MB/s
  1.1.2.1        1           GPFS          455        79 MB/s

pi0 (Fermilab)

  MPI geometry   I/O nodes   file system   PBS ID (last three)   write rate
  1.1.1.2        2           ZFS           628                   228 MB/s
  1.1.1.2        2           Lustre        635                   0.002 MB/s
  1.1.2.1        1           ZFS           626                   110 MB/s
  1.1.2.1        1           Lustre        627                   3-20 MB/s

Unfortunately, I didn't find performance values in the log files for the single-process I/O that writes the rng files;
the full log files are attached, however, and carry the SLURM ID / PBS ID in the filename. Do I need some special flag for parallel file systems, e.g. striping?
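On the striping question: I can't say whether Grid itself needs a flag here, but on Lustre the stripe layout is a property of the file or directory, set with the standard `lfs` tool rather than by the application. A minimal sketch (the path and the stripe count/size are illustrative, not taken from this report):

```shell
# Stripe new files in the checkpoint directory across 8 OSTs
# with a 4 MiB stripe size (values are illustrative; tune per system).
lfs setstripe -c 8 -S 4m /path/to/ckpoint_dir

# Or stripe across all available OSTs with -c -1.
lfs setstripe -c -1 /path/to/ckpoint_dir

# Inspect the layout actually applied to a written checkpoint.
lfs getstripe /path/to/ckpoint_dir/ckpoint.lat
```

New files inherit the directory's layout, so this should be applied before the job writes its checkpoints.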

The parallel read of the checkpoint at the beginning of the job seems OK in all cases, although in these tests not all jobs started from a checkpoint. On both machines Grid is compiled on NFS/ZFS.

Thank you,
Oliver

pi0.zip
summit.zip


paboyle commented Apr 26, 2017

Just a comment: I'm aware of this and it is on my todo list. Christoph is also reporting it.
The chunk size used on "read" needs to be much bigger. The plan is to expand it to an "x-strip" and hope that is big enough.
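The coalescing idea above can be sketched outside Grid: instead of issuing one tiny write per lattice site, gather a whole strip of sites into one contiguous buffer and issue a single large write. This is a minimal illustration of the technique, assuming made-up sizes and names; it is not Grid's implementation.

```python
import io

SITE_BYTES = 8       # hypothetical payload per lattice site
STRIP_SITES = 1024   # sites coalesced into one "x-strip"

def write_per_site(f, strip):
    """One small write per site: many I/O calls, poor on parallel file systems."""
    calls = 0
    for site in strip:
        f.write(site)
        calls += 1
    return calls

def write_per_strip(f, strip):
    """Coalesce the strip into one buffer and issue a single large write."""
    f.write(b"".join(strip))
    return 1

strip = [bytes([i % 256]) * SITE_BYTES for i in range(STRIP_SITES)]

a, b = io.BytesIO(), io.BytesIO()
calls_site = write_per_site(a, strip)
calls_strip = write_per_strip(b, strip)

assert a.getvalue() == b.getvalue()  # identical bytes land on "disk"
print(calls_site, calls_strip)       # 1024 write calls vs 1
```

The bytes written are identical either way; only the number and size of the I/O operations change, which is exactly what parallel file systems like GPFS and Lustre are sensitive to.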

aportelli added this to the 0.8.0 milestone May 5, 2017

paboyle commented Jun 1, 2017

Making progress on this now: getting 1 GB/s on the BNL KNL GPFS system
for parallel writes, and it accelerates both configuration (Lattice) and RNG I/O
pretty well. Switching to use MPI-2 I/O.

However, this is presently in a feature branch (feature/parallelio) and I haven't yet finished
the RNG state (I need to add in the serial RNG state).

See issue #111 for a bug/gotcha in Intel MPI running on GPFS.


paboyle commented Jun 19, 2017

Committed back to develop. Can close this now, as we have comprehensive SciDAC support, ILDG, and MPI-2 I/O implemented.

paboyle closed this as completed Jun 19, 2017