Just a comment -- I'm aware of this and it is on my todo list. Christoph is also reporting this.
The chunk size used on "read" needs to be much bigger. I plan to expand it to an "x-strip" and hope this is big enough.
Making progress on this now: I'm getting 1 GB/s on parallel writes on the BNL KNL GPFS system, and this accelerates both configuration (Lattice) and RNG I/O pretty well. I'm switching to MPI-2 I/O.
However, this is presently in a feature branch (feature/parallelio), and I haven't yet finished the RNG state (I still need to add in the serial RNG state).
See issue 111 for a bug/gotcha in Intel MPI running on GPFS.
Hi,
I ran four tests on an 8^4 lattice on the summit machine at UC Boulder and similar tests on pi0 at Fermilab. All jobs ran on 2 nodes, each with 1 MPI rank and 24 threads on summit (16 threads on pi0). The jobs differ in the type of file system used for writing the checkpoints (NFS or GPFS on summit; ZFS or Lustre at Fermilab) and in whether I split the T or the Z direction (1 or 2 IO nodes).
summit (UC Boulder)

mpi layout             file system   SLURM-ID   write rate
1.1.1.2 (2 IO nodes)   NFS           462        280 MB/s
1.1.1.2 (2 IO nodes)   GPFS          461        0.05 MB/s
1.1.2.1 (1 IO node)    NFS           460        131 MB/s
1.1.2.1 (1 IO node)    GPFS          455        79 MB/s
pi0 (Fermilab)

mpi layout             file system   PBS-ID (last three)   write rate
1.1.1.2 (2 IO nodes)   ZFS           628                   228 MB/s
1.1.1.2 (2 IO nodes)   Lustre        635                   0.002 MB/s
1.1.2.1 (1 IO node)    ZFS           626                   110 MB/s
1.1.2.1 (1 IO node)    Lustre        627                   3-20 MB/s
Unfortunately, I didn't find performance values in the log files for the single I/O process writing the rng files;
the full log files are attached, however, and carry the SLURM-ID / PBS-ID in the filename. Do I need some special flag for parallel file systems (striping?)
The parallel read of the checkpoint at the beginning of the job seems OK in all cases, although in these tests not all jobs started from a checkpoint. On both machines Grid is compiled on NFS/ZFS.
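On Lustre, striping is one plausible factor in the very low write rates: with the default layout, a file often lives on a single OST, so all writers funnel through one server. Files inherit the striping of their directory, so one option is to stripe the checkpoint directory before running; the `lfs` commands below are standard Lustre CLI, but the directory path is only a placeholder for the example.

```shell
# Stripe new files in the checkpoint directory across all available OSTs
# (-c -1) with a 4 MiB stripe size; path is illustrative.
lfs setstripe -c -1 -S 4m /lustre/project/ckpoints

# Inspect the striping that new files in the directory will inherit.
lfs getstripe -d /lustre/project/ckpoints
```

Whether this helps here would need a rerun of the Lustre tests; GPFS has no user-visible striping knob of this kind, so the GPFS numbers would be unaffected.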
Thank you,
Oliver
pi0.zip
summit.zip