mpiFileUtils is a suite of MPI-based tools for managing large datasets, which may range from large directory trees to large files. High-performance computing users often generate large datasets with parallel applications that run with many processes (millions in some cases). However, those users are then stuck with single-process tools like cp and rm to manage their datasets. This suite provides MPI-based tools to handle typical jobs like copy, remove, and compare for such datasets, providing speedups of up to 20-30x.
Documentation is available on ReadTheDocs and in this repo:
- dbcast - Broadcast files to compute nodes.
- dchmod - Change permissions and group access on files.
- dcmp - Compare files.
- dcp - Copy files.
- dfilemaker - Generate random files.
- drm - Remove files.
- dstripe - Restripe files.
- dwalk - List files.
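As a rough illustration of how these tools are typically launched, the sketch below copies a directory tree, compares the copy against the source, and lists the result. It assumes an mpirun launcher, four processes, and placeholder paths, none of which come from this README; your site may use a different launcher. Only the positional path arguments are used.

```shell
# Sketch only: the paths and process count are hypothetical placeholders.
SRC=/path/to/source
DEST=/path/to/destination
NPROCS=4

# Run only if both the launcher and the tools are installed and on PATH.
if command -v mpirun >/dev/null 2>&1 && command -v dcp >/dev/null 2>&1; then
    mpirun -np "$NPROCS" dcp "$SRC" "$DEST"    # copy the tree in parallel
    mpirun -np "$NPROCS" dcmp "$SRC" "$DEST"   # compare source against the copy
    mpirun -np "$NPROCS" dwalk "$DEST"         # list the copied files
else
    echo "mpirun or dcp not found in PATH"
fi
```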
To enable experimental utilities, run configure with the enable experimental option.
- dfind - Find files by path name (experimental).
- dgrep - Search contents of files (experimental).
- dparallel - Perform commands in parallel (experimental).
- dtar - Create file tape archives (experimental).
mpiFileUtils depends on several libraries. To simplify builds, there are two scripts: buildme_dependencies and buildme. The buildme_dependencies script downloads and installs all the necessary libraries. The buildme script then builds mpiFileUtils, assuming the libraries have been installed. Both scripts require mpicc to be in your PATH, and it must belong to an MPI library that supports at least version 2.2 of the MPI standard. Please review each buildme script and edit it if necessary. Then run them in sequence, buildme_dependencies first and buildme second.
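The build sequence described above can be sketched as a small shell helper. The function name run_build is hypothetical, and the sketch assumes both scripts are executable and sit in the current directory (the top of the source tree). It checks only that mpicc is on PATH, not which MPI version it supports.

```shell
# Hypothetical helper: run both build scripts in order, stopping at the first failure.
run_build() {
    # Both scripts need mpicc on PATH.
    if ! command -v mpicc >/dev/null 2>&1; then
        echo "mpicc not found in PATH" >&2
        return 1
    fi
    # Assumes the scripts are in the current directory and executable.
    if [ ! -x ./buildme_dependencies ] || [ ! -x ./buildme ]; then
        echo "run this from the directory containing the buildme scripts" >&2
        return 1
    fi
    # Install the dependencies first, then build mpiFileUtils itself.
    ./buildme_dependencies && ./buildme
}
```

Calling run_build from the top of the source tree runs the two scripts back to back; running ./buildme_dependencies and then ./buildme by hand is equivalent.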
Alternatively, mpiFileUtils is available in Spack, which simplifies the install to just:
spack install mpifileutils
or to enable all features:
spack install mpifileutils +lustre +experimental