========================
Build
========================

mpiFileUtils depends on several libraries. mpiFileUtils is available in Spack,
which simplifies the install to just:

.. code-block:: Bash

    $ spack install mpifileutils

Or to enable all features:

.. code-block:: Bash

    $ spack install mpifileutils +lustre +experimental

To build from a release tarball, there are two scripts: buildme_dependencies and
buildme. The buildme_dependencies script downloads and installs all the
necessary libraries. The buildme script then builds mpiFileUtils assuming the
libraries have been installed. Both scripts require that mpicc is in your path,
and that it is for an MPI library that supports at least v2.2 of the MPI
standard. Please review each buildme script and edit it if necessary. Then run
them in sequence:

.. code-block:: Bash

    $ ./buildme_dependencies
    $ ./buildme

To build from a clone, it may also be necessary to first run the
buildme_autotools script to obtain the required set of autotools, then use
buildme_dependencies_dev and buildme_dev:

.. code-block:: Bash

    $ ./buildme_autotools
    $ ./buildme_dependencies_dev
    $ ./buildme_dev
======================
Experimental Utilities
======================

Experimental utilities are under active development. They are not considered
production-ready, but they are included in the distribution for those
interested in developing them further and as additional examples. To enable
experimental utilities, run configure with the ``--enable-experimental``
option.

.. code-block:: Bash

    $ ./configure --enable-experimental
.. toctree::
   :maxdepth: 1

   experimental-utilities.rst
   dfind.1
   dgrep.1
   dparallel.1
========================
libmfu
========================

Functionality that is common to multiple tools is moved to the common library,
libmfu. The goal of this library is to make it easy to develop new tools and
to provide consistent behavior across tools in the suite. The library can also
be useful to end applications, e.g., to efficiently create or remove a large
directory tree in a portable way across different parallel file systems.

----------------------------------------
libmfu: the mpiFileUtils common library
----------------------------------------

The mpiFileUtils common library defines data structures and methods on those
data structures that make it easier to develop new tools, or that can be used
within HPC applications to provide portable, performant implementations across
file systems common in HPC centers.

To use this library, include mfu.h:

.. code-block:: C

    #include "mfu.h"

This file includes all other necessary headers.

----------------------------------------
mfu_flist
----------------------------------------

The key data structure in libmfu is a distributed file list called mfu_flist.
This structure represents a list of files, each with stat-like metadata, that
is distributed among a set of MPI ranks.
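
As a rough illustration of what "distributed among a set of MPI ranks" can
mean, the sketch below splits a list of ``total`` entries into contiguous,
nearly equal blocks, one per rank. ``block_range`` is a hypothetical helper in
plain C, not part of libmfu, and libmfu does not necessarily place entries this
way; the sketch only shows how a global index range can map to each rank's
share.

.. code-block:: C

    #include <stdint.h>

    /* Hypothetical helper (not part of libmfu): compute the contiguous range of
     * global indices [*start, *start + *count) owned by `rank` when `total`
     * entries are split as evenly as possible across `ranks` processes.
     * Assumes ranks > 0 and 0 <= rank < ranks. The first (total % ranks)
     * ranks each receive one extra entry. */
    void block_range(uint64_t total, int ranks, int rank,
                     uint64_t* start, uint64_t* count)
    {
        uint64_t base  = total / (uint64_t) ranks;
        uint64_t extra = total % (uint64_t) ranks;
        uint64_t r     = (uint64_t) rank;

        *count = base + (r < extra ? 1 : 0);

        /* every earlier rank holds `base` entries, and the earlier ranks
         * that fall among the first `extra` ranks hold one more each */
        *start = r * base + (r < extra ? r : extra);
    }

For example, 10 entries on 3 ranks give rank 0 the indices 0 through 3, rank 1
the indices 4 through 6, and rank 2 the indices 7 through 9.
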

The library contains functions for creating and operating on these lists. For
example, one may create a list by recursively walking an existing directory or
by inserting new entries one at a time. Given a list as input, functions exist
to create corresponding entries (inodes) on the file system or to delete the
list of files. One may filter, sort, and remap entries. One can copy a list of
entries from one location to another or compare corresponding entries across
two different lists. A file list can be serialized and written to or read from
a file.
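
Filtering is an example of an operation that needs no communication: each rank
can test its own entries and keep the matches. The sketch below assumes a
simplified, hypothetical entry type (``entry_t`` with a path, a directory flag,
and a size) rather than libmfu's real representation, and keeps the
non-directory entries whose path ends in a given suffix.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical stand-in for one list entry (not libmfu's layout). */
    typedef struct {
        const char* name;  /* full path */
        int is_dir;        /* 1 for directories, 0 otherwise */
        uint64_t size;     /* size in bytes */
    } entry_t;

    /* Predicate: true for non-directory entries whose path ends in `suffix`. */
    static int has_suffix(const entry_t* e, const char* suffix)
    {
        size_t n = strlen(e->name);
        size_t m = strlen(suffix);
        return !e->is_dir && n >= m && strcmp(e->name + n - m, suffix) == 0;
    }

    /* Copy the matching entries of `in` (length n) into `out`, which must have
     * room for n entries, and return how many were kept. Each rank can run this
     * on its own local entries without talking to other ranks. */
    size_t filter_by_suffix(const entry_t* in, size_t n, const char* suffix,
                            entry_t* out)
    {
        size_t kept = 0;
        for (size_t i = 0; i < n; i++) {
            if (has_suffix(&in[i], suffix)) {
                out[kept++] = in[i];
            }
        }
        return kept;
    }

Because the predicate is applied independently on each rank, filtering can
leave ranks with uneven shares of the result, which is one reason to remap
entries afterwards.
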

Each MPI rank "owns" a portion of the list, and there are routines to step
through the entries owned by that process. This portion is referred to as the
"local" list. Functions exist to get and set properties of the items in the
local list, for example to get the path name, type, and size of a file.
Functions dealing with the local list can be called by the MPI process
independently of other MPI processes.
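
The loop below sketches how a rank might step through its local list with
index-based getters, here to total the bytes in the regular files it owns. The
types and getters (``local_list``, ``get_type``, ``get_size``) are hypothetical
stand-ins chosen for illustration; the real accessors are declared in
mfu_flist.h.

.. code-block:: C

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical types for one rank's "local" list (illustrative only). */
    typedef enum { ENTRY_FILE, ENTRY_DIR, ENTRY_LINK } entry_type;

    typedef struct {
        const char* name;
        entry_type  type;
        uint64_t    size;
    } local_entry;

    typedef struct {
        local_entry* items;
        uint64_t     count;   /* number of entries owned by this rank */
    } local_list;

    /* Index-based getters in the style described above. */
    static uint64_t   local_size(const local_list* l)          { return l->count; }
    static entry_type get_type(const local_list* l, uint64_t i) { return l->items[i].type; }
    static uint64_t   get_size(const local_list* l, uint64_t i) { return l->items[i].size; }

    /* Sum the bytes in regular files owned by this rank. Only local data is
     * read, so no other rank has to take part. */
    uint64_t local_file_bytes(const local_list* l)
    {
        uint64_t total = 0;
        for (uint64_t i = 0; i < local_size(l); i++) {
            if (get_type(l, i) == ENTRY_FILE) {
                total += get_size(l, i);
            }
        }
        return total;
    }

Turning the per-rank values into a job-wide total is a collective step (for
example, an MPI_Allreduce with MPI_SUM) in which every rank must participate.
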

Other functions operate on the global list in a collective fashion, such as
deleting all items in a file list. All processes in the MPI job must invoke
these functions, as with MPI collective operations.

For full details, see `mfu_flist.h <https://github.com/hpc/mpifileutils/blob/master/src/common/mfu_flist.h>`_
and refer to its usage in existing tools.

----------------------------------------
mfu_path
----------------------------------------

mpiFileUtils represents file paths with the
`mfu_path <https://github.com/hpc/mpifileutils/blob/master/src/common/mfu_path.h>`_
structure. Functions are available to prepend and append entries to paths, to
slice paths into pieces, and to compute relative paths.
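
To show what computing a relative path involves, the sketch below works on
plain C strings rather than the mfu_path structure. ``split_path`` and
``relative_path`` are hypothetical, and the code assumes both inputs are
absolute paths that are already normalized (no ``.`` or ``..`` components).

.. code-block:: C

    #include <stdio.h>
    #include <string.h>

    #define MAX_COMPONENTS 64

    /* Split an absolute path into (start, length) component pairs, skipping
     * empty components from repeated or trailing '/'. Returns the number of
     * components, or -1 if there are more than MAX_COMPONENTS. */
    static int split_path(const char* path, const char* comp[], size_t len[])
    {
        int n = 0;
        const char* p = path;
        while (*p) {
            while (*p == '/') p++;           /* skip separators */
            if (!*p) break;
            const char* start = p;
            while (*p && *p != '/') p++;     /* advance to end of component */
            if (n == MAX_COMPONENTS) return -1;
            comp[n] = start;
            len[n] = (size_t) (p - start);
            n++;
        }
        return n;
    }

    /* Write the path of `to` relative to the directory `from` into `out`.
     * Returns 0 on success, or -1 if a path is too deep or `out` is too small. */
    int relative_path(const char* from, const char* to, char* out, size_t outlen)
    {
        const char* fc[MAX_COMPONENTS]; size_t fl[MAX_COMPONENTS];
        const char* tc[MAX_COMPONENTS]; size_t tl[MAX_COMPONENTS];
        int nf = split_path(from, fc, fl);
        int nt = split_path(to, tc, tl);
        if (nf < 0 || nt < 0 || outlen == 0) return -1;

        /* count the leading components the two paths share */
        int common = 0;
        while (common < nf && common < nt && fl[common] == tl[common] &&
               memcmp(fc[common], tc[common], fl[common]) == 0) {
            common++;
        }

        size_t used = 0;
        out[0] = '\0';
        /* climb out of each remaining component of `from` */
        for (int i = common; i < nf; i++) {
            int w = snprintf(out + used, outlen - used, "%s..", used ? "/" : "");
            if (w < 0 || (size_t) w >= outlen - used) return -1;
            used += (size_t) w;
        }
        /* descend into each remaining component of `to` */
        for (int i = common; i < nt; i++) {
            int w = snprintf(out + used, outlen - used, "%s%.*s",
                             used ? "/" : "", (int) tl[i], tc[i]);
            if (w < 0 || (size_t) w >= outlen - used) return -1;
            used += (size_t) w;
        }
        if (used == 0) {                     /* identical paths */
            if (outlen < 2) return -1;
            strcpy(out, ".");
        }
        return 0;
    }

With these rules, the path of ``/a/d`` relative to the directory ``/a/b/c`` is
``../../d``, and two identical paths yield ``.``.
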

----------------------------------------
mfu_param_path
----------------------------------------

Path names provided by the user on the command line (parameters) are handled
through the
`mfu_param_path <https://github.com/hpc/mpifileutils/blob/master/src/common/mfu_param_path.h>`_
structure. Such paths may need to be checked for existence and have their type
(file or directory) determined. Additionally, the user may specify many such
paths through invocations involving shell wildcards, so functions are available
to check long lists of paths in parallel.
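
A minimal sketch of the existence and type check is shown below, using POSIX
``stat()``. The ``path_kind`` values and both functions are hypothetical rather
than the mfu_param_path interface. The second function assumes a simple
round-robin assignment of paths to ranks to suggest how a long list could be
checked in parallel; the MPI step that shares results between ranks is left
out.

.. code-block:: C

    #include <stddef.h>
    #include <sys/stat.h>

    /* Hypothetical result codes (not the values used by mfu_param_path). */
    typedef enum { PATH_MISSING, PATH_FILE, PATH_DIR, PATH_OTHER } path_kind;

    /* Report whether `path` exists and, if so, whether it is a directory or a
     * regular file. stat() follows symbolic links. */
    path_kind classify_path(const char* path)
    {
        struct stat st;
        if (stat(path, &st) != 0) {
            return PATH_MISSING;   /* any stat() failure, including EACCES, counts as missing here */
        }
        if (S_ISDIR(st.st_mode)) return PATH_DIR;
        if (S_ISREG(st.st_mode)) return PATH_FILE;
        return PATH_OTHER;
    }

    /* Round-robin split: this rank checks only the paths whose index i
     * satisfies i % nranks == rank and fills in those slots of `out`.
     * In an MPI program the slots would afterwards be exchanged so that
     * every rank sees all results. */
    void classify_my_paths(const char* paths[], size_t n, int rank, int nranks,
                           path_kind out[])
    {
        for (size_t i = (size_t) rank; i < n; i += (size_t) nranks) {
            out[i] = classify_path(paths[i]);
        }
    }
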

----------------------------------------
mfu_io and mfu_util
----------------------------------------

The `mfu_io.h <https://github.com/hpc/mpifileutils/blob/master/src/common/mfu_io.h>`_
functions provide wrappers for many POSIX I/O functions. This helps to check
error codes consistently and to automate retries of failed I/O calls. Use the
mfu_io wrappers where they are available; where one is missing, consider
adding it.
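
The sketch below shows the kind of logic such a wrapper can encapsulate:
resuming after short writes, retrying calls interrupted by signals, and giving
up after a bounded number of other failures. ``write_all_retry`` and its
``max_retries`` parameter are hypothetical and do not reproduce mfu_io's actual
retry policy.

.. code-block:: C

    #include <errno.h>
    #include <stddef.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical wrapper in the spirit of mfu_io (not its actual code):
     * write all `count` bytes of `buf` to `fd`. Short writes are resumed,
     * calls interrupted by a signal (EINTR) are retried without limit, and
     * any other failure is retried up to `max_retries` times.
     * Returns the number of bytes written, or -1 on persistent failure. */
    ssize_t write_all_retry(int fd, const void* buf, size_t count, int max_retries)
    {
        const char* p = (const char*) buf;
        size_t done = 0;
        int failures = 0;

        while (done < count) {
            ssize_t n = write(fd, p + done, count - done);
            if (n > 0) {
                done += (size_t) n;   /* continue from where a short write stopped */
                continue;
            }
            if (n < 0 && errno == EINTR) {
                continue;             /* interrupted before writing: try again */
            }
            if (++failures > max_retries) {
                return -1;            /* persistent error: give up */
            }
        }
        return (ssize_t) done;
    }

A production wrapper would likely also report the error with context such as
the file name, and might pause between attempts; those details are omitted
here.
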

The `mfu_util.h <https://github.com/hpc/mpifileutils/blob/master/src/common/mfu_util.h>`_
functions provide wrappers for error reporting and memory allocation.
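
For instance, an allocation wrapper can report where a failed allocation
happened and stop the program, so callers do not repeat the same NULL check
everywhere. ``checked_malloc`` and ``CHECKED_MALLOC`` are hypothetical names
used only for this sketch; consult mfu_util.h for the functions libmfu actually
provides.

.. code-block:: C

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical allocation wrapper (not mfu_util's code). A zero-byte
     * request returns NULL; any other failure is reported with the caller's
     * file and line, and the program aborts, so callers never see NULL for a
     * non-zero request. */
    void* checked_malloc(size_t size, const char* file, int line)
    {
        if (size == 0) {
            return NULL;
        }
        void* p = malloc(size);
        if (p == NULL) {
            fprintf(stderr, "ERROR: failed to allocate %zu bytes @ %s:%d\n",
                    size, file, line);
            abort();   /* an MPI program could call MPI_Abort instead to stop all ranks */
        }
        return p;
    }

    /* Record the call site automatically. */
    #define CHECKED_MALLOC(size) checked_malloc((size), __FILE__, __LINE__)
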
================
Overview
================

mpiFileUtils provides both a library called libmfu and a suite of MPI-based
tools to manage large datasets, which may vary from large directory trees to
large files. High-performance computing users often generate large datasets
with parallel applications that run with many processes (millions in some
cases). However, those users are then stuck with single-process tools like cp
and rm to manage their datasets. This suite provides MPI-based tools to handle
typical jobs like copy, remove, and compare for such datasets, providing
speedups of up to 50x. It also provides a library that simplifies the creation
of new tools and that can be used in applications.

---------------------------
Utilities
---------------------------

The tools in mpiFileUtils are MPI applications, so they must be launched as
such, e.g., with mpirun within a compute allocation on a cluster. The tools do
not currently checkpoint, so one must be careful that an invocation of a tool
has sufficient time to complete before it is killed. Each tool is listed below
with a short description.

- dbcast - Broadcast files to compute nodes.
- dchmod - Change owner, group, and permissions on files.
- dcmp - Compare files.
- dcp - Copy files.
- ddup - Find duplicate files.
- dfilemaker - Generate random files.
- drm - Remove files.
- dstripe - Restripe files.
- dsync - Synchronize files.
- dwalk - List files.
=========================
Project Design Principles
=========================

The following principles drive design decisions in the project.

---------------------------
Scale
---------------------------

The library and tools should be designed such that running with more processes
increases performance, provided there are sufficient data and parallelism
available in the underlying file systems. The design of the tools should not
impose scalability bottlenecks.

---------------------------
Performance
---------------------------

While it is tempting to mimic the interface, behavior, and file formats of
familiar tools like cp, rm, and tar, when faced with a choice between
compatibility and performance, mpiFileUtils chooses performance. For example,
if an archive file format requires serialization that inhibits parallel
performance, mpiFileUtils will opt to define a new file format that enables
parallelism rather than being constrained to existing formats. Similarly,
options in the tool command line interface may have different semantics from
familiar tools in cases where performance is improved. Thus, one should be
careful to learn the options of each tool.

---------------------------
Portability
---------------------------

The tools are intended to support common file systems used in HPC centers, like
Lustre, GPFS, and NFS. Additionally, methods in the library should be portable
and efficient across multiple file systems. Tool and library users can rely on
mpiFileUtils to provide portable and performant implementations.

---------------------------
Composability
---------------------------

While the tools do not support chaining with Unix pipes, they do support
interoperability through input and output files. One tool may process a dataset
and generate an output file that another tool can read as input, e.g., to walk
a directory tree with one tool, filter the list of file names with another, and
perhaps delete a subset of matching files with a third. Additionally, when
logic is deemed to be useful across multiple tools or is anticipated to be
useful in future tools or applications, it should be provided in the common
library.
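
The sketch below illustrates a file-based hand-off between tools with a
deliberately simple, hypothetical format: one path per line in a text file.
mpiFileUtils defines its own file list format, so ``write_list``,
``read_matching``, and the text format are assumptions for illustration only.
The first function plays the role of a tool that records a walk; the second
plays a filtering tool that keeps matching paths, which a third tool could then
delete.

.. code-block:: C

    #include <stdio.h>
    #include <string.h>

    #define MAX_PATH_LEN 256   /* longer lines are not handled in this sketch */

    /* Stage 1 (walk-like tool): record the paths it found, one per line.
     * Returns 0 on success, -1 if the file cannot be opened or flushed. */
    int write_list(const char* file, const char* paths[], size_t n)
    {
        FILE* fp = fopen(file, "w");
        if (fp == NULL) return -1;
        for (size_t i = 0; i < n; i++) {
            fprintf(fp, "%s\n", paths[i]);
        }
        return fclose(fp) == 0 ? 0 : -1;
    }

    /* Stage 2 (filter-like tool): read the list back and copy into `out` at
     * most `max` paths that contain `pattern`. Returns the number kept. */
    size_t read_matching(const char* file, const char* pattern,
                         char out[][MAX_PATH_LEN], size_t max)
    {
        FILE* fp = fopen(file, "r");
        if (fp == NULL) return 0;
        char line[MAX_PATH_LEN];
        size_t kept = 0;
        while (kept < max && fgets(line, sizeof line, fp) != NULL) {
            line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline */
            if (strstr(line, pattern) != NULL) {
                strcpy(out[kept++], line);
            }
        }
        fclose(fp);
        return kept;
    }

In the tools themselves each stage runs in parallel across ranks; the
sequential code here only shows how the output file of one stage becomes the
input of the next.
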