mpiFileUtils is a suite of MPI-based tools to manage large datasets, which may vary from large directory trees to large files. High-performance computing users often generate large datasets with parallel applications that run with many processes (in some cases millions). However, those users are then stuck with single-process tools like cp and rm to manage their datasets. This suite provides MPI-based tools to handle typical jobs like copy, remove, and compare for such datasets, with speedups of up to 20-30x.
The suite is built around a common library currently called "bayer".
Overview of libbayer library
All routines and types are prefixed with "bayer_"; constants and macros are prefixed with "BAYER_".
- routines to print errors, debug messages, and abort
- routines to allocate and free memory (bayer_malloc, bayer_strdup, bayer_free, etc)
  - allocation routines print the file and line number and call abort on failure (no need to check the return value)
  - valid to allocate 0 bytes, in which case NULL is returned (no need to check size > 0 before allocating)
  - bayer_free takes a pointer to a pointer and sets the pointer to NULL before returning (good for catching memory bugs)
  - valid to pass NULL or a pointer to NULL to bayer_free (nothing is done in that case)
  - macros like BAYER_MALLOC automatically attach the file and line number
- wrappers around POSIX-IO routines to retry in case of common, non-fatal errors (e.g., EINTR)
- routines to build and manipulate file paths as components
  - routines to simplify paths, e.g., /a/b/../c/. --> /a/c
- distributed list of file info, currently two types: list of (name, type) tuples or (name, stat) tuples
  - can build a list with "walk", "read", and "subset", cleanup with "free"
  - iterate over elements by index number (index local to process)
  - some global "summary" properties of list, e.g., file_max_name, global_size, global_offset
  - functions to query properties of elements, e.g., get_name, get_type, get_atime, etc.
  - see drm/drm.c and dwalk/dwalk.c for examples on how to use bayer_flist
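
The allocation rules listed above (abort on failure, NULL for 0 bytes, a free routine that resets the caller's pointer, and a macro that attaches file and line) can be illustrated with a minimal sketch. The names demo_malloc, demo_free, and DEMO_MALLOC are hypothetical stand-ins written for this example; they are not the library's routines, and the real signatures may differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocator with the semantics described
 * in the list above. Not the real bayer_malloc. */
void* demo_malloc(size_t size, const char* file, int line)
{
    /* Allocating 0 bytes is valid and simply yields NULL. */
    if (size == 0) {
        return NULL;
    }

    void* ptr = malloc(size);
    if (ptr == NULL) {
        /* Report where the failed request came from, then abort,
         * so callers never need to check the return value. */
        fprintf(stderr, "Failed to allocate %zu bytes @ %s:%d\n",
                size, file, line);
        abort();
    }
    return ptr;
}

/* Takes the address of the caller's pointer so that the pointer can be
 * reset to NULL after the memory is released. */
void demo_free(void** ptr_addr)
{
    /* NULL, or a pointer to NULL, is accepted and ignored. */
    if (ptr_addr == NULL || *ptr_addr == NULL) {
        return;
    }
    free(*ptr_addr);
    *ptr_addr = NULL;
}

/* The macro attaches the call site's file and line automatically. */
#define DEMO_MALLOC(size) demo_malloc((size), __FILE__, __LINE__)
```

With this shape, a caller writes `void* buf = DEMO_MALLOC(n);` and later `demo_free(&buf);`. A second `demo_free(&buf)` is then a harmless no-op rather than a double free, and any later use of `buf` fails on a NULL pointer instead of touching released memory.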
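
The idea behind the POSIX-IO wrappers, retrying a call that failed with a non-fatal error such as EINTR, can be sketched as follows. This is an illustration only: demo_read_retry and DEMO_MAX_RETRIES are hypothetical names, and the retry limit and the choice to retry only on EINTR are assumptions of this sketch rather than the library's behavior.

```c
#include <errno.h>
#include <unistd.h>

/* Assumed upper bound on retries; the value is arbitrary. */
#define DEMO_MAX_RETRIES 5

/* Hypothetical wrapper around read(2) that retries when the call is
 * interrupted by a signal before any data was transferred. */
ssize_t demo_read_retry(int fd, void* buf, size_t count)
{
    int attempts_left = DEMO_MAX_RETRIES;

    for (;;) {
        ssize_t rc = read(fd, buf, count);
        if (rc >= 0) {
            /* Success, possibly a short read or end of file (0). */
            return rc;
        }
        if (errno != EINTR || attempts_left-- <= 0) {
            /* Either a real error or too many interruptions:
             * give up and leave errno set for the caller. */
            return rc;
        }
        /* Interrupted by a signal: try the same read again. */
    }
}
```

The wrapper keeps the return convention of read, so it can be dropped in wherever read is called. It does not loop to fill the buffer after a short read; that is a separate concern a caller would handle.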
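
Path simplification as in the example /a/b/../c/. --> /a/c can be done purely lexically by treating the path as a stack of components. The sketch below shows one way to do that; demo_simplify_path is a hypothetical function written for this example, not the library's path API, and it assumes absolute paths and does not resolve symbolic links.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical lexical simplifier for absolute paths.
 * Returns a newly allocated string the caller must free(),
 * or NULL for a relative path or on allocation failure. */
char* demo_simplify_path(const char* path)
{
    if (path == NULL || path[0] != '/') {
        return NULL; /* this sketch handles absolute paths only */
    }

    size_t len = strlen(path);
    char* copy = malloc(len + 1);   /* scratch copy for strtok */
    char* out = malloc(len + 1);    /* result is never longer than input */
    char** stack = malloc((len / 2 + 1) * sizeof(char*));
    if (copy == NULL || out == NULL || stack == NULL) {
        free(copy);
        free(out);
        free(stack);
        return NULL;
    }
    memcpy(copy, path, len + 1);

    /* Split on '/', keeping a stack of the components seen so far. */
    size_t depth = 0;
    for (char* tok = strtok(copy, "/"); tok != NULL; tok = strtok(NULL, "/")) {
        if (strcmp(tok, ".") == 0) {
            continue;               /* "." refers to the current directory */
        }
        if (strcmp(tok, "..") == 0) {
            if (depth > 0) {
                depth--;            /* ".." drops the previous component */
            }
            continue;               /* ".." at the root stays at the root */
        }
        stack[depth++] = tok;
    }

    /* Join the remaining components back together. */
    size_t pos = 0;
    for (size_t i = 0; i < depth; i++) {
        size_t n = strlen(stack[i]);
        out[pos++] = '/';
        memcpy(out + pos, stack[i], n);
        pos += n;
    }
    if (depth == 0) {
        out[pos++] = '/';           /* everything cancelled out: root */
    }
    out[pos] = '\0';

    free(stack);
    free(copy);
    return out;
}
```

Because the simplification is lexical, a path that passes through a symbolic link to a directory may name a different file after ".." is collapsed. The sketch also uses strtok, which is not thread-safe.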