dcp: add --chunksize and --blocksize options
Add command-line options for --chunksize and --blocksize, and add
them to the dcp.1 man page, since these options now exist for both
dcp and dcp1.

Improve the usage message for these options in dcp.1 to not be
self-referential, and explain what they are actually used for.
It was otherwise unclear what the difference between them was.

Keep the default values at 1MB for now, but it probably makes
sense to make them at least the Lustre stripe size to minimize
contention between multiple threads reading/writing the same file.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
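
The distinction the message draws between the two options can be sketched with a little arithmetic: the chunk size decides how many work units a file is split into, and the block size decides how many read/write calls it takes to move one of those units. The helper names below are hypothetical and are not part of mpiFileUtils.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of chunk-sized work units needed to cover a
 * file. The last unit may be shorter than chunk_size. */
static uint64_t count_chunks(uint64_t file_size, uint64_t chunk_size)
{
    assert(chunk_size > 0);
    return file_size / chunk_size + (file_size % chunk_size != 0);
}

/* Hypothetical helper: number of block-sized I/O calls needed to move one
 * work unit of `len` bytes through a buffer of block_size bytes. */
static uint64_t count_blocks(uint64_t len, uint64_t block_size)
{
    assert(block_size > 0);
    return len / block_size + (len % block_size != 0);
}
```

With a 64MB chunk size and an 8MB block size, for example, each work unit would take eight buffer-sized transfers under this model.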
adilger authored and adammoody committed Jun 11, 2019
1 parent eb56ee3 commit e559ef1
Showing 6 changed files with 76 additions and 17 deletions.
23 changes: 19 additions & 4 deletions doc/rst/dcp.1.rst
@@ -11,26 +11,41 @@ DESCRIPTION


Parallel MPI application to recursively copy files and directories. Parallel MPI application to recursively copy files and directories.


dcp is a file copy tool in the spirit of :manpage:`cp(1)` that evenly distributes dcp is a file copy tool in the spirit of :manpage:`cp(1)` that evenly
work across a large cluster without any centralized state. It is distributes the work of scanning the directory tree, and copying file
data across a large cluster without any centralized state. It is
designed for copying files that are located on a distributed parallel designed for copying files that are located on a distributed parallel
file system. file system, and will split large file copies across multiple processes.


OPTIONS OPTIONS
------- -------


.. option:: -b, --blocksize SIZE

Set the I/O buffer to be SIZE bytes. Units like "MB" and "GB" may
immediately follow the number without spaces (eg. 8MB). The default
blocksize is 1MB.

.. option:: -i, --input FILE .. option:: -i, --input FILE


Read source list from FILE. FILE must be generated by another tool Read source list from FILE. FILE must be generated by another tool
from the mpiFileUtils suite. from the mpiFileUtils suite.


.. option:: -k, --chunksize SIZE

Split large files into chunks of SIZE bytes to be processed. Multiple
process ranks may copy a large file in parallel. Units like "MB" and
"GB" can immediately follow the number without spaces (eg. 64MB).
The default chunksize is 1MB.
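
As an illustration of the SIZE syntax described for both options, here is a minimal sketch of a unit-suffix parser. It is a hypothetical stand-in, not the parser dcp uses (the diff calls `mfu_abtoull` for that), and it assumes binary multiples, which matches the `1*1024*1024` defaults in `mfu_flist.h`.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the mpiFileUtils parser: converts strings such
 * as "8MB" to a byte count. Returns 0 on success, -1 on malformed or
 * overflowing input. */
static int parse_size(const char *str, uint64_t *out)
{
    assert(str != NULL && out != NULL);

    /* strtoull accepts a leading sign or spaces, so require a digit first */
    if (!isdigit((unsigned char)str[0])) {
        return -1;
    }

    errno = 0;
    char *end = NULL;
    unsigned long long value = strtoull(str, &end, 10);
    if (errno == ERANGE) {
        return -1;
    }

    /* Map the optional suffix to a power-of-two shift (assumed binary units) */
    unsigned shift;
    if (*end == '\0' || strcmp(end, "B") == 0) {
        shift = 0;
    } else if (strcmp(end, "KB") == 0) {
        shift = 10;
    } else if (strcmp(end, "MB") == 0) {
        shift = 20;
    } else if (strcmp(end, "GB") == 0) {
        shift = 30;
    } else {
        return -1;  /* includes "8 MB": no space is allowed before the unit */
    }

    if (value > (UINT64_MAX >> shift)) {
        return -1;  /* result would not fit in 64 bits */
    }
    *out = (uint64_t)value << shift;
    return 0;
}
```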

.. option:: -p, --preserve .. option:: -p, --preserve


Preserve permissions, group, timestamps, and extended attributes. Preserve permissions, group, timestamps, and extended attributes.


.. option:: -s, --synchronous .. option:: -s, --synchronous


Use synchronous read/write calls (open files with 0_DIRECT) Use synchronous read/write calls (open files with O_DIRECT).
This also avoids caching the file data on the client nodes.


.. option:: -S, --sparse .. option:: -S, --sparse


22 changes: 19 additions & 3 deletions man/dcp.1
@@ -38,18 +38,34 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
Parallel MPI application to recursively copy files and directories. Parallel MPI application to recursively copy files and directories.
.sp .sp
dcp is a file copy tool in the spirit of \fBcp(1)\fP that evenly distributes dcp is a file copy tool in the spirit of \fBcp(1)\fP that evenly distributes
work across a large cluster without any centralized state. It is the work of scanning the directory tree, and copying file data across a
designed for copying files that are located on a distributed parallel large cluster without any centralized state. It is designed for copying
file system. files that are located on a distributed parallel file system, and will
split large file copies across multiple processes.
.SH OPTIONS .SH OPTIONS
.INDENT 0.0 .INDENT 0.0
.TP .TP
.B \-b, \-\-blocksize SIZE
Set the I/O buffer to be SIZE bytes. Units like "MB" and "GB" may
immediately follow the number without spaces (eg. 8MB). The default
blocksize is 1MB.
.UNINDENT
.INDENT 0.0
.TP
.B \-i, \-\-input FILE .B \-i, \-\-input FILE
Read source list from FILE. FILE must be generated by another tool Read source list from FILE. FILE must be generated by another tool
from the mpiFileUtils suite. from the mpiFileUtils suite.
.UNINDENT .UNINDENT
.INDENT 0.0 .INDENT 0.0
.TP .TP
.B \-k, \-\-chunksize SIZE
Split large files into chunks of SIZE bytes to be processed. Multiple
process ranks may copy a large file in parallel. Units like "MB" and
"GB" can immediately follow the number without spaces (eg. 64MB).
The default chunksize is 1MB.
.UNINDENT
.INDENT 0.0
.TP
.B \-p, \-\-preserve .B \-p, \-\-preserve
Preserve permissions, group, timestamps, and extended attributes. Preserve permissions, group, timestamps, and extended attributes.
.UNINDENT .UNINDENT
5 changes: 4 additions & 1 deletion src/common/mfu_flist.h
@@ -59,7 +59,10 @@ extern "C" {
#define DCOPY_DEF_PERMS_FILE (S_IRUSR | S_IWUSR) #define DCOPY_DEF_PERMS_FILE (S_IRUSR | S_IWUSR)
#define DCOPY_DEF_PERMS_DIR (S_IRWXU) #define DCOPY_DEF_PERMS_DIR (S_IRWXU)


/* buffer size to read/write data to file system */ /* default chunk size to split files into work units */
#define FD_CHUNK_SIZE (1*1024*1024)

/* default buffer size to read/write data to file system */
#define FD_BLOCK_SIZE (1*1024*1024) #define FD_BLOCK_SIZE (1*1024*1024)


/* /*
2 changes: 1 addition & 1 deletion src/common/mfu_flist_copy.c
@@ -2768,7 +2768,7 @@ mfu_copy_opts_t* mfu_copy_opts_new(void)
opts->sparse = false; opts->sparse = false;


/* Set default chunk size */ /* Set default chunk size */
opts->chunk_size = 1*1024*1024; opts->chunk_size = FD_CHUNK_SIZE;


/* temporaries used during the copy operation for buffers to read/write data */ /* temporaries used during the copy operation for buffers to read/write data */
opts->block_size = FD_BLOCK_SIZE; opts->block_size = FD_BLOCK_SIZE;
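
To show how the two defaults set here could interact, the sketch below copies one chunk-sized byte range through a block-sized buffer. It is only an illustration under assumed semantics: `copy_chunk` is a hypothetical function, not the copy routine in `mfu_flist_copy.c`, and it uses plain POSIX `pread`/`pwrite`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: copy the byte range [offset, offset + chunk_size)
 * from src_fd to dst_fd, moving at most block_size bytes per call.
 * Returns the number of bytes copied (short at end of file) or -1 on error. */
static int64_t copy_chunk(int src_fd, int dst_fd, off_t offset,
                          size_t chunk_size, size_t block_size)
{
    assert(block_size > 0);

    char *buf = malloc(block_size);
    if (buf == NULL) {
        return -1;
    }

    size_t done = 0;
    while (done < chunk_size) {
        /* Never read past the end of this chunk */
        size_t want = chunk_size - done;
        if (want > block_size) {
            want = block_size;
        }

        ssize_t nread = pread(src_fd, buf, want, offset + (off_t)done);
        if (nread < 0) {
            free(buf);
            return -1;
        }
        if (nread == 0) {
            break;  /* end of file inside the chunk */
        }

        /* pwrite may write fewer bytes than requested, so loop until done */
        ssize_t written = 0;
        while (written < nread) {
            ssize_t n = pwrite(dst_fd, buf + written,
                               (size_t)(nread - written),
                               offset + (off_t)done + written);
            if (n < 0) {
                free(buf);
                return -1;
            }
            written += n;
        }
        done += (size_t)nread;
    }

    free(buf);
    return (int64_t)done;
}
```

Because each call touches only its own byte range, several processes could in principle run this on different offsets of the same file, which is the situation the chunk size controls.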
29 changes: 27 additions & 2 deletions src/dcp/dcp.c
@@ -65,7 +65,9 @@ void print_usage(void)
#ifdef LUSTRE_SUPPORT #ifdef LUSTRE_SUPPORT
/* printf(" -g, --grouplock <id> - use Lustre grouplock when reading/writing file\n"); */ /* printf(" -g, --grouplock <id> - use Lustre grouplock when reading/writing file\n"); */
#endif #endif
printf(" -b, --blocksize - IO buffer size in bytes (default 1MB)\n");
printf(" -i, --input <file> - read source list from file\n"); printf(" -i, --input <file> - read source list from file\n");
printf(" -k, --chunksize - work size per task in bytes (default 1MB)\n");
printf(" -p, --preserve - preserve permissions, ownership, timestamps, extended attributes\n"); printf(" -p, --preserve - preserve permissions, ownership, timestamps, extended attributes\n");
printf(" -s, --synchronous - use synchronous read/write calls (O_DIRECT)\n"); printf(" -s, --synchronous - use synchronous read/write calls (O_DIRECT)\n");
printf(" -S, --sparse - create sparse files when possible\n"); printf(" -S, --sparse - create sparse files when possible\n");
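
The `-b` and `-k` entries in the usage text above both take a required argument. A minimal, self-contained sketch of that option handling with `getopt_long` might look like the following. The `size_opts` struct and `parse_size_opts` function are hypothetical, and for brevity the sketch accepts plain byte counts only, whereas dcp itself parses unit suffixes through `mfu_abtoull`.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical container for the two tunables; dcp keeps them in its own
 * copy-options structure instead. */
struct size_opts {
    unsigned long long block_size;
    unsigned long long chunk_size;
};

/* Parse -b/--blocksize and -k/--chunksize given as plain byte counts.
 * Returns 0 on success, -1 if an option is unknown, malformed, or zero. */
static int parse_size_opts(int argc, char **argv, struct size_opts *opts)
{
    static struct option long_options[] = {
        {"blocksize", required_argument, 0, 'b'},
        {"chunksize", required_argument, 0, 'k'},
        {0, 0, 0, 0}
    };

    assert(opts != NULL);
    optind = 1;  /* restart scanning so the function can be called again */

    int c;
    while ((c = getopt_long(argc, argv, "b:k:", long_options, NULL)) != -1) {
        char *end = NULL;
        unsigned long long bytes;

        switch (c) {
        case 'b':
        case 'k':
            /* Require a leading digit: strtoull would accept "-1" */
            if (optarg[0] < '0' || optarg[0] > '9') {
                return -1;
            }
            bytes = strtoull(optarg, &end, 10);
            if (*end != '\0' || bytes == 0) {
                return -1;  /* trailing junk or a zero size is rejected */
            }
            if (c == 'b') {
                opts->block_size = bytes;
            } else {
                opts->chunk_size = bytes;
            }
            break;
        default:
            return -1;  /* unknown option or missing argument */
        }
    }
    return 0;
}
```

Rejecting a zero size mirrors the `bytes == 0` check in the diff, since a zero-byte buffer or work unit could never make progress.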
@@ -109,6 +111,8 @@ int main(int argc, char** argv)


int option_index = 0; int option_index = 0;
static struct option long_options[] = { static struct option long_options[] = {
{"blocksize" , required_argument, 0, 'b'},
{"chunksize" , required_argument, 0, 'k'},
{"debug" , required_argument, 0, 'd'}, // undocumented {"debug" , required_argument, 0, 'd'}, // undocumented
{"grouplock" , required_argument, 0, 'g'}, // untested {"grouplock" , required_argument, 0, 'g'}, // untested
{"input" , required_argument, 0, 'i'}, {"input" , required_argument, 0, 'i'},
@@ -123,10 +127,11 @@ int main(int argc, char** argv)
}; };


/* Parse options */ /* Parse options */
unsigned long long bytes = 0;
int usage = 0; int usage = 0;
while(1) { while(1) {
int c = getopt_long( int c = getopt_long(
argc, argv, "d:g:i:psSvqh", argc, argv, "b:d:g:i:k:psSvqh",
long_options, &option_index long_options, &option_index
); );


@@ -135,6 +140,16 @@ int main(int argc, char** argv)
} }


switch(c) { switch(c) {
case 'b':
if (mfu_abtoull(optarg, &bytes) != MFU_SUCCESS || bytes == 0) {
if (rank == 0)
MFU_LOG(MFU_LOG_ERR,
"Failed to parse block size: '%s'\n", optarg);
usage = 1;
} else {
mfu_copy_opts->block_size = (size_t)bytes;
}
break;
case 'd': case 'd':
if(strncmp(optarg, "fatal", 5) == 0) { if(strncmp(optarg, "fatal", 5) == 0) {
CIRCLE_debug = CIRCLE_LOG_FATAL; CIRCLE_debug = CIRCLE_LOG_FATAL;
@@ -182,7 +197,7 @@ int main(int argc, char** argv)
case 'g': case 'g':
mfu_copy_opts->grouplock_id = atoi(optarg); mfu_copy_opts->grouplock_id = atoi(optarg);
if(rank == 0) { if(rank == 0) {
MFU_LOG(MFU_LOG_INFO, "groulock ID: %d.", MFU_LOG(MFU_LOG_INFO, "grouplock ID: %d.",
mfu_copy_opts->grouplock_id); mfu_copy_opts->grouplock_id);
} }
break; break;
@@ -193,6 +208,16 @@ int main(int argc, char** argv)
MFU_LOG(MFU_LOG_INFO, "Using input list."); MFU_LOG(MFU_LOG_INFO, "Using input list.");
} }
break; break;
case 'k':
if (mfu_abtoull(optarg, &bytes) != MFU_SUCCESS || bytes == 0) {
if (rank == 0)
MFU_LOG(MFU_LOG_ERR,
"Failed to parse chunk size: '%s'\n", optarg);
usage = 1;
} else {
mfu_copy_opts->chunk_size = bytes;
}
break;
case 'p': case 'p':
mfu_copy_opts->preserve = true; mfu_copy_opts->preserve = true;
if(rank == 0) { if(rank == 0) {
12 changes: 6 additions & 6 deletions src/dcp1/dcp1.c
@@ -268,11 +268,11 @@ void DCOPY_print_usage(void)
printf(" -d, --debug <level> - specify debug verbosity level (default info)\n"); printf(" -d, --debug <level> - specify debug verbosity level (default info)\n");
printf(" -f, --force - delete destination file if error on open\n"); printf(" -f, --force - delete destination file if error on open\n");
printf(" -p, --preserve - preserve permissions, ownership, timestamps, extended attributes\n"); printf(" -p, --preserve - preserve permissions, ownership, timestamps, extended attributes\n");
printf(" -p, --verbose - verbose output\n"); printf(" -v, --verbose - verbose output\n");
printf(" -p, --quiet - quiet output\n"); printf(" -q, --quiet - quiet output\n");
printf(" -s, --synchronous - use synchronous read/write calls (O_DIRECT)\n"); printf(" -s, --synchronous - use synchronous read/write calls (O_DIRECT)\n");
printf(" -k, --chunksize - specify chunksize in MB unit (default 1MB)\n"); printf(" -k, --chunksize - work size per task in bytes (default 1MB)\n");
printf(" -b, --blocksize - specify blocksize in MB unit (default 1MB)\n"); printf(" -b, --blocksize - IO buffer size in bytes (default 1MB)\n");
printf(" -h, --help - print usage\n"); printf(" -h, --help - print usage\n");
printf("\n"); printf("\n");
printf("Level: dbg,info,warn,err,fatal\n"); printf("Level: dbg,info,warn,err,fatal\n");
@@ -338,12 +338,12 @@ int main(int argc, \
DCOPY_user_opts.synchronous = false; DCOPY_user_opts.synchronous = false;


static struct option long_options[] = { static struct option long_options[] = {
{"compare" , no_argument , 0, 'c'},
{"blocksize" , required_argument, 0, 'b'}, {"blocksize" , required_argument, 0, 'b'},
{"chunksize" , required_argument, 0, 'k'},
{"compare" , no_argument , 0, 'c'},
{"debug" , required_argument, 0, 'd'}, {"debug" , required_argument, 0, 'd'},
{"force" , no_argument , 0, 'f'}, {"force" , no_argument , 0, 'f'},
{"help" , no_argument , 0, 'h'}, {"help" , no_argument , 0, 'h'},
{"chunksize" , required_argument, 0, 'k'},
{"preserve" , no_argument , 0, 'p'}, {"preserve" , no_argument , 0, 'p'},
{"verbose" , no_argument , 0, 'v'}, {"verbose" , no_argument , 0, 'v'},
{"quiet" , no_argument , 0, 'q'}, {"quiet" , no_argument , 0, 'q'},