rsync subprocess exits immediately and the file counts do not match #36
Comments
The behavior you describe at the end of your note is the expected
behavior. It's a feature of 'fpart', not a bug, so that the rsyncing can
start before the recursive descent of the entire tree is finished; i.e.,
an rsync is fired off with each chunk file as soon as it is completed.
I seem to remember the behavior you describe, though, from an older version of
parsyncfp. What version are you running?
Also, if you are using the '--altcache' option, make sure that the logs,
etc. that you're reading are from the correct cache dir. It can get confusing
once you start generating lots of cache files.
However, assuming that you're correct and the logs are what you say, the
small byte counts sent could be the result of the files already existing at
the target, so that only checksums are exchanged. That is what makes
rsync so efficient.
However, the counts in the fpart log and the line counts of the chunk files
should be proportional. In a recent test on my system,
the fpart log says:
Examining filesystem...
Filled part #0: size = 4587626807, 24151 file(s)
Filled part #1: size = 4497063936, 2 file(s)
Filled part #2: size = 4344097561, 2255 file(s)
Filled part #3: size = 4312630307, 139 file(s)
Filled part #4: size = 4769913757, 202 file(s)
Filled part #5: size = 4295052994, 838 file(s)
Filled part #6: size = 4759503831, 86444 file(s)
Filled part #7: size = 4341861400, 31980 file(s)
Filled part #8: size = 2771762905, 1476 file(s)
and the word count of the f.* chunk files should be very close (they sometimes
differ due to permissions or spaces):
$ wc -l f.*
24151 f.0
2 f.1
2256 f.2
139 f.3
202 f.4
838 f.5
86444 f.6
31980 f.7
1476 f.8
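That cross-check can be scripted. Here's a hedged sketch that pulls the per-part file counts out of an fpart log and compares them against `wc -l` on the chunk files; the log format follows the example above, but the tiny fixture data is fabricated so the script runs standalone (in a real run you would cd into the fpcache dir instead):

```shell
#!/bin/sh
# Sketch: compare the per-part counts fpart reports in its log against
# 'wc -l' on the f.* chunk files. Fixture data is fabricated so the
# script is self-contained.
dir=$(mktemp -d) && cd "$dir"
printf 'Examining filesystem...\nFilled part #0: size = 100, 3 file(s)\nFilled part #1: size = 200, 2 file(s)\n' > fpart.log
printf 'a\nb\nc\n' > f.0
printf 'd\ne\n'    > f.1
# Pull "part-number file-count" pairs out of the log and compare.
sed -n 's/.*#\([0-9]*\):.*, \([0-9]*\) file(s)/\1 \2/p' fpart.log |
while read -r part expected; do
    actual=$(wc -l < "f.$part" | tr -d ' ')
    echo "part $part: log=$expected chunk=$actual"
done
# prints: part 0: log=3 chunk=3
#         part 1: log=2 chunk=2
```

A large mismatch between the two columns (rather than an off-by-a-few from odd filenames) is the symptom worth investigating.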
Harry
…On Mon, Aug 10, 2020 at 7:55 PM idlej ***@***.***> wrote:
Hi all,
I created a parsyncfp job to migrate about 4 TB of files with a 4G chunksize.
perl /opt/parsyncfp/parsyncfp --NP=10
--altcache=/root/.parsyncfp/jiaoyu_gv_upload --startdir=/mnt/ps15
jiaoyu_gv_upload /mnt/hw_9000 --maxload=32 --verbose=3 --chunksize=4G
--interface=eth4
The rsync process starts and exits immediately after the corresponding
cachefile is created.
The fpart log reports huge sizes for the cachefiles:
***@***.*** fpcache]# cat fpart.log.09.34.13_2020-08-11
Examining filesystem...
Filled part #0: size = 4294972562, 388939 file(s)
Filled part #1: size = 4294970326, 528222 file(s)
Filled part #2: size = 4294971458, 533180 file(s)
Filled part #3: size = 4294977049, 481518 file(s)
Filled part #4: size = 4294967610, 479615 file(s)
Filled part #5: size = 4294972108, 491070 file(s)
Filled part #6: size = 4294967691, 448095 file(s)
But actually there are only a few lines in the cachefiles:
***@***.*** fpcache]# wc -l f*
246 f.0
1982 f.1
1033 f.2
1712 f.3
1331 f.4
614 f.5
636 f.6
2108 f.7
8 fpart.log.09.34.13_2020-08-11
9670 total
Only a small amount of data was actually sent:
***@***.*** jiaoyu_gv_upload]# cat rsync-logfile-09.34.13_2020-08-11_*
2020/08/11 09:34:19 [8718] building file list
2020/08/11 09:34:19 [8718] sent 14196 bytes received 17 bytes total size 3123551
2020/08/11 09:43:48 [11469] building file list
2020/08/11 09:43:52 [11469] sent 50222 bytes received 17 bytes total size 15715093
2020/08/11 09:49:04 [13024] building file list
2020/08/11 09:49:09 [13024] sent 31495 bytes received 17 bytes total size 8388468
2020/08/11 09:54:41 [14649] building file list
2020/08/11 09:54:45 [14649] sent 44404 bytes received 17 bytes total size 13828482
2020/08/11 09:59:28 [16039] building file list
2020/08/11 09:59:32 [16039] sent 35528 bytes received 17 bytes total size 12950497
2020/08/11 10:04:12 [17434] building file list
2020/08/11 10:04:13 [17434] sent 16247 bytes received 17 bytes total size 5056321
2020/08/11 10:08:45 [18741] building file list
2020/08/11 10:08:47 [18741] sent 17912 bytes received 17 bytes total size 5144628
2020/08/11 10:12:27 [19802] building file list
2020/08/11 10:12:32 [19802] sent 58061 bytes received 17 bytes total size 23893062
2020/08/11 10:15:15 [20632] building file list
2020/08/11 10:15:17 [20632] sent 2398 bytes received 17 bytes total size 831982
I think the problem may be that rsync starts as soon as the corresponding
cachefile is created and does not wait for the cachefile to be
completely generated.
I don't know if this is the expected behavior or how to fix it.
Hi Harry, I understand the strategy that rsync is fired off with the completed chunk file as soon as it is done, but I don't know if any version of rsync will watch the chunkfile and process newly appended lines after the chunkfile is opened. And it's also quite strange that fpart stopped writing to the chunkfile.
Hi there - Sorry for the delay.
I just exchanged an email with Ganael (the fpart author). He reminded me that
in live mode (-L), which I use, you can specify the -W option (which I do not,
yet) to execute the dependent command after the chunk file (his term is
'partition') is written, instead of as soon as the file exists.
I'll have to look into how to reorganize parsyncfp (pfp) to handle this -
right now fpart is forked off and so doesn't communicate with pfp. I'll have
to have it write a marker file or something to indicate that the chunk file is
closed.
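That marker-file handshake might look something like this. A hedged sketch only: the ".closed" suffix and paths are illustrative, not parsyncfp's actual convention, and here the writer and reader are collapsed into one script:

```shell
#!/bin/sh
# Sketch of the marker-file idea: the reader treats a chunk file as
# ready only once a companion marker exists, not as soon as the chunk
# file itself appears. The ".closed" suffix is purely illustrative.
dir=$(mktemp -d)
chunk="$dir/f.0"
printf 'file1\nfile2\n' > "$chunk"   # writer (fpart) may still be appending here
# ...writer finishes the partition and drops the marker:
touch "$chunk.closed"
# Reader (pfp) side: wait on the marker, not on the chunk file.
while [ ! -e "$chunk.closed" ]; do sleep 1; done
echo "chunk ready with $(wc -l < "$chunk" | tr -d ' ') entries"
# prints: chunk ready with 2 entries
```

In practice the marker could be dropped from fpart's -W hook, so the reader never has to guess whether the partition is complete.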
Busy today, so this might not get done immediately.
Stay tuned.
Harry
…On Mon, Aug 10, 2020 at 10:08 PM idlej ***@***.***> wrote:
@hjmangalam <https://github.com/hjmangalam>
Hi Harry,
I use fpart v1.2.0, rsync version 3.1.2 protocol version 31 and parsyncfp
version 1.67.
I understand the strategy that rsync is fired off with the completed
chunk file as soon as it is done.
But in my case, rsync is fired when the chunk has just been created, and
in the code the condition to fork a new rsync process is just the existence
of the new chunkfile:
while ( $RSYNCS_GOING < $NP && $KEEPGOING ) {
    $CUR_FP_FLE = $FP_ROOT . "." . $CUR_FPI;   # the current fp chunkfile
    if ( -e $CUR_FP_FLE ) {  # if the current chunkfile exists <----------------------
        fixfilenames( $CUR_FP_FLE, $ROOTDIR ); # check & fix for spaces, bad chars.
        # entire rsync command and PID capture (used in total of 2 places)
        $logfile = $bytefiles . $CUR_FPI;
        $RSYNC_CMD =
            "cd $TRIMPATH && rsync --bwlimit=$MAXBW $RSYNCOPTS -a -s --log-file=$logfile --files-from=$CUR_FP_FLE $ROOTDIR $TARGET & echo \"\${!}\" >> $PIDFILE";
        if ( $VERBOSE >= 2 ) { INFO("Starting rsync for chunkfile [$CUR_FP_FLE]..\n"); }
        # WARN("$RSYNC_CMD");
        # there will be as many logfiles as fp chunkfiles.
        # ie LOTS. but they can be deleted after the run has been verified..
        # TODO don't know if we need this logfile.
        if ($DEBUG) { debug( __LINE__, "Complete rsync cmd = [$RSYNC_CMD]" ); }
        system("$RSYNC_CMD"); # launch rsync and capture the bg job PID to PIDfile
I don't know if any version of rsync will watch the chunkfile and process
newly appended lines after the chunkfile is opened.
But in my case it seems rsync just handles the few lines present, then exits.
And it's also quite strange that fpart stopped writing to the chunkfile.
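The race described here can be reduced to a few lines. A hedged, file-based sketch, with the `-e` test standing in for the existence check in the code above and `wc -l` standing in for the rsync launch:

```shell
#!/bin/sh
# Sketch of the race: the '-e' existence test passes while the writer is
# still appending, so the reader snapshots a partial chunkfile.
dir=$(mktemp -d)
chunk="$dir/f.0"
printf 'early1\nearly2\n' > "$chunk"          # fpart creates the chunk, writes first entries
if [ -e "$chunk" ]; then                      # pfp's existence check passes here...
    snapshot=$(wc -l < "$chunk" | tr -d ' ')  # ...so "rsync" launches with 2 entries
fi
printf 'late\n' >> "$chunk"                   # fpart keeps appending after the launch
final=$(wc -l < "$chunk" | tr -d ' ')
echo "entries at launch: $snapshot, final: $final"
# prints: entries at launch: 2, final: 3
```

Since rsync reads the --files-from list once at startup, everything appended after the launch would be silently skipped, which matches the small byte counts in the logs above.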
Hi there,
I've added the bits to use only closed chunk files, but I'm going through the
code again with some more oversight to make sure I haven't missed
anything.
If you'd like to test the current version I'm working on, send me a note
with your private email and I'll send it to you. Otherwise it should be
pushed to GitHub within a day or so.
Harry