Can't seem to get parsyncfp 1.63 or newer to work right #33

Open
novosirj opened this issue Apr 29, 2020 · 5 comments

novosirj commented Apr 29, 2020

Here's an example of what happens, tested with parsyncfp 1.65 with the -d flag so we could see what is going on. It doesn't correctly complete the fpart step, and it seems like it never generates chunk lists based on my input list.

[root@quorum03 bin]# env | grep -ie homedir -ie snapshot -ie mountpoint -ie backuphost
BACKUPHOST=nas1
SNAPSHOT=projectsc
MOUNTPOINT=projects
HOMEDIR=/gpfs/cache/home/root/policy
[root@quorum03 bin]# /usr/local/bin/parsyncfp-1.65 -d -i ens6 --checkperiod 1800 --nowait --altcache '/root/.parsyncfp-backups' -NP 12 --rsyncopts '-a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no"' --maxload 96 --chunksize=10T --fromlist=$HOMEDIR/$SNAPSHOT.list.allfiles.clean --trimpath=/$MOUNTPOINT/.snapshots/$SNAPSHOT --trustme $BACKUPHOST:/zdata/gss/$SNAPSHOT
DEBUG[359]: FPARTSIZE set to: [10T]
FPARTSIZE_N set to [10995116277800]
Press [ENTER] to continue.

DEBUG[380]: Using network interface [ens6] with connection quality [ 100000Mb/s]
Press [ENTER] to continue.

DEBUG[385]: 1m load is [0.85] and the 1m Load:#CPU ratio is [0.0354166666666667] ( [24] CPU cores).
    OK to continue.
Press [ENTER] to continue.

WARN: About to remove all the old cached chunkfiles from [/root/.parsyncfp-backups/fpcache].
  Enter ^C to stop this.
  If you specified '--nowait', cache will be cleared in 3s regardless.
  Otherwise, hit [Enter] and I'll clear them. 

INFO: The fpart chunk files [/root/.parsyncfp-backups/fpcache/f*] are cleared .. continuing.
INFO: The TRIMPATH you specified exists, is a dir, and is readable.
INFO: Alternative file list is readable; converting list to chunks.
DEBUG[624]: fpart fork cmd:
[cd /projects/.snapshots/projectsc; fpart -v -L -s 10995116277800 -a  -i /tmp/pfp.tmp -o /root/.parsyncfp-backups/fpcache/f 2> /root/.parsyncfp-backups/fpcache/fpart.log.00.41.53_2020-04-29 & echo "${!}" > /root/.parsyncfp-backups/fpcache/FP_PIDFILE00.41.53_2020-04-29]
Press [ENTER] to continue.
INFO: Forking fpart with PID = [6639].  Check [/root/.parsyncfp-backups/fpcache/fpart.log.00.41.53_2020-04-29] for errors if it hangs.
INFO: Waiting for fpart to be forked.
INFO: [/root/.parsyncfp-backups/fpcache/f.0] visible.
INFO: Waiting [1]s for chunk files to be written.
INFO: Starting the 1st [12] rsyncs ..
DEBUG[685]: Complete rsync cmd = [cd /projects/.snapshots/projectsc && rsync  --bwlimit=1000000  -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" -a -s --log-file=/root/.parsyncfp-backups/rsync-logfile-00.41.53_2020-04-29_0 --files-from=/root/.parsyncfp-backups/fpcache/f.0  /projects/.snapshots/projectsc  nas1:/zdata/gss/projectsc  & echo "${!}" >> /root/.parsyncfp-backups/fpcache/rsync-PIDs-00.41.53_2020-04-29]
Press [ENTER] to continue.

DEBUG[704]: OUT OF RSYNC STARTUP LOOP
Press [ENTER] to continue.

WARN: 
    The number of chunk files generated by fpart [1] < the # of rsync 
    processes you specified [12].  
    Did you check the dir tree / file list to make sure you're setting the chunk 
    size appropriately (--chunksize) ?  It's currently set to [10T].
     
        | Elapsed |   1m   |    [   ens6]   MB/s  | Running || Susp'd  |      Chunks       [2020-04-29]
  Time  | time(m) |  Load  |     TCP / RDMA  out  |   PIDs  ||  PIDs   | [UpTo] of [ToDo]
DEBUG[1105]: sPIDs string in get_rPIDs() = []
Press [ENTER] to continue.
^C
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(638) [sender=3.1.2]

[root@quorum03 bin]#

I'm not sure why this happens. Here's what the fpart log looks like (this wasn't the exact same run, but the output looked the same; I didn't save the original):

[root@quorum03 ~]# cat /root/.parsyncfp-backups/fpcache/fpart.log.00.40.37_2020-04-29
Examining filesystem...
error parsing input values:
Filled part #0: size = 0, 1 file(s)
1 file(s) found.

I'm not sure what's going on here, or how to debug it. Originally, I didn't have stats and scut installed. One complication on my system is that a stats command belonging to Confluent is installed and in the PATH. So I change the PATH with export PATH=... so that it no longer contains the Confluent directory with its stats, but does contain /usr/local/bin, which is where I have the stats, scut, and fpart that parsyncfp requires. But I'm not sure whether that isn't working out somehow, or whether something else is going wrong.
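
For what it's worth, this is roughly how I'm checking which binaries actually resolve (just a sketch; /usr/local/bin is where my copies live):

# Put /usr/local/bin ahead of everything else so its stats/scut/fpart win,
# then confirm what the shell actually resolves.
export PATH=/usr/local/bin:$PATH
hash -r                      # drop any cached command locations
type -a stats scut fpart     # the first hit for each should be /usr/local/bin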

The only other thing I can think of is that the file specified by --fromlist= is actually formatted with a space between the filesize and the path instead of the separator that's specified in the documentation/usage info. This has never been a problem before, however.
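
If the separator is the problem, something like this would show what's actually in the list and convert the first space on each line to a tab (purely illustrative; whether a tab is really what's expected needs to be checked against the usage info, and the .tab output name is just a placeholder):

# Make the separator visible (tabs show as ^I, end of line as $):
head -3 "$HOMEDIR/$SNAPSHOT.list.allfiles.clean" | cat -A

# Illustration only: swap the first space on each line (between size and path)
# for a tab, leaving any spaces inside the path itself alone.
awk '{ sub(/ /, "\t"); print }' "$HOMEDIR/$SNAPSHOT.list.allfiles.clean" \
    > "$HOMEDIR/$SNAPSHOT.list.allfiles.tab"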

For the moment, I've dropped back to 1.61 to continue my backups. Never tried 1.62.
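
If it helps with debugging, one thing I could try next time is running fpart on its own against a few lines of the list, reusing the exact flags from the DEBUG[624] line above, to see whether the "error parsing input values" reproduces outside parsyncfp (the /tmp scratch paths below are arbitrary):

head -5 "$HOMEDIR/$SNAPSHOT.list.allfiles.clean" > /tmp/fpart-sample.list
mkdir -p /tmp/fpart-test
# Same flags fpart was forked with in the debug output above.
fpart -v -L -s 10995116277800 -a -i /tmp/fpart-sample.list -o /tmp/fpart-test/f
cat /tmp/fpart-test/f.0    # see what, if anything, fpart produced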

hjmangalam (Owner) commented Apr 29, 2020 via email

novosirj (Author) commented:

Any help is welcome whenever it comes; no real rush for us. Most likely I won't be trying anything until next Tuesday (we run this weekly at 23:00 EST/EDT on Tuesdays).

hjmangalam (Owner) commented Apr 30, 2020 via email

novosirj (Author) commented:

I'll have to let it try that transfer again once it wraps up work on that filesystem so I can see what's in the fpart chunk files; the amount of source data theoretically changes from week to week, so it might not reproduce next week. I'd never considered what might happen when the data is broken up into 10TB chunks and there end up being only one or two of them.
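
A rough way I could sanity-check this ahead of time (assuming the list really is size-in-bytes followed by path, as above) is to sum the sizes and divide by the chunk size; to keep all 12 rsyncs busy, --chunksize would need to be no more than roughly total/12:

# Estimate how many chunks a 10T --chunksize would produce from the list.
awk -v chunk=10995116277800 \
    '{ t += $1 } END { printf "total = %.2f TB, ~%d chunk(s) at 10T\n", t/1e12, int(t/chunk)+1 }' \
    "$HOMEDIR/$SNAPSHOT.list.allfiles.clean"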

novosirj (Author) commented May 8, 2020

1.67 seems to have solved this problem.
