showing full command line when lockfile exists #237
Comments
Why do you need to see the full command line? What extra information is available from the full command line? It's just going to be more or less "rsnapshot -c /my/config/file some-level", isn't it? And you'll see that in your log files if you're running from cron (grep "CRON" /var/log/syslog), and the same for systemd anyway. Also not sure how portable "/proc/$pid/" is.
It is precisely the command line which is interesting to see. When the lock file exists, the whole question is what level created it (i.e., which level is taking longer to run than expected). Yes, the information is in the various log files, but my experience is that it is not simple to correlate the pid from the "lock file exists" mail with the process that created the lock file and the process that aborted. Painful enough that I went to the trouble of hacking the code. Right, as I said, I also doubt /proc/$pid/cmdline is universally portable. It was just the simplest method so I could figure out my cron timing problems. Clearly it would be simple enough to write the cmdline to the lock file and read it back, to avoid going through /proc.
Fair enough. Yes to pid, no to full command line. It's overkill. Feel free to PR.
I think I'd be OK with showing the command line, but note that in the patch it would show the first process's pid but the second process's command line, which is definitely not what you want! Something like this would be OK I think if verbosity is turned up:
As for how you get the command line - you're right, using /proc like this is not portable, it's a Linuxism. procfs exists in most modern Unixes (OpenBSD being a notable exception) but doesn't necessarily expose all the information needed. You'd need to implement it using something like Proc::ProcessTable (don't look at the source, it's a horrid lump of platform-specific C which will make your eyes bleed). Obviously that would have to be an optional dependency, just like how rsnapshot will use Lchown if available.

However, to solve @kberry's original problem - the rule of thumb is that you set up your cron jobs so that the least frequent level runs first, then the next least frequent, and so on up to the most frequent. Except for the most frequent, they should all run very fast, as all they're doing is renaming directories. Make sure that you have lazy deletes turned on so that deletion of expired backups is delayed to the end of the process and done after the lock file has been removed. My crontab looks like this ...
So my yearly job shuffles things around at 0215, the monthly one at 0230, the weekly one at 0245, and the daily one does all the hard work at 0300. In practice just a one- or two-minute gap should be sufficient even on a heavily IO-loaded machine, provided you go in that order - basically enough time to spin up the disks if necessary, plus a few seconds.
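The schedule described above might look something like this in a crontab. Only the times come from the comment; the config file path and level names are assumptions, not taken from the author's actual setup:

```
# Least frequent first, most frequent last; config path and level names are illustrative.
15 2 1 1 *  rsnapshot -c /etc/rsnapshot.conf yearly
30 2 1 * *  rsnapshot -c /etc/rsnapshot.conf monthly
45 2 * * 1  rsnapshot -c /etc/rsnapshot.conf weekly
0  3 * * *  rsnapshot -c /etc/rsnapshot.conf daily
```

The point of the ordering is that each less-frequent run only rotates directories and exits quickly, so the lock is free by the time the daily run starts the real work.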
Thanks!! The lack of use_lazy_deletes was my whole problem; hence the less-frequent runs were potentially taking significant time to finish. I was using the rsnapshot packaged for CentOS 7, where it is not enabled by default :(. Should have looked harder. Thanks again.
Perhaps we should change that default. Thoughts, people?
Is there a downside to changing the default? In any case, perhaps it would be worth adding a caveat to the use_lazy_deletes description in the conf file, and/or the "USAGE" section in the man page, and/or the FAQ on the web site, noting that the upper levels can take longer. The man page now recommends increasing the time between backups when there are collisions ("If rsnapshot takes longer than 10 minutes ..."), which is what led me down this path.
Also, I thought this idea was dead, but in any case, I'm confused about the need for Proc::ProcessTable. The email from cron already shows the command line of the process that gets aborted. What I needed was the command line of the process that held the lock. Seems like this could easily be reported by writing the command line into the lock file, as in print join(" ", @ARGV), and then reading it back. However, perhaps an even more useful, and simpler, idea is to have the abort message say "maybe you should set use_lazy_deletes" for backups not at the first level, if the directive isn't already set. Just an idea. Thanks again.
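The "write the command line into the lock file" idea sketched above could look like this. rsnapshot itself is Perl and its real lock file stores only the pid, so this is just an illustrative Python sketch of the approach, with a hypothetical file format (pid, a space, then the command line):

```python
import os
import sys

def acquire_lock(lockfile):
    """Try to create the lock file; on conflict, report the holder's command line.

    Returns True if the lock was acquired. The file format (pid followed by
    the command line) is an assumption for this sketch, not rsnapshot's layout.
    """
    try:
        # O_EXCL makes creation atomic: the open fails if the file exists.
        fd = os.open(lockfile, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        with open(lockfile) as f:
            pid, _, cmdline = f.read().partition(" ")
        print(f"Lock file {lockfile} exists: held by pid {pid} running: {cmdline}")
        return False
    with os.fdopen(fd, "w") as f:
        # First token is our pid; the rest is the command line that took the lock.
        f.write(f"{os.getpid()} {' '.join([sys.executable] + sys.argv)}")
    return True
```

This avoids /proc entirely, so the portability concern raised earlier in the thread goes away: the aborting process reads the holder's command line straight out of the lock file it collided with.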
Only one downside comes to my mind at the moment: if the process is interrupted, the "_delete.$$" directory can be left behind and never cleaned up.
👍 agree it should be the default. Downsides: uses a snapshot's worth of extra disk space, plus what @StefanMoch mentions. Maybe we should add a warning about unhandled "_delete.$$" dirs. Can't see much risk in adding code to detect leftover "_delete.$$" dirs and restart their removal either. But even without that I think the benefits outweigh the costs here. Would need a major version bump though.
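The detect-and-restart-removal idea could be sketched like this. The snapshot root path is whatever snapshot_root is set to in your config; the function name is hypothetical, and this assumes lazy-delete directories sit directly under the snapshot root with the "_delete.*" naming scheme mentioned above:

```shell
# Sketch: remove leftover _delete.* directories under the snapshot root.
# Pass your configured snapshot_root as the argument.
clean_lazy_deletes() {
    root="$1"
    # -maxdepth 1: lazy-delete dirs live directly under snapshot_root.
    find "$root" -maxdepth 1 -type d -name '_delete.*' | while read -r dir; do
        echo "removing leftover $dir"
        rm -rf "$dir"
    done
}
```

Something along these lines could run at the start of each rsnapshot invocation (after taking the lock), so an interrupted lazy delete gets retried on the next run instead of leaking disk space indefinitely.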
For the reasons I wrote in #236, it would be nice if rsnapshot showed the full command line of the running command when it aborts due to the lock file existing. That way it is easier to know which level is taking longer than expected, so the cron times can be adjusted.
I hacked this in using /proc/$pid/cmdline (and then changing the nulls used in that interface to spaces), but clearly it would be more portable to put the cmdline args into the lockfile and then cat the file, or any of myriad other approaches. Sorry, but this is what I've got to show.
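The /proc hack described above (read /proc/$pid/cmdline and turn its NUL separators into spaces) can be sketched in a few lines. This is a Linux-only illustration, not the author's actual patch (which was against rsnapshot's Perl code); the function names are hypothetical:

```python
def format_cmdline(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/cmdline into a display string.

    The kernel separates (and terminates) each argument with a NUL byte,
    so we drop the trailing NUL and join the rest with spaces.
    """
    return raw.rstrip(b"\0").replace(b"\0", b" ").decode(errors="replace")

def read_cmdline(pid):
    """Return the command line of `pid`, or None if unavailable.

    Linux-only: relies on procfs exposing /proc/<pid>/cmdline, which,
    as noted in the thread, is not portable to all Unixes.
    """
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return format_cmdline(f.read())
    except OSError:
        return None  # no procfs, the process is gone, or permission denied
```

Returning None on failure lets the caller fall back to printing just the pid, which matches the thread's conclusion that the /proc lookup should be best-effort rather than required.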
Thanks for considering.