Recover problems with rpc.statd #870
Comments
I assign it to me because it is about SUSE.
Regarding "failed to run /usr/sbin/sm-notify":
@GCChelp You can add missing things for rpc.statd to the rear
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" prog1 prog2 )
COPY_AS_IS=( "${COPY_AS_IS[@]}" /path1/file1 /path2/file2 )
in /etc/rear/local.conf - cf.
This way it should be possible to make rpc.statd
Regarding "BACKUP_OPTIONS ... prevents the creation of the live rescue system on the USB device":
I submitted a separate issue |
@jsmeix Yes, sm-notify is missing in the recovery system. I manually fumbled it into the initrd with cpio, but your advice to use the configuration setting is much more straightforward. I will try the REQUIRED_PROGS variable and let you know. |
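A minimal sketch of that REQUIRED_PROGS setting for this case, assuming /usr/sbin/sm-notify (the path from the rpc.statd error message) is the only program that is missing. Whether further files are needed is not established in this thread.

```bash
# Sketch for /etc/rear/local.conf (bash syntax).
# Appends to the existing array instead of overwriting it, so the
# programs ReaR already requires stay in the list.
# Assumption: /usr/sbin/sm-notify is the only missing program.
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" /usr/sbin/sm-notify )
```

After changing the configuration the rescue system has to be recreated before the program shows up in it.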
I like that you test if it also works for you when you
OUTPUT=USB
USB_DEVICE=/dev/disk/by-label/REAR-000
BACKUP=NETFS
BACKUP_URL=nfs://nfsserver/backups/rear
BACKUP_OPTIONS="nfsvers=3,nolock"
OUTPUT_OPTIONS="nodiratime"
To see only how things are mounted run
rear -d -D mkbackup
and afterwards grep for 'mount_url' in the rear log file. |
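The grep step could be wrapped like this. It is only a sketch: the helper name is made up, and the default log path /var/log/rear/rear-HOSTNAME.log is an assumption, so pass the real log file as the first argument if yours is elsewhere.

```bash
# Hypothetical helper: print the 'mount_url' lines of a rear log file.
# Assumption: the log lives at /var/log/rear/rear-$HOSTNAME.log unless a
# different file is given as the first argument.
show_mount_url_lines() {
    local logfile="${1:-/var/log/rear/rear-${HOSTNAME}.log}"
    if [[ ! -r "$logfile" ]]; then
        echo "cannot read log file: $logfile" >&2
        return 1
    fi
    grep 'mount_url' "$logfile"
}

# Typical use, as root, after the debug run:
#   rear -d -D mkbackup
#   show_mount_url_lines
```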
@jsmeix Nevertheless, recover still fails with
Debug output of a manual start of rpc.statd :
Adding Regarding your dummy OUTPUT_OPTIONS suggestion, I will try this too and post the results. |
@jsmeix
Hope this helps! |
It could become a lengthy step by step process The "No such file or directory" messages indicate that Additionally and/or alternatively you might have to adapt Bottom line: FYI: |
@GCChelp BACKUP_OPTIONS="nfsvers=3,nolock" OUTPUT_OPTIONS="nodiratime" both "rear mkbackup" and then "rear recover" work for you? |
I thought, that openSUSE already was supported by REAR...
Unchanged: mkbackup works and recover fails with
|
On my SUSE test systems I never had an NFS issue when I use BACKUP_OPTIONS="nfsvers=3,nolock" The only NFS issue that I had was #532 In general regarding "Support" see In general regarding "Disaster Recovery" see |
@jsmeix
Unfortunately, this option is not derived from a configuration file, but is hardcoded in the startup script... Could this be useful in ReaR rescue system as well? How could this be done? |
This was not the kind of 'support' I meant. In fact, this page led me to this GitHub repo and I signed up just for Rear.
Wow interesting and comprehensive article. Thanks for the link! Nevertheless, I will need some more time to go through this extensive stuff. |
I already made some of those steps... Now, rpc.statd still complains
The man page of sm-notify states that /var/lib/nfs/state is the NSM state number and /proc/sys/fs/nfs/nsm_local_state is the kernel's copy of the NSM state number. Any ideas how we could get this into the rescue kernel? |
Seems to come with the lockd kernel module. It even is available in the rescue system and I can load it with modprobe . After loading it, /proc/sys/fs/nfs/nsm_local_state is present! How can I configure ReaR such that the lockd kernel module gets loaded automatically? |
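A small sketch of the manual check described above, for use in the rescue system. The function name is hypothetical; the optional argument exists only so the check can be tried against another path.

```bash
# Hypothetical check: succeed if the kernel's copy of the NSM state
# number is exposed, i.e. /proc/sys/fs/nfs/nsm_local_state exists.
nsm_state_available() {
    local state_file="${1:-/proc/sys/fs/nfs/nsm_local_state}"
    [[ -e "$state_file" ]]
}

# In the rescue system, as root, the manual steps would then be:
#   nsm_state_available || modprobe lockd
#   nsm_state_available && echo "nsm_local_state is present"
```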
I think we need a proper NFS prep script to cover the dependencies (with NFSv4 there are more daemons and alike we need in the rescue image). |
@gdha Is there anything I can check to find out what's (going) wrong? |
@gdha @GCChelp
# autoload these modules in the given order
MODULES_LOAD=()
so that
MODULES_LOAD=( "${MODULES_LOAD[@]}" lockd )
is the right syntax. |
@GCChelp In usr/share/rear/conf/default.conf there is
################ ---- custom scripts
#
# NOTE: The scripts can be defined as an array
# to better handle spaces in parameters.
# The scripts are called like this:
# eval "${PRE_RECOVERY_SCRIPT[@]}"
# Call this after Relax-and-Recover did everything
# in the recover workflow.
# Use $TARGET_FS_ROOT (by default '/mnt/local')
# to refer to the recovered system.
POST_RECOVERY_SCRIPT=
# call this before Relax-and-Recover starts
# to do anything in the recover workflow.
# You have the rescue system but nothing else
PRE_RECOVERY_SCRIPT=
I did not test it myself - perhaps you can use it |
@jsmeix The lockd module is present and can be loaded manually. But it does not get loaded automatically. I created the rescue media with debugging output, but didn't find a problem in the log. Nor did I find anything related in dmesg output after booting the rescue media. Any hints where I can find enlightenment or what specifically I should search for? |
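One thing that can be checked right after booting the rescue media is whether lockd is already listed in /proc/modules, which is the data lsmod prints. A sketch with a hypothetical helper name; note that a module built into the kernel would not be listed there.

```bash
# Hypothetical helper: succeed if the named kernel module is listed in
# /proc/modules. The second argument exists only so the function can be
# tried against a sample file.
module_loaded() {
    local module="$1"
    local modules_file="${2:-/proc/modules}"
    # Each line of /proc/modules starts with the module name and a space.
    grep -q "^${module} " "$modules_file"
}

# In the rescue system:
#   module_loaded lockd && echo "lockd is loaded" || echo "lockd is NOT loaded"
```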
@GCChelp Please post your complete /etc/rear/local.conf file. Additionally post what exact commands you have to run Finally I like to know what kind of system your NFS server is. Then I can try to reproduce it (but do not expect too much |
I never claimed that "rear recover" was working... It still fails with :-(
It's a Hitachi HNAS 4080 high performance storage system. We mount the NFS file systems via plain NFS3 protocol and it's working like a charm together with dozens of openSUSE workstations and SLES servers (plus machines with SGI/Irix). This is what one of the NFS mounts looks like in the running system:
We use TCP for NFS. I tried to mount a filesystem with "proto=udp" instead: works fine as well. |
I did not yet try to reproduce it with your setup.
For me the current GitHub rear master
I have this /etc/rear/local.conf
OUTPUT=ISO
BACKUP=NETFS
BACKUP_OPTIONS="nfsvers=3,nolock"
BACKUP_URL=nfs://10.160.4.244/nfs
NETFS_KEEP_OLD_BACKUP_COPY=yes
SSH_ROOT_PASSWORD="rear"
USE_DHCLIENT="yes"
cf.
In the rear recovery system during "rear recover"
10.160.4.244:/nfs on /tmp/rear.SWrvJXtLXSSvr6n/outputfs type nfs
(ro,relatime,vers=3,rsize=1048576,wsize=1048576,
namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,
sec=sys,mountaddr=10.160.4.244,mountvers=3,
mountport=20048,mountproto=udp,local_lock=all,
addr=10.160.4.244)
In my "rear -d -D recover" log file the NFS mount command is
+++ mount -v -t nfs -o nfsvers=3,nolock 10.160.4.244:/nfs /tmp/rear.SWrvJXtLXSSvr6n/outputfs
mount.nfs: trying 10.160.4.244 prog 100003 vers 3 prot TCP port 2049
mount.nfs: trying 10.160.4.244 prog 100005 vers 3 prot UDP port 20048
mount.nfs: timeout set for Tue Jun 14 12:44:07 2016
mount.nfs: trying text-based options 'nfsvers=3,nolock,addr=10.160.4.244'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: prog 100005, trying vers=3, prot=17
++ StopIfError ...
Interestingly for me rpcbind and rpc.statd are running
RESCUE e137:~ # ps auxw | grep rpc
root 632 0.0 0.0 34796 960 ? Ss 12:42 0:00 rpcbind
root 637 0.0 0.1 17232 1308 ? Ss 12:42 0:00 rpc.statd
root 669 0.0 0.0 0 0 ? S< 12:42 0:00 [rpciod]
RESCUE e137:~ # journalctl | grep rpc
Jun 14 12:42:07 e137 rpc.statd[637]: Version 1.2.8 starting
Jun 14 12:42:07 e137 rpc.statd[637]: Failed to open directory sm: No such file or directory
Jun 14 12:42:07 e137 rpc.statd[637]: Failed to read /var/lib/nfs/state: No such file or directory
Jun 14 12:42:07 e137 rpc.statd[637]: Initializing NSM state
Jun 14 12:42:07 e137 rpc.statd[637]: Running as root. chown /var/lib/nfs to choose different user
Regarding 'rpc' my "rear -d -D recover" log file contains
++ PROGS=(${PROGS[@]:-} rpc.statd rpcbind ... rpcinfo ... ...
++ MODULES=(${MODULES[@]:-} ... sunrpc ... ...
++ COPY_AS_IS=(${COPY_AS_IS[@]:-} ... /etc/rpc ... ...
++ CLONE_USERS=("${CLONE_USERS[@]:-}" ... rpc ... ...
++ has_binary rpcbind
++ type rpcbind
++ rpcinfo -p localhost
++ rpcbind
++ StopIfError 'Could not start port mapper [rpcbind] !'
++ rpcinfo -p localhost
++ has_binary rpc.statd
++ type rpc.statd
++ rpcinfo -p localhost
++ rpc.statd
++ StopIfError 'Could not start rpc.statd !'
My NFS server is an openSUSE Leap 42.1 system
/nfs *(rw,no_root_squash,sync,no_subtree_check)
For my general testing setup
I think this particular issue here is something special
I feel there is very little that I can really do here
What I can do is basically only blind guesswork. |
Regarding 'sm-notify' in the initial comment: |
In my recovery system, rpc.statd is not running:
But nevertheless, I can manually mount the NFS file system, if I do it without locking: And it looks very much like your mount output from above, only IP addresses and [rw]size differ! |
in your initial comment #870 (comment) you wrote
Work-around, if any
ln -s /bin/true /bin/rpc.statd
but in your later comment #870 (comment) you wrote
> Additionally post what exact commands you have to run
> manually in the recovery system so that afterwards
> "rear recover" works successfully for you.
I never claimed that "rear recover" was working...
It still fails with
ERROR: Could not start rpc.statd !
and now you wrote
I can manually mount the NFS file system
I am confused.
When you can manually mount the NFS file system
Or in other words:
When you can manually mount the NFS file system
The "Could not start rpc.statd !" is only in
rpc.statd
StopIfError "Could not start rpc.statd !"
and if you change that to something like
rpc.statd
LogPrintIfError "Could not start rpc.statd !"
it would only log the error message to the rear log file
Does then "rear recover" work for you? |
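To illustrate the difference between a fatal and a non-fatal check, here is a sketch with two simplified stand-ins. They are NOT ReaR's StopIfError and LogPrintIfError (the names and bodies are made up for illustration); they only mimic the behavior described above: both look at the exit status of the command that ran immediately before them, but only the first one aborts.

```bash
# Simplified stand-ins, NOT ReaR's real functions.

# Fatal variant: print the message and exit when the previous command failed.
stop_if_error() {
    local rc=$?   # exit status of the command that ran just before the call
    if (( rc != 0 )); then
        echo "ERROR: $*" >&2
        exit 1
    fi
}

# Non-fatal variant: print the message but let the caller continue.
log_print_if_error() {
    local rc=$?   # exit status of the command that ran just before the call
    if (( rc != 0 )); then
        echo "$*" >&2
    fi
    return 0
}

# Example, with 'false' standing in for a failing rpc.statd start:
#   ( false; log_print_if_error "Could not start rpc.statd !"; echo "recover continues" )
#   ( false; stop_if_error "Could not start rpc.statd !"; echo "never reached" )
```

With the non-fatal variant a later mount with "nolock" still gets its chance to run, which matches the case where the NFS share can be mounted manually without rpc.statd.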
overhauled 05_start_required_daemons.sh
use plain 'rpcinfo -p' see issue #889
make it no longer fatal when rpc.statd is unavailable see issue #870
removed all references to FD8 see issue #887 and pull request #874
first steps to be prepared for 'set -eu' see https://github.com/rear/rear/wiki/Coding-Style
With #891 Instead it now only shows a message like @GCChelp |
@jsmeix For completeness sake: I repeated my experiments with an NFS server based on openSUSE 13.1.
Sorry to confuse you. ;-) But even with them I don't get a working system with "rear recover": It runs and finishes. But when I try to boot the recovered system, it fails because GRUB 2 is not configured correctly. That doesn't really surprise me, because we are still using GRUB 1 (legacy)... But I suspect, this would be substance for another (new) issue?
Yes, with this workaround "rear recover" runs successfully! And with this workaround I no longer even need any definition of REQUIRED_PROGS, COPY_AS_IS and MODULES_LOAD in my configuration!
When I can manually mount the NFS file system with "-o nolock", why can't rear do this when I define |
Regarding GRUB 2 versus GRUB legacy:
Regarding
rpc.statd
LogPrintIfError "Could not start rpc.statd !"
with this workaround "rear recover" runs successfully
This means that #891
Regarding
When I can manually mount the NFS file system with "-o nolock", why can't rear do this when I define BACKUP_OPTIONS="nfsvers=3,nolock"
"rear recover" did not reach the point where it would mount
BACKUP_OPTIONS="nfsvers=3,nolock"
...
In my "rear -d -D recover" log file the NFS mount command is
+++ mount -v -t nfs -o nfsvers=3,nolock 10.160.4.244:/nfs /tmp/rear.SWrvJXtLXSSvr6n/outputfs |
Will do this.
Confirmed |
rear version (/usr/sbin/rear -V): Relax-and-Recover 1.18 / Git
OS version (cat /etc/rear/os.conf or lsb_release -a):
OS_VENDOR=SUSE_LINUX
OS_VERSION=13.1
ARCH='Linux-i386'
OS='GNU/Linux'
OS_VERSION='13.1'
OS_VENDOR='SUSE_LINUX'
OS_VENDOR_VERSION='SUSE_LINUX/13.1'
OS_VENDOR_ARCH='SUSE_LINUX/i386'
rear configuration files (cat /etc/rear/site.conf or cat /etc/rear/local.conf):
OUTPUT=USB
USB_DEVICE=/dev/disk/by-label/REAR-000
BACKUP=NETFS
BACKUP_URL=nfs://nfsserver/backups/rear
EXCLUDE_MOUNTPOINTS=( /home /scratch )
AUTOEXCLUDE_PATH=( /media /mnt )
AUTOEXCLUDE_AUTOFS=..
AUTOEXCLUDE_DISKS=y
SSH_ROOT_PASSWORD=XXX
Brief description of the issue
Experimenting with REAR on openSUSE 13.1, 64-bit.
Restore fails with
ERROR: Could not start rpc.statd !
In fact, it even can't be started manually:
rpc.statd -F
rpc.statd: failed to run /usr/sbin/sm-notify
BTW: Adding
BACKUP_OPTIONS="nfsvers=3,nolock"
to site.conf changes nothing. (Furthermore, it prevents the creation of the live rescue system on the USB device, because the backup options are used there as well.)
Work-around, if any
ln -s /bin/true /bin/rpc.statd