RHEL7.6: rpcinfo -p fails when starting NFS #2341

Closed
hpannenb opened this issue Mar 17, 2020 · 10 comments
Labels: enhancement (Adaptions and new features), fixed / solved / done, minor bug (An alternative or workaround exists)

Comments

@hpannenb
Contributor

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    Relax-and-Recover 2.5 / 2019-05-10

  • OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"):
    Red Hat Enterprise Linux Server 7.6 (Maipo)

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):

[...]
export TMPDIR="/var/tmp"

OUTPUT=PXE
OUTPUT_PREFIX_PXE=pxe-iso-$HOSTNAME

BACKUP=NETFS
BACKUP_URL=nfs://<SERVER>/var/backup/

PXE_TFTP_URL=nfs://<SERVER>/var/lib/tftpboot
PXE_CONFIG_URL=nfs://<SERVER>/var/lib/tftpboot/pxelinux.cfg

USE_DHCLIENT=
USE_STATIC_NETWORKING="y"
USE_RESOLV_CONF=n

MODULES=()
[...]
  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PowerVM LPAR):
    BareMetal
  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    x86
  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    BIOS
  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    local disk
  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT" or "lsblk" as makeshift):
    ...
  • Description of the issue (ideally so that others can reproduce it):
    The recovery fails because the NFSv4-shared backup directory /var/backup on the NFS server cannot be accessed: the rpcinfo -p check in NETFS/default/050_start_required_nfs_daemons.sh always fails.
  • Workaround, if any:
    In the RESCUE environment, change line 58 of NETFS/default/050_start_required_nfs_daemons.sh from rpcinfo -p to rpcinfo -p <SERVER>. Afterwards, rear recover works flawlessly.
  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):
+ echo '2020-03-17 15:14:49.690880709 Leaving debugscripts mode (back to previous bash flags and options settings).'
2020-03-17 15:14:49.690880709 Leaving debugscripts mode (back to previous bash flags and options settings).
2020-03-17 15:14:49.697364352 Including verify/NETFS/default/050_start_required_nfs_daemons.sh
2020-03-17 15:14:49.699329362 Entering debugscripts mode via 'set -x'.
+ source /usr/share/rear/verify/NETFS/default/050_start_required_nfs_daemons.sh
+++ url_scheme nfs://ts9esesv805/var/backup/
+++ local url=nfs://ts9esesv805/var/backup/
+++ local scheme=nfs
+++ echo nfs
+++ grep -q :
+++ echo nfs
++ local backup_url_scheme=nfs
++ test nfs = nfs
++ local attempt=
++ local portmapper_program=
++ LogPrint 'Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.'
++ Log 'Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.'
++ echo '2020-03-17 15:14:49.709025586 Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.'
2020-03-17 15:14:49.709025586 Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.
++ Print 'Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.'
++ has_binary portmap
++ for bin in '$@'
++ type portmap
/usr/share/rear/lib/_input-output-functions.sh: line 457: type: portmap: not found
++ return 1
++ has_binary rpcbind
++ for bin in '$@'
++ type rpcbind
++ return 0
++ portmapper_program=rpcbind
++ rpcinfo -p
++ rpcbind
++ LogPrint 'Started RPC portmapper '\''rpcbind'\''.'
++ Log 'Started RPC portmapper '\''rpcbind'\''.'
++ echo '2020-03-17 15:14:49.717288722 Started RPC portmapper '\''rpcbind'\''.'
2020-03-17 15:14:49.717288722 Started RPC portmapper 'rpcbind'.
++ Print 'Started RPC portmapper '\''rpcbind'\''.'
+++ seq 5
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ for attempt in '$( seq 5 )'
++ rpcinfo -p
++ grep -q portmapper
++ sleep 1
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ test ok = 5
++ Error 'RPC portmapper '\''rpcbind'\'' unavailable.'
++ PrintError 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ PrintError 'Some latest log messages since the last called script 050_start_required_nfs_daemons.sh:'
++ PrintError '  2020-03-17 15:14:49.697364352 Including verify/NETFS/default/050_start_required_nfs_daemons.sh
  2020-03-17 15:14:49.699329362 Entering debugscripts mode via '\''set -x'\''.
  2020-03-17 15:14:49.709025586 Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.
  /usr/share/rear/lib/_input-output-functions.sh: line 457: type: portmap: not found
  2020-03-17 15:14:49.717288722 Started RPC portmapper '\''rpcbind'\''.'
++ Log 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ echo '2020-03-17 15:14:54.761443203 ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
2020-03-17 15:14:54.761443203 ERROR: RPC portmapper 'rpcbind' unavailable.
++ LogToSyslog 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ logger -t rear -i 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ echo '===== Stack trace ====='

Execution of rpcinfo in the RESCUE environment:

RESCUE <CLIENT>:~ # rpcinfo -p
rpcinfo: can't contact portmapper: RPC: Remote system error - Connection refused

RESCUE <CLIENT>:~ # rpcinfo -p <SERVER>
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    1   udp  20048  mountd
    100005    1   tcp  20048  mountd
    100024    1   udp  54592  status
    100024    1   tcp  37424  status
    100005    2   udp  20048  mountd
[...]
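For reference, the trace above suggests the check in 050_start_required_nfs_daemons.sh boils down to up to five attempts of rpcinfo -p | grep -q portmapper, one second apart. Below is a minimal sketch of that loop, reconstructed from the 'set -x' output rather than copied from the ReaR sources; the function names probe_portmapper and wait_for_portmapper are hypothetical. Because plain rpcinfo -p only asks the local portmapper, it keeps getting "Connection refused" whenever no rpcbind is listening locally, however healthy the NFS server is:

```shell
#!/bin/bash
# Sketch of the portmapper wait loop seen in the 'set -x' trace.
# Reconstructed from the trace; function names are hypothetical.

# One probe: ask the LOCAL portmapper (plain 'rpcinfo -p', no host
# argument) for its registrations and look for 'portmapper'.
probe_portmapper() {
    rpcinfo -p 2>/dev/null | grep -q portmapper
}

# Retry the probe up to $1 times (default 5), sleeping 1 second after
# each failed attempt, as in the trace. Returns 0 on success, 1 otherwise.
wait_for_portmapper() {
    local max_attempts=${1:-5}
    local attempt
    for attempt in $( seq "$max_attempts" ) ; do
        probe_portmapper && return 0
        sleep 1
    done
    return 1
}

# Example:
#   wait_for_portmapper || echo "RPC portmapper 'rpcbind' unavailable" >&2
```

Checking rpcinfo -p <SERVER> instead, as in the workaround, only proves that the server's portmapper answers; it does not make the local rpcbind start.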
@hpannenb
Contributor Author

This seems to be related to #2250.

@pcahyna
Member

pcahyna commented Mar 17, 2020

@rmetrich @yontalcar can you please have a look?

@hpannenb
Contributor Author

hpannenb commented Mar 17, 2020

I checked https://access.redhat.com/solutions/1132143, which prompted me to look into /var/log/messages of the RESCUE environment, where I found:

Mar 17 18:15:20 CLIENT rpcbind[12127]: cannot get uid of 'rpc': Success

Adding the "rpc" user to /etc/passwd

[...]
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
[...]

in the RESCUE environment fixes this rpcbind startup failure. So ReaR should include not only the rpcuser user but also the rpc user in the rescue system.
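A hypothetical pair of helpers for spotting this in a rescue system: one pulls the account name out of rpcbind's "cannot get uid of" message in a syslog-format file, the other checks whether a passwd-format file has an entry with exactly that login name. Both function names are made up for illustration and are not part of ReaR:

```shell
#!/bin/bash
# Hypothetical diagnostic helpers for the rescue system; not part of ReaR.

# Print the account name from the last rpcbind line of the form
#   ... rpcbind[12127]: cannot get uid of 'rpc': Success
# found in a syslog-format file (default: /var/log/messages).
rpcbind_missing_user() {
    local log_file=${1:-/var/log/messages}
    sed -n "s/.*rpcbind\[[0-9]*]: cannot get uid of '\([^']*\)'.*/\1/p" "$log_file" | tail -n 1
}

# Succeed if the passwd-format file $1 has an entry whose login name
# (first ':'-separated field) is exactly $2, so 'rpc' does not match 'rpcuser'.
has_passwd_entry() {
    local passwd_file=$1 user=$2
    awk -F: -v u="$user" '$1 == u { found = 1 } END { exit !found }' "$passwd_file"
}

# Example use in the rescue system:
#   user=$(rpcbind_missing_user)
#   if [ -n "$user" ] && ! has_passwd_entry /etc/passwd "$user" ; then
#       echo "rpcbind needs user '$user', which is missing from /etc/passwd"
#   fi
```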

I will try adding this to local.conf:

CLONE_USERS=( "${CLONE_USERS[@]}" rpcuser rpc )

If that works, line 12 of 100_check_nfs_version.sh should be changed accordingly.

@hpannenb
Contributor Author

I changed the mentioned line on my test system: rpcbind now starts and the NFS-based recovery works.

jsmeix added the labels "enhancement" (Adaptions and new features) and "minor bug" (An alternative or workaround exists) on Mar 18, 2020
@hpannenb
Contributor Author

@jsmeix so in most cases it is better to be safe than sorry and always enable CLONE_ALL_USERS_GROUPS=yes.

Since your change and mine were merged into the current version, I will close the issue.

@pcahyna
Member

pcahyna commented Mar 19, 2020

@hpannenb thanks, so the addition of the parameter after -p is not needed anymore?

@xjezda00 I am curious about this: IIUC you have been testing restore from NFS regularly. Have you seen this problem? If not, why not?

@pcahyna
Member

pcahyna commented Mar 19, 2020

@hpannenb thanks, so the addition of the parameter after -p is not needed anymore?

Ah, I see: according to a comment in the file, it is intentionally called without a parameter:

# use plain 'rpcinfo -p' to check if RPC service is available
# instead of using 'rpcinfo -p localhost' because the latter
# does not work on some systems while the former works everywhere
https://github.com/rear/rear/issues/889

Never mind.

jsmeix added a commit that referenced this issue Mar 19, 2020
Added required NFS users for proper rpcbind startup:
Usually 'rpcuser' is used but 'rpc' is used in RHEL7.x
and '_rpc' is used in Debian 10,
see #2341
and #2250
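Until a ReaR version with this fix is installed, the same effect can presumably be had from /etc/rear/local.conf. A sketch covering all three account names from the commit message; it assumes that listing names which do not exist on the original system is harmless, which is worth checking against your ReaR version before relying on it:

```shell
# /etc/rear/local.conf fragment (sketch, assumes ReaR 2.5 without the fix).
# local.conf is sourced after ReaR's default.conf, so += appends to the
# default CLONE_USERS list instead of replacing it.
# Account names taken from the commit message: rpcuser (usual name),
# rpc (RHEL 7.x), _rpc (Debian 10).
CLONE_USERS+=( rpcuser rpc _rpc )

# Blunter alternative discussed in this issue (ReaR default is "no"):
# CLONE_ALL_USERS_GROUPS="yes"
```

CLONE_USERS+=( ... ) is equivalent to the CLONE_USERS=( "${CLONE_USERS[@]}" ... ) append form used earlier in this thread.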
jsmeix self-assigned this on Mar 19, 2020
@jsmeix
Member

jsmeix commented Mar 19, 2020

With #2342 merged
this issue should be fixed.

@hpannenb
thank you for analysing the root cause
and for your fix!

@jsmeix
Member

jsmeix commented Mar 19, 2020

@hpannenb
regarding your #2341 (comment)

I did not set CLONE_ALL_USERS_GROUPS="yes".
The default is still CLONE_ALL_USERS_GROUPS="no"
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L1412

@jsmeix
Member

jsmeix commented Mar 19, 2020

I opened #2345
to get an answer to the question whether
CLONE_ALL_USERS_GROUPS="yes" should be the default.

3 participants