Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boot from rescue ISO freezes at "pid_max: default: 32768 minimum: 301" on "ESXi 6.7 Update 2 and later" virtual machine #2439

Closed
will-code-for-pizza opened this issue Jun 25, 2020 · 19 comments
Labels
external tool The issue depends on other software e.g. third-party backup tools. fixed / solved / done not ReaR / invalid The root cause is not in the ReaR code or ReaR is misused. special hardware or VM The issue depends on non common (virtual) hardware. support / question

Comments

@will-code-for-pizza
Copy link

will-code-for-pizza commented Jun 25, 2020

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    2.4 to 2.6

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):
    SuSE SLES 11.3

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):

site.conf

BACKUP=NETFS
OUTPUT=ISO
RETENTION_TIME="Week"
USE_CFG2HTML=y
BACKUP_PROG=tar
BACKUP_PROG_COMPRESS_OPTIONS="--gzip"
LOGFILE="/var/log/rear/$HOSTNAME.log"
BACKUP_PROG_EXCLUDE=( '/mnt/*' '/cache/*' '/tmp/*' '/dev/shm/*' $VAR_DIR/output/\* )
BACKUP_PROG_INCLUDE=( '/*' '/boot/' '/home/*' '/opt/*' '/usr/*' '/var/*' '/oracleoem/*' )

local.conf

NETFS_KEEP_OLD_BACKUP_COPY=yes
BACKUP_URL=nfs://139.1.xxx.xxx/srv/IMAGES/rear/
MIGRATION_MODE='true'
BOOTLOADER="GRUB"
USE_STATIC_NETWORKING="yes"
NETWORKING_PREPARATION_COMMANDS=( 'ip addr add 139.1.xx.xx/25 dev eth0' 'ip link set dev eth0 up' 'ip route add default via 139.1.xx.1' 'return' )
KERNEL_CMDLINE="cgroup_disable=memory" # otherwise it throws some cgroup errors while booting
  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PoverVM LPAR):

ESX virtual guest

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):

amd64

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):

BIOS / Grub

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):

vmdk-virtual disks on SAN Storage Backend

  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT" or "lsblk" as makeshift):

Could not be determined cause of hanging boot

  • Description of the issue (ideally so that others can reproduce it):

We want to P2V migrate a physical server to ESX virtual machine.
While booting the rescue ISO into virtual machine,
the boot from rescue ISO freezes at

pid_max: default: 32768 minimum: 301

on ESX virtual machine

  • Workaround, if any:

NONE

@jsmeix jsmeix added support / question special hardware or VM The issue depends on non common (virtual) hardware. labels Jun 25, 2020
@jsmeix
Copy link
Member

jsmeix commented Jun 25, 2020

@will-code-for-pizza
on my current openSUSE Leap 15.1 x86_64 home-office laptop I get

# dmesg | grep -3 pid_max

[    0.000000] tsc: Fast TSC calibration using PIT
[    0.004000] tsc: Detected 2394.502 MHz processor
[    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4789.00 BogoMIPS (lpj=9578008)
[    0.004000] pid_max: default: 32768 minimum: 301
[    0.004000] ACPI: Core revision 20170303
[    0.004000] Security Framework initialized
[    0.004000] AppArmor: AppArmor initialized
ault: 32768 minimum: 301

so the pid_max: default: 32768 minimum: 301 kernel message
sems to be a normal info message that does not indicate an error.

I assume it freezes at something that happens after the
pid_max: default: 32768 minimum: 301 kernel message
but I have not idea what that "something" is in your case
(I am not at all a kernel expert).

In particular I cannot help with "VMware ESXi (formerly ESX)" issues
because I neither have that nor did I ever use it.
https://en.wikipedia.org/wiki/VMware_ESXi
is all I know about it.
I use only QEMU/KVM virtual machines
(but I am also not at all a QEMU/KVM expert).

@gdha
Copy link
Member

gdha commented Jun 25, 2020

@will-code-for-pizza Try to add MODULES=( 'all_modules' ) in your local.conf file

@gozora
Copy link
Member

gozora commented Jun 25, 2020

@will-code-for-pizza can you please execute following command on your ORIGINAL system find /usr/share/rear -type l | wc -l and report back the output (should be single number) ?

Happened to me some time ago that copying ReaR over winscp discarded symbolic links which made ReaR recovery system un-bootable ...

V.

@jsmeix
Copy link
Member

jsmeix commented Jun 25, 2020

Since ReaR 2.5 MODULES=( 'all_modules' ) is the default, cf.
https://github.com/rear/rear/blob/master/doc/rear-release-notes.txt#L924

@will-code-for-pizza
an offtopic question:
What is RETENTION_TIME="Week" intended to do?
I can only find NSR_RETENTION_TIME in the ReaR scripts
but that only applies if you use BACKUP=NSR.

@will-code-for-pizza
Copy link
Author

What is RETENTION_TIME="Week" intended to do?

You are right @jsmeix

I can only remember, this was an option in an older version of ReaR ( ~ 1.7 )
So I will remove this option.
Thanks.

@jsmeix
Copy link
Member

jsmeix commented Jun 25, 2020

@gozora
but how could something missing below usr/share/rear
let booting freeze apparently while the kernel is in its startup phase
(as far as I imagine what freezes at "pid_max: default: 32768 minimum: 301" means)
i.e. while only the kernel seems to be running?

Or actually it freezes during ReaR recovery system startup scripts?
I have those symlinks of the ReaR recovery system startup scripts:

# find usr/share/rear -type l | grep skel
usr/share/rear/skel/default/etc/bash.bashrc
usr/share/rear/skel/default/usr/lib/systemd/system/multi-user.target.wants/sysinit.service
usr/share/rear/skel/default/usr/lib/systemd/system/multi-user.target.wants/getty.target
usr/share/rear/skel/default/usr/lib/systemd/system/multi-user.target.wants/systemd-journald.service
usr/share/rear/skel/default/usr/lib/systemd/system/multi-user.target.wants/sshd.service
usr/share/rear/skel/default/usr/lib/systemd/system/multi-user.target.wants/systemd-udev-trigger.service
usr/share/rear/skel/default/usr/lib/systemd/system/multi-user.target.wants/rear-boot-helper.service
usr/share/rear/skel/default/usr/lib/systemd/system/dbus.target.wants/dbus.service
usr/share/rear/skel/default/usr/lib/systemd/system/ctrl-alt-del.target
usr/share/rear/skel/default/usr/lib/systemd/system/sockets.target.wants/dbus.socket
usr/share/rear/skel/default/usr/lib/systemd/system/sockets.target.wants/systemd-logger.socket
usr/share/rear/skel/default/usr/lib/systemd/system/sockets.target.wants/systemd-shutdownd.socket
usr/share/rear/skel/default/usr/lib/systemd/system/sockets.target.wants/systemd-journald.socket
usr/share/rear/skel/default/usr/lib/systemd/system/sockets.target.wants/systemd-udevd-kernel.socket
usr/share/rear/skel/default/usr/lib/systemd/system/sockets.target.wants/syslog.socket
usr/share/rear/skel/default/usr/lib/systemd/system/sockets.target.wants/udev-kernel.socket
usr/share/rear/skel/default/usr/lib/systemd/system/sockets.target.wants/udev-control.socket
usr/share/rear/skel/default/usr/lib/systemd/system/sockets.target.wants/systemd-udevd-control.socket
usr/share/rear/skel/default/usr/lib/systemd/system/getty.target.wants/getty@tty0.service
usr/share/rear/skel/default/usr/lib/systemd/system/getty.target.wants/serial-getty@ttyS0.service
usr/share/rear/skel/default/usr/lib/systemd/system/default.target

Perhaps what actually freezes is PID1 a.k.a. systemd?

@gozora
Copy link
Member

gozora commented Jun 25, 2020

@jsmeix discarded isn't the right word. The links were not missing. What happened in my particular case was that symbolic links were present but they were regular files with symbolic link target as their content.

Something like original symbolic link /bin/ls -> /usr/bin/ls after copied over winscp become:

# cat /bin/ls
/usr/bin/ls

@gozora
Copy link
Member

gozora commented Jun 25, 2020

Guess our Systemd setup didn't like one of /usr/share/rear/skel/default/usr/lib/systemd/system/ symbolic links modified this way.

V.

@will-code-for-pizza
Copy link
Author

Thanks a lot for all your support.

I use ReaR as an extracted TAR archive in /root/bin/rear-2.6/, so I guess, @gozora 's issue is not my case today.
The creation of the rescue ISO and the backup archive on the NFS target runs fine.

Starting a libvirt/kvm virtual machine or another physical server with this rescue ISO went fine.

Only this ESX vm freezes. :-(

Regards

@will-code-for-pizza
Copy link
Author

@will-code-for-pizza Try to add MODULES=( 'all_modules' ) in your local.conf file

Unfortunatelly, this was NOT the solution.

@will-code-for-pizza
Copy link
Author

@will-code-for-pizza can you please execute following command on your ORIGINAL system find /usr/share/rear -type l | wc -l and report back the output (should be single number) ?

Happened to me some time ago that copying ReaR over winscp discarded symbolic links which made ReaR recovery system un-bootable ...

V.

149

@will-code-for-pizza
Copy link
Author

Thanks a lot for all your support.

I use ReaR as an extracted TAR archive in /root/bin/rear-2.6/, so I guess, @gozora 's issue is not my case today.
The creation of the rescue ISO and the backup archive on the NFS target runs fine.

Starting a libvirt/kvm virtual machine or another physical server with this rescue ISO went fine.

Only this ESX vm freezes. :-(

Regards

By the way... The ESX vm has 16 GB RAM and 8 vCPUs.

@jsmeix
Copy link
Member

jsmeix commented Jun 25, 2020

For comparison:

# find usr/share/rear -type l | wc -l

149

I use latest ReaR upstream code.

By the way:
My QEMU/KVM test systems have 2 GiB RAM and one virtual CPU.
With FIRMWARE_FILES=( 'no' ) it also works with only 1 GiB RAM.

@will-code-for-pizza
Copy link
Author

By the way:
My QEMU/KVM test systems have 2 GiB RAM and one virtual CPU.
With FIRMWARE_FILES=( 'no' ) it also works with only 1 GiB RAM.

Well, but this is not the problem solver :-(

@jsmeix
Copy link
Member

jsmeix commented Jun 25, 2020

@will-code-for-pizza
I did not mean FIRMWARE_FILES=( 'no' ) would help here
I only meant that the ReaR recovery system does not need much RAM
i.e. that 16 GB RAM should be more than enough.

By the way - also not meant as a problem solver:
On my SLES11 SP4 system I have those kernel messages

# dmesg | grep -3 pid_max

[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] Detected 2394.456 MHz processor.
[    0.008000] Calibrating delay loop (skipped) preset value.. 4788.91 BogoMIPS (lpj=9577824)
[    0.008000] pid_max: default: 32768 minimum: 301
[    0.008000] kdb version 4.4 by Keith Owens, Scott Lurndal. Copyright SGI, All Rights Reserved
[    0.008000] Security Framework initialized
[    0.008000] AppArmor: AppArmor initialized

so directly after pid_max: default: 32768 minimum: 301
it seems there is something keyboard related happening.
Is there perhaps something special with the keyboard on your ESX VM?
I.e. is there perhaps not a "normal" (virtual) keyboard?
This is plain blind guesswork from me.

@will-code-for-pizza
Copy link
Author

Short info: I have booted another VM on the ESX cluster successfully.
The settings for the VM differs in

WORKING:
Guest OS: Ubuntu Linux (64 Bit)
Compatibility: ESX/ESXi 4.0 and later

vs.

NOT WORKING
Guest OS: SuSE Linux Enterprise 11 (64-Bit)
Compatibility: ESXi 6.7 Update 2 and later

I will now re-create a new virtual machine template and try again.

@will-code-for-pizza
Copy link
Author

will-code-for-pizza commented Jun 25, 2020

SOLVED.

VM Option --> Compatibility: ESXi 6.7 Update 2 and later --> BAD SETTING

@jsmeix jsmeix added external tool The issue depends on other software e.g. third-party backup tools. fixed / solved / done not ReaR / invalid The root cause is not in the ReaR code or ReaR is misused. labels Jun 26, 2020
@jsmeix
Copy link
Member

jsmeix commented Jun 26, 2020

@will-code-for-pizza
thank you for your feedback what the reason was and what the solution is.

@jsmeix jsmeix changed the title Boot from rescue ISO freezes at "pid_max: default: 32768 minimum: 301" on ESX virtual machine Boot from rescue ISO freezes at "pid_max: default: 32768 minimum: 301" on "ESXi 6.7 Update 2 and later" virtual machine Jun 26, 2020
@jsmeix
Copy link
Member

jsmeix commented Jun 26, 2020

@will-code-for-pizza
I have a question out of curiosity:

I do not understand why the SLES11 Linux kernel freezes
when it runs on a particular newer version of a virtual machine
but works on an older version of that kind of virtual machine.

I could understand when some latest version of a guest program
that requires some latest newest hardware stuff fails when it is
run in an older version of a virtual machine because the older
virtual machine does not provide virtualization of the required
newest hardware.

But I do not understand the opposite because I would expect
that an old-style guest program would work on latest versions
of a virtual machine (i.e. I would expect newer virtual machine
versions to behave backward compatible).

Could you perhaps explain what the reason behind is why
it seems an "ESXi 6.7 Update 2" virtual machine does not behave
backward compatible with an "ESX/ESXi 4.0" virtual machine?

I found
https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-64D4B1C9-CD5D-4C68-8B50-585F6A87EBA0.html
but that does not answer my question (or I fail to see the answer).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external tool The issue depends on other software e.g. third-party backup tools. fixed / solved / done not ReaR / invalid The root cause is not in the ReaR code or ReaR is misused. special hardware or VM The issue depends on non common (virtual) hardware. support / question
Projects
None yet
Development

No branches or pull requests

4 participants