Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel: [ 32.960029] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ip:1924] #171

Closed
dmarjanovic opened this issue Jul 20, 2020 · 8 comments

Comments

@dmarjanovic
Copy link

dmarjanovic commented Jul 20, 2020

I'm trying to make scripts work on CentOS VM and I'm seeing soft lockups. Any hint is appreciated. KVM is enabled. Proxy is used and I can see env variables are correctly configured inside container, proxy is not localhost but an IP. I'm creating image and running container as:

$ emu-docker create --no-metrics emulator-30.0.12.zip sys-img-google_apis-29-Q-x86_64.zip
$ ./run.sh 427c93b52b2b -verbose

Emulator stays in offline mode forever. Private/public keys are set and seems to be correctly passed in container.

$ adb devices
* daemon not running; starting now at tcp:5037
* daemon started successfully
List of devices attached
emulator-5554	offline

There are soft lockups in log. Whole log is
container_soft_lockup.txt and below is only the first soft lockup:

Last `soft lockup` error (click to expand)
logcat: 07-20 14:20:43.680  1905  1926 I ServiceManager: Waiting for service 'package_native' on '/dev/binder'...
logcat: 07-20 14:20:43.780  1905  1926 W ServiceManager: Service package_native didn't start. Returning NULL
logcat: 07-20 14:20:43.780  1905  1926 E storaged: getService package_native failed
logcat: 07-20 14:20:43.786  1905  1926 I storaged: storaged: Start
logcat: 07-20 14:20:43.786  1905  1926 I ServiceManager: Waiting for service 'package_native' on '/dev/binder'...
logcat: 07-20 14:20:43.886  1905  1926 I ServiceManager: Waiting for service 'package_native' on '/dev/binder'...
logcat: 07-20 14:20:43.900  1814  1814 E IptablesRestoreController: Timed out waiting for response from iptables process 1929
logcat: 07-20 14:20:43.986  1905  1926 I ServiceManager: Waiting for service 'package_native' on '/dev/binder'...
logcat: 07-20 14:20:48.593  1905  1926 I chatty  : uid=0(root) /system/bin/storaged identical 46 lines
logcat: 07-20 14:20:48.693  1905  1926 I ServiceManager: Waiting for service 'package_native' on '/dev/binder'...
logcat: 07-20 14:20:48.793  1905  1926 W ServiceManager: Service package_native didn't start. Returning NULL
logcat: 07-20 14:20:48.793  1905  1926 E storaged: getService package_native failed
kernel: [   33.190031] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [ip:1938]
kernel: [   33.192758] Modules linked in:
kernel: [   33.194017] CPU: 3 PID: 1938 Comm: ip Not tainted 4.14.150+ #1
kernel: [   33.196498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
kernel: [   33.201305] task: 00000000fb802cb6 task.stack: 000000007cabebdf
kernel: [   33.203722] RIP: 0010:iowrite16+0xd/0x35
kernel: [   33.205327] RSP: 0018:ffff9841001f3340 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff10
kernel: [   33.208414] RAX: 0000000000000002 RBX: ffff9003b984d000 RCX: 0000000000000001
kernel: [   33.211272] RDX: 00000000ffffffff RSI: ffff984100145008 RDI: 0000000000000002
kernel: [   33.214165] RBP: ffff9841001f3340 R08: 0000000000000003 R09: 0000000000000000
kernel: [   33.217041] R10: ffff9003b7f05140 R11: ffff9841001f3448 R12: ffff9841001f33b0
kernel: [   33.219893] R13: ffff9003ba1ad400 R14: ffff9841001f3301 R15: 0000000000000002
kernel: [   33.222856] FS:  000070546a1db010(0000) GS:ffff9003bfd80000(0000) knlGS:0000000000000000
kernel: [   33.226251] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [   33.228813] CR2: 00007ffe8e8ec000 CR3: 00000001271c6000 CR4: 00000000000006a0
kernel: [   33.232083] Call Trace:
kernel: [   33.233275]  vp_notify+0x13/0x17
kernel: [   33.234769]  virtqueue_kick+0x22/0x35
kernel: [   33.236511]  virtnet_send_command+0xf3/0x165
kernel: [   33.238459]  virtnet_set_rx_mode+0xac/0x361
kernel: [   33.240387]  ? __this_cpu_preempt_check+0x13/0x15
kernel: [   33.242535]  ? __local_bh_enable_ip+0x54/0x71
kernel: [   33.244495]  ? local_bh_enable+0x15/0x17
kernel: [   33.246160]  __dev_set_rx_mode+0x7c/0x81
kernel: [   33.247773]  __dev_open+0xdf/0x14a
kernel: [   33.249176]  __dev_change_flags+0xad/0x1ce
kernel: [   33.250854]  dev_change_flags+0x22/0x57
kernel: [   33.252435]  do_setlink+0x39a/0xc17
kernel: [   33.253865]  rtnl_newlink+0x621/0x7e1
kernel: [   33.255372]  ? rtnl_newlink+0x131/0x7e1
kernel: [   33.256998]  ? selinux_capable+0x1e/0x20
kernel: [   33.258606]  ? security_capable+0x3c/0x55
kernel: [   33.260255]  ? ns_capable_common+0x2a/0x58
kernel: [   33.261926]  ? ns_capable+0xe/0x10
kernel: [   33.263316]  rtnetlink_rcv_msg+0x262/0x35b
kernel: [   33.264983]  ? selinux_capable+0x1e/0x20
kernel: [   33.266840]  ? security_capable+0x3c/0x55
kernel: [   33.268484]  ? __rtnl_unlock+0x23/0x40
kernel: [   33.270284]  ? netdev_run_todo+0x59/0x2d5
kernel: [   33.272186]  ? ns_capable_common+0x2a/0x58
kernel: [   33.274090]  ? ns_capable+0xe/0x10
kernel: [   33.275664]  ? avc_has_perm+0x89/0x153
kernel: [   33.277443]  ? preempt_count_add+0x68/0xb5
kernel: [   33.279322]  ? rtnetlink_bind+0x29/0x29
kernel: [   33.281119]  netlink_rcv_skb+0x7e/0xf7
kernel: [   33.282853]  rtnetlink_rcv+0x10/0x12
kernel: [   33.284505]  netlink_unicast+0x151/0x1c1
kernel: [   33.286385]  netlink_sendmsg+0x357/0x39e
kernel: [   33.288044]  sock_sendmsg+0x2d/0x3d
kernel: [   33.289464]  ___sys_sendmsg+0x1be/0x25c
kernel: [   33.291092]  ? mem_cgroup_event_ratelimit+0x39/0x9c
kernel: [   33.293084]  ? debug_smp_processor_id+0x17/0x19
kernel: [   33.294995]  ? __lru_cache_add+0x86/0x9d
kernel: [   33.296605]  ? _raw_spin_unlock+0x15/0x29
kernel: [   33.298430]  __sys_sendmsg+0x4d/0x8a
kernel: [   33.299946]  SyS_sendmsg+0xd/0x17
kernel: [   33.301537]  do_syscall_64+0x68/0x95
kernel: [   33.303204]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
kernel: [   33.305512] RIP: 0033:0x705469db5a97
kernel: [   33.307183] RSP: 002b:00007ffe8e8ebee8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
kernel: [   33.310612] RAX: ffffffffffffffda RBX: 00007ffe8e8f3fd0 RCX: 0000705469db5a97
kernel: [   33.313785] RDX: 0000000000000000 RSI: 00007ffe8e8ebf08 RDI: 0000000000000003
kernel: [   33.317044] RBP: 0000000000000000 R08: 0000000000008000 R09: 0000000000000000
kernel: [   33.320267] R10: 0000705469d5aef0 R11: 0000000000000246 R12: 00007ffe8e8ebf60
kernel: [   33.323501] R13: 00005629f6d760d8 R14: 0000000000000001 R15: 0000000000000000
kernel: [   33.326757] Code: 00 01 00 72 08 0f b7 d6 89 f8 ee eb 0f 48 89 f7 48 c7 c6 b9 b1 ec a7 e8 c6 fe ff ff 5d c3 55 48 89 e5 48 81 fe 00 00 04 00 72 05 <66> 89 3e eb 21 48 81 fe 01 00 01 00 72 09 0f b7 d6 89 f8 66 ef 
^Cemulator: WARNING: Not saving state: RAM not mapped as shared
emulator: Done with QEMU main loop
emulator: User configuration saved to /android-home/Pixel2.avd/emulator-user.ini
emulator: VERBOSE: AdbConnection.cpp:490: Adb transition disconnected -> disconnected
2020/07/20 14:21:09 socat[93] W read(5, 0x55fd04770100, 8192): Connection reset by peer
Only errors filtered (click to expand)
logcat: 07-20 14:20:37.795  1787  1787 E fsverity_init: Failed to load /product/etc/security/fsverity/*.der
logcat: 07-20 14:20:37.800  1788  1788 E /system/bin/mini-keyctl: Cannot find keyring id
logcat: 07-20 14:20:37.805  1789  1789 E fsverity_init: Failed to restrict .fs-verity keyring
logcat: 07-20 14:20:37.888  1790  1790 E apexd   : Failed to walk /product/apex : Can't open /product/apex for reading : No such file or directory
logcat: 07-20 14:20:37.902  1790  1790 E apexd   : Failed to activate packages from /product/apex : Failed to scan /product/apex : Can't open /product/apex for reading : No such file or directory
logcat: 07-20 14:20:38.292  1832  1832 E GoldfishGatekeeper: called goldfish_gatekeeper_open with gate keeper 0x7b805a409050 device 0x7b805a3b0000
logcat: 07-20 14:20:38.531  1888  1888 E cutils  : Failed to read /data/.layout_version: No such file or directory
logcat: 07-20 14:20:39.086  1822  1822 E audiohalservice: Could not get passthrough implementation for android.hardware.audio@5.0::IDevicesFactory/default.
logcat: 07-20 14:20:39.153  1824  1824 E EmulatedCamera_Factory: 2 cameras are being emulated. 2 of them are fake cameras.
logcat: 07-20 14:20:39.153  1824  1824 E EmulatedCamera_HotplugThread: createFileIfNotExists: Could not create file '/data/misc/media/emulator.camera.hotplug.0', error: 'Permission denied' (13)
logcat: 07-20 14:20:39.153  1824  1824 E EmulatedCamera_HotplugThread: createFileIfNotExists: Could not create file '/data/misc/media/emulator.camera.hotplug.1', error: 'Permission denied' (13)
logcat: 07-20 14:20:39.153  1824  1824 E CamPrvdr@2.4-legacy: setUpVendorTags: Vendor tag operations not fully defined. Ignoring definitions.
logcat: 07-20 14:20:39.153  1824  1824 E CamPrvdr@2.4-legacy: initialize: Vendor tag setup failed, will not be available.
logcat: 07-20 14:20:39.154  1822  1822 E audiohalservice: Could not get passthrough implementation for android.hardware.audio.effect@5.0::IEffectsFactory/default.
logcat: 07-20 14:20:39.178  1907  1907 E vndksupport: Could not load libstagefrighthw.so from default namespace: dlopen failed: library "libstagefrighthw.so" not found.
logcat: 07-20 14:20:39.221  1822  1822 E audiohalservice: Could not get passthrough implementation for android.hardware.soundtrigger@2.2::ISoundTriggerHw/default.
logcat: 07-20 14:20:39.221  1822  1822 E audiohalservice: Could not get passthrough implementation for android.hardware.soundtrigger@2.1::ISoundTriggerHw/default.
logcat: 07-20 14:20:39.267  1822  1822 E SoundTriggerHalImpl: couldn't load sound trigger module sound_trigger.primary (No such file or directory)
logcat: 07-20 14:20:39.267  1822  1822 E audiohalservice: Could not get passthrough implementation for android.hardware.bluetooth.audio@2.0::IBluetoothAudioProvidersFactory/default.
logcat: 07-20 14:20:39.267  1822  1822 E audiohalservice: Could not get passthrough implementation for android.hardware.bluetooth.a2dp@1.0::IBluetoothAudioOffload/default.
logcat: 07-20 14:20:39.432  1880  1880 E CameraProviderManager: DeviceInfo3: Converted ICameraDevice instance to nullptr
logcat: 07-20 14:20:39.433  1880  1880 E CameraProviderManager: DeviceInfo3: Converted ICameraDevice instance to nullptr
logcat: 07-20 14:20:39.452  1880  1880 E CameraHardwareInterface: initialize: openSession failed! No such file or directory (-2)
logcat: 07-20 14:20:39.452  1880  1880 E CameraFlashlight: connectCameraDevice: initializing camera 1 failed
logcat: 07-20 14:20:39.452  1880  1880 E CameraFlashlight: findFlashUnits: failed to check if camera 1 has a flash unit. No such file or directory (-2)
logcat: 07-20 14:20:39.452  1880  1880 E CameraHardwareInterface: initialize: openSession failed! No such file or directory (-2)
logcat: 07-20 14:20:39.452  1880  1880 E CameraFlashlight: connectCameraDevice: initializing camera 0 failed
logcat: 07-20 14:20:39.453  1880  1880 E CameraFlashlight: findFlashUnits: failed to check if camera 0 has a flash unit. No such file or directory (-2)
logcat: 07-20 14:20:39.877  1849  1849 E APM::Serializer: deserialize: Could not parse /odm/etc/audio_policy_configuration.xml document.
logcat: 07-20 14:20:39.887  1849  1849 E APM::AudioPolicyEngine/Config: parse: Could not parse document /vendor/etc/audio_policy_engine_configuration.xml
logcat: 07-20 14:20:39.887  1849  1849 E APM::AudioPolicyEngine/Config: parseLegacyVolumeFile: Could not parse document /odm/etc/audio_policy_configuration.xml
logcat: 07-20 14:20:39.931  1892  1892 E MediaExtractorFactory: couldn't opendir(/system/lib64/extractors)
logcat: 07-20 14:20:40.053  1852  1953 E SurfaceFlinger: ro.sf.lcd_density must be defined as a build property
logcat: 07-20 14:20:40.172  1984  2007 E BootAnimation: Could not add watch for /data/system
logcat: 07-20 14:20:43.780  1905  1926 E storaged: getService package_native failed
logcat: 07-20 14:20:43.900  1814  1814 E IptablesRestoreController: Timed out waiting for response from iptables process 1929
logcat: 07-20 14:20:48.793  1905  1926 E storaged: getService package_native failed
$ cat /etc/centos-release
CentOS Linux release 7.8.2003 (Core)

Thank you!

/cc @pokowaka

@dmarjanovic
Copy link
Author

dmarjanovic commented Jul 22, 2020

If I disable audio/camera/gps in avd/config.ini there are "only" CPU soft lockups and there are no other audio/camera warnings/errors that I saw earlier.

I'm seeing soft lockups elsewhere on the net with Qemu in some old threads but usually related to specific (buggy) kernel versions or Linux distro version but nothing specific to Android emulator and CentOS.

Is it possible that the kernel version used on this VM with CentOS 7.8 has bugs and incompatible with Qemu/Android emulator? Not sure how to verify.

$ uname -rms
Linux 3.10.0-1127.13.1.el7.x86_64 x86_64

Previously on different (physical) machine I managed to run scripts successfully and these were CentOS and kernel versions:

$ cat /etc/centos-release
CentOS Linux release 7.7.1908 (Core)

$ uname -rms
Linux 3.10.0-1062.4.3.el7.x86_64 x86_64

@dmarjanovic
Copy link
Author

dmarjanovic commented Jul 22, 2020

A bit more of kernel logging before the soft lockup:

kernel: [    7.577483] init: property 'ro.kernel.qemu.mediaprofile.video' doesn't exist while expanding '${ro.kernel.qemu.mediaprofile.video}'
kernel: [    7.723704] fscrypt: AES-256-XTS using implementation "xts-aes-aesni"
kernel: [    7.736468] logd.daemon: reinit
kernel: [    7.815979] file system registered
kernel: [    8.010574] healthd: battery l=100 v=5000 t=25.0 h=2 st=2 c=900 fc=300000 cc=10 chg=a
kernel: [    8.177803] IPv6: ADDRCONF(NETDEV_UP): radio0-peer: link is not ready
kernel: [   32.490032] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ip:1810]
kernel: [   32.492694] Modules linked in:
kernel: [   32.494078] CPU: 0 PID: 1810 Comm: ip Not tainted 4.14.150+ #1
...
kernel: [   60.494065] CPU: 0 PID: 1810 Comm: ip Tainted: G             L  4.14.150+ #1
...

This Comm: ip Tainted: G L looks like in #80 but no clue what is it about.

@pokowaka any idea how this "Comm: ip" can be further diagnosed? I've tried with older kernel and it's the same. Also I've tried with qemu monitor but it's removed and can't open adb shell since emulator is offline.

@pokowaka
Copy link
Collaborator

Unfortunately I'm not the kernel expert on the team. @741g would you have any insights?

@pokowaka
Copy link
Collaborator

According to https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html, the last log line merely indicates that the kernel marked itself as tainted after the soft lockup. The soft lockup indicates that the kernel was spinning for 23s without scheduling another thread. Resulting in no progress (which likely causes the device to be offline).

@pokowaka
Copy link
Collaborator

It looks like the kernel is trying to notify the host (https://elixir.bootlin.com/linux/v4.14.156/source/drivers/virtio/virtio_pci_common.c#L44) but is unable to proceed. I guess this should trap the hypervisor and give control to qemu to handle the actual write.

@pokowaka
Copy link
Collaborator

Calling conventions,

RSI: ffff984100145008 RDI: 0000000000000002
iowrite16(vq->index, (void __iomem *)vq->priv); 

Gives:

vq->index: 2
vq->priv:  ffff984100145008

2 cases it seems:

  • Guest never triggers qemu. This would indicate issues with kvm.
  • Kvm delivered event to qemu, but qemu never returned to the guest (ie. got locked handling the request).

@dmarjanovic
Copy link
Author

dmarjanovic commented Jul 29, 2020

Thank you for help with debugging. 👍

Small update: I am seeing soft lockup in the same VM also with native Android Emulator without using docker:

  • P/Q system image (Android kernel 4.x)
  • R system image (Android kernel 5.x)

Somehow O system image (Android kernel 3.10.x) does boot without the soft lockup issue.

@dmarjanovic
Copy link
Author

dmarjanovic commented Aug 10, 2020

Seems like an issue with CentOS 7 inside ESXi 6.5 and not connected directly to the project. No known workaround found. CentOS 8 seems to be working without soft lockups. Closing issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants