Unable to load lcd module... #113

sfrappier · 2013-01-25T20:37:47Z

Devs - please let me know if you need additional information for debugging. t1 is my kernel prefix (FYI). I edited this again after posting because it formatted incorrectly.

Problem: module lcd does not load
Device: Hackberry A10 1GB
Distribution: Ubuntu 12.10 armhf
Kernel version: Linux version 3.0.57-t1 (sfrappier@tenderloin-pc) (gcc version 4.7.2 (Ubuntu/Linaro 4.7.2-1

Description: Loading the lcd module produces a kernel warning trace in dmesg. This in turn causes problems when loading the mali module, and there is no output on the device.

Module load order is the same as what is documented on the Wiki for the Mali 400 driver - on the issues list I saw that someone stated to load disp first - tried this as well and no success. Core issue looks to be that /devices/lcd has already been created? Is it a simple kernel config that needs to change? I'm using the default sun4i_defconfig for compile with additional modules enabled.

Error (kernel dmesg):
[ 7.520000] [drm] Initialized drm 1.1.0 20060810
[ 7.580000] ------------[ cut here ]------------
[ 7.580000] WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0x90/0xc4()
[ 7.590000] sysfs: cannot create duplicate filename '/class/lcd'
[ 7.590000] Modules linked in: lcd(+) drm
[ 7.600000] Backtrace:
[ 7.600000] from
[ 7.630000] r6:c0151cec r5:00000009 r4:e715fd50 r3:00000000
[ 7.640000] from
[ 7.650000] from
[ 7.660000] r8:00000000 r7:e721a000 r6:e716d240 r5:e721a000 r4:ffffffef
[ 7.660000] r3:00000009
[ 7.670000] from
[ 7.670000] r3:e721a000 r2:c06c34e4
[ 7.680000] from
[ 7.690000] r7:e781e150 r6:e71140c8 r5:00000001 r4:e716d240
[ 7.690000] from
[ 7.700000] r8:e7826648 r7:bf0408b0 r6:00000000 r5:e781e150 r4:e71140c8
[ 7.710000] from
[ 7.720000] r6:00000000 r5:e71140c8 r4:e71140c8
[ 7.720000] from
[ 7.740000] r8:bf0407d0 r7:bf0408b0 r6:00000000 r5:e71140c8 r4:e70fb180
[ 7.740000] from
[ 7.750000] r5:e71140c0 r4:e70fb180
[ 7.750000] from
[ 7.760000] r7:bf0408b0 r6:bf040680 r5:bf040788 r4:e70fb180
[ 7.770000] from [](lcd_class_init+0x20/0x64 [lcd])
[ 7.780000] r7:e715e000 r6:c079d600 r5:bf040788 r4:bf0408ac
[ 7.790000] [](lcd_class_init+0x0/0x64 [lcd]) from
[ 7.790000] r4:bf040788 r3:00000000
[ 7.800000] from
[ 7.800000] from
[ 7.810000] ---[ end trace 75de50d98331e6e7 ]---
[ 7.830000] kobject_add_internal failed for lcd with -EEXIST, don't try to register things with the same name in the same directory.
[ 7.940000] Backtrace:
[ 7.940000] from
[ 7.940000] r6:ffffffef r5:e71140c8 r4:e71140c8 r3:00000002
[ 8.010000] from
[ 8.060000] from
[ 8.130000] r8:bf0407d0 r7:bf0408b0 r6:00000000 r5:e71140c8 r4:e70fb180
[ 8.130000] from
[ 8.210000] r5:e71140c0 r4:e70fb180
[ 8.300000] from
[ 8.310000] r7:bf0408b0 r6:bf040680 r5:bf040788 r4:e70fb180
[ 8.310000] from [](lcd_class_init+0x20/0x64 [lcd])
[ 8.400000] r7:e715e000 r6:c079d600 r5:bf040788 r4:bf0408ac
[ 8.400000] [](lcd_class_init+0x0/0x64 [lcd]) from
[ 8.480000] r4:bf040788 r3:00000000
[ 8.480000] from
[ 8.530000] from
[ 8.590000] Unable to create backlight class; errno = -17
[ 8.820000] UMP: UMP device driver loaded
[ 9.570000] init: plymouth main process (51) killed by ABRT signal
[ 9.630000] init: plymouth-splash main process (232) terminated with status 2
[ 11.110000] init: failsafe main process (404) killed by TERM signal

sfrappier · 2013-01-25T21:15:57Z

OK - PEBKAC error... it turns out LCD and HDMI were built into the kernel in my configuration (not built as modules).
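For anyone hitting the same duplicate '/class/lcd' warning, a quick way to check is to look at how the option is set in the kernel .config: =y means built in, =m means module. The following is only a sketch; the function name kconfig_state and the symbol names in the usage line are hypothetical, so substitute the real option names from your own config.

```shell
#!/bin/sh
# Report whether a Kconfig symbol is built in (=y), a module (=m), or unset.
# Usage: kconfig_state <config-file> <SYMBOL>
# kconfig_state is a hypothetical helper name, not part of any kernel tooling.
kconfig_state() {
    config_file=$1
    symbol=$2
    # Print only the value of the exact "SYMBOL=value" line, if present.
    value=$(sed -n "s/^${symbol}=//p" "$config_file" | head -n 1)
    case "$value" in
        y)  echo "built-in" ;;
        m)  echo "module" ;;
        "") echo "unset" ;;
        *)  echo "other:$value" ;;
    esac
}
```

Usage would look like `kconfig_state .config CONFIG_EXAMPLE_LCD`, where CONFIG_EXAMPLE_LCD stands in for the real symbol. If it reports "built-in", loading a module of the same name will try to register the same sysfs class a second time.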

I still have the issue when the mali module loads - the entire system freezes. There is no response at all from video or from system. Nothing logged to dmesg or syslog. If I load every module except for mali and mali_drm, the system will boot into X (and is usable).

I have confirmed that mali and mali_drm are compiled as modules.

Am I missing something with the kernel configuration? I have tried both the X11 and Framebuffer drivers for X11. Compiled successfully and installed.

Trying to get DRI2 to work...

romanrm · 2013-01-25T21:24:59Z

AFAIK you need to load them in this strict order:

modprobe lcd
modprobe hdmi
modprobe disp
modprobe mali
modprobe fb
modprobe ump
modprobe mali_drm

maybe deviating from that, would be the cause of lock-ups?

sfrappier · 2013-01-25T21:33:21Z

I was hoping that would be the culprit - I re-ordered the module load based upon your comment, but it still locks up. What I have found is that it locks up a bit after the ext4 root fs is mounted, so I see a little bit of logging show up.

The other difference I see is that you have the fb as a module - I'm not sure if mine is a module or if it's in the kernel...I need to check...

romanrm · 2013-01-25T21:41:29Z

You can use my .configs which are available from http://romanrm.ru/en/a10/kernel (or even the kernels themselves)
the Desktop one is known-good to load Mali without locking up.

sfrappier · 2013-01-25T21:44:35Z

Appreciated - I'll try that as well to see if there's any major differences - I just recompiled the kernel with fb as a module so that I can use the strict order you defined as well - if that doesn't work I'll take a .config from your site and try again to see if I can get past the hard lock. Even if I don't get a display, I can at least then check out the logs to see what's up :).

sfrappier · 2013-01-25T21:49:38Z

No juice - I'm going to try one of your configs over the weekend and I'll let you know what happens.

sfrappier · 2013-01-25T22:44:43Z

romanrm:

I just compiled the kernel using your config - same exact issue. I'm at 3.0.57 instead of 52 - do you think this would make any difference?

The system will boot if I comment out the mali and mali_drm modules. As soon as mali starts up - the system freezes and no further logging occurs.

This is a Hackberry A10 1GB - is that the same device you're using as well?

Appreciate your help.

Scott

amery · 2013-01-25T22:53:45Z

can you try the sunxi-v3.0.57-r1 tag?

sfrappier · 2013-01-26T01:02:30Z

amery - I'll try tonight and let you know...

sfrappier · 2013-01-26T01:12:14Z

Just to confirm - SHA1 = 89a5378 - correct?

sfrappier · 2013-01-26T05:28:20Z

amery - tried commit 89a5378 - booted first off with mali and mali_drm commented out in the strict order that roman mentioned. Booted successfully - dmesg and syslog populated correctly.

Went to modules and enabled mali and mali_drm - did a sudo reboot. System did not recover. Nothing logged to dmesg or syslog - the system was at a hard freeze.

If I try to load the mali driver manually from a started x session - the system also hangs - no further logging occurs.

Is there anything that I can give to you (I don't have a serial debugger yet) that can help you pin-point what's going on with the Hackberry A10? I'm a bit surprised that no one else has posted this, but then again most of the images that I've seen do not have DRI2 support.

Also note - my xorg.conf has DRI2 disabled in all scenarios - I "assumed" that even if it failed to load X, I could at least see logs from X that state it was unable to find a display. If I leave DRI enabled and do not load mali and/or mali_drm, it states there are no compatible screens, which I think is the expected behavior.

I also tried sunxi-3.4.24 - same issue. There is a noticeable increase in performance without mali being loaded (window painting is much faster - in 3.0.xx I can see the paint, while in 3.4.xx it seems to be instant).

I'm more than happy to send you whatever you need - let me know how I can help.

Quick edit - I used roman's config as he stated that it is working on another device - lsmod does show the modules except for the ones I did not enable when booting. I can also verify that mali.ko and mali_drm.ko do exist. The rc.local has the chmod 777 for ump and mali (brute force I know), and there is a depmod -a for the 3.0.57 modules.

Last edit - used config (http://romanrm.ru/dl/a10/desktop/3.0.52-r1-desktop-rm2/config-r1-desktop-rm2) with make ARCH=arm oldconfig - the only two new things the config asked for were SUNXI4 EHCI and OHCI - set both to Y.

romanrm · 2013-01-26T09:10:06Z

Where do you "comment out the mali and mali_drm modules", or "Went to modules and enabled mali and mali_drm"?
/etc/modules? Forget about that one, afaik the order is not guaranteed and it is unknown at which point it is being actually loaded (before or after X tries to start, for example).
Disable startup of your desktop manager (e.g. update-rc.d gdm remove), then in /etc/rc.local:
modprobe lcd
modprobe hdmi
modprobe disp
modprobe mali
modprobe fb
modprobe ump
modprobe mali_drm
/etc/init.d/gdm start
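
A minimal sketch of the same rc.local idea, with a log line written and synced to disk before each modprobe, so that after a hard freeze the last line in the log names the module that was being loaded. The log path, the MODPROBE override and the function name load_in_order are assumptions for illustration, and the sync only helps if the root filesystem is already writable at that point.

```shell
#!/bin/sh
# Load modules one at a time in the order given, logging before each attempt.
# MODPROBE and LOGFILE can be overridden; both defaults are assumptions.
MODPROBE=${MODPROBE:-modprobe}
LOGFILE=${LOGFILE:-/var/log/module-load-order.log}

load_in_order() {
    for mod in "$@"; do
        echo "loading $mod" >> "$LOGFILE"
        sync   # flush the log line to disk before a possible hard freeze
        if ! $MODPROBE "$mod"; then
            echo "failed $mod" >> "$LOGFILE"
            return 1
        fi
    done
    echo "all loaded" >> "$LOGFILE"
}

# Example call using the order listed above:
# load_in_order lcd hdmi disp mali fb ump mali_drm
```

Stopping at the first failure keeps a later module from masking the one that actually went wrong.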

sfrappier · 2013-01-27T21:08:35Z

romanrm:

You are correct - I would put it in /etc/modules. I followed your instructions and still receive the same issue (using 3.0.57-r1). Ubuntu uses lightdm but I did what you suggested and made it so it started in rc.local and also disabled the automatic startup in rc.d. As soon as the modprobe mali occurred - the system froze again.

I even tried to do a load via SSH - basically made it headless and didn't load the modules. I then did each module in succession. Once the mali driver loaded, it froze again.

If I do a modprobe -vn mali - there are no errors. When using modinfo, I did see that mali depended upon fb so I also tested with lcd, hdmi, disp, fb, mali, ump, mali_drm as well.

I also set the kernel.panic in the sysctl.conf to 5 seconds. The interesting thing about it is that even with this set, the system does not reboot after 5 seconds. It just hangs indefinitely.

Do I need to start compiling kernels to see the last version that would load all modules correctly?

sfrappier · 2013-01-27T21:34:29Z

OK - 3.0.36 does load without a hard lockup:
[ 9.430000] mali: use config clk_div 3
[ 9.460000] mali: clk_div 3
[ 9.460000] Mali: mali clock set completed, clock is 320000000 Mhz
[ 9.470000] Mali: Mali device driver loaded

I'm going to try 3.0.42-r4 to see if that works...

sfrappier · 2013-01-27T22:04:17Z

3.0.42+ also loads without issues.

sfrappier · 2013-01-28T00:03:45Z

OK - the break occurs somewhere between 3.0.42-r2 and 3.0.42-r3. I compiled and checked out SHA1 - fdfa87c and it loads correctly. I tried 3.0.42-r3 and I get the hard lockup that I have been describing.

I picked fdfa87c because it looks like Luc Verhaegen did a bunch of cleanup and unifying in the code base. At this revision, is it already using the r3p0 driver, or was that enabled in a later release? I'm asking to see if I can tie the lockup to a specific change...

sfrappier · 2013-01-29T15:12:53Z

Just checking in - let me know how I can help...

techn · 2013-01-29T17:34:20Z

Between r2 and r3 there seems to be mostly disp restructuring, but there are also some mali/sata clock related commits, e.g. 52f50bb, 99fe31a, e630a4b. If I remember correctly, a wrong mali clock caused the same problem for some people. You could try a higher mali clk_div.

sfrappier · 2013-01-29T17:51:03Z

techn - thanks for your reply :)

Is there a way to set the clk_div upon module load - or is this something that I need to re-compile the kernel with/change code? Just checking before I attempt any brute force changes...

techn · 2013-01-29T17:57:40Z

Oh, I forgot to mention that clk_div can be changed from the .fex file, e.g. https://github.com/linux-sunxi/sunxi-boards/blob/master/sys_config/a10/mini-x.fex#L83

sfrappier · 2013-01-29T20:43:34Z

OK - I tried with clk_div of 3, 4, 5, and 6 - the system still has a hard lock upon load of the mali module :(.

Each time I increased the number - the responsiveness of the system definitely was more "sluggish" with graphic operations. Easier to see painting, et cetera...I'm assuming this was expected...correct?

Any other ideas?

techn · 2013-01-30T08:43:12Z

I think next option would be to bisect between r2 and r3. And find which commit exactly causes this problem.
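
In case it helps, bisecting with git could look roughly like the sketch below. The wrapper function start_bisect and the GIT override are only there for illustration; the revisions you pass in are whatever you have confirmed as good and bad (for example fdfa87c was reported as good earlier in this thread).

```shell
#!/bin/sh
# Start a bisect between a known-good and a known-bad revision.
# GIT can be overridden (e.g. GIT=echo) to preview the commands without running them.
GIT=${GIT:-git}

start_bisect() {
    good=$1
    bad=$2
    $GIT bisect start &&
    $GIT bisect bad "$bad" &&
    $GIT bisect good "$good"
}

# After building and booting each revision that git checks out, record the result:
#   git bisect good   # mali loaded without freezing
#   git bisect bad    # hard lockup on modprobe mali
# When git names the first bad commit, finish with:
#   git bisect reset
```

Each step halves the remaining range, so even a few dozen commits between the two releases need only a handful of kernel builds.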

sfrappier · 2013-01-30T20:05:37Z

Sounds good - I'll start compiling and testing and let you know which commit looks to cause the issue. Thanks again for your suggestions.

sfrappier · 2013-01-30T21:00:32Z

OK - found it - the exact commit that causes the hard freeze on the Hackberry when loading the mali module is - cbf3d41 (arm: sun4i: disable USE_PLL6M_REPLACE_PLL4).

It looks like the following change causes the lockup:

cbf3d41b7ed513a743854fde1ca0b5c4d0f11610
 arch/arm/mach-sun4i/include/mach/aw_ccu.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-sun4i/include/mach/aw_ccu.h b/arch/arm/mach-sun4i/include/mach/aw_ccu.h
index 7af508e..98bfe80 100755
--- a/arch/arm/mach-sun4i/include/mach/aw_ccu.h
+++ b/arch/arm/mach-sun4i/include/mach/aw_ccu.h
@@ -29,7 +29,7 @@

 /* define if need use pll6 to take the place of pll4,
    this definition is significative on C ver. only */
-#define USE_PLL6M_REPLACE_PLL4      (1)
+#define USE_PLL6M_REPLACE_PLL4      (0)


 /* define clock error type      */

If I knew how to code C I'd suggest a change, but is it as simple as setting this back to 1 and only doing so for the Hackberry? Or would this cause other issues in newer builds?

sfrappier · 2013-01-30T21:14:59Z

Also - I reverted this change in the 3.4 kernel series - the mali and mali_drm modules loaded and the system did not freeze! :)
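
For reference, the local workaround can be scripted instead of edited by hand. This is only a sketch of flipping the define shown in the diff above, not a proposed fix; the function name set_pll6m_replace is made up for this example, and the header path to pass in is the one from the diff (arch/arm/mach-sun4i/include/mach/aw_ccu.h).

```shell
#!/bin/sh
# Set USE_PLL6M_REPLACE_PLL4 to the given value (0 or 1) in a header file.
# Usage: set_pll6m_replace <path-to-aw_ccu.h> <0|1>
# set_pll6m_replace is a hypothetical helper name for this sketch.
set_pll6m_replace() {
    header=$1
    value=$2
    tmp="$header.tmp.$$"
    # Rewrite only the parenthesised value on the matching #define line.
    sed "s/^\(#define[[:space:]][[:space:]]*USE_PLL6M_REPLACE_PLL4[[:space:]][[:space:]]*\)([01])/\1($value)/" \
        "$header" > "$tmp" &&
    cat "$tmp" > "$header" &&   # write back in place so the file mode is kept
    rm -f "$tmp"
}
```

Running `set_pll6m_replace arch/arm/mach-sun4i/include/mach/aw_ccu.h 1` from the kernel source tree would restore the value from before cbf3d41. Whether that is the right setting for other boards is a separate question.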

sfrappier · 2013-02-13T14:11:57Z

Gentlemen:

Is there anything further that you need from me regarding this topic? I am manually changing the file for the Hackberry each time I download a kernel image after this commit. Everything seems to work, but I'm not sure if this is a "good thing" or if the kernel is not running optimally (or at the correct clock speeds for that matter)...

Just checking in on whether this is considered a bug, or if you need more information from me to make an appropriate patch?

techn · 2013-02-13T14:14:51Z

I propose to close this and re-open #40

sfrappier · 2013-02-18T18:25:26Z

Agreed - I'll leave it up to you guys on how to proceed. Do I need to close this out or will an op take care of it?

Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 [ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15 [ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23 [ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 [ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39 [ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47 [ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 [ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63 [ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 [ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 [ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87 [ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95 [ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103 [ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111 [ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119 [ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: huawei.libin@huawei.com Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>

When running tipcTC&tipcTS test suite, below lockdep unsafe locking scenario is reported: [ 1109.997854] [ 1109.997988] ================================= [ 1109.998290] [ INFO: inconsistent lock state ] [ 1109.998575] 3.17.0-rc1+ #113 Not tainted [ 1109.998762] --------------------------------- [ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 1109.998762] (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] {SOFTIRQ-ON-W} state was registered at: [ 1109.998762] [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80 [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc] [ 1109.998762] [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc] [ 1109.998762] [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc] [ 1109.998762] [<ffffffff817676ee>] SYSC_connect+0xae/0xc0 [ 1109.998762] [<ffffffff81767b7e>] SyS_connect+0xe/0x10 [ 1109.998762] [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200 [ 1109.998762] [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f [ 1109.998762] irq event stamp: 241060 [ 1109.998762] hardirqs last enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0 [ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0 [ 1109.998762] softirqs last enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50 [ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [ 1109.998762] other info that might help us debug this: [ 1109.998762] Possible unsafe locking scenario: [ 1109.998762] [ 1109.998762] CPU0 [ 1109.998762] ---- [ 1109.998762] lock(slock-AF_TIPC); [ 1109.998762] <Interrupt> [ 1109.998762] lock(slock-AF_TIPC); [ 1109.998762] [ 
1109.998762] *** DEADLOCK *** [ 1109.998762] [ 1109.998762] 2 locks held by swapper/7/0: [ 1109.998762] #0: (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] #1: (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [ 1109.998762] stack backtrace: [ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113 [ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 1109.998762] ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007 [ 1109.998762] ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001 [ 1109.998762] ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000 [ 1109.998762] Call Trace: [ 1109.998762] <IRQ> [<ffffffff81a209eb>] dump_stack+0x4e/0x68 [ 1109.998762] [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202 [ 1109.998762] [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50 [ 1109.998762] [<ffffffff810a406c>] mark_lock+0x28c/0x2f0 [ 1109.998762] [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0 [ 1109.998762] [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80 [ 1109.998762] [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10 [ 1109.998762] [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa000dec0>] ? 
tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc] [ 1109.998762] [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc] [ 1109.998762] [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70 [ 1109.998762] [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0 [ 1109.998762] [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70 [ 1109.998762] [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0 [ 1109.998762] [<ffffffff81785518>] napi_gro_receive+0xd8/0x240 [ 1109.998762] [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530 [ 1109.998762] [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffff817842b1>] net_rx_action+0x141/0x310 [ 1109.998762] [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150 [ 1109.998762] [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0 [ 1109.998762] [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [<ffffffff81a30d07>] do_IRQ+0x67/0x110 [ 1109.998762] [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f [ 1109.998762] <EOI> [<ffffffff8100d2b7>] ? default_idle+0x37/0x250 [ 1109.998762] [<ffffffff8100d2b5>] ? default_idle+0x35/0x250 [ 1109.998762] [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20 [ 1109.998762] [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0 [ 1109.998762] [<ffffffff81034c78>] start_secondary+0x188/0x1f0 When intra-node messages are delivered from one process to another process, tipc_link_xmit() doesn't disable BH before it directly calls tipc_sk_rcv() on process context to forward messages to destination socket. 
Meanwhile, if messages delivered by remote node arrive at the node and their destinations are also the same socket, tipc_sk_rcv() running on process context might be preempted by tipc_sk_rcv() running BH context. As a result, the latter cannot obtain the socket lock as the lock was obtained by the former, however, the former has no chance to be run as the latter is owning the CPU now, so headlock happens. To avoid it, BH should be always disabled in tipc_sk_rcv(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>

If instance directories are deleted while there are registered function triggers: # cd /sys/kernel/debug/tracing/instances # mkdir test # echo "schedule:enable_event:sched:sched_switch" > test/set_ftrace_filter # rmdir test Unable to handle kernel paging request for data at address 0x00000008 Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc0000000021edde8 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA pSeries Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm iptable_filter fuse binfmt_misc pseries_rng rng_core vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c multipath virtio_net virtio_blk virtio_pci crc32c_vpmsum virtio_ring virtio CPU: 8 PID: 8694 Comm: rmdir Not tainted 4.11.0-nnr+ #113 task: c0000000bab52800 task.stack: c0000000baba0000 NIP: c0000000021edde8 LR: c0000000021f0590 CTR: c000000002119620 REGS: c0000000baba3870 TRAP: 0300 Not tainted (4.11.0-nnr+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22002422 XER: 20000000 CFAR: 00007fffabb725a8 DAR: 0000000000000008 DSISR: 40000000 SOFTE: 0 GPR00: c00000000220f750 c0000000baba3af0 c000000003157e00 0000000000000000 GPR04: 0000000000000040 00000000000000eb 0000000000000040 0000000000000000 GPR08: 0000000000000000 0000000000000113 0000000000000000 c00000000305db98 GPR12: c000000002119620 c00000000fd42c00 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c0000000bab52e90 0000000000000000 GPR24: 0000000000000000 00000000000000eb 0000000000000040 c0000000baba3bb0 GPR28: c00000009cb06eb0 c0000000bab52800 
c00000009cb06eb0 c0000000baba3bb0 NIP [c0000000021edde8] ring_buffer_lock_reserve+0x8/0x4e0 LR [c0000000021f0590] trace_event_buffer_lock_reserve+0xe0/0x1a0 Call Trace: [c0000000baba3af0] [c0000000021f96c8] trace_event_buffer_commit+0x1b8/0x280 (unreliable) [c0000000baba3b60] [c00000000220f750] trace_event_buffer_reserve+0x80/0xd0 [c0000000baba3b90] [c0000000021196b8] trace_event_raw_event_sched_switch+0x98/0x180 [c0000000baba3c10] [c0000000029d9980] __schedule+0x6e0/0xab0 [c0000000baba3ce0] [c000000002122230] do_task_dead+0x70/0xc0 [c0000000baba3d10] [c0000000020ea9c8] do_exit+0x828/0xd00 [c0000000baba3dd0] [c0000000020eaf70] do_group_exit+0x60/0x100 [c0000000baba3e10] [c0000000020eb034] SyS_exit_group+0x24/0x30 [c0000000baba3e30] [c00000000200bcec] system_call+0x38/0x54 Instruction dump: 60000000 60420000 7d244b78 7f63db78 4bffaa09 393efff8 793e0020 39200000 4bfffecc 60420000 3c4c00f7 3842a020 <81230008> 2f890000 409e02f0 a14d0008 ---[ end trace b917b8985d0e650b ]--- Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc0000000021edde8 Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc0000000021edde8 Faulting instruction address: 0xc0000000021edde8 To address this, let's clear all registered function probes before deleting the ftrace instance. Link: http://lkml.kernel.org/r/c5f1ca624043690bd94642bb6bffd3f2fc504035.1494956770.git.naveen.n.rao@linux.vnet.ibm.com Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

I just happened to see the function header indentation of zfcp_fc_enqueue_event() and I picked some more from checkpatch: $ checkpatch.pl --strict -f drivers/s390/scsi/zfcp_fc.c ... CHECK: Alignment should match open parenthesis #113: FILE: drivers/s390/scsi/zfcp_fc.c:113: + fc_host_post_event(adapter->scsi_host, fc_get_event_number(), + event->code, event->data); CHECK: Blank lines aren't necessary before a close brace '}' #118: FILE: drivers/s390/scsi/zfcp_fc.c:118: + +} ... The change complements v2.6.36 commit 2d1e547 ("[SCSI] zfcp: Post events through FC transport class"). Signed-off-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

When we do the following test, we got oops in ipmi_msghandler driver while((1)) do service ipmievd restart & service ipmievd restart done --------------------------------------------------------------- [ 294.230186] Unable to handle kernel paging request at virtual address 0000803fea6ea008 [ 294.230188] Mem abort info: [ 294.230190] ESR = 0x96000004 [ 294.230191] Exception class = DABT (current EL), IL = 32 bits [ 294.230193] SET = 0, FnV = 0 [ 294.230194] EA = 0, S1PTW = 0 [ 294.230195] Data abort info: [ 294.230196] ISV = 0, ISS = 0x00000004 [ 294.230197] CM = 0, WnR = 0 [ 294.230199] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000a1c1b75a [ 294.230201] [0000803fea6ea008] pgd=0000000000000000 [ 294.230204] Internal error: Oops: 96000004 [#1] SMP [ 294.235211] Modules linked in: nls_utf8 isofs rpcrdma ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce ses sha256_arm64 sha1_ce hibmc_drm hisi_sas_v2_hw enclosure sg hisi_sas_main sbsa_gwdt ip_tables mlx5_ib ib_uverbs marvell ib_core mlx5_core ixgbe ipmi_si mdio hns_dsaf ipmi_devintf ipmi_msghandler hns_enet_drv hns_mdio [ 294.277745] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.0.0-rc2+ linux-sunxi#113 [ 294.285511] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.37 11/21/2017 [ 294.292835] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 294.297695] pc : __srcu_read_lock+0x38/0x58 [ 294.301940] lr : acquire_ipmi_user+0x2c/0x70 [ipmi_msghandler] [ 294.307853] sp : ffff00001001bc80 [ 294.311208] x29: ffff00001001bc80 x28: ffff0000117e5000 [ 294.316594] x27: 0000000000000000 x26: dead000000000100 [ 294.321980] x25: dead000000000200 x24: ffff803f6bd06800 [ 294.327366] x23: 0000000000000000 x22: 0000000000000000 [ 294.332752] x21: ffff00001001bd04 x20: ffff80df33d19018 [ 294.338137] x19: ffff80df33d19018 x18: 0000000000000000 [ 294.343523] x17: 
0000000000000000 x16: 0000000000000000 [ 294.348908] x15: 0000000000000000 x14: 0000000000000002 [ 294.354293] x13: 0000000000000000 x12: 0000000000000000 [ 294.359679] x11: 0000000000000000 x10: 0000000000100000 [ 294.365065] x9 : 0000000000000000 x8 : 0000000000000004 [ 294.370451] x7 : 0000000000000000 x6 : ffff80df34558678 [ 294.375836] x5 : 000000000000000c x4 : 0000000000000000 [ 294.381221] x3 : 0000000000000001 x2 : 0000803fea6ea000 [ 294.386607] x1 : 0000803fea6ea008 x0 : 0000000000000001 [ 294.391994] Process swapper/3 (pid: 0, stack limit = 0x0000000083087293) [ 294.398791] Call trace: [ 294.401266] __srcu_read_lock+0x38/0x58 [ 294.405154] acquire_ipmi_user+0x2c/0x70 [ipmi_msghandler] [ 294.410716] deliver_response+0x80/0xf8 [ipmi_msghandler] [ 294.416189] deliver_local_response+0x28/0x68 [ipmi_msghandler] [ 294.422193] handle_one_recv_msg+0x158/0xcf8 [ipmi_msghandler] [ 294.432050] handle_new_recv_msgs+0xc0/0x210 [ipmi_msghandler] [ 294.441984] smi_recv_tasklet+0x8c/0x158 [ipmi_msghandler] [ 294.451618] tasklet_action_common.isra.5+0x88/0x138 [ 294.460661] tasklet_action+0x2c/0x38 [ 294.468191] __do_softirq+0x120/0x2f8 [ 294.475561] irq_exit+0x134/0x140 [ 294.482445] __handle_domain_irq+0x6c/0xc0 [ 294.489954] gic_handle_irq+0xb8/0x178 [ 294.497037] el1_irq+0xb0/0x140 [ 294.503381] arch_cpu_idle+0x34/0x1a8 [ 294.510096] do_idle+0x1d4/0x290 [ 294.516322] cpu_startup_entry+0x28/0x30 [ 294.523230] secondary_start_kernel+0x184/0x1d0 [ 294.530657] Code: d538d082 d2800023 8b010c81 8b020021 (c85f7c25) [ 294.539746] ---[ end trace 8a7a880dee570b29 ]--- [ 294.547341] Kernel panic - not syncing: Fatal exception in interrupt [ 294.556837] SMP: stopping secondary CPUs [ 294.563996] Kernel Offset: disabled [ 294.570515] CPU features: 0x002,21006008 [ 294.577638] Memory Limit: none [ 294.587178] Starting crashdump kernel... [ 294.594314] Bye! 
Because the user->release_barrier.rda is freed in ipmi_destroy_user(), but the refcount is not zero, when acquire_ipmi_user() uses user->release_barrier.rda in __srcu_read_lock(), it causes oops. Fix this by calling cleanup_srcu_struct() when the refcount is zero. Fixes: e86ee2d ("ipmi: Rework locking and shutdown for hot remove") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>

commit 77f8269 upstream.

When we do the following test, we got oops in ipmi_msghandler driver

while((1))
do
service ipmievd restart & service ipmievd restart
done

---------------------------------------------------------------
[ 294.230186] Unable to handle kernel paging request at virtual address 0000803fea6ea008
[ 294.230188] Mem abort info:
[ 294.230190] ESR = 0x96000004
[ 294.230191] Exception class = DABT (current EL), IL = 32 bits
[ 294.230193] SET = 0, FnV = 0
[ 294.230194] EA = 0, S1PTW = 0
[ 294.230195] Data abort info:
[ 294.230196] ISV = 0, ISS = 0x00000004
[ 294.230197] CM = 0, WnR = 0
[ 294.230199] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000a1c1b75a
[ 294.230201] [0000803fea6ea008] pgd=0000000000000000
[ 294.230204] Internal error: Oops: 96000004 [#1] SMP
[ 294.235211] Modules linked in: nls_utf8 isofs rpcrdma ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce ses sha256_arm64 sha1_ce hibmc_drm hisi_sas_v2_hw enclosure sg hisi_sas_main sbsa_gwdt ip_tables mlx5_ib ib_uverbs marvell ib_core mlx5_core ixgbe ipmi_si mdio hns_dsaf ipmi_devintf ipmi_msghandler hns_enet_drv hns_mdio
[ 294.277745] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.0.0-rc2+ #113
[ 294.285511] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.37 11/21/2017
[ 294.292835] pstate: 80000005 (Nzcv daif -PAN -UAO)
[ 294.297695] pc : __srcu_read_lock+0x38/0x58
[ 294.301940] lr : acquire_ipmi_user+0x2c/0x70 [ipmi_msghandler]
[ 294.307853] sp : ffff00001001bc80
[ 294.311208] x29: ffff00001001bc80 x28: ffff0000117e5000
[ 294.316594] x27: 0000000000000000 x26: dead000000000100
[ 294.321980] x25: dead000000000200 x24: ffff803f6bd06800
[ 294.327366] x23: 0000000000000000 x22: 0000000000000000
[ 294.332752] x21: ffff00001001bd04 x20: ffff80df33d19018
[ 294.338137] x19: ffff80df33d19018 x18: 0000000000000000
[ 294.343523] x17: 0000000000000000 x16: 0000000000000000
[ 294.348908] x15: 0000000000000000 x14: 0000000000000002
[ 294.354293] x13: 0000000000000000 x12: 0000000000000000
[ 294.359679] x11: 0000000000000000 x10: 0000000000100000
[ 294.365065] x9 : 0000000000000000 x8 : 0000000000000004
[ 294.370451] x7 : 0000000000000000 x6 : ffff80df34558678
[ 294.375836] x5 : 000000000000000c x4 : 0000000000000000
[ 294.381221] x3 : 0000000000000001 x2 : 0000803fea6ea000
[ 294.386607] x1 : 0000803fea6ea008 x0 : 0000000000000001
[ 294.391994] Process swapper/3 (pid: 0, stack limit = 0x0000000083087293)
[ 294.398791] Call trace:
[ 294.401266] __srcu_read_lock+0x38/0x58
[ 294.405154] acquire_ipmi_user+0x2c/0x70 [ipmi_msghandler]
[ 294.410716] deliver_response+0x80/0xf8 [ipmi_msghandler]
[ 294.416189] deliver_local_response+0x28/0x68 [ipmi_msghandler]
[ 294.422193] handle_one_recv_msg+0x158/0xcf8 [ipmi_msghandler]
[ 294.432050] handle_new_recv_msgs+0xc0/0x210 [ipmi_msghandler]
[ 294.441984] smi_recv_tasklet+0x8c/0x158 [ipmi_msghandler]
[ 294.451618] tasklet_action_common.isra.5+0x88/0x138
[ 294.460661] tasklet_action+0x2c/0x38
[ 294.468191] __do_softirq+0x120/0x2f8
[ 294.475561] irq_exit+0x134/0x140
[ 294.482445] __handle_domain_irq+0x6c/0xc0
[ 294.489954] gic_handle_irq+0xb8/0x178
[ 294.497037] el1_irq+0xb0/0x140
[ 294.503381] arch_cpu_idle+0x34/0x1a8
[ 294.510096] do_idle+0x1d4/0x290
[ 294.516322] cpu_startup_entry+0x28/0x30
[ 294.523230] secondary_start_kernel+0x184/0x1d0
[ 294.530657] Code: d538d082 d2800023 8b010c81 8b020021 (c85f7c25)
[ 294.539746] ---[ end trace 8a7a880dee570b29 ]---
[ 294.547341] Kernel panic - not syncing: Fatal exception in interrupt
[ 294.556837] SMP: stopping secondary CPUs
[ 294.563996] Kernel Offset: disabled
[ 294.570515] CPU features: 0x002,21006008
[ 294.577638] Memory Limit: none
[ 294.587178] Starting crashdump kernel...
[ 294.594314] Bye!

Because the user->release_barrier.rda is freed in ipmi_destroy_user(), but the refcount is not zero, when acquire_ipmi_user() uses user->release_barrier.rda in __srcu_read_lock(), it causes oops. Fix this by calling cleanup_srcu_struct() when the refcount is zero.

Fixes: e86ee2d ("ipmi: Rework locking and shutdown for hot remove")
Cc: stable@vger.kernel.org # 4.18
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
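The ordering problem described in this commit message can be illustrated with a small userspace model. This is only a sketch in Python, not the kernel code: the class names `SrcuBarrier` and `User`, the `cleanup_on_destroy` switch, and the helper `deliver_after_destroy()` are all hypothetical stand-ins for the SRCU state, the IPMI user object, and the message delivery path named above.

```python
class SrcuBarrier:
    """Hypothetical stand-in for the per-user SRCU state (release_barrier)."""

    def __init__(self):
        self.alive = True

    def read_lock(self):
        # Models __srcu_read_lock() touching per-CPU data that may be gone.
        if not self.alive:
            raise RuntimeError("SRCU state used after cleanup")
        return 0

    def cleanup(self):
        # Models cleanup_srcu_struct(); idempotent in this sketch.
        self.alive = False


class User:
    """Hypothetical stand-in for a reference-counted IPMI user."""

    def __init__(self, cleanup_on_destroy):
        self.refcount = 1
        self.barrier = SrcuBarrier()
        self.cleanup_on_destroy = cleanup_on_destroy

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            # Placement described by the fix: last reference is gone.
            self.barrier.cleanup()

    def destroy(self):
        if self.cleanup_on_destroy:
            # Placement described as the bug: other holders may remain.
            self.barrier.cleanup()
        self.put()


def deliver_after_destroy(cleanup_on_destroy):
    """Return True if a late message delivery can still take the read lock."""
    user = User(cleanup_on_destroy)
    user.get()       # an in-flight message holds a reference
    user.destroy()   # userspace closes the device meanwhile
    try:
        user.barrier.read_lock()   # analogue of acquire_ipmi_user()
        ok = True
    except RuntimeError:
        ok = False
    user.put()       # message path drops its reference
    return ok
```

With `cleanup_on_destroy=True` the late delivery fails, which corresponds to the oops in the trace; with `cleanup_on_destroy=False` the SRCU state survives until the final `put()`.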

…supported

commit f3e0269 upstream.

This patch fixes a warning seen when BLK-MQ is enabled and the hardware does not support MQ. This results in the driver requesting MSI-X vectors that are equal to or fewer than pre_desc via the PCI IRQ affinity infrastructure.

[ 19.746300] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.00.00.12-k.
[ 19.746599] qla2xxx [0000:02:00.0]-001d: : Found an ISP2432 irq 18 iobase 0x(____ptrval____).
[ 20.203186] ------------[ cut here ]------------
[ 20.203306] WARNING: CPU: 8 PID: 268 at drivers/pci/msi.c:1273 pci_irq_get_affinity+0xf4/0x120
[ 20.203481] Modules linked in: tg3 ptp qla2xxx(+) pps_core sg libphy scsi_transport_fc flash loop autofs4
[ 20.203700] CPU: 8 PID: 268 Comm: systemd-udevd Not tainted 5.0.0-rc5-00358-gdf3865f #113
[ 20.203830] Call Trace:
[ 20.203933] [0000000000461bb0] __warn+0xb0/0xe0
[ 20.204090] [00000000006c8f34] pci_irq_get_affinity+0xf4/0x120
[ 20.204219] [000000000068c764] blk_mq_pci_map_queues+0x24/0x120
[ 20.204396] [00000000007162f4] scsi_map_queues+0x14/0x40
[ 20.204626] [0000000000673654] blk_mq_update_queue_map+0x94/0xe0
[ 20.204698] [0000000000676ce0] blk_mq_alloc_tag_set+0x120/0x300
[ 20.204869] [000000000071077c] scsi_add_host_with_dma+0x7c/0x300
[ 20.205419] [00000000100ead54] qla2x00_probe_one+0x19d4/0x2640 [qla2xxx]
[ 20.205621] [00000000006b3c88] pci_device_probe+0xc8/0x160
[ 20.205697] [0000000000701c0c] really_probe+0x1ac/0x2e0
[ 20.205770] [0000000000701f90] driver_probe_device+0x50/0x100
[ 20.205843] [0000000000702134] __driver_attach+0xf4/0x120
[ 20.205913] [0000000000700644] bus_for_each_dev+0x44/0x80
[ 20.206081] [0000000000700c98] bus_add_driver+0x198/0x220
[ 20.206300] [0000000000702950] driver_register+0x70/0x120
[ 20.206582] [0000000010248224] qla2x00_module_init+0x224/0x284 [qla2xxx]
[ 20.206857] ---[ end trace b1de7a3f79fab2c2 ]---

The fix is to check whether the hardware has Multi Queue capability and, if it does not, to call pci_alloc_irq_vectors() instead of pci_alloc_irq_vectors_affinity().

Fixes: f664a3c ("scsi: kill off the legacy IO path")
Cc: stable@vger.kernel.org #4.19
Signed-off-by: Giridhar Malavali <gmalavali@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kvm->arch.arm_pmu is set when userspace attempts to set the first PMU attribute. As certain attributes are mandatory, arm_pmu ends up always being set to a valid arm_pmu, otherwise KVM will refuse to run the VCPU. However, this only happens if the VCPU has the PMU feature. If the VCPU doesn't have the feature bit set, kvm->arch.arm_pmu will be left uninitialized and equal to NULL.

KVM doesn't do ID register emulation for 32-bit guests and accesses to the PMU registers aren't gated by the pmu_visibility() function. This is done to prevent injecting unexpected undefined exceptions in guests which have detected the presence of a hardware PMU. But even though the VCPU feature is missing, KVM still attempts to emulate certain aspects of the PMU when PMU registers are accessed. This leads to a NULL pointer dereference like this one, which happens on an odroid-c4 board when running the kvm-unit-tests pmu-cycle-counter test with kvmtool and without the PMU feature being set:

[ 454.402699] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000150
[ 454.405865] Mem abort info:
[ 454.408596] ESR = 0x96000004
[ 454.411638] EC = 0x25: DABT (current EL), IL = 32 bits
[ 454.416901] SET = 0, FnV = 0
[ 454.419909] EA = 0, S1PTW = 0
[ 454.423010] FSC = 0x04: level 0 translation fault
[ 454.427841] Data abort info:
[ 454.430687] ISV = 0, ISS = 0x00000004
[ 454.434484] CM = 0, WnR = 0
[ 454.437404] user pgtable: 4k pages, 48-bit VAs, pgdp=000000000c924000
[ 454.443800] [0000000000000150] pgd=0000000000000000, p4d=0000000000000000
[ 454.450528] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 454.456036] Modules linked in:
[ 454.459053] CPU: 1 PID: 267 Comm: kvm-vcpu-0 Not tainted 5.18.0-rc4 #113
[ 454.465697] Hardware name: Hardkernel ODROID-C4 (DT)
[ 454.470612] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 454.477512] pc : kvm_pmu_event_mask.isra.0+0x14/0x74
[ 454.482427] lr : kvm_pmu_set_counter_event_type+0x2c/0x80
[ 454.487775] sp : ffff80000a9839c0
[ 454.491050] x29: ffff80000a9839c0 x28: ffff000000a83a00 x27: 0000000000000000
[ 454.498127] x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000a510000
[ 454.505198] x23: ffff000000a83a00 x22: ffff000003b01000 x21: 0000000000000000
[ 454.512271] x20: 000000000000001f x19: 00000000000003ff x18: 0000000000000000
[ 454.519343] x17: 000000008003fe98 x16: 0000000000000000 x15: 0000000000000000
[ 454.526416] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 454.533489] x11: 000000008003fdbc x10: 0000000000009d20 x9 : 000000000000001b
[ 454.540561] x8 : 0000000000000000 x7 : 0000000000000d00 x6 : 0000000000009d00
[ 454.547633] x5 : 0000000000000037 x4 : 0000000000009d00 x3 : 0d09000000000000
[ 454.554705] x2 : 000000000000001f x1 : 0000000000000000 x0 : 0000000000000000
[ 454.561779] Call trace:
[ 454.564191] kvm_pmu_event_mask.isra.0+0x14/0x74
[ 454.568764] kvm_pmu_set_counter_event_type+0x2c/0x80
[ 454.573766] access_pmu_evtyper+0x128/0x170
[ 454.577905] perform_access+0x34/0x80
[ 454.581527] kvm_handle_cp_32+0x13c/0x160
[ 454.585495] kvm_handle_cp15_32+0x1c/0x30
[ 454.589462] handle_exit+0x70/0x180
[ 454.592912] kvm_arch_vcpu_ioctl_run+0x1c4/0x5e0
[ 454.597485] kvm_vcpu_ioctl+0x23c/0x940
[ 454.601280] __arm64_sys_ioctl+0xa8/0xf0
[ 454.605160] invoke_syscall+0x48/0x114
[ 454.608869] el0_svc_common.constprop.0+0xd4/0xfc
[ 454.613527] do_el0_svc+0x28/0x90
[ 454.616803] el0_svc+0x34/0xb0
[ 454.619822] el0t_64_sync_handler+0xa4/0x130
[ 454.624049] el0t_64_sync+0x18c/0x190
[ 454.627675] Code: a9be7bfd 910003fd f9000bf3 52807ff3 (b9415001)
[ 454.633714] ---[ end trace 0000000000000000 ]---

In this particular case, Linux hasn't detected the presence of a hardware PMU because the PMU node is missing from the DTB, so userspace would have been unable to set the VCPU PMU feature even if it attempted it. What happens is that the 32-bit guest reads ID_DFR0, which advertises the presence of the PMU, and when it tries to program a counter, it triggers the NULL pointer dereference because kvm->arch.arm_pmu is NULL.

kvm->arch.arm_pmu was introduced by commit 46b1878 ("KVM: arm64: Keep a per-VM pointer to the default PMU"). Until that commit, this error would be triggered instead:

[ 73.388140] ------------[ cut here ]------------
[ 73.388189] Unknown PMU version 0
[ 73.390420] WARNING: CPU: 1 PID: 264 at arch/arm64/kvm/pmu-emul.c:36 kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.399821] Modules linked in:
[ 73.402835] CPU: 1 PID: 264 Comm: kvm-vcpu-0 Not tainted 5.17.0 #114
[ 73.409132] Hardware name: Hardkernel ODROID-C4 (DT)
[ 73.414048] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 73.420948] pc : kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.425863] lr : kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.430779] sp : ffff80000a8db9b0
[ 73.434055] x29: ffff80000a8db9b0 x28: ffff000000dbaac0 x27: 0000000000000000
[ 73.441131] x26: ffff000000dbaac0 x25: 00000000c600000d x24: 0000000000180720
[ 73.448203] x23: ffff800009ffbe10 x22: ffff00000b612000 x21: 0000000000000000
[ 73.455276] x20: 000000000000001f x19: 0000000000000000 x18: ffffffffffffffff
[ 73.462348] x17: 000000008003fe98 x16: 0000000000000000 x15: 0720072007200720
[ 73.469420] x14: 0720072007200720 x13: ffff800009d32488 x12: 00000000000004e6
[ 73.476493] x11: 00000000000001a2 x10: ffff800009d32488 x9 : ffff800009d32488
[ 73.483565] x8 : 00000000ffffefff x7 : ffff800009d8a488 x6 : ffff800009d8a488
[ 73.490638] x5 : ffff0000f461a9d8 x4 : 0000000000000000 x3 : 0000000000000001
[ 73.497710] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000000dbaac0
[ 73.504784] Call trace:
[ 73.507195] kvm_pmu_event_mask.isra.0+0x6c/0x74
[ 73.511768] kvm_pmu_set_counter_event_type+0x2c/0x80
[ 73.516770] access_pmu_evtyper+0x128/0x16c
[ 73.520910] perform_access+0x34/0x80
[ 73.524532] kvm_handle_cp_32+0x13c/0x160
[ 73.528500] kvm_handle_cp15_32+0x1c/0x30
[ 73.532467] handle_exit+0x70/0x180
[ 73.535917] kvm_arch_vcpu_ioctl_run+0x20c/0x6e0
[ 73.540489] kvm_vcpu_ioctl+0x2b8/0x9e0
[ 73.544283] __arm64_sys_ioctl+0xa8/0xf0
[ 73.548165] invoke_syscall+0x48/0x114
[ 73.551874] el0_svc_common.constprop.0+0xd4/0xfc
[ 73.556531] do_el0_svc+0x28/0x90
[ 73.559808] el0_svc+0x28/0x80
[ 73.562826] el0t_64_sync_handler+0xa4/0x130
[ 73.567054] el0t_64_sync+0x1a0/0x1a4
[ 73.570676] ---[ end trace 0000000000000000 ]---
[ 73.575382] kvm: pmu event creation failed -2

The root cause remains the same: kvm->arch.pmuver was never set to something sensible because the VCPU feature itself was never set.

The odroid-c4 is somewhat of a special case, because Linux doesn't probe the PMU. But the above errors can easily be reproduced on any hardware, with or without a PMU driver, as long as userspace doesn't set the PMU feature.

Work around the fact that KVM advertises a PMU even when the VCPU feature is not set by gating all PMU emulation on the feature. The guest can still access the registers without KVM injecting an undefined exception.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425145530.723858-1-alexandru.elisei@arm.com
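The gating described in this commit message can be modelled in a few lines. This is a rough Python sketch, not the KVM code: `Pmu`, `Vm`, `Vcpu`, the `gated` flag, and the mask value are hypothetical names and numbers chosen for illustration, and Python's `AttributeError` on `None` stands in for the kernel NULL pointer dereference.

```python
class Pmu:
    """Hypothetical stand-in for the host PMU description."""

    def __init__(self, event_mask):
        self.event_mask = event_mask


class Vm:
    """Hypothetical stand-in for per-VM state (kvm->arch in the text)."""

    def __init__(self):
        # Stays None unless userspace configures a PMU attribute,
        # which per the commit text only happens with the VCPU feature set.
        self.arm_pmu = None


class Vcpu:
    def __init__(self, vm, has_pmu_feature):
        self.vm = vm
        self.has_pmu_feature = has_pmu_feature


def event_mask(vm):
    # Raises AttributeError when arm_pmu is None: the model's "NULL deref".
    return vm.arm_pmu.event_mask


def set_counter_event_type(vcpu, data, gated=True):
    """Model of a PMU register write handler.

    With gated=True the emulation is skipped entirely when the VCPU lacks
    the PMU feature, so the guest access becomes a harmless no-op.
    """
    if gated and not vcpu.has_pmu_feature:
        return None
    return data & event_mask(vcpu.vm)


# Usage: a guest without the PMU feature pokes an event type register.
vm = Vm()
guest = Vcpu(vm, has_pmu_feature=False)
result = set_counter_event_type(guest, 0x1F)  # gated: nothing is emulated
```

The register access still "succeeds" from the guest's point of view, matching the statement that no undefined exception is injected; only the emulation behind it is skipped.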

sfrappier commented Jan 25, 2013

sfrappier commented Jan 25, 2013

romanrm commented Jan 25, 2013

sfrappier commented Jan 25, 2013

romanrm commented Jan 25, 2013

sfrappier commented Jan 25, 2013

sfrappier commented Jan 25, 2013

sfrappier commented Jan 25, 2013

amery commented Jan 25, 2013

sfrappier commented Jan 26, 2013

sfrappier commented Jan 26, 2013

sfrappier commented Jan 26, 2013

romanrm commented Jan 26, 2013

sfrappier commented Jan 27, 2013

sfrappier commented Jan 27, 2013

sfrappier commented Jan 27, 2013

sfrappier commented Jan 28, 2013

sfrappier commented Jan 29, 2013

techn commented Jan 29, 2013

sfrappier commented Jan 29, 2013

techn commented Jan 29, 2013

sfrappier commented Jan 29, 2013

techn commented Jan 30, 2013

sfrappier commented Jan 30, 2013

sfrappier commented Jan 30, 2013

sfrappier commented Jan 30, 2013

sfrappier commented Feb 13, 2013

techn commented Feb 13, 2013

sfrappier commented Feb 18, 2013
