Goodix GT801NI_3R15_1AV ("2 in 1"?) touchscreen doesn't work #26

turl · 2012-05-23T19:46:24Z

Log on stock firmware, where it works fine:

<3>[    7.934312] init: untracked pid 85 exited
<7>[    7.950463] goodix_touch_3F goodix_ts_init @20120306 m_inet_ctpState return false,continue deteck 
<7>[    7.970474] ctp_fetch_sysconfig_para. 
<7>[    7.970474] ctp_fetch_sysconfig_para: after: ctp_twi_addr is 0x55, dirty_addr_buf: 0x55. dirty_addr_buf[1]: 0xfffe 
<7>[    8.000466] ctp_fetch_sysconfig_para: ctp_twi_id is 2. 
<6>[    8.000466] ctp_fetch_sysconfig_para: screen_max_x = 1024. 
<6>[    8.020473] ctp_fetch_sysconfig_para: screen_max_y = 768. 
<6>[    8.020473] ctp_fetch_sysconfig_para: revert_x_flag = 0. 
<6>[    8.040477] ctp_fetch_sysconfig_para: revert_y_flag = 0. 
<6>[    8.040477] ctp_fetch_sysconfig_para: exchange_x_y_flag = 1. 
<6>[    8.070460] goodix_ts_init release by jicky 2012.02.06 
<7>[    8.070460] [jicky]===========================goodix_ts_init=====================
<7>[    8.090467] goodix_ts_init: after fetch_sysconfig_para:  normal_i2c: 0x55. normal_i2c[1]: 0xfffe 
<6>[    8.090467] ctp_detect: Detected chip Goodix-TS at adapter 2, address 0x55
<7>[    8.120483] ============= GT801 2+1 probe start ==============
<6>[    8.430491] Begin goodix i2c test
<6>[    8.430491] ===== goodix_touch_3F i2c test ok=======
<6>[    8.460492] Goodix-TS 2-0055: gt801 2pulus1 init panel!
<6>[    8.480494] Goodix-TS 2-0055: Touchscreen works in IIC wake up green mode!
<6>[    8.530503] input: Goodix-TS-board-3 as /devices/virtual/input/input2
<7>[    8.530503] Goodix TouchScreen Version:GT801NI_3R15_1AV
<6>[    8.544167] Goodix-TS 2-0055: Create proc entry success!
<6>[    8.544167] Goodix-TS 2-0055: Goodix debug sysfs create success!
<6>[    8.555667] Goodix-TS 2-0055: Start Goodix-TS-board-3 in interrupt mode
<6>[    8.562387] Goodix-TS 2-0055: Driver Modify Date:2011-06-27
<7>[    8.562387] ============= GT8012+1 probe finished ==============
<6>[    8.574098] init: command 'insmod' r=0
<7>[    8.590510] ssd253x-ts_86F rocky@20120308m_inet_ctpState return true,just return 
<6>[    8.590510] init: command 'insmod' r=-1
<7>[    8.610505] sun4i-ts.c: sun4i_ts_init: start ...
<7>[    8.610505] rtp_used == 0. 
<6>[    8.621778] init: command 'insmod' r=-1
<6>[    8.621778] init: command 'insmod' r=-1

Some info about the module on stock rom:

$ strings goodix_touch_3F.ko |grep =
license=GPL
description=Goodix Touchscreen Driver
depends=ft5x_ts
[...]

Log of "goodix_touch.ko" from source kernel

<4>[    3.220000] ===========================goodix_ts_init=====================
<4>[    3.220000] ctp_fetch_sysconfig_para. 
<4>[    3.230000] ctp_fetch_sysconfig_para: after: ctp_twi_addr is 0x55, dirty_addr_buf: 0x55. dirty_addr_buf[1]: 0xfffe 
<4>[    3.240000] ctp_fetch_sysconfig_para: ctp_twi_id is 2. 
<6>[    3.240000] ctp_fetch_sysconfig_para: screen_max_x = 1024. 
<6>[    3.250000] ctp_fetch_sysconfig_para: screen_max_y = 768. 
<6>[    3.250000] ctp_fetch_sysconfig_para: revert_x_flag = 0. 
<6>[    3.260000] ctp_fetch_sysconfig_para: revert_y_flag = 0. 
<6>[    3.260000] ctp_fetch_sysconfig_para: exchange_x_y_flag = 1. 
<4>[    3.270000] goodix_ts_init: after fetch_sysconfig_para:  normal_i2c: 0x55. normal_i2c[1]: 0xfffe 
<4>[    3.280000] script parser fetch err. 
<4>[    3.280000] ctp_init_platform_resource: tp_reset request gpio fail!
<4>[    3.290000] ctp_reset. 
<4>[    3.290000] ctp_wakeup. 
<6>[    3.490000] ctp_detect: Detected chip Goodix-TS at adapter 2, address 0x55
<6>[    3.500000] ===============================GT801 Probe===========================
<6>[    3.510000] Begin goodix i2c test
<4>[    3.510000] incomplete xfer (0x20)
<4>[    3.530000] incomplete xfer (0x20)
<4>[    3.550000] incomplete xfer (0x20)
<4>[    3.570000] incomplete xfer (0x20)
<4>[    3.590000] incomplete xfer (0x20)
<6>[    3.610000] Warnning: I2C connection might be something wrong!

The text was updated successfully, but these errors were encountered:

amery · 2012-05-23T20:50:41Z

tried contacting this Jicky http://forum.linuxmce.org/index.php?action=profile;u=59920 ? it's like the best match Google gives

turl · 2012-05-23T20:52:33Z

No, I haven't. I did however send an email to sales@goodix.com earlier today, no answer yet.

turl · 2012-05-26T06:21:19Z

After an ugly hack on the kernel (allwinner-dev-team@80752bd), disabling CONFIG_MODVERSIONS and editing the vermagic on the stock ROM module (goodix_touch_3F.ko), the binary module loads correctly, detects the touchscreen and works. This pretty much confirms it's not a bug on the i2c stack.

frantisheq · 2012-05-26T13:05:12Z

turl can you write some howto and upload working goodix_touch_3F.ko? i'd like to try it on gemei g9 tablet. thank you.

turl · 2012-05-26T19:16:03Z

What I did is, basically:

Include the patch linked previously on your kernel and enable the new config option
Disable MODVERSIONS (see https://github.com/allwinner-dev-team/linux-allwinner/blob/allwinner-3.0/arch/arm/configs/zatab_defconfig#L167). I also disabled MODULE_SRCVERSION_ALL, not sure if it's needed.
Make sure LOCALVERSION is empty and LOCALVERSION_AUTO is unset (see https://github.com/allwinner-dev-team/linux-allwinner/blob/allwinner-3.0/arch/arm/configs/zatab_defconfig#L37)
Build and get the kernel on your device
insmod the modified module .ko, you can get it from here https://github.com/allwinner-dev-team/android_device_allwinner_zatab/blob/ics/goodix_touch_3F.ko
Check dmesg, getevent

reg_timeout_work() calls restore_regulatory_settings() which takes cfg80211_mutex. reg_set_request_processed() already holds cfg80211_mutex before calling cancel_delayed_work_sync(reg_timeout), so it might deadlock. Call the async cancel_delayed_work instead, in order to avoid the potential deadlock. This is the relevant lockdep warning: cfg80211: Calling CRDA for country: XX ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.0-rc5-wl+ #26 Not tainted ------------------------------------------------------- kworker/0:2/1391 is trying to acquire lock: (cfg80211_mutex){+.+.+.}, at: [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211] but task is already holding lock: ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((reg_timeout).work){+.+...}: [<c008fd44>] validate_chain+0xb94/0x10f0 [<c0090b68>] __lock_acquire+0x8c8/0x9b0 [<c0090d40>] lock_acquire+0xf0/0x114 [<c005b600>] wait_on_work+0x4c/0x154 [<c005c000>] __cancel_work_timer+0xd4/0x11c [<c005c064>] cancel_delayed_work_sync+0x1c/0x20 [<bf28b274>] reg_set_request_processed+0x50/0x78 [cfg80211] [<bf28bd84>] set_regdom+0x550/0x600 [cfg80211] [<bf294cd8>] nl80211_set_reg+0x218/0x258 [cfg80211] [<c03c7738>] genl_rcv_msg+0x1a8/0x1e8 [<c03c6a00>] netlink_rcv_skb+0x5c/0xc0 [<c03c7584>] genl_rcv+0x28/0x34 [<c03c6720>] netlink_unicast+0x15c/0x228 [<c03c6c7c>] netlink_sendmsg+0x218/0x298 [<c03933c8>] sock_sendmsg+0xa4/0xc0 [<c039406c>] __sys_sendmsg+0x1e4/0x268 [<c0394228>] sys_sendmsg+0x4c/0x70 [<c0013840>] ret_fast_syscall+0x0/0x3c -> #1 (reg_mutex){+.+.+.}: [<c008fd44>] validate_chain+0xb94/0x10f0 [<c0090b68>] __lock_acquire+0x8c8/0x9b0 [<c0090d40>] lock_acquire+0xf0/0x114 [<c04734dc>] mutex_lock_nested+0x48/0x320 [<bf28b2cc>] reg_todo+0x30/0x538 [cfg80211] [<c0059f44>] process_one_work+0x2a0/0x480 [<c005a4b4>] worker_thread+0x1bc/0x2bc [<c0061148>] kthread+0x98/0xa4 [<c0014af4>] kernel_thread_exit+0x0/0x8 -> #0 (cfg80211_mutex){+.+.+.}: [<c008ed58>] print_circular_bug+0x68/0x2cc [<c008fb28>] validate_chain+0x978/0x10f0 [<c0090b68>] __lock_acquire+0x8c8/0x9b0 [<c0090d40>] lock_acquire+0xf0/0x114 [<c04734dc>] mutex_lock_nested+0x48/0x320 [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211] [<bf28b200>] reg_timeout_work+0x1c/0x20 [cfg80211] [<c0059f44>] process_one_work+0x2a0/0x480 [<c005a4b4>] worker_thread+0x1bc/0x2bc [<c0061148>] kthread+0x98/0xa4 [<c0014af4>] kernel_thread_exit+0x0/0x8 other info that might help us debug this: Chain exists of: cfg80211_mutex --> reg_mutex --> (reg_timeout).work Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((reg_timeout).work); lock(reg_mutex); lock((reg_timeout).work); lock(cfg80211_mutex); *** DEADLOCK *** 2 locks held by kworker/0:2/1391: #0: (events){.+.+.+}, at: [<c0059e94>] process_one_work+0x1f0/0x480 #1: ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480 stack backtrace: [<c001b928>] (unwind_backtrace+0x0/0x12c) from [<c0471d3c>] (dump_stack+0x20/0x24) [<c0471d3c>] (dump_stack+0x20/0x24) from [<c008ef70>] (print_circular_bug+0x280/0x2cc) [<c008ef70>] (print_circular_bug+0x280/0x2cc) from [<c008fb28>] (validate_chain+0x978/0x10f0) [<c008fb28>] (validate_chain+0x978/0x10f0) from [<c0090b68>] (__lock_acquire+0x8c8/0x9b0) [<c0090b68>] (__lock_acquire+0x8c8/0x9b0) from [<c0090d40>] (lock_acquire+0xf0/0x114) [<c0090d40>] (lock_acquire+0xf0/0x114) from [<c04734dc>] (mutex_lock_nested+0x48/0x320) [<c04734dc>] (mutex_lock_nested+0x48/0x320) from [<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211]) [<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211]) from [<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211]) [<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211]) from [<c0059f44>] (process_one_work+0x2a0/0x480) [<c0059f44>] (process_one_work+0x2a0/0x480) from [<c005a4b4>] (worker_thread+0x1bc/0x2bc) [<c005a4b4>] (worker_thread+0x1bc/0x2bc) from [<c0061148>] (kthread+0x98/0xa4) [<c0061148>] (kthread+0x98/0xa4) from [<c0014af4>] (kernel_thread_exit+0x0/0x8) cfg80211: Calling CRDA to update world regulatory domain cfg80211: World regulatory domain updated: cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm) cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm) cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) Cc: stable@kernel.org Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

simonckenyon · 2012-07-11T11:07:10Z

where is the above mentioned patch? i get a 404

amery · 2012-07-11T16:59:52Z

@simonckenyon: @turl's tree is rebased, so the hash changes every time. For the sake of the ticket I've pasted it into a gist. ~~https://gist.github.com/3091730~~ https://gist.github.com/3093889

edit: correct patch

simonckenyon · 2012-07-11T19:30:03Z

On 07/11/12 17:59, Alejandro Mery wrote:

@simonckenyon: @turl's tree is rebased, so the hash changes every time. For the sake of the ticket I've pasted it into a gist. https://gist.github.com/3091730

Reply to this email directly or view it on GitHub:
https://github.com/amery/linux-allwinner/issues/26#issuecomment-6912717

thank you.
unfortunately it doesn't work for me.
says there is an undefined method/call/whatever

this is so frustrating. all i want to do is compile a driver and i'm
having to go though this mess to achieve it.

thanks for all your help

simon

amery · 2012-07-11T19:31:45Z

that patch should work fine in the 3.0-v2 branch. what are you building?

turl · 2012-07-11T19:34:16Z

looks like @amery pasted the wrong patch, this is probably the one you're looking for https://github.com/allwinner-dev-team/linux-allwinner/commit/8e1968f86e15e442a9ae5944535ac8000356d2e1.patch

amery · 2012-07-11T19:35:37Z

@turl: doh :< ... yes, with the conversation in IRC I got stuck with the wifi driver

simonckenyon · 2012-07-11T20:37:19Z

I have been building that version.
My mail problem is that I bought a tablet from a vendor who will not provide any help. Not even an image.
So I am having to guess at what hardware is in it.
I eventally found an image that supports all the hardware and I have been using that as my backup.

My goal is to get cm9 with a self compiled kernel that I can then use for my project. That requires me to build a device driver for a digital persona fingerprint reader.

I am going to ask my business partners if I can put up a bounty. We have not gt funding yet, so money is very tight. But right now I have spent weeks on this and I can I'll afford to continue doing that.

Sorry for the rant but it has been a tough day.

Simon

Alejandro Mery reply@reply.github.com wrote:

that patch should work fine in the 3.0-v2 branch. what are you building?

Reply to this email directly or view it on GitHub:
https://github.com/amery/linux-allwinner/issues/26#issuecomment-6916978

simonckenyon · 2012-07-11T20:37:55Z

He did but I found the correct one. Thanks.

turl reply@reply.github.com wrote:

looks like @amery pasted the wrong patch, this is probably the one you're looking for https://github.com/allwinner-dev-team/linux-allwinner/commit/8e1968f86e15e442a9ae5944535ac8000356d2e1.patch

Reply to this email directly or view it on GitHub:
https://github.com/amery/linux-allwinner/issues/26#issuecomment-6917061

commit fe20b39 upstream. reg_timeout_work() calls restore_regulatory_settings() which takes cfg80211_mutex. reg_set_request_processed() already holds cfg80211_mutex before calling cancel_delayed_work_sync(reg_timeout), so it might deadlock. Call the async cancel_delayed_work instead, in order to avoid the potential deadlock. This is the relevant lockdep warning: cfg80211: Calling CRDA for country: XX ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.0-rc5-wl+ #26 Not tainted ------------------------------------------------------- kworker/0:2/1391 is trying to acquire lock: (cfg80211_mutex){+.+.+.}, at: [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211] but task is already holding lock: ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((reg_timeout).work){+.+...}: [<c008fd44>] validate_chain+0xb94/0x10f0 [<c0090b68>] __lock_acquire+0x8c8/0x9b0 [<c0090d40>] lock_acquire+0xf0/0x114 [<c005b600>] wait_on_work+0x4c/0x154 [<c005c000>] __cancel_work_timer+0xd4/0x11c [<c005c064>] cancel_delayed_work_sync+0x1c/0x20 [<bf28b274>] reg_set_request_processed+0x50/0x78 [cfg80211] [<bf28bd84>] set_regdom+0x550/0x600 [cfg80211] [<bf294cd8>] nl80211_set_reg+0x218/0x258 [cfg80211] [<c03c7738>] genl_rcv_msg+0x1a8/0x1e8 [<c03c6a00>] netlink_rcv_skb+0x5c/0xc0 [<c03c7584>] genl_rcv+0x28/0x34 [<c03c6720>] netlink_unicast+0x15c/0x228 [<c03c6c7c>] netlink_sendmsg+0x218/0x298 [<c03933c8>] sock_sendmsg+0xa4/0xc0 [<c039406c>] __sys_sendmsg+0x1e4/0x268 [<c0394228>] sys_sendmsg+0x4c/0x70 [<c0013840>] ret_fast_syscall+0x0/0x3c -> #1 (reg_mutex){+.+.+.}: [<c008fd44>] validate_chain+0xb94/0x10f0 [<c0090b68>] __lock_acquire+0x8c8/0x9b0 [<c0090d40>] lock_acquire+0xf0/0x114 [<c04734dc>] mutex_lock_nested+0x48/0x320 [<bf28b2cc>] reg_todo+0x30/0x538 [cfg80211] [<c0059f44>] process_one_work+0x2a0/0x480 [<c005a4b4>] worker_thread+0x1bc/0x2bc [<c0061148>] kthread+0x98/0xa4 [<c0014af4>] kernel_thread_exit+0x0/0x8 -> #0 (cfg80211_mutex){+.+.+.}: [<c008ed58>] print_circular_bug+0x68/0x2cc [<c008fb28>] validate_chain+0x978/0x10f0 [<c0090b68>] __lock_acquire+0x8c8/0x9b0 [<c0090d40>] lock_acquire+0xf0/0x114 [<c04734dc>] mutex_lock_nested+0x48/0x320 [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211] [<bf28b200>] reg_timeout_work+0x1c/0x20 [cfg80211] [<c0059f44>] process_one_work+0x2a0/0x480 [<c005a4b4>] worker_thread+0x1bc/0x2bc [<c0061148>] kthread+0x98/0xa4 [<c0014af4>] kernel_thread_exit+0x0/0x8 other info that might help us debug this: Chain exists of: cfg80211_mutex --> reg_mutex --> (reg_timeout).work Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((reg_timeout).work); lock(reg_mutex); lock((reg_timeout).work); lock(cfg80211_mutex); *** DEADLOCK *** 2 locks held by kworker/0:2/1391: #0: (events){.+.+.+}, at: [<c0059e94>] process_one_work+0x1f0/0x480 #1: ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480 stack backtrace: [<c001b928>] (unwind_backtrace+0x0/0x12c) from [<c0471d3c>] (dump_stack+0x20/0x24) [<c0471d3c>] (dump_stack+0x20/0x24) from [<c008ef70>] (print_circular_bug+0x280/0x2cc) [<c008ef70>] (print_circular_bug+0x280/0x2cc) from [<c008fb28>] (validate_chain+0x978/0x10f0) [<c008fb28>] (validate_chain+0x978/0x10f0) from [<c0090b68>] (__lock_acquire+0x8c8/0x9b0) [<c0090b68>] (__lock_acquire+0x8c8/0x9b0) from [<c0090d40>] (lock_acquire+0xf0/0x114) [<c0090d40>] (lock_acquire+0xf0/0x114) from [<c04734dc>] (mutex_lock_nested+0x48/0x320) [<c04734dc>] (mutex_lock_nested+0x48/0x320) from [<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211]) [<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211]) from [<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211]) [<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211]) from [<c0059f44>] (process_one_work+0x2a0/0x480) [<c0059f44>] (process_one_work+0x2a0/0x480) from [<c005a4b4>] (worker_thread+0x1bc/0x2bc) [<c005a4b4>] (worker_thread+0x1bc/0x2bc) from [<c0061148>] (kthread+0x98/0xa4) [<c0061148>] (kthread+0x98/0xa4) from [<c0014af4>] (kernel_thread_exit+0x0/0x8) cfg80211: Calling CRDA to update world regulatory domain cfg80211: World regulatory domain updated: cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm) cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm) cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

johnsonc · 2012-09-26T15:36:55Z

Hallo,
May I start with a thank you and you guys rawk!

I think this is related:

I did a modprobe on goodix_touch

kern.log had this::

Jan 1 00:33:34 linaro-alip kernel: [ 2014.680000] ===========================goodix_ts_init=====================
Jan 1 00:33:34 linaro-alip kernel: [ 2014.690000] ctp_fetch_sysconfig_para.
Jan 1 00:33:34 linaro-alip kernel: [ 2014.690000] ctp_fetch_sysconfig_para: after: ctp_twi_addr is 0x5d, dirty_addr_buf: 0
x5d. dirty_addr_buf[1]: 0xfffe
Jan 1 00:33:34 linaro-alip kernel: [ 2014.700000] ctp_fetch_sysconfig_para: ctp_twi_id is 2.
Jan 1 00:33:34 linaro-alip kernel: [ 2014.710000] ctp_fetch_sysconfig_para: screen_max_x = 800.
Jan 1 00:33:34 linaro-alip kernel: [ 2014.710000] ctp_fetch_sysconfig_para: screen_max_y = 480.
Jan 1 00:33:34 linaro-alip kernel: [ 2014.720000] ctp_fetch_sysconfig_para: revert_x_flag = 1.
Jan 1 00:33:34 linaro-alip kernel: [ 2014.720000] ctp_fetch_sysconfig_para: revert_y_flag = 0.
Jan 1 00:33:34 linaro-alip kernel: [ 2014.730000] ctp_fetch_sysconfig_para: exchange_x_y_flag = 0.
Jan 1 00:33:34 linaro-alip kernel: [ 2014.730000] goodix_ts_init: after fetch_sysconfig_para: normal_i2c: 0x5d. normal_i2
c[1]: 0xfffe
Jan 1 00:33:34 linaro-alip kernel: [ 2014.740000] ctp_reset.
Jan 1 00:33:34 linaro-alip kernel: [ 2014.770000] ctp_wakeup.
Jan 1 00:33:35 linaro-alip kernel: [ 2014.990000] i2c-core: driver [Goodix-TS] using legacy suspend method
Jan 1 00:33:35 linaro-alip kernel: [ 2014.990000] i2c-core: driver [Goodix-TS] using legacy resume method
Jan 1 00:33:35 linaro-alip kernel: [ 2015.000000] ctp_detect: Detected chip Goodix-TS at adapter 2, address 0x5d
Jan 1 00:33:35 linaro-alip kernel: [ 2015.010000] ===============================GT801 Probe===========================
Jan 1 00:33:35 linaro-alip kernel: [ 2015.010000] Begin goodix i2c test
Jan 1 00:33:35 linaro-alip kernel: [ 2015.010000] START can't sendout!
Jan 1 00:33:35 linaro-alip kernel: [ 2015.020000] Retrying transmission 2
Jan 1 00:33:35 linaro-alip kernel: [ 2015.030000] START can't sendout!
Jan 1 00:33:35 linaro-alip kernel: [ 2015.030000] Retrying transmission 1
Jan 1 00:33:35 linaro-alip kernel: [ 2015.030000] START can't sendout!
Jan 1 00:33:35 linaro-alip kernel: [ 2015.030000] Retrying transmission 0
Jan 1 00:33:35 linaro-alip kernel: [ 2015.050000] START can't sendout!
Jan 1 00:33:35 linaro-alip kernel: [ 2015.050000] Retrying transmission 2
Jan 1 00:33:35 linaro-alip kernel: [ 2015.050000] START can't sendout!
.
.
.
Jan 1 00:33:35 linaro-alip kernel: [ 2015.150000] Retrying transmission 0
Jan 1 00:33:35 linaro-alip kernel: [ 2015.170000] Warnning: I2C connection might be something wrong!
Jan 1 00:34:04 linaro-alip kernel: [ 2044.920000] Goodix-TS 2-005d: The driver is removing...
Jan 1 00:34:05 linaro-alip kernel: [ 2044.930000] ------------[ cut here ]------------
Jan 1 00:34:05 linaro-alip kernel: [ 2044.930000] WARNING: at kernel/irq/manage.c:1166 __free_irq+0xa8/0x1ac()
Jan 1 00:34:05 linaro-alip kernel: [ 2044.930000] Trying to free already-free IRQ 28
Jan 1 00:34:05 linaro-alip kernel: [ 2044.930000] Modules linked in: goodix_touch(-) disp_ump 8192cu mali_drm drm mali ump hdmi lcd disp cfbcopyarea cfbimgblt cfbfillrect

kern.log + backtrace is at http://pastebin.com/nqKD3NAB
syslog/dmesg spit out similar things..

Is this relevant to the current issue? I have a hunch it could be because of some device specific settings in script.bin.

tested on an A10_MID tablet...

johnsonc · 2012-09-26T15:56:00Z

btw the script.bin is the one from the device... no extra tweaks...

wingrime · 2012-10-14T19:39:50Z

May be simular issue that I had
please send script.bin

johnsonc · 2012-10-15T15:25:36Z

The script.fex is here: http://pastebin.com/H4xc41yr

Once the kernel is loaded, why does the script.bin config disrupt the driver??

techn · 2012-12-14T21:10:19Z

Could someone test this commit: 82bcb1a

This avoids: Apr 12 23:52:16 homeserver kernel: imon:send_packet: task interrupted Apr 12 23:52:16 homeserver kernel: ------------[ cut here ]------------ Apr 12 23:52:16 homeserver kernel: WARNING: at drivers/usb/core/urb.c:327 usb_submit_urb+0x353/0x370() Apr 12 23:52:16 homeserver kernel: Hardware name: Unknow Apr 12 23:52:16 homeserver kernel: URB f64b6f00 submitted while active Apr 12 23:52:16 homeserver kernel: Modules linked in: Apr 12 23:52:16 homeserver kernel: Pid: 3154, comm: LCDd Not tainted 3.8.6-htpc-00005-g9e6fc5e #26 Apr 12 23:52:16 homeserver kernel: Call Trace: Apr 12 23:52:16 homeserver kernel: [<c012d778>] ? warn_slowpath_common+0x78/0xb0 Apr 12 23:52:16 homeserver kernel: [<c04136c3>] ? usb_submit_urb+0x353/0x370 Apr 12 23:52:16 homeserver kernel: [<c04136c3>] ? usb_submit_urb+0x353/0x370 Apr 12 23:52:16 homeserver kernel: [<c0447010>] ? imon_ir_change_protocol+0x150/0x150 Apr 12 23:52:16 homeserver kernel: [<c012d843>] ? warn_slowpath_fmt+0x33/0x40 Apr 12 23:52:16 homeserver kernel: [<c04136c3>] ? usb_submit_urb+0x353/0x370 Apr 12 23:52:16 homeserver kernel: [<c0446c67>] ? send_packet+0x97/0x270 Apr 12 23:52:16 homeserver kernel: [<c0446cfe>] ? send_packet+0x12e/0x270 Apr 12 23:52:16 homeserver kernel: [<c05c5743>] ? do_nanosleep+0xa3/0xd0 Apr 12 23:52:16 homeserver kernel: [<c044760e>] ? vfd_write+0xae/0x250 Apr 12 23:52:16 homeserver kernel: [<c0447560>] ? lcd_write+0x180/0x180 Apr 12 23:52:16 homeserver kernel: [<c01b2b19>] ? vfs_write+0x89/0x140 Apr 12 23:52:16 homeserver kernel: [<c01b2dda>] ? sys_write+0x4a/0x90 Apr 12 23:52:16 homeserver kernel: [<c05c7c45>] ? sysenter_do_call+0x12/0x26 Apr 12 23:52:16 homeserver kernel: ---[ end trace a0b6f0fcfd2f9a1d ]--- Apr 12 23:52:16 homeserver kernel: imon:send_packet: error submitting urb(-16) Apr 12 23:52:16 homeserver kernel: imon:vfd_write: send packet #3 failed Apr 12 23:52:16 homeserver kernel: imon:send_packet: error submitting urb(-16) Apr 12 23:52:16 homeserver kernel: imon:vfd_write: send packet #0 failed Signed-off-by: Kevin Baradon <kevin.baradon@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

em28xx is oopsing with some DVB devices: [10856.061884] general protection fault: 0000 [#1] SMP [10856.067041] Modules linked in: rc_hauppauge em28xx_rc xc5000 drxk em28xx_dvb dvb_core em28xx videobuf2_vmalloc videobuf2_memops videobuf2_core rc_pixelview_new tuner_xc2028 tuner cx8800 cx88xx tveeprom btcx_risc videobuf_dma_sg videobuf_core rc_core v4l2_common videodev ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM be2iscsi iscsi_boot_sysfs iptable_mangle bnx2i cnic uio cxgb4i cxgb4 tun bridge cxgb3i cxgb3 stp ip6t_REJECT mdio libcxgbi nf_conntrack_ipv6 llc nf_defrag_ipv6 ib_iser rdma_cm ib_addr xt_conntrack iw_cm ib_cm ib_sa nf_conntrack ib_mad ib_core bnep bluetooth iscsi_tcp libiscsi_tcp ip6table_filter libiscsi ip6_tables scsi_transport_iscsi xfs libcrc32c snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm tg3 snd_page_alloc snd_timer [10856.139176] snd ptp iTCO_wdt soundcore pps_core iTCO_vendor_support lpc_ich mfd_core coretemp nfsd hp_wmi crc32c_intel microcode serio_raw rfkill sparse_keymap nfs_acl lockd sunrpc kvm_intel kvm uinput binfmt_misc firewire_ohci nouveau mxm_wmi i2c_algo_bit drm_kms_helper firewire_core crc_itu_t ttm drm i2c_core wmi [last unloaded: dib0070] [10856.168969] CPU 1 [10856.170799] Pid: 13606, comm: dvbv5-zap Not tainted 3.9.0-rc5+ #26 Hewlett-Packard HP Z400 Workstation/0AE4h [10856.181187] RIP: 0010:[<ffffffffa0459e47>] [<ffffffffa0459e47>] em28xx_write_regs_req+0x37/0x1c0 [em28xx] [10856.191028] RSP: 0018:ffff880118401a58 EFLAGS: 00010282 [10856.196533] RAX: 00020000012d0000 RBX: ffff88010804aec8 RCX: ffff880118401b14 [10856.203852] RDX: 0000000000000048 RSI: 0000000000000000 RDI: ffff88010804aec8 [10856.211174] RBP: ffff880118401ac8 R08: 0000000000000001 R09: 0000000000000000 [10856.218496] R10: 0000000000000000 R11: 0000000000000006 R12: 0000000000000048 [10856.226026] R13: ffff880118401b14 R14: ffff88011752b258 R15: ffff88011752b258 [10856.233352] FS: 00007f26636d2740(0000) GS:ffff88011fc20000(0000) knlGS:0000000000000000 [10856.241626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [10856.247565] CR2: 00007f2663716e20 CR3: 00000000c7eb1000 CR4: 00000000000007e0 [10856.254889] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [10856.262215] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [10856.269542] Process dvbv5-zap (pid: 13606, threadinfo ffff880118400000, task ffff8800cd625d40) [10856.278340] Stack: [10856.280564] ffff88011ffe8de8 0000000000000002 0000000000000000 ffff88011ffe9b00 [10856.288191] ffff880118401b14 00ff88011ffe9b08 ffff880100000048 ffffffff8112a52a [10856.295893] 0000000000000001 ffff88010804aec8 0000000000000048 ffff880118401b14 [10856.303521] Call Trace: [10856.306182] [<ffffffff8112a52a>] ? __alloc_pages_nodemask+0x15a/0x960 [10856.312912] [<ffffffffa045a002>] em28xx_write_regs+0x32/0xa0 [em28xx] [10856.319638] [<ffffffffa045a221>] em28xx_write_reg+0x21/0x30 [em28xx] [10856.326279] [<ffffffffa045a2cc>] em28xx_gpio_set+0x9c/0x100 [em28xx] [10856.332919] [<ffffffffa045a3ac>] em28xx_set_mode+0x7c/0x80 [em28xx] [10856.339472] [<ffffffffa03ef032>] em28xx_dvb_bus_ctrl+0x32/0x40 [em28xx_dvb] This is caused by commit c7a45e5, that added support for two I2C buses. A partial fix was applied at 3de09fb, but it doesn't cover all cases, as the DVB core fills fe->dvb->priv with adapter->priv. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

With DT-based boot, the GPMC OneNAND sync mode setup does not work correctly. During the async mode setup, sync flags gets incorrectly set in the onenand_async data and the system crashes during the async setup. Also, the sync mode never gets set in gpmc_onenand_data->flags, so even without the crash, the actual sync mode setup would never be called. The patch fixes this by adjusting the gpmc_onenand_data->flags when the data is read from the DT. Also while doing this we force the onenand_async to be always async. The patch enables to use the following DTS chunk (that should correspond the arch/arm/mach-omap2/board-rm680.c board file setup) with Nokia N950, which currently crashes with 3.12-rc1. The crash output can be also found below. &gpmc { ranges = <0 0 0x04000000 0x20000000>; onenand@0,0 { #address-cells = <1>; #size-cells = <1>; reg = <0 0 0x20000000>; gpmc,sync-read; gpmc,sync-write; gpmc,burst-length = <16>; gpmc,burst-read; gpmc,burst-wrap; gpmc,burst-write; gpmc,device-width = <2>; gpmc,mux-add-data = <2>; gpmc,cs-on-ns = <0>; gpmc,cs-rd-off-ns = <87>; gpmc,cs-wr-off-ns = <87>; gpmc,adv-on-ns = <0>; gpmc,adv-rd-off-ns = <10>; gpmc,adv-wr-off-ns = <10>; gpmc,oe-on-ns = <15>; gpmc,oe-off-ns = <87>; gpmc,we-on-ns = <0>; gpmc,we-off-ns = <87>; gpmc,rd-cycle-ns = <112>; gpmc,wr-cycle-ns = <112>; gpmc,access-ns = <81>; gpmc,page-burst-access-ns = <15>; gpmc,bus-turnaround-ns = <0>; gpmc,cycle2cycle-delay-ns = <0>; gpmc,wait-monitoring-ns = <0>; gpmc,clk-activation-ns = <5>; gpmc,wr-data-mux-bus-ns = <30>; gpmc,wr-access-ns = <81>; gpmc,sync-clk-ps = <15000>; }; }; [ 1.467559] GPMC CS0: cs_on : 0 ticks, 0 ns (was 0 ticks) 0 ns [ 1.474822] GPMC CS0: cs_rd_off : 1 ticks, 5 ns (was 24 ticks) 5 ns [ 1.482116] GPMC CS0: cs_wr_off : 14 ticks, 71 ns (was 24 ticks) 71 ns [ 1.489349] GPMC CS0: adv_on : 0 ticks, 0 ns (was 0 ticks) 0 ns [ 1.496582] GPMC CS0: adv_rd_off: 3 ticks, 15 ns (was 3 ticks) 15 ns [ 1.503845] GPMC CS0: adv_wr_off: 3 ticks, 15 ns (was 3 ticks) 15 ns [ 1.511077] GPMC CS0: oe_on : 3 ticks, 15 ns (was 4 ticks) 15 ns [ 1.518310] GPMC CS0: oe_off : 1 ticks, 5 ns (was 24 ticks) 5 ns [ 1.525543] GPMC CS0: we_on : 0 ticks, 0 ns (was 0 ticks) 0 ns [ 1.532806] GPMC CS0: we_off : 8 ticks, 40 ns (was 24 ticks) 40 ns [ 1.540039] GPMC CS0: rd_cycle : 4 ticks, 20 ns (was 29 ticks) 20 ns [ 1.547302] GPMC CS0: wr_cycle : 4 ticks, 20 ns (was 29 ticks) 20 ns [ 1.554504] GPMC CS0: access : 0 ticks, 0 ns (was 23 ticks) 0 ns [ 1.561767] GPMC CS0: page_burst_access: 0 ticks, 0 ns (was 3 ticks) 0 ns [ 1.569641] GPMC CS0: bus_turnaround: 0 ticks, 0 ns (was 0 ticks) 0 ns [ 1.577270] GPMC CS0: cycle2cycle_delay: 0 ticks, 0 ns (was 0 ticks) 0 ns [ 1.585144] GPMC CS0: wait_monitoring: 0 ticks, 0 ns (was 0 ticks) 0 ns [ 1.592834] GPMC CS0: clk_activation: 0 ticks, 0 ns (was 0 ticks) 0 ns [ 1.600463] GPMC CS0: wr_data_mux_bus: 5 ticks, 25 ns (was 8 ticks) 25 ns [ 1.608154] GPMC CS0: wr_access : 0 ticks, 0 ns (was 23 ticks) 0 ns [ 1.615386] GPMC CS0 CLK period is 5 ns (div 1) [ 1.625122] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf009e442 [ 1.633178] Internal error: : 1008 [#1] ARM [ 1.637573] Modules linked in: [ 1.640777] CPU: 0 PID: 1 Comm: swapper Not tainted 3.12.0-rc1-n9xx-los.git-5318619-00006-g4baa700-dirty #26 [ 1.651123] task: ef04c000 ti: ef050000 task.ti: ef050000 [ 1.656799] PC is at gpmc_onenand_setup+0x98/0x1e0 [ 1.661865] LR is at gpmc_cs_set_timings+0x494/0x5a4 [ 1.667083] pc : [<c002e040>] lr : [<c001f384>] psr: 60000113 [ 1.667083] sp : ef051d10 ip : ef051ce0 fp : ef051d94 [ 1.679138] r10: c0caaf60 r9 : ef050000 r8 : ef18b32c [ 1.684631] r7 : f0080000 r6 : c0caaf60 r5 : 00000000 r4 : f009e400 [ 1.691497] r3 : f009e442 r2 : 80050000 r1 : 00000014 r0 : 00000000 [ 1.698333] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel [ 1.706024] Control: 10c5387d Table: af290019 DAC: 00000015 [ 1.712066] Process swapper (pid: 1, stack limit = 0xef050240) [ 1.718200] Stack: (0xef051d10 to 0xef052000) [ 1.722778] 1d00: 00004000 00001402 00000000 00000005 [ 1.731384] 1d20: 00000047 00000000 0000000f 0000000f 00000000 00000028 0000000f 00000005 [ 1.739990] 1d40: 00000000 00000000 00000014 00000014 00000000 00000000 00000000 00000000 [ 1.748596] 1d60: 00000000 00000019 00000000 00000000 ef18b000 ef099c50 c0c8cb30 00000000 [ 1.757171] 1d80: c0488074 c048f868 ef051dcc ef051d98 c024447c c002dfb4 00000000 c048f868 [ 1.765777] 1da0: 00000000 00000000 c010e4a4 c0dbbb7c c0c8cb40 00000000 c0ca2500 c0488074 [ 1.774383] 1dc0: ef051ddc ef051dd0 c01fd508 c0244370 ef051dfc ef051de0 c01fc204 c01fd4f4 [ 1.782989] 1de0: c0c8cb40 c0ca2500 c0c8cb74 00000000 ef051e1c ef051e00 c01fc3b0 c01fc104 [ 1.791595] 1e00: ef0983bc 00000000 c0ca2500 c01fc31c ef051e44 ef051e20 c01fa794 c01fc328 [ 1.800201] 1e20: ef03634c ef0983b0 ef27d534 c0ca2500 ef27d500 c0c9a2f8 ef051e54 ef051e48 [ 1.808807] 1e40: c01fbcfc c01fa744 ef051e84 ef051e58 c01fb838 c01fbce4 c0411df8 c0caa040 [ 1.817413] 1e60: ef051e84 c0ca2500 00000006 c0caa040 00000066 c0488074 ef051e9c ef051e88 [ 1.825988] 1e80: c01fca30 c01fb768 c04975b8 00000006 ef051eac ef051ea0 c01fd728 c01fc9bc [ 1.834594] 1ea0: ef051ebc ef051eb0 c048808c c01fd6e4 ef051f4c ef051ec0 c0008888 c0488080 [ 1.843200] 1ec0: 0000006f c046bae8 00000000 00000000 ef051efc ef051ee0 ef051f04 ef051ee8 [ 1.851806] 1ee0: c046d400 c0181218 c046d410 c18da8d5 c036a8e4 00000066 ef051f4c ef051f08 [ 1.860412] 1f00: c004b9a8 c046d41c c048f840 00000006 00000006 c046b488 00000000 c043ec08 [ 1.869018] 1f20: ef051f4c c04975b8 00000006 c0caa040 00000066 c046d410 c048f85c c048f868 [ 1.877593] 1f40: ef051f94 ef051f50 c046db8c c00087a0 00000006 00000006 c046d410 ffffffff [ 1.886199] 1f60: ffffffff ffffffff ffffffff 00000000 c0348fd0 00000000 00000000 00000000 [ 1.894805] 1f80: 00000000 00000000 ef051fac ef051f98 c0348fe0 c046daa8 00000000 00000000 [ 1.903411] 1fa0: 00000000 ef051fb0 c000e7f8 c0348fdc 00000000 00000000 00000000 00000000 [ 1.912017] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1.920623] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff [ 1.929199] Backtrace: [ 1.931793] [<c002dfa8>] (gpmc_onenand_setup+0x0/0x1e0) from [<c024447c>] (omap2_onenand_probe+0x118/0x49c) [ 1.942047] [<c0244364>] (omap2_onenand_probe+0x0/0x49c) from [<c01fd508>] (platform_drv_probe+0x20/0x24) [ 1.952117] r8:c0488074 r7:c0ca2500 r6:00000000 r5:c0c8cb40 r4:c0dbbb7c [ 1.959197] [<c01fd4e8>] (platform_drv_probe+0x0/0x24) from [<c01fc204>] (driver_probe_device+0x10c/0x224) [ 1.969360] [<c01fc0f8>] (driver_probe_device+0x0/0x224) from [<c01fc3b0>] (__driver_attach+0x94/0x98) [ 1.979125] r7:00000000 r6:c0c8cb74 r5:c0ca2500 r4:c0c8cb40 [ 1.985107] [<c01fc31c>] (__driver_attach+0x0/0x98) from [<c01fa794>] (bus_for_each_dev+0x5c/0x90) [ 1.994506] r6:c01fc31c r5:c0ca2500 r4:00000000 r3:ef0983bc [ 2.000488] [<c01fa738>] (bus_for_each_dev+0x0/0x90) from [<c01fbcfc>] (driver_attach+0x24/0x28) [ 2.009735] r6:c0c9a2f8 r5:ef27d500 r4:c0ca2500 [ 2.014587] [<c01fbcd8>] (driver_attach+0x0/0x28) from [<c01fb838>] (bus_add_driver+0xdc/0x260) [ 2.023742] [<c01fb75c>] (bus_add_driver+0x0/0x260) from [<c01fca30>] (driver_register+0x80/0xfc) [ 2.033081] r8:c0488074 r7:00000066 r6:c0caa040 r5:00000006 r4:c0ca2500 [ 2.040161] [<c01fc9b0>] (driver_register+0x0/0xfc) from [<c01fd728>] (__platform_driver_register+0x50/0x64) [ 2.050476] r5:00000006 r4:c04975b8 [ 2.054260] [<c01fd6d8>] (__platform_driver_register+0x0/0x64) from [<c048808c>] (omap2_onenand_driver_init+0x18/0x20) [ 2.065490] [<c0488074>] (omap2_onenand_driver_init+0x0/0x20) from [<c0008888>] (do_one_initcall+0xf4/0x150) [ 2.075836] [<c0008794>] (do_one_initcall+0x0/0x150) from [<c046db8c>] (kernel_init_freeable+0xf0/0x1b4) [ 2.085815] [<c046da9c>] (kernel_init_freeable+0x0/0x1b4) from [<c0348fe0>] (kernel_init+0x10/0xec) [ 2.095336] [<c0348fd0>] (kernel_init+0x0/0xec) from [<c000e7f8>] (ret_from_fork+0x14/0x3c) [ 2.104125] r4:00000000 r3:00000000 [ 2.107879] Code: ebffc3ae e2505000 ba00002e e2843042 (e1d320b0) [ 2.114318] ---[ end trace b8ee3e3e5e002451 ]--- Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Tony Lindgren <tony@atomide.com>

As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.644008] smpboot: Booting Node 1, Processors: #8 #9 #10 #11 #12 #13 #14 #15 OK [ 1.245006] smpboot: Booting Node 2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK [ 1.864005] smpboot: Booting Node 3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK [ 2.489005] smpboot: Booting Node 4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK [ 3.093005] smpboot: Booting Node 5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK [ 3.698005] smpboot: Booting Node 6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK [ 4.304005] smpboot: Booting Node 7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Libin <huawei.libin@huawei.com> Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>

Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 [ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15 [ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23 [ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 [ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39 [ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47 [ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 [ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63 [ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 [ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 [ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87 [ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95 [ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103 [ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111 [ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119 [ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: huawei.libin@huawei.com Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>

…culation Currently mx53 (CortexA8) running at 1GHz reports: Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760) Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and 0xc take 3 clocks to run the loop twice. (1.5 clock/loop) The original object code looks like this: 00000010 <__loop_const_udelay>: 10: e3e01000 mvn r1, #0 14: e51f201c ldr r2, [pc, #-28] ; 0 <__loop_udelay-0x8> 18: e5922000 ldr r2, [r2] 1c: e0800921 add r0, r0, r1, lsr #18 20: e1a00720 lsr r0, r0, #14 24: e0822b21 add r2, r2, r1, lsr #22 28: e1a02522 lsr r2, r2, #10 2c: e0000092 mul r0, r2, r0 30: e0800d21 add r0, r0, r1, lsr #26 34: e1b00320 lsrs r0, r0, #6 38: 01a0f00e moveq pc, lr 0000003c <__loop_delay>: 3c: e2500001 subs r0, r0, #1 40: 8afffffe bhi 3c <__loop_delay> 44: e1a0f00e mov pc, lr After adding the 'align 3' directive to __loop_delay (align to 8 bytes): 00000010 <__loop_const_udelay>: 10: e3e01000 mvn r1, #0 14: e51f201c ldr r2, [pc, #-28] ; 0 <__loop_udelay-0x8> 18: e5922000 ldr r2, [r2] 1c: e0800921 add r0, r0, r1, lsr #18 20: e1a00720 lsr r0, r0, #14 24: e0822b21 add r2, r2, r1, lsr #22 28: e1a02522 lsr r2, r2, #10 2c: e0000092 mul r0, r2, r0 30: e0800d21 add r0, r0, r1, lsr #26 34: e1b00320 lsrs r0, r0, #6 38: 01a0f00e moveq pc, lr 3c: e320f000 nop {0} 00000040 <__loop_delay>: 40: e2500001 subs r0, r0, #1 44: 8afffffe bhi 40 <__loop_delay> 48: e1a0f00e mov pc, lr 4c: e320f000 nop {0} , which now reports: Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736) Some more test results: On mx31 (ARM1136) running at 532 MHz, before the patch: Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184) On mx31 (ARM1136) running at 532 MHz after the patch: Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968) Also tested on mx6 (CortexA9) and on mx27 (ARM926), which shows the same BogoMIPS value before and after this patch. Reported-by: Tom Evans <tom_usenet@optusnet.com.au> Suggested-by: Tom Evans <tom_usenet@optusnet.com.au> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Printing the "start_ip" for every secondary cpu is very noisy on a large system - and doesn't add any value. Drop this message. Console log before: Booting Node 0, Processors #1 smpboot cpu 1: start_ip = 96000 #2 smpboot cpu 2: start_ip = 96000 #3 smpboot cpu 3: start_ip = 96000 #4 smpboot cpu 4: start_ip = 96000 ... torvalds#31 smpboot cpu 31: start_ip = 96000 Brought up 32 CPUs Console log after: Booting Node 0, Processors #1 #2 #3 #4 #5 torvalds#6 torvalds#7 Ok. Booting Node 1, Processors torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 Ok. Booting Node 0, Processors torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 Ok. Booting Node 1, Processors torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 Brought up 32 CPUs Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4f452eb42507460426@agluck-desktop.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>

…d problems with booting Without that change booting leads to crash with more warnings like below: [ 0.284454] omap_hwmod: uart4: cannot clk_get main_clk uart4_fck [ 0.284484] omap_hwmod: uart4: cannot _init_clocks [ 0.284484] ------------[ cut here ]------------ [ 0.284545] WARNING: CPU: 0 PID: 1 at arch/arm/mach-omap2/omap_hwmod.c:2543 _init+0x300/0x3e4() [ 0.284545] omap_hwmod: uart4: couldn't init clocks [ 0.284576] Modules linked in: [ 0.284606] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-next-20140124-00020-gd2aefec-dirty #26 [ 0.284637] [<c00151c0>] (unwind_backtrace) from [<c0011e20>] (show_stack+0x10/0x14) [ 0.284667] [<c0011e20>] (show_stack) from [<c0568544>] (dump_stack+0x7c/0x94) [ 0.284729] [<c0568544>] (dump_stack) from [<c003ff94>] (warn_slowpath_common+0x6c/0x90) [ 0.284729] [<c003ff94>] (warn_slowpath_common) from [<c003ffe8>] (warn_slowpath_fmt+0x30/0x40) [ 0.284759] [<c003ffe8>] (warn_slowpath_fmt) from [<c07d1be8>] (_init+0x300/0x3e4) [ 0.284790] [<c07d1be8>] (_init) from [<c07d217c>] (__omap_hwmod_setup_all+0x40/0x8c) [ 0.284820] [<c07d217c>] (__omap_hwmod_setup_all) from [<c0008918>] (do_one_initcall+0xe8/0x14c) [ 0.284851] [<c0008918>] (do_one_initcall) from [<c07c5c18>] (kernel_init_freeable+0x104/0x1c8) [ 0.284881] [<c07c5c18>] (kernel_init_freeable) from [<c0563524>] (kernel_init+0x8/0x118) [ 0.284912] [<c0563524>] (kernel_init) from [<c000e368>] (ret_from_fork+0x14/0x2c) [ 0.285064] ---[ end trace 63de210ad43b627d ]--- Reference: https://lkml.org/lkml/2013/10/8/553 Signed-off-by: Marek Belisko <marek@goldelico.com> Signed-off-by: Tony Lindgren <tony@atomide.com>

[ Upstream commit af50e4b ] syzbot caught an infinite recursion in nsh_gso_segment(). Problem here is that we need to make sure the NSH header is of reasonable length. BUG: MAX_LOCK_DEPTH too low! turning off the locking correctness validator. depth: 48 max: 48! 48 locks held by syz-executor0/10189: #0: (ptrval) (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x30f/0x34c0 net/core/dev.c:3517 jwrdegoede#1: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] jwrdegoede#1: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 jwrdegoede#2: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] jwrdegoede#2: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 jwrdegoede#3: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] jwrdegoede#3: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 jwrdegoede#4: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] jwrdegoede#4: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 jwrdegoede#5: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] jwrdegoede#5: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 jwrdegoede#6: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] jwrdegoede#6: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#7: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#7: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#8: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#8: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#9: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#9: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#10: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#10: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#11: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#11: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#12: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#12: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#13: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#13: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#14: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#14: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#15: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#15: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#16: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#16: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#17: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#17: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#18: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#18: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#19: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#19: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#20: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#20: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#21: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#21: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#22: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#22: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#23: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#23: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#24: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#24: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#25: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#25: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#26: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#26: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#27: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#27: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#28: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#28: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#29: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#29: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#30: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#30: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#31: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#31: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 dccp_close: ABORT with 65423 bytes unread linux-sunxi#32: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#32: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#33: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#33: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#34: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#34: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#35: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#35: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#36: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#36: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#37: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#37: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#38: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#38: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#39: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#39: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#40: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#40: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#41: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#41: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#42: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#42: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#43: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#43: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#44: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#44: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#45: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#45: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#46: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#46: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 linux-sunxi#47: (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline] linux-sunxi#47: (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787 INFO: lockdep is turned off. CPU: 1 PID: 10189 Comm: syz-executor0 Not tainted 4.17.0-rc2+ linux-sunxi#26 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 __lock_acquire+0x1788/0x5140 kernel/locking/lockdep.c:3449 lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920 rcu_lock_acquire include/linux/rcupdate.h:246 [inline] rcu_read_lock include/linux/rcupdate.h:632 [inline] skb_mac_gso_segment+0x25b/0x720 net/core/dev.c:2789 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865 skb_gso_segment include/linux/netdevice.h:4025 [inline] validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3118 validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3168 sch_direct_xmit+0x354/0x11e0 net/sched/sch_generic.c:312 qdisc_restart net/sched/sch_generic.c:399 [inline] __qdisc_run+0x741/0x1af0 net/sched/sch_generic.c:410 __dev_xmit_skb net/core/dev.c:3243 [inline] __dev_queue_xmit+0x28ea/0x34c0 net/core/dev.c:3551 dev_queue_xmit+0x17/0x20 net/core/dev.c:3616 packet_snd net/packet/af_packet.c:2951 [inline] packet_sendmsg+0x40f8/0x6070 net/packet/af_packet.c:2976 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:639 __sys_sendto+0x3d7/0x670 net/socket.c:1789 __do_sys_sendto net/socket.c:1801 [inline] __se_sys_sendto net/socket.c:1797 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: c411ed8 ("nsh: add GSO support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Benc <jbenc@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 97f3c0a ] I found an ACPI cache leak in ACPI early termination and boot continuing case. When early termination occurs due to malicious ACPI table, Linux kernel terminates ACPI function and continues to boot process. While kernel terminates ACPI function, kmem_cache_destroy() reports Acpi-Operand cache leak. Boot log of ACPI operand cache leak is as follows: >[ 0.464168] ACPI: Added _OSI(Module Device) >[ 0.467022] ACPI: Added _OSI(Processor Device) >[ 0.469376] ACPI: Added _OSI(3.0 _SCP Extensions) >[ 0.471647] ACPI: Added _OSI(Processor Aggregator Device) >[ 0.477997] ACPI Error: Null stack entry at ffff880215c0aad8 (20170303/exresop-174) >[ 0.482706] ACPI Exception: AE_AML_INTERNAL, While resolving operands for [opcode_name unavailable] (20170303/dswexec-461) >[ 0.487503] ACPI Error: Method parse/execution failed [\DBG] (Node ffff88021710ab40), AE_AML_INTERNAL (20170303/psparse-543) >[ 0.492136] ACPI Error: Method parse/execution failed [\_SB._INI] (Node ffff88021710a618), AE_AML_INTERNAL (20170303/psparse-543) >[ 0.497683] ACPI: Interpreter enabled >[ 0.499385] ACPI: (supports S0) >[ 0.501151] ACPI: Using IOAPIC for interrupt routing >[ 0.503342] ACPI Error: Null stack entry at ffff880215c0aad8 (20170303/exresop-174) >[ 0.506522] ACPI Exception: AE_AML_INTERNAL, While resolving operands for [opcode_name unavailable] (20170303/dswexec-461) >[ 0.510463] ACPI Error: Method parse/execution failed [\DBG] (Node ffff88021710ab40), AE_AML_INTERNAL (20170303/psparse-543) >[ 0.514477] ACPI Error: Method parse/execution failed [\_PIC] (Node ffff88021710ab18), AE_AML_INTERNAL (20170303/psparse-543) >[ 0.518867] ACPI Exception: AE_AML_INTERNAL, Evaluating _PIC (20170303/bus-991) >[ 0.522384] kmem_cache_destroy Acpi-Operand: Slab cache still has objects >[ 0.524597] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc5 linux-sunxi#26 >[ 0.526795] Hardware name: innotek gmb_h virtual_box/virtual_box, BIOS virtual_box 12/01/2006 >[ 0.529668] Call Trace: >[ 0.530811] ? dump_stack+0x5c/0x81 >[ 0.532240] ? kmem_cache_destroy+0x1aa/0x1c0 >[ 0.533905] ? acpi_os_delete_cache+0xa/0x10 >[ 0.535497] ? acpi_ut_delete_caches+0x3f/0x7b >[ 0.537237] ? acpi_terminate+0xa/0x14 >[ 0.538701] ? acpi_init+0x2af/0x34f >[ 0.540008] ? acpi_sleep_proc_init+0x27/0x27 >[ 0.541593] ? do_one_initcall+0x4e/0x1a0 >[ 0.543008] ? kernel_init_freeable+0x19e/0x21f >[ 0.546202] ? rest_init+0x80/0x80 >[ 0.547513] ? kernel_init+0xa/0x100 >[ 0.548817] ? ret_from_fork+0x25/0x30 >[ 0.550587] vgaarb: loaded >[ 0.551716] EDAC MC: Ver: 3.0.0 >[ 0.553744] PCI: Probing PCI hardware >[ 0.555038] PCI host bridge to bus 0000:00 > ... Continue to boot and log is omitted ... I analyzed this memory leak in detail and found acpi_ns_evaluate() function only removes Info->return_object in AE_CTRL_RETURN_VALUE case. But, when errors occur, the status value is not AE_CTRL_RETURN_VALUE, and Info->return_object is also not null. Therefore, this causes acpi operand memory leak. This cache leak causes a security threat because an old kernel (<= 4.9) shows memory locations of kernel functions in stack dump. Some malicious users could use this information to neutralize kernel ASLR. I made a patch to fix ACPI operand cache leak. Signed-off-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 97f3c0a ] I found an ACPI cache leak in ACPI early termination and boot continuing case. When early termination occurs due to malicious ACPI table, Linux kernel terminates ACPI function and continues to boot process. While kernel terminates ACPI function, kmem_cache_destroy() reports Acpi-Operand cache leak. Boot log of ACPI operand cache leak is as follows: >[ 0.464168] ACPI: Added _OSI(Module Device) >[ 0.467022] ACPI: Added _OSI(Processor Device) >[ 0.469376] ACPI: Added _OSI(3.0 _SCP Extensions) >[ 0.471647] ACPI: Added _OSI(Processor Aggregator Device) >[ 0.477997] ACPI Error: Null stack entry at ffff880215c0aad8 (20170303/exresop-174) >[ 0.482706] ACPI Exception: AE_AML_INTERNAL, While resolving operands for [opcode_name unavailable] (20170303/dswexec-461) >[ 0.487503] ACPI Error: Method parse/execution failed [\DBG] (Node ffff88021710ab40), AE_AML_INTERNAL (20170303/psparse-543) >[ 0.492136] ACPI Error: Method parse/execution failed [\_SB._INI] (Node ffff88021710a618), AE_AML_INTERNAL (20170303/psparse-543) >[ 0.497683] ACPI: Interpreter enabled >[ 0.499385] ACPI: (supports S0) >[ 0.501151] ACPI: Using IOAPIC for interrupt routing >[ 0.503342] ACPI Error: Null stack entry at ffff880215c0aad8 (20170303/exresop-174) >[ 0.506522] ACPI Exception: AE_AML_INTERNAL, While resolving operands for [opcode_name unavailable] (20170303/dswexec-461) >[ 0.510463] ACPI Error: Method parse/execution failed [\DBG] (Node ffff88021710ab40), AE_AML_INTERNAL (20170303/psparse-543) >[ 0.514477] ACPI Error: Method parse/execution failed [\_PIC] (Node ffff88021710ab18), AE_AML_INTERNAL (20170303/psparse-543) >[ 0.518867] ACPI Exception: AE_AML_INTERNAL, Evaluating _PIC (20170303/bus-991) >[ 0.522384] kmem_cache_destroy Acpi-Operand: Slab cache still has objects >[ 0.524597] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc5 #26 >[ 0.526795] Hardware name: innotek gmb_h virtual_box/virtual_box, BIOS virtual_box 12/01/2006 >[ 0.529668] Call Trace: >[ 0.530811] ? dump_stack+0x5c/0x81 >[ 0.532240] ? kmem_cache_destroy+0x1aa/0x1c0 >[ 0.533905] ? acpi_os_delete_cache+0xa/0x10 >[ 0.535497] ? acpi_ut_delete_caches+0x3f/0x7b >[ 0.537237] ? acpi_terminate+0xa/0x14 >[ 0.538701] ? acpi_init+0x2af/0x34f >[ 0.540008] ? acpi_sleep_proc_init+0x27/0x27 >[ 0.541593] ? do_one_initcall+0x4e/0x1a0 >[ 0.543008] ? kernel_init_freeable+0x19e/0x21f >[ 0.546202] ? rest_init+0x80/0x80 >[ 0.547513] ? kernel_init+0xa/0x100 >[ 0.548817] ? ret_from_fork+0x25/0x30 >[ 0.550587] vgaarb: loaded >[ 0.551716] EDAC MC: Ver: 3.0.0 >[ 0.553744] PCI: Probing PCI hardware >[ 0.555038] PCI host bridge to bus 0000:00 > ... Continue to boot and log is omitted ... I analyzed this memory leak in detail and found acpi_ns_evaluate() function only removes Info->return_object in AE_CTRL_RETURN_VALUE case. But, when errors occur, the status value is not AE_CTRL_RETURN_VALUE, and Info->return_object is also not null. Therefore, this causes acpi operand memory leak. This cache leak causes a security threat because an old kernel (<= 4.9) shows memory locations of kernel functions in stack dump. Some malicious users could use this information to neutralize kernel ASLR. I made a patch to fix ACPI operand cache leak. Signed-off-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Erik Schmauss <erik.schmauss@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Crash dump shows following instructions crash> bt PID: 0 TASK: ffffffffbe412480 CPU: 0 COMMAND: "swapper/0" #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1 #1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2 #2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c #3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a #4 [ffff891ee00039e0] no_context at ffffffffbd074643 #5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e #6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64 #7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a #8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8 #9 [ffff891ee0003b50] page_fault at ffffffffbda01925 [exception RIP: qlt_schedule_sess_for_deletion+15] RIP: ffffffffc02e526f RSP: ffff891ee0003c08 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc0307847 RDX: 00000000000020e6 RSI: ffff891edbc377c8 RDI: 0000000000000000 RBP: ffff891ee0003c18 R8: ffffffffc02f0b20 R9: 0000000000000250 R10: 0000000000000258 R11: 000000000000b780 R12: ffff891ed9b43000 R13: 00000000000000f0 R14: 0000000000000006 R15: ffff891edbc377c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx] #11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx] #12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx] #13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx] #14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59 #15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02 #16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90 #17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984 #18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5 #19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18 --- <IRQ stack> --- #20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e [exception RIP: unknown or invalid address] RIP: 000000000000001f RSP: 0000000000000000 RFLAGS: fff3b8c2091ebb3f RAX: ffffbba5a0000200 RBX: 0000be8cdfa8f9fa RCX: 0000000000000018 RDX: 0000000000000101 RSI: 000000000000015d RDI: 0000000000000193 RBP: 0000000000000083 R8: ffffffffbe403e38 R9: 0000000000000002 R10: 0000000000000000 R11: ffffffffbe56b820 R12: ffff891ee001cf00 R13: ffffffffbd11c0a4 R14: ffffffffbe403d60 R15: 0000000000000001 ORIG_RAX: ffff891ee0022ac0 CS: 0000 SS: ffffffffffffffb9 bt: WARNING: possibly bogus exception frame #21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd #22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907 #23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3 #24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42 #25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3 #26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa #27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca #28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675 #29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb #30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5 Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login") Cc: <stable@vger.kernel.org> # v4.17+ Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

If segment type in SSA and SIT is inconsistent, we will encounter below BUG_ON during GC, to avoid this panic, let's just skip doing GC on such segment. The bug is triggered with image reported in below link: https://bugzilla.kernel.org/show_bug.cgi?id=200223 [ 388.060262] ------------[ cut here ]------------ [ 388.060268] kernel BUG at /home/y00370721/git/devf2fs/gc.c:989! [ 388.061172] invalid opcode: 0000 [#1] SMP [ 388.061773] Modules linked in: f2fs(O) bluetooth ecdh_generic xt_tcpudp iptable_filter ip_tables x_tables lp ttm drm_kms_helper drm intel_rapl sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel fb_sys_fops ppdev aes_x86_64 syscopyarea crypto_simd sysfillrect parport_pc joydev sysimgblt glue_helper parport cryptd i2c_piix4 serio_raw mac_hid btrfs hid_generic usbhid hid raid6_pq psmouse pata_acpi floppy [ 388.064247] CPU: 7 PID: 4151 Comm: f2fs_gc-7:0 Tainted: G O 4.13.0-rc1+ #26 [ 388.065306] Hardware name: Xen HVM domU, BIOS 4.1.2_115-900.260_ 11/06/2015 [ 388.066058] task: ffff880201583b80 task.stack: ffffc90004d7c000 [ 388.069948] RIP: 0010:do_garbage_collect+0xcc8/0xcd0 [f2fs] [ 388.070766] RSP: 0018:ffffc90004d7fc68 EFLAGS: 00010202 [ 388.071783] RAX: ffff8801ed227000 RBX: 0000000000000001 RCX: ffffea0007b489c0 [ 388.072700] RDX: ffff880000000000 RSI: 0000000000000001 RDI: ffffea0007b489c0 [ 388.073607] RBP: ffffc90004d7fd58 R08: 0000000000000003 R09: ffffea0007b489dc [ 388.074619] R10: 0000000000000000 R11: 0052782ab317138d R12: 0000000000000018 [ 388.075625] R13: 0000000000000018 R14: ffff880211ceb000 R15: ffff880211ceb000 [ 388.076687] FS: 0000000000000000(0000) GS:ffff880214fc0000(0000) knlGS:0000000000000000 [ 388.083277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 388.084536] CR2: 0000000000e18c60 CR3: 00000001ecf2e000 CR4: 00000000001406e0 [ 388.085748] Call Trace: [ 388.086690] ? find_next_bit+0xb/0x10 [ 388.088091] f2fs_gc+0x1a8/0x9d0 [f2fs] [ 388.088888] ? lock_timer_base+0x7d/0xa0 [ 388.090213] ? try_to_del_timer_sync+0x44/0x60 [ 388.091698] gc_thread_func+0x342/0x4b0 [f2fs] [ 388.092892] ? wait_woken+0x80/0x80 [ 388.094098] kthread+0x109/0x140 [ 388.095010] ? f2fs_gc+0x9d0/0x9d0 [f2fs] [ 388.096043] ? kthread_park+0x60/0x60 [ 388.097281] ret_from_fork+0x25/0x30 [ 388.098401] Code: ff ff 48 83 e8 01 48 89 44 24 58 e9 27 f8 ff ff 48 83 e8 01 e9 78 fc ff ff 48 8d 78 ff e9 17 fb ff ff 48 83 ef 01 e9 4d f4 ff ff <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 [ 388.100864] RIP: do_garbage_collect+0xcc8/0xcd0 [f2fs] RSP: ffffc90004d7fc68 [ 388.101810] ---[ end trace 81c73d6e6b7da61d ]--- Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

[ Upstream commit 10d255c ] If segment type in SSA and SIT is inconsistent, we will encounter below BUG_ON during GC, to avoid this panic, let's just skip doing GC on such segment. The bug is triggered with image reported in below link: https://bugzilla.kernel.org/show_bug.cgi?id=200223 [ 388.060262] ------------[ cut here ]------------ [ 388.060268] kernel BUG at /home/y00370721/git/devf2fs/gc.c:989! [ 388.061172] invalid opcode: 0000 [jwrdegoede#1] SMP [ 388.061773] Modules linked in: f2fs(O) bluetooth ecdh_generic xt_tcpudp iptable_filter ip_tables x_tables lp ttm drm_kms_helper drm intel_rapl sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel fb_sys_fops ppdev aes_x86_64 syscopyarea crypto_simd sysfillrect parport_pc joydev sysimgblt glue_helper parport cryptd i2c_piix4 serio_raw mac_hid btrfs hid_generic usbhid hid raid6_pq psmouse pata_acpi floppy [ 388.064247] CPU: 7 PID: 4151 Comm: f2fs_gc-7:0 Tainted: G O 4.13.0-rc1+ linux-sunxi#26 [ 388.065306] Hardware name: Xen HVM domU, BIOS 4.1.2_115-900.260_ 11/06/2015 [ 388.066058] task: ffff880201583b80 task.stack: ffffc90004d7c000 [ 388.069948] RIP: 0010:do_garbage_collect+0xcc8/0xcd0 [f2fs] [ 388.070766] RSP: 0018:ffffc90004d7fc68 EFLAGS: 00010202 [ 388.071783] RAX: ffff8801ed227000 RBX: 0000000000000001 RCX: ffffea0007b489c0 [ 388.072700] RDX: ffff880000000000 RSI: 0000000000000001 RDI: ffffea0007b489c0 [ 388.073607] RBP: ffffc90004d7fd58 R08: 0000000000000003 R09: ffffea0007b489dc [ 388.074619] R10: 0000000000000000 R11: 0052782ab317138d R12: 0000000000000018 [ 388.075625] R13: 0000000000000018 R14: ffff880211ceb000 R15: ffff880211ceb000 [ 388.076687] FS: 0000000000000000(0000) GS:ffff880214fc0000(0000) knlGS:0000000000000000 [ 388.083277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 388.084536] CR2: 0000000000e18c60 CR3: 00000001ecf2e000 CR4: 00000000001406e0 [ 388.085748] Call Trace: [ 388.086690] ? find_next_bit+0xb/0x10 [ 388.088091] f2fs_gc+0x1a8/0x9d0 [f2fs] [ 388.088888] ? lock_timer_base+0x7d/0xa0 [ 388.090213] ? try_to_del_timer_sync+0x44/0x60 [ 388.091698] gc_thread_func+0x342/0x4b0 [f2fs] [ 388.092892] ? wait_woken+0x80/0x80 [ 388.094098] kthread+0x109/0x140 [ 388.095010] ? f2fs_gc+0x9d0/0x9d0 [f2fs] [ 388.096043] ? kthread_park+0x60/0x60 [ 388.097281] ret_from_fork+0x25/0x30 [ 388.098401] Code: ff ff 48 83 e8 01 48 89 44 24 58 e9 27 f8 ff ff 48 83 e8 01 e9 78 fc ff ff 48 8d 78 ff e9 17 fb ff ff 48 83 ef 01 e9 4d f4 ff ff <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 [ 388.100864] RIP: do_garbage_collect+0xcc8/0xcd0 [f2fs] RSP: ffffc90004d7fc68 [ 388.101810] ---[ end trace 81c73d6e6b7da61d ]--- Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

If segment type in SSA and SIT is inconsistent, we will encounter below BUG_ON during GC, to avoid this panic, let's just skip doing GC on such segment. The bug is triggered with image reported in below link: https://bugzilla.kernel.org/show_bug.cgi?id=200223 [ 388.060262] ------------[ cut here ]------------ [ 388.060268] kernel BUG at /home/y00370721/git/devf2fs/gc.c:989! [ 388.061172] invalid opcode: 0000 [#1] SMP [ 388.061773] Modules linked in: f2fs(O) bluetooth ecdh_generic xt_tcpudp iptable_filter ip_tables x_tables lp ttm drm_kms_helper drm intel_rapl sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel fb_sys_fops ppdev aes_x86_64 syscopyarea crypto_simd sysfillrect parport_pc joydev sysimgblt glue_helper parport cryptd i2c_piix4 serio_raw mac_hid btrfs hid_generic usbhid hid raid6_pq psmouse pata_acpi floppy [ 388.064247] CPU: 7 PID: 4151 Comm: f2fs_gc-7:0 Tainted: G O 4.13.0-rc1+ #26 [ 388.065306] Hardware name: Xen HVM domU, BIOS 4.1.2_115-900.260_ 11/06/2015 [ 388.066058] task: ffff880201583b80 task.stack: ffffc90004d7c000 [ 388.069948] RIP: 0010:do_garbage_collect+0xcc8/0xcd0 [f2fs] [ 388.070766] RSP: 0018:ffffc90004d7fc68 EFLAGS: 00010202 [ 388.071783] RAX: ffff8801ed227000 RBX: 0000000000000001 RCX: ffffea0007b489c0 [ 388.072700] RDX: ffff880000000000 RSI: 0000000000000001 RDI: ffffea0007b489c0 [ 388.073607] RBP: ffffc90004d7fd58 R08: 0000000000000003 R09: ffffea0007b489dc [ 388.074619] R10: 0000000000000000 R11: 0052782ab317138d R12: 0000000000000018 [ 388.075625] R13: 0000000000000018 R14: ffff880211ceb000 R15: ffff880211ceb000 [ 388.076687] FS: 0000000000000000(0000) GS:ffff880214fc0000(0000) knlGS:0000000000000000 [ 388.083277] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 388.084536] CR2: 0000000000e18c60 CR3: 00000001ecf2e000 CR4: 00000000001406e0 [ 388.085748] Call Trace: [ 388.086690] ? find_next_bit+0xb/0x10 [ 388.088091] f2fs_gc+0x1a8/0x9d0 [f2fs] [ 388.088888] ? lock_timer_base+0x7d/0xa0 [ 388.090213] ? try_to_del_timer_sync+0x44/0x60 [ 388.091698] gc_thread_func+0x342/0x4b0 [f2fs] [ 388.092892] ? wait_woken+0x80/0x80 [ 388.094098] kthread+0x109/0x140 [ 388.095010] ? f2fs_gc+0x9d0/0x9d0 [f2fs] [ 388.096043] ? kthread_park+0x60/0x60 [ 388.097281] ret_from_fork+0x25/0x30 [ 388.098401] Code: ff ff 48 83 e8 01 48 89 44 24 58 e9 27 f8 ff ff 48 83 e8 01 e9 78 fc ff ff 48 8d 78 ff e9 17 fb ff ff 48 83 ef 01 e9 4d f4 ff ff <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 [ 388.100864] RIP: do_garbage_collect+0xcc8/0xcd0 [f2fs] RSP: ffffc90004d7fc68 [ 388.101810] ---[ end trace 81c73d6e6b7da61d ]--- Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Increase kasan instrumented kernel stack size from 32k to 64k. Other architectures seems to get away with just doubling kernel stack size under kasan, but on s390 this appears to be not enough due to bigger frame size. The particular pain point is kasan inlined checks (CONFIG_KASAN_INLINE vs CONFIG_KASAN_OUTLINE). With inlined checks one particular case hitting stack overflow is fs sync on xfs filesystem: #0 [9a0681e8] 704 bytes check_usage at 34b1fc #1 [9a0684a8] 432 bytes check_usage at 34c710 #2 [9a068658] 1048 bytes validate_chain at 35044a #3 [9a068a70] 312 bytes __lock_acquire at 3559fe #4 [9a068ba8] 440 bytes lock_acquire at 3576ee #5 [9a068d60] 104 bytes _raw_spin_lock at 21b44e0 #6 [9a068dc8] 1992 bytes enqueue_entity at 2dbf72 #7 [9a069590] 1496 bytes enqueue_task_fair at 2df5f0 #8 [9a069b68] 64 bytes ttwu_do_activate at 28f438 #9 [9a069ba8] 552 bytes try_to_wake_up at 298c4c #10 [9a069dd0] 168 bytes wake_up_worker at 23f97c #11 [9a069e78] 200 bytes insert_work at 23fc2e #12 [9a069f40] 648 bytes __queue_work at 2487c0 #13 [9a06a1c8] 200 bytes __queue_delayed_work at 24db28 #14 [9a06a290] 248 bytes mod_delayed_work_on at 24de84 #15 [9a06a388] 24 bytes kblockd_mod_delayed_work_on at 153e2a0 #16 [9a06a3a0] 288 bytes __blk_mq_delay_run_hw_queue at 158168c #17 [9a06a4c0] 192 bytes blk_mq_run_hw_queue at 1581a3c #18 [9a06a580] 184 bytes blk_mq_sched_insert_requests at 15a2192 #19 [9a06a638] 1024 bytes blk_mq_flush_plug_list at 1590f3a #20 [9a06aa38] 704 bytes blk_flush_plug_list at 1555028 #21 [9a06acf8] 320 bytes schedule at 219e476 #22 [9a06ae38] 760 bytes schedule_timeout at 21b0aac #23 [9a06b130] 408 bytes wait_for_common at 21a1706 #24 [9a06b2c8] 360 bytes xfs_buf_iowait at fa1540 #25 [9a06b430] 256 bytes __xfs_buf_submit at fadae6 #26 [9a06b530] 264 bytes xfs_buf_read_map at fae3f6 #27 [9a06b638] 656 bytes xfs_trans_read_buf_map at 10ac9a8 #28 [9a06b8c8] 304 bytes xfs_btree_kill_root at e72426 #29 [9a06b9f8] 288 bytes xfs_btree_lookup_get_block at e7bc5e #30 [9a06bb18] 624 bytes xfs_btree_lookup at e7e1a6 #31 [9a06bd88] 2664 bytes xfs_alloc_ag_vextent_near at dfa070 #32 [9a06c7f0] 144 bytes xfs_alloc_ag_vextent at dff3ca #33 [9a06c880] 1128 bytes xfs_alloc_vextent at e05fce #34 [9a06cce8] 584 bytes xfs_bmap_btalloc at e58342 #35 [9a06cf30] 1336 bytes xfs_bmapi_write at e618de #36 [9a06d468] 776 bytes xfs_iomap_write_allocate at ff678e #37 [9a06d770] 720 bytes xfs_map_blocks at f82af8 #38 [9a06da40] 928 bytes xfs_writepage_map at f83cd6 #39 [9a06dde0] 320 bytes xfs_do_writepage at f85872 #40 [9a06df20] 1320 bytes write_cache_pages at 73dfe8 #41 [9a06e448] 208 bytes xfs_vm_writepages at f7f892 #42 [9a06e518] 88 bytes do_writepages at 73fe6a #43 [9a06e570] 872 bytes __writeback_single_inode at a20cb6 #44 [9a06e8d8] 664 bytes writeback_sb_inodes at a23be2 #45 [9a06eb70] 296 bytes __writeback_inodes_wb at a242e0 #46 [9a06ec98] 928 bytes wb_writeback at a2500e #47 [9a06f038] 848 bytes wb_do_writeback at a260ae #48 [9a06f388] 536 bytes wb_workfn at a28228 #49 [9a06f5a0] 1088 bytes process_one_work at 24a234 #50 [9a06f9e0] 1120 bytes worker_thread at 24ba26 #51 [9a06fe40] 104 bytes kthread at 26545a #52 [9a06fea8] kernel_thread_starter at 21b6b62 To be able to increase the stack size to 64k reuse LLILL instruction in __switch_to function to load 64k - STACK_FRAME_OVERHEAD - __PT_SIZE (65192) value as unsigned. Reported-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

commit 36eb8ff upstream. Crash dump shows following instructions crash> bt PID: 0 TASK: ffffffffbe412480 CPU: 0 COMMAND: "swapper/0" #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1 jwrdegoede#1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2 jwrdegoede#2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c jwrdegoede#3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a jwrdegoede#4 [ffff891ee00039e0] no_context at ffffffffbd074643 jwrdegoede#5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e jwrdegoede#6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64 linux-sunxi#7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a linux-sunxi#8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8 linux-sunxi#9 [ffff891ee0003b50] page_fault at ffffffffbda01925 [exception RIP: qlt_schedule_sess_for_deletion+15] RIP: ffffffffc02e526f RSP: ffff891ee0003c08 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc0307847 RDX: 00000000000020e6 RSI: ffff891edbc377c8 RDI: 0000000000000000 RBP: ffff891ee0003c18 R8: ffffffffc02f0b20 R9: 0000000000000250 R10: 0000000000000258 R11: 000000000000b780 R12: ffff891ed9b43000 R13: 00000000000000f0 R14: 0000000000000006 R15: ffff891edbc377c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 linux-sunxi#10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx] linux-sunxi#11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx] linux-sunxi#12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx] linux-sunxi#13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx] linux-sunxi#14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59 linux-sunxi#15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02 linux-sunxi#16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90 linux-sunxi#17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984 linux-sunxi#18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5 linux-sunxi#19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18 --- <IRQ stack> --- linux-sunxi#20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e [exception RIP: unknown or invalid address] RIP: 000000000000001f RSP: 0000000000000000 RFLAGS: fff3b8c2091ebb3f RAX: ffffbba5a0000200 RBX: 0000be8cdfa8f9fa RCX: 0000000000000018 RDX: 0000000000000101 RSI: 000000000000015d RDI: 0000000000000193 RBP: 0000000000000083 R8: ffffffffbe403e38 R9: 0000000000000002 R10: 0000000000000000 R11: ffffffffbe56b820 R12: ffff891ee001cf00 R13: ffffffffbd11c0a4 R14: ffffffffbe403d60 R15: 0000000000000001 ORIG_RAX: ffff891ee0022ac0 CS: 0000 SS: ffffffffffffffb9 bt: WARNING: possibly bogus exception frame linux-sunxi#21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd linux-sunxi#22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907 linux-sunxi#23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3 linux-sunxi#24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42 linux-sunxi#25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3 linux-sunxi#26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa linux-sunxi#27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca linux-sunxi#28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675 linux-sunxi#29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb linux-sunxi#30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5 Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login") Cc: <stable@vger.kernel.org> # v4.17+ Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…not disabled In kdump kernel, memcg usually is disabled with 'cgroup_disable=memory' for saving memory. Now kdump kernel will always panic when dump vmcore to local disk: BUG: kernel NULL pointer dereference, address: 0000000000000ab8 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 598 Comm: makedumpfile Not tainted 5.3.0+ linux-sunxi#26 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018 RIP: 0010:mem_cgroup_track_foreign_dirty_slowpath+0x38/0x140 Call Trace: __set_page_dirty+0x52/0xc0 iomap_set_page_dirty+0x50/0x90 iomap_write_end+0x6e/0x270 iomap_write_actor+0xce/0x170 iomap_apply+0xba/0x11e iomap_file_buffered_write+0x62/0x90 xfs_file_buffered_aio_write+0xca/0x320 [xfs] new_sync_write+0x12d/0x1d0 vfs_write+0xa5/0x1a0 ksys_write+0x59/0xd0 do_syscall_64+0x59/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 And this will corrupt the 1st kernel too with 'cgroup_disable=memory'. Via the trace and with debugging, it is pointing to commit 97b2782 ("writeback, memcg: Implement foreign dirty flushing") which introduced this regression. Disabling memcg causes the null pointer dereference at uninitialized data in function mem_cgroup_track_foreign_dirty_slowpath(). Fix it by returning directly if memcg is disabled, but not trying to record the foreign writebacks with dirty pages. Link: http://lkml.kernel.org/r/20190924141928.GD31919@MiWiFi-R3L-srv Fixes: 97b2782 ("writeback, memcg: Implement foreign dirty flushing") Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jan Kara <jack@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) jwrdegoede#1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 jwrdegoede#2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 jwrdegoede#3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 jwrdegoede#4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 jwrdegoede#5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 jwrdegoede#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 linux-sunxi#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 linux-sunxi#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 linux-sunxi#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 linux-sunxi#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 linux-sunxi#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 linux-sunxi#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 linux-sunxi#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 linux-sunxi#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 linux-sunxi#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 linux-sunxi#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 linux-sunxi#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 linux-sunxi#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 linux-sunxi#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 linux-sunxi#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 linux-sunxi#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 linux-sunxi#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 linux-sunxi#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 linux-sunxi#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 linux-sunxi#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 linux-sunxi#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

Netpoll can try to poll napi as soon as napi_enable() is called. It crashes trying to access a doorbell which is still NULL: BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 59 PID: 6039 Comm: ethtool Kdump: loaded Tainted: G S 5.9.0-rc1-00469-g5fd99b5d9950-dirty linux-sunxi#26 RIP: 0010:bnxt_poll+0x121/0x1c0 Code: c4 20 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 8b 86 a0 01 00 00 41 23 85 18 01 00 00 49 8b 96 a8 01 00 00 0d 00 00 00 24 <89> 02 41 f6 45 77 02 74 cb 49 8b ae d8 01 00 00 31 c0 c7 44 24 1a netpoll_poll_dev+0xbd/0x1a0 __netpoll_send_skb+0x1b2/0x210 netpoll_send_udp+0x2c9/0x406 write_ext_msg+0x1d7/0x1f0 console_unlock+0x23c/0x520 vprintk_emit+0xe0/0x1d0 printk+0x58/0x6f x86_vector_activate.cold+0xf/0x46 __irq_domain_activate_irq+0x50/0x80 __irq_domain_activate_irq+0x32/0x80 __irq_domain_activate_irq+0x32/0x80 irq_domain_activate_irq+0x25/0x40 __setup_irq+0x2d2/0x700 request_threaded_irq+0xfb/0x160 __bnxt_open_nic+0x3b1/0x750 bnxt_open_nic+0x19/0x30 ethtool_set_channels+0x1ac/0x220 dev_ethtool+0x11ba/0x2240 dev_ioctl+0x1cf/0x390 sock_do_ioctl+0x95/0x130 Reported-by: Rob Sherwood <rsher@fb.com> Fixes: c0c050c ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>

[ Upstream commit 9d23246 ] This patch fixes issues which occurs when dlm lowcomms synchronize their workqueues but dlm application layer already released the lockspace. In such cases messages like: dlm: gfs2: release_lockspace final free dlm: invalid lockspace 3841231384 from 1 cmd 1 type 11 are printed on the kernel log. This patch is solving this issue by introducing a new "shutdown" hook before calling "stop" hook when the lockspace is going to be released finally. This should pretend any dlm messages sitting in the workqueues during or after lockspace removal. It's necessary to call dlm_scand_stop() as I instrumented dlm_lowcomms_get_buffer() code to report a warning after it's called after dlm_midcomms_shutdown() functionality, see below: WARNING: CPU: 1 PID: 3794 at fs/dlm/midcomms.c:1003 dlm_midcomms_get_buffer+0x167/0x180 Modules linked in: joydev iTCO_wdt intel_pmc_bxt iTCO_vendor_support drm_ttm_helper ttm pcspkr serio_raw i2c_i801 i2c_smbus drm_kms_helper virtio_scsi lpc_ich virtio_balloon virtio_console xhci_pci xhci_pci_renesas cec qemu_fw_cfg drm [last unloaded: qxl] CPU: 1 PID: 3794 Comm: dlm_scand Tainted: G W 5.11.0+ linux-sunxi#26 Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014 RIP: 0010:dlm_midcomms_get_buffer+0x167/0x180 Code: 5d 41 5c 41 5d 41 5e 41 5f c3 0f 0b 45 31 e4 5b 5d 4c 89 e0 41 5c 41 5d 41 5e 41 5f c3 4c 89 e7 45 31 e4 e8 3b f1 ec ff eb 86 <0f> 0b 4c 89 e7 45 31 e4 e8 2c f1 ec ff e9 74 ff ff ff 0f 1f 80 00 RSP: 0018:ffffa81503f8fe60 EFLAGS: 00010202 RAX: 0000000000000008 RBX: ffff8f969827f200 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffffad1e89a0 RDI: ffff8f96a5294160 RBP: 0000000000000001 R08: 0000000000000000 R09: ffff8f96a250bc60 R10: 00000000000045d3 R11: 0000000000000000 R12: ffff8f96a250bc60 R13: ffffa81503f8fec8 R14: 0000000000000070 R15: 0000000000000c40 FS: 0000000000000000(0000) GS:ffff8f96fbc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055aa3351c000 CR3: 000000010bf22000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dlm_scan_rsbs+0x420/0x670 ? dlm_uevent+0x20/0x20 dlm_scand+0xbf/0xe0 kthread+0x13a/0x150 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 To synchronize all dlm scand messages we stop it right before shutdown hook. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit d5027ca ] Ritesh reported a bug [1] against UML, noting that it crashed on startup. The backtrace shows the following (heavily redacted): (gdb) bt ... linux-sunxi#26 0x0000000060015b5d in sem_init () at ipc/sem.c:268 linux-sunxi#27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2 linux-sunxi#28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72 ... linux-sunxi#40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359 ... linux-sunxi#44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486 linux-sunxi#45 0x00007f8990968b85 in __getgrnam_r [...] linux-sunxi#46 0x00007f89909d6b77 in grantpt [...] linux-sunxi#47 0x00007f8990a9394e in __GI_openpty [...] linux-sunxi#48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407 linux-sunxi#49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598 linux-sunxi#50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45 linux-sunxi#51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334 linux-sunxi#52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144 indicating that the UML function openpty_cb() calls openpty(), which internally calls __getgrnam_r(), which causes the nsswitch machinery to get started. This loads, through lots of indirection that I snipped, the libcom_err.so.2 library, which (in an unknown function, "??") calls sem_init(). Now, of course it wants to get libpthread's sem_init(), since it's linked against libpthread. However, the dynamic linker looks up that symbol against the binary first, and gets the kernel's sem_init(). Hajime Tazaki noted that "objcopy -L" can localize a symbol, so the dynamic linker wouldn't do the lookup this way. I tried, but for some reason that didn't seem to work. Doing the same thing in the linker script instead does seem to work, though I cannot entirely explain - it *also* works if I just add "VERSION { { global: *; }; }" instead, indicating that something else is happening that I don't really understand. It may be that explicitly doing that marks them with some kind of empty version, and that's different from the default. Explicitly marking them with a version breaks kallsyms, so that doesn't seem to be possible. Marking all the symbols as local seems correct, and does seem to address the issue, so do that. Also do it for static link, nsswitch libraries could still be loaded there. [1] https://bugs.debian.org/983379 Reported-by: Ritesh Raj Sarraf <rrs@debian.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Tested-By: Ritesh Raj Sarraf <rrs@debian.org> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>

@submit

[BUG] When running btrfs/027 with "-o compress" mount option, it always crashes with the following call trace: BTRFS critical (device dm-4): mapping failed logical 298901504 bio len 12288 len 8192 ------------[ cut here ]------------ kernel BUG at fs/btrfs/volumes.c:6651! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 5 PID: 31089 Comm: kworker/u24:10 Tainted: G OE 5.13.0-rc2-custom+ linux-sunxi#26 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Workqueue: btrfs-delalloc btrfs_work_helper [btrfs] RIP: 0010:btrfs_map_bio.cold+0x58/0x5a [btrfs] Call Trace: btrfs_submit_compressed_write+0x2d7/0x470 [btrfs] submit_compressed_extents+0x3b0/0x470 [btrfs] ? mark_held_locks+0x49/0x70 btrfs_work_helper+0x131/0x3e0 [btrfs] process_one_work+0x28f/0x5d0 worker_thread+0x55/0x3c0 ? process_one_work+0x5d0/0x5d0 kthread+0x141/0x160 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 ---[ end trace 63113a3a91f34e68 ]--- [CAUSE] The critical message before the crash means we have a bio at logical bytenr 298901504 length 12288, but only 8192 bytes can fit into one stripe, the remaining 4096 bytes go to another stripe. In btrfs, all bios are properly split to avoid cross stripe boundary, but commit 764c7c9 ("btrfs: zoned: fix parallel compressed writes") changed the behavior for compressed writes. Previously if we find our new page can't be fitted into current stripe, ie. "submit == 1" case, we submit current bio without adding current page. submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0); page->mapping = NULL; if (submit || bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) { But after the modification, we will add the page no matter if it crosses stripe boundary, leading to the above crash. submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0); if (pg_index == 0 && use_append) len = bio_add_zone_append_page(bio, page, PAGE_SIZE, 0); else len = bio_add_page(bio, page, PAGE_SIZE, 0); page->mapping = NULL; if (submit || len < PAGE_SIZE) { [FIX] It's no longer possible to revert to the original code style as we have two different bio_add_*_page() calls now. The new fix is to skip the bio_add_*_page() call if @submit is true. Also to avoid @len to be uninitialized, always initialize it to zero. If @submit is true, @len will not be checked. If @submit is not true, @len will be the return value of bio_add_*_page() call. Either way, the behavior is still the same as the old code. Reported-by: Josef Bacik <josef@toxicpanda.com> Fixes: 764c7c9 ("btrfs: zoned: fix parallel compressed writes") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

[ Upstream commit 6c881ca ] To quote Alexey[1]: I was adding custom tracepoint to the kernel, grabbed full F34 kernel .config, disabled modules and booted whole shebang as VM kernel. Then did perf record -a -e ... It crashed: general protection fault, probably for non-canonical address 0x435f5346592e4243: 0000 [jwrdegoede#1] SMP PTI CPU: 1 PID: 842 Comm: cat Not tainted 5.12.6+ linux-sunxi#26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 RIP: 0010:t_show+0x22/0xd0 Then reproducer was narrowed to # cat /sys/kernel/tracing/printk_formats Original F34 kernel with modules didn't crash. So I started to disable options and after disabling AFS everything started working again. The root cause is that AFS was placing char arrays content into a section full of _pointers_ to strings with predictable consequences. Non canonical address 435f5346592e4243 is "CB.YFS_" which came from CM_NAME macro. Steps to reproduce: CONFIG_AFS=y CONFIG_TRACING=y # cat /sys/kernel/tracing/printk_formats Fix this by the following means: (1) Add enum->string translation tables in the event header with the AFS and YFS cache/callback manager operations listed by RPC operation ID. (2) Modify the afs_cb_call tracepoint to print the string from the translation table rather than using the string at the afs_call name pointer. (3) Switch translation table depending on the service we're being accessed as (AFS or YFS) in the tracepoint print clause. Will this cause problems to userspace utilities? Note that the symbolic representation of the YFS service ID isn't available to this header, so I've put it in as a number. I'm not sure if this is the best way to do this. (4) Remove the name wrangling (CM_NAME) macro and put the names directly into the afs_call_type structs in cmservice.c. Fixes: 8e8d7f1 ("afs: Add some tracepoints") Reported-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/YLAXfvZ+rObEOdc%2F@localhost.localdomain/ [1] Link: https://lore.kernel.org/r/643721.1623754699@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/162430903582.2896199.6098150063997983353.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/162609463957.3133237.15916579353149746363.stgit@warthog.procyon.org.uk/ # v1 (repost) Link: https://lore.kernel.org/r/162610726860.3408253.445207609466288531.stgit@warthog.procyon.org.uk/ # v2 Signed-off-by: Sasha Levin <sashal@kernel.org>

To quote Alexey[1]: I was adding custom tracepoint to the kernel, grabbed full F34 kernel .config, disabled modules and booted whole shebang as VM kernel. Then did perf record -a -e ... It crashed: general protection fault, probably for non-canonical address 0x435f5346592e4243: 0000 [#1] SMP PTI CPU: 1 PID: 842 Comm: cat Not tainted 5.12.6+ linux-sunxi#26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 RIP: 0010:t_show+0x22/0xd0 Then reproducer was narrowed to # cat /sys/kernel/tracing/printk_formats Original F34 kernel with modules didn't crash. So I started to disable options and after disabling AFS everything started working again. The root cause is that AFS was placing char arrays content into a section full of _pointers_ to strings with predictable consequences. Non canonical address 435f5346592e4243 is "CB.YFS_" which came from CM_NAME macro. Steps to reproduce: CONFIG_AFS=y CONFIG_TRACING=y # cat /sys/kernel/tracing/printk_formats Fix this by the following means: (1) Add enum->string translation tables in the event header with the AFS and YFS cache/callback manager operations listed by RPC operation ID. (2) Modify the afs_cb_call tracepoint to print the string from the translation table rather than using the string at the afs_call name pointer. (3) Switch translation table depending on the service we're being accessed as (AFS or YFS) in the tracepoint print clause. Will this cause problems to userspace utilities? Note that the symbolic representation of the YFS service ID isn't available to this header, so I've put it in as a number. I'm not sure if this is the best way to do this. (4) Remove the name wrangling (CM_NAME) macro and put the names directly into the afs_call_type structs in cmservice.c. Fixes: 8e8d7f1 ("afs: Add some tracepoints") Reported-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/YLAXfvZ+rObEOdc%2F@localhost.localdomain/ [1] Link: https://lore.kernel.org/r/643721.1623754699@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/162430903582.2896199.6098150063997983353.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/162609463957.3133237.15916579353149746363.stgit@warthog.procyon.org.uk/ # v1 (repost) Link: https://lore.kernel.org/r/162610726860.3408253.445207609466288531.stgit@warthog.procyon.org.uk/ # v2

commit 640b7ea upstream. The memory reserved by console/PALcode or non-volatile memory is not added to memblock.memory. Since commit fa3354e (mm: free_area_init: use maximal zone PFNs rather than zone sizes) the initialization of the memory map relies on the accuracy of memblock.memory to properly calculate zone sizes. The holes in memblock.memory caused by absent regions reserved by the firmware cause incorrect initialization of struct pages which leads to BUG() during the initial page freeing: BUG: Bad page state in process swapper pfn:2ffc53 page:fffffc000ecf14c0 refcount:0 mapcount:1 mapping:0000000000000000 index:0x0 flags: 0x0() raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-03841-gfa3354e4ea39-dirty linux-sunxi#26 fffffc0001b5bd68 fffffc0001b5be80 fffffc00011cd148 fffffc000ecf14c0 fffffc00019803df fffffc0001b5be80 fffffc00011ce340 fffffc000ecf14c0 0000000000000000 fffffc0001b5be80 fffffc0001b482c0 fffffc00027d6618 fffffc00027da7d0 00000000002ff97a 0000000000000000 fffffc0001b5be80 fffffc00011d1abc fffffc000ecf14c0 fffffc0002d00000 fffffc0001b5be80 fffffc0001b2350c 0000000000300000 fffffc0001b48298 fffffc0001b482c0 Trace: [<fffffc00011cd148>] bad_page+0x168/0x1b0 [<fffffc00011ce340>] free_pcp_prepare+0x1e0/0x290 [<fffffc00011d1abc>] free_unref_page+0x2c/0xa0 [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30 [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30 [<fffffc000101001c>] _stext+0x1c/0x20 Fix this by registering the reserved ranges in memblock.memory. Link: https://lore.kernel.org/lkml/20210726192311.uffqnanxw3ac5wwi@ivybridge Fixes: fa3354e ("mm: free_area_init: use maximal zone PFNs rather than zone sizes") Reported-by: Matt Turner <mattst88@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

The memory reserved by console/PALcode or non-volatile memory is not added to memblock.memory. Since commit fa3354e (mm: free_area_init: use maximal zone PFNs rather than zone sizes) the initialization of the memory map relies on the accuracy of memblock.memory to properly calculate zone sizes. The holes in memblock.memory caused by absent regions reserved by the firmware cause incorrect initialization of struct pages which leads to BUG() during the initial page freeing: BUG: Bad page state in process swapper pfn:2ffc53 page:fffffc000ecf14c0 refcount:0 mapcount:1 mapping:0000000000000000 index:0x0 flags: 0x0() raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-03841-gfa3354e4ea39-dirty linux-sunxi#26 fffffc0001b5bd68 fffffc0001b5be80 fffffc00011cd148 fffffc000ecf14c0 fffffc00019803df fffffc0001b5be80 fffffc00011ce340 fffffc000ecf14c0 0000000000000000 fffffc0001b5be80 fffffc0001b482c0 fffffc00027d6618 fffffc00027da7d0 00000000002ff97a 0000000000000000 fffffc0001b5be80 fffffc00011d1abc fffffc000ecf14c0 fffffc0002d00000 fffffc0001b5be80 fffffc0001b2350c 0000000000300000 fffffc0001b48298 fffffc0001b482c0 Trace: [<fffffc00011cd148>] bad_page+0x168/0x1b0 [<fffffc00011ce340>] free_pcp_prepare+0x1e0/0x290 [<fffffc00011d1abc>] free_unref_page+0x2c/0xa0 [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30 [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30 [<fffffc000101001c>] _stext+0x1c/0x20 Fix this by registering the reserved ranges in memblock.memory. Link: https://lore.kernel.org/lkml/20210726192311.uffqnanxw3ac5wwi@ivybridge Fixes: fa3354e ("mm: free_area_init: use maximal zone PFNs rather than zone sizes") Reported-by: Matt Turner <mattst88@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Matt Turner <mattst88@gmail.com>

[ Upstream commit d412137 ] The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer test_perf_buffer:PASS:nr_cpus 0 nsec test_perf_buffer:PASS:nr_on_cpus 0 nsec test_perf_buffer:PASS:skel_load 0 nsec test_perf_buffer:PASS:attach_kprobe 0 nsec test_perf_buffer:PASS:perf_buf__new 0 nsec test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU linux-sunxi#24 skipping offline CPU linux-sunxi#25 skipping offline CPU linux-sunxi#26 skipping offline CPU linux-sunxi#27 skipping offline CPU linux-sunxi#28 skipping offline CPU linux-sunxi#29 skipping offline CPU linux-sunxi#30 skipping offline CPU linux-sunxi#31 test_perf_buffer:PASS:perf_buffer__poll 0 nsec test_perf_buffer:PASS:seen_cpu_cnt 0 nsec test_perf_buffer:FAIL:buf_cnt got 24, expected 32 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Changing the test to check online cpus instead of possible. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211021114132.8196-2-jolsa@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>

commit b7fb2da upstream. struct rpmsg_ctrldev contains a struct cdev. The current code frees the rpmsg_ctrldev struct in rpmsg_ctrldev_release_device(), but the cdev is a managed object, therefore its release is not predictable and the rpmsg_ctrldev could be freed before the cdev is entirely released, as in the backtrace below. [ 93.625603] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c [ 93.636115] WARNING: CPU: 0 PID: 12 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0 [ 93.644799] Modules linked in: veth xt_cgroup xt_MASQUERADE rfcomm algif_hash algif_skcipher af_alg uinput ip6table_nat fuse uvcvideo videobuf2_vmalloc venus_enc venus_dec videobuf2_dma_contig hci_uart btandroid btqca snd_soc_rt5682_i2c bluetooth qcom_spmi_temp_alarm snd_soc_rt5682v [ 93.715175] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G B 5.4.163-lockdep linux-sunxi#26 [ 93.723855] Hardware name: Google Lazor (rev3 - 8) with LTE (DT) [ 93.730055] Workqueue: events kobject_delayed_cleanup [ 93.735271] pstate: 60c00009 (nZCv daif +PAN +UAO) [ 93.740216] pc : debug_print_object+0x13c/0x1b0 [ 93.744890] lr : debug_print_object+0x13c/0x1b0 [ 93.749555] sp : ffffffacf5bc7940 [ 93.752978] x29: ffffffacf5bc7940 x28: dfffffd000000000 [ 93.758448] x27: ffffffacdb11a800 x26: dfffffd000000000 [ 93.763916] x25: ffffffd0734f856c x24: dfffffd000000000 [ 93.769389] x23: 0000000000000000 x22: ffffffd0733c35b0 [ 93.774860] x21: ffffffd0751994a0 x20: ffffffd075ec27c0 [ 93.780338] x19: ffffffd075199100 x18: 00000000000276e0 [ 93.785814] x17: 0000000000000000 x16: dfffffd000000000 [ 93.791291] x15: ffffffffffffffff x14: 6e6968207473696c [ 93.796768] x13: 0000000000000000 x12: ffffffd075e2b000 [ 93.802244] x11: 0000000000000001 x10: 0000000000000000 [ 93.807723] x9 : d13400dff1921900 x8 : d13400dff1921900 [ 93.813200] x7 : 0000000000000000 x6 : 0000000000000000 [ 93.818676] x5 : 0000000000000080 x4 : 0000000000000000 [ 93.824152] x3 : ffffffd0732a0fa4 x2 : 0000000000000001 [ 93.829628] x1 : ffffffacf5bc7580 x0 : 0000000000000061 [ 93.835104] Call trace: [ 93.837644] debug_print_object+0x13c/0x1b0 [ 93.841963] __debug_check_no_obj_freed+0x25c/0x3c0 [ 93.846987] debug_check_no_obj_freed+0x18/0x20 [ 93.851669] slab_free_freelist_hook+0xbc/0x1e4 [ 93.856346] kfree+0xfc/0x2f4 [ 93.859416] rpmsg_ctrldev_release_device+0x78/0xb8 [ 93.864445] device_release+0x84/0x168 [ 93.868310] kobject_cleanup+0x12c/0x298 [ 93.872356] kobject_delayed_cleanup+0x10/0x18 [ 93.876948] process_one_work+0x578/0x92c [ 93.881086] worker_thread+0x804/0xcf8 [ 93.884963] kthread+0x2a8/0x314 [ 93.888303] ret_from_fork+0x10/0x18 The cdev_device_add/del() API was created to address this issue (see commit '233ed09d7fda ("chardev: add helper function to register char devs with a struct device")'), use it instead of cdev add/del(). Fixes: c0cdc19 ("rpmsg: Driver for user space endpoint interface") Signed-off-by: Sujit Kautkar <sujitka@chromium.org> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20220110104706.v6.1.Iaac908f3e3149a89190ce006ba166e2d3fd247a3@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... linux-sunxi#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 linux-sunxi#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] linux-sunxi#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] linux-sunxi#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] linux-sunxi#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] linux-sunxi#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] linux-sunxi#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] linux-sunxi#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] linux-sunxi#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 linux-sunxi#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 linux-sunxi#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 linux-sunxi#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf linux-sunxi#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 linux-sunxi#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 linux-sunxi#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 linux-sunxi#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff linux-sunxi#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f linux-sunxi#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

Driver's probe allocates memory for RX FIFO (port->rx_fifo) based on default RX FIFO depth, e.g. 16. Later during serial startup the qcom_geni_serial_port_setup() updates the RX FIFO depth (port->rx_fifo_depth) to match real device capabilities, e.g. to 32. The RX UART handle code will read "port->rx_fifo_depth" number of words into "port->rx_fifo" buffer, thus exceeding the bounds. This can be observed in certain configurations with Qualcomm Bluetooth HCI UART device and KASAN: Bluetooth: hci0: QCA Product ID :0x00000010 Bluetooth: hci0: QCA SOC Version :0x400a0200 Bluetooth: hci0: QCA ROM Version :0x00000200 Bluetooth: hci0: QCA Patch Version:0x00000d2b Bluetooth: hci0: QCA controller version 0x02000200 Bluetooth: hci0: QCA Downloading qca/htbtfw20.tlv bluetooth hci0: Direct firmware load for qca/htbtfw20.tlv failed with error -2 Bluetooth: hci0: QCA Failed to request file: qca/htbtfw20.tlv (-2) Bluetooth: hci0: QCA Failed to download patch (-2) ================================================================== BUG: KASAN: slab-out-of-bounds in handle_rx_uart+0xa8/0x18c Write of size 4 at addr ffff279347d578c0 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rt5-00350-gb2450b7e00be-dirty linux-sunxi#26 Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) Call trace: dump_backtrace.part.0+0xe0/0xf0 show_stack+0x18/0x40 dump_stack_lvl+0x8c/0xb8 print_report+0x188/0x488 kasan_report+0xb4/0x100 __asan_store4+0x80/0xa4 handle_rx_uart+0xa8/0x18c qcom_geni_serial_handle_rx+0x84/0x9c qcom_geni_serial_isr+0x24c/0x760 __handle_irq_event_percpu+0x108/0x500 handle_irq_event+0x6c/0x110 handle_fasteoi_irq+0x138/0x2cc generic_handle_domain_irq+0x48/0x64 If the RX FIFO depth changes after probe, be sure to resize the buffer. Fixes: f9d690b ("tty: serial: qcom_geni_serial: Allocate port->rx_fifo buffer in probe") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20221221164022.1087814-1-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b8caf69 upstream. Driver's probe allocates memory for RX FIFO (port->rx_fifo) based on default RX FIFO depth, e.g. 16. Later during serial startup the qcom_geni_serial_port_setup() updates the RX FIFO depth (port->rx_fifo_depth) to match real device capabilities, e.g. to 32. The RX UART handle code will read "port->rx_fifo_depth" number of words into "port->rx_fifo" buffer, thus exceeding the bounds. This can be observed in certain configurations with Qualcomm Bluetooth HCI UART device and KASAN: Bluetooth: hci0: QCA Product ID :0x00000010 Bluetooth: hci0: QCA SOC Version :0x400a0200 Bluetooth: hci0: QCA ROM Version :0x00000200 Bluetooth: hci0: QCA Patch Version:0x00000d2b Bluetooth: hci0: QCA controller version 0x02000200 Bluetooth: hci0: QCA Downloading qca/htbtfw20.tlv bluetooth hci0: Direct firmware load for qca/htbtfw20.tlv failed with error -2 Bluetooth: hci0: QCA Failed to request file: qca/htbtfw20.tlv (-2) Bluetooth: hci0: QCA Failed to download patch (-2) ================================================================== BUG: KASAN: slab-out-of-bounds in handle_rx_uart+0xa8/0x18c Write of size 4 at addr ffff279347d578c0 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rt5-00350-gb2450b7e00be-dirty linux-sunxi#26 Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) Call trace: dump_backtrace.part.0+0xe0/0xf0 show_stack+0x18/0x40 dump_stack_lvl+0x8c/0xb8 print_report+0x188/0x488 kasan_report+0xb4/0x100 __asan_store4+0x80/0xa4 handle_rx_uart+0xa8/0x18c qcom_geni_serial_handle_rx+0x84/0x9c qcom_geni_serial_isr+0x24c/0x760 __handle_irq_event_percpu+0x108/0x500 handle_irq_event+0x6c/0x110 handle_fasteoi_irq+0x138/0x2cc generic_handle_domain_irq+0x48/0x64 If the RX FIFO depth changes after probe, be sure to resize the buffer. Fixes: f9d690b ("tty: serial: qcom_geni_serial: Allocate port->rx_fifo buffer in probe") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20221221164022.1087814-1-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] linux-sunxi#7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] linux-sunxi#8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] linux-sunxi#9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] linux-sunxi#10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] linux-sunxi#11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] linux-sunxi#12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 linux-sunxi#13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] linux-sunxi#14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] linux-sunxi#15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 linux-sunxi#16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 linux-sunxi#17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 linux-sunxi#18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 linux-sunxi#19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 linux-sunxi#20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 linux-sunxi#21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 linux-sunxi#22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a linux-sunxi#23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 linux-sunxi#24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 linux-sunxi#25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f linux-sunxi#26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 linux-sunxi#27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] linux-sunxi#7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c linux-sunxi#8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 linux-sunxi#9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d linux-sunxi#10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

[ Upstream commit 8d21155 ] adjust_inuse_and_calc_cost() use spin_lock_irq() and IRQ will be enabled when unlock. DEADLOCK might happen if we have held other locks and disabled IRQ before invoking it. Fix it by using spin_lock_irqsave() instead, which can keep IRQ state consistent with before when unlock. ================================ WARNING: inconsistent lock state 5.10.0-02758-g8e5f91fd772f linux-sunxi#26 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. kworker/2:3/388 [HC0[0]:SC0[0]:HE0:SE1] takes: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390 {IN-HARDIRQ-W} state was registered at: __lock_acquire+0x3d7/0x1070 lock_acquire+0x197/0x4a0 __raw_spin_lock_irqsave _raw_spin_lock_irqsave+0x3b/0x60 bfq_idle_slice_timer_body bfq_idle_slice_timer+0x53/0x1d0 __run_hrtimer+0x477/0xa70 __hrtimer_run_queues+0x1c6/0x2d0 hrtimer_interrupt+0x302/0x9e0 local_apic_timer_interrupt __sysvec_apic_timer_interrupt+0xfd/0x420 run_sysvec_on_irqstack_cond sysvec_apic_timer_interrupt+0x46/0xa0 asm_sysvec_apic_timer_interrupt+0x12/0x20 irq event stamp: 837522 hardirqs last enabled at (837521): [<ffffffff84b9419d>] __raw_spin_unlock_irqrestore hardirqs last enabled at (837521): [<ffffffff84b9419d>] _raw_spin_unlock_irqrestore+0x3d/0x40 hardirqs last disabled at (837522): [<ffffffff84b93fa3>] __raw_spin_lock_irq hardirqs last disabled at (837522): [<ffffffff84b93fa3>] _raw_spin_lock_irq+0x43/0x50 softirqs last enabled at (835852): [<ffffffff84e00558>] __do_softirq+0x558/0x8ec softirqs last disabled at (835845): [<ffffffff84c010ff>] asm_call_irq_on_stack+0xf/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&bfqd->lock); <Interrupt> lock(&bfqd->lock); *** DEADLOCK *** 3 locks held by kworker/2:3/388: #0: ffff888107af0f38 ((wq_completion)kthrotld){+.+.}-{0:0}, at: process_one_work+0x742/0x13f0 #1: ffff8881176bfdd8 ((work_completion)(&td->dispatch_work)){+.+.}-{0:0}, at: process_one_work+0x777/0x13f0 linux-sunxi#2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq linux-sunxi#2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390 stack backtrace: CPU: 2 PID: 388 Comm: kworker/2:3 Not tainted 5.10.0-02758-g8e5f91fd772f linux-sunxi#26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: kthrotld blk_throtl_dispatch_work_fn Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x107/0x167 print_usage_bug valid_state mark_lock_irq.cold+0x32/0x3a mark_lock+0x693/0xbc0 mark_held_locks+0x9e/0xe0 __trace_hardirqs_on_caller lockdep_hardirqs_on_prepare.part.0+0x151/0x360 trace_hardirqs_on+0x5b/0x180 __raw_spin_unlock_irq _raw_spin_unlock_irq+0x24/0x40 spin_unlock_irq adjust_inuse_and_calc_cost+0x4fb/0x970 ioc_rqos_merge+0x277/0x740 __rq_qos_merge+0x62/0xb0 rq_qos_merge bio_attempt_back_merge+0x12c/0x4a0 blk_mq_sched_try_merge+0x1b6/0x4d0 bfq_bio_merge+0x24a/0x390 __blk_mq_sched_bio_merge+0xa6/0x460 blk_mq_sched_bio_merge blk_mq_submit_bio+0x2e7/0x1ee0 __submit_bio_noacct_mq+0x175/0x3b0 submit_bio_noacct+0x1fb/0x270 blk_throtl_dispatch_work_fn+0x1ef/0x2b0 process_one_work+0x83e/0x13f0 process_scheduled_works worker_thread+0x7e3/0xd80 kthread+0x353/0x470 ret_from_fork+0x1f/0x30 Fixes: b0853ab ("blk-iocost: revamp in-period donation snapbacks") Signed-off-by: Li Nan <linan122@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230527091904.3001833-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>

The following processes run into a deadlock. CPU 41 was waiting for CPU 29 to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29 was hung by that spinlock with IRQs disabled. PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd" !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0 linux-sunxi#10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0 linux-sunxi#11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0 linux-sunxi#12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0 linux-sunxi#13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0 linux-sunxi#14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0 linux-sunxi#15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0 linux-sunxi#16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0 linux-sunxi#17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0 linux-sunxi#18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0 linux-sunxi#19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 linux-sunxi#20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 linux-sunxi#21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 linux-sunxi#22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 linux-sunxi#23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 linux-sunxi#24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 linux-sunxi#25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 linux-sunxi#26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 linux-sunxi#27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd" !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 linux-sunxi#10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 linux-sunxi#11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 linux-sunxi#12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 linux-sunxi#13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 linux-sunxi#14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 linux-sunxi#15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 linux-sunxi#16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 linux-sunxi#17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 The lock is used to synchronize different sysfs operations, it doesn't protect any resource that will be touched by an interrupt. Consequently it's not required to disable IRQs. Replace the spinlock with a mutex to fix the deadlock. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

[ Upstream commit 37c3b9f ] The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 jwrdegoede#1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 jwrdegoede#2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 jwrdegoede#3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 jwrdegoede#4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb jwrdegoede#5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] jwrdegoede#6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] linux-sunxi#7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] linux-sunxi#8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] linux-sunxi#9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] linux-sunxi#10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] linux-sunxi#11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] linux-sunxi#12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 linux-sunxi#13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] linux-sunxi#14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] linux-sunxi#15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 linux-sunxi#16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 linux-sunxi#17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 linux-sunxi#18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 linux-sunxi#19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 linux-sunxi#20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 linux-sunxi#21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 linux-sunxi#22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a linux-sunxi#23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 linux-sunxi#24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 linux-sunxi#25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f linux-sunxi#26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 linux-sunxi#27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 jwrdegoede#1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 jwrdegoede#2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 jwrdegoede#3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b jwrdegoede#4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] jwrdegoede#5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] jwrdegoede#6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] linux-sunxi#7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c linux-sunxi#8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 linux-sunxi#9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d linux-sunxi#10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>

amery mentioned this issue Sep 17, 2012

goodix_touch not work #78

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Goodix GT801NI_3R15_1AV ("2 in 1"?) touchscreen doesn't work #26

Goodix GT801NI_3R15_1AV ("2 in 1"?) touchscreen doesn't work #26

turl commented May 23, 2012

amery commented May 23, 2012

turl commented May 23, 2012

turl commented May 26, 2012

frantisheq commented May 26, 2012

turl commented May 26, 2012

simonckenyon commented Jul 11, 2012

amery commented Jul 11, 2012

simonckenyon commented Jul 11, 2012

amery commented Jul 11, 2012

turl commented Jul 11, 2012

amery commented Jul 11, 2012

simonckenyon commented Jul 11, 2012

simonckenyon commented Jul 11, 2012

johnsonc commented Sep 26, 2012

johnsonc commented Sep 26, 2012

wingrime commented Oct 14, 2012

johnsonc commented Oct 15, 2012

techn commented Dec 14, 2012

Goodix GT801NI_3R15_1AV ("2 in 1"?) touchscreen doesn't work #26

Goodix GT801NI_3R15_1AV ("2 in 1"?) touchscreen doesn't work #26

Comments

turl commented May 23, 2012

amery commented May 23, 2012

turl commented May 23, 2012

turl commented May 26, 2012

frantisheq commented May 26, 2012

turl commented May 26, 2012

simonckenyon commented Jul 11, 2012

amery commented Jul 11, 2012

simonckenyon commented Jul 11, 2012

thanks for all your help

amery commented Jul 11, 2012

turl commented Jul 11, 2012

amery commented Jul 11, 2012

simonckenyon commented Jul 11, 2012

Sorry for the rant but it has been a tough day.

simonckenyon commented Jul 11, 2012

johnsonc commented Sep 26, 2012

johnsonc commented Sep 26, 2012

wingrime commented Oct 14, 2012

johnsonc commented Oct 15, 2012

techn commented Dec 14, 2012