forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 82
optee: avoid delayed release of shm #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This was referenced May 31, 2016
Well done Jens 👍 |
Avoids delayed release of a shared memory object allocated by supplicant. Tested-by: Jerome Forissier <jerome.forissier@linaro.org> (HiKey) Tested-by: Jens Wiklander <jens.wiklander@linaro.org> (QEMU) Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
5ece447
to
2faeddf
Compare
Tag applied |
showliu
pushed a commit
to showliu/linux
that referenced
this pull request
Jun 16, 2016
commit ec183d2 upstream. Fixes segmentation fault using, for instance: (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 (gdb) bt #0 0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 linaro-swg#1 0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:433 linaro-swg#2 0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:498 linaro-swg#3 0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:936 linaro-swg#4 0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391 linaro-swg#5 0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361 linaro-swg#6 0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401 linaro-swg#7 0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253 linaro-swg#8 0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364 linaro-swg#9 0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664 linaro-swg#10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539 linaro-swg#11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264 linaro-swg#12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390 linaro-swg#13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451 linaro-swg#14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495 linaro-swg#15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618 (gdb) Intel PT attempts to find the sched:sched_switch tracepoint but that seg faults if tracefs is not readable, because the error reporting structure is null, as errors are not reported when automatically adding tracepoints. Fix by checking before using. Committer note: This doesn't take place in a kernel that supports perf_event_attr.context_switch, that is the default way that will be used for tracking context switches, only in older kernels, like 4.2, in a machine with Intel PT (e.g. Broadwell) for non-priviledged users. Further info from a similar patch by Wang: The error is in tracepoint_error: it assumes the 'e' parameter is valid. However, there are many situation a parse_event() can be called without parse_events_error. See result of $ grep 'parse_events(.*NULL)' ./tools/perf/ -r' Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Tong Zhang <ztong@vt.edu> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output") Link: http://lkml.kernel.org/r/1453809921-24596-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vchong
pushed a commit
that referenced
this pull request
Feb 24, 2017
commit 4dfce57 upstream. There have been several reports over the years of NULL pointer dereferences in xfs_trans_log_inode during xfs_fsr processes, when the process is doing an fput and tearing down extents on the temporary inode, something like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 PID: 29439 TASK: ffff880550584fa0 CPU: 6 COMMAND: "xfs_fsr" [exception RIP: xfs_trans_log_inode+0x10] #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs] #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs] #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs] #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs] #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs] #14 [ffff8800a57bbe00] evict at ffffffff811e1b67 #15 [ffff8800a57bbe28] iput at ffffffff811e23a5 #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8 #17 [ffff8800a57bbe88] dput at ffffffff811dd06c #18 [ffff8800a57bbea8] __fput at ffffffff811c823b #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27 #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d As it turns out, this is because the i_itemp pointer, along with the d_ops pointer, has been overwritten with zeros when we tear down the extents during truncate. When the in-core inode fork on the temporary inode used by xfs_fsr was originally set up during the extent swap, we mistakenly looked at di_nextents to determine whether all extents fit inline, but this misses extents generated by speculative preallocation; we should be using if_bytes instead. This mistake corrupts the in-memory inode, and code in xfs_iext_remove_inline eventually gets bad inputs, causing it to memmove and memset incorrect ranges; this became apparent because the two values in ifp->if_u2.if_inline_ext[1] contained what should have been in d_ops and i_itemp; they were memmoved due to incorrect array indexing and then the original locations were zeroed with memset, again due to an array overrun. Fix this by properly using i_df.if_bytes to determine the number of extents, not di_nextents. Thanks to dchinner for looking at this with me and spotting the root cause. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vchong
pushed a commit
that referenced
this pull request
Oct 4, 2018
syzbot hit the following crash on upstream commit 83beed7 (Fri Apr 20 17:56:32 2018 +0000) Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=d154ec99402c6f628887 C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5414336294027264 syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5471683234234368 Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5436660795834368 Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118 compiler: gcc (GCC) 8.0.1 20180413 (experimental) IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+d154ec99402c6f628887@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0) F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock F2FS-fs (loop0): invalid crc value ------------[ cut here ]------------ kernel BUG at fs/f2fs/node.c:1185! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4549 Comm: syzkaller704305 Not tainted 4.17.0-rc1+ #10 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185 RSP: 0018:ffff8801d960e820 EFLAGS: 00010293 RAX: ffff8801d88205c0 RBX: 0000000000000003 RCX: ffffffff82f6cc06 RDX: 0000000000000000 RSI: ffffffff82f6d5e8 RDI: 0000000000000004 RBP: ffff8801d960ec30 R08: ffff8801d88205c0 R09: ffffed003b5e46c2 R10: 0000000000000003 R11: 0000000000000003 R12: ffff8801a86e00c0 R13: 0000000000000001 R14: ffff8801a86e0530 R15: ffff8801d9745240 FS: 000000000072c880(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3d403209b8 CR3: 00000001d8f3f000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: get_node_page fs/f2fs/node.c:1237 [inline] truncate_xattr_node+0x152/0x2e0 fs/f2fs/node.c:1014 remove_inode_page+0x200/0xaf0 fs/f2fs/node.c:1039 f2fs_evict_inode+0xe86/0x1710 fs/f2fs/inode.c:547 evict+0x4a6/0x960 fs/inode.c:557 iput_final fs/inode.c:1519 [inline] iput+0x62d/0xa80 fs/inode.c:1545 f2fs_fill_super+0x5f4e/0x7bf0 fs/f2fs/super.c:2849 mount_bdev+0x30c/0x3e0 fs/super.c:1164 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020 mount_fs+0xae/0x328 fs/super.c:1267 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037 vfs_kern_mount fs/namespace.c:1027 [inline] do_new_mount fs/namespace.c:2518 [inline] do_mount+0x564/0x3070 fs/namespace.c:2848 ksys_mount+0x12d/0x140 fs/namespace.c:3064 __do_sys_mount fs/namespace.c:3078 [inline] __se_sys_mount fs/namespace.c:3075 [inline] __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x443dea RSP: 002b:00007ffcc7882368 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443dea RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffcc7882370 RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004 R13: 0000000000402ce0 R14: 0000000000000000 R15: 0000000000000000 RIP: __get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185 RSP: ffff8801d960e820 ---[ end trace 4edbeb71f002bb76 ]--- Reported-and-tested-by: syzbot+d154ec99402c6f628887@syzkaller.appspotmail.com Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
vchong
pushed a commit
that referenced
this pull request
Oct 4, 2018
This patch enhances sanity check for SIT entries. syzbot hit the following crash on upstream commit 83beed7 (Fri Apr 20 17:56:32 2018 +0000) Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=bf9253040425feb155ad syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5692130282438656 Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5095924598571008 Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118 compiler: gcc (GCC) 8.0.1 20180413 (experimental) IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. F2FS-fs (loop0): invalid crc value F2FS-fs (loop0): Try to recover 1th superblock, ret: 0 F2FS-fs (loop0): Mounted with checkpoint version = d F2FS-fs (loop0): Bitmap was wrongly cleared, blk:9740 ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:1884! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4508 Comm: syz-executor0 Not tainted 4.17.0-rc1+ #10 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882 RSP: 0018:ffff8801af526708 EFLAGS: 00010282 RAX: ffffed0035ea4cc0 RBX: ffff8801ad454f90 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff82eeb87e RDI: ffffed0035ea4cb6 RBP: ffff8801af526760 R08: ffff8801ad4a2480 R09: ffffed003b5e4f90 R10: ffffed003b5e4f90 R11: ffff8801daf27c87 R12: ffff8801adb8d380 R13: 0000000000000001 R14: 0000000000000008 R15: 00000000ffffffff FS: 00000000014af940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f06bc223000 CR3: 00000001adb02000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: allocate_data_block+0x66f/0x2050 fs/f2fs/segment.c:2663 do_write_page+0x105/0x1b0 fs/f2fs/segment.c:2727 write_node_page+0x129/0x350 fs/f2fs/segment.c:2770 __write_node_page+0x7da/0x1370 fs/f2fs/node.c:1398 sync_node_pages+0x18cf/0x1eb0 fs/f2fs/node.c:1652 block_operations+0x429/0xa60 fs/f2fs/checkpoint.c:1088 write_checkpoint+0x3ba/0x5380 fs/f2fs/checkpoint.c:1405 f2fs_sync_fs+0x2fb/0x6a0 fs/f2fs/super.c:1077 __sync_filesystem fs/sync.c:39 [inline] sync_filesystem+0x265/0x310 fs/sync.c:67 generic_shutdown_super+0xd7/0x520 fs/super.c:429 kill_block_super+0xa4/0x100 fs/super.c:1191 kill_f2fs_super+0x9f/0xd0 fs/f2fs/super.c:3030 deactivate_locked_super+0x97/0x100 fs/super.c:316 deactivate_super+0x188/0x1b0 fs/super.c:347 cleanup_mnt+0xbf/0x160 fs/namespace.c:1174 __cleanup_mnt+0x16/0x20 fs/namespace.c:1181 task_work_run+0x1e4/0x290 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:191 [inline] exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:265 [inline] do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457d97 RSP: 002b:00007ffd46f9c8e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000457d97 RDX: 00000000014b09a3 RSI: 0000000000000002 RDI: 00007ffd46f9da50 RBP: 00007ffd46f9da50 R08: 0000000000000000 R09: 0000000000000009 R10: 0000000000000005 R11: 0000000000000246 R12: 00000000014b0940 R13: 0000000000000000 R14: 0000000000000002 R15: 000000000000658e RIP: update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882 RSP: ffff8801af526708 ---[ end trace f498328bb02610a2 ]--- Reported-and-tested-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+7d6d31d3bc702f566ce3@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+0a725420475916460f12@syzkaller.appspotmail.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
vchong
pushed a commit
that referenced
this pull request
Oct 5, 2018
commit 8a29c12 upstream. This patch enhances sanity check for SIT entries. syzbot hit the following crash on upstream commit 83beed7 (Fri Apr 20 17:56:32 2018 +0000) Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=bf9253040425feb155ad syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5692130282438656 Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5095924598571008 Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118 compiler: gcc (GCC) 8.0.1 20180413 (experimental) IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer. F2FS-fs (loop0): invalid crc value F2FS-fs (loop0): Try to recover 1th superblock, ret: 0 F2FS-fs (loop0): Mounted with checkpoint version = d F2FS-fs (loop0): Bitmap was wrongly cleared, blk:9740 ------------[ cut here ]------------ kernel BUG at fs/f2fs/segment.c:1884! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4508 Comm: syz-executor0 Not tainted 4.17.0-rc1+ #10 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882 RSP: 0018:ffff8801af526708 EFLAGS: 00010282 RAX: ffffed0035ea4cc0 RBX: ffff8801ad454f90 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff82eeb87e RDI: ffffed0035ea4cb6 RBP: ffff8801af526760 R08: ffff8801ad4a2480 R09: ffffed003b5e4f90 R10: ffffed003b5e4f90 R11: ffff8801daf27c87 R12: ffff8801adb8d380 R13: 0000000000000001 R14: 0000000000000008 R15: 00000000ffffffff FS: 00000000014af940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f06bc223000 CR3: 00000001adb02000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: allocate_data_block+0x66f/0x2050 fs/f2fs/segment.c:2663 do_write_page+0x105/0x1b0 fs/f2fs/segment.c:2727 write_node_page+0x129/0x350 fs/f2fs/segment.c:2770 __write_node_page+0x7da/0x1370 fs/f2fs/node.c:1398 sync_node_pages+0x18cf/0x1eb0 fs/f2fs/node.c:1652 block_operations+0x429/0xa60 fs/f2fs/checkpoint.c:1088 write_checkpoint+0x3ba/0x5380 fs/f2fs/checkpoint.c:1405 f2fs_sync_fs+0x2fb/0x6a0 fs/f2fs/super.c:1077 __sync_filesystem fs/sync.c:39 [inline] sync_filesystem+0x265/0x310 fs/sync.c:67 generic_shutdown_super+0xd7/0x520 fs/super.c:429 kill_block_super+0xa4/0x100 fs/super.c:1191 kill_f2fs_super+0x9f/0xd0 fs/f2fs/super.c:3030 deactivate_locked_super+0x97/0x100 fs/super.c:316 deactivate_super+0x188/0x1b0 fs/super.c:347 cleanup_mnt+0xbf/0x160 fs/namespace.c:1174 __cleanup_mnt+0x16/0x20 fs/namespace.c:1181 task_work_run+0x1e4/0x290 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:191 [inline] exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:265 [inline] do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457d97 RSP: 002b:00007ffd46f9c8e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000457d97 RDX: 00000000014b09a3 RSI: 0000000000000002 RDI: 00007ffd46f9da50 RBP: 00007ffd46f9da50 R08: 0000000000000000 R09: 0000000000000009 R10: 0000000000000005 R11: 0000000000000246 R12: 00000000014b0940 R13: 0000000000000000 R14: 0000000000000002 R15: 000000000000658e RIP: update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882 RSP: ffff8801af526708 ---[ end trace f498328bb02610a2 ]--- Reported-and-tested-by: syzbot+bf9253040425feb155ad@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+7d6d31d3bc702f566ce3@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+0a725420475916460f12@syzkaller.appspotmail.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vchong
pushed a commit
that referenced
this pull request
Oct 5, 2018
commit a5ba1d9 upstream. We have reports of the following crash: PID: 7 TASK: ffff88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0" #0 [ffff88085c6db710] machine_kexec at ffffffff81046239 #1 [ffff88085c6db760] crash_kexec at ffffffff810fc248 #2 [ffff88085c6db830] oops_end at ffffffff81008ae7 #3 [ffff88085c6db860] no_context at ffffffff81050b8f #4 [ffff88085c6db8b0] __bad_area_nosemaphore at ffffffff81050d75 #5 [ffff88085c6db900] bad_area_nosemaphore at ffffffff81050e83 #6 [ffff88085c6db910] __do_page_fault at ffffffff8105132e #7 [ffff88085c6db9b0] do_page_fault at ffffffff8105152c #8 [ffff88085c6db9c0] page_fault at ffffffff81a3f122 [exception RIP: uart_put_char+149] RIP: ffffffff814b67b5 RSP: ffff88085c6dba78 RFLAGS: 00010006 RAX: 0000000000000292 RBX: ffffffff827c5120 RCX: 0000000000000081 RDX: 0000000000000000 RSI: 000000000000005f RDI: ffffffff827c5120 RBP: ffff88085c6dba98 R8: 000000000000012c R9: ffffffff822ea320 R10: ffff88085fe4db04 R11: 0000000000000001 R12: ffff881059f9c000 R13: 0000000000000001 R14: 000000000000005f R15: 0000000000000fba ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff88085c6dbaa0] tty_put_char at ffffffff81497544 #10 [ffff88085c6dbac0] do_output_char at ffffffff8149c91c #11 [ffff88085c6dbae0] __process_echoes at ffffffff8149cb8b #12 [ffff88085c6dbb30] commit_echoes at ffffffff8149cdc2 #13 [ffff88085c6dbb60] n_tty_receive_buf_fast at ffffffff8149e49b #14 [ffff88085c6dbbc0] __receive_buf at ffffffff8149ef5a #15 [ffff88085c6dbc20] n_tty_receive_buf_common at ffffffff8149f016 #16 [ffff88085c6dbca0] n_tty_receive_buf2 at ffffffff8149f194 #17 [ffff88085c6dbcb0] flush_to_ldisc at ffffffff814a238a #18 [ffff88085c6dbd50] process_one_work at ffffffff81090be2 #19 [ffff88085c6dbe20] worker_thread at ffffffff81091b4d #20 [ffff88085c6dbeb0] kthread at ffffffff81096384 #21 [ffff88085c6dbf50] ret_from_fork at ffffffff81a3d69f after slogging through some dissasembly: ffffffff814b6720 <uart_put_char>: ffffffff814b6720: 55 push %rbp ffffffff814b6721: 48 89 e5 mov %rsp,%rbp ffffffff814b6724: 48 83 ec 20 sub $0x20,%rsp ffffffff814b6728: 48 89 1c 24 mov %rbx,(%rsp) ffffffff814b672c: 4c 89 64 24 08 mov %r12,0x8(%rsp) ffffffff814b6731: 4c 89 6c 24 10 mov %r13,0x10(%rsp) ffffffff814b6736: 4c 89 74 24 18 mov %r14,0x18(%rsp) ffffffff814b673b: e8 b0 8e 58 00 callq ffffffff81a3f5f0 <mcount> ffffffff814b6740: 4c 8b a7 88 02 00 00 mov 0x288(%rdi),%r12 ffffffff814b6747: 45 31 ed xor %r13d,%r13d ffffffff814b674a: 41 89 f6 mov %esi,%r14d ffffffff814b674d: 49 83 bc 24 70 01 00 cmpq $0x0,0x170(%r12) ffffffff814b6754: 00 00 ffffffff814b6756: 49 8b 9c 24 80 01 00 mov 0x180(%r12),%rbx ffffffff814b675d: 00 ffffffff814b675e: 74 2f je ffffffff814b678f <uart_put_char+0x6f> ffffffff814b6760: 48 89 df mov %rbx,%rdi ffffffff814b6763: e8 a8 67 58 00 callq ffffffff81a3cf10 <_raw_spin_lock_irqsave> ffffffff814b6768: 41 8b 8c 24 78 01 00 mov 0x178(%r12),%ecx ffffffff814b676f: 00 ffffffff814b6770: 89 ca mov %ecx,%edx ffffffff814b6772: f7 d2 not %edx ffffffff814b6774: 41 03 94 24 7c 01 00 add 0x17c(%r12),%edx ffffffff814b677b: 00 ffffffff814b677c: 81 e2 ff 0f 00 00 and $0xfff,%edx ffffffff814b6782: 75 23 jne ffffffff814b67a7 <uart_put_char+0x87> ffffffff814b6784: 48 89 c6 mov %rax,%rsi ffffffff814b6787: 48 89 df mov %rbx,%rdi ffffffff814b678a: e8 e1 64 58 00 callq ffffffff81a3cc70 <_raw_spin_unlock_irqrestore> ffffffff814b678f: 44 89 e8 mov %r13d,%eax ffffffff814b6792: 48 8b 1c 24 mov (%rsp),%rbx ffffffff814b6796: 4c 8b 64 24 08 mov 0x8(%rsp),%r12 ffffffff814b679b: 4c 8b 6c 24 10 mov 0x10(%rsp),%r13 ffffffff814b67a0: 4c 8b 74 24 18 mov 0x18(%rsp),%r14 ffffffff814b67a5: c9 leaveq ffffffff814b67a6: c3 retq ffffffff814b67a7: 49 8b 94 24 70 01 00 mov 0x170(%r12),%rdx ffffffff814b67ae: 00 ffffffff814b67af: 48 63 c9 movslq %ecx,%rcx ffffffff814b67b2: 41 b5 01 mov $0x1,%r13b ffffffff814b67b5: 44 88 34 0a mov %r14b,(%rdx,%rcx,1) ffffffff814b67b9: 41 8b 94 24 78 01 00 mov 0x178(%r12),%edx ffffffff814b67c0: 00 ffffffff814b67c1: 83 c2 01 add $0x1,%edx ffffffff814b67c4: 81 e2 ff 0f 00 00 and $0xfff,%edx ffffffff814b67ca: 41 89 94 24 78 01 00 mov %edx,0x178(%r12) ffffffff814b67d1: 00 ffffffff814b67d2: eb b0 jmp ffffffff814b6784 <uart_put_char+0x64> ffffffff814b67d4: 66 66 66 2e 0f 1f 84 data32 data32 nopw %cs:0x0(%rax,%rax,1) ffffffff814b67db: 00 00 00 00 00 for our build, this is crashing at: circ->buf[circ->head] = c; Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf) protected by the "per-port mutex", which based on uart_port_check() is state->port.mutex. Indeed, the lock acquired in uart_put_char() is uport->lock, i.e. not the same lock. Anyway, since the lock is not acquired, if uart_shutdown() is called, the last chunk of that function may release state->xmit.buf before its assigned to null, and cause the race above. To fix it, let's lock uport->lock when allocating/deallocating state->xmit.buf in addition to the per-port mutex. v2: switch to locking uport->lock on allocation/deallocation instead of locking the per-port mutex in uart_put_char. Note that since uport->lock is a spin lock, we have to switch the allocation to GFP_ATOMIC. v3: move the allocation outside the lock, so we can switch back to GFP_KERNEL Signed-off-by: Tycho Andersen <tycho@tycho.ws> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vchong
pushed a commit
that referenced
this pull request
Oct 5, 2018
[ Upstream commit 46b3722 ] We occasionaly hit following assert failure in 'perf top', when processing the /proc info in multiple threads. perf: ...include/linux/refcount.h:109: refcount_inc: Assertion `!(!refcount_inc_not_zero(r))' failed. The gdb backtrace looks like this: [Switching to Thread 0x7ffff11ba700 (LWP 13749)] 0x00007ffff50839fb in raise () from /lib64/libc.so.6 (gdb) #0 0x00007ffff50839fb in raise () from /lib64/libc.so.6 #1 0x00007ffff5085800 in abort () from /lib64/libc.so.6 #2 0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6 #4 0x0000000000535373 in refcount_inc (r=0x7fffdc009be0) at ...include/linux/refcount.h:109 #5 0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0) at util/comm.c:24 #6 0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2", root=0xbed5c0 <comm_str_root>) at util/comm.c:72 #7 0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2", root=0xbed5c0 <comm_str_root>) at util/comm.c:95 #8 0x000000000053582e in comm__new (str=0x7fffd000b260 ":2", timestamp=0, exec=false) at util/comm.c:111 #9 0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57 #10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38, threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457 #11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38, ... The failing assertion is this one: REFCOUNT_WARN(!refcount_inc_not_zero(r), ... The problem is that we keep global comm_str_root list, which is accessed by multiple threads during the 'perf top' startup and following 2 paths can race: thread 1: ... thread__new comm__new comm_str__findnew down_write(&comm_str_lock); __comm_str__findnew comm_str__get thread 2: ... comm__override or comm__free comm_str__put refcount_dec_and_test down_write(&comm_str_lock); rb_erase(&cs->rb_node, &comm_str_root); Because thread 2 first decrements the refcnt and only after then it removes the struct comm_str from the list, the thread 1 can find this object on the list with refcnt equls to 0 and hit the assert. This patch fixes the thread 1 __comm_str__findnew path, by ignoring objects that already dropped the refcnt to 0. For the rest of the objects we take the refcnt before comparing its name and release it afterwards with comm_str__put, which can also release the object completely. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
odeprez
pushed a commit
to odeprez/linux-swg
that referenced
this pull request
Oct 3, 2023
Fix an error detected by memory sanitizer: ``` ==4033==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55fb0fbedfc7 in read_alias_info tools/perf/util/pmu.c:457:6 linaro-swg#1 0x55fb0fbea339 in check_info_data tools/perf/util/pmu.c:1434:2 linaro-swg#2 0x55fb0fbea339 in perf_pmu__check_alias tools/perf/util/pmu.c:1504:9 linaro-swg#3 0x55fb0fbdca85 in parse_events_add_pmu tools/perf/util/parse-events.c:1429:32 linaro-swg#4 0x55fb0f965230 in parse_events_parse tools/perf/util/parse-events.y:299:6 linaro-swg#5 0x55fb0fbdf6b2 in parse_events__scanner tools/perf/util/parse-events.c:1822:8 linaro-swg#6 0x55fb0fbdf8c1 in __parse_events tools/perf/util/parse-events.c:2094:8 linaro-swg#7 0x55fb0fa8ffa9 in parse_events tools/perf/util/parse-events.h:41:9 linaro-swg#8 0x55fb0fa8ffa9 in test_event tools/perf/tests/parse-events.c:2393:8 linaro-swg#9 0x55fb0fa8f458 in test__pmu_events tools/perf/tests/parse-events.c:2551:15 linaro-swg#10 0x55fb0fa6d93f in run_test tools/perf/tests/builtin-test.c:242:9 linaro-swg#11 0x55fb0fa6d93f in test_and_print tools/perf/tests/builtin-test.c:271:8 linaro-swg#12 0x55fb0fa6d082 in __cmd_test tools/perf/tests/builtin-test.c:442:5 linaro-swg#13 0x55fb0fa6d082 in cmd_test tools/perf/tests/builtin-test.c:564:9 linaro-swg#14 0x55fb0f942720 in run_builtin tools/perf/perf.c:322:11 linaro-swg#15 0x55fb0f942486 in handle_internal_command tools/perf/perf.c:375:8 linaro-swg#16 0x55fb0f941dab in run_argv tools/perf/perf.c:419:2 linaro-swg#17 0x55fb0f941dab in main tools/perf/perf.c:535:3 ``` Fixes: 7b723db ("perf pmu: Be lazy about loading event info files from sysfs") Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230914022425.1489035-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
odeprez
pushed a commit
to odeprez/linux-swg
that referenced
this pull request
Oct 9, 2023
The following call trace shows a deadlock issue due to recursive locking of mutex "device_mutex". First lock acquire is in target_for_each_device() and second in target_free_device(). PID: 148266 TASK: ffff8be21ffb5d00 CPU: 10 COMMAND: "iscsi_ttx" #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f linaro-swg#1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224 linaro-swg#2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee linaro-swg#3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7 linaro-swg#4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3 linaro-swg#5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c linaro-swg#6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod] linaro-swg#7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod] linaro-swg#8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f linaro-swg#9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583 linaro-swg#10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod] linaro-swg#11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc linaro-swg#12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod] linaro-swg#13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod] linaro-swg#14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod] linaro-swg#15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod] linaro-swg#16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07 linaro-swg#17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod] linaro-swg#18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod] linaro-swg#19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080 linaro-swg#20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364 Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
jenswi-linaro
pushed a commit
to jenswi-linaro/linux-1
that referenced
this pull request
Apr 28, 2025
syzkaller reported a warning in __sk_skb_reason_drop(). Commit 61b95c7 ("net: ip: make ip_route_input_rcu() return drop reasons") missed a path where -EINVAL is returned. Then, the cited commit started to trigger the warning with the invalid error. Let's fix it by returning SKB_DROP_REASON_NOT_SPECIFIED. [0]: WARNING: CPU: 0 PID: 10 at net/core/skbuff.c:1216 __sk_skb_reason_drop net/core/skbuff.c:1216 [inline] WARNING: CPU: 0 PID: 10 at net/core/skbuff.c:1216 sk_skb_reason_drop+0x97/0x1b0 net/core/skbuff.c:1241 Modules linked in: CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.12.0-10686-gbb18265c3aba linaro-swg#10 1c308307628619808b5a4a0495c4aab5637b0551 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker RIP: 0010:__sk_skb_reason_drop net/core/skbuff.c:1216 [inline] RIP: 0010:sk_skb_reason_drop+0x97/0x1b0 net/core/skbuff.c:1241 Code: 5d 41 5c 41 5d 41 5e e9 e7 9e 95 fd e8 e2 9e 95 fd 31 ff 44 89 e6 e8 58 a1 95 fd 45 85 e4 0f 85 a2 00 00 00 e8 ca 9e 95 fd 90 <0f> 0b 90 e8 c1 9e 95 fd 44 89 e6 bf 01 00 00 00 e8 34 a1 95 fd 41 RSP: 0018:ffa0000000007650 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 000000000000ffff RCX: ffffffff83bc3592 RDX: ff110001002a0000 RSI: ffffffff83bc34d6 RDI: 0000000000000007 RBP: ff11000109ee85f0 R08: 0000000000000001 R09: ffe21c00213dd0da R10: 000000000000ffff R11: 0000000000000000 R12: 00000000ffffffea R13: 0000000000000000 R14: ff11000109ee86d4 R15: ff11000109ee8648 FS: 0000000000000000(0000) GS:ff1100011a000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020177000 CR3: 0000000108a3d006 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000600 PKRU: 55555554 Call Trace: <IRQ> kfree_skb_reason include/linux/skbuff.h:1263 [inline] ip_rcv_finish_core.constprop.0+0x896/0x2320 net/ipv4/ip_input.c:424 ip_list_rcv_finish.constprop.0+0x1b2/0x710 net/ipv4/ip_input.c:610 ip_sublist_rcv net/ipv4/ip_input.c:636 [inline] ip_list_rcv+0x34a/0x460 net/ipv4/ip_input.c:670 __netif_receive_skb_list_ptype net/core/dev.c:5715 [inline] __netif_receive_skb_list_core+0x536/0x900 net/core/dev.c:5762 __netif_receive_skb_list net/core/dev.c:5814 [inline] netif_receive_skb_list_internal+0x77c/0xdc0 net/core/dev.c:5905 gro_normal_list include/net/gro.h:515 [inline] gro_normal_list include/net/gro.h:511 [inline] napi_complete_done+0x219/0x8c0 net/core/dev.c:6256 wg_packet_rx_poll+0xbff/0x1e40 drivers/net/wireguard/receive.c:488 __napi_poll.constprop.0+0xb3/0x530 net/core/dev.c:6877 napi_poll net/core/dev.c:6946 [inline] net_rx_action+0x9eb/0xe30 net/core/dev.c:7068 handle_softirqs+0x1ac/0x740 kernel/softirq.c:554 do_softirq kernel/softirq.c:455 [inline] do_softirq+0x48/0x80 kernel/softirq.c:442 </IRQ> <TASK> __local_bh_enable_ip+0xed/0x110 kernel/softirq.c:382 spin_unlock_bh include/linux/spinlock.h:396 [inline] ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline] wg_packet_decrypt_worker+0x3ba/0x580 drivers/net/wireguard/receive.c:499 process_one_work+0x940/0x1a70 kernel/workqueue.c:3229 process_scheduled_works kernel/workqueue.c:3310 [inline] worker_thread+0x639/0xe30 kernel/workqueue.c:3391 kthread+0x283/0x350 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244 </TASK> Fixes: 82d9983 ("net: ip: make ip_route_input_noref() return drop reasons") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241206020715.80207-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
jenswi-linaro
pushed a commit
to jenswi-linaro/linux-1
that referenced
this pull request
Aug 27, 2025
Both jbd2_log_do_checkpoint() and jbd2_journal_shrink_checkpoint_list() periodically release j_list_lock after processing a batch of buffers to avoid long hold times on the j_list_lock. However, since both functions contend for j_list_lock, the combined time spent waiting and processing can be significant. jbd2_journal_shrink_checkpoint_list() explicitly calls cond_resched() when need_resched() is true to avoid softlockups during prolonged operations. But jbd2_log_do_checkpoint() only exits its loop when need_resched() is true, relying on potentially sleeping functions like __flush_batch() or wait_on_buffer() to trigger rescheduling. If those functions do not sleep, the kernel may hit a softlockup. watchdog: BUG: soft lockup - CPU#3 stuck for 156s! [kworker/u129:2:373] CPU: 3 PID: 373 Comm: kworker/u129:2 Kdump: loaded Not tainted 6.6.0+ linaro-swg#10 Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.27 06/13/2017 Workqueue: writeback wb_workfn (flush-7:2) pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : native_queued_spin_lock_slowpath+0x358/0x418 lr : jbd2_log_do_checkpoint+0x31c/0x438 [jbd2] Call trace: native_queued_spin_lock_slowpath+0x358/0x418 jbd2_log_do_checkpoint+0x31c/0x438 [jbd2] __jbd2_log_wait_for_space+0xfc/0x2f8 [jbd2] add_transaction_credits+0x3bc/0x418 [jbd2] start_this_handle+0xf8/0x560 [jbd2] jbd2__journal_start+0x118/0x228 [jbd2] __ext4_journal_start_sb+0x110/0x188 [ext4] ext4_do_writepages+0x3dc/0x740 [ext4] ext4_writepages+0xa4/0x190 [ext4] do_writepages+0x94/0x228 __writeback_single_inode+0x48/0x318 writeback_sb_inodes+0x204/0x590 __writeback_inodes_wb+0x54/0xf8 wb_writeback+0x2cc/0x3d8 wb_do_writeback+0x2e0/0x2f8 wb_workfn+0x80/0x2a8 process_one_work+0x178/0x3e8 worker_thread+0x234/0x3b8 kthread+0xf0/0x108 ret_from_fork+0x10/0x20 So explicitly call cond_resched() in jbd2_log_do_checkpoint() to avoid softlockup. Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250812063752.912130-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
jenswi-linaro
pushed a commit
to jenswi-linaro/linux-1
that referenced
this pull request
Aug 27, 2025
…dlock When a user creates a dualpi2 qdisc it automatically sets a timer. This timer will run constantly and update the qdisc's probability field. The issue is that the timer acquires the qdisc root lock and runs in hardirq. The qdisc root lock is also acquired in dev.c whenever a packet arrives for this qdisc. Since the dualpi2 timer callback runs in hardirq, it may interrupt the packet processing running in softirq. If that happens and it runs on the same CPU, it will acquire the same lock and cause a deadlock. The following splat shows up when running a kernel compiled with lock debugging: [ +0.000224] WARNING: inconsistent lock state [ +0.000224] 6.16.0+ linaro-swg#10 Not tainted [ +0.000169] -------------------------------- [ +0.000029] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. [ +0.000000] ping/156 [HC0[0]:SC0[2]:HE1:SE0] takes: [ +0.000000] ffff897841242110 (&sch->root_lock_key){?.-.}-{3:3}, at: __dev_queue_xmit+0x86d/0x1140 [ +0.000000] {IN-HARDIRQ-W} state was registered at: [ +0.000000] lock_acquire.part.0+0xb6/0x220 [ +0.000000] _raw_spin_lock+0x31/0x80 [ +0.000000] dualpi2_timer+0x6f/0x270 [ +0.000000] __hrtimer_run_queues+0x1c5/0x360 [ +0.000000] hrtimer_interrupt+0x115/0x260 [ +0.000000] __sysvec_apic_timer_interrupt+0x6d/0x1a0 [ +0.000000] sysvec_apic_timer_interrupt+0x6e/0x80 [ +0.000000] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ +0.000000] pv_native_safe_halt+0xf/0x20 [ +0.000000] default_idle+0x9/0x10 [ +0.000000] default_idle_call+0x7e/0x1e0 [ +0.000000] do_idle+0x1e8/0x250 [ +0.000000] cpu_startup_entry+0x29/0x30 [ +0.000000] rest_init+0x151/0x160 [ +0.000000] start_kernel+0x6f3/0x700 [ +0.000000] x86_64_start_reservations+0x24/0x30 [ +0.000000] x86_64_start_kernel+0xc8/0xd0 [ +0.000000] common_startup_64+0x13e/0x148 [ +0.000000] irq event stamp: 6884 [ +0.000000] hardirqs last enabled at (6883): [<ffffffffa75700b3>] neigh_resolve_output+0x223/0x270 [ +0.000000] hardirqs last disabled at (6882): [<ffffffffa7570078>] neigh_resolve_output+0x1e8/0x270 [ +0.000000] softirqs last enabled at (6880): [<ffffffffa757006b>] neigh_resolve_output+0x1db/0x270 [ +0.000000] softirqs last disabled at (6884): [<ffffffffa755b533>] __dev_queue_xmit+0x73/0x1140 [ +0.000000] other info that might help us debug this: [ +0.000000] Possible unsafe locking scenario: [ +0.000000] CPU0 [ +0.000000] ---- [ +0.000000] lock(&sch->root_lock_key); [ +0.000000] <Interrupt> [ +0.000000] lock(&sch->root_lock_key); [ +0.000000] *** DEADLOCK *** [ +0.000000] 4 locks held by ping/156: [ +0.000000] #0: ffff897842332e08 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x41e/0xf40 [ +0.000000] #1: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_output+0x2c/0x190 [ +0.000000] #2: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0xad/0x950 [ +0.000000] #3: ffffffffa816f840 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x73/0x1140 I am able to reproduce it consistently when running the following: tc qdisc add dev lo handle 1: root dualpi2 ping -f 127.0.0.1 To fix it, make the timer run in softirq. Fixes: 320d031 ("sched: Struct definition and parsing of dualpi2 qdisc") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20250815135317.664993-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This pull request was closed.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Avoids delayed release of a shared memory object allocated by
supplicant.
Tested-by: Jens Wiklander jens.wiklander@linaro.org (QEMU)
Signed-off-by: Jens Wiklander jens.wiklander@linaro.org
Fixes OP-TEE/optee_test#96 and hopefully OP-TEE/optee_os#798 too.