Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intel 7th Generation Core #6

Closed
maqiangddb opened this issue Sep 11, 2017 · 12 comments
Closed

Intel 7th Generation Core #6

maqiangddb opened this issue Sep 11, 2017 · 12 comments

Comments

@maqiangddb
Copy link

@maqiangddb maqiangddb commented Sep 11, 2017

Hi:
If Intel 7th Generation Core will support this feature?

zhiwang1 pushed a commit that referenced this issue Oct 16, 2017
…ug deadlock

4.14-rc1 gained the fancy new cross-release support in lockdep, which
seems to have uncovered a few more rules about what is allowed and
isn't.

This one here seems to indicate that allocating a work-queue while
holding mmap_sem is a no-go, so let's try to preallocate it.

Of course another way to break this chain would be somewhere in the
cpu hotplug code, since this isn't the only trace we're finding now
which goes through msr_create_device.

Full lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc1-CI-CI_DRM_3118+ #1 Tainted: G     U
------------------------------------------------------
prime_mmap/1551 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50

but task is already holding lock:
 (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&dev_priv->mm_lock){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
       i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
       drm_ioctl_kernel+0x69/0xb0
       drm_ioctl+0x2f9/0x3d0
       do_vfs_ioctl+0x94/0x670
       SyS_ioctl+0x41/0x70
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #5 (&mm->mmap_sem){++++}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __might_fault+0x68/0x90
       _copy_to_user+0x23/0x70
       filldir+0xa5/0x120
       dcache_readdir+0xf9/0x170
       iterate_dir+0x69/0x1a0
       SyS_getdents+0xa5/0x140
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #4 (&sb->s_type->i_mutex_key#5){++++}:
       down_write+0x3b/0x70
       handle_create+0xcb/0x1e0
       devtmpfsd+0x139/0x180
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #3 ((complete)&req.done){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       wait_for_common+0x58/0x210
       wait_for_completion+0x1d/0x20
       devtmpfs_create_node+0x13d/0x160
       device_add+0x5eb/0x620
       device_create_groups_vargs+0xe0/0xf0
       device_create+0x3a/0x40
       msr_device_create+0x2b/0x40
       cpuhp_invoke_callback+0xa3/0x840
       cpuhp_thread_fun+0x7a/0x150
       smpboot_thread_fn+0x18a/0x280
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #2 (cpuhp_state){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpuhp_issue_call+0x10b/0x170
       __cpuhp_setup_state_cpuslocked+0x134/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_writeback_init+0x43/0x67
       pagecache_init+0x3d/0x42
       start_kernel+0x3a8/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #1 (cpuhp_state_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       __cpuhp_setup_state_cpuslocked+0x52/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_alloc_init+0x28/0x30
       start_kernel+0x145/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #0 (cpu_hotplug_lock.rw_sem){++++}:
       check_prev_add+0x430/0x840
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpus_read_lock+0x3d/0xb0
       apply_workqueue_attrs+0x17/0x50
       __alloc_workqueue_key+0x1d8/0x4d9
       i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
       i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
       drm_ioctl_kernel+0x69/0xb0
       drm_ioctl+0x2f9/0x3d0
       do_vfs_ioctl+0x94/0x670
       SyS_ioctl+0x41/0x70
       entry_SYSCALL_64_fastpath+0x1c/0xb1

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev_priv->mm_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev_priv->mm_lock);
                               lock(&mm->mmap_sem);
                               lock(&dev_priv->mm_lock);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

2 locks held by prime_mmap/1551:
 #0:  (&mm->mmap_sem){++++}, at: [<ffffffffa01a7b18>] i915_gem_userptr_init__mmu_notifier+0x138/0x270 [i915]
 #1:  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]

stack backtrace:
CPU: 4 PID: 1551 Comm: prime_mmap Tainted: G     U          4.14.0-rc1-CI-CI_DRM_3118+ #1
Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
Call Trace:
 dump_stack+0x68/0x9f
 print_circular_bug+0x235/0x3c0
 ? lockdep_init_map_crosslock+0x20/0x20
 check_prev_add+0x430/0x840
 __lock_acquire+0x1420/0x15e0
 ? __lock_acquire+0x1420/0x15e0
 ? lockdep_init_map_crosslock+0x20/0x20
 lock_acquire+0xb0/0x200
 ? apply_workqueue_attrs+0x17/0x50
 cpus_read_lock+0x3d/0xb0
 ? apply_workqueue_attrs+0x17/0x50
 apply_workqueue_attrs+0x17/0x50
 __alloc_workqueue_key+0x1d8/0x4d9
 ? __lockdep_init_map+0x57/0x1c0
 i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
 i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
 ? i915_gem_userptr_release+0x140/0x140 [i915]
 drm_ioctl_kernel+0x69/0xb0
 drm_ioctl+0x2f9/0x3d0
 ? i915_gem_userptr_release+0x140/0x140 [i915]
 ? __do_page_fault+0x2a4/0x570
 do_vfs_ioctl+0x94/0x670
 ? entry_SYSCALL_64_fastpath+0x5/0xb1
 ? __this_cpu_preempt_check+0x13/0x20
 ? trace_hardirqs_on_caller+0xe3/0x1b0
 SyS_ioctl+0x41/0x70
 entry_SYSCALL_64_fastpath+0x1c/0xb1
RIP: 0033:0x7fbb83c39587
RSP: 002b:00007fff188dc228 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: ffffffff81492963 RCX: 00007fbb83c39587
RDX: 00007fff188dc260 RSI: 00000000c0186473 RDI: 0000000000000003
RBP: ffffc90001487f88 R08: 0000000000000000 R09: 00007fff188dc2ac
R10: 00007fbb83efcb58 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000003 R14: 00000000c0186473 R15: 00007fff188dc2ac
 ? __this_cpu_preempt_check+0x13/0x20

Note that this also has the minor benefit of slightly reducing the
critical section where we hold mmap_sem.

v2: Set ret correctly when we raced with another thread.

v3: Use Chris' diff. Attach the right lockdep splat.

v4: Repaint in Tvrtko's colors (aka don't report ENOMEM if we race and
some other thread managed to not also get an ENOMEM and successfully
install the mmu notifier. Note that the kernel guarantees that small
allocations succeed, so this never actually happens).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sasha Levin <alexander.levin@verizon.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Cc: Tejun Heo <tj@kernel.org>
References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009164401.16035-1-daniel.vetter@ffwll.ch
zhiwang1 pushed a commit that referenced this issue Oct 16, 2017
stop_machine is not really a locking primitive we should use, except
when the hw folks tell us the hw is broken and that's the only way to
work around it.

This patch tries to address the locking abuse of stop_machine() from

commit 20e4933
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 22 14:41:21 2016 +0000

    drm/i915: Stop the machine as we install the wedged submit_request handler

Chris said parts of the reasons for going with stop_machine() was that
it's no overhead for the fast-path. But these callbacks use irqsave
spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.

To stay as close as possible to the stop_machine semantics we first
update all the submit function pointers to the nop handler, then call
synchronize_rcu() to make sure no new requests can be submitted. This
should give us exactly the huge barrier we want.

I pondered whether we should annotate engine->submit_request as __rcu
and use rcu_assign_pointer and rcu_dereference on it. But the reason
behind those is to make sure the compiler/cpu barriers are there for
when you have an actual data structure you point at, to make sure all
the writes are seen correctly on the read side. But we just have a
function pointer, and .text isn't changed, so no need for these
barriers and hence no need for annotations.

Unfortunately there's a complication with the call to
intel_engine_init_global_seqno:

- Without stop_machine we must hold the corresponding spinlock.

- Without stop_machine we must ensure that all requests are marked as
  having failed with dma_fence_set_error() before we call it. That
  means we need to split the nop request submission into two phases,
  both synchronized with rcu:

  1. Only stop submitting the requests to hw and mark them as failed.

  2. After all pending requests in the scheduler/ring are suitably
  marked up as failed and we can force complete them all, also force
  complete by calling intel_engine_init_global_seqno().

This should fix the followwing lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
------------------------------------------------------
kworker/3:4/562 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40

but task is already holding lock:
 (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&dev->struct_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_interruptible_nested+0x1b/0x20
       i915_mutex_lock_interruptible+0x51/0x130 [i915]
       i915_gem_fault+0x209/0x650 [i915]
       __do_fault+0x1e/0x80
       __handle_mm_fault+0xa08/0xed0
       handle_mm_fault+0x156/0x300
       __do_page_fault+0x2c5/0x570
       do_page_fault+0x28/0x250
       page_fault+0x22/0x30

-> #5 (&mm->mmap_sem){++++}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __might_fault+0x68/0x90
       _copy_to_user+0x23/0x70
       filldir+0xa5/0x120
       dcache_readdir+0xf9/0x170
       iterate_dir+0x69/0x1a0
       SyS_getdents+0xa5/0x140
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #4 (&sb->s_type->i_mutex_key#5){++++}:
       down_write+0x3b/0x70
       handle_create+0xcb/0x1e0
       devtmpfsd+0x139/0x180
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #3 ((complete)&req.done){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       wait_for_common+0x58/0x210
       wait_for_completion+0x1d/0x20
       devtmpfs_create_node+0x13d/0x160
       device_add+0x5eb/0x620
       device_create_groups_vargs+0xe0/0xf0
       device_create+0x3a/0x40
       msr_device_create+0x2b/0x40
       cpuhp_invoke_callback+0xc9/0xbf0
       cpuhp_thread_fun+0x17b/0x240
       smpboot_thread_fn+0x18a/0x280
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #2 (cpuhp_state-up){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpuhp_issue_call+0x133/0x1c0
       __cpuhp_setup_state_cpuslocked+0x139/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_writeback_init+0x43/0x67
       pagecache_init+0x3d/0x42
       start_kernel+0x3a8/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #1 (cpuhp_state_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       __cpuhp_setup_state_cpuslocked+0x53/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_alloc_init+0x28/0x30
       start_kernel+0x145/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #0 (cpu_hotplug_lock.rw_sem){++++}:
       check_prev_add+0x430/0x840
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpus_read_lock+0x3d/0xb0
       stop_machine+0x1c/0x40
       i915_gem_set_wedged+0x1a/0x20 [i915]
       i915_reset+0xb9/0x230 [i915]
       i915_reset_device+0x1f6/0x260 [i915]
       i915_handle_error+0x2d8/0x430 [i915]
       hangcheck_declare_hang+0xd3/0xf0 [i915]
       i915_hangcheck_elapsed+0x262/0x2d0 [i915]
       process_one_work+0x233/0x660
       worker_thread+0x4e/0x3b0
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev->struct_mutex);
                               lock(&mm->mmap_sem);
                               lock(&dev->struct_mutex);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

3 locks held by kworker/3:4/562:
 #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
 #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
 #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]

stack backtrace:
CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
Workqueue: events_long i915_hangcheck_elapsed [i915]
Call Trace:
 dump_stack+0x68/0x9f
 print_circular_bug+0x235/0x3c0
 ? lockdep_init_map_crosslock+0x20/0x20
 check_prev_add+0x430/0x840
 ? irq_work_queue+0x86/0xe0
 ? wake_up_klogd+0x53/0x70
 __lock_acquire+0x1420/0x15e0
 ? __lock_acquire+0x1420/0x15e0
 ? lockdep_init_map_crosslock+0x20/0x20
 lock_acquire+0xb0/0x200
 ? stop_machine+0x1c/0x40
 ? i915_gem_object_truncate+0x50/0x50 [i915]
 cpus_read_lock+0x3d/0xb0
 ? stop_machine+0x1c/0x40
 stop_machine+0x1c/0x40
 i915_gem_set_wedged+0x1a/0x20 [i915]
 i915_reset+0xb9/0x230 [i915]
 i915_reset_device+0x1f6/0x260 [i915]
 ? gen8_gt_irq_ack+0x170/0x170 [i915]
 ? work_on_cpu_safe+0x60/0x60
 i915_handle_error+0x2d8/0x430 [i915]
 ? vsnprintf+0xd1/0x4b0
 ? scnprintf+0x3a/0x70
 hangcheck_declare_hang+0xd3/0xf0 [i915]
 ? intel_runtime_pm_put+0x56/0xa0 [i915]
 i915_hangcheck_elapsed+0x262/0x2d0 [i915]
 process_one_work+0x233/0x660
 worker_thread+0x4e/0x3b0
 kthread+0x152/0x190
 ? process_one_work+0x660/0x660
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x27/0x40
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang

v2: Have 1 global synchronize_rcu() barrier across all engines, and
improve commit message.

v3: We need to protect the seqno update with the timeline spinlock (in
set_wedged) to avoid racing with other updates of the seqno, like we
already do in nop_submit_request (Chris).

v4: Use two-phase sequence to plug the race Chris spotted where we can
complete requests before they're marked up with -EIO.

v5: Review from Chris:
- simplify nop_submit_request.
- Add comment to rcu_read_lock section.
- Align comments with the new style.

v6: Remove unused variable to appease CI.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171011091019.1425-1-daniel.vetter@ffwll.ch
zhenyw pushed a commit that referenced this issue Oct 26, 2017
Thomas reported that 'perf buildid-list' gets a SEGFAULT due to NULL
pointer deref when he ran it on a data with namespace events.  It was
because the buildid_id__mark_dso_hit_ops lacks the namespace event
handler and perf_too__fill_default() didn't set it.

  Program received signal SIGSEGV, Segmentation fault.
  0x0000000000000000 in ?? ()
  Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc25.s390x bzip2-libs-1.0.6-21.fc25.s390x elfutils-libelf-0.169-1.fc25.s390x
  +elfutils-libs-0.169-1.fc25.s390x libcap-ng-0.7.8-1.fc25.s390x numactl-libs-2.0.11-2.ibm.fc25.s390x openssl-libs-1.1.0e-1.1.ibm.fc25.s390x perl-libs-5.24.1-386.fc25.s390x
  +python-libs-2.7.13-2.fc25.s390x slang-2.3.0-7.fc25.s390x xz-libs-5.2.3-2.fc25.s390x zlib-1.2.8-10.fc25.s390x
  (gdb) where
  #0  0x0000000000000000 in ?? ()
  #1  0x00000000010fad6a in machines__deliver_event (machines=<optimized out>, machines@entry=0x2c6fd18,
      evlist=<optimized out>, event=event@entry=0x3fffdf00470, sample=0x3ffffffe880, sample@entry=0x3ffffffe888,
      tool=tool@entry=0x1312968 <build_id.mark_dso_hit_ops>, file_offset=1136) at util/session.c:1287
  #2  0x00000000010fbf4e in perf_session__deliver_event (file_offset=1136, tool=0x1312968 <build_id.mark_dso_hit_ops>,
      sample=0x3ffffffe888, event=0x3fffdf00470, session=0x2c6fc30) at util/session.c:1340
  #3  perf_session__process_event (session=0x2c6fc30, session@entry=0x0, event=event@entry=0x3fffdf00470,
      file_offset=file_offset@entry=1136) at util/session.c:1522
  #4  0x00000000010fddde in __perf_session__process_events (file_size=11880, data_size=<optimized out>,
      data_offset=<optimized out>, session=0x0) at util/session.c:1899
  #5  perf_session__process_events (session=0x0, session@entry=0x2c6fc30) at util/session.c:1953
  #6  0x000000000103b2ac in perf_session__list_build_ids (with_hits=<optimized out>, force=<optimized out>)
      at builtin-buildid-list.c:83
  #7  cmd_buildid_list (argc=<optimized out>, argv=<optimized out>) at builtin-buildid-list.c:115
  #8  0x00000000010a026c in run_builtin (p=0x1311f78 <commands+24>, argc=argc@entry=2, argv=argv@entry=0x3fffffff3c0)
      at perf.c:296
  #9  0x000000000102bc00 in handle_internal_command (argv=<optimized out>, argc=2) at perf.c:348
  #10 run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:392
  #11 main (argc=<optimized out>, argv=0x3fffffff3c0) at perf.c:536
  (gdb)

Fix it by adding a stub event handler for namespace event.

Committer testing:

Further clarifying, plain using 'perf buildid-list' will not end up in a
SEGFAULT when processing a perf.data file with namespace info:

  # perf record -a --namespaces sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.024 MB perf.data (1058 samples) ]
  # perf buildid-list | wc -l
  38
  # perf buildid-list | head -5
  e2a171c7b905826fc8494f0711ba76ab6abbd604 /lib/modules/4.14.0-rc3+/build/vmlinux
  874840a02d8f8a31cedd605d0b8653145472ced3 /lib/modules/4.14.0-rc3+/kernel/arch/x86/kvm/kvm-intel.ko
  ea7223776730cd8a22f320040aae4d54312984bc /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/i915/i915.ko
  5961535e6732a8edb7f22b3f148bb2fa2e0be4b9 /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/drm.ko
  f045f54aa78cf1931cc893f78b6cbc52c72a8cb1 /usr/lib64/libc-2.25.so
  #

It is only when one asks for checking what of those entries actually had
samples, i.e. when we use either -H or --with-hits, that we will process
all the PERF_RECORD_ events, and since tools/perf/builtin-buildid-list.c
neither explicitely set a perf_tool.namespaces() callback nor the
default stub was set that we end up, when processing a
PERF_RECORD_NAMESPACE record, causing a SEGFAULT:

  # perf buildid-list -H
  Segmentation fault (core dumped)
  ^C
  #

Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Fixes: f3b3614 ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info")
Link: http://lkml.kernel.org/r/20171017132900.11043-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Nov 2, 2017
rdma_nl_rcv_msg() checks to see if it should use the .dump() callback
or the .doit() callback.  The check is done with this check:

if (flags & NLM_F_DUMP) ...

The NLM_F_DUMP flag is two bits (NLM_F_ROOT | NLM_F_MATCH).

When an RDMA_NL_LS message (response) is received, the bit used for
indicating an error is the same bit as NLM_F_ROOT.

NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR.

ibacm sends a response with the RDMA_NL_LS_F_ERR bit set if an error
occurs in the service.  The current code then misinterprets the
NLM_F_DUMP bit and trys to call the .dump() callback.

If the .dump() callback for the specified request is not available
(which is true for the RDMA_NL_LS messages) the following Oops occurs:

[ 4555.960256] BUG: unable to handle kernel NULL pointer dereference at
   (null)
[ 4555.969046] IP:           (null)
[ 4555.972664] PGD 10543f1067 P4D 10543f1067 PUD 1033f93067 PMD 0
[ 4555.979287] Oops: 0010 [#1] SMP
[ 4555.982809] Modules linked in: rpcrdma ib_isert iscsi_target_mod
target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod
dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd
glue_helper cryptd hfi1 rdmavt iTCO_wdt iTCO_vendor_support ib_core mei_me
lpc_ich pcspkr mei ioatdma sg shpchp i2c_i801 mfd_core wmi ipmi_si ipmi_devintf
ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace
sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm igb ahci crc32c_intel ptp libahci
pps_core drm dca libata i2c_algo_bit i2c_core
[ 4556.061190] CPU: 54 PID: 9841 Comm: ibacm Tainted: G          I
4.14.0-rc2+ #6
[ 4556.069667] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS
SE5C610.86B.01.01.0008.021120151325 02/11/2015
[ 4556.081339] task: ffff880855f42d00 task.stack: ffffc900246b4000
[ 4556.087967] RIP: 0010:          (null)
[ 4556.092166] RSP: 0018:ffffc900246b7bc8 EFLAGS: 00010246
[ 4556.098018] RAX: ffffffff81dbe9e0 RBX: ffff881058bb1000 RCX:
0000000000000000
[ 4556.105997] RDX: 0000000000001100 RSI: ffff881058bb1320 RDI:
ffff881056362000
[ 4556.113984] RBP: ffffc900246b7bf8 R08: 0000000000000ec0 R09:
0000000000001100
[ 4556.121971] R10: ffff8810573a5000 R11: 0000000000000000 R12:
ffff881056362000
[ 4556.129957] R13: 0000000000000ec0 R14: ffff881058bb1320 R15:
0000000000000ec0
[ 4556.137945] FS:  00007fe0ba5a38c0(0000) GS:ffff88105f080000(0000)
knlGS:0000000000000000
[ 4556.147000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4556.153433] CR2: 0000000000000000 CR3: 0000001056f5d003 CR4:
00000000001606e0
[ 4556.161419] Call Trace:
[ 4556.164167]  ? netlink_dump+0x12c/0x290
[ 4556.168468]  __netlink_dump_start+0x186/0x1f0
[ 4556.173357]  rdma_nl_rcv_msg+0x193/0x1b0 [ib_core]
[ 4556.178724]  rdma_nl_rcv+0xdc/0x130 [ib_core]
[ 4556.183604]  netlink_unicast+0x181/0x240
[ 4556.187998]  netlink_sendmsg+0x2c2/0x3b0
[ 4556.192392]  sock_sendmsg+0x38/0x50
[ 4556.196299]  SYSC_sendto+0x102/0x190
[ 4556.200308]  ? __audit_syscall_entry+0xaf/0x100
[ 4556.205387]  ? syscall_trace_enter+0x1d0/0x2b0
[ 4556.210366]  ? __audit_syscall_exit+0x209/0x290
[ 4556.215442]  SyS_sendto+0xe/0x10
[ 4556.219060]  do_syscall_64+0x67/0x1b0
[ 4556.223165]  entry_SYSCALL64_slow_path+0x25/0x25
[ 4556.228328] RIP: 0033:0x7fe0b9db2a63
[ 4556.232333] RSP: 002b:00007ffc55edc260 EFLAGS: 00000293 ORIG_RAX:
000000000000002c
[ 4556.240808] RAX: ffffffffffffffda RBX: 0000000000000010 RCX:
00007fe0b9db2a63
[ 4556.248796] RDX: 0000000000000010 RSI: 00007ffc55edc280 RDI:
000000000000000d
[ 4556.256782] RBP: 00007ffc55edc670 R08: 00007ffc55edc270 R09:
000000000000000c
[ 4556.265321] R10: 0000000000000000 R11: 0000000000000293 R12:
00007ffc55edc280
[ 4556.273846] R13: 000000000260b400 R14: 000000000000000d R15:
0000000000000001
[ 4556.282368] Code:  Bad RIP value.
[ 4556.286629] RIP:           (null) RSP: ffffc900246b7bc8
[ 4556.293013] CR2: 0000000000000000
[ 4556.297292] ---[ end trace 8d67abcfd10ec209 ]---
[ 4556.305465] Kernel panic - not syncing: Fatal exception
[ 4556.313786] Kernel Offset: disabled
[ 4556.321563] ---[ end Kernel panic - not syncing: Fatal exception
[ 4556.328960] ------------[ cut here ]------------

Special case RDMA_NL_LS response messages to call the appropriate
callback.

Additionally, make sure that the .dump() callback is not NULL
before calling it.

Fixes: 647c75a ("RDMA/netlink: Convert LS to doit callback")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
zhenyw pushed a commit that referenced this issue Dec 4, 2017
Commit 4f350c6 (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure
properly) can result in L1(run kvm-unit-tests/run_tests.sh vmx_controls in L1)
null pointer deference and also L0 calltrace when EPT=0 on both L0 and L1.

In L1:

BUG: unable to handle kernel paging request at ffffffffc015bf8f
 IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
 PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
 Oops: 0003 [#1] PREEMPT SMP
 CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
 RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
 Call Trace:
 WARNING: kernel stack frame pointer at ffffb86f4988bc18 in qemu-system-x86:1798 has bad value 0000000000000002

In L0:

-----------[ cut here ]------------
 WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845 vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
 CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc7+ #25
 RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
 Call Trace:
  paging64_page_fault+0x500/0xde0 [kvm]
  ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
  ? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
  ? __asan_storeN+0x12/0x20
  ? paging64_gva_to_gpa+0xb0/0x120 [kvm]
  ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
  ? lock_acquire+0x2c0/0x2c0
  ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
  ? vmx_get_segment+0x2a6/0x310 [kvm_intel]
  ? sched_clock+0x1f/0x30
  ? check_chain_key+0x137/0x1e0
  ? __lock_acquire+0x83c/0x2420
  ? kvm_multiple_exception+0xf2/0x220 [kvm]
  ? debug_check_no_locks_freed+0x240/0x240
  ? debug_smp_processor_id+0x17/0x20
  ? __lock_is_held+0x9e/0x100
  kvm_mmu_page_fault+0x90/0x180 [kvm]
  kvm_handle_page_fault+0x15c/0x310 [kvm]
  ? __lock_is_held+0x9e/0x100
  handle_exception+0x3c7/0x4d0 [kvm_intel]
  vmx_handle_exit+0x103/0x1010 [kvm_intel]
  ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]

The commit avoids to load host state of vmcs12 as vmcs01's guest state
since vmcs12 is not modified (except for the VM-instruction error field)
if the checking of vmcs control area fails. However, the mmu context is
switched to nested mmu in prepare_vmcs02() and it will not be reloaded
since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME
fails. This patch fixes it by reloading mmu context when nested
VMLAUNCH/VMRESUME fails.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
zhenyw pushed a commit that referenced this issue Dec 18, 2017
This gets rid of the following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.15.0-rc2-CI-Patchwork_7428+ #1 Not tainted
------------------------------------------------------
debugfs_test/1351 is trying to acquire lock:
 (&dev->struct_mutex){+.+.}, at: [<000000009d90d1a3>] i915_mutex_lock_interruptible+0x47/0x130 [i915]

but task is already holding lock:
 (&mm->mmap_sem){++++}, at: [<000000005df01c1e>] __do_page_fault+0x106/0x560

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&mm->mmap_sem){++++}:
       __might_fault+0x63/0x90
       _copy_to_user+0x1e/0x70
       filldir+0x8c/0xf0
       dcache_readdir+0xeb/0x160
       iterate_dir+0xe6/0x150
       SyS_getdents+0xa0/0x130
       entry_SYSCALL_64_fastpath+0x1c/0x89

-> #5 (&sb->s_type->i_mutex_key#5){++++}:
       lockref_get+0x9/0x20

-> #4 ((completion)&req.done){+.+.}:
       wait_for_common+0x54/0x210
       devtmpfs_create_node+0x130/0x150
       device_add+0x5ad/0x5e0
       device_create_groups_vargs+0xd4/0xe0
       device_create+0x35/0x40
       msr_device_create+0x22/0x40
       cpuhp_invoke_callback+0xc5/0xbf0
       cpuhp_thread_fun+0x167/0x210
       smpboot_thread_fn+0x17f/0x270
       kthread+0x173/0x1b0
       ret_from_fork+0x24/0x30

-> #3 (cpuhp_state-up){+.+.}:
       cpuhp_issue_call+0x132/0x1c0
       __cpuhp_setup_state_cpuslocked+0x12f/0x2a0
       __cpuhp_setup_state+0x3a/0x50
       page_writeback_init+0x3a/0x5c
       start_kernel+0x393/0x3e2
       secondary_startup_64+0xa5/0xb0

-> #2 (cpuhp_state_mutex){+.+.}:
       __mutex_lock+0x81/0x9b0
       __cpuhp_setup_state_cpuslocked+0x4b/0x2a0
       __cpuhp_setup_state+0x3a/0x50
       page_alloc_init+0x1f/0x26
       start_kernel+0x139/0x3e2
       secondary_startup_64+0xa5/0xb0

-> #1 (cpu_hotplug_lock.rw_sem){++++}:
       cpus_read_lock+0x34/0xa0
       apply_workqueue_attrs+0xd/0x40
       __alloc_workqueue_key+0x2c7/0x4e1
       intel_guc_submission_init+0x10c/0x650 [i915]
       intel_uc_init_hw+0x29e/0x460 [i915]
       i915_gem_init_hw+0xca/0x290 [i915]
       i915_gem_init+0x115/0x3a0 [i915]
       i915_driver_load+0x9a8/0x16c0 [i915]
       i915_pci_probe+0x2e/0x90 [i915]
       pci_device_probe+0x9c/0x120
       driver_probe_device+0x2a3/0x480
       __driver_attach+0xd9/0xe0
       bus_for_each_dev+0x57/0x90
       bus_add_driver+0x168/0x260
       driver_register+0x52/0xc0
       do_one_initcall+0x39/0x150
       do_init_module+0x56/0x1ef
       load_module+0x231c/0x2d70
       SyS_finit_module+0xa5/0xe0
       entry_SYSCALL_64_fastpath+0x1c/0x89

-> #0 (&dev->struct_mutex){+.+.}:
       lock_acquire+0xaf/0x200
       __mutex_lock+0x81/0x9b0
       i915_mutex_lock_interruptible+0x47/0x130 [i915]
       i915_gem_fault+0x201/0x760 [i915]
       __do_fault+0x15/0x70
       __handle_mm_fault+0x85b/0xe40
       handle_mm_fault+0x14f/0x2f0
       __do_page_fault+0x2d1/0x560
       page_fault+0x22/0x30

other info that might help us debug this:

Chain exists of:
  &dev->struct_mutex --> &sb->s_type->i_mutex_key#5 --> &mm->mmap_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key#5);
                               lock(&mm->mmap_sem);
  lock(&dev->struct_mutex);

 *** DEADLOCK ***

1 lock held by debugfs_test/1351:
 #0:  (&mm->mmap_sem){++++}, at: [<000000005df01c1e>] __do_page_fault+0x106/0x560

stack backtrace:
CPU: 2 PID: 1351 Comm: debugfs_test Not tainted 4.15.0-rc2-CI-Patchwork_7428+ #1
Hardware name:                  /NUC6i5SYB, BIOS SYSKLi35.86A.0057.2017.0119.1758 01/19/2017
Call Trace:
 dump_stack+0x5f/0x86
 print_circular_bug+0x230/0x3b0
 check_prev_add+0x439/0x7b0
 ? lockdep_init_map_crosslock+0x20/0x20
 ? unwind_get_return_address+0x16/0x30
 ? __lock_acquire+0x1385/0x15a0
 __lock_acquire+0x1385/0x15a0
 lock_acquire+0xaf/0x200
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 __mutex_lock+0x81/0x9b0
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? i915_mutex_lock_interruptible+0x47/0x130 [i915]
 i915_mutex_lock_interruptible+0x47/0x130 [i915]
 ? __pm_runtime_resume+0x4f/0x80
 i915_gem_fault+0x201/0x760 [i915]
 __do_fault+0x15/0x70
 __handle_mm_fault+0x85b/0xe40
 handle_mm_fault+0x14f/0x2f0
 __do_page_fault+0x2d1/0x560
 page_fault+0x22/0x30
RIP: 0033:0x7f98d6f49116
RSP: 002b:00007ffd6ffc3278 EFLAGS: 00010283
RAX: 00007f98d39a2bc0 RBX: 0000000000000000 RCX: 0000000000001680
RDX: 0000000000001680 RSI: 00007ffd6ffc3400 RDI: 00007f98d39a2bc0
RBP: 00007ffd6ffc33a0 R08: 0000000000000000 R09: 00000000000005a0
R10: 000055e847c2a830 R11: 0000000000000002 R12: 0000000000000001
R13: 000055e847c1d040 R14: 00007ffd6ffc3400 R15: 00007f98d6752ba0

v2: Init preempt_work unconditionally (Chris)
v3: Mention that we need the enable_guc=1 for lockdep splat (Chris)

Testcase: igt/debugfs_test/read_all_entries # with i915.enable_guc=1
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-2-michal.winiarski@intel.com
@jiaochuandong
Copy link

@jiaochuandong jiaochuandong commented Dec 20, 2017

I have an error in Intel 7th Generation Core(i5 7500)

#echo "78a289f9-6004-4de3-9fdc-37f3e2ba69b6" > "/sys/bus/pci/devices/0000:00:02.0/
mdev_supported_types/i915-GVTg_V5_4/create"
bash: echo: write error: Invalid argument

detail:
ubuntu 16.04.3
uname -r
4.12.0+
lsmod | grep kvm
kvm_intel 200704 0
kvmgt 24576 1
mdev 20480 2 kvmgt,vfio_mdev
vfio 28672 3 vfio_iommu_type1,kvmgt,vfio_mdev
kvm 585728 2 kvm_intel,kvmgt
irqbypass 16384 1 kvm
drm 344064 6 kvmgt,i915,drm_kms_helper
cat /boot/grub/grub.cfg
...
linux /boot/vmlinuz-4.12.0+ root=UUID=33b0a2ea-7ee0-45c7-bb60-96192194838e ro quiet splash $vt_handoff i915.enable_gvt=1 kvm.ignore_msrs=1 intel_iommu=on
....

can you gave me some advices?

@TerrenceXu
Copy link

@TerrenceXu TerrenceXu commented Dec 20, 2017

@jiaochuandong
Copy link

@jiaochuandong jiaochuandong commented Dec 21, 2017

Thanks for your help
now i can create vgpu in my device(i5 7500). kernel is 4.15.0-rc4+.
But I don't found skl_dmc_ver1_27.bin and kbl_dmc_ver1_04.bin on https://01.org/linuxgraphics/downloads/firmware

  • sudo dpkg -i linux-image-4.15.0-rc4+_4.15.0-rc4+-7_amd64.deb
    ...
    W: Possible missing firmware /lib/firmware/i915/skl_dmc_ver1_27.bin for module i915
    W: Possible missing firmware /lib/firmware/i915/kbl_dmc_ver1_04.bin for module i915
    I am not sure if this warns influence the final result?

  • i had created vgpu and new.img before I used comment line:
    sudo qemu-system-x86_64 -m 2048 -smp 2 -M pc -name gvt-g-guest -hda new.img -bios /usr/bin/bios.bin -enable-kvm -k en-us -serial stdio -vnc :1 -machine kernel_irqchip=on -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cpu host -usb -usbdevice tablet -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:02.0/56f07026-e611-11e7-81b4-f36d1cecf706,rombar=0
    select_vgahw vga is std
    set vendor id(8086) for devfn(0)
    set vendor id(8086) for devfn(8)
    set vendor id(1234) for devfn(10)
    set vendor id(8086) for devfn(18)
    set vendor id(8086) for devfn(9)
    set vendor id(8086) for devfn(a)
    set vendor id(8086) for devfn(b)
    set vendor id(0) for devfn(20)
    qemu-system-x86_64: vfio: Failed to start device
    I don't know how to solve this error.

After that
vncviewer :1
I use lspci on guest's terminal, I can see my VGA:
00:02:0 VGA ....
00:04:0 VGA compatible controller: Intel Corporation Device 5912
I changed xrog.conf
cat /etc/X11/xorg.conf
Section "Device"
Identifier "intel"
Driver "intel"
BusID "PCI:0:4:0" # Sample: "PCI:0:2:0"
EndSection

Section "Screen"
Identifier "intel"
Device "intel"
EndSection
then I reboot ,I can't enter the graphic
screenshot from 2017-12-21 16-58-58
screenshot from 2017-12-21 17-03-03
screenshot from 2017-12-21 18-57-18

@TerrenceXu
Copy link

@TerrenceXu TerrenceXu commented Dec 21, 2017

@jiaochuandong
Copy link

@jiaochuandong jiaochuandong commented Dec 21, 2017

I have added "-vga qxl"
sudo qemu-system-x86_64 -m 2048 -smp 2 -M pc -name gvt-g-guest -hda new.img -bios /usr/bin/bios.bin -enable-kvm -vga qxl -k en-us -serial stdio -vnc :1 -machine kernel_irqchip=on -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -cpu host -usb -usbdevice tablet -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:02.0/56f07026-e611-11e7-81b4-f36d1cecf706,rombar=0
screenshot from 2017-12-21 21-18-57

But I just want to use intel's video card rather than virtual video card to play 3D video game or deep learning. I have to modify /etc/X11/xorg.conf
Section "Device"
Identifier "intel"
Driver "intel"
BusID "PCI:0:4:0" # Sample: "PCI:0:2:0"
EndSection

Section "Screen"
Identifier "intel"
Device "intel"
EndSection

use the same command line
unfortunately, I can't see my Desktop.
screenshot from 2017-12-21 21-44-05

Thanks
Chuandong

@TerrenceXu
Copy link

@TerrenceXu TerrenceXu commented Dec 22, 2017

@jiaochuandong
Copy link

@jiaochuandong jiaochuandong commented Dec 22, 2017

when I reboot my host I found I could not find vgpu's uuid in that path(/sys/bus/pci/devices/0000:00:02.0/)
How could I save vgpu device after I create them?

when I create them again I discover an error
echo "4b22bf63-cd51-4b91-9573-b95812783b54" > "/sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create"
bash: echo: write error: No space left on device

Thanks
Chuandong

@TerrenceXu
Copy link

@TerrenceXu TerrenceXu commented Dec 22, 2017

@jiaochuandong
Copy link

@jiaochuandong jiaochuandong commented Dec 23, 2017

Hello TerrenceXu
Thanks for your help. I think I can connect to guest by vncviewer 192.168.31.5(vm's ip)
I can not use vncviewer :1 to connect the guest. is it true?

But I have two question

  1. when i enter the guest I found two VGA card (02.00 and 04.00). can you give me command which one is working in my guest? I just use glxgears to show value.
    screenshot from 2017-12-23 15-10-59

  2. In my host. I create two or more vgpu. I want to know if I can set dynamic gpu memory?if ture How to set ?

Thanks
Chuandong

@jiaochuandong
Copy link

@jiaochuandong jiaochuandong commented Dec 25, 2017

I restarted the pc before I create three uuid
#uuid -n 3
I want to create 3 vgpu ,but only one can be created

echo "a23c05e4-e90f-11e7-92a8-4fde5fa8e7b7" > "/sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create"

echo "a23c0666-e90f-11e7-92a9-4be30e41841e" > "/sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create"

bash: echo: write error: No space left on device

echo "a23c0684-e90f-11e7-92aa-d7f4ad5f3750" > "/sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create"

bash: echo: write error: No space left on device

pc's memory: 16G
I want to know Why I can not create 3 vcpus?

@TerrenceXu
Copy link

@TerrenceXu TerrenceXu commented Dec 25, 2017

@TerrenceXu
Copy link

@TerrenceXu TerrenceXu commented Dec 25, 2017

zhenyw pushed a commit that referenced this issue Feb 14, 2018
Scenario:
1. Port down and do fail over
2. Ap do rds_bind syscall

PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
 #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
 #3 [ffff898e35f15b60] no_context at ffffffff8104854c
 #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
    RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX:ffffffff81c99d88
    RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI:ffff889b77f6fc00
    RBP: ffff898e35f15df0   R8: ffff896019ee08c8  R9:0000000000000000
    R10: 0000000000000400  R11: 0000000000000000  R12:ffff896019ee08c0
    R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
 #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
 #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
 #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
 #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
 #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
 #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
 #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
 #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

PID: 45659                          PID: 47039
rds_ib_laddr_check
  /* create id_priv with a null event_handler */
  rdma_create_id
  rdma_bind_addr
    cma_acquire_dev
      /* add id_priv to cma_dev->id_list */
      cma_attach_to_dev
                                    cma_ndev_work_handler
                                      /* event_hanlder is null */
                                      id_priv->id.event_handler

Signed-off-by: Guanglei Li <guanglei.li@oracle.com>
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
zhenyw pushed a commit that referenced this issue Feb 28, 2018
It was reported by Sergey Senozhatsky that if THP (Transparent Huge
Page) and frontswap (via zswap) are both enabled, when memory goes low
so that swap is triggered, segfault and memory corruption will occur in
random user space applications as follow,

kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
 #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
 #1  0x00007fc08889c2f3 malloc (libc.so.6)
 #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
 #3  0x0000560e6005e75c n/a (urxvt)
 #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
 #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
 #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
 #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
 #8  0x0000560e6005cb55 ev_run (urxvt)
 #9  0x0000560e6003b9b9 main (urxvt)
 #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
 #11 0x0000560e6003f9da _start (urxvt)

After bisection, it was found the first bad commit is bd4c82c ("mm,
THP, swap: delay splitting THP after swapped out").

The root cause is as follows:

When the pages are written to swap device during swapping out in
swap_writepage(), zswap (fontswap) is tried to compress the pages to
improve performance.  But zswap (frontswap) will treat THP as a normal
page, so only the head page is saved.  After swapping in, tail pages
will not be restored to their original contents, causing memory
corruption in the applications.

This is fixed by refusing to save page in the frontswap store functions
if the page is a THP.  So that the THP will be swapped out to swap
device.

Another choice is to split THP if frontswap is enabled.  But it is found
that the frontswap enabling isn't flexible.  For example, if
CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if
zswap itself isn't enabled.

Frontswap has multiple backends, to make it easy for one backend to
enable THP support, the THP checking is put in backend frontswap store
functions instead of the general interfaces.

Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com
Fixes: bd4c82c ("mm, THP, swap: delay splitting THP after swapped out")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>	[put THP checking in backend]
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Shaohua Li <shli@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org>	[4.14]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhenyw pushed a commit that referenced this issue Mar 14, 2018
Currently we can crash perf record when running in pipe mode, like:

  $ perf record ls | perf report
  # To display the perf.data header info, please use --header/--header-only options.
  #
  perf: Segmentation fault
  Error:
  The - file has no samples!

The callstack of the crash is:

    0x0000000000515242 in perf_event__synthesize_event_update_name
  3513            ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->id[0]);
  (gdb) bt
  #0  0x0000000000515242 in perf_event__synthesize_event_update_name
  #1  0x00000000005158a4 in perf_event__synthesize_extra_attr
  #2  0x0000000000443347 in record__synthesize
  #3  0x00000000004438e3 in __cmd_record
  #4  0x000000000044514e in cmd_record
  #5  0x00000000004cbc95 in run_builtin
  #6  0x00000000004cbf02 in handle_internal_command
  #7  0x00000000004cc054 in run_argv
  #8  0x00000000004cc422 in main

The reason of the crash is that the evsel does not have ids array
allocated and the pipe's synthesize code tries to access it.

We don't force evsel ids allocation when we have single event, because
it's not needed. However we need it when we are in pipe mode even for
single event as a key for evsel update event.

Fixing this by forcing evsel ids allocation event for single event, when
we are in pipe mode.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180302161354.30192-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue May 8, 2018
When the module is removed the led workqueue is destroyed in the remove
callback, before the led device is unregistered from the led subsystem.

This leads to a NULL pointer derefence when the led device is
unregistered automatically later as part of the module removal cleanup.
Bellow is the backtrace showing the problem.

  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: __queue_work+0x8c/0x410
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP NOPTI
  Modules linked in: ccm edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 joydev crypto_simd asus_nb_wmi glue_helper uvcvideo snd_hda_codec_conexant snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel asus_wmi snd_hda_codec cryptd snd_hda_core sparse_keymap videobuf2_vmalloc arc4 videobuf2_memops snd_hwdep input_leds videobuf2_v4l2 ath9k psmouse videobuf2_core videodev ath9k_common snd_pcm ath9k_hw media fam15h_power ath k10temp snd_timer mac80211 i2c_piix4 r8169 mii mac_hid cfg80211 asus_wireless(-) snd soundcore wmi shpchp 8250_dw ip_tables x_tables amdkfd amd_iommu_v2 amdgpu radeon chash i2c_algo_bit drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ahci ttm libahci drm video
  CPU: 3 PID: 2177 Comm: rmmod Not tainted 4.15.0-5-generic #6+dev94.b4287e5bem1-Endless
  Hardware name: ASUSTeK COMPUTER INC. X555DG/X555DG, BIOS 5.011 05/05/2015
  RIP: 0010:__queue_work+0x8c/0x410
  RSP: 0018:ffffbe8cc249fcd8 EFLAGS: 00010086
  RAX: ffff992ac6810800 RBX: 0000000000000000 RCX: 0000000000000008
  RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff992ac6400e18
  RBP: ffffbe8cc249fd18 R08: ffff992ac6400db0 R09: 0000000000000000
  R10: 0000000000000040 R11: ffff992ac6400dd8 R12: 0000000000002000
  R13: ffff992abd762e00 R14: ffff992abd763e38 R15: 000000000001ebe0
  FS:  00007f318203e700(0000) GS:ffff992aced80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 00000001c720e000 CR4: 00000000001406e0
  Call Trace:
   queue_work_on+0x38/0x40
   led_state_set+0x2c/0x40 [asus_wireless]
   led_set_brightness_nopm+0x14/0x40
   led_set_brightness+0x37/0x60
   led_trigger_set+0xfc/0x1d0
   led_classdev_unregister+0x32/0xd0
   devm_led_classdev_release+0x11/0x20
   release_nodes+0x109/0x1f0
   devres_release_all+0x3c/0x50
   device_release_driver_internal+0x16d/0x220
   driver_detach+0x3f/0x80
   bus_remove_driver+0x55/0xd0
   driver_unregister+0x2c/0x40
   acpi_bus_unregister_driver+0x15/0x20
   asus_wireless_driver_exit+0x10/0xb7c [asus_wireless]
   SyS_delete_module+0x1da/0x2b0
   entry_SYSCALL_64_fastpath+0x24/0x87
  RIP: 0033:0x7f3181b65fd7
  RSP: 002b:00007ffe74bcbe18 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3181b65fd7
  RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000555ea2559258
  RBP: 0000555ea25591f0 R08: 00007ffe74bcad91 R09: 000000000000000a
  R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000003
  R13: 00007ffe74bcae00 R14: 0000000000000000 R15: 0000555ea25591f0
  Code: 01 00 00 02 0f 85 7d 01 00 00 48 63 45 d4 48 c7 c6 00 f4 fa 87 49 8b 9d 08 01 00 00 48 03 1c c6 4c 89 f7 e8 87 fb ff ff 48 85 c0 <48> 8b 3b 0f 84 c5 01 00 00 48 39 f8 0f 84 bc 01 00 00 48 89 c7
  RIP: __queue_work+0x8c/0x410 RSP: ffffbe8cc249fcd8
  CR2: 0000000000000000
  ---[ end trace 7aa4f4a232e9c39c ]---

Unregistering the led device on the remove callback before destroying the
workqueue avoids this problem.

https://bugzilla.kernel.org/show_bug.cgi?id=196097

Reported-by: Dun Hum <bitter.taste@gmx.com>
Cc: stable@vger.kernel.org
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
zhenyw pushed a commit that referenced this issue May 16, 2018
syzbot caught an infinite recursion in nsh_gso_segment().

Problem here is that we need to make sure the NSH header is of
reasonable length.

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48  max: 48!
48 locks held by syz-executor0/10189:
 #0:         (ptrval) (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x30f/0x34c0 net/core/dev.c:3517
 #1:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #1:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #2:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #2:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #3:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #3:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #4:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #4:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #5:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #5:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #6:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #6:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #7:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #7:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #8:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #8:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #9:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #9:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #10:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #10:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #11:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #11:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #12:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #12:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #13:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #13:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #14:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #14:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #15:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #15:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #16:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #16:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #17:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #17:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #18:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #18:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #19:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #19:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #20:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #20:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #21:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #21:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #22:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #22:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #23:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #23:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #24:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #24:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #25:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #25:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #26:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #26:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #27:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #27:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #28:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #28:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #29:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #29:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #30:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #30:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #31:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #31:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
dccp_close: ABORT with 65423 bytes unread
 #32:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #32:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #33:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #33:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #34:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #34:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #35:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #35:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #36:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #36:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #37:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #37:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #38:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #38:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #39:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #39:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #40:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #40:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #41:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #41:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #42:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #42:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #43:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #43:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #44:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #44:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #45:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #45:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #46:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #46:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #47:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #47:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
INFO: lockdep is turned off.
CPU: 1 PID: 10189 Comm: syz-executor0 Not tainted 4.17.0-rc2+ #26
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 __lock_acquire+0x1788/0x5140 kernel/locking/lockdep.c:3449
 lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
 rcu_lock_acquire include/linux/rcupdate.h:246 [inline]
 rcu_read_lock include/linux/rcupdate.h:632 [inline]
 skb_mac_gso_segment+0x25b/0x720 net/core/dev.c:2789
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865
 skb_gso_segment include/linux/netdevice.h:4025 [inline]
 validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3118
 validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3168
 sch_direct_xmit+0x354/0x11e0 net/sched/sch_generic.c:312
 qdisc_restart net/sched/sch_generic.c:399 [inline]
 __qdisc_run+0x741/0x1af0 net/sched/sch_generic.c:410
 __dev_xmit_skb net/core/dev.c:3243 [inline]
 __dev_queue_xmit+0x28ea/0x34c0 net/core/dev.c:3551
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3616
 packet_snd net/packet/af_packet.c:2951 [inline]
 packet_sendmsg+0x40f8/0x6070 net/packet/af_packet.c:2976
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:639
 __sys_sendto+0x3d7/0x670 net/socket.c:1789
 __do_sys_sendto net/socket.c:1801 [inline]
 __se_sys_sendto net/socket.c:1797 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: c411ed8 ("nsh: add GSO support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Benc <jbenc@redhat.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xiongzha pushed a commit that referenced this issue May 29, 2018
Petr Machata says:

====================
net: ip6_gre: Fixes in headroom handling

This series mends some problems in headroom management in ip6_gre
module. The current code base has the following three closely-related
problems:

- ip6gretap tunnels neglect to ensure there's enough writable headroom
  before pushing GRE headers.

- ip6erspan does this, but assumes that dev->needed_headroom is primed.
  But that doesn't happen until ip6_tnl_xmit() is called later. Thus for
  the first packet, ip6erspan actually behaves like ip6gretap above.

- ip6erspan shares some of the code with ip6gretap, including
  calculations of needed header length. While there is custom
  ERSPAN-specific code for calculating the headroom, the computed
  values are overwritten by the ip6gretap code.

The first two issues lead to a kernel panic in situations where a packet
is mirrored from a veth device to the device in question. They are
fixed, respectively, in patches #1 and #2, which include the full panic
trace and a reproducer.

The rest of the patchset deals with the last issue. In patches #3 to #6,
several functions are split up into reusable parts. Finally in patch #7
these blocks are used to compose ERSPAN-specific callbacks where
necessary to fix the hlen calculation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
zhenyw pushed a commit that referenced this issue Jul 24, 2018
Crash dump shows following instructions

crash> bt
PID: 0      TASK: ffffffffbe412480  CPU: 0   COMMAND: "swapper/0"
 #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1
 #1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2
 #2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c
 #3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a
 #4 [ffff891ee00039e0] no_context at ffffffffbd074643
 #5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e
 #6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64
 #7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a
 #8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8
 #9 [ffff891ee0003b50] page_fault at ffffffffbda01925
    [exception RIP: qlt_schedule_sess_for_deletion+15]
    RIP: ffffffffc02e526f  RSP: ffff891ee0003c08  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: ffffffffc0307847
    RDX: 00000000000020e6  RSI: ffff891edbc377c8  RDI: 0000000000000000
    RBP: ffff891ee0003c18   R8: ffffffffc02f0b20   R9: 0000000000000250
    R10: 0000000000000258  R11: 000000000000b780  R12: ffff891ed9b43000
    R13: 00000000000000f0  R14: 0000000000000006  R15: ffff891edbc377c8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx]
 #11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx]
 #12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx]
 #13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx]
 #14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59
 #15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02
 #16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90
 #17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984
 #18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5
 #19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18
 --- <IRQ stack> ---
 #20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e
    [exception RIP: unknown or invalid address]
    RIP: 000000000000001f  RSP: 0000000000000000  RFLAGS: fff3b8c2091ebb3f
    RAX: ffffbba5a0000200  RBX: 0000be8cdfa8f9fa  RCX: 0000000000000018
    RDX: 0000000000000101  RSI: 000000000000015d  RDI: 0000000000000193
    RBP: 0000000000000083   R8: ffffffffbe403e38   R9: 0000000000000002
    R10: 0000000000000000  R11: ffffffffbe56b820  R12: ffff891ee001cf00
    R13: ffffffffbd11c0a4  R14: ffffffffbe403d60  R15: 0000000000000001
    ORIG_RAX: ffff891ee0022ac0  CS: 0000  SS: ffffffffffffffb9
 bt: WARNING: possibly bogus exception frame
 #21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd
 #22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907
 #23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3
 #24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42
 #25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3
 #26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa
 #27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca
 #28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675
 #29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb
 #30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5

Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login")
Cc: <stable@vger.kernel.org> # v4.17+
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
zhenyw pushed a commit that referenced this issue Jul 31, 2018
There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.

	creator                     other
	vsnprintf:
	  fill (not terminated)
	  count the rest            trace_sched_waking(p):
	  ...                         memcpy(comm, p->comm, TASK_COMM_LEN)
	  write \0

The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):

	crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
	0xffffffd5b3818640:     "irq/497-pwr_evenkworker/u16:12"

...and a strcpy out of there would cause stack corruption:

	[224761.522292] Kernel panic - not syncing: stack-protector:
	    Kernel stack is corrupted in: ffffff9bf9783c78

	crash-arm64> kbt | grep 'comm\|trace_print_context'
	#6  0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
	      comm (char [16]) =  "irq/497-pwr_even"

	crash-arm64> rd 0xffffffd4d0e17d14 8
	ffffffd4d0e17d14:  2f71726900000000 5f7277702d373934   ....irq/497-pwr_
	ffffffd4d0e17d24:  726f776b6e657665 3a3631752f72656b   evenkworker/u16:
	ffffffd4d0e17d34:  f9780248ff003231 cede60e0ffffff9b   12..H.x......`..
	ffffffd4d0e17d44:  cede60c8ffffffd4 00000fffffffffd4   .....`..........

The workaround in e09e286 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.

Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
This way, there won't be a window where comm is not terminated.

Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com

Cc: stable@vger.kernel.org
Fixes: bc0c38d ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
zhenyw pushed a commit that referenced this issue Aug 7, 2018
Kernel panic when with high memory pressure, calltrace looks like,

PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java"
 #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb
 #1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942
 #2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30
 #3 [ffff881ec7ed7778] oops_end at ffffffff816902c8
 #4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46
 #5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc
 #6 [ffff881ec7ed7838] __node_set at ffffffff81680300
 #7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f
 #8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5
 #9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8
    [exception RIP: _raw_spin_lock_irqsave+47]
    RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046
    RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8
    RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008
    RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098
    R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

It happens in the pagefault and results in double pagefault
during compacting pages when memory allocation fails.

Analysed the vmcore, the page leads to second pagefault is corrupted
with _mapcount=-256, but private=0.

It's caused by the race between migration and ballooning, and lock
missing in virtballoon_migratepage() of virtio_balloon driver.
This patch fix the bug.

Fixes: e225042 ("virtio_balloon: introduce migration primitives to balloon pages")
Cc: stable@vger.kernel.org
Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Huang Chong <huang.chong@zte.com.cn>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
zhenyw pushed a commit that referenced this issue Aug 29, 2018
Commit efda1b5 ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling")
Introduced additional hardening for ambiguity in the ACPI spec for
ars_status output sizing. However, it had a couple of cases mixed up.
Where it should have been checking for (and returning) "out_field[1] -
4" it was using "out_field[1] - 8" and vice versa.

This caused a four byte discrepancy in the buffer size passed on to
the command handler, and in some cases, this caused memory corruption
like:

  ./daxdev-errors.sh: line 76: 24104 Aborted   (core dumped) ./daxdev-errors $busdev $region
  malloc(): memory corruption
  Program received signal SIGABRT, Aborted.
  [...]
  #5  0x00007ffff7865a2e in calloc () from /lib64/libc.so.6
  #6  0x00007ffff7bc2970 in ndctl_bus_cmd_new_ars_status (ars_cap=ars_cap@entry=0x6153b0) at ars.c:136
  #7  0x0000000000401644 in check_ars_status (check=0x7fffffffdeb0, bus=0x604c20) at daxdev-errors.c:144
  #8  test_daxdev_clear_error (region_name=<optimized out>, bus_name=<optimized out>)
      at daxdev-errors.c:332

Cc: <stable@vger.kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Lukasz Dorau <lukasz.dorau@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Fixes: efda1b5 ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling")
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-of-by: Dave Jiang <dave.jiang@intel.com>
zhenyw pushed a commit that referenced this issue Aug 29, 2018
Currently, whenever a new node is created/re-used from the memhotplug
path, we call free_area_init_node()->free_area_init_core().  But there is
some code that we do not really need to run when we are coming from such
path.

free_area_init_core() performs the following actions:

1) Initializes pgdat internals, such as spinlock, waitqueues and more.
2) Account # nr_all_pages and # nr_kernel_pages. These values are used later on
   when creating hash tables.
3) Account number of managed_pages per zone, substracting dma_reserved and
   memmap pages.
4) Initializes some fields of the zone structure data
5) Calls init_currently_empty_zone to initialize all the freelists
6) Calls memmap_init to initialize all pages belonging to certain zone

When called from memhotplug path, free_area_init_core() only performs
actions #1 and #4.

Action #2 is pointless as the zones do not have any pages since either the
node was freed, or we are re-using it, eitherway all zones belonging to
this node should have 0 pages.  For the same reason, action #3 results
always in manages_pages being 0.

Action #5 and #6 are performed later on when onlining the pages:
 online_pages()->move_pfn_range_to_zone()->init_currently_empty_zone()
 online_pages()->move_pfn_range_to_zone()->memmap_init_zone()

This patch does two things:

First, moves the node/zone initializtion to their own function, so it
allows us to create a small version of free_area_init_core, where we only
perform:

1) Initialization of pgdat internals, such as spinlock, waitqueues and more
4) Initialization of some fields of the zone structure data

These two functions are: pgdat_init_internals() and zone_init_internals().

The second thing this patch does, is to introduce
free_area_init_core_hotplug(), the memhotplug version of
free_area_init_core():

Currently, we call free_area_init_node() from the memhotplug path.  In
there, we set some pgdat's fields, and call calculate_node_totalpages().
calculate_node_totalpages() calculates the # of pages the node has.

Since the node is either new, or we are re-using it, the zones belonging
to this node should not have any pages, so there is no point to calculate
this now.

Actually, we re-set these values to 0 later on with the calls to:

reset_node_managed_pages()
reset_node_present_pages()

The # of pages per node and the # of pages per zone will be calculated when
onlining the pages:

online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_zone_range()
online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_pgdat_range()

Also, since free_area_init_core/free_area_init_node will now only get called during early init, let us replace
__paginginit with __init, so their code gets freed up.

[osalvador@techadventures.net: fix section usage]
  Link: http://lkml.kernel.org/r/20180731101752.GA473@techadventures.net
[osalvador@suse.de: v6]
  Link: http://lkml.kernel.org/r/20180801122348.21588-6-osalvador@techadventures.net
Link: http://lkml.kernel.org/r/20180730101757.28058-5-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhenyw pushed a commit that referenced this issue Aug 29, 2018
Patch series "add support for relative references in special sections", v10.

This adds support for emitting special sections such as initcall arrays,
PCI fixups and tracepoints as relative references rather than absolute
references.  This reduces the size by 50% on 64-bit architectures, but
more importantly, it removes the need for carrying relocation metadata for
these sections in relocatable kernels (e.g., for KASLR) that needs to be
fixed up at boot time.  On arm64, this reduces the vmlinux footprint of
such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
4 byte relative reference)

Patch #3 was sent out before as a single patch.  This series supersedes
the previous submission.  This version makes relative ksymtab entries
dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
than trying to infer from kbuild test robot replies for which
architectures it should be blacklisted.

Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
and sets it for the main architectures that are expected to benefit the
most from this feature, i.e., 64-bit architectures or ones that use
runtime relocations.

Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of
ksymtab/kcrctab sections in decompressor and EFI stub objects when
rebuilding existing C files to run in a different context.

Patches #4 - #6 implement relative references for initcalls, PCI fixups
and tracepoints, respectively, all of which produce sections with order
~1000 entries on an arm64 defconfig kernel with tracing enabled.  This
means we save about 28 KB of vmlinux space for each of these patches.

[From the v7 series blurb, which included the jump_label patches as well]:

  For the arm64 kernel, all patches combined reduce the memory footprint
  of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
  KASLR enabled), of which ~1 MB is the size reduction of the RELA section
  in .init, and the remaining 300 KB is reduction of .text/.data.

This patch (of 6):

Before updating certain subsystems to use place relative 32-bit
relocations in special sections, to save space and reduce the number of
absolute relocations that need to be processed at runtime by relocatable
kernels, introduce the Kconfig symbol and define it for some architectures
that should be able to support and benefit from it.

Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: James Morris <jmorris@namei.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Cc: James Morris <james.morris@microsoft.com>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhenyw pushed a commit that referenced this issue Jan 6, 2021
We inadvertently create a dependency on mmap_sem with a whole chain.

This breaks any user who wants to take a lock and call rcu_barrier(),
while also taking that lock inside mmap_sem:

<4> [604.892532] ======================================================
<4> [604.892534] WARNING: possible circular locking dependency detected
<4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ #1 Tainted: G     U
<4> [604.892537] ------------------------------------------------------
<4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock:
<4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [604.892547]
but task is already holding lock:
<4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915]
<4> [604.892592]
which lock already depends on the new lock.
<4> [604.892593]
the existing dependency chain (in reverse order) is:
<4> [604.892594]
-> #6 (reservation_ww_class_mutex){+.+.}:
<4> [604.892597]        __ww_mutex_lock.constprop.15+0xc3/0x1090
<4> [604.892598]        ww_mutex_lock+0x39/0x70
<4> [604.892600]        dma_resv_lockdep+0x10e/0x1f5
<4> [604.892602]        do_one_initcall+0x58/0x300
<4> [604.892604]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892605]        kernel_init+0x5/0x100
<4> [604.892606]        ret_from_fork+0x24/0x50
<4> [604.892607]
-> #5 (reservation_ww_class_acquire){+.+.}:
<4> [604.892609]        dma_resv_lockdep+0xec/0x1f5
<4> [604.892610]        do_one_initcall+0x58/0x300
<4> [604.892610]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892611]        kernel_init+0x5/0x100
<4> [604.892612]        ret_from_fork+0x24/0x50
<4> [604.892613]
-> #4 (&mm->mmap_sem#2){++++}:
<4> [604.892615]        __might_fault+0x63/0x90
<4> [604.892617]        _copy_to_user+0x1e/0x80
<4> [604.892619]        perf_read+0x200/0x2b0
<4> [604.892621]        vfs_read+0x96/0x160
<4> [604.892622]        ksys_read+0x9f/0xe0
<4> [604.892623]        do_syscall_64+0x4f/0x220
<4> [604.892624]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [604.892626]        __mutex_lock+0x9a/0x9c0
<4> [604.892627]        perf_event_init_cpu+0xa4/0x140
<4> [604.892629]        perf_event_init+0x19d/0x1cd
<4> [604.892630]        start_kernel+0x362/0x4e4
<4> [604.892631]        secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> #2 (pmus_lock){+.+.}:
<4> [604.892633]        __mutex_lock+0x9a/0x9c0
<4> [604.892633]        perf_event_init_cpu+0x6b/0x140
<4> [604.892635]        cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636]        _cpu_up+0xa2/0x140
<4> [604.892637]        do_cpu_up+0x61/0xa0
<4> [604.892639]        smp_init+0x57/0x96
<4> [604.892639]        kernel_init_freeable+0x87/0x1dc
<4> [604.892640]        kernel_init+0x5/0x100
<4> [604.892642]        ret_from_fork+0x24/0x50
<4> [604.892642]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [604.892643]        cpus_read_lock+0x34/0xd0
<4> [604.892644]        rcu_barrier+0xaa/0x190
<4> [604.892645]        kernel_init+0x21/0x100
<4> [604.892647]        ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [604.892649]        __lock_acquire+0x1328/0x15d0
<4> [604.892650]        lock_acquire+0xa7/0x1c0
<4> [604.892651]        __mutex_lock+0x9a/0x9c0
<4> [604.892652]        rcu_barrier+0x23/0x190
<4> [604.892680]        i915_gem_object_unbind+0x29d/0x3f0 [i915]
<4> [604.892707]        i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
<4> [604.892737]        intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
<4> [604.892767]        intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [604.892797]        intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [604.892798]        drm_atomic_helper_prepare_planes+0x85/0x110
<4> [604.892827]        intel_atomic_commit+0xda/0x390 [i915]
<4> [604.892828]        drm_atomic_helper_set_config+0x57/0xa0
<4> [604.892830]        drm_mode_setcrtc+0x1c4/0x720
<4> [604.892830]        drm_ioctl_kernel+0xb0/0xf0
<4> [604.892831]        drm_ioctl+0x2e1/0x390
<4> [604.892833]        ksys_ioctl+0x7b/0x90
<4> [604.892835]        __x64_sys_ioctl+0x11/0x20
<4> [604.892835]        do_syscall_64+0x4f/0x220
<4> [604.892836]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892837]

Changes since v1:
- Use (*values)[n++] in perf_read_one().
Changes since v2:
- Centrally allocate values.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200502171413.9133-1-chris@chris-wilson.co.uk
zhenyw pushed a commit that referenced this issue Jan 18, 2021
Like other tunneling interfaces, the bareudp doesn't need TXLOCK.
So, It is good to set the NETIF_F_LLTX flag to improve performance and
to avoid lockdep's false-positive warning.

Test commands:
    ip netns add A
    ip netns add B
    ip link add veth0 netns A type veth peer name veth1 netns B
    ip netns exec A ip link set veth0 up
    ip netns exec A ip a a 10.0.0.1/24 dev veth0
    ip netns exec B ip link set veth1 up
    ip netns exec B ip a a 10.0.0.2/24 dev veth1

    for i in {2..1}
    do
            let A=$i-1
            ip netns exec A ip link add bareudp$i type bareudp \
		    dstport $i ethertype ip
            ip netns exec A ip link set bareudp$i up
            ip netns exec A ip a a 10.0.$i.1/24 dev bareudp$i
            ip netns exec A ip r a 10.0.$i.2 encap ip src 10.0.$A.1 \
		    dst 10.0.$A.2 via 10.0.$i.2 dev bareudp$i

            ip netns exec B ip link add bareudp$i type bareudp \
		    dstport $i ethertype ip
            ip netns exec B ip link set bareudp$i up
            ip netns exec B ip a a 10.0.$i.2/24 dev bareudp$i
            ip netns exec B ip r a 10.0.$i.1 encap ip src 10.0.$A.2 \
		    dst 10.0.$A.1 via 10.0.$i.1 dev bareudp$i
    done
    ip netns exec A ping 10.0.2.2

Splat looks like:
[   96.992803][  T822] ============================================
[   96.993954][  T822] WARNING: possible recursive locking detected
[   96.995102][  T822] 5.10.0+ #819 Not tainted
[   96.995927][  T822] --------------------------------------------
[   96.997091][  T822] ping/822 is trying to acquire lock:
[   96.998083][  T822] ffff88810f753898 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
[   96.999813][  T822]
[   96.999813][  T822] but task is already holding lock:
[   97.001192][  T822] ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
[   97.002908][  T822]
[   97.002908][  T822] other info that might help us debug this:
[   97.004401][  T822]  Possible unsafe locking scenario:
[   97.004401][  T822]
[   97.005784][  T822]        CPU0
[   97.006407][  T822]        ----
[   97.007010][  T822]   lock(_xmit_NONE#2);
[   97.007779][  T822]   lock(_xmit_NONE#2);
[   97.008550][  T822]
[   97.008550][  T822]  *** DEADLOCK ***
[   97.008550][  T822]
[   97.010057][  T822]  May be due to missing lock nesting notation
[   97.010057][  T822]
[   97.011594][  T822] 7 locks held by ping/822:
[   97.012426][  T822]  #0: ffff888109a144f0 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x12f7/0x2b00
[   97.014191][  T822]  #1: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020
[   97.016045][  T822]  #2: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960
[   97.017897][  T822]  #3: ffff88810c385498 (_xmit_NONE#2){+.-.}-{2:2}, at: __dev_queue_xmit+0x1f52/0x2960
[   97.019684][  T822]  #4: ffffffffbce2f600 (rcu_read_lock){....}-{1:2}, at: bareudp_xmit+0x31b/0x3690 [bareudp]
[   97.021573][  T822]  #5: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0x249/0x2020
[   97.023424][  T822]  #6: ffffffffbce2f5a0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1fd/0x2960
[   97.025259][  T822]
[   97.025259][  T822] stack backtrace:
[   97.026349][  T822] CPU: 3 PID: 822 Comm: ping Not tainted 5.10.0+ #819
[   97.027609][  T822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[   97.029407][  T822] Call Trace:
[   97.030015][  T822]  dump_stack+0x99/0xcb
[   97.030783][  T822]  __lock_acquire.cold.77+0x149/0x3a9
[   97.031773][  T822]  ? stack_trace_save+0x81/0xa0
[   97.032661][  T822]  ? register_lock_class+0x1910/0x1910
[   97.033673][  T822]  ? register_lock_class+0x1910/0x1910
[   97.034679][  T822]  ? rcu_read_lock_sched_held+0x91/0xc0
[   97.035697][  T822]  ? rcu_read_lock_bh_held+0xa0/0xa0
[   97.036690][  T822]  lock_acquire+0x1b2/0x730
[   97.037515][  T822]  ? __dev_queue_xmit+0x1f52/0x2960
[   97.038466][  T822]  ? check_flags+0x50/0x50
[   97.039277][  T822]  ? netif_skb_features+0x296/0x9c0
[   97.040226][  T822]  ? validate_xmit_skb+0x29/0xb10
[   97.041151][  T822]  _raw_spin_lock+0x30/0x70
[   97.041977][  T822]  ? __dev_queue_xmit+0x1f52/0x2960
[   97.042927][  T822]  __dev_queue_xmit+0x1f52/0x2960
[   97.043852][  T822]  ? netdev_core_pick_tx+0x290/0x290
[   97.044824][  T822]  ? mark_held_locks+0xb7/0x120
[   97.045712][  T822]  ? lockdep_hardirqs_on_prepare+0x12c/0x3e0
[   97.046824][  T822]  ? __local_bh_enable_ip+0xa5/0xf0
[   97.047771][  T822]  ? ___neigh_create+0x12a8/0x1eb0
[   97.048710][  T822]  ? trace_hardirqs_on+0x41/0x120
[   97.049626][  T822]  ? ___neigh_create+0x12a8/0x1eb0
[   97.050556][  T822]  ? __local_bh_enable_ip+0xa5/0xf0
[   97.051509][  T822]  ? ___neigh_create+0x12a8/0x1eb0
[   97.052443][  T822]  ? check_chain_key+0x244/0x5f0
[   97.053352][  T822]  ? rcu_read_lock_bh_held+0x56/0xa0
[   97.054317][  T822]  ? ip_finish_output2+0x6ea/0x2020
[   97.055263][  T822]  ? pneigh_lookup+0x410/0x410
[   97.056135][  T822]  ip_finish_output2+0x6ea/0x2020
[ ... ]

Acked-by: Guillaume Nault <gnault@redhat.com>
Fixes: 571912c ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20201228152136.24215-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
zhenyw pushed a commit that referenced this issue Jan 18, 2021
We had kernel panic, it is caused by unload module and last
close confirmation.

call trace:
[1196029.743127]  free_sess+0x15/0x50 [rtrs_client]
[1196029.743128]  rtrs_clt_close+0x4c/0x70 [rtrs_client]
[1196029.743129]  ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client]
[1196029.743130]  close_rtrs+0x25/0x50 [rnbd_client]
[1196029.743131]  rnbd_client_exit+0x93/0xb99 [rnbd_client]
[1196029.743132]  __x64_sys_delete_module+0x190/0x260

And in the crashdump confirmation kworker is also running.
PID: 6943   TASK: ffff9e2ac8098000  CPU: 4   COMMAND: "kworker/4:2"
 #0 [ffffb206cf337c30] __schedule at ffffffff9f93f891
 #1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98
 #2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938
 #3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7
 #4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e
 #5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client]
 #6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client]
 #7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client]
 #8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client]
 #9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client]

The problem is both code path try to close same session, which lead to
panic.

To fix it, just skip the sess if the refcount already drop to 0.

Fixes: f7a7a5c ("block/rnbd: client: main functionality")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
zhenyw pushed a commit that referenced this issue Jan 18, 2021
We inadvertently create a dependency on mmap_sem with a whole chain.

This breaks any user who wants to take a lock and call rcu_barrier(),
while also taking that lock inside mmap_sem:

<4> [604.892532] ======================================================
<4> [604.892534] WARNING: possible circular locking dependency detected
<4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ #1 Tainted: G     U
<4> [604.892537] ------------------------------------------------------
<4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock:
<4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [604.892547]
but task is already holding lock:
<4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915]
<4> [604.892592]
which lock already depends on the new lock.
<4> [604.892593]
the existing dependency chain (in reverse order) is:
<4> [604.892594]
-> #6 (reservation_ww_class_mutex){+.+.}:
<4> [604.892597]        __ww_mutex_lock.constprop.15+0xc3/0x1090
<4> [604.892598]        ww_mutex_lock+0x39/0x70
<4> [604.892600]        dma_resv_lockdep+0x10e/0x1f5
<4> [604.892602]        do_one_initcall+0x58/0x300
<4> [604.892604]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892605]        kernel_init+0x5/0x100
<4> [604.892606]        ret_from_fork+0x24/0x50
<4> [604.892607]
-> #5 (reservation_ww_class_acquire){+.+.}:
<4> [604.892609]        dma_resv_lockdep+0xec/0x1f5
<4> [604.892610]        do_one_initcall+0x58/0x300
<4> [604.892610]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892611]        kernel_init+0x5/0x100
<4> [604.892612]        ret_from_fork+0x24/0x50
<4> [604.892613]
-> #4 (&mm->mmap_sem#2){++++}:
<4> [604.892615]        __might_fault+0x63/0x90
<4> [604.892617]        _copy_to_user+0x1e/0x80
<4> [604.892619]        perf_read+0x200/0x2b0
<4> [604.892621]        vfs_read+0x96/0x160
<4> [604.892622]        ksys_read+0x9f/0xe0
<4> [604.892623]        do_syscall_64+0x4f/0x220
<4> [604.892624]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [604.892626]        __mutex_lock+0x9a/0x9c0
<4> [604.892627]        perf_event_init_cpu+0xa4/0x140
<4> [604.892629]        perf_event_init+0x19d/0x1cd
<4> [604.892630]        start_kernel+0x362/0x4e4
<4> [604.892631]        secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> #2 (pmus_lock){+.+.}:
<4> [604.892633]        __mutex_lock+0x9a/0x9c0
<4> [604.892633]        perf_event_init_cpu+0x6b/0x140
<4> [604.892635]        cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636]        _cpu_up+0xa2/0x140
<4> [604.892637]        do_cpu_up+0x61/0xa0
<4> [604.892639]        smp_init+0x57/0x96
<4> [604.892639]        kernel_init_freeable+0x87/0x1dc
<4> [604.892640]        kernel_init+0x5/0x100
<4> [604.892642]        ret_from_fork+0x24/0x50
<4> [604.892642]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [604.892643]        cpus_read_lock+0x34/0xd0
<4> [604.892644]        rcu_barrier+0xaa/0x190
<4> [604.892645]        kernel_init+0x21/0x100
<4> [604.892647]        ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [604.892649]        __lock_acquire+0x1328/0x15d0
<4> [604.892650]        lock_acquire+0xa7/0x1c0
<4> [604.892651]        __mutex_lock+0x9a/0x9c0
<4> [604.892652]        rcu_barrier+0x23/0x190
<4> [604.892680]        i915_gem_object_unbind+0x29d/0x3f0 [i915]
<4> [604.892707]        i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
<4> [604.892737]        intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
<4> [604.892767]        intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [604.892797]        intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [604.892798]        drm_atomic_helper_prepare_planes+0x85/0x110
<4> [604.892827]        intel_atomic_commit+0xda/0x390 [i915]
<4> [604.892828]        drm_atomic_helper_set_config+0x57/0xa0
<4> [604.892830]        drm_mode_setcrtc+0x1c4/0x720
<4> [604.892830]        drm_ioctl_kernel+0xb0/0xf0
<4> [604.892831]        drm_ioctl+0x2e1/0x390
<4> [604.892833]        ksys_ioctl+0x7b/0x90
<4> [604.892835]        __x64_sys_ioctl+0x11/0x20
<4> [604.892835]        do_syscall_64+0x4f/0x220
<4> [604.892836]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892837]

Changes since v1:
- Use (*values)[n++] in perf_read_one().
Changes since v2:
- Centrally allocate values.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200502171413.9133-1-chris@chris-wilson.co.uk
zhenyw pushed a commit that referenced this issue Jan 29, 2021
We inadvertently create a dependency on mmap_sem with a whole chain.

This breaks any user who wants to take a lock and call rcu_barrier(),
while also taking that lock inside mmap_sem:

<4> [604.892532] ======================================================
<4> [604.892534] WARNING: possible circular locking dependency detected
<4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ #1 Tainted: G     U
<4> [604.892537] ------------------------------------------------------
<4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock:
<4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [604.892547]
but task is already holding lock:
<4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915]
<4> [604.892592]
which lock already depends on the new lock.
<4> [604.892593]
the existing dependency chain (in reverse order) is:
<4> [604.892594]
-> #6 (reservation_ww_class_mutex){+.+.}:
<4> [604.892597]        __ww_mutex_lock.constprop.15+0xc3/0x1090
<4> [604.892598]        ww_mutex_lock+0x39/0x70
<4> [604.892600]        dma_resv_lockdep+0x10e/0x1f5
<4> [604.892602]        do_one_initcall+0x58/0x300
<4> [604.892604]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892605]        kernel_init+0x5/0x100
<4> [604.892606]        ret_from_fork+0x24/0x50
<4> [604.892607]
-> #5 (reservation_ww_class_acquire){+.+.}:
<4> [604.892609]        dma_resv_lockdep+0xec/0x1f5
<4> [604.892610]        do_one_initcall+0x58/0x300
<4> [604.892610]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892611]        kernel_init+0x5/0x100
<4> [604.892612]        ret_from_fork+0x24/0x50
<4> [604.892613]
-> #4 (&mm->mmap_sem#2){++++}:
<4> [604.892615]        __might_fault+0x63/0x90
<4> [604.892617]        _copy_to_user+0x1e/0x80
<4> [604.892619]        perf_read+0x200/0x2b0
<4> [604.892621]        vfs_read+0x96/0x160
<4> [604.892622]        ksys_read+0x9f/0xe0
<4> [604.892623]        do_syscall_64+0x4f/0x220
<4> [604.892624]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [604.892626]        __mutex_lock+0x9a/0x9c0
<4> [604.892627]        perf_event_init_cpu+0xa4/0x140
<4> [604.892629]        perf_event_init+0x19d/0x1cd
<4> [604.892630]        start_kernel+0x362/0x4e4
<4> [604.892631]        secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> #2 (pmus_lock){+.+.}:
<4> [604.892633]        __mutex_lock+0x9a/0x9c0
<4> [604.892633]        perf_event_init_cpu+0x6b/0x140
<4> [604.892635]        cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636]        _cpu_up+0xa2/0x140
<4> [604.892637]        do_cpu_up+0x61/0xa0
<4> [604.892639]        smp_init+0x57/0x96
<4> [604.892639]        kernel_init_freeable+0x87/0x1dc
<4> [604.892640]        kernel_init+0x5/0x100
<4> [604.892642]        ret_from_fork+0x24/0x50
<4> [604.892642]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [604.892643]        cpus_read_lock+0x34/0xd0
<4> [604.892644]        rcu_barrier+0xaa/0x190
<4> [604.892645]        kernel_init+0x21/0x100
<4> [604.892647]        ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [604.892649]        __lock_acquire+0x1328/0x15d0
<4> [604.892650]        lock_acquire+0xa7/0x1c0
<4> [604.892651]        __mutex_lock+0x9a/0x9c0
<4> [604.892652]        rcu_barrier+0x23/0x190
<4> [604.892680]        i915_gem_object_unbind+0x29d/0x3f0 [i915]
<4> [604.892707]        i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
<4> [604.892737]        intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
<4> [604.892767]        intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [604.892797]        intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [604.892798]        drm_atomic_helper_prepare_planes+0x85/0x110
<4> [604.892827]        intel_atomic_commit+0xda/0x390 [i915]
<4> [604.892828]        drm_atomic_helper_set_config+0x57/0xa0
<4> [604.892830]        drm_mode_setcrtc+0x1c4/0x720
<4> [604.892830]        drm_ioctl_kernel+0xb0/0xf0
<4> [604.892831]        drm_ioctl+0x2e1/0x390
<4> [604.892833]        ksys_ioctl+0x7b/0x90
<4> [604.892835]        __x64_sys_ioctl+0x11/0x20
<4> [604.892835]        do_syscall_64+0x4f/0x220
<4> [604.892836]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892837]

Changes since v1:
- Use (*values)[n++] in perf_read_one().
Changes since v2:
- Centrally allocate values.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200502171413.9133-1-chris@chris-wilson.co.uk
zhenyw pushed a commit that referenced this issue Feb 22, 2021
We inadvertently create a dependency on mmap_sem with a whole chain.

This breaks any user who wants to take a lock and call rcu_barrier(),
while also taking that lock inside mmap_sem:

<4> [604.892532] ======================================================
<4> [604.892534] WARNING: possible circular locking dependency detected
<4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ #1 Tainted: G     U
<4> [604.892537] ------------------------------------------------------
<4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock:
<4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [604.892547]
but task is already holding lock:
<4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915]
<4> [604.892592]
which lock already depends on the new lock.
<4> [604.892593]
the existing dependency chain (in reverse order) is:
<4> [604.892594]
-> #6 (reservation_ww_class_mutex){+.+.}:
<4> [604.892597]        __ww_mutex_lock.constprop.15+0xc3/0x1090
<4> [604.892598]        ww_mutex_lock+0x39/0x70
<4> [604.892600]        dma_resv_lockdep+0x10e/0x1f5
<4> [604.892602]        do_one_initcall+0x58/0x300
<4> [604.892604]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892605]        kernel_init+0x5/0x100
<4> [604.892606]        ret_from_fork+0x24/0x50
<4> [604.892607]
-> #5 (reservation_ww_class_acquire){+.+.}:
<4> [604.892609]        dma_resv_lockdep+0xec/0x1f5
<4> [604.892610]        do_one_initcall+0x58/0x300
<4> [604.892610]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892611]        kernel_init+0x5/0x100
<4> [604.892612]        ret_from_fork+0x24/0x50
<4> [604.892613]
-> #4 (&mm->mmap_sem#2){++++}:
<4> [604.892615]        __might_fault+0x63/0x90
<4> [604.892617]        _copy_to_user+0x1e/0x80
<4> [604.892619]        perf_read+0x200/0x2b0
<4> [604.892621]        vfs_read+0x96/0x160
<4> [604.892622]        ksys_read+0x9f/0xe0
<4> [604.892623]        do_syscall_64+0x4f/0x220
<4> [604.892624]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [604.892626]        __mutex_lock+0x9a/0x9c0
<4> [604.892627]        perf_event_init_cpu+0xa4/0x140
<4> [604.892629]        perf_event_init+0x19d/0x1cd
<4> [604.892630]        start_kernel+0x362/0x4e4
<4> [604.892631]        secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> #2 (pmus_lock){+.+.}:
<4> [604.892633]        __mutex_lock+0x9a/0x9c0
<4> [604.892633]        perf_event_init_cpu+0x6b/0x140
<4> [604.892635]        cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636]        _cpu_up+0xa2/0x140
<4> [604.892637]        do_cpu_up+0x61/0xa0
<4> [604.892639]        smp_init+0x57/0x96
<4> [604.892639]        kernel_init_freeable+0x87/0x1dc
<4> [604.892640]        kernel_init+0x5/0x100
<4> [604.892642]        ret_from_fork+0x24/0x50
<4> [604.892642]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [604.892643]        cpus_read_lock+0x34/0xd0
<4> [604.892644]        rcu_barrier+0xaa/0x190
<4> [604.892645]        kernel_init+0x21/0x100
<4> [604.892647]        ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [604.892649]        __lock_acquire+0x1328/0x15d0
<4> [604.892650]        lock_acquire+0xa7/0x1c0
<4> [604.892651]        __mutex_lock+0x9a/0x9c0
<4> [604.892652]        rcu_barrier+0x23/0x190
<4> [604.892680]        i915_gem_object_unbind+0x29d/0x3f0 [i915]
<4> [604.892707]        i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
<4> [604.892737]        intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
<4> [604.892767]        intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [604.892797]        intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [604.892798]        drm_atomic_helper_prepare_planes+0x85/0x110
<4> [604.892827]        intel_atomic_commit+0xda/0x390 [i915]
<4> [604.892828]        drm_atomic_helper_set_config+0x57/0xa0
<4> [604.892830]        drm_mode_setcrtc+0x1c4/0x720
<4> [604.892830]        drm_ioctl_kernel+0xb0/0xf0
<4> [604.892831]        drm_ioctl+0x2e1/0x390
<4> [604.892833]        ksys_ioctl+0x7b/0x90
<4> [604.892835]        __x64_sys_ioctl+0x11/0x20
<4> [604.892835]        do_syscall_64+0x4f/0x220
<4> [604.892836]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892837]

Changes since v1:
- Use (*values)[n++] in perf_read_one().
Changes since v2:
- Centrally allocate values.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200502171413.9133-1-chris@chris-wilson.co.uk
zhenyw pushed a commit that referenced this issue Mar 16, 2021
Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past multiple commits:

 * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")

 * 6f23277 ("btrfs: qgroup: don't commit transaction when we already
 hold the handle")

Tried to solve various aspects of this but this was always a
whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:

  PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "test"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_timeout at ffffffffa52a1bdd
  #3  wait_for_completion at ffffffffa529eeea             <-- sleeps with delayed node mutex held
  #4  start_delalloc_inodes at ffffffffc0380db5
  #5  btrfs_start_delalloc_snapshot at ffffffffc0393836
  #6  try_flush_qgroup at ffffffffc03f04b2
  #7  __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6     <-- tries to reserve space and starts delalloc inodes.
  #8  btrfs_delayed_update_inode at ffffffffc03e31aa      <-- acquires delayed node mutex
  #9  btrfs_update_inode at ffffffffc0385ba8
 #10  btrfs_dirty_inode at ffffffffc038627b               <-- TRANSACTIION OPENED
 #11  touch_atime at ffffffffa4cf0000
 #12  generic_file_read_iter at ffffffffa4c1f123
 #13  new_sync_read at ffffffffa4ccdc8a
 #14  vfs_read at ffffffffa4cd0849
 #15  ksys_read at ffffffffa4cd0bd1
 #16  do_syscall_64 at ffffffffa4a052eb
 #17  entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This will cause an asynchronous work to flush the delalloc inodes to
happen which can try to acquire the same delayed_node mutex:

  PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_preempt_disabled at ffffffffa529e80a
  #3  __mutex_lock at ffffffffa529fdcb                    <-- goes to sleep, never wakes up.
  #4  btrfs_delayed_update_inode at ffffffffc03e3143      <-- tries to acquire the mutex
  #5  btrfs_update_inode at ffffffffc0385ba8              <-- this is the same inode that pid 6963 is holding
  #6  cow_file_range_inline.constprop.78 at ffffffffc0386be7
  #7  cow_file_range at ffffffffc03879c1
  #8  btrfs_run_delalloc_range at ffffffffc038894c
  #9  writepage_delalloc at ffffffffc03a3c8f
 #10  __extent_writepage at ffffffffc03a4c01
 #11  extent_write_cache_pages at ffffffffc03a500b
 #12  extent_writepages at ffffffffc03a6de2
 #13  do_writepages at ffffffffa4c277eb
 #14  __filemap_fdatawrite_range at ffffffffa4c1e5bb
 #15  btrfs_run_delalloc_work at ffffffffc0380987         <-- starts running delayed nodes
 #16  normal_work_helper at ffffffffc03b706c
 #17  process_one_work at ffffffffa4aba4e4
 #18  worker_thread at ffffffffa4aba6fd
 #19  kthread at ffffffffa4ac0a3d
 #20  ret_from_fork at ffffffffa54001ff

To fully address those cases the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves it by calling qgroup_reserve_meta directly which will
either succeed without flushing or will fail and return -EDQUOT. In the
latter case that return value is going to be propagated to
btrfs_dirty_inode which will fallback to start a new transaction. That's
fine as the majority of time we expect the inode will have
BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly
copying the in-memory state.

Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
zhenyw pushed a commit that referenced this issue Mar 16, 2021
I met below warning when cating a small size(about 80bytes) txt file
on 9pfs(msize=2097152 is passed to 9p mount option), the reason is we
miss iov_iter_advance() if the read count is 0 for zerocopy case, so
we didn't truncate the pipe, then iov_iter_pipe() thinks the pipe is
full. Fix it by removing the exception for 0 to ensure to call
iov_iter_advance() even on empty read for zerocopy case.

[    8.279568] WARNING: CPU: 0 PID: 39 at lib/iov_iter.c:1203 iov_iter_pipe+0x31/0x40
[    8.280028] Modules linked in:
[    8.280561] CPU: 0 PID: 39 Comm: cat Not tainted 5.11.0+ #6
[    8.281260] RIP: 0010:iov_iter_pipe+0x31/0x40
[    8.281974] Code: 2b 42 54 39 42 5c 76 22 c7 07 20 00 00 00 48 89 57 18 8b 42 50 48 c7 47 08 b
[    8.283169] RSP: 0018:ffff888000cbbd80 EFLAGS: 00000246
[    8.283512] RAX: 0000000000000010 RBX: ffff888000117d00 RCX: 0000000000000000
[    8.283876] RDX: ffff88800031d600 RSI: 0000000000000000 RDI: ffff888000cbbd90
[    8.284244] RBP: ffff888000cbbe38 R08: 0000000000000000 R09: ffff8880008d2058
[    8.284605] R10: 0000000000000002 R11: ffff888000375510 R12: 0000000000000050
[    8.284964] R13: ffff888000cbbe80 R14: 0000000000000050 R15: ffff88800031d600
[    8.285439] FS:  00007f24fd8af600(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[    8.285844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.286150] CR2: 00007f24fd7d7b90 CR3: 0000000000c97000 CR4: 00000000000406b0
[    8.286710] Call Trace:
[    8.288279]  generic_file_splice_read+0x31/0x1a0
[    8.289273]  ? do_splice_to+0x2f/0x90
[    8.289511]  splice_direct_to_actor+0xcc/0x220
[    8.289788]  ? pipe_to_sendpage+0xa0/0xa0
[    8.290052]  do_splice_direct+0x8b/0xd0
[    8.290314]  do_sendfile+0x1ad/0x470
[    8.290576]  do_syscall_64+0x2d/0x40
[    8.290818]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    8.291409] RIP: 0033:0x7f24fd7dca0a
[    8.292511] Code: c3 0f 1f 80 00 00 00 00 4c 89 d2 4c 89 c6 e9 bd fd ff ff 0f 1f 44 00 00 31 8
[    8.293360] RSP: 002b:00007ffc20932818 EFLAGS: 00000206 ORIG_RAX: 0000000000000028
[    8.293800] RAX: ffffffffffffffda RBX: 0000000001000000 RCX: 00007f24fd7dca0a
[    8.294153] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
[    8.294504] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[    8.294867] R10: 0000000001000000 R11: 0000000000000206 R12: 0000000000000003
[    8.295217] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
[    8.295782] ---[ end trace 63317af81b3ca24b ]---

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
zhenyw pushed a commit that referenced this issue Mar 16, 2021
The evlist has the maps with its own refcounts so we don't need to set
the pointers to NULL.  Otherwise following error was reported by Asan.

  # perf test -v 4
   4: Read samples using the mmap interface      :
  --- start ---
  test child forked, pid 139782
  mmap size 528384B

  =================================================================
  ==139782==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f1f76daee8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x564ba21a0fea in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x564ba21a1a0f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x564ba21a21cf in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x564ba21a21cf in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x564ba1e48298 in test__basic_mmap tests/mmap-basic.c:55
    #6 0x564ba1e278fb in run_test tests/builtin-test.c:428
    #7 0x564ba1e278fb in test_and_print tests/builtin-test.c:458
    #8 0x564ba1e29a53 in __cmd_test tests/builtin-test.c:679
    #9 0x564ba1e29a53 in cmd_test tests/builtin-test.c:825
    #10 0x564ba1e95cb4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x564ba1d1fa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x564ba1d1fa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x564ba1d1fa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7f1f768e4d09 in __libc_start_main ../csu/libc-start.c:308

    ...
  test child finished with 1
  ---- end ----
  Read samples using the mmap interface: FAILED!
  failed to open shell test directory: /home/namhyung/libexec/perf-core/tests/shell

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lore.kernel.org/r/20210301140409.184570-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Mar 16, 2021
The evlist has the maps with its own refcounts so we don't need to set
the pointers to NULL.  Otherwise following error was reported by Asan.

Also change the goto label since it doesn't need to have two.

  # perf test -v 24
  24: Number of exit events of a simple workload :
  --- start ---
  test child forked, pid 145915
  mmap size 528384B

  =================================================================
  ==145915==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fc44e50d1f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x561cf50f4d2e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23
    #2 0x561cf4eeb949 in thread_map__new_by_tid util/thread_map.c:63
    #3 0x561cf4db7fd2 in test__task_exit tests/task-exit.c:74
    #4 0x561cf4d798fb in run_test tests/builtin-test.c:428
    #5 0x561cf4d798fb in test_and_print tests/builtin-test.c:458
    #6 0x561cf4d7ba53 in __cmd_test tests/builtin-test.c:679
    #7 0x561cf4d7ba53 in cmd_test tests/builtin-test.c:825
    #8 0x561cf4de7d04 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #9 0x561cf4c71a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #10 0x561cf4c71a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #11 0x561cf4c71a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #12 0x7fc44e042d09 in __libc_start_main ../csu/libc-start.c:308

    ...
  test child finished with 1
  ---- end ----
  Number of exit events of a simple workload: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Mar 16, 2021
The evlist has the maps with its own refcounts so we don't need to set
the pointers to NULL.  Otherwise following error was reported by Asan.

Also change the goto label since it doesn't need to have two.

  # perf test -v 25
  25: Software clock events period values        :
  --- start ---
  test child forked, pid 149154
  mmap size 528384B
  mmap size 528384B

  =================================================================
  ==149154==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fef5cd071f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x56260d5e8b8e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23
    #2 0x56260d3df7a9 in thread_map__new_by_tid util/thread_map.c:63
    #3 0x56260d2ac6b2 in __test__sw_clock_freq tests/sw-clock.c:65
    #4 0x56260d26d8fb in run_test tests/builtin-test.c:428
    #5 0x56260d26d8fb in test_and_print tests/builtin-test.c:458
    #6 0x56260d26fa53 in __cmd_test tests/builtin-test.c:679
    #7 0x56260d26fa53 in cmd_test tests/builtin-test.c:825
    #8 0x56260d2dbb64 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #9 0x56260d165a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #10 0x56260d165a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #11 0x56260d165a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #12 0x7fef5c83cd09 in __libc_start_main ../csu/libc-start.c:308

    ...
  test child finished with 1
  ---- end ----
  Software clock events period values      : FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Mar 16, 2021
The evlist and the cpu/thread maps should be released together.
Otherwise following error was reported by Asan.

Note that this test still has memory leaks in DSOs so it still fails
even after this change.  I'll take a look at that too.

  # perf test -v 26
  26: Object code reading                        :
  --- start ---
  test child forked, pid 154184
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  symsrc__init: cannot get elf header.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
  Parsing event 'cycles'
  mmap size 528384B
  ...
  =================================================================
  ==154184==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 439 byte(s) in 1 object(s) allocated from:
    #0 0x7fcb66e77037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x55ad9b7e821e in dso__new_id util/dso.c:1256
    #2 0x55ad9b8cfd4a in __machine__addnew_vdso util/vdso.c:132
    #3 0x55ad9b8cfd4a in machine__findnew_vdso util/vdso.c:347
    #4 0x55ad9b845b7e in map__new util/map.c:176
    #5 0x55ad9b8415a2 in machine__process_mmap2_event util/machine.c:1787
    #6 0x55ad9b8fab16 in perf_tool__process_synth_event util/synthetic-events.c:64
    #7 0x55ad9b8fab16 in perf_event__synthesize_mmap_events util/synthetic-events.c:499
    #8 0x55ad9b8fbfdf in __event__synthesize_thread util/synthetic-events.c:741
    #9 0x55ad9b8ff3e3 in perf_event__synthesize_thread_map util/synthetic-events.c:833
    #10 0x55ad9b738585 in do_test_code_reading tests/code-reading.c:608
    #11 0x55ad9b73b25d in test__code_reading tests/code-reading.c:722
    #12 0x55ad9b6f28fb in run_test tests/builtin-test.c:428
    #13 0x55ad9b6f28fb in test_and_print tests/builtin-test.c:458
    #14 0x55ad9b6f4a53 in __cmd_test tests/builtin-test.c:679
    #15 0x55ad9b6f4a53 in cmd_test tests/builtin-test.c:825
    #16 0x55ad9b760cc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #17 0x55ad9b5eaa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #18 0x55ad9b5eaa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #19 0x55ad9b5eaa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #20 0x7fcb669acd09 in __libc_start_main ../csu/libc-start.c:308

    ...
  SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Object code reading: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Mar 16, 2021
The evlist and the cpu/thread maps should be released together.
Otherwise following error was reported by Asan.

  $ perf test -v 28
  28: Use a dummy software event to keep tracking:
  --- start ---
  test child forked, pid 156810
  mmap size 528384B

  =================================================================
  ==156810==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f637d2bce8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55cc6295cffa in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x55cc6295da1f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x55cc6295e1df in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x55cc6295e1df in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x55cc626287cf in test__keep_tracking tests/keep-tracking.c:84
    #6 0x55cc625e38fb in run_test tests/builtin-test.c:428
    #7 0x55cc625e38fb in test_and_print tests/builtin-test.c:458
    #8 0x55cc625e5a53 in __cmd_test tests/builtin-test.c:679
    #9 0x55cc625e5a53 in cmd_test tests/builtin-test.c:825
    #10 0x55cc62651cc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x55cc624dba88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x55cc624dba88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x55cc624dba88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7f637cdf2d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Use a dummy software event to keep tracking: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Mar 16, 2021
The evlist and cpu/thread maps should be released together.
Otherwise the following error was reported by Asan.

  $ perf test -v 35
  35: Track with sched_switch                    :
  --- start ---
  test child forked, pid 159287
  Using CPUID GenuineIntel-6-8E-C
  mmap size 528384B
  1295 events recorded

  =================================================================
  ==159287==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7fa28d9a2e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x5652f5a5affa in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x5652f5a5ba1f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x5652f5a5c1df in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x5652f5a5c1df in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x5652f5723bbf in test__switch_tracking tests/switch-tracking.c:350
    #6 0x5652f56e18fb in run_test tests/builtin-test.c:428
    #7 0x5652f56e18fb in test_and_print tests/builtin-test.c:458
    #8 0x5652f56e3a53 in __cmd_test tests/builtin-test.c:679
    #9 0x5652f56e3a53 in cmd_test tests/builtin-test.c:825
    #10 0x5652f574fcc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x5652f55d9a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x5652f55d9a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x5652f55d9a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7fa28d4d8d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Track with sched_switch: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Mar 16, 2021
It missed to call perf_thread_map__put() after using the map.

  $ perf test -v 43
  43: Synthesize thread map                      :
  --- start ---
  test child forked, pid 162640

  =================================================================
  ==162640==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fd48cdaa1f8 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x563e6d5f8d0e in perf_thread_map__realloc /home/namhyung/project/linux/tools/lib/perf/threadmap.c:23
    #2 0x563e6d3ef69a in thread_map__new_by_pid util/thread_map.c:46
    #3 0x563e6d2cec90 in test__thread_map_synthesize tests/thread-map.c:97
    #4 0x563e6d27d8fb in run_test tests/builtin-test.c:428
    #5 0x563e6d27d8fb in test_and_print tests/builtin-test.c:458
    #6 0x563e6d27fa53 in __cmd_test tests/builtin-test.c:679
    #7 0x563e6d27fa53 in cmd_test tests/builtin-test.c:825
    #8 0x563e6d2ebce4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #9 0x563e6d175a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #10 0x563e6d175a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #11 0x563e6d175a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #12 0x7fd48c8dfd09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 8224 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Synthesize thread map: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Mar 16, 2021
It should be released after printing the map.

  $ perf test -v 52
  52: Print cpu map                              :
  --- start ---
  test child forked, pid 172233

  =================================================================
  ==172233==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 156 byte(s) in 1 object(s) allocated from:
    #0 0x7fc472518e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55e63b378f7a in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x55e63b37a05c in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:237
    #3 0x55e63b056d16 in cpu_map_print tests/cpumap.c:102
    #4 0x55e63b056d16 in test__cpu_map_print tests/cpumap.c:120
    #5 0x55e63afff8fb in run_test tests/builtin-test.c:428
    #6 0x55e63afff8fb in test_and_print tests/builtin-test.c:458
    #7 0x55e63b001a53 in __cmd_test tests/builtin-test.c:679
    #8 0x55e63b001a53 in cmd_test tests/builtin-test.c:825
    #9 0x55e63b06dc44 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #10 0x55e63aef7a88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #11 0x55e63aef7a88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #12 0x55e63aef7a88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #13 0x7fc47204ed09 in __libc_start_main ../csu/libc-start.c:308
  ...

  SUMMARY: AddressSanitizer: 448 byte(s) leaked in 7 allocation(s).
  test child finished with 1
  ---- end ----
  Print cpu map: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Mar 16, 2021
It should release the maps at the end.

  $ perf test -v 71
  71: Convert perf time to TSC                   :
  --- start ---
  test child forked, pid 178744
  mmap size 528384B
  1st event perf time 59207256505278 tsc 13187166645142
  rdtsc          time 59207256542151 tsc 13187166723020
  2nd event perf time 59207256543749 tsc 13187166726393

  =================================================================
  ==178744==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7faf601f9e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55b620cfc00a in cpu_map__trim_new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:79
    #2 0x55b620cfca2f in perf_cpu_map__read /home/namhyung/project/linux/tools/lib/perf/cpumap.c:149
    #3 0x55b620cfd1ef in cpu_map__read_all_cpu_map /home/namhyung/project/linux/tools/lib/perf/cpumap.c:166
    #4 0x55b620cfd1ef in perf_cpu_map__new /home/namhyung/project/linux/tools/lib/perf/cpumap.c:181
    #5 0x55b6209ef1b2 in test__perf_time_to_tsc tests/perf-time-to-tsc.c:73
    #6 0x55b6209828fb in run_test tests/builtin-test.c:428
    #7 0x55b6209828fb in test_and_print tests/builtin-test.c:458
    #8 0x55b620984a53 in __cmd_test tests/builtin-test.c:679
    #9 0x55b620984a53 in cmd_test tests/builtin-test.c:825
    #10 0x55b6209f0cd4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #11 0x55b62087aa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #12 0x55b62087aa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #13 0x55b62087aa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #14 0x7faf5fd2fd09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 72 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Convert perf time to TSC: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Mar 16, 2021
I got a segfault when using -r option with event groups.  The option
makes it run the workload multiple times and it will reuse the evlist
and evsel for each run.

While most of resources are allocated and freed properly, the id hash
in the evlist was not and it resulted in the bug.  You can see it with
the address sanitizer like below:

  $ perf stat -r 100 -e '{cycles,instructions}' true
  =================================================================
  ==693052==ERROR: AddressSanitizer: heap-use-after-free on
      address 0x6080000003d0 at pc 0x558c57732835 bp 0x7fff1526adb0 sp 0x7fff1526ada8
  WRITE of size 8 at 0x6080000003d0 thread T0
    #0 0x558c57732834 in hlist_add_head /home/namhyung/project/linux/tools/include/linux/list.h:644
    #1 0x558c57732834 in perf_evlist__id_hash /home/namhyung/project/linux/tools/lib/perf/evlist.c:237
    #2 0x558c57732834 in perf_evlist__id_add /home/namhyung/project/linux/tools/lib/perf/evlist.c:244
    #3 0x558c57732834 in perf_evlist__id_add_fd /home/namhyung/project/linux/tools/lib/perf/evlist.c:285
    #4 0x558c5747733e in store_evsel_ids util/evsel.c:2765
    #5 0x558c5747733e in evsel__store_ids util/evsel.c:2782
    #6 0x558c5730b717 in __run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:895
    #7 0x558c5730b717 in run_perf_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:1014
    #8 0x558c5730b717 in cmd_stat /home/namhyung/project/linux/tools/perf/builtin-stat.c:2446
    #9 0x558c57427c24 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #10 0x558c572b1a48 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #11 0x558c572b1a48 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #12 0x558c572b1a48 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #13 0x7fcadb9f7d09 in __libc_start_main ../csu/libc-start.c:308
    #14 0x558c572b60f9 in _start (/home/namhyung/project/linux/tools/perf/perf+0x45d0f9)

Actually the nodes in the hash table are struct perf_stream_id and
they were freed in the previous run.  Fix it by resetting the hash.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210225035148.778569-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Apr 12, 2021
I got several memory leak reports from Asan with a simple command.  It
was because VDSO is not released due to the refcount.  Like in
__dsos_addnew_id(), it should put the refcount after adding to the list.

  $ perf record true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ]

  =================================================================
  ==692599==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 439 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256
    #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132
    #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347
    #4 0x559bce50826c in map__new util/map.c:175
    #5 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #6 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #7 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #8 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #9 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #10 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #11 0x559bce519bea in perf_session__process_events util/session.c:2297
    #12 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #13 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #14 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #15 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #16 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #17 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #18 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #19 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  Indirect leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169
    #2 0x559bce50821b in map__new util/map.c:168
    #3 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #4 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #5 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #6 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #7 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #8 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #9 0x559bce519bea in perf_session__process_events util/session.c:2297
    #10 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #11 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #12 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #13 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #14 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #15 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #16 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #17 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #18 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210315045641.700430-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Apr 12, 2021
For each device, the nosy driver allocates a pcilynx structure.
A use-after-free might happen in the following scenario:

 1. Open nosy device for the first time and call ioctl with command
    NOSY_IOC_START, then a new client A will be malloced and added to
    doubly linked list.
 2. Open nosy device for the second time and call ioctl with command
    NOSY_IOC_START, then a new client B will be malloced and added to
    doubly linked list.
 3. Call ioctl with command NOSY_IOC_START for client A, then client A
    will be readded to the doubly linked list. Now the doubly linked
    list is messed up.
 4. Close the first nosy device and nosy_release will be called. In
    nosy_release, client A will be unlinked and freed.
 5. Close the second nosy device, and client A will be referenced,
    resulting in UAF.

The root cause of this bug is that the element in the doubly linked list
is reentered into the list.

Fix this bug by adding a check before inserting a client.  If a client
is already in the linked list, don't insert it.

The following KASAN report reveals it:

   BUG: KASAN: use-after-free in nosy_release+0x1ea/0x210
   Write of size 8 at addr ffff888102ad7360 by task poc
   CPU: 3 PID: 337 Comm: poc Not tainted 5.12.0-rc5+ #6
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
   Call Trace:
     nosy_release+0x1ea/0x210
     __fput+0x1e2/0x840
     task_work_run+0xe8/0x180
     exit_to_user_mode_prepare+0x114/0x120
     syscall_exit_to_user_mode+0x1d/0x40
     entry_SYSCALL_64_after_hwframe+0x44/0xae

   Allocated by task 337:
     nosy_open+0x154/0x4d0
     misc_open+0x2ec/0x410
     chrdev_open+0x20d/0x5a0
     do_dentry_open+0x40f/0xe80
     path_openat+0x1cf9/0x37b0
     do_filp_open+0x16d/0x390
     do_sys_openat2+0x11d/0x360
     __x64_sys_open+0xfd/0x1a0
     do_syscall_64+0x33/0x40
     entry_SYSCALL_64_after_hwframe+0x44/0xae

   Freed by task 337:
     kfree+0x8f/0x210
     nosy_release+0x158/0x210
     __fput+0x1e2/0x840
     task_work_run+0xe8/0x180
     exit_to_user_mode_prepare+0x114/0x120
     syscall_exit_to_user_mode+0x1d/0x40
     entry_SYSCALL_64_after_hwframe+0x44/0xae

   The buggy address belongs to the object at ffff888102ad7300 which belongs to the cache kmalloc-128 of size 128
   The buggy address is located 96 bytes inside of 128-byte region [ffff888102ad7300, ffff888102ad7380)

[ Modified to use 'list_empty()' inside proper lock  - Linus ]

Link: https://lore.kernel.org/lkml/1617433116-5930-1-git-send-email-zheyuma97@gmail.com/
Reported-and-tested-by: 马哲宇 (Zheyu Ma) <zheyuma97@gmail.com>
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhenyw pushed a commit that referenced this issue Apr 15, 2021
The following deadlock is detected:

  truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write).

  PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
   #0  __schedule at ffffffff818667cc
   #1  schedule at ffffffff81866de6
   #2  inode_dio_wait at ffffffff812a2d04
   #3  ocfs2_setattr at ffffffffc05f322e [ocfs2]
   #4  notify_change at ffffffff812a5a09
   #5  do_truncate at ffffffff812808f5
   #6  do_sys_ftruncate.constprop.18 at ffffffff81280cf2
   #7  sys_ftruncate at ffffffff81280d8e
   #8  do_syscall_64 at ffffffff81003949
   #9  entry_SYSCALL_64_after_hwframe at ffffffff81a001ad

dio completion path is going to complete one direct IO (decrement
inode->i_dio_count), but before that it hung at locking inode->i_rwsem:

   #0  __schedule+700 at ffffffff818667cc
   #1  schedule+54 at ffffffff81866de6
   #2  rwsem_down_write_failed+536 at ffffffff8186aa28
   #3  call_rwsem_down_write_failed+23 at ffffffff8185a1b7
   #4  down_write+45 at ffffffff81869c9d
   #5  ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
   #6  ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
   #7  dio_complete+140 at ffffffff812c873c
   #8  dio_aio_complete_work+25 at ffffffff812c89f9
   #9  process_one_work+361 at ffffffff810b1889
  #10  worker_thread+77 at ffffffff810b233d
  #11  kthread+261 at ffffffff810b7fd5
  #12  ret_from_fork+62 at ffffffff81a0035e

Thus above forms ABBA deadlock.  The same deadlock was mentioned in
upstream commit 28f5a8a ("ocfs2: should wait dio before inode lock
in ocfs2_setattr()").  It seems that that commit only removed the
cluster lock (the victim of above dead lock) from the ABBA deadlock
party.

End-user visible effects: Process hang in truncate -> ocfs2_setattr path
and other processes hang at ocfs2_dio_end_io_write path.

This is to fix the deadlock itself.  It removes inode_lock() call from
dio completion path to remove the deadlock and add ip_alloc_sem lock in
setattr path to synchronize the inode modifications.

[wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested]
  Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com

Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zhenyw pushed a commit that referenced this issue Jun 1, 2021
We inadvertently create a dependency on mmap_sem with a whole chain.

This breaks any user who wants to take a lock and call rcu_barrier(),
while also taking that lock inside mmap_sem:

<4> [604.892532] ======================================================
<4> [604.892534] WARNING: possible circular locking dependency detected
<4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ #1 Tainted: G     U
<4> [604.892537] ------------------------------------------------------
<4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock:
<4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [604.892547]
but task is already holding lock:
<4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915]
<4> [604.892592]
which lock already depends on the new lock.
<4> [604.892593]
the existing dependency chain (in reverse order) is:
<4> [604.892594]
-> #6 (reservation_ww_class_mutex){+.+.}:
<4> [604.892597]        __ww_mutex_lock.constprop.15+0xc3/0x1090
<4> [604.892598]        ww_mutex_lock+0x39/0x70
<4> [604.892600]        dma_resv_lockdep+0x10e/0x1f5
<4> [604.892602]        do_one_initcall+0x58/0x300
<4> [604.892604]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892605]        kernel_init+0x5/0x100
<4> [604.892606]        ret_from_fork+0x24/0x50
<4> [604.892607]
-> #5 (reservation_ww_class_acquire){+.+.}:
<4> [604.892609]        dma_resv_lockdep+0xec/0x1f5
<4> [604.892610]        do_one_initcall+0x58/0x300
<4> [604.892610]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892611]        kernel_init+0x5/0x100
<4> [604.892612]        ret_from_fork+0x24/0x50
<4> [604.892613]
-> #4 (&mm->mmap_sem#2){++++}:
<4> [604.892615]        __might_fault+0x63/0x90
<4> [604.892617]        _copy_to_user+0x1e/0x80
<4> [604.892619]        perf_read+0x200/0x2b0
<4> [604.892621]        vfs_read+0x96/0x160
<4> [604.892622]        ksys_read+0x9f/0xe0
<4> [604.892623]        do_syscall_64+0x4f/0x220
<4> [604.892624]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [604.892626]        __mutex_lock+0x9a/0x9c0
<4> [604.892627]        perf_event_init_cpu+0xa4/0x140
<4> [604.892629]        perf_event_init+0x19d/0x1cd
<4> [604.892630]        start_kernel+0x362/0x4e4
<4> [604.892631]        secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> #2 (pmus_lock){+.+.}:
<4> [604.892633]        __mutex_lock+0x9a/0x9c0
<4> [604.892633]        perf_event_init_cpu+0x6b/0x140
<4> [604.892635]        cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636]        _cpu_up+0xa2/0x140
<4> [604.892637]        do_cpu_up+0x61/0xa0
<4> [604.892639]        smp_init+0x57/0x96
<4> [604.892639]        kernel_init_freeable+0x87/0x1dc
<4> [604.892640]        kernel_init+0x5/0x100
<4> [604.892642]        ret_from_fork+0x24/0x50
<4> [604.892642]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [604.892643]        cpus_read_lock+0x34/0xd0
<4> [604.892644]        rcu_barrier+0xaa/0x190
<4> [604.892645]        kernel_init+0x21/0x100
<4> [604.892647]        ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [604.892649]        __lock_acquire+0x1328/0x15d0
<4> [604.892650]        lock_acquire+0xa7/0x1c0
<4> [604.892651]        __mutex_lock+0x9a/0x9c0
<4> [604.892652]        rcu_barrier+0x23/0x190
<4> [604.892680]        i915_gem_object_unbind+0x29d/0x3f0 [i915]
<4> [604.892707]        i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
<4> [604.892737]        intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
<4> [604.892767]        intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [604.892797]        intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [604.892798]        drm_atomic_helper_prepare_planes+0x85/0x110
<4> [604.892827]        intel_atomic_commit+0xda/0x390 [i915]
<4> [604.892828]        drm_atomic_helper_set_config+0x57/0xa0
<4> [604.892830]        drm_mode_setcrtc+0x1c4/0x720
<4> [604.892830]        drm_ioctl_kernel+0xb0/0xf0
<4> [604.892831]        drm_ioctl+0x2e1/0x390
<4> [604.892833]        ksys_ioctl+0x7b/0x90
<4> [604.892835]        __x64_sys_ioctl+0x11/0x20
<4> [604.892835]        do_syscall_64+0x4f/0x220
<4> [604.892836]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892837]

Changes since v1:
- Use (*values)[n++] in perf_read_one().
Changes since v2:
- Centrally allocate values.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200502171413.9133-1-chris@chris-wilson.co.uk
zhenyw pushed a commit that referenced this issue Jun 18, 2021
ASan reported a memory leak caused by info_linear not being deallocated.

The info_linear was allocated during in perf_event__synthesize_one_bpf_prog().

This patch adds the corresponding free() when bpf_prog_info_node
is freed in perf_env__purge_bpf().

  $ sudo ./perf record -- sleep 5
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.025 MB perf.data (8 samples) ]

  =================================================================
  ==297735==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 7688 byte(s) in 19 object(s) allocated from:
      #0 0x4f420f in malloc (/home/user/linux/tools/perf/perf+0x4f420f)
      #1 0xc06a74 in bpf_program__get_prog_info_linear /home/user/linux/tools/lib/bpf/libbpf.c:11113:16
      #2 0xb426fe in perf_event__synthesize_one_bpf_prog /home/user/linux/tools/perf/util/bpf-event.c:191:16
      #3 0xb42008 in perf_event__synthesize_bpf_events /home/user/linux/tools/perf/util/bpf-event.c:410:9
      #4 0x594596 in record__synthesize /home/user/linux/tools/perf/builtin-record.c:1490:8
      #5 0x58c9ac in __cmd_record /home/user/linux/tools/perf/builtin-record.c:1798:8
      #6 0x58990b in cmd_record /home/user/linux/tools/perf/builtin-record.c:2901:8
      #7 0x7b2a20 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
      #8 0x7b12ff in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
      #9 0x7b2583 in run_argv /home/user/linux/tools/perf/perf.c:409:2
      #10 0x7b0d79 in main /home/user/linux/tools/perf/perf.c:539:3
      #11 0x7fa357ef6b74 in __libc_start_main /usr/src/debug/glibc-2.33-8.fc34.x86_64/csu/../csu/libc-start.c:332:16

Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20210602224024.300485-1-rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhenyw pushed a commit that referenced this issue Jun 18, 2021
We inadvertently create a dependency on mmap_sem with a whole chain.

This breaks any user who wants to take a lock and call rcu_barrier(),
while also taking that lock inside mmap_sem:

<4> [604.892532] ======================================================
<4> [604.892534] WARNING: possible circular locking dependency detected
<4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ #1 Tainted: G     U
<4> [604.892537] ------------------------------------------------------
<4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock:
<4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [604.892547]
but task is already holding lock:
<4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915]
<4> [604.892592]
which lock already depends on the new lock.
<4> [604.892593]
the existing dependency chain (in reverse order) is:
<4> [604.892594]
-> #6 (reservation_ww_class_mutex){+.+.}:
<4> [604.892597]        __ww_mutex_lock.constprop.15+0xc3/0x1090
<4> [604.892598]        ww_mutex_lock+0x39/0x70
<4> [604.892600]        dma_resv_lockdep+0x10e/0x1f5
<4> [604.892602]        do_one_initcall+0x58/0x300
<4> [604.892604]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892605]        kernel_init+0x5/0x100
<4> [604.892606]        ret_from_fork+0x24/0x50
<4> [604.892607]
-> #5 (reservation_ww_class_acquire){+.+.}:
<4> [604.892609]        dma_resv_lockdep+0xec/0x1f5
<4> [604.892610]        do_one_initcall+0x58/0x300
<4> [604.892610]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892611]        kernel_init+0x5/0x100
<4> [604.892612]        ret_from_fork+0x24/0x50
<4> [604.892613]
-> #4 (&mm->mmap_sem#2){++++}:
<4> [604.892615]        __might_fault+0x63/0x90
<4> [604.892617]        _copy_to_user+0x1e/0x80
<4> [604.892619]        perf_read+0x200/0x2b0
<4> [604.892621]        vfs_read+0x96/0x160
<4> [604.892622]        ksys_read+0x9f/0xe0
<4> [604.892623]        do_syscall_64+0x4f/0x220
<4> [604.892624]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [604.892626]        __mutex_lock+0x9a/0x9c0
<4> [604.892627]        perf_event_init_cpu+0xa4/0x140
<4> [604.892629]        perf_event_init+0x19d/0x1cd
<4> [604.892630]        start_kernel+0x362/0x4e4
<4> [604.892631]        secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> #2 (pmus_lock){+.+.}:
<4> [604.892633]        __mutex_lock+0x9a/0x9c0
<4> [604.892633]        perf_event_init_cpu+0x6b/0x140
<4> [604.892635]        cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636]        _cpu_up+0xa2/0x140
<4> [604.892637]        do_cpu_up+0x61/0xa0
<4> [604.892639]        smp_init+0x57/0x96
<4> [604.892639]        kernel_init_freeable+0x87/0x1dc
<4> [604.892640]        kernel_init+0x5/0x100
<4> [604.892642]        ret_from_fork+0x24/0x50
<4> [604.892642]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [604.892643]        cpus_read_lock+0x34/0xd0
<4> [604.892644]        rcu_barrier+0xaa/0x190
<4> [604.892645]        kernel_init+0x21/0x100
<4> [604.892647]        ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [604.892649]        __lock_acquire+0x1328/0x15d0
<4> [604.892650]        lock_acquire+0xa7/0x1c0
<4> [604.892651]        __mutex_lock+0x9a/0x9c0
<4> [604.892652]        rcu_barrier+0x23/0x190
<4> [604.892680]        i915_gem_object_unbind+0x29d/0x3f0 [i915]
<4> [604.892707]        i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
<4> [604.892737]        intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
<4> [604.892767]        intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [604.892797]        intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [604.892798]        drm_atomic_helper_prepare_planes+0x85/0x110
<4> [604.892827]        intel_atomic_commit+0xda/0x390 [i915]
<4> [604.892828]        drm_atomic_helper_set_config+0x57/0xa0
<4> [604.892830]        drm_mode_setcrtc+0x1c4/0x720
<4> [604.892830]        drm_ioctl_kernel+0xb0/0xf0
<4> [604.892831]        drm_ioctl+0x2e1/0x390
<4> [604.892833]        ksys_ioctl+0x7b/0x90
<4> [604.892835]        __x64_sys_ioctl+0x11/0x20
<4> [604.892835]        do_syscall_64+0x4f/0x220
<4> [604.892836]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892837]

Changes since v1:
- Use (*values)[n++] in perf_read_one().
Changes since v2:
- Centrally allocate values.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200502171413.9133-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
zhenyw pushed a commit that referenced this issue Jun 30, 2021
We inadvertently create a dependency on mmap_sem with a whole chain.

This breaks any user who wants to take a lock and call rcu_barrier(),
while also taking that lock inside mmap_sem:

<4> [604.892532] ======================================================
<4> [604.892534] WARNING: possible circular locking dependency detected
<4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ #1 Tainted: G     U
<4> [604.892537] ------------------------------------------------------
<4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock:
<4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [604.892547]
but task is already holding lock:
<4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915]
<4> [604.892592]
which lock already depends on the new lock.
<4> [604.892593]
the existing dependency chain (in reverse order) is:
<4> [604.892594]
-> #6 (reservation_ww_class_mutex){+.+.}:
<4> [604.892597]        __ww_mutex_lock.constprop.15+0xc3/0x1090
<4> [604.892598]        ww_mutex_lock+0x39/0x70
<4> [604.892600]        dma_resv_lockdep+0x10e/0x1f5
<4> [604.892602]        do_one_initcall+0x58/0x300
<4> [604.892604]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892605]        kernel_init+0x5/0x100
<4> [604.892606]        ret_from_fork+0x24/0x50
<4> [604.892607]
-> #5 (reservation_ww_class_acquire){+.+.}:
<4> [604.892609]        dma_resv_lockdep+0xec/0x1f5
<4> [604.892610]        do_one_initcall+0x58/0x300
<4> [604.892610]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892611]        kernel_init+0x5/0x100
<4> [604.892612]        ret_from_fork+0x24/0x50
<4> [604.892613]
-> #4 (&mm->mmap_sem#2){++++}:
<4> [604.892615]        __might_fault+0x63/0x90
<4> [604.892617]        _copy_to_user+0x1e/0x80
<4> [604.892619]        perf_read+0x200/0x2b0
<4> [604.892621]        vfs_read+0x96/0x160
<4> [604.892622]        ksys_read+0x9f/0xe0
<4> [604.892623]        do_syscall_64+0x4f/0x220
<4> [604.892624]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [604.892626]        __mutex_lock+0x9a/0x9c0
<4> [604.892627]        perf_event_init_cpu+0xa4/0x140
<4> [604.892629]        perf_event_init+0x19d/0x1cd
<4> [604.892630]        start_kernel+0x362/0x4e4
<4> [604.892631]        secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> #2 (pmus_lock){+.+.}:
<4> [604.892633]        __mutex_lock+0x9a/0x9c0
<4> [604.892633]        perf_event_init_cpu+0x6b/0x140
<4> [604.892635]        cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636]        _cpu_up+0xa2/0x140
<4> [604.892637]        do_cpu_up+0x61/0xa0
<4> [604.892639]        smp_init+0x57/0x96
<4> [604.892639]        kernel_init_freeable+0x87/0x1dc
<4> [604.892640]        kernel_init+0x5/0x100
<4> [604.892642]        ret_from_fork+0x24/0x50
<4> [604.892642]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [604.892643]        cpus_read_lock+0x34/0xd0
<4> [604.892644]        rcu_barrier+0xaa/0x190
<4> [604.892645]        kernel_init+0x21/0x100
<4> [604.892647]        ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [604.892649]        __lock_acquire+0x1328/0x15d0
<4> [604.892650]        lock_acquire+0xa7/0x1c0
<4> [604.892651]        __mutex_lock+0x9a/0x9c0
<4> [604.892652]        rcu_barrier+0x23/0x190
<4> [604.892680]        i915_gem_object_unbind+0x29d/0x3f0 [i915]
<4> [604.892707]        i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
<4> [604.892737]        intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
<4> [604.892767]        intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [604.892797]        intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [604.892798]        drm_atomic_helper_prepare_planes+0x85/0x110
<4> [604.892827]        intel_atomic_commit+0xda/0x390 [i915]
<4> [604.892828]        drm_atomic_helper_set_config+0x57/0xa0
<4> [604.892830]        drm_mode_setcrtc+0x1c4/0x720
<4> [604.892830]        drm_ioctl_kernel+0xb0/0xf0
<4> [604.892831]        drm_ioctl+0x2e1/0x390
<4> [604.892833]        ksys_ioctl+0x7b/0x90
<4> [604.892835]        __x64_sys_ioctl+0x11/0x20
<4> [604.892835]        do_syscall_64+0x4f/0x220
<4> [604.892836]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892837]

Changes since v1:
- Use (*values)[n++] in perf_read_one().
Changes since v2:
- Centrally allocate values.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200502171413.9133-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
zhiwang1 pushed a commit that referenced this issue Jan 13, 2022
In line:
	upper = info->upper_dev;
We access upper_dev field, which is related only for particular events
(e.g. event == NETDEV_CHANGEUPPER). So, this line cause invalid memory
access for another events,
when ptr is not netdev_notifier_changeupper_info.

The KASAN logs are as follows:

[   30.123165] BUG: KASAN: stack-out-of-bounds in prestera_netdev_port_event.constprop.0+0x68/0x538 [prestera]
[   30.133336] Read of size 8 at addr ffff80000cf772b0 by task udevd/778
[   30.139866]
[   30.141398] CPU: 0 PID: 778 Comm: udevd Not tainted 5.16.0-rc3 #6
[   30.147588] Hardware name: DNI AmazonGo1 A7040 board (DT)
[   30.153056] Call trace:
[   30.155547]  dump_backtrace+0x0/0x2c0
[   30.159320]  show_stack+0x18/0x30
[   30.162729]  dump_stack_lvl+0x68/0x84
[   30.166491]  print_address_description.constprop.0+0x74/0x2b8
[   30.172346]  kasan_report+0x1e8/0x250
[   30.176102]  __asan_load8+0x98/0xe0
[   30.179682]  prestera_netdev_port_event.constprop.0+0x68/0x538 [prestera]
[   30.186847]  prestera_netdev_event_handler+0x1b4/0x1c0 [prestera]
[   30.193313]  raw_notifier_call_chain+0x74/0xa0
[   30.197860]  call_netdevice_notifiers_info+0x68/0xc0
[   30.202924]  register_netdevice+0x3cc/0x760
[   30.207190]  register_netdev+0x24/0x50
[   30.211015]  prestera_device_register+0x8a0/0xba0 [prestera]

Fixes: 3d5048c ("net: marvell: prestera: move netdev topology validation to prestera_main")
Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
Link: https://lore.kernel.org/r/20211216171714.11341-1-yevhen.orlov@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
zhiwang1 pushed a commit that referenced this issue Jan 13, 2022
The fixed commit attempts to close inject.output even if it was never
opened e.g.

  $ perf record uname
  Linux
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ]
  $ perf inject -i perf.data --vm-time-correlation=dry-run
  Segmentation fault (core dumped)
  $ gdb --quiet perf
  Reading symbols from perf...
  (gdb) r inject -i perf.data --vm-time-correlation=dry-run
  Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

  Program received signal SIGSEGV, Segmentation fault.
  0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48
  48      iofclose.c: No such file or directory.
  (gdb) bt
  #0  0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48
  #1  0x0000557fc7b74f92 in perf_data__close (data=data@entry=0x7ffcdafa6578) at util/data.c:376
  #2  0x0000557fc7a6b807 in cmd_inject (argc=<optimized out>, argv=<optimized out>) at builtin-inject.c:1085
  #3  0x0000557fc7ac4783 in run_builtin (p=0x557fc8074878 <commands+600>, argc=4, argv=0x7ffcdafb6a60) at perf.c:313
  #4  0x0000557fc7a25d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
  #5  run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409
  #6  main (argc=4, argv=0x7ffcdafb6a60) at perf.c:539
  (gdb)

Fixes: 02e6246 ("perf inject: Close inject.output on exit")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211213084829.114772-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhiwang1 pushed a commit that referenced this issue Jan 13, 2022
The fixed commit attempts to get the output file descriptor even if the
file was never opened e.g.

  $ perf record uname
  Linux
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ]
  $ perf inject -i perf.data --vm-time-correlation=dry-run
  Segmentation fault (core dumped)
  $ gdb --quiet perf
  Reading symbols from perf...
  (gdb) r inject -i perf.data --vm-time-correlation=dry-run
  Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

  Program received signal SIGSEGV, Segmentation fault.
  __GI___fileno (fp=0x0) at fileno.c:35
  35      fileno.c: No such file or directory.
  (gdb) bt
  #0  __GI___fileno (fp=0x0) at fileno.c:35
  #1  0x00005621e48dd987 in perf_data__fd (data=0x7fff4c68bd08) at util/data.h:72
  #2  perf_data__fd (data=0x7fff4c68bd08) at util/data.h:69
  #3  cmd_inject (argc=<optimized out>, argv=0x7fff4c69c1f0) at builtin-inject.c:1017
  #4  0x00005621e4936783 in run_builtin (p=0x5621e4ee6878 <commands+600>, argc=4, argv=0x7fff4c69c1f0) at perf.c:313
  #5  0x00005621e4897d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
  #6  run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409
  #7  main (argc=4, argv=0x7fff4c69c1f0) at perf.c:539
  (gdb)

Fixes: 0ae0389 ("perf tools: Pass a fd to perf_file_header__read_pipe()")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20211213084829.114772-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zhiwang1 pushed a commit that referenced this issue Feb 17, 2022
Noticed the below warning while running a pytorch workload on vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.

[  +0.000003] WARNING: possible recursive locking detected
[  +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted
[  +0.000004] --------------------------------------------
[  +0.000002] python/4822 is trying to acquire lock:
[  +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000203]
              but task is already holding lock:
[  +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.000017]
              other info that might help us debug this:
[  +0.000002]  Possible unsafe locking scenario:

[  +0.000003]        CPU0
[  +0.000002]        ----
[  +0.000002]   lock(reservation_ww_class_mutex);
[  +0.000004]   lock(reservation_ww_class_mutex);
[  +0.000003]
               *** DEADLOCK ***

[  +0.000002]  May be due to missing lock nesting notation

[  +0.000003] 7 locks held by python/4822:
[  +0.000003]  #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at:
kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu]
[  +0.000232]  #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu]
[  +0.000241]  #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu]
[  +0.000236]  #3: ffffb2b35606fd28
(reservation_ww_class_acquire){+.+.}-{0:0}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu]
[  +0.000235]  #4: ffff932cbb7181f8
(reservation_ww_class_mutex){+.+.}-{3:3}, at:
ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.000015]  #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at:
drm_dev_enter+0x5/0xa0 [drm]
[  +0.000038]  #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3},
at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu]
[  +0.000195]
              stack backtrace:
[  +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted
5.13.0-kfd-rajneesh #1030
[  +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02
08/29/2018
[  +0.000003] Call Trace:
[  +0.000003]  dump_stack+0x6d/0x89
[  +0.000010]  __lock_acquire+0xb93/0x1a90
[  +0.000009]  lock_acquire+0x25d/0x2d0
[  +0.000005]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  ? lock_is_held_type+0xa2/0x110
[  +0.000006]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  __ww_mutex_lock.constprop.17+0xca/0x1060
[  +0.000007]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  ? lock_release+0x13f/0x270
[  +0.000005]  ? lock_is_held_type+0xa2/0x110
[  +0.000006]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000185]  ttm_bo_release+0x4c6/0x580 [ttm]
[  +0.000010]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[  +0.000183]  amdgpu_vm_free_table+0x76/0xa0 [amdgpu]
[  +0.000189]  amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu]
[  +0.000189]  amdgpu_vm_update_ptes+0x411/0x770 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update+0x251/0x610 [amdgpu]
[  +0.000191]  update_gpuvm_pte+0xcc/0x290 [amdgpu]
[  +0.000229]  ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu]
[  +0.000190]  amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60
[amdgpu]
[  +0.000234]  kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu]
[  +0.000218]  kfd_ioctl+0x2b9/0x600 [amdgpu]
[  +0.000216]  ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu]
[  +0.000216]  ? lock_release+0x13f/0x270
[  +0.000006]  ? __fget_files+0x107/0x1e0
[  +0.000007]  __x64_sys_ioctl+0x8b/0xd0
[  +0.000007]  do_syscall_64+0x36/0x70
[  +0.000004]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  +0.000007] RIP: 0033:0x7fbff90a7317
[  +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[  +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX:
00007fbff90a7317
[  +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI:
0000000000000004
[  +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09:
00007fbcc402d880
[  +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12:
00000000c0184b18
[  +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15:
00007fbcc402d820

Cc: Christian König <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
zhiwang1 pushed a commit that referenced this issue Mar 7, 2022
We inadvertently create a dependency on mmap_sem with a whole chain.

This breaks any user who wants to take a lock and call rcu_barrier(),
while also taking that lock inside mmap_sem:

<4> [604.892532] ======================================================
<4> [604.892534] WARNING: possible circular locking dependency detected
<4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ #1 Tainted: G     U
<4> [604.892537] ------------------------------------------------------
<4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock:
<4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [604.892547]
but task is already holding lock:
<4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915]
<4> [604.892592]
which lock already depends on the new lock.
<4> [604.892593]
the existing dependency chain (in reverse order) is:
<4> [604.892594]
-> #6 (reservation_ww_class_mutex){+.+.}:
<4> [604.892597]        __ww_mutex_lock.constprop.15+0xc3/0x1090
<4> [604.892598]        ww_mutex_lock+0x39/0x70
<4> [604.892600]        dma_resv_lockdep+0x10e/0x1f5
<4> [604.892602]        do_one_initcall+0x58/0x300
<4> [604.892604]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892605]        kernel_init+0x5/0x100
<4> [604.892606]        ret_from_fork+0x24/0x50
<4> [604.892607]
-> #5 (reservation_ww_class_acquire){+.+.}:
<4> [604.892609]        dma_resv_lockdep+0xec/0x1f5
<4> [604.892610]        do_one_initcall+0x58/0x300
<4> [604.892610]        kernel_init_freeable+0x17b/0x1dc
<4> [604.892611]        kernel_init+0x5/0x100
<4> [604.892612]        ret_from_fork+0x24/0x50
<4> [604.892613]
-> #4 (&mm->mmap_sem#2){++++}:
<4> [604.892615]        __might_fault+0x63/0x90
<4> [604.892617]        _copy_to_user+0x1e/0x80
<4> [604.892619]        perf_read+0x200/0x2b0
<4> [604.892621]        vfs_read+0x96/0x160
<4> [604.892622]        ksys_read+0x9f/0xe0
<4> [604.892623]        do_syscall_64+0x4f/0x220
<4> [604.892624]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [604.892626]        __mutex_lock+0x9a/0x9c0
<4> [604.892627]        perf_event_init_cpu+0xa4/0x140
<4> [604.892629]        perf_event_init+0x19d/0x1cd
<4> [604.892630]        start_kernel+0x362/0x4e4
<4> [604.892631]        secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> #2 (pmus_lock){+.+.}:
<4> [604.892633]        __mutex_lock+0x9a/0x9c0
<4> [604.892633]        perf_event_init_cpu+0x6b/0x140
<4> [604.892635]        cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636]        _cpu_up+0xa2/0x140
<4> [604.892637]        do_cpu_up+0x61/0xa0
<4> [604.892639]        smp_init+0x57/0x96
<4> [604.892639]        kernel_init_freeable+0x87/0x1dc
<4> [604.892640]        kernel_init+0x5/0x100
<4> [604.892642]        ret_from_fork+0x24/0x50
<4> [604.892642]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [604.892643]        cpus_read_lock+0x34/0xd0
<4> [604.892644]        rcu_barrier+0xaa/0x190
<4> [604.892645]        kernel_init+0x21/0x100
<4> [604.892647]        ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [604.892649]        __lock_acquire+0x1328/0x15d0
<4> [604.892650]        lock_acquire+0xa7/0x1c0
<4> [604.892651]        __mutex_lock+0x9a/0x9c0
<4> [604.892652]        rcu_barrier+0x23/0x190
<4> [604.892680]        i915_gem_object_unbind+0x29d/0x3f0 [i915]
<4> [604.892707]        i915_gem_object_pin_to_display_plane+0x141/0x270 [i915]
<4> [604.892737]        intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915]
<4> [604.892767]        intel_plane_pin_fb+0x3f/0xd0 [i915]
<4> [604.892797]        intel_prepare_plane_fb+0x13b/0x5c0 [i915]
<4> [604.892798]        drm_atomic_helper_prepare_planes+0x85/0x110
<4> [604.892827]        intel_atomic_commit+0xda/0x390 [i915]
<4> [604.892828]        drm_atomic_helper_set_config+0x57/0xa0
<4> [604.892830]        drm_mode_setcrtc+0x1c4/0x720
<4> [604.892830]        drm_ioctl_kernel+0xb0/0xf0
<4> [604.892831]        drm_ioctl+0x2e1/0x390
<4> [604.892833]        ksys_ioctl+0x7b/0x90
<4> [604.892835]        __x64_sys_ioctl+0x11/0x20
<4> [604.892835]        do_syscall_64+0x4f/0x220
<4> [604.892836]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892837]

Changes since v1:
- Use (*values)[n++] in perf_read_one().
Changes since v2:
- Centrally allocate values.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200502171413.9133-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants