Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools, bpftool: Avoid array index warnings. #276

Closed
wants to merge 3 commits into from

Conversation

kernel-patches-bot
Copy link

Pull request for series with
subject: tools, bpftool: Avoid array index warnings.
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=371913

kernel-patches-bot and others added 3 commits October 27, 2020 18:41
The bpf_caps array is shorter without CAP_BPF, avoid out of bounds reads
if this isn't defined. Working around this avoids -Wno-array-bounds with
clang.

Signed-off-by: Ian Rogers <irogers@google.com>
Avoid an unused variable warning.

Signed-off-by: Ian Rogers <irogers@google.com>
@kernel-patches-bot
Copy link
Author

Master branch: c66dca9
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=371913
version: 1

@kernel-patches-bot
Copy link
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=371913 irrelevant now. Closing PR.

@kernel-patches-bot kernel-patches-bot deleted the series/371913=>bpf branch November 2, 2020 18:13
kernel-patches-bot pushed a commit that referenced this pull request Jan 20, 2021
This reverts commit 367c820.

lockup_detector_init() makes heavy use of per-cpu variables and must be
called with preemption disabled. Usually, it's handled early during boot
in kernel_init_freeable(), before SMP has been initialised.

Since we do not know whether or not our PMU interrupt can be signalled
as an NMI until considerably later in the boot process, the Arm PMU
driver attempts to re-initialise the lockup detector off the back of a
device_initcall(). Unfortunately, this is called from preemptible
context and results in the following splat:

  | BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
  | caller is debug_smp_processor_id+0x20/0x2c
  | CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276
  | Hardware name: linux,dummy-virt (DT)
  | Call trace:
  |   dump_backtrace+0x0/0x3c0
  |   show_stack+0x20/0x6c
  |   dump_stack+0x2f0/0x42c
  |   check_preemption_disabled+0x1cc/0x1dc
  |   debug_smp_processor_id+0x20/0x2c
  |   hardlockup_detector_event_create+0x34/0x18c
  |   hardlockup_detector_perf_init+0x2c/0x134
  |   watchdog_nmi_probe+0x18/0x24
  |   lockup_detector_init+0x44/0xa8
  |   armv8_pmu_driver_init+0x54/0x78
  |   do_one_initcall+0x184/0x43c
  |   kernel_init_freeable+0x368/0x380
  |   kernel_init+0x1c/0x1cc
  |   ret_from_fork+0x10/0x30

Rather than bodge this with raw_smp_processor_id() or randomly disabling
preemption, simply revert the culprit for now until we figure out how to
do this properly.

Reported-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20201221162249.3119-1-lecopzer.chen@mediatek.com
Link: https://lore.kernel.org/r/20210112221855.10666-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
kernel-patches-bot pushed a commit that referenced this pull request Aug 11, 2021
…CKOPT

Add verifier ctx test to call bpf_get_netns_cookie from
cgroup/setsockopt.

  #269/p pass ctx or null check, 1: ctx Did not run the program (not supported) OK
  #270/p pass ctx or null check, 2: null Did not run the program (not supported) OK
  #271/p pass ctx or null check, 3: 1 OK
  #272/p pass ctx or null check, 4: ctx - const OK
  #273/p pass ctx or null check, 5: null (connect) Did not run the program (not supported) OK
  #274/p pass ctx or null check, 6: null (bind) Did not run the program (not supported) OK
  #275/p pass ctx or null check, 7: ctx (bind) Did not run the program (not supported) OK
  #276/p pass ctx or null check, 8: null (bind) OK
  #277/p pass ctx or null check, 9: ctx (cgroup/setsockopt) Did not run the program (not supported) OK
  #278/p pass ctx or null check, 10: null (cgroup/setsockopt) Did not run the program (not supported) OK

Signed-off-by: Stanislav Fomichev <sdf@google.com>
kernel-patches-bot pushed a commit that referenced this pull request Aug 12, 2021
…CKOPT

Add verifier ctx test to call bpf_get_netns_cookie from
cgroup/setsockopt.

  #269/p pass ctx or null check, 1: ctx Did not run the program (not supported) OK
  #270/p pass ctx or null check, 2: null Did not run the program (not supported) OK
  #271/p pass ctx or null check, 3: 1 OK
  #272/p pass ctx or null check, 4: ctx - const OK
  #273/p pass ctx or null check, 5: null (connect) Did not run the program (not supported) OK
  #274/p pass ctx or null check, 6: null (bind) Did not run the program (not supported) OK
  #275/p pass ctx or null check, 7: ctx (bind) Did not run the program (not supported) OK
  #276/p pass ctx or null check, 8: null (bind) OK
  #277/p pass ctx or null check, 9: ctx (cgroup/setsockopt) Did not run the program (not supported) OK
  #278/p pass ctx or null check, 10: null (cgroup/setsockopt) Did not run the program (not supported) OK

Signed-off-by: Stanislav Fomichev <sdf@google.com>
kernel-patches-bot pushed a commit that referenced this pull request Aug 12, 2021
…CKOPT

Add verifier ctx test to call bpf_get_netns_cookie from
cgroup/setsockopt.

  #269/p pass ctx or null check, 1: ctx Did not run the program (not supported) OK
  #270/p pass ctx or null check, 2: null Did not run the program (not supported) OK
  #271/p pass ctx or null check, 3: 1 OK
  #272/p pass ctx or null check, 4: ctx - const OK
  #273/p pass ctx or null check, 5: null (connect) Did not run the program (not supported) OK
  #274/p pass ctx or null check, 6: null (bind) Did not run the program (not supported) OK
  #275/p pass ctx or null check, 7: ctx (bind) Did not run the program (not supported) OK
  #276/p pass ctx or null check, 8: null (bind) OK
  #277/p pass ctx or null check, 9: ctx (cgroup/setsockopt) Did not run the program (not supported) OK
  #278/p pass ctx or null check, 10: null (cgroup/setsockopt) Did not run the program (not supported) OK

Signed-off-by: Stanislav Fomichev <sdf@google.com>
kernel-patches-bot pushed a commit that referenced this pull request Aug 13, 2021
…CKOPT

Add verifier ctx test to call bpf_get_netns_cookie from
cgroup/setsockopt.

  #269/p pass ctx or null check, 1: ctx Did not run the program (not supported) OK
  #270/p pass ctx or null check, 2: null Did not run the program (not supported) OK
  #271/p pass ctx or null check, 3: 1 OK
  #272/p pass ctx or null check, 4: ctx - const OK
  #273/p pass ctx or null check, 5: null (connect) Did not run the program (not supported) OK
  #274/p pass ctx or null check, 6: null (bind) Did not run the program (not supported) OK
  #275/p pass ctx or null check, 7: ctx (bind) Did not run the program (not supported) OK
  #276/p pass ctx or null check, 8: null (bind) OK
  #277/p pass ctx or null check, 9: ctx (cgroup/setsockopt) Did not run the program (not supported) OK
  #278/p pass ctx or null check, 10: null (cgroup/setsockopt) Did not run the program (not supported) OK

Signed-off-by: Stanislav Fomichev <sdf@google.com>
kernel-patches-bot pushed a commit that referenced this pull request Aug 13, 2021
…CKOPT

Add verifier ctx test to call bpf_get_netns_cookie from
cgroup/setsockopt.

  #269/p pass ctx or null check, 1: ctx Did not run the program (not supported) OK
  #270/p pass ctx or null check, 2: null Did not run the program (not supported) OK
  #271/p pass ctx or null check, 3: 1 OK
  #272/p pass ctx or null check, 4: ctx - const OK
  #273/p pass ctx or null check, 5: null (connect) Did not run the program (not supported) OK
  #274/p pass ctx or null check, 6: null (bind) Did not run the program (not supported) OK
  #275/p pass ctx or null check, 7: ctx (bind) Did not run the program (not supported) OK
  #276/p pass ctx or null check, 8: null (bind) OK
  #277/p pass ctx or null check, 9: ctx (cgroup/setsockopt) Did not run the program (not supported) OK
  #278/p pass ctx or null check, 10: null (cgroup/setsockopt) Did not run the program (not supported) OK

Signed-off-by: Stanislav Fomichev <sdf@google.com>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jan 5, 2024
…test_init()

The synth_event_gen_test module can be built in, if someone wants to run
the tests at boot up and not have to load them.

The synth_event_gen_test_init() function creates and enables the synthetic
events and runs its tests.

The synth_event_gen_test_exit() disables the events it created and
destroys the events.

If the module is builtin, the events are never disabled. The issue is, the
events should be disable after the tests are run. This could be an issue
if the rest of the boot up tests are enabled, as they expect the events to
be in a known state before testing. That known state happens to be
disabled.

When CONFIG_SYNTH_EVENT_GEN_TEST=y and CONFIG_EVENT_TRACE_STARTUP_TEST=y
a warning will trigger:

 Running tests on trace events:
 Testing event create_synth_test:
 Enabled event during self test!
 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 1 at kernel/trace/trace_events.c:4150 event_trace_self_tests+0x1c2/0x480
 Modules linked in:
 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc2-test-00031-gb803d7c664d5-dirty #276
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 RIP: 0010:event_trace_self_tests+0x1c2/0x480
 Code: bb e8 a2 ab 5d fc 48 8d 7b 48 e8 f9 3d 99 fc 48 8b 73 48 40 f6 c6 01 0f 84 d6 fe ff ff 48 c7 c7 20 b6 ad bb e8 7f ab 5d fc 90 <0f> 0b 90 48 89 df e8 d3 3d 99 fc 48 8b 1b 4c 39 f3 0f 85 2c ff ff
 RSP: 0000:ffffc9000001fdc0 EFLAGS: 00010246
 RAX: 0000000000000029 RBX: ffff88810399ca80 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffffffffb9f19478 RDI: ffff88823c734e64
 RBP: ffff88810399f300 R08: 0000000000000000 R09: fffffbfff79eb32a
 R10: ffffffffbcf59957 R11: 0000000000000001 R12: ffff888104068090
 R13: ffffffffbc89f0a0 R14: ffffffffbc8a0f08 R15: 0000000000000078
 FS:  0000000000000000(0000) GS:ffff88823c700000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001f6282001 CR4: 0000000000170ef0
 Call Trace:
  <TASK>
  ? __warn+0xa5/0x200
  ? event_trace_self_tests+0x1c2/0x480
  ? report_bug+0x1f6/0x220
  ? handle_bug+0x6f/0x90
  ? exc_invalid_op+0x17/0x50
  ? asm_exc_invalid_op+0x1a/0x20
  ? tracer_preempt_on+0x78/0x1c0
  ? event_trace_self_tests+0x1c2/0x480
  ? __pfx_event_trace_self_tests_init+0x10/0x10
  event_trace_self_tests_init+0x27/0xe0
  do_one_initcall+0xd6/0x3c0
  ? __pfx_do_one_initcall+0x10/0x10
  ? kasan_set_track+0x25/0x30
  ? rcu_is_watching+0x38/0x60
  kernel_init_freeable+0x324/0x450
  ? __pfx_kernel_init+0x10/0x10
  kernel_init+0x1f/0x1e0
  ? _raw_spin_unlock_irq+0x33/0x50
  ret_from_fork+0x34/0x60
  ? __pfx_kernel_init+0x10/0x10
  ret_from_fork_asm+0x1b/0x30
  </TASK>

This is because the synth_event_gen_test_init() left the synthetic events
that it created enabled. By having it disable them after testing, the
other selftests will run fine.

Link: https://lore.kernel.org/linux-trace-kernel/20231220111525.2f0f49b0@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 9fe41ef ("tracing: Add synth event generation test module")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Alexander Graf <graf@amazon.com>
Tested-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jan 26, 2024
======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
------------------------------------------------------
kworker/8:2/2676 is trying to acquire lock:
ffff9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550

but task is already holding lock:
ffff9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&svms->lock){+.+.}-{3:3}:
       __mutex_lock+0x97/0xd30
       kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
       kfd_ioctl+0x1b2/0x5d0 [amdgpu]
       __x64_sys_ioctl+0x86/0xc0
       do_syscall_64+0x39/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #1 (&mm->mmap_lock){++++}-{3:3}:
       down_read+0x42/0x160
       svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

-> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
       __lock_acquire+0x1426/0x2200
       lock_acquire+0xc1/0x2b0
       __flush_work+0x80/0x550
       __cancel_work_timer+0x109/0x190
       svm_range_bo_release+0xdc/0x1c0 [amdgpu]
       svm_range_free+0x175/0x180 [amdgpu]
       svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

other info that might help us debug this:

Chain exists of:
  (work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&svms->lock);
                               lock(&mm->mmap_lock);
                               lock(&svms->lock);
  lock((work_completion)(&svm_bo->eviction_work));

I believe this cannot really lead to a deadlock in practice, because
svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO
refcount is non-0. That means it's impossible that svm_range_bo_release
is running concurrently. However, there is no good way to annotate this.

To avoid the problem, take a BO reference in
svm_range_schedule_evict_svm_bo instead of in the worker. That way it's
impossible for a BO to get freed while eviction work is pending and the
cancel_work_sync call in svm_range_bo_release can be eliminated.

v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also
removed redundant checks that are already done in
amdkfd_fence_enable_signaling.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants