Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

user_exit_info: Print out debug dump if available #104

Merged
merged 1 commit into from
Jan 25, 2024
Merged

user_exit_info: Print out debug dump if available #104

merged 1 commit into from
Jan 25, 2024

Conversation

htejun
Copy link
Contributor

@htejun htejun commented Jan 25, 2024

When a scx scheduler aborts, it can be difficult to find out what happened because all the scheduling states are lost as the system is reverted back to the default scheduler. Kernel now captures debug dump. Print it out if available. It looks like the following:

root@test ~# os/work/tools/sched_ext/build/bin/scx_qmap -t 500
enq=5, dsp=0, delta=5, reenq=0, deq=0, core=0
enq=451, dsp=445, delta=6, reenq=0, deq=0, core=0
enq=871, dsp=868, delta=3, reenq=0, deq=0, core=0
enq=1263, dsp=1259, delta=4, reenq=8, deq=0, core=0
enq=1595, dsp=1593, delta=2, reenq=8, deq=0, core=0
enq=1870, dsp=1869, delta=1, reenq=8, deq=0, core=0
enq=1940, dsp=1939, delta=1, reenq=8, deq=0, core=0
enq=1974, dsp=1974, delta=0, reenq=15, deq=0, core=0

DEBUG DUMP
================================================================================

kworker/u16:4[173] triggered exit kind 1026:
  runnable task stall (yes[1632] failed to run for 5.314s)

Backtrace:
  scx_watchdog_workfn+0x143/0x1d0
  process_scheduled_works+0x245/0x4e0
  worker_thread+0x270/0x360
  kthread+0xeb/0x110
  ret_from_fork+0x36/0x40
  ret_from_fork_asm+0x11/0x20

Runqueue states
---------------

CPU 0   : nr_run=1 flags=0x0 cpu_rel=0 ops_qseq=435 pnt_seq=440
          curr=stress[1643] class=ext_sched_class

 *R stress          : pid=1643 state/flags=3/0x5 dsq_flags=0x0
                      ops_state/qseq=0/0 run_at=-4ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1

CPU 1   : nr_run=2 flags=0x0 cpu_rel=0 ops_qseq=398 pnt_seq=400
          curr=yes[1636] class=ext_sched_class

 *R yes             : pid=1636 state/flags=3/0x5 dsq_flags=0x0
                      ops_state/qseq=0/0 run_at=-20ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1

  R yes             : pid=1630 state/flags=3/0x1 dsq_flags=0x0
                      ops_state/qseq=2/196 run_at=-4097ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1
    irqentry_exit+0x54/0x80
    sysvec_apic_timer_interrupt+0x44/0x80
    asm_sysvec_apic_timer_interrupt+0x1b/0x20
    __fdget_pos+0x4d/0xc0
    ksys_write+0x21/0xc0
    __x64_sys_write+0x1b/0x20
    do_syscall_64+0x40/0xe0
    entry_SYSCALL_64_after_hwframe+0x46/0x4e

CPU 2   : nr_run=1 flags=0x0 cpu_rel=0 ops_qseq=411 pnt_seq=415
          curr=stress[1646] class=ext_sched_class

 *R stress          : pid=1646 state/flags=3/0x5 dsq_flags=0x0
                      ops_state/qseq=0/0 run_at=-3ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1

CPU 3   : nr_run=2 flags=0x0 cpu_rel=0 ops_qseq=383 pnt_seq=382
          curr=swapper/3[0] class=idle_sched_class

  R yes             : pid=1632 state/flags=3/0x1 dsq_flags=0x0
                      ops_state/qseq=2/118 run_at=-5314ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1
    irqentry_exit+0x54/0x80
    sysvec_apic_timer_interrupt+0x44/0x80
    asm_sysvec_apic_timer_interrupt+0x1b/0x20
    syscall_enter_from_user_mode+0x39/0x1b0
    do_syscall_64+0x21/0xe0
    entry_SYSCALL_64_after_hwframe+0x46/0x4e

  R stress          : pid=1648 state/flags=3/0x1 dsq_flags=0x0
                      ops_state/qseq=2/382 run_at=-5ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1
    exit_to_user_mode_prepare+0x96/0x120
    irqentry_exit_to_user_mode+0x9/0x40
    irqentry_exit+0x31/0x80
    sysvec_apic_timer_interrupt+0x44/0x80
    asm_sysvec_apic_timer_interrupt+0x1b/0x20

CPU 4   : nr_run=2 flags=0x0 cpu_rel=0 ops_qseq=412 pnt_seq=415
          curr=stress[1644] class=ext_sched_class

 *R stress          : pid=1644 state/flags=3/0x5 dsq_flags=0x0
                      ops_state/qseq=0/0 run_at=-5ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1

  R stress          : pid=1642 state/flags=3/0x1 dsq_flags=0x0
                      ops_state/qseq=2/273 run_at=-2775ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1
    exit_to_user_mode_prepare+0x96/0x120
    irqentry_exit_to_user_mode+0x9/0x40
    irqentry_exit+0x31/0x80
    sysvec_apic_timer_interrupt+0x44/0x80
    asm_sysvec_apic_timer_interrupt+0x1b/0x20

CPU 5   : nr_run=1 flags=0x0 cpu_rel=0 ops_qseq=383 pnt_seq=387
          curr=stress[1649] class=ext_sched_class

 *R stress          : pid=1649 state/flags=3/0x5 dsq_flags=0x0
                      ops_state/qseq=0/0 run_at=-6ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1

CPU 6   : nr_run=2 flags=0x0 cpu_rel=0 ops_qseq=391 pnt_seq=390
          curr=stress[1645] class=ext_sched_class

 *R stress          : pid=1645 state/flags=3/0x5 dsq_flags=0x0
                      ops_state/qseq=0/0 run_at=-18ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1

  R stress          : pid=1647 state/flags=3/0x1 dsq_flags=0x0
                      ops_state/qseq=2/71 run_at=-6470ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1
    exit_to_user_mode_prepare+0x96/0x120
    irqentry_exit_to_user_mode+0x9/0x40
    irqentry_exit+0x31/0x80
    sysvec_apic_timer_interrupt+0x44/0x80
    asm_sysvec_apic_timer_interrupt+0x1b/0x20

CPU 7   : nr_run=2 flags=0x0 cpu_rel=0 ops_qseq=357 pnt_seq=373
          curr=kworker/u16:4[173] class=ext_sched_class

 *R kworker/u16:4   : pid=173 state/flags=3/0xd dsq_flags=0x0
                      ops_state/qseq=0/0 run_at=+0ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1
    scx_ops_error_irq_workfn+0x312/0x400
    irq_work_run_list+0x7d/0xc0
    irq_work_run+0x18/0x30
    __sysvec_irq_work+0x38/0x100
    sysvec_irq_work+0x69/0x80
    asm_sysvec_irq_work+0x1b/0x20
    scx_watchdog_workfn+0x168/0x1d0
    process_scheduled_works+0x245/0x4e0
    worker_thread+0x270/0x360
    kthread+0xeb/0x110
    ret_from_fork+0x36/0x40
    ret_from_fork_asm+0x11/0x20

  R yes             : pid=1634 state/flags=3/0x1 dsq_flags=0x0
                      ops_state/qseq=2/343 run_at=-1473ms
                      dsq_id=(n/a) sticky/holding_cpu=-1/-1
    irqentry_exit+0x54/0x80
    sysvec_apic_timer_interrupt+0x44/0x80
    asm_sysvec_apic_timer_interrupt+0x1b/0x20
    syscall_enter_from_user_mode+0x39/0x1b0
    do_syscall_64+0x21/0xe0
    entry_SYSCALL_64_after_hwframe+0x46/0x4e

================================================================================

EXIT: runnable task stall (yes[1632] failed to run for 5.314s)

Signed-off-by: Tejun Heo <tj@kernel.org>
@htejun htejun merged commit 7117a22 into main Jan 25, 2024
0 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant