Get k(g)db working #140

Open
mpe opened this issue Apr 30, 2018 · 20 comments
Labels
bug (It's a bug), medium (Possibly not too difficult)

Comments

@mpe
Member

mpe commented Apr 30, 2018

The very basics work, e.g.:

# echo hvc0 >  /sys/module/kgdboc/parameters/kgdboc 
# echo g > /proc/sysrq-trigger
Entering kdb (current=0x00000000651b7ab4, pid 0) on processor 13 due to Keyboard Entry
[13]kdb> btp 1
Stack traceback for pid 1
0x000000003966c3d4        1        0  0    5   S  0x000000008ee0e51a  systemd
Call Trace:
[c0000000fea03900] [c0000000fea03960] 0xc0000000fea03960 (unreliable)
[c0000000fea03ad0] [c00000000001eb1c] __switch_to+0x34c/0x4b0
[c0000000fea03b30] [c000000000b387f0] __schedule+0x380/0xbe0
[c0000000fea03c00] [c000000000b390a4] schedule+0x54/0xd0
[c0000000fea03c30] [c000000000b40444] schedule_hrtimeout_range_clock+0x184/0x190
[c0000000fea03cc0] [c000000000419c74] ep_poll+0x344/0x430
[c0000000fea03d80] [c000000000419e64] do_epoll_wait+0x104/0x120
[c0000000fea03dd0] [c00000000041b1a4] sys_epoll_pwait+0x1b4/0x1c0
[c0000000fea03e30] [c00000000000b860] system_call+0x58/0x6c

But then other things oops, in particular the self tests blow up.

[13]kdb> btc
btc: cpu status: Currently on cpu 13
Available cpus: 0-12(I), 13, 14-15(I)
Unable to handle kernel paging request for data at address 0x98d3bc0c
Faulting instruction address: 0xc000000000151a58
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c kvm binfmt_misc vmx_crypto ip_tables x_tables autofs4 crc32c_vpmsum virtio_net
CPU: 13 PID: 0 Comm: swapper/13 Not tainted 4.17.0-rc1-gcc-6.3.1-00001-gb56aa49046fe #1403
NIP:  c000000000151a58 LR: c000000000227284 CTR: 0000000000000008
REGS: c0000001fffb6ea0 TRAP: 0300   Not tainted  (4.17.0-rc1-gcc-6.3.1-00001-gb56aa49046fe)
MSR:  8000000000001033 <SF,ME,IR,DR,RI,LE>  CR: 24004428  XER: 20000000
CFAR: c000000000008830 DAR: 0000000098d3bc0c DSISR: 40000000 SOFTE: 1 
GPR00: c00000000022a994 c0000001fffb7120 c0000000010f6400 0000000098d3bc04 
GPR04: 0000000000000010 c0000001fffb70e0 c000000001b1a6ee 0000000098d3bc04 
GPR08: ffffffffffffffbf c000000001b16400 ffffffffffffffc9 000000098d3bc040 
GPR12: c00000000022a780 c00000003ffdf300 c000000001b1a970 c000000000fcc360 
GPR16: c000000000d62140 c000000000d62178 c000000001b027c0 c000000000b8ccc0 
GPR20: c000000000d62120 0000000000000001 c000000001b19cc0 0000000000000000 
GPR24: 0000000000000000 0000000000000032 c000000001b1a6d8 0000000000000000 
GPR28: 0000000000000001 c000000001b1a7b0 c000000001b1a6d8 0000000098d3bc04 
NIP [c000000000151a58] task_curr+0x8/0x40
LR [c000000000227284] kdb_set_current_task+0x34/0xc0
Call Trace:
[c0000001fffb7120] [c000000001b1a6ee] cbuf.35602+0x16/0xcc (unreliable)
[c0000001fffb7150] [c00000000022a994] kdb_bt+0x214/0x500
[c0000001fffb7240] [c000000000226d04] kdb_parse+0x4f4/0x8b0
[c0000001fffb7310] [c00000000022aa78] kdb_bt+0x2f8/0x500
[c0000001fffb7400] [c000000000226d04] kdb_parse+0x4f4/0x8b0
[c0000001fffb74d0] [c000000000227930] kdb_main_loop+0x470/0xa20
[c0000001fffb75d0] [c00000000022bc9c] kdb_stub+0x30c/0x5e0
[c0000001fffb7650] [c00000000021df78] kgdb_cpu_enter+0x378/0x790
[c0000001fffb7750] [c00000000021e760] kgdb_handle_exception+0x190/0x2b0
[c0000001fffb7820] [c00000000004a7c4] kgdb_handle_breakpoint+0x64/0xa0
[c0000001fffb7850] [c00000000002b214] program_check_exception+0x264/0x370
[c0000001fffb78c0] [c000000000009020] program_check_common+0x170/0x180
--- interrupt: 700 at kgdb_breakpoint+0x3c/0x70
    LR = __handle_sysrq+0x12c/0x2c0
[c0000001fffb7bb0] [c000000000d29e40] flag_spec.61673+0x133f74/0x1e9e0c (unreliable)
[c0000001fffb7bd0] [c0000000006de5cc] __handle_sysrq+0x12c/0x2c0
[c0000001fffb7c70] [c0000000006f6850] hvc_poll+0x1c0/0x360
[c0000001fffb7d00] [c0000000006f7b3c] hvc_handle_interrupt+0x2c/0x60
[c0000001fffb7d30] [c00000000019ff90] __handle_irq_event_percpu+0x110/0x3c0
[c0000001fffb7e20] [c0000000001a027c] handle_irq_event_percpu+0x3c/0x90
[c0000001fffb7e60] [c0000000001a0330] handle_irq_event+0x60/0xb0
[c0000001fffb7ea0] [c0000000001a5dd8] handle_fasteoi_irq+0xc8/0x240
[c0000001fffb7ee0] [c00000000019e514] generic_handle_irq+0x54/0x80
[c0000001fffb7f10] [c0000000000190fc] __do_irq+0xbc/0x2d0
[c0000001fffb7f90] [c00000000002eca0] call_do_irq+0x14/0x24
[c0000001fe7979b0] [c0000000000193b0] do_IRQ+0xa0/0x130
[c0000001fe797a10] [c000000000008d30] hardware_interrupt_common+0x150/0x160
--- interrupt: 501 at plpar_hcall_norets+0x1c/0x28
    LR = check_and_cede_processor+0x34/0x50
[c0000001fe797d00] [c000000000952bd0] check_and_cede_processor+0x20/0x50 (unreliable)
[c0000001fe797d60] [c000000000952da0] shared_cede_loop+0x50/0x140
[c0000001fe797d90] [c00000000094fc78] cpuidle_enter_state+0xa8/0x440
[c0000001fe797df0] [c00000000015ada0] call_cpuidle+0x70/0xd0
[c0000001fe797e30] [c00000000015b4e8] do_idle+0x328/0x3a0
[c0000001fe797ec0] [c00000000015b7a8] cpu_startup_entry+0x38/0x50
[c0000001fe797ef0] [c00000000004d6bc] start_secondary+0x4ec/0x530
[c0000001fe797f90] [c00000000000b170] start_secondary_prolog+0x10/0x14
Instruction dump:
2f890000 409eff10 3c62ffc6 39200001 3863edc8 992aee5b 4bfbacd9 60000000 
0fe00000 4bfffef0 3c4c00fa 384249b0 <e9430008> 3d020004 3908d6f0 3d22ffe7 
---[ end trace d4d77b5b70c0a456 ]---

Kernel panic - not syncing: Fatal exception in interrupt
@rnav
Member

rnav commented Apr 30, 2018

Is that used by anyone on powerpc? I thought xmon is our preferred debugger. Are there scenarios where k(g)db would be useful to have?

@mpe
Member Author

mpe commented Apr 30, 2018

It's not used much because we've always had xmon. Though xmon is more of a "crash handler" than a debugger. kgdb can (in theory) do full gdb-style debugging against a running kernel, which could be useful at times.

But really we should either get it working or prevent it from being enabled; I don't like having things that are known not to work sitting around for people to trip over.

mpe referenced this issue in linuxppc/linux Jun 18, 2018
While hacking on kTLS, I ran into the following panic from an
unprivileged netserver / netperf TCP session:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0
  Oops: 0010 [#1] SMP KASAN PTI
  CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ torvalds#139
  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
  RIP: 0010:          (null)
  Code: Bad RIP value.
  RSP: 0018:ffff88036abcf740 EFLAGS: 00010246
  RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26
  RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280
  RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48
  R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280
  R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350
  FS:  00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? tls_sw_poll+0xa4/0x160 [tls]
   ? sock_poll+0x20a/0x680
   ? do_select+0x77b/0x11a0
   ? poll_schedule_timeout.constprop.12+0x130/0x130
   ? pick_link+0xb00/0xb00
   ? read_word_at_a_time+0x13/0x20
   ? vfs_poll+0x270/0x270
   ? deref_stack_reg+0xad/0xe0
   ? __read_once_size_nocheck.constprop.6+0x10/0x10
  [...]

Debugging further, it turns out that calling into ctx->sk_poll() is
invalid since sk_poll itself is NULL which was saved from the original
TCP socket in order for tls_sw_poll() to invoke it.

Looks like the recent conversion from poll to poll_mask callback started
in 1525242 ("net: add support for ->poll_mask in proto_ops") missed
to eventually convert kTLS, too: TCP's ->poll was converted over to the
->poll_mask in commit 2c7d3da ("net/tcp: convert to ->poll_mask")
and therefore kTLS wrongly saved the ->poll old one which is now NULL.

Convert kTLS over to use ->poll_mask instead. Also instead of POLLIN |
POLLRDNORM use the proper EPOLLIN | EPOLLRDNORM bits as the case in
tcp_poll_mask() as well that is mangled here.

Fixes: 2c7d3da ("net/tcp: convert to ->poll_mask")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Watson <davejwatson@fb.com>
Tested-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
@chleroy

chleroy commented Sep 14, 2018

Tried it on 8xx: no Oops, but an ugly 'ptrval'. Should we do something about it?

root@vgoip:~# tty
/dev/ttyCPM0
root@vgoip:~# echo ttyCPM0 > /sys/module/kgdboc/parameters/kgdboc
root@vgoip:~#
root@vgoip:~# echo g > /proc/sysrq-trigger
[  240.108192] sysrq: SysRq : DEBUG

Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btp 1
Stack traceback for pid 1
0x(ptrval)        1        0  0    0   S  0x(ptrval)  init
Call Trace:
[c60e1db0] [100c82b6] 0x100c82b6 (unreliable)
[c60e1e70] [c05352ac] __schedule+0x22c/0x5ac
[c60e1eb0] [c053565c] schedule+0x30/0x5c
[c60e1ec0] [c001fbfc] do_wait+0x1a8/0x29c
[c60e1ef0] [c0020b18] kernel_wait4+0x80/0x128
[c60e1f40] [c000e11c] ret_from_syscall+0x0/0x38
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0x0
kdb>

But it silently hangs after calling 'help':

kdb> help
Command         Usage                Description
----------------------------------------------------------
md              <vaddr>             Display Memory Contents, also mdWcN, e.g. md8c1
mdr             <vaddr> <bytes>     Display Raw Memory
mdp             <paddr> <bytes>     Display Physical Memory
mds             <vaddr>             Display Memory Symbolically
mm              <vaddr> <contents>  Modify Memory Contents
go              [<vaddr>]           Continue Execution
rd                                  Display Registers
rm              <reg> <contents>    Modify Registers
ef              <vaddr>             Display exception frame
bt              [<vaddr>]           Stack traceback
btp             <pid>               Display stack for process <pid>
bta             [D|R|S|T|C|Z|E|U|I|M|A]
                                    Backtrace all processes matching state flag
btc                                 Backtrace current process on each cpu
btt             <vaddr>             Backtrace process given its struct task address
env                                 Show environment variables
set                                 Set environment variables
help                                Display Help Message

@chleroy

chleroy commented Sep 14, 2018

bta silently hangs as well:

kdb> bta
15 sleeping system daemon (state M) processes suppressed,
use 'ps A' to see all.
Stack traceback for pid 282
0x(ptrval)      282      280  1    0   R  0x(ptrval) *sh
Call Trace:
[c656fb10] [c0089cd0] kdb_show_stack+0x80/0xa4 (unreliable)
[c656fb30] [c0089d90] kdb_bt1.isra.0+0x9c/0xf4
[c656fb60] [c0089e60] kdb_bt+0x78/0x348
[c656fbf0] [c00875f8] kdb_parse+0x430/0x730
[c656fc40] [c0087d5c] kdb_main_loop+0x348/0x8f4
[c656fca0] [c008acdc] kdb_stub+0x18c/0x3c0
[c656fcd0] [c0080c24] kgdb_handle_exception+0x2c8/0x720
[c656fd60] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98
[c656fd70] [c000af38] program_check_exception+0x104/0x700
[c656fd90] [c000e45c] ret_from_except_full+0x0/0x4
[c656fe50] [c0266494] __handle_sysrq+0x120/0x1a0
[c656fe80] [c026698c] write_sysrq_trigger+0x44/0x5c
[c656fe90] [c016bd64] proc_reg_write+0x60/0xf0
[c656fea0] [c011b130] __vfs_write+0x28/0x178
[c656fef0] [c011b460] vfs_write+0xb8/0x1cc
[c656ff10] [c011b6ec] ksys_write+0x4c/0xc4
[c656ff40] [c000e11c] ret_from_syscall+0x0/0x38

@chleroy

chleroy commented Sep 14, 2018

Those silent hangs are in fact a problem in the CPM serial driver.
The following patch is proposed to fix it: https://patchwork.ozlabs.org/patch/969723/

@chleroy

chleroy commented Sep 14, 2018

When booting with the parameter 'debug_boot_weak_hash', I get the following:

Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0xba99ad80

Seems like kdb_getarea tries to access the hashed address and not the real address. (The 0xba99ad80 in the first line is written with %p, while kdb_getarea() uses %lx.)

@chleroy

chleroy commented Sep 14, 2018

The issue seems to be linked to the following call:

sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));

On the 8xx, we end up with btt (ptrval) (it takes a huge amount of time before getting enough entropy to print hashed values).

On faster platforms, we most likely end up with a hashed pointer, which is by definition not a valid address, hence the Oops.
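
To illustrate the mismatch, here is a minimal userspace sketch (not kernel code). The names `mock_hash_ptr`, `format_btt`, and `parse_btt` are hypothetical, and the XOR "hash" is only a stand-in for the kernel's real pointer hashing. It shows that an address written into the command string in hashed form no longer round-trips when the string is parsed back, while the raw form does.

```c
#include <stdio.h>
#include <stdlib.h>

/* Arbitrary non-zero key; an assumption for illustration only. */
#define MOCK_KEY 0x5bd1e995UL

/* Stand-in for pointer hashing: XOR with a non-zero key always changes the value. */
static unsigned long mock_hash_ptr(unsigned long addr)
{
	return addr ^ MOCK_KEY;
}

/* Build a "btt 0x..." command, either hashed (%p-like) or raw (%px-like). */
static void format_btt(char *buf, size_t len, unsigned long task, int hashed)
{
	unsigned long shown = hashed ? mock_hash_ptr(task) : task;

	snprintf(buf, len, "btt 0x%lx\n", shown);
}

/* Parse the address back out of the command string, skipping "btt ". */
static unsigned long parse_btt(const char *cmd)
{
	return strtoul(cmd + 4, NULL, 16);
}
```

With the raw form, `parse_btt()` returns the original task address; with the hashed form it returns some other value, which is consistent with the "Bad address" messages and the Oops reported above.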

@chleroy

chleroy commented Sep 14, 2018

When replacing that %p by %px, it works:

Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
Stack traceback for pid 282
0x(ptrval) 282 280 1 0 R 0x(ptrval) *sh
Call Trace:
[c627ba30] [c0089cd0] kdb_show_stack+0x80/0xa4 (unreliable)
[c627ba50] [c0089d90] kdb_bt1.isra.0+0x9c/0xf4
[c627ba80] [c0089f64] kdb_bt+0x17c/0x348
[c627bb10] [c00875f8] kdb_parse+0x430/0x730
[c627bb60] [c0089ff8] kdb_bt+0x210/0x348
[c627bbf0] [c00875f8] kdb_parse+0x430/0x730
[c627bc40] [c0087d5c] kdb_main_loop+0x348/0x8f4
[c627bca0] [c008acdc] kdb_stub+0x18c/0x3c0
[c627bcd0] [c0080c24] kgdb_handle_exception+0x2c8/0x720
[c627bd60] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98
[c627bd70] [c000af38] program_check_exception+0x104/0x700
[c627bd90] [c000e45c] ret_from_except_full+0x0/0x4
[c627be50] [c0266494] __handle_sysrq+0x120/0x1a0
[c627be80] [c026698c] write_sysrq_trigger+0x44/0x5c
[c627be90] [c016bd64] proc_reg_write+0x60/0xf0
[c627bea0] [c011b130] __vfs_write+0x28/0x178
[c627bef0] [c011b460] vfs_write+0xb8/0x1cc
[c627bf10] [c011b6ec] ksys_write+0x4c/0xc4
[c627bf40] [c000e11c] ret_from_syscall+0x0/0x38

@chleroy

chleroy commented Sep 14, 2018

See https://patchwork.ozlabs.org/patch/969879/

@mpe, does it fix the issue you reported?

fengguang referenced this issue in 0day-ci/linux Sep 14, 2018
On a powerpc 8xx, 'btc' fails as follows:

Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0x0

when booting the kernel with 'debug_boot_weak_hash', it fails as well

Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0xba99ad80

On other platforms, Oopses have been observed too, see
https://github.com/linuxppc/linux/issues/139

This is due to btc calling 'btt' with %p pointer as an argument.

This patch replaces %p by %px to get the real pointer value as
expected by 'btt'

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: <stable@vger.kernel.org> # 4.15+
@mpe
Member Author

mpe commented Sep 17, 2018

It helps, it makes btc work.

But the selftests still crash.

KGDB: Registered I/O driver kgdbts
kgdbts:RUN plant and detach test

Entering kdb (current=0x(____ptrval____), pid 1) on processor 5 due to Keyboard Entry
[5]kdb> kgdbts:RUN sw breakpoint test
kgdbts: BP mismatch c0000000001feb80 expected c000000000751300
KGDB: re-enter exception: ALL breakpoints killed
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc-7.3.1-00015-ga1691649edf6-dirty #153
Call Trace:
[c0000001fe802e50] [c000000000b2530c] dump_stack+0xb0/0xf4 (unreliable)
[c0000001fe802e90] [c00000000020027c] kgdb_handle_exception+0x2bc/0x2d0
[c0000001fe802f60] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe802f90] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803000] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at check_and_rewind_pc+0x268/0x290
    LR = check_and_rewind_pc+0x264/0x290
[c0000001fe803370] [c000000000750fb4] validate_simple_test+0x54/0x120
[c0000001fe803390] [c000000000751720] run_simple_test+0x190/0x3f0
[c0000001fe803410] [c000000000751134] kgdbts_put_char+0x44/0x60
[c0000001fe803430] [c000000000200aa0] put_packet+0x130/0x210
[c0000001fe803480] [c000000000201a8c] gdb_serial_stub+0x3ec/0x12b0
[c0000001fe803590] [c0000000001ff9a8] kgdb_cpu_enter+0x3e8/0x820
[c0000001fe803690] [c000000000200270] kgdb_handle_exception+0x2b0/0x2d0
[c0000001fe803760] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe803790] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803800] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at kgdb_breakpoint+0x30/0x50
    LR = run_breakpoint_test+0x94/0x110
[c0000001fe803af0] [c000000000752de0] run_breakpoint_test+0x90/0x110 (unreliable)
[c0000001fe803b50] [c000000000753428] configure_kgdbts+0x298/0x6f0
[c0000001fe803c40] [c0000000000109b8] do_one_initcall+0x58/0x290
[c0000001fe803d00] [c000000000e4486c] kernel_init_freeable+0x3b0/0x49c
[c0000001fe803dc0] [c000000000010d54] kernel_init+0x24/0x170
[c0000001fe803e30] [c00000000000bddc] ret_from_kernel_thread+0x5c/0x80
Kernel panic - not syncing: Recursive entry to debugger
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc-7.3.1-00015-ga1691649edf6-dirty #153
Call Trace:
[c0000001fe802db0] [c000000000b2530c] dump_stack+0xb0/0xf4 (unreliable)
[c0000001fe802df0] [c00000000010df7c] panic+0x144/0x318
[c0000001fe802e90] [c00000000020028c] kgdb_handle_exception+0x2cc/0x2d0
[c0000001fe802f60] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe802f90] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803000] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at check_and_rewind_pc+0x268/0x290
    LR = check_and_rewind_pc+0x264/0x290
[c0000001fe803370] [c000000000750fb4] validate_simple_test+0x54/0x120
[c0000001fe803390] [c000000000751720] run_simple_test+0x190/0x3f0
[c0000001fe803410] [c000000000751134] kgdbts_put_char+0x44/0x60
[c0000001fe803430] [c000000000200aa0] put_packet+0x130/0x210
[c0000001fe803480] [c000000000201a8c] gdb_serial_stub+0x3ec/0x12b0
[c0000001fe803590] [c0000000001ff9a8] kgdb_cpu_enter+0x3e8/0x820
[c0000001fe803690] [c000000000200270] kgdb_handle_exception+0x2b0/0x2d0
[c0000001fe803760] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe803790] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803800] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at kgdb_breakpoint+0x30/0x50
    LR = run_breakpoint_test+0x94/0x110
[c0000001fe803af0] [c000000000752de0] run_breakpoint_test+0x90/0x110 (unreliable)
[c0000001fe803b50] [c000000000753428] configure_kgdbts+0x298/0x6f0
[c0000001fe803c40] [c0000000000109b8] do_one_initcall+0x58/0x290
[c0000001fe803d00] [c000000000e4486c] kernel_init_freeable+0x3b0/0x49c
[c0000001fe803dc0] [c000000000010d54] kernel_init+0x24/0x170
[c0000001fe803e30] [c00000000000bddc] ret_from_kernel_thread+0x5c/0x80
BUG: sleeping function called from invalid context at ../arch/powerpc/kernel/rtas.c:515
in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: swapper/0
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc3-gcc-7.3.1-00015-ga1691649edf6-dirty #153
Call Trace:
[c0000001fe802bd0] [c000000000b2530c] dump_stack+0xb0/0xf4 (unreliable)
[c0000001fe802c10] [c00000000014afec] ___might_sleep+0x13c/0x170
[c0000001fe802c70] [c00000000003a86c] rtas_busy_delay+0x3c/0xe0
[c0000001fe802ca0] [c00000000003c5a4] rtas_os_term+0xa4/0xf0
[c0000001fe802d20] [c0000000000c91e0] pseries_panic+0x30/0x50
[c0000001fe802d50] [c00000000002d910] ppc_panic_event+0x70/0x90
[c0000001fe802d70] [c000000000140a4c] notifier_call_chain+0x9c/0x110
[c0000001fe802dc0] [c000000000140ba8] __atomic_notifier_call_chain+0x38/0x60
[c0000001fe802df0] [c00000000010dfc0] panic+0x188/0x318
[c0000001fe802e90] [c00000000020028c] kgdb_handle_exception+0x2cc/0x2d0
[c0000001fe802f60] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe802f90] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803000] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at check_and_rewind_pc+0x268/0x290
    LR = check_and_rewind_pc+0x264/0x290
[c0000001fe803370] [c000000000750fb4] validate_simple_test+0x54/0x120
[c0000001fe803390] [c000000000751720] run_simple_test+0x190/0x3f0
[c0000001fe803410] [c000000000751134] kgdbts_put_char+0x44/0x60
[c0000001fe803430] [c000000000200aa0] put_packet+0x130/0x210
[c0000001fe803480] [c000000000201a8c] gdb_serial_stub+0x3ec/0x12b0
[c0000001fe803590] [c0000000001ff9a8] kgdb_cpu_enter+0x3e8/0x820
[c0000001fe803690] [c000000000200270] kgdb_handle_exception+0x2b0/0x2d0
[c0000001fe803760] [c00000000004c418] kgdb_handle_breakpoint+0x58/0xb0
[c0000001fe803790] [c00000000002c1e0] program_check_exception+0x280/0x3a0
[c0000001fe803800] [c000000000009070] program_check_common+0x170/0x180
--- interrupt: 700 at kgdb_breakpoint+0x30/0x50
    LR = run_breakpoint_test+0x94/0x110
[c0000001fe803af0] [c000000000752de0] run_breakpoint_test+0x90/0x110 (unreliable)
[c0000001fe803b50] [c000000000753428] configure_kgdbts+0x298/0x6f0
[c0000001fe803c40] [c0000000000109b8] do_one_initcall+0x58/0x290
[c0000001fe803d00] [c000000000e4486c] kernel_init_freeable+0x3b0/0x49c
[c0000001fe803dc0] [c000000000010d54] kernel_init+0x24/0x170
[c0000001fe803e30] [c00000000000bddc] ret_from_kernel_thread+0x5c/0x80


SLOF **********************************************************************

@chleroy

chleroy commented Sep 17, 2018

Apparently, it works on the 8xx:

[    1.590212] KGDB: Registered I/O driver kgdbts
[    1.594598] kgdbts:RUN plant and detach test

Entering kdb (current=(ptrval), pid 1) due to Keyboard Entry
kdb> [    1.606219] kgdbts:RUN sw breakpoint test
[    1.606219] kgdbts:RUN sw breakpoint test
[    1.614619] kgdbts:RUN bad memory access test
[    1.620122] kgdbts:RUN singlestep test 1000 iterations
[    1.633252] kgdbts:RUN singlestep [0/1000]
[    2.458827] kgdbts:RUN singlestep [100/1000]
[    3.284343] kgdbts:RUN singlestep [200/1000]
[    4.109926] kgdbts:RUN singlestep [300/1000]
[    4.935381] kgdbts:RUN singlestep [400/1000]
[    5.760988] kgdbts:RUN singlestep [500/1000]
[    6.586595] kgdbts:RUN singlestep [600/1000]
[    7.412013] kgdbts:RUN singlestep [700/1000]
[    8.237622] kgdbts:RUN singlestep [800/1000]
[    9.063095] kgdbts:RUN singlestep [900/1000]
[    9.880413] kgdbts:RUN do_fork for 100 breakpoints
[   18.626055] KGDB: Unregistered I/O driver kgdbts, debugger disabled

@chleroy

chleroy commented Sep 17, 2018

Seems to work properly on 83xx as well:

[    0.559537] KGDB: Registered I/O driver kgdbts
[    0.564081] kgdbts:RUN plant and detach test

Entering kdb (current=(ptrval), pid 1) due to Keyboard Entry
kdb> [    0.575225] kgdbts:RUN sw breakpoint test
[    0.581083] kgdbts:RUN bad memory access test
[    0.585874] kgdbts:RUN singlestep test 1000 iterations
[    0.594401] kgdbts:RUN singlestep [0/1000]
[    0.926293] kgdbts:RUN singlestep [100/1000]
[    1.258422] kgdbts:RUN singlestep [200/1000]
[    1.590503] kgdbts:RUN singlestep [300/1000]
[    1.922563] kgdbts:RUN singlestep [400/1000]
[    2.254630] kgdbts:RUN singlestep [500/1000]
[    2.586718] kgdbts:RUN singlestep [600/1000]
[    2.918883] kgdbts:RUN singlestep [700/1000]
[    3.250982] kgdbts:RUN singlestep [800/1000]
[    3.583073] kgdbts:RUN singlestep [900/1000]
[    3.911853] kgdbts:RUN do_fork for 100 breakpoints
[   11.127248] KGDB: Unregistered I/O driver kgdbts, debugger disabled

@chleroy

chleroy commented Sep 17, 2018

@mpe, it seems you get a program check from somewhere else than expected:

kgdbts: BP mismatch c0000000001feb80 expected c000000000751300

Then the second program check is the WARN_ON in eprintk() called from check_and_rewind_pc()

Could you tell what is at c000000000751300 and what is at c0000000001feb80?

@chleroy

chleroy commented Sep 26, 2018

In the meantime, I discovered that kdb was overlooked when we implemented STRICT_KERNEL_RWX.

The following patch fixes setting breakpoints when STRICT_KERNEL_RWX is active:

https://patchwork.ozlabs.org/patch/971040/

@mpe
Member Author

mpe commented Sep 27, 2018

I've fixed the breakpoint mismatch. On LE we need to use ppc_function_entry() in lookup_addr().

Now it's getting to the singlestep test, which seems to be getting stuck.
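
For context, a rough userspace sketch of why the entry address matters on LE (ELFv2), as far as I understand it: a function may begin with a two-instruction TOC setup at its global entry point, and the local entry point follows it. The masks below are derived from the instruction encodings, and `local_entry_offset` is a hypothetical helper, not the kernel's ppc_function_entry(). The sample words in the usage note come from the instruction dump in the first Oops, where the faulting instruction at task_curr+0x8 follows exactly such a pair.

```c
#include <stdint.h>

#define OP_RT_RA_MASK 0xffff0000u /* keep opcode, RT and RA; drop the immediate */
#define ADDIS_R2_R12  0x3c4c0000u /* addis r2,r12,XXXX */
#define LIS_R2        0x3c400000u /* lis   r2,XXXX (i.e. addis r2,0,XXXX) */
#define ADDI_R2_R2    0x38420000u /* addi  r2,r2,XXXX */

/*
 * Return the byte offset of the local entry point relative to the start
 * of the function: 8 if the function starts with the TOC setup pair,
 * 0 otherwise. 'insn' must point at two or more instruction words.
 */
static unsigned int local_entry_offset(const uint32_t *insn)
{
	int toc_hi = (insn[0] & OP_RT_RA_MASK) == ADDIS_R2_R12 ||
		     (insn[0] & OP_RT_RA_MASK) == LIS_R2;
	int toc_lo = (insn[1] & OP_RT_RA_MASK) == ADDI_R2_R2;

	return (toc_hi && toc_lo) ? 2 * (unsigned int)sizeof(uint32_t) : 0;
}
```

Feeding it the words `3c4c00fa 384249b0` from the dump gives an offset of 8, so a breakpoint address computed from the global entry would differ from one computed from the local entry.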

@farosas

farosas commented Sep 28, 2018

Not sure if you want to track this here, but I see that the hvc driver only sends KGDB output to hvc0, regardless of the kgdboc value on the boot command line:

[root@localhost ~]# cat /proc/cmdline
root=UUID=dcda20b4-8fbe-4f52-ba40-b1a98fa55139 console=hvc0 kgdboc=hvc1
[root@localhost ~]# tty
/dev/hvc0
[root@localhost ~]# echo g > /proc/sysrq-trigger 
[  102.855530] sysrq: SysRq : DEBUG
[  102.855571] KGDB: Entering KGDB
+$OK#9a                    <-- this should only work in hvc1

From drivers/tty/hvc/hvc_console.c:

static int hvc_poll_get_char(struct tty_driver *driver, int line)
{
	struct tty_struct *tty = driver->ttys[0];
	struct hvc_struct *hp = tty->driver_data;
	...
}

static void hvc_poll_put_char(struct tty_driver *driver, int line, char ch)
{
	struct tty_struct *tty = driver->ttys[0];
	struct hvc_struct *hp = tty->driver_data;
	...
}

This is particularly relevant when using QEMU with -serial mon:stdio -serial tcp:0:1234,server,nowait since there's no way to "detach" from hvc0 to connect gdb.
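
A minimal userspace model of the problem, making no claim about what the real fix looks like: `mock_tty`, `mock_tty_driver`, `pick_tty_fixed`, and `pick_tty_by_line` are hypothetical stand-ins. The first helper mirrors the quoted `driver->ttys[0]` lookup and ignores `line`; the second is one possible line-aware lookup.

```c
#include <stddef.h>

#define MOCK_MAX_LINES 4 /* hypothetical number of console lines */

struct mock_tty {
	void *driver_data;
};

struct mock_tty_driver {
	struct mock_tty *ttys[MOCK_MAX_LINES];
};

/* Mirrors the quoted code: the 'line' argument is ignored, so hvc0 always wins. */
static struct mock_tty *pick_tty_fixed(struct mock_tty_driver *driver, int line)
{
	(void)line;
	return driver->ttys[0];
}

/* Line-aware alternative: bounds-check 'line' and use the matching slot. */
static struct mock_tty *pick_tty_by_line(struct mock_tty_driver *driver, int line)
{
	if (line < 0 || line >= MOCK_MAX_LINES)
		return NULL;
	return driver->ttys[line];
}
```

With two registered lines, asking for line 1 returns the first tty from `pick_tty_fixed()` and the second tty from `pick_tty_by_line()`, which is the behaviour one would expect from kgdboc=hvc1.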

@leitao
Member

leitao commented Oct 25, 2018

I am testing this on a 4.19 kernel, and this is what I get. I am not sure whether it is related or whether I am misusing it:

[    8.762802] sysrq: SysRq : DEBUG
[    8.763406] KGDB: Entering KGDB
[    8.765296] Unable to handle kernel paging request for data at address 0x00000260
[    8.765370] Faulting instruction address: 0xc00000000062ac9c
[    8.765735] KGDB: re-enter exception: ALL breakpoints killed
[    8.766044] CPU: 0 PID: 49 Comm: sh Not tainted 4.19.0-04681-g01aa9d5 #3
[    8.766253] Call Trace:
[    8.766867] [c00000001e853070] [c0000000009160e4] dump_stack+0xe8/0x164 (unreliable)
[    8.767037] [c00000001e8530c0] [c0000000001fd544] kgdb_handle_exception+0x294/0x2c0
[    8.767142] [c00000001e853190] [c000000000048b70] kgdb_debugger+0xc0/0xe0
[    8.767226] [c00000001e8531b0] [c000000000029b44] die+0xc4/0xf0
[    8.767301] [c00000001e8531f0] [c000000000069ec8] bad_page_fault+0xe8/0x180
[    8.767385] [c00000001e853260] [c00000000000b160] handle_page_fault+0x34/0x38
[    8.767500] --- interrupt: 300 at hvc_poll_get_char+0x2c/0x90
[    8.767500]     LR = kgdboc_get_char+0x4c/0x70

Looking at the failing instruction, I see:

c00000000062ac98:       00 00 29 e9     ld      r9,0(r9)
c00000000062ac9c:       60 02 29 e9     ld      r9,608(r9)

Looking at the code, I see:

{
        struct tty_struct *tty = driver->ttys[0];
        struct hvc_struct *hp = tty->driver_data;

So tty is NULL, and the fault occurs when it is dereferenced to load driver_data: the faulting address 0x00000260 matches the 608-byte (0x260) offset in the second ld.
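
The arithmetic can be checked with a small userspace sketch. The struct layout is hypothetical (padding chosen so that `driver_data` lands at offset 608, as the disassembly suggests), and `MOCK_NO_POLL_CHAR` and the NULL guard are assumptions about what a defensive check could look like, not the actual patch.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_NO_POLL_CHAR 0x00ff0000 /* placeholder "no character available" value */

struct padded_tty {
	char pad[608];     /* hypothetical padding: puts driver_data at offset 0x260 */
	void *driver_data;
};

struct padded_tty_driver {
	struct padded_tty *ttys[1];
};

/* Address that loading tty->driver_data would touch for a given tty base. */
static uintptr_t driver_data_addr(uintptr_t tty_base)
{
	return tty_base + offsetof(struct padded_tty, driver_data);
}

/* Guarded lookup: bail out instead of dereferencing a NULL tty. */
static int guarded_poll_get_char(const struct padded_tty_driver *driver)
{
	const struct padded_tty *tty = driver->ttys[0];

	if (!tty)                 /* no tty has been opened on this line */
		return MOCK_NO_POLL_CHAR;
	if (!tty->driver_data)    /* tty exists but has no backing data */
		return MOCK_NO_POLL_CHAR;
	return *(const unsigned char *)tty->driver_data;
}
```

A NULL base gives `driver_data_addr(0) == 0x260`, the same value as the faulting data address in the log, and the guarded variant returns the placeholder instead of faulting.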

pull bot referenced this issue in lokeshbv/linux Nov 14, 2018
Fixes: ad67b74 ("printk: hash addresses printed with %p")
Cc: <stable@vger.kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
plbossart referenced this issue in plbossart/sound Nov 15, 2018
Fixes: ad67b74 ("printk: hash addresses printed with %p")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: <stable@vger.kernel.org> # 4.15+
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
(cherry picked from commit a0ca72c
 git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git kgdb-next)

BUG=chromium:900598
TEST=kgdb works better

Change-Id: I5ecb3860244f98d1ae17edcd7d95398b101ca85a
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1310053
Reviewed-by: Guenter Roeck <groeck@chromium.org>
@chleroy

chleroy commented Nov 18, 2018

woodsts referenced this issue in woodsts/linux-stable Nov 21, 2018
commit dded2e1 upstream.

Fixes: ad67b74 ("printk: hash addresses printed with %p")
Cc: <stable@vger.kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dsd referenced this issue in endlessm/linux Nov 27, 2018
BugLink: https://bugs.launchpad.net/bugs/1805158

commit dded2e1 upstream.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
isjerryxiao referenced this issue in isjerryxiao/Amlogic_s905-kernel Nov 27, 2018
commit dded2e1 upstream.

mpe transferred this issue from linuxppc/linux on Jan 7, 2019
mpe added the labels "bug" (It's a bug) and "medium" (Possibly not too difficult) on Jan 7, 2019

chleroy commented May 2, 2019

Are there still issues with kgdb?

@adelva1984

I'm seeing the bug in #140 (comment) still happening with 5.8.0-rc4, so I don't think kgdb works with the hvc driver currently.
