Panic when running sysctl sys.device.drmn0.pcie_replay_count with amdgpu #15

Closed
NorwegianRockCat opened this issue Jul 18, 2020 · 9 comments
Labels
bug Something isn't working

Comments

@NorwegianRockCat
Contributor

NorwegianRockCat commented Jul 18, 2020

There seems to be a panic when reading the sysctl sys.device.drmn0.pcie_replay_count with the amdgpu module loaded (on a Navi10 card).

The dump does not look very helpful:

panic: page fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 23; apic id = 17
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfffffe00f834e7f8
frame pointer           = 0x28:0xfffffe00f834e810
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3330 (sysctl)
trap number             = 12
panic: page fault
cpuid = 23
time = 1595050508
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f834e4b0
vpanic() at vpanic+0x182/frame 0xfffffe00f834e500
panic() at panic+0x43/frame 0xfffffe00f834e560
trap_fatal() at trap_fatal+0x387/frame 0xfffffe00f834e5c0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00f834e610
trap() at trap+0x271/frame 0xfffffe00f834e720
calltrap() at calltrap+0x8/frame 0xfffffe00f834e720
--- trap 0xc, rip = 0, rsp = 0xfffffe00f834e7f8, rbp = 0xfffffe00f834e810 ---
??() at 0/frame 0xfffffe00f834e810
sysctl_handle_attr() at sysctl_handle_attr+0x70/frame 0xfffffe00f834e860
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x91/frame 0xfffffe00f834e8b0
sysctl_root() at sysctl_root+0x249/frame 0xfffffe00f834e930
userland_sysctl() at userland_sysctl+0x173/frame 0xfffffe00f834e9e0
sys___sysctl() at sys___sysctl+0x5f/frame 0xfffffe00f834ea90
amd64_syscall() at amd64_syscall+0x119/frame 0xfffffe00f834ebb0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00f834ebb0
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x800414caa, rsp = 0x7fffffffc3b8, rbp = 0x7fffffffc3f0 ---
KDB: enter: panic

I put the core.txt up on pastebin if more information is wanted.

@evadot
Contributor

evadot commented Jul 24, 2020

I cannot reproduce this on my AMD machine.
Does that happen every time for you?
Do you have other kernel modules loaded that could cause this?

@evadot evadot added the bug Something isn't working label Jul 24, 2020
@NorwegianRockCat
Contributor Author

NorwegianRockCat commented Jul 25, 2020

It seems to happen every time I try this. This core dump was taken right after I started the system. I ran sysctl -a to see that it worked.

then

kldload amdgpu

then sysctl -a and the system crashed.

Otherwise, cpuctl and amdtemp are the only other modules I loaded explicitly. I think the core dump might show what other required modules were loaded.

I am away from the system until Friday, but I can try and investigate a little more when I am back.

@NorwegianRockCat
Contributor Author

NorwegianRockCat commented Aug 1, 2020

Did a new rebuild on the system and it still happens when amdgpu is loaded. I'll see if I can dig more.

Perusing the source code I did find this interesting comment in amdgpu_pm.c:418:

    /* sysctl -a can panic if this data is uninitialized */
    memset(&data, 0, sizeof(struct pp_states_info));

So, I wonder if there are other Navi 10-specific structures that might have a similar issue. Any suggestions on where one could start looking? Are all snprintf() calls likely being fed to sysctl?
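To make the concern behind that memset() concrete, here is a minimal userland sketch, not the driver code. The struct layout, the names sketch_states_info, sketch_query_states and sketch_show_states, and the "supported" flag are all assumptions made up for illustration. The idea is that a show routine loops over a count field, so if the backend query fails without filling the struct, a zeroed count keeps the loop from walking garbage.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_STATES 16

/* Hypothetical stand-in for a "states info" struct: a count plus an array. */
struct sketch_states_info {
    uint32_t nums;
    uint32_t states[SKETCH_MAX_STATES];
};

/* Hypothetical backend query that leaves the struct untouched on failure. */
static int sketch_query_states(struct sketch_states_info *info, int supported)
{
    if (!supported)
        return -1;
    info->nums = 2;
    info->states[0] = 1;
    info->states[1] = 3;
    return 0;
}

/* Formats the states into buf and returns how many entries were written. */
static uint32_t sketch_show_states(char *buf, size_t len, int supported)
{
    struct sketch_states_info data;
    size_t off = 0;
    uint32_t i;

    /* Zero first, so a failed query leaves nums == 0 instead of stack garbage. */
    memset(&data, 0, sizeof(data));
    (void)sketch_query_states(&data, supported);

    if (len > 0)
        buf[0] = '\0';
    for (i = 0; i < data.nums && i < SKETCH_MAX_STATES; i++) {
        int n = snprintf(buf + off, len - off, "%u: %u\n",
                         (unsigned)i, (unsigned)data.states[i]);
        if (n < 0 || (size_t)n >= len - off)
            break;  /* out of space: stop rather than overrun */
        off += (size_t)n;
    }
    return i;
}
```

Without the memset(), the loop bound would come from whatever happened to be on the stack whenever the query fails.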

@NorwegianRockCat NorwegianRockCat changed the title from "Panic when running sysctl -a with amdgpu loaded" to "Panic when running sysctl sys.device.drmn0.pcie_replay_count with amdgpu" on Aug 2, 2020
@NorwegianRockCat
Contributor Author

I had a little bit of time and read the sysctl manpage. It seems that the OID that causes the problem is sys.device.drmn0.pcie_replay_count. I've updated the bug report to reflect that information.

@DarkKirb

DarkKirb commented Nov 4, 2020

I poked around with kgdb a bit, and it appears that nv_asic_funcs (https://github.com/freebsd/drm-kmod/blob/master/drivers/gpu/drm/amd/amdgpu/nv.c#L564-L582) is not fully defined. Namely, it is missing the fields:

  • int (*get_pcie_lanes)(struct amdgpu_device *adev);
  • void (*set_pcie_lanes)(struct amdgpu_device *adev, int lanes);
  • uint64_t (*get_pcie_replay_count)(struct amdgpu_device *adev);

This bug affects Navi GPUs (the RX 5xxx series). The other instantiations of the amdgpu_asic_funcs structure set the get_pcie_replay_count field, so the panic is not reproducible there.
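For anyone wondering why a missing field turns into a jump to address 0 (the "instruction pointer = 0x20:0x0" in the trace): in C, any member left out of a designated initializer is zero-initialized, so the function pointer is NULL and calling it faults. Below is a minimal userland sketch of that mechanism with a guarded call. The fake_* types and read_replay_count() are hypothetical stand-ins, not the real amdgpu definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver types. */
struct fake_device;

struct fake_asic_funcs {
    int      (*get_pcie_lanes)(struct fake_device *dev);
    uint64_t (*get_pcie_replay_count)(struct fake_device *dev);
};

struct fake_device {
    const struct fake_asic_funcs *funcs;
};

static int fake_get_pcie_lanes(struct fake_device *dev)
{
    (void)dev;
    return 16;
}

/* get_pcie_replay_count is omitted here, so the compiler sets it to NULL. */
static const struct fake_asic_funcs incomplete_funcs = {
    .get_pcie_lanes = fake_get_pcie_lanes,
};

/*
 * Guarded read: returns 0 and stores the count on success, or -1 if the
 * callback is missing. An unguarded call through the NULL pointer is what
 * would jump to address 0.
 */
static int read_replay_count(struct fake_device *dev, uint64_t *out)
{
    if (dev->funcs == NULL || dev->funcs->get_pcie_replay_count == NULL)
        return -1;
    *out = dev->funcs->get_pcie_replay_count(dev);
    return 0;
}
```

With a device whose table is incomplete_funcs, read_replay_count() returns -1 and leaves the output untouched instead of crashing. Whether a guard at the call site or a per-ASIC callback is the better fix for the driver is a separate design question.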

DarkKirb pushed a commit to DarkKirb/drm-kmod that referenced this issue Nov 4, 2020
This commit adds a stub "nv_get_pcie_replay_count" function that prevents a null function pointer from being called in kernel space on systems with Navi GPUs.

This commit fixes freebsd#15
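A rough sketch of what the stub approach described in that commit message might look like, written as standalone C rather than driver code. The sketch_* types and query_replay_count() are hypothetical, and returning 0 from the stub is an assumption; the commit message only says the function is a stub.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real driver uses struct amdgpu_device. */
struct sketch_device {
    int unused;
};

struct sketch_asic_funcs {
    uint64_t (*get_pcie_replay_count)(struct sketch_device *dev);
};

/* Stub callback: reports zero replays (assumed value) and reads no hardware. */
static uint64_t nv_get_pcie_replay_count(struct sketch_device *dev)
{
    (void)dev;
    return 0;
}

/* With the stub wired in, the table no longer has a NULL slot. */
static const struct sketch_asic_funcs nv_funcs_sketch = {
    .get_pcie_replay_count = nv_get_pcie_replay_count,
};

/* Caller that invokes the callback without a NULL check, as a sysctl handler might. */
static uint64_t query_replay_count(const struct sketch_asic_funcs *funcs,
                                   struct sketch_device *dev)
{
    return funcs->get_pcie_replay_count(dev);
}
```

The trade-off with a stub is that the value it reports is not a real counter, but callers that do not check for NULL stay safe.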
@NorwegianRockCat
Contributor Author

NorwegianRockCat commented Nov 4, 2020

Great sleuthing @DarkKirb! Thanks for tracking this down! I've been meaning to track this more, but my current machine with the Navi card is currently packed away waiting for a move.

@NorwegianRockCat
Contributor Author

Hi @evadot, this problem is still present in 13-STABLE (and -BETA I imagine).

You mentioned that you would cherry-pick torvalds/linux@2af8153 from 5.5 to 5.4-lts (ref #37). The cherry-pick does indeed solve the problem.

It would be nice to have this working for 13-RELEASE as sysctl -a |grep … is a pattern that shows up from time to time in some of my workflows.

Can you do the cherry-pick, please?

@evadot
Contributor

evadot commented Mar 11, 2021

Hmm, I was somehow sure that I had cherry-picked the patch, but it seems I had not.
Anyway, it's done now.
I have a few other things to include before I cut a new release, but it would help if you could confirm that building from the 5.4-lts branch fixes the issue for you.
Thanks.

@NorwegianRockCat
Contributor Author

I already have that patch applied locally to the 5.4-lts branch, and I can confirm that it does fix the problem.

I thought you had applied the patch too, but when I looked at the code, I couldn't see it. Hence my nudge here. I'm glad it was dealt with before 13.0-RELEASE.

I'll let you do the honors of closing the issue :-).

@evadot evadot closed this as completed Oct 8, 2021
lutzbichler pushed a commit to lutzbichler/drm-kmod that referenced this issue May 14, 2023
Recently we got a hard hang during the boot on DCN 3.0.1,
which caused the below null pointer exception:

[ +0.000426] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ +0.000003] #PF: supervisor read access in kernel mode
[ +0.000003] #PF: error_code(0x0000) - not-present page
[ +0.000003] PGD 0 P4D 0
[ +0.000004] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000005] CPU: 6 PID: 874 Comm: Xorg Not tainted 5.16.0.asdn-apr28+ #15
[ +0.000004] Hardware name: AMD Chachani-VN/Chachani-VN, BIOS WCH2303N 03/03/2022
[ +0.000003] RIP: 0010:resource_map_pool_resources+0x431/0xa70 [amdgpu]
[ +0.000356] Code: c1 4d 89 c8 49 c1 e0 07 4d 01 c8 49 c1 e0 04 4d 01 f0 49 83 b8 f0 01 00 00 00 0f 85 16 02 00 00 49 8b b8 e0 02 00 00 89 45 c0 <48> 8b 17 4c 8b 92 a0 01 00 00 4d 85 d2 74 24 4c 89 4d 88 48 8d 4d
[ +0.000003] RSP: 0018:ffffa92a4142f718 EFLAGS: 00010246
[ +0.000003] RAX: 0000000000000000 RBX: ffff9a0b86d93000 RCX: 0000000000000000
[ +0.000002] RDX: 0000000000000000 RSI: 000000000000554b RDI: 0000000000000000
[ +0.000002] RBP: ffffa92a4142f798 R08: ffff9a0bdb3c0000 R09: 0000000000000000
[ +0.000002] R10: 0000000000000000 R11: 000000000000f000 R12: 0000000000000000
[ +0.000001] R13: ffff9a0b88360000 R14: ffff9a0bdb3c0000 R15: ffff9a0b86273000
[ +0.000003] FS: 00007f4b5641ca40(0000) GS:ffff9a0cb7f80000(0000) knlGS:0000000000000000
[ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000002] CR2: 0000000000000000 CR3: 0000000102cb2000 CR4: 00000000003506e0
[ +0.000003] Call Trace:
[ +0.000002] <TASK>
[ +0.000004] ? kvmalloc_node+0x5c/0x90
[ +0.000009] dcn20_add_stream_to_ctx+0x1c/0x90 [amdgpu]
[ +0.000330] dcn30_add_stream_to_ctx+0xe/0x10 [amdgpu]
[ +0.000313] dc_add_stream_to_ctx+0x67/0x80 [amdgpu]
[ +0.000300] dm_update_crtc_state+0x4dd/0x6e0 [amdgpu]
[ +0.000320] amdgpu_dm_atomic_check+0x63b/0x1270 [amdgpu]
[ +0.000311] ? __drm_mode_object_add+0x90/0xc0 [drm]
[ +0.000043] ? preempt_count_add+0x74/0xc0
[ +0.000005] ? _raw_spin_lock_irqsave+0x2a/0x60
[ +0.000006] ? _raw_spin_unlock_irqrestore+0x29/0x3d
[ +0.000003] ? drm_connector_list_iter_next+0x8e/0xb0 [drm]
[ +0.000038] drm_atomic_check_only+0x5dd/0xa20 [drm]
[ +0.000044] drm_atomic_commit+0x18/0x60 [drm]
[ +0.000046] drm_client_modeset_commit_atomic+0x1e5/0x220 [drm]
[ +0.000051] drm_client_modeset_commit_locked+0x57/0x160 [drm]
[ +0.000038] __drm_fb_helper_restore_fbdev_mode_unlocked+0x60/0xd0 [drm_kms_helper]
[ +0.000027] drm_fb_helper_set_par+0x40/0x50 [drm_kms_helper]
[ +0.000022] fb_set_var+0x1c8/0x3d0
[ +0.000007] ? __ext4_mark_inode_dirty+0x83/0x210
[ +0.000006] ? __ext4_journal_stop+0x3c/0xb0
[ +0.000008] fbcon_blank+0x228/0x290
[ +0.000007] do_unblank_screen+0xae/0x150
[ +0.000005] vt_ioctl+0xcf4/0x1360
[ +0.000005] ? get_max_files+0x20/0x20
[ +0.000005] ? get_max_files+0x20/0x20
[ +0.000004] ? debug_smp_processor_id+0x17/0x20
[ +0.000004] tty_ioctl+0x373/0x8a0
[ +0.000005] ? __fput+0x123/0x260
[ +0.000004] ? __fget_light+0xc5/0x100
[ +0.000005] __x64_sys_ioctl+0x91/0xc0
[ +0.000005] do_syscall_64+0x3b/0xc0
[ +0.000005] entry_SYSCALL_64_after_hwframe+0x44/0xae

This issue happens because "pipe_ctx->stream_res.tg"
needs to be initialized first before reading its members.
This commit fixes this issue by properly initializing
the pointer before accessing the target data.

Fixes: 663d2daeaee6 ("drm/amd/display: Add odm seamless boot support")
Cc: Agustin Gutierrez <agustin.gutierrez@amd.com>
Signed-off-by: Sung Joon Kim <Sungjoon.Kim@amd.com>
Reviewed-by: Agustin Gutierrez <agustin.gutierrez@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
wulf7 pushed a commit to wulf7/drm-kmod that referenced this issue Nov 12, 2023
wulf7 pushed a commit to wulf7/drm-kmod that referenced this issue Nov 12, 2023
wulf7 pushed a commit to wulf7/drm-kmod that referenced this issue Nov 12, 2023
wulf7 pushed a commit to wulf7/drm-kmod that referenced this issue Nov 28, 2023