graphics/drm-61-kmod amdgpu panic: Unregistered use of FPU in kernel #277

Closed
jownit opened this issue Jan 10, 2024 · 12 comments
Labels
amdgpu amdgpu related problems bug Something isn't working


@jownit

jownit commented Jan 10, 2024

Describe the bug
Panics when loading amdgpu.ko

FreeBSD version
FreeBSD joxan 15.0-CURRENT FreeBSD 15.0-CURRENT #123 main-n267479-13720136fbf9: Wed Jan 10 09:26:10 CET 2024 root@joxan:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64 1500008 1500008

PCI Info

pciconf -lv
hostb0@pci0:0:0:0: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15d0 subvendor=0x1022 subdevice=0x15d0
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 Root Complex'
class = bridge
subclass = HOST-PCI
none0@pci0:0:0:2: class=0x080600 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15d1 subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 IOMMU'
class = base peripheral
subclass = IOMMU
hostb1@pci0:0:1:0: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1452 subvendor=0x0000 subdevice=0x0000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge'
class = bridge
subclass = HOST-PCI
pcib1@pci0:0:1:2: class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x15d3 subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 PCIe GPP Bridge [6:0]'
class = bridge
subclass = PCI-PCI
pcib2@pci0:0:1:3: class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x15d3 subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 PCIe GPP Bridge [6:0]'
class = bridge
subclass = PCI-PCI
pcib3@pci0:0:1:4: class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x15d3 subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 PCIe GPP Bridge [6:0]'
class = bridge
subclass = PCI-PCI
pcib4@pci0:0:1:7: class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x15d3 subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 PCIe GPP Bridge [6:0]'
class = bridge
subclass = PCI-PCI
hostb2@pci0:0:8:0: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1452 subvendor=0x0000 subdevice=0x0000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge'
class = bridge
subclass = HOST-PCI
pcib5@pci0:0:8:1: class=0x060400 rev=0x00 hdr=0x01 vendor=0x1022 device=0x15db subvendor=0x5126 subdevice=0x17aa
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A'
class = bridge
subclass = PCI-PCI
intsmb0@pci0:0:20:0: class=0x0c0500 rev=0x61 hdr=0x00 vendor=0x1022 device=0x790b subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'FCH SMBus Controller'
class = serial bus
subclass = SMBus
isab0@pci0:0:20:3: class=0x060100 rev=0x51 hdr=0x00 vendor=0x1022 device=0x790e subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'FCH LPC Bridge'
class = bridge
subclass = PCI-ISA
hostb3@pci0:0:24:0: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15e8 subvendor=0x0000 subdevice=0x0000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 Device 24: Function 0'
class = bridge
subclass = HOST-PCI
hostb4@pci0:0:24:1: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15e9 subvendor=0x0000 subdevice=0x0000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 Device 24: Function 1'
class = bridge
subclass = HOST-PCI
hostb5@pci0:0:24:2: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15ea subvendor=0x0000 subdevice=0x0000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 Device 24: Function 2'
class = bridge
subclass = HOST-PCI
hostb6@pci0:0:24:3: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15eb subvendor=0x0000 subdevice=0x0000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 Device 24: Function 3'
class = bridge
subclass = HOST-PCI
hostb7@pci0:0:24:4: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15ec subvendor=0x0000 subdevice=0x0000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 Device 24: Function 4'
class = bridge
subclass = HOST-PCI
hostb8@pci0:0:24:5: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15ed subvendor=0x0000 subdevice=0x0000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 Device 24: Function 5'
class = bridge
subclass = HOST-PCI
hostb9@pci0:0:24:6: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15ee subvendor=0x0000 subdevice=0x0000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 Device 24: Function 6'
class = bridge
subclass = HOST-PCI
hostb10@pci0:0:24:7: class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15ef subvendor=0x0000 subdevice=0x0000
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven/Raven2 Device 24: Function 7'
class = bridge
subclass = HOST-PCI
iwm0@pci0:1:0:0: class=0x028000 rev=0x29 hdr=0x00 vendor=0x8086 device=0x2526 subvendor=0x8086 subdevice=0x0014
vendor = 'Intel Corporation'
device = 'Wireless-AC 9260'
class = network
nvme0@pci0:2:0:0: class=0x010802 rev=0x00 hdr=0x00 vendor=0x15b7 device=0x5006 subvendor=0x15b7 subdevice=0x5006
vendor = 'Sandisk Corp'
device = 'SanDisk Extreme Pro / WD Black SN750 / PC SN730 / Red SN700 NVMe SSD'
class = mass storage
subclass = NVM
re0@pci0:3:0:0: class=0x020000 rev=0x0e hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x17aa subdevice=0x5126
vendor = 'Realtek Semiconductor Co., Ltd.'
device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
class = network
subclass = ethernet
none1@pci0:3:0:1: class=0x070002 rev=0x0e hdr=0x00 vendor=0x10ec device=0x816a subvendor=0x17aa subdevice=0x5126
vendor = 'Realtek Semiconductor Co., Ltd.'
device = 'RTL8111xP UART'
class = simple comms
subclass = UART
none2@pci0:3:0:2: class=0x070002 rev=0x0e hdr=0x00 vendor=0x10ec device=0x816b subvendor=0x17aa subdevice=0x5126
vendor = 'Realtek Semiconductor Co., Ltd.'
device = 'RTL8111xP UART'
class = simple comms
subclass = UART
none3@pci0:3:0:3: class=0x0c0701 rev=0x0e hdr=0x00 vendor=0x10ec device=0x816c subvendor=0x17aa subdevice=0x5126
vendor = 'Realtek Semiconductor Co., Ltd.'
device = 'RTL8111xP IPMI interface'
class = serial bus
subclass = IPMI
ehci0@pci0:3:0:4: class=0x0c0320 rev=0x0e hdr=0x00 vendor=0x10ec device=0x816d subvendor=0x17aa subdevice=0x5126
vendor = 'Realtek Semiconductor Co., Ltd.'
device = 'RTL811x EHCI host controller'
class = serial bus
subclass = USB
rtsx0@pci0:4:0:0: class=0xff0000 rev=0x01 hdr=0x00 vendor=0x10ec device=0x522a subvendor=0x17aa subdevice=0x5126
vendor = 'Realtek Semiconductor Co., Ltd.'
device = 'RTS522A PCI Express Card Reader'
vgapci0@pci0:5:0:0: class=0x030000 rev=0xd2 hdr=0x00 vendor=0x1002 device=0x15d8 subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile Series]'
class = display
subclass = VGA
hdac0@pci0:5:0:1: class=0x040300 rev=0x00 hdr=0x00 vendor=0x1002 device=0x15de subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'Raven/Raven2/Fenghuang HDMI/DP Audio Controller'
class = multimedia
subclass = HDA
none4@pci0:5:0:2: class=0x108000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15df subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Family 17h (Models 10h-1fh) Platform Security Processor'
class = encrypt/decrypt
xhci0@pci0:5:0:3: class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15e0 subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven USB 3.1'
class = serial bus
subclass = USB
xhci1@pci0:5:0:4: class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15e1 subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Raven USB 3.1'
class = serial bus
subclass = USB
none5@pci0:5:0:5: class=0x048000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15e2 subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'ACP/ACP3X/ACP6x Audio Coprocessor'
class = multimedia
hdac1@pci0:5:0:6: class=0x040300 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15e3 subvendor=0x17aa subdevice=0x5126
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Family 17h/19h HD Audio Controller'
class = multimedia
subclass = HDA

DRM KMOD version
drm-61-kmod-6.1.69

To Reproduce
Install drm-61-kmod
kldload amdgpu

Screenshots
Panic message provided below

Additional context
<6>[drm] amdgpu kernel modesetting enabled.
drmn0: on vgapci0
vgapci0: child drmn0 requested pci_enable_io
vgapci0: child drmn0 requested pci_enable_io
<6>[drm] initializing kernel modesetting (RAVEN 0x1002:0x15D8 0x17AA:0x5126 0xD2).
<6>[drm] register mmio base: 0xD0500000
<6>[drm] register mmio size: 524288
<6>[drm] add ip block number 0 <soc15_common>
<6>[drm] add ip block number 1 <gmc_v9_0>
<6>[drm] add ip block number 2 <vega10_ih>
<6>[drm] add ip block number 3
<6>[drm] add ip block number 4
<6>[drm] add ip block number 5
<6>[drm] add ip block number 6 <gfx_v9_0>
<6>[drm] add ip block number 7 <sdma_v4_0>
<6>[drm] add ip block number 8 <vcn_v1_0>
drmn0: successfully loaded firmware image 'amdgpu/picasso_gpu_info.bin'
drmn0: Fetched VBIOS from VFCT
<6>amdgpu: ATOM BIOS: 113-PICASSO-117
drmn0: successfully loaded firmware image 'amdgpu/picasso_sdma.bin'
<6>[drm] VCN decode is enabled in VM mode
<6>[drm] VCN encode is enabled in VM mode
<6>[drm] JPEG decode is enabled in VM mode
drmn0: Trusted Memory Zone (TMZ) feature enabled
drmn0: PCIE atomic ops is not supported
<6>[drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
drmn0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
drmn0: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
drmn0: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
[drm ERROR :amdgpu_bo_init] Unable to set WC memtype for the aperture base
<6>[drm] Detected VRAM RAM=2048M, BAR=2048M
<6>[drm] RAM width 128bits DDR4
<6>[drm] amdgpu: 2048M of VRAM memory ready
<6>[drm] amdgpu: 7091M of GTT memory ready.
<6>[drm] GART: num cpu pages 262144, num gpu pages 262144
<6>[drm] PCIE GART of 1024M enabled.
<6>[drm] PTB located at 0x000000F400A00000
drmn0: successfully loaded firmware image 'amdgpu/picasso_asd.bin'
drmn0: successfully loaded firmware image 'amdgpu/picasso_ta.bin'
drmn0: PSP runtime database doesn't exist
drmn0: PSP runtime database doesn't exist
<6>amdgpu: hwmgr_sw_init smu backed is smu10_smu
drmn0: could not load firmware image 'amdgpu/raven_dmcu.bin'
drmn0: successfully loaded firmware image 'amdgpu/picasso_pfp.bin'
drmn0: successfully loaded firmware image 'amdgpu/picasso_me.bin'
drmn0: successfully loaded firmware image 'amdgpu/picasso_ce.bin'
drmn0: successfully loaded firmware image 'amdgpu/picasso_rlc.bin'
drmn0: successfully loaded firmware image 'amdgpu/picasso_mec.bin'
drmn0: successfully loaded firmware image 'amdgpu/picasso_mec2.bin'
drmn0: successfully loaded firmware image 'amdgpu/picasso_vcn.bin'
<6>[drm] Found VCN firmware Version ENC: 1.13 DEC: 2 VEP: 0 Revision: 4
drmn0: Will use PSP to load VCN firmware
<6>[drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
drmn0: RAS: optional ras ta ucode is not available
drmn0: RAP: optional rap ta ucode is not available
<6>[drm] DM_PPLIB: values for F clock
<6>[drm] DM_PPLIB: 400000 in kHz, 2924 in mV
<6>[drm] DM_PPLIB: 933000 in kHz, 3249 in mV
<6>[drm] DM_PPLIB: 1067000 in kHz, 3924 in mV
<6>[drm] DM_PPLIB: 1200000 in kHz, 4074 in mV
<6>[drm] DM_PPLIB: values for DCF clock
<6>[drm] DM_PPLIB: 300000 in kHz, 2924 in mV
<6>[drm] DM_PPLIB: 600000 in kHz, 3249 in mV
<6>[drm] DM_PPLIB: 626000 in kHz, 3924 in mV
<6>[drm] DM_PPLIB: 654000 in kHz, 4074 in mV
panic: Unregistered use of FPU in kernel
cpuid = 2
time = 1704878955
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0100b3af70
vpanic() at vpanic+0x131/frame 0xfffffe0100b3b0a0
panic() at panic+0x43/frame 0xfffffe0100b3b100
trap() at trap+0x8dc/frame 0xfffffe0100b3b220
calltrap() at calltrap+0x8/frame 0xfffffe0100b3b220
--- trap 0x16, rip = 0xffffffff8400299b, rsp = 0xfffffe0100b3b2f0, rbp = 0xfffffe0100b3b2f0 ---
dcn_bw_sync_calcs_and_dml() at dcn_bw_sync_calcs_and_dml+0xb/frame 0xfffffe0100b3b2f0
dcn10_create_resource_pool() at dcn10_create_resource_pool+0x6a6/frame 0xfffffe0100b3b490
dc_create_resource_pool() at dc_create_resource_pool+0x4c/frame 0xfffffe0100b3b4b0
dc_create() at dc_create+0x330/frame 0xfffffe0100b3b4f0
dm_hw_init() at dm_hw_init+0x3d9/frame 0xfffffe0100b3b6e0
amdgpu_device_ip_hw_init_phase2() at amdgpu_device_ip_hw_init_phase2+0x5a/frame 0xfffffe0100b3b710
amdgpu_device_ip_init() at amdgpu_device_ip_init+0x370/frame 0xfffffe0100b3b790
amdgpu_device_init() at amdgpu_device_init+0x1cdb/frame 0xfffffe0100b3b850
amdgpu_driver_load_kms() at amdgpu_driver_load_kms+0x16/frame 0xfffffe0100b3b880
amdgpu_pci_probe() at amdgpu_pci_probe+0x283/frame 0xfffffe0100b3b8c0
linux_pci_attach_device() at linux_pci_attach_device+0x478/frame 0xfffffe0100b3b910
device_attach() at device_attach+0x3b5/frame 0xfffffe0100b3b960
bus_generic_driver_added() at bus_generic_driver_added+0xa1/frame 0xfffffe0100b3b990
devclass_driver_added() at devclass_driver_added+0x39/frame 0xfffffe0100b3b9d0
devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe0100b3ba10
_linux_pci_register_driver() at _linux_pci_register_driver+0xcc/frame 0xfffffe0100b3ba40
amdgpu_evh() at amdgpu_evh+0x80/frame 0xfffffe0100b3ba50
module_register_init() at module_register_init+0x85/frame 0xfffffe0100b3ba80
linker_load_module() at linker_load_module+0xbf9/frame 0xfffffe0100b3bd70
kern_kldload() at kern_kldload+0x16a/frame 0xfffffe0100b3bdd0
sys_kldload() at sys_kldload+0x5c/frame 0xfffffe0100b3be00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe0100b3bf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0100b3bf30
--- syscall (304, FreeBSD ELF64, kldload), rip = 0x32d5ff8d767a, rsp = 0x32d5fe406be8, rbp = 0x32d5fe407160 ---
KDB: enter: panic

@evadot evadot added bug Something isn't working amdgpu amdgpu related problems labels Feb 1, 2024
@wulf7
Contributor

wulf7 commented Feb 1, 2024

Test this patch:

diff --git a/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c b/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c
index e73f089c84..a4c1e94f79 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c
@@ -1561,6 +1561,7 @@ void dcn_bw_notify_pplib_of_wm_ranges(
 
 void dcn_bw_sync_calcs_and_dml(struct dc *dc)
 {
+	DC_FP_START();
 	DC_LOG_BANDWIDTH_CALCS("sr_exit_time: %f ns\n"
 			"sr_enter_plus_exit_time: %f ns\n"
 			"urgent_latency: %f ns\n"
@@ -1697,6 +1698,7 @@ void dcn_bw_sync_calcs_and_dml(struct dc *dc)
 			dc->dcn_ip->can_vstartup_lines_exceed_vsync_plus_back_porch_lines_minus_one,
 			dc->dcn_ip->bug_forcing_luma_and_chroma_request_to_same_size_fixed,
 			dc->dcn_ip->dcfclk_cstate_latency);
+	DC_FP_END();
 
 	dc->dml.soc.sr_exit_time_us = dc->dcn_soc->sr_exit_time;
 	dc->dml.soc.sr_enter_plus_exit_time_us = dc->dcn_soc->sr_enter_plus_exit_time;

@jownit
Author

jownit commented Feb 2, 2024

I tried the patch, but unfortunately it does not make any difference.

panic: Unregistered use of FPU in kernel
cpuid = 5
time = 1706869368
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0100aaff60
vpanic() at vpanic+0x135/frame 0xfffffe0100ab0090
panic() at panic+0x43/frame 0xfffffe0100ab00f0
trap() at trap+0x8dc/frame 0xfffffe0100ab0210
calltrap() at calltrap+0x8/frame 0xfffffe0100ab0210
--- trap 0x16, rip = 0xffffffff84001001, rsp = 0xfffffe0100ab02e8, rbp = 0xfffffe0100ab02f0 ---
dcn_bw_sync_calcs_and_dml() at dcn_bw_sync_calcs_and_dml+0x31/frame 0xfffffe0100ab02f0
dcn10_create_resource_pool() at dcn10_create_resource_pool+0x6a6/frame 0xfffffe0100ab0490
dc_create_resource_pool() at dc_create_resource_pool+0x4c/frame 0xfffffe0100ab04b0
dc_create() at dc_create+0x330/frame 0xfffffe0100ab04f0
dm_hw_init() at dm_hw_init+0x3d9/frame 0xfffffe0100ab06e0
amdgpu_device_ip_hw_init_phase2() at amdgpu_device_ip_hw_init_phase2+0x5a/frame 0xfffffe0100ab0710
amdgpu_device_ip_init() at amdgpu_device_ip_init+0x370/frame 0xfffffe0100ab0790
amdgpu_device_init() at amdgpu_device_init+0x1cdb/frame 0xfffffe0100ab0850
amdgpu_driver_load_kms() at amdgpu_driver_load_kms+0x16/frame 0xfffffe0100ab0880
amdgpu_pci_probe() at amdgpu_pci_probe+0x283/frame 0xfffffe0100ab08c0
linux_pci_attach_device() at linux_pci_attach_device+0x478/frame 0xfffffe0100ab0910
device_attach() at device_attach+0x3b5/frame 0xfffffe0100ab0960
bus_generic_driver_added() at bus_generic_driver_added+0xa1/frame 0xfffffe0100ab0990
devclass_driver_added() at devclass_driver_added+0x39/frame 0xfffffe0100ab09d0
devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe0100ab0a10
_linux_pci_register_driver() at _linux_pci_register_driver+0xcc/frame 0xfffffe0100ab0a40
amdgpu_evh() at amdgpu_evh+0x80/frame 0xfffffe0100ab0a50
module_register_init() at module_register_init+0x85/frame 0xfffffe0100ab0a80
linker_load_module() at linker_load_module+0xbf9/frame 0xfffffe0100ab0d70
kern_kldload() at kern_kldload+0x16a/frame 0xfffffe0100ab0dd0
sys_kldload() at sys_kldload+0x5c/frame 0xfffffe0100ab0e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe0100ab0f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0100ab0f30
--- syscall (304, FreeBSD ELF64, kldload), rip = 0x331be343b69a, rsp = 0x331be1d001c8, rbp = 0x331be1d00740 ---
KDB: enter: panic

@wulf7
Contributor

wulf7 commented Feb 2, 2024

OK. Then let's increase the DC_FP scope:

diff --git a/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c b/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c
index e73f089c84..22613780c3 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c
@@ -1561,6 +1561,7 @@ void dcn_bw_notify_pplib_of_wm_ranges(
 
 void dcn_bw_sync_calcs_and_dml(struct dc *dc)
 {
+	DC_FP_START();
 	DC_LOG_BANDWIDTH_CALCS("sr_exit_time: %f ns\n"
 			"sr_enter_plus_exit_time: %f ns\n"
 			"urgent_latency: %f ns\n"
@@ -1749,4 +1750,5 @@ void dcn_bw_sync_calcs_and_dml(struct dc *dc)
 	dc->dml.ip.bug_forcing_LC_req_same_size_fixed =
 		dc->dcn_ip->bug_forcing_luma_and_chroma_request_to_same_size_fixed == dcn_bw_yes;
 	dc->dml.ip.dcfclk_cstate_latency = dc->dcn_ip->dcfclk_cstate_latency;
+	DC_FP_END();
 }

@jownit
Author

jownit commented Feb 2, 2024

That made a difference!
No panic, and my external display works.
Looks good so far: X11 starts and looks the same as it did before with drm-510-kmod.
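
For readers following along, here is a minimal userspace sketch of why the wider scope in the second patch matters. It does not reproduce the real DC_FP_START()/DC_FP_END() macros. `SIM_FP_START`, `SIM_FP_END`, `sim_fp_read` and the two structs are hypothetical stand-ins invented for illustration. The assumption being modeled is that the float assignments after the logging call (such as `dc->dml.soc.sr_exit_time_us = dc->dcn_soc->sr_exit_time`, visible in the first diff's context lines) also touch FPU state.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; the real kernel macros are not modeled. */
static int fp_depth;           /* > 0 while "FPU use is registered" */
static int unregistered_uses;  /* each one would be a kernel panic */

#define SIM_FP_START() (fp_depth++)
#define SIM_FP_END()   do { assert(fp_depth > 0); fp_depth--; } while (0)

struct sim_soc { float sr_exit_time; };
struct sim_dml { float sr_exit_time_us; };

/* Every floating-point access in this model goes through this helper. */
float sim_fp_read(float v)
{
	if (fp_depth == 0)
		unregistered_uses++;
	return v;
}

/* Shape of the first patch: only the logging call is inside the section. */
int sim_sync_narrow(const struct sim_soc *soc, struct sim_dml *dml)
{
	unregistered_uses = 0;
	SIM_FP_START();
	(void)sim_fp_read(soc->sr_exit_time);	/* stands in for the %f logging */
	SIM_FP_END();
	/* This float copy happens outside the registered section. */
	dml->sr_exit_time_us = sim_fp_read(soc->sr_exit_time);
	return unregistered_uses;
}

/* Shape of the second patch: the section covers the whole function body. */
int sim_sync_wide(const struct sim_soc *soc, struct sim_dml *dml)
{
	unregistered_uses = 0;
	SIM_FP_START();
	(void)sim_fp_read(soc->sr_exit_time);
	dml->sr_exit_time_us = sim_fp_read(soc->sr_exit_time);
	SIM_FP_END();
	return unregistered_uses;
}
```

In this model `sim_sync_narrow()` reports one unregistered use and `sim_sync_wide()` reports none, which matches the observed behaviour of the two patches, though the real cause inside the compiled driver may differ in detail.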

@evadot
Contributor

evadot commented Feb 8, 2024

Same problem on:
vgapci0@pci0:8:0:0: class=0x030000 rev=0x81 hdr=0x00 vendor=0x1002 device=0x15dd subvendor=0x1002 subdevice=0x15dd
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series]'
class = display
subclass = VGA

Patch works here too.

@alfonsosiciliano
Contributor

alfonsosiciliano commented Feb 8, 2024

The patch also works for:
Device: PCI 1002:15d8 Advanced Micro Devices, Inc. [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile Series]

Thanks!

@lbartoletti

(Maybe a different problem? But,) I also have a panic using drm-61-kmod on FreeBSD 15.0

[image: amd_panic]

It's an APU (raphael) on an AMD Ryzen 9 7900X

b.f.o issue for reference: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268394#c10

@alfonsosiciliano
Contributor

> (Maybe a different problem? But,) I also have a panic using drm-61-kmod on FreeBSD 15.0
>
> [image: amd_panic]
>
> It's an APU (raphael) on an AMD Ryzen 9 7900X
>
> b.f.o issue for reference: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268394#c10

Sorry, unfortunately I can't "read" the image.

Is the problem "panic: Unregistered use of FPU"?
Do the core file and kgdb backtrace refer to the dcn_bw_sync_calcs_and_dml() function?
Does the problem still occur after applying the DC_FP_START()/DC_FP_END() patch?

wulf7 added a commit that referenced this issue Feb 11, 2024
with adding DC_FP_START/DC_FP_END around block of code that uses FPU

The panic message is:

panic: Unregistered use of FPU in kernel
cpuid = 2
time = 1704878955
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0100b3af70
vpanic() at vpanic+0x131/frame 0xfffffe0100b3b0a0
panic() at panic+0x43/frame 0xfffffe0100b3b100
trap() at trap+0x8dc/frame 0xfffffe0100b3b220
calltrap() at calltrap+0x8/frame 0xfffffe0100b3b220
--- trap 0x16, rip = 0xffffffff8400299b, rsp = 0xfffffe0100b3b2f0, rbp = 0xfffffe0100b3b2f0 ---
dcn_bw_sync_calcs_and_dml() at dcn_bw_sync_calcs_and_dml+0xb/frame 0xfffffe0100b3b2f0
dcn10_create_resource_pool() at dcn10_create_resource_pool+0x6a6/frame 0xfffffe0100b3b490
dc_create_resource_pool() at dc_create_resource_pool+0x4c/frame 0xfffffe0100b3b4b0
dc_create() at dc_create+0x330/frame 0xfffffe0100b3b4f0
dm_hw_init() at dm_hw_init+0x3d9/frame 0xfffffe0100b3b6e0
amdgpu_device_ip_hw_init_phase2() at amdgpu_device_ip_hw_init_phase2+0x5a/frame 0xfffffe0100b3b710
amdgpu_device_ip_init() at amdgpu_device_ip_init+0x370/frame 0xfffffe0100b3b790
amdgpu_device_init() at amdgpu_device_init+0x1cdb/frame 0xfffffe0100b3b850
amdgpu_driver_load_kms() at amdgpu_driver_load_kms+0x16/frame 0xfffffe0100b3b880
amdgpu_pci_probe() at amdgpu_pci_probe+0x283/frame 0xfffffe0100b3b8c0
linux_pci_attach_device() at linux_pci_attach_device+0x478/frame 0xfffffe0100b3b910
device_attach() at device_attach+0x3b5/frame 0xfffffe0100b3b960
bus_generic_driver_added() at bus_generic_driver_added+0xa1/frame 0xfffffe0100b3b990
devclass_driver_added() at devclass_driver_added+0x39/frame 0xfffffe0100b3b9d0
devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe0100b3ba10
_linux_pci_register_driver() at _linux_pci_register_driver+0xcc/frame 0xfffffe0100b3ba40
amdgpu_evh() at amdgpu_evh+0x80/frame 0xfffffe0100b3ba50
module_register_init() at module_register_init+0x85/frame 0xfffffe0100b3ba80
linker_load_module() at linker_load_module+0xbf9/frame 0xfffffe0100b3bd70
kern_kldload() at kern_kldload+0x16a/frame 0xfffffe0100b3bdd0
sys_kldload() at sys_kldload+0x5c/frame 0xfffffe0100b3be00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe0100b3bf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0100b3bf30
--- syscall (304, FreeBSD ELF64, kldload), rip = 0x32d5ff8d767a, rsp = 0x32d5fe406be8, rbp = 0x32d5fe407160 ---
KDB: enter: panic

#277

Sponsored by:	Serenity Cyber Security, LLC
wulf7 added a commit that referenced this issue Feb 11, 2024
with adding DC_FP_START/DC_FP_END around block of code that uses FPU

(cherry picked from commit 4ca06b0)
@wulf7
Contributor

wulf7 commented Feb 11, 2024

I pushed a slightly different version of the patch to the 6.1-lts branch. If it still works, we can update the drm-61-kmod port.

@wulf7
Contributor

wulf7 commented Feb 11, 2024

> (Maybe a different problem? But,) I also have a panic using drm-61-kmod on FreeBSD 15.0
>
> It's an APU (raphael) on an AMD Ryzen 9 7900X
>
> b.f.o issue for reference: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268394#c10

That is a different panic.

What you see is a driver unloading bug (a misplaced vm_phys_fictitious_unreg_range() call). It is triggered by a fatal error (not by a driver crash!) during initialization. In that case you should check the message buffer content rather than the backtrace. Most probably the cause of the error is an unsupported GPU or a missing firmware module.
Try installing all FLAVORs of graphics/gpu-firmware-amd-kmod, and if that does not help, you may try the WIP 6.6 branch.
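
As a rough illustration of what "check the message buffer" can mean in practice, the sketch below scans saved message-buffer text for lines that look like initialization errors. The two patterns are taken from lines that appear in the log earlier in this thread. They are not a complete list, and the function names `is_suspect_line` and `count_suspect_lines` are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if one message-buffer line looks like a driver init error.
 * The patterns are examples from this thread's log, not an exhaustive set. */
bool is_suspect_line(const char *line)
{
	static const char *const patterns[] = {
		"[drm ERROR",
		"could not load firmware image",
	};

	assert(line != NULL);
	for (size_t i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
		if (strstr(line, patterns[i]) != NULL)
			return true;
	}
	return false;
}

/* Counts suspect lines in a NUL-terminated, newline-separated buffer.
 * Lines longer than the local buffer are truncated before matching. */
size_t count_suspect_lines(const char *buf)
{
	size_t count = 0;

	assert(buf != NULL);
	while (*buf != '\0') {
		char line[512];
		size_t len = strcspn(buf, "\n");
		size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

		memcpy(line, buf, n);
		line[n] = '\0';
		if (is_suspect_line(line))
			count++;
		buf += len;
		if (*buf == '\n')
			buf++;
	}
	return count;
}
```

A hit such as "could not load firmware image" points at a missing firmware package rather than at the code path shown in the backtrace, which is the distinction drawn above.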

freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Feb 24, 2024
- Allow build on recent 14-STABLE
- Fix freebsd/drm-kmod#277
- Attempt to fix PR/274770

Sponsored by:	Serenity Cyber Security, LLC
Approved by:	x11 (manu, implicit)
@lbartoletti

> > (Maybe a different problem? But,) I also have a panic using drm-61-kmod on FreeBSD 15.0
> > It's an APU (raphael) on an AMD Ryzen 9 7900X
> > b.f.o issue for reference: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268394#c10
>
> That is a different panic.
>
> What you see is a driver unloading bug (a misplaced vm_phys_fictitious_unreg_range() call). It is triggered by a fatal error (not by a driver crash!) during initialization. In that case you should check the message buffer content rather than the backtrace. Most probably the cause of the error is an unsupported GPU or a missing firmware module. Try installing all FLAVORs of graphics/gpu-firmware-amd-kmod, and if that does not help, you may try the WIP 6.6 branch.

Thanks, and sorry for my late answer.

A different problem now with 61-lts: I can load amdgpu, but now I get a black screen.

I was unable to compile the 6.6 branch due to a missing header:

@evadot
Contributor

evadot commented Jun 4, 2024

Closing as the original bug is fixed.

@evadot evadot closed this as completed Jun 4, 2024