(bug) NVRM kgspInitRm_IMPL: missing NVDEC0 engine, cannot initialize GSP-RM #116

Closed
kfazz opened this issue May 12, 2022 · 4 comments
Labels: bug (Something isn't working), Implemented (Fixed, in test prior to release integration), NV-Triaged (An NVBug has been created for dev to investigate)

kfazz commented May 12, 2022

The open-source kernel module (KM) fails to load for me with the following error:
[ 3.596579] NVRM kgspInitRm_IMPL: missing NVDEC0 engine, cannot initialize GSP-RM
[ 3.596583] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 3.596745] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x63:0x56:1689)
[ 3.597360] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 3.597537] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ 3.597749] [drm:nv_drm_probe_devices [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[ 3.786715] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ 3.786773] ucsi_ccg 1-0008: i2c_transfer failed -110
[ 3.786818] ucsi_ccg 1-0008: ucsi_ccg_init failed - -110
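For anyone triaging the same failure, the first NVRM line is the distinctive signature. The following is a minimal sketch, not part of the driver or its tooling, that scans kernel log text for that message. The function name `find_gsp_nvdec0_failure` and the sample constant are hypothetical, and the pattern tolerates an optional colon after `NVRM` as an assumption about how the log may be rendered on other systems.

```python
import re
from typing import List

# Message text copied from the kernel log in this report. The optional colon
# after "NVRM" is an assumption, in case other logs render the prefix that way.
GSP_NVDEC0_PATTERN = re.compile(
    r"NVRM:?\s+kgspInitRm_IMPL:\s+missing NVDEC0 engine, cannot initialize GSP-RM"
)


def find_gsp_nvdec0_failure(log_text: str) -> List[str]:
    """Return every log line that carries the missing-NVDEC0 GSP-RM error."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if GSP_NVDEC0_PATTERN.search(line)
    ]


# First three lines of the log quoted in this issue.
SAMPLE_LOG = """\
[ 3.596579] NVRM kgspInitRm_IMPL: missing NVDEC0 engine, cannot initialize GSP-RM
[ 3.596583] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[ 3.596745] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x63:0x56:1689)
"""

if __name__ == "__main__":
    for hit in find_gsp_nvdec0_failure(SAMPLE_LOG):
        print(hit)
```

In practice the input would be saved kernel log text (for example, the output of `dmesg` written to a file) rather than the inline sample.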

This is on an Ubuntu 22.04 system with an RTX 2060; packages are from the CUDA apt repo.

After switching back to the proprietary KM using:
$ sudo apt remove nvidia-kernel-open-515 && sudo apt install --reinstall nvidia-dkms-515 && sudo reboot

The proprietary driver loads just fine.

==============NVSMI LOG==============

Timestamp : Thu May 12 14:55:08 2022
Driver Version : 515.43.04
CUDA Version : 11.7

Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce RTX 2060
Product Brand : GeForce
Product Architecture : Turing
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-0e997e95-3209-c2c9-6214-347945351445
Minor Number : 0
VBIOS Version : 90.06.2E.C0.0B
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Module ID : 0
Inforom Version
Image Version : G001.0000.02.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1F0810DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x37591028
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 1816
Replay Number Rollovers : 231
Tx Throughput : 3000 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 32 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 6144 MiB
Reserved : 212 MiB
Used : 193 MiB
Free : 5738 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 5 MiB
Free : 251 MiB
Compute Mode : Default
Utilization
Gpu : 1 %
Memory : 1 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 39 C
GPU Shutdown Temp : 93 C
GPU Slowdown Temp : 90 C
GPU Max Operating Temp : 88 C
GPU Target Temperature : 83 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 11.18 W
Power Limit : 160.00 W
Default Power Limit : 160.00 W
Enforced Power Limit : 160.00 W
Min Power Limit : 125.00 W
Max Power Limit : 160.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 7001 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 1257
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 70 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 1445
Type : G
Name : /usr/libexec/gnome-remote-desktop-daemon
Used GPU Memory : 1 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 1485
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 118 MiB
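The fields most relevant to this report are the driver version, the product name, and the GSP firmware version (reported as N/A in the log above). A rough sketch of pulling them out of `nvidia-smi -q` style text follows. The helper `parse_smi_fields` is hypothetical, it assumes each field is written as `Key : Value` on one line, and it keeps only the first occurrence of a key because names such as "Current" repeat under several sections.

```python
from typing import Dict


def parse_smi_fields(report: str) -> Dict[str, str]:
    """Collect 'Key : Value' pairs from `nvidia-smi -q` style text.

    Only the first occurrence of each key is kept, because names such as
    "Current" or "Graphics" repeat under several sections.
    """
    fields: Dict[str, str] = {}
    for line in report.splitlines():
        # Split on the first " : " so colons inside values (times, bus IDs) survive.
        key, sep, value = line.partition(" : ")
        key = key.strip()
        if sep and key and key not in fields:
            fields[key] = value.strip()
    return fields


# A few lines taken from the log in this issue.
SAMPLE = """\
Driver Version : 515.43.04
CUDA Version : 11.7
Product Name : NVIDIA GeForce RTX 2060
Product Architecture : Turing
GSP Firmware Version : N/A
"""

if __name__ == "__main__":
    info = parse_smi_fields(SAMPLE)
    for name in ("Driver Version", "Product Name", "GSP Firmware Version"):
        print(f"{name}: {info.get(name, 'not reported')}")
```

Section headers without a value (for example `PCI` or `Temperature`) are skipped, so the result is a flat lookup and not a faithful tree of the report.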

aaronp24 (Member) commented

Can you please run sudo nvidia-bug-report.sh and attach the resulting nvidia-bug-report.log.gz file?

kfazz (Author) commented May 12, 2022

Bug report archive generated against the open driver:
nvidia-bug-report.log.gz

PAR2020 (Contributor) commented May 12, 2022

This is a known limitation of this driver version.
Due to GSP software design, it is required that the NVDEC0 engine is present on chip.
If the engine is not present, it is not possible to boot GSP-RM, as the error message states.
This limitation will be addressed in the next driver release, internal bug 3586266.

Note that this is not related to bug #104 -- NVDEC is supported by the open source driver.
The proprietary driver is not impacted because it does not boot GSP firmware and thus does not rely on NVDEC0.

PAR2020 added the bug label May 12, 2022
PAR2020 added the NV-Triaged and Implemented labels May 25, 2022

aritger (Collaborator) commented May 31, 2022

This should be fixed in 515.48.07. Please let me know if you continue to have trouble. Marking fixed.
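To check whether an installed driver is at or past the release named here, a simple numeric comparison of the dotted version is enough. This is a sketch only: the helper names `version_tuple` and `has_nvdec0_fix` are hypothetical, and it assumes version strings are plain dot-separated integers and that later releases keep the fix.

```python
from typing import Tuple

# Release named in this thread as containing the fix.
FIXED_IN = "515.48.07"


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted driver version such as '515.43.04' into (515, 43, 4)."""
    return tuple(int(part) for part in version.strip().split("."))


def has_nvdec0_fix(installed: str, fixed_in: str = FIXED_IN) -> bool:
    """True if `installed` is the fixed release or a later one.

    Assumes plain numeric, dot-separated version strings and that
    releases after `fixed_in` retain the fix.
    """
    return version_tuple(installed) >= version_tuple(fixed_in)


if __name__ == "__main__":
    # The version from the original report, then the release with the fix.
    for version in ("515.43.04", "515.48.07"):
        print(version, has_nvdec0_fix(version))
```

Comparing integer tuples instead of raw strings avoids mis-ordering components with different digit counts, such as "9" versus "10".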

aritger closed this as completed May 31, 2022
PAR2020 added this to the 515.48.07 milestone May 31, 2022