[question] ACPI/PCIe lockup issue on Skylake + Maxwell hybrid graphics laptops #21

Open
Lekensteyn opened this issue Nov 3, 2016 · 0 comments


Lekensteyn commented Nov 3, 2016

There is a widespread problem among modern Skylake laptops (from Asus, Clevo, Dell, HP and others) equipped with Maxwell GPUs (such as the GTX 970M or Quadro M1000M), reported as Bug 156341, "Nvidia fails to power on again, resulting in AML_INFINITE_LOOP/lockups (multiple laptops affected)".

Powering off the GPU works (through the vendor-specific _DSM ACPI method or by relying on management of Power Resources), but restoring power fails. This causes lockups, breaks suspend, and delays power-off on systems that report Windows 8 (or newer) compatibility.

Question: do you perhaps have a helpful document for UEFI firmware writers, or know which problems the firmware authors tried to work around?

One example of a problematic ACPI method can be seen in this file: https://github.com/Lekensteyn/acpi-stuff/blob/master/dsl/Clevo_P651RA/ssdt3.dsl#L4001
Notes from reverse-engineering the ACPI tables are at: https://github.com/Lekensteyn/acpi-stuff/blob/master/Clevo-P651RA/notes.txt

If you need more details, please ask! Thanks!


Already tried / observed:

  • Checked whether any ACPI calls are missing: none seem to be. Timing does not seem relevant either (adding delays did not help).
  • The problem occurs when power is restored via the Power Resource of the PCIe port: the ACPI code loops forever checking whether the link has been restored (register Q0L0 in the PCIe extended config space). Some laptops lack this check, so their ACPI code does not loop forever, but power is still not restored.
  • Disabling the ACPI wake method (_DSW) has no effect. (I observed that Win10 never uses a combination that would enable wake at dGPU power-off.)
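The hang in the second point is a poll on a link register with no exit condition. As a rough OS-side analogue, the sketch below polls the root port's link a bounded number of times instead. It swaps in a different register than the firmware uses: Q0L0 is vendor-specific, so this reads the standard Data Link Layer Link Active bit (bit 13 of the PCIe Link Status register, offset 0x12 in the PCI Express capability), assuming pciutils' `setpci` named-capability syntax. That bit is only meaningful on ports that advertise Data Link Layer Link Active Reporting. The helper names and the 10 ms interval are hypothetical choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read the PCIe Link Status register (offset 0x12 in the
# PCI Express capability) of a port; setpci prints it as 4 hex digits.
read_link_status() {
    setpci -s "$1" CAP_EXP+0x12.w
}

# wait_link_active PORT TRIES
# Bounded poll of the Data Link Layer Link Active bit (bit 13, mask 0x2000):
# returns 0 once the link reports active, 1 after TRIES failed attempts,
# rather than spinning forever like the firmware's Q0L0 check.
wait_link_active() {
    tries=$2
    while [ "$tries" -gt 0 ]; do
        status=$(read_link_status "$1") || return 1
        case $status in
            ffff|FFFF) ;;  # all ones: nothing answered, treat the link as down
            *)
                if [ $((0x$status & 0x2000)) -ne 0 ]; then
                    return 0
                fi
                ;;
        esac
        tries=$((tries - 1))
        sleep 0.01  # 10 ms between polls (arbitrary choice)
    done
    return 1
}
```

Usage (as root, for the port 00:01.0 used elsewhere in this issue): `wait_link_active 00:01.0 100 && echo "link up"`. An all-ones read is deliberately treated as "down", since 0xffff would otherwise have the Link Active bit set.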

Without a driver loaded, there is for some reason no lockup, even when PCIe port PM powers off the corresponding Power Resource. Test:

echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control
lspci -s1: -H1  # verify that the device has runtime-suspended (should show no output)
echo on > /sys/bus/pci/devices/0000:01:00.0/power/control
lspci -s1: -H1  # verify that the dGPU is back again
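The same round trip can be wrapped in a small function that also prints the kernel's runtime-PM state (`power/runtime_status`) instead of relying on lspci output alone. A minimal sketch: the fixed wait before reading the status is an assumption (suspend happens some time after the device goes idle), and `rpm_roundtrip` and the `SYSFS_PCI` override are hypothetical names; the override exists only so the function can be pointed at a scratch directory.

```shell
#!/bin/sh
# Hypothetical override; on a real system leave it at the sysfs default.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

# rpm_roundtrip DEV [WAIT_SECONDS]
# Allows runtime suspend ("auto"), waits, prints runtime_status, then forces
# the device back on ("on") and prints runtime_status again.
rpm_roundtrip() {
    dev_dir="$SYSFS_PCI/$1"
    wait_s=${2:-5}
    echo auto > "$dev_dir/power/control" || return 1
    sleep "$wait_s"
    echo "after auto: $(cat "$dev_dir/power/runtime_status")"
    echo on > "$dev_dir/power/control" || return 1
    echo "after on: $(cat "$dev_dir/power/runtime_status")"
}
```

Usage (as root): `rpm_roundtrip 0000:01:00.0 5`.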

Note that after this round trip, loading nouveau fails with "nouveau 0000:01:00.0: unknown chipset (ffffffff)". Removing the device from the PCI bus and rescanning "fixes" this:

echo 1 > /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/remove
echo 1 > /sys/bus/pci/devices/0000:00:01.0/rescan
modprobe -v nouveau  # now succeeds
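The "unknown chipset (ffffffff)" message means the device returns all ones, a typical sign that it does not respond. A similar check can be done from config space, where a Vendor ID of 0xffff means nothing answered. The sketch below combines that check with the remove/rescan workaround, assuming the device and port addresses above; `device_responds`, `rescan_if_dead` and the `SYSFS_PCI` override are hypothetical helpers, not an existing tool. Run it after `power/control` has been set back to `on`.

```shell
#!/bin/sh
# Hypothetical override; on a real system leave it at the sysfs default.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

# device_responds DEV: succeeds unless the first two config-space bytes
# (the Vendor ID) read back as ff ff, i.e. the device does not answer.
device_responds() {
    vid=$(od -An -tx1 -N2 "$SYSFS_PCI/$1/config" | tr -d ' \n')
    [ -n "$vid" ] && [ "$vid" != "ffff" ]
}

# rescan_if_dead DEV PORT: if DEV does not respond, remove it and
# rescan from PORT so the kernel enumerates it again.
rescan_if_dead() {
    if device_responds "$1"; then
        return 0
    fi
    echo 1 > "$SYSFS_PCI/$1/remove" || return 1
    echo 1 > "$SYSFS_PCI/$2/rescan"
}
```

Usage (as root): `rescan_if_dead 0000:01:00.0 0000:00:01.0 && modprobe -v nouveau`.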

Question: are there any known issues related to PCI/PM?
