TLP conflicts with Bumblebee, disabling GPU until reboot #244

Closed
jkdf2 opened this Issue Jan 5, 2017 · 19 comments

jkdf2 commented Jan 5, 2017

Symptom data

  1. Does the problem occur on battery or AC or both?

The problem occurs after going on battery, and persists even if AC is reconnected, until reboot.

  2. Attach the full output of tlp-stat via Gist for all cases of 1.

https://gist.github.com/jkdf2/c296e7dc8cf69f94efa517a1eb2b7fc0

Expected behavior

It should be possible to start optirun glxgears when on battery, and especially when on AC; running it is a deliberate user choice.

Actual behavior

After being on battery-only, optirun glxgears reports:

[ 525.648588] [ERROR]Cannot access secondary GPU - error: Could not enable discrete graphics card

This happens unless RUNTIME_PM_BLACKLIST and RUNTIME_PM_DRIVER_BLACKLIST are configured correctly. Perhaps this could be automated at TLP install time, if an Nvidia Optimus configuration can be detected?
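
For illustration, the workaround looks roughly like this in /etc/default/tlp (the bus address 01:00.0 is taken from the tlp-stat output above and is machine-specific; verify with lspci):

# Exclude the discrete Nvidia GPU from runtime PM by PCIe bus address
# (machine-specific; verify with lspci):
RUNTIME_PM_BLACKLIST="01:00.0"
# Exclude devices bound to these drivers from runtime PM:
RUNTIME_PM_DRIVER_BLACKLIST="nouveau nvidia"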

Steps to reproduce the problem

Have a device that supports Nvidia Optimus (the technology that uses the integrated graphics but switches to the discrete graphics card as required), install the Bumblebee driver to manage it, and then attempt to run an application on the discrete card with optirun.

For reference, see this GitHub issue over at Bumblebee.

linrunner (Owner) commented Jan 5, 2017

> Perhaps this can be automated on TLP install if it could detect an Nvidia Optimus configuration?

I see this as a feature request. If you want to have it, send me a pull request :-).

P.S. I don't own Optimus hardware (and never will).

jkdf2 commented Jan 5, 2017

Thanks @linrunner.

Anybody who comes across this can assume it is open for development. I don't have time at the moment to make it a priority, unless I follow up otherwise.

linrunner (Owner) commented Jan 5, 2017

As a first step, a concept for how to reliably detect Nvidia dGPUs and Optimus would be nice. A PCI ID of "01:00.0" does not seem sufficient to me.

Unfortunately the existing blacklisting via driver (nvidia, nouveau et al.) doesn't work in your case, because the disabled dGPU has no driver loaded.

/sys/bus/pci/devices/0000:01:00.0/power/control = on (0x030200, 3D controller, no driver)

ArchangeGabriel commented Jan 7, 2017

Reliably detect: search for two VGA/3D-class devices among the PCI devices (so VGA+VGA, VGA+3D, or 3D+3D, though I've never seen that last combination AFAIR), one being Intel and the other Nvidia (you want to avoid matching Nvidia+Nvidia, for instance); a shell sketch of this follows below. The PCI ID is definitely a bad idea; we've seen it go from 01 to 10 at least.

However, the issue is to be (partially?) fixed in bbswitch, since supporting the 4.8 kernel's PCIe port PM requires making bbswitch a proper driver. This is the target for the next milestone, but only @Lekensteyn is working on it and he has been quite busy for a while.
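
(For illustration, a minimal shell sketch of that heuristic; it is not TLP code. Class 0300 is "VGA compatible controller", 0302 is "3D controller"; vendor 8086 is Intel, 10de is Nvidia.)

#!/bin/sh
# Sketch: detect an Intel+Nvidia hybrid graphics setup.
# lspci -n lines look like: "01:00.0 0302: 10de:139b (rev ff)"
intel=$(lspci -n | grep -Ec ' 030[02]: 8086:')
nvidia=$(lspci -n | grep -Ec ' 030[02]: 10de:')
if [ "$intel" -ge 1 ] && [ "$nvidia" -ge 1 ]; then
    echo "Optimus-like Intel+Nvidia configuration detected"
fi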

Lekensteyn commented Jan 7, 2017

In my udev configuration I have this rule to match Nvidia GPUs (a fuller rules-file sketch follows below):

# 2016-10-23 let nouveau detect the device and power off as appropriate
SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", GOTO="powersave_end"

Starting with kernel 4.8, PCIe ports support RPM, but this may have undesired effects if no PCI driver is loaded for the Nvidia GPU (on a hybrid graphics laptop). The result is that loading the driver fails unless you remove the PCI device via sysfs and rescan the parent PCIe bus. No idea why that is the case, though.

Maybe you could do a similar thing, disabling RPM for all Nvidia GPUs as above. When nouveau is in use, it will enable RPM itself. When the nvidia blob is in use, RPM is not usable anyway.
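
(For context: the GOTO only works inside a complete rules file. A hypothetical sketch of such a file follows; only the Nvidia match line is taken from the comment above, while the filename and surrounding rules are assumptions.)

# Hypothetical /etc/udev/rules.d/50-powersave.rules (sketch)
ACTION!="add", GOTO="powersave_end"

# 2016-10-23 let nouveau detect the device and power off as appropriate
SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", GOTO="powersave_end"

# Enable runtime PM for all other PCI devices.
SUBSYSTEM=="pci", TEST=="power/control", ATTR{power/control}="auto"

LABEL="powersave_end"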

linrunner (Owner) commented Jan 8, 2017

@jkdf2: let's see if your card fits @Lekensteyn's rule:

lspci -vnn -s 01:00.0

ArchangeGabriel commented Jan 8, 2017

ATTR{vendor}=="0x10de" for sure, but ATTR{class}=="0x030000" I don't think so: that matches VGA compatible controller [0300], but as I've stated above you're also going to have 3D controller [0302] devices, which would be ATTR{class}=="0x030200".

Here is mine:

02:00.0 3D controller [0302]: NVIDIA Corporation GK107M [GeForce GT 750M] [10de:0fe4]

That being said, @Lekensteyn's idea to blacklist all Nvidia GPUs from RPM looks fine indeed.

jkdf2 commented Jan 8, 2017

lspci -vnn -s 01:00.0 output:

01:00.0 3D controller [0302]: NVIDIA Corporation GM107M [GeForce GTX 960M] [10de:139b] (rev ff) (prog-if ff)
	!!! Unknown header type 7f
	Kernel modules: nvidia

ArchangeGabriel commented Jan 8, 2017

@jkdf2 (and @linrunner, for info): if you want the full output, you have to power the card on first.

sudo tee /proc/acpi/bbswitch <<< ON
lspci -vnn -s 01:00.0
sudo tee /proc/acpi/bbswitch <<< OFF

linrunner (Owner) commented Jan 8, 2017

It's fine like that, because detection must work even when the card is off.
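
(For illustration: the kernel caches a device's vendor and class IDs at enumeration time, so the sysfs attributes stay readable even while bbswitch has the card powered off. The values below match the lspci output in this thread.)

cat /sys/bus/pci/devices/0000:01:00.0/vendor   # 0x10de (Nvidia)
cat /sys/bus/pci/devices/0000:01:00.0/class    # 0x030200 (3D controller)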

ArchangeGabriel commented Jan 8, 2017

Sure. :)

Lekensteyn commented Jan 9, 2017

The above udev rule works for me:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204M [GeForce GTX 965M] [10de:13d9] (rev a1)

For more compatibility you might want to include both the 0300 and 0302 classes.
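
(For illustration, both classes can be covered in one rule using udev's shell-style glob matching; this is an adaptation of the rule quoted above, not a tested rule:)

SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030[02]00", GOTO="powersave_end"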

linrunner added a commit that referenced this issue Jan 13, 2017

Runtime PM: improve Nvidia GPU blacklisting
Rationale: Nvidia dGPU in off state may have no driver associated (in
conjunction with Bumblebee), thus driver blacklisting fails.

Solution:
* Detect Nvidia GPUs by PCIe device vendor (0x10de) and class (0x300/0x302)
* Implicitly use RUNTIME_PM_DRIVER_BLACKLIST contents (nouveau, nvidia)
  to enable the feature

Reference:
* Issue #244: #244
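
(A rough shell sketch of the detection the commit message describes; an illustration of the approach, not the actual TLP code:)

# For every PCI device with no driver bound, keep runtime PM off if it
# is an Nvidia (0x10de) VGA (0x030000) or 3D (0x030200) controller.
for dev in /sys/bus/pci/devices/*; do
    [ -e "$dev/driver" ] && continue
    if [ "$(cat "$dev/vendor")" = "0x10de" ]; then
        case "$(cat "$dev/class")" in
            0x030000|0x030200) echo "blacklisting $dev from runtime PM" ;;
        esac
    fi
done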

@linrunner linrunner self-assigned this Jan 13, 2017

@linrunner linrunner added this to the 1.0 milestone Jan 13, 2017

@linrunner linrunner added the planned label Jan 13, 2017

linrunner (Owner) commented Jan 15, 2017

Please download shiny new packages of version 0.9.901 for Debian/Ubuntu here, test your requested feature and report back. Thank you.

Arch Linux users: use tlp-git and tlp-rdw-git from the AUR.

jkdf2 commented Jan 15, 2017

I admit that I don't have the understanding of RPM that y'all do, but it seems to be working great for me. I purged tlp and tlp-rdw and installed the beta version.

optirun glxgears now works correctly:

  • On AC
  • On battery after disconnecting AC
  • After suspend, on battery

Here's an output of tlp-stat in case it helps with anything:

https://gist.github.com/jkdf2/bdb8262911d369655a55855701181c07

Thanks! 🎆

linrunner (Owner) commented Jan 19, 2017

Fine, thanks for testing.

ion-storm commented Feb 2, 2017

Can confirm the beta fixed compatibility with Bumblebee, thanks for the fix!

SalahAdDin commented May 12, 2017

I can confirm this error as well.
How can I install the beta in Antergos?

SalahAdDin commented May 12, 2017

I applied the provisional solution: https://bbs.archlinux.org/viewtopic.php?id=218757.

linrunner (Owner) commented May 25, 2017

Released with 1.0.
