Disable dGPU in integrated mode if it doesn't support runtimepm #326

Merged
jackpot51 merged 2 commits into master from missing-runpm on May 3, 2022

crawfxrd
Member

If a device does not support runtime power management, then remove it from the bus when in integrated mode.

Fixes: 1e1599a ("daemon: Always enable GPU power")
Resolves: #325

@crawfxrd crawfxrd requested review from a team March 30, 2022 20:01
@crawfxrd crawfxrd marked this pull request as ready for review March 30, 2022 20:01
jackpot51
jackpot51 previously approved these changes Mar 30, 2022
@n3m0-22 n3m0-22 self-assigned this Apr 8, 2022
Contributor

@n3m0-22 n3m0-22 left a comment

I have tested this on the oryp4 running 21.10:

apt policy system76-power 
system76-power:
  Installed: 1.1.20~1648670246~21.10~77def87
  Candidate: 1.1.20~1648670246~21.10~77def87
  Version table:
 *** 1.1.20~1648670246~21.10~77def87 1002
       1002 http://apt.pop-os.org/staging/missing-runpm impish/main amd64 Packages
        100 /var/lib/dpkg/status
     1.1.20~1648241921~21.10~1e1599a 1001
       1001 http://apt.pop-os.org/release impish/main amd64 Packages

I tested with the 510 and 470 drivers in both integrated and hybrid modes, and the GPU fan always runs at around 4300 rpm.

Referring back to the original issue #325, the output of /sys/bus/pci/devices/0000:01:00.0/power/control is now on in integrated mode and auto in hybrid mode.

Since I was also using #329 to test the 470 issue earlier, I tried it here as well, but the problem was the same.
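
The power/control check described above can also be done programmatically. The following is a minimal sketch, not code from system76-power: the `PowerControl` type and both function names are hypothetical, and it assumes the attribute only ever holds the kernel's documented values `on` and `auto`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Values the kernel accepts in a device's `power/control` attribute.
/// (Hypothetical type, introduced only for this illustration.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum PowerControl {
    /// Runtime power management is disabled; the device stays powered.
    On,
    /// The kernel may runtime-suspend the device when it is idle.
    Auto,
}

/// Parse the contents of a `power/control` file (a trailing newline is allowed).
/// Returns `None` for any value other than `on` or `auto`.
pub fn parse_power_control(raw: &str) -> Option<PowerControl> {
    match raw.trim() {
        "on" => Some(PowerControl::On),
        "auto" => Some(PowerControl::Auto),
        _ => None,
    }
}

/// Read `power/control` for a PCI device directory such as
/// `/sys/bus/pci/devices/0000:01:00.0`.
pub fn read_power_control(device_dir: &Path) -> io::Result<Option<PowerControl>> {
    let raw = fs::read_to_string(device_dir.join("power/control"))?;
    Ok(parse_power_control(&raw))
}
```

With this sketch, `parse_power_control("auto\n")` returns `Some(PowerControl::Auto)`, which corresponds to the hybrid-mode result reported above, and `parse_power_control("on\n")` returns `Some(PowerControl::On)`.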

@crawfxrd
Member Author

crawfxrd commented Apr 8, 2022

Right... I query the loaded driver to get the version, because NVIDIA puts the version in the folder name at /usr/share/doc/nvidia-driver-<VERSION>. So the check for runtimepm will fail when the driver is not loaded.

So I need to fix the runtimepm check to not depend on the driver being loaded.

jackpot51
jackpot51 previously approved these changes Apr 11, 2022
src/graphics.rs Outdated
Comment on lines 294 to 318
let docs: Vec<path::PathBuf> = fs::read_dir("/usr/share/doc")
    .map_err(|e| {
        GraphicsDeviceError::Json(io::Error::new(io::ErrorKind::InvalidData, e.to_string()))
    })?
    .filter_map(Result::ok)
    .map(|f| f.path())
    .filter(|f| f.starts_with("nvidia-driver-"))
    .collect();
Member Author

Surely there's a better way to get the major version to find this folder?
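
One possible way to find the folder and its major version is to match on the directory's file name rather than on the full path. The sketch below is an illustration under stated assumptions, not the code that was merged: the function names and the `DOC_PREFIX` constant are hypothetical, and it assumes the folder suffix begins with the numeric major version (for example `nvidia-driver-510`). Note that `Path::starts_with` compares whole path components, so the filter in the snippet above would probably not match a full path such as `/usr/share/doc/nvidia-driver-510`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Prefix of the driver doc folder, taken from the path quoted in this thread.
const DOC_PREFIX: &str = "nvidia-driver-";

/// Extract the major version from a directory name such as
/// `nvidia-driver-510`. Returns `None` for any other name.
pub fn driver_major_from_dir_name(name: &str) -> Option<u32> {
    let suffix = name.strip_prefix(DOC_PREFIX)?;
    // Assumption: the suffix starts with the major version. Keep only the
    // leading digits so a name with extra text after the number still parses.
    let digits: String = suffix.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// List the major versions of all driver doc folders under `doc_root`,
/// sorted ascending. (Hypothetical helper; the PR searches `/usr/share/doc`.)
pub fn installed_driver_majors(doc_root: &Path) -> io::Result<Vec<u32>> {
    let mut majors: Vec<u32> = fs::read_dir(doc_root)?
        .filter_map(Result::ok)
        .filter_map(|entry| {
            // Compare the final path component as a string, because
            // `Path::starts_with` only matches whole components.
            let name = entry.file_name();
            driver_major_from_dir_name(name.to_str()?)
        })
        .collect();
    majors.sort_unstable();
    Ok(majors)
}
```

Calling `installed_driver_majors(Path::new("/usr/share/doc"))` would then return the major versions found without requiring the driver to be loaded.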

@n3m0-22
Contributor

n3m0-22 commented Apr 12, 2022

On the oryp4 with 21.10, this seems to be working now with the 510 drivers in Integrated mode, but the GPU fan still runs nonstop with the 470 drivers in Integrated mode.

apt policy system76-power 
system76-power:
  Installed: 1.1.20~1649700305~21.10~e5a37b4
  Candidate: 1.1.20~1649700305~21.10~e5a37b4
  Version table:
 *** 1.1.20~1649700305~21.10~e5a37b4 1002
       1002 http://apt.pop-os.org/staging/missing-runpm impish/main amd64 Packages
        100 /var/lib/dpkg/status
     1.1.20~1648241921~21.10~1e1599a 1001
       1001 http://apt.pop-os.org/release impish/main amd64 Packages

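510.54

[Image: GPU fan in integrated mode, driver 510.54, missing-runpm branch]

510.60.02

[Image: GPU fan in integrated mode, driver 510.60.02, missing-runpm branch]

470.86

[Image: GPU fan in integrated mode, driver 470.86, missing-runpm branch]

470.103.01

[Image: GPU fan in integrated mode, driver 470.103.01, missing-runpm branch]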

@crawfxrd
Member Author

What's the journal say?

systemctl -b -u system76-power

@n3m0-22
Contributor

n3m0-22 commented Apr 12, 2022

That just gives me systemctl: invalid option -- 'b'.

@crawfxrd
Member Author

Sorry, journalctl, not systemctl.

@n3m0-22
Contributor

n3m0-22 commented Apr 12, 2022

No worries. Here you go.

journalctl -b -u system76-power
-- Journal begins at Sun 2022-03-27 10:22:36 MDT, ends at Tue 2022-04-12 08:00:53 MDT. --
Apr 11 21:59:17 ws-op4-l systemd[1]: Starting System76 Power Daemon...
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] Starting daemon
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] Connecting to dbus system bus
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] Rescanning PCI bus
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] 0000:01:00.0: NVIDIA graphics
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] 0000:01:00.0: Function for 0000:01:00.0
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] 0000:01:00.1: Function for 0000:01:00.0
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] 0000:00:02.0: Intel graphics
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] 0000:00:02.0: Function for 0000:00:02.0
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] Disabling NMI Watchdog (for kernel debu>
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] Setting automatic graphics power
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] Device 0x7073 features: ["dpycbcr420", >
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] Disabling graphics power
Apr 11 21:59:17 ws-op4-l system76-power[873]: [INFO] snd_hda_intel: Unbinding 0000:01:00.1
Apr 11 21:59:18 ws-op4-l system76-power[873]: [INFO] 0000:01:00.0: Removing
Apr 11 21:59:18 ws-op4-l system76-power[873]: [INFO] 0000:01:00.1: Removing
Apr 11 21:59:18 ws-op4-l system76-power[873]: [INFO] Initializing with the balanced profile
Apr 11 21:59:18 ws-op4-l system76-power[873]: setting powersave with max 4100000
Apr 11 21:59:18 ws-op4-l system76-power[873]: [INFO] Registering dbus name com.system76.Powe>
Apr 11 21:59:18 ws-op4-l systemd[1]: Started System76 Power Daemon.
Apr 11 21:59:18 ws-op4-l system76-power[873]: [INFO] Adding dbus path /com/system76/PowerDae>
Apr 11 21:59:18 ws-op4-l system76-power[873]: [ERROR] fan daemon: platform hwmon not found
Apr 11 21:59:18 ws-op4-l system76-power[873]: [INFO] Handling dbus requests
Apr 11 21:59:18 ws-op4-l system76-power[873]: [ERROR] hid_backlight: no system76_acpi::kbd_b>
Apr 11 22:01:42 ws-op4-l system76-power[873]: [INFO] DBUS Received GetSwitchable() method
Apr 11 22:01:42 ws-op4-l system76-power[873]: [INFO] DBUS Received GetGraphics() method
Apr 11 22:01:42 ws-op4-l system76-power[873]: [INFO] DBUS Received GetProfile() method
Apr 12 06:27:19 ws-op4-l system76-power[873]: [INFO] DBUS Received GetSwitchable() method
Apr 12 06:27:19 ws-op4-l system76-power[873]: [INFO] DBUS Received GetGraphics() method
Apr 12 06:27:19 ws-op4-l system76-power[873]: [INFO] DBUS Received GetProfile() method

@Brikaa

Brikaa commented Apr 15, 2022

I have a question: since integrated mode blacklists the NVIDIA drivers and modules, leaving the device driverless, how will devices that support runtimepm manage their power in integrated mode? And why isn't removing the device from the PCI bus the default behavior for all cards in integrated mode?

@crawfxrd
Member Author

Removing the device from the bus was the previous default. But if a device supports RTD3 then it shouldn't be required. The kernel will suspend the device when it's not used.
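
The policy described in this comment and in the PR description can be sketched as a small decision function. This is only an illustration of the stated behavior, not the daemon's actual code: the `GraphicsMode` and `DgpuAction` types and the `dgpu_action` function are hypothetical names, and the mode list is taken from the test checklist in this thread.

```rust
/// Graphics modes named in this PR's test checklist.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum GraphicsMode {
    Integrated,
    Hybrid,
    Nvidia,
    Compute,
}

/// What to do with the discrete GPU (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum DgpuAction {
    /// The mode uses the dGPU, so it stays on the bus.
    KeepOnBus,
    /// Integrated mode, device supports runtime PM (RTD3):
    /// leave it on the bus and let the kernel suspend it when unused.
    RelyOnRuntimePm,
    /// Integrated mode, no runtime PM support: remove it from the bus,
    /// which was the previous default for all devices.
    RemoveFromBus,
}

/// Decide how to treat the dGPU for a given mode and runtime PM capability.
pub fn dgpu_action(mode: GraphicsMode, supports_runtime_pm: bool) -> DgpuAction {
    match (mode, supports_runtime_pm) {
        (GraphicsMode::Integrated, true) => DgpuAction::RelyOnRuntimePm,
        (GraphicsMode::Integrated, false) => DgpuAction::RemoveFromBus,
        _ => DgpuAction::KeepOnBus,
    }
}
```

Under this sketch, a device without runtime PM support in integrated mode maps to `DgpuAction::RemoveFromBus`, which matches the "Removing" lines in the journal output above.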

@Brikaa

Brikaa commented Apr 15, 2022

So the drivers don't need to be loaded in order for the kernel to suspend the device?

@Brikaa

Brikaa commented Apr 15, 2022

I mean, isn't power management the driver's responsibility? And in this case, it is blacklisted.

@n3m0-22
Contributor

n3m0-22 commented May 1, 2022

@crawfxrd In testing 22.04 on the oryp4, this problem still persists. I came up with a temporary solution that might help fix this. Running system76-power graphics power off stopped the fan issue, so I made the following:

[Unit]
Description=Stop GPU fan from running non-stop in Integrated Graphics mode

[Service]
ExecStartPre=/bin/sleep 30
ExecStart=/bin/bash /usr/bin/gpu-fan-fix.sh

[Install]
WantedBy=multi-user.target

#!/bin/bash

if [[ $(system76-power graphics) = 'integrated' ]]; then
    if [[ $(system76-power graphics power) = 'on (discrete)' ]]; then
        system76-power graphics power off
    fi
fi

If a device does not support runtime power management, then remove it
from the bus when in integrated mode.

Fixes: 1e1599a ("daemon: Always enable GPU power")
Signed-off-by: Tim Crawford <tcrawford@system76.com>

Store the device ID at PCI enumeration so it does not need to be fetched
from sysfs again later.

Search /usr/share/doc for the NVIDIA driver folder instead of getting
the version of the loaded driver.

Allows getting the dGPU features when the driver is unloaded and the PCI
device has been removed from the bus.

Signed-off-by: Tim Crawford <tcrawford@system76.com>
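
As a rough illustration of the second commit's idea, the device ID can be parsed once when the device is enumerated and kept alongside its address, so it is still available after the device has been removed from the bus. This is a sketch under assumptions, not the merged code: `EnumeratedGpu`, `parse_pci_id`, and `record_gpu` are hypothetical names, and it assumes the sysfs `device` attribute holds a `0x`-prefixed hexadecimal ID followed by a newline.

```rust
/// A discrete GPU as recorded during PCI enumeration.
/// (Hypothetical type, introduced only for this illustration.)
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct EnumeratedGpu {
    /// PCI address, for example `0000:01:00.0`.
    pub address: String,
    /// Device ID captured at enumeration time.
    pub device_id: u16,
}

/// Parse a sysfs-style ID string such as `0x1f91\n` into a `u16`.
/// Returns `None` if the `0x` prefix is missing or the digits are invalid.
pub fn parse_pci_id(raw: &str) -> Option<u16> {
    let hex = raw.trim().strip_prefix("0x")?;
    u16::from_str_radix(hex, 16).ok()
}

/// Build the record from the address and the raw `device` attribute contents.
pub fn record_gpu(address: &str, raw_device_id: &str) -> Option<EnumeratedGpu> {
    Some(EnumeratedGpu {
        address: address.to_string(),
        device_id: parse_pci_id(raw_device_id)?,
    })
}
```

The stored `device_id` can later be used to look up the dGPU features without reading sysfs again.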
@n3m0-22 n3m0-22 self-requested a review May 2, 2022 22:34
Contributor

@n3m0-22 n3m0-22 left a comment

This fixes the fan issues in Integrated mode.

CLI

Tasks which test the behavior of the CLI client.

  • Power profiles can be queried and set
  • Laptop with switchable graphics:
    • Switchable graphics can be queried
      • Command returns daemon returned an error message: "The name com.system76.PowerDaemon was not provided by any .service files" on a non-switchable system
      • Command returns switchable on a laptop with switchable graphics
    • Switching from Integrated to NVIDIA
    • Switching from Integrated to Hybrid
    • Switching from Integrated to Compute
    • Switching from NVIDIA to Integrated
    • Switching from NVIDIA to Hybrid
    • Switching from NVIDIA to Compute
    • Switching from Hybrid to Integrated
    • Switching from Hybrid to NVIDIA
    • Switching from Hybrid to Compute
    • Switching from Compute to Integrated
    • Switching from Compute to NVIDIA
    • Switching from Compute to Hybrid
    • Discrete graphics power state can be queried and set

GNOME Shell

Tasks which test the behavior of the shell extension.

  • Test that the power profile can be switched, and that the dots are correct
  • Test that any power profile change from the CLI is reflected in the extension
  • When switching to balanced, with screen brightness maxed, screen brightness drops to 50%
  • When switching to battery, with screen brightness maxed, screen brightness drops to 10%
  • When switching to balanced, with screen brightness minimized, screen brightness does not change
  • When restarting the daemon, and the daemon defaults to a balanced profile, the brightness should not change
  • When restarting the system, screen brightness should be the same as before
  • Laptop with switchable graphics:
    • Switching from Integrated to NVIDIA
    • Switching from Integrated to Hybrid
    • Switching from Integrated to Compute
    • Switching from NVIDIA to Integrated
    • Switching from NVIDIA to Hybrid
    • Switching from NVIDIA to Compute
    • Switching from Hybrid to Integrated
    • Switching from Hybrid to NVIDIA
    • Switching from Hybrid to Compute
    • Switching from Compute to Integrated
    • Switching from Compute to NVIDIA
    • Switching from Compute to Hybrid
    • Test that switchable graphics changes from the CLI are reflected in the extension
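
Both checklists above cover every ordered pair of the four graphics modes, which gives 12 transitions. A small sketch that enumerates the same matrix, for anyone scripting these checks (the `Mode` enum, `MODES` constant, and `transitions` function are hypothetical names, not part of system76-power):

```rust
/// The four graphics modes named in the checklists.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Mode {
    Integrated,
    Nvidia,
    Hybrid,
    Compute,
}

/// All modes, in the order the checklists use.
pub const MODES: [Mode; 4] = [Mode::Integrated, Mode::Nvidia, Mode::Hybrid, Mode::Compute];

/// Every ordered (from, to) pair of distinct modes.
pub fn transitions() -> Vec<(Mode, Mode)> {
    let mut pairs = Vec::new();
    for &from in MODES.iter() {
        for &to in MODES.iter() {
            // Switching to the mode already in use is not a transition.
            if from != to {
                pairs.push((from, to));
            }
        }
    }
    pairs
}
```

The result starts with Integrated to NVIDIA and ends with Compute to Hybrid, in the same order as the lists above.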

@crawfxrd
Member Author

crawfxrd commented May 2, 2022

@n3m0-22 did it actually work? I didn't change anything, just rebased. Do you have the workaround still applied?

@n3m0-22
Contributor

n3m0-22 commented May 2, 2022

Yes, I completely removed the service and script I created, then applied this. It had worked before on 21.10. This weekend was the first time I had a chance to fully test 22.04 on the oryp4. Since this was behind the current version, I made the workaround as a temporary fix. I'm running the same tests now on #336. So far that's working too, and the check that returned daemon returned an error message: "The name com.system76.PowerDaemon was not provided by any .service files" on a non-switchable system showed not switchable this time. I'll post my findings there as soon as I'm finished.

@crawfxrd
Member Author

crawfxrd commented May 2, 2022

If this works then #336 isn't needed.

@n3m0-22
Contributor

n3m0-22 commented May 2, 2022

Either way, I tested both, and both solve the fan issue on the oryp4. The one difference is the CLI error on a non-switchable system with this one.

@n3m0-22
Contributor

n3m0-22 commented May 2, 2022

The graphs I posted on the other PR match the change in GPU fan behavior I see with this PR.

@jackpot51 jackpot51 merged commit 98e7ad8 into master May 3, 2022
@jackpot51 jackpot51 deleted the missing-runpm branch May 3, 2022 19:17
Development

Successfully merging this pull request may close these issues.

GPU fan runs nonstop after update to version 1.1.20~1648241921~21.10~1e1599a
4 participants