DRM devices opened by logind stay referenced indefinitely by PID 1 #6908

Closed
kokoko3k opened this Issue Sep 25, 2017 · 7 comments

Submission type

  • Bug report

systemd version the issue has been seen with

234.11

Used distribution

Archlinux

Hi, I'm trying to set up Nvidia "Reverse PRIME" with a GTX 750 + Intel HD 4000.
Displays are attached to the Intel connectors.
What I've done so far:
My xorg.conf:

Section "Module"
	Load "modesetting"
EndSection

Section "Device"
	Identifier "nvidia"
	Driver "nvidia"
	#BusID ""
	Option "AllowEmptyConfiguration"
EndSection

.xinitrc:

export DISPLAY=:0.0 
xrandr --setprovideroutputsource modesetting NVIDIA-0
xrandr --auto
openbox --replace &
setxkbmap it
xset mouse 1.4 4
exec sudo -u koko lxterminal &
p1=$!
wait $p1

The system starts with Intel as the primary adapter; the nvidia modules are NOT autoloaded.
So I do:
modprobe nvidia-drm
modprobe nvidia_modeset
modprobe nvidia

All is good within the X server and works as expected, but what seems wrong to me is that when I exit the Xorg session, nvidia_drm remains in use.

lsmod | grep nvidia
nvidia_drm             45056  1
nvidia_modeset        798720  1 nvidia_drm
nvidia              11476992  1 nvidia_modeset
drm_kms_helper        126976  2 i915,nvidia_drm
drm                   299008  7 i915,ttm,nvidia_drm,drm_kms_helper

It took me a while to understand, but there was an open handle to /dev/dri/card1 (nvidia):

# lsof -n | grep "/dev/dri/card1"
systemd      1           root   19u      CHR              226,1      0t0            21785 /dev/dri/card1

It prevents me from unloading nvidia_drm, and I need to do that to pass the GPU to a libvirt domain.

Manually attaching gdb to systemd and closing the handle makes it work, but I cannot rely on that (systemd crashes badly that way).

Note that the number of open handles I am speaking about increases by one on every Xorg restart.
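
For anyone who wants to watch the handle count without lsof, here is a rough sketch that counts how many fds of a process point at a given path by reading the symlinks in /proc/PID/fd. The function name count_open_handles is made up for illustration (it is not a systemd tool), it assumes a Linux /proc, and looking at PID 1 needs root.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd): count how many open fds of
# process $1 resolve to path $2, by reading the symlinks in /proc/PID/fd.
count_open_handles() {
    pid=$1
    target=$2
    count=0
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink to whatever the fd refers to.
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example (as root): count_open_handles 1 /dev/dri/card1
```

If the behaviour described above holds, the number printed for PID 1 should grow by one after each Xorg restart.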

Who is to blame? nvidia driver? Modesetting driver?

As pointed out by Aaron Plattner, stopping and starting systemd-logind "frees" the handle.

Is this expected? Any way to avoid?

Thanks

@Nightbane112 Nightbane112 referenced this issue in Witko/nvidia-xrun Sep 26, 2017

Closed

Unable to unload nvidia_drm #32

Owner

poettering commented Sep 27, 2017

logind manages device access for graphical display services, so that they lose access to input and DRM when the user switches away from them using Alt-Fn. For that it keeps the devices open. However, there's a bug lurking. In order to make logind restartable without losing all display managers, logind nowadays pushes open device fds into PID 1 while running, so that they stay referenced. However, they are currently never removed there except if the devices physically go away (i.e. when POLLERR or POLLHUP is seen). I figure this needs to be fixed, and logind needs to ask PID 1 to release devices before that.
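
A small sketch for observing the fd store described above from the outside. It assumes a systemd recent enough to expose the NFileDescriptorStore property on service units (the 234 release from the report may not), and the helper name fdstore_count is purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: pull the number out of a "NFileDescriptorStore=N"
# line as printed by `systemctl show`. Reads from stdin.
fdstore_count() {
    sed -n 's/^NFileDescriptorStore=//p'
}

# Usage on a live system (assumes the property is available):
#   systemctl show systemd-logind.service -p NFileDescriptorStore | fdstore_count
```

With the bug present, one would expect this count to keep growing as sessions come and go, instead of dropping back when a device is released.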

@poettering poettering changed the title from nvidia_drm remains in use for no apparent reason after Xorg shutdown to DRM devices opened by logind stay referenced indefinitely by PID 1 Sep 27, 2017

Member

vcaputo commented Sep 27, 2017

@poettering There's something quite annoying I've observed but not bothered to investigate yet on my debian 9 (v232) system which may be related:

  1. Xorg + minimal window manager running via startx from shell on VC1, no graphical login
  2. Switch to VC2 via Ctrl-Alt-F2, log in as a different user from VC1
  3. start Xorg on VC2 via startx, another minimal window manager arrangement
  4. Do some stuff in VC2, quit the window manager exiting Xorg back to the shell in VC2
  5. Send EOF to bash via Ctrl-D, logging out of the shell on VC2
  6. Unexpectedly, a VC switch back to VC1 appears to occur, which is still running Xorg, and this causes Xorg to immediately exit due to some DRM errors (drm master contention problem?)

The VC1 Xorg exit doesn't manifest if I leave VC2 at the shell, skipping step #5, and switch via Alt-F1; everything works just fine and dandy. I don't understand why VC2's shell exiting didn't simply return me to a VC2 login prompt; I don't necessarily want to switch back to VC1 immediately.

Does this sound related/familiar/expected? Should I file a separate bug?

Atavic commented Sep 27, 2017

That's another issue IMHO

poettering added a commit to poettering/systemd that referenced this issue Nov 13, 2017

logind: use the new FDSTOREREMOVE=1 sd_notify() message
Let's explicitly tell PID 1 that we don't need an fd anymore, instead of
relying exclusively on POLLERR/POLLHUP for it to be removed.

Fixes: #6908
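
As a rough illustration of what the commit message describes: per sd_notify(3), FDSTOREREMOVE=1 is combined with FDNAME= to name the stored fds that the service manager should drop. The sketch below only prints such a message; the helper name and the fd name used in the example are placeholders, not what logind actually sends.

```shell
#!/bin/sh
# Hypothetical helper: print the notification payload that asks the service
# manager to drop the fds stored under the given name.
fdstore_remove_payload() {
    printf 'FDSTOREREMOVE=1\nFDNAME=%s\n' "$1"
}

# Example with a placeholder name:
#   fdstore_remove_payload example-device
```

In a C service, a string of this shape would be handed to sd_notify(); the point of the change is that removal becomes explicit rather than waiting for POLLERR/POLLHUP on the stored fd.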
Owner

poettering commented Nov 13, 2017

Fix waiting in #7316

Owner

poettering commented Nov 13, 2017

Some testing that #7316 actually does what it is supposed to do would be very welcome, as I lack the hardware/drivers this was originally reported about. Thank you.

@poettering poettering added the has-pr label Nov 13, 2017

Just tested, and now the module is freed correctly, thank you!

@kokoko3k kokoko3k closed this Nov 14, 2017

Owner

poettering commented Nov 14, 2017

Please leave this open until the PR is merged. In fact, GitHub will close this issue automatically as soon as #7316 is merged.

@poettering poettering reopened this Nov 14, 2017

@keszybz keszybz closed this in #7316 Nov 28, 2017
