New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only get session on alternate login #95

Closed
markhindley opened this Issue Nov 12, 2018 · 18 comments

Comments

Projects
None yet
2 participants
@markhindley
Copy link
Contributor

markhindley commented Nov 12, 2018

No description provided.

@markhindley

This comment has been minimized.

Copy link
Contributor Author

markhindley commented Nov 12, 2018

Hi,

I am still debugging my build of v239-stable that we are testing to upload to Debian.

At the moment I see 3 unexpected and maybe related issues:

  • The dmesg log shows elogind reusing session numbers. I expected them to be distinct

[ 68.563706] elogind-daemon[1474]: New seat seat0.
[ 68.574291] elogind-daemon[1474]: Watching system buttons on /dev/input/event3 (Power Button)
[ 68.581228] elogind-daemon[1474]: Watching system buttons on /dev/input/event0 (AT Translated Set 2 keyboard)
[ 241.350382] elogind-daemon[1474]: New session 1 of user test.
[ 395.808014] elogind-daemon[1474]: Removed session 1.
[ 1049.186245] elogind-daemon[1474]: New session 1 of user test.
[ 1267.800447] elogind-daemon[1474]: Removed session 1.
[ 1286.147904] elogind-daemon[1474]: New session 1 of user test.
[ 1351.550361] elogind-daemon[1474]: Removed session 1.
[ 1506.854037] elogind-daemon[1474]: New session 1 of user test.
[ 1864.036307] elogind-daemon[1474]: New session 2 of user test.
[ 1915.200166] elogind-daemon[1474]: Removed session 2.
[ 2541.371118] elogind-daemon[1474]: Removed session 1.
[ 2547.278959] elogind-daemon[1474]: New session 3 of user test.
[ 2960.443731] elogind-daemon[1474]: Removed session 3.
[ 2974.514006] elogind-daemon[1474]: New session 3 of user test.

  • When I login via a DM (slim, but I doubt that is significant) I only get XDG_SESSION_ID set on alternate logins

  • The mapping from pids to sessions is broken such that polkit often complains abut no session for PID

I suspect the last is a symptom of the first.

I am using -Dcgroup-controller=elogind. Built on debian sid but with polkit-1 115 from debian experimental.

Have I made a mistake with the build?

Thanks.

Mark

@Yamakuzure

This comment has been minimized.

Copy link
Collaborator

Yamakuzure commented Nov 12, 2018

The dmesg log shows elogind reusing session numbers. I expected them to be distinct

They should unless a session can be re-used. This is how it looks on my system, after I had some fun logging in/out on TTY2-TTY4:

[Nov12 09:38] elogind-daemon[9007]: New session 4 of user sed.
[Nov12 09:39] elogind-daemon[9007]: New session 5 of user sed.
[ +12,370711] elogind-daemon[9007]: Removed session 4.
[  +5,423703] elogind-daemon[9007]: New session 6 of user sed.
[ +13,573155] elogind-daemon[9007]: Removed session 6.
[ +10,397601] elogind-daemon[9007]: New session 7 of user sed.
[ +11,114175] elogind-daemon[9007]: New session 8 of user sed.
[Nov12 09:40] elogind-daemon[9007]: Removed session 8.
[  +2,236716] elogind-daemon[9007]: Removed session 5.
[  +2,368467] elogind-daemon[9007]: Removed session 7.

When I login via a DM (slim, but I doubt that is significant) I only get XDG_SESSION_ID set on alternate logins

XDG_SESSION_ID is set by pam_elogin.so. I haven't tried out slim, but with SDDM it is always set unless I 'su' to root.

I am currently installing slim, maybe I can find something out ...

The mapping from pids to sessions is broken such that polkit often complains abut no session for PID

Maybe that has something to do with a mistake I made when writing the patch for polkit-1.115 to support elogind. Please check whether the patch from here is applied:
https://gitlab.freedesktop.org/polkit/polkit/issues/29

But I think it is alright, or no sessions would be created at all...

@Yamakuzure

This comment has been minimized.

Copy link
Collaborator

Yamakuzure commented Nov 12, 2018

HA! I just logged into OpenBox using SLiM, and XDG_SESSION_ID is not set, and loginctl returns an empty list. (if I log into Plasma, XDG_SESSION_ID is always set.)

The logs show this:

Nov 12 18:05:46 sed-notebook slim[30715]: pam_unix(slim:session): session closed for user sed
Nov 12 18:06:01 sed-notebook slim[1891]: pam_unix(slim:session): session opened for user sed by (uid=0)
Nov 12 18:06:01 sed-notebook slim[1891]: pam_elogind(slim:session): Cannot create session: Already running in a session or user slice

(The first line is from logging out of Plasma.)

For SDDM the logs show:

Nov 12 09:06:08 sed-notebook kernel: elogind-daemon[11845]: New session 5 of user sddm.
Nov 12 09:06:11 sed-notebook sddm-helper[11930]: pam_unix(sddm:session): session opened for user sed by (uid=0)
Nov 12 09:06:11 sed-notebook kernel: elogind-daemon[11845]: New session 6 of user sed.
Nov 12 09:06:11 sed-notebook sddm-helper[11918]: pam_unix(sddm-greeter:session): session closed for user sddm
Nov 12 09:06:11 sed-notebook kernel: elogind-daemon[11845]: Removed session 5.

When I go comparing to what happens when I use SDDM versus SLiM, a line containing
kernel: elogind-daemon[9007]: Removed session <id>
is missing for the latter. So it looks like the logging off isn't reported to elogind.

But I have to dig deeper...

@markhindley

This comment has been minimized.

Copy link
Contributor Author

markhindley commented Nov 12, 2018

@Yamakuzure

This comment has been minimized.

Copy link
Collaborator

Yamakuzure commented Nov 13, 2018

I think this is an upstream bug of either SLiM or PAM.
If I look into the SLiM sources (namely app.cpp), I can see that it calls PAM on login to create a user session. That's fine, as this will eventually call pam_elogind.so to create an elogind session.

But when things end, SLiM only calls pam.close_session();. Does this only close its own session? Is the closing of the user session first required here?

This is how login-logout-login look with SDDM starting OpenBox:

08:39:27 sddm-helper[20014]: pam_unix(sddm-greeter:session): session opened for user sddm by (uid=0)
08:39:27 kernel: elogind-daemon[9007]: New session 10 of user sddm.
08:39:35 sddm-helper[20026]: pam_unix(sddm:session): session opened for user sed by (uid=0)
08:39:35 kernel: elogind-daemon[9007]: New session 13 of user sed.
08:39:35 sddm-helper[20014]: pam_unix(sddm-greeter:session): session closed for user sddm
08:39:35 kernel: elogind-daemon[9007]: Removed session 10.
(...)
08:39:59 sddm-helper[20026]: pam_unix(sddm:session): session closed for user sed
08:39:59 kernel: elogind-daemon[9007]: Removed session 13.
08:39:59 sddm-helper[20633]: pam_unix(sddm-greeter:session): session opened for user sddm by (uid=0)
08:39:59 kernel: elogind-daemon[9007]: New session 10 of user sddm.
08:40:03 sddm-helper[20690]: pam_unix(sddm:session): session opened for user sed by (uid=0)
08:40:03 kernel: elogind-daemon[9007]: New session 14 of user sed.
08:40:03 sddm-helper[20633]: pam_unix(sddm-greeter:session): session closed for user sddm

And this is how it looks for SLiM, also using OpenBox:

08:37:57 slim[18211]: pam_unix(slim:session): session opened for user sed by (uid=0)
08:37:57 kernel: elogind-daemon[9007]: New session 12 of user sed.
(...)
08:38:24 slim[18211]: pam_unix(slim:session): session closed for user sed
08:38:31 slim[18956]: pam_unix(slim:session): session opened for user sed by sed(uid=0)
08:38:31 slim[18956]: pam_elogind(slim:session): Cannot create session: Already running in a session or user slice
(...)
08:38:44 kernel: elogind-daemon[9007]: Removed session 12.
08:38:59 slim[18956]: pam_unix(slim:session): session closed for user sed

Can you see how it takes 20 seconds (timeout, maybe?) until the closing of the session in SLiM reaches elogind, but is instant with SDDM?

What puzzles me is that the log messages come from pam_unix. So why does it take 20 seconds to close the session with elogind via SLiM, but is instant with SDDM?

The SLiM pam config in /etc/pam.d/slim is quite rudimentary. It delegates everything to system-local-login, which delegates everything to system-login, which includes system-auth, which eventually calls pam_elogind for session inclusion.

The SDDM pam config in /etc/pam.d/sddm-greeter is more thorough, but behind the scene it does basically the same.

@Yamakuzure

This comment has been minimized.

Copy link
Collaborator

Yamakuzure commented Nov 13, 2018

I have found out something else. If I wait for the 20 seconds before logging back into OpenBox with SLiM, everything works as expected, but I get the same session ID:

09:14:17 slim[5104]: pam_unix(slim:session): session opened for user sed by sed(uid=0)
09:14:17 kernel: elogind-daemon[9007]: New session 17 of user sed.
09:14:40 slim[5104]: pam_unix(slim:session): session closed for user sed
09:15:00 kernel: elogind-daemon[9007]: Removed session 17.
09:15:21 slim[5767]: pam_unix(slim:session): session opened for user sed by sed(uid=0)
09:15:21 kernel: elogind-daemon[9007]: New session 17 of user sed.

This is odd, but understandable if we look into the session status as reported by loginctl:

17 - sed (1004)
	   Since: Tue 2018-11-13 09:20:03 CET; 52s ago
	  Leader: 6943 (slim)
	    Seat: seat0; vc7
	 Display: :0.0
	  Remote: user root
	 Service: slim; type x11; class user
	   State: active

As you can see, slim is still there and listed as both "Leader" and "Service", with root being listed as Remote participant.

With SDDM it looks like this in a running session:

18 - sed (1004)
           Since: Tue 2018-11-13 09:22:08 CET; 3min 54s ago
          Leader: 8240 (sddm-helper)
            Seat: seat0; vc7
         Display: :0
         Service: sddm; type x11; class user
         Desktop: KDE
           State: active

Here sddm is the session leader, but sddm-helper has already closed its session. SLiM has not done something like that, as it does not spawn a helper. At least as far as I could see.

When I log out of OpenBox after using SLiM and check the session-status on a TTY within this 20 seconds delay, I get this:

17 - sed (1004)
	   Since: Tue 2018-11-13 09:20:03 CET; 1min 21s ago
	  Leader: 6943
	    Seat: seat0; vc7
	 Display: :0.0
	  Remote: user root
	 Service: slim; type x11; class user
	   State: closing

The session is closing. And the session leader, slim[6943], isn't there any more. SLiM restarts and gets a new PID (or at least re-forks) when you log out of anything.

These exact 20 seconds puzzle me. I will check session handling and finalizing in elogind tonight. Maybe it is something we can fix within elogind. Maybe the session closing in SLiM only needs to be re-organized a bit. WE'll see.

@markhindley

This comment has been minimized.

Copy link
Contributor Author

markhindley commented Nov 13, 2018

@Yamakuzure

This comment has been minimized.

Copy link
Collaborator

Yamakuzure commented Nov 13, 2018

This which works as expected, but I am wondering if we can avoid this and get elogind to work with polkit compiled against libsystemd0?

No, you can't.
systemd stores session information differently than elogind does, with user units and scopes. elogind can not support either and uses internal hash containers.
So when polkit asks libsystemd for a session id to a pid, it will search for a session-<pid>.scope and find nothing.
Please see here:
https://github.com/elogind/elogind/blob/d4a3f29/src/basic/cgroup-util.c#L1713
That's the function that gets called to do the task eventually.

@Yamakuzure

This comment has been minimized.

Copy link
Collaborator

Yamakuzure commented Nov 13, 2018

HA! Maybe I have a solution.

Take a look at this:
systemd/systemd@9afe9ef

The delay was added to speed up re-logins, as the users service manager (private systemd instance) would not be restarted so often thanks to this enhancement.
As elogind does not launch a private systemd instance, that delay is causing more harm than good.

I am currently testing a possible fix. :-)

@Yamakuzure

This comment has been minimized.

Copy link
Collaborator

Yamakuzure commented Nov 14, 2018

Nope, not there, yet.

The delay has dropped from 20 to 11 seconds. This is caused by the user simply not logging out directly. Something is holding that back.

When PAM calls elogind to release the session, it is not stopped, but a 20 seconds timer is started. Once that is over, the session is torn down. This is done like this since 2014, to ensure, that a user is logged out properly. And it does, as the 20 second limit isn't even reached.
But what is holding back the user logout for 11 seconds?

I have made elogind to ignore sessions that are already closing, or scheduled for closing. But then SLiM itself is occupying the old session, so we still won't get a new one.
It is problematic that SLiM does not fork out a helper that identifies itself as a "Greeter" to elogind, like SDDM and lightdm do. Another problem is, that SLiM does this as root, while the others drop privilieges and start their own sessions from which they operate.

So what I will try next, is to look again into the SLiM sources to find out how much work would be involved to teach it to start its own session. This way the restarted slim process (or re-forked) won't occupy the closing session of the user.

@markhindley

This comment has been minimized.

Copy link
Contributor Author

markhindley commented Nov 14, 2018

@Yamakuzure

This comment has been minimized.

Copy link
Collaborator

Yamakuzure commented Nov 14, 2018

@markhindley : I think we are hitting a dead end here.

If I am not much mistaken, SLiM has to be tought to fork a helper process that drops its privileges to a distinct SLiM user, so it can have its own session being created. Then, when the user logs in, that helper simply quits, releasing its session.
Currently SLiM does fork, but stays around until the user logs out. Unfortunately, being the session leader, SLiM occupies the session stalling the session finalization.

That the current SLiM version is from 2013-10-02, and hasn't seen any activity since then, doesn't help either.

Unless someone forks it and takes care of it, I do not see much hope. We could patch it (to death), but maybe it would be better to warn users, that they should not do fast re-logins if they use elogind.
LightDM and SDDM seem to be a better choice here...

systemd spawns a service manager per user, I guess that works around the problem. But I haven't done any research in that direction.

@markhindley

This comment has been minimized.

Copy link
Contributor Author

markhindley commented Nov 14, 2018

@Yamakuzure

This comment has been minimized.

Copy link
Collaborator

Yamakuzure commented Nov 14, 2018

Are you going to persist with your first change to reduce the timeout inherited from systemd?

Absolutely! That timeout is surely useful for systemd, but for elogind it doesn't do anything else than lagging the user logout.

Before pushing that change and closing this issue, I want to test a theory of mine tonight. Currently, at least as far as I remember, there is no check whether the session leader PID (in this case from the SLiM process that just went away) really belongs to a living process.
So my theory is, that if such a check really does not exist, adding it might solve this problem. I guess I have to re-add the user-logout-delay, 1 or 2 seconds should be enough, but in that second nobody manages to re-login via SLiM anyway.

Yamakuzure added a commit that referenced this issue Nov 15, 2018

Prep v239.2: Set user_stop_delay to 0 - it is not needed.
The delay has been added by upstream, because systemd spawns an
individual service manager for users logging in. If a user logs out
of the system and logs right back in, closing and respawning of this
service manager would cause an unneccessary delay.

But elogind does not spawn any service managers, thus we do not need
such a delay and can proceed with the logout immediately.

Bug: #95
Signed-off-by: Sven Eden <sven.eden@prydeworx.com>
@Yamakuzure

This comment has been minimized.

Copy link
Collaborator

Yamakuzure commented Nov 15, 2018

Unfortunately my idea was fruitless.

SLiM does a single fork, keeping itself as session leader around. This can not be patched away easily, at least as far as I can see.

@markhindley

This comment has been minimized.

Copy link
Contributor Author

markhindley commented Nov 15, 2018

@markhindley

This comment has been minimized.

Copy link
Contributor Author

markhindley commented Nov 15, 2018

@Yamakuzure

This comment has been minimized.

Copy link
Collaborator

Yamakuzure commented Nov 15, 2018

Glad that it works! 👍

@Yamakuzure Yamakuzure closed this Nov 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment