
login: allow re-using users #1402

Closed
wants to merge 6 commits into from

Conversation

dvdhrm
Contributor

@dvdhrm dvdhrm commented Sep 29, 2015

A bunch of commits to clean up the User object in systemd-logind and eventually allow calling user_start() on a user that is already stopping. This is required to allow new logins while the user is currently shutting down.

Note that this series does not fully fix the issue. However, I cannot see how the series makes anything worse, so I think we can safely apply it. The issue that remains is:

Sep 29 11:55:03 david-t2 systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
Sep 29 11:55:03 david-t2 systemd-logind[734]: Failed to start user slice: Transaction is destructive.
Sep 29 11:55:03 david-t2 su[1980]: pam_systemd(su-l:session): Failed to create session: Resource deadlock avoided

That is, if the user service is still shutting down, queuing "START" for the slice (which is still running, but queued for removal) causes this transaction error. I could make systemd-logind wait for the queued jobs to finish before re-queuing the START jobs. However, I really cannot understand why pid1 cannot figure this out on its own. This should be pid1's burden to resolve, not the user's, or what am I missing?

Ideas?

David Herrmann added 6 commits September 28, 2015 13:00
Let's not pretend we support changing XDG_RUNTIME_DIR via logind state
files. There is no reason to ever write the string into /run, as we
allocate it statically based on the UID anyway. Let's stop that and just
allocate the runtime_path in "struct User" at all times.

We keep writing it into the /run state to make sure pam_systemd of
previous installs can still read it. However, pam_systemd is now fixed to
allocate it statically as well, so we can safely remove that some time in
the future.

Last but not least: if software depends on systemd, it is more than free
to assume /run/user/$uid is its runtime dir. Let's not require sane
applications to query the environment to get their runtime dir. As long as
applications know their login UID, they can safely deduce the
runtime dir.

Currently, we allocate user->slice when starting a slice, but we never
release it. This is incompatible with re-using a user object once it has
been stopped. Hence, make sure user->slice is allocated statically on the
user object, and use "u->started || u->stopping" as an indication of
whether the slice is actually available in pid1 or not.

A few changes to user_new() and user_free():
 - use _cleanup_(user_freep) in the constructor
 - return 'int' from user_new()
 - make user_free() deal with partially initialized objects
 - keep reverse order in user_free() compared to user_new()
 - make user_free() return NULL
 - make user_free() accept NULL as a no-op

Just like user->slice, there is no reason to store the unit name in /run,
nor should we allocate it dynamically on job instantiation/removal. Just
keep it around statically at all times and rely on "user->started ||
user->stopping" to figure out whether the unit exists or not.

Make sure to put static fields together in "struct User". This makes it
easier to figure out the lifetime of each field.

If the last reference to a user is released, we queue stop jobs for the
user service and slice. Only once those have finished do we drop the
user object. However, if a new session is opened before the user object is
fully dropped, we currently incorrectly re-use the object. This has the
effect that we get stale sessions without a valid "systemd --user"
instance.

Fix this by properly allowing user_start() to be called, even if
user->stopping is true.
@zonque zonque added the login label Sep 29, 2015
@poettering
Member

Regarding the "transaction is destructive" issue: it should be enough to change manager_start_unit() to pass "replace" rather than "fail" as the mode argument. "fail" means that enqueuing the job should fail rather than kick out other jobs, which is what you see here; "replace" means that the old job should be replaced if necessary.

@dvdhrm
Contributor Author

dvdhrm commented Oct 7, 2015

With "replace" I get:

Oct 07 13:14:48 david-t2 su[682]: pam_systemd(su-l:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'

@dvdhrm
Contributor Author

dvdhrm commented Oct 7, 2015

Oh, that might be expected… oops.

@dvdhrm
Contributor Author

dvdhrm commented Oct 8, 2015

Ok, so a heads-up to anyone interested in this: the problem is that we get a "canceled" JobRemoved event back if we replace the job. This then triggers a cancellation of the CreateSession() call. What we need to do instead is remember the job we queued and ignore any JobRemoved events for jobs we are not listening for. Right now, we react to all job events for the UNIT we work with.

@heftig
Contributor

heftig commented Oct 22, 2015

I'd like to see this one on the 228 milestone, but I understand if there's not enough time before the release.

@heftig
Contributor

heftig commented Oct 22, 2015

@dvdhrm Could the solution be as simple as this?

diff --git a/src/login/logind-dbus.c b/src/login/logind-dbus.c
index aeedf68..ee83f53 100644
--- a/src/login/logind-dbus.c
+++ b/src/login/logind-dbus.c
@@ -2600,10 +2600,8 @@ int match_job_removed(sd_bus_message *message, void *userdata, sd_bus_error *err
         }

         session = hashmap_get(m->session_units, unit);
-        if (session) {
-
-                if (streq_ptr(path, session->scope_job))
-                        session->scope_job = mfree(session->scope_job);
+        if (session && streq_ptr(path, session->scope_job)) {
+                session->scope_job = mfree(session->scope_job);

                 session_jobs_reply(session, unit, result);

@@ -2614,12 +2612,12 @@ int match_job_removed(sd_bus_message *message, void *userdata, sd_bus_error *err

         user = hashmap_get(m->user_units, unit);
         if (user) {
-
                 if (streq_ptr(path, user->service_job))
                         user->service_job = mfree(user->service_job);
-
-                if (streq_ptr(path, user->slice_job))
+                else if (streq_ptr(path, user->slice_job))
                         user->slice_job = mfree(user->slice_job);
+                else
+                        return 0;

                 LIST_FOREACH(sessions_by_user, session, user->sessions)
                         session_jobs_reply(session, unit, result);

@heftig
Contributor

heftig commented Oct 22, 2015

Well, it at least looks good to me. I rebased this patchset onto master, applied my patch, and I can't reproduce the issue anymore; not that it happened all that often before…

@dvdhrm
Contributor Author

dvdhrm commented Oct 23, 2015

Did you test this based on this branch, or on master? Your snippet does not fix the issue entirely on its own, but it does help on top of this branch.

However, your snippet breaks on pid1 reloading (which emits JobRemoved regardless of whether the job was actually removed). I'm not entirely sure what to do there, @poettering? I mean, what happens if you reload/re-execute pid1 while there are queued jobs? Do you get JobRemoved followed by JobNew? What is logind supposed to do? This doesn't make much sense to me and looks broken right now.

@heftig
Contributor

heftig commented Oct 23, 2015

@dvdhrm As I said, I rebased this PR onto master, then applied my patch.

@heftig
Contributor

heftig commented Oct 28, 2015

Isn't the result an improvement over the status quo? Or is the breakage on reload a regression?

This should be a fix for https://bugs.freedesktop.org/show_bug.cgi?id=89145 and https://bugs.archlinux.org/task/46387, which is hurting our users.

@heftig
Contributor

heftig commented Nov 12, 2015

Can this still get into 228?

@dvdhrm
Contributor Author

dvdhrm commented Nov 16, 2015

See #1917.

@dvdhrm dvdhrm closed this Nov 16, 2015