
login: allow re-using users #1402

Closed
wants to merge 6 commits into from

Conversation

dvdhrm
Contributor

@dvdhrm dvdhrm commented Sep 29, 2015

A bunch of commits to clean up the User object in systemd-logind and eventually allow calling user_start() on a user that is already stopping. This is required to allow new logins while the user is currently shutting down.

Note that this series does not fully fix the issue. However, I cannot see how the series makes anything worse, so I think we can safely apply it. The issue that remains is:

Sep 29 11:55:03 david-t2 systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
Sep 29 11:55:03 david-t2 systemd-logind[734]: Failed to start user slice: Transaction is destructive.
Sep 29 11:55:03 david-t2 su[1980]: pam_systemd(su-l:session): Failed to create session: Resource deadlock avoided

That is, if the user service is still shutting down, queuing "START" for the slice (which is still running, but queued for removal) causes this transaction error. I could make systemd-logind wait for the queued jobs to finish before re-queuing the START jobs. However, I really cannot understand why pid1 cannot figure this out on its own. This should be pid1's burden to resolve, not the user's, or what am I missing?

Ideas?

David Herrmann added 6 commits September 28, 2015 13:00
Let's not pretend we support changing XDG_RUNTIME_DIR via logind state
files. There is no reason to ever write the string into /run, as we
allocate it statically based on the UID anyway. Let's stop that and just
allocate the runtime_path in "struct User" at all times.

We keep writing it into the /run state to make sure pam_systemd of
previous installs can still read it. However, pam_systemd is now fixed to
allocate it statically as well, so we can safely remove that some time in
the future.

Last but not least: if software depends on systemd, it is more than free
to assume /run/user/$uid is its runtime dir. Let's not require sane
applications to query the environment to get their runtime dir. As long as
applications know their login UID, they can safely deduce the
runtime dir.

Currently, we allocate user->slice when starting a slice, but we never
release it. This is incompatible with re-using a user object once it has
been stopped. Hence, make sure user->slice is allocated statically on the
user object, and use "u->started || u->stopping" as an indication of
whether the slice is actually available in pid1 or not.

A few changes to user_new() and user_free():
 - use _cleanup_(user_freep) in the constructor
 - return 'int' from user_new()
 - make user_free() deal with partially initialized objects
 - keep reverse order in user_free() compared to user_new()
 - make user_free() return NULL
 - make user_free() accept NULL as a no-op

Just like user->slice, there is no reason to store the unit name in /run,
nor should we allocate it dynamically on job instantiation/removal. Just
keep it around statically at all times and rely on "user->started ||
user->stopping" to figure out whether the unit exists or not.

Make sure to put static fields together in "struct User". This makes it
easier to figure out the lifetime of each field.

If the last reference to a user is released, we queue stop jobs for the
user service and slice. Only once those have finished do we drop the
user object. However, if a new session is opened before the user object is
fully dropped, we currently incorrectly re-use the object. This has the
effect that we get stale sessions without a valid "systemd --user"
instance.

Fix this by properly allowing user_start() to be called, even if
user->stopping is true.
@zonque zonque added the login label Sep 29, 2015
@poettering
Member

Regarding the "transaction is destructive" issue: it should be enough to change manager_start_unit() to pass "replace" rather than "fail" as the mode argument. "fail" means that enqueuing the job should fail rather than kick out other jobs, which is what you see here; "replace" means that the old job should be replaced if necessary.

@dvdhrm
Contributor Author

dvdhrm commented Oct 7, 2015

With "replace" I get:

Oct 07 13:14:48 david-t2 su[682]: pam_systemd(su-l:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'

@dvdhrm
Contributor Author

dvdhrm commented Oct 7, 2015

Oh, that might be expected… oops.

@dvdhrm
Contributor Author

dvdhrm commented Oct 8, 2015

Ok, so a heads-up to anyone interested in this: the problem is that we get a "canceled" JobRemoved event back if we replace the job. This then triggers a cancellation of the CreateSession() call. What we need to do instead is remember the job we queued and ignore any JobRemoved events for jobs we are not listening for. Right now, we react to all job events for the UNIT we work with.

@heftig
Contributor

heftig commented Oct 22, 2015

I'd like to see this one on the 228 milestone, but I understand if there's not enough time before the release.

@heftig
Contributor

heftig commented Oct 22, 2015

@dvdhrm Could the solution be as simple as this?

diff --git a/src/login/logind-dbus.c b/src/login/logind-dbus.c
index aeedf68..ee83f53 100644
--- a/src/login/logind-dbus.c
+++ b/src/login/logind-dbus.c
@@ -2600,10 +2600,8 @@ int match_job_removed(sd_bus_message *message, void *userdata, sd_bus_error *err
         }

         session = hashmap_get(m->session_units, unit);
-        if (session) {
-
-                if (streq_ptr(path, session->scope_job))
-                        session->scope_job = mfree(session->scope_job);
+        if (session && streq_ptr(path, session->scope_job)) {
+                session->scope_job = mfree(session->scope_job);

                 session_jobs_reply(session, unit, result);

@@ -2614,12 +2612,12 @@ int match_job_removed(sd_bus_message *message, void *userdata, sd_bus_error *err

         user = hashmap_get(m->user_units, unit);
         if (user) {
-
                 if (streq_ptr(path, user->service_job))
                         user->service_job = mfree(user->service_job);
-
-                if (streq_ptr(path, user->slice_job))
+                else if (streq_ptr(path, user->slice_job))
                         user->slice_job = mfree(user->slice_job);
+                else
+                        return 0;

                 LIST_FOREACH(sessions_by_user, session, user->sessions)
                         session_jobs_reply(session, unit, result);

@heftig
Contributor

heftig commented Oct 22, 2015

Well, it at least looks good to me. I rebased this patchset onto master, applied my patch, and I can't reproduce the issue anymore; not that it happened all that often before…

@dvdhrm
Contributor Author

dvdhrm commented Oct 23, 2015

Did you test this based on this branch, or on master? Your snippet does not fix the issue entirely on its own, but it does help on top of this branch.

However, your snippet breaks on pid1 reloading (which emits JobRemoved regardless of whether the job was actually removed). I'm not entirely sure what to do there, @poettering? I mean, what happens if you reload/re-execute pid1 while there are queued jobs? Do you get JobRemoved followed by JobNew? What is logind supposed to do? This doesn't make much sense to me and looks broken right now.

@heftig
Contributor

heftig commented Oct 23, 2015

@dvdhrm As I said, I rebased this PR onto master, then applied my patch.

@heftig
Contributor

heftig commented Oct 28, 2015

Isn't the result an improvement over the status quo? Or is the breakage on reload a regression?

This should be a fix for https://bugs.freedesktop.org/show_bug.cgi?id=89145 and https://bugs.archlinux.org/task/46387, which is hurting our users.

@heftig
Contributor

heftig commented Nov 12, 2015

Can this still get into 228?

@dvdhrm
Contributor Author

dvdhrm commented Nov 16, 2015

See #1917.

@dvdhrm dvdhrm closed this Nov 16, 2015