New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
logind-user: switch tracking logic to user-runtime-dir@.service #30910
Conversation
7759270
to
9afdaf1
Compare
9afdaf1
to
b371624
Compare
Consider the following situation: 1. A 'background-light' session is created (i.e. no user service manager is needed) 2. The new session scope unit pulls in user-runtime-dir@.service through Want= (weak dep) 3. The session exits. A stop job is enqueued for user-runtime-dir@.service due to StopWhenUnneeded=yes 4. At the same time, another session which requires user manager is started. However, the session scope unit has JobMode=fail, and the start job for user-runtime-dir@.service is deleted as it conflicts with the stop job. As we do require these services, let's use Requires=. Normally, logind should enqueue a start job for user@.service explicitly through user_start(). Yet this cannot happen if the user is already started, and the existing session requires no user manager. This will be fixed later in systemd#30910.
b371624
to
b829e12
Compare
b829e12
to
e4f7542
Compare
6ffb264
to
03a029a
Compare
6979992
to
6291942
Compare
@poettering Any chance you can bless this? TEST-35-LOGIN is currently racy (e.g. https://jenkins-systemd.apps.ocp.cloud.ci.centos.org/job/upstream-centos8/9519/) |
@mrc0mmand Could you test this PR and see if the race is indeed gone? |
so, hmmm unless i am misunderstand this this would have the effect that sessions would not wait for user@.service to fully become up before they are allowed to proceed? but that's a problem. we should make sure that regular session login is delayed until user@.service is fully up, only those of type "manager" should be able to come up before that. This pretty much requires the user@.service service job to be tracked in all cases. I have the suspicion we should have two service fields: one for tracking user@.service jobs and one for tracking user-runtime-dir@.service. And then regular sessions are allowed to complete statr when both jobs are finished, and "manager" sessions are allowed once the latter session type finishes. or am i misreading this PR? |
1d4f38c
to
5fa0356
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks all reasonable. All that login stuff is always hard to get right, but at the conceptual level, this sounds like the right approach.
(!SESSION_CLASS_WANTS_SERVICE_MANAGER(s->class) || !s->user->service_job); | ||
return s->scope_job || | ||
s->user->runtime_dir_job || | ||
(SESSION_CLASS_WANTS_SERVICE_MANAGER(s->class) && s->user->service_manager_job); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to check the class here? If the job is there, we might as well wait for it to finish. (It doesn't matter greatly, but the condition "are there any jobs" is easier than "are the any jobs that meeting the following criteria".)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then manager sessions would deadlock, since they'll start waiting for themselves to finish startup.
5fa0356
to
ac87f2c
Compare
lgtm, thanks for working on this. and sorry for taking so long on the review. |
ac87f2c
to
368f323
Compare
Rebased onto #30226 and added one more commit. @poettering and @mrc0mmand, please take a final look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been hammering this PR with TEST-35 for the past 2+ hours and it hasn't failed once, so LGTM, thank you!
…temd#30226) We don't usually say ", refusing" in bus error messages. Also, make use of unref_and_replace_full.
…ir@.service Before systemd#30884, the user state is tied to user@.service (user service manager). However, systemd#30884 introduced sessions that need no manager, and we can no longer rely on that. Consider the following situation: 1. A 'background-light' session '1' is created (i.e. no user service manager is needed) 2. Session '1' scope unit pulls in user-runtime-dir@.service 3. Session '1' exits. A stop job is enqueued for user-runtime-dir@.service due to StopWhenUnneeded=yes 4. At the same time, another session '2' which requires user manager is started. However, session scope units have JobMode=fail, therefore the start job for user-runtime-dir@.service that was pulled in by session '2' scope job is deleted as it conflicts with the stop job. We want session scope units to continue using JobMode=fail, but we still need the dependencies to be started correctly, i.e. explicitly requested by logind beforehand. Therefore, let's stop using StopWhenUnneeded=yes for user-runtime-dir@.service, and track users' `started` and `stopping` state based on that when user@.service is not needed. Then, for every invocation of user_start(), we'll recheck if we need the service manager and start it if so. Also, the dependency type on user-runtime-dir@.service from user@.service is upgraded to `BindsTo=`, in order to ensure that when logind stops the former, the latter is stopped as well.
Since we do require these basic user services, let's make the dependency stronger. Note that logind should enqueue start jobs for these already in user_start(), so mostly just paranoia.
Follow-up for 4cb4e6c Fixes systemd#31287
368f323
to
0ba2495
Compare
lgtm, thanks! |
The Testing Farm fail is unrelated (it looks like the intermittent QEMU segfaults were replaced by intermittent hangs, lovely). |
Currently, user_start() doesn't return any error, but let's not eat up the return value. Preparation for systemd#30910, after which user_start() does return error.
Also fixes #31287