core: in --user mode, report READY=1 as soon as basic.target is reached #7102

keszybz · 2017-10-15T21:56:23Z

When a user logs in, systemd-pam will wait for the user manager instance to
report readiness. We don't need to wait for all the jobs to finish, it
is enough if the basic startup is done and the user manager is responsive.

systemd --user will now send out two READY=1 notifications:

once with status "Reached basic.target"
and later the normal one with "Startup finished in ..."
I think this is OK.

Also fixes #2863.

poettering · 2017-10-18T16:31:54Z

src/core/manager.c

+        if (MANAGER_IS_USER(m) &&
+            unit && streq(unit->id, SPECIAL_BASIC_TARGET) &&
+            job_type == JOB_START && job_result == JOB_DONE)
+                manager_notify_ready(m, unit);


hmm, i think it would be nicer to to make this "level triggered" rather than "edge triggered". that way we the function wouldn't have to have the additional arguments. i.e. something like:

unit = manager_get_unit(m, SPECIAL_BASIC_TARGET); if (!unit || !unit->job) manager_notify_ready(m);

and then manager_notify_ready() should probably maintain an internal bool in the Manager object, that sends the READY=1 only once?

poettering · 2017-10-19T07:51:15Z

src/core/manager.h

@@ -254,6 +254,8 @@ struct Manager {

        bool taint_usr:1;

+        bool ready_sent:1;


hmm, i figure we should serialize this bool during daemon reload

(admittedly sending out READY=1 twice isn't too bad, but still)

When a user logs in, systemd-pam will wait for the user manager instance to report readiness. We don't need to wait for all the jobs to finish, it is enough if the basic startup is done and the user manager is responsive. systemd --user will now send out a READY=1 notification when either of two conditions becomes true: - basic.target/start job is gone, - the initial transaction is done. Also fixes systemd#2863.

poettering · 2017-10-19T08:10:37Z

lgtm

nh2 · 2017-10-22T14:27:34Z

@keszybz Could you have a short look at #2863 (comment) regarding this PR?

Especially the bit about

Also, @keszybz do the investigations for your patch linked above explain this, or would they potentially hide this error source if it's a different one?

If there is an issue with differences in behaviour between the fstab generator and .mount units, and this fix would hide the issue, I imagine it would make it more difficult to investigate.

That is not to say I wouldn't value this fix by itself, but I've spent many days to arrive at a well working repro for that difference/bug, and it would be a shame if this would result in breaking the repro before it's clear whether there is a bug or not.

keszybz · 2017-10-23T08:25:21Z

@nh2 I have no idea. nixos is foreign enough that any issues you have that are related to unit enablement is somethething that upstream developers are unlikely to be able to help with.

nh2 · 2017-10-23T13:01:59Z

nixos is foreign enough that any issues you have that are related to unit enablement is somethething that upstream developers are unlikely to be able to help with.

@keszybz How comes? Are you aware of any differences in systemd usage between NixOS and other distributions that would affect unit enablement? The only difference I'm aware of is the means by which the unit files end up on disk before boot; no magic there. And for my case, the issue seems to be straight in /etc/fstab as present on virtually any distribution.

keszybz · 2017-10-23T13:44:01Z

In the comment you link to an issue where you link to .nix files. That's a completely different technology, and systemd upstream can't help you debug that. If you are able to simplify that to something that can be described in systemd terminology (i.e. in terms of systemd units and dependency and ordering relationships) we could help.

the issue seems to be straight in /etc/fstab as present on virtually any distribution.

Not really. Normally there's no direction connection between fstab and the user units.

nh2 · 2017-10-23T14:41:58Z

In the comment you link to an issue where you link to .nix files. That's a completely different technology, and systemd upstream can't help you debug that. If you are able to simplify that to something that can be described in systemd terminology (i.e. in terms of systemd units and dependency and ordering relationships) we could help.

There's no issue with the .nix files themselves, they render out fine, creating a set of systemd unit files and menioned fstab entry, in the same fashion as Ansible, Chef or Puppet might be creating those.

It may be difficult for me to reduce the problem to only unit files, as the problem (at least as far as I have managed to minimise the repro so far) occurs only when a network file system is mounted via an fstab entry that is executed via systemd (when I switch the fstab entry for a .mount unit, the problem disappears).

That means I currently have to provide at least two machines, one that provides the network file system to mount, and one that mounts it and shows the problematic symptoms. The nix-based tooling I've used so far makes it reasonably easy to provide those machines with their exact reproducible disk contents (though I, too, wish that there was a smaller repro for the issue, e.g. if I could make it fail that way inside a single VM image).

There are two ways I could immediately make what I have so far available for inspection by upstream devs that don't involve nix based tooling and only involves unit files:

I could deploy my reproduction to a set of AWS machines and provide access to them (given an SSH pubkey). The issue becomes apparent immetiately after a reboot in journalctl.
I could do the above and provide the AWS virtual machine images publicly.

Would any of this be useful / would you be willing or have the time to take a short look via SSH at my reproduction system to judge whether your fix in this issue might hide that problem?

I'm aware this may be asking a lot, especially if working on systemd isn't your day job, but it may be one of the faster ways to ensure #2863 is really fixed for good (and I'd also be happy to make myself available in a video call or real time chat to minimise the time it would take).

Normally there's no direction connection between fstab and the user units.

Agreed, which is why I suspect that there's a bug somewhere in there. Of course there's still the chance that I'm using something the wrong way, but like you I see no way how my fstab entry could result in a pam_systemd(login:session) error.

poettering · 2017-10-24T07:44:10Z

hmm, any chance you can rebase this PR on current git and force push? The CI was broken with git until just now, and this was fixed a few moments ago with another commit. To make sure we don't break it right-away again, it would be great if you could rebase this PR and force push so that the CI runs over this again?

keszybz · 2017-10-24T08:31:45Z

Rebuilding.

languitar · 2017-11-26T14:01:37Z

When will this end up in a release? Neither this PR nor #2863 seem to be assigned to a milestone.

poettering · 2017-11-26T17:09:44Z

$ git describe 0c2826c60c508a0300f9978ae3818d6f4c00d53a
v235-84-g0c2826c60

Or in other words, this has been merged after v235, and hence will be included in v236 as soon as that's released (which is not too far off, given that only 4 issues are left that are marked for v236:
https://github.com/systemd/systemd/milestones/v236)

…ed (systemd#7102) When a user logs in, systemd-pam will wait for the user manager instance to report readiness. We don't need to wait for all the jobs to finish, it is enough if the basic startup is done and the user manager is responsive. systemd --user will now send out a READY=1 notification when either of two conditions becomes true: - basic.target/start job is gone, - the initial transaction is done. Also fixes systemd#2863. (cherry picked from commit 0c2826c) [fbui: adjust context]

keszybz added pid1 user-session labels Oct 15, 2017

poettering requested changes Oct 18, 2017

View reviewed changes

keszybz force-pushed the start-user-mode-quickly branch from 83f4781 to 2f15bc8 Compare October 18, 2017 19:59

poettering requested changes Oct 19, 2017

View reviewed changes

keszybz force-pushed the start-user-mode-quickly branch from 2f15bc8 to 80909b4 Compare October 19, 2017 08:03

poettering approved these changes Oct 19, 2017

View reviewed changes

poettering added the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Oct 19, 2017

poettering merged commit 0c2826c into systemd:master Oct 24, 2017

keszybz deleted the start-user-mode-quickly branch October 25, 2017 08:36

devurandom mentioned this pull request Mar 5, 2018

SSH login delayed by 25 seconds with pam_systemd(sshd:session): Failed to create session: Connection timed out NixOS/nixpkgs#30348

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

core: in --user mode, report READY=1 as soon as basic.target is reached #7102

core: in --user mode, report READY=1 as soon as basic.target is reached #7102

keszybz commented Oct 15, 2017

poettering Oct 18, 2017

keszybz Oct 18, 2017

poettering Oct 19, 2017

poettering Oct 19, 2017

keszybz Oct 19, 2017

poettering commented Oct 19, 2017

nh2 commented Oct 22, 2017

keszybz commented Oct 23, 2017

nh2 commented Oct 23, 2017

keszybz commented Oct 23, 2017

nh2 commented Oct 23, 2017

poettering commented Oct 24, 2017

keszybz commented Oct 24, 2017

languitar commented Nov 26, 2017

poettering commented Nov 26, 2017

		@@ -254,6 +254,8 @@ struct Manager {

		bool taint_usr:1;

		bool ready_sent:1;

core: in --user mode, report READY=1 as soon as basic.target is reached #7102

core: in --user mode, report READY=1 as soon as basic.target is reached #7102

Conversation

keszybz commented Oct 15, 2017

poettering Oct 18, 2017

Choose a reason for hiding this comment

keszybz Oct 18, 2017

Choose a reason for hiding this comment

poettering Oct 19, 2017

Choose a reason for hiding this comment

poettering Oct 19, 2017

Choose a reason for hiding this comment

keszybz Oct 19, 2017

Choose a reason for hiding this comment

poettering commented Oct 19, 2017

nh2 commented Oct 22, 2017

keszybz commented Oct 23, 2017

nh2 commented Oct 23, 2017

keszybz commented Oct 23, 2017

nh2 commented Oct 23, 2017

poettering commented Oct 24, 2017

keszybz commented Oct 24, 2017

languitar commented Nov 26, 2017

poettering commented Nov 26, 2017