Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: in --user mode, report READY=1 as soon as basic.target is reached #7102

Merged
merged 1 commit into from Oct 24, 2017

Conversation

4 participants
@keszybz
Copy link
Member

commented Oct 15, 2017

When a user logs in, systemd-pam will wait for the user manager instance to
report readiness. We don't need to wait for all the jobs to finish, it
is enough if the basic startup is done and the user manager is responsive.

systemd --user will now send out two READY=1 notifications:

  • once with status "Reached basic.target"
  • and later the normal one with "Startup finished in ..."
    I think this is OK.

Also fixes #2863.

if (MANAGER_IS_USER(m) &&
unit && streq(unit->id, SPECIAL_BASIC_TARGET) &&
job_type == JOB_START && job_result == JOB_DONE)
manager_notify_ready(m, unit);

This comment has been minimized.

Copy link
@poettering

poettering Oct 18, 2017

Member

hmm, i think it would be nicer to to make this "level triggered" rather than "edge triggered". that way we the function wouldn't have to have the additional arguments. i.e. something like:

unit = manager_get_unit(m, SPECIAL_BASIC_TARGET);
if (!unit || !unit->job)
        manager_notify_ready(m);

and then manager_notify_ready() should probably maintain an internal bool in the Manager object, that sends the READY=1 only once?

This comment has been minimized.

Copy link
@keszybz

keszybz Oct 18, 2017

Author Member

Updated.

@keszybz keszybz force-pushed the keszybz:start-user-mode-quickly branch from 83f4781 to 2f15bc8 Oct 18, 2017

@@ -254,6 +254,8 @@ struct Manager {

bool taint_usr:1;

bool ready_sent:1;

This comment has been minimized.

Copy link
@poettering

poettering Oct 19, 2017

Member

hmm, i figure we should serialize this bool during daemon reload

This comment has been minimized.

Copy link
@poettering

poettering Oct 19, 2017

Member

(admittedly sending out READY=1 twice isn't too bad, but still)

This comment has been minimized.

Copy link
@keszybz

keszybz Oct 19, 2017

Author Member

Updated.

core: in --user mode, report READY=1 as soon as basic.target is reached
When a user logs in, systemd-pam will wait for the user manager instance to
report readiness. We don't need to wait for all the jobs to finish, it
is enough if the basic startup is done and the user manager is responsive.

systemd --user will now send out a READY=1 notification when either of two
conditions becomes true:
- basic.target/start job is gone,
- the initial transaction is done.

Also fixes #2863.

@keszybz keszybz force-pushed the keszybz:start-user-mode-quickly branch from 2f15bc8 to 80909b4 Oct 19, 2017

@poettering

This comment has been minimized.

Copy link
Member

commented Oct 19, 2017

lgtm

@nh2

This comment has been minimized.

Copy link
Contributor

commented Oct 22, 2017

@keszybz Could you have a short look at #2863 (comment) regarding this PR?

Especially the bit about

Also, @keszybz do the investigations for your patch linked above explain this, or would they potentially hide this error source if it's a different one?

If there is an issue with differences in behaviour between the fstab generator and .mount units, and this fix would hide the issue, I imagine it would make it more difficult to investigate.

That is not to say I wouldn't value this fix by itself, but I've spent many days to arrive at a well working repro for that difference/bug, and it would be a shame if this would result in breaking the repro before it's clear whether there is a bug or not.

@keszybz

This comment has been minimized.

Copy link
Member Author

commented Oct 23, 2017

@nh2 I have no idea. nixos is foreign enough that any issues you have that are related to unit enablement is somethething that upstream developers are unlikely to be able to help with.

@nh2

This comment has been minimized.

Copy link
Contributor

commented Oct 23, 2017

nixos is foreign enough that any issues you have that are related to unit enablement is somethething that upstream developers are unlikely to be able to help with.

@keszybz How comes? Are you aware of any differences in systemd usage between NixOS and other distributions that would affect unit enablement? The only difference I'm aware of is the means by which the unit files end up on disk before boot; no magic there. And for my case, the issue seems to be straight in /etc/fstab as present on virtually any distribution.

@keszybz

This comment has been minimized.

Copy link
Member Author

commented Oct 23, 2017

In the comment you link to an issue where you link to .nix files. That's a completely different technology, and systemd upstream can't help you debug that. If you are able to simplify that to something that can be described in systemd terminology (i.e. in terms of systemd units and dependency and ordering relationships) we could help.

the issue seems to be straight in /etc/fstab as present on virtually any distribution.

Not really. Normally there's no direction connection between fstab and the user units.

@nh2

This comment has been minimized.

Copy link
Contributor

commented Oct 23, 2017

In the comment you link to an issue where you link to .nix files. That's a completely different technology, and systemd upstream can't help you debug that. If you are able to simplify that to something that can be described in systemd terminology (i.e. in terms of systemd units and dependency and ordering relationships) we could help.

There's no issue with the .nix files themselves, they render out fine, creating a set of systemd unit files and menioned fstab entry, in the same fashion as Ansible, Chef or Puppet might be creating those.

It may be difficult for me to reduce the problem to only unit files, as the problem (at least as far as I have managed to minimise the repro so far) occurs only when a network file system is mounted via an fstab entry that is executed via systemd (when I switch the fstab entry for a .mount unit, the problem disappears).

That means I currently have to provide at least two machines, one that provides the network file system to mount, and one that mounts it and shows the problematic symptoms. The nix-based tooling I've used so far makes it reasonably easy to provide those machines with their exact reproducible disk contents (though I, too, wish that there was a smaller repro for the issue, e.g. if I could make it fail that way inside a single VM image).

There are two ways I could immediately make what I have so far available for inspection by upstream devs that don't involve nix based tooling and only involves unit files:

  • I could deploy my reproduction to a set of AWS machines and provide access to them (given an SSH pubkey). The issue becomes apparent immetiately after a reboot in journalctl.
  • I could do the above and provide the AWS virtual machine images publicly.

Would any of this be useful / would you be willing or have the time to take a short look via SSH at my reproduction system to judge whether your fix in this issue might hide that problem?

I'm aware this may be asking a lot, especially if working on systemd isn't your day job, but it may be one of the faster ways to ensure #2863 is really fixed for good (and I'd also be happy to make myself available in a video call or real time chat to minimise the time it would take).

Normally there's no direction connection between fstab and the user units.

Agreed, which is why I suspect that there's a bug somewhere in there. Of course there's still the chance that I'm using something the wrong way, but like you I see no way how my fstab entry could result in a pam_systemd(login:session) error.

@poettering

This comment has been minimized.

Copy link
Member

commented Oct 24, 2017

hmm, any chance you can rebase this PR on current git and force push? The CI was broken with git until just now, and this was fixed a few moments ago with another commit. To make sure we don't break it right-away again, it would be great if you could rebase this PR and force push so that the CI runs over this again?

@keszybz

This comment has been minimized.

Copy link
Member Author

commented Oct 24, 2017

Rebuilding.

@poettering poettering merged commit 0c2826c into systemd:master Oct 24, 2017

4 checks passed

semaphoreci The build passed on Semaphore.
Details
xenial-amd64 autopkgtest finished (success)
Details
xenial-i386 autopkgtest finished (success)
Details
xenial-s390x autopkgtest finished (success)
Details

@keszybz keszybz deleted the keszybz:start-user-mode-quickly branch Oct 25, 2017

@languitar

This comment has been minimized.

Copy link

commented Nov 26, 2017

When will this end up in a release? Neither this PR nor #2863 seem to be assigned to a milestone.

@poettering

This comment has been minimized.

Copy link
Member

commented Nov 26, 2017

$ git describe 0c2826c60c508a0300f9978ae3818d6f4c00d53a
v235-84-g0c2826c60

Or in other words, this has been merged after v235, and hence will be included in v236 as soon as that's released (which is not too far off, given that only 4 issues are left that are marked for v236:
https://github.com/systemd/systemd/milestones/v236)

Werkov pushed a commit to Werkov/systemd that referenced this pull request Nov 27, 2018

core: in --user mode, report READY=1 as soon as basic.target is reach…
…ed (systemd#7102)

When a user logs in, systemd-pam will wait for the user manager instance to
report readiness. We don't need to wait for all the jobs to finish, it
is enough if the basic startup is done and the user manager is responsive.

systemd --user will now send out a READY=1 notification when either of two
conditions becomes true:
- basic.target/start job is gone,
- the initial transaction is done.

Also fixes systemd#2863.

(cherry picked from commit 0c2826c)

[fbui: adjust context]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.