
core: in --user mode, report READY=1 as soon as basic.target is reached #7102

Merged — 1 commit merged into systemd:master on Oct 24, 2017

Conversation

@keszybz (Member) commented Oct 15, 2017

When a user logs in, systemd-pam will wait for the user manager instance to
report readiness. We don't need to wait for all the jobs to finish; it
is enough if the basic startup is done and the user manager is responsive.

systemd --user will now send out two READY=1 notifications:

  • once with status "Reached basic.target"
  • and later the normal one with "Startup finished in ..."

I think this is OK.

Also fixes #2863.

if (MANAGER_IS_USER(m) &&
    unit && streq(unit->id, SPECIAL_BASIC_TARGET) &&
    job_type == JOB_START && job_result == JOB_DONE)
        manager_notify_ready(m, unit);
Member

hmm, i think it would be nicer to make this "level triggered" rather than "edge triggered". that way the function wouldn't have to take the additional arguments. i.e. something like:

unit = manager_get_unit(m, SPECIAL_BASIC_TARGET);
if (!unit || !unit->job)
        manager_notify_ready(m);

and then manager_notify_ready() should probably maintain an internal bool in the Manager object, so that READY=1 is sent only once?
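The suggestion above can be sketched as a tiny model: a level-triggered caller may invoke the notify function any number of times, and an internal flag makes it idempotent. This is a minimal illustration, not systemd's actual code; the struct fields and the notification counter are made up for the example (real code would call sd_notify()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the level-triggered approach:
 * the Manager carries a bool so READY=1 goes out at most once,
 * however often the check fires. */
typedef struct Manager {
        bool ready_sent;
        int notifications_sent;  /* stand-in for sd_notify(0, "READY=1") */
} Manager;

static void manager_notify_ready(Manager *m) {
        if (m->ready_sent)
                return;          /* repeated level-triggered calls end here */

        m->notifications_sent++; /* real code would call sd_notify() here */
        m->ready_sent = true;
}
```

The point of the design is that callers no longer need to know whether they are observing the first transition to "ready"; they can call unconditionally whenever the condition holds.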

Member Author

Updated.

@@ -254,6 +254,8 @@ struct Manager {

bool taint_usr:1;

bool ready_sent:1;
Member

hmm, i figure we should serialize this bool during daemon reload

Member

(admittedly sending out READY=1 twice isn't too bad, but still)
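"Serialize this bool during daemon reload" refers to PID 1's habit of dumping its state as key=value lines before re-exec and parsing them back afterwards, so the flag survives the reload and READY=1 is not sent a second time. The sketch below is illustrative only: the key name and helper functions are invented for the example and differ from systemd's real serialization code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical round-trip for one bool across a daemon reload:
 * write a "ready-sent=yes|no" line, then parse it back. */
static int serialize_ready_sent(char *buf, size_t len, bool ready_sent) {
        /* emit one key=value line into the serialization buffer */
        return snprintf(buf, len, "ready-sent=%s\n", ready_sent ? "yes" : "no");
}

static bool deserialize_ready_sent(const char *line) {
        const char *v = strstr(line, "ready-sent=");
        if (!v)
                return false;  /* key absent: default to "not yet sent" */
        return strncmp(v + strlen("ready-sent="), "yes", 3) == 0;
}
```

Defaulting to false when the key is missing is the safe direction here: the worst case is an extra READY=1, which, as noted above, isn't too bad.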

Member Author

Updated.

When a user logs in, systemd-pam will wait for the user manager instance to
report readiness. We don't need to wait for all the jobs to finish; it
is enough if the basic startup is done and the user manager is responsive.

systemd --user will now send out a READY=1 notification when either of two
conditions becomes true:
- basic.target/start job is gone,
- the initial transaction is done.

Also fixes systemd#2863.
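The two-condition rule in the final commit message can be modeled as a small predicate: notify once either basic.target's start job is gone or no jobs from the initial transaction remain, and never twice. The types and field names below are illustrative stand-ins, not systemd's real Manager/Unit structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the readiness check described above. */
typedef struct Unit { bool has_job; } Unit;

typedef struct Manager {
        Unit *basic_target;     /* NULL if basic.target is not loaded */
        size_t n_running_jobs;  /* jobs left from the initial transaction */
        bool ready_sent;
} Manager;

static bool manager_should_notify_ready(const Manager *m) {
        if (m->ready_sent)
                return false;                    /* already notified once */

        if (!m->basic_target || !m->basic_target->has_job)
                return true;                     /* basic.target/start job is gone */

        return m->n_running_jobs == 0;           /* initial transaction is done */
}
```

Because the check is level-triggered, it can be evaluated from any job-completion path; the ready_sent flag keeps the user-visible behavior to a single READY=1.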
@poettering (Member)

lgtm

@poettering poettering added the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Oct 19, 2017
@nh2 (Contributor) commented Oct 22, 2017

@keszybz Could you have a short look at #2863 (comment) regarding this PR?

Especially the bit about

Also, @keszybz do the investigations for your patch linked above explain this, or would they potentially hide this error source if it's a different one?

If there is an issue with differences in behaviour between the fstab generator and .mount units, and this fix would hide the issue, I imagine it would make it more difficult to investigate.

That is not to say I wouldn't value this fix by itself, but I've spent many days arriving at a well-working repro for that difference/bug, and it would be a shame if this broke the repro before it's clear whether there is a bug or not.

@keszybz (Member Author) commented Oct 23, 2017

@nh2 I have no idea. nixos is foreign enough that any issues related to unit enablement are something that upstream developers are unlikely to be able to help with.

@nh2 (Contributor) commented Oct 23, 2017

nixos is foreign enough that any issues related to unit enablement are something that upstream developers are unlikely to be able to help with.

@keszybz How come? Are you aware of any differences in systemd usage between NixOS and other distributions that would affect unit enablement? The only difference I'm aware of is the means by which the unit files end up on disk before boot; no magic there. And for my case, the issue seems to be straight in /etc/fstab as present on virtually any distribution.

@keszybz (Member Author) commented Oct 23, 2017

In the comment you link to an issue where you link to .nix files. That's a completely different technology, and systemd upstream can't help you debug that. If you are able to simplify that to something that can be described in systemd terminology (i.e. in terms of systemd units and dependency and ordering relationships) we could help.

the issue seems to be straight in /etc/fstab as present on virtually any distribution.

Not really. Normally there's no direct connection between fstab and the user units.

@nh2 (Contributor) commented Oct 23, 2017

In the comment you link to an issue where you link to .nix files. That's a completely different technology, and systemd upstream can't help you debug that. If you are able to simplify that to something that can be described in systemd terminology (i.e. in terms of systemd units and dependency and ordering relationships) we could help.

There's no issue with the .nix files themselves; they render out fine, creating a set of systemd unit files and the mentioned fstab entry, in the same fashion as Ansible, Chef or Puppet might create them.

It may be difficult for me to reduce the problem to only unit files, as the problem (at least as far as I have managed to minimise the repro so far) occurs only when a network file system is mounted via an fstab entry that is executed via systemd (when I switch the fstab entry for a .mount unit, the problem disappears).

That means I currently have to provide at least two machines, one that provides the network file system to mount, and one that mounts it and shows the problematic symptoms. The nix-based tooling I've used so far makes it reasonably easy to provide those machines with their exact reproducible disk contents (though I, too, wish that there was a smaller repro for the issue, e.g. if I could make it fail that way inside a single VM image).

There are two ways I could immediately make what I have so far available for inspection by upstream devs that don't involve nix-based tooling and only involve unit files:

  • I could deploy my reproduction to a set of AWS machines and provide access to them (given an SSH pubkey). The issue becomes apparent immediately after a reboot in journalctl.
  • I could do the above and provide the AWS virtual machine images publicly.

Would any of this be useful / would you be willing or have the time to take a short look via SSH at my reproduction system to judge whether your fix in this issue might hide that problem?

I'm aware this may be asking a lot, especially if working on systemd isn't your day job, but it may be one of the faster ways to ensure #2863 is really fixed for good (and I'd also be happy to make myself available in a video call or real time chat to minimise the time it would take).

Normally there's no direct connection between fstab and the user units.

Agreed, which is why I suspect that there's a bug somewhere in there. Of course there's still the chance that I'm using something the wrong way, but like you I see no way my fstab entry could result in a pam_systemd(login:session) error.

@poettering (Member)

hmm, any chance you can rebase this PR on current git and force push? The CI was broken until just now, and was fixed a few moments ago with another commit. To make sure we don't break it right away again, it would be great if you could rebase this PR and force push so that the CI runs over it again?

@keszybz (Member Author) commented Oct 24, 2017

Rebuilding.

@poettering poettering merged commit 0c2826c into systemd:master Oct 24, 2017
@keszybz keszybz deleted the start-user-mode-quickly branch October 25, 2017 08:36
@languitar

When will this end up in a release? Neither this PR nor #2863 seem to be assigned to a milestone.

@poettering (Member)

$ git describe 0c2826c60c508a0300f9978ae3818d6f4c00d53a
v235-84-g0c2826c60

Or in other words, this has been merged after v235, and hence will be included in v236 as soon as that's released (which is not too far off, given that only 4 issues are left that are marked for v236:
https://github.com/systemd/systemd/milestones/v236)

Werkov pushed a commit to Werkov/systemd that referenced this pull request Nov 27, 2018
…ed (systemd#7102)

When a user logs in, systemd-pam will wait for the user manager instance to
report readiness. We don't need to wait for all the jobs to finish; it
is enough if the basic startup is done and the user manager is responsive.

systemd --user will now send out a READY=1 notification when either of two
conditions becomes true:
- basic.target/start job is gone,
- the initial transaction is done.

Also fixes systemd#2863.

(cherry picked from commit 0c2826c)

[fbui: adjust context]
Werkov pushed a commit to Werkov/systemd that referenced this pull request Jun 3, 2020
…ed (systemd#7102) (same commit message as above; cherry picked from commit 0c2826c) [fbui: adjust context]
Labels: good-to-merge/waiting-for-ci, pid1, user-session
4 participants