Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make logind restartable #5600

Merged
merged 5 commits into from
Jun 24, 2017
Merged

Conversation

fbuihuu
Copy link
Contributor

@fbuihuu fbuihuu commented Mar 16, 2017

Please double check as I'm really not confident with this area.

It seems to work as expected here with http://cgit.freedesktop.org/xorg/xserver/commit/?id=dc48bd653c7e1013e2d69e3f59ae3cbc0c893473 reverted.

Thanks.

@fbuihuu fbuihuu added the login label Mar 16, 2017
Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable. This is a complicated area, but the changes look correct (module some minor nitpicks). We should merge and then test carefully.


fprintf(f, "DEVICES=");
HASHMAP_FOREACH(sd, s->devices, i)
fprintf(f, "%lu ", (unsigned long)sd->dev);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use DEV_FMT and drop the cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Mar 23, 2017

Yeah, I would feel more confident if it could be reviewed by more people. Maybe @dvdhrm could have a look as he wrote the session device stuff...

Copy link
Contributor

@dvdhrm dvdhrm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you re-open the file-descriptors after restart, they will be a different file-context than before. Hence, the revocation logic will not work. I don't see how this is supposed to work with your patch.

I recommend using the FD-Store feature of pid1 to store your FDs, just like the journal does.

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Mar 24, 2017

If you re-open the file-descriptors after restart, they will be a different file-context than before. Hence, the revocation logic will not work.

Hmm, the file descriptors of the (session) devices are supposed to be kept open by the xserver. And in this case I don't why also keeping a second references to those fds by sending the fds to PIDs would help when restarting logind.

Which data of the file-context are definitively lost after re-opening the devices ?

I don't see how this is supposed to work with your patch.

Well it did here: I did a quick test by reverting the commit made in the xserver side I mentionned earlier and switched to different ttys and get back to the tty used by the xserver. But I've probably missed something...

@dvdhrm
Copy link
Contributor

dvdhrm commented Mar 25, 2017

If you re-open the file-descriptors after restart, they will be a different file-context than before. Hence, the revocation logic will not work.
Hmm, the file descriptors of the (session) devices are supposed to be kept open by the xserver. And in this case I don't why also keeping a second references to those fds by sending the fds to PIDs would help when restarting logind.

Which data of the file-context are definitively lost after re-opening the devices ?

In case of input-devices, you need a share file-description (sic) so EVIOCREVOKE works. If you do not share the file-description, you cannot revoke the FDs.

I don't see how this is supposed to work with your patch.
Well it did here: I did a quick test by reverting the commit made in the xserver side I mentionned earlier and switched to different ttys and get back to the tty used by the xserver. But I've probably missed something...

Sure it works, but it no longer correctly revokes FD access. That is, multiple sessions might now have device access in parallel.

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments

assert(s);

if (devices)
FOREACH_WORD(word, l, devices, state) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't use FOREARCH_WORD in new code anymore, use extract_first_word().

if (devices)
FOREACH_WORD(word, l, devices, state) {
SessionDevice *sd;
dev_t dev = strtoul(word, NULL, 10);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't use "strtoul()", use something like safe_atou32() or so (or even better, add a parse_dev() or so, which does this check properly.

Also, don't mix variable declarations and function invocations, see CODING_STYLE.

r = session_device_new(s, dev, &sd);
if (r < 0)
log_warning("Failed to restore device [%u:%u]: %m",
major(dev), minor(dev));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed here already by others: the only way this can work correctly is if we keep the fds open across the restart. There's infrastructure for that in place now using the fdstore stuff. It's pretty easy to use even too.

@poettering poettering added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Mar 30, 2017
@fbuihuu
Copy link
Contributor Author

fbuihuu commented Apr 3, 2017

In case of input-devices, you need a share file-description (sic) so EVIOCREVOKE works. If you do not share the file-description, you cannot revoke the FDs.

oops, I missed that part. I'll rework the patch accordingly, thanks for the inputs.

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Apr 20, 2017

I'm back to this matter, sorry for the delay.

Now I'm wondering how we're supposed to handle systemctl stop because in this case PID1 will drop the saved fds and when logind will be started again it won't have any way to retrieve the old fds and later revoke them if needed.

Could anybody shed some light in this case ?

@keszybz keszybz dismissed their stale review April 20, 2017 19:18

needs a different approach

@keszybz
Copy link
Member

keszybz commented Apr 20, 2017

how we're supposed to handle systemctl stop

File descriptor storage only applies to restarts. A full stop would not be supported, and I don't think there's any reason to. dc48bd653c7e1013e2d69e3f59ae3cbc0c893473 would have to be updated to accept restarts of systemd-logind.

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Apr 21, 2017

File descriptor storage only applies to restarts. A full stop would not be supported, and I don't think there's any reason to.

That would let a session controller having non revoked fds forever which may be a security issue.

@keszybz
Copy link
Member

keszybz commented Apr 21, 2017

Indeed. Does that mean that logind should revoke all fds on shutdown? I'd think so.

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Apr 21, 2017

yes but that would let the unexpected cases (where logind crashes or killed) for example still uncovered...

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Apr 21, 2017

and also stopping logind would lead to a frozen GUI, I'm not sure users will appreciate.

@keszybz
Copy link
Member

keszybz commented Apr 21, 2017

yes but that would let the unexpected cases (where logind crashes or killed) for example still uncovered...

systemd-logind.service has Restart=always, so the new instance will get the old fds from pid1 and manage them. So this is not an issue.

and also stopping logind would lead to a frozen GUI, I'm not sure users will appreciate.

Xorg has the patch to detect logind going away. It should be updated to ignore restarts of systemd-logind, and fail if it goes away.

@keszybz
Copy link
Member

keszybz commented Apr 21, 2017

... and anyway, I don't think this is such a big issue. If logind dies and cannot be restarted, then the graphical login manager will be hosed and users will not be able to log in anyway. So I don't think this is a concern in practice.

@poettering
Copy link
Member

Now I'm wondering how we're supposed to handle systemctl stop because in this case PID1 will drop the saved fds and when logind will be started again it won't have any way to retrieve the old fds and later revoke them if needed.

It's a good thing we drop them in this case. if you explicitly stop logind, you want it to go away, hence yes, we should drop all our references to those resources...

@poettering
Copy link
Member

That would let a session controller having non revoked fds forever which may be a security issue.

I would disagree that this is a security issue. Stopping logind while a client is still running means "logind, please stop managing" devices. And that's entirely OK, as doing this is privileged, and is not part of any common flow of control.

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Apr 25, 2017

Stopping logind while a client is still running means "logind, please stop managing" devices.

Indeed but if logind is started again later, the sysadmin probably will expect that the devices will be automagically managed again.

@poettering
Copy link
Member

Indeed but if logind is started again later, the sysadmin probably will expect that the devices will be automagically managed again.

Well, I think the admin might be happy if it would, but I am not convinced all admins would expect that if they pull something from the middle of the stack and stick it back in that the top of the stack remains unaffected...

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Apr 26, 2017

I am not convinced all admins would expect that...

I think a lot of them would. At least a big fat warning when logind exits (and if possible when logind is started again later) would be nice, not sure if it's doable though.

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Apr 26, 2017

There's another issue: when logind is restarted all saved fds passed back by PID1 are subject to flags_fds() which drops/sets O_NONBLOCK.

It has a bad side effect to make the X server failing to detect further events from the devices.

A possible workaround would be to add to system-logind.service unit file "NonBlocking=yes" but that doesn't seem correct.

I'm not sure to understand why this option has been added to service units and not to socket ones but it seems that we could at least limit flags_fds() to operate on sockets only.

WDYT ?

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Apr 28, 2017

Ok I've pushed a new version.

It's still not ready for merging as there're still open questions (see my previous comment and FIXMEs in the content of the patch) but at least it allows to make sure that the implementation is getting to the right direction.

Thanks !

@poettering
Copy link
Member

There's another issue: when logind is restarted all saved fds passed back by PID1 are subject to flags_fds() which drops/sets O_NONBLOCK.

Uh, I figure we should stop setting that, as this indeed might confuse other processes which have access to the fd, after all O_NONBLOCK is not specific to the file handler itself, but the open file.

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only half a review, needed to get going, just wanted to post this aready

@@ -367,9 +359,39 @@ int session_device_new(Session *s, dev_t dev, SessionDevice **out) {
r = hashmap_put(s->devices, &sd->dev, sd);
if (r < 0) {
r = -ENOMEM;
free(sd->node);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks really strange, you need to reset sd->node to NULL here, otherwise it will point into invalidated memory? also, why reset that here anyway? I am pretty sure functions should try to rollback only the changes they made themselves on failure, but this function never initializes node

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it looks strange. But this was already the case, the commit only moved the code around. I think we could revisit this later.

goto error;
}

LIST_PREPEND(sd_by_device, sd->device->session_devices, sd);
error:
return r;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there's only a single "return r" left in the goto block, I think it's much nicer to just return directly rather than involve goto

static void session_device_deinit(SessionDevice *sd) {
LIST_REMOVE(sd_by_device, sd->device->session_devices, sd);
hashmap_remove(sd->session->devices, &sd->dev);
free(sd->node);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am really not feeling good about functions that "half-destroy" something but don't invalidate what they destroyed. Either functions should fully destroy things, or they should at least invalidate what they destroyed, i.e. "free(sd->node)" should be "sd->node = mfree(sd->node)", and so on. so that the function becomes idempotent.

@@ -1170,7 +1170,7 @@ static int on_bus_track(sd_bus_track *track, void *userdata) {
return 0;
}

int session_set_controller(Session *s, const char *sender, bool force) {
int session_set_controller(Session *s, const char *sender, bool force, bool prepare) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big fan of functions taking multiple bools (usually a flags param is nicer than as the invocations become more descriptive), but I figure as long as its just two its OK

#elif SIZEOF_DEV_T == 4
uint32_t l;
r = safe_atou32(s, &l);
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am pretty sure we should serialize and deserializue dev_t as major:minor, as dev_t isn't well-defined, different on all archs. I mean, it might happen that the reader is i386 but the parse is x86-64 even on the same machine, and dev_t differs, hence it's always better to do this properly I think.

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here is the rest of the review...

@@ -32,6 +32,7 @@ RestrictAddressFamilies=AF_UNIX AF_NETLINK AF_INET AF_INET6
SystemCallFilter=~@clock @cpu-emulation @debug @keyring @module @obsolete @raw-io @reboot @swap
SystemCallArchitectures=native
FileDescriptorStoreMax=512
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned I think we should drop all O_NONBLOCK manipulation for stored fds, and simply use them as is. Given that we only care about POLLHUP on them, it doesn't really matter whether the flag is set for us or not, we cannot really end up blocking on it by accident.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means getting rid of "NonBlocking=", right ?

if (sd->active && r == -EINVAL) {
sd->active = false;
r = session_device_open(sd, false);
if (open) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do not overload the "open" symbol like this. after all, if we'd remove the bool argument "open" of this function the code would still compile, as libc exposes the open() function...

* Note: for devices supporting revocation, PID1 will drop the
* passed fds automatically as soon as the corresponding device
* are revoked. */
r = asprintf(&state, "FDSTORE=1\nFDNAME=%s", sd->session->id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that it matters, but I think it would be nice to write this as:

r = asprintf(&state, "FDSTORE=1\n"
                     "FDNAME=…

i.e. actually line break the string where we have the \n in it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, could you please prefix the FDNAME with something? just to be safe, in case we later on want to push other stuff to PID 1 too, and we should better not collide with the naming here. Hence, please use FDNAME=session-%s rather than just FDNAME=%s or so

* are revoked. */
r = asprintf(&state, "FDSTORE=1\nFDNAME=%s", sd->session->id);
if (r < 0)
return r;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah, this needs to be return -ENOMEM. asprintf() after all returns -1 on error

log_warning("Failed to attach fd for unknown session: %s", fdnames[i]);
r = -EINVAL;
/* FIXME: should we close the fd ? */
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please be defnesive here. please log about where we can't match up the fdnames list with the session list, but continue. And close all fds we can't find matching sessions for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, why do you return -EINVAL in the updated version? I think a session going away is a normal occurrence, and should not result in an error.

Also, the grammar in the comment is a bit off. Maybe "If the session doesn't exist any more, the associated session device attached to this fd doesn't either. Let's simply close this fd.".


k = fstat(fd, &st);
if (k < 0) {
r = k;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i figure you mean "r = -errno" here...


sd = hashmap_get(s->devices, &st.st_rdev);
if (!sd) {
r = -ENOENT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above, let's be defensive here: match up what we can match up. log about the rest, but continue gracefully

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The colon before %s doesn't make much sense.

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the later review. Close, but there's one real bug lurking!

@@ -589,3 +589,13 @@ int parse_ip_port(const char *s, uint16_t *ret) {

return 0;
}

int parse_dev(const char *s, dev_t *ret) {
unsigned major, minor;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

major() and minor() are macros defined in sysmacros.h, the same place makedev() is defined. I think it's better not to name variables like that and avoid any collision. Maybe just name them "a"/"b" or "x"/"y"...

if (sscanf(s, "%u:%u", &major, &minor) != 2)
return -EINVAL;

*ret = makedev(major, minor);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while we are at it, I think it would be nice to be stricter here on the range of the two parts we accept. Unfortunately libc doesn't expose information about the valid range, but an OK check could be to immediately split major/minor out again, and compare it with the original values, to ensure the full information is retained and doesn't get mangled up. i.e., something like this:

unsigned x, y;
dev_t d;
…
if (sscanf(s, "%u:%u", &x, &y) != 2)
        return -EINVAL;

d = makedev(x, y);
if ((unsigned) major(d) != x || (unsigned) minor(d) != y)
        return -EINVAL;

*ret = d;
…

i.e. i want that parse_dev("4294967295:4294967295", &ret) results in a parse failure. With your current code this would parse successfully, but ret would contain nonsense data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(btw, a test case for this parser would be very welcome)

if (!startswith(fdnames[i], "session-"))
continue;

id = fdnames[i] + 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that it matters, but the three lines above could be rewritten in a slightly less opaque way as:

id = startswith(fdnames[i], "session-"));
if (!id)
        continue;

i.e. taking benefit of the fact that startswith() returns a pointer to the first character after the determined prefix

major(st.st_rdev), minor(st.st_rdev), s->id);
close_nointr(fd);
r = -ENOENT;
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are still modifying "r" here, and you shouldn't, I don#t think we should ever propagate an error if we fail to match up... Log yes, but not fail

session_device_attach_fd(sd, fd, s->was_active);
}

return r;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't actually need "r" at all, do you? You could just remove it and "return 0" here...

Defaults to false.</para></listitem>
<listitem><para>Set the <constant>O_NONBLOCK</constant> flag for all file descriptors passed via socket-based
activation. If true, all file descriptors >= 3 (i.e. all except stdin, stdout, stderr and those not passed via
socket-based activation) will have the <constant>O_NONBLOCK</constant> flag set and hence are in non-blocking
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"except … not" is double negation, that's just confusing.

Moreover , the parens are placed incorrectly, after all the "those not passed via socket-based activation" part actually applies to the "all file descriptors >= 3", too.

Hence, I'd maybe rewrite this as:

"… If true, all file descriptors >= 3 (i.e. all except stdin, stdout, stderr), excluding those passed in via the file descriptor storage logic (see FileDescriptorStoreMax= for details), will have …"

That way the parens are placed correctly, and we avoid the double negation

@@ -247,6 +247,7 @@ struct ExecParameters {
int *fds;
char **fd_names;
unsigned n_fds;
unsigned n_sock_fds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i really don't like the abbreviation "n_sock_fds" instead of "n_socket_fds" I must say (really, abbreviating 2 characters away?). Also, I don't like the redundancy of this, i.e. that n_fds includes n_sock_fds.

I think these should be two independent counters:

unsigned n_socket_fds;
unsigned n_storage_fds;

And that this shoul be passed everywhere like this. And if code wants to iterate through all of them it should explicitly add n_socket_fds + n_storage_fds. That way the redundancy goes away as each fd is counted at one place only. (The array for storing the fds can stay together)

hope this makes sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it does, I'll do that in a separate commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it's easier to review.

Make sure to only apply the O_NONBLOCK flag to the fds passed via socket
activation.

Previously the flag was also applied to the fds which came from the fd store
but this was incorrect since services, after being restarted, expect that these
passed fds have their flags unchanged and can be reused as before.

The documentation was a bit unclear about this so clarify it.
@fbuihuu
Copy link
Contributor Author

fbuihuu commented Jun 8, 2017

Sorry for the later review. Close, but there's one real bug lurking!

I'm not sure what was the bug but I agree with all your points.

I'll push a new version soon.

Thanks for the review.

…ameters struct

'n_fds' field in the ExecParameters structure was counting the total number of
file descriptors to be passed to a unit.

This counter also includes the number of passed socket fds which is counted by
'n_socket_fds' already.

This patch removes that redundancy by replacing 'n_fds' with
'n_storage_fds'. The new field only counts the fds passed via the storage store
mechanism.  That way each fd is counted at one place only.

Subsequently the patch makes sure to fix code that used 'n_fds' and also wanted
to iterate through all of them by explicitly adding 'n_socket_fds' + 'n_storage_fds'.

Suggested by Lennart.
… is restarted

When assigning a new session controller to a session, the VT is prepared so the
controller can expect the VT to be in a good default state.

However when logind is restarted and a session controller already took control
of a session, there's no need to prepare th VT otherwise logind may screw up
the VT state set by the controller.

This patch prevents the preparation of the VT in this case.
…ptors

This patch ensures that session devices are saved for each session.

In order to make the revokation logic work when logind is restarted, the
session devices are now saved in the session state files and their respective
file descriptors sent to PID1's fdstore in order to keep them open accross
restart.

This is mandatory in order to keep the revokation logic working. Indeed in case
of input-devices, the same file descriptors must be shared by logind and a
given session controller in order EVIOCREVOKE to work otherwise multiple
sessions can have device access in parallel.

This should be the only remaining and missing piece for making logind fully
restartable.

Fixes: systemd#1163
@fbuihuu fbuihuu removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Jun 8, 2017
@fbuihuu
Copy link
Contributor Author

fbuihuu commented Jun 8, 2017

@poettering new version force pushed.

@keszybz
Copy link
Member

keszybz commented Jun 23, 2017

Hm, this does not seem to work unfortunately. I tried restarting systemd-logind from a graphical session, and gnome-shell seems to hang somehow (the graphics still work, but input is hosed). I'm not sure if this is a fault of logind or something else in the stack.

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Jun 23, 2017

Sigh... it looks like it will never make it.

I dont see any issue here on Tumbleweed with either Gnome or LXDE and with http://cgit.freedesktop.org/xorg/xserver/commit/?id=dc48bd653c7e1013e2d69e3f59ae3cbc0c893473 reverted.

Could you describe your setup more accurately ? any chance you can share it ?

@keszybz
Copy link
Member

keszybz commented Jun 24, 2017

I didn't mean to imply that I expect you to debug it. I'll try to do it myself, I just didn't have time to do it properly. I think we'll get it, at least I want that to happen.

Anyway, my setup is a pretty standard Fedora 26 (originally installed as F24 and then upgraded once or twice, so not entirely pristine), gnome-shell with wayland.

Do you know if there's the equivalent of https://cgit.freedesktop.org/xorg/xserver/commit/?id=dc48bd653c7e1013e2d69e3f59ae3cbc0c893473 in gnome-wayland? If yes, this would need to be reverted too of course.

@keszybz
Copy link
Member

keszybz commented Jun 24, 2017

@whot, @jwrdegoede do you know of gnome-wayland has something equivalent to https://cgit.freedesktop.org/xorg/xserver/commit/?id=dc48bd653c7e1013e2d69e3f59ae3cbc0c893473? Any ideas why this PR might not work as expected?

@keszybz
Copy link
Member

keszybz commented Jun 24, 2017

So… I tested restarting systemd-logind with this patch set, without gnome-wayland running (just a gdm), and it's fine. Then I tested it with a sway session, and it also is fine.

Then I had the hypothesis that gnome-wayland crashes because Xwayland (with dc48bd653c7e10), so I built a Xorg with dc48bd653c7e10 reverted. This makes no difference, the crash was the same as before.

But now I had a local copy of Xorg with dc48bd653c7e10 reverted, so I could test this patchset with gnome-shell on Xorg. gnome-shell on Xorg survives a systemd-logind restart just dandy with this patch.

So it's only gnome-wayland that has issues. I assume it's reacting to some signal, but after browsing the gnome-shell sources, I don't see how that's happening. Anyway, this seems something that requires changes on the gnome-shell side, so this PR should not be blocked.

@whot, @jwrdegoede ideas?

@fbuihuu can you submit your patch to revert dc48bd653c7e10 upstream? (It does not revert cleanly, and I'm rather uncertain if the hack job I did locally is correct.)

(Also, @fbuihuu, I loved your commit messages. It's a pleasure to read something so clear.)

@keszybz keszybz merged commit 7e86713 into systemd:master Jun 24, 2017
@jwrdegoede
Copy link
Contributor

jwrdegoede commented Jun 25, 2017 via email

@whot
Copy link
Contributor

whot commented Jun 26, 2017

@keszybz: Regarding XWayland, the dc48bd commit doesn't affect it at all. XWayland doesn't handle anything logind-related, that's only for the Xorg server (i.e. the real xserver).

cc-ing @jadahl and @garnacho,those two will know where the code is for handling logind. mutter has the meta-backend-native and I don't find anything handling restarts there so I guess that could be it.

@jadahl
Copy link
Contributor

jadahl commented Jun 26, 2017

The logind facing code in mutter is in src/backends/native/meta-launcher.c. I suppose it needs a change similar to the Xserver one posted above.

@fbuihuu fbuihuu deleted the make-logind-restartable branch June 26, 2017 13:34
@fbuihuu
Copy link
Contributor Author

fbuihuu commented Jun 26, 2017

@fbuihuu can you submit your patch to revert dc48bd653c7e10 upstream? (It does not revert cleanly, and I'm rather uncertain if the hack job I did locally is correct.)

@keszybz , shouldn't we wait for the release of v234 (at least) before submitting the revert ?

@keszybz
Copy link
Member

keszybz commented Jun 30, 2017

I think you can submit it… v234 should be released any day now.

@mbiebl
Copy link
Contributor

mbiebl commented Aug 26, 2017

@fbuihuu now that v234 has been released, was this patch submitted upstream?
I can't find the revert in https://cgit.freedesktop.org/xorg/xserver/

@fbuihuu
Copy link
Contributor Author

fbuihuu commented Aug 29, 2017 via email

@michalsrb
Copy link

michalsrb commented Sep 8, 2017

Hi, I've been referred to this thread by @fbuihuu.

The dc48bd653c7e10 commit in X server detects the termination/restart of logind by listening for "org.freedesktop.DBus" "NameOwnerChanged" signal, which is IMHO a bit of a hack. To detect only termination and not restart, maybe it could listen for "PropertiesChanged" on the object of the logind unit. Then it could differentiate between termination and restart.

But the whole thing seems a bit backwards. Why should X server track this dependency manually using dbus. The whole task of managing dependencies between services is systemd's job. If the X server depends on the logind service, then it would be best to have it somehow defined in the unit files. Systemd should then terminate X server when logind terminates (but not when it restarts).

@whot
Copy link
Contributor

whot commented Sep 8, 2017

If the X server depends on the logind service, then it would be best to have it somehow defined in the unit files.

The X server doesn't have a unit file, it's a process usually started by the display manager. And that one doesn't necessarily know whether the server uses logind.

@michalsrb
Copy link

Ok, so X server is supposed to recognize whether journald was stopped or crashed and terminate itself, but it must no longer do that in case journald just restarted. I was observing the "PropertiesChanged" signal and I no longer think that it can be used to differentiate stopping vs restarting.

What is the best way X server could detect that journald stopped/crashed?

I was reading about transient units and it seems like it would be great solution if X server could register itself as a transient service that requires journald, but as far as I can tell, it is not possible to associate running process with the new service.

@boucman
Copy link
Contributor

boucman commented Oct 11, 2017

@michalsrb you can do that with scopes (man systemd.scope) they are basically services without ExecStart, that can only be created via dbus, and that can adopt a PID

@michalsrb
Copy link

@boucman That sounds perfect, thanks! I will try to use that to let X server register itself as dependent on logind if it uses its file descriptors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

None yet

10 participants