Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

homed: allow logging into home areas via ssh without unlocking them locally first #30226

Merged
merged 14 commits into from Feb 14, 2024

Conversation

poettering
Copy link
Member

@poettering poettering commented Nov 28, 2023

This is some work I had in a local branch since last year, let's dust it off and get it merged.

Fixes: #23978
Replaces: #23569 #15742

Copy link

github-actions bot commented Nov 28, 2023

We had successfully released a new major release. We are no longer in a development freeze phase.
We will try our best to get back to your PR as soon as possible. Thank you for your patience.

@poettering
Copy link
Member Author

This is now complete btw, integration test added. PTAL.

test/units/testsuite-35.sh Fixed Show fixed Hide fixed
Copy link

An -rc1 tag has been created and a release is being prepared, so please note that PRs introducing new features and APIs will be held back until the new version has been released.

@poettering poettering force-pushed the homed-fallback-shell branch 2 times, most recently from bae4f08 to 1061214 Compare November 29, 2023 21:16
@poettering poettering added the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Nov 30, 2023
@github-actions github-actions bot removed the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Nov 30, 2023
@poettering poettering force-pushed the homed-fallback-shell branch 2 times, most recently from 2c3d298 to ff7b2d6 Compare December 1, 2023 08:42
src/login/pam_systemd.c Outdated Show resolved Hide resolved
@mrc0mmand

This comment was marked as outdated.

@mrc0mmand mrc0mmand added the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Dec 7, 2023
Copy link
Contributor

@AdrianVovk AdrianVovk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm. Just some minor nits and clarifying questions

src/login/pam_systemd.c Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm why add the level of indirection of useFallback? Why not define that homeDirectory and shell themselves can appear in the status section and naturally override anything in regular, perMachine, or binding? Then if the home directory is inacitve homed just sets homeDirectory and shell in the status section, then once it's activated it unsets these fields?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading through the rest of the PR, I still feel useFallback can probably go... The only place where it meaningfully makes a difference to the code path is in the fallback shell outer loop, which as I mention in a later comment I don't really understand the point of in the first place

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, the bool is redundant, we could of course use the fact that the fallback shell json field is defined as only indicator. But it simply seemed cleaner to me to hide this behind one clear boolean field. After all the fallback paths are not really dynamic, they are always the same and constant. It's simply the decision whether to go for them or not that changes dynamically.

Or to say this differently: independently if the fallback mode is on or not it should be possibly to "see" both shells: the real one, and the fallback one. And the fallback one is actually more interesting than the real one in many cases, because if you invoke it, things always work, regardless in which state the home dir currently is.

The way I see it, the fallback shell field is really the shell that logins/gettys/ssh should invoke, basically always, and the regular shell field is then what our fallback shell chain loads to give the user what they want. In that thinking model it just makes sense to surface the shell in all cases, and just have a toggle boolean to route things in the right place for compat NSS struct passwd stuff.

In my thinking a modern /bin/login ideally would always invoke the fallback shell, and let the fallback shell figure things out. That would be perfect actually, as it would mean we could address one thing people have been complaining about since a long time: that there's no generic, shell independent way to insert env vars into all sessions of a user, with control from the user. You could do it with .profile and .bashrc of course, but that would be shell-specific and not be generic. You could also do it from PAM, but that's not under user control, and is very hard to do safely if it shall read data from the homedir. Now, this PR is not going to deliver that, but I think it still makes sense to prepare for a world, where the fallbackShell field is the primary entrypoint to the session if it exists and shell is the binary that it invokes. But for that we need both fields to always be visible.

Does any of the above make any sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

But then why not expose the fallback shell always for struct passwd stuff?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, but I think people would be really confused by this. i.e. they say "I pick tcsh" and then we say "of course! here you have systemd-fallback-shell!". And they then come back "man, i said tcsh". And we say "yeah, ok, got it, see: systemd-fallback-shell".

Hence I figured, let's maybe not make this so confusing if we can: only show the fallback shell if we really have to. And if we don't really have to just show the user-selected shell.

docs/USER_RECORD.md Outdated Show resolved Hide resolved
src/home/homed-home-bus.c Outdated Show resolved Hide resolved
Comment on lines +4110 to +4169
/* Let's start by checking if this all is even necessary, i.e. if the useFallback boolean field is actually set. */
r = bus_call_method(bus, bus_mgr, "GetUserRecordByName", &error, &reply, "s", NULL); /* empty user string means: our calling user */
if (r < 0)
return log_error_errno(r, "Failed to inspect home: %s", bus_error_message(&error, r));

r = sd_bus_message_read(reply, "sbo", &json, NULL, NULL);
if (r < 0)
return bus_log_parse_error(r);

r = json_parse(json, JSON_PARSE_SENSITIVE, &v, NULL, NULL);
if (r < 0)
return log_error_errno(r, "Failed to parse JSON identity: %m");

hr = user_record_new();
if (!hr)
return log_oom();

r = user_record_load(hr, v, USER_RECORD_LOAD_REFUSE_SECRET|USER_RECORD_LOG|USER_RECORD_PERMISSIVE);
if (r < 0)
return r;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost word-for-word identical code to inspect_home. Maybe worth splitting out a helper?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it though? the error handling and stuff is quite different still.


sd_bus_error_free(&error);
} else
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is a bit confusing.

If ActivateHomeIfReferenced is successful, you still loop the outer loop to check that use_fallback actually became false. If the method fails, we either stay in the inner loop (if we can retry auth) or completely return from the function and thus kill the SSH session.

So what's the point of the outer retry loop? It seems to only loop if ActivateHomeIfReferenced successfully returned but still somehow failed to do its job of activating the home? When would something like this happen?

useFallback is a synonym of the home's state being active, after all...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's again about the race: multiple clients starting/stopping sessions at the very same time, with and without reference counting. we want the user record where the fallback logic is off and with the most current data (which will only be available after successful activation). Hence we keep activating things until we get a user record that looks like it's right. we give up after 5 tries though, because if we keep looping around this it looks more like someone is fucking with us then actual concurrent operations...

i mean, as long as all logins and other attempts to access home dirs go through this pam-systemd-home code paths then it should in theory suffice to just do get-record→activate→get-record and be done with it. But it seemed safer to me to try a couple of times, since there's als ActivateHome without refcounting after all...

src/home/homectl.c Show resolved Hide resolved
@@ -621,7 +621,7 @@ static int acquire_home(

r = sd_bus_call(bus, m, HOME_SLOW_BUS_CALL_TIMEOUT_USEC, &error, &reply);
if (r < 0) {
if (sd_bus_error_has_name(&error, BUS_ERROR_HOME_NOT_ACTIVE)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to squash this into the previous commit? Up to you

…SSION_INCOMPLETE env var is set

This allows earlier PAM modules (i.e. pam_systemd_home) to inform
pam_systemd that the session is not "complete" yet (i.e. doesn't have
the home dir set up properly yet).
This adds fields to the user record logic to allow a "fallback" home
directory and shell to be set as part of the "status" section of the
user record, i.e. supplied by the manager of the user record.

The idea is that if the fallback homedir/shell is set it will take
precedence over the real one in most ways.

Usecase: let's try to make ssh logins into homed directories work.
systemd-homed would set a fallback shell/homedir for inactive home dirs.
Thus, when ssh logins take place via key auth, we can allow them, and
these fallback session params would be used because the real home cannot
be activated just yet becasue we cannot acquire any password for it from
the user.
This is useful for allowing users to login without the ability to unlock
their home dir. Usecase is ssh: ssh might grant access without giving
us the chance to unlock the home dir for the user (because it doesn't
allow us asking questions during authentication), hence with this call
we can pin the home dir, but not activate it and then allow the
activation to be delayed until later.
This is very similar to ActivateHome() but will fail if the home
directory is not referenced yet. Or in other words, this doesn't add any
new reference, but simply is the other side of RefUnrestricted(): if we
allowed a home dir to be referenced without it actually being active,
then this can catch up with things and activated what was previously
referenced already.

This also relaxes access rights to that users can always activate their
own home dirs. This is useful once we allow user code to run without the
home dir being activated.
…or home password

ssh runs PAM session hooks before they allocate a pty for the session.
(That's because they allow multiplexed connections, and hence might run
multiple ptys over the same same session). This means PAM modules cannot
interactively ask the user for additional information as they deem fit.
That's a problem for us, since generally during an SSH login no
authentication token (aka "password") is supplied to us which we could
use to unlock the user's home dir.

With this commit we implement a way out: we allow the login to proceed,
but without the home dir activated, and then override the user's shell
with our fallback shell, which will ask for the user's password and then
chainload the actual shell.

This will of course only work if the login actually involves invoking
the configured interactive shell of the user. For other logins (such as
sftp or so), this cannot work, and they'll see an empty home dir
instead.
…operly

RefUnit() only succeeds it a home dir is fully up. We already dealt with
it not being up at all, but let's also cover the case where it is
currently busy with changing state, and in that case fall back to
RefUnrestricted(), with the usual implications.

This has the effect that two subsequent ssh logins one-after-the-other
will work correctly.
@poettering
Copy link
Member Author

force pushed a new version. only minor fixes, and a rebase. Given @AdrianVovk's review, taking liberty to set green label.

@poettering poettering added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed please-review PR is ready for (re-)review by a maintainer labels Feb 14, 2024
@poettering poettering merged commit a043786 into systemd:main Feb 14, 2024
45 of 49 checks passed
@github-actions github-actions bot removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Feb 14, 2024
@AdrianVovk
Copy link
Contributor

🎉 cool

YHNdnzj added a commit to YHNdnzj/systemd that referenced this pull request Feb 14, 2024
…temd#30226)

We don't usually say ", refusing" in bus error messages.
Also, make use of unref_and_replace_full.
YHNdnzj added a commit to YHNdnzj/systemd that referenced this pull request Feb 15, 2024
…temd#30226)

We don't usually say ", refusing" in bus error messages.
Also, make use of unref_and_replace_full.
ayhamthemayhem pushed a commit to neighbourhoodie/nh-systemd that referenced this pull request Mar 25, 2024
…temd#30226)

We don't usually say ", refusing" in bus error messages.
Also, make use of unref_and_replace_full.
chunyi-wu pushed a commit to chunyi-wu/systemd that referenced this pull request Apr 3, 2024
…temd#30226)

We don't usually say ", refusing" in bus error messages.
Also, make use of unref_and_replace_full.
@mrc0mmand mrc0mmand removed the logind label Apr 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
5 participants