New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
homed: allow logging into home areas via ssh without unlocking them locally first #30226
Conversation
We had successfully released a new major release. We are no longer in a development freeze phase. |
1d55e3c
to
00bdaa6
Compare
This is now complete btw, integration test added. PTAL. |
00bdaa6
to
7f0f60e
Compare
An -rc1 tag has been created and a release is being prepared, so please note that PRs introducing new features and APIs will be held back until the new version has been released. |
bae4f08
to
1061214
Compare
1061214
to
be7c3b8
Compare
2c3d298
to
ff7b2d6
Compare
ff7b2d6
to
f513486
Compare
This comment was marked as outdated.
This comment was marked as outdated.
6f78be1
to
708a7bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall lgtm. Just some minor nits and clarifying questions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm why add the level of indirection of useFallback
? Why not define that homeDirectory
and shell
themselves can appear in the status
section and naturally override anything in regular
, perMachine
, or binding
? Then if the home directory is inacitve homed just sets homeDirectory
and shell
in the status
section, then once it's activated it unsets these fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading through the rest of the PR, I still feel useFallback
can probably go... The only place where it meaningfully makes a difference to the code path is in the fallback shell outer loop, which as I mention in a later comment I don't really understand the point of in the first place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, the bool is redundant, we could of course use the fact that the fallback shell json field is defined as only indicator. But it simply seemed cleaner to me to hide this behind one clear boolean field. After all the fallback paths are not really dynamic, they are always the same and constant. It's simply the decision whether to go for them or not that changes dynamically.
Or to say this differently: independently if the fallback mode is on or not it should be possibly to "see" both shells: the real one, and the fallback one. And the fallback one is actually more interesting than the real one in many cases, because if you invoke it, things always work, regardless in which state the home dir currently is.
The way I see it, the fallback shell field is really the shell that logins/gettys/ssh should invoke, basically always, and the regular shell field is then what our fallback shell chain loads to give the user what they want. In that thinking model it just makes sense to surface the shell in all cases, and just have a toggle boolean to route things in the right place for compat NSS struct passwd
stuff.
In my thinking a modern /bin/login ideally would always invoke the fallback shell, and let the fallback shell figure things out. That would be perfect actually, as it would mean we could address one thing people have been complaining about since a long time: that there's no generic, shell independent way to insert env vars into all sessions of a user, with control from the user. You could do it with .profile and .bashrc of course, but that would be shell-specific and not be generic. You could also do it from PAM, but that's not under user control, and is very hard to do safely if it shall read data from the homedir. Now, this PR is not going to deliver that, but I think it still makes sense to prepare for a world, where the fallbackShell
field is the primary entrypoint to the session if it exists and shell
is the binary that it invokes. But for that we need both fields to always be visible.
Does any of the above make any sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me.
But then why not expose the fallback shell always for struct passwd
stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, but I think people would be really confused by this. i.e. they say "I pick tcsh" and then we say "of course! here you have systemd-fallback-shell!". And they then come back "man, i said tcsh". And we say "yeah, ok, got it, see: systemd-fallback-shell".
Hence I figured, let's maybe not make this so confusing if we can: only show the fallback shell if we really have to. And if we don't really have to just show the user-selected shell.
/* Let's start by checking if this all is even necessary, i.e. if the useFallback boolean field is actually set. */ | ||
r = bus_call_method(bus, bus_mgr, "GetUserRecordByName", &error, &reply, "s", NULL); /* empty user string means: our calling user */ | ||
if (r < 0) | ||
return log_error_errno(r, "Failed to inspect home: %s", bus_error_message(&error, r)); | ||
|
||
r = sd_bus_message_read(reply, "sbo", &json, NULL, NULL); | ||
if (r < 0) | ||
return bus_log_parse_error(r); | ||
|
||
r = json_parse(json, JSON_PARSE_SENSITIVE, &v, NULL, NULL); | ||
if (r < 0) | ||
return log_error_errno(r, "Failed to parse JSON identity: %m"); | ||
|
||
hr = user_record_new(); | ||
if (!hr) | ||
return log_oom(); | ||
|
||
r = user_record_load(hr, v, USER_RECORD_LOAD_REFUSE_SECRET|USER_RECORD_LOG|USER_RECORD_PERMISSIVE); | ||
if (r < 0) | ||
return r; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost word-for-word identical code to inspect_home
. Maybe worth splitting out a helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it though? the error handling and stuff is quite different still.
|
||
sd_bus_error_free(&error); | ||
} else | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here is a bit confusing.
If ActivateHomeIfReferenced
is successful, you still loop the outer loop to check that use_fallback
actually became false. If the method fails, we either stay in the inner loop (if we can retry auth) or completely return from the function and thus kill the SSH session.
So what's the point of the outer retry loop? It seems to only loop if ActivateHomeIfReferenced
successfully returned but still somehow failed to do its job of activating the home? When would something like this happen?
useFallback
is a synonym of the home's state being active, after all...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's again about the race: multiple clients starting/stopping sessions at the very same time, with and without reference counting. we want the user record where the fallback logic is off and with the most current data (which will only be available after successful activation). Hence we keep activating things until we get a user record that looks like it's right. we give up after 5 tries though, because if we keep looping around this it looks more like someone is fucking with us then actual concurrent operations...
i mean, as long as all logins and other attempts to access home dirs go through this pam-systemd-home code paths then it should in theory suffice to just do get-record→activate→get-record and be done with it. But it seemed safer to me to try a couple of times, since there's als ActivateHome without refcounting after all...
src/home/pam_systemd_home.c
Outdated
@@ -621,7 +621,7 @@ static int acquire_home( | |||
|
|||
r = sd_bus_call(bus, m, HOME_SLOW_BUS_CALL_TIMEOUT_USEC, &error, &reply); | |||
if (r < 0) { | |||
if (sd_bus_error_has_name(&error, BUS_ERROR_HOME_NOT_ACTIVE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to squash this into the previous commit? Up to you
…SSION_INCOMPLETE env var is set This allows earlier PAM modules (i.e. pam_systemd_home) to inform pam_systemd that the session is not "complete" yet (i.e. doesn't have the home dir set up properly yet).
This adds fields to the user record logic to allow a "fallback" home directory and shell to be set as part of the "status" section of the user record, i.e. supplied by the manager of the user record. The idea is that if the fallback homedir/shell is set it will take precedence over the real one in most ways. Usecase: let's try to make ssh logins into homed directories work. systemd-homed would set a fallback shell/homedir for inactive home dirs. Thus, when ssh logins take place via key auth, we can allow them, and these fallback session params would be used because the real home cannot be activated just yet becasue we cannot acquire any password for it from the user.
This is useful for allowing users to login without the ability to unlock their home dir. Usecase is ssh: ssh might grant access without giving us the chance to unlock the home dir for the user (because it doesn't allow us asking questions during authentication), hence with this call we can pin the home dir, but not activate it and then allow the activation to be delayed until later.
This is very similar to ActivateHome() but will fail if the home directory is not referenced yet. Or in other words, this doesn't add any new reference, but simply is the other side of RefUnrestricted(): if we allowed a home dir to be referenced without it actually being active, then this can catch up with things and activated what was previously referenced already. This also relaxes access rights to that users can always activate their own home dirs. This is useful once we allow user code to run without the home dir being activated.
…or home password ssh runs PAM session hooks before they allocate a pty for the session. (That's because they allow multiplexed connections, and hence might run multiple ptys over the same same session). This means PAM modules cannot interactively ask the user for additional information as they deem fit. That's a problem for us, since generally during an SSH login no authentication token (aka "password") is supplied to us which we could use to unlock the user's home dir. With this commit we implement a way out: we allow the login to proceed, but without the home dir activated, and then override the user's shell with our fallback shell, which will ask for the user's password and then chainload the actual shell. This will of course only work if the login actually involves invoking the configured interactive shell of the user. For other logins (such as sftp or so), this cannot work, and they'll see an empty home dir instead.
…operly RefUnit() only succeeds it a home dir is fully up. We already dealt with it not being up at all, but let's also cover the case where it is currently busy with changing state, and in that case fall back to RefUnrestricted(), with the usual implications. This has the effect that two subsequent ssh logins one-after-the-other will work correctly.
708a7bf
to
1c5d03c
Compare
force pushed a new version. only minor fixes, and a rebase. Given @AdrianVovk's review, taking liberty to set green label. |
🎉 cool |
…temd#30226) We don't usually say ", refusing" in bus error messages. Also, make use of unref_and_replace_full.
…temd#30226) We don't usually say ", refusing" in bus error messages. Also, make use of unref_and_replace_full.
…temd#30226) We don't usually say ", refusing" in bus error messages. Also, make use of unref_and_replace_full.
…temd#30226) We don't usually say ", refusing" in bus error messages. Also, make use of unref_and_replace_full.
This is some work I had in a local branch since last year, let's dust it off and get it merged.
Fixes: #23978
Replaces: #23569 #15742