Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] pam_fscrypt hangs on login via xrdp #350

Open
iskunk opened this issue Apr 8, 2022 · 10 comments
Open

[bug] pam_fscrypt hangs on login via xrdp #350

iskunk opened this issue Apr 8, 2022 · 10 comments
Labels

Comments

@iskunk
Copy link

iskunk commented Apr 8, 2022

This issue may or may not relate to #323. I am investigating the use of xrdp + fscrypt.

I am observing that the pam_fscrypt module hangs after the user has authenticated, leaving the xrdp login process dead in the water. This is on Ubuntu, using a daily build of 22.04/jammy. The fscrypt packages are version 0.3.3-1, so they appear current.

I enabled the debug keyword on the module in /etc/pam.d/common-session, and this is what /var/log/syslog looks like:

Apr  8 06:07:09 xubuntu xrdp[5563]: [INFO ] started connecting
Apr  8 06:07:09 xubuntu xrdp[5563]: [INFO ] lib_mod_connect: connecting via UNIX socket
Apr  8 06:07:09 xubuntu xrdp-sesman[5461]: [ERROR] sesman_data_in: scp_process_msg failed
Apr  8 06:07:09 xubuntu xrdp-sesman[5461]: [ERROR] sesman_main_loop: trans_check_wait_objs failed, removing trans
Apr  8 06:07:09 xubuntu xrdp-sesman[5571]: [INFO ] [session start] (display 10): calling auth_start_session from pid 5571
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: OpenSession(map[debug:true]) starting
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: Session count for UID=1000 updated to 1
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: Current privs (real, effective): uid=(0,0) gid=(0,0) groups=[]
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: Setting euid=1000 egid=1000 groups=[1000]
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: Current privs (real, effective): uid=(0,1000) gid=(0,1000) groups=[1000]
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: Reading config from "/etc/fscrypt.conf"
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: creating context for user "testuser"
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: found overlay filesystem "/" ()
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: listing protectors in "/.fscrypt/protectors"
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: found 1 protectors
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: successfully read metadata from "/.fscrypt/protectors/f16e151ff1887a1d"
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: Getting protector f16e151ff1887a1d from option
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: successfully read metadata from "/.fscrypt/protectors/f16e151ff1887a1d"
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: listing policies in "/.fscrypt/policies"
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: found 0 policies
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: following protector link /mnt/.fscrypt/protectors/f16e151ff1887a1d.link
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: resolved filesystem link using mountpoint path "/"
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: successfully read metadata from "/.fscrypt/protectors/f16e151ff1887a1d"
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: listing policies in "/mnt/.fscrypt/policies"
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: found 1 policies
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: successfully read metadata from "/mnt/.fscrypt/policies/47ae2fc75b32c2d7cb51530f70f187db"
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: got data for 47ae2fc75b32c2d7cb51530f70f187db from "/mnt"
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: unlocking 1 policies protected with AUTHTOK
Apr  8 06:07:09 xubuntu pam_fscrypt[5571]: running passphrase hash for protector f16e151ff1887a1d

There, it stops. The process (xrdp-sesman) is not using CPU; it's not doing anything. Eventually, the login process times out with an error.

As this requires somewhat of a non-trivial setup to reproduce, I have put together an initialization script that will do all the work. Get a daily build of Ubuntu Live 22.04/jammy (I used a Xubuntu .iso for my testing), boot up the GUI environment, run this script as root, and then invoke xfreerdp as directed to trigger the bug.

(I can't say if the login will work once this bug is fixed; there may be other issues on the xrdp side. pam_fscrypt should at least be returning within a reasonable amount of time, however.)

@iskunk iskunk changed the title pam_fscrypt hangs on login via xrdp [bug] pam_fscrypt hangs on login via xrdp Apr 8, 2022
@iskunk
Copy link
Author

iskunk commented Apr 8, 2022

I messed around with a Go debugger (delve) to try to figure out where the issue was happening. I have little experience with Go, so I probably got one thing or another wrong, but I was able to generate this backtrace from the hung login process:

# dlv attach 14221 /usr/sbin/xrdp-sesman
2022-04-07T06:14:52-04:00 error layer=debugger error loading binary "/lib/x86_64-linux-gnu/libc.so.6": could not parse .eh_frame section: pointer encoding not supported 0x9b at 0x59f4
Type 'help' for list of commands.
(dlv) bt
 0  0x00007f128c84d4a3 in runtime.futex
    at runtime/sys_linux_amd64.s:553
 1  0x00007f128c818d56 in runtime.futexsleep
    at runtime/os_linux.go:66
 2  0x00007f128c7f48e7 in runtime.notetsleep_internal
    at runtime/lock_futex.go:182
 3  0x00007f128c7f4a05 in runtime.notetsleepg
    at runtime/lock_futex.go:236
 4  0x00007f128c803cb8 in runtime.gcBgMarkStartWorkers
    at runtime/mgc.go:1133
 5  0x00007f128c8023d2 in runtime.gcStart
    at runtime/mgc.go:641
 6  0x00007f128c7f60dc in runtime.mallocgc
    at runtime/malloc.go:1205
 7  0x00007f128c832eb2 in runtime.makeslice
    at runtime/slice.go:103
 8  0x00007f128c8dfdf0 in golang.org/x/crypto/argon2.initBlocks
    at golang.org/x/crypto/argon2/argon2.go:160
 9  0x00007f128c8df93a in golang.org/x/crypto/argon2.deriveKey
    at golang.org/x/crypto/argon2/argon2.go:115
10  0x00007f128c93fefd in golang.org/x/crypto/argon2.IDKey
    at golang.org/x/crypto/argon2/argon2.go:99
11  0x00007f128c93fefd in github.com/google/fscrypt/crypto.PassphraseHash
    at github.com/google/fscrypt/crypto/crypto.go:220
12  0x00007f128c952305 in github.com/google/fscrypt/actions.getWrappingKey
    at github.com/google/fscrypt/actions/callback.go:80
13  0x00007f128c9524bf in github.com/google/fscrypt/actions.unwrapProtectorKey
    at github.com/google/fscrypt/actions/callback.go:90
14  0x00007f128c996f19 in github.com/google/fscrypt/actions.(*Protector).Unlock
    at github.com/google/fscrypt/actions/protector.go:261
15  0x00007f128c996f19 in main.OpenSession
    at github.com/google/fscrypt/pam_fscrypt/pam_fscrypt.go:184
16  0x00007f128c9989ec in main.(*PamFunc).Run
    at github.com/google/fscrypt/pam_fscrypt/run_fscrypt.go:106
17  0x00007f128c999caf in main.pam_sm_open_session
    at github.com/google/fscrypt/pam_fscrypt/pam_fscrypt.go:372
18  0x00007f128c999caf in _cgoexp_2819d48d7355_pam_sm_open_session
    at _cgo_gotypes.go:94
19  0x00007f128c7eca2a in runtime.cgocallbackg1
    at runtime/cgocall.go:314
20  0x00007f128c7ec6a9 in runtime.cgocallbackg
    at runtime/cgocall.go:233
21  0x00007f128c84d971 in runtime.cgocallbackg
    at <autogenerated>:1
22  0x00007f128c84b3b3 in runtime.cgocallback
    at runtime/asm_amd64.s:971
23  0x00007f128c84b601 in runtime.goexit
    at runtime/asm_amd64.s:1571
(dlv) exit
Would you like to kill the process? [Y/n] n

@josephlr
Copy link
Member

josephlr commented Apr 8, 2022

so something weird is happening when fscrypt is doing the password hashing with argon2. It seems like when all the memory gets allocated for the hashing, the runtime freezes. I have no idea why this would happen, but it seems to be happening in runtime.mallocgc

@josephlr
Copy link
Member

josephlr commented Apr 8, 2022

Maybe we need to build the PAM module in some sort of "single threaded" mode? But that might not work as we need threads to do the password hashing.

@ebiggers
Copy link
Collaborator

ebiggers commented Apr 8, 2022

This is the same deadlock in the Go runtime that I saw while investigating why pam_fscrypt doesn't work with sshd's ChallengeResponseAuthentication. This happens when the process logging in the user (sshd in that issue, xrdp in this issue) fork()s itself after authenticating the user with pam_authenticate() but before opening the user's session with pam_open_session(). You can see the fork in xrdp here, just above the call to auth_start_session() in the child process which calls pam_open_session(). This causes a deadlock because the Go runtime is multithreaded and thus doesn't support being fork()-ed except to immediately execve() a program.

The real bug here is that these programs (sshd and xrdp) are calling the PAM functions from different processes. This doesn't just cause problems with Go, but it also breaks the pam_get_item() and pam_set_item() functions for all PAM modules, regardless of what programming language they are written it. But Go definitely brings more problems here.

Unfortunately, I don't see a way for xrdp and pam_fscrypt to work together unless pam_fscrypt is rewritten in another programming language. Any other ideas would be appreciated, though.

@iskunk
Copy link
Author

iskunk commented Apr 8, 2022

Is there a way that pam_fscrypt can detect the situation, and peace out with a explanation if it's present? That would at least address the bad behavior, and put the onus back on the login application.

As for xrdp, it apparently changed over to calling auth_start_session() in a child process due to issues with systemd:

So it's not even clear to me if (and how) this issue could be articulated to them as a bug in their handling of PAM.

@ebiggers
Copy link
Collaborator

ebiggers commented Apr 8, 2022

Is there a way that pam_fscrypt can detect the situation, and peace out with a explanation if it's present? That would at least address the bad behavior, and put the onus back on the login application.

Maybe. Can you try #352 and see if it works to detect the issue with xrdp? You should no longer get a deadlock, but pam_fscrypt won't unlock directories and you'll get a warning in the syslog about that.

However, we can't detect the situation until Go code is already running, so it might be the case that you'll still just get a deadlock.

As for xrdp, it apparently changed over to calling auth_start_session() in a child process due to issues with systemd:

So it's not even clear to me if (and how) this issue could be articulated to them as a bug in their handling of PAM.

Well, neutrinolabs/xrdp#806 doesn't have any explanation, so it's not clear to me that it was the right thing to do. As far as I can tell, libpam has never supported the situation where a single PAM transaction is split among multiple processes.

@iskunk
Copy link
Author

iskunk commented Apr 9, 2022

@josephlr, that systemd-service proposal looks pretty good. I especially appreciate this point:

  • No ties to systemd in particular. Any init service can call fscrypt system [start|stop] <uid>.

@ebiggers, I'm happy to test patches. I've little experience with Go builds, but I can easily rebuild patched .deb packages from source, so there is little chance of getting that wrong.

A possible solution to this (albeit not as comprehensive as the systemd-service route) might be to have pam_fscrypt fork/exec fscrypt to do the crypto operations, so the Go runtime gets its own dedicated process. It seems almost like the unlocking could be done by something like

session optional pam_exec.so expose_authtok type=open_session fscrypt unlock ... --quiet

were it not for --unlock-with=<dunno>.

@azazar
Copy link

azazar commented Mar 8, 2023

Is there a plan to implement that solution?

@void-m4110c
Copy link

I have the exact same problem when trying to decrypt the filesystem vai slim. It works until "running passphrase hash for protextor 98324abff89034e"

@matt335672
Copy link

Anyone following this thread may be interested in this comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants