Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pam_systemd(login:session): Failed to release session: Interrupted system call #18817

Closed
rafamadriz opened this issue Feb 27, 2021 · 9 comments
Closed
Labels
login needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer

Comments

@rafamadriz
Copy link

rafamadriz commented Feb 27, 2021

systemd version the issue has been seen with

… systemd 247 (247.3-1-arch)

Used distribution

Arch Linux 64bit
Linux kernel version used (uname -a)

5.10.19-1-lts

CPU architecture issue was seen on

x86_64

Expected behaviour you didn't see

normal session, using root commands normally.

Unexpected behaviour you saw

systemd fails to release session , which caused the pam module to kick in and block sudo access everywhere for the default fail timeout period. When it happens it doesn't allow to use commands as root.

Additional program output to the terminal or log subsystem illustrating the issue

https://gist.githubusercontent.com/rafamadriz/c7cc6439f6374832bb07de67b3f48797/raw/0ba81fdc880d2d0030757a72017c4069def23885/gistfile1.txt

@poettering
Copy link
Member

What's the pam stack used?

Is this a getty login?

@poettering poettering added login needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer labels Mar 1, 2021
@rafamadriz
Copy link
Author

rafamadriz commented Mar 2, 2021

What's the pam stack used?

The pam related packages I have are the ones that come by default in Arch Linux:
pambase version: 20200721.1-2
pam version: 1.5.1-1

I have both up to date.

Is this a getty login?

Sorry for my ignorance, but didn't know what getty means. The definition I found: "getty is a program which manages a terminal line and its connected terminal. Its purpose is to protect the system from unauthorized access. Generally, each getty process is started by systemd and manages a single terminal line. "

But still not sure of what should I answer to your question.

@poettering
Copy link
Member

What I am looking for is: which program is requesting the PAM session? The logs only say "login", which is likely getty, but might not be. getty is the thing that asks you for your username on the console when you log in there.

And I also want to know the /etc/pam.d/ snippet for that service.

Is there anything interesting in the logs? The excerpt you posted is jut too short to be useful, please provide more context on that.

@rafamadriz
Copy link
Author

rafamadriz commented Mar 5, 2021

To give a little bit more of context:

Like a week ago, At a given moment of the day I suddenly couldn't use sudo (the output was just: Sorry, try again ). I was 100% sure my password was right, after like ten minutes trying to see what's was going on, I was able to use sudo again.

I found I post of a guy who had this exact same problem, he described that the pam module was updated and it includes using a new fail module that blocks access for your user temporarily across all terminals (rather than just the current one) on multiple incorrect password attempt. Part of the problem ended up being a script I had running with crond and using root access, which was failing and made the pam module to kick in.

This was the output on journalctl

CROND[18486]: pam_unix(crond:session): session closed for user root

I removed my script, and the pam module stopped to fail because of CROND. Whoever the problem is still there. My journalctl is still fill with those pam_unix(sudo:auth): conversation failed and archos login [477]: pam_systemd(login:session): Failed to release session: Interrupted system call The problem is not because of typing wrong my password. I've typed my password wrong on purpose to see what's the output on journalctl and it is different from the other two previous errors. myuser : 1 incorrect password attempt ; TTY=pts/2 ; PWD=/home/myuser ; USER=root

Finally, I did a post on Reddit and turns out there is a bunch of other people with the same problem. Some of them didn't even a have cronjob. That's the reason I decided to open an issue, apparently the is more people with the same problem.

This is a new install, and every single day my journalctl is having a few archos login[556]: pam_systemd(login:session): Failed to release session: Interrupted system call

What I am looking for is: which program is requesting the PAM session? The logs only say "login", which is likely getty, but might not be. getty is the thing that asks you for your username on the console when you log in there.

I think that the program is sudo, in my first comment, the link to the gist shows that most of the errors start with sudo. I really don't know how to debug this in a way to be 100% sure that the program is sudo. But that's what it seems like. And if I check my journalctl it is still the same, most of those errors start with login or sudo. Example:

Mar 03 11:33:51 archos sudo[43358]: pam_systemd_home(sudo:auth): Failed to query user record: Interrupted system call
Mar 03 18:05:23 archos login[477]: pam_systemd(login:session): Failed to release session: Interrupted system call

And I also want to know the /etc/pam.d/ snippet for that service.

$ cat /etc/pam.d/sudo

#%PAM-1.0
auth		include		system-auth
account		include		system-auth
session		include		system-auth

@poettering
Copy link
Member

Hmm, you closed this? Did you figure it out?

@rafamadriz
Copy link
Author

Hmm, you closed this? Did you figure it out?

Oh sorry for the late response, I didn't see the message.

No I didn't figured it out, I still have the same messages on my journaltctl every now and then (pam_unix(sudo:auth): conversation failed and pam_systemd(login:session): Failed to release session: Interrupted system call) But at least not every single day as before.

I decided to close because I wasn't having any mayor problem that were caused by those errors in journalctl . The whole situation where I couldn't use sudo for a certain period of time was because of the cronjob I had as root, as explained before. So in conclusion those errors doesn't seem to be anything that I have to be extremely worry about.

@archiecobbs
Copy link

FYI, I get this same error frequently, but only when rebooting a server.

See below for more logging around a recent example. This was on openSUSE Leap 15.2 with openSUSE's systemd-234-lp152.31.34.1.

Without doing a full analysis, it seems like the most likely explanation is that during the flurry of activity that occurs on shutdown, systemd receives a signal in the middle of an I/O system call, that sysytem call returns EINTR, but then systemd is not programmed to properly handle this return value and just returns it back up the call stack instead of retrying.

For example, possibly this code (and many other similar examples):

int bus_socket_write_message(sd_bus *bus, sd_bus_message *m, size_t *idx) {
        struct iovec *iov;
        ssize_t k;

        ...

        if (bus->prefer_writev)
                k = writev(bus->output_fd, iov, m->n_iovec);
        else {
                struct msghdr mh = {
                        .msg_iov = iov,
                        .msg_iovlen = m->n_iovec,
                };

                ...

                k = sendmsg(bus->output_fd, &mh, MSG_DONTWAIT|MSG_NOSIGNAL);
                if (k < 0 && errno == ENOTSOCK) {
                        bus->prefer_writev = true;
                        k = writev(bus->output_fd, iov, m->n_iovec);
                }
        }

Note that both writev(2) and sendmsg(2) document that EINTR is a possible when the process receives a signal.

     [EINTR]            A signal interrupts the system call before any data is transmitted.

If one would expect systemd to receive at least one signal during the shutdown process, then that would likely explain the bug.

The code above is invoked (indirectly) from here, which is where the EINTR is finally "caught" and then reported as an error:

_public_ PAM_EXTERN int pam_sm_close_session(
                pam_handle_t *handle,
                int flags,
                int argc, const char **argv) {

        ...

        id = pam_getenv(handle, "XDG_SESSION_ID");
        if (id && !existing) {
                _cleanup_(sd_bus_error_free) sd_bus_error error = SD_BUS_ERROR_NULL;
                _cleanup_(sd_bus_flush_close_unrefp) sd_bus *bus = NULL;

                /* Before we go and close the FIFO we need to tell logind that this is a clean session
                 * shutdown, so that it doesn't just go and slaughter us immediately after closing the fd */

                r = pam_acquire_bus_connection(handle, &bus);
                if (r != PAM_SUCCESS)
                        return r;

                r = bus_call_method(bus, bus_login_mgr, "ReleaseSession", &error, NULL, "s", id);
                if (r < 0) {
                        pam_syslog(handle, LOG_ERR, "Failed to release session: %s", bus_error_message(&error, r));
                        return PAM_SESSION_ERR;
                }
        }

And yes, this is a bug, not an "error", because any process that is doing blocking system calls while possibly receiving a signal must be prepared to handle EINTR as a normal return value and then retry (an example where systemd does properly doing this is posix_fallocate_loop()). Handling signals in C code is a pain - libraries like libevent, or using POSIX threads and sigwait(2) can help - see the example in the pthread_sigmask(3) man page. But I'm sure @poettering probably already knows all this. What I don't know is how systemd is currently setup to handle it, if at all.

Logging around recent example:

Nov 29 11:21:04 test syslog-ng[844]: syslog-ng shutting down; version='3.26.1'
Nov 29 11:21:04 test systemd[1]: Closed LVM2 poll daemon socket.
Nov 29 11:21:04 test systemd[1]: Stopped Early Kernel Boot Messages.
Nov 29 11:21:04 test systemd[1]: Stopped target Multi-User System.
Nov 29 11:21:04 test systemd[1]: Stopping MariaDB database server...
Nov 29 11:21:04 test systemd[1]: Stopped Apply settings from /etc/sysconfig/keyboard.
Nov 29 11:21:04 test sshd[876]: Received signal 15; terminating.
Nov 29 11:21:04 test systemd[1]: Stopping OpenSSH Daemon...
Nov 29 11:21:04 test systemd[1]: Stopped target Login Prompts.
Nov 29 11:21:04 test systemd[1]: Stopped Detect if the system suffers from bsc#1089761.
Nov 29 11:21:04 test systemd[1]: Stopping LSB: Fonehome client...
Nov 29 11:21:04 test systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
Nov 29 11:21:04 test systemd[1]: Stopping System Logging Service...
Nov 29 11:21:04 test systemd[1]: Stopping Getty on tty1...
Nov 29 11:21:04 test systemd[1]: Stopped target Timers.
Nov 29 11:21:04 test systemd[1]: Stopped Daily rotation of log files.
Nov 29 11:21:04 test systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Nov 29 11:21:04 test sshd[20197]: pam_systemd(sshd:session): Failed to release session: Interrupted system call
Nov 29 11:21:04 test systemd[1]: Stopping Session 29819 of user archie.
Nov 29 11:21:04 test systemd[1]: Stopped Discard unused blocks once a week.
Nov 29 11:21:04 test systemd[1]: Stopped Do daily mandb update.
Nov 29 11:21:04 test systemd[1]: Stopping Command Scheduler...
Nov 29 11:21:04 test systemd[1]: Stopping Restore /run/initramfs on shutdown...
Nov 29 11:21:04 test systemd[1]: Stopped OpenSSH Daemon.
Nov 29 11:21:04 test sshd[20197]: pam_unix(sshd:session): session closed for user archie
Nov 29 11:21:04 test su[20272]: pam_unix(su:session): session closed for user root
Nov 29 11:21:04 test cron[879]: (CRON) INFO (Shutting down)
Nov 29 11:21:04 test systemd[1]: Stopped Command Scheduler.
Nov 29 11:21:04 test systemd[1]: Stopped Getty on tty1.
Nov 29 11:21:04 test systemd-logind[443]: Session 29819 logged out. Waiting for processes to exit.
Nov 29 11:21:04 test systemd[1]: Stopped Restore /run/initramfs on shutdown.
Nov 29 11:21:04 test systemd[1]: Starting Show Plymouth Reboot Screen...
Nov 29 11:21:04 test systemd[1]: Stopping The Apache Webserver...
Nov 29 11:21:04 test systemd[1]: Removed slice system-getty.slice.
Nov 29 11:21:04 test systemd[1]: Stopping Write information about system start to sysstat log...
Nov 29 11:21:04 test systemd[1]: Received SIGRTMIN+20 from PID 36590 (plymouthd).
Nov 29 11:21:04 test systemd[1]: Stopped Write information about system start to sysstat log.
Nov 29 11:21:04 test systemd[1]: Started Show Plymouth Reboot Screen.
Nov 29 11:21:04 test systemd[1]: Stopped The Apache Webserver.
Nov 29 11:21:04 test systemd[1]: Stopped target System Time Synchronized.
Nov 29 11:21:04 test systemd[1]: Stopping NTP Server Daemon...
Nov 29 11:21:04 test systemd[1]: Stopped NTP Server Daemon.
Nov 29 11:21:04 test systemd[1]: Stopped System Logging Service.
Nov 29 11:21:04 test fonehome[36568]: Stopping fonehome: ..done
Nov 29 11:21:04 test systemd[1]: Stopped LSB: Fonehome client.
Nov 29 11:21:04 test systemd[1]: Stopped target Network is Online.
Nov 29 11:21:05 test systemd[1]: Stopped MariaDB database server.
Nov 29 11:21:06 test sudo[20271]: pam_unix(sudo:session): session closed for user root
Nov 29 11:21:06 test systemd[1]: Stopped Session 29819 of user archie.
Nov 29 11:21:06 test systemd[1]: user-runtime-dir@1001.service: Unit not needed anymore. Stopping.
Nov 29 11:21:06 test systemd[1]: Stopping User Manager for UID 1001...
Nov 29 11:21:06 test systemd[1]: Stopping Login Service...
Nov 29 11:21:06 test systemd[20200]: Stopped target Default.
Nov 29 11:21:06 test systemd[20200]: Stopped target Basic System.
Nov 29 11:21:06 test systemd[20200]: Stopped target Sockets.
Nov 29 11:21:06 test systemd[20200]: Closed D-Bus User Message Bus Socket.
Nov 29 11:21:06 test systemd[20200]: Stopped target Timers.
Nov 29 11:21:06 test systemd[20200]: Stopped target Paths.
Nov 29 11:21:06 test systemd[20200]: Reached target Shutdown.
Nov 29 11:21:06 test systemd[20200]: Starting Exit the Session...
Nov 29 11:21:06 test systemd[20200]: Received SIGRTMIN+24 from PID 36602 (kill).
Nov 29 11:21:06 test systemd-logind[443]: Removed session 29819.
Nov 29 11:21:06 test systemd[20201]: pam_unix(systemd-user:session): session closed for user archie
Nov 29 11:21:06 test systemd[1]: user-runtime-dir@1001.service: Unit not needed anymore. Stopping.
Nov 29 11:21:06 test systemd[1]: Stopped User Manager for UID 1001.
Nov 29 11:21:06 test systemd[1]: user-runtime-dir@1001.service: Unit not needed anymore. Stopping.
Nov 29 11:21:06 test systemd[1]: user-1001.slice: Unit not needed anymore. Stopping.
Nov 29 11:21:06 test systemd[1]: Stopping User Runtime Directory /run/user/1001...
Nov 29 11:21:06 test systemd[1]: Unmounted /run/user/1001.
Nov 29 11:21:06 test systemd[1]: Stopped User Runtime Directory /run/user/1001.
Nov 29 11:21:06 test systemd[1]: user-1001.slice: Unit not needed anymore. Stopping.
Nov 29 11:21:06 test systemd[1]: Removed slice User Slice of UID 1001.
Nov 29 11:21:06 test systemd[1]: Stopping Permit User Sessions...
Nov 29 11:21:06 test systemd[1]: Reached target Unmount All Filesystems.
Nov 29 11:21:06 test systemd[1]: Stopped Permit User Sessions.
Nov 29 11:21:06 test systemd[1]: Stopped target Remote File Systems.
Nov 29 11:21:06 test systemd[1]: Stopped target Remote File Systems (Pre).
Nov 29 11:21:06 test systemd[1]: Stopped target Network.
Nov 29 11:21:06 test systemd[1]: Stopping wicked managed network interfaces...
Nov 29 11:21:06 test systemd[1]: Stopped Login Service.
Nov 29 11:21:06 test wicked[36605]: eth0            device-ready
Nov 29 11:21:06 test systemd[1]: Stopped wicked managed network interfaces.
Nov 29 11:21:06 test systemd[1]: Stopping wicked network nanny service...
Nov 29 11:21:06 test systemd[1]: Stopped wicked network nanny service.
Nov 29 11:21:06 test systemd[1]: Stopping wicked network management service daemon...
Nov 29 11:21:06 test systemd[1]: Stopped wicked network management service daemon.
Nov 29 11:21:06 test systemd[1]: Stopping wicked DHCPv4 supplicant service...
Nov 29 11:21:06 test systemd[1]: Stopping wicked AutoIPv4 supplicant service...
Nov 29 11:21:06 test systemd[1]: Stopping wicked DHCPv6 supplicant service...
Nov 29 11:21:06 test systemd[1]: Stopped wicked DHCPv4 supplicant service.
Nov 29 11:21:06 test systemd[1]: Stopped wicked AutoIPv4 supplicant service.
Nov 29 11:21:06 test systemd[1]: Stopped wicked DHCPv6 supplicant service.
Nov 29 11:21:06 test systemd[1]: Stopping D-Bus System Message Bus...
Nov 29 11:21:06 test systemd[1]: Stopped D-Bus System Message Bus.
Nov 29 11:21:06 test systemd[1]: Stopped target Basic System.
Nov 29 11:21:06 test systemd[1]: Stopped target Sockets.
Nov 29 11:21:06 test systemd[1]: Stopped target Slices.
Nov 29 11:21:06 test systemd[1]: Removed slice User and Session Slice.
Nov 29 11:21:06 test systemd[1]: Stopped target Paths.
Nov 29 11:21:06 test systemd[1]: Stopped Watch for changes in CA certificates.
Nov 29 11:21:06 test systemd[1]: Closed D-Bus System Message Bus Socket.
Nov 29 11:21:06 test systemd[1]: Stopped target System Initialization.
Nov 29 11:21:06 test systemd[1]: Stopping Update UTMP about System Boot/Shutdown...
Nov 29 11:21:06 test systemd[1]: Stopping Load/Save Random Seed...
Nov 29 11:21:06 test systemd[1]: Stopped target Local Encrypted Volumes.
Nov 29 11:21:06 test systemd[1]: Stopped Dispatch Password Requests to Console Directory Watch.
Nov 29 11:21:06 test systemd[1]: Stopped target Swap.
Nov 29 11:21:06 test systemd[1]: Stopped Apply Kernel Variables.
Nov 29 11:21:06 test systemd[1]: Stopped Load Kernel Modules.
Nov 29 11:21:06 test systemd[1]: Stopped Update UTMP about System Boot/Shutdown.
Nov 29 11:21:06 test systemd[1]: Stopped Create Volatile Files and Directories.
Nov 29 11:21:06 test systemd[1]: Stopped target Local File Systems.
Nov 29 11:21:06 test systemd[1]: Stopped target Local File Systems (Pre).
Nov 29 11:21:06 test systemd[1]: Stopped Create Static Device Nodes in /dev.
Nov 29 11:21:06 test systemd[1]: Stopping Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Nov 29 11:21:06 test systemd[1]: Stopped Load/Save Random Seed.
Nov 29 11:21:06 test systemd[1]: Stopped Remount Root and Kernel File Systems.
Nov 29 11:21:06 test systemd[1]: Stopped File System Check on Root Device.
Nov 29 11:21:06 test systemd[1]: Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Nov 29 11:21:06 test systemd[1]: Reached target Shutdown.
Nov 29 11:21:06 test systemd[1]: Reached target Final Step.
Nov 29 11:21:06 test systemd[1]: Starting Reboot...
Nov 29 11:21:06 test systemd[1]: Shutting down.
Nov 29 11:21:06 test systemd-shutdown: Syncing filesystems and block devices.
Nov 29 11:21:06 test systemd-shutdown: Sending SIGTERM to remaining processes...
Nov 29 11:21:06 test systemd-journald[27065]: Journal stopped

@Damenly
Copy link

Damenly commented Dec 14, 2021

I saw the issue on my Archlinux too. And It caused my machine to hang on reboot.
The only suspicious I found in my log is "pam_systemd(sshd:session): Failed to release session: Interrupted".
So I changed my /etc/ssh/sshd_config to "UsePAM no". The issue disappears.

@mpeter50
Copy link

mpeter50 commented Jan 1, 2022

I have also encountered this error message on a Debian system. The reason I looked is because it took unexpectedly long to shut down, but that might have been caused by something else.

Though it seems to me a fix has been merged for version 250, and my system is currently on 247 (based on systemctl --version).
@Damenly it might be the same for you too, hopefully 250 will fix it.

fbuihuu added a commit to fbuihuu/systemd that referenced this issue Nov 16, 2022
bus_ppoll() relies on ppoll(2) which can return EINTR when interrupted by a
signal. bus_ppoll() should be prepared to handle this case properly by
restarting the system call.

Hence let's make it call safe_ppoll_usec() instead since it does that
automatically for us.

This should address one of the cases that lead to the following error, which
might occur when shutting down the system:

 login[1489]: pam_systemd(login:session): Failed to release session: Interrupted system call

IOW this patch closes one possible root cause of issue systemd#18817.
fbuihuu added a commit to fbuihuu/systemd that referenced this issue Nov 21, 2022
bus_ppoll() relies on ppoll(2) which can return EINTR when interrupted by a
signal. bus_ppoll() should be prepared to handle this case properly by
restarting the system call.

Hence let's make it call safe_ppoll_usec() instead since it does that
automatically for us.

This should address one of the cases that lead to the following error, which
might occur when shutting down the system:

 login[1489]: pam_systemd(login:session): Failed to release session: Interrupted system call

IOW this patch closes another possible root cause of issue systemd#18817.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
login needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer
Development

No branches or pull requests

5 participants