Systemd-firstboot fails with "Failed to write /etc/hostname: Bad file descriptor" #29137

Closed
NekkoDroid opened this issue Sep 8, 2023 · 11 comments · Fixed by #29138
Labels
bug 🐛 Programming errors, that need preferential fixing firstboot regression ⚠️ A bug in something that used to work correctly and broke through some recent commit

@NekkoDroid

systemd version the issue has been seen with

254.2

Used distribution

Arch Linux

Linux kernel version used

6.4.7-arch1-1

CPU architectures issue was seen on

x86_64

Component

other

Expected behaviour you didn't see

Systemd-firstboot runs successfully without error writing to file

Unexpected behaviour you saw

Successfully runs up to entering the hostname, but fails to write hostname to /etc/hostname with "Bad file descriptor"

Steps to reproduce the problem

This is on a freshly installed & chroot'd system

  1. Run systemd-firstboot --prompt --reset
  2. Fill out steps up to entering hostname
  3. Enter hostname -> Systemd fails to write file

Additional program output to the terminal or log subsystem illustrating the issue

(not specifically logs)

I have manually downgraded systemd to 254.1 and run the command again and it seems to still work with that specific version.
@NekkoDroid NekkoDroid added the bug 🐛 Programming errors, that need preferential fixing label Sep 8, 2023
@poettering
Member

any chance you can reproduce this in strace? or at least get us logs of the precise output?

@NekkoDroid
Author

NekkoDroid commented Sep 8, 2023

I did reduce the command to just systemd-firstboot --hostname=<name> (error still happened even without strace)

https://gist.github.com/NekkoDroid/36824ba3e282172ebbbb932af0578082

The entire output of that command was just Failed to write /etc/hostname: Bad file descriptor

@poettering
Member

hmm, wtf:

getpid()                                = 554
…
openat(6, ".#hostname3b3853f81064cd8a", O_RDWR|O_CREAT|O_EXCL|O_NOCTTY|O_CLOEXEC, 0600) = 4
fcntl(4, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
newfstatat(4, "", {st_mode=S_IFREG|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
write(4, "NekkoLaptop\n", 12)           = 12
fsync(4)                                = 0
newfstatat(4, "", {st_mode=S_IFREG|0600, st_size=12, ...}, AT_EMPTY_PATH) = 0
readlinkat(AT_FDCWD, "/proc/554/fd/4", 0x563983652e10, 4096) = -1 ENOENT (No such file or directory)

we just operated on fd 4 but /proc/554/fd/4 suddenly doesn't exist anymore? how could that possibly be?
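For readers who want to try the same sequence outside of systemd, here is a minimal Python sketch of what the trace appears to do: create a temporary file, write and sync it, then look the descriptor up by PID under /proc. The function name, the temporary file name, and the `proc_root` parameter are illustrative assumptions, not anything taken from systemd's code.

```python
import errno
import os
import tempfile


def write_then_resolve(directory, data, proc_root="/proc"):
    """Write data to a new temp file, then resolve the fd via <proc_root>/<pid>/fd/<fd>.

    Returns (bytes_written, size_on_disk, resolved), where resolved is the
    symlink target on success or the errno name (e.g. "ENOENT") on failure.
    """
    # Hypothetical name; the trace shows a random ".#hostname…" suffix instead.
    tmp_path = os.path.join(directory, ".#hostname-sketch")
    flags = os.O_RDWR | os.O_CREAT | os.O_EXCL | os.O_NOCTTY | os.O_CLOEXEC
    fd = os.open(tmp_path, flags, 0o600)
    try:
        written = os.write(fd, data)
        os.fsync(fd)
        size = os.fstat(fd).st_size
        # Same kind of lookup as the failing readlinkat() in the trace.
        link = os.path.join(proc_root, str(os.getpid()), "fd", str(fd))
        try:
            resolved = os.readlink(link)
        except OSError as exc:
            resolved = errno.errorcode.get(exc.errno, str(exc.errno))
        return written, size, resolved
    finally:
        os.close(fd)
        os.unlink(tmp_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(write_then_resolve(scratch, b"NekkoLaptop\n"))
```

If the third element comes back as an errno name even though the write and fsync succeeded, the environment shows the same symptom as the trace above.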

My educated guess is that this is fall-out from 4419735, i.e. #28829.

But how come your version even has that commit? that's long post v255, and cosmetic only, so should never have been backported.

@poettering
Member

So this is probably because we set ProtectProc=invisible for hostnamed, which apparently means /proc/self/ still works, but /proc/$$ does not...
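One way to check whether the two lookups really behave differently in a given environment is to resolve the same descriptor through both paths. The sketch below does only that; it says nothing about whether ProtectProc= is the cause. `lookup_fd` and its `proc_root` parameter are hypothetical names introduced for illustration.

```python
import os


def lookup_fd(fd, proc_root="/proc"):
    """Resolve fd via <proc_root>/self/fd and via <proc_root>/<pid>/fd.

    Returns {"self": target_or_None, "pid": target_or_None}; None means the
    readlink failed for that path.
    """
    candidates = {"self": "self", "pid": str(os.getpid())}
    results = {}
    for label, component in candidates.items():
        link = os.path.join(proc_root, component, "fd", str(fd))
        try:
            results[label] = os.readlink(link)
        except OSError:
            results[label] = None
    return results
```

A result where "self" resolves and "pid" is None would match the behaviour described above.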

@mrc0mmand mrc0mmand added the regression ⚠️ A bug in something that used to work correctly and broke through some recent commit label Sep 8, 2023
@mrc0mmand mrc0mmand added this to the v255 milestone Sep 8, 2023
@YHNdnzj
Member

YHNdnzj commented Sep 8, 2023

> But how come your version even has that commit? that's long post v255, and cosmetic only, so should never have been backported.

This is even backported to v252-stable...

@NekkoDroid
Author

just out of curiosity: how come it is able to write the other files like vconsole and locale that are before it, but consistently only fails once the hostname is to be written?

@nafets227
Contributor

You may want to have a look at my comment on the Arch Linux bug tracker at https://bugs.archlinux.org/task/79619

In summary, it seems to be caused by arch-chroot, which creates a new PID namespace but leaves /proc mounted from the root namespace.
Remounting /proc inside the chroot worked around the issue for me.
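The mismatch described here can be detected from inside the chroot. On Linux, /proc/self is a symlink to the caller's PID as numbered by the PID namespace that procfs instance was mounted for, so if it disagrees with getpid(), the mounted /proc belongs to a different namespace. A rough sketch, with the function name and the `proc_root` parameter being assumptions made for illustration:

```python
import os


def proc_matches_pid_namespace(proc_root="/proc"):
    """Return True if <proc_root>/self points at this process's own PID.

    False means either the link could not be read or it names a different
    PID, which suggests /proc was mounted for another PID namespace.
    """
    try:
        target = os.readlink(os.path.join(proc_root, "self"))
    except OSError:
        return False
    return target == str(os.getpid())
```

If this returns False inside the chroot, lookups under /proc/<pid>/ using the process's own idea of its PID would be expected to miss, which is consistent with the ENOENT in the trace.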

If my analysis is correct, it's not a systemd issue but an Arch Linux issue.

@NekkoDroid
Author

It mostly makes sense to me.

What I am wondering now is what could be done so that, in theory, the commit would not have had to be reverted?

  • I assume having it not create a new PID namespace isn't ideal
  • I assume there are some things wanting /proc from the root namespace

Or am I overthinking it and this revert/initial patch doesn't have any major/minor benefit to it?

@nafets227
Contributor

IMHO the reverted commit is the right QUICK FIX.

Long term I would suggest that Arch Linux fixes arch-chroot to mount /proc from the chroot's PID namespace. Hopefully this does not break anything else, as you correctly stated.
It might also be an option to let arch-chroot decide by parameter whether to use a new PID namespace or not.

After that, the commit in systemd can be re-applied.

@quietvoid

quietvoid commented Sep 9, 2023

This morning my system failed to boot (using systemd-boot on Arch), and downgrading to 254.1 fixed it.
I don't know if it's related to this issue, just thought I'd mention it. Haven't seen other complaints.

@gottaeat

gottaeat commented Sep 9, 2023

> Haven't seen other complaints.

well, as things stand now, unless you remount procfs within the arch-chroot, you cannot install arch till they either bump to 254.3 or make the arch-chroot behavior match. i emailed the systemd package maintainer and the arch bugzilla people have made enough noise to bump the issue to the top 3, hopefully something gets done soon.
