Useradd segmentation fault #628

Open
kaizushi opened this issue Jan 18, 2023 · 15 comments

@kaizushi

kaizushi commented Jan 18, 2023

Operating system: Gentoo
Kernel version: Linux 6.1.2
GCC version: Gentoo Hardened 12.2.1_p20221231 p8
Shadow version: 4.13-r1

This is a production system and a workaround would be appreciated. I suspect my kernel is too new and one of its hardening features is getting in the way, so I am rebuilding an older kernel for now. My kernel configuration might be an issue, as I started from 'make tinyconfig' to build a minimal kernel for KVM/qemu virtio with pretty much every optional security feature enabled.

I discussed my useradd issues in the #gentoo channel on Libera IRC, and the person helping me asked me to file this bug report. They walked me through the gdb debugging session below...

GDB output...

# gdb --args useradd -m -G users test1298
GNU gdb (Gentoo 12.1 vanilla) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from useradd...
Reading symbols from /usr/lib/debug//usr/sbin/useradd.debug...
(gdb) run
Starting program: /usr/sbin/useradd -m -G users test1298
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00006441cfa84319 in __sgr_dup (sgent=0x1000) at sgroupio.c:36
36      sgroupio.c: No such file or directory.
(gdb) bt
#0  0x00006441cfa84319 in __sgr_dup (sgent=0x1000) at sgroupio.c:36
#1  0x00006441cfa85c29 in commonio_open (db=db@entry=0x6441cfa92ca0 <gshadow_db>, mode=2, mode@entry=66)
    at commonio.c:694
#2  0x00006441cfa846ce in sgr_open (mode=mode@entry=66) at sgroupio.c:246
#3  0x00006441cfa7856e in open_group_files () at useradd.c:1856
#4  0x00006441cfa790fd in get_groups (list=0x7ffff616ef6d "users") at useradd.c:766
#5  process_flags (argc=argc@entry=5, argv=argv@entry=0x7ffff616ddb8) at useradd.c:1354
#6  0x00006441cfa73e46 in main (argc=5, argv=0x7ffff616ddb8) at useradd.c:2499
(gdb) p *sgent
Cannot access memory at address 0x1000
@anthonyryan1

I'm also seeing this in 4.13 but not in 4.12.3.

I've seen the same segfault in useradd and gpasswd. Will update this issue as I investigate.

@thesamesam
Contributor

> I'm also seeing this in 4.13 but not in 4.12.3.

Could you bisect?

@anthonyryan1

anthonyryan1 commented Feb 20, 2023

I'll aim to have a bisect ready tomorrow. What's interesting is that it seems to be tied to the existing system users/groups.

When I run the same command on a couple hundred machines, each with its own set of users and groups, a handful segfault every time while most succeed.

I'm going to try and figure out what's special about the group file on the affected machines. One theory I'm still considering is that this may be some sort of subtle malformation in /etc generated by a much older version of shadow tools.

I'm also using Gentoo, and the group files on some of the machines I'm testing have been created and modified over more than a decade. I'm going to run pwck and grpck on a few of the affected machines to try to rule that out.

@kaizushi
Author

After opening this I found a workaround which is to run version 4.12.3.

@anthonyryan1

I did successfully bisect this, although it held some extra surprises.

The culprit commit is not in the release: building 4.13 from git failed to reproduce the bug. Instead, the Gentoo developers backported a commit that has not yet made it into a stable release, and that commit is the culprit.

The problematic commit is a281f24 (#595).

And here you can see the Gentoo developers backporting it:
https://github.com/gentoo/gentoo/blob/master/sys-apps/shadow/files/shadow-4.13-configure-clang16.patch

I'm going to CC @fweimer-rh here as the author of that commit to see if he has any insight into what might be going on.

@fweimer-rh
Contributor

I backported this patch into Fedora rawhide as well, but cannot reproduce the problem there. sgent=0x1000 is certainly suspicious, but I don't know where that could be coming from.

@anthonyryan1

After recompiling with -O0 so that nothing is optimized out, I get the following backtrace from GDB:

#0  0x00007ffff79e3e86 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/12/libasan.so.8
#1  0x00007ffff797e25e in strdup () from /usr/lib/gcc/x86_64-pc-linux-gnu/12/libasan.so.8
#2  0x000055555559d9ed in __sgr_dup (sgent=0x5555555c1908) at sgroupio.c:62
#3  0x000055555559f32a in gshadow_dup (ent=0x5555555c1908) at sgroupio.c:116
#4  0x00005555555a58f7 in commonio_open (db=0x5555555e1620 <gshadow_db>, mode=0) at commonio.c:694
#5  0x00005555555a0318 in sgr_open (mode=0) at sgroupio.c:246
#6  0x000055555558f3b0 in open_files () at grpck.c:294
#7  0x0000555555593b44 in main (argc=2, argv=0x7fffffffdfc8) at grpck.c:835

The code looks correct to me, in that gshadow_dup is being run with the eptr of gshadow_parse, which is just a wrapper around glibc's sgetsgent. It doesn't seem likely, but is it possible we're getting faulty pointers out of sgetsgent?

That would fit the bug: with proper detection of sgetsgent, we've switched from the implementation in lib/gshadow.c to the one in glibc.

I've tested against glibc 2.36 and glibc 2.37 (possibly including distro backports), and still get segfaults in both after the fixed sgetsgent detection.
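To see why bad pointers from sgetsgent would surface as a crash inside strdup called from __sgr_dup (as in the -O0 backtrace above), here is a minimal deep copy of struct sgrp. It is a hypothetical sketch in the spirit of __sgr_dup, not the shadow-utils code; the names sgrp_copy, sgrp_free, dup_list and free_list are made up for illustration. Every string field and every entry of sg_adm and sg_mem is read by strdup, so if the struct it receives holds pointers that were never filled in, the copy faults inside strdup rather than inside glibc's parser.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* strdup() */
#endif
#include <gshadow.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers loosely modelled on the idea behind shadow's
 * __sgr_dup. This is NOT the shadow-utils implementation. */

static void free_list(char **list)
{
    if (list == NULL)
        return;
    for (size_t i = 0; list[i] != NULL; i++)
        free(list[i]);
    free(list);
}

/* Copy a NULL-terminated string array. Each strdup() reads through
 * list[i], so a garbage pointer here crashes inside strdup(). */
static char **dup_list(char *const *list)
{
    size_t n = 0;
    while (list[n] != NULL)
        n++;

    char **copy = calloc(n + 1, sizeof *copy); /* zeroed: copy[n] is NULL */
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        copy[i] = strdup(list[i]);
        if (copy[i] == NULL) {
            free_list(copy); /* frees entries 0..i-1, stops at NULL */
            return NULL;
        }
    }
    return copy;
}

/* Release everything allocated by sgrp_copy(). */
void sgrp_free(struct sgrp *sg)
{
    if (sg == NULL)
        return;
    free(sg->sg_namp);
    free(sg->sg_passwd);
    free_list(sg->sg_adm);
    free_list(sg->sg_mem);
    free(sg);
}

/* Deep-copy an entry such as one returned by sgetsgent().
 * Returns NULL if any allocation fails. */
struct sgrp *sgrp_copy(const struct sgrp *src)
{
    struct sgrp *dst = calloc(1, sizeof *dst);
    if (dst == NULL)
        return NULL;
    dst->sg_namp = strdup(src->sg_namp);
    dst->sg_passwd = strdup(src->sg_passwd);
    dst->sg_adm = dup_list(src->sg_adm);
    dst->sg_mem = dup_list(src->sg_mem);
    if (dst->sg_namp == NULL || dst->sg_passwd == NULL ||
        dst->sg_adm == NULL || dst->sg_mem == NULL) {
        sgrp_free(dst);
        return NULL;
    }
    return dst;
}
```

Nothing in the copy can defend against invalid input pointers; the fix has to be in whatever produced the struct.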

@fweimer-rh
Contributor

fweimer-rh commented Feb 20, 2023

Well, we used to have a glibc bug in this area:

But this should be fixed in current glibc. The fix went into glibc 2.32 and has been backported widely, too.

But looking at sgetsgent_r and sgetsgent, it seems the ERANGE protocol is not implemented correctly within glibc. Do you have a long line in /etc/gshadow?
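For context on the ERANGE protocol: reentrant functions such as sgetsgent_r parse into a caller-supplied buffer and return ERANGE when that buffer is too small, in which case the result must not be used and the caller is expected to retry with a larger buffer. The non-reentrant sgetsgent performs this retry internally with its own buffer, so if that retry logic mishandles ERANGE, a caller could end up with an entry that was never filled in, which would be consistent with the garbage pointers in the backtraces. Below is a minimal caller-side sketch of the protocol, assuming glibc's `<gshadow.h>` declaration of sgetsgent_r; parse_gshadow_line is a hypothetical helper (not glibc or shadow-utils code), and the 1024-byte starting size and doubling strategy are arbitrary choices.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* sgetsgent_r() in glibc's <gshadow.h> */
#endif
#include <errno.h>
#include <gshadow.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one gshadow line with sgetsgent_r, following
 * the ERANGE protocol. On success returns 0; the strings in *ent point into
 * a heap buffer handed to the caller via *bufp (free it when done).
 * On failure returns an errno value, zeroes *ent and sets *bufp to NULL. */
int parse_gshadow_line(const char *line, struct sgrp *ent, char **bufp)
{
    size_t size = 1024; /* arbitrary starting size */
    char *buf = NULL;

    *bufp = NULL;
    for (;;) {
        char *tmp = realloc(buf, size);
        if (tmp == NULL) {
            free(buf);
            memset(ent, 0, sizeof *ent);
            return ENOMEM;
        }
        buf = tmp;

        struct sgrp *result = NULL;
        errno = 0;
        int err = sgetsgent_r(line, ent, buf, size, &result);
        if (err == ERANGE) {
            /* Buffer too small: *ent is not valid, so do not use it.
             * Retry with a bigger buffer. */
            size *= 2;
            continue;
        }
        if (err != 0 || result == NULL) {
            free(buf);
            memset(ent, 0, sizeof *ent);
            return err != 0 ? err : EINVAL; /* e.g. a malformed line */
        }
        *bufp = buf;
        return 0;
    }
}
```

With this starting size, any line of 1024 bytes or more needs at least one retry, because the line plus its terminating NUL no longer fits in the initial buffer.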

Could you check that the sgetsgent in glibc is actually called, and not the version from shadow-utils?

@thesamesam
Contributor

thesamesam commented Feb 20, 2023

@anthonyryan1 Just want to add that there's nothing really unusual about the backporting part (it's not a controversial patch, and distros, including us, do this all the time when required). I can't reproduce the bug, and there's a good reason for backporting all of this work. But I won't distract from the debugging effort here; if you want to discuss that side of it more, feel free to email me at sam@g.o.

@fweimer-rh
Contributor

I fixed the glibc bug:

Of course I don't know if that's the problem you are seeing, @anthonyryan1.

@anthonyryan1

@fweimer-rh I do have one very long line in gshadow, over 1200 bytes.

I expect line length will explain which machines segfault and which do not. I'll explore this a bit more later today.
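To check a given machine for such an entry, a small sketch can report the longest line in a file like /etc/gshadow and its length in bytes. longest_line is a hypothetical helper, not part of shadow-utils, and reading /etc/gshadow normally requires root.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* getline(), mkstemp(), fdopen() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: find the longest line in the file at `path`.
 * On success returns 0 and stores the 1-based line number and the line's
 * length in bytes (excluding the trailing newline). Returns -1 if the file
 * cannot be opened or a read error occurs. */
int longest_line(const char *path, size_t *lineno_out, size_t *len_out)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char *line = NULL;
    size_t cap = 0;
    ssize_t n;
    size_t lineno = 0;

    *lineno_out = 0;
    *len_out = 0;
    while ((n = getline(&line, &cap, fp)) != -1) {
        lineno++;
        size_t len = (size_t)n;
        if (len > 0 && line[len - 1] == '\n')
            len--; /* do not count the newline */
        if (len > *len_out) {
            *len_out = len;
            *lineno_out = lineno;
        }
    }

    int rc = ferror(fp) ? -1 : 0;
    free(line);
    fclose(fp);
    return rc;
}

/* Usage (as root):
 *
 *     size_t ln, len;
 *     if (longest_line("/etc/gshadow", &ln, &len) == 0)
 *         printf("line %zu is %zu bytes\n", ln, len);
 */
```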

Additionally, I agree that the patch itself doesn't appear to be the problem. Rather, it seems to have exposed a different bug that was previously masked because the alternate code path was being used.

@kaizushi
Author

Interesting that length comes into play: in the command I used for this bug report, the line for the users group has a lot of entries and is 515 characters long.

@hannob

hannob commented Feb 21, 2023

I also ran into this bug, but with grpck (on Gentoo). A very simple reproducer for me is to run grpck on an empty group file and a gshadow file containing a single 1024-byte line (e.g. just "a"s). Reproducer:

touch 1
for x in $(seq 1 1024); do echo -n a; done > 2
grpck 1 2

I can confirm that @fweimer-rh's glibc patch fixes this issue.
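When checking a patched glibc against this reproducer, it helps to distinguish a crash from an ordinary error exit. The sketch below uses a hypothetical helper, run_and_get_signal, that runs a command via fork/execvp and reports the signal that terminated it, if any. The usage comment passes -r (grpck's read-only mode), so that a fixed grpck answers "no" to any repair prompts instead of waiting for input.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (looked up in PATH) with argv and report
 * how it ended. Returns the terminating signal number if the child was
 * killed by a signal, 0 if it exited normally (with any exit status), and
 * -1 if fork() or waitpid() failed. A failed exec appears as a normal exit
 * with status 127. */
int run_and_get_signal(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Usage with the reproducer files "1" and "2":
 *
 *     char *args[] = { "grpck", "-r", "1", "2", NULL };
 *     if (run_and_get_signal(args) == SIGSEGV)
 *         puts("grpck crashed with SIGSEGV");
 */
```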

@anthonyryan1

I can also confirm the glibc patch is working.

It looks like glibc merged the patch from the mailing list yesterday: https://sourceware.org/git/?p=glibc.git;a=commit;h=969e9733c7d17edf1e239a73fa172f357561f440

I think we're good to close this issue. The bug wasn't in shadow-utils, and the fix is now in glibc master. Anyone who still hits this combination will be able to find the necessary information here in the closed issue.

@thesamesam
Contributor

It'd be worth waiting for it to trickle down to glibc's stable branches first, especially if a new shadow release ends up being made in the interim.


5 participants