BUG: test failures on 32-bit x86 in 36-sim-ipc_syscalls and 37-sim-ipc_syscalls_be #166

Closed
andreasbaumann opened this issue Aug 2, 2019 · 26 comments

Comments

@andreasbaumann

andreasbaumann commented Aug 2, 2019

Happens with kernel 5.2.x on Intel 32-bit (Archlinux32):

Test 36-sim-ipc_syscalls%%001-00001 result:   ERROR 36-sim-ipc_syscalls rc=14
Test 37-sim-ipc_syscalls_be%%011-00001 result:   ERROR 37-sim-ipc_syscalls_be rc=14

Adding printf to 36-sim-ipc_syscalls.c:

        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(semget), 0);
        printf( "semget: %d\n", rc );

shows:

semget: -14

Same applies for:

./37-sim-ipc_syscalls_be

yielding:

semget: -14
@pcmoore pcmoore changed the title test failures on 32-bit x86 in 36-sim-ipc_syscalls and 37-sim-ipc_syscalls_be BUG: test failures on 32-bit x86 in 36-sim-ipc_syscalls and 37-sim-ipc_syscalls_be Aug 5, 2019
@pcmoore pcmoore added this to the v2.4.2 milestone Aug 5, 2019
@pcmoore
Member

pcmoore commented Aug 5, 2019

Hmm, this is curious.

The -14 return value is -EFAULT which sees limited use in the general libseccomp code. Based on what you are reporting, it looks like the only reasonable location would be in arch_syscall_translate:

int arch_syscall_translate(const struct arch_def *arch, int *syscall)
{
        int sc_num;
        const char *sc_name;

        /* special handling for syscall -1 */
        if (*syscall == -1)
                return 0;

        if (arch->token != arch_def_native->token) {
                sc_name = arch_syscall_resolve_num(arch_def_native, *syscall);
                if (sc_name == NULL)
                        return -EFAULT;

                sc_num = arch_syscall_resolve_name(arch, sc_name);
                if (sc_num == __NR_SCMP_ERROR)
                        return -EFAULT;

                *syscall = sc_num;
        }

        return 0;
}

... and even then it should only trigger in the multilib case. @andreasbaumann you reported this as 32-bit x86, but is this on a 64-bit x86_64 system where you are compiling for 32-bit x86?

@andreasbaumanneit

andreasbaumanneit commented Aug 6, 2019

It's in a 32-bit chroot on a 64-bit system (Archlinux32 chroot on Archlinux).
I'll try to provide some more information as soon as I manage to run the test program in gdb. :-)

@andreasbaumanneit

Aha. So the tests run if I build the package on an i486 virtual machine.
So the library in the chroot sees the 64-bit kernel which is slightly different.
This means we have to test the package not in a 32-bit chroot but on the emulated machine.

@andreasbaumanneit

andreasbaumanneit commented Aug 6, 2019

Or there is a mismatch between the kernel headers/glibc in our 32-bit packages and the host kernel.

@pcmoore
Member

pcmoore commented Aug 6, 2019

Thanks for the clarification, this is starting to make sense. Yes, I believe the issue is rooted in the mismatch between the host's native ABI (64-bit x86_64) and the process (32-bit x86).

At the moment I don't have a 64-bit x86_64 system with a 32-bit x86 chroot for testing. @andreasbaumann, would you be able to instrument arch_syscall_translate() to check/display the return values from arch_syscall_resolve_num() and arch_syscall_resolve_name()? I suspect one of those functions is failing, which is causing the -EFAULT.

@andreasbaumann
Author

andreasbaumann commented Aug 6, 2019

I added:

        if (arch->token != arch_def_native->token) {
                sc_name = arch_syscall_resolve_num(arch_def_native, *syscall);
                printf( "arch_syscall_resolve_num: %s\n", sc_name );
                if (sc_name == NULL)
                        return -EFAULT;

                sc_num = arch_syscall_resolve_name(arch, sc_name);
                printf( "arch_syscall_resolve_name: %d\n", sc_num );
                if (sc_num == __NR_SCMP_ERROR)
                        return -EFAULT;

and:

        /* translate the syscall */
        rc = arch_syscall_translate(db->arch, &rule_dup->syscall);
        printf( "arch_syscall_translate rc: %d\n", rc );

and with test '36-sim-ipc_syscalls' I get:

arch_syscall_translate rc: 0
arch_syscall_resolve_num: semop
arch_syscall_resolve_name: 65
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semop
arch_syscall_resolve_name: 1073741889
arch_syscall_translate rc: 0
semop: 0
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semop
arch_syscall_resolve_name: 65
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semop
arch_syscall_resolve_name: 1073741889
arch_syscall_translate rc: 0
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semtimedop
arch_syscall_resolve_name: 220
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semtimedop
arch_syscall_resolve_name: 1073742044
arch_syscall_translate rc: 0
semtimedop: 0
arch_syscall_translate rc: 0
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semop
arch_syscall_resolve_name: 65
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semtimedop
arch_syscall_resolve_name: 220
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semop
arch_syscall_resolve_name: 1073741889
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semtimedop
arch_syscall_resolve_name: 1073742044
arch_syscall_translate rc: 0
arch_syscall_translate rc: 0
arch_syscall_resolve_num: (null)
arch_syscall_translate rc: -14
arch_syscall_resolve_num: (null)
arch_syscall_translate rc: -14
semget: -14

@michelmno

FYI, similar test failures occur on the ppc64le architecture, as reported in https://bugzilla.opensuse.org/show_bug.cgi?id=1142614; there, however, "semtimedop" reports rc -14 (not semget as in this issue).

@pcmoore
Member

pcmoore commented Aug 8, 2019

Looking at the output @andreasbaumann posted I see the following for semop:

arch_syscall_translate rc: 0
arch_syscall_resolve_num: semop
arch_syscall_resolve_name: 65
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semop
arch_syscall_resolve_name: 1073741889
arch_syscall_translate rc: 0

In this case we know we are talking about "36-sim-ipc_syscalls" so we can see that we have configured libseccomp to generate filters for x86, x86_64, and x32; with x86 being the "native" ABI.
The first ABI translation results in semop resolving to 65 which I'm guessing is for x86_64 (it is the second configured ABI, and the first non-native ABI). The second ABI translation results in semop resolving to 1073741889, which looks odd at first until one realizes the large syscall number is due to the "x32 bit" being set; if we look at the syscall number in the context of x32 we see it is the x32 variant of semop. Considering the code in "36-sim-ipc_syscalls" this looks correct.

We then see something similar with semtimedop. The duplicated semop translation is due to libseccomp's internal transaction code (a long and complicated "feature" that is necessary to recover from internal errors when adding rules; at some point we will expose it for use by callers). Everything still looks good.

Then we get to the semget failure:

arch_syscall_translate rc: 0
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semop
arch_syscall_resolve_name: 65
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semtimedop
arch_syscall_resolve_name: 220
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semop
arch_syscall_resolve_name: 1073741889
arch_syscall_translate rc: 0
arch_syscall_resolve_num: semtimedop
arch_syscall_resolve_name: 1073742044
arch_syscall_translate rc: 0
arch_syscall_translate rc: 0
arch_syscall_resolve_num: (null)
arch_syscall_translate rc: -14
arch_syscall_resolve_num: (null)
arch_syscall_translate rc: -14

... here it looks like we fail to resolve the semget name in arch_syscall_resolve_num() given the native arch of x86. I'm guessing @michelmno's reported problem with semtimedop and ppc64le is similar.

Looking over the libseccomp arch/ABI code quickly, everything would appear to be correct. However, looking at recent related kernel changes I see the following commit which first shipped in Linux v5.1:

commit 0d6040d4681735dfc47565de288525de405a5c99
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Mon Dec 31 14:38:26 2018 +0100

arch: add split IPC system calls where needed

The IPC system call handling is highly inconsistent across architectures,
some use sys_ipc, some use separate calls, and some use both.  We also
have some architectures that require passing IPC_64 in the flags, and
others that set it implicitly.

For the addition of a y2038 safe semtimedop() system call, I chose to only
support the separate entry points, but that requires first supporting
the regular ones with their own syscall numbers.

The IPC_64 is now implied by the new semctl/shmctl/msgctl system
calls even on the architectures that require passing it with the ipc()
multiplexer.

I'm not adding the new semtimedop() or semop() on 32-bit architectures,
those will get implemented using the new semtimedop_time64() version
that gets added along with the other time64 calls.
Three 64-bit architectures (powerpc, s390 and sparc) get semtimedop().

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>

@andreasbaumann you mentioned that you are seeing this problem on Linux v5.2 (@michelmno I don't see a kernel version in the bug report you posted), do you also see the problem on Linux v5.1? What about Linux v5.0 or earlier?

@andreasbaumann
Author

Yes, I tested on 5.2.

@pcmoore
Member

pcmoore commented Aug 8, 2019

> Yes, I tested on 5.2.

Yes, thanks. I was trying to understand if you saw similar problems on Linux v5.1 and v5.0 (or earlier).

@andreasbaumann
Author

I didn't test explicitly with 5.1 or 5.0. Let me dig into the archives to see which kernel versions
libseccomp has been compiled on in the past.

@andreasbaumann
Author

-rw-r--r-- 1 http http 72900 Jun 16 15:15 pool/libseccomp-2.4.1-2.1-i686.pkg.tar.xz
So that would be 5.1 most likely, and there I didn't have any errors in the tests.

@michelmno

The ppc64le tests on openSUSE were run with Linux kernel versions 5.1.16 and 5.2.2; the last available rpm built successfully on June 24th, which was around Linux version 5.1.7 (I do not have details of previous builds, except for an old one that succeeded with 4.12.14).

@pcmoore
Member

pcmoore commented Aug 8, 2019

Hmm. I would have expected builds based on Linux v5.1 to fail as well. Bummer.

It still may be worth testing this again once the syscall table is updated (see issue #163).

@andreasbaumann
Author

andreasbaumann commented Aug 9, 2019

I redid all building and testing in a virtual machine (libvirt/qemu/i686) and in 32-bit chroots on
64-bit kernels (swapping kernels on the host or in the virtual machine, respectively, between
5.2.6, 5.1.16 and 5.0.13). Here are the results with libseccomp 2.4.1:

5.0.x, VM:
Test 36-sim-ipc_syscalls%%025-00001 result:   FAILURE 36-sim-ipc_syscalls rc=14
Test 37-sim-ipc_syscalls_be%%013-00001 result:   FAILURE 37-sim-ipc_syscalls_be rc=14

5.1.x, VM:
Test 36-sim-ipc_syscalls%%025-00001 result:   FAILURE 36-sim-ipc_syscalls rc=14
Test 37-sim-ipc_syscalls_be%%013-00001 result:   FAILURE 37-sim-ipc_syscalls_be rc=14

5.2.x, VM:
Test 36-sim-ipc_syscalls%%025-00001 result:   FAILURE 36-sim-ipc_syscalls rc=14
Test 37-sim-ipc_syscalls_be%%013-00001 result:   FAILURE 37-sim-ipc_syscalls_be rc=14

5.0.x, chroot:
Test 36-sim-ipc_syscalls%%025-00001 result:   FAILURE 36-sim-ipc_syscalls rc=14
Test 37-sim-ipc_syscalls_be%%013-00001 result:   FAILURE 37-sim-ipc_syscalls_be rc=14

5.1.x, chroot:
Test 36-sim-ipc_syscalls%%025-00001 result:   FAILURE 36-sim-ipc_syscalls rc=14
Test 37-sim-ipc_syscalls_be%%013-00001 result:   FAILURE 37-sim-ipc_syscalls_be rc=14

5.2.x, chroot:
Test 36-sim-ipc_syscalls%%025-00001 result:   FAILURE 36-sim-ipc_syscalls rc=14
Test 37-sim-ipc_syscalls_be%%013-00001 result:   FAILURE 37-sim-ipc_syscalls_be rc=14

On the positive side, 5.1 also fails. :-)
On the negative side, I have to find the last version that actually worked.

Could it be that libseccomp 2.4.1 cannot work on 5.0, for instance, so that I have to use an
older version on that kernel?

@andreasbaumann
Author

I'm really starting to wonder whether we ever ran the tests on Archlinux32 before.

@pcmoore
Member

pcmoore commented Aug 12, 2019

I would be curious to hear if this was always broken.

@pcmoore
Member

pcmoore commented Oct 22, 2019

@andreasbaumann I'm not sure if you still have the ability to test things quickly, but I would be curious to hear if PR #176 fixes this for you.

@andreasbaumann
Author

I diffed the branch https://github.com/pcmoore/misc-libseccomp/tree/gh164 against
release 2.4.1 to get the patch (I hope this was the right thing to do).
I ran 64-bit and 32-bit builds on Archlinux32 in chroots on a 64-bit machine with kernel 5.3.7
(where the tests failed before in the 32-bit chroot). This looks good now, all tests passed. :-)

@pcmoore
Member

pcmoore commented Oct 22, 2019

Great, thanks for testing @andreasbaumann!

@pcmoore
Member

pcmoore commented Oct 22, 2019

I think we can mark this closed as soon as #164 and #163 are merged/closed.

@cpaelzer

Just FYI, this also affects other combinations (as expected): I see it failing on i386, ppc64le, and s390x.
No chroots, just building there as-is, but with newer kernels, or to be more specific, with linux-libc-dev built from that new kernel source.

I can confirm that line 343 of this gives me what I need for the case that I have debugged.

I agree that this can be closed once #163 and #164 are fixed through #176 being released.

As for the root cause analysis: IMHO this was more disruptive than other new syscalls because plenty of syscall numbers (existing, but not implemented per-arch) suddenly got defined by the kernel change to unify numbers, which makes them show up in e.g.:

/usr/include/s390x-linux-gnu/asm/unistd_64.h:336:#define __NR_semtimedop 392

That makes 392 and others needed for e.g. s390x in libseccomp's files, while IIRC not really being implemented there yet.

And that also matches kernel 5.2 as the breaking point that was identified, as that is when the change got merged.

@pcmoore
Member

pcmoore commented Oct 25, 2019

Thanks @cpaelzer for the verification, hopefully we'll get the updated syscall tables merged soon.

@pcmoore
Member

pcmoore commented Oct 31, 2019

FYI, the master branch now has an updated syscall table, with the release-2.4 branch expected to get it soon.

@drakenclimber
Member

The release-2.4 branch now has the updated syscall table as well.
https://github.com/seccomp/libseccomp/tree/release-2.4

@pcmoore
Member

pcmoore commented Nov 4, 2019

Thanks for the backport @drakenclimber. Since the updated syscall table should resolve this issue based on the discussion above, I'm going to close out this issue; feel free to reopen if the problem remains.

@pcmoore pcmoore closed this as completed Nov 4, 2019