signal_receive.gen test fails on sparc #21

Closed
Whissi opened this issue Jan 2, 2018 · 11 comments


Whissi commented Jan 2, 2018

A Gentoo sparc user (@DerDakon) is reporting the following reproducible test failure with strace-4.20:

FAIL: signal_receive.gen
========================

--- exp	2018-01-01 22:35:11.070789051 +0100
+++ log	2018-01-01 22:35:11.070789051 +0100
@@ -13,7 +13,7 @@
 kill(3066, SIGEMT) = 0
 --- SIGEMT {si_signo=SIGEMT, si_code=SI_USER, si_pid=3066, si_uid=250} ---
 kill(3066, SIGFPE) = 0
---- SIGFPE {si_signo=SIGFPE, si_code=SI_USER, si_pid=3066, si_uid=250} ---
+--- SIGFPE {si_signo=SIGFPE, si_code=SI_USER, si_pid=250, si_uid=0} ---
 kill(3066, SIGBUS) = 0
 --- SIGBUS {si_signo=SIGBUS, si_code=SI_USER, si_pid=3066, si_uid=250} ---
 kill(3066, SIGSEGV) = 0
signal_receive.gen.test: failed test: ../../strace -a16 -e trace=kill ../signal_receive output mismatch

His system's details:

Portage 2.3.13 (python 3.5.4-final-0, default/linux/sparc/17.0, gcc-6.4.0, glibc-2.25-r9, 4.14.8-gentoo-r1 sparc64)
=================================================================
                         System Settings
=================================================================
System uname: Linux-4.14.8-gentoo-r1-sparc64-sun4v-with-gentoo-2.3
KiB Mem:    33133552 total,  19838904 free
KiB Swap:          0 total,         0 free
Timestamp of repository gentoo: Mon, 01 Jan 2018 01:15:01 +0000
Head commit of repository gentoo: 09c9b588c22ad79bf62481b1c30a40419f0429b7
sh bash 4.3_p48-r1
ld GNU ld (Gentoo 2.29.1 p3) 2.29.1
app-shells/bash:          4.3_p48-r1::gentoo
dev-lang/perl:            5.24.3::gentoo
dev-lang/python:          2.7.14-r1::gentoo, 3.4.5::gentoo, 3.5.4-r1::gentoo
dev-util/cmake:           3.9.6::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.3::gentoo
sys-apps/openrc:          0.34.11::gentoo
sys-apps/sandbox:         2.10-r4::gentoo
sys-devel/autoconf:       2.69::gentoo
sys-devel/automake:       1.15.1-r1::gentoo
sys-devel/binutils:       2.29.1-r1::gentoo
sys-devel/gcc:            6.4.0::gentoo
sys-devel/gcc-config:     1.8-r1::gentoo
sys-devel/libtool:        2.4.6-r3::gentoo
sys-devel/make:           4.2.1::gentoo
sys-kernel/linux-headers: 4.13::gentoo (virtual/os-headers)
sys-libs/glibc:           2.25-r9::gentoo

Build.log: https://bugs.gentoo.org/attachment.cgi?id=512660

Bug: https://bugs.gentoo.org/643060

Member

ldv-alt commented Jan 2, 2018

What is this mysterious process with pid==250 and uid==0, and why does it send signals instead of the process itself? This must be a problem in the operating system.


DerDakon commented Jan 2, 2018

I suspect that either the kernel fills the struct with wrong information, or strace misinterprets the struct layout. Any idea where to look?

Author

Whissi commented Jan 2, 2018

UID=250 is the user ("portage") building strace and running the test suite. So this could be an indicator that some struct information went bad...

Member

ldv-alt commented Jan 3, 2018

It looks like you are building strace for 32-bit sparc. The last time I built strace this way was for v4.16; the box where I could test this configuration (bender.sparc.dev.gentoo.org) is no longer accessible.


DerDakon commented Jan 3, 2018

Send me patches ;)

Member

ldv-alt commented Jan 3, 2018

$ cat sigfpe.c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Print the siginfo fields the kernel reported for the delivered signal. */
static void
handler(int sig, siginfo_t *info, void *ucontext)
{
	fprintf(stderr, "sig=%d, si_signo=%d, si_pid=%d, si_uid=%d\n",
		sig, info->si_signo, info->si_pid, info->si_uid);
}

/* Install the SA_SIGINFO handler for sig and add sig to the unblock mask. */
static void
setup(int sig, sigset_t *mask)
{
	static const struct sigaction act = {
		.sa_sigaction = handler,
		.sa_flags = SA_SIGINFO
	};

	assert(!sigaction(sig, &act, NULL));
	assert(!sigaddset(mask, sig));
}

/* Send sig to ourselves twice: first via raise(), then via kill(). */
static void
test(int sig)
{
	assert(!raise(sig));
	assert(!kill(getpid(), sig));
}

int
main(void)
{
	sigset_t mask;

	assert(!sigemptyset(&mask));

	setup(SIGFPE, &mask);
	setup(SIGTERM, &mask);

	assert(!sigprocmask(SIG_UNBLOCK, &mask, NULL));

	test(SIGFPE);
	test(SIGTERM);

	return 0;
}

$ gcc -m32 -O2 sigfpe.c && ./a.out
sig=8, si_signo=8, si_pid=12345, si_uid=1000
sig=8, si_signo=8, si_pid=1000, si_uid=0
sig=15, si_signo=15, si_pid=12345, si_uid=1000
sig=15, si_signo=15, si_pid=12345, si_uid=1000

Could you get these buggy sparc kernels fixed, please?

Contributor

jrtc27 commented Jan 3, 2018

jrtc27@deb4g:~/tmp/sigfpe$ gcc -O2 sigfpe.c -o sigfpe64
jrtc27@deb4g:~/tmp/sigfpe$ ./sigfpe64
sig=8, si_signo=8, si_pid=135186, si_uid=1008
sig=8, si_signo=8, si_pid=135186, si_uid=1008
sig=15, si_signo=15, si_pid=135186, si_uid=1008
sig=15, si_signo=15, si_pid=135186, si_uid=1008
jrtc27@deb4g:~/tmp/sigfpe$ gcc -m32 -O2 sigfpe.c -o sigfpe32
jrtc27@deb4g:~/tmp/sigfpe$ ./sigfpe32
sig=8, si_signo=8, si_pid=135192, si_uid=1008
sig=8, si_signo=8, si_pid=1008, si_uid=0
sig=15, si_signo=15, si_pid=135192, si_uid=1008
sig=15, si_signo=15, si_pid=135192, si_uid=1008

Member

ldv-alt commented Jan 3, 2018

Yes, sparc64 doesn't suffer from this; only sparc does.


DerDakon commented Jan 3, 2018

Has this test been working before or is this a new test?

Member

ldv-alt commented Jan 3, 2018

The test used to pass on bender.sparc.dev.gentoo.org, "uname -a" used to print the following:
Linux bender 4.5.0 #4 SMP Thu Mar 24 18:28:58 UTC 2016 sparc64 sun4v UltraSparc T1 (Niagara) GNU/Linux


DerDakon commented Jan 3, 2018

ldv-alt added a commit that referenced this issue Apr 11, 2018
Recent kernel siginfo changes, namely, v4.14-rc1~60^2^2~1 and
v4.16-rc1~159^2~10, introduced ABI regressions that render
the whole siginfo interface unreliable.

Looks like the kernel side is not eager to fix the breakage,
so here is a workaround.

* tests/signal_receive.c (s_sig, s_code, s_pid, s_uid): New volatile
variables.
(handler): Add siginfo_t parameter, save siginfo_t fields.
(sig_print): Remove.
(main): Rewrite.  Use variables saved by handler to print expected
siginfo output. Print diagnostics in case of siginfo mismatch.

Closes: #21
ldv-alt added a commit that referenced this issue Apr 12, 2018
Recent kernel siginfo changes, namely, v4.14-rc1~60^2^2~1 and
v4.16-rc1~159^2~10, introduced ABI regressions that render
the whole siginfo interface unreliable.

Looks like the kernel side is not eager to fix the breakage,
so here is a workaround.

* tests/signal_receive.c (s_sig, s_code, s_pid, s_uid): New volatile
variables.
(handler): Add siginfo_t parameter, save siginfo_t fields.
(sig_print): Remove.
(main): Rewrite.  Use variables saved by handler to print expected
siginfo output. Print diagnostics in case of siginfo mismatch.
* strace.spec.in (%check): Extract the diagnostics.

Closes: #21
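
For anyone reading the commit message without the diff at hand, the idea behind the workaround can be sketched roughly as follows. This is only an illustration and not the actual tests/signal_receive.c: the variable names s_sig, s_code, s_pid and s_uid come from the commit message, while the helper self_signal_matches() is hypothetical and exists only for this example. The handler stores what the kernel reported, and the caller then compares it with what it expected, so a kernel-side siginfo mismatch can be reported as a diagnostic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Fields captured by the handler; the names follow the commit message. */
static volatile int s_sig, s_code, s_pid, s_uid;

/* Save the siginfo_t fields instead of printing them from the handler. */
static void
handler(int sig, siginfo_t *info, void *ucontext)
{
	(void) sig;
	(void) ucontext;
	s_sig = info->si_signo;
	s_code = info->si_code;
	s_pid = info->si_pid;
	s_uid = info->si_uid;
}

/*
 * Hypothetical helper (not part of strace): send sig to the current
 * process and report whether the siginfo seen by the handler names
 * this process as the sender.  Returns 1 on a match, 0 on a mismatch.
 */
static int
self_signal_matches(int sig)
{
	struct sigaction act = {
		.sa_sigaction = handler,
		.sa_flags = SA_SIGINFO
	};
	sigset_t mask;

	/* Reset the captured fields so stale values cannot match. */
	s_sig = -1;
	s_code = -1;
	s_pid = -1;
	s_uid = -1;

	assert(!sigaction(sig, &act, NULL));
	assert(!sigemptyset(&mask));
	assert(!sigaddset(&mask, sig));
	assert(!sigprocmask(SIG_UNBLOCK, &mask, NULL));

	/*
	 * Assumes a single-threaded caller: the signal is unblocked and
	 * sent to ourselves, so the handler runs before kill() returns.
	 */
	assert(!kill(getpid(), sig));

	return s_sig == sig
	    && s_code == SI_USER
	    && s_pid == (int) getpid()
	    && s_uid == (int) getuid();
}
```

Calling self_signal_matches(SIGFPE) from a 32-bit binary on an affected sparc kernel would be expected to return 0, given the output shown earlier in this thread; on unaffected systems it should return 1.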
avagin pushed a commit to avagin/linux that referenced this issue Apr 25, 2018
Starting with commit v4.14-rc1~60^2^2~1, a SIGFPE signal sent via kill
results to wrong values in si_pid and si_uid fields of compat siginfo_t.

This happens due to FPE_FIXME being defined to 0 for sparc, and at the
same time siginfo_layout() introduced by the same commit returns
SIL_FAULT for SIGFPE if si_code == SI_USER and FPE_FIXME is defined to 0.

Fix this regression by removing FPE_FIXME macro and changing all its users
to assign FPE_FLTUNK to si_code instead of FPE_FIXME.

Note that FPE_FLTUNK is a new macro introduced by commit
266da65.

Tested with commit v4.16-11958-g16e205cf42da.

This bug was found by strace test suite.

In the discussion about FPE_FLTUNK on sparc David Miller said:
> Eric, feel free to do something similar on Sparc.

Link: strace/strace#21
Fixes: cc73152 ("signal: Remove kernel interal si_code magic")
Fixes: 2.3.41
Cc: David Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Conceptually-Acked-By: David Miller <davem@davemloft.net>
Thanks-to: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
avagin pushed a commit to avagin/linux that referenced this issue Jun 6, 2018
Starting with commit v4.14-rc1~60^2^2~1, a SIGFPE signal sent via kill
results to wrong values in si_pid and si_uid fields of compat siginfo_t.

This happens due to FPE_FIXME being defined to 0 for sparc, and at the
same time siginfo_layout() introduced by the same commit returns
SIL_FAULT for SIGFPE if si_code == SI_USER and FPE_FIXME is defined to 0.

Fix this regression by removing FPE_FIXME macro and changing all its users
to assign FPE_FLTUNK to si_code instead of FPE_FIXME.

Note that FPE_FLTUNK is a new macro introduced by commit
266da65.

Tested with commit v4.16-11958-g16e205cf42da.

This bug was found by strace test suite.

Link: strace/strace#21
Fixes: cc73152 ("signal: Remove kernel interal si_code magic")
Thanks-to: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>