strace hangs on ia64 (fresh kernel) #33

Closed
trofi opened this Issue Jan 30, 2018 · 5 comments

trofi commented Jan 30, 2018

This is the upstream counterpart of a downstream bug: https://bugs.gentoo.org/518130

I recently upgraded the kernel on an ia64 machine
from 3.14.14-gentoo (GCC: Gentoo 4.9.3 p1.2, pie-0.6.3)
to 4.9.72-gentoo (GCC: Gentoo 6.4.0-r1 p1.3)

and strace started hanging there. It might be a state decoding problem in strace or a bug in the kernel.

slyfox@guppy 0:0 ~ $ strace -d ls
strace: ptrace_setoptions = 0x51
strace: new tcb for pid 10345, active tcbs:1
strace: [wait(0x80137f) = 10345] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
strace: pid 10345 has TCB_STARTUP, initializing it
strace: [wait(0x80057f) = 10345] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
strace: [wait(0x00127f) = 10345] WIFSTOPPED,sig=SIGCONT
strace: [wait(0x00857f) = 10345] WIFSTOPPED,sig=133
????
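For readers less familiar with the -d output, the ptrace_setoptions = 0x51 value in the first line can be decoded with the PTRACE_O_* constants from <sys/ptrace.h>. The following is only a minimal sketch: the helper name format_ptrace_options and the option_names table are hypothetical and are not taken from strace.

```c
#include <string.h>
#include <sys/ptrace.h>

/* PTRACE_O_* bits as defined by <sys/ptrace.h>, in ascending bit order. */
const struct {
	unsigned long bit;
	const char *name;
} option_names[] = {
	{ PTRACE_O_TRACESYSGOOD,   "PTRACE_O_TRACESYSGOOD" },
	{ PTRACE_O_TRACEFORK,      "PTRACE_O_TRACEFORK" },
	{ PTRACE_O_TRACEVFORK,     "PTRACE_O_TRACEVFORK" },
	{ PTRACE_O_TRACECLONE,     "PTRACE_O_TRACECLONE" },
	{ PTRACE_O_TRACEEXEC,      "PTRACE_O_TRACEEXEC" },
	{ PTRACE_O_TRACEVFORKDONE, "PTRACE_O_TRACEVFORKDONE" },
	{ PTRACE_O_TRACEEXIT,      "PTRACE_O_TRACEEXIT" },
};

/*
 * Hypothetical helper (not part of strace): write a '|'-separated list of
 * the known options set in 'opts' into 'buf' and return how many were found.
 * Bits that are not in the table above are ignored.
 */
int format_ptrace_options(unsigned long opts, char *buf, size_t len)
{
	size_t i;
	int count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < sizeof(option_names) / sizeof(option_names[0]); i++) {
		if (!(opts & option_names[i].bit))
			continue;
		/* strncat always terminates; the bound leaves room for the NUL. */
		if (count++ > 0)
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, option_names[i].name, len - strlen(buf) - 1);
	}
	return count;
}
```

Calling format_ptrace_options(0x51, buf, sizeof(buf)) with a sufficiently large buffer names three options, PTRACE_O_TRACESYSGOOD among them, which is the option behind the 0x80 flag discussed in this thread.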

I can reproduce the strace hang in ski (an ia64 emulator, running a current upstream kernel). After adding a few printk() statements to the kernel, I found the place where the 133 code comes from: http://elixir.free-electrons.com/linux/latest/source/include/linux/tracehook.h#L66

static inline int ptrace_report_syscall(struct pt_regs *regs)
{
	int ptrace = current->ptrace;

	if (!(ptrace & PT_PTRACED))
		return 0;

	ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));
...

This looks like the normal mode of operation, but strace does not seem to recognize it and prints a raw sig=133 message. It is also possible that the kernel is being miscompiled by the new compiler, since gdb hangs the same way.

strace on x86_64 does not decode that sig=133 either.

ldv-alt commented Jan 31, 2018

I've added some debug printing to get_regs:
$ ./strace -d /
./strace: ptrace_setoptions = 0x51
./strace: new tcb for pid 11080, active tcbs:1
./strace: [wait(0x80137f) = 11080] WIFSTOPPED,sig=SIGSTOP,EVENT_STOP (128)
./strace: pid 11080 has TCB_STARTUP, initializing it
./strace: [wait(0x80057f) = 11080] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
./strace: [wait(0x00127f) = 11080] WIFSTOPPED,sig=SIGCONT
./strace: [wait(0x00857f) = 11080] WIFSTOPPED,sig=133
./strace: get_regs: get_regs_error: Input/output error
????

It looks like ptrace(PTRACE_GETREGS) always fails with EIO on this new kernel.

ldv-alt commented Jan 31, 2018

133 == SIGTRAP | 0x80. strace doesn't decode it in -d output, most probably because the main audience of strace -d is familiar with it.
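The wait() values in the -d output above can be checked by hand with the standard <sys/wait.h> macros. This is a small sketch under stated assumptions: the names SYSCALL_STOP_FLAG, wait_status_event and is_syscall_stop are made up for illustration and do not come from strace, and the event number is assumed to sit in the bits above bit 15 of the status, as the EVENT_STOP (128) lines suggest.

```c
#include <signal.h>
#include <sys/wait.h>

/* Flag OR-ed into SIGTRAP on syscall stops when PT_TRACESYSGOOD is set. */
#define SYSCALL_STOP_FLAG 0x80

/* Ptrace event number carried in bits 16 and up of a wait status (0 if none). */
unsigned int wait_status_event(int status)
{
	return (unsigned int)status >> 16;
}

/* Nonzero if 'status' is a plain syscall stop, i.e. the sig=133 case. */
int is_syscall_stop(int status)
{
	return WIFSTOPPED(status)
		&& wait_status_event(status) == 0
		&& WSTOPSIG(status) == (SIGTRAP | SYSCALL_STOP_FLAG);
}
```

With these helpers, 0x00857f is classified as a syscall stop, while 0x80137f and 0x80057f carry event number 128 and are therefore not.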

esyr added the kernel bug label Jan 31, 2018

trofi commented Jan 31, 2018

That's very helpful! Thank you!

I'm now looking at the kernel side of the unwinder (used to read registers).

fengguang added a commit to 0day-ci/linux that referenced this issue Feb 5, 2018

ia64: fix ptrace(PTRACE_GETREGS) (unbreaks strace, gdb)
The strace breakage looks like this:
./strace: get_regs: get_regs_error: Input/output error

It happens because ia64 needs to load unwind tables
to read certain registers. Unwind tables fail to load
due to GCC quirk on the following code:

    extern char __end_unwind[];
    const struct unw_table_entry *end = (struct unw_table_entry *)table_end;
    table->end = segment_base + end[-1].end_offset;

GCC does not generate correct code for this single memory
reference after constant propagation (see https://gcc.gnu.org/PR84184).
Two triggers are required for bad code generation:
- '__end_unwind' has lower alignment (char) than
  'struct unw_table_entry' (8).
- symbol offset is negative.

This commit works around it by fixing alignment of '__end_unwind'.
While at it use hidden symbols to generate shorter gp-relative
relocations.

CC: Tony Luck <tony.luck@intel.com>
CC: Fenghua Yu <fenghua.yu@intel.com>
CC: linux-ia64@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Bug: strace/strace#33
Bug: https://gcc.gnu.org/PR84184
Reported-by: Émeric Maschino <emeric.maschino@gmail.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
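The access pattern from the commit message, reading the last table entry through a one-past-the-end pointer, can be illustrated in user space. This is a rough sketch only: struct unw_entry, demo_table and table_end_address are hypothetical stand-ins, it is not the kernel patch, and it does not reproduce the GCC problem. Here the end pointer already has the struct type, so the compiler sees the struct's alignment rather than that of char.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct unw_table_entry. */
struct unw_entry {
	uint64_t start_offset;
	uint64_t end_offset;
	uint64_t info_offset;
};

/* Demo data; in the kernel the table boundaries come from linker symbols. */
const struct unw_entry demo_table[] = {
	{ 0x00, 0x40, 0x100 },
	{ 0x40, 0x90, 0x120 },
};

/*
 * 'table_end' points one past the last entry, so table_end[-1] is the last
 * entry. This is the negative-offset access the commit message mentions.
 */
uint64_t table_end_address(uint64_t segment_base,
			   const struct unw_entry *table_end)
{
	return segment_base + table_end[-1].end_offset;
}
```

For the demo table, table_end_address(0x1000, demo_table + 2) adds the end_offset of the second entry to the segment base.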

trofi commented Feb 5, 2018

Confirmed: it was a ptrace(PTRACE_GETREGS) bug. With 0day-ci/linux@97d0c59 applied, the ia64 machine can run strace without problems. Thanks again!

ldv-alt commented Feb 8, 2018

The kernel seems to be fixed now, with
Linux guppy 4.9.72-gentoo #3 SMP Mon Feb 5 23:10:14 UTC 2018 ia64 Dual-Core Intel(R) Itanium(R) Processor 9040 GenuineIntel GNU/Linux
strace passes tests again, thanks!

ldv-alt closed this Feb 8, 2018

esyr-rh pushed a commit to esyr-rh/strace that referenced this issue Feb 11, 2018

Add ability to specify syscalls for specific personalities
* syscall.c (personality_designators): New array.
* defs.h (personality_designators): New declaration.
* basic_filters.c (qualify_syscall_separate_personality,
qualify_syscall_number_personality): New function.
(qualify_syscall_number): Use qualify_syscall_separate_personality for
checking for a personality specification, call
qualify_syscall_number_personality for setting number set for specific
personality.
(qualify_syscall_name_personality): New function.
(qualify_syscall_name): Use qualify_syscall_separate_personality for
checking for a personality specification, call
qualify_syscall_name_personality for setting number set for specific
personality.
* strace.1.in (.SS Filtering): Document it.
* NEWS: Mention it.

Closes: strace#33