rr doesn't work on newer libcs #2428

Closed
emilio opened this issue Jan 13, 2020 · 11 comments

@emilio
Contributor

emilio commented Jan 13, 2020

Recording any program that uses something like clock_getres fails with newer glibc.

Simplest example is $ ./bin/rr ./bin/clock
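The source of ./bin/clock is not shown in this thread. As a minimal sketch, assuming the test does little more than call clock_getres, a reproducer could look like the following (the function name `query_clock_resolution` is hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: query the resolution of one clock via glibc's
 * clock_getres wrapper, which is the call that crashes under rr here. */
static int query_clock_resolution(clockid_t id, struct timespec *res) {
  assert(res != NULL);
  if (clock_getres(id, res) != 0) {
    perror("clock_getres");
    return -1;
  }
  printf("resolution: %lld.%09ld s\n", (long long)res->tv_sec, res->tv_nsec);
  return 0;
}
```

Calling `query_clock_resolution(CLOCK_MONOTONIC, &ts)` from `main` and recording the binary with rr should be enough to exercise the same glibc code path, assuming the glibc in use routes clock_getres through the vDSO.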

I noticed this syscall is unsupported, and thought it'd be a matter of just supporting it... But apparently the code that's segfaulting is the INLINE_VSYSCALL macro inside glibc.

The libc code we're hitting looks like this:

  int
  __clock_getres64 (clockid_t clock_id, struct __timespec64 *res)
  {
  #ifdef __ASSUME_TIME64_SYSCALLS
    /* 64 bit ABIs or Newer 32-bit ABIs that only support 64-bit time_t.  */
  # ifndef __NR_clock_getres_time64
  #  define __NR_clock_getres_time64 __NR_clock_getres
  # endif
  # ifdef HAVE_CLOCK_GETRES64_VSYSCALL
>   return INLINE_VSYSCALL (clock_getres_time64, 2, clock_id, res);

I don't know how we deal with vsyscalls, but I'm happy to give fixing this a shot with some pointers.

@khuey
Collaborator

khuey commented Jan 13, 2020

Can you disassemble exactly what we're crashing on (by attaching the emergency debugger to the recorded session if necessary)?

@emilio
Contributor Author

emilio commented Jan 13, 2020

This is what I see from the replay:

(rr) disas
Dump of assembler code for function __clock_getres:
   0x00007fd3645ac6b0 <+0>:	endbr64 
   0x00007fd3645ac6b4 <+4>:	push   %r12
   0x00007fd3645ac6b6 <+6>:	mov    %rsi,%r12
   0x00007fd3645ac6b9 <+9>:	push   %rbp
   0x00007fd3645ac6ba <+10>:	mov    %edi,%ebp
   0x00007fd3645ac6bc <+12>:	sub    $0x8,%rsp
   0x00007fd3645ac6c0 <+16>:	mov    0xf97a1(%rip),%rax        # 0x7fd3646a5e68
   0x00007fd3645ac6c7 <+23>:	mov    0x1c0(%rax),%rax
   0x00007fd3645ac6ce <+30>:	test   %rax,%rax
   0x00007fd3645ac6d1 <+33>:	je     0x7fd3645ac708 <__clock_getres+88>
   0x00007fd3645ac6d3 <+35>:	callq  *%rax
=> 0x00007fd3645ac6d5 <+37>:	movslq %eax,%rdx
   0x00007fd3645ac6d8 <+40>:	cmp    $0xfffffffffffff000,%rdx
   0x00007fd3645ac6df <+47>:	jbe    0x7fd3645ac6fc <__clock_getres+76>
   0x00007fd3645ac6e1 <+49>:	cmp    $0xffffffffffffffda,%rdx
   0x00007fd3645ac6e5 <+53>:	je     0x7fd3645ac708 <__clock_getres+88>
   0x00007fd3645ac6e7 <+55>:	mov    0xf978a(%rip),%rax        # 0x7fd3646a5e78
   0x00007fd3645ac6ee <+62>:	neg    %edx
   0x00007fd3645ac6f0 <+64>:	mov    %edx,%fs:(%rax)
   0x00007fd3645ac6f3 <+67>:	mov    $0xffffffff,%eax
   0x00007fd3645ac6f8 <+72>:	endbr64 
   0x00007fd3645ac6fc <+76>:	add    $0x8,%rsp
   0x00007fd3645ac700 <+80>:	pop    %rbp
   0x00007fd3645ac701 <+81>:	pop    %r12
   0x00007fd3645ac703 <+83>:	retq   
   0x00007fd3645ac704 <+84>:	nopl   0x0(%rax)
   0x00007fd3645ac708 <+88>:	mov    %r12,%rsi
   0x00007fd3645ac70b <+91>:	mov    %ebp,%edi
   0x00007fd3645ac70d <+93>:	mov    $0xe5,%eax
   0x00007fd3645ac712 <+98>:	syscall 
   0x00007fd3645ac714 <+100>:	mov    %rax,%rdx
   0x00007fd3645ac717 <+103>:	cmp    $0xfffffffffffff000,%rax
   0x00007fd3645ac71d <+109>:	ja     0x7fd3645ac6e7 <__clock_getres+55>
   0x00007fd3645ac71f <+111>:	add    $0x8,%rsp
   0x00007fd3645ac723 <+115>:	pop    %rbp
   0x00007fd3645ac724 <+116>:	pop    %r12
   0x00007fd3645ac726 <+118>:	retq
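Read as C, the disassembly above is roughly the following: load a function pointer, call it if it is non-null, fall back to the raw syscall (0xe5 is 229, clock_getres on x86-64) when the pointer is null or the call returns -ENOSYS (0xffffffffffffffda is -38), and otherwise translate a negative result into errno. This is only a sketch of that control flow, not glibc's actual source; `clock_getres_via` and `vdso_stub_enosys` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

typedef int (*vdso_clock_getres_fn)(clockid_t, struct timespec *);

/* Hypothetical stub that behaves like a vDSO entry point which does not
 * implement the call; it forces the fallback path below. */
static int vdso_stub_enosys(clockid_t id, struct timespec *res) {
  (void)id;
  (void)res;
  return -ENOSYS;
}

/* Sketch of the vDSO-then-syscall pattern seen in the disassembly. */
static int clock_getres_via(vdso_clock_getres_fn vdso_fn, clockid_t id,
                            struct timespec *res) {
  assert(res != NULL);
  if (vdso_fn != NULL) {
    long ret = vdso_fn(id, res);        /* callq *%rax */
    if (ret != -ENOSYS) {
      if (ret < 0 && ret >= -4095) {    /* kernel-style error code */
        errno = (int)-ret;
        return -1;
      }
      return (int)ret;
    }
  }
  /* Null pointer or -ENOSYS: issue the real syscall. The libc syscall()
   * wrapper sets errno and returns -1 on failure. */
  return (int)syscall(SYS_clock_getres, id, res);
}
```

In the crashing run the pointer was non-null, so execution took the `callq *%rax` path rather than the `syscall` instruction at +98.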

I don't see any emergency debugger instructions.

Somewhat surprisingly, running it under gdb ($ gdb --args ./bin/rr ./bin/clock) passes...

@khuey
Collaborator

khuey commented Jan 13, 2020

Oh, we're dying during replay? What is rax?

@emilio
Contributor Author

emilio commented Jan 13, 2020

During record too:

$ ./bin/rr ./bin/clock
rr: Saving execution to trace directory `/home/emilio/.local/share/rr/clock-36'.
[1]    1602664 segmentation fault  ./bin/rr ./bin/clock

But it seems it's the recorded process that dies... rax is 1:

(rr) info registers
rax            0x1                 1
rbx            0x401640            4200000
rcx            0x1                 1
rdx            0x7ffee55b7f58      140732746399576
rsi            0x7ffee55b7e38      140732746399288
rdi            0x1                 1
rbp            0x1                 0x1
rsp            0x7ffee55b7dc0      0x7ffee55b7dc0
r8             0x0                 0
r9             0x7f7f93a108e0      140185914378464
r10            0x7f7f93a1e430      140185914434608
r11            0x7f7f938726b0      140185912682160
r12            0x7ffee55b7e38      140732746399288
r13            0x7ffee55b7f40      140732746399552
r14            0x0                 0
r15            0x0                 0
rip            0x7f7f938726d5      0x7f7f938726d5 <__clock_getres+37>
eflags         0x10293             [ CF AF SF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fs_base        0x7f7f937a9180      0x7f7f937a9180
gs_base        0x0                 0x0

@rocallahan
Collaborator

What version of glibc is this?

Can you pack that trace and put it somewhere I can get it?

@emilio
Contributor Author

emilio commented Jan 13, 2020

This is glibc 2.30.9000. Trace is here.

@rocallahan
Collaborator

@emilio Try this patch:

diff --git a/src/Monkeypatcher.cc b/src/Monkeypatcher.cc
index 154beda59..22fa6b003 100644
--- a/src/Monkeypatcher.cc
+++ b/src/Monkeypatcher.cc
@@ -732,7 +732,7 @@ void patch_after_exec_arch<X64Arch>(RecordTask* t, Monkeypatcher& patcher) {
 
   static const named_syscall syscalls_to_monkeypatch[] = {
 #define S(n) { "__vdso_" #n, X64Arch::n }
-    S(clock_gettime), S(gettimeofday), S(time), S(getcpu),
+    S(clock_gettime), S(clock_getres), S(gettimeofday), S(time), S(getcpu),
 #undef S
   };
 

@rocallahan
Collaborator

Well, that patch is pretty clearly the right thing to do given modern VDSOs support clock_getres (added in torvalds/linux@f66501d, June last year). I guess glibc just started using it. So I pushed that patch as e4e0f51.

@rocallahan
Collaborator

(Of course, reopen if that patch fails to get things working for you.)

@emilio
Contributor Author

emilio commented Jan 14, 2020

Yeah, that wfm, thanks! Should that also be in the x86 syscalls_to_monkeypatch?

@rocallahan
Collaborator

Yes, good catch. Done: bf625d2

wiedzmin referenced this issue in wiedzmin/nixos-config Nov 23, 2020
Using unstable `rr` because there are no new stable releases.
We were hit by `https://github.com/rr-debugger/rr/issues/2428`
in production, but it is only fixed in the master branch.