Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FATAL src/ReplaySession.cc:1181:check_ticks_consistency()] #3711

Closed
tysmith opened this issue Mar 13, 2024 · 30 comments
Closed

[FATAL src/ReplaySession.cc:1181:check_ticks_consistency()] #3711

tysmith opened this issue Mar 13, 2024 · 30 comments

Comments

@tysmith
Copy link

tysmith commented Mar 13, 2024

This is triggered on replay. Re-recording the same Firefox crash and replaying leads to the same failure.

[FATAL src/ReplaySession.cc:1181:check_ticks_consistency()]
 (task 1450809 (rec:1444446) at time 169218)
 -> Assertion `ticks_now == trace_ticks' failed to hold. ticks mismatch for 'SYSCALL: write'; expected 31488429, got 31488430
Tail of trace dump:
{
  real_time:222636.292945 global_time:169198, event:`SYSCALL: epoll_wait' (state:EXITING_SYSCALL) tid:1444446, ticks:31435157
rax:0x1 rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0x20 rsi:0x7feb00002030 rdi:0x13 rbp:0x7feb1df38960 rsp:0x7feb1d738df0 r8:0x0 r9:0xaaaaaaaaaaaaaaaa r10:0xffffffff r11:0x246 r12:0x7feb1df39640 r13:0x0 r14:0x7feb498947d0 r15:0x7ffce1612990 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xe8 fs_base:0x7feb1df39640 gs_base:0x0
  { tid:1444446, addr:0x7feb00002030, length:0xc }
}
{
  real_time:222636.293045 global_time:169199, event:`SIGNAL: SIGCHLD(async)' tid:1444446, ticks:31435157
rax:0x1 rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0x20 rsi:0x7feb00002030 rdi:0x13 rbp:0x7feb1df38960 rsp:0x7feb1d738df0 r8:0x0 r9:0xaaaaaaaaaaaaaaaa r10:0xffffffff r11:0x246 r12:0x7feb1df39640 r13:0x0 r14:0x7feb498947d0 r15:0x7ffce1612990 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xe8 fs_base:0x7feb1df39640 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x403ade0b6b3a76400000 st7:0x403ade0b6b3a76400012 ymm0:0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ymm1:0x7feb49e234d8 ymm2:0x7feb49e23cb8 ymm3:0x343436392e4544373145383746443432 ymm4:0x203a726e2071657320312e3140364139 ymm5:0x31453837464434323839314342203e3d ymm6:0x352e4136413445343133453335333942 ymm7:0x43312e44384236443843364637314237 ymm8:0x77f6f5c10000ca7c768f57940000ca7c ymm9:0x0 ymm10:0x0 ymm11:0x0 ymm12:0x0 ymm13:0x0 ymm14:0x0 ymm15:0x0
}
{
  real_time:222636.293085 global_time:169200, event:`SIGNAL_DELIVERY: SIGCHLD(async)' tid:1444446, ticks:31435157
rax:0x1 rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0x20 rsi:0x7feb00002030 rdi:0x13 rbp:0x7feb1df38960 rsp:0x7feb1d738df0 r8:0x0 r9:0xaaaaaaaaaaaaaaaa r10:0xffffffff r11:0x246 r12:0x7feb1df39640 r13:0x0 r14:0x7feb498947d0 r15:0x7ffce1612990 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xe8 fs_base:0x7feb1df39640 gs_base:0x0
  { tid:1444446, addr:0x7feb1d738df0, length:(nil) }
}
{
  real_time:222636.293161 global_time:169201, event:`SYSCALLBUF_FLUSH' tid:1444446, ticks:31435657
  { syscall:'clock_gettime', ret:0x0, size:0x20 }
  { syscall:'recvmsg', ret:0x0, size:0x398, desched:1 }
}
{
  real_time:222636.293166 global_time:169202, event:`SYSCALL: epoll_ctl' (state:ENTERING_SYSCALL) tid:1444446, ticks:31435657
rax:0xffffffffffffffda rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0xa8 rsi:0x2 rdi:0x13 rbp:0x7feb1df381d0 rsp:0x7feb1d738df0 r8:0x0 r9:0x1010 r10:0x7feb1df3828c r11:0x246 r12:0x7feb1df39640 r13:0x0 r14:0x7feb498947d0 r15:0x7ffce1612990 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xe9 fs_base:0x7feb1df39640 gs_base:0x0
}
{
  real_time:222636.293169 global_time:169203, event:`SYSCALLBUF_RESET' tid:1444446, ticks:31435657
}
{
  real_time:222636.293193 global_time:169204, event:`SYSCALL: epoll_ctl' (state:EXITING_SYSCALL) tid:1444446, ticks:31435657
rax:0x0 rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0xa8 rsi:0x2 rdi:0x13 rbp:0x7feb1df381d0 rsp:0x7feb1d738df0 r8:0x0 r9:0x1010 r10:0x7feb1df3828c r11:0x246 r12:0x7feb1df39640 r13:0x0 r14:0x7feb498947d0 r15:0x7ffce1612990 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xe9 fs_base:0x7feb1df39640 gs_base:0x0
}
{
  real_time:222636.293376 global_time:169205, event:`SYSCALLBUF_FLUSH' tid:1444446, ticks:31451224
  { syscall:'close', ret:0x0, size:0x10 }
  { syscall:'getpid', ret:0x160a54, size:0x10 }
}
{
  real_time:222636.293382 global_time:169206, event:`SYSCALL: write' (state:ENTERING_SYSCALL) tid:1444446, ticks:31451224
rax:0xffffffffffffffda rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0xc8 rsi:0x7feb1df35330 rdi:0x2 rbp:0x7feb1df35330 rsp:0x7feb1d738df0 r8:0x0 r9:0x7fffffff r10:0x0 r11:0x246 r12:0xc8 r13:0x7feb49a1b6a0 r14:0x7feb49a16a00 r15:0xc8 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x1 fs_base:0x7feb1df39640 gs_base:0x0
}
{
  real_time:222636.293384 global_time:169207, event:`SYSCALLBUF_RESET' tid:1444446, ticks:31451224
}
{
  real_time:222636.293417 global_time:169208, event:`SYSCALL: write' (state:EXITING_SYSCALL) tid:1444446, ticks:31451224
rax:0xc8 rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0xc8 rsi:0x7feb1df35330 rdi:0x2 rbp:0x7feb1df35330 rsp:0x7feb1d738df0 r8:0x0 r9:0x7fffffff r10:0x0 r11:0x246 r12:0xc8 r13:0x7feb49a1b6a0 r14:0x7feb49a16a00 r15:0xc8 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x1 fs_base:0x7feb1df39640 gs_base:0x0
}
{
  real_time:222636.293535 global_time:169209, event:`SYSCALLBUF_FLUSH' tid:1444446, ticks:31463554
  { syscall:'futex', ret:0x1, size:0x14 }
  { syscall:'getpid', ret:0x160a54, size:0x10 }
}
{
  real_time:222636.293540 global_time:169210, event:`SYSCALL: write' (state:ENTERING_SYSCALL) tid:1444446, ticks:31463554
rax:0xffffffffffffffda rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0xc8 rsi:0x7feb1df35330 rdi:0x2 rbp:0x7feb1df35330 rsp:0x7feb1d738df0 r8:0x0 r9:0x7fffffff r10:0x0 r11:0x246 r12:0xc8 r13:0x7feb49a1b6a0 r14:0x7feb49a16a00 r15:0xc8 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x1 fs_base:0x7feb1df39640 gs_base:0x0
}
{
  real_time:222636.293543 global_time:169211, event:`SYSCALLBUF_RESET' tid:1444446, ticks:31463554
}
{
  real_time:222636.293574 global_time:169212, event:`SYSCALL: write' (state:EXITING_SYSCALL) tid:1444446, ticks:31463554
rax:0xc8 rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0xc8 rsi:0x7feb1df35330 rdi:0x2 rbp:0x7feb1df35330 rsp:0x7feb1d738df0 r8:0x0 r9:0x7fffffff r10:0x0 r11:0x246 r12:0xc8 r13:0x7feb49a1b6a0 r14:0x7feb49a16a00 r15:0xc8 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x1 fs_base:0x7feb1df39640 gs_base:0x0
}
{
  real_time:222636.293674 global_time:169213, event:`SYSCALLBUF_FLUSH' tid:1444446, ticks:31475979
  { syscall:'futex', ret:0x1, size:0x14 }
  { syscall:'getpid', ret:0x160a54, size:0x10 }
}
{
  real_time:222636.293678 global_time:169214, event:`SYSCALL: write' (state:ENTERING_SYSCALL) tid:1444446, ticks:31475979
rax:0xffffffffffffffda rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0xc8 rsi:0x7feb1df35330 rdi:0x2 rbp:0x7feb1df35330 rsp:0x7feb1d738df0 r8:0x0 r9:0x7fffffff r10:0x0 r11:0x246 r12:0xc8 r13:0x7feb49a1b6a0 r14:0x7feb49a16a00 r15:0xc8 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x1 fs_base:0x7feb1df39640 gs_base:0x0
}
{
  real_time:222636.293681 global_time:169215, event:`SYSCALLBUF_RESET' tid:1444446, ticks:31475979
}
{
  real_time:222636.293704 global_time:169216, event:`SYSCALL: write' (state:EXITING_SYSCALL) tid:1444446, ticks:31475979
rax:0xc8 rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0xc8 rsi:0x7feb1df35330 rdi:0x2 rbp:0x7feb1df35330 rsp:0x7feb1d738df0 r8:0x0 r9:0x7fffffff r10:0x0 r11:0x246 r12:0xc8 r13:0x7feb49a1b6a0 r14:0x7feb49a16a00 r15:0xc8 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x1 fs_base:0x7feb1df39640 gs_base:0x0
}
{
  real_time:222636.293797 global_time:169217, event:`SYSCALLBUF_FLUSH' tid:1444446, ticks:31488429
  { syscall:'getpid', ret:0x160a54, size:0x10 }
}
{
  real_time:222636.293801 global_time:169218, event:`SYSCALL: write' (state:ENTERING_SYSCALL) tid:1444446, ticks:31488429
rax:0xffffffffffffffda rbx:0x7feb1d738fa0 rcx:0xffffffffffffffff rdx:0xc8 rsi:0x7feb1df35330 rdi:0x2 rbp:0x7feb1df35330 rsp:0x7feb1d738df0 r8:0x0 r9:0x7fffffff r10:0x0 r11:0x246 r12:0xc8 r13:0x7feb49a1b6a0 r14:0x7feb49a16a00 r15:0xc8 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x1 fs_base:0x7feb1df39640 gs_base:0x0
}
{
  real_time:222636.293804 global_time:169219, event:`SYSCALLBUF_RESET' tid:1444446, ticks:31488429
}
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x60)[0x561c94e77d02]
rr(_ZN2rr15emergency_debugEPNS_4TaskE+0xf3)[0x561c94c76c95]
rr(+0x44e1ae)[0x561c94c7b1ae]
rr(_ZN2rr21EmergencyDebugOstreamD1Ev+0x63)[0x561c94c7b471]
rr(_ZN2rr13ReplaySession23check_ticks_consistencyEPNS_10ReplayTaskERKNS_5EventE+0x143)[0x561c94da245b]
rr(_ZN2rr13ReplaySession11replay_stepERKNS0_15StepConstraintsE+0x948)[0x561c94da7032]
rr(_ZN2rr13ReplaySession11replay_stepENS_10RunCommandE+0x4f)[0x561c94d9b043]
rr(+0x56c3b7)[0x561c94d993b7]
rr(+0x56cd6c)[0x561c94d99d6c]
rr(_ZN2rr13ReplayCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x52a)[0x561c94d9ad0c]
rr(main+0x278)[0x561c94c8f014]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f9b80429d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f9b80429e40]
rr(_start+0x25)[0x561c94b727c5]
=== End rr backtrace
@khuey
Copy link
Collaborator

khuey commented Mar 13, 2024

What CPU did this happen on?

If an AMD system, does it reproduce on Intel hardware?

@tysmith
Copy link
Author

tysmith commented Mar 13, 2024

I am using an Intel i7-8650U.

@khuey
Copy link
Collaborator

khuey commented Mar 13, 2024

Can you get a recording with the --intel-pt option that reproduces the replay failure?

https://github.com/rr-debugger/rr/wiki/Debugging-divergence#intel-pt

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

Running with --intel-pt leads to:

[FATAL src/PerfCounters.cc:768:open() errno: EPERM] Can't allocate memory for PT AUX area

How do I "Make sure that the "max locked memory" limit is very high..."? Could that be the issue with --intel-pt?

@khuey
Copy link
Collaborator

khuey commented Mar 14, 2024

You need to set ulimit -l really high (e.g. ulimit -l unlimited, though that may consume all your RAM).

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

I can't figure out how to do what is needed on Ubuntu.

ulimit -l unlimited returns
bash: ulimit: max locked memory: cannot modify limit: Operation not permitted

I tried modifying /etc/security/limits.conf and rebooting and that doesn't work either.

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

Got it. Typo in my /etc/security/limits.conf entry.

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

Same result: [FATAL src/PerfCounters.cc:768:open() errno: EPERM] Can't allocate memory for PT AUX area but it uses a lot more memory first.

@rocallahan
Copy link
Collaborator

rocallahan commented Mar 14, 2024 via email

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

Looks like Firefox does not support it.
Running Nightly as root in a regular user's session is not supported

I also tried --headless to hopefully same some memory but it still fails to launch,

@rocallahan
Copy link
Collaborator

rocallahan commented Mar 14, 2024 via email

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

I disabled a few extra browser processes and managed to get it to launch. After using up most of the available memory and hanging for a long period I got this:

[FATAL src/PerfCounters.cc:811:flush()] Unexpected AUX packet flags 1

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

And running rr record --intel-pt ls works quickly as expected.

@khuey
Copy link
Collaborator

khuey commented Mar 14, 2024

that's PERF_AUX_FLAG_TRUNCATED

@khuey
Copy link
Collaborator

khuey commented Mar 14, 2024

Can you share the steps to reproduce (privately, if necessary)?

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

I am unable to reproduce the issue independent of the tools we use in automation.

STR:

It can take a few minutes for the crash. The rr command is actually created in ffpuppet[1] so if that needs to change you can either modify the version installed by pip or pull down a copy from GitHub, modify that and do a local install of ffpuppet. Let me know if that works.

[1] https://github.com/MozillaSecurity/ffpuppet/blob/master/src/ffpuppet/core.py#L479

@khuey
Copy link
Collaborator

khuey commented Mar 14, 2024

Is that supposed to be http://general.fund instead of http://genral.fund?

@khuey
Copy link
Collaborator

khuey commented Mar 14, 2024

Assuming it's supposed to be http://general.fund, the bad news is that I can't reproduce the rr issue. The good news is that I can reproduce the Firefox issue. Do you want me to upload it to Pernosco?

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

Yep sorry that was a typo. I was able to capture it without the tools this morning and submit to Pernosco already, thanks.

@khuey
Copy link
Collaborator

khuey commented Mar 14, 2024

What rr rev are you recording with?

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

2bb38b9

@khuey
Copy link
Collaborator

khuey commented Mar 14, 2024

Ok, that's what I tried. I can't reproduce an rr divergence here unfortunately.

How much RAM does your machine have?

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

16GB

@khuey
Copy link
Collaborator

khuey commented Mar 14, 2024

You could try reducing PT_PERF_AUX_SIZE in src/PerfCounters.cc

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

OK I set PT_PERF_AUX_SIZE to 32*1024*1024 and now I get the following error: [FATAL src/PerfCounters.cc:811:flush()] Unexpected AUX packet flags 1

@khuey
Copy link
Collaborator

khuey commented Mar 14, 2024

Ok that's the same issue as before unfortunately.

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

I just rebuilt with clang 14 and can't reproduce.

@khuey
Copy link
Collaborator

khuey commented Mar 14, 2024

rebuilt rr?

@tysmith
Copy link
Author

tysmith commented Mar 14, 2024

Yes

@khuey
Copy link
Collaborator

khuey commented Mar 17, 2024

Alright, if even you can't reproduce it there's nothing actionable left here.

@khuey khuey closed this as not planned Won't fix, can't repro, duplicate, stale Mar 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants