Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression in diversion sessions for function calls from gdb since 5.4.0, affecting Chromium #2829

Closed
dbaron opened this issue Mar 29, 2021 · 14 comments

Comments

@dbaron
Copy link

dbaron commented Mar 29, 2021

@khuey just landed 4 patches (35d477f, 0bc6138, db2dcd1, e679d6a) to fix Chromium issue 1193532 (which I probably should have filed here rather than discussing by email, at some point after asking "is this an rr bug").

However, testing with rr trunk revealed that there's a distinct regression with the same steps to reproduce in that bug.

The new failure is this:

For what it's worth, the new problem is this:

(rr) p &object
$1 = (const blink::LayoutObject *) 0x25780060c010
(rr) call showTree($1)
[1539974:1:0100/000000.481715:FATAL:unix_domain_socket.cc(278)] Check failed: false.

Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
base::debug::(anonymous namespace)::DebugBreak ()
    at ../../base/debug/debugger_posix.cc:321
321 }
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on".
Evaluation of the expression containing the function
(showTree(blink::LayoutObject const*)) will be abandoned.
When the function is done executing, GDB will silently stop.
(rr) bt 40
#0  base::debug::(anonymous namespace)::DebugBreak() ()
    at ../../base/debug/debugger_posix.cc:321
#1  0x00007f8459498eb5 in base::debug::BreakDebugger() ()
    at ../../base/debug/debugger_posix.cc:344
#2  0x00007f8459261435 in logging::LogMessage::~LogMessage()
    (this=0x2fe0993577a0) at ../../base/logging.cc:888
#3  0x00007f8459261569 in logging::LogMessage::~LogMessage()
    (this=0x2fe0993577a0) at ../../base/logging.cc:559
#4  0x00007f84591d1c7b in logging::CheckError::~CheckError()
    (this=0x7ffe9600c9b8) at ../../base/check.cc:106
#5  0x00007f84594ba7f2 in base::UnixDomainSocket::SendRecvMsgWithFlags(int, unsigned char*, unsigned int, int, int*, base::Pickle const&)
    (fd=4, reply=0x7ffe9600cb70 "\020", max_reply_len=512, recvmsg_flags=0, result_fd=0x0, request=...) at ../../base/posix/unix_domain_socket.cc:278
#6  0x00007f84594ba573 in base::UnixDomainSocket::SendRecvMsg(int, unsigned char*, unsigned int, int*, base::Pickle const&)
    (fd=4, reply=0x7ffe9600cb70 "\020", max_reply_len=512, result_fd=0x0, request=...) at ../../base/posix/unix_domain_socket.cc:236
#7  0x00007f84336ebf9f in sandbox::(anonymous namespace)::ProxyLocaltimeCallToBrowser(long, tm*, char*, unsigned long)
    (input=1617058425, output=0x7ffe9600ce88, timezone_out=0x0, timezone_out_len=0) at ../../sandbox/linux/services/libc_interceptor.cc:130
#8  0x00007f84336ec254 in localtime_r
    (timep=0x7ffe9600cec0, result=0x7ffe9600ce88)
    at ../../sandbox/linux/services/libc_interceptor.cc:279
#9  0x00007f845926096f in logging::LogMessage::Init(char const*, int)
    (this=0x7ffe9600d0e8, file=0x7f84430eb693 "../../third_party/blink/renderer/core/dom/node.cc", line=3343) at ../../base/logging.cc:939
#10 0x00007f845926078f in logging::LogMessage::LogMessage(char const*, int, int) (this=0x7ffe9600d0e8, file=0x7f84430eb693 "../../third_party/blink/renderer/core/dom/node.cc", line=3343, severity=0) at ../../base/logging.cc:550
#11 0x00007f8444f055b0 in showTree(blink::Node const*) (node=0x381809101a78)
    at ../../third_party/blink/renderer/core/dom/node.cc:3343
#12 0x00007f8445d7563d in blink::LayoutObject::ShowTreeForThis() const
    (this=0x25780060c010)
    at ../../third_party/blink/renderer/core/layout/layout_object.cc:2014
#13 0x00007f8445d803d9 in showTree(blink::LayoutObject const*)
    (object=0x25780060c010)
    at ../../third_party/blink/renderer/core/layout/layout_object.cc:4791
#14 0x00007ffe9600d3df in <function called from gdb> ()
#15 blink::(anonymous namespace)::HasBlockingEventHandlerHelper(blink::LayoutObject const&, blink::(anonymous namespace)::BlockingEventHandlerType)
--Type <RET> for more, q to quit, c to continue without paging--
    (object=..., event_type=blink::(anonymous namespace)::BlockingEventHandlerType::kTouchStartOrMoveBlockingEventHandler)
    at ../../third_party/blink/renderer/core/paint/pre_paint_tree_walk.cc:350
#16 0x00007f84464a731a in blink::(anonymous namespace)::HasBlockingTouchEventHandler(blink::LayoutObject const&) (object=...)
    at ../../third_party/blink/renderer/core/paint/pre_paint_tree_walk.cc:370
#17 0x00007f84464a72c2 in blink::PrePaintTreeWalk::UpdateEffectiveAllowedTouchAction(blink::LayoutObject const&, blink::PrePaintTreeWalk::PrePaintTreeWalkContext&) (this=0x7ffe9600dc00, object=..., context=...)
    at ../../third_party/blink/renderer/core/paint/pre_paint_tree_walk.cc:389
#18 0x00007f84464a7f2e in blink::PrePaintTreeWalk::WalkInternal(blink::LayoutObject const&, blink::NGFragmentChildIterator const*, blink::PrePaintTreeWalk::PrePaintTreeWalkContext&) (this=0x7ffe9600dc00, object=..., iterator=
    0x0, context=...)
    at ../../third_party/blink/renderer/core/paint/pre_paint_tree_walk.cc:614
#19 0x00007f84464a6ee4 in blink::PrePaintTreeWalk::Walk(blink::LayoutObject const&, blink::NGFragmentChildIterator const*)
    (this=0x7ffe9600dc00, object=..., iterator=0x0)
    at ../../third_party/blink/renderer/core/paint/pre_paint_tree_walk.cc:994
#20 0x00007f84464a6a55 in blink::PrePaintTreeWalk::Walk(blink::LocalFrameView&)
    (this=0x7ffe9600dc00, frame_view=...)
    at ../../third_party/blink/renderer/core/paint/pre_paint_tree_walk.cc:306
#21 0x00007f84464a6172 in blink::PrePaintTreeWalk::WalkTree(blink::LocalFrameView&) (this=0x7ffe9600dc00, root_frame_view=...)
    at ../../third_party/blink/renderer/core/paint/pre_paint_tree_walk.cc:211
#22 0x00007f84453d6a2d in blink::LocalFrameView::RunPrePaintLifecyclePhase(blink::DocumentLifecycle::LifecycleState)
    (this=0x2981da4c56a0, target_state=blink::DocumentLifecycle::kPaintClean)
    at ../../third_party/blink/renderer/core/frame/local_frame_view.cc:2874
#23 0x00007f84453d59d4 in blink::LocalFrameView::UpdateLifecyclePhasesInternal(blink::DocumentLifecycle::LifecycleState)
    (this=0x2981da4c56a0, target_state=blink::DocumentLifecycle::kPaintClean)
    at ../../third_party/blink/renderer/core/frame/local_frame_view.cc:2607
#24 0x00007f84453d457d in blink::LocalFrameView::UpdateLifecyclePhases(blink::DocumentLifecycle::LifecycleState, blink::DocumentUpdateReason)
    (this=0x2981da4c56a0, target_state=blink::DocumentLifecycle::kPaintClean, reason=blink::DocumentUpdateReason::kSizeChange)
    at ../../third_party/blink/renderer/core/frame/local_frame_view.cc:2515
#25 0x00007f84453d3ee0 in blink::LocalFrameView::UpdateAllLifecyclePhases(blink::DocumentUpdateReason)
    (this=0x2981da4c56a0, reason=blink::DocumentUpdateReason::kSizeChange)
    at ../../third_party/blink/renderer/core/frame/local_frame_view.cc:2255
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(rr) f 5
#5  0x00007f84594ba7f2 in base::UnixDomainSocket::SendRecvMsgWithFlags (fd=4,
    reply=0x7ffe9600cb70 "\020", max_reply_len=512, recvmsg_flags=0,
    result_fd=0x0, request=...) at ../../base/posix/unix_domain_socket.cc:278
278    NOTREACHED();
(rr) bt 5
#0  base::debug::(anonymous namespace)::DebugBreak() ()
    at ../../base/debug/debugger_posix.cc:321
#1  0x00007f8459498eb5 in base::debug::BreakDebugger() ()
    at ../../base/debug/debugger_posix.cc:344
#2  0x00007f8459261435 in logging::LogMessage::~LogMessage()
    (this=0x2fe0993577a0) at ../../base/logging.cc:888
#3  0x00007f8459261569 in logging::LogMessage::~LogMessage()
    (this=0x2fe0993577a0) at ../../base/logging.cc:559
#4  0x00007f84591d1c7b in logging::CheckError::~CheckError()
    (this=0x7ffe9600c9b8) at ../../base/check.cc:106
(More stack frames follow...)
(rr) l
273    return -1;
274
275  // If we received more file descriptors than caller expected, then we treat
276  // that as an error.
277  if (recv_fds.size() > (result_fd != nullptr ? 1 : 0)) {
278    NOTREACHED();
279    return -1;
280  }
281
282  if (result_fd)
(rr)

git bisect shows that this is a regression from #2748:
[4f99ff0] Propagate SIGTRAP to client during diversion

@khuey
Copy link
Collaborator

khuey commented Mar 30, 2021

While this is perhaps a regression in the most literal sense, I think our post #2748 behavior is obviously more correct, and the real problem here is that logging anything in chromium (with timestamps enabled) triggers an IPC roundtrip (to remote localtime(3))

@rocallahan
Copy link
Collaborator

logging anything in chromium (with timestamps enabled) triggers an IPC roundtrip (to remote localtime(3))

This is a bit shocking, honestly.

@cbiesinger
Copy link

Maybe the chromium sandbox does not allow calls to localtime_r?

@rocallahan
Copy link
Collaborator

I'm not surprised it blocks calls to localtime. I am surprised that an IPC roundtrip per message is considered reasonable logging overhead.

@cbiesinger
Copy link

Since gettimeofday() is allowed, I assume it forwards localtime just for the timezone conversion. That's unfortunate.

But I wonder, since the actual IPC call surely didn't work even before that rr patch, how was this working before?

@khuey
Copy link
Collaborator

khuey commented Mar 30, 2021

Before #2748 the SIGTRAP DebugBreak() generates was ignored and execution just proceeded.

@cbiesinger
Copy link

cbiesinger commented Mar 30, 2021

So did the logging code here just use 0/some garbage number?

@khuey
Copy link
Collaborator

khuey commented Mar 30, 2021

@dbaron ^?

@dbaron
Copy link
Author

dbaron commented Mar 30, 2021

The output prior to the patch was:

(rr) call showTree($1)
[1659895:1:0100/000000.526028:FATAL:unix_domain_socket.cc(278)] Check failed: false. 
[1659895:1:0100/000000.524552:INFO:node.cc(3343)] 
*#document
	HTML
		HEAD
		BODY

now that I look more closely.

@dbaron
Copy link
Author

dbaron commented Mar 30, 2021

By the way, is the diversion session used to handle call / print commands from gdb only branching the state of the process being debugged and not the entire replay (i.e., not including the parent process that would need to respond to the IPC message)? Is that why this isn't expected to work? Or is there some other reason that I'm missing?

@rocallahan
Copy link
Collaborator

It does duplicate the entire process set, but other threads never run.

@rocallahan
Copy link
Collaborator

Even if we did run them, IPC wouldn't work because the file descriptors are not set up during replay.

@khuey
Copy link
Collaborator

khuey commented Mar 30, 2021

Right, the biggest obstacle is that the IPC pipe doesn't actually exist during the replay.

@dbaron
Copy link
Author

dbaron commented Mar 30, 2021

OK, sounds like we'll need to find a way around this on the Chromium side.

@dbaron dbaron closed this as completed Mar 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants