
@fmela fmela commented May 4, 2013

This pull request demangles C++ names in stack traces, which makes them far more readable.

Example before:

Sat May  4 14:04:52.352   Assertion failure isABSONObj() src/mongo/db/../bson/bson-inl.h 183
0x10017116b 0x100135396 0x10001bbde 0x100001e21 0x7fff8d4af94a 0 0x10015f232 0x10015fdad 0x100160095 0x10007104d 0x1000a2581 0x1000627bf 0x10007e3eb 0x100127a85 0x10010f92f 0x3258efc4c4db
 0   mongo                               0x000000010017116b _ZN5mongo15printStackTraceERSo + 43
 1   mongo                               0x0000000100135396 _ZN5mongo12verifyFailedEPKcS1_j + 310
 2   mongo                               0x000000010001bbde _ZNK5mongo11shell_utils18ConnectionRegistry30killOperationsOnAllConnectionsEb + 1342
 3   mongo                               0x0000000100001e21 _Z10quitNicelyi + 145
 4   libsystem_c.dylib                   0x00007fff8d4af94a _sigtramp + 26
 5   ???                                 0x0000000000000000 0x0 + 0
 6   mongo                               0x000000010015f232 _ZN5mongo13MessagingPort4recvERNS_7MessageE + 130
 7   mongo                               0x000000010015fdad _ZN5mongo13MessagingPort4recvERKNS_7MessageERS1_ + 45
 8   mongo                               0x0000000100160095 _ZN5mongo13MessagingPort4callERNS_7MessageES2_ + 53
 9   mongo                               0x000000010007104d _ZN5mongo18DBClientConnection4callERNS_7MessageES2_bPSs + 77
 10  mongo                               0x00000001000a2581 _ZN5mongo14DBClientCursor4initEv + 161
 11  mongo                               0x00000001000627bf _ZN5mongo12DBClientBase5queryERKSsNS_5QueryEiiPKNS_7BSONObjEii + 191
 12  mongo                               0x000000010007e3eb _ZN5mongo18DBClientConnection5queryERKSsNS_5QueryEiiPKNS_7BSONObjEii + 139
 13  mongo                               0x0000000100127a85 _ZN5mongo9mongoFindEPNS_7V8ScopeERKN2v89ArgumentsE + 965
 14  mongo                               0x000000010010f92f _ZN5mongo7V8Scope10v8CallbackERKN2v89ArgumentsE + 175
 15  ???                                 0x00003258efc4c4db 0x0 + 55357561160923
Sat May  4 14:04:52.354 terminate() called in shell, printing stack:
0x10017116b 0x100001d33 0x7fff862703c9 0x7fff86270424 0x7fff8627158b 0x10013556c 0x10001bbde 0x100001e21 0x7fff8d4af94a 0 0x10015f232 0x10015fdad 0x100160095 0x10007104d 0x1000a2581 0x1000627bf 0x10007e3eb 0x100127a85 0x10010f92f 0x3258efc4c4db
 0   mongo                               0x000000010017116b _ZN5mongo15printStackTraceERSo + 43
 1   mongo                               0x0000000100001d33 _Z11myterminatev + 67
 2   libc++abi.dylib                     0x00007fff862703c9 _ZL19safe_handler_callerPFvvE + 8
 3   libc++abi.dylib                     0x00007fff86270424 __cxa_bad_typeid + 0
 4   libc++abi.dylib                     0x00007fff8627158b _ZL23__gxx_exception_cleanup19_Unwind_Reason_CodeP17_Unwind_Exception + 0
 5   mongo                               0x000000010013556c _ZN5mongo12verifyFailedEPKcS1_j + 780
 6   mongo                               0x000000010001bbde _ZNK5mongo11shell_utils18ConnectionRegistry30killOperationsOnAllConnectionsEb + 1342
 7   mongo                               0x0000000100001e21 _Z10quitNicelyi + 145
 8   libsystem_c.dylib                   0x00007fff8d4af94a _sigtramp + 26
 9   ???                                 0x0000000000000000 0x0 + 0
 10  mongo                               0x000000010015f232 _ZN5mongo13MessagingPort4recvERNS_7MessageE + 130
 11  mongo                               0x000000010015fdad _ZN5mongo13MessagingPort4recvERKNS_7MessageERS1_ + 45
 12  mongo                               0x0000000100160095 _ZN5mongo13MessagingPort4callERNS_7MessageES2_ + 53
 13  mongo                               0x000000010007104d _ZN5mongo18DBClientConnection4callERNS_7MessageES2_bPSs + 77
 14  mongo                               0x00000001000a2581 _ZN5mongo14DBClientCursor4initEv + 161
 15  mongo                               0x00000001000627bf _ZN5mongo12DBClientBase5queryERKSsNS_5QueryEiiPKNS_7BSONObjEii + 191
 16  mongo                               0x000000010007e3eb _ZN5mongo18DBClientConnection5queryERKSsNS_5QueryEiiPKNS_7BSONObjEii + 139
 17  mongo                               0x0000000100127a85 _ZN5mongo9mongoFindEPNS_7V8ScopeERKN2v89ArgumentsE + 965
 18  mongo                               0x000000010010f92f _ZN5mongo7V8Scope10v8CallbackERKN2v89ArgumentsE + 175
 19  ???                                 0x00003258efc4c4db 0x0 + 55357561160923

Example after:

Sat May  4 14:06:17.138   Assertion failure isABSONObj() src/mongo/bson/bson-inl.h 183
0x10c7c1d4d 0x10c79ffcc 0x10c757351 0x10c6e3e0a 0x10c6d5cd5 0x7fff8d4af94a 0x10d5f20a0 0x10c7b6c4c 0x10c7b7291 0x10c7b7628 0x10c71b0ae 0x10c71b2fd 0x10c73e39a 0x10c718b70 0x10c724532 0x10c79a637 0x10c783882 0x23cf5ea4ce17
 0          0x10c7c1d4d mongo::printStackTrace(std::ostream&) + 61
 1          0x10c79ffcc mongo::verifyFailed(char const*, char const*, unsigned int) + 284
 2          0x10c757351 mongo::BSONElement::embeddedObject() const + 193
 3          0x10c6e3e0a mongo::shell_utils::ConnectionRegistry::killOperationsOnAllConnections(bool) const + 524
 4          0x10c6d5cd5 quitNicely(int) + 85
 5       0x7fff8d4af94a _sigtramp + 26
 6          0x10d5f20a0 0x0 + 4519305376
 7          0x10c7b6c4c mongo::MessagingPort::recv(mongo::Message&) + 112
 8          0x10c7b7291 mongo::MessagingPort::recv(mongo::Message const&, mongo::Message&) + 33
 9          0x10c7b7628 mongo::MessagingPort::call(mongo::Message&, mongo::Message&) + 52
 10         0x10c71b0ae mongo::DBClientConnection::call(mongo::Message&, mongo::Message&, bool, std::string*) + 88
 11         0x10c71b2fd non-virtual thunk to mongo::DBClientConnection::call(mongo::Message&, mongo::Message&, bool, std::string*) + 13
 12         0x10c73e39a mongo::DBClientCursor::init() + 304
 13         0x10c718b70 mongo::DBClientBase::query(std::string const&, mongo::Query, int, int, mongo::BSONObj const*, int, int) + 180
 14         0x10c724532 mongo::DBClientConnection::query(std::string const&, mongo::Query, int, int, mongo::BSONObj const*, int, int) + 130
 15         0x10c79a637 mongo::mongoFind(mongo::V8Scope*, v8::Arguments const&) + 1031
 16         0x10c783882 mongo::V8Scope::v8Callback(v8::Arguments const&) + 116
 17      0x23cf5ea4ce17 0x0 + 39373553061399
Sat May  4 14:06:17.141 terminate() called in shell, printing stack:
0x10c7c1d4d 0x10c6d5eaf 0x7fff862703c9 0x7fff86270424 0x7fff8627158b 0x10c7a0113 0x10c757351 0x10c6e3e0a 0x10c6d5cd5 0x7fff8d4af94a 0x10d5f20a0 0x10c7b6c4c 0x10c7b7291 0x10c7b7628 0x10c71b0ae 0x10c71b2fd 0x10c73e39a 0x10c718b70 0x10c724532 0x10c79a637
 0          0x10c7c1d4d mongo::printStackTrace(std::ostream&) + 61
 1          0x10c6d5eaf myterminate() + 79
 2       0x7fff862703c9 safe_handler_caller(void (*)()) + 8
 3       0x7fff86270424 __cxa_bad_typeid + 0
 4       0x7fff8627158b __gxx_exception_cleanup(_Unwind_Reason_Code, _Unwind_Exception*) + 0
 5          0x10c7a0113 mongo::verifyFailed(char const*, char const*, unsigned int) + 611
 6          0x10c757351 mongo::BSONElement::embeddedObject() const + 193
 7          0x10c6e3e0a mongo::shell_utils::ConnectionRegistry::killOperationsOnAllConnections(bool) const + 524
 8          0x10c6d5cd5 quitNicely(int) + 85
 9       0x7fff8d4af94a _sigtramp + 26
 10         0x10d5f20a0 0x0 + 4519305376
 11         0x10c7b6c4c mongo::MessagingPort::recv(mongo::Message&) + 112
 12         0x10c7b7291 mongo::MessagingPort::recv(mongo::Message const&, mongo::Message&) + 33
 13         0x10c7b7628 mongo::MessagingPort::call(mongo::Message&, mongo::Message&) + 52
 14         0x10c71b0ae mongo::DBClientConnection::call(mongo::Message&, mongo::Message&, bool, std::string*) + 88
 15         0x10c71b2fd non-virtual thunk to mongo::DBClientConnection::call(mongo::Message&, mongo::Message&, bool, std::string*) + 13
 16         0x10c73e39a mongo::DBClientCursor::init() + 304
 17         0x10c718b70 mongo::DBClientBase::query(std::string const&, mongo::Query, int, int, mongo::BSONObj const*, int, int) + 180
 18         0x10c724532 mongo::DBClientConnection::query(std::string const&, mongo::Query, int, int, mongo::BSONObj const*, int, int) + 130
 19         0x10c79a637 mongo::mongoFind(mongo::V8Scope*, v8::Arguments const&) + 1031
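
For readers unfamiliar with how such demangling is usually done, here is a minimal sketch. It is not the code from this patch. It assumes a GCC- or Clang-style toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`, and the helper name `demangle` is hypothetical. Names that do not start with `_Z`, or that fail to demangle, are returned unchanged, which would match frames such as `_sigtramp` in the output above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cxxabi.h>
#include <string>

// Hypothetical helper (not taken from the patch): returns the demangled form
// of an Itanium-ABI symbol name, or the input unchanged if it is not a
// mangled C++ name or cannot be demangled.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);

    // Mangled C++ symbols start with "_Z"; pass anything else through as-is.
    if (std::strncmp(mangled, "_Z", 2) != 0) {
        return mangled;
    }

    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc(); the caller is responsible for releasing it with free().
    char* buf = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || buf == nullptr) {
        return mangled;  // demangling failed: fall back to the raw name
    }

    std::string result(buf);
    std::free(buf);
    return result;
}
```

Under these assumptions, `demangle("_Z10quitNicelyi")` yields `quitNicely(int)`, the same transformation visible between the "before" and "after" traces. Falling back to the raw name on failure means a frame is never lost, only left mangled.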

@kangas
Contributor

kangas commented May 8, 2013

Hi Farooq,

Before we can accept any pull request, we have two main requirements: (1) reference a SERVER ticket, and (2) make sure you've signed the contributor agreement. This is explained in CONTRIBUTING.rst in the root of our source tree.

I see you already have the SERVER ticket taken care of, so that's great! Now we need the contributor agreement. The form is at http://www.10gen.com/legal/contributor-agreement

Even though it's not "required" in the form, please make sure you specify your GitHub username. This will help us verify compliance more quickly in the future.

Thank you for contributing to MongoDB!

@fmela
Author

fmela commented May 8, 2013

OK, Matt. Done!

@kangas
Contributor

kangas commented May 8, 2013

Confirmed, I see that you have now signed the contributor agreement. I will find somebody to review your pull request as soon as possible.

@acmorrow
Contributor

Hi Farooq,

Thank you for the pull request. Your code looks good. However, if you look through the history, you will find that at one point we did demangle stack traces, but have since decided against doing so. The downside of demangling is that some symbols demangle to extremely large and complex names, and we found that leaving the symbols mangled was actually somewhat better for readability. So unfortunately I think we cannot accept this pull request as it is.

We are certainly always open, though, to ideas about how to improve the utility of our crash reports. If you have suggestions along these lines, I'd suggest filing SERVER enhancement tickets at jira.10gen.com before starting work so you can open a conversation with the kernel team. This way, we can check whether the idea you are considering has already been considered or tried before you expend considerable effort.

Thanks,
Andrew

@acmorrow acmorrow closed this May 24, 2013
jiongle1 pushed a commit to scantist-ossops-m2/mongo that referenced this pull request Mar 30, 2024
eviction server thread work.  I think I'm fixing two problems:
    First, the cache->disabled_eviction handling isn't sufficient (we
were emptying the LRU queue, but we weren't waiting for it to drain).
If a thread of control took a buffer off the LRU eviction queue and went
to sleep, it would be possible for it to race with a thread of control
during the internal-page phase of a checkpoint, and we can't allow any
pages at all to be written after the internal-page checkpoint phase
starts.   The fix is to contact the eviction server at the start of the
internal-page phase of a checkpoint and have it wait for the LRU queue
to drain.
    Second, we can't discard any page with a modified structure during
the internal-page checkpoint phase because that can race with the
checkpoint threads looking at the WT_REF structure for the page being
discarded, in other words, there's a state change in the internal page
just when the internal page is being read.

I also changed it so we don't turn off writes for the entire cache when
doing the internal-page phase of a checkpoint, we only need to turn off
writes for the file being checkpointed.

Move WT_SYNC_XXX flags into dist/flags.py; they're no longer specific
to an eviction server operation, and the only place you see them all is
in the __wt_bt_cache_flush() function.

Reference mongodb#419.
jiongle1 pushed a commit to scantist-ossops-m2/mongo that referenced this pull request Mar 30, 2024
server thread work.  The last set of changes had (at least) two
problems:

First, the test to avoid selecting a dirty page for eviction was
insufficient.  It appeared in __evict_walk_file, which is before
eviction has exclusive access.  The test should still appear there to
avoid selecting pages there is no hope of evicting, but should also
appear in __rec_review after the page is locked down, to ensure that no
pages modified after selection, but before final review, are written.

Second, the test to avoid selecting a dirty page for eviction wasted
performance because we couldn't select a page ever considered for
modification (regardless of whether or not it was actually modified),
because racing with the checkpoint thread reviewing an internal page
that referenced the evicted page could drop core when the evicted page's
modification structure disappeared.

This set of changes adds a test in __rec_review to resolve the first
problem.

The second problem is trickier, and messes with page states (oh joy!).
The key is to inform page reconciliation if it's being called by the
eviction server thread or a checkpoint thread.  In the case of being
called by the eviction server thread, any child page of an internal page
that's in the WT_REF_LOCKED state is in a stable, in-memory state,
because the calling thread set the WT_REF_LOCKED state.  In the case of
being called by the checkpoint thread, any child page of an internal
page that's in the WT_REF_LOCKED state is in a temporary state because
it's been locked by eviction.  It will either be evicted (and the page
state reset to WT_REF_DISK), or skipped (and the page state reset to
WT_REF_MEM).  Regardless, the reconciliation within the checkpoint
thread has to wait on that state change.

This is all built on top of a set of flags passed into reconciliation,
which additionally offers better control over skipping updates on a
page.  We now explicitly flag if reconciliation should (1) quit early
if unable to entirely clean a page, set by the eviction code, (2) panic
if it's unable to entirely clean a page, set by sync during file close
and by salvage, or (3) not worry about it, set by the checkpoint passes.

Reference mongodb#419.
jiongle1 pushed a commit to scantist-ossops-m2/mongo that referenced this pull request Mar 30, 2024
jiongle1 pushed a commit to scantist-ossops-m2/mongo that referenced this pull request Mar 30, 2024
an internal page might race with us as we evict a child in the page's
subtree.

One half of that test is in the reconciliation code: the checkpoint thread
waits for eviction-locked pages to settle before determining their status.
The other half of the test is in eviction: after acquiring the exclusive
eviction lock on a page, confirm no page in the page's stack of pages
from the root is being reconciled in a checkpoint.  This ensures we
either see the checkpoint-walk state in eviction, or the reconciliation
of the internal page sees our exclusive lock on the child page and waits
until we're finished evicting the child page (or give up if eviction
isn't possible).

Reference mongodb#419.
jiongle1 pushed a commit to scantist-ossops-m2/mongo that referenced this pull request Mar 30, 2024
…e of a

child page in diagnostic runs, and to save the previous state of the page in
the WT_RECONCILE structure.  This is what I've been using to debug mongodb#419.