-
Notifications
You must be signed in to change notification settings - Fork 5.7k
[SERVER-9586] Demangle C++ stack trace symbols for more readable stack traces. #419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi Farooq, Before we can accept any pull request, we have two main requirements. (1) reference a SERVER ticket, (2) make sure you've signed the contributor agreement. This is explained in CONTRIBUTING.rst in the root of our source tree I see you already have the SERVER ticket taken care of, so that's great! Now we need the contributor agreement. The form is at http://www.10gen.com/legal/contributor-agreement Even though it's not "required" in the form, please make sure you specify your GitHub username. This will help us verify compliance more quickly in the future. Thank you for contributing to MongoDB! |
OK, Matt. Done! |
Confirmed, I see that you have now signed the contributor agreement. I will find somebody to review your pull request as soon as possible. |
Hi Farooq, Thank you for the pull request. Your code looks good. However, if you look through the history, you will find that at one point, we did actually demangle stack traces, but have since decided against doing so. The downside to doing the demangling is that some symbols demangle to extremely large and complex names, and we found that leaving the symbols mangled was actually somewhat better for readability. So unfortunately I think we cannot accept this pull request as it is. We are certainly always open though to ideas about how to improve the utility of our crash reports. If you have suggestions or ideas along these lines, I'd suggest filing SERVER enhancement tickets at jira.10gen.com before starting work so you can open a conversation with the kernel team. This way, we can ensure that the idea you are considering hasn't already been considered or tried before you expend considerable effort. Thanks, |
eviction server thread work. I think I'm fixing two problems: First, the cache->disabled_eviction handling isn't sufficient (we were emptying the LRU queue, but we weren't waiting for it to drain). If a thread of control took a buffer off the LRU eviction queue and went to sleep, it would be possible for it to race with a thread of control during the internal-page phase of a checkpoint, and we can't allow any pages at all to be written after the internal-page checkpoint phase starts. The fix is to contact the eviction server at the start of the internal-page phase of a checkpoint and have it wait for the LRU queue to drain. Second, we can't discard any page with a modified structure during the internal-page checkpoint phase because that can race with the checkpoint threads looking at the WT_REF structure for the page being discarded, in other words, there's a state change in the internal page just when the internal page is being read. I also changed it so we don't turn off writes for the entire cache when doing the internal-page phase of a checkpoint, we only need to turn off writes for the file being checkpointed. Move WT_SYNC_XXX flags into dist/flags.py, they're no longer specific to an eviction server operation, they only place you see them all is in the __wt_bt_cache_flush() function. Reference mongodb#419.
server thread work. The last set of changes had (at least) two problems: First, the test to avoid selecting a dirty page for eviction was insufficient. It appeared in __evict_walk_file, which is before eviction has exclusive access, the test should still appear there to avoid selecting pages there is no hope of evicting, but should also appear in __rec_review after the page is locked down to ensure no pages modified after selection, but before final review, are written. Second, the test to avoid selecting a dirty page for eviction wasted performance because we couldn't select a page ever considered for modification (regardless of whether or not it was actually modified), because racing with the checkpoint thread reviewing an internal page that referenced the evicted page could drop core when the evicted page's modification structure disappeared. This set of changes adds a test in __rec_review to resolve the first problem. The second problem is trickier, and messes with page states (oh joy!). The key is to inform page reconciliation if it's being called by the eviction server thread or a checkpoint thread. In the case of being called by the eviction server thread, any child page of an internal page that's in the WT_REF_LOCKED state is in a stable, in-memory state, because the calling thread set the WT_REF_LOCKED state. In the case of being called by the checkpoint thread, any child page of an internal page that's in the WT_REF_LOCKED state is in a temporary state because it's been locked by eviction. It will either be evicted (and the page state reset to WT_REF_DISK), or skipped (and the page state reset to WT_REF_MEM). Regardless, the reconciliation within the checkpoint thread has to wait on that state change. This is all built on top of a set of flags passed into reconciliation, which additionally offers better control over skipping updates on a page. We now explicitly flag if reconciliation should (1) quit early if unable to entirely clean a page, set by the eviction code, (2) panic if it's unable to entirely clean a page, set by sync during file close and by salvage, or (3) not worry about it, set by the checkpoint passes. Reference mongodb#419.
exist for us. Reference mongodb#419.
an internal page might race with us as we evict a child in the page's subtree. One half of that test is in the reconciliation code: the checkpoint thread waits for eviction-locked pages to settle before determining their status. The other half of the test is in eviction: after acquiring the exclusive eviction lock on a page, confirm no page in the page's stack of pages from the root is being reconciled in a checkpoint. This ensures we either see the checkpoint-walk state in eviction, or the reconciliation of the internal page sees our exclusive lock on the child page and waits until we're finished evicting the child page (or give up if eviction isn't possible). Reference mongodb#419.
…e of a child page in diagnostic runs, and to save the previous state of the page in the WT_RECONCILE structure. This is what I've been using to debug mongodb#419.
This pull request will demangle C++ names and makes stack traces far more readable.
Example before:
Example after: