Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added tracing of threads other than the calling thread on Windows #473

Closed
wants to merge 13 commits into from

Conversation

aminya
Copy link

@aminya aminya commented Jul 10, 2022

This is a continuation of the work by @Jardynq that makes it possible to trace threads other than the current one. This can be used to profile Windows applications.

tikv/pprof-rs#9 (comment)

tikv/pprof-rs#122

Jardynq and others added 6 commits July 9, 2022 18:16
This probably wont compile on anything other than windows currently.

Previously RtlCaptureContext was used to effortlessly get the calling thread's context.
Instead pass in a handle to trace and then use GetThreadContext to get the context of the thread.

This should allow reading the backtrace of threads other than the calling thread. This is needed in the windows implementation of pprof-rs.

Some notes:
- The thread handle must be opened with THREAD_GET_CONTEXT access (see windows docs)
- Thread must be suspended before the trace call was made (see windows docs)
- A crossplatform interface/implementation might not be possible.
- I have no idea whether the stacktrace is even valid. It makes sense if it is?
@aminya aminya changed the title Added tracing of threads other than the calling thread Added tracing of threads other than the calling thread on Windows Jul 10, 2022
@aminya
Copy link
Author

aminya commented Jul 10, 2022

This is ready to go. Let me know if there is any feedback.

The MIR test is irrelevant to this pull request, and it is also happening on the master branch.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'm wary of adding platform-specific functionality to this crate so this may best live elsewhere. The dbghelp.rs file alone isn't the most complicated to copy for something like that.

Otherwise this is not a safe implementation because the thread is resumed after the register context is captured. That means the stack walking may not work because the stack could change while the stack was being walked.

@aminya
Copy link
Author

aminya commented Jul 11, 2022

I'm wary of adding platform-specific functionality to this crate so this may best live elsewhere. The dbghelp.rs file alone isn't the most complicated to copy for something like that.

This only adds Windows for now because it is Rust cannot be profiled on Windows, and we need this to remove a platform-specific restriction. In the future, if other operating systems have such functionality, people can suggest implementations for those too.

I think it is more important to allow people to profile Rust on Windows. This is way more limiting than not having the implementation of trace_thread for other operating systems.

Otherwise this is not a safe implementation because the thread is resumed after the register context is captured. That means the stack walking may not work because the stack could change while the stack was being walked.

Based on the official Windows docs, the only possible way to get a valid context for a running thread is to suspend it.
https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getthreadcontext#remarks

As I mentioned in this comment, ResumeThread just increments the counter back to the initial state of the thread.
#473 (comment)

@bjorn3
Copy link
Member

bjorn3 commented Jul 12, 2022

This only adds Windows for now because it is Rust cannot be profiled on Windows

backtrace-rs is not magic. You can use the same api's as for C/C++ stack walking to walk the stack for Rust. This means that you can copy dbghelp.rs from backtrace-rs into a separate crate and implement walking the stack of a different thread in this copy. No need to change backtrace-rs.

and we need this to remove a platform-specific restriction.

I fail to see how it is a platform-specific restriction as backtrace-rs has no support for stack walking of a different thread on any platform at all.

Based on the official Windows docs, the only possible way to get a valid context for a running thread is to suspend it.
https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getthreadcontext#remarks

I think @alexcrichton means that you need to keep the running thread suspended while walking the stack to prevent corruption of the resulting backtrace.

@CraftSpider
Copy link

Are the objections to this PR more about the feature of walking other threads in general, or about it being added for Windows in particular? I'm coming here from pprof as well, and interested in trying to move this work forwards in some way, and I have access to both a Windows and Linux environment, so could try to implement similar functionality on both if that would make this work more appealing.

The talk of 'platform specific restriction' is that Windows doesn't support profiling from a single running thread, so the lack of this feature prevents profiling on Windows entirely without forking the library, which I think is reasonable for the pprof maintainers to be wary of taking on - it would be a significant maintenance burden to maintain what is effectively just 'backtrace with this code patched in'.

Copy link
Contributor

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This crate is now primarily under the maintenance of the crate-maintainers team. I personally view maintaining this crate primarily as part and parcel of the responsibility of improving the standard library that I have been entrusted with. This means I evaluate PRs to it with the needs of std in mind.

Frankly, we are not exactly well-furnished with plenty of resources for maintaining this, either. With recruited assistance, we just recently managed to scrape together a bunch of maintenance on this crate to keep it up to date with the needs of recent Rusts, after it had fallen way behind. It helps this crate is, in many ways, pretty simple, despite the dark arts it performs.

I think we have reason to be quite proud about that.

Unfortunately, that's still pride about barely hanging on. And this isn't the only pull request of the nature of "please make backtrace-rs more useful for profilers at the cost of additional complexity". So let me entirely set aside this discussion about "platform-specific functionality" versus "platform-specific restrictions" and be simple about it: as far as I am concerned, either a PR to this is making std's backtrace work for a platform or it doesn't get to make maintaining this crate harder.

If you can work out a way to add all the code you folks want to add to help this crate specifically support profiling better, while keeping it out of the way of the codepaths that std needs, that seems plausible to me. It wouldn't hurt to have more people get involved in opening PRs to this crate with its maintenance needs in mind. But I do mean out of the way... it's not necessarily about whether std lights up such codepaths, but that I need to be able to gloss over that code during review of PRs to backtrace-rs for supporting std, rather than having to also account for code that has possibly been microoptimized or given a strange API for a profiler's performance or behavior requirements... and I will still hold such PRs to a high correctness standard because even trying to keep those paths isolated, they will wind up interacting anyways.

Comment on lines +106 to +111
// The suspending and resuming of threads is very risky.
// It can end in a deadlock if the current thread tries to access
// an object thats been locked by the suspended thread.
// That's why we only do as little work as possible while
// the thread is suspended, and resume it quickly after.
// There might be more pitfalls we haven't thought of or encountered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No speculative "maybe this is bad?" in fresh code, please. I'd like to hear why this is actually okay, despite concerns about corrupting the backtrace.

// the thread is suspended, and resume it quickly after.
// There might be more pitfalls we haven't thought of or encountered

context.0.ContextFlags = CONTEXT_ALL; // TODO: Narrow down required flags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO would have to be addressed.

Comment on lines +660 to +664
pub const CONTEXT_CONTROL: DWORD = CONTEXT_AMD64 | 0x00000001;
pub const CONTEXT_INTEGER: DWORD = CONTEXT_AMD64 | 0x00000002;
pub const CONTEXT_SEGMENTS: DWORD = CONTEXT_AMD64 | 0x00000004;
pub const CONTEXT_FLOATING_POINT: DWORD = CONTEXT_AMD64 | 0x00000008;
pub const CONTEXT_DEBUG_REGISTERS: DWORD = CONTEXT_AMD64 | 0x00000010;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we avoid repeating the same 5 expressions for 3 different architectures?

Comment on lines +77 to +79
/// If the given thread handle is not the current thread,
/// this function suspends the thread for a short amount of time to be able to extract the thread context.
/// This might end in a deadlock if the current thread tries to access an object thats been locked by the suspended thread.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these statements about the briefness and the concern about deadlocking related to each other?

@workingjubilee
Copy link
Contributor

workingjubilee commented Mar 9, 2024

This has fallen behind the current work on dbghelp, so I am closing it. Apologies. Please feel free to reopen and rebase this (in that order) if you wish to pursue this again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants