Skip to content

Get a list of threads from Linux debuggee #47

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Jul 31, 2020

Conversation

fabiim
Copy link
Contributor

@fabiim fabiim commented Jul 26, 2020

Fixes #36

  • Add a Thread trait
  • Retrieves a list of threads in linux

Notes:

I dropped a couple of the suggestions in the ticket (#36) because:

  • I decided to use procfs because libproc-rs does not support getting the list of threads from linux right now. I imagine this crate being useful in the future anyhow.

  • Having a default associated type in the Threat trait is not stable.

  • Returning a slice of Threat references does not make much sense to me because the threats change over time and I feel it's more useful to get a fresh current view every time.

@fabiim fabiim changed the title Get a list of threads from Linux debuggee Get a list of threads from Linux debuggee - WIP (fixing tests) Jul 26, 2020
@fabiim fabiim changed the title Get a list of threads from Linux debuggee - WIP (fixing tests) Get a list of threads from Linux debuggee Jul 26, 2020

fn name(&self) -> Option<String>;
fn thread_id(&self) -> Self::ThreadId;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel like the right abstraction. Every thread has it's own register set that can be read and written. Also individual threads can be waiting on the debugger and be running. On Linux at least ptrace attaches to a single thread, not a single process:

https://man7.org/linux/man-pages/man2/ptrace.2.html

A tracee first needs to be attached to the tracer. Attachment and subsequent commands are per thread: in a multithreaded process, every thread can be individually attached to a (potentially different) tracer, or left not attached and thus not debugged. Therefore, "tracee" always means "(one) thread", never "a (possibly multithreaded) process".

This also means that you need to enable the PTRACE_O_TRACECLONE option to attach to spawned threads.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it looks like Thread should be almost like a subset of Target.
What are your thoughts on this @fabiim, @bjorn3?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could say that Target has one or more Thread's with one being the main thread. read_regs will then ve moved to Thread, but read and write remain part of Target. The unpause method should remain part of Target I think, at least for as long as we don't have an equivalent to gdb's non-stop mode in which individual threads are paused and continued.

Copy link
Contributor Author

@fabiim fabiim Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me see if I get it:

  • Move the read/write registers methods to the thread trait
  • Add a way to retrieve a main thread from a target

If that is correct, I have a couple of questions:

  1. What should be the abstraction for the registers structure be in the trait?
  2. Should threads() return all threads (including the main thread?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should be the abstraction for the registers structure be in the trait?

Just type Registers: Debug; or something like that for now I think. It is possible to later add additional bounds as necessary. I think threads should return a concrete thread type though for that to work.

Should threads() return all threads (including the main thread?)

I don't have an opinion on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should threads() return all threads (including the main thread?)

Yes, I believe so. At least, most (all?) major debuggers impose this behaviour - e.g., if you launch a single-threaded process in gdb and run the info threads command, the output will include the main thread.

Copy link
Contributor

@bjorn3 bjorn3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits

fabiim and others added 6 commits July 26, 2020 21:54
 - introduces a Thread struct
 - retrieves a list of threads in linux

Notes:

- I decided to use procfs because libproc-rs does not support getting the list of threads from linux
right now [1]. I imagine this crate being useful in the future
anyhow.

[1] https://github.com/andrewdavidmackenzie/libproc-rs/blob/123d1762727c9c5d4e81888d8ea1a7f9701ddb45/src/libproc/proc_pid.rs#L260
Fails to compile on OSX. It's not needed there anywhere, so this is
better.
Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
That is the only place it's being used.
fabiim added 5 commits July 26, 2020 23:00
- Was printing the wrong message.
- Was not waiting for the thread to start before retrieving the list
- of threads. Seen it failed on CI.
Adding to supress warning until it's used on a integration test or by
something else.
It's the original type in the kernel.
And remove silly dead code supression :D
Copy link
Member

@nbaksalyar nbaksalyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! I left a few suggestions & comments.


fn name(&self) -> Option<String>;
fn thread_id(&self) -> Self::ThreadId;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it looks like Thread should be almost like a subset of Target.
What are your thoughts on this @fabiim, @bjorn3?

@nbaksalyar
Copy link
Member

currently there's no way to get a number of test threads from inside the tests (reading the env var is inconsistent)

FYI, I've raised a new issue for Rust: rust-lang/rust#74845

Makes API more flexible.

Co-authored-by: Nikita Baksalyar <nikita.baksalyar@gmail.com>
Copy link
Member

@nbaksalyar nbaksalyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you! 👍
I think the questions re trait Thread can be addressed in separate issues/PRs.

@nbaksalyar nbaksalyar requested a review from bjorn3 July 28, 2020 21:12
@bjorn3
Copy link
Contributor

bjorn3 commented Jul 28, 2020

target::linux::tests::reads_threads stdout ----

Error: NotFound(Some("/proc/5167/task/5170/stat"))

@fabiim
Copy link
Contributor Author

fabiim commented Jul 28, 2020

target::linux::tests::reads_threads stdout ----

Error: NotFound(Some("/proc/5167/task/5170/stat"))

Yep, there is a bug, it should tolerate tasks that have gone away . I imagine this hit that.
Fixing it...

fabiim added 2 commits July 28, 2020 23:34
- Banged the thread change
- Had duplicate procfs entries
Some tasks might not be there between getting them from the iterator
and actually trying to read their page.
@fabiim
Copy link
Contributor Author

fabiim commented Jul 29, 2020

target::linux::tests::reads_threads stdout ----

Error: NotFound(Some("/proc/5167/task/5170/stat"))

Yep, there is a bug, it should tolerate tasks that have gone away . I imagine this hit that.
Fixing it...

Fixed. Now we just skip threads not found and/or not ready yet.

It does not make much sense.
Threads come and go, we don't have any consistency guarantees when calling
the threads method. If the thread info is not ready it can be caught in a
subsequent call to threads
// ok to skip. Thread is gone or it's page is not complete yet.
}
Err(err) => return Err(Box::new(err)),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thread could change it's own name I think. Maybe LinuxThread should only store pid and tid and name() would read the current name or return None when the thread doesn't exist anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... interesting. Ok will check this later today then.

Regarding keeping the pid and tid: as far as I can tell they are the same thing.
So we only need to keep one. Is this correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

procfs::process::Task contains the pid of the whole process and the tid of the specific thread. You could also directly store that struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, a thread can change the name. From man procfs:

A thread may modify its comm value, or that of any of other thread in the same thread group

Going with your idea of having the name method return the most recent: do you think it will be fine to swallow all other procfs::ProcError into a None? If so, the user wouldn't know something went wrong. Should we use a result on the name?

Also I'm not sure I understand the benefits of the change. If you are getting a list of threads you either have them all paused or as soon as you get your information it might already be outdated. So the user will always have this issue of seeing old data, even if we push it elsewhere. He will always need to poll the thread list to get up-to-date information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a result on the name?

Yes

If you are getting a list of threads you either have them all paused or as soon as you get your information it might already be outdated.

I think once we support actually debugging threads, we would store a list of all active threads. When using PTRACE_O_TRACECLONE to automatically attach to new threads we have to manually continue the new thread anyway, so we could just as well store it in a list. Only when attaching to a new process do we need to list all active threads. Making the LinuxThread persistent also has the benefit that you could continue and pause individual threads rather than the whole process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

I think once we support actually debugging threads, we would store a list of all active threads. When using PTRACE_O_TRACECLONE to automatically attach to new threads we have to manually continue the new thread anyway, so we could just as well store it in a list. Only when attaching to a new process do we need to list all active threads. Making the LinuxThread persistent also has the benefit that you could continue and pause individual threads rather than the whole process.

Alright, I completely missed that the first time you mentioned. I think I get it now - thank you!. We are going to be in lockstep with threads getting notified through ptrace that a new thread was created and adding that to the list of the process.

@fabiim
Copy link
Contributor Author

fabiim commented Jul 31, 2020

@nbaksalyar @bjorn3 Is this ok?

My understanding from the discussion is that 2 essential features are missing:

  • The Thread abstraction should have (to start with) the read registers method using a Register: Debug Trait
  • The thread list should be kept inside the process and we need to update it as threads come and go

But I think it might be better to handle them as separate PR's?

@bjorn3 bjorn3 requested a review from nbaksalyar July 31, 2020 08:55
fabiim added 2 commits July 31, 2020 10:23
Keeping the task to retrieve the name forces us to change the API to
be a bit more complex but it allows to capture thread names changes.
@nbaksalyar
Copy link
Member

But I think it might be better to handle them as separate PR's?

Definitely. This PR underwent quite a number of changes & rebases already - thanks for your patience!

@nbaksalyar nbaksalyar merged commit 9a15ae1 into headcrab-rs:master Jul 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get a list of process threads on Linux
4 participants