Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple /stat reading from the Process constructor #60

Closed
edigaryev opened this issue Dec 20, 2019 · 7 comments
Closed

Decouple /stat reading from the Process constructor #60

edigaryev opened this issue Dec 20, 2019 · 7 comments

Comments

@edigaryev
Copy link
Contributor

edigaryev commented Dec 20, 2019

Reading procfs is a rather slow operation and sometimes when reading it in bulk you only need a specific file like /proc/<pid>/maps, which is currently constructed through Process.maps().

Is there a reason why the Process structure constructor reads and parses /proc/<pid>/stat by default? Also, recently it got coupled even more, see bfd2c86.

It would be nice to have bare Process structure and then query the related resources on an on-demand basis.

@eminence
Copy link
Owner

This is a good question. No, there is no good reason why the constructor always parses the stat info, except that I generally always needed that info in one of the projects that prompted me to write this crate in the first place.

I would definitely consider decoupling this. Do you think there is a cheap way to figure out the process ID? I would be a little sad if bfd2c86 had to be reverted. Normally the constructor is passed in a pid, but with the new Process::new_with_root, we might not know the pid. I guess we could try to parse the latest path component of the PathBuf, but that seems unreliable.

@edigaryev
Copy link
Contributor Author

Is there a reason why the pid field is cached in the Process structure at all? A new call to stat() won't return cached data, yet a call to pid() will?

Note that I use the term "caching" for pid field. Currently it represents a value semantically similar to gettid(2)'s return value, not getpid(2)'s one. Unfortunately, it's possible for gettid(2)'s value to change during an execve(2) in a multi-threaded process, which will make this field's value stale:

https://github.com/torvalds/linux/blob/f1fd1610cbb6655883d1838ac79e53301596685d/fs/exec.c#L1145-L1150

Do you think there is a cheap way to figure out the process ID?

Process::myself() can be implemented using std::process::id() instead of reading /proc/self. Also note that /proc/self is not the the same thing as /proc/thread-self, which may be important in some multi-threaded scenarios.

As for Process::new_with_root(), it can be reincarnated as a constructor taking procfs root (e.g. /proc) instead of process/thread root (e.g. /proc/1). The new constructor can technically have the same name as it's signature will change to include the pid argument.

One can even go further and future-proof things a bit by naming the constructor something like Process::new_with_config() and let it take a structure with fields wrapped in Option's (analogous to default arguments in other languages).

@eminence
Copy link
Owner

Can you explain this part a little more:

The pid field currently represents a value semantically similar to gettid(2)'s return value, not getpid(2)'s one

Is this because in Process::myself(), the root is saved as /proc/self/, instead of dereferencing the symlink? I hadn't carefully thought about this, in the face of execv. I'm not sure the current implementation matches the behavior I have in my head. Would dereferencing /proc/self in the myself constructor bring us closer to getpid() semantics (i.e. the pid field would never go stale`)?

But with Process::new(pid) It's my understanding (and expectation) that the pid field will always match stat().unwrap().pid

@edigaryev
Copy link
Contributor Author

Sorry, I've just realized that this caching problem I've introduced above seems to manifest itself only when the Process is created with non-main thread credentials (using gettid(2) value of a thread for which getpid(2) is different).

Process::new() and Process::new_with_root() are affected, but Process::myself() is not, because /proc/self uses getpid(2) semantics (compared to /proc/self-thread, which uses gettid(2) semantics).

@eminence
Copy link
Owner

hi @edigaryev I'm coming back to this issue after several months. I confess I am not sure where we stand on this topic, since I am not sure I really understood the issue you were trying to describe.

What's your recommendation?

I am still thinking about your suggestion to remove the pid and stat fields from Process. This would be a breaking change, but it might be the right thing to do.

@edigaryev
Copy link
Contributor Author

Hi 👋!

I'm mostly unaffected by this now, as I currently need some fields from /stat for each process too when scraping procfs.

I suggest that we close this for now, until the issue resurfaces again for someone else.

@eminence
Copy link
Owner

Ok, sounds good. Thanks for letting me know. I've created a new tracking issue for this as to not forget about it. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants