Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup process tree #143

Merged
merged 15 commits into from Jul 24, 2018
Merged

Cleanup process tree #143

merged 15 commits into from Jul 24, 2018

Conversation

gaborcsardi
Copy link
Member

@gaborcsardi gaborcsardi commented Jul 23, 2018

Every processx process p is marked with an environment variable that has a random part and a time stamp. Importantly, this variable is not set in the current process, that would lead to having it set in other, non-processx child processes, eg. started by RStudio.

This env var is then inherited by all processes in the process tree rooted at p, even if some processes in the tree are orphaned. (In theory processes could opt out and unset the env var, but we don't anticipate this happening very often.)

kill_tree() uses this env var, and the (internal) ps package function ps_kill_tree to clean up the whole tree.

Closes #139.

EDIT: some more notes:

  • Only Windows, macOS and Linux are supported, because these are the systems for which ps can read out environment variables from arbitrary processes. On unsupported platforms kill_tree() fails with "not_implemented".
  • ps is currently optional, should we just import it? It has no hard dependencies.
  • kill_tree() returns data about the killed processes, I'll add that to the docs in a minute.

@codecov-io
Copy link

codecov-io commented Jul 23, 2018

Codecov Report

Merging #143 into master will increase coverage by 0.62%.
The diff coverage is 74.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #143      +/-   ##
==========================================
+ Coverage   69.67%   70.29%   +0.62%     
==========================================
  Files          29       30       +1     
  Lines        2783     2956     +173     
==========================================
+ Hits         1939     2078     +139     
- Misses        844      878      +34
Impacted Files Coverage Δ
src/init.c 100% <ø> (ø) ⬆️
src/tools/px.c 59.45% <0%> (-2.23%) ⬇️
R/on-load.R 0% <0%> (ø) ⬆️
src/unix/processx.c 59.83% <40%> (-0.28%) ⬇️
R/process.R 70.86% <53.84%> (-2.18%) ⬇️
src/create-time.c 72.54% <72.54%> (ø)
R/initialize.R 93.84% <80%> (-1.32%) ⬇️
R/utils.R 91.89% <88.88%> (-0.27%) ⬇️
src/win/processx.c 83.87% <95.12%> (+2.64%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f7243ff...00c38dc. Read the comment docs.

paste0(
"PS",
paste(sample(c(LETTERS, 0:9), 10, replace = TRUE), collapse = ""),
"_", as.integer(Internal(Sys.time()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to use .Internal(), or could we just unclass(Sys.time())?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without .Internal(), on some platforms it is ~10-1000x slower. Considering that process$new() is currently about 1-3 ms, and can be still faster by avoiding assert_that(), etc., we might want the faster solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, yeah structure() is really slow.

struct <- function(x, ...) {
  attributes(x) <- c(attributes(x), list(...))
  x
}

bench::mark(
 Sys.time(),
 .Internal(Sys.time()),
 .POSIXct(.Internal(Sys.time())),
 struct(.Internal(Sys.time()), class = c("POSIXct", "POSIXt")),
 check=FALSE, relative = TRUE)
#> # A tibble: 4 x 10
#>   expression   min  mean median   max `itr/sec` mem_alloc  n_gc n_itr
#>   <chr>      <dbl> <dbl>  <dbl> <dbl>     <dbl>     <dbl> <dbl> <dbl>
#> 1 Sys.time()  63.2  46.9   45.2  23.7      1          NaN   Inf  1   
#> 2 .Internal…   1     1      1     1       46.9        NaN   NaN  1.00
#> 3 .POSIXct(…  60.4  38.9   40.4  12.3      1.21       NaN   Inf  1   
#> 4 "struct(.…  17.5  13.5   11.0 507.       3.48       NaN   Inf  1.00
#> # ... with 1 more variable: total_time <dbl>

Sounds like we what you have is the best option then!

)
}

format_unix_time <- function(z) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be a call to .POSIXct(x, tz = "GMT")?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't .POSIXct an internal function? And as.POSIXct() is very slow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention I guess yes, but because it is a base function they are all the equivalent of exported. I don't believe you get a NOTE from using them...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not just by convention:

Internal objects in the base package most of which are only
user-visible because of the special nature of the base namespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is what I mean, if they were in a different package they would not be exported; but because they are in base everything is visible, so they are only internal by convention.

R/utils.R Outdated
structure(z, class = c("POSIXct", "POSIXt"), tzone = "GMT")
}

r_version <- function(x) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not base::getRversion() for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I did not find that.... thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

class = c("not_implemented", "error", "condition")))
}

get("ps_kill_tree", asNamespace("ps"))(private$tree_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to call an unexported function? Couldn't you export it in ps if needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the kill-tree functions are somewhat dangerous if R is multi-threaded and the other threads are starting processes, e.g. with_process_cleanup() reliably crashes RStudio. So I don't want people to call these, necessarily. We could still export ps_kill_tree(), but I just want to test it a bit in the wild, before letting people use it in their packages.

}

ll = ((LONGLONG) ftCreate.dwHighDateTime) << 32;
ll += ftCreate.dwLowDateTime - 116444736000000000LL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth linking to documentation on this magic number. maybe https://docs.microsoft.com/en-us/windows/desktop/sysinfo/converting-a-time-t-value-to-a-file-time ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, why not, it is just the difference between the two origins....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


do {
if (rem_size == 0) {
*buffer = S_realloc(*buffer, buffer_size * 2, buffer_size, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of growing the buffer as you go why not just allocate the buffer to the full file size up front, using either lseek to seek to the end or with fstat? e.g. https://stackoverflow.com/a/13322673/2055486

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do that with files in /proc, their size is not known. Based on my quick test, you also cannot seek to the end in them. They essentially behave like character devices AFAICT.

return 0.0;
}

ret = processx__read_file(path, &buf, /* buffer= */ 2048);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this proc file is always a single line why can you not use a single call to getline() rather than writing processx__read_file().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is one line, but /proc/stat is not, for example. It is just better to have a generic solution that always works. Although we only read two files here, this is from ps, which reads a bunch of different files in /proc.

return starttime;
}

void *processx__memmem(const void *haystack, size_t n1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there are reason you need memmem here and cannot use strstr? Aren't needle and haystack both strings in your case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strstr is good for this file. Some \proc files have zero separated records, and this is a generic solution from ps.

@gaborcsardi
Copy link
Member Author

Thanks!

@gaborcsardi
Copy link
Member Author

@jimhester What do you think about importing ps in processx? Basically a processx process is a superset of a ps process: it is a child process, so you can do more with it. So it would make sense to define processx methods for the ps operations, I think.

ps only works on macOS, Linux and Windows currently, so on other platforms these methods will just throw "not_implemented".

OTOH, ps could also use processx. processx has an interrupt() method, which is cool, ps could define that for external processes, but on windows, it needs starting an external process. (Yeah, I know.) So it could use processx. Of course that cannot both depend on the other, the only solutions are 1) merging them or 2) using system2() in ps. 2) is not that bad, because we don't need a background process.

- Use getRversion()
- Add comment about FILETIME -> Unix time conversion.
@jimhester
Copy link
Member

I guess technically they could both depend on each other as long as the dependency was a runtime, rather than build-time dependency.

@gaborcsardi gaborcsardi merged commit a26ff9f into master Jul 24, 2018
@gaborcsardi gaborcsardi deleted the featue/kill-tree branch August 15, 2018 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants