Caught segfault during siege #289
Migrating from here:
I can reproduce a segfault with bad requests against a plumber API
Steps to reproduce:
sure, here's session_info
We have a couple of internal packages there, xxxxxx and yyyyyy, which load everything else; I'm digging through them to be sure. As far as I can tell, though, the plumber function involved only does data.frame lookups. Still digging into that at the moment.
Hit the endpoint with good requests first:
then hit with a lot of 404's
From R on the API host:
EDIT: also, got this when I tried again: (the exact combination of bad requests and good requests is unclear, I just alternate till I get the segfault)
At the time of the crash I was using this plumber file:
EDIT: updated (and smaller) session_info:
I haven't seen this error before, so I don't know if it's related to httpuv. The fact that those errors are different each time suggests that there's some memory corruption going on.
Has this server been upgraded from older versions of R, or is it a fresh server? If it has been upgraded, it's possible that some old packages were compiled against an old version of R and are not compatible. If you run:
pkgs <- as.data.frame(installed.packages(), stringsAsFactors = FALSE, row.names = FALSE)
pkgs[, c("Package", "Version", "Built")]
The Built column should ideally be 3.4.4 for all packages, or at least 3.4.0. You can update the packages with:
update.packages(ask = FALSE, checkBuilt = TRUE)
See http://shiny.rstudio.com/articles/upgrade-R.html for more information.
The Docker image contains a bunch of builds of R that help catch memory problems. Running under
Another option is to run R under gdb and see if that helps catch the problem.
If you can provide very simple instructions for reproducing the problem, I'd appreciate it!
Server is reasonably fresh, all are on 3.4.0 or higher:
I'll upgrade all the same and try again (current versions below)
Will try running it off that container, and then under gdb
@wch Seems to be specifically an R 3.4.4 issue. I cannot reproduce using R 3.5, but as soon as I reinstalled R 3.4.4 I get the problem again
Inside the container I only get 3.5, so I was not able to reproduce with the given tools.
going to try with gdb and R 3.4.4
In depth details:
Outside of container:
Using two urlfiles:
(if you want, you can replace the /test1/ endpoint and just use a port, but in my network I need to use port 80)
siege commands (I alternate between these):
with RDcsan, plumber doesn't actually start:
R 3.4.4 with gdb
could be triggered by cancelling the siege... seems to happen as soon as I hit CTRL+C
The R 3.5 release notes list a large number of changes to buffered connections. Could that be related?
@shapenaji and I worked on this for a while, but we haven't been able to come up with an easily reproducible test case. However, I'm quite sure that I've found the source of the segmentation fault:
Here's a walkthrough of what this looks like for us in gdb. Note that I've compiled R 3.4.4 (which is the latest version for Ubuntu) with debug symbols. I'm running an extremely basic
I've also set a breakpoint on
After a while:
Eventually there's a broken socket, but for some reason the signal ends up on thread 2, as you can see above. Further, the
Since R is not thread-safe, and
Where the segmentation fault occurs seems to be a bit random; often it is in the line above, but we saw it in several other places. It probably just depends on where the main thread happens to be when SIGPIPE shows up. I can provide full backtrace output but I think it's irrelevant in this case.
Does this information help? Are we doing something wrong configuring
@atheriel Thanks for the detailed investigation!
httpuv starts another thread when a server is started (which is what plumber does). A httpuv application may also use the
In the stack traces you provided, thread 2 was running httpuv code. If the third thread is running code from later, it will look something like this:
FWIW, in a clean R session, you can cause later to start a new thread with the following:
I'm not sure at this point exactly how the thread and signal handling is working, but if it is the case that the SIGPIPE is caused by something that happens on thread 2 (httpuv), and then it causes that thread to execute internal R code, that could trigger the kind of problems you've seen. R is not thread-safe, so we have tried very hard in httpuv to not accidentally execute internal R code on a background thread. But a signal handler could conceivably bypass the safeguards we've put in place to avoid these problems.
In my debugging it was always the second thread (the one executing
It might be possible to prevent the threads started by