Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caught segfault during siege #289

Closed
shapenaji opened this issue Aug 9, 2018 · 18 comments · Fixed by rstudio/httpuv#169
Closed

Caught segfault during siege #289

shapenaji opened this issue Aug 9, 2018 · 18 comments · Fixed by rstudio/httpuv#169
Assignees
Labels
difficulty: advanced Best for maintainers to address effort: low < 1 day of work priority: high Must be fixed before next release

Comments

@shapenaji
Copy link
Contributor

Migrating from here:
https://community.rstudio.com/t/stressing-the-plumber/12327/4

I can reproduce a segfault with bad requests against a plumber API

API Host:
Ubuntu 16.04 running on Azure
4 vCPUs
64 G ram

Steps to reproduce:
Set up a plumber API on port XYZ
siege api with bad requests

Message:

address 0x4f64650, cause 'invalid permissions'
Segmentation fault (core dumped)
Traceback
Traceback:
 1: FUN(X[[i]], ...)
 2: vapply(expr[-1], identical, logical(1), quote(.))
 3: is_first(rhs)
 4: split_chain(match.call(), env = env)
<5-11 Redacted>
12: do.call(private$func, args, envir = private$envir)
13: (function (...) {    args <- getRelevantArgs(list(...), plumberExpression = private$expr)    hookEnv <- new.env()    private$runHooks("preexec", c(list(data = hookEnv), list(...)))    val <- do.call(private$func, args, envir = private$envir)    val <- private$runHooks("postexec", c(list(data = hookEnv,         value = val), list(...)))    val})(res = <environment>, req = <environment>, model_version = "1.0.0",     x = "1", y = "-40", z = "43",     w = "-103")
14: do.call(h$exec, req$args)
15: doTryCatch(return(expr), name, parentenv, handler)
16: tryCatchOne(expr, names, parentenv, handlers[[1L]])
17: tryCatchList(expr, classes, parentenv, handlers)
18: tryCatch({    options(warn = 1)    h <- getHandle("__first__")    if (!is.null(h)) {        if (!is.null(h$serializer)) {            res$serializer <- h$serializer        }        req$args <- c(h$getPathParams(path), req$args)        return(do.call(h$exec, req$args))    }    if (length(private$filts) > 0) {        for (i in 1:length(private$filts)) {            fi <- private$filts[[i]]            h <- getHandle(fi$name)            if (!is.null(h)) {                if (!is.null(h$serializer)) {                  res$serializer <- h$serializer                }                req$args <- c(h$getPathParams(path), req$args)                return(do.call(h$exec, req$args))            }            .globals$forwarded <- FALSE            fres <- do.call(fi$exec, req$args)            if (!.globals$forwarded) {                if (!is.null(fi$serializer)) {                  res$serializer <- fi$serializer                }                return(fres)            }        }    }    h <- getHandle("__no-preempt__")    if (!is.null(h)) {        if (!is.null(h$serializer)) {            res$serializer <- h$serializer        }        req$args <- c(h$getPathParams(path), req$args)        return(do.call(h$exec, req$args))    }    for (mountPath in names(private$mnts)) {        if (nchar(path) >= nchar(mountPath) && substr(path, 0,             nchar(mountPath)) == mountPath) {            req$PATH_INFO <- substr(req$PATH_INFO, nchar(mountPath),                 nchar(req$PATH_INFO))            return(private$mnts[[mountPath]]$route(req, res))        }    }    val <- private$notFoundHandler(req = req, res = res)    return(val)}, error = function(e) {    val <- private$errorHandler(req, res, e)    return(val)}, finally = options(warn = oldWarn))
19: self$route(req, res)
20: self$serve(req, res)
21: func(req)
22: compute()
23: doTryCatch(return(expr), name, parentenv, handler)
24: tryCatchOne(expr, names, parentenv, handlers[[1L]])
25: tryCatchList(expr, classes, parentenv, handlers)
26: tryCatch(compute(), error = function(e) compute_error <<- e)
27: rookCall(.app$call, req, req$.bodyData, seek(req$.bodyData))
28: (function (req, cpp_callback) {    resp <- rookCall(.app$call, req, req$.bodyData, seek(req$.bodyData))    clean_up <- function() {        if (!is.null(req$.bodyData)) {            close(req$.bodyData)        }        req$.bodyData <- NULL    }    if (is.promise(resp)) {        resp <- resp %...>% invokeCppCallback(., cpp_callback)        finally(resp, clean_up)    }    else {        on.exit(clean_up())        invokeCppCallback(resp, cpp_callback)    }    invisible()})(<environment>, <pointer: 0x7f9828048140>)
29: evalq((function (req, cpp_callback) {    resp <- rookCall(.app$call, req, req$.bodyData, seek(req$.bodyData))    clean_up <- function() {        if (!is.null(req$.bodyData)) {            close(req$.bodyData)        }        req$.bodyData <- NULL    }    if (is.promise(resp)) {        resp <- resp %...>% invokeCppCallback(., cpp_callback)        finally(resp, clean_up)    }    else {        on.exit(clean_up())        invokeCppCallback(resp, cpp_callback)    }    invisible()})(<environment>, <pointer: 0x7f9828048140>), <environment>)
30: evalq((function (req, cpp_callback) {    resp <- rookCall(.app$call, req, req$.bodyData, seek(req$.bodyData))    clean_up <- function() {        if (!is.null(req$.bodyData)) {            close(req$.bodyData)        }        req$.bodyData <- NULL    }    if (is.promise(resp)) {        resp <- resp %...>% invokeCppCallback(., cpp_callback)        finally(resp, clean_up)    }    else {        on.exit(clean_up())        invokeCppCallback(resp, cpp_callback)    }    invisible()})(<environment>, <pointer: 0x7f9828048140>), <environment>)
31: doTryCatch(return(expr), name, parentenv, handler)
32: tryCatchOne(expr, names, parentenv, handlers[[1L]])
33: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
34: doTryCatch(return(expr), name, parentenv, handler)
35: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]),     names[nh], parentenv, handlers[[nh]])
36: tryCatchList(expr, classes, parentenv, handlers)
37: tryCatch(evalq((function (req, cpp_callback) {    resp <- rookCall(.app$call, req, req$.bodyData, seek(req$.bodyData))    clean_up <- function() {        if (!is.null(req$.bodyData)) {            close(req$.bodyData)        }        req$.bodyData <- NULL    }    if (is.promise(resp)) {        resp <- resp %...>% invokeCppCallback(., cpp_callback)        finally(resp, clean_up)    }    else {        on.exit(clean_up())        invokeCppCallback(resp, cpp_callback)    }    invisible()})(<environment>, <pointer: 0x7f9828048140>), <environment>),     error = function (x)     x, interrupt = function (x)     x)
38: .Call("_later_execCallbacks", PACKAGE = "later", timeoutSecs)
39: execCallbacks(timeoutSecs)
40: run_now(check_time)
41: service(0)
42: httpuv::runServer(host, port, self)
43: srv$run(port = 9400)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)
@shapenaji
Copy link
Contributor Author

I have a feeling this is related to something trying to access something on disk... cannot reproduce with simple-errors. Trying to make a better reprex

@trestletech
Copy link
Contributor

CC @wch does this look like an httpuv issue?

@shapenaji can you include the devtools::session_info() from that machine so we can see what versions of each package you're using?

@shapenaji
Copy link
Contributor Author

shapenaji commented Aug 9, 2018

Hi @trestletech

sure, here's session_info

Session Info
> library(plumber);library(xxxxx);devtools::session_info()
Session info ------------------------------------------------------------------
 setting  value
 version  R version 3.4.4 (2018-03-15)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 tz       Etc/UTC
 date     2018-08-09

Packages ----------------------------------------------------------------------
 package    * version date       source
 assertthat   0.2.0   2017-04-11 cran (@0.2.0)
 base       * 3.4.4   2018-03-16 local
 bindr        0.1.1   2018-03-13 cran (@0.1.1)
 bindrcpp     0.2.2   2018-03-29 cran (@0.2.2)
 compiler     3.4.4   2018-03-16 local
 crayon       1.3.4   2017-09-16 cran (@1.3.4)
 datasets   * 3.4.4   2018-03-16 local
 devtools     1.13.6  2018-06-27 CRAN (R 3.4.4)
 digest       0.6.15  2018-01-28 CRAN (R 3.4.4)
 dplyr        0.7.6   2018-06-29 cran (@0.7.6)
 glue         1.3.0   2018-07-17 cran (@1.3.0)
 graphics   * 3.4.4   2018-03-16 local
 grDevices  * 3.4.4   2018-03-16 local
 httpuv       1.4.5   2018-07-19 cran (@1.4.5)
 jsonlite     1.5     2017-06-01 CRAN (R 3.4.4)
 later        0.7.3   2018-06-08 cran (@0.7.3)
 magrittr     1.5     2014-11-22 cran (@1.5)
 memoise      1.1.0   2017-04-21 CRAN (R 3.4.4)
 methods    * 3.4.4   2018-03-16 local
 pillar       1.3.0   2018-07-14 cran (@1.3.0)
 pkgconfig    2.0.1   2017-03-21 cran (@2.0.1)
 plumber    * 0.4.6   2018-06-05 cran (@0.4.6)
 promises     1.0.1   2018-04-13 cran (@1.0.1)
 purrr        0.2.5   2018-05-29 cran (@0.2.5)
 R6           2.2.2   2017-06-17 CRAN (R 3.4.4)
 Rcpp         0.12.18 2018-07-23 cran (@0.12.18)
 rlang        0.2.1   2018-05-30 cran (@0.2.1)
 xxxxxxx    0.3.0   2018-08-08 cran (@0.3.0)
 yyyyyy    * 1.0.8   2018-08-08 cran (@1.0.8)
 stats      * 3.4.4   2018-03-16 local
 stringi      1.2.4   2018-07-20 cran (@1.2.4)
 tibble       1.4.2   2018-01-22 cran (@1.4.2)
 tidyr        0.8.1   2018-05-18 cran (@0.8.1)
 tidyselect   0.2.4   2018-02-26 cran (@0.2.4)
 tools        3.4.4   2018-03-16 local
 utils      * 3.4.4   2018-03-16 local
 withr        2.1.2   2018-03-15 CRAN (R 3.4.4)

we have a couple of internal packages there which are loading everything else, xxxxxx and yyyyyy, I'm digging through them to be sure. As far as I can tell though, the plumber function involved only does data.frame lookups... still digging that out at the moment

@shapenaji
Copy link
Contributor Author

shapenaji commented Aug 9, 2018

Got it:

Hit the endpoint with good requests first:

good_urls
http://xxx.xxx.xxx.xxx/endpoint/prediction?a=15&b=5
http://xxx.xxx.xxx.xxx/endpoint/prediction?a=-4&b=2
....
http://xxx.xxx.xxx.xxx/nflpretest/prediction?a=-2&b=5

then hit with a lot of 404's
by using a function that did not exist

bad_urls >
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=1&x_s=16&t_p=52&x_t=3                                                                           [0/1855]
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=13&x_s=34&t_p=36&x_t=31
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=-14&x_s=8&t_p=47&x_t=-35
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=11&x_s=-106&t_p=40&x_t=-41
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=-8&x_s=-119&t_p=37&x_t=94
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=-6&x_s=-107&t_p=39&x_t=-70
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=-2&x_s=-97&t_p=53&x_t=34
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=9&x_s=3&t_p=55&x_t=-44
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=-8&x_s=-44&t_p=38&x_t=-39
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=-9&x_s=2&t_p=55&x_t=-4
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=-9&x_s=41&t_p=35&x_t=-58
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=5&x_s=-104&t_p=41&x_t=50
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=-1&x_s=21&t_p=36&x_t=-75
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=0&x_s=10&t_p=32&x_t=-89
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=5&x_s=-86&t_p=36&x_t=-37
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=1&x_s=16&t_p=52&x_t=3
http://xxx.xxx.xxx.xxx/endpoint/api_fun?model_version=1.0.0&s_p=13&x_s=34&t_p=36&x_t=31

From R on the API host:

Starting server to listen on port 9400
Error in tryCatchOne(expr, names, parentenv, handlers[[1L]]) :
  R_Reprotect: only 56 protected items, can't reprotect index 77
Calls: <Anonymous> ... tryCatchOne -> doTryCatch -> tryCatchList -> tryCatchOne

 *** caught segfault ***
address (nil), cause 'memory not mapped'
Segmentation fault (core dumped)

EDIT: also, got this when I tried again: (the exact combination of bad requests and good requests is unclear, I just alternate till I get the segfault)


> library(plumber);srv <- plumb("plumber.R");srv$run(port = 9400)
Starting server to listen on port 9400
Error in tryCatchOne(expr, names, parentenv, handlers[[1L]]) :
  not a BUILTIN function
Calls: <Anonymous> -> options
Fatal error: error during cleanup

At the time of the crash I was using this plumber file:

plumber.R
library(tibble)
#* return the sum of two numbers
#* @param a The first number to add
#* @param b The second number to add
#* @get /prediction
prediction <- function(...){
  structure(list(x = c(9, 9, 9, 9, 9, 9), y = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), z = c(9,
9, 9, 9, 9, 9), w = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_)), .Names = c("x", "y",
"z", "w"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))

EDIT: updated (and smaller) session_info:

session_info
> devtools::session_info()
Session info ------------------------------------------------------------------
 setting  value
 version  R version 3.4.4 (2018-03-15)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 tz       Etc/UTC
 date     2018-08-09

Packages ----------------------------------------------------------------------
 package    * version date       source
 assertthat   0.2.0   2017-04-11 cran (@0.2.0)
 base       * 3.4.4   2018-03-16 local
 bindr        0.1.1   2018-03-13 cran (@0.1.1)
 bindrcpp     0.2.2   2018-03-29 cran (@0.2.2)
 compiler     3.4.4   2018-03-16 local
 crayon       1.3.4   2017-09-16 cran (@1.3.4)
 datasets   * 3.4.4   2018-03-16 local
 devtools     1.13.6  2018-06-27 CRAN (R 3.4.4)
 digest       0.6.15  2018-01-28 CRAN (R 3.4.4)
 dplyr      * 0.7.6   2018-06-29 cran (@0.7.6)
 glue         1.3.0   2018-07-17 cran (@1.3.0)
 graphics   * 3.4.4   2018-03-16 local
 grDevices  * 3.4.4   2018-03-16 local
 httpuv       1.4.5   2018-07-19 cran (@1.4.5)
 jsonlite     1.5     2017-06-01 CRAN (R 3.4.4)
 later        0.7.3   2018-06-08 cran (@0.7.3)
 magrittr     1.5     2014-11-22 cran (@1.5)

@wch
Copy link
Collaborator

wch commented Aug 9, 2018

I haven't seen this error before, so I don't know if it's related to httpuv. The fact that those errors are different each time suggests that there's some memory corruption going on.

Has this server been upgraded from older versions of R, or is it a fresh server? If it has been upgraded, it's possible that some old packages were compiled against an old version of R and are not compatible. If you run:

pkgs <- as.data.frame(installed.packages(), stringsAsFactors = FALSE, row.names = FALSE)
pkgs[, c("Package", "Version", "Built")]

The Built column should ideally be 3.4.4 for all packages, or at least 3.4.0. You can update the packages with:

update.packages(ask = FALSE, checkBuilt = TRUE)

See http://shiny.rstudio.com/articles/upgrade-R.html for more information.

@shapenaji If you're feeling bold, you can try running your code in this Docker image: https://hub.docker.com/r/wch1/r-debug/

The Docker image contains a bunch of builds of R that help catch memory problems. Running under RDsan, RDcsan, or RDvalgrind should provide more information about memory problems. But if you find it unclear how to do that, I can try to get to it later.

Another option is to run R under gdb and see if that helps catch the problem.

If you can provide very simple instructions for reproducing the problem, I'd appreciate it!

@shapenaji
Copy link
Contributor Author

Server is reasonably fresh, all are on 3.4.0 or higher:

I'll upgrade all the same and try again (current versions below)

packageVersions
> pkgs[, c("Package", "Version", "Built")]
          Package    Version Built
1         anytime      0.3.1 3.4.4
2      assertthat      0.2.0 3.4.4
3              BH   1.66.0-1 3.4.4
4           bindr      0.1.1 3.4.4
5        bindrcpp      0.2.2 3.4.4
6             cli      1.0.0 3.4.4
7          crayon      1.3.4 3.4.4
8            curl        3.2 3.4.4
9      data.table     1.11.4 3.4.4
10       devtools     1.13.6 3.4.4
11         digest     0.6.15 3.4.4
12          dplyr      0.7.6 3.4.4
13          fansi      0.2.3 3.4.4
14        formatR        1.5 3.4.4
15  futile.logger 1.4.3.9001 3.4.4
16 futile.options      1.0.1 3.4.4
17          git2r     0.23.0 3.4.4
18           glue      1.3.0 3.4.4
19        hashmap      0.2.2 3.4.4
20         httpuv      1.4.5 3.4.4
21           httr      1.3.1 3.4.4
22       jsonlite        1.5 3.4.4
23       lambda.r      1.2.3 3.4.4
24          later      0.7.3 3.4.4
25      lubridate      1.7.4 3.4.4
26       magrittr        1.5 3.4.4
27        memoise      1.1.0 3.4.4
28           mime        0.5 3.4.4
29 odds.converter       1.45 3.4.4
30        openssl      1.0.2 3.4.4
31         pillar      1.3.0 3.4.4
32      pkgconfig      2.0.1 3.4.4
33          plogr      0.2.0 3.4.4
34        plumber      0.4.6 3.4.4
35       promises      1.0.1 3.4.4
36          purrr      0.2.5 3.4.4
37             R6      2.2.2 3.4.4
38   RApiDatetime      0.0.3 3.4.4
39           Rcpp    0.12.18 3.4.4
40          rlang      0.2.1 3.4.4
41      abcdef      0.3.0 3.4.4
42    abcdeg        1.1.0 3.4.4
43        abcdefg            1.0.8 3.4.4
44      abcdefgh      1.2.4 3.4.4
45      abcdefgh      0.4.5 3.4.4
46     rstudioapi        0.7 3.4.4
47        rsyslog      1.0.1 3.4.4
48        stringi      1.2.4 3.4.4
49        stringr      1.3.1 3.4.4
50         tibble      1.4.2 3.4.4
51          tidyr      0.8.1 3.4.4
52     tidyselect      0.2.4 3.4.4
53           utf8      1.1.4 3.4.4
54           uuid      0.1-2 3.4.4
55        whisker      0.3-2 3.4.4
56          withr      2.1.2 3.4.4
57           base      3.4.4 3.4.4
58           boot     1.3-20 3.4.1
59          class     7.3-14 3.4.0
60        cluster      2.0.6 3.4.0

Will try running it off that container, and then under gdb

@shapenaji
Copy link
Contributor Author

shapenaji commented Aug 9, 2018

@wch Seems to be specifically an R 3.4.4 issue. I cannot reproduce using R 3.5, but as soon as I reinstalled R 3.4.4 I get the problem again

Inside the container I only get 3.5, so I was not able to reproduce with the given tools.

going to try with gdb and R 3.4.4

In depth details:

DETAILS

Host:

Outside of container:

R --no-save --no-restore -e "library(plumber);srv <- plumb(\"plumber.R\");srv\$run(port = 9500)"

plumber.R

#* return the sum of two numbers
#* @param a The first number to add
#* @param b The second number to add
#* @get /prediction
prediction <- function(a, b){
  message(a, appendLF=FALSE)
  message(paste(rep("\r", nchar(a)), collapse = ''), appendLF=FALSE)
  structure(list(x = c(9, 9, 9, 9, 9, 9), y = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), z = c(9,
9, 9, 9, 9, 9), w = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_)), .Names = c("x", "y",
"z", "w"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
}

Client

Using two urlfiles:

(if you want you can replace the /test1/ endpoint and just use a port, but in my network I need to use port 80

minigood:

http://xxx.xxx.xxx.xxx/test1/prediction?a=-5&b=-1
http://xxx.xxx.xxx.xxx/test1/prediction?a=-2&b=4
http://xxx.xxx.xxx.xxx/test1/prediction?a=-13&b=0
http://xxx.xxx.xxx.xxx/test1/prediction?a=4&b=-2
http://xxx.xxx.xxx.xxx/test1/prediction?a=-2&b=-3
http://xxx.xxx.xxx.xxx/test1/prediction?a=11&b=1
http://xxx.xxx.xxx.xxx/test1/prediction?a=-6&b=4
http://xxx.xxx.xxx.xxx/test1/prediction?a=-6&b=4
http://xxx.xxx.xxx.xxx/test1/prediction?a=9&b=2
http://xxx.xxx.xxx.xxx/test1/prediction?a=-11&b=1

minibad:

http://xxx.xxx.xxx.xxx/test1/api_fun?model_version=1.0.0&s_p=-8&o_s=-113&t_p=48&o_t=12
http://xxx.xxx.xxx.xxx/test1/api_fun?model_version=1.0.0&s_p=-7&o_s=94&t_p=50&o_t=102
http://xxx.xxx.xxx.xxx/test1/api_fun?model_version=1.0.0&s_p=-13&o_s=-101&t_p=33&o_t=-72
http://xxx.xxx.xxx.xxx/test1/api_fun?model_version=1.0.0&s_p=0&o_s=11&t_p=37&o_t=-77
http://xxx.xxx.xxx.xxx/test1/api_fun?model_version=1.0.0&s_p=2&o_s=38&t_p=49&o_t=-85
http://xxx.xxx.xxx.xxx/test1/api_fun?model_version=1.0.0&s_p=-7&o_s=16&t_p=43&o_t=-8
http://xxx.xxx.xxx.xxx/test1/api_fun?model_version=1.0.0&s_p=-11&o_s=74&t_p=40&o_t=-89
http://xxx.xxx.xxx.xxx/test1/api_fun?model_version=1.0.0&s_p=5&o_s=-82&t_p=40&o_t=85
http://xxx.xxx.xxx.xxx/test1/api_fun?model_version=1.0.0&s_p=15&o_s=-53&t_p=46&o_t=48
http://xxx.xxx.xxx.xxx/test1/api_fun?model_version=1.0.0&s_p=-13&o_s=-83&t_p=45&o_t=-71

siegerc:

failures = 15000

siege commands (I alternate between these):

siege -c100 -b -i -f minigood
siege -c100 -b -i -f minibad

Extra Notes
Tested with R, RDsan, and RDvalgrind inside the container.

with RDcsan, plumber doesn't actually start:

/usr/local/RDcsan/lib/R/site-library/Rcpp/include/Rcpp/routines.h:209:12: runtime error: call to function (unknown) through pointer to incorrect function type 'void *(*)(SEXPREC *)'
(/usr/local/RDcsan/lib/R/site-library/Rcpp/libs/Rcpp.so+0x285650): note: (unknown) defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/local/RDcsan/lib/R/site-library/Rcpp/include/Rcpp/routines.h:209:12 in

@shapenaji
Copy link
Contributor Author

shapenaji commented Aug 9, 2018

R 3.4.4 with gdb
R -d gdb --no-save --no-restore -e "library(plumber);srv <- plumb(\"plumber.R\");srv\$run(port = 9500)"
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/R/bin/exec/R...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/lib/R/bin/exec/R --no-save --no-restore -e library\(plumber\)\;srv\~+\~\<-\~+\~plumb\(\"plumber.R\"\)\;srv\$run\(port\~+\~=\~+\~9500\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(plumber);srv <- plumb("plumber.R");srv$run(port = 9500)
Starting server to listen on port 9500
[New Thread 0x7fffefe55700 (LWP 8060)]
[New Thread 0x7fffef654700 (LWP 8110)]
-72
Thread 2 "R" received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7fffefe55700 (LWP 8060)]
0x00007ffff75a24bd in write () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.

(gdb) step
Warning: stack imbalance in '{', 165 then 99
Warning: stack imbalance in 'standardGeneric', 156 then 90
Warning: stack imbalance in '{', 152 then 86
0x00007ffff78c0f29 in ?? () from /usr/lib/R/lib/libR.so

(gdb) backtrace
#0  0x00007ffff75a24bd in write () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff00b6ad3 in uv__write (stream=stream@entry=0x7fffe80252b0) at src/unix/stream.c:843
#2  0x00007ffff00b7fe4 in uv_write2 (req=0x7fffe8012490, stream=0x7fffe80252b0, bufs=0x7fffe8012558, nbufs=nbufs@entry=1, send_handle=send_handle@entry=0x0, cb=cb@entry=0x7ffff00a1af0 <writecb(uv_write_t*, int)>) at src/unix/stream.c:1473
#3  0x00007ffff00b804b in uv_write (req=<optimized out>, handle=<optimized out>, bufs=<optimized out>, nbufs=nbufs@entry=1, cb=cb@entry=0x7ffff00a1af0 <writecb(uv_write_t*, int)>) at src/unix/stream.c:1498
#4  0x00007ffff00a1a58 in ExtendedWrite::next (this=0x7fffe802ff50) at uvutil.cpp:104
#5  0x00007ffff00a1ae5 in ExtendedWrite::begin (this=<optimized out>) at uvutil.cpp:71
#6  0x00007ffff0095813 in HttpResponse::onResponseWritten (this=this@entry=0x15f05d0, status=status@entry=0) at httpresponse.cpp:128
#7  0x00007ffff009598d in on_response_written (handle=0x7fffe800fb70, status=0) at httpresponse.cpp:17
#8  0x00007ffff00b6f48 in uv__write_callbacks (stream=stream@entry=0x7fffe80252b0) at src/unix/stream.c:961
#9  0x00007ffff00b7101 in uv__stream_io (loop=<optimized out>, w=0x7fffe8025338, events=<optimized out>) at src/unix/stream.c:1334
#10 0x00007ffff00b137b in uv__run_pending (loop=0x7ffff02e2ea0 <io_loop>) at src/unix/core.c:778
#11 uv_run (loop=loop@entry=0x7ffff02e2ea0 <io_loop>, mode=mode@entry=UV_RUN_DEFAULT) at src/unix/core.c:360
#12 0x00007ffff0097021 in io_thread (data=0x7fffffffa3a0) at httpuv.cpp:114
#13 0x00007ffff75996ba in start_thread (arg=0x7fffefe55700) at pthread_create.c:333
#14 0x00007ffff72cf41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Backtrace:

(gdb) backtrace
#0  0x00007ffff75a24bd in write () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff00b6ad3 in uv__write (stream=stream@entry=0x7fffe80252b0) at src/unix/stream.c:843
#2  0x00007ffff00b7fe4 in uv_write2 (req=0x7fffe8012490, stream=0x7fffe80252b0, bufs=0x7fffe8012558, nbufs=nbufs@entry=1, send_handle=send_handle@entry=0x0, cb=cb@entry=0x7ffff00a1af0 <writecb(uv_write_t*, int)>) at src/unix/stream.c:1473
#3  0x00007ffff00b804b in uv_write (req=<optimized out>, handle=<optimized out>, bufs=<optimized out>, nbufs=nbufs@entry=1, cb=cb@entry=0x7ffff00a1af0 <writecb(uv_write_t*, int)>) at src/unix/stream.c:1498
#4  0x00007ffff00a1a58 in ExtendedWrite::next (this=0x7fffe802ff50) at uvutil.cpp:104
#5  0x00007ffff00a1ae5 in ExtendedWrite::begin (this=<optimized out>) at uvutil.cpp:71
#6  0x00007ffff0095813 in HttpResponse::onResponseWritten (this=this@entry=0x15f05d0, status=status@entry=0) at httpresponse.cpp:128
#7  0x00007ffff009598d in on_response_written (handle=0x7fffe800fb70, status=0) at httpresponse.cpp:17
#8  0x00007ffff00b6f48 in uv__write_callbacks (stream=stream@entry=0x7fffe80252b0) at src/unix/stream.c:961
#9  0x00007ffff00b7101 in uv__stream_io (loop=<optimized out>, w=0x7fffe8025338, events=<optimized out>) at src/unix/stream.c:1334
#10 0x00007ffff00b137b in uv__run_pending (loop=0x7ffff02e2ea0 <io_loop>) at src/unix/core.c:778
#11 uv_run (loop=loop@entry=0x7ffff02e2ea0 <io_loop>, mode=mode@entry=UV_RUN_DEFAULT) at src/unix/core.c:360
#12 0x00007ffff0097021 in io_thread (data=0x7fffffffa3a0) at httpuv.cpp:114
#13 0x00007ffff75996ba in start_thread (arg=0x7fffefe55700) at pthread_create.c:333
#14 0x00007ffff72cf41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

could be triggered by cancelling the siege... seems to happen as soon as I hit CTRL+C

@shapenaji
Copy link
Contributor Author

R 3.5 release notes appear to have made a large number of changes to buffered connections, related?

https://cran.r-project.org/doc/manuals/r-release/NEWS.html

@atheriel
Copy link
Contributor

@shapenaji and I worked on this for a while, but we haven't been able to come up with an easily reproducible test case. However, I'm quite sure that I've found the source of the segmentation fault: SIGPIPE is getting handled off the main thread when under heavy load, which of course crashes R. I'm not sure if this is a bug in httpuv alone, or if it has something to do with the way that plumber uses httpuv.

Here's a walkthrough of what this looks like for us in gdb. Note that I've compiled R 3.4.4 (which is the latest version for Ubuntu) with debug symbols. I'm running an extremely basic plumber.R file, which I can provide, but I don't believe it's important.

I've also set a breakpoint on handlePipe() in R's main.c file, for further illustration.

$ R-3.4.4/bin/R -d "gdb --silent" --no-save --no-restore -e "library(plumber);srv <- plumb(\"plumber.R\");srv\$run(port = 9500)"
Reading symbols from /opt/api/testapi/R-3.4.4/bin/exec/R...done.
(gdb) break main.c:handlePipe
No source file named main.c.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (main.c:handlePipe) pending.
(gdb) run
Starting program: /opt/api/testapi/R-3.4.4/bin/exec/R --no-save --no-restore -e library\(plumber\)\;srv\~+\~
\<-\~+\~plumb\(\"plumber.R\"\)\;srv\$run\(port\~+\~=\~+\~9500\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(plumber);srv <- plumb("plumber.R");srv$run(port = 9500)
Starting server to listen on port 9500
[New Thread 0x7ffff01f9700 (LWP 23240)]
[New Thread 0x7fffef9f8700 (LWP 23264)]

I think httpuv is setting up the third thread only when necessary; it only seems to appear under load.

After a while:

Thread 2 "R" received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7ffff01f9700 (LWP 23240)]
0x00007ffff75404bd in write () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) info threads
  Id   Target Id         Frame 
  1    Thread 0x7ffff7fdf7c0 (LWP 23233) "R" 0x00007ffff7851442 in duplicate1 (s=0x331b478, deep=TRUE)
    at duplicate.c:344
* 2    Thread 0x7ffff01f9700 (LWP 23240) "R" 0x00007ffff75404bd in write ()
    at ../sysdeps/unix/syscall-template.S:84
  3    Thread 0x7fffef9f8700 (LWP 23264) "R" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

Eventually there's a broken socket, but for some reason the signal ends up on thread 2, as you can see above. Further, the handlePipe() function gets called on that thread:

(gdb) continue
Continuing.

Thread 2 "R" hit Breakpoint 1, handlePipe (dummy=13) at main.c:443
443         signal(SIGPIPE, handlePipe);

Since R is not thread-safe, and handlePipe() calls Rf_error(), this is really bad news:

(gdb) continue
Continuing.

Thread 2 "R" received signal SIGSEGV, Segmentation fault.
0x00007ffff7817037 in R_restore_globals (cptr=0x7ffffffd3170) at context.c:181
181             SET_PRSEEN(R_PendingPromises->promise, 2);

Where the segmentation fault occurs seems to be a bit random; often it is in the line above, but we saw it in several other places. It probably just depends on where the main thread happens to be when SIGPIPE shows up. I can provide full backtrace output but I think it's irrelevant in this case.

Does this information help? Are we doing something wrong configuring plumber here?

@wch
Copy link
Collaborator

wch commented Aug 13, 2018

@atheriel Thanks for the detailed investigation!

httpuv starts another thread when a server is started (which is what plumber does). A httpuv application may also use the later package to schedule tasks, and this can cause a third thread to start.

In the stack traces you provided, thread 2 was running httpuv code. If the third thread is running code from later, it will look something like this:

(gdb) bt
#0  0x00007ffff51b99f3 in futex_wait_cancelable (private=<optimized out>, expected=0, 
    futex_word=0x7ffff001b458 <(anonymous namespace)::timer+120>)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, 
    mutex=0x7ffff001b408 <(anonymous namespace)::timer+40>, 
    cond=0x7ffff001b430 <(anonymous namespace)::timer+80>) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7ffff001b430 <(anonymous namespace)::timer+80>, 
    mutex=0x7ffff001b408 <(anonymous namespace)::timer+40>) at pthread_cond_wait.c:655
#3  0x00007fffefe0e25c in Timer::bg_main (this=0x7ffff001b3e0 <(anonymous namespace)::timer>)
    at timer_posix.cpp:21
#4  0x00007fffefe0e1c4 in Timer::bg_main_func (data=0x7ffff001b3e0 <(anonymous namespace)::timer>)
    at timer_posix.cpp:10
#5  0x00007ffff51b36db in start_thread (arg=0x7fffeeed6700) at pthread_create.c:463
#6  0x00007ffff745788f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

FWIW, in a clean R session, you can cause later to start a new thread with the following:

later(function() cat("hello\n"))

I'm not sure at this point exactly how the thread and signal handling is working, but if it is the case that the SIGPIPE is caused by something that happens on thread 2 (httpuv), and then it causes that thread to execute internal R code, that could trigger the kind of problems you've seen. R is not thread-safe, so we have tried very hard in httpuv to not accidentally execute internal R code on a background thread. But a signal handler could conceivably bypass the safeguards we've put in place to avoid these problems.

@atheriel
Copy link
Contributor

In my debugging it was always the second thread (the one executing uv_write()) that was receiving SIGPIPE and therefore causing the problem, but it's possible that it is only in the presence of the later thread that this happens, which would be consistent with what I observed. That may be pure coincidence, however.

It might be possible to prevent the threads started by httpuv from receiving SIGPIPE via pthread_sigmask(), but I don't have a sophisticated-enough understanding of how the package (and the underlying libuv library) work to actually test this. I know that in the past httpuv turned off SIGPIPE handling explicitly, but I'm not sure where that code went to after the version 1.4 refactor -- was it removed for a reason?

@wch
Copy link
Collaborator

wch commented Aug 13, 2018

@atheriel You're right: previously, httpuv did ignore SIGPIPE (rstudio/httpuv@fe95d0d8). It's possible this workaround got lost when we a big overhaul of httpuv's internals.

@schloerke
Copy link
Collaborator

schloerke commented Oct 17, 2018

Remaining actions:

  • httpuv gets a CRAN release
  • Need to update plumber to require httpuv > 1.4.5 in Description

@schloerke schloerke added the effort: low < 1 day of work label Oct 17, 2018
@schloerke schloerke self-assigned this Oct 17, 2018
@schloerke schloerke added the priority: high Must be fixed before next release label Oct 17, 2018
@schloerke
Copy link
Collaborator

remaining actions moved to PR above. Still waiting on httpuv release

@schloerke
Copy link
Collaborator

Merged to master. Currently, httpuv is still a remote.

@atheriel Sorry for the delay, I thought httpuv would be released sooner.

@arpankbasak
Copy link

Is there a direct way to resolve the segmentation fault? I have tried subsetting the GTF file but it shouts the same as @shapenaji error.

@wch
Copy link
Collaborator

wch commented Aug 30, 2019

@arpankbasak If you're encountering a similar problem, please file a new issue, preferably with a reproducible example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty: advanced Best for maintainers to address effort: low < 1 day of work priority: high Must be fixed before next release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants