Process hangs when trying to use LightGBM with RestRServe #2217

Closed
demirev opened this issue Jun 4, 2019 · 9 comments

demirev commented Jun 4, 2019

I am trying to train a LightGBM model with the R package and then serve it using RestRserve. However, after I make a request to the microservice, the process hangs and I receive no response. There are no error or warning messages either (usually, when an R error occurs, RestRserve prints it to the console).

RestRserve (and Rserve, as far as I know) forks the R process every time a call is made to the API. Running top shows that the forked process has been created and that some memory has been allocated. In the actual workflow where I encountered the issue, CPU usage by the forked process initially increases and then drops to 0. In the example below CPU usage is negligible, so I couldn't track it.

Here is a minimal reproducible example:

library(lightgbm)
library(RestRserve)

data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
train <- agaricus.train
test <- agaricus.test
bst <- lightgbm(
  data = train$data,
  label = train$label,
  num_leaves = 4,
  learning_rate = 1,
  nrounds = 2,
  objective = "binary"
)


dummy_api_function  <- function(request, response) {
  result <- predict(bst, test$data)[1]
  response$body <- jsonlite::toJSON(result)
  response$content_type <- "application/json"
  response$headers <- character(0)
  response$status_code <- 200L
  forward()
}

RestRserveApp <- RestRserve::RestRserveApplication$new()
RestRserveApp$add_post(path = "/api/dummy_api", FUN = dummy_api_function)
RestRserveApp$run(8001)

Here is a curl request to test the API (submitting it causes the process to hang):

curl --header "Content-Type: application/json" --request POST --data '{"foo":"bar"}' localhost:8001/api/dummy_api

For a working example, replace `result <- predict(bst, test$data)[1]` with `result <- 1` and submit the same curl request.

Here is some info about my environment:

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=bg_BG.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=bg_BG.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=bg_BG.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=bg_BG.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Matrix_1.2-12       RestRserve_0.1.0.13 lightgbm_2.2.4      R6_2.4.0           

loaded via a namespace (and not attached):
[1] compiler_3.4.4    magrittr_1.5      tools_3.4.4       yaml_2.2.0        Rserve_1.7-3      grid_3.4.4        data.table_1.12.0 jsonlite_1.6     
[9] lattice_0.20-35

StrikerRUS added the r-package label Jun 4, 2019

deann88 commented Jun 7, 2019

I have stumbled upon the same issue, and it is actually a very big problem for R in general. If you are developing models for production, LightGBM is one of the most useful libraries out there, and if you cannot deploy it with Rserve, what else could you use? Python!?
I believe RestRserve is a great contribution to the R ecosystem and the R community should pay more attention to productionizing models using R.

s-u commented Jun 7, 2019

This is likely a problem in LightGBM in that it's probably not fork-safe, so try loading the package in the client code.

StrikerRUS (Collaborator) commented Jun 12, 2019

I guess Question 11 in our FAQ is about this issue.
Ping @Laurae2

deann88 commented Jun 12, 2019

I confirm this fixes my issue - both nthreads and num_threads work in R.
Thanks, @StrikerRUS!
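
For readers landing here, the fix confirmed above amounts to limiting LightGBM's thread count. A minimal sketch applied to the example from the original post follows. The value `num_threads = 1` and the use of the `params` list are assumptions on my part; the thread itself only confirms the parameter names `nthreads` and `num_threads`.

```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")

# Same model as in the original report, restricted to a single thread.
# num_threads = 1 is an assumed value; the thread only names the parameter.
bst <- lightgbm(
  data = agaricus.train$data,
  label = agaricus.train$label,
  params = list(
    objective = "binary",
    num_leaves = 4,
    learning_rate = 1,
    num_threads = 1
  ),
  nrounds = 2
)

# The prediction call that hung inside the forked request handler
result <- predict(bst, agaricus.test$data)[1]
```

With a single thread, training and prediction will be slower on large data, so this is a trade-off for being able to predict inside a forked process.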

demirev (Author) commented Jun 12, 2019

Yes, seems that was it. I'm closing this issue. Thank you for looking into it!

demirev closed this Jun 12, 2019

Laurae2 (Collaborator) commented Jun 12, 2019

@demirev @deann88 @s-u For the exact details: this is caused by gcc's implementation of OpenMP. Once a process has run OpenMP code and then forks, any OpenMP code in the forked process hangs indefinitely. This is a known issue and no solution to it exists yet.

If you really need to run OpenMP code (any OpenMP code), then fork, and then run OpenMP code again in the fork, the way to avoid this issue is to compile R with a compiler other than gcc. Example: icc (not "free") instead of gcc.

s-u commented Jun 12, 2019

@Laurae2 thanks for the details, that's very useful. We have seen issues with gomp in the past, so it's not really a surprise. I have just confirmed that iomp doesn't seem to have that issue while gomp does, so using clang instead of gcc seems to fix the problem.

dselivanov commented Jun 12, 2019

@s-u does it mean that R should be built with clang? Or will it be enough to just build Rserve and/or LightGBM with clang?

s-u commented Jun 12, 2019

Building LightGBM with clang should be sufficient as a first shot, but make sure the package actually gets built with clang. The most important part is to make sure the linking is done against libomp. The general advice would be to build R with clang (which is what I do for CRAN releases), since that way everything is compatible and uses clang. In essence, all OpenMP code should be built with clang so it doesn't suffer from the issue.

One thing I didn't try: you could attempt to swap libgomp for libomp without rebuilding. It used to work since both were ABI-compatible, but I don't know if that is still the case.
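
One way to check which OpenMP runtime an installed package actually links against is to inspect its shared objects with `ldd`. The sketch below is only an illustration: `openmp_runtimes()` is a hypothetical helper, not part of LightGBM or Rserve. It assumes Linux, that `ldd` is on the PATH, and that the package keeps its compiled library under its `libs` directory. A statically linked runtime would not show up in this output.

```r
# Hypothetical helper: list the OpenMP runtimes that a package's
# shared objects are dynamically linked against (Linux, needs `ldd`).
openmp_runtimes <- function(pkg = "lightgbm") {
  lib_dir <- system.file("libs", package = pkg)
  if (!nzchar(lib_dir)) {
    return(character(0))  # package not installed, or no compiled code
  }
  so_files <- list.files(lib_dir, pattern = "\\.so$",
                         recursive = TRUE, full.names = TRUE)
  # Run ldd on every shared object and collect the output lines
  deps <- unlist(lapply(so_files, function(f) {
    suppressWarnings(system2("ldd", shQuote(f), stdout = TRUE, stderr = FALSE))
  }))
  # libgomp is gcc's runtime; libomp and libiomp5 are the LLVM and Intel ones
  trimws(grep("lib(gomp|omp|iomp5)", deps, value = TRUE))
}

openmp_runtimes("lightgbm")
```

If the output mentions libgomp, the package was linked against gcc's runtime, which is the one reported above to hang after a fork.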
