[R] slow when function returning R6 object #539
I think |
The problem is reproducible after restarting the machine. |
@BlindApe
|
BTW, |
@BlindApe label must be a vector like in xgboost. |
@guolinke, @Laurae2, thanks for your support. As you can see, this code takes 1 minute with a matrix of 20 columns, nearly 5 minutes with 100 columns, and an eternity (I'm still waiting for it to crash) with 1300 columns. This is the sessionInfo(): locale: attached base packages: other attached packages: loaded via a namespace (and not attached): It's a dual Xeon server with 2 x E5-2690 v2 and 256 GB RAM. Code:
|
With xgboost (dataset creation with 1300 columns takes a while, but setinfo is a blink):
|
I can't reproduce it. I am getting this:
> library(R.utils)
> library(lightgbm)
> data <- matrix(1, 4000000, 20)
> label <- rep(0, 4000000)
> train <- lgb.Dataset(data = data, label = label)
> Laurae::timer_func_print(setinfo(train, 'label', label))
The function ran in 2.001 milliseconds.
[1] 2.000977

> library(R.utils)
> library(lightgbm)
> data <- matrix(1, 4000000, 100)
> label <- rep(0, 4000000)
> label2 <- rep(1, 4000000)
> train <- lgb.Dataset(data = data, label = label)
> Laurae::timer_func_print(setinfo(train, 'label', label2))
The function ran in 3.004 milliseconds.
[1] 3.00415

From a fully fresh session:
> library(R.utils)
Loading required package: R.oo
Loading required package: R.methodsS3
R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.
R.oo v1.21.0 (2016-10-30) successfully loaded. See ?R.oo for help.
Attaching package: ‘R.oo’
The following objects are masked from ‘package:methods’:
getClasses, getMethods
The following objects are masked from ‘package:base’:
attach, detach, gc, load, save
R.utils v2.5.0 (2016-11-07) successfully loaded. See ?R.utils for help.
Attaching package: ‘R.utils’
The following object is masked from ‘package:RevoMods’:
timestamp
The following object is masked from ‘package:utils’:
timestamp
The following objects are masked from ‘package:base’:
cat, commandArgs, getOption, inherits, isOpen, parse, warnings
> library(lightgbm)
Loading required package: R6
Warning message:
package ‘R6’ was built under R version 3.3.3
> data <- matrix(1, 4000000, 100)
> label <- rep(0, 4000000)
> label2 <- rep(1, 4000000)
> train <- lgb.Dataset(data = data)
> Laurae::timer_func_print(setinfo(train, 'label', label2))
The function ran in 1.001 milliseconds.
[1] 1.000977

Using a laptop here. Did I miss something from your script? |
The problem is calling setinfo after lgb.Dataset. Setting data and label at the same time isn't a problem:
|
@BlindApe My last example was done without setting data and label at the same time. It still takes about 1 millisecond. However, when I replace all 1s with 0s, I get the issue you mentioned (but it takes only 10 seconds here, not 5 minutes). @guolinke any reason why it is slower when using 0s instead of 1s? It also seems to duplicate the data, which doubles the memory used. |
This is on another server, running a different R version under Windows:
|
Can you change |
I was doing:
|
@BlindApe |
@BlindApe |
It seems the problem isn't in as.numeric(). On the Linux server:
On the Windows server:
|
Time is proportional to the number of rows and columns, and the behavior is the same on both Windows and Linux.
|
81.27s:
require(lightgbm)
data <- matrix(1, 4000000, 20)
label <- rep(0, 4000000)
tic = proc.time()[3]
label <- as.numeric(label)
cat(proc.time()[3] - tic, 'secs', '\n')
print(typeof(label))
print(head(label))
train <- lgb.Dataset(data = data)
tic = proc.time()[3]
train$setinfo('label', label)
cat(proc.time()[3] - tic, 'secs', '\n') # 81.27 secs
But running this is really strange, as it does exactly the same thing but faster:
The behavior is really strange when out of a function and running in the global environment. |
I think the message I got the first time I tried with 1152 columns could give a hint: Error in paste(as.character(obj), collapse = " ") : It looks as if all the data were being pasted together internally in some way... |
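To make that hint concrete: the call quoted in the error, `paste(as.character(obj), collapse = " ")`, converts every element of `obj` to a string and joins them, so its cost should grow with rows times columns. The snippet below is only a rough base R sketch of that cost, not LightGBM or R6 code; `collapse_all()` is a hypothetical helper and the matrix sizes are deliberately small.

```r
# Base R only; no LightGBM involved. collapse_all() is a hypothetical helper
# that mirrors the call quoted in the error message.
collapse_all <- function(obj) {
  paste(as.character(obj), collapse = " ")
}

# Time the call on matrices of growing width (sizes kept small on purpose).
for (n_col in c(20, 100, 500)) {
  m <- matrix(1, 10000, n_col)
  elapsed <- system.time(s <- collapse_all(m))[["elapsed"]]
  cat(n_col, "columns:", nchar(s), "characters,", elapsed, "secs\n")
}
```

If something runs a call like this on a 4000000-row matrix held inside an object, a delay of seconds to minutes would not be surprising.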
@Laurae2 @BlindApe And it calls |
I ran this:
library(lightgbm)
data <- matrix(1, 4000000, 20)
label <- rep(1, 4000000)
train <- lgb.Dataset(data = data)
debugonce(train$setinfo)
train$setinfo("label", label)
It seems there is a problem with
This post shows an issue with assignments on R6 objects and type checking, leading to 30x slower assignments on small objects: https://cran.r-project.org/web/packages/R6/vignettes/Performance.html |
@Laurae2 I've reproduced the behaviour by running it inside a naive function:
So a temporary workaround is to call it inside a function. |
@Laurae2 What is the state of this issue now? |
@guolinke I can't seem to fix this currently. I managed to delay the issue, but that is not a good workaround. |
@BlindApe Your/Our workaround works until we print the
lgb.unloader(wipe = TRUE)
require(lightgbm)
nothing <- function() {
data <- matrix(1, 4000000, 20)
label <- rep(0, 4000000)
tic = proc.time()[3]
label <- as.numeric(label)
cat(proc.time()[3] - tic, 'secs', '\n')
print(typeof(label))
print(head(label))
train <- lgb.Dataset(data = data)
tic = proc.time()[3]
train$setinfo('label', label)
cat(proc.time()[3] - tic, 'secs', '\n')
return(train)
}
train <- nothing() # FAST
train # SLOW |
@Laurae2 |
@guolinke The cause is in the screenshot: I have no idea what this
In RStudio, doing this reproduces the issue exactly:
library(profvis)
profvis({
library(lightgbm)
data <- matrix(1, 4000000, 20)
label <- rep(1, 4000000)
train <- lgb.Dataset(data = data)
train$setinfo("label", label)
print(train)
}) |
I think this https://stat.ethz.ch/R-manual/R-devel/library/base/html/invisible.html may solve this issue. |
@guolinke the issue is still there when returning invisibly. I am starting to think it is the fault of the environment / R6 object, and not of LightGBM. The print function is 100% unrelated to LightGBM, even though LightGBM seems to trigger the issue. |
@Laurae2 after using |
@BlindApe can you try the latest code and see what happens? |
@guolinke It defers the issue to when the object is printed. We may add a warning to the docs about it. |
Closing since it is an R6 bug.
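For anyone hitting this with their own R6 classes: if the slowdown really does come from R6's default print summarising a large atomic field (which is what the `paste(as.character(obj), collapse = " ")` error suggests), one possible mitigation is to give the class its own `print` method, since R6 calls a class's `print` method when one is defined. A minimal sketch, assuming the R6 package is installed; `BigHolder` is a hypothetical class and this is not how `lgb.Dataset` is implemented.

```r
library(R6)

# Hypothetical class, not part of LightGBM: it only stores a large atomic field.
BigHolder <- R6Class("BigHolder",
  public = list(
    raw_data = NULL,
    initialize = function(raw_data) {
      self$raw_data <- raw_data
    },
    # A user-defined print method is used instead of R6's default member
    # summary, so printing only reads the dimensions of raw_data.
    print = function(...) {
      cat("<BigHolder> ", nrow(self$raw_data), " x ", ncol(self$raw_data),
          "\n", sep = "")
      invisible(self)
    }
  )
)

holder <- BigHolder$new(matrix(1, 1000, 20))
out <- capture.output(print(holder))
```

Typing `holder` at the console then prints one short line regardless of how large `raw_data` is. Whether this also avoids the delay seen in RStudio's global environment has not been checked here.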
BlindApe commented May 22, 2017
I'll give a detailed environment sheet and a reproducible example, but meanwhile this is the problem:
level1 has 4214186 rows and 1152 columns.