split_indices crashes on not-so-large input #179

Closed
vladpetyuk opened this Issue Nov 18, 2013 · 1 comment

@vladpetyuk

Here is a toy example where acast crashes (segfault) because of split_indices. The memory footprint of the objects should not be an issue for a computer with 16 GB of RAM.
A similar issue has been discussed here:
http://stackoverflow.com/questions/14548570/plyr-split-indices-function-crashes-for-long-vectors

# This example is fine
library("reshape2")
indata <- data.frame(A=rep(1:10000,20), B=rep(1:100,200)) 
print(object.size(indata),units="Mb") # 1.5 Mb
outdata <- acast(indata, A ~ B)
print(object.size(outdata),units="Mb") # 4.4 Mb

# This one crashes
indata <- data.frame(A=rep(1:100000,20), B=rep(1:100,2000)) 
print(object.size(indata),units="Mb") # 15.3 Mb
outdata <- acast(indata, A ~ B) # <- crashes here !!
print(object.size(outdata),units="Mb")

The problem seems to be in this call:

.Call("split_indices", group, as.integer(n))
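
For anyone who wants to cross-check the grouping without going through the compiled routine, here is a minimal pure-R sketch. It assumes that split_indices returns, for each group id from 1 to n, the positions at which that id occurs in group. The name split_indices_r is made up for illustration and is not part of plyr.

```r
# Pure-R sketch of the grouping that split_indices is assumed to perform:
# element i of the result holds the positions where group == i.
# split_indices_r is a hypothetical helper, not a plyr function.
split_indices_r <- function(group, n = max(group)) {
  group <- as.integer(group)
  # Validate the ids up front so that bad input raises an R error.
  stopifnot(!anyNA(group), all(group >= 1L), all(group <= n))
  # Fixing the factor levels keeps empty groups as integer(0) entries.
  unname(split(seq_along(group), factor(group, levels = seq_len(n))))
}

group <- c(1L, 3L, 1L, 2L, 3L)
idx <- split_indices_r(group, 3L)
str(idx)
```

This is much slower than compiled code on large inputs, so it is only meant as a reference for comparing results on small cases.
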
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reshape2_1.2.2

loaded via a namespace (and not attached):
[1] plyr_1.8      stringr_0.6.2 tools_3.0.1  

Thanks!
Vlad

@vladpetyuk vladpetyuk closed this Nov 18, 2013
@vladpetyuk

Solved by installing the latest version straight from GitHub.
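
A rough sketch of what that might look like. The comment above does not say which package was reinstalled. This assumes it was plyr (the linked Stack Overflow question attributes split_indices to plyr) and that its development sources live in the hadley/plyr repository. The helper is_newer_than is a made-up name used only to confirm that a newer build ended up installed.

```r
# Assumption: the fix is in the plyr development sources.
# Not run here because it needs network access and build tools:
# install.packages("devtools")
# devtools::install_github("hadley/plyr")

# After restarting R, confirm that a newer build is actually installed.
# is_newer_than is a hypothetical helper; "1.8" is the plyr version
# shown in the sessionInfo() output of the report.
is_newer_than <- function(installed, reported = "1.8") {
  package_version(installed) > package_version(reported)
}

# Example with a made-up development version string:
is_newer_than("1.8.0.99")
# In a real session:
# is_newer_than(as.character(packageVersion("plyr")))
```
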
