
Error: cannot allocate vector of size 2.0 Gb #3

Open
Nadahuihui opened this issue Jul 14, 2019 · 5 comments

@Nadahuihui

Hi,
The loggle package is a very handy tool! I have recently been using it on some gene data, but I ran into a problem. With p = 500 the fit runs fine (I set num.thread = 1); when p increases to 1000 (with num.thread = 4), the run fails. What I eventually need, however, is p = 5000.
The code and error are as follows:

result1 <- loggle.cv.h(braingene, pos, h = 0.26,
                       d.list = c(0, 0.02, 0.05),
                       lambda.list = 10 ^ c(-0.5, -0.3), cv.fold = 3,
                       fit.type = "pseudo", num.thread = 4)

Using d.list: 0 0.02 0.05
Using lambda.list: 0.3162278 0.5011872
Detrending each variable in data matrix...

Running fold 1 out of 3 folds...
Generating sample correlation matrices for training dataset...
Error: cannot allocate vector of size 2.0 Gb
Called from: array(0, c(p, p, N))
Browse[1]>
The source of array() is as follows:
function (data = NA, dim = length(data), dimnames = NULL)
{
    if (is.atomic(data) && !is.object(data))
        return(.Internal(array(data, dim, dimnames)))
    data <- as.vector(data)
    if (is.object(data)) {
        dim <- as.integer(dim)
        if (!length(dim))
            stop("'dim' cannot be of length 0")
        vl <- prod(dim)
        if (length(data) != vl) {
            if (vl > .Machine$integer.max)
                stop("'dim' specifies too large an array")
            data <- rep_len(data, vl)
        }
        if (length(dim))
            dim(data) <- dim
        if (is.list(dimnames) && length(dimnames))
            dimnames(data) <- dimnames
        data
    }
    else .Internal(array(data, dim, dimnames))
}

@jlyang1990
Owner

Hi,

Thanks for your interest in this package. This issue is most likely an out-of-memory error; the sketch below illustrates the scale involved. I have listed some suggestions, and you could try one or several of them and see whether they help resolve the issue.
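
A back-of-envelope sketch of the memory involved (assuming 8-byte doubles and the array(0, c(p, p, N)) allocation in your traceback; N = 268 is only a guess that reproduces the 2.0 Gb figure):

# One p x p x N array of doubles costs 8 * p^2 * N bytes.
p <- 1000; N <- 268
8 * p^2 * N / 2^30        # ~2.0 Gb, the size of the failed allocation
8 * 5000^2 * 100 / 2^30   # at p = 5000, even N = 100 needs ~18.6 Gb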

  1. Run your code on a server instead of a personal laptop, since a server typically has much more memory available for your data.
  2. Reduce the number of time points. For example, if there are 10,000 time points in your data, you can split them into 10 consecutive pieces of 1,000 time points each and fit "loggle" on each piece (see the sketch after this list). Just keep in mind that all the hyper-parameters used in "loggle" should be adjusted accordingly.
  3. Reduce the dimension P. You mentioned that you eventually need P=5000, which in my experience would occupy a huge amount of memory. Before fitting "loggle", you could double-check whether all 5,000 genes are needed, and whether they naturally split into several clusters, say 10 clusters of P=500 each, and fit "loggle" on each cluster (also sketched below).
  4. Increase the value of lambda and decrease the value of d. This may not resolve your issue directly, but it is good practice for saving memory in your follow-up model fitting.
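
A minimal sketch of suggestions 2 and 3 (the chunk sizes and the per-block choice of pos are illustrative only; braingene and the hyper-parameters are taken from your call above, and braingene is assumed to be a p x N matrix with variables in rows and time points in columns):

library(loggle)

# Suggestion 2: fit on consecutive blocks of ~1,000 time points each.
N <- ncol(braingene)
time.blocks <- split(seq_len(N), ceiling(seq_len(N) / 1000))
fits.time <- lapply(time.blocks, function(idx) {
  loggle.cv.h(braingene[, idx], pos = seq(1, length(idx), by = 10),
              h = 0.26, d.list = c(0, 0.02, 0.05),
              lambda.list = 10 ^ c(-0.5, -0.3), cv.fold = 3,
              fit.type = "pseudo", num.thread = 1)
})

# Suggestion 3: fit separately on clusters of ~500 genes each.
gene.clusters <- split(seq_len(nrow(braingene)),
                       ceiling(seq_len(nrow(braingene)) / 500))
fits.gene <- lapply(gene.clusters, function(g) {
  loggle.cv.h(braingene[g, ], pos, h = 0.26,
              d.list = c(0, 0.02, 0.05),
              lambda.list = 10 ^ c(-0.5, -0.3), cv.fold = 3,
              fit.type = "pseudo", num.thread = 1)
})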

Let me know if these suggestions would be helpful to you.

@Nadahuihui
Author

Nadahuihui commented Jul 20, 2019 via email

@jlyang1990
Owner

Hi Huihui,

  1. I see; d = 0.8 might be too large for your use case. Could you try h = 0.26, d.list = c(0.15, 0.3), lambda.list = c(0.25, 0.3) instead (see the sketch after this list)? If it still doesn't work, could you send me your data and code so that I can test-run it on my side, if possible?
  2. You may need to install the R package "igraph" in advance.
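
In code, suggestion 1 is the same call as your original attempt with only d.list and lambda.list changed (braingene and pos are your own objects, assumed here):

result1 <- loggle.cv.h(braingene, pos, h = 0.26,
                       d.list = c(0.15, 0.3),
                       lambda.list = c(0.25, 0.3), cv.fold = 3,
                       fit.type = "pseudo", num.thread = 4)

# Suggestion 2: install the dependency before running loggle.
install.packages("igraph")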

Let me know if these would be helpful.

@Nadahuihui
Author

Nadahuihui commented Jul 22, 2019 via email

@jlyang1990
Owner

Hi Huihui,

Sorry for the late reply; I somehow did not receive the reminder email for your question.

  1. This is derived from the theory of kernel density estimation; see https://en.wikipedia.org/wiki/Kernel_density_estimation for more details. In practice, we can set h within the grid 0.1, 0.15, 0.2, 0.25, 0.3 and check the performance at each value (see the sketch after this list).
  2. "pos" determines the positions at which we build the graphs; we always use 100% of the data in estimating them. For example, if we have observed data at time points 1, 2, ..., 101, we can build graphs only at time points 1, 11, 21, ..., 101 by setting pos = c(1, 11, 21, ..., 101).
  3. This issue is quite strange. I would suggest checking the estimated graphs to see whether some of them are too sparse, since the selected λ value is quite large. Also, check the estimated graphs corresponding to the Inf cv.scores to see whether they share similar patterns.
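
A minimal sketch of items 1 and 2 (the 101 time points and the grid values are examples only):

pos <- seq(1, 101, by = 10)              # item 2: graphs at 1, 11, 21, ..., 101
h.grid <- c(0.1, 0.15, 0.2, 0.25, 0.3)   # item 1: candidate bandwidths to compare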

Let me know if these would be helpful.
