
Error: cannot allocate vector of size 2.0 Gb #3

Open
Nadahuihui opened this issue Jul 14, 2019 · 5 comments

@Nadahuihui

Hi,
The loggle package is a very handy tool! I have recently been using it on some gene data, but I ran into a problem. With p = 500 the fit runs fine (I set num.thread = 1); when p increases to 1000 (with num.thread = 4), the run fails. What I eventually need, however, is p = 5000.
The code and error are as follows:

result1 <- loggle.cv.h(braingene, pos, h = 0.26,
                       d.list = c(0, 0.02, 0.05),
                       lambda.list = 10 ^ c(-0.5, -0.3), cv.fold = 3,
                       fit.type = "pseudo", num.thread = 4)

Using d.list: 0 0.02 0.05
Using lambda.list: 0.3162278 0.5011872
Detrending each variable in data matrix...

Running fold 1 out of 3 folds...
Generating sample correlation matrices for training dataset...
Error: cannot allocate vector of size 2.0 Gb
Called from: array(0, c(p, p, N))
Browse[1]>
The source of array() is as follows:
function (data = NA, dim = length(data), dimnames = NULL)
{
    if (is.atomic(data) && !is.object(data))
        return(.Internal(array(data, dim, dimnames)))
    data <- as.vector(data)
    if (is.object(data)) {
        dim <- as.integer(dim)
        if (!length(dim))
            stop("'dim' cannot be of length 0")
        vl <- prod(dim)
        if (length(data) != vl) {
            if (vl > .Machine$integer.max)
                stop("'dim' specifies too large an array")
            data <- rep_len(data, vl)
        }
        if (length(dim))
            dim(data) <- dim
        if (is.list(dimnames) && length(dimnames))
            dimnames(data) <- dimnames
        data
    }
    else .Internal(array(data, dim, dimnames))
}

@jlyang1990
Owner

Hi,

Thanks for your interest in this package. This issue is most likely an out-of-memory error; the sketch below illustrates the scale involved. I have listed some suggestions, and you could try one or several of them and see whether they help resolve the issue.
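
A back-of-envelope sketch of the memory involved (assuming 8-byte doubles and the array(0, c(p, p, N)) allocation in your traceback; N = 268 is only a guess that reproduces the 2.0 Gb figure):

# One p x p x N array of doubles costs 8 * p^2 * N bytes.
p <- 1000; N <- 268
8 * p^2 * N / 2^30        # ~2.0 Gb, the size of the failed allocation
8 * 5000^2 * 100 / 2^30   # at p = 5000, even N = 100 needs ~18.6 Gb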

  1. Run your code on a server instead of a personal laptop, since a server typically has much more memory available for your data.
  2. Reduce the number of time points. For example, if there are 10,000 time points in your data, you can split them into 10 consecutive pieces of 1,000 time points each and fit "loggle" on each piece (see the sketch after this list). Just keep in mind that all the hyper-parameters used in "loggle" should be adjusted accordingly.
  3. Reduce the dimension P. You mentioned that you eventually need P=5000, which in my experience would occupy a huge amount of memory. Before fitting "loggle", you could double-check whether all 5,000 genes are needed, and whether they naturally split into several clusters, say 10 clusters of P=500 each, and fit "loggle" on each cluster (also sketched below).
  4. Increase the value of lambda and decrease the value of d. This may not resolve your issue directly, but it is good practice for saving memory in your follow-up model fitting.
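
A minimal sketch of suggestions 2 and 3 (the chunk sizes and the per-block choice of pos are illustrative only; braingene and the hyper-parameters are taken from your call above, and braingene is assumed to be a p x N matrix with variables in rows and time points in columns):

library(loggle)

# Suggestion 2: fit on consecutive blocks of ~1,000 time points each.
N <- ncol(braingene)
time.blocks <- split(seq_len(N), ceiling(seq_len(N) / 1000))
fits.time <- lapply(time.blocks, function(idx) {
  loggle.cv.h(braingene[, idx], pos = seq(1, length(idx), by = 10),
              h = 0.26, d.list = c(0, 0.02, 0.05),
              lambda.list = 10 ^ c(-0.5, -0.3), cv.fold = 3,
              fit.type = "pseudo", num.thread = 1)
})

# Suggestion 3: fit separately on clusters of ~500 genes each.
gene.clusters <- split(seq_len(nrow(braingene)),
                       ceiling(seq_len(nrow(braingene)) / 500))
fits.gene <- lapply(gene.clusters, function(g) {
  loggle.cv.h(braingene[g, ], pos, h = 0.26,
              d.list = c(0, 0.02, 0.05),
              lambda.list = 10 ^ c(-0.5, -0.3), cv.fold = 3,
              fit.type = "pseudo", num.thread = 1)
})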

Let me know if these suggestions would be helpful to you.

@Nadahuihui
Author

Nadahuihui commented Jul 20, 2019 via email

@jlyang1990
Owner

Hi Huihui,

  1. I see; d = 0.8 might be too large for your use case. Could you try h = 0.26, d.list = c(0.15, 0.3), lambda.list = c(0.25, 0.3) instead (see the sketch after this list)? If it still doesn't work, could you send me your data and code so that I can test-run it on my side, if possible?
  2. You may need to install the R package "igraph" in advance.
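
In code, suggestion 1 is the same call as your original attempt with only d.list and lambda.list changed (braingene and pos are your own objects, assumed here):

result1 <- loggle.cv.h(braingene, pos, h = 0.26,
                       d.list = c(0.15, 0.3),
                       lambda.list = c(0.25, 0.3), cv.fold = 3,
                       fit.type = "pseudo", num.thread = 4)

# Suggestion 2: install the dependency before running loggle.
install.packages("igraph")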

Let me know if these would be helpful.

@Nadahuihui
Author

Nadahuihui commented Jul 22, 2019 via email

@jlyang1990
Owner

Hi Huihui,

Sorry for the late reply; I somehow did not receive the reminder email for your question.

  1. This is derived from the theory of kernel density estimation; see https://en.wikipedia.org/wiki/Kernel_density_estimation for more details. In practice, we can set h within the grid 0.1, 0.15, 0.2, 0.25, 0.3 and check the performance at each value (see the sketch after this list).
  2. "pos" determines the positions at which we build the graphs; we always use 100% of the data in estimating them. For example, if we have observed data at time points 1, 2, ..., 101, we can build graphs only at time points 1, 11, 21, ..., 101 by setting pos = c(1, 11, 21, ..., 101).
  3. This issue is quite strange. I would suggest checking the estimated graphs to see whether some of them are too sparse, since the selected λ value is quite large. Also, check the estimated graphs corresponding to the Inf cv.scores to see whether they share similar patterns.
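
A minimal sketch of items 1 and 2 (the 101 time points and the grid values are examples only):

pos <- seq(1, 101, by = 10)              # item 2: graphs at 1, 11, 21, ..., 101
h.grid <- c(0.1, 0.15, 0.2, 0.25, 0.3)   # item 1: candidate bandwidths to compare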

Let me know if these would be helpful.
