error in optimize #1

Closed
mcieslik-mctp opened this issue Jul 5, 2017 · 35 comments

@mcieslik-mctp

I tried running SAVER on a dataset of ~1500 cells. After approx. 24h on 64 cores the program crashed with the following message:

Error in optimize(calc.loglik.a, interval = c(0, var(y/sf)/mean(y/sf)^2),  :
  invalid 'xmin' value
In addition: Warning message:
In matrix(gene.means, ngenes, ncells) :
  data length [16562] is not a sub-multiple or multiple of the number of rows [16710]

Thanks!
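
For readers who land here with the same message, here is a minimal sketch of one way `optimize` can produce it, assuming the upper end of the search interval evaluates to `NaN`. The vectors `y` and `sf` are made-up stand-ins for a gene's counts and the size factors, and `objective` is a placeholder, not SAVER's `calc.loglik.a`. The thread does not confirm that this is the exact cause inside SAVER.

```r
# Hypothetical inputs: a gene with zero counts in every cell.
y  <- c(0, 0, 0, 0)
sf <- c(1, 1, 1, 1)

# Same form as the interval bound in the error message above.
upper <- var(y / sf) / mean(y / sf)^2   # 0 / 0, so NaN

# Placeholder objective; SAVER's real likelihood is not reproduced here.
objective <- function(a) (a - 1)^2

# Capture the error message instead of stopping the script.
result <- tryCatch(
  optimize(objective, interval = c(0, upper)),
  error = function(e) conditionMessage(e)
)

# A guard that flags the bound as unusable before optimizing.
safe_upper <- if (is.finite(upper) && upper > 0) upper else NA_real_
```

Because `optimize` takes its lower bound as `min(interval)`, and `min(c(0, NaN))` is itself `NaN`, the lower bound is the one that gets reported even when the upper bound is the culprit.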

@MaxKman

MaxKman commented Jul 11, 2017

I encountered a similar error message just a few minutes after starting the algorithm. I used a very sparse dataset with ~10000 cells.

argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
Predictions finished. Calculating posterior...
argument is not numeric or logical: returning NA
Error in optimize(calc.loglik.b, interval = c(0, var(y/sf)/mean(y/sf)), : invalid 'xmin' value

Help appreciated. Thanks!

@mohuangx
Owner

Sorry for the late response. I will take a look and will let you know when it is fixed.

@mohuangx
Owner

SAVER v0.1.3 should be able to solve the issue. Please let me know if this same problem occurs.

@MaxKman

MaxKman commented Jul 14, 2017

Thanks very much for addressing the issue. I tried to update the package, but whatever I did, packageVersion("SAVER") always returned 0.1.2. I am running Microsoft R Open 3.4.0.

I tried removing the package and restarting the session, followed by each of the following:

  • install_github("mohuangx/SAVER")
  • install_github("mohuangx/SAVER@v0.1.3")
  • download as tar.gz from https://github.com/mohuangx/SAVER/releases and installing using install.packages(path_to_file, repos = NULL, type="source")

None of it worked.

EDIT: I found that the version number was not updated in the DESCRIPTION file, so the installation probably worked but didn't solve the problem. I am still getting:

argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
Predictions finished. Calculating posterior...
argument is not numeric or logical: returning NA
Error in optimize(calc.loglik.b, interval = c(0, var(y/sf)/mean(y/sf)), : invalid 'xmin' value

I sent you my dataset via email to make it easier to address the issue.

Best
Max
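
A small sketch of checking an installed package against a minimum version, which is the check described in this comment. The helper `meets_version` and its parameters are illustrative names, not part of SAVER. Note that `packageVersion` reports whatever the package's DESCRIPTION file declares, so a release whose DESCRIPTION was not bumped will still report the old number, as happened here.

```r
# Report whether an installed package meets a minimum version.
# `pkg` and `minimum` are illustrative parameters, not part of SAVER.
meets_version <- function(pkg, minimum) {
  installed <- utils::packageVersion(pkg)
  installed >= package_version(minimum)
}

# Demonstrated on a base package so the snippet runs anywhere.
meets_version("stats", "3.0.0")

# For SAVER one would call, for example:
# meets_version("SAVER", "0.1.3")
```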

@mohuangx
Owner

Hi Max,

Apologies. I forgot to update the version number in the DESCRIPTION file. However, it is still concerning that the issue is not resolved. I will try to run it on your dataset and will let you know how it goes.

Mo

@mohuangx
Owner

Hi Max,

Sorry for the lengthy turnaround. I updated the package to version 0.2.0, which I was able to run without errors on your dataset. Let me know if you're able to run it as well.

Mo

@MaxKman

MaxKman commented Jul 24, 2017 via email

@mohuangx
Owner

Hi Max,

Try running the saver function on as.matrix(my.data), i.e.,
my.data.normalized <- saver(as.matrix(my.data), parallel = TRUE)

Mo
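
A minimal sketch of why the `as.matrix` conversion can matter, assuming the input was read as a data frame. Base R's `mean` returns `NA` with the warning "argument is not numeric or logical: returning NA" when given a data frame, which matches the warning text quoted earlier in the thread, although the thread does not confirm that this is where SAVER's warning came from. The toy table is made up.

```r
# Toy count table standing in for `my.data` (hypothetical values).
my.data <- data.frame(cell1 = c(0, 2, 5), cell2 = c(1, 0, 3),
                      row.names = c("geneA", "geneB", "geneC"))

# mean() on a data frame returns NA and emits the warning quoted above.
mean_df <- suppressWarnings(mean(my.data))

# After conversion the same call works on the numeric matrix.
counts   <- as.matrix(my.data)
mean_mat <- mean(counts)

# Confirm the converted object is a numeric matrix.
is.numeric(counts)
```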

@MaxKman

MaxKman commented Jul 24, 2017 via email

@MaxKman

MaxKman commented Jul 26, 2017 via email

@mcieslik-mctp
Author

Thanks! Our initial tests indicate that the problem is resolved. @MaxKman FYI, it takes approx. 24h on 64 cores w/ ~8k detected genes.

@mohuangx
Owner

Hi Max,

I only ran it on 100 genes on 10 cores, which took about 3 hours (although the posterior calculation was performed on all ~20,000 genes). I would guess it might take around 60-80 hours running on 8 cores. I'm currently working on ways to speed up the program so look out for improvements in the coming versions!

@MaxKman

MaxKman commented Aug 7, 2017

Thank you for all the help so far. I let saver run over the weekend on 10 cores. When I returned to work I found the following error:

Error in `rownames<-`(`*tmp*`, value = c("A1BG", "A1BG-AS1", "A2M", "A2M-AS1", :
  length of 'dimnames' [1] not equal to array extent

Help appreciated!

Best
Max

@mohuangx
Owner

mohuangx commented Aug 7, 2017

Hi Max,

Sorry for the inconvenience. Could you provide the command that you used to run saver and the version?

Thanks,
Mo

@MaxKman

MaxKman commented Aug 8, 2017

Hey Mo,

I used saver 0.2.0 and the following commands:

cells.data <- read.delim("10681cells.txt.gz", header = TRUE)
rownames(cells.data) <- cells.data[,1]
cells.data <- cells.data[,2:ncol(cells.data)]

library(doParallel)
library(SAVER)
registerDoParallel(cores = 10)
cells.normalized <- saver(as.matrix(cells.data), parallel = TRUE)
save(cells.normalized, file="cells.normalized.rData")

The output was the following:

Removing 3 cells with zero expression.
Calculating predictions...
Approximate finish time: 2017-08-05 16:10:46
Running in parallel: 10 workers
Loading required package: Matrix
Loaded glmnet 2.0-10
Error in `rownames<-`(`*tmp*`, value = c("A1BG", "A1BG-AS1", "A2M", "A2M-AS1", :
  length of 'dimnames' [1] not equal to array extent

@nicolee-mctp

Hi Mo,

I'm a collaborator of OP, and we ran into another error with SAVER. We got it to work on one dataset fine a couple weeks ago, but when we tried to use it on another dataset, we got the following error:

library(SAVER)
library(doParallel)
registerDoParallel(cores = 64)
saver <- saver(as.matrix(mat), parallel = TRUE)
calculating predictions...
Approximate finish time: 2017-08-08 05:09:55
Running in parallel: 64 workers
Loading required package: Matrix
Loaded glmnet 2.0-10

Predictions finished. Calculating posterior...
Error in mu[pred.genes, ] <- mu.par : 
  number of items to replace is not a multiple of replacement length

Any insights on how to fix this? Thanks!

Nicole

@mohuangx
Owner

mohuangx commented Aug 8, 2017

Hi Max,

I ran SAVER 0.2.1 on a subset of the dataset and was able to get it to run without any errors. Could you try running it on a subset using your current version SAVER 0.2.0 to see if you get the same error and then try updating to SAVER 0.2.1?

Sorry again for the repeated issues.

Mo

@mohuangx
Owner

mohuangx commented Aug 8, 2017

Hi Nicole,

It appears that you're using an older version of SAVER. Try reinstalling SAVER and run it on a subset of the dataset to see if it works, and if it works then try running it on the full dataset. Please let me know if you are still getting an error.

Mo
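
The "try a subset first" advice can be sketched as below, assuming a genes-by-cells count matrix. The matrix `mat` here is simulated and the subset sizes are arbitrary. Only plain matrix indexing is shown, since the thread does not describe any SAVER argument for subsetting, and the `saver` call is left commented out.

```r
set.seed(1)  # reproducible subset

# Hypothetical count matrix: 200 genes x 50 cells of Poisson counts.
mat <- matrix(rpois(200 * 50, lambda = 0.5), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("cell", 1:50)))

# Keep a random subset of genes and cells for a quick trial run.
keep_genes <- sort(sample(nrow(mat), 40))
keep_cells <- sort(sample(ncol(mat), 20))
mat_small  <- mat[keep_genes, keep_cells, drop = FALSE]

dim(mat_small)

# The trial itself would then be, as in the posts above:
# out_small <- saver(mat_small, parallel = TRUE)
```

A subset that runs cleanly does not guarantee the full run will, as the later comments in this thread show, but it catches setup and version problems in minutes rather than days.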

@MaxKman

MaxKman commented Aug 12, 2017

Hi Mo,

I ran a small subset of my dataset with SAVER 0.2.1 and it went through fine. After that I tried the whole dataset, and it ran for 5 days on 10 cores before finally returning an error. See below:

Removing 3 cells with zero expression.
Calculating predictions for 11841 genes using 10678 cells...
Approximate finish time: 2017-08-11 01:17:54
Running in parallel: 10 workers
Loading required package: Matrix
Loaded glmnet 2.0-10

Error in out[[i]][lasso.genes, ] <- lasso[[i]] :
number of items to replace is not a multiple of replacement length

save(cells.normalized, file="cells.normalized.rData")
Error in save(cells.normalized, file = "cells.normalized.rData") :
object ‘cells.normalized’ not found

The commands I used are the same as posted above.
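
The second error in the output above follows from the first: because the `saver` call stopped, the assignment never happened, so `save` had nothing to write. A minimal sketch of wrapping a long run so that failure is explicit, assuming a placeholder function `long_running_step` (hypothetical, standing in for the multi-day call):

```r
# Hypothetical stand-in for a multi-day computation that fails.
long_running_step <- function() stop("simulated failure")

out_file <- tempfile(fileext = ".rds")

result <- tryCatch(
  long_running_step(),
  error = function(e) {
    message("Run failed: ", conditionMessage(e))
    NULL  # mark the run as failed instead of leaving `result` undefined
  }
)

# Only write the file when there is something to save.
if (!is.null(result)) {
  saveRDS(result, out_file)
}
```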

@mohuangx
Owner

Hi Max,

Sorry for the repeated errors. I will try running it on the entire dataset and will get back to you when it's finished.

Mo

@MaxKman

MaxKman commented Aug 12, 2017

Great, thank you!

@mohuangx
Owner

Hi Max,

I updated SAVER to version 0.2.2 and was able to run it on your dataset without any problems. Hopefully it will finally work for you.

Mo

@MaxKman

MaxKman commented Aug 13, 2017

Hi Mo,

Thanks very much! I will attempt another run tomorrow. Did you by any chance save the results from your run, and could you send them to me?

Best
Max

@mohuangx
Owner

Hi Max,

Sure, I will email you a link.

Mo

@MaxKman

MaxKman commented Aug 23, 2017

Hi Mo,

This time everything worked without throwing an error. I had to set nzero = 50 like you did for it to work, though. Thank you for your help!

Best
Max

@mohuangx
Owner

Hi Max,

Thanks for the response. Did it not work when nzero was not specified for SAVER version 0.2.2?

Thanks,
Mo

@MaxKman

MaxKman commented Aug 31, 2017

Exactly. Unfortunately I didn't log the error message this time, but from what I remember it was similar to the one before.

@mohuangx
Owner

mohuangx commented Sep 1, 2017

Thanks Max for bringing this to my attention. I'll try and see what the problem is.

@nicolee-mctp

Hi Mo,

I got the same error as Max even though I set nzero (10 to match my other analyses). I verified that I was using version 0.2.2.

> mat <- read.csv("data/raw/counts.csv", row.names = 1)
> mat <- as.matrix(mat)
> library(SAVER)
> registerDoParallel(cores = 64) 
> saver <- saver(mat, parallel = TRUE, nzero = 10)

Calculating predictions for 19205 genes using 9393 cells...
Approximate finish time: 2017-09-13 13:05:06
Running in parallel: 64 workers
Loading required package: Matrix
Loaded glmnet 2.0-10

Error in out[[i]][lasso.genes, ] <- Reduce(rbind, lapply(lasso, `[[`,  : 
  number of items to replace is not a multiple of replacement length

@mohuangx
Owner

Hi Nicole,

Sorry for the error. Do you mind sharing the dataset so that I can try to diagnose the problem?

Thanks,
Mo

@nicolee-mctp

Hi Mo,

I emailed you yesterday with the dataset. Please let me know if you got it.

Thanks!
Nicole

@mohuangx
Owner

Hi Nicole,

Thanks for emailing me the dataset! I'm currently running it and will hopefully identify the error soon.

Mo

@fanli-gcb

Those types of errors are commonly seen in parallel computation when jobs die unexpectedly (i.e., due to lack of memory). Perhaps reducing the number of cores down from 64 would help?
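
A toy sketch of how a dead worker could surface as the "number of items to replace" error, assuming each worker returns one block of rows and a failed worker contributes nothing. The block sizes and values are made up and this is not SAVER's internal code.

```r
# Three hypothetical workers should each return a 2 x 3 block.
chunks <- list(matrix(1, 2, 3), NULL, matrix(3, 2, 3))  # worker 2 died

combined <- do.call(rbind, chunks)   # NULL is dropped: 4 rows, not 6
out <- matrix(0, nrow = 6, ncol = 3)

# Assigning 12 values into 18 slots fails; capture the message.
msg <- tryCatch({
  out[1:6, ] <- combined
  "ok"
}, error = function(e) conditionMessage(e))

# Checking the shape first gives a clearer failure than the assignment.
shape_ok <- identical(dim(combined), c(6L, 3L))
```

Validating the dimensions of the combined result before assigning it makes the missing chunk visible at the point where it went missing.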

@nicolee-mctp

@fanli-gcb Thanks for the suggestion! I'll try that now, hopefully that will fix the problem.

@mohuangx
Owner

@fanli-gcb Thanks for pointing this out. Indeed, this seemed to be where the bottleneck was, since Reduce was being used to combine the list of lists, which is computationally intensive.

I have updated the combine function to use unlist instead, which is much faster. The changes can be found in SAVER version 0.3.0.

@nicolee-mctp I ran your dataset with SAVER version 0.3.0 and was able to get results without any issues. I sent you an email with a link to the results.
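
The difference between combining with `Reduce` and with `unlist` can be sketched as follows. This is an illustration under the assumption that each result is a numeric vector of equal length. It is not the actual combine function from SAVER 0.3.0, and `rows` is made-up data.

```r
# Hypothetical per-gene results: a list of equal-length numeric vectors.
rows <- lapply(1:5, function(i) as.numeric(i) * c(1, 10, 100))

# Incremental combine: each step copies the growing matrix.
m_reduce <- Reduce(rbind, rows)

# Flatten once, then reshape; the values fill row by row.
m_unlist <- matrix(unlist(rows), nrow = length(rows), byrow = TRUE)

# Both approaches give the same values (row names aside).
all.equal(unname(m_reduce), m_unlist)
```

With `Reduce(rbind, ...)` the partial matrix is rebuilt at every step, so the total copying grows roughly quadratically with the number of pieces, whereas the flatten-and-reshape route allocates the result once.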
