Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

maxiter #76

Open
Lauren-Blake opened this issue Dec 6, 2017 · 28 comments
Open

maxiter #76

Lauren-Blake opened this issue Dec 6, 2017 · 28 comments

Comments

@Lauren-Blake
Copy link

Last night when running ash, I got the following message:

warning("Optimization failed to converge. Results may be unreliable. Try increasing maxiter and rerunning.")

  1. Can you please tell me what the default maxiter is so that I can increase it? Do you have any guidelines or recommendations for what I should increase maxiter to?

  2. The way that I was going to enter maxiter into ash is with the following command below (1000000 is arbitrary). Can you please confirm that you want to put it in the control argument? On the notes for ash.R, you may want to clarify which arguments can be included as part of the control. This was not clear in the current version.

result <- ash(betahat = x$coefficients[, coef], sebetahat = x$stdev.unscaled[, coef] * sqrt(x$s2.post), df = x$df.total[1], control = list(maxiter = 1000000))

@pcarbo
Copy link
Collaborator

pcarbo commented Dec 7, 2017

@Lauren-Blake Have you installed MOSEK on your computer? What OS do you have---Mac, Windows or Linux?

@jhsiao999
Copy link

@Lauren-Blake Here is the instruction for installing MOSEK on mac. Try it and see if you still get the same errors. https://github.com/stephens999/ashr/blob/master/inst/rmosek-mac.md

@pcarbo
Copy link
Collaborator

pcarbo commented Dec 7, 2017

@Lauren-Blake See also Issue #77 for updated instructions. It isn't completely straightforward to install, so feel free to email separately if you get stuck.

@stephens999
Copy link
Owner

default maxiter is 5000
yes, you set it in control
but better to install RMosek

@willwerscheid
Copy link
Contributor

I would like to re-open this issue because the warning message is vague and not really actionable as is. There is no maxiter parameter in ashr (nor in mixsqp). We should be more specific.

@orangebromeliad
Copy link

I have been getting the same error code and have no idea what it means. What is Rmosek and why would this solve the problem (whatever the problem is exactly?)?

@stephens999
Copy link
Owner

stephens999 commented Jun 14, 2019

some of the discussion above, including Rmosek, is now out of date and no longer relevant.

@orangebromeliad What version of ashr are you using, and do you have mixsqp installed?
mixsqp can be installed from CRAN:
install.packages("mixsqp")

@stephens999 stephens999 reopened this Jun 14, 2019
@lukeholman
Copy link

Hey there,

Just an FYI that I also had this error, leading me to find this page. I got the error running mash_1by1 in mashr. My naive suggestions to fix this are:

  • make it a bit easier to see what the default is for maxiter in ash, and make it easier to adjust (e.g. by explaining how to do it under ?ash or ?estimate_mixprop, or by making maxiter an argument with an obvious default rather than something that you interact with via the control argument)
  • add a control and/or maxiter argument to mash_1by1, for use by ash (which is called by mash_1by1)

@nicolerg
Copy link

I also got this error using mash_1by1. Increasing maxiter.sqp to 10,000 still did not resolve the problem. Should I just keep increasing maxiter, or is this some indication that there's something strange about my input data? I updated the mashed data to allow for correlations.

@stephens999
Copy link
Owner

stephens999 commented Feb 17, 2020 via email

@nicolerg
Copy link

Yes, I just removed and re-installed mixsqp to make sure.

There are a lot of missing values in my input because of genes only expressed in one of the 5 tissues included in the merged input (I'm testing logFC and corresponding stderrs from DE analyses performed independently in multiple tissues). About 1/3 of the genes have at least one NA value (35 measurements per gene). I thought this could be causing the problem, but when I restrict the input to complete cases only, I still get the same message:

Warning messages:
1: In estimate_mixprop(data, g, prior, optmethod = optmethod, control = control,  :
  Optimization failed to converge. Results may be unreliable. Try increasing maxiter and rerunning.
2: In estimate_mixprop(data, g, prior, optmethod = optmethod, control = control,  :
  Optimization failed to converge. Results may be unreliable. Try increasing maxiter and rerunning.
3: In estimate_mixprop(data, g, prior, optmethod = optmethod, control = control,  :
  Optimization failed to converge. Results may be unreliable. Try increasing maxiter and rerunning.
4: In estimate_mixprop(data, g, prior, optmethod = optmethod, control = control,  :
  Optimization failed to converge. Results may be unreliable. Try increasing maxiter and rerunning.

This is sort of a separate problem, but in case it sheds any more light on the issue, I am also getting a repeated warning: chol(): given matrix is not symmetric warning when I try to run mash just with the canonical covariance matrices:

warning: chol(): given matrix is not symmetric
 - Likelihood calculations took 2142.80 seconds.
 - Fitting model with 1361 mixture components.
Error in verify.likelihood.matrix(L) :
  Input argument "L" should be a numeric matrix with >= 2 columns, >= 1 rows, all its entries should be non-negative, finite and not NA, and some entries should be positive
Timing stopped at: 65.28 0.378 65.67

@stephens999
Copy link
Owner

@nicolerg
can you share a reproducible example and we can investigate?

@nicolerg
Copy link

@stephens999 definitely! Here is my data and the code that was throwing the errors. Please let me know if anything is unclear. Thank you!
mashr_issue76.zip

@stephens999
Copy link
Owner

@pcarbo so one of the problems here is that
it seems to be an example where mixSQP does not converge.

I did find mixIP works.
I found that increasing iterations in mixSQP did not help, but running more
initial em iterations did help.

I wonder if, in general, we should be suggesting using more em iterations rather than increasing
the mixSQP iterations?

here is my code:

library("ashr")
load('mashr_input_shared.RData') # e (effects), s (stderrs) (data.frames)
x = e[,10]
s = s[,10]
s = s[!is.na(x)]
x = x[!is.na(x)]

ash(x,s,mixcompdist = "normal")
ash(x,s,mixcompdist = "normal",control=list(maxiter.sqp = 10000))
ash(x,s,mixcompdist = "normal",control=list(numiter.em=100)) #converges

ash(x,s,mixcompdist = "normal",optmethod="mixIP") #converges

@stephens999
Copy link
Owner

@nicolerg if you try increasing number of em iterations in mixSQP this should fix the initial problem.
eg:

ashres = ash(Bhat[,i],Shat[,i],mixcompdist="normal", alpha=alpha, control=list(numiter.em=200), optmethod='mixSQP') # get ash results for first condition

@stephens999
Copy link
Owner

stephens999 commented Feb 17, 2020

@nicolerg can you also try replacing the missing values before running mash, as follows, to see if it fixes the other problem:
...
s[is.na(e)]=10000
e[is.na(e)]= 0
....

this intuitiion is that missing values effectively correspond to very large standard error (s)

@nicolerg
Copy link

@stephens999 replacing NAs with 0 effect/large stderr and increasing numiter.em solved both of those problems! Thank you very much for looking into this!

@stephens999
Copy link
Owner

btw, @nicolerg why is so much of your data missing?
We are now discussing ways to deal with missing data internally and it might be useful to understand why it occurs.

@pcarbo
Copy link
Collaborator

pcarbo commented Feb 18, 2020

@stephens999 Increasing the number of EM iterations would be one solution.

I do also find that the choice of pi_init is often problematic,

> round(pi_init,digits=4)
 [1] 0.9979 0.0001 0.0001 ...

probably because the likelihood surface can be very "flat" near this initial estimate (and explains why increasing the number of EM iterations is needed to "escape" this flat region); I have now seen several examples where convergence issues could be avoided if the default initialization (uniform) were used.

In this particular example, the convergence problem goes away if the mixsqp call is instead:

out <- mixsqp::mixsqp(A,w,control = control)

So I would recommend not using pi_init over increasing the number of EM iterations.

@stephens999
Copy link
Owner

stephens999 commented Feb 18, 2020

thanks @pcarbo I have pushed a fix to ashr to use the (1,1,1,..1) initialization for pi when using mixSQP.

@nicolerg can you try reinstalling ashr (from github)
and running mash_1by1 again?

@nicolerg
Copy link

@stephens999 Removing and reinstalling ashr with install_github("stephens999/ashr") optimization failed to converge with m.1by1 = mash_1by1(data.V) (ashr_2.2-41). I replaced the missing values as before.

My input data are DESeq2 logFCs and LFCstderrs from differential analysis performed independently in each of 5 bulk tissues, with gene expression (RNA-seq) at 7 time points per tissue (so each "sample" is the effects in one timepoint in one tissue relative to baseline in that tissue). Within each tissue, we filter out genes that are lowly expressed across all time points before performing DE analysis. DESeq also returns some NA values if genes are still lowly expressed for a given contrast. So my missing values are just from merging across 5 tissues with quite different expression profiles (i.e. lowly expressed in one or more of the 5 tissues).

@stephens999
Copy link
Owner

@pcarbo so with the updated ash (using default mixsqp initialization and default number of em iterations i believe) the column 10 succeeds, but columns 11 and 12 seem to still fail.

eg try:
load('mashr_input_shared.RData') # e (effects), s (stderrs) (data.frames)
x = e[,12]
s = s[,12]
s = s[!is.na(x)]
x = x[!is.na(x)]

ash(x,s,mixcompdist = "normal")

@stephens999
Copy link
Owner

@nicolerg if you are comparing several measurements against a common baseline,
this induces correlations that you need to take account of... we have unpublished methods for doing this, developed by Sarah Urbut and @zouyuxin

just want to make sure you are aware of these
https://github.com/stephenslab/mashr/blob/master/vignettes/intro_mashcommonbaseline.Rmd

@nicolerg
Copy link

@stephens999 thank you for pointing me to that. I think the correlation structure I have is more complex than the default behavior that the "common baseline" approach allows, but that probably means I have to think more about how to do this analysis.

I understand how the common baseline approach would be used if you had, for example, gene expression data at one pre-perturbation time point and several post-perturbation time points and you wanted to estimate the effects between the pre-perturbation time point and each of the post-perturbation time points, where all samples were from the same tissue.

My input data is this x5: each "sample" is a single post-perturbation time point in a single tissue, where effects are relative to a pre-perturbation control in the same tissue; I have 7 post-perturbation time points in each of 5 tissues. So I would have to indicate a separate reference sample for each set of conditions/time points from a given tissue.

In my mashr analysis, I updated the mashdata object to have a non-null correlation matrix, but it sounds like this is not sufficient. My motivation for using mashr is to find sharing or uniqueness of differentially expressed genes across tissues with temporal resolution, which is why I modeled time discretely and have effects at each time point. For example, I want to be able to see if a gene is not only differentially expressed in 2 or more tissues but also if they have similar magnitudes of effects at the same or different time points. Does this make sense? Thank you!

@stephens999
Copy link
Owner

stephens999 commented Feb 19, 2020 via email

@stephens999
Copy link
Owner

i added a ... to mash_1by1 to allow users to pass control to ash.
Currently debating a more informative error message.

@pcarbo
Copy link
Collaborator

pcarbo commented Mar 2, 2020

@nicolerg @lukeholman We have made several updates to mixsqp that should help with some of the issues you have been reporting.

@nicolerg
Copy link

nicolerg commented Mar 8, 2020

Thank you very much for your responsiveness!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants